Browser Storage Part 1: Storage Structures

April 16, 2024
Chapter 2: Browser Elements

We now move into a series of posts about elements of browser-side storage.  As discussed in my second post, there are eight forms of storage that are standard in browsers today:

  • Session Storage
  • Local Storage
  • IndexedDB
  • Web SQL Databases (WebSQL and SQLite)
  • Origin Private File System
  • Application Cache
  • BLOB URL
  • Cookies

The Privacy Sandbox has added six other storage technologies that we need to understand before even discussing how the three main products/APIs work.  These are:

  • CHIPS
  • Partitioned Storage
  • Storage Buckets
  • Shared Storage
  • Topics API Model Storage
  • Private State Tokens

My goal in all content on this site has been to talk only about Privacy Sandbox technologies as much as possible.  However, these two sets of technologies, “pre-sandbox” and “post-sandbox,” are not unrelated.  In fact, the Privacy Sandbox technologies often build on the existing technologies and APIs.  For example, even though third-party tracking cookies are going away, not all third-party cookies are going away.  Companies that use third-party session management services like Akamai or Cloudflare still need to embed these providers’ strictly-necessary and performance cookies to make use of those services.  The problem is that these providers could create cross-site user profiles if their cookies were placed in typical cookie storage, which presents a privacy risk.  So Google created a new approach, partitioned cookies, that would be required for third-party cookies once third-party cookie deprecation (3PCD) occurs.  The standard for this is called Cookies Having Independent Partitioned State, or CHIPS.

So, the way I am going to manage this complexity is similar to what I have done in previous posts.  I will provide a high-level primer on the underlying required technology and then explain the new Privacy Sandbox storage element.  

Table 1, at the end of this post, shows the various types of browser storage, the subdirectories or SQLite files where they are stored, their format, types of encryption or other protection, and how they can be accessed. 

Where is Browser Storage

As I have drilled down into this topic, I have realized that a huge amount of data is stored locally in Chrome for all sorts of reasons.  Chrome has its own data to store so that the browser can manage all its functions.  This includes elements like your web browsing history, your preferences, your bookmarks, and favicons for various sites, to name just a very few.  But then every extension also has its own data to store.  Sometimes they use a standard browser storage element like IndexedDB. Sometimes they will use SQLite or another mechanism.

We are not going to delve into any of this.  For the most part it is not relevant to our discussion of the Privacy Sandbox.  But for our purposes it is enough to note that the discussion in the next few posts only covers a small portion of the data that Chrome keeps locally on your hard drive.

The main directories where Chrome 123 (latest stable version) stores the data we care about are:

On Windows: C:\Users\<your user name>\AppData\Local\Google\Chrome\User Data\Default

On Mac: ~/Library/Application Support/Google/Chrome/Default
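
As a rough illustration, the two locations above can be resolved in code. This is a minimal sketch, not anything Chrome ships: the function name `chrome_default_profile_dir` is made up for this post, it only covers the two platforms listed above, and it assumes the profile folder is named Default on both of them.

```python
import sys
from pathlib import Path
from typing import Optional


def chrome_default_profile_dir(platform: str = sys.platform,
                               home: Optional[Path] = None) -> Path:
    """Build the Default profile directory for the platforms listed above.

    Hypothetical helper: it only assembles a path and does not check
    that Chrome is installed or that the directory exists.
    """
    home = home if home is not None else Path.home()
    if platform.startswith("win"):
        # C:\Users\<your user name>\AppData\Local\Google\Chrome\User Data\Default
        return home / "AppData" / "Local" / "Google" / "Chrome" / "User Data" / "Default"
    if platform == "darwin":
        # Assumption: the profile folder is called Default on macOS as well.
        return home / "Library" / "Application Support" / "Google" / "Chrome" / "Default"
    raise ValueError(f"platform {platform!r} is not covered by this sketch")
```

For example, `chrome_default_profile_dir("darwin", Path("/Users/me"))` builds the Mac path for a user whose home directory is /Users/me.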

Figure 1 is an edited tree view of the \Default directory.  Many subdirectories and files that are in \Default have been edited out in order to fit the image reasonably on the page.  The items highlighted in yellow correspond to the directory elements listed in Table 1.

Figure 1 - Shortened Directory Tree of \Default Chrome Data Storage Directory

One thing that has surprised me most is how easy it is to access many of these forms of storage from my desktop, outside the browser and Chrome Developer Tools, using code and browser extensions dedicated to that purpose.  Using Chrome Developer Tools, it is easy to see the contents of all the forms of storage when visiting a specific web page.  But sometimes I may want to see the items in storage when I am not on a specific web page, and for that I need a special tool.  For example, I was able to see all my keyword search terms in the History file (a SQLite database) by using a SQLite viewer extension (Figure 2).  Note that these keywords are not only what I have typed in.  They are also tied to the pages I have visited.  I did not type in “prebid.js architecture” four times in a row, but I did go to four pages on the Prebid website after using that keyword to get there.  Also, although I don’t show it, I can see that the file contains 239,000 URLs I have visited (whoa!) and 23,071 links that I have clicked on to go to another page somewhere on the web (whether on the same site or another site).  This is the kind of metadata that a simple query can provide about a specific table but which Chrome Developer Tools cannot.

Figure 2 - Using a SQL Tool to See the Keywords I have Searched On
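
The same kind of lookup can be done with a few lines of code instead of a viewer extension. The sketch below is only illustrative and rests on assumptions: it expects the History file to contain a `keyword_search_terms` table with a `term` column and a `urls` table, which matches my understanding of Chrome's schema but may differ between versions. The function names are made up for this post. It works on a temporary copy of the file, since Chrome may hold a lock on the original while it is running.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def _query(history_path: Path, sql: str, params: tuple = ()) -> list:
    """Run one read-only query against a temporary copy of the History file."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "History-snapshot"
        shutil.copy2(history_path, snapshot)  # never touch the live file
        conn = sqlite3.connect(snapshot)
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            conn.close()


def top_search_terms(history_path: Path, limit: int = 10) -> list:
    """Return (term, row_count) pairs, most frequent first.

    Assumed schema: keyword_search_terms(term, ...).
    """
    return _query(
        history_path,
        "SELECT term, COUNT(*) AS n FROM keyword_search_terms "
        "GROUP BY term ORDER BY n DESC, term ASC LIMIT ?",
        (limit,),
    )


def count_urls(history_path: Path) -> int:
    """Count the rows in the (assumed) urls table."""
    return _query(history_path, "SELECT COUNT(*) FROM urls")[0][0]
```

A term shows up once per page reached through that search, which is why a single typed query can appear several times in the results.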

Also notice that I have included the location of some of the new storage elements that are part of the Google Privacy Sandbox - in particular, interest groups, private aggregation data (which is part of the Private Aggregation API that will be discussed later), and Shared Storage.   We will discuss these storage locations in detail as we discuss the various types of storage and again when we drill into the core products of the Privacy Sandbox.

How Much Storage Can I Use

As we have discussed, browser storage sits on the user's hard drive. So there must be limits to what a browser can store locally. Otherwise, the browser and all its associated applications could, in theory, use up all available storage and leave no room for the user's other applications/data. Alternately, a web site/application could take up so much storage in browser-specific storage as to 'crowd out' other websites/applications. Thus the browser vendors have agreed on guidelines for how much storage the various storage types in the browser can use, although there is some variation between vendors. They have also agreed on standards for how much overall storage all forms of storage can use.

Since right now we are only dealing with Chrome, I will deal specifically with its restrictions. For overall browser storage:

  • Chrome allows the browser to use up to 80% of total disk space.
  • Chrome reduces the amount of storage an origin can use in incognito mode to approximately 5% of the total disk space.
  • When a user enables "Clear cookies and site data when you close all windows" in Chrome, the storage quota is significantly reduced to a maximum of approximately 300MB.

As for the per web site/application storage limits, this gets a bit trickier. There are no web site/application storage limits, per se. There are storage limits by origin, which are known technically as storage quotas. This means that if my.example.com is an application and my.example.com/secondary_application is a separate application, then both those applications are using the same storage 'bucket' (we will come back to that term in a later post), and their combined storage use is deducted from the quota of their origin. An origin can use up to 60% of the total disk space. Actually, the way it works is that any origin can use up to 75% of the 80% allocated for browser storage: 80% x 75% = 60%. (BTW, if you are really ambitious, don't forget that the Chromium project, on which Chrome is based, is open source, and you can check the limits and the way they work yourself in its code. But if you are that good at this, you should probably just get on with your coding work and stop reading.)
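
To make the arithmetic concrete, here is a small sketch that turns the percentages quoted above into byte ceilings for a given disk size. It simply restates the figures from this post and is not Chrome's actual quota logic. The function name, the dictionary keys, and the reading of "approximately 300MB" as 300,000,000 bytes are all assumptions made for illustration.

```python
BROWSER_PERCENT = 80                   # share of total disk Chrome may use
ORIGIN_PERCENT_OF_BROWSER = 75         # share of that pool a single origin may use
INCOGNITO_ORIGIN_PERCENT = 5           # approximate per-origin share in incognito
CLEAR_ON_EXIT_CAP_BYTES = 300_000_000  # "approximately 300MB" (assumed decimal MB)


def quota_ceilings(total_disk_bytes: int) -> dict:
    """Return the upper bounds described in the text, in bytes.

    Integer arithmetic is used so the results are exact.
    """
    browser = total_disk_bytes * BROWSER_PERCENT // 100
    origin = browser * ORIGIN_PERCENT_OF_BROWSER // 100  # 80% x 75% = 60% of disk
    return {
        "browser": browser,
        "origin": origin,
        "origin_incognito": total_disk_bytes * INCOGNITO_ORIGIN_PERCENT // 100,
        "origin_clear_on_exit": min(origin, CLEAR_ON_EXIT_CAP_BYTES),
    }
```

On a 1 TB disk (10^12 bytes) this gives 800 GB for the browser as a whole and 600 GB as the ceiling for any one origin. These are ceilings only: what an origin actually gets depends on what else is competing for space.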

75%, you say. That sounds like a huge amount of available browser storage to give to a single application. Well, it actually doesn't work that way unless there are no other origins fighting for storage space, which pretty much never happens. Even if you have only one page open, you may still have embedded content, like an iframe, from another origin that will take up quota for that origin. How exactly it works is well beyond the scope of this blog. Suffice it to say that space is dynamically allocated based on the overall available storage on the user's computer, the storage typically used by the origin, the kinds of data being stored by the origin, how many other origins are demanding storage space, and what their typical needs are. This gets very complicated very fast. For example, a file loaded from a hard drive may only be 300 KB, but because it comes from an opaque URL (remember I told you we'd need this definition) the minimum space allocated for it for security reasons is 7MB!

If that isn't complicated enough, consider an embedded element like an AddThis or 'like' button on the page (it can be in an iframe or not). That embedded element's storage cost is charged against the storage quota of its own origin, not that of the web page it is on. Can you see how trying to predict the use of storage quota suddenly gets complicated if your site is that origin? The developer can't know how many browser tabs with different origins are open in that user's browser that have their app embedded. They also cannot see when their element is loaded, so they can't easily know exactly how much quota their app is using. The only way to handle this is to watch how often their application reaches its storage quota from those embedded elements and make some general estimates. If you are the coder for the application, you need to write an "exception handler" for situations where you hit your storage quota so that the application doesn't crash the site in which it is embedded.  I guarantee you that the maker of that embedded application would soon be out of business if that happened.
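
The accounting rule and the need for an exception handler can be shown with a toy model. To be clear, this is not browser code. `OriginLedger`, `QuotaExceededError` (as a Python class), and `try_store` are hypothetical names invented for this sketch, and the single fixed quota is a simplification of the dynamic allocation described earlier. The point is only that usage is recorded against the embed's origin, whatever page hosts it, and that a failed write is caught instead of being allowed to break the host page.

```python
class QuotaExceededError(Exception):
    """Toy stand-in for the error a browser raises when a quota is hit."""


class OriginLedger:
    """Tracks bytes used per origin against one fixed quota (a simplification)."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self.usage = {}  # origin -> bytes used

    def charge(self, embed_origin: str, host_page: str, nbytes: int) -> None:
        # host_page is deliberately ignored: the cost lands on the embed's origin.
        used = self.usage.get(embed_origin, 0)
        if used + nbytes > self.quota_bytes:
            raise QuotaExceededError(f"{embed_origin} would exceed its quota")
        self.usage[embed_origin] = used + nbytes


def try_store(ledger: OriginLedger, embed_origin: str,
              host_page: str, nbytes: int) -> bool:
    """The "exception handler": report failure instead of crashing the host page."""
    try:
        ledger.charge(embed_origin, host_page, nbytes)
        return True
    except QuotaExceededError:
        return False  # the caller can skip the write, evict old data, or retry later
```

With a quota of 100 bytes, a widget that stores 60 bytes while embedded on one site and then tries to store another 60 on a second site is refused on the second attempt, even though the two host pages have nothing to do with each other.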

Developers do get some relief through the WHATWG Storage Living Standard and its StorageManager API, which Chrome implements. This allows developers to get estimates of their storage usage and quota so that they can both understand how much quota all the aspects of their online business are using and write exception handlers for when certain estimated storage limits are reached.

After all the long posts I’ve written, we’ll make it easy today and keep the post short.

NEXT UP: Cookie Basics and Cookies Having Independent Partitioned State (CHIPS)

Table 1 - Summary of Browser Storage By Type

| Type of Storage | Subdirectory Location | Description | Format | Encryption | Other Protection | Accessible By |
|---|---|---|---|---|---|---|
| Session Storage | Held in memory (not stored on hard drive) | This storage is specific to a single browsing session. Data is lost when you close all browser windows or tabs. Session storage data isn't saved as files in a directory. It's kept in memory during your browsing session. | NA | NA | NA | Can be viewed locally by a user using Chrome developer tools (when on the origin site). |
| Local Storage | \Local Storage | Stores key-value pairs of data specific to a website or origin. | Strings | None | Keyed by origin | Not directly, but entries can be viewed and managed through Chrome developer tools (Storage viewer). |
| WebSQL Database | None | Stores structured data used by web applications. Deprecated in 2022 and no longer supported as of Chrome 123. | SQLite | None | Keyed by origin and database name. | Can be accessed locally by a user using Chrome developer tools (when on the origin site) or a SQLite viewer extension like SQLite Editor and Compiler. |
| Indexed Database | \IndexedDB | Stores large, structured data sets for web applications. | LevelDB | None | Keyed by origin and database name. | Can be accessed locally by a user using Chrome developer tools (when on the origin site) or an IndexedDB viewer/export extension like indexedDB viewer. |
| Origin Private File System | Varies by application/website | A native browser storage API optimized around object/key-value pairs. OPFS provides more granular control for file operations, enabling byte-by-byte access, file streaming, and even low-level manipulations. | Varies depending on the application. Can be a file or a database. | Possible but not required. | Keyed by origin and database name | Can only be accessed via Google's File System Access API. |
| Application Cache | \Cache | Stores temporary website files like images, scripts, and HTML for faster page loads on subsequent visits. | Various (HTML, CSS, JS, images) | None | None | Not directly, but entries are keyed by origin and can be cleared through Chrome settings or developer tools. |
| BLOB URL | \blob_storage | Temporarily stores Binary Large Object (BLOB) files that will not fit into memory. | Google proprietary | None | None | Not applicable |
| Cookie Store | \Network\Cookies | Stores website cookies containing data like user preferences, login sessions, and tracking information. | Key-value pairs | None | Keyed by origin and path. May have additional flags like HttpOnly or Secure. | Accessible by the origin site. The individual user can examine cookies on their hard drive via Chrome developer tools, directly via writing some code, or installing a browser extension like Cookie Editor. |
| Extension Storage | \Extension (subdirectories vary) | Stores data specific to installed Chrome extensions. | Varies (JSON, blobs, strings) | Potentially (implementation-specific) | Potentially (implementation-specific) | Not directly. Extensions can access their own storage through their APIs. |
| Browsing History | \Default\History (file in the Default directory) | Stores website visit history, including URLs, timestamps, and titles. | SQLite | None | None | Can be accessed locally by a user using Chrome developer tools or a SQLite viewer extension like SQLite Editor and Compiler. |
| Interest Groups | \Default\InterestGroups (file in the Default directory) | Stored interest groups that the current user agent belongs to. | SQLite | None | Keyed by origin (site visited) and the interest group's owner (e.g. a DSP) | Can be viewed locally by a user using Chrome developer tools (when on the origin site) or a SQLite viewer extension like SQLite Editor and Compiler. |
| Private Aggregation Data | \Default\Private Aggregation (file in the Default directory) | Temporarily stores reporting data for the Private Aggregation API | SQLite |  | Keyed by origin | Unclear, but likely can be viewed locally by a user using Chrome developer tools or a SQLite viewer extension like SQLite Editor and Compiler. |
| Shared Storage | \Default\SharedStorage (file in the Default directory) | Allows sites to store and access unpartitioned cross-site data. This data must be read in a secure environment to prevent leakage. | SQLite | Unclear | Keyed by origin | Unclear, but likely can be viewed locally by a user using Chrome developer tools or a SQLite viewer extension like SQLite Editor and Compiler. |