Introduction
So we have finished with storage, per se. But there is one last topic to discuss that is “indirectly” related to storage - Private State Tokens. Private State Tokens are a new mechanism that is part of the Google Privacy Sandbox. They are designed to help prevent fraud and abuse on the web while preserving user privacy.
Private state tokens are a completely invisible, private way to validate that real users are visiting a website. They allow a website or mobile app to validate, in a privacy-compliant way, that a particular user agent (a browser or app instance) represents a real viewer, not a bot or other fraudulent entity. Once validated, the user agent stores the tokens so they can be used by the same or other websites or mobile applications to quickly validate that the end user is real, rather than having to perform a completely new validation. This validation lasts as long as the lifetime of the tokens, which can be set by each website or application developer based on the particular needs of their business.
Private state tokens are intended to supplement, or replace, other validation mechanisms such as a CAPTCHA or a request for PII. They are also designed to convey trust signals while ensuring that user reidentification cannot occur through issuance of the tokens themselves. As such, they are a critical part of the Privacy Sandbox.
The reason private state tokens are related indirectly to storage is that they have their own unique storage area on the user’s hard drive in Chrome. Moreover, they are not physically an integral part of the browser itself - not a browser ‘element’ per se. So I grouped them in the module on storage in the browser elements image. As with CHIPS, however, private state tokens are their own privacy-preserving mechanism and a specific, unique topic that needs to be covered in its own right.
Private state tokens are part of a broader protocol called Privacy Pass. Apple implemented a similar technology in 2022 called Private Access Tokens, also based on Privacy Pass. I hope to discuss the Privacy Pass API, as well as the differences between Apple’s and Google’s implementations of the technology, in a future post. It is a bridge too far today given the length that this post will end up being.
Because the audience for www.theprivacysandbox.com is ad tech professionals, I am going to assume that you generally understand the concept of tokens. We discussed them a bit in the post on cookies. But if you are not familiar with tokens and how they are used in computing, here is a good introduction.
What Are Private State Tokens?
A token is a technical concept in computing that packages some information in a self-contained format readable by other computer programs. A cookie is one example of a token, but tokens can take numerous formats. Private state tokens are designed to enable trust in a user’s authenticity without allowing tracking. Their unique features include:
- They are encrypted. Private state tokens are encrypted in a way that makes them unique and impossible to tie to a specific user or user agent. All anyone can know is whether or not a particular requester is verified as a real person.
- They can be shared between websites without breaching user privacy. Private state tokens were designed to allow one website or app to validate that a user is “real” and place a series of private state tokens confirming that in the user’s browser or app. Later a second website can use that act of validation, contained in those tokens, to verify the user agent represents a real person without having to do their own validations and token issuance procedure.
- They are stored locally in the browser.
- They require one or more trusted issuers. Tokens are issued by trusted third parties that provide the tokens to websites. There can be as many of these as the market has room for. As of this writing there are six: hCaptcha, Polyset, Captchafox, Sec4u(authfy), Amazon, and Clearsale. A trusted issuer is likely to be a PKI certificate authority of some kind, although nothing in the specification requires that.
- They are redeemable. The act of checking that a user has a valid token is called a redemption. A token is sent from the browser to the token issuer who then verifies (redeems) the token and provides a confirmation of identity back to the website. This confirmation is in the form of a redemption record. This process occurs without the issuer being able to know anything about the identity of the user agent.
- Trusted issuers must be verified by the website requesting a redemption. The website that needs to verify the “realness” of a user must already have a relationship with a trusted issuer or must use what is known as a key commitment service to validate the issuer. Otherwise, it has no way to trust the company redeeming the token.
- A bad acting issuer cannot identify a user. One very unique, but hugely important feature of private state tokens is that the issuer is unable to correlate its issuances on one site with redemptions on a different site. As a result, private state tokens are protected from a malicious issuer reidentifying a user and their behavior across websites.
Use Cases for Private State Tokens
With all the changes in browsers and the deprecation of third-party cookies, we are moving into a world where the browser is going to prevent websites and mobile apps from knowing or tracking any individual. An individual site may put its first-party cookies into a user agent or collect a device identifier where the consumer allows it on iOS or Android. But this covers only 20-30% of most website or mobile traffic. Tracking an individual identity across sites, especially where the user chooses to remain anonymous, will be very difficult, although given third-party identifiers like ID5 or UID2.0, not impossible.
That’s good from consumers’ perspective, and as a privacy professional I wholeheartedly agree. But perfect anonymity means fraudulent traffic is undetectable since I cannot distinguish a real person from a bot. So the Google Privacy Sandbox and similar technologies from Mozilla, Safari, Android and iOS, create a problem for identifying and measuring ad fraud. This will only get worse once Chrome deprecates third-party cookies.
Private state tokens solve a number of privacy issues inherent in today’s browser design, but they are especially useful for programmatic advertising. Their design solves many ad fraud challenges without requiring a stable, global, per-user identifier which would violate the cross-site tracking preventions inherent in the Sandbox. Some of the ad fraud use cases they can apply to are shown in Figure 1:
Figure 1 - Use Cases for Private State Tokens
What Data Does a Private State Token Contain?
What data do these tokens contain, and how does that data deliver the information needed to process private state tokens? Private state tokens contain mandatory fields and can also contain optional information (Figure 2).
Figure 2 - Types of Data Carried By Private State Tokens
This data is used in two core mechanics - token issuance and token redemption. There are other mechanics like versioning tokens to the latest standard and reissuance, among others, but we won’t delve into them in this post.
Issuing Private State Tokens
Figure 3 shows the mechanics of private state token issuance. You should use it and Figure 4 as references to follow the step-by-step text below, which admittedly can be a bit dense and may require you to slow down to take it in.
Figure 3 - The Private State Token Issuance Process
- Step 1: The browser requests a document from a website.
- Step 2: The website responds by delivering the document. Along with the document, the website returns a token challenge in its response header:
WWW-Authenticate: PrivateAccessToken challenge=abc..., token-key=123..., issuer-key=456...
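As an illustration, the challenge header above could be split into its components with a sketch like the following. This is deliberately naive parsing for clarity; real challenge values are base64-encoded and can contain '=' or ',' characters that would need proper handling:

```javascript
// Sketch: split a token challenge header into its scheme and parameters.
// Naive parsing for illustration only; production code would need to
// handle base64 padding ('=') and commas inside values.
function parseTokenChallenge(headerValue) {
  const firstSpace = headerValue.indexOf(' ');
  const scheme = headerValue.slice(0, firstSpace);
  const params = {};
  for (const part of headerValue.slice(firstSpace + 1).split(',')) {
    const [key, value] = part.trim().split('=');
    params[key] = value;
  }
  return { scheme, params };
}
```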
- Step 3: The browser checks for available tokens. If there are no tokens from any provider, the browser requests user attestation from the website. Attestation is a process which determines if the user is real. Attestation could involve using a CAPTCHA, for example.
- Step 4: The website performs attestation and, if attestation is positive, sends that notification to the browser. Otherwise the user agent is considered invalid/fraudulent and the whole process stops right there. Kinda obvious.
- Step 5: If the website can attest to the reality of the user, it then needs to send a request for token issuance to an issuer. But to do that, it needs to trust that it is making the request to the issuer it expects and that the issuer is a valid/attested issuer of private state tokens.
The Private State Token API uses a mechanic to establish trust with the unknown issuer called key commitments. A key commitment is a cryptographic assurance provided by the issuer that includes the public keys and associated metadata used for token issuance and redemption. This ensures that all clients interacting with the issuer can verify the authenticity and integrity of the tokens.
Key commitments serve several purposes:
- Transparency: Key commitments provide a mechanism for clients to fetch and verify the issuer's keys before engaging in token transactions.
- Consistency: Key commitments ensure that all clients receive the same set of keys, preventing malicious issuers from presenting different keys to different users.
- Trust: Key commitments allow clients to verify that the keys used by the issuer are legitimate and have not been tampered with.
Key commitments depend on a key commitment service (KCS) to act as a trusted intermediary. Key commitment services verify that the key commitments clients see are identical. This ensures that the keys used by issuers are consistent and trustworthy.
Key commitments via a KCS work as follows:
- Fetching Key Commitments: The client makes an anonymous GET request to the KCS endpoint, which has the form <KCS_endpoint_name>/.well-known/privacy-pass, with a message of type fetch-commitment.
struct {
    opaque server_id<1..2^16-1> = server_id;
    opaque commitment_id<1..2^8-1> = commitment_id;
}
- KCS Responds with Commitment List. The KCS responds with a list of key commitments, including the public key, expiry, supported methods (issuance, redemption, or both), and a signature.
struct {
    opaque public_key<1..2^16-1>;
    uint64 expiry;
    uint8 supported_methods;  # 3: Issue/Redeem, 2: Redeem, 1: Issue
    opaque signature<1..2^16-1>;
} KeyCommitment;
- User Agent Verifies Key Commitments: The user agent verifies the signature of each key commitment to ensure its authenticity. It then stores the list of commitments for use in token issuance and redemption.
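Taken together, the fetch-and-filter portion of this exchange might look like the sketch below. The endpoint path comes from the text above, but the JSON response shape is an assumption based on the KeyCommitment struct, and fetchFn is injected so the logic can be exercised outside a browser:

```javascript
// Hypothetical sketch of fetching key commitments from a KCS and keeping
// only those that have not expired. Field names (public_key, expiry,
// supported_methods) are assumed from the KeyCommitment struct above.
async function fetchKeyCommitments(fetchFn, kcsEndpoint) {
  const res = await fetchFn(`${kcsEndpoint}/.well-known/privacy-pass`);
  const commitments = await res.json();
  const nowSeconds = Math.floor(Date.now() / 1000);
  // A real client would also verify each commitment's signature here.
  return commitments.filter(c => c.expiry > nowSeconds);
}
```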
At this point, the user agent makes the following call to the issuer:
fetch('<issuer>/<issuance path>', {
privateToken: {
version: 1,
operation: 'token-request'
}
}).then(...);
This call kicks off the issuance request. Two key preparation steps involving nonces occur before the request is forwarded to the issuer.
- Step 6: User Agent Generates Nonces. A nonce is a unique random numeric value that is often used in cryptographic applications. Nonces are used with private state tokens to ensure that each token is unique and immune to certain types of hacks, like replay attacks. Once the issuer has been validated and key commitments stored in the browser, the user agent generates a set of random nonces that are unique to each token request.
- Step 7: User Agent Blinds Nonces. The client blinds the nonces. Blinding is a cryptographic process that hides the original nonces while still allowing the server to sign them. These blinded nonces will be sent to the issuer as validation elements in the request. If the issuer sends back the same blinded nonces in its response, then the user agent knows that whatever message it receives is from the issuer to which it sent the original message. You can think of these nonces as one-time codes between two people transmitting messages that prevent a third party from pretending to be either the sender or the receiver of the messages.
- Step 8: User Agent Makes Token Issuance Request. Once the nonces are blinded, the browser forwards the token issuance request with the blind nonces included directly to the token issuer.
- Step 9: The issuer processes the token request, generates a token response, signs it with its private key, and sends it back to the browser. The response includes the previously blinded nonces.
- Step 10: User Agent Unblinds the Signatures. The user agent then unwraps the issuer’s response using the appropriate public key and checks the blinded nonces. If they match what the user agent sent, then the response is valid and the user agent stores some number (n) of private state tokens in the browser private state token storage subdirectory.
Each user agent can store up to 500 tokens per top-level website and issuer combination. Also, each token carries metadata indicating which key the issuer used to issue it. That information can be used to decide whether or not to redeem a token during the redemption process.
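The blinding in Step 7 and the unblinding in Step 10 can be illustrated with a toy RSA blind signature. This is only an analogy using textbook RSA with tiny numbers; the actual Private State Token API uses a VOPRF, not RSA, so treat this as a sketch of the blinding idea rather than the real protocol:

```javascript
// Toy RSA blind-signature round trip: the client blinds a nonce, the
// "issuer" signs it without seeing it, and the client unblinds the result.
// Tiny textbook RSA parameters only - NOT the real PST cryptography.
const p = 61n, q = 53n;
const n = p * q;   // modulus, 3233n
const e = 17n;     // public exponent
const d = 2753n;   // private exponent (e * d = 1 mod 3120)

// Modular exponentiation: base^exp mod m
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via the extended Euclidean algorithm
function modInv(a, m) {
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const quot = oldR / r;
    [oldR, r] = [r, oldR - quot * r];
    [oldS, s] = [s, oldS - quot * s];
  }
  return ((oldS % m) + m) % m;
}

const nonce = 42n;   // Step 6: the client's nonce
const rBlind = 7n;   // blinding factor, coprime to n
const blinded = (nonce * modPow(rBlind, e, n)) % n;  // Step 7: blind it
const blindSig = modPow(blinded, d, n);              // issuer signs blind
const sig = (blindSig * modInv(rBlind, n)) % n;      // Step 10: unblind
// The unblinded signature verifies against the original nonce:
console.log(modPow(sig, e, n) === nonce);            // prints true
```

The issuer only ever sees the blinded value, yet the unblinded signature still verifies - which is exactly the property that lets the issuer attest to a token without being able to recognize it later.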
Redeeming Private State Tokens
Figure 4 shows the mechanics of private state token redemption (or failure to do so).
Figure 4 - The Private State Token Redemption Process
- Step 1: The browser requests a document from a website B, which is a different website from the one which initially generated the token issuance and storage in the browser.
- Step 2: The website responds by delivering the document. Along with the document, the website returns the same token challenge in its response header:
WWW-Authenticate: PrivateAccessToken challenge=abc..., token-key=123..., issuer-key=456...
- Step 3: The challenge triggers a document.hasPrivateToken(<issuer>) call that resolves to true when it finds a token from an issuer. It does not have to be an issuer that website B has a relationship with.
- Step 4: There is a token from that issuer. Is there a redemption record for that token from that issuer on the device? If so, the browser validates the user agent as “real” to the website, which moves forward with its ad request (Step 10).
- Step 5: Without a redemption record, the browser determines whether or not it has a direct relationship with the token issuer.
- Step 6: If the browser does not have a direct relationship with the issuer, it requests validation of the issuer through the Key Commitment Service using the same mechanic as during token issuance. Once validated, the key commitment service sends confirmation to the browser.
- Step 7: Given a valid token without a redemption record and a validated issuer, Website B sends a direct redemption request using the fetch endpoint.
fetch('<issuer>/<redemption path>', {
privateToken: {
version: 1,
operation: 'token-redemption',
refreshPolicy: 'refresh' // either 'refresh' or 'none', default is 'none'
}
}).then(...)
- Step 8: If the issuer can validate the token, it sends a redemption record back to the browser. If not, it rejects the request. The website then has a choice of options based on its owner's preferences: it can go through its own attestation and validation process (the most likely scenario), it can simply treat the user agent as fraudulent, or it can take the risk of moving forward through its regular process without validation.
- Step 9: The browser confirms to Website B that the browser is a “real” viewer.
- Step 10: The website requests an ad from its programmatic partners.
- Step 11: A programmatic ad is delivered to Website B
- Step 12: Website B delivers the ad to the browser.
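The decision logic in Steps 3 through 8 can be condensed into the sketch below. This is hypothetical: doc stands in for document and fetchFn for fetch so the logic can run outside a browser, and the '/redeem' path and string return values are invented for illustration:

```javascript
// Hedged sketch of Website B's redemption decision (Steps 3-8).
// `doc` and `fetchFn` are injected so this can be exercised with mocks;
// '/redeem' is a made-up path, not part of the API.
async function validateVisitor(doc, fetchFn, issuer) {
  // Step 3: is there a token from this issuer in the browser?
  const hasToken = await doc.hasPrivateToken(issuer);
  if (!hasToken) return 'attestation-needed';
  // Step 7: ask the issuer to redeem the token.
  const res = await fetchFn(`${issuer}/redeem`, {
    privateToken: { version: 1, operation: 'token-redemption', refreshPolicy: 'none' },
  });
  // Step 8: a redemption record comes back only if the token was valid.
  return res.ok ? 'validated' : 'rejected';
}
```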
When a token is redeemed, the Redemption Record (RR) is stored on the device. This storage acts as a cache for future redemptions. There is a limit of two token redemptions every 48 hours, per device, page and issuer. New redemption calls will use cached RRs where possible, rather than causing a request to the issuer.
When I read of the restrictions mentioned in the last paragraph, I wondered how any site could actually depend on validation. Imagine a news site; I could come back to it multiple times a day. My browser could easily use up my redemption requests and therefore not be able to validate itself. The answer lies in the caching of redemption records. Not only can user agents cache redemption records, they can also refresh them when necessary. This means that even if a user visits a site multiple times, the site can rely on cached redemption records without needing to redeem new tokens each time, lowering the number of redemption requests the site needs for validation via private state tokens to remain effective. Should a particular user agent somehow hit its validation limit, the website can fall back to other trust signals and mechanisms to complement token-based validation.
Another question you may ask is why is there a limit at all? While there are several privacy concerns that limiting the number of redemption requests in a time period helps ameliorate, a major one is preventing what is known as a token exhaustion attack. Token exhaustion attacks are a type of abuse where a malicious actor attempts to deplete the available tokens of a user agent or system. This can be done by repeatedly requesting tokens or by using tokens in a way that exhausts the supply, making them unavailable for legitimate use. One reason why an attacker might want to undertake a token exhaustion attack is for monetary gain. In some cases, tokens might have monetary value or be used in systems where they can be exchanged for goods or services, such as an ecommerce site. Exhausting tokens can disrupt these systems and potentially allow attackers to profit. Limiting the number of validation attempts helps reduce the likelihood of such attacks.
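The two-redemptions-per-48-hours limit described above amounts to simple per-device/page/issuer bookkeeping, which can be sketched as follows. This is an illustration of the rule, not Chrome's implementation; the key shape and function names are invented:

```javascript
// Illustrative sketch of the redemption rate limit: at most LIMIT
// redemptions per 48-hour window, per device, page, and issuer.
// The clock is injectable so the window logic can be tested.
const WINDOW_MS = 48 * 60 * 60 * 1000;
const LIMIT = 2;

function makeRedemptionLimiter(now = Date.now) {
  const log = new Map(); // "device|page|issuer" -> redemption timestamps
  return function canRedeem(device, page, issuer) {
    const key = `${device}|${page}|${issuer}`;
    const cutoff = now() - WINDOW_MS;
    const recent = (log.get(key) || []).filter(t => t > cutoff);
    if (recent.length >= LIMIT) return false; // fall back to a cached RR
    recent.push(now());
    log.set(key, recent);
    return true;
  };
}
```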
How Do Private State Tokens Differ from Third-Party Cookies?
While it may seem obvious, given that both third-party cookies and private state tokens are used to detect ad fraud, it is worth calling out how private state tokens differ from cookies and why they are better for fraud detection in a world where consumer privacy is key. The table in Figure 5 provides a summary of those differences.
Figure 5 - Differences Between Third-Party Cookies and Private State Tokens
Where Are Private State Tokens Stored?
Private state tokens are stored in the C:\Users\<username>\AppData\Local\Google\Chrome\User Data\TrustTokenKeyCommitments directory in Windows. The reason for this directory name is that private state tokens used to be called trust tokens. Within that directory there is at least one subdirectory (there may be more, but I haven’t had enough usage yet to have more than one). Mine is named for a date: 2024.6.20.1. I thought this might be a temporary subdirectory that held data for only one session, but looking back over many days, the folder is still there, so this is a more permanent directory. The directory name seems to relate to the manifest version, which is 2024.6.20.1, as shown in the manifest.json file (Figure 6). But how they are related is unclear.
Figure 6 - Contents of manifest.json (with the version date highlighted)
Within this subdirectory are four files and a subdirectory:
- keys.json
- manifest.fingerprint
- manifest.json
- a license file
- a \_metadata subdirectory
- verified_contents.json, within the \_metadata subdirectory
One of the first things to notice is that three of the four files in the directory have dates of “12/31/1979”. That can’t be a real date. After all, the Internet did not even exist until 1990, when Tim Berners-Lee set up his server at CERN (which, BTW, I got to see first-hand on a trip to CERN to visit my son in 2015. Almost felt like I should genuflect to the thing.). Chrome 1.0 was not released until 2008. It turns out this is a known bug with certain files in Chrome that has not been fixed due to it being a low priority.
The manifest.json file is obviously a “meta” file containing the manifest’s name and version. This file, I am almost certain, is used by the browser to determine which version of the Private State Token code is being used and whether it needs to be updated. Manifest files are usually used to indicate the version of a web application or PWA (Progressive Web App) and whether there are updates to the PWA that need to be fetched and applied. This use is defined in the Web Application Manifest specification, which, frankly, I was completely unaware of until I wrote this post. I believe that is what is happening in this case.
The license file appears to be the user license for private state token usage.
keys.json contains the references to both the issuers of tokens and the public encryption keys of the private-public key pairs those issuers use to encrypt tokens (Figure 7). As shown in the image, issuers may advertise multiple token-keys for the same token-type to support key rotation. Issuers indicate a preference for which token key to use based on the order of keys in the list, with preference given to keys earlier in the list. Recall from above that each token carries metadata indicating which key the issuer used to issue it. So at the time the token is called for a redemption request, the token will identify which of these keys was used and send that with the redemption request so the issuer can find the appropriate private key to use for decryption.
Note the "PrivateStateTokenV1VOPRF" element directly under the issuer name. This tells the browser which version of the API to use to process the token.
Figure 7 - Contents of keys.json file
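The earlier-in-the-list preference just described could be sketched like this, assuming each entry in keys.json carries an expiry in epoch seconds (the field name is an assumption based on Figure 7, not a confirmed schema):

```javascript
// Hedged sketch: choose the first unexpired key in the issuer's list,
// mirroring the preference-by-order rule. Field name `expiry` is assumed.
function pickIssuanceKey(tokenKeys, nowSeconds) {
  return tokenKeys.find(key => key.expiry > nowSeconds) || null;
}
```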
The manifest.fingerprint file is not explicitly defined in the PST API specification, but it is commonly used in web applications to ensure the integrity and authenticity of the manifest file. This file typically contains a cryptographic hash of the manifest file, which can be used to verify that the manifest has not been tampered with. This is discussed extensively in the manifest specification I mentioned above. You can see an example of the code used to do the verification in the specification here.
Within the \_metadata subdirectory there is a file called verified_contents.json. This file contains metadata used by the PST application. My guess, given the contents, is that this file contains information needed by the PST API to determine which token to use for the API calls.
Conclusion
This has been a really long post and perhaps "too detailed" for my target readers. It was difficult to write for a number of reasons, and I imagine it required a bit of persistence by the reader to work through it all. Frankly, I’m not particularly happy with it, but it is the best I can do for now, so I am going to stop here. At least now you understand what a Private State Token is and how its data is both stored and used in the browser, which was the original goal of this particular post. This really is the last element of browser-side storage we need to cover. It’s on to headers and permissions, and then we can start on the Protected Audiences API (finally!).