The Evolution of the Internet, Identity, Privacy, and Tracking – How Cookies and Tracking Exploded, and Why We Need New Standards for Consumer Privacy

By Jordan Mitchell

It’s time we speak frankly about a very personal matter: your privacy on the internet.

It seems everyone these days wants to protect it. The European Union enacted a sweeping new set of laws, the General Data Protection Regulation (GDPR), to safeguard it. The California Consumer Privacy Act (CCPA), passed rapidly under duress, goes into effect in January. The U.S. Congress is debating Federal regulations that could pre-empt California and many other states considering their own privacy-protection laws. Facebook, facing widespread criticism for its mishandling of consumer data with the English consulting firm Cambridge Analytica, is implementing new procedures nearly every month to provide users more transparency into and control over their data. Apple is on a path to block virtually all advertiser access to usage data on its devices. Google is publicly mulling changes to its Chrome browser to create a “privacy sandbox” that will enable users to control their data while still enabling ad-supported content.

But here’s the frank talk we need to have: None of it will work, unless we work together. Four or five giant companies, each advancing its own competitive position against the others, won’t protect your privacy. Government regulators, passing conflicting laws with underfunded enforcement at the trans-regional, national, state, and local levels, won’t protect it. Self-regulatory bodies, including our own, which lack the legal authority to enforce compliance with their procedures, won’t protect it.

Only one thing will serve to truly provide consumers (i.e., all of us) with meaningful privacy that aligns harmoniously with other vital social, cultural, legal, and economic norms and expectations: We must work collaboratively across industries, governments, and NGOs to develop new technical standards that support continued innovation on a bedrock of consumer trust, privacy and security. We must set ourselves on an orderly path to rethink the cookie (an early, jury-rigged internet technology that has far outlived its usefulness) and embrace a new paradigm of clear privacy settings and consumer controls tied to a standardized identifier.

At the IAB Technology Laboratory (IAB Tech Lab), the global, nonprofit technology standards-setting body whose work underlies many of the internet’s media and advertising distribution standards, we believe there is a path forward. We propose the establishment of a standardized user token — basically a shared technical mechanism for information, controlled by consumers — to which a consumer’s privacy settings and preferences may be attached and broadcast to all companies. We propose technical mechanisms for building enhanced accountability to consumer privacy and security into the fabric of the web, systematically ensuring ongoing responsible and compliant use of identifiers and data in strict accordance with the consumer preferences attached to a standardized token.

The Birth of the Cookie

As an open, decentralized platform in which any person can share information with anyone else, anywhere, the internet powered a historic global social, cultural, and economic disruption on par with the advent of electricity, the automobile, and broadcasting, much of it premised on its ability to offer personalized services, capabilities, and conveniences at scale. Yet as revolutionary as the internet has been, the core technical mechanisms supporting its ability to customize and personalize information and services have evolved very little.

Consider, for example, the infamous cookie.

The movement towards egalitarian information-sharing and access, using open, ubiquitous technology standards instead of closed, proprietary, client/server systems, began in the mid-1990s. Free “web browsers” communicated with “web servers” using the standard HTTP protocol, which is largely the same standard and architecture used today! The HTTP Request today, just as it was 25 years ago, is generated by the browser and contains information such as browser type, IP address, device type, operating system, web site URL, and other data. The corresponding HTTP Response is generated by the web server, using the Request data to deliver content, whether it’s an image, a video, a paragraph, an ad, a file, etc.
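To make the Request and Response exchange concrete, here is a minimal sketch in Python that composes the text of one request and one response. The host, path, and browser name are invented for illustration. Note that nothing in the request says which visitor sent it.

```python
# Illustrative only: the host, path, and User-Agent value are made up.

def build_request(host: str, path: str, user_agent: str) -> str:
    """Compose the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",       # what the browser is asking for
        f"Host: {host}",              # which web site it is asking
        f"User-Agent: {user_agent}",  # browser type, device, operating system
        "Accept: text/html",
        "",                           # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def build_response(body: str) -> str:
    """Compose the text of a minimal HTTP/1.1 200 response carrying an HTML body."""
    payload = body.encode("utf-8")
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(payload)}",
        "",                           # blank line separates headers from content
        body,
    ]
    return "\r\n".join(lines)


request = build_request("www.example.com", "/news", "ExampleBrowser/1.0")
response = build_response("<p>Hello</p>")
print(request)
print(response)
```

Two different people sending this same request would look identical to the server.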

Very early on, the architects of the Internet realized they had a problem with their design: there was no way to distinguish one request or consumer from another. It was like talking to a room full of people who have no name tags, sound the same, look the same, and are all talking at once about different things. How do you make sense of it all?

So the “HTTP Cookie” was invented and added as part of the core HTTP protocol. It allowed any web server to write an arbitrary value (such as a pseudonymous ID) to a small text file in a user’s browser, which the browser then returns with every subsequent request to that server. This helped distinguish one browser device (e.g., “user ID 123”) from another (“user ID 124”). It also allowed web sites to capture basic analytics; they could now count distinct users, pageviews, popular content, and other metrics.
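The round trip can be sketched with Python’s standard http.cookies module. The cookie name “uid”, the handle_request function, and the counter starting at 123 are assumptions made for illustration, not part of any real site.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# Hypothetical ID generator; starts at 123 to mirror the example in the text.
_next_id = itertools.count(123)


def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (user_id, set_cookie_value).

    set_cookie_value is None when the browser already presented an ID.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)          # parse the incoming Cookie header
    if "uid" in jar:
        return jar["uid"].value, None    # returning browser: recognized
    user_id = str(next(_next_id))        # new browser: mint a pseudonymous ID
    out = SimpleCookie()
    out["uid"] = user_id
    return user_id, out["uid"].OutputString()  # value for a Set-Cookie header


# First visit: no cookie yet, so the server assigns one.
first_id, set_cookie = handle_request(None)
# Later visit from the same browser: it sends the cookie back.
repeat_id, repeat_set = handle_request(set_cookie)
# A different browser arrives with no cookie and gets a different ID.
other_id, _ = handle_request(None)
print(first_id, repeat_id, other_id)
```

The server never learns a name or address here. It only learns that the second request came from the same browser as the first.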

Thanks to the cookie, the late 1990s saw the internet become a personalized media and e-commerce engine. Amazon.com pioneered the customized commercial experience, offering recommendations and other personalized shopping choices based on consumer behavior. Without HTTP cookies, this was not possible then, and it’s still not possible today.

There’s no question that personalization and e-commerce advanced the internet experience in a positive way for consumers. As a result, other websites followed suit in order to compete. Third-party vendors sprang up to offer personalization and e-commerce platforms for web publishers to use — each of which, based on the design of internet protocols, added another cookie to the consumer’s browser.

However, there were unintended consequences. By design, the HTTP cookie may only be returned to (and read by) the server that set it, meaning website1.com cannot read the cookies from website2.com. There was no common cookie functionality which all parties could read. So each and every website, web server, site owner, and company had to create its own proprietary user identifier and store it in a cookie. This has resulted in millions of cookies proliferating around the internet, with each of those companies using a different method to recognize each individual user. Virtually overnight, the internet went from a roomful of people with no name tags, to a roomful of people, each of whom was wearing thousands of name tags, each of which was generated by (and could only be seen by) one specific entity at the other end of the relationship.
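The scoping rule described above can be sketched as a toy browser cookie store. This is a simplification that matches on the exact domain only and ignores the path and subdomain rules real browsers apply. The class name and the ID values are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class BrowserCookieJar:
    """Toy cookie store: each domain sees only the cookies it set itself."""

    def __init__(self) -> None:
        self._by_domain: Dict[str, Dict[str, str]] = defaultdict(dict)

    def store(self, domain: str, name: str, value: str) -> None:
        """Record a cookie set by a response from `domain`."""
        self._by_domain[domain][name] = value

    def cookies_for(self, domain: str) -> Dict[str, str]:
        """Cookies the browser would attach to a request sent to `domain`."""
        return dict(self._by_domain.get(domain, {}))

    def total(self) -> int:
        """How many cookies (name tags) this one browser is carrying."""
        return sum(len(cookies) for cookies in self._by_domain.values())


jar = BrowserCookieJar()
jar.store("website1.com", "uid", "123")
jar.store("website2.com", "uid", "abc-789")  # same person, unrelated ID

print(jar.cookies_for("website1.com"))  # only website1.com's own cookie
print(jar.cookies_for("website3.com"))  # a site that set nothing sees nothing
print(jar.total())
```

Because neither site can read the other’s value, each one has to mint and store its own identifier, and the count in the browser grows with every party involved.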

That internet personal relationship structure continues to this day. Proprietary HTTP cookies were (and remain) the core mechanism for distinguishing one consumer from another, and each cookie may only be read by the party that sets it. There is no standardized, centralized mechanism for consumers to convey their interests or privacy preferences, which can then travel with them and be reliably broadcast to the right parties as consumers surf the web or hop from app to app on their mobile devices.

Make no mistake: the cookie was a boon to the internet. It enabled sites to commercialize their offerings by personalizing ads and content according to user interests, undergirded the entirety of e-commerce, and enabled the analytics that have turned individuals into influencers and even media moguls. But the cookie also broadly fragmented and privatized privacy, while necessitating excessive, redundant HTTP requests from every consumer page view … commencing the data and privacy crises that we see today.

The Bakery Expands

While cookies may have been invented largely to serve commercial purposes, consumers soon took control. With high-speed broadband connections eclipsing dial-up internet access around 2005, user-generated content (UGC) — everything from personal videos and images, to reviews, comments, and blogs — eclipsed business-generated content for the first time. This era marked an unprecedented level of sharing — of personal feelings, opinions, images, videos, ideas… and data — making the internet more of a shared, personal experience for consumers. Without HTTP cookies, this would not have been possible 15 years ago, and it would not be possible today.

Web sites and services that centered around UGC flourished, betting that their freely-provided personalized consumer experience could be funded by personalized advertising. As a result, this era marked the beginning of data collection at scale. This information was known as “behavioral data,” for it identified individuals not by their names or addresses or telephone numbers (as offline direct marketing has for decades), but by a pseudonymous identifier (e.g., “123abc”) tied to their behaviors – the sites consumers visited, the content they preferred, the ads they viewed.

Again, most websites were not able to build for themselves the technologies that were fueling this small business revolution. But that didn’t matter. Because of its open architecture, the internet remained friendly to hundreds of third-party vendors that arose to offer UGC services for publishers and brands to integrate into their sites. And yes, each of those UGC-enabling vendors added yet another cookie to the consumer’s browser.

User-generated content fueled the social web. We connected with each other on sites like MySpace and Facebook, sharing our “likes” (and other personal sentiments), interests, professional opinions, location, etc.

The social web set new consumer expectations for online experiences and sharing of personal data, for different purposes: business networking (LinkedIn), real-time news and opinion (Twitter), updates among friends (Facebook), photographs (Instagram), and more. All these services were free to consumers and funded by personalized advertising, and the more data we shared the more popular we seemingly became — fueled by social rewards and dopamine hits.

With consumer expectations for personalization and apparent willingness to share data at unprecedented levels, a new model for marketing emerged: targeting people instead of pages. Marketers were attracted by the ability to optimize their advertising budgets and limit their reach to specific audiences based on interests, location, opinions, demographics, sentiment, etc.

The dominant consumer-service companies built proprietary systems to connect marketers to their desired audiences via personalized advertising, while most others licensed innovative third-party platforms that allowed them to compete. Privacy concerns were voiced, and our industry’s first standardized privacy programs were created to enable consumers to broadly opt out of online behavioral advertising. These privacy programs also were based on HTTP cookies (again, the only technical mechanism available), even while cookie reliability began to diminish. There was still no standardized mechanism available on which to base these personalized experiences or for the consumer to set privacy preferences, other than proprietary HTTP cookies (where each cookie may only be read by the party that sets it), so, perhaps ironically, even more cookies were set within each consumer’s browser while the industry attempted to provide consumers with centralized privacy preferences.

The Mobile Moment

Through the 2010s, internet-enabled devices became increasingly ubiquitous; in the U.S., the number of smartphones passed the number of television sets around 2016. So now the internet was with us everywhere we went, and we could share personal data wherever we were. And we did, willingly.

Within the apps on smart devices, however, the operating system vendors (starting with Apple) finally recognized the trouble with cookies and provided device-based IDs instead. Unlike cookies, which are highly fragmented, unreliable, proprietary to each company, and cause excessive, redundant HTTP requests that slow consumer experiences, device-based IDs theoretically provided consumers with centralized privacy controls and allowed the consumers’ preferences to be automatically communicated to all downstream parties. Though proprietary and closed (where cookies are standardized and open), device-based IDs marked a dramatic improvement in consumer security and privacy, which we’ve yet to witness within the browser environment.

The mobile-internet revolution was complemented by another giant shift in the way the commercial internet operated: what the digital marketing and media sectors call “programmatic advertising.” This involves machines and algorithms replacing manual processes to connect brands with consumers, making it more efficient for marketing experiences to fund free consumer services.

Just as real-time software systems replaced manual buying and selling of stocks on financial exchanges, and Sabre automated the buying and selling of airline seats, “real-time bidding” systems have automated the buying and selling of digital advertising and content. Marketing automation fundamentally disrupted the digital media landscape, and most ads you see today in apps or on websites result from a real-time auction involving up to thousands of advertisers competing to deliver you their advertisement. In fact, there’s not a single website publisher, mobile app, or advertising brand today that doesn’t participate in real-time systems for buying or delivering personalized ads to consumers.

Marketing automation grew rapidly because it worked. With the enormous scale of consumer engagement in digital media, the availability of “behavior” and “interest” data, and the growing expectations for personalized experiences, only machines and algorithms could handle the volume of media and marketing transactions that were the foundation of the commercial internet. This fueled further innovation within the industry as more small businesses grew, including the entire direct-to-consumer brand revolution sparked by the eyewear company Warby Parker, mattress company Casper, exercise company Peloton, and thousands of others, all taking advantage of marketing automation to personalize their offerings, services, and communications.

By now, as a result of all this evolution of consumer experiences over 20+ years of the web and the robust innovation that enabled it, an elaborate “digital supply chain” has formed. For every ad-supported mobile app or website we appreciate as consumers, there are now hundreds of small, medium, and large businesses that integrate with each other on behalf of those publishers to deliver and enable your free, personalized experience. This is not dissimilar to the supply chain involved in delivering the vehicle you drive, or the food that you eat.

However, this supply chain still relies on the weak foundation of the HTTP cookie. Because there are no standardized mechanisms available on which to base personalized website experiences, each of these companies must set a different proprietary HTTP cookie — which causes even more cookies to be stored within our devices. Furthermore, since a cookie may only be read by the party that set it, each company in the supply chain utilizes processes to synchronize the different cookies used by each of their partners, for each consumer, on each web browser on every connected device. This process is called “pixel syncing” (or “cookie syncing”) and is hugely redundant — it’s like speaking to a roomful of people but having to repeat everything you say to each attendee separately. As such, the process can result in more than 100 third-party requests (aka “trackers”) on any given web page, slowing down the consumer experience and contributing to the anxiety around cookies and tracking.
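A rough sketch of one cookie-sync step, under common assumptions about how such syncs are done: company A sends the browser to a partner’s sync URL with A’s ID in the query string, and the partner stores that ID next to the one in its own cookie. The endpoint, parameter names, and IDs below are all hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_redirect(partner_endpoint: str, own_name: str, own_id: str) -> str:
    """URL that company `own_name` sends the browser to, carrying its own ID."""
    query = urlencode({"partner": own_name, "partner_uid": own_id})
    return f"{partner_endpoint}?{query}"


class PartnerServer:
    """The receiving company: maps another company's ID to its own cookie ID."""

    def __init__(self) -> None:
        self.match_table: Dict[Tuple[str, str], str] = {}

    def handle_sync(self, url: str, own_cookie_id: str) -> None:
        """Process the redirected request; `own_cookie_id` comes from this
        company's own cookie, which the browser attaches automatically."""
        params = parse_qs(urlparse(url).query)
        key = (params["partner"][0], params["partner_uid"][0])
        self.match_table[key] = own_cookie_id


# Company A knows this browser as "123"; company B knows it as "xyz-9".
url = build_sync_redirect("https://sync.partner-b.example/match", "company-a", "123")
company_b = PartnerServer()
company_b.handle_sync(url, own_cookie_id="xyz-9")
print(company_b.match_table)
```

Each such step costs the browser an extra request, and it has to be repeated for every pair of partners that wants to recognize the same browser.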

The Age of Consumer Privacy

The basic design of the HTTP cookie 20+ years ago, which is still the only technical mechanism available within internet protocols for personalizing the consumer experience (including privacy preferences and features), is now a giant square peg being pushed through a tiny round hole. While the internet has evolved to offer consumers many conveniences, the HTTP cookie has not. It is now a source of great inconvenience and worry for everyone!

A “perfect storm” of consumer privacy issues is upon us. We’re seeing a proliferation of personal, connected devices generating a massive amount of personal data, with increasing potential for misuse. The status quo (composed of hundreds of proprietary, fragmented cookies, identifiers and trackers, and no persistent, standardized consumer privacy controls) is untenable. For the next wave of innovation to be fueled responsibly, we must work together across industries to support a global foundation of consumer trust built upon privacy, transparency and control.

We propose standardized privacy settings and consumer controls tied to a neutral, standardized identifier, as an improved mechanism for audience recognition and personalization. We propose that this information travel with the consumer and be broadcast throughout the digital supply chain, so that it may be reliably honored, respected, and propagated.

We propose that, as a condition of accessing the standardized identifier, any company must consistently demonstrate compliance with the attached privacy preferences, directly and necessarily coupling the economic privilege of personalization to the responsibility of maintaining privacy.

To enhance trust in digital media and accountability to consumer privacy, we propose a joint accountability system with compliance mechanisms built into standardized protocols (and therefore software systems). This will systematically ensure ongoing responsible and compliant use of identifiers and data in strict accordance with the consumer preferences attached to the standardized tokens. Those found not in compliance may have their access privileges revoked.
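As a purely illustrative sketch, and not the IAB Tech Lab’s actual design, the coupling of identifier access to compliance could look something like the following. Every name here (UserToken, TokenRegistry, the single allow_personalization flag) is hypothetical, as is the choice to revoke access on the first violation.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class UserToken:
    """Hypothetical standardized token with one consumer preference attached."""
    token_id: str
    allow_personalization: bool


class TokenRegistry:
    """Hypothetical gatekeeper: reading the token depends on past compliance."""

    def __init__(self) -> None:
        self._revoked: Set[str] = set()

    def read_token(self, company: str, token: UserToken) -> Optional[UserToken]:
        """Hand the token, preferences included, to a company in good standing."""
        if company in self._revoked:
            return None
        return token

    def report_use(self, company: str, token: UserToken, personalized: bool) -> bool:
        """Check a reported use against the preference; revoke on a violation.

        Assumption: immediate revocation. A real system might warn or audit first.
        """
        if personalized and not token.allow_personalization:
            self._revoked.add(company)
            return False
        return True


registry = TokenRegistry()
token = UserToken(token_id="123abc", allow_personalization=False)

# A compliant company reads the token and does not personalize.
ok = registry.report_use("good-co.example", token, personalized=False)
# A non-compliant company personalizes anyway and loses access.
bad = registry.report_use("bad-co.example", token, personalized=True)

print(ok, bad)
print(registry.read_token("good-co.example", token))
print(registry.read_token("bad-co.example", token))
```

The point of the sketch is only the coupling: the preference travels with the identifier, and access to the identifier depends on honoring the preference.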

Further, we propose the introduction of a standardized, controlled container for ad delivery to limit the execution of client-side code in order to reduce security, performance, and tracking concerns.

Just as the internet is not owned by any party, we propose that these standards be set up as public utilities, subject to regulations promulgated by government entities, with the digital media and marketing industries jointly governing the standards with the browser providers.

For years now, hardly a month goes by that we don’t hear negative sentiment regarding HTTP cookies, though they remain the only technical mechanism available within standard internet protocols to support the personalized web experience we expect as consumers, including our privacy preferences. However, if we eliminate the cookie without a suitable replacement, we constrain the open innovation, competition, access and choice that are indeed the hallmarks of the internet.

Make no mistake about it: Eliminating cookies today without an adequate, planned transition to a new, publicly-owned mechanism for recording and honoring consumer preferences will disenfranchise millions of independent businesses, entrepreneurs, influencers, and individual communicators, and concentrate control of the internet with four or five giant technology companies.

Just as we have standardized telephone numbers and addresses, to which we as consumers may attach our preferences, we need a neutral, standardized online identifier to which we attach our privacy preferences. In order for any party to access the former, they must consistently demonstrate compliance with the latter. This can replace our technical dependency on HTTP cookies and eliminate hundreds of redundant, proprietary cookies and third-party trackers that have eroded consumer trust and slowed down our web experiences. To build a strong foundation of consumer trust, privacy and security on the open Web for the next 20+ years, let’s work together across industries to establish a new paradigm that places the consumer and our privacy preferences at the center, and allows innovation — including valuable content and services for consumers — to flourish.

Learn more about our proposal to replace the cookie with a standardized identifier to communicate consumer preference at IAB Tech Lab’s Data Responsibility Innovation Day on September 19th in San Francisco.

ABOUT THE AUTHOR