Introduction
The amount of data we generate as we browse the Internet is so vast that it’s hard to wrap our heads around it. From the billions of Google searches per day to the never-ending torrent of emails, Instagram posts, Facebook updates, tweets, YouTube videos, Skype calls, WhatsApp messages – the list goes on and on. Our online activity collectively fuels the behavioral advertising industry, which runs on big data: a resource said to have surpassed oil in value, in a market predicted to grow to $274.3 billion by 2022. The raw material feeding this incredibly lucrative industry is information about our personal lives. Some of it we willingly share with the owners of the digital products and services that have become integral to our lives, but much of it is also siphoned up without our knowledge. This guide outlines how computing power and commercial incentives have converged to establish behavioral advertising as the business model of the Internet, before diving into the many ways in which personal data is tracked, collected and commercialized. We will then consider the implications of the data-driven Internet beyond advertising, exploring its impact on politics and democracy, and offering practical tips for regaining a degree of control over your personal information.
How Advertising Took Over the Internet
The early Web, conceived in the late 1980s, was intended to be a utopian space for connection, exchange, and freedom from corporate control. Three decades on, the Internet has become a highly commercial enterprise – one giant advertising space.
How did this happen? The Privacy Issue enlists the expertise of Varoon Bashyakarla, data scientist and researcher at Tactical Tech, to break down the answer. As Bashyakarla explains, the advertising-based business model of the Internet flourished due to the platform’s suitability as a marketplace: “The moment you search for something, you're expressing a need for that thing in some capacity. That's the exact moment in advertising that advertisers can capitalize on.” The immediacy and targeted relevance of online advertising quickly made it more potent than traditional advertising channels, like television, newspapers, or billboards: “It’s inexpensive, measurable, customizable, interactive, and very far-reaching in scope,” Bashyakarla explains.
Many metrics allow marketers to measure the success of their ads – such as cost-per-click, engagement, impressions, and acquisitions – fueling the growth of online advertising over the past two decades. The more information advertisers have about people, the more effective their advertising can be. Thus, the industry is powered by data. This data comes from information that Internet users produce (e.g. posts, photos, and tweets) and also from what they consume (e.g. the ads they click on, the videos they watch).
“If you know age, income, gender, sexuality, location of a person, you can advertise to them more effectively, and ads can be more relevant,” says Bashyakarla. Accordingly, “anything you do online that is deemed useful – not just now but also potentially in the future – is collected. Who people are, what they consume, what they might want, who they know... even measurable actions in the offline world go towards building profiles of people.” The huge amount of data Internet users collectively generate is what Renée DiResta, a 2019 Mozilla Fellow in Media, Misinformation, and Trust, has termed an "information glut".
Computing Meets Commerce
The technological infrastructure that has allowed this "information glut" to turn online advertising into the dominant business model of the Internet is rooted in Moore’s Law, says Bashyakarla. Moore’s Law holds that the number of transistors on a microchip – and therefore computer processing power – doubles roughly every two years, a prediction that held more or less true for about 50 years. As a result, computers became exponentially smaller and cheaper, advancements that companies have leveraged to process the huge troves of data they collect in increasingly precise ways. Algorithms can continually update, combine, and process this data over time, not only to understand but also to predict human behavior. As Bashyakarla explains, “Is this person going to the gym regularly? Maybe that says something about their preference for fitness products. Do they go to an organic grocery store? That could speak to their environmental beliefs.” Being able to advertise directly to the particular categories of people they want to reach – fitness buffs, environmentalists, and so on – increases the chance that people will click on the ads, which in turn helps advertisers sell more products and may ultimately increase their profits. Accordingly, says Bashyakarla, “the scope of data collection is boundless.”
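To get a feel for what "doubling every two years" compounds to, here is a back-of-the-envelope calculation. It assumes an idealized, perfectly regular two-year doubling period over exactly 50 years, with $N_0$ the starting transistor count; real-world progress was less regular than this.

```latex
N(t) = N_0 \cdot 2^{t/2}, \qquad
\frac{N(50)}{N_0} = 2^{50/2} = 2^{25} = 33{,}554{,}432 \approx 3.4 \times 10^{7}
```

Under that idealization, half a century of doubling amounts to roughly a 34-million-fold increase, which helps explain why processing vast troves of personal data went from impractical to routine.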
Types of Tracking and How to Block Them
How exactly does data collection work in practice? There are many different methods for online tracking. Here we take a closer look at six major forms – tracking cookies, browser fingerprinting, location spying, mobile app trackers, shady browser extensions, and dodgy VPNs. We explain how they work while sharing tips for how to combat them.
Tracking Cookies
Cookies are messages that are usually stored as small text files and contain a unique identifier. Web servers send these cookies to your browser as you visit websites. First-party cookies, including short-lived session cookies, are used to remember aspects of Internet users’ browsing preferences – like your preferred language or the items you placed and left in your shopping cart. While these kinds of cookies can be helpful, other types follow you around the Web, taking information from one website and serving you ads based on it on other, unrelated websites.
These are called "third-party persistent cookies" or, more simply, tracking cookies. Tracking cookies are the favored data collection tools of Google and Facebook (read more on Facebook tracking in our explainer on the topic). Tracking cookies only require you to load a page – no clicking on ads required – for your data to be transmitted back to the server belonging to the company behind them. That’s how ads for items you may have left in a shopping cart but never purchased can appear in an ad on your Instagram feed the following week.
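The difference between a helpful preference cookie and a tracking cookie is visible in the `Set-Cookie` header a server sends. The sketch below uses Python's standard `http.cookies` module to build both kinds. The cookie names (`lang`, `uid`), the identifier value, and the domain `tracker.example` are hypothetical, and real trackers differ in detail; the point is only that a tracking cookie carries a long lifetime and is scoped to a third-party domain that any page embedding that tracker's content will contact.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 60 * 60 * 24 * 365  # seconds


def session_cookie(lang: str) -> str:
    """First-party preference cookie with no expiry: dropped when the session ends."""
    cookie = SimpleCookie()
    cookie["lang"] = lang
    cookie["lang"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def tracking_cookie(visitor_id: str) -> str:
    """Persistent cookie scoped to a hypothetical third-party ad domain."""
    cookie = SimpleCookie()
    cookie["uid"] = visitor_id                   # unique identifier for this browser
    cookie["uid"]["domain"] = "tracker.example"  # hypothetical tracker domain
    cookie["uid"]["path"] = "/"
    cookie["uid"]["max-age"] = ONE_YEAR          # survives browser restarts
    cookie["uid"]["secure"] = True
    cookie["uid"]["samesite"] = "None"           # allows sending on cross-site requests
    return cookie.output(header="Set-Cookie:")


if __name__ == "__main__":
    print(session_cookie("en"))
    print(tracking_cookie("7f3a9c01"))
```

Every time any website loads an image or script from `tracker.example`, the browser sends the `uid` value back, which is how a single identifier can link your visits across unrelated sites without you clicking anything.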
Cookie Self-Defense
- Force cookies to self-destruct after you close each tab with the Cookie-AutoDelete extension.
- Install the EFF's Privacy Badger, a browser extension that blocks a variety of cookie tracking methods.
- Regularly delete cookies and clear your browser cache.
Browser Fingerprinting
As cookies become easier to see and block, the intrusive tracking technique of browser fingerprinting has grown in popularity. Browser fingerprinting involves collecting the properties and settings of a user’s Web browser, gleaned through access logs, JavaScript, and plugins such as Adobe Flash. These include: IP address, language, time zone, screen resolution, plugins and extensions, installed fonts and font sizes, cookies, and browser settings such as "Do Not Track". Taken together, these properties can uniquely identify an individual and track them persistently across the Web. Because browser fingerprinting is covert and leaves nothing on your device to delete, this kind of surveillance is difficult to prevent. Ironically, changing your browser settings away from the defaults (for example, enabling "Do Not Track") is likely to make your browser fingerprint more unique, not less.
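To illustrate how individually harmless attributes become an identifier, here is a minimal sketch of the combining step. Real fingerprinting scripts run as JavaScript inside the page and gather many more signals; this Python version only shows how collected attributes might be hashed into one stable ID. The attribute names and values are hypothetical.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Combine browser attributes into one short, stable identifier."""
    # Canonical JSON (sorted keys, fixed separators) so the same attributes
    # always hash to the same value, regardless of dict ordering.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a script might read from a visitor's browser.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "language": "en-US",
    "timezone": "Europe/Berlin",
    "screen": "1920x1080",
    "fonts": ["DejaVu Sans", "Liberation Serif", "Noto Color Emoji"],
    "do_not_track": "1",
}

if __name__ == "__main__":
    first_visit = fingerprint(visitor)
    # A later visit with cookies cleared still yields the same identifier,
    # because nothing was stored on the device in the first place.
    second_visit = fingerprint(dict(visitor))
    print(first_visit == second_visit)  # True
    # Changing a single setting produces a different identifier.
    print(fingerprint({**visitor, "do_not_track": "0"}) == first_visit)  # False
```

Because the ID is recomputed from the browser itself on every visit, clearing cookies does not reset it. And because every non-default setting adds another distinguishing attribute, tweaking settings can make a browser stand out more.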
Preventing Browser Fingerprinting
Although browser fingerprinting is difficult to discover and prevent, the following actions will help minimize your exposure:
- Disable JavaScript with an extension like ScriptSafe or NoScript.
- Try to change your User Agent String to something generic with a User Agent Switcher.
- Check your browser fingerprint with the EFF's Panopticlick.
- Browse the Web with Tor Browser.
Location Spying
Location data is one of the most lucrative – and invasive – types of data, because it can be used to track our "real world" movements with precision, making us irrefutably identifiable in real time. Enabling location services on our phones offers many advantages: helping us to navigate our way around cities, alerting us about the local weather, or recommending a good restaurant nearby. The trade-off for these conveniences, however, is that information on our whereabouts can be tracked and sold on the location-based advertising market. This location spying is now a fact of everyday life for anyone carrying a mobile device.
A 2018 investigation by the New York Times found that at least 75 companies receive precise location data from apps whose users enable location-sharing permissions. A sample of the data revealed that people’s movements throughout the day were mapped with extremely high accuracy and updated up to 14,000 times within 24 hours.
Businesses in the industry, and their investors (including Peter Thiel, IBM, and Goldman Sachs), have responded to criticism about privacy intrusions with two main arguments. First, they say that users give permission when they turn on location services for apps. However, notice of the collection and sale of location data is often obscured by complex language and buried deep in privacy policies that are too long for users to read or understand.
Second, companies in the business of location-based marketing argue that the data is semi-anonymized – tied to a unique identifier that is not connected to a name or phone number. However, as investigations continue to prove, raw location data becomes more unique as it is aggregated. These datasets can often be analyzed or reverse-engineered to identify individuals and track them without their knowledge or consent.
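A minimal sketch of why pseudonymous location traces become identifying once aggregated: a commonly described heuristic treats a device's most frequent night-time location as "home" and its most frequent working-hours location as "work". Few people share both. This is an illustration of that general heuristic, not the method used in any particular investigation; the device IDs, cell names, and hour windows are all hypothetical.

```python
from collections import Counter, defaultdict


def home_work_pairs(pings):
    """Infer a (home, work) location pair for each pseudonymous device.

    `pings` is an iterable of (device_id, hour_of_day, cell) tuples.
    Heuristic: the most frequent night-time cell is "home" and the most
    frequent working-hours cell is "work".
    """
    night = defaultdict(Counter)
    day = defaultdict(Counter)
    for device, hour, cell in pings:
        if hour >= 22 or hour < 6:
            night[device][cell] += 1
        elif 9 <= hour < 17:
            day[device][cell] += 1
    # Only devices seen both at night and during working hours get a pair.
    return {
        device: (night[device].most_common(1)[0][0], day[device].most_common(1)[0][0])
        for device in night.keys() & day.keys()
    }


def reidentifiable(pairs):
    """Return devices whose (home, work) pair is shared with no other device."""
    counts = Counter(pairs.values())
    return sorted(device for device, pair in pairs.items() if counts[pair] == 1)


# Hypothetical pings: no names, only advertising IDs, hours, and cell areas.
pings = [
    ("ad-id-1", 2, "cell-A"), ("ad-id-1", 23, "cell-A"), ("ad-id-1", 10, "cell-X"),
    ("ad-id-2", 3, "cell-A"), ("ad-id-2", 11, "cell-Y"),
    ("ad-id-3", 1, "cell-B"), ("ad-id-3", 14, "cell-X"),
    ("ad-id-4", 0, "cell-B"), ("ad-id-4", 15, "cell-X"),
]

if __name__ == "__main__":
    print(reidentifiable(home_work_pairs(pings)))  # ['ad-id-1', 'ad-id-2']
```

Devices with a unique home/work pair are effectively singled out. Linking such a pair to a name can then be a matter of cross-referencing other sources, such as the public records that data brokers already compile.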
Avoiding Location Spying
- Turn off location tracking in your phone's settings.
- When you install new apps, do not give permission for location tracking.
- Only allow location tracking for an app if you understand its policies and trust the app creators.
- Occasionally review all the apps you've given location permissions to in the past and revoke access for apps you don't trust.
- Be wary of Bluetooth and turn it off when not in use. Bluetooth Low Energy (BLE) beacons like Apple's iBeacon are an increasingly common method of location spying.
Mobile App Trackers
Our phones have “secret lives” while we sleep, according to a 2019 investigation by The Washington Post, which uncovered the extent to which our phones leak personal and sensitive information via mobile app trackers. In the investigation, up to 5,400 hidden trackers passed data from an Apple iPhone to other parties in just one week, via apps including Yelp, Microsoft OneDrive, Nike, Spotify, The Weather Channel (owned by IBM), and the crime alert service Citizen. All this happens on an Apple device, from the company that was bold enough to splash the message “What happens on your iPhone stays on your iPhone” on a huge billboard in Las Vegas before the 2019 CES convention.
Mobile app trackers are often hidden as libraries or Software Development Kits (SDKs) inside mobile apps and are never disclosed to users when the apps are installed and permissions are granted. Mobile app trackers slow down load times, drain battery life, and boomerang back to us in the form of targeted ads. Not only is it hard to find out about each app’s data-sharing policies, but even discovering which trackers an app contains may take a research project. Even if you know about the mobile app trackers on your phone (and who owns them), there is no easy way to block them: unlike the ad blockers you can install in a Web browser, our mobile phones and app stores offer no simple way to stop these trackers.
Apps on Google's Android operating system have also been found to contain a myriad of trackers. A 2017 investigation by Yale Privacy Lab and the non-profit Exodus Privacy found 44 trackers in over 300 Android apps, including AccuWeather, Lyft, Tinder, OkCupid, Uber, and Skype. Such trackers allowed, for example, the shaving company Gillette to use Tinder to determine whether college-aged male users with neatly groomed facial hair get more right swipes than those with unkempt facial hair. Since the initial investigation, over 260 trackers have been identified in tens of thousands of apps. Although these trackers were identified in Android apps, almost all of them are also present on iOS, as the companies behind the trackers themselves advertise.
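Tracker-detection tools such as Exodus Privacy broadly work by matching an app's code against known tracker signatures, for example patterns in the class names that a tracking SDK adds to an app. The sketch below shows that matching step only. It assumes the app's class names have already been extracted from its package, a step that is omitted here. The signature table is a hypothetical, abbreviated example and not Exodus Privacy's actual data.

```python
import re

# Hypothetical, abbreviated signature table: tracker name -> class-name pattern.
# Real tools maintain much larger, curated lists.
TRACKER_SIGNATURES = {
    "Google Firebase Analytics": re.compile(r"^com\.google\.firebase\.analytics\."),
    "Facebook Analytics": re.compile(r"^com\.facebook\.appevents\."),
    "Hypothetical Ads SDK": re.compile(r"^com\.example\.adsdk\."),
}


def detect_trackers(class_names):
    """Return the names of trackers whose signature matches any class in the app."""
    return sorted(
        tracker
        for tracker, pattern in TRACKER_SIGNATURES.items()
        if any(pattern.search(name) for name in class_names)
    )


# Hypothetical class names extracted from a weather app.
app_classes = [
    "com.example.weather.MainActivity",
    "com.google.firebase.analytics.FirebaseAnalytics",
    "com.facebook.appevents.AppEventsLogger",
]

if __name__ == "__main__":
    print(detect_trackers(app_classes))  # ['Facebook Analytics', 'Google Firebase Analytics']
```

Static matching like this reveals which SDKs are bundled inside an app, but not what data they actually send. That is why detecting trackers is far easier than blocking them on a phone.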
Eluding Mobile App Trackers
- Download the Exodus Privacy app (Android) to see which trackers are embedded in the apps on your phone and which companies are behind them, then uninstall or replace the worst offenders.
- Turn off Background App Refresh on your iOS device.
- On Android, use the free F-Droid app store. F-Droid audits all apps for privacy concerns and, before publishing, scans them using the tracker signatures provided by Exodus Privacy.
- Install the Exodify plugin in your Web browser and check apps in Google Play before installing them on your phone.
Shady Browser Extensions
Browser extensions can customize and enhance your browsing experience in a wide range of ways, making your experience on the Web more convenient, more fun, or just downright weird. Most browser extensions access and store a portion of your Web browsing data, hopefully requesting permission in a clear and unambiguous way before doing so. Some browser extensions aren't just accessing your data to improve your browsing experience, however. These shady browser extensions are a serious threat to your privacy that is often overlooked.
What happens when the innocent-looking extension you downloaded starts to build a detailed data profile on you and shares that data with other companies? The DataSpii report revealed a large-scale data leak in which personally identifiable information about millions of individuals, as well as corporations, was gathered via eight Chrome and Firefox browser extensions and sold to an unnamed buyer.
Avoiding Shady Browser Extensions
- Review your browser extensions and disable or uninstall any you don't use.
- Where possible, choose Free and Open-Source Software (FOSS) and extensions from non-profit entities – like those from the EFF.
- Many functions offered via browser extensions are now built into privacy-conscious browsers. Check out what’s available for Firefox, Brave or DuckDuckGo (Android or iOS), and consider switching to the one that best meets your needs.
Dodgy VPNs
A Virtual Private Network (VPN) acts as an intermediary for your Internet traffic. When you use a VPN, your Internet service provider (ISP) connects your device to one of the VPN provider’s servers, and from that point on, your traffic is routed through the VPN. A VPN that is at all trustworthy will transport your data using strong encryption. With a competent and trustworthy VPN, any third party that intercepts your Internet traffic between your device and the VPN server will not be able to read it. All of your activity will appear to originate from the VPN server and will not be traceable to your location via your IP address.
However, if your VPN provider keeps logs (and many do, despite claims to the contrary), they can see and access your browsing data, including your IP address, location, search terms, the websites you visit, and the timestamps of your visits. Dodgy VPNs are often offered at no cost because they spy on their users: the business model of these free VPNs is selling your data to the highest bidder. If you're lucky, your data will be used to build a detailed advertising profile of you, which in turn is used to target you with ads. If you're unlucky, the data will be used for more nefarious purposes.
Choosing a Trustworthy VPN
- Don’t fall into the trap of dodgy VPNs. Identify some of the worst offenders and learn more about the deceptive practices of Facebook's defunct VPN and another popular free VPN provider.
- Pick a paid VPN service that has earned your trust through transparency about its business model and who owns it. IVPN, for example, publishes information about its business practices.
- Keep in mind that many VPN recommendation articles and websites are biased. Some authors work on a pay-for-ratings basis – so do your own research before you trust the validity of their top picks.
What Google Knows About You
Given how much data we voluntarily share with Google, it’s no surprise that the information it has about us is extensive. Perhaps most problematic is the lack of transparency and control it gives individuals. We have little knowledge about how long Google retains data, how they’re combining it with other data, who they’re sharing it with, and for what purposes.
Here’s a (non-exhaustive) overview of what information Google has about you:
- Your entire search history, including search history you’ve deleted.
- An advertisement profile based on where you live, how old you are, your relationship status, income, career, and interests.
- Every place you’ve ever been since you began using Google on your phone, while your phone has been on (if you have location tracking turned on).
- Every online purchase connected to your Gmail address.
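- The contents of your Gmail messages, which third parties may be able to read too: a 2018 investigation by The Wall Street Journal found that Google continued to allow hundreds of third-party software developers to scan users' emails, a year after it had announced it would stop scanning Gmail messages to personalize ads.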
Keep Tabs on Google
- There’s an alternative to every Google product you use. Start by searching with DuckDuckGo.
- Review and delete your search history at My Activity.
- Turn off "Ads Personalization" in your Ads Settings.
- You can view your Maps Timeline and turn off Location History.
- View your Account Purchases.
- Download an archive of the data Google holds about you with Google Takeout, and review it.
Connecting The Dots
The Clandestine Industry of Data Brokers
Who’s buying all this data, and what are they doing with it? It’s hard to fully answer this question, given that the nature of the industry lends itself to secrecy. Data brokers collect information about people, then sell that data (or classifications based upon that data, sometimes anonymized) to other companies or individuals. Marketing and contact directories are two of the main purposes for this data, but ever-more-complex technologies are using these data profiles online and offline, in newer and creepier ways. In the U.S., a 2019 Vermont law has taken a small step to bring companies out of the shadows, and strategies for data aggregators are shifting in response to regulation and public outcry.
Marketing-focused data brokers combine purchased data with openly-available information such as public records and public-facing social media accounts to create digital profiles on individuals, placing them in categories according to ethnicity, age, location, education level, income bracket, family status, and interests. These profiles are then sold to even more data brokers and advertisers, who use them to reach the groups of people they want to target via advertisements – including advertisements based on sensitive personal data (information relating to a person’s ethnicity, health, sexual orientation, and political and religious beliefs).
Contact directories (like Pipl, Spock, and Spokeo) allow users to type in someone’s name to find highly specific personal information about them – including birthday, address history, details about education, employment, property records, marital status, and finances. Anyone can access this information, either for free or for a fee.
Because data brokers don’t have a direct relationship with the people they’re collecting information about, they are under no obligation – and have few incentives – to inform individuals that they’re doing so. This disconnect between Internet users, tech companies, and data brokers is one of the major causes of the lack of notice and informed consent about where our data is going. It allows tech companies that trade in data to absolve themselves of responsibility for what happens after data collection, placing the burden on individuals to request the removal of their data.
Hiding From Data Brokers
There’s little you can do to prevent data brokers from sweeping up information about you. However, there are some strategies that can help limit your exposure.
- Set your online and social media accounts to private and limit the amount of information you share online.
- Learn to compartmentalize your identities for work and home, keeping certain information private for friends and family only.
- Try to opt out of marketing and data broker sites – unfortunately, a time-consuming process. Read this guide (U.S.-focused) for opting out of direct marketing and data broker listings.
- EU citizens can claim the right to be forgotten (also codified in Article 17 of the GDPR), which obliges companies to erase personal data when an individual withdraws their consent (or for other reasons). Read the EU guide to claiming this right.
Debunking the Anonymity Defense
In response to privacy concerns about the collection and sale of data, tech companies often claim that the data they collect, repackage, and sell is stripped of its personal identifiers – and therefore poses little or no threat to privacy. This argument has been disproven time and time again. A 2019 study in Nature Communications showed that individuals can be re-identified from anonymized data sets with high likelihood, even when those data sets are incomplete. For example, the chance of correctly identifying a person living in Massachusetts in an anonymized database using just 15 demographic attributes is a whopping 99.98%. If you live in the United States or the United Kingdom, you can check how easily you can be pinpointed from just a few identifiers using an online tool created by researchers from Imperial College London and the University of Louvain. Given the lack of options for action, knowledge might not feel like power in this case – but it’s the first step in the process of regaining control over our personal data.
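The Nature Communications study used a statistical model to estimate re-identification likelihood. The sketch below is a much simpler stand-in that conveys the same intuition: it measures what fraction of records in a toy "anonymized" dataset are unique on a given combination of attributes, since a unique record can be matched to anyone whose attributes are known. The records and attribute names are hypothetical.

```python
from collections import Counter


def uniqueness(records, attributes):
    """Fraction of records whose combination of `attributes` appears exactly once."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical "anonymized" records: no names, only demographic attributes.
records = [
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1985, "gender": "M"},
    {"zip": "02139", "birth_year": 1990, "gender": "F"},
    {"zip": "02139", "birth_year": 1990, "gender": "F"},
    {"zip": "02140", "birth_year": 1985, "gender": "F"},
]

if __name__ == "__main__":
    for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
        print(attrs, uniqueness(records, attrs))
```

In this toy data, ZIP code alone singles out one record in five (0.2), while adding birth year and gender singles out three in five (0.6). With 15 attributes and real populations, the same effect drives uniqueness toward the figures the study reports.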
Our taps, clicks, and scrolls coupled with our searches, posts, and locations are being gathered to create highly detailed profiles about us. Not just our demographics but also our online and offline behaviour, interests, preferences, purchasing habits, desires, hopes, and fears. All this information is traded in an opaque industry often without our knowledge or consent, and even "anonymized" data can be traced back to us – robbing us of our autonomy and privacy. This has become the new normal for the ad tech business.
Implications of the Data-Driven Internet Beyond Advertising
Once again, Bashyakarla, whose work with Tactical Tech explores how personal data is being used for political influence, reminds us of the stark reality we are facing. “The advertising-based model of the Internet – the infrastructure that exists to sell shoes and plane tickets and lifestyles – can be hijacked for political purposes,” he explains. “But the ball game is very different when it's not shoes or plane tickets or lifestyles being sold, but instead tomorrow's political leaders, or ideology. The ramifications are, of course, much greater.”
The Cambridge Analytica scandal revealed that the same data-driven profiles that are used to sell us things also have the potential for political persuasion. In the wake of the scandal, says Bashyakarla, “people ask, can you really build my psychometric profile from what I do on Facebook? There's academic literature that points to yes. The next question that’s asked is: can you use that information to influence my vote? The answer to that remains unresolved. But the fact that we can't yet determine the effectiveness of these technologies is no reason to dismiss them.” Bashyakarla has observed two major changes in the course of his research for Tactical Tech. First, an increasing number of startups are emerging to capitalize on the rising demand for data-intensive digital political campaigns. Second, companies that had nothing to do with politics when they were founded are now entering the political space. Snapchat, for example, has made its services available to political campaigns. Services whose business models were previously non-political are now finding political clients.
Though Bashyakarla believes the world of fake news, mis- and disinformation is still “pretty unsophisticated” in its targeting compared to the highly efficient industry of online advertising, he suspects that it is only a matter of time until those tools intersect, and disinformation and propaganda are computationally customized to every single viewer who sees them. This would mark a fundamental shift in the nature of democracy, with consequences that will play out in public. As Christopher Wylie, the Cambridge Analytica whistleblower, put it, when companies whisper different versions of facts to different voters they "risk fragmenting society." Bashyakarla poignantly adds, “What happens to public discourse, the idea of the public square, and common understanding in a world that's so highly personalized?”
Browsing the Internet in (Relative) Peace
While lawmakers continue to debate these crucial issues, what can we as individuals do to regain a degree of control over our data? There’s no one-size-fits-all model, says Bashyakarla. “For some, quitting Facebook is a terrible option,” he says. “What you do about it depends on your own values and your own system of preferences.”
Bashyakarla's own rule of thumb is to use Free and Open Source Software (FOSS) wherever possible, especially when choosing an Internet browser, password manager, or file sharing service. “This means the code behind tech is public so anyone can look at it, test it, and verify that the tech is doing what it says it does – nothing more and nothing less.” For example, he uses the Mozilla Firefox web browser together with the Facebook Container extension, which limits how much of a user's web activity Facebook can access and makes it harder for Facebook to follow users across different websites.
Tactical Tech has also developed a Data Detox Kit, which guides Internet users through a review and cleanse of their data footprint – at their own pace and level of intensity. Another Tactical Tech resource is the Alternative App Centre, which suggests FOSS and privacy-focused alternatives for commonly-used platforms. With a bit of research, time and energy, it is possible to switch to a suite of technological tools that work for you. The place to start, according to Bashyakarla, is to consider the question: “How can I make digital decisions that are more in line with my own values?”