My second digital life: what is privacy?

Part of an ongoing series exploring my digital life, starting with “Designing my digital life (again)”

Part of understanding my gut feeling for wanting to move away from Google, why I think HTTPS is good, or why I dislike targeted adverts, is understanding and unpacking the feelings I have when I hear about Google and data in the news, when I hear about three letter agencies dragnetting the internet and when I see targeted advertising going wrong. All of these things stem from a root of feeling my privacy has been breached.

My computer defines privacy as:

privacy [noun]

a state in which one is not observed or disturbed by other people: she returned to the privacy of her own home.

the state of being free from public attention: a law to restrict newspapers’ freedom to invade people’s privacy.

As an individual I don’t feel privacy in absolute terms, but rather, I feel the change. Words like “invasion”, “intrusion” and “violation” show up when I think about what happens when that change is large enough. So for me, “privacy” is a bit like my underwear. I don’t notice it until something changes to make me pay attention. Like a wedgie, for example …

One could continue the close-to-skin notion of the idea of privacy and say it’s a bit like some kind of aura, radiating away from me. Closest to me represents 100%, or perfect, visibility (and 0% privacy!), and furthest away represents 0% visibility and 100% privacy. With 100% privacy, I am unseen, unknown and I leave no visible traces for another human. With 0% privacy, everything I do - every action, thought, impulse and secret - is a matter of public record. Somewhere between these two points is a sort of fuzzy edge to the privacy aura that on one side I feel secure and on the other I feel violated.

If part of the definition of perfect privacy is being unseen and unknown, then being the only self-aware creature in the universe would be one way of accomplishing that. This also implies that humans and relationships are key to the idea of privacy, or perhaps the converse is true: as a species we need privacy for relationships. This prompts another thought: because humans and relationships are now involved, culture and individuality are now important in the notion of privacy.

Wikipedia backs up this idea of privacy being part of relationships (which means it must be true!):

“Privacy is the ability of an individual or group to seclude themselves, or information about themselves, and thereby express themselves selectively. The boundaries and content of what is considered private differ among cultures and individuals.”

We’ll come back to privacy and individuality later, but in order to explore and talk about what privacy means to me, I need some categories or words to use to describe what I’m comfortable with and what I’m not.

The Privacy Wedgie

Feeling a privacy breach means that something happened with my data that I had not anticipated, somehow. In many cases I decide, deliberately and knowingly, to give away some piece of information about myself. This action is my choice. If there are no ramifications to this, then everything is cool, but what if there are? Thinking about the ways the act of giving data away could cause a privacy breach, I hit upon 3 different classes:

Unintended: data okay in one context transpires to be dangerous in another.
Unexpected: I gave data away but the implication of doing so is coming back in a way that I hadn’t planned and find it hard to take responsibility for.
Unknown: I don’t even know it’s happening, but I didn’t ask for it and it wasn’t an accidental slip up.

1. Unintended breaches

Living in a perfect state of privacy means saying and doing nothing, and ideally being unobservable. So I choose to give up some of my privacy to satisfy a basic human need to be part of a group. Trust is built by sharing, so I tell stories of my past. Some of this might be data that you could use to cause me harm in a different context (my mother’s maiden name, for example) but in the social setting of a party, telling a story about my grandparents isn’t unreasonable.

This kind of privacy breach is deliberate but unintended.

2. Unexpected breaches

My data is mine to give away, so I should be able to trade it as a commodity under certain conditions. The most common unexpected breach is where I’m contacted by 3rd parties who I did not trade my data to, but someone else did. This is managed with privacy policies (wordy, wordy privacy policies) and/or some expectation management.

This kind of breach occurs because your expectation wasn’t managed appropriately around who might be in touch with you and how aggressively that occurs.

3. Unknown breaches

Between bad actors and governments (foreign and domestic), there are a lot of people who would like access to systems in order to observe individuals. In benevolent cases, the observation is to ensure individuals are not causing harm to others. It’s a laudable goal: to keep many people secure and safe from harm (like stopping terrorists from blowing up train stations). The cost of this is that everyone who is not the observer must be less private.

Why is this a risk? Well, where mass surveillance used to be expensive (in time, manpower, data organisation), now it’s exceedingly cheap to do (as Palantir demonstrates). So long as The Good Guys™ are the only ones doing this, everything is fine, but databases of personal data like this start to look like good honeypots for all sorts of things at scale. As a “normal person”, I don’t feel like I stand much of a chance in assessing my risks or expressing my wishes in these unknown breach cases.

This is a case where you didn’t even know you were offering up data, yet it is being used, and you’ve no visibility or control of that.

What is “data”?

In the news, in law, and across various industries, “stuff you might not want to share with everyone” is covered by a few different laws and goes by a few different names, all of which broadly come under the heading of “personal data”.

In the US, I hear that personally identifiable information (PII) is covered by a variety of federal and local laws, while protected health information (PHI) shows up under HIPAA (Health Insurance Portability and Accountability Act).

Here in the UK we have the Data Protection Act 2018, which supplements the EU GDPR (General Data Protection Regulation). Both have definitions of “personal data”, provisions about processing it, and the notion of indirect identification: combining more than one piece of information that would otherwise be okay into a whole that suddenly makes it possible to identify an individual.
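To make indirect identification a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the two datasets, the field names (postcode, birth_year) and the link_records helper are assumptions of mine, not anything taken from the GDPR or the Data Protection Act. The point is only that two harmless-looking lists can be joined on shared attributes.

```python
from collections import defaultdict

# Hypothetical, made-up records: neither list names anyone on its own.
purchases = [
    {"postcode": "AB1 2CD", "birth_year": 1980, "item": "running shoes"},
    {"postcode": "AB1 2CD", "birth_year": 1975, "item": "cookbook"},
    {"postcode": "EF3 4GH", "birth_year": 1980, "item": "headphones"},
]
forum_profiles = [
    {"postcode": "AB1 2CD", "birth_year": 1980, "username": "wedgie_fan"},
    {"postcode": "EF3 4GH", "birth_year": 1991, "username": "quiet_reader"},
]


def link_records(left, right, keys):
    """Pair up records from two datasets that share the same values for `keys`."""
    index = defaultdict(list)
    for record in right:
        index[tuple(record[k] for k in keys)].append(record)

    linked = []
    for record in left:
        for match in index.get(tuple(record[k] for k in keys), []):
            # Merge the two records into one richer profile.
            linked.append({**record, **match})
    return linked


matches = link_records(purchases, forum_profiles, keys=("postcode", "birth_year"))
for profile in matches:
    print(profile)
```

With only two shared attributes, one purchase is now tied to one forum username. Real linkage would be messier, but the shape of the risk is the same.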

If we make a broad definition of personal data to be “any piece of information that relates to me”, the list might look something like this one (written in the context of understanding the GDPR), or like my own stab at it:

Biometric data - physical properties of me: face, iris, fingerprint, gender, date of birth
Medical: appointments, treatments, medicines, conditions, psychometric profiles
Identity proxy data - information linked directly to me: email address, phone number, driving license, passport, online forum accounts, who my bank is, which utility companies I am with, my car number plate
Access data - information that allows me to gain access to things: bank pin, passwords, door keys, car keys
Activity data - information about what I do, what I have done, what I am doing and what I might do next: location history, search history, site visit history, purchase history, listen/watch history, current location, schedule/appointments, email contents
Intellectual data - relating to what I think: my interests, hobbies, favourite music, colour, religion, sexual preferences, political leaning, hopes, dreams, fears
Social data: who I’m married to, dating, friends with, who my parents are, who I work with

Often each piece of information is individually worthless, but when associated with other parts becomes valuable. Knowing my sexual preferences without also knowing those are my preferences (e.g. data without an associative key like my name or likeness) isn’t really going to help you blackmail me, but knowing that people who have those preferences also tend to have certain political leanings (for example) could be useful for advertising.

Additionally, some of these bits of data are perfectly reasonable, desirable and potentially necessary to give away (phone numbers!). A stranger using my phone number to contact me after I’ve given away business cards at a conference doesn’t register on my breach wedgie-scale.

How might data get used?

The complexity of managing even a simple piece of data like my phone number is staggering, when I stop to think about it. Imagine handing out your business card to one person, who gives it away or drops it, only for it to be discovered by someone else who happens to know someone who’d dearly love to use it to sell you something! I mean, that’s entirely made up, but there are a few concepts to consider when thinking about the types of data use that I am comfortable with, and the risks of them turning into something else.

Direct - they want to target me, personally.
Relational - they want to target someone I know, but only know about me.
Aggregate - they want to target the general group of people I belong to.

1. Direct usage

Anyone wanting to target me specifically, for whatever reason. Identity proxies (like email addresses) coupled with associative keys (like my name and face) are needed to identify and target me. When I give away my phone number to another individual, and they contact me, this is very much a “direct usage” scenario. I gave some data away for a specific purpose and fully expected this to happen. No breach here, unless I’m contacted by someone I didn’t give my number to …

2. Relational

I am not the target of the data usage per se, but I’m a link in a chain, so my relationships to other people are a component of what the end goal is. Knowing something about me specifically is necessary, as in the direct case above, in addition to knowing how I specifically relate to another person.

This is at the heart of the Palantir link in the unknown breaches, but this could also be more localised. Imagine that my friend works with law enforcement, has a child, and uses Facebook to share photos of both of them together, privately, with their close family and friends. Now imagine that I forget they want to stay “hidden” in the world, and I share one of their photos that I love … whoops, I’ve potentially created an unknown breach for them, and my data could be used in a relational way by ne’er-do-wells to identify their child and them for all sorts of terrifying things.

3. Aggregate

My identity itself is secure, because I am part of a (large) cohort of similarish people. This is where I am not personally identifiable in a dataset, but I can be sorted into a dataset for the purposes of things like recommendations: people like you liked this, maybe you’ll also like this.

Now if the set of people that I am part of is small enough, then identifying me within that set is trivial. This is less good, but for the sake of this category, let’s assume that any categorisation of “aggregate data usage” is large enough that it’s non-trivial to use it to surveil me.
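As a rough sketch of what “large enough” could mean, the snippet below counts how many people share each combination of attributes and flags any group smaller than a threshold k. This is similar in spirit to the idea of k-anonymity, but the records, the attribute names and the choice of k = 3 are all assumptions made up for illustration.

```python
from collections import Counter


def small_cohorts(records, keys, k):
    """Return the attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, made-up dataset with no names in it at all.
records = [
    {"age_band": "30-39", "region": "North"},
    {"age_band": "30-39", "region": "North"},
    {"age_band": "30-39", "region": "North"},
    {"age_band": "40-49", "region": "South"},
]

# Any cohort smaller than k is one where "aggregate" starts to look like "direct".
risky = small_cohorts(records, keys=("age_band", "region"), k=3)
print(risky)
```

Here the single person in the 40-49 / South group is effectively identifiable, even though the dataset never mentions who they are.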

My privacy conclusions

So I’ve categorised a bunch of stuff, but what do I actually care about?

Breach types

Like a wedgie, no breach is ever going to be comfortable, but some are more uncomfortable than others. There’s not much I can do about unknown breaches except restrict them. Unexpected breaches are the ones that often show up because I think I’ve given company A my email, then get mailed by company B, which is not a huge deal but seems to rub me up the wrong way. Unintended breaches feel like the ones that are a bit more within my power to control, but could be awkward.

My thoughts now are:

Eliminate unintended breaches, by drawing harder lines over things that are private and things that are public, and to some extent, rebelling against the use of public data for “security questions”.
Limit unexpected breaches, by getting a better understanding of where my data is going and under what policy it is being managed. One version of this action looks a lot like reading the privacy policy and terms of service.
Restrict the possibility of unknown breaches by taking some reasonable precautions, like using HTTPS to mitigate against ISPs dragnetting my HTTP sessions. In an ideal world, HTTPS with perfect forward secrecy would also prevent captured sessions from being decrypted in the future.
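For the forward secrecy point, here is a small Python sketch using the standard library’s ssl module. It guesses from the negotiated cipher whether the key exchange was ephemeral. The helper names uses_forward_secrecy and check_host are my own, and the name-prefix check is a heuristic that assumes OpenSSL-style cipher names, so treat it as an illustration rather than an audit tool.

```python
import socket
import ssl


def uses_forward_secrecy(cipher_name, protocol):
    """Guess from the negotiated cipher whether the key exchange was ephemeral."""
    if protocol == "TLSv1.3":
        # TLS 1.3 full handshakes use ephemeral (EC)DHE key exchange.
        # Assumption: PSK-only session resumption is ignored here.
        return True
    # OpenSSL-style TLS 1.2 names start with the key exchange, e.g. ECDHE-RSA-...
    return cipher_name.startswith(("ECDHE-", "DHE-"))


def check_host(hostname, port=443, timeout=5.0):
    """Connect to a host and report whether the session looks forward secret."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # cipher() returns (name, protocol version, secret bits).
            name, protocol, _bits = tls.cipher()
            return uses_forward_secrecy(name, protocol)
```

Calling check_host("example.org") needs a network connection, and example.org is only a placeholder. The classification function can be tried offline with cipher names such as "ECDHE-RSA-AES128-GCM-SHA256".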

Usage types

When I started this project, I switched browsers to Brave (thoughts coming later). Lots of ad-blocking and the act of writing this all down have made me more aware of the sorts of usage I might accept, and I am surprised to find that I’m much more comfortable with aggregate usage (on paper) than I thought, so targeted ads now feel like a reasonable price to pay for free access to information.

I’m happy to be intentional about direct usage and, as with the unexpected breaches above, this probably means understanding privacy policies and terms of service to a much better degree than I currently do.

Relational usage makes me the most uncomfortable because it seems like the thing that I could easily trip up on. I don’t know exactly what actions to take around this one yet.

I am definitely not happy with the notion that it’s trivial for me to be surveilled, so any actions should take that into consideration too. I’m not sure precisely how strong my principles are on this, because I’m a bit of a privileged individual who isn’t doing anything wrong, so why should this cause me a problem …

Data types

Listing out even some of the different types of data that exist has been a sobering exercise. I didn’t do an exhaustive study or think very hard about it, and yet that list is pretty long. I find it very challenging to harden the fuzzy boundary of privacy because I don’t live in a world of absolutes, and feel the need to be flexible. So defining “this is private” and “this is not” is a useless activity for me.

To some extent I can describe my comfort levels of each data type being used for each breach type, and maybe this could lead to highlighting the key areas that need addressing and mitigating for me.
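One way to sketch that idea is a small grid of data types against breach types, with a comfort score in each cell. The scores below are invented placeholders (0 for very uncomfortable, 5 for relaxed), not my real answers, and the areas_to_address helper is a hypothetical name.

```python
# Hypothetical scores from 0 (very uncomfortable) to 5 (relaxed).
comfort = {
    "identity proxy": {"unintended": 3, "unexpected": 2, "unknown": 1},
    "activity": {"unintended": 2, "unexpected": 1, "unknown": 0},
    "social": {"unintended": 3, "unexpected": 3, "unknown": 1},
}


def areas_to_address(matrix, threshold):
    """List (data type, breach type) pairs scoring at or below the threshold, worst first."""
    cells = [
        (score, data_type, breach)
        for data_type, row in matrix.items()
        for breach, score in row.items()
        if score <= threshold
    ]
    # Sorting the tuples puts the lowest comfort scores first.
    return [(data_type, breach) for _score, data_type, breach in sorted(cells)]


priorities = areas_to_address(comfort, threshold=1)
print(priorities)
```

The lowest-scoring cells are the combinations worth mitigating first, and the grid is easy to revisit when my comfort levels change.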

Conversational privacy

I set out to understand my motivations and feelings around privacy and data, and part of that processing and understanding came from talking to my friends. It was during one of these conversations (in the pub, of course) that two key ideas surfaced.

1. Privacy vocabulary

Without the words to express the precise and nuanced ideas, we’re often reduced to catastrophising because there seems to be something in human nature that makes it easy to take the worst case scenarios instead of looking at a balanced view of severities and impacts.

2. Privacy is personal

The other part is the early acknowledgement that privacy is, and must be, personal (individual, societal, cultural). What I am happy to tolerate, you may not be. What my present self is okay with, my future self may not be. Crucially: that is okay. Having some language that focuses on the specifics (breach types, data types, usage types) with the understanding that “I cannot speak for you”, really moved the conversations with my friends from catastrophising to helping each other understand what private looks like for each of us.

What next?

The goal is something a bit more concrete around using my new-found digital privacy enlightenment to help me choose services to use that I am comfortable with and can justify. But as with many things I reserve the right to change my mind later :)

A big thank you to my friends who are accompanying me on this journey for their consultation, proof-reading and additional research.

October 10, 2019