Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2015
Roger Clarke
Original of 15 August 1997, revs. Sep 1999, Dec 2005, Aug 2006, 21 October 2013, 24 July 2016
This document is published only in this form - as a page on my personal web-site.
Its Google citation-count passed 200 in mid-2016.
© Xamax Consultancy Pty Ltd, 1997-2016
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/DV/Intro.html
This paper provides a brief introduction to the topics of data surveillance and information privacy, and contains my definitions of key terms in the area. It is intended as a starter-resource for people who want to break into the area; and as a reference resource for people who've already broken in.
A separate document lists those papers that are designed to be accessible, rather than intended for publication in journals.
Another provides an annotated bibliography, which classifies my 100 or more papers into topic areas.
Another lists my main papers in reverse chronological order.
Finally, the document that is likely to be most up-to-date is my What's New Page.
We use a lot of words without thinking about what we mean by them. When the words are 'eat' or 'zebra', it probably doesn't matter too much. But when we use abstract words that are full of contentiousness, like 'discrimination' and 'ethnicity', all hope of a rational discussion disappears unless we achieve some degree of similarity between our understanding of the terms.
'Privacy' is an abstract and contentious notion. This document provides definitions of privacy and related terms, in a sequence that is intended to provide an introduction to the topic.
You don't have to like the definitions I use, although I believe them to be fairly mainstream among the people who specialise in these areas; and they're what I use when I write and talk about the topic. (I do a lot of it. See the index of papers, and my recent works). I welcome suggestions for improvement, constructive criticism, etc.
A more substantial treatment of what privacy means is in Clarke (2006). The purpose of the document that you're reading is to provide a brief overview.
People often think of privacy as some kind of right. Unfortunately, the concept of a 'right' is a problematical way to start, because a right seems to be some kind of absolute standard. What's worse, it's very easy to get confused between legal rights, on the one hand, and natural or moral rights, on the other. It turns out to be much more useful to think about privacy as one kind of thing (among many kinds of things) that people like to have lots of.
Privacy is the interest that individuals have in sustaining 'personal space', free from interference by other people and organisations.
[Clarification of July 2016: I extracted the definition in the early 1990s, directly from Morison (1973). The original referred to "a 'personal space'". But that could be misinterpreted as being a singular space, whereas Morison meant it in a generic sense, or perhaps in a multi-dimensional sense. I therefore prefer to omit the "a", in order to convey the broader notion of "sustaining personal space".]
Drilling down to a deeper level, privacy turns out not to be a single interest, but rather has multiple dimensions:
With the close coupling that has occurred between computing and communications, particularly since the 1980s, the last two aspects have become closely linked. This is the primary focus of public attention, and of this document. It is convenient to use the term 'information privacy' to refer to the combination of communications privacy and data privacy.
[Addition of October 2013:] Since about 2005, a further disturbing trend has emerged, which gives rise to a fifth dimension that wasn't apparent when I structured this in the mid-1990s:
With the patterns becoming more complex, a list may no longer be adequate, and a diagram may help understanding:
There are many different reasons that people put forward to support the proposition that privacy is important. The following provides a classification and brief overview:
Privacy's importance is reflected in its inclusion in the fundamental documents that define human rights, among them the Universal Declaration of Human Rights (UDHR 1948, Article 12), the International Covenant on Civil and Political Rights (ICCPR 1966, Article 17), and many national Constitutions and Bills of Rights.
An important implication of the definition of privacy as an interest is that it has to be balanced against many other, often competing, interests:
Privacy Protection is a process of finding appropriate balances between privacy and multiple competing interests.
Because there are so many dimensions of the privacy interest, and so many competing interests, at so many levels of society, the formulation of detailed, operational rules about privacy protection is a difficult exercise. The most constructive approach is to:
The previous section introduced information privacy as a combination of the privacy of personal communications and of personal data:
Information Privacy is the interest an individual has in controlling, or at least significantly influencing, the handling of data about themselves.
The term 'data privacy' is sometimes used in the same way. 'Data' refers to inert symbols, signs or measures, whereas 'information' implies the use of data by humans to extract meaning. Hence 'information privacy' is arguably the more descriptive of the two alternatives.
The notion emerged during the mid-1960s, and the growth in its importance is often perceived to be directly linked to concern about the accelerating capability of computers, and their application to the processing of data about people. In fact, there has been a tendency throughout the twentieth century for organisations to increase:
The continuing increase in public concern about information privacy should therefore be seen as a reaction to the ways in which information technology is used by organisations, rather than to information technology itself.
Legislatures in continental Europe, and to some extent also in North America, passed laws addressing information privacy, primarily during the 1970s, though some laggards deferred action until the 1980s or even the 1990s. These laws mostly focus on 'data protection', i.e. they protect data about people, rather than people themselves. This is unfortunate because, although data protection is a more pragmatic concept than the abstract notion of privacy (and it's therefore easier to produce results), it's not what humans actually need.
Another term that has been used to describe current information privacy protections is 'fair information practices legislation'. Although it sounds good, this approach has failed to satisfy the need. It has been used to legitimise existing practices, and to permit virtually any use of data by virtually any organisation for virtually any reason, provided that it is handled 'fairly'. Genuine privacy protection forces uses of data to be justified.
Another related notion derives from the law of confidence:
Confidentiality is the legal duty of individuals who come into the possession of information about others, especially in the course of particular kinds of relationships with them.
Confidentiality is an incidental, and wholly inadequate, substitute for proper information privacy protection.
Like 'privacy', 'confidentiality' is an Anglo-Saxon notion, and is not readily translatable into other languages. The German Federal Constitutional Court, on the other hand, has read into that country's constitution a right of individuals to 'informational self-determination'. This would appear to bring European thinking somewhat closer to the ideas discussed in English-speaking countries.
The term 'privacy' is used by some people, particularly security specialists and computer scientists, and especially in the United States, to refer to the security of data against various risks, such as the risks of data being accessed or modified by unauthorised persons. In some cases, it is used even more restrictively, to refer only to the security of data during transmission.
These aspects are only a small fraction of the considerations within the field of 'information privacy'. More appropriate terms to use for those concepts are 'data security' and 'data transmission security'.
The term 'confidentiality' is also sometimes used by computer scientists to refer to 'data transmission security', risking confusion with obligations under the law of confidence.
Information privacy is valued very highly by individuals. But it is under threat from particular kinds of management practices, and from advances in technology. This section explains the concept of 'data surveillance'. To do so, it is first necessary to define some underlying terms.
Surveillance is the systematic investigation or monitoring of the actions or communications of one or more persons.
The primary purpose of surveillance is generally to collect information about the individuals concerned, their activities, or their associates. There may be a secondary intention to deter a whole population from undertaking some kinds of activity.
Two separate classes of surveillance are usefully identified:
Personal Surveillance is the surveillance of an identified person. In general, a specific reason exists for the investigation or monitoring. It may also, however, be applied as a means of deterrence against particular actions by the person, or repression of the person's behaviour.
Mass Surveillance is the surveillance of groups of people, usually large groups. In general, the reason for investigation or monitoring is to identify individuals who belong to some particular class of interest to the surveillance organisation. It may also, however, be used for its deterrent effects.
The basic form, physical surveillance, comprises watching (visual surveillance) and listening (aural surveillance). Monitoring may be undertaken remotely in space, with the aid of image-amplification devices like field glasses, infrared binoculars, light amplifiers, and satellite cameras, and sound-amplification devices like directional microphones; and remotely in time, with the aid of image- and sound-recording devices.
The notion of the 'panopticon' (Jeremy Bentham's 18th-century proposal for efficient prisons, as an alternative to transporting felons to colonies such as what is now Australia) has resurfaced in the writings of Foucault, as a metaphor for what he saw as the prison-like nature of late 20th-century societies.
In addition to physical surveillance, several kinds of communications surveillance are practised, including mail covers and telephone interception.
The popular term electronic surveillance refers both to augmentations of physical surveillance (such as directional microphones and audio bugs) and to communications surveillance, particularly telephone taps.
These forms of direct surveillance are commonly augmented by the collection of data from interviews with informants (such as neighbours, employers, workmates, and bank managers). As the volume of information collected and maintained has increased, the record collections of organisations have become an increasingly important source. These are often referred to as 'personal data systems'. This has given rise to an additional form of surveillance:
Data Surveillance (or Dataveillance) is the systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons.
Dataveillance is significantly less expensive than physical and electronic surveillance, because it can be automated. As a result, the economic constraints on surveillance are diminished, and more individuals, and larger populations, are capable of being monitored.
Like surveillance more generally, dataveillance is of two kinds:
Personal Dataveillance is the systematic use of personal data systems in the investigation or monitoring of the actions or communications of an identified person. In general, a specific reason exists for the investigation or monitoring of an identified individual. It may also, however, be applied as a means of deterrence against particular actions by the person, or repression of the person's behaviour.
Mass Dataveillance is the systematic use of personal data systems in the investigation or monitoring of the actions or communications of groups of people. In general, the reason for investigation or monitoring is to identify individuals who belong to some particular class of interest to the surveillance organisation. It may also, however, be used for its deterrent effects.
Dataveillance comprises a wide range of techniques. These include:
Front-End Verification. This is the cross-checking of data in an application form against data from other personal data systems, in order to facilitate the processing of a transaction.
Computer Matching. This is the expropriation of data maintained by two or more personal data systems, in order to merge previously separate data about large numbers of individuals.
Profiling. This is a technique whereby a set of characteristics of a particular class of person is inferred from past experience, and data-holdings are then searched for individuals with a close fit to that set of characteristics.
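All three techniques are easy to automate, which is much of what makes dataveillance cheap. The following Python sketch illustrates each of them under stated assumptions: the record sets, the field names and the shared identifier `A123` are hypothetical, and the profile is written by hand rather than inferred from past experience as genuine profiling would do.

```python
# Hypothetical record sets, keyed by an identifier that both systems happen to share.
TAX_RECORDS = {
    "A123": {"name": "J. Citizen", "postcode": "2611", "income": 18000},
    "B456": {"name": "M. Smith", "postcode": "2000", "income": 95000},
}
WELFARE_RECORDS = {
    "A123": {"benefit": "unemployment", "declared_income": 0},
    "C789": {"benefit": "pension", "declared_income": 12000},
}


def front_end_verify(application: dict, reference: dict) -> list:
    """Front-end verification: list the application fields that disagree with another system."""
    record = reference.get(application["id"])
    if record is None:
        return ["id"]  # the applicant is unknown to the reference system
    return [field for field, value in application.items()
            if field != "id" and field in record and record[field] != value]


def computer_match(a: dict, b: dict) -> dict:
    """Computer matching: merge previously separate records that share an identifier."""
    return {key: {**a[key], **b[key]} for key in sorted(a.keys() & b.keys())}


# A hand-written profile (hypothetical); real profiling would infer these tests from past cases.
PROFILE = [
    lambda r: r.get("benefit") == "unemployment",
    lambda r: r.get("income", 0) > r.get("declared_income", 0),
]


def profile_hits(records: dict, profile: list, threshold: int) -> list:
    """Profiling: return ids whose records pass at least `threshold` of the profile's tests."""
    return sorted(key for key, rec in records.items()
                  if sum(1 for test in profile if test(rec)) >= threshold)


print(front_end_verify({"id": "B456", "postcode": "2001"}, TAX_RECORDS))  # ['postcode']
merged = computer_match(TAX_RECORDS, WELFARE_RECORDS)
print(list(merged))                                 # ['A123']
print(profile_hits(merged, PROFILE, threshold=2))   # ['A123']
```

Matching in this sketch works only because the two systems share an identifier. Where they do not, matching is typically attempted on compound data such as name and date-of-birth, with a corresponding risk of false matches.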
Dataveillance depends on the availability of data. Whereas the term 'personal data system' reflects the interests of a data collector, a term is needed that relates to the interests of individuals:
Data Trail. This is a succession of identified transactions, which reflect real-world events in which a person has participated.
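Because each transaction in a data trail carries an identifier, assembling a person's trail is a trivial grouping operation. A minimal sketch, assuming hypothetical transaction tuples of identifier, date, organisation and event:

```python
from collections import defaultdict
from datetime import date

# Hypothetical identified transactions: (identifier, date, organisation, event).
TRANSACTIONS = [
    ("A123", date(2016, 3, 2), "Bank", "ATM withdrawal"),
    ("B456", date(2016, 3, 1), "Library", "book loan"),
    ("A123", date(2016, 1, 15), "Clinic", "consultation"),
    ("A123", date(2016, 2, 7), "Airline", "flight booking"),
]


def data_trails(transactions):
    """Group identified transactions by identifier, each person's trail in date order."""
    trails = defaultdict(list)
    for ident, when, organisation, event in transactions:
        trails[ident].append((when, organisation, event))
    return {ident: sorted(events) for ident, events in trails.items()}


for when, organisation, event in data_trails(TRANSACTIONS)["A123"]:
    print(when, organisation, event)
# 2016-01-15 Clinic consultation
# 2016-02-07 Airline flight booking
# 2016-03-02 Bank ATM withdrawal
```

The sketch links events across organisations only because all of them record the same identifier, which is precisely the condition that identification schemes used by many organisations create.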
Identification is a process whereby a real-world entity is recognised, and its 'identity' established. Identity is operationalised in the abstract world of information systems as a set of information about an entity that differentiates it from other, similar entities. The set of information may be as small as a single code, specifically designed as an identifier, or may be a compound of such data as given and family name, date-of-birth and postcode of residence. An organisation's identification process comprises the acquisition of the relevant identifying information.
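To illustrate how a compound of ordinary data items can operate as an identifier, the sketch below normalises given name, family name, date-of-birth and postcode into a single key. The normalisation rules (case-folding, accent-stripping, whitespace collapsing) are assumptions chosen for illustration; real systems vary considerably.

```python
import unicodedata
from datetime import date


def normalise(text: str) -> str:
    """Case-fold, strip accents and collapse whitespace so trivially different spellings agree."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def compound_identifier(given: str, family: str, dob: date, postcode: str) -> tuple:
    """Combine several ordinary data items into one key intended to single out a person."""
    return (normalise(given), normalise(family), dob.isoformat(), postcode.strip())


a = compound_identifier("José", "García ", date(1970, 5, 1), "2611")
b = compound_identifier("jose", "GARCIA", date(1970, 5, 1), " 2611")
print(a == b)  # True: the two records are treated as the same entity
print(a)       # ('jose', 'garcia', '1970-05-01', '2611')
```

Such a key is not guaranteed to be unique: two people can share it, and, as the next paragraph notes, one person may present under several identities that produce different keys.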
Contrary to the presumptions made in many information systems, an entity does not necessarily have a single identity, but may have multiple identities. For example, a company may have many business units, divisions, branches, trading-names, trademarks and brandnames. And many people are known by different names in different contexts.
A variety of person-identification techniques are available, which can assist in associating data with people. Important examples of these techniques are:
The term 'biometrics' is used to refer to those person-identification techniques that are based on some physical and difficult-to-alienate characteristic, such as:
These are examined in greater detail in a paper on human identification in information systems.
Person-identification schemes can be used by a single organisation for a single purpose. Alternatively, they can be used by one or more organisations for multiple purposes.
A special case of person-identification occurs where a country establishes a scheme intended to be used by many organisations for many purposes:
An Inhabitant Identification Scheme provides all, or most, people in the country with a unique code, and a token (generally a card) containing the code.
Such schemes are used in many European countries for a defined set of purposes, typically the administration of taxation, national superannuation and health insurance. In some countries, they are used for multiple additional purposes. There is deep concern in Anglo-American countries about such schemes, as evidenced by the demise of the Australia Card proposal.
Unlike physical, communications and electronic surveillance, dataveillance does not monitor the individual, but merely the shadow that the person casts in data. A term is needed to refer to the subject of dataveillance:
The Digital Persona is the model of an individual's public personality based on data, and maintained by transactions, and used as a proxy for the individual.
Like any mere model, a digital persona is a partial and inaccurate reflection of a complex reality. Serious dangers arise when determinations are made, and actions taken, about an individual, based on their digital persona.
A token that holds particular attraction as a tool in person-identification is a chip-based card (smart-card). This may carry a person's Private Key, enabling them to attach a Digital Signature to an electronic message. This has substantial privacy implications.
A chip-based card can also be used to carry a person's biometrics. Moreover, the technology can be used, at enormous risk to privacy, to support a chip-based inhabitant identification scheme.
Authentication is the process whereby a degree of confidence is established about the truth of an assertion.
A common application of the idea is to the authentication of identity (Clarke 1995, 1996e). This is the process whereby an organisation establishes that a party it is dealing with is:
In addition, there are many circumstances in which organisations undertake authentication of value, e.g. by checking a banknote for forgery-resistant features like metal wires or holograms, and seeking pre-authorisation of credit-card payments.
Another approach is the authentication of attributes, credentials or eligibility. In this case, it is not the person's identity that is in focus, but rather the capacity of that person to perform some function, such as being granted a discount applicable only to tradesmen or club-members, or a concessional fee only available to senior citizens or school-children, or entry to premises that are restricted to adults only.
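Authentication of an attribute can, in principle, be performed without the verifier learning who the person is. The sketch below assumes a hypothetical issuer that vouches for a single claim such as 'over 18' using an HMAC tag from Python's standard library. The shared key is a simplification; a deployed scheme would more plausibly use public-key digital signatures, so that verifiers cannot create credentials themselves.

```python
import hashlib
import hmac
import json

# Assumption: a key shared by a hypothetical issuer and verifier. A deployed scheme
# would more plausibly use public-key signatures so verifiers cannot mint credentials.
ISSUER_KEY = b"hypothetical-shared-key"


def _tag(claim: str) -> str:
    """Compute the issuer's HMAC-SHA256 tag over the serialised claim."""
    return hmac.new(ISSUER_KEY, claim.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_credential(attribute: str, value) -> dict:
    """Issuer vouches for one attribute; the claim carries no name or identifier."""
    claim = json.dumps({"attribute": attribute, "value": value}, sort_keys=True)
    return {"claim": claim, "tag": _tag(claim)}


def verify_credential(credential: dict, attribute: str, expected) -> bool:
    """Verifier checks the issuer's tag and the attribute value, and learns nothing else."""
    if not hmac.compare_digest(_tag(credential["claim"]), credential["tag"]):
        return False  # the claim was altered, or was not issued with this key
    claim = json.loads(credential["claim"])
    return claim["attribute"] == attribute and claim["value"] == expected


adult = issue_credential("age_over_18", True)
print(verify_credential(adult, "age_over_18", True))    # True

minor = issue_credential("age_over_18", False)
altered = {"claim": minor["claim"].replace("false", "true"), "tag": minor["tag"]}
print(verify_credential(altered, "age_over_18", True))  # False: the tag no longer matches
```

As written, a credential could simply be handed to another person; binding it to its holder without reintroducing identification is the harder part of such schemes.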
An anonymous record or transaction is one whose data cannot be associated with a particular individual, either from the data itself, or by combining the transaction with other data.
A great many transactions that people undertake are entirely anonymous, including:
Some of the reasons that people use anonymity are of dubious social value, such as avoiding detection of their whereabouts in order to escape responsibilities. Other reasons are of arguably significant social value, such as avoiding physical harm, enabling 'whistle-blowing', avoiding unwanted and unjustified public exposure, and keeping personal data out of the hands of intrusive marketers and governments.
Some categories of transactions, however, are difficult to conduct on an anonymous basis, without one or perhaps both of the parties being known to the other. Examples of transactions where there is a strong argument for identification include:
An identified record or transaction is one in which the data can be readily related to a particular individual. This may be because it carries a direct identifier of the person concerned, or because it contains data which, in combination with other available data, links the data to a particular person.
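The second route to identification, combining apparently non-identifying data with other available data, can be sketched as a simple join. The datasets below are hypothetical; the point is that a record with no name becomes identified once its postcode, date-of-birth and sex match exactly one entry in a named register.

```python
# Hypothetical data: an extract with names removed, and a register that carries names.
EXTRACT = [
    {"postcode": "2611", "dob": "1970-05-01", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "dob": "1985-11-23", "sex": "M", "diagnosis": "fracture"},
]
REGISTER = [
    {"name": "J. Citizen", "postcode": "2611", "dob": "1970-05-01", "sex": "F"},
    {"name": "A. Nother", "postcode": "2611", "dob": "1962-02-14", "sex": "M"},
]
LINK_FIELDS = ("postcode", "dob", "sex")


def link(extract, register, fields=LINK_FIELDS):
    """Attach names to extract rows whose linking fields match exactly one register entry."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[f] for f in fields), []).append(person["name"])
    linked = []
    for row in extract:
        names = index.get(tuple(row[f] for f in fields), [])
        if len(names) == 1:  # a unique match: the row is now effectively identified
            linked.append({**row, "name": names[0]})
    return linked


for row in link(EXTRACT, REGISTER):
    print(row["name"], row["diagnosis"])  # J. Citizen asthma
```

Rows that match several register entries, or none, remain unlinked in this sketch. Whether a record is identified therefore depends on what other data is available, not only on its own content.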
There is a current tendency for organisations to try to convert anonymous transactions (e.g. visits to counters, telephone enquiries and low-value payments) into identified transactions. The reasons for this trend include:
The privacy interest runs emphatically counter to attempts to convert anonymous into identified transactions.
Beyond anonymous and identified transactions, an additional alternative exists.
A pseudonymous record or transaction is one that cannot, in the normal course of events, be associated with a particular individual.
Hence a transaction is pseudonymous in relation to a particular party if the transaction data contains no direct identifier for that party, and can only be related to them in the event that a very specific piece of additional data is associated with it. The data may, however, be indirectly associated with the person, if particular procedures are followed, e.g. the issuing of a search warrant authorising access to an otherwise closed index.
To be effective, pseudonymous mechanisms must involve legal, organisational and technical protections, such that the link can only be made (e.g. the index can only be accessed) under appropriate circumstances.
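A minimal sketch of the technical component of such an arrangement, assuming a hypothetical custodian that issues random pseudonyms and holds the closed index. The `authorisation` argument is only a stand-in: whether appropriate authority genuinely exists is determined by the legal and organisational protections, not by code.

```python
import secrets
from typing import Optional


class PseudonymCustodian:
    """Hypothetical custodian of the closed index linking pseudonyms to identities."""

    def __init__(self) -> None:
        self._identity_of = {}  # pseudonym -> identity: the closed index
        self._nym_of = {}       # identity -> pseudonym, so repeat records stay consistent

    def pseudonym_for(self, identity: str) -> str:
        """Return the person's pseudonym, issuing a random one on first use."""
        if identity not in self._nym_of:
            nym = secrets.token_hex(8)
            while nym in self._identity_of:  # guard against an (unlikely) collision
                nym = secrets.token_hex(8)
            self._nym_of[identity] = nym
            self._identity_of[nym] = identity
        return self._nym_of[identity]

    def reveal(self, nym: str, authorisation: Optional[str]) -> str:
        """Re-link a pseudonym only when an authorisation reference is supplied."""
        if not authorisation:
            raise PermissionError("re-identification requires appropriate authority")
        return self._identity_of[nym]


custodian = PseudonymCustodian()
nym = custodian.pseudonym_for("J. Citizen")  # used in place of the name in, say, a research dataset
print(nym == custodian.pseudonym_for("J. Citizen"))                    # True
print(custodian.reveal(nym, authorisation="hypothetical warrant ref"))  # J. Citizen
```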
Two closely related techniques are:
Pseudonymity is used in some situations to enable conflicting interests to be satisfied; for example in collections of highly sensitive personal data such as that used in research into HIV/AIDS. It is capable of being applied in a great many more situations than it is at present.
Generalising from this example, pseudonymity is used to enable the protection of individuals who are at risk of undue embarrassment or physical harm. Categories of such people range from celebrities and VIPs (who are subject to widespread but excessive interest among sections of the media and the general public) to protected witnesses, 'battered wives', celebrities under threat from 'stalkers', and people in security-sensitive occupations.
Another application of pseudonymity is to reflect the various roles that people play. For example, a person may act as their private self, an employee of an organisation, an officer of a professional association, and an officer of a community organisation. In addition, a person may have multiple organisational roles (e.g. substantive position, acting position, various roles on projects and cross-organisational committees, bank signatory, first-aid officer, fire warden), and multiple personal roles (e.g. parent, child, spouse, scoutmaster, sporting team-coach, participant in professional and community committees, writer of letters-to-the-newspaper-editor, chess-player, participant in newsgroups, e-lists, chat-channels).
In this context, the terms 'identity' and 'identifier' become awkward, because a person may have multiple roles, and hence 'identities' and 'identifiers'. It is therefore useful to have another word available that can be used to refer to each of these virtual entities. The terms 'pseudonym' or 'pseudo-identity' are tenable; but the term 'nym' appears to be gaining currency.
Moreover, nyms may not be associable with a specific person, or may only be associable with a specific person if particular conditions are fulfilled. Hence a further and important application of pseudonymity is the use of information technology to support multiple nyms. Under such arrangements, a person sustains separate relationships with multiple organisations, using separate identifiers, and generating separate data trails. These are designed to be very difficult to link, but, subject to appropriate legal authority, a mechanism exists whereby they can be linked.
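One way in which information technology could support multiple nyms is to derive a separate identifier for each relationship from a secret that only the person (or a trusted party holding it in escrow) possesses. The sketch assumes HMAC-SHA256 from Python's standard library as the derivation function and a hypothetical master secret. Each organisation sees a stable identifier, but identifiers given to different organisations cannot be linked without the secret.

```python
import hashlib
import hmac


def nym_for(master_secret: bytes, organisation: str) -> str:
    """Derive a stable per-organisation identifier from the person's (hypothetical) master secret."""
    return hmac.new(master_secret, organisation.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def same_person(master_secret: bytes, nym_a: str, org_a: str, nym_b: str, org_b: str) -> bool:
    """Linking is possible only for a party holding the master secret (e.g. under legal authority)."""
    return (hmac.compare_digest(nym_for(master_secret, org_a), nym_a)
            and hmac.compare_digest(nym_for(master_secret, org_b), nym_b))


secret = b"hypothetical-master-secret"
bank_nym = nym_for(secret, "Bank")
clinic_nym = nym_for(secret, "Clinic")
print(bank_nym != clinic_nym)                                        # True
print(same_person(secret, bank_nym, "Bank", clinic_nym, "Clinic"))  # True, but only with the secret
```

Passing a role-qualified label such as `"Bank/customer"` or `"Bank/debtor"` instead of an organisation name would extend the same idea to multiple relationships with a single organisation.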
In addition, a person may be able to establish multiple relationships with the same organisation, with a separate digital persona for each relationship. This may be to reflect the various roles the person plays when interacting with that organisation (e.g. contractor, beneficiary, customer, lobbyist, debtor, creditor). Alternatively, it may merely be to set at rest the minds of people who are highly nervous about the power of organisations to bring pressure to bear on them.
In the new contexts of highly data-intensive relationships, and Internet-mediated communications, pseudonymity and multiple digital personae are especially important facets of human identification and information privacy.
A list of surveys of public opinion in North America and Australia is in Clarke (1998).
Australian public opinion is discussed in Clarke (1996) and Clarke (1997).
Governments throughout the world have recognised that a problem exists, and a great deal of legislation has been passed since the first statute in 1970. Background is provided in Clarke (1998).
Technology is bringing with it ever more challenges, and privacy advocates worldwide are actively seeking to sustain and extend privacy protections, in order to cope with the new intrusions that these technologies make possible.
Australia was an early participant in, and contributor to, discussions about privacy. Australian Parliaments, on the other hand, have been among the tardiest in the world to establish legislative protections.
The recent history, and the present situation, are documented in references provided under my Current Awareness section.
The only means whereby people can sustain privacy protections, and with them their humanity, is through 'eternal vigilance' and hard-nosed lobbying.
A set of reasonably accessible materials is provided in Clarke (1995).
For more advanced materials, see the Reference List below.
HREOC (1992-) 'Federal Privacy Handbook: A Guide to Federal Privacy Law and Practice' Redfern Legal Centre Publishing Ltd, 13A Meagher St, Chippendale NSW 2008 (a loose-leaf, periodically updated compendium of privacy-related Statutes, Regulations, Guidelines, Determinations, Codes of Conduct and Compliance Notes)
HREOC (1995a) 'Community Attitudes to Privacy', Information Paper No. 3, Human Rights Australia - Privacy Commissioner, Sydney (August 1995)
Hughes G. (1991) 'Data Protection Law in Australia' Law Book Company, 1991
Morison W.L. (1973) 'Report on the Law of Privacy' Government Printer, Sydney, 1973
NSW Health (1996) 'Information Privacy: Code of Practice' N.S.W. Health Commission, May 1996
PCA (1998) 'National Principles for the Fair Handling of Personal Information' Office of the Privacy Commissioner, Australia, February 1998. Also at http://www.hreoc.gov.au/hreoc/privacy/natprinc.htm
Tucker G. (1992) 'Information Privacy Law in Australia' Longman Cheshire, Melbourne, 1992
A vast array of materials relating to privacy and dataveillance is provided at http://www.rogerclarke.com/DV/
The archives of the Privacy Law and Policy Report, at http://lexsun.law.uts.edu.au/~graham/PLPR_guide.html
The Privacy Law & Policy Reporter's Australian Privacy Guide, at http://www2.austlii.edu.au/~graham/PLPR_australian_guide.html
The Privacy Law & Policy Reporter's Worldwide Guide to Privacy Resources, at http://www2.austlii.edu.au/~graham/PLPR_world_wide_guide.html
The Commonwealth Privacy Act 1988, at http://www.austlii.edu.au/au/legis/cth/num_act/pa1988108
In addition, here are:
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
The content and infrastructure for these community service pages are provided by Roger Clarke through his consultancy company, Xamax.
From the site's beginnings in August 1994 until February 2009, the infrastructure was provided by the Australian National University. During that time, the site accumulated close to 30 million hits. It passed 50 million in early 2015.
Sponsored by Bunhybee Grasslands, the extended Clarke Family, Knights of the Spatchcock and their drummer
Xamax Consultancy Pty Ltd
ACN: 002 360 456
78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916
Created: 15 August 1997 - Last Amended: 24 July 2016 by Roger Clarke - Site Last Verified: 15 February 2009