Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2018
Roger Clarke
Revised Draft of 9 August 2008
© Xamax Consultancy Pty Ltd, 2008
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/DV/IdTerm.html
A new journal that focusses on a specific area faces a particular challenge in relation to the language that its contributors use. On the one hand, it is essential that there be sufficient commonality of understanding among its authors and readership for dialogue to take place. On the other hand, sufficient diversity of perspective must be encompassed to enable progress to be made.
This paper offers a set of working definitions of key terms that underpin studies of identity in the information society. They are offered not as authoritative interpretations, but as a baseline against which refinements, variations and alternative interpretations can be compared.
The definitions are instrumentalist in their origins and purpose, drawing on analyses conducted over two decades, in both research and consultancy contexts. They adopt the conventional ontological presumptions: that there is a real world of things; and that there is an abstract world of data that is created, stored, transmitted, used and disclosed by means of manual procedures and automated processes that utilise various kinds of information technology.
The definitions have been developed through a long series of publications, most importantly those listed in the sources below. They are summarised in the glossary in Clarke (2004a) with a version containing longer descriptions in Clarke (2004b). A few meta-comments are provided in italics. They relate to the origins of the less commonly-used terms and alternatives to them.
Figure 1 provides a diagrammatic representation of the key concepts, which is intended to illustrate and summarise the text below.
[Figure 1: diagram of the key concepts. First published in Clarke (2003).]
Entity. An entity is a real-world thing. The notion encompasses pallets piled with cartons, the cartons, and each item that they contain, plus artefacts such as computers and mobile phones, and animals and human beings.
Identity. An identity is also a real-world thing, but is of virtual rather than physical form. Some kinds of entity may present many identities. For example, identities may correspond to the multiple processes that are running in a computing device, or the particular SIM-card currently inserted into a particular mobile phone. A person may also present many identities, to different people, and in different contexts. Each identity is commonly a presentation or role of an underlying entity.
During recent decades, organisations have co-opted the term 'identity' to refer to something that they create and that exists in machine-readable storage. Better terms exist to describe that notion (such as 'digital persona', discussed below). The term 'identity' is widely used in everyday language to refer to a real-world phenomenon evidenced by human beings, and it is important that observers respect that usage rather than co-opting the term for other purposes.
Attributes. Both entities and identities have attributes, or characteristics. For example, human entities have characteristics such as hair colour and expertise; whereas an identity such as 'associate editor of this journal' has attributes like 'domain of responsibility'. Attributes, like the things they are associated with, exist in the real world.
Records and Data-Items. In the abstract world of information systems, each entity and identity may be represented by a record that contains data. Each record relates to a particular instance of the general category of entity (e.g. computers, or human beings) or of identity (e.g. processes running in a computer, or roles played by a human being). The attributes of the real-world entity or identity are represented by the content of data-items stored in the relevant record.
Each (id)entity may give rise to associated records in multiple data collections, but each record is intended to relate to just one (id)entity. Hence the cardinality is shown as a 1:n relationship, by which is meant that each (id)entity is associated with 'n' (i.e. 1 or more) records.
The relationship between entities and identities is more complex, and is consequently shown as an m:n relationship. Firstly, each entity may have multiple identities (e.g. a person may play multiple roles, and a mobile-phone may contain multiple SIM-cards, at least at different times and in some cases even at the same time). In the diagram, that is represented by the 'n' at the end of the arrow.
In addition, each of the identities may be used by multiple entities, and hence the other end of the arrow is marked with an 'm'. For example, the identity 'associate editor' is adopted by multiple people, both in parallel and in succession. Another example is 'Googlebot', which is an identity adopted by many devices controlled by Google Inc. that trawl the Web collecting web-page content.
The ambiguity in the relationship between entities and identities may be intended and well-known. Alternatively, a sentient entity might want to be the only user of a particular identity, or a record-keeper may want a particular identity to be used only by a specific entity. It may not be feasible, however, to prevent use of identities by other parties. Such use is described by terms such as impersonation, masquerade, identity fraud and identity theft.
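The cardinalities described above can be sketched as a small data model. The following Python is a minimal illustration only: the class and method names (`Entity`, `Identity`, `Registry`, `link`, `add_record`) are assumptions introduced here and do not come from the paper or from any particular system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A real-world thing, e.g. a person or a device (hypothetical model)."""
    label: str


@dataclass(frozen=True)
class Identity:
    """A presentation or role of one or more underlying entities."""
    label: str


@dataclass
class Registry:
    """Holds the m:n entity-identity links and the 1:n (id)entity-record links."""
    links: set = field(default_factory=set)      # (Entity, Identity) pairs
    records: dict = field(default_factory=dict)  # (id)entity -> list of records

    def link(self, entity, identity):
        # An entity may present many identities, and an identity
        # may be used by many entities, so links are stored as pairs.
        self.links.add((entity, identity))

    def add_record(self, subject, data):
        # Each record relates to exactly one (id)entity,
        # but one (id)entity may have many records.
        self.records.setdefault(subject, []).append(dict(data))

    def identities_of(self, entity):
        return {i for e, i in self.links if e == entity}

    def entities_using(self, identity):
        return {e for e, i in self.links if i == identity}


registry = Registry()
alice, bob = Entity("Alice"), Entity("Bob")
editor, author = Identity("associate editor"), Identity("author")
registry.link(alice, editor)
registry.link(alice, author)
registry.link(bob, editor)
registry.add_record(editor, {"domain of responsibility": "privacy"})
registry.add_record(editor, {"domain of responsibility": "security"})
```

In this sketch, `registry.identities_of(alice)` returns both identities and `registry.entities_using(editor)` returns both people, which mirrors the 'n' and 'm' ends of the arrow respectively.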
Digital Persona. The collection of data stored in a record is designed to be rich enough to provide the organisation with an adequate image of the represented entity or identity. The organisation might use the term 'identity' to refer to that image; but to avoid ambiguity it is far preferable that some other term be used. A candidate term is digital persona. Although this was my own coinage, first published in Clarke (1994a), it is in any case an intuitive term and has gained some degree of currency. Another candidate term is e-persona. The term 'partial' (which originated in the sci-fi genre) is also a contender, because it underlines the inherent incompleteness of a digital persona in comparison with the real-world entity or identity it represents.
Authorisation. In addition to data-items that represent attributes of the real-world identity or entity, a record may contain data generated by the record-keeper. An important example is authorisations (in some contexts referred to as permissions or privileges) that are associated with an (id)entity. Each associate editor is delegated specific responsibilities and authority by the editor. An employee will generally not be authorised to approve his or her workmates' sick leave forms unless their usual manager is absent and the person is 'acting up' as his or her workmates' supervisor. Similarly, in the virtual world, a person's access to computer applications and databases depends on the identity or role that they are performing.
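The sick-leave example can be expressed as a lookup of authorisations recorded against an identity (role) rather than against the person. This is a minimal sketch under stated assumptions: the role names, action names and the `is_authorised` function are hypothetical and chosen only to mirror the example in the text.

```python
# Hypothetical authorisations, recorded by the record-keeper against
# identities (roles), not against the underlying person.
AUTHORISATIONS = {
    "employee": {"submit_sick_leave"},
    "acting_supervisor": {"submit_sick_leave", "approve_sick_leave"},
}


def is_authorised(identity, action):
    """Return True if the given identity holds the authorisation for the action."""
    return action in AUTHORISATIONS.get(identity, set())
```

The same person would be refused approval rights when presenting the identity `"employee"` and granted them when presenting `"acting_supervisor"`, which is the point the paragraph makes about access depending on the identity or role being performed.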
Nymity. The term nymity usefully encompasses both anonymity and pseudonymity. The term anonymity refers to a characteristic of an identity and the records associated with it, whereby they cannot be associated with any particular entity, whether from the data itself, or by combining it with other data. The term pseudonymity refers to a similar but materially different characteristic of an identity and the records associated with it. In this case, the records and identity may be able to be associated with a particular entity, but only if legal, organisational and technical constraints are overcome. In the diagram, nymity is signified by the interference with the arrow linking the entity with the nymous identity.
Identifier. An identity can be distinguished from others in the same category by means of some sub-set of its attributes. A data-item or items that represent such attributes is called an identifier.
One example of an identifier is the particular name or name-variant that a person commonly uses in a particular context (such as with friends, with family, and when working in a customer-facing role such as a telephone help-desk). For operators of information systems, names are inconvenient identifiers. Categories of identifier that are more convenient for them include an organisation-imposed alphanumeric code or a username (in the case of a human identity); an International Mobile Subscriber Identity or IMSI (in the case of the identity of the SIM-card currently in a particular GSM mobile phone); and a process-id (e.g. for a software agent).
Entifier. An identifier is associated with an identity, and not directly with the underlying entity, e.g. not directly with a person, a mobile-phone, or a computer. In order to distinguish an entity from others of the same category, a separate term is needed to refer to a suitable sub-set of the entity's attributes. An appropriate term is entifier. The term entifier has been used consistently in my works since 2001, but is not yet widely adopted. It has the advantages of being obvious and being otherwise unused. Contemporary approaches to 'identity management' suffer important deficiencies (discussed in Clarke (2008)) that will not be overcome until the concept is better appreciated, and a commonly-used name arises for it.
Examples of entifiers that distinguish an artefact from others of the same category include a computer's processor-id (or the identifier of its network interface card / NICId) and the International Mobile Equipment Identity (IMEI) which distinguishes each mobile-phone (as distinct from the subscriber module such as a SIM-card which is currently inserted in it).
In the case of human beings, distinguishing one entity from another can be achieved by means of biometrics. A biometric is a measure of some aspect of the physical person that is unique (or is claimed, or assumed, to be so). A further possible entifier for a human is what is usefully referred to as an 'imposed biometric' such as a brand, an RFID tag fastened to the person, or an implanted chip.
Identification. Identification refers to the process whereby data is associated with a particular identity. This is achieved by acquiring an identifier for it, such as a person's name, or a cargo container's unique number. This application of the term is consistent with dictionary definitions, and has been used in my works since Clarke (1994b). The term has many other, loose usages, however, particularly as a synonym for 'identifier' (discussed above) or for 'token' (discussed below). It is incumbent on analysts and authors of formal works to avoid such ambiguities.
Entification. The association of data with a particular entity depends on the acquisition of an entifier such as a processor-id or a human biometric. This is usefully described as entification. This term has been used in my work since 2001, but to date neither it nor any equivalent has become mainstream. The emergence of some such term is important, because there are material differences between identification and entification, firstly conceptually, secondly in terms of the data involved, and thirdly in relation to their impacts and implications.
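The difference between identification and entification can be illustrated with the mobile-phone example used earlier: the IMSI is an identifier for the SIM-card identity, while the IMEI is an entifier for the handset. The sketch below is illustrative only; the record stores, the key values and the function names `identify` and `entify` are invented for this example.

```python
# Hypothetical record stores; all keys and values are invented for illustration.
RECORDS_BY_IDENTIFIER = {"imsi-0001": {"subscriber": "account 17"}}  # SIM identity, keyed by IMSI
RECORDS_BY_ENTIFIER = {"imei-AAAA": {"handset": "device 3"}}         # handset entity, keyed by IMEI


def identify(identifier):
    """Identification: associate data with an identity via its identifier."""
    return RECORDS_BY_IDENTIFIER.get(identifier)


def entify(entifier):
    """Entification: associate data with an entity via its entifier."""
    return RECORDS_BY_ENTIFIER.get(entifier)


# Moving the SIM-card to another handset changes the entifier that is observed,
# but the identifier, and hence the identity's record, stays the same.
observation_before = {"imsi": "imsi-0001", "imei": "imei-AAAA"}
observation_after = {"imsi": "imsi-0001", "imei": "imei-BBBB"}
```

Calling `identify` on either observation's IMSI yields the same record, whereas `entify` on the second observation's IMEI finds nothing, because a different entity is now presenting the same identity.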
Token. (Id)entification procedures need to be reliable and inexpensive. This can be facilitated by pre-recording an (id)entifier on a token. One common form of token is a card, with the data stored in a physical form such as embossing, or on, or in, a recording-medium such as a magnetic stripe or a silicon chip.
Nym. Several categories of identifiers can be distinguished, depending on whether or not they can be associated with the underlying entifier. The term pseudonym refers to a circumstance in which the association between the identifier and the underlying entity is not known, but in principle at least could be known, e.g. if access could be gained to data that is normally protected (such as an index of telephone subscribers). If an identifier cannot be linked to an entity at all, then it is appropriately described as an anonym. The term nym usefully encompasses both pseudonyms and anonyms.
The term 'pseudonym' is widely used, and has a large number of synonyms. In contrast, only a small number of authors have used the term 'nym', although it is readily traceable back prior to 1997. Even fewer have used the term 'anonym', but it is far from unknown and I have used it consistently in my work since 2003. It is important to have a term such as 'anonym' available, because it is entirely feasible to conduct persistent communication with an identity whose underlying entity or entities are, and will remain, unknown. A celebrated example is the whistleblower known as 'Deep Throat' who brought US President Nixon undone.
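The distinction between anonyms and pseudonyms turns on whether the link from identifier to entity could, in principle, be established. A minimal sketch of that classification follows; the `Linkage` enumeration and the `nym_category` function are hypothetical names introduced here, and the three-way split is a simplification of the definitions above.

```python
from enum import Enum


class Linkage(Enum):
    """How an identifier relates to its underlying entity (simplified assumption)."""
    KNOWN = "known"          # the association is openly available
    PROTECTED = "protected"  # the association exists but is normally inaccessible
    NONE = "none"            # no association can be made at all


def nym_category(linkage):
    """Return 'anonym', 'pseudonym', or None if the identifier is not a nym."""
    if linkage is Linkage.NONE:
        return "anonym"
    if linkage is Linkage.PROTECTED:
        return "pseudonym"
    return None


def is_nym(linkage):
    """A nym is either a pseudonym or an anonym."""
    return nym_category(linkage) is not None
```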
Data Silo. As indicated by the cardinality markers in Figure 1, a real-world (id)entity may have multiple records associated with it, by means of multiple (id)entifiers. Each set of records may be a 'data silo', separate from the others. In particular, records about an individual that are held by different government agencies and different corporations are maintained separately from one another, in many cases as a legal requirement. During recent years, this phenomenon has often been discussed as an impediment to quality of service, and even more so to efficiency in business and government. These justifications have been used for the breaking down of data silos through the correlation, matching, consolidation or merger of multiple sets of records. This has undermined the longstanding privacy-protective aspect of data silos.
Identity Silo. When data silos are destroyed, the correlation, matching, consolidation or merger is undertaken on the basis of one or more identifiers, such as name and date of birth, or commonly occurring identifying codes. An identity and identifier that is used for a restricted purpose is usefully referred to as an 'identity silo'. A multi-purpose identifier is expressly intended to enable the conflation of identities, usually within some cluster of related functions such as taxation, health insurance and superannuation / national insurance / self-funded pensions. A general-purpose identifier, such as the national identity number that is imposed on the residents of some countries, is intended to enable the merger of all 'partials', deny the right to nymity, and thereby provide organisations with greater power over people. The term 'identity silo' is my own coinage, in consistent use since 2006. It is a natural extension of the established data silo notion, but has not at this stage come into common usage.
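The way in which shared identifiers enable the merger of data silos can be sketched as a simple record-matching routine. This is an illustration under stated assumptions, not a description of any real matching scheme: the function `match_records`, the field names and the sample records are all hypothetical.

```python
def match_records(silo_a, silo_b, keys):
    """Merge pairs of records, one from each silo, that agree on every key in `keys`.

    Records lacking any of the keys cannot be matched and are skipped.
    """
    index = {}
    for rec in silo_a:
        index.setdefault(tuple(rec.get(k) for k in keys), []).append(rec)

    merged = []
    for rec in silo_b:
        key = tuple(rec.get(k) for k in keys)
        if None in key:
            continue  # no common identifier, so the silos stay separate
        for other in index.get(key, []):
            merged.append({**other, **rec})
    return merged


# Invented sample data: two silos, each with its own restricted-purpose code.
tax_silo = [{"name": "A. Citizen", "dob": "1970-01-01", "tax_code": "T1"}]
health_silo = [{"name": "A. Citizen", "dob": "1970-01-01", "health_code": "H9"}]

# Matching on name and date of birth breaks down the silos.
combined = match_records(tax_silo, health_silo, ("name", "dob"))

# Matching on a code that exists in only one silo finds nothing.
separate = match_records(tax_silo, health_silo, ("tax_code",))
```

In this sketch the restricted-purpose codes keep the two sets of records apart, while any identifier common to both silos is sufficient to conflate them, which is why multi-purpose and general-purpose identifiers undermine the protective effect described above.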
Authentication. The term authentication refers to a process that establishes a level of confidence in an assertion. The degree of confidence achieved in the assertion is determined by the quality or strength of the authentication process. The term 'verification' is sometimes used as a synonym for authentication.
Assertion. There are many different categories of assertion that may be important in particular contexts. They include assertions of fact, assertions relating to value, assertions that a particular identity or entity has a particular attribute, assertions that an entity is in a particular location, and assertions that an actor has the capacity to represent, or act as an agent for, a principal.
Authenticator. The way in which authentication is performed is by cross-checking the assertion against one or more authenticators, or items of evidence. For example, an assertion of value may be checked by examining the characteristics of the banknote that is being offered, or by comparing a newly-executed written signature with one previously executed by the presenter of the cheque or card; and an assertion that a person qualifies for a trade discount at a retail outlet may be authenticated against evidence of a trade qualification or a company letterhead.
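Cross-checking an assertion against several authenticators can be sketched as follows. The scoring rule used here (the fraction of authenticators that support the assertion) is purely an assumption for illustration; the text does not prescribe how a level of confidence is calculated, and the names `Authenticator` and `authenticate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Authenticator:
    """An item of evidence that can be checked against an assertion."""
    description: str
    check: Callable[[dict], bool]  # True if the evidence supports the assertion


def authenticate(assertion, authenticators):
    """Return the fraction of authenticators that support the assertion.

    This ratio is an illustrative stand-in for a 'level of confidence'.
    """
    if not authenticators:
        return 0.0
    passed = sum(1 for a in authenticators if a.check(assertion))
    return passed / len(authenticators)


# Invented example: a claim to a trade discount, with two items of evidence.
claim = {"assertion": "qualifies for trade discount",
         "trade_qualification_sighted": True,
         "company_letterhead_sighted": False}

evidence = [
    Authenticator("trade qualification", lambda a: a["trade_qualification_sighted"]),
    Authenticator("company letterhead", lambda a: a["company_letterhead_sighted"]),
]

confidence = authenticate(claim, evidence)
```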
Identity Authentication. Where the assertion is that a particular identity is performing a particular act, then the appropriate term to describe the process whereby a level of confidence is achieved in the assertion is identity authentication. Identity authentication is quite distinct from identification, which was described above as the process whereby data is associated with a particular identity, by acquiring an identifier for it. The alternative term identity verification (often just 'verification') is much-used in industry as a synonym for identity authentication. This term is misleading, however, because it implies a very high level of confidence ('verity'). Strong authentication of identity is very challenging and expensive for whoever is doing the authentication, and onerous on, and even demeaning of, the person on whom it is imposed.
Evidence of Identity (EOI). The authenticators used in the context of identity authentication are commonly referred to as evidence of identity (EOI). An alternative term that is in common use is proof of identity (POI). That term is misleading, because it implies a level of reliability that is generally unattainable.
Computer and telephony network operators design into their schemes means to authenticate computer process-ids and mobile-phone SIM-card identifiers. In the case of human identities, several forms of EOI are used. They include what the person knows (such as a password or PIN), what the person does (such as the act of providing a written signature), and what the person has (such as credentials with physical existence, including tokens and documents; or with digital existence, such as the ability to generate a digital signature using a particular cryptographic key).
Entity Authentication. Where the relevant assertion is that a particular act is being performed by a particular entity, the term entity authentication needs to be applied. This is quite distinct from entification (defined earlier as the acquisition of an entifier).
To conduct entity authentication for an artefact, a test needs to be conducted of the claim that the device is properly distinguished by means of a relevant entifier (such as the processor-id or mobile-phone-id). To conduct entity authentication for a human, it is necessary to collect a measure of what the person is (a biometric), or what the person is now (i.e. an imposed biometric), and then compare the measure against some previously collected and stored measure of the same thing. Tokens such as cards can be designed to assist in entity authentication.
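The comparison of a fresh measure against a previously stored measure can be sketched as a distance test. This is a deliberately simplified illustration: representing a biometric as a list of numbers, using Euclidean distance, and the function name `matches_reference` are all assumptions made here, and real biometric schemes use their own matching techniques.

```python
def matches_reference(measure, reference, tolerance):
    """Return True if a fresh measure is within `tolerance` of the stored reference.

    Both measures are hypothetical feature vectors (sequences of numbers).
    Because two measurements of the same thing rarely agree exactly,
    the comparison accepts any measure that is close enough.
    """
    if len(measure) != len(reference):
        raise ValueError("measures must have the same number of features")
    distance = sum((m - r) ** 2 for m, r in zip(measure, reference)) ** 0.5
    return distance <= tolerance
```

The choice of tolerance determines the trade-off between wrongly accepting a different entity and wrongly rejecting the right one, which is one reason why the resulting level of confidence is never absolute.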
It is common for 'what the person is' to be treated as being a form of identity authenticator rather than entity authenticator. This is erroneous, and harmfully so. It was noted above that authentication of human identities is challenging, expensive, onerous and even demeaning. Authentication of human entities is substantially more so, suffers serious security vulnerabilities, and is highly personally intrusive and degrading.
Clarke R. (1988) 'Information Technology and Dataveillance' Commun. ACM 31,5 (May 1988) 498-512, at http://www.rogerclarke.com/DV/CACM88.html
Clarke R. (1994a) 'The Digital Persona and its Application to Data Surveillance' The Information Society 10,2 (June 1994), at http://www.rogerclarke.com/DV/DigPersona.html
Clarke R. (1994b) 'Human Identification in Information Systems: Management Challenges and Public Policy Issues' Info. Technology & People 7,4 (December 1994), at http://www.rogerclarke.com/DV/HumanID.html
Clarke R. (1999a) 'Anonymous, Pseudonymous and Identified Transactions: The Spectrum of Choice', Proc. IFIP User Identification & Privacy Protection Conference, Stockholm, June 1999, at http://www.rogerclarke.com/DV/UIPP99.html
Clarke R. (1999b) 'Person-Location and Person-Tracking: Technologies, Risks and Policy Implications' Proc. 21st International Conf. Privacy and Personal Data Protection, Hong Kong, September 1999. Revised version published in Info. Technology & People 14,1 (2001) 206-231, at http://www.rogerclarke.com/DV/PLT.html
Clarke R. (2001a) 'Biometrics and Privacy' Xamax Consultancy Pty Ltd, April 2001, at http://www.rogerclarke.com/DV/Biometrics.html
Clarke R. (2001b) 'The Fundamental Inadequacies of Conventional Public Key Infrastructure' Proc. Conf. ECIS'2001, Bled, Slovenia, 27-29 June 2001, at http://www.rogerclarke.com/II/ECIS2001.html
Clarke R. (2001c) 'Authentication: A Sufficiently Rich Model to Enable e-Business' Xamax Consultancy Pty Ltd, December 2001, at http://www.rogerclarke.com/EC/AuthModel.html
Clarke R. (2003) 'Authentication Re-visited: How Public Key Infrastructure Could Yet Prosper' Proc. 16th Int'l eCommerce Conf., at Bled, Slovenia, 9-11 June 2003, at http://www.rogerclarke.com/EC/Bled03.html
Clarke R. (2004a) 'Identification and Authentication Glossary' Xamax Consultancy Pty Ltd, extract from pp. 57-65 of 'Identity Management: The Technologies, Their Business Value, Their Problems, and Their Prospects' Xamax Consultancy Pty Ltd, March 2004, at http://www.rogerclarke.com/EC/IdAuthGloss.html
Clarke R. (2004b) 'Identification and Authentication Fundamentals' Xamax Consultancy Pty Ltd, May 2004, at http://www.rogerclarke.com/DV/IdAuthFundas.html
Clarke R. (2006) 'National Identity Schemes - The Elements' Xamax Consultancy Pty Ltd, February 2006, at http://www.rogerclarke.com/DV/NatIDSchemeElms.html
Clarke R. (2008) '(Id)Entities (Mis)Management: The Mythologies underlying the Business Failures' Xamax Consultancy Pty Ltd, Prepared for an Invited Keynote at 'Managing Identity in New Zealand', Wellington NZ, 29-30 April 2008, at http://www.rogerclarke.com/EC/IdMngt-0804.html
The successive versions of this suite of inter-related definitions have benefited greatly from many interactions with my colleagues at ETC, now Convergence eBusiness Solutions Pty Ltd - David Jonas, Ian Christofis, Ross Oakley and Kevin Jeffery. Valuable feedback was also provided by participants in the many seminars at which the definitions and related models and analyses have been presented. Comments on advanced drafts by reviewers, David Vaile and Jill Matthews were valuable in clarifying the presentation. Responsibility of course lies with the author alone.
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
Created: 30 March 2008 - Last Amended: 9 August 2008 by Roger Clarke - Site Last Verified: 15 February 2009