Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2021
Emergent Draft of 17 June 2021
Roger Clarke **
© Xamax Consultancy Pty Ltd, 2021
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://rogerclarke.com/ID/IDM-PM.html
This is the 2nd article in a series that presents and articulates a model of entities and identities that supports the design of effective information systems. Each article is designed to be read as a standalone work, but the articles are likely to be more fully appreciated if read in sequence. The series overview is at http://rogerclarke.com/ID/IDM-O.html.
There have been longstanding and ongoing difficulties in the area of identification and authentication, particularly where the entities in question are human beings. Suitable designs depend on designers understanding the nature of the phenomena that they seek to document and to exercise control over. That requires a model of those phenomena that is pragmatic, in the sense of fitting the needs of IS practitioners, but that also reflects insights from relevant aspects of philosophy.
This paper presents such a model. It commences by drawing on ontology, epistemology and axiology in order to establish an outline metatheoretic model. The model is articulated, firstly at the conceptual level and then at the data modelling level. Initially, a relatively simple model is established, sufficient for inanimate objects and artefacts. The more complex requirements of humans, organisations and active artefacts are then addressed. It is contended that the resulting model provides a robust framework for identification and authentication in IS.
[ REVISIT THIS LATER: ]
IS is concerned with systems that handle data in a variety of ways. Distinctions are made between data processing (DP) systems, information systems (IS), decision support systems (DSS), and systems that act directly on the Real-World, using data handled in the Abstract-World and the models that underlie Abstract-World systems.
Further important concepts are then discussed that underpin the models that enable IS to function. Real-world phenomena involve Entities that fall into different categories (such as the class of objects commonly called shipping containers) and Entity-Instances (in this case, particular shipping containers). Individual Entity-Instances are distinguished by particular Data-Items, which are usefully referred to as Entifiers.
An Entity-Instance is distinguished from other Entity-Instances by Entifiers such as the registration-number painted on the container. (If no Entifier exists, the Entity is an undifferentiated bulk commodity, such as a category of oil, a quality-level of coal, wheat or barley.) Entification involves the association of Data with an Entity-Instance.
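The Entity / Entity-Instance / Entifier distinction can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions, not the author's model: the class names, the dictionary keyed by Entifier, and the registration-numbers (`CONT-0001` and so on) are all hypothetical. The point it shows is that an Entifier is useful only if it distinguishes one Entity-Instance from every other, whereas a bulk commodity has no Entifier and can be recorded only as an aggregate quantity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A category of Real-World Things, e.g. shipping containers (hypothetical sketch)."""
    name: str
    # Entifier -> Attribute-values of the Entity-Instance it distinguishes
    instances: dict = field(default_factory=dict)

    def add_instance(self, entifier: str, **attributes) -> str:
        # An Entifier must distinguish this Entity-Instance from all others,
        # so a second Entity-Instance with the same Entifier is rejected.
        if entifier in self.instances:
            raise ValueError(f"Entifier {entifier!r} is already in use")
        self.instances[entifier] = dict(attributes)
        return entifier


@dataclass
class BulkCommodity:
    """No Entifier exists, so only an aggregate quantity can be recorded."""
    name: str
    tonnes: float = 0.0


containers = Entity("shipping container")
containers.add_instance("CONT-0001", colour="blue")   # hypothetical registration-numbers
containers.add_instance("CONT-0002", colour="red")

wheat = BulkCommodity("wheat, grade 1")
wheat.tonnes += 250.0   # individual grains are not differentiated
```

Rejecting a duplicate Entifier is the sketch's stand-in for the requirement that Entification be unambiguous.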
An Entity may adopt various Identities. For example, a particular Entity-Instance of a shipping container may be associated with a slot on a ship (or on a truck or on a container flat-wagon for railway-haulage) or in a container-depot, with the cargo that it currently contains, with the lock currently on the door, or with the refrigeration-unit installed in it. The process of Identification involves the association of Data with an Identity-Instance.
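Because one Entity-Instance may be associated with several Identity-Instances at once, a data structure for it needs a one-to-many link rather than a single identity field. The following Python sketch is illustrative only; the role names and labels (slot position, Bill of Lading number, seal number) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityInstance:
    role: str    # the Role in which the Entity-Instance is presented
    label: str   # how it is known in that Role


@dataclass
class EntityInstance:
    entifier: str
    identities: list = field(default_factory=list)   # one-to-many, not a single field

    def adopt(self, role: str, label: str) -> IdentityInstance:
        identity = IdentityInstance(role, label)
        self.identities.append(identity)
        return identity

    def identities_in(self, role: str) -> list:
        return [i for i in self.identities if i.role == role]


box = EntityInstance("CONT-0001")
box.adopt("ship slot", "Bay 12, Row 04, Tier 82")
box.adopt("cargo carrier", "Bill of Lading BL-778")
box.adopt("sealed unit", "Seal S-55102")
```

With this structure, removing the 'ship slot' Identity-Instance when the container is unloaded leaves the Entity-Instance and its other Identity-Instances untouched.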
All of these concepts are applicable to (Id)Entities of all kinds, including inanimate objects, and living objects such as animals, including people. Their use in relation to inanimate objects and even animals is subject to few constraints. Mechanistic application to human beings is, on the other hand, fraught with difficulties, because human rights intrude. In the context of Data about humans, a number of further concepts are therefore relevant.
Of particular significance is Nymity, which arises where a particular Identity cannot be reliably associated with a particular Entity (Anonymity), or association can only be achieved if particular conditions are satisfied (Pseudonymity). The concept of a Nym encompasses both Pseudonyms and Anonyms.
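The distinction between Anonymity and Pseudonymity can be made concrete with a hypothetical custodian that holds the only link between a Nym and an Entity. In this minimal Python sketch (all names are illustrative assumptions), a Pseudonym can be re-associated with its Entity only when a stated condition, such as the production of a warrant, is satisfied, while an Anonym has no recorded link at all. It deliberately ignores the further risk that a Nym is re-identified by matching it with other Data, which can undermine both forms in practice.

```python
from typing import Optional


class NymCustodian:
    """Hypothetical custodian holding Pseudonym -> Entifier links."""

    def __init__(self) -> None:
        self._links: dict = {}   # not visible to ordinary users of the Nym

    def issue_pseudonym(self, pseudonym: str, entifier: str) -> str:
        self._links[pseudonym] = entifier
        return pseudonym

    def reveal(self, nym: str, condition_satisfied: bool) -> Optional[str]:
        # Pseudonym: the association is achievable, but only under conditions.
        if not condition_satisfied:
            return None
        # Anonym: no link was ever recorded, so nothing can be revealed.
        return self._links.get(nym)


custodian = NymCustodian()
custodian.issue_pseudonym("reader-42", "PERSON-0007")
```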
The term Digital Persona refers to a Data Record that is sufficiently rich to give someone who has access to the record an impression of the represented Entity or Identity, such that the record can be used in the Abstract-World as a proxy for the Real-World (Id)Entity.
Authentication is a process that establishes a level of confidence in an Assertion.
Assertions are of many kinds, including Value Assertions (that this string of binary digits represents a bitcoin, a millibitcoin or a satoshi), and Factual Assertions (that an event occurred, such as a solar flare or a vehicle passing under a tollway gantry).
Of particular significance are Attribute Assertions (that a data-item represents the state of a particular Property of a particular (Id)Entity-Instance at a particular time, e.g. loadedness or otherwise of a container, or operational-readiness of the refrigeration-unit currently installed in it). A special case relevant to people and organisations is a Principal-and-Agent assertion (that a particular (Id)Entity-Instance has the legal authority to act on behalf of another (Id)Entity-Instance).
Another assertion-type of importance is that a particular Real-World (Id)Entity-Instance is appropriately associated with a particular (Id)Entifier, such as a container-number, or a Bill of Lading of an item of cargo stored inside it.
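The kinds of Assertion listed above can be represented as distinct record types, which makes explicit what each kind asserts and about what. The Python sketch below is a hypothetical illustration; in particular, the three-level confidence scale is an assumption standing in for the output of whatever Authentication process is applied, and that process itself is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Union


@dataclass(frozen=True)
class ValueAssertion:           # e.g. this bit-string represents 1 satoshi
    data: bytes
    claimed_value: str


@dataclass(frozen=True)
class FactualAssertion:         # an event occurred
    event: str
    occurred_at: datetime


@dataclass(frozen=True)
class AttributeAssertion:       # a Property of an (Id)Entity-Instance at a time
    instance: str
    property_name: str
    value: str
    as_at: datetime


@dataclass(frozen=True)
class PrincipalAgentAssertion:  # the agent may act on behalf of the principal
    agent: str
    principal: str


@dataclass(frozen=True)
class EntifierAssertion:        # this Real-World (Id)Entity-Instance has this (Id)Entifier
    observed_instance: str
    claimed_entifier: str


Assertion = Union[ValueAssertion, FactualAssertion, AttributeAssertion,
                  PrincipalAgentAssertion, EntifierAssertion]


class Confidence(Enum):         # hypothetical ordinal scale
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class AuthenticationResult:
    assertion: Assertion
    confidence: Confidence


loaded = AttributeAssertion("CONT-0001", "loadedness", "loaded",
                            datetime(2021, 6, 17, 9, 30))
result = AuthenticationResult(loaded, Confidence.MEDIUM)
```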
The model and the concepts it embodies are capable of being applied in a wide range of contexts. It is contended that, although some further model complexity is needed in some circumstances, this is a minimally sufficient model for many purposes, variously to evaluate the design of particular, existing information systems, or to design new ones.
One application area of particular significance is to the notions of Data Quality and Information Quality. Another is the evaluation of so-called 'identity management' schemes used by organisations, both in physical/'meatspace' contexts and in electronic/digital/virtual/'cyberspace' contexts. Particular virtual domains in which (Id)Entity Authentication is often applied, but faces considerable challenges, include eCommerce, eBusiness and eGovernment systems.
The field of 'identity management' has been fraught for decades, with the battlefield strewn with the corpses of large numbers of failed schemes, and the techniques in use scarred by deficiencies. The work reported here reflects the author's longstanding belief that the problems derive from a mis-fit between designers' conceptions of the need, on the one hand, and the complexities of the real world on the other.
The management of data relating to entities and identities is a core component of the information systems (IS) field. Six decades after the emergence of the IS discipline, its nature is somewhat contested, and it therefore appears appropriate to declare the interpretation applied in this paper. I view Information Systems (IS) research as the multi-disciplinary study of:
The practice of IS involves the design of systems to handle data and provide information to people, utilising information technologies (IT) as tools in the achievement of organisational and personal objectives. In IS practice, and practice-relevant IS research, the focus is to some extent on Real-World systems that exchange energy, but primarily on Abstract-World systems that deal in Data. Several categories of IS are distinguished. They emerged in the sequence presented below, reflecting the increases in capacity and sophistication in IS and IT across the last four decades of the 20th century.
A Transaction Data Processing (DP) System 'captures' Data, in some sense of that word, manipulates Data in ways useful to some purpose, and stores it in a manner that enables it to be discovered and accessed when needed. The notion of 'Data Capture' encompasses a number of different techniques, including:
In IS practice, an Information System (IS), used in a specific rather than a generic sense, is a set of interacting activities by humans and artefacts that performs one or more functions involving the handling of data, including data collection, creation and editing; data processing and storage; generation of Information through selection, filtering, aggregation, presentation and use; data disclosure; and data retention, archival and destruction. From a research viewpoint, the domain of IS is the study of information production, flows and use. The emphasis has been strongly on the use of data and information within organisations. In the US Business School tradition, the scope is narrow in that the beneficiary of the IS is predominantly, or even exclusively, an organisation, and the term Management Information System (MIS) is applied.
The scope of IS began with intra-organisational systems. As technologies matured and computing devices came to be used by individual people, and as the marriage of computing with communications enabled the interconnection of artefacts over distance, it progressively became feasible to operate inter-organisational systems (one-to-one), and then multi-organisational systems in various configurations. This culminated in open networks, with many systems now operating extra-organisationally, that is to say reaching beyond organisational boundaries to individuals (Clarke 1992).
There has also come to be a strong emphasis on the use of technology, often leading to narrow perspectives, misconceptions, unnecessary errors and harm, and missed opportunities. This takes its most extreme and limiting form where the focus of the discipline is narrowed down to 'the IT artefact', and all other considerations are warped by that limitation.
A Decision Support System (DSS) uses available empirical data from operational support systems, combined with hypothetical or synthetic data, to enable 'what-if' investigations, and hence support strategic rather than tactical activities. Strategic thinking re-emphasises the importance of clarity about models of the relevant current and possible future realities.
A further category for which no mature term yet exists is IS that act in the Real-World. These take advantage of the marriage of computing and communications with robotics, by including actuators that enable direct action by elements of the system on the Real-World. The significance of this category of system is that actions may be delegated to artefacts, and arise from automated decisions.
Automated decisions may be based on fixed computational approaches ('algorithms' in the proper sense of the term), or on rule-based computations that are more or less analysable and capable of being subjected to scrutiny, or on opaque and inscrutably adaptive computational approaches applied to a sample of Empirical Data (commonly referred to as artificial intelligence / machine learning - AI/ML, typically applying artificial neural network techniques - ANN). The opaqueness of these techniques has the effect of precluding human review prior to action being taken, and of undermining accountability. AI/ML techniques are often described as 'algorithmic', but are much more appropriately referred to as 'empirical'.
The author's previous endeavours during the period 1990-2010 presented an explicit model of real-world entities and identities that had considerable similarities with the implicit models used in industry and government, but also some important differences (Clarke 2001a, 2010a-d). A weakness of that model, however, was that it was not sufficiently grounded in existing theory, and hence was too easily dismissed by critics as being ad hoc. The significance of that weakness is underlined by the observation that IS that embody automated decisions and actions challenge the epistemological assumptions underlying IS practice and research, in that some aspects of information and codified knowledge may require reconsideration in the context of automated decision-making, particularly where it is of an inscrutable nature. Meanwhile, axiological assumptions are severely challenged by the apparent incapacity of such systems to embody human, social and environmental values.
This paper re-visits the problem-area, this time building on prior work in philosophy and the information systems (IS) literature. The establishment of intellectual foundations has given rise to some adjustments to the presentation of the model and the terminology used in describing it. Among the wide variety of possible philosophical assumptions, an approach is selected that reflects the pragmatic world of IS practice. This is directly relevant to that portion of IS research that seeks to deliver information relevant to IS practice. Given the recent, very strong tendency within the IS discipline towards sophistication and intellectualisation, and preference for addressing other researchers rather than IS professionals, the pragmatic metatheoretic model presented here will be relevant to only a moderate proportion of IS research.
The purpose of the model is to reflect the relevant complexities, and hence to guide organisations in devising architectures and business processes for IS that reflect real-world things and events, with a particular focus on systems in which some of the real-world things are human beings. The scope encompasses all aspects of the handling of data relating to all forms of entities and identities.
Wherever possible, the model presented here uses conventional terms in conventional ways. For each such term, it provides a definition that relates it to the remainder of the framework. Once defined, all of the key terms are thereafter referred to using an initial capital. However, many common usages of terms are ambiguous, inconsistent or unhelpful and even harmful to the effective design and operation of information systems. In these cases, terms are used, and in some cases varied or invented, in ways that are materially different from common usage. All are defined in ways that inter-relate them with other relevant terms within the model. A Glossary of the defined terms is provided.
The paper commences with an outline of the philosophical underpinnings of the analysis, comprising metatheoretic assumptions in three areas, relating to existence (ontology), knowledge (epistemology) and value (axiology). The conventional approach is adopted, with two levels, one conceptual and the other concerned with data. The two-level model is then presented. Inanimate entities are addressed first, enabling a relatively mechanistic approach to be adopted. Human entities are then considered, which brings into play interests, rights and values, necessitating some further layers of complexity in the model. Also addressed are virtual entities, organisations and active artefacts. The following paper in the series extends the model to authentication, and further papers apply the model to access control, data and information quality, and the authentication of assertions of fact.
This section establishes the philosophical foundations underlying the model put forward in the later sections of the paper. The approach developed in the first paper in this series (Clarke 2021a) is briefly re-presented and extended. The section begins by identifying the elements of a conventional ontological position, involving the nature of reality and the relationship between humans and reality. It then declares assumptions of an epistemological nature, regarding what it means when we say that humans know things about the world. The third sub-section explains the axiological position, relating to the ways in which humans make value-judgements about alternative outcomes and hence about alternative strategic decisions. Figure 1 supports the textual explanations with a visual depiction of the key elements of the model.
The pragmatic approach adopted is that there is a reality, outside the human mind, where things exist (a position commonly referred to as 'realism'). Humans cannot directly know or capture those things. They can, however, sense and measure those things and create data reflecting them, and construct an internalised model of those things (an assumption related to the ontological assumption called 'idealism').
The model in Figure 1 accordingly distinguishes a Real World from an Abstract World. The Real World comprises Things and Events, collectively Phenomena, which have Properties. These can be sensed by humans and artefacts with reliability varying across a very wide spectrum. Humans create an Abstract World in which Entities are postulated that are intended to correspond to Real-World Things, and Attributes of Entities to represent the Properties of Things. Real-World Events give rise to changes in the Properties of Things, and these are reflected in the Abstract-World as Transactions that give rise to changes in Entities' Attribute-values.
The abstract concept of an Identity, developed further below, caters for the different ways in which Entities present in different circumstances. The various kinds of Entities and Identities have Relationships with one another, represented by arrows in the depiction in Figure 1. The Relationships also have Attributes. Further discussion of these aspects of the model is provided in the following sub-sections.
In the IS field, it is necessary to adopt a flexible conception of what constitutes the Real-World. This is because some of the IS that practitioners develop, maintain and operate represent imaginary Things. Some IS model possible future IS, for example in order to assess their operational effectiveness, efficiency or security. Other IS model purely formal systems such as games-worlds. Another category of pseudo-Real-Worlds involves past, possible future, and even entirely hypothetical contexts, such as the Earth's atmosphere millions of years ago, or following a large-scale meteorite strike, or 50 years from now, with and without stringent measures to reduce greenhouse gas emissions. The IS profession and discipline need to be able to contribute to and support activities in such areas.
Epistemology is the study of knowledge. The pragmatic assumptions adopted here are that both of the two alternative categories of philosophical theories are applicable, but in different circumstances. The proposition of 'empiricism' is that knowledge is derived from sensory experience. This works well in circumstances where the Things represented by Entities are inanimate, their handling is largely mechanical, and codified knowledge exists and is readily transmissible. This can apply, for example, in the cases of aircraft guidance systems and robotic production-lines.
On the other hand, some kinds of knowledge are internal and personal. The 'apriorist' or 'rationalist' proposition is that 'tacit knowledge' exists only in the mind of a particular person, is informal and intangible, and hence is not readily communicated to others. A different form of knowledge, usefully referred to as codified knowledge, can only emerge where individuals' insights can be extracted and structured. Comprehensive propositional ('know that') knowledge may be hard to come by, variously because of unstable phenomena, a high degree of environmental variability, or craft activities with a strong skills-base and hence a predominance of procedural ('know how to') knowledge. A pragmatic approach must support modelling not only in contexts that are simple, stable and uncontroversial, but also where there is no expressible, singular, uncontested 'truth'.
The Abstract World is depicted in Figure 1 as being modelled at two levels. The Conceptual Model level endeavours to reflect the modeller's perception of the Things, the Events and their Properties, by postulating Entities and Entity-Instances, presentations of Entities called Identities, and Transactions, with Relationships of various kinds among them, all with Attributes.
The notion of an Entity corresponds to a category of Things, and Transaction to a category of Events. In the dialect used by ontologists, the term 'universal' corresponds to a category, and 'particular' refers to an instance. For example, in biology, the notion 'species' (e.g. African Elephant) is a universal, and the notion 'specimen' is a particular. An example that is perhaps more pertinent to IS is the category cargo-containers, which is a universal or Entity, whereas a specific cargo-container is a particular or Entity-Instance. The ideas and terms used in this paper, and articulated further below, are similar to, but not identical with, related ideas in the well-developed and diverse sub-discipline of conceptual modelling.
The other level, referred to here as the Data Model, enables the operationalisation of the relatively abstract ideas in the Conceptual Model level. Central to this level is the notion of 'Data'. The term, used variously as a plural and as a generic noun, refers to any quantity, sign, character or symbol, or collection of them, that is in a form accessible to a person and/or an artefact. (The singular term 'datum' has fallen into disuse in recent times.) 'Real-World Data' or 'Empirical Data' is data that represents or purports to represent some Property of a Real-World Phenomenon. That can be contrasted with 'Synthetic Data', which is Data that bears no direct relationship to any real-world phenomenon, such as the output from a random-number generator, or data created as a means of testing the performance of software under varying conditions.
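The Empirical/Synthetic distinction is a matter of a Datum's provenance rather than of its value, so a sketch can carry it as an explicit tag. In this illustrative Python sketch, the tag values, the `phenomenon` field and the sample reading are all hypothetical; the Synthetic Data is produced by a seeded random-number generator, one of the examples given above.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datum:
    value: float
    provenance: str                    # "empirical" or "synthetic"
    phenomenon: Optional[str] = None   # the Real-World Phenomenon represented, if any


def sense(phenomenon: str, reading: float) -> Datum:
    """Empirical Data: represents, or purports to represent, a Real-World Property."""
    return Datum(reading, "empirical", phenomenon)


def synthesise(rng: random.Random) -> Datum:
    """Synthetic Data: bears no direct relationship to any Real-World Phenomenon."""
    return Datum(rng.random(), "synthetic")


rng = random.Random(2021)   # seeded so that test runs are reproducible
reading = sense("internal temperature of container CONT-0001 (deg C)", -18.5)
test_input = synthesise(rng)
```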
The vast majority of real-world Things and Events do not give rise to Data. The background noise emanating from all points of the universe went ignored for millions of years until the last few decades, during which some astronomers have occasionally sampled a tiny amount of it. Some things about the trucks that carry goods in and out of a company's gates are of great interest to someone (such as which trucks, when, what they carried in, and what they carried out). But there is seldom any motivation to measure, let alone record, the pressure in the tyres on the trucks, the number of chip-marks in the paintwork, the condition of the engine-valves, or even the number of consecutive hours the driver has been at the wheel.
Of the real-world Things and Events for which Data is sensed or created, many kinds are very uninteresting. The streams of background noise emanating from various parts of the sky might on occasions contain a signal from a projectile launched from the earth, and just possibly might contain some pattern from which an inter-stellar event can be inferred, or perhaps the existence of intelligent life somewhere in the universe. But usually the contents are devoid of any value to anyone. Similarly, a great deal of the Data stored by commerce, industry and government is of interest for only a very short time, or 'just for the record', and kept only for contingencies, or because it was easier or cheaper than deleting it. The further notions of record, data-item, data-item-value, and the means whereby particular data-items, alone or in groups, may be used to differentiate among (Id)Entity Instances, are all addressed in a later section of this paper.
Beyond Data, the epistemological aspects of the pragmatic model comprise assumptions made about information, knowledge and wisdom. The term 'information' is used in many ways. Frequently, even in refereed sources, it is used without clarity as to its meaning, and often in a manner interchangeable with Data. The pragmatic model adopted in this paper uses the term 'Information' for a sub-set of Data: that Data that has value. Data has value in only very specific circumstances. Until it is in an appropriate context, Data is not Information, and once it ceases to be in such a context Data ceases to be Information.
The most straightforward way in which Data is useful is when it is relevant to a decision. A person's interest in the weather depends on whether that person has an interest in the conditions outside, and on where the person is now, or is going to. Data about a delivery of a particular batch of baby-food to a particular supermarket is lost in the bowels of the company's database, never to come to light again, unless and until something exceptional happens, such as the bill not being paid, the customer complaining about short delivery or poor product quality, or an extortionist claiming that poison has been added to some of the bottles.
The question as to what data is 'relevant to a decision' is not always clear-cut. On a narrow interpretation, Data is relevant and of value only if it actually makes a difference to the decision made. A broader interpretation is that Data is relevant and therefore of value if, depending on whether or not it is available to the decision-maker, it could make a difference to the decision.
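The two interpretations of relevance can be stated as two tests on a decision rule. In this Python sketch, calling the rule with `None` stands for making the decision without the Data; the umbrella rule and its 0.5 threshold are hypothetical, chosen only to echo the weather example above.

```python
from typing import Callable, Iterable, Optional

DecisionRule = Callable[[Optional[float]], str]


def relevant_narrow(decide: DecisionRule, actual: float) -> bool:
    """Narrow: the Data is relevant only if it actually changes the decision made."""
    return decide(actual) != decide(None)


def relevant_broad(decide: DecisionRule, possible: Iterable[float]) -> bool:
    """Broad: the Data is relevant if some value it could take would change the decision."""
    without_data = decide(None)
    return any(decide(v) != without_data for v in possible)


def take_umbrella(rain_probability: Optional[float]) -> str:
    # Hypothetical rule: with no forecast available, the umbrella stays at home.
    if rain_probability is None:
        return "no umbrella"
    return "umbrella" if rain_probability >= 0.5 else "no umbrella"


# A forecast of 10% rain leaves the decision unchanged, so it fails the narrow
# test; it passes the broad test, because a different forecast could have
# changed the decision.
print(relevant_narrow(take_umbrella, 0.1))             # False
print(relevant_broad(take_umbrella, [0.1, 0.5, 0.9]))  # True
```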
In addition to decision-making, there are other circumstances in which Data can be interesting or valuable. When we read text, listen to audio, or watch 'infotainment' programs, we are seldom making decisions, and yet we perceive informational value in some of the Data presented to us. Sometimes it is merely humorous. Sometimes it is not what we would have expected, and therefore has 'surprisal' value ("The government might survive the election yet!" Or "An injury during a training session will keep the star player out of the Grand Final!"). In other cases, it may be something that fits into a pattern of thought we have been quietly and perhaps only semi-consciously developing for some time, and which seems, for no very clear reason, to be worth filing away.
Some people feel very uncomfortable with a definition that embodies such looseness, fuzziness and instability. Rather than a nice, straightforward 'thing', describable in mathematical terms, and analysable using formidable scientific tools, such a definition makes Information rubbery and intangible, a 'will o' the wisp'. I contend that the absence of that fuzziness from conventional definitions lies at the heart of many problems in IS. By embodying in the IS profession's world-view a precise notion that bears little relationship with the Real-World, the modeller pre-destines the resulting IS to be a poor fit with the needs of IS professionals and of the managers and executives whom they serve.
The question then arises as to how Data and Information relate to knowledge. Two contrasting conceptions of knowledge exist. One asserts that knowledge is a body of facts and principles accumulated by humankind over the course of time, that are capable of being stored in a warehouse. The other argues that facts and principles cannot be meaningful outside the mind of an individual human. Within the second school of thought, knowledge is the matrix of impressions within which an individual situates newly acquired information.
In order to cater for these two extremes, the pragmatic approach adopted in the model being presented here is that the term 'Knowledge' is to be avoided, except when qualified by one of two adjectives:
The assumption is often made that wisdom is closely related to Data, Information and Knowledge. Some presenters go so far as to depict a simple pyramidal arrangement, with large volumes of Data forming the base layer, smaller volumes of Information at the second-lowest layer, a slimmer, second-highest layer called Knowledge, and a layer at the peak called wisdom. The pragmatic model used here rejects such ideas as simple-minded and dangerous. It treats wisdom as being on an entirely different plane from Information, from Codified Knowledge and even from Tacit Knowledge.
The model assumes that, to the extent that 'Wisdom' exists, it is one of the following:
The final element of the pragmatic metatheoretic model is concerned with 'Value', in the sense of "the relative worth, usefulness, or importance of a thing" (OED II 6a). The values dominant in many organisations are operational, financial and economic. However, many contexts arise in which there is a pressing need to recognise broader economic interests, and values on the individual, social and environmental dimensions. Human values are particularly prominent in systems in which people are key players or users, and in systems that materially affect uninvolved people, usefully referred to as 'usees' (Clarke 1992, Fischer-Hübner & Lindskog 2001, Baumer 2015).
The pragmatic approach to Value recognises that:
This section defines and discusses the notions of entity and identity, which are the two central features of the pragmatic metatheoretic approach adopted in this paper. It draws heavily on an earlier working paper (Clarke 2001a) and published articles (Clarke 2010a-d), but re-casts the model in light of the metatheoretic discussions above. It first considers them at the Conceptual Model level, and then at the Data Model level. The notions are applied in this section to inanimate Real-World Things. The following section addresses additional considerations that arise when the Things are human beings.
The conception of an entity adopted here has a great deal in common with the approach used in a wide range of conceptual modelling techniques, whereas the identity notion diverges somewhat from the mainstream. An 'Entity' is an element of a Conceptual Model that corresponds with a Real-World Thing. It is a category or collective notion, or a set of instances. In one sense, recognition of Things and Entities is arbitrary, because a modeller can postulate whatever they want to postulate. Generally, however, a modeller has a purpose in mind, and postulates a category judged likely to be useful in understanding some part of the Real-World, and contributing to its management.
Examples of an Entity are the set of all cargo-containers, or the mobile-phones assigned by an organisation to its employees. Some objects comprise nested layers of objects. For example, cargo-containers may contain pallet-loads, which may contain cartons, each of which may contain smaller boxes. Each specific occurrence within the set of objects that makes up an Entity is an 'Entity-Instance'. Hence the Entity cargo-containers comprises many Entity-Instances, one for each particular container, and possibly many nested layers of Entity-Instances.
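Nested Entity-Instances form a tree, in which each Entity-Instance may contain Entity-Instances of other Entities. The following Python sketch is a hypothetical illustration (the class name, the Entifiers and the single `contents` list are assumptions); it shows how the nesting can be traversed from the container down to the smallest boxes.

```python
from dataclasses import dataclass, field
from typing import Iterator, Tuple


@dataclass
class Instance:
    entity: str      # the Entity (category) of which this is an Entity-Instance
    entifier: str
    contents: list = field(default_factory=list)

    def add(self, child: "Instance") -> "Instance":
        self.contents.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Instance"]]:
        """Yield (depth, Entity-Instance) pairs, outermost first."""
        yield depth, self
        for child in self.contents:
            yield from child.walk(depth + 1)


container = Instance("cargo-container", "CONT-0001")
pallet = container.add(Instance("pallet-load", "PAL-01"))
carton = pallet.add(Instance("carton", "CTN-001"))
carton.add(Instance("box", "BOX-0001"))
carton.add(Instance("box", "BOX-0002"))

for depth, inst in container.walk():
    print("  " * depth + f"{inst.entity} {inst.entifier}")
# cargo-container CONT-0001
#   pallet-load PAL-01
#     carton CTN-001
#       box BOX-0001
#       box BOX-0002
```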
Each of the many specific conceptual modelling techniques has terms that correspond with those used here. In the case of the original Entity-Relationship Model of Chen (1976), an Entity corresponds with Chen's entity-set ("Entities are classified into different entity sets such as EMPLOYEE, PROJECT, and DEPARTMENT" (p.11)), and Entity-Instance has a degree of correspondence with Chen's entity: "An entity is a 'thing' which can be distinctly identified. A specific person, company, or event is an example of an entity" (Chen 1976, p.10 - but the model presented here does not treat an "event" as an Entity-Instance). An Entity may have 'Entity-Attributes', each of which is an element of a Conceptual Model that represents a Real-World Property. Containers, for example, have a colour, an owner, a type (e.g. refrigerated, or half-height), and various kinds of status (e.g. dirty or clean; and empty or loaded).
Many kinds of Entity are perceived rather differently by the modeller, depending on the context. An 'Identity' is a particular presentation of an Entity, as arises when it performs a particular role. A 'Role' is a pattern of behaviour adopted by an Entity. An Entity may adopt one Identity in respect of each Role, or may use the same Identity when performing multiple Roles.
An 'Identity-Instance' is a particular occurrence of an Identity. For example, any particular motor-vehicle is an Entity-Instance; but a motor-vehicle may at any given time be associated with an Identity-Instance, such as 'the getaway-car', 'the car carrying a person-at-risk' (e.g. the Pope), or 'the lead-vehicle in a convoy'. Another example is a single computing device, which is an Entity-Instance, supporting many processes that interact with one another and with processes running in other devices, each process being an Identity-Instance.
Whereas an Entity commonly has physical form, an Identity may have virtual form. An example of an Identity with physical form is the set of all SIM-cards inserted into mobile phones. Virtual form, on the other hand, is apparent in the case of processes running in consumer computing devices and communicating with other processes running in that or some other device. An Identity is related to the notion of role in Chen's ER Model: "The role of an entity in a relationship is the function that it performs in the relationship" (p.12).
The usage of 'Identity' in the pragmatic model presented here is very different from that attributed to the term during recent decades by most organisations. Services referred to as 'identity management' commonly embrace the implicit assumption that Entity and Identity are the same notion, or that each Entity is limited to a single Identity. This does not correspond with Real-World phenomena, and this single error in mainstream models has led to a great many difficulties in the use of 'identity management' services. The term 'identity' has widespread usage to refer to a Real-World phenomenon evidenced by human beings, and has subtleties that organisational practices have been ignoring. It is important that IS professionals and researchers, and the organisations that use IS, reflect Real-World phenomena, and respect common usage, rather than trapping themselves into misrepresentation, misunderstanding and mis-design.
An Identity may have 'Identity-Attributes', each of which is an element of a Conceptual Model that represents a Real-World Property. Whereas the colour of a car, and its make and model, are Attributes of the Entity, the dangerousness of its occupants is an Attribute associated with the Identity. Similarly, a mobile handset has different attributes from the SIM-card inserted into it, and a computer has different attributes from the various processes running inside it.
A 'Transaction' is an element of a Conceptual Model that corresponds with a Real-World Event. It has Transaction-Attributes that reflect Real-World Properties that the modeller considers to be relevant to the purpose. A key function of a 'Transaction-Instance' is to give rise to a change in the state of Attributes for one or more Entity-Instances or Identity-Instances.
A 'Relationship' is a linkage between two elements at the Conceptual Model level. Figure 1 in section 2 above depicts a Relationship between an Entity and an Identity with a line ending in an arrow at each end. This applies for example to mobile-handsets and SIM-cards. Entities may also have Relationships with other Entities, and Identities with other Identities. For example, motor vehicles need to be associated with other motor vehicles under joint contracts for roadside assistance, and where they are involved in the same accident. Similarly, containers need to be associated with the organisations that own them. Organisations also own and insure motor-vehicles, and hence the two Entities organisations and motor-vehicles need to have some form of link between them.
A Relationship may have 'Relationship-Attributes'. Cardinality is a particularly important attribute. At each end of the line depicting a Relationship, it may be that no Relationship exists in that direction (cardinality 0), or a single linkage (1) may be mandatory, or a range of linkages may be possible (conventionally, 'n' and 'm', or '0-n' or '1-n'). For example, a cargo container must have precisely one linkage with an owner (cardinality 1), whereas the Entity that corresponds to Real-World mobile-phone-handsets may be related to multiple Identity-Instances, associated with different SIM-cards that are inserted into it, successively or even simultaneously. The arrow-head on the other end of that line reflects the fact that a SIM-card may be used in multiple, successive mobile-phone-handsets. Similarly, an Entity for motor-vehicles has a one-to-many relationship with an Identity for 'getaway-cars'. Moreover, escapees may use a succession of vehicles, each of which in turn has the Identity 'getaway-car'; so the arrow depicting this Relationship is also two-headed.
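Cardinality constraints of this kind can be checked mechanically against the linkages that have actually been recorded. The following is a minimal Python sketch, assuming a hypothetical representation in which a Relationship is held as a list of (left, right) pairs, and each end's cardinality is expressed as a (minimum, maximum) bound, with None standing for 'n'. The container, owner, handset and SIM-card codes are illustrative placeholders only.

```python
from collections import Counter

def cardinality_violations(pairs, left_instances, bounds):
    """Return the left-hand instances whose number of linkages falls outside
    bounds = (minimum, maximum); a maximum of None means 'n' (no upper limit)."""
    counts = Counter(left for left, _ in pairs)  # missing instances count as 0
    minimum, maximum = bounds
    return [
        instance
        for instance in left_instances
        if counts[instance] < minimum
        or (maximum is not None and counts[instance] > maximum)
    ]

# Each cargo-container must be linked to precisely one owner (cardinality 1).
containers = ["container-A", "container-B", "container-C"]
owned_by = [
    ("container-A", "MSK"),
    ("container-B", "owner-X"),
    ("container-B", "MSK"),  # two owners: violates cardinality 1
]                             # container-C has no owner: also a violation
print(cardinality_violations(owned_by, containers, (1, 1)))

# A handset may be linked to any number of SIM-cards ('0-n').
handsets = ["handset-1", "handset-2"]
holds_sim = [("handset-1", "sim-1"), ("handset-1", "sim-2")]
print(cardinality_violations(holds_sim, handsets, (0, None)))
```

The first call reports container-B and container-C; the second reports nothing, because '0-n' admits both the handset with two SIM-cards and the handset with none.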
In the remainder of this article, when referring to both Entities and Identities, the abbreviation (Id)Entities is applied, and the same approach is adopted to derivative terms such as (Id)Entity-Instance.
The previous sub-section had its focus on the Conceptual Model level. The (Id)Entity notions require further articulation at the Data Model level. The terms Data, Real-World Data, Synthetic Data and Information were introduced in s.2.2 above. The pragmatic approach proposed in this paper embodies several further concepts.
In the Abstract World in which IS operate, each Attribute of an (Id)Entity is represented by a 'Data-Item', which is a storage-location in which a discrete 'Data-Item-Value' can be represented. The term 'Value', in this context, is a somewhat generalised form of "a numerical measure of a physical quantity" (OED I 4). For example, Entity-Attributes of cargo-containers may be expressed at the Data Model level as Data-Items and Data-Item-Values of Colour = Orange, Owner = MSK (indicating Danish shipping-line Maersk), Type = Half-Height, Freight-Status = Empty.
A collection of Data-Items that refers to a single (Id)Entity-Instance is referred to as a 'Record'. A collection of Records may be referred to as a 'File' or data-set. A Record may relate to a particular Entity-Instance (e.g. a container, or mobile handset) or Identity-Instance (e.g. a SIM-card), or to a Transaction-Instance.
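The Data Model terms defined so far map naturally onto simple data structures. The sketch below is illustrative only, assuming a hypothetical `Record` class that holds Data-Items as a mapping from Data-Item name to Data-Item-Value, and treating a File as a plain list of Records. The container values are those given above; the SIM-card value is a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A collection of Data-Items referring to a single (Id)Entity-Instance
    or Transaction-Instance (hypothetical representation)."""
    refers_to: str  # "Entity-Instance", "Identity-Instance" or "Transaction-Instance"
    data_items: dict = field(default_factory=dict)  # Data-Item name -> Data-Item-Value

# A File (data-set) is simply a collection of Records.
container_file = [
    Record("Entity-Instance", {
        "Colour": "Orange",
        "Owner": "MSK",
        "Type": "Half-Height",
        "Freight-Status": "Empty",
    }),
]

sim_file = [Record("Identity-Instance", {"ICCID": "sim-1", "Status": "Active"})]

print(container_file[0].data_items["Owner"])  # MSK
```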
The term 'Metadata' refers to data that describes some attribute of other Data. Metadata may be explicitly expressed or captured by cataloguers, or it may be automatically generated, i.e. inferred by software. It may be stored with the data to which it relates, or stored separately. During the last 2-3 decades, the term has become sufficiently widely-used that hyphenation is no longer common.
The Metadata concept is generic, and specific interpretations exist in a wide variety of contexts, including libraries, museums and health care, and for various media, including print-publications, web-pages, images and video. Examples relevant to the topic of this paper include the date on which data was collected, the scale against which the Data was measured (nominal, ordinal, cardinal or ratio), the meaning imputed to the Data at the time of collection, the contexts in which it was collected and has subsequently been stored and transmitted (its 'provenance'), and any supporting evidence for the Data's quality.
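As a rough illustration of the Metadata examples just listed, the following sketch stores Metadata alongside a single Data-Item-Value. The class and field names are hypothetical assumptions, the permitted scales are those named in the text, and the Metadata could equally be held separately from the Data it describes.

```python
from dataclasses import dataclass, field
from datetime import date

SCALES = {"nominal", "ordinal", "cardinal", "ratio"}

@dataclass
class DataItemValue:
    """A Data-Item-Value stored together with (hypothetical) Metadata fields."""
    value: object
    collected_on: date    # date on which the Data was collected
    scale: str            # scale against which the Data was measured
    meaning: str          # meaning imputed to the Data at the time of collection
    provenance: list = field(default_factory=list)        # contexts of collection, storage, transmission
    quality_evidence: list = field(default_factory=list)  # supporting evidence for the Data's quality

    def __post_init__(self):
        if self.scale not in SCALES:
            raise ValueError(f"unknown scale: {self.scale!r}")

colour = DataItemValue(
    value="Orange",
    collected_on=date(2021, 6, 1),
    scale="nominal",
    meaning="external colour of the container, as observed",
    provenance=["recorded by gate-clerk at terminal"],
)
# Each subsequent handling of the Data extends its provenance.
colour.provenance.append("transmitted to shipping-line system")
```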
A vital question that needs to be addressed is the manner in which each individual (Id)Entity-Instance is distinguished from all of the other instances of the same (Id)Entity. Specific terms are adopted in the pragmatic metatheoretic approach proposed in this paper. The term 'Entifier' refers to any one or more Data-Items held in a Record whose value(s), alone or in combination, are sufficient to distinguish any particular Entity-Instance from all other Entity-Instances of the same Entity. The word 'entifier' is not to be found in the Oxford English Dictionary (OED), although 'entify' is, with a meaning not unrelated to that used here for 'entifier'. Surprisingly, as far as I can tell, 'entifier' is a neologism, first apparent in Clarke (2001), and first published in Clarke (2003), defined at the time as "the signifier for an entity".
Examples of single-item Entifiers include the BIC-code of a cargo-container (BIC being an abbreviation of Bureau International des Containers), the Vehicle Identification Number (VIN) of a motor-vehicle, and the International Mobile Equipment Identity (IMEI) of a mobile-phone. In some circumstances, a proxy-Entifier may be used, e.g. for a computing device, the Network Interface Card Identifier (NICId) of an Ethernet card that is installed in it.
Artefacts are usually distinguished by Entifiers that are purpose-designed, and hence comprise a single Data-Item. However, an example of a multi-data-item Entifier arises in jurisdictions that re-issue motor-vehicle registration-plates previously allocated to a now-defunct vehicle. To achieve the uniqueness that is highly desirable in an Entifier, a date-range needs to be included as part of the Entifier.
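Whether a given set of Data-Items can serve as an Entifier within a File can be tested by checking that their combined values differ in every Record. The sketch below assumes Records are plain dictionaries; the registration-plate data and the 'Valid-From' / 'Valid-To' Data-Item names are hypothetical. Note that passing this test shows uniqueness only within the current File, not that uniqueness will persist as new Records arrive.

```python
def distinguishes(file, items):
    """True if the Data-Items in `items`, taken together, have a different
    combination of values in every Record of `file`."""
    seen = set()
    for record in file:
        key = tuple(record[item] for item in items)
        if key in seen:
            return False
        seen.add(key)
    return True

registrations = [
    {"Plate": "ABC123", "Valid-From": "1998-03-01", "Valid-To": "2009-11-30"},
    {"Plate": "ABC123", "Valid-From": "2012-05-14", "Valid-To": None},  # re-issued plate
    {"Plate": "XYZ789", "Valid-From": "2005-07-22", "Valid-To": None},
]
print(distinguishes(registrations, ["Plate"]))                # False
print(distinguishes(registrations, ["Plate", "Valid-From"]))  # True
```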
An 'Identifier' is any one or more Data-Items held in a Record whose value(s), alone or in combination, are sufficient to distinguish any particular Identity-Instance from all other Identity-Instances of the same Identity. This is a mainstream use of the term, as evidenced by OED definition 1a: "A thing used to identify someone or something".
Examples of single-item Identifiers include a code assigned by a traffic-control authority to a vehicle of interest, for example when monitoring average speed over a section of road, the Integrated Circuit Card Identification (ICCID) of a SIM-card, and a process-id (e.g. for a software agent).
Importantly, what constitutes an Identifier is open-ended. The term 'Candidate Identifier' refers to any combination of Data-Items in one Record that is considered capable of achieving reliable matches against the relevant Data-Items in another Record. The reliability, both generally and in respect of any particular apparent match, varies greatly, and may be very difficult to estimate.
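The use of a Candidate Identifier can be sketched as a filter over a File. In this hypothetical example, the Candidate Identifier is the combination of make, model and colour of a vehicle, and the three-way classification is a deliberately crude assumption; it says nothing about the actual reliability of an 'apparent match'.

```python
def candidate_matches(transaction, file, candidate_items):
    """Return the Records that agree with the transaction on every Data-Item
    in the Candidate Identifier. Items missing from the transaction never match."""
    return [
        record for record in file
        if all(
            transaction.get(item) is not None and record.get(item) == transaction[item]
            for item in candidate_items
        )
    ]

def assess(matches):
    """Crude classification of a matching outcome."""
    if not matches:
        return "no match"
    if len(matches) == 1:
        return "apparent match"
    return "ambiguous"

vehicles = [
    {"Code": "V1", "Make": "Toyota", "Model": "Corolla", "Colour": "Silver"},
    {"Code": "V2", "Make": "Toyota", "Model": "Corolla", "Colour": "Silver"},
    {"Code": "V3", "Make": "Mazda", "Model": "3", "Colour": "Red"},
]
items = ["Make", "Model", "Colour"]
print(assess(candidate_matches({"Make": "Mazda", "Model": "3", "Colour": "Red"}, vehicles, items)))
print(assess(candidate_matches({"Make": "Toyota", "Model": "Corolla", "Colour": "Silver"}, vehicles, items)))
```

The first call yields an 'apparent match' and the second is 'ambiguous'. Even the apparent match may be wrong, because a red Mazda that is not in the File would produce exactly the same result.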
In Figure 2, a visual depiction is provided of the elements of the Conceptual and Data Modelling levels defined so far in this section.
When a Real-World Event occurs, and is reflected in a Conceptual Model-level Transaction, a Record arises, whose function is to cause a change of state in one or more Attributes of one or more (Id)Entities. Means are needed to establish which (Id)Entity-Instances are affected by the Transaction Record. This is achieved by means of (Id)Entification processes.
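The flow from Transaction Record to affected Entity-Instance can be sketched as follows. The sketch assumes, hypothetically, that the File is indexed by Entifier and that every Data-Item in the Transaction other than the Entifier is a new Attribute value; 'CONT-0001' is a placeholder rather than a real BIC-code.

```python
# Hypothetical File of container Records, indexed by Entifier.
containers = {
    "CONT-0001": {"Colour": "Orange", "Owner": "MSK", "Freight-Status": "Empty"},
}

def apply_transaction(file, transaction, entifier_item):
    """Locate the affected Entity-Instance via the Entifier carried in the
    Transaction Record, then change the state of the Attributes it carries."""
    key = transaction[entifier_item]
    if key not in file:
        raise LookupError(f"no Record for {entifier_item} = {key!r}")
    record = file[key]
    for item, value in transaction.items():
        if item != entifier_item:
            record[item] = value
    return record

apply_transaction(containers, {"BIC-code": "CONT-0001", "Freight-Status": "Loaded"}, "BIC-code")
print(containers["CONT-0001"]["Freight-Status"])  # Loaded
```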
The term 'Entification' refers to the process whereby Data is associated with a particular Entity-Instance. This involves acquiring or postulating an Entifier that matches with previously-recorded Data-Item-Values. The term appears in some online dictionaries, with a not unrelated meaning, but not in the OED. It has been used consistently in my work since Clarke (2001), but to date neither it nor, it seems, any equivalent has become mainstream. The emergence of some such term is important, because there are material differences between Identification and Entification: conceptually, in terms of the Data involved, and in relation to their impacts and implications.
Examples of Entification include the matching of a particular cargo-container's BIC-Code, or a motor-vehicle's VIN, to an existing Record. In addition to such purpose-designed Entifiers, Data-Items of convenience are often relied upon. For example, for computing devices that do not have a reliable, purpose-designed Entifier, the NICId of the Ethernet (or other) card inserted into the computing device may be used as a proxy. An Ethernet NICId is an example of a multi-data-item Entifier, in that it comprises two Data-Items, an Organizationally Unique Identifier (OUI) and a Manufacturer-Serial-No. Dependence on proxies of this nature has varying degrees of reliability.
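The two-part structure of an Ethernet NICId can be shown by splitting a 48-bit address into its first three octets (the OUI) and its last three (the manufacturer-assigned part). The sketch below also checks the universal/local bit of the first octet; when that bit is set, the address was assigned locally (for instance, randomised by software) rather than by the manufacturer, the first three octets are not an OUI, and the address is a correspondingly weaker proxy-Entifier. The function names and example addresses are illustrative.

```python
import string

def split_nicid(address):
    """Split a 48-bit Ethernet address into its OUI (first three octets)
    and the manufacturer-assigned remainder (last three octets)."""
    digits = address.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a 48-bit address: {address!r}")
    return digits[:6], digits[6:]

def locally_administered(address):
    """True if the universal/local bit of the first octet is set, i.e. the
    address was not assigned by the manufacturer."""
    first_three, _ = split_nicid(address)
    return bool(int(first_three[:2], 16) & 0x02)

print(split_nicid("00:1B:63:84:45:E6"))           # ('001B63', '8445E6')
print(locally_administered("00:1B:63:84:45:E6"))  # False
print(locally_administered("02:00:5E:10:00:01"))  # True
```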
The acquisition of the Entifier may be by observation followed by transcription of the Data-Item-Value by a human, or by technologically-assisted means such as image-recording using a camera followed by application of optical character recognition (OCR) to extract the value. Another approach is to pre-store the Entifier in a machine-readable form, such as a barcode or a chip, and later use an appropriate technology to extract a copy of that pre-stored Data.
The term 'Identification' refers to the process whereby Data is associated with a particular Identity-Instance. This involves acquiring or postulating an Identifier that matches with previously-recorded Data-Item-Values. This application of the term is consistent with dictionary definitions, and has been used in this manner in my work since Clarke (1994b). The term has many other, loose usages, however, particularly as a synonym for 'identifier' (discussed above) or for 'token' or 'authenticator' (both of which are discussed below).
An example of the Identification process in operation is the matching of a SIM-card's ICCID to an existing Record. An example of the use of a multi-Data-Item Identifier is the recognition of a vehicle on the basis of its properties (such as make, model and colour) at each end of a section of roadway over which average speed is being assessed. Another example is the use of the combination of a port-number and an IP-address, together with a date-range (to allow for IP-addresses being 'dynamic', i.e. subject to being re-assigned), as a proxy Identifier for a particular process running in a computing device.
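The role of the date-range in the IP-address and port-number proxy Identifier can be illustrated with a hypothetical log of which process held a given address and port during which period. The log format and process names are assumptions; the addresses come from the range reserved for documentation (203.0.113.0/24). Without the time of observation, both processes would match.

```python
from datetime import datetime

# Hypothetical log: (IP-address, port-number, start, end, process Identity-Instance).
# Each period is half-open: start <= t < end.
assignments = [
    ("203.0.113.7", 443, datetime(2021, 6, 1, 9), datetime(2021, 6, 1, 17), "agent-A"),
    ("203.0.113.7", 443, datetime(2021, 6, 1, 17), datetime(2021, 6, 2, 9), "agent-B"),
]

def identify(ip, port, when):
    """Return the Identity-Instances whose (IP-address, port, date-range)
    covers the observation."""
    return [
        process
        for (a_ip, a_port, start, end, process) in assignments
        if a_ip == ip and a_port == port and start <= when < end
    ]

print(identify("203.0.113.7", 443, datetime(2021, 6, 1, 12)))  # ['agent-A']
print(identify("203.0.113.7", 443, datetime(2021, 6, 1, 17)))  # ['agent-B']
```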
From an administrative perspective, (Id)Entification procedures need to be reliable and inexpensive. Achieving that aim can be facilitated by pre-recording an (Id)Entifier on a Token from which it can be conveniently captured. One common form of Token is a card, with the data stored in a physical form such as embossing, or on, or in, a recording-medium such as a magnetic stripe or a silicon chip.
This section has presented the key terms at the Conceptual Model level of Entity, Entity-Instance and Entity-Attribute; Identity, Identity-Instance and Identity-Attribute; Transaction, Transaction-Instance and Transaction-Attribute; and Relationship, Relationship-Instance and Relationship-Attribute. At the Data Model level, the epistemological notions of Data and Information have been complemented by definitions for the terms Data-Item, Data-Item-Value, Record, File and Metadata. Mapping from the Conceptual to the Data Model has been presented as depending on (Id)Entifiers and (Id)Entification processes.
This section has used the simplifying assumption that the Things underlying the (Id)Entities are inanimate, and capable of being treated as mere objects, with minimal concern about the Thing's interests and about clashes among values. The following section relaxes that assumption and considers the additional factors that arise when the underlying Things are people.
The development of the model to this point has limited its focus to inanimate objects and their representations. This has enabled a straightforward, mechanistic approach to be adopted, and values to be left in the background. In many circumstances, animals are also treated as objects. Flies and mice are variously poisoned and injected, and the impacts are rendered as Data. Cattle are branded and ear-tagged, and pets have chips injected. On the other hand, animal welfare constraints are placed on handling of vertebrate animals during life and in relation to the manner of death. In some circumstances, Data is required by law to be gathered and stored, such as stocking densities for caged chickens and inoculation records, and some forms of animal slaughter are subject to monitoring and Data-recording.
Where the Entities being modelled are human beings, additional factors come into play, and hence both the Conceptual and Data Modelling levels need to be adapted in order to reflect those factors. One consideration is the 'free will' or volitional aspect of human beings: inanimate objects do not act of their own accord, and do not have interests that influence their behaviour. In addition, values and rights loom far larger when the Entities involved are human beings. The terms 'objectification', in its sense of "the demotion or degrading of a person or class of people ... to the status of a mere object" (OED 2.), and the recent terms 'digitalisation' / 'datafication' / 'datification' (Newell & Marabelli 2015), all carry a pejorative tone when used in respect of people. This is because the mechanistic application of data-handling notions to humans involves a clash of values between administrative efficiency on the one hand and humanism on the other. This section considers the impact on the modelling approach firstly at the conceptual and then at the data level.
In section 4.1, a series of concepts was discussed and defined. The application of these concepts to humans requires care. The notion of human Entity remains, at this stage at least, reasonably uncontroversial, with Entity-Instances confined solely to specimens of the species homo sapiens. A great many Entity-Attributes are applicable only to human Entities. Some are physiological in nature, such as the person's hair-colour, gender, and date-of-birth or age-range. Others arise from the person's behaviour, such as residential address, qualifications, expertise, and capacity to act as an agent for another Entity-Instance.
Each human Entity-Instance may present many Identity-Instances, to different people and organisations, and in different contexts. The notion of Identity is especially important to humans, because each Entity-Instance (person) plays many roles in many contexts, and these in many cases give rise to separate Identity-Instances. Examples in economic contexts alone include seller, buyer, supplier, receiver, debtor, creditor, payer, payee, principal, agent, franchisor, franchisee, lessor, lessee, copyright licensor, copyright licensee, employer, employee, contractor, contractee, trustee, beneficiary, tax-assessor, tax-assessee, business licensor, business licensee, plaintiff, respondent, investigator, investigatee, and defendant. A similar richness exists in social contexts.
In many circumstances an Identity-Instance is a presentation or role of a single, specific underlying Entity-Instance, e.g. 'I' am the sole 'author of this paper'. On the other hand, some roles are filled by different people, in some cases only serially and in other cases in parallel as well. Examples of serial ambiguity include club treasurer and journal editor-in-chief, and examples of parallel ambiguity include club committee-member, journal senior editor, fire warden and race marshal.
Human Identity-Attributes are related to a particular human Entity's particular presentation or role, rather than to the person as an Entity-Instance. For example, an eConsumer has a profile comprising such features as demographics and user-interface preferences. People performing roles in organisations inherit authorisations, permissions or privileges. While acting in their manager's absence, a person may be able to sign sick leave forms for their peers, and during an emergency, as fire warden, they can order the CEO's secretary, and even the CEO, to get out of the building. A major issue in data security and in fraud is the phenomenon of individuals abusing powers that they have by virtue of one role that they play, by applying them for extraneous purposes unrelated to that role. The Identity-Attribute commonly referred to as authorisation is accordingly very significant in many IS, and is further discussed in a later paper in the series.
In Figure 2 above, Entities and Identities are shown as having a Relationship. The complexities of this Relationship are particularly significant where the Real World Things are humans. Relationship has a Relationship-Attribute of cardinality. Any particular Relationship-Instance may be:
In the case of humans, each Entity-Instance may relate to multiple Identity-Instances (hence 'n'). Further, because each Identity-Instance may be adopted by multiple Entity-Instances, the other end of the arrow is marked with an 'm' (equivalent to 'n', but implying that it is a variable independent from the 'n' at the other end of the arrow). For example, the identities association treasurer and editor-in-chief of a particular journal are adopted by multiple people in succession. Others, such as association board-member and senior editor of a journal, may be adopted by multiple people at the same time.
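The distinction between serial and parallel adoption of an Identity can be made operational if each linkage between a person and an Identity is recorded with a period of validity. The following sketch assumes such a hypothetical record format, using half-open periods and `date.max` for an open-ended period, and classifies an Identity by whether the periods of different holders ever overlap.

```python
from datetime import date
from itertools import combinations

def occupancy(assignments, identity):
    """Classify how an Identity is filled by human Entity-Instances:
    'single', 'serial' (holders never overlap in time) or 'parallel'.
    Each assignment is (entity, identity, start, end), half-open [start, end)."""
    spans = [(e, s, f) for (e, i, s, f) in assignments if i == identity]
    if len({e for e, _, _ in spans}) <= 1:
        return "single"
    for (e1, s1, f1), (e2, s2, f2) in combinations(spans, 2):
        if e1 != e2 and s1 < f2 and s2 < f1:  # different people, overlapping periods
            return "parallel"
    return "serial"

assignments = [
    ("person-1", "treasurer", date(2015, 1, 1), date(2018, 1, 1)),
    ("person-2", "treasurer", date(2018, 1, 1), date(2021, 1, 1)),
    ("person-1", "board-member", date(2016, 1, 1), date(2020, 1, 1)),
    ("person-3", "board-member", date(2017, 1, 1), date.max),
    ("person-1", "author", date(2021, 6, 17), date.max),
]
for identity in ("treasurer", "board-member", "author"):
    print(identity, occupancy(assignments, identity))
```

This prints 'serial' for the treasurer, 'parallel' for the board-member and 'single' for the author.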
Subtleties in the Relationships between human Entities and Identities need to be well-understood by the designers and users of IS, and reflected in data models and business processes. A particular human Entity-Instance may strongly desire to be the only user of a particular Identity-Instance (e.g. people are very particular about who exercises the capacity to operate on their various bank accounts). Similarly, an organisation may be very concerned that a particular Identity-Instance is used only by one or more specific Entity-Instances (e.g. for the signing of contracts that bind the organisation, and for making statements to the media). It is challenging, however, to prevent use of Identities by other parties. Undesirable activities of these kinds are described by such terms as impersonation, masquerade, spoofing, identity fraud and identity theft.
Transactions represent Real-World Events that give rise to changes in (Id)Entity-Attributes. Events involving humans can be both significant and sensitive, and hence considerable care is needed in the design and processing of such Transactions.
In section 4.2, further (Id)Entity notions were defined at the Data Model level. These too require further articulation where humans are involved.
Because of the high valuation placed on human-ness, many aspects of the manner in which Data relating to inanimate objects is handled are inadequate where the Data relates to human Entities. Since c.1970, as the collection and management of Data about humans exploded under the pressures of increased organisational scale, increased social distance, and increased IT capabilities, a great deal of public concern has arisen about the use and abuse of Personal Data by organisations. Personal-Data-Items vary enormously in their degree of sensitivity. However, no simple formula exists for assessing sensitivity. It is dependent on individuals, their personal histories and concerns, and the contexts that they find themselves in from time to time.
To address the risk that the activities of government and business might be negatively affected by these public concerns, laws relating to 'personal data/information' and 'data protection' emerged. The early, largely nominal protections have proven inadequate to placate an increasingly concerned public. As a result, laws have gone through several maturation phases, with the EU's General Data Protection Regulation (GDPR) currently seen as the benchmark, and appearing to drive further changes in law and practice throughout the world. These laws place considerable constraints on organisations' data-handling activities, and place considerable demands on the pragmatic model of (Id)Entification presented in this paper.
Organisations are confronted with challenges in relation to the collection, storage and use of particular Personal-Data-Items (e.g. religion, marital status, ethnicity, disability), and particular Personal Data-Item-Values (e.g. age, particular gender-affinities). Depending on the jurisdiction, (overt) discrimination based on such information, even if demonstrably relevant, may be precluded by law. There are also constraints on the sources from which organisations can draw Personal Data, and increasing demands for organisations to be able to document the source and demonstrate the quality of Personal Data - which translates into requirements for more and better Metadata.
A form of human Entifier is a biometric. This is a measure of some aspect of the physical person that is unique (or is claimed, or assumed, to be so). Examples include a thumbprint, fingerprints, an iris-pattern and DNA-segments. The uniqueness is not guaranteed. In theoretical terms, some biometric measures are capable of providing a very high probability of uniqueness. On the other hand, the practice of biometrics is far less reliable than theory suggests it might be, because a very substantial set of challenges has to be overcome. The literature on biometric challenges and resulting quality is somewhat sparse, but see Mansfield & Wayman (2002) and Clarke (2002), and the list of factors in Table 1. In some circumstances, occasional errors may matter very little and/or be easily discovered and corrected. On the other hand, some errors have serious consequences, ranging from psychological, social and economic harm to cases as dramatic as the conviction and execution of the wrong person.
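One reason why practice falls short of theory can be shown with simple arithmetic. When a single sample is searched against a large collection of enrolled biometrics, a per-comparison false-match rate that looks negligible compounds over the whole collection. The sketch below assumes, as a simplification that real systems do not satisfy, that the comparisons are independent; the rate and collection size are illustrative.

```python
def prob_false_match_1_to_n(fmr, n):
    """Probability of at least one false match when one sample is compared
    against n enrolled templates, assuming independent comparisons that each
    have false-match rate fmr."""
    return 1 - (1 - fmr) ** n

# A one-in-a-million false-match rate, searched against a million records.
print(round(prob_false_match_1_to_n(1e-6, 1_000_000), 3))  # 0.632
```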
Extracted from Mansfield & Wayman (2002), App. A1, pp.28-32
Another category of human Entifier is usefully referred to as an 'imposed biometric'. Examples include a brand imposed by tattooing or other techniques on a person's skin, and a unique code pre-programmed into an RFID tag that is closely associated with the person, or implanted in them (Clarke 1994d, 1997, 2001a, 2002a).
As regards human Identifiers, common examples include the particular name or name-variant that a person commonly uses in a particular context, such as with family, with a particular group of friends, or when working in a customer-facing role such as prison officer, psychiatric nurse, counsellor or telephone help-desk operator. Names are highly variable and error-prone. They do not represent convenient Identifiers for operators of information systems, and are often supplemented by synonym-breakers, such as date-of-birth or some component of address. More effective and efficient business processes can be achieved by means of an organisation-imposed alphanumeric code, such as a customer-code or a username (Clarke 1994d). Each human Identity-Instance may be associated with many Identifiers, including variants of names that the person uses, and many more that are assigned by organisations.
As discussed above, some Identifiers comprise more than one Data-Item. In rich datasets, however, a large number of multi-data-item Candidate Identifiers may be available. Examples are particularly prevalent in the kinds of data-collections about which most people feel the greatest sensitivity: health data and financial data. For example, uniqueness can readily arise from unusual medical conditions and postcode of residence; or even place, gender and date of birth (Sweeney 2000). See also Ohm (2010) and Slee (2011). Yet it is precisely these kinds of rich data-collections that are being expropriated by governments obsessed by the 'big data' mantra, and blind to the issues of incommensurable data definitions, a-contextual applications of data, and low data-quality. The camouflage of Personal Data De-identification has been attempted, but rich data-sets are inadequately resistant to Personal Data Re-identification techniques. Personal Data Falsification is necessary if a balance is to be achieved between personal Values and collectivist Values (Clarke 2019).
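The ease with which combinations of innocuous Data-Items become Candidate Identifiers can be measured by the fraction of Records that are unique on a given combination. The sketch below uses a tiny, made-up data-set with placeholder postcodes; it illustrates the measure, not any empirical result.

```python
from collections import Counter

def unique_fraction(records, items):
    """Fraction of Records whose combination of values for `items`
    occurs exactly once in the data-set."""
    keys = [tuple(r[i] for i in items) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

records = [
    {"Postcode": "P1", "Gender": "F", "DOB": "1961-04-02"},
    {"Postcode": "P1", "Gender": "F", "DOB": "1975-11-19"},
    {"Postcode": "P1", "Gender": "M", "DOB": "1961-04-02"},
    {"Postcode": "P2", "Gender": "F", "DOB": "1961-04-02"},
]
print(unique_fraction(records, ["Postcode"]))                   # 0.25
print(unique_fraction(records, ["Gender", "DOB"]))              # 0.5
print(unique_fraction(records, ["Postcode", "Gender", "DOB"]))  # 1.0
```

No single Data-Item here distinguishes most Records, yet together the three distinguish every one of them.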
The term Nymity refers to circumstances in which the relationship between Entity and Identity is unclear. The term Anonymity refers to a characteristic of an Identity-Instance, whereby it cannot be associated with any particular Entity-Instance, whether from the data itself, or by combining it with other data. In the case of Pseudonymity, on the other hand, association of an Identity with a particular Entity may be achieved, but only if legal, organisational and technical constraints are overcome (Clarke 1999b). In Figure 3, nymity is depicted as an obstacle to the arrow that links the Entity with the Nymous Identity.
Where either form of Nymity applies, it is inappropriate to use the term 'Identifier'. The term Pseudonym refers to a circumstance in which the association between the Identifier and the underlying Entity is not known, but in principle at least could be known. For example, a carefully-protected index may be used to sustain a link between a client-code and the name and address of the AIDS-sufferer to whom the record relates. If an Identifier cannot be linked to an Entity at all, then it is appropriately described as an Anonym. The term Nym usefully encompasses both Pseudonyms and Anonyms.
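The difference between a Pseudonym and an Anonym lies in whether any index to the underlying Entity is retained. The sketch below illustrates that difference using Python's `secrets` module to generate random client-codes. The `PseudonymIndex` class and its boolean `authorised` flag are hypothetical stand-ins for the legal, organisational and technical constraints that would protect a real index, which would also be held separately from the data-collection.

```python
import secrets

class PseudonymIndex:
    """Hypothetical, carefully-protected index linking client-codes
    (Pseudonyms) to the name and address of the person concerned."""

    def __init__(self):
        self._index = {}

    def issue(self, name, address):
        code = secrets.token_hex(8)  # 16 random hex characters
        while code in self._index:   # avoid the (unlikely) collision
            code = secrets.token_hex(8)
        self._index[code] = (name, address)
        return code

    def reidentify(self, code, authorised):
        if not authorised:
            raise PermissionError("re-identification requires authorisation")
        return self._index[code]

index = PseudonymIndex()
client_code = index.issue("A. Person", "1 Example St")
health_record = {"Client-Code": client_code, "Condition": "recorded here"}

# An Anonym: a random code for which no index is retained at all.
anonym = secrets.token_hex(8)
```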
The term Pseudonym is widely used, and has a large number of synonyms (including aka, 'also-known-as', alias, avatar, character, handle, nickname, nick, nom de guerre, nom de plume, manifestation, moniker, persona, personality, profile, pseudonym, pseudo-identifier, sobriquet and stage-name). In contrast, only a small number of authors have used the term Nym, although it is readily traceable back prior to 1997. Even fewer have used the term Anonym, but it is far from unknown and I have used it consistently in my work since Clarke (2002b).
There are many circumstances in which an Identifier is unnecessary and a Nym is entirely adequate. A common example is enquiries in which a set of circumstances is described by the enquirer, and a response is provided explaining the applicability of the law, or of an organisation's policies, to those circumstances. Enquiries are in many cases conducted as a single contiguous conversation. However, it is also possible for multiple, successive interactions to be connected with one another by means of a 'Persistent Nym'. A celebrated example of a Persistent Nym is the whistleblower who brought US President Nixon undone. 'Deep Throat' remained a Persistent Anonym from 1974 until 2005. 'Publius', which was used for contributions to debates about the U.S. Constitution, was a Persistent Anonym at the time, and has remained an Anonym since 1787.
During the early decades of IS, data-collections were designed for use within a particular context, and access was limited to that context. The term 'Data Silo' is used to refer to such arrangements. A frequently-encountered phenomenon in recent decades has been efforts by organisations to achieve linkage among Identifiers, in order to be able to associate, compare and/or consolidate Data-holdings in multiple collections, often across multiple organisations. The correlation, matching, consolidation or merger of separate records is undertaken on the basis of one or more Identifiers, such as name and date of birth, or commonly occurring identifying codes. Although some of these consolidation activities have benefits for individuals, they mostly address the needs of organisations. The protections that Data Silos afforded people are at least compromised by these activities, and in some cases entirely destroyed.
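The consolidation of Records from two Data Silos on the basis of name and date of birth can be sketched as a simple join. The silo contents and the name-normalisation rule (lower-casing and collapsing white-space) are hypothetical; real linkage exercises use far more elaborate, and more error-prone, matching rules.

```python
def normalise_name(name):
    """Crude normalisation: lower-case and collapse white-space."""
    return " ".join(name.lower().split())

def link(silo_a, silo_b):
    """Merge Records from two silos whose (normalised name, DOB) agree."""
    index = {}
    for rec in silo_b:
        index.setdefault((normalise_name(rec["Name"]), rec["DOB"]), []).append(rec)
    linked = []
    for rec in silo_a:
        for match in index.get((normalise_name(rec["Name"]), rec["DOB"]), []):
            linked.append({**rec, **match})  # consolidated Record
    return linked

tax_silo = [{"Name": "Jane  Citizen", "DOB": "1970-01-01", "Tax-Id": "T1"}]
health_silo = [
    {"Name": "jane citizen", "DOB": "1970-01-01", "Condition": "asthma"},
    {"Name": "Jane Citizen", "DOB": "1980-05-05", "Condition": "none"},
]
print(link(tax_silo, health_silo))
```

The single consolidated Record now brings together a tax Identifier and a health condition that were previously held apart.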
A related concept is 'Identity Silos' - the limitation of the use of an (Id)Entifier to a specific context, IS or data-collection. This has been a strong protection for human values. (The term is my own coinage, in consistent use since Clarke 2006, and published in Clarke 2008, but appears to have been independently coined by a number of authors). The breaking down of Identity Silos represents the destruction of one of the most potent forms of data privacy protection, and hence is a significant contributor to the falls in organisational trustworthiness, and to public nervousness about manipulation of individuals' behaviour by governments and corporations alike.
An alternative approach to correlation among Identifiers is the use of a Multi-Purpose Identifier. A common example is national registration numbers assigned to residents in many European countries, which are used within some cluster of related functions such as taxation, health insurance and self-funded pensions (also referred to as national insurance or superannuation). A General-Purpose Identifier, such as the national identity number that is imposed on the residents of countries such as Denmark, Estonia and Malaysia, is intended to enable the consolidation of all of an Entity-Instance's multiple Identity-Instances, complete the destruction of Identity Silos, deny the possibility of Nymity, and thereby provide the State, individual agencies and individual corporations with far greater power over the people they deal with (Clarke 1994d, 2006).
Entification refers to the process whereby Data is associated with a particular Human Entity-Instance. This depends on the acquisition of an Entifier such as a biometric, or an imposed biometric such as an implanted chip. All forms of biometric acquisition are highly personal and threatening, and many are demeaning. For example, high-quality recording of a thumbprint or fingerprints involves a skilled operator grasping the person's wrist and controlling the hand's movement, and iris-scans and retinal-scans involve submission of the body to whatever device the measuring organisation imposes on the individual. The moderate quality of those biometric measures results in a material degree of error and hence mistaken identity. The negative impacts of those errors commonly fall on the individual. Entifiers pre-stored on a Token such as a chip-card can be captured in a technologically-assisted or -performed manner. However, that greatly increases the risk of the Entifier being associated with the wrong person.
Pastoralists have had no qualms about clipping RFID-cards onto the ears of entire herds of stock-animals. Pet-owners have accepted the injection of chips into their beloved animals because they perceive it to increase the chances of a lost pet being returned to its owner. The same approaches, however, have historically excited revulsion when applied to humans. Early applications to humans have included chips in 'anklets' worn by convicts, and even remandees, in 'prisons without walls', and in military 'dog-tags' to assist the identification of combat casualties. There have also been a few consensual applications, such as patient-tags to help ensure that operations are performed on the right person, and chips injected into the bodies of staff in research facilities, and of customers of fashionable bars, who want doors to open automatically for them. On the other hand, considerable concern has been expressed about Tokens imposed on the aged, and the insertion of chips into the tooth-enamel of children, to identify victims of kidnapping and abduction, proved unattractive to parents. It remains to be seen whether, and to what extent, human values will be overridden and/or voluntarily sacrificed through this form of objectification of individuals.
Identification refers to the process whereby Data is associated with a particular Identity-Instance, in this case an Identity-Instance used by a human. It involves the acquisition of an Identifier, such as a name or a customer number. This may be provided by the person concerned, by voice or in textual form, by displaying a Token such as a membership card, or by making a Token available that contains a pre-stored Identifier capable of being read by a device operated by an organisation.
IS are designed to assist organisations in administering their interactions with humans by recording Data-Item-Values for relevant (Id)Entity-Instances. The Data-Item-Values for each particular (Id)Entity-Instance are stored in a Record that contains one or more of its Entifiers or Identifiers. Each new Transaction contains an (Id)Entifier, which is used to locate the appropriate Record. Decisions can then be made, actions taken, and amendments made to the Record for that person.
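A minimal sketch of that Record-locating pattern follows, assuming an in-memory store in which every (Id)Entifier held in a Record is indexed to that Record. The class names, identifier formats and Transaction layout are hypothetical, chosen only to show that any of a Record's (Id)Entifiers leads to the same Record.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Data-Item-Values held about one (Id)Entity-Instance."""
    identifiers: frozenset            # the (Id)Entifiers the Record is indexed by
    data: dict = field(default_factory=dict)


class RecordStore:
    """Hypothetical in-memory store: every (Id)Entifier points to one Record."""

    def __init__(self):
        self._index = {}  # (Id)Entifier -> Record

    def add(self, record):
        clashes = [i for i in record.identifiers if i in self._index]
        if clashes:
            raise ValueError(f"already indexing another Record: {clashes}")
        for ident in record.identifiers:
            self._index[ident] = record

    def lookup(self, ident):
        return self._index[ident]

    def apply(self, transaction):
        """Locate the Record via the Transaction's (Id)Entifier, then amend it."""
        record = self._index.get(transaction["id"])
        if record is None:
            raise KeyError(f"no Record for {transaction['id']!r}")
        record.data.update(transaction["values"])
        return record


store = RecordStore()
store.add(Record(frozenset({"CUST-1042", "a.smith@example.org"}), {"name": "A. Smith"}))
store.apply({"id": "a.smith@example.org", "values": {"postcode": "2611"}})
print(store.lookup("CUST-1042").data)  # {'name': 'A. Smith', 'postcode': '2611'}
```

Checking for clashes before indexing keeps one (Id)Entifier from silently pointing at two different Records.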
During the early decades of IS, the primary source of organisations' Data about individuals was Transactions between that organisation and the individual concerned. Since the late 20th century, however, organisations have increasingly drawn Data from multiple, additional sources, and consolidated it into individuals' Records. The reliability of the association, the potential conflicts among meanings of apparently similar Data-Items, and the nature of the original collection and subsequent handling, result in a degree of doubt about data-quality standards in such circumstances. The degree of expropriation of Personal Data has intensified enormously since the emergence of the Digital Surveillance Economy c.2005. This commenced with the inversion of the originally user-driven World Wide Web by means of Web 2.0 technologies, and the explosion of social media (Zuboff 2015, Clarke 2019).
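Where sources share no common Identifier, consolidation relies on matching other Data-Items, and the association may simply be wrong. The following sketch uses a deliberately naive match key (normalised name plus birth year) over invented Records; the names, suburbs and the `debt_default` Data-Item are hypothetical.

```python
# Hypothetical Records from two sources that share no common Identifier.
own_customers = [
    {"name": "J. Nguyen", "birth_year": 1980, "suburb": "Chapman"},
]
acquired_list = [
    {"name": "J. Nguyen", "birth_year": 1980, "suburb": "Parramatta",
     "debt_default": True},
]


def match_key(rec):
    """Naive match key: name with dots and spaces removed, plus birth year."""
    return (rec["name"].lower().replace(".", "").replace(" ", ""), rec["birth_year"])


def naive_merge(primary, secondary):
    """Attach Data from `secondary` to every `primary` Record with the same key."""
    index = {}
    for rec in secondary:
        index.setdefault(match_key(rec), []).append(rec)
    merged = []
    for rec in primary:
        out = dict(rec)
        for other in index.get(match_key(rec), []):
            for k, v in other.items():
                if k in out and out[k] != v:
                    out.setdefault("conflicts", []).append(k)  # noted, then ignored
                elif k not in out:
                    out[k] = v
        merged.append(out)
    return merged


result = naive_merge(own_customers, acquired_list)
print(result[0])
```

The conflicting suburb hints that the two Records may relate to different people, yet the naive merge still attaches the adverse `debt_default` value to the organisation's own customer.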
The term 'Digital Persona' refers to a model of an individual's public personality that is based on Data, maintained by Transactions, and intended for use as a proxy for the individual. In practical terms, it is a collection of Data stored in a Record that is rich enough to provide an organisation with what it treats as an adequate impression of the represented Entity or Identity. The term is my own coinage, first presented at the Computers, Freedom & Privacy Conference in San Francisco (Clarke 1993) and published in Clarke (1994a); but it is in any case an intuitive term, and has gained some degree of currency (Clarke 2012). It is quite common to see the term 'identity' used to refer to what is called here a Digital Persona; but 'identity' has many meanings, and to avoid ambiguity it is far preferable to use some other term. Another candidate term is e-persona. The term 'partial' (which originated in the sci-fi genre) is also a contender, because it underlines the inherent incompleteness of a Digital Persona in comparison with the real-world Entity or Identity it represents.
A 'Projected Digital Persona' is under the control of the individual, and is fundamental to an individual's sense of self and self-esteem. It is vital to all kinds of performance art, from stage acting to job interviews and social media imagery and influencing. It enables individuals to achieve a safe space in which they can indulge in psychological game-playing (e.g. using handles and avatars), cultural creativity (e.g. using stage-names and noms de plume), inventive and innovative behaviour of a technical and economic nature, whistleblowing to expose waste, hypocrisy and corruption, and political opinion and dissent. The Projected Digital Persona is, of course, also exploited for criminal purposes, including the avoidance of Identification and the performance of various kinds of misleading behaviour that may be subject to civil and criminal sanctions, such as false rumours, 'alternative facts', defamation, deceptive commercial conduct and fraud.
An 'Active Projected Digital Persona' is capable of taking actions as an agent for the individual. An agent may be as simple as an auto-responder to emails, or as complex as a bot that conducts social interactions intended to create and maintain a Digital Persona radically different from the individual it nominally represents, and it may have varying degrees of autonomy. A Projected Digital Persona may constructively misrepresent the individual, to the point of in effect creating a pseudo-individual: an Identity-Instance with only a very loose association with one or more underlying Entity-Instances. This may be done for psychological purposes as a form of self-defence or self-entertainment (an alter ego), for social entertainment (as in role-playing in games), as a means of obfuscating and falsifying the person's profile to avoid manipulation by marketing organisations, or as a means of conducting criminal behaviour while avoiding identification by law enforcement agencies.
An 'Imposed Digital Persona' is controlled by someone other than the individual it is associated with. From the outset, it was clear that there was scope for an 'Active Imposed Digital Persona' to "[enable] people's interests or proclivities [to] be inferred from their recent actions, and appropriate goods or services offered to them by the supplier's computer program using program-selected promotional means" (Clarke 1994a, p. 83). This idea entered the mainstream from 2005 onwards, and it forms the basis of the business model used by social media corporations. Soon after its coinage, the term Digital Persona was also used by an IT industry CEO, Eric Schmidt, then at Novell (Adams 1997). In 2001, Schmidt became CEO of Google, and departed from the company only in 2020. The Active Imposed Digital Persona has been one of the key elements of the Digital Surveillance Economy (Clarke 2019).
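The mechanism described in the quotation can be sketched in a few lines. The version below is a hypothetical illustration that uses simple frequency counts over a window of recent actions; it does not describe any particular corporation's system, and the catalogue, categories and window size are invented.

```python
from collections import Counter

# Hypothetical catalogue: interest category -> promotion.
PROMOTIONS = {
    "running": "discount on trail shoes",
    "travel": "fare sale to Bali",
    "cooking": "cast-iron pan bundle",
}


def infer_interests(actions, window=5):
    """Rank interest categories by frequency among the most recent actions."""
    recent = actions[-window:]
    counts = Counter(action["category"] for action in recent)
    return [category for category, _ in counts.most_common()]


def select_offer(actions, window=5):
    """Program-selected promotion for the top inferred interest, if any."""
    for category in infer_interests(actions, window):
        if category in PROMOTIONS:
            return PROMOTIONS[category]
    return None


history = [{"category": c} for c in
           ["cooking", "travel", "running", "running", "travel", "running"]]
print(select_offer(history))  # discount on trail shoes
```

Because only the recent window is consulted, the earlier interest in cooking plays no part in the inference, which illustrates how partial the resulting Digital Persona is.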
Every Digital Persona is a simplification of a complex reality; and because it is incomplete, its use embodies a risk of mis-judgement. A Digital Persona is commonly constructed from multiple sources, and because those sources are imperfectly compatible, 'artefacts' may result, i.e. unwarranted inferences may be drawn about the individual associated with the Digital Persona. Moreover, although an Imposed Digital Persona may be known to the individual it is used to represent, it is very commonly used covertly, and is not merely inaccessible to the person concerned but unknown to them.
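One way in which such 'artefacts' arise is when two sources supply a Data-Item with the same name but different meanings. In this hypothetical sketch, one source records annual gross income and the other weekly net income; the threshold and values are invented. A naive consolidation averages them and infers 'low income', whereas a version that records and checks the meaning of each value declines to draw the inference.

```python
# Hypothetical observations of one person's 'income' from two sources:
# same Data-Item name, different meanings.
obs_a = {"income": 78000, "period": "annual", "basis": "gross"}
obs_b = {"income": 1150, "period": "weekly", "basis": "net"}

LOW_INCOME_THRESHOLD = 45000  # hypothetical, in annual gross terms


def naive_profile(*observations):
    """Average the values, ignoring what each one means."""
    mean = sum(o["income"] for o in observations) / len(observations)
    return {"income": mean, "low_income": mean < LOW_INCOME_THRESHOLD}


def careful_profile(*observations):
    """Draw the inference only if every value has the threshold's meaning."""
    meanings = {(o["period"], o["basis"]) for o in observations}
    if meanings != {("annual", "gross")}:
        # Declining to infer is better than manufacturing an artefact.
        return {"income": None, "low_income": None, "meanings": sorted(meanings)}
    mean = sum(o["income"] for o in observations) / len(observations)
    return {"income": mean, "low_income": mean < LOW_INCOME_THRESHOLD,
            "meanings": sorted(meanings)}


print(naive_profile(obs_a, obs_b))  # {'income': 39575.0, 'low_income': True}
print(careful_profile(obs_a, obs_b))
```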
In one sense, consumers might be pleased that marketing corporations are mis-reading them. On the other hand, search-services bias the sequence in which results are displayed based on the Digital Persona, so all forms of discovery are compromised: the service to consumers is manipulative (to the extent that the Digital Persona is an accurate representation and is applied effectively) and/or materially degraded (to the extent that it is inaccurate or poorly applied). Many individuals use the same Identity for multiple purposes, so manipulation may arise in one context and degraded services in another.
Dealings with government agencies are problematic in a somewhat different way, because the risk of wrong and unreasonable decisions can be high, the risk is borne by the individual rather than the agency, and demonstrating innocence is near-impossible, particularly where the Imposed Digital Persona is covert and the content that constitutes it is inaccessible to the person. The nature of accusations is consequently unclear, in a manner reminiscent of Kafka's 'The Trial'.
The mobilisation of resources at scale depends on large groups acting in a coordinated fashion. Contemporary societies use many different organisational forms. The law recognises them as 'legal persons', and, in some common law countries, as either 'bodies politic' or 'bodies corporate'. In all cases, they are independent of the individuals who call them into being and who are from time to time their directors, members and employees.
Organisations present a challenge to the application of the pragmatic model of (Id)Entity. Organisations are in some sense an Entity, but, unlike both objects and humans, they lack corporeal, Real-World form, and, unlike humans, they lack the capacity to act directly in the Real-World. On the other hand, they are accepted in law and practice as having existence, and as having the capacity to make decisions, act in the Real-World through their agents, bear responsibilities and incur liabilities.
The approach adopted here is to recognise organisations as Virtual Entities, with Attributes and Relationships, that may present many Identities, to different people and organisations, and in different contexts, including as customer, supplier, employer, contractor, lender, borrower, investor, regulator or prosecutor. Organisational (Id)Entities are distinguished by means of many (Id)Entifiers, in particular names (e.g. associated with the organisation as a whole, and with particular business units, divisions, branches, trading-names, trademarks and brands), and numbers and codes assigned by other Entities. A key requirement, however, is that organisational (Id)Entities must have Relationships with other (Id)Entities of a principal-agent nature, whereby the organisational (Id)Entity delegates powers to decide and act, and, with them, relevant parts of its responsibilities. The delegation may be through a chain of Virtual (Id)Entities (e.g. accounting firms), but it must ultimately, of necessity, reach a human (Id)Entity.
With ongoing developments in the sophistication of artefacts, the scope must also be recognised for powers to be delegated to 'Active-Artefact Entities' (robots) and 'Active-Artefact Identities' (computing processes). Debate has commenced about accountability for the outcomes of acts performed, or caused to be performed, by Active-Artefact (Id)Entities. In the century since Capek introduced the word 'robot', concerns about the implications of robotics have migrated from artistic imagination to technological prospect. The artistic notion of a species of robots originated with Capek, and was articulated by Asimov and then by Arthur C. Clarke between about 1940 and 1990. Surprisingly, however, the earliest use of the term roboticus sapiens appears to be in Clarke (2014).
Many people have great difficulty with the notion that an organisation or a human could be permitted to avoid responsibilities by delegating them to an artefact. The approach adopted in the pragmatic model presented here is that organisational and human (Id)Entities may delegate powers to artefact (Id)Entities, but they can neither delegate nor escape responsibilities associated with, or arising from, the exercise of those powers.
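The rule adopted here can be expressed as a small data model. In the hypothetical sketch below, each actor records the principal that delegated powers to it; responsibility for an act is traced back up that chain and rests with every non-artefact principal, so a chain consisting only of artefacts is rejected as leaving nobody responsible. The actor names and kinds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

HUMAN, ORGANISATION, ARTEFACT = "human", "organisation", "artefact"


@dataclass
class Actor:
    name: str
    kind: str                               # HUMAN, ORGANISATION or ARTEFACT
    delegated_by: Optional["Actor"] = None  # principal that granted its powers


def chain_of_principals(actor):
    """Return [actor, its principal, that principal's principal, ...]."""
    chain, seen = [], set()
    node = actor
    while node is not None:
        if id(node) in seen:
            raise ValueError("circular delegation")
        seen.add(id(node))
        chain.append(node)
        node = node.delegated_by
    return chain


def responsible_parties(actor):
    """Names of those who remain responsible for an act performed by `actor`.

    Powers may pass to artefacts, but responsibility does not: it stays
    with every non-artefact in the delegation chain.
    """
    chain = chain_of_principals(actor)
    responsible = [a.name for a in chain if a.kind != ARTEFACT]
    if not responsible:
        raise ValueError(f"no non-artefact principal behind {actor.name}")
    return responsible


acme = Actor("Acme Pty Ltd", ORGANISATION)
clerk = Actor("J. Citizen", HUMAN, delegated_by=acme)
invoice_bot = Actor("invoice-bot", ARTEFACT, delegated_by=clerk)
print(responsible_parties(invoice_bot))  # ['J. Citizen', 'Acme Pty Ltd']
```

Identity comparisons (`id(node)`) are used for cycle detection because dataclasses with generated equality are not hashable.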
A pragmatic model of (Id)Entification has been proposed, designed to reflect the relevant complexities of the Real World that need to be appreciated and dealt with by IS professionals and by those IS researchers who address the needs of stakeholders. The model comprises two levels: the conceptual level and the data level. The notions have been introduced progressively, with a definition provided for each term, and the definitions extracted into a Glossary. A first pass across the territory addressed the simpler kinds of entities, namely inanimate objects and artefacts. Further concepts were then introduced that are necessary to deal with humans, organisations and active artefacts.
[ NEEDS A CROSS-CHECK
At various points in the exposition of the model, reference has been made to assertions about Data, about (Id)Entities, and about the capacity of a particular (Id)Entity to act on behalf of another (Id)Entity. That raises the question of what evidence supports these assertions. The next paper in the series applies the pragmatic model developed in this paper to the matter of authentication of assertions (Clarke 2021c).
[ From various,
TEXT, authorisations, permissions, privileges, sign-on (single and otherwise)
[ From various, incl. http://www.rogerclarke.com/ID/IdModel-1002.html#MAs
[ From various, and check also:
This is being developed in a related project, at http://rogerclarke.com/SOS/IQLO.html
NEEDS A SUMMARY OF DIFFERENCES BETWEEN THE MAINSTREAM AND THIS MODEL - ELSE WHY HAVE WRITER AND READER GONE TO SO MUCH TROUBLE??
Adams B. (1997) '' Deseret News, 20 November 1997, at http://www.deseretnews.com/article/595933/Novell-introduces-goal-putting-a-friendly-face-on-computer-networking.html?pg=all
Baumer E.P.S. (2015) 'Usees' Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI'15), April 2015
Chen P.P.S. (1976) 'The Entity-Relationship Model - Toward a Unified View of Data' ACM Transactions on Database Systems 1 (March 1976) 9-36
Clarke R. (1992) 'Extra-Organisational Systems: A Challenge to the Software Engineering Paradigm' Proc. IFIP World Congress, Madrid, September 1992, PrePrint at http://www.rogerclarke.com/SOS/PaperExtraOrgSys.html
Clarke R. (1993) 'Computer Matching and Digital Identity' Proc. Computers Freedom & Privacy, Burlingame CA, March 1993, at http://cpsr.org/prevsite/conferences/cfp93/clarke.html/, PrePrint at http://rogerclarke.com/DV/CFP93.html
Clarke R. (1994a) 'The Digital Persona and its Application to Data Surveillance' The Information Society 10, 2 (June 1994), at http://rogerclarke.com/DV/DigPersona.html
Clarke R. (1994b) 'Human Identification in Information Systems: Management Challenges and Public Policy Issues' Information Technology & People 7,4 (December 1994) 6-37, at http://rogerclarke.com/DV/HumanID.html
Clarke R. (2001a) 'Authentication: A Sufficiently Rich Model to Enable e-Business' Xamax Consultancy Pty Ltd, December 2001, at http://rogerclarke.com/EC/AuthModel.html
Clarke R. (2001b) 'The Re-Invention of Public Key Infrastructure' Working Paper, Xamax Consultancy Pty Ltd, 22 December 2001, at http://rogerclarke.com/EC/PKIReinv.html
Clarke R. (2002) 'Biometrics' Inadequacies and Threats, and the Need for Regulation' Xamax Consultancy Pty Ltd, April 2002, at http://www.rogerclarke.com/DV/BiomThreats.html
Clarke R. (2003) 'Authentication Re-visited: How Public Key Infrastructure Could Yet Prosper' Proc. 16th Int'l eCommerce Conf., Bled, Slovenia, 9-11 June 2003, PrePrint at http://www.rogerclarke.com/EC/Bled03.html
Clarke R. (2006) 'National Identity Schemes - The Elements' Xamax Consultancy Pty Ltd, February 2006, at http://www.rogerclarke.com/DV/NatIDSchemeElms.html
Clarke R. (2008) '(Id)Entities (Mis)Management : The Mythologies underlying the Business Failures' Proc. 'Managing Identity in New Zealand', Wellington NZ, 29-30 April 2008, PrePrint at http://www.rogerclarke.com/EC/IdMngt-0804.html
Clarke R. (2010a) 'A Sufficiently Rich Model of (Id)entity, Authentication and Authorisation' Proc. IDIS 2009 - The 2nd Multidisciplinary Workshop on Identity in the Information Society, LSE, London, June 2009, rev. February 2010 at http://rogerclarke.com/ID/IdModel-1002.html
Clarke R. (2010b) 'A Sufficiently Rich Model of (Id)entity, Authentication and Authorisation: Supplementary Materials' Xamax Consultancy Pty Ltd, February 2010, at http://rogerclarke.com/ID/IdModel-Supp-1002.html
Clarke R. (2010c) 'A Sufficiently Rich Model of (Id)entity, Authentication and Authorisation: Glossary of Terms' Xamax Consultancy Pty Ltd, February 2010, at http://rogerclarke.com/ID/IdModel-Gloss-1002.html
Clarke R. (2010d) 'A Sufficiently Rich Model of (Id)entity, Authentication and Authorisation: Application of the Model' Xamax Consultancy Pty Ltd, February 2010, at http://rogerclarke.com/ID/IdModel-App-1002.html
Clarke R. (2014) 'Promise Unfulfilled: The Digital Persona Concept, Two Decades Later' Information Technology & People 27, 2 (June 2014) 182-207, at http://www.rogerclarke.com/ID/DP12.html
Clarke R. (2014) 'What Drones Inherit from Their Ancestors' Computer Law & Security Review 30, 3 (June 2014) 247-262, PrePrint at http://www.rogerclarke.com/SOS/Drones-I.html
Clarke R. (2019) 'Risks Inherent in the Digital Surveillance Economy: A Research Agenda' Journal of Information Technology 34, 1 (March 2019) 59-80, PrePrint at http://www.rogerclarke.com/EC/DSE.html
Clarke R. (2021) 'A Pragmatic Metatheoretic Model for Information Systems Practice and Research' Xamax Consultancy Pty Ltd, May 2021, at http://rogerclarke.com/SOS/POEisy.html
Fischer-Hübner S. & Lindskog H. (2001) 'Teaching Privacy-Enhancing Technologies' Proc. IFIP WG 11.8 2nd World Conf. on Information Security Education, Perth, Australia
Mansfield A.J. & Wayman J.L. (2002) 'Best Practices in Testing and Reporting Performance of Biometric Devices: Version 2.01' National Physical Laboratory Report, CMSC 14/02, United Kingdom, August 2002
Newell S. & Marabelli M. (2015) 'Strategic Opportunities (and Challenges) of Algorithmic Decision-Making: A Call for Action on the Long-Term Societal Effects of 'Datification'' The Journal of Strategic Information Systems 24, 1 (2015) 3-14, at http://marcomarabelli.com/Newell-Marabelli-JSIS-2015.pdf
Ohm P. (2010) 'Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization' UCLA Law Review 57 (2010) 1701-1777, at http://www.patents.gov.il/NR/rdonlyres/E1685C34-19FF-47F0-B460-9D3DC9D89103/26389/UCLAOhmFailureofAnonymity5763.pdf
Slee T. (2011) 'Data Anonymization and Re-identification: Some Basics Of Data Privacy: Why Personally Identifiable Information is irrelevant' Whimsley, September 2011, at http://tomslee.net/2011/09/data-anonymization-and-re-identification-some-basics-of-data-privacy.html
Sweeney L. (2000) 'Simple Demographics Often Identify People Uniquely' Data Privacy Working Paper 3, Carnegie Mellon University, 2000, at https://dataprivacylab.org/projects/identifiability/paper1.pdf
Zuboff S. (2015) 'Big other: Surveillance capitalism and the prospects of an information civilization' Journal of Information Technology 30, 1 (March 2015) 75-89, at https://cryptome.org/2015/07/big-other.pdf
This Working Paper draws on, consolidates and extends a long series of more than 10 working papers and 5 refereed publications. They are listed here as a matter of record, and as a form of declaration of re-use or 'self-plagiarism'. Some are cited in the Working Paper. The conceptualisations and the model have developed a great deal during the more than 30 years over which the papers extend. There are accordingly inconsistencies both among them, and between the earlier papers and the present work. This includes considerable adjustment to the most recent previous iteration of the model, published in 2010, in order to achieve greater consistency with prior literature and with the subsequently proposed pragmatic metatheoretic model.
Clarke R. (1990) 'Information Systems: The Scope of the Domain' Xamax Consultancy Pty Ltd, January 1990, at http://rogerclarke.com/SOS/ISDefn.html
Clarke R. (1992) 'Fundamentals of Information Systems' Xamax Consultancy Pty Ltd, September 1992, at http://rogerclarke.com/SOS/ISFundas.html
Clarke R. (1992) 'Knowledge' Xamax Consultancy Pty Ltd, September 1992, at http://rogerclarke.com/SOS/Know.html
Clarke R. (1994a) 'The Digital Persona and its Application to Data Surveillance' The Information Society 10, 2 (June 1994), at http://rogerclarke.com/DV/DigPersona.html
Clarke R. (1994) 'Human Identification in Information Systems: Management Challenges and Public Policy Issues' Information Technology & People 7,4 (December 1994) 6-37, at http://rogerclarke.com/DV/HumanID.html
Dempsey G. (1999) 'Revisiting Intellectual Property Policy: Information Economics for the Information Age' Prometheus 17, 1 (March 1999) 33-40, at http://www.rogerclarke.com/II/DempseyProm.html
Clarke R. (2001) 'Information Management, Information Policy, Knowledge Management and Knowledge Organisations' Xamax Consultancy Pty Ltd, March 2001, at http://xamax.com.au/EC/IMKM.html
Clarke R. (2003) 'Key Insights from the Philosophy of Science' Xamax Consultancy Pty Ltd, January 2003, slide-set, at http://rogerclarke.com/SOS/10-PhilSci-5.pdf
Clarke R. (2003) 'Authentication Re-visited: How Public Key Infrastructure Could Yet Prosper' Proc. 16th Int'l eCommerce Conference, Bled, Slovenia, June 2003, at http://rogerclarke.com/EC/Bled03.html
Clarke R. & Dempsey G. (2004) 'The Economics of Innovation in the Information Industries' Xamax Consultancy Pty Ltd, April 2004, at http://www.rogerclarke.com/EC/EcInnInfInd.html
Clarke R. (2004) 'Identification and Authentication Fundamentals' Xamax Consultancy Pty Ltd, May 2004, at http://www.rogerclarke.com/DV/IdAuthFundas.html
Clarke R. (2004) 'Identification and Authentication: Glossary' Extract from a monograph on 'Identity Management: The Technologies, Their Business Value, Their Problems, and Their Prospects', at http://www.xamax.com.au/EC/IdMngt.html, May 2004, at http://www.rogerclarke.com/EC/IdAuthGloss.html
Clarke R. (2008) 'Terminology Relevant to Identity in the Information Society' Xamax Consultancy Pty Ltd, August 2008, at http://rogerclarke.com/DV/IdTerm.html
Clarke R. (2019) 'Beyond De-Identification: Record Falsification to Disarm Expropriated Data-Sets' Proc. 32nd Bled eConference, June 2019, PrePrint at http://www.rogerclarke.com/DV/RFED.html
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor associated with the Allens Hub for Technology, Law and Innovation in UNSW Law, and a Visiting Professor in the Research School of Computer Science at the Australian National University.
Xamax Consultancy Pty Ltd
ACN: 002 360 456
78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916
Created: 3 April 2021 - Last Amended: 17 June 2021 by Roger Clarke - Site Last Verified: 15 February 2009