Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2018
PrePrint of 13 October 2016
Published in Euro. Data Protection L. 2, 4 (Dec 2016) 555-560
Charles Raab and Roger Clarke **
© Xamax Consultancy Pty Ltd, 2016
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/DV/DSEFR.html
Although it has antecedents going back to the 1940s (Bynum 2015), ethics has become the flavour of the year in the world of information practice. Organisations are scrambling to define and implement ethics-based systems and routines that go beyond legal compliance into the realm of doing good, or refraining from doing bad, when it comes to processing personal data. They have set up committees, panels and boards to deliberate on how this might be done; e.g., the European Data Protection Supervisor's Ethics Advisory Board, and the Information Accountability Foundation's "Big Data Ethics Initiative". Do initiatives like the latter signal the "new piety", or are they just "doing the sincerity thing", designed more for facing outward to gain the trust and confidence of regulators and the general public than for facing inward and elevating the level of understanding and practice of those who crunch data? Are there ethical principles that guide these developments? If so, what are they, and how can they be made potent in the private and public sectors, where information systems have been devised for efficiency, effectiveness, and commercial or policy goals that are, in the first instance, not appraised in terms of ethical precepts regarding privacy and other rights?
These questions come to mind with regard to a recent entrant into this field, the UK Cabinet Office's 2016 Data Science Ethical Framework, commended to the world by Matt Hancock, who is now Minister of State for digital policy. A well-conceived framework would be welcome here, enabling government to act ethically in its policy projects where "data science" is applied. It would surpass mere adherence to data protection laws and regimes by grounding the work of government more securely in values relating to some clear conception of the human condition that treats people as something more than the objects of policies, even where the policies aim at human betterment according to one or another political ideology. To be effective, it would need to be precise, well-grounded in a grasp of privacy and other ethical dimensions, and clear in its guidance to those in the policy-making environment. Unfortunately, the Cabinet Office's bid falls far short of these criteria, for reasons that we explore here.
The declared intention is to "give civil servants guidance on conducting data science projects, and the confidence to innovate with data". However, it is not easy to see where the Framework comes from philosophically and data-scientifically, nor to what extent it is a document for the use of "government data scientists and policymakers" or an instrument for public reassurance. It is unclear what process was adopted in developing the Framework, but the announcement included a request to "the public, experts, civil servants and other interested parties to help us perfect and iterate". It is therefore a work in progress, but while that stance is commendable in a rapidly changing informatised world, it is no excuse for muddled thinking and sketchy guidance. Moreover, the subsequent desired iterations are likely to be constrained in scope and rethinking by the uses to which it may have already been put by data scientists and public officials.
First, we give a brief overview of the document's contents. It starts with a few paragraphs on why data science ethics is important, and identifies six principles "based on existing law" for guiding the user's project. These are then reiterated first with a few explanatory sentences for each; then once more with a quick scalar checklist for the user to answer some questions about the current position. There follow some free-text boxes for each principle, which, when answered, "will also act as your Privacy Impact Assessment" (PIA) (although it is earlier said, merely, that "[t]he Information Commissioner's Office has confirmed that the checklist can form the basis of" a PIA). Then, in the Annex, each principle is again reiterated with more discursive prose, the repeated scalar checklist, some boxed case studies of "real life examples" and a box of "Web scraping guidance" in six bullet-points. All done and dusted (but open for iteration) in 17 pages. But note one telling alteration, which occurs in the oft-repeated principle 5: most repeats say "Be as open and accountable as possible", but in the ultimate rehearsal, the words "without putting your project at risk" and "without putting us at risk" are added, somewhat giving the game away. We will comment further on some of these items below.
Considering first the Framework's provenance: there are many schools of ethics and a flourishing literature on digital or "cyber" ethics. One would not expect a Framework drafted in a "tools" mode to resemble a philosophical or social-scientific treatise, but it is far from clear just what is meant by "ethics" in the context of data science. Nor is there any encouragement for the odd inquisitive policymaker to dip her toes into the subject a bit further, for there are no references or pauses for thought about what it might mean to be ethical when using citizens' personal data, above and beyond mere legal compliance. It is important that a document intended to stimulate and guide data-driven policy initiatives that might have ethically negative consequences be the subject of an evaluation process; but the drafting does not seem to be the result of consultation with the affected public, with their representative and advocacy organisations, or with other outsiders.
Second, considering what the Framework is and what it might be used for: the Framework reads like something between a catechism and a checklist to be applied to data science, but the document itself does not provide any background on what it means by "data science". It instead links to a brief introduction (OPMT 2016), which refers to data science as using tools to "analyse and visualise big and complex data to provide useful insight". Some confusion is evident, however, in that the Framework suggests that "data science can help us collect ... data", despite that activity's lying outside the definition's scope and itself raising some acute legal and ethical issues. The Framework's expressed objective is one-sided: to give policymakers "the confidence to innovate" by "bringing together...laws and standards" (e.g., the Data Protection Act, the Intellectual Property Act, and professional standards for data processing). There is no mention of ethics as such in this confidence-boosting amalgamation. The worthy qualification that there should be "respect for privacy" is equivocal: the aim that "no-one suffers unintended [original emphasis] negative consequences" leaves open the possibility that privacy intrusion or other negative consequences of the employment of data science in policymaking - for example, through profiling and data re-purposing - for the individual or society may be both intended and foreseen. That possibility is not addressed, although it would seem to be a prime candidate for the application of ethics, if only the Cabinet Office told us what that means.
The dominance of the project-facilitation objective plays through into the process, in which it is said that "the public benefit of doing the project needs to be balanced against the risk of doing so", thus deploying the tired mantra of "balancing" without indicating how, or whether, the Framework's application of ethics can get any closer to a reliable method of doing that than generations of practitioners and regulators have been able to do (Raab 1999). At no stage is there any act of weighing up the pros and cons and making a go/no-go decision, or of showing how the concepts of necessity and proportionality - ethical precepts as much as legal - should be applied. The assumption throughout the document is that there will be public benefit that will justify the intrusions. The purpose of the process appears to be to get the project through, avoiding harm where that can be done, and educating the public. Such "education" seems to amount to telling people, de haut en bas, how they or society will benefit and how risks will be managed; but a mature and more sophisticated process of public policymaking in a democracy should embody a richer, and arguably more ethical, approach to handling relationships between increasingly sceptical citizens and the error-prone state.
The six principles deserve a closer look. We append some comments to each:
This lays a weak foundation for the evaluation process, in the following ways:
This is a positive contribution, but a much clearer explanation of the nature, process and pitfalls of data analytics is necessary, together with assistance in drilling down to greater detail.
This refers to data quality, but falls far short of providing the guidance needed by civil servants evaluating initiatives. It also largely overlooks three major sources of problems:
Common problems in data analysis arise from the assumptions that the technique makes about the nature of the data, including the scale on which the data was recorded, its accuracy and precision, and the handling of missing data. The statement is made (but seemingly only in relation to a small sub-set of data analytic techniques, viz. "machine learning algorithms") that "These tools are dependent on the input data and do have limits". However, the discussion of this complex topic lacks structure and clarity, and no references are provided. It is particularly concerning to see a glib example provided, of the kind beloved of the proponents of big data analytics: "searching on Google for flu symptoms is a good indicator of having the flu". This both conflates correlation with causality and mistakes moderate correlation for high correlation. It is precisely the kind of blunder that a Framework should be designed to prevent; yet it is presented as an example of the appropriate use of a dataset. This highlights a critical omission - the failure to warn readers that:
The document badly needs to be augmented, and to provide access to deeper treatment of data quality, data semantics, data compatibility, and process quality issues.
This segment of the document is also seriously inadequate, in the following ways:
The Framework urges accountability and transparency but is sketchy about how these are to operate; for instance, how, when and to whom policymakers and data scientists should be accountable, and the communication arrangements for transparency. It is understandable that civil servants might wish to avoid transparency where the aim might be jeopardised (e.g., regarding fraud, criminal or terrorist behaviour). However, the text fails to indicate how oversight should operate in monitoring the legitimacy of such avoidance, and the accountability or internal reporting procedures that should be undertaken in such exceptional cases. It leaves these issues as "further questions", a useful feature of the explanations given for all the principles; but in this case some better guidance is needed, given the ethical resonance of transparency and accountability. It is also a matter for concern that the discussion of this principle gives pride of place to making the case for the benefits of data science, and even sees this as a major reason for transparency. This tilts the playing field by giving policymakers a further opportunity to sell their projects to the public (see principle 1), and the link to principle 2 (minimum intrusion) needs to be strengthened.
The discussion overlooks the questions of shared data and of open data. Moreover, in data-protection terms the security of data is not only a matter of installing technical measures, but of organisational measures as well; however, these are not addressed. This is the crucial realm of information governance and its manifestation in management structures and procedures, and the Framework seems to pay no attention to it.
The principles, then, are a mixture of data-protection law derivations, public-policy desiderata, and some specifics of data science. These are further expounded and illustrated in the Framework, and up to a point, they may be worth having as principles, even though such guidance as they offer is unremarkable. However, they are somewhat confusingly also deployed as a series of steps in what purports to be a PIA process. This is intellectually and structurally unsound: whereas "principles" may have application at various stages of an assessment routine, "process" needs to be driven by the practicalities of acquiring and analysing information, and by its nature is not purely sequential, but includes branching paths and iterations. As far as PIA is concerned, the checklist purporting to constitute one is seriously misleading. Some aspects of the process described are likely to generate information of relevance to the PIA process; but the process in the Framework document is not structured in an appropriate manner to satisfy the needs of a PIA, and in many specific aspects it falls far short of the requirements identified in the voluminous literature on PIAs (e.g., Clarke 2009, 2011; Wright and De Hert 2012; Wright et al. 2014) and those expressed in the ICO's Code.
In addition, it is not clear who the addressees are, for the principles are sometimes phrased in the language of data science (e.g., de-identification, shadow analysis, machine learning algorithms, synthetic data), and elsewhere speak to generalist policy officials who, if they wish to get up to speed on what the data scientists might be doing, are not offered any guidance or references. In various places, the Framework refers to policymakers and government data scientists as "us" (who might be at risk from too much transparency and accountability), "people" (who need confidence to innovate), "you" (who performs a PIA, and who holds people's data), and the "policy team" (with whom data scientists' algorithm design should take place) in ways that hint at what the data science-assisted policymaking process should look like, without being explicit or consistent enough to relate ethical responsibilities to the overall picture in which "projects" are being developed, or to specific roles.
Perhaps the most unsatisfactory feature of the principles and their use is the Framework's flawed understanding of what ethics requires. Most notably, it confuses ethics with public opinion and public perceptions by its otherwise commendable emphasis upon finding out and "[u]nderstanding both stated and revealed public opinion (people's actual behaviour) about how people would want the data you hold about them to be used". Here it acknowledges that "public opinion is diverse and is shifting over time", thus recognising a dilemma, although offering no practical advice to policymakers and data scientists about what to do when discovering such diversity and fickleness. What if public opinion failed to see the public benefit that policymakers are convinced will result from their work? Would such opinion be deferred to as the last word? Or would the public be worked on or "nudged" to bring them into agreement? But, further, it is astonishing to say that "[b]oth the law and ethical practice require us to understand public opinion so we can work out what we should do" [emphasis added]. If ethics, and even law, are supposed to be the lodestar, public opinion cannot override or determine what the law or ethical precepts oblige policymakers to do: you cannot get an "ought" from an "is". The Framework thus damages its own claim to be setting ethical standards for data science's application to public policy projects. Because it stands uneasily between norms and practice (each of several and not necessarily congruent kinds), and because it clings closely - but not always adequately - to data protection law, it cannot move forward on an ethical front that it does not comprehend in any convincing way, even if the Framework professes to help decision-makers to "think through some of the ethical issues which sit outside the law". "Ethics" remains more a word than a usable basis for a practical method of performing data science in the milieu of policy.
There are too many unanswered questions, too many compromised motives, for the Framework to enjoin ethics upon practice or to get practitioners to understand what acting ethically means in the policy and analytical contexts in which they operate.
The UK Cabinet Office's "Data Science Ethical Framework" should be developed on the basis of prior contributions in the refereed literature that provide relevant information about the nature and potential negative impacts of big data, how to evaluate proposals, how to conduct ethical analyses, and how to conduct PIAs. It should be held up to the light of an ethical discourse that neither begins nor ends with policymakers' need for confidence in what they are doing, or with the need to keep abreast of public opinion.
The document is seriously deficient. It is so weak as to have the appearance of purely nominal guidance, designed not to filter out inappropriate applications of data analytics, but rather to provide a veneer of respectability, to head off criticisms that government agencies are conducting big data activities on an ad hoc basis, and thereby to enable clear sailing for projects without serious and creative wrestling with knotty ethical issues. In order to overcome the appearance of insincerity, it is essential that the Cabinet Office very promptly revise the document, perhaps through the crowdsourcing iteration it calls for as well as through some other reasoned process, considerably enhance it in order to address the long list of deficiencies including those that we have identified here, and ensure wide distribution of a revised and more cogent version so that a public process of deliberation can take place about the role of data science in ethical governance.
Terrell Bynum, 'Computer and information ethics' in Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Winter 2015 Edition), at http://plato.stanford.edu/archives/win2015/entries/ethics-computer/
Cabinet Office, Data Science Ethical Framework, Version 1.0 (19 May 2016)
Roger Clarke, 'Privacy Impact Assessment: Its Origins and Development' (2009) Computer Law & Security Review 123
Roger Clarke, 'An Evaluation of Privacy Impact Assessment Guidance Documents' (2011) IDPL 111
Roger Clarke, 'Quasi-Empirical Scenario Analysis and Its Application to Big Data Quality', Proc. 28th Bled eConference, Slovenia, 7-10 June 2015
Roger Clarke, 'Quality Assurance for Security Applications of Big Data', Proc. European Intelligence and Security Informatics Conference (EISIC), Uppsala, 17-19 August 2016
Ipsos MORI, 'Public Dialogue on the ethics of data science in government' (May 2016)
OPMT (Open Policy Making Toolkit), 'Data science: an introduction' [undated], at https://www.gov.uk/guidance/open-policy-making-toolkit/a-z#data-science-introduction
Charles Raab, 'From Balancing to Steering: New Directions for Data Protection' in Colin Bennett and Rebecca Grant (eds), Visions of Privacy: Policy Approaches for the Digital Age (University of Toronto Press 1999)
Charles Raab, 'Privacy, Social Values and the Public Interest' in Andreas Busch and Jeanette Hofmann (eds), 'Politik und die Regulierung von Information' ['Politics and the Regulation of Information'], Politische Vierteljahresschrift Sonderheft 46 (Nomos Verlagsgesellschaft 2012)
David Wright and Paul De Hert (eds), Privacy Impact Assessment (Springer 2012)
David Wright, Kush Wadhwa, Monica Lagazio, Charles Raab and Eric Charikane, 'Integrating privacy impact assessment in risk management' (2014) IDPL 155
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in Cyberspace Law & Policy at the University of N.S.W., and a Visiting Professor in Computer Science at the Australian National University.
Charles Raab was Professor of Government and is currently Professorial Fellow in Politics and International Relations, School of Social and Political Science at the University of Edinburgh. He is co-Director of the Centre for Research into Information, Surveillance and Privacy (CRISP).
Created: 13 October 2016 - Last Amended: 13 October 2016 by Roger Clarke - Site Last Verified: 15 February 2009