Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2021
Final Version of 11 March 2009
Commissioned as a contribution to a 'Collection of Essays from Industry Experts' on Deep Packet Inspection by the Privacy Commissioner of Canada
Roger Clarke **
© Xamax Consultancy Pty Ltd, 2008-09
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/II/DPI08.html
The job of an intermediary node on the Internet is to pass each packet on to another node closer to the addressee. Deep packet inspection involves an intermediary node also poking its nose inside the packet.
Inspection of the header, and even the contents of the message, may be consensual. Even if it is not consensual, it may be beneficial to all parties. However, the proliferation of uncontrolled, non-consensual access is currently threatening to undermine the open, public Internet as it has been known for its first 15 years of operation.
Worse, these intrusions bring with them the threat that communications over the Internet may become much less free than the communications channels that residents of relatively free nations used in the pre-Internet era.
The term 'deep packet inspection' refers to a technique that is being imposed on data communications networks in order to probe into the contents of passing traffic. This short paper commences with some background, provides an overview of the technique, and undertakes a brief analysis of its implications.
The term 'packet' is ambiguous, and there are advantages in avoiding it. This paper uses the more straightforward term 'message' to refer to that which passes from a sender to a recipient.
The Internet comprises a very large number of nodes, each of which is a computer capable of performing a wide range of functions. Messages are created in a 'sending node', and addressed to a 'receiving node'. In order to get from sender to recipient, messages pass through many other nodes, which are usefully referred to as 'intermediary nodes'. The number of intermediary nodes that messages pass through is typically about 20. A large message is broken into as many parts as necessary in order to comply with the maximum message-size that intermediary nodes along the way are prepared to handle.
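The splitting and re-joining of a large message described above can be sketched as follows. This is a minimal illustration, not how any particular protocol does it: the `Fragment` fields and the function names are assumptions made for the example, loosely inspired by the identifier, offset and 'more parts follow' markers that real protocols carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    message_id: int  # which original message this part belongs to (assumed field)
    offset: int      # position of this part's first byte in the original
    more: bool       # True if further parts follow this one
    data: bytes


def fragment(message_id: int, payload: bytes, max_size: int) -> list[Fragment]:
    """Split payload into parts no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    parts = []
    for offset in range(0, len(payload), max_size):
        chunk = payload[offset:offset + max_size]
        more = offset + max_size < len(payload)
        parts.append(Fragment(message_id, offset, more, chunk))
    return parts


def reassemble(parts: list[Fragment]) -> bytes:
    """Rebuild the original payload, even if the parts arrived out of order."""
    ordered = sorted(parts, key=lambda p: p.offset)
    return b"".join(p.data for p in ordered)
```

For instance, `fragment(1, b"abcdefghij", 4)` yields three parts, and `reassemble` restores the original bytes regardless of the order in which the parts are supplied.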
The task of an intermediary node is to compute the next node to pass each message on to, in order to either deliver it to the intended recipient, or get it one step closer. The notion of 'deep packet inspection' involves an intermediary node doing more than that. In order to analyse the technique's implications, it is necessary to understand a little about the layers of processing involved in data transmission.
Raw media (such as cable and radio waves) require considerable electronic engineering expertise, infrastructure, hardware and software to make them useful for the transmission of data. That expertise is embodied in 'protocols' (rules of engagement) that are implemented in software in the sending, intermediary and receiving nodes.
Interpreting the binary digits that are transmitted on those media requires a further and different kind of expertise. Another layer of software performs this function. It implements a further set of protocols, and depends on sender and recipient addresses and other administrative data being stored in headers that are added to the underlying message content.
Shifting the groups of bits from one node to the next requires a different kind of expertise again, a number of protocols, software packages in each node, and an extra header added onto the message. De-constructing large messages into small ones and re-constructing them back into the original message requires another layer of expertise, protocols, software and header. And conveying the semantics of the message requires yet another of each.
In short, transmitting a message from a sending node via intermediary nodes to a receiving node involves a stack of protocols, software and headers. The protocol stack is roughly modelled by the first of the following diagrams, and the headers by the second.
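The nesting of headers can be pictured with a small sketch. The layer names and the header fields below are assumptions chosen for illustration; the point is only that each layer wraps what it receives from the layer above in a header of its own, so that the message content ends up innermost.

```python
# Layer names are assumed for illustration, ordered from innermost to outermost.
LAYERS = ["application", "transport", "network", "link"]


def encapsulate(content, headers):
    """Wrap content in one header per layer, innermost layer first."""
    unit = content
    for layer in LAYERS:
        unit = {"layer": layer, "header": headers[layer], "body": unit}
    return unit


def header_order(unit):
    """List the layers in the order a receiving node would meet them."""
    order = []
    while isinstance(unit, dict):
        order.append(unit["layer"])
        unit = unit["body"]
    return order


message = encapsulate("hello", {
    "application": {"type": "email"},
    "transport": {"port": 25},
    "network": {"to": "198.51.100.7"},
    "link": {"next": "node-b"},
})
print(header_order(message))
```

A node that handles only the network level needs to open the outermost wrapper and read the header immediately inside it. Everything nested deeper can pass through unread.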
Intermediary nodes run a number of software packages to perform the various functions at each level of the protocol stack. The best-known term for such software is 'router'. Used correctly, this refers to the software operating at the middle level of the stack, which handles the Internet Protocol (IP). Router software depends on lower-level software (switches and hubs).
The term 'router' is often used in misleading ways, however. It may refer to all of the layers of software combined, rather than just one layer. And often it refers to the device (the 'intermediary node') rather than just the software.
Software in an intermediary node, in performing its function as a way-station passing messages from a sender to a recipient, only needs to look at the header associated with the relevant protocol. It has no intrinsic need to look at the deep-nested headers associated with higher-level protocols, let alone at the data deep inside the message. So a well-behaved intermediary node does what it needs to do in order to pass messages on, and nothing more. In terms of Exhibit 1, that work is performed in the Network Layer, by the software called a router.
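A rough sketch of that limited task: choosing where to send a message next requires nothing more than the destination address from the network-level header. The routing table, the port names and the use of longest-prefix matching are illustrative assumptions, and the sketch handles IPv4 addresses only.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),      # default route
    (ipaddress.ip_network("192.0.2.0/24"), "port-a"),
    (ipaddress.ip_network("192.0.2.128/25"), "port-b"),
]


def next_hop(destination: str) -> str:
    """Pick the next hop using only the destination address.

    The most specific matching network (longest prefix) wins.
    """
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

Nothing in `next_hop` touches higher-level headers or message content, which is the sense in which a well-behaved intermediary node does 'nothing more'.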
There are a number of circumstances in which an intermediary node can perform additional functions, as an agent of the sender or recipient. A general term for such software is a 'proxy-server'.
A recipient may use software on their own machine to scan incoming email, evaluate the headers and content in order to assess the likelihood that it is spam, and flag (or, more riskily, delete) messages whose spam-score exceeds some threshold. Similarly, a recipient may use software on their own machine to scan the content of web-pages they have requested, and possibly block display of the page if the scan detects content that is undesirable in some way. A third example is commonly referred to as a 'firewall'. A firewall detects messages that are being directed at processes within the user's machine that are not expecting to receive such messages.
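The spam-scoring idea can be sketched in a few lines. The phrases, weights and threshold below are invented for the example; real filters use far richer evidence. The sketch flags rather than deletes, in line with the caution above.

```python
# Hypothetical scoring rules: phrase -> weight. Not taken from any real filter.
RULES = {"free money": 3.0, "click here": 2.0, "unsubscribe": 0.5}
THRESHOLD = 4.0  # assumed cut-off


def spam_score(subject: str, body: str) -> float:
    """Add up the weights of every rule phrase found in the message."""
    text = f"{subject}\n{body}".lower()
    return sum(weight for phrase, weight in RULES.items() if phrase in text)


def handle(subject: str, body: str) -> str:
    """Flag a message whose score exceeds the threshold; otherwise deliver it."""
    return "flag" if spam_score(subject, body) > THRESHOLD else "deliver"
```

Note that even this toy scorer has to read the message content, which is acceptable here because it runs at the recipient's request.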
Rather than having such functions performed on their own device, a recipient may request an intermediary to provide 'spam-filtering', 'web-page filtering' or 'firewall' services. Such services may be offered by companies that provide consumers with connections to the Internet (which are often referred to as Internet Service Providers - ISPs). Where the consumer actively requests it, or provides informed and free consent to it, such services are positive and worthwhile enhancements to basic Internet infrastructure.
The previous examples all involved a message recipient. Circumstances also arise in which the sender may take advantage of additional services from an intermediary node. In particular, a proxy-server may send a message on behalf of the real sender, or manage a session of multiple messages between the sender and a remote server.
One such service goes by the obscure name 'reverse-proxy'. A person who is currently away from their normal place of work (e.g. on a client's site, in a hotel or at home) can be made to appear to a remote server as though they were at work. This service is commonly offered by university libraries to academics, enabling them to access publications databases that the library subscribes to, and to do so from anywhere in the world.
Another purpose to which proxy-servers are put is to obscure the sender's network location (their 'IP-Address'). Such services are commonly referred to as anonymous remailers and tools for anonymous web-surfing. They may offer anonymity. Alternatively, where an investigator has the technical capability and the legal authority to access relevant look-up tables, they offer pseudonymity rather than unbreakable anonymity. Such services are valuable, and arguably essential, for 'people with something to hide', such as whistle-blowers, protected witnesses, victims of domestic violence, celebrities, notorieties, and people in security-sensitive occupations, including undercover operatives and spies.
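The difference between anonymity and pseudonymity comes down to whether a look-up table survives. The following sketch is an illustration only; the class, its `keep_table` switch and the message layout are assumptions made for the example, not a description of any actual remailer.

```python
import secrets


class PseudonymisingProxy:
    """Forwards messages under a random pseudonym in place of the sender's address."""

    def __init__(self, keep_table: bool = True):
        self.keep_table = keep_table
        self._table = {}  # pseudonym -> real sender address

    def forward(self, sender_ip: str, message: str) -> dict:
        pseudonym = secrets.token_hex(8)
        if self.keep_table:
            # A retained table means the link back to the sender still exists.
            self._table[pseudonym] = sender_ip
        return {"from": pseudonym, "body": message}

    def resolve(self, pseudonym: str):
        """What an investigator with access to the table could recover."""
        return self._table.get(pseudonym)
```

With `keep_table=True` the service offers pseudonymity, because `resolve` can recover the sender. With `keep_table=False` there is nothing to look up, which is closer to anonymity (setting aside other ways of tracing traffic).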
In order to perform these services, the software running on the intermediary node has to read the message content, or at least the deepest-nested 'application headers'; hence the term 'deep packet inspection'.
There are further circumstances in which an intermediary node can perform additional functions which are generally beneficial to all participants.
An intermediary node performs a function as a 'gateway' if it operates a transition facility between the Internet and some other network. For example, one participant in a telephone call may be using VOIP (voice over IP) but the other may be on the conventional Public Switched Telephone Network (PSTN, sometimes referred to as a landline), or on a cellular network (i.e. using a mobile phone). A gateway performs for messages much the same function as an intermodal terminus does for cargo - lifting containers on and off trucks, trains and ships.
Another example is 'network cache'. Many web-pages are requested by multiple web-browser users in a short period of time. An intermediary node can save everyone time and money by storing ('caching') the page for a while after the first request. This avoids having to unnecessarily fetch the same content a second time from a distant server.
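A network cache of this kind can be sketched as a store of recently fetched pages with a time limit. The class name, the `fetch` callback and the 60-second default are assumptions for the example; real caches also honour instructions from the origin server about what may be stored and for how long.

```python
import time


class NetworkCache:
    """Keeps each fetched page for ttl_seconds, then fetches it afresh."""

    def __init__(self, fetch, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._fetch = fetch    # function that retrieves a page from the distant server
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so the behaviour can be tested
        self._store = {}       # url -> (time stored, page)

    def get(self, url: str):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: no trip to the distant server
        page = self._fetch(url)
        self._store[url] = (now, page)
        return page
```

The cache is keyed on the requested URL and stores the page itself, so software like this necessarily reads both the application-level request and the content of the response.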
To perform these services, however, gateway and network cache software have to read both the 'application headers', at the deepest level of the message, and the message content itself. This represents an intrusion inside the message envelope. Such behaviour may be justifiable on the grounds of efficiency, or perhaps implied consent. But care is needed, because the person whose message is being handled may not be aware of the activity, and may perceive problems that the operator of the intermediary node does not.
Some intermediary nodes contain software that reads deep-nested headers and even content, without the consent of the parties to the message, and for purposes that are not consistent with the interests of the parties. There are several categories, each of which has potentially serious negative implications for the parties, and for society as a whole.
An intermediary node may access the content of the message and either use it for the purposes of the interceptor, or disclose it to some other party. One example of this is software that detects and accumulates email-addresses - for use by spammers. Similarly, software may 'sniff out' credit-card details sent in email messages and typed into web-forms - for use in financial fraud.
Another example is message-monitoring by law enforcement agencies. In many jurisdictions, such monitoring is subject to judicial warrants and tight controls, but in others (including nominally free countries such as the UK, the USA and Australia) those independent authorisations and controls have been subverted, using terrorism as the excuse. As a result, a considerable amount of message-interception is being conducted in the absence of demonstrated and reasonable grounds for suspicion of criminal behaviour.
A further possibility is adaptation of the message and onforwarding of something that purports to have originated with the sender, but did not. This creates further possibilities for fraud, and for the 'planting' of evidence.
Another form of intrusion is masquerade by the intermediary node as though it were the recipient, and provision of a falsified response. This is understood to have been the mechanism whereby the People's Republic of China (PRC) has returned (and continues to return?) false responses to searches submitted to remote search-engines, and fake 'not found' messages in response to requests for web-pages blocked by the regime.
Yet another example of intrusion is the blocking of messages by an intermediary node on the basis that some aspect of the header information or of the message itself is deemed to offend some rule imposed by the party that operates the node. This is commonly the case in un-free regimes such as Burma, the PRC and Iran. But it is also the mechanism proposed by nominally free nations that are adopting a 'nanny state' role and seeking to censor such content as on-line gambling, pornography (however defined) and dissident political speech (however defined). See Dedman & Sullivan (2008) and ONI (2008).
Singapore was an early mover among economically advanced nations. But currently, governments in the USA and Australia are trying to impose much the same repressive measures. Such interference represents concrete steps towards the authoritarian future presaged in Clarke (2001).
The term 'deep packet inspection' refers to access by software running in an intermediary node to header data, and even the message-content, that the node does not need to access in order to perform its inherent function of passing messages on, along their journey from sender to recipient.
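At the level of bytes, the distinction can be sketched as follows. The example builds a simplified message with a 20-byte IPv4-style header and contrasts a read that stops at the destination address with one that reaches past the header. The function names are assumptions, the checksum is left at zero for brevity, and the payload simply stands in for all higher-level headers and content.

```python
import socket
import struct


def build_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Prefix payload with a minimal 20-byte IPv4 header (no options)."""
    version_ihl = (4 << 4) | 5       # version 4, header length of 5 32-bit words
    total_length = 20 + len(payload)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,
        0, 0,                        # identification, flags and fragment offset
        64, 253, 0,                  # TTL, protocol (253: experimental use), checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload


def shallow_read(packet: bytes) -> str:
    """Read only what forwarding needs: the destination address."""
    return socket.inet_ntoa(packet[16:20])


def deep_read(packet: bytes) -> bytes:
    """Reach past the network header into whatever the message carries."""
    header_length = (packet[0] & 0x0F) * 4
    return packet[header_length:]


packet = build_packet("192.0.2.1", "198.51.100.7", b"GET /private-page")
print(shallow_read(packet))
print(deep_read(packet))
```

`shallow_read` is all that passing the message on requires. `deep_read` is the step that the term 'deep packet inspection' describes, and whether it is acceptable depends on the consent and interests of the parties.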
Deep packet inspection may be performed at the request, or with the consent, of a party to the message. This is an enhancement to fundamental Internet infrastructure.
Deep packet inspection may be performed without the consent of the parties to the message, but in such a manner that all parties benefit. Primary examples are enhanced response-time and the avoidance of unnecessary transmission of large files, through 'network caching'. This is more problematical than consensual access, because some party is making the judgement that the intrusion is beneficial to all parties.
Finally, and of far more serious concern, deep packet inspection may be performed not only without the authority of the sender and recipient, but also for purposes that are, or at least may be, against the interests of some of the parties. This requires strong justification, tight controls, and enforcement mechanisms. Unfortunately, these are seriously lacking, and both Internet Service Providers and government agencies in many countries (both nominally authoritarian and nominally free) are abusing and undermining Internet infrastructure in the process.
Anderson N. (2007) 'Deep packet inspection meets 'Net neutrality, CALEA' Ars Technica, 25 July 2007, at http://arstechnica.com/articles/culture/Deep-packet-inspection-meets-net-neutrality.ars
Clarke R. (2001) 'Paradise Gained, Paradise Re-lost: How the Internet is being Changed from a Means of Liberation to a Tool of Authoritarianism' Mots Pluriels 18 (August 2001), at http://www.arts.uwa.edu.au/MotsPluriels/MP1801rc.html
Dedman B. & Sullivan B. (2008) 'ISPs are pressed to become child porn cops' MSNBC, 16 October 2008, at http://www.msnbc.msn.com/id/27198621
ONI (2008) 'About Filtering' OpenNet Initiative, 2008, at http://opennet.net/about-filtering
Wikipedia entry (2008) 'Deep packet inspection', at http://en.wikipedia.org/wiki/Deep_packet_inspection
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
Created: 29 September 2008 - Last Amended: 11 March 2009 by Roger Clarke - Site Last Verified: 15 February 2009