Roger Clarke
Version of 1 November 2004
© Xamax Consultancy Pty Ltd, 2004
Available under an AEShareNet licence
This document is at http://www.rogerclarke.com/EC/P2POview.html
The concept of peer-to-peer architecture or P2P has been 'in the air' for some time. The concept is rich, and the scope for misleading explanations and confusion is considerable. This document explains the origins and nature of P2P, in an endeavour to overcome the wide array of misunderstandings that exist, and to assist in better-informed discussions of P2P technologies, and of their strategic and policy implications.
The number of processors connected via the Internet is now vast. Some of them are powerful machines designed to run software that provides services to many people and devices, and to manage data that is accessible by many people and devices.
Most, on the other hand, are devices that service the needs of one user at a time, or users in one very small location. No fully satisfactory term exists to refer to such devices, but 'PC' and 'workstation' have been in common usage for a considerable period of time. Meanwhile, the diversity of connected devices is increasing. Networked playstations are becoming common, handheld devices and even appliances (in the sense of 'white goods' such as refrigerators) are increasingly Internet-connected, and computers are being released as home entertainment centres for the local rendering of audio and video, extracted from CD/DVD, broadcast and webcast. These devices may support multiple processes, different users at different times, and even multiple users at the same time, and they may be connected physically or by wireless means.
The capacity of these devices is now substantial, both in terms of processing and storage. This has not gone unnoticed. A variety of initiatives have sought to harness this 'power at the edge of the net'. This can be done on a hierarchical basis, managed centrally; but it may also be done on a collaborative basis among the users or devices themselves, with no centralisation, or with only a limited amount of coordination from a central location.
During their first two decades, roughly the 1940s and 1950s, computers were very large devices that had no connection with other devices. The means whereby data was input to and output from computers started very primitively but gradually became more sophisticated. One technique was to use 'telex machines' or 'tele-types', which comprised a typewriter keyboard for input, and computer control over the printing facility for output. The first networks emerged in the 1960s, to connect remote 'terminals' of this kind to central computers.
These 'dumb terminals' had no programmable components. The topology of such a system resembled a star, with a computer at the centre or hub, and each terminal connected to it by means of a cable. The architecture was referred to as 'master-slave', because the central device had all intelligence and power, and the remote devices did its bidding. See Exhibit 1.
During the 1970s, the original terminals were augmented with and then replaced by 'glass tele-types'. These still comprised keyboards for input, but instead of a printing device for output, they used cathode-ray tubes (CRTs) - the same technology used in television sets from the 1930s until about 2000. These 'visual display units (VDUs)' initially contained no programmability, and hence the network topology and architecture remained essentially the same.
During the 1970s and 1980s, processing power was added to VDUs, initially at a deep level to provide greater flexibility for the engineers installing and configuring them, and later to provide application programmers with some limited capacity to run part of the application on the terminal.
Meanwhile, the miniaturisation of processors had resulted in the explosion of personal computers (PCs), commencing in the mid-1970s, and picking up pace during the early 1980s. From very early in this phase, PCs were being used in conjunction with modems and dial-up lines for inter-PC communications. By the late 1980s, PCs were readily connected to the large computers that dominated the processing of business and government data.
The patterns were now very different. A great deal of processing was now being performed on the remote devices. A new architecture emerged that was dubbed 'client-server'. A client (software running on one device, such as a user's email-package or a web-browser) requested a service from a server (software running on another device, such as an email-host or a web-server). The server was generally running on a large central device (typically a 'mainframe' computer), and the client was generally running on a small remote PC. The nature of coordination and control had changed: the masters had become servers, and the slaves had become clients. See Exhibits 2a and 2b.
During the course of the 1990s, master-slave architectures became the exception. Client-server architecture became the mainstream, and remains so in the mid-2000s. Many of the most familiar working-tools currently operate in this way, including the downloading of emails from one's mailbox, and the use of a browser to access web-pages. Indeed, client-server architectures have been so dominant that many people who became computer users during the last two decades simply assume that this is how computers collaborate, and do not appreciate that alternative architectures exist and more could be conceived.
There are some significant problems evident with client-server architecture, however. These include:
A variety of refinements have been implemented in order to address these deficiencies. Several of these depend on applications of the redundancy principle, i.e. the provision of more than the minimum necessary resources, and the arrangement of the resources in such a manner that the excess caters for temporary failures. For example, multi-processor configurations such as server-farms reduce the fragility of service that arises from device downtime; and replication, mirroring and caching reduce the fragility of service that arises from sub-network outages and congestion.
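The redundancy principle can be illustrated with a minimal sketch. The mirror names, the `fetch_from` helper and the simulated set of failed mirrors are hypothetical, introduced purely for illustration; the point is only that a request succeeds provided that at least one of the surplus replicas is reachable.

```python
# Illustrative sketch only: replicas are simulated in memory, not real hosts.

class ReplicaUnavailable(Exception):
    """Raised when a simulated replica cannot be reached."""


def fetch_from(replica, key, stores, down):
    """Return the object `key` from one replica, or raise if it is down."""
    if replica in down:
        raise ReplicaUnavailable(replica)
    return stores[replica][key]


def fetch_with_failover(key, replicas, stores, down):
    """Try each replica in turn; the excess replicas cover temporary failures."""
    for replica in replicas:
        try:
            return replica, fetch_from(replica, key, stores, down)
        except ReplicaUnavailable:
            continue  # move on to the next mirror
    raise ReplicaUnavailable("all replicas failed for " + key)


# Three mirrors hold the same object (hypothetical names and content).
replicas = ["mirror-a", "mirror-b", "mirror-c"]
stores = {name: {"report.txt": "contents"} for name in replicas}

# Two of the three mirrors are temporarily out of service.
served_by, value = fetch_with_failover(
    "report.txt", replicas, stores, down={"mirror-a", "mirror-b"}
)
print(served_by, value)
```

The request is still served, by the one remaining mirror. Only when every replica is unavailable does the service fail, which is the fragility that the refinements reduce but do not remove.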
Backup and recovery techniques, including business continuity planning and warm-site and hot-site services, improve resilience by shortening the time that a service is unavailable following a major interruption. Various approaches have been adopted to address the substantial security problems inherent in current technologies, including 'thin-clients' that are stripped of some of the capabilities of normal PC/workstations, less in order to reduce cost than in order to reduce vulnerability.
The refinements do not, however, address the fundamental problems. The growth in the power of computing at the edge of the Internet has created an opportunity for an alternative architecture that would avoid many of the deficiencies, or enable them to be more readily overcome.
The value of such an architecture may be high for some kinds of application, but not for others. The new architecture would be particularly attractive for applications for which:
Some of the application scenarios relate to processing services, such as:
Many more of the application scenarios relate to digital objects, such as:
Some applications are targeted at consumers, and others at business users. Considerable media attention has been paid to the sharing among consumer-users of entertainment materials, particularly in the form of recorded music, primarily in MP3 format, and increasingly video as well. These kinds of files are known to constitute a very large proportion of the total volume of transmissions over P2P networks, and a significant proportion of those file-transfers is known to be performed in breach of copyright law. The majority of the application scenarios listed above are, however, already operational, and hence, however large it might be, copyright-infringing file-sharing needs to be appreciated as one form of P2P use among many.
The presumption is commonly made that P2P is a new idea. That is not the case. It was in use as early as the 1970s, an era that, as discussed above, was still dominated by master-slave architecture. Key examples include:
Appreciation of the principles underlying P2P, and of their general applicability, increased substantially during the second half of the 1990s, with the result that many experiments were conducted and many services were launched. Napster attracted huge volumes of traffic between 1999 and 2001. Since its demise due to the provider's inability to comply with court orders, other similar services have experienced similarly dramatic growth rates, most notably Kazaa. In late 2004, there are scores of active implementations.
The following are the characteristics of P2P architecture that need to be satisfied in order for it to offer an alternative that overcomes a significant proportion of the deficiencies of predecessor architectures:
Examples of the kinds of topologies that result are represented graphically in Exhibits 3a and 3b.
An array of definitions of P2P is available, ranging from populist to reasonably formal, and from accurate via incomplete to misleading and even seriously misleading. A working definition is proposed as follows:
peer-to-peer (P2P) is a network architecture in which nodes are relatively equal, in the sense that each node is in principle capable of performing each of the functions necessary to support the network, and in practice many nodes do perform many of the functions
In a 'pure P2P architecture', all functions and all relevant digital objects are distributed across many nodes, such that no node is critical to the network's operation; hence no node can exercise control over the network. Examples of these exist (such as USENET, Fidonet, Freenet and the original Gnutella).
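As a rough illustration of how a search can proceed when no node is critical, the following sketch floods a query from peer to peer, with a hop limit to stop it spreading indefinitely. It is a simplified in-memory model in the spirit of query flooding, not the actual protocol of Gnutella or of any other network; the `Peer` class, the `flood_search` function, the peer names and the file name are all hypothetical.

```python
from collections import deque


class Peer:
    """A node that both holds files and relays queries for other nodes."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbours = []

    def connect(self, other):
        """Create a symmetric link; neither end is privileged."""
        self.neighbours.append(other)
        other.neighbours.append(self)


def flood_search(start, filename, ttl):
    """Breadth-first flood of a query, at most `ttl` hops from `start`.

    Returns the names of all reached peers that hold `filename`.
    """
    seen = {start.name}
    queue = deque([(start, ttl)])
    hits = set()
    while queue:
        peer, hops_left = queue.popleft()
        if filename in peer.files:
            hits.add(peer.name)
        if hops_left == 0:
            continue  # hop limit reached; do not relay further
        for neighbour in peer.neighbours:
            if neighbour.name not in seen:
                seen.add(neighbour.name)
                queue.append((neighbour, hops_left - 1))
    return hits


# A small chain of peers: a - b - c - d. Peers c and d hold the file.
a, b = Peer("a"), Peer("b")
c, d = Peer("c", ["song.ogg"]), Peer("d", ["song.ogg"])
a.connect(b)
b.connect(c)
c.connect(d)

near = flood_search(a, "song.ogg", ttl=2)  # reaches a, b and c
far = flood_search(a, "song.ogg", ttl=3)   # also reaches d
```

No peer holds an index of the whole network. Each search is answered collectively, and the hop limit trades the completeness of results against the volume of query traffic.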
Most instances of P2P architecture involve some degree of compromise of these requirements. Most commonly, the content or services are fully distributed, while the index is substantially but not fully distributed (as is the case with FastTrack and the later version of Gnutella).
Alternatively, the index may be hierarchically structured (as in the DNS), or even centralised (as was the case with Napster, and is with BitTorrent). Napster and BitTorrent can be described as having a 'hybrid architecture', in that the index is accessed in client-server mode, whereas the digital objects are transferred directly between peers. If a very relaxed interpretation of 'P2P' is permitted, then at the extremity it corresponds to client-server architecture; but most usages of the term 'P2P' preclude that interpretation.
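The hybrid arrangement can be sketched as two separate interactions: a lookup against a central index, in client-server mode, followed by a transfer directly between peers. Everything below is an in-memory simplification. `CentralIndex`, `HybridPeer` and their methods are hypothetical names, the `directory` dictionary merely stands in for network addressing, and nothing here reflects the actual Napster or BitTorrent protocols.

```python
class CentralIndex:
    """Server-side index: maps object names to the peers that hold them."""

    def __init__(self):
        self._holders = {}

    def register(self, peer_name, object_name):
        self._holders.setdefault(object_name, set()).add(peer_name)

    def lookup(self, object_name):
        return sorted(self._holders.get(object_name, set()))


class HybridPeer:
    """A peer that is a client of the index but a server of its own objects."""

    def __init__(self, name, index, directory):
        self.name = name
        self.index = index
        self.directory = directory  # assumption: name -> peer, in place of addresses
        self.objects = {}
        directory[name] = self

    def share(self, object_name, data):
        """Hold an object locally and advertise it in the central index."""
        self.objects[object_name] = data
        self.index.register(self.name, object_name)

    def serve(self, object_name):
        """Peer-to-peer step: hand the object directly to another peer."""
        return self.objects[object_name]

    def download(self, object_name):
        # Client-server step: ask the central index who holds the object.
        holders = [h for h in self.index.lookup(object_name) if h != self.name]
        if not holders:
            raise LookupError(object_name)
        # Peer-to-peer step: fetch the object from the first holder found.
        source = self.directory[holders[0]]
        data = source.serve(object_name)
        self.share(object_name, data)  # the downloader becomes a holder too
        return holders[0], data


index = CentralIndex()
directory = {}
alice = HybridPeer("alice", index, directory)
bob = HybridPeer("bob", index, directory)

alice.share("track-01", b"audio bytes")
source, data = bob.download("track-01")
```

The object itself never passes through the index. The index is nonetheless a single point of failure and of control, which is the respect in which the hybrid form compromises the requirements of a pure P2P architecture.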
A literature on the history and concept of P2P is emerging. Some useful sources, academic sites and links to conferences are provided below.
It is useful to compare and contrast P2P architecture with the parallel research domain called 'grid computing'. This is a means whereby available processing resources can be located, used and coordinated; whereas P2P encompasses both processing and data resources. Grid computing also differs from P2P in that it is a largely pragmatic engineering effort, rather than a scientifically designed architecture. See, for example, Ledlie et al. (2003).
The implementation of P2P architecture depends on infrastructure, and on collaboration among the infrastructural elements. The necessary elements include:
Practical implementations of P2P architectures include features to encourage the availability of devices to perform as servers, and to discourage devices from 'free-riding' or 'over-grazing' the 'commons' that is intrinsic to the architecture. In many cases, participation is more likely to be achieved if the load on the server only lasts for a short period of time, during which the user is unlikely to adopt measures to cause the server component to cease operating. As a result, many P2P schemes involve ephemeral servers, a highly dynamic network topology, and highly volatile metadata.
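One simple way to discourage free-riding is to make service conditional on contribution. The sketch below applies a hypothetical rule based on the ratio of bytes uploaded to bytes downloaded. The class name, the threshold and the initial allowance are assumptions made for illustration only; actual P2P schemes use a variety of incentive mechanisms, and this is not a description of any one of them.

```python
class ContributionLedger:
    """Tracks what each peer has given and taken (hypothetical rule)."""

    def __init__(self, min_ratio=0.5, grace_bytes=1000):
        self.min_ratio = min_ratio      # assumed minimum upload/download ratio
        self.grace_bytes = grace_bytes  # assumed allowance for newcomers
        self.uploaded = {}
        self.downloaded = {}

    def record_upload(self, peer, nbytes):
        self.uploaded[peer] = self.uploaded.get(peer, 0) + nbytes

    def record_download(self, peer, nbytes):
        self.downloaded[peer] = self.downloaded.get(peer, 0) + nbytes

    def may_download(self, peer):
        """Allow newcomers a small allowance; afterwards require a fair ratio."""
        down = self.downloaded.get(peer, 0)
        if down <= self.grace_bytes:
            return True
        return self.uploaded.get(peer, 0) / down >= self.min_ratio


ledger = ContributionLedger()
newcomer_ok = ledger.may_download("peer-x")       # nothing taken yet

ledger.record_download("peer-x", 4000)            # takes without giving
free_rider_ok = ledger.may_download("peer-x")

ledger.record_upload("peer-x", 2000)              # now serves other peers
contributor_ok = ledger.may_download("peer-x")
```

A rule of this kind rewards devices that remain available as servers for at least a short period, which is consistent with the ephemeral servers described above.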
In comparison with the deficiencies of master-slave and client-server architecture noted above, P2P architecture has the following characteristics:
There are, of course, aspects of P2P architectures that fall short of requirements. Of the deficiencies of predecessor architectures noted earlier, they fail to escape the following:
In the absence of countermeasures, a P2P network may generally be more vulnerable to these kinds of attacks than is a client-server network. The attacks are, however, capable of being addressed by techniques similar to those needed in client-server architectures (in particular content hashes and digital signatures).
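The use of content hashes can be illustrated briefly. In this sketch, a peer that already holds a trusted digest for an object checks that the bytes received from an untrusted peer match it. How the digest is obtained and why it is trusted lie outside the sketch. The choice of SHA-256, via Python's standard `hashlib` module, is an assumption, as are the function names and the sample content.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 digest of an object as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify_content(data: bytes, expected_digest: str) -> bool:
    """Accept received bytes only if they hash to the expected digest."""
    return content_digest(data) == expected_digest


# The publisher computes the digest once (hypothetical content).
original = b"the published object"
published_digest = content_digest(original)

# A downloader checks what it actually received from some peer.
genuine_ok = verify_content(b"the published object", published_digest)
tampered_ok = verify_content(b"the published object, altered", published_digest)
```

A hash on its own detects alteration only when the digest comes from a trustworthy source. Digital signatures address that further question, by enabling the digest itself to be attributed to its publisher.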
In addition, a range of new issues arise, including the following:
P2P architecture differs substantially from its predecessors. There are many applications already in operation, and they have demonstrated that the theoretical technical advantages claimed for them are achievable. But they bring with them new challenges.
When evaluations are undertaken of P2P architecture, infrastructure and services, it is important that the opportunities and the issues be appreciated in context rather than in isolation, and that the positives and negatives be considered together.
The following three definitions express current usage effectively:
A small selection of other definitions is provided for completeness:
Anderson R. (1997) 'The Eternity Service' Cambridge University Computer Laboratory, June 1997
Blackmore N. (2004) 'Peer-To-Peer Filesharing Networks: The Legal and Technological Challenges for Copyright Owners' N.S.W. Society for Computers and the Law 55 (March 2004)
Chuang J. (2004) 'Economics of Peer-to-Peer Systems' Summer Institute on Peer-to-Peer Computing, Academia Sinica, August 2004
Clarke I., Sandberg O., Wiley B. & Hong T.W. (2000) 'Freenet: A Distributed Anonymous Information Storage and Retrieval System' Lecture Notes in Computer Science
Clarke I., Hong T.W., Miller S.G. & Sandberg O. (2002) 'Protecting Free Expression Online with Freenet' IEEE Internet Computing 6, 1 (2002) 40-49
Felter W. (2002) 'Design Choices in P2P Infrastructure' IBM Austin Research Laboratory (a slide-set)
Gummadi K. P., Dunn R. J., Saroiu S., Gribble S. D., Levy H. M. & Zahorjan J. (2003) 'Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload' Proc. 19th ACM Symposium on Operating Systems Principles (SOSP-19), October 2003
Habib A. & Chuang J. (2004) 'Incentive Mechanism for Peer-to-Peer Media Streaming' Proc. 12th IEEE Int'l Workshop on Quality of Service (IWQoS'04), June 2004
Howe J. (2003) 'BigChampagne is Watching You' Wired 11.10 (October 2003)
Karagiannis T., Broido A., Brownlee N., claffy k.c. & Faloutsos M. (2004) 'Is P2P dying or just hiding?' Proc. Globecom 2004, November-December 2004
Kurose J.F. & Ross K.W. (2005) 'Computer Networking: A Top-Down Approach Featuring the Internet' Pearson Education, 2005, pp. 58-59, 75-78 and 136-145
Ledlie J., Schneidman J., Seltzer M. & Huth J. (2003) 'Scooped, again' Proc. 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03)
Leibowitz N., Ripeanu M. & Wierzbicki A. (2003) 'Deconstructing the Kazaa Network' 3rd IEEE Workshop on Internet Applications (WIAPP'03), 2003, Santa Clara, CA
Liang J., Kumar R. & Ross K.W. (2004) 'Understanding KaZaA' Working Paper, 2004
Liang J., Kumar R. & Ross K.W. (2004b) 'The KaZaA Overlay: A Measurement Study' Working Paper, September 2004
Loban B. (2004) 'Between rhizomes and trees: P2P information systems' First Monday 9, 10 (October 2004)
Minar N. & Hedlund M. (2001) 'A Network of Peers - Peer-to-Peer Models Through the History of the Internet', Chapter 1 of Oram (2001)
OECD (2004) 'Peer to Peer Networks in OECD Countries' OECD, Paris, July 2004
Oram A. (Ed.) (2001) 'Peer-to-Peer: Harnessing the Power of Disruptive Technologies' O'Reilly, 2001
Pouwelse J.A., Garbacki P., Epema D.H.J. & Sips H.J. (2005) 'The Bittorrent P2P File-sharing System: Measurements and Analysis' Proc. 4th Int'l Workshop on Peer-to-Peer Systems (IPTPS'05), February 2005
Preston A. (2002) 'Peer-to-peer: an overview of a disruptive technology', Internet2 Peer-to-Peer Working Group (a PowerPoint slide-set)
Ross K.W. (2004) 'Recommended Reading in P2P Networking Theory' Catalogue, 2004
Ross K.W. & Rubenstein D. (2004) 'P2P Systems' Tutorial Slide-set, 2004
Roussopoulos M., Baker M., Rosenthal D.S.H., Giuli T.J., Maniatis P. & Mogul J. (2004) '2 P2P or Not 2 P2P?' Proc. IPTPS 2004, February 2004
von Lohmann F. (2003) 'Peer-to-Peer File Sharing and Copyright Law: A Primer for Developers' Proc. 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03)
Waldman M., Rubin A.D. & Cranor L. F. (2000) 'Publius: A robust, tamper-evident, censorship-resistant, web publishing system' Proc. 9th USENIX Security Symposium, August 2000
Wen H. (2002) 'Internet Radio the P2P Way' O'Reilly P2P.com, 24 September 2002
Wikipedia, including the articles on P2P, Kazaa, Hash_function and UUHash
Woody T. (2003) 'The Race to Kill Kazaa' Wired 11.02 (February 2003)
Xie M. (2003) 'P2P Systems Based on Distributed Hash Table' Department of Computer Science, University of Ottawa
I've developed my understanding of P2P through research and teaching since 2000. I've drawn on many published sources in developing this overview, and have benefited from comments from a number of colleagues. Responsibility for the content rests with me.
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, Visiting Professor in the Baker & McKenzie Cyberspace Law & Policy Centre at the University of N.S.W., and Visiting Fellow in the Department of Computer Science at the Australian National University.
Created: 15 October 2004 - Last Amended: 1 November 2004, plus references of 13 and 27 and 29 January 2005 by Roger Clarke - Site Last Verified: 15 February 2009