Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2013
Roger Clarke
PrePrint of 30 April 2006
Published in the Journal of Theoretical and Applied Electronic Commerce Research 1, 3 (December 2006) 42 - 57, at http://www.jtaer.com/portada.php?agno=2006&numero=3#
© Xamax Consultancy Pty Ltd, 2005-06
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/EC/P2PRes.html
Applications running over peer-to-peer (P2P) networks have exploded since the late 1990s. Research is needed into many aspects of P2P. These include architecture, application functionality, and the categories of digital works that are shared using P2P facilities. A range of significant strategic, consumer and policy issues also arises, such as challenges to the operation of copyright, defamation and other censorship laws. Organisations affected by P2P are devising and deploying countermeasures, such as technological protections for digital works and attempts to identify devices and users. These too require study. This paper presents the landscape of research opportunities, indicating methods that might be appropriately applied to the various kinds of questions.
During the 1940s and 1950s, computers were large standalone devices, operating in a single (large) room. During the 1960s, as techniques were developed to enable communications among devices, an architecture arose that is commonly referred to as 'master-slave': a central, relatively powerful computing device controlled remote, 'dumb' 'terminals'.
During the 1970s, processing power and storage capabilities were added to the remote devices, and gradually a new architecture emerged. A client (software running on one device) requested a service from a server (software running on another device). Servers ran on powerful central devices (typically, 'mainframe' computers), and clients ran on multiple, remote, less powerful devices (typically, PCs). The nature of coordination and control had changed: the masters had become servers, and the slaves had become clients.
Since the late 1980s, client-server architectures have been the mainstream (Berson 1996, SEI 1997). Many of our most familiar working-tools currently operate in this way, including the downloading of emails from a mailbox, and the use of a browser to access web-pages. Indeed, client-server architectures have been so dominant that many people who became computer users during the last two decades simply assume that this is how computers collaborate, and do not appreciate that alternative architectures exist and that more could be conceived.
Client-server architecture is mature, and many sophisticated variants exist, featuring server dispersal, replication and synchronisation of directories, file-catalogues and databases, and outsourced service provision. All retain the characteristic of centralised control and coordination (Chevance 2004).
The capacity of remote devices has continually increased, and the aggregate 'power at the edge of the net' has grown to be larger than that of the population of old-style hosts. Meanwhile, the extent of public accessibility to the Internet has exploded since the early 1990s. The devices whose power is currently being utilised are conventional 'PC's and workstations. Others of increasing significance include home entertainment centres, handheld devices (including PDAs, the more powerful mobile phones and digital cameras, and networked gaming consoles), and even appliances (in the sense of 'white goods' such as printers and refrigerators).
Client-server architecture is being applied to exploit the opportunity presented by accessible, unused processor-cycles and spare storage-capacity. The best-known initiatives have been collaborative processing services to search large data collections for patterns. A leading project in 1999 was EFF's DES cracker project, which tested a set of possible cryptographic keys (EFF 1998). There has been a series of '@home' projects, initially SETI@home since 1999 (Anderson et al. 2002), and later folding@home since 2000 and fightaids@home since 2003.
The challenge of harnessing the 'power at the edge of the net' stimulated experiments with an alternative architectural form. Since the late 1990s, peer-to-peer (P2P) architectures have linked millions of devices, some within organisations, but many controlled by consumers. It was estimated that, as early as September 2002, 31 million Americans shared music from a P2P service (Howe 2003). In 2004, the volumes of data being transmitted over P2P overlay networks were estimated by Karagiannis et al. (2004) as being about 10% of all Internet traffic, compared with 50% for the Web and 3% for all email including spam. Monitoring is conducted by a small number of companies such as BigChampagne, and a similar service called WebSpins is now incorporated into Nielsen ratings in the U.S.A. (Howe 2003). The nature of P2P is such that file-downloads are far more difficult to monitor than catalogue-searches.
A review of BigChampagne materials by a team at the OECD suggests that in early 2004 the number of simultaneous users of all P2P networks was running at close to 10 million, with some 50 million search queries per day. Work by Lyman & Varian (2003) suggested that around that time the most popular network, FastTrack, provided access to about 2 million files, totalling close to 15 terabytes of data.
FastTrack sank from a peak of 5.5 million users and 60% of the market in mid-2003, to 4 million and 40% in early 2004. This was attributed to publicity about lawsuits by the music industry in the U.S.A. This may, however, have been only a temporary lull. By mid-2004, the early dominance of audio files had already been broken, with just under 50% of files appearing to contain audio, 25% video, and 25% other formats including software. Audio files are relatively large, and video files very much larger. As would be expected, there was a modest correlation between P2P usage and the penetration of broadband. (Except as noted, all of these figures are from OECD 2004, which drew on various sources.)
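The figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal terabytes, treats the quoted user counts and shares as exact, and reads 'market' as share of simultaneous users. The function names are invented for this example.

```python
def average_file_size_mb(total_terabytes: float, file_count: int) -> float:
    """Average file size in megabytes, assuming 1 TB = 1,000,000 MB."""
    return total_terabytes * 1_000_000 / file_count


def implied_total_users(network_users: float, market_share: float) -> float:
    """Total user population implied by one network's users and its share."""
    return network_users / market_share


# Lyman & Varian (2003): about 2 million files, close to 15 terabytes
avg_mb = average_file_size_mb(15, 2_000_000)

# OECD (2004): FastTrack at 5.5 million users and 60%, then 4 million and 40%
total_mid_2003 = implied_total_users(5.5e6, 0.60)
total_early_2004 = implied_total_users(4.0e6, 0.40)

print(f"average file size: about {avg_mb:.1f} MB")
print(f"implied total users, mid-2003: about {total_mid_2003 / 1e6:.1f} million")
print(f"implied total users, early 2004: about {total_early_2004 / 1e6:.1f} million")
```

On these assumptions the average FastTrack file is about 7.5 MB, and the implied total population moves from roughly 9.2 million to 10 million. The latter is consistent with the estimate of close to 10 million simultaneous users in early 2004, and would suggest that FastTrack's decline reflected a shift to other networks rather than a fall in overall P2P use.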
P2P is still new, and is to date inadequately understood. It appears to have characteristics very different from its predecessors, and to have implications that are potentially highly significant both technically, and for corporate strategy and public policy. As with any new domain, the research that has been conducted during the first few years has been opportunistic, and lacks structure and cohesion. A considerable amount of research has been conducted within the computer science discipline (e.g. Risson & Moors 2004). On the other hand, it has attracted little attention within the eCommerce arena. Searches in the IS literature identified few papers of relevance. For example, among the 450 papers published in Commun. AIS to April 2006, only one article directly addresses P2P (Smith et al. 2003), and it is focussed on the needs of CIOs, not researchers.
The purpose of this paper is to propose a framework within which research into P2P can be conceived, managed and implemented. The paper commences by providing a brief review of P2P's origins and nature. It then discusses the concept of a 'research agenda', and proposes a framework. It then considers, within each of the segments of the P2P research domain, the kinds of questions that need to be researched, and the kinds of research techniques that can be deployed in order to work towards answers.
An important precursor to developments in P2P was Anderson (1997). Authoritative examinations of P2P are to be found in Minar & Hedlund (2001), Felter (2002), Preston (2002), Wikipedia, Kurose & Ross (2004) pp. 58-59, 75-78 and 136-145, Loban (2004), Risson & Moors (2004), Clarke (2004), and other references in Ross (2004). Design principles for P2P file-sharing systems are provided in section 5 of Liang et al. (2004b).
An examination of this literature suggests the following critical characteristics of P2P architecture:
In order to deliver an effective service, further practical requirements exist, in particular:
Drawing on and consolidating these elements, the following working definition is used in this paper:
peer-to-peer (P2P) is a network architecture in which nodes are relatively equal, in the sense that each node is in principle capable of performing each of the functions necessary to support the network, and in practice many nodes do perform many of the functions
P2P architectures are argued to be capable of offering a number of advantages over conventional master-slave and client-server alternatives. Central among them is the avoidance of bottlenecks and single-points-of-failure, resulting in much-reduced dependence on individual devices and sub-networks. This enables improved resilience, much-improved scalability, much-improved ability to service highly-peaked demand, and resistance to denial of service attacks.
Applications are particularly likely to benefit from P2P architecture where demand is high relative to the power of individual processor-clusters and sub-networks, whether for an extended period of time, or just brief periods; and/or demand is widespread rather than arising in close proximity to individual processors and sub-networks. The sense in which the term 'proximity' is used may be related to terrestrial geography or network-topology. A fuller analysis is in Roussopoulos et al. (2004).
The focus of public discussions tends to be on high-volume applications used for transferring music and increasingly video, much of which it is claimed is in breach of copyright. The most prominent names were at first Napster and Gnutella, and have recently been FastTrack and Kazaa (Leibowitz et al. 2003, Liang et al. 2004a and 2004b, and the set of Wikipedia articles), and increasingly eDonkey and Overnet. There are many other applications, however, including Freenet for text (Clarke et al. 2000), Publius (Waldman et al. 2000) and iMesh for publishing, BitTorrent (Cohen 2003, Pouwelse 2005), Edutella in education, and Skype for real-time two-way audio/telephone.
The term 'research agenda' is much-used, but seldom clearly defined. Some guidance is provided by Truch et al. (2000), Wand & Weber (2002) and Risson & Moors (2004). Drawing on those sources, a research agenda should provide the following elements:
Because it is essential that the whole of the research domain be addressed, it is inevitable that a research agenda of article length will treat each segment fairly superficially.
In the case of P2P, it is of particular importance to take into account the layered nature of the phenomenon, because it is very likely that research opportunities and appropriate research techniques will be significantly different at each layer. It is proposed that two networking layers be differentiated, an applications layer, and two layers addressing impacts and implications, as follows:
The remainder of the paper is organised in accordance with that 5-layer structure, and addresses each of the elements identified above as making up a research agenda.
The primary network (or, more correctly, 'inter-network') over which P2P is implemented is the open, public Internet. Research questions include:
The open, public Internet is only one possible infrastructure for P2P. Similar questions arise in relation to intranets and extranets, and similar research techniques can be applied to test, for example, the efficacy of P2P-based backup and recovery within a corporate network, and the minimum network-scale needed to be effective.
Although the Internet Protocol Suite has been the primary focus of most designers and of most observers, other intermediate-layer protocols may also be appropriate hosts for P2P applications. Tests could be performed on networks running other protocol suites, including large regional networks, large dispersed corporate networks, and small local area networks (LANs) such as Ethernet segments within a single small building or floor.
Testing is needed of the robustness of networks (i.e. to what extent can they continue to perform their functions under adverse circumstances?), and of their resilience (i.e. how easily can they be recovered after a major outage?). The possibility exists that future adaptations of network infrastructure might include features intended to present barriers to the 're-booting' of P2P networks. Hence examination is also needed of the susceptibility of networks to a coordinated denial of service attack.
Several research techniques are applicable to the research questions that arise in this segment of the P2P domain. Existing systems can be studied, and the scope exists for field experimentation and quasi-experimental designs (e.g. by adapting an existing P2P application to gather additional data, and to perform somewhat differently from the intentions of its designers). New systems can be constructed in order to trigger features of the underlying infrastructure. Based on an understanding of Internet characteristics, simulation models can be constructed, in order to predict behaviour under various P2P-generated conditions. A considerable amount of research has been conducted into such matters within the computer science discipline (e.g. that catalogued by Ross 2004 and Risson & Moors 2004), and eCommerce researchers need to make themselves aware of, and draw on, that research.
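As an indication of what a very small simulation model of this kind might look like, the following sketch compares how a randomly-wired overlay and a hub-and-spoke (client-server-like) topology respond to the loss of their single most-connected node. It is a toy under stated assumptions, not a model of any real network: the network size, the number of links per node, the random seed, and all function names are arbitrary choices made for illustration.

```python
import random
from collections import deque


def random_overlay(n: int, degree: int, rng: random.Random) -> dict[int, set[int]]:
    """Build an undirected graph in which each node links to `degree` random peers."""
    graph = {node: set() for node in range(n)}
    for node in range(n):
        others = [p for p in range(n) if p != node]
        for peer in rng.sample(others, degree):
            graph[node].add(peer)
            graph[peer].add(node)
    return graph


def star(n: int) -> dict[int, set[int]]:
    """Client-server analogue: node 0 is the hub; all other nodes link only to it."""
    graph = {node: set() for node in range(n)}
    for node in range(1, n):
        graph[0].add(node)
        graph[node].add(0)
    return graph


def largest_component_fraction(graph: dict[int, set[int]], failed: set[int]) -> float:
    """Fraction of surviving nodes that sit in the largest connected component."""
    alive = set(graph) - failed
    if not alive:
        return 0.0
    seen: set[int] = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes only
            node = queue.popleft()
            size += 1
            for peer in graph[node]:
                if peer in alive and peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        best = max(best, size)
    return best / len(alive)


def most_connected(graph: dict[int, set[int]]) -> int:
    """Return the node with the largest number of links."""
    return max(graph, key=lambda node: len(graph[node]))


rng = random.Random(42)  # fixed seed so that runs are repeatable
n = 200                  # assumed network size
overlay = random_overlay(n, degree=4, rng=rng)
hub_and_spoke = star(n)

# Fail the single most-connected node in each topology
p2p_after = largest_component_fraction(overlay, {most_connected(overlay)})
star_after = largest_component_fraction(hub_and_spoke, {most_connected(hub_and_spoke)})
print(f"P2P overlay: {p2p_after:.3f}  hub-and-spoke: {star_after:.3f}")
```

In the hub-and-spoke case the loss of the hub leaves every surviving node isolated, whereas the randomly-wired overlay would be expected to remain almost entirely connected. A research-grade model would need realistic topologies, churn and traffic, and would need to be validated against measurements of actual networks.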
P2P schemes generally involve a layer of software that intermediates between the application and the underlying telecommunications infrastructure. The function of this layer is to manage the linkage between nodes. This brings into existence a virtual network of nodes that is commonly referred to as an 'overlay network'.
There are a number of such schemes. Some are interwoven with applications, and many references fail to distinguish between the two. Well-known overlay networks include the original Napster, the original Gnutella, FastTrack, Kademlia, and eDonkey's Overnet. Catalogues of overlay networking tools are available at Wikipedia, Internet2 (2002-) and Slyck. It has been demonstrated that a P2P overlay network can be implemented in a very small program (Felten 2004, Skala 2005).
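To give a sense of how little code the core of an overlay network requires, the following is a minimal in-memory sketch of a flooding search with a hop limit, loosely in the style of early unstructured overlays. It is not the protocol of any of the networks named above, it performs no real networking, and the class name, function name and file name are hypothetical.

```python
from collections import deque


class Node:
    """A peer that holds some files and knows a few neighbouring peers."""

    def __init__(self, name: str, files: set[str]):
        self.name = name
        self.files = files
        self.neighbours: list["Node"] = []

    def connect(self, other: "Node") -> None:
        """Create a bidirectional overlay link between two peers."""
        self.neighbours.append(other)
        other.neighbours.append(self)


def flood_search(origin: Node, filename: str, ttl: int) -> list[str]:
    """Return names of peers within `ttl` hops of `origin` that hold `filename`."""
    hits = []
    visited = {origin.name}
    queue = deque([(origin, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if node is not origin and filename in node.files:
            hits.append(node.name)
        if hops_left == 0:
            continue  # the query is not forwarded any further
        for peer in node.neighbours:
            if peer.name not in visited:  # each peer handles a query only once
                visited.add(peer.name)
                queue.append((peer, hops_left - 1))
    return hits


# A chain a - b - c - d, with the wanted file held by c and d
a, b = Node("a", set()), Node("b", set())
c, d = Node("c", {"song.ogg"}), Node("d", {"song.ogg"})
a.connect(b)
b.connect(c)
c.connect(d)

print(flood_search(a, "song.ogg", ttl=2))  # ['c']
print(flood_search(a, "song.ogg", ttl=3))  # ['c', 'd']
```

The hop limit illustrates one of the design parameters on which performance depends: a small limit reduces traffic but may fail to find files that exist elsewhere in the network.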
A first set of research questions relates to whether the claimed technical advantages over client-server architecture are realised in theory, and in existing, observable schemes. Investigations are likely to lead to sub-questions, such as whether the advantages only emerge once a particular scale is reached, whether there are limits to the scalability even of P2P, and the extent to which performance is dependent on the characteristics of participating devices, detailed design decisions and operational parameters.
Further questions arise in relation to various technical challenges. In particular, P2P is known to be vulnerable to masquerade attacks (falsified digital objects and services that purport to be known ones) and pollution attacks (adapted versions of known digital objects or services). A P2P overlay network is also more vulnerable to attack if its topology is relatively stable; but the more dynamic the topology, the greater the overheads of network management, and the smaller the percentage of requests that are satisfied quickly. The trade-offs under different approaches need to be evaluated.
Techniques that are particularly suited to the examination of such questions include field experimentation and quasi-experimental designs, laboratory experimentation, simulation, and engineering construction and de-construction.
It is feasible to treat P2P as though it were a single architecture. On the other hand, there are already some distinct variants, and their technical advantages and disadvantages may be very different from one another. Of especial interest is the approach taken to coordination within P2P overlay networks. Generally, the business of the network (most commonly file-discovery and file-transfer) is conducted peer-to-peer directly between participating nodes, in some cases 1-to-1, and in other cases many-servers-to-1-client. The management of the catalogue, and the various network management functions, may also be conducted in the same openly collaborative manner, or they may involve some specialisation among nodes, and perhaps some degree of centralised control.
This results in the following taxonomy of schemes for P2P overlay networks:
In addition, at least three special circumstances exist, which might create the possibility of some degree of control being exercised by some party. These are:
Research questions arise in relation to the operational characteristics of each of the architectural variants, and their relative controllability and vulnerability. That leads to questions as to what contextual factors determine which of the various architectural alternatives is most appropriate to a particular application.
A further consideration that appears to be vital to the success of P2P networks is the means whereby sufficient resources are acquired from participants. Most implementations of P2P architectures include features to encourage the availability of devices to perform as servers, and to discourage devices from 'free-riding' or 'over-grazing' the 'commons', which is a risk intrinsic to the architecture. Background on the economics of P2P networks is in Chuang (2004) and related papers.
One fairly common design feature is to ensure that nodes are not called upon to contribute resources for long periods of time, because that way the user is less likely to adopt measures to cause the server component to cease operating. This is a key technical reason why many P2P schemes involve ephemeral servers, a highly dynamic network topology, and hence highly volatile metadata. For the same reason, means are usually included for ensuring that nodes remain responsive to processes initiated by the device's local user(s).
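One family of anti-free-riding features can be sketched as a simple share-ratio rule. The example below is a hypothetical illustration only, and is not drawn from any particular P2P scheme: the class name, the function name, the minimum ratio and the size of the newcomer allowance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PeerAccount:
    """Running totals that a hypothetical overlay keeps for one peer."""
    uploaded: int = 0    # bytes served to other peers
    downloaded: int = 0  # bytes received from other peers


def may_download(account: PeerAccount, request_bytes: int,
                 min_ratio: float = 0.5, grace_bytes: int = 10_000_000) -> bool:
    """Allow a request if the peer is within a free allowance or keeps its share ratio."""
    projected = account.downloaded + request_bytes
    if projected <= grace_bytes:
        return True  # newcomers may download a little before contributing
    return account.uploaded >= min_ratio * projected


free_rider = PeerAccount(uploaded=0, downloaded=9_000_000)
contributor = PeerAccount(uploaded=30_000_000, downloaded=40_000_000)

print(may_download(free_rider, 5_000_000))   # False
print(may_download(contributor, 5_000_000))  # True
```

A rule of this kind trades off strictness against the risk of deterring new participants, which is one of the design decisions whose effects could be explored through simulation or field experimentation.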
Another rational concern among users is that the software that provides them with access to a P2P network may include functions that are harmful to the individual's interests. This is further discussed under Consumer Impacts. Incentives to make resources available must be sufficient to overcome such 'distrust' factors.
An interesting area of overlap between P2P and other research domains is the philosophy adopted by the developers of protocols, of standards, and of software. Some overlay network protocols and libraries are closed and proprietary (e.g. FastTrack, Altnet/Joltid and Skype), whereas others are open specifications (e.g. DNS, OpenNAP, BitTorrent and OpenFT).
Research questions include: Are there advantages for one or the other in terms of speed of implementation, speed of adoption, quality assurance, etc.? Such questions might be initially addressed through conceptual research, simulation modelling and scenario-building. Scope also exists for field studies, but control over confounding variables is likely to prove very challenging.
Built over the underlying communications infrastructure and the overlay network are applications of direct interest to users. One classification scheme for applications is provided by Smith et al. (2003, p. 101). The research questions identified below address variously the categories of services that P2P can provide, and the features the software offers in order to deliver value.
Four broad categories of application can be identified, based on the nature of the resources that are being shared:
File-sharing over P2P networks involves many different categories of digital object, which have rather different characteristics, and give rise to rather different impacts and implications. They include the following:
The majority of P2P traffic during the period 1998-2003 appears to have been audio files, with video file sharing increasing as network and tail-end capacity has increased. Research is needed into usage patterns. Of particular interest is the provision of estimates of the proportion of file-sharing that is copyright-infringing, and the extent to which the infringements are for commercial and for consumption purposes.
Further research questions of considerable interest include: Is P2P also applicable to streaming? Is P2P also applicable to collaborative processing? Are file-sharing, message-transfer, streaming and collaborative processing compatible, or do they require fundamentally different designs and hence separate architectures? Will P2P, streaming services and Grid Computing develop separately, cross-fertilise, or merge?
Such questions can be addressed using non-empirical techniques such as conceptual research, simulation and scenario-building. Field and laboratory experimentation using existing applications and networks are also feasible, as are engineering construction and de-construction techniques.
File-sharing applications are attracting more attention at present than message-transfer, streaming and collaborative processing, and are more mature. This section accordingly focusses on the features of packages intended for file-sharing.
Some packages are designed to utilise one specific overlay network; but many implement multiple protocols in order to provide access to several overlay networks. Researchers need to appreciate this distinction, and take it into account in their research designs, and in their interpretation of their sources and their data.
Multi-network packages are tending to dominate, and include Kazaa Media Desktop (KMD), Grokster, Morpheus and iMesh. Examples of specific-network packages include BitTorrent, PeerEnabler, MLDonkey and eMule. Catalogues of currently available packages are available at Wikipedia, Internet2 (2002-), Slyck, O'Reilly OpenP2P (Distributed Search Engines) and O'Reilly OpenP2P (File-Sharing).
With the proliferation in applications, information is needed about patterns of usage of the applications. Research questions include: Which packages are used by what categories of user? Which packages are used for what categories of file? What forms of specialisation are offered? Which specialisations appear to be valued by which users, for which categories of file? These are empirical questions, which may be answered through field studies, supplemented by surveys and perhaps focus groups, and secondary research utilising postings to mailing-lists and bulletin-boards.
Deeper information is needed about application functionality. There are some basic functions that all applications need to perform, such as the provision of interfaces with the user, with the relevant overlay network(s), and with tools to render the files that are received (such as a web-browser and Adobe Acrobat for text and image content, and Windows Media Player and Apple QuickTime for audio and video). Some may themselves perform rendering functions. Many also offer management facilities for the files that are downloaded, including indexing, annotation and metadata management. Most also include the capacity to perform server functions. All need to perform operational functions such as maintaining a list of nodes in the overlay network that they can make contact with. At least some are likely to include administrative functions, such as the collection and reporting of statistics.
The functionality of each P2P application, and in some cases of each overlay network layer, could be expected to reflect the kinds of uses and users that the designers had in mind. An example is censorship-resistant publishing. To achieve all of the objectives, multiple tools might need to be used in conjunction with one another, such as Freenet (Clarke et al. 2002), Publius (Waldman et al. 2000) or some other anonymous authoring and publishing scheme, and Mixmaster (Chaum 1981) or some other anonymous-proxy method for posting the document onto servers. To date, there tends to be a trade-off between robustness and nymity, on the one hand, and user-friendliness, on the other.
Research questions include: What alternative approaches are adopted to application design (such as the use of existing tools versus custom-building)? What tools are popular among designers (e.g. are Internet Explorer and Windows Media Player commonly required, or is scepticism about Microsoft products apparent among user communities)? Do packages appear to be specialised to particular markets, or to be generalised to support wide varieties of users and file-types? Do packages incorporate means whereby data can be gathered about users, or uses? Do packages incorporate means whereby some party may be able to exercise control over aspects of the network's operation?
Engineering de-construction techniques are especially applicable to such questions. Field and laboratory experimentation may be valuable in conducting comparisons among applications. Interviews with developers might be a valuable supplementary technique.
Beyond the technical issues, the technology-in-use needs to be studied, to answer questions such as: Do users find the trade-offs selected by the designers to be appropriate? Such questions are appropriately addressed using surveys, and some of the interpretivist techniques.
P2P schemes appear capable of having substantial impacts both on organisations and individuals.
A variety of organisations may be affected, including:
To date, however, the primary categories of organisation embroiled in the turmoil appear to be corporations whose revenues and profits are significantly dependent upon copyrighted materials, in particular music publishers, and now publishers of video-format materials such as feature-films and documentaries. They are confronted by a succession of challenges:
With the challenges to enforceability comes a reduction in user accountability, because people perceive themselves and their behaviour to be difficult to identify, track and respond to. Traceability and the consequential possibility of retribution are not the only factors involved in control over anti-social behaviour; but they are very likely to be important factors.
The questions that arise in this segment of the P2P research domain are very different from those discussed in previous sections. For example: What proportion, and what volume, of file-sharing is conducted without an appropriate licence? What proportion, and what volume, of file-sharing is in breach of copyright law? (The two questions are not equivalent). What is the incidence of unauthorised adaptation of files? What proportion of downloads involve active endeavours to avoid identification (such as the use of proxy-servers)? These can be addressed using field study, secondary data research, and perhaps field experimentation and engineering techniques.
Other questions involve study of human factors. For example: What proportion of the population comprises inveterate anti-capitalists whose behaviour is independent of the law and of public information campaigns? To what extent is consumer behaviour changed by the knowledge that legal action is being taken by copyright-owners? Has consumer payment morality changed since 1998? What is the elasticity of demand for various kinds of digital objects? Are there threshold prices for various categories of digital objects, above which consumer behaviour changes significantly? How do consumers perceive the use of pseudonyms by others, and by themselves? These are capable of study using surveys and by most forms of interpretivist technique.
A further, important set of questions requires understanding of management disciplines: What margins did pre-P2P prices offer copyright-owners in various contexts? What scope do they have to reduce costs? In each sector affected by P2P, are there examples of re-engineered corporations that have achieved significantly lower cost-profiles than their competitors? If so, are there factors that prevent other corporations in those sectors following their lead?
Do business models exist that involve revenue-generation or cost-offset approaches different from those used by mainstream copyright-owning corporations in the sector? In particular, is it feasible for publishers to extract sufficient revenue from commercial users (such as radio stations and video-streaming services) such that they can afford to forego control over personal copying and use?
Is there scope for dis-inter-remediation? In particular, can copyright-owning organisations themselves exploit P2P technologies, as many theorists, Napster and now Apple have encouraged them to do? Might publishing corporations themselves be dis-intermediated out of existence, with originators communicating directly with downloading services and streaming channels, and/or directly with consumers? If so, how long might such a transition take, and can large publishers adapt quickly enough to survive?
The primary impacts on consumers are positive. Files are much more readily and quickly accessible, and in most cases to date there is no direct cost involved. On the other hand, consumers do incur costs for their own infrastructure, network connection and possibly traffic; they pay indirectly to the extent that advertising costs force up the prices of other goods and services; and, after a very long gestation period, early movers like Apple, through iTunes, are now forcing commercial operators to apply forms of P2P that incorporate direct charging models.
Negative impacts on individuals arise where P2P networks are used to reticulate digital objects against the wishes of people who have interests in them. These interests vary greatly. For example, in some cultures, pictures of deceased persons are sacrosanct; and in others veneration of the aged is important (even if those aged have a dubious past). The interests in secrecy that tend to dominate discussion are the protection of a person's reputation, the avoidance of civil lawsuits, and the avoidance of criminal prosecutions. These are far more than merely a personal and psychological matter. They have a social and even a political dimension, because they might increase conformity of behaviour with whatever the society perceives to be 'normal', might work against freedom of political speech, and might act as a disincentive against people being prepared to offer themselves for public office. Research questions include: Is there evidence of P2P being applied to character assassination, the embarrassment of hypocrites, and the exposure of villains?
A further issue is that security vulnerabilities exist until learning has taken place, appropriate features have been designed and implemented, and new versions of software have been widely deployed. One concern is that consumer devices may be subject to surreptitious enlistment into P2P networks, without the informed and freely-given consent of the person responsible for the device, or perhaps without any kind of consent, or even without the person's knowledge. Research questions include: What disclosures are made to users of P2P applications regarding the use of their devices as servers? To what extent are consumers aware that their P2P application may perform server as well as client functions? What protections exist to prevent P2P-server activity significantly affecting the consumer's use of the device?
Because of their popularity, P2P applications have also been attractive to people seeking vectors for the various kinds of 'malware' such as 'trap doors', 'backdoors' and 'trojan horses', including the acquisition of so-called 'zombies'. In addition, some P2P applications include adware (which uses data on or about the device or its presumed user to select among alternative advertisements to be displayed on the device), and some include spyware (which extracts personal data from the device and sends it elsewhere). It is also feasible that P2P applications may contain features to modify or delete data or software on the device. Research questions include: What evidence exists of the use of P2P applications as vectors for malware?
P2P is also having second-order effects on industry sectors, and on governments. This section scans these broader issues, in order to identify opportunities for researchers to contribute to understanding of the P2P phenomenon.
Organisations have interests in information flows being restricted, in such contexts as trade secrets, commercial negotiations, insider trading, and the avoidance of accusations in relation to environmental or social depravity, collusion and corruption. Research questions include: Is there evidence of P2P being applied to corporate leaks and whistleblowing?
The issue also arises of the dissemination of incomplete, misleading, and utterly false information, variously in order to harm reputation, drain an organisation's or person's resources, chill their behaviour, or inflate stock-prices. Research questions include: Is there evidence of P2P being applied to the spreading of rumours?
More broadly, industry sectors whose revenue depends on their ability to control the dissemination of digital (or digitisable) objects appear to be under threat. Rapid change is harmful to investors, to employees, to organisations upstream and downstream from those sectors, and to regions dependent on them for employment. There are economic and social interests in industry sector re-structuring being gradual rather than sudden.
Research questions include: What is the spread of returns from copyright-dependent industry sectors to originators, publishers, publishers' contractors, and elements of the distribution chain? What changes in turnover, profitability and employee-count have been apparent within copyright-dependent industry sectors? To what extent are copyright-dependent sectors large employers in depressed economic regions?
In varying degrees, and using varying methods, governments of all nation-states seek to control information and public perceptions. They may find their capacity to do so undermined by the use of P2P networks. Research questions include: Are P2P networks being used to achieve citizen-driven Freedom of Information to supplement existing, narrow government mechanisms? Is there evidence that government 'propagandists' and 'spin-doctors' are now having less success in guiding and manipulating public opinion?
More broadly, censorship laws may be undermined by the use of P2P networks. Research questions include: Are P2P networks being used for proscribed content, such as 'seditious' materials, child pornography, incitement to violence and hatred, and trading in proscribed objects such as scheduled drugs, explosives, firearms, Nazi memorabilia, and goods deriving from threatened species?
The threats that copyright-owners and government censors perceive in P2P have resulted in countermeasures. Copyright-owners and their agents are investing in technological protections for digital works. They are participating in P2P networks in order to gather information about them. More directly, music publishers and their agents have been actively polluting the quality of content on P2P networks by introducing similarly-named objects that are incomplete or otherwise spoilt, in an endeavour to reduce the networks' reputation and attractiveness (Liang et al. 2005). These measures may or may not be effective in achieving their aims, and may or may not have side-effects.
Research questions include: Are technological protections for copyright objects effective in preventing unauthorised access? Are technological protections for copyright objects harming the exercise of legal rights by consumers, such as licensed use, and use under fair use / fair dealing provisions? To what extent are P2P networks vulnerable to pollution attacks? Is copyright-owners' usage of masquerade and pollution techniques reducing the perceived quality and reliability of files shared using P2P networks?
The network equivalent of a field study can be used to investigate many of these questions. Engineering research is then needed in order to gain insights into the extent to which the effects of these attacks can be mitigated. Questions about user perceptions need to be pursued using survey and interpretivist techniques.
Governments are also implementing measures to protect themselves. One example is the use of proxy-servers to block content (including not only comments on the regime in the People's Republic of China, but also pictures of breasts and genitalia in libraries in the U.S.A. and schools in Australia). Another is the attempt to identify and track devices and users, and to facilitate their identification and tracking. Research questions include: Are proxy-server techniques effective for censorship, and to what extent are they being used? Do proxy-server techniques have significant side-effects? To what extent are devices identifiable? To what extent are individual users identifiable? How easy is it to circumvent techniques to facilitate identification and tracking? In particular, are proxy-server techniques effective for anonymisation, and to what extent are they being used?
Governments already have considerable powers available to enable them to counter-attack against impacts of P2P networks that they perceive to be negative. Copyright-owners have not been in as strong a position, because until very recently copyright breach was a purely civil matter, and discovery processes were limited. Lobbying by powerful associations of copyright-owners, however, has resulted in dramatic change over the last decade. Some forms of copyright breach have been criminalised; and new powers have been granted, such as the U.S. Digital Millennium Copyright Act (DMCA) provisions and the copycat provisions in some other countries (e.g. Lunney 2001), and Anton Piller orders that have recently arisen as an extension to the common law (Suzor 2004). These are resulting in substantial impositions on the operations and costs of ISPs, on the availability of data, and on consumers. They have been supplemented by aggressive communications by lawyers on behalf of copyright-owners (using so-called 'nastygrams'), which frequently adopt the pretence that they have far more powers than is actually the case.
Research questions include: To what extent are new powers being used by corporations whose interests are negatively affected by the use of P2P networks? To what extent is the use of those powers effective? To what extent do those powers have negative side-effects?
Many business disciplines, including information systems, tend to avoid policy studies, ceding the domain to the applied social sciences and humanities. A variety of research techniques can be applied. Given that P2P is still new, there is scope for non-empirical techniques such as conceptual research, simulation modelling, scenario-building and game- or role-playing. Some of the questions are susceptible to survey techniques, and insights can be gained into many of them using interpretivist methods. Field and laboratory experimentation and engineering techniques have much to offer in establishing the extent to which surmised impacts and implications are technically feasible, and empirically evident.
The purpose of this paper has been to survey the vibrant and rapidly-changing domain of peer-to-peer networks and their applications, in order to identify research opportunities. A research agenda has been constructed, comprising the dimensions of domain segmentation and research techniques. A tentative mapping has been offered between research questions and research techniques.
Developers, copyright-dependent corporations, government agencies, regulators and consumer advocates need information about the nature and operation of P2P, in order to develop coherent strategies and action plans suitable to the new context. There is scope for a great deal of valuable research to be undertaken, across all segments of P2P, from highly technical activities to organisational, social, economic and legal studies. Many research techniques are applicable.
This preliminary work on an agenda for P2P research needs to be subjected to constructive criticism, and then extended and refined. The limited amount of research that has already been published needs to be identified, catalogued, exploited, and extended. Gaps in research activities need to be identified, and programs and projects developed to address them.
Except where otherwise stated, all links in this list were accessed on 30 April 2006.
Anderson D.P., Cobb J., Korpela E., Lebofsky M. & Werthimer D. (2002) 'SETI@home: An Experiment in Public-Resource Computing' Commun. ACM 45, 11 (November 2002) 56-61, at http://setiathome.ssl.berkeley.edu/cacm/cacm.html, accessed 3 April 2005
Anderson R. (1997) 'The Eternity Service' Cambridge University Computer Laboratory, June 1997, at http://www.cl.cam.ac.uk/users/rja14/eternity/eternity.html
Berson A. (1996) 'Client/Server Architecture', McGraw-Hill 2nd edition, 1996
Blackmore N. (2004) 'Peer-To-Peer Filesharing Networks: The Legal and Technological Challenges for Copyright Owners' N.S.W. Society for Computers and the Law 55 (March 2004), at http://www.nswscl.org.au/journal/55/Blackmore.html
Chaum D.L. (1981) 'Untraceable electronic mail, return addresses, and digital pseudonyms' Commun. ACM 24, 2 (1981) 84-88
Chevance R.J. (2004) 'Server Architectures : Multiprocessors, Clusters, Parallel Systems, Web Servers, Storage Solutions' Digital Press, 2004
Chuang J. (2004) 'Economics of Peer-to-Peer Systems' Summer Institute on Peer-to-Peer Computing, Academia Sinica, August 2004, at http://p2pecon.berkeley.edu/ppt/p2pecon-sinica.pdf
Clarke I., Sandberg O., Wiley B. & Hong T.W. (2000) 'Freenet: A Distributed Anonymous Information Storage and Retrieval System' Lecture Notes in Computer Science, at http://www.doc.ic.ac.uk/~twh1/academic/papers/icsi-revised.pdf, accessed 3 April 2005
Clarke I., Hong T.W., Miller S.G. & Sandberg O. (2002) 'Protecting Free Expression Online with Freenet' IEEE Internet Computing 6, 1 (2002) 40-49, at http://www.doc.ic.ac.uk/~twh1/academic/papers/ieee-final.pdf, accessed 3 April 2005
Clarke R. (2004) 'Peer-to-Peer (P2P) - An Overview' Working Paper, Xamax Consultancy Pty Ltd, November 2004, at http://www.rogerclarke.com/EC/P2POview.html
Cohen B. (2003) 'Incentives Build Robustness in BitTorrent' Working Paper, BitTorrent.com, May 2003, at http://bittorrent.com/bittorrentecon.pdf
Dubnicki C., Ungureanu C. & Kilian W. (2004) 'FPN: A Distributed Hash Table for Commercial Applications' Proc. Conf. HPDC-13, June 4-6 2004, Honolulu, Hawaii USA, at http://hpdc13.cs.ucsb.edu/papers/184.pdf
EFF (1998) 'Cracking DES, Secrets of Encryption Research, Wiretap Politics & Chip Design' Electronic Frontier Foundation, O'Reilly & Associates, 1998
Felten E. (2004) 'TinyP2P: The World's Smallest P2P Application' freedom-to-tinker.com, December 2004, at http://www.freedom-to-tinker.com/tinyp2p.html
Felter W. (2002) 'Design Choices in P2P Infrastructure' IBM Austin Research Laboratory, slide-set, at http://www.internet2.edu/presentations/20020130-P2P-Felter.htm
Habib A. & Chuang J. (2004) 'Incentive Mechanism for Peer-to-Peer Media Streaming' Proc. 12th IEEE Int'l Workshop on Quality of Service (IWQoS'04), June 2004, at http://p2pecon.berkeley.edu/pub/HC-IWQOS04.pdf
Howe J. (2003) 'BigChampagne is Watching You' Wired 11.10 (October 2003), at http://www.wired.com/wired/archive/11.10/fileshare.html
Karagiannis T., Broido A., Brownlee N., claffy k.c. & Faloutsos M. (2004) 'Is P2P dying or just hiding?' Proc. Globecom 2004, November-December 2004, at http://www.cs.ucr.edu/~tkarag/papers/gi04.pdf
Ku R.S.R. (2005) 'Grokking Grokster' Case Legal Studies, Research Paper No. 05-5, February 2005, at http://ssrn.com/abstract=675856
Kurose J.F. & Ross K.W. (2004) 'Computer Networking: A Top-Down Approach Featuring the Internet' Pearson Education, 2004
Leibowitz N., Ripeanu M. & Wierzbicki A. (2003) 'Deconstructing the Kazaa Network' 3rd IEEE Workshop on Internet Applications (WIAPP'03), 2003, Santa Clara, CA
Liang J., Kumar R. & Ross K.W. (2004a) 'Understanding KaZaA' Working Paper, 2004, at http://cis.poly.edu/~ross/papers/UnderstandingKaZaA.pdf
Liang J., Kumar R. & Ross K.W. (2004b) 'The KaZaA Overlay: A Measurement Study' Working Paper, September 2004, at http://cis.poly.edu/~ross/papers/KazaaOverlay.pdf
Liang J., Kumar R., Xi Y. & Ross K.W. (2005) 'Pollution in P2P file sharing systems' Proc. IEEE INFOCOM, 2005, at http://cis.poly.edu/~ross/papers/pollution.pdf
Loban B. (2004) 'Between rhizomes and trees: P2P information systems' First Monday 9, 10 (October 2004), at http://www.firstmonday.org/issues/issue9_10/loban/index.html
Lunney G.S. Jr. (2001) 'The Death of Copyright: Digital Technology, Private Copying, and the DMCA' Virginia Law Review 87 (September 2001)
Lyman P. & Varian H.R. (2003) 'How Much Information? 2003' School of Information Management and Systems, University of California at Berkeley, at http://www.sims.berkeley.edu/research/projects/how-much-info-2003/, accessed 15 January 2005
Mann S. (1997) 'Wearable Computing: A First Step Toward Personal Imaging' Computer 30, 2 (February 1997), at http://www.wearcam.org/ieeecomputer/r2025.htm
Minar N. & Hedlund M. (2001) 'A Network of Peers - Peer-to-Peer Models Through the History of the Internet', Chapter 1 of Oram (2001), at http://www.oreilly.com/catalog/peertopeer/chapter/ch01.html
OECD (2004) 'Peer to Peer Networks in OECD Countries' OECD, Paris, July 2004, at http://www.oecd.org/dataoecd/55/57/32927686.pdf
Oram A. (Ed.) (2001) 'Peer-to-Peer: Harnessing the Power of Disruptive Technologies' O'Reilly, 2001, at http://www.oreilly.com/catalog/peertopeer/index.html
Pouwelse J.A., Garbacki P., Epema D.H.J. & Sips H.J. (2005) 'The Bittorrent P2P File-sharing System: Measurements and Analysis' Proc. 4th Int'l Workshop on Peer-to-Peer Systems (IPTPS'05), February 2005, at http://www.isa.its.tudelft.nl/~pouwelse/Bittorrent_Measurements_6pages.pdf
Preston A. (2002) 'Peer-to-peer: an overview of a disruptive technology', Internet2 Peer-to-Peer Working Group, slide-set, at http://www.terena.nl/conferences/tnc2002/Slides/sl8b1.ppt
Risson J. & Moors T. (2004) 'Survey of Research towards Robust Peer-to-Peer Networks: Search Methods' Technical Report UNSW-EE-P2P-I-I, University of N.S.W., Sydney, September 2004
Ross K.W. (2004) 'Recommended Reading in P2P Networking Theory' Catalogue, 2004, at http://cis.poly.edu/~ross/p2pTheory/P2Preading.htm
Roussopoulos M., Baker M., Rosenthal D.S.H., Giuli T.J., Maniatis P. & Mogul J. (2004) '2 P2P or Not 2 P2P?' Proc. IPTPS 2004, February 2004, at http://www.eecs.harvard.edu/~mema/publications/iptps2004.pdf
SEI (1997) 'Client/Server Software Architectures--An Overview' Software Engineering Institute, Carnegie-Mellon University, 1997, at http://www.sei.cmu.edu/str/descriptions/clientserver_body.html
Skala M. (2005) 'MoleSter 0.0.4 - now 6 lines, 466 bytes' January 2005, at http://ansuz.sooke.bc.ca/software/molester/
Smith H.A., Clippinger J. & Konsynski B. (2003) 'Riding the Wave: Discovering the Value of P2P Technologies' Commun. Assoc. Infor. Syst. 11, 4 (January 2003) 94-107
Suzor N. (2004) 'Privacy v Intellectual Property Litigation: Preliminary Third Party Discovery on the Internet' Australian Bar Review 25 (2004) 228, at http://ssrn.com/abstract=627786
Truch E., Ezingeard J.-N. & Birchall D.W. (2000) 'Developing a relevant research agenda in Knowledge Management - bridging the gap between knowing and doing' Proc. Euro. Conf. Infor. Syst., 2000, at http://csrc.lse.ac.uk/asp/aspecis/20000190.pdf
Waldman M., Rubin A.D. & Cranor L.F. (2000) 'Publius: A robust, tamper-evident, censorship-resistant, web publishing system' Proc. 9th USENIX Security Symposium, August 2000, at http://cs1.cs.nyu.edu/~waldman/publius/publius.pdf
Wand Y. & Weber R. (2002) 'Research Commentary: Information Systems and Conceptual Modelling - A Research Agenda' Infor. Syst. Res. 13, 4 (December 2002)
Wen H. (2002) 'Internet Radio the P2P Way' O'Reilly P2P.com, 24 September 2002, at http://www.openp2p.com/pub/a/p2p/2002/09/24/p2pradio.html
Xie M. (2003) 'P2P Systems Based on Distributed Hash Table' Department of Computer Science, University of Ottawa, at http://www.site.uottawa.ca/~mxie/academic/bak/DHT.pdf
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
Xamax Consultancy Pty Ltd
ACN: 002 360 456
78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 1472, 6288 6916
Created: 7 January 2005 - Last Amended: 30 April 2006 by Roger Clarke - Site Last Verified: 15 February 2009