Roger Clarke's 'Google - User Perspectives'

© Xamax Consultancy Pty Ltd, 1995-2024

Gurgle - The Turmoil Induced by a Search-Engine

Roger Clarke **

Prepared for submission to Computer Law & Security Report

Version of 20 December 2005

Available under an AEShareNet licence or a Creative Commons licence.

This document is at http://www.rogerclarke.com/II/Gurgle0512.html

Abstract

Google began as yet another search-engine. Its owners have found a successful business model, have established additional lines of business, and have achieved quite dramatic growth and profitability. The now multi-facetted Google corporation is a 'newly big business', and is forthrightly challenging both 'old big business', and competition, copyright, consumer and privacy laws.

1. Introduction

The World Wide Web exploded into popular consciousness in 1993. The Web enabled access to content. As the volume of content increased, the informal exchange of URLs became an inadequate way for the escalating user-population to find what they were interested in accessing. Discovery services were needed, to complement the basic Web mechanism.

Techniques were already available to index large volumes of text. The earliest indexing had been biblical concordances, the first of which has been attributed to Hugh of St. Cher in the 13th century. During the 1970s and 1980s, a range of software was developed to support the indexing of text and the discovery of documents that matched users' search-terms. What came to be known as 'search-engines' applied these established techniques, adapted them to the new context, and drove their effectiveness, efficiency and usability to new levels.

A large number of search-engines have been established, and many are still in existence. Altavista had the greatest impact in the period 1996-98, but fumbled its intended transition from gratis service to 'monetiser' and 'wealth-generator'.

Since then, a late entrant by the name of Google has grown very rapidly, and achieved dominance. It appears to cover a larger proportion of the Web than its competitors (although others appear to have better coverage in specific areas). It is fast, it is reliable, and its user-interface is uncluttered by flashing banners and 'pop-ups'. It has several innovative features that users value. It has attracted the capital it needs to continue to crawl vast volumes of data frequently.

The Web of the mid-1990s was successful because it was very, very simple. A decade later, the Web is still nowhere near the sophisticated eLibrary envisaged by the hypertext pioneers between 1945 and 1970 - Bush, Engelbart and Nelson. The World Wide Web Consortium (W3C) and eager innovators are tugging and bullying the Web towards additional, more sophisticated information services. In some ways these are developing towards the original vision; and in other ways, anything but.

Google is one of those innovators. It began as yet another search-engine. It was useful. It attracted large amounts of venture capital, achieved brand-recognition, projected an image of corporate responsibility, and 'captured eyeballs'. It devised ways to generate revenue based firstly on targeted advertising, and then on intermediated advertising through its Adsense service. And it has raised huge amounts of cash through public share-offerings.

Google has been successful, in terms of growth, profitability and share-price. After a mere 7 years, it has already attracted Business School hagiography (e.g. Vise & Malseed 2005). But those are not the reasons for this article. Rather, Google is important because of the features it has provided in its basic service, the further services it has introduced alongside its search-engine, and the many ways in which the now multi-facetted Google corporation has run close to the wind.

The purposes of this paper are to provide brief overviews firstly of the lines of business that the corporation has been developing, and then of the ways in which it is challenging competition, copyright, consumer and privacy laws.

2. What's Google?

At this stage, Google continues to be fundamentally a search-engine, with trimmings. But the corporation has diversified into a wide range of additional lines of business. Some of them it has developed, in some cases by replicating existing ideas but adding new features, in others by outright innovation, and others it has acquired by purchase and takeover. Consistently with conventional business strategy, these lines of business generally appear to be intended to 'cross-leverage' one another.

For the purposes of the analysis conducted in the later parts of this article, it is useful to cluster Google's services into three segments.

2.1 Helping Users Discover Content

Google's foundation service was, and remains, the search-engine. This depends on a 'web crawler' called Googlebot, which accesses and indexes a substantial proportion of the content accessible on the Web.

The index generated by Googlebot is then made available to all comers by means of a search-engine. Users key in 'search terms'. The richness of the enquiry language available to the user varies between suppliers; yet Google's is among the weakest of all. This reflects its intention to appeal to the masses, rather than to the librarian or even the moderately well-educated researcher.
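The relationship between a crawler's index and the user's search-terms can be illustrated with a minimal sketch. It assumes a simple inverted index (a mapping from each word to the pages that contain it) and a query that requires every term to be present. The page addresses and texts, and the function names build_index and search, are hypothetical, and nothing here is claimed to reflect Google's actual implementation.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every search-term (a simple AND query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical pages standing in for content gathered by a crawler.
pages = {
    "http://example.org/a": "privacy law and search engines",
    "http://example.org/b": "copyright law and libraries",
    "http://example.org/c": "search engines index the web",
}
index = build_index(pages)
hits = search(index, "search engines")
```

A richer enquiry language would add operators such as OR, NOT and phrase-matching on top of this basic lookup; the sketch supports only the conjunction of terms.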

The search-engine responds with a list of the pages that match the search-terms that the user provided. Given the massive scale of the Web, most search-terms generate very large numbers of 'hits'. The key challenge is to identify pages that are most likely to be of interest to the user. This is an area in which Google's demonstrated performance has excelled. Its particular 'precedence algorithm' or PageRank technique sorts the hits into a sequence that is usually helpful, and sometimes uncannily accurate. Some years ago, it cheekily began offering an 'I'm Feeling Lucky' option, which delivers the page that comes first in its rankings. On the one hand, this reflects the fact that a very large proportion of users are entertainment-oriented surfers rather than people on a mission to discover specific information; but, on the other, it signals the confidence that the company has in often being able to deliver something of perceived value.
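The idea of ranking pages by the links that point to them can be sketched as follows. This is the textbook form of PageRank computed by power iteration, not Google's production algorithm, which is not described in this article. The damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages a, b and d all link to c; c links to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this graph, page c ranks first because three pages link to it, and page a ranks second because the only link it receives comes from the highly ranked c. A search-engine would combine a score of this kind with the match against the user's search-terms.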

The company has developed a range of extensions to the basic service. Many of these are merely restrictions of the search-space to sub-sets of the index, such as searches only for images, or only of blogs. A variant is the experimental Froogle service, which invites searches for items for sale, and tries to restrict hits to suppliers of goods and services. Any web-site that has attracted the attention of the crawler can offer its visitors a local site-search. This takes about half-an-hour to implement, and is valued by the many small organisations and individuals who manage web-sites that are edging beyond manageability.

A recent extension of particular interest is Google Scholar. In this case, the search is restricted to materials of relevance in the academic world. This service differs somewhat from other scope-restricted searches in that it has been necessary to ensure that the crawler visits appropriate sites; and features have been added, such as a citation-count for each paper, and navigation to the papers that cite it.

An adjacent business line is Google Answers, which involves the company operating as an intermediary between customised research consumers and suppliers.

Generally, Google has won high marks from most users for not imposing censorship, and for not massaging content or lists of hits (other than to the extent that its PageRank system has that effect). That applies in most countries; but not in all. Google, like its competitors, filters pro-Nazi sites in Germany and France, and is complicit in the large-scale content censorship imposed by governments in countries such as China (Faris 2005).

2.2 Providing Users with Content

Without yet quite becoming a content-provider, various of Google's services are moving a little beyond mere discovery of content provided by others.

One extension of interest is Google News, a consolidator service, which provides users not only with links to current news, but also allows a degree of customisation of the selection, and displays from its cache the headline, source, and first line of text.

A service called Google Base provides a gratis content-hosting service. The basic free-text search capabilities can be enhanced with metadata. Google may be adopting the role of publisher, in that "We reserve the right to exercise editorial discretion when it comes to the items we accept on our site".

Another extension that merges into content-provision is Google Earth. This provides satellite imagery of the earth's surface, already at fairly high levels of resolution, although currently of uncertain age. It is accompanied by client-side software, and offers several levels of service and data precision, some gratis and some for-fee.

A further initiative that involves content on a very large scale is Google Library, which is part of a broader Google Print project. This scheme, in collaboration with five leading libraries, involves the scanning of many books, the extraction of the text, the indexing of that text, and support for users in discovering segments of that text that satisfy their search-terms.

2.3 Knowing More About Users

Use of Google's search-engine service is gratis. The company earns considerable revenues from advertising, however, because the advertising is able to be targeted based in particular on the search-terms nominated by the user (Tyacke & Higgins 2004). And the growth imperative demands that the corporation extract more revenue from advertisers.

The third cluster of services was summed up in a statement by Google's CEO to financial analysts: "We are moving to a Google that knows more about you" (reported in The New York Times of 10 February 2005, and subsequently in many other places).

As will be discussed below, the Google services described earlier already provide the company with a massive amount of information about its users, which generates opportunities for substantial financial returns. But the company is making sure that there will be more, of both.

In 2004, it launched a gratis web-mail service dubbed Gmail. This has features that differentiate it from its predecessors such as Hotmail and Yahoo. They include huge storage space, enabling long-term retention of messages, and auto-selection of ads for display to the subscriber. This is understood to be based on text in the message from the subscriber's correspondent, but it could be easily upgraded to also reflect the subscriber's accumulated profile.

A new category of web-based services has recently emerged (in a manner and of a form reminiscent of the halcyon days before 'the dot.com bust'). So-called 'social networking services' (SNS) provide spaces where people can establish and expand small networks and larger communities. SNS involve participants creating profiles for themselves that are variously honest, creative and downright dishonest. In addition, many SNS encourage participants to provide personal data about other people, and even to upload the contents of their address books. Google has an entrant in this market, called Orkut.

A further service that might later migrate into this cluster is Google Desktop. This is Google's version of a tool to provide search capabilities across the user's own storage. Such features have previously been provided on Macintoshes, but not yet by Microsoft for the dominant Wintel workstation environments. At least at present, Google Desktop appears to run entirely on the user's own machine, without any linkage out to Google's site. Data about its users therefore appears to be currently unavailable to Google.

But there appears to be very little to ensure that this remains the case. In December 2005, the Google Desktop Terms and Conditions still contained no link to any Privacy Policy, and were easily read as enabling Google to use any personal data that it gathered in any manner it sees fit, now or at any time in the future.

The following sections consider ways in which the various Google services present challenges to the operation of the law, and to the interests of the many different categories of party that have an interest in contemporary ePublishing. The first topic addressed is competition law.

3. Challenges to Competition Law and Practice

Google dominates search-engine usage, particularly for the general public. But it is far from alone, and it does not appear to have any natural advantage that makes its market uncompetable. Similarly, its other lines of business, such as Earth, Desktop and Orkut, do not have the field to themselves.

Google's moves into content, on the other hand, have given rise to concerns in some quarters that digitisation of old works creates the risk of some kind of monopoly over the content of published works, and hence of monopoly rents.

There have been prior initiatives to digitise the world's literature, starting with Project Gutenberg in 1971. Google's deal with major libraries has stimulated parallel initiatives. Some are competitive with Google but collaborative among many other players, such as the European Digital Library mooted in May 2005, and the Open Content Alliance (OCA) announced in October 2005. One that appears to be head-to-head competitive is the British Library / Microsoft project announced in November 2005.

Particularly in view of what appears at this stage to be a virile response from elsewhere in the private sector, it is not clear that a reasonable argument can be mounted that Google's Print and Library campaigns are anti-competitive. That view could be subject to review if, for example, Google were able to use its patents on scanning technology to lock competitors out of the market for an extended period. Otherwise, it would appear that this may be a classic case of that (remarkably rare) phenomenon of successful first-mover advantage.

Discussions of anti-competitive effects draw attention towards the now-dramatic excesses of copyright law in favour of owners.

4. Challenges to Copyright Law and Practice

Stronger and longer monopoly has now been granted on every novel, biography, manual and learned work than has ever been the case in the past. Inter-supplier competition is only one dimension. Owners have been granted by the U.S., and subsequently by other parliaments unmindful of their own nations' self-interest, the ability to wield dramatic market power over both consumers and content-intermediaries. This is against the interests of a young, busy and rich content-intermediary, the Google corporation. This section considers Google's challenge to the legal rights of copyright-owners.

From the outset, there was considerable potential for fundamental practices of search-engines to be found to be copyright-infringing. In particular:

a crawler extracts a full copy of content into cache;
it reproduces some of the content in an index;
a search-engine reproduces parts of the content in its responses;
the search-engine's cache, at least in the case of Google, makes a full copy of the content available to users.

Many aspects of these uses have not been litigated, but some indicators of likely outcomes are available, at least in the U.S.A. Web caching by 'online service providers' is generally protected in the U.S. under provisions of the Digital Millennium Copyright Act (DMCA) of 1998. Reproduction of limited amounts of the content of each item is readily argued to be fair use. This includes temporary caching for indexing purposes, which was approved in 2002 in Kelly v. Arriba Soft. Meanwhile, the availability of opt-out for long-term caching may be enough to relieve a search-engine operator of liability for copyright infringement (Olsen 2003).
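An opt-out from caching is conventionally signalled by a 'noarchive' directive in a page's robots meta tag. The following minimal sketch shows how a crawler might check for that directive before making a cached copy available. The class and function names are hypothetical, and a real crawler would be expected to consider further signals that are not modelled here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for item in (attr.get("content") or "").split(","):
            directive = item.strip().lower()
            if directive:
                self.directives.add(directive)


def may_serve_cached_copy(html):
    """Return False if the page has opted out of caching via 'noarchive'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


# Hypothetical pages: the first opts out of caching, the second does not.
opted_out = '<html><head><meta name="robots" content="noarchive"></head></html>'
default = "<html><head><title>No directive</title></head></html>"
```

Under this convention the default is that caching is permitted, which is why the arrangement is described as opt-out rather than opt-in.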

So perhaps American law is capable on at least some occasions of adapting reasonably quickly, and finding a balance appropriate to new capabilities delivered by new digital technologies (provided that very large dollops of money are available to support the necessary lobbying, and to pursue test-cases). On the other hand, doubts remain (e.g. Bercic 2005, Steward 2005).

There have also been trademark battles. The Adwords technique used by Google enables advertisers to gain priority display-space when particular search-terms are nominated by the user. Trademarked terms have been used, and not infrequently acquired by competitors to the trademark-owner (e.g. Atlee & McMahon 2005). It was reported that, in late 2003, a French tribunal held that Google France infringed registered trade marks in this manner. A similar decision in the U.S. Court of Appeals (9th Circuit) was reported in early 2004, although relating to another search engine.

The copyright and trademark wars are far from over, as Google continues to push at the boundaries. In March 2005, the extension service Google News came under attack in the courts by Agence France-Presse, which believes that the manner in which the service is designed infringes its copyright (e.g. Wright 2005).

The Google Earth web-site is almost devoid of information about copyright, or even terms of use, although marks asserting Google's ownership of copyright appear on many images. It appears that the service has been structured using conventional contract, agency and copyright licensing agreements. If that assumption is correct, then it would be reasonable to expect that such copyright disputes as arise in relation to the use of Google Earth will be settled under existing laws.

But that appears not to be the case with at least some of the complex of services in the Google Print / Google Library cluster. In many cases (e.g. out-of-copyright works held by organisations such as the Bodleian Library, and works for which the copyright-owner provides a licence), there would appear to be no sense in which the Google Library initiative could reasonably be claimed to infringe copyright law. Some uncertainties exist in relation to works whose copyright-owner is uncertain or cannot be located (referred to at least in the U.S. as 'orphan works', and the subject of current consideration by the U.S. Copyright Office).

Far more contentious is Google Library's handling of works that are in-copyright, and whose copyright-owners have not provided licences and are not prepared to do so. Most of Google's partners appear to be restricting the arrangement to out-of-copyright books. But at least one is making in-copyright books available for scanning. The University of Michigan Library states that "Google will digitally scan and make searchable virtually the entire collection of the U-M library". It very much appears that Google is flexing its now-considerable muscle and proposing to scan, index and enable search over copyright works.

This undertaking involves several steps that can be argued to breach copyright:

a copy is made by scanning the physical work;
extracting the text from the scanned image represents the making of a further copy and/or of an adaptation;
some of the content is reproduced in an index;
some of the content is reproduced and served to users of the search-engine;
a further copy is made and provided to the library whose book was scanned.

Unsurprisingly, the aggrieved have resorted to litigation to defend their position. Two separate actions have been launched in the U.S. District Court, one in September 2005 by the Authors Guild, and the other in October 2005 by five major book-publishers (TechLawJournal 2005, Band 2006). The locations of the protagonists' Head Offices and of the lawsuit could hardly be a better vindication of the Lessig (2000) thesis about 'West Coast Code vs. East Coast Code'.

Most content-expressors want to retain control over at least some aspects of the content they create. Copyright law has long provided rights to originators over copying and republication, and over adaptation and republication of adapted works. Because of the works' value, and their economic muscle, large, for-profit corporations have come to control much of the content that has the potential to attract revenue. Individual originators have been relegated to the role of employee or contractor. In addition, a few forms of content, such as feature films and light entertainment series, require considerable investment, and result from creative work by teams rather than a single primary originator. As a result, copyright in such works is commonly owned by corporations from the outset.

During the last 15 years, as the digital era has threatened their monopoly profits, copyright owners, particularly in the music and film industries, have successfully lobbied the U.S. Congress for stronger protections. The U.S. Administration is now lobbying, on their behalf, for these extended powers to be given effect in other countries. Remarkably, many countries appear incapable of recognising the disadvantages to themselves in doing so, and are falling into line. Australia, an ultra-loyal ally in American foreign policy, ignored or was ignorant of its own economic interests and was among the first to do so, through the US-Australia 'Free Trade' Agreement. The stage is set for enormous tensions between the interests of content-accessors and copyright-owners, with the legal dice already very heavily loaded in favour of the latter.

Google was formed in 1998, but already has $4-5 billion annually in revenue. Its allure as an investment gave it a market capitalisation in the second half of 2005 of $125 billion, which on that measure puts it in the top 100 corporations worldwide. It is a 'newly big business'; and its interests are strongly divergent from those of 'old big business'. Whereas old big business sits fatly, and exploits and arranges extensions to its monopolies, 'newly big business' makes its money by adapting quickly to new contexts, and creating new monopolies that it can dominate from the outset.

From the discussion to date, it might appear that Google is aligned with the interests of content-accessors. The following section considers, however, the challenges that Google presents to consumers and consumer protection laws.

5. Challenges to Consumer Law and Practice

The Web and search-engines began life in the mid-1990s as gratis services, socially-oriented, and socialist or communitarian in nature. The patterns have changed a great deal during the following decade, as business enterprises have sought ways to make money from the vast volumes of content and traffic. The social dimension is far from dead, but there is now a substantial economic dimension that threatens to swamp it.

Even since the commercialisation of the Internet, consumers have benefitted greatly from the Web, and the flood of readily accessible content made available on it. Search-engines, not least Google, have made golden needles discoverable within an increasingly large haystack. The first round unarguably brought massive consumer benefit, particularly given that the actual access to content was gratis - for those people with the necessary infrastructure available to them.

But the contemporary context is economic to the point of being anti-social. It's therefore necessary to consider the extent to which the interests of consumers are holding up against the interests of the generally much more powerful corporations that control much of the available content. Of particular relevance are the interests of the less powerful consumers that are Google's primary target-market - individuals, associations, and small business enterprises.

In many circumstances, consumer rights are simply not respected by commercial providers of content on the Web. Commonly, terms are not negotiable, and in many cases are not even transparent. They are changeable at short notice. They do not survive takeover, or even change in management, or just in management policy. Old versions of terms are simply over-written, and cease to be discoverable. Communications from consumers to providers are ignored, and in many cases barriers are created to make it difficult for the consumer to work out how to send them in the first place. Recourse and enforcement are almost non-existent, not only across jurisdictions, but even within historically consumer-friendly jurisdictions. One interpretation is that the U.S. has successfully exported its marketer-friendly / consumer-hostile / low-regulation approach to the rest of the world.

Google looms as one of the most arrogant of the new generation of content-intermediaries, and appears set to carry its attitudes across into its content-provision business lines. Its terms are abrupt and invariant. Changes are made without notice, as the company sees fit: "We reserve the right to modify these Terms of Service from time to time without notice". Changes are not shown, no version-number or effective-date is provided, and prior versions are not available. Consumers who use its Adsense facility find similarly hostile terms, and inflexible application of them, as the company sees fit, without correspondence being entered into. Its terms for Gmail have drawn particular criticism.

No expansion of consumer protection to cope with these abuses is currently in prospect. As content-expressors increasingly flex the powers granted to them by the U.S. Congress and subsidiary parliaments such as Australia's, the interests of content consumers seem likely to be lost in the surge of copyright supremacist activity. There is little comfort to be had in the recent Tunis Declaration of the World Summit on the Information Society that "We call for the development of national consumer protection laws and practices, and enforcement mechanisms where necessary ..." (WSIS 2005, para. 47), not least given that the document is published by the International Telecommunication Union.

The final section shifts back from the economic dimension of content-access and consumption, to the social dimension, and considers the challenges that Google presents for privacy law.

6. Challenges to Privacy Law and Practice

The Web and search-engines have delivered enormous benefits; but they have also undermined the longstanding protection of 'privacy through obscurity'. The risk is compounded by the emergence of long-term archival services, such as the Internet Archive Wayback Machine. See, for example, Clarke (1998, 1999), Noguchi (2004) and Aljifri & Sánchez Navarro (2004).

Many people conduct research into other people, drawing on the myriad hits that search-engines deliver, such as mentions in the media, court reports, letters to the editor, records of participation in events, and postings to lists, fora and blogs. The motivations for some of these activities are constructive (such as to prepare for a forthcoming meeting), but in other cases they are less so (e.g. stalking, harassment, and extortion).

The sensitivity of personal data is a serious enough concern, but to that must be added the huge problem of pitifully low data quality. Web content is commonly out-of-date, incomplete, uncorroborated, unsourced, or lifted out of its original context without so much as a reference to what that context was. Some of the content is inaccurate. Many of the hits are inevitably spurious, and relate to another person with a similar name. Some Web content is scurrilous, as captured by the sceptical epithet 'It's on the Web; it must be true'. The case study of the John Seigenthaler entry in Wikipedia, in May-December 2005, highlighted the weaknesses and strengths of the Wikipedia model and process.

The Google search-engine is the most intrusive of all Web facilities. The foundation is laid by the size of its catchment. But the company has exacerbated the problem by providing many additional services and ensuring that data arising from the use of each of them is able to be correlated with that arising from all of the others.
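The kind of correlation referred to here can be illustrated with a minimal sketch. It assumes that each service writes log records carrying a common identifier taken from a long-lived cookie. The field names (cookie_id, service, detail) and the sample records are invented for illustration, and are not drawn from any documentation of Google's systems.

```python
from collections import defaultdict


def correlate_by_cookie(log_records):
    """Group records from different services by a shared cookie identifier."""
    profiles = defaultdict(lambda: defaultdict(list))
    for record in log_records:
        profiles[record["cookie_id"]][record["service"]].append(record["detail"])
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {cid: dict(services) for cid, services in profiles.items()}


# Hypothetical log entries from two services.
records = [
    {"cookie_id": "id-1", "service": "search", "detail": "privacy law"},
    {"cookie_id": "id-2", "service": "search", "detail": "cheap flights"},
    {"cookie_id": "id-1", "service": "news", "detail": "viewed: technology"},
    {"cookie_id": "id-1", "service": "search", "detail": "copyright cases"},
]
profiles = correlate_by_cookie(records)
```

The point of the sketch is that no name or email-address is needed: a single persistent identifier is sufficient to consolidate a person's activity across otherwise separate services.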

Given the privacy-threatening nature of its data practices, it would be reasonable to expect that Google would be especially careful in its structuring of privacy protections. Unfortunately, that is not the case: its Privacy Policy falls far short of people's needs.

As at mid-December 2005, Google provides a master Privacy Policy statement, and a shorter version called Google Privacy Policy Highlights. These appear to apply to all services, including the search-engine and all restricted-search services. There is a supplementary statement for Google Desktop, but none at this stage for Google Earth. Google Print and Google Groups have their own, but they appear to be merely copies of the master statement.

An analysis of Google's master Privacy Policy statement identifies many serious causes for concern (Clarke 2005). Google's use of cookies is extremely intrusive. The personal data that it has available is used for the "display of customized content and advertising". The company misunderstands (or abuses) the concept of 'consent'. Data disclosure to affiliates is uncontrolled. Data appears to be retained indefinitely, and the claim is made that, even if someone requests deletion of specific data, the request can be declined if the company wants to keep it for "legitimate business purposes". In any case, it appears unlikely that the undertakings could be enforced, especially not by individuals. The U.S. has only an extremely watered-down version of genuine privacy protections called the 'Safe Harbor Privacy Principles' (DOC 2000a), and U.S. federal agencies have been highly unhelpful to consumers, as evidenced by the vacuous advice in FAQ No 11: Dispute Resolution and Enforcement (DOC 2000b).

It gets worse.

There is a supplementary statement for Orkut. Google would appear to be free to apply to its own purposes personal data provided to Orkut, including inter-subscriber messages, and the membership of social networks. Some 'social networking services' entice users to disclose personal data about their friends, business contacts or acquaintances (Clarke 2004). From the limited information that is openly available, it is unclear whether Orkut indulges in that practice.

Gmail is a special case that requires closer attention. The privacy implications of email are often overlooked. Most email content was written in unguarded moments, in expectation of limited distribution and ephemerality. The expectation may prove to be unwarranted, because there is an increasing risk of email becoming available to indexing software. For example, private email may escape onto lists by being forwarded by recipients to other parties; and it may be subject to pre-trial discovery, subpoena or search warrant, and hence find its way into court records.

Every user's Internet Access Provider (IAP) maintains logs of traffic, in some cases including content. Every user's email-Internet Service Provider (ISP) maintains an email database. In the case of webmail-only services (such as Hotmail, Yahoo and Gmail), the retention-period is highly uncertain. In all of these cases, the traffic-details and text are subject to unexpected use and to both legally authorised and unauthorised disclosure, often without notification to the individuals who thought it was 'their' mail.

Google's Gmail represents the extremity of untrustworthiness in email-provision. A detailed analysis is provided in EPIC's FAQ. For one thing, it refuses to explain the circumstances under which it releases its subscribers' information, and the number of occasions on which it has done so. For another, Gmail's special features have considerably extended the list of risks. Its subscribers are subject to targeted ads based on text from senders. Google is in a strong position to correlate the ads with other data it holds, including, if and when it chooses to do so, with the content of outbound emails. It also has ready access to the social networks that the individual belongs to. How rich a profile does an advertiser need to enable the manipulation of consumer behaviour, and to become a honeypot that attracts interest from other marketers, and from law enforcement agencies seeking specific personal data to extract, and large datasets to mine?

Importantly, the threats are not limited to Gmail subscribers. The messages that people send to Gmail addresses are examined, they are retained long-term, the content and email-address are subject to largely uncontrolled use and disclosure, and correspondence with someone is enough to enable an inference of association with a social network.
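The inference of association can be illustrated with a minimal sketch. It assumes only that a mail provider can read the sender and recipient addresses of the messages that pass through it; the function name, the addresses and the data layout are hypothetical, and nothing here describes how any actual provider processes mail.

```python
from collections import defaultdict

def infer_contacts(messages):
    """Build an undirected contact graph from (sender, recipients) pairs.

    Only header data is used; message content is not needed.
    """
    graph = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return graph

# Hypothetical traffic seen by the provider of the 'webmail.example' mailbox.
messages = [
    ("alice@example.org", ["bob@webmail.example"]),
    ("carol@example.net", ["bob@webmail.example", "alice@example.org"]),
]
contacts = infer_contacts(messages)
```

In this sketch, alice and carol are not subscribers of the webmail service, yet both appear in the resulting graph, linked to the subscriber and to one another.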

The doctrine of privity of contract and the manifold weaknesses and patchiness of privacy laws together suggest that people who send messages to Gmail addresses simply have no rights at all in relation to the content of those emails. Consequently, some people decline to send to Gmail addresses. Many more would be likely to do so if they fully appreciated the risks that they face.

The whole is, in this case, potentially far greater than the sum of the parts. Google is structuring its business portfolio in order to achieve cross-leveraging. The consolidation of information about the behaviour of users of multiple Google-provided services is a particularly valuable form of cross-leveraging. At this stage in its development, Google the corporation has the following streams of data about its users available to it:

- logs of:
  - the IP-Addresses from which users send search-requests, and from which they access all other Google services;
  - the search-terms that they send;
  - the data that they provide in interactions with all other Google services;
  - the ads that they click on;
  - the within-Google pages that they go to (e.g. Google-cache);
  - perhaps soon the Book and Library content that they access;
- the contents of long-term cookies that are associated with all Google Services, and that contain an identifier that enables all data arising from all visits to all Google sites to be correlated;
- the vast Gmail archive, comprising:
  - all emails sent by Gmail subscribers;
  - all emails sent by all correspondents of Gmail subscribers;
  - all ads displayed as a result of text in inbound emails;
  - the social networks of subscribers and their correspondents;
- within the Orkut social networking service:
  - self-nominated profiles of members;
  - messages exchanged between members;
  - data and comments about other individuals that are captured by members, including invitations issued to non-members to join a network;
  - the social networks of members and perhaps also non-members.
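The role of the cookie identifier in consolidating these streams can be shown in a small sketch. It assumes that each service writes log records carrying the same long-lived identifier; the field names (`cookie_id`, `detail`), the service labels and the sample records are all hypothetical, and the code illustrates the general technique rather than anything known about Google's systems.

```python
from collections import defaultdict

def build_profiles(*service_logs):
    """Group records from separate service logs by a shared cookie identifier.

    Each argument is a (service_name, records) pair; each record is a dict
    with the hypothetical keys 'cookie_id' and 'detail'.
    """
    profiles = defaultdict(lambda: defaultdict(list))
    for service, records in service_logs:
        for record in records:
            profiles[record["cookie_id"]][service].append(record["detail"])
    return profiles

# Hypothetical log extracts from three otherwise separate services.
search_log = ("search", [
    {"cookie_id": "abc123", "detail": "symptoms of condition X"},
    {"cookie_id": "zzz999", "detail": "cheap flights"},
])
ad_log = ("ads", [{"cookie_id": "abc123", "detail": "clicked: clinic ad"}])
mail_log = ("mail", [{"cookie_id": "abc123", "detail": "correspondent: bob@example.org"}])

profiles = build_profiles(search_log, ad_log, mail_log)
```

The point of the sketch is that no sophisticated data mining is required: a single shared key is enough to turn three unrelated logs into one per-person record.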

There is no evidence that the Google corporation has yet moved to bring the full power of data mining technology to bear on this rapidly growing mound of data. But that would in any case be a strategically unwise manoeuvre at this early stage. The protections nominated in the company's various privacy policies are nothing like adequate; moreover, they are largely unenforceable, and malleable at the will of the company.

Conclusions

Google is a newcomer to the big end of town. New money is always brash; but Google is big new money. The courts are assured of good sport in the next few years, as elephants battle over copyright.

The apparent alignment of user interests with Google in the copyright arena does not carry over into consumer rights and privacy. The early, socially-oriented era of the Web is being swamped by the contemporary dominance of corporate interests. The tensions among human, corporate and government interests on the Internet are now very high, and are mostly being resolved against the interests of individuals.

Google promises to be a major player in a range of battles. Its claim that "You can make money without doing evil" is being put to the test, as its growth and diversification put enormous temptations in front of its executives. As this article was being completed, news of a deal between Google and Time-Warner was reported (Liedtke 2005). This included "a plan that may display more graphical ads on some of Google's traditionally sparse Web pages" - perhaps not 'evil', but a departure from one of the highly-valued features that underpinned the service's growth.

An examination of the maxim is instructive. Google is emphatically not built on the assumption that 'the company should not do evil'. Moreover, a corollary is easily formulated: "But you can make more money by doing evil". Given the obligations of corporations under law, the maxim arguably implies that evil should be done, in order to make more money. Google will see it as being in the company's best interests to gather more personal data, to cross-correlate it, to mine it, to exploit its users, and indeed to exploit anyone else who falls within the scope of its increasing market power.

References

Aljifri H. & Sánchez Navarro D. (2004) 'Search engines and privacy' Computers & Security 23, 5 (July 2004) 379-388

Atlee S.D. & McMahon B.F. (2005) 'Search Terms: The Use of Trademarked Terms by Web Search Pages Has Challenged Traditional Boundaries of Trademark Protection' Los Angeles Lawyer 28 (November, 2005) 38

Band J. (2006) 'Copyright owners v. The Google Print Library Project' Ent. L.R. 2006, 17(1), 21-24

Bercic B. (2005) 'Protection of Personal Data and Copyrighted Material on the Web: The Cases of Google and Internet Archive' Information & Communications Technology Law 14, 1 (March 2005) 17-24

Clarke R. (1998) 'Information Privacy On the Internet: Cyberspace Invades Personal Space' Telecomm. J. Aust. 48, 2 (May/June 1998)

Clarke R. (1999) 'Internet Privacy Concerns Confirm the Case for Intervention' Commun. ACM 42, 2 (February 1999) 60-67

Clarke R. (2004) 'Very Black "Little Black Books"' Xamax Consultancy Pty Ltd, February 2004

Clarke R. (2005) 'Evaluation of Google's Privacy Statement against the Privacy Statement Template of 19 December 2005' Xamax Consultancy Pty Ltd, December 2005

DOC (2000a) 'Safe Harbor Privacy Principles' U.S. Department Of Commerce, 21 July 2000

DOC (2000b) 'FAQ No 11: Dispute Resolution and Enforcement' U.S. Department Of Commerce, 21 July 2000

Faris S. (2005) '"Freedom": No documents found' Salon, 16 December 2005

Kelly v. Arriba Soft Corp., 280 F.3d 934 (9th Cir. 2002)

Lessig L. (2000) 'Code and Other Laws of Cyberspace' Basic Books, 2000

Liedtke M. (2005) 'America Online, Google seal $1B deal' Business Week, 20 December 2005

Noguchi Y. (2004) 'Online Search Engines Help Lift Cover of Privacy' Washington Post, Monday, February 9, 2004; Page A01

Steward S. (2005) 'The DMCA protects search engine page caching, indexing, etc.? Not so fast' O'Reilly Developer Weblogs, 7 November 2005

TechLawJournal (2005) 'Major Book Publishers Sue Google for Digitizing Copyrighted Books' TechLawJournal, October 19, 2005

Tyacke N. & Higgins R. (2004) 'Searching for trouble - keyword advertising and trade mark infringement' Computer Law & Security Report 20, 6 (November-December 2004) 453-465

WSIS (2005) 'Tunis Agenda for the Information Society' World Summit on the Information Society, WSIS-05/TUNIS/DOC/6(Rev.1)-E, 18 November 2005

Vise D. & Malseed M. (2005) 'The Google Story' Delacorte, 2005

Wright N. (2005) 'Copyright infringement case brought against Google by AFP' EarthTimes, 19 March 2005

Author Affiliations

Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.

Acknowledgements

My thanks to Matthew Rimmer of the Law Faculty at the Australian National University, who stimulated this paper by inviting me to present user perspectives in a seminar on 'Google: Infinite Library, Copyright Pirate, or Monopolist?', at the National Institute of Social Sciences and Law, A.N.U., Canberra, on 9 December 2005. My thanks also to the other presenters at that seminar, for the challenges they presented, and to the reviewers who provided feedback on drafts of the paper.

Created: 8 December 2005 - Last Amended: 20 December 2005 by Roger Clarke - Site Last Verified: 15 February 2009