Implications of Spiders in General
and Digital's Alta Vista in Particular

Date: Tue, 16 Jan 1996 11:38:52
From: (Timothy C. May)
Subject: Spiderspace

I've been thinking a lot about the problems and opportunities that are coming up as more and more "spiders" (Web searchers, crawlers) are indexing directories and files on systems they can find.

For the sake of this post, the files and whatnot these spiders and super-spiders can hit constitute a universe I'll call "spiderspace," as it semi-euphoniously matches cyberspace and cypherspace.

Two things caused me to think more intensely about this:

1. At the Saturday Cypherpunks physical meeting, Marianne Mueller (I think) was telling me about an experience where an old letter she'd written to someone showed up in an Alta Vista search. A personal letter, that is. How this happened was that the letter to her friend was buried several subdirectories deep in a directory he made accessible to the outside world. Presto, Alta Vista found it, indexed it, and made it keyword-searchable! (Humans are pretty bad at doing such meticulous file prep work, but all-seeing spiders are very good at seeing everything.)

2. Someone on the Cyberia-l list, Mike Godwin in fact, asked if anyone had a particular post he'd written last summer, a post he had neglected to save but now needed. I had not kept that post, according to my own archives, but I decided to see what Alta Vista might turn up. (The Cyberia-l list is not officially archived, and I believe archives of it are discouraged by the list owner, for various reasons especially worrisome to lawyers and law professors!)

Sure enough, a search of "Cyberia-l" in Alta Vista showed all sorts of hits, including what appeared to be several _private archives_ of parts of the traffic. (By "private" I mean in the sense that they were someone's personal archives, and not necessarily complete or even semi-officially sanctioned.)

And a search of "Cyberia-l AND Godwin AND parental AND Ferber" (some of the keywords in the post he knew he was looking for) produced two hits, most probably of the post he was seeking. (They were on a Kent Law School archive site that, I believe, is no longer accessible to the outside...the Alta Vista spiders must have gotten to it and indexed it before the site was made less accessible...just a thought.)
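The search above is a Boolean AND query: a page matches only if it contains every listed term, which is why a few distinctive keywords narrowed the broad "Cyberia-l" results down to two hits. Alta Vista's actual index structure and query syntax are not described in this post, so the following is only a minimal sketch under an assumed model: an in-memory inverted index mapping each lowercased word to the set of documents containing it. The function names, document IDs, and texts are hypothetical placeholders.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Keep hyphens so a list name like "cyberia-l" stays one token.
        for word in re.findall(r"[a-z0-9-]+", text.lower()):
            index[word].add(doc_id)
    return index


def and_query(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Return the IDs of documents containing every term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    # Start from the smallest posting list; intersection order does not
    # change the result, only how much work is done.
    postings.sort(key=len)
    result = set(postings[0])  # copy so the index itself is not modified
    for posting in postings[1:]:
        result &= posting
    return result


if __name__ == "__main__":
    # Hypothetical documents standing in for pages a spider has indexed.
    docs = {
        "archive/a.txt": "Cyberia-l digest: Godwin on parental rights and Ferber",
        "archive/b.txt": "Cyberia-l digest: Godwin on encryption policy",
        "misc/c.txt": "Notes on Ferber and parental consent",
    }
    index = build_index(docs)
    print(sorted(and_query(index, ["Cyberia-l", "Godwin", "parental", "Ferber"])))
    # prints ['archive/a.txt']
```

Each extra term can only shrink the result set, which is what makes a handful of keywords from half-remembered text so effective at pinpointing one document among many.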

This fits with the point made above, that increasing numbers of odd things--letters, love letters, resumes, job applications, even things like PGP passwords!--will likely show up by accident in spiderspace.

I've started to look for things like PGP files lying around, buried in subdirectories. I can imagine attacks based on this.

Declan McCullagh, on the Cyberia-l list, followed up to my post on this topic by noting that things will really get interesting when the internal file systems of many sites are made searchable, such as with the Andrew File System (AFS) at CMU and elsewhere. Apparently most users make their directories accessible to others.

Implications for Cypherpunks?

First, an alert for you to be very careful about what you make accessible to the outside world. It's no longer just a matter of people taking the time to rummage through your subdirectories, it's now trivial to find things with the new Web search engines.
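A spider follows every subdirectory it can reach, so burying a file several levels deep hides nothing. One way to act on this advice is to walk a directory tree yourself, the way a spider would, before exposing it. The following is a minimal self-audit sketch, not a complete scanner. Its markers are assumptions that you would extend for your own case: the ASCII-armor header PGP uses for exported secret keys and the PGP 2.x secret keyring file name `secring.pgp`. The names `audit_tree`, `SENSITIVE_MARKERS`, and `SENSITIVE_NAMES` are hypothetical.

```python
import os
from pathlib import Path

# Assumed content markers of sensitive material; extend for your own situation.
SENSITIVE_MARKERS = (
    b"-----BEGIN PGP PRIVATE KEY BLOCK-----",  # ASCII-armored PGP secret key
)
# Assumed sensitive file names (PGP 2.x kept secret keys in secring.pgp).
SENSITIVE_NAMES = {"secring.pgp"}


def audit_tree(root: str, max_bytes: int = 1_000_000) -> list[tuple[str, int, str]]:
    """Walk every subdirectory under root, as a spider would, and return
    sorted (relative path, depth, reason) tuples for files that look sensitive.
    Only the first max_bytes of each file are examined."""
    findings = []
    root_path = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root_path)
            depth = len(rel.parts) - 1  # number of subdirectories above the file
            if name.lower() in SENSITIVE_NAMES:
                findings.append((rel.as_posix(), depth, "keyring file name"))
                continue
            try:
                with open(path, "rb") as f:
                    head = f.read(max_bytes)
            except OSError:
                continue  # skip files this process cannot read
            for marker in SENSITIVE_MARKERS:
                if marker in head:
                    findings.append((rel.as_posix(), depth, marker.decode()))
                    break
    return sorted(findings)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A file buried three subdirectories deep, like the letter described above.
        deep = Path(tmp, "pub", "old", "misc")
        deep.mkdir(parents=True)
        (deep / "notes.txt").write_bytes(b"-----BEGIN PGP PRIVATE KEY BLOCK-----\n...")
        (Path(tmp) / "index.html").write_text("<html></html>")
        for finding in audit_tree(tmp):
            print(finding)
    # prints ('pub/old/misc/notes.txt', 3, '-----BEGIN PGP PRIVATE KEY BLOCK-----')
```

Reporting the depth is deliberate: it shows how little a deep path protects a file once the tree is reachable at all.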

Second, what is out there in spiderspace is incredibly useful for building dossiers, for compiling correlations, and for doing competitive analyses.

Third, more and more kinds of files are going into spiderspace. This may include files compiled by others, such as files containing Web accesses! (All it takes is for someone to keep a record of site accesses, subscriptions, etc., and then put the record in a searchable place: it then becomes trivial to search on a name and find out interesting things.)

Fourth...left to your imagination.

--Tim May

We got computers, we're tapping phone lines, we know that that ain't allowed.


Timothy C. May | Crypto Anarchy: encryption, digital money,
408-728-0152 | anonymous networks, digital pseudonyms,
W.A.S.T.E.: Corralitos, CA | zero knowledge, reputations, information
Higher Power: 2^756839 - 1 | markets, black markets, collapse of governments.

"National borders aren't even speed bumps on the information superhighway."


Last Amended: 21 January 1996

These community service pages are a joint offering of the Australian National University (which provides the infrastructure), and Roger Clarke (who provides the content).
The Australian National University
Visiting Fellow, Faculty of
Engineering and Information Technology,
Information Sciences Building Room 211
Xamax Consultancy Pty Ltd, ACN: 002 360 456
78 Sidaway St
Chapman ACT 2611 AUSTRALIA
Tel: +61 6 288 6916 Fax: +61 6 288 1472