slashdot.txt
[10-12-2004]

 
[This page has been dropped from the nav and will probably go away soon]

Slashdot

In case you don't know, slashdot.org is a website geared towards what it terms "nerds" (it would appear that "nerd" is a slightly less pejorative word in Slashdot's country of origin than in mine). It announces news items on diverse subjects including Linux, Free Software, Science & Technology and Star Wars.

Slashdot Effect

If you've been hit by the Slashdot Effect, you'll know all about it. If not, some background is required.

Slashdot generally announces items simply with a hyperlink to the page, as nature and Tim Berners-Lee intended. The problems come when the combined Slashdot readership descends on an unprepared site - the result is often a hopelessly overloaded web server and a forest of complaints from both readers and admins.

After being slashdotted following the posting of three papers, Stephen Adler prepared a detailed analysis of his Apache webserver logs, entitled "The Slashdot Effect, An Analysis of Three Internet Publications".

robots.txt

The 'net comes across new hazards every so often, and generally evolves a way of working around them. An example that springs to mind is the robots problem. This became evident with the emergence of search engines that indexed the web by following hyperlinks from web page to web page. All too often, the robot could cause problems by not understanding the structure of the site it was indexing: saving temporary information for later retrieval, activating scripts with unfortunate side-effects or just getting lost in an infinite webspace created by a script.

To counter this problem, a Standard for Robot Exclusion has been proposed, and almost universally accepted. It specifies a way to keep robots out of web space where they may cause problems, using a file with a URL of http://www.some.website.com/robots.txt.

slashdot.txt

The format of the robots.txt file isn't particularly relevant to this discussion - it's the principle that it illustrates that is important. I propose an analogous file with the purpose of keeping away the vast increases in traffic that announcement sites like Slashdot generate. I'm sure that Rob won't mind if I give it a working name of slashdot.txt.

robots.txt provides a list of URLs that robots should not enter. I don't see this approach as sensible for slashdot.txt - the load on the web server is largely independent of the URL, unless the announcement site links directly to a particularly hairy CGI script.

Instead, the approach I propose is to offer an estimate of the number of additional hits the webserver can cope with. This estimate can be derived either from the hardware specification of the server or from the limitation on the bandwidth available to the server.
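
As a rough illustration of the bandwidth-based estimate, the arithmetic might look like the sketch below. The function name, the average response size and the fraction of the link assumed to be spare are all hypothetical values chosen for the example; none of them is part of the proposal itself.

```python
def hits_per_minute(bandwidth_bits_per_sec: float,
                    avg_response_bytes: float,
                    spare_fraction: float = 0.5) -> int:
    """Estimate how many additional hits per minute a link can carry.

    All three inputs are assumptions the site owner must supply:
    the link speed, the typical size of one response, and the share
    of the link that is not already used by normal traffic.
    """
    if avg_response_bytes <= 0:
        raise ValueError("avg_response_bytes must be positive")
    bytes_per_sec = bandwidth_bits_per_sec / 8
    spare_bytes_per_min = bytes_per_sec * spare_fraction * 60
    return int(spare_bytes_per_min // avg_response_bytes)


# Hypothetical example: a 128 kbit/s line, 20,000-byte pages, half the link free.
print(hits_per_minute(128_000, 20_000, 0.5))
```

An estimate based on the server hardware would need measurements rather than arithmetic, so it is not sketched here.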

It is then the responsibility of the announcement site to maintain an estimate of its readership. When it receives a request to post an item, it can compare this with the slashdot.txt for the site and decide if the site is capable of withstanding the assault that the announcement is likely to generate.
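
The comparison an announcement site would make can be sketched as follows. This is only one possible reading of the idea: the click-through rate and the length of the peak period are invented parameters, and the readership figure in the example is made up rather than a real Slashdot statistic.

```python
def expected_peak_hits_per_minute(readers: int,
                                  click_rate: float,
                                  peak_minutes: float) -> float:
    """Crude estimate of the traffic an announcement sends to a site.

    Assumes (hypothetically) that a fixed fraction of readers follow
    the link and that their visits are spread evenly over the peak.
    """
    if peak_minutes <= 0:
        raise ValueError("peak_minutes must be positive")
    return readers * click_rate / peak_minutes


def safe_to_link(readers: int, click_rate: float, peak_minutes: float,
                 site_hits_per_minute: int) -> bool:
    """Return True if the site's declared limit covers the expected load."""
    load = expected_peak_hits_per_minute(readers, click_rate, peak_minutes)
    return load <= site_hits_per_minute


# Hypothetical figures: 100,000 readers, 5% follow the link within an hour.
print(safe_to_link(100_000, 0.05, 60, site_hits_per_minute=60))
```

If the check fails, the announcement site could decline the item, delay it, or point at a mirror instead; the proposal leaves that choice open.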

The form I would expect it to take is:

Hits: 60                 # Per minute
No-Links-To: *.cgi       # Don't link directly to a cgi script, please.
No-Links-To: /hourly/    # Or anything that changes regularly
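
A minimal sketch of how an announcement site might read such a file is given below. Since the format is only a working draft, the matching rules are assumptions made for the example: text after "#" is treated as a comment, a pattern containing "*" is matched as a shell-style wildcard against the URL path, and any other pattern is treated as a path prefix. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_slashdot_txt(text: str) -> dict:
    """Parse the draft slashdot.txt format into a small policy dict."""
    policy = {"hits": None, "no_links_to": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line or ":" not in line:
            continue  # skip blank or unrecognised lines
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "hits":
            policy["hits"] = int(value)  # hits per minute
        elif field == "no-links-to":
            policy["no_links_to"].append(value)
    return policy


def link_allowed(path: str, policy: dict) -> bool:
    """Check a URL path against the No-Links-To patterns (assumed rules)."""
    for pattern in policy["no_links_to"]:
        if "*" in pattern:
            if fnmatchcase(path, pattern):  # wildcard pattern, e.g. *.cgi
                return False
        elif path.startswith(pattern):      # plain pattern: path prefix
            return False
    return True


sample = """Hits: 60# Per minute
No-Links-To: *.cgi# Don't link directly to a cgi script, please.
No-Links-To: /hourly/# Or anything that changes regularly
"""
policy = parse_slashdot_txt(sample)
print(policy["hits"], link_allowed("/index.html", policy))
```

The parser accepts the example with or without whitespace before the "#", so it does not depend on how the comments are laid out.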

Comments

This is very much a work in progress. If you have any thoughts or comments on this subject, please get in touch. The aim is to soften the blow of the Slashdot Effect. Keep the web working. ;-)

© 1998-2008 Iain Georgeson