1. Sitescooper README

This is sitescooper, a Perl script that you run on the computer your Palm Computing handheld organizer HotSyncs with. It retrieves news stories automatically from various news websites and converts them into Palm DOC, iSilo, RichReader or text format. It can also convert them into any other format for which you have a conversion program that accepts text or HTML input.
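The README doesn't say how sitescooper hands pages to an external converter. The general idea can be sketched, though: scooped HTML goes into the conversion program and the converted output comes back. The Python below is only an illustration, not sitescooper's own Perl code. It assumes a converter that reads HTML on standard input and writes its result to standard output. The function name `convert_with_external_tool` and the stand-in "converter" in the demo are both hypothetical.

```python
import subprocess
import sys


def convert_with_external_tool(html: str, command: list) -> bytes:
    """Pipe scooped HTML into an external converter and return its output.

    `command` is a hypothetical converter invocation; a real tool may instead
    expect input and output file names on its command line.
    """
    result = subprocess.run(
        command,
        input=html.encode("utf-8"),  # converter reads the page on stdin
        stdout=subprocess.PIPE,      # and writes the converted form to stdout
        check=True,                  # raise CalledProcessError if it fails
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "converter": a Python one-liner that upper-cases its input.
    fake_converter = [sys.executable, "-c",
                      "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(convert_with_external_tool("<p>hello</p>", fake_converter))
```

Any real converter that reads from a pipe can take the place of `fake_converter`. Tools that only work on files would need temporary files instead.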

(If you've just installed sitescooper, you probably don't want to read the blurb again, so go straight to the Installing page.)

Both HTTP URLs and local files (via file:/// URLs) are supported.
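A fetcher can treat a web page and a local file the same way as long as both are addressed by URL. The Python sketch below shows this with the standard library's `urllib.request.urlopen`, which handles both the `http(s)` and `file` schemes. It is an illustration of the idea, not how sitescooper itself fetches pages. The helper name `fetch` and the UTF-8 decoding are assumptions.

```python
import pathlib
import tempfile
import urllib.request


def fetch(url: str) -> str:
    """Return the body of an http://, https:// or file:/// URL as text."""
    # urllib's default opener includes handlers for the http(s) and file
    # schemes, so remote pages and local files share one code path.
    with urllib.request.urlopen(url) as response:
        # Assumes UTF-8; a real scraper should honour the page's declared charset.
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        page = pathlib.Path(folder) / "story.html"
        page.write_text("<p>A local story</p>", encoding="utf-8")
        print(fetch(page.as_uri()))  # <p>A local story</p>
```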

Multiple types of sites can be snarfed:

1-level sites, where the text to be converted is all present on one page (such as Slashdot, Linux Weekly News, BluesNews, NTKnow, Ars Technica);
2-level sites, where the text to be converted is linked to from a Table of Contents page (such as Wired News, BBC News, and I, Cringely);
3-level sites, where the text to be converted is linked to from a Table of Contents page, which in turn is linked to from a list of issues page (such as PalmPower or New Scientist).
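The three kinds of site differ only in how many pages of links sit between the starting page and the story text. The Python sketch below illustrates that idea with a simple level-by-level walk. It is not sitescooper's implementation, which is driven by per-site configuration files. The names `story_pages` and `find_links`, and the in-memory "site" used in the demo, are hypothetical.

```python
from typing import Callable, List

# A hypothetical function that returns the links found on a page.
LinkFinder = Callable[[str], List[str]]


def story_pages(start_url: str, levels: int, find_links: LinkFinder) -> List[str]:
    """Return the URLs holding story text for a 1-, 2- or 3-level site.

    levels=1: the start page itself is the story.
    levels=2: the start page is a table of contents linking to stories.
    levels=3: the start page lists issues, each linking to a contents page.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    current = [start_url]
    for _ in range(levels - 1):      # follow links once per extra level
        next_level: List[str] = []
        for url in current:
            for link in find_links(url):
                if link not in next_level:  # skip duplicate links
                    next_level.append(link)
        current = next_level
    return current


if __name__ == "__main__":
    site = {
        "issues": ["issue1", "issue2"],
        "issue1": ["story-a", "story-b"],
        "issue2": ["story-c"],
    }
    finder = lambda url: site.get(url, [])
    print(story_pages("issues", 3, finder))  # ['story-a', 'story-b', 'story-c']
    print(story_pages("issue1", 2, finder))  # ['story-a', 'story-b']
    print(story_pages("issues", 1, finder))  # ['issues']
```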

In addition, sites that post news as items on one big page, such as Slashdot, Ars Technica and BluesNews, are supported by using diff to pick out only the items that are new since the last run.
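The diff approach for one-big-page sites can be sketched like this. Compare the page as it was last time with the page as it is now, and keep only the parts that were added. The Python below uses the standard `difflib.SequenceMatcher` as a stand-in for whatever diff sitescooper actually uses. It treats each news item as one line of text, which is a simplifying assumption. The function name `new_items` is hypothetical.

```python
import difflib
from typing import List


def new_items(previous: List[str], current: List[str]) -> List[str]:
    """Return items in `current` that were added since `previous`.

    Items are compared whole, so an item that was edited counts as new.
    """
    matcher = difflib.SequenceMatcher(a=previous, b=current, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):  # text present only in the new copy
            added.extend(current[j1:j2])
    return added


if __name__ == "__main__":
    yesterday = ["Story B", "Story A"]
    today = ["Story C", "Story B", "Story A"]
    print(new_items(yesterday, today))  # ['Story C']
```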

It even trims out sidebar tables automatically, on the assumption that tables narrower than 30% of the average browser width are not part of the news story. Effectively, sitescooper is a transcoder for handheld PCs.
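Here is a rough sketch of the 30% sidebar heuristic. It looks at a table's declared `width` and drops the table if that width is under 30% of the browser width. The README doesn't say what "average browser width" sitescooper assumes, so the 640-pixel figure below is purely an assumption. This Python code is an illustration, not sitescooper's Perl. The regular expression handles only non-nested tables, and it leaves tables without a `width` attribute alone. The names `is_sidebar` and `strip_sidebars` are hypothetical.

```python
import re

# Assumed average browser width in pixels (not specified in the README).
ASSUMED_BROWSER_WIDTH = 640
SIDEBAR_THRESHOLD = 0.30  # tables narrower than 30% are treated as sidebars


def is_sidebar(width_attr: str, browser_width: int = ASSUMED_BROWSER_WIDTH) -> bool:
    """Decide from a <table width="..."> value whether the table is a sidebar."""
    value = width_attr.strip()
    if value.endswith("%"):
        fraction = float(value[:-1]) / 100     # percentage of the window
    else:
        fraction = float(value) / browser_width  # plain pixel width
    return fraction < SIDEBAR_THRESHOLD


# Matches a whole <table ...>...</table> that declares a width; the lazy
# .*? stops at the first </table>, so nested tables are not handled.
TABLE_RE = re.compile(
    r'<table\b[^>]*\bwidth\s*=\s*"?([\d.]+%?)"?[^>]*>.*?</table>',
    re.IGNORECASE | re.DOTALL,
)


def strip_sidebars(html: str) -> str:
    """Remove tables whose declared width marks them as sidebars."""
    return TABLE_RE.sub(
        lambda m: "" if is_sidebar(m.group(1)) else m.group(0), html
    )


if __name__ == "__main__":
    page = ('<table width="150"><tr><td>Ads</td></tr></table><p>Story</p>'
            '<table width="90%"><tr><td>Body</td></tr></table>')
    print(strip_sidebars(page))
```

With the assumed 640-pixel width, the 150-pixel table (about 23%) is removed. The 90% table is kept.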

The script should run easily on most UNIX variants that support Perl, as well as on the Win32 platform, even Windows 95 (tested with ActivePerl 5.00502 build 509). It has been reported to work on a Mac using MacPerl 5.1.9r4.

Output is supported in several formats, including Palm DOC, Plucker, iSilo, RichReader and plain text.

DOC format, Plucker format, and text are all free. RichReader is shareware, and iSilo has both shareware and free readers available.

You may ask, "why not just use AvantGo, 'lynx -dump' and 'makedoc', or some other web-page-downloading software?" Well, sitescooper has several advantages:

In short, it's pretty neat.

Pick up the latest version of sitescooper at the following URL:

http://sitescooper.org/

Sitescooper is distributed under the GNU GPL, and as such is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The full text of the GPL is available on the GPL page.

Next, follow the links below to the next section, Installing.


[ README ]
[ Installing ]|[ on UNIX ]|[ on Windows ]|[ on a Mac ]
[ Running ]|[ Command-line Arguments Reference ]
[ Writing a Site File ]|[ Site File Parameters Reference ]
[ The rss-to-site Conversion Tool ]|[ The subs-to-site Conversion Tool ]
[ Contributing ]|[ GPL ]|[ Home Page ]