Sitescooper

Read over 400 news websites on your Palm handheld

The Blurb

Sitescooper automatically retrieves stories from news websites, trims off extraneous HTML, and converts them into formats you can read later on your Palm handheld while on the move. It maintains a cache, so it skips stories you've already read. It can handle 1-page sites, 1-page sites with diffing, 2-level sites, and 3-level sites, and it's very easy to add a new site to its list.
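
To make the "avoid stories you've already read" idea concrete, here is a minimal sketch in Python. It is not sitescooper's own code or cache format; the `SeenCache` class and `filter_new` helper are hypothetical names invented for illustration, and the cache lives only in memory.

```python
import hashlib


class SeenCache:
    """Hypothetical in-memory cache of stories that were already delivered."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url, text):
        # Key on URL plus content, so an updated story counts as new again.
        key = hashlib.sha1((url + "\n" + text).encode("utf-8")).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def filter_new(stories, cache):
    """Return only the (url, text) pairs the cache has not seen before."""
    return [(url, text) for url, text in stories if cache.is_new(url, text)]
```

Running `filter_new` twice over the same list of stories returns everything the first time and nothing the second time. A real tool would also need to persist the cache between runs.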

Even if you don't have a Palm handheld, it's still handy for simple website-to-text conversion and offline HTML reading. For example, here are some screenshots of an iPaq displaying sitescooper output.

The output formats supported by sitescooper are as follows:

plain text
HTML
Plucker, a free, HTML-based format for Palm handhelds
iSilo, an HTML-based format for the Palm Computing organizers from DC and Co. Free and shareware versions of the viewer are available.
DOC format, as used by AportisDoc, TealDoc, CSpotRun, etc. Again, free and shareware viewers are available.
RichReader, an RTF-based format that preserves text formatting.
Any other format that converts from text or HTML, using the -pipe functionality.

Included in the bundle are site files for Slashdot, NTKnow, BluesNews, Linux Weekly News, Wired News, BBC News, TBTF, Hacker News Network, Robot Wisdom weblog, Memepool, Jakob Nielsen's Alertbox, Ars Technica, I, Cringely, Kernel Traffic, Linux Today, comp.risks, and over 300 more.

The latest released version is 3.1.2.

HTTP and local files, using the file:/// protocol, are both supported, and it works fine on most UNIX platforms, Windows 95, 98 and NT, and Macs.

The web-retrieval logic can handle a wide variety of formats (1-page sites, 1-page sites with diffing, 2-level sites, and 3-level sites). It trims out sidebar tables and search forms automatically, and can deliver the output as one big page with all the articles and a table of contents, multiple pages and a TOC, or just all the pages in one long list. Effectively, sitescooper acts as a transcoder for handheld PCs.
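
As a rough illustration of what "1-page with diffing" means, the sketch below compares the previously cached copy of a page with the freshly fetched one and keeps only the lines that were added or changed. This is an assumption about the general technique, written in Python with the standard `difflib` module, not a description of how sitescooper implements it; `new_lines` is a hypothetical helper name.

```python
import difflib


def new_lines(old_text, new_text):
    """Return the lines of new_text that were inserted or changed since old_text."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" spans are the parts the reader has not seen yet.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added
```

For a weblog-style page where new entries are added at the top, this yields just the new entries and drops everything that was already present in the cached copy.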

It's easily extensible to add your own sites, and can use My-Netscape-style RSS files to find the articles on a given site.
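
Here is a small sketch of how article links could be pulled out of an RSS file. It uses Python's standard `xml.etree.ElementTree` and assumes a feed whose stories are `item` elements with `title` and `link` children; the `article_links` function is a hypothetical name, and this is not sitescooper's own parser.

```python
import xml.etree.ElementTree as ET


def article_links(rss_xml):
    """Return a list of (title, link) pairs for each item in an RSS document."""
    root = ET.fromstring(rss_xml)
    links = []
    for elem in root.iter():
        # Strip any XML namespace so "item" matches in plain and RDF-style feeds.
        if elem.tag.rsplit("}", 1)[-1] != "item":
            continue
        title = link = None
        for child in elem:
            name = child.tag.rsplit("}", 1)[-1]
            if name == "title":
                title = (child.text or "").strip()
            elif name == "link":
                link = (child.text or "").strip()
        if link:
            links.append((title or "", link))
    return links
```

Each returned link would then be fetched and converted like any other story page. Items without a `link` element are skipped.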

In short, it's neat.

To check out the kind of output it produces, here's a quick demo:

Slashdot, using the default "all in one big page" output style.
Slashdot, in the "one page per story" style.

(Note: if you tried to access this site as http://sitescooper.tsx.org/ and got a "URL not found" error, my apologies; it's because I've deleted that forwarding URL. When I started work on sitescooper, tsx.org was a reputable forwarding service; when I checked http://sitescooper.tsx.org/ today, it gave me 2 uncloseable ad windows advertising a variety of porn sites, and another 3 ad windows on top of that. This is not the kind of thing I want sitescooper to be associated with, so I'd rather delete the forwarding URL than lend it my implied support.)