Read over 400 news websites on your Palm handheld
Sitescooper automatically retrieves the stories from several news websites, trims off extraneous HTML, and converts them into formats you can read later on your Palm computing device while on the move. It maintains a cache, and will avoid stories you've already read. It can handle 1-page sites, 1-page with diffing, 2-level and 3-level sites, and it's very easy to add a new site to its list.
Even if you don't have a Palm handheld, it's still handy for simple website-to-text conversion, and offline HTML reading. For example, here are some screenshots of an iPaq displaying sitescooper output.
The output formats supported by sitescooper are as follows:
iSilo, an HTML-based format for the Palm Computing organizers from DC and Co. Free and shareware versions of the viewer are available.
RichReader, an RTF-based format with formatting.
Included in the bundle are site files for Slashdot, NTKnow, BluesNews, Linux Weekly News, Wired News, BBC News, TBTF, Hacker News Network, Robot Wisdom weblog, Memepool, Jakob Nielsen's Alertbox, Ars Technica, I, Cringely, Kernel Traffic, Linux Today, comp.risks, and over 300 more.
The latest released version is 3.1.2.
HTTP and local files, using the file:/// protocol, are both supported, and it works fine on most UNIX platforms, Windows 95, 98 and NT, and Macs.
The web-retrieval logic can handle a wide variety of formats (1-page sites, 1-page sites with diffing, 2-level sites, and 3-level sites). It trims out sidebar tables and search forms automatically, and can deliver the output as one big page with all the articles and a table of contents, multiple pages and a TOC, or just all the pages in one long list. Effectively, sitescooper acts as a transcoder for handheld PCs.
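To give a feel for what "1-page with diffing" means, here is a minimal sketch in Python. It is not sitescooper's own code; the function name new_lines and the line-by-line comparison are assumptions made purely for illustration. The idea is to compare the freshly fetched page against the cached copy and keep only the lines that were not there before.

```python
import difflib
from typing import List


def new_lines(cached_text: str, fresh_text: str) -> List[str]:
    """Return the lines of fresh_text that are not in the cached copy.

    Hypothetical helper for illustration only: a line-based diff of the
    cached page against the newly fetched page.
    """
    old = cached_text.splitlines()
    new = fresh_text.splitlines()
    added: List[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean the fresh page has lines
        # that the cached page did not have at this position.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


if __name__ == "__main__":
    cached = "Story A\nStory B"
    fresh = "Story C\nStory A\nStory B"
    print(new_lines(cached, fresh))
```

A line-based diff like this suits sites that add new stories to a single page; for 2-level and 3-level sites, a cache of already-visited article URLs would be the more natural fit.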
Sitescooper is easy to extend with your own sites, and can use My-Netscape-style RSS files to find the articles on a given site.
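As a rough illustration of finding articles through an RSS file, the sketch below pulls the title and link of each item out of a feed using Python's standard library. It is not how sitescooper itself is implemented; the function name article_links is hypothetical, and the code assumes a plain, un-namespaced RSS 0.91-style feed (RDF-based feeds with XML namespaces would need extra handling).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def article_links(rss_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS document.

    Hypothetical helper for illustration only; assumes un-namespaced
    <item>, <title> and <link> elements.
    """
    root = ET.fromstring(rss_text)
    links: List[Tuple[str, str]] = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # skip items that carry no URL
            links.append((title, link))
    return links


if __name__ == "__main__":
    sample = (
        "<rss version='0.91'><channel><title>Example</title>"
        "<item><title>One</title><link>http://example.com/1</link></item>"
        "</channel></rss>"
    )
    for title, link in article_links(sample):
        print(title, link)
```

Each link found this way is a candidate story page to fetch, trim and convert.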
In short, it's neat.
To check out the kind of output it produces, here's a quick demo:
Slashdot, in the "one page per story" style.
(Note: if you tried to access this site as http://sitescooper.tsx.org/ and got a "URL not found" error, my apologies; it's because I've deleted that forwarding URL. When I started work on sitescooper, tsx.org was a reputable forwarding service; when I checked http://sitescooper.tsx.org/ today, it provided me with 2 uncloseable ad windows, advertising a variety of porn sites, and another 3 ad windows on top of that. This is not the kind of thing I want sitescooper to be associated with, so I'd rather delete the forwarding URL than give it my implied support.)