Web scraping papunet.net
I completed this weekend’s project of updating my web scraping script for papunet.net’s new site layout. The script is available under GPLv3 on GitHub. The new version is better at finding the right content, so it can now scrape a couple hundred more images than before. I use this script to get the image data for Viito, so an update is now pending that makes these new images available in the app as well.
As a suggestion for future improvement to the folks at papunet.net: it would be nice to have a more programmable API for accessing the image data. I wouldn’t mind being able to download the images as a zip file, either.