scrapelib 1.0.0
Overview
scrapelib is a library for making requests to websites, particularly those that may be less-than-reliable.
scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures, and as a result it was designed with features that are desirable when dealing with sites that have intermittent errors or require rate-limiting.
As of version 0.7, scrapelib has been retooled to take advantage of the superb requests library.
Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:
- all of the power of the superb requests library
- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- request throttling
- configurable retries for non-permanent site failures
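
To make the last two items concrete, here is a minimal standard-library sketch of what request throttling and retries for non-permanent failures can look like. It is not scrapelib's implementation and does not use scrapelib's API: the class `ThrottledRetrier`, its parameter names, the doubling wait between retries, and the choice of `ConnectionError` as the "temporary" failure are all assumptions made for illustration.

```python
import time


class ThrottledRetrier:
    """Hypothetical helper (not part of scrapelib): spaces out calls and
    retries failures that are assumed to be temporary."""

    def __init__(self, requests_per_minute=60, retry_attempts=2,
                 retry_wait_seconds=5.0, sleep=time.sleep,
                 clock=time.monotonic):
        # Minimum number of seconds that must separate two requests.
        self.min_interval = 60.0 / requests_per_minute
        self.retry_attempts = retry_attempts
        self.retry_wait_seconds = retry_wait_seconds
        # sleep and clock are injectable so the behaviour can be tested
        # without actually waiting.
        self._sleep = sleep
        self._clock = clock
        self._last_call = None

    def _throttle(self):
        # Wait until at least min_interval has passed since the last request.
        if self._last_call is not None:
            wait = self.min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()

    def call(self, fetch, url):
        # One initial attempt plus up to retry_attempts retries.
        for attempt in range(self.retry_attempts + 1):
            self._throttle()
            try:
                return fetch(url)
            except ConnectionError:
                # Assumption: ConnectionError stands in for a non-permanent
                # site failure; any other exception propagates immediately.
                if attempt == self.retry_attempts:
                    raise
                # Assumption: the wait doubles after each failed attempt.
                self._sleep(self.retry_wait_seconds * (2 ** attempt))
```

Here `fetch` is any callable that takes a URL and returns a response, so the sketch can wrap whichever HTTP client is in use. A library that bundles this behaviour saves every scraper from reimplementing it.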