Cache overview

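To cache responses, assign a MemoryCache, FileCache, or SQLiteCache instance to the cache_storage attribute of a scrapelib.Scraper. Setting cache_write_only to False makes the scraper read responses back from the cache instead of only writing them to it: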

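from scrapelib import Scraper
from scrapelib.cache import FileCache

# store responses as files under the directory 'cache-directory'
cache = FileCache('cache-directory')
scraper = Scraper()
scraper.cache_storage = cache
# also read cached responses back, not just write them
scraper.cache_write_only = False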
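The lookup flow that cache_write_only controls can be pictured with the simplified sketch below. It is not scrapelib's own code: CacheStorage and cached_fetch are hypothetical names, and the sketch assumes that a write-only scraper skips get() but still calls set() after each request. It also ignores any filtering of which responses are worth storing.

```python
from typing import Callable, Optional, Protocol


class CacheStorage(Protocol):
    """Hypothetical protocol mirroring the get/set methods documented below."""

    def get(self, key: str) -> Optional[object]: ...

    def set(self, key: str, response: object) -> None: ...


def cached_fetch(cache: CacheStorage, key: str,
                 fetch: Callable[[], object], write_only: bool) -> object:
    """Hypothetical helper: return a cached response or fetch and store one."""
    # Only consult the cache when reading from it is allowed.
    if not write_only:
        hit = cache.get(key)
        if hit is not None:
            return hit
    # Cache miss, or write-only mode: perform the real request, then store it.
    response = fetch()
    cache.set(key, response)
    return response
```

With write_only=False, a second call for the same key returns the stored object without calling fetch again; with write_only=True, every call performs the request and refreshes the stored entry.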

MemoryCache object

class scrapelib.cache.MemoryCache

In-memory cache for request responses.

get(key)

Return the cache entry for key, or None if no entry exists.

set(key, response)

Store the contents of response as the cache entry for key.
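The get/set contract above amounts to a keyed store that returns None on a miss. The following is a minimal sketch of that contract using a plain dict; DictCache is a hypothetical name, and this is not claimed to be how MemoryCache is implemented.

```python
from typing import Dict, Generic, Optional, TypeVar

R = TypeVar("R")


class DictCache(Generic[R]):
    """Hypothetical stand-in with the same get/set shape as MemoryCache."""

    def __init__(self) -> None:
        self._entries: Dict[str, R] = {}

    def get(self, key: str) -> Optional[R]:
        # dict.get returns None for a missing key, matching "or None".
        return self._entries.get(key)

    def set(self, key: str, response: R) -> None:
        # A later set for the same key overwrites the earlier entry.
        self._entries[key] = response
```

Because entries live only in process memory, a cache like this is empty again every time the scraper starts.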

FileCache object

class scrapelib.cache.FileCache(cache_dir, check_last_modified=False)

File-based cache for request responses.

Parameters:
  • cache_dir – directory for storing responses
  • check_last_modified – set to True to compare the Last-Modified timestamp of the cached response with the value returned by a HEAD request

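The check_last_modified comparison can be illustrated with a small staleness test. This is a sketch under stated assumptions, not scrapelib's logic: is_stale and _header are hypothetical names, headers are plain mappings looked up case-insensitively, and an entry counts as stale only when the HEAD response reports a strictly newer date. When either timestamp is missing, this sketch keeps the cached entry, which is a design choice of the example.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive, so compare lowercased names.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def is_stale(cached_headers: Mapping[str, str],
             head_headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if the server reports a newer Last-Modified."""
    cached = _header(cached_headers, "Last-Modified")
    current = _header(head_headers, "Last-Modified")
    if cached is None or current is None:
        # Nothing to compare; this sketch keeps the cached entry.
        return False
    # HTTP dates are in GMT, so both parse to timezone-aware datetimes.
    return parsedate_to_datetime(current) > parsedate_to_datetime(cached)
```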
get(orig_key)

Return the cache entry for orig_key, or None if no entry exists.

set(key, response)

Store the contents of response as the cache entry for key.
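A file-based cache has to turn each key (typically a URL) into a safe filename. The sketch below does that by hashing the key and storing one JSON file per entry. TinyFileCache is a hypothetical name, and both the hashing scheme and the assumption that a response is a JSON-serializable dict are illustrative; they are not FileCache's actual on-disk format.

```python
import hashlib
import json
import os
from typing import Optional


class TinyFileCache:
    """Hypothetical file-per-key cache; not scrapelib's on-disk format."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary URLs always map to a valid filename.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name + ".json")

    def get(self, key: str) -> Optional[dict]:
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            # No file for this key means a cache miss.
            return None

    def set(self, key: str, response: dict) -> None:
        # response is assumed to be a JSON-serializable dict such as
        # {"status": 200, "headers": {...}, "body": "..."}.
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(response, f)
```

Unlike an in-memory cache, entries written this way survive between runs, so a new instance pointed at the same directory sees earlier responses.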

SQLiteCache object

class scrapelib.cache.SQLiteCache(cache_path, check_last_modified=False)

SQLite cache for request responses.

Parameters:
  • cache_path – path for SQLite database file
  • check_last_modified – set to True to compare the Last-Modified timestamp of the cached response with the value returned by a HEAD request

clear()

Remove all records from the cache.

get(key)

Return the cache entry for key, or None if no entry exists.

set(key, response)

Store the contents of response as the cache entry for key.
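The clear/get/set operations map naturally onto a single SQLite table keyed by the cache key. The sketch below uses Python's standard sqlite3 module; TinySQLiteCache and its one-table schema are hypothetical and are not SQLiteCache's actual schema. Responses are again assumed to be JSON-serializable dicts.

```python
import json
import sqlite3
from typing import Optional


class TinySQLiteCache:
    """Hypothetical single-table cache; the schema is an assumption."""

    def __init__(self, cache_path: str) -> None:
        self._conn = sqlite3.connect(cache_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, data TEXT)"
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT data FROM cache WHERE key = ?", (key,)
        ).fetchone()
        # fetchone() returns None when no row matches the key.
        return None if row is None else json.loads(row[0])

    def set(self, key: str, response: dict) -> None:
        # INSERT OR REPLACE overwrites any existing row with the same key.
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (key, data) VALUES (?, ?)",
            (key, json.dumps(response)),
        )
        self._conn.commit()

    def clear(self) -> None:
        # Delete every row but keep the table for later writes.
        self._conn.execute("DELETE FROM cache")
        self._conn.commit()
```

Passing ":memory:" as cache_path gives a throwaway database, which is convenient for tests; a file path keeps the cache across runs.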