scrapeshell

During development it is often useful to open an interactive shell and tinker with a page. The HTML a scraper receives is frequently slightly out of sync with what the browser shows, and these differences are hard to detect without starting an interactive Python shell and inspecting what the request actually returns.

If scrapelib is installed and on your path, it provides scrapeshell, an entry point that opens an IPython shell. The shell presents an instance of requests.Response holding the contents of the scraped page and, if lxml is installed, an lxml.html.HtmlElement instance as well.
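As a rough illustration of the kind of inspection this enables, the sketch below parses a page body with lxml when it is available and falls back to None otherwise. It is not scrapelib's own code: the helper name parse_if_possible and the sample_html string (a stand-in for the text of a real response) are assumptions made for the example.

```python
# A minimal sketch, not scrapelib's implementation.
# lxml is treated as an optional dependency, as described above.
try:
    import lxml.html
except ImportError:
    lxml = None


def parse_if_possible(html_text):
    """Return an lxml.html.HtmlElement if lxml is installed, else None.

    parse_if_possible is a hypothetical helper used only for illustration.
    """
    if lxml is None:
        return None
    return lxml.html.fromstring(html_text)


# Stand-in for the body of a fetched page (assumed sample data).
sample_html = (
    "<html><head><title>Example</title></head>"
    "<body><a href='/next'>next</a></body></html>"
)

doc = parse_if_possible(sample_html)
if doc is not None:
    # Typical tinkering: pull out every link target with XPath.
    links = doc.xpath("//a/@href")
    print(links)
```

Inside a real session the same XPath queries can be run directly against the parsed element that the shell provides, which makes it easy to compare the fetched markup with what the browser renders.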

scrapeshell arguments

url

scrapeshell requires a URL, which it retrieves via a get() call.

--ua user_agent

Set a custom user agent string. This is useful for checking whether a site returns different results depending on the user agent.

--noredirect

Don’t follow redirects.
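To make the effect of these arguments concrete, here is a hedged sketch of how a command line like this could be parsed with argparse and turned into keyword arguments for a requests-style get() call. The function names build_parser and request_kwargs, the example URL, and the "my-bot/1.0" agent string are assumptions for illustration. The actual scrapeshell internals may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the three documented arguments (a sketch)."""
    parser = argparse.ArgumentParser(prog="scrapeshell")
    parser.add_argument("url", help="URL to retrieve")
    parser.add_argument("--ua", dest="user_agent", default=None,
                        help="custom user agent string")
    # store_false means redirects are followed unless --noredirect is given.
    parser.add_argument("--noredirect", dest="redirects",
                        action="store_false",
                        help="don't follow redirects")
    return parser


def request_kwargs(args):
    """Translate parsed arguments into keyword arguments for a get() call."""
    headers = {}
    if args.user_agent:
        headers["User-Agent"] = args.user_agent
    return {"headers": headers, "allow_redirects": args.redirects}


# Equivalent to: scrapeshell http://example.com --ua my-bot/1.0 --noredirect
args = build_parser().parse_args(
    ["http://example.com", "--ua", "my-bot/1.0", "--noredirect"]
)
kwargs = request_kwargs(args)
print(args.url, kwargs)
```

With requests installed, the resulting dictionary could be passed along as requests.get(args.url, **kwargs), since headers and allow_redirects are both accepted by that function.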