Scrapes customer reviews via the Crawlbase Crawling API, parses structured data (rating, text, date, verified, helpful votes), and writes JSONL. Supports pagination and the Enterprise Crawler for bulk scale.
Companion code for the blog post *How to Scrape Customer Reviews*.
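Under the hood, a Crawling API fetch is a single GET against the API endpoint with your token and the target URL. A minimal stdlib-only sketch (the function name mirrors fetcher.py's `fetch_page()`, but this is an illustration, not the repo's implementation):

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.crawlbase.com/"

def fetch_page(url: str, timeout: int = 60) -> str:
    """Fetch rendered HTML for `url` through the Crawlbase Crawling API (sketch)."""
    token = os.environ["CRAWLBASE_TOKEN"]
    # The API takes the token and the target URL as query parameters;
    # urlencode handles percent-encoding of the target URL.
    qs = urlencode({"token": token, "url": url})
    with urlopen(f"{API_BASE}?{qs}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The real fetcher.py adds retries and timeouts from config.py on top of this.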
```bash
python3 -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
export CRAWLBASE_TOKEN=your_js_token  # use a JS token for review sites
```

```bash
# Scrape Trustpilot reviews (default: 10 pages max)
python main.py https://www.trustpilot.com/review/example.com

# Custom output and page limit
python main.py https://www.trustpilot.com/review/example.com -o reviews.jsonl --max-pages 5

# For infinite-scroll pages (e.g. Amazon)
python main.py <url> --scroll --page-wait 3000
```

- config.py — Env-based config (token, API base, timeouts, retries).
- models.py — `Review` TypedDict schema.
- fetcher.py — Crawlbase Crawling API client; `fetch_page()`, `fetch_page_enterprise_crawler()`.
- pagination.py — `build_page_urls()`, `add_page_to_url()`.
- storage.py — `append_review()`, `append_reviews()` for JSONL.
- parsers/base.py — `ReviewParser` abstract base.
- parsers/trustpilot.py — Trustpilot implementation.
- main.py — CLI; orchestrates fetch → parse → storage.
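The pagination helpers can be pictured as a small sketch, assuming Trustpilot-style `?page=N` query parameters (the repo's actual `build_page_urls()`/`add_page_to_url()` may differ):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def add_page_to_url(url: str, page: int) -> str:
    """Return `url` with a ?page=N query parameter set (illustrative sketch)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]  # overwrite any existing page parameter
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def build_page_urls(base_url: str, max_pages: int) -> list[str]:
    """Page 1 is the base URL itself; later pages get an explicit ?page=N."""
    return [base_url] + [add_page_to_url(base_url, p) for p in range(2, max_pages + 1)]
```

Keeping page 1 as the bare URL matches how most review sites canonicalize their first page.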
Output is JSONL: one JSON object per line.
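An appender in the spirit of storage.py's `append_review()` can be sketched as follows (field names are assumptions based on the parsed fields listed above, not the repo's exact schema):

```python
import json

def append_review(review: dict, path: str = "reviews.jsonl") -> None:
    # One JSON object per line; append mode so repeated runs accumulate.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(review, ensure_ascii=False) + "\n")

append_review({
    "rating": 5,
    "text": "Great service, fast shipping.",
    "date": "2024-01-15",
    "verified": True,
    "helpful_votes": 3,
})
```

Because each record is a self-contained line, the output streams cleanly into tools like `jq` or `pandas.read_json(..., lines=True)`.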