
Review Scraper (blog example)

Scrapes customer reviews via the Crawlbase Crawling API, parses structured data (rating, text, date, verified, helpful votes), and writes JSONL. Supports pagination and the Enterprise Crawler for bulk scale.

Matches the blog post How to Scrape Customer Reviews.

Setup

python3 -m venv .venv && source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
export CRAWLBASE_TOKEN=your_js_token   # Use JS token for review sites

Run

# Scrape Trustpilot reviews (default: 10 pages max)
python main.py https://www.trustpilot.com/review/example.com

# Custom output and page limit
python main.py https://www.trustpilot.com/review/example.com -o reviews.jsonl --max-pages 5

# For infinite-scroll pages (e.g. Amazon)
python main.py <url> --scroll --page-wait 3000

Layout

  • config.py — Env-based config (token, API base, timeouts, retries).
  • models.py — Review TypedDict schema.
  • fetcher.py — Crawlbase Crawling API client; fetch_page(), fetch_page_enterprise_crawler().
  • pagination.py — build_page_urls(), add_page_to_url().
  • storage.py — append_review(), append_reviews() for JSONL.
  • parsers/base.py — ReviewParser abstract base.
  • parsers/trustpilot.py — Trustpilot implementation.
  • main.py — CLI; orchestrates fetch → parse → storage.
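The field names below come from the README's own description (rating, text, date, verified, helpful votes); the exact schema in models.py and the helper signatures in storage.py may differ. A minimal sketch of the Review record and its JSONL writer:

```python
import json
from typing import TypedDict

class Review(TypedDict):
    """Illustrative review schema based on the fields the README lists."""
    rating: float
    text: str
    date: str
    verified: bool
    helpful_votes: int

def append_reviews(path: str, reviews: list[Review]) -> None:
    """Append reviews to a JSONL file: one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for review in reviews:
            f.write(json.dumps(review, ensure_ascii=False) + "\n")
```

Appending (rather than rewriting) the file means each paginated fetch can persist its batch immediately, so a crash mid-run loses at most one page of reviews.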

Output is JSONL: one JSON object per line.
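Because each line is an independent JSON object, the output can be consumed lazily without loading the whole file. A sketch of a reader (illustrative; not part of the repo):

```python
import json
from typing import Any, Iterator

def read_reviews(path: str) -> Iterator[dict[str, Any]]:
    """Yield one parsed review dict per JSONL line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```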
