Scrapes customer reviews via the Crawlbase Crawling API, parses structured data (rating, text, date, verified, helpful votes), and writes JSONL. Supports pagination and the Enterprise Crawler for bulk scale.
Companion code for the blog post *How to Scrape Customer Reviews*.
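Under the hood, a Crawling API fetch is a single GET against the API endpoint with your token and the target URL. A minimal stdlib-only sketch (the function name mirrors fetcher.py's `fetch_page()`, but this is an illustration, not the repo's implementation):

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.crawlbase.com/"

def fetch_page(url: str, timeout: int = 60) -> str:
    """Fetch rendered HTML for `url` through the Crawlbase Crawling API (sketch)."""
    token = os.environ["CRAWLBASE_TOKEN"]
    # The API takes the token and the target URL as query parameters;
    # urlencode handles percent-encoding of the target URL.
    qs = urlencode({"token": token, "url": url})
    with urlopen(f"{API_BASE}?{qs}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The real fetcher.py adds retries and timeouts from config.py on top of this.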
```bash
python3 -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
export CRAWLBASE_TOKEN=your_js_token  # use a JS token for review sites
```

```bash
# Scrape Trustpilot reviews (default: 10 pages max)
python main.py https://www.trustpilot.com/review/example.com

# Custom output and page limit
python main.py https://www.trustpilot.com/review/example.com -o reviews.jsonl --max-pages 5

# For infinite-scroll pages (e.g. Amazon)
python main.py <url> --scroll --page-wait 3000
```

- config.py — Env-based config (token, API base, timeouts, retries).
- models.py — `Review` TypedDict schema.
- fetcher.py — Crawlbase Crawling API client; `fetch_page()`, `fetch_page_enterprise_crawler()`.
- pagination.py — `build_page_urls()`, `add_page_to_url()`.
- storage.py — `append_review()`, `append_reviews()` for JSONL.
- parsers/base.py — `ReviewParser` abstract base.
- parsers/trustpilot.py — Trustpilot implementation.
- main.py — CLI; orchestrates fetch → parse → storage.
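The pagination helpers can be pictured as a small sketch, assuming Trustpilot-style `?page=N` query parameters (the repo's actual `build_page_urls()`/`add_page_to_url()` may differ):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def add_page_to_url(url: str, page: int) -> str:
    """Return `url` with a ?page=N query parameter set (illustrative sketch)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]  # overwrite any existing page parameter
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def build_page_urls(base_url: str, max_pages: int) -> list[str]:
    """Page 1 is the base URL itself; later pages get an explicit ?page=N."""
    return [base_url] + [add_page_to_url(base_url, p) for p in range(2, max_pages + 1)]
```

Keeping page 1 as the bare URL matches how most review sites canonicalize their first page.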
Output is JSONL: one JSON object per line.
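An appender in the spirit of storage.py's `append_review()` can be sketched as follows (field names are assumptions based on the parsed fields listed above, not the repo's exact schema):

```python
import json

def append_review(review: dict, path: str = "reviews.jsonl") -> None:
    # One JSON object per line; append mode so repeated runs accumulate.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(review, ensure_ascii=False) + "\n")

append_review({
    "rating": 5,
    "text": "Great service, fast shipping.",
    "date": "2024-01-15",
    "verified": True,
    "helpful_votes": 3,
})
```

Because each record is a self-contained line, the output streams cleanly into tools like `jq` or `pandas.read_json(..., lines=True)`.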