Releases: spider-rs/spider

v2.48.13 — CLI authenticate for Spider Cloud

31 Mar 21:58

What's New

spider authenticate command — Store your Spider Cloud API key locally for remote crawls.

Usage

# Authenticate (stores key in ~/.spider/credentials)
spider authenticate sk-your-key
spider auth  # alias, interactive prompt

# Crawl via Spider Cloud (key auto-loaded)
spider crawl -u https://example.com -o

# Choose cloud mode
spider crawl -u https://example.com --spider-cloud-mode smart -o
spider crawl -u https://example.com --spider-cloud-mode unblocker -o

# Remote headless browser via Spider Browser Cloud
spider crawl -u https://example.com --spider-cloud-browser --headless -o

Key Resolution Order

  1. --spider-cloud-key flag
  2. SPIDER_CLOUD_API_KEY env var
  3. Stored credentials from spider authenticate
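
The precedence above can be sketched as a pure function (illustrative only, not the actual CLI code; here `env` stands in for `std::env::var("SPIDER_CLOUD_API_KEY")` and `stored` for the key read from ~/.spider/credentials):

```rust
// Resolve the API key: flag beats env var beats stored credentials.
fn resolve_key(flag: Option<&str>, env: Option<&str>, stored: Option<&str>) -> Option<String> {
    flag.or(env).or(stored).map(str::to_owned)
}
```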

Cloud Modes

Mode       Description
proxy      Transparent proxy (default)
api        POST /crawl API
unblocker  Anti-bot bypass
fallback   Direct fetch first, falls back on 403/429/503
smart      Auto-detects bot protection
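
A minimal sketch of the fallback trigger, assuming the statuses in the table are the only ones that cause a retry through Spider Cloud (function name is illustrative):

```rust
// Fallback mode: retry via Spider Cloud only on these status codes.
fn should_fallback(status: u16) -> bool {
    matches!(status, 403 | 429 | 503)
}
```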

Sign up at https://spider.cloud for an API key.

v2.48.4

25 Mar 21:01

New spider_mcp crate — MCP server for Spider.

cargo install spider_mcp

Setup (Claude Code ~/.claude/settings.json or Claude Desktop config):

{ "mcpServers": { "spider": { "command": "spider-mcp" } } }

Usage examples:

Scrape a page:       "Fetch https://example.com as markdown"
Crawl a site:        "Crawl https://example.com up to 5 pages"
Extract links:       "Get all links from https://example.com"
Transform HTML:      "Convert this HTML to markdown: <h1>Hello</h1>"

Tools: spider_scrape · spider_crawl · spider_links · spider_transform

v2.48.2 — Parallel Crawl Backends

25 Mar 16:23

Race alternative browser engines alongside your primary crawl. Best HTML wins.

use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig};
use spider::website::Website;

let mut website = Website::new("https://example.com");
website.configuration.parallel_backends = Some(ParallelBackendsConfig {
    backends: vec![BackendEndpoint {
        engine: BackendEngine::LightPanda,
        endpoint: Some("ws://127.0.0.1:9222".to_string()),
        binary_path: None,
        protocol: None,
    }],
    ..Default::default()
});
website.crawl().await;
// page.backend_source → "primary" | "lightpanda" | "servo" | "custom"

Features: lightpanda, servo, parallel_backends_full, custom backends via BackendEngine::Custom.

Zero overhead when disabled; anti-fingerprint jitter; auto-disable on failures; custom quality validator; full network-interception parity with Chrome.

v2.47.75

20 Mar 20:55

What's New

  • PageData & Crawler trait abstractions for extensible crawl pipelines
  • Proxy support for LLM HTTP requests (#378)
  • Chrome remote_addr via CDP Network.responseReceived
  • Remote cache for Chrome responses — dump & fallback support

Performance

  • SIMD-accelerated byte scanning (memchr), unrolled FNV hash
  • Trie: Box<str> keys + manual byte-walk + memchr dot scan
  • Bloom filter bitmask addressing + inline early-exit
  • Zero-alloc DNS cache hits via Arc<[SocketAddr]>
  • Skip robots.txt for single-page crawls, TCP keepalive always
  • Batch LRU cache eviction to reduce write-lock hold time
  • cache_mem auto-enables skip_browser for Chrome mode
  • Remove unnecessary Box allocations and redundant string conversions
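
For reference, the FNV hash mentioned above in its scalar FNV-1a form (the release's unrolled variant processes several bytes per iteration; the constants are the standard 64-bit offset basis and prime):

```rust
// FNV-1a, 64-bit: xor each byte into the state, then multiply by the prime.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100_0000_01b3); // FNV prime
    }
    hash
}
```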

Fixes

  • Fall back to HTTP crawl when Chrome is unavailable (#373)
  • Skip redirect-caused net::ERR_ABORTED instead of aborting navigation
  • Deterministic CDP event listener shutdown via watch channel
  • Remote cache fallback + 3s timeout for chrome_remote_cache path
  • Add decentralized stubs for set_url_parsed_direct and links_full

Deps

  • strum 0.28, phf 0.13, gemini-rust 1.7, chromey 2, llm_models_spider 0.1

Full Changelog: v2.47.52...v2.47.75

v2.47.51

19 Mar 15:07

  • NUMA thread pinning for multi-socket servers (numa feature)
  • zerocopy wire parsing for HTTP status lines, cache headers, DNS records (zero_copy feature)
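
A minimal illustration of the zero-copy idea behind the zero_copy feature (this assumes nothing about spider's actual parser): the returned slices borrow from the input buffer, so parsing a status line allocates nothing.

```rust
// Parse "HTTP/1.1 200 OK" into (version, status) without copying bytes.
fn parse_status_line(line: &[u8]) -> Option<(&[u8], u16)> {
    let mut parts = line.splitn(3, |&b| b == b' ');
    let version = parts.next()?;            // borrows from `line`
    let code = parts.next()?;
    let code = std::str::from_utf8(code).ok()?.parse::<u16>().ok()?;
    Some((version, code))
}
```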

v2.47.50

19 Mar 14:26

Zero-copy page passing (bytes::Bytes), mmap+hugepages bloom filter for URL dedup (bloom feature).
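
The release uses bytes::Bytes for this; a std-only sketch of the same idea with Arc<[u8]> (illustrative, not spider's API): cloning the handle copies a pointer, not the page body.

```rust
use std::sync::Arc;

// Turn an owned page body into a cheaply shareable, immutable buffer.
fn share_page(body: Vec<u8>) -> (Arc<[u8]>, Arc<[u8]>) {
    let page: Arc<[u8]> = body.into();
    let reader = Arc::clone(&page); // pointer copy, not a byte copy
    (page, reader)
}
```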

v2.47.24

15 Mar 17:55

io_uring TCP connect + lightweight background runtime

  • io_uring TCP connect: Socket + Connect opcodes for kernel-async TCP connects via the existing uring
    worker
  • Lightweight background runtime: Drops from multi-thread to current-thread tokio executor when
    io_uring is active
  • Public API: uring_fs::tcp_connect(addr), uring_fs::is_uring_enabled()
  • CI fixes: clippy unnecessary_cast, io_other_error, cargo fmt

Full Changelog: v2.47.23...v2.47.24

v2.45.28

02 Mar 15:28

Agent Hardening

  • Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage)
  • Add js_escape() for safe JS string interpolation in action handlers
  • Wrap Navigate and screenshot calls with timeouts
  • Use PageWaitStrategy::Load for WaitForNavigation instead of fixed sleep
  • Replace eval_with_timeout for Fill/Type/Clear actions with error propagation
  • Improve semaphore and logging diagnostics on error paths
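
A hypothetical sketch of what a js_escape() helper might cover (the actual implementation may escape a different character set): the goal is that interpolated text can never break out of a JS string literal.

```rust
// Escape characters that could terminate a JS string or a <script> block.
fn js_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\'' => out.push_str("\\'"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '<' => out.push_str("\\x3c"), // avoid "</script>" injection
            c => out.push(c),
        }
    }
    out
}
```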

v2.45.24

21 Feb 12:54

What's New

Performance

  • Cache-first fast path — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s)
  • Deferred Chrome — process multi-page crawls from cache before launching a browser
  • Work-stealing (hedged requests) — parallel retry for slow crawl requests
  • io_uring — StreamingWriter for high-throughput file I/O on Linux
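
The hedged-request idea can be sketched with std threads (spider's implementation is async; the names and timings here are illustrative): if the primary attempt misses a deadline, a second attempt is launched and whichever finishes first wins.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run `attempt`; if it is still running after `hedge_after`, race a second copy.
fn hedged<F>(attempt: F, hedge_after: Duration) -> String
where
    F: Fn() -> String + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    let primary = attempt.clone();
    thread::spawn(move || {
        let _ = tx.send(primary());
    });
    if let Ok(v) = rx.recv_timeout(hedge_after) {
        return v; // primary beat the hedge deadline
    }
    // Primary is slow: launch the hedge; the first sender wins.
    thread::spawn(move || {
        let _ = tx2.send(attempt());
    });
    rx.recv().expect("both attempts dropped")
}
```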

Agent

  • Per-round model pool routing — route cheap rounds to fast models, complex rounds to capable ones
  • Comprehensive router tests — 30+ multi-LLM reliability tests

Fixes

  • Chrome mode honors wait_for config (e.g. idle_network0) before HTML extraction
  • Smart mode lifecycle waiting matches Chrome coverage without re-fetch
  • Auto-retry www. URLs on SSL protocol errors
  • Reject empty HTML from all cache paths
  • Default no_sandbox() for headless Chrome (#354)

CLI

  • --wait-for flag for page load strategies (#352)

Housekeeping

  • Clippy warnings and formatting cleanup across workspace
  • Rewritten README, CONTRIBUTING.md, CHANGELOG backfill
  • Issue/PR templates, security policy, CI workflows (lint, audit, release)

v2.45.20

05 Feb 21:04

What's New

Relevance Gate for Remote Multimodal Crawling

Added a relevance_gate config that instructs the LLM to return a "relevant": true|false field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.

New config fields:

  • relevance_gate: bool — enables the feature
  • relevance_prompt: Option<String> — optional custom relevance criteria

How it works:

  1. When enabled, the system prompt instructs the LLM to include "relevant": true|false
  2. If the model returns false, a budget credit is atomically accumulated
  3. Credits are drained in the crawl loop to restore the wildcard budget
  4. Default fallback is true (assume relevant) if the model omits the field
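
Steps 2 and 3 can be sketched with a plain atomic counter (names are illustrative, not spider's internals): refunds accumulate lock-free, and the crawl loop drains them in one swap.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Credits refunded by irrelevant pages, drained back into the wildcard budget.
struct RelevanceCredits(AtomicUsize);

impl RelevanceCredits {
    fn refund(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    fn drain(&self) -> usize {
        self.0.swap(0, Ordering::Relaxed) // take all credits atomically
    }
}
```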

Example:

let cfgs = RemoteMultimodalConfigs::new(api_url, model)
    .with_relevance_gate(Some("Only pages about Rust programming".into()));

Full Changelog

  • feat(agent): add relevance_gate and relevance_prompt to RemoteMultimodalConfig
  • feat(agent): add atomic relevance_credits counter to RemoteMultimodalConfigs
  • feat(agent): add relevant: Option<bool> to AutomationResult and AutomationResults
  • feat(agent): extend system prompt and extraction with relevance gate instructions
  • feat(spider): add restore_wildcard_budget() for budget refund
  • feat(spider): drain relevance credits in crawl loop dequeue