Drop legacy tools. Benchmarks, not vibes. Curated from 18,000+ GitHub repos · 11 research agents · March 2026
You're probably using 5–10 tools right now that were cutting-edge in 2015. This list tells you what actually replaced them — with proof.
Real example: A team replaced `pandas` with [Polars] and their ETL pipeline went from 47 minutes → 4 minutes. Same code, same machine. Just a different library.
Another one: Cloudflare replaced Nginx with [Pingora] → freed 60% of all proxy CPU globally. That's thousands of servers worth of compute from one library swap.
And this one: A Rust API replaced `serde_json` with [sonic-rs] → its P99 dropped from 38ms → 19ms at zero infrastructure cost.
Every entry here has a story like that. Let's go.
| Drop this | Use this instead | Real gain |
|---|---|---|
| `serde_json` | [sonic-rs] | P99 cut in half · 1.5–2× parse speed |
| Nginx | [Pingora] | −60% CPU · −70% RAM at Cloudflare scale |
| Apache Kafka | [Iggy.rs] | 10× throughput · sub-ms P99 · no JVM |
| Elasticsearch (logs) | [OpenObserve] | 140× less storage · S3 backend |
| `pandas` | [Polars] or [DuckDB] | 47 min → 4 min ETL, real case |
| `pip + virtualenv + poetry` | [uv] | 100× faster · one Rust binary |
| `flake8 + black + isort` | [Ruff] | 300K lines linted in 0.3s |
| `npm + webpack` | [Bun] | 25× faster install · 4× faster startup |
| Rollup + esbuild | [Rolldown] | 25× faster · unified Rust · Vite 8 default |
| Postgres + Redis + Mongo | [SurrealDB] | one binary · graph + vector + SQL |
| Pinecone | [Qdrant] | 4× faster filtered search · self-hostable |
| Prometheus (at scale) | [VictoriaMetrics] | 7× compression · no cardinality limits |
| Parquet (ML/AI) | [Lance] | 100× faster random access (not a typo) |
| Apache Spark | [DataFusion] | SIGMOD 2024 winner · embeds in 1 binary |
| SHA-256 | [BLAKE3] | 4–10× faster · parallelizable |
| GNU ld linker | [mold] | 8× faster link time · 3 lines to adopt |
| Category | Quick pick |
|---|---|
| 💥 AHA Moments | Before/after stories |
| 🔥 JSON Parsing | sonic-rs (Rust) · simdjson (C++) · yyjson (C) |
| 📦 Serialization | Cap'n Proto (RPC) · rkyv (Rust) · FlatBuffers (IoT) |
| 🗄️ Query & Columnar | DuckDB · DataFusion · Lance · Polars |
| 🛢️ Databases — Multi-Model & Specialized | SurrealDB · TigerBeetle · Turso · Neon · ClickHouse |
| 🧠 Vector Databases | Qdrant · LanceDB |
| ⚡ Async & I/O | Monoio (Linux 3×) · tokio-quiche (HTTP/3) |
| 🌐 HTTP | Pingora (proxy) · Hono (edge API) |
| 📨 Message Queues | Iggy.rs · Redpanda · Aeron |
| 🗃️ KV Stores | Fjall · SpeedB · LMDB |
| 🔍 Full-Text Search | Tantivy · TypeSense |
| 🧠 Memory Allocators | mimalloc · snmalloc |
| #️⃣ Hash Functions | rapidhash · BLAKE3 · AHash |
| 🗜️ Compression | LZ4 · zstd · Brotli |
| 🔎 Regex | Hyperscan · RE2 · Teddy |
| 🔐 Cryptography | aws-lc-rs · BLAKE3 |
| 🐍 Python Toolchain | uv · Ruff · Maturin |
| ⚡ JS/TS Runtime & Build | Bun · Vite · Biome |
| 🔗 Linkers & Build Speed | mold · lld |
| 📊 Observability & Metrics | VictoriaMetrics · Pyroscope · OpenObserve |
| 🖥️ Frontend Gamechangers | Astro · Svelte 5 · SolidJS |
| 🎯 Decision Matrix | Agent-ready lookup |
| 🏆 The 2026 Stack | Full recommended stack |
| 📊 Emerging Tools | Watch list |
These are the "wait, WHAT?" moments that make this list worth bookmarking.
A data team replaced pandas with [Polars] for a nightly pipeline processing 8GB of CSVs.
- Before: 47 minutes, 12GB RAM peak, regularly OOM-killed
- After: 4 minutes, 3.2GB RAM peak, never crashes
- Change: Swap `import pandas as pd` → `import polars as pl`, adapt method names
The reason: Polars uses lazy evaluation (never loads what it doesn't need) + SIMD multi-threaded parsing. Pandas loads everything eagerly into a single-threaded numpy array.
A SaaS company ran user-supplied search patterns through PCRE. An attacker sent `(a+)+b`.
- Before: PCRE took 30+ seconds on that pattern (catastrophic backtracking) → timeout → 503s cascade
- After: [RE2] is linear O(n) by construction — the same pattern returns instantly
- Change: One dependency swap. Zero API change.
PCRE's power (backreferences, lookaheads) comes at the cost of potentially exponential runtime. If you process any user-supplied regex → RE2 is not optional, it's security.
Cloudflare rewrote their proxy from Nginx (C) to [Pingora] (Rust).
- Before Pingora: ~X servers handling global traffic
- After Pingora: Same traffic, −60% CPU, −70% RAM → thousands of servers freed
- Root cause: Nginx uses isolated worker processes with no connection sharing; Pingora uses async Rust with shared connection pools → 999× fewer TCP connections for the same throughput
This is why Rust rewrites happen. Not for fun — for economics. Those freed servers cost $millions/year in data center costs.
[Cap'n Proto]'s wire format is the memory format. There is no deserialization step.
- Before (Protobuf): Send bytes → parse into struct → access fields
- After (Cap'n Proto): Send bytes → read fields directly from wire buffer
- Use case: A trading system cut inter-service latency from 180µs → 12µs by dropping Protobuf
How? The bytes you received off the network are the object. You don't copy them. You don't parse them. You just... read them. It's the most mind-bending performance trick in this entire list.
A startup was running 5-node Kafka + 3-node ZooKeeper for their event pipeline.
- Before: 8 servers, 3 engineers to operate, ~$8K/month infra
- After [Iggy.rs]: 1 binary, 1 server, ~$200/month — 10× the throughput
- The math: Kafka's JVM needs ~6GB RAM per broker. Iggy.rs: ~200MB for the same throughput.
You probably have SHA-256 somewhere — file integrity, content addressing, deduplication. [BLAKE3] is 4–10× faster for payloads >4KB, and it's parallelizable (SHA-256 is sequential by design).
- Before: Hashing 1TB of files took 40 minutes
- After: 4–8 minutes, all cores utilized
- Change: `sha256sum` → `b3sum`. 5 seconds to install.
BLAKE3 was designed from scratch in 2020 by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn — the team behind BLAKE2 and Bao. It's newer, faster, and cryptographically sound. The only reason to use SHA-256 in 2026 is compliance requirements.
Mozilla + Cloudflare measured (2024): 58% of heap allocations in production Rust APIs come from JSON deserialization.
- `serde_json` builds an intermediate `Value` tree → allocates for every string, array, object
- [sonic-rs] parses directly into your target struct → most allocations eliminated
- Real result: A Rust microservice dropped from 1.2GB idle RAM → 480MB
sonic-rs Rust 2.5K ⭐ · ByteDance
Replaces `serde_json` → 1.5–2× faster · 50%+ less heap

```toml
[dependencies]
sonic-rs = "0.3"
```

| Benchmark | sonic-rs | simd-json | serde_json |
|---|---|---|---|
| Twitter parse | 796 µs | 1,061 µs | ~3,000 µs |
| Canada parse | 3.9 ms | 5.2 ms | — |
Why faster: Parses directly into your struct — no "tape" intermediate representation, no intermediate allocation. SIMD on x86-64 + aarch64 natively. Battle-tested at ByteDance (TikTok backend).
Why NOT to use serde_json for hot paths:
- Builds a `serde_json::Value` tree → every field = 1–3 allocations
- Single-threaded by design
- No SIMD → misses AVX2/AVX-512 entirely
When NOT to use sonic-rs: Non-SIMD targets, or when you need serde_json's full Value API for dynamic JSON exploration.
Upgrade path:
```rust
// Before
let v: MyStruct = serde_json::from_str(&s)?;

// After — identical API
let v: MyStruct = sonic_rs::from_str(&s)?;
```

simdjson C++

Replaces `RapidJSON`, `nlohmann/json` → 6 GB/s minification · industry standard
- v4.3 (Mar 2025) — +30% via LoongArch LSX/LASX SIMD
- Used by: Node.js · ClickHouse · Apache Doris · Milvus · Meta Velox · Shopify · Intel
- Two APIs: DOM (full parse) + On Demand (reads only fields you access — faster for large docs)
Why NOT nlohmann/json: Header-only convenience, but 10–20× slower than simdjson. Fine for small configs; catastrophic for hot paths.
Why NOT RapidJSON: Faster than nlohmann but 3–5× slower than simdjson, and the API is significantly more awkward.
When NOT to use simdjson: Zero C++ toolchain dependency (e.g. cross-compilation to bare metal) → use [yyjson] instead.
yyjson C 4.2K ⭐ · benchmark suite
Replaces `cJSON`, `jansson` → 2.3× faster stringify · zero dependencies
| Benchmark | yyjson | simdjson | RapidJSON |
|---|---|---|---|
| Twitter parse | 1.72 GB/s | 1.52 GB/s | 0.38 GB/s |
| Stringify | 1.42 GB/s | 0.61 GB/s | 0.32 GB/s |
Zero dependencies. No C++ compiler. Single .c file. Embedded/IoT-ready. The stringify win (2.3×) is decisive for write-heavy workloads like log shipping.
Why NOT cJSON: 10–50× slower, no SIMD, single-threaded. Fine for configs < 1KB, not for APIs.
Why NOT jansson: Reference-counted allocations add constant overhead. 8–15× slower than yyjson.
💡 Zero-alloc JSON: 58% of heap allocations in production Rust APIs come from JSON deserialization (Mozilla/Cloudflare 2024). For truly zero-alloc → json-zero or Lite³.
Cap'n Proto Multi · spec · vs FlatBuffers/Protobuf
Replaces `Protobuf` → 0 ns deserialization · built-in RPC
Wire format = memory format. Reading a Cap'n Proto message = reading directly from the network buffer. No copy, no parse.
| Framework | Zero-Copy | Deser Speed | Built-in RPC | Wire Size |
|---|---|---|---|---|
| Cap'n Proto | ✅ | 0 ns | ✅ | Medium |
| [FlatBuffers] | ✅ | Near-zero | ❌ | Large |
| Protobuf | ❌ | ~200 ns/field | ❌ | Smallest |
| MessagePack | ❌ | Medium | ❌ | Small |
Why NOT Protobuf for latency-critical RPC:
- Every `proto.Unmarshal()` = allocations + a copy of every field
- At 10K RPC/s → millions of unnecessary allocations/second
- GC pressure in Go/Java → unpredictable P99 spikes
Use cases: inter-process comms · trading systems · game servers · anything with <1ms latency budget.
When NOT to use: Smallest possible wire size (Protobuf wins) · simple config files (overkill) · cross-language without code-gen tooling.
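The read path in Rust, as a minimal sketch with the capnp crate. The schema and the `order_capnp` module it would generate via capnpc are hypothetical; the point is that there is no decode pass:

```rust
use capnp::message::ReaderOptions;
use capnp::serialize;

// Hypothetical schema, compiled by capnpc into an `order_capnp` module:
//   struct Order { id @0 :UInt64; qty @1 :UInt32; }
fn read_order(mut bytes: &[u8]) -> capnp::Result<u64> {
    // The reader is a view over `bytes` exactly as received — no copy, no parse.
    let msg = serialize::read_message_from_flat_slice(&mut bytes, ReaderOptions::new())?;
    let order = msg.get_root::<order_capnp::order::Reader>()?;
    // Field access = a bounds-checked pointer read into the original buffer.
    Ok(order.get_id())
}
```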
FlatBuffers Multi · Google · benchmarks
Replaces Protobuf for IoT/mobile → 60–80% CPU + size reduction
Per-field offset table = lazy loading. Only reads the fields you access. If your message has 50 fields and you only read 3, you pay for 3.
Use cases: mobile apps (battery matters) · IoT sensors · game state sync · any bandwidth/CPU-constrained environment.
Why NOT Protobuf for IoT: Protobuf deserializes the entire message even if you only need one field. On an ESP32 or Raspberry Pi, that's the difference between working and not.
rkyv Rust 5.5K ⭐ · RustConf 2024 talk
Replaces `serde` for internal serialization → 200% throughput · 65% less memory
Serialize once, access forever. The archived representation is the struct — you access fields via zero-copy references into the byte buffer.
Use cases: caching layers · IPC between Rust processes · mmap-backed data stores · anywhere you serialize to store and read back many times.
When NOT to use: Cross-language communication (use Protobuf/Cap'n Proto) · human-readable formats · APIs with schema evolution requirements.
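A minimal sketch against the rkyv 0.7-style API (0.8 renamed these entry points), showing the serialize-once / access-in-place pattern:

```rust
use rkyv::{Archive, Deserialize, Serialize};

#[derive(Archive, Serialize, Deserialize)]
struct Event {
    id: u64,
    tags: Vec<String>,
}

fn main() {
    let event = Event { id: 42, tags: vec!["login".into()] };

    // Serialize once (256 = scratch-space hint for the default serializer)...
    let bytes = rkyv::to_bytes::<_, 256>(&event).unwrap();

    // ...access forever: `archived` is a view straight into `bytes`, so there
    // is no parse step on the read path. For untrusted input, use the
    // validating `check_archived_root` instead of this unchecked accessor.
    // SAFETY: `bytes` was produced by rkyv just above.
    let archived = unsafe { rkyv::archived_root::<Event>(&bytes[..]) };
    assert_eq!(archived.id, 42);
}
```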
Apache Arrow DataFusion Rust 8.5K ⭐ · SIGMOD 2024 paper
Replaces `Apache Spark` → fastest Parquet parser (SIGMOD 2024) · embeds in a single binary · 6 GB/s IPC
Embeds directly in your service. No cluster. No JVM. No ZooKeeper. No YARN. Just a Rust/Python library.
Why NOT Spark:
- Spark requires a JVM cluster (YARN/K8s) to run — minimum ~5 nodes for production
- Startup overhead: 30–90 seconds per job just for JVM warm-up
- GC pressure → unpredictable P99s on large shuffles
- For anything < 10TB, DataFusion is faster and requires zero infrastructure
Why NOT pandas for SQL-style queries: pandas loads entire dataset into RAM eagerly. DataFusion is push-based + vectorized — processes data as a stream without materializing intermediate results.
Use cases: embedded analytics · data lake query engines · ETL services · replacing Spark for datasets < ~5TB.
```python
import datafusion

ctx = datafusion.SessionContext()
ctx.register_parquet("logs", "s3://bucket/logs/*.parquet")
df = ctx.sql("SELECT user_id, count(*) FROM logs WHERE ts > now() - INTERVAL '7 days' GROUP BY 1")
```

Lance Rust

Replaces `Parquet` for ML/AI → 100× faster random access
Parquet is column-sequential — optimized for full column scans. Lance is optimized for random access patterns, which is exactly what ML training does (random mini-batch sampling from 10M rows).
The AHA moment: Training a model on 10M embedding vectors stored in Parquet: reading a 256-sample batch = scanning through gigabytes to find 256 scattered rows. In Lance: direct page-level lookup. 100× is not marketing — it's physics.
Why NOT Parquet for ML: Parquet was designed for analytics workloads (scan entire columns). ML workloads need random row access. Wrong data structure for the job.
Use cases: embedding stores · ML training datasets · multi-modal data (images + metadata) · vector databases (LanceDB is built on Lance).
DuckDB C++

Replaces `pandas` · `SQLite` (analytics) · `Spark` (small–medium) → 5–10× · zero infrastructure

```python
import duckdb

# Reads directly from S3, no download
duckdb.sql("SELECT count(*) FROM 's3://bucket/events/*.parquet' WHERE event = 'purchase'")
```

- Multi-hypothesis CSV parsing (tests delimiters in parallel — no more `sep=','` guessing)
- Reads a 10GB CSV in ~2s on an M4 Max; pandas takes ~45s on the same file
- Embeds in Python, Rust, Go, Node, Java — no server process
Why NOT SQLite for analytics: SQLite is row-oriented (great for OLTP). DuckDB is column-oriented (great for analytics). Filtering 1 column on 10M rows: SQLite reads all columns anyway. DuckDB reads only the column you need.
Why NOT pandas for SQL users: If you're writing df[df['col'] > x].groupby(...), you're re-inventing SQL badly. DuckDB does it in 1 line with a full SQL planner + vectorized execution.
Why NOT Spark: For <5TB, Spark's cluster overhead (startup, shuffle, GC) makes it slower than DuckDB on a single node.
Use cases: ad-hoc analytics on files · data pipelines without infra · replacing Jupyter+pandas workflows · log analysis · anywhere you want SQL on files.
Polars Rust 37.8K ⭐ · docs · benchmarks
Replaces `pandas` → 5–10× faster · lazy evaluation · SIMD multi-threaded
```python
# The pandas way (eager, single-threaded, slow)
df = pd.read_csv("huge.csv")
result = df[df['revenue'] > 1000].groupby('country')['revenue'].sum()

# The Polars way (lazy, multi-threaded, fast)
result = pl.scan_csv("huge.csv").filter(pl.col('revenue') > 1000).group_by('country').agg(pl.col('revenue').sum()).collect()
```

Why NOT pandas:
- Single-threaded by design (GIL + numpy)
- Eager evaluation = materializes every intermediate result
- Copies data on almost every operation
- `SettingWithCopyWarning` will haunt you
Rule: Use DuckDB when you think in SQL · Use Polars when you think in DataFrames.
Monoio Rust

Replaces `Tokio` on Linux throughput workloads → 3× faster on 16 cores
| Cores | vs Tokio | vs NGINX |
|---|---|---|
| 4 | +2× | — |
| 16 | +3× | +20% |
| RPC throughput | +26% | — |
How: Thread-per-core model + io_uring native. Each CPU core has its own event loop, no cross-thread work stealing overhead. Tokio uses a work-stealing scheduler (great for mixed workloads, overhead for pure I/O).
Why NOT vanilla Tokio for Linux I/O servers: Tokio uses epoll under the hood. io_uring batches syscalls — 1 syscall for N I/O operations vs 1 syscall per operation. At high RPS, syscall overhead becomes the bottleneck.
When NOT to use: Cross-platform (macOS/Windows) · mixed CPU+I/O workloads · ecosystem compatibility (most async Rust libraries are Tokio-specific).
| Feature | io_uring | epoll |
|---|---|---|
| Syscall overhead | Batch (1 for N ops) | 1 per op |
| 0-syscall polling | ✅ (busy-wait mode) | ❌ |
| Max IOPS | Millions | ~Hundreds of thousands |
| Fixed buffers | ✅ (zero-copy) | ❌ |
| Portability | Linux 5.1+ only | Everywhere |
| Production maturity | Growing fast | Decades |
Reference: io_uring deep dive by Jens Axboe (author)
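What thread-per-core code looks like — a sketch close to Monoio's own echo example. Note the io_uring-style buffer ownership: the buffer is moved into each I/O call and handed back with the result:

```rust
use monoio::io::{AsyncReadRent, AsyncWriteRentExt};
use monoio::net::{TcpListener, TcpStream};

async fn echo(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf: Vec<u8> = Vec::with_capacity(8 * 1024);
    let mut res;
    loop {
        // Ownership of `buf` moves into the kernel op and comes back with the result.
        (res, buf) = stream.read(buf).await;
        if res? == 0 {
            return Ok(());
        }
        (res, buf) = stream.write_all(buf).await;
        res?;
        buf.clear();
    }
}

#[monoio::main]
async fn main() {
    // One reactor per core, no work stealing: run one instance per CPU.
    let listener = TcpListener::bind("0.0.0.0:8080").unwrap();
    loop {
        if let Ok((stream, _addr)) = listener.accept().await {
            monoio::spawn(echo(stream));
        }
    }
}
```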
tokio-quiche Rust · Cloudflare · InfoQ writeup
Replaces `Quinn` → 2× faster · production HTTP/3 at scale
Cloudflare's battle-hardened quiche QUIC implementation + Tokio async runtime. Handles millions of HTTP/3 connections in production.
Why HTTP/3 at all: HTTP/2 over TCP has head-of-line blocking — one lost packet stalls all streams. HTTP/3 over QUIC multiplexes independently — one lost packet stalls only its stream. On mobile networks: dramatic latency improvement.
Why Quinn doesn't cut it for production: Quinn is pure-Rust correctness-first. tokio-quiche is Cloudflare-hardened performance-first. For anything beyond experimentation, the production battle-testing matters.
Pingora Rust

Replaces `Nginx` → −60% CPU · −70% RAM · 1 trillion req/day
| Metric | Pingora | Nginx |
|---|---|---|
| Global scale | 1T req/day | — |
| CPU usage | −60% | baseline |
| RAM usage | −70% | baseline |
| P50 latency | 5ms improvement | baseline |
| Connection reuse | Shared pool across workers | Per-worker (no sharing) |
The reason Nginx wastes memory: Nginx spawns a process per worker, each maintaining its own connection pool. Pingora uses async Rust — all workers share one connection pool. For a busy server making 10M upstream connections/day, Nginx creates each connection fresh. Pingora reuses them → 999× fewer TCP handshakes observed in Cloudflare's data.
Why NOT Nginx in 2026:
- Written in C → memory unsafety (CVEs year after year)
- Process-per-worker → no connection pool sharing
- Configuration is a DSL that can't express complex routing logic
- No async Rust → limited to epoll
Why NOT HAProxy: Better than Nginx, still C, still no connection sharing at the depth Pingora achieves.
Use cases: CDN edge · API gateway · TLS termination · any reverse proxy at scale.
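A minimal gateway sketch along the lines of Pingora's quickstart — treat the details as indicative (the API moves between releases), and the upstream address is a placeholder:

```rust
use async_trait::async_trait;
use pingora::prelude::*;

struct Gateway;

#[async_trait]
impl ProxyHttp for Gateway {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut (),
    ) -> Result<Box<HttpPeer>> {
        // Placeholder upstream; all worker threads share one pool of
        // connections to it — the source of the connection-reuse win above.
        let peer = HttpPeer::new(("10.0.0.1", 443), true, "upstream.internal".to_string());
        Ok(Box::new(peer))
    }
}

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();
    let mut proxy = http_proxy_service(&server.configuration, Gateway);
    proxy.add_tcp("0.0.0.0:6188");
    server.add_service(proxy);
    server.run_forever();
}
```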
Hono TypeScript 21K ⭐ · docs · benchmarks
Replaces `Express`, `Fastify` → 402K req/s · 14KB bundle · runs everywhere
Same code. Every runtime.
```ts
import { Hono } from 'hono'

const app = new Hono()
app.get('/api/users', (c) => c.json({ users: [] }))

export default app // Works on: CF Workers · Bun · Deno · Node · Lambda
```

Why NOT Express in 2026:
- 28K req/s vs Hono's 402K — 14× slower
- No TypeScript support built-in
- No native edge runtime support
- Bundle size: 200KB+ vs Hono's 14KB
- Middleware ecosystem is mostly unmaintained
Why NOT Fastify: Faster than Express (~78K req/s) but still single-runtime (Node.js only). No Cloudflare Workers, no Bun-native optimizations.
Use cases: Cloudflare Workers APIs · Bun-based backends · serverless functions · any TypeScript API that needs to run on multiple runtimes.
Iggy.rs Rust

Replaces `Apache Kafka` → 5M+ msg/s · sub-ms P99 · single binary
| Metric | Iggy.rs | Apache Kafka |
|---|---|---|
| Throughput | 5M+ msg/s | ~500K msg/s |
| P99 Latency | <1 ms | ~300 ms |
| Memory | ~200MB | ~6GB per broker |
| Operations | 1 binary | ZooKeeper + JVM + 3+ brokers |
| Monthly infra cost (est.) | ~$200 | ~$8,000 |
Why NOT Kafka in 2026 (unless you're Netflix):
- JVM startup: 30–90s cold start
- ZooKeeper dependency (being replaced by KRaft, but still complex)
- Minimum viable cluster: 3 brokers + 3 ZooKeeper nodes = 6 servers
- P99 of 300ms means your consumers always lag on bursts
- GC pauses = unpredictable producer ACKs
The startup math: If you're spending $8K/month on Kafka for <500K msg/s, Iggy.rs does 10× more on a $200/month server. That's $94K/year in savings.
Redpanda C++ 10K ⭐ · docs · benchmarks
Drop-in Kafka API compatible → 70× better P99 · no JVM · no ZooKeeper
Use when: existing Kafka tooling (connectors, Schema Registry, Kafka Streams) must be preserved, but Kafka's performance and operational complexity are killing you.
Why better P99 than Kafka: No GC. C++ with deterministic memory management. Redpanda's thread-per-core model eliminates lock contention across cores.
Migration: Change bootstrap server URL. That's it. All Kafka clients work unchanged.
Aeron C++/Java · latency numbers
29µs P99 · 4.7M msg/s · kernel-bypass UDP · HFT gold standard
Not a Kafka replacement — a fundamentally different class of tool.
When: sub-100µs is a hard requirement (HFT, financial exchanges, real-time auctions, multiplayer game servers).
Why the latency is so low: Kernel-bypass networking (DPDK/RDMA) — messages never enter the OS kernel. Direct NIC-to-userspace DMA. 29µs P99 includes the full round trip.
Why NOT for normal services: Requires kernel-bypass network setup, specialized hardware, and significant operational knowledge. Complete overkill unless microseconds literally cost money.
Fjall Rust

Replaces `RocksDB` for Rust → pure Rust · zero C++ FFI · <2.2MB
Why NOT RocksDB in Rust:
- RocksDB is C++ → requires bindgen + C++ toolchain + FFI overhead
- The `rocksdb` crate compiles for 5–10 minutes (downloads and compiles all of RocksDB)
- The C++ allocator cannot be replaced with mimalloc without a custom build
- Binary size: RocksDB adds ~30MB. Fjall adds ~2.2MB.
Why Fjall: Pure Rust LSM tree. No FFI. Compiles fast. Tiny binary. ACID transactions. v3.0 released Jan 2026 with production-ready stability.
When NOT to use: Need RocksDB's massive ecosystem of tooling, language bindings, and battle-tested production years → use [SpeedB] instead.
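A minimal sketch of the Fjall API (one keyspace, partitions as independent LSM trees); method names follow the project README, so treat them as indicative:

```rust
use fjall::{Config, PartitionCreateOptions};

fn main() -> Result<(), fjall::Error> {
    // One keyspace on disk; each partition is an independent LSM tree.
    let keyspace = Config::new("./data").open()?;
    let items = keyspace.open_partition("items", PartitionCreateOptions::default())?;

    items.insert("user:1", "alice")?;
    if let Some(value) = items.get("user:1")? {
        assert_eq!(&*value, &b"alice"[..]);
    }
    Ok(())
}
```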
SpeedB C++

Replaces `RocksDB` (100% API compatible) → −80% write amplification · 10× P99
Write amplification explained: Every 1 byte written to RocksDB may write 10–30 bytes to disk (compaction). SpeedB's improved compaction algorithm reduces this by 80% → your SSD lives 5× longer and write P99s are dramatically more consistent.
Migration: Change `#include "rocksdb/db.h"` → `#include "speedb/db.h"`. That's it. Same config, same API, same file format.
LMDB C

567ms / 1M reads — fastest bulk reads · zero-copy MMAP · heed for Rust
MMAP-based B-tree. The OS page cache is the read cache — no double-buffering. Zero-copy reads: you get pointers directly into the mmap'd file.
Use cases: read-heavy lookups (embedding stores, config DBs, DNS caches) · anywhere you read >> write · Meilisearch uses LMDB for its indexes.
When NOT to use: High write throughput (B-tree vs LSM: LSM wins for writes, B-tree wins for reads).
Tantivy Rust

Replaces `Elasticsearch`, `Lucene` → 6.5× faster · 56% less RAM · embeds in your binary
| Metric | Tantivy | Elasticsearch |
|---|---|---|
| Query time (1M docs) | 0.8 ms | 5.2 ms |
| Indexing speed | 45K docs/s | 12K docs/s |
| RAM (1M docs) | 2.1 GB | 4.8 GB |
| Deployment | 1 library | JVM cluster |
Why NOT Elasticsearch in 2026:
- Requires JVM → minimum 2–4GB heap just to start
- Cluster mode = 3+ nodes for HA
- GC pauses → unpredictable P99 spikes under load
- License change (not Apache 2.0 since 7.11) → OpenSearch fork exists but carries same JVM baggage
Build on top of Tantivy: Quickwit (distributed, S3-native) · Meilisearch (simple API, great for apps) · Sonic (minimal search backend)
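Index-and-search in a few lines — a sketch close to Tantivy's hello-world example:

```rust
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, STORED, TEXT};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    // Schema: one indexed + stored text field.
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT | STORED);
    let index = Index::create_in_ram(schema_builder.build());

    // Index a document (50 MB writer heap) and commit.
    let mut writer = index.writer(50_000_000)?;
    writer.add_document(doc!(title => "The Diary of a Young Girl"))?;
    writer.commit()?;

    // Search.
    let searcher = index.reader()?.searcher();
    let query = QueryParser::for_index(&index, vec![title]).parse_query("diary")?;
    let top = searcher.search(&query, &TopDocs::with_limit(10))?;
    println!("{} hit(s)", top.len());
    Ok(())
}
```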
TypeSense C++

Replaces `Algolia`, `Elasticsearch` → <50ms · RAM-based · $0/month self-hosted
RAM-first = always instant. Typo tolerance built in. Simple REST API.
Why NOT Algolia: $500–10K+/month. TypeSense self-hosted: $0. Algolia's only advantage is their CDN-distributed edge network — which TypeSense Cloud also offers.
Why NOT Elasticsearch for app search: You don't need Kibana, Logstash, Beats, and a 3-node cluster for a search box in your SaaS app. TypeSense is one binary, one API.
mimalloc C

Replaces `jemalloc`, `tcmalloc` → 20–40% less memory · faster small allocs

```toml
[dependencies]
mimalloc = "0.1"
```

```rust
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```

Why glibc malloc wastes memory: glibc's allocator has poor thread-local caching → threads fight over a global lock → fragmentation at scale. mimalloc uses thread-local heaps with free-list sharding.
Why NOT jemalloc as default: jemalloc is excellent for large allocations and Firefox-style workloads. For small allocations (<1KB, typical in web services), mimalloc wins consistently.
Real result: Several Rust web services reported 25–35% RSS reduction just from this 3-line change.
snmalloc C++

Highest concurrent throughput for message-passing workloads
Uses a "message passing" deallocation model — objects are freed by sending them back to their originating thread's pool. Eliminates the cross-thread free bottleneck entirely.
Use when: Actor-style or channel-heavy code where one thread allocates and a different thread frees. Higher memory overhead than mimalloc, but wins on concurrent throughput benchmarks.
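In Rust, opting in looks just like the mimalloc snippet above — a sketch via the snmalloc-rs wrapper crate:

```rust
use snmalloc_rs::SnMalloc;

#[global_allocator]
static ALLOC: SnMalloc = SnMalloc;

fn main() {
    // Everything below now allocates through snmalloc; frees issued from other
    // threads are batched back to the owning thread's pool.
    let v: Vec<u64> = (0..1_000).collect();
    println!("{}", v.len());
}
```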
rapidhash Rust/C · benchmarks
Replaces `xxHash3`, `wyhash` → 71 GB/s on M4 · 2025 speed champion
Newest champion (2025). Specifically optimized for Apple Silicon M4 microarchitecture. Beats both xxHash3 and wyhash on ARM64.
Why NOT MD5/SHA-1 for non-crypto hashing: 1–3 GB/s vs 71 GB/s. If you're using MD5 for deduplication or cache keys — you're leaving 50× performance on the table, and paying for crypto properties you don't need.
BLAKE3 Rust/C

Replaces `SHA-256`, `SHA-3` → 8.4 GB/s multi-core · 4–10× faster >4KB · parallelizable
SHA-256 is sequential by design — you must process block N before block N+1. BLAKE3 uses a Merkle tree internally — all blocks are independent and can be hashed in parallel across all cores.
Why NOT SHA-256 (unless compliance forces you):
- Sequential → can't use multiple cores
- No hardware acceleration on ARM (unlike AES-NI)
- 4–10× slower for large files
- BLAKE3 is cryptographically sound and peer-reviewed by the same community
Drop-in CLI: `brew install b3sum` → same interface as `sha256sum`.
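In code, the parallelism is one method call — a sketch with the blake3 crate, assuming its optional `rayon` feature is enabled:

```rust
// Cargo.toml: blake3 = { version = "1", features = ["rayon"] }
use std::fs;

fn main() -> std::io::Result<()> {
    let data = fs::read("big_file.bin")?;

    let mut hasher = blake3::Hasher::new();
    // Splits the internal Merkle tree across all cores via rayon.
    hasher.update_rayon(&data);
    println!("{}", hasher.finalize()); // hex digest
    Ok(())
}
```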
AHash Rust

Replaces `SipHash` (Rust's default `HashMap` hasher) → fastest non-crypto hash for `HashMap`
Default hasher in the hashbrown crate (whose table implementation backs `std::HashMap`, though std keeps SipHash as its default hasher). AES-based — uses hardware AES-NI instructions for the hash.
```rust
use ahash::AHashMap;

let mut map: AHashMap<String, u64> = AHashMap::new(); // Done.
```

Why Rust uses SipHash by default: DoS resistance — adversarial inputs can't cause pathological hash collisions. If your `HashMap` keys come from untrusted users → keep SipHash. For internal maps with trusted keys → AHash.
| Use Case | Tool | Decomp Speed | Ratio | When to pick |
|---|---|---|---|---|
| Realtime / sub-ms | LZ4 | 3.5 GB/s | 1.12× | Streaming, hot paths, Kafka messages |
| Network / storage | zstd | 1 GB/s | 2.8× | S3, backups, network transfer |
| HTTP responses | Brotli | 0.4 GB/s | Best | Static assets, CDN responses |
| Game assets | Oodle Kraken | ~2 GB/s | ~2.5× | AAA games industry standard |
Why NOT gzip in 2026: zstd compresses better AND decompresses faster than gzip. The only reason to use gzip is HTTP/1.1 clients that don't support Brotli/zstd (increasingly rare).
zstd dict training trick: Train a zstd dictionary on your dataset → compress many small, similar documents 10–30× better. Critical for JSON API responses where each message is small but similar in structure.
```bash
# Train a dictionary from 1000 sample documents
zstd --train samples/* -o dict.zst

# Use it
zstd -D dict.zst -19 input.json -o compressed.zst  # ~10x better on small similar docs
```

Hyperscan → Vectorscan C · paper
Replaces `PCRE` for multi-pattern matching → 8.73× faster · 986 Mbps · DPI gold standard
Intel SIMD regex engine. Matches thousands of patterns simultaneously in a single pass through the input. Standard in Suricata, Snort, Zeek.
The AHA moment: With PCRE, matching 1000 patterns on a packet = 1000 regex evaluations in sequence. With Hyperscan = 1 pass, all 1000 matched simultaneously via SIMD. For intrusion detection on 10Gbps links, this isn't an optimization — it's the only option.
Vectorscan = community fork with ARM support (Hyperscan is Intel-only). Use Vectorscan for cross-platform builds.
RE2 C++ · Google · safety guarantee
Replaces `PCRE` for untrusted input → linear O(n) — no ReDoS possible
PCRE has exponential worst-case complexity. RE2 uses a DFA/NFA construction that guarantees O(n) time regardless of pattern or input.
Real security incident: The Cloudflare outage (July 2019) was caused by catastrophic backtracking in a WAF regex. A single crafted request saturated all CPUs to 100%. With RE2, this cannot happen.
What you give up: backreferences (`\1`), lookaheads, lookbehinds. These are the features that enable exponential complexity. RE2 supports everything else.
Rule: If the regex pattern or input comes from untrusted sources → RE2. Always.
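If you're in Rust, the regex crate makes the same linear-time guarantee (no backreferences or lookaround, by design). A sketch showing the attack pattern from above staying harmless:

```rust
use regex::Regex;

fn main() {
    // The catastrophic-backtracking pattern from the incident above.
    let re = Regex::new(r"(a+)+b").unwrap();
    let hostile = "a".repeat(1_000_000);

    // A linear-time engine scans this in milliseconds; a backtracking
    // engine like PCRE would effectively hang.
    assert!(!re.is_match(&hostile));
}
```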
Teddy (in aho-corasick) Rust · powers ripgrep
3.2–8.8× SIMD speedup over classic Aho-Corasick for multi-pattern substring search
16B (SSE) / 32B (AVX) SIMD chunks — processes 16–32 bytes per cycle. This is what makes ripgrep faster than grep for code search.
Use aho-corasick when: multi-pattern substring matching without full regex overhead. Searching for 1000 keywords in a document? Aho-Corasick makes one pass; naive loop makes 1000 passes.
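The multi-pattern, one-pass behavior in a few lines of Rust with the aho-corasick crate:

```rust
use aho_corasick::AhoCorasick;

fn main() {
    let patterns = &["timeout", "panic", "refused"];
    let ac = AhoCorasick::new(patterns).unwrap();

    // One pass over the haystack; all patterns are matched simultaneously.
    for m in ac.find_iter("connection refused after timeout") {
        println!(
            "{:?} at {}..{}",
            patterns[m.pattern().as_usize()],
            m.start(),
            m.end()
        );
    }
}
```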
aws-lc-rs Rust

Replaces `ring`, `openssl` → 15–30% faster · FIPS 140-3 certified
Drop-in replacement for the ring Rust crate (same API). Formally verified. FIPS 140-3 Level 1 certified by NIST.
Why NOT openssl-sys: OpenSSL binds via FFI → C-style API, memory unsafety risks, difficult to audit. aws-lc-rs is a Rust-safe wrapper over AWS-LC (their formally verified fork of BoringSSL).
Why NOT ring for compliance: ring is NOT FIPS certified. If you're handling healthcare, finance, or government data → you need FIPS 140-3. aws-lc-rs is your only Rust-native option.
Cargo.toml:
```toml
[dependencies]
aws-lc-rs = "1"

# or as a ring drop-in:
ring = { package = "aws-lc-rs", version = "1", features = ["ring-sig-verify"] }
```

SurrealDB Rust 28K ⭐ · docs · benchmarks
One database that thinks like you do. SQL + graph + document + time-series + vector search — all in one query engine. No ORMs. No joins between services. No polyglot persistence tax.
```sql
-- Graph traversal + vector search + SQL in ONE query
SELECT name, email,
    ->purchased->product.name AS products,
    vector::similarity::cosine(embedding, $vec) AS match_score
FROM customer
WHERE city = 'Berlin' AND match_score > 0.8
ORDER BY match_score DESC LIMIT 10;
```

Why NOT PostgreSQL: Postgres is relational-first. To do what the above query does you'd need: Postgres (SQL) + Neo4j (graph) + pgvector (vector) + jsonb workarounds (document) + separate connection poolers for each. SurrealDB is a single binary.
Why NOT MongoDB: Mongo added transactions late, graph never, vector as an afterthought. SurrealDB was designed for all of these from day one. And Mongo's query language is a JSON horror — SurrealQL reads like SQL.
Top alternatives if SurrealDB is wrong for you:
- EdgeDB `Python/Rust` 15K ⭐ — Postgres-backed, typed schema, beautiful query language. Best if you want migrations + type safety + Postgres reliability. Trade-off: no native graph, no vector.
- PocketBase `Go` 44K ⭐ — Single binary, SQLite-backed, REST + realtime built-in. Best for solo projects and prototypes. Trade-off: not for multi-node production.
AHA moment: A startup replaced 4 services (PostgreSQL + Redis + Elasticsearch + Neo4j) with SurrealDB. Infra cost: $2,400/month → $140/month. Query latency dropped because no more network hops between services.
When NOT: SurrealDB 3.x is alpha-track — don't use it for regulated financial data yet. Use TigerBeetle for that.
TigerBeetle Zig 11K ⭐ · docs · design doc
The database purpose-built for financial transactions. 1 million financial transactions per second with zero data loss. Formal verification. Byzantine fault tolerance. Built to survive disk corruption, power cuts, and hardware failure.
Why NOT PostgreSQL for finance: Every fintech horror story involves double-charges, phantom balances, or race conditions in Postgres. TigerBeetle's entire model is built around the two-phase ledger pattern — it's physically impossible to create money or lose it due to a bug.
Why NOT Redis for ledgers: Redis is in-memory first. One power cut and your ledger is gone. TigerBeetle uses io_uring with explicit fsync and formal proofs that data is durable before acknowledging writes.
Top alternatives:
- FoundationDB `C++` 14K ⭐ — Apple's distributed ACID database. More general-purpose than TigerBeetle. Powers Apple iCloud, Snowflake. Best when you need general transactions, not just financial.
- CockroachDB `Go` 30K ⭐ — Distributed, Postgres-compatible. Better for global multi-region apps where you're already on Postgres. Trade-off: much more complex ops, 3-node minimum.
When to use: Financial ledgers, payment systems, double-entry accounting, any system where "how much money is in this account?" must be 100% accurate always.
Turso / libSQL Rust 14K ⭐ · docs · benchmarks
SQLite — but for the edge. Fork of SQLite that adds: replication, multi-tenancy, embedded replicas (local + remote sync), and a REST API. 10,000 databases per app. Sub-millisecond reads from local replica.
Why NOT vanilla SQLite: SQLite has no replication, no network access, no multi-tenancy. Once you outgrow a single file, you're stuck. libSQL extends SQLite without breaking compatibility.
Why NOT PlanetScale: PlanetScale shut down its free tier and charges per query. Turso has a generous free tier, self-hosting, and embedded replicas that work offline.
Top alternatives:
- LiteFS `Go` 4K ⭐ — Fly.io's SQLite replication layer via a FUSE filesystem. Best for Fly.io-deployed apps. Trade-off: FUSE overhead, Fly.io lock-in.
- Neon `Rust` 16K ⭐ — Serverless Postgres with branching (like git branches for your DB). Best when you need full Postgres compatibility + scale to zero. Trade-off: Postgres overhead vs SQLite simplicity.
ClickHouse C++ 39K ⭐ · docs · benchmarks
The OLAP database that makes Elasticsearch look like a calculator. Columnar storage, vectorized execution, 1 billion rows/second query speed. Used by Cloudflare, Uber, ByteDance for event analytics.
Why NOT Elasticsearch for analytics: Elasticsearch was built for full-text search, not aggregations. Querying 10 billion events with GROUP BY on Elasticsearch: minutes. On ClickHouse: sub-second. ES also uses 5-10× more RAM for the same data.
Why NOT BigQuery/Redshift: Cloud OLAP bills are per query scanned. One team at Cloudflare reported $50K/month BigQuery bills for analytics that cost $400/month on self-hosted ClickHouse.
Top alternatives:
- Apache Doris `C++` 13K ⭐ — MySQL-compatible OLAP. Best if your team knows MySQL and you need real-time + historical analytics in one system. Trade-off: more complex to operate than ClickHouse.
- DuckDB — (already in this list) For single-node analytics, DuckDB is simpler. Use ClickHouse when you need distributed, multi-user, production-scale analytics.
AHA moment: Cloudflare processes 600 billion rows of HTTP logs per day in ClickHouse. The entire analytics dashboard loads in under 1 second.
Qdrant Rust 22K ⭐ · docs · benchmarks
The vector database built for production AI. 4× faster than Pinecone on filtered vector search. Written in Rust, built on HNSW + quantization. Handles billion-scale collections. Self-hostable or cloud.
```python
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)

# Filtered semantic search — vector + metadata in ONE query
results = client.search(
    collection_name="products",
    query_vector=embedding,
    query_filter={"must": [{"key": "category", "match": {"value": "electronics"}}]},
    limit=10,
)
```

Why NOT Pinecone: Pinecone is closed-source, costs $70+/month for basic production use, and you can't self-host. Qdrant is Apache 2.0 licensed, self-hostable on your own infra, and benchmarks faster.
Why NOT pgvector: pgvector stores vectors in Postgres rows. At 1M+ vectors, it becomes a full table scan problem. Qdrant's HNSW index gives sub-millisecond search at 100M vectors. pgvector is fine for <100K vectors where you want to stay in Postgres.
Top alternatives:
- Weaviate `Go` 12K ⭐ — GraphQL-native vector DB with built-in text2vec modules. Best when you want turnkey embeddings without managing your own embedding pipeline. Trade-off: heavier, slower on raw vector search.
- Milvus `Go/C++` 33K ⭐ — Enterprise-scale vector DB (used by Salesforce, Walmart). Best for billion-scale collections with GPU acceleration. Trade-off: Kubernetes-native, significant ops overhead.
When NOT: For <50K vectors with Postgres already in your stack, just use pgvector. Don't introduce a new database for tiny-scale RAG.
LanceDB Rust

Serverless vector database on object storage. No server to run. Data lives in S3/GCS/local as Lance files (columnar, like Parquet for ML). Sub-second vector search on 10M vectors from S3.
Why NOT Chroma: Chroma is great for prototyping but has no production deployment story — it's in-process Python. LanceDB works in-process AND scales to S3-backed production.
Top alternatives:
- Qdrant — better for multi-user production server deployments with complex filtering.
- DuckDB + vss extension — if you're already in DuckDB for analytics, add vector search without another service.
uv Rust 50K ⭐ · docs · benchmarks
pip + virtualenv + pyenv + poetry — replaced by one Rust binary.
`uv pip install numpy` is 10–100× faster than pip. Resolves your entire `requirements.txt` in under a second. Ships Python itself (no pyenv needed).
```bash
# Before uv — the pain
pyenv install 3.12
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # 2 min 18 sec

# After uv — everything
uv run python app.py  # installs Python + deps in 3.2 sec, first time ever
```

Why NOT pip: pip resolves dependencies in O(n²) time using a backtracking algorithm. On a 50-package project, pip can take minutes. uv uses a SAT solver written in Rust — milliseconds.
Why NOT Poetry: Poetry is slower than pip, adds a lockfile format nobody else uses, and has subtle resolution bugs that have caused production incidents. uv is a drop-in replacement with a uv.lock that is spec-compliant.
Top alternatives:
- Rye `Rust` — uv's predecessor, now merging into uv. If you're on Rye, migrate to uv — it's the successor.
- Pixi `Rust` 7K ⭐ — conda-compatible package manager. Best for scientific Python with binary deps (CUDA, MKL). Trade-off: different ecosystem than PyPI.
AHA moment: An ML team running CI on GitHub Actions dropped build time from 4 min 30 s → 23 s just by swapping `pip install` for `uv pip install`. Same `requirements.txt`, zero code changes.
Ruff Rust 36K ⭐ · docs · benchmarks
10-100× faster than Flake8. Replaces flake8 + black + isort + pyupgrade + pydocstyle — all in one binary. Lints a 300K-line codebase in 0.3 seconds.
Why NOT flake8 + black: Running flake8 + black + isort in pre-commit means 3 separate tools, 3 separate processes, 3 separate configs. Ruff does all of this in one pass. Django migrated to Ruff and cut CI lint step from 40s to 1.5s.
Why NOT pylint: Pylint is slow (minutes on large codebases), has a high false-positive rate, and requires significant config. Ruff covers ~90% of pylint's rules at 100× the speed.
Top alternatives:
- mypy `Python` — Ruff doesn't do type checking. Use Ruff for linting + mypy for types. Or use Pyright for faster type checking.
- Pyright `TypeScript` 14K ⭐ — Microsoft's type checker, 3–5× faster than mypy. Pairs perfectly with Ruff.
Maturin Rust

Build and publish Python packages from Rust with one command. The bridge that lets you write hot paths in Rust and call them from Python. Powers polars, pydantic-core, cryptography.
Why: When your Python is too slow, you rewrite the bottleneck in Rust with PyO3, then `maturin develop` gives you a Python-importable `.so`. No C extensions, no Cython, no ctypes boilerplate.
Top alternatives:
- PyO3 `Rust` 13K ⭐ — the Rust↔Python FFI layer that Maturin builds on. Use PyO3 for the Rust code, Maturin for the build/publish pipeline.
- cffi — still works for C. For anything new in Rust, Maturin + PyO3 is strictly better.
Bun Zig 77K ⭐ · docs · benchmarks
Node.js + npm + Webpack + Jest — replaced by one binary. Starts 4× faster than Node.
`bun install` is 25× faster than npm. Built-in TypeScript, JSX, bundler, test runner, and SQLite. Zero config.
```bash
# Before: 4 tools, 4 configs, 4 package.json scripts
npm install      # 47s
npx tsc          # compile
webpack build    # bundle
jest --coverage  # test

# After: one tool, zero config
bun install      # 1.9s
bun run build    # TypeScript → bundle
bun test         # test runner built-in
```

Why NOT Node.js: Node loads 100+ modules at startup for a basic Express app. Bun's JavaScriptCore engine starts in microseconds. For serverless edge functions, cold start time is your SLA.
Why NOT Deno: Deno made breaking changes for years and has poor npm compatibility. Bun is a drop-in Node.js replacement — your existing code runs as-is.
Top alternatives:
- Deno
Rust100K ⭐— V8-based, TypeScript-native, great security model (explicit permissions). Best for security-sensitive scripts and Deno Deploy. Trade-off: npm compatibility still imperfect. - Node.js 22+ with
--experimental-strip-types— if you need maximum ecosystem compatibility and can't migrate to Bun yet. Node now strips TypeScript types natively.
AHA moment: A team running 500 microservices on AWS Lambda dropped cold start from 800ms (Node.js) to 120ms (Bun) without changing a single line of application code.
Vite 8 TypeScript 71K ⭐ · docs · blog · v8 release
Vite 8 (March 2026): Rolldown is now the default bundler. The esbuild + Rollup split is over — Vite 8 uses Rolldown internally, giving you a unified Rust bundler that's 25× faster than Rollup+esbuild while keeping full Rollup plugin compatibility.
```bash
# Vite 8 — zero migration from v7, Rolldown transparent
npm create vite@latest my-app

# HMR in <50ms · Cold start <300ms · Build 25x faster than Webpack
```

Why NOT Webpack: Webpack bundles everything before you can see anything. On a 1,000-module project, cold start takes 30–90 seconds. Vite serves files as-is via native ESM — no bundling step.
Why NOT Create React App: CRA is officially deprecated (March 2023). It uses Webpack under the hood. Vite is the official successor recommended by the React team.
Top alternatives:
- Farm `Rust` 5.5K ⭐ — Vite-compatible Rust bundler, drop-in alternative for performance-critical setups. Builds 19K modules in 1.4s. Best if you want Vite compatibility but even more build speed.
- Turbopack `Rust` — Vercel's bundler, integrated into Next.js 15. Comparable speed to Rolldown. Trade-off: Next.js only, not standalone.
Rolldown Rust 13.3K ⭐ · docs · benchmarks
The bundler that unified esbuild + Rollup. Rust-based, Rollup-API-compatible, esbuild-speed. Now the default engine inside Vite 8. Use directly for library builds, CLI tools, and non-Vite projects.
| Bundler | 19K modules | Notes |
|---|---|---|
| Rolldown | 1.61s | Vite 8 default |
| esbuild | 1.70s | Go, no Rollup plugins |
| Rspack | 4.07s | Webpack-compatible |
| Rollup + esbuild | 40.10s | Old Vite 7 internals |
Why NOT esbuild alone: esbuild is fast but has no tree-shaking for complex dynamic imports and no Rollup plugin ecosystem. Rolldown inherits all Rollup plugins AND esbuild speed.
Why NOT Rollup: Rollup is JavaScript. Rolldown is Rollup rewritten in Rust — same API, 25× faster. If you're on Rollup today, Rolldown is a transparent upgrade.
Top alternatives:
- esbuild `Go` 40K ⭐ — Still faster for trivial bundles, great for CLIs/tools. Use Rolldown when you need Rollup-level code splitting + the plugin ecosystem.
- OXC `Rust` 20.4K ⭐ — Umbrella Rust toolchain (parser + linter + formatter + bundler). OXC's bundler is the long-term play but still maturing. Watch 2026–2027.
VitePlus

The unified Vite ecosystem layer from VoidZero (the company Evan You founded to commercialize Vite/Rolldown/OXC). Wraps runtime + package manager + build tool into one coherent DX. Think: the "Next.js of build tooling", but framework-agnostic.
Status (April 2026): Active development, ~5-10K stars, recent dependency and performance updates. Watch closely — this is the official upstream for where Vite + Rolldown + OXC are heading.
Vike · (formerly vite-plugin-ssr) TypeScript · migration guide
Vite-native SSR/SSG/SPA router — page-by-page render mode. The most flexible Vite meta-framework. Each page can independently be: SSR · SSG · SPA · streaming. No lock-in.
Why NOT Next.js for Vite users: Next.js forces you to Vercel's deployment model and App Router complexity. Vike is framework-agnostic, deploys anywhere (Cloudflare Workers, AWS Lambda, Docker, Vercel), and gives you full control over rendering strategy per page.
Top alternatives:
- SvelteKit — best meta-framework if you're on Svelte. Simpler than Vike for pure Svelte projects.
- Analog — Angular meta-framework on Vite. Best if your org is Angular-committed.
Biome Rust 18K ⭐ · docs · benchmarks
ESLint + Prettier — replaced by one Rust binary. Formats and lints in <100ms. The Rome Tools fork that actually shipped. 97% compatibility with Prettier output.
Why NOT ESLint + Prettier: Two tools, two configs, constant version conflicts (eslint-config-prettier, prettier-eslint), slow CI. Biome replaces both in one binary with zero config.
Top alternatives:
- oxlint `Rust` 13K ⭐ — 50–100× faster than ESLint. New project but growing fast. Pairs well with Prettier if you don't want full Biome.
- dprint `Rust` 3K ⭐ — formatter-only, plugin-based. Best if you need Prettier-compatible formatting but want more control.
mold C++ 15K ⭐ · benchmarks
The linker that makes Rust builds feel fast. 8× faster than GNU ld, 2× faster than lld on large C++/Rust projects. Links Chrome in 2 seconds. Links a large Rust binary in 0.5s vs 8s with ld.
```toml
# .cargo/config.toml — drop-in, zero code changes
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Why NOT the default ld: GNU ld is single-threaded and was designed in 1987. It links by reading every object file sequentially. mold uses lock-free concurrent algorithms and finishes before ld has read half the inputs.
Why NOT lld (LLVM linker): lld is already fast; mold is faster still on Linux for large binaries. mold is Linux-only — on macOS, use lld (or zld, the ld64 replacement).
Top alternatives:
- lld `C++` — LLVM's linker. Works on macOS + Linux. 4× faster than GNU ld. A good default upgrade before considering mold.
- wild `Rust` 4K ⭐ — New Rust linker, still experimental. Aims to be even faster than mold for incremental Rust builds. Watch 2026.
AHA moment: A Rust backend team running `cargo build` in CI dropped from 8 min 40 s → 2 min 10 s by adding 3 lines to `.cargo/config.toml`. No code changes. No new hardware.
VictoriaMetrics Go 13K ⭐ · docs · benchmarks
Prometheus — but 10× smaller storage, 3× faster queries, and actually scales. Drop-in Prometheus-compatible. Single binary. Handles 10M+ metrics/second on a single node with 7× better compression than Prometheus.
Why NOT Prometheus: Prometheus was not designed for long-term storage. Its local TSDB degrades after a few months, it has no horizontal scaling story, and high-cardinality metrics (Kubernetes labels) kill it. VictoriaMetrics handles billions of time series on a single node.
Why NOT Datadog/New Relic: Datadog bills by host + metric cardinality. A 100-node Kubernetes cluster with standard metrics can hit $8,000/month. VictoriaMetrics on a $200/month server handles the same load. Booking.com saved $4M/year switching from Datadog.
Top alternatives:
- Thanos `Go` 13K ⭐ — Prometheus with object storage + horizontal scaling. Best if you're already committed to Prometheus and need HA + long-term storage. Trade-off: more complex than VictoriaMetrics.
- Mimir `Go` 4K ⭐ — Grafana's horizontally scalable Prometheus backend. Best when you're already in the Grafana Cloud ecosystem.
Pyroscope Go

Continuous profiling — see WHERE your CPU and memory go in production, always. Pyroscope (merged with Grafana Phlare) adds a 2–5%-overhead flame-graph stream from every running service. Find the function burning your CPU budget.
Why NOT nothing: Most teams only profile when things break. By then the bug is hard to reproduce. Pyroscope profiles continuously — you can look back in time and see what was running when your CPU spike happened.
Why NOT manual profiling (pprof/perf): Manual profiling is a snapshot. Pyroscope is a time-series of flame graphs. You can correlate CPU spikes with deployments, traffic peaks, or specific user actions.
Top alternatives:
- Clinic.js — for Node.js profiling specifically. Better DX for JS devs than Pyroscope's JS agent.
- Parca `Go` 4K ⭐ — eBPF-based continuous profiler, no code instrumentation needed. Best for profiling without modifying your app.
OpenObserve Rust 14K ⭐ · docs · benchmarks
Elasticsearch for logs — at 140× lower storage cost. Logs + metrics + traces in one UI. S3-compatible object storage backend. OpenTelemetry-native. Replaces the entire ELK stack.
Why NOT ELK (Elasticsearch + Logstash + Kibana): Elasticsearch uses inverted indices that store terms for every field — 10 bytes of log becomes 50-100 bytes stored. OpenObserve uses columnar Parquet on S3 — 10 bytes of log becomes 1-2 bytes. At 1TB/day of logs, ELK costs $15K+/month on cloud. OpenObserve costs ~$200/month in S3 fees.
Top alternatives:
- Grafana Loki `Go` 24K ⭐ — Log aggregation with label-based indexing. Doesn't index log content (only metadata), so it's much cheaper than ELK. Best if you're already on Grafana. Trade-off: full-text log search is slower without content indexing.
- ClickHouse — handles structured logs brilliantly. Many teams skip dedicated log tools, send logs straight to ClickHouse, and query with SQL.
Astro TypeScript 49K ⭐ · docs · benchmarks
Zero JavaScript by default. Ship HTML with optional islands of interactivity. Pages that load in 50ms instead of 3 seconds. Use React, Vue, Svelte, or Solid components — all in the same project.
```astro
---
// Runs on server at build time — zero JS shipped to browser
const products = await db.query("SELECT * FROM products");
---
<ul>
  {products.map(p => <li>{p.name}</li>)}
</ul>

<!-- React only where needed — "island" architecture -->
<InteractiveCart client:load />
```

Why NOT Next.js for content sites: Next.js ships a React runtime (~45KB gzipped) on every page, even if the page is static. A blog post doesn't need a JavaScript framework. Astro sends zero JS for static pages — measurably better Core Web Vitals.
Why NOT Gatsby: Gatsby was the Astro of 2019 but added complexity over time. It's slow to build (minutes for large sites), has a heavyweight GraphQL data layer, and has been in maintenance mode since Netlify acquired it.
Top alternatives:
- SvelteKit `TypeScript` 19K ⭐ — Full-stack meta-framework on Svelte. Best when you need full interactivity + SSR without the React overhead. Svelte compiles to vanilla JS — no virtual DOM.
- Remix `TypeScript` 30K ⭐ — React meta-framework focused on progressive enhancement and web standards. Best for React teams that want a Next.js alternative with better data-loading patterns.
Svelte 5 TypeScript 82K ⭐ · docs · benchmarks
A compiler, not a framework. Svelte 5 with Runes is 30-50% faster than React in JS framework benchmarks. Ships ~10KB for a full interactive app vs React's ~45KB runtime. No virtual DOM, no diffing — it compiles to surgical DOM updates.
```svelte
<script>
  // Svelte 5 Runes — reactivity without magic
  let count = $state(0);
  let doubled = $derived(count * 2);
</script>

<button onclick={() => count++}>
  Count: {count}, Doubled: {doubled}
</button>
```

Why NOT React: React's virtual DOM diffing is an optimization for a problem Svelte doesn't have. Svelte compiles to imperative DOM updates — no reconciliation, no fiber, no hook rules. The mental model is simpler: your component IS your output.
Why NOT Vue: Vue 3's Composition API is an explicit answer to React's hook problem. Svelte's Runes solve the same thing with less ceremony. Vue ships ~35KB runtime; Svelte ships the compiled output only.
Top alternatives:
- SolidJS `TypeScript` 33K ⭐ — JSX syntax + fine-grained reactivity. Faster than Svelte in most benchmarks. Best if you love React's JSX model but hate React's performance. Trade-off: smaller ecosystem than Svelte.
- Preact `TypeScript` 37K ⭐ — React-compatible, 3KB. Best for dropping into existing React apps or size-critical situations. Trade-off: React-compatibility shims add back some weight.
Structured for AI agents and quick lookups
```text
# JSON
JSON_RUST → sonic-rs | replaces: serde_json | gain: 1.5-2x, 50% less heap
JSON_CPP → simdjson | replaces: RapidJSON | gain: 6GB/s, industry standard
JSON_C_IOT → yyjson | replaces: cJSON | gain: 2.3x stringify, zero deps
JSON_ZERO_ALLOC → json-zero | replaces: all | gain: 0 heap allocations
# Serialization
SERIALIZE_RPC → Cap'n Proto | replaces: Protobuf | gain: 0ns deser, built-in RPC
SERIALIZE_IOT → FlatBuffers | replaces: Protobuf | gain: 60-80% CPU+size
SERIALIZE_RUST → rkyv | replaces: serde | gain: 200% throughput, 65% less mem
# Query & Columnar
COLUMNAR_ML → Lance | replaces: Parquet | gain: 100x random access
QUERY_ENGINE → DataFusion | replaces: Spark | gain: SIGMOD2024 winner, no cluster
ANALYTICS_SQL → DuckDB | replaces: pandas/SQLite | gain: 5-10x, zero infra
ANALYTICS_DF → Polars | replaces: pandas | gain: 5-10x, lazy eval
# Async & I/O
ASYNC_LINUX → Monoio | replaces: Tokio | gain: 3x on 16 cores
ASYNC_CROSS → Tokio | replaces: async-std | gain: ecosystem standard
HTTP3 → tokio-quiche | replaces: Quinn | gain: 2x faster, CF-proven
# HTTP
HTTP_PROXY → Pingora | replaces: Nginx | gain: -60% CPU, -70% RAM
HTTP_EDGE → Hono | replaces: Express | gain: 14x faster, multi-runtime
# Messaging
MQ_KAFKA_KILL → Iggy.rs | replaces: Kafka | gain: 10x throughput, sub-ms P99
MQ_KAFKA_COMPAT → Redpanda | replaces: Kafka | gain: 70x P99, no JVM
MQ_HFT → Aeron | replaces: ZeroMQ | gain: 29us P99, kernel bypass
# Storage
KV_RUST → Fjall | replaces: RocksDB | gain: pure Rust, <2.2MB binary
KV_ROCKSDB_COMPAT → SpeedB | replaces: RocksDB | gain: -80% write amp, drop-in
KV_READS → LMDB | replaces: LevelDB | gain: 567ms/1M reads, zero-copy
# Search
FTS_LIB → Tantivy | replaces: Lucene/ES | gain: 6.5x, 56% less RAM
FTS_SERVICE → TypeSense | replaces: Algolia/ES | gain: <50ms, $0 self-hosted
# System
ALLOC_GENERAL → mimalloc | replaces: jemalloc | gain: 20-40% less mem
ALLOC_CONC → snmalloc | replaces: jemalloc | gain: highest conc throughput
HASH_SPEED → rapidhash | replaces: xxHash3 | gain: 71GB/s on M4
HASH_CRYPTO → BLAKE3 | replaces: SHA-256 | gain: 4-10x, parallelizable
HASH_HASHMAP → AHash | replaces: SipHash | gain: fastest HashMap
# Compression
COMPRESS_RT → LZ4 | replaces: gzip | gain: 3.5GB/s sub-ms
COMPRESS_STORE → zstd | replaces: gzip | gain: 2.8x ratio, 1GB/s
COMPRESS_HTTP → Brotli | replaces: gzip | gain: best ratio for HTTP
# Regex
REGEX_MULTI → Hyperscan | replaces: PCRE | gain: 8.73x, SIMD parallel
REGEX_SAFE → RE2 | replaces: PCRE | gain: O(n) guaranteed, no ReDoS
# Crypto
CRYPTO_TLS_FIPS → aws-lc-rs | replaces: openssl | gain: FIPS 140-3, +15-30%
```
Paste this into your `ARCHITECTURE.md`:
```text
┌─────────────────────────────────────────────────────────────────────────────┐
│ GAMECHANGER STACK 2026 │
├──────────────────────┬──────────────────────────────────────────────────────┤
│ JSON │ sonic-rs (Rust) · simdjson (C++) · yyjson (C) │
│ Serialization │ Cap'n Proto (0ns RPC) · rkyv (Rust zero-copy) │
│ Columnar / ML │ Lance (100x random access) · DuckDB (analytics) │
│ Query Engine │ DataFusion (embedded SQL) · Polars (DataFrame) │
│ Database (multi) │ SurrealDB (SQL+graph+vector+doc in one binary) │
│ Database (finance) │ TigerBeetle (1M tx/s, zero data loss, Zig) │
│ Database (edge SQL) │ Turso/libSQL (SQLite with replication + REST) │
│ Database (OLAP) │ ClickHouse (1B rows/s, Cloudflare-scale) │
│ Vector DB │ Qdrant (Rust, 4x Pinecone) · LanceDB (serverless) │
│ Async (Linux) │ Monoio (io_uring, 3x Tokio) │
│ Async (cross) │ Tokio │
│ HTTP Proxy │ Pingora (−60% CPU vs Nginx) │
│ HTTP Edge API │ Hono (14KB, multi-runtime, 402K req/s) │
│ HTTP/3 │ tokio-quiche (Cloudflare-proven) │
│ Message Queue │ Iggy.rs (Kafka killer) · Aeron (HFT µs) │
│ KV Store │ Fjall (pure Rust) · SpeedB (RocksDB drop-in) │
│ Full-Text Search │ Tantivy (library) · TypeSense (service) │
│ Memory Allocator │ mimalloc (default) · snmalloc (concurrent) │
│ Hash (general) │ rapidhash (71GB/s) · AHash (Rust HashMaps) │
│ Hash (crypto) │ BLAKE3 (parallelizable, 4-10x SHA-256) │
│ Compression │ zstd (storage) · LZ4 (realtime) │
│ Regex │ Hyperscan (multi-pattern) · RE2 (safe/untrusted) │
│ Crypto / TLS │ aws-lc-rs (FIPS 140-3) · ring (general Rust) │
│ Python pkg mgr │ uv (100x pip, one binary) │
│ Python linter │ Ruff (10-100x flake8+black+isort) │
│ JS/TS runtime │ Bun (25x npm, 4x Node.js startup) │
│ JS bundler/HMR │ Vite 8 + Rolldown (ESM-native, Rust bundler, <300ms)│
│ JS bundler (library) │ Rolldown (direct) · Farm (Vite-compat alt) │
│ JS linter+formatter │ Biome (ESLint+Prettier in Rust) · OXC (watch) │
│ Vite SSR/SSG │ Vike (page-by-page render mode, replaces Next.js) │
│ Linker │ mold (8x GNU ld, Linux) · lld (macOS) │
│ Metrics │ VictoriaMetrics (7x Prometheus compression) │
│ Continuous profiling │ Pyroscope (always-on flame graphs) │
│ Log analytics │ OpenObserve (140x less storage than ELK) │
│ Frontend (content) │ Astro (0KB JS default, island architecture) │
│ Frontend (reactive) │ Svelte 5 (compiler, no vDOM, 10KB runtime) │
└──────────────────────┴──────────────────────────────────────────────────────┘
```
| Tool | Stars | Status | Why it matters |
|---|---|---|---|
| Vortex | — | Prototype | Adaptive per-column encoding → 10× Parquet compression ratio |
| rkyv | 5.5K | Stable | 200% throughput, 65% memory reduction, zero-copy Rust |
| Glaze | 1.5K | Active | Claims to beat simdjson; early but promising benchmarks |
| Fjall 3.0 | 1K | ✅ Jan 2026 | Pure-Rust RocksDB replacement hits v3.0 stable |
| sonic-go | 7K | Active | SIMD JSON for Go — ByteDance's port of the sonic approach |
| iggy.rs v1.0 | 2K | Active | Approaching stable API; watch for v1.0 production declaration |
| Pingap | 3K | Active | Pingora-based reverse proxy with config-driven management |
| DataFusion Comet | 1K | Active | Spark plugin that accelerates Spark with DataFusion — gradual migration path |
| wild linker | 4K | Active | New Rust-native linker, may beat mold for incremental Rust builds |
| oxlint | 13K | Active | 50–100× faster ESLint; catching up fast on rule coverage |
| OXC bundler | 20.4K | Active | Full Rust JS toolchain (parser + linter + formatter + bundler) — long-term Rolldown rival |
| Farm | 5.5K | ✅ Stable | Vite-compatible Rust bundler, builds 19K modules in 1.4s |
| VitePlus | ~8K | Active | VoidZero's unified Vite + Rolldown + OXC ecosystem layer |
| Rolldown | 13.3K | ✅ Vite 8 | Now the default Vite 8 bundler — Rollup + esbuild unified in Rust |
| ty (Astral) | — | Alpha | Ruff's type-checker companion; will challenge mypy/pyright |
| Parca | 4K | Active | eBPF continuous profiling — zero code instrumentation needed |
| RisingWave | 8K | Active | Streaming SQL database — Flink replacement in Rust |
| Neon | 16K | ✅ GA | Serverless Postgres with git-like branching for DB schemas |
This list is machine-indexed in SurrealDB for agent-driven tool selection:
```sql
-- Find what replaces a specific legacy tool
SELECT name, speed_value, github_url, use_case
FROM gamechanger_tool
WHERE replaces CONTAINS 'Kafka';
-- Best tool for a category
SELECT name, speed_value, language, replaces
FROM gamechanger_tool
WHERE category = 'Full-Text Search'
ORDER BY category;
-- Semantic search across 18K repos (HNSW, 768-dim)
SELECT name, description, stars,
vector::similarity::cosine(embedding, $query_vec) AS score
FROM repo WHERE embedding IS NOT NONE
ORDER BY score DESC LIMIT 10;
-- Find stale legacy repos still on awesome lists
SELECT name, stars, last_commit FROM repo
WHERE last_commit < '2022-01-01' AND stars > 1000
ORDER BY stars DESC;
```

Local KB: http://127.0.0.1:9926 · NS `awesome` · DB `github` · 18,286 repos · 768-dim HNSW
CLI: `bash query.sh gamechanger` · `python3 semantic_search.py "fast Rust HTTP server"`
- 18,286 repos indexed from `awesome-rust` · `awesome-go` · `awesome-cpp` · `awesome-selfhosted` · `awesome-mcp-servers` + 15 other lists
- All repos stored in SurrealDB with stars, forks, language, topics, last commit
- 768-dim HNSW embeddings via nomic-embed-text (Ollama) for semantic discovery
- 68 RELATE graph edges `gamechanger_tool → replaces → legacy_repo` built in SurrealDB
gamechanger_tool → replaces → legacy_repobuilt in SurrealDB - Benchmarks from official repos, SIGMOD papers, Cloudflare blog, and production case studies — never marketing copy
PR criteria — all four required:
- Real benchmark with a source URL (no "blazing fast" without a number)
- Production-proven OR active commit in last 6 months
- Clear replacement target (what legacy tool does it beat, and why)
- Honest "When NOT to use" section — this is what separates useful lists from hype lists
Open a PR · Suggest a tool via Issue
⭐ If this helped you drop a legacy tool, star it.
Built with: SurrealDB · nomic-embed-text · ripgrep · DuckDB · 18,286 repos analyzed
"The best tool is the one you'd choose if you were starting today."