Name	Name	Last commit message	Last commit date
parent directory ..
src	src
CLAUDE.md	CLAUDE.md
Cargo.toml	Cargo.toml
README.md	README.md

preflate

Core DEFLATE analysis and reconstruction engine for the preflate-rs workspace.

This crate takes an existing DEFLATE-compressed bitstream and splits it into two parts:

The uncompressed plaintext
A compact corrections blob that captures everything needed to recreate the original bitstream exactly

Given those two parts, the original DEFLATE stream can be reconstructed bit-for-bit. This enables re-compressing the plaintext with a modern algorithm (Zstd, Brotli, LZMA) while preserving binary-exact fidelity.

How It Works

Analysis pipeline

Parse (deflate/) — Decode the DEFLATE bitstream into a token sequence: literals and length/distance back-references.
Estimate (estimator/) — Analyze the token sequence to fingerprint the original compressor. Identifies hash algorithm, chain depth, nice-length cutoff, window size, and add policy.
Predict (token_predictor.rs) — Re-run compression using the estimated parameters, generating a predicted token sequence.
Encode (statistical_codec.rs, cabac_codec.rs) — Record each position where the actual token differs from the prediction, using CABAC arithmetic coding.

Reconstruction pipeline

Feed the plaintext and corrections back into the predictor. It replays every compression decision, applying corrections where recorded, and writes the resulting tokens back to a DEFLATE bitstream using the original Huffman trees.

API

Whole-stream (simple)

use preflate_rs::{preflate_whole_deflate_stream, recreate_whole_deflate_stream, PreflateConfig};

let config = PreflateConfig::default();

// Analysis: DEFLATE → plaintext + corrections
let (result, plain_text) = preflate_whole_deflate_stream(&compressed_data, &config)?;

// Reconstruction: plaintext + corrections → original DEFLATE
let recreated = recreate_whole_deflate_stream(plain_text.text(), &result.corrections)?;

assert_eq!(compressed_data, recreated);

Streaming (chunked)

For large streams where memory is a concern, use the streaming processors:

use preflate_rs::{PreflateStreamProcessor, RecreateStreamProcessor, PreflateConfig};

// Analysis
let mut processor = PreflateStreamProcessor::new(&config);
let chunk_result = processor.decompress(&compressed_chunk)?;
// chunk_result.corrections contains the encoded corrections for this chunk
// processor.plain_text() gives access to the decompressed data

// Reconstruction
let mut recreator = RecreateStreamProcessor::new();
let (deflate_output, _blocks) = recreator.recompress(&mut plain_text_reader, &corrections)?;

Configuration

pub struct PreflateConfig {
    /// Maximum hash chain traversal depth. Higher values improve prediction
    /// accuracy for streams compressed with high chain limits, at the cost
    /// of analysis time. Default: 4096.
    pub max_chain_length: u32,

    /// Maximum plaintext held in memory at once. Default: 128 MB.
    pub plain_text_limit: usize,

    /// Verify round-trip correctness after analysis. Default: true.
    pub verify_compression: bool,
}

Supported Compressors

The estimator detects the following DEFLATE implementations by their token patterns and hash algorithms:

Compressor	Detection method
zlib	Default hash, standard chain behavior
zlib-ng	Distinct hash variant
libdeflate	4-byte hash tables
miniz / miniz_oxide	Fastest-mode hash function
Windows zlib	Built-in PNG/ZIP codec fingerprint

Unknown compressors still round-trip correctly with higher corrections overhead.

Key Types

Type	Description
`PreflateStreamChunkResult`	Output of analysis: corrections blob, compressed size, estimated parameters
`TokenPredictorParameters`	Compressor fingerprint: hash algorithm, nice_length, max_chain, window_bits, add policy
`HashAlgorithm`	Enum of detected compressor families
`PlainText`	Decompressed data with sliding-window dictionary support
`PreflateError`	Rich error type with detailed exit codes

Encoding Format

Parameters are serialized with bitcode. Corrections use CABAC (Context Adaptive Binary Arithmetic Coding), the same codec used in Lepton JPEG compression. The format is chunked so memory use is bounded regardless of input size.

The following differences from the predicted stream are encoded:

Block type (uncompressed, static Huffman, dynamic Huffman)
Token count per block
Dynamic Huffman tree encoding
Literal vs. length/distance choice
Incorrect distance or length (encoded as hop count back through the hash chain)

Constraints

No unsafe code — #![forbid(unsafe_code)]
Minimum Rust version: 1.89, Edition 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

preflate

How It Works

Analysis pipeline

Reconstruction pipeline

API

Whole-stream (simple)

Streaming (chunked)

Configuration

Supported Compressors

Key Types

Encoding Format

Constraints

FilesExpand file tree

preflate

Directory actions

More options

Directory actions

More options

Latest commit

History

preflate

Folders and files

parent directory

README.md

preflate

How It Works

Analysis pipeline

Reconstruction pipeline

API

Whole-stream (simple)

Streaming (chunked)

Configuration

Supported Compressors

Key Types

Encoding Format

Constraints