HuffSpace
Parallel file compression engine using Huffman coding with multi-GB throughput.
40–60
Compression Ratio
~3.8x
Speedup vs Single-thread
4
Max Input Size Tested
4
Chunk Size
Problem
Problem statement, constraint shape, and the gap this project explores.
How do you compress multi-GB files with Huffman coding without running into single-threaded throughput limits?
Engineered compression system for high-throughput environments where memory constraints, processing latency, and data integrity requirements shaped core architectural decisions.
Standard Huffman compression is inherently sequential — you build a frequency table, construct the tree, then encode. This serializes the entire pipeline. For multi-GB inputs, single-threaded encoding becomes a bottleneck. HuffSpace explores a chunk-parallel approach: split input into independent segments, encode each in parallel, and reconstruct with per-chunk Huffman trees embedded in the output stream.
Constraints
Non-negotiable boundaries that shaped the implementation.
Multi-GB files
Bounded — no full-file buffering
All available cores via Rayon
Bit-exact decompression required
Sequential output despite parallel encoding
Architecture
The primary design surface: flow, subsystem roles, and state boundaries.
HuffSpace divides input into fixed-size chunks, processes each chunk independently through frequency analysis → tree construction → encoding, then serializes the output stream with per-chunk metadata headers for decompression.
Input file
Chunker
[Parallel: Frequency
Tree
Encode]
Stream Assembler
Compressed output
Chunker
Splits input into fixed-size segments for parallel processing.
Frequency Analyzer
Builds per-chunk byte frequency tables.
Tree Builder
Constructs Huffman trees from frequency tables using a min-heap.
Encoder
Bit-packs symbols according to the Huffman code table.
Stream Assembler
Serializes chunk outputs with metadata for sequential decompression.
Engineering Tradeoffs
Design review notes: what was optimized and what was deliberately left behind.
Chunk-parallel vs streaming
Streaming Huffman requires sequential frequency analysis. Chunk parallelism trades slightly higher memory for significant throughput gains.
Memory-optimal streaming approach
Rayon-based chunk parallelism
Per-chunk trees vs global tree
A global tree requires two passes over the entire file. Per-chunk trees allow single-pass parallel encoding at the cost of 5–10% compression ratio.
Optimal global compression ratio
Per-chunk Huffman trees
Fixed chunk size vs adaptive
Fixed chunks simplify parallelism scheduling and memory allocation. Adaptive chunking adds complexity without significant ratio benefit for binary data.
Content-aware chunking
Fixed 4MB chunks
Failure Modes
Incident-style notes for the ways the design can break.
Bit boundary misalignment on decompression
FM-01Chunk boundaries require careful bit offset tracking. Off-by-one errors cause cascading decompression failures.
with explicit boundary markers.
Memory pressure on large files
FM-02Holding all chunk outputs in memory before assembly caused OOM on constrained systems.
by streaming assembled chunks directly to disk.
Rayon thread pool starvation
FM-03Imbalanced chunk sizes caused some threads to idle.
by equalizing chunk sizes and using dynamic work-stealing.
Corrupted output on panic in worker thread
FM-04Rayon propagates panics to the joining thread. Added explicit error handling to ensure partial output is discarded cleanly.
Rayon propagates panics to the joining thread. Added explicit error handling to ensure partial output is discarded cleanly.
Benchmarks
Environment first, numbers second. Metrics should be inspectable, not ornamental.
Rust / rayon
40–60% compression ratio
Rust, rayon, clap, serde
Systems Programming
Project-level benchmark notes
40–60%
~3.8xon 8 cores
4GB
4MB
~2x chunk sizeper worker
Lessons Learned
Engineering takeaways from the implementation, including remaining work.
Parallelizing inherently sequential algorithms requires rethinking the data format
per-chunk metadata is necessary overhead.
Rayon's work-stealing scheduler handles uneven workloads well, but only if tasks are appropriately sized.
Bit-level encoding bugs are extremely hard to debug without property-based test roundtrip checks.
Future: explore ZSTD-style dictionary compression across chunks to recover the ratio loss from per-chunk trees.