Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor
Guoxin Ma, Yibing Liu, Chengzhengxu Li, Yu Liang, Yan Wang, Yueyang Zhang, Kecheng Chen, Zhaohan Zhang, Zhiyuan Sun, Daiting Shi
Key claim
Thinking as Compression (TaC) significantly outperforms existing context compression methods.
This paper introduces Thinking as Compression (TaC), a novel approach that allows LLMs to compress long contexts by generating thinking traces. The method outperforms existing compression techniques, achieving significant improvements in F1 and Exact Match scores at high compression ratios.
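The summary reports results in terms of F1, Exact Match, and compression ratio without defining them. The sketch below shows how these quantities are commonly computed for extractive QA. It is an illustration, not the paper's evaluation code: the SQuAD-style normalization, the function names, and the definition of compression ratio as original tokens divided by compressed tokens are all assumptions.

```python
from __future__ import annotations

import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style answer normalization; the paper may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Assumed definition: original length over compressed length."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return original_tokens / compressed_tokens
```

Under this assumed definition, an 8,000-token context reduced to a 500-token trace has a compression ratio of 16, so "high compression ratios" means that downstream accuracy is being measured on a small fraction of the original tokens.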
In plain English
Thinking as Compression (TaC) is a new context compression method that leverages the intrinsic capabilities of LLMs.
Experiments across multiple benchmarks support the claims, although a broader set of baselines would make the comparison more reliable.
Deep reliability assessment
The experiments support that reasoning traces can serve as effective query-conditioned compressed contexts for long-context QA, and that reward tuning improves budget control and discourages answer-like shortcuts. The broader claim that reasoning models are generally intrinsic context compressors is overextended without stronger evidence on end-to-end latency/cost, non-QA tasks, tool/agent histories, code, and extremely long contexts.
Reproducibility
No code repository is mentioned in the provided abstract, introduction, conclusion, or visible references. The datasets are identified only at a high level as four long-context QA benchmarks; the cited ones include LongBench, Natural Questions, and multi-hop QA datasets, but the exact experimental setup cannot be fully recovered from the excerpt.
Discussion questions
1. Is a thinking trace truly a reusable compressed context, or is it closer to an intermediate answer/rationale that may leak task-specific shortcuts and reduce verifiability?
2. For production RAG systems, does TaC-C reduce total cost and latency once the extra thinking/compression pass is included, or is it mainly useful when the compressed trace is reused across multiple downstream generations?
3. What result would falsify the core claim: failure on adversarial multi-hop QA, poor transfer to non-QA tasks like code or tool-use logs, or lower end-to-end efficiency than simpler retrieval/pruning baselines?
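The cost question about reuse can be made concrete with a toy token-cost model. The sketch below is hypothetical throughout: the `CostModel` class, its fields, and the per-token prices are illustrative assumptions, not figures from the paper. It assumes that the thinking trace itself is the compressed context, that the compressor reads the full context once, and that answer-generation cost is the same in both settings and can be left out. Latency is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Toy cost model; all fields are hypothetical inputs in arbitrary price units."""
    context_tokens: int          # tokens in the original long context
    trace_tokens: int            # tokens in the generated trace (compressed context)
    compressor_in_price: float   # compressor price per input token
    compressor_out_price: float  # compressor price per generated token
    reader_in_price: float       # downstream model price per input token

    def one_time_cost(self) -> float:
        """Cost of the single compression pass."""
        return (self.context_tokens * self.compressor_in_price
                + self.trace_tokens * self.compressor_out_price)

    def baseline_cost(self, n_uses: int) -> float:
        """Downstream model reads the full context on every use."""
        return n_uses * self.context_tokens * self.reader_in_price

    def compressed_cost(self, n_uses: int) -> float:
        """Compress once, then read only the trace on every use."""
        return self.one_time_cost() + n_uses * self.trace_tokens * self.reader_in_price

    def break_even_uses(self) -> Optional[int]:
        """Smallest number of uses at which compression is no more expensive.

        Returns None when the trace saves nothing per use.
        """
        saving_per_use = (self.context_tokens - self.trace_tokens) * self.reader_in_price
        if saving_per_use <= 0:
            return None
        return max(1, math.ceil(self.one_time_cost() / saving_per_use))


# Illustrative numbers only: generated tokens priced at 4x input tokens.
model = CostModel(context_tokens=8000, trace_tokens=500,
                  compressor_in_price=1.0, compressor_out_price=4.0,
                  reader_in_price=1.0)
print(model.baseline_cost(1), model.compressed_cost(1), model.break_even_uses())
```

When the compressor and the downstream model charge the same input price, a single use can never come out cheaper in this model, because the compression pass already reads the whole context. Savings then depend on reusing the trace, or on a compressor that is cheaper per token than the downstream model.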
Key figure
Figure 1 contrasts task-agnostic global compression, task-aware relevance-based segment selection, and TaC, where a thinking model dynamically focuses, skips, revisits, links, and organizes evidence into a compact trace.