Sunday, May 3, 2026
newsaiworld

How a 2021 Quantization Algorithm Quietly Outperforms Its 2026 Successor

by Admin
May 3, 2026
in Machine Learning


TurboQuant [3], an online vector quantization method, drew broad public attention at ICLR 2026. To me, it looked very familiar: it overlaps heavily with EDEN, a quantization method first introduced as the 1-bit method DRIVE at NeurIPS 2021 [1] and generalized to arbitrary bit-widths at ICML 2022 [2], co-authored by me with Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.

The TurboQuant paper presents two variants: TurboQuant-mse and TurboQuant-prod. In a detailed new comparison [5], we show that TurboQuant-mse is a degenerate case of EDEN, and that the EDEN variants consistently outperform their counterparts.

How EDEN quantizes a vector

Suppose you need to compress a d-dimensional vector x (a gradient update, an embedding, a KV-cache entry) down to a few bits per coordinate. EDEN proceeds in four steps:

  1. Random rotation — Multiply by a random orthogonal matrix Π. After rotation, the coordinates are identically distributed and, for large d, approximately Gaussian.
  2. Scalar quantization — Round each rotated coordinate to one of 2^b levels from a Lloyd–Max codebook trained on the known rotated-coordinate distribution (b is the target number of bits per coordinate).
  3. Scale — Multiply by a scale factor S.
  4. Inverse rotation — Apply Πᵀ to recover an approximation x̂ of the original vector.
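The four steps can be sketched in a few lines of NumPy. This is only a minimal illustration at b = 1, where the Lloyd–Max codebook reduces to a sign quantizer; the dense QR-based rotation and the per-vector least-squares scale are simplifications of my own (the papers use fast structured rotations and closed-form scales), so treat it as a sketch of the structure, not the papers' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Haar-random orthogonal matrix via QR; DRIVE/EDEN use fast structured
    # rotations (e.g., randomized Hadamard) to avoid the O(d^2) cost.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def eden_1bit(x, unbiased=False):
    d = x.size
    P = random_rotation(d)
    z = P @ x                       # 1. random rotation
    q = np.sign(z)                  # 2. 1-bit quantization: the Lloyd-Max
                                    #    codebook at b=1 is just the sign
    if unbiased:
        S = (z @ z) / (z @ q)       # 3. bias-correcting scale
    else:
        S = (z @ q) / d             # 3. least-squares (MSE-reducing) scale
    return S * (P.T @ q)            # 4. inverse rotation

x = rng.standard_normal(256)
vnmse = np.sum((x - eden_1bit(x)) ** 2) / np.sum(x ** 2)
```

In this toy version, the MSE-style scale gives a 1-bit vNMSE concentrating near 1 − 2/π ≈ 0.36 in high dimension, while the bias-correcting scale trades some MSE (π/2 − 1 ≈ 0.57) for an estimate that is correct on average.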

While earlier works (e.g., Suresh et al. (2017) [6]) used rotation primarily to shrink the coordinates’ dynamic range (the gap between the largest and smallest coordinate values), EDEN [1] was, to the best of our knowledge, the first quantization scheme to exploit a stronger fact about random rotation: the post-rotation coordinates follow a known distribution, which lets us use a deterministic quantizer paired with a closed-form scale that, depending on the application, either minimizes the MSE or makes the estimate unbiased. Both scales are derived analytically, and the construction yields an asymptotic MSE reduction over the earlier approach.

Concretely, EDEN’s two variants differ only in the choice of S:

  • EDEN-biased — sets S to the closed-form value that minimizes the reconstruction MSE.
  • EDEN-unbiased — chooses S so the decompressed output is correct on average (𝔼[x̂] = x), which matters particularly whenever you average many quantized vectors (e.g., distributed training, attention).
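A toy Monte Carlo check makes the difference concrete. The setup below is my own illustration (1-bit sign quantizer with a dense random rotation, not the papers' code): averaging many independently rotated-and-quantized copies of the same vector drives the unbiased estimate toward x, while the MSE-style scale leaves a systematic shrinkage that averaging cannot remove.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_1bit(x, unbiased):
    # One EDEN-style pass: rotate, keep signs, rescale, rotate back.
    d = x.size
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    P = q * np.sign(np.diag(r))     # Haar-random rotation
    z = P @ x
    s = np.sign(z)
    S = (z @ z) / (z @ s) if unbiased else (z @ s) / d
    return S * (P.T @ s)

d, trials = 64, 300
x = rng.standard_normal(d)
mean_unb = np.mean([quantize_1bit(x, True) for _ in range(trials)], axis=0)
mean_bia = np.mean([quantize_1bit(x, False) for _ in range(trials)], axis=0)

# Relative error of the *averaged* estimates
err_unb = np.linalg.norm(mean_unb - x) / np.linalg.norm(x)
err_bia = np.linalg.norm(mean_bia - x) / np.linalg.norm(x)
```

In this sketch, err_unb shrinks like 1/√trials, while err_bia plateaus because the averaged biased estimate concentrates around a shrunken copy of x (roughly (2/π)·x for the 1-bit sign quantizer).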

Lined up against EDEN, TurboQuant-mse matches at every step except one: where EDEN derives the scale S analytically, TurboQuant-mse, although it targets MSE minimization, skips the optimized scaling.

The pseudocode below shows the three side by side.

Figure 1: EDEN’s pseudocode instantiated for EDEN-biased, EDEN-unbiased, and TurboQuant-mse. The three are identical except at step 5: the choice of S. Image by author [5].

Why the optimal scale is worth it

The value of applying the correct scale S grows with bit-width. At b = 1 bit, the gap is marginal. At d = 128 and b = 4 bits, EDEN-biased reduces MSE by 2.25% over TurboQuant-mse, and these are the bit-widths practitioners actually use for embeddings and KV caches.

Across dimensions 16 to 4096 and all tested bit-widths b ∈ {1, 2, 3, 4}, EDEN-biased’s vNMSE (vector-normalized MSE, 𝔼[‖x − x̂‖²]/‖x‖²) falls below TurboQuant-mse’s in every case (Figure 2). As the dimension grows very large, the optimal S approaches 1 and the two algorithms converge, but at practical dimensions (128–1024), the gap persists.
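The flavor of this comparison can be reproduced without the full pipeline. The sketch below (my own illustration, not the paper's evaluation code) fits an approximate Lloyd–Max codebook for a standard Gaussian with Lloyd's algorithm, quantizes a vector of "rotated" coordinates drawn directly as Gaussians, and compares a fixed S = 1 against a per-vector least-squares scale; the scaled version can never do worse, since S = 1 is one of its candidates.

```python
import numpy as np

rng = np.random.default_rng(2)

def lloyd_max_codebook(bits, n_train=100_000, iters=40):
    # Lloyd's algorithm on Gaussian samples: alternate nearest-level
    # assignment and centroid updates until the levels settle.
    samples = rng.standard_normal(n_train)
    levels = np.quantile(samples, (np.arange(2 ** bits) + 0.5) / 2 ** bits)
    for _ in range(iters):
        edges = (levels[1:] + levels[:-1]) / 2
        idx = np.searchsorted(edges, samples)
        for k in range(levels.size):
            cell = samples[idx == k]
            if cell.size:
                levels[k] = cell.mean()
    return levels

levels = lloyd_max_codebook(bits=4)
z = rng.standard_normal(128)            # stand-in for the rotated coordinates
q = levels[np.argmin(np.abs(z[:, None] - levels[None, :]), axis=1)]

mse_fixed = np.mean((z - q) ** 2)       # TurboQuant-mse style: S fixed at 1
S = (z @ q) / (q @ q)                   # least-squares optimal scale
mse_scaled = np.mean((z - S * q) ** 2)  # EDEN-biased style
```

The gap in this toy setting is small in absolute terms, matching the low-single-digit-percent improvements reported above, but it is always nonnegative by construction.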

Figure 2: vNMSE vs. dimension comparing EDEN-biased and TurboQuant-mse across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-biased (which optimizes the scale factor S) achieves lower error than TurboQuant-mse (which fixes S = 1) at every tested dimension. The curves converge at high dimension as the optimal S approaches 1. Image by author [5].

Unbiased compression: saving more than a full bit

The results above concern the biased (MSE-minimizing) variants. Now consider the unbiased case, where applications such as distributed training, approximate attention, or inner-product retrieval need 𝔼[x̂] = x because they average many quantized vectors.

EDEN-unbiased uses the same single-pass algorithm as EDEN-biased, just with S chosen for bias correction. TurboQuant’s unbiased variant, TurboQuant-prod, takes a different route: it spends (b − 1) bits on the biased TurboQuant-mse step and reserves 1 bit for a QJL (Quantized Johnson–Lindenstrauss) [4] correction on the residual (QJL is similar to EDEN at b = 1, but with higher variance).

EDEN-unbiased outperforms TurboQuant-prod in every tested configuration, and by a substantial margin. The gap traces to three structural advantages of EDEN’s single-pass design:

  1. EDEN optimizes the scale. TurboQuant-prod inherits TurboQuant-mse’s S = 1 first stage, so it carries the same MSE penalty.
  2. EDEN’s 1-bit construction has lower variance than QJL. In large dimensions, EDEN’s 1-bit vNMSE converges to π/2 − 1 ≈ 0.57 [1], whereas QJL’s converges to π/2 ≈ 1.57 [4], roughly 2.75× higher.
  3. EDEN spends the full bit budget on a single unbiased quantizer. TurboQuant-prod splits the budget into (b − 1) biased bits plus 1 residual bit, which empirically underperforms spending all b bits on a single unbiased quantizer [5].
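The constant in point 2 is easy to sanity-check numerically. Since the post-rotation coordinates are approximately i.i.d. Gaussian, the sketch below (my own check, sampling the rotated domain directly rather than rotating an explicit vector) measures the 1-bit unbiased vNMSE and compares the two asymptotic constants.

```python
import numpy as np

rng = np.random.default_rng(3)

# Post-rotation coordinates are approximately i.i.d. Gaussian,
# so sample them directly instead of applying a rotation.
d = 16384
z = rng.standard_normal(d)
s = np.sign(z)                      # EDEN-style 1-bit quantizer
S = (z @ z) / (z @ s)               # bias-correcting scale
vnmse = np.sum((z - S * s) ** 2) / np.sum(z ** 2)

eden_limit = np.pi / 2 - 1          # ~0.571, EDEN's 1-bit limit [1]
qjl_limit = np.pi / 2               # ~1.571, QJL's limit [4]
ratio = qjl_limit / eden_limit      # ~2.75x
```

At d = 16384 the measured vNMSE lands within a few percent of π/2 − 1, and the ratio of the two limits is the 2.75× figure quoted above.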

These effects compound. The consequence: 1-bit, 2-bit, and 3-bit EDEN-unbiased are each more accurate than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By swapping in EDEN you can drop a bit per coordinate and still match TurboQuant-prod’s accuracy.

Figure 3: vNMSE vs. dimension comparing EDEN-unbiased and TurboQuant-prod across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-unbiased achieves lower error at every dimension. The gap is large enough that EDEN with b bits often outperforms TurboQuant-prod with b + 1 bits. Image by author [5].

On TurboQuant’s own benchmarks

The same picture holds on the standard ANN benchmarks TurboQuant evaluates on, Stanford’s GloVe pre-trained word vectors (Open Data Commons Public Domain Dedication and License v1.0) and Qdrant’s dbpedia-entities-openai3-text-embedding-3-large embeddings (Apache 2.0), using TurboQuant’s published evaluation code:

EDEN-biased achieves lower MSE than TurboQuant-mse, EDEN-unbiased achieves markedly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

Figure 4: Nearest-neighbor recall on GloVe and OpenAI3 embeddings at 2 and 4 bits per coordinate. EDEN-unbiased outperforms TurboQuant-prod across all four settings. Image by author [5].

Takeaway: use EDEN; optimal scaling matters

EDEN’s scale connects the known post-rotation distribution to an analytically optimal quantizer. TurboQuant-mse keeps EDEN’s rotation and codebook but pins S = 1, which is what makes it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top of that, whereas EDEN-unbiased gets the same property, with better accuracy, by simply choosing a bias-correcting scale.

  • For MSE-targeted compression (model weight quantization, nearest-neighbor search, KV caches): EDEN-biased computes the optimal scale S and consistently beats TurboQuant-mse (which is EDEN with S = 1 fixed).
  • For unbiased estimation (distributed mean estimation, approximate attention, inner-product retrieval): EDEN-unbiased significantly outperforms TurboQuant-prod’s bit-splitting strategy, by margins worth more than a full bit per coordinate.

EDEN was originally developed for distributed mean estimation in federated and distributed training. Subsequent work has, for example, applied it to embedding compression for document re-ranking (SDR, 2022 [8]), adapted it for NVFP4 LLM training (MS-EDEN in Quartet II, 2026 [10]), and generalized it to vector quantization for data-free LLM weight compression (HIGGS, 2025 [9]), which was then used for KV-cache compression (AQUA-KV, 2025 [11]).

EDEN implementations are available in PyTorch and TensorFlow, in Intel’s OpenFL [7], and, for its 1-bit variant, in Google’s FedJax, TensorFlow Federated, and TensorFlow Model Optimization.

For the full technical comparison with TurboQuant (all figures, detailed experimental methodology), see our note [5].

For the derivations, proofs, and further extensions, see the original papers [1] [2].

References

  1. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
  2. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
  3. A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
  4. A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
  5. R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work (2026), arXiv:2604.18555.
  6. A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
  7. VMware Open Source Blog, VMware Research Group’s EDEN Becomes Part of OpenFL (November 2022).
  8. N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
  9. V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
  10. A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation (2026), arXiv:2601.22813.
  11. A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.

© 2024 Newsaiworld.com. All rights reserved.
