SVDQuant and Nunchaku

Introduction

This is a very brief summary of SVDQuant; please refer to the original paper for the details. We start with plain quantization, which maps a tensor $X$ to low-bit integers:

\[Q_X = \operatorname{round}(X / s_X)\]

where $s_X = \max(\vert X \vert ) / q_{max}$ is the scale and $q_{max}$ is the largest value representable in the target format (for example, 7 for signed 4-bit integers).

\[Q(X) = \text{dequantization of }X = s_X\,Q_X\]

$XW$ can be approximated by

\[XW \approx Q(X)\,Q(W) = s_X\,s_W\,Q_X\,Q_W\]
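To make the formulas above concrete, here is a minimal NumPy sketch of symmetric per-tensor quantization and the resulting approximate matrix product. The helper names `quantize` and `dequantize`, the 4-bit default, and the random test matrices are my own assumptions for illustration; this is not Nunchaku's API.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 4):
    """Symmetric per-tensor quantization: returns integer codes Q_X and scale s_X."""
    q_max = 2 ** (bits - 1) - 1              # 7 for signed 4-bit (assumed format)
    scale = np.abs(x).max() / q_max          # s_X = max(|X|) / q_max
    codes = np.clip(np.round(x / scale), -q_max, q_max).astype(np.int32)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Q(X) = s_X * Q_X."""
    return scale * codes

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))             # toy activations
W = rng.standard_normal((16, 4))             # toy weights

Q_X, s_X = quantize(X)
Q_W, s_W = quantize(W)

# The scales are scalars, so they factor out of the integer matmul.
approx = s_X * s_W * (Q_X @ Q_W)
exact = X @ W
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error of the 4-bit product: {rel_err:.3f}")
```

The point of the factorization is that the expensive part, `Q_X @ Q_W`, runs entirely on small integers, and the floating-point scales are applied once at the end.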

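SVDQuant introduces two-path quantization: a 16-bit low-rank branch absorbs the part of the weights that is hard to quantize, and a 4-bit branch handles the residual.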

Let’s introduce a smoothing factor ${\lambda}$: \(\hat{X} = X\,\operatorname{diag}({\lambda})^{-1}\)

Then,

\[XW = \hat{X}\,\operatorname{diag}({\lambda})\,W = \hat{X}\,\hat{W},\quad \text{where } \hat{W} = \operatorname{diag}({\lambda})\,W\]
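A quick numerical check of the smoothing step, as a sketch only. I take $\hat{W} = \operatorname{diag}({\lambda})\,W$ so that the shapes match, and I pick ${\lambda}$ as the per-channel maximum of $\vert X \vert$; that choice is an assumption made for this example and not necessarily the rule SVDQuant uses.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
X[:, 3] *= 50.0                      # inject one outlier input channel
W = rng.standard_normal((16, 4))

# Assumed smoothing factor: per-channel max of |X| (illustrative choice only).
lam = np.abs(X).max(axis=0)

X_hat = X / lam                      # X diag(lam)^{-1}: scales columns of X
W_hat = lam[:, None] * W             # diag(lam) W:      scales rows of W

same_product = np.allclose(X @ W, X_hat @ W_hat)
print("product preserved:", same_product)
print("max |X|     :", np.abs(X).max())
print("max |X_hat| :", np.abs(X_hat).max())
```

The product is unchanged, but the outlier has moved from the activations into the weights, which is what makes the activations easier to quantize.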

Use SVD to write $\hat{W} = U\Sigma V$ and keep only the top $r$ singular components: \(\hat{W} = L_1\,L_2 + R,\quad \text{where } L_1 = U\Sigma_{:,:r},\; L_2 = V_{:r,:} \text{ and } R = \hat{W} - L_1\,L_2\)

Thus, \(XW = \hat{X}\,\hat{W} = \hat{X}\,L_1\,L_2 + \hat{X}\,R\)

  • $L_1$ and $L_2$ have low rank (32 in actual implementations) and are kept in 16-bit precision.
  • $\hat{X}$ and $R$ are quantized to 4 bits (W4A4: 4-bit weights and 4-bit activations) for the residual product $\hat{X}\,R$.
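The two paths can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the Nunchaku kernels: the low-rank path stays in floating point as a stand-in for 16-bit, the residual path uses a simulated per-tensor 4-bit quantizer (`quantize_dequantize` is a hypothetical helper), the rank is 8 instead of 32 to suit the toy sizes, and the weight matrix is synthetic with a few dominant singular directions.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Simulated symmetric per-tensor quantization (hypothetical helper)."""
    q_max = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / q_max
    return scale * np.clip(np.round(x / scale), -q_max, q_max)

rng = np.random.default_rng(0)
X_hat = rng.standard_normal((8, 64))                 # smoothed activations (toy)
# Synthetic smoothed weight: noise plus a strong rank-4 component.
W_hat = rng.standard_normal((64, 64)) \
    + 5.0 * rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))

rank = 8                                             # toy value; the post cites 32
U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
L1 = U[:, :rank] * S[:rank]                          # left factor with singular values
L2 = Vt[:rank, :]                                    # right factor
R = W_hat - L1 @ L2                                  # residual to be quantized

exact = X_hat @ W_hat
# Path 1: low-rank branch in high precision. Path 2: 4-bit residual branch.
two_path = X_hat @ L1 @ L2 + quantize_dequantize(X_hat) @ quantize_dequantize(R)
# Baseline: quantize everything to 4 bits with no low-rank branch.
naive = quantize_dequantize(X_hat) @ quantize_dequantize(W_hat)

err_two_path = np.linalg.norm(exact - two_path) / np.linalg.norm(exact)
err_naive = np.linalg.norm(exact - naive) / np.linalg.norm(exact)
print(f"naive W4A4 error: {err_naive:.4f}")
print(f"two-path error  : {err_two_path:.4f}")
```

Because the low-rank branch removes the largest singular components, the residual $R$ has a much smaller dynamic range than $\hat{W}$, so 4-bit quantization loses less on this toy input.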

SVDQuant is open-sourced as Nunchaku. I’m a maintainer of the project, responsible for Python engine-related tasks such as caching and adding new modules. In my last post, I mentioned my interest in contributing, and the chance has arrived.
