Introduction
This is a very brief summary of SVDQuant; please refer to the original paper for details.
\[Q_X = \operatorname{round}(X / s_X)\]
where $s_X = \max(\vert X \vert) / q_{max}$ and $q_{max}$ is the maximum value representable in the quantized format.
The dequantization of $X$ is
\[Q(X) = s_X\,Q_X\]
so $XW$ can be approximated by
\[XW \approx Q(X)\,Q(W) = s_X\,s_W\,Q_X\,Q_W\]
SVDQuant introduces two-path quantization.
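As a minimal sketch of the formulas above (assuming per-tensor symmetric quantization with $q_{max} = 7$, i.e. a signed 4-bit range; the helper names are mine for illustration):

```python
import numpy as np

def quantize(A, q_max=7):
    """Symmetric per-tensor quantization: s = max|A| / q_max."""
    s = np.abs(A).max() / q_max
    return np.round(A / s), s

def dequantize(Q, s):
    """Q(A) = s * Q_A."""
    return s * Q

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 4))
QX, sX = quantize(X)
QW, sW = quantize(W)
approx = sX * sW * (QX @ QW)  # XW ~ s_X s_W Q_X Q_W
```

Round-to-nearest guarantees each dequantized element is within $s_X / 2$ of the original, which is why the scale is tied to $\max(\vert X \vert)$: a single outlier inflates $s_X$ and hurts every other element.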
Let’s introduce a smoothing factor $\lambda$ (one scale per input channel): \(\hat{X} = X\,\operatorname{diag}(\lambda)^{-1}\)
Then,
\[XW = \hat{X}\,\operatorname{diag}(\lambda)\,W = \hat{X}\,\hat{W}, \quad \text{where } \hat{W} = \operatorname{diag}(\lambda)\,W\]
Use SVD to decompose $\hat{W}$ as
\(\hat{W} = L_1\,L_2 + R, \quad \text{where } L_1 = U\Sigma,\ L_2 = V, \text{ and } R \text{ is the residual}\)
Thus, \(XW = \hat{X}\,\hat{W} = \hat{X}\,L_1\,L_2 + \hat{X}\,R\)
- $L_1$ and $L_2$ are low-rank (rank 32 in the actual implementation) and kept in 16-bit precision.
- Quantize $\hat{X}$ and $R$ using W4A4 quantization (4-bit weights and 4-bit activations).
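Putting the steps together, here is a hedged NumPy sketch of the whole two-path computation. The rank, the choice of $\lambda$, and the simulated 4-bit quantizer are illustrative assumptions, not the actual Nunchaku kernels:

```python
import numpy as np

def quantize(A, q_max=7):
    """Simulated symmetric 4-bit quantization: s = max|A| / q_max."""
    s = np.abs(A).max() / q_max
    return np.round(A / s), s

def svdquant_forward(X, W, lam, rank=4):
    """Two-path forward: XW ~ (X_hat L1) L2 + dequant(Q(X_hat) Q(R))."""
    X_hat = X / lam                       # X diag(lam)^{-1}
    W_hat = lam[:, None] * W              # diag(lam) W, so X_hat W_hat = X W
    # Low-rank branch, kept in high precision: W_hat = L1 L2 + R.
    U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]
    L2 = Vt[:rank]
    R = W_hat - L1 @ L2
    # Quantized residual branch (W4A4 simulated on X_hat and R).
    QX, sX = quantize(X_hat)
    QR, sR = quantize(R)
    return X_hat @ L1 @ L2 + sX * sR * (QX @ QR)

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 16))
W = rng.standard_normal((16, 8))
lam = np.abs(X).max(axis=0) ** 0.5       # a simple per-channel smoothing choice (assumption)
out = svdquant_forward(X, W, lam)
```

The design intent is that the SVD branch absorbs the largest singular components of $\hat{W}$, so the residual $R$ has a smaller dynamic range and quantizes with less error than $\hat{W}$ itself.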
This is open-sourced in Nunchaku. I’m also a maintainer of the project, responsible for Python engine–related tasks such as caching and adding new modules. In my last post I mentioned my interest in contributing, and the chance has arrived.