SVDQuant and Nunchaku

Introduction

This is a very brief summary of SVDQuant. Please refer to the original paper for details.

\[Q_X = \operatorname{round}(X / s_X)\]

where $s_X = \max(\vert X \vert) / q_{\max}$ and $q_{\max}$ is the largest value representable in the quantized format.

\[Q(X) = \text{dequantization of }X = s_X\,Q_X\]

$XW$ can be approximated by

\[XW \approx Q(X)\,Q(W) = s_X\,s_W\,Q_X\,Q_W\]
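
The formulas above can be tried out in a few lines of plain Python. This is only a minimal sketch of per-tensor symmetric quantization, not SVDQuant itself: the helper names (`quantize`, `dequantize`, `matmul`), the toy matrices, and the choice of $q_{\max} = 7$ (a signed 4-bit range) are all assumptions made for illustration.

```python
def quantize(X, q_max=7):
    """Return (Q_X, s_X) with s_X = max|X| / q_max and Q_X = round(X / s_X).

    q_max=7 assumes a signed 4-bit representation (illustrative choice).
    """
    max_abs = max(abs(v) for row in X for v in row)
    s = max_abs / q_max if max_abs > 0 else 1.0  # avoid dividing by zero
    Q = [[round(v / s) for v in row] for row in X]
    return Q, s


def dequantize(Q, s):
    """Q(X) = s_X * Q_X."""
    return [[s * q for q in row] for row in Q]


def matmul(A, B):
    """Plain matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Toy inputs, made up for this example.
X = [[0.5, -1.0], [2.0, 0.25]]
W = [[1.0, -0.5], [0.75, 3.0]]

Q_X, s_X = quantize(X)
Q_W, s_W = quantize(W)

exact = matmul(X, W)

# The matmul runs on small integers; the two scales are applied once at the end.
int_prod = matmul(Q_X, Q_W)
approx = [[s_X * s_W * v for v in row] for row in int_prod]

max_err = max(abs(e - a)
              for e_row, a_row in zip(exact, approx)
              for e, a in zip(e_row, a_row))
print("max abs error:", max_err)
```

Because the scales are scalars, they factor out of the product, so the expensive part is an integer matmul. The printed error shows how coarse a plain 4-bit approximation is on these toy values.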

SVDQuant introduces two-path quant...


A plan for getting involved in an open source project

I have a little experience with open source, but I'm currently neither associated with nor active in any open source project. I want to be, and I have a plan for it. Based on my experience, I believe it's a good plan, so other people who want to get involved but struggle to start might find it helpful. Keep in mind that this is my plan and may not suit everyone.

Precaution

While many guides paint a rosy picture, keep in mind two key points:

CUDA Matmul with wmma

Matmul

There are several ways to do matmul in CUDA.

  • Use a library
    • CUTLASS
    • cuBLAS
  • Implement your own
    • the naive way
    • tiled matmul
    • in either case, you can choose whether or not to use shared memory
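
To make the difference between the naive and tiled variants concrete, here is a small CPU-side sketch in plain Python. It is not CUDA code and only illustrates the loop structure. The function names and the `tile` parameter are hypothetical, chosen for this example. In a CUDA kernel, each output tile would typically be handled by one thread block, with the input tiles staged in shared memory.

```python
def matmul_naive(A, B):
    """C[i][j] = sum over p of A[i][p] * B[p][j], one output element at a time."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_tiled(A, B, tile=2):
    """Same result, but the work is split into tile x tile blocks.

    `tile` is an illustrative parameter; min() handles sizes that are
    not a multiple of the tile size.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile row of C
        for j0 in range(0, m, tile):        # tile column of C
            for p0 in range(0, k, tile):    # walk along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


# Small made-up inputs whose sizes are not multiples of the tile size.
A = [[float(i + j) for j in range(5)] for i in range(3)]
B = [[float(i * j + 1) for j in range(4)] for i in range(5)]
print(matmul_naive(A, B) == matmul_tiled(A, B, tile=2))
```

Tiling does not change the arithmetic, only the order in which the inputs are touched. A small block of `A` and `B` is reused many times before moving on, which is the property that makes shared memory worthwhile on a GPU.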

That's not all: yet another way to implement matmul is to use wmma. This accro...


Learning with ChatGPT

Recently I've been learning new things with ChatGPT, and I've found it extremely useful for that.

Some Pros:

  • I can dive into a codebase first with little prior knowledge. ChatGPT isn't very good at generating code yet, but it is good at explaining concepts like “Vector” in MLIR.
  • It enables semantic search rather than keyword matching. For example, I can ask “what is that doing …”; this kind of query yields bad answers with Google search, mo...