A plan for getting involved in an open source project
I have a little experience with open source, but I’m currently neither associated with nor active in any open source project. I want to be, and I have a plan for it. Based on my experience, I believe it’s a good plan, so other people who want to get involved but struggle might find it helpful. Keep in mind this is my plan and may not suit everyone.
Precaution
While many guides paint a rosy picture, keep in mind two key points:
- Use libraries
  - CUTLASS
  - cuBLAS
- Implementing your own
  - naive way
  - tiled matmul
  - you can also decide to use shared memory or not
- I can dive into a codebase first with little prior knowledge. ChatGPT isn’t very good at generating code yet, but it is good at explaining concepts like “Vector” in MLIR.
- It enables semantic search rather than keyword-matching queries. For example, I can ask “what is that doing …”; this kind of query yields bad answers with Google search, mo...
- A is M by K sized matrix
- B is K by N sized matrix
- C = A @ B, is M by N sized matrix
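With this notation, the naive and tiled approaches can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions, not CUDA code: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, and matrices are plain lists of lists. On a GPU, each tile would typically be staged in shared memory, which this sketch does not model.

```python
def matmul_naive(A, B):
    """C = A @ B, where A is M x K and B is K x N (lists of lists)."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    assert len(B) == K, "inner dimensions must match"
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_tiled(A, B, tile=2):
    """Same result as matmul_naive, but the loops walk tile x tile blocks.

    `tile` is an illustrative parameter; edge tiles are clipped with min().
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    assert len(B) == K, "inner dimensions must match"
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile):          # tile row of C
        for j0 in range(0, N, tile):      # tile column of C
            for k0 in range(0, K, tile):  # slice of the shared K dimension
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6]]        # M = 2, K = 3
B = [[7, 8], [9, 10], [11, 12]]   # K = 3, N = 2
C = matmul_naive(A, B)            # M x N = 2 x 2
C_tiled = matmul_tiled(A, B, tile=2)
```

Both functions compute the same M by N result; the tiled version only changes the order in which elements are visited, which is the property that lets a GPU kernel reuse a loaded tile many times.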
CUDA Matmul with wmma
Matmul
There are several ways to do matmul in CUDA.
That’s not all: yet another way to implement matmul is to use wmma. This accro...
Learning with chatGPT
Recently I’ve been learning things with ChatGPT. I’ve found it extremely useful for learning something new.
Some Pros:
Implementing FlashAttention V1 naively
Warning
This is not a comprehensive tutorial. It’s more of a note to myself recording the decisions I made while implementing a naive FlashAttention V1, so sadly it also exposes the limits of my skills.
I posted an introductory post about CUDA a year ago, but I haven’t been using CUDA actively since writing it. It would be great if I continue to develop Parallel Computing sinc...
Tiled Matmul 101
I’m extremely poor at thinking about matrices. I’ve seen many people think graphically and draw matrix strides, multiplications, and so on. Yet as a person who works in ML, I think I should understand matmul, at least at a high level. This post is about my struggle to understand matmul in an HPC environment.
For convenience, I use the following notation:
Flash Attention Idea
This post is basically a commentary on, or a more introductory version of, this gist. I added some notes that helped me understand the article, though they may introduce more verbosity and confusion. Thanks to Kunwar Grover for making a great tutorial on FlashAttention and mlir.linalg.
The goal of this post is to provide conceptual understanding of FlashAttention for peop...