CUDA Matmul with wmma

Matmul

There are several ways to do matmul in CUDA.

Use a library
- CUTLASS
- cuBLAS
Implement your own
- the naive way
- tiled matmul
- in either case, you can decide whether or not to use shared memory
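
To make the "naive way" concrete, here is a minimal CPU-side sketch in C++, not a CUDA kernel. It assumes row-major storage, and the function name `matmul_naive` is made up for illustration. Each (row, col) output element is computed independently, which is the unit of work a naive CUDA kernel would typically hand to one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive matmul on row-major buffers: C (M x N) = A (M x K) * B (K x N).
// Hypothetical helper for illustration only.
std::vector<float> matmul_naive(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t row = 0; row < M; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            // One output element: dot product of a row of A and a column of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[row * K + k] * B[k * N + col];
            }
            C[row * N + col] = acc;
        }
    }
    return C;
}
```

In a CUDA version, the two outer loops would usually be replaced by the thread and block indices, while the inner loop over k stays inside each thread.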

That's not all: yet another way to implement matmul is to use wmma. This accro...


Learning with ChatGPT

Recently I've been learning things with ChatGPT. I've found it extremely useful for learning something new.

Some Pros:

I can dive into a codebase first, with little prior knowledge. ChatGPT isn’t very good at generating code yet, but it is good at explaining concepts like “Vector” in MLIR.
It enables semantic search rather than keyword matching. For example, I can ask “what is that doing …”; this kind of query yields bad answers with Google search, mo...


Implementing FlashAttention V1 naively

Warning

This is not a comprehensive tutorial. It’s more of a note to myself, recording the decisions I made while implementing a naive FlashAttention V1. So, sadly, it also reveals the limits of my skills.

I already posted an introductory post about CUDA a year ago. I haven’t been using CUDA actively since writing that post. It would be great if I continue to develop Parallel Computing sinc...


Tiled Matmul 101

I’m extremely poor at thinking about matrices. I’ve seen many people think graphically and draw matrix strides, multiplications, and so on. Yet as a person who works in ML, I think I should understand matmul, at least at a conceptual level. This post is about my struggle to understand matmul in an HPC environment.

For convenience, I use the following notation:

A is an M by K matrix
B is a K by N matrix
C = A @ B is an M by N matrix
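
With this notation, tiling can be sketched on the CPU as below. This is a minimal illustration, not the post's implementation. It assumes row-major storage, and the name `matmul_tiled` and the `tile` parameter are hypothetical. The three outer loops walk over tiles of C and over the shared K dimension; the inner loops accumulate one tile-sized partial product at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tiled matmul on row-major buffers: C (M x N) = A (M x K) * B (K x N).
// Hypothetical helper; `tile` is the edge length of a square tile.
std::vector<float> matmul_tiled(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N,
                                std::size_t tile = 2) {
    assert(tile > 0);
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += tile) {
        for (std::size_t j0 = 0; j0 < N; j0 += tile) {
            for (std::size_t k0 = 0; k0 < K; k0 += tile) {
                // Clamp tile bounds so sizes need not be multiples of `tile`.
                const std::size_t i1 = std::min(i0 + tile, M);
                const std::size_t j1 = std::min(j0 + tile, N);
                const std::size_t k1 = std::min(k0 + tile, K);
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t j = j0; j < j1; ++j) {
                        // Add this K-tile's contribution to C[i][j].
                        float acc = C[i * N + j];
                        for (std::size_t k = k0; k < k1; ++k) {
                            acc += A[i * K + k] * B[k * N + j];
                        }
                        C[i * N + j] = acc;
                    }
                }
            }
        }
    }
    return C;
}
```

The result is the same as an untiled triple loop; only the order of the work changes, so that a small block of A and B is reused while it is still close to the processor. On a GPU, that small block is what would typically be staged in shared memory.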

mlsys
Jan 27, 2025

Flash Attention Idea

This post is basically a commentary on, or a more introductory version of, this gist. I added some material that helped me understand the article, though it may also introduce more verbosity and confusion. Thanks to Kunwar Grover for making a great tutorial on FlashAttention and mlir.linalg.

The goal of this post is to provide a conceptual understanding of FlashAttention for peop...


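Diary of September 16

What does it mean to live well? I want to jot down briefly something I have always thought about, so that I don't forget it.

It is not about striving to achieve, but about maintaining.

Being healthy in body and mind, keeping good relationships with the people around me, having enough income to get by, and having work, friends, and hobbies to spend time on, sometimes diligently and sometimes joyfully: these are goals that are not hard to reach temporarily.

But maintaining this state is hard. As someone with bipolar disorder, I inevitably have days when I become lethargic. When the hours I can't fall asleep grow longer, I get tired during the day. When I'm tired during the day, I get irritated easily, and my relationships with the people around me tend to go wrong. It becomes hard to keep up my pace at work, and even enjoyable things stop feeling enjoyable. I have to prevent this situation as much as I can and, when it comes, escape it as quickly as possible.

And these things, while I have them...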

