Episode 1: Introduction — Bielik Architecture and Triton

Back to Series Overview | Next: Matmul - Heart of the Transformer

Overview

This is the opening episode of the Bielik Anatomy series. We walk through the architecture of Bielik 1.5B, a Polish language model with 1.6 billion parameters, and explain why Triton is our tool of choice for writing custom GPU kernels.

Topics Covered

What is Bielik - a 1.6B-parameter Polish LLM based on a Qwen-like architecture, created by the SpeakLeash community
Architecture walkthrough - embedding layer, 32 decoder layers, final RMSNorm, language model head
Grouped Query Attention (GQA) - 12 query heads sharing only 2 KV heads (6 query heads per KV head), cutting Q/K/V projection parameters by about 55% compared with full multi-head attention
SwiGLU activation - the gated feed-forward network used in modern LLMs
Why Triton over CUDA - Python-like syntax, automatic block management, compiler optimizations, cross-GPU portability
Triton vs PyTorch - RMSNorm as the example: one fused Triton kernel instead of the 3-4 separate kernels PyTorch launches, for a 2-3x speedup
Series roadmap - basic kernels, attention, FFN, text generation
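
The 55% figure in the GQA item can be checked with a little arithmetic. The sketch below counts the Q, K and V projection weights of one layer for full multi-head attention and for GQA. The head counts (12 and 2) come from the list above; `HIDDEN_SIZE` and `HEAD_DIM` are assumed values chosen for illustration, not figures stated in this episode. The ratio does not depend on them, because it reduces to (12 + 2·2) / (3·12). This is one reading that reproduces the quoted number, not necessarily the exact calculation used in the episode.

```python
# Head counts are taken from the text; the two dimensions below are
# assumptions for illustration only (the reduction ratio does not depend on them).
HIDDEN_SIZE = 1536   # assumption: model width
HEAD_DIM = 128       # assumption: per-head dimension
NUM_Q_HEADS = 12     # from the text
NUM_KV_HEADS = 2     # from the text


def qkv_params(num_q_heads: int, num_kv_heads: int) -> int:
    """Weights in the Q, K and V projections of one layer (biases ignored)."""
    q = HIDDEN_SIZE * num_q_heads * HEAD_DIM
    k = HIDDEN_SIZE * num_kv_heads * HEAD_DIM
    v = HIDDEN_SIZE * num_kv_heads * HEAD_DIM
    return q + k + v


mha = qkv_params(NUM_Q_HEADS, NUM_Q_HEADS)   # every query head has its own K/V head
gqa = qkv_params(NUM_Q_HEADS, NUM_KV_HEADS)  # 6 query heads share each K/V head

qkv_reduction = 1 - gqa / mha
# K and V activations cached per token shrink with the number of KV heads.
kv_cache_reduction = 1 - NUM_KV_HEADS / NUM_Q_HEADS

print(f"Q/K/V weights per layer: MHA={mha:,} GQA={gqa:,}")
print(f"Q/K/V weight reduction:  {qkv_reduction:.1%}")
print(f"KV cache reduction:      {kv_cache_reduction:.1%}")
```

The second number is the one that matters most at inference time: with 2 KV heads instead of 12, the cache of keys and values kept per generated token is six times smaller.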

Key Takeaways

Bielik uses modern architectural improvements (GQA, SwiGLU, RMSNorm) that we will implement one by one
Triton lets you write high-performance GPU kernels without the boilerplate of raw CUDA
The series is a learning journey - building every component from scratch
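
As a reference point for the kernels built later in the series, here is a minimal plain-Python sketch of two of the components named above, RMSNorm and the SwiGLU gate. It is not the series' Triton code. The function names and the default `eps` are assumptions, and the linear projections that surround SwiGLU in a real feed-forward block are left out. The separate passes inside `rms_norm` are the kind of steps that a fused GPU kernel would combine into one.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    # Each step below is its own pass over the data; a fused kernel
    # would perform all of them in a single pass.
    squares = [v * v for v in x]
    mean_sq = sum(squares) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Elementwise SwiGLU gate: silu(gate) * up (projections omitted)."""
    return [silu(g) * u for g, u in zip(gate, up)]


x = [3.0, 4.0]
y = rms_norm(x, [1.0, 1.0], eps=0.0)
# After normalization with unit weights, the mean of squares is 1.
print(sum(v * v for v in y) / len(y))

z = swiglu([0.0, 1.0], [2.0, 2.0])
print(z)
```

Small references like these are handy for testing: a custom kernel can be run on the same inputs and compared against them within a floating-point tolerance.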

References
