LLM from scratch

We just released a PyTorch boilerplate for coding an LLM from scratch. It is more than a plain transformer implementation: we also include some cool stuff, such as:
- RoPE (Rotary Position Embedding)
- GQA (Grouped-query attention)
- Fast Feedforward Networks
- Liger Kernel
- SOAP optimizer
GitHub repo: https://github.com/kreasof-ai/LLM-from-scratch
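
For anyone new to the first two items on the list, here is a minimal sketch of RoPE and grouped-query attention in plain PyTorch. It is an illustration only and is not taken from the repo: the function names (`rope_angles`, `apply_rope`, `gqa_attention`), the tensor shapes, and the base of 10000 are all assumptions made for this example.

```python
import math
import torch


def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2)."""
    # One frequency per channel pair; base=10000 is an assumed default.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return angles.cos(), angles.sin()


def apply_rope(x, cos, sin):
    """Rotate consecutive channel pairs of x, shaped (batch, heads, seq, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    # Interleave the rotated pairs back into the original channel order.
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)


def gqa_attention(q, k, v):
    """Causal attention where several query heads share one key/value head.

    q: (batch, n_q_heads, seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), n_q_heads divisible by n_kv_heads
    """
    group = q.shape[1] // k.shape[1]
    # Repeat each KV head so that it lines up with its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    seq_len = q.shape[-2]
    # Mask out future positions (strictly above the diagonal).
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


torch.manual_seed(0)
B, T, D = 2, 5, 8
n_q_heads, n_kv_heads = 4, 2
q = torch.randn(B, n_q_heads, T, D)
k = torch.randn(B, n_kv_heads, T, D)
v = torch.randn(B, n_kv_heads, T, D)

cos, sin = rope_angles(T, D)
out = gqa_attention(apply_rope(q, cos, sin), apply_rope(k, cos, sin), v)
print(out.shape)  # torch.Size([2, 4, 5, 8])
```

With 4 query heads and 2 KV heads, each KV head serves two query heads, so the KV cache is half the size it would be with standard multi-head attention. RoPE is applied to queries and keys only, never to values.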