CS 5803 (Natural Language Processing)
Aug-Nov 2025 | Department of Computer Science
This graduate-level course discusses recent advances in NLP. It covers different aspects of Large Language Models: architectural and design challenges, as well as key considerations for training such models.
| Component | Weight |
|---|---|
| In-class quizzes (3-4) | 10-20% |
| Project | 40-60% |
| Semester Exam | 30-50% |
| Serial No. | Topics | Reading |
|---|---|---|
| 1 | Transformer architecture; positional embeddings: absolute, relative, rotary (see the RoPE sketch below) | Shared during class |
| 2 | Attention Mechanisms: MHA, MQA, GQA, MHLA, FlashAttention (see the GQA sketch below) | Shared during class |
| 3 | Pre-tokenization and Tokenization: BPE, WordPiece, SentencePiece, Unigram LM, Pruning, ByT5 (see the BPE sketch below) | Shared during class |
| 4 | Multilinguality: Multilingual Pre-Training, Data Sampling (see the temperature-sampling sketch below) | Shared during class |
| 5 | Mechanistic Interpretability: LogitLens, PatchScope, Circuits | Shared during class |
| 6 | Vision Language Models: CLIP, LLaVA, InstructBLIP | Shared during class |
| 7 | State Space Models - S4 Architecture, Mamba | Shared during class |
| 8 | Choices during model building: Data Mixing, Mixture of Experts (see the routing sketch below) | Shared during class |
| 9 | Scaling Laws: Kaplan, Chinchilla (see the compute-optimal sketch below) | Shared during class |
| 10 | Agent Foundation Models | Shared during class |
A detailed day-wise schedule is not provided; depending on coverage, discussion of some topics may span multiple lectures.
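To make a few of the topics concrete, some illustrative sketches follow; these are minimal teaching aids, not course-released code. First, for topic 1, a NumPy sketch of rotary positional embeddings (RoPE) in the rotate-half pairing convention; the function name and the base of 10000 are assumptions following the original RoPE formulation.

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per dimension pair: theta_i = base**(-i/half).
    freqs = base ** (-np.arange(half) / half)       # (half,)
    angles = np.outer(np.arange(seq_len), freqs)    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    # Rotate-half pairing: dimension i is rotated together with i + half.
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

x = np.random.default_rng(0).normal(size=(4, 8))  # 4 positions, dim 8
print(rotary_embed(x).shape)                      # (4, 8)
```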
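For topic 2, a toy contrast between standard scaled dot-product attention and grouped-query attention (GQA), where each key/value head serves a group of query heads; multi-query attention (MQA) is the single-KV-head special case. All shapes, head counts, and names here are illustrative assumptions.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention; q, k, v: (heads, seq, head_dim)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def gqa(q, k, v, groups: int):
    """q: (q_heads, seq, d); k, v: (kv_heads, seq, d); q_heads = kv_heads * groups."""
    # Broadcast each KV head to its group of query heads; the KV cache
    # stays only kv_heads wide, which is the memory saving GQA targets.
    return attention(q, np.repeat(k, groups, axis=0), np.repeat(v, groups, axis=0))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16, 32))     # 8 query heads
k = rng.normal(size=(2, 16, 32))     # 2 shared KV heads -> groups of 4
v = rng.normal(size=(2, 16, 32))
print(gqa(q, k, v, groups=4).shape)  # (8, 16, 32)
```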
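For topic 3, a toy BPE trainer over a whitespace-pre-tokenized corpus; the function name and corpus are made up, and production tokenizers (e.g. SentencePiece) add normalization, byte fallback, and much faster pair counting.

```python
from collections import Counter

def bpe_train(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    # Represent each word as a tuple of symbols, weighted by frequency.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the corpus.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with its merged symbol.
        merged = best[0] + best[1]
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

# Example: merges learned from a tiny toy corpus.
print(bpe_train(["low", "lower", "lowest", "low"], num_merges=3))
```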
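For topic 4, the temperature-based language sampling rule commonly used in multilingual pre-training (e.g. exponents around 0.7 for mBERT and 0.3 for XLM-R): raising each language's corpus share to a power alpha < 1 before renormalizing upweights low-resource languages. The function name and counts are illustrative.

```python
def language_sampling_probs(token_counts: dict[str, float], alpha: float = 0.3) -> dict[str, float]:
    """Rescale corpus shares: p_i = q_i**alpha / sum_j q_j**alpha."""
    total = sum(token_counts.values())
    shares = {lang: n / total for lang, n in token_counts.items()}
    weights = {lang: q ** alpha for lang, q in shares.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# Example: English dominates the raw counts but is downweighted after rescaling.
print(language_sampling_probs({"en": 1_000_000, "hi": 50_000, "sw": 5_000}))
```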
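For topic 8, a toy top-k Mixture-of-Experts router: each token is sent to its k highest-scoring experts, and their outputs are combined with softmax weights over the selected gate logits. The per-token loop, expert parameterization, and names are illustrative assumptions; real MoE layers batch this and add load-balancing losses.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """x: (tokens, d); gate_w: (d, n_experts); experts: list of callables d -> d."""
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                             # softmax over selected experts only
        out[t] = sum(wi * experts[e](x[t]) for wi, e in zip(w, sel))
    return out

rng = np.random.default_rng(0)
d, n = 16, 4
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n)]
experts = [lambda h, W=W: np.tanh(h @ W) for W in Ws]  # tiny stand-in experts
x = rng.normal(size=(8, d))
gate_w = rng.normal(size=(d, n))
print(moe_forward(x, gate_w, experts).shape)  # (8, 16)
```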
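For topic 9, the Chinchilla parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta and its closed-form compute-optimal split of a FLOP budget under the approximation C = 6ND. The constants are the fits reported by Hoffmann et al. (2022) and should be read as illustrative.

```python
# Published Chinchilla fits (Hoffmann et al., 2022), used here illustratively.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N: float, D: float) -> float:
    """Predicted pre-training loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C: float) -> tuple[float, float]:
    """Minimize loss subject to 6*N*D = C; closed form from setting dL/dN = 0."""
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N = G * (C / 6) ** (beta / (alpha + beta))
    D = C / (6 * N)
    return N, D

# Example: a 1e21-FLOP budget lands near 1.8B parameters and 90B tokens.
N, D = compute_optimal(1e21)
print(f"N = {N:.3e} params, D = {D:.3e} tokens, loss = {loss(N, D):.3f}")
```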