Modernize NLP course with updated syllabus, slides, and assignments #1
Conversation
Major Updates:
=============

📚 Syllabus Enhancements:
- Added 40+ primary source research papers from top venues
- Integrated cognitive science and linguistics papers (Fedorenko, Schrimpf, etc.)
- Added comprehensive HuggingFace course chapter references throughout
- Updated all weeks with modern 2023-2025 papers (Self-RAG, Mixtral, Llama 3, etc.)
- Enhanced distributional semantics, neuroscience, and dialogue sections

📊 Complete Beamer Slide Decks (7 weeks):
- Week 1: Introduction, String Manipulation, ELIZA
- Week 2: Computational Linguistics, Tokenization, POS Tagging
- Weeks 3-4: Text Embeddings, LSA, LDA, Word2Vec, Modern Topic Modeling
- Weeks 5-6: Transformers, Attention, BERT, Neuroscience Perspectives
- Week 7: Models of Conversation, Pragmatics, Dialogue, Common Ground
- Week 8: GPT Evolution (GPT-1 to GPT-4), LLMs and the Brain
- Week 9: RAG, Mixture of Experts, Self-Supervised Learning, Future Directions

All slides include:
- Engaging emojis and modern design (Metropolis theme)
- Code examples with syntax highlighting
- TikZ diagrams and visualizations
- Discussion questions for undergraduates
- References to primary sources and HuggingFace resources
- Interactive learning elements

🤖 GitHub Actions Automation:
- Automatic LaTeX compilation on push
- PDF generation for all slides
- Beautiful web interface with gradient design
- GitHub Pages deployment
- One-click access to all course materials

📝 Comprehensive Assignment Updates:
- Assignment 4: Context-Aware Customer Service Chatbot
  - RAG-based implementation with semantic search
  - BERT/sentence-transformers + FAISS
  - Multi-turn conversation handling
  - Comprehensive evaluation metrics
- Assignment 5: Build and Train GPT Model
  - Implement a transformer from scratch
  - Multiple implementation paths (scratch/nanoGPT/HF)
  - Training on custom datasets
  - Attention visualization and analysis
  - Comparison with GPT-2
- Final Project: Capstone Research Project
  - 22 diverse project ideas across 8 categories
  - Clear timeline with milestones
  - Detailed grading rubric
  - Team-based (2-3 students)
  - Leverages GenAI for ambitious scope

All assignments designed for sophisticated undergraduate work with GenAI coding assistance.

Technical Details:
- All slides compilable with pdflatex/xelatex
- Automated CI/CD pipeline for slides
- Comprehensive documentation and resources
- Modern tools: HuggingFace, PyTorch, Colab-compatible
- Cutting-edge topics: RAG, MoE, multimodal learning, agents
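The Assignment 4 retrieval step (sentence-transformers + FAISS semantic search) can be sketched in miniature. This is a hedged illustration, not the assignment's actual code: the `embed` function is a hypothetical bag-of-words stand-in for a BERT encoder, the documents are made up, and plain cosine similarity stands in for a FAISS index.

```python
import math
import re

# Hypothetical stand-in for a sentence-transformers encoder: counts over a
# tiny fixed vocabulary. The real assignment uses dense BERT embeddings.
VOCAB = ["refund", "shipping", "password", "order", "reset"]

def embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (the FAISS role)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "To reset your password, click the reset link in your email.",
    "Refund requests are processed within 5 business days.",
    "Shipping takes 3-7 days depending on your region.",
]
print(retrieve("how do I reset my password", docs))
```

In the full RAG pipeline, the retrieved passage would then be prepended to the chatbot's prompt so the generated answer is grounded in it.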
Assignment Updates:
==================

📝 Assignment 1: ELIZA (113 → 520 lines)
- Added historical context and ELIZA effect psychology
- New Part 2: Analysis and Exploration (conversation testing, pattern analysis)
- New Part 3: Reflection on conversation, understanding, and ethics
- Optional extensions: advanced pattern matching, emotion tracking, hybrid systems
- Comprehensive grading rubric and resources
- Emphasis on critical thinking about AI

📊 Assignment 2: SPAM Classifier (145 → 589 lines)
- Expanded to require multiple methods: traditional ML + neural + ensemble
- Comprehensive evaluation with 6+ metrics, not just AUC
- Systematic error analysis (20+ failure cases)
- NEW: Adversarial testing (create spam that evades detection)
- Real-world deployment considerations
- Statistical rigor requirements (cross-validation, significance tests)
- Performance benchmarks: AUC > 0.85 (minimum), > 0.96 (excellent)

📈 Assignment 3: Wikipedia Embeddings (95 → 856 lines)
- Expanded from 4 to 10+ embedding methods across 5 categories
  - Classical: LSA, LDA
  - Static: Word2Vec, GloVe, FastText
  - Contextualized: BERT, GPT-2
  - Modern: Sentence-BERT, Llama 3 8B
  - Topic Models: BERTopic, Top2Vec
- Sophisticated clustering with multiple algorithms
- Quantitative metrics (silhouette, Davies-Bouldin, coherence)
- Cognitive science connection (distributional hypothesis, human judgments)
- Advanced visualization (interactive Plotly, UMAP + t-SNE)
- Optional extensions: cross-lingual, temporal analysis, applications
- 40-60 hour multi-week project

All Assignments Now Include:
- Clear learning objectives
- Detailed grading rubrics (100 points + bonus)
- Tips for success and common pitfalls
- Comprehensive resources and references
- Submission guidelines and checklists
- Academic integrity policies
- FAQ sections
- Professional structure matching Assignments 4-5

Philosophy: All assignments now balance hands-on implementation with deep analytical thinking, leveraging GenAI for ambitious scope while ensuring genuine understanding.
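As a concrete example of the quantitative clustering metrics Assignment 3 asks for, the silhouette score can be computed by hand. The sketch below is a pure-Python stand-in for `sklearn.metrics.silhouette_score`, using made-up 2-D points in place of real document embeddings.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points: s(i) = (b - a) / max(a, b), where
    a = mean distance to points in the same cluster and b = mean distance
    to the nearest other cluster. Values near 1 mean tight, well-separated
    clusters; values near 0 or below mean overlapping clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated toy "embedding" clusters in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette_score(points, labels), 3))
```

Students comparing LSA, Word2Vec, and BERT clusterings would run the library version of this metric on each method's embeddings and report the scores side by side.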
Pull request overview
This PR modernizes an NLP course with comprehensive updates, including new Beamer slide decks covering seven weeks of content that span from introductory ELIZA chatbots through advanced topics such as Transformers, GPT evolution, retrieval-augmented generation (RAG), and Mixture of Experts. The materials include engaging emojis, code examples with syntax highlighting, TikZ diagrams, discussion questions for undergraduates, and references to primary sources and HuggingFace resources.
Key Changes:
- Complete slide decks for weeks 1, 5-6, 7, 8, and 9
- Modern design with Metropolis/Madrid themes and extensive visualizations
- Integration of cognitive neuroscience perspectives on language
- Comprehensive code examples in Python using HuggingFace transformers
- Assignment descriptions and pedagogical materials
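Since the weeks 5-6 decks center on attention mechanisms and transformers, the core computation can be sketched in a few lines of NumPy. This is an illustrative implementation of scaled dot-product attention with random toy matrices, not code taken from the course materials.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query vectors, d_k = 4
K = rng.normal(size=(5, 4))   # 5 key vectors
V = rng.normal(size=(5, 4))   # 5 value vectors
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)              # one d_k-dimensional output per query
```

The attention-weight matrix `w` is exactly what Assignment 5's visualization exercise plots as a heatmap.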
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| slides/week9/lecture.tex | Week 9 slides covering RAG, MoE, self-supervised learning, CLIP, and future directions with final project guidelines |
| slides/week8/lecture.tex | Week 8 slides on GPT evolution (GPT-1 through GPT-4), open-source LLMs, brain-language model convergence, and Turing test discussion |
| slides/week7/lecture.tex | Week 7 slides on conversation models, pragmatics, dialogue, common ground, and embodied language with ConvoKit demo |
| slides/week5-6/lecture.tex | Weeks 5-6 slides covering sequence-to-sequence models, attention mechanisms, transformers, BERT, and cognitive neuroscience perspectives |
| slides/week1/lecture.tex | Week 1 introduction slides covering consciousness, language vs. thought, pattern matching, ELIZA, and Assignment 1 |
| slides/week1/README.md | Documentation for week 1 slides with compilation instructions and presentation tips |
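Week 9's Mixture of Experts material can be illustrated with a minimal sparse-routing sketch. Everything here is hypothetical: the experts are simple linear maps standing in for feed-forward blocks, and top-k gating with renormalized softmax weights follows the general Mixtral-style recipe rather than any specific course code.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Sparse MoE layer: route input x to its top-k experts and combine
    their outputs, weighted by softmax over the selected gate logits."""
    logits = x @ gate_w                    # (n_experts,) gate scores
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    g = np.exp(logits[top] - logits[top].max())
    g /= g.sum()                           # renormalize over chosen experts
    return sum(w * experts[i](x) for w, i in zip(g, top))

rng = np.random.default_rng(1)
d, n_experts = 4, 8
# Hypothetical experts: random linear maps in place of trained FFN blocks.
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
gate_w = rng.normal(size=(d, n_experts))

x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w, k=2)
print(y.shape)
```

The design point the slides make is visible here: only k of the n_experts blocks run per input, so capacity grows with the expert count while per-token compute stays roughly constant.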
- Updated all assignments to 1-week timelines (except Assignment 3: 2 weeks; Final Project: 4 weeks)
- Added comprehensive day-by-day syllabus schedule with:
  * MWF 10:00-11:05 lecture times
  * X-hours (Tu 12:00-12:50) in the first 3 weeks
  * MLK Day and instructor absence noted
  * Topics, readings, slide links, and assignment deadlines for each class
- Updated slides README with detailed lecture-by-lecture breakdown
- All assignments now include daily schedules optimized for GenAI assistance
Major updates:

Schedule corrections:
- Update meeting times to MWF 10:10-11:15 (was 10:00-11:05)
- Update X-hour to Thursday 12:15-1:05 (was Tuesday 12:00-12:50)
- Update syllabus with corrected Thursday X-hour dates

Individual lecture slides (24 total):
- Split combined lecture files into individual decks for each lecture
- Week 1: Lectures 1-3 (Introduction, Pattern Matching, ELIZA)
- Week 2: Lectures 4-6 (Data Cleaning, Tokenization, POS/Sentiment)
- Week 3: Lectures 7-8 (Classic Embeddings, Word Embeddings)
- Week 4: Lectures 9-11 (Contextual Embeddings, Dimensionality Reduction, Cognitive Models)
- Week 5: Lectures 12-14 (Attention, Transformers, Training)
- Week 6: Lectures 15-17 (BERT Deep Dive, Variants, Applications)
- Week 7: Lectures 18-20 (GPT Architecture, Scaling, Implementation)
- Week 9: Lectures 21-23 (RAG, MoE, Ethics)
- Week 10: Lecture 24 (Final Project Work Session)
- All slides use the Metropolis theme with emojis, TikZ diagrams, and discussion questions

X-hour demo notebooks (3 total):
- Week 1: ELIZA implementation and debugging workshop
- Week 2: Text classification with multiple methods
- Week 3: Embeddings comparison (LSA, LDA, Word2Vec, visualization)
- All notebooks are Google Colab-ready with hands-on exercises

Additional materials:
- Week 10 presentation guidelines (comprehensive Markdown guide)
- Removed Week 8 lecture slides (no classes during instructor absence)

All materials ready for Winter 2026 term.
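The Week 1 X-hour ELIZA workshop centers on regex pattern matching with pronoun reflection. A minimal sketch of that technique follows; the rules and reflection table are hypothetical examples in the spirit of Weizenbaum's original, not the course's actual implementation.

```python
import re

# Hypothetical rules: each pattern captures part of the user's input
# and the template echoes it back as a question.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
]

# Flip first/second person in the captured text before echoing it back.
REFLECTIONS = {"my": "your", "your": "my", "i": "you", "me": "you", "am": "are"}

def reflect(text: str) -> str:
    return " ".join(REFLECTIONS.get(w, w) for w in text.lower().split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        m = re.match(pattern, utterance.lower().strip())
        if m:
            return template.format(reflect(m.group(1)))
    return "Please go on."  # default when no rule fires

print(respond("I am worried about my exam"))
```

A debugging exercise in this style might ask students to find inputs where naive reflection fails, which leads naturally into the reflection questions in Assignment 1.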
- Update classroom from TBD to Moore 302
- Correct course end date to March 9, 2026 (last day of classes)
- Update Week 10 schedule: presentations on March 9
- Final project materials due March 13 (final exam period)