JHU Center for Language and Speech Processing


46 repositories

al-qasida
Public
Python
•0•1•0•0•Updated Dec 27, 2025
science-hierarchography
Public
Python
•3•4•0•0•Updated Dec 15, 2025
csci-601-771-self-supervised-models
Public
HTML
•0•0•0•0•Updated Dec 10, 2025
MedExpert
Public
Code for the "MedExpert: An Expert-Annotated Dataset for Medical Chatbot Evaluation" paper at Machine Learning for Health (ML4H) 2025.
Python
•
MIT License
•0•6•0•0•Updated Dec 3, 2025
RepliCan-C4C
Public
Python
•4•4•0•1•Updated Nov 27, 2025
icl-in-genomics-models
Public
Essential code for the paper *Genomic Next-Token Predictors are In-Context Learners*.
Python
•0•1•0•0•Updated Nov 16, 2025
BloomScrub
Public
Python
•0•1•0•0•Updated Nov 5, 2025
mmBERT
Public
A massively multilingual modern encoder language model
Python
•9•117•1•0•Updated Oct 13, 2025
Speech_NSF_NextGen
Public
NSF CCRI ENS Project: Next Generation Tools for Spoken Language Science & Technology
1•0•0•0•Updated Oct 1, 2025
challenging_the_judge
Public
Python
•0•2•0•0•Updated Sep 23, 2025
hell-or-high-water
Public
Code and data for the paper: "Hell or High Water: Evaluating Agentic Recovery from External Failures"
Python
•
MIT License
•0•6•0•0•Updated Aug 14, 2025
eval-the-eval-readability
Public
Jupyter Notebook
•0•2•0•0•Updated Aug 11, 2025
according-to
Public
0•2•0•0•Updated Aug 6, 2025
ettin-encoder-vs-decoder
Public
State-of-the-art paired encoder and decoder models (17M-1B params)
Python
•
MIT License
•3•53•0•1•Updated Aug 6, 2025
Feedback-Friction
Public
Code for the paper "FEEDBACK FRICTION: LLMs Struggle to Fully Incorporate External Feedback" (https://arxiv.org/pdf/2506.11930)
Python
•0•8•0•0•Updated Jun 16, 2025
translation-barrier
Public
Python
•1•1•0•0•Updated Jun 12, 2025
NeoCoder
Public
Official implementation of our paper "Benchmarking Language Model Creativity: A Case Study on Code Generation"
Python
•
Apache License 2.0
•4•10•0•0•Updated May 16, 2025
ModernBERT
Public
Bringing BERT into modernity via both architecture changes and scaling
Python
•
Apache License 2.0
•136•0•0•0•Updated Apr 22, 2025
arXiv2Table
Public
Code and dataset for the paper: "Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol" (https://arxiv.org/pdf/2504.10284)
Python
•
MIT License
•0•1•2•0•Updated Apr 21, 2025
verifiable-by-design
Public
Code for the paper "Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data"
Python
•0•3•0•0•Updated Apr 21, 2025
CLAIMCHECK
Public
The repo for the paper "CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?"
0•0•0•0•Updated Apr 20, 2025
web-agent-arena
Public
Web Agent Arena
HTML
•0•0•0•0•Updated Apr 10, 2025
CS-601-471-671-Sp25
Public
Python
•8•6•0•0•Updated Apr 7, 2025
icl-ciphers
Public
Python
•1•0•1•0•Updated Feb 15, 2025
AnaloBench
Public
This project focuses on curating a robust analogical reasoning dataset for research and development.
Python
•2•5•0•0•Updated Dec 18, 2024
turking-bench
Public
Web-grounded natural language instructions
HTML
•
Apache License 2.0
•6•17•3•0•Updated Nov 25, 2024
RATIONALYST
Public
Code for "RATIONALYST: Pre-training Process-Supervision for Improving Reasoning" (https://arxiv.org/pdf/2410.01044)
Python
•5•35•0•0•Updated Oct 3, 2024
wikicite
Public
Python
•0•1•0•0•Updated Jul 29, 2024
Kreyol-MT
Public
Python
•
MIT License
•0•6•0•0•Updated Jun 1, 2024
Cost-Effective-Experiment
Public
Scripts and docs that help us run cost-effective experiments with OpenAI APIs
Python
•2•4•0•0•Updated May 28, 2024