- Sophomore at Grinnell College, pursuing a B.A. in Computer Science & Mathematics (Concentration: Statistics)
- Seeking roles in quantitative research, data engineering, and AI/ML
- 💻 I was a Data Engineer and AI Engineer at Gtel Data Research Group in Summer 2025, and an NLP Intern at Data Glacier in Fall 2025.
- 🌱 Learning diffusion models, reinforcement learning, and LLM finetuning
- 👯 Open to collaborating on quant research, machine learning/computer vision projects, and solving Sudoku puzzles
- 💬 Ask me about machine learning, deep learning architectures, or just life in general
- Portfolio & Blog: https://ducduong-portfolio.vercel.app/
- ⚡ Fun fact: I love dabbling in variant Sudoku, badminton, and soccer
- School Email: [email protected]
- Work Email: [email protected]
- 🧩 Cracking the Cryptic is the best YouTube channel in the world.
- ⚽ Born to play soccer but peaked at 🏸 badminton
- Reading AI/ML research papers, quant finance literature, and manga
Sponsored by the American Statistical Association (ASA) & CAUSE | Dec. 2025
Analyzed theft patterns across Los Angeles using the 2020 LAPD dataset to understand how spatial and demographic factors affect theft distribution.
- Applied nested logistic regression models with predictors such as population size, density, victim age, sex, and race.
- Found population density to be the strongest negative predictor of theft, while demographic analysis showed older victims and women were slightly more likely to be targeted.
- Highlighted racial differences in exposure to theft vs. violent crimes.
- Work was recognized nationally, earning 1st Prize in the USPROC Introductory Statistics Class Project competition.
GitHub Repository | Aug 2024 – May 2025
A research project exploring whether machine learning models can distinguish tonal vs. non-tonal languages from multilingual audio samples.
- Collected and processed 125 multilingual audio clips from 18 countries.
- Designed spectral and pitch-based features that reduced raw noise by 30% and improved dataset balance.
- Benchmarked 7 ML models (logistic regression, SVM, random forest, neural nets, etc.) with cross-validation, achieving 65% accuracy (20% over baseline).
- Built reproducible pipelines in scikit-learn and PyTorch for comparative metrics (precision, recall, F1).
- Proposed scalable data collection strategies for future interdisciplinary research in linguistics + machine learning.
- 1st Prize – USPROC Introductory Statistics Class Project Competition (June 2025)
  Project: Spatial And Demographic Effects On Theft Distribution Across Los Angeles
- 2nd Place – 2025 Iowa Collegiate Mathematics Competition (99/100 score)
- Machine Learning and Data Science A–Z (Python/R, Udemy)
- UR2PhD Undergraduate Pre-Research Experience Course Credential































