pablordoricaw/columbia-ms-courses-home

Columbia MS Courses Home

The following table lists the courses I took during my Computer Engineering MS degree at Columbia University and links to the repositories for assignments, projects, and other course materials.

Semester Course Repositories Summary
Fall 2024 Natural Language Processing
  • Assignment 1: Built a trigram language model in Python for text classification.
  • Assignment 2: Developed a neural network for dependency parsing using PyTorch.
  • Assignment 3: Implemented a conditioned LSTM language model for image captioning.
  • Assignment 4: Created a Semantic Role Labeling system utilizing BERT.
Fall 2024 Introduction to Databases
  • Project 1: Designed a database system for a job board application (modeling companies, job postings, skills, and requirements), implemented it on a Google Cloud PostgreSQL server, and built a Web Application to interact with it.
  • Project 2: Expanded the Project 1 database schema with advanced PostgreSQL features including text/array attributes, composite data types, and PL/pgSQL triggers/functions.
Fall 2024 System-on-Chip Platforms
  • Homework 2: Implemented a fully connected neural network layer (fc_layer) using SystemC as part of a deep convolutional neural network (DWARF7) for image classification.
  • Homework 4: Designed a SystemC convolutional neural network accelerator with AXI4 interfaces and synthesized it to an RTL implementation via High-Level Synthesis (HLS) using Catapult.
Fall 2024 Heterogeneous Computing for Signal and Data Processing
  • Homework 1: Compared host-to-device memory allocation methods for elementwise vector operations using PyCUDA and PyOpenCL, profiling the performance with NVIDIA Nsight.
  • Homework 2: Wrote modular device functions to implement a sin(x) Taylor series approximation using built-in math libraries in PyCUDA and PyOpenCL.
  • Homework 3: Implemented 2D convolution (correlation) on GPUs, progressively optimizing performance using shared and constant memory locality techniques.
  • Homework 4: Implemented naive and work-efficient parallel prefix scans, and built a basic CNN forward pass pipeline (Convolution, ReLU, Flatten, Fully Connected) using PyCUDA and PyOpenCL.
  • Project: Investigated GPU parallelization performance of the normalized cross-correlation (NCC) algorithm for template matching ("Where's Waldo?") across various precision formats (FP32, FP16, FP8, BF16, TF32) on NVIDIA Ampere and Hopper architectures.
Spring 2025 Private Systems
  • Homework 1: Explored unintended memorization in neural networks by evaluating the Secret Sharer framework.
  • Homework 2: Investigated privacy accounting and quality control using the Sage Differentially Private ML platform.
  • Homework 3: Built an encrypted database over Google BigQuery supporting computation on encrypted data using non-deterministic (AES-GCM), deterministic (AES-SIV), and partially homomorphic (Paillier) encryption schemes.
  • Project: Investigated the NYC Tuition Assistance Program dataset for quasi-identifiers that could lead to privacy leaks, developing a custom Python CLI tool (find-quasi-ids) to conduct privacy-preserving data analysis.
Spring 2025 Applied ML in the Cloud
  • Homework 1: Compared cloud VM (IaaS), data warehouse (PaaS), and conversational AI (SaaS) offerings across IBM Cloud, AWS, and Google Cloud Platform.
  • Homework 2: Developed an Infrastructure-as-Code (IaC) Python application using Pulumi to automate the discovery and provisioning of scarce GPU instances across multiple GCP regions.
  • Homework 3: Created a deep learning workflow on GCP's Kubernetes Engine (GKE) with Kubeflow, training a PyTorch LeNet-5 model and serving it via KServe with the NVIDIA Triton Inference Server.
  • Project 1: Profiled and compared theoretical vs. measured workload characteristics (FLOPs, memory, arithmetic intensity) for CNNs (LeNet-5, VGG16) running on a Google Cloud T4 GPU in a VM versus a containerized environment.
  • Project 2: Optimized a ResNet50 model using TensorRT (Pruning, Sparsity, Quantization) and deployed the variants to distinct Google Cloud Vertex AI endpoints, building a Streamlit web app to benchmark parallel inference latency and accuracy.
Spring 2025 Embedded Scalable Platforms
  • Project: Developed a new frontend for the hls4ml framework to support Google Flax's NNX API, enabling the generation of high-level synthesizable C++ code for hardware-accelerated machine learning on FPGAs and ASICs. The implementation addressed architectural differences in Flax and was assessed by comparing Power, Performance, and Area (PPA) against the existing TensorFlow/Keras frontend.
Spring 2025 Large-Scale Stream Processing
  • Homework 1: Processed HTTP server logs using PySpark RDD and DataFrame APIs to compute total bytes served, filter top-K IPs, and analyze requests within specified time windows and subnet masks.
  • Homework 2: Implemented and evaluated streaming optimizations (Operator Reordering, Load Shedding, and Redundancy Elimination) using PySpark Structured Streaming and DStream APIs on a denormalized Formula 1 dataset.
  • Project: Designed a real-time analytics dashboard to visualize live Formula 1 race telemetry and detect strategy decisions, built using Apache Beam deployed on GCP Dataflow with a Streamlit frontend.
Fall 2025 Parallel Functional Programming
  • Assignments: Completed programming exercises in Haskell covering parallelization and functional programming concepts, utilizing GHCi and Stack.
  • Project: Developed a Haskell-based 4-player Blokus game engine and parallelized the multi-agent MaxN search algorithm to evaluate game states, optimizing execution by using Haskell's Control.Parallel.Strategies with depth-budgeting and multi-core work-stealing strategies.
Fall 2025 Artificial Intelligence-of-Things
  • Labs: Incrementally built a smart watch using an Adafruit HUZZAH32 ESP32 microcontroller with MicroPython. Features added included sensor data processing, I2C/SPI bus communication for display, voice assistant capabilities using an LLM (Whisper), and Human Activity Recognition (HAR) using cloud services.
  • Project: Developed an end-to-end AI-powered medical device error triage system. The system simulates medical device logs, integrates real-time environmental data from an ESP32 sensor node, and leverages a Large Language Model (LLM) enriched with FDA MAUDE reports to generate diagnostic reports and root-cause analyses.
Fall 2025 Computer Networks
  • Homework 1: Analyzed HTTP packets using Wireshark, calculated end-to-end delays using ping, and traced network routes with traceroute.
  • Homework 2: Developed a caching HTTP Proxy Server from scratch using Python's socket and selectors libraries, and analyzed HTTP interactions (including HTTP Conditional GETs and Authentication) with Wireshark.
  • Homework 3: Implemented a recursive DNS Resolver in Python and analyzed DNS resolution paths using nslookup and dig. Also analyzed HTTP/2 performance vs HTTP/1.1 and evaluated DASH adaptive video streaming.
  • Homework 4: Implemented a TCP-Lite reliable data transfer protocol over UDP in Python and analyzed TCP connection establishments, bulk data transfers, and TCP Reno/Tahoe congestion control algorithms.
  • Homework 5: Explored IPv4/IPv6 packet headers, IP fragmentation, NAT translation, BGP routing paths using looking glass servers, and traced high-speed Internet2 connections.
Fall 2025 Malware Analysis & Reverse Engineering
  • Homework 1: Performed basic static and dynamic analysis on Windows executables and DLLs using tools like PEiD, CFF Explorer, strings, Process Monitor, FakeNet, and Wireshark to identify packed files, keyloggers, and ransomware.
  • Homework 2: Analyzed x86 assembly code and reverse engineered programs using IDA Pro to identify C constructs (loops, conditionals), analyze network-based malware, and defuse a command-line C "bomb".
  • Homework 3: Conducted advanced static and dynamic analysis using IDA Pro and debuggers to identify complex code constructs (switch statements, loops) in malware loaders, reverse engineered a game's registration key, and patched a Windows executable (Solitaire) to alter its behavior.
Spring 2026 Hardware Security
  • Homework 1: Acted as a red team to inject subtle hardware Trojans into a Verilog 2-stage pipeline microcontroller (e.g., homoglyph opcodes, state machine deadlocks, pipeline timing bypasses), and acted as a blue team to analyze and detect hardware vulnerabilities inserted into a Verilog IEEE 754 double-to-float converter.
Spring 2026 Computer Hardware Design
  • Labs 1-4: Completed initial labs focusing on Verilog and hardware design basics.
  • Project 1: Designed and tested various priority selectors (arbiters) in SystemVerilog, including 4-bit and 8-bit selectors using basic assign statements, if-else logic, and hierarchical module designs, as well as a 4-bit rotating priority selector.
  • Project 2: Currently in progress.

Getting Started

  1. Create a GitHub repository for a course's assignments and/or project following the naming convention <course-code>-<course-name>-<[assignments/project]>, e.g. comsw4705-nlp-assignments.

  2. Create an entry for the course with the created repo in the table above.

  3. Run the Pulumi Python ms-courses-home app locally to configure specific settings of the newly created repo.
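The naming convention from step 1 can be sketched as a small helper. This is only an illustration: `repo_name` and `VALID_KINDS` are hypothetical names that do not come from the ms-courses-home app, and the lowercase kebab-case normalization is an assumption inferred from the comsw4705-nlp-assignments example.

```python
import re

# Hypothetical: the two repository kinds named in the convention.
VALID_KINDS = ("assignments", "project")


def repo_name(course_code: str, course_name: str, kind: str) -> str:
    """Build '<course-code>-<course-name>-<kind>' in lowercase kebab-case."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")

    def slug(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [slug(course_code), slug(course_name), kind]
    if not all(parts):
        raise ValueError("course_code and course_name must contain letters or digits")
    return "-".join(parts)


if __name__ == "__main__":
    print(repo_name("COMSW4705", "NLP", "assignments"))  # comsw4705-nlp-assignments
```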

Template Metadata

Useful metadata to have handy for homeworks and projects.

**Author:** Pablo Ordorica Wiener (UNI: po2311)

**Course:** <Course Number> <Course Name>

- **Semester:** <Fall/Spring> <yyyy>
- **Instructor:** <Instructor Full Name> (UNI: <xxxx>)

- **TA:** <TA Full Name> (UNI: <xxxx>)

Cloud Infrastructure Strategy for Assignments and Projects

Some of the assignments and projects require cloud computing, and this section explains my approach to managing cloud resources for those courses.

As seen in the table above, there are two types of repositories:

  • Individual assignments: One single repository per course containing all individual homework.
  • Team projects: Separate repository per project (especially for team collaborations).

| Feature | 👤 Individual Assignments | 👥 Team Projects |
| --- | --- | --- |
| GCP Project | Shared GCP Project | New Dedicated GCP Project |
| Pulumi Project | Unique name per Assignments Repo | Unique name per Project |
| Pulumi Stack | main (local deployment) | Depends on the project. |
| State Backend | Pulumi Cloud (Personal Org) | GCP Bucket Storage |
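The split between the two repository types can also be written down as data. The sketch below is purely illustrative and is not part of the ms-courses-home app: `RepoKind`, `InfraSettings`, and `infra_settings` are hypothetical names, and the project and bucket identifiers passed in are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RepoKind(Enum):
    INDIVIDUAL_ASSIGNMENTS = "assignments"
    TEAM_PROJECT = "project"


@dataclass(frozen=True)
class InfraSettings:
    gcp_project: str       # GCP project that owns the resources
    pulumi_project: str    # unique Pulumi project name (one per repo)
    stack: Optional[str]   # None means "decided per project"
    state_backend: str     # where Pulumi state is stored


def infra_settings(
    kind: RepoKind,
    repo: str,
    shared_gcp_project: str,
    team_gcp_project: Optional[str] = None,
    team_bucket: Optional[str] = None,
) -> InfraSettings:
    """Return the settings that apply to a repository of the given kind."""
    if kind is RepoKind.INDIVIDUAL_ASSIGNMENTS:
        # Shared GCP project, state kept in the personal Pulumi Cloud org.
        return InfraSettings(shared_gcp_project, repo, "main", "Pulumi Cloud (personal org)")
    if not team_gcp_project or not team_bucket:
        raise ValueError("team projects need a dedicated GCP project and a state bucket")
    # Dedicated GCP project, state kept in a bucket teammates can access.
    return InfraSettings(team_gcp_project, repo, None, f"gs://{team_bucket}")
```

Keeping the stack as `None` for team projects mirrors the "depends on the project" entry rather than guessing a default.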

Workflow for New Repositories

1. Individual Assignments

All individual assignments

  1. share a single GCP Project to avoid overhead, but
  2. use separate Pulumi Projects (one per assignments repo) to keep state isolated.

Run this in the root of your assignment repo:

# Initialize a new Pulumi Python project using 'uv' as the package manager
# --force is used because the directory already exists (the repo root)
pulumi new python --name <course-code>-<assignment-name> --toolchain uv --force

After initialization, add my shared cloud infrastructure library:

uv add "git+https://github.com/pablordoricaw/my-cloud-lib.git@v0.2.0#subdirectory=pulumi"
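The dependency string passed to uv add has three parts: the git+ URL of the repository, an @ suffix that pins a git ref (here v0.2.0), and a #subdirectory= fragment that points the installer at the package directory inside the repository. A minimal sketch of how that string is assembled, using a hypothetical `git_dependency` helper that is not part of uv or of my-cloud-lib:

```python
def git_dependency(repo_url: str, ref: str, subdirectory: str = "") -> str:
    """Assemble a 'git+<url>@<ref>#subdirectory=<dir>' requirement string."""
    spec = f"git+{repo_url}@{ref}"
    if subdirectory:
        # The fragment tells the installer where the package lives in the repo.
        spec += f"#subdirectory={subdirectory}"
    return spec


if __name__ == "__main__":
    print(
        git_dependency(
            "https://github.com/pablordoricaw/my-cloud-lib.git",
            ref="v0.2.0",
            subdirectory="pulumi",
        )
    )
```

Pinning the ref keeps every assignments repo on a known version of the library until the ref is bumped deliberately.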

2. Team Projects

For group projects, I use:

  • A dedicated GCP Project for the team. This ensures my personal credits are not billed for team usage and allows teammates to have IAM access.
  • A GCP Storage Bucket inside the team's project as the state backend. This allows all teammates to read/write state without needing access to my personal Pulumi Cloud organization.

Prerequisites:

  • A new GCP Project created in the Google Cloud Console.
  • The "Editor" IAM role granted to all team members on that GCP Project.
  • A storage bucket in that project to use as the backend for the IaC state.

Run this in the root of the project repo:

# 1. Authenticate to the Team's State Bucket (Teammates must do this too)
#    Ensure you have 'Storage Object Admin' on the bucket.
gcloud auth application-default login
pulumi login gs://<team-project-bucket-name>

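# 2. Initialize the project (same flags as for individual assignments, plus a description)
#    --force is used because the directory already exists (the repo root)
pulumi new python --name <project-name> --description "Course Code Team Project" --toolchain uv --force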

# 3. Configure the Stack to use the Team's GCP Project
pulumi config set gcp:project <team-gcp-project-id>

Add my shared infrastructure library if needed:

uv add "git+https://github.com/pablordoricaw/my-cloud-lib.git@v0.2.0#subdirectory=pulumi"

About

🏠 Repo for my Columbia University Courses
