Project Write-Ups

Technical work, compressed for a deeper scan.

Each write-up is structured around the problem, approach, results, and lessons learned. Public code and formal papers will be linked here as they become available.

Operations Research

Pareto Confidence Bands

Problem

Noisy multi-objective samples make the Pareto frontier uncertain, especially when the decision space is sampled indirectly through a black-box objective.

Approach

Fitted Gaussian process surrogate models, simulated frontiers from the posterior, and formulated a MILP with contiguity constraints in PuLP, solved with CBC.
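Each posterior simulation has to be reduced to a frontier before any band can be built. A minimal sketch of that extraction step, assuming every objective is minimized; `dominates` and `pareto_front` are illustrative names, not taken from the project code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: all objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the distinct non-dominated points, sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# One simulated draw of two objectives (toy numbers).
draw = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.0)]
front = pareto_front(draw)
```

This pairwise check is quadratic in the number of points, which is usually acceptable for a few hundred samples per simulation.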

Result

Built an automated Monte Carlo pipeline targeting confidence bands that cover at least 95% of simulated Pareto frontiers.
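The 95% target can be checked empirically by counting how many simulated frontiers fall entirely inside a candidate band. The sketch below assumes each frontier has been evaluated on a shared grid of first-objective values, so a frontier is a list of second-objective values and the band is a pair of lower and upper lists on the same grid. That representation and the names `is_covered` and `coverage` are assumptions for illustration, not the project's actual data model.

```python
from typing import Sequence


def is_covered(curve: Sequence[float],
               lower: Sequence[float],
               upper: Sequence[float]) -> bool:
    """True if the curve stays inside [lower, upper] at every grid point."""
    return all(lo <= y <= hi for y, lo, hi in zip(curve, lower, upper))


def coverage(curves: Sequence[Sequence[float]],
             lower: Sequence[float],
             upper: Sequence[float]) -> float:
    """Fraction of simulated curves that lie entirely inside the band."""
    if not curves:
        raise ValueError("need at least one simulated curve")
    return sum(is_covered(c, lower, upper) for c in curves) / len(curves)


# Toy example: four simulated frontiers on a three-point grid.
sims = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [0.0, 2.0, 3.0],  # dips below the band at the first grid point
    [1.0, 2.0, 5.0],  # exceeds the band at the last grid point
]
band_lower = [0.5, 1.5, 2.5]
band_upper = [2.0, 3.0, 4.0]
rate = coverage(sims, band_lower, band_upper)
```

Note that this is simultaneous coverage: a frontier that leaves the band at a single grid point counts as not covered, which is stricter than pointwise coverage.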

Lesson

The modeling challenge is not just finding a frontier; it is communicating uncertainty in a way that preserves decision usefulness.

NLP Infrastructure

SpokenCRS Dataset API

Problem

Conversational recommendation datasets such as ReDial and Inspired use heterogeneous formats that slow down benchmark setup.

Approach

Created CRSDataFrame and TurnWrapper abstractions with a unified turn-level schema for utterances, entities, ratings, and metadata.
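To make the idea of a unified turn-level schema concrete, here is a small sketch using a dataclass. The `Turn` class, the `normalize_dialog` helper, and the raw record layout are all hypothetical; they are not the CRSDataFrame or TurnWrapper API, and the raw format is a made-up stand-in rather than the real ReDial or Inspired layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Turn:
    """One dialog turn in a dataset-independent shape (hypothetical schema)."""
    conversation_id: str
    turn_index: int
    speaker: str
    utterance: str
    entities: List[str] = field(default_factory=list)
    rating: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def normalize_dialog(raw: Dict[str, Any]) -> List[Turn]:
    """Convert one raw dialog record (made-up format) into a list of Turns."""
    turns = []
    for index, message in enumerate(raw["messages"]):
        turns.append(
            Turn(
                conversation_id=str(raw["id"]),
                turn_index=index,
                speaker=message["role"],
                utterance=message["text"].strip(),
                entities=list(message.get("mentions", [])),
                rating=message.get("rating"),
                metadata={"source": raw.get("source", "unknown")},
            )
        )
    return turns


raw_dialog = {
    "id": 7,
    "source": "toy",
    "messages": [
        {"role": "seeker", "text": " Any good sci-fi? "},
        {"role": "recommender", "text": "Try Arrival.",
         "mentions": ["Arrival"], "rating": 1.0},
    ],
}
turns = normalize_dialog(raw_dialog)
```

With one adapter like this per dataset, downstream model code only ever sees `Turn` objects, which is what makes the datasets interchangeable.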

Result

Reduced dataset onboarding time by an estimated 80-90% and enabled plug-and-play compatibility across 3+ CRS model architectures.

Lesson

Good research infrastructure removes silent data engineering work so model comparisons become cleaner and faster.

Market Sentiment

Reddit to Equity Direction

Problem

Social finance text is noisy, ticker-dependent, and difficult to align cleanly with price movement targets.

Approach

Scraped 7 finance subreddits with PRAW, classified post sentiment using FinBERT, merged the scores with price data from yfinance, and built scikit-learn preprocessing pipelines.
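Aligning daily sentiment with a price target is the step where labeling choices enter. The sketch below shows one possible rule, assuming ISO date strings, one closing price per trading day, and a label of 1 when the next trading day's close is strictly higher. The rule and the names `next_day_direction` and `build_rows` are illustrative assumptions, not the project's actual labeling code.

```python
from typing import Dict, List, Tuple


def next_day_direction(closes: Dict[str, float]) -> Dict[str, int]:
    """Map each trading day to 1 if the next trading day closes higher, else 0.

    The last day has no next close, so it receives no label.
    """
    days = sorted(closes)  # ISO dates sort chronologically as strings
    return {day: int(closes[nxt] > closes[day])
            for day, nxt in zip(days, days[1:])}


def build_rows(sentiment: Dict[str, float],
               closes: Dict[str, float]) -> List[Tuple[str, float, int]]:
    """Join daily sentiment with labels, dropping days that have no label."""
    labels = next_day_direction(closes)
    return [(day, sentiment[day], labels[day])
            for day in sorted(sentiment) if day in labels]


# Toy numbers for a single ticker.
closes = {"2024-01-02": 100.0, "2024-01-03": 101.0, "2024-01-04": 99.5}
sentiment = {
    "2024-01-02": 0.4,
    "2024-01-03": -0.2,
    "2024-01-04": 0.1,   # last trading day: no next close, dropped
    "2024-01-06": 0.3,   # non-trading day: no close, dropped
}
rows = build_rows(sentiment, closes)
```

Posts made on weekends or after the close are simply dropped here; reassigning them to the next trading day is another defensible choice and would change the labels.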

Result

Benchmarked 5 classifiers with GridSearchCV for TSLA, AAPL, and AMZN next-day direction labels.

Lesson

Signal quality depends heavily on timing, ticker ambiguity, and labeling choices, all of which matter before model selection does.

HackMIT

Blind Karaoke

Problem

Create a fast, playful karaoke experience where the singer cannot see the lyrics and the app scores what they remember.

Approach

Built a Flask and Next.js prototype using Spotify playback, Whisper transcription, and lyric comparison logic.
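The lyric comparison step can be approximated with word-level sequence matching from the standard library. This is a minimal sketch under assumed rules (case and punctuation ignored, score equal to the share of reference words matched in order); `tokenize` and `lyric_score` are illustrative names, and the prototype's real scoring logic may differ.

```python
import re
from difflib import SequenceMatcher
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lyric_score(reference: str, transcript: str) -> float:
    """Share of reference words that the transcript reproduces in order."""
    ref, hyp = tokenize(reference), tokenize(transcript)
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


perfect = lyric_score("Hello, hello, is it me?", "hello hello is it me")
one_wrong = lyric_score("a b c d", "a x c d")
```

Scoring against the reference length means extra words in the transcript are not penalized, which keeps the game forgiving when the transcription adds filler.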

Result

Developed during HackMIT, a 24-hour hackathon with about 1,000 selected students from around the world.

Lesson

Real-time audio projects reward simple architecture, fast feedback loops, and clear scoring constraints.