Riddhi Bhagwat

I am an undergraduate at MIT studying CS with a minor in Brain & Cognitive Science. I currently do research in the Language and Intelligence Group at MIT CSAIL under Prof. Jacob Andreas, where my recent focus is on interpretability and reinforcement learning in language models. Previously, I interned on the research team at Letta, building memory management capabilities for agentic systems, and this summer I will be at Databricks working on large-scale AI systems.

I'm currently exploring agentic AI systems that learn from interaction, focusing on reward modeling and model introspection as mechanisms for improving self-consistency, reliability, and generalization. I'm interested in how such agents incorporate feedback, communicate, and remain robust under real-world constraints. Always happy to chat!

Current Research

LLM Internals & Self-Consistency (Ongoing)

Investigating the broader connections between LLM internal representations, reward signals, and self-consistency — exploring how understanding a model’s internals can enable more reliable alignment and predictable behavior.

Interpretability, Reward Modeling, Self-Consistency, Alignment

LLM Hypnosis (Under Review, ICLR 2026 Workshop)

Almog Hilel & Riddhi Bhagwat, Leshem Choshen, Idan Shenfeld, Jacob Andreas

Exposes a vulnerability in preference tuning pipelines: adversarial inputs can systematically hijack RLHF-aligned models, causing them to behave contrary to their stated preferences. Demonstrates risks in current alignment workflows.

Adversarial Robustness, Preference Tuning, RLHF, AI Safety

FeeL — Feedback Loop

Riddhi Bhagwat, Jen Ben Arye, Leshem Choshen, Idan Shenfeld, Jacob Andreas

A community-driven platform for improving multilingual language models through structured human feedback and RLHF pipelines, making AI more equitable for speakers of all languages.

LLMs, RLHF, Multilingual AI, HuggingFace

FracNet

Riddhi Bhagwat, Hannah Lu, Lluis Salo-Salgado, Ruben Juanes

A deep learning tool built on a flow modeling architecture that provides insights into underground fractured rock networks, enabling better subsurface characterization for energy and environmental applications.

Deep Learning, Flow Modeling, Geoscience

Heart Health Technology (Ongoing)

Exploring technologies for holistic heart health monitoring and sudden cardiac arrest prevention. Interested in collaborating? Feel free to reach out.

Healthcare AI, Wearables, Cardiac Health

Mitigating Unfairness in Chest X-Ray Classifiers

Riddhi Bhagwat, Courtney Ma, Maggie Lin — Final Project, 6.7960 (Deep Learning), MIT

Investigated demographic bias in chest X-ray disease classifiers and proposed a framework that combines adversarial debiasing with network pruning to mitigate it. Evaluated on fairness metrics including false negative rate parity and equalized odds; the combined approach meaningfully reduced inter-group disparities without sacrificing diagnostic accuracy.

Algorithmic Fairness, Medical Imaging, Adversarial Debiasing, Deep Learning

Talks & Posters

  • ICLR 2026 Workshop — Poster Presentation, 2026
  • C3E Energy Conference, Stanford — Poster Finalist, 2025
  • MIT Energy Initiative Slam — Top 5 Speaker, 2025

Writing

I enjoy documenting life as I navigate it. My Substack, Confessions of an Avid Laptop Sticker Collector, is a long-winded series of intense monologues, deep thoughts, sidewalk conversations, and memorable experiences — reflections of my take on life.

Read on Substack →

human-AI interaction, AI interpretability, reward learning, healthtech, cogsci + metacognition, tech equity, education, startups + venture, reading, writing prose, chai + matcha, dancing, traveling, learning languages

Let’s talk!

I’m always happy to chat about research, collaborations, or startups, or to swap matcha recommendations :) My inbox is open.