I’m Elliot “Li” Bearden, an AI safety research engineer focused on LLM evaluation methodology. My capstone research investigates sycophancy detection and demographic variation in eval validity, specifically how current evaluation infrastructure misses the failure modes that matter most in production deployment. I expect to complete my MSCS in July 2026.
Five years deploying speech and language models at Deepgram gave me a front-row seat to the gap between benchmark performance and deployment behavior, the gap my research now targets directly. At Deepgram I built and owned the evaluation and custom training infrastructure, including the pipeline that cut custom model delivery from 14 days to 1 day.
Earlier civic-tech work as a Civic Digital Fellow at the NIH National Library of Medicine (through Coding it Forward) shapes how I think about institutional dynamics in safety review — how measurement infrastructure gets designed, contested, and used inside organizations.
I also take on selected applied evaluation engagements — details here.