Projects

Building a Legal AI Application With SimpleMem, AutoResearch, and Legal Skills

May 2026

A local Legal AI application that combines document upload, SimpleMem conversation memory, legal workflow skills, and retrieval optimization for contract and legal question answering.

PDF | Code | Read more →

Recursive Failure Archaeology: RLM for Agent Failure Diagnosis

May 2026

A recursive investigation pipeline that diagnoses agent failures by segmenting long trajectories, generating hypotheses, pruning evidence, verifying controls, and comparing the results against full-context GPT-5.5.

Code | Read more →

Trace2Evolve: A Karpathy-Style AutoResearch Harness for Customer-Support Agents

May 2026

An AutoResearch harness that improves tool-using customer-support agents with benchmark traces, held-out tau2 evaluations, synthetic pressure tests, and reliability-gated promotion.

Code | Read more →

Teaching an LLM to Explore: Reinforcement Learning for Document Navigation

March 2026

PDF | Code | Read more →

Privacy Guard: Learning Privacy-Budgeted Active Sensing Policies via Reinforcement Learning in Smart Home Environments

March 2026

PDF | Code | Read more →

Training a Deep Q-Network to Master Uno: A Comprehensive Study in Reinforcement Learning for Imperfect Information Games

January 2026

PDF | Read more →