Uber AI: AI Prompt Engineer for Google's Gemini
Jan 2026 - Present
- Designed challenging evaluation tasks for Gemini within Terminal Bench and Colab Bench, targeting temporal reasoning, multimodal data integration, and physics-based inference.
- Built rigorous test harnesses that verify agent trajectories and intermediate reasoning artifacts, not just final outputs.
- Engineered tasks involving loss and failure analysis, anomaly detection, and signal decoding to identify capability gaps in LLM agents.
- Tech Stack: Python, Jupyter, Docker, NumPy, SciPy