-
Rethinking Generation & Reasoning Evaluation in Dialogue AI Systems
As we rely further on (and reap the benefits of) LLMs’ reasoning abilities in AI systems and products, how can we still grasp a sense of how LLMs “think”? Where steerability is concerned (users or developers may want to add custom handling logic and instructions), how can we ensure that these models continue to follow and reason from those instructions toward a desirable output?
-
Concepts for Reliability of LLMs in Production
By replacing traditional NLP models with LLM APIs, we trade away controllability in exchange for flexibility, generalizability, and ease of use. How might we de-risk our ML systems and safeguard GenAI-enabled features in production?
-
Designing Human-in-the-Loop ML Systems
As machine learning practitioners, we constantly strive to produce the highest-performing models to achieve the best business outcomes. But model development is only the tip of the iceberg; how well an ML solution performs has to be continuously evaluated on live predictions. When using trained models, we subtly invoke an assumption: that the training data distribution sufficiently approximates the unseen data distribution. Unfortunately, this does not always hold.
-
Learning Bayesian Hierarchical Modeling from 8 Schools
A walkthrough of a classical Bayesian problem.
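The heart of the 8-schools problem is partial pooling: each school's noisy effect estimate is shrunk toward the group mean. A minimal sketch using the well-known data (Rubin, 1981), with the between-school scale tau fixed purely for illustration; a full Bayesian treatment would place a prior on it and integrate it out:

```python
import numpy as np

# The classic "8 schools" data: estimated treatment effects y_j and
# their standard errors sigma_j for eight schools.
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

# Complete pooling: inverse-variance weighted mean, treating all
# schools as noisy measurements of one common effect.
w = 1.0 / sigma**2
mu_pooled = np.sum(w * y) / np.sum(w)

# Partial pooling: given a between-school standard deviation tau
# (assumed fixed here), each school's estimate is shrunk toward the
# pooled mean in proportion to how noisy its own measurement is.
tau = 5.0  # illustrative assumption
shrinkage = (1.0 / sigma**2) / (1.0 / sigma**2 + 1.0 / tau**2)
theta_partial = shrinkage * y + (1.0 - shrinkage) * mu_pooled
```

As tau grows, the estimates approach the raw per-school values (no pooling); as tau shrinks to zero, they collapse onto the pooled mean.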
-
Understanding Copulas
In statistics, copulas are functions that allow us to define a multivariate distribution by specifying its univariate marginals and their interdependencies separately. In modelling asset returns, for example, this affords greater flexibility and the ability to capture joint behaviour in extreme events.
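The separation of marginals from dependence can be sketched with a Gaussian copula: sample correlated normals, push them through the normal CDF to get dependent uniforms, then impose whatever marginals you like via inverse CDFs. The correlation value and the exponential/Student-t marginals below are illustrative choices:

```python
import numpy as np
from scipy.stats import norm, expon, t as student_t, spearmanr

rng = np.random.default_rng(42)
n = 10_000
rho = 0.8  # assumed correlation of the underlying Gaussian copula

# 1. Sample correlated standard normals.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# 2. Map to uniforms through the normal CDF: this is the copula step,
#    carrying only the dependence structure, not the marginals.
u = norm.cdf(z)

# 3. Impose arbitrary marginals via inverse CDFs: an exponential and a
#    heavy-tailed Student-t, chosen purely for illustration.
x = expon.ppf(u[:, 0], scale=2.0)
y = student_t.ppf(u[:, 1], df=3)

# The samples remain strongly dependent despite the different marginals;
# Spearman's rank correlation is invariant to the monotone transforms.
rho_s, _ = spearmanr(x, y)
```

Swapping step 1 for a t-copula (correlated Student-t draws) is a common way to add tail dependence for joint extreme events.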