-
Rethinking Generation & Reasoning Evaluation in Dialogue AI Systems
As we rely further on (and reap the benefits of) LLMs’ reasoning abilities in AI systems and products, how can we still grasp a sense of how LLMs “think”? Where steerability is concerned (users or developers may want to add custom handling logic and instructions), how can we ensure that these models continue to follow and reason from those instructions toward a desirable output?
-
Concepts for Reliability of LLMs in Production
By replacing traditional NLP models with LLM APIs, we trade away controllability in exchange for flexibility, generalizability, and ease of use. How might we de-risk our ML systems and safeguard GenAI-enabled features in production?
-
Designing Human-in-the-Loop ML Systems
As machine learning practitioners, we constantly strive to produce the highest-performing models to achieve the best business outcomes. But model development is only the tip of the iceberg; how well an ML solution performs has to be continuously evaluated on live predictions. When using trained models, we subtly invoke an assumption: that the training data distribution sufficiently approximates the unseen data distribution. Unfortunately, this does not always hold.
-
Learning Bayesian Hierarchical Modeling from 8 Schools
A walkthrough of a classical Bayesian problem.
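The heart of the 8-schools problem is partial pooling: each school's noisy effect estimate is shrunk toward the group mean. A minimal sketch using the well-known data (Rubin, 1981), with the between-school scale tau fixed purely for illustration; a full Bayesian treatment would place a prior on it and integrate it out:

```python
import numpy as np

# The classic "8 schools" data: estimated treatment effects y_j and
# their standard errors sigma_j for eight schools.
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

# Complete pooling: inverse-variance weighted mean, treating all
# schools as noisy measurements of one common effect.
w = 1.0 / sigma**2
mu_pooled = np.sum(w * y) / np.sum(w)

# Partial pooling: given a between-school standard deviation tau
# (assumed fixed here), each school's estimate is shrunk toward the
# pooled mean in proportion to how noisy its own measurement is.
tau = 5.0  # illustrative assumption
shrinkage = (1.0 / sigma**2) / (1.0 / sigma**2 + 1.0 / tau**2)
theta_partial = shrinkage * y + (1.0 - shrinkage) * mu_pooled
```

As tau grows, the estimates approach the raw per-school values (no pooling); as tau shrinks to zero, they collapse onto the pooled mean.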
-
Understanding Copulas
In statistics, copulas are functions that allow us to define a multivariate distribution by specifying its univariate marginals and their interdependencies separately. In modelling asset returns, for example, this affords greater flexibility and the ability to capture joint behaviour in extreme events.
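The separation of marginals from dependence can be sketched with a Gaussian copula: sample correlated normals, push them through the normal CDF to get dependent uniforms, then impose whatever marginals you like via inverse CDFs. The correlation value and the exponential/Student-t marginals below are illustrative choices:

```python
import numpy as np
from scipy.stats import norm, expon, t as student_t, spearmanr

rng = np.random.default_rng(42)
n = 10_000
rho = 0.8  # assumed correlation of the underlying Gaussian copula

# 1. Sample correlated standard normals.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# 2. Map to uniforms through the normal CDF: this is the copula step,
#    carrying only the dependence structure, not the marginals.
u = norm.cdf(z)

# 3. Impose arbitrary marginals via inverse CDFs: an exponential and a
#    heavy-tailed Student-t, chosen purely for illustration.
x = expon.ppf(u[:, 0], scale=2.0)
y = student_t.ppf(u[:, 1], df=3)

# The samples remain strongly dependent despite the different marginals;
# Spearman's rank correlation is invariant to the monotone transforms.
rho_s, _ = spearmanr(x, y)
```

Swapping step 1 for a t-copula (correlated Student-t draws) is a common way to add tail dependence for joint extreme events.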