Concepts

Building blocks

Trace
The fundamental object in Braintrust. A trace is a single execution of an instrumented piece of code. Traces appear in two places in Braintrust: Experiments (for traces run outside of production on test data) and Logs (for traces run in production on live user inputs).

Span
An atomic unit of computation. Spans form the building blocks of a trace. Components of a Span include:

- Input: the input to your span, such as a user's query or a function's input.
- Output: the output of your span, such as a model's prediction or a function's return value.
- Expected output (optional): a reference output that you expect your output to match. Expected values can be updated after the fact.
- Metadata: system-generated metadata, such as metrics (e.g. duration) and context (where in your code this ran), plus flexible metadata that you define in your code.
- Scores: numbers between 0 and 1 that represent eval metrics.
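
As an illustration only, the span components above can be modeled as a small Python data structure. The class and field names here (`Span`, `Trace`, `expected`, `duration_s`) are hypothetical and are not Braintrust SDK types; the sketch simply mirrors the listed components, including the rule that scores fall between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    """Hypothetical model of a span; not a Braintrust SDK class."""

    input: Any
    output: Any
    expected: Optional[Any] = None  # optional reference output
    metadata: Dict[str, Any] = field(default_factory=dict)
    scores: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Scores are eval metrics constrained to the range [0, 1].
        for name, value in self.scores.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"score {name!r} must be between 0 and 1, got {value}"
                )


@dataclass
class Trace:
    """Hypothetical model of a trace: one execution, made up of spans."""

    spans: List[Span] = field(default_factory=list)


span = Span(
    input="What is 2 + 2?",
    output="4",
    expected="4",
    metadata={"duration_s": 0.12},
    scores={"exact_match": 1.0},
)
trace = Trace(spans=[span])
```

Validating scores at construction time is one possible design choice; it surfaces an out-of-range metric where it is produced rather than later during analysis.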

Objects and organizing constructs

Organization
The billing unit of Braintrust. Can represent a company or team.
- Project
  A single AI feature. Experiments run within the same project should generally be comparable to one another.
  - Experiment
    A set of traces run to test the behavior of code outside of production. The input comes from a predefined set of data, and each record in that set produces one Experiment Trace. (Note: Braintrust Datasets are a convenient way to store and manage input data for Experiments.)
    - Experiment Trace
      A single trace in an Experiment.
  - Logs
    The set of all Log Traces.
    - Log Trace
      A trace run to monitor code in production. The input data comes from user input in an application.
  - Dataset
    A list of data inputs and, optionally, their expected values and metadata. Primarily used for running Experiments.
  - Prompt
    A versioned prompt that can be referenced in your code.
  - Playground
    An environment for quickly iterating on prompts and model parameters and seeing the results.
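
To make the relationship between a Dataset and an Experiment more concrete, here is a minimal plain-Python sketch. All names (`run_experiment`, `task`, the record keys, the `exact_match` score) are hypothetical and do not come from the Braintrust SDK; the sketch only shows that each dataset record yields exactly one experiment trace, and that expected values are optional.

```python
from typing import Any, Callable, Dict, List

# Hypothetical dataset: inputs with optional expected values and metadata.
dataset: List[Dict[str, Any]] = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "world", "expected": "WORLD", "metadata": {"source": "manual"}},
    {"input": "no reference"},  # expected value is optional
]


def run_experiment(
    records: List[Dict[str, Any]],
    task: Callable[[Any], Any],
) -> List[Dict[str, Any]]:
    """Run `task` once per record and return one trace-like dict per record."""
    traces = []
    for record in records:
        output = task(record["input"])
        expected = record.get("expected")
        scores: Dict[str, float] = {}
        if expected is not None:
            # Simple exact-match score in [0, 1]; skipped without a reference.
            scores["exact_match"] = 1.0 if output == expected else 0.0
        traces.append(
            {
                "input": record["input"],
                "output": output,
                "expected": expected,
                "metadata": record.get("metadata", {}),
                "scores": scores,
            }
        )
    return traces


# The "code under test" here is just str.upper, standing in for an AI feature.
experiment_traces = run_experiment(dataset, task=str.upper)
```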

Systems

Braintrust consists of three core systems:

- Your code, instrumented with Braintrust's SDKs/API. Use Eval() to define evaluations, traced() to trace code, and .log() to log data. Your code can run wherever you'd like: locally, in the cloud, in a CI/CD pipeline, etc.
- The data plane, which stores and serves data for experiments, logs, datasets, prompts, and playgrounds. By default, we host the data plane for you, but you can also self-host it.
- The control plane, which hosts the Braintrust UI and metadata. The control plane cannot (and does not) access the data plane. We host the control plane for you, but if you really want to, you can host it yourself too.
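
To give a rough sense of what instrumenting your code means, the sketch below uses a stand-in decorator written in plain Python. It is not Braintrust's traced() and does not call the Braintrust SDK; the names `traced_sketch` and `recorded_spans` are hypothetical, and spans are kept in an in-memory list rather than being sent to a data plane.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a log destination (assumption for illustration).
recorded_spans: List[Dict[str, Any]] = []


def traced_sketch(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that records one span per call.

    This is NOT Braintrust's traced(); it only illustrates the idea.
    """

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        recorded_spans.append(
            {
                "name": fn.__name__,
                "input": {"args": args, "kwargs": kwargs},
                "output": output,
                "metadata": {"duration_s": time.perf_counter() - start},
            }
        )
        return output

    return wrapper


@traced_sketch
def answer(question: str) -> str:
    return f"You asked: {question}"


result = answer("What is a span?")
```

Each call to `answer` appends one span-like record containing the input, the output, and a duration metric, which loosely corresponds to the span components described earlier.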