Custom query syntax

Braintrust allows you to search and query data in your experiments, logs, and datasets using a combination of natural language and the Braintrust Query Language (BTQL). This guide covers the BTQL syntax, which you can use to filter and run more complex queries to analyze your data.

How search works

Every search bar in Braintrust accepts both natural language and BTQL as input. When you type in a query, Braintrust first checks to see if it's valid BTQL. If it is, then it runs the query directly. Otherwise, it falls back to an AI system to figure out how to either construct the right BTQL query or full-text search.

When in doubt, just type what you want to search for. The system is designed to figure out what you mean, and automatically translate it into the appropriate query.

Braintrust Query Language (BTQL)

There are two main components to BTQL: expressions and queries. When you type in the search bar, you're writing an expression that is used to filter logs to those that match. Let's cover how expressions work first, and then zoom out to see how you can construct more complex queries.

BTQL is currently in beta. To enable it, visit your profile settings, and enable "Use Braintrust query language for search filters".

Expressions

BTQL uses a familiar SQL-like syntax to construct expressions. Here are a few examples:

-- Retrieve every record with a Factuality score greater than 0.5
scores.Factuality > 0.5
 
-- Retrieve every log with the tag "triage" and not the tag "triaged"
tags includes "triage" and NOT tags includes "triaged"
 
-- Check if input contains the string "hello"
input ILIKE '%hello%'
 
-- Check if the model is gpt-4-turbo
metadata.model = "gpt-4-turbo"
 
-- Check if the number of tokens used is greater than 1000 for gpt-4-turbo and 500 for all other models
metrics.tokens > (metadata.model = "gpt-4-turbo" ? 1000 : 500)

BTQL expressions support the following operators:

and, or, ternary (? :), not
=, != (aka <>), >, <, >=, <=
includes (aka contains) checks if a json object or array contains a value
ILIKE (case-insensitive LIKE) check for substring matches
IS NULL, IS NOT NULL (note that like SQL, you cannot do =NULL to test for equality with NULL)
+, -, *, /, % (for arithmetic operations)
- (for negation)
(<expr>) (for grouping expressions)

There are two types of values: fields and literals. Fields can refer to nested JSON fields, e.g.

metadata.model

would return "gpt-3.5-turbo" for {"metadata": {"model": "gpt-3.5-turbo"}}. Like SQL languages, you can escape field names with double quotes, e.g.

"metadata"."my custom field" = 'foo'

Literals can be strings, numbers, booleans, null, arrays, or objects. Strings can be single or double quoted, e.g.

metadata.model = "gpt-3.5-turbo"

Objects are specified using JSON-like syntax, e.g.

metadata = { model: "gpt-3.5-turbo" }

And arrays are specified using square brackets, e.g.

tags contains ['triage', 'triaged']

Quoting rules

Strings can be single or double quoted
A single-part field name must be unquoted, e.g. metadata refers to the field metadata, but "metadata" is a string (equivalent to 'metadata')
Multi-part identifiers can use double quotes to escape spaces, e.g. "metadata"."my custom field"
JSON objects can use single quoted, double quoted, or unquoted strings as object names
- { model: "gpt-3.5-turbo" }, { "model": "gpt-3.5-turbo" }, { 'model': "gpt-3.5-turbo" } are equivalent

Typesystem

The BTQL typesystem is dynamic and can be partially specified using JSON schema. For example, when you write a filter like scores.Factuality > 0.5, the system knows that scores.Factuality is a number.

However, when you write a filter like input LIKE '%hello%', the system doesn't know whether input is a JSON object, string, or something else, and will automatically infer the type based on the data.

Queries

BTQL queries consist of a series of clauses that specify how you want to filter and analyze your data. Let's look at a few examples:

-- Retrieve every record from "experiment_id" with a Factuality score greater than 0.5
select: *
from: experiment('experiment_id')
filter: scores.Factuality > 0.5

-- Retrieve every log with the tag "triage" and not the tag "triaged"
select: *
from: project_logs('project_id')
filter: tags includes 'triage' and NOT tags includes 'triaged'

-- Compute the average number of tokens used by each model
dimensions: metadata.model
measures: avg(metrics.tokens)
from: project_logs("project_id")

The available clauses are:

select: specifies the fields you want to retrieve
dimensions/measures: specifies the fields you want to group by and aggregate.
- Either select or dimensions/measures must be specified.
from: specifies the source of the data
filter: an optional filter expression
sort: an optional sort expression
limit: an optional limit on the number of results to return
offset: an optional offset on the results to return

Each clause is defined as the name of the clause followed by a colon and then the value. By convention, you can put each clause on a new line, or separate them by |, but this is optional.

Projection

select, dimensions, and measures are called projections, and are each a list of expressions with an optional alias that you can use to refer to the result. For example:

select: scores.Factuality as factuality_score

select plainly retrieves the fields you want to see. On the other hand dimensions and measures are used to group and aggregate data. They are split into two separate clauses to make your query easier to read. If you only specify dimensions, it's equivalent to specifying a SELECT and GROUP BY ALL in SQL without any aggregates. If you only specify measures, it's equivalent to specifying a SELECT with aggregate expressions and no GROUP BY.

From

A from clause specifies the source of the data you want to query. In Braintrust, sources tend to have a type, e.g. experiment, and an id, e.g. '4da8be1e-368c-4def-a8aa-ad629330af2e'. To query from this experiment, use

from: experiment("4da8be1e-368c-4def-a8aa-ad629330af2e")

You can query across multiple experiments by specifying multiple ids:

from: experiment(
  "4da8be1e-368c-4def-a8aa-ad629330af2e",
  "220c943f-1dc0-409d-a4ff-1e8a2573164f",
)

The list of available object types is:

experiment
project_logs
dataset
project (all objects in a project)

Filter

A filter clause specifies the conditions that must be met for a record to be included in the result. It is just a single expression, which can be and-d or or-d together. For example:

filter: scores.Factuality > 0.5 and metadata.model = 'gpt-3.5-turbo'

Sort

Like SQL, each sort expression is a field name followed by an optional direction, e.g.

sort: factuality_score desc

BTQL currently only supports using aliases to refer to items in the sort clause.

API access

In addition to using the search bar, you can also use access BTQL via the API. For example, to run the query

select: *
from: project_logs('<YOUR_PROJECT_ID>')
filter: tags includes 'triage'

you can use the following API call (the extra quoting is necessary to escape the quotes in bash, but not in your code):

curl -X POST https://api.braintrust.dev/btql \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"query": "select: * | from: project_logs('"'<YOUR_PROJECT_ID>'"') | filter: tags includes '"'triage'"'}'

The body is a JSON-encoded object with the following fields:

query (required): the BTQL query to run. The query should be a string.
fmt (optional): the format of the response, either json or parquet. Defaults to json.
tz_offset (optional): if specified, all timezone bucketing operations will be offset by tz_offset minutes.
version (optional): the version of the data to query. If unspecified, the query will run against the latest version.
use_columnstore (optional): if true, use a columnstore for the query. For queries that scan lots of data, this can be much faster.
audit_log (optional): if true, return the audit log for the query. This can be useful for debugging.