Create Test Sets
This guide outlines the various methods for creating test sets in Agenta and provides specifications for the test set schema.
Test sets are used for running automatic or human evaluations. They can also be loaded into the playground, allowing you to experiment with different prompts.
Test sets contain input data for the LLM application. They may also include a reference output (i.e., expected output or ground truth), though this is optional.
You can create a test set in Agenta using the following methods:
- By uploading a CSV or JSON file
- Using the API
- Using the UI
- From the playground
- From traces in observability
Creating a Test Set from a CSV or JSON
To create a test set from a CSV or JSON file:
- Go to Test sets
- Click Upload test sets
- Select either CSV or JSON
CSV Format
We use CSV with commas (,) as separators and double quotes (") as quote characters. The first row should contain the header with column names. Each input should have its own column. The column containing the reference answer can have any name, but we use "correct_answer" by default.
If you choose a different column name for the reference answer, you'll need to configure the evaluator later with that specific name.
Here's an example of a valid CSV:
text,instruction,correct_answer
Hello,How are you?,I'm good.
"Tell me a joke.",,"Sure, here's one:..."
JSON Format
The test set should be in JSON format with the following structure:
- A JSON file containing an array of objects.
- Each object in the array represents a row, with keys as column headers and values as row data.
Here's an example of a valid JSON file:
[
{ "recipe_name": "Chicken Parmesan", "correct_answer": "Chicken" },
{ "recipe_name": "a, special, recipe", "correct_answer": "Beef" }
]
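Equivalently, if you build the test set in code, writing it with json.dump from the standard library produces a file in this shape (the file name here is just a placeholder):

```python
import json

rows = [
    {"recipe_name": "Chicken Parmesan", "correct_answer": "Chicken"},
    {"recipe_name": "a, special, recipe", "correct_answer": "Beef"},
]

with open("testset.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)  # an array of objects, one per row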
Schema for Chat Applications
For chat applications created using the chat template in Agenta, the input should be saved in a column called chat, which contains the list of messages:
[
{ "content": "message.", "role": "user" },
{ "content": "message.", "role": "assistant" }
// Add more messages if necessary
]
The reference answer column (by default correct_answer) should contain the expected reply as a single message in the same format:
{ "content": "message.", "role": "assistant" }
Creating a Test Set Using the API
You can upload a test set using our API. Find the API endpoint reference here.
Here's an example of such a call:
HTTP Request:
POST /testsets
Request Body:
{
  "name": "testsetname",
  "csvdata": [
    { "column1": "row1col1", "column2": "row1col2" },
    { "column1": "row2col1", "column2": "row2col2" }
  ]
}
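As a minimal sketch of making this call from Python, the snippet below sends the same request body with the requests library. The base URL and authorization header are placeholders; adjust them to your Agenta deployment and authentication setup.

```python
import requests

BASE_URL = "https://your-agenta-host/api"  # placeholder host
API_KEY = "your-api-key"                   # placeholder credential

payload = {
    "name": "testsetname",
    "csvdata": [
        {"column1": "row1col1", "column2": "row1col2"},
        {"column1": "row2col1", "column2": "row2col2"},
    ],
}

response = requests.post(
    f"{BASE_URL}/testsets",
    json=payload,
    headers={"Authorization": API_KEY},  # assumed auth scheme
    timeout=30,
)
response.raise_for_status()
print(response.json())
```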
Creating/Editing a Test Set from the UI
To create or edit a test set from the UI:
- Go to Test sets
- Choose Create a test set with UI, or select an existing test set to edit it
- Name your test set and specify the columns for the input types
- Add the data
Remember to click Save test set when you're done.
Creating a Test Set from the Playground
The playground offers a convenient way to create and add data to a test set. This workflow is useful if you want to build your test set ad hoc: each time you find an interesting input for the LLM app, you can immediately add it to the test set and optionally set a reference answer.
To add a data point to a test set from the playground, click the Add to test set button located near the Run button.
A drawer will display the inputs and outputs from the playground. Here, you can modify the inputs and correct answers if needed. Select an existing test set to add to, or choose +Add new to create a new one. Once you're satisfied, click Add to finalize.
Currently, when adding a test point from the playground, the correct answer is always added to a column called correct_answer.
When adding a new data point, ensure that the column names in the test set match those of the LLM application. All columns from the playground (the input columns and correct_answer) must exist in the test set; they are created automatically if you're making a new test set. Any additional columns in the test set that are not available in the playground will be left empty.
Adding Chat History from the Playground
When adding chat history, you can choose to include all turns from the conversation. For example:
- User: Hi
- Assistant: Hi, how can I help you?
- User: I would like to book a table
- Assistant: Sure, for how many people?
If you select "Turn by Turn," two rows will be added to the test set: one for "Hi/Hi, how can I help you?" and another for "Hi/Hi, how can I help you?/I would like to book a table/Sure, for how many people?"
Adding Data From Traces
You can add any data logged to Agenta to a test set. Simply navigate to observability, select the trace (or any span), then click Add to testset or the + button.