AgentTest revolves around the concepts of Experiments and their Variants. Understanding these is key to effectively using the platform.
An Experiment is the primary container for your A/B tests. It defines what you are testing and includes all the different versions you want to compare.
Key characteristics of an Experiment:
slug
: A unique, URL-friendly identifier for the experiment (e.g., onboarding-prompt-test
). This slug is used in API calls.name
: A human-readable name for the experiment (e.g., “Onboarding Prompt Test V1”). This is for display and organizational purposes.status
: Indicates the current state of the experiment. Common statuses include:
DRAFT
: The experiment is being set up and is not yet active.ACTIVE
: The experiment is currently running, and variants are being assigned to users.PAUSED
: The experiment is temporarily stopped; no new assignments will be made, but existing assignments might still be logged.COMPLETED
: The experiment has concluded.You typically create an experiment when you want to test a hypothesis, such as “Does prompt A perform better than prompt B for user summarization tasks?”
A Variant represents a specific version of your AI agent, prompt, configuration, or workflow that you are testing within an Experiment.
Key characteristics of a Variant:
key
: A unique identifier for the variant within its parent experiment (e.g., “A”, “B”, “control”, “treatment-alpha”). This key is returned by the assignment API and used in logging.payload
: A JSON object that contains the actual content or configuration for this variant. This is the data your application will use. Examples include:
{"prompt": "You are an expert travel assistant..."}
{"model": "gpt-4", "temperature": 0.8}
{"enable_new_summary_feature": true}
trafficPercent
: An integer (0-100) representing the percentage of users/requests that should be allocated to this variant. The sum of trafficPercent
for all active variants in an experiment should ideally (but not strictly enforced by API at assignment time) be 100. The assignment logic uses this to distribute users. For example, in a two-variant experiment, you might have:
trafficPercent: 50
trafficPercent: 50
The payload
of a variant is what your application receives when a user is assigned to that variant. Your application logic then uses this payload to alter its behavior, for example, by using the received prompt with an LLM or adjusting model parameters.