Overview

ze.choose() enables A/B testing by making weighted random selections between variants (models, prompts, parameters, etc.), timeboxing each experiment, and automatically tracking the chosen variant on the active span, trace, or session for downstream analytics.

Key features:
  • Weighted random selection between variants
  • Experiment timeboxing via duration_days
  • Automatic tracking of choices within spans, traces, or sessions
  • Consistency caching: the same entity always gets the same variant
  • Built-in validation of weights, variant keys, and defaults
  • Automatic fallback to a default variant once an experiment completes
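
To make "weighted random selection" concrete, here is a minimal sketch in plain Python. It is not the zeroeval implementation; weighted_pick is a hypothetical helper, and the fixed seed is only there to make the demo repeatable. It shows how a 70/30 weighting plays out over many draws.

```python
import random
from collections import Counter


def weighted_pick(variants, weights, rng=random):
    """Return the value of one variant, picked in proportion to its weight.

    Hypothetical helper for illustration only; ze.choose() may work differently.
    """
    keys = list(variants)
    chosen_key = rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]
    return variants[chosen_key]


rng = random.Random(42)  # seeded so the demo is repeatable
variants = {"fast": "gpt-4o-mini", "powerful": "gpt-4o"}
weights = {"fast": 0.7, "powerful": 0.3}

# Draw many times and count how often each value comes back.
counts = Counter(weighted_pick(variants, weights, rng) for _ in range(10_000))
print(counts)
```

Over 10,000 draws the counts should land close to a 70/30 split. A single call only ever returns one of the two values.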

Basic Usage

import zeroeval as ze

ze.init()

# Must be called within a span, trace, or session context
with ze.span("my_operation"):
    # Choose between two models with 70/30 split for 14 days
    model = ze.choose(
        "model_selection",
        variants={"fast": "gpt-4o-mini", "powerful": "gpt-4o"},
        weights={"fast": 0.7, "powerful": 0.3},
        duration_days=14,
        default_variant="fast"  # optional fallback after day 14
    )
    
    # Use the selected model
    # model will be either "gpt-4o-mini" (70% chance) or "gpt-4o" (30% chance)

Parameters

Parameter | Type | Required | Description
name | str | Yes | Name of the A/B test (e.g., "model_selection", "prompt_variant")
variants | Dict[str, Any] | Yes | Dictionary mapping variant keys to their values
weights | Dict[str, float] | Yes | Dictionary mapping variant keys to selection probabilities (must sum to ~1.0)
duration_days | int | Yes | Number of days the experiment should run; must be > 0
default_variant | str | No | Variant key to use automatically once the experiment ends (defaults to the first key if omitted)

Returns

Returns the value from the selected variant (not the key).

Experiment Lifecycle & Defaults

  • duration_days timeboxes the experiment. Once the backend marks it completed, ze.choose() automatically serves the default_variant.
  • If default_variant is omitted, the first key in variants becomes the fallback.
  • While the experiment is active, the same entity (span/trace/session) receives a cached, consistent variant choice.
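
The lifecycle rules above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the library's code: choose_sketch, the in-memory _assignments cache, the entity_id string, and the completed flag (standing in for the backend's completion status) are all hypothetical.

```python
import random

# Hypothetical in-memory cache: (experiment name, entity id) -> variant key.
_assignments = {}


def choose_sketch(name, entity_id, variants, weights, completed,
                  default_variant=None, rng=random):
    """Illustrative stand-in for the lifecycle rules; not the zeroeval implementation."""
    if completed:
        # Experiment finished: serve the default, or the first key if none was given.
        fallback = default_variant if default_variant is not None else next(iter(variants))
        return variants[fallback]

    cache_key = (name, entity_id)
    if cache_key not in _assignments:
        # First request for this entity: make one weighted pick and remember it.
        keys = list(variants)
        _assignments[cache_key] = rng.choices(
            keys, weights=[weights[k] for k in keys], k=1
        )[0]
    return variants[_assignments[cache_key]]


variants = {"mini": "gpt-4o-mini", "full": "gpt-4o"}
weights = {"mini": 0.6, "full": 0.4}

# Same entity, experiment still active: the second call reuses the cached pick.
first = choose_sketch("reco_models_v2", "span-1", variants, weights, completed=False)
again = choose_sketch("reco_models_v2", "span-1", variants, weights, completed=False)

# Experiment completed: the fallback is served regardless of the cached pick.
after_end = choose_sketch("reco_models_v2", "span-1", variants, weights, completed=True)
after_end_full = choose_sketch("reco_models_v2", "span-1", variants, weights,
                               completed=True, default_variant="full")
```

Here first and again are always equal, after_end is the value of the first key ("mini"), and after_end_full is the value of the explicitly named default.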

Tracking Signals

Attach success metrics to the same span where ze.choose() runs so dashboards can correlate outcomes with variant performance:

with ze.span("recommendation_flow") as span:
    model = ze.choose(
        "reco_models_v2",
        variants={"mini": "gpt-4o-mini", "full": "gpt-4o"},
        weights={"mini": 0.6, "full": 0.4},
        duration_days=21,
        default_variant="mini",
    )
    
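    # run_inference is a placeholder for your own inference code;
    # it is assumed to return a numeric score
    score = run_inference(model)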
    ze.set_signal(span, {"conversion_success": score > 0.75})

Complete Example

import zeroeval as ze
import openai

ze.init()
client = openai.OpenAI()

with ze.span("model_ab_test", tags={"feature": "model_comparison"}) as span:
    # A/B test between two models
    selected_model = ze.choose(
        "model_selection",
        variants={
            "mini": "gpt-4o-mini",
            "full": "gpt-4o"
        },
        weights={
            "mini": 0.7,  # 70% traffic
            "full": 0.3   # 30% traffic
        },
        duration_days=14,
        default_variant="mini"
    )
    
    # The selected model is automatically tracked
    response = client.chat.completions.create(
        model=selected_model,
        messages=[{"role": "user", "content": "Hello!"}]
    )
    
    # Attach a success signal tied to this span/choice
    # (evaluate_response is a placeholder for your own scoring function,
    # assumed to return a float)
    rating = evaluate_response(response)
    ze.set_signal(span, {"response_quality": rating >= 0.7})

Important Notes

  • Context Required: ze.choose() must be called within an active ze.span(), trace, or session
  • Consistency: The same entity (span/trace/session) always receives the same variant while the test runs
  • Weight Validation: Weights should sum to 1.0; a warning is issued if the total falls outside 0.95-1.05
  • Duration Required: duration_days must be > 0; the experiment stops after this window
  • Fallback Behavior: Once the backend reports the test as completed, default_variant is used automatically
  • Signal Analytics: Use ze.set_signal() on the same span to compare variant impact in the dashboard
  • Key Matching: Variant keys and weight keys must match exactly
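
The validation rules in these notes can be checked ahead of time with a small helper. The sketch below is hypothetical (check_choose_args is not part of zeroeval) and only mirrors the rules as stated here. How the library itself reports each problem, apart from the weight-sum warning, is not specified on this page.

```python
def check_choose_args(variants, weights, duration_days, default_variant=None):
    """Return a list of human-readable problems; an empty list means the arguments look fine.

    Hypothetical pre-flight check mirroring the documented rules.
    """
    problems = []

    # Key Matching: variant keys and weight keys must match exactly.
    if set(variants) != set(weights):
        mismatched = sorted(set(variants) ^ set(weights))
        problems.append(f"variant and weight keys differ: {mismatched}")

    # Weight Validation: the total should be 1.0, with a 0.95-1.05 tolerance.
    total = sum(weights.values())
    if not 0.95 <= total <= 1.05:
        problems.append(f"weights sum to {total:.3f}, expected about 1.0")

    # Duration Required: duration_days must be > 0.
    if duration_days <= 0:
        problems.append("duration_days must be > 0")

    # The default variant, if given, has to be one of the variant keys.
    if default_variant is not None and default_variant not in variants:
        problems.append(f"default_variant {default_variant!r} is not a variant key")

    return problems


ok = check_choose_args(
    {"fast": "gpt-4o-mini", "powerful": "gpt-4o"},
    {"fast": 0.7, "powerful": 0.3},
    duration_days=14,
    default_variant="fast",
)

bad = check_choose_args(
    {"fast": "gpt-4o-mini", "powerful": "gpt-4o"},
    {"fast": 0.7, "strong": 0.3},  # "strong" does not match "powerful"
    duration_days=0,
    default_variant="medium",
)
print(ok)
print(bad)
```

Running a check like this in a unit test catches typos in variant keys before an experiment is launched.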