Overview
When calibrating judges, you can submit feedback programmatically using the SDK. This is useful for:

- Bulk feedback submission from automated pipelines
- Integration with custom review workflows
- Syncing feedback from external labeling tools
Important: Using the Correct IDs
Judge evaluations involve two related spans:

| ID | Description |
|---|---|
| Source Span ID | The original LLM call that was evaluated |
| Judge Call Span ID | The span created when the judge ran its evaluation |
When submitting feedback on a judge evaluation, always include the judge_id parameter to ensure the feedback is correctly associated with the judge evaluation.
Python SDK
From the UI (Recommended)
The easiest way to get the correct IDs is from the Judge Evaluation modal:

- Open a judge evaluation in the dashboard
- Expand the “SDK Integration” section
- Click “Copy” to copy the pre-filled Python code
- Paste and customize the generated code
Manual Submission
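Below is a minimal sketch of a manual submission call. The import path, client class, and method name are placeholders (the exact SDK surface is not shown here); only the keyword arguments match the Parameters table that follows. Prefer the pre-filled snippet copied from the evaluation modal, which contains the exact call for your workspace.

```python
# Minimal sketch only: the import path, Client class, and submit_feedback
# method name are placeholders, not the documented SDK surface. Copy the
# exact call from the "SDK Integration" section of the evaluation modal.
from judge_sdk import Client  # placeholder import

client = Client(api_key="YOUR_API_KEY")  # placeholder constructor

client.submit_feedback(
    prompt_slug="my-task",        # task slug associated with the judge
    completion_id="span_abc123",  # span ID being evaluated
    thumbs_up=True,               # True if the judge was correct, False if wrong
    reason="Judge correctly flagged the unsupported claim.",
    judge_id="judge_xyz789",      # required so feedback is tied to this judge
)
```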
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt_slug | str | Yes | The task slug associated with the judge |
| completion_id | str | Yes | The span ID being evaluated |
| thumbs_up | bool | Yes | True if the judge was correct, False if wrong |
| reason | str | No | Explanation of the feedback |
| judge_id | str | Yes* | The judge automation ID (*required for judge feedback) |
REST API
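The REST endpoint details (path, authentication) are not specified here, so the sketch below only illustrates the request shape by mirroring the SDK parameters as a JSON body. The URL and Authorization header are placeholders; substitute the values from your API reference.

```python
# Sketch only: the endpoint URL and Authorization header are placeholders,
# not the documented REST API. The JSON fields mirror the SDK parameters.
import requests

response = requests.post(
    "https://api.example.com/v1/feedback",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder auth
    json={
        "prompt_slug": "my-task",
        "completion_id": "span_abc123",
        "thumbs_up": True,
        "reason": "Judge correctly flagged the unsupported claim.",
        "judge_id": "judge_xyz789",
    },
    timeout=30,
)
response.raise_for_status()
```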
Finding Your IDs
| ID | Where to Find It |
|---|---|
| Task Slug | In the judge settings, or the URL when editing the judge’s prompt |
| Span ID | In the evaluation modal, or via get_judge_evaluations() response |
| Judge ID | In the URL when viewing a judge (/judges/{judge_id}) |
Bulk Feedback Submission
To submit feedback on multiple evaluations, you can iterate through the evaluations you have retrieved, as shown in the sketch below.
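This sketch reuses the placeholder client and submit_feedback name from the manual example. get_judge_evaluations() is the retrieval call mentioned above, but whether it lives on the client and the shape of each returned evaluation are assumptions here.

```python
# Sketch: loop over retrieved judge evaluations and submit feedback for each.
# Assumes the placeholder client from the manual example, that
# get_judge_evaluations() is exposed on it, and that each evaluation has
# span_id and judge_id attributes (assumptions, not documented fields).
evaluations = client.get_judge_evaluations(prompt_slug="my-task")

for evaluation in evaluations:
    client.submit_feedback(
        prompt_slug="my-task",
        completion_id=evaluation.span_id,  # assumed attribute name
        thumbs_up=True,                    # replace with your own review decision
        reason="Bulk calibration pass",
        judge_id=evaluation.judge_id,      # assumed attribute name
    )
```

Related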
- Pulling Evaluations - Retrieve judge evaluations programmatically
- Judge Setup - Configure and deploy judges