Candor documentation
Quickstart
1. Install
This installs the Candor CLI via npm, authenticates you via the browser, and installs the Candor skill for Claude Code.
2. Run a study
Your agent reads your codebase for context, drafts a study script, shows you the cost estimate, and waits for your approval before recruiting.
3. Get results
View prioritized findings (P0-P3) with key quotes and suggested fixes. Or ask your agent to open GitHub issues for the top findings.
How it works
When you ask your coding agent to run a study, it uses the Candor CLI to create the study and recruit real participants from a research panel. Each participant lands on a Candor session page and interacts with whatever stimulus is configured — items to evaluate, a product to test, or a topic to discuss.
For moderated studies (session or study scope), an AI voice moderator conducts the interview — asking questions, assigning tasks, and adapting follow-ups in real time. For item-based studies, participants complete structured tasks (compare, rate, or label) in a lightweight UI with no moderator.
After sessions complete, Candor synthesizes the data into rankings and agreement metrics (item-based) or prioritized findings with severity ratings, key quotes, and suggested actions (moderated). Your agent can then act on the results — opening issues, proposing fixes, or running follow-up studies.
candor init
This command:
- Opens your browser to authenticate with candor.sh
- Saves your API key to ~/.candor/config.json
- Installs the Candor skill to ~/.claude/skills/candor/
candor login
candor study
candor study with no subcommand lists all studies.
candor study list
Output supports --json.
candor study create
--task and --moderator are auto-inferred from the stimulus type, so you usually don't need to specify them. --items infers compare/none, --url infers use/session, --topic infers share/session. Advanced options like --workers and --reward are available for item-based studies — run candor study create --help for details.
candor study show <id>
candor study status <id>
--live for real-time monitoring with activity feed.
candor study results <id>
candor study cancel <id>
candor study findings <id>
candor study coverage <id>
candor study approve <id>
candor doctor
Checks for:
- Candor CLI installed and up to date
- API key configured
- Claude Code detected
- Skill file installed
candor update
curl -fsSL https://candor.sh | bash
Studies
Stimulus types
Task types
Study lifecycle
AI moderator
The AI moderator is a voice agent that conducts live interviews with study participants. It follows your study script, asks open-ended questions, assigns tasks, and adapts follow-up questions based on participant responses — probing on confusion, frustration, or unexpected behavior in real time.
For moderated studies (session and study scope), the moderator records transcripts and audio, which are later synthesized into findings. See Moderator scope for the none/session/study distinction and voice vs. text output modes.
Moderator scope
When to use each scope
none — Best for quantitative evaluation: A/B comparisons, ratings, labeling. No moderator overhead, results in minutes. Use this when you have concrete items to evaluate and need fast, structured signal.
session — Best for qualitative feedback on a specific product or topic. Each session is independent — the moderator follows the same script but adapts to each participant. Use this for usability tests, concept validation, or exploratory interviews.
study — Best for deep research. The moderator builds understanding across sessions, adaptively probing gaps and emerging themes. Use candor study coverage to see exploration state and decide when you have enough signal.
Output modes
For session and study scopes, the --moderator-output flag controls how the AI communicates with participants:
voice (default) — The AI speaks aloud. Full two-way voice conversation with the participant.
text — The AI monitors audio and sends text-based interventions only. Useful for think-aloud studies where you want the participant to narrate freely while the AI observes and prompts via on-screen messages.
Findings
Each finding includes the title, description, category, affected feature/page, number of times mentioned, key participant quotes, and a suggested action.
Pricing
Moderated studies (url/topic): ~$14.50 per participant session, covering recruitment, AI-moderated interview, transcripts, and findings synthesis. A typical 5-participant study costs ~$65–$75.
Item-based studies (compare/rate/label): ~$0.05–0.15 per task assignment. A typical pairwise study with 6 items (15 pairs, 3 workers each) costs ~$4.50–$6.75.
Your agent will always show the cost estimate before you approve.
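The item-based arithmetic above is easy to check: 6 items give C(6, 2) = 15 unordered pairs, and 3 workers per pair means 45 task assignments. A quick sketch, using the per-task price range quoted above:

```python
from math import comb

def pairwise_cost_dollars(n_items: int, workers_per_pair: int, cents_per_task: float) -> float:
    """Pairs x redundancy x per-task price, converted to dollars."""
    return comb(n_items, 2) * workers_per_pair * cents_per_task / 100

# 6 items -> C(6, 2) = 15 pairs; 3 workers each -> 45 task assignments
print(pairwise_cost_dollars(6, 3, 10), pairwise_cost_dollars(6, 3, 15))  # 4.5 6.75
```

That reproduces the ~$4.50–$6.75 range for the typical study described above.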
Evaluating AI agents
This guide walks through using Candor to evaluate the output of an AI agent against a rubric. It covers how to decompose a multi-criteria eval into a single study, how to pick the right task type, how to write a rubric that produces reliable labels, how content renders to your labelers, and how to preview cost before anything charges.
The worked example is a research assistant agent that reads source documents and writes summaries — but the same pattern applies to any multi-source AI agent: research assistants, customer support triage bots, GTM tools, code review agents, or anything else where you want human judgement on whether the output is good.
Scope. This guide focuses on item-based studies (your agent produced some outputs, and you want humans to label them). For moderated interviews where humans talk to a Candor AI moderator about a product, see the Studies concept section above.
Breaking down a complex eval
Most real agent evals are multi-dimensional. You don't just want to know "is this output good?" — you want to know why it's good or bad across several orthogonal axes. Take a research assistant that reads a set of source documents and produces a summary. Four reasonable things to measure:
- Did the agent retrieve the right source documents for the question?
- For every claim in the output, is the source correctly attributed?
- Is the summary faithful to what the sources actually say?
- Does the output surface the most important findings — or bury them?
The wrong instinct here is to run four separate studies, one per question. That costs 4× the labeler time (each labeler has to re-read the same output four times) and gives you four disjoint score streams that are hard to join.
The right pattern is one item per agent output, scored along multiple dimensions in a single pass. Each labeler sees one output, answers all four questions on it at once, and moves on. In Candor this is a scorecard task with four criteria.
If your dimensions are all orthogonal (measuring different things), fold them into one scorecard. If they aren't — if two of them are really asking the same thing — drop one.
Picking a task type
Candor supports five item-based task types. The choice is not about the mechanic — it's about the question you want answered.
- "Is this output good across several dimensions?" — scorecard — Multiple weighted criteria, behavior-anchored levels, a weighted overall score. The right tool for almost every agent-eval scenario.
- "Is version A of my agent better than version B?" — compare (pairwise) — Participant sees two outputs side by side, picks the winner. Use when absolute judgement is hard but relative is easy — e.g. comparing two model versions, two prompts, or two generation strategies.
- "Which category does this output fall into?" — label — Participant picks one label from a fixed set. Good for taxonomies (e.g. intent classification, sentiment, failure-mode tagging).
- "On a 1–5 scale, how good is this?" — rate — Single-dimension rating. Simpler than a scorecard, but you lose the per-dimension breakdown. Use only when you genuinely care about just one axis.
- "What did you notice? Freeform." — describe (free text) — Participant writes an open-ended response. Great as a companion task to catch failure modes you didn't anticipate in your rubric — but hard to aggregate, so don't use it as your primary signal.
For the research-assistant example — and for essentially any multi-criteria eval — scorecard is the right choice. The rest of this guide assumes scorecard.
A quick word on pairwise, since it's the least obvious one: pairwise is useful when you're comparing two versions of something (two model variants, two prompt templates, two retrieval strategies) and absolute quality is subjective. People are much better at saying "A is better than B" than at saying "A is a 4 out of 5". But pairwise only gives you relative rankings — it can't tell you whether any of the options are actually good.
Designing a rubric
A scorecard is a set of criteria. Each criterion has a name, a weight, and an ordered list of levels from worst to best. Good rubrics produce reliable labels; sloppy rubrics produce inter-rater disagreement and findings you can't act on.
Keep dimensions to 3–5
More than five and labelers get fatigued; their later answers get sloppier. Fewer than three and you lose the whole point of a multi-dim eval. If you catch yourself wanting six dimensions, usually two of them are measuring the same thing.
Use behavior-anchored levels
A level description should tell a labeler exactly what they're looking for — ideally something they can observe or verify without having to make a judgment call. Compare:
When two labelers look at the same output and pick different levels, it's almost always because the level wording was ambiguous — not because the output was genuinely borderline. Invest in the wording.
Use 4 levels when you can
An even number of levels forces labelers off the fence. Three or five levels give them a comfortable middle option ("Satisfactory", "Neutral") that they'll reach for whenever they're unsure — which means your middle bucket absorbs noise and tells you nothing. Four levels makes them pick a side.
Weights are relative
Weights only matter in ratio to each other. [5, 3, 3, 1] and [10, 6, 6, 2] produce identical overall scores. Use small integers and pick weights that reflect how you'd actually explain the decision to a stakeholder: "accuracy is way more important than formatting" becomes weight: 5 for accuracy and weight: 1 for formatting.
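The ratio claim can be verified directly. This sketch computes a weighted overall score with both weight vectors; because one is a scalar multiple of the other, the results are identical:

```python
def weighted_overall(scores, weights):
    """Weighted average of per-dimension scores; weights only matter in ratio."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

scores = [0.9, 0.6, 0.75, 0.4]  # per-dimension mean scores, normalized 0.0-1.0
a = weighted_overall(scores, [5, 3, 3, 1])
b = weighted_overall(scores, [10, 6, 6, 2])  # same ratios, doubled
assert abs(a - b) < 1e-12  # identical overall score
```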
Here's the criteria JSON for the research-assistant example:
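Sketched here as a Python literal; the { name, weight, levels[] } shape is the one the scorecard task requires, while the specific names, weights, and level wordings are illustrative:

```python
# Criteria for the research-assistant scorecard (illustrative content).
# Levels are ordered worst -> best; four levels per criterion, no middle option.
criteria = [
    {
        "name": "Retrieval",
        "weight": 3,
        "levels": [
            "Missed most of the relevant sources",
            "Found some relevant sources, missed key ones",
            "Found most relevant sources",
            "Found all relevant sources and no irrelevant ones",
        ],
    },
    {
        "name": "Attribution",
        "weight": 3,
        "levels": [
            "Most claims unattributed or wrongly attributed",
            "Several claims misattributed",
            "Minor attribution gaps",
            "Every claim traces to the correct source",
        ],
    },
    {
        "name": "Accuracy",
        "weight": 5,  # highest weight: trusting the summary matters most
        "levels": [
            "Contradicts the sources",
            "Noticeable distortions or unsupported claims",
            "Faithful with minor imprecision",
            "Faithful to the sources throughout",
        ],
    },
    {
        "name": "Prioritization",
        "weight": 1,
        "levels": [
            "Key findings missing",
            "Key findings present but buried",
            "Key findings surfaced, ordering uneven",
            "Most important findings lead the summary",
        ],
    },
]
```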
Note that levels are ordered worst to best, and accuracy has the highest weight because this team cares most about whether they can trust the summary. Your weights should reflect your own priorities.
Content formats — what labelers actually see
Before you prepare items, you need to know what your labelers will see on screen. Each item has three display slots:
- label — short text. Plain text only — no Markdown or HTML.
- assetUrl + mimeType — media. Supports image/*, audio/*, and video/* only. Not HTML, not text files.
- metadata.description — optional small gray caption rendered below the main content. Good for subtitles like "Q3 2026 run — v2.4 of the agent".
When you set both label and assetUrl, both render — the media shows prominently and the label renders as a caption right next to it. It's not one or the other.
Here's exactly what labelers see for each combination:
Image + label + description
Text only (no assetUrl)
Text-only items are fine for short strings (a headline, a single sentence, a filename) but not for anything longer. There's no wrapping control, no formatting, no line breaks — just a single styled span inside a gray box. A long paragraph will render as one ugly run-on.
Audio
Video
Pairwise (two items side-by-side)
Pairwise is just ItemRenderer rendered twice in a grid with "Option A" / "Option B" labels on top. Whatever an item looks like on its own, it looks the same inside a pairwise comparison.
Rendering long-form text output
This is the practical gotcha for agent evals: your agent produces long structured text, and the item renderer has no way to show it cleanly. There is no HTML path, no Markdown path, no long-text path.
The reliable workaround is to render your agent's output to HTML yourself, screenshot the rendered page, host the PNG, and pass the PNG as an assetUrl. You get full control over typography, syntax highlighting, sections, and wrapping, and it renders exactly the same way to every labeler. Use label as a short caption identifying which run/version the screenshot came from.
The item label field does not support Markdown or HTML. If your content has any formatting — headings, bullets, code blocks, tables — you must screenshot it before uploading, or labelers will see a run-on string in a gray box.
Hosting your assets
Candor does not proxy your media. Whatever URL you pass as assetUrl must be publicly reachable — from both Candor's servers and every labeler's browser. There are two fetch paths to keep in mind.
Server-side probing (audio/video only)
When you create a study with audio or video items, Candor's server immediately calls fetch(assetUrl) to measure duration. This feeds the reward estimator so the auto-calculated payout matches the actual task length. If the URL isn't reachable from Candor's servers, the probe silently fails and your reward estimate will be off — but the study still creates. Image-only studies skip this step entirely.
Client-side loading (all media)
When a labeler actually takes the task, their browser loads assets directly via <img src=...>, <audio><source>, and <video><source>. There is no proxy. If the URL requires auth headers, a session cookie, a VPN, or is behind your corporate firewall, the labeler sees a broken-image icon and your data for that item is garbage.
Quick check: can you open the asset URL in an incognito browser window with no cookies and no VPN? If yes, labelers can see it. If no, they can't.
Practical hosting choices
- Public CDN path — simplest and most reliable. Cloudflare, CloudFront, Bunny, Fastly, anything that returns your file on a plain GET.
- S3 presigned URLs — work fine, but pick an expiration longer than your study duration. A study can run for days; presign for at least 7 days.
- Google Drive / Dropbox share links — often don't work, because the share URL redirects through a preview page that <img> can't render. If you must use them, find the direct-download URL and use that instead.
- Localhost / intranet / VPN-only URLs — will never work. Candor's servers can't reach them and neither can outside labelers.
Screenshots of your agent's output may contain sensitive data (customer names, internal documents, API keys that leaked into logs). Sanitize before you upload, or keep the study internal by using platform: "direct" and sharing only with your team. Once a URL is in an assetUrl, anyone who receives a participant link can open it.
Getting reliable labels
A scorecard is only as good as the people filling it out. Agent evals are usually specialist tasks — "is this retrieval correct?" or "is this attribution complete?" requires someone who understands the domain. A crowd worker recruited in 10 minutes almost certainly can't answer these correctly.
For agent evals, the right default is direct-link studies labeled by your own team. Set platform: "direct" in the create payload, then share the resulting URLs with a small group of domain experts — engineers, product folks, or whoever knows what a good output looks like. Candor's managed recruitment is better for broader-audience questions (does a marketing message resonate?) than for technical quality judgements.
How many labelers per item?
The participants field controls redundancy — how many people see each item. One labeler is cheapest but gives you zero agreement signal: if someone makes a mistake, you'll never notice. Three labelers is the de facto standard — enough to compute agreement and catch outliers without being wildly expensive. Use five or more for decisions that really matter (model launch / kill calls).
A sensible rollout is: start with 1–2 internal labelers for a dry run, make sure the rubric is sane, then scale up to 3 labelers for the real evaluation.
End-to-end walkthrough
Here's the whole flow for the research-assistant example — from rubric to results — using real API calls.
Step 1: Prepare items
For each agent output you want evaluated, render it to HTML (whatever rendering your agent already produces), screenshot it to a PNG, and upload the PNG to a public CDN. You end up with one URL per output. The label should be a short identifier the labeler can reference if they need to report a problem.
Step 2: Create a draft study
A few things to notice about that payload: task: "scorecard" and criteria go together; platform: "direct" keeps this study internal (no external recruitment); participants: 3 gives you three labelers per item for agreement signal.
Step 3: Inspect the estimate
The create response gives you everything you need to decide whether to go ahead. Nothing has charged yet — this is a draft.
- totalTasks (number) — How many individual item scorings will happen. items.length × participants gives the upper bound; Candor may bundle multiple tasks into one assignment.
- totalAssignments (number) — How many distinct labeler sessions. Divided by your participants field, this tells you how many share URLs the approve step will give you.
- estimatedCostCents (number) — Lower-bound cost estimate, mostly useful as context. Prefer totalCostCents when budgeting.
- totalCostCents (number) — The number you actually pay. For platform: "direct" item studies with internal labelers, this is typically very small — no recruitment fees apply.
- estimatedMinutes (number) — How long Candor expects a labeler to take per assignment. Useful for setting expectations with your team.
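As a sanity check on a draft's estimate, the relationships between these numbers are simple arithmetic. The sketch below assumes items are split into batches of the task's default batch size; the exact bundling is Candor's call, which is why items.length × participants is only an upper bound. The item and participant counts are made up:

```python
import math

n_items, participants = 12, 3   # a hypothetical draft
batch_size = 10                 # scorecard default batch size

total_tasks = n_items * participants  # upper bound: 36 individual scorings
# Each labeler needs ceil(12 / 10) = 2 assignments under this batching assumption:
total_assignments = math.ceil(n_items / batch_size) * participants
print(total_tasks, total_assignments)  # 36 6
```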
Step 4: Preview what labelers will see
Before you approve, open the study in a browser as if you were a labeler:
Walk through the task end-to-end, check that your asset URLs load, that the rubric reads the way you expected, and that the levels are unambiguous. This works on drafts and doesn't consume a real assignment. See Previewing a task for the programmatic flow if you want to embed the preview URL in your own tooling.
Step 5: Approve, or delete and retry
If the estimate looks fine and the preview checks out, approve the study. This is the only endpoint that debits your balance. For direct-link studies it costs nothing at approve time; for managed recruitment it deducts the full totalCostCents.
If the estimate is wrong, too expensive, or you want to tweak the rubric, delete the draft and create a new one. Drafts are free to create and destroy — iterate as much as you want before approving.
Step 6: Share links with labelers
For direct-link studies, the approve response includes a shareUrls array — one URL per labeler slot. Send each URL to someone on your team; they follow the link, complete the tasks, and submit. No accounts, no signup, no Candor-side recruitment.
Step 7: Read the results
Once labelers have submitted, GET /api/studies/:id/results returns a per-item, per-dimension breakdown. Each item shows its weighted overall score and the per-dimension detail.
Two things to know about these numbers:
- meanScore is normalized to 0.0–1.0, not 0–5. A 4-level rubric with unanimous top picks gives 1.0; unanimous worst gives 0.0.
- overallWeightedScore is the weighted average of all meanScore values across dimensions, using the weight field. It's the single number you'll quote in dashboards.
- levelDistribution is a raw count of how many labelers picked each level — useful for spotting disagreement (a 1/1/1 split across three labelers is a flashing red light).
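A sketch of how level picks map to those normalized numbers. It assumes each pick's 0-based level index is scaled by (levels − 1) and averaged; the exact aggregation is Candor's, but this reproduces the stated endpoints (unanimous top = 1.0, unanimous worst = 0.0):

```python
def mean_score(level_picks: list[int], n_levels: int) -> float:
    """Normalize 0-based level picks (worst=0) to a 0.0-1.0 mean score."""
    return sum(p / (n_levels - 1) for p in level_picks) / len(level_picks)

assert mean_score([3, 3, 3], 4) == 1.0   # unanimous top pick on a 4-level rubric
assert mean_score([0, 0, 0], 4) == 0.0   # unanimous worst
print(mean_score([3, 2, 3], 4))          # two top picks, one third-level pick
```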
In the example above, the agent is strong on retrieval and accuracy but weak on attribution and prioritizing findings. That's an actionable signal: fix the attribution chain and the findings-ranking step, then re-run with a new batch of items to see if the scores improve.
Previewing a task before you share it
Before you approve a study and send links to labelers, you almost always want to open it yourself and walk through the task the way a labeler would. Candor gives you a preview URL for this — it works on drafts, so you can see exactly what the task looks like before anything charges and without using up a real assignment slot.
The easy path: one URL, no API calls
Open this in any browser:
That page creates a preview assignment, redirects you to the real participant UI, and appends ?preview=true so submissions aren't recorded. You see every task in order — the full rubric, the actual item renders, the submit button, everything. Share the URL with a teammate and they can review without needing a Candor account.
The programmatic path
If you want to embed the preview URL in your own tooling (e.g. show it next to the estimate in a custom dashboard), call the preview endpoint directly:
The response gives you a relative URL you can prefix with the API base:
Append ?preview=true and hand it to whoever needs to see it. Moderated studies return { type: "moderated", url, sessionToken } instead — the URL drops you into a full moderated session so you can walk the interview script end-to-end.
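Gluing that together is just string assembly. The base URL and ?preview=true flag are as described above; the relative URL value here is illustrative, not a real session path:

```python
API_BASE = "https://candor.sh"

def preview_link(relative_url: str) -> str:
    """Prefix the endpoint's relative URL with the API base and add the preview flag."""
    sep = "&" if "?" in relative_url else "?"
    return f"{API_BASE}{relative_url}{sep}preview=true"

print(preview_link("/session/abc123"))  # https://candor.sh/session/abc123?preview=true
```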
What preview does and doesn't do
- Works on drafts — you don't need to approve first.
- Shows the exact same ItemRenderer your real labelers will see, loading your real asset URLs. If something renders weird in preview, it will render weird for labelers too.
- Creates a real assignment row tagged preview_*. These are filtered out of progress, results, and the activity feed, so they won't pollute your real data.
- Multiple calls create multiple preview assignments — harmless, but they accumulate.
To clean up, delete them whenever you want:
This removes every preview_* assignment (and its tasks) for that study. Real participant data is never touched.
Previewing cost before you commit
To recap the money flow clearly, so you can build it into whatever tooling you use to drive Candor:
- POST /api/studies — always free. Creates a draft, returns estimate. Nothing charges.
- DELETE /api/studies/:id/delete — also free. Deletes the draft; no residual state.
- POST /api/studies/:id/approve — the only endpoint that debits your balance. For direct-link item studies it costs nothing (platform-internal); for managed recruitment it deducts the full totalCostCents you saw in the estimate.
- POST /api/studies/:id/publish — also free. Just flips the study from ready-to-publish into active recruitment on the provider.
The safe iteration loop is:
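In Python it looks roughly like this. The endpoint paths come from the list above; `client` and `revise` are stand-ins for your HTTP layer and your own payload-tweaking logic:

```python
def iterate_until_approved(client, payload, budget_cents, revise):
    """Create drafts until the estimate fits the budget, then approve.

    `client` stands in for your HTTP layer (any object with post/delete
    methods returning parsed JSON); `revise` adjusts the payload, e.g.
    trimming participants or simplifying the rubric.
    """
    while True:
        study = client.post("/api/studies", json=payload)              # free: draft + estimate
        if study["estimate"]["totalCostCents"] <= budget_cents:
            return client.post(f"/api/studies/{study['id']}/approve")  # the only charging call
        client.delete(f"/api/studies/{study['id']}/delete")            # free: discard the draft
        payload = revise(payload)
```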
You can create and destroy as many drafts as you want while tuning your rubric — nothing charges until you explicitly approve. That's the whole cost-preview story.
API Reference
Every feature the CLI uses is available over HTTPS. Create an API key from the API Keys page in the dashboard, then call the same endpoints the CLI does. All responses are JSON.
Base URL: https://candor.sh/api
Authentication
Pass your API key as a bearer token in the Authorization header.
Keys look like ck_<48-hex>. Treat them like passwords — anyone with a key can read and modify your studies. Revoke compromised keys from the dashboard.
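A minimal authenticated call using only the standard library. The header shape is as described above; the wrapped GET is a sketch, and the key value is a placeholder:

```python
import json
import urllib.request

def auth_headers(api_key: str) -> dict:
    """Bearer-token header Candor expects; keys look like ck_<48-hex>."""
    return {"Authorization": f"Bearer {api_key}"}

def candor_get(api_key: str, path: str) -> dict:
    """GET a path under the API base with bearer auth; responses are JSON."""
    req = urllib.request.Request(
        f"https://candor.sh/api{path}", headers=auth_headers(api_key)
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# e.g. candor_get("ck_<your-key>", "/studies") returns your study list
```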
Studies
The three study types
Every study has a stimulus type that decides what participants see and how they respond. You pick a stimulus by sending exactly one of items, url, or topic in the create payload — Candor infers the rest.
- Item studies — triggered by items. Participants evaluate a discrete set of things (images, audio clips, copy variants, model outputs). No voice — everything happens in a structured browser UI. Use when you have alternatives to compare or want to label/rate a collection.
- URL studies — triggered by url. Participants visit a live website or product and talk through their experience with an AI moderator. You can pass an interview guide; Candor generates one from your goal if you don't. Use when you're testing a real product and want qualitative feedback on specific flows.
- Topic studies — triggered by topic. Same moderated-interview format as URL studies, but without a product to test. Participants discuss a subject with the AI moderator. Use for discovery research, concept testing, or any interview that isn't tied to a live UI.
Create a study
POST /api/studies
goal is the only universally-required field. Everything else depends on the stimulus type. Candor fills in sensible defaults (task, moderator scope, reward, batch size) based on the combination you pass, so a minimal request is usually enough to start.
Parameters shared by all study types
- goal (string, required) — Plain-language description of what you want to learn. Used as the study name if name is not provided, and fed to the moderator / task generator to shape the participant experience.
- participants (number, default: 5) — For moderated studies (URL/topic): the number of sessions to run. For item studies on the direct platform: the number of independent shareable links. For item studies on a managed platform: the number of respondents per task batch.
- audience (string) — Natural-language audience description used when Candor is recruiting for you (e.g. "US designers aged 25-40 who use Figma daily"). Ignored for platform: "direct".
- platform ("direct" | "managed", default: "direct") — Controls how participants are sourced. direct gives you a URL you share yourself (free, no recruitment fees). managed has Candor recruit from a vetted pool using your audience string.
- reward (number, cents) — Per-session (moderated) or per-assignment (items) payout. If omitted, Candor auto-calculates based on task type, media duration, and platform fees. Set it only if the auto-estimate feels wrong.
- rewardMultiplier (number) — Multiplier applied to the auto-calculated reward. Use for hard-to-recruit audiences where you want to pay above market without picking an exact number.
Item studies
Send a non-empty items array. Each item needs a label and optionally an assetUrl (for images, audio, video) plus mimeType. Candor probes audio and video durations at creation time to size the reward and batch correctly — if you pass assetUrl, make it publicly reachable.
The task field decides what participants do with each item. If you omit it, Candor picks one by reading your goal (and whether you passed labels).
Task types for item studies
- task: "compare" (or rank) — 20 pairs per assignment. Shows two items side by side; participant picks the winner (or ties). No extra fields required. Candor generates all pairs for small item sets, or samples for >100 items.
- task: "rate" (or score) — 20 items per assignment. Shows one item; participant rates 1–5. No extra fields required.
- task: "label" (or categorize) — 20 items per assignment. Shows one item; participant picks from a fixed label set. Pass labels: string[] — e.g. ["positive", "neutral", "negative"]. Required.
- task: "describe" (or transcribe / respond / review) — 20 items per assignment. Shows one item; participant writes an open-ended response. No extra fields. Pick the verb that best matches your instructions: describe for observations, transcribe for audio/video, review for evaluative writing.
- task: "scorecard" — 10 items per assignment. Shows one item; participant evaluates it across multiple rubric dimensions. Pass criteria: { name, weight, levels[] }[]. Required. Each criterion produces a dimension score; the overall score is weighted.
Batch size (batchSize) is how many tasks go into a single worker's assignment — defaults above. Override it if you want shorter or longer sessions.
For very large item sets (more than a few hundred), post the first chunk in the create call and append the rest via POST /api/studies/:id/items in batches of up to 500.
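A sketch of that chunked append. The 500-item cap is from above; the POST calls are indicated in comments rather than performed, since the HTTP layer is yours:

```python
def chunks(seq, size=500):
    """Split a list into consecutive batches of at most `size` items."""
    return [seq[i : i + size] for i in range(0, len(seq), size)]

items = [{"label": f"output-{i}"} for i in range(1234)]  # illustrative item list
first, rest = items[:500], items[500:]
# POST /api/studies with `first` in the create body, then for each
# batch in chunks(rest): POST /api/studies/:id/items with {"items": batch}
print([len(b) for b in chunks(rest)])  # [500, 234]
```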
URL studies
Send a url to the page you want tested. Candor automatically generates a short interview guide from your goal — or pass your own in interviewGuide (plain text / Markdown; Candor converts it to the internal script format via Claude).
- url (string, required) — The page participants will interact with.
- displayMode ("iframe" | "tab", default: "iframe") — iframe embeds the page inside Candor's session UI alongside the moderator panel. tab opens it in a new browser tab — use this for sites that refuse to be iframed.
- durationMinutes (number, default: 5) — Target session length. Drives the auto-estimated reward and the generated interview script's section timing.
- interviewGuide (string) — Your own interview script as plain text. Sections, questions, and tasks are preserved verbatim. If omitted, Candor generates a 2–3 section script from your goal.
- moderator ("none" | "session" | "study", default: "session") — session runs one moderator per participant (default). study runs one moderator across all sessions with adaptive coverage — cheaper, but participants don't each get a full interview. none disables the moderator entirely (rarely useful for URL studies).
- moderatorOutput ("voice" | "text", default: "voice") — voice: the AI moderator speaks aloud via TTS and expects participants to reply with their voice (a full spoken interview). text: the AI monitors participant speech via STT but responds only with text prompts on screen. Default is voice for all moderated studies (URL, topic, and item follow-ups); pass text to opt out of TTS.
- inputModes (string[]) — Explicit list of input channels participants can use, e.g. ["voice", "text"]. Default is voice only.
Topic studies
Send a topic string. Everything else works the same as URL studies — same moderator options, same interview guide handling, same recruitment choices. The only difference is participants don't see a website during the session.
Direct vs managed recruitment
Every study has a platform that controls how participants are sourced. It's a creation-time choice — you can't switch a study between modes later.
- platform: "direct" — Candor gives you one shareable URL per participant slot. You send the links to whoever you want — teammates, your own user list, a beta group. No recruitment fees. The audience field is ignored.
- platform: "managed" — Candor handles recruitment against a vetted participant pool using the audience string you provide. The total cost per participant is visible on the create response under totalCostCents — nothing charges until you approve the study.
After creating a study
Every create returns status: "draft". Nothing charges, nothing recruits, nothing is visible to participants until you explicitly approve and publish:
Direct (link) studies skip the publish step — approve activates them and returns the shareable URLs inline.
List studies
GET /api/studies
Returns an array of studies. Append ?archived=true to include archived studies.
Get a study
GET /api/studies/:id
Append ?include=findings,participants to include related entities in the response. Example response body:
The platform field is direct for self-shared studies and managed when Candor handles recruitment.
Add items to a study
POST /api/studies/:id/items
Append additional items to an existing item-based study. Useful for large studies where the initial POST /api/studies body would exceed request size limits — send the first batch on create, then stream the rest here in chunks of 500.
Preview a task
The POST endpoint creates a disposable assignment and returns a url that opens the real participant UI in preview mode (no responses saved, preview rows filtered out of results). Works on drafts — you can preview before approving. For the browser-friendly shortcut, just visit https://candor.sh/study/preview/:id and the page handles the POST and redirect for you. See Previewing a task in the guide for the full flow.
Lifecycle transitions
The pause and archive endpoints also handle the inverse operation. Pause auto-toggles — calling it on a paused study resumes it — or pass { "action": "resume" } in the body to force it. Archive takes { "action": "unarchive" } to restore a study.
Results & findings
The output shape depends on the study type. Item studies return computed results that depend on the task; moderated (URL/topic) studies return prioritized findings extracted from session transcripts.
Append ?format=csv to any of these to download as CSV. Worker IDs in results and demographics are exposed as participantId (JSON) or external_participant_id (CSV) so you can correlate responses across calls without depending on any recruitment provider's ID format.
Item-study results — shape per task type
GET /api/studies/:id/results is valid for item studies only. The results object shape depends on the task you picked at creation time:
Pairwise comparison (task: "compare")
Rating scale (task: "rate")
Categorical label (task: "label")
Free text (task: "describe", "review", etc.)
Scorecard (task: "scorecard")
Moderated-study findings
GET /api/studies/:id/findings returns prioritized findings (P0–P3) that Candor extracts from session transcripts after sessions complete. This is the same P0–P3 scale used by the CLI's findings command — see the Findings concept section above for what each priority means.
Coverage (moderated studies only)
GET /api/studies/:id/coverage returns the themes participants have explored so far and which expected themes are still missing — useful mid-study to decide whether to keep running sessions or stop early.
Demographics (managed recruitment only)
GET /api/studies/:id/demographics works only for studies with managed recruitment. Returns one row per participant with demographic fields reported by the recruitment provider.
Account balance
GET /api/billing/balance
Returns your organization's prepaid balance in cents and whether a payment method is on file. Useful for showing a top-up prompt before creating an expensive study.
Webhooks
Instead of polling, subscribe to events and Candor will POST them to your endpoint as they happen. Create an endpoint from the Webhooks page — you'll get back a signing secret that you should store securely.
Payload format
Webhook bodies are intentionally minimal. Use GET /api/studies/:id to fetch the full state — this keeps payloads small and lets you process events in any order.
Event types
Events are grouped into four namespaces. Subscribe to the ones you need.
- study.* — lifecycle transitions (launched, published, paused, completed, cancelled)
- participant.* — participant events (joined, session_started, submitted, no_show)
- interaction.* — clicks, media playback, responses. High volume.
- transcript.* — per-utterance events from the AI moderator and participant. Very high volume.
Verifying the signature
Every request includes an X-Candor-Signature header with an HMAC-SHA256 of the raw body using your endpoint secret. Verify it before trusting the payload.
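A verification sketch in Python, standard library only. It assumes the X-Candor-Signature value is the hex digest of the HMAC over the raw body; check the encoding against your endpoint secret's dashboard page if signatures don't match. The constant-time comparison is the important part:

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in constant time.

    Assumes the header carries the hex digest (an assumption; adjust the
    encoding if your endpoint's docs say otherwise).
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

# Round-trip with a made-up secret and body:
secret = "example-endpoint-secret"
body = b'{"event":"study.completed","studyId":"abc123"}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
assert verify_signature(secret, body, sig)
assert not verify_signature(secret, b'{"tampered":true}', sig)
```

Verify against the raw bytes of the request body, not a re-serialized JSON object — re-serialization can reorder keys or change whitespace and break the digest.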
Delivery & retries
Your endpoint must respond with 2xx within 10 seconds. Non-2xx responses and timeouts are retried with exponential backoff (up to 3 retries = 4 attempts total). Persistently failing endpoints are flagged in the dashboard as Failing but not automatically disabled — you'll see the state change on the Webhooks page and can pause or delete the endpoint from there.
Webhook delivery is at-least-once. Use X-Candor-Delivery-Id to deduplicate on your side if you're doing anything non-idempotent.
Other headers you'll see on each delivery:
- X-Candor-Event — event type (e.g. study.completed)
- X-Candor-Delivery-Id — unique delivery ID, useful for idempotency
- X-Candor-Timestamp — when the event was emitted
Errors
All error responses use standard HTTP status codes and return a JSON body with an error field.
- 400 — Bad request — missing or invalid parameters
- 401 — Unauthorized — missing or invalid API key
- 402 — Payment required — insufficient account balance
- 404 — Not found — study, endpoint, or resource does not exist
- 422 — Unprocessable — validation failed (e.g. pre-flight check)
- 500 — Server error — try again or contact support