Annotation & Labeling

Build training datasets, audit model predictions, or label content for compliance. Define your own taxonomy and get human-verified classifications with label distributions and agreement metrics. Supports text, images, audio, and video.

Content Moderation

You have a batch of user-generated content and need it labeled for policy compliance. Reviewers categorize each item into your custom label set — safe, borderline, violation, or whatever you define. Get label distributions and agreement metrics per item, and scale to thousands when you need production-grade moderation data.

How you'd run it
$ candor study create --goal "label content for policy compliance" \
--items "posts.csv" --task categorize \
--labels "safe,borderline,violation" --recruit --participants 5
What you get back
Item       Safe   Borderline   Violation   Agreement
post_1041  100%   0%           0%          1.00
post_1042  20%    60%          20%         0.40
post_1043  0%     20%          80%         0.73
post_1044  80%    20%          0%          0.73
240 items labeled · Fleiss' κ: 0.68 (substantial)
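The κ score summarizes agreement across the whole batch, corrected for chance. Candor computes it for you, but for reference, a minimal sketch of the standard Fleiss' κ calculation from raw label counts (the per-item agreement figures candor reports may use a different normalization than the Fleiss per-item proportion used here):

```python
from collections import Counter

def fleiss_kappa(items):
    """items: one Counter per labeled item, mapping label -> number of
    reviewers who chose it. Assumes the same reviewer count per item."""
    n = sum(items[0].values())                 # reviewers per item
    N = len(items)                             # number of items
    labels = {l for item in items for l in item}
    # observed agreement: mean per-item proportion of agreeing reviewer pairs
    P_bar = sum((sum(c * c for c in item.values()) - n) / (n * (n - 1))
                for item in items) / N
    # chance agreement from the marginal label distribution
    p = {l: sum(item[l] for item in items) / (N * n) for l in labels}
    P_e = sum(v * v for v in p.values())
    return (P_bar - P_e) / (1 - P_e)

# two items, five reviewers each: one unanimous, one split 1/3/1
print(fleiss_kappa([Counter(safe=5),
                    Counter(safe=1, borderline=3, violation=1)]))  # ≈ 0.35
```

Low κ on a batch usually means the label definitions are ambiguous, not that the reviewers are careless; tightening the taxonomy before scaling up is cheaper than relabeling.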

Image Classification

You have a set of images and want them sorted into categories. Define your label taxonomy and get human-verified classifications with confidence scores and inter-annotator agreement, useful for ML training data, QA, or content tagging at scale.

How you'd run it
$ claude "classify these product images: electronics, clothing, home, food"
What you get back
Item          Label        Confidence   Agreement
IMG_2041.jpg  electronics  100%         5/5
IMG_2042.jpg  clothing     80%          4/5
IMG_2043.jpg  home         60%          3/5
IMG_2044.jpg  food         100%         5/5
500 images · Fleiss' κ: 0.84 (almost perfect)
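The Label, Confidence, and Agreement columns are a majority vote over the reviewers' raw labels. A hypothetical sketch of that aggregation (the `aggregate` helper and vote format are illustrative, not candor's API):

```python
from collections import Counter

def aggregate(votes):
    """votes: {item: [one label per reviewer]} -> (majority label,
    confidence as % of votes, agreement as 'votes/reviewers')."""
    rows = {}
    for item, labels in votes.items():
        (top, n), = Counter(labels).most_common(1)   # top label and its vote count
        rows[item] = (top, round(100 * n / len(labels)), f"{n}/{len(labels)}")
    return rows

print(aggregate({"IMG_2042.jpg": ["clothing"] * 4 + ["home"]}))
# → {'IMG_2042.jpg': ('clothing', 80, '4/5')}
```

Items with low agreement are the ones worth routing to extra reviewers or adjudication.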

Sentiment Analysis

You have a CSV of customer reviews or support tickets and want them labeled by sentiment. Reviewers classify each one as positive, neutral, or negative — or whatever labels you define. Great for building training data or auditing ML model predictions at any scale.

How you'd run it
$ candor study create --goal "label sentiment" \
--items "reviews.csv" --task categorize \
--labels "positive,neutral,negative" --recruit --participants 5
What you get back
Corpus Summary (1,200 reviews)
Positive: 58.3% (700)
Neutral: 25.0% (300)
Negative: 16.7% (200)
Fleiss' κ: 0.81 (almost perfect) · 5 reviewers
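The corpus summary is just the distribution of final (majority) labels. A sketch of the arithmetic, using a 12-review stand-in for the 1,200-review corpus:

```python
from collections import Counter

# final majority label per review (illustrative stand-in data)
final_labels = ["positive"] * 7 + ["neutral"] * 3 + ["negative"] * 2

for label, n in Counter(final_labels).most_common():
    print(f"{label.capitalize()}: {100 * n / len(final_labels):.1f}% ({n})")
# Positive: 58.3% (7)
# Neutral: 25.0% (3)
# Negative: 16.7% (2)
```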

Audio Transcription

You have audio clips and want accurate text for each one. Reviewers listen and write out what they hear, giving you human transcriptions you can trust. Supports mp3, wav, ogg, flac, m4a, and other common formats.

How you'd run it
$ claude "transcribe these 12 audio clips with 3 reviewers each"
What you get back
clip_007.mp3 (3 transcriptions):
R1: "The quarterly results exceeded expectations by twelve percent"
R2: "The quarterly results exceeded expectations by 12 percent"
R3: "The quarterly results exceeded expectations by 12%"
12 clips · 36 transcriptions · Avg WER across reviewers: 0.03
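WER (word error rate) is the word-level edit distance between a transcription and a reference, divided by the reference length. A minimal sketch of the standard metric; note that real scoring pipelines normally normalize case, punctuation, and numerals first, since otherwise "twelve percent" vs. "12%" counts as an error:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    r, h = reference.split(), hypothesis.split()
    prev = list(range(len(h) + 1))                  # edit-distance DP, one row at a time
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                 # delete rw
                           cur[j - 1] + 1,              # insert hw
                           prev[j - 1] + (rw != hw)))   # substitute rw -> hw
        prev = cur
    return prev[-1] / len(r)

print(wer("exceeded expectations by twelve percent",
          "exceeded expectations by 12 percent"))   # → 0.2
```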

Image Description

You have images and want human-written descriptions of what's in them. Useful for alt-text generation, accessibility audits, or training vision models. Each reviewer describes what they see, and you get multiple perspectives per image.

How you'd run it
$ candor study create --goal "describe what you see" \
--items "screenshots/*.png" --task freetext \
--recruit --participants 5
What you get back
dashboard-home.png:
"A dark-themed analytics dashboard showing a line chart of
monthly revenue trending upward, with a sidebar navigation
listing Reports, Settings, and Team sections."
20 images · 100 descriptions · Avg length: 42 words