Rating Scale
Participants rate each item independently on a numeric scale. Candor handles batching, randomization, and statistical aggregation so you get per-item means, medians, and standard deviations — without building any UI or computing any statistics.
Candor does this so you don't have to
Independent evaluation
Each item is presented on its own: participants see one item at a time and assign a rating on the configured scale, with no cross-item comparison or anchoring effects.
Item order randomization
The order in which items appear is shuffled independently for every participant. This counteracts primacy and recency bias, ensuring each item gets a fair evaluation.
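A per-participant shuffle can be sketched as below. This is an illustrative approach, not Candor's actual implementation; the function name and the participant-seeded RNG are assumptions made for the example.

```python
import random

def item_order_for(participant_id: str, items: list) -> list:
    """Return a participant-specific shuffled copy of the item list.

    Seeding the RNG with the participant ID keeps one participant's
    order stable across reloads while still varying between participants.
    """
    rng = random.Random(participant_id)  # deterministic per participant
    shuffled = list(items)               # copy: never mutate the shared list
    rng.shuffle(shuffled)
    return shuffled
```

Because every participant gets an independent permutation, any position effect averages out across the pool rather than systematically favoring the first or last items.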
Multiple participants per item
Each item is rated by multiple participants — not just one. Overlapping judgments let Candor compute reliable averages and measure inter-rater agreement.
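One way to produce this overlap is a round-robin assignment that gives every item a fixed number of distinct raters. A minimal sketch, with hypothetical names and a hypothetical default of three ratings per item (not Candor's internal scheduler):

```python
from itertools import cycle

def assign_ratings(items, participants, ratings_per_item=3):
    """Assign each item to `ratings_per_item` distinct participants,
    cycling through the pool so the rating load is spread evenly."""
    if ratings_per_item > len(set(participants)):
        raise ValueError("not enough distinct participants for the requested overlap")
    pool = cycle(participants)
    assignments = {item: [] for item in items}
    for item in items:
        seen = set()
        while len(assignments[item]) < ratings_per_item:
            p = next(pool)
            if p not in seen:       # one rating per participant per item
                assignments[item].append(p)
                seen.add(p)
    return assignments
```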
Automatic aggregation
All individual ratings are automatically aggregated into per-item statistics. Mean, median, standard deviation, and inter-rater reliability are computed in real time.
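The aggregation step can be sketched with Python's standard statistics module. The function and field names here are illustrative, not Candor's API, and the sketch covers the basic per-item statistics only (inter-rater reliability metrics are omitted):

```python
from statistics import mean, median, stdev

def aggregate(ratings_by_item: dict[str, list[float]]) -> dict[str, dict]:
    """Collapse raw per-participant ratings into per-item statistics."""
    summary = {}
    for item, ratings in ratings_by_item.items():
        summary[item] = {
            "n": len(ratings),
            "mean": mean(ratings),
            "median": median(ratings),
            # stdev needs at least two data points
            "stdev": stdev(ratings) if len(ratings) > 1 else 0.0,
        }
    return summary
```

For example, ratings of 4, 5, and 4 for one item aggregate to a mean of about 4.33 with a median of 4.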
Use cases

AI-Generated Images
Scoring quality, realism, and aesthetic appeal of generated images across models or prompts.
Creative Assets
Rating design variants, brand assets, or marketing materials on specific quality dimensions.
Instruction Following
Evaluating how well LLM outputs follow instructions, measuring accuracy and completeness.
Design Mockups
Assessing UI designs, layouts, or prototypes for usability, clarity, and visual polish.
Video Quality
Benchmarking video generation models on motion quality, temporal consistency, and visual fidelity.