Evaluation & Ranking

Run pairwise comparisons, numeric ratings, and A/B tests on audio samples, images, videos, or text. Candor produces ranked leaderboards with win rates, agreement scores, and statistical confidence.

Pairwise Audio Comparison

Rank text-to-speech models, music tracks, or podcast intros by having reviewers compare pairs and pick a winner. Candor generates all possible pairs, randomizes presentation order, and produces a stack ranking with win rates and inter-rater agreement.
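
To make the pairing and ranking steps concrete, here is a minimal sketch in plain Python. It is not Candor's API or implementation: the function names (make_pairs, win_rates, leaderboard) and the vote format are assumptions chosen for illustration, and win rate is taken to mean wins divided by comparisons shown.

```python
import itertools
import random
from collections import Counter


def make_pairs(items, seed=None):
    """Return every unordered pair exactly once.

    Both the left/right position within a pair and the sequence of
    pairs are shuffled so presentation order does not bias reviewers.
    """
    rng = random.Random(seed)
    pairs = [list(pair) for pair in itertools.combinations(items, 2)]
    for pair in pairs:
        rng.shuffle(pair)   # randomize which item appears first
    rng.shuffle(pairs)      # randomize the order pairs are shown in
    return [tuple(pair) for pair in pairs]


def win_rates(items, votes):
    """Compute wins / comparisons for each item.

    votes is an iterable of (left, right, winner) tuples
    (a hypothetical format, assumed for this sketch).
    """
    wins = Counter()
    shown = Counter()
    for left, right, winner in votes:
        if winner not in (left, right):
            raise ValueError(f"winner {winner!r} is not in the pair")
        shown[left] += 1
        shown[right] += 1
        wins[winner] += 1
    return {
        item: (wins[item] / shown[item] if shown[item] else 0.0)
        for item in items
    }


def leaderboard(items, votes):
    """Sort items by win rate, highest first."""
    rates = win_rates(items, votes)
    return sorted(rates.items(), key=lambda entry: entry[1], reverse=True)
```

With n items this produces n * (n - 1) / 2 pairs, so the number of comparisons grows quickly as items are added. A production ranking might use a model such as Bradley-Terry or Elo instead of raw win rate; the source does not say which method Candor uses.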

Image Quality Ranking

Compare UI mockups, product photos, or AI-generated images side by side. Reviewers pick winners in randomized pairs, producing a ranked leaderboard. Useful for design reviews, asset selection, and visual quality benchmarking.

A/B Copy Testing

Test marketing headlines, email subject lines, or button copy by running pairwise comparisons across reviewers. No files are needed, just text labels. Get ranked results with agreement metrics to pick the copy that resonates most.
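
One simple way to think about an agreement metric is the share of reviewers who sided with the majority on each pair. The sketch below computes that in plain Python. The metric, the function names, and the vote format are assumptions for illustration; the source does not specify how Candor calculates agreement.

```python
from collections import Counter, defaultdict


def majority_agreement(votes):
    """Return, for each pair, the share of votes won by its most-picked option.

    votes is an iterable of (reviewer, option_a, option_b, winner)
    tuples (a hypothetical format). The pair is keyed by a frozenset
    so that presentation order does not matter.
    """
    by_pair = defaultdict(Counter)
    for _reviewer, option_a, option_b, winner in votes:
        by_pair[frozenset((option_a, option_b))][winner] += 1
    return {
        pair: max(counts.values()) / sum(counts.values())
        for pair, counts in by_pair.items()
    }


def mean_agreement(votes):
    """Average the per-pair majority share; 0.0 when there are no votes."""
    per_pair = majority_agreement(votes)
    return sum(per_pair.values()) / len(per_pair) if per_pair else 0.0
```

A value of 1.0 means every reviewer picked the same option on every pair, while values near 0.5 mean reviewers were split. This raw share is not corrected for chance agreement, which statistics such as Fleiss' kappa do account for.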

Rating on a Scale

Have reviewers rate design mockups, prototypes, or creative assets on a numeric scale. Get per-item averages, standard deviations, and distributions. Useful when you need absolute scores rather than relative rankings.
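
The per-item statistics described above can be sketched with Python's standard library. The function name summarize_ratings and the input shape (a dict mapping each item to its list of scores) are assumptions for illustration, not Candor's API.

```python
import statistics
from collections import Counter


def summarize_ratings(ratings):
    """Summarize numeric ratings per item.

    ratings maps an item name to a list of scores (a hypothetical
    input shape). Items with no scores are skipped.
    """
    summary = {}
    for item, scores in ratings.items():
        if not scores:
            continue
        summary[item] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            # Sample standard deviation needs at least two scores.
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            # How many reviewers gave each score, ordered by score.
            "distribution": dict(sorted(Counter(scores).items())),
        }
    return summary
```

Two items can share the same average while one has a much larger standard deviation, which is a sign that reviewers disagree about it. Looking at the distribution alongside the mean makes that visible.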

Video Quality Scoring

Score video content, animations, or generated clips on quality metrics. Reviewers watch and rate each video, producing aggregate scores with inter-rater reliability. Supports mp4, mov, avi, mkv, and wmv formats.
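
Before uploading a batch of clips, it can help to check file extensions against the formats listed above. The following is a small local pre-check in plain Python. The helper name split_by_support is hypothetical, and the check looks only at the file extension, not at the actual container or codec.

```python
from pathlib import Path

# Formats listed above as supported.
SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".wmv"}


def split_by_support(paths):
    """Split paths into (supported, unsupported) by file extension.

    The comparison is case-insensitive, so "clip.MP4" counts as mp4.
    """
    supported, unsupported = [], []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```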