Training AI Through Teamwork
Creating a Balanced Human–AI Review Process
THE PROJECT
Overview
My Role
Design Lead
Led the design efforts across product research, user interviews, rapid prototyping, and usability testing.
Team
Engineering Lead
Billy Franklin
Researchers
Haley Scheer
Jessica Peng
Laura Heppell
Timeline
Apr 2025 - Jul 2025
Results & Impact
25%
Fewer reviewer errors after adding task visibility tools and confirmation prompts
90%
AI label accuracy maintained through human-in-the-loop checks, preventing drift
20%
Boost in cross-team trust scores after showing reviewer names and vote results in tasks
Guided, step‑by‑step tasks so no feedback is missed
Multi‑staff voting across seniority levels to refine labels
Feedback creates an auditable trail and continuously improves AI accuracy
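To make the audit trail concrete, here is a minimal sketch of how each reviewer action could be recorded. The field names and types are illustrative assumptions for this write-up, not curaJOY's actual schema.

// Illustrative data model for an auditable review trail.
// Field names are assumptions for this sketch, not curaJOY's real schema.
type Vote = "approve" | "reject";

interface ReviewEntry {
  taskId: string;          // the AI-flagged chat or label under review
  reviewerId: string;      // shown in the task so teams can see who reviewed it
  reviewerLevel: "junior" | "senior";
  vote: Vote;
  suggestedLabel?: string; // correction proposed when the AI label is wrong or missing
  note?: string;           // free-text explanation of the decision
  createdAt: string;       // ISO timestamp; entries are append-only
}

// The full trail for a task is every entry ever recorded, which can
// later be exported as training data for the AI.
type AuditTrail = ReviewEntry[];

Keeping every entry append-only is what makes the trail auditable, and the accumulated corrections are what feed back into AI training.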
REQUIREMENTS
Project Scope
Design a flow for human-in-the-loop moderation of AI-flagged chats and labels (or missing labels).
The system should:
Allow BCBAs of all experience levels to vote on the correct response and labels.
Enable participants to leave notes explaining their decisions.
Assign each task to 5 reviewers, with the system tallying votes to determine the final result (see the sketch after this list).
Use these results to further train curaJOY’s AI for higher accuracy.
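The tallying step in this scope can be sketched as a simple plurality count across the five assigned reviewers. The function below is an illustration under that assumption; the names and the tie-handling are mine, not the production logic.

// Illustrative vote tally for a task assigned to 5 reviewers.
// A plurality decides the final label; this is a sketch, not the production rule.
interface TaskVote {
  reviewerId: string;
  label: string; // the label this reviewer voted for
}

function tallyVotes(votes: TaskVote[], requiredReviewers = 5): string | null {
  // Wait until every assigned reviewer has voted before finalizing.
  if (votes.length < requiredReviewers) return null;

  // Count votes per label.
  const counts = new Map<string, number>();
  for (const v of votes) {
    counts.set(v.label, (counts.get(v.label) ?? 0) + 1);
  }

  // The label with the most votes becomes the final result.
  // Ties (e.g. a 2-2-1 split across labels) are left unresolved here
  // and would need an escalation rule in a real system.
  const sorted = Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
  const [topLabel, topCount] = sorted[0];
  const isTie = sorted.length > 1 && sorted[1][1] === topCount;
  return isTie ? null : topLabel;
}

A finalized result, together with the reviewers' notes, is what gets fed back into training so the AI's accuracy improves over time.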
The Challenge
Guarantee full review and correction of every label and conversation.
Build a feedback loop for peer learning while training the AI.
RESEARCH
Bridging The Seniority Gap
Competitor Analysis
To explore current labeling capabilities, I reviewed tools like Prodigy, Labelbox, and Encord. These platforms are easy to use and feature-rich, though many of their functions go beyond our current MVP needs. Key takeaways include:
Task-by-task flows help users focus and improve accuracy.
Quick thumbs-up/down actions can speed up the review process.
How might we create a review flow that ensures every task is caught and results stay accurate?
DESIGN
First Round Of Design
USABILITY TESTING
Method & Goals
Method
30-minute sessions, testing two main task flows for editing AI labels.
Goals
Identify usability issues and gather feedback on the process of providing input to train the AI.
Test Results
What Worked
Simple label changes via easy dropdown menus
Easy access to iteration history
Needs Improvement
Unclear who has reviewed each task
No visibility into voting results
No incentive for reviewers to leave feedback for AI training
Auto task-switching goes unnoticed, causing confusion
User Quotes
"I think I saved the changes already, why does this log still says To Review?"
" Are leaving notes or giving thumbs up or down optional? I don't think people will voluntarily do it if it is not required…"
"I think the history button is goo enough to help me go back and find previous revisions."
"To review or view the task, what is the difference? I am confused…"
"This task list is very clear, I have no questions about it."
DESIGN ITERATION
User-Driven Redesign
Using direct feedback, I fine-tuned key features to better fit user needs. After a few rounds of iteration and critique, here’s how the designs evolved.
WHAT I LEARNED
Bridging People And AI
I learned that having clear steps for both junior and senior reviewers keeps things running smoothly and ensures quality. Creating feedback loops—and giving people a reason to participate—not only helps train the AI, but also helps reviewers sharpen their own skills. I also saw how simple transparency, like showing who reviewed a task and what the votes were, builds trust in the process. And those small touches, like pop-ups or a quick “next task” confirmation, go a long way in keeping people from getting lost in the flow.