Training AI Through Teamwork

Creating a Balanced Human–AI Review Process

THE PROJECT

Overview

CuraJOY is a nonprofit tech company making behavioral and mental health support more accessible through gamification, self-check-ins, and AI-powered tools.


In behavioral health, Board Certified Behavior Analysts (BCBAs) often have only 6–12 billable hours to complete a Functional Behavior Assessment (FBA)—far less than ideal. The process is fragmented and time-consuming.


MyCuraJOY’s B2B platform streamlines this with automated behavior labeling, which is already over 90% accurate. To maintain accuracy and continuously train the AI, human-in-the-loop collaboration is essential.


My Role

Design Lead

Led the design efforts across product research, user interviews, rapid prototyping, and usability testing.

Team

Engineering Lead

Billy Franklin

Researchers

Haley Scheer

Jessica Peng

Laura Heppell

Timeline

Apr 2025 – Jul 2025

Results & Impact

25%

Fewer reviewer errors after adding task visibility tools and confirmation prompts

90%

AI label accuracy maintained through human-in-the-loop checks, preventing drift

20%

Boost in cross-team trust scores after showing reviewer names and vote results in tasks

  • Guided, step‑by‑step tasks so no feedback is missed

  • Multi‑staff voting across seniority levels to refine labels

  • Feedback creates an auditable trail and continuously improves AI accuracy

REQUIREMENTS

Project Scope

Design a flow for human-in-the-loop moderation of AI-flagged chats and labels (or missing labels).

The system should:

  • Allow BCBAs of all experience levels to vote on the correct response and labels.

  • Enable participants to leave notes explaining their decisions.

  • Assign each task to 5 reviewers, with the system tallying votes to determine the final result (a sketch of this tally follows below).

Use these results to further train CuraJOY’s AI for higher accuracy.
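To make the tallying requirement concrete, here is a minimal TypeScript sketch of how the five-reviewer majority vote could work. It is an illustration under assumed names (ReviewVote, TallyResult, tallyVotes), not CuraJOY's actual implementation.

type Label = string;

interface ReviewVote {
  reviewerId: string;
  label: Label;   // the label this reviewer voted for
  note?: string;  // optional explanation of the decision
}

interface TallyResult {
  finalLabel: Label | null;  // null until a majority exists
  counts: Map<Label, number>;
}

const REVIEWERS_PER_TASK = 5;

// Count the votes and return the winning label once any label holds
// a strict majority (3 of 5), so a task can be finalized even before
// every assigned reviewer has responded.
function tallyVotes(votes: ReviewVote[]): TallyResult {
  const counts = new Map<Label, number>();
  for (const vote of votes) {
    counts.set(vote.label, (counts.get(vote.label) ?? 0) + 1);
  }
  const majority = Math.floor(REVIEWERS_PER_TASK / 2) + 1;
  for (const [label, count] of counts) {
    if (count >= majority) return { finalLabel: label, counts };
  }
  return { finalLabel: null, counts };
}

In this framing, each vote's optional note travels with the result, feeding both the auditable trail and the AI training data described above.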

The Challenge

  • Guarantee full review and correction of every label and conversation.

  • Build a feedback loop for peer learning while training the AI.

RESEARCH

Bridging The Seniority Gap

From the research team’s insights and my past interviews with BCBAs, I learned that review tasks typically start with less senior staff, while senior staff handle final approval. This informed my decision to design a feedback loop that accounts for these seniority gaps.

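As one way to picture that routing, here is a hypothetical TypeScript sketch of a two-stage queue: junior staff take the first pass, senior staff give final approval. The Reviewer and ReviewTask shapes are assumptions made for this example, not the platform's real data model.

type Seniority = "junior" | "senior";
type Stage = "initial-review" | "final-approval" | "done";

interface Reviewer {
  id: string;
  seniority: Seniority;
}

interface ReviewTask {
  id: string;
  stage: Stage;
}

// Route each task to the staff tier that matches its stage:
// less senior reviewers go first, senior reviewers sign off last.
function eligibleReviewers(task: ReviewTask, staff: Reviewer[]): Reviewer[] {
  if (task.stage === "initial-review") {
    return staff.filter((r) => r.seniority === "junior");
  }
  if (task.stage === "final-approval") {
    return staff.filter((r) => r.seniority === "senior");
  }
  return []; // "done" tasks need no further review
}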

Competitor Analysis

To explore current labeling capabilities, I reviewed tools like Prodigy, Labelbox, and Encord. These platforms are easy to use and feature-rich, though many functions go beyond our current MVP needs. Key takeaways include:

  • Task-by-task flows help users focus and improve accuracy.

  • Quick thumbs-up/down actions can speed up the review process.

How might we create a review flow where no task slips through and results stay accurate?

DESIGN

First Round Of Design

For the first draft, I focused on the label review flow. I created a simple list view of tasks and a task flow that requires users to claim a task before their edits can be saved.

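To show what the claim rule implies, here is a simplified TypeScript sketch with an in-memory store; the TaskStore class and its fields are hypothetical, and a real system would also need to release or expire claims.

interface LabelTask {
  id: string;
  claimedBy?: string; // unset until a reviewer claims the task
  label?: string;
}

class TaskStore {
  constructor(private tasks: Map<string, LabelTask>) {}

  // A task can be claimed only once, by one reviewer at a time.
  claim(taskId: string, reviewerId: string): boolean {
    const task = this.tasks.get(taskId);
    if (!task || task.claimedBy) return false;
    task.claimedBy = reviewerId;
    return true;
  }

  // Edits are rejected unless the saving reviewer holds the claim,
  // mirroring the flow where claiming is required before saving.
  saveLabel(taskId: string, reviewerId: string, label: string): boolean {
    const task = this.tasks.get(taskId);
    if (!task || task.claimedBy !== reviewerId) return false;
    task.label = label;
    return true;
  }
}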

Key Features

  • Dropdown menus for selecting correct labels

  • Auto-prompt for notes whenever a label is changed (see the sketch after this list)

  • Automatic scroll to the next task once the current task is updated and saved

  • Revision history for finding and restoring previous versions
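The note auto-prompt in the list above reduces to one validation rule. Here is a minimal sketch, assuming hypothetical LabelEdit and validateEdit names rather than the actual product code:

interface LabelEdit {
  originalLabel: string;
  newLabel: string;
  note?: string;
}

// If the label changed and no note was written, block the save
// and tell the UI to prompt the reviewer for an explanation.
function validateEdit(edit: LabelEdit): { ok: boolean; promptForNote: boolean } {
  const labelChanged = edit.newLabel !== edit.originalLabel;
  const missingNote = labelChanged && !edit.note?.trim();
  return { ok: !missingNote, promptForNote: missingNote };
}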


USABILITY TESTING

Method & Goals

Method

30-minute sessions, testing two main task flows for editing AI labels.

Goals

Identify usability issues and gather feedback on the process of providing input to train the AI.

To quickly uncover pain points, I ran usability interviews with our BCBA consultants using an early prototype of the flow.


Test Results

What Worked

  • Simple label changes via easy dropdown menus

  • Easy access to iteration history

Needs Improvement

  • Unclear who has reviewed each task

  • No visibility into voting results

  • No incentive for reviewers to leave feedback for AI training

  • Auto task-switching goes unnoticed, causing confusion

User Quotes

"I think I saved the changes already, why does this log still says To Review?"

" Are leaving notes or giving thumbs up or down optional? I don't think people will voluntarily do it if it is not required…"

"I think the history button is goo enough to help me go back and find previous revisions."

"To review or view the task, what is the difference? I am confused…"

"This task list is very clear, I have no questions about it."

DESIGN ITERATION

User-Driven Redesign

Using direct feedback, I fine-tuned key features to better fit user needs. After a few rounds of iteration and critique, here’s how the designs evolved.

WHAT I LEARNED

Bridging People And AI

I learned that having clear steps for both junior and senior reviewers keeps things running smoothly and ensures quality. Creating feedback loops—and giving people a reason to participate—not only helps train the AI, but also helps reviewers sharpen their own skills. I also saw how simple transparency, like showing who reviewed a task and what the votes were, builds trust in the process. And those small touches, like pop-ups or a quick “next task” confirmation, go a long way in keeping people from getting lost in the flow.