UX as quality control: Building the interface that trains Sky's ML model

Outcome

UX as the quality-control layer for the dataset

Over a two-week sprint, I owned the capture interface: the single surface where users met the model's data requirements, within a multi-stage ML pipeline built alongside a small ML engineering team. Treating every screen as a data-quality decision, not just a usability one, lifted training-footage consistency, reduced abandonment at known failure points, and reframed consent as participation.

Positioning variability dropped

Ready, Countdown, GO! gave users time to enter frame and settle, so videos reached the model in the correct starting position more reliably.

Recovery without abandonment

Errors mapped to pipeline stages (camera, network, upload, ingestion) gave users the right recovery action instead of a generic dead end.

Consent as participation

Framing consent as participation in training the model, not as legal friction, directly influenced dataset eligibility and opt-in rates.

Challenge

The problem space

Sky was building a machine learning system that needed real people to record specific body movements to train a computer-vision model. The tool was a TV app running on the Sky Live camera. My job was to design the capture experience, where every UX decision directly determined whether a recording could be used for training or had to be thrown away.

The interface was, in effect, part of the pipeline.

How the system works

Pipeline first, interface second

Understanding the technical architecture was essential to designing an effective experience. The system works end-to-end as five sequential stages, with UX sitting at the very top. The clarity of the capture interface determines whether the data that reaches the model is usable.

Authentication

Secure sign-in, with consent as a hard gate.

Task assignment

Modular task: name, duration, instructions.

Capture

Sky Live camera, adaptive resolution and rate. This is where the UX scope sat.

Upload

Asynchronous, event-driven upload to cloud storage.

ML ingestion

Final destination of every recording.

What the model needs

The constraints behind every screen

The interface fed a custom classifier built on a third-party pose-estimation model. That upstream model maps the body as a set of 3D landmarks but doesn't know what a plank is; our model learned positions from its landmark data, so every recording had to clear the upstream model first. The upstream model drops low-confidence frames, needs full-body visibility, works best side-on, and defaults to one person.
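To make the "clear the upstream model first" constraint concrete, here is a minimal Python sketch of how a frame might be judged usable. It is an illustration under stated assumptions, not the project's code: the `Landmark` type, the `MIN_CONFIDENCE` value, and the `REQUIRED_LANDMARKS` names are all hypothetical, and the real pose-estimation API is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Landmark:
    """One body landmark as an upstream pose model might report it (hypothetical shape)."""
    name: str
    confidence: float  # assumed to be a 0..1 score


# Assumed values for illustration only; the real threshold and
# landmark set are not stated in the case study.
MIN_CONFIDENCE = 0.5
REQUIRED_LANDMARKS = ("shoulder", "hip", "knee", "ankle")


def frame_is_usable(landmarks: Sequence[Landmark]) -> bool:
    """A frame counts only if every required landmark is present and confident."""
    by_name = {lm.name: lm for lm in landmarks}
    return all(
        name in by_name and by_name[name].confidence >= MIN_CONFIDENCE
        for name in REQUIRED_LANDMARKS
    )


def usable_fraction(frames: Sequence[Sequence[Landmark]]) -> float:
    """Share of frames in a recording that would survive the filter."""
    if not frames:
        return 0.0
    return sum(frame_is_usable(f) for f in frames) / len(frames)
```

Seen this way, a user who is half out of frame does not produce a slightly worse recording; they produce frames that are dropped outright, which is why the interface puts so much weight on positioning before capture starts.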

Every task instruction is a landmark requirement in disguise.

Minimum confidence threshold

→ Camera-dominant layout. Users self-check positioning before recording so more frames clear the threshold.

Full body + side-on angle

→ "How to do the task" reference image. Teaches the framing the model needs, not just the movement.

Preparation time required

→ Ready, Countdown, GO! sequence. Users settle into the pose before capture, so the model gets high-confidence landmarks from frame one.

Single-person detection

→ "Plank, 1 person" task label. Task metadata mirrors model configuration, not just user-facing copy.
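One way to picture "task metadata mirrors model configuration" is a single task definition that produces both the on-screen label and the settings handed to the pipeline. The Python sketch below is hypothetical: `CaptureTask`, its field names, the `model_config` keys, and the 30-second duration are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureTask:
    """A modular task: name, duration, instructions, plus the people count (assumed field)."""
    name: str
    duration_seconds: int
    instructions: str
    people: int = 1

    def label(self) -> str:
        """User-facing label, e.g. 'Plank, 1 person'."""
        noun = "person" if self.people == 1 else "people"
        return f"{self.name}, {self.people} {noun}"

    def model_config(self) -> dict:
        """Settings derived from the same fields the user sees (hypothetical keys)."""
        return {"max_people": self.people, "clip_seconds": self.duration_seconds}


# Example task; the duration and instruction text are made up for illustration.
plank = CaptureTask(
    name="Plank",
    duration_seconds=30,
    instructions="Hold a plank side-on to the camera with your full body in frame.",
)
```

Because the label and the configuration come from the same fields, the copy on screen cannot drift away from what the model expects.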

The design

The capture flow

Seven states designed to reduce uncertainty, improve data quality, and give users confidence at every step.

Sky Live ML capture, initial state. Live camera preview dominates the left, task panel with instructions and controls on the right.

Screen 1 · Press record to start

The initial screen sets the split-screen layout: ~80% live camera preview, because what the camera sees is what the model trains on, and a persistent right-hand panel with task name, duration, and instructions. "How to do the task" is available as a pull, not a push, so confident users go straight to Record.

Reference image showing the correct plank position as a visual data-quality guide.

Screen 2 · How to do the task

Tapping "How to do the task" replaces the live feed with a reference photograph of the correct pose. Showing the ideal position teaches users what a good recording looks like before they make one. The guidance serves the pipeline as much as the person.

Ready! state. Large overlay prompts the user to position themselves in frame.

Screens 3 to 5 · Ready, Countdown, GO!

Borrowed from sports timing: "READY!" gives time to enter frame, the countdown gives time to settle, "GO!" removes ambiguity about when recording is active. Each phase prevents users from starting a recording out of position.

Finished state. The final frame is shown with Re-take and Submit options.

Screens 6 and 7 · Finished and Submitted

After recording, users see the final frame with two clear options: Re-take or Submit. Knowing they can try again, users are more likely to record a better second take than to submit a bad one under pressure. "Your video is submitted" closes the loop.
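The seven screens can be read as a small state machine. The Python sketch below is a simplified illustration, not the shipped implementation: the state and event names are hypothetical, and sending Re-take back to the first screen is an assumption, since the case study does not say where a re-take lands.

```python
from enum import Enum, auto


class State(Enum):
    PRESS_RECORD = auto()  # Screen 1
    HOW_TO = auto()        # Screen 2
    READY = auto()         # Screen 3
    COUNTDOWN = auto()     # Screen 4
    GO = auto()            # Screen 5, recording is active
    FINISHED = auto()      # Screen 6
    SUBMITTED = auto()     # Screen 7


# (state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.PRESS_RECORD, "help"): State.HOW_TO,
    (State.HOW_TO, "back"): State.PRESS_RECORD,
    (State.PRESS_RECORD, "record"): State.READY,
    (State.READY, "timer"): State.COUNTDOWN,
    (State.COUNTDOWN, "timer"): State.GO,
    (State.GO, "timer"): State.FINISHED,
    (State.FINISHED, "retake"): State.PRESS_RECORD,  # assumed destination
    (State.FINISHED, "submit"): State.SUBMITTED,
}


def step(state: State, event: str) -> State:
    """Apply one event; events that are not valid in a state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.PRESS_RECORD) -> State:
    """Apply a sequence of events from a starting state."""
    for event in events:
        state = step(state, event)
    return state


def is_recording(state: State) -> bool:
    """Only GO captures footage, so nothing is recorded while the user is moving into place."""
    return state is State.GO
```

The point of the structure is that recording can only begin after READY and COUNTDOWN have both elapsed; there is no path from the first screen straight into capture.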

Key design decisions

Trade-offs between user friction and dataset integrity

Camera-dominant layout

The camera view is what enters the ML pipeline. Keeping it central helps users self-correct their positioning in real time. Impact: reduced off-frame recordings; users treat the camera as their primary feedback.

Three-phase launch (Ready, Countdown, GO!)

Users need time to move into position before recording. A cold start produces off-frame or out-of-position data. Impact: more consistent body positioning at the start of recordings.

Pipeline-aware error states

Generic errors cause abandonment. Errors mapped to pipeline stages (camera, network, upload, ingestion) give users the right recovery action at each failure point. Impact: higher recovery rate from camera, network, and upload failures.
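As a rough illustration of errors mapped to pipeline stages, the Python sketch below pairs each of the four failure points with its own message and recovery action. Only the four stage names come from the project; the `Recovery` type, the message copy, the action names, and the `keeps_recording` flag are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    CAMERA = "camera"
    NETWORK = "network"
    UPLOAD = "upload"
    INGESTION = "ingestion"


@dataclass(frozen=True)
class Recovery:
    message: str           # what the user is told (illustrative copy)
    action: str            # the single next step offered (illustrative name)
    keeps_recording: bool  # whether the captured video survives the failure


# Illustrative mapping; the real copy and actions are not given in the case study.
RECOVERY = {
    Stage.CAMERA: Recovery("We can't see the camera.", "check_camera", False),
    Stage.NETWORK: Recovery("You're offline.", "reconnect", True),
    Stage.UPLOAD: Recovery("Your video didn't upload.", "retry_upload", True),
    Stage.INGESTION: Recovery("We couldn't process your video.", "retake", False),
}


def recovery_for(stage: Stage) -> Recovery:
    """Look up the stage-specific recovery instead of falling back to a generic error."""
    return RECOVERY[stage]


# Stages with no recovery defined; an empty list means no failure point is a dead end.
UNMAPPED = [stage for stage in Stage if stage not in RECOVERY]
```

Keeping the mapping as data makes the coverage easy to check: if a new pipeline stage is added without a recovery path, `UNMAPPED` is no longer empty.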

Consent as purposeful participation

Consent is a hard gate before any recording. Framing it as participation in training the model, not legal friction, lifts opt-in and reduces drop-off. Impact: higher opt-in rate; users enter the flow with a clearer sense of why their data matters.

Reflection

What I took from it

This was a small, tightly scoped project (two weeks, one interface), but the constraints were real: a model with a hard confidence threshold, and landmark requirements that made every task instruction a data-quality requirement in disguise. The user wasn't simply completing a task; they were an operator in a pipeline, and their accuracy directly affected the model being trained downstream.

Small scope, real constraints

Rigour doesn't require a big system; it requires understanding what the machine actually needs.

Mapping invisible failures

The highest-impact decisions mapped invisible system failures to clear recovery actions.

Teaching the user what the machine needs

Reference imagery, launch timing, and Re-take agency all implicitly taught users what good training data looks like.

Designing for reuse, not just one model

The task structure was deliberately generic. New ML use cases could plug in by defining a new task rather than requesting a new interface. That pattern let one capture tool serve multiple ML projects.

The interface is part of the pipeline. Every design decision is a data-quality decision.