At a basic level, face tracking means using a camera to detect a face in view, measuring where that face sits inside the frame, and then converting that offset into motion commands for the robot. In practice, though, doing this well is much harder than it sounds. A system that simply reacts as fast as possible often looks jittery, nervous, or mechanical. A system that reacts too slowly feels disconnected and unresponsive.

Nova is designed to sit in the middle of those extremes. The goal is not just to detect a face, but to create smooth, believable attention. That means the tracking system has to balance speed, stability, and expression.


Step 1: Detecting the face

The first stage is computer vision. Nova uses a camera feed to locate a face inside the image. Once a face is found, the software calculates a target point, usually the centre of the detected face box. This gives the robot a clear reference for where the person is relative to the camera view.

If the face is near the middle of the frame, Nova is already looking in roughly the right place. If the face is off to one side, too high, or too low, the system treats that offset as an error that needs correcting.

The important part:

Nova is not just checking whether a face exists. She is constantly measuring how far the face is from centre, then using that error to decide how much movement is needed.
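The step above can be sketched as a small function. This is a minimal illustration, not Nova's actual code: the box format (x, y, w, h) matches what common detectors such as OpenCV's Haar cascades return, and the function name is hypothetical.

```python
# Convert a detected face box into a normalized offset from frame centre.

def face_error(box, frame_w, frame_h):
    """Return (error_x, error_y) in the range [-1, 1].

    Negative x means the face is left of centre; negative y means above.
    """
    x, y, w, h = box
    face_cx = x + w / 2
    face_cy = y + h / 2
    error_x = (face_cx - frame_w / 2) / (frame_w / 2)
    error_y = (face_cy - frame_h / 2) / (frame_h / 2)
    return error_x, error_y

# A face box centred at (180, 240) in a 640x480 frame sits left of centre:
ex, ey = face_error((140, 200, 80, 80), 640, 480)
# ex = (180 - 320) / 320 = -0.4375, ey = 0.0
```

Normalizing the error to [-1, 1] keeps the rest of the pipeline independent of the camera resolution.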


Step 2: Turning screen position into motion

Once the face position is known, the next job is converting image-space error into robot movement. If the face is left of centre, Nova needs to rotate toward it. If it is high in frame, Nova may need to tilt upward. This sounds simple, but mapping pixels to motor motion has to be done carefully.

A robotic joint does not understand “the face is 140 pixels to the left.” It understands servo positions, speed, acceleration, and movement limits. So the software acts as a translator between the camera and the hardware.

In Nova, the most important tracking axes are usually:

- Neck yaw (turning the head left and right)
- Neck pitch (tilting the head up and down)
- Torso yaw (rotating the body toward the person)

Splitting attention between neck and torso can make the movement feel more natural. Instead of one joint doing everything, the body can support the gaze, which creates a more grounded and lifelike look.
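One simple way to express that translation is a proportional mapping from normalized error to joint steps, with the correction split between neck and torso. This is a sketch under made-up tuning values; the gains, the split ratio, and the function name are all illustrative, not Nova's real configuration.

```python
# Map a normalized horizontal error into joint commands, splitting the
# correction between neck yaw and torso yaw.

NECK_SHARE = 0.7      # fraction of the correction taken by the neck
MAX_STEP_DEG = 12.0   # largest single correction, in degrees

def error_to_joint_steps(error_x):
    """Map error_x in [-1, 1] to (neck_yaw_step, torso_yaw_step) in degrees."""
    total = error_x * MAX_STEP_DEG
    return total * NECK_SHARE, total * (1.0 - NECK_SHARE)

neck, torso = error_to_joint_steps(-0.5)   # face is left of centre
# neck ≈ -4.2 degrees, torso ≈ -1.8 degrees: both rotate toward the person
```

Giving the neck the larger share keeps the gaze leading the motion, with the torso following to support it.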


Step 3: Smoothing the movement

Raw detection data is noisy. A face detector may shift slightly from frame to frame even when the person is standing still. If a robot copies every tiny variation directly, the result is visible shaking. This is one of the fastest ways to ruin the illusion of life.

To avoid that, Nova uses smoothing between what the camera sees and what the motors do. Instead of snapping instantly to every tiny change, the system blends movement over time. That makes the response feel deliberate rather than twitchy.

This is where tuning becomes critical. Too much smoothing and the robot feels sleepy. Too little and the robot feels unstable. Good tracking is about hitting that narrow band where the motion still feels responsive, but visually calm.
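A common way to implement this kind of blending is exponential smoothing, a one-pole low-pass filter between the raw target and the commanded position. The ALPHA value below is a made-up tuning constant: higher reacts faster, lower looks calmer.

```python
# Exponential smoothing: move the commanded position a fixed fraction of
# the way toward the target each frame, instead of snapping to it.

ALPHA = 0.25   # illustrative smoothing factor, 0 < ALPHA <= 1

def smooth(commanded, target):
    """Blend the commanded position a fraction of the way toward the target."""
    return commanded + ALPHA * (target - commanded)

pos = 0.0
for _ in range(10):          # target holds still at 1.0
    pos = smooth(pos, 1.0)
# pos approaches 1.0 over several frames without ever jumping straight there
```

Because each frame only closes a fraction of the remaining gap, single-frame detector noise gets heavily attenuated while sustained motion still comes through.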


Step 4: Deadzones and stability

Another important trick is using a deadzone. A deadzone is a small area around the centre of the image where the robot deliberately chooses not to move. Without it, Nova would keep making tiny corrections even when the face is already basically centred.

That matters because humans do not hold perfectly still either. If somebody is talking, breathing, shifting their weight, or slightly changing posture, the detector will see micro-movements constantly. A deadzone prevents the robot from overreacting.

In practice, this gives Nova a more confident presence. She looks engaged, but not frantic.
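A deadzone can be sketched in a few lines. The threshold below is illustrative, and subtracting it at the boundary (rather than passing the raw error through) is one common design choice that avoids a sudden jump in output the moment the face leaves the deadzone.

```python
# Ignore errors smaller than a threshold so the robot holds still when
# the face is already roughly centred.

DEADZONE = 0.05   # normalized units; illustrative value

def apply_deadzone(error):
    """Return 0 inside the deadzone, otherwise the error minus the threshold."""
    if abs(error) < DEADZONE:
        return 0.0
    # Subtract the threshold so motion ramps up smoothly at the edge.
    return error - DEADZONE if error > 0 else error + DEADZONE

apply_deadzone(0.02)   # 0.0 — face is basically centred, do nothing
apply_deadzone(0.30)   # ≈ 0.25 — correct, without a jump at the boundary
```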


Step 5: Speed and catch-up behaviour

A good tracking system should not move at one fixed speed all the time. If a person is only slightly off centre, small and gentle adjustments look best. But if somebody suddenly walks across the room, the robot needs to catch up much more quickly.

That is why face tracking systems often use variable response. Small errors produce small movements. Large errors produce stronger corrections. This helps Nova remain stable when people are near centre, while still being capable of snapping attention back onto a target when needed.

In other words, the robot should feel calm when it already has you, and decisive when it has lost you.
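One way to get that behaviour is a piecewise gain: a soft proportional response near centre and a stronger catch-up gain past a breakpoint. The gains and breakpoint below are made-up tuning values, and the continuity trick (starting the catch-up region from the soft region's edge) is one design choice among several.

```python
# Variable response: small errors get a gentle gain, large errors get a
# stronger catch-up gain, with no discontinuity at the breakpoint.

SOFT_GAIN = 0.3      # used near centre
CATCH_UP_GAIN = 1.0  # used when the target is far off centre
BREAKPOINT = 0.4     # normalized error where catch-up behaviour begins

def response(error):
    """Return a correction that grows faster once |error| passes BREAKPOINT."""
    if abs(error) <= BREAKPOINT:
        return SOFT_GAIN * error
    # Continue from the soft region's edge so the response stays continuous.
    excess = abs(error) - BREAKPOINT
    out = SOFT_GAIN * BREAKPOINT + CATCH_UP_GAIN * excess
    return out if error > 0 else -out

response(0.1)   # ≈ 0.03 — gentle nudge when nearly centred
response(0.9)   # ≈ 0.62 — decisive catch-up when the target is far away
```

Keeping the curve continuous matters: a sudden change in gain at the breakpoint would itself read as a twitch.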

Why this matters for presence:

People are extremely sensitive to timing. Even simple changes in how quickly a robot reacts can make it feel more attentive, more awkward, or more alive.


Step 6: Mechanical limits

Software is only half the story. A real robot also has physical constraints. Servos have travel limits, the neck has safe ranges, and different body designs change how much movement is possible before something begins to look strained.

Nova’s tracking has to respect those boundaries at all times. That means the software clamps motion to safe ranges and never asks the hardware for positions it cannot reach. This becomes even more important when combining multiple joints, such as neck yaw and torso yaw, because the final body language needs to stay believable as well as safe.
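Clamping is the simplest of these safeguards and can be sketched directly. The joint names echo the axes discussed above, but the limit values here are illustrative, not Nova's real servo ranges.

```python
# Keep every commanded angle inside its joint's safe travel range before
# it reaches the hardware.

LIMITS_DEG = {
    "neck_yaw":   (-60.0, 60.0),
    "neck_pitch": (-20.0, 35.0),
    "torso_yaw":  (-30.0, 30.0),
}

def clamp_command(joint, angle):
    """Clamp angle (degrees) to the joint's safe travel range."""
    lo, hi = LIMITS_DEG[joint]
    return max(lo, min(hi, angle))

clamp_command("neck_yaw", 75.0)    # 60.0 — request exceeded the limit
clamp_command("neck_pitch", -10.0) # -10.0 — already safe, passed through
```

In a real system this runs as the last stage before the servo command is sent, so no upstream bug can push a joint past its mechanical limit.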


Why face tracking changes the feel of the robot

Face tracking is not just a technical feature. It changes the emotional quality of the interaction. When a device can orient itself toward you, maintain attention, and respond physically to where you are in the room, it stops feeling like passive software and starts feeling like an agent with presence.

That is one of the main ideas behind Nova. The goal is not to create a screen with a voice. The goal is to create a desktop robot companion that feels physically engaged in the conversation.

Even small movements matter here. A subtle turn of the neck, a slight correction upward, or a shared movement between torso and head can make the difference between “this is playing audio” and “this is paying attention.”


Where Nova goes next

Face tracking is only the foundation. Once a robot can reliably detect and follow a person, that tracking can be combined with voice conversation, gesture timing, idle behaviour, and memory systems to make the interaction feel much richer. A robot that sees you, listens to you, and physically responds to you creates a very different experience from traditional assistants.

That is the direction Nova is built around: affordable personal robotics that feel expressive, interactive, and real enough to belong on a desk instead of only in concept videos.