Written by Brian Kaatz, Lead Engineer at Whisker

Meet WhiskerVision, a first-of-its-kind vision system that turns everyday moments into meaningful insights, helping pet parents understand their cats like never before.

“I’ve been using WhiskerVision for months and couldn’t be more thrilled,” says Jacob Zuppke, President and CEO of Whisker. “I have two tortie brown Siberian cats that look nearly identical and my Litter-Robot® 5 Pro distinguishes between them confidently, even in the dark. I’m so proud of the Whisker Team for building such an impressive new set of features and products for the Whisker ecosystem to help pet parents everywhere!”

What is WhiskerVision?

WhiskerVision is Whisker’s AI-powered camera platform that features CatID facial recognition, an industry-first pet insights technology that uses facial features for precise identification, even between similar-looking cats. Whether you’re across town or across the world, CatID facial recognition helps pet parents see what their cats are up to, understand bathroom and activity patterns, and capture useful data.

[Image: WhiskerVision CatID facial recognition on Litter-Robot 5 Pro]

Benefits of CatID facial recognition:

  • Identify individual cats in a multi-cat household. Link each of your individual cats to device activities for more accurate tracking and precise insights.

  • Check in on your cat anytime for fun and peace of mind. See your cats when they use the litter box or walk by their Litter-Robot®. 

  • View your cats’ elimination habits to spot changes early and share them with your veterinarian with confidence.

  • Access data with ease, with litter box usage clips automatically vaulted as organized events in your secure cloud history for easy sharing.

How does CatID facial recognition work?

CatID facial recognition blends purpose-built robust hardware with a two-stage AI pipeline and neural network. Analysis happens on the device hardware, and event metadata and short clips are organized in the cloud for a smooth app experience.

Hardware you can trust

  • High-quality image sensor built for excellent low-light performance and natural color.

  • Wide field-of-view (FOV) optics tuned for whole-room coverage without heavy distortion.

  • Adaptive low-light exposure & IR for clear night scenes that are gentle on eyes.

  • Custom video pipeline & image-quality IP to deliver best-in-class clarity across lighting and motion.

  • Low-latency encode with a sub-20 ms video pipeline for responsive live view and snappy events.

  • Highly efficient compression using advanced H.264/H.265 codecs to keep quality high and bandwidth low.

  • On-device AI accelerators (NPU/DSP). Powered by an industry-leading vision SoC—NPU for AI inference, DSP for fast image/video processing—so there’s no need to stream raw video for processing.

  • Multi-sensor-ready architecture for future configurations and expanded coverage.

  • Secure, sandboxed firmware. Analysis runs on-device; when a cat is detected, the device records the moment and securely uploads the event (metadata + short clip) to the cloud. The camera does not keep a local video archive.

  • Privacy: Events and recordings trigger only when a cat is in the frame.
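To make the "record only when a cat is in the frame" idea concrete, here is a minimal sketch of how per-frame detections could be grouped into events. It is an illustration, not Whisker's firmware: the function name `event_spans`, the `max_gap` parameter, and the choice to bridge short gaps between detections are all assumptions made for this example.

```python
def event_spans(cat_in_frame, max_gap=2):
    """Group frames where a cat was detected into (start, end) index spans.

    cat_in_frame: list of booleans, one per frame (True = cat detected).
    max_gap: assumed tolerance; up to this many undetected frames in a row
             are bridged so a brief miss does not split one visit in two.
    """
    spans = []
    start = last_seen = None
    for i, detected in enumerate(cat_in_frame):
        if not detected:
            continue  # frames without a cat never start or extend an event
        if start is None:
            start = last_seen = i
        elif i - last_seen - 1 <= max_gap:
            last_seen = i  # same visit, extend the current span
        else:
            spans.append((start, last_seen))
            start = last_seen = i  # gap too long, begin a new event
    if start is not None:
        spans.append((start, last_seen))
    return spans


frames = [False, True, True, False, True, False, False, False, False, True]
spans = event_spans(frames)  # [(1, 4), (9, 9)]
```

Frames with no cat in view fall outside every span, so nothing would be recorded for them.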

The AI pipeline 

  • Detection: Identifies cats in each frame and ignores everything else—built to handle motion, lighting changes, and different angles.

  • Identification: Creates a “signature” and matches it to your roster of pets, so that each event in your app is assigned to the right cat.

  • Event engine (device ➜ cloud): When a cat is detected, the device packages an event—lightweight metadata (who/what/when) plus a short video clip—and uploads it to our storage cloud, where it’s organized and indexed for your account. The app uses this to deliver reliable notifications and fast, rich event views across your devices.
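The identification step above can be sketched as a nearest-match lookup. This is a simplified illustration under stated assumptions: the signature is treated as a plain list of numbers, cosine similarity is used as the comparison, and the `identify` function, the `threshold` value, and the cat names are hypothetical. The post does not describe how the production model compares signatures.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(signature, roster, threshold=0.8):
    """Return (name, score) for the closest roster entry.

    If no entry scores at least `threshold` (an assumed cutoff), the name
    is None so the event can be left unassigned instead of guessed.
    """
    best_name, best_score = None, -1.0
    for name, reference in roster.items():
        score = cosine_similarity(signature, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical roster: one reference signature per cat.
roster = {"Mochi": [0.9, 0.1, 0.3], "Pepper": [0.2, 0.8, 0.5]}
name, score = identify([0.85, 0.15, 0.35], roster)  # best match is "Mochi"
```

Returning no name below the cutoff is a design choice for the sketch: an unassigned event is easier to correct than a confident wrong label.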

Why build WhiskerVision like this?

Traditional computer vision leaned on manually crafted features (e.g., “five facial points,” color histograms). Real homes and real cats, especially in households with similar-looking cats, outpace those methods. That’s why WhiskerVision is built around modern, data-driven models to recognize each cat in your home. Read on for the benefits of modern deep learning.

Learned > hand-crafted features

Classical algorithms rely on manually generated features, whereas deep-learning models learn the best features from data. Manually generated features can miss characteristics that matter when distinguishing between two cats.

Think about it from this perspective: two brown cats at night are hard to tell apart, even for a human. A deep learning model, however, can learn which facial attributes matter and how much weight to give each one, helping it identify the right cat.

Robustness

Hand-crafted features are unreliable under changing lighting, camera angle, pose, or partial occlusion. For example, coat color can look different under different lighting conditions, and coat patterns can shift as the cat moves. Deep neural nets are far more resilient.

Continuous learning

Deep learning excels at recognizing cats of all breeds and appearances, capturing the rich textures and patterns that make each one unique. Unlike classical methods, which plateau in accuracy early, deep models continually improve as they see more diverse examples.

What is edge computing?

Edge computing means the device handles the heavy lifting, while the cloud organizes events for a seamless app experience.

Why this is better than cloud-only for our use case:

  • Privacy by design: Analysis is local; the device uploads an event snippet (metadata + short clip) when a cat is detected.

  • Low latency: Instant, on-device decisions create snappy live views and timely insights.

  • Resilience: If your internet blips, the camera buffers a limited number of recent events and retries upload automatically once you’re back online.

  • Bandwidth-smart: The system sends just the event, not hours of continuous video.
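The resilience behavior described above (hold a limited number of recent events, retry when the connection returns) can be sketched with a bounded queue. This is a minimal illustration, not the device's implementation: the `EventBuffer` class, the `max_events` limit, the drop-oldest policy, and the fake upload function are assumptions made for the example.

```python
from collections import deque


class EventBuffer:
    """Holds a limited number of recent events until they can be uploaded."""

    def __init__(self, upload, max_events=3):
        self._upload = upload  # callable(event) -> True on success
        # Assumed policy: when full, the oldest pending event is dropped.
        self._pending = deque(maxlen=max_events)

    def submit(self, event):
        self._pending.append(event)
        self.flush()

    def flush(self):
        """Upload oldest first; stop at the first failure and keep the rest."""
        sent = 0
        while self._pending:
            if not self._upload(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


# Simulated connection and cloud, for illustration only.
link = {"online": False}
uploaded = []


def fake_upload(event):
    if link["online"]:
        uploaded.append(event)
    return link["online"]


buffer = EventBuffer(fake_upload, max_events=3)
for i in range(5):  # five events arrive while offline
    buffer.submit({"cat": "Mochi", "event_id": i})

link["online"] = True
sent = buffer.flush()  # the three most recent events (ids 2, 3, 4) go out
```

In a real device the retry would be triggered by a timer or a connectivity callback; here `flush()` is called by hand to keep the sketch short.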

How we keep pet parents secure

Your home is your home. WhiskerVision is built with privacy and security at the core:

Designed for cats, not people

We never look for people or trigger on them. Events and recordings happen only when a cat is in the frame. However, the camera will record a person if they are in the frame with their pet.

You’re in control

You decide what gets saved to your device. 

Encryption in transit and at rest

All event uploads and app playback use modern TLS (the successor to SSL) over the internet, and stored event clips and metadata are encrypted at rest with server-side encryption using provider-managed keys.
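For readers curious what "modern TLS" can look like on the client side, here is a small sketch using Python's standard `ssl` module. The TLS 1.2 minimum is an assumption chosen for illustration; the post does not state which protocol versions WhiskerVision accepts, and `make_tls_context` is a hypothetical helper name.

```python
import ssl


def make_tls_context():
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed floor for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_tls_context()
```

A context like this can be passed to standard-library clients such as `http.client.HTTPSConnection(host, context=ctx)`.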

The result

With WhiskerVision, you get peace of mind and useful health insights—without trading away privacy or drowning in notifications. It’s cutting-edge AI, thoughtfully designed for real homes and the cats who rule them. 

Check out Litter-Robot® 5 Pro — the new insightful self-cleaning litter box — to learn more!

[Image: Three cats with Litter-Robot 5 Pro featuring CatID facial recognition]