Verkada announced Wednesday that NVIDIA has become both a technical collaborator and an investor as the company tries to scale what it calls physical AI across its cloud-connected security and operations platform.
The San Mateo company said the collaboration is meant to improve AI-powered video search, multimodal embeddings, vector retrieval, synthetic data generation, and model training across a network that now spans more than 2.4 million devices in 170 countries. Verkada says its systems are used by more than 30,000 organizations, including schools, hospitals, retailers, manufacturers, and more than 100 Fortune 500 companies.
The announcement is not just another NVIDIA investment headline. It shows how the physical AI push is moving from robots and autonomous vehicles into fixed building infrastructure: cameras, access-control systems, alarms, environmental sensors, intercoms, and workplace tools that already sit inside real organizations.
What NVIDIA brings to Verkada
Verkada said it is using NVIDIA Cosmos world foundation models and NVIDIA’s Physical AI Data Factory Blueprint to strengthen the model and data pipeline behind its video analytics. NVIDIA describes that blueprint as a reference architecture for large-scale data processing, curation, synthetic data generation, reinforcement learning, and evaluation for physical AI models, including vision AI agents, robotics, and autonomous vehicles.
For Verkada, the immediate focus is video understanding. The company said the work has already improved the mean average precision of its AI-powered search by 68% for spatial-temporal understanding. Mean average precision measures how consistently a search ranks relevant results near the top across many queries, so in plain terms the figure points to how well the system can connect what appears in camera footage with where and when it happened.
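To make the metric concrete, here is a minimal Python sketch of how mean average precision is commonly computed for ranked search results. It is not Verkada's evaluation code. The function names and the toy relevance lists are invented for illustration, and the sketch assumes every relevant clip for a query appears somewhere in the ranked list.

```python
from typing import List


def average_precision(ranked_relevance: List[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the result at rank i+1 is relevant.
    Assumption: all relevant items appear in the list, so we normalize
    by the number of relevant results found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: List[List[bool]]) -> float:
    """Mean of per-query average precision; 0.0 for an empty query set."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Hypothetical rankings for two queries, before and after a model change.
before = [[False, True, False, True], [False, False, True]]
after = [[True, True, False, False], [True, False, False]]

map_before = mean_average_precision(before)
map_after = mean_average_precision(after)
relative_gain = (map_after - map_before) / map_before

print(f"mAP before: {map_before:.3f}, after: {map_after:.3f}")
print(f"relative improvement: {relative_gain:.0%}")
```

The toy numbers are not meant to reproduce the 68% figure. They only show that the score rises when relevant clips move toward the top of the results, and that a percentage improvement depends on the baseline it is measured against.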
That matters because security teams do not usually search video for a single static object. They look for sequences: a person entering a restricted area, a vehicle near a loading dock, a fall on a factory floor, a possible theft event in a store aisle, or a worker missing required protective equipment. Verkada says it is also developing a multi-model search-agent architecture and exploring reasoning models for complex, unstructured scenarios.
Security cameras are becoming AI endpoints
Physical AI is often discussed through the language of machines that move: humanoid robots, autonomous vehicles, drones, warehouse automation, and industrial arms. Verkada’s announcement points to a quieter version of the same shift. In many workplaces, the first broad deployment surface for real-world AI may be the security camera that is already installed above a doorway, hallway, register, classroom, warehouse bay, or production line.
That changes the role of video infrastructure. A traditional camera records footage for review after something happens. A modern analytics platform can index scenes, detect patterns, alert staff, connect events across cameras, and make footage searchable through natural-language queries. When tied to access control, alarms, intercoms, and sensors, the system starts to look less like a camera network and more like an operating layer for building events.
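Natural-language video search of this kind is typically built on embeddings and vector retrieval: clips and queries are mapped into the same vector space, and the closest clips are returned. The Python sketch below illustrates only that retrieval step under stated assumptions. The clip IDs and three-dimensional vectors are made up, a real system would obtain embeddings from a multimodal model and use an approximate nearest-neighbor index, and nothing here reflects Verkada's or NVIDIA's actual implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: List[float],
           index: Dict[str, List[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force retrieval: score every clip, return the top_k best matches."""
    scored = [(clip_id, cosine_similarity(query_vec, vec))
              for clip_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical clip embeddings (toy values, not model output).
CLIP_INDEX = {
    "dock_cam_0412": [0.9, 0.1, 0.0],
    "lobby_cam_0907": [0.1, 0.8, 0.2],
    "aisle_cam_1530": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a query like "vehicle near the loading dock".
query_embedding = [0.8, 0.2, 0.1]

for clip_id, score in search(query_embedding, CLIP_INDEX):
    print(f"{clip_id}: {score:.3f}")
```

Brute-force scoring is fine for a handful of clips. At the scale the article describes, the same idea would rely on a dedicated vector index rather than a linear scan.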
Verkada’s six product lines already cover video cameras, access control, environmental sensors, alarms, workplace tools, and intercoms. NVIDIA’s role is to improve the model side of that stack, including synthetic data pipelines that can help train systems for events that may be rare, sensitive, expensive, or impractical to collect repeatedly in the real world.
Synthetic data is the technical hinge
Synthetic data is especially important for physical AI because real-world environments are messy. Lighting changes. People block camera views. Equipment moves. A warehouse aisle with forklifts, a school hallway, an emergency exit, and a hospital corridor all produce different visual conditions. Rare safety events are hard to capture at scale, while privacy rules and operational sensitivity can limit how freely companies use real footage for training.
NVIDIA’s physical AI tools are designed to help teams generate and evaluate data for systems that need to understand space, motion, and context. In Verkada’s case, that could mean better models for video search, incident detection, and scene retrieval without relying only on manually labeled footage from customer environments.
The hard part is evaluation. A model that finds a red jacket in a hallway is useful. A model that confidently misreads a safety event, misses a person in a restricted zone, or produces too many false alerts can create operational noise or real risk. As these systems move from investigation tools toward proactive decision support, buyers will need evidence not only that the search is faster, but that it is accurate enough for the environment where it is being used.
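One simple way a buyer might frame that evidence is with precision and recall on a labeled sample of alerts from their own site. The Python sketch below is illustrative only. The `AlertCounts` class, the thresholds, and the counts are hypothetical, and a real acceptance test would also look at per-scenario breakdowns and alert volume over time.

```python
from dataclasses import dataclass


@dataclass
class AlertCounts:
    """Outcome counts from a manually reviewed sample of events."""
    true_positives: int   # alerts that matched a real event
    false_positives: int  # alerts with no real event behind them
    false_negatives: int  # real events the system did not flag


def precision(counts: AlertCounts) -> float:
    """Share of raised alerts that were correct."""
    raised = counts.true_positives + counts.false_positives
    return counts.true_positives / raised if raised else 0.0


def recall(counts: AlertCounts) -> float:
    """Share of real events that produced an alert."""
    actual = counts.true_positives + counts.false_negatives
    return counts.true_positives / actual if actual else 0.0


def meets_bar(counts: AlertCounts,
              min_precision: float,
              min_recall: float) -> bool:
    """True only if both metrics reach the site's chosen thresholds."""
    return precision(counts) >= min_precision and recall(counts) >= min_recall


# Hypothetical pilot: 40 correct alerts, 10 false alerts, 10 missed events.
pilot = AlertCounts(true_positives=40, false_positives=10, false_negatives=10)

print(f"precision: {precision(pilot):.2f}, recall: {recall(pilot):.2f}")
print("meets 0.90 / 0.75 bar:", meets_bar(pilot, 0.90, 0.75))
```

Low precision shows up as the operational noise described above, while low recall corresponds to missed events, so the two thresholds would reasonably differ between, say, a retail aisle and a restricted zone.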
The governance questions follow the cameras
The same features that make AI video analytics useful also make them sensitive. Schools, hospitals, factories, and retailers have legitimate safety and operations needs, but they also handle footage of students, patients, workers, visitors, and customers. Better search and retrieval can shorten investigations, but it can also make surveillance more powerful and easier to expand.
For enterprise buyers, the practical questions are becoming more specific: who can search footage, what kinds of queries are allowed, how long video and derived metadata are retained, whether biometric features are involved, how alerts are audited, and whether employees or visitors are told when AI analysis is being used. Those questions are not separate from the product story. They are part of whether physical AI can be deployed responsibly at scale.
The NVIDIA collaboration gives Verkada more technical muscle for the next phase of AI-powered physical security. It also makes the category easier to judge. Physical AI is no longer only a robotics-lab term. It is becoming a practical enterprise infrastructure question: what happens when cameras, sensors, and building systems can understand more of the physical world they watch?
Sources: Verkada announcement via PR Newswire; NVIDIA Cosmos; NVIDIA Physical AI Data Factory Blueprint.