Meta has paused an internal program that captured employee computer activity for AI training after reports that the collected data was left broadly accessible inside the company, turning a controversial workplace-surveillance project into a data-security test case for AI agents.
The program, called the Model Capability Initiative, or MCI, was designed to collect examples of how Meta employees use computers so the company could train AI systems to navigate software more like people do. WIRED reported this week that the data included keystrokes, mouse clicks, and screen content from U.S. employees’ corporate laptops, and that an internal security notice described employee data across 45,000 Hive tables as exposed.
Those tables reportedly included full prompts and transcriptions, private conversations, people data, and performance-related information. Meta spokesperson Tracy Clayton told WIRED the company had designed the program with privacy safeguards and had no indication that the data was improperly accessed, but that Meta was pausing the program while it investigates.
Why Meta wanted the data
MCI sits at the center of a larger push to make AI agents useful inside ordinary software workflows. When Meta described the project in April, the company framed the collection as a way to give models real examples of mouse movements, button clicks, dropdown navigation, and other actions that are hard to learn from static documents or synthetic tasks. The Verge reported at the time that the tool ran in work-related apps and websites and captured mouse activity, keystrokes, and occasional screenshots.
That training need is real. AI agents that are supposed to operate business software, review code, move across browser tabs, or complete multi-step office tasks need more than text. They need examples of interface behavior: where people click, how they recover from errors, which menus they ignore, and how they move between tools such as code repositories, chat apps, calendars, documents, and internal dashboards.
But that is exactly why the dataset becomes sensitive. A screen recording or interaction trace is not just a productivity metric. It can capture passwords typed into the wrong window, health or family information opened between meetings, private workplace chats, unreleased product plans, customer data, source code, HR records, and the habits of specific employees. Even when a company says the data will not be used for performance reviews, the underlying collection can still create a rich behavioral record.
The access-control failure matters more than the pause
The most important part of the story is not simply that Meta workers objected to monitoring. More than 1,600 employees signed a petition opposing the tool, according to The Guardian, arguing that collecting and repurposing employee computer-use data raised serious concerns around privacy, consent, and workplace trust.
The deeper issue is that a dataset created to train AI agents appears to have become a broad internal data repository before the access model was ready. WIRED reported that Meta CTO Andrew Bosworth wrote internally that the implementation had fallen short of the company’s privacy-review standards and cited misconfigured access-control lists. If employee activity data is collected at this level of detail, the permissions system has to work before collection begins, not after a staff security notice exposes the gap.
That lesson applies well beyond Meta. Companies experimenting with AI agents often want to log real workflows because synthetic demos are too clean. Real work is messy, cross-application, and full of edge cases. The same messiness that makes these logs valuable for training also makes them dangerous to store casually. Agent-training data can combine application telemetry, screen context, text input, identity, timestamps, and business-process details in one place.
What companies should learn from MCI
Any company planning to collect employee workflow data for AI training should treat it like a high-risk production dataset from the first day. That means opt-in rules where possible, narrow collection scopes, aggressive redaction, short retention periods, clear employee notices, separate handling for sensitive tasks, and audit logs that show who accessed what. It also means proving that the data is not available to unrelated teams simply because it sits in a shared analytics or data-warehouse environment.
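To make those controls concrete, here is a minimal Python sketch of three of them working together: redaction at ingestion, a retention cutoff, and an audit log on every query. It does not describe Meta's systems. The class names, the 30-day window, the role allowlist, and the two redaction patterns are all illustrative assumptions, and a real deployment would need far broader filtering.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical patterns for illustration only; real filtering needs much wider coverage.
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"(password|passwd|token)\s*[:=]\s*\S+", re.IGNORECASE), r"\1=[SECRET]"),
]

RETENTION = timedelta(days=30)        # assumed retention window
ALLOWED_ROLES = {"agent-training"}    # assumed allowlist of roles that may read traces


@dataclass
class Trace:
    employee_id: str
    app: str
    text: str
    captured_at: datetime


def redact(text: str) -> str:
    """Apply each redaction pattern in order and return the cleaned text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


@dataclass
class TraceStore:
    traces: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def add(self, trace: Trace) -> None:
        # Redact before storage so raw secrets never reach the store.
        self.traces.append(
            Trace(trace.employee_id, trace.app, redact(trace.text), trace.captured_at)
        )

    def query(self, requester: str, role: str, now: datetime) -> list:
        # Record every attempt, granted or denied, before returning anything.
        allowed = role in ALLOWED_ROLES
        self.audit_log.append((now, requester, role, "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"role {role!r} may not read traces")
        # Return only traces still inside the retention window.
        return [t for t in self.traces if now - t.captured_at <= RETENTION]
```

The point of the design is ordering: redaction happens before anything is stored, and the access check and audit entry happen before anything is returned, so a shared warehouse location alone does not grant access.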
Security teams should ask practical questions before approving these projects: which apps are in scope, whether screenshots are stored or transformed, how credentials and personal information are filtered, who can query raw traces, whether managers can connect the data to individual workers, and whether model-training pipelines preserve information that should have been stripped out upstream.
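One of those questions, whether managers can connect the data to individual workers, can be partly addressed by pseudonymizing identifiers before traces reach analysts. The sketch below uses a keyed hash (HMAC-SHA256) from Python's standard library. The field names and the `worker_ref` label are hypothetical, and pseudonymization only reduces linkage risk: timestamps and behavioral patterns can still re-identify people.

```python
import hashlib
import hmac

# Hypothetical field names that identify a worker directly.
DIRECT_IDENTIFIERS = {"employee_id", "name", "email"}


def pseudonymize(employee_id: str, key: bytes) -> str:
    """Return a stable keyed reference; without the key it cannot be recomputed."""
    digest = hmac.new(key, employee_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identity(record: dict, key: bytes) -> dict:
    """Drop direct identifiers and attach a pseudonymous worker reference."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["worker_ref"] = pseudonymize(record["employee_id"], key)
    return cleaned
```

Keeping the key with a small privacy or security team, separate from the people who query traces, is what makes the mapping hard to reverse in practice.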
There is also a product-governance question. AI agents trained on internal workflows may eventually be asked to act inside the same business systems they observed. If the training data contains sensitive process details, undocumented workarounds, privileged internal access paths, or employee-specific behavior, companies need controls for both the dataset and the downstream model behavior. The risk is not limited to a leak; it also includes training an agent on patterns that should not be generalized.
Meta’s pause does not mean workplace data will stop being used to train AI agents. The pressure to automate office work is too strong, and companies building agents need realistic examples. The MCI episode shows the standard those projects will be judged against: not whether the AI use case sounds promising, but whether the organization can collect, restrict, audit, and explain the data before employees become the training set.