Automation’s appetite for human traces
A new era of agents built on what was previously left unwritten
Today, models are trained on a corpus of knowledge, composed of what humans have chosen to record and the traces they’ve left behind. Building agents that work like experts requires codifying tacit expertise: procedural knowledge that is so internalized and actionable that experts themselves struggle to articulate it.
But externalizing tacit knowledge is an open problem: when processes are documented at all, they tend to be retrospective and incomplete. As more work goes digital, the key to unlocking the next generation of model capabilities may lie in going straight to the source: the software tools where expert work actually happens.
By Hamidah Oderinwale

In The Age of Extraction, legal scholar Tim Wu describes how tech platforms shifted from fueling economic activity to extracting value from it, becoming “some of history’s most advanced tools for extracting as much as possible: data, attention, and profit margins.” The pattern behind the context harvest is the same one that defined the social-media attention economy: a two-sided platform, with a consumer product on one side and a market for data on the other.
Today, we’re entering a new attention economy. If the digital economy of the last decade was about extracting what we consume on social platforms and selling it to advertisers, the digital economy of this decade and those to come will be about extracting our interactions and serving them to models as context. Context is metered by the token, and developers are tasked with sourcing and compressing the best data to optimize it. Models today are generalists, but building specialists requires understanding experts and what they do. The defining data type of this new era is therefore procedural, capturing not just what experts produce but how they work.
But procedural data can’t be scraped en masse off the Internet: building models capable of automating expert work depends on first capturing how experts do it. Companies are staking their investments on labor automation for economic gain. Some envision agents as assistants for hire, tools for doing the things humans don’t want to do; others imagine a future of humans as managers and models as laborers. The first is a narrative about agency, the second about control. In both cases, the crucial question is how, exactly, companies will get the data required for the next breakthrough, and who will benefit.
Formalizing procedures
There are many problems in the world that could, in principle, be solved by machines, if only we knew how to represent them formally and verify when they had actually been solved. In the current AI development paradigm, such representation starts with data collection. For anyone looking to capture and formalize procedural data, two conditions have to hold. The first is capturing interactions at the right level of abstraction: close enough to preserve meaningful context, but abstracted enough to filter noise rather than recording every keystroke. The second is inferring intent from those traces with enough fidelity to distinguish the judgment behind a decision from the action itself. This is a major challenge: even when experts are asked to articulate why they work a certain way, what they produce is a lossy reconstruction, not a faithful account of the judgment itself, let alone the context that went into it.
These two ingredients, context and judgment, are both crucial for labs and companies hoping to use agents to replicate human work. To automate a role, you need to clearly define what it is and how to tell when it has been done well. But for many roles, like those of the scientist, the philosopher, or the statesman, the criteria for success are fuzzy, the effects of the work show up over long time horizons, and the impact of any given decision is hard to trace.
Companies are betting that procedural data can help define these fuzzy criteria more clearly. The most prominent example is Mechanize, launched in April 2025 with the explicit goal of automating software engineering. Mechanize’s approach starts with building training programs for models to help them learn how to approach problems like an engineer, rather than simply feeding them a ton of code and training them to predict the next best function. They do this by mining real examples from public documentation and building bespoke sandboxes using this data, then putting their models inside these simulated environments.
Expert context and judgments are essential to this process because they help create realistic simulations and highlight essential context that current models are missing. Take the example of an engineer building a program to track new apartment listings as they’re posted. They run it, but nothing comes back, even though there are no errors. The code itself isn’t broken; the site has simply cut them off for making too many requests. A model trained only on code would be left trying a slew of solutions, relying on sporadic human input to calibrate its progress. A model trained on expert trajectories could instead infer, without an explicit error message, that it should space out its requests and verify that new listings are populating the database, no human in the loop required. More broadly, environment development often means taking passive observations of workflows and software bugs, whether sourced from public repositories and forums, generated synthetically, or modeled in bespoke sandboxes, and giving them structure as reusable, gradeable tasks.
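The behavior the expert trajectory encodes, back off and verify that results are actually landing, is simple to state but rarely written down anywhere a model could read it. A minimal sketch in Python, where the `fetch` callable is a hypothetical stand-in for the real listings request:

```python
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Retry a fetch that may be silently rate-limited.

    `fetch` is any callable returning a list of listings (hypothetical).
    An empty result with no error is treated as possible throttling,
    so we back off exponentially instead of hammering the site.
    """
    delay = base_delay
    for _ in range(max_retries):
        listings = fetch()
        if listings:          # results are populating: success
            return listings
        time.sleep(delay)     # no error, no data: assume throttling
        delay *= 2            # exponential backoff
    return []                 # give up after max_retries

# Usage with a stand-in fetcher that "recovers" on the third call:
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return ["apt-101"] if calls["n"] >= 3 else []

print(fetch_with_backoff(fake_fetch, base_delay=0.01))  # ['apt-101']
```

The point is not the backoff loop itself but that nothing in the failing run, no traceback, no status message, tells a code-only model this is the right move.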
More than half of YC’s Spring 2025 cohort are agent startups, each one trying to build coworkers as a service. Their success will depend on whether engineers can do for tacit-knowledge professions what Stack Overflow did for software: generate or capture the “softer” documentation that doesn’t yet exist.
Sourcing interaction
Just as companies in the platform economy sought monopoly over our time in-feed, companies seeking to automate expertise have clear incentives to build “data moats.” In the current regime, monopoly appears inevitable without infrastructure for “open human feedback” to pool and share this data for common benefit. Companies with the largest networks and broadest ecosystem reach will have the strongest data moats. Once a tool internalizes a user’s workflows and habits, it stops being interchangeable and becomes a personalized environment. At scale, this induces lock-in.
AI systems that learn a person’s patterns become difficult to leave: human feedback data is not currently standardized or portable, and embedded context cannot easily transfer to another model. The platform becomes the canonical home for a user’s context, making switching costly. Another problem is that the degree of personalization scales with how fine-grained the capture is. Traditional privacy-preserving mechanisms rely on the size of the user pool to anonymize, but as models are delegated more sophisticated and niche tasks, existing mechanisms will have to adapt to protect sensitive context across smaller and more specialized crowds.
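To see why pool size matters, consider a toy k-anonymity filter: a record is released only when at least k users share the same quasi-identifier, so niche specialties get suppressed while common ones pass through. The roles and threshold below are illustrative, not drawn from any real system:

```python
from collections import Counter

def k_anonymous_release(records, key, k=5):
    """Release only records whose quasi-identifier value is shared
    by at least k users; suppress the rest (a toy k-anonymity filter)."""
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

# With a broad pool, most records survive; with a niche specialty,
# nearly everything is suppressed:
users = [{"role": "software engineer"}] * 6 + [{"role": "patent litigator"}] * 2
released = k_anonymous_release(users, key=lambda r: r["role"], k=5)
print(len(released))  # 6: the two niche records are suppressed
```

The mechanism works only when crowds are large; as tasks narrow to a handful of experts, almost everything falls below the threshold, which is exactly the adaptation problem described above.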
The products that win will be those with the broadest existing data advantage, able to model different users from what they already have. Instagram’s ‘For You’ page is a function of history and culture, a model of who you are and what you’re interested in. In the case of LLMs, personalization is a function of time and the diversity of what a model has seen.
The authors of a paper on General User Models describe an app-agnostic system for executing on this vision. The system observes a user’s computer interactions at the OS level over time, translates those observations into propositions about what the user is working on and what they know, and then allows this context to be exported and integrated into any application. A user asking ChatGPT to ‘help me with this section’ wouldn’t need to manually reconstruct context, because the model already knows what they’re working on. Their work builds on a vision of ‘global memory’ where, rather than each app building its own siloed understanding of the user, users benefit from personalization while maintaining sovereignty over their data.
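A rough sketch of what that observe-then-propose pipeline might look like; the `Observation` and `Proposition` types and the inference rule here are illustrative, not the paper’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """A raw OS-level interaction event (illustrative)."""
    app: str
    action: str   # e.g. "typed", "opened", "switched-to"
    detail: str

@dataclass
class Proposition:
    """A natural-language claim about the user, with provenance."""
    claim: str
    sources: list = field(default_factory=list)

def infer_propositions(events):
    """Toy inference: turn repeated activity in one app into a claim."""
    by_app = {}
    for e in events:
        by_app.setdefault(e.app, []).append(e)
    props = []
    for app, evs in by_app.items():
        if len(evs) >= 2:   # enough signal to form a proposition
            props.append(Proposition(
                claim=f"User is actively working in {app}",
                sources=evs,
            ))
    return props

events = [
    Observation("Overleaf", "typed", "Section 3 draft"),
    Observation("Overleaf", "typed", "fixed citation"),
    Observation("Slack", "opened", "team channel"),
]
for p in infer_propositions(events):
    print(p.claim)   # User is actively working in Overleaf
```

Keeping provenance on each proposition is what would let a user audit, export, or revoke pieces of their context rather than handing over an opaque profile.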
Privileged access
As more regulated industries adopt AI at the application layer, the interfaces that earn user trust will be the ones that earn the capture. Harvey is building a Cursor for law, OpenEvidence and Abridge are doing the same for clinicians, and Hebbia and Rogo are building AI copilots for investment bankers, each backed by hundreds of millions of dollars in funding. As these companies seek to capture more of their respective industries, new questions arise: can privacy be assumed or guaranteed, and what happens when it is promised, perhaps even protected by NDAs, but then breached through subpoenaed chat logs? This data need not come only from the platforms themselves; any company building on these models exposes users’ data to the labs underneath, whether users know it or not.
What happens when these concerns are disregarded? Even with little legal precedent for data lawsuits of this kind, companies are being sued and paying the costs of negligence. Otter.ai is facing a class action alleging that its notetaker recorded non-users’ conversations and used them to train its models without consent. Granola, another popular AI meeting notetaker, dedicates a page to its data policies: by default, user data is used to train its own models, and while any user can opt out, the company offers little detail on what its anonymization process actually involves. Data moats frame the commercial race as one won by accumulating the most users possible and extracting all that can be taken. But enterprise customers, the ones willing to foot the highest bills, won’t adopt tools they can’t trust.
While many B2B companies opt to build AI products with their own UIs that can be bought as a package, large labs instead sell task forces: they embed their engineers and custom-built agents directly into customers’ teams, taking their models straight into those workflows. In a recent deal, Goldman Sachs essentially hired Anthropic’s forward-deployed engineers to build agents for work that would traditionally fall to first-year analysts. At the end of last year, OpenAI took a stake in Thrive Holdings, a sister company of Thrive Capital, which notably invested in OpenAI in 2022. Thrive Holdings’ goal is to embed AI into corporations, drawing on its portfolio to automate financial and IT operations. Access to privileged financial and digital records requires trust. The future of embedded AI will depend not only on model capabilities or speed of integration but on whether privileged information can stay privileged while the work gets automated.
The platform playbook
Companies with existing user bases must consider how to monetize what will be scarce: data about process. In this regime, the product becomes the mechanism for data capture, and the data becomes the product. In late 2025, LinkedIn updated its terms to train AI on user data by default. Figma faced a lawsuit for allegedly auto-opting users into having data about what they built on the platform used for training. Anthropic, positioning itself as the privacy-focused alternative, updated its consumer terms in August 2025 to train on conversations unless users opt out.
As users interact with increasingly sophisticated tools, model developers treat those interactions as a continuous training stream. The adage “if you’re not paying, you’re the product” is being put into practice.
While labs like Anthropic and OpenAI were built AI-native from the start, companies that learned to leverage existing data infrastructure have proven that the model alone is no longer the moat. Google is the standout example. Gemini lagged behind competitors for years, but as the company leaned into its data flywheel, performance caught up. Through Chrome, Google benefits from nearly a decade of large-scale telemetry on how billions of users navigate the web. No other company currently has this advantage, making privacy-preserving infrastructure for pooling interaction data all the more urgent against the threat of monopoly.
It is clear that platforms, not just good models, fuel user adoption, which in turn fuels more user data. The poster child for this dynamic is Cursor. Though the company began by building IDEs, it is now leveraging its proprietary data to build code models itself. On paper, Cursor is no more than a VSCode wrapper with a number of models callable at a prompt’s notice. But at a $29 billion valuation, Cursor has built a defensible moat through proprietary developer feedback data, delivering near-frontier coding performance by learning directly from the procedural traces developers leave as they work. Ownership over the interfaces where work happens will be key for domain-specific models, where the knowledge gap is tacit and harder to verify.
Perpetuated cycles
As agent platforms mediate an increasing share of work, many will also be tasked with capturing interactions and distilling them into procedures: observing enough usage in context to let structure emerge, rather than relying on hard-coded taxonomies that define job descriptions. The hard part isn’t automating the cell edit or the slide deck in isolation; it’s building systems that know when to act and can string together easy tasks into artifacts that respond to what’s actually needed. Buried in Terms of Use and encoded in data deals worth millions will be the rights to our expertise. It will be a key lever in the new economy, and we have yet to build the infrastructure to govern it well.
Further reading
Klein and Scanlon on the Attention Economy
“The Future of Open Human Feedback” (Don-Yehiya et al., 2024)
“Building State-of-the-Art Agents with Mercor” (Applied Compute, 2026)
Ramp’s economist-in-residence on accelerating AI adoption in business
“Context Engineering: Why Hayek’s Knowledge Problem Survives AI”
“How Big Data Confers Market Power to Big Tech” (Santesteban and Longpre, 2020)
Reboot publishes essays, interviews, and book reviews by and for technologists. Subscribe for more like this!
🌀 microdoses
Tired of AI pretending to be human? Now you can LARP as AI!
Why can’t AI write more than okay poetry? Our very own Jasmine Sun investigated for The Atlantic.
Thanks to all who came to the book launch we hosted for The Irrational Decision!
💝 closing note
Got more takes on tech and labor? Pitch us!