Everyone says that AI will replace white-collar workers, but I believe this claim ignores crucial information.
“AI will definitely replace white-collar workers; white-collar work is finished.”
This argument commits a classic error of time and game-structure reasoning: it treats a final outcome judgment as a judgment of the current state.
Everyone is talking about AI replacing white-collar workers, but I believe this judgment itself misses something extremely important.
Let me use an analogy.
Imagine a ninth-dan Go player playing against a beginner. The ninth-dan plays Black; the novice plays White. Of course, from overall strength and win probability, we all know Black has a huge advantage. But here is the problem: Black has only placed the very first stone, and you already declare that White has lost. Is that reasonable? Is that correct? No matter how weak White is, he hasn’t even made a move yet.
In Chinese, the word “game” (博弈) fundamentally refers to a process of play unfolding over time between Black and White. Even if Black eventually establishes a winning position in the midgame, that still requires at least 180 moves to unfold. And it is precisely in this long midgame—where Black is highly likely to win, yet the outcome is not yet settled—that a massive amount of important information is hidden and routinely ignored.
Over the past two years, I have listened to many startup ideas around so-called “AI-native software and systems.” I agree with much of their reasoning and internal logic. But I also feel that we can describe the direction we are all seeing far more clearly. That direction lies exactly in the reasoning I am about to explain. Although Black appears to be winning, it still must actually play out those 180 moves over time to establish that victory. This game is long, and the structure of human society cannot be reduced to a simplistic narrative like:
“A master arrives, plays one move, and wins.”
Such a narrative skips over time, skips over process, and skips over a vast amount of critical information. I believe that many truly meaningful AI-native systems will not land at the moment of final judgment, but precisely within this long, easily overlooked process of play.
Upper-tier white-collar professionals are the backbone of society, and not simply because of textbook knowledge
You must truly understand their position as institutional interfaces.
In modern, globalized industrial systems—highly complex systems by nature—the white-collar class spans both developing and developed countries and serves as the backbone of institutional operation. Changes to its production relations are, in essence, a long-duration, multi-round, high-risk game.
In such a game, every single move by either side carries concrete opportunities and costs. This is not merely about coding ability, generative text, problem-solving, or prompt writing. Right now, people are still copying and pasting in chat windows—so claiming that this ecosystem can already replace the white-collar ecosystem is plainly premature.
Take licensed professions such as lawyers and accountants as examples.
Their high training cost is not because they write more documents or memorize more rules, but because they occupy institutionally recognized positions of decision-making and responsibility.
The value of these professions rests on a deep, implicit social rule:
You can trust them, because if something goes wrong, they are responsible.
This “responsibility” is not abstract. It is institutionalized: professional discipline, civil liability, and even criminal liability.
For this reason, these professions do not demand one-off task execution ability; they require long-term training to become stable bearers of risk. Practitioners must make judgments under uncertainty and bear consequences in regulatory gray zones.
Many tasks appear transactional on the surface—bookkeeping, tax filing, drafting contracts, writing memos.
But what is truly expensive and truly scarce is the judgment embedded within them:
When must you clearly say “no”?
When can a “controlled exception” be allowed?
How should conflicts between rules be interpreted and resolved?
When risk cannot be eliminated, how do you assume it rather than evade it?
This kind of judgment is neither procedural nor templated. It is deeply human. Language models that remain at the level of “window-based interaction” do not occupy this institutional position of gray-area judgment.
From a broader perspective, these professions are institutional interfaces. They were not designed from scratch; they emerged through centuries of industrialization, through continuous friction and evolution between production and production relations—across Western industrial history and global economic rebalancing.
What is their core asset? Is it merely textbook knowledge? Of course not. It is:
reputation + track record
Track record is a time-based asset.
You must pass through enough real situations, mistakes, post-mortems, and disciplinary constraints to earn trust. In moments that are truly irreversible and life-defining, would you trust an instantly generated output—or a veteran expert with decades of proven judgment?
This is why many white-collar professions impose systemic barriers:
Exams
Licenses
Continuing education
Long-term training within firms or institutions
At their core, these barriers do only two things:
Filtering: minimizing unstable individuals in high-responsibility roles
Binding: binding individuals to professional disciplinary systems (punishable, accountable)
In other words, certification does not merely prove “you know”; it more importantly proves “you can be constrained by institutions.”
Many engineers and programmers—especially those from hard-science backgrounds—tend to hold overly simplified worldviews and are deeply accustomed to believing that problems have absolute right answers. But real-world games are far more difficult than Olympiad math problems that high-school graduates can solve.
“AI can’t replace white-collar workers because no one can take the blame.”
I believe this counterargument is also crude and incorrect. Designing blame-taking mechanisms is not difficult—digital signatures, human overrides, and so on. What is truly difficult is training a decision-making entity to be continuously constrained by institutions, repeatedly tested by history, and gradually trusted by society over long spans of time. That is precisely the hardest part of the game where Black appears to be winning, yet still must play out all 180 moves.
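The mechanism itself really is cheap to build. As a minimal sketch (every name here is hypothetical, not an existing API), a sign-off gate with a human override fits in a few lines of Python:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib

@dataclass
class Proposal:
    """An AI-generated draft waiting for a human sign-off."""
    content: str
    author: str = "model"
    approved_by: str | None = None
    approved_at: datetime | None = None

    @property
    def digest(self) -> str:
        # Freeze the exact bytes being approved, so the signature refers
        # to one identifiable artifact rather than a drifting draft.
        return hashlib.sha256(self.content.encode()).hexdigest()

def approve(proposal: Proposal, human: str) -> Proposal:
    """The human-override point: nothing takes effect without this call."""
    proposal.approved_by = human
    proposal.approved_at = datetime.now(timezone.utc)
    return proposal
```

The code is the easy part. The decades-long part is getting courts, regulators, and counterparties to treat that approved_by field as a binding signature.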
The road must be walked step by step; the stones must be placed one by one. As programmers, if we believe in this path, then the real opportunity lies in the process of the game—not in declaring “you’ve already lost, so let’s all lie flat.” That attitude just guarantees that, decades later, the true structural opportunities will already have passed you by.
What makes this AI wave truly “magical” is that it emerged as a full-domain technology
Teachers use it; students use it.
Law firms use it; accountants use it.
Programmers use it; novelists use it.
Finance uses it; medicine uses it—astonishingly, even outpatient doctors ask GPT.
It can write reports, revise contracts, generate structured documents, and even produce full formal languages end-to-end.
A technology that appears and immediately covers nearly all cognitive professions is extraordinarily rare in the history of technology. While I am not a professional historian, I have long studied the history of technology out of personal interest. Most major breakthroughs followed the opposite path: they first broke through a vertical domain, then slowly diffused outward. Steam power, electricity, computers, and the internet all followed this pattern. If I had to find a close analogy, the only one that comes to mind is the printing press.
This is why the intuition that “AI will replace all white-collar work” is so compelling—it arises directly from the visual shock of this full-domain coverage. I felt it myself. As a bilingual person who reads programming languages and formal languages, I suddenly saw all my linguistic interfaces covered at once. Every channel to the world seemed captured. For a moment, it felt omnipotent.
LLMs now command nearly all human language forms—and language is the universal interface of every industry. When a technology suddenly covers all interfaces, a natural illusion emerges:
Does this mean every profession attached to these interfaces will be replaced wholesale?
But this is precisely where the mistake lies. After 2016, my view of LLMs became increasingly grounded and skeptical.
I fully agree: AI’s coverage of white-collar work is real.
This “full-domain” nature exists not only because AI understands professional knowledge, but because language itself spans all industries. Contracts, reports, medical records, papers, manuals, code comments, meeting notes—most visible white-collar outputs are linguistic.
But coverage is not replacement.
The moment you move from “it looks like it can do the work” to “it replaces production relations,” complexity explodes. Nearly all discussions of AI replacing white-collar workers skip critical steps here. I have read countless analyses, videos, and startup narratives, yet almost none clearly explain the replacement mechanism itself. To do so, one must first answer a harder question:
What is the cross-industry, stable operating mechanism of white-collar work today?
White-collar work does not exist because people can write. Language is only the surface. The underlying logic includes:
How decisions are authorized
How risk is distributed
How responsibility is traced
How judgment is accepted under uncertainty
How errors are absorbed institutionally rather than causing systemic collapse
Only after clarifying this long-term, cross-industry operating mechanism can we ask the next question:
Can AI—and how—fully take over this mechanism?
Most “replacement” arguments stop at:
“It generates faster / better / cheaper.”
But jumping from improved generation to wholesale replacement skips dozens of steps: institutions, responsibility, trust, time, risk, governance, accountability, replay. Almost all are ignored.
The universal operating mechanism of white-collar work: transactional flow mixed with decision-making
White-collar work is neither pure decision nor pure execution. It is a remarkably stable structure:
Transactional processes densely interwoven with decision nodes.
By “transactional work,” I mean something very plain—even unflattering: tasks you do daily that require little deep thinking but must be performed repeatedly and accurately. In English, I would simply call them transactional tasks.
They make up the bulk of daily white-collar work:
Were the emails sent?
Were meeting notes written?
Was the revised version synced to everyone?
Was the report updated to the template?
Where is the approval workflow?
Is this the latest contract version?
Has the client confirmed?
Has legal reviewed it?
Were compliance comments incorporated?
You could argue that a white-collar worker spends 80% of their day on such tasks. They are repetitive, tedious, standardized, and look nothing like “high-value work.” This is precisely why LLM performance here feels so terrifying: email drafting, document editing, summarization, formatting—once you supply context, LLMs overwhelmingly outperform humans.
But if you observe closely, you will see that these transactional flows are saturated with non-explicit decision points. These are not “A or B” decisions, but micro-judgments embedded in execution:
Should this email be sent now?
Should this person be CC’d?
Should the wording be conservative or aggressive?
Will raising this issue now introduce unnecessary risk?
Should this be left ambiguous for the next phase?
Should this be escalated?
Should this be explicitly fixed in writing or left interpretable?
These judgments occur inside execution, not as separate decision events.
In other words, the true shape of white-collar work is this:
Continuous execution punctuated by small but irreversible judgments.
And that is the layer most replacement narratives fail to see.
For white-collar transactional workflows, LLMs hold a crushing, no-blind-spot advantage.
Across virtually every industry, as long as you provide enough context, specify the constraints (even though prompts are only "soft constraints"), and keep the task within the boundaries of that context, so that the model never has to span the full complexity of an entire engineered system (for example, never has to "read the entire legal code"), the output quality on transactional workflows is simply overwhelming. And a huge portion of white-collar time used to go into exactly these transactional processes: take legal statements, where every single word and every citation must be carefully assembled.
LLMs happen to land precisely on the core computational structure of white-collar transactional work:
language-type transactions = generating acceptable text / structured expression under constraints.
As long as you fill in the context, state the goal clearly, provide enough boundary conditions and formatting constraints (even if a prompt is only a soft constraint—I’ll explain later the difference between soft constraints and hard constraints; only real programming and executable formal languages truly achieve what I mean by “hard constraints”), and keep the task within a “context-closed” scope (i.e., not demanding global system-level complexity), then this category of output is almost naturally the LLM’s home turf.
It can generate large volumes of acceptable text in very short time: consistent genre, stable tone, complete structure, standardized formatting, uniform citation style. And it can iterate repeatedly based on your instructions—localized rewrites, template alignment, gap-filling, denoising and compression.
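The soft-versus-hard distinction deferred above can be previewed in a few lines. In this sketch (the prompt text and the enforce helper are illustrative, not a real pipeline), the prompt merely requests a format, while the validator refuses anything that violates it:

```python
import json

# Soft constraint: an instruction the model may or may not honor.
PROMPT = (
    "Summarize the attached contract in exactly three bullet points. "
    'Return JSON of the form {"bullets": ["...", "...", "..."]}.'
)

def enforce(raw_output: str) -> list[str]:
    """Hard constraint: a machine-checked rule the output must pass."""
    data = json.loads(raw_output)   # must be valid JSON at all
    bullets = data["bullets"]       # must contain the agreed key
    if len(bullets) != 3:           # must satisfy the stated bound
        raise ValueError("expected exactly 3 bullets")
    return bullets
```

Nothing stops the model from ignoring the prompt; nothing lets a non-conforming output past the validator. Only the second kind of constraint is executable.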
In traditional white-collar systems, this type of work occupies an enormous share of time:
Lawyers writing statements, memos, clauses, correspondence
Accountants writing report footnotes, explanations, reconciliation notes
Consultants writing decks, analytical summaries, meeting notes
HR writing job descriptions, performance reviews, policy documents
Product managers writing PRDs, alignment emails, release notes
In the past, this work looked “hard” and consumed huge amounts of time. But the “hardness” often wasn’t about deep thinking—it was the kind of difficulty that comes from “making the homework look polished and error-free.” Deliverable, readable, reusable, reviewable, traceable—the tone of every sentence, the formatting of every citation, the risk posture embedded in every phrase, the completeness of every section—each of these consumes human attention and energy.
Humans are, in a sense, consciousnesses better suited for creativity, passion, and imagination. In contrast, the LLM’s advantage is that it is essentially an extremely powerful language-structure generator. For “mass production within constraints,” it has near-physical scale advantages: no fatigue, parallelizable, endlessly rewritable, alignable, able to produce multiple versions simultaneously—while driving the “linguistic friction cost” of manual white-collar labor down to near zero.
At this point, LLMs have effectively smashed the marginal cost across the entire chain of “writing → organizing → formatting → revising → paraphrasing → summarizing → templated expression.” Many roles historically had to invest time not because these actions were inherently “high value,” but because organizations needed these artifacts as collaboration media and as institutional evidence. Now, artifact production has been suddenly industrialized—which is exactly why people get the sensation that “the whole industry will be replaced.”
A friend of mine calls this:
“The industrialization of language.”
And I largely agree.
White-collar workers look like they’re just copy-pasting in a window, but they’ve embedded decisions into prompts and context. So even if LLMs do all the dirty work, decision-making is still indispensable.
White-collar work naturally fuses transactional workflows with decision-making. So if an LLM can only handle transactional workflows, is the “copy-paste in a chat window” pattern sufficient to replace white-collar workers?
No.
“Copy-paste in a window + let the LLM do the work” is merely outsourcing transactions; it does not automatically erase the core value of white-collar work. Many white-collar workers today appear to be doing “low-level operations,” but in reality they are embedding the truly expensive thing—decisions—in a subtle way into prompts, context, constraints, sequencing, and trade-offs. On the surface, it looks like the LLM is writing, revising, producing drafts; in essence, humans are performing institutionalized judgment, while the LLM is being used as an extremely powerful transactional executor.
More rigorously: “decision” in white-collar work is not a single button, not a moment where someone says “please decide now.” It is distributed across the entire transactional chain as continuous micro-decisions.
When you write an email: who to CC, how hard the language should be, how explicitly to state risk, whether to leave wiggle room, where to place responsibility, how to set timelines—these are not just language skills, but judgments about organizational structure, risk tolerance, and boundaries of authority and responsibility.
When you draft contract clauses: do you use “must” or “should”? Do you lock it down or leave interpretive space? Do you place exceptions in the main text or in an appendix? Do you require the other party’s commitment before granting rights? Every word becomes a carrier of decision.
When you prepare reports or compliance explanations: which numbers must be explained, which risks must be disclosed early, which assumptions must be explicit, and which can be glossed over as “industry convention”—these, too, are judgments, not mere transactions.
Embedding decisions into context means writing into the boundary conditions of text generation: “which version of reality am I willing to be responsible for?” What the LLM crushes is the ability to produce deliverable text within defined boundaries. But the boundaries themselves, priorities, risk posture, exception strategy—these still require a subject that can be held accountable, can be reviewed, and can bear consequences under uncertainty.
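To make that concrete: the micro-decisions can be lifted out of free-form prompt text into explicit, reviewable parameters. A minimal sketch, with every name hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailDecision:
    """The expensive part: judgments a human is willing to answer for."""
    cc: tuple[str, ...]    # who must see this (and, implicitly, who must not)
    tone: str              # "conservative" or "assertive"
    disclose_risk: bool    # state the risk now, or leave wiggle room?
    fix_in_writing: bool   # commit the position, or keep it interpretable?

def build_prompt(decision: EmailDecision, context: str) -> str:
    """The cheap part: compiling the decision into generation constraints."""
    risk = ("State the key risk explicitly."
            if decision.disclose_risk else "Do not volunteer the risk.")
    stance = ("Commit to the position in writing."
              if decision.fix_in_writing else "Keep the wording interpretable.")
    return (
        f"Draft an email, cc {', '.join(decision.cc)}. "
        f"Tone: {decision.tone}. {risk} {stance}\nContext:\n{context}"
    )
```

Everything in EmailDecision is what this essay calls judgment; everything in build_prompt is transaction.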
So the answer to “If an LLM can handle transactional workflows, can it replace white-collar workers?” depends on what you mean by “replacement.”
If you mean replacing labor volume—reducing the physical work of drafting, organizing, filing, alignment, revision—then the window pattern can already replace a large share of the “visible labor” of white-collar work, and it will rapidly change headcount structures.
But if you mean replacing occupational function—replacing the “decision interface” that is authorized inside organizations, bound to risk bearing, constrained by institutions, and that accumulates reputation over long trajectories—then the window pattern is far from enough. Because the essence of the window pattern is: decisions remain in human heads; only transactions are outsourced to the model. When you hit gray zones, conflicts, irreversible consequences, situations requiring endorsement and responsibility, humans still must “sign the decision out” into the world—otherwise the system cannot close the loop.
The Window Paradox
There is an even deeper paradox: the stronger the window becomes, the more white-collar work shifts from “writing” toward “judging.” When the cost of writing approaches zero, what becomes scarce is no longer text production, but: who sets boundaries, who bears consequences, who is responsible for exceptions, who interprets conflicts, who conducts post-mortems and accepts disciplinary constraint.
In other words, LLMs will evaporate white-collar transactional labor, but simultaneously make the institutional core of white-collar work more visible: traceability of judgment and responsibility.
If I had to predict: many positions will indeed disappear, but “super-positions” will appear—concentrating the decisions that used to be distributed across those disappearing roles into a single super node.
So what’s the takeaway for engineers?
At minimum, in this game where AI as Black crushes global white-collar labor as White, we must find the system entry point.
Not to build a stronger “writing machine,” but to build a system structure that can carry decision and responsibility.
1) Stop crowding into the transactional layer: it’s already a crushing zone.
Transactional workflows—writing docs, organizing materials, formatting alignment, generating versions, adding explanations, running processes—are already the LLM’s absolute home turf.
For engineers, building "better prompt tools," "smoother document generators," or "faster report assistants" is, from a system perspective, mostly incremental innovation in a red ocean. We have already seen many teams crowd in early: report generation, summarization, illustrated storybooks, even vertical transactional pipelines like generating textbooks or question banks. It's not that these products have zero value, but:
It’s hard to form durable moats
Model capabilities themselves will absorb them quickly
They rarely reshape production relations
If you choose the transactional layer as your entry point, you are essentially pushing against the model’s natural capability gradient.
2) The real gap: how decisions are caught.
Return to the core judgment: LLMs can evaporate transactional labor at scale, but they will not—and cannot—automatically absorb judgment and responsibility.
So where do those go? They don’t disappear. In reality, they rapidly concentrate into a small number of nodes—so-called “super-positions,” a handful of people, and those critical locations that must be institutionally recognized and must be able to “carry the load.” And in my view, this is exactly the entry window for engineers.
The importance of decision-making to human systems is almost equivalent to the meaning of time itself.
Any meaningful process, any system that can keep running, is fundamentally a network of nodes and connections. And in that network, what truly determines direction is not the flows, but the decisions that sit on key nodes. Transactions are just the connecting lines.
For an individual, major decisions shape a life trajectory.
For a family, key choices determine intergenerational fate.
For an organization, decisions define strategy, risk, success, and failure.
For society and history, decisions are the inflection points of victory and defeat.
Now, this time-flowing network is being injected with a new form of intelligence: AI. Transactions can be automated; paths can be accelerated; but the nodes still exist—and become more important.
Who catches these nodes? Who is responsible for the judgments made at those nodes?
This is where engineering must intervene.
The engineering goal is to give decision itself a system form:
Make decisions explicitly expressible, stably recorded, institutionally constrained, post-mortem replayable, and—when necessary—migratable, inheritable, and reusable.
Standing at this moment, looking forward, I feel strongly that this is not something you solve by “adding a feature.” It represents an entire class of system opportunities:
A new kind of infrastructure to carry judgment, responsibility, and time—
not merely to improve efficiency, generate content, or optimize workflows (all of which will eventually be absorbed by model capability).
3) What do system-level entry points look like?
If you translate the "institutional core" of white-collar work into engineering objects, you can see at least four implementable directions:
Decision as Object
Today many key judgments are “embedded in prompts” and not traceable. The first step is to turn “why I decided this way” from implicit context into explicit objects.
Responsibility Binding
The classic “who takes the blame” question. In fact, this is often the easiest part: who is the owner, who is the reviewer, who has override authority?
Decision Logs and Replayability (Trace & Replay)
Engineering can turn judgment itself into a reusable asset rather than something that drifts away inside a chat window.
Exception Engineering
The real world is not a rule engine. The complexity of production relations and politics cannot be captured by simple if-else. Systems must support:
explicitly allowing exceptions
forcibly recording justification
requiring post-mortem review
This is the core work area of human judgment—and the weakest area of LLMs.
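Read together, these four directions suggest a single engineering object. A hedged sketch of what it might look like (all class and field names are my own, not an established schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now() -> datetime:
    return datetime.now(timezone.utc)

@dataclass
class DecisionRecord:
    """Decision as Object: the judgment itself, separate from generated text."""
    question: str                  # what had to be decided
    choice: str                    # what was decided
    rationale: str                 # why: no longer implicit in a prompt
    owner: str                     # Responsibility Binding: who answers for it
    reviewer: str | None = None    # who checked it, who could override
    inputs: dict = field(default_factory=dict)  # Trace & Replay: the basis
    decided_at: datetime = field(default_factory=_now)  # the time anchor

@dataclass
class ControlledException:
    """Exception Engineering: allowed, but never silent."""
    decision: DecisionRecord
    rule_overridden: str           # which rule is being set aside
    justification: str             # forcibly recorded, not optional
    postmortem_due: datetime       # review is mandatory by construction
```

None of this is clever code; the point is that the judgment stops living in a chat window and becomes an object that can be stored, reviewed, and replayed.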
Any system that can “carry the load” must satisfy the principle of Minimum Engineering
Whether it’s a human system, a machine system, or a human-machine hybrid—once it enters the responsibility domain, it must satisfy a non-negotiable baseline:
Key system actions must be determinable, recorded, replayable, and accountable.
The emphasis isn’t on “who the system is,” but on “whether the system can be constrained.” Whether it is a human expert, a committee, an algorithm, an AI agent, or any hybrid form—identity does not automatically grant trust. What determines trust is whether it meets engineering constraints.
Take human experts: what does a signature mean? It means the person bears institutional consequences. If they violate the law, can they be sued? Of course. But what does a lawsuit rely on? Evidence. Where does evidence come from? Verifiable records, traceable chains, facts that third parties can understand and reconstruct. Without these, “responsibility” becomes a joke—unenforceable, unarbitrable.
No serious society allows key facts to "dissolve into a chat window." Do companies keep accounting vouchers, ledgers, audit working papers, tax bases in chat windows? No. In many countries, ledger and voucher requirements are strict: retention, formatting, preservation periods, audit requirements. Compliance exists precisely because it can be audited, reviewed, and held accountable. Build an accounting or tax system and you must meet domestic accounting standards and regulatory requirements. Once such a system introduces AI, or becomes AI-native, it does not become "less responsible"; the responsibility domain becomes more complex: more automation, faster iteration, and larger scale effects enlarge the blast radius of errors and push the engineering requirements for accountability even harder.
So this is a long-cycle evolution spanning computer science, institutional design, social consensus, and regulatory governance. Based on my study of technology history, it is often a ten-year or multi-decade system project. I’m seriously trying to find my livelihood for the coming decades—this is not empty talk. Writing essays isn’t my main job.
Responsibility = time + record + structure + explainability
Time: when the action happened, based on what information and rules
Record: whether key facts are solidified into verifiable artifacts
Structure: whether facts are organized in stable, machine-parseable form
Explainability: whether third parties can understand, review, and reconstruct the decision path and basis
Therefore, any long-term system that can “carry the load” must contain at minimum these five elements—none can be missing:
Action Object
What exactly was done? Was the output "frozen" into an identifiable, referenceable, auditable object rather than a floating text stream?
Actor
Who initiated, who approved, who had authority, who bears consequences? The actor must be identifiable, bindable, and punishable.
Time Anchor
When does it take effect? Which version of policy/rules applied at the time? Time anchoring is a prerequisite for post-mortems and arbitration.
Replayable Trace / Log
Can you reproduce the inputs, basis, and path? Can you recompute after the fact, compare, and locate deviations? Without replay (and yes, replay is not the same as re-run; see the sketch after this list), you have no correction capability.
Failure Mode
What happens when it’s wrong? How are exceptions approved? How are risks escalated? How are post-mortems triggered? Load-bearing systems assume they will be wrong and engineer error handling explicitly.
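The replay/re-run distinction flagged in the trace requirement above deserves one concrete sketch. Assume a hypothetical append-only log entry holding the inputs, the rule version in force at the time, and the recorded output:

```python
def replay(entry: dict, rules: dict) -> bool:
    """Replay: re-derive the recorded output from the recorded inputs,
    under the rule version that applied at the time. Deterministic."""
    rule_then = rules[entry["rule_version"]]
    return rule_then(entry["inputs"]) == entry["output"]

def re_run(entry: dict, rule_now) -> object:
    """Re-run: execute again under today's rules (or today's model).
    It answers 'what would we do now?', not 'was the original action
    correct by the standards in force at the time?'."""
    return rule_now(entry["inputs"])
```

Accountability and arbitration need replay; re-run alone cannot settle a dispute about the past.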
Because of this derivation, I strongly reject a popular claim:
“LLMs are probabilistic and uncertain, so we must accept an uncertain future.”
That might be acceptable in consumer applications—who cares if the person in a generated video wears red or green. But in the responsibility domain, it’s dangerous. Serious production, serious institutional evolution, serious change in production relations cannot be built on the premise of abandoning determinism. Any real systemic change that pushes society forward must return to one base:
Build on the maximum determinism humans can accept, to carry large-scale collaboration and responsibility.
An AI that cannot “carry the load” is just a language bot—it cannot justify trillion-dollar valuations.
In essence, the LLM window is still “running naked.”
If we apply the minimum-engineering standard to how white-collar workers use LLMs today, we see that the window is still naked. Break down the common pattern—Prompt → Copy → Paste—and strip away every part that actually enters the responsibility domain:
structured data committed into ERP / finance systems
emails formally sent, archived, and auditable
signed documents that pass approval workflows
contract texts stored in contract systems with legal effect
any institutional record that can be held accountable, replayed, and arbitrated
Then you see it:
The LLM window, in real economic operation, is basically running naked.
The generation process inside the window bears no responsibility and cannot carry responsibility. It has no stable action object, no frozen decision record, no clear actor, no time anchor, no replayable institutional log. Once separated from those "systems that ultimately catch it" (ERP, email systems, contract systems, approval systems, ledger systems), everything that happened in the window is, in engineering terms, as if it never happened.
What carries responsibility is not the window, but the institutional systems outside the window.
White-collar workers are not “making decisions in the window.” They are using the window for cognitive processing: organizing thoughts, generating candidate drafts, simulating alternative phrasings, searching for structural and tonal optima. Once they enter the stage where something must take effect externally and consequences must be borne, they leave the window and re-enter a system that satisfies minimum engineering requirements.
That’s why—even if the window is brilliant, efficient, and stunning—it remains, in serious economic activity, a non-responsibility system. It can drastically improve transactional efficiency, but it is not a responsibility node that can be trusted.
In other words:
White-collar workers are not “carrying the load in the window.” They carry the load outside the window.
The window only reduces the cost of thinking, probing, simulating, and expression. What determines whether something can enter the real world—into ledgers, contracts, law, and institutional chains—remains those systems that satisfy minimum engineering principles.
In the era when AI-native systems become a trend, the programmer’s job is to put clothes on this powerful system that is still “running naked.”
As AI-native systems begin to take shape, the real task for application-level engineers is not to keep amplifying what this system can already do. It is to face a deeper, harder reality: a system that is extraordinarily powerful in cognition and generation is still “naked” today.
It lacks responsibility anchors, institutional interfaces, and traceable historical structure—so it cannot truly be integrated into real-world production and governance. Part of the programmer’s work is to dress this already-arrived but still-unconstrained power—using engineering structure to carry judgment, using records to fix actions, using a time axis to constrain evolution, using responsibility mechanisms to connect to law, compliance, and social consensus.
This is an inevitably long-term engineering project spanning computer engineering, institutional design, and social coordination. Once you enter the responsibility domain, the time scale cannot be “a few months of product iteration.” It must be ten years, twenty years, or longer. Every technology that truly changed production relations went through this path: capability first appears, order is built afterward, and productivity must be institutionalized to become sustainable social power.
AI today stands at this threshold. I have never denied the greatness of this transformation: capability has exploded, but responsibility has not landed; tools have formed, but governance is still catching up. Ultimately, we must integrate the existing reliable institutional systems—ERP, email, contracts, approvals, ledgers—into an AI-native framework:
Commit / Persist
Freeze
Sign-off
Approval
Accountability
Audit
Replay / Post-mortem
Arbitration
Attribution
Liability
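Read as an engineering surface, that list is roughly the interface any AI-native system of record would have to expose. A hypothetical sketch (the names mirror the list above; the signatures are illustrative only):

```python
from typing import Protocol

class SystemOfRecord(Protocol):
    """Institutional operations an AI-native framework must still honor."""
    def commit(self, artifact: bytes) -> str: ...           # persist, return an ID
    def freeze(self, artifact_id: str) -> str: ...          # immutable content hash
    def sign_off(self, artifact_id: str, actor: str) -> None: ...
    def approve(self, artifact_id: str, approver: str) -> None: ...
    def audit_trail(self, artifact_id: str) -> list[dict]: ...
    def replay(self, artifact_id: str) -> bool: ...         # post-mortem reproduction
    def attribute(self, artifact_id: str) -> str: ...       # who is accountable
```

Arbitration and liability live outside the code, in courts and contracts, but they can only function if every method above returns verifiable facts.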
In this era, the programmer’s role—if writing “code” itself becomes increasingly trivial—is closer to participating in a deeper evolution: transforming a powerful but unanchored system into one that can be accepted by society, trusted by institutions, and run for decades across time. This path is slow, heavy, and has almost no shortcuts—but it is the path that truly aligns with the historical laws of how productivity and production relations evolve.
A great game of Go indeed.