Do data and compute have diminishing marginal utility?

​The economics of artificial intelligence are fundamentally distinct from previous technological shifts. Traditional rules governing industrial production, and even early digital products, break down when applied to large language models. This exploration tracks a series of interconnected structural realities: how diminishing returns apply to the foundational inputs of machine learning, whether base model training can ever be truly finalized, the trajectory of synthetic data, and the political economy of user data compensation. These insights represent a record of thinking in progress for an unmapped economic landscape.

​I. Diminishing Marginal Utility in Machine Learning

​Data: A Classic Case of Diminishing Returns

When evaluating data and compute, data is the textbook case of a resource governed by diminishing marginal utility. The first thousand training examples provide massive learning gains, whereas subsequent millions yield progressively less. Error rates in neural networks typically follow a power-law relationship with dataset size, approximately $\text{error} \propto N^{-\alpha}$, where $\alpha$ sits between 0.1 and 0.5. Under this relationship, each tenfold increase in data multiplies error by the same factor, $10^{-\alpha}$, so every equal proportional improvement costs ten times more data than the one before it. This behavior stems from redundancy: as datasets swell, new examples heavily mirror existing ones, merely confirming a known distribution rather than expanding capability. The exception is genuinely novel, high-diversity data covering rare edge cases or unrepresented domains, which can offer substantial marginal utility even at scale.
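To make the power law concrete, the short Python sketch below computes how much extra data a given error reduction costs. It assumes the idealized form error ∝ N^(-alpha) holds exactly, with no irreducible error floor; the function names are illustrative and not taken from any library.

```python
def error_ratio(data_multiplier: float, alpha: float) -> float:
    """Factor by which error is multiplied when the dataset grows by data_multiplier."""
    return data_multiplier ** (-alpha)


def data_multiplier_for(target_error_ratio: float, alpha: float) -> float:
    """Dataset growth needed to multiply error by target_error_ratio (a value below 1)."""
    return target_error_ratio ** (-1.0 / alpha)


if __name__ == "__main__":
    for alpha in (0.1, 0.3, 0.5):
        print(
            f"alpha={alpha}: 10x data leaves {error_ratio(10, alpha):.2f} of the error; "
            f"halving error needs {data_multiplier_for(0.5, alpha):,.0f}x data"
        )
```

Under these assumptions, halving error takes 4x the data at alpha = 0.5 but 1024x at alpha = 0.1, which is why the exponent matters more than the raw dataset size.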

​Compute: Diminishing Returns with Phase Transitions

Compute behaves differently: the baseline trend shows diminishing returns, but it is punctuated by sharp phase transitions. Standard scaling laws indicate that overall model loss falls as a power law in training compute. However, sufficiently large compute budgets can unlock new, emergent behaviors such as coherent multi-step reasoning, complex instruction-following, and in-context learning. Because these capabilities appear abruptly at specific resource thresholds, returns around those thresholds are locally non-diminishing and highly valuable.
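One hedged way to see how a smooth power law can coexist with abrupt-looking capability jumps is a toy model: assume loss falls as a power law in compute, treat exp(-loss) as the chance of getting a single step right, and require many consecutive correct steps for a task to succeed. The constants and the mapping from loss to step accuracy below are illustrative assumptions, not fitted values, and this is only one candidate mechanism rather than an established explanation of emergent behavior.

```python
import math

# Hypothetical constants chosen for illustration only.
LOSS_SCALE = 2.0
LOSS_EXPONENT = 0.3


def loss(compute: float) -> float:
    """Smooth power-law loss as a function of (unitless) compute."""
    return LOSS_SCALE * compute ** (-LOSS_EXPONENT)


def task_success(compute: float, steps: int) -> float:
    """Chance that all `steps` sequential steps succeed, assuming independence
    and a per-step success probability of exp(-loss)."""
    return math.exp(-steps * loss(compute))


if __name__ == "__main__":
    for exponent in range(0, 10, 2):
        c = 10.0 ** exponent
        print(
            f"compute=1e{exponent}: loss={loss(c):.3f}, "
            f"20-step success={task_success(c, 20):.3f}"
        )
```

In this toy setup the loss declines smoothly at every scale, while the 20-step success rate stays near zero at 1e2 units of compute and exceeds one half by 1e6.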

​This dynamic is further shifted by the operational partition between training compute and inference compute. Modern reasoning models represent a strategic pivot toward test-time compute, allowing a model to search reasoning paths, self-verify, and backtrack during active queries. For complex mathematical and logical tasks, these inference-time returns remain highly productive.

​II. Can We Declare Training "Done"?

​Given that training compute exhibits diminishing returns, a natural question follows: can we reach a point of declaring training finished and redirecting all resources to inference? The honest answer is probably not, because "negligible returns" is always relative to the task at hand. A model may be saturated on standard benchmarks while still having vast room for improvement on long-horizon planning or genuinely novel scientific domains. As one capability saturates, the frontier moves, meaning the game is not to solve a fixed problem but to pursue an expanding one.

​Furthermore, inference-time scaling does not replace training; it completely depends on it. Chain-of-thought reasoning and extended self-reflection require a base model that already possesses the latent knowledge and core capability to draw upon. A weak base model hits a ceiling quickly at inference time, regardless of how much compute is applied. Training and inference scaling are not alternatives, but sequential layers: training lays down the substrate from which inference extracts value. The realistic picture is a joint approach: continue training to build stronger base models, while simultaneously developing inference-time scaling techniques that exploit those models more deeply.
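A minimal sketch of why inference-time scaling inherits the quality of the base model: suppose each sampled answer is correct with probability p, samples are independent, and a perfect verifier can pick out any correct one. All three are simplifying assumptions. The chance that at least one of n samples is correct is then 1 - (1 - p)^n.

```python
def best_of_n_success(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float) -> int:
    """Smallest n such that best_of_n_success(p, n) reaches the target."""
    n = 1
    while best_of_n_success(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.2, 0.01, 0.001):
        print(f"p={p}: samples for 90% success = {samples_needed(p, 0.9)}")
```

Under these assumptions, a model with p = 0.2 reaches 90% success with 11 samples, whereas one with p = 0.001 needs more than two thousand, so the same inference budget buys far less from a weaker base model.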


​III. Synthetic Data: Promise, Peril, and the Verification Escape Hatch

​To bypass the constraints of finite real-world data, synthetic data offers unlimited volume and cheap, targeted production. Yet, training models recursively on their own outputs triggers a well-documented pathology: model collapse. Errors compound, distributions narrow, and critical edge cases are washed out—a phenomenon analogous to photocopying a photocopy.

​The premier escape hatch from this loop is anchoring generation to verifiable domains like mathematics, formal logic, or executable code. In these spaces, an external checker or compiler evaluates model-generated solutions, breaking the closed feedback loop and enabling authentic self-improvement from verified failures. While highly effective for programmatic domains, extending this verification framework to ambiguous fields like strategic planning or creative work remains an unresolved frontier where model collapse risks stay acute. The plausible trajectory for the field relies on a continuous loop: pretraining on real data, expanding capabilities through verified synthetic sets, and feeding optimized inference outputs back into the next generation of base models.
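The verification loop can be sketched in a few lines. The example below uses integer addition as a stand-in for a verifiable domain: a deliberately unreliable generator (a hypothetical placeholder for a model) proposes answers, and an independent checker decides which samples enter the synthetic training set. Names such as `noisy_generator` and `build_verified_set` are invented for illustration.

```python
import random


def noisy_generator(a: int, b: int, rng: random.Random, error_rate: float = 0.3) -> int:
    """Stand-in for a model: returns a + b, but is wrong some of the time."""
    answer = a + b
    if rng.random() < error_rate:
        answer += rng.choice([-2, -1, 1, 2])  # inject a nonzero error
    return answer


def checker(a: int, b: int, proposed: int) -> bool:
    """External verifier: recomputes the result independently of the generator."""
    return proposed == a + b


def build_verified_set(n_problems: int, seed: int = 0):
    """Generate candidate solutions and split them into verified and rejected."""
    rng = random.Random(seed)
    kept, rejected = [], []
    for _ in range(n_problems):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        proposed = noisy_generator(a, b, rng)
        bucket = kept if checker(a, b, proposed) else rejected
        bucket.append((a, b, proposed))
    return kept, rejected


if __name__ == "__main__":
    kept, rejected = build_verified_set(1000)
    print(f"kept {len(kept)} verified samples, rejected {len(rejected)}")
```

The key design point is that the checker never consults the generator, so generator errors cannot leak into the kept set. In domains without such a checker, no equivalent filter is available, which is where the collapse risk described above remains.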

​IV. Zero Marginal Cost and the Uncompensated User

Post-deployment, the marginal cost of serving an additional user falls to little more than electricity and amortized hardware. Because the core model is a non-rivalrous digital asset, the standard market mechanism that pushes prices down to marginal cost breaks down: a price at or near marginal cost cannot recoup massive fixed training outlays, so pricing is anchored to value capture rather than input costs. Platform value capture is further accelerated by an indirect network flywheel, in which higher user volume yields more preference signals, error discovery, and edge-case testing, all of which feed into and refine future training iterations.
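The pricing argument reduces to simple arithmetic, sketched here with made-up numbers: a fixed training cost, a constant per-user serving cost, and a uniform price. The figures are hypothetical and chosen only to make the calculation easy to follow.

```python
def profit(price: float, fixed_cost: float, marginal_cost: float, users: int) -> float:
    """Total profit when every user pays the same price."""
    return (price - marginal_cost) * users - fixed_cost


def break_even_price(fixed_cost: float, marginal_cost: float, users: int) -> float:
    """Price at which revenue exactly covers fixed plus serving costs."""
    return marginal_cost + fixed_cost / users


if __name__ == "__main__":
    fixed, marginal = 100_000_000, 0.50  # hypothetical training and per-user costs
    for users in (1_000_000, 10_000_000, 100_000_000):
        print(
            f"users={users:,}: profit at marginal-cost pricing = "
            f"{profit(marginal, fixed, marginal, users):,.0f}; "
            f"break-even price = {break_even_price(fixed, marginal, users):.2f}"
        )
```

Pricing at marginal cost loses exactly the fixed cost regardless of how many users are served, while the break-even price falls toward marginal cost as the user base grows.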

​This structural dependency frames users not merely as consumers, but as involuntary factor inputs to production. Users provide the very preference signals and conversational data that refine the models. This relationship takes on a deeply recursive, asymmetric quality: large language models are trained on the accumulated intellectual history of human thought, only to sell that distilled capability back to the humans whose cognitive labor they may ultimately compete with or replace. While frameworks like "Data as Labor" advocate for user micropayments, structural execution faces severe attribution hurdles. A single conversation's marginal contribution is mathematically diffuse and often indistinguishable from noise, despite massive aggregate value.
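To illustrate why attribution is hard, the sketch below reuses the power-law form from Section I and compares the error reduction from one additional example with the reduction from doubling the dataset. Applying that curve to conversational data is an assumption made purely for illustration; alpha = 0.3 and the dataset size are hypothetical.

```python
import math


def relative_error_drop(n_before: float, n_after: float, alpha: float) -> float:
    """Fractional error reduction when data grows from n_before to n_after,
    assuming error is proportional to N ** (-alpha)."""
    # log1p/expm1 keep precision when n_after is only slightly above n_before.
    growth = math.log1p((n_after - n_before) / n_before)
    return -math.expm1(-alpha * growth)


if __name__ == "__main__":
    n = 1e9  # hypothetical number of conversations already collected
    print(f"one more conversation: {relative_error_drop(n, n + 1, 0.3):.2e}")
    print(f"doubling the dataset:  {relative_error_drop(n, 2 * n, 0.3):.2%}")
```

With a billion examples and alpha = 0.3, one extra example reduces error by about 3 parts in 10 billion, while doubling the dataset reduces it by roughly 19%. The aggregate is valuable even though each individual contribution is far too small to price.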

​V. Strategic Alignment of Compensation Frameworks

​If the industry moves to formally internalize the uncompensated externality flowing from users to platforms, several structural models present themselves:  

| Model | Mechanism | Structural Constraint |
| --- | --- | --- |
| Usage Credits | Distributing free inference tokens or credits to heavy users providing high-quality interactions. | The attribution problem makes isolating individual prompt value highly complex. |
| Data Cooperatives / DAOs | Collective bargaining units negotiate data usage terms in exchange for model revenue shares. | Requires unprecedented and non-existent collective governance frameworks. |
| Revenue Sharing Pools | Distributing a corporate revenue percentage to users proportional to engagement. | The average individual unit contribution is too close to statistical noise and hard to measure. |
| Tiered Access | Offering non-monetary perks like preferential latency or expanded context windows to power contributors. | Non-monetary perks may ultimately fail to offset systemic value extraction. |

Ultimately, economics alone cannot balance this ledger; it remains a question of political economy. The core rules of this era are still being actively written, and society will eventually be forced to confront who holds the structural power to set the terms of exchange.

Generated by Gemini Pro and Claude Sonnet 
