Newcomb's Paradox

@hypericin, I want to revisit the demographics predictor case, because your recent post suggests you may be wavering for a reason that the OP in my new thread on causal exclusion provides some resources to address.

Your key move is: “However fine grained the data is, that data stops once the prediction is made. So the agent should still maximize their local utility, that is, two box.” This is a temporal cutoff argument: the predictor collected data about your rational profile in the past, the prediction is now frozen, and you face the boxes as a free agent whose present deliberation is no longer within the scope of what was tracked. If that’s right, the demographics case is structurally like the smoking-lesion case. A past condition (your rational profile at the time of data collection) causes both the prediction and (probably) your choice, but your present deliberation can screen off the past data. Two-boxing follows.

I think this is mistaken, and the agent/circumstances distinction from my new thread explains why.

The temporal cutoff argument treats your rational profile as a past event — data collected, frozen, filed away — separate from your present deliberation. But your rational character isn’t a past event that caused your present deliberation as a downstream effect. It’s a standing capacity whose exercise is your present deliberation. When you deliberate about the boxes right now, you’re exercising the very rational capacities the predictor’s data tracked. The relationship between the data and your present choice isn’t like the relationship between a gene and a behavior (two separate effects of a common cause, screenable). It’s like the relationship between observing someone’s French fluency last year and predicting they’ll speak grammatical French today. The observation tracked a standing capacity, and today’s speech is the exercise of that same capacity. The temporal gap doesn’t create a screening-off opportunity.

A chess analogy may help, and it connects to a point @noAxioms raised in the other thread. When chess engines evaluate a player’s “accuracy” in a game, they measure how closely the player’s moves correlate with the engine’s top-evaluated move — the objectively strongest move in each position. The better Carlsen plays, the more predictable he is to Stockfish, because both are converging on the same thing: what the position demands. Carlsen’s freedom — his masterful exercise of chess understanding — is what makes him predictable. The novice who blunders is less predictable, because their moves don’t track the rational structure of the position.
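(To make the metric concrete, here is a toy sketch of the move-match idea. The moves and the exact metric are invented for illustration; the “accuracy” scores chess sites report are typically derived from evaluation loss rather than exact matches, but the point about converging on the engine’s choice is the same.)

```python
# Toy version of the metric: the fraction of a player's moves that coincide
# with the engine's top choice in the same position.
def move_match_accuracy(player_moves, engine_best_moves):
    matches = sum(p == e for p, e in zip(player_moves, engine_best_moves))
    return matches / len(player_moves)

# Invented illustration: the strong player converges on the engine's choices,
# the blundering novice does not, and so is harder to predict from the position alone.
strong = ["e4", "Nf3", "Bb5", "O-O", "Re1"]
novice = ["e4", "a3",  "Qh5", "Ke2", "Rg1"]
engine = ["e4", "Nf3", "Bb5", "O-O", "Re1"]

print(move_match_accuracy(strong, engine))  # 1.0 -> maximally predictable
print(move_match_accuracy(novice, engine))  # 0.2 -> unpredictable, because worse
```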

This is a structural analogue for the Newcomb predictor. The predictor’s accuracy is parasitic on the agent’s rationality, just as Stockfish’s ability to predict Carlsen is parasitic on Carlsen’s ability to find strong moves. And the agent can’t outflank the predictor by deliberating differently, any more than Carlsen can surprise Stockfish by playing better chess, because playing better chess just means converging more closely on what Stockfish would predict. Carlsen can only become less predictable by playing worse, that is, by departing from what the position demands. Likewise, the agent facing the boxes can only outflank the predictor by reasoning worse — by departing from what their rational assessment of the situation calls for.

Crucially, Stockfish doesn’t need to watch Carlsen during the tournament game to predict his move. It only needs the position on the board and a sufficient grasp of what strong play looks like. The temporal gap between when Stockfish could have “scouted” Carlsen and when the move is actually played is irrelevant, because what’s being tracked is a standing capacity (Carlsen’s chess understanding), not a momentary state that expires when observation stops. Your “the data stops” point would be telling if what the predictor tracked were a momentary state: a mood, a neurochemical fluctuation, a passing impulse. But a rational profile built from past decisions, writing, and test results isn’t tracking a momentary state. It’s tracking a character, a cultivated capacity for reasoning that persists. That’s why the temporal cutoff doesn’t create the screening-off opportunity that a two-boxer is hoping for.

Now, this is where the empowerment criterion from my earlier post becomes relevant again. Contrast two discoveries:

You learn the predictor used a two-boxer gene: a genetic marker that correlates with two-boxing. This is genuinely empowering. The gene is a brute condition separable from your rational agency. Your rational assessment of the situation is a different source of action than the genetic predisposition the predictor tracked. You can screen off the gene: “I’m two-boxing because of my rational assessment, not because of the gene, so my two-boxing doesn’t place me among the gene-carriers the predictor anticipated.” Two-boxing is correct.

If you learn that the predictor used an exhaustive rational profile (past decisions, writing, test results, everything that reflects your deliberative character), is this empowering in the same way? You say it is, because the data collection stopped. But what would you do differently with this information? The demographics predictor isn’t a Laplacian demon tracking your every thought in real time. It works through statistical proxies. But at 99% accuracy, those proxies are fine-grained enough that the agent can’t act both intentionally (on what appear to them to be good grounds) and in a way that wasn’t predicted. You might hope there’s a gap to exploit — some clever piece of reasoning the demographic profile doesn’t capture. But any intentional attempt to exploit that gap is itself an exercise of the rational character the profile tracks. The 1% residual error is real, but it isn’t a strategic resource: it corresponds to cases where something outside rational agency (e.g., distraction, confusion, accident) intervenes. Unlike the gene case, where learning the basis frees you to exercise your rational agency against the tracked condition, here any deliberate exercise of your rational agency falls within what the predictor’s proxies capture, and they capture it accurately enough to disable the screening-off strategy. That’s what makes this a genuine Newcomb case, even though the predictor’s mechanism is demographic rather than Laplacian.
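To see in rough numbers why that 1% residual isn’t worth betting on, here is a minimal expected-value sketch. It assumes the textbook payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) and that the stated accuracy applies symmetrically to one-boxers and two-boxers; those figures aren’t from our thread, so treat them as illustrative.

```python
# Minimal expected-value sketch under the textbook Newcomb payoffs and a
# predictor whose accuracy p applies symmetrically to both kinds of agent.
def expected_value(p, one_box):
    big, small = 1_000_000, 1_000
    if one_box:
        return p * big               # opaque box is filled iff one-boxing was predicted
    return (1 - p) * big + small     # opaque box is filled only if the predictor erred

for p in (0.65, 0.99):
    print(p, expected_value(p, one_box=True), expected_value(p, one_box=False))
# 0.65 -> one-box 650,000 vs two-box 351,000
# 0.99 -> one-box 990,000 vs two-box  11,000
```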

This is also why, as I argued before, accuracy and mechanism converge. A 65% accurate demographic predictor can get by tracking brute dispositions like greediness or impulsiveness, and against such a predictor the screening-off strategy works. A 99% accurate predictor must be getting the rational deliberators right too, which means its demographic proxies are functioning as a window into rational character, however indirect. The temporal cutoff doesn’t help, because what the predictor tracked, whether through demographics or Laplacian simulation, is a capacity and not a momentary state. And a capacity doesn’t expire when the data collection stops.
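A back-of-the-envelope way to see that convergence, with numbers invented purely for illustration: split the population into agents whose choices flow from brute dispositions the proxies capture easily and agents who decide by careful deliberation, then ask what overall accuracy is achievable if the predictor only gets the first group right.

```python
# Illustrative only: the population split and hit rates below are invented.
def overall_accuracy(frac_brute, hit_brute, hit_deliberators):
    """Overall accuracy over a population of 'brute-disposition' choosers and
    careful deliberators, given the predictor's hit rate on each group."""
    return frac_brute * hit_brute + (1 - frac_brute) * hit_deliberators

# Nail the brute-disposition choosers (say 70% of agents) and merely guess on
# the deliberators: accuracy tops out around 80%, nowhere near 99%.
print(overall_accuracy(0.7, 0.95, 0.50))   # 0.815

# Even with a perfect hit rate on the brute group, reaching 99% overall
# requires getting almost 97% of the deliberators right as well.
required = (0.99 - 0.7 * 1.0) / (1 - 0.7)
print(required)                            # ~0.967
```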

You conclude that the problem is “critically underspecified” and that “it absolutely matters how the prediction is made.” I think that’s exactly right. But the specification that matters isn’t the temporal question of when the data was collected. It’s the constitutive question of whether what the predictor tracked is separable from the agent’s rational agency. When it is — gene, brute disposition, neurological condition — the agent can screen off, and two-boxing is rational. When it isn’t — Laplacian predictor, wizard, or demographic predictor accurate enough to be tracking rational character through proxies — the agent can’t screen off without departing from their own rational agency, and one-boxing is rational.

I foresee that @Suny, and maybe you as well, will have a strong rejoinder to this. But I already have a good response to it. :wink: