Using code for Newcomb's problem

Newcomb’s problem has been discussed ad nauseam — in this forum and elsewhere — but after all this discussion little ground has been gained by either side. In an attempt to change this I’m going to use a novel approach: computer programming.

Ideally computer programming shouldn’t be necessary; logic alone should be sufficient to arrive at the correct answer. But as we’ve seen from decades of discussion, that doesn’t seem to be the case. Consider what happened with the Monty Hall problem, where professional mathematicians consistently arrived at the wrong answer; even one of the greatest probabilists of the time, Paul Erdős, had the wrong intuition. It wasn’t until people ran simulations, both manually and with computers, that the true answer saw the light of day.

Using computer simulations to find out the true answer to a very simple question shows the correct answer to Newcomb’s problem:

P(M_{C1} > M_{C2})

What is the probability that the money obtained by a one-boxer is greater than the money obtained by a two-boxer?

Several papers have attempted to answer this question with expected utility, and while this approach is correct, it’s too simplistic, and there’s a deeper answer.

We are going to refer to a choice of one-box as C1 and a prediction of one-box as P1 (and likewise C2 and P2 for two-box).

Basic probability

Let’s define the concept of probability starting from p=0.99. Given this, the frequency of successful independent Bernoulli trials with success probability of p will converge towards 0.99.

Using code:

n <- 1e5  # number of simulated trials
mean(runif(n) < 0.99)

Note: here I’ll be using the R programming language given that it’s more succinct for this purpose, but the same can be reproduced in any programming language.

Mathematically: X_i = \mathbf{1}_{\{U_i < 0.99\}}, where U_i \sim \mathrm{Uniform}(0,1).

The code gives us a result that approximates p as n increases.
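
A minimal sketch of that convergence, assuming a fixed seed and a few sample sizes picked only for illustration (`sizes` and `estimates` are hypothetical names, not part of the original snippet):

```r
set.seed(42)                 # fixed seed, only so the run is reproducible
p <- 0.99
sizes <- c(1e2, 1e4, 1e6)    # illustrative sample sizes

# Fraction of successful Bernoulli trials for each sample size
estimates <- sapply(sizes, function(n) mean(runif(n) < p))

print(data.frame(n = sizes, estimate = estimates, error = abs(estimates - p)))
```

The error column should shrink roughly like 1/sqrt(n), which is the usual behaviour of a sample mean.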

If this is true (the definition of probability tells us that it is and we can verify empirically with code), then it follows that the expected utility of choosing two-box is:

money_c2 <- (1 - (runif(n) < p)) * 1000000 + 1000
mean(money_c2)

Which for p=0.99 gives us around $11,000, which aligns with the math:

E[C2] = P(P1∣C2) \cdot 1000000 + 1000

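By the same reasoning, the expected utility of choosing one-box is E[C1] = P(P1∣C1) \cdot 1000000, which for p=0.99 is $990,000. Therefore the average money obtained by one-boxers is greater than that obtained by two-boxers: for p=0.99 it’s $990,000 and $11,000 respectively.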
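
As a sanity check, here is a small sketch that puts the closed-form expectations next to simulated averages. The names `big`, `small`, `e_c1`, `e_c2`, `sim_c1` and `sim_c2` are introduced here for illustration, and the seed is arbitrary:

```r
set.seed(1)
n <- 1e6
p <- 0.99
big <- 1000000   # content of the opaque box when P1 was predicted
small <- 1000    # content of the transparent box

# Closed-form expectations
e_c1 <- p * big
e_c2 <- (1 - p) * big + small

# Simulated averages, built the same way as money_c2 above
sim_c1 <- mean((runif(n) < p) * big)
sim_c2 <- mean((1 - (runif(n) < p)) * big + small)

print(c(e_c1 = e_c1, sim_c1 = sim_c1, e_c2 = e_c2, sim_c2 = sim_c2))
```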

Let’s suppose a counterfactual where 100% of two-boxers earn more than $1M, say 100,000 of them. Although technically not impossible, each two-boxer earns that much only with probability 1-p=0.01, so the probability of that happening would be 0.01^{100000} = 10^{−200000}: pretty much 0 and certainly very far from 0.99. It doesn’t seem rational to expect the impossible.

Most two-boxers accept this fact (it’s the “Why Ain’cha Rich?” argument), but then argue that “irrationality” is being rewarded, so even though two-boxing results in less money overall, it’s still the rational choice. Of course, this is a tricky definition of “rationality”.

Basic probability, whether it’s through code or math, cannot be denied.

money_c1 <- (runif(n) < p) * 1000000 + 0
money_c2 <- (1 - (runif(n) < p)) * 1000000 + 1000
mean(money_c1) > mean(money_c2)
E[C1] > E[C2]

The only way to argue for two-boxing is to deny basic probability, which is not only to deny probability theory and the empirical results obtained through exhaustive computation, but the philosophy of probability as well.

The case for two-boxing

Naively one might think this is case closed, since clearly one-boxers make more money than two-boxers (on average). But if that was the case there would be no controversy, and there clearly is.

The case for two-boxing begins by fixing the prediction and arguing from there: for example given that the prediction was one-box. This isn’t necessarily a valid move, but for the sake of argument let’s suppose that it is.

If the prediction was P1, then C1 gives you $1,000,000 and C2 gives you $1,001,000.

However, what does this look like from the point of view of the code? First, we have to store the predictions, so that we can filter using them later:

pred_c1_p1 <- runif(n) < p
pred_c2_p2 <- runif(n) < p

money_c1 <- pred_c1_p1 * 1000000 + 0
money_c2 <- (1 - pred_c2_p2) * 1000000 + 1000

Now we can compare:

money_c1[pred_c1_p1] > money_c2[!pred_c2_p2]

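Except when we try to do that, the comparison breaks down because the vectors are of different sizes (R recycles the shorter vector and typically warns that the longer object length is not a multiple of the shorter one), which gives us a hint that this is an invalid approach.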

What is money_c1[pred_c1_p1]? That’s the money obtained when choosing one-box given that the prediction was one-box (i.e. 1,000,000). And money_c2[!pred_c2_p2] is the money obtained when choosing two-box given that the prediction was one-box (i.e. 1,001,000).

Obviously 1,001,000 is greater than 1,000,000, but we can’t compare apples to oranges. The number of people who earn one is fundamentally different from the number of people who earn the other.

We can ignore the vector sizes and conclude that two-boxing always gives more money, but we have to remember that this analysis began with the supposition that fixing the prediction was a valid move in the first place, which we never established.

\begin{aligned} E[C1|do(P1)] &< E[C2|do(P1)] \\ E[C1|do(P2)] &< E[C2|do(P2)] \end{aligned}

The breakdown of the comparison above tells us that C1|do(P1) is fundamentally different from C2|do(P1), namely that the sample size of the former is orders of magnitude greater than that of the latter.

Intuitively we know why: given a high enough p, the probability that an agent is going to choose C2 and Omega predicted P1 is very low.

We can compare the size of the vectors with the following code:

length(money_c1[pred_c1_p1]) / length(money_c2[!pred_c2_p2])

For p=0.99 the result is around 100, and for p=0.9999 the result is around 10000. That means that for a population of 100,000 participants and p=0.9999, the size of C2|do(P1) is around 10.

We can fix the code to obtain 10 items from C1|do(P1) in order to match the size of C2|do(P1) and compare 10 from one, with 10 from the other:

min_size <- min(length(money_c1[pred_c1_p1]), length(money_c2[!pred_c2_p2]))
s1 <- sample(money_c1[pred_c1_p1], min_size)
s2 <- sample(money_c2[!pred_c2_p2], min_size)
mean(s1 > s2)

The result is 0, which means for each of those 10 samples, the money obtained by two-boxers is greater than the money obtained by one-boxers.

But remember the tentative conditional we started with: if the prediction was P1. We never established this was a valid move, and the complexity of the code suggests it isn’t, because we have to compare two fundamentally different objects. In order to make this possibly invalid comparison, we have to ignore roughly 99,980 of the roughly 99,990 elements of C1|do(P1) (about 99.99\%).

So the case for two-boxing begins with the assumption: if we ignore the fact that N[C1|do(P1)] is much greater than N[C2|do(P1)]. Which is another way of saying: if we ignore p.

That’s because the ratio of N[C1|do(P1)] over N[C2|do(P1)] is:

\frac{p}{1 - p}
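
A quick way to check this ratio empirically is sketched below. The counters `n_c1_p1` and `n_c2_p1` are names introduced here, and the seed and sample size are arbitrary choices:

```r
set.seed(7)
n <- 1e6
p <- 0.99

pred_c1_p1 <- runif(n) < p   # one-boxers predicted correctly (prediction P1)
pred_c2_p2 <- runif(n) < p   # two-boxers predicted correctly (prediction P2)

n_c1_p1 <- sum(pred_c1_p1)    # one-boxers facing prediction P1
n_c2_p1 <- sum(!pred_c2_p2)   # two-boxers facing prediction P1

print(c(simulated = n_c1_p1 / n_c2_p1, analytic = p / (1 - p)))
```

The simulated ratio fluctuates from run to run, but it should stay close to p / (1 - p).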

Which gives us yet another clue: what happens if p=1? In other words: what happens if the predictor is perfect? In that case the ratio becomes infinite because N[C2|do(P1)] is 0.

It’s nonsensical to say let’s obtain a sample of 0 elements from C1|do(P1) and compare them with 0 elements from C2|do(P1).

Which is why for two-boxers p=0.9999999999 is fundamentally different from p=1 even though intuitively it shouldn’t be.

Although analytically E[C1|do(P1)] < E[C2|do(P1)] holds for p=0.9999999999, it doesn’t hold in a computer program:

n <- 1e5
p <- 0.9999999999

pred_c1_p1 <- runif(n) < p
pred_c2_p2 <- runif(n) < p

money_c1 <- pred_c1_p1 * 1000000 + 0
money_c2 <- (1 - pred_c2_p2) * 1000000 + 1000

min_size <- min(length(money_c1[pred_c1_p1]), length(money_c2[!pred_c2_p2]))

In this case min_size is 0 (almost certainly), which makes the subsequent comparison impossible, and the result is NaN.
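
The following sketch shows where the NaN comes from, assuming the same n and p as above (the vector name `mispredicted` is introduced here only for illustration):

```r
set.seed(3)
n <- 1e5
p <- 0.9999999999

mispredicted <- !(runif(n) < p)   # two-boxers facing prediction P1
print(sum(mispredicted))          # almost certainly 0 at this p

# Expected number of such cases in a run of size n
print(n * (1 - p))

# The mean of an empty logical vector is NaN in R
print(mean(logical(0)))
```

With an expected count of about 0.00001 mispredictions per run, an empty subset is what one should see in practically every run.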


From multiple vantage points it is clear the case for two-boxing relied on an invalid assumption.

Even in the most charitable interpretation, two-boxers bear the burden of proof if they want to assume a fixed prediction — and they don’t meet it. It’s assumed without justification.

The answer

Let’s return to the original question:

P(M_{C1} > M_{C2})

The answer using programming is surprisingly simple:

mean(money_c1 > money_c2)

The probability that one-boxers will obtain more money is p^2, so if p=0.99 the probability is 98.01\%.

This contradicts the simplified calculation using expected value which states that the cross-over point is around 51%. According to this formula, the real cross-over point is:

\left(\frac{1}{2}\right)^{1/2}

That is: around 71\%.
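
A sketch that checks the p^2 claim by simulation for a few accuracies, including the cross-over value. The helper `pairwise_win` and the vector `ps` are hypothetical names, and the two groups are assumed to be drawn independently, as in the code above:

```r
set.seed(11)
n <- 1e6

# Share of random (one-boxer, two-boxer) pairs in which the one-boxer has more
pairwise_win <- function(p) {
  money_c1 <- (runif(n) < p) * 1000000
  money_c2 <- (1 - (runif(n) < p)) * 1000000 + 1000
  mean(money_c1 > money_c2)
}

ps <- c(0.6, sqrt(0.5), 0.9, 0.99)
simulated <- sapply(ps, pairwise_win)
print(data.frame(p = ps, simulated = simulated, p_squared = ps^2))
```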

Case closed. We don’t even need to calculate the expected value to arrive at the correct answer.

Explanation

There are four possible outcomes, 2 for one-boxers: \{1000000,0\}, and 2 for two-boxers: \{1000,1001000\}. The probabilities of the product are the following (rows are the one-boxer’s outcome, columns are the two-boxer’s outcome):

              $1,000                       $1,001,000
$1,000,000    P(C1|P1) \cdot P(C2|P2)      P(C1|P1) \cdot P(C2|P1)
$0            P(C1|P2) \cdot P(C2|P2)      P(C1|P2) \cdot P(C2|P1)

The most likely outcome for one-boxers is $1,000,000 with probability P(C1|P1) which is p. The most likely outcome for two-boxers is $1,000 with probability P(C2|P2) which is p.

That’s why P(C1|P1) \cdot P(C2|P2) dominates: p \cdot p.
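
The same four probabilities can be computed directly. This is only a sketch, assuming the two outcomes are independent; the vectors `one_boxer` and `two_boxer` and the matrix `joint` are names introduced here:

```r
p <- 0.99

one_boxer <- c("$1,000,000" = p, "$0" = 1 - p)       # rows
two_boxer <- c("$1,000" = p, "$1,001,000" = 1 - p)   # columns

# Joint probability of each (one-boxer, two-boxer) pair, assuming independence
joint <- outer(one_boxer, two_boxer)
print(joint)
print(sum(joint))                      # the four cells sum to 1
print(joint["$1,000,000", "$1,000"])   # the only cell where the one-boxer has more
```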

Two-box partition

What two-boxers do is begin with an assumption: if P1 (do(P1)). There’s only one cell that satisfies that predicate: P(C1|P1) \cdot P(C2|P1). Then they follow with: if P2 (do(P2)). Again, only one cell satisfies that: P(C1|P2) \cdot P(C2|P2).

When we expand the original formula using the law of total probability, we get:

\begin{aligned} P(M_{C1} > M_{C2}|do(P1)) &\cdot [P(C1|P1)P(C1) + P(C2|P1)P(C2)] + \\ P(M_{C1} > M_{C2}|do(P2)) &\cdot [P(C1|P2)P(C1) + P(C2|P2)P(C2)] \end{aligned}

We can see that in this formulation the likely probabilities are mixed with the unlikely probabilities, therefore masking p. But the real problem is that by the time we consider P(M_{C1} > M_{C2}|do(P_{x})), we already lost 2 of the 4 probability combinations; in particular p^2 which is the overwhelmingly most likely.

Two-boxers only consider P(C1|P1) \cdot P(C2|P1) and P(C1|P2) \cdot P(C2|P2). In both cases when p=1 the result is 0, making the comparison nonsensical because there’s nothing to compare to.

Moreover, this equality assumes collapsibility, which once again is not something that can be automatically assumed.

Simpson’s paradox

This is structurally identical to Simpson’s paradox. In Simpson’s original 2×2×2 formulation with elements \{A, B, C\}, it is possible that:

P(A|B,C) > P(A|\bar{B},C) \quad \text{and} \quad P(A|B,\bar{C}) > P(A|\bar{B},\bar{C})

yet when aggregating over C:

P(A|B) < P(A|\bar{B})

The subgroup analysis reverses when the groups are combined, because it ignores the population weights of each subgroup. Mapping to our formulation: A is the probability of winning, B is two-boxing, and C is the prediction. The two-boxer shows that within each level of C (i.e. do(P1) and do(P2)), two-boxing dominates:

\begin{aligned} P(M_{C2}|do(P1)) > P(M_{C1}|do(P1)) &= \text{TRUE} \\ P(M_{C2}|do(P2)) > P(M_{C1}|do(P2)) &= \text{TRUE} \end{aligned}

But when C is marginalized out correctly the conclusion reverses:

P(M_{C2}) > P(M_{C1}) = \text{FALSE}

Although Simpson’s paradox is well understood, we can also verify the results using computation:

c <- runif(1e5) < 0.99
b <- (1 - c) * 0.999 + 0.001
nb <- c * 0.999

compare <- function(ma, a, mb, b) {
  n <- min(length(a), length(b))
  cat(sprintf('%s > %s: %s\n', ma, mb, mean(sample(a,n)) > mean(sample(b,n))))
}

compare('P(A|B,C)',  b[c],  'P(A|¬B,¬C)', nb[!c])
compare('P(A|B,¬C)', b[!c], 'P(A|¬B,C)',  nb[c])
compare('P(A|B)',    b,     'P(A|¬B)',    nb)

Which outputs:

P(A|B,C) > P(A|¬B,¬C): TRUE
P(A|B,¬C) > P(A|¬B,C): TRUE
P(A|B) > P(A|¬B): FALSE

You don’t need to understand the code or trust me, you can run it on an online R interpreter like rdrr.io.

Conclusion

The conclusion is that two-boxers begin with the assumption that the probability p doesn’t matter, and end with the conclusion that therefore the probability p doesn’t matter. But this is contingent on the unjustified assumption.

It’s nothing more than circular reasoning, which can be proved either using math or computer programs.


But Newcomb doesn’t ask that question at all, so your program is answering the wrong question.
The question is, should I take one or both boxes? It never asks for a probability.
While the Monty Hall thing definitely has one correct answer, the Newcomb thing doesn’t, and the argument to 2B typically doesn’t involve probability at all. The context of the big box is arguably not within your control. You get it unconditionally. The choice is reduced to whether to grab the extra K, and the contents of that box is not a matter of probability at all, so quoting probabilities is not the way to counter their thinking.

I’m not advocating to 2B (I would 1B even if the big box was transparent and empty), but when addressing those that do, you need to address their argument, not your own.

Several papers have attempted to answer this question with expected utility, and while this approach is correct, it’s too simplistic, and there’s a deeper answer.

EDT is also not correct, as has been discussed in the main topic. It works in this scenario, but for the wrong reasons.

Also, don’t forget that a good percentage of those who argue the 2B case would actually choose 1B if they faced a real scenario with real money. There’s typically no way to tell unless you have many millions with which to test this.

Sure, but again, answering the wrong question. I think the question you want to answer here is: Given the million and the thousand, how good must the predictor be (what’s his hit rate) in order for 1B to be the more rational choice? That’s quite a different question, and the answer is quite close to 51%, not 71.
An expected utility calculation is needed, not a probability of a given 2B guy getting more than a given 1B guy.
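
For reference, a rough sketch of that break-even under a plain expected-money comparison, assuming utility is linear in dollars (that linearity is an assumption, and `big`, `small` and `break_even` are illustrative names):

```r
big <- 1000000
small <- 1000

# Solve p * big = (1 - p) * big + small for p
break_even <- (big + small) / (2 * big)
print(break_even)

# Check: the two expected values coincide at the break-even accuracy
e_c1 <- break_even * big
e_c2 <- (1 - break_even) * big + small
print(c(e_c1 = e_c1, e_c2 = e_c2))
```

With these payoffs the break-even accuracy comes out just above 50%.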


I understand that. I’m not claiming that’s the question. My question is merely a tool to explore Newcomb’s problem question “what do you do?”.

I know, but what I’m trying to establish is should two-boxers involve probability?

The typical two-boxer rationalization is that this problem rewards “irrationality”. So if I’m a one-boxer that ends up with more money than them, it’s because I’m “irrational”.

But Newcomb’s problem doesn’t ask what’s “rational” either. It only asks “what do you do?”.

Two-boxers decided to explore what’s “rational” because that’s what they think is interesting, but that’s not what the problem asked.

So I’m exploring a different question, because I think it’s interesting.

But I did address their argument, didn’t I say this?

If the prediction was P1, then C1 gives you $1,000,000 and C2 gives you $1,001,000.

That means if the prediction was one-box, choosing two-box gives you an extra $1 K. Conversely if the prediction was two-box, choosing two-box also gives you an extra $1 K.

That is literally their argument.

What I’m pointing out is that their argument begins with a conditional (if).

But I didn’t mention EDT. EDT is a red herring just to make CDT appear better.

What about more modern post-theoretic decision theories? What about FDT?

In my view the answer is 71%. Expected value hides a lot of information from a probability distribution.

Consider this scenario. You are moving to an experimental colony on Mars where you are forced to start from scratch and give up all your net worth. You have two colonies to choose from: in colony A the average income is $200,000, in colony B it’s $100,000. Both colonies have 100 members.

Which colony is more rational to choose?

Naively you may think you’ll have a better income in colony A, but colony A has much more inequality, so the median income is actually $80,000, and the median income of colony B is $90,000.

So you as an individual have more chances to earn more income in colony B, regardless of the fact that colony A has a higher average.

Expected value hides important information such as inequality. Just because the average income is $200,000 that doesn’t mean that’s what you as an individual should expect. 80% of this population is below the average. Expected value is a misnomer because you as an individual should not expect that.
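
To make this concrete, here is a sketch with made-up incomes. The two vectors are hypothetical and were invented only so that they reproduce the means and medians stated above; `summarise_colony` is likewise an illustrative helper:

```r
# Hypothetical incomes, invented only to match the stated means and medians
colony_a <- c(rep(80000, 80), rep(680000, 20))   # highly unequal
colony_b <- c(rep(90000, 90), rep(190000, 10))   # more even

summarise_colony <- function(x) {
  c(mean = mean(x), median = median(x), share_below_mean = mean(x < mean(x)))
}

print(rbind(A = summarise_colony(colony_a), B = summarise_colony(colony_b)))
```

Many other income distributions would give the same mean and median; this is just one that fits.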

And that’s what’s happening here. A probability of 51% would make the average of one-boxers be greater than the average of two-boxers, but that’s an ensemble average. That doesn’t mean that you as an individual one-boxer are likely to win more money than an individual two-boxer.

Why would I care how much money a team of one-boxers makes? I care about how much money I make.


You don’t call it a tool. You said that it “shows the correct answer to Newcomb’s problem”

For example, they sell scratch-off tickets for $1 and the payoff is $100 at odds of 1.99% (0.0199).

Two people make different choices. One (B) buys a ticket, the other (D) declines. The probability that D nets greater utility than does B is about 98%, and yet B is the rational choice.
This example illustrates that a probability of favorable outcome calculation does not answer the question.
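
A sketch of the ticket example, assuming the ticket wins with probability 0.0199 (about 2%), which is the value consistent with the roughly 98% figure. The names `win_prob`, `net_b` and `net_d` are introduced here:

```r
set.seed(5)
n <- 1e6
win_prob <- 0.0199   # assumed: the value implied by the ~98% figure
price <- 1
payoff <- 100

net_b <- (runif(n) < win_prob) * payoff - price   # buys a ticket
net_d <- rep(0, n)                                # declines

print(mean(net_d > net_b))          # share of cases where D ends up ahead
print(mean(net_b))                  # simulated average net gain for B
print(win_prob * payoff - price)    # analytic expected net gain for B
```

D is ahead in the vast majority of cases, yet the average net gain of B is positive.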

Fair enough, and they do involve it, since a lot of them decide to 1B iff the predictor is perfect. That makes no sense to either of us since damn-near-perfect yields almost the same utility, and utility calculations definitely involve probabilities.

But then they go off on the ‘fate’ line of reasoning and lose sight of the fact that while the predictor rewards those predisposed one-boxers, one is always free to choose to be thus predisposed, even after the predictor has done his thing.

The typical two-boxer rationalization is that this problem rewards “irrationality”. So if I’m a one-boxer that ends up with more money than them, it’s because I’m “irrational”.

I agree that Suny doesn’t argue his position well.

That means if the prediction was one-box, choosing two-box gives you an extra $1 K. Conversely if the prediction was two-box, choosing two-box also gives you an extra $1 K.

That is literally their argument.

What I’m pointing out is that their argument begins with a conditional (if).

That is at least his argument, but it’s not an ‘if’, it’s an if-else, with +1000 on both sides of the conditional. So attacking the ‘if’ seems fallacious.
It can be reworded as ‘no matter what the predictor predicted, choosing 2B gets you an extra 1K’. No ‘if’ in that, and yet that’s his argument.

But I didn’t mention EDT.

I said EDT in reply to your mention of ‘expected utility’, which is EDT thinking.

What about FDT?

Better. FDT works well with the transparent box case. EDT arguably doesn’t.

Did you program that? 1M and 1k in the boxes, predictor probability of accuracy is 70% (less than 71). You say the 2 boxers fare better on average than the one boxers? If not, you’re answering a different question when you conclude 71%.

Why a Mars colony? My first thoughts are that Earth money is toilet paper. A 100 person colony is a commune, with no money, and everybody pitching in for the good of all.
I’m going to ignore all that and take the problem as melted down, in which case the answer is ‘not enough info’. The average or median income has little to do with it. If you’re trying to maximize your income, what utility is each group going to provide for whatever it is you do? That’s the question that should be asked.

Naively you may think you’ll have a better income in colony A, but colony A has much more inequality, so the median income is actually $80,000, and the median income of colony B is $90,000.

Naively you may think that median income is what matters, but looking again at our lottery players B and D above, the median utility of D (0) is better than that of B (-1), but B is still the better choice. So it’s not about median either.

A probability of 51% would make the average of one-boxers be greater than the average of two-boxers, but that’s an ensemble average. That doesn’t mean that you as an individual one-boxer are likely to win more money than an individual two-boxer.

No argument there. The tipping point of the average was the question asked when the answer was 51%.

Another thing that has not come up is utility of $1. You’re dirt broke, struggling to survive. Somebody hands you $200,000 and offers to triple it if you guess a coin flip, forfeiting it all if you lose. Do you accept the risk? Math says to do it. Reality says don’t. You need the first 200k far more than you need the next 400k.
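
One way to sketch that point is to compare expected money with expected utility. Everything about the utility side is an assumption made for illustration: the logarithmic utility function and the `baseline` wealth of $1,000 are not from the scenario itself:

```r
stake <- 200000
baseline <- 1000    # hypothetical wealth before the gift
utility <- log      # assumed concave utility, for illustration only

# Expected money
money_keep <- stake
money_bet  <- 0.5 * (3 * stake) + 0.5 * 0

# Expected utility of final wealth
util_keep <- utility(baseline + stake)
util_bet  <- 0.5 * utility(baseline + 3 * stake) + 0.5 * utility(baseline)

print(c(money_keep = money_keep, money_bet = money_bet))
print(c(util_keep = util_keep, util_bet = util_bet))
```

Under these assumptions the bet has the higher expected money but the lower expected utility. A different utility function or baseline could change the second comparison.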

It’s both. A tool can show you the correct answer.

But that’s a hint that they are doing something irrational. If a formula gives you a singularity — like an infinite result — that’s a clear indication that the formula is wrong.

I’m ignoring Suny’s “argument”. Every two-boxer I’ve debated with (other than Suny) accepts one-boxers get more money.

An if-else is two ifs.

But that statement gets followed by a conditional. It’s not just “no matter what the predictor predicted”, it’s “no matter what the predictor predicted, whether it’s prediction 1 or prediction 2…”.

This is very similar to Simpson’s paradox, where the controversial conclusion begins with “no matter whether you are male or female”, but then proceeds to partition the results with the conditional “if you are male…” and “if you are female…”. But if you don’t partition the results, the result is the inverse.

Do you want me to give you the table of the original paper so you can calculate it yourself?

My approach explicitly avoids expected utility, so EDT is a red herring.

Yes, you already saw (and executed) the program.

You don’t even need a computer program, you can run the algorithm in your head.

If the accuracy is 70%, we can imagine four groups in two categories. For 100 one-boxers we would have 70 with $1,000,000, and 30 with $0. For 100 two-boxers we would have 70 with $1,000, and 30 with $1,001,000.

If you pick one one-boxer at random and one two-boxer at random, the probability of a one-boxer having $1,000,000 is 70% and the probability of a two-boxer having $1,000 is 70%, so together the probability is 0.7*0.7, or 49\%.

If you are not convinced, you can calculate all the other probabilities: $0 with $1,000, $0 with $1,001,000, $1,000,000 with $1,001,000. The probabilities are going to be 0.21, 0.09, and 0.21 respectively. So: 0.21 + 0.09 + 0.21, which is 51\%.

Clearly: 51\% > 49\%.

So yes, the 100 one-boxers are going to have more money: 70*$1,000,000 + 30*$0 than two-boxers: 70*$1,000 + 30*$1,001,000. So on average (total/100) the ensemble is going to have more money.
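
The arithmetic for the 70% case can be laid out as a short sketch. It assumes the one-boxer and the two-boxer are drawn independently, and the names `joint`, `c1_ahead` and `p_c1_ahead` are introduced here:

```r
p <- 0.7
money_c1 <- c(1000000, 0);    prob_c1 <- c(p, 1 - p)   # one-boxer outcomes
money_c2 <- c(1000, 1001000); prob_c2 <- c(p, 1 - p)   # two-boxer outcomes

joint <- outer(prob_c1, prob_c2)             # probability of each pair
c1_ahead <- outer(money_c1, money_c2, ">")   # TRUE where the one-boxer has more

p_c1_ahead <- sum(joint[c1_ahead])
print(p_c1_ahead)        # pairwise: one-boxer ahead
print(1 - p_c1_ahead)    # pairwise: two-boxer ahead

# Ensemble averages
mean_c1 <- sum(money_c1 * prob_c1)
mean_c2 <- sum(money_c2 * prob_c2)
print(c(mean_c1 = mean_c1, mean_c2 = mean_c2))
```

The pairwise comparison and the comparison of averages point in opposite directions at this accuracy.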

That doesn’t mean that you as an individual random one-boxer are going to have more money than another individual random two-boxer.

You can use bottle caps for all I care. In all societies it’s clear some individuals contribute more than others. If you don’t think a commune is going to recognize that, I don’t think you are being realistic.

And you are intentionally avoiding the point.

Probability distributions have properties, the most common ones are the mean and the variance, but the third most important moment is skewness, which a lot of people ignore.

A classical way to conceptualize a skewed distribution is to imagine a room of 100 Silicon Valley professionals who on average have an income of $100 K. Elon Musk enters the room, and now on average their income is $2,000,000 K.

Is that a fair representation of the “average person” in the room?

If my Mars example doesn’t satisfy you, I can construct countless synthetic examples that demonstrate the point. But you can also look at any country on Earth and verify that the mean income and the median income are not the same… Anywhere on Earth.

That is a modeling problem. It’s the same thing as accepting a game of Russian roulette because you don’t lose any money if you lose the game.

Once you add into the model the actual variables of interest for you, the correct answer is trivial.

I do too but it doesn’t surprise me you don’t realize that.

Funny to mention Simpson’s paradox, which is another example of how simply looking at probabilities does not necessarily tell you relevant causal information.

It’s amazing you manage to make your analysis without once taking the money into account (like it seems having $70,000 instead of $1,000, for example, wouldn’t change a thing). But maybe you simplified it. Tell me, why do you take the $1,000,000 and $1,000 to be the ones which should be higher than 50%? Why this pair?

I don’t get the logic. Also, it’s not just that it is the highest probability compared to the other pairs, but it must be greater than 0.5, it seems.

I am interested, as you are the first person to ever propose a percentage other than 51%.

You say:

But, that’s just the expected value and the random one-boxer has more money on average when the accuracy is greater than 51%.

You said several times that two-boxers get more money than one-boxers.

I would go look at the quotes, but I don’t think it would matter.

Simpson’s paradox is all about probabilities. If you don’t understand that, you don’t understand Simpson’s paradox.

That is by definition. @noAxioms stated if the accuracy of the predictor was 70%, so:

  1. The accuracy of the predictor is 70%
  2. If the predictor accurately predicts that a two-boxer would pick two-boxes, the reward is $1,000

According to the formulation of Newcomb’s problem; the probability that a two-boxer gets $1,000 is 70%. By definition.

As I already explained, “expected value” is a misnomer. The value an individual one-boxer should expect is not the “expected value”.

You didn’t get the question. But I think I get it now; I think you took the pairs where the one-boxer gets more than the two-boxer. There is only one: $1,000,000 and $1,000.

So, if I understand correctly, if instead of $1,000 we had $999,000, your analysis would still say that from 71% onwards, it becomes more rational to one-box?


I already explained that in multiple comments in the other thread. You asked why $1,000,000 and $1,000, and I already explained: if p=0.7 then the probability of getting $1,000 for a two-boxer is 70%, and the probability of getting $1,000,000 for a one-boxer is 70%. The probability of both is 49\%, which is below 50\%, so a random one-boxer is more likely than not to end up with less money than a random two-boxer.

It’s not my analysis, it’s a mathematical fact that overall a random one-boxer would get more money than a random two-boxer.

And yes, I already stated that $1,000 is a distraction, and the same goes for $999,000. If the base money is $999,000, the cost of the mistake may be less, but it’s still a mistake.

But should it affect the accuracy of when you should switch to one-boxing?

No. An expected value of $1,989,000 is not real, nobody obtains that amount. That’s why the average person doesn’t exist, not in reality. Two-boxers would obtain either $999,000 or $1,999,000. Nothing in-between.

That’s why for you as an individual the average doesn’t matter.

Why would it matter for you that the average is $1,989,000 if you obtained $999,000? One is a reality, the other is a fictional concept.

It can indeed, but not if the wrong tool is used, as it is in this case. A coin flip can also show a correct answer, but it’s not the correct tool.

But that’s a hint that they are doing something irrational. If a formula gives you a singularity — like an infinite result — that’s a clear indication that the formula is wrong.

I don’t think the typical two-boxer is using any formula at all. The reason for switching to 1B if the predictor is perfect is somehow rationalized as having no choice. What, you lack the free will to be stupid? You want to be stupid but determinism compels you to make the better choice? Funny argument. But anyway, singularities don’t seem to come into play with that reasoning.

An if-else is two ifs.

I beg to differ. An if-else would be compiled into a single conditional machine instruction (e.g. bgt), not two. But an optimizing compiler would have optimized the algorithm you give with no condition at all. The choice is not a function of the prediction at all.

So a single conditional if-else
if (x) do Y; else do Y
optimizes into unconditional
do Y

No it isn’t. There are zero conditionals in the statement of mine you quoted. See the optimization example just above.

Funny, I drove at this point with @Suny as well (when discussing the friend that could see the box contents and advise) and he just couldn’t see it either. The friend advice was to 2B no matter what he saw. That’s unconditional advice. No need to have bothered looking.

I did not. You showed a program that shows the odds of a single given one-boxer faring better than a single given two boxer. That is a different question, and yes, the break-even point is about 71%.

It does not answer my question. I put back the context above: “Given the million and the thousand, how good must the predictor be (what’s his hit rate) in order for 1B to be the more rational choice?”

Your program totally fails there. It returns 98%. That’s not the correct answer. I modified the program to have p=70%, and it returned 49%, meaning a given 2B typically does better than a given 1B, but 2B is still the wrong choice.
Original program, but change the two 1000000 references to 1001 instead (potential $1001 in the opaque box). It still returns 98% (correct, but wrong question), which suggests that 1B is the correct choice when it most definitely is not.

This is what you get when asking the wrong question. Your program doesn’t lie, but you also don’t get the answer to the correct question.
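
A sketch of that experiment, with the opaque-box amount pulled out into a single parameter. The helper `run_case` and its argument `big` are names introduced here; the rest mirrors the original program:

```r
set.seed(9)
n <- 1e6
p <- 0.99
small <- 1000

run_case <- function(big) {
  money_c1 <- (runif(n) < p) * big
  money_c2 <- (1 - (runif(n) < p)) * big + small
  c(big = big,
    pairwise = mean(money_c1 > money_c2),   # share of pairs where C1 is ahead
    mean_c1 = mean(money_c1),
    mean_c2 = mean(money_c2))
}

results <- rbind(run_case(1000000), run_case(1001))
print(results)
```

The pairwise column barely moves between the two rows, while the ordering of the two averages flips.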

Nit: Poor coding that I had to change 1000000 twice. It should be a constant defined in one place. That’s just the software reviewer bone waving its hand.

You don’t even need a computer program, you can run the algorithm in your head.

If you pick one one-boxer at random and one two-boxer at random, the probability of a one-boxer having $1,000,000 is 70% and the probability of a two-boxer having $1,000 is 70%, so together the probability is 0.7*0.7, or 49%.

Sure, but it’s still the wrong question, made all the more clear by the fact that changing the million to 1001 has no effect on the answer.

So yes, the 100 one-boxers are going to have more money: 70*$1,000,000 + 30*$0 than two-boxers: 70*$1,000 + 30*$1,001,000. So on average (total/100) the ensemble is going to have more money.

But the program didn’t compute that at all. At no point were all the winnings added up.

And you are intentionally avoiding the point.

I set aside the whole commune/Mars thing and took the problem at face value. I disagreed with your point, which seemed to be that a median calculation yielded the correct choice over an average calculation. Both are demonstrably wrong, as I demonstrated.

Your colony example amounts to a job interview, where you get to ask ‘what would I get paid to join you?’. You probably start at the bottom and work up, so a one number answer is an incomplete one. But few job applicants ask about average, median income, and income skew.

I’m quite aware that average rarely equals median in any list.

It’s because he’s asking a different question than ‘should I 2B?’.

Agree, but can you tell me what ‘expected value’ is then, that is (as we both agree) not the value an individual one-boxer should expect?
I ask because most of us find it important. It’s important in a lottery for instance, where expected value can be calculated (but is poorly calculated by anybody who plays the game), as opposed to the value an individual lottery ticket should expect to be worth, which is of course nada. But despite that expectation, people buy those tickets. In my example, that’s actually the rational choice, but not according to the way you spin it.

Excellent, thank you. You see it as well.

Actually, it ceases to be a mistake. A program that answers the correct question would show this.

The “wrong” tool can still give you the correct answer.

This is just a loaded language fallacy. You can call the tool “wrong”, but by what metric? If the metric is matching the long-run frequency of meeting a person of female sex in random encounters, then a coin flip can be as accurate as anything else.

This is just rhetorical dismissal of the form of “your prediction turned out to be correct, but for the wrong reasons”. Yes you can argue the tool is “wrong”, but if it gives the right answers, that’s an opinion, not an objective claim.

Their reasoning can be put into a formula:

  • E(C1) = x * 1000000 + 0
  • E(C2) = x * 1000000 + 1000

Therefore E(C2) = E(C1) + 1000. This is the wrong formula (objectively), but it’s a formula.

Yes, because the computer assumes the instruction after the conditional is going to be executed after.

This is a nonsensical question in probability theory that is the source of many “paradoxes”. You are basically saying the probability of getting an ace in a deck of cards is 100%, of course, after making sure you didn’t get a card that isn’t an ace in the first place.

That’s why probabilists resort to probability trees. The probability of landing in any given node of level x relies on the probabilities of the branches before that level.
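As a rough illustration of the ace example in R (the variable names are made up for this sketch), the unconditional probability and the "probability" computed after filtering out every non-ace are very different numbers, and a node in a probability tree is the product of the branches leading to it:

```r
# A standard 52-card deck reduced to the one property we care about.
deck <- rep(c("ace", "other"), times = c(4, 48))

# Unconditional probability of drawing an ace: 4/52
p_ace <- mean(deck == "ace")

# "Probability" of an ace after discarding every non-ace first: 1
aces_only <- deck[deck == "ace"]
p_ace_given_ace <- mean(aces_only == "ace")

# Tree node: two aces in a row without replacement is the product
# of the branch probabilities along the path.
p_two_aces <- (4 / 52) * (3 / 51)
```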

Because your statement is incomplete.

That is absolute nonsense. If my friend is blind, then his “advice” is meaningless. He has no information that I don’t.

To try to exemplify this nonsense, let’s consider this example. You are driving towards a town and you have two options: if you take road A, you arrive at the town from the north, if you take road B, you arrive at the town from the south. Regardless of which road you take, you are still going to reach the town, so you don’t need to consider which road to take. Except if you don’t consider any road, you fall into a ravine and you die.

So “the condition doesn’t matter” and “all the available conditions lead to the same result” are two very distinct claims.

I can only choose to be a one-boxer or a two-boxer. What else can I choose?

The rational choice for who? For me? For a philosopher? For an economist? For a treasurer trying to minimize expenses?

The rational choice for me is very obvious.

That’s the loaded language fallacy. You label the choice as “wrong”, even though the two-boxer does in fact get more money.

Why isn’t it correct? I get $1,001, Sunny gets $1,000. $1,001 is more than $1,000, isn’t it?

I use 1e6 in my actual code. The only reason I use 1000000 in the pasted code is to make clear that is big money for psychological purposes.

Right. But it’s not the “wrong” question objectively. It’s the “wrong” question because subjectively you personally don’t like the result. Loaded language.

Not true: mean(x) adds all the elements in vector x and then divides them by the length of the vector. So…

mean(money_c1) > mean(money_c2)

Did compare all the winnings.

And you get to define the “right” answer as $990,000 even though no one gets that amount. And $1,000,000 is the “wrong” answer, even though that’s precisely what you would get. Right?

That’s why the phrase “there’s no such thing as the average person” is so true.

No, that’s precisely the question I’m asking.

I’m asking if Sunny should choose two-box.

You on the other hand are asking if we had a hypothetical infinite number of Sunnys and we added all their winnings and we divided that amount by the total number of hypothetical Sunnys, how much would that amount be? And then if we do the same for a hypothetical infinite number of one-boxers, and compare the amounts, would the amount for two-boxers be bigger?

My question is real, your question is completely imaginary.

I’ve provided that numerous times: 10^6 \cdot p.

No it is not. I get $1,000,000, Sunny gets $999,000. I get more, he gets less. Case closed.


I’m struggling to understand in what universe you think you as an individual would obtain the expected value. If p=99\% how can you as a one-boxer ever obtain $990,000?

By the metric of answering the question ‘which should I choose?’. The tool you provide prints a percentage. It doesn’t answer the primary question.

But I didn’t say he was blind. He knows what’s in the other box, and thus has information that you don’t. But his advice, not being conditional on the box contents, doesn’t convey that extra information.
The friend is asserted to be ‘rational’, and so doesn’t recommend say walking out or some other third option.

I can only choose to be a one-boxer or a two-boxer. What else can I choose?
The rational choice for me is very obvious.

I didn’t suggest there was a third choice. I’m saying your ‘tool’ doesn’t tell you what to choose. The rational choice for you may be ‘obvious’, but the tool isn’t what makes it obvious. On a side note, the rational choice for Sunny is also ‘obvious’, yet he chooses differently, so what is the obvious choice isn’t necessarily obvious.

You declaring the two-boxer to get more money is a loaded language fallacy. What he gets depends mostly on if the predictor got him wrong (30% odds). 70% of the time he gets 1000.

My labeling of best choice is based on expected utility (before the outcome), which is 700000 vs 301000 respectively. Yes, if you define ‘expected utility’ as ‘the amount you’ll most likely take home’, then expected utility is 1000000 and 1000 respectively. Under both definitions, 2B is the wrong choice.

But your choice seems to be based not on utility, but on odds that the predictor guesses two different people correctly, despite the fact that you’re not two people. Yes, odds are, with two people and 70% accuracy, that the predictor will get at least one of them wrong, which is the only thing your tool indicates.

I am really questioning your mathematics literacy. Honestly, you don’t see it?

I use 1e6 in my actual code.

That’s the iteration count. All the payouts are hardcoded twice instead of defined once up front. I’m complaining about the multiple writing of the constant, not about the notation you use to specify it.
As I said, it’s not a bug, but it’s the kind of thing that would draw a comment in a code review. It makes it more error prone when one wants to experiment by changing the values.

It’s taking the mean of a conditional (a truth value), which is always 0 or 1. At no point is mean(money_c1) compared with mean(money_c2). At no point is the value of any money passed to a mean function. Only the value of a truth statement.

This is why changing the values from 1M/1K to 1001/1000 has no effect at all on the output.
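The program being discussed is not reproduced here, so the following is only a sketch under assumptions: that it drew predictions with `runif(n) < p` and reported `mean(money_c1 > money_c2)`. The function `simulate`, its arguments, and p = 0.99 are illustrative. It defines each payout once and reports both statistics side by side:

```r
# Hypothetical simulation; names and defaults are assumptions, not the original code.
simulate <- function(big, small = 1000, p = 0.99, n = 1e5, seed = 1) {
  set.seed(seed)                       # same draws for every call
  ok1 <- runif(n) < p                  # predictor right about each one-boxer
  ok2 <- runif(n) < p                  # predictor right about each two-boxer
  money_c1 <- ifelse(ok1, big, 0)
  money_c2 <- ifelse(ok2, small, big + small)
  c(prob_c1_ahead = mean(money_c1 > money_c2),  # mean of a logical vector
    mean_c1 = mean(money_c1),                   # mean of the winnings
    mean_c2 = mean(money_c2))
}

simulate(big = 1e6)
simulate(big = 1001)
```

With the seed fixed, `prob_c1_ahead` is identical in both calls, while the ordering of `mean_c1` and `mean_c2` reverses when the large payout drops to 1001.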

You on the other hand are asking if we had a hypothetical infinite number of Sunnys and we added all their winnings and we divided that amount by the total number of hypothetical Sunnys, how much would that amount be?

Infinite? Does the answer change significantly if I go way beyond 1e6 like that? Anyway, yes, that’s an expected utility calculation, and most people don’t go so far as to calculate it exactly, but it drives the choice, and not the ‘odds of beating by a few cents a hypothetical specific competitor’.

And then if we do the same for a hypothetical infinite number of one-boxers, and compare the amounts, would the amount for two-boxers be bigger?

With 1001,1000 in the two boxes and 99% prediction accuracy, yes it would be bigger for the 2B. Your ‘tool’ fails to show that, and I didn’t need to write a simulation (or even get out a pencil) to see it.
At 75% accuracy, 2B strategy would be a lot bigger.
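For reference, the expected-utility arithmetic behind these figures can be sketched in R as follows. The helper `expected_money` is a name introduced for this sketch, and it assumes the standard payoff table (a one-boxer gets `big` when predicted correctly, a two-boxer gets `small` when predicted correctly and `big + small` otherwise):

```r
# Expected winnings for each strategy given predictor accuracy p.
expected_money <- function(p, big, small) {
  c(one_box = p * big,
    two_box = p * small + (1 - p) * (big + small))
}

expected_money(0.99, big = 1001, small = 1000)  # one_box 990.99, two_box 1010.01
expected_money(0.75, big = 1001, small = 1000)  # one_box 750.75, two_box 1250.25
expected_money(0.70, big = 1e6,  small = 1000)  # one_box 700000, two_box 301000
```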

I’ve provided that numerous times: 10^6 \cdot p.

No it is not. I get $1,000,000, Sunny gets $999,000. I get more, he gets less. Case closed.

You didn’t work p into it, and yet you did just above. Now why is that? Case not so closed.

So you’re saying that if somebody has 100 raffle tickets, one of which wins a million, and sells them for $1 each to different people, that you’d find the rational choice to be to decline the purchase of one since you have a 98% probability of being $1 ahead of specifically Sunny, who buys one of them.

I’m struggling to understand in what universe you think you as an individual would obtain the expected value.

I never said that. No lottery player expects 0.20 back on his $1 ticket. It’s a million or nada, or maybe some lesser prize like 5000, but not 0.20, even if that’s the expected value of the ticket.

20% is about the average return for minor lottery tickets. It gets lower (12-15%) with the big ones like powerball and such, but very few people know how to compute it correctly.
Newcomb is funny because negative utility is arguably impossible.

Ipse dixit. By what metric does E[C1] > E[C2] answer the question in a way that P(M_{C1} > M_{C2}) doesn’t?

You are just calling your opinion “the metric”.

You said “no need to have bothered looking”, if he doesn’t look, he has no extra information, he has to look in order to obtain the information, and the information is the amount of money in the mystery box, which by definition is conditional on the prediction.

And you get to decide what is the “right” tool that tells you that entirely based on your opinion.

Which you get to label arbitrarily based entirely on your opinion.

  1. What is the probability that you get 1000000?
  2. What is the probability that you get 1000?

There’s nothing to see. It is a fact that 1001 > 1000.

No. You didn’t read what I said.

Yes, expected utility assumes the perfect average as n \to \infty.

No. An individual two-boxer wouldn’t get more money, only the ensemble of infinite hypothetical two-boxers (plural). Not “the two-boxer”.

If I need that $1 to survive, yes. 99 times out of 100 I would end up dead. The only reason you think otherwise is that you constructed the problem precisely to make $1 psychologically negligible.

Multiply both quantities by 1e6, so the ticket costs $1 million, and the prize is $1 trillion. According to your expected utility I should buy the ticket no matter what, but I don’t have $1 million to throw around, I would have to sell some assets and ask for a loan. 99 times out of 100 I would end up with nothing except debt.

And that’s why the lottery ticket prices are a couple of bucks and not thousands: people fully expect to lose a trivial amount of money in each ticket.

Expected utility tells you that the bet is rational if you are prepared to lose that money, but the probability that you are going to lose doesn’t change.
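A small R sketch of the raffle as described in the thread (100 tickets, one winner, $1 per ticket, $1,000,000 prize). The function `raffle` and its `scale` argument are illustrative names. It shows that multiplying price and prize by the same factor changes the expected net gain but leaves the probability of losing untouched:

```r
# Expected net gain and probability of losing for a single ticket.
# `scale` multiplies both the ticket price and the prize.
raffle <- function(scale = 1, n_tickets = 100, price = 1, prize = 1e6) {
  p_win <- 1 / n_tickets
  c(expected_net = p_win * prize * scale - price * scale,
    p_lose = 1 - p_win)
}

raffle()             # expected_net about 9999, p_lose 0.99
raffle(scale = 1e6)  # expected_net about 9.999e9, p_lose still 0.99
```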

I’m merely repeating earlier posts, and just giving a reminder that the problem isn’t solvable through probabilistic analysis, because the problem consists of three assumptions that are simultaneously inconsistent. To get a solution first requires sacrificing one of the following assumptions:

  1. An opaque box exists with a definite hidden state that is predetermined.

  2. A choice exists to open the opaque box.

  3. The box’s contents depend on whether or not the box is opened.

Assumption 3 is just a rewording of the infallible predictor assumption: contrary to what a statistician might think, an infallible predictor isn’t merely almost surely correct, but surely correct with no exceptions.

1+2 implies 3 is false: the presence of choice + predetermination of box contents entails that the predictor cannot be infallible, contrary to assumption. In which case, the predictor is at most almost surely correct, entailing that the two-boxing strategy is winning, even if the predictor correctly guessed the result of similar trials in the past with probability 1. This is because choices refer to causal interventions, and causal interventions generally render inferences based on repeated trial analysis invalid.

(frequencies concern outcomes, not choices)

1+3 implies 2 is false: If both the past and future are predetermined, then there cannot exist a choice to open the box or not, contrary to the problem’s description of a player presented with a choice. In which case there doesn’t exist a “winning” strategy, because the player’s strategy is in effect decided by the infallible predictor rather than the player. Repeated trial analysis is now relevant for computing the player’s expected return, if an additional assumption is made as to how often the predictor ‘chooses’ what the player will do.

2+3 implies 1 is false: the hidden state of the opaque box cannot be predetermined if a choice exists and it affects the state of the box; in which case the infallible predictor is ‘cheating’ by switching the box contents depending on the player’s choice, in a manner that is mathematically describable in terms of Wheeler’s delayed choice experiment, contrary to the assumption that the oracle doesn’t ‘cheat’.

This has nothing to do with “the box’s contents depend on whether or not the box is opened”.

In Newcomb’s problem the choice is between a) one-box, or b) two-box. That’s it. Whether “the box” is opened or not is not part of the choice.

If the “mysterious” box is transparent and I see it’s empty, I can still choose one-box, in which case the predictor was wrong, rendering him fallible. So your conclusion is incorrect.

This is just a semantic trick because you are playing with the word “choice”. Choice doesn’t entail an ability to choose otherwise. I can program a computer to make a deterministic choice based on certain input: given the same input, it would always make the same choice. But because the input can change, the choice can change. Selecting between different options is a choice.
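A toy R sketch of such a deterministic chooser. Everything here is hypothetical: the function `choose_boxes`, its input `trust_in_predictor`, and the 0.5 threshold are invented purely to illustrate the idea:

```r
# Hypothetical deterministic chooser: the same input always selects
# the same option, and a different input can select a different one.
choose_boxes <- function(trust_in_predictor) {
  if (trust_in_predictor >= 0.5) "one-box" else "two-box"
}

choose_boxes(0.9)  # "one-box"
choose_boxes(0.9)  # "one-box" again, every time
choose_boxes(0.1)  # "two-box"
```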

You personally don’t like the idea of deterministic choices, so you say that’s not a “choice”. You can claim it’s not a “true choice” all you want; it’s still a choice.

Your claim that “2+3 implies 1 is false” fails for the same reason.

And your claim that “1+2 implies 3 is false” doesn’t follow, because the choice is not to open “the box”.


Your framing has nothing to do with Newcomb’s problem.

If you wish, sure, my opinion mirrors the majority ‘opinion’, which is that problems such as Monty Hall and Newcomb are considered by people maximizing expected utility. Your opinion apparently differs.

You said “no need to have bothered looking”, if he doesn’t look, he has no extra information, he has to look in order to obtain the information, and the information is the amount of money in the mystery box, which by definition is conditional on the prediction.

I never said otherwise. What I said was that his advice is not a function of this obtained knowledge. Hence this knowledge serves no apparent purpose.

No, the label is standard, not opinion. Wiki for instance states:
“The expected utility hypothesis states an agent chooses between risky prospects by comparing expected utility values (i.e., the weighted sum of adding the respective utility values of payoffs multiplied by their probabilities).”

You seem to define the term with your own private definition, one mirroring your opinion as to the correct way to approach the situation. Using a standard term in a non-standard way has led to an awful lot of us talking past each other.

Expected utility tells you that the bet is rational if you are prepared to lose that money, but the probability that you are going to lose doesn’t change.

Oh, now you change your definition back to the standard one, which indicates that you know what it means all along.

  1. What is the probability that you get 1000?

Zero, presuming the original numbers.
99% presuming p=99% and potential 1001 in the opaque box.

That prints ‘TRUE’, but ‘FALSE’ if 1000000 is changed to 1001.
By your ‘opinion’, 1001 is > 1000, so this program is the wrong tool.

If I need that $1 to survive.

Dodging the question.
  1. The Newcomb scenario never states that the player will not survive without going home with at least 1000.
  2. Choosing 2B gets you guaranteed 999999 (and maybe double that), and yet you argue 1B because 1000000 is bigger, not ‘risking that one additional dollar’ and adding a real chance of getting nothing.

So an appeal to popularity fallacy. I began explaining that in the Monty Hall problem the majority was wrong.

But it is a function of this knowledge. If your friend ignores this knowledge, then his advice is functionally equivalent to that of a blind person with no knowledge.

You cannot eat your cake and have it too.

Do you understand what the word “hypothesis” means? If you actually read Wikipedia’s article you’ll find that 20% of it is devoted to criticism of the expected utility hypothesis, in particular the St. Petersburg paradox.

So no, it’s not as trivially obvious as you wish it to be.

I’ve no idea what made you believe otherwise.

What “original numbers” are you talking about? The probability that you get $1,000 given that you chose two-box is explicitly stated in Newcomb’s problem’s formulation.

That is correct. But you don’t seem to understand that a program can give you multiple results. It is up to you to decide what to do with that information.

And you are dodging the very well-known challenges to the simplistic expected utility hypothesis.

What’s the point in debating if you are just going to ignore the challenges? I contended with your hypothetical scenario, you are not contending with mine.

You did not address Simpson’s paradox, which shows your intuition is wrong, as I mentioned in the essay and my comments:

So I’ll give you the code which shows this:

counts <- array(
  data = c(4, 3, 8, 5, 2, 3, 12, 15),
  dim = c(2, 2, 2),
  dimnames = list(
    outcome = c('A', '!A'),
    treatment = c('B', '!B'),
    group = c('C', '!C')
  )
)

# Difference P(A|B) - P(A|!B) for a 2x2 outcome-by-treatment table m
assoc <- function(m) m['A','B'] / sum(m[,'B']) - m['A','!B'] / sum(m[,'!B'])

# C
assoc(counts[,,'C'])

# !C
assoc(counts[,,'!C'])

# C+!C
assoc(counts[,,'C'] + counts[,,'!C'])

This code generates the following:

Group        P(A|B) - P(A|\bar{B})
C            -0.044
\bar{C}      -0.044
C+\bar{C}    0

You can claim that P(A|B) < P(A|\bar{B}) is “not a function” of C because when C is true the statement is true, and when C is false the statement is also true. So C “doesn’t matter”.

That is:

  • P(A|B,C) < P(A|\bar{B},C)
  • P(A|B,\bar{C}) < P(A|\bar{B},\bar{C})

But Simpson’s paradox shows that when you ignore C, the result can flip:

  • P(A|B) \geq P(A|\bar{B})

The mistake is inferring from “holds for all values of C” to “holds when C is ignored”. Ignoring C is not the same as universally quantifying over it — pooling changes the weights each subgroup contributes, and those weights can reverse the result.
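To make the role of the weights concrete, here is a sketch in R using the same `counts` array as above (repeated so the snippet runs on its own). The helpers `p_a` and `w` are names introduced for this illustration. The pooled P(A|B) is the average of the per-group values weighted by P(C|B), and those weights differ between B and !B:

```r
counts <- array(
  data = c(4, 3, 8, 5, 2, 3, 12, 15),
  dim = c(2, 2, 2),
  dimnames = list(
    outcome = c('A', '!A'),
    treatment = c('B', '!B'),
    group = c('C', '!C')
  )
)

# P(A | treatment, group) for each group
p_a <- function(trt) {
  sapply(c('C', '!C'), function(g) counts['A', trt, g] / sum(counts[, trt, g]))
}

# P(group | treatment): the weight each group contributes when pooling
w <- function(trt) {
  n <- colSums(counts[, trt, ])
  n / sum(n)
}

w('B')                     # C: 7/12,  !C: 5/12
w('!B')                    # C: 13/40, !C: 27/40
sum(w('B')  * p_a('B'))    # pooled P(A|B)  = 0.5
sum(w('!B') * p_a('!B'))   # pooled P(A|!B) = 0.5
```

Within each group B does slightly worse than !B, but B draws most of its cases from the group with the higher success rate, which is enough to erase the gap in the pooled figures.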

This feels unintuitive — that’s why people call it a “paradox” — but it’s a mathematical fact.

You have not contended with this fact at all.