How to test rationality? The case of argument evaluation

Suny · March 13, 2026, 6:08pm

In a lot of irrelevant hypothetical companies that have nothing to do with Dale’s written argument, sure.

CrankySalamander · March 13, 2026, 6:11pm

The most rational answer to the question is there is insufficient information to fully judge the strength of Dale’s argument. It’s certainly not airtight, but contextually it could have some weight to it.

Suny · March 13, 2026, 6:13pm

My sibling in Christ, Dale is an outside observer, it’s a government project, Dale isn’t in charge of anything.

It’s certainly not airtight, but contextually it could have some weight to it.

And by “contextually” here, you mean if it was a completely different argument

CrankySalamander · March 13, 2026, 6:19pm

I haven’t seen the test, but unless there are instructions that state “assume Dale is a disinterested observer” that’s an unwarranted assumption.

Suny · March 13, 2026, 6:29pm

Right, just like it would be unwarranted to consider that Dale would be traumatized by the cancellation of the project.

CrankySalamander · March 13, 2026, 6:55pm

Which is why I said the most rational answer is that there is too little information to properly evaluate the strength of Dale’s rebuttal. One measure of rationality of an argument is textual (how well does the reasoner marshal facts and logic?) and one measure of rationality is contextual (what is the purpose of the argument and how well is it calculated to achieve its aim?). The second question is actually more grounded and practical than the first.

Suny · March 13, 2026, 7:06pm

Which is why I said the most rational answer is that there is too little information to properly evaluate the strength of Dale’s rebuttal.

But that’s not true. No amount of context will change a bad argument into a good one. And we have everything we need here, i.e. the argument.

The second question is actually more grounded and practical than the first.

Not really. Depends on what you want. If you want to know the truth, it’s good to know when an argument works in terms of logic. If you just want to persuade an audience or “win”, then you read The Art of Being Right: 38 Ways to Win an Argument and use fallacies, or you lie, or whatever. That isn’t what we are interested in here.

Delirium · March 14, 2026, 1:01am

Perhaps the following 2 questions will provide some clarification.
Solve for x
a) 2 + 2 = x
b) 3h = x where h = the length of a handspan

The answer to question a is 4, but what is the answer to question b?

Elephant · March 14, 2026, 1:17am

I just told you, no one gave it a 3 or more, so I disagree with your conjecture that someone scored it higher. Yes, if an expert gave it a 3, I would have a problem with it. They’re not called experts for nothing.
It’s not the mean, median, or average – it’s beta weighting.

Suny · March 14, 2026, 5:29am

from the paper:

The analysis of performance on the AET required that the participants’ evaluations of argument quality be compared to some objective standard. We used a summary measure of eight experts’ evaluations of these rebuttals as an operationally defined standard of argument quality. […]. Thus, for the purposes of the regression analyses described below, the median of the eight experts’ judgments of the rebuttal quality served as the objective index of argument quality for each item.

Suny · March 14, 2026, 5:41am

That’s an incomplete problem, I agree. The ‘context’ they want here would be saying that there is an invisible +2 somewhere in the equation. That Dale could be personally linked to the project or that Dale may be traumatized by the cancellation of the project doesn’t matter as the argument in front of our eyes isn’t about any of that.

Also the answer is 3h.

Delirium · March 14, 2026, 8:05am

One claim CART designers make is intelligence ≠ rationality. IQ tests assess intelligence and CART assesses rationality.

Yes, the statistical parameter CART uses can be simplified to the median. I wonder why not mode. For ordinal categorical data, mode is preferred. Because CART designers have mapped their ordinal categories to numbers we see answers like 1.5. If they had not, we’d have clear answers like very weak or strong based on the mode.

The Dale question is difficult to answer. My response would be that Dale’s critics commit an error and Dale’s response to his critics is also an error.

Suny · March 14, 2026, 8:24am

Couldn’t you have multiple modes, so that you’d have to take the average of them too?

The median might even be better because if you have an odd number of ratings then you always have a whole rating as the median (the middle one).

Here they have .5 sometimes because they have an even number (8) of experts.
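
A quick way to check the odd/even point is a minimal Python sketch using the standard library's `statistics.median`. The two rating lists are made up for illustration and are not the experts' actual ratings.

```python
from statistics import median

# Hypothetical ratings on a numeric scale, not taken from the paper.
seven_experts = [1, 2, 2, 2, 3, 3, 4]
eight_experts = [1, 2, 2, 2, 3, 3, 3, 4]

# Odd count: the median is the single middle value of the sorted list.
odd_median = median(seven_experts)

# Even count: the median is the mean of the two middle values,
# so it can land halfway between two ratings.
even_median = median(eight_experts)

print(odd_median)   # 2
print(even_median)  # 2.5
```

With an even count the result is only a half value when the two middle ratings differ; if they are equal, the median is a whole rating again.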

For question (23)? What’s the critic’s error? It seems reasonable to me.

Delirium · March 14, 2026, 8:38am

I doubt if multiple modes make sense. Think of the normal distribution, it’s unimodal.

The critics say that the project will cost x billion dollars, such that x > 2 billion dollars. The savings are only 1.5 billion dollars. I have very little knowledge of economics/how money works, but there’s something off about this.

We could also group the experts’ answers. Weak and very weak are both weak.

Suny · March 14, 2026, 8:44am

I doubt if multiple modes make sense. Think of the normal distribution, it’s unimodal.

Yeah but some distributions can be plurimodal. If the experts’ ratings are:

1, 2, 2, 2, 3, 3, 3, 4

Then both 2 and 3 are modes.
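
To see the tie concretely, here is a small Python sketch (Python 3.8 or later) that runs the list above through `statistics.multimode`, which returns every value tied for most frequent. The ratings are the illustrative ones from this post, not real expert data.

```python
from statistics import median, multimode

ratings = [1, 2, 2, 2, 3, 3, 3, 4]  # the illustrative list from this post

modes = multimode(ratings)  # all values that share the highest count
middle = median(ratings)    # mean of the two middle values (2 and 3)

print(modes)   # [2, 3]
print(middle)  # 2.5
```

So for this list neither statistic gives a single whole rating: the mode is split between 2 and 3, and the median falls between them.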

I have very little knowledge of economics/how money works, but there’s something off about this.

It’s like a cost benefit. The continuation of the project if done will give a benefit of $1.5 billion but the cost of the continuation is $2 billion.

It’s like buying an object to resell it. When you sell that object, you get $10. That’s great, but it wouldn’t be a good decision to buy and sell if the object costs $15. You’ll just lose $5 overall.
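
The arithmetic can be sketched in a few lines of Python. The helper name `net_gain` is made up for this example, and the figures are the ones quoted in this post ($10 and $15 for the object, $1.5 billion and $2 billion for the project).

```python
def net_gain(benefit, cost):
    """Return benefit minus cost; a negative result is an overall loss."""
    return benefit - cost

# Buy-and-sell example: sell for $10 something that costs $15.
resale = net_gain(benefit=10, cost=15)

# Project example, in billions of dollars, using the figures in this post.
project = net_gain(benefit=1.5, cost=2.0)

print(resale)   # -5
print(project)  # -0.5
```

Both results are negative, which is the sense in which continuing loses money on these numbers alone.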

We could also group the experts’ answers. Weak and very weak are both weak.

Yeah. I think the current choices are ambiguous. It’s not always clear what distinguishes Weak from Very Weak so instead if I had to design a test, it would just be a “Yes/No”, “Weak/Strong” kind of thing.

Delirium · March 14, 2026, 8:52am

It’s best to consult someone who’s a businessperson. How do they recoup money spent on advertising and R&D? Who ultimately “pays the price” and how? I’m the wrong person here.

I meant to highlight the versatility of CART when I said we could group the experts’ responses. It’s got a vote vibe (the winner is the one with the maximum votes, the rest lose) to it. The mode is as resistant to outliers as the median. “Plurimodal” distributions can occur e.g. in the lengths of sexually dimorphic fish.

Suny · March 14, 2026, 8:57am

Well it’s usually compensated by people buying the product.

I mean, if you group the experts’ opinions then sure, you’ll have only 2 categories and the mode is equivalent to the median, but I don’t know what you mean by “we could group” other than changing the test.

Delirium · March 14, 2026, 9:59am

As I said, I can’t form an opinion on this issue, but we are to assess Dale’s response to the opposition. I believe you said that it’s the sunk cost fallacy.

Is it not clear what I meant by grouping? Is weak not weak and is very weak also not weak? It’s like how light yellow and dark yellow are both yellow.

Suny · March 14, 2026, 10:06am

The idea behind grouping is clear. What isn’t clear is what is actually done when “you group”, other than simply treating the test as having two answers instead of four. If that is what is done, then yeah, mode = median, but when I said the median is better I obviously meant for the four-answer test.
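
For what the two-answer version would look like, here is a rough Python sketch. The numeric coding (1 = very weak, 2 = weak, 3 = strong, 4 = very strong), the `group` helper, and both rating lists are assumptions for illustration, not anything taken from the test itself.

```python
from statistics import median, multimode

def group(rating):
    # Assumed coding: 1 = very weak, 2 = weak, 3 = strong, 4 = very strong.
    return "weak" if rating <= 2 else "strong"

def summarize(ratings):
    """Collapse four-level ratings to weak/strong, then return (modes, median)."""
    grouped = [group(r) for r in ratings]
    # Encode weak as 0 and strong as 1 so a median can be computed.
    binary = [0 if g == "weak" else 1 for g in grouped]
    return multimode(grouped), median(binary)

# Clear majority (5 weak, 3 strong): mode and median both say "weak".
majority_modes, majority_median = summarize([1, 1, 2, 2, 2, 3, 3, 4])

# Even split (4 weak, 4 strong): two modes, and the median sits between.
tied_modes, tied_median = summarize([1, 2, 2, 2, 3, 3, 3, 4])

print(majority_modes, majority_median)  # ['weak'] 0.0
print(tied_modes, tied_median)          # ['weak', 'strong'] 0.5
```

On this sketch, mode and median agree whenever one group has a strict majority. With 8 raters a 4-4 split is still possible, and then neither gives a single answer.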

FlannelJesus · March 14, 2026, 10:52am

I’m not sure. If you’ve invested so much into research already, there are situations where you’re genuinely closer to the finish line. Not all costs are “sunk”.