Direct Realism and Perception

This is not controversial, and it survives any perceptual relation, even one that is manifestly indirect. In my interaction with you, my actuality is determining the content of the text you are reading right now, even though our communication is undoubtedly indirect.

OK but now I want to press on what “directly aware of the model” actually amounts to. You’ve said we’re directly aware of a subset of our internal machinery — the conscious part. But what is that direct awareness? It can’t itself be a model, because then you’d need awareness of that model, and you’ve just told me only one level of meta is feasible. So this direct awareness is some kind of immediate, non-representational cognitive contact with a neural state.

But now you have a problem. You’ve helped yourself to a capacity — direct awareness — that is non-modelling, non-representational, immediate. You just restrict its scope to internal states. But why that restriction? What feature of direct awareness makes it constitutively incapable of being world-directed? You need a principled reason, not just the assertion that “anything outside this subset must be modelled.”

Because from where I sit, the capacity you’ve just described — non-representational cognitive contact — sounds exactly like what I’ve been calling direct awareness all along. I just don’t see the reason for arbitrarily restricting it to brain states. Especially since, phenomenologically, that’s not how it presents. When I see the rock, I’m aware of the rock, not of neural firing patterns. You’d have to say that the phenomenology is systematically misleading — that what feels like awareness of a rock is “really” awareness of a model. But the only reason to say that is the prior commitment to IR. The phenomenology itself doesn’t support it.

So the regress objection isn’t really my main point here. My point is that you’ve conceded the existence of a cognitive capacity that your framework officially shouldn’t be able to account for, and then quarantined it to a domain where it does the least damage to your position.

I think you’ve misread the analogy. I wasn’t saying the brain is like the larynx in terms of personal identity. Obviously the brain is more central to who you are than your larynx is. The point was about the logical structure of the explanation: speech is an act of a person performed through a larynx, and knowing is an act of a person performed through a brain. In neither case is the organ itself the right level of description for the act.

You say you identify personhood with the brain. But do you really? You deliberate, you make arguments, you evaluate evidence. Do you describe any of that as “my brain is doing such-and-such”? When you say “I think your argument fails,” is that a claim about your neurons? If so, what would it even mean for neurons to find an argument inadequate? Evaluation, inference, judgment — these are acts of persons, described in normative vocabulary (valid, sound, justified) that has no purchase at the level of neural description. A brain state isn’t “justified” or “unjustified.” It just is.

Identifying personhood with the brain doesn’t resolve this — it just relocates the question. Fine, call the brain “the person.” You still need to explain how this physical system performs acts that are subject to rational norms. And “it models things” doesn’t answer that, because modelling is a causal process and norms aren’t causal properties.

I think some confusion here is coming from the word “chime” doing double duty — it can mean the distal source (the bell or whatever) or the sound it makes. Let me restate with that cleared up: when you hear a bell, the qualitative character of the sound belongs to your act of hearing. The bell is the object. The sound-as-experienced is how the bell presents to you.

Now, you’re right that “the act of experiencing the bell” is, trivially, not identical with the bell. I wasn’t claiming otherwise. The third option isn’t that the quale is the object. It’s that the quale isn’t an intermediary between you and the object. There’s a difference between “other than the bell” and “standing between me and the bell.” My act of seeing a rock is obviously not the rock. But it doesn’t follow that my act of seeing the rock is a screen interposed between me and the rock. It’s the means by which the rock is disclosed, not a barrier to the rock.

The real disagreement is in your key move: “phenomenologically what presents to the subject as the object in fact belongs to the subject.” But this is precisely what I’m contesting. Phenomenologically, the bell presents as sounding a certain way. The sounding is neither purely subjective nor purely objective; it’s the relational event of the bell showing up for a hearer. You’re splitting this into “what’s really happening (subjective modelling)” versus “what seems to be happening (objective features),” and then calling the gap between them an illusion. But that split is a theoretical imposition, not a phenomenological finding. You’re reading IR back into the description of experience rather than drawing it out of the experience.

But judgment does deploy evidence exterior to the initial perception — that’s exactly what I’ve been saying. You check, you corroborate, you gather further experience. The question is whether those further acts are themselves world-directed or whether they’re just more models being compared to models. And we’ve already been through that: on your picture, every piece of “exterior evidence” is itself another model, so you never actually get outside the circle. On mine, each further perceptual and cognitive act is itself a case of the subject engaging the world, so the evidence genuinely bears on reality.

I think the confusion is that you’re treating “direct” as though it means “infallible and self-certifying in a single act.” I’ve never claimed that. Direct perception means the perceptual act is world-directed — its object is the thing, not a representation of the thing. It doesn’t mean the act can’t be wrong or that it doesn’t need further support. Needing judgment to confirm what perception delivers doesn’t make perception indirect. It makes it incomplete — the first step in a cognitional process that also requires understanding and judgment.

Consider: I directly see the stick in water as bent. My perception is of the stick, not of a model. But it takes further inquiry (touching it, understanding refraction) to judge correctly that it’s actually straight. The judgment corrects the perception, but at no point was the perception directed at anything other than the stick. “Direct” describes the intentional structure of the act, not its epistemic sufficiency. You keep collapsing these, and I think that’s where the impasse is.

Wait — I think you’ve conceded the key point without noticing. You say “cognition is not just model building, otherwise we would have no basis to choose one model over another. Judgement, which can accept or discard models, is required as well.” I agree entirely. But now: what is this judgment, on your account? It’s not a model, you’ve just said so. It’s the capacity to evaluate models against rational norms — coherence, parsimony, explanatory power, evidence.

So you have a cognitive capacity that isn’t model-building, that operates according to rational norms, and that adjudicates between models. That’s basically what I’ve been arguing for this whole time. The question was never whether brains-in-vats are logically possible. The question was whether we have a cognitive capacity that makes genuine normative contact with reality rather than just shuffling representations. It sounds like you’ve just said “yes”.

And look at what that capacity does in your own argument. You say we judge the alternative hypotheses “extremely unlikely” using “various reasons.” But those reasons have to actually be good reasons — they have to track something real about plausibility, evidence, explanatory adequacy. If they’re just more models, you’re back in the circle. If they’re genuine rational insights into evidential relations, then you have exactly the non-modelling, world-directed cognitive contact that you’ve apparently been denying.

So I don’t think I need to “do better.” I think you’ve done the work for me. The skeptic can formulate the skeptical scenarios, sure. But the very act of judging them unlikely — which you rightly say is necessary — presupposes a cognitive grip on rational norms that transcends model-building. That grip is the direct contact with reality I’ve been pointing to. Not perceptual infallibility. Not some mystical union with objects. Just the capacity of intelligence and rationality to reach beyond representations toward truth.

I’m finding this very confusing. You point to various phenomena associated with a moment of reaching understanding. But you also allow that we may not notice them. So they are phenomenally available, but are present even if I don’t notice them. But surely phenomenal availability just is what I notice. So these moments are sometimes phenomenally available, but not always.
But then you say that they take place even if they are not phenomenally available. What have I got wrong?

But this is a side issue. Full disclosure, I’m simply channelling Wittgenstein here. The question to ask is what shows that I understand. Anything that happens in the moment can turn out to be a false dawn. So nothing that happens in the moment actually shows what it is to understand. That comes later in what I say and do. What I’ve understood affects what I do, in my success in navigating the world, making good choices, etc.

Understanding may feel like an event or like a process. But it is neither. It is rationally appropriate behaviour.

I thought you were the one who said that ‘intelligibility is “right there” in what you see’. Or at least it was something we agreed on. I’m sorry if I misunderstood you.
I have big trouble with this. I’ve been walking - and talking - round it for a while.
Intelligibility seems to be whatever it is about something that allows us to understand it. But it can’t be something distinct from the various properties that it has. So it is the properties that something has that allow us to understand it. Actual understanding is the result of an interaction between ourselves and the object of our understanding, in a context.
Is that somewhere near what you understand by intelligibility?

I think it is a bit different from that. An interpretation is closely related to a gestalt. It is a way of understanding the elements in an experience that grasps the significance of the whole thing. I wouldn’t say it makes the experience have the determinate content that it has. I would say that the interpretation recognizes the content and attributes significance to it. Something like that.

Sorry for the confusion. I think what I am saying is actually fairly straightforward – insight is always present, but not always explicitly attended to. We tend to habituate insights over time. An insight that initially required great conscious effort to achieve becomes “automated”, eventually fading into the background of everyday experience. That doesn’t mean the insight is no longer operative, it means the insight has been seamlessly incorporated into the texture of lived experience.

I think we’ve actually come full circle. This is structurally the same move as the signpost example from earlier. If understanding just is rationally appropriate behavior, then what makes the behavior count as “rationally appropriate”? You need some standard of correctness, and that standard can’t itself just be more behavior without regressing. At some point something has to actually be correct or incorrect, not just be followed by further behavior that we then also evaluate.

And I’d push back on the claim that nothing in the moment shows understanding. The click of getting a joke isn’t just a feeling that might be a “false dawn.” It has a specific content — you grasp why it’s funny, you see how the parts fit together. That grasping can indeed be mistaken, sure. But the possibility of error doesn’t mean the act isn’t real, it just means it’s fallible. And “fallible but real” is a perfectly respectable status for a cognitional act to have.

You’re right that I said intelligibility is “right there” in what you see — and I stand by that. Intelligibility is real, it’s not something we project onto things. But there’s a subtle point I think got lost. The intelligibility is really there in things, but it doesn’t just transfer itself into your mind passively. You have to grasp it. The pattern is in the data, but seeing the pattern is an (often effortful) act.

So when you say intelligibility is “whatever it is about something that allows us to understand it,” I’d want to tweak that slightly. It’s not just that the object has properties that allow understanding, as though understanding were just a matter of the object being available and us being in the right proximity. The object has a real intelligible structure, yes. But that structure only becomes understood when someone actually grasps it. And that grasping is what I’ve been calling insight.

Your phrase “the result of an interaction between ourselves and the object, in a context” is actually quite close to what I’d say. I’d just want to insist that the interaction isn’t symmetrical — the object contributes the intelligible structure, but we contribute the act of understanding. Neither side is sufficient on its own.

I think there’s an important difference between “recognizes” and “makes” here, and it’s worth being precise about which one we want. If interpretation merely recognizes content that’s already fully determinate before the act of understanding — then the intelligibility of the experience is just sitting there, pre-formed, waiting to be noticed. And that’s the position I keep trying to push us away from, because it implies that understanding adds nothing to the picture. It just passively registers what’s already there.

I’d say something more like: the elements of experience underdetermine the interpretation. The same visual field can be grasped as a duck or a rabbit, as a cup or a cylindrical shadow. The experience constrains but doesn’t fix the content. What makes the experience have this determinate content rather than that one is the act of understanding. So “makes” is too strong if it suggests we’re just projecting — I agree. But “recognizes” is too weak if it suggests the content was already fully there independent of the grasping. The truth is somewhere in between: understanding completes the determinacy that experience only offers in potential.

Descriptions of objects and how things look do not entail that they are in possession of phenomena called “redness” or “color”, or that such phenomena exist in any way, shape or form in any other location.

Your suggestion that these are things in our experience implies that we are able to experience them and therefore ought to be able to describe them. We can describe all sorts of bodily phenomena, but not colors? Why?

Because some things are irreducible, like pain, pleasure, a foul smell, a sweet taste, and a red colour.

We just experience them, name them, and recognise them to be different to one another.

Yes, I agree with you.

I suppose it depends on what we mean by “direct”? When we talk to someone on the phone, we don’t tend to think we are hearing their “representative” (although I suppose you could call the sound wave output just that).

Anyhow, if indirect realism were just a thesis about causal mediation, it would be trivially true, no? But then there would also be no “direct” interactions at all, making the distinction empty. But isn’t the point rather that what we are aware of/experience is some sort of internal (generally mental) intermediary?

From my perspective, I guess the dichotomy only even makes sense in a particular metaphysical framing. If I had to put various traditions into one box or the other (direct/indirect) I don’t think I could even coherently do it. This is sort of like how it’s impossible to fit many older thinkers into the incompatibilist, compatibilist, libertarian buckets in the free will debate; a dichotomy’s framing often assumes much.

TBH, I think the reason these debates are intractable is because very little contemporary thought nicely fits the distinction anymore, even though people continue to use it. It makes the most sense for Locke, Descartes, etc. By the time we get to trying to frame “direct versus indirect” in terms of the sort of neuroscience-inspired functionalism popular today, I am not sure if this distinction is doing good work any longer.

However, I do think a potentially less loaded way to frame this would be to borrow (in a very loose sense) the tripartite semiotic structure of object, sign vehicle, and interpretant (or source / channel / receiver) and to say that:

Direct = the sign vehicle is the site of nuptial union between the object and interpretant. The relationship is irreducibly triadic. Just as a cause is only a cause in virtue of having its effect, a sign vehicle, interpretant, etc. is only what it is in virtue of being in this triadic relation. The object and interpretant are joined, as part of an irreducible whole. We might describe this in terms of the penetration of the interpretant, but also the interpretant’s ecstasis, their going out beyond themselves.

Indirect = the sign vehicle is an impermeable barrier between the object and the interpretant (but still the means of their linkage). The sign vehicle carries information about the interpretant, and causal force, but the object and interpretant are never joined. Rather, there is a dyadic relationship between the object and the sign vehicle, and a separate dyadic relationship between the sign vehicle and the interpretant. We can describe the whole as triadic, but this triad is reducible into discrete dyads.

The first dyad = The object interacts with the sense organ (physics/biology).

The second dyad = The mind experiences the resulting image (psychology, cognitive science).

Whereas, on the direct view, the sign vehicle is transparent because it is defined relationally. The formal identity of, say, sight, already presupposes “seer and seen,” and their union.

In the direct case, you have parts defined in terms of the whole. In the indirect case, you have a whole determined in terms of the parts. You can have one dyad without the other (e.g., how indirect realism often interprets hallucinations, and why it sees it as a strong counterexample).

Maybe that is helpful? To my mind at least, it helps to underscore the key differences re mediation.

Right, which goes along with the whole idea of the intellectual virtues. Each faculty is involved, as one goes “out to the many,” better understands the one, and then tests that understanding. There is a constant transition between the unificatory understanding and discursive analysis and justification.

I have a handy note on that in case anyone is interested:


the dematerializing power of the intellect, by which man can abstract essential forms from natural phenomena, and so know their principles, was thought to be trainable. To understand this in more modern terms, we can think of the way in which many exceptional scientists are able to make major contributions across disparate fields (e.g., Erwin Schrödinger in both physics and biology). These men and women have trained their intellects to be able to recognize the unifying, higher-level principles at work in disparate events.

That is, the ancient explanation of elite scientists’ ability to quickly understand and begin contributing to fields outside their area of expertise would be understood in terms of their habituation to understanding the abstract principles (form) at work in the world, irrespective of the matter involved. This is a move from multiplicity to unity; an intellectual ascent from the material to the intelligible. Benoit Mandelbrot, one of the fathers of chaos theory and complexity studies, is another excellent example here. As a mathematician, he was able to abstract the principles at work in a wide array of phenomena in the social and natural sciences. Norbert Wiener, the originator of cybernetics, would be another prime example. That individuals can seemingly develop their capacity for such movements is precisely why science was understood as a virtue and not a methodology in the pre-modern era. (Of course, one interpretation of “science” need not exclude the other).

This might seem strange today, but we can find much evidence to support it. Some principles are more general than others. For example, one of the most consequential paradigm shifts across the sciences in the past fifty years has been the broad application of the methods of information theory, complexity studies, and cybernetics to a wide array of sciences. This has allowed scientists to explain disparate phenomena across the natural and social sciences using the same principles. For instance, the same principles can be used to explain both how heart cells synchronize and why Asian fireflies blink in unison. The same is true of the body’s production of lymphocytes (a type of white blood cell), which takes advantage of the same goal-directed “parallel terraced scan” technique developed independently by computer programmers and used by ants in foraging.

Notably, these explanatory unifications are not “reductions.” Clearly, firefly behavior is not reducible to heart cell behavior or vice versa. Indeed, such unifications tend to be “top-down” explanations, focusing on similarities between systems taken as wholes, as opposed to “bottom-up” explanations that attempt to explain wholes in terms of their parts. This sort of work has been hugely influential across the sciences, and here the role of “abstracting and understanding higher level principles” is at the center of such breakthroughs.

This is one thing I’ve always found profoundly odd as an objection. I don’t think it’s fair to say any theory of perception says that reality can be/is related to representations arbitrarily. It does, in fairness, run the risk of allowing for this - but I don’t think I’ve heard a version that doesn’t ultimately say something Kantian, in that the objects causing representations must logically be coherent with them.

I wonder if you are amalgamating what you called the “act of insight” and the insight gained as a result of the act? And, as I get deeper into this, I find that I can only make sense of what you say if I substitute “understanding” for “insight”. Are they the same? If not, what’s the difference?

You are making the same mistake as people who think that, because perceptions can only be questioned and/or confirmed by other perceptions, the process is circular. Perceptions are not a seamless whole, but have many different components and aspects - not to mention many people to do the perceiving. Behaviour is not a seamless whole. One action can conflict with another, be inconsistent and so forth.

What does “grasp” mean here? I understand it as a metaphor, meaning “understand” (roughly). Certainly, seeing (another metaphor) how the parts fit together is one mode of understanding. Another is inferring to a hidden cause (the common-sense sense of hidden). And so on.

It is amazing how difficult it is to explain this just right. We nearly agree, I think. So forgive me if I come back at you.
What exactly is added to the picture when you see it as a duck (or rabbit)?
In one sense, the picture is the same under both interpretations. In that sense, the interpretation adds nothing, takes away nothing. But yet something has changed - to the point where the object pictured is not the same object. In that sense, it is a different picture.
Let me try again. The elements of the picture are seen in a new perspective (Gestalt), in different relations to each other, at a new descriptive level.

There’s a lot to agree with here, and your examples are well chosen. But I want to flag something about how the agreement is framed, because I think it matters for the larger dispute.

When you describe the “dematerializing power of the intellect” as the capacity to “abstract essential forms from natural phenomena,” this can be read in two quite different ways. On one reading, the intellect is doing something like intellectual perception — it grasps the form directly, and the virtue consists in a trained sensitivity to what’s already there to be seen. On another reading, what the intellect does is catch on to a pattern in the data — it has an insight that reorganizes what’s presented — and the virtue consists in a habituated facility with that act of reorganization, plus the critical judgment needed to test whether the insight actually holds.

These aren’t trivial variants. The first reading fits comfortably with a direct-realist picture where the intelligible structure of things is simply available to a properly disposed mind. The second treats intelligibility as genuinely achieved — not fabricated, but not simply received either. Mandelbrot didn’t perceive fractals the way one perceives colour; he grasped a unifying pattern across datasets that no one had organized that way before, and then the mathematical community tested whether that organization was adequate to the phenomena. The virtue was real, but it was a virtue of insight and critical testing, not of intellectual receptivity.

Your point about top-down explanatory unification actually strengthens this, I think. When complexity scientists discover that firefly synchronization and heart-cell synchronization instantiate the same dynamical principles, what’s happened is not that someone finally perceived a form that was sitting in the phenomena all along. What happened is that someone asked a new kind of question (not “what are these things made of?” but “what pattern of relations governs these systems as wholes?”), and that shift in the question opened up a new intelligibility. The principle was always there in the data, sure, but it only becomes an explanation when inquiry is directed at the right level. And that’s precisely why these unifications are top-down rather than reductive: they’re organized by a different type of insight, one that attends to system-level relations rather than compositional parts.

So I think we agree on the phenomenology of how good inquiry works. The question is whether that agreement is best cashed out in terms of a mind trained to receive forms, or a mind habituated to the self-correcting cycle of experiencing, understanding, and judging. I think the examples you’ve given — especially the emphasis on non-reductive, top-down unification — fit the second picture more naturally than the first.

No, it’s because colors are not things.

Odor molecules, however, are things, and that’s why you yourself can describe them as “foul”. Food is a thing, and so you can describe some of it as “sweet”. Apples are things, and so you can describe them as red.

Good question, and yes, I was being a little sloppy. The act of insight is the event — the moment where intelligibility clicks into place. Understanding is the resulting state, the accumulated capacity that stays with you after the act. So they’re not the same thing, but they’re intimately connected: understanding is built up from acts of insight the way a house is built up from acts of construction. You can live in the house long after the building is done. When I talked about insights being “habituated,” I should have said that the understanding gained through insight gets habituated — the act of insight is generally a one-time event (unless the resulting understanding is forgotten), but the understanding it produced becomes part of the permanent furniture of your cognitive life.

But recognizing that one action conflicts with another — seeing the inconsistency — is itself a cognitional act, not another “behavior”. Two behaviors can’t “conflict” all by themselves; conflict is something a knower grasps/understands when she holds both behaviors together and sees they don’t cohere. So the internal checking you’re describing actually relies on exactly the kind of reflective judgment I keep pointing to. It’s not behavior all the way down — somewhere in the process, someone has to understand that there’s an inconsistency, and that understanding is what gives the correction its rational force.

Yes, “grasp” is roughly synonymous with “understand” — I’m not smuggling in anything extra with the word choice. But that’s actually the whole point. When you get a joke, something happens that deserves the name “understanding” — you see how the parts fit together. That’s the act I’ve been calling insight. So when you say “seeing how the parts fit together is one mode of understanding,” you’re conceding that understanding is a real act with a specific character, not just a label we paste onto subsequent behavior. Which is all I’ve been arguing for.

Right — and I think you’ve just described exactly what I mean by insight. The elements don’t change, but they get organized differently. Something new happens that isn’t a change in the data but a change in how the data hangs together. That “something” is the act I’ve been pointing to. You keep describing it beautifully — “new perspective,” “different relations,” “new descriptive level” — but then resisting giving it the status of a genuine cognitional act. I’m not sure what’s left to disagree about, honestly. If the elements are the same but the picture is different, then the difference is contributed by the viewer’s understanding. That’s the act of insight.

I can describe an apple as red and sweet. The adjective “red” describes how it looks and the adjective “sweet” describes how it tastes. An object’s look is how it appears in visual experience and an object’s taste is how it appears in gustatory experience. Visual experience is a mental phenomenon that occurs when the visual cortex is active and gustatory experience is a mental phenomenon that occurs when the gustatory cortex is active.

The science is pretty clear, as explained in a previous comment.

OK. I understand the event vs resulting state. I also understand what you are pointing to as an act of insight, though we need to note that it is not an act in the sense that we can execute it whenever we wish, so I think of it as more like an event.
I have two reservations. You seem to think that every bit of understanding we acquire is acquired on the basis of an act of insight - new Gestalt. It doesn’t seem right to call working out how much fuel I need to drive from A to B an act of insight. I may get a better understanding of some event by reading a new account of it. That might be radical, involving reinterpreting everything I already knew. But it might not. There’s not one pattern that covers everything.
When we are talking about Gestalt changes, it does seem likely that they are sudden. They cannot be done cumulatively, because the relationship of each element to every other element changes. Jokes also seem to be like this.
The moment of insight can be expressed in many different ways. But does it necessarily have to be expressed in any way at all? I might get the joke, and laugh or smile. But perhaps I might get the joke and do neither, because it isn’t funny.
But the fundamental issue is this. A new insight might affect me in many different ways; but there is no way that I can grasp, in the moment of insight, all the ways that I may be affected. Seeing things differently will have consequences down the road; I can’t grasp those until the relevant situations arise. (I’m channeling Wittgenstein on following a rule here.)

I get that. But the picture is different and also not different at all. The duck-rabbit picture is a picture of a duck and a picture of a rabbit and a picture of neither. That is quite hard to express clearly and non-paradoxically.

Again, directness is not what I believe to be at stake. We agree that the telos of perception is world access. And we agree that perception can be directed at the remote object even in indirect perceptual relationships.

I am not merely “helping myself” to direct awareness, and restricting it to internal states, as if by fiat.

One of the principles I am assuming is the identity of mind and brain. I don’t deny emergence of mental features, but at the same time I am assuming that, given a perfect understanding of the brain, every mental feature can in principle be traced to underlying brain function (I deny strong emergence).

And so when I ask you, “how can the brain grasp external reality”, I’m not implying that the brain is the best level of explanation of an answer. I’m asking you to think in the register of the brain: the pink spongy mass, not the ineffable, mysterious mind. Because of mind/brain identity, the ineffable mysterious mind inherits all the limitations of the pink spongy mass.

And so let’s consider this pink spongy mass, encased in its skull. It has various inputs; the ones we are concerned with are the nerve fibres originating in sensory organs. On their own these already provide a causal connection with the external world, but not an epistemic connection: uninterpreted, they provide no more epistemic access to the world than the cellular waves saturating the air give to a city pigeon.

So, what does the brain do to gain epistemic access to the world? It does what it is designed to do: it predictively models a world that most likely produced the inputs it received. What other option is available to it? There is none. If you were designing a robot, is there some other choice that would be available to you? A human is not a robot, but this fundamental design constraint is the same across human and robot.

Given mind/brain identity, both mind and brain must be understood as one and the same thing to comprehend them. If the brain accesses the world by modeling, the mind is the experience of this modeling, as the experience of the world. Not y, x as y. There is simply no alternative. There is no magic the mind possesses that can transcend the fundamental limitation of the brain. You seem to imagine these models as some kind of internal utility that magically grants the mind direct access to distal objects. There is no such magic, and this vague idea cannot even be operationally understood. If the brain gains access to the world by modelling, then so must the mind.

And the remote world outside the reach of sense organs? The brain borrows the modeling language used for sensory experience. What is the corresponding feature of the mind? This is the self, with its dialogues, images, and concepts.

You keep saying this. And yet, we have been through so many examples where intentionality targets its object via an epistemic intermediary. You have said that in the case of VS, there is a well defined intermediary object, which is lacking in perception. But this doesn’t obviate the main point: the intentional target in the VS case is the apple, not the TV, and this perception is uncontroversially indirect. Your definition of direct cannot seem to survive this counterexample.

Wait a minute. Now you seem to be arguing something entirely different than direct realism of perception. Now, it is not perception, but judging and rational norms that afford direct contact with the world. I have not only not done your work for you, I don’t acknowledge this point at all. I consider judgement a purely internal operation. It might be “world aligned”, but this is far from the “direct contact” that perception is supposed to afford. Lacking world-connected nerves, and therefore lacking even causal contact with the world, I see judgement as being a hopeless avenue for you.

Over the course of this very long and seemingly intractable debate there have been several attempts to define directness and indirectness. The best I have come up with is that indirect perception is something like:

Epistemic access to what is not at hand granted by a causally connected representation that is at hand.

This accords with intuitions about common examples: by means of direct access to causally connected representations inscribed on photos, maps, and faces, we gain indirect epistemic access to subjects, territories, and emotions.

This also accords with what I take to be the essence of your definition: that indirection entails a triadic structure, whereas directness is a dyad.

Above all, I think the question is still meaningful, regardless of the current state of academic discourse: is perception itself this sort of thing, or isn’t it?

What do you think?

A common problem with these discussions. I think the suggestion that there’s just one definition of “direct” and “indirect” is a false one. Rather, we might say that perception is direct in one sense but indirect in another.

See for example here:

Proponents of intentionalist and adverbialist theories have often thought of themselves as defending a kind of direct realism; Reid (1785), for example, clearly thinks his proto-adverbialist view is a direct realist view. And perceptual experience is surely less indirect on an intentionalist or adverbialist theory than on the typical sense-datum theory, at least in the sense of perceptual directness. Nevertheless, intentionalist and adverbialist theories render the perception of worldly objects indirect in at least two important ways: (a) it is mediated by an inner state, in the sense that one is in perceptual contact with an outer object of perception only (though not entirely) in virtue of being in that inner state; and (b) that inner state is one that we could be in even in cases of radical perceptual error (e.g., dreams, demonic deception, etc.). These theories might thus be viewed as only “quasi-direct” realist theories; experiences still screen off the external world in the sense that the experience might still be the same, whether the agent is in the good case or the bad case. Quasi-direct theories thus reject the Indirectness Principle only under some readings of “directness”. A fully direct realism would offer an unequivocal rejection of the Indirectness Principle by denying that we are in the same mental states in the good and the bad cases. In recent years, direct realists have wanted the perceptual relation to be entirely unmediated: we don’t achieve perceptual contact with objects in virtue of having perceptual experiences; the experience just is the perceptual contact with the object (Brewer 2011). This is the view that perceptual experience is constituted by the subject’s standing in certain relations to external objects, where this relation is not mediated by or analyzable in terms of further, inner states of the agent. 
Thus, the brain in the vat could not have the same experiences as a normal veridical perceiver, because experience is itself already world-involving.

My own take is that perception is not direct in the way that naive realism says it is. I’m not really interested in these other notions of directness.

I would think it would be the opposite. Representationalism would be the model with two dyads. The entire point is that what is “at hand” is never the object, always the internal representation, right? What is at hand is limited by the dyad between sign vehicle/representation and interpretant.

That is, you have a causal dyad between the object (cause) and representation, and a second dyad between the representation and the interpretant. The object is never “at hand” because there is no dyad between object and interpretant.

Re: pictures, in The Phenomenology of the Human Person Robert Sokolowski suggests that the “picture” metaphor is apt to cause all sorts of issues, and offers an alternative, the “lens”. Lenses are something we primarily look through not at. They are also something we actively use. This is why he thinks they make a better metaphor.

This resolves a number of difficulties. For instance, how can we ever be sure that a “picture” corresponds to what it is a picture of if we only ever deal in pictures? Whereas, switching between lenses (or sets of eyes) doesn’t seem to have this issue. Further, since consciousness is always intentional, we have to ask what its aboutness relates to here:

In the indirect “picture” model, the object of our consciousness is the image in our mind.

In the “lens” model, the object of our consciousness is the thing itself. The lens only rarely becomes an object of intentionality itself.

So, Sokolowski wants to emphasize that our bodies are not analogous to cameras recording data. He’d like to say that the body is more the “lenses” through which the world is disclosed, with our active, embodied participation. When we touch a table, the sensation isn’t a “picture” of hardness; rather our hand is the medium through which the table’s hardness presents itself to us. The mind isn’t a screen on which the world is projected, but rather the light through which it is seen. Or something like that.

I thought this was a good framing. For one, because if touch involves “pictures” I am not sure why all physical interactions wouldn’t be “pictures” in this sense, but then the analogy seems to break down. It seems better to stop at “everything is received in the manner of the receiver,” which is as true for rocks as people, and maybe to drop both lenses and pictures as problem analogates.

That said, while I think that Sokolowski is correct that representationalism has indeed been associated with many serious epistemic challenges, I do think this might have more to do with upstream assumptions about causality and the metaphysics of appearances more generally. So, I wouldn’t put too much weight on the “picture” analogy one way or the other.

This is true even going back to Plotinus’ attacks on representationalism, where his target is presumably the ancient Empiricists. The problem there is not so much representations, as the fact that they are metaphysically unanchored from reality.

I guess I find this less convincing than you do. I don’t see how you can deduce normativity, intentionality or qualia from brain function.

Yes, that’s what a brain does, but that’s not what a person does. If you think persons are reducible to brains then your question looks decisive. If not, it looks more like a category mistake.

There’s no magic on my view either. Neurophysiology and neurocomputation capture real intelligible patterns in the world. But they don’t exhaust what can be said about these matters and, in fact, they leave a lot out.

Consider dynamical systems theory, ecological psychology or 4E cognitive science — none of these would deny that the brain builds models. But they would deny that the content of perception is computed behind the skull. Instead, they’d say something more like: perceptual content is constituted by dynamic coupling between organism and environment. The brain’s processing is one component of that coupled system, but the perceptual achievement — the disclosure of a distal object — lives at the level of the whole coupled system, not at the level of some neural component in isolation.

But I’d go even a step further and say that at a certain level of organization, the dynamical behavior of the person-environment system is most accurately and most economically described in fully intentional and normative terms. Not as a convenient shorthand, but as the right level of description for real intelligible patterns that are invisible from below.

When a person grasps that their evidence is sufficient and judges accordingly, or notices an error and self-corrects, those are stable, counterfactual-supporting regularities in the system’s behavior. But they are constitutively normative — you literally can’t describe “she corrected her mistake” without normative vocabulary. And “mistake” and “correction” aren’t reducible to anything in the neural dynamics, any more than “molecular structure” is reducible to a purely quantum mechanical description. The intentional vocabulary is tracking real causal structure at its own level of organization.

So personhood isn’t a ghost haunting the brain. It’s the level of organization at which the dynamical patterns constitutively are intentional and normative.

We keep missing each other on this point, so I’ll try to simplify. The apple is the referential target of a compound cognitive act involving two separate but related intentional acts: (1) perceiving the screen and (2) understanding how the surveillance apparatus is related to the apple. The apple itself is perceptually absent and so can’t be the direct object of perception. That’s what makes the scenario indirect.

Now consider the perception of the screen itself: there’s no analogous two-step structure. The screen is the perceptual object, full stop. To make the VS analogy work for IR you’d need to show there’s a perceived intermediary playing the role of the screen in ordinary acts of perception, and that’s the thesis under dispute, not something the analogy establishes.

First, I’m not saying that only judgment puts us in contact with the world. Perception does too. So nothing has changed from my end — I’ve just introduced a distinction that I had previously left implicit.

Second, let’s try to get clear about the picture you are building here, because I think it has some pretty uncomfortable consequences.

On your view, perception puts us in direct contact with model contents, not the world. When contradictory contents arise, you say judgment adjudicates. Fine – but if judgment is a purely internal operation with no world-answerable standards, then you never get beyond models. You’re still behind a veil, except now you’ve added a story about how the veil is managed.

And worse: the normative criteria you yourself appeal to — “evidence,” “sufficiency,” “better explanation,” “unlikely” — become internal window dressing over whatever predictive dynamics is “really” doing below the surface. The phenomenology of inquiry (weighing reasons, noticing errors, revising beliefs) becomes epiphenomenal narrative, not genuine rational activity.

So here’s the fork as I see it. Either:

  1. judgment is genuinely norm-governed and world-answerable (in the ordinary way inquiry is: by checking, testing, corroborating, revising in response to how things are), in which case you’ve conceded a form of cognitive authority that isn’t captured by “modelling all the way down”; or

  2. judgment is just internal model management, in which case talk of justification and reasons is not doing any real work — and your own reliance on those notions (including in your skeptical argument) is undermined.

That’s the problem I’m pressing. It’s not that brains can’t be the realizer. It’s that your framework, as stated, threatens to explain away the very normativity it needs in order to count as an account of rational cognition rather than mere prediction.
