"I perceive myself perceiving the world" is what most people refer to as consciousness. Philosophers call it "second order perception". There's something rich about this word. It provides you a pathway to formalize consciousness. But, before that, we can raise various questions:
1) Is consciousness incomputable, as Roger Penrose suggests?
2) Is consciousness a collective phenomenon, as Michael Levin suggests?
3) Is consciousness a quantum phenomenon, as Stuart Hameroff suggests?
4) Is consciousness purely a computable classical phenomenon?
5) Is it something Godly?
To be honest, I simply don't know which one of them is true. But to try to model it, we will need to assume one of them as a postulate. If we start off with no. 4, i.e. take the classical phenomenon as a postulate, then we could formalize some empirical properties of consciousness and figure out how they could be modeled in machines.
| Consciousness |
The first thing we need to introspect is the idea of "second order perception". You perceive that you're perceiving something.
How do you define this mystical phenomenon?
The first thing you realize is that it is actually detached and works independently of your senses. For example, imagine you're on a noisy street. You close your eyes and think about yourself, or have an internal monologue about something. At that moment, you do not lose your ability to process the external environment: sound, smell, vision. From this we can infer that information processing from the environment happens on its own, regardless of our awareness of it. We can shift awareness internally or externally as we like. Therefore, we can separate the two functions into two levels or orders:
- Inner neural network
- Outer neural network
The inner neural network will be responsible for processing information from the external environment and controlling other parts of the system (limbs, for humans). This layer is much like what we have today: an LLM or multimodal system with tool calls.
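As a minimal sketch of this separation (all class and method names here are my own illustrative choices, not an established design), the inner layer keeps ticking no matter where awareness points, which matches the noisy-street introspection above:

```python
import numpy as np

class InnerNetwork:
    """First-order layer: keeps processing the senses no matter where
    awareness points, like hearing the street with your eyes closed."""
    def process(self, sensory_input: np.ndarray) -> np.ndarray:
        # Stand-in for perception: map raw input to an internal representation.
        return np.tanh(sensory_input)

class OuterNetwork:
    """Second-order layer: holds the awareness pointer, which can be aimed
    externally (at the senses) or internally (at the monologue) at will."""
    def __init__(self) -> None:
        self.focus = "external"

    def shift(self, focus: str) -> None:
        self.focus = focus  # "external" or "internal"

inner, outer = InnerNetwork(), OuterNetwork()
for tick in range(3):
    rep = inner.process(np.random.randn(8))  # runs every tick regardless
    if tick == 1:
        outer.shift("internal")  # shifting awareness never stops the inner loop
```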
The outer neural network will be the part responsible for consciousness. To derive the function of this layer, we again have to strip away all the unnecessary elements surrounding consciousness and ask a few questions:
What is this sense of self?
What is this internal monologue that we have?
What other behaviors do we attribute to consciousness?
If we observe carefully, we can derive both the sense of self and the internal monologue (the internal conversation we have with ourselves) from COT (Chain of thought). COTs have become very popular these days for building reasoning models. But unlike traditional COT, which produces language, this COT will be based on sound.
But why based on sound, not language?
To answer this, I first want you to listen to this small clip and try to emulate it internally.
It is quite obvious that you cannot mimic the sound internally with language. In fact, the other way around is true: language is sustained sound. We have an abstract representation for it in written form, but there's a sound associated with every letter and word. This is why this COT is based on sound; call it COS (Chain of sound). The sense of self is just us referring to ourselves with COS, or maybe sometimes with limbs. If we don't have any vocabulary to refer to ourselves in our language, then either we may mimic some new COS to refer to ourselves, or we might not refer to ourselves at all. The same is true for the internal monologue: the dialogue we have with ourselves is just COS all the way down. But all of this is still not enough.
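To make the contrast with text-based COT concrete, here's a toy sketch where the chain is a sequence of sound embeddings rather than word tokens (the prediction function is a stand-in for a learned model; everything here is a hypothetical illustration):

```python
import numpy as np

SOUND_DIM = 16  # size of one 'sound token' embedding (arbitrary choice)

def next_sound(chain: list[np.ndarray]) -> np.ndarray:
    """Stand-in for a learned model that predicts the next sound embedding
    from the chain so far."""
    context = np.mean(chain, axis=0)
    return np.tanh(context + 0.1 * np.random.randn(SOUND_DIM))

def chain_of_sound(seed: np.ndarray, steps: int) -> list[np.ndarray]:
    """The same autoregressive loop as COT, but the unit is a sound
    embedding rather than a word token."""
    chain = [seed]
    for _ in range(steps):
        chain.append(next_sound(chain))
    return chain

monologue = chain_of_sound(np.random.randn(SOUND_DIM), steps=5)
```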
What are we missing?
What we're missing is visual context, aka the ability to visualize things. It's important to note that the visual context can be both static, like a 2D picture, and spatial, with three degrees of freedom. If I mention the word "apple", you're likely going to visualize a picture of an apple. You could also spin it around and move it around, just as in 3D space. But this is not all; what many miss is that people also visualize letters, words, and equations. When I invoke the word "dagmesteral", you'll likely visualize a picture with the word "dagmesteral". If you don't, then you'll likely only mimic the word with COS. This is the case because the word does not have any meaning; I made it up. You can have all three working at the same time. When you hear the word "banana", you should be able to do the following (a toy representation follows this list):
- Visualize the picture in 2d or 3d
- Visualize the word as picture
- COS mimicking banana
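One way to hold that triple activation in a single state (the field names are mine, purely illustrative):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PerceptState:
    """The three channels that fire together on hearing 'banana'."""
    visual_object: np.ndarray   # 2D/3D imagery of the banana itself
    visual_word: np.ndarray     # the written word 'banana', seen as a picture
    cos: list[np.ndarray]       # chain-of-sound mimicking the spoken word

state = PerceptState(
    visual_object=np.zeros((64, 64, 3)),          # toy image buffer
    visual_word=np.zeros((16, 64)),               # toy rendering of the word
    cos=[np.random.randn(16) for _ in range(3)],  # a few sound tokens
)
```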
Are we done?
Not really; there's a final, and the most important, piece of the puzzle. To identify what it is, we have to ask two questions:
What about attention?
What happens when we meditate or stop thinking about anything?
The answer to these two important experiences is what I like to call awareness. This phenomenon feels very intrinsic: a sense of existing just for the sake of existing. You'd be correct to call this attention, but there's a good chance you'd mistake it for attention in LLMs. Therefore, awareness is a better word, because the awareness I'm going to talk about encompasses a bit more than attending to previous tokens in context in order to predict the next token. Awareness actually seems to be an information-distinguishing mechanism. To understand what I mean, we have to categorize awareness into two types:
1) Explicit awareness:
Explicit awareness is what I like to call sensory awareness. Your focus is shifted to one of the senses, which could be vision, sound, smell, or a sensation like touch or pain. Let's take vision as an example. When you see something, you are particularly aware of a specific thing, like a bird flying in the sky. You distinguish that bird from the other information you are able to visually capture: clouds, hills, planes, houses, etc. The same can be said for sound, where your ear performs a natural Fourier transform and lets you distinguish sound from one source over sound from another. E.g.: focusing on a bird singing while your phone is buzzing with notifications, amid other noise. And so on for the other senses. This is explicit because you can choose to focus on whatever you'd like.
To sum up, information distinction with explicit control.
In this case, the outer neural network would be distinguishing one representation within the whole space of current representations in the inner neural network.
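A minimal sketch of that distinction step, assuming the inner network exposes a set of candidate representations (the similarity scoring is an arbitrary stand-in for whatever mechanism the brain actually uses):

```python
import numpy as np

def explicit_awareness(representations: np.ndarray,
                       focus: np.ndarray) -> int:
    """Pick one representation (the bird) out of everything currently
    represented (clouds, hills, planes...) by similarity to a focus vector.
    'Explicit' because the focus vector is under voluntary control."""
    scores = representations @ focus      # salience of each candidate
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over candidates
    return int(np.argmax(weights))        # index of the distinguished item

scene = np.random.randn(5, 8)        # 5 candidate representations, dim 8
bird_direction = np.random.randn(8)  # where voluntary focus points
selected = explicit_awareness(scene, bird_direction)
```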
There's an important observation that I skipped while talking about visual context and COS: when you receive sensory information, usually visual, you may not have any visual context or COS at all. This often happens when you see something unknown. For example, I'm going to ask you to focus on this writing; try very hard not to manipulate your visual context and COS, which we do so frequently. Mildly focus on a word or on the writing:
| Alien writing |
I know it's not easy, but if you try enough, you'll get a glimpse of even a small moment without visual context or COS. And that is pure explicit awareness. It is easy to realize that when we perceive something in this state of pure explicit awareness, it becomes very difficult or outright impossible to later visualize exactly what we focused on. This observation will play an important role when we talk about the function of awareness, which I'll get to later on.
2) Implicit awareness:
Unlike explicit awareness, implicit awareness does not operate on any sensory information. This is closest to what people experience when meditating. During meditation, people usually start off by focusing on breathing (explicit awareness), then reach a trance state of implicit awareness, where even sensory information is ignored. This is the voodoo dark magic of the whole phenomenon, and it raises a natural question: what happens during this phase?
The most straightforward answer is that the outer neural network "observes", or shifts awareness to, the entire inner neural network. What I mean is that during implicit awareness, the outer neural network focuses on the operation of the inner neural network. There's no visual context, no COS. The awareness is not on the information processed by the inner net, or on the representation of that information in the inner net, but on the processing as a whole. Therefore, the level of abstraction is the main point of distinction between explicit and implicit awareness.
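The shift in abstraction level can be sketched the same way: instead of scoring contents, the outer layer summarizes aggregate statistics of the inner network's activity (the chosen statistics are illustrative, not a claim about the brain):

```python
import numpy as np

def implicit_awareness(layer_activations: list[np.ndarray]) -> dict:
    """Attend to the processing itself rather than any represented content:
    summarize each inner layer's activity as a whole instead of
    distinguishing one item within it."""
    return {
        f"layer_{i}": {"mean": float(a.mean()), "energy": float((a ** 2).sum())}
        for i, a in enumerate(layer_activations)
    }

# No visual context, no COS: the output describes the inner net's operation.
inner_activity = [np.random.randn(32) for _ in range(4)]
summary = implicit_awareness(inner_activity)
```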
Okay, but what happens when we sleep? Where does the conscious experience go at that time?
We're now at the most fun idea. We have defined awareness as active information distinguishing. During sleep, or when you fall unconscious, your awareness simply diffuses: you stop, or temporarily stop, distinguishing one piece of information over another. Your senses still work; the important organs in your body are still functioning. But you are no longer able to distinguish certain information or processes over others. During sleep you are sometimes aware, but often within a dream, which is awareness over a visual context + COS, excluding sensory information.
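In the selection sketches above, "diffusing" has a direct toy analogue: raise the softmax temperature and the awareness weights flatten toward uniform, so nothing is distinguished over anything else (an analogy, not a model of sleep):

```python
import numpy as np

def awareness_weights(scores: np.ndarray, temperature: float) -> np.ndarray:
    """Softmax with temperature. High temperature = diffuse awareness:
    every candidate gets near-equal weight, so nothing is distinguished."""
    z = scores / temperature
    w = np.exp(z - z.max())
    return w / w.sum()

scores = np.array([3.0, 0.5, -1.0])
print(awareness_weights(scores, temperature=1.0))  # awake: sharply peaked
print(awareness_weights(scores, temperature=1e6))  # asleep: nearly uniform
```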
How do you explain visualizing a past event or experience while reasoning with gained knowledge?
When you try to reason about or relive a vivid memory of something that happened in the past, the inner net seems to retrieve this stored information. Awareness in this case seems explicit, just as with sensory information, but directed instead at the retrieved information.
Why did evolution even end up with awareness?
What survival advantages does it bring?
To be honest, this was and is quite hard to explore. The only reason I could think of is it being a filter mechanism for the brain to store important information. You only remember information you were aware of in that past moment. Try remembering what you did yesterday; it's pretty evident that you won't remember things you did not focus on. Your brain filters information you were aware of as important to store, but only if it is followed by visual context or COS or both. This is why I mentioned that it is difficult to remember with pure explicit awareness.
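As a toy version of that filter (the gating rule is just my reading of the paragraph above, not established neuroscience):

```python
import numpy as np

memory: list[np.ndarray] = []

def maybe_store(item: np.ndarray, was_aware: bool,
                had_visual_context: bool, had_cos: bool) -> None:
    """Store only what awareness flagged AND what was followed by visual
    context or COS; pure explicit awareness alone is dropped, which is
    why it is so hard to recall later."""
    if was_aware and (had_visual_context or had_cos):
        memory.append(item)

maybe_store(np.random.randn(8), was_aware=True,
            had_visual_context=False, had_cos=True)   # stored
maybe_store(np.random.randn(8), was_aware=True,
            had_visual_context=False, had_cos=False)  # pure awareness: lost
```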
Another, fuzzier reason, which is harder to quantify, is that awareness + COS + visual context leads to better "understanding".
But what does it mean to "understand" something?
To answer this, we have to probe how a child learns while growing up. You could quantify understanding via two metrics: language, and spatial cause and effect.
When we talk about language, we mean two things. First, it's a label to identify a real-world object or an action. You could regurgitate the word "apple" or the word "jumping" to a child as many times as you like, and the child would still not understand without being shown an apple or the act of jumping. Second, language is known for being abstract and vague. If you ask someone to describe the word "nonsense", you'd get something along the lines of "something that does not make any sense"; then you try to piece apart the word "sense", and funny enough, it is quite close to the word "understand". I'd like to call this process analogy reduction. Generalization can be viewed as formulating a language L1 which analogy reduction can reduce to multiple other languages L2, L3 and so on.
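Analogy reduction can be sketched as recursive definition lookup over a toy lexicon (the lexicon contents are obviously illustrative):

```python
# Toy lexicon: each word reduces to simpler words, bottoming out at
# words grounded directly in perception rather than in definitions.
LEXICON = {
    "nonsense": ["without", "sense"],
    "sense": ["understand"],
    "understand": [],  # grounded: learned by showing, not by defining
    "without": [],
}

def analogy_reduce(word: str, depth: int = 0) -> None:
    """Recursively unfold a word into the simpler words that define it."""
    print("  " * depth + word)
    for part in LEXICON.get(word, []):
        analogy_reduce(part, depth + 1)

analogy_reduce("nonsense")
```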
Spatial cause and effect are relationships you learn by interacting with the environment. These relationships are mostly approximate, meaning you do not need to know or figure out the underlying physics behind the cause and effect. Example: if you push a ball off a cliff, the ball will roll down. You will not have any idea of all the forces acting upon it, nor the math behind them. You gain a cause-effect relationship with the associated states (object, environment): initial, intermediate and result states, plus the action performed. "Spatial intelligence", as we like to call it.
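That approximate relationship stores naturally as transition tuples, with no physics anywhere (the field names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    """One learned cause-effect unit: no forces, no equations of motion,
    just the states observed around an action."""
    initial_state: str
    action: str
    intermediate_states: list[str] = field(default_factory=list)
    result_state: str = ""

ball_off_cliff = Transition(
    initial_state="ball at cliff edge",
    action="push ball",
    intermediate_states=["ball falling", "ball bouncing"],
    result_state="ball resting at the bottom",
)
```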
This triad (awareness + COS + visual context), i.e. consciousness, helps distinguish the information that needs to be labelled by language and visual context, which makes understanding much easier. I wouldn't call it the only way, but it should be more streamlined and provide far better generalization and continual learning than existing pretraining architectures.
But wait, where do emotions fit into all of this?
The answer is that they do not. Emotions such as happiness, pain, love, sadness, etc. have nothing to do with consciousness. Interestingly, they're quite orthogonal. Emotions are simply rewards for Reinforcement Learning (RL). If you feel pleasure while doing something, it encourages you to keep doing that thing; similarly, if you feel pain while, say, placing your hand in fire, you'll be discouraged from repeating the activity. In normal cases, the outer neural net processes emotional signals from the inner neural net in the same way as sensory information. But during extreme circumstances, like a "reflex" (that quick motion which bypasses the brain), the inner neural net takes action to preserve the body, bypassing the outer neural net, once a "bad reward" like pain crosses a dangerous, hard-encoded threshold.
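In RL terms, a toy sketch (the threshold value and the routing rule are illustrative assumptions):

```python
PAIN_REFLEX_THRESHOLD = -0.9  # hard-encoded: below this, bypass the outer net

def handle_emotion(reward: float) -> str:
    """Normal case: the outer net processes the emotional signal like any
    other sensory input. Extreme case: the inner net acts alone (a reflex)."""
    if reward < PAIN_REFLEX_THRESHOLD:
        return "reflex: inner net withdraws the hand, outer net bypassed"
    return f"outer net processes reward {reward:+.2f} like sensory input"

print(handle_emotion(+0.6))   # pleasure: reinforces the behavior
print(handle_emotion(-0.95))  # hand in fire: reflex fires
```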
Machines are not creative like humans, but can they be?
I disagree with anyone who claims that humans have some intrinsic creativity that cannot be replicated. Creativity arises from two chained processes: "understanding" and deeper orders of retrospection. We already talked about "understanding". What I mean by the second process is that creativity can be simulated by introspecting, at a later timestamp T2, all the information distinction, visual context and COS you recently had at timestamp T1, in a recursive manner. This process is quite useful in technical fields like math, physics, etc. Other times, like when you get a random inspiration for an artwork, it is merely the work of visual context formed by retrieving relevant distinguished information from the past. This interesting phenomenon of orders of retrospection can in fact be generalized into "simulation of higher orders of perception". What current LLMs do is simulate second-order perception from a first-order architecture. What makes humans different is that, having a second-order architecture, they simulate third-order perception. Think of it as evaluating the processes you yourself ran at a previous timestamp. This can go to extremes, as when you introspect the thoughts (in which you were simulating your friend's thoughts) that you had while conversing with your friend. Creativity, generalization and new insights arise from this higher order of perception (sketched in code after the questions below). This compels us to ask:
Why did we end up with just second order perceptions?
Why not even higher order of perceptions?
How does intelligence scale with increasing order of perception? Is there a bound?
I have no answers to these questions as of yet.
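Even without answers, the retrospection mechanism itself is easy to sketch: log each timestep's triad, then let a later timestep take the log itself as input, recursively (the trace structure is my own illustrative choice):

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """What awareness, visual context and COS did at one timestamp."""
    timestamp: int
    content: str

def retrospect(traces: list[Trace], order: int) -> Trace:
    """Order-n perception: a new trace whose content is about the traces
    below it. Each recursive call climbs one perceptual order."""
    if order == 0:
        return traces[-1]
    lower = retrospect(traces, order - 1)
    return Trace(lower.timestamp + 1, f"reflecting on: ({lower.content})")

log = [Trace(1, "saw bird, COS 'bird'")]
print(retrospect(log, order=2).content)  # third-order: thinking about thinking
```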
The most important part: how do you quantify agency, or self-directed goals?
Initially, in babies, agency looks like random sampling by the outer neural net, which is then passed to the inner neural net for motor action. As the baby grows up and learns, goals arise instead from "simulation of higher-order perception", which again gets passed to the inner neural net for action. This does not rule out random sampling in adults, as we do get moments of what we call "intrusive thoughts". After careful introspection, we usually deny such thoughts; sometimes we act upon them.
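A toy version of that pipeline (the sampling pool and the introspection filter are purely illustrative):

```python
import random

CANDIDATE_GOALS = ["reach for toy", "babble", "grab spoon", "shout"]

def introspect_approves(goal: str) -> bool:
    """Stand-in for simulated higher-order evaluation of a candidate goal."""
    return goal != "shout"

def propose_goal(is_infant: bool) -> str:
    """Infants: near-random sampling passed straight to the inner net.
    Adults: sampled proposals filtered by higher-order introspection;
    samples that surface before filtering are the 'intrusive thoughts'."""
    goal = random.choice(CANDIDATE_GOALS)
    if is_infant or introspect_approves(goal):
        return goal
    return "suppressed (intrusive thought denied)"

print(propose_goal(is_infant=True))
print(propose_goal(is_infant=False))
```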
Therefore, formalizing all of this in a computational model may allow us to do continual learning from the start. If not, it could rule out the postulate claiming consciousness is a computable classical phenomenon.