How We Learn:

Why Brains Learn Better Than Any Machine . . . for Now - - - by Stanislas Dehaene

(Viking (2020) ISBN 978-0525559887) Great Interview
Stanilas Dehaene (de-HAHN) wrote my favorite book on Consciousness and the Brain. He can explain complex ideas with elegance. His background and breadth of knowledge enable him to weave a rich fabric of understanding.

The coolest idea for me: "neuronal recycling" - we reuse brain regions that have the right basic mechanisms for new skills, for example, we take over face recognition regions to use for letter recognition.
Perhaps you might like to know the optimal frequency for reviewing material depending of how long you want to remember it.

This book is about learning and teaching. One of the windows into the mechanics is neuroscience, but, it is written for very curious audience - not specialists. The book covers what we are born with (nature), what nurture can do for us. There are various comparisons to AI, which are interesting, but, certainly not essential. It also covers various pedagogical theories, past and present and why many of them do not work. This all sets up for the meat of the book.

Here is the authors gist:
During evolution, four major functions appeared that maximized the speed with which we extracted information from our environment. I call them the four pillars of learning, because each of them plays an essential role in the stability of our mental constructions: if even one of these pillars is missing or weak, the whole structure quakes and quivers. Conversely, whenever we need to learn, and learn fast, we can rely on them to optimize our efforts. These pillars are: Attention, which amplifies the information we focus on. Active engagement, an algorithm also called “curiosity,” which encourages our brain to ceaselessly test new hypotheses. Error feedback, which compares our predictions with reality and corrects our models of the world. Consolidation, which renders what we have learned fully automated and involves sleep as a key component.

This page is mostly the quotes, in italics that I want to be able to refer to. This is in NO WAY a SUMMARY!. If you are reading this, I can gaurantee you will like the book. If you buy a hard cover and don't like it, I'll buy your copy. I wish I could go back in time and read this in hard cover. Writing in the margins is truly the best way to capture one's ideas while readng a book. As it is, I got the audio & e-book. Listening while driving is a great use of time, but, the insight must be very powerful to take the time to pull over and make a note of it.

The book has dozens and dozens of fun tidbits like this:
Mirth seems to be one of those uniquely human emotions that guide learning. Our brain triggers a mirth reaction when we suddenly discover that one of our implicit assumptions is wrong, forcing us to drastically revise our mental model. According to the philosopher Dan Dennett, hilarity is a contagious social response that spreads as we draw each other’s attention to an unexpected piece of information. And, indeed, all things being equal, laughing during learning seems to increase curiosity and enhance subsequent memory.

And insights the blend nueroscince to child development:
Cognitive psychology is full of examples where children gradually correct their mistakes as they increasingly manage to concentrate and inhibit inappropriate strategies... for example, you hide a toy a few times at location A, and then switch to hiding it at location B, babies below one year of age continue to look for it at location A (even if they saw perfectly well what happened)... Examination of the babies’ eyes shows that they know where the hidden object is. But they have trouble resolving mental conflicts: the routine response that they learned on previous trials tells them to go to location A, while their more recent working memory tells them that, on the present trial, they should inhibit this habitual response and go to location B. Before ten months of age, the habit prevails. At this age, what is lacking is executive control, not object knowledge. Indeed, the A-not-B error disappears around twelve months of age, in direct relation to the development of the prefrontal cortex.

Attention (pillar of learning #1 of 4)

No summaries here. The book is SO RICH!!! Just this little intro:
Attention plays such a fundamental role in the selection of relevant information that it is present in many different circuits in the brain. American psychologist Michael Posner distinguishes at least three major attention systems:
- Alerting, which indicates when to attend, and adapts our level of vigilance.
- Orienting, which signals what to attend to, and amplifies any object of interest.
- Executive attention, which decides how to process the attended information, selects the processes that are relevant to a given task, and controls their execution.

To direct attention is to choose, filter, and select: this is why cognitive scientists speak of selective attention. This form of attention amplifies the signal which is selected, but it also dramatically reduces those that are deemed irrelevant. The technical term for this mechanism is “biased competition”: at any given moment, many sensory inputs compete for our brain’s resources, and attention biases this competition by strengthening the representation of the selected item while squashing the others. This is where the spotlight metaphor reaches its limits: to better light up a region of the cortex, the attentional spotlight of our brain also reduces the illumination of other regions. The mechanism relies on interfering waves of electrical activity: to suppress a brain area, the brain swamps it with slow waves in the alpha frequency band (between eight and twelve hertz), which inhibit a circuit by preventing it from developing coherent neural activity.

Active Engagement

I suggest always keeping the concept of active engagement in mind. Maximally engaging a child’s intelligence means constantly feeding them with questions and remarks that stimulate their imagination and make them want to go deeper... The ideal scenario is to offer the guidance of a structured pedagogy while encouraging children’s creativity by letting them know that there are still a thousand things to discover. I remember a teacher who, just before summer vacation, told me, “You know, I just read a little math problem I couldn’t solve. . . .” And this is how I found myself ruminating on this question all summer, trying to do better than the teacher could. . . . Mustering children’s active engagement goes hand in hand with another necessity: tolerating their errors while quickly correcting them. This is our third pillar of learning.

Error Feedback

See below for Dehaene's synopsys of the Bayesian Theory of the Brain - prediction and error signals. Very, super important.

Classical conditioning and the Bayesian theory are almost the same?
The Rescorla-Wagner theory nicely explains the details of a learning paradigm called “classical conditioning.” Everyone has heard of Pavlov’s dog. In Pavlovian conditioning experiments, a dog hears a bell, which is an initially neutral and inefficient stimulus. After repeated pairing with food, however, the same bell ends up triggering a conditioned reflex. The dog salivates whenever he hears the bell, because he has learned that this sound systematically precedes the arrival of food. How does the theory explain these findings?

The Rescorla-Wagner rule assumes that the brain uses sensory inputs (the sensations generated by [Pavlov's] bell) to predict the probability of a subsequent stimulus (food). It works like this: The brain generates a prediction by computing a weighted sum of its sensory inputs. It then calculates the difference between this prediction and the actual stimulus it receives: this is the prediction error, a fundamental concept of the theory, which measures the degree of surprise associated with each stimulus. The brain then uses this surprise signal to correct its internal representation: the internal model changes in direct proportion to both the strength of the stimulus and the value of the prediction error. The rule is such that it guarantees that the next prediction will be closer to reality.

Consolidation (SLEEP!!!)

And this is where Wilson and McNaughton’s experiments fit in. They discovered that when the rat falls asleep, the place cells in its hippocampus start firing again, in the same order. The neurons literally replay the trajectories of the preceding wake period. The only difference is speed: during sleep, neuronal discharges can be accelerated by a factor of twenty. In their sleep, rats dream of a high- speed race through their environment!

Discoveries during sleep... the most famous case is the German chemist August Kekule von Stradonitz (1829–96), who first dreamed up the structure of benzene—an unusual molecule, because its six carbon atoms form a closed loop, like a ring or . . . a snake that bites its tail.

Can sleep really increase our creativity and lead us to truth? ... Jan Born and his team experimented. 24 During the day, these researchers taught volunteers a complex algorithm, which required applying a series of calculations to a given number. However, unbeknownst to the participants, the problem contained a hidden shortcut, a trick that cut the calculation time by a large amount. Before going to sleep, very few subjects had figured it out. However, a good night’s sleep doubled the number of participants who discovered the shortcut, while those who were prevented from sleeping never experienced such a eureka moment. Moreover, the results were the same regardless of the time of day at which participants were tested. Thus, elapsed time was not the determining factor: only sleep led to genuine insight.

Predicitve Coding / Error Signals

This is a sort of a side story to the book, but Predictive Coding with Prediction Error Signals fascinates me to NO end!!! The footnote: Bayesian model of information processing in the cortex: Friston, 2005. For empirical data on hierarchical passing of probabilistic error messages in the cortex, see, for instance, Chao, Takaura, Wang, Fujii, and Dehaene, 2018; Wacongne et al., 2011.

Dehaene's description of predictive coding, aka, active inference:
According to [Bayesian] theory, each region of the brain formulates one or more hypotheses and sends the corresponding predictions to other regions. In this way, each brain module constrains the assumptions of the next one, by exchanging messages that convey probabilistic predictions about the outside world. These signals are called “top-down” because they start in high-level cerebral areas, such as the frontal cortex, and make their way down to the lower sensory areas, such as the primary visual cortex. The theory proposes that these signals express the realm of hypotheses that our brain considers plausible and wishes to test.

In sensory areas, these top-down assumptions come into contact with “bottom-up” messages from the outside world, for instance, from the retina. At this moment, the model is confronted with reality. The theory says that the brain should calculate an error signal: the difference between what the model predicted and what has been observed. The Bayesian algorithm then indicates how to use this error signal to modify the internal model of the world. If there is no mistake, it means that the model was right. Otherwise, the error signal moves up the chain of brain areas and adjusts the model parameters along the way. Relatively quickly, the algorithm converges toward a mental model that fits the outside world.

According to this vision of the brain, our adult judgments combine two levels of insights: the innate knowledge of our species (what Bayesians call priors, the sets of plausible hypotheses inherited throughout evolution) and our personal experience (the posterior: the revision of those hypotheses, based on all the inferences we have been able to gather throughout our life). This division of labor puts the classic “nature versus nurture” debate to rest: our brain organization provides us with both a powerful start-up kit and an equally powerful learning machine. All knowledge must be based on these two components: first, a set of a priori assumptions, prior to any interaction with the environment, and second, the capacity to sort them out according to their a posteriori plausibility, once we have encountered some real data.

[jch: note that for predictive processing and generating error signals, both of what Dehaene calls Bayesian priors and posteriors become priors for calculating the probabilities (posteriors) of how the new percept fits into the current mental model]

The Bayesian Brain Hypothesis Audio available
Karl Friston intro to the Bayesian Brain

LANGUAGE HIGHWAYS

Humans are hardwired for language - from birth:
Activity flows through all these brain areas in a specific order because they are connected to one another. In adults, we are beginning to understand which neuronal pathways interconnect the language regions. In particular, neurologists have discovered that a large cable made of millions of nerve fibers, called the "arcuate fasciculus,” connects the temporal and parietal language areas at the back of the brain with frontal areas, notably the famous Broca’s area. This bundle of connections is a marker of the evolution of language. It is much larger in the left hemisphere, which, in 96 percent of right-handers, is devoted to language. Its asymmetry is specific to humans and is not observed in other primates, not even our closest cousins, the chimpanzees. Once again, this anatomical characteristic is not the outcome of learning: it is there from the start. In fact, when we examine the connections of a newborn’s brain, we discover that not only the arcuate fasciculus but all the major fiber bundles that connect cortical and subcortical brain areas are in place at birth (see figure 8).

But let’s go back to our experiment where babies listened to sentences in the MRI scanner. After entering the primary auditory area, the activity spread rapidly. A fraction of a second later, other areas lit up, in a fixed order: first, the secondary auditory regions, adjacent to the primary sensory cortex; then a whole set of temporal lobe regions, forming a gradual stream; and finally Broca’s area, at the base of the left frontal lobe, simultaneously with the tip of the temporal lobe. This sophisticated information processing chain, lateralized to the left hemisphere, is remarkably similar to that of an adult. At two months, babies already activate the same hierarchy of phonological, lexical, syntactic, and semantic brain areas as adults. And, just as in adults, the more the signal climbs up in the hierarchy of the cortex, the slower the brain responses are and the more these areas integrate information on an increasingly high level (figure 6(nothere)). including the strength of their brains’ alpha waves, which is a marker of attention and vigilance. Social skills and vocabulary were also markedly improved.

Learning to Read

Long before children learn how to read, they obviously possess a sophisticated visual system that allows them to recognize and name objects, animals, and people. They can recognize any image regardless of its size, position, or orientation in 3-D space, and they know how to associate a name to it. Reading recycles part of this preexisting picture naming circuit. The acquisition of literacy involves the emergence of a region of the visual cortex that my colleague Laurent Cohen and I have dubbed the “visual word form area.” This region concentrates our learned knowledge of letter strings, to such an extent that it can be considered as our brain’s “letter box.” It is this brain area, for instance, that allows us to recognize a word regardless of its size, position, font, or cAsE, whether UPPERCASE or lowercase. In any literate person, this region, which is located in the same spot in all of us (give or take a few millimeters), serves a dual role: it first identifies a string of learned characters, and then, through its direct connections to language areas, it allows those characters to be quickly translated into sound and meaning.

...theory predicts that reading should invade an area of the cortex normally devoted to a similar function and repurpose it to this novel task. In the case of reading, we expect a competition with the preexisting function of the visual cortex, which is to recognize all sorts of objects, bodies, faces, plants, and places. Could it be that we lose some of the visual functions that we inherited from our evolution as we learn to read? Or, at the very least, are these functions massively reorganized? This counterintuitive prediction is precisely what my colleagues and I tested in a series of experiments. To draw a complete map of the brain regions that are changed by literacy, we scanned illiterate adults in Portugal and Brazil, and we compared them to people from the same villages who had had the good fortune of learning to read in school, either as children or adults. Unsurprisingly perhaps, the results revealed that, with reading acquisition, an extensive map of areas had become responsive to written words. Flash a sentence, word by word, to an illiterate individual, and you will find that their brain does not respond much: activity spreads to early visual areas, but it stops there, because the letters cannot be recognized. Present the same sequence of written words to an adult who has learned to read, and a much more extended cortical circuit now lights up, in direct proportion to the person’s reading score. The areas activated include the letter box area, in the left occipitotemporal cortex, as well as all the classical language regions associated with language comprehension. Even the earliest visual areas increase their response: with reading acquisition, they seem to become attuned to the recognition of small print. 42 The more fluent a person is, the more these regions are activated by written words, and the more they strengthen their links: as reading becomes increasingly automatic, the translation of letters into sounds speeds up.

But we can also ask the opposite question: Are there regions that are more active among bad readers and whose activity decreases as one learns to read? The answer is positive: in illiterates, the brain’s responses to faces are more intense. The better we read, the more this activity decreases in the left hemisphere, at the exact place in the cortex where written words find their niche—the brain’s letter box area. It’s as if the brain needs to make room for letters in the cortex, so the acquisition of reading interferes with the prior letter box area. It’s as if the brain needs to make room for letters in the cortex, so the acquisition of reading interferes with the prior function of this region, which is the recognition of faces and objects. But, of course, since we do not forget how to recognize faces when we learn to read, this function is not just chased out of the cortex. Rather, we have also observed that, with literacy, the response to faces increases in the right hemisphere. Driven out of the left hemisphere, which is the seat of language and reading for most of us, faces take refuge on the other side.

We first made this observation in literate and illiterate adults, but we quickly replicated our results in children who were learning to read. 44 As soon as a child begins to read, the visual word form area begins to respond in the left hemisphere. Meanwhile its symmetrical counterpart, in the right hemisphere, strengthens its response to faces. The effect is so powerful that, for a given age, just by examining the brain activity evoked by faces, a computer algorithm can correctly decide whether a child has or has not yet learned to read. And when a child suffers from dyslexia, these regions do not develop normally—neither on the left, where the visual word form area fails to emerge, nor on the right, where the fusiform cortex fails to develop a strong response to faces. 45 Reduced activity of the left occipitotemporal cortex to written words is a universal marker of reading difficulties in all countries where it has been tested.

Recently, we got permission to conduct a bold experiment. We wanted to see the reading circuits emerge in individual children—and to this aim, we brought the same children back to our brain-imaging center every two months, from the end of kindergarten through the end of first grade. The results lived up to our expectations. The first time we scanned these children, there was not much to be seen: as long as the children had not yet learned how to read, their cortex responded selectively to objects, faces, and houses, but not to letters. After two months of schooling, however, a specific response to written words appeared, at the same exact location as in adults: the left occipitotemporal cortex. Very slowly, the representation of faces changed: as the children became more and more literate, face responses increased in the right hemisphere, in direct proportion to reading scores. Once again, in agreement with the neuronal recycling hypothesis, we could see reading acquisition compete with the prior function of the left occipitotemporal cortex, the visual recognition of faces. We realized while doing this work that this competition could be explained in two different ways. The first possibility is what we called the “knockout model”: from birth on, faces settle in the visual cortex of the left hemisphere, and learning to read later knocks them straight out into the right hemisphere. The second possibility is what we termed the “blocking model”: the cortex develops slowly, gradually growing specialized patches for faces, places, and objects; and when letters enter this developing landscape, they take over part of the available territory and prevent the expansion of other visual categories. So, does literacy lead to a knockout or a blockade of the cortex? Our experiments suggest the latter: learning to read blocks the growth of face-recognition areas in the left hemisphere. We witnessed this blockade thanks to the MRI scans that we acquired every two months from the children who were learning to read. 47 At this age, around six or seven, cortical specialization is still far from complete. A few patches are already dedicated to faces, objects, and places, but there are also many cortical sites that have not yet specialized for any given category. And we could visualize their progressive specialization: when children entered first grade and quickly began to read, letters invaded one of those poorly specified regions and recycled it. Contrary to what I initially thought, letters do not completely overrun a preexisting face patch; they move in right next door, in a free sector of cortex, a bit like an aggressive supermarket setting up shop right next to a small grocery store. The expansion of one stops the other —and because letters settle down in the left hemisphere, which is dominant for language, faces have no choice but to move to the right side.

Neuronal Recycling - reusing special brain regions for new purposes.

This is the newest idea for me: that we reuse brain regions that have the right basic mechanisms for new skills.

Let us first take the example of mathematics. As I explained in my book The Number Sense, 17 there is now considerable evidence to show that math education (like so many other aspects of learning) does not get imprinted onto the brain like a stamp on melted wax. On the contrary, mathematics molds itself into a preexisting, innate representation of numerical quantities, which it then extends and refines. In both humans and monkeys, the parietal and prefrontal lobes contain a neural circuit that represents numbers in an approximate manner. Before any formal education, this circuit already includes neurons sensitive to the approximate number of objects in a concrete set. 18 What does learning do? In animals trained to compare quantities, the amount of number-detecting neurons grows in the frontal lobe. 19 Most important, when they learn to rely on the numerical symbols of Arabic digits, rather than on the mere perception of approximate sets, a fraction of these neurons become selective to such digits. 20 This (partial) transformation of a circuit in order to incorporate the cultural invention of numerical symbols is a great example of neuronal recycling. In humans, when we learn to perform basic arithmetic (addition and subtraction), we continue to recycle that region, but also the nearby circuitry of the posterior parietal lobe. That region is used to shift our gaze and our attention—and it seems that we reuse those skills to move in number space: adding activates the same circuits that move your attention to the right, in the direction of larger numbers, while subtracting excites circuits that shift your attention to the left. 21 We all possess a kind of number line in our heads, a mental map of the number axis on which we have learned to accurately move when we perform calculations.

So cool:
Grid cells are neurons located in a specific region of the rat brain called the “entorhinal cortex.” Edvard and May-Britt Moser earned the Nobel Prize in 2014 for discovering their remarkable geometrical properties. They were the first to record from neurons in the entorhinal cortex while the animal moved around in a very large room. 13 We already knew that in a nearby region called the “hippocampus,” neurons behaved as “place cells”: they fired only if the animal was in a specific location in the room. The Moser’s groundbreaking discovery was that grid cells did not respond to just a single place, but to a whole set of positions. Furthermore, those privileged locations which made a given cell fire were regularly arrayed: they formed a network of equilateral triangles that grouped together to form hexagons, a bit like the spots on the skin of a giraffe or the basalt columns in volcanic rocks! Whenever the animal walks around, even in darkness, the firing of each grid cell tells the rat where it is in relation to a network of triangles that span the entire space. The Nobel committee rightly called this system the “GPS of the brain”: it provides a highly reliable neuronal coordinate system that maps external space.

Big theme in the book - we are not blank slates, but, rather born with many, many specific brain regions with specific strengths:
The conclusion that emerges from this line of research emphasizes the joint power of genes and self-organization in the development of the human brain. At birth, the baby’s cortex is folded almost like an adult’s. It is already subdivided into specialized sensory and cognitive areas, which are interconnected by precise and reproducible bundles of fibers. It hosts a collection of partially specialized modules, each of which projects a particular type of representation onto the outside world. The grid cells of the entorhinal cortex draw two-dimensional planes, perfect for coding and navigating space. As we will see later, other regions, such as the parietal cortex, draw lines, excellent for coding linear quantities including number, size, and the passing of time; and Broca’s area projects tree structures, ideal for coding the syntax of languages. From our evolution, we inherit a set of fundamental rules from which we will later select those that best represent the situations and concepts that we will have to learn in our lifetime.

wikipedia: Large-scale_brain_networks

Great summary of Memory!

At least four types of memory:
- Working memory retains a mental representation in active form for a few seconds. It mainly relies on the vigorous firing of many neurons in the parietal and prefrontal cortices, which in turn support neurons in other, more peripheral regions. 9 Working memory is typically what allows us to keep a phone number in mind: during the time it takes us to type it into our smartphone, certain neurons support one another and thus keep the information in an active state. This type of memory is primarily based on the maintenance of a sustained pattern of activity—although it was recently discovered that it probably also involves short-lived synaptic changes, 10 allowing the neurons to go briefly dormant and quickly return to their active state. At any rate, working memory never lasts more than a few seconds: as soon as we get distracted by something else, the assembly of active neurons fades away. It is the brain’s short-term buffer, keeping in mind only the hottest, most recent information.
- Episodic memory: The hippocampus, a structure located in the depths of the cerebral hemispheres below the cortex, records the unfolding episodes of our daily lives. Neurons in the hippocampus seem to memorize the context of each event: they encode where, when, how, and with whom things happened. They store each episode through synaptic changes, so we can remember it later. The famous patient H.M., whose hippocampi in both hemispheres had been obliterated by surgery, could no longer remember anything: he lived in an eternal present, unable to add the slightest new memory to his mental biography. Recent data suggest that the hippocampus is involved in all kinds of rapid learning. As long as the learned information is unique, whether it is a specific event or a new discovery worthy of interest, the neurons in the hippocampus assign it a specific firing sequence.
- Semantic memory: Memories do not seem to stay in the hippocampus forever. At night, the brain plays them back and moves them to a new location within the cortex. There, they are transformed into permanent knowledge: our brain extracts the information present in the experiences we lived through, generalizes it, and integrates it into our vast library of knowledge of the world. After a few days, we can still remember the name of the president, without having the slightest memory of where or when we first heard it: from episodic, the memory has now become semantic. What was initially just a single episode was transformed into long-lasting knowledge, and its neural code moved from the hippocampus to the relevant cortical circuits.
- Procedural memory: When we repeat the same activity over and over again (tying our shoes, reciting a poem, calculating, juggling, playing the violin, cycling . . .), neurons in the cortex and other subcortical circuits eventually modify themselves so that information flows better in the future. Neuronal firing becomes more efficient and reproducible, pruned of any parasitic activity, unfolding unerringly and as precisely as clockwork. This is procedural memory: the compact, unconscious recording of patterns of routine activity. Here, the hippocampus does not intervene: through practice, the memory gets stored in an implicit storage space, primarily involving a subcortical set of neural circuits called the “basal ganglia.” This is why the patient H.M., even without any conscious, episodic, hippocampus-related memory, could still learn new procedures. The researchers even taught him to write backwards while looking at his hand in a mirror. Having no memory of the numerous times he had practiced this before, he was flabbergasted to find out how good he was at what he believed to be a completely new trick!

4 Page Summary

There is a 4 page summary at the end that wraps everything up into the following principles, each with a short summary:

Do not underestimate children.
Take advantage of the brain’s sensitive periods.
Enrich the environment.
Rescind the idea that all children are different.
Pay attention to attention.
Keep children active, curious, engaged, and autonomous. Make every school day enjoyable.
Encourage efforts.
Help students deepen their thinking.
Set clear learning objectives.
Accept and correct mistakes.
Practice regularly.
Let students sleep.

This book is fantastic!!!

jch.com/jch/notes/DehaeneHowWeLearn.html 2020-02-16 YON Book Notes