Causality as Child’s Play
The study of causality confronts us with a huge dilemma. Intense controversy has raged for centuries over this topic among philosophers. At the same time, studies of child development show that infants learn about causal concepts almost from birth, and toddlers have a sophisticated approach to causality. How can causality be easily understood by babies, but remain confusing and complicated to the best philosophers for centuries? The difficulty is compounded by the fact that philosophical approaches serve as a basis for empirical data analysis in statistics and econometrics. Even though correct estimation of causal effects is essential for policy, widely used econometric textbooks are deeply defective in their approaches to causality. Angrist and Pischke (2017) examine leading popular econometrics textbooks and conclude that these are based on an outmoded paradigm which ignores causality. They call for a pedagogical paradigm shift. Chen and Pearl (2013) also examine six leading econometrics textbooks and come to the same conclusion: these textbooks fail to explain central causal concepts with any degree of clarity. Even though Angrist and Pischke agree with Chen and Pearl on the diagnosis, the two sets of authors offer radically different remedies. Since the 1990s Pearl and his group have been arguing for an approach based on Directed Acyclic Graphs (DAGs) as central to understanding causality. Angrist and Pischke (2008, 2014) have written two econometrics textbooks which exposit causality using a “Potential Outcomes” approach, and make no mention of DAGs. Thus, while everyone agrees that causality is very poorly handled in econometrics, there is no agreement about the solution to this problem. This has serious implications since philosophical controversies about causality ramify to the policy context involving real data and applications.
“Inversion” is a favorite philosophical device, where all conventional thinking on a subject is replaced by its diametrical opposite. It is natural to think that adults are wiser than children, because they have the advantage of years of experience and learning. In this article, we propose to look at what children can teach the philosophers and, by extension, statisticians, econometricians, and policy makers. We will examine insights from the child development literature about how children learn about causality, and see how they can be used to clarify philosophical controversies about the topic. This examination gives new meaning to the Biblical “You cannot enter the Kingdom of God unless you become as little children”.
The idea that children start out as “tabula rasa” is firmly rejected by child development studies. Babies come into the world equipped with a vast amount of knowledge about the nature of the world into which they have arrived. Their survival depends on their abilities to “root” and “suckle”: to search for the mother’s nipple, and to latch on and suck milk. Hespos and vanMarle’s (2012) review of what infants know and learn about physics concludes that “The evidence supports the view that certain core principles … are present as early as we can test for them, and the nature of the underlying representation is best characterized as primitive initial concepts that are elaborated and refined through learning and experience.” In particular, babies know that there are objects in the world, and that these objects persist through time. They can differentiate between solids, which retain their shape, and liquids, which do not. Vision is specially equipped to detect straight lines, which often mark object boundaries. Children are able to track trajectories of objects far more rapidly and accurately than experiential learning based on zero knowledge would allow for.
Babies are designed as amazingly efficient learning machines. They have procedural knowledge — what needs to be done to acquire knowledge of the world — which is adapted to the nature of the world they are born into. That is, the procedures they use to acquire knowledge are efficient because of the way that the external world is structured. They also have generalized learning capabilities — this means that even if the world is very different from their inbuilt expectations, they can learn to adapt to radically different environments.
As we will discuss shortly, babies at birth have already learnt to recognize their mom’s voice, a task which has only recently come within reach of advanced voice recognition computer programs. It seems fair to argue that the knowledge that went into writing such programs, and something equivalent in computing power to the algorithms they use, is hard-wired into infantile brains. Similarly, babies learn to recognize faces very early. This is another advanced skill, which depends on knowledge of what facial features look like; these are required as cues for learning how to differentiate faces. Such knowledge is built into the facial recognition programs currently in existence, which use AI techniques to derive knowledge from looking at hundreds of thousands of faces. Again, it seems hard to resist the conclusion that some specific kinds of knowledge about the actual world we live in are hard-wired into the infant’s brain. For instance, facial expressions for varying emotional states are universal among human beings. As a result, it is possible to hard-wire recognition that a smile represents a happy state. Languages vary greatly but there are general rules which all human languages follow. Babies are born with knowledge of these rules, and learn to parse language into syntactic units, and to differentiate between their native language and other languages, with amazing rapidity. The amount of knowledge that goes into programs which can accomplish this is very high. Matching the knowledge built into programs which can do what babies do leads to the conclusion that babies come into the world with an extraordinarily large amount of knowledge.
Babies learn about causality as soon as they learn to suckle milk. They learn that sucking leads to a flow of milk, and stopping leads to cessation of the flow. The fact that babies can choose to suckle or not is crucial to learning the causal effect. When they can control the flow, they know that they cause it. If I control an outcome, it occurs when I desire it to occur and does not occur when I do not. This control allows me to experience my own causal effectiveness as an agent, and is very different from the Humean constant conjunction. A standard objection to this account is the idea that “suckling” is an “instinct” and hence should not count as knowledge. We discuss an experiment which shows that this is not true.
DeCasper and Fifer (1980) conducted experiments on infants as young as 12 hours old to learn whether or not they could recognize their mothers’ voices. The researchers put a nipple in the baby’s mouth which was connected to a voice recording. As long as the baby kept sucking, the recording kept playing, but it stopped if the baby stopped for two seconds. After a little while, the babies learned that they could control the play of the sound by sucking. Then, when given a choice between their mother’s voice and some other voice, they showed a distinct preference for the mother’s voice. This suggests that they have learnt the sound in the womb. But for our purposes, the key inference is that the babies learn to cause lengthier play of the mother’s voice by sucking longer. This is clearly a use of knowledge to control an event of a type never before encountered, and hence not an instinctive behavior.
One of the reasons that philosophers have difficulty with causality is because they have been persuaded that free will does not exist. Without free will, the baby’s choice of whether or not to suckle was pre-determined billions of years ago by the initial conditions at the birth of the universe. Even though this idea is so preposterous that it is not worth taking seriously, it seems to be widely believed by philosophers, so we provide one more argument for free will. We have direct personal experience of our freedom to make choices, almost from birth. Our lives are built around the choices we make. If this experience is an illusion, then everything we experience is an illusion, and we live programmed lives within a matrix constructed by forces outside our control. We cannot logically rule out this latter possibility. However, there are two types of errors we can make: believing ourselves to be free when in fact we live in a matrix, and the opposite error. The first error is NOT an error because if we believe ourselves to be free within a matrix, then this is also part of what the matrix dictates to us, and we are forced to believe it — we have no choice in the matter. In the second case, if we believe that our actions are pre-determined, when in fact we are free, there is a huge loss to us. We would fail to explore possibilities under the illusion that they do not exist, and that we have no choice. So, in both possible cases, it is best for us to choose to believe in free will.
Infants approach the problems of mastering their own capabilities, as well as learning to manipulate the external world, in multiple dimensions. In this section, we present a sketch of how they may learn causality using only “interventions”. This explains how interventions lead to a non-Humean account of causality, and how this is directly tied to counterfactuals. In fact, children have other resources, to be discussed later, which allow them to learn about causality without direct interventions.
Infants are self-aware. They can learn that their crying brings a response: an adult usually appears on the horizon after some uncertain time lag. What is important about this is that the cause is internal to the infant; she is aware that it is “my crying” which causes the appearance of an adult. In the pre-causal stage, the infant feels some discomfort and responds by crying. She observes a response: an adult caretaker appears and attends to her needs. Writing PR for the problem (discomfort), CR for crying, and AT for adult attention, the causal path diagram implied by this description is given below:
PR → CR → AT
In terms of observations, the child observes a sequence of events: PR followed by CR followed by AT. The child is assumed to be a passive observer; she observes her pain, her own response in terms of crying, and then the appearance of an adult caretaker, mommy. The critical dilemma which has stumped philosophers for centuries is that, on the basis of observations alone, there is no way to determine causality within this sequence. There is an alternative causal path that also explains this same observed temporal sequence:
CR ← PR → AT (PR is a common cause of CR and AT)
Purely on the basis of observational evidence, there is no way to distinguish between these two possibilities, even though the first diagram contains a causal link between CR and AT, denoted CR => AT, whereas in the second diagram CR and AT are merely correlated, because both occur due to the common cause PR. However, the infant can go beyond observational evidence by intervening on the CR variable. By doing so, she can easily discover the causal relationship. When the child cries without discomfort, as an intervention made by choice, the difference between correlation and causation is easily sorted out.
If the causal relationships are as in the first diagram, crying without discomfort will lead to Adult Attention. In the second diagram, where Adult Attention is caused by discomfort only, and not by crying, the intervention on crying will not get the desired response. Here we have considered crying without discomfort, which is a common occurrence. The key lesson here is that exogenous interventions are necessary for the discovery of causality. Exogenous means that these interventions are not caused by any of the variables under study. This involves “breaking” the natural causal link between PRoblem (discomfort) and CRying, and instead CRying in response to an exogenous impulse: the desire to summon an adult, or mere curiosity to see the effects of crying.
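The contrast between passive observation and intervention can be made concrete with a small simulation. The sketch below is illustrative only: the model names `chain` and `common_cause`, the 30% frequency of discomfort, and the assumption that every link works without noise are our own simplifications, not part of the developmental evidence.

```python
import random

MODELS = ("chain", "common_cause")  # hypothetical labels for the two diagrams


def episodes(model, n=10_000, force_cry=None, seed=0):
    """Simulate n episodes and return a list of (cried, adult_came) pairs."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        pr = rng.random() < 0.3                      # PR: discomfort (assumed 30%)
        cr = pr if force_cry is None else force_cry  # CR: natural, or set by choice
        at = cr if model == "chain" else pr          # AT: follows CR, or follows PR
        pairs.append((cr, at))
    return pairs


def attention_rate(pairs):
    """Fraction of crying episodes in which the adult came."""
    came = [at for cr, at in pairs if cr]
    return sum(came) / len(came) if came else float("nan")


# Passive observation: both models produce exactly the same record of events.
same_observations = episodes("chain") == episodes("common_cause")

# Intervention: the child cries by choice, whether or not there is discomfort.
rate_chain = attention_rate(episodes("chain", force_cry=True))
rate_common = attention_rate(episodes("common_cause", force_cry=True))

print(same_observations)  # True
print(rate_chain)         # 1.0
print(rate_common)        # close to 0.3, the assumed frequency of discomfort
```

Under observation alone the two records are identical, so no amount of watching can separate the diagrams. Once crying is set by choice, the adult always comes under the chain, but comes only when discomfort happens to be present under the common-cause diagram.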
This leads to a subtle problem in terms of the study of causality. Problem-induced crying may differ from exogenous crying. In fact, this is so common that mothers learn to differentiate between cries based on needs, and crying for attention. Developmental studies establish that children are constantly trying to learn about their environment, and testing different types of interventions to assess the causal consequences of these. Children soon become aware that mommy does not pay as much attention to experimental crying as she does to pain-induced crying. In response to this, children experiment with intervention on “pain” itself — they may voluntarily bump their heads or simulate a fall prior to crying, to see what effect this has:
Intervention in pain — creating it exogenously, instead of natural occurrence — can also lead to clarification of the causal structure of the child’s world. In a complex causal structure like the one above, the child might do multiple interventions. A useful and common intervention is just a variant of the first one: cutting the link between PR and CR. Instead of spontaneously crying without any pain, the child can also suppress crying in response to pain. This is common when the child is engaged in play and gets hurt. Instead of attracting attention, which has the potential of removing the child from the play area, the child may choose to suppress crying, avoiding adult attention, so as to continue playing. In the diagram above, suppressing crying would clarify whether the causal link runs from PR to AT or from CR to AT.
The child development literature identifies and studies three mechanisms used by children to learn about causal relations. One of these is “dispositional”: it is based on intentional interventions by the children themselves and later, by extension, by other agents (humans or animals). In addition, there is good evidence that children can understand fairly sophisticated statistical covariations and use them to assess causality. Children also seem to be born with, or acquire very quickly, knowledge of basic physics; they are surprised when ball A rolls towards ball B, stops short of ball B, and then ball B starts rolling. Contrary to Humean ideas, the causal effects of the balls are understood to be based on contact very early in child development. A survey of the literature by Muentener and Bonawitz (2017) argues that the dispositional approach to learning causality is primary, and the other approaches build on this base. For example, Sommerville writes that “Previous work suggests that adults and children readily detect causal structure by intervening on their environment (Gopnik & Schulz, 2004; Gopnik et al., 2004; Kushnir & Gopnik, in press; Lagnado & Sloman, 2004; Sobel & Kushnir, 2003; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003). Interventions are particularly crucial when one must disambiguate multiple causes or identify variables relevant to causal outcomes. Critically, interventions enable learners to test causal hypotheses and compare the outcomes of their interventions to expected outcomes (e.g., Sobel & Kushnir, 2003).”
Taking self-interventions as the primary source for causal learning among infants leads us to natural answers to two problems which have occupied the attention of philosophers for centuries: Counterfactuals and Natural Kinds. We make brief comments on both.
Counterfactuals: Intervention creates counterfactual knowledge: the infant cries in order to get attention because she knows that she will not get attention if she does not cry. Children are able to use both branches of the causal fork to their advantage: they cry when they want attention, and do not cry when they do not want attention. Because both options can be chosen at will, they are aware of the counterfactual “if I start crying, then an adult will come to see what is wrong”, even when they choose not to exercise this option. The children’s counterfactual knowledge is far simpler than the philosophers’ approach. Children’s understanding of counterfactuals has been tested directly. In one experiment, subject children were shown movies of a child walking across the floor with muddy shoes. Subjects were all easily able to understand that the mud on the floor was caused by the shoes. Counterfactual questions, like what would have happened if the child had removed his shoes before walking across the floor, were easily handled by six-year-old subjects. Harris’s (2000) conclusion is that “counterfactual thinking comes readily to very young children and is deployed in their causal analysis of an outcome”.
Natural Kinds: To understand the issue, consider David Hume’s analysis of how we can learn that when ball A strikes ball B on the pool table, this will cause ball B to move. Suppose an observer watches professionals playing games and observes a thousand such interactions. What can he learn from these observations? Each interaction would be unique, in terms of positions of balls, velocity of A, angle of strike, and configuration of table and other balls. Learning requires encoding the information by throwing away irrelevant details and capturing only those aspects relevant to the causal interaction. A causal model which captures some notion of similarity of objects, especially in terms of having similar causal properties, would seem to be essential to creating such an encoding. In general, inference from past causal interactions to future ones would require a notion of natural kinds: objects of the same kind are those which have the same causal properties. There is substantial evidence that infants are born with the knowledge that there are objects in the world, and that these objects persist over time; they do not blink into or out of existence. Similarly, it seems that some knowledge of natural kinds, and of the causal properties of objects, is also built into babies. Thus, when we look at “similar” objects, we expect them to have similar causal properties. This notion helps simplify the search for causal properties, and greatly facilitates learning. Suppose 10 events occur at time 1 and 10 events occur at a later time 2. Then there are 1023 nonempty subsets of the initial ten events, any of which could be the cause of any of the 1023 nonempty subsets of the later events. It would be impossible to test and explore the roughly one million causal hypotheses which would be possible without any similarity relationship. But similarity and localization serve to substantially narrow the space of possible causal connections, making learning possible.
By localization we mean that when two balls interact causally, we assume that only the two balls are involved, and the configuration of the remaining balls does not matter (very much). Without localization, causal interactions could be too complex to allow for learning.
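The hypothesis count above can be checked by brute-force enumeration. The sketch below is only an arithmetic illustration; the final figure, in which each hypothesis links one earlier event to one later event, is our own simplified stand-in for what localization buys, not a claim about how infants actually prune hypotheses.

```python
from itertools import combinations


def nonempty_subsets(n):
    """Count the nonempty subsets of n events by direct enumeration."""
    return sum(1 for k in range(1, n + 1) for _ in combinations(range(n), k))


causes = nonempty_subsets(10)    # candidate sets of earlier events
effects = nonempty_subsets(10)   # candidate sets of later events
hypotheses = causes * effects    # every pairing of a cause set with an effect set

# Assumed reading of localization: one earlier event acts on one later event.
localized = 10 * 10

print(causes)      # 1023
print(hypotheses)  # 1046529
print(localized)   # 100
```

The unrestricted space has 1023 × 1023 = 1,046,529 hypotheses, which is the "one million" of the text, while the pairwise restriction leaves only 100.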
What can babies teach philosophers? The ability to control the environment, and to make choices and decisions which affect our personal well-being, begins at birth. Babies look for the nipple, and suck, when they are hungry, and not when they are satiated. This is not instinctive behavior; it is knowledge, which is built into babies. Thus, the tabula rasa hypothesis does not hold. In particular, studies show that infants look for causes of unexpected events. That is, the knowledge that events occur for a cause is built into us. Also, personal interventions provide a powerful means for learning causal relations. If we intervene in the system to bring about a desired outcome, then we learn about the causal effects of our actions. This experience of causality is radically different from observations of constant conjunction, which are never sufficient to deduce causality. The abilities to make choices, and to choose to intervene or not to intervene (to suckle, or to stop), are given to us from birth. This free choice is what enables exogenous interventions, and also leads to knowledge of counterfactuals: the consequences of acting or not acting. This free will and the resulting counterfactual knowledge are also an essential part of the moral framework within which we live our lives. Moral choices result, partly, from the knowledge of different consequences which occur as a result of our choices.
The causal relations perceived by the baby change rapidly as the baby grows more capable of interacting with the world. Also, because adults are responsive to growth and changes, the same action can lead to different causal outcomes as growth and learning occur. This means that the concept of causality as a “necessary” connection is a non-starter for infants. In a rapidly changing world, observation of constant conjunctions as a means of learning is also a non-starter. Personal interventions provide for rapid learning of causal connections based on experience. In later lectures, we will see how babies generalize from personal experience to other agents. They attribute intentions and causal efficacy to other “dispositional” agents, and are good at reading intentions from actions. Causal learning based on agent interventions can then be generalized to mechanical interactions based on physical laws. To summarize, watching infants learn about causal relationships essential to their survival and growth leads to a framework which is radically different from that of the philosophers. It resolves many puzzles which have stumped philosophers for centuries.
Finally, it is important to note that the infant can learn to manipulate the world while having very little knowledge of how and why these manipulations are successful. When the infant cries and the mother comes to investigate, there are two types of causal mechanisms at work. One type is the mechanical aspect: the strength and type of crying, and the transmission of sound to the mother. The second is the dispositional aspect: how and why the mother responds to the child. Both of these are known to the child only roughly. Bad theories about the mechanisms and the dispositions are compatible with successful manipulations. This is of obvious significance for the philosophy of science.
De Pierris, Graciela, and Michael Friedman. “Kant and Hume on Causality”, The Stanford Encyclopedia of Philosophy (Winter 2018 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/win2018/entries/kant-hume-causality/>.
Angrist, Joshua D., and Jörn-Steffen Pischke. Mostly harmless econometrics: An empiricist’s companion. Princeton University Press, 2008.
Angrist, Joshua D., and Jörn-Steffen Pischke. Mastering ‘metrics: The path from cause to effect. Princeton University Press, 2014.
Angrist, Joshua D., and Jörn-Steffen Pischke. “Undergraduate econometrics instruction: through our classes, darkly.” Journal of Economic Perspectives 31.2 (2017): 125–44.
DeCasper AJ and Fifer WP. 1980. Of human bonding: newborns prefer their mothers’ voices. Science. 208(4448):1174–6.
Chen, Bryant, and Judea Pearl. “Regression and causation: a critical examination of six econometrics textbooks.” Real-World Economics Review, Issue 65 (2013): 2–20.
Hacking, Ian (1983) Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge University Press.
Hespos, S. J., & vanMarle, K. (2011). Physics for infants: characterizing the origins of knowledge about objects, substances, and number. Wiley Interdisciplinary Reviews: Cognitive Science, 3(1), 19–27. doi:10.1002/wcs.157
Imbens, Guido W. “Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics.” Journal of Economic Literature 58.4 (2020): 1129–79.
C. M. Lorkowski (2021) “David Hume: Causation”, Internet Encyclopedia of Philosophy, https://iep.utm.edu/hume-cau/
Michotte, A. (1963). The perception of causality. New York: Basic Books.
Morgan, Stephen L., and Christopher Winship. Counterfactuals and causal inference. Cambridge University Press, 2015.
Muentener, Paul, and Elizabeth Bonawitz (2017) “The development of causal reasoning” in Waldmann, Michael, ed. The Oxford handbook of causal reasoning. Oxford University Press.
Schulz, Laura, Tamar Kushnir, and Alison Gopnik. “Learning from doing: Intervention and causal inference.” Causal learning: Psychology, philosophy, and computation (2007): 67–85.
Jessica A. Sommerville, “Detecting Causal Structure: The Role of Interventions in Infants’ Understanding of Psychological and Physical Causal Relations”
Originally published at http://azprojects.wordpress.com on November 23, 2021.