The most informative home video

My family has some home videos. Some are on actual cassettes, and others are on our iPhones or in the cloud. They’re mostly short, and like photographs, they’re somewhat staged. In many cases, they show premeditated or sugar-coated shots of our lives.

But MIT researcher Deb Roy has some home videos that break the mould I just described. Roy and his wife set up video cameras to record every room of their house for about 10 hours a day for the first three years of their son’s life. They have more than 200,000 hours of data. Analyzing it is a mammoth and ongoing task, but it has helped answer some highly debated and longstanding questions about how humans develop language. For example, one paper describes how this data can be used to predict the “birth” of a spoken word.

What factors facilitate word learning?

Not surprisingly, the child produced shorter words before longer ones, words that tended to occur in shorter sentences before those that occurred in longer ones, and words he had heard often before rarer ones. To me, these are intuitive features of words that make them easier to learn.

But some less intuitive features also predicted how early the child would produce certain words. These features were more contextual, taking into account when and where he heard words in the time leading up to his first production of them.

One feature that predicted a word’s birth was based on how often the boy heard the word in different rooms throughout the house. Some words were spatially distinct (for example, “breakfast” usually occurred in the kitchen), while others, like “toy,” were more spatially dispersed. Spatial distinctiveness tended to help him learn words faster.

The researchers also measured temporal distinctiveness, or how concentrated a word’s uses were at particular times of day. Again, “breakfast” was temporally distinct, occurring almost exclusively in the morning, while the word “beautiful” was much more dispersed throughout the day. As with spatially distinct words, the researchers found that more temporally distinct words (those most often said at a similar time of day) were learned sooner than those whose uses were spread throughout a typical day.

Finally, they looked at the contextual distinctiveness of each word: how much the surrounding language varied when the child heard it. The word “fish” was contextually distinct, for example, often occurring with other animal words or words related to stories. “Toy,” on the other hand, occurred with a much greater variety of words and topics, so it was less contextually distinct. As with spatial and temporal distinctiveness, contextual distinctiveness made a word easier to learn.
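To make “distinctiveness” a little more concrete, here is a minimal sketch of one way to score it. I’m assuming a measure based on KL divergence: compare how a word is distributed across contexts (rooms, hours of the day, or co-occurring words) with how all speech is distributed across those same contexts. A word whose distribution looks very different from the baseline scores high. This is my own illustration, not necessarily the paper’s exact formulation, and the counts below are made up.

```python
import math
from collections import Counter


def to_distribution(counts):
    """Normalize raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def distinctiveness(word_counts, baseline_counts):
    """KL divergence D(P_word || P_baseline), in bits.

    Assumes the baseline was built from all speech, so every context in
    which the word occurs also appears in the baseline.
    """
    p = to_distribution(word_counts)
    q = to_distribution(baseline_counts)
    score = 0.0
    for context, p_c in p.items():
        if p_c > 0:
            score += p_c * math.log2(p_c / q[context])
    return score


# Hypothetical counts of how often speech occurred in each room.
baseline_rooms = Counter({"kitchen": 400, "living room": 400, "bedroom": 200})
breakfast_rooms = Counter({"kitchen": 45, "living room": 4, "bedroom": 1})
toy_rooms = Counter({"kitchen": 30, "living room": 50, "bedroom": 20})

breakfast_score = distinctiveness(breakfast_rooms, baseline_rooms)
toy_score = distinctiveness(toy_rooms, baseline_rooms)

print(f"breakfast: {breakfast_score:.3f} bits, toy: {toy_score:.3f} bits")
```

With these invented numbers, “breakfast” scores higher than “toy” because it is concentrated in the kitchen, while “toy” roughly mirrors where speech happens overall. The same function works for temporal distinctiveness if the contexts are hours of the day, or for contextual distinctiveness if they are co-occurring words or topics.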

This TED talk blew my mind.

Why does distinctiveness affect word learning?

Children learn language through conversations that are inseparable from the everyday life contexts they occur in. Those contexts are not just incidental features of word learning, but are actually crucial variables affecting how language is learned. This work is a reminder that language use and development are about much more than language, just as thinking requires much more than just a brain. We humans are inseparable from our environments, and those environments play a big role in shaping how we think and navigate the wonderfully messy world we live in.


Embodied Language Conflict

Last night, I read a cool paper by Bergen and colleagues on the role of embodiment in understanding language. The idea is that portions of the brain that are used for perception and motor activity also play a role in understanding language via a process referred to as “simulation”.

Variations of the Perky effect (the finding that imagining an object can interfere with perceiving a real one) can be used to study language understanding. For example, if a person is simulating while understanding language, it may be harder for them to use that same part of the brain for a visual or motor task. This is exactly what Bergen et al. found:

In Experiment 1, participants viewed sentences whose verbs literally denoted up or down, such as “The cork rocketed,” an “UP” sentence. At the same time, they had to characterize pictures of objects located either at the top or at the bottom of a screen. After an “UP” sentence, participants were slower to characterize objects at the top, an interference effect that may have occurred because they were already simulating upward motion. The same effect appeared for “DOWN” sentences and objects located at the bottom of the screen.

When reading that “the cork rocketed,” you probably simulated something in the upward direction, like this.

Experiment 2 was the same, except up/down nouns were used instead of verbs. The experimenters again found an interference effect in the same direction. This suggests that the specific lexical entry isn’t what causes the simulation, but instead understanding the sentence as a whole may.

In Experiment 3, sentences containing verbs that expressed metaphorical motion were used (for example, “The prices climbed”). There was no interference effect, nor was there one in Experiment 4, in which abstract, non-metaphorical verbs (such as “The percentage decreased”) were used. Together, these results support the idea that the meaning of a sentence as a whole, rather than individual words, triggers simulation.

Then this morning, I read a post about a paper that counters Bergen et al.’s findings. In the fMRI study reported, participants were shown nouns, verbs, noun-like nonwords, and verb-like nonwords (their endings were what signaled whether they were noun- or verb-like). The authors found that when viewing verbs and verb-like nonwords, participants’ premotor cortices were activated more than when viewing nouns and noun-like nonwords. They took this as an indication that the observed cortical responses to action words result from ortho-phonological probabilistic cues to grammar class, as opposed to embodied motor representations.

But what about context? We rarely encounter words in isolation; instead, we meet them embedded in sentences, and those sentences are embedded in their own contexts. Since the methods in the anti-embodied language study don’t reflect the real-life situations in which we encounter language, are its results meaningful? How can we reconcile these two studies?