COMPUTERS ARE LEARNING TO READ EMOTION, AND THE BUSINESS WORLD CAN’T WAIT.
By Raffi Khatchadourian, The New Yorker
Three years ago, archivists at A.T. & T. stumbled upon a rare fragment of computer history: a short film that Jim Henson produced for Ma Bell, in 1963. Henson had been hired to make the film for a conference that the company was convening to showcase its strengths in machine-to-machine communication. Told to devise a faux robot that believed it functioned better than a person, he came up with a cocky, boxy, jittery, bleeping Muppet on wheels. “This is computer H14,” it proclaims as the film begins. “Data program readout: number fourteen ninety-two per cent H2SOSO.” (Robots of that era always seemed obligated to initiate speech with senseless jargon.) “Begin subject: Man and the Machine,” it continues. “The machine possesses supreme intelligence, a faultless memory, and a beautiful soul.” A blast of exhaust from one of its ports vaporizes a passing bird. “Correction,” it says. “The machine does not have a soul. It has no bothersome emotions. While mere mortals wallow in a sea of emotionalism, the machine is busy digesting vast oceans of information in a single all-encompassing gulp.” H14 then takes such a gulp, which proves overwhelming. Ticking and whirring, it begs for a human mechanic; seconds later, it explodes.
The film, titled “Robot,” captures the aspirations that computer scientists held half a century ago (to build boxes of flawless logic), as well as the social anxieties that people felt about those aspirations (that such machines, by design or by accident, posed a threat). Henson’s film offered something else, too: a critique—echoed on television and in novels but dismissed by computer engineers—that, no matter a system’s capacity for errorless calculation, it will remain inflexible and fundamentally unintelligent until the people who design it consider emotions less bothersome. H14, like all computers in the real world, was an imbecile.
Today, machines seem to get better every day at digesting vast gulps of information—and they remain as emotionally inert as ever. But since the nineteen-nineties, a small number of researchers have been working to give computers the capacity to read our feelings and react, in ways that have come to seem startlingly human. Experts on the voice have trained computers to identify deep patterns in vocal pitch, rhythm, and intensity; their software can scan a conversation between a woman and a child and determine whether the woman is a mother, whether she is looking the child in the eye, whether she is angry or frustrated or joyful. Other machines can measure sentiment by assessing the arrangement of our words, or by reading our gestures. Still others can do so from facial expressions.
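The acoustic cues mentioned here—pitch, rhythm, intensity—are the raw material such systems work from before any judgment about the speaker is made. The sketch below is only a rough illustration of that first step, not any researcher's actual pipeline: it uses the open-source librosa library to pull a pitch contour and a loudness contour from a recording and reduce them to a handful of summary numbers. The file name is a placeholder.

```python
import numpy as np
import librosa

# Load a mono recording (the path is a placeholder).
y, sr = librosa.load("conversation.wav", sr=16000)

# Pitch contour: fundamental frequency estimated frame by frame with the YIN
# algorithm, restricted to a plausible range for human speech.
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)

# Intensity contour: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

# Summary statistics of the kind a classifier might be trained on:
# how high and how variable the pitch is, and how the loudness fluctuates.
features = np.array([
    np.nanmean(f0), np.nanstd(f0),   # average pitch and its variability
    rms.mean(), rms.std(),           # average loudness and its variability
])
print(features)
```

A real system would compute many more such measurements, over many stretches of speech, and learn from labeled examples which patterns distinguish anger from joy, or a mother from a stranger.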
Our faces are organs of emotional communication; by some estimates, we transmit more data with our expressions than with what we say, and a few pioneers dedicated to decoding this information have made tremendous progress. Perhaps the most successful is an Egyptian scientist living near Boston, Rana el Kaliouby. Her company, Affectiva, formed in 2009, has been ranked by the business press as one of the country’s fastest-growing startups, and Kaliouby, thirty-six, has been called a “rock star.” There is good money in emotionally responsive machines, it turns out. For Kaliouby, this is no surprise: soon, she is certain, they will be ubiquitous.
Affectiva is situated in an office park behind a strip mall on a two-lane road in Waltham, Massachusetts, part of a corridor that serves as Boston’s answer to Silicon Valley. The headquarters have the trappings of a West Coast startup—pool table, beanbag chairs—but the sensibility is New England; many of the employees are from M.I.T. From a conference room, the Amtrak line to Boston is visible beyond a large parking lot.
When I visited in September, Kaliouby walked me past charts of facial expressions, some of them scientific diagrams, some borrowed from comics. Kaliouby has a Ph.D. in computer science, and, like many accomplished coders, she has no trouble with mathematical concepts like Bayesian probability and hidden Markov models. But she is also at ease among people: emotive, warm, even flirtatious. She is a practicing Muslim, and until two years ago she wore a head scarf, which had the effect of drawing the eye to her rounded, expressive features. Frank Moss, a former director of M.I.T.’s Media Lab, where she held a postdoctoral position, told me that she has a high “emotional intelligence.” As a mother of two, she worries about technology’s effects on her children.
Affectiva is the most visible of a host of boutique startups working on the problem; its competitors include Emotient, Realeyes, and Sension. After Kaliouby and I sat down, she told me, “I think that, ten years down the line, we won’t remember what it was like when we couldn’t just frown at our device, and our device would say, ‘Oh, you didn’t like that, did you?’ ” She took out an iPad containing a version of Affdex, her company’s signature software, which was simplified to track just four emotional “classifiers”: happy, confused, surprised, and disgusted. The software scans for a face; if there are multiple faces, it isolates each one. It then identifies the face’s main regions—mouth, nose, eyes, eyebrows—and it ascribes points to each, rendering the features in simple geometries. When I looked at myself in the live feed on her iPad, my face was covered in green dots. “We call them deformable and non-deformable points,” she said. “Your lip corners will move all over the place—you can smile, you can smirk—so these points are not very helpful in stabilizing the face. Whereas these points, like this at the tip of your nose, don’t go anywhere.” Serving as anchors, the non-deformable points help the software judge how far the other points move.
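The pipeline Kaliouby describes—find the face, locate its landmark points, anchor the measurements to the points that don't deform, then classify the expression from how the deformable points move—can be sketched in a few lines. What follows is a minimal illustration in Python, not Affectiva's code; the landmark names and the toy detector are hypothetical stand-ins for whatever face-tracking model a real system would use.

```python
import numpy as np

# Points that barely move as expressions change ("non-deformable"): the tip of
# the nose and the inner eye corners. They serve as anchors.
ANCHORS = ["nose_tip", "left_eye_inner", "right_eye_inner"]

# Points that move with expression ("deformable"): lip corners, inner eyebrows.
DEFORMABLE = ["left_lip_corner", "right_lip_corner",
              "left_brow_inner", "right_brow_inner"]

def detect_landmarks(frame):
    """Stand-in for a face-tracking model. A real system would locate the face
    in the frame and return pixel coordinates for each named point; here we
    return fixed toy coordinates so the sketch runs end to end."""
    return {
        "nose_tip": (320, 260), "left_eye_inner": (290, 220),
        "right_eye_inner": (350, 220), "left_lip_corner": (285, 310),
        "right_lip_corner": (355, 310), "left_brow_inner": (288, 200),
        "right_brow_inner": (352, 200),
    }

def expression_features(frame):
    """Express the deformable points relative to the anchors, so that head
    position and distance from the camera roughly cancel out."""
    points = detect_landmarks(frame)
    anchors = np.array([points[name] for name in ANCHORS], dtype=float)
    origin = anchors.mean(axis=0)                        # stable reference point
    scale = np.linalg.norm(anchors[1] - anchors[2])      # inter-eye distance
    feats = [(np.array(points[name], dtype=float) - origin) / scale
             for name in DEFORMABLE]
    return np.concatenate(feats)

print(expression_features(frame=None))
# A classifier for the four labels (happy, confused, surprised, disgusted)
# would then be trained on vectors like this one, computed from labeled frames.
```

The point of the anchors is exactly what Kaliouby describes: because the nose tip and inner eye corners stay put, measuring the lip corners against them separates a smile from a mere shift of the head.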