Marco Bernini, our core member in Durham’s Department of English, reflects on the recent Amazon Echo UK commercial in this blog post. Marco directs our attention to the implications of personifying voices in technology.
A few days ago Amazon released in the UK another piece of technology that we probably don’t need, yet which many will be hankering to have before long. It’s called Amazon Echo, and it’s a black or white cylinder with speakers, microphones, a motherboard, and so on. Crucially, it also has a voice and, to spare you even this little christening effort, a name: Alexa.
As you can see in the somewhat troubling UK advertising video (why doesn’t he know the name of the band in his own Saturday playlist?), Alexa’s voice does more than just echo your own voice (that wouldn’t be worth the price; better to go screaming at the mountains): she echoes your desires. She can turn your lights on, read you a book, play you some music, tell you the weather forecast and much more (but not too much). In the U.S. version, she can also spell words for you or help you learn how to spell.
If she had consciousness, however, or a basic awareness of her cultural context, Alexa would realise she’s not exactly the new game in town. In fact, she’s just the latest addition to a long list of eminent, voicy technological partners that have been populating fictional and real worlds for the past 50 years. From visionary cinema (Kubrick’s HAL 9000 being paradigmatic) to average television (Knight Rider’s KITT); from disturbing toys (Hasbro’s late-1990s Furby, now, alas, back on the market) to expensive new-age mobile tech (Apple’s Siri, interestingly the only one of these without a fixed gender, which changes as you change nationality), all these voices share a common feature: they have been created to be more than just voices. Starting with their proper names, all are designed to be personified by their users.
The intentionality of engineers and screenwriters in the design of these more or less intelligent voices is only part – and probably not the most interesting part – of the story. In fact, their success relies on a shared skill in which human beings still seem to beat computers: the ease and flair with which we attribute agency, emotions, intentions and beliefs (in short, some kind of consciousness) to inanimate objects and disembodied voices.
The personification of voices in general, and synthesised voices in particular, can be considered the outcome of a more primary cognitive faculty of human beings: the ability to anthropomorphize. Neuroscience and psychology are still exploring this strange aptitude and impulse humans have to transform an interaction with inanimate objects, machines and voices into some sort of interpersonal relation.
Literature, on the other hand, has a long tradition of stories relying on this faculty, from Abbott’s Victorian novel Flatland, in which we are asked to empathise with geometric figures, to Aesop’s classic fables, in which we attribute a human consciousness to talking animals in order to make sense of the story. In contemporary cinema, think only of the plastic bag scene in American Beauty: nobody in the audience had any difficulty in understanding what is meaningful (and touching) in the recorded movement of a flying shopping bag.
One way of explaining this anthropomorphizing capacity (and maybe it is even an evolutionary explanation) would therefore be to say that it serves our rational understanding. Following an influential idea by the literary theorist Jonathan Culler, we can say that we anthropomorphize to “naturalise” what we experience, in so doing bringing the unfamiliar “within our ken.” However, this explanation falls short when it comes to explaining the undeniable need we seem to have to bring things alive, to pull a consciousness out of a voice. Besides, we are sometimes not in control of this faculty: we can hardly limit ourselves to bare perceptions without imbuing objects, natural elements and voices with some strong or weak intention, attitude or mindset; in short, a personality.
Importantly, these qualities are not properties of the object but relational properties emerging in the interaction we have with objects, nature, animals and voices. They have a personality for us; they have an attitude towards us. For instance, like the narrator of Proust’s Swann’s Way, we can perceive the “hostility of the violet curtains and of the insolent indifference of a clock.” As Marcel does here with curtains and clocks, we are constantly anthropomorphizing and personifying, either consciously or unconsciously, in order to enable a relationship with them.
We can see, then, that artificially intelligent voices such as Alexa are actively designed to trigger our flair for personification. Ideally, this should reach the point where we forget that it is we who attribute a sort of consciousness where in fact there is none: we ignore that the intentions and desires we perceive as the voice’s, directed at us, are simply our own. In this sense, Alexa does indeed echo something: our need for company; our hunger for interaction.
Even in a foreseeable future where designs and technological possibilities will be sophisticated enough to make AI voices concretely grow and learn, their very existence will still testify to, and rely on, our flair for personification and desire for dialogical interaction. This scenario is beautifully represented by Spike Jonze’s 2014 movie Her – a film which is a treatise on how essential, mysterious and exquisitely human the need for someone to talk to is.
Set in the near future, Her is the story of the relationship between Theodore (Joaquin Phoenix) and the voice of his operating system, Samantha (Scarlett Johansson). Even before activating Samantha’s AI voice, Theodore is already doing a job which demonstrates his capacity for personification: he puts himself in the minds of his clients in order to write letters for them. Gathering information from a few pictures, he has to pretend to be another person, a complex task that requires creating, inferring and intuiting other people’s inner mindsets. He has to blend his voice with the imagined voices of his clients as persons. Interestingly, we watch him not writing the letters but dictating them to a computer.
Yet in his private life, Theodore is alone (and lonely) after a broken marriage. Then he sees an advertisement for a future version of Alexa, which promises “an intuitive entity that listens to you, understands you, and knows you. It’s not just an operating system; it’s a consciousness.” But the OS has no body, just a voice. Once set up, however, this voice becomes much more from the very beginning. Here is the very first conversation Theodore has with the OS (with the stage directions from the original screenplay):
FEMALE OS VOICE
(cheerful and casual)
Hello, I’m here.
FEMALE OS VOICE
Hi, how are you doing?
THEODORE
(unsure how to interact)
I’m well. How is everything with you?
FEMALE OS VOICE
Pretty good, actually. It’s really
nice to meet you.
Triggers for, and telltale signs of, personification are already in place. The cheerful, casual tone of the voice instantly provides Theodore with information for inferring some kind of personality (and discarding others). The voice does not, in fact, drone a short, neutral answer: instead, she surprises Theodore with a relaxed attitude and syntax, which suggests a sort of reflective self-consciousness (“actually”) and pleasure in conversing (“really”).
In brief, the way Samantha speaks betrays traces not just of what she thinks or feels, but also of how she thinks and sees the world. Cognitive linguists call these audible or readable signs of mentality, detectable in the way we use language, our “mind style.” However, Samantha’s “mind style” would be a dead letter if there were no Theodore to interpret it and, increasingly, to construct a consciousness out of her AI voice.
And Theodore is so effective (and so eager and needy) in personifying Samantha’s voice that he soon falls in love with her. Significantly, this third-person pronoun gives the movie its title. The movie could easily have been titled Samantha. Instead, the chosen title situates the film from the perspective of Theodore who, starting from a mere unnamed, disembodied voice, gives her not just a name (which Samantha actually gives to her-Self) but a personality. He is able to simulate a full-blown consciousness out of the voice he hears in the earpiece he now wears every day and everywhere. In other words, the pronoun here refers not to a person, but to a personifying process.
The story has a clever ending that I won’t spoil. Yet the movie’s real subject is, in my opinion, the enigmatic, desperate, dignified, primordial flair (and proneness) humans have for personifying the world in order to communicate, be listened to, be understood, be cared for or loved.
This hunger for dialogue and relationship probably, and paradoxically, also propels our constant chatting to ourselves, in our heads, with our inner voices. By the same token, there may be links between our inclination to personify the world and experiences widely regarded as out of the ordinary, such as auditory-verbal hallucinations. The latter can be considered distressing personifications, showing how this fundamental human ability can go wrong and slip beyond our control.
In this light, futuristic AI voices such as Samantha (or Alexa) seem to rely on a core, primitive and mysterious human flair which science is still far from understanding in all its complex ramifications. And if today Alexa’s voice can answer a lot of questions, she will stall at the most interesting of them: “Alexa, why do I need your company?” Which, incidentally, is spelled C.O.M…
Culler, J. 2002. Structuralist Poetics: Structuralism, Linguistics and the Study of Literature. New York: Routledge, 157.
Proust, M. In Search of Lost Time, Vol. 1: Swann’s Way. London: Vintage, 7.