Until the invention of writing, spoken words, gestures, vocalizations, and facial expressions were the primary ways to share ideas and feelings. Newer ways to communicate — cell phones, text messaging services, email, and social media — have become available only relatively recently. And times change quickly. As people have come to rely on these modern forms of communication, they have gradually shifted toward text-based communication.
Text-based communication, for millions of individuals, has begun to take the place of telephone and face-to-face conversations. Text messages and other media — such as GIFs and short videos — offer quick and condensed interactions that often allow people to multitask as they communicate. In an attempt to share quick emotional content or avoid miscommunications, many users send picture- or text-based emojis: lol 🙂
These new symbols can add some emotional content, but both parties have to know what the symbols represent: IMHO. Yet despite being less convenient in today's world, phone and face-to-face conversations still hold an advantage in accurately conveying emotional information. This is because text alone does not supply the auditory and visual cues that we rely on to communicate not just the meaning of our words but the intention behind them. One way of communicating that underlying intention is speech prosody, or tone of voice.
Prosody uses pitch, volume, and word emphasis to convey emotional tone, to clarify word meanings, and to distinguish sentence types (e.g., a statement from a question). Communicating emotion through speech is complex: depending on the speaker's emotional state, speech may vary in pitch, tempo, rhythm, voice quality, loudness, and pronunciation (Mozziconacci, 2002). For example, a person who is sad may speak slowly, using a soft voice. Yet tone of voice can be misleading without context, and faces often provide an emotional context for tone of voice.
Facial expressions convey an enormous amount of emotional information that is easy for most people to understand. In fact, Ekman and Friesen (1969) found that there are six universally recognizable human emotions: anger, disgust, fear, happiness, sadness, and surprise. Ekman's (1977) Facial Action Coding System was developed to code universal human facial expressions into general patterns regardless of race, age, sex, or ethnicity. Ekman and Friesen found that facial expressions are composed of specific muscle movements, called facial action units.
Each facial action unit is assigned a number, which corresponds to a specific set of muscles and the movements they create when activated. For example, for a person to be viewed as “happy,” they must display a particular combination of facial action units: 6+12+25.
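The numbering scheme above can be illustrated with a short sketch. The AU numbers and the “happy” combination (6+12+25) come from the text; the movement labels and the lookup function are illustrative assumptions, not part of the Facial Action Coding System itself.

```python
# Illustrative sketch: action unit numbers mapped to movement labels.
# The labels below are commonly cited descriptions; treat them as examples.
ACTION_UNITS = {
    6: "cheek raiser",
    12: "lip corner puller",
    25: "lips part",
}

# An emotion is coded as a combination of action units (per the text,
# "happy" requires units 6, 12, and 25 together).
EMOTION_CODES = {
    "happy": {6, 12, 25},
}

def classify(observed_aus):
    """Return emotions whose full AU combination appears among the observed units."""
    observed = set(observed_aus)
    return [emotion for emotion, code in EMOTION_CODES.items() if code <= observed]

print(classify([6, 12, 25]))  # all three units present, so the face codes as happy
print(classify([6, 12]))      # an incomplete combination matches no coded emotion
```

The subset test (`code <= observed`) captures the key idea: an emotion label is assigned only when its entire combination of action units is displayed, not when individual units appear in isolation.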
While texts, GIFs, emojis, and other tools can express emotions to some degree, the combination of facial expressions, spoken words, and tone of voice seems to be the most effective, and most innately human, system for sharing and understanding feelings. But what if a person struggles to learn social cues by participating in conversations, spending time in groups, or watching others in their environment?
Recent technological advances in facial recognition and synthesized speech offer potential solutions for these individuals and many others. Research shows that a combination of facial expressions and tone of voice can improve the perception of emotional content within speech (Johnstone et al., 2006; Brady & Guggemos, 2018). As an educational tool, synthesized emotional communication systems that include facial expressions and tone of voice could help people with autism learn to share and understand emotional content in social settings. By using these tools, people with autism could have more opportunities to participate in the rapidly changing physical and virtual social communities in which we all now live.