Eternal Voices with AI-Powered Technologies: Impacts on Education and Beyond
By Mariah Hagadone-Bedir
Recent developments in AI technology have allowed AI-generated voices to sound remarkably close to natural human speech. For example, Microsoft’s VALL-E, a neural codec language model that its developers claim is 99 percent accurate to real human voices, is anticipated to power an app where people can “sell” their voices on a marketplace. The tool stands out because it can mimic a speaker from just three seconds of audio while conveying emotion. Beyond being a boon to the global speech and voice recognition industry, AI-generated voice tools are making a large splash in education in two major ways: expanding communication possibilities for neurodivergent people and reshaping foreign language learning, with positive outcomes in the short term but potential disruption to the field in the long term.
AI tools can offer neurodivergent children personalized support and accommodations focused on their unique needs. It has been estimated that around 30 percent of individuals diagnosed with autism spectrum disorder do not fully develop speech communication (Wodka et al., 2013). Robots have already been shown to have positive immediate effects on autistic children’s communication skills (Kouroupa et al., 2022; Syriopoulou-Delli & Gkiolnta, 2022), and AI speech-generating tools can now go further, electronically “speaking” a child’s needs and ideas to others in the child’s own voice. The sophistication of VALL-E and similar platforms gives autistic children the capability to “sound” like themselves when they communicate. For those around the child, this more personalized, human-like output, which can reflect the user’s local dialect and accent, could strengthen familial and friendship bonds. These tools can also help children practice speech sounds and fluency and identify different emotional tones in speech, something many autistic children find difficult.
Beyond their potential to improve communication outcomes for neurodivergent people, AI-generated voice tools can also significantly affect foreign language learning. Tools like camb.ai can dub content into 78 languages, including accents and dialects, while using one’s own voice. This means you can watch foreign-language talk shows and news broadcasts in your preferred language while preserving the speakers’ original voices. It also means you could create a narrated video of your house cat using the voice of world-renowned naturalist David Attenborough. Recreational videos aside, tools such as talkpal.ai, a GPT-powered AI language tutor, can give learners more accurate feedback, helping them identify the stress, intonation, and rhythm of the target language, all in a “realistic” voice.
A question arises, however: if devices now allow us to communicate seamlessly in foreign languages as we go about our daily routines, what is the point of learning another language when a tool can deliver immediate, accurate communication in our own voice? AI-powered tools may well continue to develop to the point that anyone can communicate in a foreign language with the aid of a speech-generating device (SGD). If that happens, will SGDs make becoming fluent through the conventional route of taking foreign language courses moot for most of us?
Lastly, there are numerous ethical implications of AI voice-generation technologies that we must consider and parse apart. These include data privacy; transparency about what happens to our data, or our “voice,” once we upload it to an AI platform; and the legal ramifications of using someone’s voice without consent, such as impersonation, deception, and copyright infringement. While it may be entertaining to watch a house cat go about its routine with David Attenborough’s narration, if the digital content creator did not obtain permission to use Mr. Attenborough’s voice, this is a privacy violation and copyright infringement. Or at least, that is what I think. Presently, there is little legal guidance, and there are few consequences, governing what someone can and cannot do with another person’s voice. The rapid pace of emerging technologies, set against the slower evolution of legislation, produces an environment in which negative repercussions can proliferate and much is left to individual judgment. But perhaps, as artificial intelligence researcher and computer scientist Eliezer Yudkowsky put it, the danger lies not so much in impersonation as in misplaced confidence: “By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
References
Kouroupa, A., Laws, K. R., Irvine, K., Mengoni, S. E., Baird, A., & Sharma, S. (2022). The use of social robots with children and young people on the autism spectrum: A systematic review and meta-analysis. PLOS ONE, 17(6), e0269800. https://doi.org/10.1371/journal.pone.0269800
Syriopoulou-Delli, C. K., & Gkiolnta, E. (2022). Effectiveness of different types of augmentative and alternative communication (AAC) in improving communication skills and in enhancing the vocabulary of children with ASD: A review. Review Journal of Autism and Developmental Disorders, 9(4), 493–506. https://doi.org/10.1007/s40489-021-00269-4
Wodka, E. L., Mathy, P., & Kalb, L. (2013). Predictors of phrase and fluent speech in children with autism and severe language delay. Pediatrics, 131(4), e1128–e1134. https://doi.org/10.1542/peds.2012-2221