IJEEC | Tackling the Problem of Multilingualism in Voice Assistants

Soham Sabharwal, Rohan Sahni

International Journal of Electrical, Electronics and Computers (IJEEC), Vol-9, Issue-5, September - October 2024, Pages 1-14, DOI: 10.22161/eec.95.1

Article Info: Received: 23 Aug 2024; Accepted: 22 Sep 2024; Date of Publication: 01 Oct 2024

Abstract:

Voice assistants like Alexa and Siri have become increasingly advanced due to improvements in AI and language processing models like GPT and Gemini. However, these systems often perform poorly with less commonly spoken languages, such as many Indian languages, creating a significant accessibility gap. This paper addresses the problem of multilingualism in voice assistants, with a focus on languages like Hindi, Punjabi, and Bengali. We examine the evolution of voice assistants and highlight the major technical challenges they face, including speech recognition, language processing, and response generation in low-resource languages. To overcome these barriers, we propose a novel framework that combines different AI models to enhance multilingual support. Our approach offers a potential solution to make voice assistants more inclusive and accessible for speakers of underrepresented languages. By broadening language support, this research has the potential to extend the benefits of AI to a much wider audience.

Keywords:

Multilingualism, Voice assistants, AI, Transformers, Large Language Models, Natural Language Understanding
