Tafsiri S2ST – ArtSciLab

Project Overview

Tafsiri is an AI-powered speech-to-speech translation (S2ST) web application. It combines modern web technologies, a multimodal translation model, and a text-to-speech system to translate audio into text and speech in over 100 languages.

ITEM	DESCRIPTION
Project	Tafsiri
Team	Collins Mwange (Lead Developer), Vinayak Jaiwant
Department	ASL (Creative Disturbance)
Tech stack	Models (SeamlessM4T v2 + Edge TTS), FastAPI, Node.js, React.js, PostgreSQL, Tailwind CSS, Vite, Nginx, Docker
App	tafsiri.creativedisturbance.org
Source code	Tafsiri-S2ST
Developed	Feb – Apr, 2025

The problem

In the realm of digital content, podcasts have emerged as a popular medium for storytelling, education, and entertainment. However, language barriers often limit the reach of these valuable resources. Recognizing this challenge, the team at ArtSciLab embarked on an ambitious project to make their Creative Disturbance Publishers (CDP) podcasts accessible to a global audience, regardless of language.

The Research

In the initial phase, the team did not yet have a clear technical picture of the solution, so they experimented with several candidates:

1: Using Microsoft’s Neural Machine Translation (NMT) Models

The team first used the Microsoft Translator API (api.cognitive.microsofttranslator.com), part of Azure AI services (formerly Azure Cognitive Services). It is powered by Microsoft’s proprietary Neural Machine Translation (NMT) models, which are purpose-built for translation and separate from OpenAI’s GPT models. They are trained using deep learning techniques and optimized for multilingual translation across 100+ languages.
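For illustration, a call to the Translator v3 translate route can be sketched as follows. This is a minimal sketch and not the team's actual code: the function names and the default source language are hypothetical, and the key and region are placeholders for an Azure Translator resource.

```python
import json
import urllib.parse
import urllib.request

# Host taken from the text above; the v3 REST API exposes a /translate route.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text, to_langs, key, region, from_lang="en"):
    """Build (but do not send) a POST request translating text into to_langs."""
    query = [("api-version", "3.0"), ("from", from_lang)]
    query += [("to", lang) for lang in to_langs]  # one "to" parameter per target
    url = f"{TRANSLATOR_ENDPOINT}/translate?{urllib.parse.urlencode(query)}"
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder credential
        "Ocp-Apim-Subscription-Region": region,  # placeholder Azure region
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_translations(payload):
    """Map each target language code to its translated text."""
    return {t["to"]: t["text"] for t in payload[0]["translations"]}


def translate(text, to_langs, key, region):
    """Send the request; each call counts against the Azure resource's quota."""
    request = build_translate_request(text, to_langs, key, region)
    with urllib.request.urlopen(request) as response:
        return parse_translations(json.load(response))
```

Under these assumptions, translate("Hello", ["fr", "sw"], key, region) would return a dictionary keyed by target language code. Note that this API translates text only, so speech input and output would still need separate components.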

As a paid managed service, this option proved too expensive to sustain.

The ideal solution would be free to use and support speech-to-speech translation from English to other languages. Failing that, the team would settle for a free solution that covered two steps: speech-to-text (English speech to English text) and text-to-speech (English text to speech in other languages). This could be a single model covering both steps, or two models, one per step.

2: Using OpenAI’s Whisper Model

Moving away from the Azure-managed NMT service toward open-source, locally hosted options, the team looked for a solution that was free to use and supported translation from English to other languages.

Whisper Model:

Only produced text output (speech-to-text), with no speech output
Only translated from other languages into English, not from English into other languages
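The second limitation shows up in how Whisper is typically called. The sketch below assumes the open-source openai-whisper Python package; the function name and the injectable model argument are hypothetical and exist only to keep the example easy to test. Whisper's translate task always produces English text, and the call has no parameter for choosing a different target language.

```python
def translate_speech_to_english(audio_path, model=None, model_name="base"):
    """Return an English text translation of the speech in audio_path.

    Assumption: the openai-whisper package is installed. A preloaded model
    (or a test double exposing the same transcribe method) can be passed in.
    """
    if model is None:
        import whisper  # imported lazily so the module loads without the package

        model = whisper.load_model(model_name)
    # task="translate" means "translate into English"; the alternative,
    # task="transcribe", keeps the text in the spoken language.
    result = model.transcribe(audio_path, task="translate")
    return result["text"].strip()
```

Because the output is always English text, Whisper alone could not meet the English-to-other-languages requirement described above.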

3: Meta’s seamlessM4T + Microsoft’s Edge TTS

The team decided to combine two models, SeamlessM4T v2 and Edge TTS. This combination proved efficient and sustainable, and it met the main requirements for the desired solution.

🔹 Key Features

Supports many languages for both text and speech.
End-to-end solution
Pretrained & downloadable for local or API use.

huggingface.co

The team settled on this solution. Tafsiri uses two models to achieve S2ST:

SeamlessM4T v2 – Meta’s SeamlessM4T v2 handles speech-to-text translation (S2TT), e.g. speech in language X to text in language Y.
Edge TTS – Microsoft’s Edge TTS handles text-to-speech (TTS) synthesis. It takes the text output of SeamlessM4T as its input, e.g. text in language Y to speech in language Y.
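The two-stage flow described above can be sketched as follows. This is a hedged illustration, not Tafsiri's actual implementation: it assumes the Hugging Face transformers interface to SeamlessM4T v2 and the community edge-tts Python package, and the function names, the VOICES mapping, and the model ID constant are hypothetical choices made for the example.

```python
import asyncio

MODEL_ID = "facebook/seamless-m4t-v2-large"  # assumed checkpoint on Hugging Face

# Hypothetical mapping from SeamlessM4T language codes to Edge TTS voice names.
VOICES = {"fra": "fr-FR-DeniseNeural", "swh": "sw-KE-ZuriNeural"}


def seamless_s2t(audio_path, tgt_lang):
    """Stage 1: speech in language X -> text in language Y (tgt_lang)."""
    import torchaudio  # heavy dependencies are imported lazily
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    # Loading on every call is wasteful; a real service would cache these.
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = SeamlessM4Tv2Model.from_pretrained(MODEL_ID)
    waveform, rate = torchaudio.load(audio_path)
    waveform = torchaudio.functional.resample(waveform, orig_freq=rate, new_freq=16_000)
    # Argument names follow the model card and may differ across versions.
    inputs = processor(audios=waveform, sampling_rate=16_000, return_tensors="pt")
    tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    return processor.decode(tokens[0].tolist()[0], skip_special_tokens=True)


def edge_tts_synthesize(text, voice, out_path):
    """Stage 2: text in language Y -> speech in language Y, saved to out_path."""
    import edge_tts

    # asyncio.run cannot be used inside an already running event loop.
    asyncio.run(edge_tts.Communicate(text, voice).save(out_path))


def speech_to_speech(audio_path, tgt_lang, out_path,
                     s2t=seamless_s2t, tts=edge_tts_synthesize):
    """Chain both stages; the stage functions are injectable for testing."""
    if tgt_lang not in VOICES:
        raise ValueError(f"no voice configured for language {tgt_lang!r}")
    text = s2t(audio_path, tgt_lang)
    tts(text, VOICES[tgt_lang], out_path)
    return text
```

Under these assumptions, speech_to_speech("episode.wav", "fra", "episode_fr.mp3") would write French audio to episode_fr.mp3 and return the intermediate French text. Keeping the two stages as separate functions mirrors the split described above and lets either component be swapped independently.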

The Solution

By breaking down language barriers, Tafsiri democratizes access to the CDP’s rich content, making it accessible to non-English speakers across the globe. This aligns with the vision of the Harry Bass Jr. School of Arts, Humanities, and Technology at The University of Texas at Dallas, which initiated the CDP project.

To suggest/request features, go here: Tafsiri – User Requirements Gathering

To try/use the app, go here: tafsiri.creativedisturbance.org

Bottom Line

Tafsiri represents a significant stride in making digital content more accessible. By harnessing the power of AI, it transcends language barriers, bringing diverse audiences closer to the wealth of knowledge shared through the Creative Disturbance Publishers. The dedicated team behind Tafsiri continues to innovate, driven by the vision of a world where language is no longer a barrier to information.