VOXcommons: A Participatory Voice Infrastructure for AI

Mawa Keita

Thesis Faculty:

Kurt Bieg, Louisa Campbell

Building the data systems AI needs to understand Maninkakan and other underrepresented languages, using participatory voice tools that center community knowledge and lived experience.

The realization

My work began with a much narrower goal: designing AI-powered health tools to improve access to malaria education and healthcare communication in Guinea, West Africa. Early prototypes focused on multilingual symptom guidance, AI-assisted health literacy, and conversational interfaces meant to support users in English, French, and Maninkakan.

At first, the systems seemed technically successful. They generated responses in English and worked reasonably well in French. But when tested in Maninkakan, they consistently failed.

The AI could produce text, but it did not meaningfully understand the language. Responses felt inaccurate, unnatural, or culturally disconnected. Participants often switched back to English—even when it wasn’t their primary language—because the system could not be trusted to communicate reliably in Maninkakan.

This failure became the central insight of the project.

The problem wasn’t simply that the applications needed improvement. It was deeper: the linguistic infrastructure required for AI to understand and engage with Maninkakan did not meaningfully exist.

Before AI can support communication, healthcare, or education, it must first be capable of listening to—and learning from—the language itself.

A shift in direction

This realization fundamentally changed the direction of the work.

The project moved away from building applications and toward building the conditions that make those applications possible. Rather than treating language as a feature layered onto existing systems, the work began treating language as foundational infrastructure.

The focus shifted:

from interfaces to infrastructure
from outputs to language systems
from application design to dataset creation
from translation to participation

VOXcommons emerged from this shift.

The project became less about creating a single AI tool and more about asking a broader question:

How can primarily oral, underrepresented language communities participate in building the linguistic systems that future AI technologies will depend upon?

At the same time, the project began confronting another layer of complexity: how language itself is represented inside technological systems. Existing written references for Manding languages often originate from colonial-era linguistic frameworks that imposed external grammatical structures onto living speech.

This introduced a critical methodological concern:

If AI systems are trained only through inherited linguistic structures, they risk reproducing outdated or externally imposed understandings of language rather than learning from contemporary speakers themselves.

VOXcommons therefore became not only a technical project, but a decolonial one—an attempt to create systems that learn language through participation, voice, and community validation rather than solely through archival authority.

What VOXcommons is

VOXcommons is a participatory voice infrastructure for collecting, organizing, validating, and preparing spoken language data for artificial intelligence systems.

The system is designed around voice rather than text.

Participants contribute recordings through ordinary phone calls. These recordings are captured, stored, transcribed, validated, and structured into datasets that can later support speech recognition systems, translation tools, small language models, and future AI applications.

The project focuses on Maninkakan and other widely spoken languages in Guinea that remain significantly underrepresented within digital systems despite being central to everyday communication.

Rather than requiring smartphones, literacy, or stable internet access, the system is intentionally designed around low-bandwidth communication methods already embedded in daily life. Telephony becomes the primary interface because it aligns more closely with the infrastructural realities of many communities in Guinea.

A phone call becomes an interface.

A spoken response becomes structured data.

A voice becomes part of a living corpus.

How it works

Participants interact with VOXcommons through guided prompts delivered over phone calls.

The system asks users to respond in their language using open-ended, constrained, or structured prompts. These prompts are designed to capture multiple layers of language, including vocabulary, storytelling, health communication, everyday speech, and cultural expression.
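One way to picture how prompts might be organized is a small catalogue that tags each prompt with its type and the layer of language it targets. This is only an illustrative sketch: the names `Prompt`, `PromptType`, `LAYERS`, and `coverage`, the prompt IDs, and the English instruction glosses are all assumptions, not part of the VOXcommons system as described.

```python
from dataclasses import dataclass
from enum import Enum


class PromptType(Enum):
    # The three prompt styles named in the text.
    OPEN_ENDED = "open-ended"
    CONSTRAINED = "constrained"
    STRUCTURED = "structured"


# The layers of language the prompts are meant to capture.
LAYERS = (
    "vocabulary",
    "storytelling",
    "health communication",
    "everyday speech",
    "cultural expression",
)


@dataclass(frozen=True)
class Prompt:
    prompt_id: str           # hypothetical identifier
    prompt_type: PromptType
    layer: str
    instruction: str         # English gloss of what the caller would hear

    def __post_init__(self):
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer!r}")


def coverage(prompts):
    """Count prompts per layer so that gaps in the catalogue are visible."""
    counts = {layer: 0 for layer in LAYERS}
    for prompt in prompts:
        counts[prompt.layer] += 1
    return counts


# Illustrative entries only; real prompts would be written with speakers.
PROMPTS = [
    Prompt("vocab-01", PromptType.CONSTRAINED, "vocabulary",
           "Say the word you use for 'water'."),
    Prompt("story-01", PromptType.OPEN_ENDED, "storytelling",
           "Tell a short story you heard as a child."),
    Prompt("health-01", PromptType.STRUCTURED, "health communication",
           "Describe how you would explain a fever to a family member."),
]

print(coverage(PROMPTS))
```

A count per layer makes it easy to see which kinds of speech the catalogue does not yet ask for, such as everyday speech or cultural expression in the example above.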

Each contribution moves through a multi-stage process:

voice recordings are captured
metadata is attached
AI-assisted transcription is generated
community validation and correction take place
datasets are structured for future AI training
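The stages above can be sketched as a single record that advances one step at a time. This is a minimal sketch under stated assumptions, not the project's implementation: the `Contribution` class, the `Stage` names, the function names, the file path, and the metadata fields are all hypothetical, and the transcript strings are placeholders rather than real Maninkakan text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    CAPTURED = 1
    METADATA_ATTACHED = 2
    TRANSCRIBED = 3
    VALIDATED = 4
    DATASET_READY = 5


@dataclass
class Contribution:
    audio_path: str
    stage: Stage = Stage.CAPTURED
    metadata: dict = field(default_factory=dict)
    machine_transcript: Optional[str] = None
    validated_transcript: Optional[str] = None


def _advance(c, expected, new):
    # Refuse to skip stages, e.g. exporting a transcript no speaker has checked.
    if c.stage != expected:
        raise ValueError(f"expected stage {expected.name}, got {c.stage.name}")
    c.stage = new


def attach_metadata(c, **metadata):
    _advance(c, Stage.CAPTURED, Stage.METADATA_ATTACHED)
    c.metadata.update(metadata)


def add_machine_transcript(c, draft):
    # The AI-assisted transcript is stored as a draft, not as the final text.
    _advance(c, Stage.METADATA_ATTACHED, Stage.TRANSCRIBED)
    c.machine_transcript = draft


def validate(c, speaker_text):
    # The speaker-validated text is kept separately from the machine draft.
    _advance(c, Stage.TRANSCRIBED, Stage.VALIDATED)
    c.validated_transcript = speaker_text


def to_dataset_row(c):
    # Only validated contributions become dataset rows.
    _advance(c, Stage.VALIDATED, Stage.DATASET_READY)
    return {**c.metadata, "audio": c.audio_path, "text": c.validated_transcript}


c = Contribution("recordings/example.wav")
attach_metadata(c, language="Maninkakan", prompt_id="story-01")
add_machine_transcript(c, "<machine draft>")
validate(c, "<speaker-corrected text>")
row = to_dataset_row(c)
print(row)
```

Keeping the machine draft and the speaker-validated text in separate fields reflects the idea that the dataset's text comes from community validation, with the automated transcript serving only as a starting point.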

The methodology intentionally combines automation with human interpretation. Validation is not treated as a purely technical process, but as a collaborative act grounded in speaker knowledge.

This approach emerged directly from the realization that living language cannot always be evaluated against fixed grammatical references alone. Meaning must remain connected to the people who speak the language.

The project therefore positions speakers not simply as contributors of data, but as participants in shaping how AI systems will eventually understand and represent their language.

Why it matters

Many AI systems today are trained primarily on high-resource languages. Even systems described as “multilingual” often perform poorly in African languages because sufficient datasets simply do not exist.

The consequence is not only technical exclusion, but epistemic exclusion.

When languages are absent from AI systems, the cultural knowledge, lived experiences, and forms of expression carried by those languages also become absent from the future those systems help shape.

VOXcommons responds to this gap by building datasets directly from speakers rather than relying solely on externally sourced linguistic structures or inherited colonial frameworks.

The project treats language as:

living rather than static
participatory rather than extractive
spoken rather than purely textual
community-authored rather than externally imposed

This work also aligns with broader global efforts calling for greater linguistic inclusion in AI systems, including UNESCO’s International Decade of Indigenous Languages and African-led language AI initiatives such as Masakhane.

An ongoing infrastructure

VOXcommons is not a single application.

It is an evolving infrastructure designed to grow through participation.

Each recording expands the dataset.

Each contribution strengthens the corpus.

Each speaker helps shape how future systems may understand language.

The project has already moved through multiple forms:

AI health applications
language documentation tools
paper-based translation workbooks
small language model training systems
telephony-based data collection infrastructure

Each stage emerged through testing, failure, and methodological revision. The progression did not run in a straight line toward “more advanced” technology; it moved toward methods better aligned with real community conditions.

The work continues to evolve through research, prototyping, fieldwork preparation, institutional partnerships, and community participation.

Closing

A way of gathering language.

A way of building systems that can learn from it.

Not by extracting voices, but by listening to them.

A way of imagining AI infrastructure that begins with participation, language, and community itself.

Mawa Keita

MFA Design & Technology

Mawa Keita is a New York–based designer and healthcare IT professional whose work explores the intersection of artificial intelligence, language, and global health. Her practice is grounded in a central question: how can AI systems learn to understand communities that have historically been left out of the data that shapes them?

Her thesis focuses on designing AI for multilingual, low-resource contexts, with an emphasis on Guinea, West Africa. Through projects like VOXcommons, she creates participatory systems for language documentation and voice-based data collection, approaches that center community knowledge as both source and authority.

Her work aims to lay the foundations for more equitable digital health technologies by building systems that listen, adapt, and are shaped by the people they serve.
