Submitted a proof of personhood for an accelerator from a hospital bed… all for cloud credits to keep building as a bootstrapped founder. We really are building from everywhere 😄
It has been a really interesting journey with @_dialectra.
Today I'm going to talk about @_dialectra
I wrote it in Hausa rather than English; this is the English translation.
As someone who uses AI almost every day and spends a lot of time exploring AI tools and projects, I noticed something interesting.
Most AI voice agents can speak Hausa, but if you look closely, many of them do not understand how Hausa people actually speak.
Kano Hausa is different.
Katsina Hausa is different.
Sokoto Hausa is different.
Even words, proverbs, and pronunciation change from region to region.
This is where Dialectra stands apart.
Instead of focusing only on building an AI that talks, they collect authentic voice data from real Hausa speakers.
Not just people reading text aloud.
They collect how people speak in everyday life, with the pronunciation, proverbs, and dialect differences of different regions.
This matters because AI cannot understand what it has never learned.
If the data it was trained on does not represent real Hausa speakers, then however powerful the model is, it will make mistakes when it meets actual users.
What caught my attention most is that Dialectra is building a foundation, not just another voice AI app.
Today we talk to @ElevenLabs, @Hailuo_AI and other voice AI platforms.
But have you ever thought about what would happen if big platforms like these got access to the high-quality Hausa data Dialectra is collecting?
What would happen if AI could recognize Kano, Katsina, or Sokoto Hausa without getting confused?
What would happen if AI could understand how Hausa people actually speak, not the way textbooks write Hausa?
In my view, this is the main thing that sets Dialectra apart.
It is not just building AI.
It is building the data that will help AI understand Hausa properly.
And that could be a big step for Hausa and other African languages in the world of AI.
A few days ago, we launched Dialect Connect — a simple way for people to have real conversations while contributing to African speech datasets.
Here’s where things stand already:
• 896 total conversation requests
• 703 completed conversations
• 107.4 hours of conversational speech collected
• 12 pending
• 8 active
• 173 rejected
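As a quick sanity check, the four status counts above add up to the total, and a completion rate falls out of them directly. A minimal sketch: the variable names are illustrative, the total hours figure is the 107.4 hours reported above, and the per-conversation average assumes that all collected hours come from completed conversations, which the post does not state.

```python
# Counts copied from the list above; names are illustrative only.
counts = {"completed": 703, "pending": 12, "active": 8, "rejected": 173}
total_requests = 896
hours_collected = 107.4

# Every request should be in exactly one status.
assert sum(counts.values()) == total_requests

completion_rate = counts["completed"] / total_requests
# Assumption: all collected hours belong to completed conversations.
avg_minutes = hours_collected * 60 / counts["completed"]

print(f"completion rate: {completion_rate:.1%}")        # 78.5%
print(f"average length: {avg_minutes:.1f} min per completed conversation")  # 9.2
```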
Alongside this, our corpus reading and transcription workflows have now passed 300,000 voice samples collected from Hausa-speaking contributors across our platform.
What matters to us is not just collecting audio.
The difficult part is what happens after collection.
Every contribution inside https://t.co/4389JazlEJ goes through a structured pipeline:
→ Transcription
→ Annotation
→ Standardization
→ Human verification
→ Approval
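One way to picture the five stages above is as a strictly ordered checklist that each recording must pass through. The sketch below is only an illustration of that idea, not Dialectra's actual implementation: the `Stage` and `Contribution` names, the `audio_id` field, and the rule that stages cannot be skipped are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    # Order follows the pipeline listed above; the class itself is hypothetical.
    TRANSCRIPTION = 1
    ANNOTATION = 2
    STANDARDIZATION = 3
    HUMAN_VERIFICATION = 4
    APPROVAL = 5


@dataclass
class Contribution:
    audio_id: str  # hypothetical identifier for one recording
    done: List[Stage] = field(default_factory=list)

    def next_stage(self) -> Optional[Stage]:
        """Return the first stage not yet completed, or None when finished."""
        for stage in Stage:  # Enum iteration preserves definition order
            if stage not in self.done:
                return stage
        return None

    def complete(self, stage: Stage) -> None:
        """Mark a stage done, refusing to skip ahead or repeat a stage."""
        expected = self.next_stage()
        if stage is not expected:
            raise ValueError(f"expected {expected}, got {stage}")
        self.done.append(stage)

    @property
    def ready_for_training(self) -> bool:
        # Only data that has cleared every stage counts as usable.
        return self.next_stage() is None


sample = Contribution(audio_id="demo-001")
for stage in Stage:
    sample.complete(stage)
print(sample.ready_for_training)  # True
```

Enforcing the order in code means a recording cannot reach approval without a human verification step before it.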
We built this because raw voice recordings alone are not enough to train reliable speech systems.
Models need properly reviewed transcripts, dialect-aware normalization, quality checks, and consistent formatting before the data becomes useful for training.
This is where many African language datasets struggle.
A lot of existing datasets are scraped, weakly labeled, inconsistent, or missing conversational context entirely.
We are trying to approach this differently.
Dialectra is focused on building speech datasets that reflect how people actually speak — accents, dialects, pauses, code-switching, natural conversations, and regional differences included.
For voice AI startups and model builders, this matters more than dataset size alone.
Better infrastructure produces better models.
We’re still early, but it’s exciting seeing contributors across Hausa-speaking communities helping shape what this can become.
More updates soon.
We launched "Dialect Connect" yesterday, and after just 24 hours the stats are really impressive.
A few days ago I had a simple idea: what if we could capture how people actually speak, not just how they read?
I implemented it yesterday as an additional feature for https://t.co/eCLj0KInG2
24 hours later:
📞 371 conversation requests
✅ 303 completed conversations
🎙️ 45.1 hours of conversational speech collected
⏳ 5 pending
🟢 2 active
❌ 61 rejected
For years, most speech datasets have been built around scripted recordings. They are useful, but they only tell part of the story.
Language lives in conversations.
It lives in pauses, interruptions, storytelling, laughter, code-switching, local expressions, and the unique rhythm that makes every dialect different.
The future of voice AI will not be built solely on people reading sentences from a screen. It will be built on authentic human interactions.
That is what excites me most about these numbers.
In just 24 hours, hundreds of people chose to connect with complete strangers or friends and simply talk. In doing so, they generated something incredibly valuable: real-world conversational data for African languages and dialects.
Every completed conversation moves us closer to a future where AI can understand not only what Africans say, but how we say it.
When we started Dialectra, our mission wasn't just to collect voice data. It was to ensure that African languages, dialects, and identities are represented in the AI systems that will power the next generation of technology.
45.1 hours is a small number compared to where we're going.
But it's a reminder that the infrastructure for African voice AI won't be built in a lab alone. It will be built by communities, contributors, and everyday conversations happening across the continent.
We're still very early.
Africa is home to some of the world’s most spoken and culturally influential languages, yet modern AI systems still struggle to understand them accurately.
Hausa alone is spoken by an estimated 80 to 100 million people across West and Central Africa, particularly in Nigeria and Niger. Swahili, widely recognized as Africa’s leading lingua franca, connects more than 200 million speakers across East and Central Africa.
Arabic, one of the continent’s most dominant languages, is spoken by hundreds of millions across North Africa and parts of the Sahel, shaping commerce, education, religion, and communication throughout the region. Yet despite this enormous linguistic scale, African speech remains heavily underrepresented in global AI systems.
That is the gap @_dialectra is stepping in to close: building the speech infrastructure designed to help artificial intelligence truly understand how Africa speaks.
Dialect Connect is still in testing…
Yet we’ve already recorded:
• 300+ calls
• 33+ hours of conversations
All in less than 24 hours 👀
The future of conversational data is exciting.
Dialect Connect is now LIVE
You can now:
🤝 Find an online partner
📩 Send an invite to a friend
🎙️ Join live voice conversations
💰 Earn rewards for every completed session
Choose a topic or simply have a random conversation. Every discussion helps create high-quality conversational datasets for the next generation of African voice AI.
We're excited to see what the community creates.
Try it now: https://t.co/4389JazlEJ
Coming Soon: "Dialect Connect"
One of the most exciting features we've built so far at Dialectra.
Two verified contributors. One topic. A live conversation.
Not scripts.
Not prompts.
Real conversations. Real dialects. Real speech patterns.
We're moving beyond read speech and transcription into authentic voice interactions, the kind of data needed to build the next generation of dialect-aware AI systems.
A big step toward our vision of building the infrastructure layer for African voice AI.
Stay tuned. More updates soon.