The Tiff update nobody asked us to announce:
✦ Cleaner, faster UI/UX
✦ Hallucinations down, meaningfully
✦ TTS integrated so Tiff can speak, not just type
✦ TAM fine-tuning pushed deeper into cultural context
We build because it's not good enough yet.
Fewer than 500 people on Earth still speak Ogiek fluently.
Most are over 70.
No younger generation is learning it.
This is not slow decline.
This is a closing window.
🧵
CA-AALM is Trendify's initiative to document,
preserve, and build AI from the ground up on
Ogiek and five other East African community
languages.
Primary field research.
Community-led documentation.
Real cultural data.
Not scraped from the internet.
Everyone is talking about AI for Africa.
Almost nobody is talking about what it actually
takes to build it properly.
Here is what we are doing at @TrendifyLabs with CA-AALM
and why it is harder and more important than most
people realise.
Long Thread 🧵
@TrendifyLabs One thing I want to be clear about:
The community researchers are not assistants.
They are the primary investigators.
Native speakers trained in field methodology,
audio documentation, and annotation.
They conduct the interviews.
They do the transcriptions.
@TrendifyLabs The hardest part is not the model.
It is the annotation.
We built a custom cultural metadata schema
on top of CoNLL-U with fields like:
REGISTER: Elder / Moran / Ceremony / Daily / Taboo
CULTURAL_TAG: Age_Grade / Clan / Proverb / Land
CODE_SWITCH: Swahili / English / Mixed
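A minimal sketch of how tags like these could be checked, assuming they are stored as sentence-level "# KEY = value" comment lines in a CoNLL-U file. That storage choice, the function names, and the sample sentence are hypothetical; only the field names and values come from the post above.

```python
# Allowed values are copied from the post above. Storing them as
# "# KEY = value" comment lines is an assumption, not a confirmed format.
ALLOWED = {
    "REGISTER": {"Elder", "Moran", "Ceremony", "Daily", "Taboo"},
    "CULTURAL_TAG": {"Age_Grade", "Clan", "Proverb", "Land"},
    "CODE_SWITCH": {"Swahili", "English", "Mixed"},
}


def parse_sentence_metadata(block: str) -> dict:
    """Collect '# key = value' comment lines from one CoNLL-U sentence block."""
    meta = {}
    for line in block.splitlines():
        if line.startswith("#") and "=" in line:
            key, _, value = line[1:].partition("=")
            meta[key.strip()] = value.strip()
    return meta


def validate_cultural_tags(meta: dict) -> list:
    """Return a list of problems; an empty list means every tag present is valid."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = meta.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    return problems


# Placeholder tokens only, not real language data.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "# REGISTER = Elder",
    "# CULTURAL_TAG = Proverb",
    "# CODE_SWITCH = Swahili",
    "1\tWORD1\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tWORD2\t_\t_\t_\t_\t_\t_\t_\t_",
])

meta = parse_sentence_metadata(SAMPLE)
problems = validate_cultural_tags(meta)
```

A check like this would catch typos in tag values early, before annotated sentences reach training.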
@TrendifyLabs North Eastern
- Ogiek/Ndorobo - Endangered - Mau Forest
- Nubian (Kinubi) - Diaspora creole - Nairobi
- Maasai Tanzania (Maa TZ dialect) - Arusha
5 language families. 2 countries. 15.5M token target.
Each one is a different engineering challenge.
@TrendifyLabs cultural knowledge base.
The base model knows how to reason.
CA-AALM teaches it what it means to be Ogiek.
We are working with 6 East African communities:
- Kikuyu (Gikuyu) - Bantu - Central Kenya
- Maasai Kenya (Maa) - Nilotic - Rift Valley
- Somali (Af-Somali) - Cushitic -
@TrendifyLabs CA-AALM = Culturally Aware African AI Language Model.
Not built from scratch (that takes $500M+).
Built the right way:
Base: Llama 3 (open-weight)
Fine-tuning: LoRA adapters (rank 64, alpha 128)
Data: 15.5M tokens of primary oral community data
Grounding: RAG layer on verified
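For a rough sense of what rank 64 and alpha 128 imply, here is a back-of-the-envelope sketch. The rank and alpha come from the post above; the 4096 x 4096 weight matrix is a hypothetical size chosen for illustration, since the thread does not say which Llama 3 variant or which layers are adapted.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling: the low-rank update is multiplied by alpha / rank."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """One adapted matrix adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 128   # values stated in the post
HIDDEN = 4096           # hypothetical matrix width, for illustration only

scale = lora_scaling(RANK, ALPHA)
adapter_params = lora_trainable_params(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN
fraction = adapter_params / full_params

print(f"scaling factor: {scale}")
print(f"adapter params per matrix: {adapter_params:,}")
print(f"fraction of the full matrix: {fraction:.4f}")
```

Under that assumed size, each adapted matrix trains about 3% of the weights a full fine-tune would touch, which is part of why adapters are feasible on a small corpus.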
@TrendifyLabs You cannot fix this by scraping more internet text.
The knowledge is not on the internet.
It is with elders in Mau Forest, Rift Valley,
and North Eastern Kenya.
So that is where we went.
@TrendifyLabs They fail basic Gikuyu morphology.
This is not a minor gap. It is a structural failure.
The root cause is data.
~70% of East African cultural knowledge exists
in oral form: never transcribed, never digitised,
never anywhere near a training corpus.
@TrendifyLabs The dirty secret of "African AI":
Most models claiming to serve Africa are just
large English models with Swahili fine-tuning
bolted on top.
They cannot tell you what a Maasai proverb means
in the age-grade context it was spoken.
They hallucinate about Ogiek land rights.
What happens to cultures that were never digitised?
Most AI systems are trained on internet data. But a huge percentage of African knowledge was never written online in the first place. It exists in oral histories, ceremonies, ecological practices, proverbs, and languages spoken