TamilLM needs your help. Many folks reached out after my last post offering to help.
We need real Tamil examples from real Tamil speakers: everyday Tamil, formal Tamil, Tanglish, cultural explanations, literary explanations, regional usage, technical explanations, anything natural and useful.
Goal: 50,000 rows.
But honestly, I will take anything I get. Even 50 good examples from you would help.
Template:
https://t.co/tkMOtQDCRU
How to help:
Open the sheet
Download it or make a copy
Fill 5 to 20 rows if you can
Write in your own words
Send it to [email protected]
Please do not paste copyrighted books, lyrics, movie dialogue, private chats, or scraped web content.
If you care about Tamil having a serious open language model, this is one of the most useful ways to help.
This will be part of the dataset I push to @huggingface along with the model when done. 🙏
Building TamilLM from scratch has been humbling.
The hard part is not just training a model. It is everything before that: finding usable Tamil data, OCR’ing old books, separating modern prose from classical text, balancing English/Tamil translation support, avoiding copyright traps, deduping web data, and proving the training stack before spending real GPU money.
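To make the deduping step concrete, here is a minimal sketch of exact-match deduplication for Tamil text rows. It is an illustration under assumptions, not the actual TamilLM pipeline: the function names `normalize` and `dedupe` are hypothetical, and a real web-data pipeline would likely add near-duplicate detection on top of this.

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Canonicalize a row so trivially different copies compare equal."""
    # NFC folds equivalent Unicode sequences into one canonical form,
    # which matters for Tamil combining vowel signs.
    text = unicodedata.normalize("NFC", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


def dedupe(rows):
    """Return rows with exact duplicates (after normalization) removed.

    The first occurrence of each row is kept, in its original form.
    """
    seen = set()
    unique = []
    for row in rows:
        key = hashlib.sha256(normalize(row).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    sample = ["வணக்கம்  உலகம்", "வணக்கம் உலகம்", "நன்றி"]
    for row in dedupe(sample):
        print(row)
```

Hashing the normalized text keeps memory use low on large corpora, since only fixed-size digests are stored rather than the full rows.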
We now have enough tokens for a serious first run, but the work exposed the real challenge: Tamil is not one lane. A good Tamil model needs modern writing, spoken usage, English support, translation, literature, history, and classical grounding.
Also learning the practical side: H100 capacity is scarce, data quality matters more than raw size, and every shortcut becomes expensive later.
Still moving. TamilLM is getting closer.
Eelam Tamil here. Never been a VJ fan. Gotta say…I am impressed. I thought he was a dummy piece in real life. Hi hi.
I am also high asf right now. Peace!!!
Even Namal passed that law exam without the proper qualifications. Yoshitha became a naval cadet officer without meeting the basic requirements. Chichiya (Rohitha) even became an astronaut.
Parents like Mahinda and Shiranthi have, in a way, ruined their own children. No matter what, parents should teach their children values and principles from a young age. In trying to preserve power, they ended up sacrificing their own children. Today, those children are struggling in courts and public forums to defend their lies.
In that respect, Chandrika was an excellent mother. She educated her children and allowed them to build their own comfortable lives away from politics.
More than 200 of the world's elites registered for a retreat whose agenda runs from panels on cult-building and sex to prepping for World War III. An associated app offers matchmaking. https://t.co/ib53DjHHE6