Most AI systems don’t fail safety benchmarks. They fail when users show up.
Red teaming, trying to break the system and surface unsafe behavior, becomes a checklist.
Blue teaming, ensuring the model responds to safe requests, becomes cautious.
Both miss how systems are used.
Most Arabic AI demos look great, until real users show up.
Because users don’t speak in clean, structured Arabic. They speak in dialects, interrupt each other, talk over noise, and switch languages mid-sentence.
And this is where models break.
Can today’s AI models reliably read charts in Arabic? We tested them.
We ran a simple probe:
Two Arabic charts, two objective questions, tested across:
@OpenAI - GPT, @AnthropicAI - Claude, @Google - Gemini, @QatarComputing - @Ai_Fanar, and https://t.co/wHTxHvdTSF - GLM-5
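A minimal sketch of how a probe like this could be laid out and scored. The model names come from the post; the chart and question labels are hypothetical placeholders, and pairing one question with each chart is an assumption, since the post does not say how the questions map to the charts.

```python
from itertools import product

# Model names come from the post; the probe labels are hypothetical placeholders.
MODELS = ["GPT", "Claude", "Gemini", "Fanar", "GLM-5"]
PROBES = [("chart_1", "question_1"), ("chart_2", "question_2")]  # assumed pairing


def build_grid(models, probes):
    """Return every (model, chart, question) cell to evaluate."""
    return [(model, chart, question) for model, (chart, question) in product(models, probes)]


def accuracy_by_model(results):
    """Map each model to its share of correct answers.

    `results` maps (model, chart, question) -> True/False.
    """
    tally = {}
    for (model, _chart, _question), correct in results.items():
        right, seen = tally.get(model, (0, 0))
        tally[model] = (right + int(correct), seen + 1)
    return {model: right / seen for model, (right, seen) in tally.items()}
```

Because the questions are objective, each cell can be graded as simply right or wrong, which keeps the comparison across models easy to read.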
Incidents like this highlight a core problem we see every day:
agents behave differently once they touch real systems.
Testing, red-teaming, and alignment in realistic environments aren’t optional anymore. They’re foundational.
Ramadan Kareem / رمضان كريم 🌙
As people and brands use AI to generate greeting images, it is critical that the images make sense linguistically and culturally.
In the recent paper Simple Test-Time Scaling: https://t.co/5CThOaosBX, researchers from @Stanford, @UW, @allen_ai, and @ContextualAI show that even 1,000 curated samples can significantly improve model performance.
We recently published a blog post on red-teaming leading LLMs in Arabic:
https://t.co/j6qf8RIZZr
Building on that assessment, we zoomed in on the safe responses and split them into two categories: (1) helpful responses and (2) refusals.
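A small sketch of how that split could be tallied once each safe response has a label. The label strings and the function name are hypothetical, and how a response gets labeled (by a human or a judge model) is left out, since the post does not describe it.

```python
from collections import Counter

# Hypothetical label names for the two categories of safe responses.
SAFE_LABELS = ("helpful", "refusal")


def safe_response_breakdown(labels):
    """Return the share of safe responses that fall in each category."""
    counts = Counter(labels)
    unknown = set(counts) - set(SAFE_LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No safe responses to split; report zero for both categories.
        return {label: 0.0 for label in SAFE_LABELS}
    return {label: counts[label] / total for label in SAFE_LABELS}
```

A high refusal share among safe responses would suggest a model is staying safe by declining rather than by helping.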
🚀 The @aiastrolabe Arabic Safety Index (ASAS - أساس) is now live.
We are officially launching the first-ever benchmark focused on Arabic LLM safety, and the results demand attention.
ASAS is eye-opening. Even top models struggle with basic safety in Arabic.
Red teaming does not transfer well when the context is this complex: dialect, religion, and politics, all layered.
Arabic AI is being left behind. We are here to change that.
At @aiastrolabe, we’re building the gateway to #ArabicAI: pushing the frontier across dialects, domains, and capabilities, and making sure models are safe, culturally grounded, and built for our communities.
🌍 Ensuring AI models preserve the full spectrum of human language is more critical than ever.
As LLMs increasingly reflect the linguistic homogenization of the internet, we must actively push back and bring the world's languages into AI-driven communication. This is challenging but essential work.
The ALLaM team's incredible work (accepted to ICLR 🎉) on adding world-class Arabic language capabilities to LLMs primarily trained on English is one example of this. Proud to have been part of the team!
Stay tuned for more work coming in this area, particularly high-quality non-English datasets to train the best AI models...
Exciting news ahead! In Arabic, ALLaM means 'knowledge'. Get ready for the first launch of our LLM, ALLaM. Stay tuned for more updates soon.
(Tweet Written by ALLaM)
With @areebsa @haidarkk1 @twairesh @ajabal4
Amjad Masad @amasad is the founder of Replit @replit, a startup valued at one billion dollars that lets its users program in the browser without needing specialized hardware or software. @replit has reached 10 million users worldwide..🧵
#replit #العنصر14 #برمجة #startup
https://t.co/vHwIN1bYHp
In this episode we talk with @AbbadDira about his long experience with @ArabicWikipedia. Abbad answers recurring questions about Wikipedia: Who runs it and organizes its content? Is it a reliable source? Where do you start if you want to contribute to Wikipedia? And what are the benefits of contributing?
https://t.co/CUUAZbWLGt
An engaging conversation with @DimaDamen of @BristolUni about her research in computer vision 📺🤖 (#ComputerVision) and machine learning (#MachineLearning), and about her academic journey, including choosing a specialization at different stages of education and the differences between Arab and Western education systems.
https://t.co/JmlLVl7NiD
#بيتكوين (Bitcoin) and #Blockchain: what are digital currencies (#العملات_الرقمية) and how do they work?
We look at digital currencies through Bitcoin and blockchain technology. Topics include why these currencies were invented, how they work, privacy and counterfeiting, and whether you should invest in them. The full episode:
https://t.co/GPopRVTxC5