Mohammed Moussa

@binmosa

مهندس برمجيات، مهتم في تطوير تطبيقات الذكاء الاصطناعي، Ph.D in AI @MonashUni, AI Engineering, MLOPs & Products.

Kingdom of Saudi Arabia

Joined January 2009

2K Following

800 Followers

1.5K Posts

Mohammed Moussa @binmosa

5 days ago

Just use a robust UI library like @hero_ui , you really don’t need to go that hard!

Kevin

@kvnkld

6 days ago

https://t.co/fAoLZH9CAB

962

85K

binmosa retweeted

واس العام

@SPAregions

4 months ago

أمنٌ يُدير الحشود بكفاءة. #واس_رمضان47

163

374

857K

Mohammed Moussa @binmosa

7 months ago

Proud of the grind, the commits, and the journey 🚀💻 #2025wrapped

Mohammed Moussa @binmosa

7 months ago

@wesley_bytegrad I am running a full e-commerce platform where backend is Laravel & frontend is Nextjs, they’re seamlessly integrated and performing just right.

Who to follow

Dr. Eng. Abdelhamid Zabrmawi

@AZabrmawi

Ph.D. BME, PMP® مهتم بالبحث والتطوير والابتكار ، قائد إدارة البحث والتطوير والابتكار @bmet_sa (وماتوفيقي إلا بالله عليه توكلت وإليه أنيب)

#أخطبوط_المايك مقدم حفلات خطابية ومعلق صوتي وصانع محتوى وكاتب سيناريوهات مؤلف كتاب #أجمل_المفردات_في_تقديم_الحفلات [email protected] 966500713826

binmosa retweeted

Avi Chawla

@_avichawla

8 months ago

Microsoft. Google. AWS. Everyone's solving the same problem for Agents: How to build a real-time context layer for Agents across dozens of data sources? Airweave is an open-source context retrieval layer that solves this! Learn how this layer differs from RAG below:

_avichawla's tweet photo. Microsoft.
Google.
AWS.

Everyone's solving the same problem for Agents:

How to build a real-time context layer for Agents across dozens of data sources?

Airweave is an open-source context retrieval layer that solves this!

Learn how this layer differs from RAG below: https://t.co/hqycgs5eoq

801

109

113K

Mohammed Moussa @binmosa

8 months ago

@salahkhashoggi اذا تحتاج طرق متقدمة وحلول بالامكان المساعدة.

355

binmosa retweeted

Andrej Karpathy

@karpathy

over 1 year ago

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency? Grok explanation is ~close: “Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path. People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next. It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”

50K

36K

11M

binmosa retweeted

Andrej Karpathy

@karpathy

8 months ago

Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here: https://t.co/fz1AMI5kqk This is done via a new synthetic task `SpellingBee` that generates examples of a user asking for this kind of a problem, and an ideal solution from an assistant. We then midtrain/SFT finetune on these to endow the LLM with the capability, or further train with RL to make it more robust. There are many details to get right especially at smaller model sizes and the guide steps through them. As a brief overview: - You have to ensure diversity in user prompts/queries - For small models like nanochat especially, you have to be really careful with the tokenization details to make the task easy for an LLM. In particular, you have to be careful with whitespace, and then you have to spread the reasoning computation across many tokens of partial solution: first we standardize the word into quotes, then we spell it out (to break up tokens), then we iterate and keep an explicit counter, etc. - I am encouraging the model to solve the model in two separate ways: a manual way (mental arithmetic in its head) and also via tool use of the Python interpreter that nanochat has access to. This is a bit "smoke and mirrors" because every solution atm is "clean", with no mistakes. One could either adjust the task to simulate mistakes and demonstrate recoveries by example, or run RL. Most likely, a combination of both works best, where the former acts as the prior for the RL and gives it things to work with. If nanochat was a much bigger model, you'd expect or hope for this capability to more easily "pop out" at some point. But because nanochat d32 "brain" is the size of a ~honeybee, if we want it to count r's in strawberry, we have to do it by over-representing it in the data, to encourage the model to learn it earlier. But it works! :)

karpathy's tweet photo. Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here:
https://t.co/fz1AMI5kqk

This is done via a new synthetic task `SpellingBee` that generates examples of a user asking for this kind of a problem, and an ideal solution from an assistant. We then midtrain/SFT finetune on these to endow the LLM with the capability, or further train with RL to make it more robust. There are many details to get right especially at smaller model sizes and the guide steps through them. As a brief overview:

- You have to ensure diversity in user prompts/queries
- For small models like nanochat especially, you have to be really careful with the tokenization details to make the task easy for an LLM. In particular, you have to be careful with whitespace, and then you have to spread the reasoning computation across many tokens of partial solution: first we standardize the word into quotes, then we spell it out (to break up tokens), then we iterate and keep an explicit counter, etc.
- I am encouraging the model to solve the model in two separate ways: a manual way (mental arithmetic in its head) and also via tool use of the Python interpreter that nanochat has access to. This is a bit "smoke and mirrors" because every solution atm is "clean", with no mistakes. One could either adjust the task to simulate mistakes and demonstrate recoveries by example, or run RL. Most likely, a combination of both works best, where the former acts as the prior for the RL and gives it things to work with.

If nanochat was a much bigger model, you'd expect or hope for this capability to more easily "pop out" at some point. But because nanochat d32 "brain" is the size of a ~honeybee, if we want it to count r's in strawberry, we have to do it by over-representing it in the data, to encourage the model to learn it earlier. But it works! :)

177

425

574K

Mohammed Moussa @binmosa

8 months ago

@hlslyuri Keynote only with material colors

binmosa retweeted

Andrej Karpathy

@karpathy

8 months ago

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI. It weighs ~8,000 lines of imo quite clean code to: - Train the tokenizer using a new Rust implementation - Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics - Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use. - SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval) - RL the model optionally on GSM8K with "GRPO" - Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI. - Write a single markdown report card, summarizing and gamifying the whole thing. Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc. My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.

karpathy's tweet photo. Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:

- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.

682

24K

18K

Mohammed Moussa @binmosa

9 months ago

تحديثات جديدة من OpenAI تفتح آفاق جديدة لأتمتة الكثير من الأعمال، خاصة لأصحاب المشاريع الصغيرة والناشئة. #الذكاء_الاصطناعي_التوليدي

OpenAI Developers

@OpenAIDevs

9 months ago

Introducing AgentKit—build, deploy, and optimize agentic workflows. 💬 ChatKit: Embeddable, customizable chat UI 👷 Agent Builder: WYSIWYG workflow creator 🛤️ Guardrails: Safety screening for inputs/outputs ⚖️ Evals: Datasets, trace grading, auto-prompt optimization

290

11K

binmosa retweeted

Thinking Machines

@thinkymachines

9 months ago

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! https://t.co/tJsgxgBuWo

thinkymachines's tweet photo. Introducing Tinker: a flexible API for fine-tuning language models.

Write training loops in Python on your laptop; we'll run them on distributed GPUs.

Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!

https://t.co/tJsgxgBuWo

240

789

Mohammed Moussa @binmosa

9 months ago

@justbuilding It is a set of all the traditional automation techniques that are wrapped around LLM.

Mohammed Moussa @binmosa

10 months ago

@lunarphp EC2 aws via Forge

159

Mohammed Moussa @binmosa

10 months ago

@levelsio Use Laravel forge and easily manage all of your sites & VPS (it supports Hetzner).

binmosa retweeted

Lunar @lunarphp

10 months ago

The wait is over. Lunar v1.0.0 has landed. 🚀 A powerful e-commerce platform, built for @laravelphp developers who want more. Start building today ! https://t.co/ooirL9sOwn

lunarphp's tweet photo. The wait is over. Lunar v1.0.0 has landed. 🚀
A powerful e-commerce platform, built for @laravelphp developers who want more.

Start building today !

https://t.co/ooirL9sOwn https://t.co/0iSN2LFc1k

169

Mohammed Moussa @binmosa

11 months ago

@svpino Currently, I’m building a multi-agent flow with ADK, initially I used gemini-2.0-flash and the agents were doing great, once I updated their models to gemini-2.5-flash, the flow broke , they started showing new behaviors not as expected, nothing else changed, only the model!

binmosa retweeted

Alex Vacca

@itsalexvacca

11 months ago

Before AWS existed, one company ran the servers for Twitter, LinkedIn, and Facebook's entire app ecosystem. They owned Node.js, invented containers 8 years before Docker, and Peter Thiel even backed them. Then something happened...

237

20K

10K

Mohammed Moussa @binmosa

11 months ago

@namd1nh This is totally a random hierarchy, it doesn’t reflect how the sub-fields are actually related.

112

Mohammed Moussa @binmosa

11 months ago

What makes a team truly creative? R. Florida’s 3 Ts model breaks it down: Technology fuels growth, Talent brings entrepreneurs and artists, Tolerance welcomes diverse ideas and people. Add Time to build deep trust and lasting collaboration. #Creativity #Innovation #TeamDynamics

Mohammed Moussa

@binmosa

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users