In response to questions from our previous tweet, we are sharing a behind-the-scenes view of the same task.
This video shows the MenteeBot’s head camera view in the top-left, along with its “thoughts” and decision-making output in the bottom-left, offering a direct view of the perception and planning that drive the box-handling task performed by two MenteeBots.
Full unedited 18-minute video here: https://t.co/OvlO2g6uwo
#humanoidrobots #PhysicalAI #AI
LLMs Don’t Think Like Developers - Until Now.
Together with @KatzShachar and @liorwolf, we made LLMs execute their code while generating it, just like a human developer.
Meet EG-CFG: A new inference-time method that injects real-time execution feedback into the generation loop.
The new method guides the LLM toward code that doesn’t just look right, but actually works.
📈 SOTA across top code generation benchmarks:
• MBPP and MBPP-ET (@GoogleResearch): 96.6%, 73.0%
• HumanEval-ET (@OpenAI): 87.19%
• CodeContests (@DeepMind): 58.18%
No ChatGPT. No Gemini. No Claude.
Only an open-source model: DeepSeek-V3. @deepseek_ai
Outperforms leading methods built on closed models.
#AI #Coding #LLMs #CodeGeneration
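For readers asking what "execution feedback in the generation loop" can look like, here is a minimal Python sketch of the general idea only. It is not the EG-CFG implementation: `toy_propose` is a hypothetical stand-in for an LLM, `run_candidate` and `generate_with_feedback` are illustrative names, and the loop simply re-prompts with test results rather than steering token-level generation.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected result)


def run_candidate(source: str, func_name: str, tests: List[TestCase]) -> Tuple[int, str]:
    """Execute candidate source and return (tests passed, feedback text).

    Note: exec() on untrusted model output is unsafe; a real system
    would run candidates in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as exc:
        return 0, f"{type(exc).__name__}: {exc}"

    func = namespace.get(func_name)
    if not callable(func):
        return 0, f"{func_name} is not defined"

    passed = 0
    problems = []
    for args, expected in tests:
        try:
            result = func(*args)
        except Exception as exc:
            problems.append(f"{func_name}{args!r} raised {type(exc).__name__}")
            continue
        if result == expected:
            passed += 1
        else:
            problems.append(f"{func_name}{args!r} returned {result!r}, expected {expected!r}")
    return passed, "; ".join(problems) or "all tests passed"


def generate_with_feedback(
    propose: Callable[[str], List[str]],
    func_name: str,
    tests: List[TestCase],
    max_steps: int = 5,
) -> Tuple[str, int]:
    """Ask for candidates, run them, and feed the results into the next request."""
    feedback = ""
    best_source, best_passed = "", -1
    for _ in range(max_steps):
        for source in propose(feedback):
            passed, message = run_candidate(source, func_name, tests)
            if passed > best_passed:
                best_source, best_passed, feedback = source, passed, message
        if best_passed == len(tests):
            break
    return best_source, best_passed


def toy_propose(feedback: str) -> List[str]:
    """Hypothetical 'model': buggy first attempt, corrected once it sees feedback."""
    if not feedback:
        return ["def add(a, b):\n    return a - b\n"]
    return ["def add(a, b):\n    return a + b\n"]


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    source, passed = generate_with_feedback(toy_propose, "add", tests)
    print(f"passed {passed}/{len(tests)} tests")
    print(source)
```

The point of the sketch is that candidates are judged by what they do when run, not by how plausible they look, and the observed behavior is passed back to the generator on the next step.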
📄🚨 New!
Tired of waiting minutes for LLMs to "think"?
Test-time scaling (o3, DeepSeek-R1) lets LLMs reason before answering, but users are left in the dark, with no progress indicator and no control.
Not anymore!
We expose the LLM’s internal 🕰️, and show how to monitor 📊 & overclock it⚡
🧵👇
MenteeBots are mentored by people. On their first day in a new place, the owner or a co-worker needs to show them around. This means following a person through an unfamiliar environment while avoiding obstacles.
@yoavgo Thanks for the comment. This was a talk for the general public that summarizes 7 works in 20 minutes, in accessible terms. I'd be happy to visit sometime and face academic criticism. As for thinking like a scientist, that is an agenda we have had for a decade, and I'm proud of the progress https://t.co/BnGlyTaj0p