Raúl G. Roa Gómez

@rgroag

CEO @ Tagshelf

Plano, TX

Joined August 2012

258 Following

208 Followers

6.9K Posts

rgroag retweeted

Osaurus

@OsaurusAI

3 days ago

AppleScript is the oldest way to drive a Mac. Frontier models butcher it. So we trained two small ones that don't. Open weights. On-device. 100% compile.

486

469

52K

rgroag retweeted

Poolside

@poolsideai

3 days ago

Today we’re releasing Laguna XS 2.1.

It’s a small upgrade to the Laguna XS.2 model: the same 33B total / 3B active MoE, with stronger results on multilingual coding and terminal-style tasks.

Available now on @huggingface, @OpenRouter, and via Poolside API. https://t.co/xb3be90Kb3

259

58K

rgroag retweeted

Satya Nadella

@satyanadella

3 days ago

The future of the firm is a learning loop in which human capital and token capital compound. With our new Frontier Co., our ambition is to help every enterprise build its own AI capability, and to help create a frontier ecosystem where every organization can turn its knowledge, workflows, and judgment into its own AI systems that continuously improve. https://t.co/mvYhkRFyqa

637

rgroag retweeted

Hugging Models

@HuggingModels

2 days ago

Qwythos-9B is a full-parameter reasoning model post-trained on over 500 million tokens of high-quality Claude Mythos / Claude Fable traces with chain-of-thought generated in-house by Empero AI's internal rethink tool. It dominates the base Qwen3.5-9B under matched evaluation (+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex), supports native function calling per the Qwen3.5 spec, and ships with a 1,048,576-token (1M) context window via YaRN rope-scaling enabled by default.


449

397

29K


rgroag retweeted

Yann LeCun

@ylecun

2 days ago

Exactly. I've been disseminating a similar message for years.

The concentration of power in AI and the desire for control is by far the biggest danger of AI. It could lead to a few private companies and/or countries being in control of access to information, access to knowledge, and access to the tools of economic expansion.

It's a kind of medieval obscurantism akin to the Ottoman empire banning the use of the printing press for 200 years, in part to keep control of the dogma, but also to protect the corporation of the calligraphers and scribes.

Relevant historical bits about the Internet:

1. It took a deliberate decision by Al Gore and Bill Clinton to open up access of what was then ARPAnet to commercial entities and to the public, against the desires of the entrenched telecom industry. During a public roundtable about the "information superhighway" in 1993, the CEO of AT&T told Gore and Clinton "leave it to us". Gore said no.
2. In the late 1980s, setting up an Internet presence required buying proprietary hardware with proprietary OS and software stack from Sun Microsystems, HP, IBM, or Dell. By the 2000s, all of this was wiped out by commodity hardware, Linux, Apache, and an entirely free/open software stack. This migration to open platforms was the result of market forces.

Infrastructure wants to be open. Foundation models are becoming an infrastructure and will inevitably become commoditized. Long term, the money is in the application layer, which is what I, Arthur Mensch, Alex Karp, and others have been saying.

364

752

194K

rgroag retweeted

Julian Goldie SEO

@JulianGoldieSEO

7 days ago

Ornith 1.0 is not just another open-source model. It changes how AI agents actually think through work.

Here’s the simple breakdown:

→ It is MIT licensed.
→ It can be used commercially.
→ It has 9B, 35B, and 397B versions.
→ The 9B can run on a laptop.
→ The 35B is 21.2GB at 4-bit quantization.
→ The flagship reportedly scores 82.4 on SWE-Bench Verified.
→ It uses self-scaffolding reinforcement learning.

That last part matters most. Most AI agents need humans to build the workflow around them. Ornith starts building the workflow while solving the task.

Save this video, you’ll understand why AI agents are changing fast. Want the SOP? DM me. 💬

127
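
As a rough check on the 35B figure quoted in the post above, the arithmetic below relates parameter count, bits per weight, and file size. It assumes exactly 35e9 parameters and decimal gigabytes (1 GB = 1e9 bytes), neither of which the post states, and the function names are illustrative only.

```python
def quantized_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(params: float, size_gb: float) -> float:
    """Invert the estimate: bits per weight implied by a given file size."""
    return size_gb * 1e9 * 8 / params


PARAMS_35B = 35e9  # assumption: nominal parameter count, not stated exactly

pure_4bit_gb = quantized_size_gb(PARAMS_35B, 4.0)
implied_bits = effective_bits_per_weight(PARAMS_35B, 21.2)

print(f"pure 4-bit weights: {pure_4bit_gb:.1f} GB")
print(f"bits per weight implied by 21.2 GB: {implied_bits:.2f}")
```

Under these assumptions the quoted 21.2GB corresponds to a little under 5 bits per weight rather than exactly 4. That would be consistent with a 4-bit format that also stores scales or keeps some tensors at higher precision, though the post does not say which format was used.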

rgroag retweeted

vLLM

@vllm_project

6 days ago

🎙️ Serving TTS isn't the same problem as serving an LLM. It has to hit a first-audio budget of a few hundred ms, keep audio continuous across streaming chunks, and sustain enough concurrent streams per GPU to keep serving cost down. It's also a multi-stage pipeline where each stage bottlenecks differently, so no single recipe carries across models. vLLM-Omni TTS team tuned a different lever for each of four TTS models:

🗣️ Qwen3-TTS: decouple connector chunking from the Code2Wav decode window, batch the Stage-0 decode preprocessing. +61.5% audio throughput on H20×2, P99 latency nearly halved.
🌊 VoxCPM2: whole-forward torch.compile, plus CFM/LocDiT decode-tail batching across requests. +172% audio throughput.
🎚️ Higgs Audio V3: move the multi-codebook decode state machine into GPU-resident tensors. 2.7x speedup.
🐟 Fish Speech S2 Pro: a model-specific q_len=1 Triton attention kernel for the pure-decode path.

Full engineering deep-dive on how we picked each lever:
🔗 https://t.co/ZVROwJwYoT

220

129

15K

rgroag retweeted

Mo Elgaraihy

@EngMoElgaraihy

7 days ago

Google is unexpectedly opening its vaults to developers, officially offering 1,000,000 tokens per minute completely free and with zero restrictions. 😳

No credit card needed and no monthly subscriptions; just official, direct access through the Google AI Studio platform to massive computing power that used to cost thousands of dollars a month.

Here are the details of this opportunity and how to make use of it in your next project: 👇

182

291K

rgroag retweeted

Vaibhav Sisinty

@VaibhavSisinty

8 days ago

This is actually wild. Hermes just let you merge any two AI models into one virtual model. 🤯

It is called Mixture of Agents. Here is how it works.

You pick any two models. GPT-5.5 and Claude Opus for example. One runs as the reference, one as the aggregator. Name the combo anything you want. It shows up as a single selectable model in your picker like any other.

Every task, both models run in parallel. The reference analyzes and responds. The aggregator reads that, synthesizes everything, writes the final answer, and handles all tool calls. You see one clean output.

The results on hard agentic tasks:

→ 8% higher than Opus 4.8 alone
→ 11% higher than GPT-5.5 alone

Full Hermes features work untouched. Memory, tool use, skills, long sessions, cross-channel messaging. Nothing breaks. The combo just performs better than either model on its own.

You can mix any providers too. OpenAI, Anthropic, OpenRouter, local models. Whatever you have access to.

981

106

144K
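
The reference/aggregator flow described in the post above can be sketched in a few lines. This is a minimal illustration, not Hermes's actual API: the `Model` type, `mixture_of_agents`, and both stub models are hypothetical names introduced here. The sketch runs the two calls in sequence because the aggregator needs the reference output; how Hermes actually schedules the calls is not described in the post.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def mixture_of_agents(reference: Model, aggregator: Model, task: str) -> str:
    """Get a draft from the reference model, then let the aggregator finalize."""
    draft = reference(task)
    prompt = (
        f"Task:\n{task}\n\n"
        f"Reference answer:\n{draft}\n\n"
        "Synthesize a final answer."
    )
    return aggregator(prompt)


# Stub models so the sketch runs without any provider SDK.
def stub_reference(prompt: str) -> str:
    return "draft: 4"


def stub_aggregator(prompt: str) -> str:
    # A real aggregator would reason over the prompt; this stub only
    # reports whether the reference draft reached it.
    return "final (saw draft)" if "draft: 4" in prompt else "final (no draft)"


result = mixture_of_agents(stub_reference, stub_aggregator, "What is 2 + 2?")
print(result)
```

Keeping both roles behind the same callable signature is what would let any two providers be combined, since the combining function never needs to know which backend it is calling.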

rgroag retweeted

Liquid AI

@liquidai

10 days ago

Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic tasks on phones, robots, home and network automation devices.

> 230M parameters, built on the LFM2 architecture
> Pre-trained on 19T tokens, with a 32K context extension
> Post-trained with distillation from LFM2.5-350M
> 213 tok/s decode speed on Galaxy S25 Ultra (CPU)
> 42 tok/s on a Raspberry Pi 5 (CPU)
> Competes with and often beats models more than twice its size on instruction following, data extraction, and tool use.
> use it for large-scale data extraction pipelines or lightweight on-device agentic workloads.

🧵

194

236K

rgroag retweeted

Ornith

@ornith_

10 days ago

Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.

Ornith-1.0 spans a full range of parameter sizes, including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)

Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎

All models are released under the MIT license, enabling full commercial and research use.

📖 Tech Blog: https://t.co/qT9N2HYWFn
🤗 Huggingface: https://t.co/PRrwqjeBtM

492

rgroag retweeted

Akshay 🚀

@akshay_pachaar

15 days ago

Web scraping will never be the same.

(100% open-source visual search at scale)

PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels.

Why that matters: parsing is where web RAG quietly loses information.

- A single HTML-to-text parser can drop 40%+ of a page.
- Tables, charts, and layout get flattened or thrown out.
- Swapping parsers alone can move accuracy ~10 points on the same docs.

PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA.

The repo also ships a Claude Code plugin that gives Claude eyes. It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like. One setup script. No MCP server, no backend.

How the pipeline works:

- Renders each document (web, PDF, image) to image tiles.
- Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots.
- Builds a FAISS index and serves a search API.

A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels.

Everything is open-source under Apache-2.0.

GitHub repo: https://t.co/qun9TjAdmw

Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x. The article is quoted below.

131

834

12K

929K
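
The retrieval step in the pipeline described above (embed tiles, index them, search by similarity) can be illustrated with a toy brute-force index. This is a sketch under stated assumptions, not PixelRAG's code: the `TileIndex` class and the tile IDs are hypothetical, the hand-written 3-d vectors stand in for real Qwen3-VL-Embedding outputs, and a plain Python loop stands in for FAISS.

```python
import math
from typing import List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class TileIndex:
    """Brute-force cosine-similarity index over screenshot-tile embeddings."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Vector]] = []

    def add(self, tile_id: str, embedding: Vector) -> None:
        self._items.append((tile_id, normalize(embedding)))

    def search(self, query: Vector, k: int = 1) -> List[Tuple[str, float]]:
        q = normalize(query)
        scored = [
            (tile_id, sum(a * b for a, b in zip(q, emb)))
            for tile_id, emb in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy vectors: placeholders for embeddings of rendered page tiles.
index = TileIndex()
index.add("wiki/Paris#tile0", [0.9, 0.1, 0.0])
index.add("wiki/Tokyo#tile0", [0.0, 0.2, 0.9])

top = index.search([1.0, 0.0, 0.1], k=1)
print(top[0][0])
```

In the design the post describes, the retrieved tile image would then be handed to a vision-language reader. Because the index stores only tile embeddings, swapping in a stronger reader would not require rebuilding it.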

rgroag retweeted

Ahmad

@TheAhmadOsman

14 days ago

Local AI hardware = capacity × bandwidth × software stack

- Capacity tells you what fits
- Bandwidth tells you how hard the box can breathe
- The software stack tells you how much of the spec sheet you can actually cash out.

Hardware by Memory Bandwidth
- Mac Studio M3 Ultra: up to 512GB @ 819 GB/s
- RTX PRO 6000 Blackwell: 96GB @ 1792 GB/s
- RTX 5090: 32GB @ 1792 GB/s
- RTX 4090: 24GB @ 1008 GB/s
- RX 7900 XTX: 24GB @ 960 GB/s
- Radeon PRO W7900: 48GB @ 864 GB/s
- AMD Radeon AI PRO R9700: 32GB @ 640 GB/s
- Intel Arc Pro B65: 32GB @ ~608 GB/s
- Tenstorrent Wormhole n300: 24GB @ 576 GB/s
- Tenstorrent Blackhole p150: 32GB @ 512 GB/s + 800G
- MacBook Pro M5 Max: 460-614 GB/s
- MacBook Pro M5 Pro: 307 GB/s
- DGX Spark: 128GB @ 273 GB/s (coherent + CUDA)
- Mac mini M4 Pro: 273 GB/s
- Ryzen AI Max / Strix Halo: ~256 GB/s (~96GB usable GPU)
- MacBook Air M5: 153 GB/s
- Snapdragon X2 Elite: 152-228 GB/s
- Intel Lunar Lake: 136 GB/s
- Snapdragon X Elite: 135 GB/s
- Mac mini M4: 120 GB/s
- Arc Pro B60: 24GB @ ~456 GB/s

Verdict
- GPUs are still the bandwidth kings
- Apple wins: stupid amounts of memory, don’t want to shard across GPUs
- Apple loses: when raw tokens/sec & concurrency matter more
- DGX Spark: coherent memory + NVIDIA stack
- Strix Halo / Ryzen AI Max: first real x86 unified-memory contender
- Tenstorrent: fully OSS stack, excited to see this mature

Fitting ≠ serving

Even if it fits, you still pay for
- bandwidth during decode
- KV cache growth
- dequantization
- batching + concurrency
- scheduler quality
- framework overhead

The only mental model that matters:

1. What must fit?
2. What bandwidth tier do I need?
3. What software stack can actually deliver it?

In short:
- NVIDIA → fastest raw speed
- Apple Studio M3 Ultra → biggest one-box memory
- Strix Halo → first real x86 unified
- DGX Spark → coherent NVIDIA dev appliance
- AMD / Intel Arc → rising alternatives
- Tenstorrent → fully opensource stack

Do ask: “which bottleneck am I buying?”

Not: “which hardware is best?”

264

224K
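
One way to make the "bandwidth during decode" point above concrete is the common rule of thumb that single-stream decode of a dense model has to stream the weights from memory roughly once per token, so memory bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below applies that rule to three bandwidth figures from the list; the 20 GB model size is a hypothetical chosen for illustration, and the estimate ignores KV cache reads, dequantization, and batching.

```python
def decode_upper_bound_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: each decoded token streams the weights once."""
    return bandwidth_gb_s / weights_gb


MODEL_GB = 20.0  # assumption: hypothetical dense model, weights only

# Bandwidth figures (GB/s) as quoted in the list above.
boxes = {
    "RTX 5090": 1792.0,
    "Mac Studio M3 Ultra": 819.0,
    "DGX Spark": 273.0,
}

ceilings = {
    name: decode_upper_bound_tok_s(bw, MODEL_GB) for name, bw in boxes.items()
}
for name, tok_s in ceilings.items():
    print(f"{name}: <= {tok_s:.1f} tok/s")
```

These are ceilings, not predictions. Real throughput sits below them by an amount that depends on the software stack, which is the third factor in the post's framing.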

rgroag retweeted

dharmesh

@dharmesh

18 days ago

Do not infer with AI that which can be queried without.

That's from an internal presentation I gave at HubSpot today.

---
LLMs are great, but there are a *lot* of use cases that are much better handled with a structured query (like SQL). It's much more economical, much faster and predictable.

Just because an LLM can potentially answer a question you have by passing a bunch of unstructured text into the context window doesn't mean you should.
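
A small example of the kind of question the post has in mind: once the data is in a table, an aggregate is a single deterministic query. The `deals` table, its columns, and the rows below are hypothetical and exist only to make the sketch runnable with Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals ("
    "id INTEGER PRIMARY KEY, owner TEXT, amount REAL, stage TEXT)"
)
conn.executemany(
    "INSERT INTO deals (owner, amount, stage) VALUES (?, ?, ?)",
    [
        ("ana", 1200.0, "won"),
        ("ana", 800.0, "lost"),
        ("ben", 500.0, "won"),
    ],
)

# "How much closed-won revenue does each owner have?" is a GROUP BY;
# no text needs to be passed through a context window to answer it.
rows = conn.execute(
    "SELECT owner, SUM(amount) FROM deals "
    "WHERE stage = 'won' GROUP BY owner ORDER BY owner"
).fetchall()
print(rows)
conn.close()
```

The query returns the same answer every time for the same data and does not involve a model call, which is the economy and predictability argument the post makes.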

rgroag retweeted

MiniMax (official)

@MiniMax_AI

20 days ago

M3's free on @0G_labs 0G compute for three days, starting today. Amazing chance to throw a long-running task at it 👇

245

24K

rgroag retweeted

Unsloth AI

@UnslothAI

23 days ago

DiffusionGemma can now run at 2000+ tokens/sec! ⚡ We made local DiffusionGemma inference 1.8× faster. Run it on 18GB RAM via Unsloth Studio. GitHub: https://t.co/aZWYAtakBP Guide: https://t.co/wYLfJWE6kG

186

177K

rgroag retweeted

MiniMax (official)

@MiniMax_AI

23 days ago

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters

Weights: https://t.co/g4Ybfa2kWH
MiniMax Sparse Attention: https://t.co/HcTlWRotG3

113

330

538

691K

rgroag retweeted

NVIDIA AI

@NVIDIAAI

25 days ago

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.

We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface 🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision

Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs

117

326

101K

rgroag retweeted

Windows Developer

@windowsdev

26 days ago

WSL containers ⚡ At #MSBuild, we announced a built-in way to create, run, and interact with Linux containers on Windows. Watch the demo on demand: https://t.co/NVotUyk1U9

727

314

65K

rgroag retweeted

Google Gemma

@googlegemma

25 days ago

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

166

809

962K
