Manuel Alejandro de Brito Fontes @aledbf - Twitter Profile

Manuel Alejandro de Brito Fontes

@aledbf

5 months ago

This allows the use of stock containerd v2.2.0+ and improves qemubox boot time to ~240ms https://t.co/qAACyEHWLp

Manuel Alejandro de Brito Fontes

@aledbf

5 months ago

What if your #containerd snapshotter skipped the host mounts entirely? 🚀

Introducing nexus-erofs: pulls OCI images, converts layers to #EROFS on the fly, merges multiple layers into a single VMDK, and passes it directly to your VM via virtio-blk. The guest handles mounting, with zero host overhead!

🧪 Experimental plugin for #containerd in VM runtimes
GitHub: https://t.co/8fXRqqUPM1

Manuel Alejandro de Brito Fontes

@aledbf

5 months ago

Built qemubox: experimental containerd shim that runs containers in lightweight QEMU/KVM VMs:
- ~300ms boot with full systemd
- Docker works inside the VM
- Snapshot & commit like regular images

Demo: https://t.co/L0e90zw11z
GitHub: https://t.co/GsimqOCgwj

aledbf retweeted

Awni Hannun

@awnihannun

5 months ago

Running OpenCode with MLX and Nemotron 3 Nano locally on an M4 Max is pretty nice. Here's a quick demo:

aledbf retweeted

Alex Cheema

@alexocheema

6 months ago

Total unified memory: 2TB @ 3.2TB/s. Apple Silicon leads in memory / memory bandwidth unit economics. This is what matters for local AI where batch_size is small and workloads are memory-bound.

aledbf retweeted

Awni Hannun

@awnihannun

6 months ago

The latest MLX is out!

And it has a new distributed back-end (JACCL) that uses RDMA over TB5 for super low-latency communication across multiple Macs.

Thanks to @angeloskath https://t.co/254dMxND9W

aledbf retweeted

Jake Tivy

@jakkuh_t

6 months ago

VIDEO on this INSANITY is live!! Watch now! :D https://t.co/Wi4SuWBKQy

aledbf retweeted

Kohei Tokunaga

@TokunagaKohei

7 months ago

LLMlet: P2P distributed LLM inference on browsers with Wasm-compiled llama.cpp + WebRTC

Repo: https://t.co/v0pJciWWxt
Demo: https://t.co/Zq3jCj7fMa

A model that can't fit in a tab can be split and run on multiple browsers. Still experimental and missing parallelism and TURN service. https://t.co/FfyFBPtp7Z

aledbf retweeted

Kubernetes

@kubernetesio

7 months ago

Blog: Ingress NGINX Retirement: What You Need to Know - https://t.co/PforxixCcU #Kubernetes

aledbf retweeted

paco xu

@xu_paco

7 months ago

containerd sandbox runtime using vms https://t.co/FHi4PoHw7w

Manuel Alejandro de Brito Fontes

@aledbf

7 months ago

Wrapping up my nearly five-year journey at Gitpod today! Grateful for all the experiences and the amazing people I've met along the way. On to the next chapter! 🚀

aledbf retweeted

Alex Cheema

@alexocheema

8 months ago

NVIDIA sent us 2 DGX Sparks. For a while we wondered what we would do with them. The memory bandwidth is 273GB/s making it 3x slower than an M3 Ultra (819GB/s) for batch_size=1 inference. But it has 4x more FLOPS (100 TFLOPS compared to 26 TFLOPS). So we thought, what if we could combine the DGX Spark & M3 Ultra, and make use of both the massive compute on the DGX Spark and the massive memory-bandwidth on the M3 Ultra. We came up with a way to split inference across both devices and achieve a speedup of up to 4x for long prompts compared to the M3 Ultra on its own. Full details in the blog post linked below.
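
The ratios quoted in the post above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures stated in the post; the variable names are illustrative, and the reading in the comments (token generation limited by memory bandwidth, prompt processing limited by compute) is a common simplification rather than something the post spells out.

```python
# Figures as quoted in the post (not independently measured).
spark_bw_gbs = 273    # DGX Spark memory bandwidth, GB/s
ultra_bw_gbs = 819    # M3 Ultra memory bandwidth, GB/s
spark_tflops = 100    # DGX Spark compute, TFLOPS
ultra_tflops = 26     # M3 Ultra compute, TFLOPS

# Bandwidth matters most for batch_size=1 token generation (assumed simplification).
bandwidth_ratio = ultra_bw_gbs / spark_bw_gbs

# Compute matters most for processing long prompts (assumed simplification).
compute_ratio = spark_tflops / ultra_tflops

print(f"M3 Ultra bandwidth advantage: {bandwidth_ratio:.2f}x")
print(f"DGX Spark compute advantage: {compute_ratio:.2f}x")
```

The "3x slower" and "4x more FLOPS" figures in the post are these two ratios, rounded.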

aledbf retweeted

Awni Hannun

@awnihannun

9 months ago

The new batch generation in MLX LM is pretty fast. Here's 4 simultaneous generations with Qwen3 4B on my M4 max:

aledbf retweeted

Junyang Lin

@JustinLin610

9 months ago

Qwen3-Next, or to say, a preview of our next generation (3.5?) is out!

This time we try to be bold, but actually we have been doing experiments on hybrid models and linear attention for about a year. We believe that our solution should be at least a stable and solid solution to new model architecture for super long context! GDN plus hybrid is based on a lot of trials and errors, and the implementation of attention gate is something just like a free lunch to get benefits. Moreover, we continue our research on MoE and carefully further increase the sparsity to make it more efficient and effective!

What makes us suffer a lot is that you need to run the whole process of training to evaluate new model architecture, which means pre-training + post-training (notably reinforcement learning). We have proven it working and we release the instruct and thinking models both after RL.

Nevertheless, as this is for the first time that we release something totally new, we are still unsure about what we have done right or wrong, and we need the support from the community. Specifically, many thanks to Hugging Face, vLLM, and SGLang. They have done quite a lot of efforts helping us deliver this new model to you all!

Welcome to try and send us feedback! Hope it is a good start of a new journey 🚗

aledbf retweeted

Matt Beton

@MattBeton

10 months ago

Linear scaling achieved with multiple DeepSeek v3.1 instances. 4x Macs = 4x throughput.

2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec
4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec

DeepSeek V3.1 is a 671B parameter model, so at its native 8-bit quantization it requires ~700GB of memory to run the model. EXO puts half of the layers on each device, combining their memory. EXO uses MLX distributed with TB5 interconnect, optimized for Apple Silicon.

If we need higher throughput, adding two more devices lets us serve more users at once. @exolabs handles all of this seamlessly - adding more devices to the cluster for linear scaling as we need it.

The new EXO 1.0 will be open-source soon™
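
The numbers in the post above can be sanity-checked with a rough calculation. This is a minimal sketch under stated assumptions: one byte per parameter at 8-bit quantization, weights only (the gap up to the quoted ~700GB is assumed to be runtime overhead such as caches), and a hypothetical helper `cluster_throughput` that models the linear scaling described in the post.

```python
# Figures from the post; the helper and variable names are illustrative.
params = 671e9                  # DeepSeek V3.1 parameter count
bytes_per_param = 1             # 8-bit quantization (assumption: 1 byte each)
devices_per_instance = 2        # half of the layers on each device
tok_per_sec_per_instance = 14   # throughput quoted for one instance

# Weights only; the post's ~700GB presumably includes extra runtime memory.
weights_gb = params * bytes_per_param / 1e9
per_device_gb = weights_gb / devices_per_instance


def cluster_throughput(devices: int) -> int:
    """Aggregate tok/sec if every pair of devices serves one model instance."""
    instances = devices // devices_per_instance
    return instances * tok_per_sec_per_instance


print(f"weights: {weights_gb:.0f} GB total, {per_device_gb:.1f} GB per device")
print(f"2 devices: {cluster_throughput(2)} tok/sec")
print(f"4 devices: {cluster_throughput(4)} tok/sec")
```

Note that the scaling here is in aggregate throughput across independent instances; per the post's figures, each instance still generates at 14 tok/sec.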

aledbf retweeted

Rhys

@RhysSullivan

10 months ago

Got inspired so I recreated a demo of this w/ Claude Code & Vercel Sandbox.

Each thread gets their own sandbox to develop in, but if you wanted to they could all use the same sandbox via worktrees.

aledbf retweeted

vLLM

@vllm_project

10 months ago

🚀 Amazing community project!

vLLM CLI: a command-line tool for serving LLMs with vLLM:
✅ Interactive menu-driven UI & scripting-friendly CLI
✅ Local + HuggingFace Hub model management
✅ Config profiles for perf/memory tuning
✅ Real-time server & GPU monitoring
✅ Error logs & recovery

📦 Install in one line:

pip install vllm-cli

GitHub: https://t.co/FnfE0dtZ03

👉 Would you like to see these features in vLLM itself? Try it out & share feedback!

aledbf retweeted

Gergely Orosz

@GergelyOrosz

10 months ago

Something deeply ironic: the startups asking devs to put in 6+ days per week, 80+ hours per week are... AI startups.

You'd assume that the value add of AI could be humans needing to do less work! So devs could spin off agents, go home and sleep. But no, it doesn't work like this.

aledbf retweeted

AI SDK

@aisdk

10 months ago

Stream preliminary tool results

aledbf retweeted

Binyuan Hui

@huybery

10 months ago

We'll continuously enhance the qwen code (cli tool) based on your feedback and even release improved qwen-coder (model)! Our goal is to match Claude Code's performance while remaining fully open-source!
