Ettore Di Giacinto

Verified account

@mudler_it

dad, creator of LocalAI and Kairos, ex @SUSE/@Rancher, ex-Gentoo Dev.

Italy

Joined January 2016

257 Following

3.1K Followers

2.8K Posts

Pinned Tweet

Ettore Di Giacinto

23 days ago

LocalAI ( @LocalAI_API ) 4.2.0 is out, just a few numbers and facts:

- +392 commits (we squash these 😄)
- +11 backends: voice and face recognition, vibevoice.cpp (from me), LocalQVE from @jichiep and among @sgl_project, @__tinygrad__, @no_stp_on_snek's Turboquant, ik_llama.cpp, sam.cpp from @el_PA_B
- Many new QoL improvements, increased sglang and vLLM support, and hardening of distributed mode
- 16+ new contributors! Thanks to the community!

LocalAI is all about giving you the flexibility to run the latest from the community, and ds4 support from @antirez is on its way!

This is the year of Local AI!

Ettore Di Giacinto

about 10 hours ago

@LocalAI_API https://t.co/jkkiRvBXca

Ettore Di Giacinto

4 days ago

parakeet.cpp: native C++/ggml (@ggml_org) inference for @NVIDIAAIDev's Parakeet, one of the best speech-to-text models out there, from the @LocalAI_API team. Every Parakeet model (TDT/CTC/RNNT/hybrid + cache-aware streaming), byte-for-byte identical output to NeMo, now running anywhere with no Python and even a bit faster, on CPU and GPU. Quantized GGUF on @huggingface 🤗 Huge thanks to @ggerganov for ggml and to @NVIDIAAIDev for releasing Parakeet! 🧵

Ettore Di Giacinto

about 10 hours ago

parakeet.cpp now does batched transcription. Decode N clips in one pass and a single GB10 runs up to 12x faster at batch 16. Peak ~1,260 clips/s. CPU sees 3-5x. Same model, bit-for-bit identical output. No accuracy traded for speed.

Ettore Di Giacinto

about 10 hours ago

Already live in LocalAI ( @LocalAI_API ) as dynamic batching. https://t.co/CB7lXI3Txy

Ettore Di Giacinto

about 18 hours ago

Amazing use-case! powered by parakeet.cpp!

about 23 hours ago

moshi 3.2.2 is OUT in App Store with Parakeet. it is really FAST!

Ettore Di Giacinto

about 18 hours ago

@odd_joel nice!

Ettore Di Giacinto

1 day ago

@BarathAnandan7 @ggml_org @NVIDIAAIDev @LocalAI_API Thanks appreciated!

Ettore Di Giacinto

1 day ago

@Pinperepette Thanks!

Ettore Di Giacinto

2 days ago

@Aoomsn The multilingual Parakeet model in parakeet.cpp supports 25 languages; it's written in the model card: https://t.co/2wXiA6ZafA

Ettore Di Giacinto

2 days ago

parakeet.cpp now runs on Apple Metal.

Ettore Di Giacinto

2 days ago

https://t.co/Lcbz1vDNh0

Ettore Di Giacinto

3 days ago

This is actually cool when you have more nodes to distribute your compute to

LocalAI @LocalAI_API

3 days ago

Scaling LLMs across nodes? When a follow-up lands on a replica that never saw your chat, the whole prompt is recomputed and the KV cache wasted.

LocalAI fixes this at the router: cache-aware routing across a mixed fleet of vLLM + SGLang + llama.cpp + ... https://t.co/eFf7Z9zqMw
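
The idea of cache-aware routing can be illustrated with a small sketch. This is not LocalAI's implementation; the `PrefixRouter` class, the `prefix_keys` helper, and the replica names are all hypothetical, and the fallback policy (least-used replica) is an assumption. It only shows the general technique: remember which replica served each conversation prefix, and send follow-ups to the replica that already holds the longest matching prefix, so its KV cache can be reused.

```python
import hashlib
from dataclasses import dataclass, field


def prefix_keys(messages: list[str]) -> list[str]:
    """Return one hash per conversation prefix, shortest prefix first."""
    keys = []
    h = hashlib.sha256()
    for msg in messages:
        h.update(msg.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
        keys.append(h.hexdigest())  # running digest of all messages so far
    return keys


@dataclass
class PrefixRouter:
    """Hypothetical router: maps known chat prefixes to the replica that served them."""
    replicas: list[str]
    owner: dict[str, str] = field(default_factory=dict)  # prefix hash -> replica
    load: dict[str, int] = field(default_factory=dict)   # replica -> requests routed

    def route(self, messages: list[str]) -> str:
        keys = prefix_keys(messages)
        target = None
        # Prefer the replica that already saw the longest prefix of this chat.
        for key in reversed(keys):
            if key in self.owner:
                target = self.owner[key]
                break
        if target is None:
            # Cold conversation (assumed policy): pick the least-used replica.
            target = min(self.replicas, key=lambda r: self.load.get(r, 0))
        # Record every prefix so later follow-ups land on the same replica.
        for key in keys:
            self.owner[key] = target
        self.load[target] = self.load.get(target, 0) + 1
        return target


router = PrefixRouter(replicas=["node-a", "node-b"])
first = router.route(["hello"])
other = router.route(["unrelated chat"])
follow_up = router.route(["hello", "hi, how can I help?", "tell me more"])
```

Here `follow_up` lands on the same replica as `first`, while the unrelated chat goes to the other node. A real router would also need eviction, replica health checks, and token-level rather than message-level prefixes.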

Ettore Di Giacinto

3 days ago

@ivanfioravanti

mudler_it retweeted

3 days ago

what a wonderful project: parakeet.cpp https://t.co/idw7t2y106 GGML based parakeet inference pipeline that's 2x faster than my ONNX parakeet pipeline on Apple Silicon! (Needed a few local patches to get it going)

Ettore Di Giacinto

3 days ago

@badlogicgames Thank you! really appreciated coming from you!

Ettore Di Giacinto

3 days ago

@sky_bolt20907 @antirez replying with AI? why folks. just be genuine

Ettore Di Giacinto

3 days ago

and I really mean it. Every day we now fight:

- Lots of security reports which aren't valid (some are, but they get buried in the mix)
- AI-automated PRs which look legit, but when you look at the details you realize nothing was really well put together, or only very superficially. And even when asked for fixes on the PR, the author just goes away (why open it then?)
- And then harassment, and GitHub does nothing about it. Here's the last one that I received: https://t.co/ZICcymfrTl

Ettore Di Giacinto

4 days ago

yeah that's very bad. People taking up pitchforks are gonna push away OSS maintainers even more. It's already becoming barely sustainable: thousands of bad security reports, GitHub issues with attacks and violent phrasing, and fake automated PRs with little to no thought put into them.

mudler_it retweeted

3 days ago

This is a great find. Qwen3.6-35B-A3B APEX by @mudler_it is surprisingly speedy for a 32GB Mac Mini M2 Pro. It is able to fix basic Scala unit tests; prefill starts at 400 tk/s and drops to 120 by 32k ctx; tg is around 25 tk/s, dropping to 13 tk/s. Not uber fast, but for a device not made for AI this is fantastic. Client: Mistral Vibe.

./llama-b9434/llama-server -hf mudler/Qwen3.6-35B-A3B-APEX-MTP-GGUF:I-Compact --spec-type draft-mtp --spec-draft-n-max 2 -fa on -ngl all --host 0.0.0.0 --port 8080 -c 70000 --parallel 1 --no-warmup -b 2048 -ub 2048 -ctk q4_0 -ctv q4_0

There's also an I-Nano quant available [https://t.co/kL9StcxrXb] which is 11.7 GB in size (!! - might work for those on 16GB VRAM)
