Andrei @abetlen - Twitter Profile

Pinned Tweet

over 2 years ago

Multimodal support in llama-cpp-python powered by LLaVA 1.5 API is compatible with the new gpt-4-vision-preview model and supports JSON mode responses Github: https://t.co/5YzNSCDG7u Docs: https://t.co/mOc945q9qM

abetlen's tweet photo. Multimodal support in llama-cpp-python powered by LLaVA 1.5

API is compatible with the new gpt-4-vision-preview model and supports JSON mode responses

Github:
https://t.co/5YzNSCDG7u

Docs:
https://t.co/mOc945q9qM https://t.co/5HY52gPgGi

8

245

48

161

47K

Andrei

@abetlen

about 3 hours ago

Colab Link: https://t.co/Mm6wtcPVrm

0

1

79

Andrei

@abetlen

about 3 hours ago

btw you can run Gemma 4 QAT using llama.cpp in a T4 Google Colab notebook (link in reply)

Google for Developers

@googledevs

about 18 hours ago

Gemma 4 quantization-aware training (QAT) models are now available, bringing AI performance directly to edge devices and consumer GPUs. These checkpoints are optimized with quantization-aware training to dramatically reduce memory requirements and unlock high-speed local inference. 🧵

googledevs's tweet photo. Gemma 4 quantization-aware training (QAT) models are now available, bringing AI performance directly to edge devices and consumer GPUs. These checkpoints are optimized with quantization-aware training to dramatically reduce memory requirements and unlock high-speed local inference. 🧵

23

762

87

201

52K

1

3

0

209

abetlen retweeted

Georgi Gerganov

@ggerganov

19 days ago

llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! https://t.co/vjaMwEpIaR

48

1K

183

532

273K

Who to follow

Eric Hartford

@QuixiAI

We make AI models Dolphin and Samantha BTC 3ENBV6zdwyqieAXzZP2i3EjeZtVwEmAuo4 https://t.co/3ri2GbXrQB https://t.co/zH0F3pTjjY @dphnAI

Jon Durbin

@jon_durbin

Human. Backend dev https://t.co/CJYvkACyne

Georgi Gerganov

@ggerganov

24th at the Electrica puzzle challenge | building https://t.co/baTQS2bdia | engineer @huggingface

Andrei

@abetlen

28 days ago

@vikhyatk they're calling him the jack kerouac of vlms

0

1

0

84

abetlen retweeted

Georgi Gerganov

@ggerganov

2 months ago

llama.cpp at 100k stars now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄 Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on the project and the state of AI from the perspective of local applications. There is a lot to say and discuss and yet it feels less and less important to try to make a point. Opinions about viability of local LLMs are strongly polarized, details are overlooked, the scientific approach is lacking. Arguments are predominantly based on vibes and hype waves. One thing is clear though - local LLMs are used more and more. I expect this trend to continue and likely 2026 will end up being one of the most important years for the local AI movement. I admit that I didn't expect the agentic era to come so quickly to the local LLM space. One year ago, the available models were too computationally expensive for doing long-context tasks. There wasn't an obvious path towards meaningful agentic applications. The memory and compute requirements were huge. Last summer, with the release of gpt-oss, things started to change. It was the first time we saw a glimpse of tool calling that actually works well within the resource constraints of our daily devices. Later in the year, even better models were released and by now, useful local agentic workflows are a reality. Comparing local vs hosted capabilities at a given moment of time is pointless. To try put things into perspective: - We don't need frontier intelligence to automate searches and sending emails - We don't need trillion parameter models to be able to summarize articles or technical documents - We don't need massive GPU data centers to control our home appliances or turn the lights off in the garage I believe that there is a certain level of intelligence we as humans can comprehend and meaningfully utilize to improve our working process. Beyond that level, access to more intelligence becomes unnecessary at best and counterproductive at worst. I also believe that that level of useful artificial intelligence is completely within reach locally and it has always been just a matter of implementing the right software stack to bring it to the end user. With llama.cpp, I am confident that we continue to be on the right track of building that software stack! The llama.cpp project is going stronger than ever. With more than 1500 contributors, the project keeps growing steadily. From technical point of view, I think that llama.cpp + ggml is the only solution that actually makes sense. That is, the software stack must run efficiently on every possible device, hardware and operating system. The technology is too important to be vendor-locked. It has to be developed in the open, by the community, together with the independent hardware vendors. This is the only right way to build something that will truly make a difference in the long run. I won't try to convince you about what is currently and will be possible with local AI. We will just continue to build as usual. I am confident that after the smoke clears and we look objectively at what we have built together, the benefits will be obvious to everyone. Big shoutout to all llama.cpp maintainers. I feel extremely lucky to be able to work together with so many talented contributors. Every day I learn something new and I feel there is so much more cool stuff that we are going to build. Also, I am really thankful that the project continues to have reliable partners to support it! Cheers!

ggerganov's tweet photo. llama.cpp at 100k stars

now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄

Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on the project and the state of AI from the perspective of local applications. There is a lot to say and discuss and yet it feels less and less important to try to make a point. Opinions about viability of local LLMs are strongly polarized, details are overlooked, the scientific approach is lacking. Arguments are predominantly based on vibes and hype waves.

One thing is clear though - local LLMs are used more and more. I expect this trend to continue and likely 2026 will end up being one of the most important years for the local AI movement.

I admit that I didn't expect the agentic era to come so quickly to the local LLM space. One year ago, the available models were too computationally expensive for doing long-context tasks. There wasn't an obvious path towards meaningful agentic applications. The memory and compute requirements were huge. Last summer, with the release of gpt-oss, things started to change. It was the first time we saw a glimpse of tool calling that actually works well within the resource constraints of our daily devices. Later in the year, even better models were released and by now, useful local agentic workflows are a reality.

Comparing local vs hosted capabilities at a given moment of time is pointless. To try put things into perspective:

- We don't need frontier intelligence to automate searches and sending emails
- We don't need trillion parameter models to be able to summarize articles or technical documents
- We don't need massive GPU data centers to control our home appliances or turn the lights off in the garage

I believe that there is a certain level of intelligence we as humans can comprehend and meaningfully utilize to improve our working process. Beyond that level, access to more intelligence becomes unnecessary at best and counterproductive at worst. I also believe that that level of useful artificial intelligence is completely within reach locally and it has always been just a matter of implementing the right software stack to bring it to the end user.

With llama.cpp, I am confident that we continue to be on the right track of building that software stack!

The llama.cpp project is going stronger than ever. With more than 1500 contributors, the project keeps growing steadily.

From technical point of view, I think that llama.cpp + ggml is the only solution that actually makes sense. That is, the software stack must run efficiently on every possible device, hardware and operating system. The technology is too important to be vendor-locked. It has to be developed in the open, by the community, together with the independent hardware vendors. This is the only right way to build something that will truly make a difference in the long run.

I won't try to convince you about what is currently and will be possible with local AI. We will just continue to build as usual. I am confident that after the smoke clears and we look objectively at what we have built together, the benefits will be obvious to everyone.

Big shoutout to all llama.cpp maintainers. I feel extremely lucky to be able to work together with so many talented contributors. Every day I learn something new and I feel there is so much more cool stuff that we are going to build. Also, I am really thankful that the project continues to have reliable partners to support it!

Cheers!

278

3K

398

2K

455K

abetlen retweeted

Fulcrum

@fulcrum_inc

2 months ago

🚨 We're open-sourcing Druids, a library for coordinating and deploying coding agents across machines. Our beta users have used Druids to work on open math problems, conduct ML "autoresearch," and make software faster.

3

227

31

228

26K

Andrei

@abetlen

4 months ago

@ggerganov Congratulations!

1

2

0

123

abetlen retweeted

Jared Duker Lichtman

@jdlichtman

9 months ago

This past July, @AlexKontorovich and Terry Tao announced progress of a “medium” Prime Number Theorem proof, which was a very exciting result. However, it faced technical difficulties in complex analysis related to the Riemann zeta function, which has been notoriously tricky for Lean. Today I’m very glad to announce a ~25K line Lean proof of a “strong” Prime Number Theorem, by establishing the classic zero-free region for the Riemann zeta function. Github: https://t.co/eDRrzquyBY

jdlichtman's tweet photo. This past July, @AlexKontorovich and Terry Tao announced progress of a “medium” Prime Number Theorem proof, which was a very exciting result. However, it faced technical difficulties in complex analysis related to the Riemann zeta function, which has been notoriously tricky for Lean.

Today I’m very glad to announce a ~25K line Lean proof of a “strong” Prime Number Theorem, by establishing the classic zero-free region for the Riemann zeta function.

Github: https://t.co/eDRrzquyBY

3

192

17

73

22K

abetlen retweeted

Math, Inc.

@mathematics_inc

9 months ago

Gauss is entering beta testing with select mathematicians. 🔗 Blog: https://t.co/BvLOAB1JSs 🔗 GitHub: https://t.co/0Y3IeVrvEw 🔗 Early access: https://t.co/ieajQbD40E

5

262

30

82

37K

abetlen retweeted

Math, Inc.

@mathematics_inc

9 months ago

Today we're announcing Gauss, our first autoformalization agent that just completed Terry Tao & Alex Kontorovich's Strong Prime Number Theorem project in 3 weeks—an effort that took human experts 18+ months of partial progress.

81

3K

467

1K

1M

Andrei

@abetlen

9 months ago

really happy to announce i'm in sf now fulltime on an o1 visa! thank you @jessemhan @morph_labs and @lisawehden for making this happen also, a big thank you to @natfriedman @PhilCulliton and @julien_c

abetlen's tweet photo. really happy to announce i'm in sf now fulltime on an o1 visa!

thank you @jessemhan @morph_labs and @lisawehden for making this happen

also, a big thank you to @natfriedman @PhilCulliton and @julien_c https://t.co/pUuZSplAQW

16

77

4

7

8K

abetlen retweeted

ggml @ggml_org

10 months ago

https://t.co/thFbq7kCq7

2

222

31

74

119K

Andrei

@abetlen

12 months ago

Incredibly proud of this work by the @morph_labs team!

1

18

1

2

1K

abetlen retweeted

Christian Szegedy

@ChrSzegedy

12 months ago

A mathematical paper autoformalized for the first time: amazing work by @morph_labs, presented today at the Big Proof conference by @jdlichtman and @jessemhan. I am very impressed by the blazing fast progress of the morph team. Especially by @LeyanPan and @critic_model.

6

224

34

75

33K

abetlen retweeted