Paul Paczuski @plpxsk - Twitter Profile

plpxsk retweeted

@dottxtai

about 1 month ago

Tool calling is at the core of agentic systems, but it is so brittle with OSS models that retries are baked into every pipeline. Our new product dotlambda guarantees that tool calls execute 100% of the time. 65% token savings, up to 20% accuracy 📈 👉 https://t.co/rmfLDickQZ

Paul Paczuski @plpxsk

3 months ago

@karpathy @nummanali https://t.co/NUAlj1c00p seems to fit the bill

plpxsk retweeted

Awni Hannun

@awnihannun

3 months ago

The obvious reasons intelligence-per-watt is going up so fast: more efficient architectures, more efficient hardware, and higher-quality data.

The less obvious reason: finding the right balance between what should be stored in the model's weights and what can be computed through tool use, reasoning, and potentially other types of in-context learning.

A simple example: in the earlier LLM days, it was quite likely that for simple arithmetic (e.g., adding two numbers), the model had to basically memorize tuples of (inputs, op, outputs). You can imagine this took up a lot of room in the weights. With reasoning, the model can compute this in its chain-of-thought. With tool calling, the model can compute this with a tool call. In both cases it saves a lot of space in the weights.

I'm sure there is a floor on the smallest LLM that can have, say, GPT-5.x quality. But that floor could be 5B, or it could be 100B, and I don't think anyone really knows because of the above effects. In other words, we can probably go much further with a 5B-15B model with exceptional tool calling and reasoning.

Paul Paczuski @plpxsk

3 months ago

Worst part about new agentic coding editors? Unfamiliar keyboard shortcuts. Anthropic's seem particularly strange: meta-P to switch models in Claude Code 🤔 What's your worst culprit?

Paul Paczuski @plpxsk

3 months ago

@beaversteever To be fair, coding is not the only RAG use case. See NotebookLM.

Paul Paczuski @plpxsk

3 months ago

@pie6k When I need inspiration, I do the same - just skip the https://t.co/PUPeViMpv8 part

Paul Paczuski @plpxsk

3 months ago

@dotta Zero human business. Who makes the money?

Paul Paczuski @plpxsk

3 months ago

@rauchg … as the CLI author says: “I built a CLI for Google Workspace — agents first”

Paul Paczuski @plpxsk

3 months ago

@rauchg Agents ❤️ CLIs

plpxsk retweeted

Peter Gostev

@petergostev

3 months ago

Bullshit benchmark: how good are LLMs at detecting nonsense questions and pushing back?
- Latest @AnthropicAI models are doing well, including Haiku
- @Alibaba_Qwen Qwen 3.5 and @Kimi_Moonshot Kimi K2.5 are pretty decent too
- @OpenAI and @GoogleDeepMind are middle of the pack, which is not great for mainstream models
- Lots of other slightly older and smaller models engage with 70%+ of bullshit questions

Paul Paczuski @plpxsk

3 months ago

@steipete @VibeTunnel GitHub Desktop “Plus” now integrates worktrees seamlessly and works out of the box. It found my Cursor/Claude Code worktrees without a problem. See https://t.co/jzL7nyJuJF

plpxsk retweeted

Тsфdiиg

@tsoding

4 months ago

Wait, this sounds incredibly useful! Can we just have a model with 0 entropy, 0 hallucinations, that just acts like a retrieval database over its training dataset? Also sounds like a great way to solve the traceability problem. Why don't the AI labs just make something like that?

Paul Paczuski @plpxsk

4 months ago

@miguelgrinberg Love the 2x2 model of thinking about errors: New vs Bubbled-Up & Recoverable vs Non-Recoverable

Paul Paczuski @plpxsk

8 months ago

@vikhyatk Here, they go further than simply producing "novel" candidates: "the model’s in silico prediction was confirmed multiple times in vitro [lab]". This seems good. Still, many in-vitro experiments fail later in the process. But they're still published and "advance science".

Paul Paczuski @plpxsk

about 1 year ago

Next time I get asked which AI I choose, I will align myself with “the tastemakers” and give the reason in partial French: "consumers—particularly the tastemakers—are drawn to its certain _je ne sais quoi_ in conversation and thoughtful design” https://t.co/pXIawNGfd3

Paul Paczuski @plpxsk

over 2 years ago

@GaelVaroquaux @scikit_learn For example, `LogisticRegressionCV()` is my favorite (and most useful) single line of ML code, anywhere – although big shout-out to `train_test_split()` ;)

Paul Paczuski @plpxsk

over 2 years ago

@GaelVaroquaux @scikit_learn Big reason for its success is the simplicity and ease of use. That's the kind of marketing real users love.

plpxsk retweeted

Alonso Silva @alonsosilva

almost 3 years ago

LLM generating random numbers

plpxsk retweeted

Brad Neuberg

@bradneuberg

about 3 years ago

One form of future shock is being paralyzed by new tech shown every week. Learn to put that in background, & focus on shipping real products using tech of today, with an eye towards future ways to transform what you’re doing. Ship today, transform what you’re doing tomorrow.

Paul Paczuski @plpxsk

about 3 years ago

@RohanAlexander Browse your Google Sheets and export as CSV. Then go. Ask your students to try the same.
