Dan Kubb

Verified account

@dkubb

Mission, BC, Canada

Joined May 2007

2.6K Following

1.9K Followers

9.3K Posts

Pinned Tweet

9 months ago

What would a project, framework or language need to provide to maximize effectiveness for AI? I’m thinking: a large corpus of working examples, the ability to establish fast feedback loops via strong types, good testing culture, good linters/formatters, well written user docs and plentiful examples. My gut says something like https://t.co/SFcL1b5VJX would probably help too. IME the more guardrails I establish the more effective the LLMs are because there are fewer degrees of freedom for them to get lost in. The faster they can identify when they’ve left the golden path e better the results.

4

14

0

5

2K

about 2 hours ago

0

2

0

0

453

1 day ago

@justfielding Maybe the code was too enterprisey and it was like clippy “I see we are writing Java style Rust, let me help you with that!”

0

0

0

0

16

1 day ago

I made GPT lose it's mind in my Codex App. Its last message (see screenshots) was to generate an image of a Java cheatsheet. I did not ask it to generate an image. I asked it to generate Rust code. And then it tried to compact over and over and even if I stop and restart or recompact it cannot break out of this loop.

dkubb's tweet photo. I made GPT lose it's mind in my Codex App.

Its last message (see screenshots) was to generate an image of a Java cheatsheet. I did not ask it to generate an image. I asked it to generate Rust code.

And then it tried to compact over and over and even if I stop and restart or recompact it cannot break out of this loop.

1

3

0

1

288

Who to follow

Verified account

prolific open-source contributor. former 100x engineer; now replaced by an LLM. early eng team @cursor_ai and @github

Verified account

CEO, @qltysh, building Fabro

Christian. Husband. Father. Leader. Speaker. Mentor. One-word-sentence-writer. Hyphenator. Loves you. Empowering people at @GustoHQ. He/him.

1 day ago

I hope someday @Apple will focus on writing software that is as good as its hardware. It'll be like Goku shedding his weights: https://t.co/j4xWCY9Hrh

1

2

0

0

105

1 day ago

I could watch a whole series of lectures like this. I find this far more engaging than watching a slide-deck. The imperfect nature, and the natural pauses allows me to focus more on the material and absorb it rather than passively watching a polished presentation with picture-perfect slides.

1

2

0

0

634

2 days ago

To be fair, if someone is a 0.01x developer a 100x would bring their productivity to that of an average dev. I actually think this is what is happening in a lot of cases: non or really bad devs are finally producing a “normal” amount of code and are having the time of their lives.

1

2

0

0

122

2 days ago

One thing I do is set up a goal in a file, and then I tell Codex to use the goal file, and have Claude set to wake up every N minutes to review the commits and "steer" Codex by updating the goal file with flags it finds. Codex is also told about the arrangement and to re-check the goal file every time it completes a task. I literally do: /goal /tmp/goal.md This way I get Codex and Claude working together, with Claude watching Codex like a hawk. I'll also tell Claude to make sure Codex doesn't reward hack, and tell Codex to be wary of Claude too.

0

12

0

7

1K

3 days ago

There is definitely a change in cache usage. Around a week ago my 20x started draining 3-5x faster than before; with no change in prompt or project. I have been using goals with 5.5 low that drain 20-50% of my weekly quota in 24 hours. These are rust projects with long build times in between inference so it's not like it's hammering OpenAI servers or anything. There are multiple reports all across X confirming this. A 155h goal is a lot, but that doesn't mean it was all run consecutively. This is just how it reports things.

2

3

0

2

835

3 days ago

@zeeg @ThePrimeagen As I read that book a few times I was like “thats a very logical way to break that problem down and solve each sub problem in parallel”. I was not surprised af all to learn the author is a programmer.

0

2

0

0

63

6 days ago

@niner_by_nature @thsottiaux They just hired @pvncher from @RepoPrompt, which I hope means the Codex plan mode is going to become best-in-class soon.

2

0

0

0

78

7 days ago

I'm shocked that the ruby community hasn't latched onto mutant in the age of LLMs. There are no better techniques available to Ruby devs go make their tests better and their agents more effective than mutation testing. Seems like such a massive waste of time and money to not use it.

1

1

0

0

96

8 days ago

@joshmo_dev Agent harnesses should have the option to track every single event in such a way that you could do a perfect reconstruction of a session and what happened. IMHO the quality of engineering in a harness should be closer to a database than a todo list app 😁

0

3

1

0

233

8 days ago

@JacobRothfield @doodlestein Haha, well it's fun challenge, but it's not for a lab. My harness allows a SOTA model and open weight models to work together. There are a few modes, some where it could be used for distillation, or offloading of some tasks, or best of N, etc.

0

1

0

0

61

9 days ago

@JacobRothfield @doodlestein > mitigate LLM failures that are baked in by the training I know I've seen this but it must be very frequent for you to call this out. What have you seen? (asking because I am working on a harness right now)

2

1

0

1

337

9 days ago

@0xblacklight @RhysSullivan I noticed Opus doing that sometimes with its comments before bash calls.

1

0

0

0

183

9 days ago

@CWood_sdf Spoiler: All programmers have atrocious memories, and the ones who don't think this applies to them are the worst. Functional programming was made for people who are honest with themselves about their capabilities.

0

10

0

0

440

9 days ago

I would guess they can't move and experiment as fast as third party harness. Also, I don't think RL is as strong of an anchor as people think. The LLM has no idea what harness it's talking to; all it does is produce tokens and tool calls that the harness parses. There's no reason a third party harness couldn't perfectly emulate the interfaces exactly, but chain and combine things in novel ways.

0

1

0

0

533

9 days ago

@cavempty_ The fact that that this kid is getting upset about this advice, and posting about it online, makes it 100x funnier for the Dad.

0

115

0

0

5K

9 days ago

@DavidKPiano @lgrammel I know this somewhat contradicts with your statement about “go off and come back when you're done” but I'm thinking about it more for microtasks not entire trajectories that add a bunch of context. Just offload the fiddly bits while keeping the mainline context higher level.

0

0

0

0

41

9 days ago

Another benefit is that you can hand off tasks like “fix the failing lint or twst” to the forked subagent. The iterating required to make it pass doesn't really add to the conversation, so the subagent can just work on it and then return success without cluttering context. A lot of what we put in the context window doesn't really benefit the LLM on future turns, it just adds noise.

1

1

0

0

59

Last Seen Users on Sotwe

Trends for you

Most Popular Users