Eddie Storm @vctrstrm - Twitter Profile

Pinned Tweet

16 days ago

The next time a machine guesses exactly what you want, ask yourself whether it succeeded in understanding what you want or whether it succeeded in teaching you what to want.

0

33

Eddie Storm

@vctrstrm

about 18 hours ago

@ID_AA_Carmack There are about 4 levers that can be used: context, harness, authority-boundary, and problem shape. Plenty of degrees of freedom between them to get extremely good performance with small models but DevX isn't there yet to make this easy for people to use.

0

69

Eddie Storm

@vctrstrm

about 23 hours ago

@thdxr

0

28

Eddie Storm

@vctrstrm

1 day ago

@Teknium @lxnmnnn harness polyinstantiation wen?

0

334

Eddie Storm

@vctrstrm

2 days ago

@andrewlekashman <<< this >>> People going around saying "idk what the problem is" are really telling on themselves. Anyone using AI seriously in any domain today is doing enough harness engineering in their loop that they can easily trigger model downgrades and the lobotomy router.

0

1

0

206

Eddie Storm

@vctrstrm

2 days ago

Total BS release, esp. the whole "we will use activation-steering to make the model give you stupid ass suggestions if you work on anything related to AI". Who the hell is working on any domain with heavy use of AI who is not simultaneously doing harness engineering and research that would trip the stupid lobotomy router?

0

1

0

1

338

Eddie Storm

@vctrstrm

3 days ago

@andonlabs "weird moral boundary" - they're activation steering it to hell

0

3

0

1K

Eddie Storm

@vctrstrm

3 days ago

@Polymarket lol, "technically not Mythos", market resolves as "false". No Refunds

0

115

Eddie Storm

@vctrstrm

3 days ago

@arian_ghashghai Series "Cope"

0

97

Eddie Storm

@vctrstrm

5 days ago

The insight is that they are one and the same. There are no generalists, there are only specialists who have grown to take their environment for granted. You always find yourself in the company of specialists, with some more or less specialized in dispatching. - That is probably the closest root to the useful distinction you're hinting at. Spending $100 for an oil change doesn't make you a car mechanic any more than $100 spent on a rotisserie chicken makes you a hunter, a rancher, or a cook. The monetary transaction analogy makes the point obvious, but it is not about money. - It is about the transaction. A transaction happens every time you walk on a paved road. Every time you are able to buy a tool instead of having to figure out how to make one. Every time you boot up a computer despite not knowing how to build one from scratch. - We transact against the momentum of civilization and those who came before us.

0

1

0

31

Eddie Storm

@vctrstrm

5 days ago

@thdxr *disgruntled Nix noises*

0

97

Eddie Storm

@vctrstrm

5 days ago

The notion of storage is useful among cabinets, ledgers, and dry goods. It is a modest servant in the counting house. But at the scale of minds, practices, lineages, and civilizations, there is no mere retrieval. There is only regrowth.

0

17

Eddie Storm

@vctrstrm

5 days ago

Capital allocators and AI users often succumb to the same delusion: that they are "generalists". No. You're dispatchers. Stay humble.

0

17

Eddie Storm

@vctrstrm

5 days ago

@Thedorgy @alive_ as a generalist

1

2

0

65

Eddie Storm

@vctrstrm

5 days ago

@voird33r @docmilanfar Haha, nope. - Had to look that one up. Luckily, I was too young and video games were too much fun.

0

1

0

20

Eddie Storm

@vctrstrm

5 days ago

I found that it depends on the model (and model architecture). Smaller models do not have as much capacity for in-context learning so there is less possibility for adaptation outside of mutating the harness and the shape of the problem. We are presently at the junkyard stage of model embodiment, where the models, by virtue of being trained on human-generated outputs, have to be embodied in a harness that mimics the human environment (shells, commonly used tools, etc.). - I call it the "junkyard stage" because we are assembling harnesses from primitives already found lying around. The biggest hurdle for squeezing more juice out of human-generated data is, currently, not the lack of better data, but the momentum of UX. - The customer pool for the frontier models expects and demands interoperability with their own choice of tools. These are the apocryphal "demands for faster horses" that we have to transcend in order to unlock greater potential. Within the realm of using human-generated data, there is a tremendous unlock hidden in post-training large models to prefer, for example, Python REPL style interaction instead of shell-based interaction. - The unlock comes from the fact that it is much easier to create a software-defined Python REPL (or Jupyter Notebook) than it is to create a fully software-defined facsimile of a Linux OS shell. - The other benefit of this is increased legibility (everything is a call graph), enforcement, and a general control surface over the agent. Finally, it unlocks the ability to begin earnestly evolving the bodies of agents with the problem domains. I don't know if frontier labs will ever be able to make a huge bet on a concept like this at a large model scale. - Maybe they will, once the practices are fully or quasi nationalized and given unlimited budgets.
I'm presently working on demonstrating that at the smaller scale, where embodiment and problem-shaping can allow drawing out more, much more, domain-specific performance than training alone.
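
To make the "software-defined Python REPL" idea above concrete, here is a minimal sketch, not the author's actual harness. The class name `ReplHarness`, the tools `read_file` and `upper`, and the in-memory `files` dict are all hypothetical. It illustrates the two properties the tweet names: legibility (every tool call lands in a call log) and enforcement (only registered tools may be called). The AST check is a toy allowlist under those assumptions, not a security sandbox.

```python
import ast
from typing import Any, Callable


class ReplHarness:
    """Hypothetical REPL-style agent body: the agent sees only registered tools."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        # Legibility: every tool invocation is recorded as (name, args, kwargs).
        self.call_log: list[tuple[str, tuple, dict]] = []
        # Empty builtins, so names like open() or __import__ are not reachable.
        self.namespace: dict[str, Any] = {"__builtins__": {}}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        def logged(*args: Any, **kwargs: Any) -> Any:
            self.call_log.append((name, args, kwargs))
            return fn(*args, **kwargs)

        self.tools[name] = logged
        self.namespace[name] = logged

    def _check(self, tree: ast.AST) -> None:
        # Enforcement: reject imports and any call that is not a registered tool.
        for node in ast.walk(tree):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                raise PermissionError("imports are not allowed")
            if isinstance(node, ast.Call):
                func = node.func
                if not (isinstance(func, ast.Name) and func.id in self.tools):
                    raise PermissionError("call to an unregistered function")

    def run(self, source: str) -> Any:
        """Execute agent code; return the value of a trailing expression, if any."""
        tree = ast.parse(source, mode="exec")
        self._check(tree)
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last = tree.body[-1]
            head = ast.Module(body=tree.body[:-1], type_ignores=[])
            exec(compile(head, "<agent>", "exec"), self.namespace)
            expr = ast.Expression(body=last.value)
            return eval(compile(expr, "<agent>", "eval"), self.namespace)
        exec(compile(tree, "<agent>", "exec"), self.namespace)
        return None


# Usage: a toy "environment" backed by an in-memory dict instead of a real OS.
files = {"notes.txt": "hello"}
harness = ReplHarness()
harness.register("read_file", lambda path: files[path])
harness.register("upper", lambda s: s.upper())

result = harness.run("text = read_file('notes.txt')\nupper(text)")
```

Because the whole environment is ordinary Python objects, swapping the body of the agent for a different problem domain means registering a different set of tools, which is much less work than emulating a shell and its filesystem.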

Philipp Schmid

@_philschmid

6 days ago

My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?

243

419

15

75

58K

0

74

Eddie Storm

@vctrstrm

7 days ago

@Strife212 Consciousness is a vibes-based, nonsense concept. It is practically synonymous with "the subjective feeling of being special and especially aware".

0

2

0

57

Eddie Storm

@vctrstrm

9 days ago

@dhh we just need a serious successor to nix

0

1

0

85
