Pete Hodgson (@thepete.net on bluesky)

9 days ago

@jonathannorris @tracesdotcom @tarunsachdeva @devtoolsTO I love https://t.co/7PC1p4trj6! Jealous you got an in-person demo.

ph1 retweeted

Joe Walnes

@joewalnes

23 days ago

Modern macOS contains a fully local inference model. No network calls, stays fully on device. Here's a single-file script to turn it into an OpenAI API compatible completions server: https://t.co/07ieY8ASKu

I make things that nobody's asking for. I used to post things that nobody was asking for, but then I stopped. Learn about me at https://t.co/U3CFodk0J6

24 days ago

@shubhamJReacts @mattpocockuk I think you have it backwards. Markdown is interpreted pretty permissively, but HTML way more so. HTML is probably the most permissively interpreted file format out there. Renderers and parsers will wade on no matter how malformed it is.

28 days ago

@techgirl1908 100% agree on "just give it more context" being unhelpful. But I remain skeptical on automatically managed memories, until I see compelling results. I'm not ready to trust the quality of context being injected behind the scenes, at least when it comes to coding agents.

ph1 retweeted

30 days ago

the funniest thing about the token grift is that most folks who pushed token burn in Q1 are now having a falling out with their CFOs, because they don’t have a metric that correlates to business outcomes

Inputs -> outputs -> outcomes

If you can’t measure revenue, measure KPIs
If you can’t measure KPIs, measure customer outcomes
If you can’t measure customer outcomes, measure task throughput (features, tickets, bugs)
If you can’t measure task throughput, measure work throughput (PRs)
If you can’t measure PRs, measure LOC
If you can’t measure LOC, measure tokens

if you’re a leader and you’re not focused on improving your ability to measure things that matter, you’re cooked

about 2 months ago

You cannot outsource the need for tasteful judgement. There are times you don't need it (when a good-enough decision is fine), and in those situations you should be using an LLM every time. But when thoughtful design decisions pay dividends, you still need an experienced human.

about 2 months ago

You cannot outsource the thinking

ph1 retweeted

Andrej Karpathy

@karpathy

about 2 months ago

Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes, I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state-of-the-art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state-of-the-art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state-of-the-art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR: the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

ph1 retweeted

Matt Pocock

@mattpocockuk

3 months ago

Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat a 1M context window any differently. It's still 100K of smart, and 900K of dumb.

ph1 retweeted

boris

@boristane

3 months ago

slop creep is what happens when you turn your brain off and hand the thinking to coding agents

each individual change is fine, but all together, you have a pile of crap

we're witnessing this happen in real-time across everything https://t.co/2hkDx8RAhE

3 months ago

I designed a simple little thing and printed it and use it in my home.

Some random people in other parts of the world needed the same thing too. They printed it, and now they use it in their homes.

That's nice. https://t.co/0eSz99IreP

3 months ago

Being an Old, I have a bit of nostalgia for The Good Old Days of OSS where you shared a thing and maybe some people used it, and there wasn't any influencing or fancy websites or weird drama. It's nice to rediscover that vibe in the 3D printing community...

3 months ago

@thesamparr https://t.co/zJBHSzLkqg

3 months ago

@0xblacklight Amazing write-up! Can I steal your subagent context window visualization for a presentation (w. credit!)? Also FYI in "Distributing Tools with Skills" you say you can't package MCPs, scripts etc. in a skill. It's true, but Claude Code's plugins solve exactly for that.

3 months ago

Great summary of the things you need to know to succeed with agentic coding (in early 2026 🫠)

Kyle Mistele 🏴‍☠️

@0xblacklight

3 months ago

new blog post just dropped come get your excalidrawslop

3 months ago

Amen!

3 months ago

the most powerful but also misunderstood/misused lever you have is subagents

ph1 retweeted

3 months ago

Here’s what’s gonna happen:
- you replace your code review with feedback loops (sentry, datadog, support tickets, etc)
- you stop reading the code
- software factory fixes everything
- one day something breaks at 3am, agent can’t fix it
- nobody’s read the code in 3 months
- you have 3 weeks of downtime trying to re-onboard and fix it
- you lose significant % of your contracts and users
- your company is now dead

ph1 retweeted

dax

@thdxr

3 months ago

sent this to the team today

everything great comes from being able to delay gratification for as long as possible

and it feels like we're collectively losing our ability to do that

3 months ago

@GergelyOrosz What percentage of code in large enterprises and public sector do you think is written by an LLM? My educated guess would be 25%, optimistically.