“If we use technology to replace our thinking, we’ve lost the plot” -Creative Technologist @bycjacksonn “Good AI implementation means less time on screens, not more” #txedFest
Deep inner suffering inevitably arises when the human person is reduced to performance, consumption, or a statistical datum. Many young people today live under the yoke of expectations to perform, immersed in an exasperated competitiveness that generates anxiety, fear of not measuring up, and disorientation.
@cryptopunk7213 new tech is exciting, but I think this is like suggesting, if you give everyone a basketball they could be in the nba.
tools don’t make world class stories, people do. and exceptional original IP is a deeply human/scarce asset that has nothing to do with the ubiquity of tools
Welcome to internet fren. If you're new here, please pull up a chair and let me share some jewels with you.
Don't worry, I'm not high and mighty, I was once like you after all.
But you see, there's laws in this dimension, some that transcend the ones we have in the physical world. They may not be as precisely written as the ones we know, but they definitely exist.
One of those laws is that no information is ever 100% secure. If there exists a logical (or even sometimes illogical) path to any given piece of information, then there is a real, if unpredictable, chance that said information will be accessed given enough time.
Closely related is the Streisand law of this dimension. The more visibly you try to suppress a piece of information, the more attention you draw to its existence, and the more curious minds start looking for it. You can hide something in plain sight, or you can make a loud fuss about hiding it. Rarely do both work.
And the last jewel for today, the one that ties all the others together: every security system is really a trust system wearing a technical mask.
Firewalls, access tokens, SSO, vendor agreements, the whole apparatus of enterprise security, all of it rests on human beings underneath.
People get tired, people take shortcuts, people trust the wrong colleague, push to the wrong branch, stand up a vendor environment that's good enough instead of correct, and assume nobody will notice because nobody has noticed yet.
The technical layer is always downstream of the human one, which is why the most sophisticated breaches rarely involve sophisticated code.
https://t.co/rWYvlTLs0W
Today is a big day! We're launching a ~ new ~ version of Claude Code in the desktop app. It's been redesigned from the ground up for parallel work and is a lot faster.
It's been my main way to use Claude Code for the last few weeks.
Today, Sierra is releasing Ghostwriter, our agent for building agents. With Ghostwriter, you can create an AI agent for your customer experience — one that can chat, pick up the phone, speak dozens of languages, take action on your systems of record, and be protected with industry-leading guardrails — simply by having a conversation. No clicking, no forms, no menus.
Codex and Claude Code have transformed how we build software, making it possible for software engineers to orchestrate and review the work rather than doing all the work themselves. We think the same transformation will happen for all software. Rather than every enterprise app having a web app for humans and an API for automation, every software platform’s UI will be an agent that can do the work on your behalf.
I recorded a demo of my building and optimizing an agent with Ghostwriter so you can see how powerful and easy it is to use. It’s completely changed the way our early adopters build agents, and it’s changed the way I think about the software industry. Let me know what you think, and, if you’re interested in trying it out at your business, please reach out directly.
Running a company is just context engineering internally.
Now that skill has even more value in the agentic world. Us tech founders have been doing reps to prepare for this.
thinking: products that help humans get credit for the work accomplished by agents they supervise in the enterprise will have better adoption than agentic solutions that do the work instead of humans.
credit feeds ego, drives adoption...and accountability.
If you have to ask yourself, "Kaleo wtf is this post? What does Dune have to do with Bitcoin? There's zero correlation." - you obviously don't have enough spice in your life.
We're shipping a new feature in Claude Cowork as a research preview that I'm excited about: Dispatch!
One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work.
To try it out, download Claude Desktop, then pair your phone.
A single rulebook for how the United States polices digital-asset markets: SEC and CFTC. One of the major hurdles until full tokenization. What does this mean for you and your USD? …and your child’s USD… https://t.co/q88DRD65mH
👋 Roughly, the more tokens you throw at a coding problem, the better the result is. We call this test-time compute.
One way to make the result even better is to use separate context windows. This is what makes subagents work, and also why one agent can cause bugs and another (using the same exact model!) can find them. In a way, it’s similar to engineers — if I cause a bug, my coworker reviewing the code might find it more reliably than I can.
In the limit, agents will probably write perfect bug-free code. Until we get there, using multiple uncorrelated context windows tends to be a good approach.
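To make the "uncorrelated" point concrete, here is a toy simulation, not a measurement of any real model. It assumes each reviewer misses a given bug with some fixed probability (`miss_prob`, a made-up number) and compares reviewers that share one context (their misses are perfectly correlated) with reviewers in separate contexts (their misses are treated as fully independent, which is an idealization). The function name and parameters are hypothetical.

```python
import random


def escape_rate(miss_prob, n_reviewers, correlated, trials=100_000, seed=0):
    """Fraction of bugs that every reviewer misses, in a toy model."""
    rng = random.Random(seed)
    escaped = 0
    for _ in range(trials):
        if correlated:
            # Shared context: one draw decides the outcome for all reviewers.
            missed = rng.random() < miss_prob
        else:
            # Separate contexts: each reviewer gets an independent draw,
            # and the bug escapes only if all of them miss it.
            missed = all(rng.random() < miss_prob for _ in range(n_reviewers))
        escaped += missed
    return escaped / trials


if __name__ == "__main__":
    print("shared context:   ", escape_rate(0.3, 2, correlated=True))
    print("separate contexts:", escape_rate(0.3, 2, correlated=False))
```

Under these assumptions the escape rate falls from roughly `miss_prob` to roughly `miss_prob ** n_reviewers`. Real context windows backed by the same model will sit somewhere between the two extremes.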
I can assure you the simultaneous mainstreaming of self-tuning by Karpathy and the efforts of Anthropic and OAI to automate PRs and security reviews are not coincidental
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement). This will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
https://t.co/WAz8aIztKT
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
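A minimal sketch of the loop described above: propose a change, score it on a cheap proxy, and only promote it to the expensive evaluation if the proxy improved. None of this is the actual autoresearch code. The names `autoresearch`, `propose`, `eval_small`, and `eval_large` are hypothetical, the random perturbation stands in for an agent's idea, and the two quadratic "losses" are synthetic placeholders for small-model and large-model validation loss.

```python
import random


def autoresearch(propose, eval_small, eval_large, config, steps, seed=0):
    """Greedy search: keep a change only if it helps at both scales."""
    rng = random.Random(seed)
    best_small = eval_small(config)
    best_large = eval_large(config)
    kept = []
    for _ in range(steps):
        candidate = propose(config, rng)
        small = eval_small(candidate)
        if small >= best_small:
            continue  # no gain on the cheap proxy, so discard early
        large = eval_large(candidate)  # promote: pay for the expensive check
        if large < best_large:
            config, best_small, best_large = candidate, small, large
            kept.append(candidate)
    return config, best_large, kept


def propose(config, rng):
    """Stand-in for an agent's idea: nudge one knob at random."""
    key = rng.choice(sorted(config))
    new = dict(config)
    new[key] = config[key] + rng.gauss(0.0, 0.1)
    return new


def eval_small(c):
    # Synthetic proxy loss (arbitrary units), cheap to evaluate.
    return (c["lr"] - 1.0) ** 2 + (c["wd"] - 0.5) ** 2


def eval_large(c):
    # Synthetic "larger scale" loss with a slightly shifted optimum.
    return (c["lr"] - 0.9) ** 2 + (c["wd"] - 0.5) ** 2


if __name__ == "__main__":
    start = {"lr": 0.0, "wd": 0.0}
    final, loss, kept = autoresearch(propose, eval_small, eval_large, start, steps=200)
    print(f"kept {len(kept)} changes, large-scale loss {eval_large(start):.3f} -> {loss:.3f}")
```

The early `continue` is the point of the proxy: most candidates are rejected before the expensive evaluation runs. A swarm version would run many such loops in parallel and share the kept changes.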