Introducing Harness-1, a 20B search agent trained with a state-externalizing harness.
> frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4
> Context-1-level cost and latency
> externalizes candidates, evidence, verification, and search history
> open-source
I just shipped message-ui: build dynamic iMessage attachments with React.
◆ Charts
◆ Tables
◆ Text primitives
◆ Local preview + PNG export
◆ Tailwind support
◆ Works with Chat SDK
Link ⬇️🧵
Introducing Email SDK
A unified email SDK for sending with any email provider. One clean TypeScript API.
- Multiple providers: Resend, Postmark, SendGrid, Mailgun, Brevo, SMTP, and more
- Built-in fallbacks and retries
- Agent Skill included
- CLI for testing
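The post does not show how the built-in fallbacks and retries work, so here is only a generic sketch of the pattern, written in Python rather than the SDK's TypeScript. Every name in it (`send_with_fallback`, `AllProvidersFailed`, the provider callables) is hypothetical and is not Email SDK's API.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider shape: takes (to, subject, body), returns a message id,
# and raises an exception on failure. Not the real Email SDK interface.
Provider = Callable[[str, str, str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries."""


def send_with_fallback(
    providers: Sequence[Tuple[str, Provider]],
    to: str,
    subject: str,
    body: str,
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    """Try each provider in order; retry each one before falling back."""
    errors: List[str] = []
    for name, send in providers:
        for attempt in range(retries + 1):
            try:
                return name, send(to, subject, body)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < retries:
                    # Exponential backoff between retries of the same provider.
                    time.sleep(backoff_s * (2 ** attempt))
    raise AllProvidersFailed("; ".join(errors))
```

The ordering of `providers` is the fallback priority: the second provider is only contacted once the first has failed `retries + 1` times.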
been working on Harness, a macOS terminal.
built it all from the metal up. no libghostty, no tmux, no cmux - all one custom build.
It’s a terminal built for speed and agentic workflows.
Spent 20 mins faking a perspective mockup in Figma?
Stop.
→ https://t.co/5oNjpwX4IO
Drop in a screenshot, tilt it in 3D, animate it with keyframes, export an MP4.
No login.
No watermark.
No upload.
Just ridiculously good product shots.
Robotics is still data starved. Collecting high-quality robot demonstrations remains brutally slow and expensive.
Introducing COBALT: A cloud-native teleoperation platform designed for large-scale robot learning.
We are democratizing data collection by leveraging the hardware everyone already owns: the smartphone.
All you need to do is download an app (today)!
Read on for more!
Introducing Vocs v2: a minimal docs framework designed for agents and humans.
Flexible docs that stay simple at the source, rich in the browser, and easy for agents to consume.
$ npm init vocs
Excited to release a new repo: abcGPT!
It can be hard to "dial in" the voice you want from an LLM, because an LLM is a tangled superposition of millions of voices from millions of different authors around the world. Instead, frontier LLMs tend to give that slop-ish / generic / corporate tone that's hard to avoid, even with aggressive prompting and an informative context window.
Lately I've been experimenting with some ideas on the fringes of attribution/unlearning, trying to make it so an AI user can "dial in" the specific voice/style/sources they want to use in a way that's more rigorous than prompting/context-engineering, and I'm starting to get pretty good results.
The model below uses the following technique:
- Take nanoGPT as written by @karpathy
- Assign each neuron a random "specialty score" m between 0 and 1, sampled from a U-shape so most neurons land near 0 or near 1 with some in the middle.
- Freeze this "m" for the lifetime of the network (it's the neuron's permanent corpus assignment)
- Extend the forward() code with an α parameter, a kind of vibe-fader from 0 to 1. Think of each neuron's m as its position on that same slider. The slider acts like a spotlight: it lights up neurons whose m is near its current position, and silences those far away. Slide all the way to 0, and only TinyStories specialists fire. Slide all the way to 1, and only Shakespeare specialists fire.
- Train this new nanoGPT on two datasets (in this case, TinyStories and Shakespeare)
- During training, sample α from Beta(0.5, 0.5) AND draw the corpus from Bernoulli(α), so a Shakespeare batch tends to come with a high-α (Shakespeare-favoring) gate, and a TinyStories batch tends to come with low-α.
- Train until golden brown 🧑‍🍳
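The recipe above can be sketched in a few lines of plain Python. This is a toy illustration, not the abcGPT code: the post does not give the exact gate shape or the U-shaped distribution for m, so the Gaussian "spotlight" in `gate`, its `width` parameter, and the use of Beta(0.5, 0.5) for m are all assumptions. Only the training-time sampling (α from Beta(0.5, 0.5), corpus from Bernoulli(α)) comes directly from the post.

```python
import math
import random
from typing import List, Tuple


def sample_specialty(rng: random.Random) -> float:
    """Draw a neuron's frozen specialty score m in [0, 1].

    Assumption: Beta(0.5, 0.5) is used as the U-shape; the post only
    says "sampled from a U-shape".
    """
    return rng.betavariate(0.5, 0.5)


def gate(m: float, alpha: float, width: float = 0.5) -> float:
    """Hypothetical spotlight: 1.0 when m == alpha, decaying with distance."""
    return math.exp(-(((m - alpha) / width) ** 2))


def gated_activations(acts: List[float], ms: List[float], alpha: float) -> List[float]:
    """Scale each neuron's activation by how close its m is to the slider."""
    return [a * gate(m, alpha) for a, m in zip(acts, ms)]


def sample_training_condition(rng: random.Random) -> Tuple[float, str]:
    """Sample alpha ~ Beta(0.5, 0.5), then the corpus ~ Bernoulli(alpha)."""
    alpha = rng.betavariate(0.5, 0.5)
    corpus = "shakespeare" if rng.random() < alpha else "tinystories"
    return alpha, corpus
```

Because the corpus is drawn from Bernoulli(α), Shakespeare batches arrive mostly with high α and TinyStories batches mostly with low α, which is what pushes the neurons with m near 1 or near 0 toward their assigned corpus.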
Perhaps surprisingly... it works! ¯\_(ツ)_/¯ The neurons we pre-assigned to Shakespeare learn to behave as Shakespeare specialists. The neurons we pre-assigned to TinyStories become children's-story specialists. The halfsies learn to bridge between them.
After training, you can play with the kind of... vibe dial... you can "dial in" the voice you want during inference, by choosing whether to lean on Shakespeare or TinyStories neurons more or less. 📀💿
When you fully dial in Shakespeare neurons, the model only outputs tokens which look like Shakespeare, and when you fully dial in TinyStories, the model only outputs tokens which look like children's stories, and... (honestly this was the hard part)... everywhere in between!
In a way, it's partitioning statistical signal into fuzzy segments, and then the end user can choose which pre-training data sources they want to lean upon for generation... and how much.
My goal was to get a version of this working at scale, with clear intuition for why it works, and I'd like to explore ways to scale up this effect to large numbers of sources and larger models, and study the interplay between individuality/generality as scale increases.
Link to repo and a detailed walkthrough of the abcGPT methodology in the reply.
Great read -- all it really takes is:
- a harness
- connectors to your data/tools
- reliable, always-accessible agent(s)
The models have reached the inflection point where it's not more complicated than this.
I've been using state-of-the-art models to teach small models running on my computer how I work.
The result: a personal agent that runs my inbox, my deal pipeline, my blog, my calendar, & my research.
🚀 Introducing SkillOpt — an optimizer for agent skills.
Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter.
Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization.
SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability.
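To make the "bounded edits" idea concrete, here is a toy sketch of text-space optimization, not SkillOpt's actual algorithm. It assumes a skill is a list of instruction lines, that some proposer (an LLM in practice, a stub here) suggests line replacements, and that `max_edits` plays the role of a learning rate by capping how much of the skill changes per step. All names (`bounded_update`, `optimize_skill`, the `Edit` format) are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical edit format: (line index, replacement text).
Edit = Tuple[int, str]


def bounded_update(skill: List[str], proposed: List[Edit], max_edits: int) -> List[str]:
    """Apply at most max_edits of the proposed line replacements."""
    updated = list(skill)
    for idx, text in proposed[:max_edits]:
        if 0 <= idx < len(updated):
            updated[idx] = text
    return updated


def optimize_skill(
    skill: List[str],
    propose: Callable[[List[str]], List[Edit]],
    score: Callable[[List[str]], float],
    steps: int = 5,
    max_edits: int = 1,
) -> Tuple[List[str], float]:
    """Greedy loop: keep a bounded edit only if it improves the score."""
    best, best_score = list(skill), score(skill)
    for _ in range(steps):
        candidate = bounded_update(best, propose(best), max_edits)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

In a real setup `score` would come from running the agent on a batch of tasks and `propose` from summarizing its failures; here both are plain callables so the loop can be exercised with toy functions.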
We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings.
Train the skill, not the model. 🛠️🤖
🌐 https://t.co/zinqcX2wfQ
📄 https://t.co/pCI4VWdpih
Today I launched my project — Arc AI Logistics.
Arc AI Logistics is an AI-agent powered freight coordination demo built with @Circle + @Arc.
The system uses specialized AI agents to analyze shipment opportunities using GPS location, route intelligence, ETA, profitability, and risk analysis — then coordinates a USDC-denominated paid agent run with on-chain proof simulation.
The goal is to explore how stablecoin-native AI agent coordination and nanopayment-style execution can reduce operational costs and improve logistics efficiency.
Live demo: https://t.co/FLXcSRRjj9
Git: https://t.co/CMvbhtGb4R
The goal is defined.
The tasks are clear.
The demand is real.
Now begins the most interesting stage of my life.
Claude Code’s /security-review is just a Skill, and the whole prompt is in this repo
It’s pretty generic and imo you can tailor it to each repo or language you’re scanning to get better results
https://t.co/1a4puZSASL
Supply chain attacks are becoming a massive problem: they've gone from happening every few months to happening every few days.
I think the only real protection against them is a proper virtualisation layer. But it has to be a convenient, natural part of the workflow.
That's why I created a new open source tool. Fend off those attacks and try https://t.co/NasovW29Z5
Still in beta and open to suggestions!
Every engineer should read this.
The principles for building reliable software systems have been around for a long time. Max outlines them beautifully.
Here's to getting that 99.99% on your status page.
https://t.co/HFDcriLodl