wow, amazing tech report. lots of details on every part of the pipeline, especially on data. love that they share the system design of how they train models and do research with their "model factory", and also the negative results from M1 and how they fixed them in XS.2
one of the best tech reports to get up to speed on model training
I had a lot of fun talking about what we are building at @paywithsoap , the future of AI enabled payments infra, how are we thinking about stablecoins, and more! Thanks for having me @BettingStartups
π¨ New episode alert
@jesselearmonth is joined by @FilipMichalsky from @paywithsoap, an AI-native orchestration layer designed to simplify the complex payments stack for real-money gaming operators.
Full episode π
βΆοΈ YouTube: https://t.co/clQQYvnUvA
βͺοΈ Apple: https://t.co/3s0voDwU9y
π’ Spotify: https://t.co/cv2SKy3kF4
π¨ New episode alert
@jesselearmonth is joined by @FilipMichalsky from @paywithsoap, an AI-native orchestration layer designed to simplify the complex payments stack for real-money gaming operators.
Full episode π
βΆοΈ YouTube: https://t.co/clQQYvnUvA
βͺοΈ Apple: https://t.co/3s0voDwU9y
π’ Spotify: https://t.co/cv2SKy3kF4
Yess the mythos moment for form filling is finally here - do you know how many financial forms we need to fill out on daily basis as a payments infra company? Excited for this @browserbase
Do you understand what Browserbase just open-sourced???
an agent that learns any website once, then does the job 10x cheaper forever
[ literally how it helps me ]:
- writing scrapers for new sites (used to spend half a day per site, every single time)
- chasing selectors when sites redesign on a tuesday (lost weeks to this)
- digging out hidden APIs buried in network traffic (gave up on this too many times)
- explaining to my team HOW the agent does the job (was impossible until now)
Autobrowse figured all of that out by itself in 3-5 iterations
and saved the answer as a markdown file the next agent reads BEFORE it starts
[ how it actually works ]:
> give the agent a real task on a real site
> it tries, fails, learns, tries again
> 3-5 rounds and it converges on a path that just works
> writes that path down as SKILL.md
> next agent loads it and skips straight to the answer
the markdown file IS the memory
every browser agent before this had AMNESIA
figured out the site, then forgot the second the session closed
you've been paying the same discovery tax 100 times in a row.. and not noticing
[ Karpathy's auto-research idea, but applied to the web ]:
same idea, just different approach
Karpathy did it for research and coding loops
Autobrowse does it for the open web
the new part:
Karpathy's loop got smarter inside ONE session
Autobrowse SAVES the lesson into a file the next agent reads before it even starts
iteration = graduation
the agent doesn't just learn.. it leaves a note for every agent that comes after
[ the math ]:
Craigslist scrape:
- generic agent loop: $0.22 / 71 seconds
- graduated Autobrowse skill: $0.12 / 27 seconds
form-fill task:
- run 1: $1.40
- run 4: $0.24
run 1 pays for everything that comes after
[ the part that broke my brain ]:
they pointed it at a federal grants portal
agent dug around and found an undocumented JSON endpoint humans had missed for years
a 28-page scrape collapsed into one fetch
> an agent tried something a person never would
> and found something a person would never see..
100% OPEN SOURCE, FREE
I was digging inside of it for the whole morning and got impressed when I saw such as savings on tokens spending
literally for scraping 10 websites, I spent just 12 cents instead of basic $1.02
P.S. Sorry if somewhere my reaction was too "forcing" to setup it, just wanted to mark by BOLD what's the treasure
you can skip this, it's your deal β€οΈ
Do you understand what Browserbase just open-sourced???
an agent that learns any website once, then does the job 10x cheaper forever
[ literally how it helps me ]:
- writing scrapers for new sites (used to spend half a day per site, every single time)
- chasing selectors when sites redesign on a tuesday (lost weeks to this)
- digging out hidden APIs buried in network traffic (gave up on this too many times)
- explaining to my team HOW the agent does the job (was impossible until now)
Autobrowse figured all of that out by itself in 3-5 iterations
and saved the answer as a markdown file the next agent reads BEFORE it starts
[ how it actually works ]:
> give the agent a real task on a real site
> it tries, fails, learns, tries again
> 3-5 rounds and it converges on a path that just works
> writes that path down as SKILL.md
> next agent loads it and skips straight to the answer
the markdown file IS the memory
every browser agent before this had AMNESIA
figured out the site, then forgot the second the session closed
you've been paying the same discovery tax 100 times in a row.. and not noticing
[ Karpathy's auto-research idea, but applied to the web ]:
same idea, just different approach
Karpathy did it for research and coding loops
Autobrowse does it for the open web
the new part:
Karpathy's loop got smarter inside ONE session
Autobrowse SAVES the lesson into a file the next agent reads before it even starts
iteration = graduation
the agent doesn't just learn.. it leaves a note for every agent that comes after
[ the math ]:
Craigslist scrape:
- generic agent loop: $0.22 / 71 seconds
- graduated Autobrowse skill: $0.12 / 27 seconds
form-fill task:
- run 1: $1.40
- run 4: $0.24
run 1 pays for everything that comes after
[ the part that broke my brain ]:
they pointed it at a federal grants portal
agent dug around and found an undocumented JSON endpoint humans had missed for years
a 28-page scrape collapsed into one fetch
> an agent tried something a person never would
> and found something a person would never see..
100% OPEN SOURCE, FREE
I was digging inside of it for the whole morning and got impressed when I saw such as savings on tokens spending
literally for scraping 10 websites, I spent just 12 cents instead of basic $1.02
P.S. Sorry if somewhere my reaction was too "forcing" to setup it, just wanted to mark by BOLD what's the treasure
you can skip this, it's your deal β€οΈ
Introducing Mirage, a unified virtual filesystem for AI agents!
6 weeks. 1.1M+ lines of code. We rewrote bash from the ground up so cat, grep, head, and pipes work across heterogeneous services. S3, Google Drive, Slack, Gmail, GitHub, Linear, Notion, Postgres, MongoDB, SSH, and more, all mounted side-by-side as one filesystem.
Bash that AI agents already know works on every format! cat, grep, head, and wc parse .parquet, .csv, .json, .h5, even .wav! One pipe can stitch S3, Drive, GitHub, Slack, and Linear together, same Unix semantics throughout.
Workspaces are versioned too. Snapshot, clone, and roll back the whole thing with one API call. A two-layer cache turns repeated reads into local lookups, so agent loops stay fast and cheap.
Drop a Workspace into FastAPI, Express, or a browser app. Wire it into OpenAI Agents SDK, Vercel AI SDK, LangChain, Mastra, or Pi. Run it alongside Claude Code and Codex.
Site: https://t.co/zo1orc2wA9
GitHub: https://t.co/zeRAKri7I9
#AIAgents #OpenSource #AgenticAI #Strukto #Filesystem #VFS
We partnered with @PrimeIntellect to build Fast Ask, a small RL-trained subagent that helps our Sheets agent find answers in spreadsheets. It scores +4% over Opus on exact match accuracy at Haiku latency.
@ariG23498 Its the deterministic code for looping inference in & out of the model. We will witness the day when the agent writes its own harness but the day is not today (could be tomorrow afaik)
Low key we think we got some job candidates like this with AI Deepfakes / lip sync / voice over claiming to βlive in Romaniaβ - fortunately their models are not good enough so we could tell still
CrowdStrike CEO @George_Kurtz says North Koreans are posing as American employees to gain access to company laptops, which are then sent by a mule to laptop farms run by the North Korean threat actor group Silent Chollima.
"The laptop that you send them gets sent somewhere in the US to a mule, and that mule takes it to a laptop farm. Then the North Koreans control it and bypass all of your security because you just handed them a laptop."
"We first started notifying customers, saying, 'Hey, we think it's not really Bob in Texas working for you. It's a North Korean.' They said, 'Yeah, you're right,' and went to the hiring manager."
"They told the hiring manager, 'Hey, we need to get rid of this person. It's a North Korean.' And his response was, 'Well, they do really good work. Can we keep them?'"