We're building a Moon Base!
@NASAMoonBase will serve as a habitat where astronauts live and work during long-term science missions.
Join us at 2pm ET on Tuesday, May 26, for a live news event where we'll share updates on our lunar exploration plans: https://t.co/IJXA7xYwju
Bull case: people are thinking before they write and the production becomes easier
Bear case: the thinking is "write me a tweet about this data"
OpenAI data on ChatGPT usage shows that across knowledge work, AI is most used where artifacts get produced.
When lawyers use ChatGPT, drafting documents (45%) beats asking for advice (42%). AI is used more to produce the artifact than to make the judgment.
For most expert work, the artifact is writing. For engineering, it's code. For IT, it's procedure. In 9 of 12 business functions, the dominant artifact is writing.
When using AI, Finance writes memos more than it builds models. Sales drafts proposals more than it closes deals. Researchers write papers more than they run experiments.
As artifact production collapses in cost, thinking before producing becomes asymmetrically valuable.
h/t @RonnieChatterji and the @OpenAI Signals data
*disclaimer, this tweet was written by hand
@sama @ycombinator Sam, I know a lot of startups doing amazing things in the Nordics (the next YC hub) that would take you up on that offer. Any way to apply?
Genuinely impressive release by Google today (remember when they were behind?)
Gemini 3.5 Flash perf:
* building on prior strengths (83.6% on MMMU-Pro for multimodal)
* big jump on agentic coding (76.2% on Terminal-Bench) and real-world tasks (56.5% on Toolathon)
* progress on expert tasks (57.9% on Finance Agent 2... we are cooked)
* leading scores across SWE-Bench, OSWorld etc.
(also, elegant to bold the top scores in the chart below even when it's not Google leading)
Ofc, just benchmarks, and also not cheap (~$9/M output), but Google is cookin'... we are all so spoiled to have the 3 labs compete
Pretty much all the CEOs that I've met have a similar setup (and, for some reason, seem to believe that theirs is special). It's usually just a bunch of markdown files, set up in a smart way, plus APIs/MCP connections to all the tools they use. Then they try to expand some version of that to the org.
Most don't go as deep with context as I think you need to go for it to work properly, but always seeing new and interesting things.
Happy to do a show and tell.
@t_blom Super interesting - would be great to understand how big of a file it ends up being (and the vector embedding), and if/how you can make it useful.
I've built my own version of gbrain somewhat organically, but lots of great useful things in there.
v3 of @slashlast30days is here. 20,000+ stars on GitHub. The biggest upgrade yet.
An AI agent-led search engine scored by upvotes, likes, and real money - not editors. Reddit comments, X posts, and YouTube transcripts are now FREE. No API keys needed for the core sources.
v3 killer feature: intelligent search. Before it searches, a Python pre-research brain resolves X handles, subreddits, TikTok hashtags, and YouTube channels for your topic. It finds the RIGHT places to search before the LLM judge assembles the report. Shout out to @jeffreysperling for building this engine
New in v3:
- Free Reddit, X, and YouTube (no API keys)
- Intelligent pre-research engine
- Best Takes (the funniest Reddit comments are first-class)
- Cross-source cluster merging
- Single-pass comparisons (X vs Y in 5 min, not 12)
- GitHub person-mode
- ELI5 mode
This is the simplest distillation of what I have learned about agentic engineering this year
Push smart fuzzy operations humans do into markdown skills. Fat skills.
Push must-be-perfect deterministic operations into code. Fat code.
The harness? Keep it thin.
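One way to picture the split, as a minimal sketch rather than anyone's actual setup: every name below (`SKILLS`, `TOOLS`, `run`, `echo_model`, the task names) is hypothetical and made up for illustration. The fuzzy work lives in markdown text handed to a model, the must-be-exact work lives in a plain function, and the harness only routes.

```python
from typing import Callable, Dict, List

# Hypothetical "fat skill": fuzzy, judgment-heavy instructions kept as markdown.
SKILLS: Dict[str, str] = {
    "triage_email": (
        "# Triage email\n"
        "Read the message, decide how urgent it is, and draft a short reply."
    ),
}


# Hypothetical "fat code": a must-be-perfect operation kept deterministic.
def add_invoice_totals(amounts_in_cents: List[int]) -> int:
    return sum(amounts_in_cents)


TOOLS: Dict[str, Callable] = {"add_invoice_totals": add_invoice_totals}


def run(task: str, payload, model: Callable[[str], str]):
    """Thin harness: routes the task and holds no business logic of its own."""
    if task in TOOLS:
        # Deterministic path: plain code, no model involved.
        return TOOLS[task](payload)
    if task in SKILLS:
        # Fuzzy path: the markdown skill becomes the prompt for the model.
        return model(f"{SKILLS[task]}\n\nInput:\n{payload}")
    raise KeyError(f"unknown task: {task}")


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call (an assumption, so the sketch runs offline).
    return f"[model saw {len(prompt)} chars]"
```

Calling `run("add_invoice_totals", [1999, 501], echo_model)` never touches the model, while `run("triage_email", "...", echo_model)` passes the skill text through as the prompt. Adding a capability means adding a skill file or a function, not growing the harness.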
Another week on the road meeting with a couple dozen IT and AI leaders from large enterprises across banking, media, retail, healthcare, consulting, tech, and sports, to discuss agents in the enterprise.
Some quick takeaways:
* Clear that we're moving from the chat era of AI to agents that use tools, process data, and start to execute real work in the enterprise. Complementing this, enterprises are often evolving from a "let a thousand flowers bloom" approach to adoption to targeted automation efforts applied to specific areas of work and workflow.
* Change management will remain one of the biggest topics for enterprises. Most workflows aren't set up to just drop agents directly in, and enterprises will need a ton of help to drive these efforts (both internally and from partners). One company has a head of AI in every business unit that rolls up to a central team, just to keep all the functions coordinated.
* Tokenmaxxing! Most companies operate with very strict OpEx budgets that get locked in for the year ahead, so they're going through very real trade-off discussions right now on how to budget for tokens. One company recently had an idea for a "shark tank" style way of pitching for compute budget. Others are trying to figure out how to ration compute to the best use-cases internally through some hierarchy of needs (my words not theirs).
* Fixing fragmented and legacy systems remains a huge priority right now. Most enterprises are dealing with decades of either on-prem systems or systems they moved to the cloud but that still haven't been modernized in any meaningful way. This means agents can't easily tap into these data sources in a unified way yet, so companies are focused on how they modernize these.
* Most companies are *not* talking about replacing jobs due to agents. The major use-cases for agents are things that the company wasn't able to do before or couldn't prioritize. Software upgrades, automating back office processes that were constraining other workflows, processing large amounts of documents to get new business or client insights, and so on. More emphasis on ways to make money vs. cut costs.
* Headless software dominated my conversations. Enterprises need to be able to ensure all of their software works across any set of agents they choose. They will kick out vendors that don't make this technically or economically easy.
* Clear sense that it can be hard to standardize on anything right now given how fast things are moving. Blessing and a curse of the innovation curve right now - no one wants to get stuck in a paradigm that locks them into the wrong architecture. One other result of this is that companies realize they're in a multi-agent world, which means that interoperability becomes paramount across systems.
* Unanimous sense that everyone is working more than ever before. AI is not causing anyone to do less work right now, and similar to Silicon Valley people feel their teams are the busiest they've ever been.
One final meta observation not called out explicitly. It seems that despite Silicon Valley's sense that AI has made hard things easy, the most powerful ways to use agents are more "technical" than prior eras of software. Skills, MCP, CLIs, etc. may be simple concepts for tech, but in the real world these are all esoteric concepts that will require technical people to help bring to life in the enterprise.
This means both that diffusion will take real work and time, and that everyone's estimation of engineering jobs is totally off. Engineers may not be "writing" software, but they will certainly be the ones to set up and operate the systems that actually automate most work in the enterprise.
@sama For context, I lost my wallet three times when travelling abroad, and in all cases random people found my number, called me, and gave it back (and refused to accept a reward for it). Most people are genuinely kind. (And I have travelled a lot.)
It's probably a great talk, but there are so many great ones out there and time is valuable. So what I really want to do when I see this post is to prompt it and ask "tell me everything relevant about Mythos" (which is the hook) - then decide if I want to watch those parts or everything (or just read the prompt results). But I can't. So currently I need to find this on YouTube -> copy the transcript -> run my prompt.
Make it easier for us plz!
I've been using a ton of stuff from your G-Stack (thank you!) but I've bundled so many things together to just do all the workflows combined rather than doing them step by step.
I'm also trying to figure out the right balance between the number of tabs you have and context switching between them or staying in that same tab. Usually using high effort Opus 4.6 for everything but not sure that's the best way.
every new model generation you see the pinch of the bitter lesson.
harnesses, pipelines, rules which previously felt important now hold you back from innovating.
what took months of grind for you is now just a prompt away at ½ the cost.
look for it and you will see. Both large and small companies re-evaluating. Company directions change before your eyes.
it's a wild moment for our industry