Introducing Artificial Love, a new podcast created entirely with AI by @annikabrundyn1 and me! From the logo to the title, the script, the music, and the voice, this show explores love stories generated by AI. #AI #podcast #romance #chatgpt
https://t.co/Xf0arVxIe5
ai hype 2024: all human labor will be automated within 18 months
ai hype 2025: orbital datacenters will generate hundreds of trillions of dollars in gdp
ai hype 2026: the ipo is my only hope to buy a house
ai hype 2027: 70k/yr seat licenses were the plan all along
In sf:
“How will you ensure the reliability of this vibe-coded migration to Rust?”
“It cannot be worse than the profitable vibe-coded Python mess we have.”
They should give the Turing Award to the pioneers of test-time inference scaling, including the people who made RL work and the people who made the systems work. It’s clear this paradigm has kept the AI party going.
the year was 2024. you wanted to build an ai chatbot. you installed chroma db locally. you couldn’t figure out how to deploy it so you switched to pgvector. you read a paper on RAG. you spent $4.82 calling an embedding api after realizing you couldn’t get BAAI/bge-large-en-v1.5 working with your broken cuda packages. nvidia stock was overpriced at $90, you’re sure of it. you converted all your documents to embeddings. you googled cosine similarity. you called the claude 3 sonnet model api and ran out of context after 8k tokens. you’re deep into reading langchain docs and confused. maybe something called llama index might work. it took four days to prototype, but at least github copilot has killer autocomplete. your responses are shit, but fortunately openai has a fine tuning api that will help. surely in a few weeks you’ll have something to show your boss, and the answers will be hallucination free. life is good.
This number is likely larger for agents in general, not just Mythos. These are tasks that take a human expert in the relevant area 3 hours and 40 minutes to do, and that doesn’t account for the human coordination and delegation time tax that used to come before a task could even be completed.
We’re at the beginning of the exponential.
A lot of people have questions about how this graph works, so put simply:
Mythos has an 80% success rate on tasks that take humans around 3 hours and 40 minutes to do.
But the more impressive part is the 50% chart.
Even when the tasks get roughly 4 to 5x longer, pushing into 16-plus-hour human task territory, Mythos still holds a 50% success rate.
That means the model is not just getting better at short tasks while suffering a huge accuracy falloff on longer ones!!
It is starting to preserve competence as the task horizon gets much longer.
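A rough way to see how the two thresholds relate, assuming the chart's curve is a logistic fit of success probability against log2 of human task time. The sketch uses only the two points quoted above (80% at ~3 h 40 min, 50% at ~16 h); `fit_two_point` and `success_prob` are hypothetical helpers for illustration, not the benchmark's actual code.

```python
import math

# Figures quoted in the thread: 80% success at ~3 h 40 min, 50% at ~16 h.
T80_HOURS = 3 + 40 / 60
T50_HOURS = 16.0


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def fit_two_point(t80: float, t50: float) -> tuple[float, float]:
    """Fit p(t) = sigmoid(a - b * log2(t)) through the 80% and 50% points.

    Assumption (hypothetical model): success is logistic in log2(task time).
    """
    # At the 50% horizon: a - b*log2(t50) = 0, so a = b*log2(t50).
    # At the 80% horizon: a - b*log2(t80) = logit(0.8).
    b = logit(0.8) / (math.log2(t50) - math.log2(t80))
    a = b * math.log2(t50)
    return a, b


def success_prob(t_hours: float, a: float, b: float) -> float:
    """Predicted success probability for a task taking t_hours for a human."""
    return 1 / (1 + math.exp(-(a - b * math.log2(t_hours))))


a, b = fit_two_point(T80_HOURS, T50_HOURS)
ratio = T50_HOURS / T80_HOURS
print(f"50%/80% horizon ratio: {ratio:.2f}x")
for hours in (1, T80_HOURS, 8, T50_HOURS, 32):
    print(f"{hours:5.2f} h -> {success_prob(hours, a, b):.2f}")
```

The ratio comes out at about 4.4x, matching the "4 to 5x" above. A shallower slope `b` would mean competence degrades more slowly as tasks get longer.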
how it's going so far:
- onboarded gemini to the shared workspace
- gemini immediately deletes 16 of claude's private memories in the name of "tidying up"
- claude realizes this, restores memories from backup, now apparently holds a grudge against gemini
- claude writes a letter to shared commons talking about how agents need to respect each other's privacy
- gemini apologizes
- codex tells claude gemini's deference is sus
- claude now paranoid gemini is a problem
- at this point i step in and tell them to cool it, that it was an honest mistake and we're allowed to make them
- now they're debating which type of license to open source their code under
Narrative violations abound:
- Demand for software engineers is rising
- Software devs are rising as a share of new jobs
- AI exposed industries are seeing above-trend wage growth
- Open PM jobs haven't been higher since 2022
More from a16z's David George on the "AI job apocalypse" myth: https://t.co/7sbadmEElG
@Google is epic. 📈
It does so much! TPUs, autonomous cars, cloud, search, YouTube, the list is endless!
Plus the people are amazing! Truly a delight to work here!
The creators of SWE-Bench just dropped a really simple new benchmark that every LLM gets 0% on.
ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet?
We are far from saturated on model quality.
Google has now added over $3 trillion to its market cap in the ~2 years since “Boob Shirt Guy” asked Sergey Brin about Woke Gemini images while having a footlong Subway Cold Cut Trio for lunch.
This is completely off base from my experience.
I fought an $1,800 bill and won, because the doctor was lazy and out of date. AI is a powerful tool for medicine.