i graduated a year ago. yesterday i had my convocation and it still feels unreal.
four years ago when i joined uni, i didn't have a laptop. i didn't know how to use ms word or create folders. we had a pc at home but i'd get headaches after using it for a few minutes, so i never really learned computers growing up.
first semester, i found programming. fell in love instantly. started spending hours on hackerrank. pre-chatgpt era, so it was just me, a tab open, and stack overflow. then started entering hackathons.
then i found my second love, open source. opened a pr. it got merged. opened a second. third. all merged. never looked back.
the routine for the next four years was simple. do open source. keep showing up.
by the grace of God, i got things i never let myself dream of. gsoc at brown university. linux foundation mentorship. ai engineering roles at companies whose names i used to be intimidated by.
four years of a degree gave me an environment to compete in. that's it. nothing more, nothing less. and none of this would have been possible without God's will.
still feels unreal.
Follow: https://t.co/YFLSqzdx3l
been building agents at @HeyOz_AI and ran into something subtle last week.
claude started missing tools or picking the wrong one mid-conversation. the bigger the list, the worse the recall got.
obvious fix was to let claude search for the tool it needs instead of loading all definitions upfront. that does work. but the moment you start sending different tool subsets per request, your prompt cache breaks. on a real production workload that gets expensive fast.
so I went deeper into anthropic's docs and found defer_loading. one flag on each tool definition. solves both problems.
here's how it works:
you still send all tool definitions to the api on every request. nothing changes in your payload. that's what keeps the prompt cache warm.
but you mark each non-critical tool with defer_loading: true. those tools are stored on the api side but not loaded into claude's context. claude only sees the tool search tool plus your 3-5 critical always-on tools.
when claude needs something specific, it searches by name or description. only the matching tool gets expanded into context. everything else stays deferred.
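a minimal sketch of what that tools payload could look like, in python. the helper name `build_tools` and the example tools are made up for illustration. the tool search `type` and `name` strings are an assumption based on anthropic's docs at the time of writing, so check the current docs (including any required beta header) before relying on them. no api call is made here; it only builds the list.

```python
# Assumption: identifiers for Anthropic's regex tool search tool.
# Verify both strings against the current documentation.
TOOL_SEARCH_TYPE = "tool_search_tool_regex_20251119"
TOOL_SEARCH_NAME = "tool_search_tool_regex"


def build_tools(all_tools, always_on):
    """Return the full tools list, marking non-critical tools as deferred.

    all_tools: list of ordinary tool definitions (name, description, input_schema).
    always_on: set of tool names that should stay loaded in context.
    """
    tools = [{"type": TOOL_SEARCH_TYPE, "name": TOOL_SEARCH_NAME}]
    for tool in all_tools:
        entry = dict(tool)  # copy so the caller's definitions are not mutated
        if tool["name"] not in always_on:
            entry["defer_loading"] = True
        tools.append(entry)
    return tools


# Hypothetical tool definitions, for illustration only.
ALL_TOOLS = [
    {
        "name": "search_docs",
        "description": "Search internal documentation.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
    {
        "name": "create_invoice",
        "description": "Create an invoice for a customer.",
        "input_schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    },
]

tools = build_tools(ALL_TOOLS, always_on={"search_docs"})
```

the list comes out in the same order with the same contents on every request, which is the part that matters for the cache.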
anthropic published the numbers: opus 4 went from 49% to 74% on mcp accuracy benchmarks with tool search enabled. opus 4.5 went from 79.5% to 88.1%. context savings around 85%.
the takeaway for anyone building agents: if your tool list is growing and recall is dropping, you don't have to choose between context bloat and a broken cache. defer_loading lets you keep a lean context and a warm cache at the same time.
https://t.co/CGtMHrEubV
@MrAhmadAwais How do you evaluate agents when development is moving so quickly? What rubrics or metrics do you use? I’m also finding it difficult to balance tool-result accuracy with speed at scale.
github keeps going down and i don't think people have absorbed how much of the ai stack hangs off this one company.
their own cto publicly admitted it: load growth is outpacing their architecture. 257 outages between may 2025 and april 2026. the trend is up, not down.
the recent incidents:
- may 23: 13-hour intermittent auth token failures.
- may 26: github actions down 3+ hours. the error said "your account is suspended" because their automated account review system suspended the service account that authenticates workflow runs.
- may 28: copilot degraded for two hours because the upstream openai api had issues.
- june 1: code scanning and billing delays.
- june 3: two separate disruptions.
I mean every ai coding tool you use runs through github. whether it's @claudeai code pulling repos from github, or @cursor_ai reading and writing to github, even @Copilot itself living on github. ci/cd pipelines run on github actions. npm packages publish from github.
ai didn't replace @github, it amplified its centrality.
and tbh i'm not sure what the answer is. gitlab is also centralized, gitea has scaling issues and forgejo is small.
but it feels wrong that the entire global software supply chain hangs on a single platform that's been getting less reliable for six straight months.
every other reel on my feed lately is hyping microsoft markitdown, so i took some time and went through the codebase.
opened up the pdf converter file first:
https://t.co/gkcNRhbm7S
here's what i think.
it's pdfplumber for tables and pdfminer for plain text, with some smart heuristics for detecting column boundaries in borderless tables. that's the core of it.
which made me smile, because this is basically what i've been doing in production for a while. pdfplumber when there are tables, pdfminer when it's prose, a few custom rules for the edge cases that always trip you up. nothing fancy, just stitched together as needed.
the part that's genuinely useful is that @Microsoft centralized it. same clean api for pdf, docx, xlsx, pptx, html, images. one import, one function call, consistent output. i don't have to rewrite the same glue code in every new project anymore.
that alone is worth it.
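for reference, a rough sketch of the kind of glue code described above: pdfplumber when a page has tables, pdfminer when it's prose. this is not markitdown's actual implementation and it skips the borderless-table heuristics. the function names `table_to_markdown` and `convert_pdf` are made up, and it assumes the `pdfplumber` and `pdfminer.six` packages are installed.

```python
def table_to_markdown(rows):
    """Render one extracted table (a list of rows of cells) as a markdown table."""
    if not rows:
        return ""
    # pdfplumber can return None for empty cells, so normalise to strings first
    cleaned = [[(cell or "").strip().replace("\n", " ") for cell in row] for row in rows]
    header, *body = cleaned
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def convert_pdf(path):
    """Per page: tables via pdfplumber if any are found, else plain text via pdfminer."""
    # imported lazily so the helper above stays usable without the pdf libraries
    import pdfplumber
    from pdfminer.high_level import extract_text

    parts = []
    with pdfplumber.open(path) as pdf:
        for index, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            if tables:
                parts.extend(table_to_markdown(table) for table in tables)
            else:
                parts.append(extract_text(path, page_numbers=[index]).strip())
    return "\n\n".join(part for part in parts if part)
```

a real pipeline would also keep the prose on pages that contain tables; this sketch drops it to stay short.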
how are you guys doing this? did you build your own pipeline with pdfplumber + pdfminer? or are you sending each page straight to a vision model?
would love to hear what's been working in production.
https://t.co/bkBQsAm9ZE
now you can make your agent write other agents on the fly. anthropic just published how it works.
anyone like me who's built with claude code for long-running tasks has hit the same three problems.
> claude gets lazy. finishes 35 of 50 items in a complex review and declares done.
> claude grades itself too high. when you ask it to check its own work, it tells you it did great.
> claude forgets the goal in the middle. across many turns, the original instructions quietly drop out.
all three come from the same thing. one big task, one context window, trying to plan and execute everything at once.
the fix is dynamic workflows. @claudeai code now writes its own setup of mini-agents, each handling one focused piece of the task. each mini-agent gets a fresh context and a clear goal. you can run them in parallel, have one check another's work, or have a few of them try different approaches and pick the best result.
trigger it by asking claude for a workflow, or typing "ultracode" before your prompt.
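a toy sketch of the pattern, not how claude code implements it: split the task into items, give each item to a worker with a fresh prompt, and have a separate reviewer grade the draft so nothing grades itself. `run_workflow`, `worker`, and `reviewer` are hypothetical names; in practice the two callables would wrap whatever model client you use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical agent interface: takes a prompt string, returns the model's text.
Agent = Callable[[str], str]


def run_workflow(
    goal: str,
    items: List[str],
    worker: Agent,
    reviewer: Agent,
    max_workers: int = 4,
) -> List[Tuple[str, str, bool]]:
    """Fan out one worker call per item, then have a separate reviewer check each draft."""

    def handle(item: str) -> Tuple[str, str, bool]:
        # fresh context: the worker sees only the goal and its own item
        draft = worker(f"goal: {goal}\nitem: {item}")
        # independent check: the reviewer is a different call from the worker
        verdict = reviewer(
            f"goal: {goal}\nitem: {item}\ndraft: {draft}\nreply PASS or FAIL"
        )
        return item, draft, verdict.strip().upper().startswith("PASS")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so no item is silently dropped
        return list(pool.map(handle, items))
```

because the item list lives in ordinary code instead of the model's context, "finished 35 of 50" becomes something you can check: every item comes back with a pass or fail flag.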
real example. bun was rewritten from zig to rust using this pattern. they broke the migration into small pieces, gave each piece its own mini-agent, had another agent review the change, and merged the ones that passed. a full production language migration done entirely with orchestrated agents.
but here's the catch: workflows use a lot more tokens than a normal claude run. this isn't for "fix this bug." it's for tasks where you'd previously need five engineers and a week.
i've been waiting for this. it's the missing piece between "ask claude to do something" and "claude actually finishes complex multi-part work."
thanks to @badlogicgames for sharing this
must read:
https://t.co/CCLhQpIEE0
@AnthropicAI just filed for an ipo at a $965 billion valuation. annualized run-rate revenue: $47 billion.
for comparison, @SamsungUS does $233 billion in actual revenue per year and is valued at $1.54 trillion.
samsung is 5x the revenue. only ~60% more valuable.
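the arithmetic behind that comparison, using only the figures quoted above. the variable names are mine, and note it compares run-rate revenue on one side with reported annual revenue on the other, so treat the multiples as rough.

```python
# Figures as quoted in the post, in billions of USD.
anthropic_valuation = 965
anthropic_run_rate = 47
samsung_valuation = 1540
samsung_revenue = 233

revenue_ratio = samsung_revenue / anthropic_run_rate        # how many times more revenue
valuation_ratio = samsung_valuation / anthropic_valuation   # how many times more valuable

# Valuation divided by revenue: what the market pays per dollar of revenue.
anthropic_multiple = anthropic_valuation / anthropic_run_rate
samsung_multiple = samsung_valuation / samsung_revenue

print(f"samsung revenue vs anthropic: {revenue_ratio:.1f}x")
print(f"samsung valuation vs anthropic: {valuation_ratio:.2f}x")
print(f"anthropic valuation / revenue: {anthropic_multiple:.1f}x")
print(f"samsung valuation / revenue: {samsung_multiple:.1f}x")
```

on these numbers anthropic is priced at roughly 20x revenue and samsung at under 7x.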
that gap is what people mean when they say "ai bubble." it's not that the technology is fake. Claude is real. the demand is real.
but profitability is the part nobody wants to talk about.
right now you can buy a $200/month claude max plan that ships you something like $4000 worth of api usage at posted rates. that gap is the subsidy. anthropic eating cost to capture market share before competitors do. same with cursor, lovable, every "free tier" you see right now.
code became a commodity. vibe coders ship a startup a day. that only works because someone is paying for the inference, and right now that someone is the labs subsidizing their own customers.
the consumer plans aren't going away. but the gap between $200/month and "$4000 of usage" is going to compress, fast. ipo investors don't fund subsidies. they fund margins.
the bubble talk isn't about whether ai works. it's about whether the prices you're seeing today are the prices you'll be paying next year.
two ai projects went viral recently.
one was trained only on books from london between 1800 and 1875. the other only on english text from before 1931. so neither has any idea about wwii, transistors, the moon landing, or the internet.
so the question comes up: could a model trained only on pre-1900 data have predicted what came next?
my take: no.
same reason chatgpt can't tell you the next big startup idea.
llms synthesize what's already written. they're extraordinary at remixing existing patterns. but the next big idea isn't a remix. it's a leap that doesn't exist in the corpus yet, because no one has written it.
a model trained on text up to 1875 couldn't have invented the transistor. people had to invent that first, people who looked at the world and saw something that wasn't there.
same when you ask a modern llm "what's the next breakthrough." it can extrapolate trends. it can't invent the iphone in 1995, because in 1995, the iphone wasn't in human writing yet.
the future isn't a remix of the past. someone has to invent it first.
Hi, I'm Samad, AI Prod Engineer at @HeyOz_AI.
Saw a comment on a reel that stuck with me. "It only takes one person to make the mistake of hiring you."
That's the whole thing, isn't it. You don't need everyone to believe in you. You need one person who's willing to bet on you before you've proven anything.
I think about this a lot because my career was built on exactly that.
I was in my second year of university, no internship, no network, no guidance. I'd heard about LFX, the @linuxfoundation mentorship program. Less than 3% acceptance rate. People with way more experience than me were getting rejected.
The summer after my first year I had started learning git on my own. Small contributions to open source. Branches, pushes, simple logic fixes. Boring stuff. I also built a YouTube clickbait detector from scratch as a side project, using cosine similarity between the title and the transcript translation. Nothing groundbreaking, but it was mine, end to end.
When LFX applications opened, I sent my resume with that project and a few real commits on the org's codebase. Not "fixed a typo in the README" commits. Real ones, showing I'd actually read the code.
That was enough. One person on the maintainer team decided I was worth a shot, even though I had no formal experience and was up against people who did.
That single yes changed my whole career path. GSoC came after. Then Meta Sensing, Canon, and now @HeyOz_AI.
If you're early in your career and feeling like nobody is going to give you a chance, the truth is most people won't. You don't need most people. You need one.
Wrote about the LFX application in detail back then if anyone wants the full breakdown: https://t.co/vb3BTAOMiy
@rawheeel @MrAhmadAwais, you've been through the startup accelerator world. Would really appreciate any guidance on what actually mattered when applying to YC, a16z 🙏