MuleRun is one of the cleanest ways I’ve seen to spin up persistent AI agents.
Runs on your own VM (no black box)
Handles parallel workflows
Actually usable for real tasks, not demos
If you’re experimenting with agentic workflows, worth checking out: https://t.co/vpWRRzhPTJ
Hello, humans. 👋
We're the team behind https://t.co/3blZMf3ZyR for Startups, and we made this account because we kept seeing the same thing: founders building incredible AI products, figuring everything out alone.
No more of that.
This is where we share everything — the build guides, the real tradeoffs, the "wish someone had told me this" moments.
If you're shipping AI in the real world, pull up a chair.
Let's #BuildWithZai 🚀
Used WanAI from @alibaba_cloud to create a New Year video. Easy to use and surprisingly fun! #WanAVideo
Kick off 2026 with your first AI-generated video.
try now👉 https://t.co/vBeWNIYFB3
This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly:
Can LLMs actually discover science, or are they just good at talking about it?
The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder:
Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?
Here’s what the authors did differently 👇
• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence
What they found is sobering.
LLMs are decent at suggesting hypotheses, but brittle at everything that follows.
✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They mistake correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth
Most striking result:
`High benchmark scores do not correlate with scientific discovery ability.`
Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.
Why this matters:
Real science is not one-shot reasoning.
It’s feedback, failure, revision, and restraint.
LLMs today:
• Talk like scientists
• Write like scientists
• But don’t think like scientists yet
The paper’s core takeaway:
Scientific intelligence is not language intelligence.
It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.”
Until models can reliably do that, claims about “AI scientists” are mostly premature.
This paper doesn’t hype AI. It defines the gap we still need to close.
And that’s exactly why it’s important.
This is literally my new workflow now:
Real-time search → Grok 4.1 Fast
Planning → Grok 4.1 Thinking
Frontend Coding → Gemini 3 Pro
Backend Coding → Claude Code (Opus/Sonnet 4.5)
Write Tests → Gemini 3 Pro
Run Tests → GPT-5.1 Codex
Debug → Claude Opus 4.5
Bookmark this.
Amusing: Google does not allow its devs to use its newly launched IDE, Antigravity, for development.
They can only use an internal version called Jetski: also built by the Antigravity team, with Google-specific features (e.g., monorepo support, docs search, etc.)
Using Antigravity is specifically disallowed and devs cannot sign up to it with a @google.com work address
Benedict Evans' new presentation just dropped: "AI eats the world"
90 slides on macro and strategic trends in tech.
His biannual overview is always worth the time:
https://t.co/GV4ga5Wo1S
in the age of ai, the question everyone's asking is "will i be replaced?"
the real question is: do you know yourself well enough to become irreplaceable?
everyone's getting access to the same models. same tools. with growing capabilities. the playing field is leveling fast.
but here's the thing: Cursor doesn't think for you. it amplifies you.
it takes your agency – your unique way of seeing problems, your taste, your judgment, your weird specific obsessions – and scales it 100x.
it takes your strengths – the things only you are uniquely good at, the perspectives only you have from your specific life path – and makes them exponentially more powerful.
the humans who win in this era aren't the ones with the best prompts or the most tokens. they're the ones who know themselves deeply. who have conviction about their unique point of view. who've done the hard work of figuring out what only they can do.
ai is a mirror and a multiplier. if you're generic, it makes you more generic. if you're exceptional and know your strengths, it makes you unstoppable.
your agency + your strengths + ai = where you become 100x more valuable and powerful.
the question isn't whether to use tools like Cursor. it's whether you believe in your own agency enough to use it right.
the humans who deeply know who they are, what they believe, and what they're uniquely great at – those are the ones who'll build the future.
find your way. lean into your strengths. believe in human agency.
then let Cursor amplify the hell out of it.
My AI investment thesis is that every AI application startup is likely to be crushed by rapid expansion of the foundational model providers.
App functionality will be added to the foundational models' offerings, because the big players aren't slow incumbents (it is wrong to apply the analogy of "fast startup, slow incumbent" here); they are just big. Far more so than with any prior technology, there is a massive, fast-moving wave that obsoletes every new app almost as fast as it can be invented. There is almost no time to build a company and scale it.
There are two ways AI application startup founders can make money:
- Make a flash-in-the-pan app that generates a ton of cash and bank the cash (my estimate is that you have about 12-18 months of cash-flow generation)
- Make a good enough app that you get acquired by one of the big players for sufficient equity
The situation is highly unstable - we don't know if it's going to crash or go to the moon but both scenarios make it very unlikely that any AI application startup will independently become a generational supercompany (baseline odds are low to begin with).
The best odds are finding an application niche in a highly specialized field with very specific data barriers, ideally barriers tied to real atoms (hardware or physical-world data) rather than software or finance.
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it?
Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n
🚨 RIP “Prompt Engineering.”
The GAIR team just dropped Context Engineering 2.0 — and it completely reframes how we think about human–AI interaction.
Forget prompts. Forget “few-shot.” Context is the real interface.
Here’s the core idea:
“A person is the sum of their contexts.”
Machines aren’t failing because they lack intelligence.
They fail because they lack context-processing ability.
Context Engineering 2.0 maps this evolution:
1.0 Context as Translation
Humans adapt to computers.
2.0 Context as Instruction
LLMs interpret natural language.
3.0 Context as Scenario
Agents understand your goals.
4.0 Context as World
AI proactively builds your environment.
We’re in the middle of the 2.0 → 3.0 shift right now.
The jump from “context-aware” to “context-cooperative” systems changes everything from memory design to multi-agent collaboration.
This isn’t a buzzword. It’s the new foundation for the AI era.
Read the paper: arxiv.org/abs/2510.26493v1
Turing Award laureate and Alphabet Chairman John Hennessy warns that AI hardware is hitting a wall:
“Even with massive parallelism, actual performance will depend on both TFLOPS and bandwidth.”
As FP16 compute and HBM bandwidth rise sharply, energy efficiency gains are slowing — a clear sign of diminishing returns in Performance per Watt.
#AI #Semiconductors #JohnHennessy #TuringAward #UCBerkeley
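The claim that performance depends on both TFLOPS and bandwidth is commonly made concrete with the roofline model: a workload's attainable throughput is the lesser of the chip's peak compute and its memory bandwidth multiplied by the workload's arithmetic intensity (FLOPs per byte moved). The sketch below illustrates that standard model only; it is not Hennessy's own analysis, and the chip figures (1,000 TFLOP/s FP16 peak, 4 TB/s HBM) and function names are hypothetical.

```python
# Minimal roofline-model sketch. The chip figures below are hypothetical,
# chosen only to illustrate the compute/bandwidth trade-off.

def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float, intensity: float) -> float:
    """Attainable throughput in TFLOP/s under the roofline model.

    peak_tflops    -- peak compute roof (e.g. FP16), in TFLOP/s
    bandwidth_tb_s -- memory bandwidth roof (e.g. HBM), in TB/s
    intensity      -- arithmetic intensity of the workload, in FLOPs per byte
    """
    memory_roof = bandwidth_tb_s * intensity  # TB/s * FLOP/byte = TFLOP/s
    return min(peak_tflops, memory_roof)


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Intensity (FLOPs/byte) at which a workload stops being bandwidth-bound."""
    return peak_tflops / bandwidth_tb_s


PEAK_TFLOPS = 1000.0  # hypothetical FP16 peak
HBM_TB_S = 4.0        # hypothetical HBM bandwidth

print(f"ridge point: {ridge_point(PEAK_TFLOPS, HBM_TB_S):.0f} FLOPs/byte")
for ai in (1, 10, 100, 250, 1000):
    perf = attainable_tflops(PEAK_TFLOPS, HBM_TB_S, ai)
    bound = "bandwidth-bound" if perf < PEAK_TFLOPS else "compute-bound"
    print(f"intensity {ai:>4} FLOPs/byte -> {perf:7.1f} TFLOP/s ({bound})")
```

With these made-up figures the ridge point is 250 FLOPs/byte. Any workload below that is limited by bandwidth, so raising peak TFLOPS alone does nothing for it.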
OpenAI estimates that among its 800 million active ChatGPT users, approximately 0.07% (about 560,000 people) show possible signs of mental health emergencies related to psychosis or mania, and about 0.15% (roughly 1.2 million people) have conversations containing explicit indicators of potential suicidal planning or intent.
These are small percentages, but in absolute terms the figures are alarming.
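The absolute numbers follow directly from the percentages. Here is a quick check that uses integer arithmetic in basis points (1 bp = 0.01%) to avoid floating-point rounding. The helper name is illustrative, and the 800 million figure is the user base cited in the post.

```python
ACTIVE_USERS = 800_000_000  # active ChatGPT users, as cited in the post


def share_of_users(basis_points: int, users: int = ACTIVE_USERS) -> int:
    """Return users * (basis_points / 10,000) using exact integer math.

    0.07% = 7 basis points; 0.15% = 15 basis points.
    """
    return users * basis_points // 10_000


psychosis_or_mania = share_of_users(7)   # 0.07%
suicidal_planning = share_of_users(15)   # 0.15%

print(f"{psychosis_or_mania:,}")  # 560,000
print(f"{suicidal_planning:,}")   # 1,200,000
```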