VickyInBeta

@VickyinSF

Tech Writer

Bay Area

Joined September 2011

244 Following

326 Followers

228 Posts

VickyInBeta

@VickyinSF

3 months ago

MuleRun is one of the cleanest ways I’ve seen to spin up persistent AI agents.

• Runs on your own VM (no black box)
• Handles parallel workflows
• Actually usable for real tasks, not demos

If you’re experimenting with agentic workflows, worth checking out: https://t.co/vpWRRzhPTJ

VickyInBeta

@VickyinSF

3 months ago

Agentic workflows > chat

At this workshop you’ll:
• Deploy an always-on agent (Debian VM)
• Run parallel workflows
• Build proactive execution loops

Powered by MuleRun v310 + OpenClaw (no cloud lock-in, no black box)

🏆 $1K prizes
SF | Tuesday | 5:30 PM
https://t.co/k1VnqUWDwu

VickyinSF retweeted

Z.ai for Startups

@ZaiforStartups

4 months ago

Hello, humans. 👋

We're the team behind https://t.co/3blZMf3ZyR for Startups, and we made this account because we kept seeing the same thing: founders building incredible AI products, figuring everything out alone.

No more of that.

This is where we share everything: the build guides, the real tradeoffs, the "wish someone had told me this" moments.

If you're shipping AI in the real world, pull up a chair.
Let's #BuildWithZai 🚀

VickyinSF retweeted

Matthew Segura

@mhtua

5 months ago

Not betting on one model but orchestrating many. This is how AI coding actually scales.

14K


VickyinSF retweeted

5 months ago

A great read for the day

100

522K

VickyInBeta

@VickyinSF

6 months ago

Used WanAI from @alibaba_cloud to create a New Year video. Easy to use and surprisingly fun! #WanAVideo Kick off 2026 with your first AI-generated video. Try now 👉 https://t.co/vBeWNIYFB3

43K

VickyInBeta

@VickyinSF

6 months ago

claude for sure

Ben Lang

@benln

6 months ago

Best AI product of 2025? All answers welcome, curious!

534

611

460

556K

VickyinSF retweeted

Alex Veremeyenko

@alex_verem

6 months ago

This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly:

Can LLMs actually discover science, or are they just good at talking about it?

The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder:

Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?

Here’s what the authors did differently 👇

• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence

What they found is sobering.

LLMs are decent at suggesting hypotheses, but brittle at everything that follows.

✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They confuse correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth

Most striking result:

`High benchmark scores do not correlate with scientific discovery ability.`

Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.

Why this matters:

Real science is not one-shot reasoning.

It’s feedback, failure, revision, and restraint.

LLMs today:

• Talk like scientists
• Write like scientists
• But don’t think like scientists yet

The paper’s core takeaway:

Scientific intelligence is not language intelligence.

It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.”

Until models can reliably do that, claims about “AI scientists” are mostly premature.

This paper doesn’t hype AI. It defines the gap we still need to close.

And that’s exactly why it’s important.

378

VickyinSF retweeted

Min Choi

@minchoi

7 months ago

This is literally my new workflow now:

Real-time search → Grok 4.1 Fast
Planning → Grok 4.1 Thinking
Frontend Coding → Gemini 3 Pro
Backend Coding → Claude Code (Opus/Sonnet 4.5)
Write Tests → Gemini 3 Pro
Run Tests → GPT-5.1 Codex
Debug → Claude Opus 4.5

Bookmark this.

176

252

208K

VickyinSF retweeted

Gergely Orosz

@GergelyOrosz

7 months ago

Amusing: Google does not allow its devs to use its newly launched IDE, Antigravity, for development. They can only use an internal version called Jetski: also built by the Antigravity team, with Google-specific features (e.g. monorepo support, docs search, etc.). Using Antigravity is specifically disallowed, and devs cannot sign up to it with a @google.com work address.

131

783

583K

VickyinSF retweeted

Thomas Chua

@SteadyCompound

7 months ago

Benedict Evans' new presentation just dropped: "AI eats the world"
90 slides on macro and strategic trends in tech.

His biannual overview is always worth the time:

https://t.co/GV4ga5Wo1S

344

880K

VickyinSF retweeted

Ryo Lu

@ryolu_

7 months ago

in the age of ai, the question everyone's asking is "will i be replaced?"

the real question is: do you know yourself well enough to become irreplaceable?

everyone's getting access to the same models. same tools. with growing capabilities. the playing field is leveling fast.

but here's the thing: Cursor doesn't think for you. it amplifies you.

it takes your agency – your unique way of seeing problems, your taste, your judgment, your weird specific obsessions – and scales it 100x.

it takes your strengths – the things only you are uniquely good at, the perspectives only you have from your specific life path – and makes them exponentially more powerful.

the humans who win in this era aren't the ones with the best prompts or the most tokens. they're the ones who know themselves deeply. who have conviction about their unique point of view. who've done the hard work of figuring out what only they can do.

ai is a mirror and a multiplier. if you're generic, it makes you more generic. if you're exceptional and know your strengths, it makes you unstoppable.

your agency + your strengths + ai = where you become 100x more valuable and powerful.

the question isn't whether to use tools like Cursor. it's whether you believe in your own agency enough to use it right.

the humans who deeply know who they are, what they believe, and what they're uniquely great at – those are the ones who'll build the future.

find your way. lean into your strengths. believe in human agency. then let Cursor amplify the hell out of it.

967

127

472

124K

VickyinSF retweeted

Yishan

@yishan

7 months ago

My AI investment thesis is that every AI application startup is likely to be crushed by rapid expansion of the foundational model providers.

App functionality will be added to the foundational models' offerings, because the big players aren't slow incumbents (it is wrong to apply the analogy of "fast startup, slow incumbent" here), they are just big.

Far more so than with any other prior new technology, there is a massive and fast-moving wave that obsoletes every new app almost as fast as it can be invented. There is almost no time to build a company and scale it.

There are two ways AI application startup founders can make money:

- Make a flash-in-the-pan app that generates a ton of cash and bank the cash (my estimate is that you have about 12-18 months cashflow generation)
- Make a good enough app that you get acquired by one of the big players for sufficient equity

The situation is highly unstable - we don't know if it's going to crash or go to the moon but both scenarios make it very unlikely that any AI application startup will independently become a generational supercompany (baseline odds are low to begin with).

The best odds are finding an application niche in a highly specialized field with extremely unique and specific data barriers, ideally ones relating to real atoms (hardware or world-related) data and not software/finance.

13K

10K

21M

VickyinSF retweeted

Fei-Fei Li

@drfeifei

7 months ago

AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n

294

766

926K

VickyinSF retweeted

Robert Youssef

@rryssf

8 months ago

🚨 RIP “Prompt Engineering.”

The GAIR team just dropped Context Engineering 2.0, and it completely reframes how we think about human–AI interaction.

Forget prompts. Forget “few-shot.” Context is the real interface.

Here’s the core idea:

“A person is the sum of their contexts.”

Machines aren’t failing because they lack intelligence.
They fail because they lack context-processing ability.

Context Engineering 2.0 maps this evolution:

1.0 Context as Translation
Humans adapt to computers.
2.0 Context as Instruction
LLMs interpret natural language.
3.0 Context as Scenario
Agents understand your goals.
4.0 Context as World
AI proactively builds your environment.

We’re in the middle of the 2.0 → 3.0 shift right now.

The jump from “context-aware” to “context-cooperative” systems changes everything from memory design to multi-agent collaboration.

This isn’t a buzzword. It’s the new foundation for the AI era.

Read the paper: arxiv.org/abs/2510.26493v1

390

231K

VickyInBeta

@VickyinSF

8 months ago

Turing Award laureate and Alphabet Chairman John Hennessy warns that AI hardware is hitting a wall:

“Even with massive parallelism, actual performance will depend on both TFLOPS and bandwidth.”

As FP16 compute and HBM bandwidth rise sharply, energy efficiency gains are slowing, a clear sign of diminishing returns in Performance per Watt.

#AI #Semiconductors #JohnHennessy #TuringAward #UC Berkeley

VickyInBeta

@VickyinSF

8 months ago

OpenAI estimates that among its 800 million weekly active ChatGPT users, approximately 0.07% (about 560,000 people) show possible signs of mental health emergencies related to psychosis or mania, and about 0.15% (roughly 1.2 million people) have conversations containing explicit indicators of potential suicidal planning or intent. These figures, though small as percentages, are alarming in absolute terms.