Gary @GaryLuo1994 - Twitter Profile

Pinned Tweet

@GaryLuo1994

6 days ago

Spent an hour with @samdblond at Monaco HQ today. Rare to meet a founder this sharp who's also this generous with his time.

1

5

0

382

GaryLuo1994 retweeted

Alberto Rosas

@albertorosasg

about 2 hours ago

Narrative violation: builders still want to build. AI doesn’t kill headcount. It increases ambition. We’re entering a new era of software factories, venture builders, and tiny teams creating massive companies. The builder era is back.

albertorosasg's tweet photo. https://t.co/4qBYha4Mbn

1

2

0

371

GaryLuo1994 retweeted

Zara Zhang

@zarazhangrui

1 day ago

I built a Chrome extension that turns your "read later" list into dedicated reading time on your calendar. Save 5 articles → it auto-books a 30-min "reading block" on your Google calendar, links included. So you'll actually sit down and go through them. No account, no server, everything local. Open source (link below)

52

677

47

504

37K

GaryLuo1994 retweeted

AI at Meta

@AIatMeta

1 day ago

We’re sharing the next major milestone in our non-invasive brain-to-text decoder research: Brain2Qwerty v2. Building on v1, which was published today in @Nature, Brain2Qwerty v2 is the highest-performing end-to-end pipeline capable of real-time sentence decoding from raw brain signals. It advances beyond character-level performance to decoding words and semantics, enabling accuracy for overall communication. We believe this research has the potential to make a real difference for the millions of people who suffer from brain lesions or disorders that prevent them from communicating. 🧵👇

638

14K

2K

7K

5M

Gary

@GaryLuo1994

2 days ago

@hthieblot Relationships

1

3

0

36

Gary

@GaryLuo1994

2 days ago

Don’t ask LLMs to generate an overall score. Ask them for true-or-false verdicts on individual criteria. That’s how we build our AI sourcer.

elvis

@omarsar0

3 days ago

If you use LLM-as-judge, this one is worth reading. (bookmark it) It's actually one of the most effective ways to use LLM-as-a-Judge for evals. Holistic judge scores hide both their reasoning and their ceiling effects. BINEVAL decomposes each evaluation criterion into atomic yes-or-no questions, answers each independently per output, then aggregates the verdicts into calibrated multi-dimensional scores. Every question-level verdict is inspectable, so you can diagnose exactly why an output scored low, and the same verdicts feed straight back as targeted prompt-improvement signal. Across SummEval, Topical-Chat, and QAGS, it matches or beats UniEval and G-Eval, training-free, with especially strong results on factual consistency. Paper: https://t.co/oar6BZcasm Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

50

2K

241

4K

211K

0

1

0

176
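
The idea in the two posts above (binary verdicts per criterion instead of one holistic score) can be sketched roughly as follows. This is a minimal illustration, not the BINEVAL paper's implementation and not the AI sourcer's code: the checklist, the `keyword_judge` stand-in for an LLM call, and the plain fraction-of-yes aggregation are all assumptions made for the example. The paper describes calibrated aggregation, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A judge answers one yes/no question about one output.
# In practice this would wrap an LLM call; the type is hypothetical.
Judge = Callable[[str, str], bool]


@dataclass
class Verdict:
    criterion: str
    question: str
    answer: bool


def evaluate(output: str, checklist: Dict[str, List[str]], judge: Judge) -> List[Verdict]:
    """Ask every question independently so each verdict stays inspectable."""
    return [
        Verdict(criterion, question, judge(output, question))
        for criterion, questions in checklist.items()
        for question in questions
    ]


def aggregate(verdicts: List[Verdict]) -> Dict[str, float]:
    """Score each criterion as the fraction of its questions answered yes."""
    totals: Dict[str, List[int]] = {}
    for v in verdicts:
        yes, n = totals.setdefault(v.criterion, [0, 0])
        totals[v.criterion] = [yes + int(v.answer), n + 1]
    return {criterion: yes / n for criterion, (yes, n) in totals.items()}


def failed_questions(verdicts: List[Verdict]) -> List[str]:
    """List the questions answered no, to show why a score is low."""
    return [v.question for v in verdicts if not v.answer]


# Hypothetical checklist for screening a candidate profile.
CHECKLIST = {
    "experience": [
        "Does the profile mention Python?",
        "Does the profile mention a startup?",
    ],
    "location": ["Does the profile mention San Francisco?"],
}

# Stand-in judge: a keyword lookup instead of a real LLM call.
KEYWORDS = {
    "Does the profile mention Python?": "python",
    "Does the profile mention a startup?": "startup",
    "Does the profile mention San Francisco?": "san francisco",
}


def keyword_judge(output: str, question: str) -> bool:
    return KEYWORDS[question] in output.lower()


profile = "Backend engineer in San Francisco, five years of Python."
verdicts = evaluate(profile, CHECKLIST, keyword_judge)
scores = aggregate(verdicts)
print(scores)
print(failed_questions(verdicts))
```

Because every question is answered on its own, a low criterion score can be traced to the specific questions that failed, which a single holistic number would hide.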

Gary

@GaryLuo1994

3 days ago

Great video!

Gabriel Jarrosson

@GJarrosson

3 days ago

I flew from SF to New York with one week to raise $50M for Fund II. The first yes, the no's, everything in between.

1

14

0

4

1K

0

1

0

47

Gary

@GaryLuo1994

3 days ago

@kushagrchitkar @rohansharma0509 @divitsheth007 @usealmanac Congrats!!!

0

39

Gary

@GaryLuo1994

3 days ago

Always the best!

Bek

@beknabdik

5 days ago

Day 2 in YC I’m realizing again, I’m in the best place I could be right now

18

127

2

4

5K

0

1

0

34

Gary

@GaryLuo1994

3 days ago

Welcome to our new office.

0

8

Gary

@GaryLuo1994

3 days ago

I miss the days in the batch, and I also enjoy working with my best batchmates.

Alberto Rosas

@albertorosasg

4 days ago

We share the office with 10 other @ycombinator W26 companies, and yesterday we ran our first group Office Hours. The energy, ambition, and intensity in this room is hard to describe.

albertorosasg's tweet photo. https://t.co/sUFp6pLK0D

3

30

2

1

2K

1

0

81

Gary

@GaryLuo1994

5 days ago

@TTrimoreau Finding anything that AI builds.

0

1

0

196

GaryLuo1994 retweeted

Zizheng Pan

@zizhpan

5 days ago

We’re expanding our team! Come and join us! You will be working in Beijing or Hangzhou. Fluent Chinese is required. All roles are actively hiring, and every role is open to interns. Applications welcome 🙌

zizhpan's tweet photo. https://t.co/JyRT9Le5p3

67

634

44

183

141K

GaryLuo1994 retweeted

Paul Graham

@paulg

7 days ago

The users who complain about the flaws in your product may seem annoying, but they are on the whole probably your most valuable users. They complain because they care, and I doubt a startup could ever get really big without users who care a lot about the product.

462

9K

1K

884K

Gary

@GaryLuo1994

6 days ago

@zarazhangrui lack of the courage to be imperfect

0

5

0

419

GaryLuo1994 retweeted

Yun-Ta Tsai

@yunta_tsai

11 days ago

Many people think any given ML project is 99% training. In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training. The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data. Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.

555

11K

1K

6K

18M