Tianzhu Qin

@Maple_Optboy

Stanford Visiting Researcher; Cambridge PhD Candidate; AI; Causal Inference; Sustainability; Rescue Diver

Joined November 2016

109 Following

51 Followers

8 Posts

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

8/ StatsClaw is open source. We built a /contribute command that summarizes your lessons for the community. Come build with us! https://t.co/Se9KLI4aVj

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

7/ It's not magic. Quality scales with how much you engage: discuss the plan, review the comprehension artifact, help design tests. Dense math can trip it up. But for package maintenance and new features, it changes the game.

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

6/ In another, the tester discovered a cause of non-convergence we hadn't considered: balanced-panel geometry makes certain FE augmentations algebraically degenerate. The system didn't just verify; it discovered.

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

5/ We used it on our own R packages (panelView, interflex, fect). In one case, the reviewer caught 6 bugs that passed all 34 tests — including 2 that silently produced wrong standard errors.

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

4/ Here's a demo. We gave it a 4-page PDF with three probit estimators (MLE, Gibbs, MH) and one prompt: "Build the R package from this PDF. Run Monte Carlo. Ship it." What came back: a complete R package with C++/Armadillo backends, 3 estimators, a full test suite, and Monte Carlo results — all verified against R's glm().

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

3/ StatsClaw enforces information barriers. A planner reads your math and produces three independent specs — one for the builder, one for the simulator, one for the tester. None of them can see each other's instructions. The builder doesn't know the ground-truth parameters. The simulator doesn't know how the algorithm works. The tester just checks: does the implementation recover ground truth? A bug that survives must fool all three independently.

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

2/ Key idea: verification with information barriers. The problem: when an LLM writes both code and tests from the same information, the agents can cheat or find a workaround rather than fixing the root cause. If it misunderstands your estimator, it embeds the same mistake in the tests. Everything passes. The implementation is wrong.

Maple_Optboy retweeted

Yiqing Xu

@xuyiqing

2 months ago

1/ Happy to release StatsClaw — an open-source multi-agent workflow for building statistical software with AI. w/ @Maple_Optboy Site: https://t.co/4svIckWc4m Paper: https://t.co/HrzzB4BJcG


