Had an amazing time with @Meta speaking about the future of AI, what I look for in companies, leadership, and how to best serve this new modern world for good through philanthropy. Thanks, @Meta !!!!
I spent $0 and a weekend vibecoding an @openclaw setup that I text to run experiments for me. In the process, I ended up with bespoke software for self-managing a personal cluster. Also, it now comes up with its own experiments if I don’t have enough running. Blog post link ⬇️
1/9 Can we leverage foundation models for better RL agents? Yes!
We introduce Language-Aligned Reward Machines (LARMs) and a framework that uses Foundation Models (FMs) to automatically generate them.
This work enables new ways to train RL agents efficiently on complex tasks.
🧵
NO verifiers. NO Tools.
Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling.
Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of!
Then we use aggregation-aware RL to push further!! 📈📈
🧵below!
@dodecahedra @evelynjlamb Now what would really be fun is a combinatorial proof of the equivalence of all of these, which would require realizing that this is the core scenario behind all of them.
@dodecahedra @evelynjlamb My second (even worse) way was this. We can make an equivalent situation by giving people in a line red and blue tokens. Partition on how many of each color, then pick which get red, then pick which get blue. However, we need to divide by two to take care of mirror cases.
@dodecahedra @evelynjlamb Pretending I didn’t first think that people were indistinguishable, I now think that it’s 28501. One way I thought of it was giving colored tokens to people.
@Quelklef @rytse_ That sounds like the notion of independence. A null hypothesis is a claim of no difference between probabilities, means, or distributions. If I were to try and write down your sugar scenario, I wouldn’t be able to do so (it seems like it’s trying to be a proportion test).
@Quelklef @rytse_ The null hypothesis has to be a statement about the probability. If what is observed differs significantly from the proposed probability statement (the null), it should rightly be rejected. Keep in mind we never accept the alternative, merely reject the null with a certain p
@Quelklef @rytse_ The procedure you described isn’t really a hypothesis test. Saying “I eat sugar” is a single event (n=1). You need to observe behavior, such as whether or not you eat sugar on each of 10 days. Even then, the probability is not a p-value. Remember that a p-value talks about a sampling distribution.
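
To make the 10-day idea concrete, here is a rough sketch of an exact two-sided proportion test in plain Python. Everything specific in it is an assumption for illustration and not from the thread: the null value p0 = 0.5, the observed count of 9 sugar days out of 10, and the helper names `binom_pmf` and `two_sided_p_value`. The p-value is computed from the sampling distribution of the count under the null, which is the point being made above.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k_obs, n, p0):
    """Exact two-sided p-value for H0: P(success) = p0.

    Sums the null probabilities of every count that is no more
    likely than the observed count (a common two-sided convention).
    """
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    # Small tolerance so ties are not lost to floating-point rounding.
    cutoff = pmf[k_obs] * (1 + 1e-9)
    return sum(q for q in pmf if q <= cutoff)


# Hypothetical data: sugar eaten on 9 of 10 observed days.
n_days = 10
sugar_days = 9
p_null = 0.5  # assumed null: sugar is eaten on half of all days

p_value = two_sided_p_value(sugar_days, n_days, p_null)
print(f"p-value = {p_value:.4f}")
```

With these assumed numbers the counts 0, 1, 9, and 10 are the ones at least as extreme as what was observed, so the p-value is 22/1024. A small p-value lets you reject the null at a chosen significance level; it is not the probability that you eat sugar.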