More than 50% of the reported reasoning abilities of LLMs might not be true reasoning.
How do we evaluate models trained on the entire internet? I.e., what novel questions can we ask of something that has seen all written knowledge? Below: new eval, results, code, and paper.
Functional benchmarks are a new way to do reasoning evals. Take a popular benchmark, e.g., MATH, and manually rewrite its reasoning into code, MATH(). Run the code to get a snapshot that asks for the same reasoning but not the same question. A reasoning gap exists if a model’s performance on the snapshots differs from its performance on the original static benchmark. Big question: Are current SOTA models closer to gap 0 (proper reasoning) or gap 100 (lots of memorization)?
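A rough sketch of the idea, not the code from the repo: one static question is rewritten as a function that emits a fresh instance of the same reasoning for each seed, and the gap is computed from static vs. snapshot accuracy. The names `functional_problem`, `make_snapshot`, and `reasoning_gap` are hypothetical, the rectangle problem is a made-up example, and the gap formula (percentage drop from static to functional accuracy) is an assumption chosen to match the 0-to-100 scale described above.

```python
import random
from typing import List, Tuple


def functional_problem(seed: int) -> Tuple[str, int]:
    """Hypothetical functionalized problem: same reasoning, new numbers per seed."""
    rng = random.Random(seed)
    width = rng.randint(2, 9)
    length = rng.randint(10, 99)
    question = (
        f"A rectangle has width {width} and length {length}. "
        "What is its perimeter?"
    )
    answer = 2 * (width + length)
    return question, answer


def make_snapshot(snapshot_seed: int, size: int = 5) -> List[Tuple[str, int]]:
    """Build one snapshot: a list of (question, answer) pairs derived from a seed."""
    return [functional_problem(snapshot_seed * 1000 + i) for i in range(size)]


def reasoning_gap(static_acc: float, functional_acc: float) -> float:
    """Assumed definition: percentage drop from static to functional accuracy."""
    if static_acc <= 0:
        raise ValueError("static accuracy must be positive")
    return 100.0 * (static_acc - functional_acc) / static_acc


if __name__ == "__main__":
    for question, answer in make_snapshot(snapshot_seed=1, size=2):
        print(question, "->", answer)
    # Accuracies below are placeholders, not results from the paper.
    print(reasoning_gap(static_acc=0.5, functional_acc=0.25))
```

Under this assumed definition, a model that scores the same on the snapshots as on the static benchmark has gap 0, and a model that solves none of the snapshot questions has gap 100.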
What we find: Gaps in the range of 58% to 80% in a bunch of SOTA models. Motivates us to build Gap 0 models.
We’re releasing the paper, code, and 3 snapshots of functional MATH() today.
arxiv draft: https://t.co/KtvWPc0R72
github repo: https://t.co/gzDVaxZ9yg
1/🧵
What started as an experiment in a Harvard dorm and became one of Sequoia India’s 1st partnerships in SEA has grown into a 🚀company powered by a maniacal focus on #customer centricity. Congrats Chih-Han, Joe, Winnie & everyone @GoAppier on today’s #IPO!
https://t.co/VwOiASjfWJ
We are deeply grateful to Sequoia’s LPs, who have committed $1.35B to two new Sequoia India venture and growth funds. The region’s #startup ecosystem is at a fork in the road. We believe there is an opportunity to make different choices for the future. https://t.co/x9BZ17rtsj
https://t.co/gWwbrkVrk7
Appier continues to be one of the leading AI companies in Asia. We @Sequoia_India are thrilled to have been partners for 5+ years and look fwd to their continued success. Onwards!
Welcome aboard @amitjain1. Absolutely thrilled to have you join the @sequoia_india family! Looking forward to a fantastic journey together. #onwards
https://t.co/cGqC2GE0EH
Pick just 1 or 2 metrics at each phase of your #startup journey & track them relentlessly, says @abheek. The ability to articulate a vision and break it down into measurable goals helps #founders move the ball forward every day. https://t.co/Oi7QysZVXn
So happy to see my former employer Facebook's Return to Work program - helping those who have left the workforce for 2+ yrs come back full-time. We must have more orgs do this, especially in Asia https://t.co/74WbqItJ1k
After years of reflection, my four (software) startup engineering killers are:
1. Premature scaling
2. Too much shiny/new tech
3. Bad hiring (great engineer, but not great startup eng)
4. Eng/business mismatch
What are yours?
https://t.co/dt0WKVBcHY
A little over 72 hours before we close applications for Surge. It’s right down to the wire! Head over to https://t.co/yWSW9xsrQ9 to apply. Let’s get this done!
#GetReadyToSurge
1/ We @sequoia hear stories like this about @ericsyuan every day
It’s no coincidence that this humble & amazing founder is at the helm of such a wonderful company, @zoom_us
In fact, his humility has been a key driver of their success
We asked ourselves: how can we best serve early-stage founders? Can we do more to give them an unfair advantage? Thrilled to launch Surge (@_surgeahead) - a new kind of rapid scale-up program for early stage #startups in #India & #ASEAN. #GetReadyToSurge
https://t.co/KKXPzfhex6