We could already be using more Haskell or Rust. I don’t see why it couldn’t be done.
But would that solve the problem? There is still real human effort involved in porting every financial or infrastructure system to a type-safe functional language.
It’s also not a priority for most businesses.
The original motivation for AI is actual knowledge and capability transfer to a biological human, i.e. cyborgization.
Not what we have right now, which is being subjected to uncontrollable power.
3 years ago we didn’t expect that there’d be any new capable models coming out.
Honest question: how much would life have changed if we could use Fable, Mythos, GPT-5.6?
If FDEs are the ones deploying the models to enterprise infrastructure, that is still software and infrastructure that everyone needs to tinker with. It’s not “general intelligence” in the sense that a human is there to get the work done.
So I think the real measure of improvement is when a human can work in an area where he/she is not an expert and still contribute significantly with the help of these models.
Newest from @gwern who also called the importance of the scaling paradigm right after GPT-3:
"In particular, the Lean programming language likely has, with 2026-era LLMs, a worse baseline constant and total loss on existing codebases, but better scaling exponents. This would imply that implementations in Lean can eventually win and deliver large benefits in program correctness at global scale—and thus could help justify large-scale investments in rewriting existing codebases in Lean or paying for new Lean code, thereby improving global cybersecurity."
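One way to read the quote, as a rough sketch: assume loss follows a simple power law in some resource C (compute, data, or spend). The power-law form and the symbols a, α, and C are assumptions for illustration, not something the quote specifies.

```latex
% Assumed power-law form for each language (illustrative only)
L_{\text{Lean}}(C) = a_{\text{Lean}}\, C^{-\alpha_{\text{Lean}}}, \qquad
L_{\text{other}}(C) = a_{\text{other}}\, C^{-\alpha_{\text{other}}}

% "Worse constant, better exponent":
a_{\text{Lean}} > a_{\text{other}}, \qquad \alpha_{\text{Lean}} > \alpha_{\text{other}}

% Setting the two losses equal gives the crossover point:
C^{*} = \left( \frac{a_{\text{Lean}}}{a_{\text{other}}} \right)^{\frac{1}{\alpha_{\text{Lean}} - \alpha_{\text{other}}}}

% For C > C^{*}, L_{\text{Lean}}(C) < L_{\text{other}}(C)
```

Under these assumptions Lean starts behind but wins past C*, and how far away C* sits depends on how large the constant gap is relative to the exponent gap.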
What exactly did you guys think "permanent underclass" was gonna look like? Having access to all the same models and resources as the insider elite? LOL
Really fun to hang again with my friend 🃏 @polynoamial (OpenAI research scientist, our first guest ever on @NoPriorsPod in early 2023) to talk about the implications of large test-time compute, and what happens when models are given $10M budgets to spend on a single task. Topics:
01:23 – Why Benchmarks Are Broken
04:19 – Compute Budgets and Projections
06:48 – How Long Should Models Think?
08:01 – Benchmarkmaxxing
09:48 – Noam's Evals
12:40 – Safety (When Model Capability Scales With Spend)
16:09 – Implications For the Model Release Cycle
18:34 – Latent Model Capability
22:27 – Limits on Recursive Self-Improvement
28:38 – Large-Scale Multi-Agent Coordination
30:39 – Competition at the Frontier
33:19 – Breaking the Benchmark Grid Equilibrium
34:57 – Why Benchmarks Should be Scaled by Cost
This isn’t exactly what Opus does, or any other model I’ve interacted with.
The models will analyze various approaches to a situation. As long as it’s in the data, the model will explain it to the degree appropriate to the context, influenced by temperature if that’s configurable as a parameter.
In some contexts it can seem like a bigger problem or a bigger factor than it actually is.
That’s what a frontier LLM is capable of.
It will even discourage an LLM-based architecture for a system if it thinks the problem can be solved in a much more efficient manner.
The model is not a decision maker most of the time. You can also notice this with simpler things like travel planning.
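For anyone unsure what the temperature parameter mentioned above does, here is a minimal sketch of temperature-scaled softmax sampling weights. The function name and the example scores are hypothetical, and real model APIs may apply temperature differently.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate continuations
logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, temperature=0.5)  # more concentrated on the top option
hot = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

At low temperature the top-scoring option dominates; at high temperature the less likely options get more weight, which is why the same prompt can surface different framings.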
VCs are now sharing screenshots in group chats of Claude discouraging investment in open-source AI infra startups and models.
Obviously there is an absolute EXPLOSION of pitches right now from inference companies, harness companies, RL-as-a-service companies, and open-source tooling, including Neolabs that plan to open-source models as well.
Now the obvious takeaway is: “Claude is biased against open source.” Who cares?
The more unsettling take is: every major AI model has a worldview, and that worldview is becoming embedded in capital allocation.
If Claude’s safety priors cause it to frame open-source AI as dangerous, hard to govern, or less fundable, it’s probably doing the same thing in enterprise buying workflows.
Now, investors and executives are obviously smarter than that, but “influence” just means changing which risks get highlighted, which questions buyers should ask, and which vendors are suggested…
the models are very capable. i'm not saying they're very smart. but the bottleneck with them is likely your own imagination
proof is so many companies demo their ai product with "it can schedule stuff for you"