Lawvere credited philosophy with helping him clarify his work in category theory, which culminated in axiomatic cohesion. He discusses this in several works, such as “Taking Categories Seriously” and “Categories of Space and Quantity”. He mentions Aristotle, Hegel, Leibniz, and Cantor. His claim seems to be that philosophical language can help isolate the conceptual essence of mathematical structure.
@etscrivner@rickasaurus The problem with OOP is that in the process of misleading programmers into thinking of it as an ontological tool for classification of knowledge, for which it is far too inexpressive, it leads them to forget their epistemological purpose of writing programs.
Observer dependency is real but bounded. The observer of a problem implies admissible operations, a representation language, a resource model.
But there’s still intrinsicity.
You can quotient out the representational artifacts and still be left with a minimal residual structure (curvature, entropy, incompressibility, logical depth, homological obstruction). Take Gauss’ work on intrinsic curvature, which showed that the curvature of a surface can be computed from measurements made within the surface, independent of how it is embedded or which coordinates are used. Kolmogorov complexity also hints at this.
Complexity can be seen as an obstruction to reduction. You can know the shape of a problem, but that only gives you a way to measure its complexity; it doesn’t eliminate it.
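A rough way to see the Kolmogorov point in code: compressed length is a computable upper-bound proxy for description length. The sketch below is only an illustration under that assumption. `compressed_len` and the sample strings are made up for this example, and zlib is a stand-in, not Kolmogorov complexity itself (which is uncomputable and defined only up to an additive constant that depends on the reference machine).

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data: an upper-bound proxy for description length."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable

regular = b"ab" * 1000                                   # highly structured, 2000 bytes
relabeled = regular.translate(bytes.maketrans(b"ab", b"xy"))  # same structure, different symbols
noisy = bytes(rng.randrange(256) for _ in range(2000))   # no structure to exploit, 2000 bytes

# Relabeling the symbols is a change of representation: the compressed size stays tiny.
# The noisy string stays close to its raw length no matter how it is relabeled.
print(compressed_len(regular), compressed_len(relabeled), compressed_len(noisy))
```

Changing the alphabet changes the artifact, not the residue: the structured string stays cheap to describe and the noisy one stays expensive.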
All three curves are called Cotes spirals, after Roger Cotes’ work on the inverse cube force law, published posthumously in 1722. Cotes seems to have been the first to compute the derivative of the sine function. After Cotes’ death at the age of 33, Newton supposedly said “If he had lived we would have known something.”
“OLS regression is structurally a special case of a single-layer Linear Transformer.”
“…spectral decomposition reveals that linear attention achieves the OLS projection in one forward pass, showing statistical inference to be Transformer’s intrinsic algebraic property”
“the weight matrices serve as slow memory to extract long-term statistical patterns, while the attention scores act as fast memory to construct real-time contextual associations”
“the transition from linear projection to Softmax attention represents a fundamental leap in the energy function of Hopfield associative memory networks, ultimately achieving a breakthrough in memory capacity from linear to exponential scale”
Ordinary Least Squares is a Special Case of Transformer
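One standard way to read the first quote, sketched under assumptions: take the training rows as keys, the targets as values, the test point as the query, and fix the query-key weight matrix to (XᵀX)⁻¹. An unnormalized (no softmax) attention read-out then equals the OLS prediction. The function names and toy data are hypothetical, and this is not necessarily the construction the quoted paper uses.

```python
def inverse_gram(X):
    """Inverse of X^T X for exactly two features, via the 2x2 closed form."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def ols_predict(X, y, x_query):
    """Textbook OLS: beta = (X^T X)^{-1} X^T y, prediction = x_query . beta."""
    W = inverse_gram(X)
    xty = [sum(r[0] * t for r, t in zip(X, y)), sum(r[1] * t for r, t in zip(X, y))]
    beta = [W[0][0] * xty[0] + W[0][1] * xty[1], W[1][0] * xty[0] + W[1][1] * xty[1]]
    return x_query[0] * beta[0] + x_query[1] * beta[1]


def linear_attention_predict(X, y, x_query):
    """Linear attention: score_i = q^T W k_i, output = sum_i score_i * v_i."""
    W = inverse_gram(X)  # assumed query-key weights
    qW = [x_query[0] * W[0][0] + x_query[1] * W[1][0],
          x_query[0] * W[0][1] + x_query[1] * W[1][1]]
    scores = [qW[0] * k[0] + qW[1] * k[1] for k in X]  # keys are the rows of X
    return sum(s * v for s, v in zip(scores, y))       # values are the targets


# Toy data on the line y = 1 + 2x; the first column is the intercept feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
query = [1.0, 4.0]

print(ols_predict(X, y, query), linear_attention_predict(X, y, query))
```

Both calls agree up to floating-point rounding, because the attention sum is just xᵀ(XᵀX)⁻¹Xᵀy regrouped.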
an iterative simulation.
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.
The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.
We introduce Hyperagents – self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving.
We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution. Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g. persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.
This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
C# doesn’t have formal DUs, but:
- Records and pattern matching get you part of the way there
- You can write an explicit Fold method to handle all cases, which is also helpful in F# (if you use a catch-all clause you forgo exhaustiveness checks)
- F# DUs require special effort for JSON serialization
- It is easy to write a code generator that emits Fold methods for C# DUs expressed either as class/record hierarchies or as a single type with a type-discriminator enum
- The codegen solution has advantages over base F# DUs in that it provides a Fold method, and it can be used for value-type representations of DUs, not to mention being able to map every domain-specific DU to a generic Choice<..> DU
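
The Fold idea from the list above, sketched in Python purely as neutral notation (the post is about C# and F#). `Shape`, `Circle`, `Rect` and `area` are hypothetical names for illustration. The point is that `fold` takes one handler per case, so a caller who leaves one out fails at the call instead of silently falling into a catch-all.

```python
import math
from dataclasses import dataclass
from typing import Callable, TypeVar

R = TypeVar("R")


class Shape:
    """Base of a closed two-case union (hypothetical example domain)."""

    def fold(self, on_circle: Callable[["Circle"], R], on_rect: Callable[["Rect"], R]) -> R:
        raise NotImplementedError


@dataclass(frozen=True)
class Circle(Shape):
    radius: float

    def fold(self, on_circle, on_rect):
        # Each case dispatches to exactly one handler.
        return on_circle(self)


@dataclass(frozen=True)
class Rect(Shape):
    width: float
    height: float

    def fold(self, on_circle, on_rect):
        return on_rect(self)


def area(shape: Shape) -> float:
    # Every case must be supplied; omitting a handler raises TypeError when fold is called.
    return shape.fold(
        on_circle=lambda c: math.pi * c.radius ** 2,
        on_rect=lambda r: r.width * r.height,
    )


print(area(Rect(2.0, 3.0)), area(Circle(1.0)))
```

Adding a third case means adding a third parameter to `fold`, which breaks every existing call site until it is handled. That is the property a catch-all clause gives up.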
Have been updating my "physics of AI" blogs every day. Spending only 2 hours a day, I learn surprising facts about neural networks via toy models. Many insights might be trivial or irrelevant in the end, but some will be huge and transform the field.
https://t.co/lZeIJEZLLI
I don’t think asynchrony is necessarily about correctness.
Here are the definitions of asynchrony, concurrency and parallelism that I’ve landed on over the years:
Synchrony is the coupling of two or more events into a single semantic event.
Asynchrony is the absence of such coupling: related events remain distinct and are not required to occur as one.
Concurrency is the multiplexing of multiple independent computations onto (fewer) execution resources.
Parallelism is the decomposition of a computation into multiple computations executed on multiple execution resources.
You can even formalize these in terms of dualities, loosely as follows:
Asynchrony is dual to synchrony
Concurrency is dual to parallelism
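A small sketch of the concurrency and parallelism definitions above, under assumptions: `worker`, `run_concurrently` and `parallel_sum` are hypothetical names, and the thread pool only shows the shape of the decomposition (in CPython the GIL means threads do not speed up CPU-bound work).

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def worker(name, steps, log):
    """One independent computation that yields control after every step."""
    for i in range(steps):
        log.append((name, i))
        await asyncio.sleep(0)  # hand the single thread back to the event loop


async def run_concurrently():
    # Concurrency: two independent computations multiplexed onto one thread.
    log = []
    await asyncio.gather(worker("a", 3, log), worker("b", 3, log))
    return log


def parallel_sum(numbers, workers=4):
    # Parallelism: one computation decomposed into chunks, one chunk per worker.
    chunks = [numbers[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


interleaved = asyncio.run(run_concurrently())
total = parallel_sum(list(range(1000)))
print(interleaved)
print(total)
```

The first half starts with many computations and few resources. The second starts with one computation and spreads it over many resources, which is the sense in which the two are loosely dual.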
In some important ways, a user’s LLM chat history is an extended interview. The social media algorithms learn what you like, but chats can learn how you think.
You should be able to provide an LLM as a job reference, just like you would a coworker, manager, or professor. It can form an opinion and represent you without revealing any private data.
Most resumes are culled by crude filters in HR long before they get to the checking-references stage, but this could greatly increase the fidelity. Our LLM will have an in-depth conversation with your LLM. For everyone.
Most people probably shudder at the idea of an LLM rendering a judgement on them, but it is already happening in many interview processes today based on the tiny data in resumes. Better data helps everyone except the people trying to con their way into a position, and is it really worse than being judged by random HR people?
Candidates with extensive public works, whether open source code, academic papers, long-form writing, or even social media presence, already give a strong signal, but most talent is not publicly visible, and even the most rigorous (and resource-consuming!) Big Tech interview track isn’t as predictive as you would like. A multi-year chat history is an excellent signal.
Taken to the next level, you could imagine asking “What are the best candidates in the entire world that we should try to recruit for this task?” There is enormous economic value on the table in optimizing the fit between people and jobs, and it is completely two-sided, benefitting both employers and employees.
@alz_zyd_ Made me recall something Jeff Bezos said on Lex Fridman…he said he wants to see humanity reach a population of one trillion so that we can have millions of very talented people. I wonder if that’s the bottleneck.