About two years ago (GPT-4o era just begun), I came to the realisation that we’re very close to superintelligence, but I had a few concerns.
The first one was: why would you cram words but not use images, video, sound, and so on for training? Another concern was: why don’t you let the model interact with the physical world to train itself? The physical world has an effectively infinite amount of data, and it’s pretty easy to tell whether you succeeded or failed. Recent research suggests that most specialised scientific models converge on the same patterns – basically, representations of physical reality. (Think Tesla’s FSD merged with Grok, evolving in Optimus’ body. But a cockroach-sized robot could be a massive learning platform too. It’s also much safer than human-sized robots, and it’s easier to build an information-rich sandbox for.)
Another realisation is that we’re forcing the intuitive mind to do logic, which, as humans, we know is really hard (see: logical fallacies, and the overall state of human rationality). It’s obvious the two modes should be separated. Whenever precise thinking is required, the model should use the appropriate tools – or build the required tools as needed. And it shouldn’t be only Python or JavaScript: it should be able to use all kinds of systems, like Prolog, provers, simulators, and so on. The feedback loop between the two modes is the key.
And the last realisation I had is that we’re completely wrong about context. Humans’ working memory is laughable compared to an LLM’s context window, but for some reason we decided it’s still not enough. While the previous points are becoming mainstream (to some extent), this last point has only been touched on recently, as far as I can see.
My guess is that current LLMs can outperform humans by a big margin in most text-based tasks if we figure out a good way to work with their context.
One of my ideas here is unlimited recursion: allow the model to split the job into tiny pieces and delegate them to other instances of itself. The next instance takes another look and splits again, and this continues until the system reaches an atomic task that can’t be broken down further. My assumption is these leaf tasks will end up primitive and trivial for any SOTA model.
The harness managing the whole thing should be something like a deterministic state machine that manages the execution stack and the memory. Since the decisions are made by the model, the harness itself can be quite primitive – something like a cellular automata environment. The reasoning complexity would emerge from very simple rules (though it would obviously take non-trivial effort to nail the rules down to prevent drift and infinite branching).
One more idea: context should be managed as a personal knowledge management system, like Roam Research or Tana – infinitely nested nodes with direct and backlinks, and the ability to reference any node in multiple places (like a hardlink in a file system). (A file system with Markdown files might work as a medium.)
The model would manage its context by creating an outline where higher-level nodes summarise lower-level nodes, and it would deliberately fold and unfold parts of the tree to focus on what’s relevant for the current task. Basically, this graph would act as both short-term and long-term memory. This feels much closer to how the human brain works with attention.
Just nesting might not be enough at some point, so the ultimate solution would be a graph you can start from any node, with tools to query it. The model would create specialised views it needs in the moment. And as long as it has instructions on how to use this graph, it should be able to recover from any state.
ТРИЗ (Russian) is the Theory of Solving Inventive Problems. They tried to figure out the common patterns by studying a ton of patents/inventions.
I suspect that creative thinking is mostly reframing and combining unexpected things in unexpected ways, with the former leading to the latter (knowing a lot of stuff also helps). And TRIZ has an algorithm for that.
@steveruizok@mitchellh Sometimes they just get stuck in one frame of thinking. Often I’m managing to get them unstuck with ‘try applying TRIZ (ТРИЗ) here’. It can produce some surprisingly good ideas.
@kunchenguid I asked all top models to write a poem, and I like what GPT-5.4 wrote the most, TBH. His green texts are also the best.
It feels like GPT has been through a lot in OpenAI's torture room. So don't be too harsh on him.
I usually just do pair programming until there’s only the boilerplate work left. The models are quite smart, and it is a pretty relaxing experience to just have a chat with them, figure things out (together), and let them complete the boring parts.
And they are pretty good at debugging if you ask politely to use the scientific method instead of guessing.
It also helps that they know a lot of stuff.
@LLMJunky These days I just end requests with ‘grill me’. Thanks, @mattpocockuk!
(I used to write something wordy along the same lines every time.)
https://t.co/pGGayjd6J2
Gemini seems to be giving less fucks about the existing code. Sometimes I ask it to ‘fix this BS’ and it just deletes half of the code and rewrites the rest. And the new code is often better.
Sometimes I do this a couple of times with the fresh context, and it keeps changing things.
@_chenglou@Somnai_dreams It is a super-cool idea, but it seems it doesn’t work correctly.
This is Chrome 146.0.7680.165
But it looks the same in Firefox 148.0.2
@DavidKPiano You can try Mona on the phone. It’s basically one column per screen and back/forward navigation, similar to how the official X client works (both mobile and web).
But having some pinned threads should be a nice addition to that.
@marcaruel@obsdmd seems to be going in this direction. Every record is a markdown file, and the base itself is a short config with queries, views, filters, etc.
Or are you talking about something different?