Training job: "CUDA out of memory"
Me: what did my code allocate??
My code: nothing. literally nothing. it crashed on line one.
Turns out the GPU had 40GB held by ghost processes from jobs that finished hours ago and never let go.
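If you want to hunt the ghosts down yourself, here's a minimal Python sketch. It assumes `nvidia-smi` is on your PATH and asks it for every compute process's PID and memory use through the `--query-compute-apps` flag. `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names, not part of any library.

```python
import subprocess
from typing import List, Optional, Tuple

# nvidia-smi flags used below: --query-compute-apps picks the fields to report,
# --format=csv,noheader,nounits gives bare comma-separated values (memory in MiB).
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> List[Tuple[int, Optional[int]]]:
    """Turn nvidia-smi CSV rows like '1234, 40000' into (pid, used_MiB) pairs."""
    apps = []
    for row in csv_text.strip().splitlines():
        pid_field, mem_field = (field.strip() for field in row.split(",", 1))
        # Memory can come back as "[N/A]" on some drivers or inside containers.
        used_mib = int(mem_field) if mem_field.isdigit() else None
        apps.append((int(pid_field), used_mib))
    return apps


def list_gpu_processes() -> List[Tuple[int, Optional[int]]]:
    """Return every process currently holding GPU memory, according to nvidia-smi."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Run `list_gpu_processes()` on the box and cross-check each PID with `ps -p <pid>` before killing anything. Inside containers, the PIDs nvidia-smi reports may not match the ones you can see.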
So I'm playing with Hermes Agent and having a bunch of trouble asking it to set up cron jobs.
Like, it'll create the script in the wrong directory, and the cron entry won't point at it.
What's the default directory we should be putting cron scripts into?
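This doesn't answer what Hermes Agent's default is. It's a minimal workaround sketch, assuming the script already exists wherever the agent wrote it: resolve that location to an absolute path and build the cron entry from it, so the job no longer depends on the directory cron starts in. `crontab_line` is a hypothetical helper, not part of Hermes Agent.

```python
import shlex
from pathlib import Path


def crontab_line(schedule: str, script: str, log_file: str) -> str:
    """Build a crontab entry whose script and log paths are absolute.

    cron does not start jobs in whatever directory the agent was working in,
    so a relative path that worked in the agent's shell can silently break.
    Resolving both paths up front removes that dependency.
    """
    script_path = Path(script).expanduser().resolve()
    log_path = Path(log_file).expanduser().resolve()
    # Append stdout and stderr to the log so failed runs leave a trace.
    return (
        f"{schedule} {shlex.quote(str(script_path))} "
        f">> {shlex.quote(str(log_path))} 2>&1"
    )
```

Run something like `print(crontab_line("0 * * * *", "report.sh", "report.log"))` from the directory the agent wrote the script into, paste the output into `crontab -e`, and make sure the script is executable (`chmod +x`).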
America hits token limits on all its agents, while Australia's tiny population gets the off-peak GPU buffet.
We are no longer "remote". We are computationally advantaged nomads.
RAG systems fell away because agents became able to navigate a bunch of files and work out the important information themselves.
This is similar to what Karpathy described for self-driving cars: moving from C++ code, where the rules were hand-written, to letting the AI make the decisions.
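For a concrete picture of "navigating files", here is a minimal sketch of the kind of grep-style tool an agent can call over and over, refining its pattern each time, instead of querying a precomputed embedding index. `search_files` is a hypothetical helper, not any particular agent framework's API.

```python
import re
from pathlib import Path
from typing import List, Tuple


def search_files(root: str, pattern: str, max_hits: int = 20) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for lines under `root` matching `pattern`.

    No index is built ahead of time: the agent decides what to search for,
    reads the hits, and can issue a narrower search next.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # cap output so it fits in the agent's context
    return hits
```

The design choice is deliberate simplicity: the "retrieval" logic lives in the agent's decisions about what to search next, not in the tool.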
Models trained in clean environments learn brittle strategies, over-rely on structure, and don't develop robustness.
So clean setups end up rewarding architectures that are fragile in the real world.
Reviewing an AI paper and it's like:
We compare against:
1) Weak baselines
2) Small synthetic models, and
3) Synthetic tasks.
We added some inductive bias and you can see our HUGE GAINS
They're just patching weaknesses in underpowered setups instead of improving strong models.
i14 Journal Club: Foundation Models Where Math Meets Cognitive Science
i14 is starting a weekly online discussion group for AI researchers and engineers exploring the intersection of generative AI, mathematics, and cognitive science. We analyze how architectural design impacts learning, memory, and reasoning in foundation models.
Join us to dissect training dynamics and explore how cognitive principles can inform the next generation of architectures,
with our first session hosted via Google Meet on
Monday, March 30 · 12:00 PM AEDT (Melbourne time), which is
Sunday, March 29 · 6:00 PM PDT (San Francisco time)
Apply to join HERE:
https://t.co/fOU6DQWqKY
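To double-check the session time, or convert it to your own zone, here is a short sketch using Python's standard `zoneinfo` module. The year is an assumption: the announcement doesn't state one, and 2026 is a year in which March 30 falls on a Monday.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed year: 2026 (not stated in the announcement).
start = datetime(2026, 3, 30, 12, 0, tzinfo=ZoneInfo("Australia/Melbourne"))

# Convert the Melbourne start time to San Francisco time.
sf = start.astimezone(ZoneInfo("America/Los_Angeles"))

print(start.strftime("%A, %B %d %I:%M %p %Z"))
print(sf.strftime("%A, %B %d %I:%M %p %Z"))
```

Swap `"America/Los_Angeles"` for any other IANA zone name to get your local time.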
Maybe I'm reading too many posts on Reddit, but DLSS 5 sounds awesome.
The game engine can quickly do the "rough" sketch of what should be on the screen, then let AI polish that into a super-realistic frame.
It's like the perfect pipeline for parallel processing.
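A toy sketch of the pipelining idea described above, not how DLSS is actually implemented: one stage produces rough frames while a second stage refines the previous ones, with a small queue between them. `render_rough` and `refine` are hypothetical stand-ins for the engine pass and the AI pass.

```python
import queue
import threading
from typing import List, Optional


def render_rough(frame_id: int) -> str:
    # Hypothetical stand-in for the engine's fast, low-detail pass.
    return f"rough-{frame_id}"


def refine(rough: str) -> str:
    # Hypothetical stand-in for the AI pass that polishes a rough frame.
    return rough.replace("rough", "final")


def run_pipeline(num_frames: int) -> List[str]:
    """Two-stage pipeline: while frame N is being refined, frame N+1 can be rendered."""
    handoff: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=2)  # buffer between stages
    finished: List[str] = []

    def engine_stage() -> None:
        for frame_id in range(num_frames):
            handoff.put(render_rough(frame_id))  # blocks if the AI stage falls behind
        handoff.put(None)  # sentinel: no more frames

    def ai_stage() -> None:
        while (rough := handoff.get()) is not None:
            finished.append(refine(rough))

    threads = [threading.Thread(target=engine_stage), threading.Thread(target=ai_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished
```

In Python the threads only illustrate the structure; the real speedup would come from the two stages running on separate hardware units at the same time.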
@levelsio Is this just because the scans are actually too low-resolution to be useful for that task?
Like, the false positive rate is so high it's only useful if you already know something is wrong.
Would this be the same issue with a better scanner?
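The "only useful if you already know something is wrong" intuition is base-rate arithmetic (Bayes' rule). A sketch with made-up numbers, not real figures for any scanner: the chance that a positive result is real depends heavily on how likely the problem was before the scan.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(real problem | positive scan), via Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical scanner: flags 90% of real problems (sensitivity)
# and correctly clears 90% of healthy people (specificity).
screening = positive_predictive_value(0.01, 0.90, 0.90)       # 1% prior: about 0.083
symptomatic = positive_predictive_value(0.50, 0.90, 0.90)     # 50% prior: about 0.90
better_scanner = positive_predictive_value(0.01, 0.90, 0.99)  # 1% prior, fewer false positives: about 0.48
```

With these assumed numbers, a better scanner helps a lot, but at a 1% prior even 99% specificity leaves roughly half of the positives false.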
OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:
We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.
We want AI to "just work" for you; we realize how complicated our model and product offerings have gotten.
We hate the model picker as much as you do and want to return to magic unified intelligence.
We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.
In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.
The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds.
Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.