I trained an AI to think like me — here’s what happened:
Could I teach an LLM to think like me?
I fine-tuned a local Deepseek-R1-llama-8B on my Obsidian second brain to find out.
Could I create an AI clone that amplifies my thinking?
Not just search through my notes, but reason across my ideas.
What I learned suggests this is just version 0.0.1 of something much bigger.
I’m driven by a mix of fascination and fear.
Fascination because AI tools are already useful (and we’ve barely scratched the surface). Fear because these same tools change what it means to be valuable.
I want to adapt to the Intelligence Age by amplifying my abilities, not surrendering them.
My goal: merge the best of what LLMs can do with the best of what my brain can do.
I want to build Augmented Intelligence.
Here’s what I’ve got so far.
I’m in the middle of a 6-week learning sprint, going deep on the foundations of neural networks and transformers.
AI tools today make it painfully easy to skim, summarize, and take shortcuts. Going deep on the basics and doing math by hand again is how I try to avoid the mental atrophy of this new default path.
For more, see the notebooks on GitHub (a work in progress): https://t.co/WvLeAIFlAK
I was driving my son home from a field trip when he saw “NW” on the rear-view mirror display.
“Daddy, what does NW mean?” he asked.
I started explaining it and realized I was describing linear algebra. The same math that powers AI today.
In AI, data becomes vectors.
Once the data is a vector, a neural network can start moving it.
A matrix in a neural network is a learned move through representation space.
During training, the model learns how to move vectors to find useful representations that make the downstream task easier.
When you see a vector as just a direction in some space and your model as simply moving those vectors around, you can ask interesting questions like:
what space is this vector in?
what directions matter here?
what does this matrix amplify?
what does it erase?
what directions does it mix together?
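Those questions get concrete with a toy example. A minimal sketch in plain Python, using made-up 2x2 matrices (not weights from any real model): one matrix amplifies a direction and erases another, the other mixes the two directions together.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Amplifies the first direction 3x and erases the second entirely.
amplify_erase = [[3.0, 0.0],
                 [0.0, 0.0]]

# Mixes the two directions: each output depends on both inputs.
mix = [[1.0, 1.0],
       [1.0, -1.0]]

v = [2.0, 5.0]
moved = matvec(amplify_erase, v)  # [6.0, 0.0]
mixed = matvec(mix, v)            # [7.0, -3.0]
```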
In a transformer, when you see:
Q = X @ W_Q
K = X @ W_K
V = X @ W_V
that means:
take the same token vectors (X)
move them into query space (Q)
move them into key space (K)
move them into value space (V)
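The three projections above can be sketched in plain Python. The token vectors and weight matrices here are invented numbers for illustration only; in a real transformer W_Q, W_K, and W_V are learned and much larger.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Two tokens, each a 3-dimensional vector (made-up values).
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

# Hypothetical weights, each mapping 3 dimensions down to 2.
W_Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_K = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
W_V = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

Q = matmul(X, W_Q)  # same tokens, moved into query space
K = matmul(X, W_K)  # same tokens, moved into key space
V = matmul(X, W_V)  # same tokens, moved into value space
```

Same X goes in each time; only the matrix changes, so each result is a different view of the same tokens.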
I bought a $200 weather sensor and fed the data into the time series foundation model my research team at Datadog just released.
Call it personal observability for the weather.
It works surprisingly well.
My team at Datadog AI Research just released Toto 2.0. It’s a family of five open-weight time series foundation models from 4M to 2.5B parameters. We achieved the top spot on the GIFT-Eval, BOOM, and TIME forecasting benchmarks.
Toto is a transformer, just like most LLMs. Except Toto predicts the next patch of numbers instead of the next word. It was trained on observability and synthetic data, never weather. That it works on weather at all is the whole point of foundation models.
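To make "next patch of numbers" concrete, here is a rough sketch of cutting a series into fixed-size patches. This is an illustration of the general idea, not Toto's actual preprocessing; the function name and the patch size of 4 are assumptions.

```python
def to_patches(series, patch_size):
    """Split a series into consecutive, non-overlapping patches, dropping any incomplete tail."""
    usable = len(series) - len(series) % patch_size
    return [series[i:i + patch_size] for i in range(0, usable, patch_size)]

# Made-up backyard temperature readings (°F).
temps = [61.0, 61.5, 62.1, 63.0, 64.2, 65.0, 65.3, 65.1, 64.0]
patches = to_patches(temps, patch_size=4)
# Each patch plays the role a word token plays in an LLM:
# given the patches so far, the model predicts the one that comes next.
```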
Now in my backyard the weather app has been horribly wrong. My wife and I would complain about the temperature forecast being 5-10 degrees below what actually seems to happen in the afternoon.
The setup is simple. I use the Ecowitt WS-90 sensor and the GW3000 gateway. This streams live data from my backyard to my Hugging Face Space. The data is then fed to Toto-2.0-22M, which forecasts the next 48 hours. I then track the accuracy against the National Weather Service’s forecasts.
Five days in, Toto is winning the 1-hour forecast on both temperature (1.9°F MAE vs NWS 2.5°F) and humidity (4.6% vs 7.0%). NWS pulls ahead at 3h and 12h.
Toto reacts fast to fresh sensor data, which dominates the near term. NWS has radar, satellites, and neighbor stations feeding the regional context that dominates longer horizons.
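For anyone unfamiliar with MAE (mean absolute error), the scoring above boils down to something like the sketch below. The readings and model names are made up for illustration; they are not the real backyard data.

```python
def mae(forecasts, actuals):
    """Mean absolute error between paired forecasts and observations."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)

# Hypothetical 1-hour-ahead temperature forecasts (°F) and what the sensor later read.
observed = [70.0, 72.0, 75.0, 74.0]
model_a = [71.0, 71.0, 77.0, 74.0]
model_b = [68.0, 70.0, 72.0, 73.0]

mae_a = mae(model_a, observed)  # (1 + 1 + 2 + 0) / 4 = 1.0
mae_b = mae(model_b, observed)  # (2 + 2 + 3 + 1) / 4 = 2.0
```

Run it once per horizon (1h, 3h, 12h) and the lower number wins that horizon.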
I just love that Toto works here. A model trained on server metrics, never on weather, finds enough signal in noisy backyard readings to beat a national forecasting agency in the near term.
Live on HF: https://t.co/sOWwD0fyVZ
Never bench press to Ed Sheeran.
I was surprised to learn you’ll lift at his tempo.
Here’s another surprise: you can train a time series foundation model with zero public data and still top every public benchmark.
That model is Toto 2.
My team at Datadog AI Research is releasing it today: a family of 5 open-weight models from 4M to 2.5B parameters.
A few things worth flagging:
→ Scaling works. Every size beats the one below it, with no saturation at 2.5B.
→ Trained only on Datadog observability + synthetic data. No public forecasting data in pretraining, yet it leads on general-purpose benchmarks.
→ Top of every benchmark we tested — BOOM (observability), GIFT-Eval (general-purpose), and TIME (a new contamination-resistant zero-shot benchmark).
→ 7× more parameter-efficient than Toto 1 at matching quality, and a lot faster at inference.
Read the blog: https://t.co/qQBCZOZuue
Try it out on HF: https://t.co/VA4qlyUY7Y
@garrytan I've been running my version of this for over a year now. It's amazing how much your system validates what I've been doing.
The system is only half of the value though.
An augmented human is the other half.
I've been privately teaching others how to do this.
Everyone's building LLM knowledge bases this week. I've been at it for 2 years.
Here's what I actually want from mine:
- Reminds me of the things I already learned at the exact moment I need them
- Catches my blind spots in real time
- Connects what I'm learning today to what I figured out 6 months ago
- Takes the next step for me when I know what to do but can't get to my computer yet
- Enables me to just go do the thing without worrying about remembering how.
A tool for being a better, more capable human.
Open source, very much in progress: https://t.co/CKHJGtE8Js
@karpathy I've been building this human context layer for the past 2 years.
I use Obsidian as my interface, dispatching tasks directly from my thinking.
This MCP server acts as the bridge between Obsidian and Agents.
https://t.co/H9MkvYznvM
I built an MCP server that gives Claude semantic search, graph traversal, and temporal filtering across my entire Obsidian vault.
All 4,000 notes, 3 years of thinking.
I didn't just give Claude access to my files. I ran an ETL pipeline that parses every note into blocks, extracts timestamps, pulls out wikilink relationships, embeds everything semantically, then deduplicates and diversity-reranks results so the agent doesn't get stuck in one corner of the vault.
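A rough sketch of the first two pipeline steps (blocks and wikilinks), assuming blocks are separated by blank lines and links use Obsidian's `[[Target]]`, `[[Target|alias]]`, and `[[Target#Heading]]` forms. The function name and output shape are hypothetical, not the actual pipeline.

```python
import re

# Captures the link target, ignoring any "#heading" or "|alias" suffix.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def parse_note(text):
    """Split a note into blank-line-separated blocks and list each block's wikilink targets."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    return [{"text": b, "links": [m.strip() for m in WIKILINK.findall(b)]} for b in blocks]

note = """Vectors are directions. See [[Linear Algebra]].

Attention uses [[Transformers|Q, K, V]] and [[Matrices#Projections]]."""

parsed = parse_note(note)
# parsed[0]["links"] is ["Linear Algebra"]
# parsed[1]["links"] is ["Transformers", "Matrices"]
```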
Structured data was the key.
Now Claude doesn't grep my files.
It can actually reason across years of my data.
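On the diversity-reranking step: one common technique is maximal marginal relevance (MMR), sketched below with toy 2-d embeddings. Whether the actual server uses MMR is an assumption; the vectors, names, and the `lam` trade-off value are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query, candidates, k, lam=0.5):
    """Greedy MMR: pick k items, trading relevance to the query against redundancy."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(name):
            relevance = cosine(query, remaining[name])
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Two near-duplicate notes and one from a different corner of the vault.
candidates = {
    "attention-1": [1.0, 0.0],
    "attention-2": [1.0, 0.05],
    "weather": [0.6, 0.8],
}
picked = mmr(query=[1.0, 0.1], candidates=candidates, k=2)
# picked is ["attention-2", "weather"]: the near-duplicate gets skipped.
```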
https://t.co/Tq5dzcpHBe
Data privacy is dead.
But it doesn't have to be.
It's the most valuable part of the AI stack.
Is anyone building an open source personal memory layer wrapped in an MCP server so I can take all my data with me to any frontier model?
@VibeCoderOfek Fair point - IF the graph enrichment were done in real-time.
But the data is pre-processed offline.
Dispatch is fast: retrieval starts at recursively distilled hubs and makes a few hops.
Do you need graphs created online?
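For context, "start at hubs and make a few hops" over a precomputed graph can be as cheap as a bounded breadth-first walk. A minimal sketch, assuming the offline step has already produced an adjacency map; the note names and function name are hypothetical.

```python
from collections import deque

def expand_from_hubs(graph, hubs, max_hops=2):
    """Breadth-first walk from hub notes, returning each reached note with its hop distance."""
    dist = {h: 0 for h in hubs}
    queue = deque(hubs)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # hop budget spent on this branch
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Adjacency built offline from wikilinks (toy data).
graph = {
    "hub:ai": ["transformers", "time-series"],
    "transformers": ["attention"],
    "attention": ["softmax"],
    "time-series": [],
}
reached = expand_from_hubs(graph, ["hub:ai"], max_hops=2)
# "softmax" is 3 hops away, so it is not reached.
```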