This is the actual bottleneck. The models are smart enough already. What is missing is the company-specific context locked in senior people's heads. Whoever cracks knowledge extraction at the company level unlocks the rest.
As you work on this, please consider using GBrain as your OSS retrieval layer
https://t.co/0F5uDQzPHu
Shopify CEO Tobi Lutke explains Goodhart’s law and why he doesn’t like KPIs or OKRs
“Goodhart’s law is real. The moment a metric becomes a goal, it’s no longer a useful metric… No metric by itself is a complete heuristic for a complex business. There’s a million different tensions in a company, and you can’t keep all of them in harmony by optimizing for one thing.”
For this reason, Shopify doesn’t use KPIs or OKRs. But as Tobi explains, this doesn’t mean they don’t value data and metrics.
“We are extremely data informed. We have invested enormous amounts of money and time into systems that give us basically everything at our fingertips… But what Shopify attempts to do is just not over-fit for what’s quantifiable.”
People love optimizing for highly-quantifiable things because there’s immediate gratification that comes from seeing a number go up. But Tobi thinks that the most important aspects of a product are rarely quantifiable:
“The overlap of the most valuable things you can do with a product and the things that happen to be fully quantifiable are like maybe 20%. Which leaves 80% of a value space unaddressable by the people who only look at quantifiable things.”
He continues:
“Shopify is comfortable with unquantifiable things like taste, quality, passion, love, hate… The sort of deep satisfaction that a craftsperson feels when they’ve done a job well is actually a better proxy if you allow it to be.”
They then have robust analytics systems that tell the company if something’s wrong or a new rollout breaks something.
“We think about it as a cockpit for a pilot. The decisions are still made by pilots, and we think this leads to better results… I think there needs to be more acceptance in business of unquantifiable things… And then metrics take a support function.”
Source: @lennysan (Feb 2025)
It's pretty insane how AI has been transforming the way I study math. On top of @karpathy's wiki idea, I can immediately generate visualizations and @leanprover programmable proofs to keep myself engaged and really ponder the concepts.
Echoing this. My 'aha' moment with AI is internalizing open-source libraries to make them faster and safer.
Lately I'm exploring @leanprover with Claude Code, utilizing it both for formal verification and as the most concise spec for agents (maths)
https://t.co/ivTX6KtxVy
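To make "Lean as a concise spec" concrete, here is a minimal Lean 4 sketch. The function `double` and the theorem name are hypothetical, chosen only for illustration: the theorem statement is the spec an agent would have to satisfy, and the proof is what the Lean checker verifies.

```lean
-- Hypothetical implementation an agent might be asked to write.
def double (n : Nat) : Nat := n + n

-- The spec: `double` must agree with multiplication by 2 on every input.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If the implementation changes in a way that breaks the property, the proof stops compiling, which is the appeal of using the statement as the spec.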
I think it must be a very interesting time to be in programming languages and formal methods because LLMs change the whole constraints landscape of software completely. Hints of this can already be seen, e.g. in the rising momentum behind porting C to Rust or the growing interest in upgrading legacy code bases in COBOL, etc. In particular, LLMs are *especially* good at translation compared to de-novo generation because 1) the original code base acts as a kind of highly detailed prompt, and 2) as a reference to write concrete tests with respect to. That said, even Rust is nowhere near optimal for LLMs as a target language. What kind of language is optimal? What concessions (if any) are still carved out for humans? Incredibly interesting new questions and opportunities. It feels likely that we'll end up re-writing large fractions of all software ever written many times over.
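Point 2) above, using the original code as a reference for tests, is essentially differential testing. A minimal Python sketch, assuming two hypothetical functions (`legacy_checksum` standing in for the original code and `ported_checksum` for the translation); neither comes from any real code base:

```python
import random
from functools import reduce


def legacy_checksum(data: bytes) -> int:
    """Hypothetical stand-in for the original implementation."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def ported_checksum(data: bytes) -> int:
    """Hypothetical stand-in for the translated implementation."""
    return reduce(lambda acc, b: (acc * 31 + b) % 65521, data, 0)


def differential_test(reference, candidate, trials=1000, seed=0):
    """Return the first random input where the two disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randrange(0, 64)
        data = bytes(rng.randrange(256) for _ in range(length))
        if reference(data) != candidate(data):
            return data  # counterexample to hand back to the translator
    return None
```

A `None` result only means no disagreement was found on the sampled inputs; it is evidence, not a proof of equivalence.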
Thanks to good people at @AnthropicAI we now have an official MCP for Excalidraw!
Take it for a spin on @claudeai (search for Excalidraw in Connectors, or use in Claude Code and elsewhere).
More to come. ✌
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week. It’s 3M+ lines of code across thousands of files.
I would actually be more impressed if it had 3000 lines of code, and came with a Lean proof that its sandboxing is bug-free :D
I think now that code in general (for non-frontier use cases) is on its way to being too cheap to meter, the next challenge is pushing everything up to the top tier of security.
@martinfowler "Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism."
https://t.co/nmIlm8ik9s
AI didn't just make us architects; it also calls for better engineers.
Peter Rice, the legendary structural engineer behind Centre Pompidou, once said:
The engineer is the objective inventor and the architect the creative input.
@martinfowler wrote "Other forms of engineering have to take into account the variability of the world. A structural engineer builds in tolerance for all the factors she can’t measure... "
Really refreshing to see how sync engine Jazz handles decentralized permission in CRDT.
With cryptographic signatures, the chain of edits is verifiable like a blockchain. Each edit is checked against the whitelist of who can read/write the values.
(another example of crypto more useful outside CT)
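A rough Python sketch of the idea: a hash-chained edit log where every edit carries a tag from its author and is checked against a write allowlist. This is not Jazz's implementation. All names here are hypothetical, and HMAC with per-writer secrets stands in for the public-key signatures a real system would use, only so the sketch runs on the standard library.

```python
import hashlib
import hmac
import json

# ASSUMPTION: shared-secret HMAC tags stand in for public-key signatures.
WRITER_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}
WRITE_ALLOWLIST = {"alice"}  # hypothetical: only alice may write


def _body(author, payload, prev_hash):
    """Canonical bytes that get signed: author, payload, and previous hash."""
    record = {"author": author, "payload": payload, "prev": prev_hash}
    return json.dumps(record, sort_keys=True).encode()


def sign_edit(author, payload, prev_hash):
    """Create an edit that is linked to the previous edit's hash."""
    body = _body(author, payload, prev_hash)
    sig = hmac.new(WRITER_KEYS[author], body, hashlib.sha256).hexdigest()
    edit_hash = hashlib.sha256(body + sig.encode()).hexdigest()
    return {"author": author, "payload": payload, "prev": prev_hash,
            "sig": sig, "hash": edit_hash}


def verify_chain(edits):
    """Check permissions, chain links, signatures, and hashes in order."""
    prev_hash = None
    for edit in edits:
        if edit["author"] not in WRITE_ALLOWLIST:
            return False  # author has no write permission
        if edit["prev"] != prev_hash:
            return False  # chain is broken or reordered
        body = _body(edit["author"], edit["payload"], edit["prev"])
        expected = hmac.new(WRITER_KEYS[edit["author"]], body,
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, edit["sig"]):
            return False  # content was altered after signing
        if hashlib.sha256(body + edit["sig"].encode()).hexdigest() != edit["hash"]:
            return False  # stored hash does not match the content
        prev_hash = edit["hash"]
    return True
```

Because each edit commits to the hash of the one before it, tampering with any earlier edit invalidates everything after it, which is the blockchain-like property mentioned above.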
My talk from @sync_conf 2025 is ready!
Check it out to learn how:
1) CRDTs + cryptographic permissions work
2) @jazz_tools is now a general-purpose database
3) our unique Jazz Cloud infra compares to
- traditional stacks
- other sync engines
- Durable Objects
There is no way @claudeai talks about NEA photocathodes when asked to build a landing page, unless it is leaking responses to other users' private queries
The more I realize AI can do my job, the more I feel I should focus on what humans are good at.
If I'm not getting enough sleep, I'm definitely doing something terribly wrong
Today developers are shaping agents with skills shared on GitHub. Very soon the COO's office will be interviewing, onboarding, assigning, reviewing, and laying off agents, like HR does today
mcps are changing
turns out designing mcps to load every tool definition into the model prompt was a bad idea
anthropic’s nov 4 blog post suggests a new pattern
treat each mcp server like a normal code library, e.g. typescript modules or files, and let the agent write and run small programs that do two things:
discover only what is needed, list a servers directory to see what exists, open just the specific tool files, import only those functions
process data locally, call mcp tools from code, then filter, join, and aggregate inside a sandboxed runner so only the small final bits go back to the model
doing this dramatically cuts tokens
anthropic shows a typical case dropping from 150k tokens down to ~2k (98.7% savings)
below a viz showing before/after
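a rough python sketch of the two steps above (discover only what is needed, process data locally). everything here is hypothetical: the `servers/shop` directory and its two tool files are fakes written to a temp folder, standing in for generated mcp tool wrappers, and no real mcp client is called.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical tool files standing in for generated MCP tool wrappers.
TOOL_SOURCES = {
    "list_orders.py": (
        "def run():\n"
        "    return [{'id': i,\n"
        "             'status': 'open' if i % 50 == 0 else 'closed',\n"
        "             'total': i * 1.5} for i in range(10000)]\n"
    ),
    "send_email.py": (
        "def run(to, body):\n"
        "    return f'sent to {to}'\n"
    ),
}


def write_fake_server(root: Path) -> Path:
    """Lay the fake tools out as files, one tool per file."""
    server_dir = root / "servers" / "shop"
    server_dir.mkdir(parents=True)
    for name, source in TOOL_SOURCES.items():
        (server_dir / name).write_text(source)
    return server_dir


def discover_tools(server_dir: Path) -> list[str]:
    """Step 1: list names only; no tool definitions are loaded yet."""
    return sorted(p.stem for p in server_dir.glob("*.py"))


def load_tool(server_dir: Path, name: str):
    """Import just the one tool file that the task needs."""
    path = server_dir / f"{name}.py"
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.run


def summarize_open_orders(server_dir: Path) -> dict:
    """Step 2: call the tool and aggregate locally; return a tiny summary."""
    list_orders = load_tool(server_dir, "list_orders")
    orders = list_orders()  # 10,000 rows stay inside the sandbox
    open_orders = [o for o in orders if o["status"] == "open"]
    return {
        "open_count": len(open_orders),
        "open_total": sum(o["total"] for o in open_orders),
    }
```

only the names from `discover_tools` and the two-field summary would go back to the model; the 10,000 rows and the unused `send_email` definition never enter the prompt.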
The takeaway from yesterday that nobody wants to hear is that perps are a fundamentally unsound market foundation. Just like options markets outsizing equities markets in tradfi, it's unhealthy to have perps markets outsizing spot markets in crypto