Everything I do is an ongoing process; I see code and writing as ways to test and refine how I think. Been writing code since I was 8.
Generalist / @talliedart
Nashville Zoo always knows what's up.
The Nashville Zoo holds 17 of the 64 clouded leopards in accredited zoos. Clouded leopards are rare and sensitive to sound/light, so I'd assume a 69,220-sq-ft data center isn't ideal.
Why not take a page from research and use a double-blind procedure?
For example: "Reviewer A says X. Reviewer B says Y. Which is better supported? Why?"
If sycophancy is directed at the user, wouldn't this avoid bias by presenting two viewpoints without implying user preference? I'd think AI would do a better job of considering the strongest arguments for both sides, since the user's position isn't identifiable in this case.
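A minimal sketch of that blinding idea, assuming you have the user's view and one competing view as plain strings. The function name and the "Reviewer A/B" wording are illustrative, not an established method. Shuffling the order keeps the user's position from always landing on the same label.

```python
import random

def build_blinded_prompt(user_view: str, other_view: str, rng: random.Random) -> tuple[str, str]:
    """Return (prompt, label_of_user_view) with the two views in random order.

    Hypothetical helper: the prompt never says which view belongs to the user.
    """
    views = [("user", user_view), ("other", other_view)]
    rng.shuffle(views)  # randomize which view becomes Reviewer A
    labels = ["A", "B"]
    lines = [f"Reviewer {label} says: {text}" for label, (_, text) in zip(labels, views)]
    owners = [owner for owner, _ in views]
    user_label = labels[owners.index("user")]  # kept outside the prompt for later scoring
    prompt = "\n".join(lines) + "\nWhich is better supported? Why?"
    return prompt, user_label

prompt, user_label = build_blinded_prompt(
    "The cache should be write-through.",
    "The cache should be write-back.",
    random.Random(0),
)
print(prompt)
```

Only the caller knows `user_label`, so you can check afterwards whether the answer sided with the user more often than chance.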
As promised, here's a recording of my 30-min keynote and the subsequent Q&A for the inaugural late interaction retrieval (LIR) workshop, cc @bclavie @antoine_chaffin.
The talk is admittedly advanced, as it's directed at an expert IR community. But hopefully still broadly useful!
The bitter free lunch is the idea that if you rely *purely* on scale, you will get the bland and bitter defaults that everyone else is getting.
There's irreducible complexity and irreducible modularity to *your* problem specification, at least if your problem is worth your time.
Over the last few months, I've come to increasingly dislike the vibe I see in tpot around AI.
Nuance and substance are more absent than ever, and are replaced by shallow hype, including now by many of your largest favorite accounts in this space.
in case it needs to be a meme:
please shut the fuck up i don't even care about the specific thing you're saying i'm just so tired of hearing predictions one after the other telling me what the future is going to be like just please shut the fuck up
Anyone interested in the intersection of engineering & design should be following @haydenbleasel. He builds great tooling (like Ultracite) and shares his knowledge frequently (definitely check out https://t.co/1OzR5RUegV).
Today is my last day at Vercel.
I've had an absolute blast over the last year.
I got to work on a lot of my own projects that helped form the basis for our AI Cloud, helped other teams ship their great ideas, and worked alongside some of the most talented people I've ever met.
I'll have news on my next adventure for you on Tuesday but for now, wanted to reflect on my time here and the projects I worked on.
Read on if you're interested ↓
recommended reading: Omar & Harrison are very thoughtful on “how to design LLM systems to interact with long context data”
the brute force approach of context stuffing+subagent parallelization is incredibly wasteful & usually simply impossible at large data scales
RLMs are a strategy that biases harness design + optionally trains models to write small programs to interact with sub pieces of context (files/variables)
immediate interesting use case is hill climbing with gradient free/reflective optimization by mining big trace data
for example when you run a model on your eval set you get a lot of data out of it:
- folders containing the full trace logs
- metadata on success, failure, errors, stats
- metadata on latency, cost, num calls, tools used etc
it’s often impossible to read even a single trace into context and reason over it, let alone an entire dataset
RLMs “feel” especially good here because the problem structure maps exactly to the behaviors of writing small programs to read certain parts of files to understand some sub-behaviors
good search + good RLM design to mine trace data feels like a really exciting area for making models better
it also ties directly into another piece of awesome work from Omar…dspy!!
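a rough sketch of the kind of small program meant here, assuming a hypothetical layout: one JSON metadata file per run with `success`, `latency_s`, and `error` keys, plus plain-text trace logs. none of these names come from RLM or dspy; they are placeholders for whatever your eval harness writes out.

```python
import json
from collections import Counter
from pathlib import Path

def summarize_runs(meta_dir: Path) -> dict:
    """Aggregate per-run metadata without loading any full trace log."""
    errors = Counter()
    total = passed = 0
    latency = 0.0
    for path in sorted(meta_dir.glob("*.json")):
        meta = json.loads(path.read_text())  # assumed keys: success, latency_s, error
        total += 1
        passed += bool(meta.get("success"))
        latency += float(meta.get("latency_s", 0.0))
        if meta.get("error"):
            errors[meta["error"]] += 1
    return {
        "runs": total,
        "pass_rate": passed / total if total else 0.0,
        "mean_latency_s": latency / total if total else 0.0,
        "top_errors": errors.most_common(3),
    }

def grep_trace(trace_path: Path, needle: str, context: int = 2, limit: int = 5) -> list[str]:
    """Return small windows of lines around matches instead of the whole trace."""
    lines = trace_path.read_text().splitlines()
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            windows.append("\n".join(lines[lo:hi]))
            if len(windows) >= limit:  # cap what gets pulled into context
                break
    return windows
```

the summary tells you which failures are worth a look, and `grep_trace` pulls only a few lines around each hit, so the full log never has to fit in context.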
Claude codes faster than I do, by a significant factor. Claude can hold more details in its "mind" than I can -- again by a significant factor.
But Claude cannot hold the big picture in its mind. It doesn't really even understand the concept of a big picture. Architecture is likely beyond its capacity.
And although Claude appreciates the value of refactoring, it shows no inclination to acquire that value for itself. It has no sense of self-preservation. It does not look ahead and foresee the disaster it is creating.
Looking for advice on raising pre-seed for Tally, software for private art collectors. We launched our webapp two weeks ago and just onboarded our first user. Any tips or tricks on getting started this early? Any advice on what to avoid?