@SemiAnalysis_ are too good. This speaks to me on a spiritual level. I'm so exhausted by the overly simplistic GPU/h metrics. For anyone doing training or inference at scale, you NEED to read this.
How much do GPU clusters really cost?
https://t.co/4B7QYJzPut
We’re excited to launch #MagicLayers globally at Canva 🚀🚀🚀
This is currently the most advanced image-to-layer decomposition model in the world-purpose-built for design-native content.
⚡ 20–200× faster than Qwen-Image-Layer
🎨 Super strong on designs
https://t.co/vQ7cCPnGGZ
Feels like everyone making their own agent stumbles across the same primitives and thinks they solved something
Let me save you some time (read this, it's funny and useful):
- You're going to make an agent
- You're going to run it on benchmarks
> It's going to suck
- You're going to make a tool to analyze traces
- You're going to say this helped you
> It wont work
- You're going to think about role based agents for solving a single task
- You're going to make a workflow for solving a benchmark
> Both will work. Neither are generalized
- You'll think you made it
> It will be nearly unusable by an end user
> Back to square one
- You're going to realize you're stuck with a for loop
- You're going to think about swarms
> In swarms single agent usability doesn't matter
- But wait you need a task manager
- But wait you need a merge queue
- But wait you need compression for long jobs
> Compression is a foot gun
- But wait now you need an agent to manage it all
- But wait now you need something that checks to make sure the manager is managing
- You're going to go back to single loop agents
- Well, subagents seem like the way to do all of this
> Bam! Plot twist: subagents are hard to do well
- You're going to think "Hmm well subagents isolate context" because <Insert_Person> said so
- You're going to start to look at other agent implementations
> How have they all solved compaction, multi-agent, task management, memory etc.?
- You're going to realize it's all just tradeoffs, but most of them have only one side people care about
- "Oh it's all just context engineering"
> Yep. But it has to be good and it has to be general.
> Back to the starting loop. Rinse and repeat.
Congrats.
Keep it simple. Keep it general.
I remain unconvinced that any companies need “realtime data”. The only place it makes sense is your observability stack, and you’re probably using datadog, honeycomb, or otel for that already. Everything else: use an OLTP database in product and a data warehouse for offline.
We used Kafka + Flink + ScyllaDB for a project that emitted 1 record every 5 minutes instead of Spring + Postgres
Why? so we could flex our "realtime data stack" and get promoted faster
It worked.
Forget MCP, what I want is an industry consortium backed standard for a JSON API protocol for talking to LLM providers
Everyone building half-baked copies of the OpenAI Chat Completion API worked until it didn't any more!
everyone's constantly posting the meme about having a bunch of different agent rule files while the real nightmare continues to be totally ignored:
- openai just dropped responses api that breaks every single existing agent architecture
- anthropic format was the universal translator (superset of openai completions), now it's obsolete
- every provider has different message shapes, tool calling patterns, reasoning hydration
- cline has anthropic baked into disk storage, 30+ providers, core interfaces
- migration would total architectural hell
who gives a fuck about .cursorrules vs agents md when your reasoning traces disappear between api calls and your entire codebase assumes one message format that's no longer the superset?
can we please standardize on a future proof llm API standard?
Superhuman is being acquired by @Grammarly! 💜💚
Together, we will build the AI-native productivity suite of choice 🥇
We will invest even more deeply in AI and email, reimagine chat and collaboration, and build AI agents that unlock a whole new way of working.
More below 👇
@mitsuhiko I've also encountered groups who rampage around a company trying to replace all of the the custom, wonky shaped homegrown wheels with "don't reinvent the wheel" external products and libraries... without taking the time to understand why the wonky wheels are that particular shape
New Evals API
I’m excited to share a new API for logging evals with W&B Weave.
EvaluationLogger
- log_prediction
- log_score
- log_summary
Our design goal for this API was to get out of your way and build the most flexible eval API out there, inspired by wandb.log, which our @weights_biases users love.
- No hidden logic, you control the eval loop and what you log
- Easy to integrate into existing evals with any model or framework
- Log and version everything so you don’t accidentally compare incomparable things
- Easy to query
- Works with our existing comparison UIs
We'd love you to try it out and share your thoughts on it. link below
@OfficialLoganK Congrats on the release! Gemini is crushing it across the board recently.
Do you know if there are any plans to improve the caching semantics? It's more awkward to use than your competitors.
Agents are just selecting a DAG of tool calls aren't they? So we're doing query planning, optimization and execution again aren't we? It's SQL isnt it? The answer is SQL. It's always SQL. AQL: Agent Query Language.
At @Tailscale we just raised a $160M Series C, marking our official transition to a growth stage company. Hold on to your socks. https://t.co/qjhqKKk7ns