After a few months of intense building I can finally share #aecbench publicly!
It focuses on a simple question that has a complex answer: how capable is AI in real engineering work?
"You cannot govern a technology you have only been briefed on."
Singapore Minister for Foreign Affairs, Dr. @VivianBala, echoing @karpathy and @yacineMTB on why he runs NanoClaw: "you can outsource memory and computation, but you cannot outsource your understanding"
https://t.co/z4Aidf89ha
He also shared his tech stack for running his second brain for Singapore's Foreign Affairs Ministry and parliamentary affairs:
- @AnthropicAI Claude Agent SDK
- Baileys + WhatsApp
- Mnemon (Graph Memory)
- @ollama + @nomic_ai
- @ggerganov Whisper.cpp + OneCLI
With special notes on how he handles security and isolation, and what implications he sees for Singapore Inc.
Today, we are launching our collaboration with @nomic_ai to help AI agents understand complex PDF documents more effectively and efficiently.
Nomic's new nomic-layout-v1 model allows your AI agents to parse documents locally, so sensitive documents never leave your machine.
Everything is Apache 2.0:
📄 Paper: https://t.co/oiwSmtdqJq
💻 Benchmark + harness + eval: https://t.co/H5mMrTKtw3
@huggingface: https://t.co/WcTcsIhH1z
We want people to build on this — new agent harnesses, new tool integrations, new task families. PRs welcome.
If SWE-Bench drove progress in coding agents, we think AEC-Bench can do the same for the built world.
Today, we're launching AEC-Bench — the first open, multimodal agent benchmark for construction.
196 tasks across real construction documents. Full agent harness. Automated evaluation. Apache 2.0.
We benchmarked Claude Code, Codex, and our own agent. Here's what we found 🧵
Nomic is hosting 200 AI leaders, designers, and engineers from the world's leading firms to discuss how AI is accelerating design and build.
Join us in the heart of Manhattan on November 12th.
AI systems excel in domains that have abundant coverage in internet data.
Large sectors of the economy are not digital-native. Their data, processes, and workflows are governed by signals that are out of distribution for foundation models.
Introducing the new Nomic Platform
Our platform and developer infrastructure are already in production at the world's top built-world organizations, delivering value across the entire stack — from designers and engineering teams to the owners managing multi-billion-dollar assets.