📌 Latest result on Kaggle
🏝 Great Barrier Reef 🐠 https://t.co/4pA4xxrfPu
I learned a lot from my first opportunity to use an anchor-free object detection model, YOLO-X.
And as usual, I missed the gold medal 🥈
Solution ➡ https://t.co/fjt90eDU2p
The creators of SWE-Bench just dropped a really simple new benchmark every LLM gets 0% on.
ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet?
We are far from saturated on model quality.
A new leaderboard has arrived: Image to WebDev.
It ranks models on their ability to generate websites from screenshots and images.
Who’s in the top 10:
- #1-3 @Anthropic takes the lead with Claude 4.6 (Sonnet and Opus)
- #4-6 @GoogleDeepMind is right behind with Gemini 3.1 and 3
- #7 @OpenAI with GPT-5.1-High
- #8 Google back with Gemini 3 Flash (thinking-minimal)
- #9 @Kimi_Moonshot represents open models with Kimi K2.5 Instant
- #10 OpenAI closing out the top 10 with GPT-5.1
This is a dedicated leaderboard that shows which models are best at agentically coding live sites from visual inputs. All Arena leaderboards are ranked by a community of real people bringing in real-world tasks across a variety of expert fields.
ENERGY PRICES TODAY
US crude oil prices are up 8%, trading at $104
Brent crude oil prices are up 8%, trading at $103
European gas is up 9.2%, trading at $47.7
Heating oil is up 8.7%, trading at $4.08
Gasoline is up 4.4%, trading at $3.17
An accidental leak exposed over 500,000 lines of Anthropic’s Claude Code, revealing its agentic structure: modular tools, subagent swarms, and layered memory management.
The source code offers rare insight into how advanced agents operate today, and hints at potential future features like autonomous background agents and voice interfaces.
Learn more in The Batch https://t.co/hVh0iPAaUP
📚 Doing Meta-Analysis in R
https://t.co/5MHiDzrZMt
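Meta-analysis is a method that combines the results of multiple published studies on a topic to estimate the overall effect across those studies.
The book briefly explains the principles and carefully walks through how to run the analysis with R's meta 📦.
I finally finished reading it, so here is an overview diagram. A good book.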
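To make the idea of combining study results concrete, here is a minimal Python sketch of fixed-effect, inverse-variance pooling. It is not the R meta package's API and not taken from the book. The function name `fixed_effect_pool` and the three study values are made up for illustration, and the 95% interval assumes a normal approximation.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study effects with inverse-variance weights (fixed-effect model)."""
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need equally sized, non-empty inputs")
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    # 95% confidence interval under a normal approximation.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical effect sizes and standard errors, for illustration only.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A random-effects model, which also accounts for between-study variation, would add a heterogeneity term to each study's variance before weighting.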
Object detection shouldn’t break at deployment.
YOLO26 is anchor-free, NMS-free, and built for fast CPU inference.
We fine-tuned YOLO26m on a custom dataset with full experiment tracking and a clean, reproducible pipeline—start to finish in one Studio.
Train → evaluate → export (ONNX, TensorRT, CoreML)
Try it → https://t.co/Fnva1rIgVF
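For context on "NMS-free": many detectors run non-maximum suppression (NMS) as a post-processing step to remove duplicate boxes, and an NMS-free model is designed so this step is not needed. The following is a minimal pure-Python sketch of classic greedy NMS, not YOLO26 or Ultralytics code. The helper names, the 0.5 threshold, and the sample boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]

kept = greedy_nms(boxes, scores)
print(kept)
```

The loop compares each candidate against every kept box, which is the extra per-image work and the extra tunable threshold that an NMS-free design aims to avoid at deployment.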
Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation.
Link: https://t.co/iF4DsMcnhj
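As a rough illustration of how those four building blocks can fit together, here is a toy Python sketch. It is not taken from the linked write-up or from any real agent framework. `ToyAgent`, its methods, and the two tools are hypothetical names, and there is no model call: the "agent" only dispatches tools, records what happened, and hands one task to a fresh sub-agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyAgent:
    # Repo context: a path -> file contents mapping standing in for a checkout.
    repo_context: Dict[str, str]
    # Tool use: named callables the agent may invoke.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # Memory: a running log of what the agent has done.
    memory: List[str] = field(default_factory=list)

    def use_tool(self, name: str, arg: str) -> str:
        """Run a registered tool and record the call in memory."""
        result = self.tools[name](arg)
        self.memory.append(f"{name}({arg!r}) -> {result!r}")
        return result

    def delegate(self, name: str, arg: str) -> str:
        """Delegation: run a task in a sub-agent that has its own memory."""
        sub = ToyAgent(self.repo_context, self.tools)
        result = sub.use_tool(name, arg)
        # The parent keeps only a short summary, not the sub-agent's full log.
        self.memory.append(f"subagent finished {name}")
        return result


# Hypothetical two-file repository and two simple tools.
repo = {"README.md": "# demo", "main.py": "print('hi')"}
agent = ToyAgent(repo_context=repo)
agent.tools["read_file"] = lambda path: repo.get(path, "<missing>")
agent.tools["count_lines"] = lambda path: str(len(repo.get(path, "").splitlines()))

first = agent.use_tool("read_file", "README.md")
lines = agent.delegate("count_lines", "main.py")
print(first, lines, agent.memory)
```

Giving the sub-agent its own memory while the parent stores only a summary is one simple way to keep the parent's context small. Real systems make this trade-off in many different ways.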