Introducing TabFM, a foundation model built for classification and regression on tabular data. It generates high-quality predictions on previously unseen tables in a single forward pass.
Learn more and try out the model → https://t.co/OTbVQ8oUQs
Cool new paper from NVIDIA.
Looks like agentic coding is moving into hardware design.
HORIZON treats hardware design as repository-level code evolution. A Markdown harness becomes a project pack with domain knowledge, an executable evaluator, an acceptance predicate, and a git policy.
The agent then evolves an isolated worktree.
That is a strong pattern because hardware design needs executable checks. The verifier harness becomes the real interface between the agent and the design task.
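To make the pattern concrete, here is a minimal Python sketch of a project pack and an accept-or-retry loop. It is not the paper's implementation: `ProjectPack`, `evolve`, the field names, and the toy "design" string are all hypothetical, chosen only to mirror the four parts listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Metrics = Dict[str, float]


@dataclass
class ProjectPack:
    """Hypothetical container for the four parts the post lists."""
    domain_knowledge: str                  # notes handed to the agent
    evaluator: Callable[[str], Metrics]    # executable check on a candidate
    accept: Callable[[Metrics], bool]      # acceptance predicate over metrics
    git_policy: str                        # e.g. "commit only accepted candidates"


def evolve(pack: ProjectPack,
           propose: Callable[[str, Metrics], str],
           initial: str,
           max_steps: int = 10) -> Optional[str]:
    """Evaluate, then revise, until the predicate accepts or steps run out."""
    candidate = initial
    for _ in range(max_steps):
        metrics = pack.evaluator(candidate)
        if pack.accept(metrics):
            return candidate  # a real agent would commit here per git_policy
        candidate = propose(candidate, metrics)
    return None


# Toy stand-ins (assumptions): the "design" is a string and the
# evaluator simply counts how many times "stage" appears in it.
pack = ProjectPack(
    domain_knowledge="Target: at least 3 pipeline stages.",
    evaluator=lambda design: {"stages": float(design.count("stage"))},
    accept=lambda m: m["stages"] >= 3,
    git_policy="commit only accepted candidates",
)
result = evolve(pack, lambda design, m: design + " stage", initial="stage")
```

The point of the sketch is that the agent never decides on its own when it is done: the evaluator and the predicate do.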
The paper reports 100% benchmark completion across several hardware design suites, which makes this one worth tracking even if you do not work on EDA.
Paper: https://t.co/zoUSIPhYGt
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
Transformers are easier to learn when you can poke the model directly.
Transformer Explainer is an interactive visualization tool for learning how Transformer-based text-generation models like GPT work.
It helps you connect the architecture to real behavior by running a live GPT-2 model in the browser, letting you enter your own text, and showing how internal components work together to predict the next tokens.
Key features:
• Live GPT-2 in the browser – experiment without setting up a separate model server first
• Custom text input – try your own prompts and watch how the model handles them
• Internal component views – observe the operations that work together inside the Transformer
• Next-token prediction focus – connect each visual step to the model’s token predictions
• Local development path – clone the repo, install dependencies, and run it with npm for deeper inspection
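The final step the tool visualizes, turning model scores into a next-token choice, can be sketched in a few lines of Python. This is a generic softmax illustration, not code from the project; the candidate tokens and their logits are made up, and `temperature` is a common sampling knob included here as an assumption.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Made-up logits for three candidate tokens (illustrative only).
logits = {" mat": 2.0, " sofa": 1.0, " moon": 0.0}
probs = softmax(logits)
next_token = max(probs, key=probs.get)  # greedy pick: highest probability
```

Raising the temperature flattens the distribution, so lower-ranked tokens get picked more often when sampling.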
It’s open-source (MIT license).
Link in the reply 👇
If you use LLM-as-a-Judge, this one is worth reading.
(bookmark it)
The approach in this paper is one of the most effective ways to use LLM-as-a-Judge for evals.
Holistic judge scores hide both their reasoning and their ceiling effects.
BINEVAL decomposes each evaluation criterion into atomic yes-or-no questions, answers each independently per output, then aggregates the verdicts into calibrated multi-dimensional scores.
Every question-level verdict is inspectable, so you can diagnose exactly why an output scored low, and the same verdicts feed straight back as targeted prompt-improvement signal.
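A minimal Python sketch of the decompose, answer, and aggregate flow follows. It is an illustration under assumptions, not the paper's code: the criteria, the questions, and `stub_judge` (a placeholder for a real LLM call) are hypothetical, and the aggregation is a plain fraction of "yes" verdicts rather than the calibrated scoring the paper describes.

```python
from typing import Callable, Dict, List

# Hypothetical criteria, each decomposed into atomic yes/no questions.
QUESTIONS: Dict[str, List[str]] = {
    "factual_consistency": [
        "Is every named entity in the summary present in the source?",
        "Is every number in the summary present in the source?",
    ],
    "coherence": [
        "Does each sentence follow from the previous one?",
    ],
}

Judge = Callable[[str, str], bool]  # (question, output) -> yes/no verdict


def evaluate(output: str, judge: Judge) -> Dict[str, dict]:
    """Answer each question independently, then aggregate per criterion."""
    report = {}
    for criterion, questions in QUESTIONS.items():
        verdicts = {q: judge(q, output) for q in questions}
        score = sum(verdicts.values()) / len(verdicts)  # fraction of "yes"
        report[criterion] = {"score": score, "verdicts": verdicts}
    return report


def stub_judge(question: str, output: str) -> bool:
    # Placeholder for an LLM call: says "no" only to the number question.
    return "number" not in question


report = evaluate("The firm earned $5M in 2021.", stub_judge)

# Failed questions are the targeted signal for improving the prompt.
failed = [q for r in report.values()
          for q, ok in r["verdicts"].items() if not ok]
```

Because every verdict is stored alongside its question, a low score can be traced to the exact questions that failed.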
Across SummEval, Topical-Chat, and QAGS, it matches or beats UniEval and G-Eval without any training, with especially strong results on factual consistency.
Paper: https://t.co/oar6BZcasm
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX