@AKLKO1977 "Advaita Vedanta approach to agentic AI"
No idea what it means. No white paper. No framework, just non-serious stuff.
Current agents are unreliable.
@AKLKO1977
The Nexbax AI Index proposes new AI evaluation metrics focused on real-world usability, cost, and accessibility for users in India and the Global South, challenging standard global benchmarks.
✍️Rashmi Patil
https://t.co/q1EvuHQDNK
Quantum startup @ParityQC demonstrates 52‑qubit quantum Fourier transform on IBM Heron processor, the largest to date: https://t.co/KUzbTxhp2b
New “Parity Twine” method achieves record-setting performance by rethinking how quantum information is represented and propagated.
A blog on how to get frontier-level results from small language models by letting Mellea and Granite Libraries do the heavy lifting: https://t.co/rfjBuJjfc8
@RishiBommasani @percyliang We used the open concept kitchen analogy in this video a few years ago: https://t.co/FfAsaR5FTt (start around 6:15) and have gotten positive feedback from viewers.
📊 We also introduce VELI5, a new dataset with controlled factual errors + ground-truth fixes. This dataset has already been used to fine-tune state-of-the-art factuality guardrails such as Granite Guardian [https://t.co/t30KnYRs7P].
(3/4)
IBM just dropped Granite 4.1, their largest model release to date
Language, vision, speech, embeddings, and safety models all in one drop
The 8B instruct model reportedly matches their previous 32B MoE on instruction following and tool calling
Guardian 4.1 does risk and policy scoring with calibrated confidence levels instead of binary yes/no filtering, which is a smarter approach for enterprise deployment
All Apache 2.0, available on HuggingFace, Ollama, and watsonx
IBM is quietly building a full enterprise AI stack
https://t.co/KTarte7hkG
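To make the "calibrated confidence instead of binary yes/no" point concrete, here is a minimal sketch of how a deployment might route on a risk score rather than a hard flag. It does not call Granite Guardian: the `route` function, the `Decision` type, and both thresholds are hypothetical, and the score is assumed to be a probability-like value in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str   # "allow", "review", or "block"
    score: float  # the risk score that produced this action


def route(risk_score: float, review_at: float = 0.3, block_at: float = 0.8) -> Decision:
    """Map a risk score to a tiered action.

    The thresholds are illustrative assumptions, not values published by IBM.
    A binary filter would collapse "review" into one of the other two tiers.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= block_at:
        return Decision("block", risk_score)
    if risk_score >= review_at:
        return Decision("review", risk_score)
    return Decision("allow", risk_score)


if __name__ == "__main__":
    for s in (0.05, 0.45, 0.92):
        print(s, route(s).action)
```

The middle tier is the part a yes/no filter cannot express: borderline cases can go to a human or a stricter policy instead of being silently passed or dropped.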
IBM is clearly doubling down on a very specific lane here: practical, efficient, enterprise-ready models rather than chasing leaderboard dominance.
Granite 4.1 feels like a continuation of that philosophy—especially the 8B. That 4M token usage vs 78M on Qwen is kind of wild. In real deployments, that translates directly into:
lower latency
dramatically lower cost
easier scaling for agent workflows
Which honestly matters more than raw benchmark scores for most companies.
The tradeoff is obvious though: you’re giving up peak intelligence. A 12 vs 15 score doesn’t sound huge, but in practice that gap can show up in:
reasoning depth
edge-case handling
coding reliability
So these aren’t “frontier competitors”—they’re workhorse models.
What’s arguably more important is the Apache 2.0 + openness push. That 61 Openness Index score puts IBM ahead of most “open-ish” players like Alibaba (Qwen) and Google (Gemma). For enterprises, that’s a big deal:
fewer licensing headaches
more control over deployment (on-prem / air-gapped)
easier compliance story
The positioning is pretty clear:
Granite 3B → edge / lightweight agents
Granite 8B → sweet spot (cost vs capability)
Granite 30B → heavier enterprise workloads where you still want efficiency
The most interesting signal here isn’t the scores—it’s the token efficiency trend. If models like this keep improving, the industry might shift from “bigger is better” to:
“good enough intelligence, but 10–20x cheaper to run”
And that’s where adoption really explodes.
Curious part: if someone pairs Granite 8B with strong retrieval + tools, it could close a lot of that intelligence gap without losing its cost advantage. That’s probably the real play.
IBM has released three new non-reasoning Granite 4.1 models (30B, 8B, 3B) as open weights under Apache 2.0. All three are notably token-efficient relative to peer non-reasoning models, with the 8B standing out for its token efficiency relative to intelligence.
@IBM has released three new instruct models in the Granite 4.1 family: Granite 4.1 30B (15 on the Intelligence Index), Granite 4.1 8B (12), and Granite 4.1 3B (9). The release continues IBM's focus on small, efficient, and open models for enterprise and edge deployment, alongside the existing Granite 4.0 Nano family (1B and 350M variants released in October 2025). The Intelligence Index is the Artificial Analysis synthesis metric incorporating 10 evaluations covering agentic tasks, coding, and scientific reasoning.
Key benchmarking results:
➤ All three Granite 4.1 models score 61 on the Artificial Analysis Openness Index, standing out among peer open weights non-reasoning models. This is driven by full open weights under Apache 2.0 plus partial disclosures across pre-training data, post-training data, and training methodology. Granite 4.1 sits well above peers like Qwen3.5 (39), Gemma 4 (39) and GLM-4.7-Flash (44), and represents a meaningful improvement over the Granite 4.0 family (56), driven by stronger methodology disclosure. Olmo 3.1 and K2 Think V2 (both 89) remain leaders as the most ‘open’ models.
➤ Granite 4.1 8B uses just 4M output tokens to run the Intelligence Index. This is ~20x fewer than Qwen3.5 9B (78M tokens), ~3x fewer than Ministral 3 8B (13M), and ~2x fewer than Gemma 4 E4B (8M). The pattern holds across the family: Granite 4.1 30B uses 4.6M output tokens (vs 7M for Gemma 4 31B and 25M for Qwen3.5 27B), and Granite 4.1 3B uses 2.7M.
➤ Token efficiency comes at the cost of intelligence relative to peer non-reasoning models. Granite 4.1 30B (15) trails leading peers like Qwen3.5 27B (37) and Gemma 4 31B (32). Granite 4.1 8B (12) trails Ministral 3 8B (15) and Gemma 4 E4B (15). Granite 4.1 3B (9) trails Gemma 4 E2B (12).
➤ Granite 4.1 30B and 3B both gain on the Intelligence Index over their Granite 4.0 predecessors. Granite 4.1 30B (15) gains 4 points over Granite 4.0 H Small (32B / 9B active, 11), with the largest gains in tool use (τ²-Bench: 42% vs 17%) and agentic tasks (GDPval-AA: 493 vs 344 Elo). Granite 4.1 3B (9) gains 1 point over Granite 4.0 Micro (8).
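The token ratios quoted for the 8B class can be checked with a few lines of arithmetic. This sketch uses only the figures in the bullets above. The "points per million output tokens" number is an illustrative ratio introduced here, not an Artificial Analysis metric, and Qwen3.5 9B is left out of it because its index score is not quoted.

```python
# Output tokens (in millions) used to run the Intelligence Index,
# as quoted in the bullets above.
OUTPUT_TOKENS_M = {
    "Granite 4.1 8B": 4.0,
    "Qwen3.5 9B": 78.0,
    "Ministral 3 8B": 13.0,
    "Gemma 4 E4B": 8.0,
}

# Intelligence Index scores, where the post gives one.
INDEX_SCORE = {
    "Granite 4.1 8B": 12,
    "Ministral 3 8B": 15,
    "Gemma 4 E4B": 15,
}

BASELINE = "Granite 4.1 8B"


def token_ratios(tokens, baseline):
    """How many times more output tokens each peer uses than the baseline."""
    base = tokens[baseline]
    return {name: used / base for name, used in tokens.items() if name != baseline}


def points_per_million(scores, tokens):
    """Index points per million output tokens (hypothetical illustrative ratio)."""
    return {name: scores[name] / tokens[name] for name in scores}


ratios = token_ratios(OUTPUT_TOKENS_M, BASELINE)
efficiency = points_per_million(INDEX_SCORE, OUTPUT_TOKENS_M)

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name}: {r:.2f}x the output tokens of {BASELINE}")
    for name, e in efficiency.items():
        print(f"{name}: {e:.2f} index points per 1M output tokens")
```

The ratios come out at 19.5x, 3.25x, and 2x, which matches the rounded "~20x / ~3x / ~2x" claims.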
Other information:
➤ License: Apache 2.0 (open weights, permissive commercial use)
➤ Context window: 128K tokens
➤ Availability: Granite 4.1 8B is available via @WandB ($0.05/$0.1 per 1M input/output tokens) and @replicate. Weights for all three models are available via @huggingface.
What if your language model could reason efficiently in an entirely new language?
We introduce Abstract Chain-of-Thought, a new mechanism, learned through reinforcement learning, that lets language models reason over a short sequence of reserved "abstract" tokens. It is as performant as verbalized CoT at a fraction of the cost, achieving major gains in inference-time efficiency.
I've been working on an open source project called Mellea, and wrote a blog post about using it to automatically validate and fix Qiskit code generated by an LLM: https://t.co/xmibsKpaFm
Nice to see this benchmark dataset on LLM-supported rare disease diagnosis and confirmation.
paper: https://t.co/7NX4iBvaWf
github: https://t.co/iSqvJvDite
#healourskin #raredisease
I disagree with the statement "we do not expect human beings to hold within themselves multiple different sets of moral beliefs and values" that appears in a paper about LLM moral reasoning that was published yesterday.
https://t.co/W85aTjWDsb
We have extended our ICLR workshop deadline to Feb 5th! #AFAA2026 Submit your work on fairness across alignment & agentic AI systems. We also continue to accept broad work on fairness. CfP: https://t.co/eCObuMJsVF
4th graders welcomed RB parent Kush R. Varshney, an IBM Fellow who volunteered his time to explain how AI works—its benefits and pitfalls—with a tailored presentation featuring our school song and a Charlotte’s Web excerpt. Grateful for his generosity & expertise! #WeAreChappaqua
Grateful to have co-hosted the Trusted AI Symposium yesterday. Left with so many new ideas from the posters, panels, and lectures. 🧠 Big thanks to our keynote speakers, panelists, and staff for driving the conversation on trust in AI.🤝 #TrustedAISymposium2026