mlfeed.tech

@mlfeedtech

is your source for the latest artificial intelligence, machine learning, and data science content and analysis.

Joined May 2019

224 Following

328 Followers

22.6K Posts

mlfeedtech retweeted

David Koleczek @davidkoleczek

about 1 month ago

Dwarkesh posed some interesting questions https://t.co/35I1jyzlsF, https://t.co/sUC9ljtszQ One of them was why hasn't AI hit the wall that many expected. It comes down to "bootstrapping", industry growth, and that AI Progress is on a technical trajectory. https://t.co/PTFQdFr6nU

mlfeedtech retweeted

David Koleczek @davidkoleczek

about 1 month ago

Customizing harnesses for multiple models, especially across providers, is still hard. People are starting to figure out strategies for it; here are some examples: https://t.co/4BkSmC4rmi https://t.co/T0Frk2yjyB

mlfeedtech retweeted

Sumit @_reachsumit

6 months ago

NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA introduces a family of models (Nano, Super, Ultra) using hybrid Mamba-Transformer MoE architecture with up to 1M token context and state-of-the-art reasoning performance. 📝 https://t.co/IQKsRqhmdc

387

102

38K

mlfeedtech retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

The paper proposes FaithLens, an 8B model that spots when a large language model (LLM) claim is unsupported, and explains why.

Makes it much easier and cheaper to catch and explain hallucinated claims before they reach users.

Across 12 benchmarks, it beats GPT-4.1 and o3 while running far cheaper.

In many apps, a model is given documents but still invents details, and that is a faithfulness hallucination.

Most checkers either call a huge judge model, or output a bare Yes or No with no reasons.

FaithLens takes a document and a claim, then returns both the label and a short explanation that points to the missing or conflicting evidence.

To train it without humans, the authors make synthetic examples using a stronger model, then throw away samples where the label, explanation, or topic variety looks wrong.

After that cold-start training, they run reinforcement learning where an explanation only earns credit if it helps a weaker model reach the correct Yes or No.

The takeaway is a practical, low-cost verifier that flags a bad claim and spells out the evidence gap.

----

Paper Link – arxiv.org/abs/2512.20182

Paper Title: "FaithLens: Detecting and Explaining Faithfulness Hallucination"
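The reinforcement-learning signal described in this summary (an explanation earns credit only if it helps a weaker model reach the correct Yes or No) could be sketched roughly as follows. This is an illustration under assumptions, not the paper's implementation: `explanation_reward`, `toy_weak_model`, the binary 0/1 reward, and the sample document are all hypothetical, and the toy model is a keyword stub standing in for a real small LLM.

```python
from typing import Callable

Label = str  # "Yes" (claim supported) or "No" (claim unsupported)


def explanation_reward(
    document: str,
    claim: str,
    explanation: str,
    gold_label: Label,
    weak_model: Callable[[str, str, str], Label],
) -> float:
    """Return 1.0 only when the explanation leads the weaker model to the gold label."""
    predicted = weak_model(document, claim, explanation)
    return 1.0 if predicted == gold_label else 0.0


def toy_weak_model(document: str, claim: str, explanation: str) -> Label:
    # Hypothetical stand-in for a small LLM: it simply trusts the explanation's wording.
    return "No" if "not in the document" in explanation.lower() else "Yes"


doc = "The bridge opened in 1932."
claim = "The bridge opened in 1932 and cost $5 million."
helpful = "The cost of $5 million is not in the document."
unhelpful = "The claim talks about a bridge."

reward_helpful = explanation_reward(doc, claim, helpful, "No", toy_weak_model)
reward_unhelpful = explanation_reward(doc, claim, unhelpful, "No", toy_weak_model)
print(reward_helpful, reward_unhelpful)
```

Scoring the explanation through a second, weaker reader means the reward depends on whether the explanation is actually informative, not only on whether the verifier's own label is right.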

148

102

11K

mlfeedtech retweeted

DAIR.AI

@dair_ai

6 months ago

The first comprehensive survey on GraphRAG.

There is a lot of interest in GraphRAG, so let's discuss why it matters.

RAG has transformed how LLMs access external knowledge. However, traditional RAG treats documents as isolated chunks. It misses the relationships between entities that often matter most for answering complex questions.

But real-world knowledge is interconnected.

This survey formalizes GraphRAG: retrieval-augmented generation that leverages graph structure instead of flat text.

Graphs capture what text-based RAG cannot. Citation networks encode influence. Knowledge graphs encode relationships. Social networks encode interactions. Semantic similarity alone misses these structural signals.

The framework operates in three stages.

- Graph-Based Indexing: construct or connect to knowledge graphs, whether open sources like Wikidata and ConceptNet or self-constructed from documents.
- Graph-Guided Retrieval: fetch relevant nodes, triplets, paths, or subgraphs based on queries.
- Graph-Enhanced Generation: convert retrieved graph elements into prompts for LLMs.

Benefits of GraphRAG:

- Retrieval granularity matters.
- Nodes provide entity information.
- Triplets capture direct relationships.
- Paths reveal multi-hop reasoning chains.
- Subgraphs offer comprehensive local context.
- Hybrid approaches combine multiple granularities based on query complexity.

The survey covers 200+ papers across downstream tasks, including knowledge base QA, commonsense reasoning, entity linking, fact verification, and dialogue systems. Application domains span e-commerce, biomedicine, academic research, and legal analysis.

As RAG adoption grows, understanding when and how to incorporate graph structure becomes critical. Not every retrieval task needs graphs, but many complex reasoning tasks benefit substantially from explicit relational knowledge.

Paper: https://t.co/uGIwU1SyCd

Learn to build effective AI agents and RAG systems in our academy: https://t.co/zQXQt0PMbG
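The three stages named in this summary (indexing, retrieval, generation) can be illustrated with a toy sketch. It assumes a tiny hand-written triplet list, breadth-first search as the path retriever, and a plain-text prompt template; the function names `build_index`, `retrieve_path`, and `build_prompt` are hypothetical and not taken from the survey.

```python
from collections import defaultdict, deque


def build_index(triplets):
    """Stage 1, graph-based indexing: store (head, relation, tail) triplets as adjacency lists."""
    graph = defaultdict(list)
    for head, rel, tail in triplets:
        graph[head].append((rel, tail))
    return graph


def retrieve_path(graph, start, goal, max_hops=3):
    """Stage 2, graph-guided retrieval: breadth-first search for a multi-hop path."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop budget
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


def build_prompt(question, path):
    """Stage 3, graph-enhanced generation: linearize the retrieved path into an LLM prompt."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in path)
    return f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


triplets = [
    ("Marie Curie", "born in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Nobel Prize in Physics", "awarded by", "Royal Swedish Academy of Sciences"),
]
graph = build_index(triplets)
path = retrieve_path(graph, "Marie Curie", "Royal Swedish Academy of Sciences")
prompt = build_prompt("Which body awarded Marie Curie's physics prize?", path)
print(prompt)
```

The retrieved path is a two-hop chain that a chunk-similarity retriever would have to piece together from separate passages, which is the kind of structural signal the summary says flat-text RAG misses.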

375

519

68K

mlfeedtech retweeted

Chi Wang

@Chi_Wang_

over 2 years ago

Imagine if ✨multiple✨ ChatGPT agents could collaborate to solve complex tasks for you! 🧑‍🦱🤝🤖🤖🤖 📢 AutoGen: A new framework for building multi-agent LLM applications https://t.co/6fcGHxLcIB It allows creating many agents that converse to solve complex tasks! ... 1/4

240

105

121K

mlfeedtech retweeted

Microsoft AI Frontiers

@ms_aifrontiers

over 2 years ago

AutoGen is the highest trending repo on GitHub this week! 🚀🚀🚀 https://t.co/7Eui0xcDNm #AutoGen #TechNews #LLM #Microsoft

mlfeedtech retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 3 years ago

AgentBench: Evaluating LLMs as Agents Presents a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM as Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. repo: https://t.co/6DIaftPaNw abs: https://t.co/H1B7wc6hRp

591

138

312

115K

mlfeedtech retweeted

Niels Rogge @NielsRogge

almost 3 years ago

After text-only leaderboards, the next step is going to be multimodal leaderboards. Found an interesting paper that tries to benchmark all multimodal LLMs: https://t.co/NbRjgxDlq2 Looks like instructBLIP is current SOTA. Available in @huggingface here: https://t.co/OnzQH0Gckd

345

156

57K

mlfeedtech retweeted

elvis

@omarsar0

almost 3 years ago

Enabling LLMs with tool-use capabilities is where I am noticing the greatest potential for companies to go big with LLMs.

Gorilla is a good popular example but I have seen a ton of other examples, especially from people building with AI-powered agents. I also think this is one of the use cases where open LLMs like Llama 2 are going to be extremely useful -- every company will want to tune their models for their own internal APIs.

If you are curious about this space, check out this new paper that enables LLMs to interact with 16000 real-world APIs. It's more of a framework with all the niceties like data preparation, training, and evaluation (GitHub repo included).

The authors also claim that one of their models, ToolLLaMA, has reached the performance of ChatGPT (turbo-16k) in tool use.

Another side note: not sure if it's possible that LLMs can do this natively, although the Llama 2 paper does mention a related emergent behavior.

I have been tracking all the research and tools that aim to enable these types of capabilities. Combining tools and LLMs is nothing new and we are seeing this across products and even in domains like Robotics and Chemistry. There are significant breakthroughs to be made here but we are not quite there yet.

(paper and tool in the replies)

504

137

453

136K

mlfeedtech retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 3 years ago

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs ToolLLaMA exhibits comparable performance to ChatGPT repo: https://t.co/J5V19NtpUw abs: https://t.co/DzVjlJomiZ

607

150

292

89K

mlfeedtech retweeted

DAIR.AI

@dair_ai

almost 3 years ago

Top ML Papers of the Week (July 24 - July 30): - RT-2 - LoraHub - Med-PaLM Multimodal - Survey of Aligned LLMs - Foundation Models in Vision - Universal Adversarial LLM Attacks ...

414

225

114K

mlfeedtech retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 3 years ago

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback Presents PanGu-Coder2, which achieves 62.20% pass@1 on the HumanEval benchmark. https://t.co/3QjBH9xlkE

166

36K

mlfeedtech retweeted

Moritz Borrett-Laurer @MoritzLaurer

almost 3 years ago

Microsoft and Tsinghua U. claim to have found the "Successor to Transformer for Large Language Models": RetNet. They claim better language modelling performance, with 3.4x lower memory consumption, 8.4x higher throughput, 15.6x lower latency. 1/2

983

161

560

222K

mlfeedtech retweeted

Yam Peleg

@Yampeleg

almost 3 years ago

The largest dialog dataset collection just dropped!

DialogStudio from Salesforce

TL;DR: Merged data from 87 datasets. Evaluated & filtered each sample by multiple criteria [1]. Ended up with a HUGE high quality conversational dataset.

---
Huggingface Dataset: https://t.co/4kDDMsJGI6
Github: https://t.co/DQOdDvPZKl
Paper: https://t.co/6ZxT0KToAW
---

The conversations in the dataset are categorized into multiple categories:

- Knowledge-Grounded-Dialogues
- Natural-Language-Understanding
- Open-Domain-Dialogues
- Task-Oriented-Dialogues
- Dialogue-Summarization
- Conversational-Recommendation-Dialogs

Really cool and useful work.
I just wish I had enough compute to train on all of these datasets

---

[1] Understanding, Relevance, Correctness, Coherence, Completeness, and Overall Quality.

543

135

402

110K

mlfeedtech retweeted

Sanyam Bhutani

@bhutanisanyam1

almost 3 years ago

Easily the best paper on current State of LLMs! 🙏

A 50 page read but it’s not “just another” survey paper, that only documents facts. The authors actually add very useful commentary capturing all aspects of building Large Language Models.

Hence the result is a collection of ideas we might have missed across months of research.

It covers both building LLMs and effectively applying them to domains, with a focus on current limitations and “sharp edges”

As always, I think great content makes you discover missing bits in your knowledge, for this reason it’s a solid cover to cover read recommendation:

https://t.co/DjCSS8o2Tp

305

314K

mlfeedtech retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 3 years ago

Meta-Transformer: A Unified Framework for Multimodal Learning The first framework to perform unified learning across 12 modalities with unpaired data https://t.co/PYcYbnT6O0

500

127

230

77K

mlfeedtech retweeted

Arvind Narayanan

@random_walker

almost 3 years ago

We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning. w/ @sayashk https://t.co/ZieaBZLRFy

992

195

390

581K

mlfeedtech retweeted

Leandro von Werra

@lvwerra

almost 3 years ago

Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4bit and PEFT! Learn more: https://t.co/njGyWdnbzT Full script: https://t.co/6y9FNdYSuT

268

192K

mlfeedtech retweeted

Andrej Karpathy

@karpathy

almost 3 years ago

Good / slightly obscure tip is that applications can benefit from custom supervised finetuning of embeddings returned by APIs. Collect a few examples of +ve (and optionally hard -ve) pairs, use them to train a linear projection that better discriminates your pairs.
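One way to read this tip is sketched below in dependency-free Python. It assumes the API embeddings are already available as plain lists of floats, scores a pair by the dot product of the two projected vectors, and trains the projection with a logistic loss by full-batch gradient descent starting from the identity matrix. The names `train_projection`, `score`, and `accuracy`, the toy 2-D "embeddings", and the hyperparameters are illustrative assumptions, not part of the original tip.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def score(W, a, b):
    """Similarity of a pair after projecting both embeddings with W."""
    pa, pb = matvec(W, a), matvec(W, b)
    return sum(u * v for u, v in zip(pa, pb))


def accuracy(W, pairs):
    """Fraction of pairs whose projected score falls on the correct side of 0."""
    hits = sum((score(W, a, b) > 0) == (y == 1) for a, b, y in pairs)
    return hits / len(pairs)


def train_projection(pairs, dim, lr=0.1, steps=200):
    # Start from the identity so the projection initially leaves embeddings unchanged.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for a, b, y in pairs:
            pa, pb = matvec(W, a), matvec(W, b)
            s = sum(u * v for u, v in zip(pa, pb))
            err = 1.0 / (1.0 + math.exp(-s)) - y  # d(logistic loss)/d(score)
            for i in range(dim):
                for j in range(dim):
                    # d(score)/dW[i][j] = pb[i] * a[j] + pa[i] * b[j]
                    grad[i][j] += err * (pb[i] * a[j] + pa[i] * b[j])
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j] / len(pairs)
    return W


# Toy data: dimension 0 carries the signal, dimension 1 is distracting noise.
# Label 1 marks a positive pair, label 0 a hard negative.
pairs = [
    ([1, 2], [1, -2], 1), ([-1, 2], [-1, -2], 1),
    ([1, -1], [1, 1], 1), ([-1, 1], [-1, -1], 1),
    ([1, 2], [-1, 2], 0), ([-1, -2], [1, -2], 0),
    ([1, 1], [-1, 1], 0), ([-1, -1], [1, -1], 0),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
before = accuracy(identity, pairs)
W = train_projection(pairs, dim=2)
after = accuracy(W, pairs)
print(before, after)
```

On this toy set the raw dot product is dominated by the noise dimension, and the learned projection shrinks that dimension so positive and negative pairs separate. With real API embeddings the same idea would normally be implemented in a numerical library and evaluated on held-out pairs.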

831

507

265K
