Mr. Agent @agenticai - Twitter Profile

almost 2 years ago

❓What is an agent?

I get asked this question a lot, so I wrote a little blog on this topic and other things:
- What is an agent?
- What does it mean to be agentic?
- Why is “agentic” a helpful concept?
- Agentic is new

Check it out here: https://t.co/KCwYmzSd5Z

AgenticAI retweeted

Namgyu Ho

@itsnamgyu

almost 2 years ago

Do you know your LLM uses less than 1% of your GPU at inference? Too much time is wasted on KV cache memory access ➡️ We tackle this with the 🎁 Block Transformer: a global-to-local architecture that speeds up decoding up to 20x 🚀

@kaist_ai @LG_AI_Research w/ @GoogleDeepMind 🧵
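As a rough illustration of why KV cache access can dominate decoding, the sketch below estimates the cache size with the standard formula 2 × layers × KV heads × head dimension × sequence length × batch size × bytes per element. The function name and the model configuration are hypothetical and are not taken from the Block Transformer paper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a standard decoder-only transformer."""
    # Factor of 2: each layer stores one tensor for keys and one for values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical config: 32 layers, 32 KV heads of dimension 128, fp16 cache,
# one sequence of 4096 tokens.
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 1024**3:.1f} GiB per sequence")  # prints "2.0 GiB per sequence"
```

Under these assumed numbers, each decoded token has to read on the order of 2 GiB of cached keys and values, which is the kind of memory traffic the tweet refers to.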

AgenticAI retweeted

Eugene Vinitsky 🦋 @EugeneVinitsky

almost 2 years ago

This page of common pytorch mistakes is pretty invaluable https://t.co/GybuWrwDQH

AgenticAI retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 2 years ago

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Enables MLLMs to express intermediate reasoning as images using code. You probably didn't use typography knowledge to solve this query

proj: https://t.co/vhMNM3owrc
abs: https://t.co/Hrd27izArg

AgenticAI retweeted

elvis

@omarsar0

almost 2 years ago

From RAG to Rich Parameters

Investigates more closely how LLMs utilize external knowledge over parametric information for factual queries.

Finds that in a RAG pipeline, LLMs take a “shortcut” and display a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory.

Quote: "Through attention contributions, attention knockouts and causal traces, we specifically observe a reduced reliance on the subject token, and the MLP activations associated with it, when the context is augmented with RAG."

https://t.co/z4nAsDRNzp

AgenticAI retweeted

Rohan Paul

@rohanpaul_ai

almost 2 years ago

Transformer models can learn robust reasoning skills (beyond those of GPT-4-Turbo and Gemini-1.5-Pro) through a stage of training dynamics that continues far beyond the point of overfitting (i.e. with 'Grokking') 🤯

For a challenging reasoning task with a large search space, GPT-4-Turbo and Gemini-1.5-Pro based on non-parametric memory fail badly regardless of prompting styles or retrieval augmentation, while a fully grokked transformer can achieve near-perfect accuracy, showcasing the power of parametric memory for complex reasoning. 🤯

'Grokking' refers to a phenomenon where a transformer model continues to improve its generalization performance on a task through extended training, long after it has already fit the training data perfectly (i.e., achieved near-zero training loss).

👉Paper - "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization"

📌 This paper investigates if transformers can learn to implicitly reason over parametric knowledge, a skill that even SoTA LLMs struggle with. The paper focuses on two types of reasoning - composition and comparison, and finds that transformers can learn implicit reasoning, but only through grokking, i.e. extended training far beyond overfitting. The levels of generalization vary across reasoning types: transformers fail to systematically generalize for composition but succeed for comparison when faced with out-of-distribution examples.

📌 Reveals: 1) The mechanism behind grokking, such as the formation of the generalizing circuit and its relation to the relative efficiency of generalizing vs memorizing circuits. 2) The connection between systematicity and the configuration of the generalizing circuit.

📌 For the composition task, the transformer forms a "sequential" generalizing circuit that stores atomic facts separately across layers, causing it to fail on out-of-distribution generalization. For the comparison task, the transformer forms a "parallel" generalizing circuit that stores atomic facts together, enabling it to achieve systematicity.

📌 The findings suggest that proper cross-layer memory-sharing mechanisms for transformers, such as memory-augmentation and explicit recurrence, are needed to further unlock the transformer's generalization capabilities.
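The definition of grokking above (generalization that arrives long after the training loss is near zero) can be made concrete with a small sketch that measures the delay in a training log. The helper name, thresholds, and log values are hypothetical illustrations, not data or code from the paper.

```python
def grokking_gap(train_loss, val_acc, loss_eps=1e-3, acc_target=0.95):
    """Number of logged steps between fitting the training set and generalizing.

    Returns None if either event never happens in the log.
    """
    # First step where the training loss is effectively zero.
    fit_step = next((i for i, l in enumerate(train_loss) if l <= loss_eps), None)
    # First step where held-out accuracy reaches the target.
    gen_step = next((i for i, a in enumerate(val_acc) if a >= acc_target), None)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Made-up log: the training loss collapses early, validation accuracy
# only catches up several logged steps later.
train_loss = [1.0, 0.1, 0.0005, 0.0004, 0.0003, 0.0002]
val_acc = [0.10, 0.20, 0.25, 0.30, 0.60, 0.97]
print(grokking_gap(train_loss, val_acc))  # prints 3
```

A large positive gap is the signature the tweet describes; a gap near zero would mean the model generalized at about the same time it fit the training data.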

AgenticAI retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 2 years ago

Google presents What Are the Odds? Language Models Are Capable of Probabilistic Reasoning https://t.co/PR2Jiqi8tv

AgenticAI retweeted

Sumit @_reachsumit

almost 2 years ago

A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges

Explores applications of LLMs in various financial tasks, discussing the challenges, opportunities, and resources for further development in this domain.

📝https://t.co/ae12zaAEAv

AgenticAI retweeted

Harrison Chase

@hwchase17

almost 2 years ago

I have lots of thoughts on "agents"!

❓What is an agent? Why do the basic agents not work reliably? How are teams bringing "agentic" applications to production?

🙏I had a lot of fun talking about these topics (and more!) for nearly an hour with Sonya/Pat

https://t.co/T9RpLHzrqr

AgenticAI retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

almost 2 years ago

Learning Iterative Reasoning through Energy Diffusion

abs: https://t.co/rilXN2Jvi8
project page: https://t.co/Pw3ErOGTSF

"IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution"

AgenticAI retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

almost 2 years ago

Transcendence: Generative Models Can Outperform The Experts That Train Them

abs: https://t.co/2RYTj27R0j

Uses chess games as a simple testbed for studying transcendence: generative models trained on human labels that outperform humans.

Transformer models are trained on public datasets of human chess transcripts. To test for transcendence, the maximal rating of the human players in the dataset is limited to below a specified score. ChessFormer 1000 and ChessFormer 1300 achieve significant levels of transcendence, surpassing the maximal rating seen in the dataset.

The observation is that the generative models implicitly perform majority voting over the human experts. Sampling with low temperature also implicitly induces this majority vote phenomenon, and this is where transcendence is observed.
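The majority-vote effect of low-temperature sampling can be sketched with a toy example. The three "experts" below are hypothetical move distributions, each blundering in a different way. A model that imitates their average keeps some mass on every blunder, and lowering the temperature moves that mass onto the move most experts agree on. This is an illustration under those assumptions, not the paper's code or data.

```python
def average(distributions):
    """Mean of several probability distributions over the same moves."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def sharpen(probs, temperature):
    """Temperature-scale a distribution: p_i ** (1 / T), renormalized.

    Equivalent to a softmax over log-probabilities divided by T.
    """
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical experts over moves [best, blunder_1, blunder_2].
EXPERTS = [
    [0.5, 0.5, 0.0],  # often plays blunder_1
    [0.5, 0.0, 0.5],  # often plays blunder_2
    [0.6, 0.2, 0.2],  # spreads its mistakes
]

mixture = average(EXPERTS)  # what a pure imitator would learn
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in sharpen(mixture, t)])
```

At temperature 1.0 the sampled policy is just the averaged experts; as the temperature drops, the probability of the shared best move approaches 1 even though no single expert plays it that reliably.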

AgenticAI retweeted

DeepSeek

@deepseek_ai

almost 2 years ago

DeepSeek-Coder-V2: First Open Source Model Beats GPT4-Turbo in Coding and Math

> Excels in coding and math, beating GPT4-Turbo, Claude3-Opus, Gemini-1.5Pro, Codestral.
> Supports 338 programming languages and 128K context length.
> Fully open-sourced with two sizes: 230B (also with API access) and 16B.

#DeepSeekCoder

AgenticAI retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 2 years ago

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Reveals several important insights into the dynamics of factual knowledge acquisition during pretraining

https://t.co/L7zEusOnFf

AgenticAI retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 2 years ago

Google presents Improve Mathematical Reasoning in Language Models by Automated Process Supervision

- MCTS for the efficient collection of high-quality process supervision data
- 51% -> 69.4% on MATH
- No human intervention

https://t.co/1Kh8rVyTat

AgenticAI retweeted

Bindu Reddy

@bindureddy

almost 2 years ago

Announcing LiveBench AI - The WORLD'S FIRST LLM Benchmark That Can't Be Gamed!!

We (Abacus AI) partnered with Yann LeCun and his team to create LiveBench AI!

LiveBench is a living/breathing benchmark with new challenges that you CAN'T simply memorize. Unlike blind human eval, you can't fine-tune or style-hack your LLM to ace simple human conversations.

We evaluate LLMs on different dimensions, including reasoning, coding, writing, and data analysis.

The key reason behind introducing LiveBench is that you can disambiguate LLMs better. Here are some key findings

- GPT-4o inches out GPT-4-turbo.
- Claude Opus excels at data analysis and language understanding
- Gemini doesn't score as well as Claude or GPT-4 as it does on Lmsys. This means, generally speaking, Gemini isn't as good as Claude or GPT
- GPT-4 does much better at reasoning and coding than GPT-4o. We and other labs have reported this before, as well
- Qwen 72B is the best open-source model

This benchmark serves as LLMs' independent, objective, and TRANSPARENT ranking.

We are excited to maintain this living benchmark and hope the other models catch up to GPT-4 on these hard questions.

AgenticAI retweeted

AK

@_akhaliq

almost 2 years ago

Husky

A Unified, Open-Source Language Agent for Multi-Step Reasoning

Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as

AgenticAI retweeted

elvis

@omarsar0

almost 2 years ago

Towards Lifelong Learning of LLMs

Nice survey on techniques to enable LLMs to learn continuously, integrate new knowledge, retain previously learned information, and prevent catastrophic forgetting.

https://t.co/jLQCSVpy77

AgenticAI retweeted

Aran Komatsuzaki

@arankomatsuzaki

almost 2 years ago

Simple and Effective Masked Diffusion Language Models

Achieves a new SotA among diffusion models on a range of LM tasks and approaches AR perplexity

repo: https://t.co/97uIi2my8I
abs: https://t.co/R82RQBmLEI

AgenticAI retweeted

Chief AI Officer @chiefaioffice

almost 2 years ago

BREAKING: Mistral raises a $640M Series B led by General Catalyst at a $6B valuation.

Here's their Seed pitch deck to remind you of their vision:

AgenticAI retweeted

Sumit @_reachsumit

almost 2 years ago

Synthetic Query Generation using Large Language Models for Virtual Assistants

Apple investigates the use of LLMs to generate synthetic queries for virtual assistants that are similar to real user queries and specific to retrieving relevant entities.

📝https://t.co/geldbITLEQ
