Adam Łucek @AdamRLucek - Twitter Profile

Adam Łucek

@AdamRLucek

about 1 hour ago

@TBone2101 Simplicity is always key! Very easy to overdo it once code execution and scripts come into play

0

55

Adam Łucek

@AdamRLucek

about 6 hours ago

I'm bullish on agent swarms (aka workflows). Agents are increasingly being used to analyze and collate massive amounts of unstructured data in repetitive ways (e.g. document extraction, reading emails, parsing logs), but as these tasks and data inputs scale we've seen reliable execution decrease, even from the most capable models. Specifically, the consistency of sub agent dispatches from filesystem-based agents drops dramatically when attempting to deploy more than 30 sub agents in parallel.

So… how can you harness the best of an agent's intelligent decision making with reliable sub agent task execution at scale? Here's how 👇 1/5

3

36

5

58

6K

Adam Łucek

@AdamRLucek

about 4 hours ago

@masondrxy Agents galore and Knicks in 4 🗽

0

2

0

31

Adam Łucek

@AdamRLucek

about 6 hours ago

Finally, the output can be navigated and ingested by the orchestrating agent for any downstream follow up! From testing and implementation, we've seen these techniques begin to deliver incredibly reliable sub agent execution at scale, where relying on a single LLM's function calling ability and logic would fail. It will be critical to watch how this evolves as more agent harnesses begin to rely on both code execution and recursive agent runs behind the scenes! 5/5

0

1

0

1

173

Adam Łucek

@AdamRLucek

about 6 hours ago

Once each individual sub agent completes its run, the results can either be joined back to the main table or acknowledged as completed. We can use structured outputs to do this join automatically via the same dispatch script as results stream in. 4/5

https://t.co/3bAObtOA2e
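
The join described in 4/5 could look roughly like the sketch below. It is not the dispatch script from the thread: `run_sub_agent`, `SubAgentResult`, `build_table`, and the `max_parallel` cap are all hypothetical names, and the sub agent is a stub that returns a JSON string in place of a real model call. The idea is that the script, not the orchestrating LLM, fans out the work under a concurrency limit and merges each structured result into the main table as soon as it arrives.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class SubAgentResult:
    """Hypothetical structured output every sub agent must return."""
    row_id: int
    status: str
    summary: str


async def run_sub_agent(row: dict) -> str:
    """Stand-in for a real sub agent run; returns its output as a JSON string."""
    await asyncio.sleep(0)  # placeholder for the actual agent call
    return json.dumps({
        "row_id": row["row_id"],
        "status": "completed",
        "summary": f"processed {row['source']}",
    })


def parse_result(raw: str) -> SubAgentResult:
    """Validate the raw output against the schema; raises if a field is missing."""
    data = json.loads(raw)
    return SubAgentResult(
        row_id=int(data["row_id"]),
        status=str(data["status"]),
        summary=str(data["summary"]),
    )


def build_table(sources: list[str]) -> dict[int, dict]:
    """Main table keyed by row id, one pending row per input."""
    return {
        i: {"row_id": i, "source": s, "status": "pending", "summary": None}
        for i, s in enumerate(sources)
    }


async def dispatch(table: dict[int, dict], max_parallel: int = 10) -> dict[int, dict]:
    """Run one sub agent per row and join results back as they stream in."""
    limit = asyncio.Semaphore(max_parallel)  # cap on concurrent sub agents

    async def worker(row: dict) -> SubAgentResult:
        async with limit:
            return parse_result(await run_sub_agent(row))

    tasks = [asyncio.create_task(worker(row)) for row in table.values()]
    for finished in asyncio.as_completed(tasks):
        result = await finished
        # Join on row_id: the structured output tells us which row to update.
        table[result.row_id].update(status=result.status, summary=result.summary)
    return table
```

Calling `asyncio.run(dispatch(build_table(["a.txt", "b.txt"])))` returns the same table with every row marked completed. Keying the join on a field inside the structured output means the order in which sub agents finish does not matter.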

1

0

178

Adam Łucek

@AdamRLucek

about 21 hours ago

@huntlovell @hwchase17 Imagine forgetting middleware… couldn’t be me

0

1

0

92

Adam Łucek

@AdamRLucek

2 days ago

@fantopy Lowkey fear it may literally be folks asking Claude to write copy

0

16

Adam Łucek

@AdamRLucek

2 days ago

One personal gripe I have with current AI product advertising is that many displays/billboards seem strangely… verbose? Like lots of shoehorned text awkwardly worldbuilding niche scenarios to get their use case across. Am I just not the target audience? Does this not seem counterintuitive to traditional brand/product marketing?

1

3

0

214

Adam Łucek

@AdamRLucek

2 days ago

@palashshah Dawg if you know nothing then what the hell do I know

1

0

249

Adam Łucek

@AdamRLucek

3 days ago

@adelbucetta Magic everywhere lowkey

0

1

0

10

Adam Łucek

@AdamRLucek

3 days ago

One of the most technically impressive agents I’ve had the honor of working on 🚒

LangChain

@LangChain

3 days ago

Stop manually triaging agent failures. Let LangSmith Engine fix it.

10

80

15

34

33K

2

32

6

16

9K

Adam Łucek

@AdamRLucek

3 days ago

@BraceSproul Actually? That’s the smoking gun. And genuinely a belt and suspenders insight

0

3

0

68

Adam Łucek

@AdamRLucek

3 days ago

@Vtrivedy10

1

3

0

113

Adam Łucek

@AdamRLucek

6 days ago

@Vtrivedy10 Reading the “LLMs are few shot learners” paper radicalized me

1

4

0

3

607

Adam Łucek

@AdamRLucek

9 days ago

@Abhilekh_Meda Maybe soon 🤔

0

1

0

61

Adam Łucek

@AdamRLucek

9 days ago

Trace data is literally worth its weight in gold these days, if you know what to do with it! As has been established, creating effective agents requires shipping early, observing behavior, and iterating quickly. At the core of this are your agent traces capturing exact inputs, outputs, steps, and metadata along the way.

Analyzing traces helps surface inefficiencies and areas for improvement, but they can also be used in more sophisticated ways to set up robust evaluations.

Here are two of the ways we use traces to build evals for production agents 👇
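
As a rough illustration of the general idea (not the two methods the thread goes on to describe, which are not shown here), a captured trace can be frozen into a reference example and later runs scored against it. Every name below (`Trace`, `EvalExample`, `trace_to_example`, `score_run`) is hypothetical and not taken from any tracing library.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical trace record: exact inputs, outputs, steps, and metadata."""
    inputs: dict
    outputs: dict
    steps: list
    metadata: dict = field(default_factory=dict)


@dataclass
class EvalExample:
    """Reference example derived from a trace that was reviewed and judged good."""
    inputs: dict
    reference_outputs: dict
    expected_steps: list


def trace_to_example(trace: Trace) -> EvalExample:
    """Copy the parts of a trace that an eval needs, dropping run metadata."""
    return EvalExample(
        inputs=dict(trace.inputs),
        reference_outputs=dict(trace.outputs),
        expected_steps=list(trace.steps),
    )


def score_run(example: EvalExample, new_trace: Trace) -> dict:
    """Compare a fresh run on the same inputs against the reference example."""
    return {
        # Exact match is a deliberately strict, simple check for this sketch.
        "output_match": new_trace.outputs == example.reference_outputs,
        # Steps taken beyond the reference hint at an inefficient run.
        "extra_steps": max(0, len(new_trace.steps) - len(example.expected_steps)),
    }
```

In practice an exact-match check is often too strict for free-text outputs, so the comparison function is the part most likely to be swapped for something task-specific.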

12

154

22

267

42K

Adam Łucek

@AdamRLucek

9 days ago

@pothuLabs A real missed opportunity for most!

3

0

80

Adam Łucek

@AdamRLucek

9 days ago

@novasarc01 Ah ok, so this is testing whether the proposed eval actually catches the failure mode that it's supposed to. Nice! Need to look deeper into user session coding traces; agreed with the earlier point that standard benchmarks rarely reflect actual user experience

1

2

0

1

78

Adam Łucek

@AdamRLucek

9 days ago

@novasarc01 Interesting! So you also generate what could be called, for lack of a better term, a "synthetic ground truth" based on what should be expected/have happened/ideal state with the eval. What does rerunning the eval on that tell you? Or do you use it more as a target/reference?

1

0

2

123
