Steve El-Hage @hagestev - Twitter Profile

7 days ago

@mathemagic1an nice. have your agent have them add a proof of work tag to write at the beginning of the video to avoid clipping risk.

0

977

hagestev retweeted

Will

@willreil

4 months ago

Buick would make a bazillion dollars if they just re-released the grand national with modern safety features. Exact same car, bazillion dollars.

453

34K

3K

807

2M

Steve El-Hage

@hagestev

5 months ago

@vasuman it's particularly bad if your context has high logic density (i.e., rules, conditionals, exceptions), which becomes more common in production use cases

0

2

0

208

Steve El-Hage

@hagestev

8 months ago

@cwamidon why do you say that? what parts are you bottlenecked on?

0

240

Steve El-Hage

@hagestev

10 months ago

Recent model releases are disappointing because of benchmark hacking. A company optimizes its model for the benchmark and says “this is 50% smarter than the others”. Prophet Arena is cool because I think we can all agree we have AGI when AI can predict the future.

Prophet Arena

@ProphetArena

10 months ago

🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence. That is, can AI truly predict the future by connecting today’s dots? 👉 What makes it special? - It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen future events. You can’t memorize tomorrow (unless you’ve cracked time travel). - It’s interpretable. Strong performance = real foresight, which translates into real investment gains. 👉 Check it out: https://t.co/1ASTV8GzWy

91

1K

137

850

452K

2

0

1

464

Steve El-Hage

@hagestev

10 months ago

@RobertHaisfield I support you. 90% chance it’s fine.

0

1

0

32

Steve El-Hage

@hagestev

10 months ago

@levie The main value of subagents is UX. We get to see the “little AI workers” “working”, and it's easy to understand. Similar to CoT letting you “see what they think” (it doesn’t), it’s useful for building agent UX but not for production AI workflows.

1

2

0

1

855

Steve El-Hage

@hagestev

10 months ago

The code agent is the everything agent

Mckay Wrigley

@mckaywrigley

10 months ago

@claudeai YESSSS GUYS THEY’RE LEANING INTO BEING THE EVERYTHING AGENT!!!

4

288

3

43

22K

0

4

0

341

Steve El-Hage

@hagestev

10 months ago

@jacob_posel @MatthewGattozzi We're working on this specific problem. Easy to do a quick and dirty version, hard (but solvable) to do it in a more consequential large-data environment (i.e., 8-9 figure spend, XXM MAU). Once you solve it for real, you can have self-improving creative, which is insanely powerful.

0

12

Steve El-Hage

@hagestev

10 months ago

After seeing the Anthropic blog post and posts like these, I spent the night using Claude Code for non-code use. Three agents on growth strategy and three agents on creative generation. I gave them a small creative repo, some data, and a context profile instead of code.

Takeaways:
1. For whatever reason, each agent call took ~60s and averaged 15k tokens used.
2. Bad instruction following. Subagents are overly eager and take extremely aggressive liberties. If you ask them to do one easy task, they often do 4-5 additional ones on their own.
3. Agent coordination and handoff are very poor and kind of random, even when clear instructions were given.
4. Creatively, these models are much worse than the sonnet-4 models. They're stated as being the same (?) or similar, but they're notably very bad at creative compared to most stock LLMs. Yes, you can make stuff, but it's not really shippable yet.
5. Subagents generated by Anthropic (recommended) did much worse than ones you create yourself.
6. Inspired by @boringmarketer, I tried quant analysis of marketing creative, and my tldr right now is that most implementations will lose you all your money. I can make a separate thread on this and how to make it work, but I compared agent creative analysis when given performance data vs. hand analysis, and the agents are generally wrong.

The killer feature here is the Claude Code interface but with a general-purpose agent SDK. Net, I was able to get some decent results, but this use case is pretty far from production viable. Next is recreating this with OpenAI agents.

Peter Yang

@petergyang

11 months ago

Something that many people haven't realized: Claude Code is useful for much more than just coding. For example, @AnthropicAI's growth marketing team is using it to generate 100s of new ad creatives in minutes. I think Anthropic has undersold the power of this tool. It should be branded as "Claude Agent" not "Claude Code." I asked people what their non-coding use cases are and some of the replies are 🤯. See next post for the link.

37

1K

74

2K

135K

0

5

0

3

385

Steve El-Hage

@hagestev

11 months ago

@AtelierMissor_ @StarbaseTX congrats - good to see you guys getting more love

0

1

0

95

Steve El-Hage

@hagestev

12 months ago

@AiBuilder1 steal

0

149

Steve El-Hage

@hagestev

12 months ago

@VibeMarketer_ MCP

0

2

0

63

Steve El-Hage

@hagestev

about 1 year ago

@mageeclegg what's living in Santiago like?

1

2

0

168

Steve El-Hage

@hagestev

about 1 year ago

How good are stock LLMs at translation? Is 4o English -> Spanish basically perfect, or can native speakers immediately tell the difference? Also, is there some kind of language benchmarking to see how good each model is for different languages?

0

1

0

176

Steve El-Hage

@hagestev

about 1 year ago

@mjmayank1 yeah decently common in early stage

0

1

0

20

Steve El-Hage

@hagestev

about 1 year ago

Turns out it’s exactly how it works

Steve El-Hage

@hagestev

about 1 year ago

@shl Company A buys SaaS from Company B for $1M ARR. Company B buys SaaS from Company C for $1M ARR. Company C buys SaaS from Company A for $1M ARR. Is the total revenue $1M or $3M a year?

6

39

3

9K

1

4

0

347
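The circular-purchase question in the quoted reply above can be tallied with a short sketch. It assumes each contract is booked as revenue by the seller and paid in cash by the buyer; the `contracts` list and the `summarize` helper are hypothetical names used only for illustration, not an accounting method from the thread.

```python
# Hypothetical illustration of the A -> B -> C -> A purchase loop.
# Each tuple is (seller, buyer, annual contract value in dollars).
contracts = [
    ("B", "A", 1_000_000),  # Company A buys SaaS from Company B
    ("C", "B", 1_000_000),  # Company B buys SaaS from Company C
    ("A", "C", 1_000_000),  # Company C buys SaaS from Company A
]


def summarize(contracts):
    """Return (revenue per company, net cash movement per company)."""
    revenue = {}
    net_cash = {}
    for seller, buyer, amount in contracts:
        # The seller books the contract as revenue and receives the cash.
        revenue[seller] = revenue.get(seller, 0) + amount
        net_cash[seller] = net_cash.get(seller, 0) + amount
        # The buyer pays the same amount out.
        net_cash[buyer] = net_cash.get(buyer, 0) - amount
    return revenue, net_cash


revenue, net_cash = summarize(contracts)
total_revenue = sum(revenue.values())

print(f"Summed revenue across companies: ${total_revenue:,}")
print(f"Net cash movement per company: {net_cash}")
```

Under these assumptions, adding up each company's booked revenue gives $3M a year, while every company's net cash movement is zero because each one pays out exactly what it takes in.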

Steve El-Hage

@hagestev

about 1 year ago

veo3 San Francisco morning commute

0

3

0

1

397

Steve El-Hage

@hagestev

about 1 year ago

Dante Alighieri live streaming from hell pt2

0

1

0

4

Steve El-Hage

@hagestev

about 1 year ago

@Yassir1611 because it's LA

0

68

Steve El-Hage

@hagestev

about 1 year ago

more raw veo3 experiments. these are all sequential, first attempt, no cherry picking, since I think that gives a better view of what the model's actually like. veo3 prompt: generate the most stereotypical tiktok video you can

1

12

0

14

6K

Steve El-Hage

@hagestev
