shanx @mailshanx - Twitter Profile

4 months ago

@stevekrouse https://t.co/179b9kQ4mK comes with an API (not documented yet) and we can ingest 100GB+ worth of PDFs and data and have an AI agent work on it without losing coherence or quality. DMs open!


shanx @mailshanx

5 months ago

@_can1357 Is this integrated into omp now?


shanx @mailshanx

5 months ago

@lateinteraction @Teknium Is the first requirement necessary? If it has symbolic handles to external data, can’t it manage long prompts the same way? In fact, wouldn’t it necessarily need to see prompts up to some non-zero length, since that’s the only way anything enters the context?


shanx @mailshanx

5 months ago

@brenorb @lateinteraction @Teknium I don’t think the point is to “not read” the variable; rather, you specify an action (like print) that emits data into the context in an intentional way. There is no other way to get data into the LLM context.


shanx @mailshanx

5 months ago

Indeed very promising, but when I implemented RLMs in my harness and used them for legal analysis and reasoning, they seemed to fare worse than a CC-style harness, probably because of post-training. For example, it’s very prone to going down rabbit holes and getting stuck there.

Omar Khattab

@lateinteraction

5 months ago


The following are not standard in a coding agent:

1. The user prompt P itself (not just external data) is a symbolic object in the environment. The model is not allowed to grep/read long snippets from P.

2. The model has to write recursive code (that calls LMs) to understand or transform the content of P. Unlike "sub-agents", recursion must happen during code execution, which means that you can launch arbitrarily many sub-calls, not just a small constant number, without polluting the context window.

3. All sub-calls and tool calls return values into symbolic variables. The model is not allowed to pollute its context window with their return values. Instead, it must build up (and refine) its output with recursion.

Contrast Algorithm 1 and Algorithm 2 to see the formal differences.
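A toy sketch of the three constraints above, for illustration only. It is an assumption, not the paper's Algorithm 2: `Env`, `recursive_call`, and `fake_lm` are hypothetical names, `fake_lm` is a deterministic stand-in for a real model (it just counts the word "clause"), and in an actual RLM the root model would write the recursive code itself. It shows the prompt P held as a variable, recursion that fans out into many sub-calls during execution, and return values kept in variables, with only an explicit `emit` (the intentional "print" discussed in the replies) reaching the root context.

```python
from typing import Callable

# Hypothetical LM interface: prompt string in, completion string out.
LM = Callable[[str], str]


class Env:
    """Toy REPL environment (assumption). P lives here as a variable;
    the root model only sees what is explicitly emitted."""

    def __init__(self, prompt: str) -> None:
        self.vars = {"P": prompt}
        self.root_context: list = []

    def emit(self, name: str, limit: int = 200) -> None:
        # The intentional "print": the only path from a variable into the root context.
        self.root_context.append(str(self.vars[name])[:limit])


def recursive_call(text: str, lm: LM, max_chars: int = 500) -> str:
    """Recursion runs during code execution, so the number of sub-calls
    grows with the input instead of being a small fixed constant."""
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    if len(text) <= max_chars:
        return lm("COUNT\n" + text)  # base case: one sub-call on a small chunk
    # Split at a space at or after the midpoint so words are not cut in half.
    mid = len(text) // 2
    cut = text.find(" ", mid)
    if cut == -1:
        cut = mid
    left = recursive_call(text[:cut], lm, max_chars)   # result stays in a variable
    right = recursive_call(text[cut:], lm, max_chars)
    return lm(f"COMBINE\n{left} {right}")  # refine partial outputs with another sub-call


def fake_lm(prompt: str) -> str:
    """Deterministic stand-in for a model: counts 'clause' or sums partial counts."""
    task, _, body = prompt.partition("\n")
    if task == "COUNT":
        return str(body.split().count("clause"))
    if task == "COMBINE":
        return str(sum(int(x) for x in body.split()))
    raise ValueError(f"unknown task: {task}")


calls = []


def counting_lm(prompt: str) -> str:
    calls.append(prompt)  # sub-call prompts are recorded here, never in root_context
    return fake_lm(prompt)


env = Env(" ".join(["clause", "term"] * 300))
env.vars["answer"] = recursive_call(env.vars["P"], counting_lm)
env.emit("answer")
print(env.root_context)  # ['300']
print(len(calls), "sub-calls")
```

Halving the text is just one decomposition; the point of the constraints is that the model chooses its own (by section, by document, etc.) while the long prompt and intermediate results never enter its own context window.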


shanx @mailshanx

5 months ago

@a1zhang What do you think are the trade-offs compared to the RLM implementation you presented in the talk?


shanx @mailshanx

5 months ago

@jonnydimond @TaskletAI Sounds fascinating, you should write a blog post!


shanx @mailshanx

6 months ago

@__morse How does it compare to dev-browser? Can the browser session keep running while the agent iteratively tries new Playwright code? The Playwright MCP, for example, restarts the browser.


shanx @mailshanx

6 months ago

@penberg @letsbuildmore Can’t find the repo, drop a link?


shanx @mailshanx

7 months ago

@charlespacker Congratulations on the release, looks v promising


shanx @mailshanx

7 months ago

@corbtt How should we think about using ART like tools for our agents? Is there still alpha in RL training if I care most about quality of responses in a domain like legal (and I'm relatively insensitive to cost)?


shanx @mailshanx

7 months ago

@LukeW How does this work?


shanx @mailshanx

7 months ago

@ChrisGPotts Thanks for your talk, the reframing really made it click for me. Do you have pointers for getting started with optimizers for datasets similar to the HR dataset in your talk? Sometimes we might only have agent interactions in our app, without user comments.


shanx @mailshanx

7 months ago

@perceptnet Sounds really amazing, congratulations 🥂 how is it different from @ManusAI ?


shanx @mailshanx

8 months ago

@nikolaj_astrup Amazing! Can I come visit and say hello? Where in Vietnam is this?


shanx @mailshanx

8 months ago

@guilleflorvs FOUNDER


shanx @mailshanx

8 months ago

@sh_reya @lateinteraction Do you have any writeup I can reference for learning how to do this? I'm trying to improve LLM performance on legal texts as well, but my use case has a broader scope extending beyond contracts and I wonder if similar approaches can work.


shanx @mailshanx

8 months ago

@ravihanda Figuring out immigration is more difficult than FIREing


shanx @mailshanx

8 months ago

@LVNilesh How did you immigrate to the US?


shanx @mailshanx

9 months ago

@ravihanda F&B scene in Singapore has suffered greatly, in fact it’s pretty much dead now. Very few places are worth it for eating out compared to what it used to be even 5 years ago.
