@stevekrouse https://t.co/179b9kQ4mK comes with an API (not documented yet) and we can ingest 100GB+ worth of PDFs and data and have an AI agent work on it without losing coherence or quality. DMs open!
@lateinteraction@Teknium Is the first requirement necessary? If it has symbolic handles to external data, can’t it also manage long prompts similarly? In fact wouldn’t it necessarily need to see prompts up to a non-zero length because that’s the only way anything enters the context?
@brenorb@lateinteraction@Teknium I don’t think the point is not “not read” the variable - rather you specify an action that (like print) that emits data to Kiên context in an intentional way. There is no other way to emit data to the LLM context otherwise.
Indeed very promising, but I implemented RLMs for my harness and used it for legal analysis and reasoning. But it seems to fare worse than a cc style harness, probably because of post training. For ex it’s v prone to rabbit hole-ing and getting stuck that way
The following are not standard in a coding agent:
1. The user prompt P itself (not just external data) is a symbolic object in the environment. The model is not allowed to grep/read long snippets from P.
2. The model has to write recursive code (that calls LMs) to understand or transform the content of P. Unlike "sub-agents", recursion must happen during code execution, which means that you can launch arbitrarily many sub-calls, not just a small constant number, without polluting the context window.
3. All sub-calls and tool calls return values into symbolic variables. The model is not allowed to pollute its context window with their return values. Instead, it must build up (and refine) its output with recursion.
Contrast Algorithm 1 and Algorithm 2 to see the formal differences.
@__morse How does it compare to dev-browser? Can the browser session keep running while the agent iteratively tries new playwright code? The playwright mcp restarts the browser for example
@corbtt How should we think about using ART like tools for our agents? Is there still alpha in RL training if I care most about quality of responses in a domain like legal (and I'm relatively insensitive to cost)?
@ChrisGPotts Thanks for your talk, the reframing really made it click for me. Do you have pointers for getting started with optimizers for datasets similar to the HR dataset in your talk? Sometimes we might only have agent interactions in our app, without user comments.
@sh_reya@lateinteraction Do you have any writeup I can reference for learning how to do this? I'm trying to improve LLM performance on legal texts as well, but my use case has a broader scope extending beyond contracts and I wonder if similar approaches can work.
@ravihanda F&B scene in Singapore has suffered greatly, in fact it’s pretty much dead now. Very few places are worth it for eating out compared to what it used to be even 5 years ago.