Ending this year with a blog post on RL environments:
https://t.co/wfGtEormJy
Talks about reward hacking, sandboxing, curriculum learning, tool calling - all the stuff that can break when you actually try to train agents
Thanks!
https://t.co/cVZAPeYJDQ
It's a countdown task - "Use every given number exactly once to build an equation that hits the target."
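For a task like this, the reward can be checked programmatically rather than judged by a model. A minimal sketch follows. It assumes the equation arrives as a plain arithmetic string using only `+ - * /`, integer literals and parentheses. The names `countdown_reward` and `_evaluate` are hypothetical and not taken from the original experiment.

```python
import ast
from collections import Counter

# Assumption: only these four binary operators are allowed in an answer.
_ALLOWED_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)


def _evaluate(node: ast.AST, used: list) -> float:
    """Recursively evaluate a parsed arithmetic expression, recording each integer literal."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.Constant) and isinstance(node.value, int) and not isinstance(node.value, bool):
        used.append(node.value)
        return float(node.value)
    if isinstance(node, ast.BinOp) and isinstance(node.op, _ALLOWED_OPS):
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        return left / right  # a ZeroDivisionError here is caught by the caller
    # Anything else (function calls, names, unary minus, ...) is rejected.
    raise ValueError("unsupported syntax in equation")


def countdown_reward(equation: str, numbers: list, target: int) -> float:
    """Return 1.0 if `equation` uses every number exactly once and hits `target`, else 0.0."""
    try:
        used: list = []
        value = _evaluate(ast.parse(equation, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):  # each given number exactly once
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

For example, `countdown_reward("(25 - 5) * 4 + 3", [3, 4, 5, 25], 83)` gives a reward because every number is used once and the result is 83. `"25 * 4"` with the same numbers gets 0, even though it evaluates to a round number. The sketch parses the string with `ast` instead of calling `eval`, so a model cannot pass arbitrary Python code through the reward function.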
As a next step, I wanted the small model to decide whether to forward the problem to the remote model or solve it on its own. I introduced latency as a negative reward, but it didn't work very well. Then I moved on to other projects.
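A latency penalty of this kind can be sketched as a correctness reward minus a linear cost on wall-clock time. This is a hypothetical reconstruction, not the actual reward from the experiment. The `Episode` fields and the `latency_weight` coefficient are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One rollout of a hypothetical routing policy (solve locally vs. forward)."""
    correct: bool     # did the final equation hit the target?
    latency_s: float  # wall-clock time of the episode in seconds, including any remote call


def latency_penalized_reward(ep: Episode, latency_weight: float = 0.1) -> float:
    """Correctness reward (1 or 0) minus a linear latency penalty (assumed form)."""
    correctness = 1.0 if ep.correct else 0.0
    return correctness - latency_weight * ep.latency_s


# A correct answer via a slow remote call vs. a wrong answer produced quickly on-device.
slow_but_right = Episode(correct=True, latency_s=4.0)
fast_but_wrong = Episode(correct=False, latency_s=0.5)
print(latency_penalized_reward(slow_but_right))  # 1 - 0.1 * 4.0
print(latency_penalized_reward(fast_but_wrong))  # 0 - 0.1 * 0.5
```

With a linear penalty like this, the trade-off hinges on `latency_weight`. If it is too large, skipping the remote call always looks better. If it is too small, the penalty barely changes the policy's choices.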
An interesting thing happened when the remote model was swapped for a low-quality 7B LLM. After failing to get correct equations out of the remote model, the local model started adding hints to its prompt for the remote model. In a few instances, the local model solved the problem itself and put the equation in the remote model's prompt, so the remote model only had to **forward** it.
@brendanh0gan Sounds similar to an experiment I ran a few months ago with qwen-3b and o3-mini.
Spoiler: It works!
https://t.co/C99Esn1D9n
@goyal__pramod Always. Visualizations are rewarded since humans are visual creatures.
To speed up this process, you can try these:
1. Claude renders diagrams with its Artifacts feature.
2. Ask an AI for Mermaid versions, then manually convert them to Excalidraw or its ilk.
Given @HamelHusain and @sh_reya's credibility, I would love to take the evals course. But $2000 seems like a lot, even when it comes to convincing my company to pay for it. What are some of the best alternatives here?
@tanay_mehta @doSwayamExist Ouch. How pathetic!! This incident, and the intern-shaming around Apple's latest paper, are deeply concerning. This is classism in a different form.
This can only change if Indian teachers/professors are "okay" with dropping Sir/Madam. And how cool is it to say "Prof. Sanyal" or "Prof. Chawla"?!
Introducing Sarvam Samvaad 🚀
Sarvam Samvaad, our Conversational AI Platform, is designed to help enterprises build, test, and launch AI agents fluent in 11 Indian languages.
▪️Power interactions across telephone, WhatsApp, web, and apps
▪️Handle complex phrases, alphanumerics, and proper nouns with precision
▪️Listen in on every conversation and discover deep insights
▪️Scale your support with pricing built for India
Trusted by leading brands across the country, the platform delivers production-ready agents so enterprises can move from pilot to full-scale deployment in just days.
It’s time to level up customer experience with Sarvam Samvaad.
Watch it in action ↓
@t_blom I came across folks who said:
"If you find the clients and take care of the tech stuff, you'll be the co-founder." How dependent can a person be?!