Congratulations to @pulkit_verma for winning the #ICAPS2025 Outstanding Dissertation Award! 🍾🥳🎉
The award-winning dissertation establishes formal foundations for autonomous assessment of adaptive AI systems.
The deadline for the #IJCAI2025 Workshop on User-Aligned Assessment of Adaptive AI Systems is just 5 days away. If you are working on any aspect of assessment, regulation, compliance, etc., of AI systems, please check it out.
More details here: https://t.co/AhhwUfjqMO
Gemini now coaches table tennis robots 🤖🏓:
In our latest paper, we introduce SAS Prompt – a technique for robot self-improvement with LLMs. Here is how it works 🧵
Turns out, it’s possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
I’m super happy to have co-led the Evalchemy 🧪 team! I’ve been personally wanting a simple and fast framework for running a host of common post-training evals 📚 for a while now and this tool has streamlined a lot of my research. We hope that it helps you out too!
https://t.co/OBsYc5udSr
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval, etc., as well as standard pre-training metrics like MMLU. This requires you to download and install more than 10 repos, each with different dependencies and issues. This is, as you might expect, an actual nightmare. (1/n)
If interested, please reach out to me at [email protected] or DM me.
For more details, please check my:
CV: https://t.co/Uk8flPEwhN
Scholar profile: https://t.co/QqCkJW1ZXD
Webpage: https://t.co/5V9sxiEx3j
6/6
My PhD was oriented towards automated task planning (PDDL-based heuristic-search planners) and its application to human-AI collaboration (#HumanAwareAI #HAAI) scenarios, designing algorithms that find relevant information to support active teaming. 3/6
I also have 10+ years of experience in developing web-based and standalone systems using Python, JS, jQuery, and the Dojo framework, as well as Java (at @sapient).
I have also designed several human-factors studies to evaluate the effectiveness of various AI-based systems. 5/6
Through my work, I have been exploring LLM+planning for LLM-based agents for the past few months, and some of the initial work was presented at #ICAPS24 #demo.
Link to the tweet -- https://t.co/PpD4bZW7Ni
I have also been part of several DARPA projects. 2/6
The video (https://t.co/TGGiELc55b) first shows the core technology and the pipeline, followed by the demo.
This work was done in collaboration with @shiwalimohan at PARC, part of #SRI. 2/7
I am looking for positions in LLM-based agents and in combining planning and learning techniques/systems.
I have around 2.5 years of industry research experience, including two years at @PARCinc as a research scientist and multiple summer intern positions at @amazon @alexa99. 1/6
We evaluated the planner on OpenAI Gym problems such as Mountain Car, CartPole, and Acrobot, where a plan is constructed in less than a second.
This work was done at @PARCinc, in collaboration with Wiktor Piotrowski and Alex Perez. 7/7
🚨 Our presentations at ICAPS24: a 🧵
(1) ICAPS24 #demo on using an #LLM to translate natural language requests from humans into formal representations, which are then used as goals to construct plans with several planners. 1/7
(3) #ICAPS24 paper at #KEPS workshop to present Nyx: a PDDL+ planner for solving real-world problems with non-linear dynamics, exogenous events, and continuous processes (https://t.co/9dw83oASdI). 6/7