next gen computer-use agents @salesforce | ex @convergence_ai_ @UnitaryAI @CompSciOxford @imperialcollege | interested in steering ai to do more good than bad
Excited to release code and models for detoxify 🙊, a simple python library built @UnitaryAI that aims to detect hate speech and toxic comments online! Includes training scripts built with @PyTorchLightnin ⚡ and transformers checkpoints @huggingface 🤗!
https://t.co/dv4eip8XE6
say what you will about ai slop but it’s so cool that you can just generate a 4 page comic about the history of epistemology
these are with @OpenAI images 2.0
this was a balanced, fun and surprisingly touching doc, @DanielRoher did well capturing the ai zeitgeist and its defining promise/peril tension in a very human way, kudos to @Fremond_ @ry_paddy for getting their hands on it & organising the watch party
claude cowork is making me think maybe we’ll look back and it’ll be obvious that humans were never meant to spend their lives working behind a screen. we’ll see it as inevitable that computers do everything for us on computers and the future of work is cooler than we can imagine
main takeaways from the @dwarkeshpodcast @karpathy interview:
*RL limitations*: hard to get past reward sparsity problem in RL when it comes to real world tasks, one promising direction could be more sample efficient learning by reflecting on mistakes
*reduced model size by getting rid of redundant memorisation*: soon the cognitive engine might look like a 1 billion parameter model that just knows how to think without the need to memorise lots of data - it should be able to recognise what it doesn't know and look it up
*gradual loss of control and understanding*: even if we still have humans delegating tasks to autonomous entities, it will get increasingly hard to fully control them, let alone understand what they're doing; similar thoughts in this great, albeit pessimistic, paper https://t.co/8s5M8Ub6FC
*role of AI education post-AGI*: pre-AGI AI education is useful, post-AGI education is fun, could be seen as a way to train mentally just like you do physically
pantheon is now up there with some of my favourite sci fi of all time like asimov’s the end of eternity, the last question or story of your life with some elements of snow crash or dark the tv show but upping the scale even more, dare i say something akin to what @DavidDeutschOxf was describing in the beginning of infinity
a nice straightforward summary of some of the grpo limitations beyond just not being great for multi-turn e.g.
* if you have multiple reward signals -> the model won't know which one it is being rewarded for since they're usually all collapsed into one
* only the scalar reward is used for policy update when a more detailed textual feedback could be used (what gepa kinda does with their reflective prompt evolution)
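the first point can be sketched in a few lines. this is a toy illustration, not the blogpost's code: the reward components (correctness, format), their values, and the helper `group_relative_advantages` are all made up for the example, and it assumes the usual group-normalised advantage (reward minus group mean, divided by group std) with the components simply summed into one scalar.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Group-normalised advantages: (r - group mean) / (group std + eps).

    Assumption: population std over the group; some implementations
    use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical per-completion reward components: (correctness, format).
components = [
    (1.0, 0.0),  # correct answer, bad format
    (0.0, 1.0),  # wrong answer, good format
    (1.0, 1.0),  # correct answer, good format
    (0.0, 0.0),  # wrong answer, bad format
]

# The components are collapsed into a single scalar before the update.
scalar_rewards = [sum(c) for c in components]
advantages = group_relative_advantages(scalar_rewards)

for comp, adv in zip(components, advantages):
    print(comp, round(adv, 3))
```

the first two completions earn their reward for opposite reasons, yet they end up with the same advantage, so the policy update cannot tell which behaviour it was rewarded for.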
wrote a short blogpost on what I think are some limitations of GRPO:
I’ve been playing around with RL finetuning for reasoning tasks and came across a few limitations that i wanted to document here
feedback/corrections are welcome!
It's been interesting to see RL making a comeback lately. Guess in retrospect it's not surprising that RLHF is not enough since it naturally hits a ceiling limited by the quality of human feedback. What's particularly exciting though is that this shift is happening just as AI is becoming more agentic and able to interact with the digital world. Combine this with some grounded reward signals (e.g. profit, likes, citations) and we should see AI not just replicate what humans do but create new knowledge through its own experiences. The next few years should be a fun and wild ride 🎢
For another interesting paper on the topic: https://t.co/rRRoDCDf6s
or this podcast on it:
https://t.co/JPOhZUuUmJ
Why does RL work so well to learn preferred completions in llm post training and why can't we just use supervised finetuning?
This paper has an interesting information-theoretic explanation:
https://t.co/eLoprr5xxc
TLDR:
The literature shows that the 2 stage RL approach (1. learn a good reward model that can score llm completions similar to how a human would 2. use that to learn good policies with RL) used today in the sota models outperforms using only supervised finetuning. The authors posit that this happens when the verification of an output is simpler than generating it. This is because, in the RL case, the search space is constrained to the subset of policies that are optimal for the learnt reward model. When they reduce the verification-generation gap empirically the difference in results diminishes too.
great discussion! we need more public discourse between top ai leaders and economists 🤝
would’ve liked to see them engage with hinton’s concern that the benefits ai brings will deepen the economic divide in a capitalist system further creating fertile ground for fascism 🌶️
Watch our 2024 Nobel Prize laureates talk about their research and careers in a unique roundtable discussion, 'Nobel Minds', moderated by BBC's Zeinab Badawi.
https://t.co/yB5We7tWmd
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
The 2nd part of the Intro to multi-node machine learning is out! 🥳 This is all about how to use Slurm to scale up your ML applications! 🚀
https://t.co/QaMQvEyjdl
Excited to kick off a new deep-dive blog series on how to build and set up the infrastructure for distributed training from scratch.
Spoiler alert – setting up a cloud-based cluster that can scale to hundreds of nodes isn't as daunting as it sounds! 👀🚀
https://t.co/6zCjGBVGHm