Check out my latest article: Building AI-Native Engineers: What a Game Clock Revealed About the Gap Between Coding and Problem Solving https://t.co/Snyo2EAAqQ via @LinkedIn
So excited that our work is on the cover of Science!!! We find that AI models overly affirm users, even when they describe harmful actions. Advice from sycophantic AI made people more self-centered, yet people prefer and trust it more, which may promote this model behavior.
Jaggedness remains a key feature of LLMs & I have yet to see a clearly articulated argument about why it will disappear. A jagged general intelligence (not quite an oxymoron, as humans are too) still creates lots of bottlenecks that require people & slow many kinds of take-off.
RL is not all you need, nor attention nor Bayesianism nor free energy minimisation, nor an age of first person experience. Such statements are propaganda.
You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive benchmarks and data, tons of research and engineering on generative models, data mixtures, ablations, RL/self-training, etc. And we will probably need lots of people working hard to figure out safety, causal world models, awareness, models that create abstractions comparable to infinity and zero and use these to predict the existence of things like black holes and suggest experiments to verify such hypotheses, or come up with novel engineering designs to generate energy more efficiently, robotics, etc.
It takes thousands of people and many ideas. In the end some simple ideas might become obvious but such obviousness only happens in retrospect. Yes, there is a bitter lesson but if we had followed it, we’d still be doing linear regression with RL. Let’s not oversimplify, but rather honour the research and engineering of thousands of people.
Also, people keep rewriting history. When our language understanding startup (darkbluelabs) was acquired by Google about 10 years ago, we joined DeepMind, where the AGI documents were all about concepts, RL, episodic memories and made it clear that there was no room for language. To be honest, back then such a position wasn’t so crazy. Now it seems silly, but only because of the benefit of hindsight.
There are no 1 or 10 heroes in the history of AI. There are many 1000s of hard-working students, profs, engineers, operations and support people, product folks, managers, even hedge funds, among others. Let’s honour the whole community and not just CEOs or the philosophers of Bayes, RL, deep learning, etc. I look forward to learning from the next generation and seeing what they will achieve. To them: Don’t buy the existing narratives blindly, innovate. Remember that, just like mathematics, AI will advance one grave at a time.
🚀After an incredible response, the Antler AI Research Paper Club is back with its 3rd edition, an invite-only group of AI technologists, researchers & founders to network and analyze cutting-edge AI papers every month. Facilitated by @_rampi_.
Sign up: https://t.co/yLqQL0f9mV
https://t.co/0n290x7iWV
I currently don’t trust LLMs enough to find them useful for my work. Science needs to be accurate. OTOH we need reliable QA and summarization tools to help with the deluge of papers, so the quest goes on…
I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.
Today, we mostly use LLMs in zero-shot mode, prompting a model to generate final output token by token without revising its work. This is akin to asking someone to compose an essay from start to finish, typing straight through with no backspacing allowed, and expecting a high-quality result. Despite the difficulty, LLMs do amazingly well at this task!
With an agentic workflow, however, we can ask the LLM to iterate over a document many times. For example, it might take a sequence of steps such as:
- Plan an outline.
- Decide what, if any, web searches are needed to gather more information.
- Write a first draft.
- Read over the first draft to spot unjustified arguments or extraneous information.
- Revise the draft taking into account any weaknesses spotted.
- And so on.
This iterative process is critical for most human writers to write good text. With AI, such an iterative workflow yields much better results than writing in a single pass.
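The iterative workflow above can be sketched as a loop. This is a minimal illustration, not code from the post: it assumes a model can be treated as a plain function from prompt to text, omits the web-search step, and uses hypothetical names (`iterative_write`, `fake_llm`), where `fake_llm` is a deterministic stand-in so the loop runs without a real model.

```python
from typing import Callable

# Assumption: a model is any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def iterative_write(task: str, llm: LLM, max_rounds: int = 3) -> list[str]:
    """Plan, draft, then alternate critique and revision; return every draft."""
    outline = llm(f"Plan an outline for: {task}")
    draft = llm(f"Write a first draft following this outline:\n{outline}")
    drafts = [draft]
    for _ in range(max_rounds):
        # Reflection step: ask the model to find weaknesses in its own draft.
        critique = llm(f"List unjustified arguments or extraneous information in:\n{draft}")
        if critique.strip() == "NO ISSUES":
            break  # nothing left to fix
        # Revision step: feed the critique back in; the draft goes last in the prompt.
        draft = llm(f"Revise to address:\n{critique}\n\nDraft:\n{draft}")
        drafts.append(draft)
    return drafts


def fake_llm(prompt: str) -> str:
    """Hypothetical deterministic stand-in for a real model (single-digit versions only)."""
    if prompt.startswith("Plan"):
        return "intro / body / conclusion"
    if prompt.startswith("Write"):
        return "draft v1"
    if prompt.startswith("List"):
        # Pretend the critic is satisfied once the draft reaches v3.
        return "NO ISSUES" if prompt.endswith("v3") else "body claim is unsupported"
    # Revision prompts end with the current draft, e.g. "...draft v1" -> "draft v2".
    return "draft v" + str(int(prompt[-1]) + 1)


print(iterative_write("an essay on agentic workflows", fake_llm))
# ['draft v1', 'draft v2', 'draft v3']
```

Keeping every draft, rather than only the last one, makes it easy to inspect how each critique changed the text.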
Devin’s splashy demo recently received a lot of social media buzz. My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark. You can see our findings in the diagram below.
GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.
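Lining up the HumanEval figures quoted above shows what "dwarfed" means here. This is simple subtraction of the stated percentages; the 95.1% is the best agent result reported, hence "up to".

```latex
\begin{aligned}
\Delta_{\text{model}} &= 67.0\% - 48.1\% = 18.9 \text{ points} && (\text{GPT-3.5} \to \text{GPT-4, both zero-shot})\\
\Delta_{\text{agent}} &= 95.1\% - 48.1\% = 47.0 \text{ points} && (\text{GPT-3.5 zero-shot} \to \text{GPT-3.5 in an agent loop})
\end{aligned}
```

The agent-loop gain is roughly 2.5 times the model-upgrade gain.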
Open source agent tools and the academic literature on agents are proliferating, making this an exciting time but also a confusing one. To help put this work into perspective, I’d like to share a framework for categorizing design patterns for building agents. My team at AI Fund is successfully using these patterns in many applications, and I hope you find them useful.
- Reflection: The LLM examines its own work to come up with ways to improve it.
- Tool use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
- Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
- Multi-agent collaboration: Multiple AI agents work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
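To make the tool-use and planning patterns concrete, here is a minimal sketch. It assumes the model has been prompted to reply with a JSON object naming a tool and its arguments; that JSON convention, the `TOOLS` registry, and the two toy tools are illustrative assumptions, not any particular framework's API.

```python
import json
from typing import Any, Callable


# Hypothetical toy tools; real ones might wrap web search or a code sandbox.
def add(a: float, b: float) -> float:
    return a + b


def word_count(text: str) -> int:
    return len(text.split())


TOOLS: dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}


def run_tool_call(model_output: str) -> Any:
    """Parse a JSON tool request emitted by the model and run the named tool."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request.get("args", {}))


def run_plan(steps: list[str]) -> list[Any]:
    """Planning pattern in miniature: execute a model-written plan step by step."""
    return [run_tool_call(step) for step in steps]


plan = [
    '{"tool": "add", "args": {"a": 2, "b": 3}}',
    '{"tool": "word_count", "args": {"text": "agents can use tools"}}',
]
print(run_plan(plan))  # [5, 4]
```

Rejecting unknown tool names keeps the model from invoking arbitrary functions; in a real agent, each result would be fed back into the next prompt.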
I’ll elaborate on these design patterns and offer suggested readings for each next week.
[Original text: https://t.co/y4McIAjD2m]
While GenAI is fun, I think its economic value is grossly overestimated, because it’s unreliable, risky and expensive to make and serve. It’s fine for creative tasks, but not (yet) for autonomous agents.
# on shortification of "learning"
There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved: the people watching enjoy thinking they are learning (but actually they are just having fun). The people creating this content also enjoy it because fun has a much larger audience, fame and revenue. But as far as learning goes, this is a trap. This content is an epsilon away from watching the Bachelorette. It's like snacking on those "Garden Veggie Straws", which feel like you're eating healthy vegetables until you look at the ingredients.
Learning is not supposed to be fun. It doesn't have to be actively not fun either, but the primary feeling should be that of effort. It should look a lot less like that "10 minute full body" workout from your local digital media creator and a lot more like a serious session at the gym. You want the mental equivalent of sweating. It's not that the quickie doesn't do anything, it's just that it is wildly suboptimal if you actually care to learn.
I find it helpful to explicitly declare your intent up front as a sharp, binary variable in your mind. If you are consuming content: are you trying to be entertained or are you trying to learn? And if you are creating content: are you trying to entertain or are you trying to teach? You'll go down a different path in each case. Attempts to seek the stuff in between actually clamp to zero.
So, for those who actually want to learn: unless you are trying to learn something narrow and specific, close those tabs with quick blog posts. Close those tabs of "Learn XYZ in 10 minutes". Consider the opportunity cost of snacking and seek the meal: the textbooks, docs, papers, manuals, longform. Allocate a 4-hour window. Don't just read: take notes, re-read, re-phrase, process, manipulate, learn.
And for those actually trying to educate, please consider writing/recording longform, designed for someone to get "sweaty", especially in today's era of quantity over quality. Give someone a real workout. This is what I aspire to in my own educational work too. My audience will decrease. The ones that remain might not even like it. But at least we'll learn something.
Chandrayaan-3 Mission:
'India🇮🇳,
I reached my destination
and you too!'
: Chandrayaan-3
Chandrayaan-3 has successfully
soft-landed on the moon 🌖!
Congratulations, India🇮🇳!
#Chandrayaan_3 #Ch3
POC: Dynamically generating instructional videos: using an LLM on high-level instructions -> custom problem solving to solve the problem and create a visual representation -> generating Manim code to produce the video.
@manim_community @3blue1brown #llm #gpt #generativevideo
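A sketch of only the last stage of that pipeline, turning a visualisation plan into Manim code. It assumes the solver stage outputs a list of captioned steps; `VisualStep` and `to_manim_source` are hypothetical names, and where the POC has an LLM write the Manim code, a fixed template is swapped in here so the output is deterministic. The emitted code uses Manim Community's `Scene`, `Text`, `Write` and `FadeOut`.

```python
from dataclasses import dataclass


@dataclass
class VisualStep:
    """One step of a hypothetical visualisation plan from the solver stage."""
    caption: str
    seconds: float = 1.0


def to_manim_source(steps: list[VisualStep], scene_name: str = "Lesson") -> str:
    """Render a visualisation plan as Manim Community Edition source code."""
    lines = [
        "from manim import *",
        "",
        "",
        f"class {scene_name}(Scene):",
        "    def construct(self):",
    ]
    for i, step in enumerate(steps):
        var = f"t{i}"
        # Show each caption, hold it on screen, then clear it for the next step.
        lines += [
            f"        {var} = Text({step.caption!r})",
            f"        self.play(Write({var}))",
            f"        self.wait({step.seconds})",
            f"        self.play(FadeOut({var}))",
        ]
    if not steps:
        lines.append("        pass")  # keep the generated method syntactically valid
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    plan = [VisualStep("Area of a circle"), VisualStep("A = pi r^2", seconds=2.0)]
    print(to_manim_source(plan))
```

Saved as `lesson.py`, the output could then be rendered with `manim -ql lesson.py Lesson`. Generating plain source text keeps the LLM or template stage decoupled from rendering.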
🚀 Exciting News! 🎉 I'm thrilled to announce the launch of our innovative learning platform, The School of AI! 🌟
We have ingested ~200 landmark research papers to provide you with an exceptional starting point.
🌐 Join us today at https://t.co/hwjaslazUK
#AIinEducation
Nice to see that our company got listed as one of the top disruptive AI startups by Matrix Partners India.
We are just getting started.
#startups #ai #india https://t.co/GmFTO0FBG4