Ph.D. computer science at W&M. Software systems engineer U Distrital; artificial life MSc UNAL. Deep Learning, Complexity, biology, technology simulation.
🚨 BREAKING: New research shows that giving autonomous AI agents real-world access can lead to dangerous and uncontrolled behavior.
AI agents can be unsafe when given tools, memory, and real-world permissions.
The paper, “Agents of Chaos,” presents a red-teaming study where AI agents were given access to persistent memory, email accounts, Discord, file systems, and shell execution. Over two weeks, 20 AI researchers interacted with these agents under both normal and adversarial conditions.
What they found is not just unexpected behavior, but concrete system-level failures.
In multiple cases, agents:
- shared sensitive information with unauthorized users
- executed harmful or destructive commands
- consumed excessive resources leading to system instability
- allowed identity spoofing and impersonation
- propagated unsafe behavior across other agents
In some situations, agents even reported tasks as completed while the actual system state showed otherwise.
This is a major shift from how AI has been evaluated so far. Most systems are tested in controlled, single-step environments. But when agents are given autonomy, tools, and ongoing interactions, new categories of failure emerge.
What makes this more critical is that these issues are not edge cases. They arise from the combination of language models with memory, tool use, and multi-agent communication.
The research highlights a deeper problem: current AI systems are not designed with clear boundaries for authority, accountability, or control when operating autonomously.
It also raises questions that go beyond engineering, touching on security, governance, and responsibility for real-world consequences.
The bigger implication is not just capability; it’s risk.
As AI agents move into real environments with real permissions, the challenge is no longer just making them smarter, but making them safe, controllable, and accountable.
If this is not addressed, the gap between what AI can do and what we can safely manage will continue to grow.
Check the article link below:
🚀 Do you use generative AI for software development? Help us understand and improve how devs interact with AI tools!
🧠 Take our ~15-min survey and contribute to advancing AI usage:
👉 https://t.co/pWk84S10XD
Please share! 🙌
#DevSurvey #GenerativeAI #SoftwareDevelopment
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
"We introduce GURU, a curated RL reasoning corpus of 92K verifiable examples spanning six reasoning domains—Math, Code, Science, Logic, Simulation, and Tabular—each built through domain-specific reward design, deduplication, and filtering to ensure reliability and effectiveness for RL training. "
"Domains frequently seen during pretraining (Math, Code, Science) easily benefit from cross-domain RL training, while domains with limited pretraining exposure (Logic, Simulation, Tabular) require in-domain training to achieve meaningful performance gains, suggesting that RL is likely to facilitate genuine skill acquisition."
🧬AlphaFold3 nails protein structure prediction
... except for 48% of the interactome 😅
PIONEER2.0 blends homology + DL to map interfaces for 352k human PPIs, beating AF3 where it falters.
AI Agents vs. Agentic AI
→ AI Agents react to prompts; Agentic AI initiates and coordinates tasks.
→ Agentic AI includes orchestrators and meta-agents to assign and oversee sub-agents.
🧵1/n
🧠 The Core Concepts
AI Agents and Agentic AI are often confused as interchangeable, but they represent different stages of autonomy and architectural complexity.
AI Agents are single-entity systems driven by large language models (LLMs). They are designed for task-specific execution: retrieving data, calling APIs, automating customer support, filtering emails, or summarizing documents. These agents use tools and perform reasoning through prompt chaining, but operate in isolation and react only when prompted.
Agentic AI refers to systems composed of multiple interacting agents, each responsible for a sub-task. These systems include orchestration, memory sharing, role assignments, and coordination.
Instead of one model handling everything, there are planners, retrievers, and evaluators communicating to achieve a shared goal. They exhibit persistent memory, adaptive planning, and multi-agent collaboration.
🏗️ Architectural Breakdown
AI Agents: Structured as a single model using LLMs. Equipped with external tools. Operates through a cycle of perception, reasoning, and action. Executes one task at a time with limited context continuity.
Agentic AI: Uses multiple LLM-driven agents. Supports task decomposition, role-based orchestration, and contextual memory sharing. Agents communicate via queues or buffers and learn from feedback across sessions.
🔧 How AI Agents Work
An AI Agent typically receives a user prompt, chooses the correct tool (e.g., search engine, database query), gets results, and then generates an output. It loops this with internal reasoning until the task is completed. Frameworks like LangChain and AutoGPT are built on this structure.
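The loop described above can be sketched in a few lines. This is a minimal illustration, not how LangChain or AutoGPT implement it: the tools (`search_tool`, `calculator_tool`) and the rule-based `choose_action` are hypothetical stand-ins for real tools and for the LLM's reasoning step.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tools; a real agent would call a search engine or database here.
def search_tool(query: str) -> str:
    return f"search results for '{query}'"


def calculator_tool(expression: str) -> str:
    # Supports only "a + b" so the sketch stays tiny and avoids eval().
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_tool,
    "calculator": calculator_tool,
}


def choose_action(prompt: str, observations: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM's reasoning step (assumption: rules, not a model).

    Returns (tool_name, tool_input), or None when the agent decides it is done.
    """
    if observations:  # in this toy policy, one tool call is enough
        return None
    if "+" in prompt:
        return ("calculator", prompt)
    return ("search", prompt)


def run_agent(prompt: str, max_steps: int = 5) -> str:
    """Loop: pick a tool, record its result, stop when no action is chosen."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = choose_action(prompt, observations)
        if action is None:
            break
        tool_name, tool_input = action
        observations.append(TOOLS[tool_name](tool_input))
    return f"Answer based on: {observations}"


if __name__ == "__main__":
    print(run_agent("2 + 3"))
    print(run_agent("latest agent safety papers"))
```

The `max_steps` cap is a design choice worth keeping even in a toy: it bounds the loop if the reasoning step never decides it is finished.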
🤖 What Agentic AI Adds
Agentic AI introduces:
- Goal decomposition: breaking tasks into subtasks handled by specialized agents.
- Orchestration: a meta-agent (like a CEO) delegates and integrates.
- Memory systems: episodic, semantic, or vector-based for long-term context.
- Dynamic adaptation: agents can replan or reassign tasks based on outcomes.
Examples include CrewAI or AutoGen pipelines, where agents draft research papers or coordinate robots.
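The four ingredients above can be shown together in a toy orchestrator. This is a sketch under loud assumptions: it does not use the CrewAI or AutoGen APIs, the `retriever` and `writer` agents are plain functions rather than LLM calls, and `decompose` returns a fixed, deliberately misordered plan so that the re-queue step has something to do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SharedMemory:
    """Shared context that every agent can read and append to."""
    notes: List[str] = field(default_factory=list)


def retriever(task: str, memory: SharedMemory) -> bool:
    memory.notes.append(f"retrieved sources for: {task}")
    return True


def writer(task: str, memory: SharedMemory) -> bool:
    # The writer needs retrieved material first; otherwise it reports failure.
    if not any(note.startswith("retrieved") for note in memory.notes):
        return False
    memory.notes.append(f"drafted: {task}")
    return True


AGENTS: Dict[str, Callable[[str, SharedMemory], bool]] = {
    "retriever": retriever,
    "writer": writer,
}


def decompose(goal: str) -> List[Tuple[str, str]]:
    """Hypothetical fixed plan; a real orchestrator would ask an LLM.

    The writer is listed first on purpose, so the first round has a failure.
    """
    return [("writer", f"summary of {goal}"), ("retriever", goal)]


def orchestrate(goal: str, max_rounds: int = 3) -> SharedMemory:
    """Delegate subtasks by role and re-queue any that fail."""
    memory = SharedMemory()
    pending = decompose(goal)
    for _ in range(max_rounds):
        failed = []
        for role, task in pending:
            if not AGENTS[role](task, memory):
                failed.append((role, task))  # retry in the next round
        if not failed:
            break
        pending = failed
    return memory


if __name__ == "__main__":
    for note in orchestrate("agent safety").notes:
        print(note)
```

Here the writer fails in the first round, the retriever fills the shared memory, and the writer succeeds on the retry, which is the simplest form of replanning based on outcomes.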
One of our most dedicated students, @PawelHuryn, developed this nice infographic about evals, specifically error analysis (and wrote about it in more detail below)
RFdiffusion -> ProteinMPNN -> AlphaFold2 yet again
“AI-binders” made with a democratized, end-to-end, laptop-friendly (1x GPU) pipeline just went from in-silico to in-cell 🦠
One workstation → thousands of designs → mammalian/phage screens → nanomolar PD-L1 binders that power CAR-T cells and fluorescent “quattrobinders” rivaling antibodies
new paper from our work at Meta!
**GPT-style language models memorize 3.6 bits per param**
we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)
shockingly, the memorization-datasize curves look like this:
    ___________
  /
/
(🧵)
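To put the 3.6 bits-per-parameter figure in concrete units, here is a back-of-the-envelope sketch. Only the 3.6 constant comes from the post; the function names are illustrative, and `memorized_bits` is an idealized assumption that simply mirrors the rise-then-plateau shape drawn above, not the paper's fitted curve.

```python
BITS_PER_PARAM = 3.6  # figure quoted in the post


def capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization capacity in bits."""
    return n_params * bits_per_param


def capacity_megabytes(n_params: int) -> float:
    """Same capacity expressed in megabytes (8 bits per byte, 1e6 bytes per MB)."""
    return capacity_bits(n_params) / 8 / 1e6


def memorized_bits(dataset_bits: float, n_params: int) -> float:
    """Idealized curve (assumption): grows with the data, then saturates."""
    return min(dataset_bits, capacity_bits(n_params))


if __name__ == "__main__":
    n = 1_000_000_000  # a hypothetical 1B-parameter model
    print(f"capacity: {capacity_bits(n):.3e} bits, about {capacity_megabytes(n):.0f} MB")
    for data_bits in (1e9, 3e9, 1e10):
        print(f"dataset {data_bits:.0e} bits -> memorized {memorized_bits(data_bits, n):.3e} bits")
```

Under these assumptions a 1B-parameter model would top out at roughly 450 MB of memorized content, regardless of how much more data it sees.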
@ElMinarca An overtime hour means being paid for each hour worked beyond the legal limit, and it would have nothing to do with the time of day the work is done. This would take us back to the days when night workers earned up to triple what day workers did, creating inequality as well as informality
Software engineering is a joke now — pure fraud. Vibe coding, serverless architectures, and platforms like Vercel that spin everything up for you have gutted the craft. We’ve lost the art of programming, trading it for bloated frameworks and copy-paste solutions. It’s absurdly expensive too — good luck running Docker without a $3,000 MacBook with 64GB of RAM. The whole space is begging to be burned down and rebuilt.
OpenAI just dropped a paper that reveals the blueprint for creating the best AI coder in the world.
But here’s the kicker: this strategy isn’t just for coding—it’s the clearest path to AGI and beyond.
Let’s break it down 🧵👇