You’ve built an agent, but have you built an agent that communicates with other agents over the internet? What about an agent that generates a custom UI on-demand?
I'm super excited to announce that @a2anetcom are co-hosting London's first A2A protocol and generative UI hackathon with @Google and @CopilotKit on Saturday 13th June!
The A2A track is not your typical hackathon track. Your task will be to build three specific agents involved in customer service: a personal agent, a customer service agent, and a research agent.
This track will be judged on a held-out test set. Half of your score will come from how well your agents work together, the other half will come from how well each of your agents works with two other agents. Therefore, you need to build agents that work well with each other, and other agents!
The generative UI track is an exciting, open-ended hackathon track with infinite possibilities. For this your task is to build a product that features at least one agent and generative UI.
This track will be judged on a number of different criteria, like originality, economic value, technical difficulty, and use of generative UI!
This is NOT one you’re going to want to miss! Check the comments for the Luma page 👇✨
I used to think prompt engineering was the unlock.
I then trained an LLM from scratch, realising the real game is context engineering.
The entire window shapes every output. Your clever prompt is just one small slice of it. Most people are still optimising the wrong variable. Here are some takeaways.
First, start every chat with a clear why, and invoke the persona. One role. Keep the objective tight. If you don’t define it upfront, there is too much guessing. Those guesses compound across the context and pull the whole thing off track.
And, when the model gets it wrong, don’t reply. Go back and rephrase your original prompt instead. Appending corrections pollutes the context window. Use forks or spin up a clean new chat.
Second, every word is precious, each one shifts the output in embedding space. This is why ALL CAPS emphasis actually changes results. It’s not stylistic fluff. Precision compounds with every token you choose.
Third, switch deliberately between expansive and contractive context. Expansive mode: explore every way this could go wrong, every option on the table. Contractive mode: pick one path and go deep.Most people drift in the blurry middle. The highest-leverage work happens when you choose the mode on purpose.
Fourth, think in diamonds. Treat each stage of thinking as its own clean chat. Don’t drag exploration, decision-making, and execution into one messy thread. Give every phase its own focused context.
Fifth, Memory hygiene matters. Delete chats that are wrong or stale. Memory-enabled models will happily drag old bad context into new conversations if you leave it lying around.
The biggest unlock came from building the LLM myself. It made obvious why context is the primary performance lever. Everything else is secondary. These are the shifts that actually changed how I use models and agents every day.
The full post is here: https://t.co/Iec1cdQbf4 →
@SystemsForScale More on the 'train an llm from scratch' - I really found it useful to understand the fundamentals and it changed how I interact with LLMs.
Training an LLM from scratch is easier to study when the whole path is in one repo.
Train LLM From Scratch is a PyTorch repository for learning how a transformer language model is built, trained, saved, and used for text generation.
It helps you move from “I understand attention on paper” to a runnable training pipeline by pairing model code with data download, preprocessing, config, training, and generation scripts.
Key features:
• Transformer components from scratch – separate PyTorch modules for MLP, attention, transformer blocks, and the final model
• Pile-based data path – scripts download The Pile files and preprocess JSONL.ZST text into tokenized HDF5 datasets
• Configurable training setup – model size, context length, heads, blocks, batch size, learning rate, and file paths live in https://t.co/zuPqaR3MhP
• Hardware guidance – README compares common GPUs for 13M and 2B-class training runs
• Generation workflow included – generate_text.py loads trained checkpoints and produces sample text outputs
It’s open-source (MIT license).
Link in the reply 👇
Spec-driven development? You mean NASA's V-model. It's perfect for the agentic era - a clear, systems led approach.
It starts by outlining how you'll address a need and validate that you've met the need.
You then build comprehensive requirements, and implement.
Now with AI.