I research AI, machine learning, and LLMs by building with them.
Agents, RAG systems, experiments that probably shouldn't work but sometimes do.
Follow along for papers worth reading, tricks that survive contact with real data, and the stuff hype cycles miss.
Anthropic is very transparent about the system prompts its models use in Claude. These system prompts can be found here: https://t.co/cY0sOgrlDu
Both, I would say. When I genuinely want to understand something, I’ll use it to learn that concept. However, if I’m feeling lazy about something, I’ll just take whatever it generates. Obviously I’ll still use critical thinking skills to make that decision.
@andonlabs I really love the experiments that you’re doing at your lab. They’re so creative. They really give a glimpse into a future where some of these agentic systems automate real-world processes.
I don’t think it was because of best-in-class models. They had a lead initially and leveraged public relations really well to make it go viral.
They lost out after the compute issues and limiting people’s usage. This resulted in negative sentiment.
I still use Claude Code. The models are more than capable, but there’s no in-between for pricing and usage.
@sheriyuo I think there’s a trade-off though. At some stage, it isn’t worth undertaking from a financial perspective, and people probably have competing priorities in their personal lives.
Introducing Daybreak: frontier AI for cyber defenders.
Daybreak brings together the most capable OpenAI models, Codex, and our security partners to accelerate cyber defense and continuously secure software.
A step toward a future where security teams can move at the speed defense demands.
I've noticed increasing reports that people no longer enjoy programming as much, citing both the effectiveness of vibe coding tools and a reduced sense of challenge.
I've been sharing a similar sentiment. I recently used Claude Code to vibe code a Google I/O countdown as part of the Code the Countdown challenge. It produced something noticeably better than what I would have built manually. It achieved this with just the planning mode and a fairly vague prompt.
Anyone sharing the same feelings?
Your goal with research is to produce new knowledge. This knowledge could be small or it could be a massive breakthrough.
Your second goal is to maximise reproducibility in your experiments. Good research is when you conduct an experiment and anyone reading your paper can fully replicate what you have done.
The methodology should report things exactly as you conducted the experiment. As an example, with LLM studies, you should be reporting the model you’ve used, the sampling parameters, when you accessed the model, the computational environment used to run the model, how you’ve conducted inference (libraries), what LLM features you’re using (e.g. structured output) and so on.
Many LLM application papers are terrible because you cannot replicate what they have done and they omit too much detail.
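One way to make the reporting checklist above harder to skip is to write it down as a small record saved next to each experiment's results. This is only a rough sketch: the `LLMRunRecord` class, its field names, and the placeholder values (model name, library name, version numbers) are all made up for illustration and aren't taken from any particular paper or library.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class LLMRunRecord:
    """Hypothetical record of the details a methodology section should report."""
    model: str                # exact model identifier, including version
    sampling: dict            # e.g. temperature, top_p, max_tokens, seed
    accessed_on: str          # ISO date the model was accessed
    inference_libraries: dict  # library name -> version used for inference
    features: list = field(default_factory=list)     # e.g. structured output
    environment: dict = field(default_factory=dict)  # OS / runtime details


def capture_environment() -> dict:
    """Collect basic facts about the machine and runtime (standard library only)."""
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
    }


def to_json(record: LLMRunRecord) -> str:
    """Serialise the record so it can be stored alongside the results."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; replace with what was actually used in the study.
record = LLMRunRecord(
    model="example-model-v1",
    sampling={"temperature": 0.0, "top_p": 1.0, "max_tokens": 512, "seed": 1234},
    accessed_on="2025-01-15",
    inference_libraries={"example-inference-lib": "0.0.0"},
    features=["structured_output"],
    environment=capture_environment(),
)

print(to_json(record))
```

Dumping this JSON with every run means the methodology section can be written from what was actually executed, not from memory.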
I agree with this post a lot.
Even for inference, experimenting with larger models requires high-end GPUs. In privacy-sensitive situations where models need to run locally, you’re often forced to use smaller models that don’t achieve state-of-the-art performance. That also means your results and benchmarks may not reflect what is currently possible with the latest frontier models.
I can see how annoying it’d be to not be at one of the major labs and to be super constrained. It has flow-on effects for your actual learning outcomes too.