I’m really curious about this. How much extra juice is there, performance-wise, in model-specific silicon? It can’t be that easy to design your own chips for your own models, but everyone seems to be doing it. What’s a good thing to read on all this?
YES!
This paper basically confirms what many of us already suspected:
If you want better LLM results without paying for longer outputs or fine-tuning, there’s a concrete, low-effort tip:
Duplicate your prompt!
Researchers found that repeating the exact same input can dramatically improve performance (up to a 76% gain on specific tasks).
LLMs process text left to right: each token can only look at the previous context, never ahead.
So when you write a long prompt with context first and the question at the end, the model can rely on that context to answer, but the context was processed before the model even knew the question.
This asymmetry is a basic structural property of how LLMs work.
Repeating the prompt helps counter this limitation by giving the model a second pass over the full context.
There are no new losses to compute and no fancy prompt engineering involved.
It’s just a simple structural hack that works across almost every major model they tested.
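A minimal sketch of what "duplicate your prompt" could look like in practice. The helper name `repeat_prompt`, the blank-line separator, and the sample context and question are assumptions for illustration; the exact format the paper used may differ.

```python
def repeat_prompt(prompt: str, copies: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `copies` times, joined by `separator`.

    Hypothetical helper: the name, the default separator, and the default
    number of copies are illustrative choices, not taken from the paper.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return separator.join([prompt.strip()] * copies)


# Context first, question last: the layout the post describes.
context = "The meeting was moved from Tuesday to Thursday."
question = "What day is the meeting?"
prompt = f"{context}\n{question}"

# In the doubled prompt, the second copy of the context comes after the
# first copy of the question, so the model reads it already knowing
# what is being asked.
doubled = repeat_prompt(prompt)
print(doubled)
```

The doubled string is sent to the model in place of the original prompt. Input tokens double, but the output length does not have to change.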
Paper in 🧵↓
Recently, Yiannis Antoniou, Lab49's head of AI, data and analytics, shared his thoughts on @OpenAI's two new variants, o3-mini and o3-mini-high.
Read more in the full article below.
https://t.co/X6nb0bhn5e
📢 Joba Network Points Program is LIVE! 📢
We are thrilled to announce the launch of our Joba Points Program designed to reward our identity network! 🌟
Who Will Be Rewarded? 🤔
🟣 Early adopters who join us
🟣 Active users who complete profiles and interact with the network
Join the user-owned network and start earning points today!
To learn more: https://t.co/38r73GvPEr
Sign up now and earn your first 100 points! https://t.co/X9PT13iUZn