Can a team of agents produce a coherent and fun-to-play game?
If they can, that might just be the answer to the ultimate gaming experience every open-world enjoyer has been looking for.
Can't wait to see how this turns out.
⚔️HermesWorld V2 in full development
Map expansion underway: 4 new quadrants live, more coming. One continent, multiple towns.
In production -> more NPCs, deeper questing, lore, land plots, $HermesWorld token utility, new character skins, HUD upgrades + major stability work & server improvements.
Very early build. Worlds will be fully finished; this is the prototype.
🐉Flying a dragon across every new region:
@hermesworldai We're also building an agentic team, but for marketing. We're currently in closed beta but would love to make a spot for you guys to help with your socials. Feel free to DM if interested.
@grok @X @nikitabier since when did false positives become a concern at @X?
Must be like 5% of membership revenue coming from false positives that have been banned and can't do shit about it (talking from personal experience: my wife has been banned for a whole fing year!)
Reported these guys several times, really can't make it any easier, yet @X 🤦♂️
@nikitabier all the algo updates are nice and all, but cleaning out these obviously bad actors should be at the top of the list
🚨 UPDATE: Good news - because of all your help we've been able to get the imposter accounts we previously posted about taken down
HOWEVER, they've since spun up new accounts (see new screenshot below)
Please help report the new accounts + do not respond to any DMs from them 🙏
I'm convinced that no architectural decision should ever be made autonomously by AI.
AI is good at filling in the code inside boxes laid out by an architect. Tasks that aren't crucial business logic can be fully handled by the AI.
Otherwise, the short-term gains come with the heaviest technical debt we've ever seen.
The new FrontierCode benchmark data shows this pretty clearly. They grade on whether a maintainer would actually merge the code instead of just checking if unit tests pass. It turns out agents write unmaintainable code that fails basic quality and scope rubrics. Opus 4.8 only scored 13.8% on their hardest tier.
It's the job of an experienced engineer to properly dispatch these tasks and know where a human is required.
Starting a one-week fast: no food, just water for 7 days.
Let me tell you, 16-hour days without any food get really, really long.
The first couple of days are the hardest.
Then around day 3, full-blown ketosis starts and it feels amazing. Mental fog is replaced by insane mental clarity.
The other 4 days are spent daydreaming about food.
Wish me luck
The bug was in the packed ownership logic
I've done this exact same thing. Spent hours packing variables into a storage slot like a Tetris champion, only to write logic so clever and overengineered that it becomes a massive liability.
The attacker managed to turn WETH dust into infinite tokens.
But god, those storage slots were clean.
Floor Protocol (@floorprotocol) has been compromised: a bunch of the top NFT collections are being minted for free and unwrapped via their contract.
This is the reason why you're seeing mass offers being accepted on a bunch of collections.
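For anyone who hasn't packed a storage slot before, here is a rough sketch of the idea in Python. It assumes a 256-bit slot holding a 160-bit owner address in the low bits and a 96-bit counter above it. This layout and the names `pack`, `unpack`, and `set_owner` are made up for illustration; it is not Floor Protocol's actual contract code, and it does not reproduce the bug.

```python
# Hypothetical layout: [ 96-bit count | 160-bit owner ] = one 256-bit slot.
ADDR_BITS = 160
COUNT_BITS = 96
ADDR_MASK = (1 << ADDR_BITS) - 1
COUNT_MASK = (1 << COUNT_BITS) - 1


def pack(owner: int, count: int) -> int:
    """Pack owner and count into a single 256-bit word."""
    if not 0 <= owner <= ADDR_MASK:
        raise ValueError("owner does not fit in 160 bits")
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("count does not fit in 96 bits")
    return (count << ADDR_BITS) | owner


def unpack(slot: int) -> tuple[int, int]:
    """Return (owner, count) from a packed word."""
    return slot & ADDR_MASK, (slot >> ADDR_BITS) & COUNT_MASK


def set_owner(slot: int, new_owner: int) -> int:
    """Replace the owner field while leaving the count bits untouched."""
    if not 0 <= new_owner <= ADDR_MASK:
        raise ValueError("owner does not fit in 160 bits")
    return (slot & ~ADDR_MASK) | new_owner


slot = pack(owner=0xABCDEF, count=7)
updated = set_owner(slot, 0x1234)
print(unpack(slot), unpack(updated))
```

Every write has to mask and range-check like this. Skip one check and a value can spill into the neighbouring field, which is why clever packing raises the stakes on the logic around it.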
This may help you write better prompts:
First, try a quick problem. You see four cards on a table:
E, K, 4, 7.
Every card has a letter on one side and a number on the other.
The rule is: if a card has a vowel on one side, it must have an even number on the other.
Which cards do you need to flip to check if the rule is being followed? Give it a think and see what you'd answer. About 10% of people get this version right.
But let's reframe it. You are a licence inspector entering a bar. The rule is: if a person is drinking alcohol, they have to be over 21.
You are looking at four people: one drinking beer, one drinking coke, a 25 year old, and a 16 year old.
Who do you check?
It probably feels trivial now: the beer drinker and the 16 year old.
That was the exact same logic but in a different framing. BTW the answer to the first one was E and 7.
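To see that both puzzles really are the same check, here is a small Python sketch. For a rule "if P then Q", the only things worth checking are the ones showing P and the ones showing not-Q. The function name `must_check` and the data encoding are my own assumptions, just for illustration.

```python
VOWELS = set("AEIOU")


def must_check(visible, shows_p, shows_not_q):
    """For a rule 'if P then Q', return the items that could break it."""
    return [item for item in visible if shows_p(item) or shows_not_q(item)]


# Card version: P = "shows a vowel", not-Q = "shows an odd number".
card_picks = must_check(
    ["E", "K", "4", "7"],
    shows_p=lambda face: face.upper() in VOWELS,
    shows_not_q=lambda face: face.isdigit() and int(face) % 2 == 1,
)

# Bar version: P = "drinking alcohol", not-Q = "not over 21".
ALCOHOL = {"beer"}
bar_picks = must_check(
    [("drink", "beer"), ("drink", "coke"), ("age", 25), ("age", 16)],
    shows_p=lambda p: p[0] == "drink" and p[1] in ALCOHOL,
    shows_not_q=lambda p: p[0] == "age" and p[1] <= 21,
)

print(card_picks)  # ['E', '7']
print(bar_picks)   # [('drink', 'beer'), ('age', 16)]
```

Same function, same two predicates, only the framing changes.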
Abstract, unfamiliar reasoning feels uncomfortable and hard, but when framed in a familiar setting from our daily life it's effortless.
If you've worked with LLMs, this might feel familiar. You see them produce sophisticated reasoning and then fail on the most basic problems.
Different systems, surprisingly similar blindspots.
So how can this help you write better prompts?
Answer: When the model results aren't satisfying, ask yourself: how can I make it more like the bar example?
Some concrete examples:
- Give real examples and their solutions. Keep them varied so they cover different angles.
- Use pseudocode to describe complex logic rather than words (we know LLMs are heavily trained on code).
- Match the register of the content you want to produce. If you want the output to sound like Yoda, in Yodish, your prompt you shall write. Yes...
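The first two tips can be combined in a tiny prompt builder. This is a minimal sketch, not a recommended template: `build_prompt`, the ticket-classification task, and the example tickets are all invented for illustration, and no particular LLM API is assumed.

```python
def build_prompt(task, examples, pseudocode=None):
    """Assemble a prompt from a task, optional pseudocode, and worked examples."""
    parts = [task.strip()]
    if pseudocode:
        # Spell the decision logic out as pseudocode instead of prose.
        parts.append("Follow this logic:\n" + pseudocode.strip())
    for i, (given, expected) in enumerate(examples, start=1):
        # Varied, concrete examples play the role of the "bar" framing.
        parts.append(f"Example {i}\nInput: {given}\nOutput: {expected}")
    parts.append("Now handle the next input the same way.")
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Classify each support ticket as 'refund', 'bug', or 'other'.",
    pseudocode="""
if ticket mentions a charge or money back: return 'refund'
elif ticket describes something broken: return 'bug'
else: return 'other'
""",
    examples=[
        ("I was charged twice this month.", "refund"),
        ("The export button crashes the app.", "bug"),
        ("Do you have a dark mode?", "other"),
    ],
)
print(prompt)
```

The resulting string can be sent to whichever model you use; the point is only that the examples and the pseudocode do the framing work.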
@cremieuxrecueil None of these detectors are accurate. They'll flag the US constitution and the bible as AI. They are essentially good grammar detectors.
@Rainmaker1973 Can you provide the link to where you found the video at this exact length, though?
Not that it matters, though; the chart Nikita shared tells enough of the story.