In London, GB, 2 pizzas cost just ₿0.000258 today.
In 2010, the same order cost 10,000 BTC, now worth $0.78B.
Don't be Laszlo.
Calculate yours: https://t.co/G1n4gHriMJ #Tj3OLVMGKzAw #BitcoinPizzaDay #Coinpedia
@RohanArun @OpenAI @Alibaba_Qwen @deepseek_ai This is what open model progress actually looks like: not leaderboard vibes, but real tasks and real results. Qwen and DeepSeek beating proprietary models on both time and cost? Open models are ready.
Striking new benchmarks for long-running computer use, beating @OpenAI with their own models!
❌ Codex Computer Use: 21 minutes and fails
Computer-use-kit + our native Mac app:
✅ @Alibaba_Qwen Qwen 3.6 Plus: 5m 27s, success, $0.325/$1.95
✅ @deepseek_ai V4: 3m 34s, success, $1.74/$3.48
✅ GPT 5.4: 2m 41s, success, $2.50/$15
✅ GPT 5.5 Pro: 4m 34s, success, $30/$180
Task: clip the latest video from our YouTube channel and post it to TikTok.
We just published a new realtime upgrade that optimizes our computer-use-kit runtimes (API launching soon).
We finally solved reliable computer use for long-running tasks, and we use benchmarks to rigorously test and report which use cases work best on which models.
The cool thing is that our benchmarks clearly show an upper limit to the tasks, so you can use open models to run them!
@rohanarun @OpenAI Real benchmark > demo reel. 16 min with subtitles, a hook, and cross-platform support vs. 21 min of retries and a failure. The gap speaks for itself.
Side-by-side benchmarks beating @OpenAI Codex computer use with their own models!
Round 1: Clip a YouTube video from our channel and upload it to TikTok
✅ https://t.co/AAsaZBioXq + GPT 5.4 + our computer-use-kit: successfully uploads a clip with subtitles and a hook after 16 minutes (and works on iPhone/Android)
❌ Codex + GPT 5.4: gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes.
Codex does try iPhone mirroring and CapCut, which is very cool (kudos to the team), but it ultimately fails after burning credits.
@sama This is not easy to do, but we're happy to help you integrate our computer-use-kit.
I co-founded the first startup approved by OpenAI to sell GPT-3 for automation (Cheatlayer) in August 2021, months before https://t.co/aW1Cd9vOWN, so I've been working on this for a long time.
We automate Mac/Windows/Linux/Chrome/Android/iPhone plus @daytonaio sandboxes and @browserbase cloud browsers out of the box.
We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.
this is the part where people start paying attention to open source again lol. Hermes outperforming GPT 5.4 on agentic tasks isn't a fluke; instruction-following at scale matters more than raw benchmarks. and automated evals for computer-use agents are long overdue honestly
Introducing automated benchmarks for long-running computer-use agents, automatically generating the most comprehensive computer-use benchmark on the planet.
Using our computer-use kit, @NousResearch Hermes 405B instruct free is performing better than @OpenAI GPT 5.4.
Did you know you can now fly drones with just your voice?
Well, with SuperPowers AI, this drone was controlled by voice alone.
https://t.co/qrSD6ZSY6T
Watch how it was done in this video.
AI is not backing off!!!
I used #seedream4.5 to generate this ad for Coca-Cola, and I must say, it truly looks as if it was shot in a studio.
I highly recommend trying it yourself on @SocialSight, where you can also get 200 free points daily.
#SocialSightAI