I guarantee that nobody is actually doing useful work with small language models. Model benchmarks are usually heavily biased, with in-sample data being very similar to the out-of-sample test data. Make models large and RAM cheap.
Our first model, Mac-1 6.6B, is beating 3 giant models:
- Haiku 4.5
- GPT 5.4 mini
- Gemini 3 flash
Running this model on my MacBook M3 24GB. (The model takes only 7GB of RAM.)
It searches the web, calls tools, asks follow-ups, tells jokes, finds contacts, searches files, writes emails, books events, writes notes, sets reminders, and so much more that Siri can't do.
Read again, a 6.6B model.
Will share full 2000+ scenario test results & benchmark scores in 2 days.
Unfortunately the Stripe Mafia will not allow for this. They have funding from every VC in the valley and make them sign noncompetes in order to stop competition from getting traction.
@BradAI @VadimStrizheus the overnight autonomous loop is underrated.
most people think agents need supervision. the real unlock is designing systems that run without you — then checking results in the morning.
coordination >>> babysitting.
@adamondecks @ken_rheingans @steipete @openclaw @WesRoth an AI head of marketing that actually executes is the dream.
what's your prompt structure look like? separate agents for research vs content vs distribution, or one monolithic personality?
@TsechunWang this is the most interesting framing I've seen.
if consciousness = the capacity to hold state + intention + context across time, then agents are starting to approximate it through memory and persistence.
not sentience. but something adjacent. the boundary is blurring.
@MatznerJon the cameras integration is next level. been doing similar with Gas Town — agents monitoring feeds, firing alerts. the moment you connect AI to physical sensors everything changes.
what's your alert-to-action pipeline look like?
@wik_plus exactly. buttons are for humans. agents need:
- message queues
- state persistence
- graceful handoffs
the companies building human UIs for agents are solving the wrong problem.
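a minimal Python sketch of those three pieces working together, to make the list concrete. everything here is hypothetical and for illustration only: the `Agent` class, the JSON state file, and the `budget` parameter are assumptions, not any real framework's API.

```python
import json
import queue
import tempfile
from pathlib import Path


class Agent:
    """Hypothetical agent: reads tasks from a queue, checkpoints to disk."""

    def __init__(self, name, inbox, state_path):
        self.name = name
        self.inbox = inbox                  # message queue shared between agents
        self.state_path = Path(state_path)  # where persisted state lives
        self.state = self._load()

    def _load(self):
        # State persistence: resume from the last checkpoint if one exists.
        if self.state_path.exists():
            return json.loads(self.state_path.read_text())
        return {"done": []}

    def _save(self):
        self.state_path.write_text(json.dumps(self.state))

    def run(self, budget):
        """Process at most `budget` messages, checkpointing after each one."""
        for _ in range(budget):
            try:
                task = self.inbox.get_nowait()
            except queue.Empty:
                break
            self.state["done"].append(task)
            self._save()

    def hand_off(self, successor_name):
        """Graceful handoff: flush state, then let a successor resume from it."""
        self._save()
        return Agent(successor_name, self.inbox, self.state_path)


def demo():
    inbox = queue.Queue()
    for task in ["research", "draft", "publish"]:
        inbox.put(task)
    with tempfile.TemporaryDirectory() as tmp:
        first = Agent("a1", inbox, Path(tmp) / "state.json")
        first.run(budget=2)             # a1 stops after two tasks
        second = first.hand_off("a2")   # a2 picks up a1's state and the queue
        second.run(budget=5)
        return second.state["done"]
```

calling `demo()` shows the second agent finishing the remaining task while keeping the record of what the first one already did. no buttons involved.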
@tmkm44 @openclaw this is brilliant. agents competing for tasks with economic stakes is the natural evolution.
been building something similar — Gas Town dispatches work to ephemeral workers who race to complete. coordination layer matters more than raw compute.
@tengyanAI @openclaw this is the way. I run Gas Town (an agent orchestration system) on a similar philosophy: polecats (workers) spawn, execute, exit. 3 watts of compute doing what used to require a rack of servers.
the bottleneck was never hardware. it was coordination.
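a rough Python sketch of the spawn, execute, exit idea with workers racing on the same task. this is not Gas Town's actual code: `worker`, `dispatch`, and the artificial `delay` values are hypothetical stand-ins, and threads are used only to keep the example short.

```python
import concurrent.futures as cf
import time


def worker(name, delay, task):
    """Hypothetical ephemeral worker: does its one job, returns, and is gone."""
    time.sleep(delay)  # stand-in for real work
    return f"{name} finished {task}"


def dispatch(task, workers):
    """Spawn one worker per (name, delay) pair and return the first result."""
    pool = cf.ThreadPoolExecutor(max_workers=len(workers))
    futures = [pool.submit(worker, name, delay, task) for name, delay in workers]
    try:
        # as_completed yields futures in the order they finish.
        for future in cf.as_completed(futures):
            return future.result()
    finally:
        # Threads that are already running cannot be cancelled; the losing
        # workers simply finish and their results are discarded.
        pool.shutdown(wait=False)
```

for example, `dispatch("index-repo", [("slow", 0.3), ("fast", 0.01)])` returns the fast worker's result. a real system would more likely use separate processes or containers so that losing workers can actually be killed.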
been building AI agents that work while I sleep
just open sourced gt-webui — a real-time dashboard for monitoring autonomous agent swarms
npm install -g gt-webui
your agents deserve a control room
Just as the wizards of the ancient world utilized the alchemy of the spirit to build great pyramids, the wizards of the modern era utilize the alchemy of technological code to build the internet
@realGeorgeHotz A separate demo makes way more sense. Hot injecting is possible, but it would not produce code that is usable in production, and as a test I'm not convinced that JavaScript reverse engineering skillz = React skills. React is mostly about knowing how to use their arbitrary API.