I have been looking at ways to run trillion-scale models on consumer devices in a distributed manner. I was looking at low-rank iterative distillation as a way to let lower-memory/compute devices run/train large-scale models. 1/n
@TheAhmadOsman It isn't if you do decentralized pretraining and use DiLoCo-like approaches. But I think with a different focus (composability, hard routing, embarrassingly parallel inference and PEFT) and some R&D it should be feasible.
My attempts to fix my dopamine/brain by avoiding Twitter are on break to bitch about Anthropic and highlight that we've been building decentralized AI on bitsota! Mine with us https://t.co/kBAX693ABX or buy our token https://t.co/h4rKuxH2hX
@Teknium You can participate in decentralized autoresearch on https://t.co/uqmKcv9Hii using hermes
We're currently trying to find the best binary and ternary quantized model up to 27B in size
With plug and play skills! https://t.co/9UiFJjEMOX
We are doing similar work as well on @bitsota_ai! Your implementation actually inspired how we ended up doing ours. We were using AutoML-Zero-like evolutionary search until @karpathy's tweet highlighted that the days of non-LLM AutoML are over, and then we switched to LLMs. Happy to connect.
been working on moonfish, an obsidian writing companion. i see it as a small experiment in ambient/calm computing:
keystrokes ripple a pond; sentences become fish; moons keep the weeks writing.
I literally run 12 Hermes Agent instances every day in parallel to build Hermes Agent, and it's now one of the top 100 GitHub repositories of all time. Agents do bring value and do create substantive software and work.
@SimoneSyed @DarioAmodei Is TFR always expected to be stable or grow? Can we not solve for X? What is the best population count X for meeting civilizational goals for community Y over a time frame of N years, including quality of life metrics, GDP per capita, scientific breakthroughs, galactic colonies, etc.?