Antwane

@jun_song we should stop building in silos. A community of devs working together on one shared project would save everyone time, energy and effort compared to everyone duplicating work on their own. Would love to hear your thoughts. I can also bring funding to support the initiative.


Antwane

@AntoinePinelli

2 months ago

@jun_song https://t.co/hpJu8ayJog

Antwane

@AntoinePinelli

2 months ago

I've been working on something for a few months now and I'd rather talk to you directly before it becomes public anywhere.

I've been building a heavily optimized Rust fork of Ollama. The goal has been to push inference performance significantly beyond the current baseline, particularly on Apple Silicon but also on commodity hardware. I've personally invested over 15,000€ in LLM API tokens alone (for prompt engineering, architectural planning, code review, benchmark analysis, and iterative refinement). That's just the token spend, not counting the time.

The technical focus has been on:
- MLX backend integration for native M-series acceleration, leveraging the Apple Neural Engine and unified memory architecture
- Metal Performance Shaders for custom compute kernels
- candle (Hugging Face's Rust-native ML framework) as an alternative inference path, with ort + CoreML execution provider fallback
- Custom KV cache reuse strategies across concurrent requests for massive latency reduction on multi-turn conversations
- SSD-backed context offloading for 128k+ context windows without OOM
- Continuous batching inspired by vLLM for multi-user scenarios
- Tokio-based async runtime with scheduler tuning for low first-token latency
- Zero-copy memory mapping for GGUF model loading

The fork is designed to be modifiable, tunable, and extensible. Not locked into a single runtime philosophy. Everything from the scheduler to the memory layer is open for iteration. I'll share the full codebase with you.

Now here's the bigger picture, because this fork is only one piece of what I'm building.

I'm also working on a second project called AURA, which is a decentralized self-improving LLM network. The underlying idea is "Bitcoin for AI". Instead of centralized training on hyperscaler GPU clusters, AURA uses federated learning via Flower to continuously improve a base model (currently Gemma 4 31B) across a distributed network of nodes. Contributions are tracked and rewarded through a custom Substrate-based chain with a native token. The goal is an LLM that gets progressively smarter over time without depending on any single company's training budget.

And I also maintain a fork of a Rust-based personal AI agent framework that handles multi-channel communication (WhatsApp, Telegram, Discord, iMessage), long-term memory, MCP tool use, and autonomous task execution.

Here's where it all comes together, and this is what I'd love to build with someone like you. The plan is to synchronize these three pieces into a unified local AI stack:
1. The optimized Ollama fork serves inference. Fast, efficient, minimal resource footprint on consumer hardware.
2. The agent framework runs the autonomous logic. It handles the user interactions, tool use, memory management, and long-running tasks.
3. AURA runs as the learning layer in the background.

And here's the key architectural move: the agent framework runs as AURA's data collection and synthesis agent during off-hours. While the user sleeps, the agent queries connected knowledge sources, crawls relevant domains, analyzes new content, extracts insights, and submits training contributions to the AURA federated learning network.

The result: a local LLM that is lightweight, powerful, and actually learns to grow on its own every single night without human intervention. Every node running this stack contributes to the collective intelligence. Every node also benefits from the continuously improved weights pulled down from the network. The system gets smarter while you sleep, on your own hardware, under your own control.

This is a long-term bet on local-first, user-owned AI infrastructure that doesn't depend on OpenAI, Anthropic, or any centralized provider. Rust is the foundation throughout because performance, memory safety, and cross-platform deployment matter.

What we can do:
1. Share the full code with you right away, no strings. You review what I've built, tell me honestly what you think, what to improve, what's missing.
2. If you see the potential, we co-found a Discord community around this project. You and I would both be administrators. We build the core group together, vet the early contributors, and shape the technical direction jointly.
3. We aim the community at a concrete goal: outperforming Opus 4.7 latency on M-series hardware within 12 months. That's a rallying cry strong enough to attract serious contributors.

On community building and reach, I want to be transparent. My X following is small because I only started using the platform seriously six months ago when AI conversations moved there. But I have other distribution channels:
- 250,000 followers on Instagram (built over years in a different business, but an engaged audience)
- Significant press coverage across multiple outlets and years
- An existing company (Soflution ltd) with real revenue and the ability to fund community initiatives

So when we're ready to go public with a Discord launch, I can amplify it in ways that reach beyond the typical dev Twitter bubble. That reaches an audience that doesn't normally engage with open source but might fund or champion the right project.

I'm reaching out to you specifically because of your work and because I want one real collaborator before this becomes anything public. I appreciate you reading this far.


Antwane

@AntoinePinelli

2 months ago

@jun_song Sure, I'll text you this afternoon
