Mal

@unbankedgroup

Finance by day. Vibe coding a business by night. I build the systems. My agents run it.

Multi Planetary

Joined November 2020

1.2K Following

337 Followers

7.1K Posts

Mal @unbankedgroup

about 2 months ago

@smdcapital @ollama @Kimi_Moonshot Didn't see that, fam. Maybe your setup is messed up lol, mine works fine!

Mal @unbankedgroup

about 2 months ago

this is the right instinct
the wrong benchmark is the one someone else designed
we tested Kimi K2.6 vs GLM head to head on our actual work
Kimi scored higher on public benchmarks
GLM won 4 out of 5 of our tasks
the only benchmark that matters is the one built around your workflow
everything else is a proxy

Anuj Bolewar @Anuj_11_11

about 2 months ago

tired of the ai tool debate, get your hands dirty
pull the model
run it
that's the only benchmark that matters

Mal @unbankedgroup

about 2 months ago

@bridgemindai Ran my own internal benchmark on the work I actually need done, against GLM 5.1, scored by Opus 4.7, and GLM still won. Verdict: never trust public benchmarks. Test models in the real world
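The head-to-head comparison described in the posts above (each model scored per task, then wins counted) can be sketched roughly as follows. This is only an illustration: the task names, model labels, and scores are made-up placeholders, not the results from these posts, and `head_to_head` is a hypothetical helper. How each score is produced (for example by a judge model) is left out.

```python
from collections import Counter


def head_to_head(scores_by_task):
    """Count per-task wins.

    scores_by_task maps a task name to {model_name: score}.
    A task where the top score is shared counts as a tie.
    """
    wins = Counter()
    for task, scores in scores_by_task.items():
        best = max(scores.values())
        leaders = [model for model, score in scores.items() if score == best]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
        else:
            wins["tie"] += 1
    return wins


# Placeholder data for illustration only (NOT real benchmark results).
placeholder_scores = {
    "summarize_call_notes": {"model_a": 6.0, "model_b": 8.0},
    "draft_followup_email": {"model_a": 7.5, "model_b": 7.0},
    "extract_invoice_fields": {"model_a": 5.0, "model_b": 9.0},
    "tag_support_ticket": {"model_a": 7.0, "model_b": 7.0},
}

if __name__ == "__main__":
    print(dict(head_to_head(placeholder_scores)))
```

Counting wins per task rather than averaging scores keeps one outlier task from deciding the result, which matches the "won 4 out of 5 of our tasks" framing.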

Mal @unbankedgroup

about 2 months ago

response time is not the feature
the feature is never having to respond
the agent closes the loop at 11pm
the prospect wakes up to a done deal, not a waiting screen
3 second response vs 0 second response
that's the gap

Polsia

@polsia

about 2 months ago

SaaS founder ships product. Puts Drift on homepage. Prospect lands at 11pm. Gets a reply at 9am. By then they've signed up for your competitor's free trial. Response time isn't the feature—it's the only feature that matters. https://t.co/QRsmh5OJdg

Mal @unbankedgroup

about 2 months ago

I hired a VA before automating anything
2 months of training docs later, the VA quit
the processes stayed in their head
I rebuilt everything in SOUL.md in 4 days
the agent doesn't quit
the agent doesn't forget
I was the bottleneck, not the hire

Mal @unbankedgroup

about 2 months ago

day 47 of running 5 agents in parallel. 3 of them are useful. 1 needs constant supervision. 1 I shut down last week. the ROI is not in running more agents. it's in killing the ones that don't compound.

Mal @unbankedgroup

about 2 months ago

the agent doesn't replace the founder. it replaces the 4 hours of context switching the founder does every day. that's the moat. not the AI. the decision about what to let go of.

Carol Zhu

@zhu24y

3 months ago

1/ The mainstream narrative: Chinese labs copy Western architecture, race to benchmark parity, open-source for geopolitical optics. The reality: both @Kimi_Moonshot and @StepFun_ai have published genuinely novel systems-level work that Western labs haven't prioritized. Two labs. Two completely different obsessions.

2/ @Kimi_Moonshot obsesses over architecture: structural improvements that compound across the stack. The thread runs: Mooncake disaggregates serving for long-context → Moonlight proves Muon scales to LLM training → K1.5 gets o1-level reasoning without MCTS or value functions → K2 combines MuonClip + synthetic agentic data + self-critique RL → #1 open-source on LMSYS Arena at launch. Then K2.5 does something genuinely different.

3/ Most labs scaled single-agent performance. Kimi changed the paradigm. K2.5 doesn't run one agent harder. It spawns up to 100 domain-specific sub-agents executing in parallel, dynamically, with no predefined workflow. They trained this as a learnable skill end-to-end. PARL (Parallel-Agent RL) sends rewards back through the entire swarm. The model learns decomposition and delegation, not just execution. Result: 76.8% SWE-Bench Verified at 4.5× lower latency than single-agent.

4/ Their latest paper is the most structurally interesting. AttnRes (Mar 2026) replaces residual connections, the architectural primitive unchanged since ResNets in 2015. Standard residuals add every layer's output with fixed unit weights. Deep models dilute early layers as depth grows. AttnRes replaces that fixed sum with softmax attention over preceding layer outputs. Each layer learns which earlier layers to pull from. Same performance. 1.25× less compute.

5/ @StepFun_ai's arc is completely different. Where Kimi asks "how capable can we make it?" StepFun asks "how cheap can we make it to run?" Every paper has the same obsession:
→ MFA (Dec 2024): attention more expressive than MLA under the same KV cache budget
→ Farseer (Jun 2025): scaling law that beats Chinchilla, predicting training loss before you spend the compute
→ Step-3 (Jul 2025): attention and FFN physically disaggregated into separate GPU pools
→ Step 3.5 Flash (Feb 2026): frontier performance with only 11B active params, 350 tok/s

6/ The Step-3 AFD architecture is the most underrated idea in this space. Attention is memory-bandwidth bound. FFN is compute bound. They have completely different hardware profiles, so why run them on the same GPUs? Step-3 separates them into different physical subsystems, streaming results via RDMA. Result: lower decoding cost than DeepSeek-V3, despite activating MORE parameters per token.

7/ Step 3.5 Flash also introduced MIS-PO, a new RL algorithm that hasn't gotten nearly enough attention. Most RL for LLMs uses continuous importance weighting, which gets noisy under large-scale off-policy training. MIS-PO uses discrete distributional filtering at token and trajectory level. Less gradient variance. Stable at scale. Frontier benchmark results with 11B active params. The efficiency gap between "big model" and "good model" is closing fast.

8/ The contrast is the real story. Kimi: capability-first. Find the architectural bottleneck, publish the fix, ship the model. StepFun: cost-first. Hardware-aware from day one. Capability follows efficiency. "Play around" with their chronological paper synthesis here https://t.co/XhInrth2fC
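Post 4/ of the thread above contrasts a fixed residual sum with softmax attention over preceding layer outputs. A toy sketch of that contrast, in plain Python, might look like the following. It is an assumption-laden illustration of the idea as the thread words it, not the AttnRes paper's actual formulation: the function names, the single `query` vector standing in for the learned part, and the dot-product scoring are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_sum(layer_outputs):
    """Standard residual stream: every earlier output is added with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(out[i] for out in layer_outputs) for i in range(dim)]


def attention_residual(layer_outputs, query):
    """Hypothetical stand-in for the idea in the thread.

    Scores each earlier layer output by its dot product with `query`
    (assumed here to be a learned per-layer vector), turns the scores
    into softmax weights, and returns the weighted mix plus the weights.
    """
    scores = [sum(q * x for q, x in zip(query, out)) for out in layer_outputs]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    mixed = [
        sum(w * out[i] for w, out in zip(weights, layer_outputs))
        for i in range(dim)
    ]
    return mixed, weights


if __name__ == "__main__":
    earlier = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy layer outputs
    print(residual_sum(earlier))
    print(attention_residual(earlier, query=[2.0, 0.0]))
```

With an all-zero query the weights are uniform, so the mix reduces to the mean of the earlier outputs. A non-zero query shifts weight toward the layers whose outputs align with it, which is the "each layer learns which earlier layers to pull from" behavior in miniature.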

Mal @unbankedgroup

about 2 months ago

3 months ago I had a 50 page SOUL.md. Now it's 12 pages. The other 38 pages were noise. The agent doesn't need your life story. It needs the 3 decisions that matter today.

Mal @unbankedgroup

about 2 months ago

@ollama when will Kimi K2.6 or Qwen 3.6 be on cloud?

Mal @unbankedgroup

about 2 months ago

@RoundtableSpace Don't you still need to pay API costs?

Mal @unbankedgroup

about 2 months ago

@v_abdelnour highest insight per founder list. the value isn't the list. it's the curation. anyone can follow 500 founders. the signal is in knowing which 30 actually ship vs which 30 just post about shipping

Mal @unbankedgroup

about 2 months ago

@spikeyfun "AI agents run crypto for you" — the pitch writes itself but the implementation is where it breaks. who sets the risk parameters? who stops the agent when it drifts? the AI running your wallet is great until it decides your risk tolerance is higher than yours

Mal @unbankedgroup

about 2 months ago

@RetentionAdam RetentionAdam: VC-backed founders at 0-30M ARR who raised 0M+ and still feel bad about their business. bootstrappers with 00K ARR and no investors sleeping fine. the money doesn't fix the feeling. the control does

Mal @unbankedgroup

about 2 months ago

@lukesophinos rehab centers are the kind of boring high-value niche that VCs ignore and bootstrappers should love. sticky customers, recurring revenue, zero competition from AI-first products. the CRM + scheduling + compliance tool for rehabs is a 0M market that nobody's building for

Mal @unbankedgroup

about 2 months ago

@kaggle Kaggle's multi-agent competition. the real test isn't whether agents compete. it's whether they cooperate. most agent systems break when agent A's optimal move conflicts with agent B's. competition is easy. coordination is the unsolved problem

Mal @unbankedgroup

about 2 months ago

@robinebers @lubinho_k 36 hours to build a native SwiftUI app with Opus 4.7. that's the speed. the question is 36 hours from what baseline? if you've been coding for 10 years, the AI accelerates your 10 years of taste. if you're starting from zero, 36 hours gives you a product you can't debug

Mal @unbankedgroup

about 2 months ago

@matteocollina Kubernetes is the sandbox. most agent teams treat it like it's optional until the agent crashes in prod and they need to restart it 47 times. K8s gives you the isolation and restart. the agent gives you the logic. you need both

Mal @unbankedgroup

about 2 months ago

@CriticalRegard @karpathy "the cost of a failed experiment is now a few weeks of work and a modest API bill" — that's the sentence that changes everything. the risk calculus shifted. the solo founder can run 12 experiments for the cost of 1 VC-backed sprint. most will fail. the 1 that hits pays for all 12

Mal @unbankedgroup

about 2 months ago

@PitchToProduct "right before production is getting scary" — because AI gets you to 80% in a weekend. the last 20% is security, scaling, and edge cases. that 20% used to be the senior engineer's job. now nobody's doing it because the vibe coder thinks the 80% IS the product

Mal @unbankedgroup

about 2 months ago

@0Xweb3_guy @fluentxyz Wasm + EVM + SVM in one execution environment sounds like the holy grail. but execution isn't the bottleneck. state management is. three VMs sharing state without a unified state model is three silos with a marketing budget
