Daniel Isaac @danpacary - Twitter Profile

Pinned Tweet

2 months ago

I got a 1T (trillion) parameter model running on my MacBook Pro. Kimi-K2. 1.029T params. ~1 TB raw weights. 524 GB converted. ~1.7 tok/s. Yesterday it was 671B. Today it's 1T. Same laptop. Same M4 Max. No cloud. When I say we: I mean Claude and me.

danpacary's tweet photo. I got a 1T (trillion) parameter model running on my MacBook Pro.

Kimi-K2. 1.029T params.

~1 TB raw weights.
524 GB converted.

~1.7 tok/s.

Yesterday it was 671B. Today it's 1T.

Same laptop. Same M4 Max. No cloud.

When I say we: I mean Claude and me. https://t.co/2nJlI29dqg

73

1K

96

794

142K

Daniel Isaac

@danpacary

8 days ago

@lennysan SpaceX (family of companies)

0

224

danpacary retweeted

JC Builds

@JC_builds

16 days ago

Most “AI image apps” are just screenshots traveling to a server. I wanted: • weights on device • generation on device • images never leaving the phone So I spent the last week turning sd.cpp into an actual iOS developer experience. SwiftPM install. One simple generate() call. Completely open sourced.

7

51

7

58

11K

Daniel Isaac

@danpacary

15 days ago

This is an interesting perspective on the evolution of software I’m curious to understand what the integration of agents will look like as this is becoming a more prevalent feature that will be a non-negotiable in the near future However, the issue that I see, is this a layer of infrastructure as certain companies fall under regulations that require data sovereignty This means that certain sectors will need on premise compute. “on premise is the new cloud compute” With the rapid advancement of source models, future software (for specific types of companies) will be required to include an orchestration layer to leverage on premise infrastructure.

Chamath Palihapitiya

@chamath

15 days ago

Pharma companies are all about life saving drugs. But from the perspective of software, pharmaceutical companies manage and author a series of documents of ever-increasing complexity, accuracy and criticality. Viewed in this way, the operating system for the pharma industry can be reimagined as a content management platform that helps scientists and pharma execs manage pre clinical, clinical and post clinical development of drugs and their go to market. We are building this exact suite for a multi billion dollar pharma company. The result is more money spent on lifesaving drug R&D and a more streamlined interface with regulators because the documents required to move along their process are increasingly pristine and machine verifiable. This doesn’t just apply to pharma. We are doing similar things in manufacturing, finance, aerospace & defense and medical devices. 8090’s practice and our Software Factory platform excels particularly well in regulated environment where vibe coding won’t get the job done. If you want to see if we can help you, please be in touch. [email protected]

52

241

22

98

157K

0

1

0

3

295

Daniel Isaac

@danpacary

27 days ago

Iykyk

ClaudeDevs

@ClaudeDevs

28 days ago

/radio

332

15K

876

3K

1M

0

2

0

275

Daniel Isaac

@danpacary

28 days ago

Very excited for this

Nihal Pasham @npashi

29 days ago

Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡ We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust. No DSLs. No FFI. No source-to-source step. Single source. Short🧵👇

npashi's tweet photo. Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡

We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust.

No DSLs. No FFI. No source-to-source step. Single source.

Short🧵👇 https://t.co/YRERctlysd

49

2K

293

1K

186K

0

9

0

1

854

Daniel Isaac

@danpacary

about 1 month ago

I got bored/curious so started replicating quantum computing sims on the macbook nothing novel. just running existing tests against MLX on the M4 Max to see what the hardware actually does 21 tests in. some highlights:

danpacary's tweet photo. I got bored/curious so

started replicating quantum computing sims on the macbook

nothing novel. just running existing tests against MLX on the M4 Max to see what the hardware actually does

21 tests in. some highlights: https://t.co/ZId3YyyWt5

0

4

1

524

Daniel Isaac

@danpacary

about 2 months ago

@anemll @maderix 👀

0

128

Daniel Isaac

@danpacary

about 2 months ago

Systems are everything now. Your ability to understand and design great system architecture will determine your success as a “builder” {software engineer} in this next era of software To Jensen’s point on the all in podcast ~We now need people who define great systems, iterate quickly and understand how to leverage ai systems. Writing code is a thing of the past.

Nikunj Kothari

@nikunj

about 2 months ago

Every time I see a tweet saying “I can vibe code this in a weekend” - I think of the slack notification system.. It takes time, persistence and effort to get the details right. Sure, a lot of simple workflows will get vibe coded away. And maybe you can put this in Claude Code and get the code right in one shot. But quality, depth and great systems will still have value and take time. You can’t vibe code lessons. Now and forever.

nikunj's tweet photo. Every time I see a tweet saying “I can vibe code this in a weekend” - I think of the slack notification system..

It takes time, persistence and effort to get the details right.

Sure, a lot of simple workflows will get vibe coded away. And maybe you can put this in Claude Code and get the code right in one shot.

But quality, depth and great systems will still have value and take time. You can’t vibe code lessons.

Now and forever.

257

3K

237

2K

785K

0

3

0

3

367

Daniel Isaac

@danpacary

about 2 months ago

Here's the fix: type(module).__call__ = patched Python resolves obj(x) via the type, not the instance. The dunder I set was silently ignored. Post-fix A/B: base+adapter +0.0005 nats. Below noise floor. Now I can actually test if eggroll works on 30B.

danpacary's tweet photo. Here's the fix:

type(module).__call__ = patched

Python resolves obj(x) via the type, not the instance. The dunder I set was silently ignored.

Post-fix A/B: base+adapter +0.0005 nats. Below noise floor.

Now I can actually test if eggroll works on 30B. https://t.co/GEzxlV2KBw

0

6

0

1

327

Daniel Isaac

@danpacary

about 2 months ago

Update: I've been training a LoRA on Qwen3-Coder-30B-A3B for a week. It wasn't actually training. Every forward pass was ignoring the adapter. Every "best val_loss" I logged was batch noise. I caught it with one assertion.

danpacary's tweet photo. Update: I've been training a LoRA on Qwen3-Coder-30B-A3B for a week.

It wasn't actually training.

Every forward pass was ignoring the adapter.
Every "best val_loss" I logged was batch noise.

I caught it with one assertion. https://t.co/enh0LtTlts

2

22

1

7

2K

Daniel Isaac

@danpacary

about 2 months ago

I added one line to the training loop: assert max(losses) - min(losses) > 1e-6 After step 1: 0.0000. Every population member returned the exact same loss. Identical. The perturbations weren't reaching the forward pass.

danpacary's tweet photo. I added one line to the training loop:

assert max(losses) - min(losses) > 1e-6

After step 1: 0.0000.

Every population member returned the exact same loss. Identical.

The perturbations weren't reaching the forward pass. https://t.co/AhTanzGmpH

1

0

348

Daniel Isaac

@danpacary

about 2 months ago

@anemll Legend

0

280

Daniel Isaac

@danpacary

about 2 months ago

Interesting read

Youssef El Manssouri

@yoemsri

2 months ago

https://t.co/qArlDYVnFh

3

63

12

69

58K

0

12

0

15

3K

Daniel Isaac

@danpacary

about 2 months ago

Nice

Matt Mireles

@mattmireles

about 2 months ago

Introducing... Gemma 4 Multimodal Fine-Tuner for  Apple Silicon - LoRA fine-tunning toolkit for Gemma LLM - runs locally on macOS via PyTorch and Metal - streams data from Google Cloud to your machine - fine-tune on audio, image and text - easy-to-use CLI wizard If you want to fine-tune the new Gemma 4 on text, images, or audio without renting an H100 or copying a terabyte of data to your laptop, this is the only toolkit that does it all on Apple Silicon.

mattmireles's tweet photo. Introducing...
Gemma 4 Multimodal Fine-Tuner for  Apple Silicon

- LoRA fine-tunning toolkit for Gemma LLM
- runs locally on macOS via PyTorch and Metal
- streams data from Google Cloud to your machine
- fine-tune on audio, image and text
- easy-to-use CLI wizard

If you want to fine-tune the new Gemma 4 on text, images, or audio without renting an H100 or copying a terabyte of data to your laptop, this is the only toolkit that does it all on Apple Silicon.

21

910

104

1K

96K

1

38

1

45

8K

Daniel Isaac

@danpacary

about 2 months ago

mostly principled, with one heuristic: - noise sampled from N(0, sigma²I) per perturbation. standard - antithetic pairs (+eps and -eps) for variance reduction. standard NES, halves variance for free - z-score normalization on raw fitness. makes the gradient estimate scale-invariant - 1/sqrt(rank) scaling on the A matrix only. this one is a Kaiming-style heuristic, not analytically justified. probably helps but i haven't ablated it does it matter? antithetic alone is a meaningful win. without z-score, sigma becomes a load-bearing hyperparameter and tuning gets brittle. obviously claude is helping substantially with a variety of aspects - this is def one

0

1

0

58

Daniel Isaac

@danpacary

2 months ago

I'm fine-tuning a 30B parameter model on my MacBook Pro. No backward passes. No GPU cluster. No gradients. Just forward passes on Apple Silicon.

danpacary's tweet photo. I'm fine-tuning a 30B parameter model on my MacBook Pro.

No backward passes.
No GPU cluster.
No gradients.

Just forward passes on Apple Silicon. https://t.co/CNIaUTWGEM

23

454

20

345

44K

Daniel Isaac

@danpacary

about 2 months ago

@RolandTweats EGGROLL is the optimizer. LoRA is the parameterization being optimized think of it like: - AdamW + LoRA adapters → standard LoRA fine-tuning - ES + LoRA adapters → EGGROLL LoRA fine-tuning different optimizer on the same target.

0

33

Daniel Isaac

@danpacary

about 2 months ago

@thkostolansky for ranking implementations against each other on the same data: yes. that's what i used it for for proving the model is actually better at code generation: no. would need HumanEval i'm not claiming the trained model is better at coding tasks.

0

18

Daniel Isaac

@danpacary

about 2 months ago

@guintherk @yp0c5t correct. good distinction ES adds a different stochasticity on top: random search directions in parameter space. the data sampling can also be stochastic (and is in my setup). two layers of stochasticity, different purposes

0

2

0

33

Daniel Isaac

@danpacary

about 2 months ago

@tekyesilolsun honestly doing more experimentation vs real-world implementation a lot of what i am doing doesn't translate well to daily functionality it's more about exploring the limits

0

1

0

12

Daniel Isaac

@danpacary

2 months ago

Now porting to Rust via Rustane. ANE + GPU + CPU. Bare metal Accelerate and Metal kernels. ~700 lines of Rust. MoE forward path. LoRA injection. EGGROLL on Qwen3-30B end-to-end. Apple just dropped the SSD paper: temperature-scaled loss, 42% to 55% on code benchmarks. Next up.

danpacary's tweet photo. Now porting to Rust via Rustane. ANE + GPU + CPU. Bare metal Accelerate and Metal kernels.

~700 lines of Rust. MoE forward path. LoRA injection. EGGROLL on Qwen3-30B end-to-end.

Apple just dropped the SSD paper: temperature-scaled loss, 42% to 55% on code benchmarks. Next up. https://t.co/IrD2lLx8ID

2

25

1

9

2K

Daniel Isaac

@danpacary

about 2 months ago

because i'm targeting Apple's Neural Engine, which is inference-optimized. backward passes through ANE are limited (small tensors, restricted ops) ES skips backprop entirely. 32 forward passes is more compute per update, but every forward is ANE-friendly. activation memory drops to zero. gradient tape gone. again, focused on ANE here

0

16

Daniel Isaac

@danpacary

Last Seen Users on Sotwe

Trends for you

Most Popular Users