@suchenzang You need to treat writing as code and do it in Claude Code; it actually works well. You can plan ahead, define the plotline, characters, tone, etc., and make sure the agent adheres to them.
Gemma-4 12B, quantized with mlx-optiq, running on a Mac: 2.5x faster than bf16 (28.8 vs 11.6 tok/s on M3 Max), 2.7x smaller (8.9 GB vs 24 GB), fits in 16 GB of RAM. The sensitivity-aware 4-bit also beats naive 4-bit by +6.4 Capability Score (+13 long-context, +11.6 code, 93% GSM8K). No accuracy tax.
pip install mlx-optiq.
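Quick sanity check on the headline ratios, using only the numbers quoted in the post. The variable names are just for illustration, and the "headroom" line assumes the 8.9 GB figure is the model's footprint, so whatever is left also has to cover the OS, the KV cache, and everything else:

```python
# Figures quoted in the post (Gemma-4 12B on an M3 Max).
bf16_tok_s, optiq_tok_s = 11.6, 28.8   # decode speed, tokens per second
bf16_gb, optiq_gb = 24.0, 8.9          # model size in GB

speedup = optiq_tok_s / bf16_tok_s     # how much faster than bf16
shrink = bf16_gb / optiq_gb            # how much smaller than bf16
headroom_gb = 16 - optiq_gb            # what remains on a 16 GB machine (assumption: 8.9 GB is the full footprint)

print(f"speedup:  {speedup:.1f}x")
print(f"smaller:  {shrink:.1f}x")
print(f"headroom: {headroom_gb:.1f} GB of 16 GB")
```

Both ratios round to the 2.5x and 2.7x claimed in the post.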
@PaulGugAI @Teknium @NeoAIForecast The optiq quants tend to be more accurate and faster than the equivalent uniform 4-bit ones on MLX. With MTP it will be quite fast - https://t.co/UH381cyvJt