陳璿丞醫師(Soân-sêng Tân) @soanseng - Twitter Profile

soanseng retweeted

Akshay 🚀

@akshay_pachaar

24 days ago

https://t.co/Exoyd8tB0d

113

6K

761

23K

5M

soanseng retweeted

CJ Zafir

@cjzafir

19 days ago

Codex 5.5 use cases I found so far:
> made my internet faster
> made my local 6B SLM 3x faster
> made my macbook pro faster like new
> made a lightweight suite to write & test metal kernels
> made a skill to communicate with claude code in realtime
> made a pipeline to generate SFT dataset using Deepseek v4
> made a computer use workflow to fine tune models in Google Colab
> made 4 routines to test workflows on autopilot 3 times/day

86

7K

285

13K

3M

陳璿丞醫師(Soân-sêng Tân)

@soanseng

16 days ago

Easy-to-read version of the newsletter: https://t.co/kKc2qQ8Ndb

0

16

陳璿丞醫師(Soân-sêng Tân)

@soanseng

16 days ago

https://t.co/2x46pCJ1Uv

1

0

21

soanseng retweeted

19 days ago

a prompt I've been using a lot recently:

implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make that weren't in the spec, things you had to change, tradeoffs you had to make, or anything else I should know

trq212's tweet photo: https://t.co/qQFTES4fjo

343

10K

582

12K

821K

soanseng retweeted

mr-r0b0t

@mr_r0b0t

18 days ago

Here's the first of five NVFP4-optimized benchmarks. It's a crowd favorite that saw HUGE benefit from the CUTLASS back end! If you're using @NVIDIAAI Blackwell GPUs or a GB10 (DGX Spark or equivalent) this is for you!

TLDR: 57.49 tok/s single stream on fully native architecture!

mr_r0b0t's tweet photo: https://t.co/9bU41WwNyX

6

53

6

47

7K

soanseng retweeted

WquGuru

@wquguru

20 days ago

https://t.co/7veTsZUgJg

46

488

57

888

78K

soanseng retweeted

mr-r0b0t

@mr_r0b0t

20 days ago

Dear @UnslothAI
My @NVIDIAAI GB10 (DGX Spark) absolutely loved your quantization of this model!
unsloth/Qwen3.6-35B-A3B-NVFP4
It ran stably up to concurrency 64, not optimal but I did that for no other reason than to see if it would 😂
🧵

mr_r0b0t's tweet photo: https://t.co/9taM9PEzLN

3

48

5

29

4K

soanseng retweeted

Sudo su

@sudoingX

21 days ago

i've run a stack of models across a single 3090, a 5090, and a 128GB DGX Spark. exactly three are worth building on. the honest list.

the three worth it:

> 1. StepFun Step-3.5 Flash, the REAP pruned 121B MoE (Q6, DGX Spark)
a 121 billion parameter mixture of experts running on a single desktop box. the most worth-it model in everything i've tested.

> 2. Qwen 3.6 27B Dense, Q4 (single RTX 3090)
the undisputed king of the 24GB tier. one shot a playable game, around 41 tok/s, fits with context headroom to spare. one 24GB card, this is your answer.

> 3. NVIDIA Nemotron 3 Nano Omni, 30B-A3B (DGX Spark)
the best multimodal i've tested for video classification work. vision in, runs clean on the Spark.

the rest, ran them, they hold up fine:

on the Spark: DeepSeek V4 Flash 158B, GLM 4.7 Flash, GLM 4.5 Air REAP 82B-A12B, Gemma 4 26B-A4B, Qwen3-VL 235B-A22B, Qwen3 Coder 30B-A3B, Qwen3 30B-A3B, Carnice 35B-A3B.

on consumer GPUs: Kimi K2.5 1T, Qwen3-Coder-Next 80B, Hermes 4.3 36B, Qwen 3.5 27B Dense.

single 3090 to a 128GB Spark, that's the range. the three up top are the ones worth your hardware today.

24

267

17

313

40K

soanseng retweeted

Azeez

@AtlasInference

about 1 month ago

Atlas is open source!
An inference engine written from scratch in Rust + CUDA. No PyTorch, no Python, no 200-dependency install dance.
<2 min cold start, BUILT for GB10
Qwen3.6-35B at 130 tok/s on a single DGX Spark.
Demo powered by @Gradio 🧵 https://t.co/vxZLwBJMub

27

302

58

297

36K

soanseng retweeted

mr-r0b0t

@mr_r0b0t

about 1 month ago

Made this for everyone who is working with a @NVIDIAAI DGX Spark (GB10) ⚡️
Definitely also bookmark the official site, it's a fabulous resource with playbooks for nearly everything you'd want to see!
https://t.co/uAxkSvIbWG

mr_r0b0t's tweet photo: https://t.co/H3YS9KTqEQ

23

218

24

232

16K

soanseng retweeted

Ronin

@DeRonin_

25 days ago

Andrej Karpathy: "90% of your AI coding bill is paying for context you didn't need to send"

Here are 10 things senior AI engineers stopped wasting tokens on:

1. Auto-context loading 50 files for a 30-line fix: $1.20/turn for tokens you'll never read. 80% input waste, every session
2. Running Opus on lint, format, and rename tasks: $0.60 for what Haiku nails at $0.02. 30x overpay on the cleanup tier
3. Tool call loops that re-send the full repo on every retry: 5x context cost per agentic flow. fixing these alone cuts 30-50% of bills
4. Sonnet as the default model: Kimi 2.6 matches its quality on most coding tasks at 1/6 the cost. defaulting to Sonnet in 2026 is leaving 60-70% on the table
5. Streaming responses on stable-prefix workflows: kills your prompt cache. you pay 10x for tokens that should have cost cents
6. "Just in case" file includes: 80,000-token prompts that should be 3,000. context bloat is the silent budget killer
7. Per-session knowledge rebuilding: 10 min writing a SKILL.md once vs paying agents to re-figure out your environment every run. $4 vs $0.30 per execution
8. Single-model setups: premium tier on every task is the most expensive mistake in AI coding right now
9. Asking 10 small questions one at a time: 10 separate input prefix charges vs one batched call. 70-90% savings on routine workflows
10. Buying Claude Pro + ChatGPT Plus + Cursor Pro: you seriously use one. the other two are habit, not utility

what actually compounds instead:
- context discipline (grep before fetching, always)
- prompt caching on every stable prefix
- multi-model routing (Kimi 2.6 default, Opus for the 10%)
- graduated skills via SKILL.md files
- profiling tool calls before optimizing prompts
- the routing mindset (right model for right task)

in 12 months, the gap between developers shipping on $200/month and $4,000/month budgets won't be skill

it'll be how well they route

study this.

87

3K

387

8K

510K
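
A minimal sketch of the routing arithmetic in the post above, in Python. The per-task costs are the $0.60 and $0.02 figures quoted in item 2; the tier names, the task list, and the function names are hypothetical and only illustrate the idea, not any vendor's pricing or API.

```python
# Per-task costs in USD, as quoted in item 2 of the post (not real price data).
COST_PER_TASK = {"premium": 0.60, "cheap": 0.02}

# Task kinds the post names as cleanup work; treating them as a fixed set is an assumption.
CLEANUP_TASKS = {"lint", "format", "rename"}


def pick_tier(task_kind: str) -> str:
    """Route cleanup work to the cheap tier and everything else to premium."""
    return "cheap" if task_kind in CLEANUP_TASKS else "premium"


def session_cost(task_kinds, route=True) -> float:
    """Total cost of a list of tasks, with or without routing."""
    total = 0.0
    for kind in task_kinds:
        tier = pick_tier(kind) if route else "premium"
        total += COST_PER_TASK[tier]
    return round(total, 2)


tasks = ["lint", "format", "rename", "refactor"]
unrouted = session_cost(tasks, route=False)  # every task on the premium tier
routed = session_cost(tasks)                 # cleanup tasks on the cheap tier
print(f"unrouted: ${unrouted:.2f}, routed: ${routed:.2f}")
```

With these assumed figures, only the one non-cleanup task is billed at the premium rate, which is where the saving comes from.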

soanseng retweeted

raymel 👋

@pseudokid

26 days ago

Whole day with Pi agent just used 10% of my $20 Codex weekly quota
All gpt 5.5, various thinking levels
I'm now 3 hours away from weekly reset and still have 42% left!

For my typical 4-6h/day session:
Codex Desktop - 20-30% a day, 1-2 days/week
Pi - 10% a day, 4-5 days/week?

My switch to Pi is really looking good so far

48

530

16

508

152K

soanseng retweeted

⿻ Audrey Tang 唐鳳

@audreyt

25 days ago

@antirez 🙋Hi! I'm happy to personally buy you one so we can work on https://t.co/m9B3NiOBj3 together. Please check DM / Gmail.

8

338

64

161

36K

soanseng retweeted

Mushtaq Bilal, PhD

@MushtaqBilalPhD

26 days ago

https://t.co/wBjpOMqu2O

16

1K

215

3K

533K

soanseng retweeted

Mnimiy

@Mnilax

28 days ago

https://t.co/W4MxdOyUKD

83

7K

702

24K

4M

soanseng retweeted

Andrej Karpathy

@karpathy

26 days ago

This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu

There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.

1K

19K

2K

21K

4M
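
A small sketch of the "ask for HTML" tip in the post above, in Python. It only shows the two mechanical steps: appending the request to a query and saving the reply to a file you can open in a browser. The function names and the sample query are hypothetical, and the reply is a hard-coded stand-in rather than output from any real model or API.

```python
import tempfile
import webbrowser
from pathlib import Path

# The instruction quoted in the post, appended to the end of the query.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Return the query with the HTML instruction appended at the end."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_html(response_html: str, directory: str) -> Path:
    """Write the model's reply to response.html inside the given directory."""
    path = Path(directory) / "response.html"
    path.write_text(response_html, encoding="utf-8")
    return path


prompt = with_html_request("Summarize the tradeoffs of markdown versus HTML output.  ")

# Stand-in reply; in practice this would come from whichever LLM you send `prompt` to.
fake_reply = "<html><body><h1>Markdown vs HTML</h1></body></html>"
out_path = save_html(fake_reply, tempfile.mkdtemp())

# Uncomment to view the saved file in the default browser:
# webbrowser.open(out_path.as_uri())
```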

soanseng retweeted