Ctralman omoal 🧢 @ctralman - Twitter Profile

Ctralman omoal 🧢 @ctralman

4 days ago

@thsottiaux Codex cloud threads view from mobile app

0

3

0

56

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux Quantization matters most here; that's why you get Q4 Kimi K2.6 so cheap

0

65

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux Yes, they can shard the full MoE weights across many GPUs using expert parallelism, and each token only computes the routed active experts. But the full weights still need to be stored within each serving replica. They can’t keep one central copy and stream experts to many fleets.

3

0

22
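
The expert-parallelism point above can be sketched as a toy model. This is only an illustration: the expert count, GPU count, top-k value, and the contiguous sharding scheme are all hypothetical, not taken from any real deployment. It shows that every expert has to live on some GPU in the replica, while a single token only touches the few experts it is routed to.

```python
# Toy model of expert parallelism (all sizes below are hypothetical).
NUM_EXPERTS = 16
NUM_GPUS = 4
TOP_K = 2


def expert_to_gpu(expert_id: int) -> int:
    """Contiguous sharding: experts 0-3 on GPU 0, 4-7 on GPU 1, and so on."""
    return expert_id // (NUM_EXPERTS // NUM_GPUS)


def route(router_scores: list[float], top_k: int = TOP_K) -> list[int]:
    """Return the ids of the top_k highest-scoring experts for one token."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda e: router_scores[e],
        reverse=True,
    )
    return sorted(ranked[:top_k])


# One token whose router prefers experts 5 and 14.
scores = [0.0] * NUM_EXPERTS
scores[5] = 0.9
scores[14] = 0.7

active = route(scores)
gpus_hit = sorted({expert_to_gpu(e) for e in active})
gpus_storing_weights = sorted({expert_to_gpu(e) for e in range(NUM_EXPERTS)})

print("active experts:", active)
print("GPUs doing expert compute for this token:", gpus_hit)
print("GPUs that must hold expert weights:", gpus_storing_weights)
```

Only two GPUs do expert compute for this token, but all four must keep their shard resident, because the next token may route anywhere.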

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux If you say “5T total and 100B active”, that means only ~100B params are used for compute per token, but the full 5T weights must still be sharded somewhere. They also can’t rely on heavy offloading at scale, because that would destroy throughput.

0

1

0

18
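
A rough back-of-the-envelope for the "5T total and 100B active" example above. It counts weight bytes only and assumes 4, 2, and 0.5 bytes per parameter for FP32, BF16, and Q4; real 4-bit formats also store scales, so the Q4 figure is a lower bound. The function name is hypothetical.

```python
# Weight storage vs. per-token active weights, weights only.
# Assumption: Q4 is treated as exactly 0.5 bytes/param (no scale overhead).
BYTES_PER_PARAM = {"FP32": 4.0, "BF16": 2.0, "Q4": 0.5}

TOTAL_PARAMS = 5e12    # "5T total" from the tweet
ACTIVE_PARAMS = 100e9  # "100B active" from the tweet


def weight_terabytes(num_params: float, precision: str) -> float:
    """Size of num_params weights at the given precision, in TB (1e12 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e12


for precision in BYTES_PER_PARAM:
    stored = weight_terabytes(TOTAL_PARAMS, precision)
    touched = weight_terabytes(ACTIVE_PARAMS, precision)
    print(f"{precision}: store {stored:.2f} TB, touch {touched:.2f} TB per token")
```

The stored figure is what has to be sharded across the replica; the touched figure is what each token actually reads for compute.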

Ctralman omoal 🧢 @ctralman

5 days ago

@SiddharthInk_ @Chrisgpt 😂

1

3

0

1

39

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux Q4 is never the actual training weight precision. Models are usually pre-trained in FP16/BF16 or FP32, and then quantized to Q4 later. After that, they may run Q4-oriented post-training or fine-tuning to improve performance in the quantized format.

0

1

0

60

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux So why do you say Q4 precision does not matter? VRAM is not infinite.

1

0

28

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux They use MoE, no doubt, but this time it is most probably a looped (recurrent) transformer. And precision matters: Kimi K2.6 cannot even run on 8x H100 in FP32.

1

0

77
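
A quick check of the 8x H100 claim above, as a sketch under stated assumptions: 80 GB per H100, weights only (no KV cache or activations), and a total parameter count of about 1T. The 1T figure is an assumption made for illustration, not a number from this thread.

```python
# Does a model's weight footprint fit in 8x H100? Weights only.
H100_GB = 80            # memory per H100 (80 GB variant)
NUM_GPUS = 8
ASSUMED_PARAMS = 1e12   # ASSUMPTION: ~1T total params, for illustration only


def weights_fit(num_params: float, bytes_per_param: float) -> tuple[bool, float]:
    """Return (fits, required_gb) for the weights alone."""
    required_gb = num_params * bytes_per_param / 1e9
    return required_gb <= NUM_GPUS * H100_GB, required_gb


for name, nbytes in [("FP32", 4.0), ("BF16", 2.0), ("4-bit", 0.5)]:
    fits, need = weights_fit(ASSUMED_PARAMS, nbytes)
    print(f"{name}: need {need:.0f} GB of {NUM_GPUS * H100_GB} GB -> fits={fits}")
```

Under these assumptions only the 4-bit weights fit in 640 GB, and that is before any memory is reserved for the KV cache.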

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux It is not 4x costlier than 5.4. I think it is a larger model and they did not quantize it to 4-bit, but it is definitely token-efficient: I let it build my project from the plan.md and it used 40% fewer output tokens than 5.4.

1

0

49

Ctralman omoal 🧢 @ctralman

5 days ago

@STUD_MAN_X @thsottiaux 5.5 is a larger model, so it needs more resources for inference

1

0

77

Ctralman omoal 🧢 @ctralman

6 days ago

@Govindtwtt I forget which things I can just Google

0

14

Ctralman omoal 🧢 @ctralman

6 days ago

@ash_twtz Use it for Codex; Claude Code does not have a free plan

0

122

Ctralman omoal 🧢 @ctralman

6 days ago

@md_kasif_uddin K2.6 is the SOTA, DeepSeek has high performance per $, and gpt-oss is the fastest and cheapest, and I would say the greatest open-weight model

1

2

0

81

Ctralman omoal 🧢 @ctralman

6 days ago

@karthikponna19 Assembly, ActionScript, Swift, Rust, Ruby

0

82

Ctralman omoal 🧢 @ctralman

7 days ago

@deedydas It's just web traffic data from Similarweb, not the actual user count. If you look at the App Store / Play Store download stats, you get the actual view of how dominant ChatGPT is in the consumer ecosystem. ChatGPT has 1.3B+ mobile downloads; Claude has 40M+.

0

2

0

210

Ctralman omoal 🧢 @ctralman

7 days ago

@Pallavi_345 Yahoo on Netscape

0

14

Ctralman omoal 🧢 @ctralman

8 days ago

@f_demaku @thsottiaux Join Trusted Access for Cyber to reduce the guardrails on cyber-related work; otherwise it is quite painful. Most probably your Agent.md or skills contain something related to code review like "try penetrating", or some other cyber-related phrase

0

72

Ctralman omoal 🧢 @ctralman

8 days ago

@scaling01 Gemini 3.5 Flash has higher error rates, so it needs multiple runs (more tokens) to successfully complete the same task that other models can do in a single run.

0

52
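
The retry argument above can be made concrete with a small sketch. It assumes each run succeeds independently with the same probability and costs the same number of tokens; under those assumptions the expected number of runs until the first success is 1 / p. The success rates and token counts below are hypothetical, not measurements of any model.

```python
# Expected token cost of a task when failed runs must be retried.
# Assumption: independent runs, constant success probability, equal run cost.

def expected_tokens(tokens_per_run: float, success_prob: float) -> float:
    """Expected total tokens until the first successful run (geometric model)."""
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    return tokens_per_run / success_prob


# Hypothetical numbers for illustration only.
reliable = expected_tokens(tokens_per_run=10_000, success_prob=1.0)
flaky = expected_tokens(tokens_per_run=10_000, success_prob=0.5)

print("reliable model:", reliable, "tokens")
print("flaky model:   ", flaky, "tokens")
```

A model that is cheaper per token can still cost more per completed task once retries are counted.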