@decoded_dev@jpschroeder No, it does not matter, especially with modern engineering around quantization.
And your sense of entitlement from having contributed to PyTorch has nothing to do with the accuracy of your arguments.
You already mixed up a lot of concepts in your earlier comment lol.
@decoded_dev@jpschroeder This relates to why different jobs take different precision. Training at fp8 uses e5m2 (typically for gradients) to get a larger dynamic range, while at inference you use e4m3.
And of course, for the optimizer, you go for fp32 and that's one reason why training is so fucking expensive
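
To make the range vs. precision trade-off concrete, here is a rough sketch that derives the largest finite value and the mantissa step of each fp8 format from its bit layout. It assumes the commonly used OCP-style definitions: e5m2 follows IEEE conventions (top exponent code reserved for inf/NaN), while e4m3 is the "fn" variant with no infinities and a single NaN mantissa pattern. The `Fp8Format` class and its field names are made up for illustration and are not any library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fp8Format:
    """Hypothetical helper describing an 8-bit float layout (1 sign bit assumed)."""
    name: str
    exp_bits: int
    man_bits: int
    ieee_like: bool  # True: top exponent code is reserved for inf/NaN

    @property
    def bias(self) -> int:
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def max_finite(self) -> float:
        top_exp = 2 ** self.exp_bits - 1
        if self.ieee_like:
            # Top exponent code is inf/NaN, so use the one below with a full mantissa.
            exp, frac = top_exp - 1, 2 ** self.man_bits - 1
        else:
            # Assumed "fn" variant: only the all-ones mantissa at the top exponent is NaN.
            exp, frac = top_exp, 2 ** self.man_bits - 2
        return (1 + frac / 2 ** self.man_bits) * 2.0 ** (exp - self.bias)

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.man_bits


E5M2 = Fp8Format("e5m2", exp_bits=5, man_bits=2, ieee_like=True)
E4M3 = Fp8Format("e4m3", exp_bits=4, man_bits=3, ieee_like=False)

for fmt in (E5M2, E4M3):
    print(f"{fmt.name}: max finite = {fmt.max_finite}, epsilon = {fmt.epsilon}")
```

Under these assumptions e5m2 reaches 57344 with a step of 0.25 near 1.0, while e4m3 tops out at 448 with a finer step of 0.125: more range in one, more precision in the other.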
@decoded_dev@jpschroeder It is not a big deal, and the only reason the AI will say it is a big deal is because you pushed it that way.
It is largely agreed that fp8 does not lose a meaningful amount of accuracy compared to bf16 in inference.
Yes, for training, you need higher precision.
If future models display much fewer undesirable propensities, we could become more concerned about catastrophic misalignment, as we'd be worried that models may have learnt to evade detection (for example, as a result of being trained not to produce misaligned reasoning).
@dragonitematero@WatcherGuru Are you fucking retarded
The "not preferred long term model" means that OpenAI would prefer that the government not have to check every release and validate it before rollout.