@SashaMTL Just stating facts. BLOOM has 1131 citations despite being released in 2022, while Llama2 has 3855 despite being released 8 months later. BLOOM was just severely undertrained given the limited amount of compute they had, and far too ambitious in trying to cover so many languages.
@dctanner @mov_axbx That’s a really expensive server on eBay considering its age and specs. It seems any used rack that can hold more than 4 GPUs is highly inflated in price now.
@Yampeleg @abacaj Did similar training runs, and from some manual evaluations, the loss might have plateaued for hundreds of thousands of steps, but the quality of the generations is still better given more epochs.
@BramVanroy OpenNMT does have most of those implemented, since they now support LLMs as well. Marian looks dead, perhaps because MSFT has deprioritized it in favor of LLMs.
@e270889o @ID_AA_Carmack Plenty of RAM, but slow compute-wise. Apple’s CoreML is too opaque to developers, so the Neural Engine hasn’t been usable in an obvious way yet.
@tmophoto @abacaj You split the layers across different cards. That’s why you need fast interconnects like NVLink, so the GPUs can hand data off to each other quickly without the link becoming a bottleneck.
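Rough sketch of what splitting the layers looks like, assuming a plain contiguous split where each card gets a consecutive block of layers. The names `split_layers` and `count_handoffs` are made up for illustration and are not from any library; real frameworks balance by memory rather than by layer count.

```python
def split_layers(num_layers: int, num_gpus: int) -> list[range]:
    """Assign contiguous blocks of layer indices to each GPU, as evenly as possible.

    Hypothetical helper for illustration only: it balances by layer count,
    not by the memory each layer actually needs.
    """
    if num_layers < 0 or num_gpus <= 0:
        raise ValueError("num_layers must be >= 0 and num_gpus must be > 0")

    base, extra = divmod(num_layers, num_gpus)
    assignments = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs each take one additional layer.
        size = base + (1 if gpu < extra else 0)
        assignments.append(range(start, start + size))
        start += size
    return assignments


def count_handoffs(assignments: list[range]) -> int:
    """Count GPU-to-GPU transfers in one forward pass.

    Each boundary between two non-empty blocks is one transfer of
    intermediate data over the interconnect.
    """
    non_empty = [block for block in assignments if len(block) > 0]
    return max(len(non_empty) - 1, 0)


if __name__ == "__main__":
    plan = split_layers(num_layers=32, num_gpus=4)
    for gpu, block in enumerate(plan):
        print(f"GPU {gpu}: layers {block.start}..{block.stop - 1}")
    print("handoffs per forward pass:", count_handoffs(plan))
```

Every boundary between blocks is a transfer over the interconnect, and it happens for every token generated, which is where the link speed starts to matter.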