@RishiBommasani@percyliang The analogy for cloud vs local would be restaurant vs takeout. At the restaurant you better behave otherwise you get kicked out. At home you eat your food however you want.
@MatthewBerman Sure about this? Given the current reproducibility crisis in ML research, I doubt that humans would achieve a much higher replication score.
4/ In academia, the work is very different. PhD students or even undergraduates are the ones doing most the actual research work. But as a PhD student, you need to decide whether you prioritize the project work over your own PhD work (papers and thesis).
3/ LLMs and other foundation models are no longer research artifacts but products. Frontier models are developed by dedicated teams of +100 people specialized across the whole stack (from low level hardware optimization over data to ML and UX topics).
@gui_penedo@pjox13 We will release a filtered version of Colossal OSCAR soon. Is your training and evaluation script somewhere available? I would love to do the comparison with that version.
@mark_cummins For Germany, we have ~50B tokens of court decisions but that are only the publicly available ones and that represent ~1% of all court decisions. However, you won't need all for LLM training due to high duplicate ratio. @mlissner might have the US numbers.
@gui_penedo Awesome work. Will the remaining models also be released? And from your experience what model and data size do you need to see a significant difference in performance?
@BramVanroy @VSC_HPC If your cluster uses slurm you can catch the kill signal and save a checkpoint before that. See this script for an example. Line 14 and 293-300 do the magic. https://t.co/BZI294jR3N
@SebastianB929 Opengptx is an official government funded research project. Occiglot is a loose group of individuals from different organizations without any formal ties. We call it a research collective. You may also call it simply a discord server. And yes, the website needs to be improved.
@ZedDou1@occiglot As mentioned in the readme, we suspect that this is due to the benchmarks being machine translated from English and based on English prompts.