malteos @XYOU - Twitter Profile

about 1 month ago

@RishiBommasani @percyliang The analogy for cloud vs local would be restaurant vs takeout. At the restaurant you'd better behave, otherwise you get kicked out. At home you eat your food however you want.

0

21

malteos @XYOU

about 1 year ago

@MatthewBerman Sure about this? Given the current reproducibility crisis in ML research, I doubt that humans would achieve a much higher replication score.

0

16

malteos @XYOU

over 1 year ago

4/ In academia, the work is very different. PhD students or even undergraduates are the ones doing most of the actual research work. But as a PhD student, you need to decide whether to prioritize the project work over your own PhD work (papers and thesis).

0

204

malteos @XYOU

over 1 year ago

3/ LLMs and other foundation models are no longer research artifacts but products. Frontier models are developed by dedicated teams of 100+ people specialized across the whole stack (from low-level hardware optimization through data to ML and UX topics).

1

0

235

malteos @XYOU

almost 2 years ago

@hu_yifei Did you already try Grobid? https://t.co/mL5C5BLBzD

0

1

0

1

96

malteos @XYOU

almost 2 years ago

@gui_penedo @pjox13 That’s even better. I will share the data with you as soon as it’s ready!

0

2

0

21

malteos @XYOU

almost 2 years ago

@gui_penedo @pjox13 We will release a filtered version of Colossal OSCAR soon. Are your training and evaluation scripts available somewhere? I would love to do the comparison with that version.

1

0

34

malteos @XYOU

about 2 years ago

@mark_cummins For Germany, we have ~50B tokens of court decisions, but those are only the publicly available ones, which represent ~1% of all court decisions. However, you won't need all of them for LLM training due to the high duplicate ratio. @mlissner might have the US numbers.

1

0

43

malteos @XYOU

about 2 years ago

@gui_penedo Awesome work. Will the remaining models also be released? And from your experience what model and data size do you need to see a significant difference in performance?

0

505

malteos @XYOU

about 2 years ago

@yoavgo "collected" 😎

0

2

0

307

malteos @XYOU

about 2 years ago

@saattrupdan @SebastianB929 @occiglot Do you have the whole eval setup in containers? If so, I could help with compute.

0

36

malteos @XYOU

about 2 years ago

@SebastianB929 @occiglot Pinging @saattrupdan who did the evals.

1

0

31

malteos @XYOU

about 2 years ago

@qinzytech @OpenAI @Meta Great work! Will the pretraining code be open source?

0

2

0

847

malteos @XYOU

about 2 years ago

@BramVanroy @VSC_HPC If your cluster uses Slurm, you can catch the kill signal and save a checkpoint before the job is terminated. See this script for an example. Lines 14 and 293-300 do the magic. https://t.co/BZI294jR3N
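
A minimal sketch of the idea in the tweet above, not the linked script. It assumes the job is submitted so that Slurm delivers a catchable signal before the hard kill (for example via sbatch's `--signal` option, or the SIGTERM that Slurm sends at the time limit). The names `GracefulStopper`, `save_checkpoint`, `train`, and the JSON checkpoint file are hypothetical and only illustrate the pattern: the handler sets a flag, and the training loop saves and exits at the next step boundary.

```python
import json
import os
import signal


class GracefulStopper:
    """Remembers that a termination signal arrived (hypothetical helper)."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGUSR1)):
        self.signals = signals
        self.stop_requested = False

    def install(self):
        # Register the handler; this must be called from the main thread.
        for sig in self.signals:
            signal.signal(sig, self.handle)

    def handle(self, signum, frame):
        # Keep the handler tiny: only set a flag, let the loop do the saving.
        self.stop_requested = True


def save_checkpoint(path, state):
    # Write to a temporary file, then rename, so a kill mid-write
    # cannot leave a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(stopper, checkpoint_path, total_steps, step_fn=None):
    # Resume from an earlier checkpoint if one exists.
    state = {"step": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        if step_fn is not None:
            step_fn(state)  # placeholder for one real training step
        state["step"] += 1
        if stopper.stop_requested:
            # Signal received: save at a step boundary and return early.
            save_checkpoint(checkpoint_path, state)
            return state

    save_checkpoint(checkpoint_path, state)
    return state


if __name__ == "__main__":
    stopper = GracefulStopper()
    stopper.install()
    train(stopper, "checkpoint.json", total_steps=1000)
```

When the job is resubmitted, `train` picks up from the saved step instead of starting over. A real training script would store model and optimizer state rather than a step counter.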

0

2

0

2

102

malteos @XYOU

about 2 years ago

@SebastianB929 OpenGPT-X is an official government-funded research project. Occiglot is a loose group of individuals from different organizations without any formal ties. We call it a research collective. You may also simply call it a Discord server. And yes, the website needs to be improved.

0

2

0

41

malteos @XYOU

about 2 years ago

@ZedDou1 @occiglot As mentioned in the readme, we suspect that this is due to the benchmarks being machine-translated from English and based on English prompts.

0

1

0

76

malteos @XYOU

about 2 years ago

@BramVanroy Have you tried tensor parallelism on the embedding layer? If I remember correctly, BLOOM used this with its large vocab. @StasBekman

1

0

143

malteos @XYOU

over 2 years ago

@BramVanroy @ph_singer There is a high correlation between the weights of Mistral and Mixtral. So this seems pretty likely.

0

29

malteos @XYOU

over 2 years ago

@robertomasymas @burkov Check out "progressive growing". People did something similar already with BERT models.

0

2

0

99
