Mauro S. @ma_sc_ - Twitter Profile

23 minutes ago

@mkurman88 sequence length? I trained on RNA some months back, with a byte level tokenizer tho, loss around ~1, 1024 seq length. finetuning on a downstream task went pretty well ~regardless of pretraining loss.

0

4

Mauro S. @ma_sc_

about 19 hours ago

@ns123abc he's a 007 in disguise. 18 months, finds something interesting and goes back to report to Google.

0

55

Mauro S. @ma_sc_

1 day ago

@antirez pruning is last resort

0

451

Mauro S. @ma_sc_

1 day ago

@scaling01 realistically, Fable can't be a 10T model, not with that inference speed, which is very similar to Opus. anything bigger than 2T in prod at scale means bankruptcy

0

1

0

38

Mauro S. @ma_sc_

1 day ago

@slimcat0101 @PaddlePaddle unfortunately out of 100 images of that kind, which I processed with v6 and manually inspected, 100% of them have spacing issues, e.g., consecutive words merged into one. mostly tested on English and Italian.

1

0

12

Mauro S. @ma_sc_

1 day ago

@slimcat0101 @PaddlePaddle you can try on these images yourself as test: https://t.co/jujlAVuRZ9 you can open more flyers and try with other offers, every retailer has its own layout and font (I work here on document AI with VLMs)

1

0

19

Mauro S. @ma_sc_

1 day ago

@slimcat0101 @PaddlePaddle latest. I've tried v6 in all the flavours, all of them have this issue, consistently. I tried to implement a hacky postprocessing algo with gpt 5.5, it improves the situation but it's fragile. It's a pity really, otherwise I'd ship it in prod over gvision

1

0

27

Mauro S. @ma_sc_

2 days ago

@antirez is pruning on your todo list ?

0

86

Mauro S. @ma_sc_

2 days ago

@scaling01 imho they are not even trying..
- the entire work chain in CN is probably 5-7x cheaper than in US
- their user pool is massive
- their compute is way cheaper
- their data is way cheaper
- CSA does wonders

0

15

Mauro S. @ma_sc_

3 days ago

@LucianoLicelli classic marketing fluff for the average user, fair enough. as if the avg user would think about decoding being memory bound lol. even a dgx spark isn't much better off on that front ;) and a 5090 isn't really a fitting comparison

0

1

0

31

Mauro S. @ma_sc_

4 days ago

@antirez I have another view: the recipes are different. the data pools are different. oai and anthropic have virtually the same people working for them. person A (including dario, andrej, etc) was in oai and then switched to anthro, and vice versa. they have the ~same secrets and tricks

0

604

Mauro S. @ma_sc_

5 days ago

@a_karvonen let's say your paper is 20k tokens, out of trillions. let's say they augment each paper 10x, 200k tokens. that is only 2x10⁻⁷ over 1T total tokens. I don't think it's memorising your paper, probably doing some smart predictions mimicked as "I recall bla bla"

0

7
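
The estimate in the reply above can be checked with a few lines of Python. This is a minimal sketch: the function name `corpus_share` is made up for illustration, and the inputs are the tweet's own assumed figures (a 20k-token paper, 10x augmentation, a 1T-token corpus), not measured values.

```python
def corpus_share(paper_tokens: int, augment_factor: int, corpus_tokens: int) -> float:
    """Fraction of the corpus taken up by one paper plus its augmented copies."""
    return paper_tokens * augment_factor / corpus_tokens


# Figures assumed in the tweet: 20k tokens, augmented 10x, 1T total tokens.
share = corpus_share(20_000, 10, 1_000_000_000_000)
print(f"{share:.1e}")
```

The result matches the 2x10⁻⁷ quoted in the reply; a larger corpus ("trillions") would make the share proportionally smaller.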

Mauro S. @ma_sc_

5 days ago

@xeophon @ThePrimeagen they finally got to this

0

2

0

77

Mauro S. @ma_sc_

5 days ago

@mkurman88 wezterm, great for shortcuts and scripting with lua -> https://t.co/iGPMIRcoKZ
e.g. I have cmd+0 -> watch -n2 nvidia-smi; cmd+1 -> open project X folder in a new tab, split in 2 panes, pre-send git pull to pane 1, write "code ." in the other
btop -> https://t.co/ZdbL77D40Y

0

103

Mauro S. @ma_sc_

5 days ago

no. smartphones industry is nonsense from a sustainability pov, always has been. it's a complete waste of precious resources. you all don't need to change smartphone every year. as usual, they just propose solutions to the problems they've created.

Peter Steinberger 🦞

@steipete

5 days ago

This shortage of chips is getting out of hand.

68

3K

97

487

343K

0

18

Mauro S. @ma_sc_

5 days ago

@judokach @jsuarez new env?

0

1

0

64

Mauro S. @ma_sc_

6 days ago

@mkurman88 ok keep your secrets

1

0

49

Mauro S. @ma_sc_

6 days ago

@Dorialexander @antirez yeah, unlike in the US, in EU we gotta pay the bills first. which doesn't fit well with frontier LLM economy

0

4

0

832

Mauro S. @ma_sc_

6 days ago

@theo Your beloved "national security" in play means that they'll get rid of china as well, doing a favor to everybody in the US. In the end, the gvmt will be part of the pre-release safety tests as well and that's it.

0

4

Mauro S. @ma_sc_

6 days ago

@theo It's all part of the IPO story. too much money at stake, just wait 2 weeks. They need this to strengthen public recognition and feed the "we can build powerful/dangerous AI" story, which will keep feeding the "we can replace your job" main story line.

1

0

26
