@mkurman88 sequence length? I trained on RNA some months back, with a byte level tokenizer tho, loss around ~1, 1024 seq length. finetuning on a downstream task went pretty well ~regardless of pretraining loss.
@scaling01 realistically, Fable can't be a 10T model, not with that inference speed, which is very similar to Opus. anything bigger than 2T in prod at scale means bankruptcy
@slimcat0101 @PaddlePaddle unfortunately, I processed 100 images of that kind with v6 and manually inspected them: all 100 have spacing issues, e.g. consecutive words merged into one.
mostly tested on English and Italian.
@slimcat0101 @PaddlePaddle you can try these images yourself as a test: https://t.co/jujlAVuRZ9
you can open more flyers and try with other offers, every retailer has its own layout and font
(I work here on document AI with VLMs)
@slimcat0101 @PaddlePaddle latest. I've tried v6 in all the flavours, all of them have this issue, consistently. I tried to implement a hacky postprocessing algo with gpt 5.5, it improves the situation but it's fragile.
It's a pity really, otherwise I'd ship it in prod over gvision
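to give an idea of what this kind of postprocessing can look like, here is a minimal sketch. this is not the algo mentioned above, just one common approach under stated assumptions: a known vocabulary (`vocab`, hypothetical here) and a dynamic-programming split of any token that isn't in it. `split_merged` and `fix_line` are illustrative names.

```python
def split_merged(token, vocab, max_len=20):
    """Split a merged token into vocabulary words; return it unchanged if no split exists."""
    lowered = token.lower()
    n = len(lowered)
    # best[i] holds the shortest list of pieces covering token[:i], or None if unreachable
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is None:
                continue
            if lowered[j:i] in vocab:
                candidate = best[j] + [token[j:i]]  # keep the original casing
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] else [token]


def fix_line(line, vocab):
    """Re-insert spaces in every whitespace-separated token of an OCR line."""
    return " ".join(piece for tok in line.split() for piece in split_merged(tok, vocab))


# toy vocabulary, purely for illustration
vocab = {"fresh", "milk", "special", "offer"}
print(fix_line("Freshmilk specialoffer", vocab))
```

it is fragile for the same reasons as any dictionary approach: product names, prices and punctuation are not in the vocabulary, so those tokens are left untouched.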
@scaling01 imho they are not even trying..
- the entire work chain in CN is probably 5-7x cheaper than in US
- their user pool is massive
- their compute is way cheaper
- their data is way cheaper
- CSA does wonders
@LucianoLicelli classic marketing fluff for the average user, fair enough. as if the avg user would ever think about decoding being memory bound lol
not that a dgx spark is in much better shape on that front either ;) and a 5090 isn't really a fitting comparison
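a rough way to see the memory-bound point: in single-stream decoding every generated token has to read the active weights from memory once, so memory bandwidth caps tokens/s. a minimal sketch, assuming the weights dominate memory traffic (KV cache reads and batching are ignored). `decode_tokens_per_sec` is an illustrative name, and the numbers below are placeholders, not specs of any real device.

```python
def decode_tokens_per_sec(active_params_billions, bytes_per_param, mem_bandwidth_gb_s):
    """Upper bound on single-stream decode speed when all active weights are read per token."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


# placeholder inputs: an 8B-parameter model in 16-bit weights on 1000 GB/s of bandwidth
print(decode_tokens_per_sec(8, 2, 1000))  # 62.5
```

halving the bandwidth halves the bound, no matter how much compute the chip advertises.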
@antirez I have another view: the recipes are different. the data pools are different.
oai and anthropic have virtually the same people working for them. person A (including dario, andrej, etc) was at oai and then switched to anthro. and vice versa.
they have the ~same secrets and tricks
@a_karvonen let's say your paper is 20k tokens, out of trillions. let's say they augment each paper 10x, 200k tokens.
that is only 2x10⁻⁷ over 1T total tokens.
I don't think it's memorising your paper, it's probably making some smart predictions dressed up as "I recall bla bla"
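the arithmetic above, spelled out. the token counts are the ones assumed in the tweet (20k-token paper, 10x augmentation, 1T-token corpus), not measured values.

```python
paper_tokens = 20_000
augmentation_factor = 10
corpus_tokens = 1_000_000_000_000  # 1T, the assumed corpus size

# share of the training corpus taken up by the paper and its augmented copies
paper_share = paper_tokens * augmentation_factor / corpus_tokens
print(f"{paper_share:.0e}")  # 2e-07
```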
@mkurman88 wezterm, great for shortcuts and scripting with lua -> https://t.co/iGPMIRcoKZ
e.g. I have cmd+0 -> watch -n2 nvidia-smi; cmd+1 -> open project X folder in a new tab, split into 2 panes, pre-send git pull to pane 1, write "code ." in the other
btop -> https://t.co/ZdbL77D40Y
no. the smartphone industry is nonsense from a sustainability pov, always has been. it's a complete waste of precious resources.
you all don't need to change smartphones every year.
as usual, they just propose solutions to the problems they've created.
@theo Bringing your beloved "national security" into play means they'll get rid of China as well, doing a favor to everybody in the US.
In the end, the govt will be part of the pre-release safety tests as well and that's it.
@theo It's all part of the IPO story. too much money at stake, just wait 2 weeks.
They need this to strengthen public recognition and feed the "we can build powerful/dangerous AI" story, which will keep feeding the "we can replace your job" main story line.