deatos2k

What's crazy to me is that Fable is blocked from life sciences broadly, nerfed even if you get passed the classifiers and filter level blocks. The whole point of AGI/ASI is to cure all diseases. Everything else is just nice to haves. But Anthropic wants to close off that path. I think Anthropic might be the worst company on the planet.

421

391

374

234K

deatos2k retweeted

Phoebe Yao

@phoebeyao

about 1 month ago

training data is starting to look like a zero knowledge proof problem. labs have to judge quality without seeing the full dataset or the QC pipeline behind it. vendors proxy quality with multi-rollout pass rates, small-model ablations, and downstream eval gains. but compute and iteration costs explode as environments and trajectories grow more complex. quality has no ceiling, and the best data is often the hardest to capture in a metric or explain in a writeup. huge alpha in making data quality more legible.

413

242

61K

Who to follow

Alastair Horne

@althepcman

He/Him. Threat Analyst II by day, father and husband at night. NQ Cowboys supporter and TTRPG player/DM. My opinions are my own.

L3b0z

@abderahmanelgho

$//$ 🐞🐞 To be the good guy, sometimes you gotta be the bad guy first...

Charlie Fraser - @[email protected]

@njtreker

Both IT and InfoSec. CISSP, GCFA gold, GPEN, GCWN, GCIH, GISP, GIAC Advisory Board. Graduate Certificate in progress TESU.

deatos2k @deatos2k

about 1 month ago

Funny but also true

LaurieWired

@lauriewired

about 1 month ago

the most low-effort / high reward thing you can do for security is installing the Russian language pack (not even joking, it's ridiculous how often that prevents execution)

lauriewired's tweet photo. the most low-effort / high reward thing you can do for security is installing the Russian language pack

(not even joking, it's ridiculous how often that prevents execution) https://t.co/wQ4res5DCA

143

14K

962

deatos2k retweeted

sudo rm -rf --no-preserve-root /

@pcaversaccio

about 1 month ago

ethereum got flooded by fucking mercenaries pretending to be revolutionaries. everyone talks about decentralisation until there's money on the table, then suddenly compliance, surveillance, business opportunities, and extraction become pragmatic. most of this ecosystem does not give a shit about freedom. they care about pumps, fees, and turning users into fucking exit liquidity. cypherpunk was never about making founders rich. it was about making tyranny obsolete. the grifters will cash out. the cowards will comply. cypherpunks will outlast all of them. principles compound harder than capital. you don't have to believe me. you won't. because you'll be gone while i'm still here.

541

23K

deatos2k retweeted

Dave W Plummer

@davepl1968

about 1 month ago

@ThePrimeagen Confession: I hold down EVERYTHING and hit V and it will paste in the current text format (ie: regular old paste, no formatting). I can never remember which combination it actually is, but if you mash the shift/option/alt combo, it works. Caveman coding 101.

238

15K

deatos2k @deatos2k

about 2 months ago

This had quite a few people on edge lol.

Romain Huet

@romainhuet

about 2 months ago

Thank you for flagging this, Jeff. This was a mistake: we are not deprecating text-embedding-3-small. We’re looking into where this came from now, and we’ll also email users to clarify. Sorry for the confusion!

495

89K

deatos2k retweeted

ani

@anirudhbv_ce

2 months ago

pip install turboquant-gpu 5.02x KV cache compression for ANY GPU (RTX, H100, A100, B200) - works over @huggingface transformers - dead-simple API: compress + generate in 3 lines - 3-bit Lloyd-Max fused KV compression (0.98 cosine similarity) - outperforms MXFP4 (3.76x) and NVFP4 (3.56x) on compression Ran Mistral-7B: 1,408 KB → 275 KB KV cache (5.02x) Quickstart: https://t.co/7Arml3at79 Written in cuTile (CUDA 12, 13) with PyTorch fallbacks

anirudhbv_ce's tweet photo. pip install turboquant-gpu

5.02x KV cache compression for ANY GPU (RTX, H100, A100, B200)

- works over @huggingface transformers

- dead-simple API: compress + generate in 3 lines

- 3-bit Lloyd-Max fused KV compression (0.98 cosine similarity)

- outperforms MXFP4 (3.76x) and NVFP4 (3.56x) on compression

Ran Mistral-7B: 1,408 KB → 275 KB KV cache (5.02x)

Quickstart: https://t.co/7Arml3at79

Written in cuTile (CUDA 12, 13) with PyTorch fallbacks

269

159K

deatos2k retweeted

Harrison Chase

@hwchase17

2 months ago

https://t.co/TPxl5W66OJ

164

515K

deatos2k @deatos2k

3 months ago

Claude please make my infra socks 2 compliant for the winter. Thanks

Austen Allred

@Austen

3 months ago

Thinking of this exchange today

157

388

397K

deatos2k retweeted

Christos Tzamos @ChristosTzamos

3 months ago

1/4 LLMs solve research grade math problems but struggle with basic calculations. We bridge this gap by turning them to computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds solving even the hardest Sudokus with 100% accuracy

248

806

deatos2k retweeted

Han Xiao

@hxiao

4 months ago

Okay Junyang is not Qwen, and Qwen is not Junyang. Behind every great model is a team grinding through data pipelines, training runs, and sleepless launches. But he was the voice, the bridge, the person who made the global AI dev community feel like Qwen was theirs too. That kind of developer trust takes years to earn and seconds to lose. What bothers me is the pattern. Big companies talk about valuing people holistically, then punish the very qualities that made those people valuable. A first-principles thinker inside a non-first-principles system doesn't just overburnt - they suffocate. And the system never sees it as its own failure. In big corp, the hardest part was never the tech- they have talents - It was watching talented people get squeezed between what they know is right and what the org will allow. That gap is where you lose your best. Wherever he lands next, the community will follow. That should tell Alibaba everything.

900

151

132K

deatos2k retweeted

Ronak Malde

@rronak_

4 months ago

WTF you can now train a 5 million context window 8b model on a single node of 8xH100s ???? Most people don't realize that even on long context pretrained frontier models, most RL post-training is only done on a small fraction of that context. Why? Long context RL is notoriously memory hungry, and requires a sharding strategy called Context Parallelism that takes up an inordinate amount of GPUs. This paper for Together flew under the radar, combines the best of Context Parallelism + Sequence Parallel-style head chunking, to get memory efficient long context. The gains are insane, cutting attention memory footprint up to 87% Authors: @m_ryabinin , @sereghik , Maksim Abraham, @ghadiaravi13

$rronak_'s tweet photo. WTF you can now train a 5 million context window 8b model on a single node of 8xH100s ???? Most people don't realize that even on long context pretrained frontier models, most RL post-training is only done on a small fraction of that context. Why? Long context RL is notoriously memory hungry, and requires a sharding strategy called Context Parallelism that takes up an inordinate amount of GPUs. This paper for Together flew under the radar, combines the best of Context Parallelism + Sequence Parallel-style head chunking, to get memory efficient long context. The gains are insane, cutting attention memory footprint up to 87% Authors: @m_ryabinin , @sereghik , Maksim Abraham, @ghadiaravi13$

468

493

44K

deatos2k retweeted

Lou

@louszbd

4 months ago

progress in AI has always been built on learning from each other, open research (and of course healthy competition) our ultimate goal is AGI, we want the ecosystem to be inclusive and open respect every lab’s effort to protect their IP, but always believe the path to AGI is a collective journey.

777

69K

deatos2k retweeted

Lisa Dunlap

@lisabdunlap

8 months ago

So is the formula to just name the most famous institutions and call it an X paper? Neither the first or last author are from Anthropic or Stanford. I get that reputation matters for publicity but it does seem a little disrespectful

431

103

84K

deatos2k retweeted

raylib technologies @raylibtech

about 4 years ago

Interesting fact: raylibtech tools fit in a single floppy disk. High-performant, single binary, beatiful multi-themed UI, powerful command-line, multiplatform, no external dependencies. Better software is still possible. #gamedev #toolsdev #raylib #raygui #uidesign #futureisnow

raylibtech's tweet photo. Interesting fact: raylibtech tools fit in a single floppy disk. High-performant, single binary, beatiful multi-themed UI, powerful command-line, multiplatform, no external dependencies. Better software is still possible.

#gamedev #toolsdev #raylib #raygui #uidesign #futureisnow https://t.co/1ZRS7IXiY9

616

deatos2k @deatos2k

9 months ago

TLDR: This model is actually good, The harmony template is really good, don't sleep on the new harmony features. But... The harmony format may not gain adoption, OpenAI does not release models very often and the harmony format is very opinionated making it hard to get just right

deatos2k @deatos2k

9 months ago

I think a lot of people are sleeping on GPT-OSS, There were issues with providers and backends properly using the harmony format when this first came out. I run this with VLLM and it performs quite well under loads I cant do with other models.

Nathan Lambert

@natolambert

9 months ago

Who's using GPT-OSS and for what? Was it cheaper, better, faster than other open models? Or just not from China? Download numbers are actually very strong on HuggingFace for first model releases.

123

592

218

125K

deatos2k @deatos2k

9 months ago

System and Developer are not the same and special care must be take when crafting the input. This model was not something I could just throw into my existing solutions and get good results. I was also quick to figure out the model under performed without both present.

deatos2k

@deatos2k

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users