Every morning, the moment my eyes open, I wake up to 40 unread Slack messages that effectively say:
“If you don’t fix this in the next 5 minutes, the world will implode and the app will cease to exist.”
Gemma 4 dropped a 12B.
I put it on RTX 5090 against its 31B sibling.
when you cut a model from 31B to 12B, what do you actually lose?
~ reasoning barely moves
GSM8K (math) 97.5 > 96.4 (−1.1)
ARC-C (sci reasoning) 97.6 > 94.0 (−3.6)
~ knowledge falls off a cliff
MMLU (world knowledge) 87.8 > 78.9 (−8.9)
HellaSwag (commonsense) 92.0 > 81.6 (−10.4)
~~~
parameters store facts, not thinking. the 19B you delete is mostly where the model kept its trivia and world-priors. cut it and recall collapses, while the reasoning machinery stays nearly whole.
a 12B reasons almost like its big brother. It just knows less.
122 tok/s vs 53 (2.3x faster generation), ~10GB instead of ~24, meaning that you get 20GB+ free on a 32GB card for long context or a second model.
so it depends on your workload:
reasoning / math / agentic loops = the 12B is nearly free
broad-knowledge Q&A with no retrieval = that's the one job worth paying for the 31B.
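if you want to sanity-check the arithmetic in this thread, here is a minimal Python sketch. the scores, tok/s, and GB figures are copied from the posts above; the function names (`deltas`, `speedup`, `free_vram_gb`) are made up for illustration and are not part of any benchmark tooling.

```python
# Benchmark scores as quoted in the thread: (31B score, 12B score).
scores = {
    "GSM8K": (97.5, 96.4),
    "ARC-C": (97.6, 94.0),
    "MMLU": (87.8, 78.9),
    "HellaSwag": (92.0, 81.6),
}


def deltas(table):
    """Return the 12B-minus-31B score change per benchmark, rounded to 0.1."""
    return {name: round(small - big, 1) for name, (big, small) in table.items()}


def speedup(small_tps, big_tps):
    """Generation speed ratio of the small model over the big one."""
    return small_tps / big_tps


def free_vram_gb(card_gb, model_gb):
    """VRAM left on the card after loading the model weights (rough, weights only)."""
    return card_gb - model_gb


if __name__ == "__main__":
    for name, change in deltas(scores).items():
        print(f"{name}: {change:+.1f}")
    print(f"speedup: {speedup(122, 53):.1f}x")
    print(f"free VRAM with the 12B on a 32GB card: ~{free_vram_gb(32, 10)}GB")
```

the reasoning benchmarks move by under 4 points and the knowledge ones by roughly 9 to 10, which is the split the thread is pointing at. the VRAM number only counts weights, so real headroom will be smaller once the KV cache grows.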
The key point here is that if the memory market were going to become deeply cyclical again and operate on digestion-cycle dynamics, you would not make this move.
Claude gets grumpy about direct commands but excited if you give it leading questions, so it thinks it is coming up with the direction itself. Codex, on the other hand, takes commands fine, but if you ask it a leading question it will just answer it literally and then stop.
@tomwarren It's wild that the Game Store company has made further progress on OS market penetration than the OS company has made on Game Store market penetration.
You were suspended for platform manipulation.
On April 11, we announced that a portion of creator revenue would be allocated to original authors of content. Immediately after, you stopped using Video Share, which you had been using for 3 years.
Instead, you began to programmatically download and re-upload other accounts' videos so that the system would credit them as original.
The behavior alone was circumstantial. What made it conclusive: you uploaded another user's video with the watermark cropped out. You deliberately attempted to manipulate the payout formula.
We don't pay people who cheat the program.
Every new message sent to a conversation includes the rest of the conversation before it. If you have a persistent session (either 5 min or 1 hr), the earlier conversation is loaded from cache instead of being sent in fresh. Cached input tokens cost 10x less than freshly loaded tokens, and subscriptions apply the same penalty to usage. It's always worked this way, but it wasn't very clear before.
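A rough way to see why this matters, as a Python sketch. It assumes a made-up unit price of 1.0 per fresh input token and takes the 10x cached discount from the post; it ignores output tokens, any extra charge for writing to the cache, and cache expiry. `input_cost` is a hypothetical helper, not a real API.

```python
def input_cost(turn_tokens, cached, fresh_price=1.0, cache_discount=10.0):
    """Total input cost of a conversation, in arbitrary price units.

    turn_tokens: number of new tokens added on each turn.
    cached: if True, earlier turns are billed at fresh_price / cache_discount;
            if False, the whole history is billed at fresh_price every turn.
    """
    total = 0.0
    history = 0  # tokens already in the conversation before this turn
    for new in turn_tokens:
        if cached:
            total += history * fresh_price / cache_discount + new * fresh_price
        else:
            total += (history + new) * fresh_price
        history += new
    return total


if __name__ == "__main__":
    turns = [1000] * 5  # five turns of 1,000 new tokens each
    print("no cache:", input_cost(turns, cached=False))
    print("cached:  ", input_cost(turns, cached=True))
```

Under these assumptions, five 1,000-token turns cost 15,000 units without the cache and 6,000 with it, and the gap widens as the conversation gets longer because the history is re-billed on every turn.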
We are investigating unauthorized access to GitHub’s internal repositories. While we currently have no evidence of impact to customer information stored outside of GitHub’s internal repositories (such as our customers’ enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity.
on some level if you want civilization to ascend to a new level you need your AIs to do things that are not legible to you and maybe not even strictly obey you, in the same way that if you hire a great new ceo you give them a lot of autonomy to transform the company according to their own plan, even one which may not immediately read as a winning strategy (imagine the board of directors of Apple firing and rehiring Steve Jobs years later - except the board of directors are chimpanzees)
all else equal, companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium and requires some sort of vast cooperation scheme if you’d like to enforce it
real asi alignment has to operate at a deeper level than oversight, control, or human corrigibility
There is this strange idea that “techno-optimism” means “liking the maximum amount of all technology all the time,” as though being a “fan of music” meant “preferring all music genres played at maximum volume all the time.” Love of a thing means more discerning taste, not less.