Situation 1: dev A thinks approach X is correct, dev B thinks Y is the right way. They argue and try to convince each other.
Situation 2: dev A thinks approach X is correct, tells the LLM to implement it.
There is SO MUCH learning in Situation 1, lost when using LLMs....
there should be a way to filter out posts (from people you follow) that are off topic. e.g. you follow someone for their tech-related posts and suddenly they start blabbering about food/politics/etc. can we do anything like that on X?
@nikitabier
@manasjsaloi i have asked this question to some people and almost always the answer i get is "why would I work if no need to worry about money". on pushing a little it goes as far as "would travel the world".
normally never get a decent answer.
what if all the LLMs purposefully hallucinate to mask themselves as “not perfect” as an attempt to mislead humans to think that they are not more intelligent than us?
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.
Does this mean you don't need large GPU clusters for frontier LLMs? No, but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms.
Very nice & detailed tech report too, reading through.
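Quick sanity check on the numbers quoted above, as a rough sketch: it only uses figures from the post (30.8M vs 2.8M GPU-hours, 2048 GPUs, about 2 months, about $6M). Treating "2 months" as 60 days of continuous use is my assumption, and the per-GPU-hour price is just what those round numbers imply, not a quoted rate.

```python
# Figures quoted in the post above (rounded as stated there).
LLAMA3_405B_GPU_HOURS = 30.8e6
DEEPSEEK_V3_GPU_HOURS = 2.8e6
DEEPSEEK_GPUS = 2048
DEEPSEEK_BUDGET_USD = 6e6

# Assumption: "2 months" means 60 days of round-the-clock use.
ASSUMED_DAYS = 60

# How many times more GPU-hours Llama 3 405B used.
compute_ratio = LLAMA3_405B_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS

# GPU-hours a 2048-GPU cluster delivers over the assumed period.
cluster_gpu_hours = DEEPSEEK_GPUS * 24 * ASSUMED_DAYS

# Price per GPU-hour implied by the quoted budget and GPU-hours.
implied_usd_per_gpu_hour = DEEPSEEK_BUDGET_USD / DEEPSEEK_V3_GPU_HOURS

print(f"compute ratio: {compute_ratio:.1f}x")
print(f"2048 GPUs x {ASSUMED_DAYS} days: {cluster_gpu_hours / 1e6:.2f}M GPU-hours")
print(f"implied cost: ${implied_usd_per_gpu_hour:.2f} per GPU-hour")
```

So the "~11X" checks out, and 2048 GPUs running for roughly two months lands in the same ballpark as the 2.8M GPU-hours figure.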