@y0b1byte In section 2.3.2 they say that cold-started RL still had language-mixing problems. They had to introduce a language-matching reward specifically to mitigate this.
I have never been able to generate an accurate chess board in any position. This may be an “AI hard” task: solving it probably requires a breakthrough in reasoning @GaryMarcus
Frontier models like GPT-4o (and now Claude 3.5 Sonnet) may be at the level of a "Smart High Schooler" in some respects, but they still struggle on basic tasks like tic-tac-toe. There was hope that native multimodal training would help but that hasn't been the case.
A short post on the best architectures for real-time image and video processing.
TL;DR: use convolutions with stride or pooling at the low levels, and stick self-attention circuits at higher levels, where feature vectors represent objects.
PS: ready to bet that Tesla FSD uses convolutions (or perhaps more complex *local* operators) at the low levels, combined with more global circuits at higher levels (perhaps using self-attention). Transformers on low-level patch embeddings are a complete waste of electrons.
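A back-of-the-envelope sketch of why full self-attention on low-level patches is expensive: the number of query-key pairs in one attention layer grows with the square of the token count. The frame size and patch sizes below are made-up illustrative numbers, not figures from any real system.

```python
def attention_pairs(height: int, width: int, patch: int) -> int:
    """Query-key pairs in one full self-attention layer over a grid of patches."""
    tokens = (height // patch) * (width // patch)
    return tokens * tokens


# Hypothetical 1280x960 frame (assumption, for illustration only).
low_level = attention_pairs(960, 1280, patch=8)    # attention directly on small patches
high_level = attention_pairs(960, 1280, patch=64)  # attention after strided convs/pooling shrink the grid 8x per side

# Shrinking each side by 8x cuts the token count by 64x and the pair count by 64 * 64.
ratio = low_level // high_level
```

Under these assumptions the high-level grid has 300 tokens against 19,200 at the patch level, which is why it seems reasonable to let cheap local operators do the downsampling first.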
Octopus v2: On-device language model for super agent
Presents a new method that empowers an on-device 2B model to outperform GPT-4 in both accuracy and latency, and decrease the context length by 95%
https://t.co/J1ikDK4ELx
Google presents Genie
Generative Interactive Environments
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
Google DeepMind presents Grandmaster-Level Chess Without Search
paper page: https://t.co/qwpbAb9DL7
Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
Three logicians walk into a bar.
The bartender asks: 'Does everyone want a drink?'
The first logician says: 'I don't know.'
The second logician says: 'I don't know.'
The third logician says: 'Yes.'
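For anyone who wants the reasoning spelled out, here is a small sketch of it. The assumption is that each logician knows only their own preference, hears every earlier answer, and answers truthfully; the function name is made up for this example.

```python
def logician_answers(wants_drink: list[bool]) -> list[str]:
    """Answers to 'Does everyone want a drink?', given in speaking order."""
    answers = []
    someone_declined = False
    for i, wants in enumerate(wants_drink):
        if someone_declined or not wants:
            # Once anyone does not want a drink, "everyone" is known to be false.
            answers.append("No")
            someone_declined = True
        elif i == len(wants_drink) - 1:
            # The last speaker wants one, and nobody before said "No".
            answers.append("Yes")
        else:
            # Wants a drink but cannot speak for those who have not answered yet.
            answers.append("I don't know")
    return answers


bar = logician_answers([True, True, True])
```

Each "I don't know" quietly signals "I want one", since a logician who did not want a drink could have answered "No" straight away.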
this paper's nuts. for sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by.... gzip https://t.co/6eZiXlJxOX
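A minimal sketch of the idea, assuming the usual normalized compression distance (NCD) with gzip-compressed lengths and a plain majority-vote kNN on top. The function names and the toy training set are invented for illustration; this is not the paper's code.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share structure."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training texts."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for this example.
train = [
    ("the cat sat on the mat and looked at the other cat", "pets"),
    ("quarterly revenue growth exceeded analyst forecasts by twelve percent", "finance"),
]
label = knn_predict("the cat sat on the mat and looked at the other cat", train, k=1)
```

There are no trained parameters here at all: the compressor plays the role of the representation, and kNN does the rest.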
@anmarasovic @soldni For my blog search, I tokenize searchable text into trigrams before ranking with BM25. It’s snappy because it’s all happening in the browser and gives intuitive results. https://t.co/J32RPzTzd4
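A rough sketch of what trigram tokenization plus BM25 can look like. It is written in Python for illustration rather than as the in-browser code, it assumes character trigrams, and it uses a common BM25 variant with default-style `k1` and `b` values; all names and documents are hypothetical.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Overlapping character trigrams of the lowercased text (assumed tokenization)."""
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices sorted from best to worst BM25 score."""
    if not docs:
        return []
    doc_tokens = [trigrams(d) for d in docs]
    avg_len = sum(len(t) for t in doc_tokens) / len(doc_tokens) or 1.0
    df = Counter()  # number of documents containing each trigram
    for tokens in doc_tokens:
        df.update(set(tokens))
    n_docs = len(docs)
    query_terms = set(trigrams(query))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Made-up posts; the query contains a typo on purpose.
posts = [
    "notes on tokenizers and byte pair encoding",
    "a recipe for sourdough bread",
    "weekend hiking trip photos",
]
ranking = bm25_rank("tokenizr", posts)
```

Because the misspelled query still shares most of its trigrams with "tokenizers", the first post comes out on top, which is one plausible reason trigram search feels forgiving.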
I've seen a lot of people asking "why does everyone think Twitter is doomed?"
As an SRE and sysadmin with 10+ years of industry experience, I wanted to write up a few scenarios that are real threats to the integrity of the bird site over the coming weeks.
It's here–the deepest, sharpest infrared view of the universe to date: Webb's First Deep Field.
Previewed by @POTUS on July 11, it shows galaxies once invisible to us. The full set of @NASAWebb's first full-color images & data will be revealed July 12: https://t.co/63zxpNDi4I
DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.
The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.
A thread (1/n)🧵