happy to announce that we've gotten rid of tokenizers!
especially excited about what we've replaced them with: end-to-end trainable modules that not only learn to group characters into (sub)words, but can iterate to group words into phrases and further higher-order concepts
see @sukjun_hwang's thread for more details
Tokenization has been the final barrier to truly end-to-end language models.
We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
Also new in Claude Code: dynamic workflows (research preview).
For the hardest tasks, Claude makes a plan, runs hundreds of parallel subagents, and verifies its work before reporting back. Think a migration touching hundreds of files.
Read more: https://t.co/7gt06kGkDN