Finally got around to try the Slug algorithm. It's fantastic. I made a small demo and an article that explain how to prepare your font data on my blog: https://t.co/1850vSNqy2
New blog post: A Decade of Slug
This talks about the evolution of the Slug font rendering algorithm, and it includes an exciting announcement: The patent has been dedicated to the public domain.
https://t.co/xWEz0q2c4N
Somebody on r/LocalLLaMA trained an LLM from scratch on London texts from 1800 to 1875
Fun artifact
> “telephone” invented in 1876
> dataset stops at 1875
> so when you prompt “telephone”
> the model treats it like
> some secret diplomatic device
> or a mysterious apparatus
Model & Data
> 1.2B parameters
> ~90GB corpus
> books, journals, legal documents
> religious writing, medical papers
Tokenizer
> custom tokenizer
> trained on the same dataset
Training
> ~182k training steps
> trained on a rented H100 SXM