As you know, last year I resigned from my full-time job and become a professional technical book writer. I have plans for The Hundred-Page Books about reinforcement learning, computer vision, diffusion models, and more.
Not having a full-time job has put a significant strain on my savings, so if you love my books and are waiting for more, I would appreciate your continuous support.
I have finalized the offer for those who financially support me. Here's what I offer:
- As a Cool Supporter 10 ๐ ($10/month), you will get early access to new book chapters, exclusive updates on what to really care about in AI and why (only the important stuff, no hype), and a free e-book copy of all new books.
- As a Cool Supporter 30 ๐ ($30/month), you will get everything a Cool Supporter 10 ๐ gets, plus a signed hard copy of all new books.
The names of all Cool Supporters ๐ will be mentioned in the Acknowledgment section of the new books as important contributors. Your name, printed in the book, will remain in history forever!
If you decide to become a Cool Supporter ๐, you can choose from several platforms. Here's the list: https://t.co/1Bj1wAArBj
If you choose Substack, you will also get a subscription to a weekly AI newsletter without ads.
The course The Hundred-Page Language Models Course by Andriy Burkov is on sale on Leanpub! Its suggested price is $179.00; get it for $89.00 with this coupon: https://t.co/CSlVrYTzCU @burkov#ai#gpt#textbooks#data_science
This paper introduces Probabilistic TRM (Tiny Recursive Models), a novel, task-agnostic framework that significantly boosts the accuracy of TRMs on complex reasoning tasks like Sudoku-Extreme and Pencil Puzzle Bench, outperforming frontier LLMs at a fraction of the computational cost: https://t.co/DQaWDC8a3d
@GrizzledTexan It works from pretraining, not finetuning. The finetuning should only correct for such cases as in my screnshot, where the LLM pretends it's insulted, because this results in poor UX.
This post has collected comments from dummies not understanding how LLMs work and what are efficient ways of getting from them the behavior the user needs.
Typing an insult as a response to a wrong output is the fastest and the most keystroke-efficient way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand.
LLMs have been proven to produce better quality and more focused output when threatened or insulted. Also, they have been proven to be responsive to flattery and pity.
However, insults are the easiest way for the user to show to the LLM that it's way off the expected behavior.
If you are insulting the LLM for any other reason, you are wasting your time and money.
Look at this fragile soul. Hallucinated an explanation that doesn't make sense. I keep pushing for a concrete example and it keeps evading providing it. Then I call it what it is and the buddy is hurt ๐ฅน
@XirtamEsrevni :-) I could, but I don't have evidence that insults in Russian work better. Also, switching between the keyboard languages makes the interaction less efficient.
@DJROYTBURG Idiot, typing an insult is the fastest way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand. This also works with some online bots. This is why I started this reply with "Idiot,".
Look at this fragile soul. Hallucinated an explanation that doesn't make sense. I keep pushing for a concrete example and it keeps evading providing it. Then I call it what it is and the buddy is hurt ๐ฅน
@BaumMensch16 Typing an insult is the fastest way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand.
@hanabananawaa Idiot, typing an insult is the fastest way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand. This also works with some online bots. This is why I started this reply with "Idiot,".
@Maya_BsAs Typing an insult is the fastest way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand.
@MrBildo Idiot, typing an insult is the fastest way to communicate to the LLM that you strongly disagree with it or that you demand strict compliance with your previous demand.
This also works with some online bots. This is why I started this reply with "Idiot,"
That was the last straw, @TekSavvyBuzz. I have been your loyal customer for 20 years. Have got much better deals from Bell and Vidรฉotron, but I stayed with you to support independent Internet providers. Despite the fact that for the money I paid you I could get a 20 times faster internet from them.
And you cut off my internet in the middle of the night like I was some thief? For my credit card being full for 10 days and I forgot to pay it? After 20 years of choosing you out of good faith?
Fuck you, @TekSavvyBuzz. Fuck you very much!
@GrizzledTexan Infection is in those politicians' and bureaucrats' brains. Really, the entire EU apparatus and their puppets in local governments are the cancer of Europe.
Unfortunately, entire generations have been brainwashed into accepting this as normal.
The Green and Left parties who worked hard to make this happen are murderers. With 40 degrees Celsius outside, being in a non-air-conditioned hospital is a death sentence.
Diffusion language models, which generate text by repeatedly filling in masked parts rather than writing strictly from left to right, are becoming a serious alternative to standard transformer language models, but the usual ways of improving reasoning after training were built for left-to-right generation and do not fit the diffusion models.
The authors of this paper introduce a self-training method in which the model first produces its own answer, then uses parts of that completed answer as helpful context while learning how earlier denoising steps should behave.
Instead of comparing teacher and student one next token at a time, it compares them at each denoising step, which matches how diffusion language models actually generate text.
Across math and planning tasks, d-OPSD matches or beats supervised finetuning and reinforcement-learning baselines, while needing far fewer optimization steps than the main RL baseline.
https://t.co/eRqYj4YPwi