Had this vision for a very long time, so happy to finally be able to make it a reality! 🎉
Our new Maishiv search experience reflects how we try to create our AI experiences thoughtfully.
We want to take advantage of the power of AI, but with the extra care and responsibility that working with Torah demands.
Will AI replace us?
I think this line in the tweet best defines where we will be replaced, and therefore where we will still be needed:
𝐀𝐧𝐝 𝐦𝐨𝐫𝐞 𝐠𝐞𝐧𝐞𝐫𝐚𝐥𝐥𝐲, 𝐚𝐧𝐲 𝐦𝐞𝐭𝐫𝐢𝐜 𝐲𝐨𝐮 𝐜𝐚𝐫𝐞 𝐚𝐛𝐨𝐮𝐭 𝐭𝐡𝐚𝐭 𝐢𝐬 𝐫𝐞𝐚𝐬𝐨𝐧𝐚𝐛𝐥𝐲 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐭𝐨 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐞 (𝐨𝐫 𝐭𝐡𝐚𝐭 𝐡𝐚𝐬 𝐦𝐨𝐫𝐞 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐩𝐫𝐨𝐱𝐲 𝐦𝐞𝐭𝐫𝐢𝐜𝐬 𝐬𝐮𝐜𝐡 𝐚𝐬 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐚 𝐬𝐦𝐚𝐥𝐥𝐞𝐫 𝐧𝐞𝐭𝐰𝐨𝐫𝐤) 𝐜𝐚𝐧 𝐛𝐞 𝐚𝐮𝐭𝐨𝐫𝐞𝐬𝐞𝐚𝐫𝐜𝐡𝐞𝐝 𝐛𝐲 𝐚𝐧 𝐚𝐠𝐞𝐧𝐭 𝐬𝐰𝐚𝐫𝐦. 𝐈𝐭'𝐬 𝐰𝐨𝐫𝐭𝐡 𝐭𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐚𝐛𝐨𝐮𝐭 𝐰𝐡𝐞𝐭𝐡𝐞𝐫 𝐲𝐨𝐮𝐫 𝐩𝐫𝐨𝐛𝐥𝐞𝐦 𝐟𝐚𝐥𝐥𝐬 𝐢𝐧𝐭𝐨 𝐭𝐡𝐢𝐬 𝐛𝐮𝐜𝐤𝐞𝐭 𝐭𝐨𝐨.
That feels like a great framework for thinking about where humans may be replaced, and where humans will still provide real value.
If there is a clear metric, and it can be tested and iterated on, the problem becomes much better defined. At that point, the raw compute advantage of computers makes them very hard to compete with.
But a lot of important things do not fit neatly into that bucket.
Often the hard part is defining what success even means.
Jeff Bezos makes this point well in the video below. Knowing what to measure is often the actual work.
What do you actually want? You cannot tell an AI agent to optimize for something unless you first know what you want optimized.
And sometimes the thing just cannot really be tested in advance.
Often, you are working with small signals, and you have to make a judgment call.
And this lines up with something I keep noticing when I listen to David Senra’s podcast (which I love).
Almost every episode, when you hear from a great founder or CEO, there is some version of the same point.
At the end of the day, so much of the value came from trusting their intuition before the data was obvious.
I think this is what people mean when they say “taste” is becoming one of the most AI-resistant skills.
Taste is the ability to notice something before the data is obvious.
It is understanding what you want, understanding what other people want, having strong intuition, and having the courage to follow it.
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was an already fairly well-tuned project, tuned by hand.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for two decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
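Here is a toy Python sketch showing why a parameter-free QK-norm can leave attention "too diffuse". It makes two assumptions:

- QK-norm takes its simplest form, which is L2-normalizing each query and key.
- A single scalar multiplier is applied to the logits.

The post does not say exactly how nanochat normalizes, or where the agent placed its multipliers. So treat the function names and the value 4.0 as illustrative.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (the parameter-free part of QK-norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Vector, keys: List[Vector], scale: float) -> Vector:
    """Attention weights for one query when q and every k are L2-normalized.

    After normalization each dot product is a cosine similarity in [-1, 1],
    so `scale` is the only thing controlling how peaked the softmax can get.
    """
    qn = l2_normalize(q)
    logits = [scale * sum(a * b for a, b in zip(qn, l2_normalize(k))) for k in keys]
    return softmax(logits)


q = [1.0, 0.0]
keys = [[3.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]  # cosine similarities: 1, 0, -1

diffuse = attention_weights(q, keys, scale=1.0)  # no multiplier
sharp = attention_weights(q, keys, scale=4.0)    # hypothetical multiplier
print([round(w, 3) for w in diffuse])  # [0.665, 0.245, 0.09]
print([round(w, 3) for w in sharp])    # [0.982, 0.018, 0.0]
```

With cosine logits in [-1, 1] and no multiplier, the largest possible ratio between two attention weights is e^2 ≈ 7.4, however well a key matches. A tuned or learned multiplier removes that ceiling.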
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
https://t.co/WAz8aIztKT
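The loop described in the quoted post can be sketched as a greedy keep-if-better search: propose a change, measure validation loss, keep what helps, and build on it. This minimal Python sketch rests on two loud assumptions:

- The agent's real idea generation, which plans new experiments from earlier results, is replaced by a fixed list of candidate changes.
- `toy_val_loss` is a hypothetical stand-in for a short training run.

None of these names come from autoresearch or nanochat.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
Change = Callable[[Config], Config]


def greedy_autoresearch(
    base: Config,
    candidates: List[Change],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, int]:
    """Apply each candidate change on top of the current best config and keep
    it only if the evaluation metric (e.g. validation loss) goes down."""
    best = dict(base)
    best_loss = evaluate(best)
    kept = 0
    for change in candidates:
        trial = change(dict(best))  # try the change on a copy of the best config
        loss = evaluate(trial)
        if loss < best_loss:        # lower is better, so keep the change
            best, best_loss = trial, loss
            kept += 1
    return best, best_loss, kept


# Hypothetical stand-in for "train a small model and read off validation loss".
def toy_val_loss(cfg: Config) -> float:
    return (cfg["lr"] - 0.02) ** 2 + (cfg["wd"] - 0.1) ** 2


candidates: List[Change] = [
    lambda c: {**c, "lr": 0.02},  # helps
    lambda c: {**c, "wd": 0.5},   # hurts, so it is rejected
    lambda c: {**c, "wd": 0.1},   # helps, and stacks on the first change
]

best, best_loss, kept = greedy_autoresearch({"lr": 0.1, "wd": 0.0}, candidates, toy_val_loss)
print(best, best_loss, kept)  # {'lr': 0.02, 'wd': 0.1} 0.0 2
```

Each change is evaluated on top of everything already kept, so accepted changes stack by construction. Whether they still help at a larger scale has to be checked separately, as the post did at depth=24.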
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
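The "promote the most promising ideas to increasingly larger scales" step resembles a successive-halving filter. This minimal Python sketch assumes each idea can be scored by a cheap proxy at each scale, and that the best half survives each round. The post does not specify the promotion rule, and the depths, effect sizes, and `toy_loss` below are made up for illustration.

```python
from typing import Callable, List


def promote(
    ideas: List[str],
    evaluate: Callable[[str, int], float],
    scales: List[int],
    keep_fraction: float = 0.5,
) -> List[str]:
    """Score ideas at each scale, cheapest first, and carry only the best
    fraction forward to the next, more expensive scale (lower score = better)."""
    survivors = list(ideas)
    for scale in scales:
        ranked = sorted(survivors, key=lambda idea: evaluate(idea, scale))
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors


# Hypothetical effect sizes; a real run would get these from training runs.
TRUE_EFFECT = {"a": 0.30, "b": 0.10, "c": 0.25, "d": 0.05}


def toy_loss(idea: str, depth: int) -> float:
    return 3.0 - TRUE_EFFECT[idea] * (1 + 0.1 * depth)


print(promote(["a", "b", "c", "d"], toy_loss, scales=[12, 16, 24]))  # ['a']
```

Real small-scale results are noisy, and not every improvement transfers. That is why each survivor is re-scored at the larger scale instead of being trusted from the cheap run.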
Forgot to mention maybe the most consequential update of all in the video: you can now upload full OCR outputs as well. 🤦♂️
That means you can upload full PDFs of seforim, even hundreds of pages long, and make them searchable inside Maishiv.
I think that is pretty cool.
Improved Maishiv upload experience, now with bulk uploads for @Sofer_Ai transcripts and documents.
You can upload multiple files at once, including folders of Word documents, and add them directly to your Knowledge Base, making it much easier to build and search across your Torah content in Maishiv.
Truly unbelievable. Hard for me to wrap my head around the fact that what started as a little project has been able to have such an impact.
This is not the goal in and of itself, but it is a reminder that what we are working on is real.
Thank you to Hashem, and to all those who helped, encouraged, and supported along the way.
Excited for what is ahead.
@OfficialLoganK The price-to-performance ratio makes it hard to find a use case for it.
It is way more expensive than 3 Flash, but at the same time it's not pushing the capability frontier either. It's stuck in between, which puts it in a weird space.
Primary reason why @Sofer_Ai is NOT a nonprofit
"[it] needs to be a profitable enterprise that stands on its own two feet...because it's a measure of its relevance. If people won't pay for our product it's not a good enough product"
SORKIN: Why lay people off at the Post? Why fire people?
BEZOS: Because the Post needs to be a profitable enterprise that stands on its own two feet
SORKIN: Does it? Some people say it should be a trust
BEZOS: Yes. It's a measure of its relevance. If people aren't paying for our product, it's not a good enough product
We try to keep the inbox sacred and only send emails for major updates or promotions. But at the same time, we are putting out small updates and improvements constantly. So we created https://t.co/yEeb8Mrymy
Update: Updates
You can now track all https://t.co/5vx6VSQPxN updates in one place: new features, fixes, improvements, and rule changes.
Check it out here: https://t.co/PhlGYzZ1VV
Super cool that @Sofer_Ai got to be a part of this amazing project, AishU, led by the phenomenal @NoahOmriLevin.
When I first learned how they were using @Sofer_Ai and AI in general, I was blown away.
A lot of individuals and organizations are trying to build AI chatbots, but the approach @NoahOmriLevin and his team took struck me as extremely sophisticated, thought out, and purposefully designed for Jewish education.
If you’re curious to learn more about how they did it, check out the write-up in the article!! It’s fantastic!
You've been asking for this one...
Now in preview: Codex in the ChatGPT mobile app.
Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.
@Sofer_Ai can now transcribe audio that is primarily in French, even when it also contains Hebrew, Aramaic, or Yiddish.
We’re excited to bring the power of Torah transcription to the French-speaking Torah community. Shiurim, classes, and recordings can now be more easily searched, studied, shared, and preserved.
French support is still in beta, and we’re actively working to improve it. If you try it with French Torah audio, your feedback would be extremely valuable: transcription accuracy, formatting, speaker identification, Torah vocabulary, or anything else that could be improved.
We’re looking forward to opening this new possibility to the French-speaking community.
With AI, the 80/20 rule has turned into 90/5:
90% of the work gets done in 5% of the time.
And now that last 10% feels even more painstaking than before.
But (for now at least) that last 10% is the differentiator.
Thank you so much, I really appreciate it!
I'm trying to be better about posting the updates we implement here, on LinkedIn, and to our email list.
Right now the biggest upcoming one, which I hope to have ready soon, is an AI index/mafteach creator for a sefer. You could upload a whole sefer, and it would generate the index for you so you can easily identify where different topics are discussed.
If you would like to beta test it, feel free to email or DM me, and I can make that happen when it's ready.