Patrick Batman

@aamoneymaker

No way

Joined March 2023

322 Following

48 Followers

1.6K Posts

Pinned Tweet

Patrick Batman @aamoneymaker

about 3 years ago

That the real is on the rayzzzz fuck them oth3rrrrr guyžzzzzz

833

aamoneymaker retweeted

Nav Toor

@heynavtoor

5 months ago

Google has open-sourced LangExtract: A Python library that turns unstructured documents into structured data, with clear source references for every result. 100% open-source.

https://t.co/8IB3dibWVz

128

125

aamoneymaker retweeted

Alex Veremeyenko

@alex_verem

5 months ago

the best 20 accounts to follow in AI:
@karpathy = LLMs king
@steipete = built openclaw
@gregisenberg = startup ideas king
@rileybrown = vibecode king
@corbin_braun = cursor king
@jackfriks = solo apps king
@levelsio = solo startups king
@marclou = solo startups king
@EXM7777 = AI ops + systems king
@eptwts = AI money twitter king
@godofprompt = prompt king
@vasuman = AI agents king
@AmirMushich = AI ads king
@0xROAS = AI UGCs king
@egeberkina = AI images king
@MengTo = AI landing pages king
@rryssf = automations king
@kloss_xyz = systems architecture king
@emollick = AI science king
@Hesamation = AI/ML king
follow them all and learn.

322

17K

aamoneymaker retweeted

Boris Cherny

@bcherny

5 months ago

I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different than how I use it. Remember: there is no one right way to use Claude Code -- everyone's setup is different. You should experiment to see what works for you!

924

51K

103K

aamoneymaker retweeted

Fitz

@FitzGPT

5 months ago

CLAUDE CODE TACTICS FROM ITS CREATOR 🚀
This is how the Claude Code team uses it 👇🏻
Save it, you'll need it 📌

👉 Work in parallel, worktrees are king
Open 3–5 separate git worktrees and run an independent Claude Code in each. This is where the Claude Code team got its biggest productivity gain.

👉 Enter PLAN mode first for every complex task
If you put all your energy into plan mode, Claude Code can even one-shot your request. When the work goes off the rails, go straight back to plan mode; don't force it.

👉 Update the Claude Code docs so it trains itself!
After every mistake, explain the mistake and say "save this and don't do it again." The error rate drops over time. Some people keep a notes folder per project and point to it; you can try that.

👉 Turn repeated tasks into skills
If you do something more than once a day → turn it into a command or skill.

👉 Leave bug fixes to Claude Code
Have a review tool do the bug analysis, then paste it + say "fix" → done. Then say, for example, "fix the CI tests."

👉 Level up your prompting
- "Check and test the changes, don't open a PR until the tests pass"
- After every fix: "Now rewrite the solution from scratch with everything you know"
- Give a detailed spec, remove all ambiguity.

👉 Learning mode
Turn on the explanatory/learning style → it explains the why. Ask it to turn complex code into an HTML presentation or an ASCII diagram; it is more effective. Call on its ability to fill in the gaps: you explain, and Claude Code asks questions and fills in the gaps.

Thank you @bcherny ♥️
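The parallel-worktree tip can be sketched in a few lines of Python. This is only an illustration: the repository path, the task names, and the sibling-directory naming scheme are assumptions, not something the tweet specifies. The function builds one `git worktree add -b <branch> <path>` command per task and leaves running them to the caller.

```python
from pathlib import Path
from typing import List


def worktree_commands(repo: Path, task_names: List[str]) -> List[List[str]]:
    """Build one `git worktree add` command per task, each on its own new branch."""
    commands = []
    for name in task_names:
        # Hypothetical layout: each worktree lives next to the main checkout.
        path = repo.parent / f"{repo.name}-{name}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", name, str(path)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, after which a separate Claude Code session can be started inside each new directory.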

203

305

20K

aamoneymaker retweeted

atul

@atullchaurasia

5 months ago

Research papers you must read for AI Engineer interviews -
1. Attention Is All You Need (Transformers)
2. LoRA (Low-Rank Adaptation)
3. PEFT (Parameter-Efficient Fine-Tuning)
4. ViT (Vision Transformers)
5. VAE (Variational Autoencoder)
6. GANs (Generative Adversarial Networks)
7. BERT (Bidirectional Encoder Representations from Transformers)
8. Diffusion Models (Stable Diffusion)
9. RAG (Retrieval-Augmented Generation)
10. GPT (Generative Pre-trained Transformers)

419

143K

aamoneymaker retweeted

Print Hello Berat

@zekkontro33

5 months ago

In my 8 years of mobile development, I have experienced this many times: if building the app is half the marathon, publishing it to the store is the other half. And the most boring one. Between the App Store/Play Store submission, screenshots, localization, and privacy texts, we drift away from actual product development. There is a solution to all of this, come along with me +++

481

128K

aamoneymaker retweeted

Harrison Chase

@hwchase17

5 months ago

LangChain vs langgraph vs deepagents: when to use each one

324

441

66K

aamoneymaker retweeted

Mehmet

@xenit_v0

5 months ago

Simplicity saves lives. Instead of 16 agents and 50 skills, 1 agent with 15 skills works much better when properly organized. Maestro

428

781

51K

aamoneymaker retweeted

Semih Kışlar

@semihdev

5 months ago

I see a lot of people in my feed starting a Claude Code subscription. If any of you have made the switch, I strongly recommend taking a look at the Claude Code usage patterns shared by this guy, who wrote code with LLMs for 2,000 hours in 2025.

https://t.co/q9r8f0TIxy

95K

aamoneymaker retweeted

Abdullah Yılmaz

@ASAPabdllh

5 months ago

By the way, the Everbee extension, which shows the search volume of competitors' tags and the listing age on Etsy, is FREE 💯💯💯 It was free before and it is still free 🤯 If you're saying "man, we don't know anything," just install the extension and relax.

131

161

10K

aamoneymaker retweeted

Akshay 🚀

@akshay_pachaar

5 months ago

Everyone is sleeping on this new paper from AWS.

A model 100x smaller than GPT and Claude crushed them on tool calling.

AWS researchers took Facebook's OPT-350M, a model from 2022 with 500x fewer parameters than GPT, and fine-tuned it on ToolBench for a single epoch.

The results are wild:

↳ Their SLM: 77.55% pass rate
↳ ChatGPT-CoT: 26%
↳ ToolLLaMA: 30%
↳ Claude-CoT: 2.73%

Here's what's happening:

Large models suffer from "parameter dilution." Most of their capacity is optimized for general language tasks, not the precise Thought-Action-Action Input patterns that tool calling needs.

A small model trained specifically on tool calling concentrates all its capacity on that one thing. No distractions.

The training setup was surprisingly simple. Hugging Face TRL, 187K examples, learning rate of 5e-5, and aggressive gradient clipping for stability.

But I want to be clear on something:

This doesn't mean small models win everywhere. The authors acknowledge their model may struggle with complex contextual nuances or ambiguous requests. It's a specialist, not a generalist.

Still, if you're building agentic systems and want to cut inference costs by orders of magnitude, this is worth paying attention to.

I've shared link to the paper in the next tweet.
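To make the "Thought-Action-Action Input" pattern mentioned above concrete, here is a minimal Python sketch of a parser for one such step. The exact field labels, the `ToolCall` type, and the `parse_tool_call` helper are illustrative assumptions; the tweet does not show the format used in the paper's training data.

```python
import re
from typing import NamedTuple, Optional


class ToolCall(NamedTuple):
    thought: str
    action: str
    action_input: str


# One "Thought / Action / Action Input" step, in that order (assumed labels).
_STEP = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>.*?)\s*"
    r"Action Input:\s*(?P<action_input>.*)",
    re.DOTALL,
)


def parse_tool_call(text: str) -> Optional[ToolCall]:
    """Return the parsed step, or None if the text does not follow the pattern."""
    match = _STEP.search(text)
    if match is None:
        return None
    return ToolCall(
        thought=match["thought"].strip(),
        action=match["action"].strip(),
        action_input=match["action_input"].strip(),
    )
```

A model output that drops or reorders any of the three fields fails to parse, which is one way to see why a benchmark pass rate is sensitive to a model following the format exactly.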

544

500

37K

aamoneymaker retweeted

merve

@mervenoyann

5 months ago

🙌🏻 Qwen3-VL has all you need for e2e multimodal RAG. I have put together a notebook you can run on a free Colab (T4)! 🔥

462

384

38K

aamoneymaker retweeted

Tech with Mak

@techNmak

5 months ago

Meta just solved RAG's biggest bottleneck.

30× faster decoding. Zero accuracy loss.

The problem nobody talks about:

When you feed an LLM 80 retrieved passages, only 5-10 are actually useful.

The rest? Dead weight. But you're computing attention for ALL of them.

The math is brutal:

Traditional RAG with 16K context: → 100+ seconds to first token → 10× throughput drop → Massive memory waste

What REFRAG does:
Compresses context chunks into single embeddings.

Instead of processing 16,384 tokens → Process 1,024 chunk embeddings.

The results:
✓ 30.85× faster time-to-first-token
✓ Zero perplexity loss
✓ 16× context extension (4K → 64K tokens)
✓ 3.75× better than previous SOTA

Why it works:
RAG contexts have sparse attention patterns. Most retrieved passages don't interact. REFRAG exploits this with:

1./ Precomputable embeddings - Cached from retrieval, reused across inferences
2./ RL-based compression - Smart policy decides what to compress
3./ Works anywhere - Unlike previous methods, compresses at any position

Real impact:
• 8 passages at single-passage latency
• Better accuracy with weak retrievers
• Handles unlimited conversation history
• No model architecture changes needed

This changes RAG economics: More context + Lower latency.

(Link to the Meta paper in comments)

♻️ Repost to save someone $$$ and a lot of confusion.
✔️ You can follow @techNmak, for more insights.
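The "16,384 tokens → 1,024 chunk embeddings" figure above implies chunks of 16 tokens. The Python sketch below works through that arithmetic and shows the general shape of collapsing each chunk into one vector. Mean pooling is used here purely as a stand-in, since the tweet does not say how the chunk embeddings are produced, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def compress_chunks(token_embeddings: List[Vector], chunk_size: int) -> List[Vector]:
    """Collapse each run of `chunk_size` token vectors into one vector by averaging."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    compressed = []
    for start in range(0, len(token_embeddings), chunk_size):
        chunk = token_embeddings[start:start + chunk_size]
        dim = len(chunk[0])
        compressed.append(
            [sum(vec[i] for vec in chunk) / len(chunk) for i in range(dim)]
        )
    return compressed


def attention_pairs(sequence_length: int) -> int:
    """Number of query-key pairs in full self-attention over the sequence."""
    return sequence_length * sequence_length


tokens = 16_384
chunk_size = 16                      # implied by 16,384 / 1,024
positions = tokens // chunk_size     # 1,024 chunk embeddings
reduction = attention_pairs(tokens) // attention_pairs(positions)
print(positions, reduction)
```

With these numbers, full attention over the context has 256 times fewer query-key pairs. This is only the quadratic attention term; the end-to-end speedups quoted in the tweet depend on more than this one factor.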

188

64K

aamoneymaker retweeted

Matt Dancho (Business Science)

@mdancho84

5 months ago

🚨NEW: Awesome Generative AI Data Scientist GitHub Repo

LangGraph Ecosystem:

1. Prebuilt Agents
2. AI Data Science Agents
3. LangMem
4. LangGraph Supervisor
5. Open Deep Research
6. LangGraph Reflection
7. LangGraph Big Tool
8. LangGraph CodeAct
9. LangGraph Swarm
10. LangGraph MCP Adapters

385

393

19K

aamoneymaker retweeted

Huseyin Hobek

@huseyinhobek_

5 months ago

@acerionsjournal let me add another tool then: https://t.co/z9IPVOlUQs

761

aamoneymaker retweeted

Alican Selçuk

@Alican_Selcuk

5 months ago

@acerionsjournal I wish you had shared the original post too: https://t.co/E8OYoahwxj

aamoneymaker retweeted

AcerionsJournal

@acerionsjournal

5 months ago

REPO: https://t.co/Jy5Q5ugqww

141

aamoneymaker retweeted

Andrej Karpathy

@karpathy

5 months ago

New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent. For the first public release of nanochat my focus was on an end-to-end pipeline that runs the whole LLM pipeline with all of its stages. Now after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models fixing the FLOPs budget. (For every FLOPs target you can train a small model a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots:

Which is just a baby version of this plot from Chinchilla:
Very importantly and encouragingly, the exponent on N (parameters) and D (tokens) is equal at ~=0.5, so just like Chinchilla we get a single (compute-independent) constant that relates the model size to token training horizons. In Chinchilla, this was measured to be 20. In nanochat it seems to be 8!

Once we can train compute optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M batch sizes on an 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size.

Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so instead I use the CORE score (from DCLM paper). I calculated it for GPT-2 and estimated it for GPT-3, which allows us to finally put nanochat nicely and on the same scale:
The total cost of this miniseries is only ~$100 (~4 hours on 8XH100). These experiments give us confidence that everything is working fairly nicely and that if we pay more (turn the dial), we get increasingly better models.

TLDR: we can train compute optimal miniseries and relate them to GPT-2/3 via objective CORE scores, but further improvements are desirable and needed. E.g., matching GPT-2 currently needs ~$500, but imo should be possible to do <$100 with more work.

Full post with a lot more detail is here:
https://t.co/na8zVLqWLf
And all of the tuning and code is pushed to master and people can reproduce these with the scaling_laws.sh and miniseries.sh bash scripts.
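The "single constant that relates the model size to token training horizons" can be turned into a small worked calculation. The Python sketch below assumes the widely used approximation C ≈ 6·N·D for training FLOPs, which the post itself does not state, and the budget value is hypothetical. The ratios 20 (Chinchilla) and 8 (nanochat) are the ones quoted above.

```python
import math
from typing import Tuple


def compute_optimal(flops_budget: float, tokens_per_param: float) -> Tuple[float, float]:
    """Split a FLOPs budget into (parameters N, tokens D) with D = tokens_per_param * N.

    Assumes the common approximation C ~= 6 * N * D for training FLOPs.
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


budget = 1e18  # hypothetical FLOPs budget, not a figure from the post
for label, ratio in [("Chinchilla", 20.0), ("nanochat", 8.0)]:
    n, d = compute_optimal(budget, ratio)
    print(f"{label}: N = {n:.3e} params, D = {d:.3e} tokens")
```

Under this approximation both N and D grow as the square root of the budget, which is consistent with the equal exponents of about 0.5 described in the post.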

226

673

713K

aamoneymaker retweeted

Ozancan Özdemir

@OzancanOzdemir

5 months ago

Every post from this guy is like a mini lecture. In short, Karpathy has framed LLM training as a family of models controlled by a "compute" dial, and they have reproduced the Chinchilla plots one to one. By spending only 100 dollars they got scores on the level of GPT-2/3, and the goal is to bring this cost down even further. They have also laid out a roadmap for open-source LLM training.

113

125

15K

aamoneymaker retweeted

Femke Plantinga

@femke_plantinga

5 months ago

Good context engineering isn’t just 𝘩𝘰𝘸 you chunk.

It’s 𝘸𝘩𝘦𝘯 you chunk.

And that timing choice creates two completely different architectures.

𝗣𝗿𝗲-𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴: (the classic way)
Everything happens offline before a user ever sends a query.

𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄: clean → chunk → embed → store → fast retrieval

𝗣𝗿𝗼𝘀:
✓ Lightning-fast retrieval
✓ Simple, stable architecture
✓ Predictable performance

𝗖𝗼𝗻𝘀:
⚠️ Locked into your chunking strategy
⚠️ Costly to change chunk sizes
⚠️ Not adaptive to query context

𝗣𝗼𝘀𝘁-𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴: (the advanced approach)
Instead of chunking upfront, you chunk after retrieval, based on the actual query.

𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄: retrieve doc → chunk dynamically → rerank → send to LLM

𝗣𝗿𝗼𝘀:
✓ Query-aware chunking
✓ More relevant results
✓ Adapts to diverse question types

𝗖𝗼𝗻𝘀:
⚠️ Higher latency
⚠️ More infrastructure
⚠️ More compute per query

𝗪𝗵𝗶𝗰𝗵 𝘀𝗵𝗼𝘂𝗹𝗱 𝘆𝗼𝘂 𝗰𝗵𝗼𝗼𝘀𝗲?
• 𝗣𝗿𝗲-𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴 → you need speed + simplicity
• 𝗣𝗼𝘀𝘁-𝗰𝗵𝘂𝗻𝗸𝗶𝗻𝗴 → you need flexibility + relevance

Want to go deeper? The free guide about chunking strategies for context engineering breaks it down 🧡
https://t.co/tIKNrsbWy1
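The two workflows above can be contrasted in a short Python sketch. Everything here is a toy stand-in: word-count chunking replaces a real chunker, keyword overlap replaces embeddings and reranking, and the function names are hypothetical. The point is only where the `chunk` call sits relative to retrieval.

```python
from typing import List


def chunk(text: str, size: int) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    """Toy relevance: number of distinct query words that appear in the passage."""
    passage_words = set(passage.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in passage_words)


def pre_chunking(docs: List[str], query: str, size: int) -> str:
    """Chunk every document up front, then retrieve the best chunk."""
    # In a real system this index is built once, offline, before any query.
    index = [c for doc in docs for c in chunk(doc, size)]
    return max(index, key=lambda c: score(query, c))


def post_chunking(docs: List[str], query: str, size: int) -> str:
    """Retrieve the best whole document first, then chunk it for this query."""
    best_doc = max(docs, key=lambda d: score(query, d))
    # Chunking happens at query time, so `size` could be chosen per query.
    return max(chunk(best_doc, size), key=lambda c: score(query, c))
```

In `pre_chunking` the chunk size is fixed when the index is built, which is the "locked into your chunking strategy" cost; in `post_chunking` the chunking work is paid on every query, which is the latency and compute cost.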

673

117

595

32K
