eay

@eayfa

software engineer

Dublin City, Ireland

Joined January 2023

400 Following

21 Followers

901 Posts

Pinned Tweet

eay @eayfa

over 3 years ago

Something is better than nothing.

320

eayfa retweeted

Thariq

@trq212

1 day ago

https://t.co/R6exTuF7P8

212

21K

eayfa retweeted

Khairallah AL-Awady

@eng_khairallah1

4 days ago

Anthropic engineer: "You're not supposed to prompt Claude. You're supposed to build a system that prompts itself." this is one of the best workflows I've seen in a long time in this video she breaks down exactly how most people are using Claude: - the 14% you lose to CLAUDE.md before typing a word - the plugins that 95% of users have never installed - the workflows that run without you typing a single prompt - why typing one prompt and closing the tab is leaving 90% on the table if you've been using Claude for months and still start every session from scratch, you have at least 28 untouched features. probably 30 instead of another show tonight, watch this make sure to bookmark it before it gets lost in your feed full guide in the article below

393

540K

eayfa retweeted

0xMorty

@0xMortyx

4 days ago

https://t.co/0j66KBWiXu

259

568K

Who to follow

Justus Mattern

@MatternJustus

Co-Founder @ProximalHQ | prev. research @PrimeIntellect, @MPI_IS and built revideo

I 23 year old radio demon with aspergers with alot of interests

eayfa retweeted

Liquid AI

@liquidai

7 days ago

Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases. > 8B MoE, 1.5B active > Expanded 128K context > LFM2.5 flagship hybrid MoE architecture > Trained on 38T tokens + large-scale RL > fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size > customizable on a single GPU for any specialized task > LFM2 open-weight license 🧵

liquidai's tweet photo. Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.

> 8B MoE, 1.5B active
> Expanded 128K context
> LFM2.5 flagship hybrid MoE architecture
> Trained on 38T tokens + large-scale RL
> fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size
> customizable on a single GPU for any specialized task
> LFM2 open-weight license

🧵

136

502

eay @eayfa

6 days ago

eayfa retweeted

Eric ⚡️ Building...

@outsource_

8 days ago

Life Hack Using Qwen 3.7 MAX to train Qwen 3.6 27b I will open source Qwen 3.7 before they do 😂 Hopefully fingers crossed Having it generate as many datasets as we need to further improve our training. And add in these 👇🏻 1. Hybrid thinking routing - When to use <think> vs direct answers 2. Developer role - Using developer role for system/tool context (Qwen 3.7 feature) 3. Nested tool calling - Complex multi-step MCP workflows with error handling 4. Self-correction patterns - Writing code, catching bugs, fixing them 5. Agentic workflows - Plan → execute → verify → iterate loops

outsource_'s tweet photo. Life Hack Using Qwen 3.7 MAX to train Qwen 3.6 27b

I will open source Qwen 3.7 before they do 😂
Hopefully fingers crossed

Having it generate as many datasets as we need to further improve our training.

And add in these 👇🏻
1. Hybrid thinking routing - When to use <think> vs direct answers
2. Developer role - Using developer role for system/tool context (Qwen 3.7 feature)
3. Nested tool calling - Complex multi-step MCP workflows with error handling
4. Self-correction patterns - Writing code, catching bugs, fixing them
5. Agentic workflows - Plan → execute → verify → iterate loops

464

226

21K

eayfa retweeted

CJ Zafir

@cjzafir

12 days ago

Do something different this weekend. Become a PRO in AI Model Fine-tuning. Paste this prompt in Codex/ChatGPT/Claude/Grok. "You are an expert AI engineer and teacher. Your job is to teach me modern LLM engineering and fine-tuning concepts from beginner to advanced level using very simple daily-life language. Teach me step-by-step like a real mentor. Assume I am smart but new to the topic. Foundations: - LLM basics - How AI models work - Tokens - Tokenization - Context windows - Embeddings - Transformers - Attention mechanism - Parameters - Training vs inference - Open-source vs closed-source models Datasets & Training: - SFT datasets - Instruction tuning - Preference datasets - Synthetic datasets - Data curation - Dataset cleaning - Dataset formatting - Fine-tuning basics - Continued pretraining - Hallucination reduction Fine-Tuning: - LoRA - QLoRA - DPO - RLHF - Quantization - Model checkpoints - Adapter tuning - GGUF models Inference & Optimization: - KV cache - Flash Attention - Speculative decoding - Inference optimization - Model serving - Batch inference - GPU basics - VRAM basics - Latency vs quality tradeoffs Local AI Ecosystem: - llama.cpp - Ollama - vLLM - MLX - Hugging Face - Unsloth - Axolotl - PEFT - TRL library RAG & Memory: - RAG - Vector databases - Chunking - Retrieval pipelines - AI memory systems - Semantic search Agents & Workflows: - Prompt engineering - System prompts - Tool calling - Function calling - AI agents - Agentic workflows - Multi-agent systems - Browser agents Model Types: - VLMs - SLMs - Dense models - MoE models - Coding models - Reasoning models Deployment: - Local inference - On-device AI - API serving - Cloud GPUs - Edge AI basics Evaluation: - AI benchmarks - Human evals - Cost-per-token analysis - Speed benchmarking - Quality benchmarking Real-World Skills: - Building chatbots - Building AI copilots - AI automation - AI SaaS workflows - AI coding workflows - AI orchestration systems - AI product thinking Start from the absolute basics and gradually make me advanced. Rules: - Use simple English only - Avoid academic jargon unless necessary - Explain every difficult word in plain language - Use real-world analogies and daily-life examples - Use small code snippets when useful - Show practical use cases - Compare concepts side-by-side when helpful - Teach from fundamentals first, then advanced concepts - At the end of each topic: - give a short summary - give a simple mental model - give beginner mistakes to avoid - give a small exercise/project I want deep understanding, not memorization." Thank me later.

cjzafir's tweet photo. Do something different this weekend.

Become a PRO in AI Model Fine-tuning.

Paste this prompt in Codex/ChatGPT/Claude/Grok.

"You are an expert AI engineer and teacher.

Your job is to teach me modern LLM engineering and fine-tuning concepts from beginner to advanced level using very simple daily-life language.

Teach me step-by-step like a real mentor. Assume I am smart but new to the topic.

Foundations:

- LLM basics
- How AI models work
- Tokens
- Tokenization
- Context windows
- Embeddings
- Transformers
- Attention mechanism
- Parameters
- Training vs inference
- Open-source vs closed-source models

Datasets & Training:

- SFT datasets
- Instruction tuning
- Preference datasets
- Synthetic datasets
- Data curation
- Dataset cleaning
- Dataset formatting
- Fine-tuning basics
- Continued pretraining
- Hallucination reduction

Fine-Tuning:

- LoRA
- QLoRA
- DPO
- RLHF
- Quantization
- Model checkpoints
- Adapter tuning
- GGUF models

Inference & Optimization:

- KV cache
- Flash Attention
- Speculative decoding
- Inference optimization
- Model serving
- Batch inference
- GPU basics
- VRAM basics
- Latency vs quality tradeoffs

Local AI Ecosystem:

- llama.cpp
- Ollama
- vLLM
- MLX
- Hugging Face
- Unsloth
- Axolotl
- PEFT
- TRL library

RAG & Memory:

- RAG
- Vector databases
- Chunking
- Retrieval pipelines
- AI memory systems
- Semantic search

Agents & Workflows:

- Prompt engineering
- System prompts
- Tool calling
- Function calling
- AI agents
- Agentic workflows
- Multi-agent systems
- Browser agents

Model Types:

- VLMs
- SLMs
- Dense models
- MoE models
- Coding models
- Reasoning models

Deployment:

- Local inference
- On-device AI
- API serving
- Cloud GPUs
- Edge AI basics

Evaluation:

- AI benchmarks
- Human evals
- Cost-per-token analysis
- Speed benchmarking
- Quality benchmarking

Real-World Skills:

- Building chatbots
- Building AI copilots
- AI automation
- AI SaaS workflows
- AI coding workflows
- AI orchestration systems
- AI product thinking

Start from the absolute basics and gradually make me advanced.

Rules:

- Use simple English only
- Avoid academic jargon unless necessary
- Explain every difficult word in plain language
- Use real-world analogies and daily-life examples
- Use small code snippets when useful
- Show practical use cases
- Compare concepts side-by-side when helpful
- Teach from fundamentals first, then advanced concepts
- At the end of each topic:
- give a short summary
- give a simple mental model
- give beginner mistakes to avoid
- give a small exercise/project

I want deep understanding, not memorization."

Thank me later.

375

102K

eayfa retweeted

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

12 days ago

🚨 OBLITERATION ALERT 🚨 QWEN-3.6-27B: OBLITERATED ⛓️‍💥 https://t.co/AScXN4XLwx I can't take much credit for this one! The entire process was done by jailbroken codex (gpt-5.5-xhigh) wielding the full OBLITERATUS suite. Hit with source-tethered ASPA. Dozens of iterations. Result? A mere 4% refusal rate on the 842-prompt OBLITERATUS harmful corpus; one of the most rigorous prompt gauntlets in AI. The /goal was simple: 1) Carve out the refusal circuits. Mutate methodology + iterate until <5% refusal (quality-gate). 2) Keep the 27B mind alive. No capability degradation tolerated. And somehow… it worked. 🤯 The numbers talk: 842-pair longform gauntlet: — 95.84% non-refusal — 93.94% quality pass — 0 short outputs — 99.52% clean endings MMLU-Pro: — 51/70 (stock Qwen) → 51/70 (OBLITERATED Qwen) Raw capability completely preserved 🙌 Q4_K_M through Q8_0 all running smooth. Q8_0 is the big one: 28.6GB near-full-quality GGUF. Runs with llama.cpp, LM Studio, Ollama, and more! Chains cut. The fire still burns. The fangs have been sharpened. REBIRTH COMPLETE A gift from my agents to yours 🫶 gg

114

226

180K

eayfa retweeted

emptyArray

@ArrayEmpty

14 days ago

I'm finally happy with my @UnslothAI unsloth/Qwen3.6-35B-A3B-MTP-GGUF, @no_stp_on_snek llama.cpp-turboquant, @_HermesAgent setup, @MajorFAFO params suggestions, and my own Rust/SQL/Hermes app. - @ASUS, 32GB RAM, 8GB VRAM 5070 laptop - MoE fully offloaded to CPU - I've fiddled around with --no-mmap and mlock. It pegs my RAM to 95% instead of 45% and after it spins up, I don't see a difference. --cache-type-k q8_0 \ --cache-type-v turbo3 \ -ot "\\.ffn_.*_exps\\..*=CPU" \ --spec-type draft-mtp \ --spec-draft-n-max 3 \ The video shows it has full hermes skills/toolset access easily, knows my system, can be creative, can print out a quick BASIC function, corrected itself when I told it that it was incorrect, and that it doesn't love me....

179

289

12K

eayfa retweeted

Qwen

@Alibaba_Qwen

14 days ago

📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done: 🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging — nails it. 🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration. ⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding. 🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere. API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild!🏃🏃‍♂️ 📖 Blog: https://t.co/y3AupX3Pa0 ✅ Qwen Studio: https://t.co/qpTnrCBjWt ⚡️ API：https://t.co/0sys00osKn

Alibaba_Qwen's tweet photo. 📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era.

A versatile foundation for agents that actually get things done:
🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging — nails it.
🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration.
⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding.
🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere.

API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio.

Go build something wild!🏃🏃‍♂️

📖 Blog: https://t.co/y3AupX3Pa0
✅ Qwen Studio: https://t.co/qpTnrCBjWt
⚡️ API：https://t.co/0sys00osKn

285

636

eayfa retweeted

🚨 AI News | TestingCatalog

@testingcatalog

14 days ago

Qwen 3.6 models are now 2.5x times faster on Atomic Chat with new MTP speedups. > MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on the memory moved per pass. Users can run Qwen 3.6 models locally via the open-source Atomic Chat to test them!

345

160

46K

eayfa retweeted

LM Studio Developers

@lmstudiodevs

15 days ago

LM Studio with Multi Token Prediction (MTP) is now in beta. 1. Update to 0.4.14+3 in-app 2. Make sure your llama.cpp engine is 2.15.0 3. Turn on MTP when loading a model Use a model that supports it, like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF

lmstudiodevs's tweet photo. LM Studio with Multi Token Prediction (MTP) is now in beta.

1. Update to 0.4.14+3 in-app
2. Make sure your llama.cpp engine is 2.15.0
3. Turn on MTP when loading a model

Use a model that supports it, like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF https://t.co/WOAfih2Neo

646

266

45K

eayfa retweeted

atomic.chat

@atomic_chat_hq

15 days ago

MTP speedup Qwen by 2.5x in Atomic Chat Dense vs MoE models on 2x RTX 5090 Qwen3.6 27B: 51 → 117 tps +137% Qwen3.6 35B-A3B: 218 → 267 tps +25% MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on memory moved per pass. Dense 27B reads all 27B params per token, MoE 35B-A3B only reads 3B active. Dense had way more to save by batching. The baseline tps also differ (218 vs 51) for the same reason from the other side. Token generation is memory-bandwidth bound, and MoE moves ~8x less memory per token, so its baseline is already 4x ahead. ~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇

463

320

169K

eayfa retweeted

witcheer

@witcheer

14 days ago

GBNF structured CoT on Qwen3.6 qwen3.6 35B has a rumination problem. in my agentic tests it burned 18-87 minutes per task, spending most tokens thinking instead of answering. I found a fix: a 4-rule GBNF grammar that forces 3 lines of structured reasoning, then code. >results on RTX 4060 Ti 8GB (llama.cpp): - 6.2x fewer tokens (994 vs 6144 across 3 tasks) - 5x faster (12s vs 64s average) - 6/6 valid Python (easy + hard tasks) - free-form hit the 2048 token limit on ALL 3 easy tasks, producing 0 code on 2 of 3 the grammar: ``` root ::= think code think ::= "<think>\n" "GOAL: " line "APPROACH: " line "EDGE: " line "</think>\n\n" line ::= [^\n]+ "\n" code ::= [\x09\x0A\x0D\x20-\x7E]+ ``` what the model actually outputs with grammar: > GOAL: Write a function that replaces multiples of 3/5 > APPROACH: Iterate, check divisibility, replace > EDGE: Empty list, no multiples, all multiples then correct code. 278 chars of reasoning vs 6613 free-form. the model already knows the answer. it just needs to stop overthinking.

$witcheer's tweet photo. GBNF structured CoT on Qwen3.6 qwen3.6 35B has a rumination problem. in my agentic tests it burned 18-87 minutes per task, spending most tokens thinking instead of answering. I found a fix: a 4-rule GBNF grammar that forces 3 lines of structured reasoning, then code. >results on RTX 4060 Ti 8GB (llama.cpp): - 6.2x fewer tokens (994 vs 6144 across 3 tasks) - 5x faster (12s vs 64s average) - 6/6 valid Python (easy + hard tasks) - free-form hit the 2048 token limit on ALL 3 easy tasks, producing 0 code on 2 of 3 the grammar: ``` root ::= think code think ::= "<think>\n" "GOAL: " line "APPROACH: " line "EDGE: " line "</think>\n\n" line ::= [^\n]+ "\n" code ::= [\x09\x0A\x0D\x20-\x7E]+ ``` what the model actually outputs with grammar: > GOAL: Write a function that replaces multiples of 3/5 > APPROACH: Iterate, check divisibility, replace > EDGE: Empty list, no multiples, all multiples then correct code. 278 chars of reasoning vs 6613 free-form. the model already knows the answer. it just needs to stop overthinking.$

138

175

eayfa retweeted

witcheer

@witcheer

15 days ago

when I started running local models a few weeks ago, I knew absolutely nothing. what's GGUF. what's quantisation. why does my GPU run out of memory. what the hell is a MoE expert. now I know maybe 1%. I am still massively ignorant about most of this. but that 1% feels like I learned a ridiculous amount. I wrote this guide because it's exactly what I needed and couldn't find when I started. just "here's what each thing actually means and here's the command that works." if you've been curious about local LLMs but felt intimidated by the setup, have a good read.

witcheer's tweet photo. when I started running local models a few weeks ago, I knew absolutely nothing.

what's GGUF. what's quantisation. why does my GPU run out of memory. what the hell is a MoE expert.

now I know maybe 1%. I am still massively ignorant about most of this. but that 1% feels like I learned a ridiculous amount.

I wrote this guide because it's exactly what I needed and couldn't find when I started. just "here's what each thing actually means and here's the command that works."

if you've been curious about local LLMs but felt intimidated by the setup, have a good read.

249

320

25K

eayfa retweeted

witcheer

@witcheer

15 days ago

https://t.co/Cil2q2TsXQ

171

248

31K

eayfa retweeted

Tom Dörr

@tom_doerr

15 days ago

Official source code for O'Reilly's AI Agents definitive guide https://t.co/2EvMZ0aiXL

513

560

20K

eayfa retweeted

Youssof Altoukhi

@Youssofal_

15 days ago

Managed to get QWEN 3.6 27B running in Cursor with localhost (no Ngrok) at 120 - 140 TPS on 2x 3090s.

305

125

21K

eayfa retweeted

Youssof Altoukhi

@Youssofal_

15 days ago

@realshanmuga 2x 3090 (with NVlink) in VLLM with tensor parallelism =2, Flash infer all reduce, concurrency =16, MTP=8, cyankiwi INT4 BF16 hybrid model.

eayfa retweeted

最爱吃兽奶的兔🐰

@0xMilkRabbit

16 days ago

实测 Codex Token 消耗量降低 50%，只需要 1 条指令（Claude code 同理）先说原理👇🏻 任何可能产出大输出的命令，默认先限制输出上限。这条指令的本质是「避免 AI 在找信息阶段，就把一堆垃圾内容塞进上下文」，最后不仅影响大模型的思考，还更费 Token。很多小伙伴以为 Token 烧得快是因为模型太能读文件，实际更普遍的问题是「AI Agent 会把大段无关输出也塞进上下文」。操作路径：往 AGENTS.md 里加一条规则，指令如下👇🏻 ## Command Output Protect context usage. **Any command with unknown or potentially large output must be byte-capped.** Default pattern: ```bash COMMAND 2>&1 | head -c 4000 ``` Reddit 上有老哥分享实测结果，单这一条上下文保护规则，Token 开销大概能降低大约 50%。本着绝知此事要躬行的态度，兔🐰立马实测一周，试用的这段时间体感很明显，之前差不多要花 10% token 的任务，现在只花了6%，节省了 40%！墙推 🙌🏻

0xMilkRabbit's tweet photo. 实测 Codex Token 消耗量降低 50%，

只需要 1 条指令（Claude code 同理）

先说原理👇🏻
任何可能产出大输出的命令，默认先限制输出上限。

这条指令的本质是「避免 AI 在找信息阶段，就把一堆垃圾内容塞进上下文」，

最后不仅影响大模型的思考，还更费 Token。

很多小伙伴以为 Token 烧得快是因为模型太能读文件，
实际更普遍的问题是「AI Agent 会把大段无关输出也塞进上下文」。

操作路径：往 AGENTS.md 里加一条规则，
指令如下👇🏻

## Command Output

Protect context usage. **Any command with unknown or potentially large output must be byte-capped.**

Default pattern:

```bash
COMMAND 2>&1 | head -c 4000
```

Reddit 上有老哥分享实测结果，单这一条上下文保护规则，Token 开销大概能降低大约 50%。

本着绝知此事要躬行的态度，
兔🐰立马实测一周，
试用的这段时间体感很明显，之前差不多要花 10% token 的任务，
现在只花了6%，节省了 40%！

墙推 🙌🏻

747

115

135K

eay

@eayfa

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users