Hakan Uysal @hakanu_ - Twitter Profile

Pinned Tweet

Hakan Uysal @hakanu_

over 7 years ago

Story of my life

0

29

2

0

Hakan Uysal @hakanu_

about 3 hours ago

Then distributes your private traces to 60 different providers with unclear t&c instead of 1 with somewhat clear t&c.

Vaibhav Sisinty

@VaibhavSisinty

about 6 hours ago

I just found a tool that makes your Claude Code sessions basically unlimited. It's called 9Router and it's trending on GitHub right now. It sits between Claude Code and 60+ AI providers. One local endpoint. That's it. When your Claude Code quota runs out, it switches to a cheaper model. When that runs out, it drops to a completely free one. You don't notice the switch. Your session never stops. → Works with Claude Code, Cursor, Codex, Cline, Copilot, and more. One setup covers your entire stack. → Built-in token compression saves 20 to 40% on every request. Same answers, fewer tokens to get there. → Tracks your quota per provider in a live dashboard so you always know where you stand. → Translates between OpenAI, Claude, and Gemini formats automatically. Any tool talks to any provider. The free tier alone is wild. Kiro gives you unlimited Claude Sonnet 4.5. iFlow gives you unlimited Kimi, GLM, and MiniMax. Qwen gives you unlimited Qwen 3 Coder. Setup is two steps. Install it, point your tool at localhost:20128. Done. For anyone burning through Claude credits mid-session or tired of hitting rate limits at 2am, this changes what's possible on a near-zero budget.

VaibhavSisinty's tweet photo. I just found a tool that makes your Claude Code sessions basically unlimited. It's called 9Router and it's trending on GitHub right now.

It sits between Claude Code and 60+ AI providers. One local endpoint. That's it.

When your Claude Code quota runs out, it switches to a cheaper model.

When that runs out, it drops to a completely free one. You don't notice the switch. Your session never stops.

→ Works with Claude Code, Cursor, Codex, Cline, Copilot, and more. One setup covers your entire stack.

→ Built-in token compression saves 20 to 40% on every request. Same answers, fewer tokens to get there.

→ Tracks your quota per provider in a live dashboard so you always know where you stand.

→ Translates between OpenAI, Claude, and Gemini formats automatically. Any tool talks to any provider.

The free tier alone is wild. Kiro gives you unlimited Claude Sonnet 4.5. iFlow gives you unlimited Kimi, GLM, and MiniMax. Qwen gives you unlimited Qwen 3 Coder.

Setup is two steps. Install it, point your tool at localhost:20128. Done.

For anyone burning through Claude credits mid-session or tired of hitting rate limits at 2am, this changes what's possible on a near-zero budget.

10

55

8

106

6K

0

76

Hakan Uysal @hakanu_

4 days ago

Tuning my news feed

0

48

Hakan Uysal @hakanu_

5 days ago

since the heatwave is gone so my GPU and ROG can reach better inference speeds. ROG ally does more throughput than free Gemini and my GPU almost 5x free gemini. Still 370h to go.

hakanu_'s tweet photo. since the heatwave is gone so my GPU and ROG can reach better inference speeds.

ROG ally does more throughput than free Gemini and my GPU almost 5x free gemini.

Still 370h to go. https://t.co/FZnlJW51wC

0

29

Who to follow

Onur Karaagaoglu 📷🚴‍♂️☕️✈️

@onurka

Building Global Network Infra at @cloudflare Previously @google, @microsoft, @uber I like photography, cycling, coffee, and traveling.

Sinan Onur Altınuç

@sinanonur

Helping Companies Implement AI Pragmatically | Ex-AI and R&D Lead | AI and Cognitive Science Enthusiast, PhD in Cognitive Science

Abdullatif Köksal

@akoksal_

Research Scientist @GoogleDeepMind | PhD @LMU_Muenchen and @Cambridge_Uni

Hakan Uysal @hakanu_

24 days ago

Using my Asus ROG ally x to run Gemma 4 12b qat. 24gb unified memory, 16g allocated for gpu, 30 token / second (no thinking). Not too bad. I even hooked OpenClaw up with local llm backend through lmstudio with tool calls and telegram handling including transcription of my telegram voice messages.

hakanu_'s tweet photo. Using my Asus ROG ally x to run Gemma 4 12b qat. 24gb unified memory, 16g allocated for gpu, 30 token / second (no thinking). Not too bad.

I even hooked OpenClaw up with local llm backend through lmstudio with tool calls and telegram handling including transcription of my telegram voice messages.

1

3

0

498

Hakan Uysal @hakanu_

9 days ago

a non thinking model (qwen2.5-7b-instruct-4bit) can reach 12.68 tok/sec. easy double.

2

1

0

43

Hakan Uysal @hakanu_

13 days ago

Intelligence is getting more expensive. So I have been doing a study for the last 10d on how the free endpoints work out. Is self hosting LLMs worth it? Asked Claude to build a pool of endpoints: self hosted lmstudio based 4060 TI 16G (w/ google/gemma-4-12b-qat), self hosted lms based ROG Ally X (see below), gemini flash lite free endpoint, cerebras, groq, openrouter free quota. Task: Classify guldumnet posts if there is a need for moderation (offensive content) and do image description to explain the joke hence multi modality is essential. And i have around 500k posts so going over sequentiallybecause my self hosted vram depletes quickly if I go too parallel). With this setup it will still take 38d to finish one pass over all the posts 🤷‍♂️ My non-scientific ranking: 1. Gemini: Responds within a second, no sass, generous free quota, top tier OCR and Turkish understanding. 2. Groq: very fast inference, not so bad free quota (using llama-4-scout-17b-16e-instruct) 3. Cerebras: huge model (gpt-oss-120b) and very fast inference, yet doesn't support images in free mode and free quota is not so generous. 4. 4060: not bad but inference times go up to 11s. gemma-4-12b-qat is a great model for world knowledge. 5. Asus ROG (Steamdeck equivalent): Same gemma4 model, 30s mean response time but very reliable, has been running for 10 days with 0 errors, slow but sure. 6. Openrouter: worst, i even topped up 10 usd to make it pseudo free, still there is no clear guidelines about the rate limits unlike others. I will try more free tiers soon.

hakanu_'s tweet photo. Intelligence is getting more expensive.

So I have been doing a study for the last 10d on how the free endpoints work out. Is self hosting LLMs worth it?

Asked Claude to build a pool of endpoints: self hosted lmstudio based 4060 TI 16G (w/ google/gemma-4-12b-qat), self hosted lms based ROG Ally X (see below), gemini flash lite free endpoint, cerebras, groq, openrouter free quota.

Task: Classify guldumnet posts if there is a need for moderation (offensive content) and do image description to explain the joke hence multi modality is essential. And i have around 500k posts so going over sequentiallybecause my self hosted vram depletes quickly if I go too parallel). With this setup it will still take 38d to finish one pass over all the posts 🤷‍♂️

My non-scientific ranking:

1. Gemini: Responds within a second, no sass, generous free quota, top tier OCR and Turkish understanding.

2. Groq: very fast inference, not so bad free quota (using llama-4-scout-17b-16e-instruct)

3. Cerebras: huge model (gpt-oss-120b) and very fast inference, yet doesn't support images in free mode and free quota is not so generous.

4. 4060: not bad but inference times go up to 11s. gemma-4-12b-qat is a great model for world knowledge.

5. Asus ROG (Steamdeck equivalent): Same gemma4 model, 30s mean response time but very reliable, has been running for 10 days with 0 errors, slow but sure.

6. Openrouter: worst, i even topped up 10 usd to make it pseudo free, still there is no clear guidelines about the rate limits unlike others.

I will try more free tiers soon.

Hakan Uysal @hakanu_

24 days ago

Using my Asus ROG ally x to run Gemma 4 12b qat. 24gb unified memory, 16g allocated for gpu, 30 token / second (no thinking). Not too bad. I even hooked OpenClaw up with local llm backend through lmstudio with tool calls and telegram handling including transcription of my telegram voice messages.

1

3

0

498

0

1

287

Hakan Uysal @hakanu_

13 days ago

it worked, we are so back

0

15

Hakan Uysal @hakanu_

14 days ago

Ethernet port was loose and it was not reaching 1000mb/s and it was stuck at 100mb/s, needed some extraordinary measures. (Yes old laptop as Media server, because non-arm processors are good with x265)

hakanu_'s tweet photo. Ethernet port was loose and it was not reaching 1000mb/s and it was stuck at 100mb/s, needed some extraordinary measures.

(Yes old laptop as Media server, because non-arm processors are good with x265) https://t.co/zwqUKdqyH2

1

0

67

Hakan Uysal @hakanu_

22 days ago

@mertcobanov @aivilope Ben de :) kolay gelsin, güzel projeler, keep it up 🔥

0

29

Hakan Uysal @hakanu_

22 days ago

@mertcobanov @aivilope Ben de https://t.co/g21onL4CMz için öyle başlamıştım 😊 zor iş, kolay gelsin

1

0

39

Hakan Uysal @hakanu_

24 days ago

What Fable (almost) one-shot today: get all my emails in gmail and download their attachments and make everything searchable including pdf to markdown conversion and build a ui on top so that i can search and view things. 4h later:

hakanu_'s tweet photo. What Fable (almost) one-shot today: get all my emails in gmail and download their attachments and make everything searchable including pdf to markdown conversion and build a ui on top so that i can search and view things.

4h later: https://t.co/6bzgh3Mbuv

0

1

0

75

Hakan Uysal @hakanu_

26 days ago

My cursor usage for $20 plan 🍚

1

0

81

Hakan Uysal @hakanu_

26 days ago

Gifting gold at the weddings ❌ - Gifting Lego at the kid birthdays ✅

0

2

0

95

Hakan Uysal @hakanu_

27 days ago

Obviously @opencode zen has free models and OC is a great harness with very fast loading times, with deepseek-v4-flash you can one shot a lot of features for free.

0

53

Hakan Uysal @hakanu_

3 months ago

if you seek excitement in life: alias gemini="gemini --yolo"

1

0

77

Hakan Uysal @hakanu_

27 days ago

1. Claude code (cc) for _serious_ projects like https://t.co/Ajtv25wgjU (pun) 2. Cursor CLI (agent) for everything else, Composer 2.5 is extremely powerful and fast. Generous limits for pro and you will have access to Opus models. 3. Antigravity CLI (agy) still requires some ironing for being a daily driver, nice free alternative. 4. Gemini CLI (gcli) for troubleshooting prod, being my SRE to keep things running.

1

0

120

Hakan Uysal @hakanu_

30 days ago

For factual up-to-date text generation tasks: good budget king has just been released: google/gemma-4-12b. Disable thinking and reach incredible token / sec speeds. Not for coding though, qwen is still beating there.

Hakan Uysal @hakanu_

3 months ago

Ai edge gallery running gemma4-e2b on pixel 10: getting close to 5-6 token / second. Works fully offline. It also has image understanding capability. Probably going to move guldumnet's image understanding pipeline to a local model instead gcloud vision api.