Sachin

@sachindetrax

Building Vramcoder 🔥

Joined August 2021

129 Following

100 Followers

995 Posts

Pinned Tweet

Sachin

@sachindetrax

5 months ago

SWE-Bench is mostly Python. Our codebase is Rails + Phlex + Stimulus. So we built our own SWE-Bench using real PRs.
Results 👇
GPT-5.3 Codex: ~0.70 quality, < $1
Opus 4.6: ~0.61 quality, ~$5
Codex shipped better code at ~1/7th the cost. Opus 4.6 barely improved over 4.5.

Sachin

@sachindetrax

about 10 hours ago

@SlimTradeyBaby i thought 3.7 landed lol

Sachin

@sachindetrax

about 15 hours ago

If you're looking to use Hermes Agent, the best model under 20B right now is Ornith 9B. It's fast, follows tool calls reliably, and works surprisingly well for real-world agent workflows. If you're running locally and want a strong balance of speed and capability, it's an easy recommendation. https://t.co/eiEvm7LX6P

Sachin

@sachindetrax

about 15 hours ago

@MiaAI_lab do you have a set of prompts to benchmark models?

Sachin

@sachindetrax

about 22 hours ago

@analogalok I have a 5070 Ti, lemme reach home and try this

Sachin

@sachindetrax

1 day ago

@nullfoundry Yes, to overcome this I have a plan, let's see how much I can cook

Sachin

@sachindetrax

2 days ago

I think the biggest problem with local LLMs on GPUs with <16GB VRAM is the context limit. What if we built a system that continuously indexes the entire repo, builds dependency/call graphs, understands the architecture, docs, workflows, etc., so the model retrieves only the relevant information instead of loading everything into the context? Since 35B local models still aren’t as capable as GPT-4.8 or Sonnet on harder tasks, we could also add a confidence-based research loop. If the model detects it’s stuck or its confidence is low, it automatically researches the missing information, replans, and retries instead of hallucinating. Feels like this could make local vibe coding actually viable on consumer GPUs. Thoughts?
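A rough sketch of what that loop could look like, assuming a toy in-memory index and a model that reports its own confidence. Every name here (`RepoIndex`, `retrieve`, `solve`, the `model` callable, the 0.7 threshold) is hypothetical and only for illustration; "research" is reduced to walking one more hop of the call graph before retrying.

```python
from dataclasses import dataclass, field


@dataclass
class RepoIndex:
    """Toy index: symbol -> source snippet, plus a call graph (symbol -> callees)."""
    snippets: dict[str, str] = field(default_factory=dict)
    calls: dict[str, list[str]] = field(default_factory=dict)

    def retrieve(self, symbol: str, depth: int) -> list[str]:
        """Return the symbol and its callees up to `depth` hops, breadth-first."""
        seen, frontier = [symbol], [symbol]
        for _ in range(depth):
            next_frontier = []
            for current in frontier:
                for callee in self.calls.get(current, []):
                    if callee not in seen:
                        seen.append(callee)
                        next_frontier.append(callee)
            frontier = next_frontier
        # Only return symbols we actually have source for.
        return [s for s in seen if s in self.snippets]


def solve(symbol, index, model, threshold=0.7, max_rounds=3):
    """Retry with wider context until the model's confidence clears the threshold.

    `model` is a hypothetical callable: list of snippets -> (answer, confidence).
    Returns (answer, confidence, depth used on the last attempt).
    """
    answer, confidence, depth = None, 0.0, 0
    for depth in range(max_rounds):
        context = [index.snippets[s] for s in index.retrieve(symbol, depth)]
        answer, confidence = model(context)
        if confidence >= threshold:
            break  # confident enough, stop widening the context
    return answer, confidence, depth
```

The point of the design is that the context starts tiny (one symbol) and only grows when the model says it is stuck, which is what would keep it inside a small VRAM budget. How a real model would produce a trustworthy confidence score is the open question here.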

2 days ago

@emilstridell @MiaAI_lab @UnslothAI @NVIDIAAI Can we use it for coding though? Wait, I have a tweet, I need your help there

Sachin

@sachindetrax

2 days ago

@emilstridell @MiaAI_lab @UnslothAI @NVIDIAAI I’m currently running qwen 35b a3b with layer offloading, getting around 50-60 tk/s at 120k context. gemma 4 12b sucks hard bro

2 days ago

@Teknium is it possible?

Sachin

@sachindetrax

2 days ago

@Teknium That’s impressive! Congratulations 🙌

Sachin

@sachindetrax

2 days ago

@IntCyberDigest Ironic how a democratic country like the USA is increasingly promoting closed-source AI models, while China is driving the open-source AI ecosystem forward by releasing powerful models at remarkably affordable prices.

sachindetrax retweeted

Sachin

@sachindetrax

2 days ago

@0xSero Ironic how a democratic country like the USA is increasingly promoting closed-source AI models, while China is driving the open-source AI ecosystem forward by releasing powerful models at remarkably affordable prices.

2 days ago

@Teknium I wonder if we can one shot a startup idea after sol + fable moA

Sachin

@sachindetrax

2 days ago

@GeorgeChen92 @MiaAI_lab @UnslothAI @NVIDIAAI No, 5070 Ti

Sachin

@sachindetrax

4 days ago

@Hase852 There will be, but ig it does the self-learning loop thing to get better, so it's better that way. Are you facing a context issue?

Sachin

@sachindetrax

6 days ago

If you're planning to code with a local LLM and have 16GB of VRAM or less, Ornith-1.0-35B is the only model I'd confidently recommend. I've tried a lot of local coding models, and this one genuinely stands out. It follows complex instructions, understands large codebases, writes clean, maintainable code, and stays remarkably consistent throughout long coding sessions. It honestly feels like a different class of local coding model.
I'm running it on my local machine with an RTX 5070 Ti (16GB VRAM) and 32GB RAM, and it's absolutely rock solid. I'm even using a 90K context window with llama.cpp, and it's handling large repositories and long coding sessions far better than I expected.
My current llama.cpp configuration:
llama-server.exe ^
  -hf "%LLMODEL%" ^
  -ngl 999 ^
  -fa on ^
  --n-cpu-moe 20 ^
  -np 1 ^
  -c 90000 ^
  --no-mmap ^
  --cache-type-k q8_0 ^
  --cache-type-v turbo3 ^
  --temp 0.6 ^
  --top-p 0.95 ^
  --top-k 20 ^
  --min-p 0.05 ^
  --presence-penalty 0.0 ^
  --chat-template-file .\qwen_fix.jinja ^
  --reasoning-budget 2048 ^
  --jinja
Massive respect to the Ornith team. This model is genuinely something special. https://t.co/G5rGeUJ2zv
