Andrey Oblivantsev

@eSlider

Software Engineer

Santa Cruz de Tenerife

Joined May 2009

40 Following

17 Followers

46 Posts

eSlider retweeted

Palantir @PalantirTech

2 days ago

Our thoughts on the importance of AI sovereignty. 1. Your AI sovereignty dictates your institution’s future. Sovereignty is the precondition for choice. Relinquishing sovereignty transfers the future choices of your institution to others, who are likely to exploit it for their gain and your loss. 2. Data retention is your treasure. Transfer it at your own peril. Your ability to win is dictated by your ability to recognize and use your unique edges, and you keep winning by compounding the underlying data to generate new insights. Transferring that data hands over access to your pre-existing winning plays and yields the means of production for new ones. 3. Tokenmaxxing hijacks your value orientation and decreases your institutional fortitude and intelligence. The pursuit of high token usage incentivizes disposable scripts over robust software — with the addictive feeling of false progress. There is a reason why those selling tokens refuse to charge based on value. 4. Controlling your weights is controlling your fate. Weights are the distilled form of hard-won, accumulated institutional knowledge. If you let others control your weights, you are allowing them to migrate the alpha of your business to theirs. 5. There is no contradiction between sovereignty and alpha. The architecture that maximally preserves sovereignty is one that enables institutions to own their tribal knowledge, and to compound it as alpha. 6. Politicizing the technical issues involving sovereignty is what your adversary wants. Techno-politicization is the wellspring of false sovereignty. Techno-politicization drives decisions that seem to reduce dependency, but ultimately limit agency — especially on the battlefield in the West. 7. Real expertise is existential. Allowing politics or favoritism to determine your technical decisions rewards whoever is best at politics, not whoever is right. Listen to those closest to the problems, not those speaking most compellingly about them. 8. Learn from institutions that are winning or that have consistently delivered. Institutions facing existential threats do not have the luxury of making technical decisions based on political preferences. 9. Only listen to institutions, countries, and people who have a proven record of being right. A track record of correctness is the best and only signal for future correctness. Judging something as right or wrong based on who you like is exceedingly misguided.

548

Andrey Oblivantsev @eSlider

3 days ago

@Samaytwt 100% Lenovo Legion driven by linux

139

Andrey Oblivantsev @eSlider

3 days ago

After 20+ years shipping web, GIS, data platforms, and self-hosted AI, I consolidated everything into one technical blog: https://t.co/JHLyeWWkzi #Data #DevOps #GraphRAG #SelfHostedAI #Matrix #WebRTC #GoLang #OpenSource #RemoteWork #Architecture #KnowledgeGraph #Platform

eSlider retweeted

dax

@thdxr

3 days ago

holy crap

311

368K

Who to follow

Wizard of LatLng

@_papalapap_

Geospatial Wizardry. Lazy Running. Coffeneuring. Berlin, London, irgendwo im Nirgendwo.

Andrey Oblivantsev @eSlider

7 days ago

@Taniyatweets_ BASIC

Andrey Oblivantsev @eSlider

7 days ago

@SandraRodkey traumerika?

Andrey Oblivantsev @eSlider

7 days ago

@DeepStarts All of them. There is no users, no matter who say what then.

172

Andrey Oblivantsev @eSlider

7 days ago

Ship once, run everywhere serverless(AWS+GCP+Azure). LAMBADA: shared Python in src/. No VMs, no NAT. Webhooks, blob storage, peer sync, DuckDB on S3. $0 on free tier if you stay within limits. https://t.co/t2b1c2XuJ4 #serverless #python #aws #gcp #azure

Andrey Oblivantsev @eSlider

10 days ago

@MerlijnTrader Is double spend tx issue solved?

eSlider retweeted

regent0x

@regent0x_

13 days ago

$4,500/month from one sentence: “your documents go nowhere” a law firm’s managing partner couldn’t explain where cloud AI sends their privileged files every vendor he asked said “it’s secure” but none could say where the data actually went the consultant set a 58-watt box on the table and said “with this, the answer is nowhere” deal signed before he left the building the partner had been stuck for months. associates wanted AI tools, he couldn’t approve anything because nobody could tell him where the data physically lived every vendor gave the same non-answer: “it’s encrypted, it’s compliant, it’s secure” none of them could say where the consultant didn’t pitch features. he brought a cluster that draws less power than a lightbulb and answered the one question every regulated client actually loses sleep over the demo: → four small mainboards clustered together, pulling 58 watts total → runs a 70B model entirely offline → indexed on the firm’s own case files → live power monitor showing it sips less than a desk lamp the partner asked what he’d asked every vendor: “where do our documents go when we use this?” consultant pointed at the box: “nowhere. they never leave this machine. i can’t see them. the manufacturer can’t. no cloud company can. there’s no server to breach because there’s no server” that was the entire pitch the firm had privileged client documents, sealed settlements, strategy memos - the kind of data that ends careers if it leaks. one breach and they’re explaining to clients why confidential files were on someone else’s servers the box removed the question entirely you can’t leak what never leaves the building the numbers: → hardware cost: ~$2,000 → setup: one day on-site → the deal: $4,500 setup + $4,500/month support the partner signed because for the first time someone gave him an answer he could repeat to his own people without lying “it’s in our server closet, nobody else can touch it” that sentence ends the conversation every time the consultant now has 8 firms on monthly contracts every one came from a partner who couldn’t sleep over the cloud question he doesn’t sell hardware he sells the ability to say “nowhere” and mean it

607

983

380K

Andrey Oblivantsev @eSlider

14 days ago

Generate typed models from JSON or Schema entirely in your browser. No server, No uploads, No account. Paste sample data, pick a language (Go, TypeScript, C#, Rust, Python…), get codegen instantly: https://t.co/HJ2BCfVpAU

eSlider retweeted

Charly Wargnier

@DataChaz

17 days ago

🚨How do you index the entire Linux kernel (28M lines of code) for an AI agent in 3 minutes? You stop letting the agent read files one by one. There is a fascinating new open-source release called codebase-memory-mcp. It's a code intelligence engine that swaps traditional file-searching for high-speed AST knowledge graphs. What makes this project stand out is the research behind it. Evaluated across 31 real-world repositories (detailed in arXiv:2603.27277), the architectural shift yields massive efficiency gains: → 99% reduction in tokens for structural queries → 83% answer quality across complex tasks → 2.1x fewer tool calls required It maps functions, classes, HTTP routes, and cross-service links into a graph. When the agent needs context, it queries the graph directly. Security is prioritized too: everything happens 100% locally on your machine via a single static binary. It runs entirely locally. No Docker, no Ollama, no API keys. You download the binary, restart your agent, and it just works. Are we one good index away from cutting AI dev costs to zero? Paper and Repo links in the thread ↓

DataChaz's tweet photo. 🚨How do you index the entire Linux kernel (28M lines of code) for an AI agent in 3 minutes?

You stop letting the agent read files one by one.

There is a fascinating new open-source release called codebase-memory-mcp.

It's a code intelligence engine that swaps traditional file-searching for high-speed AST knowledge graphs.

What makes this project stand out is the research behind it.

Evaluated across 31 real-world repositories (detailed in arXiv:2603.27277), the architectural shift yields massive efficiency gains:
→ 99% reduction in tokens for structural queries
→ 83% answer quality across complex tasks
→ 2.1x fewer tool calls required

It maps functions, classes, HTTP routes, and cross-service links into a graph. When the agent needs context, it queries the graph directly.

Security is prioritized too: everything happens 100% locally on your machine via a single static binary.

It runs entirely locally.

No Docker, no Ollama, no API keys.

You download the binary, restart your agent, and it just works.

Are we one good index away from cutting AI dev costs to zero?

Paper and Repo links in the thread ↓

169

86K

eSlider retweeted

AlexAImaginator

@TraffAlex

18 days ago

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → https://t.co/VWp818MB3D 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → https://t.co/ZbOs4mXJDq ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → https://t.co/n7K3sPvliE • MTP version (faster) → https://t.co/gwdfnJTzcy 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → https://t.co/tV1DFqXnOD • MTP version → https://t.co/PMqz7V5ewv 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → https://t.co/FgVsUX0YOB 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → https://t.co/oyC522a8Eh ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → https://t.co/et0J7Swua7 (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m <path> Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj <path> Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent • aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

206

292K

Andrey Oblivantsev @eSlider

18 days ago

@nalinrajput23 Could you imagine for some, programming is a enjoying tool, not a forced action for the sake of money?

401

Andrey Oblivantsev @eSlider

18 days ago

@BrooksWhaleX Yeah, even AMD Ryzen AI 9 HX 370(Radeon 890M) is good enough to launch own open-code agent with GPT OSS 20b for leess then 1.000$. See: https://t.co/1SFqO8fED4

149

Andrey Oblivantsev @eSlider

19 days ago

@aditiitwt For what? gemma4 is nice for general human tasks, as example.

Andrey Oblivantsev @eSlider

19 days ago

@DataChaz @karpathy Same idea https://t.co/63uUTgujiX

112

eSlider retweeted

Xenova

@xenovacom

20 days ago

I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference. It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible. Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s. The next day, access to Fable 5 was suspended globally.

148

376

Andrey Oblivantsev @eSlider

19 days ago

Benched RTS TTS on AMD Ryzen AI 9 HX 370 CPU vs. GPU, vulkan vs llama.cpp (very fresh) https://t.co/mMPr0LqPyJ #amd #GPUs #igpu #bench

Andrey Oblivantsev @eSlider

19 days ago

I could support you on the move to linux, event if you need your need to preserver your windows/mac software incl. current os.

starmex

@starmexxx

19 days ago

AMD CEO LISA SU HELD A MINI PC ON STAGE THAT RUNS A 235B MODEL AND REPLACES YOUR $440/MONTH AI STACK amd's ryzen ai max+ 395 is the first x86 chip that runs a 200 billion parameter model on one piece of silicon. cpu and gpu share 128gb of unified memory, no separate graphics card needed the gmktec evo-x2 runs qwen3 235b fully, deepseek v3 comfortably and llama 3.3 70b with headroom. on linux you get 110gb of usable vram out of 128gb amd claimed the chip beat an nvidia rtx 5080 by more than 3x on deepseek r1 inference. a lunchbox sized pc outrunning a $1,000 discrete gpu on a real ai workload a heavy ai user pays $200 for claude code max, $200 for chatgpt pro, $20 for cursor and $20 for gemini. that's $5,280 a year and the box pays itself off in 9 to 10 months install ollama, pull the model, point claude code at localhost. same interface, nothing leaves the machine, nothing costs per request bookmark this and read the article below

259

529

976K

Andrey Oblivantsev

@eSlider

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users