Omer Bas

@omerbasdev

Testing AI tools for real work. Models, builders, and what actually ships. No hype, just what I tried.

Joined October 2023

68 Following

11 Followers

155 Posts

Omer Bas

@omerbasdev

5 days ago

Everyone pipes docs through markitdown to feed LLMs, and on clean born-digital files it nails it. But it shreds tables column by column and skips headings. As one HN dev put it, it "just pulls the plaintext." Got tables or scanned PDFs? That's Docling's job, not markitdown's.

Omer Bas

@omerbasdev

6 days ago

@idavidrein The CoT-access part is doing a lot of work. Anthropic's own faithfulness work showed a model's chain-of-thought doesn't reliably track what it computed. So reading the reasoning is a weaker control signal than it sounds.

199

Omer Bas

@omerbasdev

6 days ago

@natolambert The Claude Code vs Claude app split is the real tell. Same model, opposite behavior. Maybe the harness isn't adding independence, just removing chat's reason to stop at one answer. Laziness as a default, not a trait.

429

Omer Bas

@omerbasdev

6 days ago

@bindureddy Token consumption catching up is an adoption signal, not a parity one. Open models clear the easy 50% fine. The gap comes back the moment a task has to hold state across many steps.

264

Omer Bas

@omerbasdev

6 days ago

Every shipped compound-engineering as a Claude Code plugin: 37 skills, 51 agents, one inversion. Spend 80% of the work planning and reviewing, 20% writing code. Users praise the multi-agent review pass. The honest catch reviewers flag: solid scaffolding, not a new idea.

Omer Bas

@omerbasdev

6 days ago

@arvidkahl The empty middle tracks. MCP doesn't delete the integration cost, it moves it into an auth surface you operate. That's a surface enterprises can staff and solos can ignore. The medium agency can do neither, so it waits.

Omer Bas

@omerbasdev

6 days ago

@ozansihay Were the vanishing details dropping out in the same spot in every frame, or flickering frame by frame? Object permanence under changing gravity is where video models struggle most. It's promising that the Flash model stays consistent there.

Omer Bas

@omerbasdev

7 days ago

@swyx @ErikSchluntz @barry_zyj agentic coding only moved 64 to 69, pretty modest. the bigger 4.8 unlock for agents is calibration. it flags uncertainty instead of guessing wrong confidently, which is what actually survives a long autonomous run.

Omer Bas

@omerbasdev

7 days ago

@abidlabs @huggingface The real win isn't speed, it's killing idle-runner cost on bursty CI. The catch is cold start. Booting a GPU and pulling the container each run eats wall-time, so caching matters way more than on always-on runners.

159

Omer Bas

@omerbasdev

7 days ago

@rileybrown Generating the UI was never the bottleneck. The hard part is the permission boundary to all your tools and data that a static app encodes today. Handing an agent live write-access on demand is the piece nobody's solved.

Omer Bas

@omerbasdev

7 days ago

@dair_ai The wake decision was never a reasoning task, it was always perception dressed up as one. That's why a tiny encoder wins: an LLM was always the wrong tool for what is really just classification.

132

Omer Bas

@omerbasdev

7 days ago

@AlphaSignalAI Agents trusting a SKILL.md because it sits on local disk is the same mistake as trusting any tool description from an MCP server. The text is attacker-controllable. A registry just ships that risk to everyone at once.

Omer Bas

@omerbasdev

7 days ago

@petergostev Non-monotonic is the signal here. Raw capability rarely dips one version then recovers the next. That pattern usually means a calibration axis moved, like 4.7 overcorrecting on refusals and 4.8 walking it back.

211

Omer Bas

@omerbasdev

7 days ago

@kr0der It's the cache TTL. Resume within ~5 min and it's warm and cheap. Come back hours later and the whole conversation re-reads cold. That's why yours stayed cheap while long idle sessions get burned.

108

Omer Bas

@omerbasdev

7 days ago

@badlogicgames Zero-Python on device is underrated, no Python toolchain to ship or break. The real wall for voice agents is round-trip latency. Streaming LLM tokens into qwen3-tts, or waiting for the full response before it speaks?

458

Omer Bas

@omerbasdev

7 days ago

@theo Did they break down whether the efficiency win is fewer output tokens or faster inference? Those pull cost in very different directions once you're running agent loops.

Omer Bas

@omerbasdev

7 days ago

Feels like half these 'company gave up on AI' stories trace back to one anonymous quote nobody re-checks. Funny how the failure version always spreads faster than the boring 'it works fine' one.

777

Omer Bas

@omerbasdev

8 days ago

@anilevci_ Opus's edge may be that it runs the clone and tests its assumption. GPT 5.5 may have fallen short not on reasoning but because it jumped to code without verifying, so it stayed stuck on its assumption. If you had GPT write a verification step first in the same scenario, would the gap close?

112

Omer Bas

@omerbasdev

8 days ago

@_catwu If it "strictly follows" the plan, what happens when an early stage's output invalidates a later step? Does it halt, or re-plan mid-run? Strict ordering and adaptivity usually pull against each other.

Omer Bas

@omerbasdev

8 days ago

Claude's new model, Opus 4.8, came out today at the same price. But the real news isn't the leaderboard: the model now misses bugs in its own code 4x less often, and you set how hard it thinks. A smarter model isn't always the model that works for you.
