Alexander Young @Alex0nder - Twitter Profile

Pinned Tweet

20 days ago

Updated card: compression = API input tokens (80× Oiloop), not core-char ratio (112×). Same results, clearer metric labels after internal audit. https://t.co/waS2U5nz90


Alexander Young @Alex0nder

13 days ago

@dexhorthy @swyx @RLanceMartin @Cursor
npx context-os init --profile saas --cursor-rule
Paper + raw runs in repo. Exploratory eval; limitations in SYNTHESIS.md.


Alexander Young @Alex0nder

13 days ago

A vs B vs C on 149 decision questions, 4 codebases.
Routed domain cores (B) beat full repo (A). Up to 38× fewer tokens.
Oiloop: B 2.75 vs A 0.75, 0% hallucination.
https://t.co/togwA3iomD


Alexander Young @Alex0nder

13 days ago

Open replication — A/B/C vs full-repo + graph. Exploratory, LLM-as-judge. Feedback welcome if you work on context routing / agent context. cc @dexhorthy @swyx @RLanceMartin @cursor https://t.co/togwA3iomD


Alexander Young @Alex0nder

20 days ago

Measured context engineering on 4 real codebases (149 Q). Full repo vs routed cores vs graph - numbers in the card. Open eval + raw runs: https://t.co/togwA3iomD
@dexhorthy @swyx @RLanceMartin @levie @hwchase17 curious if this matches your prod experience?
validity-audit.md



Alexander Young @Alex0nder

20 days ago

149 Q · 4 codebases · A/B/C eval (gpt-4o-mini)
Full repo vs domain cores vs graph retrieval.
Keyword router on every project.
B ≥ A on all 4: MailAgent +21%, Oiloop +5%.
Up to 112× compression.
Open protocol + raw runs ↓
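The thread mentions a keyword router that selects domain cores but does not show how it works. The sketch below is one minimal way such a router could look, not the author's implementation. The core names, keyword sets, and the `route` function are all hypothetical.

```python
import re

# Hypothetical domain cores and keywords, invented for illustration only.
CORE_KEYWORDS = {
    "billing": {"invoice", "payment", "refund", "subscription"},
    "auth": {"login", "token", "password", "session"},
    "email": {"smtp", "inbox", "template", "bounce"},
}


def route(question: str, min_hits: int = 1) -> list[str]:
    """Return core names ranked by keyword overlap with the question."""
    # Lowercase and split into alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    # Score each core by how many of its keywords appear in the question.
    scores = {core: len(words & kws) for core, kws in CORE_KEYWORDS.items()}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [core for core, hits in ranked if hits >= min_hits]
```

A question with no keyword hits returns an empty list. A caller could then fall back to the full repo (A) or graph retrieval (C).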


Alexander Young @Alex0nder

20 days ago

Reports: https://t.co/togwA3iomD
· PHASE-2-RESULTS.md
· PHASE-3-RESULTS.md
· experiments/ (reproducible runs)
Caveats: LLM-as-judge, Oiloop N=20, expert prefs from decode, not fresh blind humans.
#ContextEngineering #RAG


Alexander Young @Alex0nder

20 days ago

139 Q · 4 codebases · A/B/C
Cores beat full repo on 3 OSS projects (+19–24%, up to 45×).
Private macOS app: first counterexample (cores lost; graph won).
Partially supported.
https://t.co/togwA3iomD #ContextEngineering


Alexander Young @Alex0nder

20 days ago

139 Q, 4 codebases: full repo (A) vs domain cores (B) vs graph (C).
3 OSS projects: cores beat full repo (+19–24% accuracy, 14–45× compression).
Phase 3 private macOS: first case where B accuracy < A.
Partially supported.


Alexander Young @Alex0nder

20 days ago

Oiloop (private macOS, 20 Q):
A 1.20 · B 1.05 · C 1.55
Hallucination B 25% · C 15%
B: 83× compression, ~3× faster
Cores aren't universal. On integrated codebases, graph (C) wins.
MailAgent → B. Django, Navorina, Oiloop → C.


Alexander Young @Alex0nder

20 days ago

Details:
• Phase 3 report: …/oiloop-phase-3.md
• Raw run: …/run-1781225808172
• Paper draft incoming


Alexander Young @Alex0nder

20 days ago

Phase 3 done: 4 codebases, 139 eval questions (gpt-4o-mini).
We compared full repo vs routed context cores vs code graph retrieval.
New: private macOS Swift app (Oiloop): 81k → 979 tokens, ~3× faster.
https://t.co/togwA3iomD
#ContextEngineering #LocalLLM #OpenSource
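As a sanity check on the "81k → 979 tokens" figure, the compression ratio is the full-repo token count divided by the routed-core token count. The sketch below assumes "81k" means roughly 81,000 tokens (the tweet gives only the rounded value). The function name `compression_ratio` is made up for this example.

```python
def compression_ratio(full_tokens: int, routed_tokens: int) -> float:
    """Ratio of full-context tokens to routed-context tokens."""
    if routed_tokens <= 0:
        raise ValueError("routed_tokens must be positive")
    return full_tokens / routed_tokens


# Assumption: "81k" is taken as 81,000; the exact count is not in the thread.
ratio = compression_ratio(81_000, 979)
print(f"{ratio:.1f}x")  # prints "82.7x"
```

Rounded to the nearest whole number this gives about 83×, which matches the 83× compression quoted for Oiloop elsewhere in the thread. The exact value depends on which tokens are counted.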


Alexander Young @Alex0nder

20 days ago

Oiloop (Phase 3) honest take:
• B (cores): 83× compression, fastest, but accuracy 1.05 vs 1.20 full repo
• C (graph): best accuracy 1.55, 15% hallucination
Rule of thumb:
• local/Ollama: route cores
• hard cross-cutting system Qs: graph
Not a silver bullet. A measured tradeoff.


Alexander Young @Alex0nder

21 days ago

@pdlug You're right, NV16-C is it. Zero endpoints, no halluc flag. B named them. Binary judge rewards vagueness. Claim grounding next. D is the real path, not separate B/C baselines.


Alexander Young @Alex0nder

22 days ago

I tested 3 ways to feed context to coding agents:
A) dump the whole repo
B) route to domain-specific "context cores"
C) graph retrieval
119 questions / 3 codebases / gpt-4o-mini
Results 👇


Alexander Young @Alex0nder

22 days ago

@pdlug Happy to run the same A/B/C on your codebase if you want. An OSS repo or a scoped slice works. 35 Q, ~$0.50 in API costs.


Alexander Young @Alex0nder

22 days ago

@pdlug No. Cores won on accuracy (+20%) and cost (45× fewer tokens). Graph won on hallucination (14% vs 23%), but its accuracy only matched full repo and was not better than cores. Tradeoff: cores for speed/accuracy, graph when trust matters more.


Alexander Young @Alex0nder

22 days ago

Full report + open eval protocol + raw runs: https://t.co/iAlvif4kaM
Repo: https://t.co/yU63pxvRJm
@swyx - does this match your context engineering tradeoffs?
@cursor_ai - curious if cores vs graph maps to your prod pipeline.
#AIAgents #ContextEngineering #RAG #BuildInPublic


Alexander Young @Alex0nder

22 days ago

The tradeoff:
B (cores) = best accuracy + cheapest → 19–23% hallucination
C (graph) = lower accuracy → 7–14% hallucination
Complex codebases → C for production.
Narrow domains → B wins.

