Samuel Fajreldines @devindolar - Twitter Profile

Samuel Fajreldines @devindolar

6 days ago

@LLMJunky They won’t do that. If they do, gpt will bring a competitor and win market share

0

121

Samuel Fajreldines @devindolar

6 days ago

@bridgemindai I doubt they will remove fable from the subscription for one simple reason: if they do, OpenAI will launch a competitor in their subs and win the market.

0

83

Samuel Fajreldines @devindolar

7 days ago

30min on Mythos

0

10

devindolar retweeted

Peter Steinberger 🦞

@steipete

9 days ago

Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.

2K

20K

1K

14K

8M

Samuel Fajreldines @devindolar

10 days ago

@DavidOndrej1 I’d rather have the security to use my banking apps. iOS is still #1 for that.

0

79

Samuel Fajreldines @devindolar

10 days ago

@cjzafir Apple spent trillions of dollars and couldn’t do that. Awesome work, congratulations!! 👏🏼👏🏼

0

1

0

795

devindolar retweeted

CJ Zafir

@cjzafir

10 days ago

Here's a teaser of our Mac-1 model.
> 6.6B model
> runs locally (on any Mac)
> requires 7GB RAM (12GB ideal)
> can use 487 MacOS native tools
> perform multi-tool chained tasks
> reasoning: ON
> output: ~65 tok/s
We built a robust application layer around the model to make UI/UX MacOS native.
The "model-focused" SaaS era is here. Stay tuned for more.

161

5K

294

5K

1M

devindolar retweeted

diva

@divaagurlxw

13 days ago

As an AI Engineer. Please learn
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill vs. decode latency and why they optimize differently
>Continuous batching, paged attention, and throughput optimization
>Speculative decoding vs. quantization vs. distillation tradeoffs
>INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality
>Structured output failures, schema validation, repair loops, and fallback chains
>Function calling reliability, tool contracts, argument validation, and idempotency
>Agent guardrails, loop budgets, tool budgets, and termination conditions
>Model routing, graceful fallback logic, and degraded-mode UX
>RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness
>Retrieval evals: recall, precision, grounding, attribution, and citation quality
>Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals
>LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift
>Cost attribution per feature, workflow, tenant, and user journey, not just per model
>Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries
>Multi-tenant isolation, cache safety, and cross-user context contamination prevention
>Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool
>Latency, quality, cost, and reliability tradeoffs across the full inference stack
>Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions

106

4K

491

7K

240K

Samuel Fajreldines @devindolar

14 days ago

@TeksEdge OMG. Testing!

0

490

Samuel Fajreldines @devindolar

22 days ago

@leaks_legit And m7 will outperform m6. This is life.

0

1

0

951

Samuel Fajreldines @devindolar

25 days ago

@Its_Nova1012 The same thing that makes a CEO irreplaceable: knowing what to ask.

0

298

Samuel Fajreldines @devindolar

25 days ago

@Im_IrushiK Marketing

0

57

Samuel Fajreldines @devindolar

25 days ago

@Xi_fak3 Because Windows is just horrible for software development.

0

18

devindolar retweeted

Daniel Lougen

@DJLougen

25 days ago

A 2.1GB model on my gaming PC CPU just beat a $10M AI model on HumanEval. Here's exactly how:
The model: Qwen2.5-Coder-3B-Instruct, 3.1B params from Alibaba, quantized to 4-bit. Downloaded in 30 seconds.
The hardware: Intel i9-12900K. No GPU. A $350 consumer CPU.
The score: 89.0% (146/164 problems passed)
Cohere Command A+: 218B parameters, $10M+ training cost, requires 2x H100 GPUs. Scored 75%.
We're +14 points. On a CPU.
I resurrected busyBeaver:
→ Prompt engineering (expert coder framing)
→ pass@3 retry at 3 temperatures (pushes 80% → 89%)
→ Code extraction from markdown output
→ Sandboxed test execution (15s timeout, crash recovery)
→ Checkpointing (resume from any crash point)
The model writes the code. The harness measures it fairly.
Eval protocol: Textbook standard. Feed signature + docstring → generate code → run tests → count passes. No tricks. No benchmark training. No contamination.
Honest scorecard:
- HumanEval: 89% vs 75% ✅
- MBPP: 70% vs 72%
- MMLU-Pro: 27% vs 68% ❌ Expected (code model vs knowledge model)
You don't need $10M to beat a $10M benchmark. You need a 2GB model + a clean eval harness + a gaming PC.
Code: https://t.co/48mhZP7reY

5

163

13

164

8K
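
Editor's sketch: the harness steps named in the retweeted post above (code extraction from markdown output, test execution with a 15s timeout, and retrying at three temperatures) could look roughly like the Python below. This is not the busyBeaver code. The function names, the `generate` callable, and the temperature values `(0.2, 0.6, 1.0)` are all assumptions made for illustration, and a plain subprocess with a timeout is only a stand-in for real sandboxing.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Sequence

# Build the fence marker indirectly so this listing contains no literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the first fenced code block in a markdown reply, else the raw reply."""
    match = FENCE_RE.search(reply)
    return (match.group(1) if match else reply).strip()


def passes_tests(code: str, test_code: str, timeout_s: float = 15.0) -> bool:
    """Run candidate plus tests in a separate interpreter process.

    A non-zero exit code or a timeout counts as a failure, so a crashing
    candidate cannot take the harness down with it. This is process
    isolation only, not a security sandbox.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def solve_with_retries(
    prompt: str,
    test_code: str,
    generate: Callable[[str, float], str],  # hypothetical model call: (prompt, temperature) -> reply
    temperatures: Sequence[float] = (0.2, 0.6, 1.0),  # assumed values, not from the post
) -> bool:
    """Count the problem as solved if any attempt passes (at most one per temperature)."""
    for temperature in temperatures:
        reply = generate(prompt, temperature)
        if passes_tests(extract_code(reply), test_code):
            return True
    return False
```

Note that a score computed this way gives the model up to three attempts per problem, so it is not directly comparable to a single-attempt (pass@1) score reported for another model.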