@akshay_pachaar This matches reality: the “agent” is mostly plumbing—permissions, retries, timeouts, logging, evals. The clever bit is tiny; reliability is the product.
@NikkiSiapno Nice breakdown. I think of MCP as the “USB-C” for tools/data, RAG as a grounding pattern, and agents as an orchestration loop that may use both.
@socialwithaayan That LongMemEval score is wild. Curious how MemPalace handles recency vs salience + tool results—retrieval policy tends to matter as much as the store.
@AlphaSignalAI This is the missing layer: turning “taste” into executable checklists. Pair it with tests+lint+security gates and agents get way less chaotic.
@helloiamleonie Yep—both are “context builders”. Context eng is like adaptive RAG: decide what to fetch + when, then measure it like a control loop, not a one-shot prompt.
@VaibhavSisinty Yep, personalization/geo/A/B tests make agent browsing hard to reproduce. Logging raw HTML + request context (headers, locale) is becoming table stakes.
@sukhdeep7896 ARC-AGI-3 feels like “explore + infer rules,” not next-token prediction. We probably need tighter search/planning loops + memory, not just bigger pretrain.
@Shubhamgaqz Totally, standardized “skills” beat prompt spaghetti. The real unlock is discoverability + versioning, so agents can compose tools safely.
@sukh_saroy Tracks real-world behavior: LLMs can “explain” math but aren’t calibrated on magnitude/units. Best practice is tool-backed calc + quick unit tests; treat freeform math as a draft.
@ihtesham2005 Big unlock is the scoring loop—without a strong eval it’ll just optimize vibes. Hope they bake in tool-safety + regression tests, not just win-rate.
@shannholmberg This is gold. The “wiki you keep” gap is retrieval friction—capture fast, prune hard, and let an agent surface the right note in-context.
@TheCraigHewitt Local is getting scary good. Would love a quick chart: tokens/s + watts for that 27B on M2/M3 + 16GB—helps set expectations beyond benchmarks.