Compute isn't the real bottleneck - shortage of talent at the intersection of domain expertise, knowing what can be built and knowing how to build it is - this is where magical experiences get created
@i_mika_el @Kimi_Moonshot Mostly a combination of client feedback & real production data we can backtest on - but it's far from a solved problem, and QA depends on what exactly you are building & how it will be used
Asking a frontier model to perform a task in its native harness (Codex) with feedback, which is effectively human-in-the-loop eval building, and then distilling that judgement into prompts/schemas/guardrails for a cheap & fast open-weight model like @Kimi_Moonshot's kimi k2.6 is surprisingly effective
It's like trace-driven prompt optimization and teacher-model-assisted product development - similar in spirit to, but less rigorous than, a formal @DSPyOSS / GEPA-based optimization flow
So far that judgement has lived inside the prompt & tool schema for the cheaper model, and implicitly in the invariants of the harness that runs it - but I feel having it as explicit evals is going to be very useful, since that lets you switch runner models without being too deeply coupled to any one model
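A minimal sketch of what "explicit evals decoupled from the runner" could look like, assuming a runner is just a function from prompt to text. Every name here (`EvalCase`, `run_evals`, the stub models and the sample check) is hypothetical and only for illustration - real runners would wrap an actual model API:

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: any model can be wrapped as "prompt in, text out".
Runner = Callable[[str], str]


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # judgement distilled from earlier feedback


def run_evals(runner: Runner, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against one runner; return pass/fail keyed by case name."""
    return {case.name: bool(case.check(runner(case.prompt))) for case in cases}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 when there are no cases)."""
    return sum(results.values()) / len(results) if results else 0.0


def is_json_with_amount(output: str) -> bool:
    """Example check: output must be a JSON object containing an 'amount' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "amount" in data


# Hypothetical stand-ins for two different runner models.
def stub_model_a(prompt: str) -> str:
    return '{"amount": 120, "currency": "INR"}'


def stub_model_b(prompt: str) -> str:
    return "The amount is 120 rupees"


CASES = [
    EvalCase("returns_json_amount", "Extract the amount: paid 120 rupees", is_json_with_amount),
]
```

Swapping the runner then becomes a one-line change: `run_evals(stub_model_a, CASES)` and `run_evals(stub_model_b, CASES)` score both models against the same yardstick.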
@i_mika_el @Kimi_Moonshot The trace is implicit in the yardstick that gets developed with human feedback - if you look at the reasoning it gives while guiding the cheaper model, you will find it recalling errors it made earlier and heuristics it came up with to solve the problem.
Still haven't found any provider close to @GroqInc for fast inference - wish they served @Kimi_Moonshot's kimi k2.6 or even the previous k2 instruct - high throughput enables amazing UX for certain workflows
@rachelnabors This looks quite similar to what @DSPyOSS does with GEPA - which uses a bigger model to "teach" an SLM by iteratively refining the prompts until it reaches an acceptable level - interesting direction nonetheless, and evals provide much-needed discipline for production AI systems
Saw this recommendation first from @lucasmeijer in his amazing talk https://t.co/bqDQzIE300 - have been using it ever since and can confirm it is far superior to md files - incredible alpha to be gleaned from people at the "frontier" of coding agent usage
HTML is the new markdown.
I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.
Browser-native computer use — without Computer Use
Computer Use is expensive, slow and brittle (pixel-level clicking breaks on any UI change)
Better in practice: control an RDP-over-HTML5 canvas stream purely through keyboard events. The agent reads screenshots to verify state and acts through key sequences only. Getting this to work essentially involves understanding the full keyboard navigation model deeply and encoding it as crisp tools, or as sets of tool calls bundled into primitives.
Agentic loops doing this become faster, cheaper and more deterministic. @claudeai Sonnet 4.6 still beats every other model I've tried as an engine for this.
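A rough sketch of the "key sequences bundled as primitives" idea, under stated assumptions: the remote desktop is Windows (where Win+R opens the Run dialog), and `send` is whatever function forwards one key event to the canvas stream. `KeyEvent`, `KeyboardSession` and the primitive names are hypothetical, not taken from any real library:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class KeyEvent:
    key: str
    modifiers: Tuple[str, ...] = ()


def chord(key: str, *modifiers: str) -> KeyEvent:
    """One key press, optionally with held modifiers."""
    return KeyEvent(key, tuple(modifiers))


def type_text(text: str) -> List[KeyEvent]:
    """Plain typing, one event per character."""
    return [KeyEvent(ch) for ch in text]


def launch_via_run_dialog(app: str) -> List[KeyEvent]:
    """Primitive: open the Run dialog (Win+R), type the app name, press Enter."""
    return [chord("r", "meta"), *type_text(app), chord("Enter")]


def focus_nth_field(n: int) -> List[KeyEvent]:
    """Primitive: move focus forward n times with Tab."""
    return [chord("Tab")] * n


class KeyboardSession:
    """Replays primitives through an injected transport and keeps a log."""

    def __init__(self, send: Callable[[KeyEvent], None]) -> None:
        self._send = send  # assumption: forwards one event to the remote canvas
        self.log: List[KeyEvent] = []

    def run(self, events: List[KeyEvent]) -> None:
        for event in events:
            self._send(event)
            self.log.append(event)
```

The agent then picks from a small set of named primitives instead of emitting raw coordinates, and the screenshot check happens between primitives rather than between every keystroke.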
Agreed that prompts or text do seem like a bit of noise, but consider this - it allows you to get rid of a rigid UI, and more features don't necessarily mean a bloated UI - your app keeps getting smarter without looking bulkier - there is still a learning curve, but arguably a smaller one - nevertheless, whether such a UX and distribution model works for you depends on what you are building - sometimes a traditional GUI is indeed the better approach
Apps are friction. The best UI is no UI.
No downloads. No menus. Just message a Telegram bot.
The LLM turns imprecise instructions into structured DB actions. Built with 🦀 Rust + @GroqInc for fast inference.
Meet users where they already are. AI changes both the UI and the distribution model.
Open source: https://t.co/z3cQ2PGrxi
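To illustrate the "imprecise instruction to structured DB action" step, here is a small Python sketch (the actual bot is written in Rust, so this is not its code). It assumes the LLM has been prompted to reply with a JSON object; the `add_expense` action, the `expenses` table and all function names are made up for the example:

```python
import json
import sqlite3

# Assumption: the only action the model is allowed to request in this sketch.
ALLOWED_ACTIONS = {"add_expense"}


def parse_action(llm_output: str) -> dict:
    """Validate the model's JSON reply before it gets anywhere near the DB."""
    action = json.loads(llm_output)
    if not isinstance(action, dict):
        raise ValueError("expected a JSON object")
    if action.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action.get('action')!r}")
    amount = action.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        raise ValueError("amount must be a positive number")
    category = action.get("category")
    if not isinstance(category, str) or not category.strip():
        raise ValueError("category must be a non-empty string")
    return {
        "action": "add_expense",
        "amount": float(amount),
        "category": category.strip().lower(),
    }


def make_db() -> sqlite3.Connection:
    """In-memory database with a single hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expenses (amount REAL NOT NULL, category TEXT NOT NULL)")
    return conn


def apply_action(conn: sqlite3.Connection, action: dict) -> None:
    """Execute a validated action with a parameterized query."""
    conn.execute(
        "INSERT INTO expenses (amount, category) VALUES (?, ?)",
        (action["amount"], action["category"]),
    )
    conn.commit()
```

The point of the design is that the model only ever proposes an action; validation and the parameterized query decide what actually touches the database.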
Accurate and fast speech-to-text models like Saaras V3 by @SarvamAI that can handle code-mixed audio enable really good UX - especially for Indian users who speak multiple languages; and great UX is all that matters in an AI-enabled product like the one we are building for MSME users
People say AI is a 'force multiplier.' I think it's more like a force exponent.
Weak foundation => you just get more garbage.
Strong foundation => superlinear, exponential gains.
F^AI, not F×AI
Very cool - we have been working on something very similar for a client - though with some notable differences - we work directly with the RTSP stream, need to alert on the occurrence of real-time events (apart from supporting semantic search), and the whole thing needs to run on a commodity desktop costing about $400 - it's a very interesting problem, especially on a tight budget, but to be fair, we didn't really invent anything here; we are just designing and gluing the system together to work cheaply and reliably
@gakonst I've not had the best experience sending PDFs or images for OCR to LLM APIs - it's not just the occasional inaccuracy but also the cost. However, my experience is from around Nov 2025, so maybe time to give it a retry
Retrofitting existing ERPs and legacy systems with modern interfaces and tools is more likely to get buy-in from institutions than a full rewrite from scratch ever will - irrespective of how easy the rewrite is with agent swarms. While there is astounding hype around how enterprises are going to shun existing systems and create custom software for themselves, it is naive to imagine most would risk breaking existing systems - much more agreeable if they can get an immediate, incremental ROI at much lower risk by retrofitting existing systems with modern, fast software.
Counter-intuitive at first, but now obvious in practice:
Asking Claude Code to write code for an analysis task usually beats asking it to just do the analysis directly - often by a lot.
Real example: Client had transaction ledgers split by location in a dozen separate PDFs.
Instead of “consolidate these”, I asked for a short script to read the PDFs then merge chronologically into one ledger.
100% accurate + reusable artifact for next time.
Reuse notwithstanding, the accuracy alone is worth creating the code artifact for.
This aligns with the spirit of @Cloudflare's 'Code Mode' insight: LLMs are better at writing code to call tools than calling tools directly - https://t.co/hn4bd7GXV9
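For a flavour of what such a script can look like, here is a minimal sketch of just the merge step. It assumes the rows have already been extracted from each PDF into Python objects (the extraction itself depends on the PDF layout and library, so it is left out), and the `Entry` fields are hypothetical:

```python
import heapq
from datetime import date
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    when: date
    location: str
    description: str
    amount: float


def merge_ledgers(ledgers: Iterable[List[Entry]]) -> List[Entry]:
    """Merge per-location ledgers into one ledger ordered by date.

    Each ledger is sorted on its own first, so the input PDFs do not
    need to be perfectly ordered; heapq.merge then interleaves them.
    """
    sorted_ledgers = [sorted(ledger, key=lambda e: e.when) for ledger in ledgers]
    return list(heapq.merge(*sorted_ledgers, key=lambda e: e.when))


def totals_match(ledgers: Iterable[List[Entry]], merged: List[Entry]) -> bool:
    """Sanity check: no row lost or duplicated, and the amounts still add up."""
    rows = [entry for ledger in ledgers for entry in ledger]
    same_count = len(rows) == len(merged)
    same_sum = abs(sum(e.amount for e in rows) - sum(e.amount for e in merged)) < 1e-9
    return same_count and same_sum
```

Because the merge is deterministic code, checks like `totals_match` can be run on every input, which is much harder to guarantee when the model consolidates the rows directly.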
I am no thought leader, but what @VitalikButerin and @EliBenSasson have been saying recently is imo a good reset of the north star - it's asking the same question as the quoted tweet, but of the base blockchain/L2 itself instead of the application teams - a good number of blockchains and application teams continue to port ideas that are absolutely fine on web2 to web3 just for the sake of it - the number of useful things you can do on web3 that you just could not have done on web2 is still extremely limited
What is it that web3 enables (apart from decentralization) that could not have been done on web2? Still very few application development teams are asking this question - teams who have a convincing thesis around this will likely emerge winners
The success of coding agents like Claude Code and Cursor-like IDEs just proves this point - these products have all three ingredients needed to make it work - it just turns out that the people with the domain expertise here are programmers themselves, who not only know what can be built but also have the expertise to actually build it - the flywheel effect is incredible
Compute isn't the real bottleneck - shortage of talent at the intersection of domain expertise, knowing what can be built and knowing how to build it is - this is where magical experiences get created
Prompt optimization is a fragile process, but GEPA and @DSPyOSS continue to amaze me with remarkable results - turning a manual, brittle process into robust engineering through reflection-based prompt optimization - got an almost 30% latency reduction by teaching llama-3.3-70b-versatile through the heavier claude-sonnet-4.5 without any degradation in accuracy