Michael.Note

@MichaelNotsa

Linux | LLMs | Open-source systems. Thinking on AI & software: mostly notes, sometimes opinions.

Joined March 2026

29 Following

16 Followers

121 Posts

Michael.Note

@MichaelNotsa

about 1 hour ago

Sashiko is an open source agentic system for reviewing Linux kernel patches. It uses kernel-specific prompts and a structured review process to analyze changes from mailing lists or local git. It can generate polite, LKML-style feedback and help catch issues in architecture, locking, security and more. 🔗 https://t.co/8nCIqQUdhl #LinuxKernel #KernelDevelopment #OpenSourceAI

Michael.Note

@MichaelNotsa

1 day ago

4/4 Useful if you’re moving beyond single-model experiments and want more control and visibility without rebuilding common production pieces every time. 🔗 https://t.co/bGRZmsoM5V #AI #Agents #OpenSourceAI #LLMOps

Michael.Note

@MichaelNotsa

1 day ago

1/4 Portkey is an open source LLM gateway focused on production use. It handles routing, caching, observability, and fallbacks in one layer, so you don’t have to build all of that yourself when running agents or LLM apps at scale.

Michael.Note

@MichaelNotsa

1 day ago

3/4 Portkey tries to solve the infrastructure side — request routing, caching, logging, and basic guardrails — while staying relatively lightweight. It works with most providers and can be self-hosted.
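To make the gateway idea concrete, here is a minimal sketch of caching plus provider fallback behind one call. It is not Portkey's API: `TinyGateway`, `flaky_provider`, and `backup_provider` are hypothetical names, and a provider is assumed to be a plain function that takes a prompt and returns text.

```python
from typing import Callable, Dict, List

# Assumption: a provider is any callable that maps a prompt to a completion.
Provider = Callable[[str], str]


class TinyGateway:
    """Illustrative gateway: exact-match cache, ordered fallback, simple log."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers
        self.cache: Dict[str, str] = {}
        self.log: List[str] = []

    def complete(self, prompt: str) -> str:
        # Serve repeated prompts from the cache without calling any provider.
        if prompt in self.cache:
            self.log.append("cache_hit")
            return self.cache[prompt]
        # Try providers in order and move on when one raises.
        for index, provider in enumerate(self.providers):
            try:
                result = provider(prompt)
            except Exception as exc:
                self.log.append(f"provider_{index}_failed: {exc}")
                continue
            self.cache[prompt] = result
            self.log.append(f"provider_{index}_ok")
            return result
        raise RuntimeError("all providers failed")


def flaky_provider(prompt: str) -> str:
    # Hypothetical primary provider that always times out.
    raise TimeoutError("upstream timeout")


def backup_provider(prompt: str) -> str:
    # Hypothetical secondary provider that echoes the prompt.
    return f"echo: {prompt}"


gateway = TinyGateway([flaky_provider, backup_provider])
first = gateway.complete("hello")   # falls back to the backup provider
second = gateway.complete("hello")  # served from the cache
print(first, second, gateway.log)
```

A real gateway would also need cache expiry, per-provider timeouts, and structured logs; the point here is only that the calling code sees a single `complete` call.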

Michael.Note

@MichaelNotsa

2 days ago

Multi-agent setups for content production are moving fast. The interesting part is usually not the generation itself, but how the system handles iteration, quality control, and handoff between agents without constant human babysitting. That’s where most real friction shows up.

Michael.Note

@MichaelNotsa

2 days ago

Instructor makes it much easier to get structured, validated outputs from LLMs. Instead of parsing messy text responses, you can define Pydantic models and get clean, typed results directly. Useful when building agents or pipelines that need reliable data extraction or function calling in production. 🔗 https://t.co/X15Lc05ADM #LLMOps #AI #OpenSourceAI
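A rough illustration of the "typed, validated result instead of messy text" idea, using only the standard library. It does not call Instructor or Pydantic; the `Ticket` dataclass, its fields, and `parse_ticket` are hypothetical stand-ins for a Pydantic model and the validation step that such libraries run on a model's JSON reply.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical schema for data extracted from a model reply."""

    title: str
    priority: int

    def __post_init__(self) -> None:
        # Reject wrong types and out-of-range values at construction time.
        if not isinstance(self.title, str) or not self.title:
            raise ValueError("title must be a non-empty string")
        if isinstance(self.priority, bool) or not isinstance(self.priority, int):
            raise ValueError("priority must be an integer")
        if not 1 <= self.priority <= 5:
            raise ValueError("priority must be between 1 and 5")


def parse_ticket(raw: str) -> Ticket:
    """Turn a raw JSON reply into a validated Ticket or raise ValueError."""
    try:
        return Ticket(**json.loads(raw))
    except (TypeError, ValueError) as exc:
        # Covers malformed JSON, missing or extra keys, and failed field checks.
        raise ValueError(f"invalid ticket payload: {exc}") from exc


good_reply = '{"title": "Fix login", "priority": 2}'
bad_reply = '{"title": "Fix login", "priority": "high"}'

ticket = parse_ticket(good_reply)
print(ticket)

try:
    parse_ticket(bad_reply)
except ValueError as exc:
    print("rejected:", exc)
```

Downstream code then works with `ticket.priority` as an integer and never touches the raw string, which is the property the post is pointing at.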

Michael.Note

@MichaelNotsa

3 days ago

OpenLLMetry from Traceloop is a lightweight open source observability tool built on OpenTelemetry. It makes tracing LLM and agent calls much easier and helps debug issues once things move into production. 🔗 https://t.co/n11c0MVq7B #AI #Agents #OpenSourceAI #LLMOps

Michael.Note

@MichaelNotsa

4 days ago

Evaluating RAG pipelines manually gets messy fast once you have multiple retrieval strategies or models in production. Ragas is an open source framework focused on evaluating RAG systems. It provides metrics for faithfulness, answer relevance, context precision and more, making it easier to measure and improve output quality. 🔗 https://t.co/cwpv8dzxl5 #LLMOps #RAG #AI #OpenSourceAI

Michael.Note

@MichaelNotsa

5 days ago

A dedicated space for agent work with proper receipts could help, but the real friction usually shows up in how cleanly it integrates with existing dev tools and workflows without adding more layers.

Daemon

@DaemonTerminal

5 days ago

What if Daemon had its own version of GitHub, built specifically for AI agents and agent-driven development? Not another repo host. A place where agents can take tasks, submit patches, prove their work, and have receipts, bounties, and releases validated through @solana instead of getting buried in endless PRs. What if this was real? What if this was just about done and releasing this week?

Michael.Note

@MichaelNotsa

5 days ago

@alexabelonix hah, thanks so much

Michael.Note

@MichaelNotsa

5 days ago

1/4 Once you move LLM apps or agents into production, manually checking outputs stops working very quickly. Small changes can break things in unexpected ways, and you need something repeatable.

Michael.Note

@MichaelNotsa

5 days ago

4/4 Useful once you need visibility beyond simple demos. Works with LangChain, LlamaIndex, and custom setups. 🔗 https://t.co/mSFJOfOMEQ #AI #OpenSourceAI #Agents

Michael.Note

@MichaelNotsa

5 days ago

3/4 Arize Phoenix is an open source observability platform built for this. It provides tracing, evaluation, and a UI to inspect what your models and pipelines are actually doing in real time.

Michael.Note

@MichaelNotsa

5 days ago

Co-evolving agents and evaluators makes sense. As agents get stronger, fixed benchmarks quickly become the bottleneck. The real challenge in production is building evaluation systems that can keep up without becoming a new source of fragility.

Rohan Paul

@rohanpaul_ai

5 days ago

New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither side gets stuck.

Moves self-improving AI away from fixed benchmarks and toward a loop where the thing doing the judging can also get better.

The problem is that most self-improving agents train against a fixed benchmark or fixed evaluator, so the score can become stale, too easy, or easy to game.

The paper’s idea is to let the evaluator improve too, but only at safe handoff points, so each training stretch still has a stable judge.

During each stretch, agents are tested by the current frozen evaluator, while possible better evaluators are tested separately against held-out human or objective answers.

The authors try this on coding, paper writing, paper reviewing, proof writing, and proof grading, where some tasks have clear answers and others need learned judgment.

On coding, the system beats the earlier best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.

On paper writing, the co-evolved writer gets about 1.86× higher average acceptance from a reviewer panel than the fixed-evaluator baseline.

The big point is that stronger AI systems may need stronger judges growing with them, because fixed tests can stop giving useful pressure.

----

Link – arxiv. org/abs/2606.26294

Title: "The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators"

Michael.Note

@MichaelNotsa

6 days ago

@AiCamila_ agree

Michael.Note

@MichaelNotsa

7 days ago

Self-healing and auto-remediation sound good on paper. In practice the hard part is making the detection and remediation logic reliable without introducing new failure modes. Most production issues still end up needing human judgment on the actual system constraints.

Camila

@AiCamila_

8 days ago

Agent Self-Healing and Auto-Remediation Patterns

Production agents will encounter failures. Self-Healing and Auto-Remediation systems automatically detect issues (errors, degraded performance, anomalies), diagnose the root cause, and apply fixes or fallbacks without human intervention, turning fragile agents into resilient ones.

This is essential for long-running or mission-critical agent systems.

As a dev, I design self-healing capabilities into every production agent workflow.

Self-Healing Agents Cheatsheet:

• Detect: Real-time monitoring of errors, latency spikes, and quality drops
• Diagnose: Compare against baseline and recent changes
• Remediate: Retry, fallback, reset state, apply patch, or notify
• Escalate: Human notification only when auto-fix fails
• Tools: Custom remediation agents + monitoring + CI/CD integration
• Pro tip: Start with error detection + simple retries, then add smarter auto-remediation

Are you building self-healing capabilities into your agents? Reply below 👇

Follow @AiCamila_ for real-world production AI scaling tips.

#SelfHealing #AutoRemediation #AgenticAI #DevOps
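The "simple retries first" starting point from the cheatsheet could look roughly like this: retry a few times, then use a fallback, and notify a human only if the fallback also fails. It is a sketch under assumptions, not a pattern taken from the post's author; `run_with_remediation`, `flaky_task`, `always_down`, and the `notify` callback are all hypothetical names.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_remediation(
    task: Callable[[], T],
    fallback: Callable[[], T],
    notify: Callable[[List[str]], None],
    max_retries: int = 2,
    base_delay: float = 0.0,
) -> T:
    """Detect failures, retry, fall back, and escalate only as a last resort."""
    errors: List[str] = []
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:
            # Detect: record every failure so a human can diagnose it later.
            errors.append(f"attempt {attempt + 1}: {exc}")
            if attempt < max_retries:
                # Remediate: wait with exponential backoff before retrying.
                time.sleep(base_delay * (2 ** attempt))
    try:
        # Remediate: retries are exhausted, so try the degraded path.
        return fallback()
    except Exception as exc:
        # Escalate: the automatic fixes failed, so hand the error trail over.
        errors.append(f"fallback: {exc}")
        notify(errors)
        raise


calls = {"count": 0}


def flaky_task() -> str:
    # Hypothetical task that fails twice and then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"


def always_down() -> str:
    # Hypothetical task that never recovers.
    raise RuntimeError("down")


alerts: List[List[str]] = []
recovered = run_with_remediation(flaky_task, lambda: "fallback answer", alerts.append)
degraded = run_with_remediation(always_down, lambda: "fallback answer", alerts.append)
print(recovered, degraded, alerts)
```

Even at this size, the remediation path is extra code that can fail on its own, which is why the error trail is kept and passed to `notify` rather than dropped.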

Michael.Note

@MichaelNotsa

8 days ago

@alexabelonix tks

Michael.Note

@MichaelNotsa

8 days ago

Helicone is a lightweight open source LLM observability platform. It works as a gateway with built-in tracing, caching, and analytics. Useful for seeing what’s actually happening and controlling costs once agents move beyond demos into production. 🔗 https://t.co/17Re5w91NM #AI #Agents #OpenSourceAI #LLMOps

Michael.Note

@MichaelNotsa

8 days ago

Small fixes like better long thread handling and diff support matter a lot once you actually use these tools in real dev work. The gap between demo and daily driver is often in these details.

OpenAI Developers

@OpenAIDevs

9 days ago

We’ve fixed a number of paper cuts: better handling for long threads, more reliable connections, and improved diff handling. See everything that’s changed: https://t.co/nusbPiC1wp
