@m13v_ Agreed, page-wide assert_text is the coarse end. That's what assert_element_text and assert_text_in are for: the check is pinned to a selector/container, so nothing upstream can satisfy it by accident. Scoped asserts + a visual expect is the combo that's held up best for me.
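A minimal sketch of why scoping matters, assuming a toy page modeled as a mapping from selector to visible text. The helper names mirror the actions mentioned above, but the bodies are hypothetical illustrations, not e2e-runner's implementation.

```python
# Toy page model (an assumption for illustration): selector -> visible text.
page = {
    "#banner": "Welcome back! Your order has shipped.",
    "#order-status": "Processing",
}

def assert_text(page, text):
    """Page-wide check: passes if ANY element contains the text."""
    return any(text in content for content in page.values())

def assert_element_text(page, selector, text):
    """Scoped check: passes only if the named element contains the text."""
    return text in page.get(selector, "")

# The banner satisfies the page-wide check by accident...
page_wide = assert_text(page, "shipped")
# ...while the scoped check looks only at the status element and fails.
scoped = assert_element_text(page, "#order-status", "shipped")
```

The page-wide check goes green on unrelated banner text; the scoped one only looks at the status element.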
I told Claude "write an E2E test for the login flow" — and it wrote, ran, and debugged the test by itself. In JSON. No test code.
This is e2e-runner: an MCP server that gives your AI agent a full browser-testing toolkit. 🧵
@m13v_ The gate is the assertions: the rewrite touches the selector, never them. Wrong element -> downstream assert_url/assert_text fail, and `expect` judges the end state visually. By your own framing: intent lives in the assertion. Fair push though, noted for the roadmap 🤝
@m13v_ Tracking doesn't patch the symptom — it's the input the agent uses to fix the cause. A selector goes flaky, the learning system flags it, Claude rewrites it in the spec, re-runs, confirms green. The diagnosis lands on the agent, not on a human. You review a one-line diff.
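One way to picture the tracking side, as a rough sketch: a selector counts as flaky when it has both passed and failed across recent runs. The run-history shape, the `min_runs` threshold, and the function name are assumptions made for illustration, not the tool's actual learning system.

```python
from collections import defaultdict

def flaky_selectors(runs, min_runs=3):
    """Return selectors that both passed and failed across the given runs.

    `runs` is a list of dicts mapping selector -> True (passed) / False.
    """
    outcomes = defaultdict(list)
    for run in runs:
        for selector, passed in run.items():
            outcomes[selector].append(passed)
    return sorted(
        sel for sel, results in outcomes.items()
        if len(results) >= min_runs and any(results) and not all(results)
    )

# Hypothetical history: three runs of the same spec.
history = [
    {"#login-btn": True, ".nav > a:nth-child(3)": True},
    {"#login-btn": True, ".nav > a:nth-child(3)": False},
    {"#login-btn": True, ".nav > a:nth-child(3)": True},
]
flagged = flaky_selectors(history)
```

The flagged list is the input the agent would work from when it rewrites a selector in the spec.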
True of every runner. Playwright's auto-wait internals aren't in your diff either. The question is what you get when it breaks: auto error screenshot, step-by-step narrative, per-run network logs, and a learning system tracking which selectors go flaky across runs. That's the "why". Still plenty to improve there.
@m13v_ The JSON is plain text in your repo — it greps, diffs and code-reviews like any file. And it runs in CI: exit 1 on failure, report.json artifact. Unknown actions fail at load time with exact file+test location. "Opaque" is the one thing it isn't 🙂 Good luck with assrt 🤝
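A sketch of what load-time validation with a file+test location can look like. The spec shape (a list of tests with `name` and `steps`), the `SpecError` class, and the known-action set (limited to names mentioned in this thread) are assumptions, not the real e2e-runner schema.

```python
import json

# Only action names mentioned in this thread; the real set is an unknown here.
KNOWN_ACTIONS = {"assert_text", "assert_element_text", "assert_text_in", "assert_url"}

class SpecError(ValueError):
    """Raised before any browser starts when a spec cannot be loaded."""

def load_spec(filename, raw_json):
    """Parse a spec and reject unknown actions, naming the file and test."""
    tests = json.loads(raw_json)
    for test in tests:
        for i, step in enumerate(test["steps"], start=1):
            if step["action"] not in KNOWN_ACTIONS:
                raise SpecError(
                    f'{filename}: test "{test["name"]}", step {i}: '
                    f'unknown action "{step["action"]}"'
                )
    return tests

def exit_code(results):
    """CI convention described above: 1 if any test failed, else 0."""
    return 0 if all(results) else 1

# A typo in the action name is caught at load time, not mid-run.
bad = '[{"name": "login", "steps": [{"action": "asert_url", "value": "/dashboard"}]}]'
try:
    load_spec("login.json", bad)
    message = None
except SpecError as err:
    message = str(err)
```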
Open source, Apache-2.0. Works with Claude Code, OpenCode, or any MCP client.
claude mcp add e2e-runner -- npx -y -p @matware/e2e-runner e2e-runner-mcp
https://t.co/XOIJYtG9ur
Hot take: E2E test code is a liability. The spec — "user logs in, sees dashboard" — is the asset. Everything else should be generated.
What's the most painful E2E suite you've ever had to maintain?
Today we lost one of the greatest national artists in history. We need to stop politicizing absolutely everything.
May God keep you in his shelter, dear Indio. 😢
Same principle here, terminology layer: OpenMed NER detects entities in free-text notes, then we link to real SNOMED/ANMAT concepts. The LLM only picks an index among real candidates, it can't emit a code. No match -> we label the gap ("unmapped, search manually"), never invent a SCTID. Refusal as architecture, not behavior.
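Roughly, the constraint looks like this. It is a minimal sketch, assuming the model's only output is an index (or nothing) into a candidate list returned by a terminology search; the function name and the placeholder codes are hypothetical, not real SCTIDs or Heural code.

```python
UNMAPPED = "unmapped, search manually"

def link_entity(candidates, picked_index):
    """Resolve the model's pick to a real candidate code, or label the gap.

    The model never emits a code itself: anything other than a valid index
    into `candidates` falls through to the explicit unmapped label.
    """
    is_index = isinstance(picked_index, int) and not isinstance(picked_index, bool)
    if is_index and 0 <= picked_index < len(candidates):
        return candidates[picked_index]["code"]
    return UNMAPPED

# Placeholder candidates (not real terminology codes).
candidates = [
    {"code": "PLACEHOLDER-001", "term": "example concept A"},
    {"code": "PLACEHOLDER-002", "term": "example concept B"},
]

linked = link_entity(candidates, 1)         # valid pick -> real candidate code
out_of_range = link_entity(candidates, 7)   # invalid pick -> labeled gap
no_match = link_entity(candidates, None)    # model declines -> labeled gap
```

Because the return value is either a code that already existed in the candidate list or the fixed label, there is no path by which an invented identifier reaches the record.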
Heural is in production with real institutions, daily clinical use on FHIR R4 + SNOMED CT + LOINC. OpenMed integration is still in research/dev for us: clinical NER + PHI de-identification on Spanish notes.
SNOMED CT is our main coding base in Heural. How's your SNOMED development coming along? On the OpenMed front we've already finished lab / blood-test extraction with full LOINC coding. Curious how your HCC-agent + high-throughput retrieval design translates to SNOMED retrieval.
@MaziyarPanahi @huggingface Thanks Maziyar! Heural is a FHIR R4 EHR: SNOMED CT + LOINC bindings, every patient modeled as a clinical graph in Neo4j. OpenMed powers our clinical NLP: auto-coding free-text notes to standard terminologies and de-identifying PHI before it leaves the clinical boundary.