I stumbled upon these react native evals from Callstack, and was surprised to see @Cursor_ai Composer 2 dominating them.
dug into the data a bit and composer was much better on tasks that required exact modern ecosystem knowledge: react-native-worklets, scheduleOnRN, keyboard-controller, TanStack Query details, FlashList/LegendList edge cases, avoiding deprecated APIs, etc.
it feels like a plausible payoff from Cursor's product loop. If your training signal is coming from real developers accepting, rejecting, editing, and rerunning agent changes in real codebases, you get a very different result than generic code benchmark training. and it's possibly more up-to-date than what can be found on github.
or maybe they just have a lot of new, private react native data
this is probably a good reason to take another look at Cursor for react native development
I had missed that @tan_stack Query has some really nice eslint rules available. Installing now! Good for forcing agents to write better code https://t.co/tdxwIStZ3h
Ever wanted your AI Agent to create its own tools on-the-fly? Well, you can do it quickly, safely and securely with Durable Object Facets!
The app starts as nothing more than an interface to an LLM. The LLM is able to write its own tools to add to its capabilities per request
Cloudflare Workers now support reading the contents of HTTPS traffic so you can see/control exactly what your agent is doing over the internet.
Does this by having a way to enable man-in-the-middle certificate replacement.
Something feels a bit scary about this, but also probably useful for certain use cases.
https://t.co/eOEdxzJDfb
@kieranklaassen @trq212 yup, that sounds really good. but maybe it would be useful to have a community-supported version to run alongside with different trade-offs
@jyoti_mann1 I'm sure they used a lot of tokens, but people need to understand that the cost can scale by orders of magnitude based on token input/output ratio and caching ratio. So hopefully that was taken into account here.
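A rough sketch of why those two ratios matter so much. All prices here are made-up placeholders (USD per million tokens), not any vendor's real rates, and `estimateCost`, `Pricing`, and `Usage` are hypothetical names for illustration only. Same total token count, very different bill:

```typescript
// Hypothetical pricing, USD per million tokens. Not real vendor rates.
interface Pricing {
  inputPerM: number;       // uncached input tokens
  cachedInputPerM: number; // input tokens served from the prompt cache
  outputPerM: number;      // generated tokens
}

interface Usage {
  totalTokens: number;
  outputShare: number;  // fraction of all tokens that are output, 0..1
  cacheHitRate: number; // fraction of input tokens that hit the cache, 0..1
}

// Split the total into fresh input, cached input, and output, then price each bucket.
function estimateCost(p: Pricing, u: Usage): number {
  const outputTokens = u.totalTokens * u.outputShare;
  const inputTokens = u.totalTokens - outputTokens;
  const cachedTokens = inputTokens * u.cacheHitRate;
  const freshTokens = inputTokens - cachedTokens;
  const totalUsd =
    freshTokens * p.inputPerM +
    cachedTokens * p.cachedInputPerM +
    outputTokens * p.outputPerM;
  return totalUsd / 1_000_000;
}

const pricing: Pricing = { inputPerM: 5, cachedInputPerM: 0.5, outputPerM: 25 };
const totalTokens = 1_000_000_000;

// Agent-style workload: mostly cached context re-reads, very little output.
const cacheHeavy = estimateCost(pricing, { totalTokens, outputShare: 0.01, cacheHitRate: 0.95 });

// Same token count, but half of it is output and nothing is cached.
const outputHeavy = estimateCost(pricing, { totalTokens, outputShare: 0.5, cacheHitRate: 0 });

console.log({ cacheHeavy, outputHeavy, ratio: outputHeavy / cacheHeavy });
```

With these placeholder numbers the second workload comes out more than ten times as expensive as the first, so a raw "tokens used" figure says little about cost without the input/output and cache split.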
here is @embirico helping us think through the future of teams:
"all these roles are blurring together. a designer can do more engineering. an engineer can do more design. a PM can do more building." "labels are losing meaning"
"if you were a good PM but not that good at engineering, maybe you should become an engineering manager with a coding agent"
"it comes down to interest and agency. do what you're most interested in"
"every problem needs a human accountable for that problem area. but that doesn't have to be a PM"
"We write very few specs on the Codex team. We're talking 10 bullet points and that's it."
Here's my new episode with @embirico and @romainhuet where they gave me an inside look at how OpenAI's Codex team operates:
→ Live demo: Building in seconds with Spark
→ How the team built the beautiful Codex app
→ How they ship without traditional specs and roadmaps
Some quotes from Alex and Romain:
"The fewer people you need in a room to do anything, the more pure every decision is."
"Our designers write more code now than was written by an engineer six months ago."
"I'm much less likely to read someone's resume than their ideas and what they've built."
Thanks to our sponsors:
@meetgranola: The best AI meeting notes app I've ever used https://t.co/MNToIh5WTm
@linear: The AI agent platform for modern teams https://t.co/lI40xrrDsr
📌 Watch now: https://t.co/2lbvdTiQUl
@petergyang @embirico @romainhuet Cool to see how they only plan short term or long term, not in between. A concrete thing you can rally the team around and a long-term vibe you're trying to achieve. Things like roadmaps don't work for them.
nice. @embirico shares why good engineers are still important for them:
"With codex, the vast majority of the code is generated by an agent, but we still spend a lot of care and attention thinking about the system and making sure it's high quality." "You don't necessarily want PMs owning these systems."
@thomasmurphy__ yeah true. I think I'll find it useful as a starting point. But I also want to closely monitor diffs to make sure it's not going off track.
Like the idea of obsidian and a knowledge folder of markdown files, but don't want to get used to another piece of software? This vscode extension makes the markdown experience a lot nicer! https://t.co/DvdunTQfAh