React Native on iOS doesn't support ReadableStream. When users switch apps mid-response, the AI keeps generating to nothing. This dev solved it by making the server keep going after disconnect and persist the full response to the DB.
{ author: @juandastic }
https://t.co/9cihw8KJDS
12 years building web apps. Zero mobile experience. Last month I needed an iOS app for my side project so I just built one. Took 3 days.
The real unlock was not React Native or Expo. It was having AI in the workflow. It did not write the app for me but it removed the friction of learning a new platform. I moved at the same speed I move on the web.
Then iOS broke everything. Turns out when your user switches to WhatsApp mid-response, iOS kills your network request. The AI keeps generating server-side, burning tokens, but the response never arrives.
Had to rethink the whole streaming architecture to handle disconnects gracefully.
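A minimal sketch of that idea, not the actual implementation from the writeup: generation runs as its own task on the server, so a client that disappears mid-stream does not stop it, and the full response is saved when it finishes. The names here (`DB`, `fake_model_stream`, `generate_and_persist`, `client_session`) are hypothetical stand-ins, and the dict stands in for a real database.

```python
import asyncio

# Hypothetical in-memory stand-in for the database.
DB: dict[str, str] = {}


async def fake_model_stream(prompt: str):
    """Stand-in for a streaming LLM call; yields tokens one at a time."""
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0.01)
        yield token


async def generate_and_persist(message_id: str, prompt: str, queue: asyncio.Queue) -> None:
    """Runs as its own task, so a client disconnect does not cancel it."""
    parts: list[str] = []
    async for token in fake_model_stream(prompt):
        parts.append(token)
        queue.put_nowait(token)  # best-effort live delivery to the client
    DB[message_id] = "".join(parts)  # full response is saved either way
    queue.put_nowait(None)  # sentinel: stream finished


async def client_session(queue: asyncio.Queue, disconnect_after: int) -> list[str]:
    """Simulates a client that reads a few tokens and then goes away."""
    received: list[str] = []
    while len(received) < disconnect_after:
        token = await queue.get()
        if token is None:
            break
        received.append(token)
    return received


async def demo() -> tuple[list[str], str]:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(generate_and_persist("msg-1", "hi", queue))
    received = await client_session(queue, disconnect_after=2)  # app gets backgrounded
    await task  # the server keeps generating after the client is gone
    return received, DB["msg-1"]
```

When the app comes back to the foreground, the client can fetch the saved message by id instead of relying on the stream it lost.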
Full writeup: https://t.co/vBDGQgV2wB
@deshrajdry Great article. It reminds me of a piece by @BennettSchwartz that makes a similar case for why markdown is not a memory
https://t.co/hV7yvZMH7t
You are a great content creator. I enjoyed the 30-minute reflection. In fact, I have been following your content journey since that time management video you made 10 years ago (I had to double check, and it is crazy how much time has passed). Personally, as someone who loved the startup ecosystem, it is great to hear thoughts and insights from someone at the core of that ecosystem. I will certainly enjoy more of this kind of content and format.
Building AI side projects is fun until you have to pay for it
I built Synapse, an AI companion for my wife with a memory system that makes Gemini know her life, her patterns, her emotional triggers. She uses it daily.
Two weeks ago, I connected PostHog to track costs. $24. One session alone: 28 messages, $2.42.
Every message sends ~30K tokens of context. 80-90% of those tokens are the exact same compiled knowledge, repeated every turn.
Most AI providers offer automatic caching. Send the same prefix enough times, and they might cache it for you. But you have no control. One small change in the prompt (like a datetime updating every turn) breaks prefix matching. You never know if it is working.
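A rough illustration of why a changing datetime hurts prefix matching. This is a sketch under assumptions: real providers match on tokens rather than characters and their exact rules are not public, so the character-level `shared_prefix_len` helper and the prompt layouts below are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Stand-in for the large block of knowledge that is identical every turn.
KNOWLEDGE = "stable compiled knowledge " * 100


def volatile_first(now: str) -> str:
    """Datetime at the top: the prompts diverge almost immediately."""
    return f"Now: {now}\n{KNOWLEDGE}"


def stable_first(now: str) -> str:
    """Datetime at the end: the whole knowledge block stays a shared prefix."""
    return f"{KNOWLEDGE}\nNow: {now}"


turn_1 = "2025-01-01T10:00:00"
turn_2 = "2025-01-01T10:01:00"

broken = shared_prefix_len(volatile_first(turn_1), volatile_first(turn_2))
intact = shared_prefix_len(stable_first(turn_1), stable_first(turn_2))
```

With the datetime first, the shared prefix ends inside the timestamp. With it last, the shared prefix covers the entire knowledge block.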
𝗚𝗲𝗺𝗶𝗻𝗶'𝘀 𝗲𝘅𝗽𝗹𝗶𝗰𝗶𝘁 𝗰𝗮𝗰𝗵𝗶𝗻𝗴 𝗔𝗣𝗜 𝗶𝘀 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁.
You create a cache resource, get a name back, and reference it on every request. Guaranteed hit. 75% cheaper on cached tokens. You decide what gets cached and you can verify it is working.
I separated the stable knowledge compilation (~25K tokens) from the volatile parts of the prompt, cached the compilation after hydration, and referenced it by name on every message.
The client sends both the cache_name and the full compilation on every request. If the cache is hot: 75% savings. If it expired, the server inlines the compilation at full price. The user never notices.
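The fallback described above can be sketched roughly like this. It is not the Gemini SDK: `CacheEntry`, `build_request`, the `live_caches` registry, and the request dict shape are all hypothetical, chosen only to show the hot-cache versus expired-cache branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical record of a provider-side cache and when it expires."""
    expires_at: float  # Unix timestamp


def build_request(
    cache_name: Optional[str],
    compilation: str,
    volatile_prompt: str,
    live_caches: dict[str, CacheEntry],
    now: float,
) -> dict:
    """Reference the cache if it is still live, otherwise inline the compilation."""
    entry = live_caches.get(cache_name) if cache_name else None
    if entry is not None and entry.expires_at > now:
        # Hot cache: send only the volatile part plus the cache reference.
        return {"cached_content": cache_name, "prompt": volatile_prompt}
    # Expired or unknown cache: pay full price, but the request still works.
    return {"cached_content": None, "prompt": compilation + "\n" + volatile_prompt}


caches = {"cache-abc": CacheEntry(expires_at=1_000.0)}
hot = build_request("cache-abc", "KNOWLEDGE", "How was my day?", caches, now=500.0)
expired = build_request("cache-abc", "KNOWLEDGE", "How was my day?", caches, now=2_000.0)
```

Because the client always sends the full compilation alongside the name, the server never needs a second round trip to recover from an expired cache.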
𝗧𝗵𝗲 𝗻𝘂𝗺𝗯𝗲𝗿𝘀
Before: $0.017-0.039 per generation
After: $0.0088 per generation
That $2.42 session? Would cost ~$0.25 now.
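The arithmetic behind that estimate, as a quick check. The session figures come from the posts above; the input-cost part is a simplification that assumes all input tokens are priced the same and ignores output tokens and any cache storage fees.

```python
# Figures quoted in the thread.
messages_in_session = 28
cost_per_generation_after = 0.0088  # USD
session_cost_before = 2.42          # USD

# Re-pricing the same session at the new per-generation cost.
session_cost_after = messages_in_session * cost_per_generation_after
session_reduction = 1 - session_cost_after / session_cost_before

# Simplified input-side view: ~30K tokens per message, ~25K of them cached
# at a 75% discount (assumes a uniform price per input token).
total_tokens = 30_000
cached_tokens = 25_000
cached_discount = 0.75
input_cost_fraction = (
    (total_tokens - cached_tokens) + cached_tokens * (1 - cached_discount)
) / total_tokens

print(f"session after caching: ${session_cost_after:.2f}")
print(f"session cost reduction: {session_reduction:.0%}")
print(f"input cost vs. uncached: {input_cost_fraction:.1%}")
```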
Same knowledge graph. Same memory quality. Just a lot cheaper to remember.
Full breakdown: https://t.co/zBuX2mkbh8
@mem0ai A few weeks ago I posted about trying mem0 for my app. My approach uses the graph to find relevant concepts to preload into context, and I found that hard to do in mem0 because the graph and the facts are not connected. Is there any way to do that?
https://t.co/6Czc5DOXes
@bennettschwartz I still remember trying the GPT-3 playground and being blown away that I could simulate a conversation with Albert by creating a script
@SushantDotDev @BacLeodiv My hack for this is building for yourself or someone you know. It may not be a business, but at least you know it is solving a real problem for at least one person you care about