Julio Valls

@Coopykins

Tech Lead. Dad. I enjoy video games, software dev and reading.

Joined July 2019

333 Following

162 Followers

1K Posts

Julio Valls @Coopykins

1 day ago

@brian_armstrong You described my approach very well. I hope I can set up something like that in my org. The biggest issue I see is that most people just use Opus, for example, and never bother changing the model.

316

Coopykins retweeted

Brian Armstrong

@brian_armstrong

1 day ago

How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.

Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a diversity of models, so they can check each other's work.

Better Routing – In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning, but not for execution where they can be overkill. Ultimately, humans shouldn't be choosing models - AI can automate this task.

Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once properly implemented.

Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.

Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.

The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.

Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
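The routing idea in the post (pick a model tier by task, then weigh cache hits against pricing) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the harness the post describes: the model names, prices, cache discounts, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical model identifier
    tier: str              # "frontier" or "open-weight" (assumed labels)
    price_per_mtok: float  # made-up USD price per million input tokens
    cached_factor: float   # fraction of the price charged for cached tokens


# Hypothetical catalogue; none of these numbers come from the post.
MODELS = [
    Model("frontier-large", "frontier", 15.0, 0.1),
    Model("open-a", "open-weight", 1.0, 0.1),
    Model("open-b", "open-weight", 1.2, 0.1),
]


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Input cost in USD, charging cached tokens at a reduced rate."""
    cached = min(cached_tokens, prompt_tokens)
    fresh = prompt_tokens - cached
    billable = fresh + cached * model.cached_factor
    return billable * model.price_per_mtok / 1_000_000


def route(task: str, prompt_tokens: int, warm_cache: dict[str, int]) -> Model:
    """Pick a tier by task, then the cheapest model in it given warm caches.

    warm_cache maps a model name to the number of prompt tokens already
    cached for that model (an assumption about what the gateway tracks).
    """
    tier = "frontier" if task == "planning" else "open-weight"
    candidates = [m for m in MODELS if m.tier == tier]
    return min(
        candidates,
        key=lambda m: estimated_cost(m, prompt_tokens, warm_cache.get(m.name, 0)),
    )
```

With these made-up numbers, an execution task with a cold cache goes to `open-a`, the cheapest open-weight model, while the same task with a 90,000-token warm prefix on `open-b` goes to `open-b`, because the cached tokens outweigh its higher list price.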

401

611

Julio Valls @Coopykins

2 days ago

I'm going through the process of buying a home and, damn, that is stressful! Especially with a rental that will expire soon. Feels like having an axe hanging over your head https://t.co/IMLpEGOuOx

Coopykins retweeted

Pavol Rusnak

@PavolRusnak

4 days ago

180

62K


Coopykins retweeted

waiting. @GreenOnionDuck

7 days ago

As a software engineer I have to unfortunately inform you that if medicine was held to the standard of software engineering there would be mass death.

278

61K

Coopykins retweeted

MartaSevilla

@MarBalCas

6 days ago

The man in charge of bringing a no-confidence motion to fight corruption has been handed 24 years for corruption.

226

12K

170

108K

Coopykins retweeted

David Fernández

@naroh

7 days ago

📌 A practical case of how transparency works in Spain: In April 2022, a citizen requested the list of advisers (names, qualifications and salaries) at Moncloa in 2021. 🤷 Moncloa did not even reply. In June 2022 the citizen filed a complaint with the Transparency Council. ⚖️ The Council ruled in April 2023, 228 days after the complaint (exceeding its legal maximum of 90 days), that the citizen was right: https://t.co/S5QwOsS7A6 📄 Moncloa eventually granted access, but deficiently and without covering all the information the Transparency Council had urged it to make public. 🔁 In June 2025, three years after the original request, the citizen filed a new access request because the previous one had been executed defectively, alleging that - The URL returned a 404 - The salary annex was not attached 📬 Moncloa responded in August with a dissociated (anonymized) list of positions. ⏳ The citizen complained to the Council again on 6 August. ❌ The Council ruled 182 days later, on 4 February, again exceeding the 90-day maximum. And its ruling was to reject the complaint as inadmissible, arguing that there is already a Council decision in the applicant's favor and that if the Administration does not carry it out, the thing to do is go to the administrative courts (paying for a lawyer and a court representative) https://t.co/Sx2Lkuoxrs

180

165K

Coopykins retweeted

Mykhailo Fedorov

@FedorovMykhailo

9 days ago

Ukraine launches TrophyLab: we are opening access to captured Russian weapon technologies for our global partners. Every missile, drone, and vehicle seized on the battlefield is now a source of knowledge for the free world. Through this secure platform, allied governments, labs, and defense tech manufacturers gain access to deep technical data, reports, and vulnerabilities. Users can also request physical equipment for testing, significantly shortening the development cycle for countermeasures. What was meant to be the enemy's secret advantage is being dismantled to defend democracy. Join the platform: 🔗 https://t.co/xoeCfXsIy3

711

31K

Coopykins retweeted

Fardeem

@FardeemM

10 days ago

If you're on your way to building a billion dollar company that involves a web app, here are some of my notes on architecting the frontend.

if you don't do this, it's probably fine but one day you'll hire someone to fix it

but truly that person could be doing some other higher value thing if you make some key optimizations on day 1

you don't even have to learn anything, you're gonna tell your agents to do it anyways!

okay here it goes:

- Make your server code generate an openapi spec which then generates all the relevant client side code. Never do this by hand. Typing backend types instead of generating them should be banned
- You need to make a decision on how the client talks to the backend. rest/graphql works in which case please just use tanstack query. other libraries will look similar but tanstack query truly is goated.
- if you want linear style sync setups or offline mode, think about this HARD and architect it from day 1. Bolting this on later is so tedious.
- People like using plain react router but things have gotten a lot better since then. Try their new framework mode or just even use tanstack router. Use route data loaders.
- If you store a lot of state in query params, make that a first class citizen and make sure it's type safe. use nuqs or tanstack query.
- Most apps just need a single state management situation for server state and that's it. If you have other bespoke needs, i quite like zustand and xstate/store.
- If you have a super interactive app where things come in and out of view, there's a lot of frontend state to maintain, music is playing and what not, lock in and learn xstate. Trust me if you wanna keep ur sanity, you need to model ur frontend as a state machine otherwise you're gonna be deep in useEffect hell
- React compiler is here my friends, the days of useMemo and useCallback are gone. Update your priors accordingly
- Tailwind is easy and fun but makes it really hard to maintain a large app with consistent styling. You need an "agent-first design system/component library" but maybe this is a rant for another day
- Don't be afraid to hack your routing library to fit your needs more closely. A lot of apps have "drawers" to show additional info. You should 100% be able to say "here's a route, make it a drawer" and everything should be handled from there.
- Managing loading and error states using isPending and isError is madness. Lean into Suspense and ErrorBoundary.
- Figuring out a blessed path for websockets and SSE on day 1 i think will pay dividends in the long term if you're building anything AI related.
- If you're building a SPA, don't use next.js. it literally makes no sense. Why would you do this.
- Definitely deploy on Cloudflare or vercel. There are other services but trust, they have weird missing features.
- Assuming you build something people want, the next job is to build the factory so it can efficiently build the thing. Act accordingly.
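The "model your frontend as a state machine" point can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration; in a React app a library such as xstate would play this role. The states, events, and function names below are hypothetical (a toy media player), not taken from the post.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOADING = auto()
    PLAYING = auto()
    PAUSED = auto()
    ERROR = auto()


# Every legal (state, event) pair is listed once; anything else is ignored.
# These states and events are invented for a toy media player.
TRANSITIONS = {
    (State.IDLE, "load"): State.LOADING,
    (State.LOADING, "loaded"): State.PLAYING,
    (State.LOADING, "failed"): State.ERROR,
    (State.PLAYING, "pause"): State.PAUSED,
    (State.PAUSED, "play"): State.PLAYING,
    (State.ERROR, "retry"): State.LOADING,
}


def transition(state: State, event: str) -> State:
    """Return the next state, or stay put if the event is not allowed here."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], start: State = State.IDLE) -> State:
    """Fold a sequence of events over the machine and return the final state."""
    state = start
    for event in events:
        state = transition(state, event)
    return state
```

Because the whole behaviour lives in one transition table, an impossible combination such as pausing before anything has loaded simply cannot occur, which is the kind of bug that scattered effect hooks tend to produce.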

115

204K

Julio Valls @Coopykins

10 days ago

@jamonholmgren We're doing 2/3 and would like to get better results from automated code reviews from LLMs to assist with the increased throughput but nothing seems too reliable.

Coopykins retweeted

Cursor @cursor_ai

12 days ago

We're launching code storage and git hosting. Origin gives teams and agents a place to host, review, and collaborate on code. Available this fall. Join the waitlist. https://t.co/uamaIarJXY

581

15K

Coopykins retweeted

OpenRouter

@OpenRouter

15 days ago

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇 https://t.co/OTUQAdTQjU

718

15K

14K

Coopykins retweeted

John Scott-Railton

@jsrailton

18 days ago

NEW: malware developers added nuclear & biological weapons text to their spyware.

Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner.

Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky.

When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit.

We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users of systems that need to handle complex cybersecurity issues demand that models be less safety-blunted.

In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation.

H/T to colleagues that shared this with me https://t.co/f3Aj9TYxU4

224

13K

Coopykins retweeted

Low Level

@LowLevelTweets

19 days ago

If the vibe coders could read they’d be very upset

484

338

428K

Coopykins retweeted

Handre

@Handre

22 days ago

Milton Friedman's greatest regret.

The federal government discovered the perfect crime in 1943: make employers collect taxes before workers ever see their paychecks. You think you earn $60,000 per year, but you actually earn $75,000 and hand over $15,000 to politicians without ever touching it. The psychological difference is enormous.

Before payroll withholding, Americans wrote quarterly checks directly to the Treasury. Picture yourself sitting at your kitchen table, writing a $3,750 check to the IRS every three months. The pain was immediate and visceral. Politicians faced constant pressure to justify every dollar because citizens felt the extraction in real time.

Withholding transforms this concrete loss into an abstract accounting entry. Your employer becomes an unpaid tax collector, and you never experience the actual cost of government. Worse, most people celebrate their tax refunds as government generosity rather than recognizing them as interest-free loans they provided to politicians. The Treasury collects your money throughout the year, spends it immediately, then returns your own cash and receives gratitude.

This system enables the explosion in government spending you witness today. Defense contractors billing $640 for toilet seats, agricultural subsidies for corn syrup, and congressional salaries for 535 people who rarely show up to work. When taxation feels painless, voters stop demanding accountability for how their money gets spent.

Milton Friedman helped design withholding as a wartime emergency measure and later called it his greatest regret. Free market economists recognized that the psychological pain of direct taxation creates political pressure for fiscal restraint. The temporary always becomes permanent in government hands, and the emergency justification disappears while the extraction mechanism remains forever.

280

11K

609K

Julio Valls @Coopykins

24 days ago

I'd go live in the mountains but I have a small kid too...

Julio Valls @Coopykins

24 days ago

My house rental contract expires in a few months, and things are really grim here. Renewing would put the price at nearly twice what I signed for 5 years ago. Even in small towns 50 km from Valencia the price is nearly as high 😱

Julio Valls @Coopykins

24 days ago

@aidenybai I'm really loving React Doctor! I'm noticing that when the Codex desktop app runs it, it can never get the scoring results, only the list of issues. Is this because I'm running it on Windows and it's using PowerShell?