UCL neuro PhD now in industry (ML lead). the brain was was too hard to crack, so I switched to artificial nets; RL, NLP, and neuroai - all views my own
Hosting a @cursor_ai hands off hack with @carlagriffs + Gamal! 🤌
Met Carla in our fintech hack. Gamal - well not in person lol, but cool guy.
Both had an idea to run a hands off hack… me likey so much decided to collab.
Thursday, 25th June. Hosted by @HalkinOffices + Oli!
Hi hi, yesterday was @cursor_ai happy hour in Shoreditch🇬🇧
Got to meet the EMEA GTM team and celebrate the Cursor London HQ launch, LFG🤌
New and familiar faces, one thing clear @cursor_ai is winning.
S/O to Rich Cullen and Jason Creane for organizing, ´twas a banger!
Complaining about nightlife when you *checks notes* choose to live in Soho is like living in South Kensington and complaining about the museums. Or moving to Hackney and grumbling about creatives. Living in Richmond and hating green space. It's all getting a bit silly, isn't it?
Most people training agentic LLMs with RL right now have a silently broken training loop and have no idea.
Here's the trap: single-turn RL works beautifully. Clean curves, sane rewards, everything converges. Then you add tools so the model can act mid-rollout, and things get weird. Loss spikes for no reason. Eventually a shape-mismatch error.
The culprit: every time you parse the model's output to detect a tool call, then re-tokenize the updated conversation for the next turn, you're rolling the dice. Usually the round-trip gives back the same tokens. Sometimes it doesn't and your gradient lands on a sequence the model never actually sampled. No crash. Just quietly wrong math and a useless gradient signal.
The fix is one rule: never re-encode tokens you've decoded. Keep the sampled tokens in one buffer, never re-render them, and both failure modes disappear. That's Token-In, Token-Out done right.
Our team just published a beautiful deep-dive on exactly this, including an audit across the major open-weights model families showing most chat templates already support it. Required reading if you're doing multi-turn RL 🤗🔥
https://t.co/zmx0EQl3jM
Venue announcement: Pubmaxxing is it at Strongrooms in Shoreditch next Wednesday.
We reserved a decadent area at the back of the garden - perfect for cheeky pints, dastardly plots, and unrelenting London optimism.
After pulling some strings we can confirm it's gonna be 25 degrees and SUNNY that evening. 🌞
Bring your suncream & Oyster card.
200 RSVPs already, so we need to lock it off for now. If you can't make it now, please update that you're not going on Luma, to make way for others.
See you there! #londonmaxxing
reaching out to the #londonmaxxing community.. my friends and I are organising a hackathon for next month..does anyone know of any VCs/start ups that would be able to loan their office after hours on a Thursday? Cursor will provide credits, just trying to find a venue now
the year was 2024. you wanted to build an ai chatbot. you installed chroma db locally. you couldn’t figure out how to deploy it so you switched to pgvector. you read a paper on RAG. you spent $4.82 by calling an embedding api after realizing you couldn’t figure out how to get BAAI/bge-large-en-v1.5 working with your broken cuda packages. nvidia stock was overpriced at $90 you’re sure of it. you converted all your documents to embedding. you googled cosine similarity. you called the claude 3 sonnet model api and ran out of context after 8k tokens. you’re deep into reading langchain docs and confused. maybe something called llama index might work. it took four days to prototype but at least github copilot has killer autocomplete. your responses are shit but fortunately openai has a fine tuning api that will help. surely in a few weeks you’ll have something to show your boss, and the answers will be hallucination free. life is good.