In the lead-up to the UK election, I spent 2 months undercover working in A&E for @C4Dispatches and reported on the current state of the NHS. Watch here: https://t.co/LB8kmjNG5w
“I can’t cope with this.”
@C4Dispatches has gone undercover in an A&E department. Reporter, Robbie Boyd, films as staff are forced to treat patients in a corridor. One female patient screams in pain whilst Robbie searches for a nurse’s help.
"Official figures show that only 163 overseas doctors have been offered NHS training posts this year, down from 2,168 last year."
"More than one in four NHS training posts have gone to foreign applicants in recent years."
Oncology nurses believe they are having miscarriages & other conditions from giving chemotherapy without proper PPE or ‘closed systems’, as our current legislation lags behind the EU and US. @vsmacdonald, @Rebeccasmt and I worked with @_SHBN_ and @theRCN on this incredible story
whenever you tell someone at a party you’re a journalist and they clamor to clarify that they’re “off the record” it’s like. you just told me a story about your cat throwing up after eating funfetti cake batter. should we call the new york times? should we invite bella hadid ?
NEW 🧵 Oxford maternity: Following a joint @NewStatesman & C4 News investigation into maternity services at Oxford University Hospitals, the BBC have found 58 babies may have been saved if they/their mothers had received better care at OUH, 2019 – 2024 https://t.co/F7LJCBSIJy
BREAKING: 🚨 Someone just tested 35 AI models across 172 billion tokens of real document questions.
The hallucination numbers should end the "just give it the documents" argument forever.
Here is what the data actually showed.
The best model in the entire study, under perfect conditions, fabricated answers 1.19% of the time. That sounds small until you realize that is the ceiling. The absolute best case. Under optimal settings that almost no real deployment uses.
Typical top models sit at 5 to 7% fabrication on document Q&A. Not on questions from memory. Not on abstract reasoning. On questions where the answer is sitting right there in the document in front of it.
The median across all 35 models tested was around 25%.
One in four answers fabricated, even with the source material provided.
Then they tested what happens when you extend the context window. Every company selling 128K and 200K context as the hallucination solution needs to read this part carefully.
At 200K context length, every single model in the study exceeded 10% hallucination. The rate nearly tripled compared to optimal shorter contexts.
The longer the window people want, the worse the fabrication gets. The exact feature being sold as the fix is making the problem significantly worse.
There is one more finding that does not get talked about enough.
Grounding skill and anti-fabrication skill are completely separate capabilities in these models.
A model that is excellent at finding relevant information in a document is not necessarily good at avoiding making things up. They are measuring two different things that do not reliably correlate. You cannot assume a model that retrieves well also fabricates less.
172 billion tokens. 35 models. The conclusion is the same across all of them.
Handing an LLM the actual document does not solve hallucination. It just changes the shape of it.
Assertive outreach is a well-evidenced intervention to keep MH patients safe in the community. It should exist across the UK, but we were told lack of funding has meant just 1/3 of Trusts run it. Calocane was not on one of these programs.
Snr NHS staff told us that system-wide pressures mean Calocane could have happened anywhere. Indeed our analysis found 23 similar killings by mentally unwell strangers in the year before the Nottingham attacks & as many since. The inquiry will need to go beyond "lessons learned"
The Nottingham Inquiry - what lessons need to be learned from the Calocane killings?
The Nottingham Inquiry will examine Valdo Calocane's killings. Will it also shed light on other deaths caused by mental health service failings? Victoria Macdonald writes.
Read the full article on Substack:
https://t.co/cp7KtYuowo
@beatscaduk @heartresearchuk talk to us about the lesser-known type of heart attack, which has gone under-researched and is often misdiagnosed, affecting otherwise fit, healthy adults
Incredible work from the @Telegraph’s investigations team, getting THE story that journalists have been trying to tell for years now. Walliams has always been publicly creepy but this is another level. https://t.co/34yyvDVdLI
@FlewittClint Hi Clint, sorry to hear your daughter's surgery was cancelled. I'm the health producer at Channel 4 News. Would you be happy to chat with me about your experience? Can you DM me if so?