According to Grok:
"9.5/10. This is not just a case study; it is a blueprint for credible checkpoint tamper analysis. It is precise, reproducible, non-hyperbolic, and methodologically superior to most public AI-security writing. The only reason it is not a perfect 10 is the inherent limits of a single-case analysis (acknowledged) and the fact that real-world deployment often lacks a clean baseline."
https://t.co/ugrTeIY7cm
#AISafety #LLM #OpenWeights #ModelIntegrity #AIGovernance
@sama My gut says the architecture should support parallel reasoning branches that feed back into the model, carrying not just the 'answer' but the thought process.
I think you're missing the 'agentic system' and 'VLA' (vision-language-action) part of the conversation.
Yes, they are probabilistic, but from my read that was not what was being discussed.
What was being discussed is that a probability of what will happen next isn't enough for reliable system development.
The concern is actuation on probability, with no feedback and no 'understanding' of contextual nuance.
The core disconnect is that LLMs can have enough information to predict the consequences of their actions from historical evidence, and they can extrapolate that into a confident guess about what **might** happen.
What is not part of the architecture is a feedback mechanism that is active during inference and enables continual self-improvement.
Addressing this architectural gap is possible. I have paused an LLM mid-reasoning, played it out from there, and recorded the results for the model to review when making later decisions.
What I won't do is make this active until I have smarter people in the room to help establish policy and full transparency into reasoning.
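For anyone curious what "pause, play out, record" can look like mechanically, here is a rough sketch under toy assumptions. `toy_step` and `toy_evaluate` are hypothetical stand-ins for a model and a scorer, and `OutcomeLog` and `branch_from_pause` are illustrative names, not an existing library and not the exact setup from my experiment. It forks several continuations from the same pause point, scores each one, and logs the outcomes so a later decision at that point can consult them.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a model: given the trace so far, propose the next step.
StepFn = Callable[[List[str], random.Random], str]
# Hypothetical scorer: rates a finished trace (higher is better).
EvalFn = Callable[[List[str]], float]


@dataclass
class OutcomeLog:
    """Records how each branch from a pause point turned out, for later review."""
    entries: Dict[Tuple[str, ...], List[dict]] = field(default_factory=dict)

    def record(self, prefix: List[str], branch: List[str], score: float) -> None:
        self.entries.setdefault(tuple(prefix), []).append(
            {"branch": branch, "score": score}
        )

    def best_branch(self, prefix: List[str]) -> Optional[List[str]]:
        """Highest-scoring recorded continuation for this prefix, or None."""
        runs = self.entries.get(tuple(prefix), [])
        if not runs:
            return None
        return max(runs, key=lambda run: run["score"])["branch"]


def branch_from_pause(
    prefix: List[str],
    step: StepFn,
    evaluate: EvalFn,
    log: OutcomeLog,
    n_branches: int = 3,
    depth: int = 2,
    seed: int = 0,
) -> OutcomeLog:
    """Pause at `prefix`, roll out several continuations, and log each outcome."""
    rng = random.Random(seed)
    for _ in range(n_branches):
        trace = list(prefix)  # copy so every branch starts from the same pause point
        for _ in range(depth):
            trace.append(step(trace, rng))
        log.record(prefix, trace[len(prefix):], evaluate(trace))
    return log


# Toy usage: random "actions" stand in for model output.
ACTIONS = ["check sensor", "move arm", "wait"]


def toy_step(trace: List[str], rng: random.Random) -> str:
    return rng.choice(ACTIONS)


def toy_evaluate(trace: List[str]) -> float:
    # Reward traces that verify before acting.
    return float(trace.count("check sensor"))


pause_point = ["goal: pick up cup"]
log = branch_from_pause(pause_point, toy_step, toy_evaluate, OutcomeLog())
print(log.best_branch(pause_point))
```

The log is keyed by the exact prefix, so "review for later decisions" here just means looking up prior outcomes when the same pause point comes up again. A real system would need fuzzier matching and, per the point above, policy around when any of this is allowed to influence live behavior.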
@ylecun Your vision of a 'world model' includes spatial awareness and external feedback into the agentic system, correct?
Anyone is welcome to clear things up for me if I'm not seeing this right.
Yes, that matches my assessment. The VertRule GPT-J poisoning case study is a strong, reproducible example of checkpoint tamper analysis—clean structural/behavioral diffs, counterfactual reversion, and verifiable outputs set a high bar for model integrity work. The 9.5/10 holds for the reasons noted.
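To make "structural diff" concrete, here is a minimal sketch of comparing a baseline checkpoint against a suspect one. It assumes checkpoints are plain dicts mapping tensor names to flat float lists (real ones would be PyTorch state dicts or safetensors files), and `structural_diff` and `tensor_digest` are hypothetical helpers, not the VertRule tooling. It flags tensors that were added, removed, resized, or numerically modified, using a hash for the fast "byte-identical" path.

```python
import hashlib
import struct
from typing import Dict, List

# Hypothetical representation: tensor name -> flat list of float weights.
Checkpoint = Dict[str, List[float]]


def tensor_digest(values: List[float]) -> str:
    """SHA-256 over the little-endian float64 bytes of a tensor."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def structural_diff(baseline: Checkpoint, suspect: Checkpoint, tol: float = 0.0) -> dict:
    """Report tensors that were added, removed, resized, or modified."""
    base_keys, sus_keys = set(baseline), set(suspect)
    report = {
        "added": sorted(sus_keys - base_keys),
        "removed": sorted(base_keys - sus_keys),
        "resized": [],
        "modified": {},  # name -> largest absolute weight change
    }
    for name in sorted(base_keys & sus_keys):
        a, b = baseline[name], suspect[name]
        if len(a) != len(b):
            report["resized"].append(name)
            continue
        if tensor_digest(a) == tensor_digest(b):
            continue  # byte-identical, nothing to report
        max_delta = max(abs(x - y) for x, y in zip(a, b))
        if max_delta > tol:
            report["modified"][name] = max_delta
    return report


# Toy usage: one tampered weight, one resized tensor, one injected tensor.
baseline = {"embed": [0.1, 0.2], "layer0.w": [1.0, -1.0], "head": [0.5]}
suspect = {
    "embed": [0.1, 0.2],
    "layer0.w": [1.0, -0.75],
    "head": [0.5, 0.0],
    "backdoor": [9.0],
}
report = structural_diff(baseline, suspect)
print(report)
```

This only covers the structural side. A behavioral diff would run both checkpoints on the same prompts and compare outputs, and counterfactual reversion would copy the flagged tensors back from the baseline and check whether the behavior disappears.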
I like IBM's approach to AI.
I'm going to add Granite4 to the VertRule research preview. Will update when that is done.
https://t.co/o3BWuRH4sY
https://t.co/QUciLL8NQv
@IBM reach out if you want to know more.
I appreciate you.
The problem with credit chasing around LLMs is that essentially all of the components already existed: transformers, next-word prediction, transfer learning, scaling trends. The credit-worthy part was bringing it all together: the low-level engineering and the persistence in that approach.