Introducing a new side project called Model Regression. It tests Claude, GPT, and Grok daily against various benchmarks to determine how well each model is performing and to identify degradation over time.
@edskoudis had the idea of testing a model before conducting offensive testing to ensure it was performing as expected, and @BlasikRandy pushed me down this road of actually going and doing it.
The main idea here is that frontier models will experience outages, issues, bugs, and intentional or unintentional nerfing without notice. You typically can't trust these models to be stable from day to day, so checking how well a model is performing on a given day is something I'll be building into my daily routine.
It runs every morning in my DGX Spark environment and automatically updates with how well each model is performing.
Enjoy!
https://t.co/1Pep6NyGoh
I also open-sourced the project, so you can run it on your own server and look at the benchmarks and how they are calculated:
https://t.co/GFPigpRtUF
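For anyone curious what "identifying degradation over time" could look like in practice, here is a minimal sketch. It is not the project's actual method: the function name `detect_regression`, the 7-day window, and the 5% tolerance are all assumptions made up for illustration. It simply flags a model when today's benchmark score falls noticeably below its recent average.

```python
from statistics import mean


def detect_regression(scores, window=7, tolerance=0.05):
    """Return True if the latest score is more than `tolerance` (a fraction)
    below the mean of up to `window` scores that came before it.

    `scores` is a chronological list of daily benchmark scores for one model.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    if len(scores) < 2:
        return False  # not enough history to compare against

    # Baseline: the scores preceding today's run, limited to the window size.
    baseline = mean(scores[-(window + 1):-1])
    return scores[-1] < baseline * (1 - tolerance)


# Hypothetical daily scores per model (made-up numbers, not real results).
history = {
    "model-a": [0.90, 0.91, 0.89, 0.90, 0.80],
    "model-b": [0.90, 0.91, 0.89, 0.90, 0.89],
}

for name, scores in history.items():
    status = "possible regression" if detect_regression(scores) else "stable"
    print(f"{name}: {status}")
```

A trailing average keeps one noisy day in the history from shifting the baseline much, while the tolerance avoids flagging normal run-to-run variance. Both knobs would need tuning against real benchmark data.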
Formula 1 driver Valtteri Bottas decided to do an Ironman (at home) during his off week.
• 11 hours
• 7,000 calories burned
He did the swim portion in his pool and then used a Peloton and an indoor treadmill for the bike and run.
What a madman.
@RedTeamTactics It is not a waste of time if you use it correctly. If you use it as a checklist, then you are wasting your time. If you use it to map attacks against your detection capabilities, then sure, it is a good use of your time.
BREAKING: Reuters reports that the Pentagon ran a secret program in the Philippines to 'sow doubt about the safety and efficacy' of China's Sinovac coronavirus vaccine
Just because you get access denied accessing a folder, it doesn't mean you can't get access. A quick look at bypassing the security on the WindowsApps folder. https://t.co/2JN9GEMWLb
It's graduation season. But why does it seem like all the kids have honors and awards? Sorry, it's not that I'm not happy for them, but I just remember how hard it used to be to earn honors, and when you had honors it was really impressive. How is it that 2/3 of the class has honors? And yet we rank at the bottom in PISA?
Interesting! Two people from class today were my teammates last year at SANS SEC504. I joined via Live Online and they were In-Person. Crazy coincidence!
Quick Hackfest Hollywood keynote announcement:
Day One Keynote: David Weston (@dwizzzleMSFT)
Day Two Keynote: Yarden Shafir (@yarden_shafir)
October 28th & 29th in Los Angeles!
Register for virtual (free) & in-person attendance here: https://t.co/BF9cUVMr9y
UnitedHealth CEO confirms in a US oversight hearing that the company paid a $22 million ransom to the BlackCat/ALPHV ransomware operation.
Our previous coverage from yesterday:
https://t.co/v1fYznVqbl
Active Directory hardening blog post series, like a boss, by Jerry Devore. Posting this so I can reference it later!
Disabling NTLMv1 https://t.co/b6FZuUrnJ5
Removing SMBv1 https://t.co/Ngp6rGEIcE
Enforcing LDAP Signing https://t.co/lq7wTHvOXA
Enforcing AES for Kerberos https://t.co/1ws86c9L1s