Vincenzo

@Kenfus2

Data Scientist in NLP working at the Institute of Data Science at the @FHNW School of Computer Science. In my free time, I like to submit models to @Numerai

Switzerland

Joined July 2021

310 Following

67 Followers

257 Posts

Kenfus2 retweeted

Moon @MoonL88537

2 days ago

yeah. this is not normal.

167

431K

Vincenzo

@Kenfus2

2 days ago

@suni_code @grok explain it for a vibe coder

Vincenzo

@Kenfus2

4 days ago

Some of the Reasoning of QWEN-VL is hilariously bad:

Vincenzo

@Kenfus2

4 days ago

Omg... A harness for QWEN VL finally solved one ArcAgi2 problem.. I NEVER expected it to struggle so much on it. I mean, it's a good model, right?

https://t.co/HXihYYC5v5

Vincenzo

@Kenfus2

4 days ago

@AnupamHaldkar Opus 4.8, I don’t see Fable as a europoor

Vincenzo

@Kenfus2

4 days ago

@Lamborghini Looks fast

Vincenzo

@Kenfus2

4 days ago

Lmao, it’s true and the language chosen by Claude is hilarious

Vincenzo

@Kenfus2

4 days ago

Yep… super sad about this. This is why open-source has to win in AI too

alphaXiv

@askalphaxiv

5 days ago

As believers of open research, we are disappointed to see Anthropic silently degrading Fable 5 for AI development

"Any topic related to building pretraining pipelines, distributed training infrastructure, or ML accelerator design... may have limited effectiveness through Claude via methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning."

Not only do they get to decide what you use LLMs for in research, but this also enables them to silently intervene in your research without you knowing.

This sets a dangerous precedent. If a model refuses openly, users can understand the boundary. If a model falls back to another model, users can still evaluate the difference. But if a model silently modifies or weakens its own answers while still pretending to help, researchers lose the ability to know whether a failed result came from their own idea, their implementation, or an invisible intervention by the model provider.

That is not safety. Safety policies should be transparent, auditable, and user-visible.

On top of that, the people most harmed by this are not the largest labs with massive teams and proprietary infrastructure. It is the independent researchers, academic groups, startups, and open-source builders who rely on public tools to compete, innovate, and pioneer AI for everyone else.

166

720

642

220K

Kenfus2 retweeted

Taelin

@VictorTaelin

5 days ago

this is my personal singularity moment

this post may sound like a paid ad. I only wish. I'm concerned, more so than happy. the world is changing, and, among the scenarios where AI goes terribly wrong, inequality is the most realistic, yet, the one Anthropic seems to be the least concerned about. I'm glad OpenAI is taking the opposite stance: *personal AGI for everyone*. I think this is a commendable position in the times we live. but who am I in the queue of the bread?

anyway, Fable is here, so I'll just report my first-hour experience

first of all, all my pet prompts are solved.
→ λ-calculus puzzles
→ bug questions
→ one-shot apps
all are trivial to it.

I don't have anything harder other than my ongoing work

so, in the last several days, I've been toying with HVM5, a new interaction net evaluator with a faster loop.

after writing the first version, I left 32 GPT-5 agents working for ~20 hours each. this resulted in up to 2x speedups, but the file size increased by 2-fold and quality decreased significantly.

I then simplified the whole thing into an even simpler core, and left Opus 4.8 and GPT 5.5 optimizing it for 8 hours. Opus got a legit 6% - 34% speedup in most benches. GPT got better results, but, sadly, an unusable file.

I then asked Fable to optimize it.

2 hours later, it landed a 1770% speedup in one case, 100%+ in 4 others, and 22% on average. yes, in 2 hours it outperformed me, Opus 4.8 and a swarm of GPT 5.5 agents by an order of magnitude.

that could not possibly be legit. "it must be hardcoding the benchmarks" (GPT trauma). so I read its explanation and what it did was, indeed, the most high impact optimization one could try first. seems like HVM5 was wasting a lot of time garbage-collecting unused branches of pattern-match nodes. I had optimized that for static mats, but not for dynamic mats. skill issue. Fable figured how to do it for these, resulting in a massive speedup in some benches

but wait, is that *correct*? I'm not sure yet. it is credible, but this is the kind of thing that is very easy to get wrong on interaction nets. the problem is, when I was ready to start auditing Fable's solution so I could tell whether it was buggy or legit, it interrupted me to tell me it had found a massive bug in the code *I* had written.

... wait, what?

so... for garbage collection purposes, I stored a bit on lambda term pointers that meant "the variable bound by this lambda has been freed, so, its lambda must free whatever argument it is applied to". that's fine. yet, on duplicator nodes, I also used the same bit to mean "one of the duplicated variables was freed, so, treat this dup as a passthrough no-op". so, if a lambda entered a duplicator, it would mistake the lambda's collection bit for its own, resulting in corrupted interaction!

that's a mouthful, so why am I writing this?

just so you can appreciate the sheer absurdity of what just happened. I didn't ask it to find bugs. I asked it for an optimization. and even if I did ask it to find bugs, this bug is so astonishingly subtle and specific that identifying it takes mastering the domain to an extent that is beyond even me. I'd easily need hours or days to fix it, *if* I ever came across it. chances are it would just go unnoticed. and Fable found it and fixed it like it was nothing, while it was busy adding a 17x speedup to a file that neither I, nor Opus 4.8, nor a fleet of GPT 5.5 managed to make even 2x faster.

oh and there is also another tab where it is also ripping through Bend's codebase and finishing everything I had to do

I don't know what to say anymore

this isn't about Anthropic or OpenAI, this is about our collective future as a species. the world is changing, and we need to be aware of it, and discuss how to handle this change.

receipt below . . .

251

680

Vincenzo

@Kenfus2

5 days ago

ArcAGI V2 is difficult.... Well, for vision models at least. You REALLY have to fine-tune those models on the task, or else they just do not work! It's incredible.

Vincenzo

@Kenfus2

5 days ago

I just checked out the #lego bubble on Twitter and wow, so much entertainment

Vincenzo

@Kenfus2

5 days ago

@abhijitwt Cries in EU… wait I am Swiss 😍

Vincenzo

@Kenfus2

5 days ago

@xeophon Super impressive @Kimi_Moonshot! So, how many Spark boxes do I need to run this locally and have an AI girlfriend?

Vincenzo

@Kenfus2

6 days ago

@Polymarket But is it not SpaceXAI?

454

Vincenzo

@Kenfus2

6 days ago

@sattyyouneed I thought so but I always have a moment where I’m stuck with Codex and then Opus 4.8 comes to the rescue. It finds stuff GPT and I didn’t

Vincenzo

@Kenfus2

6 days ago

@enjojoyy There are many different whales! Did you see a sperm whale? I have yet to see one but I feel seeing a 16 meter predator in RL has to trigger something in your brain

Vincenzo

@Kenfus2

6 days ago

@enjojoyy Congratulations! Diving is one of the best things in the world and shuts my brain off like nothing else

Vincenzo

@Kenfus2

6 days ago

@leo1yu Yes, but I need a better paying job to drop my phone case

Vincenzo

@Kenfus2

6 days ago

@steipete @heyandras Yes, research should also be taken over by such a loop, after you have created the framework and defined a high-dimensional metric (a weighted sum of metrics).

Vincenzo

@Kenfus2

6 days ago

@elonmusk https://t.co/pxi36ChbmF Did you watch this? Any comment?
