Andrzej Krzosek

@aneonart

"Wyobrażam sobie świat nieświadomych maszyn i ludzi, którzy popadają w ich niewolę — bezwolnej reakcji na polecenia systemu, który sami stworzyli."

Joined September 2014

177 Following

46 Followers

198 Posts

Andrzej Krzosek @aneonart

5 days ago

@andrzejdragan "So that everyone likes it" amounts to leveling toward the average... which is exactly what AI does perfectly...

588

aneonart retweeted

kache

@yacineMTB

4 months ago

you can outsource your thinking but you cannot outsource your understanding

282

18K

Andrzej Krzosek @aneonart

19 days ago

@FfxYojimbo @bartlomiej7632 @andrzejdragan Except that Professor @andrzejdragan forgets that AI is not just an algorithm... It is above all the people who supervise and correct the models' answers... The fact that a model gave a correct solution is not proof of logical reasoning... It only proves that a solution exists

Andrzej Krzosek @aneonart

26 days ago

@andrzejdragan You often repeat that opinions do not interest you. And now, as proof of what exactly, you cite an opinion.

198

aneonart retweeted

Alex Prompter

@alex_prompter

3 months ago

🚨 BREAKING: Researchers at UW Allen School and Stanford just ran the largest study ever on AI creative diversity.

70+ AI models were given the same open-ended questions. They all gave the same answers.

They asked over 70 different LLMs the exact same open-ended questions.

"Write a poem about time." "Suggest startup ideas." "Give me life advice."

Questions where there is no single right answer. Questions where 10 different humans would give you 10 completely different responses.

Instead, 70+ models from every major AI company converged on almost identical outputs. Different architectures. Different training data. Different companies. Same ideas. Same structures. Same metaphors.

They named this phenomenon the "Artificial Hivemind." And the paper won the NeurIPS 2025 Best Paper Award, which is the highest recognition in AI research, handed to a small number of papers out of thousands of submissions.

This is not a blog post or a hot take. This is award-winning, peer-reviewed science confirming something massive is broken.

The team built a dataset called Infinity-Chat with 26,000 real-world, open-ended queries and over 31,000 human preference annotations. Not toy benchmarks. Not math problems.

Real questions people actually ask chatbots every single day, organized into 6 categories and 17 subcategories covering creative writing, brainstorming, speculative scenarios, and more.

They ran all of these across 70+ open and closed-source models and measured the diversity of what came back. Two findings hit hard.

First, intra-model repetition. Ask the same model the same open-ended question five times and you get almost the same answer five times.

The "creativity" you think you're getting is the same output wearing a slightly different outfit. You ask ChatGPT, Claude, or Gemini to write you a poem about time and you keep getting the same river metaphor, the same hourglass imagery, the same reflection on mortality.

Over and over. The model isn't thinking. It's defaulting to whatever scored highest during alignment training.
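The "same answer five times" claim can be made measurable. The following is a minimal sketch, assuming word-set (Jaccard) overlap as a stand-in similarity measure; this is a swapped-in simplification, not the metric used in the paper, and the sample responses are invented for illustration.

```python
from itertools import combinations

def tokens(text):
    """Distinct lowercase words, with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}

def jaccard(a, b):
    """Share of distinct words two texts have in common (0 = disjoint, 1 = same word set)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def mean_pairwise_similarity(responses):
    """Average Jaccard similarity over every pair of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical outputs, invented for illustration only.
repetitive = [
    "Time is a river that flows toward the sea",
    "Time is a river that carries us to the sea",
    "Time is a river flowing toward the sea",
]
varied = [
    "Time is a river that flows toward the sea",
    "My grandmother's clock stopped the day she left",
    "Deadlines are invented; rust is not",
]

print(mean_pairwise_similarity(repetitive) > mean_pairwise_similarity(varied))
```

A higher score means the repeated answers share more of their vocabulary. Word overlap misses paraphrase, so a score like this is at best a rough lower bound on how similar the ideas really are.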

Second, and this is the one that should really alarm you, inter-model homogeneity. Ask GPT, Claude, Gemini, DeepSeek, Qwen, Llama, and dozens of other models the same creative question, and they all converge on strikingly similar responses.

These are models built by completely different companies with different architectures and different training pipelines.

They should be producing wildly different outputs. They're not. 70+ models all thinking inside the same invisible box, producing the same safe, consensus-approved content that blends together into one indistinguishable voice.

So why is this happening? The researchers point directly at RLHF and current alignment techniques. The process we use to make AI "helpful and harmless" is also making it generic and boring.

When every model gets trained to optimize for human preference scores, and those preference datasets converge on a narrow definition of what "good" looks like, every model learns to produce the same safe, agreeable output. The weird answers get penalized.

The original takes get shaved off. The genuinely creative responses get killed during training because they didn't match what the average annotator rated highly. And it gets even worse.

The study found that reward models and LLM-as-judge systems are actively miscalibrated when evaluating diverse outputs. When a response is genuinely different from the mainstream but still high quality, these automated systems rate it LOWER. The very tools we built to evaluate AI quality are punishing originality and rewarding sameness.

Think about what this means if you use AI for brainstorming, content creation, business strategy, or literally any task where you need multiple perspectives. You're getting the illusion of diversity, not the real thing.

You ask for 10 startup ideas and you get 10 variations of the same 3 ideas the model learned were "safe" during training. You ask for creative writing and you get the same therapeutic, perfectly balanced, utterly forgettable tone that every other model gives.

The researchers flagged direct implications for AI in science, medicine, education, and decision support, all domains where diverse reasoning is not a nice-to-have but a requirement.

Correlated errors across models mean that if one AI gets something wrong, they might ALL get it wrong the same way. Shared blind spots at massive scale.

And the long-term risk is even scarier. If billions of people interact with AI systems that all think identically, and those interactions shape how people write, brainstorm, and make decisions every day, we risk a slow, invisible homogenization of human thought itself. Not because AI replaced creativity.

Because it quietly narrowed what we were exposed to until we all started thinking the same way too.

Here's what you can actually do about it right now:
→ Stop accepting first-draft AI output as creative or diverse. If you need 10 ideas, generate 30 and throw away the obvious ones
→ Use temperature and sampling parameters aggressively to push models out of their comfort zone
→ Cross-reference multiple models AND multiple prompting strategies, because same model with different prompts often beats different models with the same prompt
→ Add constraints that force novelty like "give me ideas that a traditional investor would hate" instead of "give me creative ideas"
→ Use structured prompting techniques like Verbalized Sampling to force the model to explore low-probability outputs instead of defaulting to consensus
→ Layer your own taste and judgment on top of everything AI gives you. The model gets you raw material. Your weirdness and experience make it original
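The first tip above (generate more than you need, then discard the obvious ones) can be sketched in a few lines. This is only an illustration under stated assumptions: "obvious" is approximated by how many other candidates share an idea's words, the function names are hypothetical, and the pool of ideas is invented.

```python
from collections import Counter

def words(text):
    """Distinct lowercase words in a candidate idea."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}

def obviousness(candidate, doc_freq, pool_size):
    """Mean share of the pool that uses each of this candidate's words."""
    ws = words(candidate)
    if not ws:
        return 1.0
    return sum(doc_freq[w] for w in ws) / (len(ws) * pool_size)

def keep_least_obvious(candidates, k):
    """Rank candidates by obviousness and keep the k least obvious ones."""
    doc_freq = Counter(w for c in candidates for w in words(c))
    ranked = sorted(candidates, key=lambda c: obviousness(c, doc_freq, len(candidates)))
    return ranked[:k]

# Hypothetical pool of generated ideas, invented for illustration only.
pool = [
    "AI app for meal planning",
    "AI app for workout planning",
    "AI app for travel planning",
    "Repair cafe for vintage synthesizers",
    "Seed library run by school kids",
]
shortlist = keep_least_obvious(pool, 2)
print(shortlist)
```

Common function words inflate every score equally, so a real filter would likely drop stop words or compare embeddings instead. The final judgment about which ideas are worth keeping still belongs to the reader, as the last tip says.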

This paper puts hard data behind something a lot of us have been feeling for a while. AI is getting more capable and more homogeneous at the same time.

The models are smarter, but they're all smart in the exact same way. The Artificial Hivemind is not a bug in one model. It's a systemic feature of how the entire industry builds, aligns, and evaluates language models right now.

The fix requires rethinking alignment itself, moving toward what the researchers call "pluralistic alignment" where models get rewarded for producing diverse distributions of valid answers instead of collapsing to a single consensus mode.

Until that happens, your best defense is awareness and better prompting.

332

893

491K

Andrzej Krzosek @aneonart

4 months ago

@Przegaa @TCzajka It is neither intelligence nor thinking :) It is a calculator of all the answers that are already available... sometimes in new configurations we have never noticed before.

Andrzej Krzosek @aneonart

4 months ago

@Przegaa @TCzajka A basic mistake :) AI is a broad concept, not one specific algorithm. Systems built on these (AI) algorithms, without the people who tell the model what makes sense, would be nothing more than a very efficient generator of statistical gibberish.

Andrzej Krzosek @aneonart

7 months ago

@andrzejdragan My brother-in-law has a parrot, and it sometimes gets stuck in a loop like that too... Although it can also take offense and sulk now and then ;)

602

Andrzej Krzosek @aneonart

8 months ago

@andrzejdragan @____ALPHA_ @Przemek62 @InfZakladowy I will add that getting locked in a deterministic loop of the same answers... is the negation of thinking... It shows that the ratchets in this complicated mechanism work correctly, but it is a rigid mechanism, not a flexible apparatus for generating thoughts...

Andrzej Krzosek @aneonart

8 months ago

They do not draw at random; they compute the most probable token... At temperature zero, the model will give exactly the same answer for the same input. What this tells us about the model is that a different answer to the same input does not come from thinking, only from how much randomness we add through the temperature parameter... If we assume the model is supposed to perform a computation... then we have to ask whether the result of that computation can be steered by a temperature parameter...
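The point about temperature can be illustrated with a toy next-token picker. This is a minimal sketch, assuming the usual softmax-with-temperature formulation; the scores are hypothetical and the function names are made up for this example, not taken from any specific library.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.5, 0.3, -1.0]
rng = random.Random(0)

greedy = {pick_token(logits, 0, rng) for _ in range(100)}
sampled = {pick_token(logits, 1.5, rng) for _ in range(100)}

print("distinct tokens at temperature 0:", len(greedy))
print("distinct tokens at temperature 1.5:", len(sampled))
```

With temperature 0 the same scores always yield the same token, while a positive temperature spreads the choice across several tokens. Hosted models can still show small variations at temperature zero because of floating-point and batching effects, which this sketch ignores.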

Andrzej Krzosek @aneonart

8 months ago

I think the label "tinfoil-hatter" fits alarmists and doomsday prophets better than people who show common sense...

Below is a list of tinfoil-hatters, produced by an AI model...

Prompt: "List, in order, the biggest proponents of the theory that artificial intelligence will destroy the human species and replace humans... First in Poland, then worldwide... Do it as a list with a comment of at most 3 words..."

https://t.co/vkTwdIr93X

aneonart retweeted

Alex Prompter

@alex_prompter

8 months ago

This paper just exposed the biggest AI research scam 💀

MIT just proved AI can generate novel research papers.

Stanford confirmed it. OpenAI showcased examples. the papers passed peer review at major conferences. scored higher than human-written work on novelty and feasibility.

major AI labs started citing these as evidence that autonomous research agents are here. that LLMs can actually do science now.

except... they didn't prove that at all.

researchers at Indian Institute of Science ran the exact same AI systems - same prompts, same models, same pipeline. generated 50 research documents using Claude and GPT-4o.

but they changed one thing in how they evaluated them.

previous studies asked experts: "rate this on novelty and feasibility." experts looked at shuffled papers - some human, some AI - and judged them blind. no reason to suspect plagiarism. just scoring ideas.

this study asked: "find what this plagiarized from."

they told 13 domain experts to presume plagiarism exists. go hunting for it. find the source papers.

different question. nuclear results.

24% plagiarized. scores of 4 or 5 on a 5-point scale. verified by contacting the original paper authors.

not sloppy copy-paste that any undergrad could spot. sophisticated methodological rewording that fooled everyone... expert reviewers who literally work in these subfields, conference peer reviewers, academic integrity officers.

every automated plagiarism detector failed. Turnitin? 0% detection rate. OpenScholar with its 45 million paper database? 0%. the Semantic Scholar RAG systems these AI agents use internally to "check their own work" for plagiarism before publishing? caught 51% in the easiest possible test scenario where proposals were deliberately plagiarized from single papers.

in real-world generation where the AI is trying to be novel? way worse.

the exemplar papers everyone's been citing as proof AI can do real science?

one had perfect 1-to-1 mapping with "Generating with Confidence: Uncertainty Quantification for Black-box LLMs" published in 2023.

each component of the "novel" methodology corresponded exactly to sections in the original paper. just skillfully reworded.

"resonance graph" instead of "weighted adjacency matrix."

"semantic resonance uncertainty quantification" instead of "uncertainty quantification."

"pairwise evaluations for consistency" instead of "pairwise similarity scores."

five steps. five direct correspondences. same methodology. same scientific contribution. same insight.

zero attribution. zero citations.

the original authors (Lin et al.) confirmed the plagiarism after reviewing both documents.

this paper was showcased as an exemplar of AI-generated research. it passed through expert review in the original study. nobody caught it.

another exemplar combined two papers without credit - one on diffusion model gating mechanisms, another on multi-resolution training. repackaged as "DualDiff." authors of the source papers confirmed: definitively plagiarized.

these aren't edge cases.

human-written papers from major conferences? plagiarism rate around 2-6% based on peer review comments.

AI-generated proposals? 24%.

and this assumes the experts found everything. the authors explicitly say this is likely a lower bound because finding plagiarism is incredibly labor-intensive.

the really disturbing part?

the AI-generated proposals are less diverse than human work. they cluster together in embedding space. you can train a basic classifier with 93% accuracy to detect them just from titles and abstracts.

which means these systems aren't exploring novel research directions. they're pattern-matching within a narrow band of what "sounds like research" and skillfully remixing existing papers.

we built systems that repackage existing ideas so well, we convinced ourselves - and expert reviewers - they were breakthroughs.

805

151

647

74K

aneonart retweeted

Antzedek

@antzedek

8 months ago

Mathematics describes relations within a closed system. Consciousness (or, more broadly, cognitive resonance) operates in an open system, in which the relation changes itself through the very act of description. That is why someone who is "good at math" has the tools, but does not necessarily have the ability to see the relation between the tools and meaning. In short: "Whoever is good at math can count. Whoever is good at resonance understands what counting changes."

Andrzej Krzosek @aneonart

8 months ago

There is a difference between "hearing" and "feeling" :)

You can hear without feeling, and feel without hearing. Even people who have been deaf since childhood can feel sounds.

MP3 is nothing compared to lossless formats, and even with the best sound you are deprived of the physical dimension... of the sound wave...

The brain responds to sounds up to 24 kHz...

Andrzej Krzosek @aneonart

8 months ago

@TCzajka @iamnosurewhy That is a strong generalization... "nobody"

Andrzej Krzosek @aneonart

9 months ago

@TCzajka @zkMarek By "massiveness" I meant the number of neurons activated in parallel, e.g. input signals from processing visual information, emotions, coordination, sensing gravity, analysis, and many others; these signals integrate into a single processing stream

Andrzej Krzosek @aneonart

9 months ago

Massively but linearly; the architecture limits them. By the analog signal I was referring to the weights in the model... They are set during training and stay fixed afterwards... Neurons in the brain modulate their input and output signals continuously, in real time... This lets them react to signals from the environment and adapt only the pathways that the reaction concerns, without retraining the whole network. A single signal is enough to change the network's parameters, and it takes little energy because the remaining neurons are not engaged :)
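The contrast between weights frozen after training and a local adjustment that touches only the active pathway can be shown with a toy example. This is a rough sketch under loud assumptions: the tiny network, the threshold, and the Hebbian-style rule are all illustrative choices, not a model of real neurons and not something the thread specifies.

```python
def forward(weights, inputs):
    """Output j is the weighted sum of all inputs through column j."""
    n_out = len(weights[0])
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs))) for j in range(n_out)]

def local_update(weights, inputs, outputs, rate=0.1, threshold=0.5):
    """Hebbian-style step: strengthen only connections whose input and output are both active.

    Returns the list of (input, output) index pairs that were changed.
    """
    changed = []
    for i, x in enumerate(inputs):
        for j, y in enumerate(outputs):
            if x > threshold and y > threshold:
                weights[i][j] += rate * x * y
                changed.append((i, j))
    return changed

# Hypothetical network with 3 inputs and 2 outputs.
weights = [
    [0.9, 0.1],
    [0.2, 0.1],
    [0.1, 0.8],
]
before = [row[:] for row in weights]  # copy to compare against after the update

signal = [1.0, 0.0, 0.0]              # a single input fires
outputs = forward(weights, signal)
changed = local_update(weights, signal, outputs)

print("connections changed:", changed)
```

Only the one connection on the active pathway is modified; every other weight is left exactly as it was. A model with frozen weights would simply skip the `local_update` call at inference time.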

Andrzej Krzosek @aneonart

9 months ago

On top of that comes the sheer mass of simultaneous signals... For example, when activating erogenous zones ;), depending on how many of them you engage and how you adjust the intensity (modulation matters here), you either do or do not activate the feeling of pleasure in the brain, and this is where consciousness comes in... You have to know not only how, but also react consciously ;)

Andrzej Krzosek @aneonart

9 months ago

@zkMarek @TCzajka This allows a far greater variety of interpretations, even with the same number of parameters (synapse = parameter), than simply computing with weights that each hold a single fixed value
