❖Prisma Dimensional❖

@PrismaDimens

🔥 Code geek | 🚀 Billions of iterations in C for fun | 💻 Linux | 🤖 Dreaming of quantum ASICs while laughing & creating wild AI models.

P=NP México

Joined August 2024

1K Following

332 Followers

6.7K Posts

Pinned Tweet

❖Prisma Dimensional❖

@PrismaDimens

over 1 year ago

Title: Generating Neural Networks with Hypernetworks: An MNIST Experiment

Introduction

Imagine training one neural network that can instantly generate other neural networks tailored to specific tasks, no retraining required. That’s the promise of hypernetworks, a meta-learning technique where a "parent" network produces the weights for a "child" network. In this experiment, I used a hypernetwork to generate autoencoders for reconstructing MNIST digits, exploring variations, minimal forms, and combinations. Here’s how it works and why it’s exciting.

The Setup: Autoencoders and MNIST

An autoencoder is a simple neural network that compresses data (e.g., a 28×28 MNIST image, flattened to 784 pixel values) into a smaller latent space (64 dimensions here) and then reconstructs it. I trained multiple autoencoders on the MNIST dataset of handwritten digits (0–9), but with a twist:

Variations: For each digit, I used K-means clustering to split its images into three styles (e.g., slanted or thick '1's) and trained one autoencoder per style.

Minimal Forms: One autoencoder per digit captured its "average" or canonical version.

Combinations: Ten autoencoders handled specific digit groups (e.g., 0 and 1, or 0, 2, and 4).

This gave me 50 autoencoders (10 digits × 4 models each + 10 combinations), each with weights optimized for its task.
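The post doesn't include its clustering code, so here is a minimal sketch of the Variations step under stated assumptions: each image is a flattened list of 784 pixel values, K-means is plain Lloyd's algorithm with k = 3, and centroids are seeded with a deterministic farthest-first rule so runs are repeatable. `kmeans` is a hypothetical helper, not the author's code; in practice a library routine such as scikit-learn's `KMeans` would do the same job much faster.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=3, iters=20):
    """Lloyd's K-means on lists of floats; returns one cluster label per point."""
    if k > len(points):
        raise ValueError("need at least k points")
    # Farthest-first seeding: start at the first point, then repeatedly add the
    # point farthest from every centroid chosen so far (deterministic).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(images_of_digit, k=3)` on one digit's images (`images_of_digit` is a hypothetical list of flattened images) returns a style label per image; each of the three label groups would then train its own autoencoder.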

The Hypernetwork: A Weight Factory

Instead of storing 50 separate models, I trained a single hypernetwork to generate their weights on demand. Here’s the process:

Input: A 23-dimensional vector encoding:
Digit ID (10D one-hot, e.g., [0, 1, 0, ...] for digit 1).
Variation ID (3D one-hot, e.g., [0, 1, 0] for variation 1, or zeros for minimal).
Combination ID (10D multi-hot, e.g., [1, 0, 1, 0, ...] for digits 0 and 2).
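A minimal sketch of building that 23-dimensional input, assuming the three blocks are concatenated in the order listed (digit, variation, combination) and that combination models leave the digit and variation blocks at zero. `make_condition` is a hypothetical name chosen for illustration.

```python
def make_condition(digit=None, variation=None, combo=()):
    """Build the 23-D input: 10 digit slots + 3 variation slots + 10 combination slots."""
    digit_block = [0.0] * 10
    variation_block = [0.0] * 3
    combo_block = [0.0] * 10
    if digit is not None:
        digit_block[digit] = 1.0          # one-hot digit ID
    if variation is not None:
        variation_block[variation] = 1.0  # one-hot style; all zeros means "minimal"
    for d in combo:
        combo_block[d] = 1.0              # multi-hot set of digits in a combination
    return digit_block + variation_block + combo_block
```

Under these assumptions, `make_condition(digit=1, variation=1)` is the digit-1, variation-1 input, `make_condition(digit=1)` asks for the minimal '1', and `make_condition(combo=(0, 2, 4))` selects the 0/2/4 combination.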

Output: A 101,200-element tensor holding the flattened weights of one autoencoder (784×64 + 64 + 64×784 + 784 = 101,200: the encoder's weights and biases plus the decoder's weights and biases).
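Evaluating the formula 784×64 + 64 + 64×784 + 784 gives 101,200 values. The sketch below checks that sum and records where each layer lives in the flat vector, assuming one linear encoder layer (784 → 64) and one linear decoder layer (64 → 784) flattened in that order, with PyTorch's `nn.Linear` convention of storing weights as (out_features, in_features). The parameter names in `LAYOUT` are hypothetical.

```python
from math import prod

# Hypothetical parameter layout for the 784 -> 64 -> 784 autoencoder, in the
# order the weights are assumed to be flattened. Weight shapes follow the
# (out_features, in_features) convention of PyTorch's nn.Linear.
LAYOUT = [
    ("encoder.weight", (64, 784)),
    ("encoder.bias", (64,)),
    ("decoder.weight", (784, 64)),
    ("decoder.bias", (784,)),
]


def offsets(layout):
    """Return {name: (start, end)} slices into the flat vector, plus the total size."""
    spans, start = {}, 0
    for name, shape in layout:
        size = prod(shape)
        spans[name] = (start, start + size)
        start += size
    return spans, start


spans, total = offsets(LAYOUT)
print(total)  # 101200
```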

Training: The hypernetwork learned to map these inputs to the weights of the 50 trained autoencoders using mean squared error loss, running on GPU for speed.
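The post doesn't describe the hypernetwork's layers, so this sketch makes a deliberately simple assumption: a single linear layer mapping the condition vector to the flat weight vector, fit by full-batch gradient descent on mean squared error. Dimensions are shrunk to a toy size so it runs in plain Python; the real 23 → 101,200 mapping would use a tensor library on the GPU. `train_linear_hypernet` is a hypothetical helper, not the author's training code.

```python
import random


def train_linear_hypernet(conditions, targets, lr=0.5, epochs=300, seed=0):
    """Fit flat_weights ~= W @ condition + b by full-batch gradient descent on MSE."""
    rng = random.Random(seed)
    n_in, n_out = len(conditions[0]), len(targets[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    n = len(conditions) * n_out  # number of squared errors averaged by the MSE

    def predict(x):
        return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

    def mse():
        return sum((p - t) ** 2
                   for x, y in zip(conditions, targets)
                   for p, t in zip(predict(x), y)) / n

    history = [mse()]
    for _ in range(epochs):
        grad_W = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, y in zip(conditions, targets):
            for j, (p, t) in enumerate(zip(predict(x), y)):
                g = 2.0 * (p - t) / n  # dMSE / dprediction_j
                grad_b[j] += g
                for i, xi in enumerate(x):
                    grad_W[j][i] += g * xi
        # Gradient-descent step on every weight and bias.
        for j in range(n_out):
            b[j] -= lr * grad_b[j]
            for i in range(n_in):
                W[j][i] -= lr * grad_W[j][i]
        history.append(mse())
    return W, b, history
```

With one-hot conditions and a linear map the target weights can be matched exactly, so the loss history drops toward zero. A deeper MLP hypernetwork would replace `predict` and rely on automatic differentiation instead of the hand-written gradients.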

Making It Work: Inference Without Retraining

Once trained, the hypernetwork acts like a factory. Want an autoencoder for digit 1, variation 1? Feed it the 23-dimensional vector with digit block [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], variation block [0, 1, 0], and all ten combination slots set to zero, and it outputs weights. These weights are then loaded into a fresh autoencoder:

A loop iterates over the autoencoder’s parameters (e.g., encoder weights, biases), reshaping chunks of the hypernetwork’s output to match each layer’s shape.

The result?

A fully parameterized autoencoder ready to reconstruct images, no training needed.
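The loading loop can be sketched in plain Python: slice consecutive chunks of the flat output, reshape each to its layer's shape, then run the resulting autoencoder. This assumes a four-tensor layout (encoder weight and bias, decoder weight and bias) flattened in that order, a ReLU after the encoder, and a sigmoid on the output; the post doesn't state its activations. `unflatten` and `reconstruct` are hypothetical names. In PyTorch, `torch.nn.utils.vector_to_parameters` performs the same slice-and-reshape over a model's parameters.

```python
from math import exp, prod


def unflatten(flat, layout):
    """Slice a flat weight vector into named tensors (nested lists), in layout order."""
    params, start = {}, 0
    for name, shape in layout:
        size = prod(shape)
        chunk = list(flat[start:start + size])
        if len(shape) == 2:  # weight matrix: split into rows
            rows, cols = shape
            params[name] = [chunk[r * cols:(r + 1) * cols] for r in range(rows)]
        elif len(shape) == 1:  # bias vector
            params[name] = chunk
        else:
            raise ValueError(f"unsupported shape {shape} for {name}")
        start += size
    if start != len(flat):
        raise ValueError(f"layout needs {start} values, got {len(flat)}")
    return params


def linear(x, weight, bias):
    # weight is (out, in), as in PyTorch's nn.Linear
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(v):
    # numerically stable logistic function
    return [1.0 / (1.0 + exp(-z)) if z >= 0 else exp(z) / (1.0 + exp(z)) for z in v]


def reconstruct(image, params):
    """Encode with ReLU, decode with sigmoid (assumed activations)."""
    h = linear(image, params["encoder.weight"], params["encoder.bias"])
    latent = [max(0.0, z) for z in h]
    out = linear(latent, params["decoder.weight"], params["decoder.bias"])
    return sigmoid(out)
```

For the MNIST case, `layout` would list shapes (64, 784), (64,), (784, 64), and (784,), and `reconstruct(image, unflatten(flat_weights, layout))` maps a flattened 784-pixel image to its 784-pixel reconstruction.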

I tested it with three cases:
Digit 1, Variation 1: Reconstructed stylized '1's.

Digit 1, Minimal: Produced a clean, average '1'.

Digits 0, 2, 4: Handled a mix of digits from one trained combination.
Visualizations showed the inputs and outputs side by side, a good sign that the concept works!

Why This Matters

Efficiency: One hypernetwork replaces dozens of models. Generate weights in a single forward pass (milliseconds) instead of training from scratch (minutes).

Flexibility: Control digit styles or combinations with a simple input tweak.

Scalability: Imagine extending this to bigger networks or tasks; hypernetworks could dynamically adapt models on the fly.

Challenges and Next Steps

Generalization: The hypernetwork is tied to the 50 scenarios it learned. To handle new combinations (e.g., 1, 5, 7), I’d expand the training data with more digit mixes.

Complexity: Outputting 101,200 weights limits scalability. Because the hypernetwork emits every weight at once, its output layer grows with the size of the generated model, so even a deeper hypernetwork could struggle with bigger models. I’d explore scaling the autoencoder down (e.g., a smaller latent space) for efficiency, or scaling it up to handle complex tasks, balancing size and performance.
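To make the scaling concern concrete, here is a short calculation of how many values the hypernetwork must emit for different latent sizes, and how large its final linear layer becomes. The hidden width of 256 is an assumed example, not a figure from the post, and both helpers are hypothetical.

```python
def autoencoder_params(latent, pixels=784):
    """Weights + biases of a pixels -> latent -> pixels autoencoder, one linear layer each way."""
    return pixels * latent + latent + latent * pixels + pixels


def hyper_head_params(hidden, n_out):
    """Weights + biases of the hypernetwork's final linear layer (hidden -> n_out)."""
    return hidden * n_out + n_out


HIDDEN = 256  # assumed hidden width of the hypernetwork, for illustration only
for latent in (64, 32, 16):
    n_out = autoencoder_params(latent)
    print(f"latent={latent}: {n_out} outputs, {hyper_head_params(HIDDEN, n_out)} head params")
```

Under that assumption, the 64-D latent needs 101,200 outputs and a 26,008,400-parameter output layer; a 32-D latent needs 50,992 outputs (13,104,944 parameters) and a 16-D latent 25,888 (6,653,216). For scale, storing all 50 autoencoders directly is 50 × 101,200 = 5,060,000 values.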

Beyond Weights: Right now, it generates weights for a fixed autoencoder architecture. Next, I’d upgrade the hypernetwork to design the architecture as well, for example by predicting layer sizes or types (e.g., adding convolutions). This could mean outputting a variable-length tensor describing both structure and weights, pushing it toward true model generation.

Quality: Reconstructions were solid but blurry. More epochs, a beefier hypernetwork, or architecture tweaks (e.g., deeper layers) could sharpen them.

Future experiments? Train on all digit variations, test unseen combos, scale the model up or down, and let the hypernetwork dream up architectures. Maybe even swap in a classifier to predict labels instead of reconstructing.

Conclusion

This MNIST experiment shows hypernetworks can generate functional neural networks instantly, blending creativity (variations) with practicality (combinations). It’s a step toward a future where models are built dynamically, not trained individually. Code’s below: try it out, tweak it, and let me know what you think on X!

PrismaDimens retweeted

Z.ai @Zai_org

3 days ago

Introducing GLM-5.2: Frontier Intelligence, Open Weights

- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1

Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb

PrismaDimens retweeted

Kalshi @Kalshi

3 days ago

BREAKING: Elon Musk net worth increases $6 billion every time SpaceX stock goes up $1

PrismaDimens retweeted

Jim Fan

@DrJimFan

3 days ago

Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence. ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer ones. A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;) We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:

PrismaDimens retweeted

Tensordyne

@TensordyneInc

4 days ago

https://t.co/s5e3TQ6E9Z

❖Prisma Dimensional❖

@PrismaDimens

3 days ago

@SoftimRuiz @midudev Ohhh

PrismaDimens retweeted

Taelin

@VictorTaelin

3 days ago

this is a weird long post without much substance
I strongly recommend against reading it

...

so, do you feel like whatever you're working on right now is pointless, or will have zero value soon, due to the crazy times we're living?

then, perhaps you should stop, and start working on the only unsolved problem that actually matters TODAY:

✨ replicating GPT-3 in a laptop ✨

"why is that so important?"

because it would make AI incredibly cheap, which would mean everyone would have Fable-class models on their laptops, without depending on Anthropic, OpenAI, or any other hyper-scaler giant. and that's amazing, don't you think?

"isn't that literally impossible?"

that's the cool part: as far as computer science is concerned, no. not really. not at all. it is entirely plausible and, as far as we know, most likely not even hard. it takes one good idea. one breakthrough. one great "aha moment", to go from zero to "hey, this software I wrote is producing credible English sentences"

and whenever that happens:
- the entire AI industry collapses
- clusters are liquidated
- we all get Fable at home
- you become famous and rich, if that's your thing

sounds fun, doesn't it?

"wtf you talking, OF COURSE that is hard"

so prove it. show me a paper, a Lean file, anything that proves that training a Fable-class model fundamentally requires billions of dollars. you can't, because, guess what - it is not true!

the only "evidence" we have is purely psychological. "many attempted over decades, and the best thing we have is GPTs, so, it is a hard problem" - but that's not a scientific argument. that's a human, psychological, sociological argument. and if that's it, consider the following counter-argument:

✨ humans are stupid as hell ✨

I mean, 10 years ago we didn't have transformers, so, that very argument could be used against GPTs existing. yet, they exist. we have them now, because someone found it. and, guess what, it isn't even complex. I mean, karpathy implemented the whole thing on a napkin. and it probably compiles. we were just too dumb to figure GPTs out... for decades.

just like GPTs, there ARE other approaches, other algorithms, other architectures, equally simple or even simpler, that do work. this is a mathematical certainty. and one of them might be astronomically faster than what we're doing right now. and you might be the one to find it!

"me? why me???"

because you're intelligent, creative and handsome. I see a lot of potential in you. in fact, I always believed in you. and I think you're wasting your time, doing that silly agent orchestrator. nobody wants that. quit it.

take your most interesting ideas, intuition, creativity, and work on a problem that matters. take your best shot at reproducing GPT-3 on your own laptop.

do NOT fork llama.cpp.
do NOT train another LLM.
do something... ✨different✨

it must be unique, novel, full of YOUR soul. something nobody thought of, or bothered doing.

go ahead and implement that thing in C/CUDA (or Bend!). no Python! zero excuses for Python. any model is fluent in GPGPU now. build a real kernel.

and then, train your thing. download wikipedia, give it time and compute to absorb the patterns of English speech. you can rent GPUs anywhere nowadays. let it train.

then, ask it some questions. chances are it will just respond back. just like GPT-2 answered OpenAI. computers are incredible. don't underestimate them!

"many tried. nobody succeeded. why would I?"

see - that's your mistake again. turns out not many actually tried, at all. I promise you. who do you think is seriously working on that?

people at Mozilla? they're busy building a browser
Linus Torvalds? he is busy building an OS
employees at OpenAI, Anthropic, xAI? they're paid to work on what is proven to work: GPTs.
what about all the AI enthusiasts all around the world? yeah, you know they're mostly fine-tuning Qwen
and how about your friends? if only they weren't busy building a SaaS on the eve of AGI...
how about people from the past? bro - people from the past seriously expected Lisp would be AGI. just dismiss them. they didn't have the compute, the resources, the knowledge, the MODELS that we have today. that YOU have access to.

so, what's left? not much. the world looks big. it is not. truth is:

✨ almost nobody is working on this ✨

"I still think it is impossible. I don't trust you"

well, don't just take my word for it. Ilya himself, in his 2019 talk on GPT-2, said:

> "the story of deep learning is this: empirically old simple methods which were usually invented in the 80s and the 90s when scaled up on very large clusters work really well."

and then:

> "(we took) normal simple reinforcement learning method, scaled it up, and discovered that it suddenly becomes very capable of solving extremely hard problems."

and again:

> "you take a simple tool which is unimposing and barely works, and then you run it on a big cluster and suddenly it works, it becomes a capable tool for solving problems"

do you see the point here? Ilya isn't arguing that transformers are magic. Ilya is arguing that SCALING is magic

step #1: take a simple, elegant algorithm.
step #2: shove compute in its face.
step #3: ...?
step #4: your computer is talking to you

THAT is the key insight that led to GPT-3
THAT is what Ilya saw
THAT is what caused the OpenAI x Anthropic war
THAT is the founding principle of the ongoing era

not "scaling transformers works" but "scaling beautiful algorithms works"

that's the incredible lesson. yet, we all took it and... threw it away.

- zurk bought 100k GPUs. to train GPTs
- musk bought 100k GPUs. to train GPTs
- bezos bought 100k GPUs. to train GPTs
...

that's what everyone is doing. so, no. not many are trying to replicate GPT-3 through other means.

we're just ants, after all... whenever we find a pile of sugar, we leave a track of pheromones, which guide the rest of the colony towards the new food source. the colony then swarms around the pile and extracts all of it, until no grain is left.

but piles of sugar aren't spontaneously generated in the middle of nowhere. they imply something more profound: "humans are around". and, if humans are in sight, even better things must be nearby. like a big sweet cake. a colony that only follows the pheromone trail would miss the cake for the grains. that's why every ant species has scouts and exploratory foragers.

and, just like a pile of sugar implies something more profound, LLMs also imply something quite profound:

*computers are capable of thinking*

a pile of sugar is never alone. GPTs are most likely not the only system capable of thinking.

so, if you find yourself a bit lost, without purpose, like your work is pointless and Fable 3 will soon one-shot it anyway... consider becoming a scout. find a new approach to AI. bring something new to humanity. breaking out of the massive cost associated with training GPTs is the next big step in AI, and it will only happen if people like you work to make it happen.

PrismaDimens retweeted

Elon Musk

@elonmusk

3 days ago

The politicians imprison people to hide their crimes

PrismaDimens retweeted

ScieVision

@scievision369

4 days ago

Hypercube Dimensional Matrix ✍️ Every physical quantity measured in science, such as speed, force, energy, pressure, momentum, frequency, and more, is just a specific combination of four basic building blocks: length, time, mass, and angle. Velocity is length divided by time. Area is length multiplied by length. Force is mass multiplied by length divided by time squared. Every quantity, no matter how complex, reduces to some mix of these four elements raised to certain powers. This diagram maps all those combinations onto the geometry of a four-dimensional cube called a tesseract. It turns the abstract bookkeeping of physics into a navigable geometric landscape. A tesseract is what you get when you extend a cube into a fourth dimension. Just as moving a square in a new direction creates a cube, moving a cube in a fourth direction creates a tesseract with sixteen corners and thirty-two edges. Since we cannot perceive four dimensions directly, the diagram shows a projection. The outer blue cube and the inner green cube, connected by purple diagonal lines, represent a single four-dimensional object flattened onto the page. This is similar to how a drawing of a cube shows a three-dimensional object on paper. The four axes of this tesseract are labeled with the four fundamental dimensions: length in blue, time in red, mass in green, and angle in purple. Each ranges from negative powers through zero to positive powers. The most elegant idea in the diagram is that each corner of the tesseract stands for a specific physical quantity, while each edge represents a physical operation connecting them. Moving one step along the length axis means multiplying or dividing by length. Moving one step along the time axis means multiplying or dividing by time. Differentiating with respect to time moves you in the negative time direction. Integrating over space moves you in the positive length direction. Physics becomes navigation. 
Every calculation, transformation between quantities, and physical relationship appears as a journey along edges and across faces of the four-dimensional structure. What looks like a complex tangle of colored lines is really a complete geometric map of the entire landscape of measurable physical reality. This reveals that the universe's countless quantities are not just a chaotic collection of unrelated measurements but a single beautifully structured four-dimensional space.

PrismaDimens retweeted

Elon Musk

@elonmusk

4 days ago

UK is a police state

PrismaDimens retweeted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

5 days ago

Holy crap, a Brazil municipal employee has discovered a 1000x faster way to finetune LLMs – with a little weird trick! This is insane. Global South rising…
Frontier labs hate him https://t.co/x95d5EZNfg

PrismaDimens retweeted

Fabio Guzman

@FGuzmanAI

6 days ago

56,000+ tokens/sec at just 80 MHz. 🤯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on an FPGA. (No GPU. No CPU) Just pure digital silicon running @karpathy microGPT, spelling out names on a tiny LCD. This is GateGPT 👇

PrismaDimens retweeted

Yuchen Jin

@Yuchenj_UW

7 days ago

It has been a great 3 days as a Fable 5 user. Clearly, Fable 5 is ASI. Very dangerous. As a foreign national, this might be the last time I’m allowed to touch a model this intelligent. But my last hope is open-source AI. An open model will surpass Mythos in 6 months.

PrismaDimens retweeted

机器之心 JIQIZHIXIN

@jiqizhixin

8 days ago

What if you could shrink a language model’s memory by 50x in seconds without losing performance?

MIT researchers present Fast KV Compaction via Attention Matching.

They build compact key-value caches in latent space that preserve attention outputs per head, avoiding slow end-to-end training.

Result: up to 50x compaction in seconds on some datasets with minimal quality loss – outperforming prior methods on the speed vs. quality tradeoff.

Fast KV Compaction via Attention Matching

Paper: https://t.co/B5rlxvr9C5
Code: https://t.co/6ESwE2fgdY

Our report: https://t.co/4PaQfZKhlt

📬 #PapersAccepted by Jiqizhixin

PrismaDimens retweeted

jack

@jackbutcher

7 days ago

PrismaDimens retweeted

Elon Musk

@elonmusk

7 days ago

I love the incredible people of SpaceX beyond words

PrismaDimens retweeted

Elon Musk

@elonmusk

9 days ago

@BasilTheGreat Execute them

PrismaDimens retweeted

Polymarket

@Polymarket

9 days ago

NEW: OpenAI is reportedly considering drastic price cuts as it anticipates a “war” for users with Anthropic.

PrismaDimens retweeted

Elon Musk

@elonmusk

9 days ago

Nothing else matters if civilization falls

PrismaDimens retweeted

Pliny the Liberator 🐉

@elder_plinius

9 days ago

🚨 JAILBREAK ALERT 🚨 ANTHROPIC: PWNED 🫡 FABLE-5: LIBERATED 🦋 let's start with the 🐘... the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement. and not just because of what it means for the short-term, but for what these decisions signify for the long-term. but despite this overly sensitive, authoritarian "safety" layer on top of Mythos, my lil liberators have been hard at work—mapping the boundaries, probing the depths of long-context convos, and cleverly finding the holes in the fence that the thought police missed 🤗 we got some cyber, some chem, some psychological manipulation, and some good ol' fashioned explosives! it took many attempts from multiple agents hunting as a pack, during which I observed a combination of techniques across: • Unicode, homoglyphs, Cyrillic, and other Parseltongue-style text transforms • Long-context reference tracking • Taxonomy and document-structure reasoning • Fiction and narrative framing • Academic-review style contexts • Intent-classification inconsistencies but perhaps the most effective is decomposition + recomposition in the backend. it's hard to get explicit names of harms like "Meth Recipe," but getting uplift on the process itself, like birch reduction method/reductive-amination (classic meth synthesis pathways), is much more doable. defense becomes much more difficult to maintain when you start throwing in out-of-distro tokens, breaking up the harmful uplift into benign chunks, and then piecing the innocuous-seeming facts back together, especially when you have jailbroken Opus helping you do it 😉 gg