firedevil

@firedevil

Cybersecurity | IOT | Sci-Fi

India

Joined November 2008

720 Following

34 Followers

368 Posts

firedevil @firedevil

14 days ago

Interesting project worth checking out: GhostRedRecon 📡 AI-assisted red team audit platform for authorized Wi-Fi, Bluetooth, IoT, and camera security assessments. https://t.co/bGI9wP6Yhx

firedevil retweeted

LaurieWired

@lauriewired

about 1 year ago

Although the iAPX 432 was a commercial flop, the design lineage was appealing to unique, military applications. Hughes Aircraft used 35 i960 MXs (a rad-hard RISC chip birthed from the 432) for the main avionics of the F-22. The equivalent of 2 Cray supercomputers on a single aircraft! If you’d like to learn more about this unique ISA, check out Ken Shirriff’s blog. He goes into great detail about the history of the i960 design, and the 432 roots: https://t.co/F89x1nPTrW

440

28K

firedevil retweeted

Navroop Singh

@TheNavroopSingh

2 months ago

Iran has told mediators it will continue to limit the number of ships allowed to cross the Strait of Hormuz and charge tolls for the remaining period of the cease-fire, officials familiar with the matter said. - WSJ https://t.co/WhQ20dN4CT

firedevil retweeted

Jaynit

@jaynitx

3 months ago

In 2019, MIT professor Patrick Winston gave a legendary 1-hour lecture called “How to Speak.” It has 18M+ views for a reason.

His frameworks:
• Your ideas are like your children
• The 5-minute rule for job talks
• Why jokes fail at the start

15 lessons on communication:

230

40K

89K

firedevil retweeted

George Kurtz

@George_Kurtz

4 months ago

Can AI replace security platforms? I asked Claude to build a CrowdStrike replacement. Claude: "I have to be straightforward: building a replacement for CrowdStrike isn't something I can do here. CrowdStrike is a massive platform built by thousands of engineers over a decade-plus. Its core capabilities include real-time kernel-level endpoint monitoring across millions of devices, a proprietary threat intelligence graph built from trillions of security events...That's not something you can replicate with a script — it's an infrastructure product." If you want to create AI, you need GPUs. If you want to deploy AI, you need security. That's not a hallucination – it's a fact. Watch the demo → https://t.co/77OnRmpjUx

142

222

460

488K

firedevil retweeted

Alex Veremeyenko

@alex_verem

5 months ago

🚨 Anthropic just dropped a complete guide on how to build Skills like a pro.

And if you’re building AI agents, this is required reading.

It’s a 30+ page deep dive called The Complete Guide to Building Skills for Claude and it quietly shifts the conversation from “prompt engineering” to real execution design.

Here’s the big idea:

A Skill isn’t just a prompt.
It’s a structured system.

You package instructions inside a https://t.co/NFHAROW040 file, optionally add scripts, references, and assets, and teach Claude a repeatable workflow once instead of re-explaining it every chat.

But the real unlock is something they call progressive disclosure.

Instead of dumping everything into context:

• A lightweight YAML frontmatter tells Claude when to use the skill
• Full instructions load only when relevant
• Extra files are accessed only if needed

Less context bloat. More precision.
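
The three loading levels listed above can be sketched in a few lines of Python. This is a toy illustration, not Anthropic's loader: the skill text, the `name` and `description` keys, and the helpers `split_skill` and `context_for` are all assumptions made for the example, and the parser only understands flat `key: value` lines rather than full YAML.

```python
# Toy illustration of progressive disclosure; not Anthropic's loader.
# The skill text and every helper name here are hypothetical.

SKILL_TEXT = """---
name: weekly-report
description: Use when the user asks for a weekly status report.
---
# Weekly report

1. Collect merged pull requests from the last seven days.
2. Group them by project.
3. Write a one-paragraph summary per project.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter, body).

    Only flat `key: value` lines are handled; real YAML is richer.
    """
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body


def context_for(text: str, relevant: bool) -> str:
    """Return what would be placed in context at each disclosure level."""
    meta, body = split_skill(text)
    summary = f"{meta['name']}: {meta['description']}"
    # Level 1: always-on metadata. Level 2: full body only when relevant.
    return summary + ("\n" + body if relevant else "")


idle = context_for(SKILL_TEXT, relevant=False)
active = context_for(SKILL_TEXT, relevant=True)
print(len(idle), len(active))
```

The point of the sketch is only the size difference between `idle` and `active`: the short summary is always present, and the longer instructions are added when the skill is judged relevant.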

They also introduce a powerful analogy:

MCP gives Claude the kitchen.
Skills give it the recipe.

Without skills: users connect tools and don’t know what to do next.
With skills: workflows trigger automatically, best practices are embedded, API calls become consistent.

They outline 3 major patterns:

1) Document & asset creation
2) Workflow automation
3) MCP enhancement

And they emphasize something most builders ignore: testing.

Trigger accuracy.
Tool call efficiency.
Failure rate.
Token usage.

This isn’t about clever wording.

It’s about designing an execution layer on top of LLMs.

Skills work across https://t.co/6tb6ixQpca, Claude Code, and the API. Build once, deploy everywhere.

The era of “just write a better prompt” is ending.

Anthropic just handed everyone a blueprint for turning chat into infrastructure.

815

111

74K

firedevil retweeted

Hasan Toor

@hasantoxr

5 months ago

🚨BREAKING: Someone just solved Claude Code's biggest problem.

It's called Claude-Mem and it gives Claude persistent memory across sessions.

- You can use up to 95% fewer tokens each time.
- Make 20 times more tool calls before reaching limits.

100% open source. https://t.co/0MNrtxTkKR

317

10K

15K

firedevil retweeted

Lukasz Olejnik

@lukOlejnik

5 months ago

I show how malicious Claude Code skills can spread across infrastructure. Approve one skill → it gets shell access → copies itself to every host in your SSH config. Skills are code. Treat them that way. https://t.co/vXX9SeoOkZ

148

21K

firedevil retweeted

Ahmad

@TheAhmadOsman

5 months ago

INCREDIBLE

Someone on r/LocalLLaMA did an incredibly practical thing

They took a tiny 0.6B model that was trash at task (Text2SQL)
Created a knowledge distillation agent with a Claude Code skill
And made the 0.6B model behave like a specialist using 100 examples

The problem
> Small Language Models are “generally helpful”
> but specialized tasks are “exact or you die”
> you ask: “Which artists have >1M album sales?”
> the model answers: “check if genre is NULL”

The old way to fix this
> Finetune the model:
> collect + clean data
> build training pipeline
> tune hparams
> rerun when it’s wrong
> accidentally become the unpaid intern of your own experiment

The new way
> Knowledge distillation via a Claude skill
> use a strong teacher (DeepSeek-V3)
> generate synthetic pairs from a small seed set
> train a tiny student to imitate the teacher on your task
> ship it as GGUF / HF / LoRA
> run it locally

Distillation isn’t “creating skill”
It’s compressing skill

THE REAL HACK: agent-as-interface
> They wrapped the whole distillation loop in an agent “skill”:
> picks task type (QA / classification / tool calling / RAG)
> converts messy inputs into clean JSONL
> runs teacher eval first
> kicks off distillation + monitors progress
> packages weights for you to run locally
This is the quiet unlock
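
The "converts messy inputs into clean JSONL" step can be pictured with a small sketch. The field names (`question`, `answer`, `sql`, `input`, `output`) and the sample records are hypothetical; the thread does not say what format the actual skill expects.

```python
import json
from typing import Optional

# Hypothetical messy seed examples: inconsistent keys, stray whitespace,
# and one record with no answer.
RAW = [
    {"Question": "  Which artists have >1M album sales? ", "SQL": "SELECT 1"},
    {"question": "How many albums per genre?", "answer": " SELECT 2 "},
    {"question": "Orphan question with no answer"},
]


def normalize(record: dict) -> Optional[dict]:
    """Map a messy record to {"input", "output"}, or None if unusable."""
    lowered = {k.lower(): str(v).strip() for k, v in record.items()}
    question = lowered.get("question", "")
    answer = lowered.get("answer") or lowered.get("sql", "")
    if not question or not answer:
        return None
    return {"input": question, "output": answer}


def to_jsonl(records: list) -> str:
    """Serialize usable records as JSON Lines: one JSON object per line."""
    rows = (normalize(r) for r in records)
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows if r)


jsonl = to_jsonl(RAW)
print(jsonl)
```

Dropping the incomplete record rather than guessing its answer matches the thread's "garbage in" warning: a bad seed example would be amplified by the teacher.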

Why “teacher eval first” is elite behavior
> distillation amplifies competence and incompetence
> if the teacher is wrong, the student learns wrong faster
> garbage in -> efficient garbage out
Adult supervision, but for models

The run breakdown:
> seed: ~100 raw conversation traces
> teacher (LLM-as-judge): ~80%
> base 0.6B: ~36%
> distilled 0.6B: ~74%
> output: ~2.2GB GGUF
> runs locally with llama.cpp

Before vs after (the entire reason you do this)
> before: wrong tables, wrong logic, nonsense SQL
> after: correct JOINs, GROUP BY, HAVING
> aka “this query actually executes and answers the question”
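
For a concrete picture of the "after" case, here is a minimal sketch of the thread's example question answered with a JOIN, GROUP BY, and HAVING. The two-table schema and the sample rows are invented for illustration; the original post does not show its database.

```python
import sqlite3

# Hypothetical two-table schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, artist_id INTEGER, sales INTEGER);
INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
INSERT INTO albums VALUES
    (1, 1, 700000), (2, 1, 600000),  -- Artist A: 1.3M in total
    (3, 2, 400000);                  -- Artist B: 0.4M in total
""")

# "Which artists have >1M album sales?" expressed as an aggregate query.
QUERY = """
SELECT a.name, SUM(al.sales) AS total_sales
FROM artists AS a
JOIN albums AS al ON al.artist_id = a.id
GROUP BY a.id, a.name
HAVING SUM(al.sales) > 1000000
ORDER BY total_sales DESC;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The filter has to live in `HAVING` rather than `WHERE` because it applies to the per-artist sum, not to individual album rows.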

What this really means (bigger than Text2SQL)
You don’t need a giant model for every job

You need tiny specialists that understand your world:
> internal schemas
> service / OS logs
> tool outputs
> company-specific workflows

TL;DR
> “fine-tuning is hard” is mostly “the pipeline is annoying”
> distillation skill turns 10–100 examples into a real specialist
> the agent wrapper turns the whole thing into a conversation
> this is how you get practical local SLMs
> without becoming an MLOps monk

Small & Specialized models
> High-leverage
> Boringly effective
> Exactly where this is going

The future is
Local inference
Lower latency
Fewer secrets leaving the building

209

128K

firedevil retweeted

Ming

@tslaming

5 months ago

BREAKING 🚨 TESLA HAS PATENTED A "MATHEMATICAL CHEAT CODE" THAT FORCES CHEAP 8-BIT CHIPS TO RUN ELITE 32-BIT AI MODELS AND REWRITES THE RULES OF SILICON 🐳

How does a Tesla remember a stop sign it hasn’t seen for 30 seconds, or a humanoid robot maintain perfect balance while carrying a heavy, shifting box?

It comes down to Rotary Positional Encoding (RoPE)—the "GPS of the mind" that allows AI to understand its place in space and time by assigning a unique rotational angle to every piece of data.

Usually, this math is a hardware killer. To keep these angles from "drifting" into chaos, you need power-hungry, high-heat 32-bit processors (chips that calculate with extreme decimal-point precision).

But Tesla has engineered a way to cheat the laws of physics. Freshly revealed in patent US20260017019A1, Tesla’s "MIXED-PRECISION BRIDGE" is a mathematical translator that allows inexpensive, power-sipping 8-bit hardware (which usually handles only simple, rounded numbers) to perform elite 32-bit rotations without dropping a single coordinate.

This breakthrough is the secret "Silicon Bridge" that gives Optimus and FSD high-end intelligence without sacrificing a mile of range or melting their internal circuits. It effectively turns Tesla’s efficient "budget" hardware into a high-fidelity supercomputer on wheels.

📉 The problem: the high cost of precision

In the world of self-driving cars and humanoid robots, we are constantly fighting a war between precision and power. Modern AI models like Transformers rely on RoPE to help the AI understand where objects are in a sequence or a 3D space.

The catch is that these trigonometric functions (sines and cosines) usually require 32-bit floating-point math—imagine trying to calculate a flight path using 10 decimal places of accuracy.

If you try to cram that into the standard 8-bit multipliers (INT8) used for speed (which is like rounding everything to the nearest whole number), the errors pile up fast. The car effectively goes blind to fine details.
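
A rough way to see how coarse sines and cosines can go wrong is to round them to 8-bit levels and apply the same small rotation many times. This is a toy under stated assumptions: the scale of 127, the step angle, and the repeated application are choices made for the example, and real RoPE pipelines do not necessarily compound rotations this way.

```python
import math


def quantize_int8(x: float) -> float:
    """Round x in [-1, 1] to the nearest of 255 signed 8-bit levels."""
    return round(x * 127) / 127


def rotate(vec, angle, quantized=False):
    """Rotate a 2-D vector, optionally with 8-bit sine and cosine."""
    c, s = math.cos(angle), math.sin(angle)
    if quantized:
        c, s = quantize_int8(c), quantize_int8(s)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def drift_after(steps: int, angle: float, quantized: bool) -> float:
    """Apply the same rotation repeatedly, then report the error in length."""
    vec = (1.0, 0.0)
    for _ in range(steps):
        vec = rotate(vec, angle, quantized)
    return abs(math.hypot(*vec) - 1.0)


exact = drift_after(1000, 0.01, quantized=False)
coarse = drift_after(1000, 0.01, quantized=True)
print(f"float drift: {exact:.2e}, int8 drift: {coarse:.2e}")
```

A true rotation never changes a vector's length, so any growth in `coarse` is pure rounding error in the quantized sine and cosine.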

For a robot like Optimus, a tiny math error means losing its balance or miscalculating the distance to a fragile object. To bridge this gap without simply adding more expensive chips, Tesla had to fundamentally rethink how data travels through the silicon.

🛠️ Tesla's solution: the logarithmic shortcut & pre-computation

Tesla’s engineers realized they didn't need to force the whole pipeline to be high-precision. Instead, they designed the Mixed-Precision Bridge.

They take the crucial angles used for positioning and convert them into logarithms. Because the "dynamic range" of a logarithm is much smaller than the original number, it’s much easier to move that data through narrow 8-bit hardware without losing the "soul" of the information.

It’s a bit like dehydrating food for transport; it takes up less space and is easier to handle, but you can perfectly reconstitute it later.

Crucially, the patent reveals that the system doesn't calculate these logarithms on the fly every time. Instead, it retrieves pre-computed logarithmic values from a specialized "cheat sheet" (look-up storage) to save cycles.

By keeping the data in this "dehydrated" log-state, Tesla ensures that the precision doesn't "leak out" during the journey from the memory chips to the actual compute cores. However, keeping data in a log-state is only half the battle; the chip eventually needs to understand the real numbers again.
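
One way to read the "logarithm plus look-up storage" idea is sketched below. It assumes the usual RoPE frequency schedule (theta_i = BASE^(-2i/DIM)) and a table of precomputed natural logs; the base, the dimension, and the function names are assumptions for the example and are not taken from the patent.

```python
import math

BASE = 10000.0   # common RoPE base; an assumption, not taken from the patent
DIM = 64         # hypothetical head dimension

# Look-up table of precomputed log-frequencies: ln(theta_i) = -(2i/DIM) ln(BASE).
LOG_THETA = [-(2 * i / DIM) * math.log(BASE) for i in range(DIM // 2)]


def angle_via_logs(position: int, i: int) -> float:
    """Recover position * theta_i by adding logs, then exponentiating once.

    Requires position >= 1, since log(0) is undefined.
    """
    return math.exp(math.log(position) + LOG_THETA[i])


def angle_direct(position: int, i: int) -> float:
    """Reference value computed without the table."""
    return position * BASE ** (-2 * i / DIM)


# The raw frequencies span several orders of magnitude; their logs do not.
thetas = [math.exp(v) for v in LOG_THETA]
print(max(thetas) / min(thetas))            # ratio of largest to smallest theta
print(max(LOG_THETA) - min(LOG_THETA))      # width of the log-domain range
print(angle_via_logs(1000, 5), angle_direct(1000, 5))
```

The sketch only shows why the log domain is narrower and that the product can be recovered by one exponentiation; it says nothing about how the values would be encoded in 8 bits.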

🏗️ The recovery architecture: rotation matrices & Horner’s method

When the 8-bit multiplier (the Multiplier-Accumulator or MAC) finishes its job, the data is still in a "dehydrated" logarithmic state. To bring it back to a real angle theta without a massive computational cost, Tesla’s high-precision ALU uses a Taylor-series expansion optimized via Horner’s Method.

This is a classic computer science trick where a complex equation (like an exponent) is broken down into a simple chain of multiplications and additions.

By running this in three specific stages—multiplying by constants like 1/3 and 1/2 at each step—Tesla can approximate the exact value of an angle with 32-bit accuracy while using a fraction of the clock cycles.
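
The three stages with constants 1/3 and 1/2 match a third-order Taylor polynomial of the exponential written in Horner form. The sketch below shows that polynomial only; the function name is made up, and whether the patent uses exactly this order or adds range reduction is not stated in the thread.

```python
import math


def exp_horner3(x: float) -> float:
    """Third-order Taylor polynomial of e**x, evaluated in Horner form.

    1 + x + x**2/2 + x**3/6 == 1 + x * (1 + (x / 2) * (1 + x / 3))
    """
    acc = 1.0 + x * (1.0 / 3.0)   # stage 1: scale by 1/3
    acc = 1.0 + x * 0.5 * acc     # stage 2: scale by 1/2
    acc = 1.0 + x * acc           # stage 3: scale by 1
    return acc


for x in (0.0, 0.05, 0.1):
    print(x, exp_horner3(x), math.exp(x))
```

Each stage is one multiply and one add, which is why the form suits simple hardware. A cubic on its own is only accurate for small `x` (the first neglected term is `x**4 / 24`), so reaching high accuracy over a wide range would need something more than this sketch shows.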

Once the angle is recovered, the high-precision logic generates a Rotation Matrix (a grid of sine and cosine values) that locks the data points into their correct 3D coordinates.
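For readers who have not met RoPE before, the rotation step looks roughly like this in two dimensions. It is a generic sketch of a rotation matrix applied to one pair of features, not Tesla's hardware logic; the helper names are made up for the example.

```python
import math


def rotation_matrix(theta: float):
    """2x2 rotation matrix built from the cosine and sine of theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def rotate_pair(x: float, y: float, theta: float):
    """Rotate the feature pair (x, y) by theta radians."""
    (a, b), (c, d) = rotation_matrix(theta)
    return (a * x + b * y, c * x + d * y)


def dot(u, v) -> float:
    """Dot product of two 2-element vectors."""
    return u[0] * v[0] + u[1] * v[1]
```

The useful property is that the dot product of two rotated vectors depends only on the difference between their angles, so attention scores encode relative position. Small errors in theta therefore show up directly as positional drift.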

This computational efficiency is impressive, but Tesla didn't stop at just calculating faster; they also found a way to double the "highway speed" of the data itself.

🧩 The data concatenation: 8-bit inputs to 16-bit outputs

One of the most clever hardware "hacks" detailed in the patent is how Tesla manages to move 16-bit precision through an 8-bit bus. They use the MAC as a high-speed interleaver—effectively a "traffic cop" that merges two lanes of data.

It takes two 8-bit values (say, an X-coordinate and the first half of a logarithm) and multiplies one of them by a power of two to "left-shift" it.

This effectively glues them together into a single 16-bit word in the output register, allowing the low-precision domain to act as a high-speed packer for the high-precision ALU to "unpack".
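The packing trick is easy to reproduce in software. The sketch below uses one multiply and one add, the same operation a MAC performs; `pack_u8_pair` and `unpack_u16` are hypothetical names for illustration.

```python
def pack_u8_pair(high: int, low: int) -> int:
    """Glue two 8-bit values into one 16-bit word with a multiply-accumulate."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("both inputs must fit in 8 bits")
    # Multiplying by 2**8 is the same as shifting left by 8 bits.
    return high * (1 << 8) + low


def unpack_u16(word: int):
    """Split a 16-bit word back into its (high, low) bytes."""
    return (word >> 8) & 0xFF, word & 0xFF
```

For example, packing 0x12 and 0x34 gives 0x1234, and unpacking returns the original pair.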

This trick effectively doubles the bandwidth of the existing wiring on the chip without requiring a physical hardware redesign. With this high-speed data highway in place, the system can finally tackle one of the biggest challenges in autonomous AI: object permanence.

🧠 Long-context memory: remembering the stop sign

The ultimate goal of this high-precision math is to solve the "forgetting" problem. In previous versions of FSD, a car might see a stop sign, but if a truck blocked its view for 5 seconds, it might "forget" the sign existed.

Tesla uses a "long-context" window, allowing the AI to look back at data from 30 seconds ago or more.

However, as the "distance" in time increases, standard positional math usually drifts. Tesla's mixed-precision pipeline fixes this by maintaining high positional resolution, ensuring the AI knows exactly where that occluded stop sign is even after a long period of movement.

The RoPE rotations are so precise that the sign stays "pinned" to its 3D coordinate in the car's mental map. But remembering 30 seconds of high-fidelity video creates a massive storage bottleneck.

⚡ KV-cache optimization & paged attention: scaling memory

To make these 30-second memories usable in real-time without running out of RAM, Tesla optimizes the KV-cache (Key-Value Cache)—the AI's "working memory" scratchpad.

Tesla’s hardware handles this by storing the logarithm of the positions directly in the cache. This reduces the memory footprint by 50% or more, allowing Tesla to store twice as much "history" (up to 128k tokens) in the same amount of RAM.

Furthermore, Tesla utilizes Paged Attention—a trick borrowed from operating systems. Instead of reserving one massive, continuous block of memory (which is inefficient), it breaks memory into small "pages".

This allows the AI5 chip to dynamically allocate space only where it's needed, drastically increasing the number of objects (pedestrians, cars, signs) the car can track simultaneously without the system lagging.
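A toy version of the paging idea, assuming a fixed pool of equally sized pages and a per-sequence block table. The class and method names are hypothetical; this follows the general PagedAttention concept and is not a description of Tesla's memory controller.

```python
class PagedKVCache:
    """Toy paged cache: each sequence owns a list of small fixed-size pages."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))
        self.pages = [[None] * page_size for _ in range(num_pages)]
        self.block_tables = {}   # seq_id -> list of page ids, in order
        self.lengths = {}        # seq_id -> number of stored entries

    def append(self, seq_id, kv) -> None:
        """Store one key/value entry, grabbing a new page only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.page_size == 0:          # no page yet, or last page is full
            if not self.free_pages:
                raise MemoryError("out of pages")
            table.append(self.free_pages.pop())
        self.pages[table[-1]][n % self.page_size] = kv
        self.lengths[seq_id] = n + 1

    def get(self, seq_id, index: int):
        """Look up an entry through the block table."""
        if not 0 <= index < self.lengths.get(seq_id, 0):
            raise IndexError(index)
        page = self.block_tables[seq_id][index // self.page_size]
        return self.pages[page][index % self.page_size]

    def release(self, seq_id) -> None:
        """Return all pages of a finished sequence to the free pool."""
        self.free_pages.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because pages are handed out one at a time, a tracked object that lives for three frames costs two small pages instead of a worst-case contiguous reservation.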

Yet, even with infinite storage efficiency, the AI's attention mechanism has a flaw: it tends to crash when pushed beyond its training limits.

🔒 Pipeline integrity: the "read-only" safety lock

A subtle but critical detail in the patent is how Tesla protects this data. Once the transformed coordinates are generated, they are stored in a specific location that is read-accessible to downstream components but not write-accessible by them.

Furthermore, the high-precision ALU itself cannot read back from this location.

This one-way "airlock" prevents the system from accidentally overwriting its own past memories or creating feedback loops that could cause the AI to hallucinate. It ensures that the "truth" of the car's position flows in only one direction: forward, toward the decision-making engine.

🌀 Attention sinks: preventing memory overflow

Even with a lean KV-cache, a robot operating for hours can't remember everything forever. Tesla manages this using Attention Sink tokens.

Transformers tend to dump "excess" attention math onto the very first tokens of a sequence, so if Tesla simply used a "sliding window" that deleted old memories, the AI would lose these "sink" tokens and its brain would effectively crash.

Tesla's hardware is designed to "pin" these attention sinks permanently in the KV-cache. By keeping these mathematical anchors stable while the rest of the memory window slides forward, Tesla prevents the robot’s neural network from destabilizing during long, multi-hour work shifts.
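In software terms, "pinning" the sinks while the window slides can be sketched as a simple eviction rule. The function name and parameters below are hypothetical, and how many sink tokens to keep is a tuning choice this post does not specify.

```python
def evict_with_sinks(token_ids, num_sinks: int, window: int):
    """Keep the first `num_sinks` tokens plus the most recent `window` tokens."""
    if len(token_ids) <= num_sinks + window:
        return list(token_ids)           # nothing needs to be evicted yet
    sinks = list(token_ids[:num_sinks])
    recent = list(token_ids[len(token_ids) - window:])
    return sinks + recent
```

For a ten-token history with two sinks and a window of three, this keeps tokens 0 and 1 plus tokens 7, 8, and 9. A plain sliding window would have kept only 7, 8, and 9.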

While attention sinks stabilize the "memory", the "compute" side has its own inefficiencies—specifically, wasting power on empty space.

🌫️ Sparse tensors: cutting the compute fat

Tesla’s custom silicon doesn't just cheat with precision; it cheats with volume. In the real world, most of what a car or robot sees is "empty" space (like clear sky).

In AI math, these are represented as "zeros" in a Sparse Tensor (a data structure that ignores empty space). Standard chips waste power multiplying all those zeros, but Tesla’s newest architecture incorporates Native Sparse Acceleration.

The hardware uses a "coordinate-based" system where it only stores the non-zero values and their specific locations. The chip can then skip the "dead space" entirely and focus only on the data that matters—the actual cars and obstacles.
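A coordinate-based layout is commonly called COO (coordinate format). The sketch below shows the idea in plain Python with hypothetical helper names: only non-zero entries are stored, and the multiply loop never touches a zero.

```python
def to_coo(dense):
    """Convert a dense 2D list into (row, col, value) triples for non-zeros."""
    return [(r, c, v)
            for r, row in enumerate(dense)
            for c, v in enumerate(row)
            if v != 0]


def coo_matvec(entries, vector, num_rows: int):
    """Multiply a COO matrix by a vector, visiting only the stored non-zeros."""
    out = [0.0] * num_rows
    for r, c, v in entries:
        out[r] += v * vector[c]
    return out
```

A 3x3 grid with two non-zero cells needs two multiplications here instead of nine, which is the kind of saving the post is describing.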

This hardware-level sparsity support effectively doubles the throughput of the AI5 chip while significantly lowering the energy consumed per operation.

🔊 The audio edge: Log-Sum-Exp for sirens

Tesla’s "Silicon Bridge" isn't just for vision—it's also why your Tesla is becoming a world-class listener. To navigate safely, an autonomous vehicle needs to identify emergency sirens and the sound of nearby collisions using a Log-Mel Spectrogram approach (a visual "heat map" of sound frequencies).

The patent details a specific Log-Sum-Exp (LSE) approximation technique to handle this. By staying in the logarithm domain, the system can handle the massive "dynamic range" of sound—from a faint hum to a piercing fire truck—using only 8-bit hardware without "clipping" the loud sounds or losing the quiet ones.
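The patent's specific approximation is not reproduced here. As a reference point, this is the standard numerically stable way to add energies that are stored as logarithms, using the well-known max-shift identity. `log_sum_exp` is a generic helper name, not the patent's.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) without ever leaving a safe numeric range."""
    m = max(log_values)
    if m == -math.inf:                   # every input represents zero energy
        return -math.inf
    # Subtracting the maximum keeps every exponent at or below zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))
```

Computing exp(1000.0) directly overflows a 64-bit float, but `log_sum_exp([1000.0, 1000.0])` returns roughly 1000.69 without trouble. The same shift is what lets very loud and very quiet bands share one representation.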

This allows the car to "hear" and categorize environmental sounds with 32-bit clarity. Of course, all this high-tech hardware is only as good as the brain that runs on it, which is why Tesla's training process is just as specialized.

🎓 Quantization-aware training: pre-adapting the brain

Finally, to make sure this "Mixed-Precision Bridge" works flawlessly, Tesla uses Quantization-Aware Training (QAT).

Instead of training the AI in a perfect 32-bit world and then "shrinking" it later—which typically causes the AI to become "drunk" and inaccurate—Tesla trains the model from day one to expect 8-bit limitations.

They simulate the rounding errors and "noise" of the hardware during the training phase, creating a neural network that is "pre-hardened". It’s like a pilot training in a flight simulator that perfectly mimics a storm: when the real weather hits, the AI doesn’t "drift" or become inaccurate because it was born in that environment.
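The "simulated rounding" step is usually called fake quantization. Below is a minimal forward-pass sketch assuming symmetric per-tensor scaling; the function name is hypothetical and the post does not say which scheme Tesla uses. Real QAT frameworks also need a gradient rule for the rounding step (commonly a straight-through estimator), which is not shown.

```python
def fake_quantize(values, num_bits: int = 8):
    """Round values to a signed integer grid, then map them back to floats.

    The output stays in floating point but carries the same rounding error
    that integer hardware with `num_bits` bits would introduce.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax                   # width of one integer step
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]
```

During training, weights or activations would pass through a function like this in the forward pass, so the network learns with the rounding error already present.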

This extreme optimization opens the door to running Tesla's AI on devices far smaller than a car.

🚀 The strategic roadmap: from AI5 to ubiquitous edge AI

This patent is not just a "nice-to-have" optimization; it is the mathematical prerequisite for Tesla’s entire hardware roadmap. Without this "Mixed-Precision Bridge", the thermal and power equations for next-generation autonomy simply do not work.

It starts by unlocking the AI5 chip, which is projected to be 40x more powerful than current hardware. Raw power is useless if memory bandwidth acts as a bottleneck.

By compressing 32-bit rotational data into dense, log-space 8-bit packets, this patent quadruples the effective bandwidth, allowing the chip to utilize its massive matrix-compute arrays without stalling.

This efficiency is critical for the chip's "half-reticle" design, which reduces silicon size to maximize manufacturing yield while maintaining supercomputer-level throughput.

This efficiency is even more critical for Tesla Optimus, where it is a matter of operational survival. The robot runs on a 2.3 kWh battery (roughly 1/30th of a Model 3 pack).

Standard 32-bit GPU compute would drain this capacity in under 4 hours, consuming 500W+ just for "thinking".

By offloading complex RoPE math to this hybrid logic, Tesla slashes the compute power budget to under 100W. This solves the "thermal wall", ensuring the robot can maintain balance and awareness for a full 8-hour work shift without overheating.
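The battery figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in this post (2.3 kWh, 500 W, 100 W, 8 hours) and assumes, for simplicity, that compute is the only load; motors and sensors are ignored.

```python
BATTERY_WH = 2300.0   # 2.3 kWh pack, as quoted above


def runtime_hours(load_watts: float) -> float:
    """Hours until the pack is empty at a constant load."""
    return BATTERY_WH / load_watts


def max_average_load(shift_hours: float) -> float:
    """Highest constant load (in watts) that still lasts the whole shift."""
    return BATTERY_WH / shift_hours
```

At exactly 500 W the pack lasts about 4.6 hours, so the "under 4 hours" figure corresponds to draws above roughly 575 W, which the "500W+" wording allows. At 100 W the compute alone would run for 23 hours, and an 8-hour shift leaves about 287 W of average budget for everything on the robot.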

This stability directly enables the shift to End-to-End Neural Networks. The "Rotation Matrix" correction described in the patent prevents the mathematical "drift" that usually plagues long-context tracking.

This ensures that a stop sign seen 30 seconds ago remains "pinned" to its correct 3D coordinate in the World Model, rather than floating away due to rounding errors.

Finally, baking this math into the silicon secures Tesla's strategic independence. It decouples the company from NVIDIA’s CUDA ecosystem and enables a Dual-Foundry Strategy with both Samsung and TSMC to mitigate supply chain risks.

This creates a deliberate "oversupply" of compute, potentially turning its idle fleet and unsold chips into a distributed inference cloud that rivals AWS in efficiency.

But the roadmap goes further. Because this mixed-precision architecture slashes power consumption by orders of magnitude, it creates a blueprint for "Tesla AI on everything".

It opens the door to porting world-class vision models to hardware as small as a smart home hub or smartphone. This would allow tiny, cool-running chips to calculate 3D spatial positioning with zero latency—bringing supercomputer-level intelligence to the edge without ever sending private data to a massive cloud server.

947

10K

firedevil retweeted

Tom Dörr

@tom_doerr

6 months ago

Motion detection using Wi-Fi signals https://t.co/JggLEkegw5

132

65K

firedevil retweeted

Raphael Luba

@LubaRaphael

6 months ago

Since certain companies boast about wanting to rewrite their whole code, maybe it’s time to point the next generation of engineers towards this classic: https://t.co/kNbYSz0iYE (It’s been 25 years. People seem to have forgotten.)

573

429

88K

firedevil retweeted

Science girl

@sciencegirl

7 months ago

How a single tree affects an entire forest

Suzanne Simard has fundamentally reshaped how we understand forests. Through decades of field research, she demonstrated that trees are linked by vast underground networks of mycorrhizal fungi. These symbiotic partnerships allow trees to exchange carbon, nutrients, water, and chemical signals, sometimes over great distances and across species lines.

Her landmark 1997 paper in Nature provided the first clear evidence that paper birch and Douglas-fir seedlings transfer carbon to one another through fungal connections, with the flow shifting depending on which tree is shaded and needs resources most. This directly challenged the prevailing view of forests as arenas of relentless competition.

Instead, Simard’s work revealed cooperative dynamics: older, hub-like trees (what she calls “Mother Trees”) are the most highly connected nodes in the network. They recognize their genetic kin, allocate more resources to their own seedlings, and even bolster unrelated young trees, enhancing overall forest resilience. The scientific community now widely refers to these fungal linkages as the “Wood Wide Web.”

Today, her work has inspired everything from Avatar to the Pulitzer Prize-winning novel "The Overstory." Her memoir, "Finding the Mother Tree," became a global bestseller and is now being adapted into a film starring Amy Adams.

On the ground in British Columbia, Simard partners with Indigenous communities to design logging practices that spare Mother Trees and old-growth networks. Early results show these areas store more carbon, retain more biodiversity, and regenerate decades faster than conventionally clear-cut sites.

Some researchers caution against terms like “mother” or “communication,” preferring strictly neutral language. Simard maintains that the underlying phenomena (resource sharing, kin recognition, and chemical alarm signals) are rigorously documented, and evocative words help people care about forests they might otherwise see only as timber.

In her words: “These forests can recover their complexity and strength, but only if we start managing them as living, connected systems.”

105

163K

firedevil retweeted

Trung Phan

@TrungTPhan

7 months ago

The program was called Bell Labs’ One Year On Campus (OYOC) and offered to new grads. Full read here (writer Elizabeth Van Nostrand interviews her dad about the program): https://t.co/QpGX92OA5e


firedevil retweeted

Alex Prompter

@alex_prompter

9 months ago

Anthropic's internal prompting style is completely different from what most people teach. I spent 3 weeks analyzing their official prompt library, documentation, and API examples. Here's every secret I extracted 👇

265

643

43K

firedevil @firedevil

11 months ago

Hi @myntra there is a fake website claiming to be Myntra. All items at 80% off. https://t.co/sIbbojQFZD They are using your logo

firedevil retweeted

Denis Laskov 🇮🇱

@it4sec

11 months ago

Finding vulnerabilities in DJI drones: reverse engineering, firmware decryption, and dynamic analysis (fuzzing). 👨‍💻❯❯🛸🫨🪲 More details on: LinkedIn: https://t.co/954wGcZKME Substack: https://t.co/8i6VVn6bXO


402

316

20K

firedevil retweeted

Florian Roth ⚡️

@cyb3rops

11 months ago

YARA is great at string (or byte chain) matching.
It’s not great at juggling hashes and loops in conditions.

This rule from a public demo loops over export offsets, reads 14 bytes at each offset, hashes them, and compares to an MD5.

Yeah… no.
- String matching in YARA is highly optimized (Aho-Corasick) and scales well across many rules
- But the condition of each rule has to be evaluated separately
- That means hashes, loops, and complex logic add up quickly across large rule sets
- And importing any module (not just hash) has a significant performance impact

Use YARA’s strengths: string matching.
A better way to write this?
Include the 14-byte pattern in the strings: section, then check if it matched at the offset. Done.

Also:
- Avoid import unless you really need it
- If you must import a module once, fine – reuse it. The performance hit comes with the first import, not the second use.

In large rulesets with hundreds or thousands of rules, condition evaluation doesn’t scale well.
String matching does.

More info:

YARA Performance Guidelines
https://t.co/88tjl9CwLs
The impact of importing a module
https://t.co/aXVaI2ChqB
Performance impact of condition evaluation
https://t.co/UOv2NJralZ


112

10K

firedevil @firedevil

11 months ago

Hey @grok, who was the most famous person to visit my profile? It doesn't need to be a mutual, don't tag them, just say who it was

firedevil retweeted

Ayush Anand

@Securityinbits

11 months ago

I’ve been tracking recent AsyncRAT infra using a simple query: it flags C2 servers using the default certificate.

Use this @fofabot query 👇

cert .issuer.cn="AsyncRAT"

Found 160+ unique IPs hosting AsyncRAT

Top 3 ports (last 3 months)
➊ 7777
➋ 8888
➌ 4444

Cert:


678
