Mini mal @gatelevelanon - Twitter Profile

Pinned Tweet

5 months ago

$INTC can increase SRAM density as much as it likes. Groq can build inference chips with $NVDA. But can they force undo the enshittification of the software stack? How far can hardware efficiency go to cover for the inefficiencies of software bloat?

1

2

0

1

6K

Mini mal

@gatelevelanon

about 4 hours ago

@yacineMTB See what I did there. I am so proud of myself.

0

9

Mini mal

@gatelevelanon

about 4 hours ago

@yacineMTB So basically it's a "differential amplifier"

1

0

134

Mini mal

@gatelevelanon

2 days ago

@zackslab @i2cjak Strong VC signal : > it uses the latest technology not something from the 80s. Get rekt spice-cels

1

0

55

Mini mal

@gatelevelanon

2 days ago

My advice to someone who messaged about job seeking :

0

1

0

25

Mini mal

@gatelevelanon

2 days ago

Petition to change x86 to x26 or something. Why we still stuck in the dark ages with that number

0

2

0

61

Mini mal

@gatelevelanon

2 days ago

@mu_chrinovic ".. and call it extension. Nobody will notice"

0

5

0

127

Mini mal

@gatelevelanon

4 days ago

@mu_chrinovic > uses O3 compiler > Less performance than hand typed assembly > Mfw O3 is tip of the iceberg. Compiler flag mining is harder than hand typed assembly > Pic related

0

3

0

426

Mini mal

@gatelevelanon

4 days ago

Velocity = Momentum / Mass I am learning about hyperdimensional computational models or sumthing and these analogies keep cropping up. Strange but interesting.

Mike Bradley

@The_Only_Signal

5 days ago

Someone out there likely needs this:

14

702

35

383

29K

0

1

0

107

Mini mal

@gatelevelanon

4 days ago

Whitepill for midwits : Economically speaking, high agency midwits will outperform low agency philosophers any day. You may not be at the top of the class according to your grades or your teachers. But you can apply just a moderate amount of reasoning, combine it with tons of agency, and you should see your upward mobility happening in real time. The economy rewards directional action, not deep thought with no action. Stop overthinking, start acting. Iterate and improve.

0

1

0

37

Mini mal

@gatelevelanon

5 days ago

@mmjukic One word answer : Trust.

0

15

Mini mal

@gatelevelanon

5 days ago

@zackslab @blind_via

1

2

0

168

Mini mal

@gatelevelanon

5 days ago

In picture below : Four volume set of Intel Architecture software development manual. Total size : 600+ 2576 + 596 + 1646 = 5418 pages !! Intel prints new chips every year that comply with the instruction set in these thousands of pages. If Intel published instruction-level performance data for the hundreds of chips it has printed over the last decade, there would be hundreds of thousands of pages of information. The effort to publish that data is massive. To make it meaningful, it has to be matched with the effort from clients and end users. No client/customer has the patience to absorb the ever changing architectural performance data at that scale every year. Every time a new chip is launched, the interested community works by benchmarking performance data for only the new instruction feature additions or a few hundred classic (ALU/MOV) instructions. Not trivial, not great ROI. This is more about incentives and ROI. Perhaps the incentives will change and the ROIs will improve some time in the future when AI will auto analyze new chips, publish systematic reports. And AIs will consume them to find optimal instruction sequences. But for humans, this is a waste of time. Why spend 6 months on finding the sequence, just wait that much time and the performance will improve with the next gen?

LaurieWired

@lauriewired

6 days ago

How aware are modern compilers of exact microarchitectural layouts? Quite a lot…in one very specific way. Intel x86 is *not* the same as AMD x86. Sure…it’s the “same ISA” in the broadest sense, but the individual instructions often take different numbers of cycles. You want to stall as little as possible, so the order your instructions are arranged is somewhat important. Technically, something like “AMD Zen 3” ordering on an Intel Skylake is sub-optimal. If you look at LLVM, you’ll notice these X86Sched*.td (TableGen) files. Just about every x86 CPU generation has their own version. It defines things like ROB Size, issue width, and misprediction penalties (in terms of cycles). What’s fascinating is how much of this is “guessed” (reverse-engineered) vs “revealed” by the hardware vendors. From what I understand, Intel/AMD/etc will *sometimes* lend a helping hand / give some hints…but less than you’d expect. It’s a very weird situation when you think about it. I assume vendors are tight-lipped about exact latencies for competitive secrecy…yet those are the exact things you’d need to know to extract the most performance out of your compiler! If anyone knows other reasons for the secrecy, I’d love to hear it!

62

2K

111

427

77K

0

2

0

1

150

Mini mal

@gatelevelanon

7 days ago

LLM generated literature is the mental equivalent of cornsyrup in fruit juice - abundant, lacks nutrients, toxic.

0

1

0

25

Mini mal

@gatelevelanon

7 days ago

How AI adds 20 files and 400 edge cases to my hello world project. Nobody asked.

Framstags bei Sunjii @undercoversunj

8 days ago

The European mind cannot comprehend what Americans consider coffee

749

3K

125

608

5M

1

0

41

Mini mal

@gatelevelanon

9 days ago

If you ever benchmarked recent chips originating from this part of the world, you will know that, sometimes, they absolutely MOG their American/European counterparts. Even if they look half as great on the scorecards now, wait 6 months and check back. These can absolutely transform the GPU supply chain

Financelot

@FinanceLancelot

9 days ago

Huawei has begun shipping their 910C GPU with 128GB of HBM memory. The sudden release of these chips came as a shock as China accounts for approximately 40% of $NVDA revenue once smuggling and black shipments are taken into account. These chips are racked in the Cloud Matrix with 384 GPUs meshed together with an optical network. The Huawei Matrix is about 66% faster than the $NVDA flagship GB200 and consumes 2.5x the power, but is designed specifically for China's ultra low cost energy sector. Now that the Huawei 910C is in full production, the fear is China will cut off Nvidia chips entirely.

45

751

127

271

271K

0

43

Mini mal

@gatelevelanon

9 days ago

A simple idea for working under uncertainty (business operations, HFT, coding with LLMs, agents, etc.) is to break a long uncertain task into shorter higher-certainty tasks. As you reduce the problem into smaller and smaller subproblems, the uncertainty (chance of error) per task reduces. By chaining higher-certainty tasks, you can get higher overall success rates. The choice is essentially between a long task with a 67% success rate versus 10 smaller tasks, each with an independent success rate of 99%. P(success | one long task) = 67%. P(success | sequence of short tasks) = 0.99^10 = 90.4%. This is an understanding that most production-grade engineers follow. It implicitly separates robust engineering from vibe coding. If you want to build robust systems, this is one way to think about it.
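
A quick way to check the arithmetic in the tweet above: the 67% and 0.99^10 ≈ 90.4% figures are both chances of success, and the second one only holds if the ten steps succeed or fail independently. The sketch below is a minimal illustration of that calculation; the function name `chain_success` and the variable names are made up for this example and do not come from the tweet.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n_steps independent steps all succeed.

    Assumes every step has the same success rate p_step and that
    steps do not influence each other (the tweet's stated assumption).
    """
    return p_step ** n_steps


# Figures taken from the tweet: one long task at 67% versus
# ten short tasks at 99% each.
long_task = 0.67
short_chain = chain_success(0.99, 10)

print(f"one long task:      {long_task:.1%}")
print(f"ten chained tasks:  {short_chain:.1%}")  # 90.4%
```

The same function also shows the limit of the approach: the product keeps shrinking as the chain grows, so splitting only pays off while each extra step stays close to certain.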

0

25

Mini mal

@gatelevelanon

10 days ago

@techtoby__ Cornwall cottages are not the flex that people think they are.

0

3K

Mini mal

@gatelevelanon

10 days ago

@LolandBeast67 Not entirely paradoxical. The 'perception' is that London fintech and consultancy pays more than Cambridge silicon

1

0

115

Mini mal

@gatelevelanon

11 days ago

At least in the UK, computer architecture is an ignored subject at CS undergrad level. Most students align themselves to what they *think* the companies want - AI/ML, fintech - while ignoring the real fundamentals. Computer architecture is ignored, because it is not at all catchy or attractive. Is this right or wrong? Who cares. The job market will price it in.

Samira Khan

@samiramanabi

11 days ago

Hmm… did people forget to take the undergraduate computer architecture course?

20

581

15

457

121K

1

3

0

2

1K

Mini mal

@gatelevelanon
