muke @muke1010101 - Twitter Profile

Pinned Tweet

over 2 years ago

Pleased to share my first ever paper :) 'Improving Memory Dependence Prediction with Static Analysis': https://t.co/j7hpnnlb43

3

22

6

4

2K

muke @muke1010101

6 days ago

This is a really great + beginner friendly article on the topic for anyone interested: https://t.co/3sedMTnFOZ

0

57

muke @muke1010101

6 days ago

the next exciting thing im working on now is gonna be multi-branch prediction, i.e. predicting more than 1 branch per cycle to keep up with the fetch bandwidth demands of next generation CPUs. This has been announced to be included in Zen 5 too, so it's a hot topic atm

1

4

0

143

muke @muke1010101

6 days ago

on the bright side the feedback from the more positive reviews was useful, so i'll include that for a submission to HPCA in a couple months. hopefully at some point i can just be done with this project entirely

0

2

0

23

muke @muke1010101

6 days ago

tragically this didn't go well, got 2 strong rejects which was enough to not have the option for rebuttal. Frustratingly, it felt as if the reviewers giving strong rejects didn't understand that in doing so they were preventing me from addressing their very simple questions.

muke @muke1010101

15 days ago

meant to be getting the reviews for my MICRO submission today😵‍💫

2

3

0

148

1

2

0

73

muke @muke1010101

15 days ago

same anxiety as waiting for exam results. it's not the end of the world if i don't get in but i'm not looking forward to having something i obsessed over torn apart.

0

2

0

46

muke @muke1010101

22 days ago

Also, the icache prefetcher should really be fetch-directed prefetching with a decoupled frontend, which Gem5 does now have, but it's kind of broken and ruins BTB accuracy for now. But if that ever changes, definitely include that too.

0

2

0

54

muke @muke1010101

22 days ago

Table of Gem5 params for a paper I'm writing. The large model is meant to be a modern high-end workstation/server core. Hoping this helps lay out all the many small things you wanna be tuning in Gem5.

Photo: https://t.co/TD2aPzr8fL

1

7

2

7

843

muke @muke1010101

22 days ago

Not listed here: the default L2 latency is 40 cycles, which is what the L3 should be (!), so set that to like 14. Also the L1-I can get away with 1 because in reality you'd use a u-op cache. You *can* use DDR5 but DDR4 actually has lower latency, which is better for SPEC.

1

0

57
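
A rough sketch of the latency overrides described in the tweet above, written as plain Python rather than real Gem5 config. The dictionary keys and the `apply_overrides` helper are hypothetical names, and the default L1-I value is only a placeholder, not Gem5's actual default; only the 40-cycle L2 default and the suggested values of 14 and 1 come from the tweet.

```python
# Hypothetical stand-in for a cache latency table; this is not the Gem5 API.
DEFAULT_LATENCIES = {
    "l1i": 2,   # placeholder value, assumed for illustration only
    "l2": 40,   # default L2 latency quoted in the tweet
}

# Overrides suggested in the tweet: L2 at about 14 cycles, L1-I at 1 cycle.
TUNED_OVERRIDES = {"l1i": 1, "l2": 14}


def apply_overrides(defaults, overrides):
    """Return a new latency table with the overrides applied."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown cache levels: {sorted(unknown)}")
    tuned = dict(defaults)  # copy so the defaults stay untouched
    tuned.update(overrides)
    return tuned


tuned = apply_overrides(DEFAULT_LATENCIES, TUNED_OVERRIDES)
```

Keeping the overrides in one place like this makes it easier to list every tuned value in a paper's parameter table.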

muke @muke1010101

about 1 month ago

Waiting for work by someone in my research group to get published and then it'll also include the state of the art in data prefetching too

0

1

0

36

muke @muke1010101

about 1 month ago

This has now been updated to be based off Gem5 v25 which includes a decoupled frontend. I'm having a lot of fun making this better and better tbh.

muke @muke1010101

about 1 month ago

Been tuning a bunch of things in Gem5 for a paper lately and just pushed+listed them all in my fork, includes things like having an actual L3 cache

Photo: https://t.co/8awvRqVQzc

1

2

0

418

1

0

325

muke @muke1010101

about 1 month ago

just so many more graphs i need to plot in all my future works lol

0

40

muke @muke1010101

about 1 month ago

wow, why did no one tell me gem5 v25 implemented a new store sets

1

0

251

muke @muke1010101

about 1 month ago

ok somebody made it have tagged associative entries now, this is breaking my brain a little

2

0

58

muke @muke1010101

about 1 month ago

I actually think using LLVM isn't fair here because flang is such a bad frontend that it leaves so many opportunities for PGO to pick up. But just looking at the C/C++ workloads it seems to almost always make it worse!

0

1

0

53

muke @muke1010101

about 1 month ago

Been playing about with PGO on spec2017 and this is what Gem5 says it does to performance: (profiles came from the train inputs and compiled with LLVM 22)

Photo: https://t.co/rhmG0fss9t

1

2

0

79

muke @muke1010101

about 1 month ago

tempted to say i wish it came out from the start of my phd but, maybe for the best i was forced to learn how to do this stuff decently by hand before having it automated anyway, what we really need now are agents that are integrated into your environment

0

30

muke @muke1010101

about 1 month ago

i have to say phd work has become twice as easy since claude, all the tedious python scripting for wrangling results and plotting graphs are like the perfect use case

2

1

0

226

muke @muke1010101

about 1 month ago

because a 10x'er has never used either 😌

Joel 🇦🇺

@ptr_to_joel

about 1 month ago

you can measure how good an engineer is by their opinion on the jvm / java

62

175

5

26

107K

0

4

0

287

muke @muke1010101

about 1 month ago

On the other hand they removed x264 which was disproportionately my biggest performance win. They replaced it with flac but as that works on 1D data instead of 2D there'll probably be fewer independent loads for my work to win on </3

0

1

0

47

muke @muke1010101

about 1 month ago

Spec2026 is released. I look forward to figuring out all the new miscellaneous problems like 'bwaves needs 16GB of stack space' and 'xz compiles to insane bitshifts that break gem5's X86 frontend'

1

7

1

0

504

muke @muke1010101

about 1 month ago

Seriously though some of the new workload selections look cool. I'm surprised it took this long to switch from perl to python, but it makes way more sense + it'll be nice not to have to disable strict aliasing and LTO. It also includes Gem5! So I'll be simulating Gem5 on Gem5

1

0

71
