Tim Lawrence @TimNetwork - Twitter Profile

4 days ago

@Spiders_STG @JumpyFeller @PandaKingEX I don't totally agree with Jarvis about this, but there is probably some truth to it: because of how games have changed on computers, consoles, and phones, the public won't pay for standard arcade games anymore, so they make the rides to remain economically viable. What do you think? /5

0

14

Tim Lawrence @TimNetwork

4 days ago

@Spiders_STG @JumpyFeller @PandaKingEX "We took all this flak that we were sellouts and our game sucked and there was no skill, and it's crazy, I look at Cruis'n USA from today's standards... the game actually is horribly difficult, so if you put it out in an arcade today people would refuse to play it" /3

0

18

Tim Lawrence @TimNetwork

4 days ago

@Spiders_STG @JumpyFeller @PandaKingEX "They were kind of brutal on our game Cruis'n USA. I remember he was like, yeah, Snooz'n USA; that was his term for it. Cruis'n was a huge commercial success, but critically it took a lot of flak because it was more of a casual driving game; it actually appealed a lot to girls" /2

0

20

Tim Lawrence @TimNetwork

4 days ago

@Spiders_STG @JumpyFeller @PandaKingEX really interesting note on that from the insert coin documentary: https://t.co/J3FaVlh3VZ /1

0

1

10

Tim Lawrence @TimNetwork

4 days ago

@Timstillherelol @EdwardDarson it's not just alliances, US citizens used to be on better terms with a lot of nations and cultures. nations have become more isolated and culture and politics have deteriorated. It's a shame really.

0

13

TimNetwork retweeted

Gamer_lafan

@gamer_lafan

4 days ago

If I had to pick one of the greatest PC Engine games, this title simply can’t be left out: Castlevania: Rondo of Blood. Even considering that it was released in 1993, it’s still praised for its incredible level of polish, with outstanding graphics, sound, opening sequence, voice acting, controls, and multiple branching paths. The CD format played a huge role in making all of that possible, which naturally made it stand out compared to competing consoles at the time like the Super Famicom and Mega Drive. If you ever get the chance, it’s definitely a game worth playing at least once.

22

735

98

119

31K

TimNetwork retweeted

Spiders @Spiders_STG

4 days ago

People purposefully misunderstanding credit feeding is not pay-to-win, but paying to lose.

3

49

5

4

3K

TimNetwork retweeted

Nome mais comum da década de 90ˢᶠᶜ

@MateusA06767493

6 days ago

Streets of Rage 2: Mega Drive vs Game Gear. The Mega Drive/Genesis version is the classic that many people know. The Game Gear version is a very competent "demaster". Even with reduced graphics and sound, the game holds on to its essence and gameplay well. If you have never played this portable version, it is highly recommended.

10

275

34

27

13K

TimNetwork retweeted

VLAD HOSTS THE BEST PODCAST IN BITCOIN

@Vladcostea

7 days ago

You know, I really respect bcashers for being true remnants 9 years later. Even after they had 2 major community splits (BSV and XEC), even after the lead developer left, even after the market has clearly favored small block BTC, while the hashrate is ~1% of Bitcoin’s, they keep going. Lots of high profile big blockers dropped their support around 2020 to embrace Ethereum, but there are still devs shipping new features and OP codes that put Bitcoin Core to shame in spite of having fewer resources. Never had inflation bugs; the lesser degree of conservativeness didn’t make the dev team sloppy. They even reported the 2018 inflation bug in Bitcoin and didn’t try to exploit it at a time when they could easily have become destructive. Wasabi 2.0 was inspired by Cash Fusion, and many of the covenant proposals are iterations of the already available OP_CheckDataSig and the reactivation of OP_CAT. The big blockers actually did a great job scaling Bitcoin, settling for dynamic 32 MB blocks that can be bigger only if the user pays for it, and ironically setting up much stronger foundations for trust minimized L2s. They can even run a better version of Lightning, with fewer hiccups. All without any grants from HRF, Jack Dorsey’s companies, MIT, or mainstream financial institutions, which is probably why they were able to ship code instead of trying to find the meaning of the word “consensus”. Bitcoin Cash did a great job scaling and making Satoshi’s codebase more useful. I wish Core would learn and try to compete instead of “bikeshedding”. I also wish it wasn’t taboo to look at competing codebases and take the best parts. You know, like actual cypherpunks who don’t need to signal loyalty to a church in order to be taken seriously.

29

198

47

6

9K

TimNetwork retweeted

Mason @FinalMasonry

5 days ago

@PandaKingEX There are some examples people will find. On YouTube, Terminator 2 only has TAS, "longplays", and scores which aren't 1cc's. But what gets lost in the conversation is that there's an ocean of games that can be 1cc'ed. It isn't rare. So many I'll never run out of stuff to play.

0

12

2

0

417

TimNetwork retweeted

VISUELLE GAMING

@VisuelleGaming

5 days ago

R4: Ridge Racer Type 4 (1998) on Sony PlayStation. #ridgeracer #ps1 #retrogaming #gaming #nostalgia

1

158

25

12

6K

Tim Lawrence @TimNetwork

5 days ago

@stevegalaxius @InsaneMegaCD @willbobgill @darksavior2023 you could just say "oh, my bad, you were right"

0

2

0

20

TimNetwork retweeted

Falco Girgis

@falco_girgis

6 days ago

YEEEEES!!! GAINZ!!! Was up all night hand-rolling assembly routines for the Sega Dreamcast, playing SH4 instruction Tetris, with the goal of maximizing the gainz for the custom memcpy() replacement in my SH4ZAM accelerated math library.

We discovered not too long ago that a piss-simple for-loop that does a byte-by-byte copy, written in plain C can actually stomp on the Newlib memcpy() implementation we get backing our C standard library within our SH GCC toolchains, given -O3 is enabled with loop unrolling...
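For reference, the kind of plain byte-by-byte loop described above might look like the sketch below. The function name is hypothetical and this is not code from SH4ZAM or KallistiOS; whether the compiler unrolls the loop, or even swaps it for a library call, depends on the toolchain and flags, so any speed comparison needs to be measured on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a plain byte-by-byte copy, written in portable C.
 * With optimization enabled the compiler is free to unroll this loop. */
void *naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Null pointers are only tolerated for an empty copy. */
    assert((d != NULL && s != NULL) || n == 0);

    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];    /* one byte per iteration before any unrolling */

    return dst;
}
```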

This is obviously unacceptable for a community of engineers looking to push a piece of retro hardware to its limits, so we set off to look for alternative implementations...

We managed to find an extremely efficient one, written by STMicroelectronics Ltd for the SH4, which we were absolutely thrilled with... only... the license? LGPLv2.1, which wasn't going to be fit to power an entire community of commercial and open-source homebrewers as our dedicated memcpy() replacement within our toolchains...

So I got sick of that shit and decided to embark on a quest to roll my own hand-optimized, generic memcpy() replacement, as part of my SH4ZAM library...

Fast forward through months and months of pain and misery, continuous benchmarking, constant bugfixes, and several iterations of rewrites, and I've FINALLY absolutely DESTROYED both of the generic memcpy() implementations by Newlib and STMicroelectronics!

What you're seeing here in the left and middle panes is the complete out-of-line ASM implementation for the most critical, highest-throughput, fastest copy path in the whole generic memcpy() implementation... The pathological "best-case scenario."

On the right, you can see the top-level dispatcher and entry-point for shz_memcpy(), which is basically written to assess the size of the buffer we're copying, along with source and destination buffer alignments, in order to determine the most efficient hand-written ASM path to forward the call onto, which will handle the bulk of the transfer.

If the given pointers are not ideally aligned, the algorithm does a slower copy on the remaining bytes at the beginning and end of the destination buffer (shz_memcpy1()), until the buffer becomes cache-line aligned... at which point it chooses the best specialization it can, based on the remaining number of bytes which need to be copied and source buffer alignment.

Switch back now to the left pane, and you're looking at the fastest fast path... which can be selected to handle this bulk-copy... It requires a copy size of 128-byte multiples, a destination buffer alignment of 32 bytes or greater, and a source buffer alignment of 8 bytes or greater, in order to do its magic.
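The dispatch logic described in the last three paragraphs can be sketched roughly as follows. This is an illustration under assumptions, not the actual shz_memcpy() source: the function names are hypothetical, the "fast path" is a placeholder that simply calls the standard memcpy(), and the real library is described as having further specializations for other alignments, which this sketch replaces with a byte copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the slow byte-wise path. */
static void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Hypothetical placeholder for the 128-bytes-per-iteration assembly path. */
static void copy_blocks128(unsigned char *d, const unsigned char *s, size_t n)
{
    memcpy(d, s, n);
}

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte-copy until the destination reaches a 32-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)d & 31u);
    if (head > n)
        head = n;
    copy_bytes(d, s, head);
    d += head; s += head; n -= head;
    assert(((uintptr_t)d & 31u) == 0 || n == 0);

    /* Bulk: the fast path needs an 8-byte-aligned source and a size that
     * is a multiple of 128 bytes, so only whole 128-byte blocks go here. */
    if (((uintptr_t)s & 7u) == 0) {
        size_t bulk = n & ~(size_t)127;
        copy_blocks128(d, s, bulk);
        d += bulk; s += bulk; n -= bulk;
    }

    /* Tail: whatever is left goes through the slow path. */
    copy_bytes(d, s, n);
    return dst;
}
```

Calling sketch_memcpy(dst + 3, src + 5, 500) exercises all three stages when the source happens to become 8-byte aligned after the head copy, and falls back to the byte path otherwise.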

Here is a list of every trick I leveraged in its implementation:

1) The main copy loop is unrolled so that the entire FPU on the SH4 CPU--2 banks of 16 single-precision, 4 byte float registers--gets filled from the source buffer and written to the destination buffer, for a copy size of 128 bytes (4 cache lines) per iteration.

2) I swap to double-precision load/store mode with the FSCHG instruction, so that I can load and store 8-bytes into 2 registers at a time, for the same cycle cost as a regular 4-byte load/store.

3) I align the stack up to 8-bytes upon entry, so that I can push/pop the values of the FP regs that I'm clobbering, 8-bytes at a time as well. This is faster than GCC knows how to manage the stack.

4) I devised a complex prefetching scheme for streaming the contents of the source buffer into the 4 cache lines that get read each iteration of the main loop before they're actually accessed, which is WAY harder than it sounds for the SH4...

Any two overlapping prefetches is a stall, a write (even if it's a cache hit), during a prefetch is a stall, a cache miss during a prefetch is a double-fisted stall, and you need about 11 cycles for a prefetch to complete... so basically, looking at the SH4 the wrong way while it's issuing a prefetch will result in a full CPU pipeline stall, negating all gainz!

The main problem is that there are not enough cycles of non-stalling work to simply prefetch the cache line right before the cache line we're issuing load instructions on... so I've had to devise a scheme where prefetching happens TWO cache-lines ahead, so that they have plenty of time to complete before they are actually used within the pipeline.

5) The destination buffer, despite being a write-only buffer, will also result in a big-ass pipeline stall if it's not resident within the cache... meaning everything will stall while the PREVIOUS VALUE we're about to overwrite gets loaded... which is something we ain't got time for...

So I am manually "preallocating" the destination cache lines, one cache line ahead, just before I do a write to them, so that they are already resident within the cache, and there will be no stall, by the time they are written to.

6) I'm carefully pairing instructions based on their "group" types, as compatible instructions which are using different areas of the chip are able to leverage the superscalar nature of the pipeline and be dual-issued, so that they execute in parallel.

If you look at the group starting on line 75, you'll notice that I'm strategically interleaving integer ALU work while I'm pushing pairs of FP registers onto the stack with the FPU, as they execute in parallel.

7) I'm aligning the code for the hot 128-byte copy loop body to a 32-byte boundary within the .text segment, which is the size of an instruction cache line, so that it fits into as few as possible, reducing the number of pipeline stalls on icache fetches, while the icache warms up, during the first iteration.
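A portable C approximation of tricks 1, 2, and 4 from the list above is sketched below: each iteration stages 128 bytes as sixteen 8-byte values, and a read prefetch hint is issued two 32-byte lines ahead of the line being loaded. Everything here is an assumption made for illustration. The names are hypothetical, `__builtin_prefetch` is the GCC/Clang builtin (a no-op fallback is used elsewhere), and the destination preallocation, stack handling, instruction pairing, and code alignment from the list have no portable C equivalent and are left out. It is not expected to match the hand-written assembly in speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES  32u   /* cache-line size cited in the post */
#define BLOCK_BYTES 128u  /* bytes copied per loop iteration */
#define WORDS       (BLOCK_BYTES / 8u)

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

/* Hypothetical name; n must be a multiple of 128 bytes. */
void copy128_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n % BLOCK_BYTES == 0);

    for (size_t off = 0; off < n; off += BLOCK_BYTES) {
        uint64_t regs[WORDS];   /* stands in for the 32 FP registers */

        /* Load phase: 16 eight-byte loads, four per cache line. */
        for (size_t i = 0; i < WORDS; ++i) {
            if (i % 4 == 0) {
                /* At the start of each line, hint the line two ahead,
                 * but never form an address past the source buffer. */
                size_t ahead = off + i * 8 + 2 * LINE_BYTES;
                if (ahead < n)
                    PREFETCH_READ(s + ahead);
            }
            memcpy(&regs[i], s + off + i * 8, 8);
        }

        /* Store phase: write the staged 128 bytes to the destination. */
        for (size_t i = 0; i < WORDS; ++i)
            memcpy(d + off + i * 8, &regs[i], 8);
    }
}
```

The fixed-size inner loops are left for the compiler to unroll; the version described in the post is unrolled by hand.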

So after ALL OF THAT BS COMBINES, you can see the results of one of my performance benchmarks, which is copying a 12KB buffer, whose source and destination addresses straddle cache lines and are unaligned...

The results are quite drastic. When the instruction cache and data cache are hot, I achieve a 1.9605x performance speedup over our builtin memcpy()! When they're both cold, I achieve a whopping speedup of 3.5408x, due to the hell which I went through to manually manage the cache!

For the second run, I pitted STMicroelectronics' "fast_memcpy()" against Newlib's, which resulted in only a speedup of 1.8867x and 1.8913x for the hot and cold cache scenarios, respectively... meaning I BEAT STMicro!!!! HEEEEEEEEELL YEAH, BABY!!!
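To make the comparison explicit: both sets of figures are speedups over the same Newlib baseline, so dividing one by the other gives a rough estimate of the new routine's speed relative to STMicro's. The sketch below does that arithmetic with the numbers quoted above. It assumes both runs used the same buffer and conditions, and the function names are hypothetical. The result works out to roughly 1.04x with hot caches and 1.87x with cold caches.

```c
#include <assert.h>
#include <stdio.h>

/* Speedups over Newlib's memcpy() as quoted in the post. */
static const double SHZ_HOT = 1.9605, SHZ_COLD = 3.5408;
static const double STM_HOT = 1.8867, STM_COLD = 1.8913;

/* If A is a-times faster than a baseline and B is b-times faster than the
 * same baseline, then A is (a / b)-times faster than B. */
double relative_speedup(double a_over_base, double b_over_base)
{
    assert(b_over_base > 0.0);
    return a_over_base / b_over_base;
}

/* Prints the derived ratios for the hot-cache and cold-cache runs. */
void print_relative_speedups(void)
{
    printf("hot caches:  %.3fx\n", relative_speedup(SHZ_HOT, STM_HOT));
    printf("cold caches: %.3fx\n", relative_speedup(SHZ_COLD, STM_COLD));
}
```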

Here's the source-code for the full shz_memcpy128() implementation, which you can check out, if you're feeling brave of heart: https://t.co/01gZZVlfb6

SH4ZAM already ships with the KallistiOS SDK for Sega Dreamcast as a first-party, built-in library within kos-ports... so go pull down the latest commit and git in on deez gainz!! 💪

15

209

29

7K

Tim Lawrence @TimNetwork

6 days ago

@thatmechaguy @revenant_MMXX https://t.co/FAk0BrgCzp

SSNTails @SSNTails

6 days ago

Just because a machine is old, doesn't mean the emulation has been perfected. Sometimes testing on real hardware is the only way. Look at how that framerate tanks in the first second or two! It's amazing how far we've come.

3

323

30

93

12K

0

27

Tim Lawrence @TimNetwork

6 days ago

@AlbionHero @revenant_MMXX https://t.co/FAk0BrgCzp

0

11

Tim Lawrence @TimNetwork

6 days ago

@chazzerrobo https://t.co/FAk0BrgCzp

0

10

TimNetwork retweeted

SSNTails @SSNTails

6 days ago

Just because a machine is old, doesn't mean the emulation has been perfected. Sometimes testing on real hardware is the only way. Look at how that framerate tanks in the first second or two! It's amazing how far we've come.

3

323

30

93

12K

TimNetwork retweeted

Saxman @SaxmanSonic

6 days ago

This was the first attempt at "copper" skies, pre-0.1. Most emulators didn't even support them, and ones that did had different results. When tested on hardware, it just didn't work right. A few weeks after 0.1 released, Chilly Willy fixed the bug in his interrupt handler.

2

103

13

22

18K

TimNetwork retweeted

倉津ゆえ

@YueKuratsu

8 days ago

A BASIC interpreter was once planned for release on the SEGA GENESIS. While a BASIC interpreter running on a modern OS like the Amiga would be nice, there isn't a single MC68000 computer in the world that allows you to insert a ROM cartridge and start programming immediately. Therefore, I still hope that SEGA will release a BASIC interpreter for the GENESIS!

15

235

56

22

10K

Tim Lawrence @TimNetwork

7 days ago

@yoshinokentarou yes, I have understood that Hudson always intended to have a 3D accelerator for the PC-FX, but it wasn't possible at release. They had shown simple 3D polygon demos before the PC-FX was released. Would love to see new software made for it.

0

1

0

163
