Underfox

@Underfox3

Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan.

Joined December 2017

129 Following

9.6K Followers

17.7K Posts

Pinned Tweet

Underfox @Underfox3

almost 5 years ago

Researchers have developed a new simulator that predicts the throughput of basic blocks on all Intel Core μarchs released in the last decade, and it proves more accurate than state-of-the-art tools by more than an order of magnitude. https://t.co/83UDBQSchX

5

687

183

220

0

Underfox @Underfox3

about 2 hours ago

These findings represent a massive breakthrough in width-scaling rules using 2D nanoribbon transistors with enhanced performance at narrower channel widths, which is promising for the ultimate scaling of transistors.

0

3

1

0

200

Underfox @Underfox3

about 2 hours ago

In this paper, researchers have demonstrated atomically thin monolayer and bilayer molybdenum disulfide nanoribbon transistors that break the width-scaling wall down to 15 nm. https://t.co/l0pKVk1LpD

1

4

1

4

302

Underfox @Underfox3

about 2 hours ago

The ultra-narrow nanoribbon transistors maintain the highest on/off ratios reported so far (10^6) for similar device dimensions, with improved mobility and threshold-voltage stability, indicating reduced edge scattering and depletion, along with stronger electrostatic control.

1

3

1

0

214

Underfox @Underfox3

about 7 hours ago

The results show that CRAM-ER presents near-lossless accuracy with 10× better energy efficiency and a 2× improvement in EDP over the A100 GPU. Furthermore, CRAM-ER achieves up to 70× higher energy efficiency than CPUs and GPUs while reaching near-HBM2 throughput.

0

2

0

1

300

Underfox @Underfox3

about 7 hours ago

In this paper, researchers proposed an error-resilient CRAM architecture for scalable in-memory matrix-vector multiplications, mitigating the impact of device-level errors and demonstrating high area and energy efficiency. https://t.co/WKAgJYR5vw

1

17

4

10

738

Underfox @Underfox3

about 7 hours ago

The proposed architecture enables parallel in-situ multiplications and error-resilient additions. Partitioning MACs between CMOS and MRAM at the bit level provides an optimal trade-off between area overhead and processing efficiency.

1

2

0

1

333

Underfox3 retweeted

Underfox @Underfox3

1 day ago

In this paper, researchers have demonstrated that an off-the-shelf open-weight LLM, quantized to fit on a single GPU, suffices to drive a worm AI agent that gains privileged access to machines and replicates itself. #DeepLearning https://t.co/1fQ7AMvbwY

1

23

5

16

2K

Underfox3 retweeted

Underfox @Underfox3

1 day ago

Researchers have proposed a bounded GPU-aware wait-free queue with explicit theorem-grounded progress guarantees, and a bounded GPU lock-free queue design that uses wave-batched fast paths to maximize throughput. https://t.co/VzCi2eUvC1

1

40

10

34

2K

Underfox @Underfox3

1 day ago

This paper presents a review of recent progress in quantum-hardware-based simulations of condensed matter, primarily emphasizing gate-based digital quantum computer simulation, with analog experiments discussed as complementary benchmarks. https://t.co/KfcEHc8UVY

0

17

4

11

1K

Underfox @Underfox3

1 day ago

This unified formulation is enabled by modality-specific encoders, structured token arrangements, and a Mixture-of-Transformers backbone that couples autoregressive reasoning with diffusion-based generation.

0

2

1

1

334

Underfox @Underfox3

1 day ago

Finally, Nvidia have introduced Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. https://t.co/AhC5lE3Xkx

1

10

4

5

857

Underfox @Underfox3

1 day ago

Moreover, because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant. These results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical.

0

1

1

1

718

Underfox @Underfox3

1 day ago

In this paper, researchers have demonstrated that an off-the-shelf open-weight LLM, quantized to fit on a single GPU, suffices to drive a worm AI agent that gains privileged access to machines and replicates itself. #DeepLearning https://t.co/1fQ7AMvbwY

1

23

5

16

2K

Underfox @Underfox3

1 day ago

Since the worm is powered by stolen compute, the attacker’s marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders.

1

1

1

0

766

Underfox @Underfox3

1 day ago

The results show that HiGS renders up to ∼15.8× faster than the original 3DGS and outperforms every other rasterizer evaluated, while preserving exact front-to-back alpha compositing.
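
For reference, front-to-back alpha compositing accumulates each sample's colour weighted by its own opacity and by the transmittance left over from nearer samples. The Python sketch below shows only that general formula. The function name, the `(color, alpha)` sample layout, and the optional `min_transmittance` cutoff are illustrative assumptions, not code from HiGS or 3DGS.

```python
def composite_front_to_back(samples, min_transmittance=0.0):
    """Blend (color, alpha) samples that are already sorted nearest-first.

    color is an (r, g, b) tuple; alpha is an opacity in [0, 1].
    Returns the blended colour and the remaining transmittance.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for color, alpha in samples:
        weight = transmittance * alpha  # contribution of this sample
        for i in range(3):
            out[i] += weight * color[i]
        transmittance *= 1.0 - alpha
        # Hypothetical cutoff: stop once nothing behind can contribute.
        if transmittance <= min_transmittance:
            break
    return tuple(out), transmittance


if __name__ == "__main__":
    red_then_green = [((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5)]
    print(composite_front_to_back(red_then_green))
```

With the default cutoff of 0.0 the loop stops early only behind a fully opaque sample, so the result matches the full sum. Raising the cutoff trades that exactness for less work.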

0

1

1

0

221

Underfox @Underfox3

1 day ago

In this paper, NVIDIA researchers have proposed Hierarchically Tiled Gaussian Splatting (HiGS), a 3D Gaussian Splatting rendering architecture in which spatial partitioning and rasterization operate at different granularities. https://t.co/nKtdNCLC9e

1

2

2

3

510

Underfox @Underfox3

1 day ago

This reshapes work decomposition from screen-area-proportional to density-proportional and eliminates the rasterizer tail effect inherent in single-tile-size pipelines.

1

3

1

0

316

Underfox @Underfox3

1 day ago

"G-LFQ achieved the highest peak throughput in several settings, while G-WFQ was the most robust across architectures and workload mixes, sustaining performance under contention and degrading more gracefully at high thread counts."

0

1

1

0

438

Underfox @Underfox3

1 day ago

Researchers have proposed a bounded GPU-aware wait-free queue with explicit theorem-grounded progress guarantees, and a bounded GPU lock-free queue design that uses wave-batched fast paths to maximize throughput. https://t.co/VzCi2eUvC1

1

40

10

34

2K

Underfox @Underfox3

1 day ago

Across fixed-duration microbenchmarks on MI210 and MI300A, the bounded ring designs delivered the strongest overall performance and efficiency.

1

1

1

1

568
