Repository:
https://t.co/6gO6cmJoZU
Feedback is always appreciated.
Next I'm planning to explore CUDA matrix multiplication, convolution kernels, and GEMM optimizations.
The biggest lesson wasn't CUDA syntax.
It was learning how memory access patterns, synchronization, and warp execution determine performance.
GPU programming is fundamentally about understanding the hardware.
Along the way I explored:
• Warp divergence
• Shared memory
• Bank conflicts
• Loop unrolling
• Warp shuffle
• Cooperative Groups
• Grid-stride loops
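A minimal sketch of how three of these pieces (grid-stride loop, warp shuffle, shared memory) can fit together in one sum kernel. This is not one of the repository's V1–V7 kernels; the names reduceSum, warpReduceSum and reduceOnDevice are made up for illustration, the launch configuration of 64 blocks x 256 threads is an arbitrary assumption, and CUDA error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Sum a value across the 32 lanes of a warp using register shuffles.
// After the loop, lane 0 holds the warp's total.
__inline__ __device__ int warpReduceSum(int val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

// Assumes blockDim.x is a multiple of 32 (at most 1024)
// and that *out was zeroed before launch.
__global__ void reduceSum(const int* in, int* out, size_t n) {
    __shared__ int warpSums[32];  // one slot per warp in the block

    // Grid-stride loop: each thread accumulates a private partial sum,
    // so any grid size can cover any input size.
    int sum = 0;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride)
        sum += in[i];

    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    // Stage 1: reduce within each warp, no shared memory needed.
    sum = warpReduceSum(sum);
    if (lane == 0) warpSums[warp] = sum;
    __syncthreads();

    // Stage 2: the first warp reduces the per-warp totals.
    if (warp == 0) {
        int numWarps = blockDim.x / warpSize;
        sum = (lane < numWarps) ? warpSums[lane] : 0;
        sum = warpReduceSum(sum);
        if (lane == 0) atomicAdd(out, sum);  // one atomic per block
    }
}

// Hypothetical host wrapper: copy in, launch, copy the result back.
int reduceOnDevice(const std::vector<int>& h) {
    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, h.size() * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, h.data(), h.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(int));

    reduceSum<<<64, 256>>>(dIn, dOut, h.size());

    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main() {
    std::vector<int> h(1 << 20, 1);  // 2^20 ones, so the sum should equal the length
    int got = reduceOnDevice(h);
    printf("sum = %d (expected %d)\n", got, (int)h.size());
    return got == (int)h.size() ? 0 : 1;
}
```

Integers are used so the result can be checked exactly; with floats the summation order changes the rounding, so a tolerance would be needed.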
This project became much more than "parallel reduction."
Instead of jumping to an optimized kernel, I implemented every major optimization step separately.
V1 → V7 shows exactly how each optimization improves the previous implementation.
The experiments were run on an RTX 3050 Laptop GPU (GA107).
Understanding the hardware is just as important as writing the kernel.
• 16 SMs
• 2048 CUDA cores
• Shared Memory
• 192 GB/s peak memory bandwidth
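To put the 192 GB/s figure in context: a sum reduction reads every input element once, so peak bandwidth sets a floor on kernel time. A rough worked example, assuming a hypothetical input of N = 2^24 32-bit floats (not a size taken from the experiments):

```latex
t_{\min} = \frac{N \cdot 4\ \text{B}}{B_{\text{peak}}}
         = \frac{2^{24} \cdot 4\ \text{B}}{192 \times 10^{9}\ \text{B/s}}
         \approx 0.35\ \text{ms},
\qquad
B_{\text{eff}} = \frac{N \cdot 4\ \text{B}}{t_{\text{measured}}}
```

Comparing B_eff for each kernel version against the 192 GB/s peak shows how close that version gets to the memory-bound limit.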
It took 7 CUDA kernel versions to understand why "adding numbers" is one of the most important optimization problems in GPU programming.
I implemented CUDA Parallel Reduction from scratch, progressing through 7 stages inspired by NVIDIA's optimization techniques.
🧵👇
@striver_79 Yeah, the other day I ordered food from @zomato and it was pretty bad. I raised the issue, but they have this weird AI that kept saying it can't refund, even though the food was genuinely undercooked and bad.
I've been working on neural collapse for a few months now.
Project, research publication and insights coming soon!
Stay connected and discuss freely in the comments.
What if industrial equipment could tell you it was about to fail?
I built AssetSense, an IoT predictive maintenance system using ESP32, MQTT & AWS to monitor vibration, temperature & current in real time.
My first AWS Builder article!
https://t.co/A7Yab7ILon
#AWS #IoT #DevOps
Built Synapse – Adaptive AI Tutor at Gen AI Hackathon 2026!
An AI-powered learning platform that identifies knowledge gaps, adapts explanations to learner proficiency, and visualizes complex concepts interactively.
🧵👇
#AI #GenAI #EdTech #buildinginpublic