You can't engineer luck.
Cleanest phrasing of P vs NP I've heard.
NP is the magical computer that always tells you which path to take. P is what current silicon can do. Tetris is NP-complete. Chess (generalized to an n×n board) is EXP-complete.
MIT 6.006 Introduction to Algorithms, Fall 2011.
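A minimal sketch of the "luck" idea, using subset sum as a stand-in problem (my choice for illustration, not an example taken from the lecture). If a lucky guess hands you the answer, checking it is fast; without the guess, the obvious fallback is to try every subset. The function names are hypothetical.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Fast check of a proposed answer: distinct, valid indices that sum to target."""
    indices_ok = len(set(certificate)) == len(certificate) and all(
        0 <= i < len(numbers) for i in certificate
    )
    return indices_ok and sum(numbers[i] for i in certificate) == target


def brute_force(numbers, target):
    """No luck available: enumerate subsets, up to 2**n of them in the worst case."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return list(indices)
    return None


numbers = [3, 34, 4, 12, 5, 2]
guess = brute_force(numbers, 9)      # slow part: search
print(guess, verify(numbers, 9, guess))  # fast part: check
```

The check runs in time linear in the size of the guess, while the search loop grows exponentially with the length of the list.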
Someone just open-sourced a desktop app that generates 3D models from images and runs 100% locally.
It's called Modly. Drop in an image, it generates a fully textured 3D mesh on your own GPU. No cloud.
→ Photo to 3D mesh in seconds
→ Runs 100% locally on your GPU
→ Pluggable AI model extensions
→ Windows + Linux
#GameMaker #GameDev I'm a sick individual. Fully raytraced (no acceleration structures) Holographic Radiance Cascades in a 50 LOC frag shader.
Also optimized the hell out of it: 512x512 and 1024x1024 renders dropped from 16ms/120ms -> 5ms/28ms.
Seeing all the insanely consistent assets people are dropping with GPT Image 2.0,
I built Presets so you can easily copy the prompt masters
You (or someone better) nail the perfect prompt + reference images once
Then anyone can load that exact setup with one click
Out now in SpriteCook (for free):
https://t.co/uXiIpSiJws
Do you have intuition on fp precision?
float16 - tennis court at 1 inch
float32 - small city at 1 mm
float64 - continent at 1 nm
float128 - supercluster at 1 nm
float256 - observable universe at Planck length
Now you do!
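A rough way to reproduce these pairings, assuming the names refer to the IEEE 754 binary16 through binary256 formats: a format with p significand bits can tell apart about 2**p evenly spaced steps, so the usable extent at a given resolution is roughly resolution * 2**p. This is an order-of-magnitude sketch (the actual spacing near that extent is within about a factor of two of the resolution), and the helper name is hypothetical.

```python
import math

# Significand precision in bits, counting the implicit leading bit.
# Assumption: the list above means IEEE 754 binary16 ... binary256.
PRECISION_BITS = {
    "float16": 11,
    "float32": 24,
    "float64": 53,
    "float128": 113,
    "float256": 237,
}

INCH = 0.0254               # metres
MILLIMETRE = 1e-3           # metres
NANOMETRE = 1e-9            # metres
PLANCK_LENGTH = 1.616e-35   # metres, approximate


def extent_at_resolution(fmt, resolution_m):
    """Rough extent (metres) coverable at the given step size: resolution * 2**p."""
    return resolution_m * 2.0 ** PRECISION_BITS[fmt]


for fmt, resolution in [
    ("float16", INCH),
    ("float32", MILLIMETRE),
    ("float64", NANOMETRE),
    ("float128", NANOMETRE),
    ("float256", PLANCK_LENGTH),
]:
    print(f"{fmt}: about {extent_at_resolution(fmt, resolution):.3g} m")

# Sanity check with a real float64 (Python 3.9+): spacing between neighbours near 9,000 km.
print(math.ulp(9.0e6))
```

For scale, float16 at 1 inch works out to about 52 m, float32 at 1 mm to about 16.8 km, and float64 at 1 nm to about 9,000 km.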
I ranted a bit about 90s rendering tricks and techniques. Looking at Tomb Raider (III). Hat tip to the amazing developers behind this. Hope you all enjoy. :-) https://t.co/grsXXZALdq
Finally! All of that just to show a dynamic illustration in the app!
A bunch of algorithms were implemented for this dynamic illustration.
WIP version available: https://t.co/4ZMIat8pqU (no LaTeX yet!)
Depends on ImPlatform for custom shaders https://t.co/mKI1Rn2HUF
𝐕𝐢𝐬𝐮𝐚𝐥 𝐛𝐥𝐨𝐠 on Vision Transformers is live.
https://t.co/N09njkKTXW
Learn how ViT works from the ground up, and fine-tune one on a real classification dataset.
CNNs process images through small sliding filters. Each filter only sees a tiny local region, and the model has to stack many layers before distant parts of an image can even talk to each other.
Vision Transformers threw that whole approach out.
ViT chops an image into patches, treats each patch like a token, and runs self-attention across the full sequence.
Every patch can attend to every other patch from the very first layer. No stacking required.
That global view from layer one, paired with large-scale pre-training, is what let ViT match or surpass CNNs on large-scale benchmarks.
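The patch step described above can be sketched in a few lines. This is a dependency-free illustration on a single-channel image stored as nested lists; a real ViT works on 3-channel tensors and follows the flattening with a learned linear projection, which is omitted here. The `patchify` name is hypothetical.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tokens.

    Patches are emitted left to right, top to bottom; each token lists its
    pixels in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ])
    return tokens


# A 4x4 toy image with pixel values 0..15, cut into 2x2 patches.
toy = [[row * 4 + col for col in range(4)] for row in range(4)]
print(patchify(toy, 2))

# A 224x224 image with 16x16 patches yields (224 // 16) ** 2 tokens.
blank = [[0] * 224 for _ in range(224)]
print(len(patchify(blank, 16)))
```

The resulting list is the "sentence" that self-attention runs over, one token per patch.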
𝐖𝐡𝐚𝐭 𝐭𝐡𝐞 𝐛𝐥𝐨𝐠 𝐜𝐨𝐯𝐞𝐫𝐬:
- Introduction to Vision Transformers and comparison with CNNs
- Adapting transformers to images: patch embeddings and flattening
- Positional encodings in Vision Transformers
- Encoder-only structure for classification
- Benefits and drawbacks of ViT
- Real-world applications of Vision Transformers
- Hands-on: fine-tuning ViT for image classification
The images below show:
Self-attention connects every patch to every other patch at once. Convolution only sees a small local window. That's why ViT captures things CNNs miss, like the optical illusion painting where distant patches form a hidden face.
The architecture is simple. Split image into patches, flatten them into embeddings (like words in a sentence), run them through a Transformer encoder, and the class token collects info from all patches for the final prediction. Patch in, class out.
Inside attention: each patch (query) compares itself to all other patches (keys), softmax gives attention weights, and the weighted sum of values produces a new representation aware of the full image. The blog also visualizes what the CLS token actually attends to through attention heatmaps.
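The query/key/value steps above can be written out as single-head scaled dot-product attention. This is a plain-Python sketch for readability, not the blog's code: the queries, keys, and values are passed in directly, whereas a real ViT derives them from the patch embeddings with learned projections and runs several heads in parallel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Compare this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(x, 3) for x in row])
```

Each row of `weights` sums to 1, and the row belonging to the class token is the kind of data an attention heatmap is drawn from.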
The second half of the blog is hands-on code. I fine-tuned ViT-Base from Google (86M params) on the Oxford-IIIT Pet dataset: 37 breeds, ~7,400 images.
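As a back-of-the-envelope check on the 86M figure, the parameters can be tallied by hand. The sketch below assumes the standard ViT-B/16 layout at 224x224 input (hidden size 768, 12 layers, MLP size 3072); the post does not state the patch size or resolution, so those are assumptions, and a specific library checkpoint may differ slightly. The function name is hypothetical.

```python
def vit_param_count(image=224, patch=16, channels=3, dim=768,
                    depth=12, mlp_dim=3072, classes=37):
    """Count weights and biases for an assumed standard ViT encoder plus classifier head."""
    tokens = (image // patch) ** 2 + 1                    # patches + class token
    patch_embed = patch * patch * channels * dim + dim    # linear projection of each patch
    cls_and_pos = dim + tokens * dim                      # class token + position embeddings
    attn = 4 * (dim * dim + dim)                          # query, key, value, output projections
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    norms = 2 * (2 * dim)                                 # two LayerNorms per block
    block = attn + mlp + norms
    final_norm = 2 * dim
    head = dim * classes + classes                        # classifier on the class token
    return patch_embed + cls_and_pos + depth * block + final_norm + head


print(f"37-class head:   {vit_param_count(classes=37):,}")
print(f"1000-class head: {vit_param_count(classes=1000):,}")
```

Under these assumptions almost all of the parameters sit in the 12 encoder blocks, so swapping the classifier head for 37 pet breeds barely changes the total.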
𝐁𝐥𝐨𝐠 𝐋𝐢𝐧𝐤
https://t.co/N09njkKTXW
𝐒𝐨𝐦𝐞 𝐑𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬
Dr @sreedathpanat Videos on ViT
ViT paper dissection
https://t.co/sg3JRvcgNG
Build ViT from Scratch
https://t.co/cnHzEeefDA
Original Paper
https://t.co/QiwrlDRQOc
Next up: demystifying Low-Rank Adaptation (LoRA) in PEFT!
Follow along with me, @Mayank_022, for more deep learning insights, cool fine-tuning projects, and updates on upcoming blog posts.
Implemented the Slug Algorithm in WebGPU.
It uses a pure TypeScript text shaping pipeline to convert Inter.ttf into Bézier curves, with no HarfBuzz or WASM at all.
Tysm for dedicating the patent to the public domain.
Repo below
This is amazing news. Slug has become the gold standard in this domain for good reason. Huge props to Eric for making this decision; it will benefit us all immensely in the long run.