Just fine-tuned an SDXL model on a blocky, colorful oil painting style known as 'alla prima'.
Used 50 carefully selected images.
The results? So beautiful, it's messing with my brain. 🎨
Custom Instructions with GPT-4 unlocks some insane new capabilities.
If you use ChatGPT, don't ignore this.
Here are the 7 most insane example use cases I've seen so far (a thread):
Diffusion Models - The Magic Behind Stable Diffusion and MidJourney
Watching an AI model generating a photorealistic image from a text prompt feels like magic. Image generation models like Stable Diffusion and MidJourney have made us all prompt artists.
Here is how the magic happens. Stable Diffusion is a latent diffusion model. Deep learning-based diffusion models use neural networks to iteratively add and then remove noise from data, such as images, to generate new samples.
Essentially you start with a clean, target image and add noise in a series of steps - you go from a complex high-dimensional space of an image vector to a "noisy" space. You then "reconstruct" the image using a neural network trained to go from the noisy vector back to the image. As the model removes noise iteratively, it learns the essential features of the image and learns how to "reconstruct" the image and this is where the generative aspect comes into play.
The process of adding and removing noise allows for exploration of the data space and the noise acts like a form or regularization, preventing the model from overfitting and generalizing. Reconstructing or reversing the noise process helps the model generate similar data.
Once trained, you can start with random noise and end up with a generated image that has the characteristics of the training set.
Stabe Diffusion is a latent diffusion model. It largely operates in a low-dimension space by compressing the image into a much smaller lower-dimensional representation called the latent space. This makes for great efficiency and speed as you are dealing with a simplified version of the data
You start with a text prompt that is converted into a high-dimensional encoding using a Transformer-based encoder. This encoder maps a sequence of input tokens to a sequence of latent text embeddings, which are then used to condition the latent space for image generation.
These embeddings are then used to condition the latent space of the diffusion model in a process called latent space conditioning. Essentially the text embedding is used to modify or "condition" the initial random noise in the latent space and guides the diffusion model on what features to focus on and what the final image should be like. The diffusion process then iteratively refines this conditioned latent space, gradually transforming the random noise into a coherent image.
This allows for customization based on the text prompt and efficiency as we are operating in a low-dimensional space
This is of course an oversimplification of the process and there is a lot more detail to this process but hopefully should give a high-level understanding of the process by which text prompts get converted into beautiful images!
This 🧵 is about my analysis framework of DEXes: why I think @CurveFinance prevails over @Uniswap, and why Uni v3 is a wrong move!
In short, 2 reasons: (i) pricing power & (ii) profitability
Why DeGods is failing - they’re broke
- lost 90% of DeGods mint funds by keeping all funds in SOL
- collected y00ts mint funds via DUST (down 90%)
- turned off royalties
- premature / out of left field announcements
short thread 🧵
I just interviewed Jordi (@gametheorizing), safe to say my mind was blown.
Over an alpha-packed 101 minutes, we discussed:
• When #Bitcoin is likely to bottom
• Jordi's top plays for next cycle
• BTC vs ETH
and MUCH more.
🧵: Here are the 10 most important takeaways. 👇
ChatGPT has crossed 1M+ users in just 5 days.
To compare, it took Netflix 41 months, FB - 10 months, and Instagram - 2.5 months.
But many haven’t yet realized its full potential.
Here are the 10 mindblowing things you can do using it right now: