@enjalot @RSevastjanova Okay, by chance I found that SDv2.1 uses the penultimate (second-to-last) layer. Using that layer, I get the correct embeddings.
Thanks a lot for your hints! Without experimenting with the FrozenOpenCLIPEmbedder, I would not have noticed this.
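
For anyone wondering what "penultimate layer" means in practice, here is a minimal toy sketch in plain Python. It is not the actual FrozenOpenCLIPEmbedder code: the names `run_text_tower`, `skip_last`, `blocks`, and `final_norm` are hypothetical, and the assumption that the final normalization is still applied after stopping early is mine, so please check it against the real implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_text_tower(
    x: Vector,
    blocks: Sequence[Block],
    final_norm: Block,
    skip_last: int = 0,
) -> Vector:
    """Run a stack of blocks, optionally stopping before the last ones.

    skip_last=0 returns the output of the last block,
    skip_last=1 returns the output of the penultimate block.
    Assumption: the final normalization is applied in both cases.
    """
    if not 0 <= skip_last < len(blocks):
        raise ValueError("skip_last must be in [0, len(blocks) - 1]")
    for block in blocks[: len(blocks) - skip_last]:
        x = block(x)
    return final_norm(x)


# Toy stand-ins for transformer blocks: block k adds a constant to each entry,
# which makes it easy to see which blocks actually ran.
blocks = [lambda v, k=k: [e + k for e in v] for k in (1, 10, 100)]
identity_norm = lambda v: list(v)  # placeholder for a real layer norm

last = run_text_tower([0.0], blocks, identity_norm, skip_last=0)
penultimate = run_text_tower([0.0], blocks, identity_norm, skip_last=1)

print(last)         # all three blocks ran
print(penultimate)  # the last block was skipped
```

The two outputs differ, which is why comparing SDv2.1 embeddings against the last-layer output of the original CLIP text encoder would show a mismatch even if the weights were identical.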
@enjalot @RSevastjanova I feel like I'm missing something here...
Did you compare the embeddings between CLIP-ViT-H-14-laion2B-s32B-b79K and the SDv2.1 text encoder?
@StabilityAI, thanks for your awesome work on Stable Diffusion v2.1!
Could you kindly let us know which CLIP model the text encoder was taken from? Are the weights for the projection layer and the vision model available for download anywhere? Thanks!
#StableDiffusion #OpenCLIP