@osanseviero Can anyone at Google take a look at llama.cpp's audio input implementation for gemma-4-12b? It doesn't seem to work: https://t.co/g2zjXJAuFw
I moved from ChatGPT Pro to ChatGPT Plus, but my Pro-plan usage is being counted against my Codex usage limit, so I only get 3 weeks of Codex usage this month instead of 4. That does not seem fair, and it is not disclosed anywhere. Why is OpenAI Support unwilling to help? @OpenAI @OpenAIDevs @sama
@wbjang11 That’s really cool! Are the code/weights going to be available under a commercial-friendly license? I’ve wanted a model like this for a long time!
@Alibaba_Qwen It makes us nervous that this is a “Plus” model, i.e. one of your proprietary models, and that you just released a proprietary Omni model too. Is Alibaba Qwen still releasing open models?
@skalskip92 LightOnOCR-2 is really good, and FireRed OCR has also impressed me. GLM-OCR is good, but it's not even the one I'm most likely to reach for. That said, directing it to extract specific information is an unconventional use case for an OCR model, and it is interesting to see that it works.
@skalskip92 Note that Qwen3.5 seems overpriced because it is so new. For comparison, look at even bigger models that are more expensive to serve, like DeepSeek-V3.2, which is $0.25 input and $0.40 output per million tokens on OpenRouter.
@ArtificialAnlys What about Soniox? Also, Parakeet TDT V2 is supposed to be better than V3 at English transcription. And let’s not forget the crowd pleasers: if you want to get attention, add Apple’s and Google’s default keyboard dictation models to the benchmark… they’re amusingly terrible!
@ArtificialAnlys @bfl_ml That is not true: “All four variants are released under the Apache 2.0 license, enabling unrestricted commercial use.”
Only the 4B models are under Apache 2.0. The 9B models are under a non-commercial license.
@agammessi10 @skalskip92 Also consider that a human is supervising. The automated annotations might be wrong or might not fit perfectly, but a model can often do 80% of the work in one click. The human then cleans up the data, and the model learns from those corrections and does better next time.
@ClassicMain @OfficialLoganK For input tokens, Gemini 1.5 Flash is $0.075/Mtok for prompts <= 128k tokens and $0.15/Mtok for prompts > 128k tokens. Gemini 2.0 Flash-Lite is *always* $0.075/Mtok, so it is half the price of Gemini 1.5 Flash once you go above 128k tokens. Unless I’m very bad at reading things...?
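The tiered comparison in the tweet above can be checked with a quick sketch. It hard-codes the input-token prices quoted in that tweet (not fetched from Google, and possibly out of date), uses hypothetical model labels as plain strings, and assumes that "128k" means 128,000 tokens and that the higher 1.5 Flash rate applies to the whole prompt once it crosses the threshold.

```python
# Hypothetical helper: prices are USD per million input tokens, copied from the
# tweet above; they are assumptions and may not match current Google pricing.
FLASH_15_SHORT_RATE = 0.075  # Gemini 1.5 Flash, prompts <= 128k tokens
FLASH_15_LONG_RATE = 0.15    # Gemini 1.5 Flash, prompts > 128k tokens
FLASH_LITE_20_RATE = 0.075   # Gemini 2.0 Flash-Lite, any prompt length

# Assumption: "128k" means 128,000 tokens.
LONG_PROMPT_THRESHOLD = 128_000


def input_cost_usd(prompt_tokens: int, model: str) -> float:
    """Return the input-token cost in USD for one prompt of the given length."""
    if model == "gemini-1.5-flash":
        # Assumption: once a prompt exceeds the threshold, every token in it
        # is billed at the long-prompt rate (not just the tokens past 128k).
        if prompt_tokens <= LONG_PROMPT_THRESHOLD:
            rate = FLASH_15_SHORT_RATE
        else:
            rate = FLASH_15_LONG_RATE
    elif model == "gemini-2.0-flash-lite":
        rate = FLASH_LITE_20_RATE
    else:
        raise ValueError(f"unknown model: {model}")
    return prompt_tokens / 1_000_000 * rate


if __name__ == "__main__":
    # Compare one prompt below the threshold and one above it.
    for tokens in (100_000, 200_000):
        flash = input_cost_usd(tokens, "gemini-1.5-flash")
        lite = input_cost_usd(tokens, "gemini-2.0-flash-lite")
        print(f"{tokens:>7} tokens: 1.5 Flash ${flash:.4f}, 2.0 Flash-Lite ${lite:.4f}")
```

Under these assumptions, a 100k-token prompt costs the same on both models, while a 200k-token prompt costs half as much on 2.0 Flash-Lite.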
@skalskip92 @onuralpszr Why do these object detectors all stop at such small parameter counts? The mAP values seem so low…? Surely there are use cases for a bigger, better object detector, even one that can’t run in real time or needs an H200 to do so.
@OfficialLoganK How is a preview model different from an experimental one? And why isn’t the Flash-Lite “preview” model listed under the Preview section of AI Studio? It seems confusing! Still, I am excited that 2.0 Flash is finally GA!