@jerrinot RAM is abundant and attention is only computationally expensive at very long contexts.
The speedup is minor and considering the capability loss, it's not worth it IMHO (for small models).
Currently the KV cache is stored using float16, which is already tight, memory-wise.
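To put a rough number on "tight", here is a minimal sketch of the usual KV cache size estimate: two tensors (keys and values) per layer, each holding one head-dimension vector per KV head per token. The class and method names are hypothetical, and the model dimensions (32 layers, 8 KV heads, head size 128, 8192-token context) are an assumption matching the published Llama 3 8B configuration, not figures from this thread.

```java
// Hypothetical helper, for illustration only: estimates KV cache size.
final class KvCacheEstimate {

    // bytes = 2 (keys + values) * layers * kvHeads * headDim * contextLength * bytesPerElement
    static long kvCacheBytes(int layers, int kvHeads, int headDim,
                             int contextLength, int bytesPerElement) {
        return 2L * layers * kvHeads * headDim * contextLength * bytesPerElement;
    }

    public static void main(String[] args) {
        // Assumed dimensions (Llama 3 8B style): 32 layers, 8 KV heads, head size 128.
        long f16 = kvCacheBytes(32, 8, 128, 8192, 2); // float16: 2 bytes per element
        long i8 = kvCacheBytes(32, 8, 128, 8192, 1);  // 8-bit cache, ignoring scale/zero-point overhead
        System.out.printf("float16: %d MiB%n", f16 / (1024 * 1024)); // prints "float16: 1024 MiB"
        System.out.printf("8-bit:   %d MiB%n", i8 / (1024 * 1024));  // prints "8-bit:   512 MiB"
    }
}
```

Under these assumptions the float16 cache costs 128 KiB per token, so the footprint grows linearly with context length; halving the element size halves the cache but does nothing for the weights.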
@epragt Performance is very competitive for pure Java, also with GraalVM's Native Image.
The repository links to implementations of Llama 3 (original), Qwen 3.5, OpenAI's gpt-oss, and NVIDIA's Nemotron 3 models, all in pure Java.
@__tinygrad__ The operators serve hyper-specialized implementations of each model. How good is tinygrad at fusing high-level ops?
Even with some advanced compiler magic, the hand-tuned kernels with nit-picked fusions are hard to beat. It's a pristine model blueprint vs. a tuned Franken-model.
My @GraalVM Native Image deep dive recording is already up: https://t.co/DF5f9HjtVg 🐰🚀
It includes the very first public demo of project Crema, Open World for Native Image, at 2:19:54 😅
Thank you, @Devoxx!
All demos and notes are here: https://t.co/PdEESuT6PP
We just merged the current status of the upcoming JDWP support for @GraalVM Native Image! 🥳
This will soon provide developers with the same debugging experience they are used to in Java, but for native images! Stay tuned for more details.
https://t.co/UmNLnaLns9
https://t.co/ne9OtJjWfi
Graal compiler: +10% faster inference with the latest early access build.
New features: batched prompt processing & AVX512 support.
Modern @Java project: a Spring Boot wrapper for https://t.co/k6xDFNGg6W from @TheMukel supporting OpenAI Chat Completion REST requests 🔥 https://t.co/1cMic9PQpn #OpenAI #SpringBoot
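For readers unfamiliar with the request shape, here is a minimal sketch of an OpenAI-style Chat Completion request built with the JDK's own `java.net.http` client. The base URL, the `/v1/chat/completions` path on the wrapper, the model name, and the class name are all assumptions for illustration; the actual wrapper may expose a different host, port, or path. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch: builds (but does not send) an OpenAI-style chat completion request.
final class ChatCompletionRequestSketch {

    // Minimal JSON string escaping for backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static HttpRequest build(String baseUrl, String model, String userMessage) {
        // Body follows the OpenAI chat completions shape: a model name plus a list of role/content messages.
        String body = String.format(
                "{\"model\": \"%s\", \"messages\": [{\"role\": \"user\", \"content\": \"%s\"}]}",
                escape(model), escape(userMessage));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions")) // assumed path
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local endpoint and model name; adjust to the actual deployment.
        HttpRequest request = build("http://localhost:8080", "llama3", "Why is the sky blue?");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` once a server is listening at the assumed address.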
Earlier today I was asked whether Java AI integration had improved yet, or whether we'd still need to rely on Python or C bindings.
Was happy to share https://t.co/RuRcoxbHRm by @TheMukel from the GraalVM team, running natively in Java without any dependencies and with superior performance!
@tjake For me your #Devoxx talk about https://t.co/fWmo5GIyBh and the one from @TheMukel about https://t.co/LXtLab5BAp were the most relevant talks. Thanks for all the background information - I have learned a lot!
@christzolov @vitalethomas @alina_yurenko I have a working prototype with function calling via LangChain4j.
Vision is just a matter of implementing an additional component, the rest of the inference remains the same.
I'll do my best to implement the missing encoder for vision soon-ish, starting with Llama, then Qwen.