I can't put into words how happy I am to join the amazing group of @Java_Champions!
Thank you to the global @java community for being so good to me since day one.
Thank you also to my friends at @soujava: it wouldn't have been possible without you!
#java
New course on serving LLMs efficiently: how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with @RedHat and taught by @cedricclyburn.
Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just to load the weights. On top of that, every active request needs its own chunk of GPU memory, the KV cache, to store the token context it has built up so far. In this course, you'll learn to reduce a model's memory footprint with quantization and serve it using vLLM, which handles many concurrent requests efficiently through smart memory management.
Skills you'll gain:
- Quantize a model and measure the accuracy tradeoff
- Serve a model with vLLM and watch it handle concurrent requests efficiently
- Benchmark your deployment and make informed tradeoffs between speed, cost, and accuracy
Join and learn to serve LLMs efficiently:
https://t.co/x04xMbFlkO
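A quick way to sanity-check the ~140 GB figure in the course post above: it falls out of simple arithmetic if the weights are stored at 16 bits (2 bytes) per parameter. That precision is an assumption here, since the post does not state it. The Java sketch below uses made-up names (WeightMemoryEstimate, weightGigabytes), counts 1 GB as 10^9 bytes, and covers weights only, not the KV cache or runtime overhead.

```java
import java.util.Locale;

// Back-of-the-envelope estimate of the memory needed just to hold model weights.
// Assumptions: 1 GB = 10^9 bytes; KV cache and runtime overhead are excluded.
final class WeightMemoryEstimate {

    /** Returns the weight footprint in gigabytes for a given parameter count and precision. */
    static double weightGigabytes(long parameters, int bitsPerWeight) {
        double bytes = parameters * (bitsPerWeight / 8.0);
        return bytes / 1e9;
    }

    public static void main(String[] args) {
        long params = 70_000_000_000L; // 70B parameters, as in the post
        // 16-bit is the assumed baseline; 8-bit and 4-bit stand in for quantized variants.
        for (int bits : new int[] {16, 8, 4}) {
            System.out.printf(Locale.ROOT, "%2d-bit weights: %.0f GB%n",
                    bits, weightGigabytes(params, bits));
        }
    }
}
```

Under these assumptions a 70B model needs 140 GB at 16 bits, 70 GB at 8 bits and 35 GB at 4 bits, which is why quantization is the first lever the course reaches for.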
Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
So the real call for a Java team: you get Docling-grade parsing without rewriting your RAG pipeline in Python. Orchestration, observability and security stay in your JVM stack. The cost you take on is running docling-serve. Weigh it against the parse fidelity.
5/5
@langchain4j 1.15.0 shipped an integration with #Docling, @IBM Research's document parser. If you build #RAG in @Java, it lands on the messiest stage of the pipeline: getting clean text and tables out of real PDFs and DOCX.
1/5
What you run underneath: Docling itself runs as a docling-serve instance. The LangChain4j integration is a Java REST client for it, built on the official Docling Java library. You operate a separate service and point the parser at its endpoint.
4/5
Workflows are the biggest upgrade to Claude Code’s capabilities since skills and subagents.
I dove deep into it with @sidbid to figure out best practices, examples and more. I’m particularly excited about the non-technical tasks it enables for Claude Code.
A change to the default object layout can move your cost per pod more than another framework rewrite will. It's unglamorous, it ships inside the platform you already run, and RDP1 for JDK 27 begins June 4.
6/6
JDK 27 forks from the main line on June 4. Two JVM defaults flip in this release, and both change what a Java service costs to run in a container. Neither asks you to touch a line of code.
1/6
The catch with any default is that it moves your baseline quietly. Heap sizing, GC behavior, and footprint numbers all shift under you. Re-measure on your real workload before you trust the new defaults in production.
5/6
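One way to act on the re-measure advice is to record what the JVM actually resolved to before and after the upgrade, then diff the two. The sketch below is an illustration, not a benchmark: it uses only the standard java.lang.management API, and the class name JvmBaseline and the output keys are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Captures the effective heap and GC setup of the running JVM as plain text lines,
// so the same snapshot taken on two JDK versions can be compared side by side.
final class JvmBaseline {

    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        lines.add("java.version=" + System.getProperty("java.version"));
        // Max heap the JVM will attempt to use, after defaults and container limits apply.
        lines.add("maxHeapBytes=" + Runtime.getRuntime().maxMemory());

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        lines.add("heapUsedBytes=" + heap.getUsed());
        lines.add("heapCommittedBytes=" + heap.getCommitted());

        // The collector names reveal which GC the JVM selected by default.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("gc=" + gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }

        // Explicit flags passed at launch; anything not listed here came from a default.
        lines.add("jvmArgs=" + ManagementFactory.getRuntimeMXBean().getInputArguments());
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Run it, or call snapshot() at service startup, under the same container limits you use in production. It only shows what changed in configuration; footprint and latency comparisons still need real load on top.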
So the senior read on JDK 27: the object-header and GC defaults move real production numbers the day you upgrade, with no code edits. The Vector API stays a feature to keep watching while it incubates. Know which column a JEP sits in before you build a roadmap around it.
7/7
JDK 27 reaches general availability on September 14. Two of its JEPs change your memory footprint and GC behavior with zero code change. A third headline feature is entering its twelfth incubation. Telling those two groups apart is the actual senior-dev read.
1/7
Now the contrast: JEP 537 takes the Vector API into its twelfth incubation, with no substantial changes since JDK 25, after eleven rounds across JDK 16 through 26. Twelve incubations and still not standardized. That is the 'almost here' column.
6/7