The conclusion of this paper is very interesting. Images may be a more friendly representation for AI to understand source code, since it can be easily compressed to reduce cost. 😃
today most interesting paper: CodeOCR
This work provides a good explanation of how code indentation and highlighting are designed to serve the human eye.
https://t.co/oG3TXIP5A6
Stripping code formatting cuts LLM token cost without hurting accuracy.
Average input tokens drop by 24.5%, with output quality basically unchanged.
The core issue is simple, indentation, spaces, and newlines help humans read but they inflate tokens that models pay to process.
They remove only cosmetic formatting while keeping program meaning identical, checked by matching the abstract syntax tree of the code.
They test Fill in the Middle code completion, where a model fills a missing block, across Java, C++, C#, and Python.
Performance stays stable on unformatted input, big models barely move, smaller ones wobble a bit, Python sees less savings because its layout is part of the language.
One surprise, models still print nicely formatted code even when given smashed input, so output token savings are small.
To fix that, 2 cheap tactics work, explicit prompts that say output without formatting, and light fine tuning on unformatted samples.
With clear instructions or tiny training, output length shrinks by 25% to 36% while pass rate on the first try holds.
They also ship a tool that strips formatting before inference then restores it after, so humans read clean code while the model pays less.
----
Paper – arxiv. org/abs/2508.13666
Paper Title: "The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget"
Want to save your LLM budget without sacrificing performance? Here's a useful trick: removing non-essential code formatting, like indentations, newlines, and extra whitespaces, cuts input tokens by an average of 24.5%!
Check out our full study: https://t.co/951LFOEres
Unitree B2-W Talent Awakening! 🥳
One year after mass production kicked off, Unitree’s B2-W Industrial Wheel has been upgraded with more exciting capabilities.
Please always use robots safely and friendly.
#Unitree#Quadruped#Robotdog#Parkour#EmbodiedAI#IndustrialRobot #InspectionRobot #IntelligentRobot #FoundationModels #LeggedRobot #WheeledLegs
🎉 Exciting News! 🎉
We are thrilled to announce that ACM SIGSOFT has officially upgraded FORGE from an ICSE Special Event to an ICSE Co-Located Conference! 🚀 We can’t wait to see your submissions for FORGE 2025!
See more below👇
#FORGE#FORGE2025@ICSEconf
AI is not making any progress"? Look closer. 🙄 GPT-4 level models got 240x cheaper in just 2 years! AI progress isn't linear and is just about bigger models.
BERT -> DistilBERT
Llama 2 70B -> Llama 3 8B
GPT-4 -> GPT-4o-mini
Llama 3 405B → Llama 4 70B?? 🤔
Models get bigger, then smaller but equally powerful. It's a cycle of innovation. Today's quality per $ is the most expensive we'll see.
Making it cheaper will lead to more people using, learning, and building with AI, which might unlock more potential and “goodput” for everyone than yet another Foundation Model!
AI's real progress: Getting into more hands.🤗
[Image credits: @davidtsong]
Our recent work on self-healing software systems is available at Arxiv now🥳:
[2408.01055] LLM as Runtime Error Handler: A Promising Pathway to Adaptive Self-Healing of Software Systems (https://t.co/fLLWHFubFF)
Wow.
@Jandodev just showed me a prompt humans can’t read but LLMs understand this language better.
The San Francisco AI people are designing a new language.
In stealth. You are first to see it.
Wow.
@Jandodev just showed me a prompt humans can’t read but LLMs understand this language better.
The San Francisco AI people are designing a new language.
In stealth. You are first to see it.
Today we are releasing two small models: Mathstral 7B and Codestral Mamba 7B.
On the MATH benchmark, Mathstral 7B obtains 56.6% pass@1, outperforming Minerva 540B by more than 20%. Mathstral scores 68.4% on MATH with majority voting@64, and 74.6% using a reward model.
Codestral Mamba is one of the first open source models with a Mamba 2 architecture. It is the best 7B code model available, and is trained with a context length of 256k tokens.
Both models are released under the Apache 2 license.
https://t.co/R2pc6u45zL
https://t.co/9szAM62xi5