Depends on what you want to achieve. A reasoning LLM has already spent reasoning tokens before it writes the tool call. For a non-reasoning LLM, you can add a reason field to make it think, but it should come before the args, even before the tool name.
If it’s just to show the reason for the tool call in the UI, then it doesn’t matter either way.
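A minimal sketch of the reason-first idea. It assumes a custom JSON tool-call format where the model writes the whole call as one object; this is not any provider's native tool-calling API. Because the model generates left to right, declaring `reason` first means the justification is written before the tool name and args. `TOOL_CALL_SCHEMA`, `reason_comes_first` and `get_weather` are hypothetical names. Whether a given structured-output mode actually follows schema property order is an assumption to check for your provider.

```python
import json

# Hypothetical custom tool-call format: "reason" is declared first so that a
# left-to-right decoder writes its justification before the tool name and args.
# Whether a provider's structured-output mode follows this order is an assumption.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "reason": {"type": "string", "description": "Why this tool and these args"},
        "tool_name": {"type": "string"},
        "args": {"type": "object"},
    },
    "required": ["reason", "tool_name", "args"],
    "additionalProperties": False,
}

EXPECTED_ORDER = list(TOOL_CALL_SCHEMA["properties"])  # ["reason", "tool_name", "args"]


def reason_comes_first(raw: str) -> bool:
    """True if the emitted call has all fields, in reason -> tool_name -> args order."""
    keys = list(json.loads(raw))  # json.loads keeps the key order of the text
    if not all(k in keys for k in EXPECTED_ORDER):
        return False
    positions = [keys.index(k) for k in EXPECTED_ORDER]
    return positions == sorted(positions)


if __name__ == "__main__":
    good = '{"reason": "user asked about today", "tool_name": "get_weather", "args": {"city": "Paris"}}'
    late = '{"tool_name": "get_weather", "args": {"city": "Paris"}, "reason": "written after the fact"}'
    print(reason_comes_first(good), reason_comes_first(late))  # True False
```

A check like this only tells you the order was respected. The benefit for non-reasoning models comes from the order itself.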
@snowmaker Totally agree. And trust can only be earned when AI agents interact with prod data inside a safe, contained environment where errors can be rolled back easily.
This environment should be simple enough for us humans to control, but secure enough that the AI cannot break out.
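One small way to picture "errors can be rolled back": run the agent's writes inside a database transaction and commit only if a post-condition holds. This sketch uses SQLite as a stand-in for prod data. `run_agent_action`, `make_demo_db`, `no_negative_balance` and the `accounts` table are hypothetical. A transaction alone does nothing for the "cannot break out" part, which needs real isolation.

```python
import sqlite3
from typing import Callable, Iterable


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical stand-in for prod data: two accounts in an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


def no_negative_balance(conn: sqlite3.Connection) -> bool:
    return conn.execute("SELECT MIN(balance) FROM accounts").fetchone()[0] >= 0


def run_agent_action(
    conn: sqlite3.Connection,
    statements: Iterable[str],
    check: Callable[[sqlite3.Connection], bool],
) -> bool:
    """Apply the agent's SQL in one transaction; keep it only if check(conn) passes."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            for sql in statements:
                conn.execute(sql)
            if not check(conn):
                raise ValueError("post-condition failed")
        return True
    except (sqlite3.Error, ValueError):
        return False  # the transaction was rolled back, data is unchanged


if __name__ == "__main__":
    db = make_demo_db()
    ok = run_agent_action(db, ["UPDATE accounts SET balance = balance - 200 WHERE id = 1"], no_negative_balance)
    print(ok, balances(db))  # False {1: 100, 2: 50}: the bad write was rolled back
```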
@JFPuget Of course hybrid gives better quality and is also more satisfying. I find AI useful for writing, but not by asking it to write for me. I use it to ask questions, challenge wording, and fix spelling so I can focus on the key points.
@plbiojout I think it is not only about being lazy or conservative. A class of people invested a lot of time in their lives to gain status through their knowledge. When knowledge becomes cheap, of course they fight it to keep their social status.
This is so true. I am building AI for data science. When I approach schools, the professors absolutely want to avoid it.
At the same time, when I talk to final-year students, many are terrified of the future and don’t know which skills to learn to avoid being outcompeted by AI.
@lennysan Vibe DS does not work like vibe coding. With code, anyone can click a button in the web app and see if it kind of works. Data analysis and statistics require real knowledge to tell what « seems correct » apart from a real result.
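A classic case where the « seems correct » answer is wrong is Simpson's paradox. The sketch below uses the counts commonly quoted in the kidney-stone treatment example; treat them as illustrative. Treatment A has the better success rate within each stone-size group, yet B looks better once the groups are pooled. `stratum_rate` and `pooled_rate` are hypothetical helpers.

```python
# (successes, attempts) per treatment and stone size; illustrative counts from
# the commonly cited kidney-stone example of Simpson's paradox.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rate(treatment: str, stratum: str) -> float:
    successes, attempts = DATA[treatment][stratum]
    return successes / attempts


def pooled_rate(treatment: str) -> float:
    groups = DATA[treatment].values()
    return sum(s for s, _ in groups) / sum(n for _, n in groups)


if __name__ == "__main__":
    for t in ("A", "B"):
        print(t, f"pooled={pooled_rate(t):.2f}",
              *(f"{s}={stratum_rate(t, s):.2f}" for s in ("small", "large")))
    # A wins in every stratum but B "wins" pooled: most of A's cases are the harder large stones.
```

A dashboard that only shows the pooled rate would point to B. Someone who knows to check for a confounder like case severity would not.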
@kskrygan Why segment by status? Some seniors are slow to catch up, while some juniors are excellent at leveraging AI.
Give everyone a baseline, then monitor usage and efficiency. Each person has a different learning curve.
@fabianstelzer The system prompt is at the top of the context, so if it changes, the whole context has to be written to the cache again. Putting the date in the latest message instead means only the delta since the last cached prefix gets a cache write.
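A toy model of why the date's position matters. It assumes the cache is reused only for the longest unchanged prefix of the context. Real providers work on tokens, often in blocks or at explicit cache breakpoints, so this is a simplification. Each string stands for a span of tokens. `common_prefix_len` and `cache_write_len` are hypothetical helpers.

```python
from typing import Sequence


def common_prefix_len(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Number of leading segments shared by the previous and current context."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def cache_write_len(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Segments of `curr` not covered by the cached prefix, i.e. what must be written again."""
    return len(curr) - common_prefix_len(prev, curr)


# Date inside the system prompt: the very first segment changes on the next turn.
date_in_system = (
    ["system (date=Mon)", "user 1", "assistant 1", "user 2"],
    ["system (date=Tue)", "user 1", "assistant 1", "user 2", "assistant 2", "user 3"],
)

# Date attached to the latest user message: earlier segments stay identical.
date_in_latest = (
    ["system", "user 1", "assistant 1", "user 2 (date=Mon)"],
    ["system", "user 1", "assistant 1", "user 2 (date=Mon)", "assistant 2", "user 3 (date=Tue)"],
)

if __name__ == "__main__":
    print(cache_write_len(*date_in_system))  # 6: the whole context
    print(cache_write_len(*date_in_latest))  # 2: only the new turns
```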
@heyandras I think a lot of people set up overly specific rules, skills, and complex systems.
New models are not optimized for those prompts.
All those systems will have to be updated again.
If you feel Claude Code is dumber than before, add this to .claude/settings.json:
```
{
  "model": "opus",
  "effortLevel": "high"
}
```
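If you already have a `.claude/settings.json`, these keys have to be merged into the existing object rather than pasted over it. Below is a minimal Python sketch that assumes the file is plain JSON. `merge_settings` is a hypothetical helper. The key names are copied from the post, so whether your Claude Code version honours them is an assumption to check against its settings documentation.

```python
import json
import tempfile
from pathlib import Path

# Keys copied from the post; whether a given Claude Code version honours them is an assumption.
SUGGESTED = {"model": "opus", "effortLevel": "high"}


def merge_settings(project_dir, updates=SUGGESTED) -> dict:
    """Merge `updates` into <project_dir>/.claude/settings.json, keeping other keys."""
    path = Path(project_dir) / ".claude" / "settings.json"
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.update(updates)  # shallow merge: suggested keys win, the rest is untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:  # demo in a throwaway directory
        (Path(tmp) / ".claude").mkdir()
        (Path(tmp) / ".claude" / "settings.json").write_text('{"model": "sonnet", "env": {}}')
        print(merge_settings(tmp))  # {'model': 'opus', 'env': {}, 'effortLevel': 'high'}
```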
Also, to avoid draining your usage too fast, this works quite well in CLAUDE.md:
```
## Subagent Execution
When executing a plan that involves changes across multiple files, break it down into logical chunks of work.
Launch a Sonnet subagent to handle each chunk; each agent should ensure the code is tested and working.
Afterward, review the full implementation once more to ensure nothing was missed and no mistakes remain.
```
I fell into a rabbit hole reading the writeups from Kaggle’s Deep Past Challenge, where teams had to translate Old Assyrian into English.
Main takeaway: data quality beats modeling tricks.
Tiny official train set (1.5k pairs), messy OCR from academic PDFs, train/test mismatch, unstable leaderboard.
The best teams mostly won by:
-> rebuilding sentence pairs from PDFs
-> cleaning + normalizing hard
-> using byte-level models like ByT5
-> using LLMs for extraction, alignment, filtering, repair
-> being conservative at inference
Not much architectural magic.
Mostly better data.
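Here is a minimal sketch that makes "cleaning + normalizing hard" and "byte-level" concrete. The normalization rules are illustrative, not what the winning teams actually did: Unicode NFC, collapsing OCR whitespace, a length-ratio filter and de-duplication. The toy strings are illustrative too. `byte_ids` mimics ByT5-style inputs, where each UTF-8 byte maps to an ID shifted past the reserved pad/eos/unk IDs. The offset of 3 is assumed to match the Hugging Face ByT5 tokenizer.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """NFC so 's' + combining caron equals precomposed 'š'; collapse OCR whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_pairs(pairs, max_ratio: float = 3.0):
    """Normalize, drop empty or badly mismatched pairs, and de-duplicate."""
    seen, out = set(), []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side lost in extraction
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue  # likely a misaligned sentence pair
        if (src, tgt) in seen:
            continue  # duplicate once normalized
        seen.add((src, tgt))
        out.append((src, tgt))
    return out


def byte_ids(text: str, offset: int = 3) -> list:
    """ByT5-style IDs: UTF-8 bytes shifted past pad/eos/unk (offset assumed to be 3)."""
    return [b + offset for b in text.encode("utf-8")]


if __name__ == "__main__":
    raw = [
        ("a-na  s\u030cu-ma\n", "toy target A"),  # decomposed š, messy whitespace
        ("a-na \u0161u-ma", "toy target A"),  # same pair with precomposed š
        ("", "orphan target"),  # empty source
        ("um", "a target far longer than its source"),  # caught by the ratio filter
    ]
    print(clean_pairs(raw))  # [('a-na šu-ma', 'toy target A')]
    print(byte_ids("\u0161"))  # [200, 164]: one character, two bytes
```

Because a byte-level model sees raw bytes, diacritics like š never become unknown tokens; they just cost two bytes. The catch is that the same character must always be encoded the same way, which is what the NFC step enforces.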
Honestly, it feels closer to real ML work than many Kaggle comps.
I wrote a longer breakdown here: https://t.co/dbAGHXnAxb