@fatzin_@neural_avb Not an unreasonable question but you'd want to understand how transformers are structured before attempting to upscale. Your training might go wasted if you do it without proper architectural decisions. As a general pointer, you might find net2net paper interesting.
2/ The expensive mistake in agent work is building before you've understood the shape of the problem. A few hours of structured exploration with agents arguing about it saves you weeks of building the wrong thing.
Some of y'all call it specs.
@trq212 I usually run two versions of responses: the "plain english" leads AI's messages at the top, then it follows with technical counterpart. Easier to front-load core meaning before details.
Compaction severs reliability, especially when you're in flight with methodologies to follow. I usually have the session write a handoff before compaction happens and have it re-read it and referred docs to replenish its memory, regardless of what compaction provides.