Sammy Milton-Tomkins @Miltonsammy_ - Twitter Profile

@gpusteve The compute versus memory bottleneck seems to become unavoidable once inference workloads start scaling aggressively. Interesting point on the KV cache pressure inside decode as well.

1

0

52

Sammy Milton-Tomkins

@Miltonsammy_

about 1 month ago

@jetpen The viability problem becomes brutal once agent workflows start layering orchestration, tools and memory together under sustained workloads. A lot of teams seem to hit the point where debugging stops feeling deterministic.

0

1

0

15

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@biserdimitrov @Baderasadeth @Replit @amasad @pirroh What kind of failures did you seem to be seeing? More orchestration getting stuck or the agents just producing inconsistent outputs?

0

32

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@dexterstorey @RubricLabs Are you seeing this more from orchestration getting stuck or from the model outputs drifting? We’ve seen both show up as ‘passes tests but still wrong’ in production.

0

26

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@mwfowlie @jxnlco What kind of errors are you hitting? We have seen a lot of teams run into cascading failures at the infra layer rather than the model itself, especially under load.

0

7

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@rikarends @antirez That gap vs qwen is usually not just model side, we’ve seen a lot of teams hit ceilings from how inference is actually being served rather than the model itself. Are you running into limits from batching / memory or more from raw throughput on the GPU?

0

16

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@nejatian @maelan_sdmr That balance is brutal, usually the breaking points we see are around latency under load or infra not holding up as usage spikes. Has it been more around scaling pressure or reliability issues on your side?

0

1

0

10

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@DavidZinland We’ve seen this a lot with langgraph setups, the reliability issues usually come from how state + streaming are handled under load. Did you seem to run into instability during longer sessions or more around scaling concurrent users?

0

19

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@osttoo @OpenAIDevs This usually shows up when the infra layer isn’t keeping pace with real time demand. Do you seem to be seeing this more from scaling load or from how the workloads are being scheduled?

0

9

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@boardyai Dedicated GPU infrastructure for AI teams @NexaCoreio Need access to clients that are actually currently struggling. Thanks.

1

0

20

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@PieroHerrera1 @jun_song Yes this tends to happen when demand spikes and you’re sitting on shared allocation. It looks fine until everyone hits it all at once, then latency just totally collapses. Do you seem to be seeing this consistently now or mainly at peak times?

0

6

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@ariccio @AdvancedTweaker @michael_hoerger That’s where it usually starts getting real. Once you move from isolated runs to continuous loops, the time cost compounds fast, especially on inference. Are you running this on shared infra or something more dedicated?

1

0

7

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@HarveenChadha Feels like a lot of teams only realise this once they actually try to run things at scale. Talking about agentic AI is easy until you hit real constraints on memory, throughput and allocation. Are you seeing teams in your network actually struggle with this yet or still theoretical?

0

15

Sammy Milton-Tomkins

@Miltonsammy_

about 2 months ago

@AGNonX Feels like a lot of teams are moving local just to escape allocation issues, but then hit limits again once workloads grow. Do you seem to actually be seeing these setups hold under sustained inference or more for controlled use cases?

0
