All day using GLM 5.2. Didn't miss much. First open model that passes the bar as a daily driver. Things are not going to be the same.
Damn, now I want to buy some serious hardware.
@tonyGewrit because it is a cool lab experiment that has no concurrency and dramatically reduced quality versus the foundation model itself served over GPUs in a cluster? Need 600 GPUs to serve 20 people at 30 tokens per second.
@banteg have to think about the future, what will this be able to run a year from now? fully abliterated within your control. no one training on your outputs so that other users can copy your novel contributions in one shot on the next release.