@che_shr_cat Meh, you could technically argue that, but at the end of the day you still take the grad of a loss wrt sth (input, token, etc.). What would happen if the model doesn't have refusal phrases in its training data (distribution)? Would the approach still work?
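To make "grad of a loss wrt the input" concrete, here's a minimal sketch. It assumes a toy logistic-regression model instead of an LLM, and the weights `w`, `b` and input `x` are made-up illustrative values. For binary cross-entropy, the gradient wrt the input is (sigmoid(w·x + b) - y)·w, and the code checks that against central finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, w, b, y):
    """Binary cross-entropy of a toy logistic model p = sigmoid(w.x + b)."""
    p = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_wrt_input(x, w, b, y):
    """Analytic dL/dx: the gradient is taken wrt the input, weights stay fixed."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

def finite_diff_grad(x, w, b, y, eps=1e-6):
    """Central finite differences, only used to sanity-check the analytic grad."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (bce_loss(x + e, w, b, y) - bce_loss(x - e, w, b, y)) / (2 * eps)
    return g

# Hypothetical toy values, not taken from any paper.
w = np.array([0.5, -1.2, 2.0])
b = 0.1
x = np.array([1.0, 0.3, -0.7])
y = 1.0

analytic = grad_wrt_input(x, w, b, y)
numeric = finite_diff_grad(x, w, b, y)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Stepping the input against this gradient (`x - lr * analytic`) lowers the loss while the weights stay frozen. That is roughly the move that input/token-level gradient methods build on.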
@fatihdin4en @wredman4 @Xiaoxiao_Lin1 Nice work! Has the effect been validated for other architectures as well? Do the high-d neural codes exist broadly in other archs, or are they a byproduct of certain conditions?
@fleetwood___ Are you learning the tasks sequentially, one after the other (at least that's what I infer from the plots)? Try training in a multitask fashion if u haven't already.
@DimitrisPapail One hypothesis is that they changed sth after the source code leak, which btw revealed that under the hood there's a ton of prompt orchestration going on. By that logic, even a small prompt change could alter the LLM's behavior. They also had prompts obscuring info.
@adamlsteinl Tests & verifiers should be isolated from agents and encrypted. Ideally, tests should not be online: given the inputs, agents provide answers, which in a 2nd step are passed to verifiers. It would be interesting to look at traces to see what percentage of agents try to decrypt the answers.
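Here's a minimal sketch of that two-step setup. It assumes the verifier keeps only keyed HMAC-SHA256 digests of the expected answers. That is hashing swapped in for encryption, since the verifier only needs to check equality and never has to recover an answer. `HashedVerifier`, `register`, `check` and `toy_agent` are hypothetical names. The agent only sees task inputs, and its answers go to the verifier in a separate step.

```python
import hashlib
import hmac
import secrets

class HashedVerifier:
    """Hypothetical verifier that stores only keyed digests of expected answers.

    The secret key stays with the verifier, so even if an agent reads the stored
    digests it can't brute-force short answers offline without the key.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._digests: dict[str, bytes] = {}

    def _digest(self, answer: str) -> bytes:
        normalized = answer.strip().encode("utf-8")
        return hmac.new(self._key, normalized, hashlib.sha256).digest()

    def register(self, task_id: str, expected: str) -> None:
        # Only the digest is kept; the plaintext answer is not stored.
        self._digests[task_id] = self._digest(expected)

    def check(self, task_id: str, answer: str) -> bool:
        expected = self._digests.get(task_id)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking information via timing.
        return hmac.compare_digest(expected, self._digest(answer))


def toy_agent(task_input: str) -> str:
    """Hypothetical agent: it only ever receives the task input."""
    a, b = map(int, task_input.split("+"))
    return str(a + b)


verifier = HashedVerifier()
tasks = {"t1": "2+3", "t2": "10+7"}
verifier.register("t1", "5")
verifier.register("t2", "17")

# Step 1: the agent answers from the inputs alone.
answers = {tid: toy_agent(inp) for tid, inp in tasks.items()}

# Step 2: answers are passed to the verifier in a separate step.
results = {tid: verifier.check(tid, ans) for tid, ans in answers.items()}
print(results)  # {'t1': True, 't2': True}
```

A keyed digest is used rather than a plain hash because answer spaces in benchmarks are often small enough to enumerate.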
@che_shr_cat Didn't we already know that transformers fail at algorithmic tasks? They mostly solve these kinds of tasks by exploiting spurious correlations https://t.co/k8jW3yEpSv
@DimitrisPapail Nice write-up! If I'm not mistaken, the token pair-encoding trick closely resembles the RLE trick, where u compress runs of repeating values/letters into a condensed representation. It's been used a lot in vision & I bet there are a lot of codebases on the web with it.
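To illustrate the resemblance, here's a minimal sketch of both ideas on a toy string. It assumes a bare-bones BPE: a single merge step of the most frequent adjacent pair, with none of the byte-level handling or tie-breaking rules real tokenizers use. RLE collapses each run of one symbol into a (symbol, count) pair. The BPE step replaces every non-overlapping occurrence of the most frequent adjacent pair with a new merged token. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encoding: each run of a repeated symbol -> (symbol, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

def bpe_merge_step(tokens: list[str]) -> list[str]:
    """One toy BPE step: merge the most frequent adjacent pair into a new token."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        # Scan left to right, merging non-overlapping occurrences of (a, b).
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

text = "aaabbbaaab"
print(rle_encode(text))            # [('a', 3), ('b', 3), ('a', 3), ('b', 1)]
print(bpe_merge_step(list(text)))  # ['aa', 'a', 'b', 'b', 'b', 'aa', 'a', 'b']
```

Both shorten the sequence by exploiting repetition. The difference is that RLE only collapses consecutive copies of a single symbol, whereas BPE learns merges for frequent pairs, possibly of different symbols, and reuses those merges across the whole corpus.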
@ZimingLiu11 @naturecomputes @SuryaGanguli @AToliasLab Nice thread! Any thoughts on how continuous but non-autoregressive models like FNOs and PINNs compare to transformer-based ones? Based on your findings, shouldn't they avoid both issues present in transformers?
@giffmana @crude2refined Indeed, the optimizers we use are quite sensitive, so the lr can help escape a local min when you're stuck; all other hyperparams are there to smooth the optimizer's trajectory.
@JFPuget @jm_alexia 🤔 isn't that already happening? Most academics use Overleaf, and given that Overleaf has integrated 3rd-party AI models, those could potentially be sending data to the underlying companies, no?
@branerico @adrian1977 @burkov I think it has already lost its value if u consider that, on average, a PhD makes what a 20-year-old with vocational training earns working as an electrician in datacenters.