Do yourself a favour if you are curious: just pick up Shannon's original paper, "A Mathematical Theory of Communication". It's a great paper for getting into thinking mode...
For the past few days I have been thinking about distributed inference of LLMs over the open internet. So I wrote about it, defining the constraints and how the Petals paper attempts to solve them.
@KeshavRamji Interesting paper. Saw it trending on alphaXiv and read it. The model reasons using abstract tokens and produces results on par with verbal CoT. My only question: why can't we use just two tokens (Token1 and Token0) instead of 64, or would that create problems in RL?
New work with @AlecRad and @DavidDuvenaud:
Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text.
Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
For the past three days I have been thinking about how LLM inference can run over the public internet, something like a decentralised version. I came across very few papers: Petals, PlanetServe, BloomBee, MDI-LLM and DSSD (Distributed Split Speculative Decoding). It's an elegant problem.
If you are working with a small model (say 1.5B) and limited memory (I was using Ollama) on a multi-turn task, a short output isn't enough, but increasing the length causes an OOM error, because Ollama uses llama.cpp, which has no PagedAttention for efficient memory use. Any thoughts?
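A back-of-envelope sketch of why a longer context can tip a small model into OOM: the KV cache grows linearly with context length. The layer count, KV-head count and head dimension below are illustrative assumptions for a 1.5B-class model with grouped-query attention, not figures from a specific model, and real runtimes add overhead on top of this.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes.

    Factor 2 = one key tensor + one value tensor per layer.
    bytes_per_elem=2 assumes an fp16 cache (an assumption, not a runtime default).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical 1.5B-class config (assumed values, for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 28, 2, 128

for ctx in (4096, 32768):
    mib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx) / 2**20
    print(f"n_ctx={ctx}: {mib:.0f} MiB")  # 112 MiB, then 896 MiB
```

Going from 4K to 32K context multiplies the cache by 8 under these assumptions, which is the memory a paged allocator would try to hand out on demand instead of reserving up front.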
@arpit_bhayani I had a problem where I used a chunk from a PDF to generate multiple statements, and then I needed the exact source location of each statement, not the whole chunk. I used Jaccard similarity to find where each one was taken from. It's working well as of now.
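A minimal sketch of the Jaccard approach, assuming the chunk is split into sentences with a simple regex and each generated statement is matched to the single most similar sentence. The function names and the sentence splitter are hypothetical, not the exact setup described above.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word tokens as a set (Jaccard similarity works on sets)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def locate_source(statement: str, chunk: str) -> tuple:
    """Return (start, end, score): character offsets in `chunk` of the
    sentence most similar to `statement`, plus its Jaccard score."""
    target = tokens(statement)
    best = (0, 0, 0.0)
    # Naive splitter (assumption): skip leading whitespace, then take text
    # up to and including the next '.', '!' or '?' (or the end of the chunk).
    for m in re.finditer(r"\s*([^.!?]+[.!?]?)", chunk):
        score = jaccard(target, tokens(m.group(1)))
        if score > best[2]:
            best = (m.start(1), m.end(1), score)
    return best


chunk = ("Paris is the capital of France. The Eiffel Tower was completed "
         "in 1889. It is a popular tourist site.")
start, end, score = locate_source("The Eiffel Tower was finished in 1889.", chunk)
print(chunk[start:end], score)  # The Eiffel Tower was completed in 1889. 0.75
```

Sentence granularity is a design choice here; a sliding window of words would give tighter spans when a statement paraphrases only part of a long sentence.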
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from two months ago about some topic can keep resurfacing as if it were a deep interest of mine, with undue mentions in perpetuity. It feels like the model is trying too hard.