I'm hiring!!! Grad students/Pre-docs/Post-docs. LLMs for certain. Causal inference is also likely. We're building a top-notch, supportive, kinetic, and downright awesome community @GESSuniMannheim Formal announcement coming, but email me [email protected] w any Q's.
Oh wow. Super cool. Allowing different vertices from Chain of Thought to interact and cross over....This is getting awfully close to a thinking process...
Graphs of Thoughts for Solving Elaborate Problems w/ LLMs
- Models LLM generations as an arbitrary graph
- "LLM thoughts" are vertices
- Edges are dependencies between thoughts
- Can combine & enhance LLM thoughts using feedback loops
- SoTA on a variety of tasks
https://t.co/s3MnPf1bvI
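
To make the vertices-and-edges idea concrete, here is a minimal sketch of thoughts stored as a graph. It is not the paper's code: `Thought`, `generate`, `aggregate`, `refine`, and `ancestors` are hypothetical names, and the LLM call is replaced by a string stub.

```python
from dataclasses import dataclass, field


@dataclass
class Thought:
    """One vertex: a piece of LLM output (hypothetical structure)."""
    text: str
    # Dependency edges: the thoughts this one was built from.
    parents: list["Thought"] = field(default_factory=list)


def generate(prompt: str) -> Thought:
    # Stand-in for an LLM call; a real system would query a model here.
    return Thought(text=f"draft({prompt})")


def aggregate(thoughts: list[Thought]) -> Thought:
    # Combine several thoughts into one vertex that depends on all of them.
    merged = " + ".join(t.text for t in thoughts)
    return Thought(text=f"merge({merged})", parents=list(thoughts))


def refine(thought: Thought) -> Thought:
    # Feedback step: a new vertex that builds on an earlier one.
    return Thought(text=f"refine({thought.text})", parents=[thought])


def ancestors(thought: Thought) -> set[str]:
    # Walk dependency edges back to every thought this one relies on.
    seen: set[str] = set()
    stack = list(thought.parents)
    while stack:
        current = stack.pop()
        if current.text not in seen:
            seen.add(current.text)
            stack.extend(current.parents)
    return seen


a = generate("sort first half")
b = generate("sort second half")
final = refine(aggregate([a, b]))
print(final.text)
print(len(ancestors(final)), "upstream thoughts")
```

Unlike a chain, `aggregate` lets one vertex have several parents, which is what allows separate lines of reasoning to cross over.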
I've taken to talking about LLMs as "innovating" instead of "intelligent" or "conscious." "Innovating"=doing something I didn't train/tell them to, no muss no fuss. Hopefully this paper can give us the right words to talk about consciousness! https://t.co/70EgGdzGDd
Stanford just released all the lectures from Stanford XCS224U: Natural Language Understanding, taught by Prof. Christopher Potts!
Videos https://t.co/Rj4JuLnOAZ
Code https://t.co/DYwPuKSRZO
Writing is a process--training an LLM inspired by writing pedagogy. The idea that LLMs learn writing the same way we do is a stretch, but there _must_ be quite a bit that practical educators can add. https://t.co/xuqwY6vaer
Nature Comms paper: Subtle adversarial image manipulations influence both human and machine perception! We show that adversarial attacks against computer vision models also transfer (weakly) to humans, even when the attack magnitude is small. https://t.co/O7skDZe6zU
[CL] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Q Wu, G Bansal, J Zhang, Y Wu, S Zhang, E Zhu, B Li, L Jiang, X Zhang, C Wang [Pennsylvania State University & Microsoft & University of Washington] (2023)
https://t.co/p5NwOLCe5c
@vithursant19 Cool stuff! And I like the idea at an intuitive level--there are strong pathways that need to be learned first, then more complex ones can be followed. Is it possible to put an L1 penalty on the neurons? Or use something like a LARS algorithm?
[CL] Teach LLMs to Personalize -- An Approach inspired by Writing Education
C Li, M Zhang, Q Mei, Y Wang, S A Hombaiah, Y Liang, M Bendersky [Google] (2023)
https://t.co/6q0ICICfhQ
Cool stuff! Looking forward to benchmarking. From 32 to 16 to 8 to 4 to 2 bit quantization--will we be working with sums of booleans at some point? (And what does the quantization/# of parameters tradeoff look like?)
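
A toy way to see the bit-width tradeoff, assuming plain uniform quantization over the observed range (real schemes are more elaborate; `quantize`, `max_error`, and the `weights` list are made up for illustration):

```python
def quantize(values: list[float], bits: int) -> list[float]:
    """Round each value to the nearest of 2**bits evenly spaced levels."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    if step == 0:
        # All values identical: nothing to quantize.
        return list(values)
    return [lo + round((v - lo) / step) * step for v in values]


def max_error(values: list[float], bits: int) -> float:
    """Largest absolute gap between a value and its quantized version."""
    quantized = quantize(values, bits)
    return max(abs(v - q) for v, q in zip(values, quantized))


# Illustrative weights only, not taken from any model.
weights = [-1.0, -0.6, -0.25, 0.0, 0.1, 0.45, 0.8, 1.0]

for bits in (8, 4, 2, 1):
    distinct = len(set(quantize(weights, bits)))
    print(f"{bits}-bit: {distinct} distinct values, max error {max_error(weights, bits):.4f}")
```

At 1 bit every weight collapses to one of two values, which is the "sums of booleans" end of the spectrum; the open question is how many extra parameters it takes to make up for the lost precision.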
There're few who can deliver both great AI research and charismatic talks. OpenAI Chief Scientist @ilyasut is one of them.
I watched Ilya's lecture at Simons Institute, where he delved into why unsupervised learning works through the lens of compression.
Sharing my notes:
- Kolmogorov compressor is the theoretical shortest-length program that produces a dataset. SGD is a practical approximation of the Kolmogorov search that finds an implicit program embedded in the weights of a soft computer, i.e. big Transformers.
- Unsupervised learning is about computing the conditional Kolmogorov complexity of a target dataset given an unlabelled corpus, i.e. K(Y|X)
- Theory tells us that optimizing for K(X, Y), the joint complexity, is as good as K(Y|X). So simply throw all data into the mix, and "just compress everything".
- Joint compression is maximum likelihood over the giant concatenated dataset.
- Ilya cites iGPT, Chen et al. 2020, to illustrate the ideas. iGPT is an image compressor that learns to predict the next pixel using a 1D sequence model.
This is a phenomenal lecture, very accessible, and sometimes quite entertaining.
YouTube: https://t.co/FXeFO6OZN7
Lecture page: https://t.co/oiaFUtc2um
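
A rough, computable illustration of the K(Y|X) idea from the notes above. Kolmogorov complexity itself is uncomputable, so this sketch swaps in `zlib` as a crude stand-in compressor and estimates the conditional cost as C(X + Y) - C(X); the helper names and example strings are assumptions, not anything from the lecture.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed size in bytes: a crude, computable stand-in for K(.)."""
    return len(zlib.compress(data, 9))


def conditional_cost(y: bytes, x: bytes) -> int:
    """Estimate K(Y|X) as C(X + Y) - C(X): extra bytes for Y once X is known."""
    return c(x + y) - c(x)


# X plays the unlabelled corpus, Y the target; Y shares structure with X.
x = (b"Unsupervised learning works because a good compressor of the corpus "
     b"must capture the structure that the corpus shares with the target data.")
y = b"a good compressor of the corpus must capture the structure that the corpus shares"

print("C(Y)     =", c(y))
print("C(Y | X) ~", conditional_cost(y, x))
```

Compressing the concatenation is the "just compress everything" move: the compressor reuses what it saw in X, so Y costs far less than it would on its own.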
Chain of Thought allows intermediate reasoning. Math problems can be solved better if GPT4 checks them with Python code. Cool stuff. https://t.co/vKEjbHsVmh
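
A tiny sketch of what "checking with Python code" can look like, assuming the model proposes an answer and code then substitutes it back into the problem. The equation, the candidate answers, and `verify_answer` are hypothetical and not taken from the paper.

```python
from typing import Callable


def verify_answer(lhs: Callable[[float], float], rhs: float,
                  candidate: float, tol: float = 1e-9) -> bool:
    """Substitute the candidate into the equation and check both sides agree."""
    return abs(lhs(candidate) - rhs) <= tol


# Hypothetical problem: solve 3x + 7 = 25.
def left_side(x: float) -> float:
    return 3 * x + 7


for candidate in (5.0, 6.0):
    ok = verify_answer(left_side, 25.0, candidate)
    print(f"x = {candidate}: {'verified' if ok else 'rejected, try again'}")
```

The point is that the check is executed rather than reasoned about, so an arithmetic slip in the chain of thought gets caught instead of carried forward.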
Good to know!! Rule of thumb: train on 4 epochs, so reuse training data. This may depend on the specifics of the model; there's a scaling law, and more careful details are in the paper.
https://t.co/cYe0CzNfxZ
Doing more with less! A small model trained w/ a wide range of prompts can outperform larger models (GPT3, but not 4). For constrained tasks, smaller-with-a-wider-variety-of-high-quality-training-types can hit the same performance on a single task. https://t.co/cYe0CzNfxZ
❗️ Researchers often rely on third-party entities to field surveys. Therefore, it is important to verify the sincerity of their conduct.
In a project funded by the University of #Mannheim, @fraukolos & colleagues examined ways to detect falsified and fabricated interviews.
(1/5)
How to instruction tune Code LLMs w/o #GPT4 data? Releasing
🐙🤖OctoCoder & OctoGeeX: 46.2 on HumanEval🌟SoTA🌟of commercial LLMs
🐙📚CommitPack: 4TB of Git Commits
🐙🎒HumanEvalPack: HumanEval extended to 3 tasks & 6 lang
📜https://t.co/6OxBye8tAe
💻https://t.co/sAmlCsnRJn
1/9
For all PhD students in small labs: find all possible ways to collaborate with well-known open research groups like @AiEleuther @laion_ai @BigscienceW @BigCodeProject; apply to every single fellowship and look for connections. It’s not optional if you want to have a career.
The quality of training data matters! A lot! And feeding these models well-curated data (real or synthetic) _really_ helps. Also: pre-training loss is a great predictor of accuracy. https://t.co/vYYEHOdKt4