@alexalbert__ Nice find. Does the LLM know what a needle-in-a-haystack test is? These tests have only become popular quite recently, I believe, so I wonder if web pages describing them might have been part of its pretraining dataset. If so, it's still an impressive interpolation.
@NoClosedForm @EhudReiter - PriMock57: A Dataset Of Primary Care Mock Consultations
- Human evaluation and correlation with automatic metrics
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation
- User-driven research of medical note generation software
Congratulations to @fmoramarco for passing his PhD viva with minor corrections! His thesis was on "Evaluation of Medical Note Generation Systems" (focusing on real-world utility). I am grateful to the examiners, @mwhite14850 and @janiesinclair.
@pbteja1998 @OpenAI Are we sure the 20-file limit is for retrieval? In the example, they show this with code_interpreter. 20 files for retrieval sounds oddly unusable. https://t.co/QCxgKzE1tF
Me: Tell me something surprising
GPT-4: Did you know that honey never spoils? Archaeologists have ...
Me: Tell me something surprising in one sentence
GPT-4: There's a species of jellyfish ...
Me: Tell me something surprising in two words
GPT-4: E=mc^2
Me: well played