Thrilled to announce that our recent paper on lexical inference in context and continuous entailment patterns has been accepted at this year's @emnlpmeeting
Joint work with @HinrichSchuetze
Camera-ready: https://t.co/JAJgQW4k2D
Code: https://t.co/uqVvkBbqOE
📢Check out CopyBench, the first benchmark to evaluate non-literal copying in language model generation!
❗️Non-literal copying can occur even in models as small as 7B and is overlooked by current copyright risk mitigation methods.
🔗https://t.co/8bsxj75VGk
[1/N]
Bing webmaster guidelines now include “prompt injection” as a no-no.
What this tells us is that GPT-4 is still susceptible to prompt injection attacks and Microsoft doesn’t have a solution other than “pretty please don’t do it”
Is there truly no dedicated python library with model evaluation metrics (F1 etc.) that are computed "on the fly", i.e., accumulatively? I have only seen it in allennlp so far. But they don't have a standalone library that I know of. HF and torchmetrics both don't have it, right?
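For illustration, here is a minimal sketch of what "on the fly" could look like for binary F1: only three running counts are kept, so predictions never need to sit in memory all at once. The class name `StreamingF1` and its methods are hypothetical and not taken from any existing library.

```python
from dataclasses import dataclass


@dataclass
class StreamingF1:
    """Hypothetical accumulator for binary F1; keeps only three counters."""

    positive_label: int = 1
    tp: int = 0  # true positives seen so far
    fp: int = 0  # false positives seen so far
    fn: int = 0  # false negatives seen so far

    def update(self, predictions, targets):
        """Fold one batch into the running counts, then forget the batch."""
        for pred, gold in zip(predictions, targets):
            if pred == self.positive_label and gold == self.positive_label:
                self.tp += 1
            elif pred == self.positive_label:
                self.fp += 1
            elif gold == self.positive_label:
                self.fn += 1

    def compute(self):
        """Return F1 over everything seen so far (0.0 if undefined)."""
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


metric = StreamingF1()
metric.update([1, 0, 1], [1, 1, 0])  # first batch
metric.update([1, 1], [1, 0])        # second batch
f1 = metric.compute()
```

Memory use stays constant regardless of how many batches are fed in, which is the property asked about above.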
Can anyone recommend research papers on that question or a framework that implements a certain way? Or can anyone share what, in their experience, worked best in practice?
Continuous annotation and training are two principles of #datacentricai / #mlops and they sound super interesting! One thing I have trouble figuring out though: What should I ideally do with the new annotated data?
Then again, if I add more challenging examples to my eval data, how can I still compare training runs? If I add both new training and evaluation data, scores will not necessarily go up. Still the new model is likely better than the old one. How should I handle this?
Stable Diffusion concepts library https://t.co/X2jHPdWp4E textual inversion is amazing - can train a custom word vector (not otherwise reachable by english text) to mean a concept, based on examples. Opens up many possibilities of condensing objects/styles into special tokens 🚀
@keenuniverse Thanks for the idea! Unfortunately, I didn't see anything where you don't need to load all your predictions and ground truths completely into memory first. In my case there are too many single predictions to fit them into RAM. Did I miss something?
Today at @cranebeachmass my 9-year-old daughter nearly drowned while I was watching her, in full view of two life guard stations, on an uncrowded and overcast day. Drowning doesn’t look like drowning, and I hope a lot of people read this and remember the signs. 🧵
A short demo of the Universal Knowledge Core, a large multilingual lexical database with a focus on language diversity, has been published. Please find it at https://t.co/WREAhPrONy. To browse the data online, please check https://t.co/zhTCPYAHNQ.
📢 Welcome to https://t.co/YKX9oX7hp4
Change the "X" in any arXiv article link to the "5" in ar5iv to get a modern HTML5 document.
Thread: what is included, why now, and how we hope to merge back into arXiv.
#OA #OpenScience #preprints
1/10
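The link rewrite described above can be sketched in a few lines. This assumes the article link uses the `arxiv.org` host and that swapping it for `ar5iv.org` is all that is needed. The function name `to_ar5iv` and the article ID in the example are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url):
    """Swap the arxiv.org host for ar5iv.org, keeping path and query intact."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {url}")
    # Assumption: ar5iv mirrors the same path layout as arXiv.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


# Placeholder article ID, for illustration only.
html_link = to_ar5iv("https://arxiv.org/abs/1234.56789")
```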
I am excited to share that our paper "Improving Scene Graph Classification by Exploiting Knowledge from Texts" has been accepted to #AAAI2022 @RealAAAI.
Thanks to @sahandsharif, @mnschmit, Prof. Hinrich Schütze, and Prof. Volker Tresp.
https://t.co/luSHPIdwYl
Thrilled to announce that our recent paper on lexical inference in context and continuous entailment patterns has been accepted at this year's @emnlpmeeting
Joint work with @HinrichSchuetze
Camera-ready: https://t.co/JAJgQW4k2D
Code: https://t.co/uqVvkBbqOE
Our results offer a nice explanation for our earlier finding that corpus-retrieved patterns do not help as much as handcrafted ones on the available LIiC benchmarks: Handcrafted patterns are generally shorter.