Sharing our #ACL2023NLP paper on evaluation for medical multi-document summarization! New human annotated dataset, new metrics, and an in-depth analysis, here: https://t.co/QpyE6B1WoH
Joint w/ @YuliaOtmakhova@jaydepun@ththinh_ BaileyKuehl ErinBransom @allen_ai@byron_c_wallace
@NirRatner@AI21Labs Something similar works for multidocument summarization: I trained an encoder/decoder model to independently encode inputs/concat/decode:
https://t.co/Bv5ONOIMi5
AI safety will be an important part of any system performing these tasks in the wild. There’s a lot of work to do to ensure the quality and reliability of model outputs. We encourage the community to work on these challenging and important problems! 3/3
Medical systematic reviews are costly and time-consuming to produce. We introduce a new dataset called MS^2 to help automate and assist in parts of the process: https://t.co/NKFJsklCJY #NLProc@lucyluwang@i_beltagy@SemanticScholar@allen_ai 1/3
MS^2 focuses on extraction and summarization in the review pipeline. We harvest 20K systematic reviews and 470K of their references from Semantic Scholar, identify summary targets, and experiment with multi-document summarization methods. 2/3
1/ New work by Alican (@alicanb_) and Babak (@BabakEsmaeili10): "Evaluating Combinatorial Generalization in Variational Autoencoders" (https://t.co/XtB29f9FVV)
In this paper we ask the question: "To what extent do VAEs generalize to unseen combinations of features?"(thread)
#NLProc does not have a standard benchmark for interpretability. I am stoked to announce ERASER: the first-ever effort on unifying and standardizing NLP tasks with the goal of interpretability.
https://t.co/8l74WNiYzm