🔎 Deep research agents like Asta ScholarQA and OpenAI Deep Research are transforming how we perform literature review.
But how do we know if the way we evaluate them is actually meaningful?
Announcing our new paper: “Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks” 🧵
try out our new prototype system! you can ask questions about a paper and the system will answer with both text and figures from the paper. your data will go towards understanding how to better serve diverse visual needs!
Ever want to ask questions about a paper, including its figures & tables? 📊📈 Want smoother interactions w/papers on desktop & mobile?
Try Paper+Figure QA, a new tool from @allen_ai that answers with the original figures, tables, and excerpts from papers: https://t.co/hoKCgPVBOI
We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation
✅ Reliable: 94.3% agreement with human judgment
✅ Comprehensive: 4 modality combinations × 49 tasks × 937 instructions
🔍Results and Takeaways:
> GPT-Image-1 from @OpenAI leads image generation at 78.3% accuracy, 13.7% ahead of the next-best model. The top open-source model, BAGEL from #ByteDance, achieves 45.5% accuracy.
> Audio generation is still challenging: the top open-source models achieve only 48.7% accuracy in sound (Make-An-Audio 2 from #ByteDance) and 41.9% in music (MusicGen from @AIatMeta).
📜 Paper: https://t.co/pFsEkJZfw8
🛠️ Code and Evaluation Suite: https://t.co/QRU05NlGSO
🥇Leaderboard: https://t.co/oGEFw7YRpc
🧵1/N
[Please RT]
I’m recruiting PhD students to work with me at @UW!
I’m looking for students passionate about developing new *social media algorithms*, both broadly and within the scope of this NSF grant: https://t.co/oMaPj7phwE
More info: https://t.co/vnBqn40XWs
@UW / @UW_iSchool
📢 📅 After a long process of soliciting & vetting bids, I'm excited that we've finally been able to reveal the location for #EMNLP2025 -- it'll be at the International Expo Centre, Suzhou, China from 5-9 November 2025. Looking forward to seeing you there!
@emnlpmeeting #NLProc
I'm recruiting a PhD student to join my group @uw_ischool in 2025-26. If you like the mountains and interdisciplinary research that blends data and culture, this could be a good fit!
PhD apps due Dec 2: https://t.co/S2dMirSr0d
More info about my group: https://t.co/2jHUw4O74S
📉 Open access papers previously had higher accessibility compliance than closed access papers, but since 2019, we observe a sharp decline in compliance among OA papers (from the same publishers), driving much of the overall drop in PDF accessibility.
In 2019, when we first did this analysis, PDF accessibility trends were mostly improving, slowly. These 2024 results surprised me, and they reflect major shifts in #OA publishing since Plan S, exacerbated by Covid.
Has OA mostly been a win? Sure. But not evenly for everyone…
📢 Crisis alert in academic publishing!
Less than 3.2% of scholarly PDFs meet #accessibility standards for blind and low-vision readers, and compliance has dramatically declined since 2019, especially for #OpenAccess papers!
What’s going on?👇
Joint w/ @lucyluwang @uw_ischool
🚀Varying Shades of Wrong: When no correct answers exist, can alignment still unlock better outcomes?
Introducing wrong-over-wrong alignment, where models learn to prefer "less-wrong" over "more-wrong". Surprisingly, aligning with only wrong answers can lead to correct solutions!
Hi friends, colleagues, followers.
I am on the faculty job market! I am a PhD student @BerkeleyISchool + @berkeley_ai. I work on NLP, and I believe all language, whether AI- or human-generated, is ✨social and cultural data✨. My work includes: 🧵
today i left a bunch of comments for a collaborator on a grant like “what did you mean here?” and “you should expand upon this” only to realize later that i wrote those sections 😭
🚨Curious how LLMs deal with uncertainty? In our new #EMNLP2024 Findings paper, we dive deep into their ability to abstain from answering when given insufficient or incorrect context in science questions 💡https://t.co/2pAWkSwHN7
Joint work w/ @billghowe @lucyluwang @uw_ischool
@allen_ai @SemanticScholar is hiring #nlproc #hci #ml #ai researchers for the following positions with target start dates in 2025, apply by *Nov 1* for the 1st rolling deadline.
- Research intern
- Young investigator (Postdoc)
- Research scientist
Apply: https://t.co/xYovFaZ5Mn
@aarontay maybe small scale demonstrations could offer a more realistic alternative vision? agree it would take navigating a lot of powerful players and divided opinions..
Come be my colleague! We're hiring TWO tenure-track Assistant Professors at @UW_iSchool in AI, Data Science, and HCI 📊💻👩‍💻🌄
Link to apply: https://t.co/o5ksz5YhsS
Feel free to reach out with any questions!
Excited to share that our paper on plain language summarization evaluation has been accepted to the #EMNLP2024 main conference! I’ll be in Miami and will have several PhD openings for Fall 2025. Feel free to reach out if you’d like to chat!
Sexual harassment is a horrible impediment to academic research, shutting out talented researchers and slowing scientific progress.
What can we do? I believe we're not helpless; we can improve our communities through practical actions.
Take a look: https://t.co/dIL52QwqOM
1/ 🎉 Excited to share our #ACL2024 Findings paper on using LLMs to assist with literature review! 📝
"CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support"
Please check out our virtual poster session today at 8:15 p.m. PT!