Manuel @manuel_nlp - Twitter Profile

about 8 hours ago

In case you're planning to use MTurk for any NLP papers, hurry up: "After careful consideration, we have made the decision to close new customer access to AWS Mechanical Turk, effective 7/30/26."

0

8

Manuel @Manuel_NLP

1 day ago

@reach_vb Much appreciated! Unfortunately, I haven't received the extra reset yet.

0

8

Manuel_NLP retweeted

ACL Anthology @aclanthology

1 day ago

The proceedings of the (2^6)th Annual Meeting of the Association for Computational Linguistics are now available in the ACL Anthology. https://t.co/sjRDEzn0cq

2

54

5

5K

Manuel_NLP retweeted

Sumit @_reachsumit

1 day ago

STEB: Style Text Embedding Benchmark @rafaelrivera01 et al. introduce a benchmark for style embeddings across 96 datasets, finding semantic embeddings underperform on stylistic tasks with no single model dominating. 📝 https://t.co/aBqSuuF98A 👨🏽‍💻 https://t.co/zW0DeCpAZ1

0

19

4

5

887

Manuel_NLP retweeted

1 day ago

Claude Fable 5 will be available again globally tomorrow. After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests. We’ve also begun drafting a consensus framework—with Amazon, Microsoft, Google, and other Glasswing partners—for assessing the severity of AI jailbreaks and how AI developers should respond to them. We invite other industry partners and model providers to join us in this effort. Finally, we’re scaling up our collaboration with the US government on model testing and safeguards. This will include pre-release access to models and safeguards for evaluation, information sharing on jailbreaks and misuse, and dedicated resources for joint research. Thank you to our users for your patience, and to our partners across the government, industry, and the research community who worked alongside us to make Fable 5 available again. Read our full blog: https://t.co/VHyum831ri

4K

43K

6K

5K

14M

Manuel_NLP retweeted

Claude

@claudeai

2 days ago

Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.

2K

41K

4K

5K

9M

Manuel_NLP retweeted

Boris Cherny

@bcherny

2 days ago

You asked, we listened. Claude Desktop on Linux is here! Download link: https://t.co/gjgHZvbKyi

342

4K

251

406

345K

Manuel_NLP retweeted

eaclmeeting @eaclmeeting

2 days ago

📣 #EACL2027 updates: the Call for Papers is live, and our keynote speakers are confirmed! Main conference: 9–14 March 2027. Special theme: The Human in Language. 🧵👇 https://t.co/zVy6509Pos

1

44

17

14

5K

Manuel @Manuel_NLP

2 days ago

@eaclmeeting "author response (14–19 Sept) & author–reviewer discussion (20–24 Sept) are now two separate stages." Interesting change. IMO it's good. More time and having 2 stages facilitates having a proper discussion.

0

10

Manuel_NLP retweeted

elvis

@omarsar0

5 days ago

If you use LLM-as-judge, this one is worth reading. (bookmark it) It's actually one of the most effective ways to use LLM-as-a-Judge for evals. Holistic judge scores hide both their reasoning and their ceiling effects. BINEVAL decomposes each evaluation criterion into atomic yes-or-no questions, answers each independently per output, then aggregates the verdicts into calibrated multi-dimensional scores. Every question-level verdict is inspectable, so you can diagnose exactly why an output scored low, and the same verdicts feed straight back as targeted prompt-improvement signal. Across SummEval, Topical-Chat, and QAGS, it matches or beats UniEval and G-Eval, training-free, with especially strong results on factual consistency. Paper: https://t.co/oar6BZcasm Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
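The decomposition described in the tweet above can be sketched roughly as follows. This is only an illustration of the general idea (atomic yes/no questions, answered independently, then aggregated per criterion), not BINEVAL's actual implementation: the criteria, the questions, the `toy_judge` stand-in for an LLM call, and the fraction-of-yes aggregation are all assumptions made for this example, and no calibration step is shown.

```python
from typing import Callable, Dict, List

# Hypothetical criteria and yes/no questions, invented for illustration only.
QUESTIONS: Dict[str, List[str]] = {
    "consistency": [
        "Does the summary avoid stating facts absent from the source?",
        "Are all named entities in the summary present in the source?",
    ],
    "fluency": [
        "Is the summary free of grammatical errors?",
    ],
}

# A judge maps (question, output) to a yes/no verdict.
Judge = Callable[[str, str], bool]


def evaluate(output: str, judge: Judge) -> Dict[str, dict]:
    """Ask each question independently and aggregate verdicts per criterion."""
    report: Dict[str, dict] = {}
    for criterion, questions in QUESTIONS.items():
        verdicts = {q: judge(q, output) for q in questions}
        # Assumed aggregation: fraction of "yes" answers for the criterion.
        score = sum(verdicts.values()) / len(verdicts)
        report[criterion] = {"score": score, "verdicts": verdicts}
    return report


def toy_judge(question: str, output: str) -> bool:
    """Stand-in for an LLM call: answers yes unless the question mentions entities."""
    return "entities" not in question


report = evaluate("A short summary.", toy_judge)
for criterion, result in report.items():
    print(criterion, result["score"])
    for question, verdict in result["verdicts"].items():
        print("  ", "yes" if verdict else "no", "-", question)
```

Because every verdict is kept alongside its question, a low score can be traced back to the specific questions answered "no", which is the inspectability property the tweet highlights.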

52

2K

253

4K

287K

Manuel_NLP retweeted

alphaXiv

@askalphaxiv

6 days ago

Looking for a fun weekend read? Introducing the Illustrated ICML 🌎 We indexed all 6000+ ICML papers and built a visual way to explore the whole landscape. Search your favorite topics, open a cluster, and dive right in. Inspired by @JayAlammar

9

811

116

706

88K

Manuel_NLP retweeted

ACLRollingReview @ReviewAcl

6 days ago

🚨 ARR is recruiting extra emergency reviewers and emergency Area Chairs (ACs) for this cycle. Emergency Reviewer Registration: https://t.co/DtVZqCHm1o Emergency AC Registration: https://t.co/jKxtqzvYbX Thank you for helping support the ARR review process. #ARR #ACL #NLProc

1

37

24

13

8K

Manuel_NLP retweeted

Computational Linguistics Journal @CompLingJournal

7 days ago

🔊CFP for Special Issue on the Ethics of NLP and CL in Computational Linguistics: https://t.co/gV4Ut9dGVW ⏰Deadline: 27 November, 2026 #CLjournal #NLP #NLproc

0

18

7

1

2K

Manuel @Manuel_NLP

10 days ago

@ReviewAcl
- Limit the number of submissions for authors qualified to review to 5.
- Require each author who is qualified to review to review three times the number of their submitted papers.
- Limit the number of submissions for each author not qualified to review to 2.

0

1

0

1

378

Manuel @Manuel_NLP

10 days ago

@pangram @max_spero_ @fabianstelzer Awesome! Was only waiting for that.

0

1

0

25

Manuel_NLP retweeted

Pangram @pangram

10 days ago

🦊 Pangram is now live on Firefox 🦊 Get the extension here: https://t.co/swUwZurslF

3

39

5

9

9K

Manuel @Manuel_NLP

10 days ago

@simpsoka A Linux version

0

3

Manuel @Manuel_NLP

10 days ago

@thsottiaux It not running on Linux is not delightful

0

5

Manuel_NLP retweeted

Kenneth Marino

@Kenneth_Marino

11 days ago

Absolutely insane graph here. I’m sorry, if you wrote 40 papers for ACL, no you didn’t. @CVPR put a cap and a bunch of other reforms to try to head off the paper-pocalypse. ACL was caught flat-footed and now it sounds like they’re getting absolutely dogpiled by angry reviewers.

5

75

12

17

26K

Manuel_NLP retweeted

ACLRollingReview @ReviewAcl

12 days ago

📢 ARR May 2026 cycle update We have published a blog post about the increased review assignments in the May cycle, what happened, and what comes next. Please read the full post here: https://t.co/55D2Og9mz3 #ARR #ACL #NLProc

4

50

16

20

61K
