Nir Mazor @NirMMazor - Twitter Profile

Pinned Tweet

4 months ago

New preprint 💥 Can a general-purpose model achieve results comparable to medically pre-trained models? 🤔 We show that lightweight fine-tuning of a general-purpose LVLM and an LVLM-aware retriever can. 🚀 🔗 GitHub: https://t.co/vbHBIyoNgj 📄 Paper: https://t.co/Q8tKLk8J1P

1

24

9

0

811

NirMMazor retweeted

Noy Sternlicht @NoySternlicht

12 days ago

🎉 Happy to share that CHIMERA has been accepted to #ACL2026NLP (main conference)! 📄 Paper - https://t.co/11tMhA07P2 🤗 Data - https://t.co/g8GgYsfj9E 🌐 Project Page - https://t.co/c4j8gEm6V5 💻 Git - https://t.co/lVC8UAzOti Joint work with @Hoper_Tom

0

35

10

5

2K

NirMMazor retweeted

Eliya Habba @EliyaHabba

about 1 month ago

New datasets keep coming, New models keep coming. Frustrating! How can we evaluate everything on everything? How do we keep scores comparable over time? We propose a way to grow benchmark suites without losing comparability. Details:👇🧵

3

40

12

5

2K

NirMMazor retweeted

Michael Hassid @MichaelHassid

about 1 month ago

Have LLMs become supervised learners (once again)?! In our new paper, we argue that current LLMs’ post-training methods have effectively reverted to the "pre-train then fine-tune" era, explicitly tailoring models to desired behaviors. 1/n

1

50

21

6

2K

NirMMazor retweeted

Guy Kaplan @GKaplan38844

about 1 month ago

Fine-Tuning LLMs on New Knowledge Encourages Hallucinations. (@zorikgekhman) But why? We found something unexpected: 1M facts about city-like names →hallucinations explode. 1M facts about random identifiers →near zero! Same model. Same number of facts. Only the names change.🧵

4

76

23

40

17K

NirMMazor retweeted

Itay Itzhak @Itay_itzhak_

about 2 months ago

Ever used a top-ranked LLM that just... felt wrong for you? You’re not alone. Instead of leaderboards, many of us turn to "vibe-testing" - manually comparing models to our own needs. But can we turn these feelings into a structured evaluation? New paper: "From Feelings to Metrics" 🧵

2

37

16

10

4K

NirMMazor retweeted

Roni Itkin @ItkRoni

about 2 months ago

We introduce 🌍GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens.🌍 Most feed-forward 3DGS methods still start from pixel, voxel, or dense view-aligned primitives. We take a different route: align first, decode later. 🧵👇

1

44

17

7

3K

NirMMazor retweeted

Ariel Goldstein @GoldsteinYAriel

about 2 months ago

We are excited to share our new preprint! 🚨 https://t.co/mRKThbiSbH In this work, we investigate the neural mechanisms by which speakers generate information-rich linguistic content during spontaneous natural conversation.

1

31

14

5

782

NirMMazor retweeted

Eliya Habba @EliyaHabba

about 2 months ago

What if you could automatically turn a large collection of documents into structured databases, tailored for your own research needs?📄 We introduce ScheMatiQ! From question ➝ schema ➝ structured data 🔍 @ShaharLevy19 @MintzReshef @RKeydar @BarakRaveh @GabiStanovsky 🧵 👇1/5

3

28

10

2

2K

NirMMazor retweeted

Tom Hope @Hoper_Tom

about 2 months ago

Happy to share three papers accepted to the #ACL2026 @aclmeeting conference: on AI ideation, multimodal medical AI, and literature understanding. If you want to read about them, check out the following posts from @NoySternlicht, @NirMMazor and @UKPLab. More updates on each to come separately. In the hypothesis generation / ideation space: cross-domain inspirations. Led by @NoySternlicht. https://t.co/d91iDkZ4B2 In the multimodal medical AI domain: medical image diagnosis with literature retrieval. Led by @NirMMazor. https://t.co/hVRGITKfwM In the literature understanding space: summarizing thousands of citations to understand a paper's impact. Led by Hiba Arnaout from @IGurevych's lab, with major contributions from @NoySternlicht. https://t.co/ZTWZhbEWZX

2

45

6

9

3K

NirMMazor retweeted

Noam Dahan @Dahan_Noam

2 months ago

Today at #EACL2026 we present our work exploring PRINTED newspapers as a data source for summarization in low-resource languages. Join the virtual session at 18:00! 🧵👇

3

32

10

2

2K

NirMMazor retweeted

Shachar Don-Yehiya @Shachar_Don

3 months ago

Do you run pairwise evaluation? Do you test your models on the Arena-Hard and AlpacaEval benchmarks? You probably want to read this 🧵👇 https://t.co/z5gxRdXHHb With @LChoshen @AbendOmri

1

32

10

3

2K

NirMMazor retweeted

Asaf Yehudai @AsafYehudai

3 months ago

New preprint, evaluation framework & leaderboard!🚨 General-purpose AI agents are everywhere. 🤖 From ReAct to @claudeai Code and @OpenAI SDK. But how do we actually evaluate them — as general agents? Currently, benchmarks are deeply tied to domain-specific setups, making it impossible to evaluate true cross-domain agents. We’re changing that! We’re introducing Exgentic and the Open General Agent Leaderboard. 🧵👇

2

47

14

30

7K

NirMMazor retweeted

Oren Sultan @oren_sultan

4 months ago

Can LLMs reliably predict program termination? We evaluate frontier LLMs in the International Competition on Software Verification (SV-COMP) 2025, directly competing with state-of-the-art verification systems. @AIatMeta @HebrewU @Bloomberg @imperialcollege @ucl @jordiae @pascalkesseli @jvanegue @HyadataLab @adiyossLC @PeterOHearn12 Paper: https://t.co/5GP6q8E87v Website: https://t.co/HQwhl2UVLQ 🧵👇 1/n

9

115

42

37

44K

Nir Mazor @NirMMazor

4 months ago

Under the guidance of @Hoper_Tom — thank you! @nlphuji

0

2

0

60

Nir Mazor @NirMMazor

4 months ago

New preprint 💥 Can a general-purpose model achieve results comparable to medically pre-trained models? 🤔 We show that lightweight fine-tuning of a general-purpose LVLM and an LVLM-aware retriever can. 🚀 🔗 GitHub: https://t.co/vbHBIyoNgj 📄 Paper: https://t.co/Q8tKLk8J1P

1

24

9

0

811

Nir Mazor @NirMMazor

4 months ago

Our model also achieves superior results over general-purpose RAG baseline models 🚀📈

1

2

0

44

NirMMazor retweeted

Avishai Elmakies @AvishaiElm37946

5 months ago

🚀 Excited to share that my paper from my internship at @IBMResearch has been accepted to #ICASSP2026! We train Speech-Aware LLMs (SALLMs) with Group Relative Policy Optimization (GRPO) on open-ended tasks (Spoken QA & Speech Translation). We find that GRPO beats SFT!

2

26

9

2

710
