Kushal Tatariya @kushaltatariya - Twitter Profile

Pinned Tweet

over 1 year ago

Our work on quality estimation of non-English Wikipedia articles is on arXiv! 🎉 https://t.co/oqNl6unXPt In collaboration with @akulmizev, Wessel Poelman, @EstherPloeger, @mmbollmann, @johannesbjerva, Jiaming Luo, @heather_nlp, and @mdlhx at @lagom_nlp✨ 1/5

Kushal Tatariya @KushalTatariya

over 1 year ago

Moreover, we advocate for a shift in perspective from seeking a general definition of data quality towards a more language- and task-specific one. Ultimately, we aim for this study to serve as a guide to using Wikipedia for pretraining in a multilingual setting. 5/5

Kushal Tatariya @KushalTatariya

over 1 year ago

We evaluate the downstream impact of quality filtering on Wikipedia by training tiny monolingual pretrained models for each Wikipedia. We find that data quality pruning is an effective means of resource-efficient training that does not hurt performance, especially for low-resource languages (LRLs). 4/5

Kushal Tatariya @KushalTatariya

over 1 year ago

Our work on quality estimation for non-English Wikipedia articles is finally out in the wild 👀. It spread before we had the chance to publicise it haha, but watch out for our upcoming thread on this next week!

WikiResearch @WikiResearch

over 1 year ago

"How Good is Your Wikipedia?" A critical analysis of Wikipedia content quality beyond English, revealing widespread issues such as a high percentage of one-line articles and duplicate articles. (Tatariya et al., 2024) https://t.co/YFp4XGMESX

Kushal Tatariya @KushalTatariya

over 1 year ago

@MSVPJ_Sathvik @vgaraujov @ThomasBauwens_ @mdlhx Thanks a lot! 😃

Kushal Tatariya @KushalTatariya

over 1 year ago

I am happy to present our latest work at EMNLP 2024 on the interpretability of pixel-based language models! 🎉 @vgaraujov @ThomasBauwens_ @mdlhx Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models https://t.co/DAPVveutEB #NLProc

Kushal Tatariya @KushalTatariya

over 1 year ago

This was my first venture into language model interpretability, and I've learnt a lot of cool things during this project. I hope everyone finds it an interesting read!

Kushal Tatariya @KushalTatariya

over 1 year ago

Additionally, we examine variants of PIXEL trained with different text rendering strategies, discovering that introducing certain orthographic constraints at the input level can facilitate earlier learning of surface-level features.

KushalTatariya retweeted

LAGoM NLP @lagom_nlp

almost 2 years ago

Leuven goes to Leiden in 10 days! We'll be presenting two posters about data quality of non-English Wikipedias and about typologically informed language sampling 👀 see you there!

KushalTatariya retweeted

Raj Dabre @prajdabre

about 2 years ago

The camera-ready version is now up! https://t.co/jadE5UFPkK We hope to present this at ACL next year. To summarize our contributions: 1. The first-ever benchmark for Creole NLP 2. 8 NLP tasks and 28 Creoles 3. Human-generated/checked data. Hopefully this is used as a starting point for future work on Creoles.

KushalTatariya retweeted

SIGTYP @sig_typ

about 2 years ago

✨Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification Paper: https://t.co/FTfqw50rhF Talk: https://t.co/12GwI57tKR #SIGTYP2024 #SIGTYP #EACL2024

KushalTatariya retweeted

LAGoM NLP @lagom_nlp

about 2 years ago

These two papers are being presented today at #EACL2024 @sig_typ! @EstherPloeger will present at 11h40 and @KushalTatariya at 15h20!

Kushal Tatariya @KushalTatariya

over 2 years ago

Spoiler: We find that PLMs are more influenced by Hindi words when predicting negative emotions, and by English words when predicting positive emotions. Moreover, the PLMs may also overgeneralise this pattern to examples where it does not apply.

Kushal Tatariya @KushalTatariya

over 2 years ago

My paper 'Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification' is now on arXiv! 🎉 Watch out for it at SIGTYP @ EACL 2024! #NLProc @mdlhx @heather_nlp @johannesbjerva https://t.co/MpOmz4uUsd

Kushal Tatariya @KushalTatariya

over 2 years ago

We use LIME and token-level language ID to examine the effect of language on emotion prediction across 3 PLMs finetuned on a Hinglish emotion classification dataset.
