FilWordNet tracks PH word usage online to build a context-aware digital lexicon • #NLP #networkscience @dlsu_comet @dlsu_celt @senti_ph • funded by @dostpcieerd
For our last post this #BuwanngWika, let's look at the difference in how "marites" and "chika" are used, as discovered in our research.
We hope you learned something from our campaign and that together we gave more value to our local languages!
We forgot to post this earlier... char!
How about you, which of the following expressions is your favorite to say when you're joking?
The graph shows that "char" and "chz" are used more often, based on the data we collected from Twitter.
#BuwanngWika
"Malamig" (cold) and "mainit" (hot) don't refer only to temperature or weather; these words are also used in sports!
They describe the quality of play of an athlete or team, and according to the data we collected, they are used more often in news than on Twitter.
Have you ever used "awit" to express pain or disappointment?
It used to mean "song", but today "awit" is more often used as an expression or interjection. See the evolution of this word in the graph!
#BuwanngWika
We've discovered several new words, most of which will be familiar to the casual netizen. Yet these findings are vital to expanding our linguistic resources and developing local technologies.
We're excited to share some of our preliminary results in the coming weeks—stay tuned!
This #BuwanngWika, let's celebrate our local languages!
FilWordNet is developing technology to analyze millions of texts from various platforms, in order to discover how language use is changing in online contexts.
[ENG]
Join us as we celebrate our local languages this Buwan ng Wika!
To build contextually rich Philippine language resources, we are developing an automated pipeline to analyze millions of texts from various online platforms and detect new and changing word meanings. 1/2
FilWordNet has been featured in the latest issue of Questions, the magazine of @DLSUManila!
Check out the article on page 39 (page 21 on the double spread): https://t.co/iTDGasEBN4
Dr. Charibeth Cheng talks to The LaSallian about FilWordNet, the broader landscape of natural language processing, and the challenges in developing Philippine language models and technologies.
📢 Call for participants!
The team is building an automated tool to detect word meanings in sentences. We need your help to annotate our training #data.
Support our #research by signing up here: https://t.co/1x7poiVEyI
Please send us a message or email for any concerns!
Not only do we lack local language resources, but we also lack the means to continuously update them as language use evolves.
Check out the story below to learn how building a context-aware digital lexicon can help plug these gaps.
https://t.co/KnmSkXAy9N
Merry Christmas and Happy Holidays!
Thanks for joining us as we began our project to create contextually rich local language resources. Stay tuned for our year-end wrap-up and more updates in 2022!
Ever chatted with a bot before?
You've likely encountered conversational agents through various companies' websites. #AI #chatbots are an exciting way to see language models and #NLP in action!
Plus, check out Natter from our industry partner Senti 👉 https://t.co/8RNZh9nW8H
These trends lend exciting insights into how language evolves in online contexts. A new sense might emerge while an old meaning drops in popularity, and senses might cooperate (similar trends) or compete (opposite trends).
Plus, our resources will be updated with all this data!
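One way to make "cooperate" and "compete" concrete is to correlate two senses' usage shares over time. The snippet below is only a minimal sketch: the Pearson correlation, the 0.5 threshold, the function names, and the toy numbers are all assumptions chosen for illustration, not the project's actual method or data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no trend to compare
    return cov / (sx * sy)

def relation(xs, ys, threshold=0.5):
    """Label two sense trends; the threshold is an arbitrary assumption."""
    r = pearson(xs, ys)
    if r >= threshold:
        return "cooperate"   # similar trends
    if r <= -threshold:
        return "compete"     # opposite trends
    return "unclear"

# Invented usage shares per time period, for illustration only.
old_sense = [0.8, 0.6, 0.4, 0.2]
new_sense = [0.2, 0.4, 0.6, 0.8]

label = relation(old_sense, new_sense)
```

Here the old sense falls while the new one rises, so the two series are labeled as competing.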
What do you do with a lot of data?
➡️Analyze and integrate!
After collecting text data and extracting word senses (meanings) from various online platforms, we can carry out diachronic studies to analyze trends in language evolution.
Read on to find out how! [1/4]
Pretrained BERT models help us identify the correct sense class or meaning of each word in our text collection, distinguishing those with multiple senses.
By comparing time periods and how often each sense appears, trends in word usage will then emerge for each platform. [3/4]
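The comparison step can be sketched as a simple count. This is a rough illustration only: it assumes each occurrence has already been labeled with a sense (for example by a classifier), and the records, sense labels, and the `sense_shares` helper are hypothetical, not project data or code.

```python
from collections import Counter, defaultdict

# Invented records: (platform, period, word, sense label).
tagged = [
    ("twitter", "2019", "awit", "song"),
    ("twitter", "2019", "awit", "song"),
    ("twitter", "2019", "awit", "interjection"),
    ("twitter", "2021", "awit", "interjection"),
    ("twitter", "2021", "awit", "interjection"),
    ("twitter", "2021", "awit", "song"),
    ("twitter", "2021", "awit", "interjection"),
]

def sense_shares(records, platform, word):
    """Return {period: {sense: share of the word's occurrences}}."""
    counts = defaultdict(Counter)
    for plat, period, w, sense in records:
        if plat == platform and w == word:
            counts[period][sense] += 1
    return {
        period: {s: n / sum(c.values()) for s, n in c.items()}
        for period, c in sorted(counts.items())
    }

shares = sense_shares(tagged, "twitter", "awit")
for period, dist in shares.items():
    print(period, dist)
```

Using shares instead of raw counts keeps periods comparable even when the amount of collected text differs between them.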
For our project, we're combining these techniques so that we can extract novel and evolving senses from online texts. This is especially useful for low-resource local languages, so we can generate context-rich sense embeddings forming the core of the FilWordNet data source. (4/4)
How does a computer understand language?
Dive into a core component of #language #technologies: sense representation.
Here's a look at one aspect of our approach to building a contextually rich FilWordNet.
THREAD⬇️ (1/4)
For computers, words are represented as embeddings in natural language processing (#NLP), an #AI field. Here, senses are stored mathematically, like points on a Cartesian plane, with similar meanings having similar numbers. These embeddings can change based on context, too. (3/4)
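To picture "similar meanings having similar numbers", here is a toy sketch with two-dimensional points. The coordinates are invented for illustration (real embeddings have hundreds of dimensions and are learned from text), and cosine similarity is just one common way to compare them.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D sense embeddings, not real model output.
senses = {
    "awit (song)": (0.9, 0.1),
    "kanta (song)": (0.8, 0.2),
    "awit (interjection)": (0.1, 0.9),
}

same_meaning = cosine(senses["awit (song)"], senses["kanta (song)"])
different_meaning = cosine(senses["awit (song)"], senses["awit (interjection)"])
print(same_meaning > different_meaning)
```

In this toy setup the two "song" senses sit close together, while the interjection sense of "awit" points in a different direction even though the word is spelled the same.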