Current LLMs are trained on text data that would take 20,000 years for a human to read.
And still, they haven't learned that if A is the same as B, then B is the same as A.
Humans get a lot smarter than that with comparatively little training data.
Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only 2 billion neurons and a few trillion "parameters."
1. PDFTriage represents a document as a hierarchical tree of elements.
2. It selects the document frame needed to answer the query and retrieves it directly from the selected page, section, figure, or table.
3. The selected context, together with the query, is used to extract the answer.
3/3
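A minimal sketch of the three steps above, under loud assumptions: the `Element` class, the `find`/`flatten`/`build_prompt` helpers, and the sample document are all hypothetical and are not taken from the paper. Frame selection is hard-coded here, and the final LLM call is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One node of the document tree (hypothetical structure)."""
    kind: str   # e.g. "document", "section", "table", "figure"
    title: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)


def find(node: Element, kind: str, title: str) -> Optional[Element]:
    """Depth-first search for the first element matching kind and title."""
    if node.kind == kind and node.title == title:
        return node
    for child in node.children:
        hit = find(child, kind, title)
        if hit is not None:
            return hit
    return None


def flatten(node: Element) -> str:
    """Concatenate the text of an element and all of its descendants."""
    parts = [node.text] if node.text else []
    parts.extend(flatten(child) for child in node.children)
    return "\n".join(p for p in parts if p)


def build_prompt(doc: Element, kind: str, title: str, query: str) -> str:
    """Step 2 (fetch the frame) and step 3 (pair it with the query)."""
    frame = find(doc, kind, title)
    context = flatten(frame) if frame is not None else ""
    return f"Context:\n{context}\n\nQuestion: {query}"


# Step 1: the document as a tree of elements (toy data, assumed).
doc = Element("document", "Report", children=[
    Element("section", "Introduction", "We study long documents."),
    Element("section", "Results", "Accuracy improved.", children=[
        Element("table", "Table 1", "model | score\nA | 0.9"),
    ]),
])

prompt = build_prompt(doc, "table", "Table 1", "What is A's score?")
```

Only the selected table reaches the prompt; the rest of the document is left out of the context.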
This research from @Stanford @AdobeResearch takes on the challenge of Q/A over long structured documents (PDF, PPT, etc.) using LLMs, drawing inspiration from the mental models associated with these documents.
https://t.co/PsIlXhXfu7
#MachineLearning #AI
1/3
Existing Q/A methods treat long documents as plain text, which is incongruous with the user’s mental model of these documents as richly structured objects. This paper represents documents as structured objects and performs focused Q/A while preserving the document structure.
2/3
According to Google Scholar's latest ranking of publication venues by h5-index, ICLR is #9 in all of science, a mere 9 years after its creation, just in front of NeurIPS.
https://t.co/WzNDqAIZc8
Check out Prakhar's (@rattller) informative insights into Efficient Text-to-Text Transformers!
https://t.co/KuTRJsG28V
Keep an eye out for the Segmind demo in the video 👀🙌🏻
For more details, follow Segmind @_segmind
Data augmentation can increase the number of samples in an existing dataset and encourage the development of more generalised ML models.
Here are some “Popular Data Augmentation Techniques in NLP” and the associated libraries/GitHub repos.
https://t.co/H9aVb1nhuQ
#NLProc #AI #blog
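For a flavour of what text augmentation looks like, here is a small sketch of two simple token-level operations, random deletion and random swap. These are illustrative choices and may not match the techniques covered in the linked post; the function names are hypothetical.

```python
import random


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)  # seeded so runs are repeatable
sentence = "data augmentation creates new training samples".split()
augmented = [
    " ".join(random_deletion(sentence, 0.2, rng)),
    " ".join(random_swap(sentence, 2, rng)),
]
```

Each call yields a slightly perturbed copy of the sentence that can be added to the training set with the original label.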
🗂️ Multi-label text classification weak labeling
Get started with this brand-new feature using this tutorial by @vid_algo
https://t.co/NwaSvGykJF
#python #opensource #datacentricai
🦾 💻 #NLProc here is a cool resource
🥦, 🌿,🍅 == 🧅, 🌿, 🥕
@pandora_intl's few-shot library for NER based on @spacy_io
```
data = {"vegetable": ["broccoli", "spinach", "tomato"]}
```
You get: onion, celery, carrots
10min video by @rattller
https://t.co/orcwWV3lYH
Attention is a technique that helps networks focus on the important parts of the input data and fade out the rest.
Global Attention vs. Local Attention: https://t.co/9wgVPn0Epj
#MachineLearning #NLProc #Research
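A rough sketch of the difference, assuming dot-product scores and a hard window for the local variant (published local-attention formulations often add extra weighting inside the window, which is omitted here). All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_attention(query, keys):
    """Attention weights over every source position."""
    return softmax([dot(query, k) for k in keys])


def local_attention(query, keys, center, half_width):
    """Attention weights only inside a window around `center`; zero elsewhere."""
    lo = max(0, center - half_width)
    hi = min(len(keys), center + half_width + 1)
    window = softmax([dot(query, k) for k in keys[lo:hi]])
    return [0.0] * lo + window + [0.0] * (len(keys) - hi)


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
query = [1.0, 0.5]
w_global = global_attention(query, keys)
w_local = local_attention(query, keys, center=2, half_width=1)
```

Global attention spreads weight across all five positions, while the local variant puts all of its weight on positions 1 to 3.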
This paper introduces TARS, which scales the current pre-train/fine-tune learning paradigm to few-shot and zero-shot learning scenarios 🔥 🔥
https://t.co/gUMkxNdJsi
#MachineLearning #NLProc #research #LearnInPublic
Prompt learning is a paradigm for adapting pre-trained language models (PLMs) to downstream NLP tasks. 🔥
Can we get away from manually creating these prompts and mine them automatically?
Interested in answering that question? 🤯 Watch Part 2 of the Paper Summary here - https://t.co/MCowXxFSfR
#AIml
This systematic survey organizes the research around "Prompt-based Learning", a new, powerful, and attractive learning paradigm in Natural Language Processing. 🔥
Paper Summary: https://t.co/VfFYRujNRf
#ArtificialIntelligence #NLProc #Researchpaper #MachineLearning
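To make the idea of prompt-based learning concrete, here is a toy sketch of a cloze-style template plus a label-word mapping (often called a verbalizer). The template, label words, and scores are made up for illustration, and the language model itself is stubbed out as a plain dictionary of scores for the `[MASK]` slot.

```python
# Hypothetical template and label-word mapping for sentiment classification.
TEMPLATE = "{text} Overall, it was [MASK]."
VERBALIZER = {"great": "positive", "terrible": "negative"}


def build_prompt(text):
    """Wrap the raw input in the cloze template."""
    return TEMPLATE.format(text=text)


def classify(mask_scores):
    """Return the label whose label word scored highest at the [MASK] slot."""
    best_word = max(VERBALIZER, key=lambda w: mask_scores.get(w, float("-inf")))
    return VERBALIZER[best_word]


prompt = build_prompt("I loved this movie.")
# Stand-in for a masked language model's scores at [MASK] (assumed values).
fake_scores = {"great": 0.71, "terrible": 0.04, "fine": 0.10}
label = classify(fake_scores)
```

The task is recast as filling in a blank, so the pre-trained model can be used without adding a new classification head.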