CALLHOME German Lexicon Second Edition: morphological, phonological, stress and frequency information for 318,809 German words from CELEX2 and transcripts of telephone speech between German speakers, with a pronunciation dictionary https://t.co/dRQuZlURve
CALLHOME German Second Edition brings original speech and transcript datasets up to date with new transcripts and revised directories, file formats and documentation https://t.co/MCoK4nLsWP
MADCAT Phases 1-3 Composite Evaluation Set: 1643 Arabic images w/ annotations derived from web text and newswire documents copied by hand, scanned, annotated and translated for automatic conversion of foreign language text images into English transcripts https://t.co/WfY4ECgOkl
LDC’s May newsletter announces three new publications: MADCAT Phases 1-3 Composite Evaluation Set, CALLHOME German Second Edition and CALLHOME German Lexicon Second Edition https://t.co/ySCspeCY03
More LDC data in the LORELEI series: LORELEI Somali Representative Language Pack features monolingual and parallel text, annotations, software tools and more for human language technology development to address emergent situations https://t.co/fhSShIGJiA
MATERIAL Tagalog-English Language Pack has 100 hours of Tagalog conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross-language information retrieval https://t.co/OhWaVOFAXB
DEFT Chinese and English Light and Rich ERE Parallel Annotation: 179 Chinese-English discussion forum documents labeled for entities, relations and events, including coreference (light) and event hoppers (rich), developed by LDC for the DARPA DEFT program https://t.co/gBZ7zO9Uro
Check out our April newsletter for LDC’s latest publications – DEFT Chinese and English Light and Rich ERE Parallel Annotation, MATERIAL Tagalog-English Language Pack and LORELEI Somali Representative Language Pack https://t.co/ySCspeCY03
CALLHOME Spanish Lexicon Second Edition: morphological, phonological, stress & frequency info for 45,547 Spanish words from transcripts of telephone speech between Spanish speakers and Spanish news text, with a pronunciation dictionary & G2P tools https://t.co/kL6u0RnBXD
CALLHOME Spanish Second Edition brings original speech and transcript datasets up to date with new transcripts and revised directories, file formats and documentation https://t.co/lRkOLFDBzh
Ancient Chinese WordNet contains 55,100 records of words from the Pre-Qin period (before 221 BCE) linked to a corresponding synset in Princeton WordNet 1.6, covering 22 noun categories, 15 verb categories, and additional adjective and adverb categories https://t.co/zAcdqnapZd
LDC’s March newsletter features the release of three new publications – Ancient Chinese WordNet, CALLHOME Spanish Second Edition and CALLHOME Spanish Lexicon Second Edition https://t.co/ySCspeCY03
The 50th Penn Linguistics Conference (PLC) is happening Feb 28–Mar 1. PLC brings together students, faculty & researchers interested in languages & linguistics to share new work and connect with peers. We wish everyone a great and productive conference. https://t.co/oOp90C8nb3
More LDC data in the LORELEI series: LORELEI Russian Representative Language Pack features monolingual and parallel text, annotations, software tools and more for human language technology development to address emergent situations https://t.co/mXDettebqj
Happy International #MotherLanguageDay! This year’s theme celebrates youth voices on multilingual education – emphasizing that language is central to identity, learning, well-being and participation in society. Let’s celebrate every language, every voice. https://t.co/zNn7bcgsSJ
KAIROS Schema Learning Background Source Data: 14K English & Spanish multimodal resources collected by LDC for a Schema Learning Corpus; schemas were used with event extraction to characterize & make predictions about real-world events in the corpus https://t.co/sZHUUOAV0a
2022 NIST Language Recognition Evaluation Test and Development Sets: 222 hours of telephone speech and broadcast narrowband speech in 14 languages, plus turnkey evaluation documentation, emphasizing African languages and related English and French dialects https://t.co/OuvEiXH3gw
Catch up on 2026 membership discounts, spring data scholarship awards and the release of three new publications in LDC’s February newsletter https://t.co/ySCspeDvPB
MATERIAL Swahili-English Language Pack has 112 hours of Swahili conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross-language information retrieval https://t.co/tBH1Jirpva
CALLHOME Japanese Lexicon Second Edition: morphological, phonological and stress information for 80,688 Japanese words from transcripts of telephone conversations between native Japanese speakers, along with a pronunciation dictionary and G2P tools https://t.co/z1H61wM67N