A research center at Harvard working to strengthen society’s connection to knowledge by advancing our access to and understanding of the data that shapes AI.
Amazing work from an amazing team using @instdin’s Institutional Books data release. Their dedication to detail and accuracy is sorely missing from the vast majority of historical-data work from the AI community. Yet there’s so much work to be done and so much benefit to getting it right.
Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details: https://t.co/v4QpvSsSmA
We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. https://t.co/10BsSiIIMM
When libraries participate in Google Books, Google not only scans their books, it also makes a wealth of image, OCR, and metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging.
What is the pathway towards greater diversity in data and AI?
Hear from Professor Ruth Okediji, scholar of IP Law at Harvard Law School, who will be in conversation with Assistant Dean Amanda Watson of the Harvard Law School Library on Oct 22 at 2PM.
https://t.co/20ymDIQmVK
Can a small visual language model read documents as effectively as models 27 times its size?
Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.
This Monday, @instdin will host @petrknoth to share his experience leading CORE ("The world’s largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.
Tomorrow, it's our pleasure to host @ayahbdeir to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually. Cohosted by the @instdin and @BKCHarvard.
We hope Institutional Books will be the beginning of a process that makes millions more books accessible to the public for a variety of uses.
We welcome feedback as we continue to expand this dataset, refine its contents, and sharpen our process.
https://t.co/gPuXtKAayI
We look forward to growing Institutional Books through community. We welcome collaboration from researchers and model makers as we:
- Evaluate the dataset’s impact on model outputs
- Continue to refine our OCR pipelines
View the dataset on Hugging Face: https://t.co/t2dBPTjHaZ
I've loved writing words, while loops and wandering wectors, so I'm thrilled to join the @instdin team at Harvard as the director of community and communications! https://t.co/B6ZgWRAevG
As the Institutional Data Initiative (@instdin) expands its mission, we’re announcing a collaboration with the Boston Public Library (@BPLBoston) to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at BPL. 🧵
I'm pleased to announce we're expanding our mission at the Institutional Data Initiative (@instdin) with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work.