Institutional Data Initiative @instdin - Twitter Profile

instdin retweeted

about 1 month ago

Amazing work from an amazing team using @instdin’s Institutional Books data release. Their dedication to detail and accuracy is sorely missing from the vast majority of historical-data work from the AI community. Yet there’s so much work to be done and so much benefit to getting it right.

1

3

5

3

3K

instdin retweeted

Greg Leppert @leppert

7 months ago

Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details: https://t.co/v4QpvSsSmA

0

1

0

140

instdin retweeted

Greg Leppert @leppert

7 months ago

We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. https://t.co/10BsSiIIMM

1

2

1

0

128

instdin retweeted

Greg Leppert @leppert

7 months ago

When libraries participate in Google Books, Google not only scans their books, it also makes a wealth of image, OCR, and metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging.

1

151

instdin retweeted

Institutional Data Initiative @instdin

8 months ago

What is the pathway towards greater diversity in data and AI? Hear from Professor Ruth Okediji, scholar of IP Law at Harvard Law School, who will be in conversation with Assistant Dean Amanda Watson of the Harvard Law School Library on Oct 22 at 2PM. https://t.co/20ymDIQmVK

0

1

0

104

Institutional Data Initiative @instdin

9 months ago

Join us tomorrow at 10AM EST: https://t.co/FakraXOEzv

Institutional Data Initiative @instdin

9 months ago

Can a small visual language model read documents as effectively as models 27 times its size? Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.

1

0

143

0

46

Institutional Data Initiative @instdin

9 months ago

Register to join the talk virtually: https://t.co/sP06P70x2o

0

44

instdin retweeted

Greg Leppert @leppert

12 months ago

This Monday, @instdin will host @petrknoth to share his experience leading CORE ("The world’s largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.

1

2

0

532

instdin retweeted

Greg Leppert @leppert

12 months ago

Tomorrow, it's our pleasure to host @ayahbdeir to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually. Cohosted by @instdin and @BKCHarvard.

1

5

2

1

952

Institutional Data Initiative @instdin

12 months ago

We hope Institutional Books will be the beginning of a process that makes millions more books accessible to the public for a variety of uses. We welcome feedback as we continue to expand this dataset, refine its contents, and sharpen our process. https://t.co/gPuXtKAayI

0

2

0

1

201

Institutional Data Initiative @instdin

12 months ago

Today we released Institutional Books 1.0, a 242B token dataset from Harvard Library's collections, refined for accuracy and usability. 🧵

3

36

12

18

9K

Institutional Data Initiative @instdin

12 months ago

We look forward to growing Institutional Books through community. We welcome collaboration from researchers and model makers as we:
- Evaluate the dataset’s impact on model outputs
- Continue to refine our OCR pipelines

View the dataset on Hugging Face: https://t.co/t2dBPTjHaZ

1

3

0

318

instdin retweeted

Fels @felchang

about 1 year ago

I've loved writing words, while loops and wandering wectors, so I'm thrilled to join the @instdin team at Harvard as the director of community and communications! https://t.co/B6ZgWRAevG

2

11

2

0

899

instdin retweeted

Greg Leppert @leppert

about 1 year ago

As the Institutional Data Initiative (@instdin) expands its mission, we’re announcing a collaboration with the Boston Public Library (@BPLBoston) to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at BPL. 🧵

1

6

7

2

2K

instdin retweeted

Greg Leppert @leppert

about 1 year ago

I'm pleased to announce we're expanding our mission at the Institutional Data Initiative (@instdin) with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work.

2

15

5

0

2K
