i3open - the Innovation Information Initiative

@I3Open

I³ is the Innovation Information Initiative: open datasets, citation graphs, and code

Joined March 2021

53 Following

175 Followers

54 Posts

i3open - the Innovation Information Initiative @I3Open

8 months ago

hi, there was some confusion about the deadline being last night vs. end of the week. we'll leave it open until midnight Friday, thanks!

Matt Marx

@marxmatt

9 months ago

friendly reminder! tomorrow is the deadline to submit a paper for the @I3Open Technical Working Group: https://t.co/gOz1i1F1Yv

as I've written before, this is not a typical academic conference. we're focused not on research results but on *datasets* and methods for building datasets.

this year we're especially interested in the use of LLMs and other machine-learning methods for building and linking large-scale data, including how to take advantage of these new tools in a cost-conscious way.


i3open - the Innovation Information Initiative @I3Open

10 months ago

Happy to announce our next i3 Upskilling session, Thursday August 21 at noon (New York time / EDT). ➡️"Using Large Language Models without Blowing Your Research Budget"⬅️ Hosts: Navid Asgari (Fordham) and Deepak Nayak (OSU) Register here: https://t.co/7ECftaPAAn


I3Open retweeted

Matt Marx

@marxmatt

10 months ago

I’m happy to announce our next @I3Open Upskilling session, Thursday August 21 at noon (New York time / EDT). By far, our most requested topic was Large Language Models, so I’m excited that I was able to enlist Navid Asgari (Fordham) and his coauthor Deepak Nayak (OSU) for this session. Navid co-founded Cogneunce, an AI-based mental healthcare startup, and is also a research fellow at IBM Watson.

here's a summary:

Large language models (LLMs) are opening new possibilities for research, especially in tasks like classification, sentiment or theme extraction, and sub-corpus analysis. But navigating the growing range of models and tools can be overwhelming, and many researchers worry about cost, data quality, and hallucination.

This session offers a practical, research-focused overview of how to use LLMs effectively and affordably. We’ll compare model types, discuss open vs. closed access, and walk through strategies like prompt design, retrieval-augmented generation (RAG), and lightweight fine-tuning. The focus will be on helping you choose the right tools for your research tasks, without compromising on accuracy or breaking the bank.

Sound interesting? Register for the zoom at this link: https://t.co/xuyBoGRVEW

i3open - the Innovation Information Initiative @I3Open

about 1 year ago

brief update: we just received word that the PatentsView contract has been renewed for an additional year, starting tomorrow. I'm not sure what will happen next year, but for now the data will continue to be updated.



I3Open retweeted

Dror Shvadron @DShvadron

about 1 year ago

Quick update regarding PatentsView metadata: the final datasets, including granted, pre-grant and beta tables, are now available on the I3 BigQuery data repository. Link: https://t.co/9fj5apAcED Join our mailing group: https://t.co/TZ8SZge9xK


i3open - the Innovation Information Initiative @I3Open

about 1 year ago

update: unclear that the patentsview site will come down today (no formal announcement yet), but just in case we've posted all data from the 12/31/2024 release. details here: https://t.co/eD0ap049KV


i3open - the Innovation Information Initiative @I3Open

about 1 year ago

Update: 3/28 has been confirmed to me as last day for patentsview website. metadata have been posted to a permanent archive, working to find an archive large enough for the remaining ~220G of (compressed) full-text files.


i3open - the Innovation Information Initiative @I3Open

about 1 year ago

Dear Friends, we were advised earlier today that the PatentsView data many of us rely on may soon shut down. @I3Open has archived all metadata and full-text files, both granted and pre-grant. We plan to upload these to our BigQuery Workspace shortly & will update when complete.

i3open - the Innovation Information Initiative @I3Open

over 1 year ago

let us know what topics we should cover at the next Upskilling session

Matt Marx

@marxmatt

over 1 year ago

huge thanks @rogermasclans for leading our first @I3Open Upskilling session! Roger did a 75-minute live demo of big-data wrangling using Google BigQuery and the i3-nber data repository. here's the recording (https://t.co/tMSIZNpl5R) for anyone interested.


i3open - the Innovation Information Initiative @I3Open

over 1 year ago

starting in about an hour! not too late to register

Matt Marx

@marxmatt

over 1 year ago

🚀Please join us for our first @I3Open Upskilling Session, "Intro to Google BigQuery" by @rogermasclans & @DShvadron
Friday 2/21 11am ET

New to BigQuery & SQL? Join our first hands-on webinar to:
🔹 Query massive datasets efficiently
🔹 Optimize costs & avoid common pitfalls
🔹 Use SQL + Python for reproducible research

register here: https://t.co/ihULrNxOHf


i3open - the Innovation Information Initiative @I3Open

over 1 year ago

70+ people registered for the first @I3Open Upskilling session this Friday! Can we hit triple digits? https://t.co/SDevKF8CGr


I3Open retweeted

Dror Shvadron @DShvadron

over 1 year ago

I'm looking forward to this! We’re hosting lots of innovation data on the @I3Open BigQuery repo. Join us on Feb 21st for our first webinar. Roger Masclans (@rogermasclans) will cover efficient querying, cost optimization, and key use cases. Register here https://t.co/P7PsPDj4UX


I3Open retweeted

Matt Marx

@marxmatt

over 1 year ago

Releasing an open dataset based on @MBikard's dissertation regarding "idea twins." David Hsu and I scaled up his algorithm to the entire Web of Science, scraping Google Scholar to detect adjacent co-citation in PDFs. Here's the server farm in my basement 1/

https://t.co/73W3UTKsk9

I3Open retweeted

Matt Marx

@marxmatt

over 1 year ago

one last (I promise!) update from @I3Open's big weekend: ➡️the 2025 batch of i3 Fellows⬅️ funded by the Alfred P. @SloanFoundation, Fellows receive a stipend and attend i3 Technical Working Group Meetings. we seek Ph.D students engaged in open datasets. here is this year's batch, in reverse alphabetical order 1/

I3Open retweeted

Matt Marx

@marxmatt

over 1 year ago

thanks everyone for making the 2024 @I3Open technical working group so fun. none of this would have been possible without the support of the Alfred P. @SloanFoundation. if you would like to join our email list for updates, go here ➡️https://t.co/CzbAhEz2Aj⬅️


i3open - the Innovation Information Initiative @I3Open

over 1 year ago

@sat_yaki This includes all patents, whether granted or not, from many sources: leading to 16M patents after cleaning and deduping, with translated assignee names, tags, and non-cite measures of patent quality. #i3

https://t.co/95VbxT5Il9


i3open - the Innovation Information Initiative @I3Open

over 1 year ago

Looking forward to today's Innovation Information Initiative (I3) technical working group! #i3 You can follow the program here: https://t.co/r9bVSojqcE

https://t.co/hIX5DjazvS


i3open - the Innovation Information Initiative @I3Open

over 1 year ago

@sat_yaki Josh Lerner on creating a new China patent dataset and its implications: https://t.co/ZndKDubjda


I3Open retweeted

Matt Marx

@marxmatt

over 1 year ago

Satyaki Chakravarty (Università Cattolica del Sacro Cuore, Milano) has created a dataset of patents (and applications) in India, which are undercounted in commonly-used sources.

finds increasing geographic diversity of patents in India, a surge in Mumbai, and huge growth in mechanical engineering

key question from Bronwyn Hall: does this mean there's more *invention* in India vs. greater awareness of the practice of patenting inventions


I3Open retweeted

Matt Marx

@marxmatt

over 1 year ago

@mayadurvasula, from our first batch of @I3Open Fellows, is back for a 3rd time to show that the performance of commercial LLMs (gpt-4o) can be matched by retraining open/simpler models (BERT) with a small sample of commercial encodings

https://t.co/Mz0cwnjbgo

