Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt - Twitter Profile

Pinned Tweet

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

about 3 years ago

Read and explore this rich interactive of 20 *million* research articles from PubMed, a project we're releasing today with @ritagonmar and @hippopedoid. It's a *beautiful* embedding structure, a fascinating, complete corpus. Some highlights (thread) https://t.co/qzcZd2eKnB

13

738

213

292

136K

benmschmidt retweeted

Nomic

@nomic_ai

7 months ago

AI systems excel in domains that have abundant coverage in internet data. Large sectors of the economy are not digital-native. Their data, processes, and workflows are governed by signals that are out of distribution of foundation models. Introducing the new Nomic Platform

1

29

13

6

10K

benmschmidt retweeted

Andriy Mulyar

@andriy_mulyar

10 months ago

Nomic has a new X account. Stay tuned for some exciting updates over the next few months.

1

8

1

0

2K

benmschmidt retweeted

Nomic

@nomic_ai

10 months ago

We're re-branding! This is now the new official Nomic X account! Follow us for updates on new open-source AI models and platform developments!

3

16

3

0

8K

benmschmidt retweeted

Andriy Mulyar

@andriy_mulyar

12 months ago

hiring an ml intern to work on vlm postraining for a special project, reports directly to me. must be exceptional. apply via dms.

10

221

14

95

40K

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

about 1 year ago

In general I try not to post high-quality original content to this account anymore, and I feel pretty confident that the above post doesn't violate that practice.

0

5

0

404

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

about 1 year ago

it works

1

9

0

671

benmschmidt retweeted

CalCo

@calco_io

about 1 year ago

Introducing Atlas Analyst: The Data Agent for Data Analytics. Ask questions, get answers with references to your data, and immediately take action based on those insights.

1

59

14

32

5K

benmschmidt retweeted

Alexander Doria

@Dorialexander

over 1 year ago

Announcing the release of Common Corpus 2. The largest fully open corpus for pretraining comes back better than ever: 2 trillion tokens with document-level licensing, provenance and language information. https://t.co/sdN6qNJMHW

7

387

74

155

41K

benmschmidt retweeted

Andriy Mulyar

@andriy_mulyar

over 1 year ago

Hugging Face is the hub for AI datasets and today we bring every dataset to life with Nomic's first-class Hugging Face data connector. With a few clicks, you can now vector search, curate, and collaborate on any dataset in @huggingface https://t.co/YT8zu4s7fb

0

20

3

5

2K

benmschmidt retweeted

Daniel van Strien @vanstriendaniel

over 1 year ago

I created a map for Hub dataset cards using this new connector in less than 5 minutes.

1

14

1

2

991

benmschmidt retweeted

CalCo

@calco_io

over 1 year ago

Vector Search Any Hugging Face Dataset 🤗 Introducing the @huggingface Datasets Connector in Nomic Atlas https://t.co/eNVRuqiXO2

1

96

22

37

16K

benmschmidt retweeted

CalCo

@calco_io

over 1 year ago

Introducing Open-Source, On-Device Inference-Time Compute in GPT4All

- New: GPT4All Reasoner v1
- Support for Code Interpreter, Tool Calling and Code Sandboxing

Inference-time compute is now available to every laptop in the world.

5

354

61

267

34K

benmschmidt retweeted

Wilson Marcílio Jr @EstecioJunior

over 1 year ago

Comparing ModernBERT and BERT embeddings reveals some nice properties. The embeddings from the two base architectures show different features for this dataset in terms of class cohesion. https://t.co/a7C06Ei50n

1

20

6

9

2K

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander which is kinda weird actually given that they did the Bodleian but not BNF -- do you have any sense what library those scans would be from?

0

42

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander A bit more. Here's the French counts in the second half of the 17C (columns are `year, words, pages, books`) from https://t.co/IZIFVs4e3V

1

2

0

117

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander Also OTOH nobody in DH uses google books/ngrams, so they probably don't view it as a relevant field.

0

22

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander My guess would be that they occasionally chat with Bob Darnton or something, but they're not interested in the DH people because they figure they have all the computer expertise so they just need to check that against book expertise.

1

0

25

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander Still though after those changes what I'm seeing is that GB has high dozens to low hundreds of books annually in the english corpus for the 17C. EEBO is like 10x that, although maybe a lot of EEBO is Latin?

1

0

62

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander I don't think they really care? Not sure. I'm kind of amazed on reflection that in the last 15 years I don't think I've ever met a single person actually working on Google Books, even though they funded my postdoc.

2

0

68

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander And hah they didn't even bother to add it to the download page! https://t.co/mbb3QC4zdL

0

1

0

39

Ben Schmidt / @benmschmidt@sigmoid.social @benmschmidt

over 1 year ago

@Dorialexander Oh shit and there was a 2024 update too. I haven't seen a word about that.

1

0

53
