wenphan @wenphan - Twitter Profile

wenphan retweeted

over 1 year ago

Time to set a new standard for code retrieval! Read more in our blog, https://t.co/0lL99O7eSR, and in https://t.co/107kJG2Rqx on evaluating code retrieval. Start building with @VoyageAI today: the first 200M tokens are on us!

wenphan retweeted

Voyage AI by MongoDB

@VoyageAI

over 1 year ago

Voyage created a total of 238 new high-quality, reasoning-intensive code retrieval datasets that address the shortcomings of existing benchmarks (noisy labels, overly simplistic tasks, and data contamination). voyage-code-3 outperforms all other models in every group of datasets.

wenphan retweeted

Voyage AI by MongoDB

@VoyageAI

over 1 year ago

📢 Announcing voyage-code-3 embedding model!

1. more accurate: +14% gain over OpenAI-v3-large
2. flexible dimension (Matryoshka): 256-2048
3. quantized embeddings: float, int8, binary
4. new Pareto frontier: (binary, 256 dim.) is 6% better than OpenAI (float, 3072 dim.) 🧵🧵
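The feature list above names two storage tricks: Matryoshka-style truncation (using a prefix of the full vector as a shorter embedding) and quantization down to binary. The sketch below illustrates the generic techniques in plain Python, assuming unit-norm embeddings and simple sign-based binarization. It is not Voyage AI's API or its actual quantization scheme, and the helper names (`truncate_and_normalize`, `binarize`, `hamming_similarity`, `dot`) are hypothetical. For scale, assuming 32-bit floats: a 3072-dim float vector takes 12,288 bytes, while a 256-dim binary vector takes 32 bytes, a 384x reduction.

```python
import math


def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """Matryoshka-style truncation: keep the first `dim` coordinates, then
    rescale to unit L2 norm so cosine similarity stays a plain dot product."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must satisfy 0 < dim <= len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def binarize(vec: list[float]) -> int:
    """Sign-based binary quantization: bit i is 1 if vec[i] > 0, else 0.
    Bits are packed into one Python int (a stand-in for a byte array)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_similarity(a: int, b: int, dim: int) -> float:
    """Fraction of the `dim` bit positions on which two packed vectors agree."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / dim


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 6-dim "embeddings" made up for illustration; real models emit 256-2048 dims.
    query = [0.9, -0.2, 0.4, 0.1, -0.3, 0.05]
    doc = [0.8, -0.1, 0.5, -0.2, -0.4, 0.10]

    q4, d4 = truncate_and_normalize(query, 4), truncate_and_normalize(doc, 4)
    print("float cosine @4 dims:", round(dot(q4, d4), 3))
    print("binary similarity @4 dims:", hamming_similarity(binarize(q4), binarize(d4), 4))
```

Note that truncation only preserves retrieval quality when the model was trained for it (Matryoshka representation learning); chopping the prefix off an arbitrary embedding gives no such guarantee.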

wenphan retweeted

Ben Clavié

@bclavie

over 2 years ago

There's a new exciting reranking API from @Voyage_AI_! It's already supported in `rerankers` v0.1.2, try it out in your pipelines! `pip install --upgrade rerankers`

wenphan retweeted

Connor Shorten

@CShorten30

over 2 years ago

Voyage AI (@Voyage_AI_) is the newest giant in the embedding, reranking, and search model game! 🔥

I am SUPER excited to publish our latest Weaviate podcast with Tengyu Ma (@tengyuma), Co-Founder of Voyage AI and Assistant Professor at Stanford University! 🎙️

We began the interview with a deep dive into everything about embedding model training and contrastive learning theory. Tengyu delivered a masterclass on everything from scaling laws to multi-vector representations, touching on ColBERT and Matryoshka embeddings, neural architectures, representation collapse, data augmentation, semantic similarity, and more! I am beyond impressed with Tengyu's extensive knowledge and explanations of all these topics. 🧠

The next chapter dives into a case study in which Voyage AI fine-tuned an embedding model for the LangChain documentation. This is an absolutely fascinating example of the role of continual fine-tuning with very new concepts (for example, very few people were talking about chaining together LLM calls 2 years ago), as well as of the data-efficiency advances in fine-tuning. ⚙️

We concluded by discussing ML systems challenges in serving an embeddings API, particularly the challenge of detecting whether a request is for batch or query inference, and the optimizations that go into either, say, ~100ms latency for a query embedding or maximizing throughput for batch embeddings. 🚀

YouTube: https://t.co/w5IbkqMprG

Spotify: https://t.co/YmGpnCH6y3

wenphan retweeted

Voyage AI by MongoDB

@VoyageAI

over 2 years ago

Rerankers refine the retrieval in RAG. 🆕📢 Excited to announce our first reranker, rerank-lite-1: state-of-the-art in retrieval accuracy on 27 datasets across domains (law, finance, tech, long docs, etc.), enhancing various search methods, vector-based or lexical. 🧵

wenphan retweeted

LangChain

@LangChain

over 2 years ago

⛵ @Voyage_AI_ Embedding Integration Package ↗️

Use the same custom embeddings that power Chat LangChain via the new langchain-voyageai package! Recommended by @AnthropicAI as their preferred embedding provider, Voyage AI builds custom embedding models for your company or domain, improving retrieval quality over your document types.

ChatLangChain: https://t.co/EsAAjrpHgB

Python Docs: https://t.co/ClpxYF9NDQ

wenphan @wenphan

about 3 years ago

@sameerajayasoma @lizrice Ugh!! I forgot to get @lizrice’s autograph. Yes, great tutorial. I’m reading my copy on the flight back.

wenphan retweeted

Erin LeDell - ledell.bsky.social @ledell

over 4 years ago

My colleague and friend, Leland Wilkinson, passed away on Friday. It was such an honor to work with Lee, and to be his friend. He was brilliant, he made incredible contributions to visualization and statistical computing, and on top of all that he was a genuinely kind person.

wenphan @wenphan

over 4 years ago

@Navdeep_Gill_ Nice! But so structured. Try some semi-structure :)

wenphan @wenphan

almost 5 years ago

@vllry @NetworkAndK8s Congrats! I’ve been reading since the early release and I love the way you explain things!

wenphan @wenphan

about 6 years ago

@annkspencer Cha-ya is one of our favorites too!!

wenphan @wenphan

about 6 years ago

@Navdeep_Gill_ @rasbt I’ll get my kid started too after her school Zoom, and we can compare notes.

wenphan @wenphan

over 6 years ago

@annkspencer I’m looking at all the booze and wondering if there’ll be a party to help with those?!?! ;)

wenphan retweeted

Yann LeCun

@ylecun

over 6 years ago

Some folks still seem confused about what deep learning is. Here is a definition: DL is constructing networks of parameterized functional modules & training them from examples using gradient-based optimization.... https://t.co/jmHpWZOMH8