Kyle Waters @kylewaters_ - Twitter Profile

about 1 month ago

7/ links to the paper, data, and harbor adapter below! Paper: https://t.co/ntgrn8XBgS Data: https://t.co/16jWm4whGf Harbor Adapter: https://t.co/NXWNBvtjxm

Kyle Waters @kylewaters_

about 1 month ago

1/ excited to announce COMPOSITE-STEM, a new benchmark of 70 scientific tasks sourced from experts on @PortexAI. Agents may soon assist in R&D, but evals are a critical step in building the trust needed to get there. We've open-sourced this dataset to help advance agent evals ⤵️ https://t.co/SOdkvCHmfG

Kyle Waters @kylewaters_

about 1 month ago

6/ integration with harbor: all COMPOSITE-STEM tasks are fully Harbor-compliant. Harbor is an open-source agent eval framework developed by the team behind TerminalBench. We've natively integrated Harbor within the Portex Datalab. https://t.co/neu4Uzq2Oj

Kyle Waters @kylewaters_

3 months ago

@r0ck3t23 But RL environments are still data... bottleneck has just shifted to designing rubrics & evals for harder-to-verify tasks in knowledge work (law/finance) & frontier scientific research. You still need experts in the loop to define objective criteria for rewarding success.

Kyle Waters @kylewaters_

7 months ago

Amazing couple days of conversations at the @PyTorch Conference in SF - very clear that evals & data sourced from subject-matter experts will play a critical role in advancing AI performance in economically valuable settings.

PortexAI

@PortexAI

7 months ago

Day 2 of #PyTorchCon 🔥 What a ride. Talked with folks using #PyTorch to fine-tune models for drug discovery, cancer research, autonomous vehicles and, of course, customer support! Thanks @PyTorch for having us! https://t.co/Yh2ukhRegw

kylewaters_ retweeted

⿻ Andrew Trask

@iamtrask

8 months ago

IMO: Ilya is wrong
- Frontier LLMs are trained on ~200 TBs of text
- There's ~200 Zettabytes of data out there
- That's about 1 billion times more data
- It doubles every 2 years

The problem is the data is private. Can't scrape it. The problem is not data scarcity, it's data access. The solution is attribution-based control (article below) "Unlocking a Million Times More Data For AI"
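
As a quick sanity check on the arithmetic in the tweet above, the sketch below compares the two quoted figures. It assumes decimal SI units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes), which the tweet does not specify. The variable names are illustrative only.

```python
# Sanity check of the ratio quoted in the tweet, assuming decimal SI units
# (the tweet does not say whether it means TB/ZB or TiB/ZiB).
TB = 10**12  # bytes in one terabyte (SI)
ZB = 10**21  # bytes in one zettabyte (SI)

training_text_bytes = 200 * TB  # "~200 TBs of text"
world_data_bytes = 200 * ZB     # "~200 Zettabytes of data"

# Integer division is exact here because both figures are multiples of 200 TB.
ratio = world_data_bytes // training_text_bytes
print(f"{ratio:,}")  # 1,000,000,000
```

Under these assumptions the ratio is 10^9, which matches the tweet's "about 1 billion times more data".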

Kyle Waters @kylewaters_

8 months ago

8/ The lack of any reliable data valuation framework is a massive blocker to surfacing novel datasets for AI. Auctions are remarkable engines for pricing non-alike goods. The data economy deserves the same foundation, and it’s worth building.

Kyle Waters @kylewaters_

8 months ago

1/ AI isn't just a compute race anymore. It's a data race too. Labs are paying top dollar for differentiated, high-signal data. It's clear now is the time to experiment with new approaches to valuing and incentivizing the creation of frontier AI data. https://t.co/4MyJIRHGGY

Lucas Nuzzi

@LucasNuzzi

8 months ago

AI has kicked off a gold rush for data, with OpenAI alone projecting $8B in data-related expenses by 2030. The challenge now is finding a reliable way to value data in this era. Our latest on data valuation techniques: https://t.co/QJ3vpmf6tH

Kyle Waters @kylewaters_

8 months ago

7/ We've also started exploring a data valuation framework with some of our early users on the Datalab. We're still refining it, but it takes into account a dataset's key features like uniqueness, quality, modality, freshness, etc. https://t.co/IxH4CmgoaG

kylewaters_ retweeted

PortexAI

@PortexAI

9 months ago

Noticing a trend? Specialized models continue to beat foundation models on task performance, cost, and latency. The emerging design pattern for agents is a foundation-model brain that can invoke the most suitable tool for a given task.

kylewaters_ retweeted

PortexAI

@PortexAI

9 months ago

GPT-4b micro is a model trained exclusively on specialized biological data. It was used to reverse cellular aging with a 50x improvement in efficiency relative to previous approaches. A testament to the power of narrow AI + specialized data. Amazing overview by @rowancheung:
