Emre @etunch - Twitter Profile

Pinned Tweet

3 months ago

AGI won't come from better LLMs and models, it'll come from better harnesses. So given the right harness, it's already here?

Rohit

@rohit4verse

3 months ago

https://t.co/H4KCn5WwNx

88

3K

309

9K

2M

0

82

Emre

@etunch

10 days ago

https://t.co/33QK1vm2eJ

0

28

Emre

@etunch

5 months ago

@emrefa we love that feeling!!

0

63

etunch retweeted

212.vc @212vc

10 months ago

Happy to see so many of our portfolio companies listed in @FastCompanyT's Startup 100 List this year! 🎉

1

10

3

2

684

etunch retweeted

Owain Evans

@OwainEvans_UK

11 months ago

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

281

8K

1K

5K

2M

etunch retweeted

Zafer

@ZaferElcik

about 1 year ago

Hello,

CrayonClub, which we have been working on for the past year, is finally live! 🎉

We look forward to your experiences, ⭐️ ratings, and reviews. Many thanks in advance for your support!

👉 App Store: https://t.co/U9alWbvoxd
👉 Play Store: https://t.co/wjTn8LbSEK

2

9

3

0

497

etunch retweeted

Grant Sanderson

@3blue1brown

over 1 year ago

I just put up a new video, which was a collaboration with Terence Tao about the cosmic distance ladder. You can find the full video on YouTube, and here's a bit of extra footage that didn't make it into the final.

89

6K

587

1K

306K

etunch retweeted

Chris Lattner

@clattner_llvm

over 1 year ago

@deedydas I’m glad I didn’t take this compiler class, I would have also gotten 0/100. No wonder people think compilers are scary, they shouldn’t be taught this way! It’s also flawed in many ways (and old) but I think this is more approachable https://t.co/FWECtSYs1o

42

6K

360

3K

964K

etunch retweeted

andrew chen

@andrewchen

over 1 year ago

this stat always surprises me

>50% of consumer in-app spend on iOS and Android is on mobile games 🤯

That's right, for iOS:
- $25.2B total spend (that's up +13.1%)
- $12.85B come from gaming
- Android is even more tilted towards gaming

the number is huge bc so much of the social media apps that take our time monetize through advertising, where you are the product, as opposed to letting you pay for the product!

22

298

26

159

45K

Emre

@etunch

almost 2 years ago

➕

J.R. Holmsted

@JHolmsted

almost 2 years ago

Every. Damn. Time.

117

5K

398

114

285K

0

52

Emre

@etunch

almost 2 years ago

🔥

Brian Roemmele

@BrianRoemmele

almost 2 years ago

Meet OPEN SOURCE AND FREE SakanaAI/ The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. I have been running a lot of tests on this for quite a bit. Enjoy uncensored SCIENCE. https://t.co/wXco5ZrlWF

15

522

110

370

38K

0

44

Emre

@etunch

almost 2 years ago

Brilliant

Tivadar Danka

@TivadarDanka

over 3 years ago

The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices. Encoding matrices as graphs is a cheat code, making complex behavior simple to study. Let me show you how!
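The correspondence in the tweet above can be sketched in a few lines. This is a minimal illustration, not the author's own construction: the matrix values are made up, and reading entry (i, j) as a weighted edge from node i to node j is an assumed convention (some texts use the opposite direction).

```python
# Illustrative 3x3 matrix; the values are invented for this sketch.
A = [
    [0, 2, 0],
    [0, 0, 1],
    [3, 0, 0],
]

def matrix_to_edges(matrix):
    """Read each nonzero entry matrix[i][j] as a weighted edge i -> j (assumed convention)."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w != 0
    ]

def edges_to_matrix(edges, n):
    """Rebuild the n x n matrix from (source, target, weight) edges."""
    matrix = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        matrix[i][j] = w
    return matrix

edges = matrix_to_edges(A)
rebuilt = edges_to_matrix(edges, len(A))
print(edges)
print(rebuilt == A)
```

Going from the matrix to the edge list and back recovers the original matrix, which is the sense in which the two encodings carry the same information.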

177

18K

3K

9K

2M

0

42

Emre

@etunch

almost 2 years ago

❤️

212.vc @212vc

almost 2 years ago

🎉 Congratulations to our portfolio companies, @AppSamurai, @boltinsightcom, @fazla_tr , @getmobil, @Insider, @Mall_IQ and @TrioMobil, for making the Startup 100 List by @FastCompanyT! Kudos to @B2Metric and @PhiTech_Bioinfo from @SimyaVC's portfolio for being listed 👏

0

19

3

2

2K

0

22

etunch retweeted

Andrej Karpathy

@karpathy

almost 2 years ago

In 2019, OpenAI announced GPT-2 with this post:
https://t.co/jjP8IXmu8D

Today (~5 years later) you can train your own for ~$672, running on one 8XH100 GPU node for 24 hours. Our latest llm.c post gives the walkthrough in some detail:
https://t.co/XjLWE2P0Hp

Incredibly, the costs have come down dramatically over the last 5 years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g. the FineWeb-Edu dataset). For this exercise, the algorithm was kept fixed and follows the GPT-2/3 papers.

Because llm.c is a direct implementation of GPT training in C/CUDA, the requirements are minimal - there is no need for conda environments, Python interpreters, pip installs, etc. You spin up a cloud GPU node (e.g. on Lambda), optionally install NVIDIA cuDNN, NCCL/MPI, download the .bin data shards, compile and run, and you're stepping in minutes. You then wait 24 hours and enjoy samples about English-speaking Unicorns in the Andes.

For me, this is a very nice checkpoint to get to because the entire llm.c project started with me thinking about reproducing GPT-2 for an educational video, getting stuck with some PyTorch things, then rage quitting to just write the whole thing from scratch in C/CUDA. That set me on a longer journey than I anticipated, but it was quite fun, I learned more CUDA, I made friends along the way, and llm.c is really nice now. It's ~5,000 lines of code, it compiles and steps very fast so there is very little waiting around, it has constant memory footprint, it trains in mixed precision, distributed across multi-node with NCCL, it is bitwise deterministic, and hovers around ~50% MFU. So it's quite cute.

llm.c couldn't have gotten here without a great group of devs who assembled from the internet, and helped get things to this point, especially ademeure, ngc92, @gordic_aleksa, and rosslwheeler. And thank you to @LambdaAPI for the GPU cycles support.

There's still a lot of work left to do. I'm still not 100% happy with the current runs - the evals should be better, the training should be more stable especially at larger model sizes for longer runs. There's a lot of interesting new directions too: fp8 (imminent!), inference, finetuning, multimodal (VQVAE etc.), more modern architectures (Llama/Gemma). The goal of llm.c remains to have a simple, minimal, clean training stack for a full-featured LLM agent, in direct C/CUDA, and companion educational materials to bring many people up to speed in this awesome field.

Eye candy: my much longer 400B token GPT-2 run (up from 33B tokens), which went great until 330B (reaching 61% HellaSwag, way above GPT-2 and GPT-3 of this size) and then exploded shortly after this plot, which I am looking into now :)
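As a rough check on the cost figure quoted in the tweet above, the implied hourly rates follow from simple division. The sketch below uses only the numbers stated in the tweet (~$672, one node of 8 GPUs, 24 hours); the variable names are illustrative, and the result is an implied average, not a quoted cloud price.

```python
# Figures taken from the tweet; all are approximate ("~$672").
total_cost_usd = 672
run_hours = 24
gpus_per_node = 8

# Implied average rate for the whole 8-GPU node, per hour.
node_hour_usd = total_cost_usd / run_hours

# Implied average rate per single GPU, per hour.
gpu_hour_usd = node_hour_usd / gpus_per_node

print(f"node-hour: ${node_hour_usd:.2f}, GPU-hour: ${gpu_hour_usd:.2f}")
```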

123

6K

743

4K

725K

etunch retweeted

Jim Rogers Nebraska @JimRogers_Nebr

about 2 years ago

@akarlin Sorry-- here's the link: https://t.co/bSoYkTZ7YU

1

46

5

36

2K

etunch retweeted

Jeff Barr ☁️

@jeffbarr

about 2 years ago

Thank you to everyone who brought this article to our attention. We agree that customers should not have to pay for unauthorized requests that they did not initiate. We’ll have more to share on exactly how we’ll help prevent these charges shortly. #AWS #S3 How an empty S3 bucket can make your AWS bill explode - https://t.co/KRgL9C1u9p

82

3K

538

675

1M

etunch retweeted

nano @nanulled

about 2 years ago

My speculation:
GPT2 is an advanced multi-transformer architecture that combines two transformers (Find and Replace).
The results speak for themselves.
This is from a paper that was published by anonymous authors.

11

195

23

157

35K

etunch retweeted

Jack Morris

@jxmnop

about 2 years ago

one of the most important things I know about deep learning I learned from this paper: "Pretraining Without Attention"

this is what I found so surprising:
these people developed an architecture very different from Transformers called BiGS, spent months and months optimizing it and training different configurations, only to discover that at the same parameter count, a wildly different architecture produces identical performance to transformers

this may imply that as long as there are enough parameters, and things are reasonably well-conditioned (i.e. a decent number of nonlinearities and connections between the pieces) then it really doesn't matter how you arrange them, i.e. any sufficiently good architecture works just fine

i feel there's something really deep here, and we may be already very close to the upper bound of how well we can approximate a given function given a certain amount of compute. so we should spend more time thinking about other questions, such as what that function should actually look like (what data? which objective function?) and how to make it more efficient

93

3K

406

3K

489K

etunch retweeted

Ian Johnson 🔬🤖

@enjalot

over 2 years ago

Where do dads keep all of their jokes? In a dad-a-base! But what does a dadabase look like when you try to retrieve a joke? Introducing Latent Scope: a new open source instrument for visualizing unstructured data

2

42

12

35

6K

etunch retweeted

Jason Citron

@jasoncitron

about 2 years ago

Big news for developers today on Discord. We’ve opened up the developer preview for user installable apps as well as HTML5 experiences for apps. This dramatically changes what’s possible to build on Discord. I can’t wait to see what y’all come up with! https://t.co/9MIcJ0Vv5X

40

667

72

347

266K
