Kushal Arora @karora4u - Twitter Profile

karora4u retweeted

about 2 months ago

Our white paper just came out on arXiv https://t.co/HcSCepvXfM. We open-sourced it all https://t.co/lfySRERfeQ. Our project website also has links to the white paper, the weights, and more videos https://t.co/DRnViyXRWz

karora4u retweeted

Jean Mercat @MercatJean

about 2 months ago

Releasing VLA Foundry: an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. End-to-end control from language pretraining to action-expert fine-tuning — no more stitching together incompatible repos.

karora4u retweeted

Katherine Liu @robo_kat

almost 2 years ago

ReFiNe expands on a neat idea we first presented at CoRL with Recursive Octree Auto-Decoders: that recursion can enable very high compression rates of 3D data. In ReFiNe, we use this property to represent continuous fields and can decode multiple NeRFs/SDFs with a single network.

karora4u retweeted

Sergey Zakharov @ZakharovSergeyN

almost 2 years ago

Excited to introduce our paper, ReFiNe, at #SIGGRAPH2024 this Thursday! Learn how we encode multiple assets as continuous neural fields with high precision & low memory usage by exploiting object self-similarity. @RaresAmbrus @robo_kat @adnothing Webpage: https://t.co/JoSqVTIDBr

Kushal Arora @karora4u

almost 2 years ago

@Swarooprm7 Congrats Swaroop!

Kushal Arora @karora4u

almost 2 years ago

@ke_huang275 @achalddave Though it is difficult to say why the benchmarks improved with IT, my guess is that this is due to the DCLM-IT data, which contains datasets such as Nectar, no_robots, and StarCoder2-Self-OSS-Instruct. These include math, code, and QA data that might help improve benchmark performance.

Kushal Arora @karora4u

almost 2 years ago

@ke_huang275 @achalddave We trained for 10 epochs because the AlpacaEval score kept improving beyond the first few epochs, so we decided to keep fine-tuning. Here is how AlpacaEval looked for each epoch: https://t.co/aSa6vtxN1a

karora4u retweeted

Sachin Grover @sachingrover

almost 2 years ago

I am looking for positions in LLM-based agents and in combining planning and learning techniques/systems. I have around 2.5 years of industry research experience, including two years at @PARCinc as a research scientist and multiple summer internships at @amazon @alexa99. 1/6

karora4u retweeted

Thomas Kollar @tkollar

almost 2 years ago

Building language models is difficult and requires high-quality preprocessing, modeling, evaluation, and large-scale training. At TRI we were significant collaborators on this project, and the resulting 7B model, DCLM-7B, is a significant achievement. It is a competitor to Mistral 7B and LLaMA-7B, even though it was trained on less data. And it's fully open. And that's just the start of the competition. Excited to see how others leverage these results to build even more capable language models and improve dataset quality.

Kushal Arora @karora4u

almost 2 years ago

For more details, see the paper https://t.co/YMfc46es7n, and the website: https://t.co/OuJPQLfTYk.

Kushal Arora @karora4u

almost 2 years ago

One thing I have come to greatly appreciate over the last year is the role of data filtering in building SOTA language models. DCLM introduces a 240T-token corpus, a 7B open-source model that is competitive with Llama 3 while using 2-6x fewer tokens, and a pipeline to build new datasets.

Vaishaal Shankar @Vaishaal

almost 2 years ago

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

karora4u retweeted

Aran Komatsuzaki @arankomatsuzaki

almost 2 years ago

DataComp-LM: In search of the next generation of training sets for language models

- Provides a corpus of 240T tokens from Common Crawl
- Trains an LM using their filtered dataset, which performs similarly on NLU tasks w/ 6.6x less compute than Llama 3 8B

proj: https://t.co/soBq1ZnAwT
abs: https://t.co/r8nWIHwq1t

karora4u retweeted

Luca Soldaini 🎀 @soldni

almost 2 years ago

Really impressed by the work DCLM folks did!!

karora4u retweeted

Achal Dave @achalddave

almost 2 years ago

Check out DataComp for language models! Open data, open code, open training recipe, and close to Llama3-8B performance. This has been a labor of love over the last year, a huge thanks to all the collaborators for helping make this happen!

Kushal Arora @karora4u

almost 2 years ago

Sedrick is an amazing researcher who has done great work on pre-training, scaling, evaluation, Japanese LMs, code models, VLMs, and more over the last year. If you are at NAACL, do get a coffee with him!

Sedrick Keh @sedrickkeh2

almost 2 years ago

I'm attending #NAACL2024 in Mexico City this week! Excited to chat about pre-training, evaluation, and multimodality! (also excited for🌮🌯🫔)

karora4u retweeted

Sedrick Keh @sedrickkeh2

about 2 years ago

Recurrent models like RWKV and Mamba have gained attention recently, but these can be costly to train and iterate on. What if we could simply... turn Mistral/Llama/Gemma into an RNN? 🎩🪄 Presenting our work, Linearizing Large Language Models! https://t.co/hbaSUWk8uc

karora4u retweeted

Sedrick Keh @sedrickkeh2

about 2 years ago

2024 has seen tons of cool work on RNNs (cc @RWKV_AI, @BlinkDL_AI, @GoogleDeepMind Griffin, @AI21Labs). We hope our work helps further research into linear models! Work done at @ToyotaResearch with @MercatJean @vslevic @sedrickkeh2 @karora4u @achalddave @adnothing @tkollar

Kushal Arora @karora4u

about 2 years ago

A really large in-the-wild robotics dataset from TRI colleagues and university partners, and a major step toward building robotics foundation models.

Alexander Khazatsky @SashaKhazatsky

about 2 years ago

After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset” DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices
