Siddhartha Singh

@Sid____

ML Researcher @bfh_hesb Views are my own.

Zurich, Switzerland

Joined June 2009

420 Following

195 Followers

4.4K Posts

Sid____ retweeted

Daniel Paleka

@dpaleka

over 1 year ago

It has not been reported much, but I believe ETH Zurich has, as of last week, banned new Master and PhD students who attended a long list of universities in China, Russia, and Iran. 🧵

Sid____ retweeted

Daniel Han

@danielhanchen

over 1 year ago

Fixed a bug which caused all training losses to diverge for large gradient accumulation (GA) sizes.

1. First reported by @bnjmn_marie: GA is supposed to be mathematically equivalent to full batch training, but the losses did not match.
2. We reproduced the issue, and further investigation showed the L2 norm between bsz=16 and ga=16 was 10x larger.
3. The culprit was the cross entropy loss normalizer.
4. We ran training runs with denormalized CE loss, and all training losses matched.
5. We then re-normalized CE loss with the correct denominator across all gradient accumulation steps, and verified that all training loss curves now match.
6. We've already updated @UnslothAI with the fix, and wrote up more details in our blog post here: https://t.co/VdUkKN8dsB

This issue impacts all libraries which use GA, and simply averaging the per-step losses across GA steps does not work when sequence lengths vary.

This also impacts DDP and multi-GPU training that accumulates gradients. Please update Unsloth via "pip install --upgrade --no-cache-dir unsloth" and use "from unsloth import unsloth_train".

We have a Colab notebook using our fixed GA: https://t.co/1j3kxuD4mb and a Kaggle notebook: https://t.co/LVJPtOqSPw
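
The normalizer issue can be illustrated with a minimal sketch in plain Python. It assumes each micro-batch has already been reduced to a list of per-token cross-entropy losses for its non-padding tokens; the function names and numbers are hypothetical, and this is not Unsloth's actual implementation. It compares the full-batch mean against averaging per-step means (the pattern that breaks when token counts differ across steps) and against dividing every step's summed loss by the token count of the whole accumulated batch.

```python
from typing import List

# Hypothetical representation: one list of per-token CE losses per micro-batch,
# containing only non-padding tokens.
MicroBatch = List[float]


def full_batch_loss(micro_batches: List[MicroBatch]) -> float:
    """Reference: mean CE over every token of the full (unsplit) batch."""
    total = sum(sum(mb) for mb in micro_batches)
    n_tokens = sum(len(mb) for mb in micro_batches)
    return total / n_tokens


def naive_ga_loss(micro_batches: List[MicroBatch]) -> float:
    """Problematic pattern: normalize each micro-batch by its own token count,
    then average those per-step means over the GA steps."""
    step_means = [sum(mb) / len(mb) for mb in micro_batches]
    return sum(step_means) / len(step_means)


def fixed_ga_loss(micro_batches: List[MicroBatch]) -> float:
    """Corrected pattern: divide each step's summed loss by the token count
    across *all* GA steps, then accumulate."""
    n_tokens = sum(len(mb) for mb in micro_batches)
    accumulated = 0.0
    for mb in micro_batches:
        accumulated += sum(mb) / n_tokens  # this step's share of the full-batch mean
    return accumulated


micro_batches = [[2.0], [1.0, 1.0, 1.0]]  # 1 token in step 1, 3 tokens in step 2
print(full_batch_loss(micro_batches))  # 1.25
print(naive_ga_loss(micro_batches))    # 1.5
print(fixed_ga_loss(micro_batches))    # 1.25
```

When every step has the same number of tokens the naive and corrected versions coincide, which is why the mismatch only shows up with varying sequence lengths. Because gradients are linear in the loss, the same scaling carries over to the accumulated gradients. In PyTorch terms this roughly corresponds to computing cross entropy with reduction="sum" and dividing by the token count of the whole accumulated batch, instead of relying on the default per-micro-batch mean.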

Sid____ retweeted

Vik Paruchuri

@VikParuchuri

over 1 year ago

Find it here - https://t.co/f3YrtUqmrL

Sid____ retweeted

Jim Fan

@DrJimFan

over 1 year ago

Hitchhiker's guide to rebranding:
- Machine learning -> statistical mechanics
- Loss function -> energy functional
- Optimize the model -> minimize free energy
- Trained model -> reached equilibrium distribution
- KL divergence -> free energy difference
- Gaussian noise -> random thermal fluctuations
- Random step -> Brownian motion
- SGD -> directional Brownian motion
- GPU -> simulated particle accelerator
- Diffusion models -> Langevin dynamics
- Reinforcement learning -> control theory
- Robotics -> physical computation
- Audio learning -> 1D signal processing
- Image learning -> 2D signal processing
- Video learning -> 3D signal processing
- Multimodal models -> multidimensional signal processing
- Sora -> learned physics engine

You're welcome

Sid____ retweeted

@mervenoyann

over 1 year ago

NVIDIA just dropped a gigantic multimodal model called NVLM 72B 🦖

Explaining everything from what I got of reading the paper here 📝 https://t.co/V4gXc0pXZv

Sid____ retweeted

ARC Prize

@arcprize

over 1 year ago

We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI? Our notes: https://t.co/sV6LM1foGx

Sid____ retweeted

Patrick Collison

@patrickc

over 1 year ago

Mario Draghi's new report on EU competitiveness doesn't mince words.

"Across different metrics, a wide gap in GDP has opened up between the EU and the US, driven mainly by a more pronounced slowdown in productivity growth in Europe. Europe’s households have paid the price in foregone living standards. On a per capita basis, real disposable income has grown almost twice as much in the US as in the EU since 2000."

"First – and most importantly – Europe must profoundly refocus its collective efforts on closing the innovation gap with the US and China, especially in advanced technologies. Europe is stuck in a static industrial structure with few new companies rising up to disrupt existing industries or develop new growth engines. In fact, there is no EU company with a market capitalisation over EUR 100 billion that has been set up from scratch in the last fifty years, while all six US companies with a valuation above EUR 1 trillion have been created in this period. This lack of dynamism is self-fulfilling."

"There are not enough academic institutions achieving top levels of excellence and the pipeline from innovation into commercialisation is weak. [...] However, while the EU boasts a strong university system on average, not enough universities and research institutions are at the top. Using volume of publications in top academic science journals as an indicative metric, the EU has only three research institutions ranked among the top 50 globally, whereas the US has 21 and China 15."

"Regulatory barriers to scaling up are particularly onerous in the tech sector, especially for young companies. Regulatory barriers constrain growth in several ways. First, complex and costly procedures across fragmented national systems discourage inventors from filing Intellectual Property Rights (IPRs), hindering young companies from leveraging the Single Market. Second, the EU’s regulatory stance towards tech companies hampers innovation: the EU now has around 100 tech-focused laws and over 270 regulators active in digital networks across all Member States. Many EU laws take a precautionary approach, dictating specific business practices ex ante to avert potential risks ex post. For example, the AI Act imposes additional regulatory requirements on general purpose AI models that exceed a pre-defined threshold of computational power – a threshold which some state-of-the-art models already exceed. Third, digital companies are deterred from doing business across the EU via subsidiaries, as they face heterogeneous requirements, a proliferation of regulatory agencies and “gold plating” of EU legislation by national authorities. Fourth, limitations on data storing and processing create high compliance costs and hinder the creation of large, integrated data sets for training AI models. This fragmentation puts EU companies at a disadvantage relative to the US, which relies on the private sector to build vast data sets, and China, which can leverage its central institutions for data aggregation. This problem is compounded by EU competition enforcement possibly inhibiting intra-industry cooperation. Finally, multiple different national rules in public procurement generate high ongoing costs for cloud providers. The net effect of this burden of regulation is that only larger companies – which are often non-EU based – have the financial capacity and incentive to bear the costs of complying. Young innovative tech companies may choose not to operate in the EU at all."

More: https://t.co/x1d1ApvG2Z.

Sid____ retweeted

Sebastian Raschka

@rasbt

about 2 years ago

I usually consider these as "oh, interesting. Since that doesn't look too complicated to implement, let's bookmark this and use this in a project and see if it actually works as well as advertised. (Spoiler: it usually doesn't.)" With DPO itself, you find that it works pretty well but not as well as RLHF+PPO. It's good enough that more people use it than PPO at this point though -- thanks to the added convenience of not having to train a separate reward model. Now with SimPO, since it's super, super easy to implement, I will actually use it and see what I find. I'll probably add that to the bonus materials for Chapter 7 of my LLMs from Scratch book. But all that being said, if you wait a few months, you will find follow-up papers where it turns out that the original paper was perhaps too good to be true. E.g., I saw this with DoRA the other day: https://t.co/kMhmdndPES

Sid____ retweeted

Mark Riedl

@mark_riedl

about 2 years ago

The OpenAI superalignment team was only one kind of “safety”—the unproven kind. Meanwhile, there are so many actual harms that require serious thought and research. If you are panicking because OpenAI stopped caring about “safety”, you’ve probably bought into too much hype.

Sid____ retweeted

François Chollet

@fchollet

about 2 years ago

It's amazing to me that the year is 2024 and some people still equate task-specific skill and intelligence. There is *no* specific task that cannot be solved *without* intelligence -- all you need is a sufficiently complete description of the task (removing all test-time novelty and uncertainty), and you can achieve arbitrary levels of skill while entirely bypassing the problem of intelligence. In the limit, even a simple hashtable can be superhuman at anything.
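
The hashtable point can be made concrete with a toy sketch in Python. The class name and the addition "task" are hypothetical illustrations, not anything from the tweet; the task is assumed to be a finite, fully enumerated set of input/output pairs, i.e. a description with all test-time novelty removed.

```python
from typing import Dict, Hashable, Optional


class LookupTableSolver:
    """Hypothetical 'solver' that memorizes a complete task description.

    It has no model of the task: it only stores input -> output pairs.
    """

    def __init__(self, task: Dict[Hashable, object]) -> None:
        # Copy the full task description (every input and its answer).
        self.table = dict(task)

    def solve(self, query: Hashable) -> Optional[object]:
        # Constant-time lookup: perfect on anything seen, None on anything novel.
        return self.table.get(query)


# A "task" fully described in advance: all sums of two integers in 0..99.
addition_task = {(a, b): a + b for a in range(100) for b in range(100)}
solver = LookupTableSolver(addition_task)

print(solver.solve((37, 58)))  # 95: flawless inside the description
print(solver.solve((100, 1)))  # None: one step outside it and it fails
```

The solver scores 100% on the enumerated task yet has no ability to handle an input the description did not anticipate, which is the gap between skill and intelligence the tweet describes.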

Sid____ retweeted

Soumith Chintala

@soumithchintala

about 2 years ago

Apparently Google laid off their entire Python Foundations team, WTF! (@SkyLi0n, who is one of the pybind11 maintainers, just informed me, asking in what ways they can re-fund pybind11.) The team seems to have done substantial work that seems critical for Google internally as well. There's a Hacker News thread if folks want to read more: https://t.co/iz6uVNk4Q9

Sid____ retweeted

Schneier Blog

@schneierblog

about 2 years ago

The Rise of Large-Language-Model Optimization https://t.co/o44BEimu2A

Sid____ retweeted

Matt Shumer

@mattshumer_

about 2 years ago

The dataset is everything. Great read: https://t.co/snGcPx0M16

Sid____ retweeted

swissinfo.ch

@swissinfo_en

about 2 years ago

Swiss academics criticise a “major discrepancy” between the resources available and Switzerland’s “ambitious” strategic objectives, which remain unchanged. https://t.co/P9lLv9ff92 @snsf_ch @Innosuisse @CH_universities @ETH_Rat @ETH_en @EPFL

Sid____ retweeted

MIT CSAIL

@MIT_CSAIL

about 2 years ago

The 12 types of ML papers. (created by @natashajaques @maxhkw) #MachineLearning #ML #DataScience

Sid____ retweeted

Andy Greenberg (@agreenberg at the other places)

@a_greenberg

about 2 years ago

In 2022, we at WIRED told the story of P4x, a hacker who singlehandedly took down the entire North Korean internet. Now he's revealing his name—Alejandro Caceres—and his strange experience since then: trying to teach the US military to be more like him. https://t.co/urNDXgwzHM

Sid____ retweeted

Will Knight

@willknight

about 2 years ago

Time to use that in-room safe. Hackers crack millions of hotel room keycards by the legendary @a_greenberg. https://t.co/dNqHUndJhS

Sid____ retweeted

swissinfo.ch

@swissinfo_en

about 2 years ago

Even in a welfare state like Switzerland more and more people are struggling to find somewhere to live. Most emergency shelters are full. Why? Some homeless people tell their stories. https://t.co/5mgzh4m0P4

Sid____ retweeted

swissinfo.ch

@swissinfo_en

about 2 years ago

Over half of Swiss families are struggling to make ends meet, according to a survey. We’ve interviewed Philippe Gnaegi, director of @ProFamiliaCH, who is now calling for swift political action. 👇 https://t.co/KXeuPDNl7i

Sid____ retweeted

The New York Times

@nytimes

about 2 years ago

Drivers of cars by General Motors, Kia, Subaru and Mitsubishi may not realize that their driving data — like when they sped or braked too hard — is being shared with insurance companies. Numerous people have complained about spiking premiums as a result. https://t.co/4cil1HHsCe
