@CercaniasVLC my day with you:
Arrival by long-distance train at Sorolla, 16:30
A little walk to Nord under the July sun ❤️
At Nord you make us change trains twice 🔝
Rodalies in packed carriages, no air conditioning, 36 degrees inside the carriage (I carry a thermometer)
You're the best!
The guy in red the King was talking to is the leader of the neo-Nazi association Revuelta. They were there and no intelligence service had raised the alarm 🤔🤔🤔
Arrival of the French firefighters
- Are you telling me we are the first help to arrive?
- Yes
- You haven't received any help?
- Spain's police haven't arrived yet
- Nobody?
- In this town, no
They tell me we shouldn't spread this video any further. It clearly shows the incompetence, irresponsibility and shamelessness of the Spanish officials responsible, in the face of the worst catastrophe Spain has ever lived through
Hello @fiscal_es, we've seen you act faster over an off-colour tweet. Here the weather presenter from Valencian TV gives you evidence of the criminal negligence that has caused hundreds of deaths and incalculable damage. What are you waiting for to open the investigation?
It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥
Check out the demo (+ source code)! 👇
This is really a 'WOW' paper. 🤯
Claims that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. With an optimized kernel during inference, the model's memory consumption can be reduced by more than 10× compared to unoptimized models. 🤯
'Scalable MatMul-free Language Modeling'
Concludes that it is possible to create the first scalable MatMul-free LLM that achieves performance on par with state-of-the-art Transformers at billion-parameter scales.
📌 The proposed MatMul-free LLM replaces MatMul operations in dense layers with ternary accumulations using weights constrained to {-1, 0, +1}. This reduces computational cost and memory utilization while preserving network expressiveness.
📌 To remove MatMul from self-attention, the Gated Recurrent Unit (GRU) is optimized to rely solely on element-wise products, creating the MatMul-free Linear GRU (MLGRU) token mixer. The MLGRU simplifies the GRU by removing hidden-state related weights, enabling parallel computation, and replacing remaining weights with ternary matrices.
📌 For MatMul-free channel mixing, the Gated Linear Unit (GLU) is adapted to use BitLinear layers with ternary weights, eliminating expensive MatMuls while maintaining effectiveness in mixing information across channels.
📌 The paper introduces a hardware-efficient fused BitLinear layer that optimizes RMSNorm and BitLinear operations. By fusing these operations and utilizing shared memory, training speed improves by 25.6% and memory consumption reduces by 61% over an unoptimized baseline.
📌 Experimental results show that the MatMul-free LLM achieves competitive performance compared to Transformer++ baselines on downstream tasks, with the performance gap narrowing as model size increases. The scaling law projections suggest MatMul-free LLM can outperform Transformer++ in efficiency and potentially in loss when scaled up.
📌 A custom FPGA accelerator is built to exploit the lightweight operations of the MatMul-free LLM. The accelerator processes billion-parameter scale models at 13 W, with throughput beyond human reading speed, demonstrating the potential for brain-like efficiency in future lightweight LLMs.
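
To make the ternary-accumulation point above concrete, here is a minimal pure-Python sketch of a dense layer whose weights are constrained to {-1, 0, +1}. It is not the paper's BitLinear kernel. The function names `ternarize` and `ternary_linear` are hypothetical, and the absmean scaling used for quantization is an assumption borrowed from common ternary-weight schemes, since the thread does not say how weights are quantized. Activation quantization and the fused RMSNorm are left out.

```python
def ternarize(weights):
    """Quantize a 2-D weight list to {-1, 0, +1} plus one scale factor.

    Assumption: absmean scaling (divide by the mean absolute weight,
    round, then clip). The thread does not specify the scheme.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) or 1.0  # avoid dividing by zero
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Compute y = scale * (x @ ternary) without multiplying by weights.

    `ternary` has shape (len(x), out_features). Each weight only decides
    whether an input is added, subtracted, or skipped.
    """
    out = []
    for j in range(len(ternary[0])):
        acc = 0.0
        for i, xi in enumerate(x):
            t = ternary[i][j]
            if t == 1:
                acc += xi      # weight +1: accumulate
            elif t == -1:
                acc -= xi      # weight -1: subtract
            # weight 0: contributes nothing
        out.append(acc * scale)  # one rescale per output, not per weight
    return out


if __name__ == "__main__":
    w = [[0.9, -0.1], [-1.1, 0.05], [0.2, 1.0]]
    t, s = ternarize(w)
    print(t, ternary_linear([2.0, 3.0, 5.0], t, s))
```

The inner loop contains only additions and subtractions. The single multiplication by `scale` per output is what remains of the dense MatMul in this sketch.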