Just an FYI, I've migrated to posting on my own Fediverse instance: https://t.co/X2j4mjYivQ (for those interested, I've also written up my setup: https://t.co/5fesHd4TXc which is running https://t.co/HWnYRp44Ki)
@MarkWal06578936 @0ddette @xunzic_beetle After a long chat referencing Pompeian frescoes, Roman mosaics, surviving literary evidence, and considering how refined in anatomy, rendering, and sculptural subtlety the statues themselves are, these generations can't be less likely than the clown-car/paint-by-numbers versions
@Birdyword The main issue isn't visual, but the 24/7 chronic noise. Low freq rumble is covered in ANSI S12.9 and ISO 1996-2 and there's no *technical* reason this couldn't be solved, just inadequate noise ordinances, and data center builders deciding not to because of cost and, well, not caring.
@giffmana The raw score is just ChromaDB embedding results. Don't get me wrong, I think it's great that people can just do things, but my Claude found a lot of issues w/ a lot of the README claims (not just evals): https://t.co/zcRFBVlnEb
@A_y_u_s_h_i_X @deredleritt3r @GaryMarcus @FT Pro cites both S.15 and S.192 (and S.89) in its response, but while I'm not an Indian tax professional, I believe you are conflating employee and employer tax burden https://t.co/lv2eyQ0cYC - anyway, I think this supports @deredleritt3r's Pro claim more than you think it does.
@TheZachMueller While you're doing RW tests, would you mind running attention-gym/nvbandwidth/memtest_vulkan on these if they're easy to script? (I think the repo/dataset is actually great, especially if it's easy for people to fork/PR into)
@AliTavallaie @rasbt @dontfearai @lmsysorg Not full support. If you want aotriton (FA) you have to manually build, and even then it still doesn’t get through a full attention-gym benchmark run. CK btw only compatible w gfx9 - ROCm on CDNA != ROCm on RDNA (much worse)
@sparkycollier @ClementDelangue I don’t have a post but I’ve done evals on virtually every single major JA model: https://t.co/tHBdZcjVJ1 . There’s also maybe https://t.co/pe9BBnieGA if you ignore the scores (largely don’t reflect capabilities).
I don't post much here anymore, but maybe this is worth an exception. I've spent basically all year working on an open model that is incredibly strong in Japanese. For those interested, full details published here: https://t.co/nOnnfelin8
We're incredibly proud to release the newest and most powerful member of our open, bilingual (JA/EN) Shisa V2 family: Llama 3.1 Shisa V2 405B
The strongest model ever trained in Japan, it points to how even small Japanese AI labs can compete globally!
🤗 https://t.co/L2SXHEM0OH
@VictorTaelin Largely non-actionable, but I have a fair amount of research on bacterial meningitis, infection, and rehab/recovery from a few years ago that may be useful later: https://t.co/HXbwMHyhES
@typedfemale It’s more than that. DYOR, but for laser, T-CAT based TransPRK is almost always better than LASIK. ACD willing, and if you can afford the outpatient procedure with an experienced surgeon, I found that V5 ICL was the best option for risk and outcomes.
@realGeorgeHotz @AMD I'm not so sure on the 7900 XTX hardware - need VOPD w/ no stalls to hit peak FP16, L1 cache is shared between 2 WGPs, DMA seems weak (can't hit anywhere near peak MBW even on simple bs=1 inference). High throughput, low latency, high concurrency LLM inference is nontrivial, btw.
@cognitivecompai @reguile1 @realGeorgeHotz @growing_daniel 7900XTX has 123 FP16 TFLOPS but only w/ dual issue VOPD. 3090 is 71.2 TFLOPS (142 w sparsity). 3090 also does 284/568 INT8 TOPS (7900 has no native INT8). For FP16 it may be possible to make 7900 XTX faster w/ perfect pipelining, but no one has done it yet.
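For anyone wanting to sanity check the dual-issue point, here is a rough back-of-the-envelope sketch. The spec inputs (96 CUs, 64 lanes per CU, 2.5 GHz boost clock) and the way the headline number is split into factors are my assumptions, not something stated in the thread; `peak_tflops` is a hypothetical helper.

```python
# Rough peak-FP16 estimate for the 7900 XTX.
# ASSUMED specs (not from the thread): 96 CUs, 64 lanes per CU, 2.5 GHz boost.
# ASSUMED decomposition: FMA (x2) * packed FP16 (x2) * dual issue (x2).

def peak_tflops(cus, lanes_per_cu, clock_ghz, dual_issue, packed_fp16=True):
    """Theoretical peak TFLOPS; real kernels rarely get close to this."""
    flops_per_lane_per_clock = 2          # one fused multiply-add = 2 FLOPs
    if packed_fp16:
        flops_per_lane_per_clock *= 2     # two FP16 values per 32-bit lane
    if dual_issue:
        flops_per_lane_per_clock *= 2     # VOPD dual issue, no stalls
    gflops = cus * lanes_per_cu * flops_per_lane_per_clock * clock_ghz
    return gflops / 1000.0

with_vopd = peak_tflops(96, 64, 2.5, dual_issue=True)
without_vopd = peak_tflops(96, 64, 2.5, dual_issue=False)
rtx_3090_dense = 71.2                     # figure quoted in the post above

print(f"7900 XTX w/ dual issue:  {with_vopd:.1f} TFLOPS")
print(f"7900 XTX w/o dual issue: {without_vopd:.1f} TFLOPS")
print(f"3090 dense (quoted):     {rtx_3090_dense:.1f} TFLOPS")
```

Under these assumptions the card only edges past the quoted 3090 figure when dual issue is fully sustained, which is the point of the post above.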
@Duderichy @sdw I often see people mention the G-1008 but I’m a G-1111 fan (has a slidable catch, much nicer file and design) or if you like the squarer look the G-1305 has a magnetic catch.
@nisten @Vultr For single-user speed `-tp 8` vs `-tp 4` should further decrease TPOT. You can also trade off some TTFT for better throughput & TPOT w/ something like `--num-scheduler-steps 8`. The most important thing I found for perf on MI300X was VLLM_USE_TRITON_FLASH_ATTN=0 (use CK FA)
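Putting those three knobs together, a minimal sketch of a launch wrapper might look like this. The model name is a placeholder and `build_vllm_launch` is a hypothetical helper; the flags and env var are the ones from the post above, and `--num-scheduler-steps` may not exist in every vLLM version, so check `vllm serve --help` on your install.

```python
# Sketch only: builds the command and env overrides, does not launch anything.
# "your-org/your-model" is a PLACEHOLDER, not a real recommendation.

def build_vllm_launch(model, tp_size=8, scheduler_steps=8, use_ck_fa=True):
    """Return (env_overrides, argv) for a `vllm serve` launch."""
    env_overrides = {}
    if use_ck_fa:
        # 0 disables the Triton flash-attention path so CK FA is used instead
        env_overrides["VLLM_USE_TRITON_FLASH_ATTN"] = "0"
    argv = [
        "vllm", "serve", model,
        "-tp", str(tp_size),                            # tensor parallel size
        "--num-scheduler-steps", str(scheduler_steps),  # version-dependent flag
    ]
    return env_overrides, argv

env_overrides, argv = build_vllm_launch("your-org/your-model")
print(env_overrides)
print(" ".join(argv))
```

To actually run it, something like `subprocess.run(argv, env={**os.environ, **env_overrides})` should work, assuming `vllm` is on the PATH.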
@JFPuget jokes/memes aside, I pretty much stick to mamba/conda these days if I need different CUDA versions, eg: `mamba install -c "nvidia/label/cuda-12.1.1" cuda-toolkit -y` (and set CUDA_PATH/HOME) gets me stood up in a 12.1 env in about 30s.
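For the "set CUDA_PATH/HOME" step, a small sketch, assuming the conda-installed toolkit sits directly under the env prefix (that layout is my assumption; verify with `which nvcc` in your env). `cuda_env_for_prefix` is a hypothetical helper, and the example prefix is made up.

```python
import os

def cuda_env_for_prefix(prefix):
    """Env vars pointing CUDA-aware builds at a conda env (ASSUMED layout)."""
    return {
        "CUDA_HOME": prefix,
        "CUDA_PATH": prefix,
    }

# Use the active env if there is one; otherwise fall back to a made-up path.
prefix = os.environ.get("CONDA_PREFIX", "/opt/conda/envs/cuda121")
cuda_env = cuda_env_for_prefix(prefix)
os.environ.update(cuda_env)  # affects this process and any children it spawns
print(cuda_env)
```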