Zandrr

@ZandrrLife

Creator. Music & Bits = LIFE. Building MOTHER. Multimodal Ontological Task Handler and Expert Reasoner. Coming to a PC near you.

latent space

Joined January 2009

1.6K Following

884 Followers

1.2K Posts

Pinned Tweet

Zandrr @ZandrrLife

about 3 years ago

Check out 'Her Pain' MV - 1/3 lyrics & entire storyboard by MOTHER. MOTHER [w/help, early version LOL] used RunwayML Gen-2 to generate 100+ scene videos, added a couple fx's, and stitched it together using PySceneDetect & MoviePy. MOTHER ain't just a lyricist, but a music video maestro too.

0

0

0

0

334

Zandrr @ZandrrLife

12 months ago

are sink tokens really a bug, or the model telling us-- braaaa I want embeddings with native PCA-like structure?

0

0

0

0

17

Zandrr @ZandrrLife

about 2 years ago

I think if we employ a curriculum style of training, optimizing for both left-to-right and any-order/FiL together, it would actually make speculative decoding useful fr fr.

0

0

0

0

38
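For context on the speculative decoding mentioned above, here is a minimal sketch of the basic draft-then-verify loop in its greedy form. It does not implement the curriculum or any-order training idea from the tweet. The names `speculative_step`, `draft`, and `target` are hypothetical, and both "models" are toy lookup functions standing in for real language models.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[str]], str]


def speculative_step(prefix: List[str], draft: Model, target: Model, k: int) -> List[str]:
    """Run one round of greedy speculative decoding and return the new tokens.

    The cheap draft model proposes k tokens. The target model then checks them
    in order: matching tokens are accepted, and the first mismatch is replaced
    by the target's own token. A real system verifies all k positions in one
    batched forward pass; this sketch calls the target sequentially for clarity.
    """
    proposal: List[str] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    accepted: List[str] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # target's correction ends the round
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


# Toy models (assumptions for illustration only).
SENTENCE = "the quick brown fox jumps".split()


def target(prefix: List[str]) -> str:
    return SENTENCE[len(prefix)] if len(prefix) < len(SENTENCE) else "<eos>"


def draft(prefix: List[str]) -> str:
    guess = target(prefix)
    return "dog" if guess == "fox" else guess  # the draft is wrong on one token


new_tokens = speculative_step([], draft, target, k=4)
print(new_tokens)  # ['the', 'quick', 'brown', 'fox']
```

The output always equals what the target would have produced on its own; the gain is that several tokens can land per verification round when the draft agrees with the target often enough.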

Zandrr @ZandrrLife

about 2 years ago

can only imagine how impactful this would be for a coding model with text-to-image abilities.

0

0

0

0

26

Zandrr @ZandrrLife

about 2 years ago

a true multimodal LM, with bi-directional multimodal ability and some clever hybrid instructional data, can approximate imagination. Seems to vastly improve multimodal CoT..although I'm limited to audio..it's cool asf

1

0

0

0

42

Zandrr @ZandrrLife

about 2 years ago

I been playing around with adaptive per-token compute, what if we only store "important tokens" in memory segments? vastly reduces memory while retaining meaningful context entropy.

0

0

0

0

26
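The "only store important tokens" idea above can be sketched as a budgeted selection over per-token importance scores. This is a minimal illustration, not the author's method: the function name `keep_important_tokens`, the scores, and the budget are all hypothetical, and how importance is computed (for example, accumulated attention weight) is left as an assumption.

```python
from typing import List, Sequence


def keep_important_tokens(tokens: Sequence[str], scores: Sequence[float], budget: int) -> List[str]:
    """Keep the `budget` highest-scoring tokens, preserving their original order.

    `scores` is an assumed per-token importance signal; this sketch only shows
    the selection step, not how such scores would be produced by a model.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Rank positions by score, take the top `budget`, then restore sequence order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_positions = sorted(ranked[:budget])
    return [tokens[i] for i in kept_positions]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
scores = [0.10, 0.90, 0.60, 0.05, 0.10, 0.80]  # made-up importance values

memory_segment = keep_important_tokens(tokens, scores, budget=3)
print(memory_segment)  # ['cat', 'sat', 'mat']
```

Here the memory segment holds half as many tokens as the input. Whether the dropped tokens are truly safe to discard depends entirely on the quality of the importance signal.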

Zandrr @ZandrrLife

about 2 years ago

Finally got into TransformerFAM, so it's basically activation beacons on steroids. Exploiting the residual stream for "unlimited context", had great success with beacons, so I'm excited for this...but. This highlights an issue with current self-attention on lower layers.

1

1

0

0

49

Zandrr @ZandrrLife

about 2 years ago

deeper layers, attention becomes more oblique, so to properly exploit TransformerFAM, we must improve embedding quality and optimize the attention distribution shifts on lower layers.

1

0

0

0

30

Zandrr @ZandrrLife

about 2 years ago

figure if this works, I finetune a super robust coding and math model..merge them and exploit the polysemantic abilities of models to improve overall reasoning...and damn, is polysemanticity a feature or a bug of training methods? new capabilities always lead to n^2 more questions ha.

0

0

0

0

24

Zandrr @ZandrrLife

about 2 years ago

I'm convinced GPT-2's chatbot abilities are just an extrapolation of AlphaCode 2 research, as evidenced by extended test-time compute charts. As I previously mentioned, models akin to GPT-4 can leverage extended test-time compute to significantly enhance final outputs.

Zandrr @ZandrrLife

over 2 years ago

This is obviously compute intensive and takes time, but again with smaller models, doable for small startups. It provides a clear pathway for smaller models to reach true expert-level performance for code and math. Wonder when this method saturates?

ZandrrLife's tweet photo: https://t.co/coOgEFGJNj

1

0

0

0

148

1

0

0

0

114

Zandrr @ZandrrLife

about 2 years ago

this is for a self-reinforcement learning situation btw...can fully validate if the supposed 3-iteration self-refine limit...is actually a limit. Since I'm obsessed with some new cool model merging methods [evolutionary optimization + LM = heaven].

1

0

0

0

70

Zandrr @ZandrrLife

about 2 years ago

maybe llama-4 70B matches the parameter-to-token scaling of llama3-8B, I suspect it would be as good as llama3-400B. I haven't done shit since it dropped but benchmark hahaha.

0

0

0

0

58

Zandrr @ZandrrLife

about 2 years ago

The result: whatever GPT-5 will be performance-wise, people have to realize it doesn't matter. You can now get the same level of performance with a smaller model, just with longer inference time. We're talking expert-level outputs, making DGX pods ever more relevant.

1

0

0

1

81

Zandrr @ZandrrLife

about 2 years ago

@finkd fucked up and changed the world...again LoL. Goat man. I foresee data-based startups. only bottleneck now is data. we need large agentic multimodal datasets. People will pay for that ish. Local/on-premise AGI looking like a real possibility.

1

0

0

0

22

Zandrr @ZandrrLife

about 2 years ago

not to mention, the upcoming 400B will exceed even Claude Opus. with ternary quantization and pruning unneeded layers [ShortGPT], you're talking ~130GB of memory.

1

0

0

0

38

Zandrr @ZandrrLife

about 2 years ago

point being, I'm sure everyone is doing shitloads of benchmarking RIGHT NOW. llama-70B I believe meets that threshold; with ternary quantization, you're talking like 20GB of memory.

1

0

0

0

28
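As a back-of-envelope check on the memory figures for ternary quantization, the sketch below estimates weight storage only. It assumes decimal gigabytes, ignores KV cache, activations, and any layers kept at higher precision, and uses two assumed encodings: the information-theoretic log2(3) bits per ternary weight and a simpler 2-bit packing. The helper `weight_memory_gb` and its `kept_fraction` parameter (a stand-in for layer pruning) are hypothetical.

```python
import math


def weight_memory_gb(n_params: float, bits_per_weight: float, kept_fraction: float = 1.0) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    kept_fraction models pruning as "fraction of parameters retained";
    the default of 1.0 means no pruning.
    """
    if not 0.0 <= kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be between 0 and 1")
    return n_params * kept_fraction * bits_per_weight / 8 / 1e9


TERNARY_BITS = math.log2(3)  # ~1.585 bits to encode a value in {-1, 0, +1}
PACKED_BITS = 2.0            # simple packing: one ternary weight per 2 bits

for name, n_params in [("70B", 70e9), ("400B", 400e9)]:
    ideal = weight_memory_gb(n_params, TERNARY_BITS)
    packed = weight_memory_gb(n_params, PACKED_BITS)
    print(f"{name}: ~{ideal:.1f} GB at log2(3) bits, ~{packed:.1f} GB at 2 bits")
```

Under these assumptions the weights alone come to roughly 14 to 17.5 GB for 70B and roughly 79 to 100 GB for 400B before any pruning. Runtime overhead would push real usage above these weight-only numbers.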

Zandrr @ZandrrLife

about 2 years ago

This event horizon allows the model to exploit test-time self-improvement and extended test-time compute to vastly improve performance. Let's use OpenAI's own words LoL: extended test-time compute allows you to emulate much larger models.

1

0

0

0

30

Zandrr @ZandrrLife

about 2 years ago

So let's talk about how the goat, @finkd, literally just smashed the entire industry. OK, we know from prior ablations that a GPT-4-level model represents an intelligence threshold.

1

0

0

0

26
