@andy_matuschak @ptdamiba @obsdmd @andy_matuschak Sorry for interjecting; not sure if you were aware of this recent course on tools for thought from Brown University https://t.co/qto9ixHMvB I would be curious to know your opinion on it: e.g., does the syllabus / reading list look reasonable to you?
@njmarko @karpathy Thanks! Out of curiosity, how long does this evaluation take? Also, how do you keep grading fair and consistent across students, given that you presumably need to ask each student different questions?
@njmarko @karpathy BTW did you find that AI-assisted students develop a better understanding than non-AI-assisted ones? I taught a project-based course in 2024 & 2025, with an oral exam at the end. The 2025 class turned in much better projects (thanks to AI), but their understanding was clearly worse.
@HerrDreyer Thanks for the post! I would also be curious to read the 3,000-word draft about why the focus should be more on the delivery of the talk rather than a precise technical exposition 🙂
@wkvong Thanks for the answer! I'll think about this! This paper suggests that a low temperature parameter (I see that you have used τ = 0.07) might help close the gap, but the authors have only carried out experiments on a toy example.
https://t.co/YcqFFTwXgG
@wkvong I was curious: is the contrastive loss motivated in any way by cognitive science? The negative pairs used by the contrastive loss don't seem to have a direct counterpart in learning (explicit negative corrections do happen, but they seem to be sparser than positive pairs).
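For readers unfamiliar with the loss being discussed, here is a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss, showing where the negative pairs and the temperature τ enter. It is not taken from the CVCL code; the function names and the toy embeddings are made up for illustration, and only the value τ = 0.07 comes from the thread.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, tau=0.07):
    """Symmetric InfoNCE: pair (i, i) is the positive, every (i, j != i) is a negative."""
    images = [l2_normalise(v) for v in image_embs]
    texts = [l2_normalise(v) for v in text_embs]
    # Cosine similarities divided by the temperature tau.
    logits = [[sum(a * b for a, b in zip(u, v)) / tau for v in texts] for u in images]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy batch (hypothetical values): three roughly aligned image/text pairs.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
loss_sharp = contrastive_loss(images, texts, tau=0.07)
loss_soft = contrastive_loss(images, texts, tau=1.0)
loss_mismatched = contrastive_loss(images, texts[::-1], tau=0.07)
```

Note that the negatives are never labelled explicitly: every other item in the batch is treated as a negative, which is why they are so much denser than explicit corrections would be.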
@wkvong Does CVCL hear? From the model diagram it looks like the model gets text as input, not audio. Is that right? If so, do you think the conclusions would change in any way if it were to use audio?
@wkvong I assume Fig. A (second image) shows audio and text embeddings. I'm surprised that they are so well aligned given that CLIP exhibits a modality gap [1]: images and text are placed in different spaces. Did you postprocess the embeddings in any way?
[1] https://t.co/zr6DvviIRg
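One commonly used way to put a number on a modality gap is the Euclidean distance between the centroids of the L2-normalised embeddings of each modality. The sketch below assumes that definition (it may differ from the one used in [1]); the function names and toy vectors are hypothetical.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def modality_gap(embs_a, embs_b):
    """Distance between the centroids of two sets of normalised embeddings."""
    centre_a = centroid([l2_normalise(v) for v in embs_a])
    centre_b = centroid([l2_normalise(v) for v in embs_b])
    return math.dist(centre_a, centre_b)

# Toy example (made-up values): two modalities living in different regions.
gap_separated = modality_gap([[1.0, 0.0], [2.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]])
gap_overlapping = modality_gap([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]])
```

A value near zero means the two sets of embeddings share a centre, which is what a well-aligned plot would suggest; it says nothing about whether individual pairs are matched.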
@abursuc I've also come across:
"[W]e do *not* aim to develop new components; instead, we make *minimal* adaptations that are sufficient to overcome the [...] challenges." [emphasis theirs]
This was an unexpected word of caution, since the paper was already interesting and consistent.
The term "ablation" is widely misused lately in ML papers. An ablation is a removal: you REMOVE some component of the system (e.g., remove batchnorm). A "sensitivity analysis" is where you VARY some component (e.g., network width). #pedantic
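To make the distinction concrete, here is a toy sketch. Everything in it is hypothetical: `evaluate` is a stand-in for training and scoring a model, and the numbers it returns are invented.

```python
def evaluate(config):
    """Hypothetical stand-in for 'train and evaluate'; returns a made-up score."""
    score = 0.5 + 0.1 * (config["width"] / 512)
    if config["use_batchnorm"]:
        score += 0.05
    return score

baseline = {"width": 256, "use_batchnorm": True}

def ablate(config, component):
    """Ablation: REMOVE one component and keep everything else fixed."""
    return {**config, component: False}

def sweep(config, name, values):
    """Sensitivity analysis: VARY one setting over a range of values."""
    return [{**config, name: value} for value in values]

# Ablation: how much does the score drop when batchnorm is removed?
ablation_drop = evaluate(baseline) - evaluate(ablate(baseline, "use_batchnorm"))

# Sensitivity analysis: how does the score change as the width varies?
sensitivity = {c["width"]: evaluate(c) for c in sweep(baseline, "width", [64, 128, 256, 512])}
```

The ablation yields a single with/without comparison, while the sensitivity analysis yields a curve over the varied setting.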
@andrewhowens Very nice work! Do you think the learnt features would also be suitable for training deepfake detectors? Usually, these detectors need to rely on low-level information, and as a consequence they are very sensitive to data shifts (changes of the generator or its training dataset).