@kepano Civilisation is built on delegating understanding. Doing your own understanding needs to be very strategic. In most situations, it becomes the equivalent of growing your own vegetables. It's enjoyable but does not meaningfully contribute to your nutrition.
Interesting examples of the trade offs in using local models.
Not all of reasoning goes to the heart of the problem. Here's an example of Gemma 4 12b spending 30 secs on solving the problem vs Gemma 4 31b spending several minutes running around parsing the question - used the word 'wait' 26 times.
However, Gemma 4 31b is miles better at the actual Czech. They both gave a similar quality answer when it comes to the subject - different types of use. But the smaller model had some small model problems - made up word, questionable agreement, wrong word order of totiž.
Also, the small model was 4x faster (37 tps vs 9tps) and took 37 seconds to reason vs 5 minutes of the large model.
On the other hand, they both outperform GPT-3.5/4 era models and 31b is better than GPT-4o.
I did a workshop on Evaluating Claims About AI on Friday. Two big messages:
1. The same principles apply as in domain - combination of heuristics (checklist rules) and hermeneutics (knowledge of the domain)
2. The only way to reliably take advantage of the heuristics is to learn about and use AI - this takes time
"I don't really care about non-humanities fields" is such a typical mindset of people in the humanities and it's often a shorthand, I don't really understand or want to try to learn anything outside a very narrow shadow play of manipulating cliches around which is what so much of the humanities has become.
@Kyriakos_Pelek Yeah, it does work really well for that. Transcripts are a bit weaker - take too long and tie up the app and also it does not handle other languages that Parakeet excels at - which is annoying because the Gemma models are pretty strong otherwise.
Google Eloquent is a surprisingly feature complete and capable local LLM dictation app. With Gemma 4 12b you get a transcription and polishing all in one. Very impressed on Macbook Pro M4 Max - at least.
Unlock local, agentic workflows with Gemma 4 12B and Google AI Edge, directly on your laptop. Experience 100% on-device AI:
• Generate code in AI Edge Gallery (new to Mac)
• Dictate and edit text via AI Edge Eloquent (new to Mac)
• Serve Gemma 4 12B locally with LiteRT-LM
Dive in: https://t.co/gr7tOrZmc0
Your Codex activity now has a home, and an easier way to share it.
Codex profiles show your activity graph, streaks, lifetime tokens, peak daily tokens, and top features like plugins and /fast mode.
Private by default. Share a card when you want to.
Not sure. Universities have been pretty bad at preparing people for the entry-level positions. That's why corporations have always been complaining about them - and the complaints haven't stopped. University curricula (no matter the advertising) are very light on practical entry-level skills and padded with all the 'irrelevances' like prose writing, literature reading, discipline history, 'critical thinking' the faculty feel comfortable with.
The corporatization of universities is more the results of CEO-envy by leadership. Which results in them hiring support staff (incl. marketing) that end up positioning the universities in the 'value for money' place: Pay us enormous amounts of money and you will get it back through high-paying jobs in the future. As that promise starts breaking down, Universities will be facing a challenge of decreasing enrollments.
Whatever happens in the University classroom is pretty much a tangent to this particular trajectory.
The most annoying thing about AI writing ticks like not A, but B or even delve is that so many of them used be my writing ticks. And now I have to change because now everyone's talking about them, they are starting to annoy me, too.
The reason the humanities deserve to die out is not that they do or don't use whatever the latest tech is making the rounds or because they focus on things of little relevance to most people, it is because they have become really bad at the very core of their conceit: reading and understanding texts.
Let's take the latest Olga Tokarchuk controversy. Let's take this sentence from lithub: "Olga Tokarczuk’s recent remarks implying she had used AI to write her recent novel"
And then look at Tokarchuk's actual remarks which in no way "imply" she did that!
This sort of shallow and insipid dishonesty and textual illiteracy by people who go about making fun of others for not appreciating the nuance of text just makes my blood boil.
If you cannot read all texts you want to comment on with care and probity, what's the point of reading Jane Austin with a "critical mindset".
Most people who say things like "model made a mistake in X because it was trained to the Internet" still think in terms of GPT-2 - 3.5. But model training is completely different now, so any claims about model behavior based on assumptions about the data are likely wrong.
What is mid-training?
The stage between pre-training and post-training
A base model is continued on a smaller, curated data mixture chosen to strengthen capabilities that the original pre-training run undercovered, such as multilinguality, domain knowledge, or long-context extension.
It usually keeps a pre-training-like objective, but uses higher-quality or more targeted data so later instruction tuning, preference tuning, or RL can shape behavior on top of stronger capabilities.
Learn more here: https://t.co/WhpYkyGlv8
I increasingly find myself caring less and less about how good or bad AI writing is. Writing is not that big a deal. Most people who are not bureaucrats of some form almost never write and WRITING is not nearly as important as recording and communicating knowledge. LLMs are good enough for what we need. There's more human writing of variable quality than we need even if nothing was ever written by a human again.
There is a lot being written about the stylistic tells of AI writing (em-dashes, etc.) but this paper looks at AI narrative tells
Fascinating differences between AI & human narrative, and asking AI to write in different styles doesn't do much to change it https://t.co/azkRHz34NQ
I think it's more of a relational thing. What the recipient expects of that particular source. And the nature of the note - the more literary and 'personally heartfelt', the more the use of AI would be questionable (not ruled out but more open to question).
But in many ways, condolence is quite a formulaic genre - also varies greatly by culture.
How about things like love letters, arguments between spouses, admonishments or punishments meted out to children, etc. I think even here AI is permissible - people often ask friends, use literary sources, guides, etc. Is the intent to make pretensions about the skill or more to establish a relationship with that intent.
Many people who most agonise over this live in quite rarified literacy environments where the rest of the world is much more forgiving and/or just doesn't care.
Isn't that pretty much the only thing that matters? That's pretty much what defined the connectionism debate in the 1980s and what is behind things like Winograd schemas or the Moravec paradox. In linguistics it was very much at heart of the generative syntax v generative semantics debate. I don't think this is engineering, I think this is the very heart of the matter.
"LLMs do not understand" = they don't have the same feelings we do when we experience what we casually label understanding.
It would be nice if people came up with an operational definition of "understanding" so we wouldn't have to keep talking about our feelings all the time.
The Pope is making exactly our point. LLMs “may imitate or even simulate, but they do not understand.”
This is the core epistemic fault line.
Most AI evaluation is still based on one assumption: if a system statistically approximates human behaviour, then it is close to human intelligence.
But approximation is not intelligence.
Simulation is not understanding.
LLMs can produce the right answer without knowing why it is right. They can simulate empathy without feeling. They can imitate judgment without responsibility. They can generate coherent explanations without having a world to which those explanations are accountable.
Stop confusing behavioural similarity with cognitive equivalence.
Human understanding is embodied, affective, relational, motivational, and normative. It is not just the production of plausible text.
*
Full paper in the first reply
I disagree about condolence notes. These are hard and stressful for most people to write. That's why there are many guides on how to do it and even entire business categories with pre-made cards.
I'm not even sure the grant decisions are that important to be written by humans. I think humans should certify that they stand by the results but the writing should be irrelevant. I'm not even sure grants should be reviewed by humans. AI should certify they meet a threshold and the rest should be lottery. Plus certain discretionary fund for individual reviewers.
I'm more annoyed at AI-written lit reviews - not the writing or use of AI, just the fact that the authors likely phoned in understanding the literature even more than usual.
We've reset 5-hour and weekly rate limits for all users on Pro and Max plans.
We fixed an issue that caused some Claude Code sessions to spawn excessive parallel subagents, burning through usage faster than expected.
The car wash example is not very indicative of model intelligence. It is a conflict between presupposition and conversation maxims. It breaks the maxim of relevance because it assumes this is an honest question which conflicts with the presupposition of "I want to go the carwash" because I want to wash my car. In fact, there are many good scenarios where walking to the carwash makes sense.
That's not to say that the models do not make mistakes of this sort even in contexts where this is not the case, but it's not necessarily because this is outside the distribution but because 1. the model is not powerful enough to have the full representation OR 2. the model was finetuned to "look" at one thing over the others (a sort of attentional blindness as with humans missing the moonwalking ape - ie always prefer the response that privileges walking over driving).
Btw do you have a personal ranking of LLM gotchas? I think that "50m to carwash" is among the best (as a test) and the worst (as a signal of stupidity). It's not a memorized riddle. It's not about tokenizers. It's not spatial. It's straight up a failure to comprehend a situation.