Originally posted: 2025-01-07. Last updated: 2025-01-07.

Much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 [...] I am actually mildly surprised OA has bothered to deploy o1-pro at all, instead of keeping it private and investing the compute into more bootstrapping of o3 training etc. (This is apparently what happened with Anthropic and Claude-3.6-opus - it didn't 'fail', they just chose to keep it private and distill it down into a small cheap but strangely smart Claude-3.6-sonnet.)
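
The bootstrapping loop described above is easy to sketch in code. A minimal, hypothetical version - `solve`, `verify`, and `finetune` are placeholder callables, not any real API:

```python
# Sketch of bootstrapping: a strong model's verified solutions become
# training data for the next model. All callables here are hypothetical.

def bootstrap_round(solve, verify, finetune, base_model, problems):
    """One round: collect verified solutions, train the next model on them."""
    training_data = []
    for problem in problems:
        candidate = solve(problem)          # expensive reasoning model
        if verify(problem, candidate):      # keep only checked answers
            training_data.append((problem, candidate))
    # Distillation: the next model learns to emit verified answers directly,
    # without paying for the search that found them.
    return finetune(base_model, training_data)
```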

My next book, I'm writing even more for the AIs. Again, human readers are welcome. It will be free.

But who reviews it? Is TLS going to pick it up? It doesn't matter anymore. The AIs will trawl it and know I've done this, and that will shape how they see me in, I hope, a very salient and important way.

Tyler Cowen, 09 Jan 2025 (source)

Chat-driven programming. [...] It requires at least as much messing about to get value out of LLM chat as it does to learn to use a slide rule, with the added annoyance that it is a non-deterministic service that is regularly changing its behavior and user interface. Indeed, the long-term goal in my work is to replace the need for chat-driven programming, to bring the power of these models to a developer in a way that is not so off-putting. But as of now I am dedicated to approaching the problem incrementally, which means figuring out how to do best with what we have and improve it.

David Crawshaw, 06 Jan 2025 (source)

A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don't have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.

David Crawshaw, 06 Jan 2025 (source)

The world needs [more, better, harder, etc] evals for AI. This is one of the most important problems of our lifetime, and critical for continual progress.

Logan Kilpatrick, 02 Jan 2025 (source)
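
For concreteness, here is what a minimal eval looks like as code - a sketch, assuming a `model` callable that maps a prompt to a string (not any particular API):

```python
# A minimal eval harness: fixed tasks, an exact-match grader, one aggregate
# score. `model` is assumed to be a callable (prompt -> str).

def run_eval(model, cases):
    """cases: list of (prompt, expected) pairs. Returns accuracy in [0, 1]."""
    correct = 0
    for prompt, expected in cases:
        correct += (model(prompt).strip() == expected)
    return correct / len(cases)

cases = [
    ("What is 12 * 13? Answer with the number only.", "156"),
    ("Spell 'necessary' backwards.", "yrassecen"),
]
# accuracy = run_eval(model, cases)
```

"More, better, harder" evals mostly mean harder `cases` and more careful graders than exact match.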

There's a variety of words that I wish we had, which we do not yet have. One example: we have the concept of "alpha" in finance, and that one Greek letter smuggles in a huge amount of understanding about how the world works. I would love to be able to describe someone's alpha above the LLM baseline in discussing a topic. Because there are a lot of human writers in the world who have no alpha above the LLM baseline, and that's been true since before LLMs were a thing. The Twitterism is sometimes "this person is an NPC" - there is no intellectual content here; the performance of class and similar can allow one to pretend that there is intellectual content, but there is no intellectual content.

patio11, 02 Jan 2025 (source)
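
The finance analogy translates almost directly into code. A sketch, with `judge` and `baseline_llm` as purely hypothetical stand-ins for some scoring function and some reference model:

```python
# "Alpha above the LLM baseline" as an excess score. `judge` (text -> float)
# and `baseline_llm` (prompt -> str) are hypothetical stand-ins.

def alpha(writer_text: str, topic: str, judge, baseline_llm) -> float:
    baseline_text = baseline_llm(f"Write about: {topic}")
    return judge(writer_text) - judge(baseline_text)  # > 0 means real alpha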

Easy prediction for 2025 is that the gains in AI model capability will continue to grow much faster than (a) the vast majority of people's understanding of what AI can do & (b) organizations' ability to absorb the pace of change. Social change is slower than technological change. This all means that things will get weirder and the weirdness will be unevenly distributed.

Ethan Mollick, 01 Jan 2025 (source)

A lot of better-informed people have sworn off LLMs entirely because they can't see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

Simon Willison, 31 Dec 2024 (source)
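
One concrete form that skill takes: treat every output as untrusted, validate it mechanically, and retry on failure. A sketch, assuming a hypothetical `llm` callable (prompt -> str):

```python
import json

# Working with an unreliable-but-powerful tool: trust the validator,
# not the model. `llm` is a hypothetical callable (prompt -> str).

def ask_for_json(llm, prompt: str, attempts: int = 3):
    last_error = None
    for _ in range(attempts):
        raw = llm(prompt)
        try:
            return json.loads(raw)   # mechanical check, independent of the model
        except json.JSONDecodeError as err:
            last_error = err
            prompt = f"{prompt}\nReturn only valid JSON. Previous error: {err}"
    raise ValueError(f"no valid JSON after {attempts} attempts: {last_error}")
```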

The efficiency thing is really important for everyone who is concerned about the environmental impact of LLMs. These price drops tie directly to how much energy is being used for running prompts. There's still plenty to worry about with respect to the environmental impact of the great AI datacenter buildout, but a lot of the concerns over the energy cost of individual prompts are no longer credible.

Simon Willison, 31 Dec 2024 (source)
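
To see why per-prompt energy is small, a back-of-envelope sketch - every number below is a made-up placeholder chosen only to show the arithmetic, not a measurement:

```python
# Back-of-envelope: energy per prompt. All inputs are hypothetical.
gpu_power_kw = 0.7          # one accelerator under load (assumed)
seconds_per_prompt = 2.0    # serving latency for one prompt (assumed)
concurrent_prompts = 10     # prompts sharing the GPU via batching (assumed)

kwh = gpu_power_kw * (seconds_per_prompt / 3600) / concurrent_prompts
print(f"{kwh * 1000:.4f} Wh per prompt")  # ~0.04 Wh with these made-up inputs
```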

General consensus in the replies and quotes of this seems to be that the entire concept of "AI skills" is a joke - how hard is typing text into a chatbot, really? I will continue to argue that it's genuinely difficult, and that the challenge in using these tools is widely underestimated.

Simon Willison, 26 Dec 2024 (source)

The interesting part is that they will crush tests but you wouldn't hire them over a person for the most menial jobs. It's a neat challenge how to properly evaluate the "easy stuff" that is secretly hard because of Moravec's paradox. Very long contexts, autonomy, common sense, …

Andrej Karpathy, 24 Dec 2024 (source)

ARC is a silly benchmark; the other results in math and coding are much more impressive. o3 is just o1 scaled up. The main takeaway from this line of work that people should walk away with is that we now have a proven way to RL our way to superhuman performance on tasks where it's cheap to sample and easy to verify the final output. Programming falls in that category; they focused on known benchmarks, but the same process can be done for normal programs, using parsers, compilers, existing functions and unit tests as verifiers. Pre-o1 we only really had next-token prediction, which required high-quality human-produced data; with o1 you optimize for success instead of MLE of the next token.

One very important thing to understand about the future: the economics of AI are about to change completely. We'll soon be in a world where you can turn test-time compute into competence -- for the first time in the history of software, marginal cost will become critical.

François Chollet, 20 Dec 2024 (source)
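
The recipe in the first paragraph - cheap to sample, easy to verify - is straightforward to sketch: draw many candidate programs, keep the ones the unit tests accept, and reinforce on the survivors. `model.sample` is a hypothetical interface, and real sandboxing is elided:

```python
# "Cheap to sample, easy to verify": best-of-n generation with unit tests
# as the verifier. The tests are ground truth, so no human labels needed.

def passes_all(source: str, tests) -> bool:
    namespace = {}
    try:
        exec(source, namespace)        # real systems would sandbox this step
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def solve_with_verifier(model, spec: str, tests, n: int = 64):
    """Sample n candidate programs; return the ones that pass every test."""
    candidates = (model.sample(spec) for _ in range(n))   # cheap to sample
    verified = [src for src in candidates if passes_all(src, tests)]
    return verified                    # RL: these samples get the reward
```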

A central property in formal software engineering is compositionality: the idea that composite systems can be understood in terms of the meanings of their parts and the nature of the composition, rather than by having to look at the parts themselves.

This idea lies at the heart of piecewise development: parts can be engineered (and verified) separately and hence in parallel, and reused in the form of modules, libraries and the like [...]

Current AI systems have no internal structure that relates meaningfully to their functionality. They cannot be developed, or reused, as components. There can be no separation of concerns or piecewise development.

Eerke Boiten, 06 Nov 2024 (source)
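
Ordinary software shows the property Boiten means. In the sketch below, each part carries a contract that can be verified in isolation, and the composite is understood from those contracts alone - exactly the structure he argues current AI systems lack:

```python
# Compositionality: verify parts separately, then reason about the
# composite from the parts' contracts alone.

def normalize(xs: list[float]) -> list[float]:
    """Contract: output sums to 1.0 (for nonempty, nonnegative input)."""
    total = sum(xs)
    return [x / total for x in xs]

def expectation(weights: list[float], values: list[float]) -> float:
    """Contract: weighted mean, assuming weights sum to 1.0."""
    return sum(w * v for w, v in zip(weights, values))

# The composite is correct *because* each contract holds separately:
# normalize() guarantees exactly what expectation() assumes.
mean = expectation(normalize([1.0, 3.0]), [10.0, 20.0])  # 17.5
```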

I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

Andrej Karpathy, 09 Dec 2023 (source)
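
There is a mechanical reading of this. A toy sampler, with made-up probabilities: the model only ever draws tokens from a distribution, and there is no separate "factual mode" that hallucination switches off:

```python
import random

# Toy next-token sampler: factual and confabulated continuations come from
# the same mechanism. The probabilities below are invented for illustration.

def sample_next(dist: dict[str, float]) -> str:
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

# Hypothetical distribution after "The capital of France is"
dist = {"Paris": 0.92, "Lyon": 0.05, "Narnia": 0.03}
print(sample_next(dist))  # usually right, occasionally a "hallucination"
```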