Originally posted: 2025-01-07. Last updated: 2025-01-07.

Much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 [...] I am actually mildly surprised OA has bothered to deploy o1-pro at all, instead of keeping it private and investing the compute into more bootstrapping of o3 training etc. (This is apparently what happened with Anthropic and Claude-3.6-opus - it didn't 'fail', they just chose to keep it private and distill it down into a small cheap but strangely smart Claude-3.6-sonnet.)
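
The bootstrapping loop described above is easy to sketch in code. A minimal, hypothetical version - `solve`, `verify`, and `finetune` are placeholder callables, not any real API:

```python
# Sketch of bootstrapping: a strong model's verified solutions become
# training data for the next model. All callables here are hypothetical.

def bootstrap_round(solve, verify, finetune, base_model, problems):
    """One round: collect verified solutions, train the next model on them."""
    training_data = []
    for problem in problems:
        candidate = solve(problem)          # expensive reasoning model
        if verify(problem, candidate):      # keep only checked answers
            training_data.append((problem, candidate))
    # Distillation: the next model learns to emit verified answers directly,
    # without paying for the search that found them.
    return finetune(base_model, training_data)
```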

My next book, I'm writing even more for the AIs. Again, human readers are welcome. It will be free.

But who reviews it? Is TLS going to pick it up? It doesn't matter anymore. The AIs will trawl it and know I've done this, and that will shape how they see me in, I hope, a very salient and important way.

Tyler Cowen, 09 Jan 2025 (source)

Chat-driven programming. [...] It requires at least as much messing about to get value out of LLM chat as it does to learn to use a slide rule, with the added annoyance that it is a non-deterministic service that is regularly changing its behavior and user interface. Indeed, the long-term goal in my work is to replace the need for chat-driven programming, to bring the power of these models to a developer in a way that is not so off-putting. But as of now I am dedicated to approaching the problem incrementally, which means figuring out how to do best with what we have and improve it.

David Crawshaw, 06 Jan 2025 (source)

A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don't have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.

David Crawshaw, 06 Jan 2025 (source)

The world needs [more, better, harder, etc] evals for AI. This is one of the most important problems of our lifetime, and critical for continual progress.

Logan Kilpatrick, 02 Jan 2025 (source)
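
For concreteness, here is what a minimal eval looks like as code - a sketch, assuming a `model` callable that maps a prompt to a string (not any particular API):

```python
# A minimal eval harness: fixed tasks, an exact-match grader, one aggregate
# score. `model` is assumed to be a callable (prompt -> str).

def run_eval(model, cases):
    """cases: list of (prompt, expected) pairs. Returns accuracy in [0, 1]."""
    correct = 0
    for prompt, expected in cases:
        correct += (model(prompt).strip() == expected)
    return correct / len(cases)

cases = [
    ("What is 12 * 13? Answer with the number only.", "156"),
    ("Spell 'necessary' backwards.", "yrassecen"),
]
# accuracy = run_eval(model, cases)
```

"More, better, harder" evals mostly mean harder `cases` and more careful graders than exact match.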

There's a variety of words that I wish we had, which we do not yet have. One example: we have the concept of "alpha" in finance, and that one Greek letter smuggles in a huge amount of understanding about how the world works. I would love to be able to describe someone's alpha above the LLM baseline in discussing a topic. Because there are a lot of human writers in the world who have no alpha above the LLM baseline, and that's been true since before LLMs were a thing. The Twitterism is sometimes "this person is an NPC" - there is no intellectual content here; the performance of class and similar can allow one to pretend that there is intellectual content, but there is no intellectual content.

patio11, 02 Jan 2025 (source)
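
The finance analogy translates almost directly into code. A sketch, with `judge` and `baseline_llm` as purely hypothetical stand-ins for some scoring function and some reference model:

```python
# "Alpha above the LLM baseline" as an excess score. `judge` (text -> float)
# and `baseline_llm` (prompt -> str) are hypothetical stand-ins.

def alpha(writer_text: str, topic: str, judge, baseline_llm) -> float:
    baseline_text = baseline_llm(f"Write about: {topic}")
    return judge(writer_text) - judge(baseline_text)  # > 0 means real alpha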

Easy prediction for 2025 is that the gains in AI model capability will continue to grow much faster than (a) the vast majority of people's understanding of what AI can do & (b) organizations' ability to absorb the pace of change. Social change is slower than technological change. This all means that things will get weirder and the weirdness will be unevenly distributed.

Ethan Mollick, 01 Jan 2025 (source)

A lot of better-informed people have sworn off LLMs entirely because they can't see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

Simon Willison, 31 Dec 2024 (source)
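
One concrete form that skill takes: treat every output as untrusted, validate it mechanically, and retry on failure. A sketch, assuming a hypothetical `llm` callable (prompt -> str):

```python
import json

# Working with an unreliable-but-powerful tool: trust the validator,
# not the model. `llm` is a hypothetical callable (prompt -> str).

def ask_for_json(llm, prompt: str, attempts: int = 3):
    last_error = None
    for _ in range(attempts):
        raw = llm(prompt)
        try:
            return json.loads(raw)   # mechanical check, independent of the model
        except json.JSONDecodeError as err:
            last_error = err
            prompt = f"{prompt}\nReturn only valid JSON. Previous error: {err}"
    raise ValueError(f"no valid JSON after {attempts} attempts: {last_error}")
```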

The efficiency thing is really important for everyone who is concerned about the environmental impact of LLMs. These price drops tie directly to how much energy is being used for running prompts. There's still plenty to worry about with respect to the environmental impact of the great AI datacenter buildout, but a lot of the concerns over the energy cost of individual prompts are no longer credible.

Simon Willison, 31 Dec 2024 (source)
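
To see why per-prompt energy is small, a back-of-envelope sketch - every number below is a made-up placeholder chosen only to show the arithmetic, not a measurement:

```python
# Back-of-envelope: energy per prompt. All inputs are hypothetical.
gpu_power_kw = 0.7          # one accelerator under load (assumed)
seconds_per_prompt = 2.0    # serving latency for one prompt (assumed)
concurrent_prompts = 10     # prompts sharing the GPU via batching (assumed)

kwh = gpu_power_kw * (seconds_per_prompt / 3600) / concurrent_prompts
print(f"{kwh * 1000:.4f} Wh per prompt")  # ~0.04 Wh with these made-up inputs
```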

General consensus in the replies and quotes of this seems to be that the entire concept of "AI skills" is a joke - how hard is typing text into a chatbot, really? I will continue to argue that it's genuinely difficult, and that the challenge in using these tools is widely underestimated.

Simon Willison, 26 Dec 2024 (source)

The interesting part is that they will crush tests but you wouldn't hire them over a person for the most menial jobs. It's a neat challenge how to properly evaluate the "easy stuff" that is secretly hard because of Moravec's paradox. Very long contexts, autonomy, common sense, …

Andrej Karpathy, 24 Dec 2024 (source)

ARC is a silly benchmark; the other results in math and coding are much more impressive. o3 is just o1 scaled up. The main takeaway from this line of work that people should walk away with is that we now have a proven way to RL our way to superhuman performance on tasks where it's cheap to sample and easy to verify the final output. Programming falls in that category; they focused on known benchmarks, but the same process can be done for normal programs, using parsers, compilers, existing functions and unit tests as verifiers. Pre-o1 we only really had next-token prediction, which required high-quality human-produced data; with o1 you optimize for success instead of MLE of the next token.

One very important thing to understand about the future: the economics of AI are about to change completely. We'll soon be in a world where you can turn test-time compute into competence -- for the first time in the history of software, marginal cost will become critical.

François Chollet, 20 Dec 2024 (source)
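
The recipe in the first paragraph - cheap to sample, easy to verify - is straightforward to sketch: draw many candidate programs, keep the ones the unit tests accept, and reinforce on the survivors. `model.sample` is a hypothetical interface, and real sandboxing is elided:

```python
# "Cheap to sample, easy to verify": best-of-n generation with unit tests
# as the verifier. The tests are ground truth, so no human labels needed.

def passes_all(source: str, tests) -> bool:
    namespace = {}
    try:
        exec(source, namespace)        # real systems would sandbox this step
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def solve_with_verifier(model, spec: str, tests, n: int = 64):
    """Sample n candidate programs; return the ones that pass every test."""
    candidates = (model.sample(spec) for _ in range(n))   # cheap to sample
    verified = [src for src in candidates if passes_all(src, tests)]
    return verified                    # RL: these samples get the reward
```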

A central property in formal software engineering is compositionality: the idea that composite systems can be understood in terms of the meanings of their parts and the nature of the composition, rather than by having to look at the parts themselves.

This idea lies at the heart of piecewise development: parts can be engineered (and verified) separately and hence in parallel, and reused in the form of modules, libraries and the like [...]

Current AI systems have no internal structure that relates meaningfully to their functionality. They cannot be developed, or reused, as components. There can be no separation of concerns or piecewise development.

Eerke Boiten, 06 Nov 2024 (source)
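
Ordinary software shows the property Boiten means. In the sketch below, each part carries a contract that can be verified in isolation, and the composite is understood from those contracts alone - exactly the structure he argues current AI systems lack:

```python
# Compositionality: verify parts separately, then reason about the
# composite from the parts' contracts alone.

def normalize(xs: list[float]) -> list[float]:
    """Contract: output sums to 1.0 (for nonempty, nonnegative input)."""
    total = sum(xs)
    return [x / total for x in xs]

def expectation(weights: list[float], values: list[float]) -> float:
    """Contract: weighted mean, assuming weights sum to 1.0."""
    return sum(w * v for w, v in zip(weights, values))

# The composite is correct *because* each contract holds separately:
# normalize() guarantees exactly what expectation() assumes.
mean = expectation(normalize([1.0, 3.0]), [10.0, 20.0])  # 17.5
```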

I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

Andrej Karpathy, 09 Dec 2023 (source)
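
There is a mechanical reading of this. A toy sampler, with made-up probabilities: the model only ever draws tokens from a distribution, and there is no separate "factual mode" that hallucination switches off:

```python
import random

# Toy next-token sampler: factual and confabulated continuations come from
# the same mechanism. The probabilities below are invented for illustration.

def sample_next(dist: dict[str, float]) -> str:
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

# Hypothetical distribution after "The capital of France is"
dist = {"Paris": 0.92, "Lyon": 0.05, "Narnia": 0.03}
print(sample_next(dist))  # usually right, occasionally a "hallucination"
```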