Originally posted: 2024-12-08. Last updated: 2024-12-19. View source code for this page here.

This post contains reflections on the impact of LLMs after two years of intensive use. I cover three areas where I've found them most useful, and how capabilities in these areas may improve in the short term:

  1. Intermediating information to present it in an easier-to-use format
  2. Writing code
  3. Education and personalised tutoring

This pattern is emerging in multiple applications, and I find it genuinely useful. For example, ChatGPT Search and Perplexity intermediate between web pages and the reader. Office 365 Copilot and Apple Intelligence summarise emails. Custom GPTs and Claude Projects both allow you to chat with a knowledgebase. In the year since I last mentioned it, it feels like this has gone from a likely future development to real, working tech that's well on the path to being widespread.

A straightforward extension of this which is already emerging is that team and corporate knowledge bases will be accessed through a chat interface.

I have done this for my work here. In the space of a year, this has gone from totally useless to being pretty good. The improvement seems due to a combination of longer context models and improvements in RAG implementations. In the longer term, I imagine this will largely be automated, and the knowledgebase will be able to include pretty much all team communications - Slack, emails, video calls, presentations and so on.
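The core RAG loop is simple: retrieve the passages most relevant to a question, then stuff them into the model's context. Here's a minimal sketch using bag-of-words cosine similarity as a stand-in for learned embeddings - real systems use embedding models and a vector store, and the final prompt would be sent to an LLM API, which I've omitted. All names and documents here are illustrative:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector over lowercased words."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Place the retrieved passages into the model's context window."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

team_docs = [
    "The deploy pipeline runs on merge to main.",
    "Team standup is at 9:30 every weekday.",
    "Expenses are submitted through the finance portal.",
]
print(build_prompt("When is standup?", team_docs))
```

Longer context windows reduce how aggressively the retrieval step has to filter - with a big enough window you can skip retrieval entirely and include the whole knowledge base, which is effectively what providing a curated context document does.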

For now, I'm starting to wish that all the teams I work in had a person curating the team's knowledge into a context document that I can provide to an LLM.

Over the past year there has been dramatic progress in the utility of LLMs to software developers, both in terms of the quality of models, and their integration with programming tools (e.g. Cursor with Sonnet 3.5). I use LLMs daily to help me code, and I estimate it increases my programming productivity by at least 50%.

Whether scaling is over or not, it's likely LLMs' coding ability will improve faster than their general abilities, since there's a tight feedback loop. In addition, the models themselves help developers quickly integrate them into programming tools.

As this trend continues, the skill of software development will increasingly be the ability to precisely describe the desired functionality, judge the quality of the code the LLM outputs, and iteratively improve it. This has always been the hard part of software development, so the impact is more limited than it first appears, but it does lead to a significantly increased pace of delivery and smaller teams.

One key benefit is that working prototypes are much quicker to develop, so mapping out the space of ideas and iterating on them is much faster.

Since I was in school over 20 years ago, software has always promised to radically improve education by offering personalised experiences to match students' abilities. But until now it's massively underdelivered. I think that's now changing.

Even in its current state, OpenAI's Advanced Voice Mode often explains things better than I can to my 6-year-old. It tailors its answers effectively to his ability. I use it multiple times a week, for a mix of quick questions and longer conversations with my kids. Sure, it makes mistakes, but so do I.

It seems likely these capabilities will develop in multiple dimensions, resulting in an expert personal tutor for all children:

  • LLMs need to develop better long-term memories of past conversations. Closely related, models need to be customised to be aware of curriculums, prerequisite knowledge, and children's current abilities.
  • Advanced Voice Mode needs web search capabilities, or the ability to work from a knowledgebase or long-context prompt.
  • Models need to improve their multimodal capabilities - the ability to output diagrams and graphics in particular.
  • Virtual tutors should be able to easily transition between didactic and conversational styles. It should be possible to 'enter the conversation' during any educational video or podcast and ask questions. But as Karpathy says, "chat is hard. You don't know what to say or ask". So it should be possible for the tutor to take over and direct the conversation when chat runs out.

I'm optimistic about this because these feel like iterative improvements on existing tech rather than requiring new breakthroughs.

To be clear, from my experience using these tools with my kids, I don't think this eliminates the need for teachers. I do think it empowers teachers to be much more effective. As with most other uses of LLMs, it takes practice and effort to learn how to use them well, and the ways in which teachers use them will probably be personal to their style.

It's impossible to predict just how good LLMs will get, but I'm particularly excited about the possibility of more agentic behaviour and computer use - a few examples:

  • Agentic behaviour in code editors - e.g. the ability to run a debugger to iteratively detect and fix problems
  • Effective computer use - e.g. filling out PDF forms, or doing annoying jobs like completing an expenses sheet
  • Writing PRs in response to GitHub issues

To begin with, I expect agentic behaviour will be relatively expensive and slow, since it will require a large number of LLM prompts. But I don't expect it to be too long before it's possible to set an LLM on a fairly open-ended task like "refactor this code to implement a few ideas for performance enhancements and benchmark each one for me". So employers may need to prepare for how to pay for £20 API calls which deliver £100 of value.

Overall, given how useful LLMs already are to my effectiveness at work, it's difficult for me to imagine them not having substantial economy-wide impacts in the long run. However, I think adoption will be slow. Some killer 'universal applications' may emerge, but much of their value is in personalised use cases that users tailor to their own jobs and ways of working. As a result, it takes time and conscious effort for people to learn how to use them effectively. So in the short term, I think we'll continue to see them everywhere except in the productivity stats.