Originally posted: 2024-12-08. Last updated: 2024-12-19.
This post contains reflections on the impact of LLMs after two years of intensive use. I cover three areas where I've found them most useful, and how capabilities in each area may improve in the short term.
This pattern is emerging in multiple applications, and I find it genuinely useful. For example, ChatGPT Search and Perplexity intermediate between web pages and the reader. Office 365 Copilot and Apple Intelligence summarise emails. Custom GPTs and Claude Projects both allow you to chat with a knowledge base. In the year since I last mentioned it, this has gone from a likely future development to real, working tech that's well on the path to being widespread.
A straightforward extension of this which is already emerging is that team and corporate knowledge bases will be accessed through a chat interface.
I have done this for my work here. In the space of a year, this has gone from totally useless to pretty good. The improvement seems due to a combination of longer-context models and better RAG implementations. In the longer term, I imagine this will largely be automated, and the knowledge base will be able to include pretty much all team communications - Slack, emails, video calls, presentations and so on.
For now, I'm starting to wish that all the teams I work in had a person curating the team's knowledge into a context document that I can provide to an LLM.
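To make the retrieval-augmented pattern mentioned above concrete, here's a toy sketch of the pipeline. The bag-of-words "embedding" and the example chunks are illustrative stand-ins, not any particular library's API; a real system would use a proper embedding model and send the final prompt to an LLM rather than printing it.

```python
# Toy sketch of the retrieval step behind "chat with a knowledge base":
# chunk -> embed -> retrieve top-k -> stuff into the prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical team knowledge base, already split into chunks.
chunks = [
    "Deploys go out every Tuesday after the standup.",
    "The staging database is refreshed from production nightly.",
    "Expense reports are due on the last Friday of the month.",
]
question = "When do deploys happen?"
context = "\n".join(retrieve(question, chunks))

# A real system would now send this prompt to an LLM; printed here instead.
print(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```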
Over the past year there has been dramatic progress in the utility of LLMs to software developers, both in the quality of the models and in their integration with programming tools (e.g. Cursor with Sonnet 3.5). I use LLMs daily to help me code, and I estimate they increase my programming productivity by at least 50%.
Whether scaling is over or not, it's likely LLMs' coding ability will improve faster than their general abilities, since there's a tight feedback loop: code can be executed and tested, so the quality of a model's output can be checked cheaply. In addition, the models themselves help developers quickly integrate them into programming tools.
As this trend continues, the skill of software development will increasingly be the ability to precisely describe the desired functionality, to judge the quality of the code the LLM outputs, and to iteratively improve it. This has always been the hard part of software development, so the impact is more limited than it first appears, but it does lead to a significantly increased pace of delivery and smaller teams.
One key benefit is that working prototypes are much quicker to develop, so mapping out the space of ideas and iterating on them is much faster.
Ever since I was in school over 20 years ago, software has promised to radically improve education by offering personalised experiences matched to students' abilities. But until now it has massively underdelivered. I think that's now changing.
Even in its current state, OpenAI's Advanced Voice Mode often explains things better than I can to my 6-year-old. It tailors its answers effectively to his ability. I use it multiple times a week, for a mix of quick questions and longer conversations with my kids. Sure, it makes mistakes, but so do I.
It seems likely these capabilities will develop along multiple dimensions, resulting in an expert personal tutor for all children.
I'm optimistic about this because these feel like iterative improvements on existing tech rather than requiring new breakthroughs.
To be clear, from my experience using these tools with my kids, I don't think this eliminates the need for teachers. I do think it empowers teachers to be much more effective. As with most other uses of LLMs, it takes practice and effort to learn how to use them well, and the ways in which teachers use them will probably be personal to their style.
It's impossible to predict just how good LLMs will get, but I'm particularly excited about the possibility of more agentic behaviour and computer use.
To begin with, I expect agentic behaviour will be relatively expensive and slow, since it will require a large number of LLM prompts. But I don't expect it to be too long before it's possible to set an LLM on a fairly open-ended task like "refactor this code to implement a few ideas for performance enhancements and benchmark each one for me". So employers may need to prepare for paying for £20 of API calls that deliver £100 of value.
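To illustrate why that kind of task burns so many prompts, here's a deliberately tiny sketch of the propose-evaluate-iterate loop behind it. Everything in it (ask_llm, run_benchmark, the five-idea budget) is a hypothetical stub, not any real API; a real agent would call a model, apply the patch, and time the resulting code.

```python
# Sketch of why open-ended agentic tasks need many model calls:
# each candidate idea costs at least one fresh prompt.
import random

def ask_llm(prompt: str) -> str:
    """Stub for a model call; a real agent would hit an LLM API here."""
    return f"candidate patch for: {prompt[:50]}"

def run_benchmark(patch: str) -> float:
    """Stub benchmark: relative runtime vs. the unpatched baseline."""
    return random.uniform(0.8, 1.2)

task = "refactor this code for performance and benchmark each idea"
best_patch, best_runtime, model_calls = "baseline (no patch)", 1.0, 0

for _ in range(5):  # five ideas means at least five prompts
    patch = ask_llm(f"{task} (best runtime so far: {best_runtime:.2f}x)")
    model_calls += 1
    runtime = run_benchmark(patch)
    if runtime < best_runtime:  # lower relative runtime is better
        best_patch, best_runtime = patch, runtime

print(f"{model_calls} model calls; best result: {1 / best_runtime:.2f}x speedup")
print(f"kept: {best_patch}")
```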
Overall, given how useful LLMs already are to my effectiveness at work, it's difficult for me to imagine them not having substantial economy-wide impacts in the long run. However, I think adoption will be slow. Some killer 'universal applications' may emerge, but much of LLMs' value lies in personalised, individual use cases that users tailor to their own jobs and ways of working. As a result, it takes time and conscious effort for people to learn to use them effectively. So in the short term, I think we'll continue to see them everywhere except in the productivity stats.