Episode 1: AI is Eating The World
In This Podcast
In this episode, tech analyst Benedict Evans joins hosts Ayesha Khanna and Bernard Leong to explore the rapid evolution of AI. They discuss the scale of AI investment, its real-world applications, and whether generative AI is truly a revolutionary shift or just another tech hype cycle. Benedict breaks down how businesses are adopting AI, the challenges of integrating it into workflows, and the broader impact on industries and jobs. With insights on scaling laws, open-source vs. closed models, and the future of automation, this conversation unpacks AI’s role in reshaping the world as we know it.
About the guest
Benedict Evans is a distinguished technology analyst with over two decades of experience in mobile, media, and technology sectors.
His career encompasses roles in equity research, strategy, consulting, and venture capital. Notably, he served as a partner at Andreessen Horowitz in San Francisco and later joined Mosaic Ventures in London as a venture partner.
With years of experience, Benedict brings a sharp perspective on whether AI is overhyped or if we’re only scratching the surface of its true potential.
Transcript from this podcast
Bernard Leong: Hi, I'm Bernard.
Ayesha Khanna: I'm Ayesha.
Bernard Leong: We are the hosts of the new Augment podcast. Ayesha, we've been talking about doing this podcast for a long time. We tried all kinds of things, from using LLMs to looking at every other podcast, and we finally have our first episode. So who do we have on the show today?
Ayesha Khanna: We had Benedict Evans, who's an amazing analyst from the U.S. who's been widely covering technological disruption.
Bernard Leong: Yes.
Ayesha Khanna: It was a great conversation.
Bernard Leong: Formerly of Andreessen Horowitz, and he also has one of the top newsletters. To really start off this inaugural podcast, he wrote a pretty good presentation that is very much aligned with where we are starting, with AI in the world.
Bernard Leong: So we're going to talk to Benedict today and let's get started. Hi, Benedict. It's an honour to actually have you here on our inaugural podcast. Many of our listeners will definitely know you for your incredible annual presentations that capture the pulse of technology and your weekly newsletter, which I'm a subscriber of. But for those who might not be familiar, can you just share a bit about yourself and what is currently exciting you in your work?
Benedict Evans: What can I share about myself? I'm an analyst on the East Coast, or an influencer and thought leader on the West Coast, depending on how you want to describe it. I've spent sort of 20 years working in media and telecoms and consulting and research and banking, trying to understand and trying to explain what was going on. There was a time when that was mobile, there was a time when it was smartphones, or e-commerce. And now the one thing that everyone is trying to understand is this current wave of AI.
Ayesha Khanna: Benedict, speaking about AI, tell us about how you are personally using it in your daily work.
Benedict Evans: I think there's maybe two ways to answer that. One of them is, what do you mean when you say AI? Because I'm calling you using the camera in my MacBook and it's applying a background blur and it's doing that using machine learning. There's a sort of sequence in that something starts out as science fiction, and then it becomes, when it works, it's AI. And then when it's been working for a while, it's not AI anymore, it's just software. It starts out as AI, and then it becomes smart, and then it's auto. It's autocorrect, and then it just disappears, and you think, oh, well, that's just software. And so we're all using lots of things every day that 20 years ago would have been called AI, and now it's just software.
If we're talking about generative AI, generally large language models, or whatever the next thing is in that sequence, I don't really have any particular use cases where this is helpful. I don't write code, I don't brainstorm, I don't need something to write me generic text, I don't need something to make images for me or make video. And so I don't really have use cases where the raw LLM out of the box is useful. The use cases that I have will be things where entrepreneurs sit and solve some specific problem rather than just say, here you are, open ChatGPT and work it out. I don't need meeting transcriptions, for example, which I think you're using.
Ayesha Khanna: What about search? Do you use that? Because of all the generative AI search engines now, is that helpful?
Benedict Evans: I think there's a big open question around this. Going back to the beginning, you can split search into a bunch of different categories. The obvious one is navigational search, where you want to find the Singapore Airlines website, that would be an easy one, but you want to find the website for a particular organization or hotel or something. That's pure navigational search. Clearly something like Perplexity isn't useful for that.
Then there's questions where you might be looking to find the answer, or you might be looking for a page that has the answer and it's not necessarily clear which it is. It may be that you are looking for a page that has the answer because that's how you've done it in the past. But it might be that what you actually want is just the answer. And there it may be that generative AI will be quicker at doing that because it goes off and synthesizes things. I don't do that much of that kind of search. I'm generally looking to find—most of my search I realize is navigational rather than informational. I'm generally not using Google to find the answer to a question. I'm using Google to find me a particular PDF that covers a particular thing. I'm looking for it to find me a source. I'm not looking for it to tell me the answer. Some of that is habit. Some of that is just people have different use cases or different ways of approaching problems.
Bernard Leong: So Benedict, I was very pleased to finally meet you in early September with my wife, when we had lunch. During that conversation, we talked about how every year the deck you produce tells a pretty nuanced story about technology. What's the central narrative of your latest presentation, "AI is eating the world", and why do you think that story is particularly relevant right now?
Benedict Evans: There's different ways I could answer this question. I think what I was trying to do was to take stock and say, where are we and what are the questions and what are people trying to do? And we'd moved forward a long way from a year ago. If I was doing a presentation on e-commerce, there's not a huge amount that you would say differently from year to year, but clearly in the state of AI at the moment, there is.
And it seemed to me that there were sort of three categories you could group all the questions into. Firstly, the effort to scale these things, the surge of investment, the number of people building models, the way those models are evolving. Secondly, a set of questions around what are these models good for exactly? What are they best used for? What do you do with them? How do you work out what you do with them? And thirdly, which kind of overlaps quite a bit with the first and the second, how do you build products with them? How do you deploy them if you're a big company? How do you buy it as a startup? How do you build things with this?
Obviously all these questions overlap because if it turns out that these models get 100 times better, then that fundamentally changes what the products might be and maybe the LLM can just kind of do the whole thing and you don't need products. And also, of course, that changes what you can do with them. And what you can do with them changes what kind of products you would build with them.
But I suppose the questions are that you have these fairly common patterns for how you deploy new technology, and you can sort of slot this in and say, well, it sort of works like this. For example, generally, when you have a new technology, the incumbents will try and make it into a feature, and big companies will use it to integrate into the workflows they already have. So you start by making the new tool fit what you're already doing.
And then over time, people work out entirely new things that you could do with this. And big companies work out kind of new products, and startups create new ways that they can unbundle Microsoft or Google or Oracle, or indeed that they can unbundle a bank or an insurance company or a retailer. And then the third step is that every now and then somebody works out a way that you can redefine the market, the canonical example being Airbnb redefining what you mean by a hotel.
And so what I tried to do in the presentation is to kind of map these sorts of ways that things generally change onto the ways that generative AI is evolving, and the questions about how those might all fit together. And I don't think anyone really knows the answers yet, primarily because we don't know where the science is going to go. We don't know how much better these models are going to get.
Ayesha Khanna: I think that's the key thing I wanted to double click on, Benedict: where does generative AI fit into these kinds of computing breakthroughs that you have covered? Is it highly hyped at the moment? Is it a structural shift that we're going to see in the tech landscape? You have some thoughts on that, and I'd love to hear more about them.
Bernard Leong: Just to clarify that: essentially, is it like the dot-com era and the open web, or is it like the mobile platform shift, or even the cloud computing shift?
Benedict Evans: Well, clearly it's very hyped at the moment. I don't think there's really any doubt about that. And it's worth remembering that the dot-com bubble and the mobile bubble turned out to be right; it just took 10 years. Everything that everyone said about the internet in 1999 happened, it just took until about 2010 or 2015. Everything that people said about what we would be doing with mobile phones in 2000, and then again in 2010, did happen, it just took kind of 10 or 15 years.
So that's kind of true of most bubbles, or of a large class of bubble. It is kind of true, it's just not at that price and not today. Going right back to things like the railway manias: railways did completely transform the world, but if you invested all your money in railway stocks in 1840, you probably lost all your money. So something can be overhyped, and indeed a bubble, and yet also be extremely important. I think it's important to remember that.
Secondly, the character of the debate around this is: is this a platform shift that's sort of comparable to smartphones or the web or SaaS or open source or SQL? Is it one of those kinds of technology shifts? Or is it something much bigger than that? Is it more like the arrival of computing? The more excitable people would talk about electricity or something. It may be that it's much more radical than just a change in which platforms we use to build software.
Bernard Leong: So there is this concept of the scaling law, right? This is what OpenAI and Anthropic are mostly advocating: the more data you put in, the better the models get, and we're going to get to AGI somewhere along the way. But there's been some recent talk that we're nearing some limits. How does that actually affect the pace of innovation and the broader AI narrative? I have a very different perspective on the scaling laws myself, but I want to hear your thoughts first.
Benedict Evans: Well, there isn't really a scaling law. There's a scaling observation, which is kind of like Moore's law. Moore's law is an observation. At a primary level, we don't really understand why these models produce results as good as they do. We made the models much bigger, and they got much better results, and we don't really understand why. And so we don't understand, we don't have any way of predicting what would happen if you made the models ten times bigger. It seems like they'll probably carry on getting better, but we don't know. We don't have a theoretical basis to explain why the results are so good and what would happen if you would make the model X percent bigger. So we'll try and we'll find out.
Now, in the last couple of months, there's been a lot of discussion that the people at the frontier labs seem to be finding that this isn't working so well anymore. And then there's a lot of highly technical debate about whether that's really true and why that would be. OpenAI has pursued a new approach of essentially using vastly more compute when you run the model, as opposed to when you train the model, which has the downside of being much more expensive, at least for now. That's what you get with o1, and then the o3 model that they announced just at the end of last year.
So it may be that progress continues, but maybe not in quite the same way it did before, or certainly not with the same techniques as before; we'll carry on scaling, just with different techniques. It may be that we get o3 to be much cheaper, but we don't know whether o4 will be much better or not. And again, the challenge is that we don't have a good theoretical model that would allow us to predict this.
Bernard Leong: But if that's the case, right, all the foundation models require quite massive investments in servers, GPUs, NVIDIA chips for that matter. Do you think that lets these big tech companies create a more sustainable competitive advantage, or is it only a short-term barrier for the foundation model makers at the moment?
Benedict Evans: I think the one thing that's become extremely clear in the last sort of 12, 13 months is there's not really any fundamental barrier to creating LLMs as we understand them. All you need is money. In 2023 OpenAI had an LLM. In 2024, anyone with a couple of billion dollars could have an LLM.
If you look at things like the model that DeepSeek dropped at the end of last year, they got pretty much state of the art performance, at least on some benchmarks, with an order of magnitude less compute and a lot less data. And so it may be that we get the same results with radically less cost. It may be that we get much better results with more or less the same cost. Again, it's kind of tough to say.
Except you can kind of make these observations: clearly, where we started was that it didn't really matter what it cost, and it certainly didn't matter what the inference cost was, because you didn't have any users. And then when you've got 100 million, half a billion, a billion users, then suddenly the inference cost matters as much as the training cost, or maybe more.
And so we had a huge amount of, on the one hand, kind of science and optimization, on the other hand, just kind of straightforward data center engineering and semiconductor engineering to try and get the operating costs of this down. And so the costs have come down, pick your metric, but at least an order of magnitude in the last 18 months, maybe towards two orders of magnitude, depending on how you count it.
So there's still a marginal cost, and it does still cost money to train a model, but it's not clear that it's going to be billions of dollars. But even if it is half a billion or a billion dollars to train the next model, well, there are quite a lot of people that have half a billion or a billion dollars.
And the difference, partly just because tech is so much bigger than it was in the past, but the difference from, say, smartphone operating systems or search or social networks, is that there doesn't really seem to be a network effect. There doesn't seem to be a mechanic whereby your model gets better because more people use it, and more people use it because it gets better. This is the classic reinforcement effect in search, which we saw discussed in the DOJ antitrust case last year. Everyone uses Google because it's better, and it gets better because everybody uses it. They can see what you're clicking on, and what you're searching, and what you search next, and what you searched before. And then they give you ten results, and you tell them which one is good. And so Google gets better because everybody uses it, and then everyone uses it because it gets better.
There doesn't seem to be any such mechanic in building large language models. It really does just seem to be having more money, having enough money to build them. And so there's a kind of a whole class of technology, which is both very expensive and involves an enormous amount of science and engineering, and is also a commodity. You can think of flat panel screens—there's a lot of Nobel Prize winning science that goes into making a 50 inch TV that is an inch thick. But they're also a commodity. And all things being equal, it kind of seems like LLMs are going to be like that. They're going to be commodity infrastructures that sell at marginal cost.
Ayesha Khanna: And with all this, the new thing now is AI agents, an agentic AI workforce, and new innovations like reasoning models; that's a new trajectory we see in the narrative. There's a lot of enterprise automation and use cases. What is your view on that, Benedict? Are you seeing that enterprises are using this? Are there legitimate use cases across different industries? And which industries are perhaps doing better than others in adopting generative AI?
Benedict Evans: That's kind of four or five different questions.
Ayesha Khanna: So let me just go to the first one, which is, do you think it's translating into enterprise real world use cases?
Benedict Evans: We're already seeing a lot of enterprises—you go and look at sort of survey data from Bain, Morgan Stanley or Goldman Sachs, basically every enterprise has got pilots. Three quarters of enterprises have got pilots. And then a bunch of people have got something in deployment. It depends a bit on which sectors you look at, or it's not so much which industries, it's which functions within a company. Say a quarter of people have got something, and a quarter of companies have got something in deployment.
Now that's very different from saying a quarter of stuff has shifted to generative AI. No, a quarter of companies have deployed at least one thing. And that has tended to be concentrated—going back to your first question, like, what do I use this for? There is stuff where it's immediately obvious and straightforward to see what you would use this for, which is basically writing code, where people talk about across the industry 20-30 percent efficiency improvement, and marketing. And to some extent, customer support, although you have to be kind of careful how you build that because it can give your customers answers that look good and aren't right. But those fields, that's kind of where the adoption tends to have been concentrated so far. So it's not really by industry, it's by function within companies that it's been adopted.
And then if you look at survey data from the USA asking individuals within companies, are you using this and what's the user rate? The answer is that in some sectors like management, software development, 20, 30, 40 percent of people say they're using this every day. Again, what does every day mean? Does that mean all day? Or does that mean when I looked at it for five minutes this afternoon? Then you go across to other functions like finance or law, no one's using it. Some of that is about speed of adoption and culture. Some of it is also who has the kinds of problems that this is good at out of the box and who kind of doesn't.
Ayesha Khanna: So, Benedict, I think that's helpful, that people are using it. And maybe the next phase people are talking about is that it won't be people directly interfacing with generative AI, but that there'll be more back-end processes communicating with each other. At least that's what ServiceNow and Salesforce and some of these other companies are saying is next.
Benedict Evans: Well, first of all, of course, bear in mind the incumbents always try and make the new thing a feature. If your business is managing backend processes, then they'll say, this is great for backend processes. Whatever your business is, you will say, well, of course that just enables the stuff that we're already doing. And it makes our existing products better. It doesn't fundamentally change how they work. And so of course, everyone always says that. The incumbents always say that. And sometimes they're right. And they're generally right to begin with anyway.
Going back to my example of Airbnb, the internet was clearly an existential challenge to travel agents. But it wasn't an existential challenge to hotels because hotels are still real estate. Until Airbnb 15 years later comes along and redefines what it means when you say a hotel. So that's sort of one way to think about this.
Another way would be that it kind of takes time to work out the right thing to do with any new thing and it takes time to build enterprise software. One of the existential debates around how you build stuff with LLMs, which intersects very much with how much better they get, is whether the chatbot is the right user interface, or whether you should abstract the LLM away and wrap it in tooling and auditing and management and code and effectively treat it as an API call within traditional software.
And that's, of course, how we use machine learning. You have some application where it would be useful to have sentiment analysis. So you have an API call to a sentiment analysis platform that's using machine learning. But the user doesn't get told, like, oh, this is the AI button now, and you're going to do AI instant sentiment analysis.
And where we are now is, coming back to this question of who's adopted this first: where is it that the chatbot by itself is easy to use? Well, it's writing code and it's marketing. But if you want to use this within some workflow where you can say, synthesize what's in all of these reviews, you don't want the user to have to work out how to tell ChatGPT which reviews to synthesize, and then how to log into the CMS and pipe the review summaries back out into it. You want that to be an API. You want it to be an automated process that runs like other APIs inside a deterministic system.
And so the big long-term question is: do you use probabilistic systems, do you use LLMs, to control deterministic systems, so that everything else becomes an API call to ChatGPT? Or do ChatGPT and Gemini and Claude and everything else end up basically as just API calls inside everything else, and inside a whole class of new stuff, doing things that you couldn't have done before, but still fundamentally wrapped in GUIs and sales forces and routes to market and ARR and all the traditional ways that you think about building software?
Ayesha Khanna: That's a really good point. I think the big companies want all of that, but they're still meeting these new open-source companies and open-source LLMs that are coming out and challenging the closed-source models, the closed weights. And the gap between them is narrowing. When you're evaluating the progress of generative AI, open source versus closed source, what does that narrowing gap mean?
Benedict Evans: As a big company, you don't buy open source, you buy product. If you're JP Morgan or a giant corporation that writes its own code, then yes, that makes a difference to you. But the way that most companies deploy most software is that a vendor comes to them with a product that solves the problem.
So if you're a giant law firm, you don't buy sentiment analysis. You don't buy translation. You don't buy AWS's translation API. You buy legal software that has tooling and auditing and tracking and permissioning and version control and all this kind of stuff and seat management and does all the stuff that lawyers need around this, so that they can control and manage and understand what your sentiment analysis or your clustering or your translation is doing. And that's the same when they buy generative AI, they don't buy an API key to OpenAI.
So as an enterprise, whether it's open source or not is kind of an interesting question. But at a primary level, you're a VP in HR, and somebody's offering you a software tool to help you manage your graduate recruitment flow. And this new thing does stuff that you couldn't do before because it's using an API call to an LLM. But do you care which LLM it's using? Not really. What you care about is whether this product makes your graduate recruitment process and graduate hiring work better. That's your business objective.
Now, where the strategic interest in open source lies is that what Meta is trying to do is turn this into cheap commodity infrastructure that sells at marginal cost. Whereas if you are OpenAI, what you want is to sell people API keys. Meta's business is not selling API keys; Meta's business is social media. So Meta wants this to be as cheap as possible. And by making it free, they want to pull innovation and investment and company creation to be built around this model. Firstly, so that it will get cheap. Secondly, so that people will be building around the tools that they're using, which is good for them. But if you are Google, then your business is quite different, so you try and build it from a different direction.
Bernard Leong: If I were to just add on to Ayesha's question: there's a scenario where the open-source models dominate specific domains and industries, and the closed-source models maintain a certain edge, maybe on the consumer side.
Benedict Evans: No, I don't think that's right. Whether you're a consumer or an enterprise, you don't buy models. You buy a product that solves the problem. And the product that you use may be built with any number of different APIs from a number of different model providers. Or, if it's coming from Google or Microsoft, it may be their own model, or it may be somebody else's model. And as a consumer, you don't care whether the model is open source or closed source. You don't know. You don't know which model it's using.
Just as when you do your expenses and your expenses app takes a photograph of your receipt, you don't know which image recognition library they're using. And you certainly don't know or care whether it's open source. It's a meaningless question; it's abstracted away, much further down the stack.
Now, as a developer, you might care or might not care whether it's open source because you might be building it yourself or you might say, well, we need to get this task done and we're going to evaluate Llama running on this hosting platform and also going to evaluate GPT Turbo, and we're going to evaluate Gemini, we're going to evaluate the different models and the different pricing and the different options and choose which model we use. You don't care at a fundamental product level which one is open source necessarily. What open source does is drive down the cost of the models.
Bernard Leong: That's right. So let me reframe my question differently: it's actually about the knowledge base. Suppose you are a large pharmaceutical company and you are fine-tuning, say, Llama 3 or whichever open-source model, but you want a product that actually allows you to get correct predictions for drug discovery and drug targeting. Then you may have the incentive to train your own models, right? In that scenario, does it mean that only the industries with very specific domain knowledge are the ones that are going to be building their own models? Or is what you're alluding to that, as a pharmaceutical company, I don't really care, as long as Google has a pretty good large language model that has been trained on most of the open data in medical science?
Benedict Evans: Well, it depends. It depends on what you're trying to do. If you are looking for a fairly generic commodity LLM capability, you'll evaluate five different models and maybe switch between them. I don't think very many people are trying to train their own foundation models right now, simply because of the amount of expertise, the ability to get access to that expertise, and of course, access to just the infrastructure to train the models and then what it costs to do that.
So there's a dozen technology companies that are doing this. But we're not in a situation yet where every pharmaceutical company is going to have 20 LLMs. Now, clearly we are in a situation where every pharmaceutical company has hundreds of databases, or thousands of databases. If you go back to the 1970s, there were maybe 70,000 mainframes on Earth. And so if you cared, you could probably have sat down and worked out how many databases there were on Earth. It's a fairly straightforward calculation to do.
Today, if you were to ask how many databases there are on Earth, what does that even mean? There are probably 50 on your phone. It's like asking how many light bulbs there are on Earth; it's a fun interview question, but otherwise, what does that even mean? And there's a bit of a similar thing here. The point is, 10 years ago, you could have said the same thing about machine learning models. There was this sort of moment when people thought only Google, only giant tech companies that have all the data, would be able to train a machine learning model. And that turned out not to be true.
And today, again, how many machine learning models are there on earth? Again, it's like asking how many spreadsheets are there? It's a meaningless question. It kind of is possible today to say how many LLMs are there? And the answer is like hundreds. But, of course, that's including 50 different versions made by OpenAI and made by Google, but it's certainly three digits, it may be two digits, depending on how you count it. And this question of how much do they scale versus how cheap do they get kind of gets to that question.
So it may be that you can get more or less what you get out of GPT-4 for a couple of million dollars in three years. Maybe. I don't think we really know. In which case, of course, the question would be, why would you build your own? And that applies to all software. You get to very classic IT procurement questions. Should we build our own? Should we customize something off the shelf? Should we just take what's off the shelf? Should we hire Accenture to build it for us? Should we go with Microsoft or Google? Do we expense it? Is it CapEx? Is it OpEx? These are all very traditional IT procurement questions. And this kind of comes back to the same question.
Ayesha Khanna: So, shifting gears a little bit. AI innovation is happening, but other technological innovations are also driving disruption: edge computing, 5G, the Internet of Things. How do you see the interplay of these, and how is that going to be transformative in terms of AI acceleration?
Benedict Evans: All the things that we were excited about the day before ChatGPT 3.5 are kind of still there. Or not, as the case may be. I mean, I don't think anybody in tech is excited about or interested in 5G; it's just a faster network. There are like three companies in the world that care about network slicing. If you're running a very large operation, if you're like Hyundai Heavy and you're building ships, then 5G networks are interesting. Nobody else cares. IoT, I haven't heard that term in 10 years.
But, you know, Google has like 50, 55 billion dollars of ad revenue in the last 12 months. Amazon's platform is doing GMV at like a third of Amazon. Shein is the world's largest apparel retailer. BYD will probably overtake, if they haven't already—I forget the numbers that were out yesterday—but if they haven't already overtaken Tesla, they will. Is the Chinese car industry going to do in the next 10 years what the Japanese car industry did in the 80s and just run over the rest of the world and crush a whole bunch of incumbents?
And all of those questions are there. Meanwhile, there's still the conversation about VR, which was five years away 10 years ago, is still five years away, and may always be five years away. And then the crypto people are still out there as well. The tourists and the crooks and the scammers have mostly left crypto, but there's a whole bunch of very clever people working very hard building some sort of next-generation financial plumbing on blockchains. Whether it will be anything more than that, I'm pretty skeptical, but it may be; certainly there's a bunch of people trying to build financial infrastructure on blockchains.
So all the stuff that people cared about before is still there and still happening. It's just that all the attention in tech two years ago was basically on stuff that had already happened. Smartphones had happened. What's the next thing? VR isn't here yet. Crypto is not really here, maybe—a very polarizing subject, obviously; I can't think of anything else that was as polarizing as crypto. But then this happened, this thing started working, and pretty much everybody in tech thinks, no, this is certainly as big a deal as smartphones or the web, and possibly much bigger than that.
Bernard Leong: What about robotics? There's a lot of venture capital money now going into robotics companies, and I think large language models have also taken the field in a new direction.
Benedict Evans: Yes, definitely. I think there's two or three things going on here. Clearly the primary thing is that the combination of better hardware, but mostly AI, means that there's a lot of stuff that was quite hard to get robots to do that may now be much easier, and it may be much easier to get them to be more generalized, to do more stuff without everything having to be carefully pre-programmed. And so there's a lot of renewed interest in robotics.
A subset of that, which I must confess kind of baffles me, is the humanoid robot thing, where a lot of the use cases people talk about would seem to require general AI. I don't quite understand why, if you want to automate moving pallets around a warehouse, you want to have three human-shaped robots walk around and pick them up. Why don't you just do what Amazon does and have a hockey puck that slides underneath and lifts it up? Why is the form that evolved from monkeys, whenever that was, the most efficient form? It reminds me of the observation people always make about Star Wars: why do you have people flying the ships? You've got robots. Why didn't R2-D2 fly the ship? Why do you have a cockpit? That's the sense in which the humanoid robot thing puzzles me. Why does it have to look like a person?
Ayesha Khanna: I think, Benedict, when you think about robots or machines or autonomous planes or drones, what are the unknown unknowns or unintended consequences of AI that we're really not paying enough attention to, because we're just so excited about it?
Benedict Evans: Well, I think small consumer drones and their use in Ukraine are very interesting. And there's a huge amount of interest now, much more general than that, in applying modern Silicon Valley startup speed to military technology. I wrote something briefly about this in my newsletter a couple of weeks ago: military technology has followed this kind of irresistible path of becoming radically more expensive with each generation, and taking longer to develop, and being slower to ship, and harder to manufacture.
There's an astonishing stat I saw the other day that in World War II the Allies produced something over a million aircraft. If you were just asked to guess how many aircraft Britain and America produced in the Second World War, you'd probably have said tens of thousands; you would not have said millions. Which of course partly reflects that they were basically flying cars. The aircraft in World War II were literally flying cars: they were made with car engines, by car companies. That's who made all of those aircraft.
And clearly today, a modern fighter aircraft costs $50 million, and there's a reason why. And so there's a whole bunch of innovation and curiosity—obviously Anduril, but also a bunch of other companies—around how you can change that. I suppose the structural point here would be that in the fifties and sixties, the military got the cool technology first. Or the intelligence agencies got the really cool cutting-edge stuff first. And then the military got it ten years later, and then big companies got it ten years after that.
So if you think about what happened with computers, they start out as one computer, and it's there to break codes. And in the 60s and 70s, big companies get computers. And then small companies, and then a decade later, consumers start being able to get them. And now, of course, that sequence works completely the other way: because of the economies of scale, consumers get the cool stuff first, and then small companies, and then big companies, and then the military gets it like 40 years after consumers get it.
There's this whole sense of curiosity, creation, and interest in how that can be addressed. Particularly given that we now live in a world in which it is conceivable that you might go to war again, after several decades or generations in which it wasn't really conceivable that major industrial countries would go to war with each other. So that's a whole sector.
There's a lot of stuff going on in bio and computational biology, which just isn't my field, but the introduction of software to that, and now the introduction of AI to that, has all of the people in that field very excited. People in Silicon Valley have started talking about nuclear again; that's another conversation. What else is going on? Fusion, maybe. Quantum, maybe. All sorts of other frontier technologies which were a kind of tough fit for venture capital. But that's the comment I made 10 minutes ago: all the stuff that was cool and exciting before we talked about ChatGPT is still there.
Ayesha Khanna: Benedict, we don't talk enough about what AI gets wrong, the error rates that it has. We talk about hallucinations and bias, but it also makes mistakes. And as it's getting embedded into healthcare, into finance, this could have a butterfly effect, potentially. What are some of your thoughts on how it should be managed, or how companies are managing it?
Benedict Evans: So there are some fairly fundamental philosophical puzzles in how to think about this. You can say at a mechanistic level what's going on, and then there are some puzzles in how to think about it. So, mechanistically, machine learning in general is a statistical system, and we apply machine learning to problems where logic didn't work or became impractical.
So you can use logic to calculate your taxes, or to calculate a million people's taxes: you write down logical steps. You can't really use logical steps to do translation. You can't really use logical steps to do image recognition or sentiment analysis. And it turned out there's basically a very broad class of problem that we could never really get computers to do, because we couldn't tell the computer what the logical steps would be.
The way I always used to describe this is that it was kind of like trying to make a mechanical horse. There's no law of physics that says you can't make a mechanical horse, but in practice they always fall over, because you can't actually write down enough logical steps to make it work. And you can't actually write down enough logical steps to say, how do I know that's a cat and not a dog? You think you can, and then you can't.
And so machine learning ten years ago, and now LLMs, solved this class of problem where you couldn't write down the rules. You solve them by turning them into statistics. But statistics, by definition, doesn't produce deterministic answers. It's a probabilistic system, which is what I was saying earlier: the computer, so far, has always been deterministic. Indeed, all automation is deterministic. Steam engines are deterministic. They will do the same thing exactly the same way every single time.
But we have this class of problem that we do not solve and do not know how to solve with a deterministic system. And we have been solving them using probabilistic systems, and probabilistic systems do not produce the same answer every single time. And sometimes they produce answers that are, quote unquote, wrong. And we use this word error rate, or we say hallucination, which I think is a really bad word, partly because it anthropomorphizes it. It sort of suggests that these things know truth or that they're people. Lying is a particularly problematic term as well. It's like thinking that a calculator is intelligent. It's just a machine.
So then you get a set of questions. First of all, if you use an LLM and you try and get it to answer deterministic questions, and then you say, look, it got them wrong: well, well done, good for you, but what have you really proved? There are a lot of puzzles around this, because whenever the new thing arrives, it always tends to be bad at the thing that was important to the old thing.
And so if you were looking at an Apple II in the late 70s, as someone who ran IT for a big bank, you'd say, well, does this have five nines reliability? Can it handle a million transactions a second? If I buy a hundred of them, can they process a billion transactions in a minute and never crash, or melt, or fail? And the answer would be, no, it can't.
And if you looked at that and said, well, these things are useless then, you'd be kind of missing the point. I remember seeing an interview with David Bowie by a famous British TV journalist called Jeremy Paxman, in 1995 or 94, and David Bowie is saying this internet thing is going to change the world, and Jeremy Paxman says, well, but it's all nonsense, anybody can write anything they want on it. Which is true, but completely misses the point.
And so this gets to this puzzle: as you look at these systems, you can ask them to do things that a deterministic system might do, and if you ask them deterministic questions, they will not always get them right. What does that mean? Where does that matter? Where does that not matter? A lot of the things we would like to automate but can't are sort of in a fuzzy area in the middle, where you kind of want a deterministic answer to a non-deterministic question. And so there's a bunch of places where it's quite conceptually difficult to understand what is actually going on here, and why you would use this and not use that.
Now this kind of comes back to what we were talking about a while ago, the early use cases: software development and marketing. In software development, these models tend to be very good, but it's also very easy to see the mistakes. You can run the code and see if it breaks. It's very easy to check whether it's right. It's kind of like saying, get an LLM to make your spreadsheet, and then go and check all the formulas. That still might be better than doing it yourself.
The other thing, of course, is marketing, where you think, well, write me 50 ideas for a slogan here. There isn't really a wrong answer, only better or worse answers, and you can see the worse ones and say, obviously I'm not going to use that. You can see what's bad and what's good, and you can pick. But again, it's still helpful.
And, by extension, one can imagine scenarios where the same question and the same answer might be very useful in one scenario and completely useless in another. I think I may have mentioned this to you, Bernard, but somebody was pitching me for a consulting project and they wanted a thousand-word biography of me, which I didn't have, and I didn't really want to write it. So I go and ask ChatGPT, and it produces a thousand-word biography of someone that looks like me. Half of the jobs are wrong. The university is wrong—it's the right kind of university, it's just not the university I went to. It's like it said Oxford instead of Cambridge.
But it's still good for me. For me, that's perfect. Because I know what university I went to. I can fix that. So for me, all the mistakes are very easy to correct, but now I've got a good thousand word biography that says I'm a wonderfully intelligent, clever person, and brilliant, and you should hire me. That's great. If a consultancy had used it, and did not know any of this information about me, it would be useless, because it would be full of mistakes. But for me, it's useful because I can fix the mistakes.
So there are a lot of these sorts of puzzles around. You look for use cases where it doesn't matter. Do you wrap it in tooling and controls and patterns and APIs? Do you abstract it away completely? Do you build it into workflows where you can manage this? How do you build stuff around it? And so there's a slide in my presentation, which is basically: on the one hand, make the models better. Well, duh, that would be good. Or make a statistical model that always produces the same result the same way every time—well, I'm not sure that's even theoretically, conceptually possible, but fine.
Meanwhile, look for use cases where the mistakes are easy to see or abstract it, manage it, build tooling and build things around it. What you can't do is just treat it as though it's a sort of better search engine or a better database because it isn't a database.
There's a thing I've been thinking about recently, which is that you can go on LinkedIn or Bluesky or Mastodon, these places, and you can find whole groups of people who are absolutely convinced that these LLMs are kind of like the new NFT: that they're a scam, that it's all just piracy, that all the output is useless and stupid, that this is all bullshit and it's all just going to blow away. The idea is that an LLM is some sort of piracy system that produces crappy regurgitations of what was in the training data.
And on one level, I'm not interested in arguing with this—it's the XKCD cartoon, someone's wrong on the internet. There's a famous observation by Jonathan Swift that you can't reason somebody out of an idea that they weren't reasoned into. But, you know, if you look at these things and you look at the error rates and you don't care, and you just act like there aren't error rates and it's not a problem, you're going to get into terrible trouble. I've certainly seen people talking about, oh, we'll just use this for medical advice, like, immediately. Well, up to a point, sort of, but with an enormous amount of care and thought about how you do that.
On the other hand, if you look at it and you say, well, I asked this an extremely specific, factual question, and one of the details was wrong, so these things are just useless bullshit engines—you're Jeremy Paxman looking at the internet and saying, well, anyone can write anything. Well, yes, they can.
Bernard Leong: So, moving ahead: we talked about the users and how generative AI is changing society. How do you see the balance between AI augmentation and job displacement evolving across different industries over the next 5-10 years? Would it be like what Keynes said, that we're going to reach the four-day work week?
Benedict Evans: Well, people have been saying that new technology would mean no one will have to work since probably the 20s, the 1920s. And people have always been worrying that new technology would cause mass unemployment for hundreds of years. I mean, there are cases in European history of kings and queens saying, well, we're not going to do this because people will lose their jobs.
And there's a fairly basic economic fallacy here that any first-year economics student would tell you about, which is that if you make it cheaper and easier to do something, one of two things can happen. Either that's cheaper for everybody else, so they can spend money on other things and new things, or you do more of it.
So, my joke is always that, the young people won't believe this, but before Excel, investment bankers used to work really long hours. But now, thanks to Excel, Goldman Sachs associates get all their work done at lunchtime on Fridays and go home for the weekend. And of course, that's not what happens. What happens is when spreadsheets make it much easier and much cheaper to do analysis, you do much more analysis. We have not had a collapse in the number of accountants in the last 40 years. We've had an increase in the number of accountants.
So one way to think about this is that it's a classic lump of labor fallacy in economics: the idea that there's a certain amount of work to be done, and if you automate that, well, then there just isn't any more work. And we should be able to sit down and think: 200 years ago, almost all of us were peasants—like 90 percent of us. And now, how many people work on the land? A low single-digit percentage. So why does everybody else still have a job? Well, because there were new jobs. Once food got cheap enough that you didn't need everybody working on the land for us all to be able to eat, that freed up resources to do new things. And there's no primary reason to think those processes aren't going to continue.
On the one hand, human desires are infinite, and if you automate a task, that frees up resources, time, and capability for new things. On the other hand, sometimes when you automate a thing, we just do more of it, which is what happened with spreadsheets, for example, and what happened with steam engines. Now, of course, that doesn't mean it isn't painful for the people whose jobs got automated away. Skilled weavers were automated away by weaving machines; this is the classic story of the Luddites, the people who went around smashing machines. One of the narratives of the Luddites is that these were skilled weavers who were losing their jobs. So it was entirely rational behavior from their perspective, but everybody else got cheap clothes.
Bernard Leong: So Benedict, what's the one question that you wish more people would ask you about AI?
Benedict Evans: I don't know; I'm not sure that I particularly have an answer to that. The challenge in talking about generative AI, I think, is that on the one hand there's a huge amount of activity, a huge amount of stuff happening. Everything from all the innovation going on around data centers and water cooling and new chips and interconnect—how do you connect them, can you train a model across multiple data centers—to the models themselves, which we've talked about, and then all the infrastructure and all the people going out and founding companies. With the number of people building models and everything that's going on, there's loads of stuff happening.
On the other hand, the core conceptual questions around what this means and what you do with it and how it's going to work haven't really changed since 2023. How do we think about intellectual property? How do we think about error rates? How do we think about scaling? What's going to happen to employment? You could make a list of 10 or 15 questions and none of them have really changed. I mean, there's a little bit of a change on the question of scaling—it may be that scaling on data alone won't keep working, or that we just run out of data—but it's not like we've reached a point of clarity on what this is going to look like.
And what we don't really have—we've got a lot of vertical products, we've got lots of people building enterprise software—what we don't have is product-market fit for Gemini or ChatGPT themselves beyond certain verticals and certain profiles of early adoption. One of the charts at the beginning of my presentation is that most people have heard of ChatGPT now, but most people who've tried it didn't go back. Most people use this and think, well, that's very clever. What do I do with it?
And if you think about the evolution of most other technologies, it's not the user's job to work out what this is for. It's the job of the entrepreneur and the technology company and the software company to work out what you would do with this. It's not the user's job to work out what you would do with image recognition. It's the job of Apple and Google to integrate it into the Photos app and the Camera app, and of Meta and Amazon and everybody else to think about ways it would be useful to do something with image recognition. But it's not the user's job to think about what they should do with image recognition.
And then you give someone ChatGPT and you say, hey, go off and use it. And they think, well, for what? And you say, anything. Well, that's not really an answer.
Ayesha Khanna: What are some metrics or questions that you keep in mind when you're looking at this AI trajectory? Like just, what are the top three or four things that you consider to really understand its impact?
Benedict Evans: So, what I look for is active use: how many people are actually using this, how much, and for what? As we've said several times, this clearly has product-market fit in software development. It clearly has product-market fit in marketing. We're starting to see, one step at a time, individual things where people say, we can massively automate this.
There's a sort of puzzle as to how you think conceptually about what that means. Is this sort of like the web, or is it kind of like open source, or like SaaS? Because you can't "use SaaS". You can't go out and say to somebody, hey, have you used SaaS? SaaS is simply a route to market that enabled a thousand new companies and a better user experience for their products. And so it may be that asking, hey, have you used generative AI, is like asking, where have you used SaaS? I've used Salesforce, I've used Gmail; Gmail is SaaS, Gmail is great. I've used that, but I haven't used these other 500 things that people have built using this. And so it may be that this is how this evolves: that we're waiting for Gmail.
Bernard Leong: Many thanks for coming on the inaugural episode of our podcast, Augment. How can our audience find you? I know you have a great newsletter; tell us more about where we can find your work.
Benedict Evans: Well, my parents gave me good SEO, so if you Google Benedict Evans, you'll mostly find me. I have a website where I publish essays, although, partly because of what we just said, I haven't written anything for a while. I do a big annual presentation, which is there, and then I do a weekly newsletter, which is: here's everything interesting that happened this week, why I thought it was interesting, and what I thought it might mean. That's got 170, 180,000 subscribers now. So yes, Google me and look around.
Bernard Leong: So Benedict, thanks. We look forward to speaking to you again.
Ayesha Khanna: That was awesome. Thank you so much for your insights and your thoughts. It was really fun and very interesting.
Ayesha Khanna: Take care.