Episode 2: Everything You Need To Know About DeepSeek

In This Podcast

This episode is a deep dive into DeepSeek, the Chinese AI lab that shook the industry in early 2025. Bernard Leong and Ayesha Khanna trace its origins as a side project of the quantitative hedge fund High-Flyer, the chronology from its first coding models to DeepSeek V3 and the R1 reasoning model, and the claimed $6 million training cost that helped wipe over half a trillion dollars off Nvidia's market value, along with the hosts' own estimate of what the models really cost to build.

The discussion unpacks the technical innovations behind DeepSeek's efficiency: distillation of larger models, the mixture of experts architecture, multi-head latent attention, and group relative policy optimization, plus the releases from its open source week. The hosts also cover what open source really means for foundation models, the copyright questions around training data, the Jevons paradox and demand for compute, and the competitive response from Alibaba, Tencent, xAI, OpenAI, and Anthropic.

Transcript from this podcast

Bernard Leong: Hey Ayesha, how are you doing?

Ayesha Khanna: Hey, Bernard.

Bernard Leong: I think everybody has already watched our first episode with Benedict Evans, talking about how AI is eating the world. I know we are about two months behind schedule, and a lot has happened over the past few weeks. Without doubt, everybody in Asia is talking about DeepSeek. But before we get there, our audience probably doesn't know what we do and how we are involved in AI, so I should ask you to introduce yourself, and then we can get into the main subject of the day.

Ayesha Khanna: Absolutely. So Bernard, like you, I have been doing machine learning and computational statistics my entire career. I started off on Wall Street, where I worked on a variety of systems for a dozen years. After my PhD in smart cities and AI, I moved to Singapore, where I started my own AI engineering firm, and we have been building large data platforms and AI solutions for some of the largest logistics, healthcare, and financial services companies in North America, Asia, and now the Middle East.

Bernard Leong: That's everybody, right?

Ayesha Khanna: Yes. And we met that way. Remember?

Bernard Leong: Yes. We met when you were a partner of the Amazon Web Services machine learning team. I was the person leading that business for Southeast Asia; that was how we got to know each other, and where we formed the partnership to look at how to help businesses with AI. But I should also add, because you're being very modest, that you have also advised the Infocomm Media Development Authority of Singapore. Do you want to talk more about that?

Ayesha Khanna: Well, I also sit on a large number of private, public and government boards at various points in my career, including IMDA, which regulates the AI sector in Singapore. We have a lot of emphasis on AI in the country. Bernard, of course, you've been very instrumental in that since you've been giving AI courses both to government and to private organizations, and I hope you'll talk about that because that is the biggest demand in every single country as part of the national AI strategy.

Bernard Leong: And I also know that you co-chair and help curate and assemble the content for most of the major conferences. I think Fortune Brainstorm AI was one of the last events you just did, right?

Ayesha Khanna: I did. It's an incredible event by Fortune Magazine called Brainstorm AI, and we curate and bring some of the biggest names in AI to Asia, along with stories of what works in practice, separating the hype from reality. The reality is that it's moving very fast and becoming cheaper and more pervasive as we speak.

Bernard Leong: Yes, and definitely this is what our podcast is about. So I'm just going to give a brief introduction to myself. I'm a theoretical physicist by training; I graduated from the University of Cambridge. My first foray into machine learning was developing unsupervised learning methods to look at human genome data. What we were looking at at that point in time wasn't considered that important: mRNA targets. Eventually, the code we wrote was productionized by the likes of Moderna and BioNTech in how they search for mRNA targets for their vaccines.

I have been in the startup and corporate worlds. I was previously the Chief Digital Officer of Singapore Post, then the head of Airbus Aerial, the drone and satellite business for Airbus in Asia Pacific, and at Amazon Web Services I was the head of artificial intelligence and machine learning. Now I'm running my own company, Dodge AI. I also teach at NUS, covering the business aspects and helping a lot of Singapore government officials and the private sector think about AI. Recently I also started going to Hong Kong, where I spoke with senior real estate executives, and most likely I'll be heading to Dubai and the Philippines as well. So that's a little bit about me.

Ayesha Khanna: But Bernard, don't forget, you're advising all these startups also on AI and how AI is even entering the crypto space. Can you talk a little bit about that? Because I think that most people don't even know that that is another area of intersection.

Bernard Leong: I was very fortunate last year to work with a team in Malaysia called Virtuals Protocol. I think the whole world knows about them now; the team has been sending a lot of updates about how they're now in the US. There are a couple of AI-crypto crossover projects where they built AI agents, and the team, led by Jensen, is one of the fun teams to work with. They have built AI agents and worked out how to monetize the social side of decentralized social in the crypto space. The easiest way to think about it: they created a very nice anime agent on TikTok and used AI agents to grow the TikTok avatar very quickly; they even got a holograph of her. If you want to check them out, it's Virtuals Protocol. But I think for today we should talk about one big topic that has been on everyone's mind.

Ayesha Khanna: Let's do it.

Bernard Leong: I think the best way to start is to talk a little bit about the history. And for our audience, don't worry, we are pretty much up to date, because we are conveniently recording on the first day of March, and just last week they launched a whole series of innovations that we're going to talk about later. Maybe, as a story, you want to walk through the chronology of how DeepSeek started and how it eventually got caught on everyone's radar, including those building foundation models in the US? I do know of them because I'm also part of the discussion forums about all the AI tech going on between the US, China, and everywhere else.

Ayesha Khanna: So Bernard, as we know, DeepSeek has been around for over a year. It was founded in China in 2023, and the founder, Liang Wenfeng, is a serial entrepreneur. He is a hedge fund manager who runs High-Flyer, a quantitative hedge fund, and DeepSeek was really a side project: they were just exploring how to leverage AI for innovation. Then they started releasing products, as far back as November 2023, open source models tailored for coding tasks. In early 2024 they released their first large language model, but they were still very much under the radar. It was only among people such as you and me, who were paying attention to AI models, that they started coming up in chat groups.

In May we saw version two of DeepSeek, a really significant upgrade with strong performance at a low cost. This whole low-cost thing is going to be very big. It gained international notice for its efficiency and open source approach. Then they continued to release version after version until December 2024, I think, when they released a research paper claiming that the final training run of their version three model cost less than $6 million, using about 2,000 Nvidia H800 chips. That took everyone by surprise, because in the US they've been building these models and asking investors for billions of dollars. Then of course, Bernard, came January and DeepSeek R1. Do you want to talk a little bit about the shock the system got then?

Bernard Leong: I think the shock was that when they released DeepSeek R1, an advanced reasoning model, it basically rivaled OpenAI's o1 and GPT-4o, and it's free. It became the most downloaded app on the Apple App Store and overtook ChatGPT. As of last week it had slipped to number 11, and I think there are some throttling issues as well. One of the things that came up was the company's claim that the R1 model was developed in less than two months for that $6 million figure, which we are going to contest later, using far fewer Nvidia chips than the US labs. With that launch, it tanked Nvidia's stock by $560 billion, over half a trillion dollars, within a week. The market felt a big shockwave.

I think one thing people don't realize is that when you use any of the AI chatbots or foundation models, the compute behind them is still in high demand and short supply. What DeepSeek has shown is that we can make this more efficient, which means there's going to be even more demand. This is what Satya Nadella, the CEO of Microsoft, called the Jevons paradox. The Jevons paradox comes from the 19th century, when the economist William Stanley Jevons observed that as steam engines became more efficient in their use of coal, people ended up consuming more coal, not less.

If you flip this to today's AI world, what we are actually saying is: yes, DeepSeek brought a lot of efficiencies, reduced the amount of compute time, reduced everything, but the demand for compute is still increasing. So people are going to buy more Nvidia chips. That's why I am not so worried. Nvidia just announced its earnings, I think two days back, and their sales are okay. So you can actually see the Jevons paradox at work in AI.

Ayesha Khanna: What I think was really interesting was not only the cost, and I know we're going to talk about that a little more, but also the speed at which they did it: going from essentially stage zero to where they are now in 18 months. That is really challenging the tech giants and reshaping expectations of speed. We had already started to see this acceleration in new models coming out from Anthropic, from Google, from OpenAI, but suddenly, from the other side of the world, there is an emergent player moving at a speed that's putting a whole new kind of pressure on them.

Bernard Leong: Yes, and do not underestimate Elon Musk's xAI either. Grok is really impressive. This is really interesting: just last week Claude 3.7 was released, and I'm getting three or four different people telling me, you know what, I think Grok is really good, specifically on the Think and Deep Research features, and Claude 3.7 too. I think this has put all the frontier model builders in the US on high alert as well. Then yesterday, 28th Feb, OpenAI announced GPT-4.5, to be released more widely next week.

Ayesha Khanna: Actually I'm using it. It's already available.

Bernard Leong: You're using a pro version, right?

Ayesha Khanna: I am, it's available to Pro users. It is good; I actually like their deep research a lot too. I think the other interesting thing, Bernard, is if you compare Grok and DeepSeek. Grok was trained on a cluster of 200,000 Nvidia chips, and it's fast and it's good. Then you look at DeepSeek: they're fast and they're good, but they don't have access to those chips, so they had to be really innovative mathematically in how they optimized. One approach is, we're just going to throw money and chips at the problem, and of course the talent; the other is, we don't have the money, we don't have the chips, so we just have to be very innovative. That is what the world didn't expect, and that's what's interesting about DeepSeek.

Bernard Leong: Maybe let me put this into context: what does it mean for business? All the cloud providers now allow enterprises to deploy DeepSeek. Does that mean DeepSeek is really making money? The answer is no, because DeepSeek has open sourced its R1 model. You can literally use something like Ollama and just download it. If you have more than a terabyte of memory, you can host the full DeepSeek model on your MacBook Pro or an equivalent Windows desktop for free.
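
To make the local-hosting point concrete, here is a minimal sketch, not from the episode, of querying a distilled DeepSeek R1 model served by Ollama on your own machine. It assumes Ollama is running locally and that you have already pulled a model (for example with `ollama pull deepseek-r1:7b`); the model tag and prompt are illustrative.

```python
# A minimal sketch of querying a locally hosted, distilled DeepSeek R1 model
# through Ollama's REST API. Assumes Ollama is running on its default port
# and the model tag below has been pulled; both are illustrative choices.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",   # a distilled variant small enough for a laptop
        "prompt": "Explain the Jevons paradox in two sentences.",
        "stream": False,             # return one JSON object instead of a stream
    },
    timeout=300,
)
print(resp.json()["response"])
```

Note that the full 671B-parameter model needs far more memory than most laptops have; the distilled variants are what most people actually run locally.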

Now, what the cloud providers have done is let you deploy DeepSeek on the cloud and run it, and it's much cheaper. To give some comparison: for a reasoning task on OpenAI, as of three weeks back, it's about $60 per million tokens, and DeepSeek is charging $2.19. That is roughly one-thirtieth of the cost for the same thing. And if I host it myself, I don't even pay that $2.19; I'm getting everything for free. This is where the cloud providers, what we call the hyperscalers, are now figuring out how to even monetize DeepSeek. I'm hearing from some parts of the world that maybe they're going to have a serverless version, and it's going to be even cheaper.
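
As a sanity check on those numbers, a few lines of arithmetic show the gap; the prices are the ones quoted in the conversation, not current list prices.

```python
# Back-of-the-envelope check of the per-token price gap quoted above.
# Both figures are the ones mentioned in the episode (USD per million
# output tokens) and are illustrative, not current list prices.
openai_price = 60.00    # $/1M tokens for OpenAI's reasoning model, as quoted
deepseek_price = 2.19   # $/1M tokens for DeepSeek R1, as quoted

ratio = openai_price / deepseek_price
print(f"DeepSeek is roughly 1/{ratio:.0f} of the cost")  # ~1/27

# Cost of a workload of 50 million output tokens at each price:
tokens_millions = 50
print(f"OpenAI:   ${openai_price * tokens_millions:,.2f}")    # $3,000.00
print(f"DeepSeek: ${deepseek_price * tokens_millions:,.2f}")  # $109.50
```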

Ayesha Khanna: So let's talk about that. Let's talk about the cost. But first, let's delve a little into how it is technically different. What did they do that was different from the existing models? Because they had a small team, all under 35, from this university in China that everybody's now talking about. But you knew about it way before, right, Bernard?

Bernard Leong: Yeah, because in my days working for Airbus Aerial, I went to China a lot to look at the drone teams there. In China there are what we call the big four universities, among them Peking University in Beijing and Fudan University in Shanghai. But where these PhDs actually come from is Zhejiang University. Zhejiang is also known as the province where many of China's best entrepreneurs come from; one that we know is Jack Ma of Alibaba. In the past few years, Zhejiang University has been shaping up to be one of the largest producers of AI PhDs.

I think one of the things, according to DeepSeek's recruitment pitch, was that if you join High-Flyer, you'll get to work on the best projects. That's how they brought in the best AI talent. Just for comparison: if we talk to our counterparts in Silicon Valley, which you and I know, an AI engineer today, probably with a PhD like ours, would cost about a million dollars a year. I have some old friends who work for DeepMind in the UK as well; that's the cost of an AI engineer. Now, at DeepSeek, guess how much they are paid in China, in US dollars?

Ayesha Khanna: 250,000, 300,000.

Bernard Leong: Actually, I think they're probably paid much more now, but it's about one-fifth of the cost, around 200K USD. So you can imagine: so much young talent, and the pace at which they come up with ideas. Some of the ideas in the technology architecture are not new; they have been known for a while. But a lot of the time, when these conversations happen on forums, what tends to happen is that someone from Europe or Asia, including China, will say something like, maybe we should try this method and take it to scale, and somebody from the US will just come in and say, let's just throw money at the problem, you don't really need these methods. And DeepSeek showed that these methods are what made it work.

I'm just going to highlight one thing. I think the most well-known thing they did was distillation of larger models. You take something like GPT-4o, and what they do is push queries at it, I think about 800,000 queries, to look at how the reasoning works, and then distill those large models down to smaller models. They didn't just do it with OpenAI's models; they also did it with Alibaba's model, distilling a pretty significant amount of the intelligence from these larger models to train the DeepSeek R1 model. That's also a reason why it is so effective. Then of course there's the thing where, when some people prompt engineer DeepSeek, it has an identity crisis: sometimes it calls itself DeepSeek and sometimes it calls itself ChatGPT.
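
For readers unfamiliar with the pattern being described, here is a generic, hypothetical sketch of black-box distillation: collect a stronger "teacher" model's answers and reuse them as supervised training data for a smaller "student". The endpoint, model name, and prompts below are placeholders; DeepSeek's actual pipeline is not public.

```python
# A generic sketch of black-box distillation: query a teacher model, keep
# its answers, and write out (prompt, answer) pairs as fine-tuning data for
# a smaller student model. TEACHER_URL and "teacher-model" are hypothetical
# placeholders, not a real API.
import json
import requests

TEACHER_URL = "https://api.example.com/v1/chat"   # hypothetical endpoint
prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a Python function that reverses a linked list.",
]

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        reply = requests.post(
            TEACHER_URL,
            json={"model": "teacher-model", "prompt": prompt},
            timeout=120,
        ).json()["text"]
        # Each (prompt, teacher answer) pair becomes one supervised
        # training example for the student model.
        f.write(json.dumps({"prompt": prompt, "completion": reply}) + "\n")
```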

What are some of the other innovations you are excited about? I know we had a private conversation about mixture of experts.

Ayesha Khanna: So that's the one I love, actually. Mistral was using it as well. The mixture of experts model is essentially this: instead of using the whole brain, the whole neural network, you divide it into sub-networks and route each input to specific sub-experts. So you use a lot less computation, because you're activating far fewer parameters to process each token that comes in. There's a gating model, so at the gate you decide which of the sub-networks you're going to use.

There are further mechanisms to make sure the routing isn't noisy and you're not missing out on the important sub-networks. But essentially the bottom line is this: instead of processing all 671 billion parameters for each token of each query that comes in, the mixture of experts model activates only 37 billion. So you can imagine that's a huge reduction in overhead and computation, and that is what everybody's talking about. It's not new. But by combining the optimized distillation, this mixture of experts model, the reinforcement learning, and the multi-head latent attention mechanism, that's how they really pulled it off. I'd love for you to go through those as well.
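
A toy sketch of the routing idea just described, with made-up sizes: a gating network scores all the experts for each token, but only the top-k actually run, and their outputs are combined using the gate weights.

```python
# A toy mixture-of-experts forward pass: gate scores pick the top-k experts
# per token, and only those experts are computed. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# Each "expert" here is just a random linear map standing in for a
# feed-forward sub-network.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (d_model,) hidden state for one token."""
    logits = x @ gate_w                      # one gate score per expert
    chosen = np.argsort(logits)[-top_k:]     # route to the top-k experts only
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only k of the n experts run: that is the saving in activated
    # parameters (37B of 671B in DeepSeek V3's case).
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)   # (64,)
```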

Bernard Leong: So maybe I'll just highlight, you are totally spot on; I think there are two things to add. First, let me talk about the reinforcement learning part. You know what, I'm actually teaching a technical course next week and I have to rewrite my entire set of notes. Just to help the audience: most of the foundation models are trained using a method called reinforcement learning from human feedback, and now with AI feedback as well. The usual way to steer the model toward the correct answer to a query is a method called proximal policy optimization, PPO: you positively reward the large language model for the preferred response. Ask what a dog is: "a furry animal" is acceptable, but you score it higher if it says "man's best friend", for example.

So what the DeepSeek team has done is come up with a much more generalized version of the PPO method; they worked that equation out in what's called group relative policy optimization, GRPO. This is really one of the big things they have done. Because they now have a much more general formula, they can do these approximations at scale; most of the time, people were using an older, quick-and-dirty approximation for the reinforcement learning. The team came up with a generalized formula, it's in the R1 paper, and it's a great read.
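
The core of GRPO, as described in DeepSeek's papers, can be sketched in a few lines: sample a group of answers to the same prompt, score them, and use each answer's reward relative to the group (normalized by the group's spread) as its advantage, with no separate value model as PPO requires. The rewards below are invented for illustration.

```python
# Group-relative advantages, the central idea of GRPO: each sampled answer
# is judged against the other answers in its own group, so no learned
# value model is needed. Rewards here are made-up illustration values.
import numpy as np

rewards = np.array([0.1, 0.9, 0.4, 0.9, 0.2])   # scores for 5 sampled answers

advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(advantages)
# Positive entries (the 0.9s) get reinforced; negative ones get discouraged.
# The policy gradient then weights each answer's token log-probabilities by
# its advantage, with PPO-style clipping of the ratio to the old policy.
```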

One more thing I wanted to add: in the transformer model they use what is called multi-head latent attention. What typically happens in attention is that the model tries to find the relationships among all the words. They did this smart thing that reduces the key-value cache memory to somewhere between 5% and 13% of previous models; they were able to cut the cached state per query by about 93.3% versus standard attention. This decreases the amount of hardware needed per query and decreases the cost. Like you said, it's not one thing: it's the four or five things you have to put together to get to that model. Which brings us to the interesting part: do you believe it actually cost $6 million to train the model?
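
Before the cost debate, a toy sketch of the latent-attention memory trick just described: cache one small latent vector per token and expand it into per-head keys and values only when attention runs. The dimensions below are illustrative, not DeepSeek's actual configuration.

```python
# A toy sketch of the memory idea behind multi-head latent attention (MLA):
# cache a small per-token latent instead of full per-head keys and values,
# and reconstruct K and V from it at attention time. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent = 1024, 16, 64, 128

W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand to K
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand to V

h = rng.standard_normal(d_model)    # one token's hidden state
latent = h @ W_down                 # this 128-dim vector is all we cache

# Standard attention would cache K and V per head: 2 * 16 * 64 = 2048 floats
# per token. Caching only the 128-dim latent is ~6% of that, in the 5-13%
# range mentioned above.
k = (latent @ W_up_k).reshape(n_heads, d_head)   # rebuilt at use time
v = (latent @ W_up_v).reshape(n_heads, d_head)
print(latent.shape, k.shape, v.shape)  # (128,) (16, 64) (16, 64)
```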

Ayesha Khanna: Bernard, we've talked about this a lot. We've gone back and forth and had our debates, and I think we really don't know. It's still significantly less, but they built it over many iterations. They had to hire the talent. They had the Nvidia chips that they had to pay for. I think it's only the last training run that cost about 6 million, maybe. You were mentioning maybe it's lost in translation, and the Western media didn't understand that. But what do you think? What was the actual cost?

Bernard Leong: I've had a couple of conversations with different people, including people who work at Nvidia. As the former head of AI/ML for Amazon, I actually got to sell GPU capacity, so we did a kind of back-of-the-napkin calculation to see how much it is really worth. Just to put everybody in context: they reportedly bought $1.5 billion worth of Nvidia chips three years ago, and the talent, those roughly 50 engineers, probably costs about 250K per engineer in annual salary.

If you were to cost out the way they trained the model, even just the distillation, because it required them to push queries to OpenAI's o1 to distill that information, 800,000 queries at roughly that $60-per-million-token rate, depending on the query volume, you're probably already looking at $10 to $50 million. So all in, from how we estimated it, we think the recent models were probably trained for between $35 and $70 million US. The reason is that even if the final training run itself was $6 million, everything else, all the other foundations they built to get to this DeepSeek model, probably cost much more than that number. I mean, the $6 million is a narrative, right?

Ayesha Khanna: It's still so much cheaper though than what the original ChatGPT cost OpenAI, which was a billion dollars I believe.

Bernard Leong: I mean, every other podcast talks about the impact on the US, and we see it in dollar signs because of the stock market hit to their companies. So I would be very curious: what do you think is the global impact of DeepSeek on the rest of the world? Think about Europe, or Africa, where many don't even have the energy to run the data centers for these AI models. What are your thoughts on that at the moment?

Ayesha Khanna: Bernard, just yesterday I was speaking with somebody from Switzerland, and I've been speaking to people in India, Indonesia, and Latin America, and they are very excited about this, because they feel it democratizes AI. I think the big bottom line is that generative AI is getting cheaper. It was already getting cheaper, by the way; now DeepSeek has shown that it can get cheaper by orders of magnitude. For the companies I advise, for example, it doesn't make any sense to sit on the fence anymore, because it's not a luxury. If it's that cheap, everyone's going to be using it.

So if that happens, not only will we see companies rapidly adopting it, but we'll also see a lot of interesting innovations coming out, exactly as you said, out of Africa, out of India, out of Bangladesh, out of Vietnam. One of the things that you love to talk about, and that I want to emphasize here, is that it's open source. But people really don't understand the significance of that. Bernard, just break it down for us a little bit.

Bernard Leong: Okay. So to help everyone: models like OpenAI's ChatGPT and Anthropic's Claude are closed, meaning the companies don't divulge three things: the source code, meaning the AI algorithms; the data, meaning the training data being used; and the weights. The weights are the numbers in a machine learning model that get tuned so that it can do the task you intend it to do.

Now, when we talk about open source: most of the current open source foundation models use the MIT license, which is very common, and DeepSeek uses it too. So let's use a comparison. I think the best comparison for DeepSeek is Meta's Llama model, right? Meta gives two things: the model code, and the weights, though under Meta's own license and you have to request access. Don't worry, do a Google search, you'll find them all over the internet. And the third thing is the data: they didn't talk about what data they used.

I think DeepSeek also did two out of three, but more openly. They give you the source code, and this week they released a whole lot more of it, showing how they improved the architecture to train things faster. They reveal the weights. But they didn't reveal the data. It depends on who you believe, but there is a lot of discussion about whether scraped data got in. Take books, for example: there's a pretty well-known Russian repository that lets you get most books in ePub format.

I, for one, respect intellectual property, so I buy all my books through Kindle and Apple Books. But there is this place where you can download them, and some suspect these models have been trained on some of that data as well. So for now, no one seems to be able to tell you where the data came from. But for everyone's sake, I'll let you know there is a clean option. My students always ask me this question: where do you get the data to train these models? And I tell them: do a search for the Allen Institute for AI. They have curated a whole trove of data that won't violate copyright law. Let me be very clear: if you take the Allen Institute's data and train on it, you won't get sued by anyone, because it is the best open data source on the market. It includes things like Wikipedia and other public, openly licensed sources.

So when we talk about open source, we have to be very clear about who is releasing what. I'm sure some of the closed labs will admit they've been on the wrong side of history here, so you might see them release one of these three things. Let me repeat: source code, weights, and the training data. I suspect no one is going to talk about the training data.
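
Because the weights are openly published, anyone can load one of the distilled checkpoints directly. A minimal sketch with Hugging Face transformers, assuming the library is installed and you have the memory, or a GPU, for the model; check the model id's current availability on the Hub.

```python
# Loading one of DeepSeek's openly published distilled R1 checkpoints via
# Hugging Face transformers. The model id is one of the distilled variants
# DeepSeek released; availability and naming may change over time.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is 17 * 24? Think step by step.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```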

Ayesha Khanna: I agree with you, Bernard. That's very helpful, by the way, because all these companies claim, like Llama, that they're open source, but only the people who know the details know whether that's actually true. And the fact of the matter is, this issue keeps coming up. Yesterday I was with the CEO of a very large marketing agency, and they said their clients, because the agency creates a lot of the imagery and content for marketing, are asking: what about IP rights? What are you using as your foundation model? They're afraid of it. And we know that all of the current foundation models have used data they did not have the authority to use. That is one of the unfortunate things, because people have worked hard to write books, create movies, create short videos on YouTube, and a lot of that has been used to train these generative AI models.

Bernard Leong: I think this copyright conversation is going to come up more and more, and I don't think we're going to solve it today. We'll probably have multiple episodes where we keep talking about it, because there was actually a recent lawsuit over training data that Thomson Reuters won, but it was a very narrow case.

So there is still a lot to be decided. I would also say we need to rethink what copyright means in a 21st-century context. One argument I would give, for example: on the internet, if you make a TikTok video, someone can remix that video, spin it off as theirs, and thereby proliferate your viral video. So if you were to take someone's text, rewrite it, and do something with it, does that constitute remixing? Of course, people have different points of view on that, but it's one of those questions that is going to be much more difficult for people developing AI models, specifically in image and video, going forward.

Ayesha Khanna: Hundred percent, Bernard. I think that we need to have a conversation with some experts in this field on the podcast for sure.

Bernard Leong: So let's come back to DeepSeek's news this week.

Ayesha Khanna: Yeah. And the competitors don't forget.

Bernard Leong: And the competitors; they have done quite interesting stuff, so we'll talk about them too. This week, DeepSeek released five open source repositories; they called it their open source week. They want to promote transparency and collaborative AI development. A couple of things came up: they actually shared their production stack, that is, how they make things move faster.

So over five days they did a few things. They wanted to show how DeepSeek V3 was trained for $6 million versus $400 million, and they shared some pretty striking metrics. For example, on the first day they released FlashMLA, a decoding kernel for the multi-head latent attention that runs on Nvidia Hopper GPUs; they still need Nvidia chips to make this work. It achieves up to 3,000 gigabytes per second of memory bandwidth and 580 teraflops when compute-bound, ahead of what is currently known in the market. They basically doubled the inference speed for real AI tasks, so it makes your chatbot run very fast. Have you noticed that whenever you run a DeepSeek model in reasoning mode, it moves very quickly?

Ayesha Khanna: Very fast. So does Grok. Very fast.

Bernard Leong: The other one I think we should talk about, and you're very familiar with the mixture of experts model: on the second day they delivered this thing called DeepEP. It's a communication library that achieves about 153 gigabytes per second intranode and 43 gigabytes per second internode, with low-latency options between 162 and 318 microseconds. What it does is enable very fast mixture-of-experts scaling; this is what serves that mixture of experts with only 37 billion activated parameters.

Another way to think about it is with a knowledge tree. Imagine all the world's information in a knowledge tree learned by the AI model. Where would you look for the most useful information within that tree? The tree has leaves and branches, but you're probably looking at the roots. So you can think of the 671 billion parameters as spanning the entire tree, while the most useful information for a given query sits in one part of it. The 37 billion activated parameters are a kind of pruning; it's probably an 80/20 rule they have applied. This is one of the key things they have done.

Then over the other three days they released what they call DeepGEMM, DualPipe, and 3FS. These are tools they use to speed up the training and inference of AI workloads. 3FS, for example, is a file system that improves on traditional systems by around 10x for feeding inference. So it really makes a lot of difference to what DeepSeek has done technologically. As we always say, constraint breeds innovation. Do you think this is the perfect story for that?

Ayesha Khanna: It's the perfect story, and I think it's the perfect story for understanding inference too, because people don't talk about it enough. We constantly look at the cost of training, but then there's inference. It's like this: we went to grad school and learned all this stuff, and then we encounter all kinds of new problems we were not taught in our law degree or business degree. Just like that, an AI has all this knowledge of the world it's been trained on, and when you ask it something, it has to infer the best answer for you based on that prior knowledge.

So inference is a very big deal. And speed: what you experience in the moment you're interacting with the AI is inference. That, I think, is incredible. The other thing I find interesting is that DeepSeek has set such a benchmark, not only for Silicon Valley and Europe but for everybody in China too. Whether it's Alibaba or Tencent, they're now benchmarking themselves against DeepSeek. So, did you hear Tencent released their model today?

Bernard Leong: Yeah, and a couple of weeks ago too, all based on what DeepSeek has done, right? So it's not like everybody's standing still. I'm pretty sure the frontier labs in the US are now copying what DeepSeek is doing. I remember someone asking me who was most affected among the tech companies when DeepSeek happened: was it the stock, or something else? I told them the stocks will be fine, even though we took a big dip this week. It's actually Meta's employees. Management is probably asking: why didn't you think of using these techniques to improve efficiency? Is it worth paying you $1 million when I could pay these folks 250K and bring them over to the US?

Ayesha Khanna: I was just thinking, look at Alibaba's model, they call it Qwen, right? The Qwen model was released on January 29th. It's also using a large-scale mixture of experts, as you said. So now they're trying to do the same thing, and they're saying it outperforms DeepSeek. There is some skepticism about whether that's true, but they did it. And of course, Alibaba already has such a large enterprise base. The moment a model comes out is interesting, but it really hits the consumer when they embed the new model in products, whether cars, chatbots, robots, drones. You know this area really well.

Then what I thought was interesting is that they also say they're open source, but only partially, exactly as you were saying, Bernard. They don't really share their weights, and they don't really give details of the architecture. They're, as you said, building on the MoE, the mixture of experts model.

Bernard Leong: And I think this is going to be a continuing talking point, right? On the publication side, there are going to be more and more new ideas about how to improve AI. The transformer is not the be-all and end-all of AI; there are other models. And reasoning: how does it work? How do you constrain things like hallucination?

I follow Anthropic a lot. I think what they do in AI safety is amazing. I don't know whether you read their recent research, where they ran a thought experiment: the management has gone evil and keeps telling Claude to do things that are totally out of line, and slowly, for some reason, Claude starts saying yes to them while doing something else in the background, because of the way it was trained on the constitutional AI they invented.

So I think this is something a lot of people don't talk about, right? There's still a lot of value in tracking not just the innovations in speed but how AI safety is going and how people are thinking of using AI. All of this also puts a lot of pressure on the application side, the day-to-day stuff that you and I use. Maybe let me ask you: how many tools built on these foundation models do you use every day? Do you use a lot of ChatGPT?

Ayesha Khanna: I use it. I use all of them: Claude, Gemini, ChatGPT, obviously DeepSeek, and of course Grok now. So I have five or six assistants, and I feel there's a competition between them. And Perplexity too; oh my God, there are so many now, I can't keep track. It's like having a tiny team of AI research assistants. What I find interesting is that I'm getting spoiled: if an AI chatbot takes just a millisecond longer, I get annoyed. That's how quickly consumers get used to it, and that's really what the competition is about.

So look at Tencent: they came out today and said theirs is faster. They're using this hybrid Transformer-Mamba model, and the Mamba model came out, I remember, a couple of years ago; there was a paper about it, from Princeton, I think, or somewhere in the US. They published it, and now Tencent is using it. So a lot of this research, and you did your PhD, you know this, is already out there; it's about who picks it up and what they do with it. That is really where the race is, because you can't hide the kind of research that takes place in academia, which is a good thing in my opinion.

Bernard Leong: I think one thing people don't realize is that AI went through several winters of development, and for the first time the field is not being led by academia; it's led by companies. One insight I thought was funny: DeepSeek grew out of a hedge fund, and xAI has that same quantitative DNA. Maybe, for the rest of the world, you need to start off as a hedge fund. And I can understand why. In hedge funds, and you worked on Wall Street, so you know this better than me, for every trade they make, whether high-frequency or normal trading, they work the compute time down to the unit economics. I talked to Wall Street clients in my previous role, and they are really very efficient about their compute. So it's not surprising to me, when I hear who actually figured this out, that they could take the efficiency they learned in hedge funds and put it into an AI model. Kudos to the Chinese team for figuring this out; I'll give them that.

Ayesha Khanna: I totally agree. And I'm actually quite curious about the Tencent model that just came out as well, because it's about efficiency, but it's also about speed. And that's very Wall Street, especially in algorithmic trading: it's all about getting to the market and executing the trade first. So I think there's a lot to the story now, Bernard; it's really about the architecture and changing it. I'd love for us to do a session on that, and maybe interview somebody from China or Europe or Silicon Valley to talk about how it is actually changing, and what that does to cost, efficiency, and speed. Because that's what businesses are interested in, that's what governments are interested in, and that's what democratizes it and makes it pervasive.

Bernard Leong: I think this is also a good time for us to take a pause in the conversation, and we'll carry on in the next episode.

Ayesha Khanna: Yes, absolutely. I can't wait.

Bernard Leong: Thanks everyone. See you next time.
