Apr 12, 2021
NVIDIA GTC 2021 Keynote Event with CEO Jensen Huang Transcript April 12
NVIDIA CEO Jensen Huang hosted the GTC 2021 keynote event on April 12, 2021. Read the full transcript of the event speech here.
Jensen Huang: (15:57)
There are powerful forces shaping the world’s industries. Accelerated computing that we’ve pioneered has supercharged scientific discovery while providing the computer industry a path forward. Artificial intelligence, particularly, has seen incredible advances. With NVIDIA GPUs, computers learn and software writes software no human can. The AI software is delivered as a service from the cloud, performing automation at the speed of light. Software is now composed of microservices that scale across the entire data center, treating the data center as a single unit of computing. AI and 5G are the ingredients to kickstart the fourth industrial revolution where automation and robotics can be deployed to far edges of the world. There is one more miracle we need, the Metaverse. A virtual world that is a digital twin of ours. Welcome to GTC 2021. We’re going to talk about these dynamics and more.
Jensen Huang: (17:09)
Let me give you the architecture of my talk. It’s organized in four stacks. This is how we work. As a full-stack computing platform company, the flow also reflects the waves of AI and how we’re expanding the reach of our platform to solve new problems and to enter new markets. First is Omniverse, built from the ground up on NVIDIA’s body of work. It is a platform to create and simulate virtual worlds. We’ll feature many applications of Omniverse like design collaboration, simulation, and future robotic factories. The second stack is DGX and high-performance data centers, featuring BlueField, new DGXs, new chips, and the new work we’re doing in AI, drug discovery, and quantum computing. Here we’ll also talk about Arm and new Arm partnerships. The third stack is one of our most important new platforms, NVIDIA EGX with Aerial 5G. Now, enterprises and industries can do AI and deploy AI on 5G. We’ll talk about NVIDIA AI and pre-trained models like Jarvis, conversational AI. And finally, our work with the auto industry to revolutionize the future of transportation, NVIDIA Drive. We’ll talk about new chips, new platforms and software, and lots of new customers.
Jensen Huang: (18:33)
Let’s get started. Scientists, researchers, developers, and creators are using NVIDIA to do amazing things. Your work gets global reach with the installed base of over a billion CUDA GPUs shipped and 250 ExaFLOPS of GPU computing power in the cloud. Two and a half million developers and 7,500 startups are creating thousands of applications for accelerated computing. We’re thrilled by the growth of the ecosystem we’re building together, and we’ll continue to put our heart and soul into advancing it. Building tools for the da Vincis of our time is our purpose. And in doing so, we also help create the future. Democratizing high-performance computers is one of NVIDIA’s greatest contributions to science. With just a GeForce, every student can have a supercomputer. This is how Alex Krizhevsky, Ilya, and Hinton trained AlexNet, which caught the world’s attention on deep learning. And with GPUs in supercomputers, we gave scientists a time machine. A scientist once told me that because of NVIDIA’s work, he could do his life’s work in his lifetime. I can’t think of a greater purpose.
Jensen Huang: (19:44)
Let me highlight a few achievements from last year. NVIDIA is continually optimizing the full stack. With the chips you have, your software runs faster every year and even faster if you upgrade. On our golden suite of important science codes, we increased performance 13-fold in the last five years. And for some, performance doubled every year. The NAMD molecular dynamics simulator, for example, was rearchitected and can now run across multiple GPUs. Researchers led by Dr. Rommie Amaro at UC San Diego used this multi-GPU NAMD, running on the Oak Ridge Summit supercomputer’s 20,000 NVIDIA GPUs, to do the largest atomic simulation ever, 305 million atoms. This work was critical toward better understanding of the COVID-19 virus and accelerated the making of the vaccine. Dr. Amaro and her collaborators won the Gordon Bell Award for this important work. I’m very proud to welcome Dr. Amaro and more than a hundred thousand of you to this year’s GTC, our largest ever by double.
Jensen Huang: (20:48)
We have some of the greatest computer scientists and researchers of our time speaking here: 3 Turing Award winners, 12 Gordon Bell Award winners, 9 Kaggle Grandmasters, and even 10 Oscar winners. We’re also delighted to have the brightest minds from industry sharing their discoveries, leaders from every field: healthcare, auto, finance, retail, energy, internet services, every major enterprise IT company. They’re bringing you their latest work in COVID research, data science, cybersecurity, new approaches to computer graphics, and the most recent advances in AI and robotics. In total, 1,600 talks about the most important technologies of our time from the leaders in the field that are shaping our world. Welcome to GTC.
Jensen Huang: (21:34)
Let’s start where NVIDIA started, computer graphics. Computer graphics is the driving force of our technology. Hundreds of millions of gamers and creators each year seek out the best NVIDIA has to offer. At its core, computer graphics is about simulations, using mathematics and computer science to simulate the interactions of light and material, the physics of objects, particles and waves, and now, simulating intelligence and animation. The science, engineering, and artistry that we dedicate in pursuit of achieving mother nature’s physics has led to incredible advances and allowed our technology to contribute to advancing the basic sciences, the arts, and the industries.
Jensen Huang: (22:15)
This last year, we introduced the second generation of RTX, a new rendering approach that fuses rasterization and programmable shading with hardware-accelerated ray tracing and artificial intelligence. This is the culmination of 10 years of research. RTX has reset computer graphics, giving developers a powerful new tool just as rasterization plateaus. Let me show you some amazing footage from games in development. The technology and artistry is amazing. We’re giving the world’s billion gamers an incredible reason to upgrade.
Jensen Huang: (24:39)
RTX is a reset of computer graphics. It has enabled us to build Omniverse, a platform for connecting 3D worlds into a shared virtual world, one not unlike the science-fiction Metaverse first described by Neal Stephenson in his early 1990s novel, Snow Crash, where the Metaverse would be collectives of shared 3D spaces and virtually enhanced physical spaces that are extensions of the internet. Pieces of the early Metaverse vision are already here, massive online social games like Fortnite or user-created virtual worlds like Minecraft.
Jensen Huang: (25:15)
Let me tell you about Omniverse from the perspective of two applications, design collaboration and digital twins. There are several major parts of the platform. First, the Omniverse Nucleus, a database engine that connects users and enables the interchange of 3D assets and scene descriptions. Once connected, designers doing modeling, layout, shading, animation, lighting, special effects or rendering can collaborate to create a scene. The Omniverse Nucleus is described with the open standard USD, Universal Scene Description, a fabulous interchange framework invented by Pixar. Multiple users can connect to Nucleus, transmitting and receiving changes to their world as USD snippets. The second part of Omniverse is the composition, rendering and animation engine, the simulation of the virtual world. Omniverse is a platform built from the ground up to be physically based. It is fully path traced, physics is simulated with NVIDIA PhysX, materials are simulated with NVIDIA MDL, and Omniverse is fully integrated with NVIDIA AI. Omniverse is cloud native, multi-GPU scalable, runs on any RTX platform, and streams remotely to any device. The third part is NVIDIA CloudXR, a stargate, if you will. You can teleport into Omniverse with VR, and AIs can teleport out of Omniverse with AR. Omniverse was released in open beta in December. Let me show you what talented creators are doing.
Jensen Huang: (27:16)
Creators are doing amazing things with Omniverse. At Foster and Partners, designers in 17 locations around the world are designing buildings together in their Omniverse shared virtual space. ILM is testing Omniverse to bring together internal and external tool pipelines for multiple studios. Omniverse lets them collaborate, render final shots in real time, and create massive virtual sets like holodecks. Ericsson is using Omniverse to do real-time 5G wave propagation simulation with many multipath interferences. Twin Earth is creating a digital twin of Earth that will run on 20,000 NVIDIA GPUs. And Activision is using Omniverse to organize their more than a hundred thousand 3D assets into a shared and searchable world.
Jensen Huang: (28:02)
Bentley is the world’s leading infrastructure engineering software company. Everything that’s constructed, roads and bridges, rail and transit systems, airports and seaports, is about 3% of the world’s GDP, or three and a half trillion dollars a year. Bentley’s software is used to design, model and simulate the largest infrastructure projects in the world. 90% of the world’s top 250 engineering firms use Bentley. They have a new platform called iTwin, an exciting strategy to use the 3D model after construction to monitor and optimize the performance throughout its life.
Jensen Huang: (28:50)
We’re super excited to partner with Bentley to create infrastructure digital twins in Omniverse. Bentley is the first third-party company to be developing a suite of applications on the Omniverse platform. This is just an awesome use of Omniverse, a great example of digital twins, and Bentley is the perfect partner. Here’s Perry Nightingale from WPP, the largest ad agency in the world, to tell you what they’re doing.
Perry Nightingale: (29:18)
WPP is the largest marketing services organization on the planet, and because of that, we’re also one of the largest production companies in the world. That is a major carbon hotspot for us. We’ve partnered with NVIDIA to capture locations virtually and bring them to life in studios with Omniverse. Over 10 billion points are turned into a giant mesh in Omniverse. For the first time, we can shoot locations virtually that are as real as the actual places themselves. Omniverse also changes the way we make work. It is a collaborative platform, which means multiple artists, at multiple points in the pipeline, in multiple parts of the world, can collaborate on a single scene. Real-time CGI, sustainable studios, and collaboration with Omniverse are the future of film at WPP.
Jensen Huang: (30:07)
One of the most important features of Omniverse is that it obeys the laws of physics. Omniverse can simulate particles, fluids, materials, springs, and cables. This is a fundamental capability for robotics. Once trained, the AI and software can be downloaded from Omniverse. In this video, you’ll see Omniverse’s physics simulation with rigid and soft bodies, fluids, and finite element modeling and a lot more. Enjoy.
Jensen Huang: (32:17)
Omniverse is a physically based virtual world where robots can learn to be robots. They’ll come in all sizes and shapes: box movers, pick-and-place arms, forklifts, cars, trucks. In the future, a factory will be a robot, orchestrating many robots inside, building cars that are robots themselves.
Jensen Huang: (32:35)
We can use Omniverse to create a virtual factory, and train and simulate the factory and its robotic workers inside. The AI and software that run the virtual factory are exactly the same as what will run the actual one. The virtual and physical factories and their robots will operate in a loop. They are digital twins, connecting to ERP systems, simulating the throughput of the factory, simulating new plant layouts and becoming the dashboard of the operator, even uplinking into a robot to tele-operate it.
Jensen Huang: (33:09)
BMW may very well be the world’s largest custom manufacturing company. BMW produces over two million cars a year and, in their most advanced factory, a car a minute. Every car is different. We’re working with BMW to create a future factory, designed completely in digital, simulated from beginning to end in Omniverse, creating a digital twin and operating a factory where robots and humans work together. Let’s take a look at the BMW factory.
Dr. Milan Nedeljkovic: (33:41)
Welcome to BMW Production, Jensen. I am pleased to show you why BMW sets the standards for innovation and flexibility. Our collaboration with NVIDIA Omniverse and NVIDIA AI leads into a new era of digitalization of automobile production.
Jensen Huang: (33:58)
Fantastic to be with you, Milan. I’m excited to do this virtual factory visit with you.
Dr. Milan Nedeljkovic: (34:04)
We are inside the digital twin of BMW’s assembly system, powered by Omniverse. For the first time, we are able to have our entire factory in simulation. Global teams can collaborate using different software packages like Revit, CATIA, or point clouds to design and plan the factory in real-time 3D. The capability to operate in a perfect simulation revolutionizes BMW’s planning processes.
Speaker 4: (34:35)
Are those gaskets still in the right place for you?
Speaker 5: (34:37)
Dr. Milan Nedeljkovic: (34:38)
Yeah. BMW regularly reconfigures its factories to accommodate new vehicle launches. Here we see two planning experts located in different parts of the world, testing a new line designed in Omniverse. One of them wormholes into an assembly simulation with a motion capture suit and records task movements, while the other expert adjusts the line design in real time. They work together to optimize the line, as well as worker ergonomics and safety.
Speaker 4: (35:10)
Can you tell how far I have to bend down there?
Speaker 5: (35:11)
Against the wall, I’ll get you a taller one, but there’s [crosstalk 00:35:14].
Speaker 4: (35:14)
Yeah, it’s perfect.
Dr. Milan Nedeljkovic: (35:15)
We would like to be able to do this at scale in simulation.
Jensen Huang: (35:20)
That’s exactly why NVIDIA has digital humans for simulation. Digital humans are trained with data from real associates. You could then use digital humans in simulation to test new workflows for worker ergonomics and efficiency. Now, your factories employ 57,000 people who share workspace with many robots designed to make their jobs easier. Let’s talk about them.
Dr. Milan Nedeljkovic: (35:42)
You’re right, Jensen, robots are crucial for a modern production system. With the NVIDIA Isaac robotics platform, BMW is deploying a fleet of intelligent robots for logistics to improve the material flow in our production. This agility is necessary since we produce 2.5 million vehicles per year, and 99% of them are custom.
Jensen Huang: (36:07)
Synthetic data generation and domain randomization, available in Isaac, are key to bootstrapping machine learning. Isaac Sim generates millions of synthetic images and varies the environment to teach the robots. Domain randomization can generate infinite permutations of photorealistic objects, textures, orientations, and lighting conditions. It is ideal for generating ground truth, whether for detection, segmentation, or depth perception.
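The idea of domain randomization described here can be sketched in a few lines of Python. This is only an illustration and does not use Isaac Sim or any NVIDIA API: the object names, textures, parameter ranges, and the functions `random_scene` and `generate_dataset` are all hypothetical. The point it shows is that every sample varies the scene at random while the label stays known, because the generator itself placed the object.

```python
import random

# Hypothetical values for illustration only; none come from Isaac Sim.
OBJECTS = ["bin", "pallet", "tote"]
TEXTURES = ["metal", "plastic", "cardboard", "wood"]


def random_scene(rng):
    """Sample one randomized scene description together with its label."""
    obj = rng.choice(OBJECTS)
    return {
        "object": obj,
        "texture": rng.choice(TEXTURES),
        "yaw_degrees": rng.uniform(0.0, 360.0),
        "light_intensity": rng.uniform(0.2, 1.0),
        # Ground truth is free: the generator knows what it placed.
        "label": obj,
    }


def generate_dataset(n, seed=0):
    """Generate n randomized scene descriptions, reproducibly for a seed."""
    rng = random.Random(seed)
    return [random_scene(rng) for _ in range(n)]


dataset = generate_dataset(1000)
```

A real pipeline would render each description to an image; here the dictionaries stand in for rendered frames.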
Jensen Huang: (36:38)
Let me show you an example of how we can combine it all to operate your factory. With NVIDIA’s Fleet Command, your associates can securely orchestrate robots and other devices in the factory from Mission Control. They can monitor complex manufacturing cells in real time, update software over the air, launch robot missions and tele-operate. When a robot needs a helping hand, an alert can be sent to Mission Control and one of your associates can take control to help the robot. We’re in the digital twin of one of your factories, but you have 30 others spread across 15 countries. The scale of BMW production is impressive, Milan.
Dr. Milan Nedeljkovic: (37:21)
Indeed, Jensen, the scale and complexity of our production network requires BMW to constantly innovate. I am happy about the tight collaboration between our two companies. NVIDIA Omniverse and NVIDIA AI give us the chance to simulate all 31 factories in our production network. These new innovations will reduce the planning times, improve flexibility and precision, and at the end, produce 30% more efficient planning processes.
Jensen Huang: (37:53)
Milan, I could not be more proud of the innovations that our collaboration is bringing to the factories of the future. I appreciate you hosting me for a virtual visit of the digital twin of your BMW production. It is a work of art.
Jensen Huang: (38:13)
The ecosystem is really excited about Omniverse. This open platform, with USD universal 3D interchange, connects them into a large network of users. We have 12 connectors to major design tools already, with another 40 in flight. The Omniverse Connector SDK is available for download now. You can see that the most important design tools are already signed up. Our Lighthouse Partners are from some of the world’s largest industries: media and entertainment, gaming, architecture, engineering, construction, manufacturing, telecommunications, infrastructure, and automotive. Computer makers worldwide are building NVIDIA-certified workstations, notebooks and servers optimized for Omniverse, and starting this summer, Omniverse will be available for enterprise license. Omniverse, NVIDIA’s platform for creating and simulating shared virtual worlds.
Jensen Huang: (39:08)
The data center is the new unit of computing. Cloud computing and AI are driving fundamental changes in the architecture of data centers. Traditionally, enterprise data centers ran monolithic software packages. Virtualization started the trend toward software-defined data centers, allowing applications to move about and letting IT manage from a single pane of glass. With virtualization, the compute, networking, storage and security functions are emulated in software, running on the CPU. Though easier to manage, the added CPU load reduced the data center’s capacity to run applications, which is its primary purpose.
Jensen Huang: (39:47)
This illustration shows the added CPU load in the gold-colored part of the stack. Cloud computing rearchitected data centers again. Now, to provision services for billions of consumers, monolithic applications were disaggregated into smaller microservices that can take advantage of any idle resource. Equally important, multiple engineering teams can work concurrently using CI/CD methods. Data center networks became swamped by east-west traffic generated by disaggregated microservices. CSPs tackled this with Mellanox’s high-speed, low-latency networking. Then deep learning emerged. Magical internet services were rolled out, attracting more customers and better engagement than ever.
Jensen Huang: (40:32)
Deep learning is compute intensive, which drove adoption of GPUs. Nearly overnight, consumer AI services became the biggest users of GPU supercomputing technologies. Now, adding zero-trust security initiatives makes infrastructure software processing one of the largest workloads in a data center. The answer is a new type of chip for data center infrastructure processing, like NVIDIA’s BlueField DPU. Let me illustrate this with our own cloud-gaming service, GeForce Now, as an example. GeForce Now is NVIDIA’s GeForce-in-the-cloud service. GeForce Now serves 10 million members in 70 countries, incredible growth. GeForce Now is a seriously hard consumer service to deliver. Everything matters: speed of light, visual quality, frame rate, response, smoothness, startup time, server costs, and most important of all, security.
Jensen Huang: (41:33)
We’re transitioning GeForce Now to BlueField. With BlueField, we can isolate the infrastructure from the game instances and offload and accelerate the networking, storage, and security. The GeForce Now infrastructure is costly. With BlueField, we will improve our quality of service and concurrent users at the same time. The ROI of BlueField is excellent. I’m thrilled to announce our first data center infrastructure SDK. DOCA 1.0 is available today. DOCA is our SDK to program BlueField. There’s all kinds of great technology inside: deep packet inspection, secure boot, TLS crypto offload, regular expression acceleration, and a very exciting capability, a hardware-based, real-time clock that can be used for synchronous data centers, 5G and video broadcast. We have great partners working with us to optimize leading platforms on BlueField: infrastructure software providers, edge and CDN providers, cybersecurity solutions and storage providers. Basically, the world’s leading companies in data center infrastructure.
Jensen Huang: (42:37)
Though we’re just getting started with BlueField-2, today we’re announcing BlueField-3: 22 billion transistors, the first 400 gigabit per second networking chip, and 16 Arm CPUs to run the entire virtualization software stack, for instance, running VMware ESX. BlueField-3 takes security to a whole new level, fully offloading and accelerating IPsec and TLS cryptography, secret key management and regular expression processing. We’re on a pace to introduce a new BlueField generation every 18 months. BlueField-3 will do 400 gigabits per second and have 10X the processing capability of BlueField-2, and BlueField-4 will do 800 gigabits per second and add NVIDIA’s AI computing technologies to get another 10X boost. That is 100X in three years, and all of it will be needed.
Jensen Huang: (43:31)
A simple way to think about this is that a third of the roughly 30 million data center servers shipped each year are consumed running the software-defined data center stack. This workload is increasing much faster than Moore’s Law. So unless we offload and accelerate this workload, data centers will have fewer and fewer CPUs to run applications. The time for BlueField has come.
Jensen Huang: (43:55)
At the beginning of the big bang of modern AI, we recognized the need to create a new kind of computer for a new way of developing software. Software will be written by software, running on AI computers. This new type of computer will need new chips, new system architecture, new ways to network, new software and new methodologies and tools. We’ve invested billions into this intuition, and it has proven helpful to the industry. It all comes together as DGX, a computer for AI. We offer DGX as a fully integrated system, as well as offer the components to the industry to create differentiated options. I am pleased to see so much AI research advancing because of DGX: top universities, research hospitals, telcos, banks, consumer products companies, car makers, and aerospace companies.
Jensen Huang: (44:49)
DGX helps their AI researchers, whose expertise is rare and scarce, and whose work is strategic. It is imperative to make sure they have the right instrument. Simply, if software is to be written by computers, then companies with the best software engineers will also need the best computers. We offer several configurations, all software compatible. The DGX A100 is a building block that contains five petaFLOPS of computing and super fast storage and networking to feed it. DGX Station is an AI data center in a box, designed for work groups, that plugs into a normal outlet. DGX SuperPOD is a fully integrated, fully network-optimized AI data center as a product. SuperPOD is for intensive AI research and development.
Jensen Huang: (45:36)
NVIDIA’s own new supercomputer, called Selene, is four SuperPODs. It is the fifth fastest supercomputer and the fastest industrial supercomputer in the world’s TOP500. We have a new DGX Station, 320G. DGX Station can train large models: 320 gigabytes of super fast HBM2e, connected to four A100 GPUs, over eight terabytes per second of memory bandwidth. Eight terabytes transferred in one second. It would take 40 CPU servers to achieve this memory bandwidth. DGX Station plugs into a normal wall outlet, like a big gaming rig, consumes just 1,500 watts and is liquid cooled to a silent 37 dB. Take a look at the cinematic that our engineers and creative team did. A CPU cluster of this performance would cost about a million dollars today. DGX Station is $149,000, the ideal AI programming companion for every AI researcher. Today, we’re also announcing a new DGX SuperPOD with three major upgrades. First, the new 80GB A100, which brings the SuperPOD to 90 terabytes of HBM2 memory, with aggregate bandwidth of 2.2 exabytes per second. It would take 11,000 CPU servers to achieve this bandwidth, about a 250-rack data center, 15 times bigger than the SuperPOD. Second, SuperPOD has been upgraded with NVIDIA BlueField-2. SuperPOD is now the world’s first cloud-native supercomputer, multi-tenant shareable, with full isolation and bare-metal performance. Third, we’re offering Base Command, the DGX management and orchestration tool used within NVIDIA. We use Base Command to support thousands of engineers, over 200 teams, consuming a million-plus GPU hours a week. DGX SuperPOD starts at $7 million and scales to $60 million for a full system.
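The memory-bandwidth comparisons in this passage can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the talk, plus three assumptions that the talk does not state: decimal units (1 TB = 1000 GB), the same per-server CPU bandwidth in both comparisons, and a SuperPOD built entirely from 80 GB GPUs. The variable names are illustrative. On those assumptions, both estimates land near 2.2 petabytes per second, so the spoken "exabytes" may be a slip of the tongue.

```python
GB_PER_TB = 1000          # assumption: decimal units
GB_PER_PB = 1_000_000

# DGX Station figures quoted in the talk.
station_bandwidth_gb_s = 8 * GB_PER_TB      # "over eight terabytes per second"
station_gpus = 4
station_equivalent_cpu_servers = 40

# Implied per-GPU and per-CPU-server memory bandwidth.
per_gpu_gb_s = station_bandwidth_gb_s // station_gpus
per_server_gb_s = station_bandwidth_gb_s // station_equivalent_cpu_servers

# DGX SuperPOD figures quoted in the talk.
superpod_memory_gb = 90 * GB_PER_TB
gpu_memory_gb = 80
superpod_equivalent_cpu_servers = 11_000

# Estimate 1: scale up from the CPU-server equivalence.
estimate_from_servers_gb_s = superpod_equivalent_cpu_servers * per_server_gb_s

# Estimate 2: scale up from the implied GPU count.
implied_gpus = superpod_memory_gb // gpu_memory_gb
estimate_from_gpus_gb_s = implied_gpus * per_gpu_gb_s

print(f"per GPU: {per_gpu_gb_s} GB/s, per CPU server: {per_server_gb_s} GB/s")
print(f"implied GPUs in SuperPOD: {implied_gpus}")
print(f"from servers: {estimate_from_servers_gb_s / GB_PER_PB:.2f} PB/s")
print(f"from GPUs:    {estimate_from_gpus_gb_s / GB_PER_PB:.2f} PB/s")
```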
Jensen Huang: (49:58)
Model sizes are growing exponentially, on a pace of doubling every two and a half months. We expect to see multi-trillion parameter models by next year and 100 trillion-plus parameter models by 2023. As a very loose comparison, the human brain has roughly 125 trillion synapses. So these transformer models are getting quite large. Training models of this scale is incredible computer science. Today, we’re announcing NVIDIA Megatron for training transformers. Megatron trains giant transformer models. It partitions and distributes the model for optimal multi-GPU and multi-node parallelism. Megatron does fast data loading, micro-batching, scheduling and syncing, and kernel fusing, and pushes the limits of every NVIDIA invention: NCCL, NVLink, InfiniBand, Tensor Cores. Even with Megatron, a trillion-parameter model will take three to four months to train on Selene, so lots of DGX SuperPODs will be needed around the world.
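The talk does not describe how Megatron partitions a model, so the following is only a toy illustration of one common partitioning scheme: splitting a layer's weight matrix column-wise so that each device computes a slice of the output. It is plain Python, the "devices" are simulated as list entries, and the function names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical. It shows that the sharded computation reproduces the unsharded result.

```python
def matmul(x, w):
    """Multiply a row vector x (length k) by a k-by-n matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equally sized shards."""
    n = len(w[0])
    assert n % parts == 0, "columns must divide evenly across devices"
    width = n // parts
    return [[row[p * width:(p + 1) * width] for row in w]
            for p in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Each simulated device multiplies x by its own shard of w.

    The per-device outputs are concatenated, which stands in for the
    gather step a real multi-GPU system would perform over the network.
    """
    shards = split_columns(w, parts)
    partial_outputs = [matmul(x, shard) for shard in shards]
    return [value for out in partial_outputs for value in out]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)
sharded = column_parallel_matmul(x, w, parts=2)
print(full, sharded)
```

Each shard holds only half of the weights, which is what lets a layer too large for one device's memory be spread over several.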
Jensen Huang: (51:06)
Inferencing giant transformer models is also a great computer science challenge. GPT-3 is so big, with so many floating point operations, that it would take a dual CPU server over a minute to respond to a single 128-word query. GPT-3 is so large that it doesn’t fit in GPU memory, so it will have to be distributed. Multi-GPU, multi-node inference has never been done. Today, we’re announcing the Megatron Triton Inference Server, a DGX with Megatron Triton will respond within a second, not a minute, a second, and for 16 queries at the same time. DGX is 1,000 times faster and opens up many new use cases, like call center support, where a one minute response is effectively unusable. Naver is Korea’s number one search engine. They installed the DGX SuperPOD and are running their AI platform, CLOVA, to train language models for Korean. I expect many leading service providers around the world to do the same, use DGX to develop and operate region-specific, industry-specific language services. NVIDIA Clara Discovery is our suite of acceleration libraries created for computational drug discovery, from imaging, to quantum chemistry, to gene variant calling, to using NLP to understand genetics, and using AI to generate new drug compounds. Today, we’re announcing four new models available in Clara Discovery.
Jensen Huang: (52:45)
MegaMolBart is a model for generating biomolecular compounds. This method has seen recent success with Insilico Medicine, using AI to find a new drug in less than two years. The NVIDIA ATAC-seq denoising algorithm for rare and single-cell epigenomics is helping to understand gene expression for individual cells. AlphaFold1 is a model that can predict the 3D structure of a protein from the amino acid sequence. GatorTron is the world’s largest clinical language model that can read and understand doctors’ notes. GatorTron was developed at the University of Florida, using Megatron, and trained on a DGX SuperPOD gifted to his alma mater by Chris Malachowsky, who founded NVIDIA with Curtis and me. Oxford Nanopore is the third-generation genomic sequencing technology, capable of ultra-high throughput in digitizing biology. One fifth of the SARS-CoV-2 virus genomes in the global database were generated on an Oxford Nanopore. Last year, Oxford Nanopore developed a diagnostic test for COVID-19 called LamPORE, which is used by the NHS. Oxford Nanopore is GPU-accelerated throughout. DNA samples pass through nanopores, and the current signal is fed into an AI model, like speech recognition, but trained to recognize genetic code. Another model, called Medaka, reads the code and detects genetic variants. Both models were trained on DGX SuperPOD. These new deep learning algorithms achieved 99.9% detection accuracy of single nucleotide variants. This is the gold standard of human sequencing.
Jensen Huang: (54:24)
Pharma is a $1.3 trillion industry, where a new drug can take 10-plus years and fail 90% of the time. Schrodinger is the leading physics-based and machine-learning computational platform for drug discovery and material science. Schrodinger is already a heavy user of NVIDIA GPUs, recently entering into an agreement to use hundreds of millions of NVIDIA GPU hours on the Google Cloud.
Jensen Huang: (54:51)
Some customers can’t use the cloud. So today we’re announcing a partnership to accelerate Schrodinger’s drug discovery workflow with NVIDIA Clara Discovery libraries and NVIDIA DGX. The world’s top 20 pharmas use Schrodinger today. Their researchers are going to see a giant boost in productivity.
Jensen Huang: (55:11)
Recursion is a biotech company using leading-edge computer science to decode biology, to industrialize drug discovery. The Recursion operating system is built on NVIDIA DGX SuperPOD, for generating, analyzing, and gaining insight from massive biological and chemical datasets. They called their SuperPOD the BioHive-1. It is the most powerful computer at any pharma today. Using deep learning on DGX, Recursion is classifying cell responses after exposure to small molecule drugs. Quantum computing is a field of physics that studies the use of natural quantum behavior, superposition, entanglement, and interference, to build a computer. The computation is performed using quantum circuits that operate on quantum bits, called qubits.
Jensen Huang: (56:03)
Qubits can be zero or one, like a classical computing bit, but also in superposition, meaning they exist simultaneously in both states. The qubits can be entangled, where the behavior of one can affect or control the behavior of others. Adding and entangling more qubits lets quantum computers calculate exponentially more information. There’s a large community around the world doing research in quantum computers and algorithms. Well over 50 teams in industry, academia, and national labs are researching the field. We’re working with many of them.
Jensen Huang: (56:35)
Quantum computing can solve exponential-order complexity problems, like factoring large numbers for cryptography, simulating atoms and molecules for drug discovery, and finding shortest-path optimizations, like the traveling salesman problem. The limiter in quantum computing is decoherence, falling out of quantum states, caused by the tiniest of background noise. So error correction is essential. It is estimated that to solve meaningful problems, several million physical qubits will be required to sufficiently error correct. The research community is making fast progress, doubling physical qubits each year, so likely achieving the milestone by 2035 to 2040, well within my career horizon.
Jensen Huang: (57:20)
In the meantime, our mission is to help the community research the computers of tomorrow with the fastest computer of today. Today, we’re announcing cuQuantum, an acceleration library designed for simulating quantum circuits, for both tensor network solvers and state vector solvers. It is optimized to scale to large GPU memories, multiple GPUs, and multiple DGX nodes. The speedup of cuQuantum on DGX is excellent. Running the cuQuantum benchmark, state vector simulation takes 10 days on a dual CPU server, but only two hours on a DGX A100. cuQuantum on DGX can productively simulate tens of qubits. And Caltech, using [inaudible 00:58:08], simulated the Sycamore quantum circuit at depth 20 in record time, using cuQuantum on NVIDIA’s Selene supercomputer.
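To give a sense of why state vector simulation is limited to "tens of qubits" and why GPU memory size matters, here is a back-of-the-envelope sketch. It assumes each amplitude is stored as a 16-byte complex number (an assumption of this sketch, not a statement about cuQuantum's storage format), and it uses the 80 gigabytes per GPU quoted later in the talk purely as an illustrative memory size. It ignores all working memory beyond the state vector itself.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one double-precision complex number
GB = 10 ** 9

def state_vector_bytes(n_qubits):
    """Memory needed to hold all 2**n amplitudes of an n-qubit state vector."""
    return BYTES_PER_AMPLITUDE * (2 ** n_qubits)

def max_qubits_for_memory(memory_bytes):
    """Largest qubit count whose full state vector fits in the given memory."""
    n = 0
    while state_vector_bytes(n + 1) <= memory_bytes:
        n += 1
    return n

qubits_one_gpu = max_qubits_for_memory(80 * GB)     # one 80 GB GPU
qubits_four_gpus = max_qubits_for_memory(320 * GB)  # four such GPUs
print(qubits_one_gpu, qubits_four_gpus)

# Speedup implied by the benchmark figures in the talk: 10 days vs. 2 hours.
cpu_hours = 10 * 24
dgx_hours = 2
speedup = cpu_hours / dgx_hours
print(speedup)
```

Because memory doubles with each added qubit, quadrupling the memory buys only two more qubits under these assumptions, which is why scaling across GPUs and nodes matters for this workload.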
Jensen Huang: (58:16)
What would have taken years on CPUs can now run in a few days on cuQuantum and DGX. cuQuantum will accelerate quantum circuit simulators, so researchers can design better quantum computers and verify the results, architect hybrid quantum-classical systems, and discover more quantum optimal algorithms, like Shor’s and Grover’s. cuQuantum on DGX is going to give the quantum community a huge boost. I’m hoping cuQuantum will do for quantum computing what [inaudible 00:58:48] did for deep learning.
Jensen Huang: (58:50)
Modern data centers host diverse applications that require varying system architectures. Enterprise servers are optimized for a balance of strong single threaded performance, and a nominal number of cores. Hyperscale servers optimized for microservice containers are designed for a high number of cores, low costs and great energy efficiency. Storage servers are optimized for a large number of cores and high IO throughput. Deep learning training servers are built like supercomputers, with the largest number of fast CPU cores, the fastest memory, the fastest IO, and high speed links that connect the GPUs.
Jensen Huang: (59:27)
Deep learning inference servers are optimized for energy efficiency and best ability to process a large number of models concurrently. The genius of the x86 server architecture is the ability to do a good job using varying configurations of CPU, memory, PCI Express, and peripherals to serve all of these applications, yet processing large amounts of data remains a challenge for computer systems today. This is particularly true for AI models like transformers and recommender systems.
Jensen Huang: (59:59)
Let me illustrate the bottleneck with half of a DGX. Each GPU is connected to 80 gigabytes of super fast memory, running at two terabytes per second. Together, the four [inaudible 01:00:14] process 320 gigabytes at eight terabytes per second. Contrast that with CPU memory, which is one terabyte large, but only 0.2 terabytes per second. The CPU memory is three times larger, but 40 times slower than the GPU.
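The ratios quoted above follow directly from the stated figures. The short sketch below reproduces the arithmetic and adds one derived quantity, the time to stream through each memory once; the variable names are chosen for this illustration only.

```python
# Figures as stated in the talk (half of a DGX: four GPUs plus CPU memory).
gpu_count = 4
gpu_memory_gb = 80          # per GPU
gpu_bandwidth_tbps = 2.0    # per GPU, terabytes per second
cpu_memory_gb = 1000        # "one terabyte"
cpu_bandwidth_tbps = 0.2

total_gpu_memory_gb = gpu_count * gpu_memory_gb             # 320 GB
total_gpu_bandwidth_tbps = gpu_count * gpu_bandwidth_tbps   # 8 TB/s

capacity_ratio = cpu_memory_gb / total_gpu_memory_gb            # about 3x larger
bandwidth_ratio = total_gpu_bandwidth_tbps / cpu_bandwidth_tbps  # about 40x slower

def seconds_to_read_once(size_gb, bandwidth_tbps):
    """Time to stream through a memory of the given size at the given rate."""
    return size_gb / (bandwidth_tbps * 1000)

gpu_read_seconds = seconds_to_read_once(total_gpu_memory_gb, total_gpu_bandwidth_tbps)
cpu_read_seconds = seconds_to_read_once(cpu_memory_gb, cpu_bandwidth_tbps)

print(total_gpu_memory_gb, total_gpu_bandwidth_tbps)
print(round(capacity_ratio, 2), round(bandwidth_ratio, 1))
print(gpu_read_seconds, cpu_read_seconds)
```

Under these figures, one pass over the GPU memory takes a few hundredths of a second, while one pass over the CPU memory takes several seconds, which is the bottleneck being described.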
Jensen Huang: (01:00:33)
We would love to utilize the full 1,320 gigabytes of memory in this node to train AI models. So why not something like this? Make faster CPU memories, connect four channels to the CPU, a dedicated channel to feed each GPU. Even if a package can be made, PCI Express is now the bottleneck. We can surely use NVLink. NVLink is fast enough, but no x86 CPU has NVLink, not to mention four NVLinks.
Jensen Huang: (01:01:03)
Today, we’re announcing our first data center CPU, Project Grace, named after Grace Hopper, a computer scientist and US Navy rear admiral, who in the 50s pioneered computer programming. Grace is ARM-based and purpose-built for accelerated computing applications of large amounts of data, such as AI. Grace highlights the beauty of ARM. Their IP model allowed us to create the optimal CPU for this application, which has achieved X-factor speedup. The ARM core in Grace is a next generation off-the-shelf IP for servers. Each CPU will deliver 300 SPECint, with a total of over 2,400 SPECint rate CPU performance for an eight-GPU DGX. For comparison, today’s DGX, the highest performance computer in the world, is 450 SPECint rate. 2,400 SPECint rate with Grace versus 450 SPECint rate today. So look at this again. Before, after, before, after. Amazing increase in system and memory bandwidth. Today, we’re introducing a new kind of computer, the basic building block of the modern data center. Here it is.
Jensen Huang: (01:02:42)
What I’m about to show you brings together the latest GPU accelerated computing, Mellanox high-performance networking, and something brand new. The final piece of the puzzle. The world’s first CPU designed for terabyte-scale accelerated computing. Her secret code name, Grace. This powerful ARM-based CPU gives us the third foundational technology for computing and the ability to rearchitect every aspect of the data center for AI.
Jensen Huang: (01:03:20)
We’re thrilled to announce the Swiss National Supercomputing Centre will build a supercomputer powered by Grace and our next generation GPU. The new supercomputer, called ALPS, will be 20 exaflops for AI, 10 times faster than the world’s fastest supercomputer today. ALPS will be used to do whole earth scale weather and climate simulation, quantum chemistry, and quantum physics for the Large Hadron Collider. ALPS will be built by HP Enterprise and come online in 2023.
Jensen Huang: (01:03:54)
We’re thrilled by the enthusiasm of the supercomputing community, welcoming us to make ARM a top-notch scientific computing platform. Our data center roadmap is now a rhythm consisting of three chips, CPU, GPU, and DPU. Each chip architecture has a two-year rhythm with likely a kicker in between. One year will focus on x86 platforms. One year will focus on ARM platforms. Every year, we’ll see new exciting products from us.
Jensen Huang: (01:04:27)
The NVIDIA architecture and platforms will support x86 and ARM, whatever customers and markets prefer. Three chips, yearly leaps, one architecture. ARM is the most popular CPU in the world for good reason. It’s super energy efficient. Its open licensing model inspires a world of innovators to create products around it. ARM is used broadly in mobile and embedded today. For other markets like the cloud, enterprise, and edge data centers, super computing and PCs, ARM is just starting and has great growth opportunities. Each market has different applications and has unique systems, software, peripherals, and ecosystems.
Jensen Huang: (01:05:15)
For the markets we serve, we can accelerate ARM’s adoption. Let’s start with the big one, cloud. One of the earliest designers of ARM CPUs for data centers is AWS. Its Graviton CPUs are extremely impressive. Today, we’re announcing NVIDIA and AWS are partnering to bring Graviton2 and NVIDIA GPUs together.
Jensen Huang: (01:05:38)
This partnership brings ARM into the most demanding cloud workloads, AI and cloud gaming. Mobile gaming is growing fast and it’s the primary form of gaming in some markets. With AWS-designed Graviton2, users can stream ARM-based applications and Android games straight from AWS. It’s expected later this year. We’re announcing a partnership with Ampere Computing to create a scientific and cloud computing SDK and reference system. Ampere Computing’s Altra CPU is excellent. 80 cores, 285 SPECint 2017, right up there with the highest performance x86.
Jensen Huang: (01:06:20)
We’re seeing excellent reception at supercomputing centers around the world and at Android cloud gaming services. We’re also announcing a partnership with Marvell to create an edge and enterprise computing SDK and reference system. Marvell OCTEON excels at IO, storage, and 5G processing. This system is ideal for hyper-converged edge servers. We’re announcing a partnership with MediaTek to create a reference system and SDK for Chrome OS and Linux PCs. MediaTek is the world’s largest SoC maker. Combining NVIDIA GPUs and MediaTek SoCs will make excellent PCs and notebooks.
Jensen Huang: (01:07:02)
AI, computers automating intelligence, is the most powerful technology force of our time. We see AI in four waves. The first wave was to reinvent computing for this new way of doing software. We’re all in and have been driving this for nearly 10 years. The first adopters of AI were the internet companies. They have excellent computer scientists, large computing infrastructure, and the ability to collect a lot of training data. We’re now at the beginning of the next wave. The next wave is enterprise and industrial edge, where AI can revolutionize the world’s largest industries, from manufacturing, logistics, agriculture, healthcare, and financial services to transportation. There are many challenges to overcome, one of which is connectivity, which 5G will solve. And then, autonomous systems.
Jensen Huang: (01:07:56)
Self-driving cars are an excellent example, but everything that moves will eventually be autonomous. The industrial edge and autonomous systems are the most challenging, but also the largest opportunities for AI to make an impact. Trillion dollar industries can soon apply AI to improve productivity and invent new products, services, and business models. We have to make AI easier to use. Turn AI from computer science to computer products. We’re building the new computing platform for this fundamentally new software approach. The computer for the age of AI.
Jensen Huang: (01:08:33)
AI is not just about an algorithm. Building and operating AI is a fundamental change in every aspect of software. Andrej Karpathy rightly called it Software 2.0. Machine learning at the highest level is a continuous learning system that starts with data scientists developing data strategies and engineering predictive features. This data is the digital life experience of a company. Training involves inventing or adapting an AI model that learns to make the desired predictions. Simulation and validation test the AI application for accuracy, generalization, and potential bias. And finally, orchestrating a fleet of computers, whether in your data center or at the edge in warehouses, farms, or wireless base stations.
Jensen Huang: (01:09:22)
NVIDIA created the chips, systems, and libraries needed for end-to-end machine learning. For example, technologies like Tensor Core GPUs, NVLink, DGX, [inaudible 01:09:33] RAPIDS, NCCL, GPUDirect, DOCA, and so much more. We call the platform NVIDIA AI. NVIDIA AI libraries accelerate every step, from data processing to fleet orchestration. NVIDIA AI is integrated into all of the industry’s popular tools and workflows. NVIDIA AI is in every cloud, used by the world’s largest companies, and by over 7,500 AI startups around the world. And NVIDIA AI runs on any system that includes NVIDIA GPUs, from PCs and laptops to workstations to supercomputers in any cloud to our $99 Jetson robot computer. One segment of computing we’ve not served is enterprise computing. 70% of the world’s enterprises run VMware, as we do at NVIDIA.
Jensen Huang: (01:10:23)
VMware was created to run many applications on one virtualized machine. AI, on the other hand, runs a single job, bare metal, on multiple GPUs and often multiple nodes. All of the NVIDIA optimizations for compute and data transfer are now plumbed through the VMware stack, so AI workloads can be distributed to multiple systems and achieve bare metal performance. The VMware stack is also offloaded and accelerated on NVIDIA BlueField. NVIDIA AI now runs in its full glory on VMware, which means everything that has been accelerated by NVIDIA AI now runs great on VMware. AI applications can be deployed and orchestrated with Kubernetes running on VMware Tanzu. We call this platform NVIDIA EGX for enterprise. The enterprise IT ecosystem is thrilled. Finally, the 300,000 VMware enterprise customers can easily build an AI computing infrastructure that seamlessly integrates into their existing environment.
Jensen Huang: (01:11:26)
In total, over 50 servers from the world’s top server makers will be certified for NVIDIA EGX Enterprise. BlueField-2 offloads and accelerates the VMware stack and does the networking for distributed computing. Enterprises can choose big or small GPUs for heavy compute or heavy graphics workloads, like Omniverse, or mix and match. All NVIDIA AI. Enterprise companies make up the world’s largest industries and they operate at the edge.
Jensen Huang: (01:11:54)
In hospitals, factories, plants, warehouses, stores, farms, cities, and roads, far from data centers, the missing link is 5G. Consumer 5G is great, but private 5G is revolutionary. Today, we’re announcing the Aerial A100, bringing together 5G and AI into a new type of computing platform designed for the edge. Aerial A100 integrates the Ampere GPU and the BlueField DPU into one card. This is the most advanced PCI Express card ever created. So it’s not a surprise that Aerial A100 in an EGX system will be a complete 5G base station. Aerial A100 delivers up to a full 20 gigabits per second, and can process up to nine 100 megahertz massive MIMO for 64T64R, or 64 transmit and 64 receive antenna arrays.
Jensen Huang: (01:12:52)
State-of-the-art capabilities. Aerial A100 is software defined, with accelerated features like PHY, virtual network functions, network acceleration, packet pacing, and line rate cryptography. Our partners, Ericsson, Fujitsu, Mavenir, [inaudible 01:13:11], and Radisys will build their total 5G solutions on top of the Aerial library.
Jensen Huang: (01:13:17)
NVIDIA EGX server with Aerial A100 is the first 5G base station that is also a cloud native, secure AI edge data center. We have brought the power of the cloud to the 5G edge. Aerial also extends the power of 5G into the cloud. Today, we’re excited to announce that Google will support NVIDIA Aerial in the GCP cloud. I have an important new platform to tell you about. The rise of microservice-based applications and hybrid cloud has exposed billions of connections in a data center to potential attack.
Jensen Huang: (01:13:54)
Modern zero trust security models assume the intruder is already inside and all container-to-container communications should be inspected, even within a node. This is not possible today. The CPU load of monitoring every piece of traffic is simply too great. Today, we’re announcing NVIDIA Morpheus, a data center security platform for real-time, all-packet inspection. Morpheus is built on NVIDIA AI, NVIDIA BlueField, NetQ network telemetry software, and EGX. We’re working to create solutions with industry leaders in data center security, [inaudible 01:14:33], Red Hat, Cloudflare, Splunk, F5, and Aria Cybersecurity, and early customers, Booz Allen Hamilton, Best Buy, and of course, our own team at NVIDIA. Let me show you how we’re using Morpheus at NVIDIA.
Speaker 6: (01:14:52)
It starts with a network. Here, we see a representation of a network where dots are servers and lines, the edges, are packets flowing between those servers. Except in this network, Morpheus is deployed. This enables AI inferencing across your entire network, including East West traffic. The particular model being used here has been trained to identify sensitive information, AWS credentials, GitHub credentials, private keys, passwords. If observed in the packet, these would appear as red lines. And we don’t see any of that.
Speaker 6: (01:15:22)
Uh-oh, what happened? An updated configuration was deployed to a critical business app on this server. This update accidentally removed encryption. And now, everything that communicates with that app sends and receives sensitive credentials in the clear. This can quickly impact additional servers. This translates to continuing exposure on the network. The AI model in Morpheus is searching through every packet for any of these credentials, continually flagging when it encounters such data. And rather than using pattern matching, this is done with a deep neural network, trained to generalize and identify patterns beyond static rule sets.
Speaker 6: (01:16:01)
Notice all of the individual lines. It’s easy to see how quickly a human could be overwhelmed by the vast amount of data coming in. Scrolling through the raw data, it gives a sense of the massive scale and complexity that is involved. With Morpheus, we immediately see the lines that represent leaked sensitive information. By hovering over one of those red lines, we show complete info about their credential, making it easy to triage and remediate. But what happens when this remediation is necessary? Morpheus enables cyber applications to integrate and collect information for automated incident management and action prioritization. Originating servers, destination servers, actual exposed credentials, and even the raw data is available. This speeds recovery and informs which keys were compromised and need to be rotated. With Morpheus, the chaos becomes manageable.
Jensen Huang: (01:16:56)
The IT ecosystem has been hungry for an AI computing platform that’s enterprise and edge ready. NVIDIA AI on EGX with Aerial 5G is the foundation of what the IT ecosystem has been waiting for. We are supported by leaders from all across the IT industry, from systems, infrastructure software, storage and security, data analytics, industrial edge solutions, manufacturing, design, and automation, to 5G infrastructure.
Jensen Huang: (01:17:27)
To complete our enterprise offering, we now have NVIDIA AI Enterprise software, so businesses can get direct line support from NVIDIA. NVIDIA AI Enterprise is optimized and certified for VMware and offers the services and support needed by mission critical enterprises. Deep learning has unquestionably revolutionized computing. Researchers continue to innovate at light speed with new models and new variants. We’re creating new systems to expand AI into new segments like ARM, enterprise, or 5G edge. We’re also using the systems to do basic research in AI and building new AI products and services.
Jensen Huang: (01:18:07)
Let me show you some of our work in AI, basic and applied research. DLSS, deep learning super sampling. StyleGAN, AI high resolution image generator. GANcraft, a neural rendering engine turning Minecraft into realistic 3D. GANverse3D turns photographs into animatable 3D models. Face Vid2Vid, a talking head rendering engine that can reduce streaming bandwidth by 10X while reposing the head and eyes. Sim2Real, a quadruped trained in Omniverse, and the AI can run in a real robot. Digital twins.
Jensen Huang: (01:18:47)
SimNet, a physics-informed neural network solver that can simulate large scale multi-physics. BioMegatron, the largest biomedical language model ever trained. 3DGT, Omniverse synthetic data generation. And OrbNet, a machine learning quantum solver for quantum chemistry. This is just a small sampling of the AI work we’re doing at NVIDIA. We’re building AIs to use in our products and platforms. We’re also packaging up the models to be easily integrated into your applications.
Jensen Huang: (01:19:18)
These are essentially no coding, open source applications that you can modify. Now, we’re offering NGC Pre-Trained Models that you can plug into these applications, or ones you develop. These pre-trained models are production quality, trained by experts, and will continue to benefit from refinement. There are new credentials that tell you about each model’s development, testing, and use. And each comes with a reference application and sample code. NVIDIA Pre-Trained Models are state of the art and meticulously trained, but there is infinite diversity of application domains, environments, and specializations. No one has all the data. Sometimes it’s rare. Sometimes there are trade secrets.
Jensen Huang: (01:19:59)
So we created technology for you to fine-tune and adapt NVIDIA Pre-Trained Models for your applications. TAO applies transfer learning to fine-tune our models with your data. TAO has an excellent federated learning system to let multiple parties collectively train a shared model, while protecting data privacy.
Jensen Huang: (01:20:19)
NVIDIA federated learning is a big deal. Researchers at different hospitals can collaborate on one AI model, while keeping their data separate to protect patient privacy. TAO uses NVIDIA’s great TensorRT to optimize the model for the target GPU system. With NVIDIA Pre-Trained Models and TAO, something previously impossible for many can now be done in hours.
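The idea of several parties training one shared model without pooling their data can be sketched with generic federated averaging. This is a toy illustration with a one-parameter model and made-up data, not TAO's API or NVIDIA's federated learning implementation; every name and number here is hypothetical.

```python
def local_update(weight, data, learning_rate=0.01, epochs=20):
    """One site's training: gradient descent on y ~ weight * x, using only local data."""
    for _ in range(epochs):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight

def federated_average(weights, sample_counts):
    """Server step: average the site weights, weighted by each site's sample count."""
    total = sum(sample_counts)
    return sum(w * n for w, n in zip(weights, sample_counts)) / total

# Hypothetical private datasets held at two sites; the true relationship is y = 3x.
site_data = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

global_weight = 0.0
for _ in range(10):  # communication rounds
    # Each site starts from the shared model and trains on its own data.
    local_weights = [local_update(global_weight, data) for data in site_data]
    # Only the trained weights leave each site, never the raw records.
    global_weight = federated_average(local_weights, [len(d) for d in site_data])

print(global_weight)
```

The shared weight converges toward the value that fits both sites' data, even though the server in this sketch only ever sees model weights.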
Jensen Huang: (01:20:42)
Fleet Command is a cloud native platform for securely operating and orchestrating AI across a distributed fleet of computers. It was purpose-built for operating AI at the edge. Fleet Command running on certified EGX systems will feature secure boot, [inaudible 01:21:04], uplink, and downlink to a confidential enclave. From any cloud or on-prem, you can monitor the health of your fleet. Let’s take a look at how one customer is using NGC Pre-Trained Models and TAO to fine-tune models and run in our Metropolis smart city application, orchestrated by Fleet Command.
Speaker 7: (01:21:25)
No two industrial environments are the same, and conditions routinely change. Adapting, managing, and operationalizing AI-enabled applications at the edge for unique specific sites can be incredibly challenging, requiring a lot of data and time for model training. Here, we’re building applications for a factory with multiple problems to solve. Understanding how a factory floor space is used over time. Ensuring worker safety with constantly evolving machinery and risk factors. Inspecting products on the factory line, where operations change routinely. We start with a Metropolis application running three pre-trained models from NGC.
Speaker 7: (01:21:59)
We’re up and running in minutes. But since the environment is visually quite different from the data used to train this model, the video analytics accuracy at the specific site isn’t great. People are not being accurately recognized and tracked, and defects are being missed. Now, let’s use NVIDIA TAO to solve for this. With this simple UI, we retrain and adapt our pre-trained models with labeled data from the specific environment we’re deploying in.
Speaker 7: (01:22:22)
We select our datasets. Each is a few hundred images, as opposed to the millions of labeled images required if we were to train this from scratch. With NVIDIA TAO, we go from 65% to over 90% accuracy. And through pruning and quantization, compute complexity is reduced by 2X, with no reduction in accuracy, and real-time performance is maintained.
Speaker 7: (01:22:43)
In this example, the result is three models, specifically trained for our factory, all in just minutes. With one click, we update and deploy these optimized models onto NVIDIA certified servers with Fleet Command, seamlessly and securely from the cloud. From secure boot to our confidential AI enclave and GPUs, application data and critical intellectual property remain safe. AI accuracy, system performance, and health can be monitored remotely. This establishes a feedback loop for continuous application enhancements. The factory now has an end-to-end framework to adapt to changing conditions often and easily. We make it easy to adapt and optimize NGC Pre-Trained Models with NVIDIA TAO, and deploy and orchestrate applications with Fleet Command.
Jensen Huang: (01:23:30)
We have all kinds of computer vision, speech, language, and robotics models, and more coming all the time. Some are the works of our genetics and medical imaging team. For example, this model that predicts supplemental oxygen needs from x-ray and electronic health records. This is a collaboration of 20 hospitals across eight countries and five continents for COVID-19. Federated learning and NVIDIA TAO were essential to make this possible. The world needs a state-of-the-art conversational AI that can be customized and processed anywhere.
Jensen Huang: (01:24:03)
Today, we’re announcing the availability of NVIDIA Jarvis, a state-of-the-art, deep learning AI for speech recognition, language understanding, translations and speech. End-to-end GPU accelerated, Jarvis interacts in about 100 milliseconds, listening, understanding and responding faster than the blink of a human eye. We trained Jarvis for several million GPU hours on over one billion pages of text and 60,000 hours of speech in different languages, accents, environments and lingos.
Jensen Huang: (01:24:41)
Out of the box, Jarvis achieves a world class 90% recognition accuracy. You can get even better for your application by refining with your own data using NVIDIA TAO. Jarvis supports five languages today. English, Japanese, Spanish, German, French and Russian. Jarvis does excellent translation. In the Bilingual Evaluation Understudy (BLEU) benchmark, Jarvis scores 40 points for English to Japanese, and 50 points for English to Spanish. 40 is high quality and state-of-the-art, 50 is considered fluent translation. Jarvis can be customized for domain jargon. We’ve trained Jarvis for technical and healthcare scenarios. It’s easy with TAO. And Jarvis now speaks with expression and emotion that you can control, no more mechanical talk.
Jensen Huang: (01:25:33)
And lastly, this is a big one. Jarvis can be deployed in a cloud, on EGX in your data center, at the edge in a shop, warehouse or factory, running on EGX Aerial, or inside a delivery robot running on a Jetson computer. The Jarvis early access program began last May, and our conversational AI software has been downloaded 45,000 times. Among early users is T-Mobile, the U.S. telecom giant. They’re using Jarvis to offer exceptional customer service, which demands the high quality and low latency needed in real-time speech recognition.
Jensen Huang: (01:26:12)
We’re announcing a partnership with Mozilla Common Voice, one of the world’s largest multi-language voice datasets. And it’s openly available to all. NVIDIA will use our DGXs to process and train Jarvis with the dataset of 150,000 speakers in 65 languages, and offer Jarvis back to the community for free. So, go to Mozilla Common Voice and make some recordings. Let’s make universal translation possible and help people around the world understand each other.
Jensen Huang: (01:26:43)
Now, let me show you Jarvis. The first part of Jarvis is speech recognition. Jarvis is over 90% accurate out of the box. That’s world class. And you can still use TAO to make it even better for your application, like customizing for healthcare jargon. Chest x-ray shows left retrocardiac opacity. And this may be due to atelectasis, aspiration or early pneumonia. I have no idea what I said, but Jarvis recognized it perfectly.
Jensen Huang: (01:27:11)
Jarvis translation now supports five languages. Let’s do Japanese. Excuse me, I’m looking for the famous Jungra ramen shop. It should be nearby, but I don’t see it on my map. Can you show me the way? I’m very hungry. That’s great. Excellent accuracy, I think. Instantaneous response. You can do German, French and Russian, with more languages on the way. Jarvis also speaks with feelings. Let’s try this. The more you buy, the more you save.
The more you buy, the more you save.
Jensen Huang: (01:27:52)
I think we’re going to need more enthusiasm.
The more you buy, the more you save.
Jensen Huang: (01:28:03)
NVIDIA Jarvis, state-of-the-art, deep learning, conversational AI. Interactive response. Five languages. Customized with TAO. And deployed from cloud to edge, to autonomous systems. Recommender systems are the most important machine learning pipeline in the world today. They are the engine for search, ads, online shopping, music, books, movies, user generated content, news. Recommender systems predict your needs and preferences from past interactions with you, your explicit preferences and learned preferences, using methods called collaborative and content filtering.
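As a minimal sketch of the collaborative filtering idea mentioned above, the example below predicts how a user might rate an unseen item from the ratings of similar users. It is a textbook user-based approach on a tiny invented dataset, not the method used by any NVIDIA product; the user names, item names, and ratings are all hypothetical, and content filtering is not shown.

```python
import math

# Hypothetical explicit ratings (1 to 5) from past interactions.
ratings = {
    "ann": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 4, "film_b": 5, "film_d": 5},
    "cat": {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        similarity_sum += similarity
    if similarity_sum == 0:
        return None  # no comparable users have rated this item
    return weighted_sum / similarity_sum

prediction = predict_rating("ann", "film_d")
print(prediction)
```

Here the prediction for "ann" leans toward the rating from "bob", whose past ratings resemble hers, rather than the rating from "cat", whose tastes differ. Production systems replace this pairwise comparison with learned embeddings to cope with the scale described in the talk.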
Jensen Huang: (01:28:42)
Trillions of items to be recommended to billions of people. The problem space is quite large. We would like to productize a state-of-the-art recommender system so that all companies can benefit from the transformative capabilities of this AI. We built an open source recommender system framework called Merlin, which simplifies an end-to-end workflow from ETL to training, to validation, to inference. It is architected to scale as your dataset grows. And if you already have a ton of data, it is the fastest recommender system ever built.
Jensen Huang: (01:29:16)
In our benchmarks, we achieve speedups of 10 to 50X for ETL. Two to 10X for training. And three to 100 times for inference, depending on the exact setup. Merlin is now available on NGC. Our vision for Maxine is to eventually be your avatar in virtual worlds created in Omniverse, to enable virtual presence. The technologies of virtual presence can benefit video conferencing today. Alex is going to tell you all about it.
Hi, everyone. I’m Alex and I’m the Product Manager of NVIDIA Maxine. These days we’re spending so much time doing video conferencing, don’t you want a much better video communication experience? Today, I’m so thrilled to share with you NVIDIA Maxine, a suite of AI technologies that can improve the video conferencing experience for everyone. When combined with NVIDIA Jarvis, Maxine offers the most accurate speech to text. See, it’s now transcribing everything I’m saying, all in real time. And, in addition, with the help of Jarvis, Maxine can also translate what I’m saying into multiple languages, which makes international meetings so much easier. Another great feature I’d love to share with you is Maxine’s eye contact feature. Oftentimes, when I’m presenting, I’m not looking at a camera. When I turn on Maxine’s eye contact feature, it corrects the position of my eyes so that I’m looking back into the camera again. Now, I can make eye contact with everyone else. The meeting experience just gets so much more engaging.
Last, but definitely not least, Maxine can even improve the video quality when bandwidth isn’t sufficient. Now, let’s take a look at my video quality when bandwidth drops to as low as 50 kbps. Definitely not great. Maxine’s AI face codec can improve the quality of my video even when bandwidth is as low as 50 kbps. The most powerful part of Maxine is that all of these amazing AI features can run simultaneously, all in real time. Or you can just pick and choose a few, it’s just that flexible. Now, before I go, let me turn off all Maxine features, so you can see what I look like without Maxine’s help. Not ideal. Now, let’s turn back on all the Maxine features. See, that’s so much better. Thanks to NVIDIA Maxine.
Jensen Huang: (01:31:50)
NGC has pre-trained models. TAO lets you fine-tune and adapt models to your applications. Fleet Command deploys and orchestrates your models. The final piece is the inference server, to infer insight from the continuous streams of data coming into EGX servers or your cloud instance. NVIDIA Triton is our inference server. Triton is a model scheduling and dispatch engine that can handle just about anything you throw at it. Any AI model that runs on cuDNN, basically every AI model. From any framework, TensorFlow, PyTorch, ONNX, OpenVINO, TensorRT, or custom C++/Python backends. Triton schedules on multiple generations of NVIDIA GPUs and x86 CPUs. Triton maximizes the CPU and GPU utilization. Triton scales with Kubernetes and handles live updates.
Jensen Huang: (01:32:45)
Triton is fantastic for everything from image to speech recognition, from recommenders to language understanding. But let me show you something really hard, generating biomolecules for drug discovery. In drug discovery, it all comes down to finding the right molecule, the right shape, the right interactions with the protein, the right pharmacokinetic properties. Working with scientists from AstraZeneca, NVIDIA researchers trained a language model to understand SMILES, the language of chemical structures. We developed a model with Megatron and trained it on a SuperPOD, then created a generative model that reads the structure of successful chemical drug compounds, to then generate potentially effective novel compounds.
Jensen Huang: (01:33:34)
Now we can use AI to generate candidate compounds that can then be further refined with physics-based simulations, like docking or Schrödinger’s FEP+. Generating with a CPU is slow; it takes almost eight seconds to generate one molecule. On a single A100 with Triton, it takes about 0.3 seconds, 32 times faster. Using Triton Inference Server, we can scale this up to a SuperPOD and generate thousands of molecules per second. AI and simulation is going to revolutionize drug discovery.
Jensen Huang: (01:34:14)
We provide these AI capabilities to our ecosystem of millions of developers around the world. Thousands of companies are building their most important services on NVIDIA AI. Best Buy is using Morpheus as the foundation of AI-based anomaly detection in their network. GE Healthcare has built an echocardiogram that uses TensorRT to quickly identify different views of the wall motion of the heart and select the best ones for analysis. Spotify has over four billion playlists. They’re using RAPIDS to analyze models more efficiently, which lets them refresh personalized playlists more often. This keeps their content fresh with the latest trends. WeChat is using TensorRT to achieve eight-millisecond latency, even when using state-of-the-art natural language understanding models like BERT. This gives them more accurate and relevant results while maintaining real-time response. It’s great to see NVIDIA AI used to build these amazing AI services.
Jensen Huang: (01:35:10)
NVIDIA AI on EGX with Aerial 5G. The AI computing platform for the Enterprise and Edge, the next wave of AI. DRIVE AV is our AV platform and service. AV is the most intense machine learning and robotics challenge, one of the hardest but also the greatest impact. We’re building an AV platform end-to-end, from the AV chips and computers, sensor architecture, data processing, mapping, developing the driving software, creating the simulator and digital twin, Fleet Command operations, to road testing. All of it to the highest functional safety and cybersecurity standards. None of these things are openly available, so we build them.
Jensen Huang: (01:35:55)
We build in modules, so our customers and partners in the $10 trillion transportation industry can leverage the parts they need. Meanwhile, we build our end-to-end AV service in partnership with Mercedes. AV computing demand is skyrocketing. Orin is a giant leap over Xavier. The more developers learn about AV, the more advanced the algorithms become and the greater the computing demand. More computation capacity gives teams faster iteration and speed to market. Leading some to call TOPS the new horsepower. And many are realizing that the reserve computing capacity today is an opportunity to offer new services tomorrow.
Jensen Huang: (01:36:36)
NVIDIA DRIVE is an open programmable platform that AI engineers all over the world are familiar with. New AI technology is being invented on NVIDIA AI constantly. These inventions will be tomorrow’s new services. Orin is an amazing AV computer. Today, we’re announcing that Orin was also designed to be the central computer of the car. Orin will process in one central computer the cluster, infotainment, passenger interaction AI, and very importantly, the confidence view, or the perception world model. The confidence view is what the car actually perceives around it, constructed into a 3D surround model. This is what’s in the mind of the autopilot AI. It is important that the car shows us that its surround perception is accurate, so that we have confidence in its driving.
Jensen Huang: (01:37:27)
In time, rear view mirrors will be replaced by surround 3D perception. The future is one central computer, four domains, virtualized and isolated, architected for functional safety and security, software defined and upgradable for the life of the car. And super smart AI, and beautiful graphics. Many car companies have already adopted Orin. And it would help them tremendously to also leverage our full AV development platform. Today, we’re announcing NVIDIA’s eighth generation Hyperion car platform, including reference sensors, AV and central computers, the 3D ground-truth data recorder, networking and all of the essential software. Hyperion 8 is compatible with the NVIDIA DRIVE AV stack. So, easy to adopt and integrate elements of our stack.
Jensen Huang: (01:38:19)
Hyperion 8 is a fully functional, going-to-production, open AV platform for the multitrillion-dollar transportation ecosystem. Orin will be in production in 2022. Meanwhile, our next generation is already in full gear and will be yet another giant leap. Today, we’re announcing NVIDIA DRIVE Atlan. DRIVE Atlan will be 1,000 TOPS on one chip. More than the total compute in most level five robo-taxis today. To achieve higher autonomy in more conditions, sensor resolutions will continue to increase. There will be more of them. AI models will get more sophisticated. There will be more redundancy and safety functionality. We’re going to need all of the computing we can get.
Jensen Huang: (01:39:09)
Atlan is a technical marvel, fusing all of NVIDIA’s technologies in AI, auto, robotics, safety and BlueField secure data center technologies. AV and software must be planned as multigenerational investments. The software you invested billions in today must carry over to the entire fleet and to future generations. The DRIVE Atlan, the next level. One architecture.
Jensen Huang: (01:39:40)
The car industry has become a technology industry. Future cars are going to be completely programmable computers and business models are going to be software driven. Car companies will offer software services for the life of the car. The new tech mindset sees the car not just as a product to sell, but as an install base of tens or hundreds of millions to build upon, creating billions in services opportunities. The world’s big brands have giant opportunities. Step one was going electric. Now, the big brands are making their new fleets autonomous and programmable. Orin will be powering many next generation EVs. And with Mercedes, we’re building the end-to-end system. The world moves 10 trillion miles a year. If only a fraction of these miles were served by robo-taxis, the opportunity is giant.
Jensen Huang: (01:40:33)
We expect robo-taxis to start ramping in the next couple of years. These services will also be platforms for all kinds of new services to be built upon. Like last mile delivery. The internet moves electrons. Trucks move atoms. The rise of e-commerce is putting intense pressure on the system. One click and a new TV shows up at your house. Another click and a cheeseburger shows up. This trend will only go up. The U.S. will be short by 100,000 truckers by 2023. The E.U. is short 150,000 truckers already. China is short four million drivers. So, ideas like driverless trucks from hub to hub, or driverless trucks inside a port or warehouse campus, are excellent near-term ways to augment with automation.
Jensen Huang: (01:41:25)
We started my talk with Omniverse; we’ll close with Omniverse. You could see how important it is to our work, to robotics, to anyone building AIs that interact with the physical world, to have a physically based simulator or digital twin. In the case of Isaac, the digital twin is the factory in Omniverse. In the case of DRIVE, the digital twin is the collective memories of the fleet captured in Omniverse. The DRIVE digital twin is used throughout the development. It’s used for HD map reconstruction, synthetic data generation so we can bootstrap training new models, new scenario simulations, [inaudible 01:42:02] simulations, release validation, for replaying unfamiliar scenarios experienced by a car, or a teleoperator uplinking into a car to remotely pilot.
Jensen Huang: (01:42:13)
The DRIVE digital twin in Omniverse is a virtual world that every car in the fleet is connected to. Today, we’re announcing that DriveSim, the engine of DRIVE digital twin, will be available for the community this summer. Let me show you what DRIVE AV and DriveSim can do. This is a mountain of technology. Enjoy. (silence) What a GTC. It’s incredible to think how much was done in this last year. I spoke about several things. NVIDIA is a full stack computing platform. We’re building virtual worlds with NVIDIA Omniverse, a miraculous platform that will help build the next wave of AI for robotics and self-driving cars. We announced new DGX systems and new software, MEGATRON for giant transformers, Clara for drug discovery and cuQuantum for quantum computing. NVIDIA is now a three-chip company, with the addition of Grace CPU, designed for a specialized segment of computing focused on giant scale AI and HPC. And for data center infrastructure processing, we announced BlueField 3 and DOCA 1.
Jensen Huang: (01:45:59)
NVIDIA is expanding the reach of AI. We announced an important new platform, NVIDIA EGX with Aerial 5G, to make AI accessible to all companies and industries. We’re joined by the leaders of IT, VMware, computer makers, infrastructure software, storage and security providers. And to lower the bar to the highest levels of AI, we offer pre-trained models like Jarvis conversational AI, Merlin recommender system, Maxine virtual conferencing and Morpheus AI security. All of it is optimized on NVIDIA AI Enterprise. And with new tools like NVIDIA TAO, Fleet Command and Triton, it is easy for you to customize and deploy on EGX.
Jensen Huang: (01:46:49)
And DRIVE, our end-to-end platform for the $10 trillion transportation industry, which is becoming one of the world’s largest technology industries. From Orin and Atlan, the first 1,000 TOPS SoC, to Hyperion 8, a fully operational, reference AV car platform, to DriveSim, to DRIVE AV. We’re working with the industry at every layer. 20 years ago, all of this was science fiction. 10 years ago, it was a dream. Today, we’re living it. I want to thank all of you, developers and partners, and a very special thanks to all the NVIDIA employees. Our purpose is to advance our craft so that you, the Da Vincis of our time, can advance yours. Ultimately, NVIDIA is an instrument, an instrument for you to do your life’s work. Have a great GTC.