KEVIN DEIERLING,
VP OF MARKETING,
NVIDIA
EDITOR’S QUESTION
Advanced AI applications are
becoming commonplace across
cloud, enterprise and Edge, driving
massive compute and data requirements
and making data centre resiliency more
critical than ever. Data centre resilience
is achieved by adopting a cloud-native
architecture, where applications are
broken down into small, distributed
microservices which are assembled – or
composed – into scalable applications as
needed and on-demand. Such cloud-native
applications are far more resilient than
apps developed as giant monolithic code
behemoths, because the small, cooperating
microservices dynamically come and go.
These microservices are implemented
within containers, so they are easy to
launch or update and the application can
quickly scale across hundreds and even
thousands of nodes. Resilience to failure
is a huge additional benefit of this cloud-native
architecture, because the distributed
application is designed to accommodate
containers that come and go, whether
intentionally or not. So failures of individual
containers or entire servers are expected
and accommodated by design and the
microservices are quickly replaced by new
containers running on different servers.
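The self-healing behaviour described above is, at its core, a reconciliation loop: an orchestrator continually compares the number of running containers against the desired count and launches replacements when they diverge. A minimal Python sketch of that idea (a toy model, with all class and pod names hypothetical, not any particular orchestrator's API):

```python
class Cluster:
    """Toy model of a cluster that keeps a desired number of
    microservice replicas running, replacing any that fail."""

    def __init__(self, desired_replicas):
        self.desired = desired_replicas
        self.running = {f"pod-{i}" for i in range(desired_replicas)}
        self.next_id = desired_replicas

    def fail(self, pod):
        # A container or its server dies, intentionally or not.
        self.running.discard(pod)

    def reconcile(self):
        # The control loop compares actual state with desired state
        # and starts replacement containers until they match.
        while len(self.running) < self.desired:
            self.running.add(f"pod-{self.next_id}")
            self.next_id += 1

cluster = Cluster(desired_replicas=3)
cluster.fail("pod-1")        # simulate a server outage
cluster.reconcile()          # orchestrator restores capacity
print(len(cluster.running))  # → 3
```

Real orchestrators such as Kubernetes implement the same desired-versus-actual loop, which is why individual container or server failures are routine events rather than outages.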
Accelerated computing using GPUs and
an intelligent network are critical elements
needed to build this resilient, distributed
cloud-native data centre. A good example
is NVIDIA’s accelerated computing
GPUs for AI applications, which deliver
faster and more efficient natural language
processing, Big Data analytics, task
automation and recommendation engines
for both consumers and IT staff. GPU-powered
AI can recognise anomalies or
problematic trends in power consumption,
storage usage, network traffic, hardware
reliability, or response time to let data
centre professionals prevent outages or
resource shortages. It can also recognise
and stop security threats or intrusions
more quickly. The AI acceleration is
complemented by the intelligent NVIDIA
networking switches, SmartNICs and Data
Processing Units (DPUs) from the Mellanox
acquisition. The SmartNICs offload SDN,
network virtualisation for containers,
data movement and encryption tasks from
the CPUs. This allows applications to run
more quickly while using fewer CPUs and
servers, and also simplifies connecting
new or moved containers with their
microservices. The DPUs provide security
isolation, a distributed software-defined,
hardware-accelerated data and control
plane, and storage virtualisation to servers
and containers, making it faster and easier
to spin up or spin down microservices with
all the needed security protections and
just the right amount of shared storage.
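The anomaly spotting described earlier, recognising unusual readings in power consumption, storage usage, network traffic or response time, can be illustrated with a simple statistical outlier check. This is a minimal sketch of the general idea, not NVIDIA's actual telemetry pipeline; the function name and sample data are my own, and production systems would use learned models rather than a plain z-score:

```python
from statistics import mean, stdev

def find_anomalies(samples, threshold=3.0):
    """Flag telemetry samples (e.g. per-minute network traffic or
    power draw) that deviate sharply from the series mean."""
    mu = mean(samples)
    sigma = stdev(samples)
    return [i for i, x in enumerate(samples)
            if sigma > 0 and abs(x - mu) / sigma > threshold]

# Steady traffic with one sudden spike at index 5. A lower
# threshold is used because the spike itself inflates the stdev.
traffic = [100, 102, 99, 101, 100, 500, 98, 100, 101, 99]
print(find_anomalies(traffic, threshold=2.5))  # → [5]
```

Flagging such outliers before they cascade is what lets data centre teams intervene ahead of an outage or resource shortage.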
Additionally, intelligent, open-networking
switches provide multiple high-bandwidth
paths between servers to avoid bottlenecks
or outages due to congestion or
broken links. The switches also provide
programmable fabric automation and smart
telemetry across the network, increasing
resiliency and simplifying the management
of composable microservices. This entire
accelerated AI computing stack and cloud-native
fabric are fully integrated within a
Kubernetes container orchestration platform
that is at the heart of achieving resilience
and scale in next-generation data centres. •
www.intelligentcio.com
INTELLIGENTCIO