Success Story: GIGABYTE and NVIDIA Help “Talos” Fight Against Aging at Rey Juan Carlos University
Background of the Success Story
In early 2023, Rey Juan Carlos University completed the installation of the “Talos” cluster, a project led by researchers Sergio Muñoz and Luis Bote. Talos takes its name from the bronze automaton of Greek mythology, often described as the first non-organic artificial intelligence, and provides significant computational power for the team’s research on cellular aging mechanisms.
Sergio Muñoz, who has a Ph.D. in machine learning and is a biomedical engineering professor at Rey Juan Carlos University, collaborates with the BigMed+ professors and researchers in designing AI and machine learning algorithms.
Rey Juan Carlos University (URJC) is a dynamic institution known for its contributions to cutting-edge knowledge, and it ranks well both nationally and internationally. With 46,000 students and five research groups spanning 31 fields of arts, sciences, and literature, the university offers a vibrant academic environment.
In their research, algorithms are vital for not only providing solutions but also comprehending the underlying data. Understanding the data enables the algorithms to respond effectively to questions. In this field, black boxes, which cannot provide answers to these questions, are unwelcome. While humans excel at certain perceptual tasks, they struggle with extracting hidden insights from vast amounts of data. Hence, processing this information and discovering concealed patterns to address the posed questions is crucial.
Health, particularly biomedical engineering, holds a central focus and great significance in their research. To design artificial intelligence algorithms, they require horizontally scalable algorithms, especially in the domain of machine learning.
Solution for the Research Challenge
Overcoming the barrier of limited storage and infrastructure capacity for horizontal scaling and efficient algorithm execution was a top priority. Therefore, securing a substantial number of CPU cores, such as those provided by SIE and GIGABYTE, became crucial.
Moreover, they developed explainable AI algorithms, emphasizing deep learning techniques and generative models, necessitating the use of cutting-edge NVIDIA A100 Tensor Core GPUs built with NVIDIA Ampere architecture.
Because the research group specializes in designing spatio-temporal simulations, the GPUs had to perform well in double-precision calculations.
The research group has three needs:
- A large number of CPU cores for parallel computing and for applying their machine learning models.
- Latest-generation GPUs with strong double-precision performance for explainable AI and simulation.
- Enough storage, especially for biomedical applications, which helps the group secure significant funding through a European research program involving researchers from all over the world.
The researchers contacted an integrator team, which designed a cluster able to break those technical barriers. Thanks to the knowledge and experience that SIE has gained in HPC, it was possible to deliver a computing center ideal for research, based on GIGABYTE G492-ZD2 platforms.
G492-ZD2 – the GIGABYTE GPU Server Solution that Empowers the Researchers
The G492-ZD2 is a server purpose-built for GPU-centric workloads. It uses a dual-chamber design in a 4U chassis, with the top 1U dedicated to the CPU platform and the bottom 3U dedicated to the GPUs, while still supporting up to 10 low-profile NICs. The design is optimized for air cooling so that the system can sustain peak performance without throttling.
In the configuration chosen by the research team at URJC, each GPU node has two AMD EPYC 7282 processors for a combined 32 CPU cores and 128 PCIe 4.0 lanes. The heavy lifting and parallel processing come from NVIDIA HGX A100 SXM4 GPUs, with eight NVIDIA A100 GPUs in each GPU server. In total, the GPU cluster provides 221,184 CUDA cores and a theoretical FP64 performance of more than 300 TFLOPS. Connectivity is optimized for direct GPU-to-GPU data movement: the NVIDIA A100 Tensor Core GPUs are linked through multiple NVIDIA® NVLink™ interconnects, which provide 600 GB/s of throughput between GPUs.
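The cluster totals quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes NVIDIA's published per-GPU figures for the A100 (6,912 CUDA cores and 9.7 TFLOPS of standard, non-Tensor-Core FP64), and it infers the number of GPU nodes, which the article does not state. The function name `cluster_summary` is hypothetical.

```python
# Per-GPU figures from NVIDIA's published A100 specifications (assumed here,
# not stated in the article).
CUDA_CORES_PER_A100 = 6912
FP64_TFLOPS_PER_A100 = 9.7  # standard FP64, not the FP64 Tensor Core rate


def cluster_summary(total_cuda_cores, gpus_per_node):
    """Infer GPU and node counts, and peak FP64, from the quoted core total."""
    total_gpus, leftover_cores = divmod(total_cuda_cores, CUDA_CORES_PER_A100)
    if leftover_cores:
        raise ValueError("core total is not a whole number of A100 GPUs")

    nodes, leftover_gpus = divmod(total_gpus, gpus_per_node)
    if leftover_gpus:
        raise ValueError("GPU count is not a whole number of nodes")

    return {
        "gpus": total_gpus,
        "nodes": nodes,  # inferred, not confirmed by the article
        "fp64_tflops": total_gpus * FP64_TFLOPS_PER_A100,
    }


if __name__ == "__main__":
    # 221,184 CUDA cores and 8 GPUs per server are the article's figures.
    summary = cluster_summary(total_cuda_cores=221_184, gpus_per_node=8)
    print(summary)
```

Under these assumptions the quoted core count corresponds to 32 GPUs, which would mean four eight-GPU servers and a theoretical peak consistent with the "more than 300 TFLOPS" figure.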
The choice of NVIDIA A100 SXM4 modules in the G492-ZD2 system is important because NVIDIA Magnum IO GPUDirect technologies increase throughput while offloading work from the CPU. The G492-ZD2 supports NVIDIA GPUDirect RDMA for direct data exchange between GPUs and third-party devices such as NICs or storage adapters. It also supports GPUDirect Storage, which provides a direct data path from storage to GPU memory without going through the CPU, resulting in higher bandwidth and lower latency.
The cluster’s 576 TB of shared storage is accessed over native InfiniBand, with fast handling of both data and metadata, through GFS Access, a hardware-independent POSIX parallel file system that supports high concurrent throughput across nodes as well as user management. The capacity can be increased to 1 PB in the future. The file system was designed for all performance-oriented environments, including HPC, AI, deep learning, and life sciences.
About the Future
This group focuses not only on generating knowledge but also on transferring it. The supercomputing center benefits collaborators and society by sharing knowledge with partner universities, and its rapid data processing helps companies interested in machine learning integrate valuable data. The group’s future research focuses on two areas: partial or transitory cellular reprogramming for enhanced quality of life, and oncology. Talos has a promising future ahead.
For the full story, please visit the success story page on the website of SIE: https://www.sie.es/…
GIGABYTE is an engineer, visionary, and leader in the world of tech that uses its hardware expertise, patented innovations, and industry leadership to create, inspire, and advance. Renowned for over 30 years of award-winning excellence in motherboards and graphics cards, GIGABYTE is a cornerstone in the HPC community, providing businesses with server and data center expertise to accelerate their success.
About Giga Computing
Giga Computing Technology is an industry innovator and leader in the enterprise computing market. Having spun off from GIGABYTE, we maintain hardware expertise in manufacturing and product design, while operating as a standalone business that can drive more investment into core competencies.
Giga Computing Technology Co., Ltd.
7F, 6 Baoqiang Rd., Xindian Dist.
231 New Taipei City
Phone: +31 40 290 2071
Fax: +49 (40) 253304-45
https://www.gigabyte.com/
Phone: +31 40 290 2071
E-Mail: bernice@giga-byte.nl