A modern HPC cluster integrates several technologies that work together to solve a wide range of computational tasks in the shortest time possible.
High-end compute nodes
Arrays of compute nodes with multicore processors are stacked on top of each other and stored in racks.
The compute nodes produce lots of heat which must be efficiently drained, either through air-cooling or liquid cooling.
The special sauce of HPC is to make multiple processor cores work together through parallel processing.
A high-bandwidth, low-latency network connects the nodes in the cluster, increasing the number of cores that can work together beyond a single node.
Fast connections between the nodes are necessary to make quick data exchange possible.
Parallel shared filesystem
The compute nodes are connected to a shared filesystem for storing input data, temporary data, and final calculation results.
Since the filesystem is shared, each node in the cluster nodes has access all the data of a user.
As all the nodes may be using the same filesystem at the same time, a high performance is critical.
A parallel filesystem achieves this performance by reading from (or writing to) multiple storage drives simultaneously.
The fastest parallel filesystem is still orders of magnitude slower than the data that can be generated by a compute node.
To mitigate slow disk reads and writes, some nodes are equipped with lots of RAM memory.
Accessing RAM memory is much faster than accessing HDDs or SSDs.
Applications that generate large amounts of intermediate results can take advantage of this by storing their data in memory as much as possible.
Graphical processing units (GPUs) can greatly improve the processing power of a HPC cluster through general-purpose computing on GPUs (GPGPU).
GPUs are specialized processors, ideally suited for data processing tasks.
Despite their limited instruction set, thanks to the thousands of processing cores highly parallel computations can be performed.
One node can contain multiple GPUs, and multiple GPU nodes can be combined to further increase parallelization.
In order to take advantage of all these technologies we need specialized software that is designed for HPC:
- built to perform calculations efficiently, making use of modern processor features
- splitting up complex computations into smaller tasks which can be calculated in simultaneously on multiple processors
- designed to make use of HPC technologies, such as GPGPU or in-memory calculations
Cloud computing is ideal for workloads that need maximal security and/or maximal flexibility.
Whereas in traditional HPC the underlying operating system can shared between users, in a cloud setup each user has exclusive access to an OS, which runs in a virtual machine.
Having exclusive access to the OS means users can customize their environment to their needs and perform a wide variety of workloads.
In addition, applications running in the cloud can be dynamically scaled up or down by adding or removing instances on demand.