The Cloud HPC Toolkit, now rebranded as Cluster Toolkit, simplifies the creation and management of high-performance computing (HPC) environments on Google Cloud. Initially focused on scientific and technical computing workloads, it has expanded to encompass AI/ML applications, reflecting its adoption beyond traditional HPC.
The Cluster Toolkit empowers users to focus on their workloads by streamlining cluster setup and deployment, leveraging Google Cloud’s best practices, and offering flexibility for diverse computing tasks. Key benefits include:
- Easy deployment and management of clusters: The Toolkit simplifies the process of setting up and maintaining clusters, allowing users to focus on their workloads rather than infrastructure management. The Toolkit supports multiple schedulers including Slurm, GKE, and Batch.
- Quickstart options for HPC and AI/ML workloads: The Toolkit has a library of pre-built blueprints and modules that let users begin running their workloads quickly, accelerating time-to-value.
- Integration of Google Cloud best practices: The aforementioned blueprints and modules incorporate Google Cloud’s recommended configurations, ensuring that clusters are set up for optimal performance and efficiency.
- Regular updates and new features: The Toolkit is actively maintained and updated with new features and improvements, providing users with ongoing support and enhancements.
- Open-source accessibility: The Toolkit is open-source, allowing users to customize and extend its capabilities to meet their specific needs.
What’s new in Cluster Toolkit
In addition to a new name, Cluster Toolkit has several new features for HPC and AI/ML workloads:
- A3 Mega Blueprint: This blueprint makes it easy to deploy a cluster of A3 Mega VMs ready for training large language models (LLMs) and other AI/ML workloads. Earlier in the year, we also launched the A3 Blueprint.
- HPC VM Image: This VM image comes pre-installed with popular HPC tools and libraries, so you can begin running your HPC workloads quickly with assured performance.
  - Note that we have released the final CentOS 7 version of the HPC VM Image. CentOS 7 reached end-of-life on June 30, 2024, meaning that it no longer receives security updates. Going forward, we strongly recommend moving to Rocky Linux 8, and we will be releasing regular Rocky Linux 8 versions of the HPC VM Image.
  - We are releasing the ability to disable automatic updates in the HPC VM Image. Automatic updates can disrupt the performance of HPC applications, so we’re giving you the option to turn them off via metadata.
- Slurm-gcp v6: The latest version of the Slurm-gcp solution, which provides a seamless experience for running Slurm workloads on Google Cloud, is now generally available (GA).
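As a rough illustration of the metadata option mentioned for the HPC VM Image, the sketch below assembles a `gcloud compute instances create` command line that sets a metadata entry at VM creation time. The metadata key `google_disable_automatic_updates`, the image family `hpc-rocky-linux-8`, and the image project `cloud-hpc-image-public` are assumptions not stated in this post; check them against the current HPC VM Image documentation before relying on them. The helper name, instance name, and zone are hypothetical.

```python
import shlex


def build_create_command(instance_name: str, zone: str) -> list[str]:
    """Return argv for creating an HPC VM with automatic updates turned off.

    The metadata key and image names below are assumptions; verify them
    against the HPC VM Image documentation.
    """
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        f"--zone={zone}",
        "--image-family=hpc-rocky-linux-8",        # assumed image family
        "--image-project=cloud-hpc-image-public",  # assumed image project
        # Assumed metadata key that disables automatic updates.
        "--metadata=google_disable_automatic_updates=TRUE",
    ]


# Print the command for review instead of executing it.
print(shlex.join(build_create_command("hpc-node-1", "us-central1-a")))
```

Printing the command rather than running it keeps the sketch free of side effects; the same metadata entry could equally be set in a blueprint or an instance template.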
Guidelines for existing Toolkit customers
We’ve renamed our GitHub repo to “Cluster Toolkit” and some commands (e.g., ghpc is now gcluster). Existing Git operations and commands will still work, but we strongly recommend updating local clones and command names to avoid confusion.
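For teams with many wrapper scripts, the command rename can be applied mechanically. The following is a minimal sketch, not an official migration tool: it rewrites whole-word occurrences of `ghpc` to `gcluster` in a piece of script text and leaves longer names that merely contain `ghpc` alone. Only the `ghpc` to `gcluster` mapping comes from this post; the function name and the sample script lines are hypothetical.

```python
import re

# Old-to-new command names. Only this one mapping is stated in the post.
RENAMES = {"ghpc": "gcluster"}


def update_command_names(script_text: str) -> str:
    """Replace standalone old command names with their new names."""
    for old, new in RENAMES.items():
        # The lookarounds reject matches that touch a word character or a
        # hyphen, so names like "ghpc-work" or "my_ghpc" are left untouched.
        pattern = rf"(?<![\w-]){re.escape(old)}(?![\w-])"
        script_text = re.sub(pattern, new, script_text)
    return script_text


# Illustrative script lines (hypothetical file and directory names).
sample = "./ghpc create my-blueprint.yaml\ncd ghpc-work\n"
print(update_command_names(sample))
```

Review the rewritten scripts before committing them, since a pattern-based replacement cannot tell a command invocation from an unrelated use of the same word in a comment or string.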
How to get started
To get started with the Cluster Toolkit, select one of our easy-to-use HPC and AI/ML blueprints, available through our GitHub repo, and use it to set up a cluster. We also offer a variety of resources to help you get started, including documentation, quickstarts, and videos.