We’re excited to share that Gartner has recognized Google as a Leader in the 2025 Gartner® Magic Quadrant™ for Container Management for the third year in a row, based on its Completeness of Vision and Ability to Execute. Google was positioned highest in Ability to Execute among all vendors evaluated, and we believe this validates our mission to be the best place for customers anywhere to run containerized workloads. In the 2025 Gartner Critical Capabilities for Container Management report, Google Cloud was ranked first in every critical capability and led every major use case: new cloud-native applications, containerizing existing applications, AI workloads, edge applications, and hybrid applications.
Gartner predicts that “by 2027, more than 75% of all AI/ML deployments will use container technology as the underlying compute environment, which is a major increase from fewer than 50% in 2024.” Containers power today’s most innovative apps and businesses — and deliver the infrastructure customers demand as they transform their businesses with AI.
Google Cloud led the cloud-native and container technology charge when we introduced Kubernetes in 2014 and when we launched Google Kubernetes Engine (GKE), the first managed Kubernetes service, in 2015. We are proud and humbled to celebrate GKE’s 10th birthday this year, alongside our community, customers, and industry partners. Learn more and join the celebration by exploring the 10 years of GKE ebook. We’ve distilled a decade of insights into what makes GKE so effective, how customers are using GKE to support their work at scale, and why we’re ready for everything AI has in store for the decade ahead.
With a goal to create the most scalable, secure, and simple Kubernetes service available, we’ve set the bar high for GKE. Gartner’s recognition of Google Cloud as highest in Ability to Execute reflects and validates our commitment to improving the customer experience. Here are three initiatives we’re focused on:
- Enhancing GKE’s scalability, performance, and cost-efficiency with container-optimized compute
- Accelerating our customers’ path to AI value on GKE
- Establishing Cloud Run as the fastest and simplest AI app and container hosting solution
Container-optimized compute for every workload
Stop managing nodes and start managing applications with GKE Autopilot. Autopilot delivers up to 7x faster pod scheduling and intelligent, low-latency scaling on GKE’s container-optimized compute platform. Because Autopilot provisions nodes quickly when you need them, you pay only for the pod resources you use, not for idle VMs, providing a true and unique “serverless Kubernetes” experience. GKE’s architectural scale (up to 65,000 nodes) helps ensure your platform doesn’t become an innovation bottleneck. And you can build a secure, governable multi-cluster platform out of the box with fleet management, GitOps (Config Sync), and policy enforcement (Policy Controller) included at no extra cost, reducing the tooling sprawl and complexity inherent in competing services.
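To make the billing model concrete, here is a minimal sketch of an Autopilot workload; the name, sample image, and resource values are illustrative placeholders. On Autopilot you don’t define node pools; you declare what each pod needs, and those requests are what you’re billed for:

```yaml
# Illustrative Deployment for an Autopilot cluster (created with
# `gcloud container clusters create-auto`). Autopilot provisions and
# manages the underlying nodes; billing follows the pod resource
# requests below, not VM capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0  # sample image
        resources:
          requests:          # what you pay for on Autopilot
            cpu: 500m
            memory: 512Mi
```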
Leading customers like Signify and Toyota have shared how GKE powers their most business-critical apps and workloads at global scale.
“Scaling our infrastructure as Moloco’s Ads business grew exponentially was a huge challenge. GKE’s autoscaling capabilities enabled the engineering team to focus on development without spending a ton of effort on operations.” – Sechan Oh, Director of Machine Learning, Moloco
GKE: The AI-ready platform for powering innovation
GKE is engineered for massive-scale AI workloads, supporting clusters up to 65,000 nodes. It offers industry-leading integration with diverse AI accelerators, including various generations of NVIDIA GPUs (H100, A100, L4) and Google’s own TPUs (Tensor Processing Units), which provide excellent price-performance for training, fine-tuning, and inference. Recent advancements like Dynamic Resource Allocation (DRA), driven by Google’s contributions to upstream Kubernetes, and Custom Compute Classes, help ensure strong utilization and obtainability of these expensive resources, as do features like intelligent fallback priorities across different capacity types (reservations, on-demand, Spot). This makes GKE a great place for the most demanding AI training and inference jobs.
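As an illustration of the capacity-fallback model, here is a minimal Custom Compute Class sketch, assuming the cloud.google.com/v1 ComputeClass API shape; the class name and machine families are placeholders:

```yaml
# Hypothetical custom compute class: GKE evaluates each priority rule
# in order, falling back to Spot capacity when on-demand capacity of
# the preferred family is unobtainable.
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: cost-optimized     # placeholder name
spec:
  priorities:
  - machineFamily: n2      # prefer on-demand N2 nodes
  - machineFamily: n2
    spot: true             # fall back to Spot capacity
  nodePoolAutoCreation:
    enabled: true          # let GKE create matching node pools
```

Workloads opt in by selecting the class, for example with a `cloud.google.com/compute-class: cost-optimized` nodeSelector.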
Plus, GKE Inference Gateway introduces model-aware load balancing that routes requests based on metrics like KV cache utilization and pending queue length, helping reduce serving costs by up to 30% and tail latency by up to 60%, while increasing throughput by up to 40%. Our focus on inference-specific optimizations is critical for customers deploying high-performance, cost-effective generative AI models in production. Cluster Director for GKE simplifies the deployment and management of large, AI-optimized clusters, including automated repair and topology-aware scheduling. And GKE offers direct support for popular AI/ML frameworks like Ray (with Ray on GKE for distributed training and serving) and vLLM.
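For a sense of how a vLLM server lands on GKE, here is a minimal, hedged sketch; the model name, image tag, and accelerator choice are assumptions, and a gated model would additionally need a Hugging Face token supplied via an environment variable:

```yaml
# Illustrative vLLM inference Deployment on a GKE GPU node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # request an L4 node
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest               # OpenAI-compatible server
        args: ["--model", "google/gemma-2-2b-it"]    # placeholder model
        ports:
        - containerPort: 8000                        # vLLM's default port
        resources:
          limits:
            nvidia.com/gpu: "1"
```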
Leading customers like Moloco and Anthropic have shared how GKE is powering their AI futures.
“At Contextual AI, we are building the next generation of Retrieval Augmented Generation (RAG). Contextual Language Models (CLMs) are end-to-end optimized to address pain points of RAG 1.0 and help enterprise customers build production-grade workflows. To achieve this, we rely on GKE Autopilot, a fully managed Kubernetes service that handles the complexity of running our application. With GKE Autopilot, we can easily scale our pods, optimize our resource utilization, and ensure the security and availability of our nodes. We also take advantage of the new billing models that offer more cost-effective GPUs for our inference tasks, while using regular Autopilot pods for our non-GPU services. We are excited to use GKE Autopilot to power CLMs while saving us money and improving our performance.” – Soumitr Pandey, Member of Technical Staff, Contextual AI
Cloud Run: The fastest way to get your AI applications to production
We recently announced general availability of GPU support, specifically NVIDIA L4 GPUs, in Cloud Run. This is a significant differentiator for AI workloads, letting developers use powerful hardware for inference workloads such as LLMs while still benefiting from Cloud Run’s serverless advantages: scale to zero (no cost when idle), pay-per-second billing, and rapid startup times (around 5 seconds for GPU instances).
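As a sketch of what attaching a GPU looks like in a Cloud Run service definition (the service name, image, and resource values are placeholders, and the accelerator nodeSelector key reflects our reading of the Cloud Run YAML surface):

```yaml
# Illustrative Cloud Run service with one NVIDIA L4 GPU attached.
# Deployable with `gcloud run services replace service.yaml`.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-inference      # placeholder name
spec:
  template:
    spec:
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4
      containers:
      - image: us-docker.pkg.dev/PROJECT_ID/repo/inference:latest  # placeholder image
        resources:
          limits:
            cpu: "4"       # GPU instances need several CPUs and
            memory: 16Gi   # plenty of memory alongside the accelerator
            nvidia.com/gpu: "1"
```

With no traffic, the service scales to zero and the GPU stops incurring cost.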
We have also dramatically advanced the developer experience for AI on Cloud Run. A key recent launch is our collaboration with Docker, which lets users deploy Docker Compose files directly to Cloud Run. This simplifies moving AI applications from local development to production, especially for multi-container applications and those that use the new AI-specific “models” attribute in Compose. Cloud Run also supports one-click deployment of applications, Gemma, and other open models directly from Google AI Studio, making it faster to bring AI apps from idea to production.
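For illustration, a minimal compose.yaml using the top-level `models` element; the service and model names are placeholders, and the attribute shape follows the Compose specification rather than anything Cloud Run-specific:

```yaml
# Illustrative Compose file pairing an app with a model dependency,
# deployable locally or to Cloud Run via the Docker integration above.
services:
  chat-app:
    build: .
    ports:
      - "8080:8080"
    models:
      - llm            # wire the model into this service
models:
  llm:
    model: ai/gemma3   # placeholder model reference
```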
Leading customers like The Telegraph, L’Oréal, and Ford have shared how they are transforming businesses and markets with Cloud Run.
“The most interesting feature of Cloud Run is that it helps us keep ML processes simpler, at scale, and with lower costs. We have also achieved flexibility gains.” – Everton Alvares Cherman, Co-founder and CTO, Birdie.ai
Take the next steps on your container journey
Wherever our customers build and run containers, from Google Cloud to other clouds to the data center and the edge, we aim to deliver the simplest, most comprehensive, secure, and reliable container platform (Kubernetes and serverless) for all workloads. Let us help accelerate your business transformation journey today. We can’t wait to see what you build!
- Download your complimentary copy of the 2025 Gartner® Magic Quadrant™ for Container Management.
- Read all about how a decade of GKE evolution is shaping the future in Celebrating 10 years of GKE: Incredible customer journeys, amazing AI futures and download the ebook.
- If you’re attending KubeCon in November in Atlanta, please stop by our booth and register for Google Container Day @ KubeCon and CloudNativeCon.
Gartner, Magic Quadrant for Container Management, Dennis Smith et al., 6 August 2025
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates; both are used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google.