Moloco: 10x faster model training times with TPUs on Google Kubernetes Engine

In today’s congested digital landscape, businesses of all sizes face the challenge of optimizing their marketing budgets. They must find ways to stand out amid the bombardment of messages vying for potential customers’ attention. Moreover, they grapple with rising customer acquisition costs and dwindling retention rates, impeding their profitability.

Adding to this complexity is the abundance of consumer data, which businesses often struggle to harness effectively to target the right audience. To address these challenges, companies are seeking data-driven approaches to enhance their advertising effectiveness, to help ensure their continued relevance and profitability.

Moloco offers AI-powered advertising solutions that drive user acquisition, retention, and monetization. Moloco Ads, its demand-side platform (DSP), uses customers’ unique first-party data to help them target and acquire high-value users based on real-time consumer behavior, ultimately delivering higher conversion rates and return on investment.

To do this, Moloco relies on predictions from a dozen deep neural networks while continuously designing and evaluating new models. The platform ingests 10 petabytes of data per day and processes bid requests at a peak rate of 10.5 million queries per second (QPS).

Moloco has seen tremendous growth over the last three years, with its business growing over 8X and multiple customers spending more than $50 million annually. Moloco’s rapid growth required an infrastructure that could handle massive data processing and real-time ML predictions while remaining cost effective. As Moloco’s models grew in complexity, training times increased, hindering productivity and innovation. Separately, the Moloco team realized that they also needed to optimize serving efficiency to scale low-latency ad experiences for users across the globe.

Training complex ML models with GKE

After evaluating multiple cloud providers and their solutions, Moloco opted for Google Cloud for its scalability, flexibility, and robust partner ecosystem. The infrastructure provided by Google Cloud aligned with Moloco’s requirements for handling its rapidly growing data and machine learning workloads that are instrumental to optimizing customers’ advertising performance.

Google Kubernetes Engine (GKE) was a primary reason Moloco selected Google Cloud over other cloud providers. As Moloco discovered, GKE is more than a container orchestration tool; it’s a gateway to harnessing the full potential of AI and ML. GKE provides scalability and performance optimization tools for diverse ML workloads and supports a wide range of frameworks, allowing Moloco to customize the platform to its specific needs.

GKE serves as the foundation for a unified AI/ML platform. It integrates with other Google Cloud services to provide a robust environment for the data processing and distributed computing that underpin Moloco’s complex AI and ML tasks. GKE’s ML data layer offers the high-throughput storage that read-heavy workloads require, and features such as the cluster autoscaler, node auto-provisioning, and Pod autoscalers help allocate resources efficiently.
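To give a rough sense of how Pod autoscaling arrives at a replica count, the sketch below implements the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up, and no change is made when that ratio is within a tolerance (0.1 by default). The function name, the replica bounds, and the sample numbers are hypothetical and are not taken from Moloco’s configuration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Approximate the Horizontal Pod Autoscaler replica calculation.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to the configured [min_replicas, max_replicas] range.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Small deviations from the target do not trigger a scaling action.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical example: 4 replicas at 90% average CPU, 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

The cluster autoscaler and node auto-provisioning then work one level down, adding or removing nodes so that the Pods requested by a rule like this one can actually be scheduled.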

“Scaling our infrastructure as Moloco’s Ads business grew exponentially was a huge challenge. GKE’s autoscaling capabilities enabled the engineering team to focus on development without spending a ton of effort on operations.” – Sechan Oh, Director of Machine Learning, Moloco

Shortly after migrating to Google Cloud, Moloco began using GKE for model training. However, Moloco quickly found that traditional CPUs were not competitive at its scale in terms of either cost or velocity. GKE’s ability to autoscale multi-host Tensor Processing Units (TPUs), Google’s specialized processors for machine learning workloads, proved critical: it let Moloco harness TPUs at scale and significantly improved training speed and efficiency.
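As an illustration of what a multi-host TPU training workload on GKE can look like, the following sketch builds a Kubernetes Job manifest as a Python dictionary. It uses the node selector labels and the `google.com/tpu` resource name that GKE documents for TPU node pools. Everything else is an assumption made for the example: the source does not say which TPU version or topology Moloco uses, so the `tpu-v4-podslice` accelerator, the `2x2x4` topology, the four-chips-per-host figure, and the job and image names are placeholders. A real multi-host job typically also needs a headless Service so that workers can find each other, which is omitted here.

```python
import json
from math import prod

# Assumption for this sketch: each TPU host VM exposes 4 chips.
CHIPS_PER_HOST = 4


def hosts_for_topology(topology: str, chips_per_host: int = CHIPS_PER_HOST) -> int:
    """Return how many host VMs a TPU slice topology such as '2x2x4' spans."""
    chips = prod(int(dim) for dim in topology.split("x"))
    if chips % chips_per_host != 0:
        raise ValueError(f"{topology} is not divisible into {chips_per_host}-chip hosts")
    return chips // chips_per_host


def tpu_training_job(
    name: str,
    image: str,
    accelerator: str = "tpu-v4-podslice",  # placeholder TPU type
    topology: str = "2x2x4",               # placeholder slice topology
) -> dict:
    """Build an Indexed Job with one Pod per TPU host in the slice."""
    hosts = hosts_for_topology(topology)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # One worker Pod per host; all must run together.
            "completionMode": "Indexed",
            "parallelism": hosts,
            "completions": hosts,
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Labels GKE applies to TPU slice node pools.
                    "nodeSelector": {
                        "cloud.google.com/gke-tpu-accelerator": accelerator,
                        "cloud.google.com/gke-tpu-topology": topology,
                    },
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,
                            "resources": {
                                "requests": {"google.com/tpu": CHIPS_PER_HOST},
                                "limits": {"google.com/tpu": CHIPS_PER_HOST},
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job and image names.
    job = tpu_training_job("example-training", "example.com/trainer:latest")
    print(json.dumps(job, indent=2))
```

With autoscaling enabled on the TPU node pool, submitting a Job like this is what prompts GKE to provision the slice, and the nodes can be released again once the Job completes.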

Moloco further leveraged GKE’s AI and ML capabilities to optimize the management of its compute resources, minimizing idle time and generating cost savings while improving performance. Notably, GKE empowered Moloco to scale its ML infrastructure to accommodate exponential business growth without straining its engineering team. This enabled Moloco’s engineers to concentrate on developing AI and ML software instead of managing infrastructure.

“The GKE team collaborated closely with us to enable auto scaling for multi host TPUs, which is a recently added feature. Their help has really enabled amazing performance on TPUs, reducing our cost per training job by 2-4 times.” – Kunal Kukreja, Senior Machine Learning Engineer, Moloco

In addition to training models on TPUs, Moloco also uses GPUs on GKE to deploy ML models into production. This lets the Moloco platform handle real-time inference requests effectively and benefit from GKE’s scalability and operational stability, enhancing performance and supporting more complex models.

Moloco collaborated closely with the Google Cloud team throughout the implementation process, leveraging their expertise and guidance. The Google Cloud team supported Moloco in implementing solutions that ensured a smooth transition and minimal disruption to operations. Specifically, Moloco worked with the Google Cloud team to migrate its ML workloads to GKE using the platform’s autoscaling and pod prioritization capabilities to optimize resource utilization and cost efficiency. Additionally, Moloco integrated Cloud TPUs into its training pipeline, resulting in significantly reduced training times for complex ML models. Furthermore, Moloco optimized its serving infrastructure with GPUs, ensuring low-latency ad experiences for its customers.

A powerful foundation for ML training and inference

Moloco’s collaboration with Google Cloud profoundly transformed its capacity for innovation.

“By harnessing Google Cloud’s solutions, such as GKE and Cloud TPU, Moloco dramatically reduced ML training times by up to tenfold.” – Sechan Oh, Director of Machine Learning, Moloco

This in turn enabled rapid model iteration and experimentation, allowing Moloco’s engineers to innovate faster and more efficiently. The scalability and performance of Google Cloud’s infrastructure also let Moloco manage increasingly complex models and larger datasets, and build and deploy cutting-edge machine learning solutions. Notably, Moloco’s low-latency advertising experiences, served on GPUs, improved customer satisfaction and retention.

Moloco’s success demonstrates the power of Google Cloud’s solutions to help businesses achieve their full potential. By leveraging GKE, Cloud TPU, and GPUs, Moloco was able to scale its infrastructure, accelerate its ML training, and deliver exceptional ad experiences to its customers. As Moloco continues to grow and innovate, Google Cloud will remain a critical partner in its success.

Meanwhile, GKE is transforming the AI and ML landscape by offering a blend of scalability, flexibility, cost-efficiency, and performance. And Google Cloud continues to invest in GKE so it can handle even the most demanding AI training workloads. For example, GKE now supports 65,000-node clusters, offering unmatched scale for training or inference. For more, watch this demo of 65,000 nodes on a single GKE cluster.