Aviator scales developer collaboration with Google Cloud

Editor’s note: In this post, we’ll hear from Aviator, an engineering productivity platform founded by former Googlers, about how its team is leveraging Google Cloud to build tools that solve developer collaboration challenges across the entire development lifecycle — from reviewing code to building, testing, merging, and deployment. If you have any follow-up questions after reading this post, please fill out this form to start a discussion with Aviator.

Although Google has been investing heavily in engineering productivity for the past two decades, it was not a primary area of focus for most of the industry until a few years ago. With the growth of remote work and the rapidly evolving AI landscape, however, this data-driven discipline has come to the fore as organizations look to help their engineering teams be more effective.

As former Googlers, we have firsthand experience with the challenges (and opportunities) of enhancing engineering productivity. That’s why we set out to build Aviator — an engineering productivity platform that helps teams eliminate necessary-but-mundane tasks from their work days while improving performance across every step of the development lifecycle.

Building a scalable service with Google Cloud

Because our mission is to bring Google-level productivity engineering to every developer, building Aviator from scratch on Google Cloud was an easy choice. We also applied to, and were accepted into, the Google for Startups program, which offers technical training, business support, and generous cloud product credits. This let our team explore several cloud capabilities without worrying too much about cost.

Using key metrics developed by the DORA (DevOps Research and Assessment) team as our guiding principles, we built a platform with Google Cloud that offers:

Faster, flexible code reviews: Optimize code review cycles with automated code review rules, instant reviewer suggestions, and defined response time goals. These capabilities enable developers to ship code faster, improve development team velocity, and reduce the overall time it takes code to get into production.
Accelerated review cycles: Break down code changes with stacked pull requests (PRs) — multiple small code changes that can be individually reviewed in a sequence and then synced — to unblock development bottlenecks and avoid merge conflicts.
Streamlined, customizable merging: Take control of busy repositories with a high-throughput merge queue built to scale to thousands of PRs and reduce outdated pull requests, merge conflicts, incompatible changes, and broken builds. This improves deployment frequency and change failure rate because independent code changes are validated before they are merged back into the main line of development.
Smart, service-specific release notes: Get rid of messy release notes and error-prone verification processes with a single dashboard that helps teams generate release notes automatically and easily manage deployments, rollbacks, and releases across all environments. The releases framework also improves deployment frequency and rollbacks, helping development teams reduce the time it takes to recover from production failures and deliver more reliable products and systems.
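
To make the two DORA metrics named above concrete, here is a minimal sketch of how deployment frequency and change failure rate could be computed from a list of deployment records. The `Deployment` record shape, the function names, and the sample data are all made up for illustration; this is not how Aviator computes these metrics.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    day: date
    caused_failure: bool  # True if it led to a rollback or an incident


def deployment_frequency(deployments: list[Deployment], period_days: int) -> float:
    """Average number of deployments per day over the given period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deployments) / period_days


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


# Illustrative sample data only: four deployments in one week, one failure.
sample = [
    Deployment(date(2024, 1, 1), False),
    Deployment(date(2024, 1, 2), True),
    Deployment(date(2024, 1, 2), False),
    Deployment(date(2024, 1, 4), False),
]
print(deployment_frequency(sample, period_days=7))
print(change_failure_rate(sample))
```

A merge queue that validates each change before it lands would show up here as a lower `change_failure_rate` for the same number of deployments.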

To bring our scalable service to life, we used several different Google Cloud products. For example, Aviator by design relies heavily on background tasks to perform automated actions. We chose Google Kubernetes Engine (GKE) so that we could horizontally scale our Kubernetes pods as usage grew, which helped us scale Aviator to thousands of active users creating millions of code changes. In addition, Google Cloud helped us manage deployments without needing to store credentials with the CD platform. We also leveraged Google Cloud’s modern IAM architecture to provide greater flexibility for authorization management.

Using Google Cloud, Aviator is also able to further streamline collaboration and management for engineers with these additional capabilities:

System health monitoring

Prometheus is an open-source monitoring platform that uses a pull model to collect time series data from configured targets, such as infrastructure and applications. Using Managed Service for Prometheus, we were able to set up end-to-end monitoring and alerting for Aviator without having to worry about scaling and reliability. Because it is part of Cloud Monitoring, it also gives us access to more than 6,500 free metrics alongside our Prometheus metrics, providing a full picture of the performance, availability, and health of our service in a single place.
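
The pull model means each target exposes its current metric values over HTTP and the Prometheus server scrapes them on a schedule. The sketch below shows that idea using only the Python standard library: a counter rendered in the Prometheus text exposition format and served at `/metrics`. The metric name `outbound_api_calls_total`, its labels, and the helper functions are hypothetical; a production service would more likely use an official Prometheus client library than hand-rolled rendering.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter: (service, outcome) -> number of calls.
API_CALLS: dict[tuple[str, str], int] = {}


def record_api_call(service: str, outcome: str) -> None:
    """Increment the counter for one outbound API call."""
    key = (service, outcome)
    API_CALLS[key] = API_CALLS.get(key, 0) + 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP outbound_api_calls_total Outbound API calls by service and outcome.",
        "# TYPE outbound_api_calls_total counter",
    ]
    for (service, outcome), count in sorted(API_CALLS.items()):
        lines.append(
            f'outbound_api_calls_total{{service="{service}",outcome="{outcome}"}} {count}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values whenever a scraper requests them."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrape requests on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

The application only records events and answers scrapes; how often to collect and where to store the samples is decided on the monitoring side.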

Logs management

Aviator actively interacts with third-party services, such as GitHub, PagerDuty, and Slack, primarily using API calls. These API calls often fail due to reliability or network issues with these services. To address this problem, we used Google Cloud’s robust log management capabilities to ensure that we could easily troubleshoot and resolve any issues that are reported. This also made it easy to filter the logs for different services, make structured queries, or even create alerts based on certain conditions.
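
One way to make failed API calls easy to filter and alert on is to emit each log entry as a single JSON object, which Cloud Logging can ingest as a structured payload with `severity` and `message` fields. The following is a minimal sketch using Python's standard `logging` module. The logger name, the `service` and `error_type` fields, and the `call_and_log` helper are illustrative assumptions, not Aviator's actual logging setup.

```python
import json
import logging
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            # Hypothetical extra fields used for filtering and alerting.
            "service": getattr(record, "service", None),
            "error_type": getattr(record, "error_type", None),
        }
        return json.dumps(entry)


def build_logger(stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("third_party_calls")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def call_and_log(func: Callable[[], T], service: str, logger: logging.Logger) -> T:
    """Run a third-party call; log a structured error and re-raise on failure."""
    try:
        return func()
    except Exception as exc:
        logger.error(
            "API call failed: %s",
            exc,
            extra={"service": service, "error_type": type(exc).__name__},
        )
        raise
```

With entries shaped like this, a query or alert can match on a field such as `service` instead of parsing free-form message text.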

Slow query detection

For our core database, we used Cloud SQL, Google Cloud’s fully managed database service for PostgreSQL, which provides high availability and performance out of the box. More recently, we have been exploring query tagging with Sqlcommenter to enable us to detect slow queries on Aviator. This open-source library samples and tags all queries, allowing us to pinpoint the source of each slow query quickly. We also use the Sqlcommenter Python library, which integrates well with our application backend.
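
To illustrate what query tagging looks like, the sketch below appends a Sqlcommenter-style comment (sorted, URL-encoded `key='value'` pairs) to a SQL statement. It is a hand-written illustration of the comment format, not the Sqlcommenter library's API; the function name `add_sql_comment` and the tag keys in the usage example are hypothetical.

```python
from urllib.parse import quote


def add_sql_comment(sql: str, tags: dict[str, str]) -> str:
    """Append tags to a SQL statement as a trailing comment.

    Keys are sorted and both keys and values are URL-encoded, so the same
    tags always produce the same comment text.
    """
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(value, safe='')}'"
        for key, value in sorted(tags.items())
    )
    statement = sql.rstrip()
    suffix = ""
    if statement.endswith(";"):
        # Keep the comment inside the statement, before the terminator.
        statement, suffix = statement[:-1], ";"
    return f"{statement} /*{pairs}*/{suffix}"


tagged = add_sql_comment(
    "SELECT id FROM pull_requests",
    {"route": "/api/queue", "controller": "merge_queue"},
)
print(tagged)
```

Because the comment travels with the query into the database's logs, a slow statement can be traced back to the code path that issued it. Note that with drivers that use `%`-style placeholders, the percent signs produced by URL encoding would need additional escaping; that step is omitted here.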

Rate limit management

Because our team works with so many third-party services, managing rate limits is critical to giving our users an uninterrupted experience while staying within the limits those services allow. In addition, Aviator itself has numerous APIs that have to be rate limited. We leveraged Memorystore for Redis to simplify the process of tracking and enforcing rate limits for both inbound and outbound API calls.
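
A common way to track rate limits in Redis is a fixed-window counter built on the `INCR` and `EXPIRE` commands. The sketch below shows that pattern as one possible approach; the source does not say which algorithm Aviator uses. The `allow_request` function and the key format are hypothetical, and `InMemoryStore` is a stand-in that mimics the two commands so the example runs without a Redis server. A real Redis client exposing `incr` and `expire` methods could be passed in instead.

```python
import time
from typing import Callable


class InMemoryStore:
    """Stand-in that mimics the two Redis commands used by the limiter."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._values: dict[str, int] = {}
        self._deadlines: dict[str, float] = {}

    def _evict_if_expired(self, name: str) -> None:
        deadline = self._deadlines.get(name)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(name, None)
            self._deadlines.pop(name, None)

    def incr(self, name: str) -> int:
        """Increment the counter, creating it at 1 if it does not exist."""
        self._evict_if_expired(name)
        self._values[name] = self._values.get(name, 0) + 1
        return self._values[name]

    def expire(self, name: str, seconds: int) -> bool:
        """Schedule the key for deletion after the given number of seconds."""
        if name not in self._values:
            return False
        self._deadlines[name] = self._clock() + seconds
        return True


def allow_request(store, key: str, limit: int, window_seconds: int) -> bool:
    """Fixed-window limiter: allow at most `limit` calls per key per window."""
    count = store.incr(key)
    if count == 1:
        # First call in this window: start the countdown for the key.
        store.expire(key, window_seconds)
    return count <= limit


store = InMemoryStore()
# Hypothetical key format: "<service>:<account>".
print([allow_request(store, "github:org-1", limit=3, window_seconds=60) for _ in range(5)])
```

The same function can guard both outbound calls (keyed by third-party service) and inbound calls (keyed by API client). One caveat of this simple form is that the increment and the expiry are two separate commands, so a production version would typically make them atomic.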

Self-hosted, single-tenant, and in the cloud

Since Aviator supports engineering teams of all sizes — from teams of 20 engineers to those with more than 2,000 — installation can vary greatly. For us, it was imperative that Aviator be able to meet many different types of requirements and needs.

Today, a developer can choose from cloud, self-hosted, or single-tenant installation options when setting up Aviator. Let’s dive into the details of each:

Cloud installation

This version is the simplest setup for a user and is completely managed by Aviator in Google Cloud using a Kubernetes cluster. We also update it with a standard daily deployment.

Self-hosted

Some users prefer a self-hosted version of Aviator that they can set up on their private cloud. For this setup, we publish Helm charts to a private repository and upload new releases of Aviator software as Docker images to Google Cloud’s Artifact Registry.

For every self-hosted customer, we create a new IAM service account with read-only access to the private repository where we host our Docker images. We then share that service account’s authentication key with the customer. This makes installing a self-hosted version of Aviator both simple and secure.

Single-tenant

Single-tenant installation is very similar to the self-hosted version, except Aviator manages the installation within our own Google Cloud account. This provides users with better security and control over their Aviator setup.

AI explorations

Recent advancements with LLMs have shown even more promising opportunities for scaling engineering productivity. At Aviator, we have already started exploring several AI-powered tools that can assist in various stages of the development lifecycle, including:

Test generation: Leveraging AI to generate test cases automatically saves developers significant time and helps catch potential bugs early in the development cycle.
Code auto-completion: Tools like GitHub Copilot, powered by AI, suggest code snippets in real-time, allowing developers to code faster and more accurately.
Predictive test selection: By predicting which tests are likely to fail based on code changes, AI can reduce the number of tests run in each cycle, significantly speeding up the development process.

Google is a recognized leader in AI and has been at the forefront of AI innovation for over a decade, uniquely positioning Google Cloud to offer some of the industry’s leading AI technologies, including Vertex AI and Gemini. Having a partner like Google Cloud provides Aviator with a powerful AI foundation, allowing us to accelerate development and launch gen AI capabilities blazingly fast.

Conclusion

Engineering productivity is more than just a performance metric — it’s a critical driver of business success. By enhancing developer collaboration and efficiency, companies can significantly reduce time-to-market and adapt more quickly to changing market demands. Google Cloud has proven to be an invaluable partner in this journey. Its combination of reliability, speed, and performance, along with its cutting-edge AI capabilities, makes it uniquely equipped to support fast iterations while abstracting away complexities. At Aviator, we are excited to continue leveraging these tools to push the boundaries of what’s possible in engineering productivity.

Learn more about how Google Cloud can help your startup, and unlock resources at the recently launched Startup Learning Center.