Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings. 

Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.

To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.

1 BigQuery anti-pattern recognition tool.jpg

What is the BigQuery anti-pattern recognition tool?

The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go. 

It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns. 

The tool takes a BigQuery SQL query as an input, such as:

code_block
[StructValue([(u’code’, u’SELECTrn t.dim1,rn t.metric1rnFROMrn `dataset.table` trnWHERErn t.id NOT IN (rn SELECTrn idrn FROMrn `dataset.table2`)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f389190>)])]

And produces the output as:

code_block
[StructValue([(u’code’, u’Subquery in filter without aggregation at line 8.’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6f50>)])]

It examines potential optimizations, including:

  • Selecting only the necessary columns

  • Handling multiple WITH-clause references 

  • Addressing subqueries in filters with aggregations

  • Optimizing ORDER BY queries with LIMIT

  • Enhancing string comparisons

  • Improving JOIN patterns

  • Avoiding subquery aggregation in the WHERE clause

The solution supports reading from various sources, such as:

  • Command line 

  • Local files

  • Cloud Storage files

  • Local folders

  • Cloud Storage folders

  • CSV (with one query per line)

  • INFORMATION_SCHEMA

Additionally, the solution provides flexibility in writing output to different destinations, including:

  • Printing to the terminal

  • Exporting as CSV

Writing to a BigQuery table

Using the BigQuery anti-pattern recognition tool

The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud. 

Prerequisites

  • Linux OS

  • JDK 11 or above is installed

  • Maven

  • Docker

  • gcloud CLI

Quick start: – steps

1. Clone the repo into your local machine.

code_block
[StructValue([(u’code’, u’git clone [email protected]:GoogleCloudPlatform/bigquery-antipattern-recognition.git’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6cd0>)])]

2. Build the tool image inside the `bigquery-antipattern-recognition` folder.

code_block
[StructValue([(u’code’, u’mvn clean package jib:dockerBuild -DskipTests’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df63d0>)])]
2 BigQuery anti-pattern recognition tool.png

3. Run the tool for a simple inline query.

code_block
[StructValue([(u’code’, u’docker run rn -i bigquery-antipattern-recognition rn –query “SELECT * FROM `project.dataset.table1`”‘), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f33ad50>)])]

4. Below is the output result in the command-line interface:

3 BigQuery anti-pattern recognition tool.png

Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.

Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.

4 BigQuery anti-pattern recognition tool.png

Getting started

Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.

Have questions or feedback?

We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at [email protected]. If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.