Enterprises are generating data at an exponential rate, spanning traditional structured transactional data, semi-structured like JSON and unstructured data like images and audio. Beyond the scale of the data, these divergent types present processing challenges for developers, at times requiring separate processing flows for each. With its initial release BigQuery’s support for semi-structured JSON eliminated the need for such complex preprocessing and providing schema flexibility, intuitive querying and the scalability benefits afforded to structured data.
Today, we are excited to announce the release of new SQL functions for BigQuery JSON, extending the power and flexibility of our core JSON support. These functions make it even easier to extract and construct JSON data and perform complex data analysis.
With these new query functions, you can:
-
Convert JSON values into primitive types (INT64, FLOAT64, BOOL and STRING) in an easier and more flexible way with the new JSON LAX functions
-
Easily update and modify an existing JSON value in BigQuery with the new JSON mutator functions.
-
Construct JSON object and JSON array with SQL in BigQuery with the new JSON constructor functions.
Let’s review these new features and some examples of how to use them. First, we will create a table for demonstration.
- code_block
- [StructValue([(u’code’, u’CREATE TABLE dataset_name.users_sample AS (rn SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALLrn SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALLrn SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’rn)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dfef719c410>)])]
- code_block
- [StructValue([(u’code’, u’Query:rnrn– Table contentsrnSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);rnOutput:rn+———————————————————————————–+rn| user |rn+———————————————————————————–+rn| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |rn+———————————————————————————–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dfef7e6ec50>)])]
Great! Let’s say we want to get a list of all users over 30. Looking at the table, you will see that user.age contains a JSON number in the first record, a JSON string in the second, and a JSON null in the third. With the new powerful LAX function, LAX_INT64, all types are automatically inferred and processed correctly.
- code_block
- [StructValue([(u’code’, u’Query:rnrnrnSELECT user.name FROM dataset_name.users_samplernWHERE LAX_INT64(user.age) > 30rnOutput:rn+——-+rn| name |rn+——-+rn| “Bob” |rn+——-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05d72b50>)])]
Unlike the “strict” conversion functions, which require that the JSON type matches the primitive type exactly, the “lax” conversion functions will also handle conversions between mismatched data types.
For example, the strict conversion function below would return an error:
- code_block
- [StructValue([(u’code’, u’Query:rnrnSELECT INT64(JSON ‘”10″‘) AS strict_int64rnOutput:rnError: The provided JSON input is not an integer’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff064c1090>)])]
However, the lax conversion function below would return the desired result:
- code_block
- [StructValue([(u’code’, u’Query:rnrnSELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64rnOutput:rn+———–+rn| lax_int64 |rn+———–+rn| 10 |rn+———–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05a483d0>)])]
Furthermore, you can quickly and easily remove a field in the JSON data by using JSON_REMOVE functions.
- code_block
- [StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_REMOVE(user, “$.address.zipcode”)rnWHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null},”age”:null,”name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05a48d50>)])]
JSON_STRIP_NULLS compresses the data by removing JSON nulls. Although BigQuery null values neither impact performance nor storage cost, it can be helpful for reducing data size during exports.
- code_block
- [StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05d7e390>)])]
Now, what if we want to modify or add a field to the JSON data?
You can now update the data with the new JSON_SET function. And you have the ability to mix and match JSON functions together to achieve desired results.
For example, the query below adds a new field, “region_code”, to the table. The value of the field will be “America” if the value of the “country” field is “USA”, and “Other” if it is not.
- code_block
- [StructValue([(u’code’, u’– Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)rnrnQuery:rnrnUPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrn+——————————————————————————————-+rn| user |rn+——————————————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |rn+——————————————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05d7e6d0>)])]
Last but not least, let’s say you have a table of property/value pairs you want to convert to a JSON object. With the new JSON_OBJECT constructor function, you can effortlessly create the new JSON object.
- code_block
- [StructValue([(u’code’, u’Query:rnrnWITH Fruits AS (rnSELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALLrnSELECT 0, ‘fruit’, ‘apple’ UNION ALLrnSELECT 1, ‘fruit’,’banana’ UNION ALLrnSELECT 1, ‘ripe’, ‘true’rn)rnSELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_datarnFROM FruitsrnGROUP BY idrnOutput:rn+———————————-+rn| json_data |rn+———————————-+rn| {“color”:”Red”,”fruit”:”apple”} |rn| {“fruit”:”banana”,”ripe”:”true”} |rn+———————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3dff05d7ee90>)])]
Complete list of functions
Lax conversion functions:
JSON constructor functions:
JSON mutator functions:
Try it out!
Google BigQuery is constantly adding new features to make it easier and more powerful to analyze your data. We encourage you to check them out and provide your feedback to us as we continue to develop additional features and capabilities to make working JSON easier and faster over time.