Eighty percent of data leaders believe that the lines between data and AI are blurring. Using large language models (LLMs) with your business data can give you a competitive advantage, but to realize this advantage, how you structure, prepare, govern, model, and scale your data matters.
Tens of thousands of organizations already choose BigQuery and its integrated AI capabilities to power their data clouds. But in a data-driven AI era, organizations need a simple way to manage all of their data workloads. Today, we’re going a step further and unifying key Google Cloud data analytics capabilities under BigQuery, which is now the single, AI-ready data analytics platform. BigQuery incorporates key capabilities from multiple Google Cloud analytics services into a single product experience that offers the simplicity and scale you need to manage structured data in BigQuery tables, unstructured data like images, audio and documents, and streaming workloads, all with the best price-performance.
BigQuery helps you:
- Scale your data and AI foundation with support for all data types and open formats
- Eliminate the need for upfront sizing and simply bring your data, at any scale, with a fully managed, serverless workload management model and universal metastore
- Increase flexibility and agility for data teams to collaborate by bringing multiple languages and engines (SQL, Spark, Python) to a single copy of data
- Support the end-to-end data to AI lifecycle with built-in high availability, data governance, and enterprise security features
- Simplify analytics with a unified product experience designed for all data users and AI-powered assistive and collaboration features
With your data in BigQuery, you can quickly and efficiently bring gen AI to your data and take advantage of LLMs. BigQuery simplifies multimodal generative AI for the enterprise by making Gemini models available through BigQuery ML and BigQuery DataFrames. It helps you unlock value from your unstructured data, with its expanded integration with Vertex AI’s document processing and speech-to-text APIs, and its vector capabilities to enable AI-powered search for your business data. The insights from combining your structured and unstructured data can be used to further fine-tune your LLMs.
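As a rough illustration of calling a Gemini model through BigQuery ML, the sketch below composes an `ML.GENERATE_TEXT` query that builds one prompt per row. It assumes a remote model over a Vertex AI endpoint has already been created in the dataset; the model, table, and column names used with it are hypothetical, and the parameter values are placeholders rather than recommendations.

```python
def build_generate_text_sql(model: str, source_table: str, text_column: str) -> str:
    """Compose a BigQuery ML query that sends one prompt per row to a remote model."""
    return f"""
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (
    SELECT CONCAT('Summarize this customer review: ', {text_column}) AS prompt
    FROM `{source_table}`
  ),
  STRUCT(0.2 AS temperature, 256 AS max_output_tokens)
)
""".strip()


def run_query(sql: str):
    """Run the statement with the google-cloud-bigquery client.

    Assumption: the google-cloud-bigquery package is installed and
    application default credentials are configured.
    """
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    return list(client.query(sql).result())
```

For example, `build_generate_text_sql("my-project.my_dataset.gemini_model", "my-project.my_dataset.reviews", "review_text")` returns the SQL text, which can be pasted into the BigQuery console or passed to `run_query`. The identifiers are interpolated directly into the statement, so they should come from trusted configuration rather than user input.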
Support for all data types and open formats
Customers use BigQuery to manage all data types, structured and unstructured, with fine-grained access controls and integrated governance. BigLake, BigQuery’s unified storage engine, supports open table formats which let you use existing open-source and legacy tools to access structured and unstructured data while benefiting from an integrated data platform. BigLake supports all major open table formats, including Apache Iceberg, Apache Hudi and now Delta Lake natively integrated with BigQuery. It provides a fully managed experience for Iceberg, including DDL, DML and streaming support.
Your data teams need access to a universal definition of data, whether in structured, unstructured or open formats. To support this, we are launching BigQuery metastore, a managed, scalable runtime metadata service that provides universal table definitions and enforces fine-grained access control policies for analytics and AI runtimes. Supported runtimes include Google Cloud, open source engines (through connectors), and third-party partner engines.
Use multiple languages and serverless engines on a single copy of data
Customers increasingly want to run multiple languages and engines on a single copy of their data, but the fragmented nature of today’s analytics and AI systems makes this challenging. You can now bring the programmatic power of Python and PySpark right to your data without having to leave BigQuery.
BigQuery DataFrames brings the power of Python together with the scale and ease of BigQuery with a minimal learning curve. It implements over 400 common APIs from pandas and scikit-learn by transparently and optimally converting methods to BigQuery SQL and BigQuery ML SQL. This removes the limits of client-side processing, allowing data scientists to explore, transform and train on terabytes of data using the processing horsepower of BigQuery.
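A minimal sketch of what pandas-style code on BigQuery DataFrames can look like is shown below. It assumes the `bigframes` package is installed and Google Cloud credentials are configured, and it uses the public penguins sample table purely as an illustration. The helper names `mean_body_mass_by_species` and `main` are hypothetical.

```python
PENGUINS_TABLE = "bigquery-public-data.ml_datasets.penguins"  # public sample table


def mean_body_mass_by_species(penguins):
    """Average body mass per species, written with pandas-style methods.

    Works on any pandas-compatible DataFrame. With BigQuery DataFrames the
    group-by is translated to SQL and executed inside BigQuery.
    """
    return penguins.groupby("species")["body_mass_g"].mean()


def main():
    # Assumption: bigframes is installed and credentials are available.
    import bigframes.pandas as bpd

    penguins = bpd.read_gbq(PENGUINS_TABLE)  # lazy reference, no full download
    result = mean_body_mass_by_species(penguins)
    print(result.to_pandas())  # bring only the small aggregated result to the client
```

Calling `main()` runs the aggregation in BigQuery. Because the helper relies only on `groupby`, column selection, and `mean`, the same function can be exercised locally against a regular pandas DataFrame before pointing it at a large table.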
Apache Spark has become a popular data processing runtime, especially for data engineering tasks. In fact, customers’ use of serverless Apache Spark in Google Cloud increased by over 500% in the past year.1 BigQuery’s newly integrated Spark engine lets you process data using PySpark as you do with SQL. Like the rest of BigQuery, the Spark engine is completely serverless — no need to manage compute infrastructure. You can even create stored procedures using PySpark and call them from your SQL-based pipelines.
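To make the idea of a PySpark stored procedure called from SQL more concrete, here is a hedged sketch that composes the DDL and the `CALL` statement as strings. The procedure, connection, and table names are hypothetical, the `runtime_version` value is an assumption, and the exact options accepted may differ by release, so treat this as an outline rather than the definitive syntax.

```python
def build_pyspark_procedure_sql(procedure: str, connection: str, pyspark_body: str) -> str:
    """Compose DDL for a stored procedure whose body is inline PySpark code."""
    return (
        f"CREATE OR REPLACE PROCEDURE `{procedure}`()\n"
        f"WITH CONNECTION `{connection}`\n"
        # Assumption: runtime version '1.1' is used only as a placeholder.
        "OPTIONS (engine = 'SPARK', runtime_version = '1.1')\n"
        'LANGUAGE PYTHON AS R"""\n'
        f"{pyspark_body.strip()}\n"
        '"""'
    )


def build_call_sql(procedure: str) -> str:
    """Compose the SQL statement that invokes the procedure from a pipeline."""
    return f"CALL `{procedure}`()"


# Hypothetical PySpark body: count rows per day in a hypothetical events table.
PYSPARK_BODY = '''
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-counts").getOrCreate()
events = spark.read.format("bigquery").option("table", "my_dataset.events").load()
events.groupBy("event_date").count().show()
'''
```

A pipeline would run the DDL once, for example `build_pyspark_procedure_sql("my-project.my_dataset.daily_counts", "my-project.us.spark-conn", PYSPARK_BODY)`, and then issue `build_call_sql("my-project.my_dataset.daily_counts")` alongside its other SQL steps.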
Make decisions and feed ML models in near real-time
Data teams are also increasingly being asked to deliver real-time analytics and AI solutions, reducing the time between signal, insight, and action. BigQuery now helps make real-time streaming data processing easy with new support for continuous queries: unbounded SQL queries that process data the moment it arrives. BigQuery continuous queries amplify downstream SaaS applications, like Salesforce, with the real-time enterprise knowledge of your data and AI platform. In addition, to support open source streaming workloads, we are announcing a preview of Apache Kafka for BigQuery. Customers can use Apache Kafka to manage streaming data workloads and feed ML models without the need to worry about version upgrades, rebalancing, monitoring and other operational headaches.
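One way a continuous query might feed a downstream system is by exporting each arriving row as a JSON message. The sketch below composes such a statement, assuming a Pub/Sub topic as the sink (the text above does not name one) and assuming the job is submitted with continuous mode enabled. The table, topic, and column names are hypothetical, and the syntax may change for a feature that is still new.

```python
def build_continuous_export_sql(source_table: str, pubsub_topic: str, columns: list[str]) -> str:
    """Compose a query that publishes each arriving row to Pub/Sub as JSON.

    pubsub_topic is the topic resource path, e.g. 'projects/<project>/topics/<topic>'.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        "EXPORT DATA\n"
        "  OPTIONS (\n"
        "    format = 'CLOUD_PUBSUB',\n"
        f"    uri = 'https://pubsub.googleapis.com/{pubsub_topic}'\n"
        "  )\n"
        "AS (\n"
        # Each row is serialized into a single JSON string column named message.
        f"  SELECT TO_JSON_STRING(STRUCT({column_list})) AS message\n"
        f"  FROM `{source_table}`\n"
        ")"
    )
```

For instance, `build_continuous_export_sql("my-project.sales.orders", "projects/my-project/topics/order-events", ["order_id", "status"])` yields a statement that, once running continuously, would emit one message per new order row. A subscriber on the topic can then forward those messages to a SaaS application.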
Scale analytics and AI with governance and enterprise features
To make it easier for you to manage, discover, and govern data, last year we brought data governance capabilities like data quality, lineage and profiling from Dataplex directly into BigQuery. We will be expanding BigQuery to include Dataplex’s enhanced search capabilities, powered by a unified metadata catalog, to help data users discover data and AI assets, including models and datasets from Vertex AI. Column-level lineage tracking in BigQuery is now available in preview, which will be followed by a preview for lineage for Vertex AI pipelines. Governance rules for fine-grained access control are also in preview, allowing businesses to define governance policies based on metadata.
For customers looking for enhanced redundancy across geographic regions, we are introducing managed disaster recovery for BigQuery. This feature, now in preview, offers automated failover of compute and storage and will offer a new cross-regional service level agreement (SLA) tailored for business-critical workloads. The managed disaster recovery feature provides standby compute capacity in the secondary region included in the price of BigQuery’s Enterprise Plus edition.
A unified experience for all data users
As Google Cloud’s single integrated platform for data analytics, BigQuery unifies how data teams work together with BigQuery Studio. Now generally available, BigQuery Studio gives data teams a collaborative data workspace that all data practitioners can use to accelerate their data-to-AI workflows. BigQuery Studio lets you use SQL, Python, PySpark, and natural language in a single unified analytics workspace, regardless of the data’s scale, format or location. All development assets in BigQuery Studio are enabled with full lifecycle capabilities, including team collaboration and version control. Since BigQuery Studio’s launch at Next ‘23, hundreds of thousands of users have been actively using the new interface.2
Gemini in BigQuery for AI assistive and collaborative experiences
We announced several new innovations for Gemini in BigQuery that help data teams with AI-powered experiences for data preparation, analysis and engineering, as well as intelligent recommendations to enhance user productivity and optimize costs. BigQuery data canvas, an AI-centric experience with natural language input, makes data discovery, exploration, and analysis faster and more intuitive. AI-augmented data preparation in BigQuery helps users cleanse and wrangle their data and build low-code visual data pipelines, or rebuild legacy pipelines. Gemini in BigQuery also helps you write and edit SQL or Python code using simple natural language prompts, referencing relevant schemas and metadata.
How Deutsche Telekom is innovating with the BigQuery platform
“Deutsche Telekom built a horizontally scalable data platform in an innovative way that was designed to meet our current and future business needs. With BigQuery at the center of our enterprise’s One Data Ecosystem, we created a unified approach to maintain a single source of truth while fostering de-centralized usage of data across all of our data teams. With BigQuery and Vertex AI, we built a governed and scalable space for data scientists to experiment and productionize AI models while maintaining data sovereignty and federated access controls. This has allowed us to quickly deploy practical usage of LLMs to turbocharge our data engineering life cycle and unleash new business opportunities.” – Ashutosh Mishra, VP of Data Architecture, Deutsche Telekom
Start building your AI-ready data platform
To learn more and start building your AI-ready data platform, start exploring the next generation of BigQuery today. Read more about the latest innovations for Gemini in BigQuery and an overview of what’s next for data analytics at Google Cloud.
1. Google internal data – YoY growth of data processed using Apache Spark on Google Cloud compared with Feb ‘23.
2. Since the August 2023 announcement of BigQuery Studio, monthly active users have continued to grow.