Vertex AI is Google Cloud’s fully managed, unified development platform for leveraging models at scale, with a selection of over 150 first-party, open, and third-party foundation models; for customizing models with enterprise-ready tuning, grounding, monitoring, and deployment capabilities; and for building AI agents.
Customers including ADT, IHG Hotels & Resorts, ING Bank, and Verizon are innovating faster with Vertex AI as a one-stop platform for building, deploying, and maintaining AI apps and agents.
Today at Google I/O ‘24, we announced a range of Vertex AI updates, headlined by several new models developed by Google DeepMind and other teams across Google. Some of these are available to Cloud customers today, with additional innovations to come:
Available today:
- Gemini 1.5 Flash, in public preview, offers our groundbreaking 1-million-token context window, but is lighter-weight than 1.5 Pro and designed to serve efficiently at speed and scale for tasks like chat applications.
- PaliGemma, available in Vertex AI Model Garden, is the first vision-language model in the Gemma family of open models, and is well-suited for tasks like image captioning and visual question-answering.
Coming soon:
- Imagen 3 is our highest-quality text-to-image generation model yet, able to generate an incredible level of detail and produce photorealistic, lifelike images.
- Gemma 2 is the next generation in our family of open models, built from the same technologies used to create Gemini and designed for a broad range of AI developer use cases.
- Gemini 1.5 Pro with an expanded 2-million-token context window. Sign up here to join the waitlist.
To help customers optimize model performance, we also announced new capabilities that include context caching, controlled generation, and a batch API. And to empower developers to more flexibly and quickly build AI agents, we’ve made Firebase Genkit and LlamaIndex available on Vertex AI.
Today’s announcements continue to help developers innovate and organizations accelerate their AI deployments in production. Let’s take a closer look.
Gemini 1.5 Flash: Built for high-volume tasks where cost and latency matter
Earlier this year, we announced Gemini 1.5 Pro, which provides our customers with an industry-leading, breakthrough context window of 1 million tokens that enables accurate processing of large documents, codebases, or entire videos with a single prompt. After entering public preview in April, Gemini 1.5 Pro will be generally available within the next month.
Our announcement today of Gemini 1.5 Flash takes these capabilities even further. It features the same 1 million token context window as 1.5 Pro but is purpose-built for high-volume tasks where cost and latency matter — like chat applications, captioning, detailed video and image analysis, extracting content and data from long-form documents, and more.
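For reference, here is a minimal sketch of calling Gemini 1.5 Flash through the Vertex AI Python SDK. The project ID, Cloud Storage path, and model ID string are placeholders, and the exact preview model name may differ from what is shown.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Placeholder project and region.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")  # assumed model ID; preview names may differ

# The long context window lets a single prompt reference large media directly.
response = model.generate_content([
    Part.from_uri("gs://your-bucket/earnings-call.mp4", mime_type="video/mp4"),
    "Summarize the key announcements in this recording in five bullet points.",
])
print(response.text)
```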
For use cases that require an even larger context window — such as analyzing very large code bases or extensive document libraries — customers will be able to try Gemini 1.5 Pro with up to a 2 million token context window. Customers can sign up here to join the waitlist.
PaliGemma: Expanding developer choice on Vertex AI
Built using the same research and technology as Gemini models, our Gemma family of open models released earlier this year offers state-of-the-art performance in lightweight 7B and 2B packages. We’re grateful to see Gemma embraced by the community, with millions of downloads within just a few months.
Announced today, PaliGemma is the Gemma family’s first vision-language open model. PaliGemma is optimized for use cases such as image captioning, visual question-answering, understanding text in images, object detection, and object segmentation. PaliGemma adds to the choice of models developers can access on Vertex AI to pair the right models with the right tasks and budget requirements.
More model innovation to come with Imagen 3 and Gemma 2 models
In addition to the models and tools made available today, we’re pleased to share that Vertex AI customers will soon be able to start innovating with Imagen 3 and Gemma 2 models.
Imagen 3 will be available to Vertex AI customers this summer, offering our most sophisticated image generation capabilities yet. Imagen 3 better understands natural language and the intent behind prompts, incorporates small details from longer prompts, and renders text within an image more accurately.
Gemma 2 will also be available on Vertex AI this summer, including a 27B model that performs comparably to much larger models, giving developers even more powerful options for use cases that require open models.
Accelerate the path for models in production
Vertex AI enables developers and enterprises to tune, optimize, evaluate, deploy, and monitor foundation models. It includes our recently announced prompt management and model evaluation tools, and today we’re adding three new capabilities:
- Context caching, in public preview next month, lets customers actively manage and reuse cached context data. Because processing costs increase with context length, it can be expensive to move long-context applications to production. Vertex AI context caching helps customers significantly reduce costs by reusing cached data (see the caching sketch after this list).
- Controlled generation, coming to public preview later this month, lets customers define Gemini model outputs according to specific formats or schemas. Most models cannot guarantee the format and syntax of their outputs, even with specified instructions. Vertex AI controlled generation lets customers choose the desired output format via pre-built options like YAML and XML, or by defining custom formats. JSON is available as a pre-built option today (see the sketch after this list).
- Finally, batch API, available in public preview today, is a highly efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation. It helps speed up developer workflows and reduces costs by enabling multiple prompts to be sent to models in a single request.
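As a rough illustration of how context caching could look in practice, here is a sketch using the Vertex AI Python SDK. Because the feature enters public preview next month, the module and method names below (a caching module with CachedContent.create and GenerativeModel.from_cached_content) are assumptions and may change.

```python
import datetime

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

# Cache a large document once so repeated prompts don't resend (and re-bill) it.
cached = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",  # assumed model ID
    contents=[Part.from_uri("gs://your-bucket/contract.pdf", mime_type="application/pdf")],
    ttl=datetime.timedelta(hours=1),
)

# Later requests reference the cached context instead of re-uploading it.
model = GenerativeModel.from_cached_content(cached_content=cached)
print(model.generate_content("List the termination clauses in this contract.").text)
```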
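And here is a sketch of controlled generation using the pre-built JSON option, assuming the SDK exposes it through response_mime_type and response_schema fields on GenerationConfig; the schema and model ID are illustrative.

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-pro-001")  # assumed model ID

# Constrain the output to a JSON object with a fixed set of fields.
review_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "summary": {"type": "string"},
    },
    "required": ["sentiment", "summary"],
}

response = model.generate_content(
    "Review: The room was spotless, but check-in took 45 minutes.",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=review_schema,
    ),
)
print(response.text)  # a JSON string conforming to review_schema
```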
With these new capabilities, we are making it easier for organizations to get the best performance from their gen AI models at scale, and to iterate faster from experimentation to production.
Agent Builder: New open-source integrations help fast track agent building
Announced at Next ‘24, Vertex AI Agent Builder enables developers to easily build and deploy enterprise-ready gen AI experiences via a range of tools for different developer needs and levels of expertise — from a no-code console for building AI agents using natural language, to code-first open-source orchestration frameworks like LangChain on Vertex AI. These capabilities help customers balance rapid experimentation and iteration with cost, governance, and performance requirements.
To make Agent Builder even more powerful, we’ve made it easy for developers to access Firebase Genkit and LlamaIndex on Vertex AI.
Genkit, announced by Firebase today at I/O, is an open-source TypeScript/JavaScript framework designed to simplify the development, deployment, and monitoring of production-ready AI agents. Through the Vertex AI plugin, Firebase developers can now take advantage of Google models like Gemini and Imagen 2, as well as text embeddings.
LlamaIndex on Vertex AI simplifies the retrieval augmented generation (RAG) process, from data ingestion and transformation to embedding, indexing, retrieval, and generation. Now Vertex AI customers can leverage Google’s models and AI-optimized infrastructure alongside LlamaIndex’s simple, flexible, open-source data framework to connect custom data sources to generative models.
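The managed LlamaIndex on Vertex AI integration has its own API surface; as a rough sketch of the ingest, embed, index, retrieve, and generate flow described above, the example below uses the open-source LlamaIndex library with its Vertex AI model integrations. The package names, class names, and model IDs are assumptions and may differ from the managed offering.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.vertex import VertexTextEmbedding  # assumed integration package
from llama_index.llms.vertex import Vertex  # assumed integration package

# Route LlamaIndex's LLM and embedding calls to Google models on Vertex AI.
Settings.llm = Vertex(model="gemini-1.5-flash-001", project="your-project-id")
Settings.embed_model = VertexTextEmbedding(
    model_name="text-embedding-004", project="your-project-id"
)

# Ingest and transform documents, embed and index them, then retrieve and generate.
documents = SimpleDirectoryReader("./company_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does our parental leave policy cover?"))
```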
With these new features, and existing support for LangChain on Vertex AI, open source continues to be a key part of our mission to provide developers with cutting-edge tools to create more intelligent and informative AI agents.
Finally, in addition to helping our customers ground outputs in their proprietary databases or designated sources of “enterprise truth,” we’re announcing that Grounding with Google Search is now generally available. We’ve also once again expanded our generated output indemnity coverage so that outputs grounded with Google Search are now covered under our Generative AI indemnified services. By grounding Gemini models with Google Search, we offer customers the combined power of Google’s latest foundation models along with access to fresh, high-quality information to significantly improve the completeness and accuracy of responses.
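As a minimal sketch of what grounding a request in Google Search can look like with the Vertex AI Python SDK (the tool and grounding class names here are assumptions based on current SDK patterns, and the model ID is a placeholder):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-pro-001")  # assumed model ID

# Attach Google Search as a grounding tool so the response draws on fresh web results.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

response = model.generate_content(
    "What Vertex AI updates were announced at Google I/O 2024?",
    tools=[search_tool],
)
print(response.text)
# Grounding metadata lists the web sources and search queries that informed the answer.
print(response.candidates[0].grounding_metadata)
```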
Get started with Vertex AI today
Get started with Gemini 1.5 Flash in Vertex AI today!
To learn more about how Google Cloud customers are succeeding with generative AI, check out our recent ebook “Crossing the generative AI tipping point: From quick wins to sustained growth” or see what other customers are building in “101 real-world gen AI use cases from the world’s leading organizations.”