Gone are the days (for most of us) of needing to craft neural networks layer by layer and neuron by neuron on a scarce GPU server. As AI models increase in size, complexity, and infrastructure demands, the AI tools ecosystem is keeping pace by developing higher-level abstractions for ease of use and faster deployments. This comes with more choices, more confusion, and more risk.
Navigating these complexities is crucial for striking the right balance between innovation and maintenance, and between customization and generic solutions. The variety of tools means organizations have the opportunity to build anywhere in the AI stack: lower down to build their own ML systems, or higher up to leverage AI-powered applications. But this would-be flexibility can be hazardous. Selecting a solution too low in the stack may mean allocating excessive resources to maintaining infrastructure and boilerplate code, ultimately hindering your organization’s ability to innovate. On the other hand, opting for a solution too high in the stack could mean implementing a generic solution that proves challenging to tailor to your specific needs and may not adequately address your unique use case.
At Google Cloud we provide a range of platforms, tools, and services across the evolving AI tech stack to not only match your needs today, but also offer the flexibility to move up or down the stack as your needs evolve. Find what works best for your organization and run with it—the only winners in technology are those who are bold enough to try, fail, and try again.
The evolution of LLMs and ML models over the past year
Before diving into the evolving AI tech stack, let’s touch on the recent breakthroughs in machine learning with foundation models.
Foundation models are AI models trained on large, diverse datasets that are able to perform many different tasks. Based on breakthroughs in machine learning technology like the Transformer architecture, foundation models have only become widely available to organizations in recent months, and they are fundamentally different from earlier waves of AI, in which machine learning models were trained on a single task. Moreover, many of these foundation models can be interacted with using natural language prompts, a paradigm shift in human-computer interaction that’s continuing to reverberate throughout organizations, developer communities, and with end users.
It wasn’t even a year ago that ML models were only a small component of the system architecture in AI-enabled applications. These apps required organizations to combine single-task ML models and heuristics to build production-grade AI systems (e.g., recommendation systems, image classifiers, fraud detectors, chatbots). The complexity of building and serving models meant many projects never reached production deployments in applications, or powered only small features within an app.
Going forward, organizations will likely use a variety of models for different specialties. But for the first time, it’s possible to use a single multitask foundation model as the AI engine for a wide variety of use cases. Chaining together natural language prompts enables any employee across the organization to analyze, extract, troubleshoot, explain, summarize, and generate data to create novel capabilities and experiences. Developers can rapidly accelerate how they prototype, experiment, and build new data-backed AI features and services. These previously unheard-of possibilities empower enterprises to accelerate AI adoption and reimagine business processes, market strategies, internal applications, and product innovation.
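To make “chaining” concrete, here is a minimal sketch of the pattern, in which one model response feeds the next prompt. The generate() helper is a hypothetical stand-in for whichever text-generation API you use, not a specific SDK call.

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around your text-generation API of choice."""
    return f"[model output for: {prompt[:48]}...]"  # stub so the sketch runs offline

ticket = "My March invoice was charged twice and support hasn't replied."

# Step 1: summarize the raw input.
summary = generate(f"Summarize this support ticket in one sentence:\n{ticket}")

# Step 2: chain the first response into a follow-up prompt.
reply = generate(
    f"Write a polite reply to a customer whose issue is: {summary}\n"
    "Apologize, confirm a refund is in progress, and keep it under 80 words."
)
print(reply)
```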
The five layers of the AI stack
With these exciting advances in AI, the question many organizations are asking themselves is what they need to build and what they can adopt when it comes to AI capabilities. Prior to 2023, your organization was likely experimenting with AI and deploying custom models; now everyone in your organization is empowered to incorporate AI across their workstreams. We all have a place in the AI renaissance, and the diagram below shows an example of the evolving AI tech stack and the layers to consider when building AI capabilities. Spoiler alert: there is no right or wrong answer, and most organizations will likely have a mix of architectures across use cases.
AI infrastructure
For engineers and AI researchers pushing the boundaries of what’s possible, you need infrastructure as cutting-edge and flexible as your ideas; generic solutions don’t cut it when you’re changing the game. Infrastructure is no longer a commodity: it is a specialized competitive differentiator, particularly for successful AI workloads. Low-level access to OS images, runtimes, accelerators, and command line interfaces on virtual machines and Kubernetes clusters allows users to experiment with different approaches and optimize their code for specific tasks. Compute Engine and Google Kubernetes Engine provide this level of granular control, letting teams optimize and debug their code. Teams also have access to Google Cloud TPUs and GPUs to scale their AI workloads for training and serving models of all sizes, including large foundation models.
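As a small, hedged illustration of working at this layer: once accelerators are attached to a Compute Engine VM or GKE node, a framework such as JAX can confirm exactly what hardware your code will run on (assuming JAX is installed with the matching TPU or GPU support).

```python
import jax

# List the accelerator devices visible to this VM or pod.
# On a Cloud TPU VM this shows TPU devices; on a GPU VM, CUDA devices.
devices = jax.devices()
print(f"backend: {jax.default_backend()}, device count: {len(devices)}")
for device in devices:
    print(device)
```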
ML platform
Data scientists and ML engineers are leveraging AI to accelerate experimentation, simplify code management, and ship code faster. They tend to work with a mix of off-the-shelf OSS models and modeling frameworks, including PyTorch, TensorFlow, Keras, XGBoost, Hugging Face Transformers, and more. If your organization is looking to train, manage, and serve your own custom models, look for a fully managed machine learning platform like Vertex AI, with tools, workflows, and infrastructure designed to help ML practitioners accelerate and scale ML in production. Today Vertex AI offers much more than training and serving: its suite of tools includes experiment tracking, embeddings, pipeline orchestration, a feature store, a vector database, model monitoring, and more.
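For a flavor of the workflow (a sketch, not a definitive recipe), here is how a custom training script might be launched on managed infrastructure with the Vertex AI Python SDK. The project ID, script path, and container image below are placeholders; check the Vertex AI documentation for the current list of prebuilt training containers.

```python
from google.cloud import aiplatform

# Placeholders: substitute your own project, region, and training script.
aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomTrainingJob(
    display_name="demo-training-job",
    script_path="train.py",  # your local training script
    # Illustrative prebuilt image; pick one that matches your framework.
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",
)

# Runs the script on a managed worker; swap machine_type or add
# accelerators to scale up as needed.
job.run(replica_count=1, machine_type="n1-standard-4")
```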
Single-task ML models and multi-task foundation models
The future of AI is likely to be shaped by the development of new and innovative interfaces, including simple ML APIs. These allow developers to add AI capabilities to their applications in a simple, reliable way, with the ability to customize for their specific use case. Developers and AI practitioners in this layer are looking to embed intelligent AI features into their products and applications with Node.js, Ruby, Python, Go, and other popular programming languages.
They tend to look for fully managed API endpoints that take a simple input and return an AI-predicted or generated response. Generative AI App Builder and Vertex AI provide access to single-task models, multi-task foundation models, enterprise search services, conversational chat services, and more. Single-task models found in Vertex AI Model Garden include translation, transcription, optical character recognition, sentiment analysis, and more. Vertex AI Generative AI Studio provides access to Google’s multi-task foundation models, including PaLM 2 for text, Imagen for images, and Chirp for speech and audio. Generative AI App Builder lets users build search and conversational applications with pre-built workflows and the ability to connect to datasets within an enterprise.
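A minimal sketch of this fully managed pattern, assuming the Vertex AI SDK is installed and your project has access to the PaLM 2 text model (model names and parameters may change across SDK versions):

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholders: your project and region.
vertexai.init(project="my-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")  # PaLM 2 for text
response = model.predict(
    "Summarize in one sentence: foundation models are trained on large, "
    "diverse datasets and can perform many different tasks.",
    temperature=0.2,
    max_output_tokens=128,
)
print(response.text)
```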
Emerging roles like the prompt engineer will become increasingly important here, as this layer is built on the user’s ability to create clear, concise, easy-to-understand instructions as part of an input prompt.
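The difference between a vague prompt and an engineered one is often just structure. A rough illustration (the template below is one possible format, not a prescribed one):

```python
# A loosely structured prompt: role, task, input, constraints.
prompt = """You are a support analyst.

Task: classify the sentiment of the customer message below.

Message: "The new dashboard is faster, but exporting reports still fails."

Constraints:
- Choose exactly one label: positive, negative, or mixed.
- Respond with the label only, no explanation.
"""
```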
AI-powered applications
This layer is primarily for business users looking for low-code AI-powered solutions or end-to-end SaaS applications to solve a business task or process. By leveraging AI and machine learning, organizations can streamline their workflows, optimize their resources, and make data-driven decisions. AI-powered applications provide solutions for a wide range of use cases, including customer service, document processing, recommendation systems, customer relationship management, HR management, finance and accounting, project management, and more.
Accelerating ML teams and processes
Foundation models and the thriving OSS AI community are also evolving the ML development workflows of years past. Developers can now accelerate their work by choosing from many off-the-shelf models to test and apply to their use cases.
The image below demonstrates how data scientists and developers can jump directly into experimenting and prototyping AI capabilities with existing ML models, ranging from OSS models to foundation and single-task models. While in the past a team would need tens of thousands of training samples and expertise in lower-level ML frameworks, today’s approaches let teams get started with fewer training examples, minimal ML expertise, and easy-to-use APIs and natural language prompts.
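For example, a task that once required training a classifier on thousands of labeled rows can now be prototyped with a handful of in-prompt examples. A hedged sketch, reusing a hypothetical generate() wrapper around any foundation-model text API:

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around a foundation-model text API."""
    return "billing"  # stub so the sketch runs without credentials

# Few-shot prompt: three labeled examples stand in for a training set.
few_shot_prompt = """Classify each support ticket as billing, bug, or feature-request.

Ticket: "I was charged twice this month." -> billing
Ticket: "The app crashes when I upload a photo." -> bug
Ticket: "Please add dark mode." -> feature-request
Ticket: "My card was declined but the order still went through." ->"""

print(generate(few_shot_prompt))
```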
Don’t reinvent the wheel
Which layers do your projects fall into? Whether you start at the top or closer to the bottom, your organization should be deliberate about which layer is right for which project. It is important to take stock of the resources, skills, datasets, and use cases you have today, rather than building everything from the ground up only to find your organization can’t maintain and support it long-term.
The AI technology stack is transforming enterprise innovation. From data collection to model development and deployment, the stack enables organizations to strategically build, scale, and optimize AI. Building your stack with deliberation and care can optimize total cost of ownership (TCO) while accelerating adoption. But this will only work if you have a technology partner flexible enough to reinforce your strengths while filling the gaps you’ll need to scale and deploy your generative AI projects.