We recently announced a suite of LangChain packages for the Google Cloud database portfolio. Each package will have up to three LangChain integrations:
- Vector stores to enable semantic search for our databases that support vectors
- Document loaders for loading and saving documents to/from your database
- Chat Message Memory to enable chains to recall previous conversations
In this blog, we deep dive into the benefits of the VectorStore from our Cloud SQL for PostgreSQL LangChain package, and see how it helps make generative AI application development easy, secure, and flexible.
Security
The Cloud SQL for PostgreSQL LangChain packages come embedded with the Cloud SQL Python connector, which makes connecting securely to your database easy. Developers get the following benefits out of the box (a short connection sketch follows the list):
- IAM authorization: Uses IAM permissions to control who or what can connect to your Cloud SQL instances
- Convenience: Removes the requirement to manage SSL certificates, configure firewall rules, or enable authorized networks
- IAM database authentication: Provides support for Cloud SQL's automatic IAM database authentication feature
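For example, with IAM database authentication enabled on your instance, you can connect as an IAM principal and skip database passwords entirely. Below is a minimal sketch, assuming `PostgresEngine.from_instance` accepts a `user` keyword argument and that the IAM database user shown has already been created; neither detail comes from this post.

```python
from langchain_google_cloud_sql_pg import PostgresEngine

# Hypothetical sketch: connect as an IAM database user, so no password is
# stored or passed. The `user` keyword and the principal name are assumptions.
engine = PostgresEngine.from_instance(
    "project-id",
    "region",
    "my-instance",
    "my-database",
    user="my-service-account@project-id.iam",
)
```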
Ease of use
Connect with just instance name
Now, you no longer need to construct a connection string with your IP address or pass in a myriad of arguments to connect to your PostgreSQL instance. Instead, the instance name alone will suffice as shown below:
```python
from langchain_google_cloud_sql_pg import PostgresVectorStore, PostgresEngine
from langchain.embeddings import VertexAIEmbeddings

engine = PostgresEngine.from_instance(
    "project-id", "region", "my-instance", "my-database")
embedding_service = VertexAIEmbeddings(model_name="textembedding-gecko@003")
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table",
    embedding_service=embedding_service
)
```
Connection pooling by default
Connection management is an important part of scaling PostgreSQL. Cloud SQL for PostgreSQL’s LangChain packages come automatically configured with an SQLAlchemy connection pool. Our package supports custom configurations as well as reusing the pool in other parts of your application:
```python
from sqlalchemy.ext.asyncio import create_async_engine
from langchain_google_cloud_sql_pg import PostgresEngine

# Create your own SQLAlchemy connection pool and hand it to the package
pool = create_async_engine(
    "postgresql+asyncpg://langchain:langchain@…",
)
engine = PostgresEngine.from_engine(pool)
```
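Because PostgresEngine simply wraps the pool you hand it, the same pool can serve other parts of your application. Here is a minimal sketch that reuses the `pool` created above to run a plain SQL statement through standard SQLAlchemy:

```python
from sqlalchemy import text

async def check_connection() -> None:
    # Reuse the same connection pool outside of the LangChain integration
    async with pool.connect() as conn:
        result = await conn.execute(text("SELECT version()"))
        print(result.scalar())
```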
Schema flexibility
While the existing langchain-postgres package offers a VectorStore, it only supports a limited, fixed schema: it uses two tables with fixed names and schemas for every vector store initialized in a database. Any schema change requires dropping and recreating the tables, losing the previous data. It also does not currently support indexing and can only be used for KNN search.
Table per collection
In contrast, Cloud SQL for PostgreSQL’s LangChain packages use a different table for each collection of vectors, meaning that schemas can vary as shown below.
```python
# first store
engine.init_vectorstore_table(
    table_name="my-table1",
    vector_size=768,  # VertexAI model: textembedding-gecko@003
)
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table1",
    embedding_service=embedding_service
)

# second store
engine.init_vectorstore_table(
    table_name="my-table2",
    vector_size=768,  # VertexAI model: textembedding-gecko@003
)
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table2",
    embedding_service=embedding_service
)
```
Indexing support
The Cloud SQL for PostgreSQL LangChain packages support ANN (approximate nearest neighbor) indexes to speed up vector search. Below are simple code snippets to create, refresh, and drop indexes using the package.
Create index
```python
from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex

index = IVFFlatIndex()
await vectorstore.aapply_vector_index(index)
```
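The index classes generally expose the same tuning knobs as the underlying pgvector index types. As a rough sketch, you might configure the index before applying it; note that the `lists` parameter below is an assumption based on pgvector's ivfflat option, not something shown in this post.

```python
from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex

# Hypothetical sketch: the `lists` parameter mirrors pgvector's ivfflat
# setting and is an assumption about this package's API.
index = IVFFlatIndex(lists=200)
await vectorstore.aapply_vector_index(index)
```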
Re-index
```python
await vectorstore.areindex()  # Re-index using the default index name
```
Drop indexes
```python
await vectorstore.adrop_vector_index()  # Delete index using the default name
```
Custom schemas
The Cloud SQL for PostgreSQL LangChain packages allow you to use different schemas, which means you can reuse an existing table and more easily migrate from other implementations (such as the langchain-postgres package).
Use preexisting table
When initializing a PostgresVectorStore, you can optionally specify the names of the columns used to store the content, IDs, and other metadata fields of your LangChain documents. This allows you to leverage existing tables, including tables created by other integrations (such as langchain-postgres).
```python
# Initialize PostgresVectorStore
custom_store = await PostgresVectorStore.create(
    engine=engine,
    # Connect to the table the langchain-postgres extension uses
    table_name="langchain_pg_embedding",
    embedding_service=embedding_service,
    # Connect to an existing VectorStore by customizing the table schema:
    id_column="id",
    content_column="document",
    metadata_json_column="cmetadata",
)
```
Extract metadata to column
Specifying metadata columns causes the integration to pull those fields from the document metadata and store each one in its own, properly typed column.
```python
from langchain_google_cloud_sql_pg import Column, PostgresVectorStore

# Set table name
TABLE_NAME = "vectorstore_custom"

await engine.ainit_vectorstore_table(
    table_name=TABLE_NAME,
    vector_size=768,  # VertexAI model: textembedding-gecko@003
    metadata_columns=[Column("length", "INTEGER")],
)

# Initialize PostgresVectorStore
custom_store = await PostgresVectorStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    embedding_service=embedding_service,
    metadata_columns=["length"],
)
```
Filter on metadata
Storing a metadata field in its own column lets you leverage PostgreSQL's strengths as a relational database and filter efficiently.
```python
import uuid

# Add texts to the Vector Store
all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"length": len(t)} for t in all_texts]
ids = [str(uuid.uuid4()) for _ in all_texts]
await custom_store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)

# Embed an example query so we have a vector to search with
query_vector = embedding_service.embed_query("fruit")

# Use filter on search
docs = await custom_store.asimilarity_search_by_vector(
    query_vector,
    filter="length >= 6"
)

print(docs)
```
In summary, it’s very easy to get started with Cloud SQL for PostgreSQL as a vector database, and our native LangChain packages make gen AI development more flexible and powerful.
You can play around with the Cloud SQL for PostgreSQL VectorStore with this VectorStore Notebook, or you can check out the package on GitHub, including filing bugs or other feedback by opening an issue.