Use LangChain with Azure Database for PostgreSQL

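Azure Database for PostgreSQL integrates seamlessly with leading large language model (LLM) orchestration packages such as LangChain. This integration lets developers use advanced AI capabilities in their applications. LangChain can streamline the management and use of LLMs, embedding models, and databases, which makes generative AI applications easier to develop.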

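This article shows how to use the integrated vector database in Azure Database for PostgreSQL to store and manage documents in collections with LangChain. It also shows how to create indexes and run nearest neighbor vector search queries, using distance metrics such as cosine distance, L2 distance (Euclidean distance), and inner product, to find documents that are close to a query vector.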
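pgvector exposes these metrics as distance operators: `<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance. The stdlib-only sketch below computes the same three quantities and ranks a few toy vectors by brute force, to show what "close to the query vector" means under each metric. The vectors and function names are illustrative only; an approximate index such as HNSW or DiskANN avoids scanning every row but aims for the same ranking.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance, the quantity behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a, b):
    """Negated dot product (pgvector's <#>), so that smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


def nearest(query, rows, distance, k=2):
    """Brute-force k-nearest-neighbor search: sort every row by its distance."""
    return sorted(rows, key=lambda name: distance(query, rows[name]))[:k]


# Hypothetical 2-D "embeddings" for three documents
rows = {"cats": [1.0, 0.1], "ducks": [0.8, 0.4], "apples": [0.0, 1.0]}
query = [0.9, 0.2]

for metric in (cosine_distance, l2_distance, negative_inner_product):
    print(metric.__name__, nearest(query, rows, metric))  # → ['cats', 'ducks'] each time
```

With real embeddings the three metrics can disagree; cosine distance ignores vector length, while inner product rewards it, which is why the metric should match how the embedding model was trained.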

Vector support

With Azure Database for PostgreSQL, you can efficiently store and query millions of vector embeddings in PostgreSQL. The service can help you scale AI use cases from proof of concept to production. It offers these benefits:

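Provides a familiar SQL interface for querying vector embeddings and relational data.
Extends pgvector with faster, more precise similarity search across more than 100 million vectors by using the DiskANN indexing algorithm.
Simplifies operations by integrating relational metadata, vector embeddings, and time-series data in a single database.
Uses the strengths of the robust PostgreSQL ecosystem and the Azure cloud platform for enterprise-grade features, including replication and high availability.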

Authentication

Azure Database for PostgreSQL supports both password-based and Microsoft Entra (formerly Azure Active Directory) authentication.

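Microsoft Entra authentication lets you use Microsoft Entra ID to authenticate to your PostgreSQL server. Because Microsoft Entra ID removes the need to manage separate usernames and passwords for your database users, you can rely on the same security mechanisms that you use for other Azure services.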

You can use either authentication method with this article.

Setup

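Azure Database for PostgreSQL uses the open-source LangChain Postgres support to connect to Azure Database for PostgreSQL. First, install the partner package, together with the LangChain OpenAI integration and the Azure Identity library: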

%pip install -qU langchain-azure-postgresql
%pip install -qU langchain-openai
%pip install -qU azure-identity

Enable pgvector on Azure Database for PostgreSQL

See Enable and use pgvector in Azure Database for PostgreSQL.

Set up credentials

Get your Azure Database for PostgreSQL connection details and add them as environment variables.

Set the USE_ENTRA_AUTH flag to True to use Microsoft Entra authentication. With Microsoft Entra authentication, you supply only the host and database name. With password authentication, you must also set the username and password.

import getpass
import os

USE_ENTRA_AUTH = True

# Supply the connection details for the database
os.environ["DBHOST"] = "<server-name>"
os.environ["DBNAME"] = "<database-name>"
os.environ["SSLMODE"] = "require"

if not USE_ENTRA_AUTH:
    # If you're using a username and password, supply them here
    os.environ["DBUSER"] = "<username>"
    os.environ["DBPASSWORD"] = getpass.getpass("Database Password:")
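Because the two authentication modes need different settings, a quick check can catch a missing variable before the first connection attempt. The helper below is a hypothetical sketch (`missing_db_settings` is not part of any library) that inspects a mapping such as `os.environ` for the variable names used in this article.

```python
from collections.abc import Mapping

# Settings that every connection needs, and the extra ones for password auth
COMMON_KEYS = ("DBHOST", "DBNAME")
PASSWORD_KEYS = ("DBUSER", "DBPASSWORD")


def missing_db_settings(env: Mapping, use_entra_auth: bool) -> list:
    """Return the required setting names that are absent or empty in env."""
    required = COMMON_KEYS if use_entra_auth else COMMON_KEYS + PASSWORD_KEYS
    return [key for key in required if not env.get(key)]


print(missing_db_settings({"DBHOST": "myserver"}, use_entra_auth=True))  # → ['DBNAME']
```

For example, `missing_db_settings(os.environ, USE_ENTRA_AUTH)` returns an empty list once the cell above has set every required variable.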

Set up Azure OpenAI embeddings

os.environ["AZURE_OPENAI_ENDPOINT"] = "<azure-openai-endpoint>"
os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass("Azure OpenAI API Key:")

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]

from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    azure_deployment="text-embedding-3-small",
)

Initialization

Use Microsoft Entra authentication

The following sections show how to set up LangChain to use Microsoft Entra authentication. The AzurePGConnectionPool class in the LangChain Azure Postgres package retrieves tokens for the Azure Database for PostgreSQL service by using DefaultAzureCredential from the azure.identity library.

You can pass the connection pool to the AzurePGVectorStore LangChain vector store through its connection parameter.

To sign in to Azure, make sure that the Azure CLI is installed, and then run the following command in your terminal:

az login

After you sign in, the following code creates a connection pool that retrieves tokens for you:

from langchain_azure_postgresql.common import (
    BasicAuth,
    AzurePGConnectionPool,
    ConnectionInfo,
)
from langchain_azure_postgresql.langchain import AzurePGVectorStore

# No credentials are supplied, so the pool authenticates with Microsoft Entra ID tokens
entra_connection_pool = AzurePGConnectionPool(
    azure_conn_info=ConnectionInfo(
        host=os.environ["DBHOST"],
        dbname=os.environ["DBNAME"],
    )
)
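Microsoft Entra authentication for Azure Database for PostgreSQL works by presenting a short-lived access token, requested for the `https://ossrdbms-aad.database.windows.net/.default` scope, in place of a password. The sketch below only illustrates that idea and the token refresh it implies; it is not how `AzurePGConnectionPool` is implemented. The class and function names, the one-minute refresh margin, the explicit Entra user argument, and the injected `fetch` callable are assumptions made so the example runs without Azure.

```python
import time

# Token scope used for Azure Database for PostgreSQL Entra authentication
PG_ENTRA_SCOPE = "https://ossrdbms-aad.database.windows.net/.default"


class CachedToken:
    """Hypothetical helper: cache an access token and refresh it before it expires.

    fetch(scope) must return (token, expires_on), with expires_on in Unix seconds.
    """

    def __init__(self, fetch, refresh_margin=60.0, clock=time.time):
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token = None
        self._expires_on = 0.0

    def get(self):
        # Fetch a new token when none is cached or the cached one is about to expire
        if self._token is None or self._clock() >= self._expires_on - self._margin:
            self._token, self._expires_on = self._fetch(PG_ENTRA_SCOPE)
        return self._token


def entra_conn_kwargs(host, dbname, user, token_cache, sslmode="require"):
    """Build libpq-style connection keywords, using the token as the password."""
    return {
        "host": host,
        "dbname": dbname,
        "user": user,
        "password": token_cache.get(),
        "sslmode": sslmode,
    }


# Demo with a fake token source instead of Azure
demo_cache = CachedToken(lambda scope: ("fake-token", time.time() + 3600))
print(entra_conn_kwargs("<server-name>", "<database-name>", "<entra-user>", demo_cache)["password"])  # → fake-token
```

With the `azure-identity` package installed, `fetch` could call `DefaultAzureCredential().get_token(scope)` and return the resulting token's `token` and `expires_on` attributes.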

Use password authentication

If you aren't using Microsoft Entra authentication, the BasicAuth class lets you connect with a username and password:

basic_auth_connection_pool = AzurePGConnectionPool(
    azure_conn_info=ConnectionInfo(
        host=os.environ["DBHOST"],
        dbname=os.environ["DBNAME"],
        credentials=BasicAuth(
            username=os.environ["DBUSER"],
            password=os.environ["DBPASSWORD"],
        )
    )
)

Create the vector store

from langchain_core.documents import Document
from langchain_azure_postgresql.langchain import AzurePGVectorStore

# Name of the table that stores the documents and their embeddings
table_name = "my_docs"

# The connection is either using Entra ID or Basic Auth
connection = entra_connection_pool if USE_ENTRA_AUTH else basic_auth_connection_pool

vector_store = AzurePGVectorStore(
    embeddings=embeddings,
    table_name=table_name,
    connection=connection,
)

Manage the vector store

Add items to the vector store

Adding documents by ID overwrites any existing documents that match that ID.
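Overwriting on a matching ID is the familiar upsert pattern. As a rough in-memory illustration (a plain dict, not the database table or the library's code), the hypothetical `add_documents` below replaces the stored entry when an ID already exists and generates a fresh UUID when no ID is given. In this sketch the overwrite always happens; with the real store, the article's update step passes `on_conflict_update=True`.

```python
import uuid


def add_documents(store, docs, ids=None):
    """Upsert docs into a dict keyed by ID and return the IDs that were used.

    Hypothetical in-memory stand-in; each doc is a (page_content, metadata) pair.
    """
    if ids is None:
        ids = [None] * len(docs)
    if len(ids) != len(docs):
        raise ValueError("ids and docs must have the same length")
    used = []
    for doc, doc_id in zip(docs, ids):
        doc_id = doc_id or str(uuid.uuid4())  # generate an ID when none is given
        store[doc_id] = doc  # an existing entry with this ID is overwritten
        used.append(doc_id)
    return used


store = {}
first_ids = add_documents(store, [("a cooking class for beginners", {"doc_id": 10})])
add_documents(store, [("Updated - cooking class for beginners", {"doc_id": 10})], ids=first_ids)
print(len(store), store[first_ids[0]][0])  # → 1 Updated - cooking class for beginners
```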

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"doc_id": 1, "___location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"doc_id": 2, "___location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"doc_id": 3, "___location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"doc_id": 4, "___location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"doc_id": 5, "___location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"doc_id": 6, "___location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"doc_id": 7, "___location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"doc_id": 8, "___location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"doc_id": 9, "___location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"doc_id": 10, "___location": "community center", "topic": "classes"},
    ),
]

uuids = vector_store.add_documents(docs)
uuids

Update items in the vector store

updated_docs = [
    Document(
        page_content="Updated - cooking class for beginners is offered at the community center",
        metadata={"doc_id": 10, "___location": "community center", "topic": "classes"},
        id=uuids[-1],
    )
]
vector_store.add_documents(updated_docs, ids=[uuids[-1]], on_conflict_update=True)

View items in the vector store

vector_store.get_by_ids([str(uuids[-1])])

Delete items from the vector store

vector_store.delete(ids=[uuids[-1]])

Query the vector store

After you create the vector store and add the relevant documents, you can query it in your chain or agent.

Filtering support

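The vector store supports a set of filters that you can apply to the metadata fields of documents through the FilterCondition, OrFilter, and AndFilter classes in the langchain_azure_postgresql package: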

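| Operator | Meaning/Category |
|---|---|
| `=` | Equality (==) |
| `!=` | Inequality (!=) |
| `<` | Less than (<) |
| `<=` | Less than or equal (<=) |
| `>` | Greater than (>) |
| `>=` | Greater than or equal (>=) |
| `in` | Special cased (in) |
| `not in` | Special cased (not in) |
| `is null` | Special cased (is null) |
| `is not null` | Special cased (is not null) |
| `between` | Special cased (between) |
| `not between` | Special cased (not between) |
| `like` | Text (like) |
| `ilike` | Text (case-insensitive like) |
| `AND` | Logical (and) |
| `OR` | Logical (or) |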
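To make the operator semantics concrete, the following hypothetical sketch evaluates the same operator names against a plain metadata dict in memory. It is not how AzurePGVectorStore works: the real FilterCondition takes a SQL column expression such as `(metadata->>'doc_id')::int` and the filter runs inside PostgreSQL, whereas this sketch uses bare keys and an invented tuple/dict filter shape.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% = any run, _ = one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


# Each operator gets the metadata value v and the filter operand x
OPS = {
    "=": lambda v, x: v == x,
    "!=": lambda v, x: v != x,
    "<": lambda v, x: v < x,
    "<=": lambda v, x: v <= x,
    ">": lambda v, x: v > x,
    ">=": lambda v, x: v >= x,
    "in": lambda v, x: v in x,
    "not in": lambda v, x: v not in x,
    "between": lambda v, x: x[0] <= v <= x[1],  # inclusive bounds, like SQL BETWEEN
    "not between": lambda v, x: not x[0] <= v <= x[1],
    "like": lambda v, x: re.fullmatch(like_to_regex(x), v, re.DOTALL) is not None,
    "ilike": lambda v, x: re.fullmatch(like_to_regex(x), v, re.DOTALL | re.IGNORECASE) is not None,
}


def matches(metadata, flt):
    """Evaluate a filter against one document's metadata dict.

    flt is either a (key, operator, operand) tuple or {"AND": [...]} / {"OR": [...]}.
    """
    if isinstance(flt, dict):
        if "AND" in flt:
            return all(matches(metadata, f) for f in flt["AND"])
        if "OR" in flt:
            return any(matches(metadata, f) for f in flt["OR"])
        raise ValueError(f"unsupported filter: {flt!r}")
    key, op, operand = flt
    value = metadata.get(key)
    if op == "is null":
        return value is None
    if op == "is not null":
        return value is not None
    if value is None:
        return False  # as in SQL, comparing against NULL never matches
    return OPS[op](value, operand)


doc = {"doc_id": 9, "___location": "library", "topic": "reading"}
print(matches(doc, {"AND": [("doc_id", "in", [1, 5, 2, 9]), ("___location", "ilike", "LIB%")]}))  # → True
```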

Query directly

You can perform a simple similarity search as follows:

from langchain_azure_postgresql import FilterCondition, AndFilter

results = vector_store.similarity_search(
    "kitty",
    k=10,
    filter=FilterCondition(
        column="(metadata->>'doc_id')::int",
        operator="in",
        value=[1, 5, 2, 9],
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

    * there are cats in the pond [{'doc_id': 1, 'topic': 'animals', '___location': 'pond'}]
    * ducks are also found in the pond [{'doc_id': 2, 'topic': 'animals', '___location': 'pond'}]
    * the new art exhibit is fascinating [{'doc_id': 5, 'topic': 'art', '___location': 'museum'}]
    * the library hosts a weekly story time for kids [{'doc_id': 9, 'topic': 'reading', '___location': 'library'}]

To require several conditions at the same time, combine them with a logical AND by passing them to AndFilter:

results = vector_store.similarity_search(
    "ducks",
    k=10,
    filter=AndFilter(
        AND=[
            FilterCondition(
                column="(metadata->>'doc_id')::int",
                operator="in",
                value=[1, 5, 2, 9],
            ),
            FilterCondition(
                column="metadata->>'___location'",
                operator="in",
                value=["pond", "market"],
            ),
        ]
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

    * ducks are also found in the pond [{'topic': 'animals', 'doc_id': 2, '___location': 'pond'}]
    * there are cats in the pond [{'topic': 'animals', 'doc_id': 1, '___location': 'pond'}]
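Only two documents come back because both conditions must hold. The plain-Python check below (no database involved) applies the same two conditions to the sample metadata from earlier; it shows which documents pass the filter, not their similarity order. The OR variant is included only for contrast, as a rough picture of what combining the same conditions with OrFilter would admit.

```python
# doc_id and ___location of the ten sample documents added earlier
sample_metadata = [
    {"doc_id": 1, "___location": "pond"},
    {"doc_id": 2, "___location": "pond"},
    {"doc_id": 3, "___location": "market"},
    {"doc_id": 4, "___location": "market"},
    {"doc_id": 5, "___location": "museum"},
    {"doc_id": 6, "___location": "museum"},
    {"doc_id": 7, "___location": "Main Street"},
    {"doc_id": 8, "___location": "library"},
    {"doc_id": 9, "___location": "library"},
    {"doc_id": 10, "___location": "community center"},
]

allowed_ids = {1, 5, 2, 9}
allowed_locations = {"pond", "market"}

# AND: both conditions must hold
both = [m["doc_id"] for m in sample_metadata
        if m["doc_id"] in allowed_ids and m["___location"] in allowed_locations]
# OR (for contrast): either condition is enough
either = [m["doc_id"] for m in sample_metadata
          if m["doc_id"] in allowed_ids or m["___location"] in allowed_locations]

print(both)    # → [1, 2]
print(either)  # → [1, 2, 3, 4, 5, 9]
```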

To run a similarity search and also get the corresponding scores, run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.528338] there are cats in the pond [{'doc_id': 1, 'topic': 'animals', '___location': 'pond'}]

To use maximal marginal relevance (MMR) search on the vector store:

results = vector_store.max_marginal_relevance_search(
    "query about cats",
    k=10,
    lambda_mult=0.5,
    filter=FilterCondition(
        column="(metadata->>'doc_id')::int",
        operator="in",
        value=[1, 2, 5, 9],
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

    * there are cats in the pond [{'doc_id': 1, 'topic': 'animals', '___location': 'pond'}]
    * ducks are also found in the pond [{'doc_id': 2, 'topic': 'animals', '___location': 'pond'}]
    * the new art exhibit is fascinating [{'doc_id': 5, 'topic': 'art', '___location': 'museum'}]
    * the library hosts a weekly story time for kids [{'doc_id': 9, 'topic': 'reading', '___location': 'library'}]
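Maximal marginal relevance picks results one at a time, trading off relevance to the query against similarity to results already chosen; `lambda_mult` weights that trade-off (1 favors pure relevance, 0 favors maximum diversity). The following is a textbook sketch of greedy MMR selection with cosine similarity over toy vectors. It is not the store's implementation, which typically also fetches a larger candidate pool (LangChain's MMR methods accept a `fetch_k` argument) before reranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query, candidates, k, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into candidates."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        # Penalize similarity to the closest already-selected result
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.3]
candidates = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]  # 0 and 1 are near-duplicates
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # → [1, 0] (pure relevance)
print(mmr(query, candidates, k=2, lambda_mult=0.5))  # → [1, 2] (diversity kicks in)
```

Lowering `lambda_mult` is what lets the less relevant but different candidate 2 displace the near-duplicate candidate 0.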

For the full list of searches that you can run on the vector store, see the API reference.

Transform into a retriever

You can also transform the vector store into a retriever for easier use in your chains:

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(id='9fe8bc1c-9a8e-4f83-b546-9b64527aa79d', metadata={'doc_id': 1, 'topic': 'animals', '___location': 'pond'}, page_content='there are cats in the pond')]
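Conceptually, a retriever remembers which search to run and with which arguments, and exposes a single `invoke(query)` entry point that chains can call. The class below is a hypothetical, dependency-free sketch of that idea (LangChain's real `VectorStoreRetriever` has more options, such as a `similarity_score_threshold` search type); `FakeStore` is an invented stand-in for `vector_store` that returns canned strings instead of `Document` objects.

```python
class SketchRetriever:
    """Hypothetical stand-in for vector_store.as_retriever(search_type, search_kwargs)."""

    def __init__(self, store, search_type="similarity", search_kwargs=None):
        self.store = store
        self.search_type = search_type
        self.search_kwargs = dict(search_kwargs or {})

    def invoke(self, query):
        # Dispatch to the store method that matches the configured search type
        if self.search_type == "similarity":
            return self.store.similarity_search(query, **self.search_kwargs)
        if self.search_type == "mmr":
            return self.store.max_marginal_relevance_search(query, **self.search_kwargs)
        raise ValueError(f"unsupported search_type: {self.search_type!r}")


class FakeStore:
    """Toy store that records which search ran and returns canned results."""

    def __init__(self, docs):
        self.docs = docs
        self.last_call = None

    def similarity_search(self, query, k=4):
        self.last_call = ("similarity", query, k)
        return self.docs[:k]

    def max_marginal_relevance_search(self, query, k=4):
        self.last_call = ("mmr", query, k)
        return self.docs[:k]


store = FakeStore(["there are cats in the pond", "ducks are also found in the pond"])
retriever = SketchRetriever(store, search_type="mmr", search_kwargs={"k": 1})
print(retriever.invoke("kitty"))  # → ['there are cats in the pond']
```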


Last updated on 2025-10-30