Transform and enrich data with AI functions (preview)

2025-09-17

Important

This feature is in preview.

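Microsoft Fabric empowers all business professionals, from developers to analysts, to derive more value from their enterprise data through generative AI. Experiences such as Copilot and Fabric data agents are available. With a new set of AI functions for data engineering, Fabric users can harness industry-leading large language models (LLMs) to transform and enrich data.

AI functions use generative AI for summarization, classification, text generation, and more. With a single line of code, you can: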

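- Calculate similarity with ai.similarity: Compare the meaning of input text with a single common text value, or with corresponding text values in another column.
- Categorize text with ai.classify: Classify input text values according to labels you choose.
- Detect sentiment with ai.analyze_sentiment: Identify the emotional state expressed by input text.
- Extract entities with ai.extract: Find and extract specific types of information from input text, such as locations or names.
- Fix grammar with ai.fix_grammar: Correct the spelling, grammar, and punctuation of input text.
- Summarize text with ai.summarize: Get summaries of input text.
- Translate text with ai.translate: Translate input text into another language.
- Answer custom user prompts with ai.generate_response: Generate responses based on your own instructions.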

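Whether you work with pandas or Spark, you can incorporate these functions into data science and data engineering workflows. No detailed configuration, complex infrastructure management, or specific technical expertise is required.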

Prerequisites

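- To use AI functions with the built-in AI endpoint in Fabric, your administrator needs to enable the tenant switch for Copilot and other features that are powered by Azure OpenAI.
- Depending on your location, you might need to enable a tenant setting for cross-geo processing. Learn more about available regions for Azure OpenAI Service.
- You need a paid Fabric capacity (F2 or higher, or any P edition). Bringing your own Azure OpenAI resource isn't supported on the Fabric trial edition.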

Important

The Fabric trial edition doesn't support bringing your own Azure OpenAI resource for AI functions. To connect a custom Azure OpenAI endpoint, upgrade to an F2 (or higher) or P capacity.

Note

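- AI functions are supported in Fabric Runtime 1.3 and later.
- Unless you configure a different model, AI functions default to gpt-4o-mini (2024-07-18). Learn more about billing and consumption rates.
- Most AI functions are optimized for use on English-language text.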

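Getting started with AI functions

pandas AI functions:
- pandas AI functions require the openai package to be installed, regardless of runtime.
- In the Python environment, you must also install the synapseml_internal and synapseml_core whl files by using the installation commands shown below.
- In the PySpark runtime, AI functions are preinstalled.
PySpark AI functions:
- PySpark AI functions require no package installation.

The following code cells include all the necessary installation commands.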

The first cell below is for pandas in a PySpark environment; the second is for pandas in a Python environment.

# The pandas AI functions package requires OpenAI version 1.99.5 or later
%pip install -q --force-reinstall openai==1.99.5 2>/dev/null

# AI functions are preinstalled on the Fabric PySpark runtime

# Download the latest versions of the AI functions library whl files
!wget -q https://aka.ms/fabric-aifunctions-whl -O synapseml_internal-latest-py3-none-any.whl
!wget -q https://aka.ms/fabric-synapseml-core-whl -O synapseml_core-latest-py3-none-any.whl

# The pandas AI functions package requires OpenAI version 1.99.5 or later
# Install it together with the whl files downloaded above
%pip install -q --force-reinstall openai==1.99.5 synapseml_internal-latest-py3-none-any.whl synapseml_core-latest-py3-none-any.whl
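
The install comments above state that the pandas AI functions package requires OpenAI version 1.99.5 or later. As a rough sketch of how that requirement could be checked before calling any AI function, the following plain-Python snippet compares version numbers numerically. The helper names (parse_version, meets_minimum, installed_openai_version) are hypothetical and not part of Fabric or the openai package, and the parser ignores pre-release suffixes.

```python
import re
from importlib import metadata

MINIMUM_OPENAI = "1.99.5"  # minimum version named in the install comments


def parse_version(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '1.99.5' -> (1, 99, 5)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (as in '1.99') are treated as 0
    return tuple(int(group or 0) for group in match.groups())


def meets_minimum(installed: str, minimum: str = MINIMUM_OPENAI) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_openai_version():
    """Return the installed openai version string, or None if it is missing."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None


version = installed_openai_version()
if version is None:
    print("openai is not installed")
else:
    print(f"openai {version} meets the minimum: {meets_minimum(version)}")
```

Comparing tuples of integers rather than raw strings matters here: as text, "1.100.0" sorts before "1.99.5", even though it is the newer release.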

The following code cells import the AI functions library and its dependencies. The pandas cell also imports an optional Python library to display progress bars that track the status of every AI function call.

The first cell below is for pandas; the second is for PySpark.

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd

# Optional import for progress bars. In future versions, this import will be included by default
# Controlled by aifunc.default_conf.use_progress_bar and conf parameter of AI functions
from tqdm.auto import tqdm
tqdm.pandas()

import synapse.ml.spark.aifunc as aifunc

# SparkSession with accessor `spark` in PySpark environments is pre-setup and available for use

Apply AI functions

Each of the following functions allows you to invoke the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can use AI functions to analyze pandas DataFrames or Spark DataFrames.

Tip

Learn how to customize the configuration of AI functions.

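Calculate similarity with ai.similarity

The ai.similarity function compares each input text value either with one common reference text or with the corresponding value in another column (pairwise mode). Output similarity scores are relative and can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. For detailed instructions on using ai.similarity, see the dedicated ai.similarity article.

Sample usage

The first sample below uses pandas; the second uses PySpark.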

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([ 
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike") 
    ], columns=["names", "companies"])
    
df["similarity"] = df["names"].ai.similarity(df["companies"])
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike")
    ], ["names", "companies"])

similarity = df.ai.similarity(input_col="names", other_col="companies", output_col="similarity")
display(similarity)
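
Because similarity scores fall between -1 and 1, it can help to turn them into coarse labels before reviewing results. The following is a minimal sketch in plain Python, not part of the AI functions library: the describe_similarity helper, the 0.5 cutoff, and the sample scores are all hypothetical and chosen only for illustration.

```python
def describe_similarity(score: float, strong: float = 0.5) -> str:
    """Label a score in [-1, 1]; `strong` is a hypothetical cutoff."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the range -1 to 1")
    if score >= strong:
        return "similar"
    if score <= -strong:
        return "opposite"
    # Scores near 0 indicate values that are unrelated in meaning
    return "unrelated"


# Made-up scores for illustration only; they are not real ai.similarity output
sample_scores = {"pair A": 0.82, "pair B": 0.04, "pair C": -0.61}

labels = {pair: describe_similarity(score) for pair, score in sample_scores.items()}
print(labels)
```

The right cutoff depends on the data, and because the scores are relative, any threshold is worth validating against a few rows you have reviewed by hand.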

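Categorize text with ai.classify

The ai.classify function invokes AI to categorize input text according to custom labels you choose. For detailed instructions on using ai.classify, see the dedicated ai.classify article.

Sample usage

The first sample below uses pandas; the second uses PySpark.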

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df['descriptions'].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])
    
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)

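Detect sentiment with ai.analyze_sentiment

The ai.analyze_sentiment function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For detailed instructions on using ai.analyze_sentiment, see the dedicated ai.analyze_sentiment article.

Sample usage

The first sample below uses pandas; the second uses PySpark.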

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("The cleaning spray permanently stained my beautiful kitchen counter. Never again!",),
        ("I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",),
        ("I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",),
        ("The umbrella is OK, I guess.",)
    ], ["reviews"])

sentiment = df.ai.analyze_sentiment(input_col="reviews", output_col="sentiment")
display(sentiment)
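
Since the sentiment output can be one of four labels or blank, a quick tally is a simple way to see how many rows could not be classified. The following is a minimal sketch in plain Python that stands apart from the AI functions library: the tally_sentiments helper and the sample values are hypothetical, and the exact label strings ("positive", "negative", "mixed", "neutral") are an assumption based on the description above.

```python
from collections import Counter

# Assumed label strings; adjust if the actual output differs
EXPECTED_LABELS = {"positive", "negative", "mixed", "neutral"}


def tally_sentiments(values):
    """Count each sentiment label; blank or unexpected values count as 'undetermined'."""
    counts = Counter()
    for value in values:
        # Normalize case and whitespace; treat non-strings (such as None) as blank
        label = value.strip().lower() if isinstance(value, str) else ""
        counts[label if label in EXPECTED_LABELS else "undetermined"] += 1
    return counts


# Hypothetical values standing in for a "sentiment" column
sample = ["negative", "positive", "mixed", "neutral", None, ""]
print(tally_sentiments(sample))
```

With a pandas DataFrame, the same helper could be fed the column's values, for example through a list conversion of the sentiment column.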

ai.extract를 사용하여 엔터티 추출

이 함수는 ai.extract AI를 호출하여 입력 텍스트를 검색하고 선택한 레이블(예: 위치 또는 이름)으로 지정된 특정 유형의 정보를 추출합니다. 사용에 ai.extract대한 자세한 지침은 이 문서를 참조하세요.

샘플 사용량

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)

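Fix grammar with ai.fix_grammar

The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For detailed instructions on using ai.fix_grammar, see the dedicated ai.fix_grammar article.

Sample usage

The first sample below uses pandas; the second uses PySpark.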

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",)
    ], ["text"])

corrections = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(corrections)

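Summarize text with ai.summarize

The ai.summarize function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). For detailed instructions on using ai.summarize, see the dedicated ai.summarize article.

Sample usage

The first sample below uses pandas; the second uses PySpark.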

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summary")
display(summaries)

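Translate text with ai.translate

The ai.translate function invokes AI to translate input text to a new language of your choice. For detailed instructions on using ai.translate, see the dedicated ai.translate article.

Sample usage

The first sample below uses pandas; the second uses PySpark.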

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Hello! How are you doing today?",),
        ("Tell me what you'd like to know, and I'll do my best to help.",),
        ("The only thing we have to fear is fear itself.",),
    ], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)

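Answer custom user prompts with ai.generate_response

The ai.generate_response function invokes AI to generate custom text based on your own instructions. For detailed instructions on using ai.generate_response, see the dedicated ai.generate_response article.

Sample usage

The first sample below uses pandas; the second uses PySpark.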

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        ("Scarves"),
        ("Snow pants"),
        ("Ski goggles")
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Scarves",),
        ("Snow pants",),
        ("Ski goggles",)
    ], ["product"])

responses = df.ai.generate_response(prompt="Write a short, punchy email subject line for a winter sale.", output_col="response")
display(responses)

Related content

- Calculate similarity with ai.similarity.
- Detect sentiment with ai.analyze_sentiment.
- Categorize text with ai.classify.
- Extract entities with ai.extract.
- Fix grammar with ai.fix_grammar.
- Summarize text with ai.summarize.
- Translate text with ai.translate.
- Answer custom user prompts with ai.generate_response.
- Customize the configuration of AI functions.
- Did we miss a feature you need? Suggest it on the Fabric Ideas forum.
