Integrate inference SDKs with Foundry Local

2025-10-01

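Important

Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
Features, approaches, and processes can change or have limited capabilities before general availability (GA).

Foundry Local integrates with inference SDKs such as OpenAI, Azure OpenAI, and LangChain. This article shows how to connect your app to local AI models using popular SDKs.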

Prerequisites

Install Foundry Local. For instructions, see Get started with Foundry Local.

Install pip packages

Install the following Python packages:

pip install openai
pip install foundry-local-sdk

Tip

To avoid package conflicts, use a virtual environment. You can create one with venv or conda.

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response with the OpenAI SDK.

Copy the following code into a Python file named app.py.

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)

Run the code with the following command:

python app.py
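
The `messages` list in the example above holds a single user turn. The chat completions format also accepts a `system` instruction and earlier `user` and `assistant` turns, which is how a conversation is carried forward. The following is a minimal sketch of assembling such a list. The helper name `build_messages` is hypothetical, and how well a small local model follows a `system` prompt is an assumption you should verify with the model you load.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def build_messages(
    system_prompt: str,
    history: List[Tuple[str, str]],
    question: str,
) -> List[Message]:
    """Build a chat `messages` list from a system prompt, past turns, and a new question.

    `history` is a list of (user_text, assistant_text) pairs, oldest first.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user turn always goes last.
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "Answer in one sentence.",
    [("What is the golden ratio?", "It is about 1.618.")],
    "How is it related to the Fibonacci sequence?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create(...)` in place of the single-turn list used in app.py.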

Streaming response

To receive a streaming response, modify the code as follows:

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

# The remaining code uses the OpenAI Python SDK to interact with the local model.

# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)

# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True
)

# Print the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

You can run the code with the same command as before:

python app.py
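
The streaming loop above prints each piece of text as it arrives but does not keep it. If the app also needs the complete reply, for example to store it or append it to a conversation, the pieces can be collected while printing. The following is a minimal sketch. `collect_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` only mimics the attribute shape (`choices[0].delta.content`) that the loop above reads, so the sketch runs without a local service.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Print each text delta as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensively skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta is not None:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


reply = collect_stream(
    [fake_chunk("The golden "), fake_chunk(None), fake_chunk("ratio")]
)
```

With a running service, pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` to `collect_stream` instead of the stand-in list.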

Use `requests` with Foundry Local

# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"

payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ]
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
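
The `requests` example above waits for the complete reply. A streaming variant has to read the response line by line and parse each `data:` event, in the same way the Fetch API streaming example later in this article does. The following is a minimal sketch of that parsing step. `iter_sse_content` is a hypothetical helper name, and the sample lines are hand-written stand-ins that assume the service emits OpenAI-style `data: {...}` events ending with `data: [DONE]`.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(data)
        choices = event.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            yield delta


# Stand-in for the lines a streaming HTTP response would produce.
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "The golden "}}]}',
    b'data: {"choices": [{"delta": {"content": "ratio is about 1.618."}}]}',
    b"data: [DONE]",
]
text = "".join(iter_sse_content(sample_lines))
print(text)
```

Against a running service, add `"stream": True` to the payload, call `requests.post(url, headers=headers, data=json.dumps(payload), stream=True)`, and pass `response.iter_lines()` to `iter_sse_content`.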

Install Node.js packages

Install the following Node.js packages:

npm install openai
npm install foundry-local-sdk

The Foundry Local SDK lets you manage the Foundry Local service and models.

Use the OpenAI SDK with Foundry Local

Copy the following code into a JavaScript file named app.js.

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";

// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();

// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function generateText() {
  const response = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [
      {
        role: "user",
        content: "What is the golden ratio?",
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

generateText();

Run the code with the following command:

node app.js

Streaming response

To receive a streaming response, modify the code as follows:

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";

// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();

// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function streamCompletion() {
  const stream = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [{ role: "user", content: "What is the golden ratio?" }],
    stream: true,
  });

  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      process.stdout.write(chunk.choices[0].delta.content);
    }
  }
}

streamCompletion();

Run the code with the following command:

node app.js

Use the Fetch API with Foundry Local

If you prefer to use an HTTP client such as fetch, you can do so as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";

// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();

// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);

async function queryModel() {
  const response = await fetch(
    foundryLocalManager.endpoint + "/chat/completions",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: modelInfo.id,
        messages: [{ role: "user", content: "What is the golden ratio?" }],
      }),
    }
  );

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

queryModel();

Streaming response

To receive a streaming response with the Fetch API, modify the code as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";

// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();

// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);

async function streamWithFetch() {
  const response = await fetch(
    foundryLocalManager.endpoint + "/chat/completions",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "text/event-stream",
      },
      body: JSON.stringify({
        model: modelInfo.id,
        messages: [{ role: "user", content: "What is the golden ratio?" }],
        stream: true,
      }),
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  // Holds a partial line when a network chunk ends in the middle of an event
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode incrementally and keep the unfinished last line for the next read
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop();

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = line.substring(6).trim();
        if (data === "[DONE]") continue;

        try {
          const json = JSON.parse(data);
          const content = json.choices[0]?.delta?.content || "";
          if (content) {
            // Print to console without line breaks, similar to process.stdout.write
            process.stdout.write(content);
          }
        } catch (e) {
          console.error("Error parsing JSON:", e);
        }
      }
    }
  }
}

// Call the function to start streaming
streamWithFetch();

Create a project

Create a new C# project and navigate to its directory:

dotnet new console -n hello-foundry-local
cd hello-foundry-local

Install NuGet packages

Install the following NuGet packages in your project folder:

dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0
dotnet add package OpenAI --version 2.2.0-beta.4

Use the OpenAI SDK with Foundry Local

Copy the following code into the C# file named Program.cs.

using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "qwen2.5-0.5b";

var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias);

var model = await manager.GetModelInfoAsync(aliasOrModelId: alias);
ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey);
OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
{
    Endpoint = manager.Endpoint
});

var chatClient = client.GetChatClient(model?.ModelId);

var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");

Console.Write("[ASSISTANT]: ");
foreach (var completionUpdate in completionUpdates)
{
    if (completionUpdate.ContentUpdate.Count > 0)
    {
        Console.Write(completionUpdate.ContentUpdate[0].Text);
    }
}

Run the code with the following command:

dotnet run

Create a project

Create a new Rust project and navigate to its directory:

cargo new hello-foundry-local
cd hello-foundry-local

Install crates

Install the following Rust crates using Cargo:

cargo add foundry-local anyhow env_logger serde_json
cargo add reqwest --features json
cargo add tokio --features full

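Update the `main.rs` file

The following example shows how to run inference by sending a request to the Foundry Local service. The code initializes the Foundry Local service, loads a model, and uses the reqwest library to generate a response.

Copy the following code into the Rust file named main.rs.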

use foundry_local::FoundryLocalManager;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a FoundryLocalManager instance with default options
    let mut manager = FoundryLocalManager::builder()
        .alias_or_model_id("qwen2.5-0.5b") // Specify the model to use   
        .bootstrap(true) // Start the service if not running
        .build()
        .await?;
    
    // Use the OpenAI compatible API to interact with the model
    let client = reqwest::Client::new();
    let endpoint = manager.endpoint()?;
    let response = client.post(format!("{}/chat/completions", endpoint))
        .header("Content-Type", "application/json")
        .header("Authorization", format!("Bearer {}", manager.api_key()))
        .json(&serde_json::json!({
            "model": manager.get_model_info("qwen2.5-0.5b", true).await?.id,
            "messages": [{"role": "user", "content": "What is the golden ratio?"}],
        }))
        .send()
        .await?;

    let result = response.json::<serde_json::Value>().await?;
    println!("{}", result["choices"][0]["message"]["content"]);
    
    Ok(())
}

Run the code with the following command:

cargo run
