The ONNX Runtime shipped with Windows ML allows apps to run inference on ONNX models locally.
Creating an inference session
The APIs are the same as when using ONNX Runtime directly. For example, to create an inference session:
// Create inference session using compiled model
using InferenceSession session = new(compiledModelPath, sessionOptions);
We recommend reading the ONNX Runtime documentation for more information about using the ONNX Runtime APIs within Windows ML. Model inference code differs from model to model, as the sketch below illustrates.
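As an illustration only, the following sketch expands the snippet above into a complete session setup and inference call. The model path, the input tensor name ("input"), and the tensor shape are placeholder assumptions for this sketch; substitute the values your own model expects.
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
// Placeholder path for this sketch -- replace with the location of your compiled model
string compiledModelPath = @"C:\models\model.compiled.onnx";
// Create inference session using compiled model
using SessionOptions sessionOptions = new();
using InferenceSession session = new(compiledModelPath, sessionOptions);
// Build an input tensor; the name "input" and the shape are assumptions for this sketch
float[] inputData = new float[1 * 3 * 224 * 224];
var inputTensor = new DenseTensor<float>(inputData, new[] { 1, 3, 224, 224 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", inputTensor)
};
// Run inference and read the first output tensor
using var results = session.Run(inputs);
var output = results.First().AsTensor<float>();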
Compile models
Before an ONNX model can be used in an inference session, it often must be compiled into an optimized representation that can execute efficiently on the device's underlying hardware.
As of ONNX Runtime 1.22, there are new APIs that better encapsulate the compilation steps. More details are available in the ONNX Runtime compile documentation (see OrtCompileApi struct).
// Prepare compilation options
OrtModelCompilationOptions compileOptions = new(sessionOptions);
compileOptions.SetInputModelPath(modelPath);
compileOptions.SetOutputModelPath(compiledModelPath);
// Compile the model
compileOptions.CompileModel();
Note
Compilation can take several minutes to complete. To keep your app's UI responsive, consider running compilation as a background operation in your application.
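As a minimal sketch of that pattern (not part of the Windows ML API), the blocking CompileModel call could be wrapped in Task.Run so it executes off the UI thread; CompileModelAsync is a hypothetical helper name used only for this example.
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime;
// Hypothetical helper for this sketch: runs the blocking CompileModel call
// on a background thread so the UI thread stays responsive
static Task CompileModelAsync(SessionOptions sessionOptions, string modelPath, string compiledModelPath)
{
    return Task.Run(() =>
    {
        OrtModelCompilationOptions compileOptions = new(sessionOptions);
        compileOptions.SetInputModelPath(modelPath);
        compileOptions.SetOutputModelPath(compiledModelPath);
        compileOptions.CompileModel();
    });
}
// Usage: await the helper from an async event handler or startup path
// await CompileModelAsync(sessionOptions, modelPath, compiledModelPath);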
Tip
For optimal performance, compile your models once and reuse the compiled version. Store compiled models in your app's local data folder for subsequent runs, as in the sketch below. Note that updates to the execution providers (EPs) or to the runtime might require recompiling the model.
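To illustrate the compile-once-and-reuse pattern, the following sketch checks whether a compiled model is already cached in the app's local data folder before compiling. The file names, and the use of ApplicationData.Current.LocalFolder for a packaged app, are assumptions made for this example.
using System;
using System.IO;
using Microsoft.ML.OnnxRuntime;
using Windows.Storage;
// Assumed locations for this sketch: the source model ships with the app,
// and the compiled copy is cached in the app's local data folder
string modelPath = Path.Combine(AppContext.BaseDirectory, "model.onnx");
string compiledModelPath = Path.Combine(ApplicationData.Current.LocalFolder.Path, "model.compiled.onnx");
using SessionOptions sessionOptions = new();
// Compile only when a cached compiled model is not already present
if (!File.Exists(compiledModelPath))
{
    OrtModelCompilationOptions compileOptions = new(sessionOptions);
    compileOptions.SetInputModelPath(modelPath);
    compileOptions.SetOutputModelPath(compiledModelPath);
    compileOptions.CompileModel();
}
// Reuse the cached compiled model on this and subsequent runs
using InferenceSession session = new(compiledModelPath, sessionOptions);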