Deployment Workflow

End-to-end guide for compiling custom-trained ONNX models and deploying them on the DEEPX NPU across Sixfab AI HAT+ for Raspberry Pi 5, Sixfab Edge AI Expansion Board, and ALPON X5 AI. The two-machine workflow (compile once on an Ubuntu host, run forever on the target Sixfab device) is the same on all three products. No cloud service, no GPU at inference time.

ONNX → DXNN · INT8 precision · Two-machine workflow · Offline inference

Updated 2026-05-16

How do I deploy a custom model on Sixfab edge AI hardware?

The DXNN SDK converts a trained ONNX model into a .dxnn binary that the DEEPX NPU can execute. The compiler runs once on an Ubuntu x86_64 machine; the resulting .dxnn file then runs offline on Sixfab AI HAT+, Edge AI Expansion Board, or ALPON X5 AI via the dxrt-runtime package on the target device. Quantisation to INT8 is automatic, with approximately 2 % accuracy loss versus the original FP32 model.

This page is the operational end-to-end guide. It walks through the three-stage conversion path, the constraints that catch most teams on their first deployment, and the partnership track that takes you from a labelled dataset to a deployed custom model in days. For the full JSON parameter reference of config.json, see the companion DX-COM Configuration Reference.

The two-machine workflow

Custom-model deployment on Sixfab edge AI hardware is split between two machines because compilation is expensive and inference is not. Compilation runs once per model, on a desktop. Inference runs every frame, on the target device.

The DEEPX compiler (DX-COM) requires Ubuntu x86_64 with at least 16 GB of RAM. The dxrt-runtime on the target device (Raspberry Pi 5 with AI HAT+ or Edge AI Expansion Board, or ALPON X5 AI) only loads and executes the compiled .dxnn file. No compiler toolchain is needed on the target. Once a model is compiled, it runs indefinitely without an internet connection.

Stage 1 · Compile

DX-COM compiler

Ubuntu x86_64 host

Converts a trained ONNX model into an optimised .dxnn file for the DEEPX NPU. Quantises automatically to INT8. Runs once per model, on a desktop or a Kaggle Notebook.

One-time · CPU only · ≥ 16 GB RAM

Stage 2 · Run

`dxrt-runtime`

Target Sixfab device

Loads and executes the compiled .dxnn file on the DEEPX NPU. Runs on Raspberry Pi 5 with AI HAT+ or Edge AI Expansion Board, and on ALPON X5 AI. No compiler dependencies on the target; the runtime artifact is portable across the three products.

Every frame · NPU only · Offline

The same compiled .dxnn file runs on all three Sixfab edge AI products without modification. The DEEPX DX-M1 family (DX-M1, DX-M1M, DX-M1ML) shares a unified instruction set, so a model compiled for the DEEPX NPU is portable across AI HAT+, Edge AI Expansion Board, and ALPON X5 AI. The only practical constraint is on-chip NPU memory: a model that fits the larger 25-TOPS variants may not fit the 13-TOPS DX-M1ML envelope on AI HAT+.

Compiler machine requirements

DX-COM must run on a separate Ubuntu x86_64 machine, not on the target Sixfab device. Every item below is required unless tagged Optional.

CPU architecture: x86_64. ARM and aarch64 are not supported for compilation.
Operating system: Ubuntu 20.04 / 22.04 / 24.04. Ubuntu 18.04 is not supported.
glibc: ≥ 2.31. Verify with ldd --version.
Python: 3.12. Required by the DX-COM wheel.
RAM: ≥ 16 GB. Minimum for model compilation.
Disk space: ≥ 20 GB free. Toolchain, dependencies, and intermediate build artifacts.
Internet: Required for install. Needed for the initial repository clone and pip install. Inference itself runs offline.
GPU (Optional): CPU compilation only. No CUDA or accelerator required on the host machine.
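The requirements above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the function name check_compile_host and the result labels are illustrative and are not part of the DX-COM toolchain. It does not check the Ubuntu release or internet access.

```python
import os
import platform
import shutil
import sys


def check_compile_host(workdir="."):
    """Map each host requirement to True/False, or None when it cannot be determined."""
    results = {}
    results["x86_64 CPU"] = platform.machine() in ("x86_64", "AMD64")

    # platform.libc_ver() returns e.g. ("glibc", "2.35") on glibc-based Linux.
    lib, version = platform.libc_ver()
    results["glibc >= 2.31"] = None
    if lib == "glibc" and version:
        try:
            major, minor = (int(part) for part in version.split(".")[:2])
            results["glibc >= 2.31"] = (major, minor) >= (2, 31)
        except ValueError:
            pass  # unexpected version string; leave the result as unknown

    results["Python 3.12"] = sys.version_info[:2] == (3, 12)

    # Physical RAM via sysconf. The threshold is decimal gigabytes because the
    # OS reports slightly less than the installed amount.
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        results["RAM >= 16 GB"] = ram_bytes >= 16 * 1000**3
    except (AttributeError, ValueError, OSError):
        results["RAM >= 16 GB"] = None

    results["Disk >= 20 GB free"] = shutil.disk_usage(workdir).free >= 20 * 1000**3
    return results


if __name__ == "__main__":
    for name, ok in check_compile_host().items():
        print(f"{name}: {'unknown' if ok is None else 'ok' if ok else 'NOT MET'}")
```

Run it on the intended compile host (or in a Kaggle cell) before cloning the compiler repository. Any NOT MET line points at the requirement to fix first.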

Kaggle Notebooks run on Linux x86_64 with sufficient RAM, a compatible glibc, and internet access out of the box. Every requirement listed above is met by the free Kaggle environment, and Sixfab maintains a reference notebook that walks through the full ONNX → DXNN compile flow without installing anything on a local machine.

Reference notebook: dx-compiler.ipynb ([NEED FROM SIXFAB: Kaggle notebook URL]).

What the notebook does end-to-end:

Clones the DEEPX-AI/dx-compiler repository into the Kaggle working directory.
Runs install.sh with a Docker volume path to stage the toolchain.
Installs the DX-COM Python wheel (pip install dx_com*.whl).
Installs ultralytics and exports a YOLO model to ONNX. The notebook ships with yolo11n as a worked example; swap in your own .onnx file.
Writes a config.json with input shape, calibration settings, preprocessing, and PPU configuration.
Runs dxcom against the ONNX model and config to produce the compiled .dxnn artifact.

Download the resulting .dxnn file from the Kaggle output panel and copy it to the target Sixfab device to run inference. See the third stage of the conversion path ("Deploy the .dxnn artifact and run inference") below for the runtime call.

Compilation is one-time. Once compiled, the .dxnn file loads in milliseconds on the target device.

Conversion path: ONNX → DXNN → deployed artifact

Three stages take a trained model from a development machine onto the NPU. Each stage runs in a different environment and produces a different artifact.

Export the trained model to ONNX

Train the model in any major framework, then export to ONNX on the development machine. ONNX is the only input format DX-COM accepts.

PyTorch · Ultralytics YOLO · TensorFlow · Keras · ONNX Model Zoo

Vision architectures (object detection, classification, segmentation, pose) are the supported scope on current DEEPX silicon. LLMs and audio-only transformer architectures are not supported today; LLMs are on the DEEPX roadmap, and Sixfab will support them once the silicon allows it.

PyTorch export

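import torch

# 1. Load your trained model and switch to eval mode
#    (hypothetical checkpoint path; assumes the whole nn.Module was saved with torch.save)
model = torch.load("my_model.pt", weights_only=False)
model.eval()

# 2. Provide a dummy input matching the production input shape (batch size 1)
dummy_input = torch.randn(1, 3, 640, 640)

# 3. Export to ONNX (opset 11 is widely compatible with DX-COM)
torch.onnx.export(
    model, dummy_input,
    "my_model.onnx",
    opset_version=11,
    input_names=["images"],   # must match the input name declared in config.json
    output_names=["output"]
)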

Ultralytics YOLO export

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="onnx")
# Produces yolo11n.onnx in the working directory.

Batch size must be 1. Ultralytics exports with batch size 1 by default. For custom exports, set the batch dimension to 1; DX-COM does not support dynamic or multi-batch compilation via the CLI.

Verify the operators in the exported ONNX graph against the DEEPX-supported list before compiling. Full operator and architecture reference: Supported Models Catalog. Upstream source: github.com/DEEPX-AI/dx-compiler · Building_Models.md.
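The batch-size and operator checks above can be scripted before compiling. The sketch below assumes the onnx Python package is installed (pip install onnx) for the file-reading part; shape_problems and inspect_onnx are illustrative helper names, not DX-COM functions. The sorted operator list it returns still has to be compared by hand against the Supported Models Catalog.

```python
def shape_problems(shape):
    """Return the reasons a declared input shape is unsuitable (empty list if fine)."""
    if not shape:
        return ["shape is empty or unknown"]
    problems = []
    for index, dim in enumerate(shape):
        # Dynamic dimensions show up as names (e.g. "batch") or as 0 / unset.
        if not isinstance(dim, int) or dim <= 0:
            problems.append(f"dimension {index} is dynamic or unset: {dim!r}")
    if isinstance(shape[0], int) and shape[0] > 1:
        problems.append(f"batch dimension is {shape[0]}, expected 1")
    return problems


def inspect_onnx(path):
    """Return (input_shapes, op_types) for an ONNX file."""
    import onnx  # imported lazily so shape_problems() works without the package

    graph = onnx.load(path).graph
    initializers = {init.name for init in graph.initializer}
    shapes = {}
    for graph_input in graph.input:
        if graph_input.name in initializers:
            continue  # some exporters list weights as graph inputs
        dims = graph_input.type.tensor_type.shape.dim
        shapes[graph_input.name] = [
            d.dim_value if d.HasField("dim_value") else d.dim_param for d in dims
        ]
    op_types = sorted({node.op_type for node in graph.node})
    return shapes, op_types
```

Typical use: call inspect_onnx("my_model.onnx"), pass each returned shape to shape_problems, and re-export the model if any problem is reported.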

Compile ONNX to DXNN with DX-COM

DX-COM reads the ONNX file, calibrates it for INT8 arithmetic, and produces a .dxnn binary that runs natively on the DEEPX DX-M1 family NPU. Quantisation is automatic; expect approximately 2 % accuracy loss versus the original FP32 model. This is the published accuracy envelope for INT8 inference on DEEPX silicon.
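One way to hold a compiled model to this envelope is to score the FP32 and INT8 models on the same labelled evaluation set and compare. The sketch below is an illustration, not a DX-COM feature: it assumes a classification-style task with one predicted label per sample, and it reads "2 %" as two percentage points of absolute accuracy, which is an interpretation rather than something the source states. For detection models, substitute mAP for accuracy.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def within_envelope(fp32_predictions, int8_predictions, labels, budget=0.02):
    """Return (accuracy_drop, ok), where ok means the drop stays within the budget."""
    drop = accuracy(fp32_predictions, labels) - accuracy(int8_predictions, labels)
    # Small tolerance so that an exact 0.02 drop is not rejected by float rounding.
    return drop, drop <= budget + 1e-9
```

If the drop exceeds the budget, the calibration set is the first thing to revisit, before changing the model itself.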

Install DX-COM

# 1. Clone the DEEPX compiler repository
git clone https://github.com/DEEPX-AI/dx-compiler
cd dx-compiler
./install.sh

# 2. Install the Python wheel
cd dx_com
pip install dx_com*.whl

# 3. Verify the installation
dxcom --version    # expected: DX-COM v2.3.0

Prepare a calibration dataset

Quantisation requires calibration data: a small set of representative images the compiler uses to estimate activation ranges. Use images from your deployment domain; random stock images degrade quantised accuracy. Start with 100 images.

calibration_images/
├── frame_0001.jpg
├── frame_0002.jpg
└── ...
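A calibration folder like the one above can be assembled by sampling frames from a larger capture directory. This is a minimal sketch using only the standard library; build_calibration_set and its parameters are illustrative names, and the source directory is whatever holds images from your deployment domain. Keep the destination outside the source tree so that earlier copies are not sampled again.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_calibration_set(source_dir, dest_dir="calibration_images", count=100, seed=0):
    """Copy up to `count` randomly chosen images from source_dir into dest_dir."""
    candidates = sorted(
        path for path in Path(source_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    if not candidates:
        raise FileNotFoundError(f"no images found under {source_dir}")

    # A fixed seed keeps the selection reproducible between compile runs.
    chosen = random.Random(seed).sample(candidates, min(count, len(candidates)))

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, src in enumerate(sorted(chosen), start=1):
        target = dest / f"frame_{index:04d}{src.suffix.lower()}"
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```

Random sampling across the whole capture directory tends to cover more lighting and scene variation than taking the first 100 consecutive frames of one recording.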

Write `config.json` and compile

DX-COM is driven by a JSON configuration file describing input shape, calibration settings, and the preprocessing pipeline. A minimal YOLOv11n configuration:

{
  "inputs":             {"images": [1, 3, 640, 640]},
  "calibration_method": "ema",
  "calibration_num":    100,
  "default_loader": {
    "dataset_path":    "./calibration_images",
    "file_extensions": ["jpeg", "jpg", "png"],
    "preprocessings": [
      {"resize":       {"width": 640, "height": 640}},
      {"convertColor": {"form": "BGR2RGB"}},
      {"div":          {"x": 255}},
      {"transpose":    {"axis": [2, 0, 1]}},
      {"expandDim":    {"axis": 0}}
    ]
  }
}

dxcom -m ./yolo11n.onnx -c config.json -o yolo11n-output

# Successful run ends with:
# [INFO] - Compilation complete.
# [INFO] - Output: yolo11n-output/yolo11n.dxnn
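When several models share the same pipeline, the configuration can be generated rather than hand-edited. The sketch below reproduces the minimal configuration above from an input name and shape; the helper names build_config, write_config, and dxcom_command are illustrative, and only fields and flags that already appear on this page are used. The command is built as an argument list and is not executed here.

```python
import json
from pathlib import Path


def build_config(input_name, input_shape, dataset_path, calibration_num=100):
    """Return a config dict mirroring the minimal YOLO example (NCHW input)."""
    batch, channels, height, width = input_shape
    if batch != 1:
        raise ValueError("DX-COM expects a batch dimension of 1")
    return {
        "inputs": {input_name: [batch, channels, height, width]},
        "calibration_method": "ema",
        "calibration_num": calibration_num,
        "default_loader": {
            "dataset_path": str(dataset_path),
            "file_extensions": ["jpeg", "jpg", "png"],
            "preprocessings": [
                {"resize": {"width": width, "height": height}},
                {"convertColor": {"form": "BGR2RGB"}},
                {"div": {"x": 255}},
                {"transpose": {"axis": [2, 0, 1]}},
                {"expandDim": {"axis": 0}},
            ],
        },
    }


def write_config(config, path="config.json"):
    """Write the config as indented JSON and return the path."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
    return path


def dxcom_command(onnx_path, config_path, output_dir):
    """Build the argument list for the dxcom call shown above."""
    return ["dxcom", "-m", str(onnx_path), "-c", str(config_path), "-o", str(output_dir)]
```

The list returned by dxcom_command can be passed to subprocess.run(..., check=True) on the compile host once DX-COM is installed.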

For every config.json field (calibration algorithms, the full preprocessings operator list, all three PPU types for YOLO models, and DXQ accuracy-recovery schemes), see the dedicated DX-COM Configuration Reference.

After compilation, check the log for [INFO] - Added nodes: entries. Each operation listed there has been absorbed into the NPU graph and must be removed from your host-side preprocessing code. Running an absorbed operation on the host and inside the NPU produces wrong results: the image gets normalised twice. See the Preprocessing operations reference for the full list of absorbable operations.
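The effect of leaving an absorbed operation in host code is easy to see with a single pixel value. The sketch below is illustrative only: it supposes the compile log reported that the div step was absorbed (a hypothetical outcome; read the actual list from your own log), filters the configured steps down to what must stay on the host, and shows what dividing by 255 twice does to a pixel.

```python
def host_side_ops(preprocessings, absorbed):
    """Return the preprocessing steps that must remain in host-side code."""
    return [step for step in preprocessings if next(iter(step)) not in absorbed]


def normalise(pixel):
    """Scale an 8-bit pixel value into the 0..1 range."""
    return pixel / 255


config_ops = [
    {"resize": {"width": 640, "height": 640}},
    {"convertColor": {"form": "BGR2RGB"}},
    {"div": {"x": 255}},
    {"transpose": {"axis": [2, 0, 1]}},
    {"expandDim": {"axis": 0}},
]

# Hypothetical log result: only "div" was absorbed into the NPU graph.
remaining = host_side_ops(config_ops, absorbed={"div"})
print([next(iter(step)) for step in remaining])
# ['resize', 'convertColor', 'transpose', 'expandDim']

once = normalise(200)    # the value the model was calibrated to see, about 0.78
twice = normalise(once)  # host and NPU both divide: about 0.003, nearly black
print(once, twice)
```

A doubly normalised image is almost uniformly dark from the model's point of view, which is why this mistake usually shows up as near-zero detections rather than as an error message.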

What the compiler does to the model

Operator mapping. Each ONNX op is mapped to a DEEPX NPU instruction sequence. Unsupported ops either error out or fall back to CPU execution at runtime, depending on architecture.
INT8 quantisation. Weights and activations are calibrated against your dataset and quantised. No developer configuration required beyond the calibration set.
Memory planning. The compiler lays out tensors to fit NPU on-chip memory. Models that exceed available NPU memory fail at compile time.
Optional PPU fusion. For YOLO architectures, the post-processing graph is fused onto the NPU when PPU is enabled.
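To make the INT8 quantisation step above more concrete, here is a generic sketch of symmetric INT8 quantisation with an exponential-moving-average estimate of the activation range. It illustrates the general technique only. DX-COM's internal calibration and quantisation scheme is not documented on this page, so the function names, the momentum value, and the symmetric [-127, 127] mapping are all assumptions made for illustration.

```python
def ema_range(batches, momentum=0.9):
    """Smooth the per-batch maximum absolute activation across calibration batches."""
    smoothed = None
    for batch in batches:
        peak = max(abs(value) for value in batch)
        if smoothed is None:
            smoothed = peak
        else:
            smoothed = momentum * smoothed + (1 - momentum) * peak
    return smoothed


def quantise(values, max_abs):
    """Map [-max_abs, max_abs] onto integers in [-127, 127]; clip anything outside."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    scale = max_abs / 127
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def dequantise(quantised, scale):
    """Recover approximate real values from INT8 codes."""
    return [q * scale for q in quantised]
```

Values inside the calibrated range come back within half a quantisation step of the original, while values outside it are clipped. This is why calibration images that do not resemble the deployment domain hurt accuracy: the estimated range no longer matches what the model sees in production.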

Compiler upstream reference: github.com/DEEPX-AI/dx-compiler

Deploy the `.dxnn` artifact and run inference

The compiled .dxnn file is the deployment artifact. Copy it to the target Sixfab device, then load it with the DEEPX runtime (dxrt-runtime) and run inference. The Python and C++ APIs are functionally equivalent; C++ runs slightly more efficiently on tight inference loops.

Transfer the artifact to the target device

The transfer step is the only stage where the procedure varies by product. The .dxnn file itself is identical across all three.

Sixfab AI HAT+

Raspberry Pi 5 host

SCP from the compile host. Example:

scp model.dxnn <user>@<pi-address>:~/

Standard SSH-over-LAN deployment.

Sixfab Edge AI Expansion Board

Raspberry Pi 5 host

SCP from the compile host. Example:

scp model.dxnn <user>@<pi-address>:~/

Same Pi 5 deployment path as AI HAT+.

ALPON X5 AI

Fanless industrial system

Deployment over ALPON Cloud or direct SCP, depending on fleet management setup. [NEED FROM SIXFAB: confirm canonical ALPON deployment path for custom .dxnn artifacts.]

Model size limit. The compiled model must fit in NPU on-chip memory. Models that exceed available NPU memory fail to load on the device. Check the model footprint reported by DX-COM at the end of compilation, then pick the target product whose NPU envelope fits: DX-M1M and DX-M1 are the 25-TOPS variants; DX-M1ML is the 13-TOPS variant on AI HAT+ with the smaller on-chip memory envelope.

Run inference with the Python API

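from dx_engine import InferenceEngine
import cv2
import numpy as np

# 1. Load the compiled model
engine = InferenceEngine("models/my_model.dxnn")

# 2. Preprocess input (remove operations the NPU absorbed)
#    The divide-by-255 step is omitted here on the assumption that the
#    compile log listed it as absorbed; keep it if your log does not.
img = cv2.imread("frame.jpg")
if img is None:
    raise FileNotFoundError("frame.jpg could not be read")
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
img = np.transpose(img, (2, 0, 1))          # HWC -> CHW
img = np.expand_dims(img, 0)                # add batch dimension: (1, 3, 640, 640)

# 3. Run inference
output = engine.run(img)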

Run inference with the C++ API

#include <dxrt/inference_engine.h>

dxrt::InferenceEngine engine("models/my_model.dxnn");

for (const auto& frame : frames) {
    auto output = engine.Run(frame.data());
    // process output
}

Verify the NPU is being used

On the target device, run dxrt-cli -s to confirm the NPU is loaded and the model is dispatched correctly. For the full monitoring reference (utilisation, temperature, fleet integration with Prometheus and Grafana), see AI Model Deployment · System Monitoring.
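For unattended devices, the same check can be run from a script. The sketch below simply shells out to dxrt-cli -s and returns whatever the tool prints; npu_status is an illustrative helper name, and no assumption is made about the format of the status text, so interpreting it is left to the caller.

```python
import shutil
import subprocess


def npu_status(timeout=10):
    """Return the text printed by `dxrt-cli -s`, or None if the check cannot run."""
    if shutil.which("dxrt-cli") is None:
        return None  # runtime tools are not installed or not on PATH
    try:
        result = subprocess.run(
            ["dxrt-cli", "-s"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout


if __name__ == "__main__":
    status = npu_status()
    print(status if status is not None else "dxrt-cli not available or NPU check failed")
```

Returning None instead of raising keeps the helper usable in a startup script that should log the problem and carry on.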

NPU-absorbed preprocessing. Operations that the compiler absorbed into the NPU graph (logged at compile time) must be removed from this host-side preprocessing. See the DX-COM Configuration Reference for which operations can be absorbed.

If a pre-compiled artifact has been provided (by Sixfab, a colleague, or a partner), skip the export and compile stages and go directly to the third stage ("Deploy the .dxnn artifact and run inference") on the target Sixfab device.

Sixfab × Ultralytics acceleration path

For teams that want to skip the compile workflow entirely, Sixfab offers a managed partnership track with Ultralytics. Bring a labelled dataset; the partnership delivers a trained, optimised, and deployed custom model on your target Sixfab hardware in days, with no DX-COM toolchain on your side.

Learn about the Sixfab × Ultralytics acceleration path

What the partnership covers, what Sixfab needs from you, expected timeline, and how to engage.

Open partnership page

Compatible libraries

DX-COM accepts ONNX models produced by the following ecosystems. The training framework is independent of compilation; choose what your team already uses.

PyTorch. Export via torch.onnx.export. Opset 11 is widely compatible with DX-COM. [NEED FROM SIXFAB: confirm canonical opset version per framework.]
TensorFlow. Export via the tf2onnx converter. Verify input/output tensor names match what config.json declares.
Keras. Export via tf2onnx or keras2onnx (tf2onnx is the better default, since keras2onnx is no longer actively developed). Check for dynamic shape dimensions before compiling; DX-COM requires fixed input shapes.
Ultralytics YOLO. Built-in ONNX export with model.export(format="onnx"). Recommended starting point for object detection, classification, segmentation, and pose.
ONNX Model Zoo. Direct download. Verify the model's operator set against the Supported Models Catalog before compiling.

Standard data-science and computer-vision libraries (OpenCV, NumPy, Pillow, scikit-image, Picamera2, libcamera) work alongside dxrt-runtime on the target device with no conflicts.

Limitations and constraints

The DXNN SDK has well-defined boundaries. Acknowledging them up front saves debugging time later.

Vision models today

Scope

The DEEPX NPU accelerates convolutional and vision-based networks. LLMs, audio-only models, and text-based transformers are not supported on current silicon. LLMs are on the DEEPX roadmap, and Sixfab will support them once the silicon allows it; no dates have been announced.

INT8 precision only

~2 % loss

All models run at INT8. The compiler quantises automatically. FP16 and FP32 are not supported on the NPU. Expect approximately 2 % accuracy reduction versus the original FP32 model. Plan evaluation against this envelope, not against the FP32 numbers.

Inference only

No training

Sixfab edge AI hardware runs inference exclusively. On-device training is not supported. Train on a GPU-equipped machine (or via the Sixfab × Ultralytics acceleration path), then deploy the resulting .dxnn file using the ONNX → DXNN compiler path described above.

Ubuntu x86_64 to compile

No ARM

DX-COM requires Ubuntu x86_64 with at least 16 GB RAM. ARM and aarch64 are not supported for compilation. The Pi 5 itself cannot compile models; neither can ALPON X5 AI.

Operator support boundary

Verify first

Models with operators outside the DEEPX-supported list either error at compile time or fall back to CPU at runtime, depending on architecture. Verify the exported ONNX graph against the Supported Models Catalog before compiling.

No hot-plug on HAT+ or Expansion Board

Power off first

Hot-plug is not supported on AI HAT+ or Edge AI Expansion Board. Power off the Raspberry Pi 5 before mounting or removing the board. ALPON X5 AI is a sealed industrial computer; the NPU is integrated, not removable.

Host compatibility

Pi 5 / CM5

Supported hosts for AI HAT+ and Edge AI Expansion Board: Raspberry Pi 5 and Raspberry Pi Compute Module 5 via the official Raspberry Pi CM5 IO Board. Not supported: Pi 4, CM4, non-Raspberry Pi SBCs. ALPON X5 AI ships as a complete system built on Pi CM5 + DEEPX inside a fanless enclosure.

Recompile on SDK breaks

Versioning

If a runtime SDK update introduces breaking changes, existing .dxnn files may need recompilation. The runtime reports a version mismatch at load time when this occurs.

Need parameter-level detail?

Every config.json field, all three PPU types with Netron walkthroughs, DXQ accuracy recovery, and the full preprocessing operator list live in the dedicated reference.

Open Configuration Reference