Custom Models (DXNN SDK)

Take any ONNX model, trained in PyTorch, TensorFlow, Keras, or Ultralytics, compile it for the DEEPX NPU on the Sixfab Edge AI Expansion Board, and run it on Raspberry Pi 5 with Python or C++. No cloud service. No GPU at inference time.

ONNX → DXNN · DEEPX DX-M1M / DX-M1ML · INT8 precision · Two-machine workflow

Updated 2026-05-14

How do I deploy a custom model on the Sixfab Edge AI Expansion Board?

The DXNN SDK converts a trained ONNX model into a .dxnn binary that the DEEPX NPU can execute. The compiler (dx-compiler) runs once on an Ubuntu x86_64 machine; the resulting .dxnn file then runs offline on the Edge AI Expansion Board via the dxrt-runtime package on Raspberry Pi 5. Quantization to INT8 is automatic, with approximately 2 % accuracy loss versus the original FP32 model.

This page is the most detailed custom-model reference for the Sixfab Edge AI Expansion Board for Raspberry Pi 5. It documents the conversion path end to end, calls out the constraints that catch most teams on their first deployment, and points to the partnership track that takes you from a labelled dataset to a deployed custom model in days.

The two-machine workflow

Custom-model deployment on the Edge AI Expansion Board is split between two machines because compilation is expensive and inference is not. Compilation runs once per model, on a desktop. Inference runs every frame, on the Pi.

The DEEPX compiler (dx-compiler, sometimes referred to as DX-COM) requires Ubuntu x86_64 with at least 16 GB of RAM. The dxrt-runtime on Raspberry Pi 5 only loads and executes the compiled .dxnn file. No compiler toolchain is needed on the Pi. Once a model is compiled, it runs on the Edge AI Expansion Board indefinitely without an internet connection.

Component	Runs on	What it does
DXNN compiler (`dx-compiler`)	Ubuntu x86_64 host	Converts an ONNX model into an optimized `.dxnn` file for the DEEPX NPU. Quantizes automatically to INT8.
DEEPX runtime (`dxrt-runtime`)	Raspberry Pi 5 with Edge AI Expansion Board	Loads `.dxnn` files and runs inference on the NPU. Provides Python (`dx_engine`) and C++ (`dxrt`) APIs. Installed via the `sixfab-dx` APT package.

Compiler machine requirements

The dx-compiler must run on a separate Ubuntu x86_64 machine, not on the Raspberry Pi itself. ARM / aarch64 is not supported for compilation.

Requirement	Value	Notes
CPU architecture	amd64 (x86_64)	aarch64 / ARM not supported
RAM	≥ 16 GB	Required for model compilation
Disk space	≥ 8 GB free	Toolchain + intermediate artifacts
Operating system	Ubuntu 20.04 / 22.04 / 24.04	x86_64 only · 18.04 not supported
Also validated	Fedora 42–45 · RHEL 9–10 · CentOS Stream 9–10	Validated as of DX-COM v2.3.0
glibc (ldd)	≥ 2.28	Verify with `ldd --version`
GPU	Not required	Compilation runs on CPU only
Compile on Pi?	Not supported	Raspberry Pi 5 runs the runtime only

Compilation is the expensive step. Run it once per model revision on the Ubuntu host (locally, on a CI server, or overnight) and treat the produced .dxnn file as the deployment artifact. Once compiled, the .dxnn file loads in milliseconds on the Pi.
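The host requirements above can be checked before installing the toolchain. The following is a minimal sketch, not part of the DEEPX or Sixfab tooling: the function names are hypothetical, the thresholds are copied from the requirements on this page, and the probe assumes a Linux host.

```python
import os
import platform
import shutil

MIN_RAM_GB = 16       # from the requirements on this page
MIN_DISK_GB = 8       # from the requirements on this page
MIN_GLIBC = (2, 28)   # from the requirements on this page


def parse_version(text):
    """Turn a version string such as '2.35' into a tuple of ints, e.g. (2, 35)."""
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def find_problems(machine, glibc, ram_gb, disk_gb):
    """Return a list of readable problems; an empty list means the host passes."""
    problems = []
    if machine.lower() not in ("x86_64", "amd64"):
        problems.append(f"unsupported CPU architecture: {machine}")
    if parse_version(glibc) < MIN_GLIBC:
        problems.append(f"glibc {glibc or 'unknown'} is older than 2.28")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb:.1f} GB RAM, need at least {MIN_RAM_GB}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"only {disk_gb:.1f} GB free disk, need at least {MIN_DISK_GB}")
    return problems


def probe_host(path="."):
    """Collect the four values from the running Linux host."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "machine": platform.machine(),
        "glibc": platform.libc_ver()[1],
        "ram_gb": ram_bytes / 1024**3,
        "disk_gb": shutil.disk_usage(path).free / 1024**3,
    }


if __name__ == "__main__":
    issues = find_problems(**probe_host())
    print("host looks OK" if not issues else "\n".join(issues))
```

A machine sold with 16 GB of RAM usually reports slightly less than 16 GiB to the operating system, so the RAM threshold may need a small tolerance in practice.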

Conversion path: ONNX → DXNN → deployed artifact

Three stages take a trained model from a development machine onto the NPU. Each stage runs in a different environment and produces a different artifact.

Export the trained model to ONNX

Train the model in any major framework, then export to ONNX on the development machine. ONNX is the only input format the dx-compiler accepts.

PyTorch · Ultralytics YOLO · TensorFlow · Keras · XGBoost · ONNX Model Zoo · Hugging Face

Vision architectures (object detection, classification, segmentation) are the supported scope on current DEEPX silicon. Supported families include YOLO, MobileNet, EfficientNet, ResNet, and more; the published reference list lives at developer.deepx.ai/modelzoo. LLMs and audio-only transformer architectures are not supported today; LLMs are on the DEEPX roadmap and Sixfab will support them as the silicon enables.

PyTorch export

import torch

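# 1. Load your trained model and switch to eval mode.
#    Assumption: the whole module was saved with torch.save(model, ...);
#    if you saved a state_dict, build the model class and load it instead.
model = torch.load("<model_path>/my_model.pt", weights_only=False)
model.eval()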

# 2. Provide a dummy input matching the production input shape
dummy_input = torch.randn(1, 3, 640, 640)

# 3. Export to ONNX (opset 11 is widely compatible with dx-compiler)
torch.onnx.export(
    model, dummy_input,
    "my_model.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["output"]
)

Ultralytics YOLO export

Ultralytics ships a one-command ONNX export. This is the recommended starting point for YOLOv8n, YOLOv8s, and YOLOv8m.

yolo export model=yolov8n.pt format=onnx opset=11

Verify the operators in the exported ONNX graph against the DEEPX-supported list before compiling. The full reference lives at github.com/DEEPX-AI/dx-compiler · Building_Models.md.
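One way to run this check is to list the operator types in the exported graph and compare them against the supported list. The sketch below assumes the `onnx` Python package is installed for the file-reading helper. The `SUPPORTED_OPS` set is a hypothetical placeholder and must be filled in from the DEEPX reference; it is not the real list.

```python
from collections import Counter

# Hypothetical placeholder: replace with the operators from the DEEPX reference.
SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "Add", "Concat", "Resize", "Sigmoid", "Mul"}


def unsupported_ops(op_types, supported=SUPPORTED_OPS):
    """Count how often each operator outside `supported` appears."""
    return Counter(op for op in op_types if op not in supported)


def op_types_from_onnx(path):
    """Read operator names from an ONNX file (requires `pip install onnx`)."""
    import onnx  # imported lazily so unsupported_ops() works without it

    model = onnx.load(path)
    # Only top-level nodes are listed; subgraphs inside If/Loop are not traversed.
    return [node.op_type for node in model.graph.node]


# Usage on the development machine:
#   print(unsupported_ops(op_types_from_onnx("my_model.onnx")))
```

An empty result only means every operator name is on the list. Attribute or shape restrictions on individual operators still need to be checked against the upstream documentation.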

Compile ONNX to DXNN with `dx-compiler`

The DEEPX NPU compiler reads the ONNX file and produces a .dxnn file. Quantization to INT8 happens automatically; once the input size and other parameters are provided in config.json, no manual precision configuration is required. Expect approximately 2 % accuracy reduction compared to the original FP32 model. This is the published accuracy envelope for INT8 inference on DEEPX silicon, so do not represent the result as FP32-equivalent.
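To check a compiled model against this envelope, compare the two models on the same labelled evaluation set. The sketch below uses top-1 classification accuracy and reports the drop in percentage points. Reading the 2 % figure as percentage points is an assumption, the function names are hypothetical, and detection models would normally use mAP instead.

```python
def top1_accuracy(predictions, labels):
    """Fraction of predictions that equal the ground-truth label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def accuracy_drop_points(fp32_preds, int8_preds, labels):
    """Accuracy lost by the INT8 model relative to FP32, in percentage points."""
    return 100.0 * (top1_accuracy(fp32_preds, labels) - top1_accuracy(int8_preds, labels))


# Usage: collect class predictions from the FP32 model (development machine)
# and from the .dxnn model (Pi 5) on the same images, then compare.
#   drop = accuracy_drop_points(fp32_preds, int8_preds, labels)
#   print(f"INT8 drop: {drop:.2f} points")
```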

When compiling a YOLO model, enable PPU (Post-Processing Unit) support in dx-compiler. This offloads non-maximum suppression and confidence filtering to the NPU, and is the largest single FPS improvement available for detection models on the Edge AI Expansion Board. Without PPU support, the user is responsible for implementing NMS in application code. Refer to the dx-compiler documentation for the PPU flag.

Install `dx-compiler` and run a compilation

# 1. Clone the dx-compiler toolchain on the Ubuntu x86_64 host
git clone https://github.com/DEEPX-AI/dx-compiler.git
cd dx-compiler

# 2. Follow the upstream install procedure for the toolchain release
#    matching the dxrt-runtime version installed on your Pi 5.
#    The compiler reads model parameters from a config.json file.

# 3. Compile ONNX → DXNN with the config for your model
dx-compiler --config <config.json> \
            --model my_model.onnx \
            --output my_model.dxnn

# 4. (Optional) inspect the compiled artifact
file my_model.dxnn

What the compiler does to the model

Operator mapping. Each ONNX op is mapped to a DEEPX NPU instruction sequence. Unsupported ops either error out or fall back to CPU execution at runtime, depending on architecture.
INT8 quantization. Weights and activations are calibrated and quantized. No developer configuration is required once config.json declares the input shape.
Memory planning. The compiler lays out tensors to fit NPU on-chip memory. Models that exceed available NPU memory fail at compile time.
Optional PPU fusion. For YOLO architectures, the post-processing graph (NMS, score filtering) is fused onto the NPU when PPU is enabled, eliminating a CPU step per frame.

Layer fallback behaviour. When a model contains a layer the NPU cannot accelerate but ONNX can still execute, that layer runs on the Raspberry Pi 5 CPU at inference time. Hybrid graphs work but FPS drops significantly. Check the supported operator list before compiling unfamiliar architectures.

Compiler installation reference: github.com/DEEPX-AI/dx-compiler

Deploy the `.dxnn` artifact and run inference

The compiled .dxnn file is the deployment artifact. Copy it to the Raspberry Pi 5, then load it with the DEEPX runtime (dxrt-runtime) and run inference. The Python (dx_engine) and C++ (dxrt) APIs are functionally equivalent; C++ runs slightly more efficiently on tight inference loops.

Prerequisite: the sixfab-dx APT package must already be installed on the Pi 5. If not, follow the Quickstart first — it sets up the kernel driver (via DKMS), the runtime, and the CLI tools in a single command.

Copy the artifact to the Pi

# Replace <pi_user> and <pi_host> with your values
scp my_model.dxnn <pi_user>@<pi_host>:~/models/

Model size limit. The compiled model must fit in NPU on-chip memory. Models that exceed available NPU memory fail to load on the device. Check the model footprint reported by dx-compiler at the end of compilation, then pick the variant that fits the workload: DEEPX DX-M1M for the larger envelope (25 TOPS at INT8 precision) or DEEPX DX-M1ML for the smaller envelope (13 TOPS at INT8 precision).

Run inference on the NPU (Python)

Load the .dxnn file with the dx_engine Python API. Preprocessing and postprocessing run on the Pi 5 CPU; the NPU executes the model. The following pattern is the synchronous form — one frame in, one result out.

from dx_engine import InferenceEngine
import numpy as np
import cv2

# 1. Load the compiled DXNN model onto the NPU
engine = InferenceEngine("<model_path>/my_model.dxnn")

# 2. Preprocess on the Pi 5 CPU (BGR → RGB, normalize, NCHW)
frame = cv2.imread("<input_image>.jpg")
rgb   = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
inp   = (rgb.astype(np.float32) / 255.0)
inp   = np.transpose(inp, (2, 0, 1))[np.newaxis]

# 3. Run inference on the NPU
outputs = engine.run(inp)

# 4. Parse outputs in your application code (no standard output format —
#    the shape matches what the ONNX graph produced)
print(outputs[0].shape)

OpenCV loads images as BGR by default. Most vision models are trained on RGB. Always convert explicitly with cv2.COLOR_BGR2RGB before feeding frames to the NPU. A pipeline that reports correct accuracy in evaluation but loses several percentage points in production almost always has this wrong.

Run inference asynchronously

For pipelined throughput, where frame N is submitted while the NPU is still working on frame N−1, use the run_async entry point. Asynchronous calls remove the wait-for-result stall of the synchronous pattern and are the recommended approach when the application has other work to do (capture, preprocessing, network I/O) while the NPU runs. Full reference: github.com/DEEPX-AI/dx_rt.
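The pipelining pattern itself can be sketched without the hardware. In the example below, `StandInEngine` is a hypothetical stand-in written for illustration. It is not the dx_engine API, and the real `run_async` signature and the way results are collected should be taken from the dx_rt reference. The point is the ordering in `pipelined`: frame N is preprocessed and submitted before the result of frame N−1 is collected.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class StandInEngine:
    """Hypothetical stand-in for an NPU engine; NOT the dx_engine API."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._jobs = {}
        self._ids = itertools.count()

    def _infer(self, frame):
        return [frame * 2]  # pretend inference: doubles the input

    def run_async(self, frame):
        """Submit one input and return a job id immediately."""
        job_id = next(self._ids)
        self._jobs[job_id] = self._pool.submit(self._infer, frame)
        return job_id

    def wait(self, job_id):
        """Block until the job finishes and return its outputs."""
        return self._jobs.pop(job_id).result()


def pipelined(engine, frames, preprocess):
    """Yield outputs in input order while keeping one job in flight."""
    pending = None
    for frame in frames:
        # CPU preprocessing of frame N overlaps with inference of frame N-1.
        job = engine.run_async(preprocess(frame))
        if pending is not None:
            yield engine.wait(pending)  # collect frame N-1
        pending = job
    if pending is not None:
        yield engine.wait(pending)      # collect the final frame


# Usage with the stand-in:
#   for outputs in pipelined(StandInEngine(), [1, 2, 3], lambda x: x + 1):
#       print(outputs)
```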

Run inference (C++)

The C++ dxrt API is the lower-overhead path for tight inference loops and for applications that already live in a C++ runtime (robotics control, native video pipelines). Build against the headers installed by the sixfab-dx package and link libdxrt. End-to-end C++ examples live in github.com/sixfab/sixfab-dx-examples under the cpp_examples tree, alongside the Python examples.

Update a deployed model in the field

Push a new compiled .dxnn file over SSH and restart the inference service without physical access to the device. There is no restriction on the delivery method — SCP, OTA, configuration-management push — as long as it is secure.

# 1. Copy the new artifact
scp my_model_v2.dxnn <pi_user>@<pi_host>:~/models/

# 2. Restart the systemd service that loads the model
ssh <pi_user>@<pi_host> "sudo systemctl restart my-inference.service"
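When the update is scripted on the device, it can help to verify the transferred file and swap it into place atomically, so the inference service never loads a partial artifact. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the expected SHA-256 digest is assumed to be delivered alongside the model by whatever channel the deployment uses.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large models do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_model(new_path, live_path, expected_sha256):
    """Verify `new_path`, then atomically replace `live_path` with a copy of it."""
    actual = sha256_of(new_path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
    live_dir = os.path.dirname(os.path.abspath(live_path))
    # Stage in the same directory so os.replace stays on one filesystem,
    # where the rename is atomic on POSIX systems.
    fd, staged = tempfile.mkstemp(dir=live_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(new_path, staged)
        os.replace(staged, live_path)
    finally:
        if os.path.exists(staged):
            os.remove(staged)  # only reached if the copy or rename failed


# Usage on the Pi, before restarting the service:
#   install_model("my_model_v2.dxnn", "my_model.dxnn", "<expected_sha256>")
```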

Run inside Docker

Docker containers are supported on Raspberry Pi 5. The container needs --privileged and the NPU device node mounted from the host kernel driver:

docker run --privileged \
  -v /dev/dxrt0:/dev/dxrt0 \
  -v ~/models:/models \
  my-inference-app:latest

Sixfab × Ultralytics acceleration path

For teams who want to skip building the toolchain themselves and move from a labelled dataset to a deployed custom model in days, Sixfab offers a partnership track with Ultralytics that compresses the steps above into a guided pipeline.

From labelled dataset to deployed custom model in days

Train a YOLO model on the customer dataset in the Ultralytics workflow, export to ONNX, compile to DXNN with the partnership tooling, and deploy to the Edge AI Expansion Board. The path is opinionated where it can be (it removes the choice points that take teams the longest to figure out) and explicit where the customer needs to make a call: resolution, batch size, accuracy and FPS trade-off.

Labelled dataset → Ultralytics training → ONNX export → DXNN compile → Edge AI Expansion Board deployed

This path is the right starting point for object detection, classification, and segmentation on YOLO architectures. For non-YOLO architectures, follow Stages 1–3 above directly.

Note: on-device training is not supported on the Edge AI Expansion Board. Customers train on their own infrastructure and deploy the resulting weights via the ONNX → DXNN compiler path described on this page.

Compatible Python libraries

Standard data-science and computer-vision libraries work alongside dxrt-runtime on Raspberry Pi 5 with no conflicts. OpenCV, Pillow, and NumPy are explicitly validated.

OpenCV NumPy Pillow scikit-image Picamera2 libcamera

Pre-recorded video files are supported as inputs alongside live camera streams, which is the recommended approach for repeatable bench testing during development.

Integration patterns on the Edge AI Expansion Board

The Edge AI Expansion Board carries the NPU, the NVMe slot, and the LTE/5G modem on a single baseboard, so a deployed inference application typically writes to local storage and sends results upstream over cellular while inferring. The patterns below cover the points where Sixfab software ends and customer application code begins.

Inference, NVMe, and LTE/5G in the same pipeline

In bench testing, concurrent NVMe writes and LTE/5G traffic do not degrade AI inference throughput on the Edge AI Expansion Board. The NPU runs on a dedicated PCIe Gen 2/Gen 3 x1 link to the Pi 5, while storage and cellular share the on-board USB 3.2 Gen 1 hub. The practical implication for custom-model deployments: a single application can buffer raw frames to NVMe, run inference, and stream structured results over LTE/5G without staging a separate inference host.

The sixfab-dx package installs the driver, runtime, and CLI tools — it does not save camera footage or inference results anywhere on its own. Set the log and output paths in the inference application to the mount point of the NVMe SSD, and transmit results over the LTE/5G connection through whatever platform the application uses (MQTT, HTTPS, custom protocol). The NPU does not constrain either choice.
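Because result storage is left to the application, a simple starting point is one JSON line per inference result, written to a directory on the NVMe mount. This is a sketch only: the function name, the record fields, and the mount path in the usage comment are hypothetical.

```python
import json
import os
import time


def append_result(log_dir, record):
    """Append one result as a JSON line to a per-day file under log_dir.

    `record` must contain a "ts" field with a Unix timestamp in seconds.
    """
    os.makedirs(log_dir, exist_ok=True)
    day = time.strftime("%Y-%m-%d", time.gmtime(record["ts"]))
    path = os.path.join(log_dir, f"results-{day}.jsonl")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
    return path


# Usage (the mount point is an example, use your own):
#   append_result("/mnt/nvme/inference",
#                 {"ts": time.time(), "label": "person", "score": 0.91})
```

The same serialized line can be handed to whichever uplink client the application already uses, so the local log and the transmitted payload stay identical.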

Multiple concurrent models on one NPU

The DEEPX NPU can run more than one compiled model in parallel. Up to three concurrent models have been validated in bench testing; more is possible but not characterized. The practical pattern is a detection model plus a classification or pose model fed from the detected regions, all on the same NPU.

Multiple camera streams

Bench testing has shown the NPU handling four concurrent 720p streams cleanly. Beyond that, frame rate begins to drop — but the bottleneck is the Raspberry Pi 5 CPU handling capture, decode, and preprocessing, not the NPU itself. Plan accordingly: if more camera streams are needed, optimize the CPU-side pipeline (hardware decode, fewer copies, async I/O) before assuming the NPU is the limit.
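One CPU-side technique that often helps with several streams is to keep only the newest frame per camera, so a slow consumer drops stale frames instead of building up latency. The single-slot buffer below is a generic sketch of that idea, not something taken from the Sixfab examples, and the class name is hypothetical. Each capture thread would own one buffer and the inference loop would read from all of them.

```python
import threading


class LatestFrame:
    """Single-slot buffer: writers overwrite, the reader gets the newest frame."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self.dropped = 0  # frames overwritten before anyone read them

    def put(self, frame):
        """Store a frame, replacing any unread one."""
        with self._cond:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame
            self._cond.notify()

    def get(self, timeout=None):
        """Return the newest frame, or None if none arrives within `timeout` seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._frame is not None, timeout):
                return None
            frame, self._frame = self._frame, None
            return frame


# Usage: one LatestFrame per camera; the capture thread calls put(),
# the inference loop calls get(timeout=...) on each buffer in turn.
```

The `dropped` counter gives a cheap signal of how far the CPU-side pipeline is falling behind the cameras.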

Image preprocessing and postprocessing

The Raspberry Pi 5 CPU is responsible for everything around the inference call: capture, colour conversion, resize, normalisation, drawing bounding boxes and labels on output frames, and any non-maximum suppression that is not handled by a PPU-fused YOLO model. The NPU only runs the compiled graph. Image format expectations vary per model: most compiled models accept NHWC input, whereas the Python example on this page feeds NCHW to match its ONNX export. Confirm the layout for each model, because the exact tensor shape must match the config.json used at compile time.
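For models compiled without PPU support, non-maximum suppression has to be written in application code. The following is a minimal pure-Python sketch of greedy, class-agnostic NMS. The box format (x1, y1, x2, y2) and the default thresholds are assumptions chosen for illustration, and decoding raw model outputs into boxes and scores is model-specific and not shown.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.25, iou_threshold=0.45):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Usage:
#   kept = nms(boxes, scores)
#   detections = [(boxes[i], scores[i]) for i in kept]
```

For multi-class detectors, run the function once per class or offset boxes by class so that objects of different classes do not suppress each other. A vectorised NumPy version is worth considering when the candidate count is large.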

Moving an existing application from a GPU or cloud server

If the model already fits within DEEPX-supported architectures, the transition is usually short. Compile the existing ONNX export with dx-compiler, swap the inference backend from the GPU framework to the dx_engine Python API (or dxrt in C++), and validate. With ready inference code, this is typically a matter of hours rather than weeks.

Limitations and constraints

Understanding these constraints up front saves significant debugging time later.

Constraint	Summary	Details
Vision models today	Scope	The DEEPX NPU accelerates convolutional and vision-based networks. LLMs, audio-only models, and text-based transformers are not supported on current silicon. LLMs are on the DEEPX roadmap and Sixfab will support them as the silicon enables.
INT8 precision only	~2 % loss	All models run at INT8. The compiler quantizes automatically. Expect approximately 2 % accuracy reduction versus the original FP32 model. Plan evaluation against this envelope, not against the FP32 numbers.
Inference only	No training	The Edge AI Expansion Board runs inference exclusively. Train on a GPU-equipped machine (or via the Sixfab × Ultralytics acceleration path), then deploy the resulting .dxnn file.
Ubuntu x86_64 to compile	No ARM	The dx-compiler requires Ubuntu x86_64 with at least 16 GB RAM. ARM and aarch64 are not supported for compilation. The Pi 5 itself cannot compile models.
Model fits NPU memory	Hard cap	The compiled .dxnn footprint cannot exceed available NPU on-chip memory. DX-M1M provides the larger envelope; DX-M1ML the smaller. Footprint is reported at the end of compilation.
No automatic NMS for non-PPU models	Manual	For YOLO models compiled with PPU support, bounding-box output is produced directly. For all other architectures, non-maximum suppression and other postprocessing run in CPU application code.
Use Sixfab-compiled artifacts only	Versioning	A .dxnn file downloaded from a third party (Hugging Face, ONNX Model Zoo, internet at large) only runs if it was compiled with dx-compiler against a compatible runtime version. Raw ONNX from those sources must be recompiled.
No published version matrix	Upgrade together	DEEPX does not publish a compatibility matrix between dx-compiler, dxrt-runtime, the kernel driver, and the on-NPU firmware. Upgrade them together via the sixfab-dx APT package, and recompile .dxnn files if the runtime reports a version mismatch at load time.

If a pre-compiled artifact has been provided (by Sixfab, a colleague, or a partner), skip directly to Stage 3 and run inference on the Pi 5. The Sixfab Model Zoo is the curated source for ready-to-run .dxnn files maintained by Sixfab.

Per Sixfab's content cornerstone, every command sequence on this page is re-run on the listed hardware within the 30 days preceding publication. If a runtime, compiler, or operating-system update lands inside that window, this page is reverified before the change is shipped.
