Custom Models (DXNN SDK)

Take any ONNX model, trained in PyTorch, TensorFlow, Keras, or Ultralytics, compile it for the DEEPX NPU on the Sixfab Edge AI Expansion Board, and run it on Raspberry Pi 5 with Python or C++. No cloud service. No GPU at inference time.

ONNX → DXNN · DEEPX DX-M1M / DX-M1ML · INT8 precision · Two-machine workflow
Updated 2026-05-14

How do I deploy a custom model on the Sixfab Edge AI Expansion Board?

The DXNN SDK converts a trained ONNX model into a .dxnn binary that the DEEPX NPU can execute. The compiler (dx-compiler) runs once on an Ubuntu x86_64 machine; the resulting .dxnn file then runs offline on the Edge AI Expansion Board via the dxrt-runtime package on Raspberry Pi 5. Quantization to INT8 is automatic, with approximately 2 % accuracy loss versus the original FP32 model.

This page is the most detailed custom-model reference for the Sixfab Edge AI Expansion Board for Raspberry Pi 5. It documents the conversion path end to end, calls out the constraints that catch most teams on their first deployment, and points to the partnership track that takes you from a labelled dataset to a deployed custom model in days.

The two-machine workflow

Custom-model deployment on the Edge AI Expansion Board is split between two machines because compilation is expensive and inference is not. Compilation runs once per model, on a desktop. Inference runs every frame, on the Pi.

Compile once on Ubuntu, run forever on Pi 5

The DEEPX compiler (dx-compiler, sometimes referred to as DX-COM) requires Ubuntu x86_64 with at least 16 GB of RAM. The dxrt-runtime on Raspberry Pi 5 only loads and executes the compiled .dxnn file. No compiler toolchain is needed on the Pi. Once a model is compiled, it runs on the Edge AI Expansion Board indefinitely without an internet connection.

Component | Runs on | What it does
DXNN compiler (dx-compiler) | Ubuntu x86_64 host | Converts an ONNX model into an optimized .dxnn file for the DEEPX NPU. Quantizes automatically to INT8.
DEEPX runtime (dxrt-runtime) | Raspberry Pi 5 with Edge AI Expansion Board | Loads .dxnn files and runs inference on the NPU. Provides Python (dx_engine) and C++ (dxrt) APIs. Installed via the sixfab-dx APT package.

Compiler machine requirements

The dx-compiler must run on a separate Ubuntu x86_64 machine, not on the Raspberry Pi itself. ARM / aarch64 is not supported for compilation.

Requirement | Value | Notes
CPU architecture | amd64 (x86_64) | aarch64 / ARM not supported
RAM | ≥ 16 GB | Required for model compilation
Disk space | ≥ 8 GB free | Toolchain + intermediate artifacts
Operating system | Ubuntu 20.04 / 22.04 / 24.04 | x86_64 only · 18.04 not supported
Also validated | Fedora 42–45 · RHEL 9–10 · CentOS Stream 9–10 | Validated as of DX-COM v2.3.0
glibc (ldd) | ≥ 2.28 | Verify with ldd --version
GPU | Not required | Compilation runs on CPU only
Compile on Pi? | Not supported | Raspberry Pi 5 runs the runtime only
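
The requirements above can be checked before installing the toolchain. The following is a minimal preflight sketch, not part of the DEEPX tooling: the function names are hypothetical, and the RAM threshold defaults to 15 GiB on the assumption that a nominal 16 GB machine reports slightly less usable memory to the operating system.

```python
import os
import platform
import shutil


def check_compiler_host(machine, ram_gib, free_disk_gib, glibc_version,
                        min_ram_gib=15.0, min_disk_gib=8.0):
    """Return a list of problems; an empty list means the host passes."""
    problems = []
    if machine not in ("x86_64", "AMD64"):
        problems.append(f"unsupported CPU architecture: {machine}")
    if ram_gib < min_ram_gib:
        problems.append(f"only {ram_gib:.1f} GiB RAM (16 GB required)")
    if free_disk_gib < min_disk_gib:
        problems.append(f"only {free_disk_gib:.1f} GiB free disk (8 GB required)")
    if glibc_version < (2, 28):
        problems.append(f"glibc {glibc_version[0]}.{glibc_version[1]} is older than 2.28")
    return problems


def probe_host(path="."):
    """Collect the four values from the running Linux host."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    libc_name, libc_ver = platform.libc_ver()
    if libc_name == "glibc" and libc_ver:
        glibc = tuple(int(part) for part in libc_ver.split(".")[:2])
    else:
        glibc = (0, 0)  # unknown C library: reported as a problem
    return platform.machine(), ram_bytes / 2**30, free_bytes / 2**30, glibc


if __name__ == "__main__":
    found = check_compiler_host(*probe_host())
    for line in found or ["host meets the listed requirements"]:
        print(line)
```

The check is split from the probe so the thresholds can be exercised with fixed values, for example in a CI job that gates the compile step.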

Plan compilation as a one-time step per model version

Compilation is the expensive step. Run it once per model revision on the Ubuntu host (locally, on a CI server, or overnight) and treat the produced .dxnn file as the deployment artifact. Once compiled, the .dxnn file loads in milliseconds on the Pi.

Conversion path: ONNX → DXNN → deployed artifact

Three stages take a trained model from a development machine onto the NPU. Each stage runs in a different environment and produces a different artifact.

Stage 1. Export the trained model to ONNX

Train the model in any major framework, then export to ONNX on the development machine. ONNX is the only input format the dx-compiler accepts.

PyTorch · Ultralytics YOLO · TensorFlow · Keras · XGBoost · ONNX Model Zoo · Hugging Face

Vision architectures (object detection, classification, segmentation) are the supported scope on current DEEPX silicon. Supported families include YOLO, MobileNet, EfficientNet, ResNet, and more; the published reference list lives at developer.deepx.ai/modelzoo. LLMs and audio-only transformer architectures are not supported today; LLMs are on the DEEPX roadmap and Sixfab will support them as the silicon enables.

PyTorch export

python: export PyTorch model to ONNX Ubuntu host
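import torch
import torch.nn as nn

# 1. Load your trained model and switch to eval mode.
#    The two-layer network below is only a stand-in so the snippet runs;
#    replace it with your own trained model.
model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU())
model.eval()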

# 2. Provide a dummy input matching the production input shape
dummy_input = torch.randn(1, 3, 640, 640)

# 3. Export to ONNX (opset 11 is widely compatible with dx-compiler)
torch.onnx.export(
    model, dummy_input,
    "my_model.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["output"]
)

Ultralytics YOLO export

Ultralytics ships a one-command ONNX export. This is the recommended starting point for YOLOv8n, YOLOv8s, and YOLOv8m.

bash: export YOLOv8 to ONNX Ubuntu host
yolo export model=yolov8n.pt format=onnx opset=11

Verify the operators in the exported ONNX graph against the DEEPX-supported list before compiling. The full reference lives at github.com/DEEPX-AI/dx-compiler · Building_Models.md.
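
One way to run that check is to list the operator types in the exported graph and compare them with the supported list. The sketch below assumes the onnx Python package is installed on the Ubuntu host. The set EXAMPLE_SUPPORTED_OPS holds placeholder values only, not the DEEPX list, so copy the real entries from Building_Models.md. Operators inside nested subgraphs (If, Loop) are not visited.

```python
def ops_in_onnx(path):
    """Return the set of operator type names used at the top level of an ONNX graph."""
    import onnx  # imported lazily so the comparison helper works without the package

    model = onnx.load(path)
    return {node.op_type for node in model.graph.node}


def unsupported_ops(used_ops, supported_ops):
    """Return the operators that appear in the model but not in the supported list."""
    return sorted(set(used_ops) - set(supported_ops))


# Placeholder values for illustration only. This is NOT the DEEPX-supported list.
EXAMPLE_SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "Concat", "Add"}

# Typical use on the host (requires the exported file to exist):
#   missing = unsupported_ops(ops_in_onnx("my_model.onnx"), EXAMPLE_SUPPORTED_OPS)
#   print(missing or "all operators are in the list")
```

An empty result means every operator in the graph appears in the list you supplied; it does not guarantee that compilation succeeds.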

Stage 2. Compile ONNX to DXNN with dx-compiler

The DEEPX NPU compiler reads the ONNX file and produces a .dxnn file. Quantization to INT8 happens automatically; once the input size and other parameters are provided in config.json, no manual precision configuration is required. Expect approximately 2 % accuracy reduction compared to the original FP32 model. This is the published accuracy envelope for INT8 inference on DEEPX silicon, so do not represent the result as FP32-equivalent.
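
To see where the accuracy loss comes from, it helps to quantize a handful of values by hand. The sketch below uses generic symmetric per-tensor quantization, which is an assumption made for illustration; the calibration scheme that dx-compiler actually applies is not described on this page and may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.003, 0.5, -0.31]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error of each value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Small values such as 0.003 collapse to zero because they are below half a step. Errors of this kind, accumulated across layers, are what the roughly 2 % envelope accounts for.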

Enable PPU for YOLO models

When compiling a YOLO model, enable PPU (Post-Processing Unit) support in dx-compiler. This offloads non-maximum suppression and confidence filtering to the NPU, and is the largest single FPS improvement available for detection models on the Edge AI Expansion Board. Without PPU support, the user is responsible for implementing NMS in application code. Refer to the dx-compiler documentation for the PPU flag.
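
For models compiled without PPU support, the application has to filter and suppress overlapping boxes itself. The following is a minimal, class-agnostic sketch of greedy non-maximum suppression in plain Python. The box format (x1, y1, x2, y2) and the default thresholds are assumptions; adapt them to the output layout of your model.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.25, iou_threshold=0.45):
    """Return the indices of the boxes to keep, highest score first."""
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

This runs in quadratic time in the number of candidate boxes, so apply the score threshold first. It runs on the Pi 5 CPU for every frame, which is the per-frame cost that PPU fusion removes.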

Install dx-compiler and run a compilation

bash: install dx-compiler and compile a model Ubuntu x86_64
# 1. Clone the dx-compiler toolchain on the Ubuntu x86_64 host
git clone https://github.com/DEEPX-AI/dx-compiler.git
cd dx-compiler

# 2. Follow the upstream install procedure for the toolchain release
#    matching the dxrt-runtime version installed on your Pi 5.
#    The compiler reads model parameters from a config.json file.

# 3. Compile ONNX → DXNN with the config for your model
dx-compiler --config <config.json> \
            --model my_model.onnx \
            --output my_model.dxnn

# 4. (Optional) inspect the compiled artifact
file my_model.dxnn

What the compiler does to the model

  • Operator mapping. Each ONNX op is mapped to a DEEPX NPU instruction sequence. Unsupported ops either error out or fall back to CPU execution at runtime, depending on architecture.
  • INT8 quantization. Weights and activations are calibrated and quantized. No developer configuration is required once config.json declares the input shape.
  • Memory planning. The compiler lays out tensors to fit NPU on-chip memory. Models that exceed available NPU memory fail at compile time.
  • Optional PPU fusion. For YOLO architectures, the post-processing graph (NMS, score filtering) is fused onto the NPU when PPU is enabled, eliminating a CPU step per frame.

Layer fallback behaviour. When a model contains a layer the NPU cannot accelerate but ONNX can still execute, that layer runs on the Raspberry Pi 5 CPU at inference time. Hybrid graphs work but FPS drops significantly. Check the supported operator list before compiling unfamiliar architectures.

Compiler installation reference: github.com/DEEPX-AI/dx-compiler

Stage 3. Deploy the .dxnn artifact and run inference

The compiled .dxnn file is the deployment artifact. Copy it to the Raspberry Pi 5, then load it with the DEEPX runtime (dxrt-runtime) and run inference. The Python (dx_engine) and C++ (dxrt) APIs are functionally equivalent; C++ runs slightly more efficiently on tight inference loops.

Prerequisite: the sixfab-dx APT package must already be installed on the Pi 5. If not, follow the Quickstart first — it sets up the kernel driver (via DKMS), the runtime, and the CLI tools in a single command.

Copy the artifact to the Pi

bash: push compiled model to the Pi 5 Ubuntu host
# Replace <pi_user> and <pi_host> with your values
scp my_model.dxnn <pi_user>@<pi_host>:~/models/

Model size limit. The compiled model must fit in NPU on-chip memory. Models that exceed available NPU memory fail to load on the device. Check the model footprint reported by dx-compiler at the end of compilation, then pick the variant that fits the workload: DEEPX DX-M1M for the larger envelope (25 TOPS at INT8 precision) or DEEPX DX-M1ML for the smaller envelope (13 TOPS at INT8 precision).

Run inference on the NPU (Python)

Load the .dxnn file with the dx_engine Python API. Preprocessing and postprocessing run on the Pi 5 CPU; the NPU executes the model. The following pattern is the synchronous form — one frame in, one result out.

python: synchronous inference with dx_engine Raspberry Pi 5
from dx_engine import InferenceEngine
import numpy as np
import cv2

# 1. Load the compiled DXNN model onto the NPU
engine = InferenceEngine("<model_path>/my_model.dxnn")

# 2. Preprocess on the Pi 5 CPU (BGR → RGB, resize, normalize, NCHW)
frame = cv2.imread("<input_image>.jpg")
rgb   = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
rgb   = cv2.resize(rgb, (640, 640))  # must match the compiled input size (640×640 assumed here)
inp   = (rgb.astype(np.float32) / 255.0)
inp   = np.transpose(inp, (2, 0, 1))[np.newaxis]

# 3. Run inference on the NPU
outputs = engine.run(inp)

# 4. Parse outputs in your application code (no standard output format —
#    the shape matches what the ONNX graph produced)
print(outputs[0].shape)

Check colour order: the most common accuracy regression

OpenCV loads images as BGR by default. Most vision models are trained on RGB. Always convert explicitly with cv2.COLOR_BGR2RGB before feeding frames to the NPU. A pipeline that reports correct accuracy in evaluation but loses several percentage points in production almost always has this wrong.

Run inference asynchronously

For pipelined throughput (submitting frame N while the NPU is still working on frame N−1), use the run_async entry point. Asynchronous calls remove the blocking wait of the synchronous pattern inside a single script and are the recommended pattern when the application has other work to do (capture, preprocessing, network I/O) while the NPU runs. Full reference: github.com/DEEPX-AI/dx_rt.
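
The pipelining idea can be sketched independently of the runtime. The code below assumes an engine object with a run_async(frame) method that returns a request handle and a wait(handle) method that blocks until that request finishes. Both signatures are assumptions for illustration, so check the dx_rt reference for the exact dx_engine calls. FakeEngine is a stand-in so the sketch runs without an NPU.

```python
from collections import deque


def pipelined_inference(engine, frames, depth=2):
    """Yield outputs in frame order while keeping up to `depth` requests in flight."""
    in_flight = deque()
    for frame in frames:
        in_flight.append(engine.run_async(frame))   # assumed: returns a request handle
        if len(in_flight) >= depth:
            yield engine.wait(in_flight.popleft())  # assumed: blocks until done
    while in_flight:                                # drain the remaining requests
        yield engine.wait(in_flight.popleft())


class FakeEngine:
    """Stand-in for an inference engine; it is not the dx_engine API."""

    def __init__(self):
        self._results = {}
        self._next_id = 0

    def run_async(self, frame):
        request_id = self._next_id
        self._next_id += 1
        self._results[request_id] = frame * 2  # pretend the result is ready later
        return request_id

    def wait(self, request_id):
        return self._results.pop(request_id)


results = list(pipelined_inference(FakeEngine(), [1, 2, 3, 4]))
print(results)
```

With depth=2, frame N is submitted before the result of frame N−1 is collected, which is the overlap described above. A larger depth trades latency for throughput.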

Run inference (C++)

The C++ dxrt API is the lower-overhead path for tight inference loops and for applications that already live in a C++ runtime (robotics control, native video pipelines). Build against the headers installed by the sixfab-dx package and link libdxrt. End-to-end C++ examples live in github.com/sixfab/sixfab-dx-examples under the cpp_examples tree, alongside the Python examples.

Update a deployed model in the field

Push a new compiled .dxnn file over SSH and restart the inference service without physical access to the device. There is no restriction on the delivery method — SCP, OTA, configuration-management push — as long as it is secure.

bash: replace a deployed model on the Pi 5 Ubuntu host
# 1. Copy the new artifact
scp my_model_v2.dxnn <pi_user>@<pi_host>:~/models/

# 2. Restart the systemd service that loads the model
ssh <pi_user>@<pi_host> "sudo systemctl restart my-inference.service"
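
Before restarting the service, it can be worth confirming that the file on the Pi is the file that was compiled. The following is a minimal sketch using a SHA-256 digest. The helper names are hypothetical, and how the expected digest reaches the device (alongside the artifact, through your OTA system) is left to the deployment.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches(path, expected_hex):
    """Compare the file digest with the digest recorded on the build host."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo with a stand-in file; on the Pi the path would be the copied .dxnn file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "my_model_v2.dxnn")
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    demo_digest = sha256_of(demo_path)
    demo_ok = artifact_matches(demo_path, demo_digest.upper())
```

A digest check detects truncated or corrupted transfers. It does not authenticate the sender, so it complements rather than replaces a secure delivery channel.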

Run inside Docker

Docker containers are supported on Raspberry Pi 5. The container needs --privileged and the NPU device node mounted from the host kernel driver:

bash: run inference inside Docker Raspberry Pi 5
docker run --privileged \
  -v /dev/dxrt0:/dev/dxrt0 \
  -v ~/models:/models \
  my-inference-app:latest

Sixfab × Ultralytics acceleration path

For teams who want to skip building the toolchain themselves and move from a labelled dataset to a deployed custom model in days, Sixfab offers a partnership track with Ultralytics that compresses the steps above into a guided pipeline.

From labelled dataset to deployed custom model in days

Train a YOLO model on the customer dataset in the Ultralytics workflow, export to ONNX, compile to DXNN with the partnership tooling, and deploy to the Edge AI Expansion Board. The path is opinionated where it can be (it removes the choice points that take teams the longest to figure out) and explicit where the customer needs to make a call: resolution, batch size, accuracy and FPS trade-off.

Labelled dataset → Ultralytics training → ONNX export → DXNN compile → Edge AI Expansion Board deployed

This path is the right starting point for object detection, classification, and segmentation on YOLO architectures. For non-YOLO architectures, follow Stages 1–3 above directly.

Note: on-device training is not supported on the Edge AI Expansion Board. Customers train on their own infrastructure and deploy the resulting weights via the ONNX → DXNN compiler path described on this page.

Compatible Python libraries

Standard data-science and computer-vision libraries work alongside dxrt-runtime on Raspberry Pi 5 with no conflicts. OpenCV, Pillow, and NumPy are explicitly validated.

OpenCV · NumPy · Pillow · scikit-image · Picamera2 · libcamera

Pre-recorded video files are supported as inputs alongside live camera streams, which is the recommended approach for repeatable bench testing during development.

Integration patterns on the Edge AI Expansion Board

The Edge AI Expansion Board carries the NPU, the NVMe slot, and the LTE/5G modem on a single baseboard, so a deployed inference application typically writes to local storage and sends results upstream over cellular while inferring. The patterns below cover the points where Sixfab software ends and customer application code begins.

Inference, NVMe, and LTE/5G in the same pipeline

In bench testing, concurrent NVMe writes and LTE/5G traffic do not degrade AI inference throughput on the Edge AI Expansion Board. The NPU runs on a dedicated PCIe Gen 2/Gen 3 x1 link to the Pi 5, while storage and cellular share the on-board USB 3.2 Gen 1 hub. The practical implication for custom-model deployments: a single application can buffer raw frames to NVMe, run inference, and stream structured results over LTE/5G without staging a separate inference host.

Storage and uplink paths are your responsibility

The sixfab-dx package installs the driver, runtime, and CLI tools — it does not save camera footage or inference results anywhere on its own. Set the log and output paths in the inference application to the mount point of the NVMe SSD, and transmit results over the LTE/5G connection through whatever platform the application uses (MQTT, HTTPS, custom protocol). The NPU does not constrain either choice.
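
A simple way to handle the storage half is to append each result as one JSON line under the NVMe mount point. The sketch below is one possible layout, not a Sixfab convention: the function name, the file naming, and the record fields are all hypothetical, and a temporary directory stands in for the mount point so the example runs anywhere.

```python
import json
import tempfile
import time
from pathlib import Path


def append_result(output_dir, record):
    """Append one inference result as a JSON line; return the file written to."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    log_path = output_dir / f"detections-{time.strftime('%Y%m%d')}.jsonl"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return log_path


# Demo: on the Pi, pass the mount point of the NVMe SSD instead of a temp dir.
with tempfile.TemporaryDirectory() as demo_dir:
    path = append_result(demo_dir, {"frame": 1, "label": "person", "score": 0.91})
    append_result(demo_dir, {"frame": 2, "label": "car", "score": 0.77})
    lines = path.read_text(encoding="utf-8").splitlines()
```

One record per line keeps the file appendable after a power loss and easy to forward in batches over the cellular link.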

Multiple concurrent models on one NPU

The DEEPX NPU can run more than one compiled model in parallel. Up to three concurrent models have been validated in bench testing; more is possible but not characterized. The practical pattern is a detection model plus a classification or pose model fed from the detected regions, all on the same NPU.
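
The detection-plus-classification pattern reduces to a small amount of glue code. In the sketch below, detect and classify are plain callables standing in for two loaded models and their postprocessing. The names and the (x1, y1, x2, y2) box format are assumptions, and the frame is a nested list rather than an image array so the example has no dependencies.

```python
def crop(frame, box):
    """Cut a region out of a frame stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def cascade(frame, detect, classify):
    """Run `detect` once, then `classify` on each detected region."""
    return [(box, classify(crop(frame, box))) for box in detect(frame)]


# Stand-ins for two models; real code would call the inference engine here.
def fake_detect(frame):
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def fake_classify(region):
    return "bright" if sum(map(sum, region)) > 50 else "dark"


frame = [[x + 10 * y for x in range(4)] for y in range(4)]
results = cascade(frame, fake_detect, fake_classify)
print(results)
```

In a real pipeline the second model runs once per detected region, so its cost scales with the number of detections per frame.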

Multiple camera streams

Bench testing has shown the NPU handling four concurrent 720p streams cleanly. Beyond that, frame rate begins to drop — but the bottleneck is the Raspberry Pi 5 CPU handling capture, decode, and preprocessing, not the NPU itself. Plan accordingly: if more camera streams are needed, optimize the CPU-side pipeline (hardware decode, fewer copies, async I/O) before assuming the NPU is the limit.

Image preprocessing and postprocessing

The Raspberry Pi 5 CPU is responsible for everything around the inference call: capture, colour conversion, resize, normalisation, drawing bounding boxes and labels on output frames, and any non-maximum suppression that is not handled by a PPU-fused YOLO model. The NPU only runs the compiled graph. Image format expectations vary per model: most compiled models accept NHWC input, and the exact tensor shape and layout must match the config.json used at compile time. The Python example on this page prepares an NCHW tensor, so confirm the layout your compiled model expects before reusing it.
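
Resizing and drawing are linked: boxes come back in model-input coordinates and must be mapped to the source frame before drawing. The sketch below computes aspect-preserving ("letterbox") resize geometry and the inverse mapping. Letterboxing is an assumption that is common for YOLO-style models; if your model was trained with a plain stretch resize, use that instead. The function names are hypothetical.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (scale, resized size, padding) for an aspect-preserving resize."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def box_to_source(box, scale, pad):
    """Map a box from model-input coordinates back to source-frame coordinates."""
    x1, y1, x2, y2 = box
    pad_x, pad_y = pad
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


# A 1280x720 camera frame fed to a 640x640 model input.
scale, resized, pad = letterbox_geometry(1280, 720, 640, 640)
source_box = box_to_source((100, 240, 200, 340), scale, pad)
print(scale, resized, pad, source_box)
```

Computing the geometry once per stream, rather than once per frame, avoids repeating the arithmetic when the camera resolution is fixed.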

Moving an existing application from a GPU or cloud server

If the model already fits within DEEPX-supported architectures, the transition is usually short. Compile the existing ONNX export with dx-compiler, swap the inference backend from the GPU framework to the dx_engine Python API (or dxrt in C++), and validate. With ready inference code, this is typically a matter of hours rather than weeks.

Limitations and constraints

Understanding these constraints up front saves significant debugging time later.

Vision models today · Scope

The DEEPX NPU accelerates convolutional and vision-based networks. LLMs, audio-only models, and text-based transformers are not supported on current silicon. LLMs are on the DEEPX roadmap and Sixfab will support them as the silicon enables.

INT8 precision only · ~2 % loss

All models run at INT8. The compiler quantizes automatically. Expect approximately 2 % accuracy reduction versus the original FP32 model. Plan evaluation against this envelope, not against the FP32 numbers.

Inference only · No training

The Edge AI Expansion Board runs inference exclusively. Train on a GPU-equipped machine (or via the Sixfab × Ultralytics acceleration path), then deploy the resulting .dxnn file.

Ubuntu x86_64 to compile · No ARM

The dx-compiler requires Ubuntu x86_64 with at least 16 GB RAM. ARM and aarch64 are not supported for compilation. The Pi 5 itself cannot compile models.

Model fits NPU memory · Hard cap

The compiled .dxnn footprint cannot exceed available NPU on-chip memory. DX-M1M provides the larger envelope; DX-M1ML the smaller. Footprint is reported at the end of compilation.

No automatic NMS for non-PPU models · Manual

For YOLO models compiled with PPU support, bounding-box output is produced directly. For all other architectures, non-maximum suppression and other postprocessing run in CPU application code.

Use Sixfab-compiled artifacts only · Versioning

A .dxnn file downloaded from a third party (Hugging Face, ONNX Model Zoo, internet at large) only runs if it was compiled with dx-compiler against a compatible runtime version. Raw ONNX models from those sources must first be compiled with dx-compiler.

No published version matrix · Upgrade together

DEEPX does not publish a compatibility matrix between dx-compiler, dxrt-runtime, the kernel driver, and the on-NPU firmware. Upgrade the Pi-side components together via the sixfab-dx APT package, use the dx-compiler release that matches the installed runtime on the Ubuntu host, and recompile .dxnn files if the runtime reports a version mismatch at load time.

Already have a compiled .dxnn file? Skip Stages 1 and 2.

If a pre-compiled artifact has been provided (by Sixfab, a colleague, or a partner), skip directly to Stage 3 and run inference on the Pi 5. The Sixfab Model Zoo is the curated source for ready-to-run .dxnn files maintained by Sixfab.

Command-validation policy

Per Sixfab's content cornerstone, every command sequence on this page is re-run on the listed hardware within the 30 days preceding publication. If a runtime, compiler, or operating-system update lands inside that window, this page is reverified before the change is shipped.