Deployment Workflow
End-to-end guide for compiling custom-trained ONNX models and deploying them on the DEEPX NPU across Sixfab AI HAT+ for Raspberry Pi 5, Sixfab Edge AI Expansion Board, and ALPON X5 AI. The two-machine workflow (compile once on an Ubuntu host, run forever on the target Sixfab device) is the same on all three products. No cloud service, no GPU at inference time.
The DXNN SDK converts a trained ONNX model into a .dxnn binary
that the DEEPX NPU can execute. The compiler runs once on an Ubuntu x86_64 machine; the
resulting .dxnn file then runs offline on Sixfab AI HAT+, Edge AI Expansion
Board, or ALPON X5 AI via the dxrt-runtime package on the target device.
Quantisation to INT8 is automatic, with approximately 2 % accuracy loss versus the original
FP32 model.
This page is the operational end-to-end guide. It walks through the three-stage conversion
path, the constraints that catch most teams on their first deployment, and the partnership
track that takes you from a labelled dataset to a deployed custom model in days. For the full
JSON parameter reference of config.json, see the companion
DX-COM Configuration Reference.
The two-machine workflow
Custom-model deployment on Sixfab edge AI hardware is split between two machines because compilation is expensive and inference is not. Compilation runs once per model, on a desktop. Inference runs every frame, on the target device.
The DEEPX compiler (DX-COM) requires Ubuntu x86_64 with at least 16 GB of RAM. The
dxrt-runtime on the target device (Raspberry Pi 5 with AI HAT+ or Edge AI
Expansion Board, or ALPON X5 AI) only loads and executes the compiled .dxnn
file. No compiler toolchain is needed on the target. Once a model is compiled, it runs
indefinitely without an internet connection.
DX-COM compiler
Converts a trained ONNX model into an optimised .dxnn file for the DEEPX
NPU. Quantises automatically to INT8. Runs once per model, on a desktop or a Kaggle
Notebook.
dxrt-runtime
Loads and executes the compiled .dxnn file on the DEEPX NPU. Runs on
Raspberry Pi 5 with AI HAT+ or Edge AI Expansion Board, and on ALPON X5 AI. No compiler
dependencies on the target; the runtime artifact is portable across the three products.
One .dxnn file, three deployment targets
The same compiled .dxnn file runs on all three Sixfab edge AI products
without modification. The DEEPX DX-M1 family (DX-M1, DX-M1M, DX-M1ML) shares a unified
instruction set, so a model compiled for the DEEPX NPU is portable across AI HAT+, Edge
AI Expansion Board, and ALPON X5 AI. The only practical constraint is on-chip NPU memory:
a model that fits the larger 25-TOPS variants may not fit the 13-TOPS DX-M1ML envelope on
AI HAT+.
Compiler machine requirements
DX-COM must run on a separate Ubuntu x86_64 machine, not on the target Sixfab device. Every item below is required unless tagged Optional.
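- Architecture: x86_64. ARM and aarch64 are not supported for compilation.
- RAM: at least 16 GB.
- glibc: a compatible version; check the installed version with ldd --version.
- Internet access: required for pip install. Inference itself runs offline.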
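The architecture and RAM requirements can be checked before installing anything. The script below is a minimal sketch that uses only the Python standard library; the helper names and the 1 GiB tolerance are illustrative assumptions, not part of the DEEPX tooling, and the glibc version is only printed because this page does not state the minimum version.

```python
import os
import platform

MIN_RAM_GIB = 16      # requirement stated on this page
TOLERANCE_GIB = 1.0   # assumption: a nominal 16 GB machine reports slightly less


def total_ram_gib():
    """Return physical RAM in GiB using POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 2**30


def check_host(machine, ram_gib):
    """Return a list of problems; an empty list means the host passes."""
    problems = []
    if machine != "x86_64":
        problems.append(f"architecture is {machine}, DX-COM needs x86_64")
    if ram_gib < MIN_RAM_GIB - TOLERANCE_GIB:
        problems.append(f"{ram_gib:.1f} GiB RAM found, {MIN_RAM_GIB} GB required")
    return problems


if __name__ == "__main__":
    issues = check_host(platform.machine(), total_ram_gib())
    # libc_ver() reports the C library the interpreter is linked against;
    # compare it with the glibc version DEEPX documents for DX-COM.
    print("libc:", platform.libc_ver())
    print("host OK" if not issues else "\n".join(issues))
```

Run it on the intended compile machine; any line other than "host OK" names a requirement that is not met.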
Kaggle Notebooks run on Linux x86_64 with sufficient RAM, a compatible glibc, and internet access out of the box. Every requirement listed above is met by the free Kaggle environment, and Sixfab maintains a reference notebook that walks through the full ONNX → DXNN compile flow without installing anything on a local machine.
Reference notebook: dx-compiler.ipynb
([NEED FROM SIXFAB: Kaggle notebook URL]).
What the notebook does end-to-end:
- Clones the DEEPX-AI/dx-compiler repository into the Kaggle working directory.
- Runs install.sh with a Docker volume path to stage the toolchain.
- Installs the DX-COM Python wheel (pip install dx_com*.whl).
- Installs ultralytics and exports a YOLO model to ONNX. The notebook ships with yolo11n as a worked example; swap in your own .onnx file.
- Writes a config.json with input shape, calibration settings, preprocessing, and PPU configuration.
- Runs dxcom against the ONNX model and config, then produces the compiled .dxnn artifact.
Download the resulting .dxnn file from the Kaggle output panel and copy it
to the target Sixfab device to run inference. See Stage 3 below for the runtime call.
The compiled .dxnn file loads in milliseconds on the target device.
Conversion path: ONNX → DXNN → deployed artifact
Three stages take a trained model from a development machine onto the NPU. Each stage runs in a different environment and produces a different artifact.
Stage 1: Export the trained model to ONNX
Train the model in any major framework, then export to ONNX on the development machine. ONNX is the only input format DX-COM accepts.
Vision architectures (object detection, classification, segmentation, pose) are the supported scope on current DEEPX silicon. LLMs and audio-only transformer architectures are not supported today; LLMs are on the DEEPX roadmap and Sixfab will support them as the silicon enables.
PyTorch export
import torch

# 1. Load your trained model and switch to eval mode
#    (`model` is your own trained torch.nn.Module, loaded beforehand)
model.eval()

# 2. Provide a dummy input matching the production input shape
dummy_input = torch.randn(1, 3, 640, 640)

# 3. Export to ONNX (opset 11 is widely compatible with DX-COM)
torch.onnx.export(
    model,
    dummy_input,
    "my_model.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["output"],
)
Ultralytics YOLO export
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="onnx")
# Produces yolo11n.onnx in the working directory.
Keep the batch size at 1; DX-COM does not support dynamic or multi-batch compilation via the CLI.
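Checking the exported file for a fixed batch of 1 and fully static dimensions before compiling can save a failed run. The sketch below assumes the onnx Python package is installed on the development machine; the helper names are illustrative and not part of DX-COM.

```python
def is_static_batch1(shape):
    """True if every dimension is a positive int and the batch dimension is 1."""
    return (
        len(shape) > 0
        and all(isinstance(d, int) and d > 0 for d in shape)
        and shape[0] == 1
    )


def onnx_input_shapes(path):
    """Map input name -> shape; symbolic dimensions come back as strings."""
    import onnx  # imported lazily so the check above works without it

    model = onnx.load(path)
    weights = {init.name for init in model.graph.initializer}
    shapes = {}
    for inp in model.graph.input:
        if inp.name in weights:  # older exporters list weights as graph inputs
            continue
        dims = inp.type.tensor_type.shape.dim
        shapes[inp.name] = [d.dim_param if d.dim_param else d.dim_value for d in dims]
    return shapes
```

For example, looping over onnx_input_shapes("my_model.onnx").items() and calling is_static_batch1 on each shape flags any input that still carries a symbolic or multi-batch dimension.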
Stage 2: Compile ONNX to DXNN with DX-COM
DX-COM reads the ONNX file, calibrates it for INT8 arithmetic, and produces a
.dxnn binary that runs natively on the DEEPX DX-M1 family NPU. Quantisation
is automatic; expect approximately 2 % accuracy loss versus the original FP32 model.
This is the published, honest accuracy envelope for INT8 inference on DEEPX silicon.
Install DX-COM
# 1. Clone the DEEPX compiler repository and run the installer
git clone https://github.com/DEEPX-AI/dx-compiler
cd dx-compiler
./install.sh

# 2. Install the Python wheel
cd dx_com
pip install dx_com*.whl

# 3. Verify the installation
dxcom --version
# expected: DX-COM v2.3.0
Prepare a calibration dataset
Quantisation requires calibration data: a small set of representative images the compiler uses to estimate activation ranges. Use images from your deployment domain; random stock images degrade quantised accuracy. Start with 100 images.
calibration_images/
├── frame_0001.jpg
├── frame_0002.jpg
└── ...
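One way to build such a folder is to draw a random sample from footage already captured in the deployment domain. This is a minimal sketch using only the standard library; the function name, the fixed seed, and the frame_NNNN naming are assumptions chosen to match the layout above.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_calibration_set(source_dir, target_dir, count=100, seed=0):
    """Copy up to `count` randomly chosen images from source_dir into target_dir."""
    images = sorted(
        p for p in Path(source_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    chosen = random.Random(seed).sample(images, min(count, len(images)))
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for index, src in enumerate(chosen, start=1):
        # Rename to frame_0001.jpg, ... so names never collide across subfolders.
        shutil.copy2(src, target / f"frame_{index:04d}{src.suffix.lower()}")
    return len(chosen)
```

Keep target_dir outside source_dir so that a second run does not sample its own earlier copies.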
Write config.json and compile
DX-COM is driven by a JSON configuration file describing input shape, calibration settings, and the preprocessing pipeline. A minimal YOLOv11n configuration:
{
  "inputs": {"images": [1, 3, 640, 640]},
  "calibration_method": "ema",
  "calibration_num": 100,
  "default_loader": {
    "dataset_path": "./calibration_images",
    "file_extensions": ["jpeg", "jpg", "png"],
    "preprocessings": [
      {"resize": {"width": 640, "height": 640}},
      {"convertColor": {"form": "BGR2RGB"}},
      {"div": {"x": 255}},
      {"transpose": {"axis": [2, 0, 1]}},
      {"expandDim": {"axis": 0}}
    ]
  }
}
dxcom -m ./yolo11n.onnx -c config.json -o yolo11n-output

# Successful run ends with:
# [INFO] - Compilation complete.
# [INFO] - Output: yolo11n-output/yolo11n.dxnn
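When several models share the same preprocessing, generating config.json from a script keeps the input shape and the resize step in sync. The sketch below reproduces the minimal configuration shown above; build_config and write_config are hypothetical helper names, and only fields that appear on this page are emitted.

```python
import json


def build_config(input_name, height, width, dataset_path, calibration_num=100):
    """Return a config dict mirroring the minimal YOLO example on this page."""
    return {
        "inputs": {input_name: [1, 3, height, width]},
        "calibration_method": "ema",
        "calibration_num": calibration_num,
        "default_loader": {
            "dataset_path": dataset_path,
            "file_extensions": ["jpeg", "jpg", "png"],
            "preprocessings": [
                {"resize": {"width": width, "height": height}},
                {"convertColor": {"form": "BGR2RGB"}},
                {"div": {"x": 255}},
                {"transpose": {"axis": [2, 0, 1]}},
                {"expandDim": {"axis": 0}},
            ],
        },
    }


def write_config(path, config):
    """Serialise the config as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


config = build_config("images", 640, 640, "./calibration_images")
```

Calling write_config("config.json", config) produces a file equivalent to the hand-written example; the input name must match the one used at ONNX export time.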
For every config.json field (calibration algorithms, the full
preprocessings operator list, all three PPU types for YOLO models, and
DXQ accuracy-recovery schemes), see the dedicated
DX-COM Configuration Reference.
After compilation, check the log for [INFO] - Added nodes: entries.
Each operation listed there has been absorbed into the NPU graph and must be
removed from your host-side preprocessing code. Running an absorbed operation on
the host and inside the NPU produces wrong results: the image gets
normalised twice. See the
Preprocessing operations reference
for the full list of absorbable operations.
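A small script can pull those entries out of a saved compile log so they are not missed. This is a sketch only: the page confirms the "[INFO] - Added nodes:" prefix, but whatever follows it on the line is an assumption, and the sample log below is hypothetical.

```python
def added_node_lines(log_text, marker="Added nodes:"):
    """Return the text following the marker on every log line that contains it."""
    found = []
    for line in log_text.splitlines():
        _, sep, rest = line.partition(marker)
        if sep:
            found.append(rest.strip())
    return found


# Hypothetical excerpt; the real wording after the marker may differ.
sample_log = """\
[INFO] - Added nodes: div
[INFO] - Compilation complete.
"""
absorbed = added_node_lines(sample_log)
print(absorbed)
```

Each returned entry is a reminder to delete the matching step from the host-side preprocessing before running inference.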
What the compiler does to the model
- Operator mapping. Each ONNX op is mapped to a DEEPX NPU instruction sequence. Unsupported ops either error out or fall back to CPU execution at runtime, depending on architecture.
- INT8 quantisation. Weights and activations are calibrated against your dataset and quantised. No developer configuration required beyond the calibration set.
- Memory planning. The compiler lays out tensors to fit NPU on-chip memory. Models that exceed available NPU memory fail at compile time.
- Optional PPU fusion. For YOLO architectures, the post-processing graph is fused onto the NPU when PPU is enabled.
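To make the INT8 quantisation step concrete, the toy example below estimates an activation range with an exponential moving average over calibration batches and then maps values to 8-bit integers. It illustrates the general technique only; it is not DX-COM's actual algorithm, and the decay value, the symmetric scheme, and the sample numbers are all assumptions.

```python
def ema_abs_max(batches, decay=0.9):
    """Exponential moving average of the per-batch maximum absolute value."""
    running = None
    for batch in batches:
        peak = max(abs(v) for v in batch)
        running = peak if running is None else decay * running + (1 - decay) * peak
    return running


def quantise(values, abs_max):
    """Symmetric INT8: map [-abs_max, abs_max] onto integers in [-127, 127]."""
    scale = abs_max / 127
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantise(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


calibration_batches = [[0.1, -0.8, 0.5], [0.9, -0.2, 0.4], [-1.0, 0.3, 0.7]]
abs_max = ema_abs_max(calibration_batches)
original = [0.25, -0.6, 0.05]
codes, scale = quantise(original, abs_max)
restored = dequantise(codes, scale)
```

Values inside the calibrated range come back with an error of at most half a quantisation step, which is why calibration images that resemble production input matter: activations outside the estimated range are clipped.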
Compiler upstream reference: github.com/DEEPX-AI/dx-compiler
Stage 3: Deploy the .dxnn artifact and run inference
The compiled .dxnn file is the deployment artifact. Copy it to the target
Sixfab device, then load it with the DEEPX runtime (dxrt-runtime) and run
inference. The Python and C++ APIs are functionally equivalent; C++ runs slightly more
efficiently on tight inference loops.
Transfer the artifact to the target device
The transfer step is the only stage where the procedure varies by product. The
.dxnn file itself is identical across all three.
Sixfab AI HAT+
Raspberry Pi 5 host
SCP from the compile host. Example:
scp model.dxnn <user>@<pi-address>:~/
Standard SSH-over-LAN deployment.
Sixfab Edge AI Expansion Board
Raspberry Pi 5 host
SCP from the compile host. Example:
scp model.dxnn <user>@<pi-address>:~/
Same Pi 5 deployment path as AI HAT+.
ALPON X5 AI
Fanless industrial system
Deployment over ALPON Cloud or direct SCP, depending on fleet management setup.
[NEED FROM SIXFAB: confirm canonical ALPON deployment path for custom .dxnn artifacts.]
Run inference with the Python API
from dx_engine import InferenceEngine
import cv2
import numpy as np

# 1. Load the compiled model
engine = InferenceEngine("models/my_model.dxnn")

# 2. Preprocess input (remove operations the NPU absorbed)
img = cv2.imread("frame.jpg")
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, (2, 0, 1))
img = np.expand_dims(img, 0)

# 3. Run inference
output = engine.run(img)
Run inference with the C++ API
#include <dxrt/inference_engine.h>

dxrt::InferenceEngine engine("models/my_model.dxnn");

// `frames` is your own container of preprocessed input buffers
for (const auto& frame : frames) {
    auto output = engine.Run(frame.data());
    // process output
}
Verify the NPU is being used
On the target device, run dxrt-cli -s to confirm the NPU is loaded and the
model is dispatched correctly. For the full monitoring reference (utilisation,
temperature, fleet integration with Prometheus and Grafana), see
AI Model Deployment · System Monitoring.
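For scripted health checks, the same command can be wrapped in Python. The sketch below relies only on the dxrt-cli -s invocation named above; treating a zero exit code as healthy is an assumption, and run_status is a hypothetical helper name.

```python
import shutil
import subprocess


def run_status(command=("dxrt-cli", "-s"), timeout=10):
    """Run the status command and return (ok, output).

    ok is False when the tool is missing from PATH or exits with a non-zero code.
    """
    if shutil.which(command[0]) is None:
        return False, f"{command[0]} not found on PATH"
    result = subprocess.run(
        list(command), capture_output=True, text=True, timeout=timeout
    )
    return result.returncode == 0, result.stdout + result.stderr
```

Calling run_status() on the target device returns the raw status text, which can be logged or inspected before starting the inference loop.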
Already have a .dxnn file? Skip Stages 1 and 2.
If a pre-compiled artifact has been provided (by Sixfab, a colleague, or a partner), skip directly to Stage 3 and run inference on the target Sixfab device.
Sixfab × Ultralytics acceleration path
For teams that want to skip the compile workflow entirely, Sixfab offers a managed partnership track with Ultralytics. Bring a labelled dataset; the partnership delivers a trained, optimised, and deployed custom model on your target Sixfab hardware in days, with no DX-COM toolchain on your side.
Learn about the Sixfab × Ultralytics acceleration path
What the partnership covers, what Sixfab needs from you, expected timeline, and how to engage.
Compatible libraries
DX-COM accepts ONNX models produced by the following ecosystems. The training framework is independent of compilation; choose what your team already uses.
- PyTorch. Export via torch.onnx.export. Opset 11 is widely compatible with DX-COM; opset version recommendations beyond this [NEED FROM SIXFAB: confirm canonical opset version per framework].
- TensorFlow. Export via the tf2onnx converter. Verify input/output tensor names match what config.json declares.
- Keras. Export via tf2onnx or keras2onnx. Check for dynamic shape dimensions before compiling; DX-COM requires fixed input shapes.
- Ultralytics YOLO. Built-in ONNX export with model.export(format="onnx"). Recommended starting point for object detection, classification, segmentation, and pose.
- ONNX Model Zoo. Direct download. Verify the model's operator set against the Supported Models Catalog before compiling.
Standard data-science and computer-vision libraries (OpenCV, NumPy, Pillow, scikit-image,
Picamera2, libcamera) work alongside dxrt-runtime on the target device with no
conflicts.
Limitations and constraints
DXNN SDK has well-defined boundaries. Acknowledging them up front saves debugging time later.
Vision models today
Scope: The DEEPX NPU accelerates convolutional and vision-based networks. LLMs, audio-only models, and text-based transformers are not supported on current silicon. LLMs are on the DEEPX roadmap, and Sixfab will support them as the silicon enables; no dates have been announced.
INT8 precision only
~2 % loss: All models run at INT8. The compiler quantises automatically. FP16 and FP32 are not supported on the NPU. Expect approximately 2 % accuracy reduction versus the original FP32 model. Plan evaluation against this envelope, not against the FP32 numbers.
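A simple acceptance check makes the envelope explicit during evaluation. The sketch below assumes metrics expressed in the 0 to 1 range and treats the 2 % figure as an absolute drop; this page does not say whether the figure is absolute or relative, so both values are computed and the function names are illustrative.

```python
def accuracy_drop(fp32_metric, int8_metric):
    """Return (absolute drop, relative drop) between FP32 and INT8 scores."""
    absolute = fp32_metric - int8_metric
    relative = absolute / fp32_metric if fp32_metric else float("nan")
    return absolute, relative


def within_envelope(fp32_metric, int8_metric, budget=0.02):
    """True if the absolute drop stays inside the budget (metrics in 0..1)."""
    absolute, _ = accuracy_drop(fp32_metric, int8_metric)
    return absolute <= budget
```

Evaluate both models on the same held-out set; a drop well beyond the budget usually points at unrepresentative calibration images or a preprocessing mismatch rather than at quantisation itself.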
Inference only
No training: Sixfab edge AI hardware runs inference exclusively. On-device training is not supported. Train on a GPU-equipped machine (or via the Sixfab × Ultralytics acceleration path), then deploy the resulting .dxnn file using the ONNX → DXNN compiler path described above.
Ubuntu x86_64 to compile
No ARM: DX-COM requires Ubuntu x86_64 with at least 16 GB RAM. ARM and aarch64 are not supported for compilation. The Pi 5 itself cannot compile models; neither can ALPON X5 AI.
Operator support boundary
Verify first: Models with operators outside the DEEPX-supported list either error at compile time or fall back to CPU at runtime, depending on architecture. Verify against the Supported Models Catalog before exporting.
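Listing the operator types in an exported model gives a quick first pass before compiling. The sketch below assumes the onnx Python package; the SUPPORTED set is a placeholder for illustration, not the DEEPX list, and has to be filled in from the Supported Models Catalog.

```python
def unsupported_ops(op_types, supported):
    """Return the sorted operator types that are not in the supported set."""
    return sorted(set(op_types) - set(supported))


def onnx_op_types(path):
    """List the operator type of every top-level node in an ONNX graph."""
    import onnx  # lazy import: only needed when reading a real model

    return [node.op_type for node in onnx.load(path).graph.node]


# Placeholder only: replace with the operators DEEPX actually documents.
SUPPORTED = {"Conv", "Relu", "MaxPool", "Add", "Concat"}
```

Running unsupported_ops(onnx_op_types("my_model.onnx"), SUPPORTED) returns the operators to investigate; nodes nested inside control-flow subgraphs are not visited by this simple version.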
No hot-plug on HAT+ or Expansion Board
Power off first: Hot-plug is not supported on AI HAT+ or Edge AI Expansion Board. Power off the Raspberry Pi 5 before mounting or removing the board. ALPON X5 AI is a sealed industrial computer; the NPU is integrated, not removable.
Host compatibility
Pi 5 / CM5: Supported hosts for AI HAT+ and Edge AI Expansion Board: Raspberry Pi 5 and Raspberry Pi Compute Module 5 via the official Raspberry Pi CM5 IO Board. Not supported: Pi 4, CM4, non-Raspberry Pi SBCs. ALPON X5 AI ships as a complete system built on Pi CM5 + DEEPX inside a fanless enclosure.
Recompile on SDK breaks
Versioning: If a runtime SDK update introduces breaking changes, existing .dxnn files may need recompilation. The runtime reports a version mismatch at load time when this occurs.
Need parameter-level detail?
Every config.json field, all three PPU types with Netron walkthroughs, DXQ
accuracy recovery, and the full preprocessing operator list live in the dedicated
reference.
