AI Development
Run vision AI on the DEEPX DX-M1 NPU.
End-to-end developer reference for the ALPON X5 AI: the Sixfab Model Zoo for instant inference, the DXNN SDK for compiling your own ONNX graphs, and the dx_engine Python and dxrt_api.h C++ APIs that drive the NPU from a Docker container. Intelligented by DEEPX. Built on Raspberry Pi.
The ALPON X5 AI supports two paths. Path 1 — Sixfab Model Zoo ships pre-compiled .dxnn models inside the sixfab-dx APT package and exposes a run_hello_world demo for instant inference. Path 2 — DXNN SDK is the custom-model workflow: train in PyTorch, TensorFlow, or Keras, export to ONNX, compile to .dxnn with dx-com on a host PC, and deploy through dx_engine Python or dxrt_api.h C++ on the DEEPX DX-M1 NPU.
01 Overview
The ALPON X5 AI is an industrial edge AI computer built for image-based inference at the network edge. AI acceleration is provided by the DEEPX DX-M1: a 25 TOPS NPU with 4 GB of dedicated on-chip memory, fully independent of the Raspberry Pi CM5 system RAM. Every developer tool on this page targets that accelerator.
Custom-model pipeline
Path 2 (DXNN SDK) follows a four-stage pipeline. Path 1 (Sixfab Model Zoo) skips the first three stages because the .dxnn file ships pre-compiled inside sixfab-dx.
- Source: training framework (PyTorch · TF · Keras).
- Interchange: ONNX graph (.onnx · opset ≥ 13).
- Compile (Host PC): dx-com, the DEEPX compiler.
- Deploy (ALPON X5 AI): sixfab-dx + dx_engine, running the .dxnn on the DX-M1 NPU.
What lives where
The toolchain is split cleanly between the host PC (your development laptop) and the target device (the ALPON X5 AI). Keep this split in mind when wiring up build systems and CI pipelines.
Host PC
x86_64 Linux
- dx-com: compiles ONNX graphs into .dxnn binaries optimized for DEEPX DX-M1.
- dx-tron: Netron-based viewer for .dxnn files. Used for inspection and debugging.
- Your training pipeline and ONNX export script.
ALPON X5 AI (target)
arm64 · ALPON X5 AI OS
- DEEPX kernel driver: exposes the NPU as /dev/dxrt0. Preinstalled.
- sixfab-dx: runtime, Python wheels, pre-built venv, and CLI tools.
- dx_engine Python and dxrt_api.h C++ APIs.
- Compiled .dxnn models and your inference application.
Compilation happens on a host PC once per model. The .dxnn artifact is then copied to any number of ALPON X5 AI devices and loaded at runtime. You do not need a DEEPX DX-M1 attached to your development machine to compile.
Languages and APIs
The DEEPX Runtime ships first-class bindings for Python (dx_engine) and C++ (dxrt_api.h). Use Python for rapid prototyping, video analytics with OpenCV or GStreamer, and glue code. Use C++ when you need deterministic latency, embedded integration, or direct runtime control.
Two paths to inference
The ALPON X5 AI supports two complementary workflows. Pick the one that matches what you need today; you can mix them in the same project.
- Path 1, Sixfab Model Zoo: pre-compiled .dxnn models bundled with sixfab-dx. Run run_hello_world for an instant YOLOv8 demo. This is the fastest path to working inference; no compiler or training pipeline is required.
- Path 2, DXNN SDK: compile your own ONNX model into a .dxnn binary with dx-com on a host PC and deploy it through dx_engine on the ALPON X5 AI. This path gives full control over architecture, quantization, and post-processing.
Supported workloads
The DEEPX DX-M1 is optimized for image-based AI workloads. The runtime executes convolutional architectures reliably. Transformer-based models are not supported on the current runtime.
- Object detection: YOLO family (v5, v7, v8, v11, YOLO26, YOLOX), SSD variants.
- Classification: ResNet, MobileNet, EfficientNet.
- Segmentation: U-Net, DeepLabV3, and other semantic segmentation backbones.
- OCR: PaddleOCR-based detection, classification, and recognition heads.
- Face detection and recognition: RetinaFace, SCRFD, ArcFace pipelines.
- Pose estimation: YOLOv8-Pose, HRNet keypoint regressors.
02 DEEPX Runtime
The runtime is delivered through the sixfab-dx APT package. It bundles the DEEPX Runtime libraries, the dx_engine Python wheel, a pre-built virtual environment at /opt/sixfab-dx/venv, and CLI tools. The kernel driver is preinstalled on ALPON X5 AI OS and exposes the DEEPX DX-M1 at /dev/dxrt0.
How do I install the DEEPX Runtime on the ALPON X5 AI?
The DEEPX kernel driver and the Sixfab APT repository are preconfigured on ALPON X5 AI OS. Install sixfab-dx with a single apt command, or pull it into your Docker image during the build step.
Install sixfab-dx
One command brings in the shared libraries, CLI tools, and the pre-built Python virtual environment with dx_engine already inside.
sudo apt update && sudo apt install -y sixfab-dx
Verify the NPU
Two CLI tools ship with the package. Use dxrt-cli -s for a quick status check and dxtop for a live, htop-like view of NPU utilization.
# 1. One-shot status check
dxrt-cli -s
# 2. Live monitoring while a workload runs
dxtop
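The same check can be scripted, for example as a guard at the start of an inference service. The sketch below is a minimal illustration using only the Python standard library. The helper name `npu_node_status` is hypothetical, and the code assumes the driver exposes /dev/dxrt0 as a character device, which is typical for ioctl-based drivers but not stated on this page. It inspects only the device node, not NPU health, so it complements `dxrt-cli -s` rather than replacing it.

```python
import os
import stat


def npu_node_status(path: str = "/dev/dxrt0") -> str:
    """Classify the device node as 'ok', 'missing', 'not-a-device', or 'no-access'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"          # driver not loaded, or node not passed to the container
    except PermissionError:
        return "no-access"
    if not stat.S_ISCHR(mode):    # assumption: the NPU node is a character device
        return "not-a-device"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-access"
    return "ok"


if __name__ == "__main__":
    print("/dev/dxrt0:", npu_node_status())
```

A result of "missing" inside a container usually points at the `--device /dev/dxrt0` mapping rather than at the NPU itself.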
Activate the bundled Python environment
The dx_engine wheel is pre-installed inside the bundled venv. Activate it before importing.
source /opt/sixfab-dx/venv/bin/activate
python -c "import dx_engine; print(dx_engine.__version__)"
No pip install needed
The dx_engine wheel ships inside the sixfab-dx package and is already installed in /opt/sixfab-dx/venv. Use that venv directly. Avoid creating a parallel venv and copying files around.
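A script can confirm that it is running under the bundled interpreter before it tries to import dx_engine. The following is a minimal sketch, not part of sixfab-dx: the helper names `in_bundled_venv` and `module_available` are hypothetical, and the check relies on the standard behavior that `sys.prefix` points at the active virtual environment.

```python
import importlib.util
import sys

BUNDLED_VENV = "/opt/sixfab-dx/venv"  # path documented above


def in_bundled_venv(prefix: str = sys.prefix, venv: str = BUNDLED_VENV) -> bool:
    """True when the interpreter prefix is the bundled sixfab-dx venv."""
    return prefix.rstrip("/") == venv.rstrip("/")


def module_available(name: str) -> bool:
    """True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not in_bundled_venv():
        print(f"warning: running under {sys.prefix}, not {BUNDLED_VENV}")
    print("dx_engine importable:", module_available("dx_engine"))
```

Failing fast with a clear message here is cheaper to debug than an ImportError raised deep inside an application.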
Running the runtime inside a Docker container
Production ALPON X5 AI deployments run inference code inside Docker containers. The kernel driver lives on the host, so the container only needs access to the device node and the runtime libraries.
# Minimal run command: privileged mode + NPU device node
docker run --privileged --device /dev/dxrt0 -v $(pwd)/models:/models -it your-image:tag
The equivalent in docker-compose.yml:
services:
  inference:
    image: your-image:tag
    privileged: true
    devices:
      - "/dev/dxrt0:/dev/dxrt0"
    volumes:
      - ./models:/models
    restart: unless-stopped
The DEEPX Runtime communicates with the DEEPX DX-M1 through PCIe ioctls that are not exposed by default in unprivileged containers, so --privileged is currently required. If your threat model demands tighter isolation, restrict the container with a narrow seccomp profile or AppArmor policy.
Concurrent models
The DEEPX Runtime schedules NPU time across multiple concurrently loaded models automatically. A typical ALPON X5 AI deployment pairs a detector (for example YOLO) with a classifier or OCR head on the same device. You do not need to implement your own scheduler.
03 Sixfab Model Zoo
The Sixfab Model Zoo is a curated set of pre-compiled .dxnn models bundled with the sixfab-dx APT package. It includes a ready-to-run run_hello_world demo and validated builds for object detection, face detection, pose estimation, and instance segmentation. There is no compiler, no training pipeline, and no extra download: install sixfab-dx and you have working inference.
The Sixfab Model Zoo is the fastest way to validate your hardware and understand the inference pipeline before building your own application. The same runtime powers the demos and your production code, so any zoo demo can ship as-is or serve as a starting point.
Sixfab Model Zoo demos use the same sixfab-dx runtime you already installed. There is no separate dependency to fetch or compiler to set up. Production deployments can use a zoo model directly without any compilation step.
Quick demo: run_hello_world
The sixfab-dx package includes a ready-to-run YOLOv8 object detection demo. It draws bounding boxes around cars, people, and other objects in real time using a bundled sample video, so no camera is needed to get started.
# Activate the bundled venv (if not already active)
source /opt/sixfab-dx/venv/bin/activate
# Launch the YOLOv8 demo
run_hello_world
Expected performance summary on the DEEPX DX-M1 (25 TOPS):
PERFORMANCE SUMMARY
================================================
Pipeline Step    Avg Latency    Throughput
------------------------------------------------
Read                21.45 ms      46.6 FPS
Preprocess          14.11 ms      70.9 FPS
Inference          399.69 ms      16.0 FPS*
Postprocess          2.23 ms     449.0 FPS
Display             32.47 ms      30.8 FPS
------------------------------------------------
* Actual throughput via async inference
Overall FPS : 16.0 FPS
To watch the NPU in real time, open a second SSH session and run dxtop. Core utilization should sit at 80 to 90 percent during active inference.
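The columns in the performance summary are related by simple arithmetic, which helps when reading your own runs. The sketch below assumes that the synchronous stages handle one frame at a time, so throughput is roughly 1000 divided by the average latency in milliseconds, and it uses Little's law to estimate how many frames are in flight on the async inference stage. That interpretation of the summary is an assumption, and the computed values match the reported throughput only to within rounding.

```python
# Per-stage average latencies (ms) copied from the summary above.
latency_ms = {
    "Read": 21.45,
    "Preprocess": 14.11,
    "Postprocess": 2.23,
    "Display": 32.47,
}

# For a stage that handles one frame at a time, throughput is 1000 / latency.
for stage, ms in latency_ms.items():
    print(f"{stage:<12} {1000.0 / ms:6.1f} FPS")

# Inference is marked async: 399.69 ms per frame, yet 16.0 FPS overall.
# Little's law (frames in flight = throughput x latency) estimates the overlap.
inference_latency_s = 399.69 / 1000.0
overall_fps = 16.0
frames_in_flight = overall_fps * inference_latency_s
print(f"Estimated frames in flight on the NPU: {frames_in_flight:.1f}")
```

Under these assumptions roughly six frames are queued or executing at once, which is why a 400 ms per-frame latency does not cap the pipeline at 2.5 FPS.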
Available models
Pre-compiled .dxnn files included with the package. Performance figures are for the 25 TOPS DEEPX DX-M1 at default settings.
| Model file | Task | Performance |
|---|---|---|
| YoloV8N.dxnn | Object detection (nano + PPU) | ~35 FPS |
| YoloV8S.dxnn | Object detection (small) | FPS pending |
| YoloV8M.dxnn | Object detection (medium) | FPS pending |
| YoloV9S.dxnn | Object detection | FPS pending |
| YoloV9C.dxnn | Object detection (compact) | FPS pending |
| SCRFD500M.dxnn | Face detection | FPS pending |
| YoloV5Pose.dxnn | Pose estimation | FPS pending |
| YoloV26S-Seg.dxnn | Instance segmentation | FPS pending |
PPU (Post-Processing Unit) models execute non-maximum suppression and confidence filtering on the NPU itself, removing a major CPU bottleneck. The YOLOv8n + PPU variant in the zoo reaches ~35 FPS for this reason.
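To make the offloaded work concrete, the sketch below shows a plain CPU version of confidence filtering followed by greedy non-maximum suppression. It is a generic textbook illustration in pure Python, not the PPU's implementation: it is class-agnostic, the function names are hypothetical, and the default thresholds of 0.25 and 0.45 are illustrative values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        conf_threshold: float = 0.25, iou_threshold: float = 0.45) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Confidence filtering: drop weak candidates before the pairwise step.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.90, 0.80, 0.70, 0.10]
print(nms(boxes, scores))  # prints [0, 2]
```

The pairwise overlap test grows quadratically with the number of surviving candidates, which is why running this step on the CM5 CPU for every frame can dominate the pipeline.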
Supported camera inputs
The camera connects to the Raspberry Pi CM5 host; the NPU only receives preprocessed frames. Any of these sources work with zoo demos and with custom applications.
| Source | Invocation | Notes |
|---|---|---|
| USB Webcam | --camera_index 0 | Any UVC-compatible USB camera. Change the index for multi-camera setups. |
| Video file | -v video.mp4 | MP4, AVI, MKV, and any other format OpenCV decodes. |
| RTSP stream | -v rtsp://<ip>/stream | IP cameras over RTSP. Multi-stream pipelines are supported in the GitHub examples. |
| RPi Camera Module | libcamera / picamera2 | Capture frames in C++ with libcamera or in Python with picamera2 and feed them into the inference pipeline. |
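In a custom Python application, the first three sources in the table can share one entry point, because OpenCV's `cv2.VideoCapture` accepts either an integer camera index or a string path or URL. The sketch below is an illustration only: `parse_source` and `open_capture` are hypothetical helper names, not part of the zoo demos, and the RPi Camera Module path through picamera2 is not covered.

```python
from typing import Union


def parse_source(spec: str) -> Union[int, str]:
    """Map a CLI-style source string to a cv2.VideoCapture argument.

    A bare non-negative integer ("0", "1") is treated as a camera index;
    anything else (a file path or an rtsp:// URL) is passed through as text.
    """
    spec = spec.strip()
    return int(spec) if spec.isdecimal() else spec


def open_capture(spec: str):
    """Open the source with OpenCV; cv2 is imported lazily so parsing works without it."""
    import cv2  # provided by python3-opencv or opencv-python

    cap = cv2.VideoCapture(parse_source(spec))
    if not cap.isOpened():
        raise RuntimeError(f"could not open video source: {spec!r}")
    return cap
```

Calling `open_capture("0")` would open the first USB webcam, while `open_capture("rtsp://<ip>/stream")` hands the URL to OpenCV unchanged.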
04 DEEPX Model Zoo
The DEEPX Model Zoo is the upstream catalog of pre-compiled .dxnn models maintained by DEEPX. It is broader than the curated Sixfab Model Zoo and covers object detection, image classification, semantic segmentation, face detection and recognition, pose estimation, and OCR. Models ship in Q-Lite (fast INT8 default) and Q-Pro (fine-tuned, higher accuracy) quantization modes. Browse and download at developer.deepx.ai/modelzoo.
Categories and representative models
- Object detection: bounding-box detectors for real-time video analytics, safety monitoring, and inventory tracking.
- Image classification: multi-class image classifiers for product sorting, defect detection, and tagging pipelines.
- Semantic segmentation: per-pixel masks for road-scene understanding, medical imaging, and surface inspection.
- Face detection and recognition: detection, alignment, and embedding models for access control and attendance systems.
- Pose estimation: 2D keypoint regression for workplace safety analytics and human activity monitoring.
- OCR: text detection, classification, and recognition pipelines for document processing and industrial labels.
The Model Zoo catalog reflects what the DEEPX Runtime supports: image-based CNNs. Transformer-based models (ViT, DETR, LLMs) are not supported on the current runtime. Plan your architecture around the categories listed above.
Quantization modes: Q-Lite vs Q-Pro
Models in the zoo ship in two quantization flavors. Pick one based on your accuracy headroom and deployment timeline.
| Mode | Use case | Trade-off |
|---|---|---|
| Q-Lite | Default choice. Standard INT8 quantization optimized for fast inference and short compile times. | Lower accuracy floor than Q-Pro on sensitive models; typically negligible for well-trained detectors. |
| Q-Pro | Accuracy-sensitive workloads. High-precision quantization with fine-tuning to recover accuracy close to FP32. | Longer compile time. Use for production models where every mAP point matters. |
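For readers new to quantization, the sketch below illustrates where INT8 accuracy loss comes from. It is a generic example of symmetric per-tensor quantization in pure Python and makes no claim about how dx-com implements Q-Lite or Q-Pro, which this page does not describe. The function names and sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: real value ~= scale * int8 value."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate real values."""
    return [scale * x for x in q]


weights = [0.023, -0.751, 0.318, 1.27, -1.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 codes:", q)
print(f"scale: {scale:.4f}, max abs error: {max_err:.4f}")
```

Each value can be off by up to half a quantization step, so a tensor with a few large outliers forces a coarse step for everything else. Fine-tuned schemes aim to recover that loss, which is the general trade-off the table describes.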
How do I download a model from the Model Zoo?
Models are distributed as .dxnn files from the DEEPX developer portal. On the ALPON X5 AI, fetch the file directly with wget or curl and mount it into your container.
# 1. Keep models under /opt/models on the host
sudo mkdir -p /opt/models
cd /opt/models
# 2. Download a Q-Lite YOLOv8 nano model (example path)
sudo curl -L -O https://developer.deepx.ai/modelzoo/download/yolov8n_qlite.dxnn
Review the per-model license on the DEEPX developer portal before redistributing or shipping a commercial product derived from a zoo model.
05 Deploy a custom model
Deploying a custom model takes six steps: (1) export to ONNX on your development machine, (2) compile to .dxnn on an Ubuntu x86_64 host with dx-com, (3) copy the artifact to the ALPON X5 AI, (4) write a Python inference script using dx_engine, (5) build and run it inside a privileged Docker container against /dev/dxrt0, and (6) verify with dxrt-cli -s or dxtop. To skip compilation entirely, use a Sixfab Model Zoo or DEEPX Model Zoo build instead.
Compiler host requirements
The dx-com compiler runs on a separate Ubuntu x86_64 machine. ARM and aarch64 hosts are not supported for compilation. The ALPON X5 AI itself is the deployment target, not the build host. Compile once per model on Ubuntu; the resulting .dxnn file then runs offline on any number of ALPON X5 AI devices.
Compiling a typical vision model with dx-com takes approximately two hours. Run compilations overnight or on a CI worker. Once compiled, the .dxnn file loads instantly on the ALPON X5 AI and can be redeployed indefinitely without recompiling.
Prerequisites
- An ALPON X5 AI powered on and reachable over SSH, with ALPON X5 AI OS up to date.
- sixfab-dx installed on the device per the DEEPX Runtime section.
- Docker available on the device (included with ALPON X5 AI OS).
- An Ubuntu x86_64 host meeting the requirements above for the compile step.
- A .dxnn model, either downloaded from a Model Zoo or compiled with dx-com on your host PC.
Step-by-step walkthrough
Export your model to ONNX
On your host PC, export the trained model to ONNX with opset 13 or later. This example uses Ultralytics YOLOv8.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="onnx", opset=13, simplify=True)  # produces yolov8n.onnx
Compile the ONNX graph with dx-com
Run the DEEPX compiler on your host PC to produce a .dxnn artifact. PPU support, which offloads bounding-box decoding and NMS to the NPU, is configured inside the model .cfg file rather than as a CLI flag.
dx-com compile --model yolov8n.onnx --config yolov8n.cfg --output yolov8n.dxnn
Enable PPU inside yolov8n.cfg:
ppu:
  enabled: true
  type: yolo
  num_classes: 80
  conf_threshold: 0.25
  iou_threshold: 0.45
Copy the model to the ALPON X5 AI
Transfer the .dxnn artifact to a stable path on the device, typically /opt/models.
scp yolov8n.dxnn alpon@<device-ip>:/opt/models/
Write an inference script
On the ALPON X5 AI, write a minimal Python script that loads the model and runs one inference pass. Save it as infer.py.
import cv2
import numpy as np
from dx_engine import InferenceEngine

# 1) Load the compiled model onto the DX-M1 NPU
engine = InferenceEngine("/models/yolov8n.dxnn")

# 2) Prepare an input frame (BGR, 640x640 for YOLOv8n)
frame = cv2.imread("/models/sample.jpg")
input_tensor = cv2.resize(frame, (640, 640))

# 3) Run inference on the NPU
outputs = engine.run([input_tensor])

# 4) outputs is a list of numpy arrays; shape depends on the model
print("Output tensors:", [o.shape for o in outputs])
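The script resizes the frame straight to 640x640, which stretches non-square images. Many YOLO pipelines, including the Ultralytics default, instead letterbox the frame: scale it to fit while preserving aspect ratio, then pad the remainder. The sketch below computes only the geometry and is an optional alternative, not a documented requirement of the runtime. Whether a given .dxnn expects letterboxed input depends on how the model was trained and compiled, and the helper name `letterbox_params` is hypothetical.

```python
def letterbox_params(width: int, height: int, target: int = 640):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = (target - new_w) // 2   # any odd leftover pixel goes to the right
    pad_top = (target - new_h) // 2    # any odd leftover pixel goes to the bottom
    return new_w, new_h, pad_left, pad_top


print(letterbox_params(1280, 720))  # prints (640, 360, 0, 140)
```

The resized frame would then be padded with `cv2.copyMakeBorder`, and detected boxes mapped back to the source image by subtracting the padding and dividing by the scale.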
Build and run the container
Write a minimal Dockerfile that installs sixfab-dx and copies your script. Inside a fresh Debian base image, the Sixfab APT repository must be registered manually (the host ALPON X5 AI OS has it preconfigured, but a clean container does not).
FROM debian:trixie-slim

RUN apt-get update && apt-get install -y wget gnupg ca-certificates python3-opencv \
 && wget -qO - https://sixfab.github.io/sixfab_dx/public.gpg | gpg --dearmor -o /usr/share/keyrings/sixfab-dx.gpg \
 && echo "deb [signed-by=/usr/share/keyrings/sixfab-dx.gpg] https://sixfab.github.io/sixfab_dx trixie main" > /etc/apt/sources.list.d/sixfab-dx.list \
 && apt-get update && apt-get install -y sixfab-dx

COPY infer.py /app/infer.py
WORKDIR /app

# Use the pre-built Sixfab venv so dx_engine is importable
CMD ["/opt/sixfab-dx/venv/bin/python", "/app/infer.py"]
Build and run it on the device, mounting /opt/models and exposing the NPU device node.
docker build -t alpon-infer:latest .
docker run --rm --privileged --device /dev/dxrt0 -v /opt/models:/models alpon-infer:latest
Monitor the NPU
In another SSH session, run dxrt-cli -s for a one-shot status check, or dxtop for a live, htop-like view. Utilization should spike while your container is running.
# One-shot status, firmware version, PCIe link state
dxrt-cli -s
# Live monitoring; expect 80 to 90% core utilization
dxtop
Common pitfalls
| Symptom | Likely cause and fix |
|---|---|
| ImportError: No module named dx_engine | The script is running under the system Python instead of the bundled venv. Invoke /opt/sixfab-dx/venv/bin/python directly, or source /opt/sixfab-dx/venv/bin/activate first. |
| Cannot open /dev/dxrt0 inside container | Missing --privileged or --device /dev/dxrt0. Add both to docker run or the compose file. |
| Model fails to load, error references memory | Compiled .dxnn footprint exceeds 4 GB NPU memory. Recompile with a smaller input resolution or switch to a lighter model variant. |
| Extremely low FPS on YOLO | Model compiled without PPU support. Add a ppu block to the .cfg file and recompile. |
| Detections look right but accuracy is degraded | Color-channel mismatch. OpenCV loads frames as BGR; most models train on RGB. Convert explicitly with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) before inference. |
| Compile fails with unsupported operator | The ONNX graph contains an attention block or transformer operator. Switch to a CNN-based architecture; transformers are not supported on the current runtime. |
| Runtime reports a version mismatch on model load | An SDK update introduced breaking changes. Recompile the .dxnn file with the matching dx-com version. |
06 Performance & benchmarks
With PPU support enabled, a YOLO-nano class detector reaches approximately 50 FPS at 1280 × 720 and 20 to 25 FPS at 1920 × 1080 on the DEEPX DX-M1. Larger YOLO variants scale down proportionally. Actual throughput depends on the model variant, input resolution, pre-processing path, and whether PPU is compiled into the graph.
YOLO throughput reference
| Input resolution | Approximate FPS (PPU-compiled YOLO nano) | Notes |
|---|---|---|
| 1280 × 720 (HD) | ~50 FPS | Real-time processing for single-stream HD video analytics. |
| 1920 × 1080 (Full HD) | ~20 to 25 FPS | Headroom for additional CPU-side pre- and post-processing. |
PPU-compiled models handle bounding-box decoding and NMS on the NPU, which significantly reduces CPU overhead. Without PPU, post-processing runs on the CM5 CPU and can become the bottleneck on Full HD streams. Most YOLO variants support PPU compilation.
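The two reference points in the table are roughly consistent with throughput scaling inversely with pixel count: Full HD has 2.25 times the pixels of HD. The sketch below turns that rule of thumb into a first-order estimate. It is an approximation under a linear-cost assumption, not a DEEPX performance model, and the helper name `scale_fps` is hypothetical.

```python
def scale_fps(ref_fps: float, ref_res: tuple, new_res: tuple) -> float:
    """Estimate FPS at new_res, assuming cost grows linearly with pixel count."""
    ref_pixels = ref_res[0] * ref_res[1]
    new_pixels = new_res[0] * new_res[1]
    return ref_fps * ref_pixels / new_pixels


estimate = scale_fps(50.0, (1280, 720), (1920, 1080))
print(f"Estimated Full HD throughput: {estimate:.1f} FPS")  # prints 22.2 FPS
```

The estimate falls inside the 20 to 25 FPS range quoted for Full HD. Treat it as a planning figure only and confirm with a measurement on the device.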
How to benchmark your own model
Use a simple loop around engine.run() and measure wall-clock time over a warm window. Skip the first ~10 iterations to avoid warm-up noise.
import time
import numpy as np
from dx_engine import InferenceEngine

engine = InferenceEngine("/models/yolov8n.dxnn")
dummy = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up
for _ in range(10):
    engine.run([dummy])

# Timed window
N = 200
t0 = time.perf_counter()
for _ in range(N):
    engine.run([dummy])
dt = time.perf_counter() - t0
print(f"FPS: {N/dt:.1f} (n={N})")
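Average FPS hides latency spikes, which matter for real-time pipelines. The variation below also records per-call latency and reports percentiles using only the standard library. It is a sketch: the `benchmark` helper is hypothetical and takes any zero-argument callable, so the stand-in workload shown here would be replaced on the device with something like `lambda: engine.run([dummy])`.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 10, iters: int = 200) -> Dict[str, float]:
    """Time run_once per call; report throughput plus latency percentiles."""
    for _ in range(warmup):          # discard warm-up iterations
        run_once()
    samples_ms = []
    t0 = time.perf_counter()
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    total_s = time.perf_counter() - t0
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    return {
        "fps": iters / total_s,
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
    }


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs anywhere.
    stats = benchmark(lambda: sum(range(10_000)), warmup=5, iters=50)
    print({name: round(value, 3) for name, value in stats.items()})
```

A p95 latency far above the median usually indicates contention, for example a second model sharing the NPU or CPU-side work competing with the capture thread.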
Variables that move the numbers
- Model variant. Nano tier is the fastest; small, medium, and large variants drop FPS roughly proportional to compute.
- Input resolution. Doubling resolution roughly quarters FPS on detection models.
- PPU compilation. On or off is usually the largest single factor for YOLO.
- Quantization mode. Q-Lite runs slightly faster than Q-Pro; choose Q-Pro only if you measure an accuracy regression.
- Async inference. Use the runtime's async API to overlap NPU compute with CPU pre- and post-processing. Submit the next frame while the previous one is still on the NPU.
- Concurrent models. Running a detector and a classifier simultaneously shares NPU time; each runs slower than in isolation.
- Pre- and post-processing path. OpenCV decode plus color conversion on the CPU can bottleneck high-resolution pipelines. Consider GStreamer with hardware decode for sustained Full HD.
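The overlap idea behind the async inference item can be sketched without any runtime-specific API. The example below is not the DEEPX Runtime's async interface, which this page does not show. It uses a standard-library worker thread to preprocess the next frame while the current one is being inferred. `pipelined` is a hypothetical helper, and the overlap only pays off if the blocking inference call releases Python's GIL, which is an assumption about dx_engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def pipelined(frames: Iterable, preprocess: Callable, infer: Callable) -> Iterator:
    """Yield infer(preprocess(frame)) in order, preprocessing one frame ahead."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for frame in frames:
            upcoming = pool.submit(preprocess, frame)  # runs in the background
            if pending is not None:
                yield infer(pending.result())          # overlaps with `upcoming`
            pending = upcoming
        if pending is not None:
            yield infer(pending.result())              # drain the last frame


# Stand-in callables; on the device these would wrap cv2 and engine.run.
results = list(pipelined(range(5), lambda x: x + 1, lambda x: x * 2))
print(results)  # prints [2, 4, 6, 8, 10]
```

Results come back in submission order, so downstream code such as tracking or display does not need to reorder frames.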
Hard limits
| Limit | Detail |
|---|---|
| NPU memory ceiling | 4 GB on-chip. Models whose compiled footprint exceeds 4 GB will not load. |
| NPU power envelope | 2 W minimum, 5 W maximum under supported AI workloads. |
| Interface bandwidth | PCIe Gen3 x2, shared with the NVMe SSD via the on-board ASM2806I packet switch. |
| Supported architectures | Image-based CNNs. Transformer-based models are not supported on the current runtime. |
| Power-mode control | None. The NPU runs at a fixed performance profile; there is no software API for low-power or performance modes. |
Published FPS numbers are single-stream, dummy-input references. Real-world pipelines add frame capture, decode, color conversion, and result rendering, all of which consume CPU cycles on the CM5. Always measure end-to-end throughput under your exact workload before committing to a deployment budget.
07 Examples & references
Working code is the fastest way to evaluate the ALPON X5 AI for a new use case. The Sixfab DX examples repository contains 54 ready-to-run demos (28 Python, 26 C++) covering object detection across all supported YOLO variants, instance and semantic segmentation, pose estimation, OCR, face detection, PPU-accelerated and async pipelines, and analytics applications such as zone intrusion, people tracking, traffic counting, and queue analysis.
Quick demo run
After sixfab-dx is installed, clone the examples repository and run the auto-installer to fetch models and build the C++ demos.
git clone https://github.com/sixfab/sixfab-dx-examples
cd sixfab-dx-examples
./auto-install.sh
# Activate the bundled venv before running Python demos
source /opt/sixfab-dx/venv/bin/activate
# Launch the interactive Python demo menu
bash python_examples/start.sh
# Or the C++ menu
bash cpp_examples/start.sh
External references
- Sixfab DX runtime & APT repository: github.com/sixfab/sixfab_dx
- Sixfab DX examples (Python & C++): github.com/sixfab/sixfab-dx-examples
- DEEPX Model Zoo: developer.deepx.ai/modelzoo
- DEEPX dx_rt runtime source: github.com/DEEPX-AI/dx_rt
- DEEPX kernel driver source: github.com/DEEPX-AI/dx_rt_npu_linux_driver
- DEEPX full toolchain (compiler, simulator): github.com/DEEPX-AI/dx-all-suite
