AI Development on the ALPON
End-to-end developer reference for running edge AI workloads on the ALPON, an industrial edge AI computer built on the Raspberry Pi CM5. This page covers the DEEPX DX-M1 AI accelerator, the DEEPX toolchain, the pre-compiled Model Zoo, the path from an ONNX file to a running container, and the throughput you can expect from the device in production deployments.
Overview
The ALPON AI development workflow is a two-stage pipeline. You train or obtain a model in PyTorch, TensorFlow, Keras, XGBoost, or MXNet, export it to ONNX, and compile it to the .dxnn format with the DEEPX dx-com compiler on a host PC. You then deploy the .dxnn artifact in a Docker container on the ALPON, where the DEEPX Runtime executes it on the DX-M1 NPU through the dx_engine (Python) or dxrt_api.h (C++) API.
The ALPON is an edge AI computer designed for image-based AI inference at the edge. Hardware acceleration is provided by the DEEPX DX-M1, a 25 TOPS NPU with 4 GB of dedicated on-chip memory that is independent of the Raspberry Pi CM5 system RAM. All of the developer tooling described on this page targets that accelerator.
Two-stage pipeline
| Stage | Where | Tooling / artifact |
|---|---|---|
| Source | Training framework | PyTorch · TF · Keras |
| Interchange | ONNX graph | `.onnx` |
| Compile | Host PC | `dx-com` (DEEPX compiler) |
| Deploy | ALPON | `dx-rt` + `dx_engine`, `.dxnn` on the DX-M1 |
Host vs. container architecture
ALPON OS ships with the DEEPX kernel driver preinstalled. Your inference code runs inside Docker containers. This decouples host maintenance from application deployment: OS and driver updates arrive via OTA, while inference logic updates independently through container images.
Host (ALPON OS)
- DEEPX kernel driver
- PCIe enumeration of `/dev/dxrt0`
- OTA-managed by ALPON Cloud
- `sixfab-dx` runtime and CLI tools

Container (your image)
- DEEPX Runtime libraries (`libdxrt`)
- `dx_engine` / `dxrt_api.h`
- Compiled model files (`.dxnn`)
- Your inference application
Compilation runs on a host PC once per model. The .dxnn artifact is then copied to the ALPON and loaded at runtime. You do not need a DX-M1 attached to your development machine to compile.
What lives where
The toolchain is split cleanly between the host PC (your development laptop) and the target device (the ALPON). Keep this split in mind when wiring up build systems and CI pipelines.
| Component | Runs on | Role |
|---|---|---|
| `dx-com` | Host PC (x86 Linux) | Compiles ONNX graphs into `.dxnn` binaries optimized for the DX-M1 NPU. |
| `dx-sim` | Host PC (x86 Linux) | Bit-accurate simulator for validating compiled models before shipping them to the device. |
| `dx-tron` | Host PC (x86 Linux/Windows) | Graph viewer for `.dxnn` files, based on Netron. Used for inspection and debugging. |
| DEEPX kernel driver | ALPON (host OS) | Exposes the NPU as `/dev/dxrt0`. Preinstalled on ALPON OS. |
| `sixfab-dx` | ALPON (host OS) | Metapackage that installs and maintains the DEEPX Runtime, CLI tools, and Python bindings. |
| `libdxrt` (DEEPX Runtime) | ALPON (host or container) | Loads and executes `.dxnn` models on the NPU. Provides C++ and Python APIs. |
| `dx_engine` | ALPON (Python) | Python binding bundled inside the runtime virtual environment. |
| `dxrt_api.h` | ALPON (C++) | Native C++ header. Used for low-latency inference and embedded pipelines. |
Languages and APIs
The DEEPX Runtime ships first-class bindings for Python (dx_engine) and C++ (dxrt_api.h). Use Python for rapid prototyping, video analytics with OpenCV or GStreamer, and glue code. Use C++ when you need deterministic latency, embedded integration, or direct runtime control.
- Python (`dx_engine`): Bundled inside the runtime virtual environment. Best fit for video analytics, prototyping, and glue code around OpenCV or GStreamer.
- C++ (`dxrt_api.h`): Low-overhead native header. Use when you need deterministic latency, integration with existing C++ pipelines, or embedded applications linking the runtime directly.
AI Accelerator (DEEPX DX-M1)
The DEEPX DX-M1 is a dedicated ai edge accelerator (NPU) that delivers 25 TOPS of INT8 inference with 4 GB of on-chip memory. It is an M.2 2280 module connected to the Raspberry Pi CM5 host over PCIe Gen3, and it runs image-based AI models (CNNs, YOLO-family detectors, classification, segmentation) compiled from ONNX.
| Parameter | Value |
|---|---|
| Module | DEEPX DX-M1 (M.2 2280) |
| Inference throughput | 25 TOPS (INT8) |
| Dedicated memory | 4 GB on-chip. The NPU does not share system RAM with the CM5 host. |
| Host interface | PCIe Gen3 via the ASM2806I packet switch, shared with the NVMe SSD lane. |
| Power draw | 2 W minimum, 5 W maximum under supported AI workloads. |
| Model format | ONNX compiled to .dxnn with dx-com. |
| APIs | Python: dx_engine · C++: dxrt_api.h |
| Device node | /dev/dxrt0 |
| Monitoring tool | dxtop |
| Supported architectures | Image-based CNNs. Transformer-based models are not supported on the current runtime. |
| Power-mode control | None. The NPU runs at a fixed performance profile; there is no software API for low-power or performance modes. |
Supported model formats
Models must be exported to ONNX and compiled with the DEEPX compiler (dx-com) before they execute on the NPU. Source frameworks for ONNX export include PyTorch, TensorFlow, Keras, XGBoost, and MXNet. Any framework that emits a valid ONNX graph of supported operators works.
Supported workloads
The DX-M1 is optimized for image-based AI workloads. The runtime executes convolutional architectures reliably. Transformer-based models are not supported on the current runtime.
- Object detection: YOLO family (v5, v7, v8, YOLOX), SSD variants
- Classification: ResNet, MobileNet, EfficientNet
- Segmentation: U-Net and semantic segmentation backbones
- OCR: PaddleOCR-based detection, classification, and recognition heads
- Face detection and recognition: CNN-based pipelines
- Pose estimation: keypoint-regression CNNs
Known limitations
Plan model architecture and memory footprint around the following constraints before deployment:
- NPU memory ceiling: 4 GB on-chip. Models whose compiled footprint exceeds 4 GB will not load.
- Image-based CNNs only. Transformer-based models are not supported on the current runtime.
- No software power-mode control. The NPU runs at a fixed performance profile.
DEEPX Runtime
The DEEPX Runtime is preinstalled on ALPON OS. No action is required out of the box. To reinstall or update the runtime and its CLI tools, run a single command: sudo apt update && sudo apt install sixfab-dx. The sixfab-dx metapackage provides the kernel driver, runtime libraries, dxrt-cli, dxtop, run_model, and the bundled Python environment with dx_engine.
Install or reinstall the runtime
The runtime ships with every ALPON out of the box, so first-time users can skip ahead to the Deploy Your First Model section. Use the command below only when you need to reinstall, recover from a broken state, or update after a kernel upgrade.
```shell
sudo apt update && sudo apt install sixfab-dx
```
If ALPON OS applies a kernel update, the DEEPX kernel module needs to be rebuilt against the new kernel. Rerun sudo apt install sixfab-dx to trigger the rebuild. This is the same recovery path used after any kernel change.
Verify the runtime
Confirm the NPU is enumerated and the driver is healthy with dxrt-cli -s. This is the first command to run when diagnosing any issue.
```shell
dxrt-cli -s
```

```
DXRT v3.2.0
* Device 0: M1, Accelerator type
  * RT Driver version : v2.1.0
  * FW version : v2.5.0
  * Memory : LPDDR5x 6000 Mbps, 3.92 GiB
  * PCIe : Gen3 X1 [01:00:00]
  NPU 0: voltage 750 mV, clock 1000 MHz, temperature 46°C
  NPU 1: voltage 750 mV, clock 1000 MHz, temperature 46°C
  NPU 2: voltage 750 mV, clock 1000 MHz, temperature 46°C
```
Installed CLI tools
The sixfab-dx package installs a small set of command-line tools for diagnostics, monitoring, and headless inference.
- `dxrt-cli` — device status and diagnostics (`dxrt-cli -s`).
- `dxtop` — real-time utilization monitor, an `htop` for the NPU.
- `run_model` — runs a compiled `.dxnn` file without writing application code.

Using the Python API
The dx_engine Python library ships inside the runtime virtual environment. Activate it before importing.
```shell
source /usr/lib/libdxrt/dxrt-venv/bin/activate
python -c "import dx_engine; print(dx_engine.__version__)"
```
There is no pip install dx_engine command. The library is bundled inside sixfab-dx and loaded from the runtime venv. Do not create a parallel venv; use the one at /usr/lib/libdxrt/dxrt-venv.
Concurrent models
The DEEPX Runtime schedules NPU time across multiple concurrently loaded models automatically. A typical ALPON deployment pairs a detector (for example YOLO) with a classifier or OCR head on the same device. You do not need to implement your own scheduler.
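A minimal two-model sketch, assuming Model Zoo files under `/opt/models` mounted at `/models` (the filenames and the placeholder box are hypothetical, and `crop_box` is illustrative glue, not part of the runtime):

```python
import numpy as np

def crop_box(frame, box):
    """Crop a detection (x1, y1, x2, y2) out of a frame, clamped to bounds."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    return frame[y1:y2, x1:x2]

if __name__ == "__main__":
    import cv2
    from dx_engine import InferenceEngine  # available on the ALPON

    # The runtime schedules NPU time across both engines automatically.
    detector = InferenceEngine("/models/yolov5n_qlite.dxnn")  # hypothetical paths
    classifier = InferenceEngine("/models/classifier.dxnn")

    frame = cv2.imread("/models/sample.jpg")
    det_out = detector.run([cv2.resize(frame, (640, 640))])
    # ... decode det_out into (x1, y1, x2, y2) boxes for your model, then:
    for box in [(100, 100, 300, 300)]:  # placeholder box
        crop = crop_box(frame, box)
        cls_out = classifier.run([cv2.resize(crop, (224, 224))])
```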
Model Zoo
The DEEPX Model Zoo is a catalog of pre-compiled .dxnn models that run on the DX-M1 NPU without any conversion work. It covers the common edge AI tasks: object detection, image classification, semantic segmentation, face detection and recognition, pose estimation, and OCR. Models are available in two quantization flavors, Q-Lite (fast INT8 default) and Q-Pro (fine-tuned, higher accuracy). Browse and download at developer.deepx.ai/modelzoo.
Categories and representative models
Object Detection
Bounding-box detectors for real-time video analytics, safety monitoring, and inventory tracking.
Image Classification
Multi-class image classifiers for product sorting, defect detection, and tagging pipelines.
Semantic Segmentation
Per-pixel masks for road-scene understanding, medical imaging, and surface inspection.
Face Detection & Recognition
Detection, alignment, and embedding models for access control and attendance systems.
Pose Estimation
2D keypoint regression for workplace safety analytics and human activity monitoring.
OCR
Text detection, classification, and recognition pipelines for document processing and industrial labels.
The Model Zoo catalog reflects what the DEEPX Runtime supports: image-based CNNs. Transformer-based models (ViT, DETR, LLMs) are not supported on the current runtime. Plan your architecture around the categories listed above.
Quantization modes: Q-Lite vs Q-Pro
Models in the zoo ship in two quantization flavors. Pick one based on your accuracy headroom and deployment timeline.
| Mode | Use case | Trade-off |
|---|---|---|
| Q-Lite | Default choice. Standard INT8 quantization optimized for fast inference and short compile times. | Lower accuracy floor than Q-Pro on sensitive models; typically negligible for well-trained detectors. |
| Q-Pro | Accuracy-sensitive workloads. High-precision quantization with fine-tuning to recover accuracy close to FP32. | Longer compile time. Use for production models where every mAP point matters. |
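To decide between the two flavors empirically, run both on the same inputs and compare. A minimal sketch: the `top1_agreement` helper is hypothetical, the `.dxnn` paths are placeholders, and the device-side part assumes the `dx_engine` API shown elsewhere on this page.

```python
import numpy as np

def top1_agreement(logits_a, logits_b):
    """Fraction of samples where two output batches agree on the top-1 class."""
    a = np.asarray(logits_a).argmax(axis=-1)
    b = np.asarray(logits_b).argmax(axis=-1)
    return float((a == b).mean())

if __name__ == "__main__":
    import cv2
    from dx_engine import InferenceEngine  # available on the ALPON

    lite = InferenceEngine("/models/resnet50_qlite.dxnn")  # hypothetical paths
    pro = InferenceEngine("/models/resnet50_qpro.dxnn")

    img = cv2.resize(cv2.imread("/models/sample.jpg"), (224, 224))
    print("top-1 agreement:", top1_agreement(lite.run([img])[0], pro.run([img])[0]))
```

Run this over a representative sample of your own data; if agreement stays high, the faster Q-Lite build is usually sufficient.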
How do I download a model from the Model Zoo?
Models are distributed as .dxnn files from the DEEPX developer portal. On the ALPON, fetch the file directly with wget or curl and mount it into your container.
```shell
# Example layout: keep models under /opt/models on the host
sudo mkdir -p /opt/models
cd /opt/models

# Download a Q-Lite YOLOv5 nano model (example path)
sudo curl -L -O https://developer.deepx.ai/modelzoo/download/yolov5n_qlite.dxnn
```
Review the per-model license on the DEEPX developer portal before you redistribute or ship a commercial product derived from a zoo model.
Deploy Your First Model
Deploying your first model on the ALPON takes five steps: (1) export to ONNX, (2) compile to .dxnn on a host PC with dx-com, (3) copy the artifact to the ALPON, (4) run inference in a privileged Docker container against /dev/dxrt0, and (5) verify with dxtop. If you start from a Model Zoo download, skip steps 1 and 2 and go straight to deployment.
Prerequisites
- An ALPON powered on and reachable over SSH, with ALPON OS up to date.
- DEEPX Runtime healthy on the device. Verify with `dxrt-cli -s`; the runtime ships preinstalled.
- Docker available on the device (included with ALPON OS).
- A `.dxnn` model, either downloaded from the Model Zoo or compiled with `dx-com` on your host PC.
Step-by-step walkthrough
On your host PC, export the trained model to ONNX with opset 13 or later. This example uses Ultralytics YOLOv8.
```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="onnx", opset=13, simplify=True)  # produces yolov8n.onnx
```
Compile with `dx-com`. Run the DEEPX compiler on your host PC to produce a .dxnn artifact. Enable PPU support when available to offload post-processing to the NPU.
```shell
dx-com compile --model yolov8n.onnx --config yolov8n.cfg --output yolov8n.dxnn --ppu
```
Transfer the .dxnn artifact to a stable path on the device, typically /opt/models.
```shell
scp yolov8n.dxnn alpon@<device-ip>:/opt/models/
```
On the ALPON, write a minimal Python script that loads the model and runs one inference pass. Save it as infer.py.
```python
import cv2
import numpy as np
from dx_engine import InferenceEngine

# 1) Load the compiled model onto the DX-M1 NPU
engine = InferenceEngine("/models/yolov8n.dxnn")

# 2) Prepare an input frame (BGR, 640x640 for YOLOv8n)
frame = cv2.imread("/models/sample.jpg")
input_tensor = cv2.resize(frame, (640, 640))

# 3) Run inference on the NPU
outputs = engine.run([input_tensor])

# 4) outputs is a list of numpy arrays; shape depends on the model
print("Output tensors:", [o.shape for o in outputs])
```
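A plain `cv2.resize` distorts aspect ratio; YOLO models are usually fed a letterboxed frame instead. A numpy-only sketch of the padding arithmetic (the `letterbox` helper is illustrative; the 114-gray fill follows the common YOLO convention and should match how your model was trained and compiled):

```python
import numpy as np

def letterbox(frame, size=640, fill=114):
    """Resize preserving aspect ratio, padding the remainder with gray."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize with pure numpy indexing (cv2.resize also works)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[ys[:, None], xs]
    # Center the resized image on a gray canvas
    out = np.full((size, size, frame.shape[2]), fill, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out
```

Remember to undo the same scale and offsets when mapping output boxes back onto the original frame.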
Write a minimal Dockerfile that installs the runtime and copies your script. Since the kernel driver runs on the host, the container only needs user-space libraries.
```dockerfile
FROM debian:trixie-slim
RUN apt-get update && apt-get install -y sixfab-dx python3-opencv
COPY infer.py /app/infer.py
WORKDIR /app
# Use the bundled DEEPX venv so dx_engine is importable
CMD ["/usr/lib/libdxrt/dxrt-venv/bin/python", "/app/infer.py"]
```
Build and run it on the device, mounting /opt/models and exposing the NPU device node.
```shell
docker build -t alpon-infer:latest .
docker run --rm --privileged --device /dev/dxrt0 -v /opt/models:/models alpon-infer:latest
```
In another SSH session, run dxtop to confirm the NPU is active and memory is allocated. Utilization should spike while your container is running.
```shell
dxtop   # watch NPU utilization, memory, and temperature in real time
```
Common pitfalls
| Symptom | Likely cause and fix |
|---|---|
| `ImportError: No module named dx_engine` | The script is running under the system Python instead of the bundled venv. Invoke `/usr/lib/libdxrt/dxrt-venv/bin/python` or source the venv first. |
| Cannot open `/dev/dxrt0` inside container | Missing `--privileged` or `--device /dev/dxrt0`. Add both to `docker run` or the compose file. |
| Model fails to load, error references memory | Compiled `.dxnn` footprint exceeds the 4 GB NPU memory. Recompile with a smaller input resolution or switch to a lighter model variant. |
| Extremely low FPS on YOLO | Model compiled without PPU support; post-processing runs on the CPU. Recompile with `--ppu`. |
| Compile fails with unsupported operator | The ONNX graph contains an attention block or transformer operator. Switch to a CNN-based architecture; transformers are not supported on the current runtime. |
| `dxrt-cli` reports no device | Reboot and re-check. If still failing, rerun `sudo apt install sixfab-dx` (rebuilds the kernel module) and inspect `dmesg \| grep -i dx`. |
Docker Access
Run the container in privileged mode and expose the NPU device node with --privileged --device /dev/dxrt0. No additional drivers need to be installed inside the image because the kernel driver runs on the host.
Production ALPON deployments run inference code inside Docker containers. The kernel driver lives on the host, so the container only needs access to the device node and the runtime libraries.
```shell
# Minimal run command: privileged mode + NPU device node
docker run --privileged --device /dev/dxrt0 -v $(pwd)/models:/models -it your-image:tag
```
The equivalent in docker-compose.yml:
```yaml
services:
  inference:
    image: your-image:tag
    privileged: true
    devices:
      - "/dev/dxrt0:/dev/dxrt0"
    volumes:
      - ./models:/models
    restart: unless-stopped
```
The --privileged flag grants the container access to all host devices. The DEEPX Runtime uses PCIe ioctls to communicate with the DX-M1 that are not exposed in unprivileged containers, so --privileged is currently required. Limit it to containers that need NPU access, build from trusted base images, and do not expose privileged containers to untrusted networks without additional isolation (seccomp, AppArmor).
Performance & Benchmarks
With PPU (Post-Processing Unit) support enabled, a YOLO-nano class detector reaches approximately 50 FPS at 1280 x 720 and 20 to 25 FPS at 1920 x 1080 on the DEEPX DX-M1. Larger YOLO variants scale down proportionally. Actual throughput depends on model variant, input resolution, pre-processing path, and whether PPU is compiled into the graph.
YOLO throughput reference
| Input resolution | Approximate FPS (PPU-compiled YOLO nano) | Notes |
|---|---|---|
| 1280 x 720 (HD) | ~50 FPS | Real-time processing for single-stream HD video analytics. |
| 1920 x 1080 (Full HD) | ~20 to 25 FPS | Headroom for additional CPU-side pre- and post-processing. |
PPU-compiled models handle bounding-box decoding and NMS on the NPU, which reduces CPU overhead significantly. Without PPU, post-processing runs on the CM5 CPU and can become the bottleneck on Full HD streams. Most YOLO variants support PPU compilation.
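To make the trade-off concrete, this is the kind of per-frame CPU work a non-PPU pipeline must do after every inference: a minimal greedy NMS in numpy (illustrative only, not the DEEPX implementation, and box decoding is omitted).

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression. boxes: (N, 4) as x1, y1, x2, y2."""
    boxes = np.asarray(boxes, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of box i against all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop overlapping boxes
    return keep
```

On a Full HD stream with thousands of candidate boxes per frame, loops like this compete with decode and rendering for CM5 CPU time, which is why PPU compilation is usually the largest single performance lever.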
How to benchmark your own model
Use a simple loop around engine.run() and measure wall-clock time over a warm window. Skip the first ~10 iterations to avoid warm-up noise.
```python
import time
import numpy as np
from dx_engine import InferenceEngine

engine = InferenceEngine("/models/yolov8n.dxnn")
dummy = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up
for _ in range(10):
    engine.run([dummy])

# Timed window
N = 200
t0 = time.perf_counter()
for _ in range(N):
    engine.run([dummy])
dt = time.perf_counter() - t0
print(f"FPS: {N/dt:.1f} (n={N})")
```
Variables that move the numbers
- Model variant. Nano tier is the fastest; small, medium, and large variants drop FPS roughly proportional to compute.
- Input resolution. Doubling resolution roughly quarters FPS on detection models.
- PPU compilation. On or off is usually the largest single factor for YOLO.
- Quantization mode. Q-Lite runs slightly faster than Q-Pro; choose Q-Pro only if you measure an accuracy regression.
- Concurrent models. Running a detector and a classifier simultaneously shares NPU time; each will run slower than in isolation.
- Pre- and post-processing path. OpenCV decode plus color conversion on the CPU can bottleneck high-resolution pipelines. Consider GStreamer with hardware decode for sustained Full HD.
Optional: enable PCIe Gen3
The ALPON is tuned to run the DX-M1 on a Gen3 PCIe link. If you have modified the device tree or carrier configuration, verify the link speed reports Gen3 X1 after reboot.
```shell
dxrt-cli -s   # look for: PCIe : Gen3 X1
```
Hard limits
| Limit | Value |
|---|---|
| NPU memory ceiling | 4 GB on-chip. Models whose compiled footprint exceeds 4 GB will not load. |
| NPU power envelope | 2 W minimum, 5 W maximum under supported AI workloads. |
| Interface bandwidth | PCIe Gen3 x1 shared with the NVMe SSD via the ASM2806I packet switch. |
| Supported architectures | Image-based CNNs. Transformer-based models are not supported on the current runtime. |
| Power-mode control | None. The NPU runs at a fixed performance profile; there is no software API for low-power or performance modes. |
Published FPS numbers are single-stream, dummy-input references. Real-world pipelines add frame capture, decode, color conversion, and result rendering, all of which consume CPU cycles on the CM5. Always measure end-to-end throughput under your exact workload before committing to a deployment budget.
Monitoring & Power
Use dxtop on the ALPON host. It is a command-line monitor that reports NPU utilization, memory, temperature, voltage, and clock in real time, in the spirit of htop for CPUs or nvidia-smi for GPUs.
Live monitor: dxtop
```shell
# On the host or inside a privileged container with /dev/dxrt0 mounted
dxtop
```
For headless monitoring from a remote system, run dxtop over the ALPON Cloud remote terminal or an SSH session. Pipe the output into your logging stack if you want long-running metrics rather than an interactive view.
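As a sketch of such a logging hook, the script below polls `dxrt-cli -s` and extracts the per-NPU temperatures; `parse_npu_temps` is a hypothetical helper whose regex is matched against the status output format shown earlier on this page.

```python
import re
import subprocess
import time

def parse_npu_temps(status_text):
    """Extract per-NPU temperatures (°C) from dxrt-cli -s output."""
    return [int(t) for t in re.findall(r"temperature\s+(\d+)", status_text)]

if __name__ == "__main__":
    # Poll every 10 s; redirect stdout into your logging stack
    while True:
        out = subprocess.run(["dxrt-cli", "-s"],
                             capture_output=True, text=True).stdout
        print(time.strftime("%H:%M:%S"), parse_npu_temps(out), flush=True)
        time.sleep(10)
```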
Quick reference commands
```shell
# Full hardware status
dxrt-cli -s

# Real-time NPU utilization (q to quit)
dxtop

# Test a compiled model headlessly
run_model --model_path yolov8n.dxnn

# Reinstall or update the runtime
sudo apt update && sudo apt install sixfab-dx
```
Can I control DEEPX NPU power modes from software?
No. The DX-M1 on the ALPON runs at a fixed performance profile. There is currently no user-facing API to switch between "maximum performance" and "low power" modes. In practice the NPU draws 2 W idle to 5 W under full load on supported AI workloads, which is a narrow enough envelope that dynamic throttling has little operational value at the edge.
If your deployment is power-constrained, control the total inference budget at the application layer: reduce the input frame rate, skip alternate frames, or stop inference between triggers rather than throttling the silicon.
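A sketch of that application-layer budgeting: run inference only on every Nth frame and reuse the last result in between. The `FrameGate` class is illustrative; the camera and engine setup in the main block follows the walkthrough above.

```python
class FrameGate:
    """Run inference on every `stride`-th frame; reuse results otherwise."""
    def __init__(self, stride=3):
        self.stride = stride
        self.count = 0

    def should_infer(self):
        run = self.count % self.stride == 0
        self.count += 1
        return run

if __name__ == "__main__":
    import cv2
    from dx_engine import InferenceEngine  # available on the ALPON

    engine = InferenceEngine("/models/yolov8n.dxnn")
    gate = FrameGate(stride=3)  # roughly 1/3 of the full-rate NPU duty cycle
    cap = cv2.VideoCapture(0)
    last = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if gate.should_infer():
            last = engine.run([cv2.resize(frame, (640, 640))])
        # ... render `last` onto the frame, act on triggers, etc.
```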