
AI Development on the ALPON

End-to-end developer reference for running edge AI workloads on the ALPON, an industrial edge AI computer built on the Raspberry Pi CM5. This page covers the DEEPX DX-M1 AI accelerator, the DEEPX toolchain, the pre-compiled model zoo, the path from an ONNX file to a running container, and the throughput you can expect from the device in production deployments.

Overview

What is the ALPON AI development workflow?

The ALPON AI development workflow is a two-stage pipeline. You train or obtain a model in PyTorch, TensorFlow, Keras, XGBoost, or MXNet, export it to ONNX, and compile it to the .dxnn format with the DEEPX dx-com compiler on a host PC. You then deploy the .dxnn artifact in a Docker container on the ALPON, where the DEEPX Runtime executes it on the DX-M1 NPU through the dx_engine (Python) or dxrt_api.h (C++) API.

The ALPON is an edge AI computer designed for image-based inference at the edge. Hardware acceleration is provided by the DEEPX DX-M1, a 25 TOPS NPU with 4 GB of dedicated on-chip memory that is independent of the Raspberry Pi CM5 system RAM. All of the developer tooling described on this page targets that accelerator.

Two-stage pipeline

  1. Source: a training framework (PyTorch, TensorFlow, Keras) produces the trained model.
  2. Interchange: export the model to an ONNX graph (.onnx).
  3. Compile (host PC): dx-com, the DEEPX compiler, converts the graph to a .dxnn artifact.
  4. Deploy (ALPON): dx-rt and dx_engine execute the .dxnn on the DX-M1.

Host vs. container architecture

ALPON OS ships with the DEEPX kernel driver preinstalled. Your inference code runs inside Docker containers. This decouples host maintenance from application deployment: OS and driver updates arrive via OTA, while inference logic updates independently through container images.

Host (ALPON OS)

Preinstalled by Sixfab
  • DEEPX kernel driver
  • PCIe enumeration of /dev/dxrt0
  • OTA-managed by ALPON Cloud
  • sixfab-dx runtime and CLI tools

Container (your image)

Deployed by you
  • DEEPX Runtime libraries (libdxrt)
  • dx_engine / dxrt_api.h
  • Compiled model files (.dxnn)
  • Your inference application

Host vs. target

Compilation runs on a host PC once per model. The .dxnn artifact is then copied to the ALPON and loaded at runtime. You do not need a DX-M1 attached to your development machine to compile.

What lives where

The toolchain is split cleanly between the host PC (your development laptop) and the target device (the ALPON). Keep this split in mind when wiring up build systems and CI pipelines.

Component | Runs on | Role
dx-com | Host PC (x86 Linux) | Compiles ONNX graphs into .dxnn binaries optimized for the DX-M1 NPU.
dx-sim | Host PC (x86 Linux) | Bit-accurate simulator for validating compiled models before shipping them to the device.
dx-tron | Host PC (x86 Linux/Windows) | Graph viewer for .dxnn files, based on Netron. Used for inspection and debugging.
DEEPX kernel driver | ALPON (host OS) | Exposes the NPU as /dev/dxrt0. Preinstalled on ALPON OS.
sixfab-dx | ALPON (host OS) | Metapackage that installs and maintains the DEEPX Runtime, CLI tools, and Python bindings.
libdxrt (DEEPX Runtime) | ALPON (host or container) | Loads and executes .dxnn models on the NPU. Provides C++ and Python APIs.
dx_engine | ALPON (Python) | Python binding bundled inside the runtime virtual environment.
dxrt_api.h | ALPON (C++) | Native C++ header. Used for low-latency inference and embedded pipelines.

Languages and APIs

The DEEPX Runtime ships first-class bindings for Python (dx_engine) and C++ (dxrt_api.h). Use Python for rapid prototyping, video analytics with OpenCV or GStreamer, and glue code. Use C++ when you need deterministic latency, embedded integration, or direct runtime control.

Python
dx_engine

Bundled inside the runtime virtual environment. Best fit for video analytics, prototyping, and glue code around OpenCV or GStreamer.

C++
dxrt_api.h

Low-overhead native header. Use when you need deterministic latency, integration with existing C++ pipelines, or embedded applications linking the runtime directly.


AI Accelerator (DEEPX DX-M1)

What is the DEEPX DX-M1 on the ALPON?

The DEEPX DX-M1 is a dedicated edge AI accelerator (NPU) that delivers 25 TOPS of INT8 inference with 4 GB of on-chip memory. It is an M.2 2280 module connected to the Raspberry Pi CM5 host over PCIe Gen3, and it runs image-based AI models (CNNs, YOLO-family detectors, classification, segmentation) compiled from ONNX.

Parameter | Value
Module | DEEPX DX-M1 (M.2 2280)
Inference throughput | 25 TOPS (INT8)
Dedicated memory | 4 GB on-chip. The NPU does not share system RAM with the CM5 host.
Host interface | PCIe Gen3 via the ASM2806I packet switch, shared with the NVMe SSD lane.
Power draw | 2 W minimum, 5 W maximum under supported AI workloads.
Model format | ONNX compiled to .dxnn with dx-com.
APIs | Python: dx_engine · C++: dxrt_api.h
Device node | /dev/dxrt0
Monitoring tool | dxtop
Supported architectures | Image-based CNNs. Transformer-based models are not supported on the current runtime.
Power-mode control | None. The NPU runs at a fixed performance profile; there is no software API for low-power or performance modes.

Supported model formats

Models must be exported to ONNX and compiled with the DEEPX compiler (dx-com) before they execute on the NPU. Source frameworks for ONNX export include PyTorch, TensorFlow, Keras, XGBoost, and MXNet. Any framework that emits a valid ONNX graph of supported operators works.
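For reference, the sketch below shows the framework-level export path from PyTorch. It uses a torchvision ResNet-18 as a stand-in for your trained network; the model, file name, and input shape are illustrative, not prescriptive.

host PC python
# Illustrative: export a torchvision ResNet-18 to ONNX (swap in your own net)
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)  # stand-in for your trained model
model.eval()
dummy = torch.randn(1, 3, 224, 224)                # input shape your model expects
torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=13,
                  input_names=["input"], output_names=["output"])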

Supported workloads

The DX-M1 is optimized for image-based AI workloads. The runtime executes convolutional architectures reliably. Transformer-based models are not supported on the current runtime.

  • Object detection: YOLO family (v5, v7, v8, YOLOX), SSD variants
  • Classification: ResNet, MobileNet, EfficientNet
  • Segmentation: U-Net and semantic segmentation backbones
  • OCR: PaddleOCR-based detection, classification, and recognition heads
  • Face detection and recognition: CNN-based pipelines
  • Pose estimation: keypoint-regression CNNs

Known limitations

Plan model architecture and memory footprint around the following constraints before deployment.

  • Transformer architectures are not supported yet. Attention-heavy models (ViT, DETR, LLMs) will not run on the current runtime.
  • Image-based CNNs run reliably. Classification, object detection, segmentation, and OCR backbones (ResNet, MobileNet, EfficientNet, YOLO variants, U-Net) are the validated path.
  • 4 GB on-chip memory ceiling. Models whose compiled footprint exceeds 4 GB will not load on the NPU.
  • No software power-mode control. The NPU runs at a fixed performance profile; there is no user-facing API to toggle performance or low-power modes.

DEEPX Runtime

How do I install the DEEPX Runtime on the ALPON?

The DEEPX Runtime is preinstalled on ALPON OS. No action is required out of the box. To reinstall or update the runtime and its CLI tools, run a single command: sudo apt update && sudo apt install sixfab-dx. The sixfab-dx metapackage provides the kernel driver, runtime libraries, dxrt-cli, dxtop, run_model, and the bundled Python environment with dx_engine.

  • Package: sixfab-dx (APT metapackage)
  • Python API: dx_engine (bundled venv)
  • C++ API: dxrt_api.h (native header)
  • Monitor: dxtop (CLI, htop-like)
  • Status CLI: dxrt-cli (hardware status)
  • Model format: .dxnn (from ONNX via dx-com)

Install or reinstall the runtime

The runtime ships with every ALPON out of the box, so first-time users can skip ahead to the Deploy Your First Model section. Use the command below only when you need to reinstall, recover from a broken state, or update after a kernel upgrade.

terminal bash
sudo apt update && sudo apt install sixfab-dx
Kernel updates rebuild the driver

If ALPON OS applies a kernel update, the DEEPX kernel module needs to be rebuilt against the new kernel. Rerun sudo apt install sixfab-dx to trigger the rebuild. This is the same recovery path used after any kernel change.

Verify the runtime

Confirm the NPU is enumerated and the driver is healthy with dxrt-cli -s. This is the first command to run when diagnosing any issue.

terminal bash
dxrt-cli -s
Expected output:
DXRT v3.2.0
 * Device 0: M1, Accelerator type
 * RT Driver version  : v2.1.0
 * FW version         : v2.5.0
 * Memory : LPDDR5x 6000 Mbps, 3.92 GiB
 * PCIe   : Gen3 X1 [01:00:00]
NPU 0: voltage 750 mV, clock 1000 MHz, temperature 46°C
NPU 1: voltage 750 mV, clock 1000 MHz, temperature 46°C
NPU 2: voltage 750 mV, clock 1000 MHz, temperature 46°C

Installed CLI tools

The sixfab-dx package installs a small set of command-line tools for diagnostics, monitoring, and headless inference.

dxrt-cli
Hardware status, firmware version, and firmware update. First command to run when diagnosing any issue.
dxtop
Real-time NPU monitor with per-core utilization, temperature, voltage, and clock. Like htop for the NPU.
run_model
Headless model runner. Test any compiled .dxnn file without writing application code.

Using the Python API

The dx_engine Python library ships inside the runtime virtual environment. Activate it before importing.

terminal bash
source /usr/lib/libdxrt/dxrt-venv/bin/activate
python -c "import dx_engine; print(dx_engine.__version__)"
No separate pip install

There is no pip install dx_engine command. The library is bundled inside sixfab-dx and loaded from the runtime venv. Do not create a parallel venv; use the one at /usr/lib/libdxrt/dxrt-venv.

Concurrent models

The DEEPX Runtime schedules NPU time across multiple concurrently loaded models automatically. A typical ALPON deployment pairs a detector (for example YOLO) with a classifier or OCR head on the same device. You do not need to implement your own scheduler.
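A minimal sketch of what this looks like in practice, assuming the dx_engine.InferenceEngine API used throughout this page; the model paths and the second (classifier) model are illustrative:

concurrent.py python
import cv2
from dx_engine import InferenceEngine

# Two models resident at once; the runtime interleaves NPU time between them.
detector   = InferenceEngine("/models/yolov8n.dxnn")
classifier = InferenceEngine("/models/resnet18.dxnn")  # illustrative second model

frame = cv2.imread("/models/sample.jpg")
det_out = detector.run([cv2.resize(frame, (640, 640))])   # detector pass
cls_out = classifier.run([cv2.resize(frame, (224, 224))]) # classifier pass
print([o.shape for o in det_out], [o.shape for o in cls_out])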


Model Zoo

What is the DEEPX Model Zoo?

The DEEPX Model Zoo is a catalog of pre-compiled .dxnn models that run on the DX-M1 NPU without any conversion work. It covers the common edge AI tasks: object detection, image classification, semantic segmentation, face detection and recognition, pose estimation, and OCR. Models are available in two quantization flavors, Q-Lite (fast INT8 default) and Q-Pro (fine-tuned, higher accuracy). Browse and download at developer.deepx.ai/modelzoo.

Categories and representative models

Object Detection

Bounding-box detectors for real-time video analytics, safety monitoring, and inventory tracking.

Models: YOLOv5, YOLOv7, YOLOv8, YOLOX, SSD

Image Classification

Multi-class image classifiers for product sorting, defect detection, and tagging pipelines.

Models: ResNet-18/50, MobileNet v2/v3, EfficientNet-B0/B3

Semantic Segmentation

Per-pixel masks for road-scene understanding, medical imaging, and surface inspection.

Models: U-Net, DeepLabV3

Face Detection & Recognition

Detection, alignment, and embedding models for access control and attendance systems.

Models: RetinaFace, SCRFD, ArcFace

Pose Estimation

2D keypoint regression for workplace safety analytics and human activity monitoring.

Models: YOLOv8-Pose, HRNet

OCR

Text detection, classification, and recognition pipelines for document processing and industrial labels.

Models: PaddleOCR det, PaddleOCR cls, PaddleOCR rec

Architecture support

The Model Zoo catalog reflects what the DEEPX Runtime supports: image-based CNNs. Transformer-based models (ViT, DETR, LLMs) are not supported on the current runtime. Plan your architecture around the categories listed above.

Quantization modes: Q-Lite vs Q-Pro

Models in the zoo ship in two quantization flavors. Pick one based on your accuracy headroom and deployment timeline.

Mode | Use case | Trade-off
Q-Lite | Default choice. Standard INT8 quantization optimized for fast inference and short compile times. | Lower accuracy floor than Q-Pro on sensitive models; typically negligible for well-trained detectors.
Q-Pro | Accuracy-sensitive workloads. High-precision quantization with fine-tuning to recover accuracy close to FP32. | Longer compile time. Use for production models where every mAP point matters.

How do I download a model from the Model Zoo?

Models are distributed as .dxnn files from the DEEPX developer portal. On the ALPON, fetch the file directly with wget or curl and mount it into your container.

terminal bash
# Example layout: keep models under /opt/models on the host
sudo mkdir -p /opt/models
cd /opt/models
 
# Download a Q-Lite YOLOv5 nano model (example path)
sudo curl -L -O https://developer.deepx.ai/modelzoo/download/yolov5n_qlite.dxnn

Review the per-model license on the DEEPX developer portal before you redistribute or ship a commercial product derived from a zoo model.


Deploy Your First Model

How do I deploy my first AI model on the ALPON?

Deploying your first model on the ALPON takes six steps: (1) export to ONNX, (2) compile to .dxnn on a host PC with dx-com, (3) copy the artifact to the ALPON, (4) write an inference script, (5) run it in a privileged Docker container against /dev/dxrt0, and (6) verify with dxtop. If you start from a Model Zoo download, skip steps 1 and 2 and go straight to deployment.

Prerequisites

  • An ALPON powered on and reachable over SSH, with ALPON OS up to date.
  • DEEPX Runtime healthy on the device. Verify with dxrt-cli -s; the runtime ships preinstalled.
  • Docker available on the device (included with ALPON OS).
  • A .dxnn model, either downloaded from the Model Zoo or compiled with dx-com on your host PC.

Step-by-step walkthrough

Step 1: Export your model to ONNX

On your host PC, export the trained model to ONNX with opset 13 or later. This example uses Ultralytics YOLOv8.

host PC python
from ultralytics import YOLO
 
model = YOLO("yolov8n.pt")
model.export(format="onnx", opset=13, simplify=True)
# produces yolov8n.onnx

Step 2: Compile the ONNX graph with dx-com

Run the DEEPX compiler on your host PC to produce a .dxnn artifact. Enable PPU support when available to offload post-processing to the NPU.

host PC bash
dx-com compile \
  --model yolov8n.onnx \
  --config yolov8n.cfg \
  --output yolov8n.dxnn \
  --ppu

Step 3: Copy the model to the ALPON

Transfer the .dxnn artifact to a stable path on the device, typically /opt/models.

host PC bash
scp yolov8n.dxnn alpon@<device-ip>:/opt/models/

Step 4: Write an inference script

On the ALPON, write a minimal Python script that loads the model and runs one inference pass. Save it as infer.py.

infer.py python
import cv2
import numpy as np
from dx_engine import InferenceEngine
 
# 1) Load the compiled model onto the DX-M1 NPU
engine = InferenceEngine("/models/yolov8n.dxnn")
 
# 2) Prepare an input frame (BGR, 640x640 for YOLOv8n)
frame = cv2.imread("/models/sample.jpg")
input_tensor = cv2.resize(frame, (640, 640))
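# NOTE: input layout and normalization are typically fixed at compile time;
# match this preprocessing step to the settings in your dx-com config.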
 
# 3) Run inference on the NPU
outputs = engine.run([input_tensor])
 
# 4) outputs is a list of numpy arrays; shape depends on the model
print("Output tensors:", [o.shape for o in outputs])

Step 5: Build and run the container

Write a minimal Dockerfile that installs the runtime and copies your script. Since the kernel driver runs on the host, the container only needs user-space libraries.

Dockerfile docker
FROM debian:trixie-slim
 
RUN apt-get update && apt-get install -y \
    sixfab-dx python3-opencv
 
COPY infer.py /app/infer.py
WORKDIR /app
 
# Use the bundled DEEPX venv so dx_engine is importable
CMD ["/usr/lib/libdxrt/dxrt-venv/bin/python", "/app/infer.py"]

Build and run it on the device, mounting /opt/models and exposing the NPU device node.

ALPON bash
docker build -t alpon-infer:latest .
 
docker run --rm --privileged \
  --device /dev/dxrt0 \
  -v /opt/models:/models \
  alpon-infer:latest

Step 6: Monitor the NPU

In another SSH session, run dxtop to confirm the NPU is active and memory is allocated. Utilization should spike while your container is running.

terminal bash
dxtop
# watch NPU utilization, memory, and temperature in real time.

Common pitfalls

ImportError: No module named dx_engine
The script is running under the system Python instead of the bundled venv. Invoke /usr/lib/libdxrt/dxrt-venv/bin/python or source the venv first.

Cannot open /dev/dxrt0 inside container
Missing --privileged or --device /dev/dxrt0. Add both to docker run or the compose file.

Model fails to load, error references memory
Compiled .dxnn footprint exceeds the 4 GB NPU memory. Recompile with a smaller input resolution or switch to a lighter model variant.

Extremely low FPS on YOLO
Model compiled without PPU support; post-processing runs on the CPU. Recompile with --ppu.

Compile fails with unsupported operator
The ONNX graph contains an attention block or transformer operator. Switch to a CNN-based architecture; transformers are not supported on the current runtime.

dxrt-cli reports no device
Reboot and re-check. If still failing, rerun sudo apt install sixfab-dx (rebuilds the kernel module) and inspect dmesg | grep -i dx.

Docker Access

How do I access the NPU from inside a Docker container?

Run the container in privileged mode and expose the NPU device node with --privileged --device /dev/dxrt0. No additional drivers need to be installed inside the image because the kernel driver runs on the host.

Production ALPON deployments run inference code inside Docker containers. The kernel driver lives on the host, so the container only needs access to the device node and the runtime libraries.

terminal bash
# Minimal run command: privileged mode + NPU device node
docker run --privileged \
  --device /dev/dxrt0 \
  -v $(pwd)/models:/models \
  -it your-image:tag

The equivalent in docker-compose.yml:

docker-compose.yml yaml
services:
  inference:
    image: your-image:tag
    privileged: true
    devices:
      - "/dev/dxrt0:/dev/dxrt0"
    volumes:
      - ./models:/models
    restart: unless-stopped
Use privileged mode only where required

The --privileged flag grants the container access to all host devices. The DEEPX Runtime uses PCIe ioctls to communicate with the DX-M1 that are not exposed in unprivileged containers, so --privileged is currently required. Limit it to containers that need NPU access, build from trusted base images, and do not expose privileged containers to untrusted networks without additional isolation (seccomp, AppArmor).


Performance & Benchmarks

What FPS can I expect on the ALPON DX-M1?

With PPU (Post-Processing Unit) support enabled, a YOLO-nano class detector reaches approximately 50 FPS at 1280 x 720 and 20 to 25 FPS at 1920 x 1080 on the DEEPX DX-M1. Larger YOLO variants scale down proportionally. Actual throughput depends on model variant, input resolution, pre-processing path, and whether PPU is compiled into the graph.

YOLO throughput reference

Input resolution | Approximate FPS (PPU-compiled YOLO nano) | Notes
1280 x 720 (HD) | ~50 FPS | Real-time processing for single-stream HD video analytics.
1920 x 1080 (Full HD) | ~20 to 25 FPS | Headroom for additional CPU-side pre- and post-processing.

Compile with PPU whenever you can

PPU-compiled models handle bounding-box decoding and NMS on the NPU, which reduces CPU overhead significantly. Without PPU, post-processing runs on the CM5 CPU and can become the bottleneck on Full HD streams. Most YOLO variants support PPU compilation.

How to benchmark your own model

Use a simple loop around engine.run() and measure wall-clock time over a warm window. Skip the first ~10 iterations to avoid warm-up noise.

bench.py python
import time, numpy as np
from dx_engine import InferenceEngine
 
engine = InferenceEngine("/models/yolov8n.dxnn")
dummy  = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
 
# Warm-up
for _ in range(10):
    engine.run([dummy])
 
# Timed window
N = 200
t0 = time.perf_counter()
for _ in range(N):
    engine.run([dummy])
dt = time.perf_counter() - t0
print(f"FPS: {N/dt:.1f} (n={N})")

Variables that move the numbers

  • Model variant. Nano tier is the fastest; small, medium, and large variants drop FPS roughly proportional to compute.
  • Input resolution. Doubling resolution roughly quarters FPS on detection models.
  • PPU compilation. On or off is usually the largest single factor for YOLO.
  • Quantization mode. Q-Lite runs slightly faster than Q-Pro; choose Q-Pro only if you measure an accuracy regression.
  • Concurrent models. Running a detector and a classifier simultaneously shares NPU time; each will run slower than in isolation.
  • Pre- and post-processing path. OpenCV decode plus color conversion on the CPU can bottleneck high-resolution pipelines. Consider GStreamer with hardware decode for sustained Full HD; see the capture sketch after this list.
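As a starting point, the sketch below opens a hardware-decode GStreamer pipeline through OpenCV's GStreamer backend. The pipeline string and the v4l2h264dec decoder element are assumptions (the available decoder depends on your platform and OpenCV build), so treat this as a template rather than a verified pipeline.

pipeline.py python
import cv2
from dx_engine import InferenceEngine

engine = InferenceEngine("/models/yolov8n.dxnn")
# Illustrative pipeline: RTSP source -> H.264 depay/parse -> hardware decode -> BGR.
# v4l2h264dec is an assumption; substitute the decoder your platform exposes.
pipeline = (
    "rtspsrc location=rtsp://<camera-ip>/stream latency=100 ! "
    "rtph264depay ! h264parse ! v4l2h264dec ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink drop=true"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)  # needs GStreamer-enabled OpenCV
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    outputs = engine.run([cv2.resize(frame, (640, 640))])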

Optional: verify the PCIe Gen3 link

The ALPON is tuned to run the DX-M1 on a Gen3 PCIe link. If you have modified the device tree or carrier configuration, verify the link speed reports Gen3 X1 after reboot.

terminal bash
dxrt-cli -s   # look for: PCIe : Gen3 X1

Hard limits

Limit | Value
NPU memory ceiling | 4 GB on-chip. Models whose compiled footprint exceeds 4 GB will not load.
NPU power envelope | 2 W minimum, 5 W maximum under supported AI workloads.
Interface bandwidth | PCIe Gen3 x1 shared with the NVMe SSD via the ASM2806I packet switch.
Supported architectures | Image-based CNNs. Transformer-based models are not supported on the current runtime.
Power-mode control | None. The NPU runs at a fixed performance profile; there is no software API for low-power or performance modes.
Benchmark results are workload-dependent

Published FPS numbers are single-stream, dummy-input references. Real-world pipelines add frame capture, decode, color conversion, and result rendering, all of which consume CPU cycles on the CM5. Always measure end-to-end throughput under your exact workload before committing to a deployment budget.
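To find out where the cycles go, time each stage separately. A rough sketch, reusing the dx_engine API from the benchmark above; the video path is illustrative:

stage_bench.py python
import time
import cv2
from dx_engine import InferenceEngine

engine = InferenceEngine("/models/yolov8n.dxnn")
cap = cv2.VideoCapture("/models/sample.mp4")  # illustrative test clip
stages = {"capture": 0.0, "preprocess": 0.0, "inference": 0.0}
frames = 0
while frames < 200:
    t0 = time.perf_counter()
    ok, frame = cap.read()
    if not ok:
        break
    t1 = time.perf_counter()
    tensor = cv2.resize(frame, (640, 640))
    t2 = time.perf_counter()
    engine.run([tensor])
    t3 = time.perf_counter()
    stages["capture"] += t1 - t0
    stages["preprocess"] += t2 - t1
    stages["inference"] += t3 - t2
    frames += 1
for name, total in stages.items():
    print(f"{name}: {1000 * total / max(frames, 1):.1f} ms/frame")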


Monitoring & Power

How do I monitor NPU usage, memory, and temperature?

Use dxtop on the ALPON host. It is a command-line monitor that reports NPU utilization, memory, temperature, voltage, and clock in real time, in the spirit of htop for CPUs or nvidia-smi for GPUs.

Live monitor: dxtop

terminal bash
# On the host or inside a privileged container with /dev/dxrt0 mounted
dxtop

For headless monitoring from a remote system, run dxtop over the ALPON Cloud remote terminal or an SSH session. Pipe the output into your logging stack if you want long-running metrics rather than an interactive view.
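One low-effort approach is a poller that snapshots dxrt-cli -s on an interval and appends it to a file; a sketch, with an illustrative log path:

npu_logger.py python
import datetime
import subprocess
import time

LOG = "/var/log/npu-status.log"  # illustrative log path
while True:
    # Snapshot the same status text that dxrt-cli -s prints interactively
    status = subprocess.run(["dxrt-cli", "-s"], capture_output=True, text=True).stdout
    with open(LOG, "a") as f:
        f.write(f"--- {datetime.datetime.now().isoformat()} ---\n{status}\n")
    time.sleep(60)  # one snapshot per minute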

Quick reference commands

Quick reference bash
# Full hardware status
dxrt-cli -s
 
# Real-time NPU utilisation (q to quit)
dxtop
 
# Test a compiled model headlessly
run_model --model_path yolov8n.dxnn
 
# Reinstall or update the runtime
sudo apt update && sudo apt install sixfab-dx

Can I control DEEPX NPU power modes from software?

No. The DX-M1 on the ALPON runs at a fixed performance profile. There is currently no user-facing API to switch between "maximum performance" and "low power" modes. In practice the NPU draws 2 W idle to 5 W under full load on supported AI workloads, which is a narrow enough envelope that dynamic throttling has little operational value at the edge.

If your deployment is power-constrained, control the total inference budget at the application layer: reduce the input frame rate, skip alternate frames, or stop inference between triggers rather than throttling the silicon.