AI Accelerator

Reference guide for the DEEPX DX-M1 Ultra NPU on the ALPON X5 AI — covering runtime installation, supported model formats, real-world performance benchmarks, and Docker deployment configuration.

Overview

| Parameter | Value |
| --- | --- |
| Module | DEEPX DX-M1 Ultra (M.2 2280) |
| AI Performance | 25 TOPS |
| Dedicated Memory | 4 GB on-chip (does not share system RAM) |
| Host Interface | PCIe Gen 3 (shared with the NVMe SSD via an ASM2806I switch) |
| Power Consumption | 2 W minimum / 5 W maximum under AI workloads |
| Monitoring Tool | dxtop (command-line, similar to htop for the NPU) |

Installing the DEEPX Runtime

The DEEPX Python library requires the DEEPX Runtime (libdxrt) to be installed first. Without the runtime, the Python library will not function. Run the following commands to install the runtime:

Step 1 — Import the GPG key:

wget -qO - https://durmazdev.github.io/dxrt-apt/public.gpg | \
sudo gpg --dearmor -o /usr/share/keyrings/dxrt-repo.gpg

Step 2 — Add the repository:

echo "deb [signed-by=/usr/share/keyrings/dxrt-repo.gpg] \
https://durmazdev.github.io/dxrt-apt trixie main" | \
sudo tee /etc/apt/sources.list.d/dxrt-repo.list

Step 3 — Install the runtime library:

sudo apt update && sudo apt install libdxrt

Step 4 — Activate the Python environment:

source /usr/lib/libdxrt/dxrt-venv/bin/activate

Driver Deployment Model

The DEEPX driver is installed at the host OS level and will come pre-installed on the ALPON OS image. Application code and the runtime environment are deployed inside Docker containers, enabling clean separation between the system layer and user workloads.


Supported AI Model Formats

Models must be converted to ONNX format before deployment on the DEEPX NPU. The following frameworks are supported for model export:

  • PyTorch
  • TensorFlow / Keras
  • XGBoost
  • MXNet
  • Any framework capable of exporting to ONNX format

Once in ONNX format, models are compiled for the DEEPX hardware using the DEEPX compiler toolchain before being deployed to the device.


Known Limitations

  • Transformer-based model architectures are not yet supported by the DEEPX Runtime.
  • Image-based models (CNNs, detection models), by contrast, work reliably.
  • Models requiring more than 4 GB of memory will not run on this accelerator.

Real-World Performance

Performance depends on how the model is compiled. Models compiled with PPU (Post-Processing Unit) support achieve significantly higher throughput:

| Resolution | Approximate FPS (PPU-compiled YOLO models) |
| --- | --- |
| 1280 × 720 (HD) | ~50 FPS |
| 1920 × 1080 (Full HD) | ~20–25 FPS |
📘 PPU support is available for most YOLO model variants. PPU-compiled models handle post-processing on the NPU itself, reducing CPU overhead. Without PPU, post-processing runs on the CPU and may reduce overall throughput.
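To put these throughput figures in context, FPS translates into a per-frame time budget. The sketch below is plain Python arithmetic (no DEEPX APIs) that checks whether the quoted rates keep up with a typical 30 FPS camera stream; the Full HD midpoint value is an assumption drawn from the ~20–25 FPS range above.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given throughput."""
    return 1000.0 / fps

# Approximate figures from the table above.
hd_fps = 50.0
full_hd_fps = 22.5   # midpoint of the ~20-25 FPS range

print(f"HD budget:      {frame_budget_ms(hd_fps):.1f} ms/frame")       # 20.0 ms
print(f"Full HD budget: {frame_budget_ms(full_hd_fps):.1f} ms/frame")  # 44.4 ms

# A 30 FPS camera delivers a frame every ~33.3 ms, so HD inference at
# ~50 FPS keeps up in real time, while Full HD (~20-25 FPS) does not.
camera_fps = 30.0
print("HD real-time:     ", hd_fps >= camera_fps)       # True
print("Full HD real-time:", full_hd_fps >= camera_fps)  # False
```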


Running Multiple Models Simultaneously

The DEEPX DX-M1 supports concurrent execution of multiple AI models. Models are scheduled and managed by the DEEPX Runtime.
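Since the runtime handles on-device scheduling, application code typically just issues inference calls from independent threads. The sketch below shows that dispatch pattern; `run_model` is a placeholder standing in for a real inference call (e.g. through dx_engine), not DEEPX API, and the simulated timing is illustrative only.

```python
import queue
import threading
import time

def run_model(name: str, frame: int) -> str:
    """Placeholder for a real inference call; here it just simulates work."""
    time.sleep(0.01)
    return f"{name}:frame{frame}"

results: "queue.Queue[str]" = queue.Queue()

def worker(model_name: str, n_frames: int) -> None:
    # One thread per model; the DEEPX Runtime interleaves the
    # submitted jobs on the NPU.
    for i in range(n_frames):
        results.put(run_model(model_name, i))

threads = [
    threading.Thread(target=worker, args=("detector", 3)),
    threading.Thread(target=worker, args=("classifier", 3)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.qsize())  # 6 results collected from the two concurrent models
```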


Task Allocation: CPU vs. NPU

Tasks can be split between the CM5 CPU and the DEEPX NPU. For optimal performance, it is strongly recommended to compile models with PPU support. PPU-compiled models perform all post-processing steps on the NPU, keeping CPU usage low. If a model is compiled without PPU support, post-processing will run on the CPU, which may create a bottleneck depending on workload complexity.
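For context on what "post-processing" involves for a detection model: it typically means confidence filtering plus non-maximum suppression (NMS). The minimal NumPy sketch below illustrates the kind of work that either the PPU or the CPU must perform after each inference; it is a generic greedy NMS, not the DEEPX implementation.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5) -> list:
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns indices of the boxes that survive."""
    order = scores.argsort()[::-1]  # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle against the current top box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

# Two heavily overlapping boxes plus one separate box: the weaker
# duplicate is suppressed, the distinct box survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

With PPU-compiled models this loop runs on the NPU; without PPU, it is exactly this per-frame CPU work that can become the bottleneck.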


Programming Languages & APIs

| Language | API / Library |
| --- | --- |
| Python | dx_engine |
| C++ | dxrt_api.h |

Using the NPU Inside Docker Containers

To access the DEEPX NPU from within a Docker container, use the following configuration:

  • Run the container in privileged mode (--privileged)
  • Mount the NPU device node: --device /dev/dxrt0

Example Docker run command:

docker run --privileged --device /dev/dxrt0 -it your-image:tag
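The same configuration can be expressed in Docker Compose; the service name and image below are placeholders, and only `privileged` and the `/dev/dxrt0` device mapping are taken from the options above.

```yaml
services:
  npu-app:
    image: your-image:tag
    privileged: true        # equivalent of --privileged
    devices:
      - /dev/dxrt0:/dev/dxrt0   # expose the NPU device node
    stdin_open: true        # -i
    tty: true               # -t
```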

Monitoring NPU Performance

Use the dxtop command to monitor the NPU's real-time status, including utilization and memory consumption:

dxtop

Power Mode Control

There is currently no software interface for switching between NPU power modes. The NPU operates at a fixed performance level and does not support user-configurable power profiles.