Performance enhancement of embedded object detection via neural hardware acceleration
Telecommunication Computing Electronics and Control
Abstract
This paper presents the first benchmark of You Only Look Once version 11 (YOLO11) on the Rockchip RK3566 neural processing unit (NPU) within the Orange Pi 3B platform. Performance was compared between the quad-core ARM Cortex-A55 central processing unit (CPU) and the integrated NPU using the COCO2017 dataset, evaluating latency, energy, and accuracy. NPU acceleration reduced latency by more than 80% and per-inference energy consumption by approximately 94%, with speedups of up to 16.7×, while keeping accuracy within 0.03 mean average precision (mAP) of the baseline. Average power remained nearly constant (3.60 W on the CPU vs. 3.59 W on the NPU), indicating that the efficiency gains stem from shorter inference time rather than lower power draw. Limitations included unstable INT8 quantization, caused by unsupported operators and a calibration-range mismatch, and minor CPU-side overhead in preprocessing and non-maximum suppression. The findings confirm that the RK3566 NPU delivers substantial efficiency gains with minimal accuracy loss, enabling compact, low-cost platforms to sustain modern object-detection workloads. This demonstrates that affordable NPUs can provide reliable, real-time artificial intelligence (AI) inference for embedded vision, internet of things (IoT), and robotics applications.
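The link between the reported speedup and the energy saving can be checked with a short calculation. The sketch below assumes that per-inference energy is simply average power multiplied by inference time, and that the peak 16.7× speedup is the case behind the ≈ 94% figure; the abstract does not state either explicitly, and the function name `energy_reduction` is illustrative only.

```python
def energy_reduction(p_cpu_w: float, p_npu_w: float, speedup: float) -> float:
    """Fractional per-inference energy saving of the NPU relative to the CPU.

    Assumes E = P_avg * t for both devices (an assumption, not a formula
    quoted from the paper). With t_npu = t_cpu / speedup, the CPU latency
    cancels and only the power ratio and the speedup remain.
    """
    energy_ratio = (p_npu_w / p_cpu_w) / speedup
    return 1.0 - energy_ratio


# Figures quoted in the abstract: 3.60 W (CPU), 3.59 W (NPU), up to 16.7x speedup.
saving = energy_reduction(3.60, 3.59, 16.7)
print(f"{saving:.1%}")  # prints 94.0%
```

Because the two power figures differ by less than 0.3%, the result is almost entirely determined by the speedup, which is consistent with the abstract's statement that the gains come from reduced inference time rather than lower wattage.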