Reference Models

Timings

The tables in this section contain inference timings for a set of representative models. The quantized models have been imported and compiled offline using the SyNAP toolkit. The floating-point models are benchmarked for comparison purposes with the corresponding quantized models.
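For reference, the offline import and compilation step looks roughly like the sketch below. The synap convert sub-command and the options shown are assumptions based on the toolkit documentation; the exact invocation may differ in your installation.

    # Compile a quantized tflite model offline for a given target NPU (options illustrative)
    synap convert --model mobilenet_v2_1.0_224_quant.tflite --target VS680 --out-dir compiled

The resulting compiled model is what the Offline NPU columns in the tables below measure.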

The mobilenet_v1, mobilenet_v2, posenet and inception models are open-source models available in tflite format from the TensorFlow Hosted Models page: https://www.tensorflow.org/lite/guide/hosted_models

yolov5 models are available from https://github.com/ultralytics/yolov5, while yolov5_face comes from https://github.com/deepcam-cn/yolov5-face.

Other models come from the AI-Benchmark APK: https://ai-benchmark.com/ranking_IoT.html.

Some of the models are Synaptics proprietary, including test models, object detection (mobilenet224), super-resolution and format conversion models.

The model test_64_128x128_5_132_132 has been designed to take maximum advantage of the computational capabilities of the NPU. It consists of 64 5x5 convolutions with a [1, 128, 128, 132] input and output. Its execution requires 913,519,411,200 operations (about 0.913 TOPs). Inference times show that in the right conditions VS640 and SL1640 achieve above 1.6 TOP/s, while VS680 and SL1680 are able to achieve above 7.9 TOP/s. For 16-bit inference the maximum throughput can be achieved with test_64_64x64_5_132_132: with this model we achieve 0.45 TOP/s on VS640/SL1640 and above 1.7 TOP/s on VS680/SL1680. For actual models used in practice it is very difficult to get close to this level of performance, and it is hard to predict the inference time of a model from the number of operations it contains. The only reliable way is to execute the model on the platform and measure.
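As an example of the arithmetic, throughput in TOP/s is simply the operation count divided by the measured inference time; the 114 ms below is a hypothetical measurement, not a reference timing:

    # TOP/s = operations / inference time (time converted from milliseconds)
    ops=913519411200   # operations in one inference of test_64_128x128_5_132_132
    ms=114             # hypothetical measured inference time in milliseconds
    echo "scale=2; $ops / ($ms * 10^9)" | bc   # prints ~8.01, i.e. about 8 TOP/s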

Remarks:

  • In the following tables all timing values are expressed in milliseconds

  • The Online CPU and Online NPU columns report the inference times obtained by running the original tflite model directly on the board (online conversion)

  • Online CPU tests have been done with 4 threads (--num_threads=4) on both VS680 and VS640

  • Online CPU tests of floating-point models on VS640 have been done in fp16 mode (--allow_fp16=true)

  • Online NPU tests have been done with the timvx delegate (--external_delegate_path=libvx_delegate.so); example invocations are shown after these remarks

  • The Offline NPU Infer column reports the inference time obtained with a model converted offline using the SyNAP toolkit (median time over 10 consecutive inferences)

  • The Online timings represent the minimum time measured (for both init and inference). We took the minimum instead of the average because this measure is less sensitive to outliers caused by the test process being temporarily suspended by the CPU scheduler

  • Online timings, in particular for init and CPU inference, can be influenced by other processes running on the board and by the total amount of free memory available. We ran all tests on 64-bit Android AOSP with 4 GB of memory on VS680 and 2 GB on VS640. Running on Android GMS, on a 32-bit OS, or with less memory can result in longer init and inference times

  • Timings for SL1640 and SL1680 correspond to those of VS640 and VS680, respectively

  • Offline tests have been done with non-contiguous memory allocation and no cache flush

  • Models marked with * come precompiled and preinstalled on the platform
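As an example of the online test setup described above, the measurements can be reproduced with the standard TensorFlow Lite benchmark_model tool, assuming the tool and the timvx delegate library are available on the board (paths and the model name are illustrative):

    # Online CPU inference, 4 threads (add --allow_fp16=true for float models on VS640)
    ./benchmark_model --graph=mobilenet_v2_1.0_224_quant.tflite --num_threads=4

    # Online NPU inference through the timvx external delegate
    ./benchmark_model --graph=mobilenet_v2_1.0_224_quant.tflite \
        --external_delegate_path=libvx_delegate.so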

Table 2 Inference timings on VS680, 64-bit OS, 4 GB memory

Model                                        | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer
---------------------------------------------|------------------|------------------|-----------------|------------------|------------------|------------------
inception_v4_299_quant                       | 610.15           |                  | 24486           | 18.93            | 100.79           | 19.59
mobilenet_v1_0.25_224_quant *                | 2.68             |                  | 250             | 1.30             | 3.70             | 0.77
mobilenet_v2_1.0_224_quant *                 | 16.62            |                  | 1353            | 2.46             | 10.98            | 1.79
convert_nv12@1920x1080_rgb@1920x1080 *       |                  |                  |                 |                  | 17.52            | 32.30
convert_nv12@1920x1080_rgb@224x224 *         |                  |                  |                 |                  | 14.25            | 1.49
convert_nv12@1920x1080_rgb@640x360 *         |                  |                  |                 |                  | 13.55            | 5.14
sr_fast_y_uv_1280x720_3840x2160 *            |                  | 299              |                 | 36.26            | 18.09            | 12.04
sr_fast_y_uv_1920x1080_3840x2160 *           |                  | 776              |                 | 56.49            | 20.40            | 17.50
sr_qdeo_y_uv_1280x720_3840x2160 *            |                  | 153              |                 | 36.34            | 21.84            | 21.50
sr_qdeo_y_uv_1920x1080_3840x2160 *           |                  | 246              |                 | 41.96            | 24.11            | 26.97
posenet_mobilenet_075_float *                | 43.03            |                  |                 |                  |                  | 53.71
posenet_mobilenet_075_quant                  | 39.01            |                  | 564             | 6.89             | 1.84             | 2.32
mobilenet224_full80 *                        |                  |                  |                 |                  | 755.52           | 26.14
yolov5m-640x480                              |                  |                  | 11464           | 167.74           | 54.11            | 118.82
yolov5s-640x480                              |                  |                  | 4506            | 111.81           | 22.17            | 75.83
yolov5s_face_640x480_onnx_mq *               |                  |                  |                 |                  | 21.98            | 35.31
mobilenet224_full1 *                         |                  |                  |                 |                  | 615.31           | 16.02
deeplab_v3_plus_quant                        | 297.82           |                  | 4693            | 62.48            | 7.68             | 59.81
dped_quant                                   | 335.63           |                  | 1191            | 9.58             | 4.74             | 8.82
inception_v3_float                           | 432.57           |                  |                 |                  |                  | 415.76
inception_v3_quant                           | 328.16           |                  | 13504           | 10.62            | 59.55            | 10.22
mobilenet_v2_b4_quant                        | 67.14            |                  | 1445            | 14.08            | 11.53            | 13.63
mobilenet_v2_float                           | 28.16            |                  |                 |                  |                  | 29.84
mobilenet_v2_quant                           | 16.57            |                  | 1431            | 2.65             | 9.27             | 1.98
mobilenet_v3_quant                           | 59.63            |                  | 1760            | 10.62            | 13.15            | 10.15
pynet_quant                                  | 1067.53          |                  | 6389            | 19.76            | 24.45            | 19.30
srgan_quant                                  | 1816.22          |                  | 4680            | 56.43            | 14.72            | 56.95
unet_quant                                   | 288.01           |                  | 906             | 10.38            | 7.73             | 14.80
vgg_quant                                    | 1641.18          |                  | 1987            | 30.50            | 10.74            | 30.07
test_64_128x128_5_132_132                    |                  |                  |                 |                  | 50.07            | 119.34
sublima_cnn_model_relu_400_pruned_mq         |                  |                  |                 |                  | 1.49             | 45.69
sublima_cnn_model_relu_400_pruned_uint8      |                  |                  |                 |                  | 1.49             | 27.12
sublima_cnn_model_relu_400_pruned_uint8full  |                  |                  |                 |                  | 1.49             | 26.25

Table 3 Inference timings on VS640, 64-bit OS, 2 GB memory

Model                                        | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer
---------------------------------------------|------------------|------------------|-----------------|------------------|------------------|------------------
inception_v4_299_quant                       | 1006.29          |                  | 38736           | 54.07            | 127.13           | 53.82
mobilenet_v1_0.25_224_quant *                | 5.12             |                  | 381             | 1.82             | 4.99             | 0.93
mobilenet_v2_1.0_224_quant *                 | 29.30            |                  | 1947            | 3.20             | 14.21            | 2.31
convert_nv12@1920x1080_rgb@1920x1080 *       |                  |                  |                 |                  | 17.48            | 34.49
convert_nv12@1920x1080_rgb@224x224 *         |                  |                  |                 |                  | 15.14            | 1.25
convert_nv12@1920x1080_rgb@640x360 *         |                  |                  |                 |                  | 14.70            | 5.29
sr_fast_y_uv_1280x720_3840x2160 *            |                  | 309              |                 | 54.75            | 17.87            | 17.01
sr_fast_y_uv_1920x1080_3840x2160 *           |                  | 618              |                 | 88.67            | 20.35            | 25.90
sr_qdeo_y_uv_1280x720_3840x2160 *            |                  |                  |                 |                  | 20.33            | 26.16
sr_qdeo_y_uv_1920x1080_3840x2160 *           |                  |                  |                 |                  | 22.03            | 33.56
posenet_mobilenet_075_float *                | 125.53           |                  |                 |                  |                  | 90.06
posenet_mobilenet_075_quant                  | 49.06            |                  | 827             | 10.80            | 2.48             | 4.13
mobilenet224_full80 *                        |                  |                  |                 |                  | 718.96           | 52.98
yolov5m-640x480                              |                  |                  | 17885           | 234.41           | 60.64            | 178.00
yolov5s-640x480                              |                  |                  | 7132            | 145.38           | 24.90            | 103.36
yolov5s_face_640x480_onnx_mq *               |                  |                  |                 |                  | 27.31            | 63.06
mobilenet224_full1 *                         |                  |                  |                 |                  | 595.52           | 36.53
deeplab_v3_plus_quant                        | 442.51           |                  | 4877            | 84.37            | 8.31             | 70.85
dped_quant                                   | 630.28           |                  | 1287            | 26.65            | 6.71             | 25.72
inception_v3_float                           | 991.44           |                  |                 |                  |                  | 706.98
inception_v3_quant                           | 536.96           |                  | 21292           | 31.00            | 80.70            | 29.82
mobilenet_v2_b4_quant                        | 120.55           |                  | 2144            | 19.72            | 13.60            | 18.39
mobilenet_v2_float                           | 70.24            |                  |                 |                  |                  | 49.92
mobilenet_v2_quant                           | 29.37            |                  | 2097            | 3.33             | 13.81            | 2.44
mobilenet_v3_quant                           | 107.90           |                  | 2770            | 13.62            | 16.38            | 11.91
pynet_quant                                  | 1932.73          |                  | 10494           | 59.03            | 31.04            | 56.30
srgan_quant                                  | 2766.75          |                  | 5232            | 121.92           | 15.97            | 121.75
unet_quant                                   | 543.22           |                  | 1474            | 19.93            | 9.93             | 24.20
vgg_quant                                    | 2969.34          |                  | 2871            | 103.66           | 10.65            | 102.65
test_64_128x128_5_132_132                    |                  |                  |                 |                  | 63.95            | 563.81

Super Resolution

Synaptics provides two proprietary families of super-resolution models: fast and qdeo. The former provides better inference time, the latter better upscaling quality. They can be tested using the synap_cli_ip application, see synap_cli_ip Application.

These models are preinstalled in $MODELS/image_processing/super_resolution.
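A quick way to check what is installed on the target is to list that directory (this assumes a shell on the board and the $MODELS variable set as in the SyNAP environment):

    # List the preinstalled super-resolution models; names correspond to Table 4
    ls $MODELS/image_processing/super_resolution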

Table 4 Synaptics SuperResolution Models on Y+UV Channels

Name                             | Input Image | Output Image | Factor
---------------------------------|-------------|--------------|-------
sr_fast_y_uv_960x540_3840x2160   | 960x540     | 3840x2160    | 4
sr_fast_y_uv_1280x720_3840x2160  | 1280x720    | 3840x2160    | 3
sr_fast_y_uv_1920x1080_3840x2160 | 1920x1080   | 3840x2160    | 2
sr_qdeo_y_uv_960x540_3840x2160   | 960x540     | 3840x2160    | 4
sr_qdeo_y_uv_1280x720_3840x2160  | 1280x720    | 3840x2160    | 3
sr_qdeo_y_uv_1920x1080_3840x2160 | 1920x1080   | 3840x2160    | 2
sr_qdeo_y_uv_640x360_1920x1080   | 640x360     | 1920x1080    | 3

Format Conversion

Conversion models can be used to convert an image from NV12 format to RGB. A set of models is provided for the most commonly used resolutions. These models have been generated by taking advantage of the preprocessing feature of the SyNAP toolkit (see Preprocessing) and can be used to convert an image so that it can be fed to a processing model with RGB input.

These models are preinstalled in $MODELS/image_processing/preprocess and can be tested using the synap_cli_ic2 application, see synap_cli_ic2 Application.
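The model names encode the conversion performed: input format and resolution, followed by output format and resolution. The following snippet is an illustrative helper, not part of the SDK, showing how the naming convention can be parsed in a shell script:

    # Decode a model name of the form convert_nv12@<in_res>_rgb@<out_res>
    name="convert_nv12@1920x1080_rgb@224x224"
    in_res=${name#convert_nv12@}; in_res=${in_res%%_rgb@*}
    out_res=${name##*_rgb@}
    echo "NV12 ${in_res} -> RGB ${out_res}"   # prints: NV12 1920x1080 -> RGB 224x224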

Table 5 Synaptics Conversion Models NV12 to RGB 224x224

Name                                 | Input Image (NV12) | Output Image (RGB)
-------------------------------------|--------------------|-------------------
convert_nv12@426x240_rgb@224x224     | 426x240            | 224x224
convert_nv12@640x360_rgb@224x224     | 640x360            | 224x224
convert_nv12@854x480_rgb@224x224     | 854x480            | 224x224
convert_nv12@1280x720_rgb@224x224    | 1280x720           | 224x224
convert_nv12@1920x1080_rgb@224x224   | 1920x1080          | 224x224
convert_nv12@2560x1440_rgb@224x224   | 2560x1440          | 224x224
convert_nv12@3840x2160_rgb@224x224   | 3840x2160          | 224x224
convert_nv12@7680x4320_rgb@224x224   | 7680x4320          | 224x224

Table 6 Synaptics Conversion Models NV12 to RGB 640x360

Name                                 | Input Image (NV12) | Output Image (RGB)
-------------------------------------|--------------------|-------------------
convert_nv12@426x240_rgb@640x360     | 426x240            | 640x360
convert_nv12@640x360_rgb@640x360     | 640x360            | 640x360
convert_nv12@854x480_rgb@640x360     | 854x480            | 640x360
convert_nv12@1280x720_rgb@640x360    | 1280x720           | 640x360
convert_nv12@1920x1080_rgb@640x360   | 1920x1080          | 640x360
convert_nv12@2560x1440_rgb@640x360   | 2560x1440          | 640x360
convert_nv12@3840x2160_rgb@640x360   | 3840x2160          | 640x360
convert_nv12@7680x4320_rgb@640x360   | 7680x4320          | 640x360

Table 7 Synaptics Conversion Models NV12 to RGB 1920x1080

Name                                  | Input Image (NV12) | Output Image (RGB)
--------------------------------------|--------------------|-------------------
convert_nv12@426x240_rgb@1920x1080    | 426x240            | 1920x1080
convert_nv12@640x360_rgb@1920x1080    | 640x360            | 1920x1080
convert_nv12@854x480_rgb@1920x1080    | 854x480            | 1920x1080
convert_nv12@1280x720_rgb@1920x1080   | 1280x720           | 1920x1080
convert_nv12@1920x1080_rgb@1920x1080  | 1920x1080          | 1920x1080
convert_nv12@2560x1440_rgb@1920x1080  | 2560x1440          | 1920x1080
convert_nv12@3840x2160_rgb@1920x1080  | 3840x2160          | 1920x1080
convert_nv12@7680x4320_rgb@1920x1080  | 7680x4320          | 1920x1080