[Question] Object Detection running with UMat and/or OpenCL target noticeably slower
Hey everyone
I have a question regarding the Transparent API / Preferable Target and hope someone can help me understand.
My Object Detection program takes a lot longer to process images when using
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
or
image = cv2.imread(filePath, cv2.COLOR_BGR2RGB)
uMat = cv2.UMat(image)
I've created 4 benchmark programs running sequentially, processing the same 10 .jpg files.
My baseline is a standard OpenCV object detection program, using neither setPreferableTarget nor the UMat class for images.
The second one sets the preferable target to cv2.dnn.DNN_TARGET_OPENCL_FP16.
The third converts images into UMat objects.
The fourth sets the preferable target to cv2.dnn.DNN_TARGET_OPENCL_FP16 and converts images into UMat objects. (A rough sketch of the four configurations follows below.)
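For reference, this is roughly how the four configurations differ (the model and image file names are placeholders, not my actual files):

```python
import cv2

# Placeholder model files; the real benchmarks load my own detection model.
net = cv2.dnn.readNet("model.weights", "model.cfg")

# Benchmarks Two and Four: ask the dnn module to run inference on the GPU
# via OpenCL with FP16 weights.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)

# Benchmarks Three and Four: wrap the loaded image in a UMat so that
# pre-/post-processing can go through the Transparent API (OpenCL).
image = cv2.imread("image.jpg")
uMat = cv2.UMat(image)
```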
I always measured the full processing time, starting before reading the image and ending after drawing the labels (excluding writing the output image or the detection log), as well as the model inference time with
t, _ = net.getPerfProfile()
infTime = t / cv2.getTickFrequency()
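Per image, the measurement looks roughly like this (the model files, image path, input size, and scale factor below are placeholders; the post-processing and label drawing are elided):

```python
import time
import cv2

net = cv2.dnn.readNet("model.weights", "model.cfg")   # placeholder model files
filePath = "image.jpg"                                # placeholder image path

start = time.perf_counter()        # full processing time starts before imread
image = cv2.imread(filePath)
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0,
                             size=(416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward()
# ... post-processing and drawing the labels happen here ...
fullTime = time.perf_counter() - start   # ends after drawing, before any file writes

# Inference time reported by the dnn module for the last forward() call,
# converted from ticks to seconds.
t, _ = net.getPerfProfile()
infTime = t / cv2.getTickFrequency()
```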
The collected output is as follows:
Benchmark One Full Processing Time: 2.78063s
Benchmark One Model Inference Time: 1.030843s
Benchmark Two Full Processing Time: 3.2567s
Benchmark Two Model Inference Time: 1.12314s
Benchmark Three Full Processing Time: 12.76886s
Benchmark Three Model Inference Time: 10.83879s
Benchmark Four Full Processing Time: 13.43161047s
Benchmark Four Model Inference Time: 11.27375169s
Is there such a large gap between CPU and GPU execution because of the data transfer between the processing units? Am I missing something crucial?
If this large gap can be explained by the data transfer, is there a way to "bundle" my workload to reduce the number of transfers?
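For example, would batching several images into a single forward() call with cv2.dnn.blobFromImages count as such bundling? Something like the sketch below (file names and input size are placeholders) is what I have in mind:

```python
import glob
import cv2

net = cv2.dnn.readNet("model.weights", "model.cfg")   # placeholder model files
filePaths = sorted(glob.glob("images/*.jpg"))         # the ten benchmark images

# Hypothetical "bundling": read all images up front and run them through the
# network as one batch, so the data is uploaded to the GPU once per batch
# instead of once per image. The 416x416 input size is just an example.
images = [cv2.imread(p) for p in filePaths]
blob = cv2.dnn.blobFromImages(images, scalefactor=1 / 255.0,
                              size=(416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward()   # one forward pass for the whole batch
```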
I can provide the full code for these benchmark programs if it would be helpful.
Thanks in advance!