Object detection with localization using Unity Barracuda and ARFoundation
This is a simple solution for localizing objects detected in a 2D image within the 3D AR scene, using Unity's recently released Barracuda together with ARFoundation. It works on both iOS and Android mobile devices.
Source code
GitHub repo: https://github.com/derenlei/Unity_Detection2AR
Why this approach
There aren’t many open-source real-time 3D object detection solutions for mobile applications. Such methods (MediaPipe, for example) produce accurate 3D bounding boxes that localize objects in the 3D scene; however, we can still develop interesting apps and features without very detailed 3D bounding boxes.
In this case, we could use “more popular” 2D object detection. We can then localize each 2D bounding box as a few feature points. These feature points will be enough for certain AR interactions.
It’s also easier to train your own 2D object detection model through existing popular CV pipelines and easier to collect 2D training data.
Demo
Preprocess ARCamera Images
You can find a good starting point for ARCamera image retrieval here. If you’re using the latest preview version of the ARCamera image retrieval sample, you may encounter some memory issues.
Note that if you are using the ARCamera image as input, you need to do some preprocessing (for example, flip, rotate, and crop) to get the input format your model expects. Here are some helper functions that you may want to use. You can also use our Unity_Detection2AR solution.
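To make the flip, rotate, and crop steps concrete, here is a minimal, language-neutral sketch in Python. It is not the helper code from the repo: the function names are hypothetical, the image is modeled as a plain list of pixel rows, and which operations you need (and in what order) depends on your device orientation and on your model's input layout.

```python
from typing import List

# Assumption: an image is a list of rows, each row a list of pixel values.
Image = List[List[int]]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom (the last row becomes the first)."""
    return [row[:] for row in reversed(img)]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def center_crop_square(img: Image) -> Image:
    """Crop the largest centered square, as many detectors expect square input."""
    height, width = len(img), len(img[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def preprocess(img: Image) -> Image:
    """One possible order of operations; adjust it to your camera and model."""
    return center_crop_square(rotate_90_clockwise(flip_vertical(img)))
```

In Unity you would apply the same operations to the camera texture or pixel buffer, then resize the result to the model's input resolution.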
Identify bounding boxes across frames as objects
To localize objects, we need to identify which bounding boxes across image frames point to the same object. We group the bounding boxes that point to the same object and, within each group, select the one with the highest inference confidence score for localization. Here’s the basic framework.
This is a very simple method for grouping bounding boxes, and it assumes that objects cannot stack up. It’s quite possible that it has already been improved in Unity_Detection2AR (PhoneARCamera.cs) by the time you read this.
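As an illustration of the grouping idea described above, here is a minimal Python sketch. It is not the code in PhoneARCamera.cs: the `Box` type, the `merge_detections` function, and the 0.3 overlap threshold are all hypothetical. It also assumes that boxes from different frames share one screen-space coordinate system and that the camera moves little between frames, so two overlapping boxes with the same label can be treated as the same object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    label: str
    confidence: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    inter_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def merge_detections(best: List[Box], new_boxes: List[Box],
                     iou_threshold: float = 0.3) -> List[Box]:
    """Fold one frame's detections into the per-object best boxes.

    A new box that overlaps a kept box with the same label is treated as
    the same object, and only the higher-confidence box is retained.
    A box that matches nothing starts a new object.
    """
    result = list(best)
    for box in new_boxes:
        for i, kept in enumerate(result):
            if kept.label == box.label and iou(kept, box) >= iou_threshold:
                if box.confidence > kept.confidence:
                    result[i] = box
                break
        else:
            result.append(box)
    return result
```

Calling `merge_detections` once per frame keeps one best box per object, and that box can then be used for localization in the AR scene.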
Unity Barracuda and using your own model
Unity just released the production-ready version of Barracuda, which changes a few APIs compared to its preview versions. Its supported ONNX operators can be found here. For object detection, it currently supports YOLO only up to version 2.
To the best of my knowledge, the Barracuda developers are still working on supporting more interesting models, such as Transformers. Looking forward to their future releases!
UPDATE 2020.1: Tiny YOLO v3 support has been added and can be found in this project!
Acknowledgment
We thank TF-Unity-ARFoundation and TFClassify-Unity-Barracuda, which provided good starter code that made our solution possible!
Thanks for reading!