Files
Leconte_56021700_2023.pdf
Open access - Adobe PDF
- 7.84 MB
Details
- Abstract
- Context: Object detection is a core computer vision task that consists of identifying and locating objects of interest in an image or video. It has grown in importance in recent years because of its wide range of applications, including autonomous driving, surveillance, robotics, and healthcare. Many machine learning methods exist for object detection, with or without neural networks, but the most recent and effective ones rely on deep learning. These methods require vast amounts of data to produce the most effective models. Exchanging and storing such volumes of data poses significant challenges: it is time-consuming and costly, and data flows are growing exponentially. It is therefore often advisable to compress the data, usually with lossy compression. The challenge then becomes a trade-off between the information lost to compression and the model's detection performance. Finding the best balance requires efficient compression methods that minimize image size while retaining as much information as possible. It is equally essential to develop models that are as robust as possible to this compression. This latter point is the subject of this work.
- Material and methods: The experiments are conducted on the DOTA dataset, with YOLOv5 as the deep learning detection algorithm. The images are compressed with the JPEG2000 coding system. The first part of the experiments concerns optimizing the training of the network by pre-processing the dataset, training on compressed images, and fine-tuning. The second part focuses on applying and analyzing ensemble methods, namely test-time augmentation (TTA) and model ensembling.
- Results: Optimizing the training for object detection on compressed images improved performance by 50%. Applying TTA gave a notable performance boost of up to 4%, specifically on small objects and lightly compressed images. With careful model selection, ensembling yielded a slight improvement of around 1%. Combining the two methods produced the best performance measured on highly compressed images, an improvement of around 2%.
- Conclusion: Training optimizations and ensemble methods can be combined to enhance the detection of objects by deep CNNs on compressed images. Moreover, recent publications and ongoing research hold promise for further advancements in this area.
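The two ensemble methods named in the abstract, TTA and model ensembling, can both be seen as pooling several sets of detections for the same image and then merging them. The sketch below is a minimal illustration of that idea, not the pipeline used in the thesis: it assumes a single horizontal-flip augmentation, greedy class-wise non-maximum suppression as the merging step, and a hypothetical model interface (any callable that maps an image to a list of boxes). All function names are illustrative.

```python
from typing import Callable, List, Tuple

# A detection is (x1, y1, x2, y2, score, class_id) in pixel coordinates.
Detection = Tuple[float, float, float, float, float, int]
Image = List[List[float]]                      # toy grayscale image: list of rows
Detector = Callable[[Image], List[Detection]]  # hypothetical model interface


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy class-wise NMS: keep the best box, drop same-class overlaps."""
    kept: List[Detection] = []
    for d in sorted(dets, key=lambda det: det[4], reverse=True):
        if all(d[5] != k[5] or iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept


def hflip_image(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def hflip_boxes(dets: List[Detection], width: int) -> List[Detection]:
    """Map boxes predicted on a mirrored image back to original coordinates."""
    return [(width - x2, y1, width - x1, y2, s, c) for x1, y1, x2, y2, s, c in dets]


def detect_with_tta_and_ensemble(
    models: List[Detector], image: Image, iou_thr: float = 0.5
) -> List[Detection]:
    """Pool detections from every model on the original and mirrored image,
    then merge the pooled boxes with NMS."""
    width = len(image[0])
    pooled: List[Detection] = []
    for model in models:
        pooled.extend(model(image))                                   # plain pass
        pooled.extend(hflip_boxes(model(hflip_image(image)), width))  # TTA pass
    return nms(pooled, iou_thr)
```

Passing a single model gives TTA alone, and dropping the mirrored pass gives ensembling alone. Other merging strategies, such as averaging overlapping boxes instead of suppressing them, would fit the same structure.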