Patents similarity and clustering analysis in the industry

Supervisors: Delvenne, Jean-Charles
Degree label: Master [120] in Data Science Engineering, professional focus
Abstract: Patents and intellectual property matter in industry, not least because companies need to avoid being penalized for patent infringement. They are also useful for other reasons, such as giving an overview of the state of a specific technology at a given time so that it can be compared with its state later on. The problem is that many patents are filed every day, and it is difficult to keep track of all of them. Is it possible to write a program that extracts the main information from thousands of patents so that they do not all have to be read? This work tries to answer this question through machine learning and clustering analysis, testing different algorithms and checking which of them produces the best results, judged by the topics present in the clusters. The approach that seems to work best is spectral clustering with TF-IDF embedding, but other approaches are discussed and should not necessarily be discarded. The main goal of this work is to provide a tool that the company AGC could use for business purposes, so a visualization tool also had to be developed. Here it takes the form of an automated LaTeX document generator that gathers the results of the different algorithms and can be compiled into a PDF file.
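To make the TF-IDF embedding mentioned in the abstract more concrete, the following is a minimal sketch of how patent texts could be turned into TF-IDF vectors and compared by cosine similarity. It is not the code of this work: the function names, the crude tokenizer, and the three toy abstracts are assumptions invented for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so no weight is zero.
        vec = {
            t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_matrix(docs):
    """Pairwise cosine similarities between the TF-IDF vectors of docs."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


# Toy abstracts, invented for illustration only (not data from this work).
TOY_PATENTS = [
    "Laminated glass panel with low emissivity coating",
    "Coating process for low emissivity glass panel",
    "Battery electrode with lithium iron phosphate",
]
SIM = similarity_matrix(TOY_PATENTS)
```

In this toy example the two glass-coating abstracts end up far more similar to each other than either is to the battery abstract. A similarity matrix of this kind could then be passed as a precomputed affinity to a spectral clustering implementation, for instance scikit-learn's `SpectralClustering(affinity="precomputed")`; whether this work proceeds that way is not stated in the abstract.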