
Digital Trends Watch through web mining, machine learning and recommender systems : A case study for Solvay

(2019)

Files

Buyck_03901400_2019.pdf
  • Closed access
  • Adobe PDF
  • 3.96 MB

Details

Abstract
This paper is based on a project I had the opportunity to carry out during my internship at Solvay Chemicals. It is called ‘Digital Trends Watch’. Its goal is to identify new trends from articles published on a list of newsroom URLs. According to Florent Perache, head of strategic watch at Solvay, it aims mainly at automating the strategic trends watch, which currently takes a lot of time because it is done manually. There is an important second goal: democratizing access, within the Solvay group, to a higher level of knowledge of such trends. Today, enterprises need to master the digital tools built around data in order to stay competitive. Solvay did not know how to achieve this ambition, and we had to think about which steps to take first. Following the CRISP-DM method throughout, the project was divided into three main steps: web mining, machine learning and a recommender system. We used the Python language to implement what was needed, supported by the data science platform Dataiku. The project was completed by finding a way to recognize text articles on the web, finding a suitable machine learning algorithm and parameters, and choosing an appropriate recommender system. Many exceptions were encountered during the web mining, many errors were discovered in the machine learning corpus used, and many trials were needed before reaching the final result. This project is not a “one shot”, so results are not immediate and will come over time, but it is an asset that Solvay now has at its disposal. It should be a way for them to stay competitive and even to detect assets before their competitors can think about them. One question remained: is it more advantageous to use an external solution or not? To get a better idea, we contacted import.io and evaluated their solution. Our solution seems to be the better option, especially since an internal solution can have many advantages. Each step involved clear complexities, which are described in this paper.
The final result at this stage is that Solvay has at its disposal all the new articles from the most important websites, as well as the full processing applied to those articles to detect new trends and to categorize existing ones.