Extracting ESG data from business documents

(2021)

Files

VanDerElst_46041600_2021.pdf
  • Open access
  • Adobe PDF
  • 2.99 MB

Details

Abstract
Documents are a natural way for humans to share information. They represent data well in a variety of forms, such as text, tables, graphs, titles, and footnotes. They are, however, time-consuming to read and poorly suited as input for computer-driven analysis. It is therefore natural to ask whether this unstructured data can be extracted into a structured format that algorithms can use for quantitative analysis. This is a challenging task: heuristic-based methods are often tricky to implement and do not generalize well, whereas natural language processing techniques lack the tools to capture the layout information needed to interpret these documents correctly. This work studies different techniques for extracting data from business documents, focusing on the particular case of CO2 emissions, and compares their weaknesses in order to propose improvements.