Entity Similarity through word embedding and Named Entity Recognition using word vectors
Files
Gusbin_53801200_Vrielynck_80991000_Annexe1.zip
UCLouvain restricted access - Unknown
- 4.46 MB
Gusbin_53801200_Vrielynck_80991000_2017.pdf
UCLouvain restricted access - Adobe PDF
- 9.58 MB
Gusbin_53801200_Vrielynck_80991000_2017_Erratum.pdf
UCLouvain restricted access - Adobe PDF
- 23.16 KB
Details
- Supervisors
- Faculty
- Degree label
- Abstract
- The recently introduced Word2vec and GloVe models are efficient methods for building high-quality word embeddings. In this thesis we investigated and implemented both models to compute Entity Similarity. We assessed two different approaches. The first treats entities as ordinary words so that they are included in the final word embedding. The second uses an already built word embedding and projects the entities into it. A complete data enrichment pipeline was also designed to increase data quality and improve the final results. The current state of the art in Named Entity Recognition relies on Conditional Random Fields. We built a word-vector-based Multilayer Bidirectional Long Short-Term Memory Recurrent Neural Network using the deep learning framework TensorFlow. With only limited feature and architecture engineering, the model achieved near state-of-the-art results. With a few more optimizations explained in the thesis, Recurrent Neural Networks using word vectors could become the next state-of-the-art method.
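The second approach described in the abstract, projecting entities into an already built word embedding, can be illustrated with a minimal sketch. The abstract does not say how the projection is computed, so this example assumes one common choice: an entity is represented by the average of the vectors of words associated with it, and similarity is measured with the cosine. The toy `embedding` values, the helper names `project_entity` and `cosine`, and the example entities are all hypothetical and are not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def project_entity(context_words, embedding):
    """Assumed projection: average the vectors of the entity's known context words.

    Returns None when none of the words are present in the embedding.
    """
    vectors = [embedding[w] for w in context_words if w in embedding]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 3-dimensional embedding; real Word2vec or GloVe vectors would have
# hundreds of dimensions and be learned from a corpus.
embedding = {
    "university": [0.9, 0.1, 0.0],
    "research":   [0.8, 0.2, 0.1],
    "football":   [0.0, 0.9, 0.3],
    "stadium":    [0.1, 0.8, 0.4],
}

# Hypothetical entities, each described by a few associated words.
entity_school = project_entity(["university", "research"], embedding)
entity_campus = project_entity(["university"], embedding)
entity_club = project_entity(["football", "stadium"], embedding)

sim_related = cosine(entity_school, entity_campus)
sim_unrelated = cosine(entity_school, entity_club)
```

In this sketch `sim_related` comes out higher than `sim_unrelated`, because the two academic entities share context words. The first approach in the abstract would instead train the embedding with entity tokens in the vocabulary, so that each entity receives its own learned vector directly.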