The ChatGPT of wine: to what extent can NLP models be as accurate as empirical models in the context of a wine variety prediction task?

Verleyen, Simon; De Ro, Corentin

Files

DeRo_13101800_Verleyen_27121800_2023.pdf

Open access
Adobe PDF
5 MB

DeRo_13101800_Verleyen_27121800_2023_Annexe.pdf

Open access
Adobe PDF
6.4 MB

Details

Supervisors: Vande Kerckhove, Corentin
Faculty: Louvain School of Management
Degree label: Master [120] in Business Engineering, professional focus
Abstract: In this increasingly digitalized world, many innovative AI tools are emerging. Fueled by data, they arouse interest in fields like Natural Language Processing (NLP) and Machine Learning. Although these two disciplines have each made significant progress, we have yet to witness real synergies between them. Bridging them makes sense, since leveraging human language could accelerate the adoption of AI technology by making it more accessible. This work aims to better understand the gap between human language and empirical data. The research methodology involves two datasets: one with quantitative properties of wines and the other with textual reviews. The quantitative data include factors like acidity, sweetness, tannicity, power, and aromas. Textual reviews provide subjective descriptions of taste and aromas. To predict the wine variety, Machine Learning algorithms and NLP models are trained and evaluated on eight white and red varieties: Cabernet-Sauvignon, Chardonnay, Merlot, Malbec, Pinot Noir, Riesling, Sangiovese, and Zinfandel. According to the findings, the BERT model is promising in predicting wine variety but is less accurate than empirical models. However, the limits and constraints of NLP models are largely offset by the user-friendliness and speed they offer wine lovers. The thesis also emphasizes the significance of training NLP models on specific datasets, as this has a major impact on their ability to discover patterns related to wine data. The comparison of the BERT model with the ChatGPT model clearly shows that the latter's results suffer greatly from the lack of task-specific training. Future research could extend these findings to less lexically rich sectors and investigate hybrid models to improve accuracy and resilience, contributing to existing studies on the computerization of taste.
In conclusion, this master's thesis offers useful insights into the accuracy of NLP models in the wine variety prediction task. It highlights the possibilities and limitations of NLP models in comparison to empirical models, underscoring the importance of further improvements and domain-specific training. The findings add to ongoing research on the computerization of taste and encourage future investigation into the possibilities of NLP models in wine and related fields.
