The ChatGPT of wine: to what extent can NLP models be as accurate as empirical models in the context of a wine variety prediction task?
Details
- Abstract
- In an increasingly digitalized world, many innovative AI tools are emerging. Fueled by data, they are drawing interest to fields such as Natural Language Processing (NLP) and Machine Learning. Although these two disciplines have each made significant progress, we have yet to witness real synergies between them. Bridging them makes sense, since leveraging human language could accelerate the adoption of AI technology by making it more accessible. This work aims to better understand the gap between human language and empirical data. The research methodology involves two datasets: one containing quantitative properties of wines and the other containing textual reviews. The quantitative data include factors such as acidity, sweetness, tannicity, power, and aromas. The textual reviews provide subjective descriptions of taste and aromas. To predict the wine variety, Machine Learning algorithms and NLP models are trained and evaluated on eight red and white varieties: Cabernet-Sauvignon, Chardonnay, Merlot, Malbec, Pinot Noir, Riesling, Sangiovese, and Zinfandel. According to the findings, the BERT model shows promise in predicting wine variety but is less accurate than the empirical models. However, the limitations of NLP models are largely offset by the user-friendliness and speed they offer wine lovers. The thesis also emphasizes the importance of training NLP models on domain-specific datasets, as this has a major impact on their ability to discover patterns in wine data. A comparison of the BERT model with the ChatGPT model clearly shows that the latter's results suffer greatly from the lack of task-specific training. Future research could extend these findings to less lexically rich sectors and investigate hybrid models to improve accuracy and resilience, contributing to existing studies on the computerization of taste.
In conclusion, this master's thesis offers useful insights into the accuracy of NLP models in the wine variety prediction task. It highlights the possibilities and limitations of NLP models in comparison to empirical models, underscoring the importance of further improvements and domain-specific training. The findings add to ongoing research on the computerization of taste and encourage future investigation into the potential of NLP models in wine and related fields.