Individual Differences and Lexical Difficulty: Towards Personalised Difficulty Predictions
Details
- Abstract
- Identifying the words in a text which might pose a problem for readers is an essential part of several NLP tasks, such as automatic readability assessment and lexical simplification. A common approach is to gather annotations from a large pool of participants and estimate the complexity of a target word by aggregating those individual judgements. The issue with this method is that it produces one-size-fits-all predictions by assuming that all readers will struggle with the same lexical items. The promising results of studies which adopted a reader-dependent approach to lexical difficulty prediction, however, attest to the limitations of that assumption. This dissertation set out to explore the possibility of producing personalised measures of lexical difficulty by taking individual characteristics into account. Difficulty perception data were gathered from B1 and B2 learners of French, using a four-point annotation scale. Comprehension questions were asked after each reading task in order to assess how well participants had understood what they had read. Inferential statistics and ordinal regression models were used to draw conclusions from the data gathered in the study. The results showed that personalised predictions based on individual predictors consistently outperformed reader-independent baselines, although fully personalised learner models performed best. Predictors found to be significantly related to lexical difficulty perception included L1 and L2 information, education level, age, country of origin and time spent learning French in a non-native context. We also found a significant positive correlation between the proportion of words a reader had annotated as "transparent" and the quality of their answers to comprehension questions. Although the opposite trend was observed for words labelled with the highest difficulty level, this correlation was not found to be statistically significant.
We concluded that personalised predictions of difficulty on the basis of individual differences were a viable alternative to reader-independent approaches and learner models alike.