Unlocking the ancient history of Japan, hidden behind kuzushiji: participating in the kuzushiji recognition competition on Kaggle

(2024)

Files

Kucheriavykh_05351900_2024.pdf

Abstract
Kuzushiji is the cursive writing style of pre-modern Japanese. Millions of books and historical documents written in kuzushiji remain undeciphered. To address this challenge, we developed a two-stage optical character recognition (OCR) pipeline. The first stage, a text detector based on the Co-DETR architecture, locates individual characters and classifies them into broad categories. The second stage takes a Mixture-of-Experts approach, using a specialized SVTR model to recognize the characters within each category. The pipeline was trained on the Kuzushiji Recognition dataset from Kaggle. Evaluation on the test set shows promising results, with high F1-scores in text detection, character categorization, and character recognition. Qualitative analysis illustrates the model's ability to handle diverse challenges such as character variations, faded text, and complex layouts. This research contributes to ongoing efforts to unlock the historical and cultural knowledge contained in old Japanese texts.
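
The two-stage routing described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the thesis implementation: the detector and the experts are plain stub functions standing in for the Co-DETR and SVTR models, and every name here (`Detection`, `recognize_page`, the category names `kana` and `kanji`, the box convention) is a hypothetical choice made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical box convention for this sketch: (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    """Output of stage one: where a character is and its broad category."""
    box: Box
    category: str


@dataclass
class RecognizedChar:
    """Output of stage two: the detection plus the recognized character."""
    box: Box
    category: str
    label: str


# Stage one maps an image to detections; stage two experts map a crop to a label.
Detector = Callable[[object], List[Detection]]
Expert = Callable[[object, Box], str]


def recognize_page(
    image: object, detector: Detector, experts: Dict[str, Expert]
) -> List[RecognizedChar]:
    """Run the detector, then route each detection to its category's expert."""
    results = []
    for det in detector(image):
        expert = experts[det.category]  # one specialized recognizer per category
        results.append(RecognizedChar(det.box, det.category, expert(image, det.box)))
    return results


# Stub models (assumptions for illustration; real models would run inference).
def stub_detector(image: object) -> List[Detection]:
    return [
        Detection((10, 20, 30, 30), "kana"),
        Detection((10, 60, 30, 30), "kanji"),
    ]


stub_experts: Dict[str, Expert] = {
    "kana": lambda image, box: "の",
    "kanji": lambda image, box: "山",
}

page = recognize_page(None, stub_detector, stub_experts)
for char in page:
    print(char.category, char.box, char.label)
```

Keeping the experts in a dictionary keyed by category means each recognizer only has to distinguish characters within its own category, which is the motivation for a Mixture-of-Experts design; the actual category definitions and model interfaces are described in the thesis itself.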