A Comprehensive Analysis of LayoutLM and Donut for Document Classification

Bajrami, Merxhan; Zdravevski, Eftim; Lameski, Petre; Stojkoska, Biljana

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/27397

Title:	A Comprehensive Analysis of LayoutLM and Donut for Document Classification
Authors:	Bajrami, Merxhan; Zdravevski, Eftim; Lameski, Petre; Stojkoska, Biljana
Keywords:	document classification, layout analysis, OCR, intelligent document processing
Issue Date:	Jul-2023
Publisher:	Ss Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering, Republic of North Macedonia
Series/Report no.:	CIIT 2023 papers;22;
Conference:	20th International Conference on Informatics and Information Technologies - CIIT 2023
Abstract:	Document classification is important in everyday life as it allows for efficient management and organization of vast amounts of digital documents, saving time and resources. This task is essential for businesses, organizations, and individuals who handle large volumes of data and need to quickly retrieve and analyze specific information. AI-based document classification can help organizations better manage and organize their digital assets, improve information retrieval, and make better business decisions based on the insights derived from the classified documents. This paper compares the performance of two transformer-based models, LayoutLM and Donut, for image classification tasks on two different datasets. LayoutLM was trained using pre-trained weights from Microsoft, while Donut used pre-trained weights from Huggingface. Both models were fine-tuned for 100 epochs with early stopping technique, using the Adam optimizer and Cross Entropy Loss. Our results show that LayoutLM performs better than Donut on the first dataset, achieving an overall accuracy of 0.88, while Donut achieved an accuracy of 0.74. Our study demonstrates the importance of carefully selecting and evaluating different models for document classification tasks, based on the specific characteristics of the dataset and the task requirements. Additionally, we provide insights into the strengths and weaknesses of both LayoutLM and Donut models for document classification on different datasets.
URI:	http://hdl.handle.net/20.500.12188/27397
Appears in Collections:	Faculty of Computer Science and Engineering: Conference papers
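
Note on the training setup: the abstract states that both models were fine-tuned for up to 100 epochs with early stopping. The control flow this implies might look roughly like the sketch below. It is not the authors' code. The `run_epoch` callback, the `patience` value, and the `min_delta` threshold are hypothetical, since the abstract does not state them, and the Adam optimizer step and cross-entropy loss would live inside `run_epoch`, which is not shown.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = 100,   # epoch budget stated in the abstract
    patience: int = 5,       # assumption: not stated in the abstract
    min_delta: float = 0.0,  # assumption: minimum drop that counts as improvement
) -> Tuple[int, float]:
    """Run up to max_epochs and stop once validation loss stops improving.

    run_epoch(epoch) is a hypothetical callback that trains for one epoch
    and returns the validation loss for that epoch.
    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs

    return best_epoch, best_loss
```

In practice the model weights from `best_epoch` would usually be saved and restored after the loop; that step is omitted here to keep the sketch independent of any particular framework.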

Files in This Item:

File	Description	Size	Format
CIIT2023_paper_22.pdf		9.19 MB	Adobe PDF

Page view(s):	214 (checked on Jul 11, 2024)
Download(s):	602 (checked on Jul 11, 2024)

Repository of UKIM