Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.12188/22843
Title: Framework for Real-Time Parallel and Distributed Natural Language Processing
Authors: Mileski, D.; Zdraveski, Vladimir; Kostoska, Magdalena; Gushev, Marjan
Keywords: framework; real-time processing; natural language processing; parallel processing; distributed processing
Issue Date: 2021
Publisher: IEEE
Conference: 44th International Convention on Information, Communication and Electronic Technology (MIPRO)
Abstract: In this paper, we present a new framework for parallel and distributed processing of real-time text streams, capable of executing NLP (Natural Language Processing) algorithms. The focus is on acceleration achieved through the design of the topology, not on the individual NLP algorithms. We elaborate the configuration of our specific use case and discuss reducing the time required for system configuration in order to exploit the benefits of virtualization and containers. Research hypothesis: we can process more text tuples per unit time using the newly developed framework, which divides the sequential algorithm into smaller jobs and tasks, including tokenisation, part-of-speech tagging, stopword removal, and sentiment analysis, where each of these jobs is a specific node in the Apache Storm-based topology. We conducted an experimental proof of concept and found the optimal configuration, confirming the validity of the hypothesis.
URI: http://hdl.handle.net/20.500.12188/22843
Appears in Collections: Faculty of Computer Science and Engineering: Conference papers
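The abstract describes splitting a sequential NLP algorithm into discrete jobs (tokenisation, part-of-speech tagging, stopword removal, sentiment analysis), each mapped to a node in an Apache Storm topology. The sketch below is a hypothetical, simplified illustration of that stage-per-job decomposition in plain Python; all function names, the toy lexicons, and the omission of the POS-tagging stage are assumptions of this sketch, not the authors' implementation.

```python
# Hypothetical sketch: a sequential NLP pipeline split into discrete stages,
# analogous to the per-job nodes (bolts) of the Storm topology in the paper.
# Lexicons are toy examples; the POS-tagging stage is omitted for brevity.

STOPWORDS = {"the", "a", "is", "and"}
POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "poor"}

def tokenize(text):
    """Stage 1: split an incoming text tuple into lowercase tokens."""
    return text.lower().split()

def remove_stopwords(tokens):
    """Stage 2: drop high-frequency function words."""
    return [t for t in tokens if t not in STOPWORDS]

def sentiment(tokens):
    """Stage 3: toy lexicon-based polarity score (+1 / -1 per hit)."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

def process(stream):
    """Run each tuple through the stage chain. In a Storm topology each
    stage would be a separate bolt, so tuples flow through in parallel
    rather than sequentially as in this single-process sketch."""
    for text in stream:
        yield sentiment(remove_stopwords(tokenize(text)))

print(list(process(["The service is great", "A bad and poor result"])))
# → [1, -2]
```

In a real Storm deployment, each stage would be a bolt with its own parallelism hint, which is the kind of per-node configuration the paper's experiments tune.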
Files in This Item:
File | Size | Format
---|---|---
MIPRO2021_Framework_for_real_time_parallel_and_distributed_Natural_Language_Processing.pdf | 530.72 kB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.