Atlam, El-Sayed (2020) Estimation and Evaluation Method of Words Tendency Depending on Time-Series Variation and Its Improvements. B P International. ISBN 978-93-90431-05-2
Full text not available from this repository.

Abstract
In every text, some words appear frequently and are regarded as keywords because they are strongly related to the subject of the text. The frequencies of these words change over time within a given period. However, traditional text processing and text search techniques do not take this time-series variation in frequency into account, so they cannot correctly determine an index of a word’s popularity in a given period. In this paper, a new method is proposed that automatically estimates stability classes (increasing, relatively constant, and decreasing), which indicate a word’s popularity under time-series variation, based on frequency changes in past text data. First, learning data were produced by defining five attributes that measure the frequency change of a word quantitatively; these attributes were extracted automatically from electronic texts. The learning data were then classified manually (by humans) into the three stability classes and used to train a decision tree, which automatically determines the stability classes of the analysis (test) data. For the learning data, attribute values were obtained for 443 proper nouns extracted from 2,216 CNN newspaper articles (1997-1999) on professional baseball. For the test data, 472 proper nouns were extracted from 972 CNN newspaper articles (1997-2000) and classified automatically using the decision tree. Comparing the decision tree results with the manual (human) classification, the F-measures for the increasing, relatively constant, and decreasing classes were 0.847, 0.851, and 0.768, respectively, which demonstrates the effectiveness of the method. The method for estimating word tendency from the frequency change of words with time-series variation is presented in this paper.
Stability classes are defined as an index of the popularity of words, and five attributes are defined to quantify the frequency change of words. The proposed method automatically estimates the stability classes of words by applying decision tree (DT) learning to attributes extracted from past text data. The test results confirm that classification precision improves when all five attributes and the longest learning period are used.
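
The per-class F-measures reported in the abstract compare automatic labels against manual ones. The following is a minimal sketch of how such scores could be computed, assuming the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name `per_class_f_measure`, the short class labels, and the toy label lists are illustrative assumptions and are not taken from the book.

```python
from collections import Counter

# Short labels for the three stability classes named in the abstract
# (the label strings themselves are an assumption for this sketch).
CLASSES = ("increasing", "constant", "decreasing")


def per_class_f_measure(gold, predicted, classes=CLASSES):
    """Return {class: F-measure} for manual (gold) vs. automatic labels.

    Assumes the balanced F-measure: 2 * P * R / (P + R).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = Counter()   # word labeled c by both human and classifier
    false_pos = Counter()  # classifier said c, human said something else
    false_neg = Counter()  # human said c, classifier said something else
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1
            false_neg[g] += 1

    scores = {}
    for c in classes:
        predicted_c = true_pos[c] + false_pos[c]
        actual_c = true_pos[c] + false_neg[c]
        precision = true_pos[c] / predicted_c if predicted_c else 0.0
        recall = true_pos[c] / actual_c if actual_c else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy labels for six hypothetical words (not data from the book).
    gold = ["increasing", "increasing", "constant",
            "constant", "decreasing", "decreasing"]
    predicted = ["increasing", "constant", "constant",
                 "constant", "decreasing", "increasing"]
    for name, score in per_class_f_measure(gold, predicted).items():
        print(f"{name}: {score:.3f}")
```

Scoring each class separately, as the abstract does, shows whether one stability class is harder to predict than the others, which a single overall accuracy figure would hide.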
Item Type: | Book |
---|---|
Subjects: | Eurolib Press > Computer Science |
Depositing User: | Managing Editor |
Date Deposited: | 17 Nov 2023 03:59 |
Last Modified: | 17 Nov 2023 03:59 |
URI: | http://info.submit4journal.com/id/eprint/3010 |