Optimizing Content Marketing Using Automatic Keyword Extraction To Get Topic Prediction

  • Savitri Indriyani, Swiss German University, Indonesia


Digital news covering a variety of topics is abundant on the internet. The problem is to classify each news item into its appropriate category so that users can find relevant news quickly. Manual categorization of text documents requires substantial financial and human resources. To address this, topic modeling is commonly used to classify documents. In the topic models used here (LSA, LDA), each word in the corpus vocabulary is associated with one or more topics with a probability estimated by the model. Multiple LDA and LSA models were built, their coherence values were compared, and the model with the highest coherence value was selected. Based on the results, we conclude that the three models considered can answer the research question and can be applied in the future to automate the company's process of determining topics. LDA using BOW and LSA using BOW are the priority options to apply.
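The coherence-based selection step can be illustrated with a minimal sketch. It assumes a UMass-style coherence score computed from document co-occurrence counts; the abstract does not state which coherence measure was used. The toy corpus, the candidate names `model_a` and `model_b`, and their topic word lists are hypothetical stand-ins for the top words that trained LDA or LSA models would produce.

```python
import math

# Toy corpus (hypothetical): each document is already tokenized.
docs = [
    ["bank", "loan", "interest", "rate"],
    ["bank", "interest", "credit", "loan"],
    ["match", "goal", "team", "league"],
    ["team", "goal", "coach", "match"],
    ["loan", "credit", "rate", "bank"],
]

def doc_freq(words, docs):
    """Count the documents that contain every word in `words`."""
    return sum(1 for d in docs if all(w in d for w in words))

def umass_coherence(top_words, docs):
    """UMass-style coherence of one topic, given its top words in rank order.

    Assumes every word occurs in at least one document, so the
    denominator is never zero.
    """
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq([top_words[m], top_words[l]], docs)
            score += math.log((co + 1) / doc_freq([top_words[l]], docs))
    return score

def model_coherence(topics, docs):
    """Average the per-topic coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

# Hypothetical top words per topic for two candidate models.
candidates = {
    "model_a": [["bank", "loan", "interest"], ["team", "goal", "match"]],
    "model_b": [["bank", "goal", "rate"], ["team", "loan", "coach"]],
}

# Score every candidate and keep the one with the highest coherence.
scores = {name: model_coherence(topics, docs) for name, topics in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the candidate whose topics group words that actually co-occur in the same documents scores higher, which is the property the selection step relies on.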



