npsm 새물리 New Physics : Sae Mulli

pISSN 0374-4914 eISSN 2289-0041
Research Paper

New Phys.: Sae Mulli 2020; 70: 493-502

Published online May 29, 2020 https://doi.org/10.3938/NPSM.70.493

Copyright © New Physics: Sae Mulli.

Analysis of Articles Related to Quantum Mechanics by Using Topic Modeling: Focus on Comparing LDA, CTM, STM Models

Hyunguk KIM1, Jinwoong SONG2*

1Department of Physics Education, Seoul National University, Seoul 08826, Korea

2Center for Educational Research, Seoul National University, Seoul 08826, Korea

Correspondence to: jwsong@snu.ac.kr

Received: November 8, 2019; Revised: April 2, 2020; Accepted: April 2, 2020

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this study, topic modeling was used to analyze 154 articles related to quantum mechanics that were uploaded to Phy.org from January 2004 through May 2019. We first created a document-term matrix (DTM) and analyzed the term frequencies and n-grams in the articles. We then performed topic modeling using latent Dirichlet allocation (LDA), the correlated topic model (CTM), and the structural topic model (STM). Based on likelihood estimates, the number of topics was set to 10 for the LDA and CTM models, while 30 topics were assumed for the STM model. In the LDA analysis, many documents were allocated to the topic ‘quantum mechanic research’. In the CTM analysis, the documents that the LDA model had allocated to ‘quantum mechanic research’ were classified into more detailed topics. In the STM analysis, we found statistically significant effects of the variables defined from the metadata on the likelihood that topics appear.

Keywords: Topic modeling, Quantum mechanics, Article, Text big data analysis
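
The preprocessing steps named in the abstract (building a DTM, then counting term frequencies and n-grams) can be illustrated with a minimal sketch. This is not the authors' code: the three sample sentences, the stopword list, and the helper names `tokenize`, `build_dtm`, and `ngram_counts` are all hypothetical, and the study's actual tooling is not stated in the abstract. In this sketch the n-grams are counted after stopword removal, which is one possible choice among several.

```python
import re
from collections import Counter

# Hypothetical stand-ins for article texts; the study's corpus is not reproduced here.
docs = [
    "Quantum computers use quantum bits to process information.",
    "Researchers entangle photons to test quantum mechanics.",
    "A new experiment probes quantum entanglement with photons.",
]

# Hypothetical, deliberately tiny stopword list for this toy example.
STOPWORDS = {"a", "to", "the", "with", "use", "new"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_dtm(documents):
    """Return (vocab, dtm), where dtm[i][j] counts term vocab[j] in document i."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in tokenized]
    for i, toks in enumerate(tokenized):
        for t in toks:
            dtm[i][index[t]] += 1
    return vocab, dtm


def ngram_counts(documents, n=2):
    """Count n-grams of adjacent tokens within each document."""
    counts = Counter()
    for d in documents:
        toks = tokenize(d)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(toks[k:] for k in range(n))))
    return counts


vocab, dtm = build_dtm(docs)

# Corpus-wide term frequency is the column sum of the DTM.
term_freq = {term: sum(row[j] for row in dtm) for j, term in enumerate(vocab)}
bigrams = ngram_counts(docs, n=2)

print(sorted(term_freq.items(), key=lambda kv: -kv[1])[:3])
print(bigrams.most_common(3))
```

A matrix of this shape (documents as rows, terms as columns) is the usual input to topic models such as LDA, CTM, and STM; fitting those models is left out of the sketch.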

