https://doi.org/10.3938/NPSM.69.1038
Qualitative Performance Evaluation of the Word-Embedding Model Through Learning Science Textbook Corpus (K-STeC)
New Phys.: Sae Mulli 2019; 69: 1038~1052
Published online October 31, 2019; https://doi.org/10.3938/NPSM.69.1038
© 2019 New Physics: Sae Mulli.

Eunjeong YUN, Yunebae PARK*

Department of Physics Education, Kyungpook National University, Daegu 41566, Korea
Correspondence to: ypark@knu.ac.kr
Received June 21, 2019; Revised August 19, 2019; Accepted August 26, 2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
In science education, communicating with students and machines, and establishing educational strategies on that basis, requires research on how the characteristics, semantic relationships, and conceptual connections of science language can be represented in machine-readable form. Word embedding has recently received much attention in machine learning for text, so this study was carried out to evaluate the performance of a word-embedding model, to present the science-educational meaning of the results, and to suggest follow-up research agendas. Among the word-embedding techniques, Word2vec was used, through the Gensim library in Python 3.6. The input corpus comprised 24 units on 'Force and Motion' at the junior high school level from the Korean science textbook corpus (K-STeC). The word-embedding results were evaluated qualitatively by reviewing the output word lists one by one and examining their scientific meaning. We examined how the results differed with the iteration, minimum frequency, and context range settings of Word2vec, with the presence or absence of formality morphemes, and with the size of the input corpus. As a result, we found variable settings that extract scientific concepts well, that word lists with different meanings are produced depending on whether formality morphemes are included, and that the usable corpus size is about 150,000 words, covering 24 units.
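To make the roles of the minimum frequency and context range settings concrete, the following is a minimal sketch in plain Python, not the authors' code. It assumes the skip-gram variant of Word2vec with a fixed window (actual implementations also vary the window size randomly and subsample frequent words). The function name `training_pairs` and the toy sentences are hypothetical and are not taken from K-STeC. In Gensim, the corresponding `Word2Vec` parameters are `min_count` and `window`, and the number of iterations is set with `iter` in Gensim 3.x (`epochs` from Gensim 4.0).

```python
from collections import Counter


def training_pairs(sentences, min_count=1, window=2):
    """Return (target, context) pairs after dropping rare words.

    Words that occur fewer than `min_count` times in the whole corpus
    are removed first; the context window is then applied to the
    remaining words, so it can reach across the removed ones.
    """
    freq = Counter(word for sentence in sentences for word in sentence)
    pairs = []
    for sentence in sentences:
        kept = [word for word in sentence if freq[word] >= min_count]
        for i, target in enumerate(kept):
            start = max(0, i - window)
            stop = min(len(kept), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((target, kept[j]))
    return pairs


# Toy, made-up tokenized sentences (illustrative only, not from K-STeC).
toy = [
    ["force", "changes", "the", "motion", "of", "an", "object"],
    ["friction", "is", "a", "force", "opposing", "motion"],
]

# High minimum frequency, narrow context: only frequent words remain.
narrow = training_pairs(toy, min_count=2, window=1)

# No frequency filter, wide context: many more, looser word pairs.
wide = training_pairs(toy, min_count=1, window=3)

print(len(narrow), len(wide))
```

Raising the minimum frequency shrinks the vocabulary to words that recur across the corpus, while widening the context range pairs each word with more distant neighbours. Both change which words end up near each other in the resulting word lists.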
PACS numbers: 01.40.-d
Keywords: Science textbook corpus, Word embedding, Word2vec


October 2019, 69 (10)