New Physics: Sae Mulli (새물리)

pISSN 0374-4914 eISSN 2289-0041

Article

Research Paper

New Phys.: Sae Mulli 2018; 68: 636-641

Published online June 29, 2018 https://doi.org/10.3938/NPSM.68.636

Copyright © New Physics: Sae Mulli.

Classification of Writing Style by Using a Morpheme Network Analysis

Janggyoon JEONG1, Min-Young LEE1, Minji KWON2, Woo-Sung JUNG*1,2,3

1 Department of Physics, Pohang University of Science and Technology, Pohang 37673, Korea
2 Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 37673, Korea
3 Asia-Pacific Center for Theoretical Physics (APCTP), Pohang 37673, Korea

Correspondence to: wsjung@postech.ac.kr

Received: February 20, 2018; Revised: April 3, 2018; Accepted: May 8, 2018

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Writing style is hard to define and classify because many different definitions of it exist. Researchers such as Mendenhall, Zipf, Mosteller and Wallace have approached the authorship problem quantitatively. The study of writing style comprises three categories of problems: authorship identification, authorship characterization, and similarity detection. Here, we focus on similarity detection, in which unlabeled articles are compared and classified. We compare writing styles by using text-mining techniques with a slight modification: whereas existing methods mainly focus on words, we analyze the articles at the level of morphemes and use the results to form a network of authors. We targeted articles from the Hankyoreh newspaper, from which 991 and 951 articles were randomly selected to form two separate networks. We assumed that writing style reflects both individual (human) and occupational styles, and by using modularity to determine communities, we found three groups in each of the two networks. Our results suggest that an objective definition of style may be possible, even though various definitions of style exist in linguistics.

Keywords: Complex systems, Network, Writing style, Morphological analysis
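
The pipeline outlined in the abstract (morpheme-based comparison, a similarity network, and modularity-based communities) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the linking rule, or the community-detection algorithm, so cosine similarity on morpheme counts, a fixed threshold, and a naive greedy modularity merge are all assumptions made here for illustration. The toy token lists, node names, and function names are hypothetical, and a real analysis would first obtain morphemes from a Korean morphological analyzer.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two morpheme-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_edges(docs, threshold):
    """Link two nodes when their similarity reaches the threshold (assumed rule)."""
    vecs = {name: Counter(tokens) for name, tokens in docs.items()}
    return {
        frozenset((u, v))
        for u, v in combinations(vecs, 2)
        if cosine(vecs[u], vecs[v]) >= threshold
    }


def modularity(nodes, edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ] for an unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside = Counter()  # L_c: number of edges with both ends in community c
    total = Counter()   # d_c: sum of node degrees in community c
    for edge in edges:
        u, v = tuple(edge)
        total[community[u]] += 1
        total[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    labels = {community[n] for n in nodes}
    return sum(inside[c] / m - (total[c] / (2 * m)) ** 2 for c in labels)


def greedy_communities(nodes, edges):
    """Start with singletons; repeatedly apply the merge that most increases Q."""
    community = {n: i for i, n in enumerate(nodes)}
    best = modularity(nodes, edges, community)
    while True:
        candidate = None
        for a, b in combinations(sorted(set(community.values())), 2):
            trial = {n: (a if c == b else c) for n, c in community.items()}
            q = modularity(nodes, edges, trial)
            if q > best + 1e-12:
                best, candidate = q, trial
        if candidate is None:  # no merge improves Q any further
            return community, best
        community = candidate


# Hypothetical toy data: each node is represented by a list of morphemes.
docs = {
    "a1": ["다", "는", "을", "다"],
    "a2": ["다", "는", "을"],
    "a3": ["다", "는", "다"],
    "b1": ["요", "에", "가", "요"],
    "b2": ["요", "에", "가"],
    "b3": ["요", "에", "요"],
}
edges = build_edges(docs, threshold=0.5)
community, q = greedy_communities(list(docs), edges)
print(community, round(q, 3))
```

On this toy input the `a*` and `b*` nodes share no morphemes across groups, so the sketch separates them into two communities. The exhaustive merge search is only practical for small graphs; for networks built from hundreds of articles, an established community-detection library would be the more reasonable choice.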
