Classification of Writing Style by Using a Morpheme Network Analysis
New Phys.: Sae Mulli 2018; 68: 636~641
Published online June 29, 2018;
© 2018 New Physics: Sae Mulli.

Janggyoon JEONG1, Min-Young LEE1, Minji KWON2, Woo-Sung JUNG*1,2,3

1 Department of Physics, Pohang University of Science and Technology, Pohang 37673, Korea
2 Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 37673, Korea
3 Asia-Pacific Center for Theoretical Physics (APCTP), Pohang 37673, Korea
Correspondence to:
Received February 20, 2018; Revised April 3, 2018; Accepted May 8, 2018.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Writing style is hard to define and classify because it has many competing definitions. Researchers including Mendenhall, Zipf, and Mosteller and Wallace have approached the authorship problem quantitatively. The problem of writing style comprises three categories: authorship identification, authorship characterization, and similarity detection. Here, we focus on similarity detection, in which unlabeled articles are compared and classified. We compare writing styles using text-mining techniques with a slight modification: whereas existing methods focus mainly on words, we analyze the articles at the level of morphemes and use the result to form a network of authors. Articles from the Hankyoreh newspaper were targeted, from which 991 and 951 articles were randomly selected to form two separate networks. Assuming that both human and occupational styles are present in writing style, we used modularity to detect communities and found three groups in each of the two networks. Our results show that an objective definition of style may be possible, even though various definitions of style exist in linguistics.
PACS numbers: 89.65.-s, 89.75.-k, 89.75.Hc
Keywords: Complex systems, Network, Writing style, Morphological analysis
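
The pipeline summarized in the abstract (morpheme profiles, a similarity network, modularity over a grouping) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the articles are assumed to be already split into morphemes, the similarity measure is cosine similarity of morpheme counts, edges are kept above a hypothetical `threshold`, and the toy tokens and the partition are invented. The `modularity` function evaluates the standard Newman modularity of a given partition for an unweighted, undirected network; it does not search for the best partition.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two morpheme-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(profiles, threshold):
    """Link two nodes when their profiles are at least `threshold` similar.

    The threshold rule is an assumption of this sketch, not the paper's rule.
    """
    edges = set()
    for u, v in combinations(sorted(profiles), 2):
        if cosine(profiles[u], profiles[v]) >= threshold:
            edges.add((u, v))
    return edges


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = Counter()   # degree of each node
    inside = Counter()   # L_c: edges with both ends in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    total = Counter()    # d_c: summed degree of community c
    for node, d in degree.items():
        total[community[node]] += d
    return sum(inside[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Invented toy data: each list stands for one article already split into
# morphemes (a real pipeline would need a morphological analyzer first).
articles = {
    "a1": ["go", "PAST", "school", "OBJ", "go", "PAST"],
    "a2": ["go", "PAST", "home", "OBJ", "go", "PAST"],
    "b1": ["market", "TOPIC", "rise", "PRES", "market", "TOPIC"],
    "b2": ["price", "TOPIC", "rise", "PRES", "market", "TOPIC"],
}
profiles = {name: Counter(tokens) for name, tokens in articles.items()}
edges = build_network(profiles, threshold=0.5)

# Hypothetical grouping to evaluate.
partition = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
q = modularity(edges, partition)
print(sorted(edges), q)
```

In this toy case the two pairs of similar articles end up as two disconnected edges, and the chosen partition has modularity 0.5. A community-detection step would search over partitions for the one that maximizes this quantity.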
