Fig. 4. (Color online) (a) t-SNE visualizations of the closed set as the number of labels used in training increases from 10 to 100. Each point represents a blog post, and points of the same color belong to the same author. It is evident that including some non-target ID labels in training is advantageous. (b) Confusion matrices computed from the 5 nearest neighbors of each data point. The average of the diagonal elements suggests that, among the evaluated models, the one trained with 50 labels is best suited for clustering the 10 target IDs.
© NPSM
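
The nearest-neighbor confusion matrix in panel (b) can be illustrated with a minimal sketch. The caption does not say how the matrix was computed, so the choices here are assumptions: Euclidean distance in the embedding space, each point excluded from its own neighbor list, and row normalization. The function names `knn_confusion_matrix` and `mean_diagonal` are hypothetical.

```python
import numpy as np


def knn_confusion_matrix(embeddings, labels, n_classes, k=5):
    """Row-normalized confusion matrix built from each point's k nearest neighbors.

    Assumptions (not stated in the caption): Euclidean distance, and a point
    is never counted as its own neighbor.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (n_points, n_points).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-matches

    # Indices of the k closest points for every query point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Row = author of the query point, column = author of a neighbor.
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.repeat(labels, k), labels[neighbors].ravel()), 1)

    # Normalize each row so entries are neighbor fractions.
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)


def mean_diagonal(cm):
    """Average fraction of neighbors that share the query point's author."""
    return float(np.mean(np.diag(cm)))
```

Under these assumptions, a larger mean diagonal value means that posts by the same author sit closer together in the embedding space, which is the quantity the caption uses to compare models trained with different numbers of labels.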