<?xml version="1.0" encoding="UTF-8"?>
<record>
  <title>A Feature Selection Method to Handle Imbalanced Data in Text Classification</title>
  <journal>Journal of Digital Information Management</journal>
  <author>Fengxiang Chang, Jun Guo, Weiran Xu, Kejun Yao</author>
  <volume>13</volume>
  <issue>3</issue>
  <year>2015</year>
  <doi></doi>
  <url></url>
  <abstract>The imbalanced data problem is often
encountered in applications of text classification. Feature
selection, which can reduce the dimensionality of the feature
space and improve classifier performance, is
widely used in text classification. This paper presents a
new feature selection method named NFS, which selects
class information words rather than terms with high
document frequency. To improve classifier performance
further, we combine NFS
with data resampling to address the problem of
imbalanced data. Experiments were conducted on the
Reuters-21578 collection, and the results show that the NFS method
performs better than chi-square statistics and mutual
information on the original dataset when the number of
selected features is greater than 1000. The maximum
Macro-F1 is 0.7792 when the NFS method is
applied to the resampled dataset, which represents an
increase of 4.02% in Macro-F1 over the original dataset.
Thus, our proposed method effectively improves minority
class performance.</abstract>
</record>
