Improved convolutional neural network for biomedical word sense disambiguation with enhanced context feature modeling

REN Kai, WANG Shi-Wen. Journal of Digital Information Management, 14(6), 2016. http://dline.info/fpaper/jdim/v14i6/jdimv14i6_1.pdf

Polysemy is common in the biomedical domain. Ambiguous words directly reduce the accuracy of automated semantic analysis, so word sense disambiguation (WSD) is often performed as a preprocessing step. Most current biomedical WSD methods rely on manually selected features. To capture latent context features at deeper layers and to reduce the negative influence of manual feature selection, this paper proposes a convolutional neural network (CNN) method for biomedical WSD with enhanced text feature modeling. First, the program automatically crawls a large-scale relevant corpus from MEDLINE for training and derives context feature vectors from it. These feature vectors are then used as the input to the CNN. Finally, CNN classification is used to perform WSD. In tests on 203 commonly used ambiguous words from the MSH-WSD corpus, the proposed method achieves an average accuracy of 94.65%, a significant improvement over previous methods. This result shows that the CNN is an effective WSD method for the biomedical domain. Because context feature representation and WSD are important preliminary steps in biomedical information extraction and retrieval, WSD can reduce the negative effect of ambiguous words on the accuracy of these tasks.
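
The pipeline summarized above (context feature vectors fed to a CNN, followed by classification over candidate senses) can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption for illustration: a single convolution layer with ReLU, max-over-time pooling, a linear output layer with softmax, random untrained weights, and hypothetical names such as `conv1d_max_pool` and `predict_sense`. It is not the authors' implementation.

```python
import math
import random


def conv1d_max_pool(embeddings, filters, bias):
    """Slide each filter over the token sequence; keep its largest ReLU response."""
    window = len(filters[0])  # number of consecutive tokens each filter spans
    pooled = []
    for filt, b in zip(filters, bias):
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(embeddings) - window + 1):
            act = b
            for offset in range(window):
                token_vec = embeddings[start + offset]
                act += sum(w * x for w, x in zip(filt[offset], token_vec))
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sense(embeddings, filters, bias, out_weights, out_bias):
    """Return (index of the most probable sense, probabilities for all senses)."""
    features = conv1d_max_pool(embeddings, filters, bias)
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    probs = softmax(scores)
    return probs.index(max(probs)), probs


# Toy dimensions, chosen only for the demonstration (all assumed values).
EMB_DIM, WINDOW, N_FILTERS, N_SENSES, N_TOKENS = 4, 3, 5, 2, 7
rng = random.Random(0)

# Stand-in for the context feature vectors of the words around an ambiguous term.
context = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(N_TOKENS)]

# Random, untrained parameters; a real system would learn these from a corpus.
filters = [
    [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(WINDOW)]
    for _ in range(N_FILTERS)
]
bias = [0.0] * N_FILTERS
out_weights = [[rng.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in range(N_SENSES)]
out_bias = [0.0] * N_SENSES

sense, probs = predict_sense(context, filters, bias, out_weights, out_bias)
print("predicted sense index:", sense)
print("sense probabilities:", probs)
```

In a trained system, one such classifier (or one output layer) would presumably be fit per ambiguous word, with the output dimension equal to that word's number of candidate senses; the abstract does not say which of these designs the paper uses.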