Journal of Digital Information Management


Vol. 21, Issue No. 1, 2023

Empirical Analysis on the Efficiency of Clustering Algorithms Based on the Significance of Cluster Size
Sunitha Cheriyan, Shaniba Ibrahim, Susan Treesa
Higher College of Technology, Muscat, Sultanate of Oman
Abstract: This paper focuses on the performance of various clustering algorithms on a particular dataset as a function of the number of clusters defined. The analysis is performed on the iris dataset from the dataset library. The paper also compares the algorithms with one another for each number of clusters defined. The algorithms used for the comparison include K-Means, hierarchical, model-based, and density-based clustering based on statistical models.
Keywords: Clustering, K-Means, Hierarchical, Model Based, Density Based, Efficiency
DOI: https://doi.org/10.6025/jdim/2023/21/1/9-17
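The abstract frames algorithm performance in terms of the number of clusters defined. As a hedged illustration only, and not the authors' procedure (which used the iris data), the sketch below runs a minimal pure-Python K-Means (Lloyd's algorithm) with deterministic farthest-point seeding on a small hypothetical 2-D toy dataset and reports the within-cluster sum of squares (WSS) for several values of k. The toy points, the function names (`kmeans`, `init_centroids`, `sq_dist`), and the use of WSS as the performance measure are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def init_centroids(points: List[Point], k: int) -> List[Point]:
    """Deterministic farthest-point seeding (an assumption, chosen for
    reproducibility): start at the first point, then repeatedly add the
    point whose distance to its nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm. Returns (labels, centroids, WSS)."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    # Within-cluster sum of squares for the final partition.
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


# Hypothetical toy data: three well-separated groups of three points each.
data: List[Point] = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]

if __name__ == "__main__":
    # Show how WSS changes as the number of clusters k grows.
    for k in range(1, 5):
        print(k, round(kmeans(data, k)[2], 4))
```

WSS always falls or stays level as k grows once each run reaches a good optimum, so on its own it favours larger k. In practice the value of k is usually judged by where the decrease levels off (the "elbow"), or by model-selection criteria such as those used in model-based clustering.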