معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Clustering C Index

ویژگی	مقدار
سیستم عامل	-
نام فایل	c-index-0.0.2
نام	c-index
نسخه کتابخانه	0.0.2
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	John Vorsten
ایمیل نویسنده	vorstenjohn@gmail.com
آدرس صفحه اصلی	https://github.com/johnvorsten/py_cindex
آدرس اینترنتی	https://pypi.org/project/c-index/
مجوز	MIT

## Python native implementation of C Index described by Hubert & Levin ### Description Calculates Hubert & Levin C index - internal cluster quality index The C Index was described by Hubert & Levin (1976) in “A General Statistical Framework for Assessing Categorical Clustering in Free Recall.” Psychological Bulletin, 83(6), 1072–1080. Its purpose is to compare the dispersion of clusters of data relative to the total dispersion in a dataset. Ideally, the value of the number of clusters that minimizes the C Index will also be the optimal number of clusters to partition a dataset. ### More Description The C Index is calculated as `cindex = (Sw - Smin) / (Smax - Smin)` Sw is the sum of within-cluster distance measurements (only point-wise combinations of data are summed within each cluster - not between clusters) Smin is the sum of the Nw smallest point-wise distances between points within the entire dataset Smax is the sum of the nw largest point-wise distances between points within the entire dataset Nw is the total number of pairs of observations belonging to the same cluster. It is the same as total combinations of points within clusters taken two at a time Installation ```bash pip install c_index ``` Example usage ```python from c_index import (calc_c_index, calc_cindex_clusterSim_implementation, calc_cindex_nbclust_implementation, pdist_array) xs = np.array([[1,2,1.5,1.75,1.33,0.88], [5,5.5,4.88,6.33,5.01,4.95]]) # Cluster 1, 2 x values ys = np.array([[8.9,8.5,7.89,8.25,8.85,8.29], [1.25,1.14,1.85,0.85,0.79,0.96]]) # Cluster 1,2 y values X = np.stack((xs.ravel(),ys.ravel()), axis=1) cluster_labels = np.array([0,0,0,0,0,0,1,1,1,1,1,1]) cindex = calc_c_index(X, cluster_labels) # nbclust implementation takes an array of pointwise differences distances_array = pdist_array(X) # NbClust C Index python equivalent cindex = calc_cindex_nbclust_implementation(distances_array, cluster_labels) # clusterSim C Index python equivalent cindex = calc_cindex_clusterSim_implementation(distances_array, cluster_labels) ``` The C Index is already implemented in a number of R Packages, including [clusterSim](https://cran.r-project.org/web/packages/clusterSim/clusterSim.pdf), [clusterCrit](https://cran.r-project.org/web/packages/clusterCrit/clusterCrit.pdf), and [NbClust](https://www.rdocumentation.org/packages/NbClust/versions/3.0/topics/NbClust) This packages implementation differs from these packages in a couple ways. ### ClusterSim differences Note these two lines (R Code) : ```r Dmin=sum(sort(ddist)[1:r]) Dmax=sum(sort(ddist,decreasing = T)[1:r]) ``` THey include the whole distance array, which includes all permutations of distances between points (instead of combinations). This means the high end and low end are double counted. I don't think that is the correct way to calculate C Index, but maybe they have a specific reason for doing so. ## NbClust differences Note these lines (R Code) : ```r Dmin = min(v_min) Dmax = max(v_max) result <- (DU - r * Dmin)/(Dmax * r - Dmin * r) ``` Instead of sorting and calculating the Nw minimum and maximum distance measurements between points, they simply multiply 'r' by the minimum and maximum values of distances. I'm not sure what their motivation was for doing that, but sometimes it does give a better estimate of the number of clusters than the described method above.

نیازمندی

مقدار	نام
-	pandas
-	numpy
-	scipy
-	scikit-learn

زبان مورد نیاز

مقدار	نام
>=3	Python

نحوه نصب

نصب پکیج whl c-index-0.0.2:

pip install c-index-0.0.2.whl

نصب پکیج tar.gz c-index-0.0.2:

pip install c-index-0.0.2.tar.gz