skcmeans.algorithms Module
Implementations of a number of C-means algorithms.
References
[1] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, 2005.
skcmeans.algorithms.CMeans(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Base class for C-means algorithms.
metric (string or function) – The distance metric used. May be any of the strings specified for cdist, or a user-specified function.
initialization (function) – The method used to initialize the cluster centers.
centers (np.ndarray) – (n_clusters, n_features) The derived or supplied cluster centers.
memberships (np.ndarray) – (n_samples, n_clusters) The derived or supplied cluster memberships.
converge(x)
Finds cluster centers through an alternating optimization routine.
Terminates when either the number of cycles reaches max_iter or the objective function changes by less than tol.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
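The alternating scheme and its two stopping conditions can be illustrated with a minimal NumPy sketch. This is not the library's implementation: `converge_sketch` is a hypothetical name, and hard (nearest-center) assignments with a sum-of-squares objective are assumed purely to keep the example short.

```python
import numpy as np

def converge_sketch(x, centers, max_iter=300, tol=1e-4):
    """Hypothetical illustration of alternating optimization (not skcmeans code)."""
    previous = np.inf
    for n_iter in range(1, max_iter + 1):
        # Step 1: with centers fixed, assign each sample to its nearest center.
        distances = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2: with assignments fixed, move each center to the mean of its samples.
        centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        # Stop once the objective changes by less than tol.
        objective = np.sum((x - centers[labels]) ** 2)
        if abs(previous - objective) < tol:
            break
        previous = objective
    return centers, n_iter

x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
final_centers, n_iter = converge_sketch(x, centers=x[[0, 2]].copy())
```

On this toy data the loop stops long before max_iter, because the objective stops changing once each center sits at the mean of its own group.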
distances(x)
Calculates the distance between data x and the centers.
The distance, by default, is calculated according to metric, but this method should be overridden by subclasses if required.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
Returns: (n_samples, n_clusters) Each entry (i, j) is the distance between sample i and cluster center j.
Return type: np.ndarray
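As an illustration of the returned shape, the sketch below builds such a distance matrix directly. It assumes that the cdist mentioned for metric is `scipy.spatial.distance.cdist`; the `manhattan` function is a hypothetical user-specified metric, and none of this is taken from the library's source.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # (n_samples, n_features)
centers = np.array([[0.0, 0.0], [6.0, 8.0]])         # (n_clusters, n_features)

# A metric named by one of the strings that cdist accepts.
d_euclidean = cdist(x, centers, metric="euclidean")

# A user-specified metric: any callable taking two 1-D vectors (hypothetical example).
def manhattan(u, v):
    return np.abs(u - v).sum()

d_custom = cdist(x, centers, metric=manhattan)

# Both results have shape (n_samples, n_clusters); entry (i, j) relates sample i to center j.
```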
fit(x)
Optimizes cluster centers by restarting convergence several times.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
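The restart idea can be sketched as follows. It assumes that the run with the lowest objective value is the one kept, which this page does not state explicitly. Both `fit_sketch` and `score_start` are hypothetical names, and `score_start` only scores the starting centers as a stand-in for a full convergence run.

```python
import numpy as np

def score_start(x, centers):
    """Stand-in for a full convergence run: returns (objective, centers)."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return float(np.sum(d.min(axis=1) ** 2)), centers

def fit_sketch(x, n_clusters, run_once=score_start, n_init=10, random_state=None):
    """Hypothetical restart wrapper: keep the run with the lowest objective."""
    rng = np.random.default_rng(random_state)
    best_objective, best_centers = np.inf, None
    for _ in range(n_init):
        # Each restart begins from a different random choice of data points.
        start = x[rng.choice(len(x), size=n_clusters, replace=False)]
        objective, centers = run_once(x, start)
        if objective < best_objective:
            best_objective, best_centers = objective, centers
    return best_objective, best_centers

x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
best_objective, best_centers = fit_sketch(x, n_clusters=2, n_init=20, random_state=0)
```

Restarting trades extra computation for a lower chance of keeping a poor local optimum that came from an unlucky starting point.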
initialization(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
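A minimal sketch of random initialization from the data, under the assumption that k distinct samples are drawn without replacement. The name `random_initialization` is hypothetical, and the role of eps is not described here, so the sketch leaves it out.

```python
import numpy as np

def random_initialization(x, k, random_state=None):
    """Hypothetical sketch: pick k distinct rows of x as starting centers."""
    rng = np.random.default_rng(random_state)
    indices = rng.choice(x.shape[0], size=k, replace=False)
    return x[indices].copy()

x = np.arange(20, dtype=float).reshape(10, 2)
initial_centers = random_initialization(x, k=3, random_state=0)
```

Passing the same random_state reproduces the same starting centers, which makes repeated experiments comparable.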
skcmeans.algorithms.Fuzzy(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Base class for fuzzy C-means clusters.
m (float) – Fuzziness parameter. Higher values reduce the rate of drop-off from full membership to zero membership.
skcmeans.algorithms.GustafsonKesselMixin(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Gives clusters ellipsoidal character.
The Gustafson-Kessel algorithm redefines the distance measurement such that clusters may adopt ellipsoidal shapes. This is achieved through updates to a covariance matrix assigned to each cluster center.
Examples
Create an algorithm for probabilistic clustering with ellipsoidal clusters:
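>>> from skcmeans.algorithms import GustafsonKesselMixin, Probabilistic
>>> class ProbabilisticGustafsonKessel(GustafsonKesselMixin, Probabilistic):
...     pass
...
>>> pgk = ProbabilisticGustafsonKessel()
>>> pgk.fit(x)  # x: np.ndarray of shape (n_samples, n_features)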
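To make the redefined distance concrete, the sketch below follows the usual textbook form of the Gustafson-Kessel distance, in which each cluster's inverse covariance is rescaled to unit determinant. The name `gustafson_kessel_distance` is hypothetical, and whether the library returns this squared form is an assumption.

```python
import numpy as np

def gustafson_kessel_distance(x, centers, covariances):
    """Hypothetical sketch of squared Gustafson-Kessel distances (not skcmeans code)."""
    n_features = x.shape[1]
    distances = np.empty((x.shape[0], centers.shape[0]))
    for i, (center, covariance) in enumerate(zip(centers, covariances)):
        # Norm-inducing matrix: inverse covariance scaled so its determinant is 1.
        a = np.linalg.det(covariance) ** (1.0 / n_features) * np.linalg.inv(covariance)
        diff = x - center
        # Quadratic form (x - v)^T A (x - v), evaluated for every sample at once.
        distances[:, i] = np.einsum("nj,jk,nk->n", diff, a, diff)
    return distances

points = np.array([[2.0, 0.0], [0.0, 2.0]])
centers = np.array([[0.0, 0.0]])
covariances = np.array([[[4.0, 0.0], [0.0, 1.0]]])   # cluster elongated along the first axis
d = gustafson_kessel_distance(points, centers, covariances)
```

Both points are the same Euclidean distance from the center, but the one lying along the elongated axis is treated as closer. This is what lets clusters take ellipsoidal shapes.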
calculate_covariance(x)
Calculates the covariance of the data x about the cluster centers.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
Returns: (n_clusters, n_features, n_features) The covariance matrix of each cluster.
Return type: np.ndarray
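A per-cluster covariance of the documented shape can be sketched as a membership-weighted average of outer products. This follows the common fuzzy covariance definition and is an assumption about the method, not a copy of it. The name `fuzzy_covariance` and the explicit `centers`, `memberships` and `m` arguments are hypothetical.

```python
import numpy as np

def fuzzy_covariance(x, centers, memberships, m=2.0):
    """Hypothetical sketch: membership-weighted covariance for each cluster."""
    weights = memberships ** m                    # (n_samples, n_clusters)
    diff = x[:, None, :] - centers[None, :, :]    # (n_samples, n_clusters, n_features)
    # Weighted sum of outer products of the deviations, one matrix per cluster.
    outer = np.einsum("nc,ncj,nck->cjk", weights, diff, diff)
    # Normalize by the total weight of each cluster.
    return outer / weights.sum(axis=0)[:, None, None]

x = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -2.0], [0.0, 2.0]])
centers = np.array([[0.0, 0.0]])
memberships = np.ones((4, 1))
covariances = fuzzy_covariance(x, centers, memberships)
```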
skcmeans.algorithms.Hard(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Hard C-means, equivalent to K-means clustering.
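In the hard variant each sample belongs entirely to one cluster. A minimal sketch, assuming membership goes to the nearest center as in K-means; `hard_memberships` is a hypothetical name, not part of the library.

```python
import numpy as np

def hard_memberships(distances):
    """Hypothetical sketch: one-hot memberships from a (n_samples, n_clusters) distance matrix."""
    memberships = np.zeros_like(distances, dtype=float)
    nearest = distances.argmin(axis=1)            # ties go to the lowest cluster index
    memberships[np.arange(distances.shape[0]), nearest] = 1.0
    return memberships

distances = np.array([[0.0, 10.0], [5.0, 5.0], [10.0, 0.0]])
u = hard_memberships(distances)
```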
skcmeans.algorithms.Possibilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Possibilistic C-means.
In the possibilistic algorithm, sample points are assigned memberships according to their relative proximity to the centers. This is controlled through a weighting assigned to each cluster center, approximately the variance of that cluster.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\), and the weighting \(w_i\) of each center.
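The exact formula is not reproduced on this page. The sketch below assumes the commonly used possibilistic form, in which the membership is 1 at zero distance and falls to 0.5 where the distance measure equals the center's weighting. The name `possibilistic_memberships` is hypothetical, and whether the library's distance is squared is not specified here.

```python
import numpy as np

def possibilistic_memberships(distances, weights, m=2.0):
    """Hypothetical sketch of a possibilistic membership update.

    distances: (n_samples, n_clusters) array, stored sample-first;
    weights:   (n_clusters,) weighting w_i of each center;
    m:         fuzziness parameter, assumed greater than 1.
    """
    return 1.0 / (1.0 + (distances / weights) ** (1.0 / (m - 1.0)))

distances = np.array([[0.0, 4.0], [2.0, 2.0]])
weights = np.array([2.0, 4.0])
u = possibilistic_memberships(distances, weights)
```

Unlike the probabilistic case, the rows of `u` do not need to sum to one, because each membership depends only on the distance to its own center.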
calculate_centers(x)
New centers are calculated as the mean of the points closest to them, weighted by the fuzzified memberships.
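The weighted mean can be sketched as below, assuming that "fuzzified" means memberships raised to the power m. `weighted_centers` is a hypothetical name and the explicit `memberships` argument is an illustration only.

```python
import numpy as np

def weighted_centers(x, memberships, m=2.0):
    """Hypothetical sketch: centers as means weighted by memberships ** m."""
    fuzzified = memberships ** m                          # (n_samples, n_clusters)
    # Each center is the weighted sum of samples divided by its total weight.
    return (fuzzified.T @ x) / fuzzified.sum(axis=0)[:, None]

x = np.array([[0.0], [4.0]])
memberships = np.array([[1.0, 0.0], [0.5, 0.5]])
centers = weighted_centers(x, memberships)
```

The first center is pulled only slightly toward the second sample, because that sample's fuzzified weight is 0.25 rather than 1.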
initialization(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
skcmeans.algorithms.Probabilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Probabilistic C-means.
In the probabilistic algorithm, each sample point has a total membership of unity, divided among the centers. This tends to push cluster centers away from each other.
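The unit-sum constraint can be seen in a minimal sketch of the standard fuzzy C-means membership update. This is the textbook form and is assumed here, not taken from the library. `probabilistic_memberships` is a hypothetical name, and using eps as a floor on the distances is likewise an assumption.

```python
import numpy as np

def probabilistic_memberships(distances, m=2.0, eps=1e-18):
    """Hypothetical sketch: memberships that sum to one for every sample."""
    d = np.fmax(distances, eps)                   # avoid division by zero at a center
    inverse = d ** (-2.0 / (m - 1.0))
    # Normalizing each row makes the memberships of a sample sum to one.
    return inverse / inverse.sum(axis=1, keepdims=True)

distances = np.array([[1.0, 1.0], [1.0, 3.0]])    # (n_samples, n_clusters)
u_sharp = probabilistic_memberships(distances, m=2.0)
u_soft = probabilistic_memberships(distances, m=3.0)
```

Because each row is normalized, a sample that is far from every center still receives a total membership of one. Raising m softens the split between the nearer and farther center.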