Implementations of a number of C-means algorithms.
class skcmeans.algorithms.CMeans(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: object
Base class for C-means algorithms.
Attributes:
metric (string or function) – The distance metric used. May be any of the metric strings accepted by scipy.spatial.distance.cdist, or a user-specified function.
initialization (function) – The method used to initialize the cluster centers.
centers (np.ndarray) – (n_clusters, n_features) The derived or supplied cluster centers.
memberships (np.ndarray) – (n_samples, n_clusters) The derived or supplied cluster memberships.
converge(x)
Finds cluster centers through an alternating optimization routine. Terminates when either the number of cycles reaches max_iter or the objective function changes by less than tol.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
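A minimal sketch of such an alternating optimization loop, assuming each cycle performs a membership update followed by a center update and stops on max_iter or an objective change below tol. The function name `converge_sketch` is hypothetical, and the probabilistic fuzzy C-means update formulas are used only as a stand-in for whatever a given subclass implements; this is not the library's own code.

```python
import numpy as np


def converge_sketch(x, centers, m=2.0, max_iter=300, tol=1e-4, eps=1e-18):
    """Hypothetical alternating-optimization loop in the style of converge().

    x: (n_samples, n_features); centers: (n_clusters, n_features).
    Uses probabilistic fuzzy C-means updates purely for illustration.
    """
    objective = np.inf
    for cycle in range(1, max_iter + 1):
        # Distances, shape (n_samples, n_clusters); eps avoids division by zero.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + eps
        # Step 1: membership update (each row sums to 1).
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Objective for these memberships and the current centers.
        w = u ** m
        new_objective = np.sum(w * d ** 2)
        # Step 2: center update, a mean of the data weighted by u**m.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Stop once the objective barely changes between cycles.
        if abs(objective - new_objective) < tol:
            break
        objective = new_objective
    return centers, u, new_objective, cycle
```

Seeding the centers with one sample from each of two well-separated groups, the loop settles on (approximately) the group means within a few cycles.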
distances(x)
Calculates the distance between the data x and the cluster centers. By default the distance is calculated according to metric; subclasses should override this method if they require a different measure.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
Returns: (n_samples, n_clusters) Each entry (i, j) is the distance between sample i and cluster center j.
Return type: np.ndarray
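To make the shape and indexing convention concrete, here is a small NumPy-only stand-in for the default Euclidean case. The name `euclidean_distances` is hypothetical; the layout it produces, (n_samples, n_clusters) with entry (i, j) for sample i and center j, is the same as `scipy.spatial.distance.cdist(x, centers)`.

```python
import numpy as np


def euclidean_distances(x, centers):
    """Hypothetical stand-in for distances() with metric='euclidean'.

    Returns an (n_samples, n_clusters) array; entry (i, j) is the
    Euclidean distance between sample i and cluster center j.
    """
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]  # (n_samples, n_clusters, n_features)
    return np.sqrt(np.sum(diff ** 2, axis=2))


x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
centers = np.array([[0.0, 0.0], [6.0, 8.0]])
d = euclidean_distances(x, centers)
print(d.shape)  # (3, 2)
print(d[1])     # [5. 5.]
```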
fit(x)
Optimizes cluster centers by running the convergence routine several times (n_init) from different initializations.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
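A sketch of the restart idea, under stated assumptions: each restart draws fresh random centers from the data, runs hard C-means (k-means) iterations to convergence, and the run with the lowest objective is kept. The function `fit_sketch` and the keep-the-best rule are assumptions for illustration; this page does not say how the library selects among restarts.

```python
import numpy as np


def fit_sketch(x, n_clusters, n_init=10, max_iter=300, tol=1e-4, random_state=None):
    """Hypothetical restart loop in the style of fit(); keeps the lowest objective."""
    rng = np.random.default_rng(random_state)
    best_centers, best_objective = None, np.inf
    for restart in range(n_init):
        # Fresh initialization: distinct samples chosen at random as centers.
        centers = x[rng.choice(len(x), size=n_clusters, replace=False)]
        objective = np.inf
        for _ in range(max_iter):
            # Squared distances, shape (n_samples, n_clusters).
            d2 = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)
            labels = np.argmin(d2, axis=1)
            new_objective = d2[np.arange(len(x)), labels].sum()
            # Move each center to the mean of its samples (keep it if empty).
            centers = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(n_clusters)
            ])
            if abs(objective - new_objective) < tol:
                break
            objective = new_objective
        # Keep the restart that reached the lowest objective.
        if new_objective < best_objective:
            best_centers, best_objective = centers, new_objective
    return best_centers, best_objective
```

Restarting trades extra computation for robustness: a single unlucky initialization can stall in a poor local minimum, while the best of several runs rarely does.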
initialization(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
metric = 'euclidean'
class skcmeans.algorithms.Fuzzy(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.CMeans
Base class for fuzzy C-means algorithms.
m (float) – Fuzziness parameter. Higher values reduce the rate of drop-off from full membership to zero membership.
fuzzifier(memberships)
Fuzzification operator. By default, for memberships \(u\) this is \(u^m\).
objective(x)
Interpretable as the data’s weighted rotational inertia about the cluster centers. To be minimised.
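The fuzzifier and the "weighted rotational inertia" can be sketched as follows, assuming the usual fuzzy C-means objective: the squared distance between every sample and every center, weighted by the fuzzified membership \(u^m\), summed over all samples and clusters. The function names are hypothetical and both arrays are taken to be (n_samples, n_clusters).

```python
import numpy as np


def fuzzifier(memberships, m=2.0):
    """Default fuzzification operator as documented: u -> u**m."""
    return memberships ** m


def fuzzy_objective(memberships, distances, m=2.0):
    """Assumed objective: sum over samples and clusters of u**m * d**2."""
    return np.sum(fuzzifier(memberships, m) * distances ** 2)


u = np.array([[1.0, 0.0], [0.5, 0.5]])   # sample 0 fully in cluster 0; sample 1 split
d = np.array([[1.0, 3.0], [2.0, 2.0]])
print(fuzzy_objective(u, d))             # 3.0
```

With m = 2 the split sample contributes only 0.25 of its squared distance to each cluster, which is how fuzzification discounts ambiguous samples.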
m = 2
class skcmeans.algorithms.GustafsonKesselMixin(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Gives clusters ellipsoidal character.
The Gustafson-Kessel algorithm redefines the distance measurement such that clusters may adopt ellipsoidal shapes. This is achieved through updates to a covariance matrix assigned to each cluster center.
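For reference, the formulation usually attributed to Gustafson and Kessel is shown below; treat it as an assumption about what this class computes, not a statement of its code. Here \(i\) indexes clusters, \(j\) indexes samples, \(p\) is the number of features, \(F_i\) is the fuzzy covariance of cluster \(i\), and \(\rho_i\) is a fixed cluster volume (commonly 1).

```latex
F_i = \frac{\sum_{j} u_{ij}^{m}\,(x_j - v_i)(x_j - v_i)^{\top}}{\sum_{j} u_{ij}^{m}},
\qquad
A_i = \left(\rho_i \det F_i\right)^{1/p} F_i^{-1},
\qquad
d_{ij}^{2} = (x_j - v_i)^{\top} A_i \,(x_j - v_i)
```

Because \(\det A_i = \rho_i\), each cluster may stretch along its principal axes but cannot grow its volume without bound.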
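Examples
Create an algorithm for probabilistic clustering with ellipsoidal clusters:
>>> import numpy as np
>>> from skcmeans.algorithms import GustafsonKesselMixin, Probabilistic
>>> class ProbabilisticGustafsonKessel(GustafsonKesselMixin, Probabilistic):
...     pass
>>> x = np.random.rand(100, 2)  # example data: 100 samples, 2 features
>>> pgk = ProbabilisticGustafsonKessel()
>>> pgk.fit(x)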
calculate_covariance(x)
Calculates the covariance of the data x about each of the cluster centers.
Parameters: x (np.ndarray) – (n_samples, n_features) The original data.
Returns: (n_clusters, n_features, n_features) The covariance matrix of each cluster.
Return type: np.ndarray
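A NumPy sketch of a fuzzy covariance update and the resulting ellipsoidal distance, assuming the standard Gustafson-Kessel form with unit cluster volume. `fuzzy_covariance` and `gk_distances` are hypothetical names, not the library's methods; array shapes follow this page: x is (n_samples, n_features), memberships are (n_samples, n_clusters), and the covariance is (n_clusters, n_features, n_features).

```python
import numpy as np


def fuzzy_covariance(x, centers, memberships, m=2.0):
    """Assumed GK covariance: outer products of (x - v) weighted by u**m."""
    w = memberships ** m                          # (n_samples, n_clusters)
    diff = x[:, None, :] - centers[None, :, :]    # (n_samples, n_clusters, n_features)
    # Weighted sum of outer products for each cluster, then normalize.
    cov = np.einsum("sc,scf,scg->cfg", w, diff, diff)
    return cov / w.sum(axis=0)[:, None, None]


def gk_distances(x, centers, covariance):
    """Squared distances (x - v)^T A (x - v) with det(A) = 1 per cluster."""
    n_features = x.shape[1]
    det = np.linalg.det(covariance)               # (n_clusters,)
    norm = (det ** (1.0 / n_features))[:, None, None] * np.linalg.inv(covariance)
    diff = x[:, None, :] - centers[None, :, :]
    return np.einsum("scf,cfg,scg->sc", diff, norm, diff)


# Four points on an ellipse stretched along the second axis, one cluster.
x = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
centers = np.array([[0.0, 0.0]])
u = np.ones((4, 1))
cov = fuzzy_covariance(x, centers, u)
d2 = gk_distances(x, centers, cov)
```

Under plain Euclidean distance the points at (0, ±2) lie twice as far out as those at (±1, 0); under the adapted norm all four are equidistant from the center, which is the ellipsoidal behaviour the class description refers to.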
covariance = None
class skcmeans.algorithms.Hard(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.CMeans
Hard C-means, equivalent to K-means clustering.
calculate_memberships(x)
Each sample has membership 1 in its closest cluster and 0 in all others.
objective(x)
Interpretable as the data’s rotational inertia about the cluster centers. To be minimised.
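A sketch of the hard case, showing why it reduces to K-means: one-hot memberships turn a membership-weighted mean into a plain mean of each cluster's assigned samples, and the objective into the familiar within-cluster sum of squares. All three function names are hypothetical; ties are broken by `np.argmin`, and nothing is assumed about how the library handles them.

```python
import numpy as np


def hard_memberships(distances):
    """One-hot memberships (n_samples, n_clusters): 1 for the closest center."""
    u = np.zeros_like(distances)
    u[np.arange(distances.shape[0]), np.argmin(distances, axis=1)] = 1.0
    return u


def hard_centers(x, memberships):
    """Plain mean of each cluster's samples (assumes no cluster is empty)."""
    return (memberships.T @ x) / memberships.sum(axis=0)[:, None]


def hard_objective(memberships, distances):
    """Rotational inertia: sum of squared distances to the assigned centers."""
    return np.sum(memberships * distances ** 2)


x = np.array([[0.0], [1.0], [9.0], [11.0]])
centers = np.array([[0.0], [10.0]])
d = np.abs(x - centers.T)          # 1-D data: distance is |x - v|, shape (4, 2)
u = hard_memberships(d)
print(u[:, 0])                     # [1. 1. 0. 0.]
print(hard_objective(u, d))        # 3.0
new_centers = hard_centers(x, u)   # 0.5 and 10.0
```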
calculate_centers(x)
class skcmeans.algorithms.Possibilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Possibilistic C-means.
In the possibilistic algorithm, each sample is assigned a membership in each cluster according to its proximity to that cluster's center, independently of the other centers. The scale of this proximity is set by a weighting assigned to each cluster center, approximately the variance of the cluster.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\), and the weighting \(w_i\) of each center.
calculate_centers(x)
New centers are calculated as the mean of the sample points, weighted by the fuzzified memberships.
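A sketch assuming the classic possibilistic formulation of Krishnapuram and Keller, which this page does not confirm: \(u_{ij} = 1 / \left(1 + (d_{ij}^2 / w_i)^{1/(m-1)}\right)\), with \(w_i\) the fuzzy-weighted mean squared distance of cluster \(i\) (roughly its variance). The function names are hypothetical, and arrays are laid out as (n_samples, n_clusters).

```python
import numpy as np


def possibilistic_weights(memberships, distances, m=2.0):
    """Assumed w_i: mean squared distance to center i, weighted by u**m."""
    f = memberships ** m
    return np.sum(f * distances ** 2, axis=0) / np.sum(f, axis=0)


def possibilistic_memberships(distances, weights, m=2.0):
    """Depends only on each sample's distance to that center and w_i."""
    return 1.0 / (1.0 + (distances ** 2 / weights) ** (1.0 / (m - 1.0)))


def weighted_centers(x, memberships, m=2.0):
    """New centers: means of all samples, weighted by the fuzzified memberships."""
    f = memberships ** m
    return (f.T @ x) / f.sum(axis=0)[:, None]


d = np.array([[0.0], [1.0], [2.0]])
u = possibilistic_memberships(d, np.array([1.0]))   # 1.0, 0.5, 0.2
# Adding a second cluster leaves the first column unchanged: no normalization.
d2 = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]])
u2 = possibilistic_memberships(d2, np.array([1.0, 4.0]))
```

Unlike the probabilistic case, the rows of u2 need not sum to 1, so an outlier far from every center receives low membership everywhere rather than being forced into some cluster.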
initialization(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
class skcmeans.algorithms.Probabilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Probabilistic C-means.
In the probabilistic algorithm, the memberships of each sample sum to unity and are distributed among the centers according to the sample's relative distance to each of them. This tends to push cluster centers away from each other.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\).
calculate_centers(x)
New centers are calculated as the mean of the sample points, weighted by the fuzzified memberships.
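A sketch assuming the standard fuzzy C-means membership rule, \(u_{ij} = 1 / \sum_k (d_{ij} / d_{kj})^{2/(m-1)}\), written in the equivalent form "proportional to \(d^{-2/(m-1)}\), normalized per sample". The function name and the eps guard are illustrative assumptions. The example also shows the effect of the fuzziness parameter m described for the Fuzzy base class.

```python
import numpy as np


def probabilistic_memberships(distances, m=2.0, eps=1e-18):
    """Assumed FCM memberships for an (n_samples, n_clusters) distance array.

    Each row is proportional to d**(-2/(m-1)) and normalized to sum to 1;
    eps guards against division by zero when a sample sits on a center.
    """
    inv = (distances + eps) ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


d = np.array([[1.0, 3.0]])                  # one sample, two centers
u2 = probabilistic_memberships(d, m=2.0)    # about [0.9, 0.1]
u3 = probabilistic_memberships(d, m=3.0)    # about [0.75, 0.25]
```

Raising m from 2 to 3 moves membership from the near center toward the far one, i.e. it slows the drop-off from full to zero membership.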
skcmeans.initialization.initialize_probabilistic(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
skcmeans.initialization.initialize_random(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
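A minimal sketch of random initialization, assuming it picks k distinct samples uniformly at random as the initial centers. `initialize_random_sketch` is hypothetical: it seeds NumPy's `default_rng` directly, whereas the library's handling of random_state may differ, and the eps parameter is not modelled because its role is not documented here.

```python
import numpy as np


def initialize_random_sketch(x, k, random_state=None):
    """Hypothetical: choose k distinct rows of x as initial cluster centers."""
    rng = np.random.default_rng(random_state)
    idx = rng.choice(x.shape[0], size=k, replace=False)  # distinct sample indices
    return x[idx].copy()


x = np.arange(20.0).reshape(10, 2)
initial_centers = initialize_random_sketch(x, 3, random_state=42)
```

Passing the same random_state reproduces the same initial centers, which is what makes restarts in fit repeatable.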