skcmeans.algorithms Module
Implementations of a number of C-means algorithms.
References
| [1] | J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, 2005. |
skcmeans.algorithms.CMeans(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Base class for C-means algorithms.
| Parameters: |
|
|---|
metric
string or function – The distance metric used. May be any of the strings accepted by cdist, or a user-specified function.
initialization
function – The method used to initialize the cluster centers.
centers
np.ndarray – (n_clusters, n_features)
The derived or supplied cluster centers.
memberships
np.ndarray – (n_samples, n_clusters)
The derived or supplied cluster memberships.
converge(x)
Finds cluster centers through an alternating optimization routine.
Terminates when either the number of cycles reaches max_iter or the objective function changes by less than tol.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|
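The alternating optimization described above can be sketched as follows. This is an illustrative NumPy version, not the library's code: it assumes fuzzy (probabilistic) membership and center updates with squared Euclidean distances, and `converge_sketch` and its arguments are hypothetical names that only mirror the `max_iter`, `tol` and `eps` parameters in the class signature.

```python
import numpy as np

def converge_sketch(x, centers, m=2.0, max_iter=300, tol=1e-4, eps=1e-18):
    """Alternate membership and center updates until the objective settles."""
    previous = np.inf
    memberships = None
    n_iter = 0
    for _ in range(max_iter):
        n_iter += 1
        # Squared Euclidean distance between every sample and every center;
        # eps keeps the inverse below finite when a sample sits on a center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        # Membership update: inverse-distance weights, normalised per sample.
        inv = d2 ** (-1.0 / (m - 1.0))
        memberships = inv / inv.sum(axis=1, keepdims=True)
        # Objective evaluated with the new memberships and the current centers.
        weights = memberships ** m
        objective = (weights * d2).sum()
        # Center update: mean of the data weighted by fuzzified memberships.
        centers = (weights.T @ x) / weights.sum(axis=0)[:, None]
        # Stop once the objective changes by less than tol.
        if abs(previous - objective) < tol:
            break
        previous = objective
    return centers, memberships, n_iter

# Two well-separated groups, initialised from one sample in each group.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, memberships, n_iter = converge_sketch(x, x[[0, 3]].copy())
```

The loop stops on whichever condition is met first, the cycle limit or the tolerance on the objective, which matches the termination rule stated for `converge`.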
distances(x)
Calculates the distance between data x and the centers.
By default the distance is calculated according to metric; subclasses should override this method if they need a different measure.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
| Returns: | (n_samples, n_clusters) Each entry (i, j) is the distance between sample i and cluster center j. |
| Return type: | np.ndarray |
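To make the layout of the returned array concrete, here is a minimal NumPy sketch of a Euclidean distance matrix. `euclidean_distances` is a hypothetical helper, not part of skcmeans. If the `cdist` mentioned above is SciPy's `scipy.spatial.distance.cdist`, its default `'euclidean'` metric produces the same matrix.

```python
import numpy as np

def euclidean_distances(x, centers):
    """Return an (n_samples, n_clusters) matrix of Euclidean distances."""
    # Broadcast to (n_samples, n_clusters, n_features) and reduce over features.
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
centers = np.array([[0.0, 0.0], [3.0, 4.0]])
d = euclidean_distances(x, centers)
# d[i, j] is the distance between sample i and center j:
# [[ 0.  5.]
#  [ 5.  0.]
#  [10.  5.]]
```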
fit(x)
Optimizes cluster centers by restarting convergence several times.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|
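A restart strategy of this kind can be sketched as below. The sketch assumes that the run with the lowest objective value is the one kept, which the text above does not state. `fit_with_restarts` and `run_once` are hypothetical names introduced for illustration.

```python
import numpy as np

def fit_with_restarts(run_once, n_init=10, random_state=None):
    """Call run_once(seed) n_init times and keep the lowest-objective result.

    run_once is assumed to return an (objective, result) pair.
    """
    rng = np.random.RandomState(random_state)
    best_objective, best_result = np.inf, None
    for _ in range(n_init):
        # Draw a fresh seed so each restart begins from a different state.
        seed = int(rng.randint(0, 2**31 - 1))
        objective, result = run_once(seed)
        if objective < best_objective:
            best_objective, best_result = objective, result
    return best_objective, best_result

# Toy stand-in for a single convergence run: the "objective" is derived from the seed.
seen = []
def toy_run(seed):
    objective = float(seed % 100)
    seen.append(objective)
    return objective, seed

best_objective, best_seed = fit_with_restarts(toy_run, n_init=5, random_state=0)
```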
initialization(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
| Parameters: |
|
|---|---|
| Returns: |
|
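As a rough illustration of selecting initial points randomly from the data, the sketch below draws k distinct rows of x. `random_initial_centers` is a hypothetical name, and the `eps` argument from the signature above is left out because its role is not documented here.

```python
import numpy as np

def random_initial_centers(x, k, random_state=None):
    """Pick k distinct samples from x to serve as initial cluster centers."""
    rng = np.random.RandomState(random_state)
    # Sample row indices without replacement so no center is duplicated.
    indices = rng.choice(x.shape[0], size=k, replace=False)
    return x[indices]

x = np.arange(20.0).reshape(10, 2)
initial = random_initial_centers(x, 3, random_state=0)
# initial has shape (3, 2); each row is one of the rows of x.
```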
skcmeans.algorithms.Fuzzy(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Base class for fuzzy C-means clusters.
m
float – Fuzziness parameter. Higher values reduce the rate of drop-off from full membership to zero membership.
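The effect of m can be seen with a small numerical sketch. It assumes the standard fuzzy C-means membership update, which may differ in detail from what the library computes. `fuzzy_memberships` is a hypothetical helper.

```python
import numpy as np

def fuzzy_memberships(distances, m):
    """Standard fuzzy C-means memberships from strictly positive distances.

    distances has shape (n_samples, n_clusters); each output row sums to one.
    """
    inv = distances ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# One sample that is three times further from the second center than the first.
d = np.array([[1.0, 3.0]])
for m in (1.5, 2.0, 4.0):
    print(m, fuzzy_memberships(d, m))
```

With m = 2 the sample's memberships are 0.9 and 0.1. A smaller m pushes them towards a hard assignment, and a larger m pulls them towards an even split.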
skcmeans.algorithms.GustafsonKesselMixin(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Gives clusters ellipsoidal character.
The Gustafson-Kessel algorithm redefines the distance measurement such that clusters may adopt ellipsoidal shapes. This is achieved through updates to a covariance matrix assigned to each cluster center.
Examples
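Create an algorithm for probabilistic clustering with ellipsoidal clusters:
>>> from skcmeans.algorithms import GustafsonKesselMixin, Probabilistic
>>> class ProbabilisticGustafsonKessel(GustafsonKesselMixin, Probabilistic):
...     pass
>>> pgk = ProbabilisticGustafsonKessel()
>>> pgk.fit(x)  # x: np.ndarray of shape (n_samples, n_features)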
calculate_covariance(x)
Calculates the covariance of the data x with respect to the cluster centers.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
| Returns: | (n_clusters, n_features, n_features) The covariance matrix of each cluster. |
| Return type: | np.ndarray |
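For orientation, a per-cluster covariance with the documented output shape can be sketched as below. It assumes the usual Gustafson-Kessel fuzzy covariance, weighted by memberships raised to the power m. `fuzzy_covariance` is a hypothetical function that takes the centers and memberships as explicit arguments, whereas the method above takes only x.

```python
import numpy as np

def fuzzy_covariance(x, centers, memberships, m=2.0):
    """Membership-weighted covariance of x about each center.

    x: (n_samples, n_features), centers: (n_clusters, n_features),
    memberships: (n_samples, n_clusters).
    Returns an array of shape (n_clusters, n_features, n_features).
    """
    weights = memberships ** m
    # Deviation of every sample from every center.
    diff = x[:, None, :] - centers[None, :, :]
    # Outer product of each deviation with itself.
    outer = np.einsum('nkf,nkg->nkfg', diff, diff)
    weighted = (weights[:, :, None, None] * outer).sum(axis=0)
    return weighted / weights.sum(axis=0)[:, None, None]

# With a single cluster and full membership this reduces to the ordinary
# (biased) sample covariance about the mean.
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
centers = x.mean(axis=0, keepdims=True)
memberships = np.ones((4, 1))
cov = fuzzy_covariance(x, centers, memberships)
```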
skcmeans.algorithms.Hard(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Hard C-means, equivalent to K-means clustering.
skcmeans.algorithms.Possibilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Possibilistic C-means.
In the possibilistic algorithm, sample points are assigned memberships according to their relative proximity to the centers. This is controlled through a weighting to the cluster centers, approximately the variance of each cluster.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\), and the weighting \(w_i\) of each center.
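One common form of this calculation is the possibilistic membership function from the literature, sketched below. Whether the library uses exactly this expression, in particular the squared distance, is an assumption. `possibilistic_memberships` is a hypothetical name, and the array is laid out as (n_samples, n_clusters) to match the memberships attribute.

```python
import numpy as np

def possibilistic_memberships(distances, weights, m=2.0):
    """Memberships 1 / (1 + (d^2 / w)^(1 / (m - 1))) for each sample and center.

    distances: (n_samples, n_clusters); weights: (n_clusters,).
    """
    return 1.0 / (1.0 + (distances ** 2 / weights) ** (1.0 / (m - 1.0)))

d = np.array([[0.0, 1.0], [1.0, 2.0]])
w = np.array([1.0, 1.0])
u = possibilistic_memberships(d, w)
# u == [[1.0, 0.5], [0.5, 0.2]]
```

Each membership depends only on the distance to its own center, so the rows do not have to sum to one.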
calculate_centers(x)
New centers are calculated as the mean of the points closest to them, weighted by the fuzzified memberships.
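Written out, a membership-weighted mean of this kind takes the following form, assuming that "fuzzified" means the membership \(u_{ij}\) of sample \(j\) in cluster \(i\) raised to the fuzziness parameter \(m\). The symbols \(v_i\) for the center and \(x_j\) for the sample are notation introduced here, not taken from the library.

\[
v_i = \frac{\sum_{j} u_{ij}^{m} \, x_j}{\sum_{j} u_{ij}^{m}}
\]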
initialization(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
| Parameters: |
|
|---|---|
| Returns: |
|
skcmeans.algorithms.Probabilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Probabilistic C-means.
In the probabilistic algorithm, each sample point has a total membership of unity, shared among the centers. This tends to push cluster centers away from each other.
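In the notation used for the possibilistic case, with \(d_{ij}\) the distance between sample \(j\) and center \(i\), the unit-sum constraint and the standard fuzzy C-means membership that satisfies it can be written as below. This is the textbook form and is assumed, not confirmed, to match the library's update.

\[
\sum_{i} u_{ij} = 1 \quad \text{for every sample } j,
\qquad
u_{ij} = \left[ \sum_{k} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1}
\]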