Implementations of a number of C-means algorithms.
References
[1] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, 2005.
skcmeans.algorithms.CMeans(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: object
Base class for C-means algorithms.
metric (string or function) – The distance metric used. May be any of the strings specified for cdist, or a user-specified function.
initialization (function) – The method used to initialize the cluster centers.
centers (np.ndarray) – (n_clusters, n_features)
The derived or supplied cluster centers.
memberships (np.ndarray) – (n_samples, n_clusters)
The derived or supplied cluster memberships.
converge(x)
Finds cluster centers through an alternating optimization routine.
Terminates when either the number of cycles reaches max_iter or the objective function changes by less than tol.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
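The alternating optimization described for converge can be sketched in plain NumPy. This is a minimal illustration for the hard (K-means-like) case, not the library's implementation; the function name `alternate` and its arguments are hypothetical, and only the stopping rule (max_iter or a change smaller than tol) is taken from the text above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def alternate(x, centers, max_iter=300, tol=1e-4):
    """Hypothetical sketch: alternate membership and center updates."""
    centers = np.array(centers, dtype=float)       # work on a copy
    previous = np.inf
    for iteration in range(max_iter):
        d = cdist(x, centers)                      # (n_samples, n_clusters)
        labels = d.argmin(axis=1)                  # membership update (hard)
        for j in range(centers.shape[0]):          # center update
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
        # Sum of squared distances to the assigned centers.
        objective = np.sum((x - centers[labels]) ** 2)
        if abs(previous - objective) < tol:        # objective barely changed
            break
        previous = objective
    return centers, labels, iteration + 1


x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels, n_iter = alternate(x, x[[0, 2]])
```

The loop ends early once two consecutive objective values differ by less than tol, so n_iter is usually far below max_iter on well-separated data.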
distances(x)
Calculates the distance between data x and the centers.
The distance, by default, is calculated according to metric, but this method should be overridden by subclasses if required.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
| Returns: | (n_samples, n_clusters) Each entry (i, j) is the distance between sample i and cluster center j. |
| Return type: | np.ndarray |
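The shape convention of the returned array can be illustrated with scipy.spatial.distance.cdist, which accepts both metric names and callables. The sketch below calls cdist directly rather than the library's distances method; the sample data are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # (n_samples, n_features)
centers = np.array([[0.0, 0.0], [6.0, 8.0]])         # (n_clusters, n_features)

# Entry (i, j) is the distance between sample i and cluster center j.
d = cdist(x, centers, metric="euclidean")

# A user-specified function also works; this one is the Manhattan distance.
d_manhattan = cdist(x, centers, metric=lambda a, b: np.abs(a - b).sum())
```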
fit(x)
Optimizes cluster centers by restarting convergence several times.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
initialization(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
metric = 'euclidean'
skcmeans.algorithms.Fuzzy(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.CMeans
Base class for fuzzy C-means clustering algorithms.
m (float) – Fuzziness parameter. Higher values reduce the rate of drop-off from full membership to zero membership.
fuzzifier(memberships)
Fuzzification operator. By default, for memberships $u$ this is $u^m$.
objective(x)
Interpretable as the data’s weighted rotational inertia about the cluster centers. To be minimised.
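The fuzzifier and the inertia-like objective can be sketched together. The code below assumes the objective is the sum of fuzzified memberships times squared distances, which is the usual fuzzy C-means form and matches the "weighted rotational inertia" reading; the standalone functions and their signatures are hypothetical, not the library's methods.

```python
import numpy as np


def fuzzifier(memberships, m=2):
    """Raise each membership u to the power m."""
    return memberships ** m


def objective(memberships, distances, m=2):
    """Assumed form: sum over samples and clusters of u^m * d^2."""
    return np.sum(fuzzifier(memberships, m) * distances ** 2)


memberships = np.array([[1.0, 0.0], [0.5, 0.5]])   # (n_samples, n_clusters)
distances = np.array([[0.0, 2.0], [1.0, 1.0]])     # (n_samples, n_clusters)
value = objective(memberships, distances)
```

With m = 2 the second sample contributes 0.25 per cluster, so shared memberships are penalized less than a full membership at the same distance.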
m = 2
skcmeans.algorithms.GustafsonKesselMixin(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Gives clusters ellipsoidal character.
The Gustafson-Kessel algorithm redefines the distance measurement such that clusters may adopt ellipsoidal shapes. This is achieved through updates to a covariance matrix assigned to each cluster center.
Examples
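Create an algorithm for probabilistic clustering with ellipsoidal clusters:
>>> from skcmeans.algorithms import GustafsonKesselMixin, Probabilistic
>>> class ProbabilisticGustafsonKessel(GustafsonKesselMixin, Probabilistic):
...     pass
>>> pgk = ProbabilisticGustafsonKessel()
>>> pgk.fit(x)  # x: np.ndarray of shape (n_samples, n_features)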
calculate_covariance(x)
Calculates the covariance of the data x about each cluster center.
| Parameters: | x (np.ndarray) – (n_samples, n_features). The original data. |
|---|---|
| Returns: | (n_clusters, n_features, n_features) The covariance matrix of each cluster. |
| Return type: | np.ndarray |
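A per-cluster covariance with the stated output shape can be sketched as a membership-weighted average of outer products. This assumes the standard Gustafson-Kessel fuzzy covariance, weighted by memberships raised to the power m; the function `fuzzy_covariance` and its arguments are hypothetical and may differ from what calculate_covariance does internally.

```python
import numpy as np


def fuzzy_covariance(x, centers, memberships, m=2):
    """Assumed form: weighted mean of (x - v)(x - v)^T for each center v."""
    weights = memberships ** m                                # (n_samples, n_clusters)
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]    # (n_samples, n_clusters, n_features)
    # Sum the weighted outer products over samples for every cluster.
    summed = np.einsum("ij,ijk,ijl->jkl", weights, diff, diff)
    return summed / weights.sum(axis=0)[:, np.newaxis, np.newaxis]


x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
centers = np.array([[1.0, 1.0]])
memberships = np.ones((4, 1))
cov = fuzzy_covariance(x, centers, memberships)   # (n_clusters, n_features, n_features)
```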
covariance = None
skcmeans.algorithms.Hard(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.CMeans
Hard C-means, equivalent to K-means clustering.
calculate_memberships(x)
The membership of a sample is 1 to the closest cluster and 0 otherwise.
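The hard membership rule amounts to a one-hot encoding of the nearest center. A minimal sketch, assuming a precomputed (n_samples, n_clusters) distance array; the helper name `hard_memberships` is hypothetical.

```python
import numpy as np


def hard_memberships(distances):
    """Return 1 for each sample's closest cluster and 0 elsewhere."""
    closest = distances.argmin(axis=1)
    memberships = np.zeros_like(distances, dtype=float)
    memberships[np.arange(distances.shape[0]), closest] = 1.0
    return memberships


distances = np.array([[0.1, 2.0], [3.0, 0.5]])
u = hard_memberships(distances)
```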
objective(x)
Interpretable as the data’s rotational inertia about the cluster centers. To be minimised.
calculate_centers(x)
skcmeans.algorithms.Possibilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Possibilistic C-means.
In the possibilistic algorithm, sample points are assigned memberships according to their relative proximity to the centers. This is controlled through a weighting assigned to each cluster center, approximately the variance of that cluster.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\), and the weighting \(w_i\) of each center.
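The membership formula itself is not shown here, so the sketch below uses one common possibilistic form, u = 1 / (1 + (d^2 / w)^(1 / (m - 1))), as an assumption rather than as the library's definition. Arrays follow the (n_samples, n_clusters) layout used elsewhere on this page, and the function name is hypothetical.

```python
import numpy as np


def possibilistic_memberships(distances, weights, m=2):
    """Assumed form: u = 1 / (1 + (d^2 / w)^(1 / (m - 1)))."""
    # weights has shape (n_clusters,) and broadcasts across the columns.
    return 1.0 / (1.0 + (distances ** 2 / weights) ** (1.0 / (m - 1)))


distances = np.array([[0.0, 1.0], [1.0, 2.0]])   # (n_samples, n_clusters)
weights = np.array([1.0, 4.0])                   # one weighting per center
u = possibilistic_memberships(distances, weights)
```

Unlike the probabilistic case, the rows of u are not normalized: a sample at distance sqrt(w) from a center receives membership 0.5 in that cluster regardless of the other centers.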
calculate_centers(x)
New centers are calculated as the mean of the points closest to them, weighted by the fuzzified memberships.
initialization(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
skcmeans.algorithms.Probabilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)
Bases: skcmeans.algorithms.Fuzzy
Probabilistic C-means.
In the probabilistic algorithm, each sample point has a total membership of unity, distributed among the centers. This tends to push cluster centers away from each other.
calculate_memberships(x)
Memberships are calculated from the distance \(d_{ij}\) between the sample \(j\) and the cluster center \(i\).
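The formula is not reproduced here, so the following sketch assumes the standard fuzzy C-means update, in which each membership is proportional to d^(-2 / (m - 1)) and every sample's memberships are normalized to sum to one. The function name and the use of eps to guard against zero distances are assumptions for illustration.

```python
import numpy as np


def probabilistic_memberships(distances, m=2, eps=1e-18):
    """Assumed form: u proportional to d^(-2 / (m - 1)), rows sum to 1."""
    inverse = (distances + eps) ** (-2.0 / (m - 1))
    return inverse / inverse.sum(axis=1, keepdims=True)


distances = np.array([[1.0, 1.0], [1.0, 3.0]])   # (n_samples, n_clusters)
u = probabilistic_memberships(distances)
```

A sample equidistant from both centers splits its membership evenly, while a sample three times closer to one center gives it nine times the membership when m = 2.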
calculate_centers(x)
New centers are calculated as the mean of the points closest to them, weighted by the fuzzified memberships.
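The weighted mean described for calculate_centers can be written as a single matrix product. This sketch assumes the fuzzifier is u^m and takes the memberships as an explicit argument; the function `weighted_centers` is hypothetical.

```python
import numpy as np


def weighted_centers(x, memberships, m=2):
    """Mean of the samples, weighted by fuzzified memberships u^m."""
    weights = memberships ** m                      # (n_samples, n_clusters)
    # (n_clusters, n_samples) @ (n_samples, n_features), then normalize.
    return weights.T @ x / weights.sum(axis=0)[:, np.newaxis]


x = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
memberships = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
centers = weighted_centers(x, memberships)          # (n_clusters, n_features)
```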
skcmeans.initialization.initialize_probabilistic(x, k, random_state=None)
Selects initial points using a probabilistic clustering approximation.
skcmeans.initialization.initialize_random(x, k, random_state=None, eps=1e-12)
Selects initial points randomly from the data.
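Random selection of initial points can be sketched as drawing k distinct rows of the data with a seeded generator. This is an illustration only: the helper `pick_random_centers` is hypothetical, and whether the library samples without replacement, or how it uses eps, is not stated on this page.

```python
import numpy as np


def pick_random_centers(x, k, random_state=None):
    """Hypothetical sketch: choose k distinct samples as initial centers."""
    rng = np.random.RandomState(random_state)
    indices = rng.choice(x.shape[0], size=k, replace=False)
    return x[indices]


x = np.arange(20, dtype=float).reshape(10, 2)       # (n_samples, n_features)
centers = pick_random_centers(x, 3, random_state=0)
```

Passing the same random_state reproduces the same initial centers, which makes repeated runs comparable.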