skcmeans.algorithms Module

Implementations of a number of C-means algorithms.


[1] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, 2005.
class skcmeans.algorithms.CMeans(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Base class for C-means algorithms.

  • n_clusters (int, optional) – The number of clusters to find.
  • n_init (int, optional) – The number of times to attempt convergence with new initial centroids.
  • max_iter (int, optional) – The maximum number of cycles of the alternating optimization routine to run for each convergence attempt.
  • tol (float, optional) – The stopping condition. Convergence is considered to have been reached when the objective function changes by less than tol.
  • verbosity (int, optional) – The verbosity of the instance. May be 0, 1, or 2. Not yet implemented.
  • random_state (int or np.random.RandomState, optional) – The generator used for initialization. Using an integer fixes the seed.
  • eps (float, optional) – To avoid numerical errors, zeros are sometimes replaced with a very small number, specified here.

string or function – The distance metric used. May be any of the strings specified for cdist, or a user-specified function.


function – The method used to initialize the cluster centers.


np.ndarray – (n_clusters, n_features) The derived or supplied cluster centers.


np.ndarray – (n_samples, n_clusters) The derived or supplied cluster memberships.


Finds cluster centers through an alternating optimization routine.

Terminates when either the number of cycles reaches max_iter or the objective function changes by less than tol.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
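The termination rule above can be sketched as a small loop. This is a minimal illustration, not the library's code: `converge_sketch` and the toy `halfway_update` are hypothetical names, and the update callable is assumed to return the new centers, the memberships, and the objective value.

```python
import numpy as np


def converge_sketch(x, centers, update, max_iter=300, tol=1e-4):
    """Repeat `update` until the objective changes by less than tol."""
    previous = np.inf
    memberships = None
    n_cycles = 0
    for n_cycles in range(1, max_iter + 1):
        centers, memberships, objective = update(x, centers)
        # Stop early once the objective has effectively stopped moving.
        if abs(previous - objective) < tol:
            break
        previous = objective
    return centers, memberships, n_cycles


def halfway_update(x, centers):
    # Toy update (assumption): one cluster whose center moves halfway
    # toward the data mean on every cycle.
    new_centers = centers + 0.5 * (x.mean(axis=0) - centers)
    memberships = np.ones((x.shape[0], 1))
    objective = np.sum((x - new_centers) ** 2)
    return new_centers, memberships, objective


x = np.array([[0.0], [2.0]])
centers, memberships, n_cycles = converge_sketch(
    x, np.array([[5.0]]), halfway_update)
```

Here the loop stops well before `max_iter` because the objective settles; with a small `max_iter` the cycle limit ends the loop instead.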

Calculates the distance between data x and the centers.

The distance, by default, is calculated according to metric, but this method should be overridden by subclasses if required.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
Returns:(n_samples, n_clusters) Each entry (i, j) is the distance between sample i and cluster center j.
Return type:np.ndarray
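For the Euclidean case, the returned array can be sketched with plain NumPy broadcasting. The function name `euclidean_distances` is hypothetical and only illustrates the documented output shape.

```python
import numpy as np


def euclidean_distances(x, centers):
    """Return an (n_samples, n_clusters) array of Euclidean distances."""
    # Broadcasting yields shape (n_samples, n_clusters, n_features).
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
centers = np.array([[0.0, 0.0], [3.0, 4.0]])
# d[i, j] is the distance between sample i and cluster center j.
d = euclidean_distances(x, centers)
```

With SciPy installed, `scipy.spatial.distance.cdist(x, centers, metric='euclidean')` computes the same array.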

Optimizes cluster centers by restarting convergence several times.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
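One plausible reading of restarting convergence is to keep the run with the lowest objective. The sketch below assumes that reading; `best_of_restarts` and `toy_run` are hypothetical names, and `run_once` stands in for a single convergence from fresh initial centroids.

```python
import numpy as np


def best_of_restarts(run_once, n_init=10, random_state=None):
    """Call run_once n_init times and keep the lowest-objective result."""
    # Accept either an integer seed or an existing generator.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    best_objective, best_result = np.inf, None
    for _ in range(n_init):
        objective, result = run_once(rng)
        if objective < best_objective:
            best_objective, best_result = objective, result
    return best_objective, best_result


def toy_run(rng):
    # Toy stand-in for one convergence: the objective is a random draw.
    value = rng.uniform()
    return value, {"objective": value}


objective, result = best_of_restarts(toy_run, n_init=5, random_state=0)
```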
static initialization(x, k, random_state=None, eps=1e-12)

Selects initial points randomly from the data.

  • x (np.ndarray) – (n_samples, n_features) The original data.
  • k (int) – The number of points to select.
  • random_state (int or np.random.RandomState, optional) – The generator used for initialization. Using an integer fixes the seed.

  • Uninitialized memberships
  • selection (np.ndarray) – (k, n_features) A length-k subset of the original data.
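Random selection of initial points can be sketched as follows. `random_selection` is a hypothetical name, sampling without replacement is an assumption, and the sketch returns only the selected points, not the memberships.

```python
import numpy as np


def random_selection(x, k, random_state=None):
    """Pick k distinct rows of x to serve as initial cluster centers."""
    # An integer seed fixes the draw; an existing generator is used as-is.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    # replace=False guarantees that no sample is chosen twice.
    indices = rng.choice(x.shape[0], size=k, replace=False)
    return x[indices]


x = np.arange(20, dtype=float).reshape(10, 2)
selection = random_selection(x, k=3, random_state=42)
```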


Updates cluster memberships and centers in a single cycle.

If the cluster centers have not already been initialized, they are chosen according to initialization.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
class skcmeans.algorithms.Fuzzy(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Base class for fuzzy C-means clustering algorithms.


float – Fuzziness parameter. Higher values reduce the rate of drop-off from full membership to zero membership.


Fuzzification operator. By default, for memberships $u$ this is $u^m$.


Interpretable as the data’s weighted rotational inertia about the cluster centers. To be minimised.
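A minimal sketch of the fuzzifier and a weighted-inertia objective. The function names are hypothetical, and squaring the distances is an assumption taken from the usual inertia form rather than from the library.

```python
import numpy as np


def fuzzifier(memberships, m=2.0):
    # Default fuzzification operator from the text: u ** m.
    return memberships ** m


def fuzzy_objective(distances, memberships, m=2.0):
    # Assumption: distances enter squared, as in the usual inertia form.
    return np.sum(fuzzifier(memberships, m) * distances ** 2)


# Both arrays have shape (n_samples, n_clusters).
distances = np.array([[1.0, 3.0], [2.0, 2.0]])
memberships = np.array([[0.9, 0.1], [0.5, 0.5]])
value = fuzzy_objective(distances, memberships)
```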

class skcmeans.algorithms.GustafsonKesselMixin(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Gives clusters ellipsoidal character.

The Gustafson-Kessel algorithm redefines the distance measurement such that clusters may adopt ellipsoidal shapes. This is achieved through updates to a covariance matrix assigned to each cluster center.


Create an algorithm for probabilistic clustering with ellipsoidal clusters:

>>> from skcmeans.algorithms import GustafsonKesselMixin, Probabilistic
>>> class ProbabilisticGustafsonKessel(GustafsonKesselMixin, Probabilistic):
...     pass
...
>>> pgk = ProbabilisticGustafsonKessel()

Calculates the covariance of the data x about the cluster centers.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
Returns:(n_clusters, n_features, n_features) The covariance matrix of each cluster.
Return type:np.ndarray
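A per-cluster covariance of the documented shape can be sketched as below. Weighting by the fuzzified memberships is the usual Gustafson-Kessel form and is an assumption about the library; `fuzzy_covariance` is a hypothetical name.

```python
import numpy as np


def fuzzy_covariance(x, centers, memberships, m=2.0):
    """Return an (n_clusters, n_features, n_features) covariance stack."""
    weights = memberships ** m  # (n_samples, n_clusters)
    # Offsets of every sample from every center:
    # shape (n_clusters, n_samples, n_features).
    diff = x[np.newaxis, :, :] - centers[:, np.newaxis, :]
    # Membership-weighted sum of outer products for each cluster.
    outer = np.einsum('ck,ckf,ckg->cfg', weights.T, diff, diff)
    return outer / weights.sum(axis=0)[:, np.newaxis, np.newaxis]


x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
centers = np.array([[1.0, 1.0]])
memberships = np.ones((4, 1))
cov = fuzzy_covariance(x, centers, memberships)
```

For this symmetric square of points around a single center, the result is the 2x2 identity matrix.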

Optimizes cluster centers by restarting convergence several times.

Extends the default behaviour by recalculating the covariance matrix with resultant memberships and centers.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.

Single update of the cluster algorithm.

Extends the default behaviour by including a covariance calculation after updating the centers.

Parameters:x (np.ndarray) – (n_samples, n_features) The original data.
class skcmeans.algorithms.Hard(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Hard C-means, equivalent to K-means clustering.


The membership of a sample is 1 to the closest cluster and 0 otherwise.


New centers are calculated as the mean of the points closest to them.


Interpretable as the data’s rotational inertia about the cluster centers. To be minimised.
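The hard membership and center rules can be sketched as follows. The function names are hypothetical, and keeping the old position of a center with no assigned samples is an assumption made only to avoid division by zero.

```python
import numpy as np


def hard_memberships(distances):
    """One-hot memberships: 1 for the closest center, 0 elsewhere."""
    closest = np.argmin(distances, axis=1)
    return np.eye(distances.shape[1])[closest]


def hard_centers(x, memberships, old_centers):
    """Mean of the samples assigned to each center."""
    counts = memberships.sum(axis=0)
    sums = memberships.T @ x
    # Assumption: a center with no assigned samples keeps its old position.
    safe = np.maximum(counts, 1.0)[:, np.newaxis]
    return np.where(counts[:, np.newaxis] > 0, sums / safe, old_centers)


x = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
# Euclidean distances, shape (n_samples, n_clusters).
distances = np.sqrt(
    np.sum((x[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2, axis=2))
u = hard_memberships(distances)
new_centers = hard_centers(x, u, centers)
```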

class skcmeans.algorithms.Possibilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Possibilistic C-means.

In the possibilistic algorithm, sample points are assigned memberships according to their relative proximity to the centers. This is controlled through a weighting to the cluster centers, approximately the variance of each cluster.


Memberships are calculated from the distance \(d_{ik}\) between the sample \(k\) and the cluster center \(i\), and the weighting \(w_i\) of each center.

\[u_{ik} = \left(1 + \left(\frac{d_{ik}}{w_i}\right)^\frac{1}{m -1} \right)^{-1}\]
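The membership formula translates directly into NumPy. This sketch assumes distances are stored as (n_samples, n_clusters) and weights as (n_clusters,); `possibilistic_memberships` is a hypothetical name.

```python
import numpy as np


def possibilistic_memberships(distances, weights, m=2.0):
    """u_ik = (1 + (d_ik / w_i) ** (1 / (m - 1))) ** -1."""
    # Each column of distances is scaled by the weight of its center.
    ratio = distances / weights[np.newaxis, :]
    return 1.0 / (1.0 + ratio ** (1.0 / (m - 1.0)))


distances = np.array([[0.0, 2.0], [1.0, 4.0]])
weights = np.array([1.0, 2.0])
u = possibilistic_memberships(distances, weights)
```

Unlike the probabilistic case, the memberships of one sample are not constrained to sum to one.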

New centers are calculated as the mean of the sample points, weighted by the fuzzified memberships.

\[c_i = \left. \sum_k u_{ik}^m x_k \middle/ \sum_k u_{ik}^m \right.\]
static initialization(x, k, random_state=None)

Selects initial points using a probabilistic clustering approximation.

  • x (np.ndarray) – (n_samples, n_features) The original data.
  • k (int) – The number of points to select.
  • random_state (int or np.random.RandomState, optional) – The generator used for initialization. Using an integer fixes the seed.

  • np.ndarray – (n_samples, k) Cluster memberships
  • np.ndarray – (k, n_features) Cluster centers

class skcmeans.algorithms.Probabilistic(n_clusters=2, n_init=10, max_iter=300, tol=0.0001, verbosity=0, random_state=None, eps=1e-18, **kwargs)[source]

Probabilistic C-means.

In the probabilistic algorithm, each sample point has a total membership of unity, shared among the centers. This tends to push cluster centers away from each other.


Memberships are calculated from the distance \(d_{ik}\) between the sample \(k\) and the cluster center \(i\).

\[u_{ik} = \left(\sum_j \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m - 1}} \right)^{-1}\]
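The formula can be sketched in NumPy as below. Distances are assumed to be stored as (n_samples, n_clusters), and clamping zeros to `eps` is an assumption modelled on the `eps` constructor parameter; `probabilistic_memberships` is a hypothetical name.

```python
import numpy as np


def probabilistic_memberships(distances, m=2.0, eps=1e-18):
    """u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))."""
    # Replace exact zeros with eps so the ratios stay finite.
    d = np.maximum(distances, eps)
    # ratios[k, i, j] = d_ik / d_jk for sample k and centers i, j.
    ratios = d[:, :, np.newaxis] / d[:, np.newaxis, :]
    return 1.0 / np.sum(ratios ** (2.0 / (m - 1.0)), axis=2)


distances = np.array([[1.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
u = probabilistic_memberships(distances)
```

Each row of `u` sums to one, and a sample lying exactly on a center receives essentially full membership of that center.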

New centers are calculated as the mean of the sample points, weighted by the fuzzified memberships.

\[c_i = \left. \sum_k u_{ik}^m x_k \middle/ \sum_k u_{ik}^m \right.\]
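A minimal sketch of the center update. It normalizes by the sum of the fuzzified memberships so that each center is a true weighted mean; that normalization and the name `fuzzy_centers` are assumptions, not confirmed library behaviour.

```python
import numpy as np


def fuzzy_centers(x, memberships, m=2.0):
    """Mean of the samples, weighted by the fuzzified memberships u ** m."""
    weights = memberships ** m  # (n_samples, n_clusters)
    # Row i of the result is the weighted mean for cluster center i.
    return (weights.T @ x) / weights.sum(axis=0)[:, np.newaxis]


x = np.array([[0.0], [4.0]])
memberships = np.array([[0.5], [1.0]])
centers = fuzzy_centers(x, memberships)
```

With crisp 0/1 memberships this reduces to the ordinary per-cluster mean.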