As far as I can tell, `kmeans` currently always performs a single run of K-means from a single initialization. However, it can be quite helpful to run K-means from multiple initializations and keep the best result (i.e., the one with the lowest total cost), since this helps avoid bad local minima.
I currently handle this by defining a batch version in my research code, something like the following:
```julia
using Clustering, ProgressLogging
using Logging, Random  # with_logger/NullLogger and Random.seed!

function batchkmeans(X, k, args...; nruns=100, kwargs...)
    runs = @withprogress map(1:nruns) do idx
        # Run K-means with a per-run seed for reproducibility
        Random.seed!(idx)
        # Discard any log messages emitted by kmeans during this run
        result = with_logger(NullLogger()) do
            kmeans(X, k, args...; kwargs...)
        end
        # Log progress and return result
        @logprogress idx/nruns
        return result
    end
    # Report how many runs converged
    nconverged = count(run -> run.converged, runs)
    @info "$nconverged/$nruns runs converged"
    # Return runs sorted from best (lowest total cost) to worst
    return sort(runs; by=run -> run.totalcost)
end
```
I think it'd be great to have something like this functionality (not necessarily how I've done it above) built into `kmeans`!

For reference, scikit-learn provides an `n_init` argument for their K-means implementation, with a default value of `1` when using k-means++ to initialize and `10` when using a random initialization. See here: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
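To make the "best of `n_init` runs" idea concrete without depending on any particular package version, here is a minimal, self-contained sketch. It does not use Clustering.jl: `lloyd` is a hypothetical plain Lloyd's algorithm with random-point initialization (not k-means++), and `best_of_n_init` is a hypothetical wrapper that mimics an `n_init`-style keyword by running `lloyd` from several seeded initializations and keeping the lowest-cost result.

```julia
using Random

# Hypothetical minimal Lloyd's algorithm (not Clustering.jl's implementation).
# X is d×n (columns are points). Centers start at k distinct random columns.
function lloyd(X::AbstractMatrix{<:Real}, k::Integer, rng::AbstractRNG; maxiter::Integer=100)
    d, n = size(X)
    centers = float(X[:, randperm(rng, n)[1:k]])  # random init, not k-means++
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: nearest center by squared Euclidean distance
        changed = false
        for j in 1:n
            best = argmin([sum(abs2, view(X, :, j) .- view(centers, :, c)) for c in 1:k])
            if best != assignments[j]
                assignments[j] = best
                changed = true
            end
        end
        changed || break  # stop once assignments no longer change
        # Update step: move each center to the mean of its points
        for c in 1:k
            members = findall(==(c), assignments)
            if !isempty(members)  # empty clusters keep their previous center
                centers[:, c] = vec(sum(X[:, members]; dims=2)) ./ length(members)
            end
        end
    end
    cost = sum(sum(abs2, view(X, :, j) .- view(centers, :, assignments[j])) for j in 1:n)
    return (centers=centers, assignments=assignments, totalcost=cost)
end

# Hypothetical n_init-style wrapper: run lloyd from n_init seeded
# initializations and return the run with the lowest total cost.
function best_of_n_init(X, k; n_init::Integer=10, seed::Integer=0, kwargs...)
    runs = [lloyd(X, k, MersenneTwister(seed + i); kwargs...) for i in 1:n_init]
    return runs[argmin([r.totalcost for r in runs])]
end

# Usage: two well-separated 2-D blobs
X = hcat(randn(MersenneTwister(1), 2, 50), randn(MersenneTwister(2), 2, 50) .+ 5.0)
best = best_of_n_init(X, 2; n_init=5)
println(best.totalcost)
```

Seeding each run separately (here with `MersenneTwister(seed + i)`) keeps individual runs reproducible without touching the global RNG. Whether a built-in version should return only the best run or all runs, as `batchkmeans` above does, is a design choice.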
If there's interest, I'd be happy to put together a PR!