Add option to use multiple inits in kmeans? #291

As far as I can tell, `kmeans` currently always performs a single run of K-means from a single initialization. However, it is often helpful to run K-means from multiple initializations and keep the best result (the one with the lowest total cost), since any single run can get stuck in a bad local minimum.

I currently handle this by defining a batch version in my research code, something like the following:
```julia
using Clustering, Logging, ProgressLogging, Random  # Logging: with_logger/NullLogger; Random: seed!
function batchkmeans(X, k, args...; nruns=100, kwargs...)
    runs = @withprogress map(1:nruns) do idx
        # Run K-means
        Random.seed!(idx)  # set seed for reproducibility
        result = with_logger(NullLogger()) do
            kmeans(X, k, args...; kwargs...)
        end

        # Log progress and return result
        @logprogress idx/nruns
        return result
    end

    # Print how many converged
    nconverged = count(run -> run.converged, runs)
    @info "$nconverged/$nruns runs converged"

    # Return runs sorted best to worst
    return sort(runs; by=run->run.totalcost)
end
```

I think it'd be great to have something like this functionality (not necessarily how I've done it above) built into `kmeans`!

For reference, scikit-learn's K-means implementation provides an `n_init` argument whose default performs `1` run when initializing with k-means++ and `10` runs when using a random initialization. See here: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
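
To make the idea concrete, here is a minimal sketch of what an `n_init`-style option could look like if it only returned the best run. The function name `kmeans_ninit` and the `n_init` keyword are hypothetical (neither is part of Clustering.jl), and the sketch assumes a Clustering.jl version whose `kmeans` accepts an `rng` keyword:

```julia
using Clustering, Random

# Hypothetical wrapper, not part of Clustering.jl: run `kmeans` `n_init` times
# from different initializations and keep the run with the lowest total cost.
# Assumes `kmeans` accepts an `rng` keyword (an assumption about the installed version).
function kmeans_ninit(X, k; n_init=10, rng=Random.default_rng(), kwargs...)
    n_init >= 1 || throw(ArgumentError("n_init must be at least 1"))

    # The shared RNG advances between runs, so each run gets a different init.
    best = kmeans(X, k; rng=rng, kwargs...)
    for _ in 2:n_init
        result = kmeans(X, k; rng=rng, kwargs...)
        if result.totalcost < best.totalcost
            best = result
        end
    end
    return best
end
```

Usage would then be something like `best = kmeans_ninit(X, 3; n_init=10)`. Unlike the batch version above, this keeps only one result in memory at a time, which might matter for large `X` or many runs.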

If there's interest, I'd be happy to put together a PR!
