
AD and CVM tests with estimated parameters produce essentially random values #3

@PaulMTeggin

Description


The implementation of the AD and CVM tests with parameters estimated from the data (`estimated = TRUE`) appears to be incorrect. When the *same* fixed sample from a known distribution is tested repeatedly, the p-values span essentially the full range from zero to one, so the result of any single call is effectively random. The package documentation does say that variable results are expected (reasonable, given the sampling applied by Braun's method), but the p-values do not cluster around any particular value; they cover the whole interval, which makes the test unusable in practice. See the minimal example below:

library(goftest)

set.seed(1)

static_rnorm <- rnorm(100)

# Test the *same* random sample 1000 times - get a range of p-values from virtually zero to virtually one
ad_p_values <- replicate(1000, goftest::ad.test(x = static_rnorm, null = "pnorm", estimated = TRUE)$p.value)
summary(ad_p_values)
hist(ad_p_values)


# Test the *same* random sample 1000 times - get a range of p-values from virtually zero to virtually one
cvm_p_values <- replicate(1000, goftest::cvm.test(x = static_rnorm, null = "pnorm", estimated = TRUE)$p.value)
summary(cvm_p_values)
hist(cvm_p_values)

I'm sorry that I have nothing to offer on what might be causing this. It is clearly possible to adapt the approach used for the KS test in the KScorrect package to the AD test (i.e. using Monte Carlo simulation to derive the null distribution of the test statistic, re-estimating the parameters in each replicate), but that is very much a brute-force approach, and on the face of it Braun's method is more elegant. It just does not seem to work in this implementation.
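For concreteness, here is a rough sketch of the Monte Carlo (parametric bootstrap) workaround described above. This is not goftest's API; `ad_test_mc` is a hypothetical helper that simulates from the fitted normal null and re-estimates the parameters in each replicate, in the spirit of KScorrect:

```r
library(goftest)

# Hypothetical helper (not part of goftest): parametric-bootstrap p-value
# for the AD test of normality with parameters estimated from the data.
ad_test_mc <- function(x, nsim = 999) {
  n <- length(x)
  # AD statistic with mean/sd re-estimated from the sample being tested
  stat <- function(y) {
    goftest::ad.test(y, null = "pnorm",
                     mean = mean(y), sd = sd(y))$statistic
  }
  observed <- stat(x)
  # Simulate samples from the fitted null, re-estimating parameters each time
  sims <- replicate(nsim, stat(rnorm(n, mean = mean(x), sd = sd(x))))
  # Monte Carlo p-value with the usual +1 correction
  p <- (1 + sum(sims >= observed)) / (nsim + 1)
  list(statistic = observed, p.value = p)
}

set.seed(1)
res <- ad_test_mc(rnorm(100))
res$p.value  # stable up to Monte Carlo error, unlike the behaviour above
```

Repeated calls on the same sample should then agree up to Monte Carlo error of order 1/sqrt(nsim), rather than spanning the whole unit interval.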
