Kolmogorov-Smirnov Goodness-of-Fit Test
The Kolmogorov-Smirnov (K-S) test is a non-parametric statistical test designed to assess whether a dataset follows a specified continuous probability distribution. It compares the empirical distribution function (EDF) of a sample to the cumulative distribution function (CDF) of the hypothesized distribution.
Test Procedure
The Kolmogorov-Smirnov test statistic is defined as:
$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$
where:
- $F_n(x)$ is the empirical cumulative distribution function calculated from the data.
- $F(x)$ is the cumulative distribution function of the specified theoretical distribution.
- $n$ is the sample size.
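As a rough illustration of how this statistic can be computed by hand: for a continuous hypothesized CDF, the largest gap between the EDF and the CDF is reached at a sample point, either just before or just after the EDF's jump there. The sketch below uses only the Python standard library; the helper name `ks_statistic` and the sample values are hypothetical, and it is not meant to reflect how Phitter computes the statistic internally.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Return D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


# Hypothetical sample, compared against a standard normal distribution.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d_n = ks_statistic(sample, NormalDist(mu=0.0, sigma=1.0).cdf)
print(f"D_n = {d_n:.4f}")
```

Any callable that maps a value to a cumulative probability can be passed as `cdf`, so the same helper works for other continuous distributions.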
Hypothesis Testing
The hypotheses tested by the Kolmogorov-Smirnov test are:
- Null hypothesis $H_0$: The dataset follows the specified distribution.
- Alternative hypothesis $H_1$: The dataset does not follow the specified distribution.
The decision rule for the Kolmogorov-Smirnov test is as follows:
- Critical value approach: Reject $H_0$ if the test statistic is greater than the critical value derived from the K-S distribution at the given significance level (typically $\alpha = 0.05$).
- p-value approach: Reject $H_0$ if the calculated p-value is less than the chosen significance level ($\alpha$).
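The two decision rules can be sketched with the large-sample (asymptotic) Kolmogorov distribution. This is an approximation under stated assumptions: the function names and the numbers `n`, `d_n`, and `alpha` below are hypothetical, exact small-sample critical values differ somewhat, and Phitter may well use a different (exact) distribution internally.

```python
import math


def ks_asymptotic_p_value(d_n, n, terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d_n
    if lam < 0.2:
        # The series converges slowly near zero, where the p-value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def ks_asymptotic_critical_value(n, alpha=0.05):
    """Large-sample critical value from the leading series term:
    sqrt(-ln(alpha / 2) / 2) / sqrt(n), about 1.358 / sqrt(n) for alpha = 0.05."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


# Hypothetical inputs: sample size, observed statistic, significance level.
n, d_n, alpha = 100, 0.09, 0.05

critical_value = ks_asymptotic_critical_value(n, alpha)
p_value = ks_asymptotic_p_value(d_n, n)

reject_by_critical_value = d_n > critical_value  # critical value approach
reject_by_p_value = p_value < alpha              # p-value approach

print(f"critical value = {critical_value:.4f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_by_critical_value} / {reject_by_p_value}")
```

Away from the decision boundary the two approaches give the same verdict, because both are derived from the same reference distribution.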
Interpretation
- Rejected (True): There is a significant difference between the empirical and theoretical distributions; the null hypothesis is rejected.
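- Not Rejected (False): The empirical distribution does not significantly differ from the theoretical distribution; the test fails to reject the null hypothesis. This does not prove that the data follow the specified distribution, only that the sample gives no significant evidence against it.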
Implementation in Phitter
In the Phitter library, the Kolmogorov-Smirnov test is implemented through the method:
phi.get_test_kolmogorov_smirnov(id_distribution: str) -> dict
Returned values:
- test_statistic: The Kolmogorov-Smirnov test statistic.
- critical_value: The critical value corresponding to the Kolmogorov-Smirnov distribution at the specified significance level.
- p_value: The p-value associated with the computed test statistic.
- rejected: Boolean indicating whether the null hypothesis is rejected (True) or not (False).
Example Usage
import phitter
# Define dataset
data = [...]
# Fit distributions
phi = phitter.Phitter(data)
phi.fit()
# Kolmogorov-Smirnov test for a specific distribution, e.g., "normal"
ks_results = phi.get_test_kolmogorov_smirnov("normal")
print(ks_results)
The method returns a dictionary containing the Kolmogorov-Smirnov test statistic, critical value, p-value, and the rejection status of the null hypothesis; the example prints it.
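Output (the numbers only illustrate the structure of the dictionary; an actual K-S test statistic and its critical value lie between 0 and 1):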
{
"test_statistic": 4.621,
"critical_value": 11.070,
"p_value": 0.795,
"rejected": False
}
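A result dictionary with these four keys can be turned into a one-line verdict that uses the "fail to reject" wording. The helper name `describe_ks_result` and the values in `example` are hypothetical and chosen only to show the format; they are not output from Phitter.

```python
def describe_ks_result(result, alpha=0.05):
    """Turn a K-S result dictionary into a one-line verdict."""
    verdict = "reject H0" if result["rejected"] else "fail to reject H0"
    return (
        f"D = {result['test_statistic']:.3f}, "
        f"critical value = {result['critical_value']:.3f}, "
        f"p-value = {result['p_value']:.3f}: {verdict} at alpha = {alpha}"
    )


# Hypothetical values, chosen only to illustrate the format.
example = {
    "test_statistic": 0.090,
    "critical_value": 0.136,
    "p_value": 0.393,
    "rejected": False,
}
print(describe_ks_result(example))
```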
Applying the Kolmogorov-Smirnov test to a single distribution
Continuous case
import phitter
# Define dataset
data = [...]
# Continuous measures
continuous_measures = phitter.continuous.ContinuousMeasures(data)
# Define distribution instance
distribution_inst = phitter.continuous.Normal(continuous_measures=continuous_measures)
# Get test result
ks_results = phitter.continuous.evaluate_continuous_test_kolmogorov_smirnov(distribution_inst, continuous_measures)
Discrete case
import phitter
# Define dataset
data = [...]
# Discrete measures
discrete_measures = phitter.discrete.DiscreteMeasures(data)
# Define distribution instance
distribution_inst = phitter.discrete.Binomial(discrete_measures=discrete_measures)
# Get test result
ks_results = phitter.discrete.evaluate_discrete_test_kolmogorov_smirnov(distribution_inst, discrete_measures)