Kolmogorov-Smirnov Goodness-of-Fit Test
The Kolmogorov-Smirnov (K-S) test is a non-parametric statistical test designed to assess whether a dataset follows a specified continuous probability distribution. It compares the empirical distribution function (EDF) of a sample to the cumulative distribution function (CDF) of the hypothesized distribution.
Test Procedure
The Kolmogorov-Smirnov test statistic is defined as:

$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$

where:
- $F_n(x)$ is the empirical cumulative distribution function calculated from the data.
- $F(x)$ is the cumulative distribution function of the specified theoretical distribution.
- $n$ is the sample size.
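As a rough illustration of the definition, the statistic can be computed by hand for a small sample. The sketch below is not Phitter's implementation: the helper name `ks_statistic` and the sample values are made up for illustration, and the hypothesized distribution is assumed to be a standard normal. Because the empirical distribution function is a step function, the supremum is reached at a data point, so it is enough to compare the CDF with the step just before and just after each jump.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Return the largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # At x the empirical CDF jumps from (i - 1) / n to i / n,
        # so both sides of the step are compared with the hypothesized CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample, made up for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d_n = ks_statistic(sample, NormalDist(mu=0.0, sigma=1.0).cdf)
print(f"D_n = {d_n:.4f}")
```

The result always lies between 0 and 1, since it is a difference between two cumulative probabilities.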
Hypothesis Testing
The hypotheses tested by the Kolmogorov-Smirnov test are:
- Null hypothesis $H_0$: The dataset follows the specified distribution.
- Alternative hypothesis $H_1$: The dataset does not follow the specified distribution.
The decision rule for the Kolmogorov-Smirnov test is as follows:
- Critical value approach: Reject $H_0$ if the test statistic is greater than the critical value derived from the K-S distribution at the given significance level (typically $\alpha = 0.05$).
- p-value approach: Reject $H_0$ if the calculated p-value is less than the chosen significance level ($\alpha$).
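Both decision rules can be sketched with the large-sample (asymptotic) Kolmogorov distribution. This is an approximation that assumes a reasonably large sample and a fully specified hypothesized distribution. It is not necessarily how Phitter computes its critical value or p-value, and the function names below are hypothetical.

```python
import math


def ks_critical_value(n, alpha=0.05):
    # Large-sample approximation: sqrt(-ln(alpha / 2) / 2) / sqrt(n).
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


def ks_p_value(d, n, terms=100):
    # Asymptotic series: 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * n * d^2).
    t = n * d * d
    if t < 0.04:
        # The series converges poorly here, and the p-value is effectively 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * t) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


def ks_decision(d, n, alpha=0.05):
    critical = ks_critical_value(n, alpha)
    p = ks_p_value(d, n)
    # Critical value approach; the p-value approach (p < alpha) agrees
    # up to the accuracy of the approximation.
    return {"test_statistic": d, "critical_value": critical, "p_value": p, "rejected": d > critical}


print(ks_decision(0.05, 100))
print(ks_decision(0.30, 100))
```

For small samples, exact tables or an exact distribution routine are preferable to this approximation.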
Interpretation
- Rejected (True): There is a significant difference between the empirical and theoretical distributions; the null hypothesis is rejected.
- Not Rejected (False): The empirical distribution does not significantly differ from the theoretical distribution; the null hypothesis is not rejected. This does not prove that the data follow the distribution, only that the test found no significant evidence against it.
Implementation in Phitter
In the Phitter library, the Kolmogorov-Smirnov test is implemented through the method:
phi.get_test_kolmogorov_smirnov(id_distribution: str) -> dict
Returned values:
- test_statistic: The Kolmogorov-Smirnov test statistic.
- critical_value: The critical value corresponding to the Kolmogorov-Smirnov distribution at the specified significance level.
- p_value: The p-value associated with the computed test statistic.
- rejected: Boolean indicating whether the null hypothesis is rejected (True) or not (False).
Example Usage
import phitter
# Define dataset
data = [...]
# Fit distributions
phi = phitter.Phitter(data)
phi.fit()
# Kolmogorov-Smirnov test for a specific distribution, e.g., "normal"
ks_results = phi.get_test_kolmogorov_smirnov("normal")
print(ks_results)
This will return a dictionary containing the Kolmogorov-Smirnov test statistic, critical value, p-value, and the rejection status of the null hypothesis.
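Output format (the numeric values below are placeholders; a real K-S test statistic always lies between 0 and 1):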
{
"test_statistic": 4.621,
"critical_value": 11.070,
"p_value": 0.795,
"rejected": False
}
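One way to act on the returned dictionary is to turn it into a short verdict. The helper below is a hypothetical convenience function, not part of Phitter. It only assumes the four keys listed under "Returned values", and the example dictionary holds made-up numbers.

```python
def describe_ks_result(result, alpha=0.05):
    """Turn a K-S result dictionary into a one-line verdict."""
    if result["rejected"]:
        verdict = "reject H0: the data differ significantly from the distribution"
    else:
        verdict = "fail to reject H0: no significant difference detected"
    return (
        f"D = {result['test_statistic']:.3f}, "
        f"critical value = {result['critical_value']:.3f}, "
        f"p-value = {result['p_value']:.3f} -> {verdict} (alpha = {alpha})"
    )


# Hypothetical values, made up for illustration only.
example = {"test_statistic": 0.080, "critical_value": 0.136, "p_value": 0.544, "rejected": False}
print(describe_ks_result(example))
```

In practice the dictionary would come from `phi.get_test_kolmogorov_smirnov(...)` rather than being written by hand.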
Apply the Kolmogorov-Smirnov test to a single distribution
Continuous case
import phitter
# Define dataset
data = [...]
# Continuous measures
continuous_measures = phitter.continuous.ContinuousMeasures(data)
# Define distribution instance
distribution_inst = phitter.continuous.Normal(continuous_measures=continuous_measures)
# Get test result
ks_results = phitter.continuous.evaluate_continuous_test_kolmogorov_smirnov(distribution_inst, continuous_measures)
Discrete case
import phitter
# Define dataset
data = [...]
# Discrete measures
discrete_measures = phitter.discrete.DiscreteMeasures(data)
# Define distribution instance
distribution_inst = phitter.discrete.Binomial(discrete_measures=discrete_measures)
# Get test result
ks_results = phitter.discrete.evaluate_discrete_test_kolmogorov_smirnov(distribution_inst, discrete_measures)