Discrete Fit Results
Once fitting has completed with fit_type="discrete", several properties and methods become available for analyzing and comparing the fitted discrete distributions. This section describes each of them.
Global Results
1. phi.best_distribution
Provides the single best-fitting distribution, determined by two criteria:
- Highest number of passed statistical tests (among the applicable ones, such as Chi-Square and Kolmogorov-Smirnov).
- Lowest Sum of Squared Errors (SSE), used as a tiebreaker if multiple distributions pass the same number of tests.
Type: dict
Structure
{
"id": str,
"parameters": { ... }
}
Usage Example
best_dist = phi.best_distribution
# best_dist -> {"id": "binomial", "parameters": {"p": 0.38, "n": 10}}
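The two selection criteria can be sketched in plain Python. This snippet does not call Phitter: the results dictionary is hypothetical data shaped like the entries described in this section, and the helper name pick_best is invented for illustration. It is a minimal sketch, assuming the criteria are applied exactly as stated (most tests passed first, lowest SSE as the tiebreaker).

```python
# Hypothetical fit results; every number here is made up for illustration.
results = {
    "geometric": {"sse": 0.0450, "n_test_passed": 1, "parameters": {"p": 0.21}},
    "binomial": {"sse": 0.0123, "n_test_passed": 2, "parameters": {"p": 0.38, "n": 10}},
    "poisson": {"sse": 0.0098, "n_test_passed": 1, "parameters": {"lambda": 3.8}},
}


def pick_best(results):
    """Pick the entry with the most tests passed; break ties by lowest SSE."""
    best_id = min(
        results,
        key=lambda name: (-results[name]["n_test_passed"], results[name]["sse"]),
    )
    return {"id": best_id, "parameters": results[best_id]["parameters"]}


best = pick_best(results)
print(best["id"])  # binomial
```

Note that "poisson" has the lowest SSE in this made-up data but is not selected, because the number of passed tests takes priority over SSE.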
2. phi.sorted_distributions_sse
Yields a dictionary of all fitted distributions, sorted primarily by the number of tests passed (descending), and secondarily by SSE (ascending). This structure contains each distribution’s parameters, SSE, and statistical test outcomes.
Type: dict[str, dict]
Usage Example
all_distributions = phi.sorted_distributions_sse
# all_distributions -> {
# "binomial": {
# "sse": 0.0123,
# "parameters": {"p": 0.38, "n": 10},
# "chi_square": {...},
# "kolmogorov_smirnov": {...},
# "n_test_passed": 2,
# "n_test_null": 0
# },
# "geometric": { ... },
# ...
# }
3. phi.not_rejected_distributions
Provides a dictionary of all distributions that have passed at least one statistical test (i.e., have not been rejected by all tests). This is a subset of sorted_distributions_sse.
Type: dict[str, dict]
Usage Example
valid_distributions = phi.not_rejected_distributions
# valid_distributions -> {
# "binomial": {
# "sse": 0.0123,
# "parameters": {...},
# "chi_square": {...},
# "kolmogorov_smirnov": {...},
# "n_test_passed": 2,
# "n_test_null": 0
# }
# }
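The relationship between the sorted dictionary and its not-rejected subset can be illustrated without Phitter. The sketch below uses a hypothetical results dictionary with made-up numbers; the helper names sort_results and keep_not_rejected are invented here and are not part of the library. It assumes "not rejected" means n_test_passed is at least 1, as described above.

```python
# Hypothetical fit results; every number here is made up for illustration.
results = {
    "negative_binomial": {"sse": 0.0300, "n_test_passed": 0},
    "geometric": {"sse": 0.0450, "n_test_passed": 1},
    "binomial": {"sse": 0.0123, "n_test_passed": 2},
}


def sort_results(results):
    """Order by tests passed (descending), then by SSE (ascending)."""
    ordered = sorted(
        results.items(),
        key=lambda item: (-item[1]["n_test_passed"], item[1]["sse"]),
    )
    return dict(ordered)  # dicts preserve insertion order


def keep_not_rejected(results):
    """Keep only entries that passed at least one test, preserving order."""
    return {name: info for name, info in results.items() if info["n_test_passed"] >= 1}


sorted_results = sort_results(results)
valid = keep_not_rejected(sorted_results)
print(list(sorted_results))  # ['binomial', 'geometric', 'negative_binomial']
print(list(valid))           # ['binomial', 'geometric']
```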
4. phi.df_sorted_distributions_sse
Presents the same information as phi.sorted_distributions_sse, but in a pandas.DataFrame format for easier viewing and manipulation. Columns include distribution name, SSE, parameter strings, and test results.
Type: pandas.DataFrame
Usage Example
df_sse = phi.df_sorted_distributions_sse
df_sse.head(n=5)
# Returns a DataFrame with columns for distribution,
# SSE, parameters, and test outcomes.
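To make the tabular layout concrete, the sketch below flattens a nested results dictionary into one row per distribution, with the parameters rendered as a single string. It is an illustration only: the results data is hypothetical, the helper to_rows and the column names are invented here, and Phitter's actual column names may differ.

```python
# Hypothetical fit results; every number here is made up for illustration.
results = {
    "binomial": {"sse": 0.0123, "parameters": {"p": 0.38, "n": 10}, "n_test_passed": 2},
    "geometric": {"sse": 0.0450, "parameters": {"p": 0.21}, "n_test_passed": 1},
}


def to_rows(results):
    """Flatten {name: info} into a list of flat row dictionaries."""
    rows = []
    for name, info in results.items():
        rows.append(
            {
                "distribution": name,
                "sse": info["sse"],
                # Render the parameter dict as one readable string per row.
                "parameters": ", ".join(f"{key}: {value}" for key, value in info["parameters"].items()),
                "n_test_passed": info["n_test_passed"],
            }
        )
    return rows


rows = to_rows(results)
for row in rows:
    print(row)
```

A list of flat dictionaries like rows can be passed directly to pandas.DataFrame to obtain a table with one column per key.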
5. phi.df_not_rejected_distributions
Presents the same information as phi.not_rejected_distributions, but in a DataFrame format. Contains only those distributions not rejected by all tests.
Type: pandas.DataFrame
Usage Example
df_valid = phi.df_not_rejected_distributions
df_valid
# Shows distributions that passed at least one statistical test.
6. phi.summarize(k: int = 20) -> pandas.DataFrame
Produces a concise table of the top-fitting distributions. The method lists up to k distributions (20 by default), ordered by the library’s internal selection criteria (for instance, SSE and number of tests passed).
Parameters
k (int): The maximum number of distributions to display. Default value is 20.
Returns
pandas.DataFrame: A compact summary of distribution names, SSE values, parameter listings, and the pass/fail status for each statistical test.
Usage Example
summary_df = phi.summarize(k=10)
summary_df
7. phi.summarize_info(k: int = 10) -> pandas.DataFrame
Provides a more detailed summary of the top-fitting distributions, stating explicitly whether each test rejected the distribution.
Parameters
k (int): The maximum number of distributions to display. Default value is 10.
Returns
pandas.DataFrame: A table that lists each distribution’s SSE, parameters, and a boolean indicating rejection or non-rejection for each statistical test.
Usage Example
info_df = phi.summarize_info(k=5)
info_df
Results for a Specific Distribution
The following methods extract distribution-specific details from the fit results. Each method requires a string identifier id_distribution matching the target distribution (e.g., "binomial", "geometric", etc.). If a distribution identifier is not present in the fitted results, an exception is raised.
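Because the exception type is not specified here, one cautious pattern is to check that the identifier exists before asking for its details. The sketch below shows the idea on a plain dictionary standing in for phi.sorted_distributions_sse; the fitted data is hypothetical and the helper parameters_or_none is invented for illustration, not part of Phitter.

```python
# Stand-in for phi.sorted_distributions_sse; the values are made up.
fitted = {
    "binomial": {"sse": 0.0123, "parameters": {"p": 0.38, "n": 10}},
    "geometric": {"sse": 0.0450, "parameters": {"p": 0.21}},
}


def parameters_or_none(fitted, id_distribution):
    """Return the fitted parameters, or None if the distribution was not fitted."""
    if id_distribution not in fitted:
        return None
    return fitted[id_distribution]["parameters"]


print(parameters_or_none(fitted, "binomial"))  # {'p': 0.38, 'n': 10}
print(parameters_or_none(fitted, "poisson"))   # None
```

With a real Phitter instance, the same membership check against phi.sorted_distributions_sse could precede a call such as phi.get_parameters(...).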
1. phi.get_parameters(id_distribution: str) -> dict
Retrieves the fitted parameters for a specific distribution.
phi.get_parameters("binomial")
# -> {"p": 0.38, "n": 10}
2. phi.get_test_chi_square(id_distribution: str) -> dict
Returns the Chi-Square test results for the specified distribution. The dictionary typically includes:
- test_statistic
- critical_value
- p_value
- rejected
chi_result = phi.get_test_chi_square("binomial")
# chi_result -> {
# "test_statistic": ...,
# "critical_value": ...,
# "p_value": ...,
# "rejected": False
# }
3. phi.get_test_kolmogorov_smirnov(id_distribution: str) -> dict
Obtains the Kolmogorov-Smirnov test results for the distribution. The returned dictionary follows the same structure as the Chi-Square results (test statistic, critical value, p-value, rejection status).
ks_result = phi.get_test_kolmogorov_smirnov("binomial")
# ks_result -> {
# "test_statistic": ...,
# "critical_value": ...,
# "p_value": ...,
# "rejected": False
# }
4. phi.get_test_anderson_darling(id_distribution: str) -> dict
Retrieves the Anderson-Darling test results for a given distribution, if applicable. In many discrete-fitting scenarios this test may not be available, in which case the returned dictionary may contain None for every field.
ad_result = phi.get_test_anderson_darling("binomial")
# ad_result -> {
# "test_statistic": None,
# "critical_value": None,
# "p_value": None,
# "rejected": None
# }
# (Depending on the distribution and whether the AD test is implemented for discrete fits.)
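Since the three test dictionaries share one shape, a single helper can turn any of them into a readable verdict, including the case where every field is None. The helper describe_test below is hypothetical and the sample values are made up; it is a small sketch of how the documented keys might be consumed, not library code.

```python
def describe_test(name, result):
    """Summarize a test-result dictionary in one line."""
    if result["rejected"] is None:
        # The test was not available or could not be evaluated.
        return f"{name}: not available"
    verdict = "rejected" if result["rejected"] else "not rejected"
    return (
        f"{name}: {verdict} "
        f"(statistic={result['test_statistic']:.4f}, "
        f"critical={result['critical_value']:.4f}, "
        f"p={result['p_value']:.4f})"
    )


# Made-up results with the documented keys.
chi_result = {"test_statistic": 3.21, "critical_value": 7.81, "p_value": 0.36, "rejected": False}
ad_result = {"test_statistic": None, "critical_value": None, "p_value": None, "rejected": None}

print(describe_test("chi_square", chi_result))
print(describe_test("anderson_darling", ad_result))
```

Checking rejected against None first avoids formatting errors when the numeric fields are missing.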
5. phi.get_sse(id_distribution: str) -> float
Provides the Sum of Squared Errors (SSE) calculated between the empirical frequencies and the distribution’s probability mass function (PMF).
binomial_sse = phi.get_sse("binomial")
# -> 0.0123
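The SSE definition can be worked through by hand. The sketch below compares relative frequencies of a sample with a binomial PMF and sums the squared differences. It rests on assumptions not stated in this section: that the empirical frequencies are relative frequencies, and that the comparison runs over every integer between the sample minimum and maximum. Phitter's exact procedure may differ, and the helper names are invented for illustration.

```python
from collections import Counter
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sse_against_pmf(data, pmf):
    """Sum of squared differences between relative frequencies and a PMF."""
    counts = Counter(data)
    total = len(data)
    # Assumed domain: every integer from the smallest to the largest observation.
    domain = range(min(data), max(data) + 1)
    return sum((counts.get(k, 0) / total - pmf(k)) ** 2 for k in domain)


data = [0, 1, 1, 2, 5, 3, 3, 3, 10, 10]
sse = sse_against_pmf(data, lambda k: binomial_pmf(k, n=10, p=0.38))
print(f"SSE against binomial(n=10, p=0.38): {sse:.4f}")
```

A smaller value means the PMF tracks the observed frequencies more closely; a value of zero would mean they coincide at every point of the domain.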
6. phi.get_n_test_passed(id_distribution: str) -> int
Indicates how many statistical tests (out of the ones performed) were not rejected for the given distribution.
phi.get_n_test_passed("binomial")
# -> 2 # means 2 tests did not reject the distribution
7. phi.get_n_test_null(id_distribution: str) -> int
Reports how many statistical tests returned a null or indeterminate result for the specified distribution.
phi.get_n_test_null("binomial")
# -> 0 # means 0 tests were inconclusive for that distribution
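The two counters can be related to the per-test dictionaries with a short sketch. It assumes that a test counts as passed when its rejected field is False and as null when that field is None; this reading follows the descriptions above but is not confirmed against the library's implementation. The tests dictionary is made-up data.

```python
# Made-up outcomes for one distribution, keyed by test name.
tests = {
    "chi_square": {"rejected": False},
    "kolmogorov_smirnov": {"rejected": True},
    "anderson_darling": {"rejected": None},
}

# Assumption: "passed" means the test ran and did not reject the distribution.
n_test_passed = sum(1 for result in tests.values() if result["rejected"] is False)

# Assumption: "null" means the test produced no verdict at all.
n_test_null = sum(1 for result in tests.values() if result["rejected"] is None)

print(n_test_passed)  # 1
print(n_test_null)    # 1
```

Comparing with `is False` rather than `not result["rejected"]` keeps None outcomes from being counted as passes.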
Additional Notes
- The default discrete fitting process includes the Chi-Square test and the Kolmogorov-Smirnov test. The Anderson-Darling test is part of the code interface but may be unsupported for certain discrete distributions.
- If fitting fails or if no distributions pass the set criteria, the outputs for certain methods or properties (such as df_sorted_distributions_sse or best_distribution) might be empty or raise exceptions.
Example Usage in a Discrete Setting
import phitter
# Define the dataset (discrete values)
data = [0, 1, 1, 2, 5, 3, 3, 3, 10, 10]
# Create and fit a discrete Phitter instance
phi = phitter.Phitter(
data=data,
fit_type="discrete",
distributions_to_fit=["binomial", "geometric"],
)
phi.fit(n_workers=2)
# Retrieve the best distribution
best_dist_info = phi.best_distribution
# Summarize top results
summary_table = phi.summarize(k=5)
summary_details = phi.summarize_info(k=5)
# Access methods for a specific distribution
binomial_params = phi.get_parameters("binomial")
binomial_chi = phi.get_test_chi_square("binomial")
binomial_ks = phi.get_test_kolmogorov_smirnov("binomial")
binomial_sse = phi.get_sse("binomial")
This concludes the reference for examining discrete fit results within Phitter. Each of these methods and properties is designed to facilitate rigorous, academic-style analysis of the fit quality, distribution parameters, and statistical test outcomes.