ECDF vs. Theoretical Distribution
The Empirical Cumulative Distribution Function (ECDF) is a statistical tool that uses an observed dataset to estimate the cumulative distribution function of the population that generated it. It is a step function that gives, for any value x, the proportion of data points less than or equal to x.
Key Characteristics
- The ECDF explicitly represents individual data points, distinguishing it from histograms, which aggregate data into bins.
- It is a non-decreasing function constrained within the range [0, 1].
- It is particularly useful for examining distributional characteristics in small-to-medium-sized datasets.
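Mathematical Definition
Given a dataset of $n$ observations $x_1, x_2, \ldots, x_n$, the ECDF evaluated at a value $x$ is

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},$$

where $\mathbf{1}\{x_i \le x\}$ equals 1 if $x_i \le x$ and 0 otherwise.
The ECDF plot jumps at each distinct observed value. The jump height is $1/n$, or $k/n$ when $k$ observations share the same value.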
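The definition translates directly into code. Below is a minimal sketch in plain Python; the helper names ecdf and ecdf_steps are illustrative and are not part of the library. It evaluates the ECDF by counting sorted observations with bisect_right and lists the jump points. Note how the tied value 2.5 produces a single jump of 2/n rather than two separate steps.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F(x) giving the fraction of observations <= x."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("the ECDF is undefined for an empty dataset")

    def F(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F


def ecdf_steps(data):
    """Return (value, cumulative proportion) pairs at each distinct observed value."""
    xs = sorted(data)
    n = len(xs)
    steps = []
    for i, x in enumerate(xs):
        # For tied values keep only the last occurrence, where the count is complete
        if i + 1 < n and xs[i + 1] == x:
            continue
        steps.append((x, (i + 1) / n))
    return steps


data = [3.1, 1.2, 2.5, 2.5, 4.0]
F = ecdf(data)
print(F(0.0))            # 0.0 (below every observation)
print(F(2.5))            # 0.6 (three of the five values are <= 2.5)
print(F(10.0))           # 1.0
print(ecdf_steps(data))  # [(1.2, 0.2), (2.5, 0.6), (3.1, 0.8), (4.0, 1.0)]
```

Because F only counts observations, it is non-decreasing and stays within [0, 1] by construction.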
Comparison with a Theoretical Distribution
To compare the ECDF of an empirical dataset with the cumulative distribution function (CDF) of a fitted theoretical distribution, use the .plot_ecdf_distribution() method. Its parameters are detailed below.
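The plot produced by this method is a visual check. To make the idea concrete, the sketch below summarizes the same comparison numerically in plain Python. It is not the library's implementation. The sketch assumes a normal distribution whose parameters are estimated by the sample mean and standard deviation, and it computes the largest vertical gap between the ECDF and that CDF (the Kolmogorov–Smirnov distance). The names normal_cdf and max_ecdf_gap are hypothetical.

```python
import math
from statistics import fmean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_ecdf_gap(data, cdf):
    """Largest vertical distance between the ECDF of `data` and a theoretical `cdf`.

    The ECDF jumps at each sorted value, so the gap is checked both just
    before the jump (i / n) and at the jump ((i + 1) / n).
    """
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, abs((i + 1) / n - f), abs(f - i / n))
    return gap


data = [4.8, 5.1, 5.3, 4.6, 5.0, 5.4, 4.9, 5.2]
mu, sigma = fmean(data), stdev(data)  # normal parameters estimated from the sample
d = max_ecdf_gap(data, lambda x: normal_cdf(x, mu, sigma))
print(f"max |ECDF - CDF| = {d:.3f}")
```

A small gap means the fitted curve runs close to the empirical steps. This is the same agreement that the plot shows visually.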
Parameters
General Parameters
- id_distribution (str): Identifier of the theoretical distribution against which the empirical data is compared. A full list of available distributions can be found in the distributions documentation.
- plot_title (str, optional): Title of the plot. Default is "ECDF".
- plot_xaxis_title (str, optional): Label for the x-axis. Default is "Domain".
- plot_yaxis_title (str, optional): Label for the y-axis. Default is "Cumulative Distribution Function".
- plot_xaxis_min_offset (float, optional): Offset from the minimum domain value on the x-axis. Default is 0.3.
- plot_xaxis_max_offset (float, optional): Offset from the maximum domain value on the x-axis. Default is 0.3.
- plot_legend_title (str | None, optional): Title for the legend. If set to None, no title is displayed.
- plot_height (int, optional): Height of the plot in pixels. Default is 400.
- plot_width (int, optional): Width of the plot in pixels. Default is 600.
Empirical Distribution Customization
- plot_empirical_line_color (str, optional): Color of the ECDF line in RGBA format. Default is "rgba(128,128,128,1)".
- plot_empirical_line_width (int, optional): Line width of the ECDF. Default is 4.
- plot_empirical_line_name (str, optional): Legend label for the ECDF. Default is "Empirical Distribution".
- plot_empirical_bar_color (str, optional): Color of the ECDF bars in RGBA format. Default is "rgba(128,128,128,1)".
Theoretical Distribution Customization
- plot_distribution_line_color (str, optional): Color of the theoretical distribution line in RGBA format. Default is "rgba(255,0,0,1)".
- plot_distribution_line_width (int, optional): Line width of the theoretical distribution. Default is 2.
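The color parameters take strings such as "rgba(128,128,128,1)": red, green and blue as integers from 0 to 255, followed by an opacity from 0 to 1. The helper below is hypothetical and not part of the library. It builds and validates such strings so that a typo is caught before plotting. It assumes only that the parameters accept the compact "rgba(r,g,b,a)" form shown in the documented defaults.

```python
def rgba(red, green, blue, alpha=1.0):
    """Build an "rgba(r,g,b,a)" color string (hypothetical helper).

    red, green and blue must be ints in 0..255; alpha is an opacity in 0..1.
    """
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        if not (isinstance(value, int) and 0 <= value <= 255):
            raise ValueError(f"{name} must be an int in 0..255, got {value!r}")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in 0..1, got {alpha!r}")
    # Format alpha compactly (1.0 -> "1", 0.5 -> "0.5") and omit spaces,
    # matching the style of the documented defaults
    return f"rgba({red},{green},{blue},{alpha:g})"


print(rgba(128, 128, 128))     # rgba(128,128,128,1)
print(rgba(0, 128, 255, 0.5))  # rgba(0,128,255,0.5)
```

A half-transparent value such as rgba(0, 128, 255, 0.5) could then be passed as plot_empirical_bar_color so that the bars stay lighter than the ECDF line.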
Rendering Options
- plotly_plot_renderer ("png" | "jpeg" | "svg" | None, optional): Export format when using Plotly. If None, Plotly's default renderer is used.
- plot_engine ("plotly" | "matplotlib", optional): Plotting library to use. Default is "plotly".
Default Usage
To generate a basic ECDF plot comparing the empirical dataset with a theoretical distribution, call the method on phi, the object that holds the dataset and its fitted distributions:
phi.plot_ecdf_distribution(id_distribution="weibull")
Replace "weibull" with the desired distribution identifier.
Complete Usage Example
For a fully customized ECDF plot that sets every available parameter, use the following:
phi.plot_ecdf_distribution(
    # Theoretical distribution to compare against
    id_distribution="normal",
    # Titles, axis labels and x-axis offsets
    plot_title="Empirical vs. Theoretical CDF",
    plot_xaxis_title="Values",
    plot_yaxis_title="Probability",
    plot_xaxis_min_offset=0.3,
    plot_xaxis_max_offset=0.3,
    # Legend title and figure size in pixels
    plot_legend_title="Legend",
    plot_height=500,
    plot_width=700,
    # Empirical distribution (ECDF) styling
    plot_empirical_line_color="rgba(0,128,255,1)",
    plot_empirical_line_width=3,
    plot_empirical_line_name="Empirical Data",
    plot_empirical_bar_color="rgba(0,128,255,0.5)",
    # Theoretical distribution styling
    plot_distribution_line_color="rgba(255,0,0,1)",
    plot_distribution_line_width=2,
    # plotly_plot_renderer applies only when Plotly is the plotting engine
    plotly_plot_renderer="png",
    plot_engine="plotly",
)
Example Visualization
[Figure: example output of plot_ecdf_distribution(), showing the ECDF overlaid on the fitted theoretical CDF]
