Generic CCRVAM API
ccrvam.checkerboard.genccrvam
Module Contents
Classes
| GenericCCRVAM | Central Generic Checkerboard Copula Regression, Visualization and Association Measure (CCRVAM) class object. |
API
- class ccrvam.checkerboard.genccrvam.GenericCCRVAM(P: numpy.ndarray)
Central Generic Checkerboard Copula Regression, Visualization and Association Measure (CCRVAM) class object.
Initialization
Initialize the object with a joint probability matrix P for statistical analysis.
- classmethod from_contingency_table(contingency_table: numpy.ndarray) → ccrvam.checkerboard.genccrvam.GenericCCRVAM
Create a CCRVAM object instance from a multi-dimensional contingency table.
Input Arguments
contingency_table: A contingency table of frequency counts (multi-dimensional numpy array)
Outputs
A new GenericCCRVAM object instance initialized with the probability matrix, which will allow for further statistical analysis of the data.
Warnings/Errors
ValueError: If the input table contains negative values or all zeros
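A minimal sketch of what the normalization step might look like, assuming the instance's P is simply the table of counts divided by its grand total. `table_to_probability` is a hypothetical helper, not part of ccrvam; it only mirrors the ValueError conditions documented above.

```python
import numpy as np


def table_to_probability(contingency_table) -> np.ndarray:
    """Hypothetical helper: normalize a table of counts into a joint PMF."""
    table = np.asarray(contingency_table, dtype=float)
    # Mirror the ValueError conditions documented for from_contingency_table.
    if np.any(table < 0):
        raise ValueError("Contingency table cannot contain negative values.")
    total = table.sum()
    if total == 0:
        raise ValueError("Contingency table cannot be all zeros.")
    # Each cell becomes its share of the grand total, so P sums to 1.
    return table / total


# A 2x2 table of counts (10 observations in total).
P = table_to_probability([[1, 3], [2, 4]])
print(P)
```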
- classmethod from_cases(cases: numpy.ndarray, dimension: tuple) → ccrvam.checkerboard.genccrvam.GenericCCRVAM
Create a CCRVAM object instance from case-form data.
Input Arguments
cases: A 2D array where each row represents one case, i.e. the observed values of the categorical variables. Each column corresponds to a different variable, and the values in each column are the category indices for that variable (1-indexed).
dimension: A tuple specifying the number of categories for each variable, in the same order as the columns in cases. For example, if cases has 3 columns representing variables A, B, and C with 2, 3, and 4 categories respectively, then dimension should be (2, 3, 4).
Outputs
A new GenericCCRVAM object instance initialized with the probability matrix, which will allow for further statistical analysis of the data.
Warnings/Errors
ValueError: If the input cases are not 2-dimensional or if the dimension tuple does not match the number of variables and their categories in the cases data.
Example
    import numpy as np
    from ccrvam.checkerboard.genccrvam import GenericCCRVAM

    cases = np.array([[1, 1, 1], [1, 2, 1], [2, 1, 2]])  # 3 cases with 3 variables
    dimension = (2, 2, 2)  # Each variable has 2 categories
    ccrvam = GenericCCRVAM.from_cases(cases, dimension)
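The case form is the long format of the same data: one row per observation. A hedged sketch of how such rows could be tallied into a joint probability matrix, assuming from_cases counts each row once and then normalizes. `cases_to_probability` is hypothetical, not the library's code.

```python
import numpy as np


def cases_to_probability(cases, dimension) -> np.ndarray:
    """Hypothetical helper: tally 1-indexed case rows into a joint PMF."""
    cases = np.asarray(cases, dtype=int)
    if cases.ndim != 2 or cases.shape[0] == 0:
        raise ValueError("cases must be a non-empty 2D array.")
    if cases.shape[1] != len(dimension):
        raise ValueError("dimension must have one entry per column of cases.")
    if np.any(cases < 1) or np.any(cases > np.asarray(dimension)):
        raise ValueError("Category indices must lie between 1 and dimension[k].")
    counts = np.zeros(dimension, dtype=float)
    # Shift to 0-based indices; np.add.at adds 1 per row and accumulates
    # correctly when the same cell occurs several times.
    np.add.at(counts, tuple((cases - 1).T), 1)
    return counts / counts.sum()


cases = np.array([[1, 1, 1], [1, 2, 1], [2, 1, 2]])  # 3 cases with 3 variables
P = cases_to_probability(cases, (2, 2, 2))
print(P[0, 0, 0], P[0, 1, 0], P[1, 0, 1])  # each cell holds one of the three cases
```

np.add.at is used instead of fancy-index assignment (`counts[idx] += 1`) because the latter silently counts a repeated cell only once.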
- calculate_CCRAM(predictors: Union[int, list], response: int, scaled: bool = False) → float
Calculate the CCRAM of the response variable given one or more predictor (conditioning) axes.
Input Arguments
predictors: 1-indexed predictor axis, or list of 1-indexed predictor axes, for regression association
response: 1-indexed target response variable axis for regression association
scaled: Whether to return the scaled (normalized) CCRAM instead of the unscaled value (default: False)
Outputs
(Scaled) CCRAM value for the given predictors and response variable
Warnings/Errors
ValueError: If the response variable axis is out of bounds for the array dimension
ValueError: If the predictors contain an axis that is out of bounds for the array dimension
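A compact sketch of the quantity itself, assuming the definition of Wei and Kim (2021): with r(x) the checkerboard copula regression value for predictor combination x, CCRAM = 12 Σ_x p(x) (r(x) − 1/2)², and the scaled version divides this by its upper bound 12 σ²_S, where σ²_S is the variance of the response's checkerboard score. Multiple predictors are handled by flattening their category combinations into one axis. `ccram_sketch` and `checkerboard_scores` are hypothetical names; this is not the library's implementation.

```python
import numpy as np


def checkerboard_scores(marginal: np.ndarray) -> np.ndarray:
    """Midpoints of the marginal CDF intervals: s_j = (F_{j-1} + F_j) / 2."""
    cdf = np.concatenate(([0.0], np.cumsum(marginal)))
    return (cdf[:-1] + cdf[1:]) / 2


def ccram_sketch(P, predictors, response, scaled=False) -> float:
    P = np.asarray(P, dtype=float)
    keep = [int(a) - 1 for a in np.atleast_1d(predictors)] + [response - 1]
    # Marginalize out every axis that is neither a predictor nor the response.
    drop = tuple(ax for ax in range(P.ndim) if ax not in keep)
    Q = P.sum(axis=drop) if drop else P
    # Reorder to (predictors..., response), then flatten predictor combinations.
    remaining = sorted(keep)
    Q = Q.transpose([remaining.index(ax) for ax in keep])
    joint = Q.reshape(-1, Q.shape[-1])
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    s_y = checkerboard_scores(p_y)
    mask = p_x > 0  # zero-probability combinations carry no weight
    r = joint[mask] @ s_y / p_x[mask]  # regression value for each combination
    value = 12.0 * np.sum(p_x[mask] * (r - 0.5) ** 2)
    if scaled:
        var_s = np.sum(p_y * (s_y - 0.5) ** 2)  # the scores have mean 1/2
        value /= 12.0 * var_s
    return float(value)


print(ccram_sketch(np.diag([0.5, 0.5]), [1], 2))  # 0.75 (perfect association)
```

For the perfectly associated 2×2 table above, the unscaled value is 0.75 while the scaled value is 1, which illustrates why the scaled measure is easier to compare across tables with different marginals.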
- get_predictions_ccr(predictors: list, response: int, variable_names: Union[dict, None] = None) → pandas.DataFrame
Get predictions of response variable categories conditioned on multiple predictor variables.
Input Arguments
predictors: List of 1-indexed predictor axes for category prediction
response: 1-indexed target response variable axis for category prediction
variable_names: Dictionary mapping 1-indexed variable indices to names (default: None)
Outputs
DataFrame containing the predicted category of the response variable for each combination of categories of the predictors
Notes
The DataFrame contains columns for each combination of categories of the predictors and the corresponding predicted category of the response variable. The categories are 1-indexed. Combinations with zero counts in the contingency table will have NA as the predicted category.
Warnings/Errors
ValueError: If the response variable axis is out of bounds for the array dimension
ValueError: If the predictors contain an axis that is out of bounds for the array dimension
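A sketch of the prediction rule, under the assumption that it follows Wei and Kim (2021): compute the regression value r for each combination of predictor categories, then report the response category j whose marginal CDF interval (F_{j-1}, F_j] contains r. The function name, the dictionary output (instead of a pandas DataFrame), and the convention at interval boundaries are all assumptions of this sketch; None stands in for NA.

```python
import itertools

import numpy as np


def predictions_sketch(P, predictors, response):
    """Map each 1-indexed predictor-category combination to a predicted category."""
    P = np.asarray(P, dtype=float)
    keep = [int(a) - 1 for a in predictors] + [response - 1]
    drop = tuple(ax for ax in range(P.ndim) if ax not in keep)
    Q = P.sum(axis=drop) if drop else P
    remaining = sorted(keep)
    Q = Q.transpose([remaining.index(ax) for ax in keep])  # (predictors..., response)
    joint = Q.reshape(-1, Q.shape[-1])
    cdf = np.cumsum(joint.sum(axis=0))  # marginal CDF F_1, ..., F_J of the response
    scores = (np.concatenate(([0.0], cdf[:-1])) + cdf) / 2
    # itertools.product yields combinations in the same (row-major) order as reshape.
    combos = itertools.product(*(range(1, n + 1) for n in Q.shape[:-1]))
    predictions = {}
    for row, combo in zip(joint, combos):
        p_x = row.sum()
        if p_x == 0:
            predictions[combo] = None  # zero-count combination: no prediction (NA)
            continue
        r = row @ scores / p_x  # checkerboard copula regression value
        # Predicted category j is the one with F_{j-1} < r <= F_j.
        j = int(np.searchsorted(cdf, r, side="left")) + 1
        predictions[combo] = min(j, len(cdf))
    return predictions


P = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5]])
print(predictions_sketch(P, [1], 2))
```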
- get_prediction_under_indep(response: int) → int
Calculate the predicted category under joint independence between the response variable and the predictors.
The CCR value equals 0.5 under the assumption of joint independence between the response variable and all predictor variables.
Input Arguments
response: 1-indexed target response variable axis
Outputs
The predicted category (1-indexed) for the response variable under joint independence
Notes
This prediction serves as an important reference point when interpreting CCR prediction results, as it represents what would be predicted if there were no association between the predictors and the response variable.
Warnings/Errors
ValueError: If the response variable axis is out of bounds for the array dimension
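Because the CCR value is 0.5 under joint independence, this reference prediction depends only on the response's marginal distribution. A sketch, assuming the same interval rule as for regular predictions (the category j with F_{j-1} < 0.5 ≤ F_j); `prediction_under_indep_sketch` is a hypothetical name.

```python
import numpy as np


def prediction_under_indep_sketch(P, response: int) -> int:
    """Category whose marginal CDF interval contains 0.5 (hypothetical helper)."""
    P = np.asarray(P, dtype=float)
    other_axes = tuple(ax for ax in range(P.ndim) if ax != response - 1)
    marginal = P.sum(axis=other_axes) if other_axes else P
    cdf = np.cumsum(marginal)  # F_1, ..., F_J of the response
    # Under joint independence the regression value is 0.5 for every
    # predictor combination, so the prediction is the j with F_{j-1} < 0.5 <= F_j.
    return int(np.searchsorted(cdf, 0.5, side="left")) + 1


P = np.array([[0.1, 0.1, 0.3], [0.1, 0.1, 0.3]])
print(prediction_under_indep_sketch(P, 2))  # response marginal is (0.2, 0.2, 0.6)
```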
- calculate_ccs(var_index: int) → numpy.ndarray
Calculate checkerboard scores for the specified variable index.
Input Arguments
var_index: 1-indexed axis of the variable for which to calculate scores
Outputs
Array containing checkerboard scores for the given variable index
Warnings/Errors
ValueError: If the axis is out of bounds for the array dimension
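As a sketch, assuming the usual checkerboard score definition (the midpoint of the interval each category occupies on [0, 1] under the marginal CDF), the scores for a variable can be computed from P alone. `ccs_sketch` is a hypothetical name.

```python
import numpy as np


def ccs_sketch(P, var_index: int) -> np.ndarray:
    """Checkerboard scores s_j = (F_{j-1} + F_j) / 2 of a 1-indexed variable."""
    P = np.asarray(P, dtype=float)
    other_axes = tuple(ax for ax in range(P.ndim) if ax != var_index - 1)
    marginal = P.sum(axis=other_axes) if other_axes else P
    cdf = np.concatenate(([0.0], np.cumsum(marginal)))  # F_0 = 0, F_1, ..., F_J
    return (cdf[:-1] + cdf[1:]) / 2


P = np.array([[0.1, 0.1, 0.3], [0.1, 0.1, 0.3]])
print(ccs_sketch(P, 2))  # marginal (0.2, 0.2, 0.6) gives midpoints 0.1, 0.3, 0.7
```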
- calculate_variance_ccs(var_index: int) → float
Calculate the variance of the checkerboard score for the specified variable index.
Input Arguments
var_index: 1-indexed axis of the variable for which to calculate the variance of the checkerboard score
Outputs
float: Variance of the checkerboard score for the given variable index
Warnings/Errors
ValueError: If the variable index is out of bounds for the array dimension
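A sketch of the variance computation under the same midpoint definition of the scores (`variance_ccs_sketch` is hypothetical). The scores always have mean 1/2, and by the law of total variance for a variable uniform on [0, 1] (total variance 1/12, within-category variance p_j²/12) the result also equals (1 − Σ_j p_j³)/12, which makes a convenient cross-check.

```python
import numpy as np


def variance_ccs_sketch(P, var_index: int) -> float:
    """Variance of the checkerboard score of a 1-indexed variable."""
    P = np.asarray(P, dtype=float)
    other_axes = tuple(ax for ax in range(P.ndim) if ax != var_index - 1)
    p = P.sum(axis=other_axes) if other_axes else P  # marginal PMF
    cdf = np.concatenate(([0.0], np.cumsum(p)))
    scores = (cdf[:-1] + cdf[1:]) / 2
    mean = p @ scores  # equals 1/2 for any proper PMF
    return float(p @ (scores - mean) ** 2)


P = np.array([[0.1, 0.1, 0.3], [0.1, 0.1, 0.3]])
v = variance_ccs_sketch(P, 2)
p = P.sum(axis=0)
print(v, (1 - np.sum(p ** 3)) / 12)  # the two expressions agree
```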
- plot_ccr_predictions(predictors: list, response: int, variable_names: Union[dict, None] = None, legend_style: str = 'side', show_indep_line: bool = True, figsize: Union[tuple, None] = None, save_path: Union[str, None] = None, dpi: int = 300, title_fontsize: Union[int, None] = None, xlabel_fontsize: Union[int, None] = None, ylabel_fontsize: Union[int, None] = None, tick_fontsize: Union[int, None] = None, text_fontsize: Union[int, None] = None, use_category_letters: bool = False, **kwargs) → None
Plot the CCR predictions of the response variable for each combination of predictor categories.
Input Arguments
predictors: List of 1-indexed predictor axes
response: 1-indexed response variable axis
variable_names: Dictionary mapping variable indices to names (default: None)
legend_style: How to display combinations of categories of predictors: 'side' (default) or 'xaxis'
show_indep_line: Whether to show the prediction under joint independence between the response variable and all the predictors (default: True)
figsize: Figure size as (width, height) (default: None)
save_path: Path to save the plot, e.g. 'plots/ccr_pred.pdf' (default: None)
dpi: Resolution for saving raster images such as PNG and JPG (default: 300)
title_fontsize: Font size for the plot title (optional)
xlabel_fontsize: Font size for the x-axis label (optional)
ylabel_fontsize: Font size for the y-axis label (optional)
tick_fontsize: Font size for axis tick labels (optional)
text_fontsize: Font size for text inside the plot (optional)
use_category_letters: Whether to label categories with letters instead of numbers (default: False)
**kwargs: Additional matplotlib arguments passed to the plotting functions
Outputs
None (Plot is displayed or saved to file as per user preferences and settings)
Warnings/Errors
ValueError: If the response variable axis is out of bounds for the array dimension
ValueError: If the predictors contain an axis that is out of bounds for the array dimension
- _calculate_conditional_pmf(target_axis, given_axes)
Internal helper function: Calculate the conditional probability mass function of the response variable given the predictors.
- _calculate_regression_batched(target_axis, given_axes, given_values)
Internal helper function: Vectorized regression calculation for multiple predictor variables.
- _calculate_scores(marginal_cdf)
Internal helper function: Calculate checkerboard scores from the marginal CDF.
- _lambda_function(u, ul, uj)
Internal helper function: Calculate the lambda function for checkerboard copula construction through bilinear interpolation (Wei and Kim, 2021).
- _get_predicted_category(regression_value, marginal_cdf)
Internal helper function: Get the predicted category based on the calculated regression value.
- _get_predicted_category_batched(regression_values, marginal_cdf)
Internal helper function: Get predicted categories for multiple calculated regression values.
- _calculate_sigma_sq_S(axis)
Internal helper function: Calculate the variance of the checkerboard copula score for the given axis.
- _calculate_sigma_sq_S_vectorized(axis)
Internal helper function: Calculate the variance of the checkerboard score using vectorized operations.
- _predict_category(source_category, predictors, response)
Internal helper function: Predict the category of the response variable given a combination of categories of the predictors.
- _predict_category_batched_multi(source_categories, predictors, response)
Internal helper function: Vectorized prediction with multiple predictor variables.
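The `_lambda_function(u, ul, uj)` helper listed above is internal and its arguments are not documented here, so the following is only a hedged reading of it. In a checkerboard copula each cell's probability is spread uniformly over a rectangle, so the share of a category's marginal interval [u_lo, u_hi] lying below u is a piecewise-linear weight; multiplying such weights along two axes gives the bilinear interpolation the docstring mentions. `lambda_sketch` and `checkerboard_copula_2d` are hypothetical names, and mapping `ul`/`uj` onto `u_lo`/`u_hi` is an assumption.

```python
import numpy as np


def lambda_sketch(u, u_lo, u_hi):
    """Hypothetical weight: 0 for u <= u_lo, 1 for u >= u_hi, linear in between."""
    u = np.asarray(u, dtype=float)
    if u_hi <= u_lo:
        # Zero-width interval (a category with no mass): fall back to a step.
        return np.where(u > u_lo, 1.0, 0.0)
    return np.clip((u - u_lo) / (u_hi - u_lo), 0.0, 1.0)


def checkerboard_copula_2d(P, u, v) -> float:
    """C+(u, v) = sum_ij p_ij * lambda_i(u) * lambda_j(v) for a 2D joint PMF."""
    P = np.asarray(P, dtype=float)
    Fu = np.concatenate(([0.0], np.cumsum(P.sum(axis=1))))  # row-variable CDF
    Fv = np.concatenate(([0.0], np.cumsum(P.sum(axis=0))))  # column-variable CDF
    lam_u = np.array([lambda_sketch(u, a, b) for a, b in zip(Fu[:-1], Fu[1:])])
    lam_v = np.array([lambda_sketch(v, a, b) for a, b in zip(Fv[:-1], Fv[1:])])
    # Bilinear in (u, v): each cell's mass is spread uniformly over its rectangle.
    return float(lam_u @ P @ lam_v)


indep = np.outer([0.4, 0.6], [0.3, 0.7])
print(checkerboard_copula_2d(indep, 0.3, 0.8))  # independence copula: u * v
```

For an independent table the sketch reproduces the independence copula uv, and for a perfectly associated diagonal table it reproduces min(u, v) at the grid points, which are useful sanity checks for any implementation of this construction.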