Generic CCRVAM API

ccrvam.checkerboard.genccrvam

Module Contents

Classes

GenericCCRVAM

Central Generic Checkerboard Copula Regression, Visualization and Association Measure (CCRVAM) class object.

API

class ccrvam.checkerboard.genccrvam.GenericCCRVAM(P: numpy.ndarray)

Central Generic Checkerboard Copula Regression, Visualization and Association Measure (CCRVAM) class object.

Initialization

Initializes the object with the joint probability matrix P used for statistical analysis.

classmethod from_contingency_table(contingency_table: numpy.ndarray) → ccrvam.checkerboard.genccrvam.GenericCCRVAM

Create a CCRVAM object instance from a multi-dimensional contingency table.

Input Arguments

  • contingency_table : A contingency table of frequency counts (multi-dimensional numpy array)

Outputs

A new GenericCCRVAM object instance initialized with the probability matrix, which will allow for further statistical analysis of the data.

Warnings/Errors

  • ValueError : If the input table contains negative values or all zeros
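As an illustration of the step described above, the following is a minimal sketch of how a table of counts can be turned into a joint probability matrix, with the same two failure cases. It is not the library's implementation: the helper name `to_probability_matrix` is hypothetical, and it works on a two-way table given as plain nested lists rather than a numpy array.

```python
def to_probability_matrix(table):
    """Normalize a two-way table of counts (list of rows) into joint probabilities."""
    flat = [count for row in table for count in row]
    if any(count < 0 for count in flat):
        raise ValueError("contingency table contains negative values")
    total = sum(flat)
    if total == 0:
        raise ValueError("contingency table contains only zeros")
    # Each cell becomes its share of the grand total.
    return [[count / total for count in row] for row in table]


counts = [[10, 0], [5, 5]]
P = to_probability_matrix(counts)
print(P)  # [[0.5, 0.0], [0.25, 0.25]]
```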

classmethod from_cases(cases: numpy.ndarray, dimension: tuple) → ccrvam.checkerboard.genccrvam.GenericCCRVAM

Create a CCRVAM object instance from case-form data.

Input Arguments

  • cases : A 2D array in which each row is one case, that is, the observed category of every categorical variable for one observation. Each column corresponds to a different variable, and the values in each column are the category indices for that variable (1-indexed).

  • dimension : A tuple specifying the number of categories for each variable in the same order as the columns in cases. For example, if cases has 3 columns representing variables A, B, and C with 2, 3, and 4 categories respectively, then dimension should be (2,3,4).

Outputs

A new GenericCCRVAM object instance initialized with the probability matrix, which will allow for further statistical analysis of the data.

Warnings/Errors

  • ValueError : If the input cases are not 2-dimensional or if the dimension tuple does not match the number of variables and their categories in the cases data.

Example

    import numpy as np
    from ccrvam.checkerboard.genccrvam import GenericCCRVAM

    cases = np.array([[1,1,1], [1,2,1], [2,1,2]])  # 3 cases with 3 variables
    dimension = (2,2,2)  # Each variable has 2 categories
    ccrvam = GenericCCRVAM.from_cases(cases, dimension)
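To make the relation between case-form data and a contingency table concrete, here is a small sketch that tallies 1-indexed cases into cell counts. It only illustrates the idea and is not how the library does it: `cases_to_counts` is a hypothetical helper, and the counts are stored in a `collections.Counter` keyed by category tuples instead of a numpy array.

```python
from collections import Counter


def cases_to_counts(cases, dimension):
    """Tally 1-indexed cases into cell counts keyed by category tuple."""
    for case in cases:
        if len(case) != len(dimension):
            raise ValueError("each case needs one value per variable")
        for value, n_categories in zip(case, dimension):
            if not 1 <= value <= n_categories:
                raise ValueError("category index outside the declared dimension")
    return Counter(tuple(case) for case in cases)


cases = [[1, 1, 1], [1, 2, 1], [2, 1, 2]]
counts = cases_to_counts(cases, (2, 2, 2))
print(counts[(1, 2, 1)])  # 1
print(counts[(2, 2, 2)])  # 0 (a Counter returns 0 for unobserved cells)
```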

calculate_CCRAM(predictors: Union[int, list], response: int, scaled: bool = False) → float

Calculate the Checkerboard Copula Regression Association Measure (CCRAM) of the response variable on one or more conditioning (predictor) axes.

Input Arguments

  • predictors : A single 1-indexed predictor axis, or a list of 1-indexed predictor axes, for regression association

  • response : 1-indexed target response variable axis for regression association

  • scaled : Whether to return the scaled (normalized) CCRAM instead of the unscaled CCRAM (default: False)

Outputs

(Scaled) CCRAM value for the given predictors and response variable

Warnings/Errors

  • ValueError : If the response variable axis is out of bounds for the array dimension

  • ValueError : If the predictors contain an axis that is out of bounds for the array dimension
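For intuition about what the measure quantifies, the sketch below computes a CCRAM-style value for a two-way joint probability matrix, with rows as the single predictor and columns as the response. It rests on assumptions taken from the usual definition in Wei and Kim (2021) rather than from this library's code: the checkerboard score of a category is the midpoint of its marginal CDF interval, the unscaled measure is 12 times the variance of the conditional mean score, and the scaled measure divides that variance by the variance of the score itself. The names `checkerboard_scores` and `ccram_2d` are hypothetical.

```python
def checkerboard_scores(marginal_pmf):
    """Midpoint of each category's interval on the marginal CDF scale."""
    scores, lower = [], 0.0
    for p in marginal_pmf:
        scores.append(lower + p / 2)
        lower += p
    return scores


def ccram_2d(P, scaled=False):
    """CCRAM-style measure of the column variable on the row variable."""
    row_marginal = [sum(row) for row in P]
    col_marginal = [sum(col) for col in zip(*P)]
    scores = checkerboard_scores(col_marginal)
    explained = 0.0  # variance of the conditional mean score around 0.5
    for row, p_x in zip(P, row_marginal):
        if p_x == 0:
            continue  # empty predictor category contributes nothing
        regression = sum(s * p / p_x for s, p in zip(scores, row))
        explained += p_x * (regression - 0.5) ** 2
    if not scaled:
        return 12 * explained
    # Assumes the response has at least two categories with positive mass.
    total = sum(p * (s - 0.5) ** 2 for p, s in zip(col_marginal, scores))
    return explained / total


perfect = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(ccram_2d(perfect))               # 0.75
print(ccram_2d(perfect, scaled=True))  # 1.0
print(ccram_2d(independent))           # 0.0
```

Under these assumptions the unscaled value cannot reach 1 for a response with few categories, which is the motivation for a scaled variant.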

get_predictions_ccr(predictors: list, response: int, variable_names: Union[dict, None] = None) → pandas.DataFrame

Get predictions of response variable categories conditioned on multiple predictor variables.

Input Arguments

  • predictors : List of 1-indexed predictor axes for category prediction

  • response : 1-indexed target response variable axis for category prediction

  • variable_names : Dictionary mapping 1-indexed variable indices to names (default: None)

Outputs

DataFrame containing the predicted category of the response variable for each combination of categories of the predictors

Notes

The DataFrame contains columns for each combination of categories of the predictors and the corresponding predicted category of the response variable. The categories are 1-indexed. Combinations with zero counts in the contingency table will have NA as the predicted category.

Warnings/Errors

  • ValueError : If the response variable axis is out of bounds for the array dimension

  • ValueError : If the predictors contain an axis that is out of bounds for the array dimension
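The sketch below shows, for a two-way joint probability matrix with one predictor (rows) and one response (columns), how a predicted category per predictor category could be derived, including the missing prediction for an empty predictor category. It is an assumption-laden illustration, not the library's code: it assumes the regression value is the conditional mean of the checkerboard scores and that the predicted category is the first one whose marginal CDF value is at least that regression value. `predict_categories_2d` is a hypothetical name, and `None` stands in for NA.

```python
def predict_categories_2d(P):
    """Predicted 1-indexed response category for each predictor category."""
    col_marginal = [sum(col) for col in zip(*P)]
    cdf, running = [], 0.0
    for p in col_marginal:
        running += p
        cdf.append(running)
    # Score of a category: midpoint of its interval on the CDF scale.
    scores = [upper - p / 2 for upper, p in zip(cdf, col_marginal)]

    predictions = []
    for row in P:
        p_x = sum(row)
        if p_x == 0:
            predictions.append(None)  # no observations for this predictor category
            continue
        regression = sum(s * p / p_x for s, p in zip(scores, row))
        category = next(
            (i for i, upper in enumerate(cdf, start=1) if regression <= upper),
            len(cdf),  # fallback guards against floating-point round-off
        )
        predictions.append(category)
    return predictions


P = [[0.3, 0.1, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.1, 0.5]]
predictions = predict_categories_2d(P)
print(predictions)  # [1, None, 3]
```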

get_prediction_under_indep(response: int) → int

Calculate the predicted category under joint independence between the response variable and the predictors.

The CCR value equals 0.5 under the assumption of joint independence between the response variable and all predictor variables.

Input Arguments

  • response : 1-indexed target response variable axis

Outputs

The predicted category (1-indexed) for the response variable under joint independence

Notes

This prediction serves as an important reference point when interpreting CCR prediction results, as it represents what would be predicted if there were no association between the predictors and the response variable.

Warnings/Errors

  • ValueError : If the response variable axis is out of bounds for the array dimension
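A short sketch of the idea follows: with the regression value fixed at 0.5, the prediction depends only on the marginal distribution of the response. The mapping rule used here (pick the first category whose cumulative probability reaches the regression value) is an assumption about how a regression value is converted to a category, and `prediction_under_independence` is a hypothetical name, not part of the library.

```python
def prediction_under_independence(marginal_pmf, regression_value=0.5):
    """1-indexed category whose marginal CDF interval contains the regression value."""
    running = 0.0
    for category, p in enumerate(marginal_pmf, start=1):
        running += p
        if regression_value <= running:
            return category
    return len(marginal_pmf)  # fallback for floating-point round-off


print(prediction_under_independence([0.2, 0.5, 0.3]))  # 2
print(prediction_under_independence([0.6, 0.4]))       # 1
```

In other words, under this rule the reference prediction is the category containing the median of the response's marginal distribution.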

calculate_ccs(var_index: int) → numpy.ndarray

Calculate checkerboard scores for the specified variable index.

Input Arguments

  • var_index : 1-indexed axis of the variable for which to calculate scores

Outputs

Array containing checkerboard scores for the given variable index

Warnings/Errors

  • ValueError : If the axis is out of bounds for the array dimension
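As a worked illustration, the sketch below computes scores from a marginal probability vector under the assumption that the checkerboard score of a category is the midpoint of its interval on the marginal CDF scale, as in Wei and Kim (2021). The function name `checkerboard_scores` is hypothetical, and the code uses plain lists rather than the numpy array the method returns.

```python
def checkerboard_scores(marginal_pmf):
    """Midpoint of [F(i-1), F(i)] for each category i of the marginal CDF F."""
    scores, lower = [], 0.0
    for p in marginal_pmf:
        upper = lower + p
        scores.append((lower + upper) / 2)
        lower = upper
    return scores


scores = checkerboard_scores([0.25, 0.5, 0.25])
print(scores)  # [0.125, 0.5, 0.875]
```

With this definition the probability-weighted mean of the scores is always 0.5, whatever the marginal distribution.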

calculate_variance_ccs(var_index: int) → float

Calculate the variance of the checkerboard score for the specified variable index.

Input Arguments

  • var_index : 1-indexed axis of the variable for which to calculate the variance of the checkerboard score

Outputs

  • float : Variance of the checkerboard score for the given variable index

Warnings/Errors

  • ValueError : If the variable index is out of bounds for the array dimension
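The following sketch shows one way such a variance can be computed from a marginal probability vector. It assumes midpoint scores on the marginal CDF scale, whose mean is 0.5, so the variance is the probability-weighted squared deviation from 0.5. `variance_of_scores` is a hypothetical helper, not the library's function.

```python
def variance_of_scores(marginal_pmf):
    """Probability-weighted variance of midpoint scores around their mean of 0.5."""
    variance, lower = 0.0, 0.0
    for p in marginal_pmf:
        score = lower + p / 2  # midpoint of the category's CDF interval
        variance += p * (score - 0.5) ** 2
        lower += p
    return variance


print(variance_of_scores([0.5, 0.5]))  # 0.0625
print(variance_of_scores([1.0]))       # 0.0
```

Under this definition the value stays below 1/12, the variance of a continuous uniform variable on [0, 1], and approaches it as the number of equally likely categories grows.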

plot_ccr_predictions(predictors: list, response: int, variable_names: Union[dict, None] = None, legend_style: str = 'side', show_indep_line: bool = True, figsize: Union[tuple, None] = None, save_path: Union[str, None] = None, dpi: int = 300, title_fontsize: Union[int, None] = None, xlabel_fontsize: Union[int, None] = None, ylabel_fontsize: Union[int, None] = None, tick_fontsize: Union[int, None] = None, text_fontsize: Union[int, None] = None, use_category_letters: bool = False, **kwargs) → None

Plot CCR predictions as a visualization.

Input Arguments

  • predictors : List of 1-indexed predictor axes

  • response : 1-indexed response variable axis

  • variable_names : Dictionary mapping indices to variable names (default: None)

  • legend_style : How to display combinations of categories of predictors: ‘side’ (default) or ‘xaxis’

  • show_indep_line : Whether to show the prediction under joint independence between the response variable and all the predictors (default: True)

  • figsize : Figure size (width, height)

  • save_path : Path to save the plot (e.g. ‘plots/ccr_pred.pdf’)

  • dpi : Resolution for saving raster images (png, jpg)

  • title_fontsize : Font size for the plot title (optional)

  • xlabel_fontsize : Font size for x-axis label (optional)

  • ylabel_fontsize : Font size for y-axis label (optional)

  • tick_fontsize : Font size for axis tick labels (optional)

  • text_fontsize : Font size for text inside the plot (optional)

  • use_category_letters : Whether to use letters for categories instead of numbers (optional)

  • **kwargs : Additional matplotlib arguments passed to plotting functions

Outputs

None (Plot is displayed or saved to file as per user preferences and settings)

Warnings/Errors

  • ValueError : If the response variable axis is out of bounds for the array dimension

  • ValueError : If the predictors contain an axis that is out of bounds for the array dimension

_calculate_conditional_pmf(target_axis, given_axes)

Internal helper function: Calculate conditional probability mass function of the response variable given the predictors.

_calculate_regression_batched(target_axis, given_axes, given_values)

Internal helper function: Vectorized regression calculation for multiple predictor variables.

_calculate_scores(marginal_cdf)

Internal helper function: Calculate checkerboard scores from marginal CDF.

_lambda_function(u, ul, uj)

Internal helper function: Calculate lambda function for checkerboard copula construction through bilinear interpolation. (Wei and Kim, 2021)
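For readers unfamiliar with the construction, the sketch below shows the kind of piecewise-linear weight commonly used to build a checkerboard copula by linear interpolation between marginal CDF grid points. It is an assumption about what this helper computes, based on the usual formulation of the checkerboard copula, and not a copy of the library's code; the name `lambda_weight` is hypothetical, with `ul` and `uj` taken to be the lower and upper grid points of one category.

```python
def lambda_weight(u, ul, uj):
    """Fraction of the interval (ul, uj] that lies below u, clamped to [0, 1]."""
    if u <= ul:
        return 0.0
    if u >= uj:
        return 1.0
    return (u - ul) / (uj - ul)


print(lambda_weight(0.1, 0.25, 0.75))  # 0.0
print(lambda_weight(0.5, 0.25, 0.75))  # 0.5
print(lambda_weight(0.9, 0.25, 0.75))  # 1.0
```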

_get_predicted_category(regression_value, marginal_cdf)

Internal helper function: Get predicted category based on the calculated regression value.

_get_predicted_category_batched(regression_values, marginal_cdf)

Internal helper function: Get predicted categories for multiple calculated regression values.

_calculate_sigma_sq_S(axis)

Internal helper function: Calculate variance of the checkerboard copula score for given axis.

_calculate_sigma_sq_S_vectorized(axis)

Internal helper function: Calculate variance of the checkerboard score using vectorized operations.

_predict_category(source_category, predictors, response)

Internal helper function: Predict the category of the response variable given a combination of categories of the predictors.

_predict_category_batched_multi(source_categories, predictors, response)

Internal helper function: Vectorized prediction with multiple predictor variables.