pusion.util.generator module
- pusion.util.generator.generate_multiclass_ensemble_classification_outputs(classifiers, n_classes, n_samples, continuous_out=False, parallelize=True)
Generate random multiclass, crisp and redundant classification outputs (assignments) for the given ensemble of classifiers.
- Parameters
classifiers – Classifiers used to generate classification outputs. These need to implement fit and predict methods according to classifiers provided by sklearn.
n_classes – integer. Number of classes for which predictions are made.
n_samples – integer. Number of samples.
parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.
continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.
- Returns
tuple of:
  - y_ensemble_valid: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  - y_valid: numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  - y_ensemble_test: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  - y_test: numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
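The difference between crisp and continuous outputs (the `continuous_out` flag) can be illustrated with a small NumPy sketch. It does not call pusion. The helper `to_crisp_multiclass` and the sample values are hypothetical and only show what a crisp multiclass assignment matrix of shape (n_samples, n_classes) looks like compared with a probability matrix of the same shape.

```python
import numpy as np

def to_crisp_multiclass(y_continuous):
    """Convert per-sample class probabilities to one-hot (crisp) assignments."""
    y_continuous = np.asarray(y_continuous, dtype=float)
    winners = np.argmax(y_continuous, axis=1)        # most probable class per sample
    y_crisp = np.zeros(y_continuous.shape, dtype=int)
    y_crisp[np.arange(len(winners)), winners] = 1    # exactly one 1 per row
    return y_crisp

# Made-up continuous output for 3 samples and 3 classes.
y_continuous = np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.3, 0.6],
                         [0.2, 0.5, 0.3]])
y_crisp = to_crisp_multiclass(y_continuous)
print(y_crisp)
```

Each row of the crisp matrix contains a single 1, whereas the continuous matrix holds one score per class.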
- pusion.util.generator.generate_multiclass_cr_ensemble_classification_outputs(classifiers, n_classes, n_samples, coverage=None, continuous_out=False, parallelize=True)
Generate random multiclass, crisp and complementary-redundant classification outputs (assignments) for the given ensemble of classifiers.
- Parameters
classifiers – Classifiers used to generate classification outputs. These need to implement fit and predict methods according to classifiers provided by sklearn.
n_classes – integer. Number of classes for which predictions are made.
n_samples – integer. Number of samples.
coverage – list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list. If unset, redundant classification outputs are retrieved.
continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.
parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.
- Returns
tuple of:
  - y_ensemble_valid: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  - y_valid: numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  - y_ensemble_test: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  - y_test: numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
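The structure of the `coverage` argument can be made concrete with a short pure-Python sketch. The helper `describe_coverage` and the example coverage are hypothetical. They only illustrate how the positional index of each inner list identifies a classifier, and which classes end up covered redundantly (by more than one classifier).

```python
def describe_coverage(coverage, n_classes):
    """Return, for each class, the indices of the classifiers that cover it."""
    covered_by = {c: [] for c in range(n_classes)}
    for classifier_index, classes in enumerate(coverage):
        for c in classes:
            covered_by[c].append(classifier_index)
    return covered_by

# Hypothetical coverage for 3 classifiers and 5 classes.
coverage = [[0, 1, 2], [2, 3], [3, 4]]
covered_by = describe_coverage(coverage, n_classes=5)

# Classes seen by more than one classifier are the redundant part.
redundant = [c for c, clfs in covered_by.items() if len(clfs) > 1]
# Classes seen by no classifier at all would indicate an incomplete coverage.
uncovered = [c for c, clfs in covered_by.items() if not clfs]
print(covered_by, redundant, uncovered)
```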
- pusion.util.generator.generate_multilabel_ensemble_classification_outputs(classifiers, n_classes, n_samples, continuous_out=False, parallelize=True)
Generate random multilabel crisp classification outputs (assignments) for the given ensemble of classifiers with the normal class included at index 0.
- Parameters
classifiers – Classifiers used to generate classification outputs. These need to implement fit and predict methods according to classifiers provided by sklearn.
n_classes – integer. Number of classes for which predictions are made, including the normal class.
n_samples – integer. Number of samples.
continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.
parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.
- Returns
tuple of:
  - y_ensemble_valid: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  - y_valid: numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  - y_ensemble_test: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  - y_test: numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
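To show how a multilabel assignment matrix differs from a multiclass one, the following NumPy sketch inspects a made-up crisp matrix whose column 0 plays the role of the normal class. The values are illustrative only and are not produced by pusion.

```python
import numpy as np

# Made-up crisp assignments for 4 samples and 4 classes; column 0 is the normal class.
y_multilabel = np.array([[1, 0, 0, 0],   # normal class only
                         [0, 1, 1, 0],   # two classes assigned at once
                         [0, 0, 0, 1],
                         [0, 1, 0, 1]])

labels_per_sample = y_multilabel.sum(axis=1)
# A multiclass matrix would have exactly one label per row; this one does not.
is_multiclass = bool(np.all(labels_per_sample == 1))
# Indices of samples assigned to the normal class (column 0).
normal_samples = np.flatnonzero(y_multilabel[:, 0] == 1)
print(labels_per_sample, is_multiclass, normal_samples)
```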
- pusion.util.generator.generate_multilabel_cr_ensemble_classification_outputs(classifiers, n_classes, n_samples, coverage=None, continuous_out=False, parallelize=True)
Generate random multilabel, crisp and complementary-redundant classification outputs (assignments) for the given ensemble of classifiers with the normal class included at index 0.
- Parameters
classifiers – Classifiers used to generate classification outputs. These need to implement fit and predict methods according to classifiers provided by sklearn.
n_classes – integer. Number of classes for which predictions are made, including the normal class.
n_samples – integer. Number of samples.
coverage – list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list. If unset, redundant classification outputs are retrieved.
continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.
parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.
- Returns
tuple of:
  - y_ensemble_valid: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  - y_valid: numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  - y_ensemble_test: numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  - y_test: numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
- pusion.util.generator.generate_multiclass_confusion_matrices(decision_tensor, true_assignments)
Generate multiclass confusion matrices out of the given decision tensor and true assignments. Continuous outputs are converted to multiclass assignments using the MAX rule.
- Parameters
decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of crisp decision outputs by different classifiers per sample.
true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of crisp class assignments which are considered true for calculating confusion matrices.
- Returns
numpy.array of shape (n_classifiers, n_classes, n_classes). Confusion matrices per classifier.
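The MAX rule and the per-classifier confusion matrices can be sketched with NumPy alone. This is an illustration under stated assumptions rather than pusion's implementation: the function name `multiclass_confusion_matrices` is hypothetical, and the sketch places true classes on rows and predicted classes on columns, giving one n_classes × n_classes matrix per classifier.

```python
import numpy as np

def multiclass_confusion_matrices(decision_tensor, true_assignments):
    """Build one confusion matrix per classifier (rows: true, columns: predicted)."""
    decision_tensor = np.asarray(decision_tensor)
    true_assignments = np.asarray(true_assignments)
    n_classifiers, _, n_classes = decision_tensor.shape
    true_labels = np.argmax(true_assignments, axis=1)
    matrices = np.zeros((n_classifiers, n_classes, n_classes), dtype=int)
    for i in range(n_classifiers):
        # MAX rule: the class with the highest score becomes the crisp decision.
        predicted = np.argmax(decision_tensor[i], axis=1)
        # Count each (true, predicted) pair, including repeated pairs.
        np.add.at(matrices[i], (true_labels, predicted), 1)
    return matrices

# Made-up data: 2 classifiers, 3 samples, 2 classes.
true_assignments = np.array([[1, 0], [0, 1], [0, 1]])
decision_tensor = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[0.4, 0.6], [0.3, 0.7], [0.1, 0.9]],
])
cms = multiclass_confusion_matrices(decision_tensor, true_assignments)
print(cms)
```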
- pusion.util.generator.generate_multilabel_cr_confusion_matrices(decision_outputs, true_assignments, coverage)
Generate multilabel confusion matrices for complementary-redundant multilabel classification outputs.
- Parameters
decision_outputs – numpy.array of shape (n_classifiers, n_samples, n_classes) or a list of numpy.array elements of shape (n_samples, n_classes’), where n_classes’ is classifier-specific due to the coverage.
true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of crisp class assignments which are considered true for calculating confusion matrices.
coverage – list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list.
- Returns
List of multilabel confusion matrices.
- pusion.util.generator.generate_classification_coverage(n_classifiers, n_classes, overlap, normal_class=True)
Generate random complementary-redundant class indices for each classifier 0..(n_classifiers-1). The coverage is drawn from a normal distribution for all classifiers. However, it is guaranteed that each classifier covers at least one class regardless of the distribution.
- Parameters
n_classifiers – Number of classifiers, indexed 0..(n_classifiers-1).
n_classes – Number of classes, with class labels 0..(n_classes-1).
overlap – Indicator between 0 and 1 for overall classifier overlapping in terms of classes. If 0, only complementary class indices are obtained. If 1, the overlapping is fully redundant.
normal_class – If True, a class for the normal state is included for all classifiers as class index 0.
- Returns
list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list.
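The two limiting cases of `overlap` can be sketched deterministically. The helpers below are hypothetical and are not how pusion generates coverage, since the library draws it randomly. They only show what a purely complementary coverage (overlap 0) and a fully redundant coverage (overlap 1) look like as a list of lists, ignoring the `normal_class` option and assuming n_classifiers <= n_classes.

```python
def complementary_coverage(n_classifiers, n_classes):
    """Partition the classes round-robin so that no class is shared."""
    return [list(range(i, n_classes, n_classifiers)) for i in range(n_classifiers)]

def redundant_coverage(n_classifiers, n_classes):
    """Let every classifier cover every class."""
    return [list(range(n_classes)) for _ in range(n_classifiers)]

complementary = complementary_coverage(n_classifiers=2, n_classes=5)
redundant = redundant_coverage(n_classifiers=2, n_classes=5)
print(complementary)
print(redundant)
```

Values of `overlap` strictly between 0 and 1 would fall between these two extremes, with some classes shared and others not.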
- pusion.util.generator.shrink_to_coverage(decision_tensor, coverage)
Shrink the given decision tensor to decision outputs according to the given coverage. Assumption: the normal class is covered by each classifier at index 0.
- Parameters
decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of crisp multilabel decision outputs by different classifiers per sample.
coverage – list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list.
- Returns
list of numpy.array elements of shape (n_samples, n_classes’), where n_classes’ is classifier-specific due to the coverage.
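Shrinking a decision tensor to a coverage amounts to selecting, for each classifier, only the columns of the classes it covers. The NumPy sketch below illustrates that idea. The function `shrink` is hypothetical, and details such as column ordering in the library's own implementation are an assumption here.

```python
import numpy as np

def shrink(decision_tensor, coverage):
    """Keep, per classifier, only the class columns listed in its coverage."""
    decision_tensor = np.asarray(decision_tensor)
    return [decision_tensor[i][:, classes] for i, classes in enumerate(coverage)]

# Made-up crisp multilabel outputs: 2 classifiers, 2 samples, 4 classes.
decision_tensor = np.array([
    [[1, 0, 0, 0], [0, 1, 1, 0]],
    [[1, 0, 0, 0], [0, 0, 1, 1]],
])
# Both classifiers cover the normal class 0, in line with the stated assumption.
coverage = [[0, 1, 2], [0, 3]]
shrunk = shrink(decision_tensor, coverage)
print([s.shape for s in shrunk])
```

The result is a list rather than a single array because each classifier may keep a different number of columns.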
- pusion.util.generator.split_into_train_and_validation_data(decision_tensor, true_assignments, validation_size=0.5)
Split the decision outputs (tensor) from multiple classifiers as well as the true assignments randomly into train and validation datasets.
- Parameters
decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of decision outputs by different classifiers per sample.
true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of true class assignments.
validation_size – Proportion between 0 and 1 for the size of the validation data set.
- Returns
tuple of (1) training decision tensor: numpy.array of shape (n_classifiers, n_samples’, n_classes), (2) training true assignments: numpy.array of shape (n_samples’, n_classes), (3) validation decision tensor: numpy.array of shape (n_classifiers, n_samples’’, n_classes), (4) validation true assignments: numpy.array of shape (n_samples’’, n_classes), with n_samples’ as the number of training samples and n_samples’’ as the number of validation samples.
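A random split along the sample axis can be sketched as follows. This is a minimal illustration, not pusion's implementation: the function name `split_train_validation` and the `seed` parameter are hypothetical and added only to make the example reproducible. The same shuffled indices are applied to the sample axis of both the decision tensor and the true assignments, so each sample stays aligned with its label.

```python
import numpy as np

def split_train_validation(decision_tensor, true_assignments, validation_size=0.5, seed=0):
    """Randomly split samples into a training part and a validation part."""
    decision_tensor = np.asarray(decision_tensor)
    true_assignments = np.asarray(true_assignments)
    n_samples = true_assignments.shape[0]
    rng = np.random.default_rng(seed)          # seed is an assumption of this sketch
    permutation = rng.permutation(n_samples)   # shuffled sample indices
    n_validation = int(round(n_samples * validation_size))
    validation_idx = permutation[:n_validation]
    train_idx = permutation[n_validation:]
    # Axis 1 of the tensor and axis 0 of the assignments both index samples.
    return (decision_tensor[:, train_idx], true_assignments[train_idx],
            decision_tensor[:, validation_idx], true_assignments[validation_idx])

# Made-up inputs: 3 classifiers, 10 samples, 4 classes.
decision_tensor = np.zeros((3, 10, 4))
true_assignments = np.zeros((10, 4))
dt_train, y_train, dt_valid, y_valid = split_train_validation(
    decision_tensor, true_assignments, validation_size=0.3)
print(dt_train.shape, y_train.shape, dt_valid.shape, y_valid.shape)
```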