pusion.util.generator module

pusion.util.generator.generate_multiclass_ensemble_classification_outputs(classifiers, n_classes, n_samples, continuous_out=False, parallelize=True)

Generate random multiclass, crisp and redundant classification outputs (assignments) for the given ensemble of classifiers.

Parameters
  • classifiers – Classifiers used to generate classification outputs. Each must implement fit and predict methods compatible with the classifiers provided by sklearn.

  • n_classes – integer. Number of classes for which predictions are made.

  • n_samples – integer. Number of samples.

  • parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.

  • continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.

Returns

tuple of:
  • y_ensemble_valid – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  • y_valid – numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  • y_ensemble_test – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  • y_test – numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
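
To make the effect of continuous_out concrete, the following minimal sketch contrasts a crisp multiclass output with a continuous one. Plain nested lists stand in for numpy arrays so that it runs without dependencies, and is_crisp_multiclass is a hypothetical helper, not part of pusion.

```python
def is_crisp_multiclass(matrix):
    """Return True if every row is one-hot: only 0/1 entries and exactly one 1."""
    return all(
        all(value in (0, 1) for value in row) and sum(row) == 1
        for row in matrix
    )


# Three samples, three classes.
crisp = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]                          # continuous_out=False
continuous = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]   # continuous_out=True

print(is_crisp_multiclass(crisp))       # True
print(is_crisp_multiclass(continuous))  # False
```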

pusion.util.generator.generate_multiclass_cr_ensemble_classification_outputs(classifiers, n_classes, n_samples, coverage=None, continuous_out=False, parallelize=True)

Generate random multiclass, crisp and complementary-redundant classification outputs (assignments) for the given ensemble of classifiers.

Parameters
  • classifiers – Classifiers used to generate classification outputs. Each must implement fit and predict methods compatible with the classifiers provided by sklearn.

  • n_classes – integer. Number of classes for which predictions are made.

  • n_samples – integer. Number of samples.

  • coverage – list of list elements. Each inner list contains the classes (as integers) covered by one classifier; the classifier is identified by the positional index of its list. If unset, redundant classification outputs are generated.

  • continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.

  • parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.

Returns

tuple of:
  • y_ensemble_valid – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  • y_valid – numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  • y_ensemble_test – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  • y_test – numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
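
The coverage argument is easiest to read from an example. The sketch below builds a coverage for three classifiers and four classes, checks it, and lists which classifiers cover each class. check_coverage is a hypothetical helper; the requirement that every classifier covers at least one class is an assumption carried over from generate_classification_coverage, not a documented precondition of this function.

```python
def check_coverage(coverage, n_classes):
    """Raise ValueError unless each classifier covers at least one valid class index."""
    for classifier_index, classes in enumerate(coverage):
        if not classes:
            raise ValueError(f"classifier {classifier_index} covers no class")
        for class_index in classes:
            if not 0 <= class_index < n_classes:
                raise ValueError(
                    f"class index {class_index} is out of range "
                    f"for classifier {classifier_index}"
                )


# Inner list i holds the classes covered by classifier i.
coverage = [[0, 1], [1, 2, 3], [0, 3]]
check_coverage(coverage, n_classes=4)

# Invert the mapping: which classifiers cover each class?
covered_by = {
    c: [i for i, classes in enumerate(coverage) if c in classes]
    for c in range(4)
}
print(covered_by)  # {0: [0, 2], 1: [0, 1], 2: [1], 3: [1, 2]}
```

Classes 0, 1 and 3 are covered redundantly (by two classifiers each), while class 2 is covered by classifier 1 only.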

pusion.util.generator.generate_multilabel_ensemble_classification_outputs(classifiers, n_classes, n_samples, continuous_out=False, parallelize=True)

Generate random multilabel crisp classification outputs (assignments) for the given ensemble of classifiers with the normal class included at index 0.

Parameters
  • classifiers – Classifiers used to generate classification outputs. Each must implement fit and predict methods compatible with the classifiers provided by sklearn.

  • n_classes – integer. Number of classes for which predictions are made, including the normal class.

  • n_samples – integer. Number of samples.

  • continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.

  • parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.

Returns

tuple of:
  • y_ensemble_valid – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  • y_valid – numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  • y_ensemble_test – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  • y_test – numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.
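
The sketch below illustrates one plausible reading of "normal class included at index 0". It assumes that the normal class is assigned exactly when no other class is; the text above does not state this rule, so treat it as an assumption. prepend_normal_class is a hypothetical helper, and nested lists stand in for numpy arrays.

```python
def prepend_normal_class(rows):
    """Prepend a column at index 0 that is 1 only if the row assigns no other class.

    ASSUMPTION: the normal class and the remaining classes are mutually exclusive.
    """
    return [[0 if any(row) else 1] + list(row) for row in rows]


# Three samples, two non-normal classes.
rows = [[0, 0], [1, 0], [1, 1]]
with_normal = prepend_normal_class(rows)
print(with_normal)  # [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
```

With two non-normal classes, the resulting matrix has three columns, which matches an n_classes of 3 counted "with the normal class included".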

pusion.util.generator.generate_multilabel_cr_ensemble_classification_outputs(classifiers, n_classes, n_samples, coverage=None, continuous_out=False, parallelize=True)

Generate random multilabel, crisp and complementary-redundant classification outputs (assignments) for the given ensemble of classifiers with the normal class included at index 0.

Parameters
  • classifiers – Classifiers used to generate classification outputs. Each must implement fit and predict methods compatible with the classifiers provided by sklearn.

  • n_classes – integer. Number of classes for which predictions are made, including the normal class.

  • n_samples – integer. Number of samples.

  • coverage – list of list elements. Each inner list contains the classes (as integers) covered by one classifier; the classifier is identified by the positional index of its list. If unset, redundant classification outputs are generated.

  • continuous_out – If True, class assignments in y_ensemble_valid and y_ensemble_test are given as probabilities. Default value is False.

  • parallelize – If True, all classifiers are trained in parallel. Otherwise they are trained in sequence.

Returns

tuple of:
  • y_ensemble_valid – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the validation dataset.
  • y_valid – numpy.array of shape (n_samples, n_classes). True class assignments for the validation dataset.
  • y_ensemble_test – numpy.array of shape (n_samples, n_classes). Ensemble decision output matrix for the test dataset.
  • y_test – numpy.array of shape (n_samples, n_classes). True class assignments for the test dataset.

pusion.util.generator.generate_multiclass_confusion_matrices(decision_tensor, true_assignments)

Generate multiclass confusion matrices out of the given decision tensor and true assignments. Continuous outputs are converted to multiclass assignments using the MAX rule.

Parameters
  • decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of crisp decision outputs by different classifiers per sample.

  • true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of crisp class assignments which are considered true for calculating confusion matrices.

Returns

numpy.array of shape (n_classifiers, n_classes, n_classes). Confusion matrices per classifier.
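
A minimal sketch of the idea follows: the MAX rule turns each continuous row into a one-hot row, and one confusion matrix is then counted per classifier. Two points are assumptions rather than documented behaviour: ties are broken in favour of the lowest class index, and rows of the matrix index the true class while columns index the predicted class. The helper names are hypothetical, and nested lists stand in for numpy arrays.

```python
def max_rule(row):
    """Return a one-hot copy of row with the 1 at the first maximum."""
    winner = row.index(max(row))
    return [1 if i == winner else 0 for i in range(len(row))]


def confusion_matrix(decisions, truths, n_classes):
    """Count (true class, predicted class) pairs for a single classifier."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for decision, truth in zip(decisions, truths):
        matrix[truth.index(1)][max_rule(decision).index(1)] += 1
    return matrix


# Shape (n_classifiers=2, n_samples=3, n_classes=2).
decision_tensor = [
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],  # classifier 0
    [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]],  # classifier 1
]
true_assignments = [[1, 0], [0, 1], [0, 1]]

matrices = [confusion_matrix(d, true_assignments, 2) for d in decision_tensor]
print(matrices)  # [[[1, 0], [0, 2]], [[1, 0], [1, 1]]]
```

Each per-classifier matrix has one row and one column per class, so its size does not depend on the number of samples.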

pusion.util.generator.generate_multilabel_cr_confusion_matrices(decision_outputs, true_assignments, coverage)

Generate multilabel confusion matrices for complementary-redundant multilabel classification outputs.

Parameters
  • decision_outputs – numpy.array of shape (n_classifiers, n_samples, n_classes) or a list of numpy.array elements of shape (n_samples, n_classes’), where n_classes’ is classifier-specific due to the coverage.

  • true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of crisp class assignments which are considered true for calculating confusion matrices.

  • coverage – list of list elements. Each inner list contains the classes (as integers) covered by one classifier; the classifier is identified by the positional index of its list.

Returns

List of multilabel confusion matrices.
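
As a rough illustration, the sketch below counts one 2×2 matrix per covered class for each classifier. Column j of a classifier's output is compared with the column of true_assignments named by the j-th entry of that classifier's coverage. The [[TN, FP], [FN, TP]] layout is an assumption borrowed from sklearn's multilabel_confusion_matrix; the return layout of the pusion function is not specified above. multilabel_confusion is a hypothetical helper.

```python
def multilabel_confusion(decisions, truths, covered_classes):
    """Return one [[TN, FP], [FN, TP]] matrix per covered class (single classifier)."""
    matrices = []
    for column, class_index in enumerate(covered_classes):
        matrix = [[0, 0], [0, 0]]
        for decision, truth in zip(decisions, truths):
            # First index: true label (0/1); second index: predicted label (0/1).
            matrix[truth[class_index]][decision[column]] += 1
        matrices.append(matrix)
    return matrices


coverage = [[0, 1], [0, 2]]
true_assignments = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
decision_outputs = [
    [[1, 0], [0, 1], [0, 0]],  # classifier 0, columns are classes 0 and 1
    [[1, 0], [1, 0], [0, 1]],  # classifier 1, columns are classes 0 and 2
]

result = [
    multilabel_confusion(d, true_assignments, c)
    for d, c in zip(decision_outputs, coverage)
]
print(result[0])  # [[[2, 0], [0, 1]], [[1, 0], [1, 1]]]
print(result[1])  # [[[1, 1], [0, 1]], [[2, 0], [0, 1]]]
```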

pusion.util.generator.generate_classification_coverage(n_classifiers, n_classes, overlap, normal_class=True)

Generate random complementary-redundant class indices for each classifier 0..(n_classifiers-1). The coverage is drawn from a normal distribution for all classifiers. Regardless of the distribution, each classifier is guaranteed to cover at least one class.

Parameters
  • n_classifiers – Number of classifiers. Classifiers are indexed 0..(n_classifiers-1).

  • n_classes – Number of classes. Class labels are indexed 0..(n_classes-1).

  • overlap – Value between 0 and 1 indicating how strongly the classifiers overlap in the classes they cover. If 0, only complementary class indices are obtained. If 1, the coverage is fully redundant.

  • normal_class – If True, a class for the normal state is included for all classifiers as class index 0.

Returns

list of list elements. Each inner list contains classes as integers covered by a classifier, which is identified by the positional index of the respective list.
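
The two extremes of overlap can be written down deterministically, which may help when reading the randomly generated result. The sketch below is not the library's sampling procedure: both helpers are hypothetical, they ignore normal_class, and complementary_coverage only gives every classifier at least one class when n_classes >= n_classifiers.

```python
def complementary_coverage(n_classifiers, n_classes):
    """Overlap 0: deal the classes out round-robin so that no class is shared."""
    return [list(range(i, n_classes, n_classifiers)) for i in range(n_classifiers)]


def redundant_coverage(n_classifiers, n_classes):
    """Overlap 1: every classifier covers every class."""
    return [list(range(n_classes)) for _ in range(n_classifiers)]


complementary = complementary_coverage(3, 5)
redundant = redundant_coverage(2, 3)
print(complementary)  # [[0, 3], [1, 4], [2]]
print(redundant)      # [[0, 1, 2], [0, 1, 2]]
```

Values of overlap strictly between 0 and 1 yield coverages between these two cases, with some classes shared and others covered by a single classifier.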

pusion.util.generator.shrink_to_coverage(decision_tensor, coverage)

Shrink the given decision tensor to decision outputs according to the given coverage. Assumption: the normal class is covered by each classifier at index 0.

Parameters
  • decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of crisp multilabel decision outputs by different classifiers per sample.

  • coverage – list of list elements. Each inner list contains the classes (as integers) covered by one classifier; the classifier is identified by the positional index of its list.

Returns

list of numpy.array elements of shape (n_samples, n_classes’), where n_classes’ is classifier-specific due to the coverage.
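
Conceptually, shrinking keeps only the columns listed in each classifier's coverage. The following sketch shows that column selection on nested lists; shrink_columns is a hypothetical stand-in rather than the library implementation, and it assumes the coverage lists hold column indices into the full class axis.

```python
def shrink_columns(decision_tensor, coverage):
    """Keep, for each classifier, only the columns named in its coverage list."""
    return [
        [[row[class_index] for class_index in classes] for row in matrix]
        for matrix, classes in zip(decision_tensor, coverage)
    ]


# Shape (n_classifiers=2, n_samples=2, n_classes=3); class 0 is the normal class.
decision_tensor = [
    [[1, 0, 0], [0, 1, 1]],  # classifier 0
    [[0, 1, 0], [0, 0, 1]],  # classifier 1
]
coverage = [[0, 1], [0, 2]]

shrunk = shrink_columns(decision_tensor, coverage)
print(shrunk)  # [[[1, 0], [0, 1]], [[0, 0], [0, 1]]]
```

The result is a list rather than a tensor because the number of remaining columns may differ between classifiers.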

pusion.util.generator.split_into_train_and_validation_data(decision_tensor, true_assignments, validation_size=0.5)

Split the decision outputs (tensor) from multiple classifiers as well as the true assignments randomly into train and validation datasets.

Parameters
  • decision_tensor – numpy.array of shape (n_classifiers, n_samples, n_classes). Tensor of decision outputs by different classifiers per sample.

  • true_assignments – numpy.array of shape (n_samples, n_classes). Matrix of true class assignments.

  • validation_size – Proportion between 0 and 1 for the size of the validation data set.

Returns

tuple of:
  • (1) numpy.array of shape (n_classifiers, n_samples’, n_classes). Decision outputs for training.
  • (2) numpy.array of shape (n_samples’, n_classes). True assignments for training.
  • (3) numpy.array of shape (n_classifiers, n_samples’’, n_classes). Decision outputs for validation.
  • (4) numpy.array of shape (n_samples’’, n_classes). True assignments for validation.
Here n_samples’ is the number of training samples and n_samples’’ is the number of validation samples.
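
The essential point of the split is that the same sample indices are applied to every classifier's outputs and to the true assignments, so rows stay aligned. The sketch below shows this with the standard library only. split_by_samples and its seed parameter are hypothetical, and rounding the validation count down with int is an assumption.

```python
import random


def split_by_samples(decision_tensor, true_assignments, validation_size=0.5, seed=None):
    """Randomly split along the sample axis, using one index set for all arrays."""
    n_samples = len(true_assignments)
    n_validation = int(n_samples * validation_size)  # assumed rounding: floor
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    validation_idx = sorted(indices[:n_validation])
    train_idx = sorted(indices[n_validation:])

    def pick(rows, idx):
        return [rows[i] for i in idx]

    return (
        [pick(matrix, train_idx) for matrix in decision_tensor],
        pick(true_assignments, train_idx),
        [pick(matrix, validation_idx) for matrix in decision_tensor],
        pick(true_assignments, validation_idx),
    )


true_assignments = [[1, 0], [0, 1], [1, 0], [0, 1]]
decision_tensor = [
    [[1, 0], [0, 1], [1, 0], [0, 1]],  # classifier 0 (always correct)
    [[0, 1], [0, 1], [1, 0], [1, 0]],  # classifier 1
]

train_x, train_y, valid_x, valid_y = split_by_samples(
    decision_tensor, true_assignments, validation_size=0.25, seed=0
)
print(len(train_y), len(valid_y))  # 3 1
```

Because classifier 0 in this example always agrees with the truth, train_x[0] equals train_y after the split, which is a quick way to confirm that the rows stayed aligned.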