Evaluation

Functions for evaluating clustering quality, distance metrics and machine-learning models on the data.

Auxiliary Functions


plot_metric_heatmaps

 plot_metric_heatmaps (results:dict, distance_metrics:list,
                       clustering_algorithms:list,
                       evaluation_metrics:list)

Plots one heatmap per evaluation metric, showing the score for every combination of distance metric and clustering algorithm.

Type Details
results dict Dictionary containing results for each combination of distance metric and clustering algorithm
distance_metrics list List of distance metric names
clustering_algorithms list List of clustering algorithm names
evaluation_metrics list List of evaluation metric names
Returns None
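The heatmaps need one 2D grid of scores per evaluation metric. One way to assemble those grids with NumPy is sketched below. It assumes (hypothetically) that `results` is keyed by `(distance_metric, clustering_algorithm)` tuples, and that each value maps evaluation-metric names to scores; the real layout of `results` is not documented here. `build_score_grids` is a hypothetical helper, and each returned grid could then be drawn with a call such as `matplotlib.pyplot.imshow`.

```python
import numpy as np

def build_score_grids(results, distance_metrics, clustering_algorithms, evaluation_metrics):
    """Hypothetical helper: one [n_distance_metrics, n_algorithms] grid per evaluation metric.

    Assumes results[(distance_metric, algorithm)][evaluation_metric] -> float.
    Missing combinations stay NaN so they appear as gaps in a heatmap.
    """
    grids = {}
    for metric in evaluation_metrics:
        grid = np.full((len(distance_metrics), len(clustering_algorithms)), np.nan)
        for i, dist in enumerate(distance_metrics):
            for j, algo in enumerate(clustering_algorithms):
                scores = results.get((dist, algo))
                if scores is not None and metric in scores:
                    grid[i, j] = scores[metric]
        grids[metric] = grid
    return grids

results = {
    ("euclidean", "kmeans"): {"ari": 0.8},
    ("dtw", "kmeans"): {"ari": 0.6},
}
grids = build_score_grids(results, ["euclidean", "dtw"], ["kmeans", "dbscan"], ["ari"])
```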

find_non_matching_elements

 find_non_matching_elements (main_array:numpy.ndarray,
                             check_array:numpy.ndarray)

Finds elements in check_array that are not present in main_array.

Type Details
main_array ndarray The main array (the larger set of elements)
check_array ndarray The array with elements to check against the main array
Returns ndarray
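Here is a minimal NumPy sketch of this behavior, not necessarily how the library implements it. `np.isin` marks the entries of `check_array` that occur in `main_array`, and the negated mask keeps the rest. `find_non_matching_sketch` is a hypothetical name.

```python
import numpy as np

def find_non_matching_sketch(main_array, check_array):
    """Return the elements of check_array that do not occur in main_array."""
    main_array = np.asarray(main_array)
    check_array = np.asarray(check_array)
    # True where an element of check_array also appears in main_array
    present = np.isin(check_array, main_array)
    return check_array[~present]

print(find_non_matching_sketch(np.array([1, 2, 3, 4]), np.array([3, 5, 1, 7])))  # [5 7]
```

Unlike `np.setdiff1d`, which returns sorted unique values, this keeps the original order and any duplicates of `check_array`.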

Evaluate Clustering with Multiple Labels


evaluate_clustering_multiple_labels

 evaluate_clustering_multiple_labels (latent_representations:numpy.ndarray,
                                      list_of_labels:list,
                                      clustering_method:str='kmeans',
                                      label_names:list=None, **kwargs)

Evaluates the clustering quality of the latent representations for one or multiple sets of labels.

Type Default Details
latent_representations ndarray The latent space data.
list_of_labels list List of true-label arrays, or a single true-label array.
clustering_method str kmeans The clustering algorithm to use (‘kmeans’, ‘gmm’, ‘dbscan’).
label_names list None Optional names for the label sets.
kwargs VAR_KEYWORD
Returns dict Dictionary of clustering metrics.
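The sketch below shows the workflow described above using scikit-learn. The latent vectors are clustered once, and the resulting cluster assignment is compared with each label set. Several details are assumptions for illustration rather than the library's confirmed behavior:

- the metric choice (adjusted Rand index and normalized mutual information);
- the layout of the returned dictionary;
- the `label_i` default names;
- inferring the cluster count from the first label set;
- forwarding `**kwargs` to the estimator.

`evaluate_clustering_sketch` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.mixture import GaussianMixture

def evaluate_clustering_sketch(latent, list_of_labels, clustering_method="kmeans",
                               label_names=None, **kwargs):
    # Accept a single 1D label array as well as a list of label arrays
    if isinstance(list_of_labels, np.ndarray) and list_of_labels.ndim == 1:
        list_of_labels = [list_of_labels]
    if label_names is None:
        label_names = [f"label_{i}" for i in range(len(list_of_labels))]
    # Assumption: cluster count defaults to the number of classes in the first label set
    k = kwargs.pop("n_clusters", len(np.unique(list_of_labels[0])))
    if clustering_method == "kmeans":
        pred = KMeans(n_clusters=k, n_init=10, random_state=0, **kwargs).fit_predict(latent)
    elif clustering_method == "gmm":
        pred = GaussianMixture(n_components=k, random_state=0, **kwargs).fit_predict(latent)
    elif clustering_method == "dbscan":
        pred = DBSCAN(**kwargs).fit_predict(latent)  # DBSCAN picks its own cluster count
    else:
        raise ValueError(f"Unknown clustering method: {clustering_method!r}")
    # One cluster assignment, scored against every label set
    return {
        name: {
            "ari": adjusted_rand_score(labels, pred),
            "nmi": normalized_mutual_info_score(labels, pred),
        }
        for name, labels in zip(label_names, list_of_labels)
    }

rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels = np.repeat([0, 1], 20)
scores = evaluate_clustering_sketch(latent, labels)
```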

Physical Distances

Euclidean


euclidean_distance

 euclidean_distance (point1:numpy.ndarray, point2:numpy.ndarray)

Manhattan


manhattan_distance

 manhattan_distance (point1:numpy.ndarray, point2:numpy.ndarray)

Cosine


cosine_distance

 cosine_distance (point1:numpy.ndarray, point2:numpy.ndarray)
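The three point-wise distances have standard closed forms. The minimal NumPy sketch below assumes the inputs are 1D vectors of equal length. It also assumes the cosine distance follows the common convention of 1 minus the cosine similarity, which is the convention SciPy's `scipy.spatial.distance.cosine` uses. The `_sketch` functions are hypothetical, not the library's implementations.

```python
import numpy as np

def euclidean_sketch(p1, p2):
    # sqrt(sum_i (p1_i - p2_i)^2)
    return float(np.linalg.norm(np.asarray(p1, float) - np.asarray(p2, float)))

def manhattan_sketch(p1, p2):
    # sum_i |p1_i - p2_i|
    return float(np.sum(np.abs(np.asarray(p1, float) - np.asarray(p2, float))))

def cosine_sketch(p1, p2):
    # 1 - (p1 . p2) / (||p1|| * ||p2||); undefined for zero vectors
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    return float(1.0 - np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2)))

a, b = np.array([0.0, 3.0]), np.array([4.0, 0.0])
print(euclidean_sketch(a, b), manhattan_sketch(a, b), cosine_sketch(a, b))  # 5.0 7.0 1.0
```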

Dynamic Time Warping


dtw_distance

 dtw_distance (point1:numpy.ndarray, point2:numpy.ndarray)
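Dynamic time warping aligns two sequences along the cheapest monotone warping path. Below is a minimal O(n*m) dynamic-programming sketch. It assumes the sequences run along axis 0 and that the local cost between aligned elements is the Euclidean norm of their difference. The library may use a different local cost, a windowing constraint, or a dedicated DTW package. `dtw_distance_sketch` is hypothetical.

```python
import numpy as np

def dtw_distance_sketch(seq1, seq2):
    a = np.asarray(seq1, dtype=float)
    b = np.asarray(seq2, dtype=float)
    # Treat 1D input as a sequence of scalars: shape [length, 1]
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend from a match, an insertion or a deletion
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])

print(dtw_distance_sketch([0, 1, 2], [0, 0, 1, 2]))  # 0.0 (the repeated 0 is absorbed by warping)
print(dtw_distance_sketch([0, 0], [1, 1]))  # 2.0
```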

Generic


calculate_distance

 calculate_distance (point1:numpy.ndarray, point2:numpy.ndarray,
                     distance_metric:str='euclidean')

Calculates the distance between two points based on the specified distance metric.

:param point1: First data point array.
:param point2: Second data point array.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’).
:return: Distance as a float.
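`calculate_distance` selects the metric by name. Below is a minimal sketch of that dispatch pattern. It assumes a dictionary lookup and a `ValueError` for unknown names; the library's actual error handling is not documented here. Only the three closed-form metrics are registered, and a DTW function could be added the same way. `_DISTANCES` and `calculate_distance_sketch` are hypothetical names.

```python
import numpy as np

# Hypothetical registry: metric name -> function of two 1D float arrays
_DISTANCES = {
    "euclidean": lambda a, b: float(np.linalg.norm(a - b)),
    "manhattan": lambda a, b: float(np.abs(a - b).sum()),
    "cosine": lambda a, b: float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))),
}

def calculate_distance_sketch(point1, point2, distance_metric="euclidean"):
    try:
        fn = _DISTANCES[distance_metric]
    except KeyError:
        raise ValueError(f"Unsupported distance metric: {distance_metric!r}") from None
    return fn(np.asarray(point1, dtype=float), np.asarray(point2, dtype=float))
```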


calculate_pairwise_distances

 calculate_pairwise_distances (array1:numpy.ndarray, array2:numpy.ndarray,
                               distance_metric:str='euclidean')

Calculates the distance between corresponding pairs of points from two arrays using the specified distance metric.

:param array1: A 2D numpy array where each row represents a data point.
:param array2: A 2D numpy array where each row represents a data point.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’).
:return: A 1D numpy array containing the distances between corresponding pairs.
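Each row of `array1` is paired only with the same row of `array2`, so the computation vectorizes along rows and returns shape [n_samples]. The minimal NumPy sketch below covers the three closed-form metrics and assumes 2D inputs of equal shape. `pairwise_row_distances_sketch` is hypothetical; a DTW variant would need a per-row loop.

```python
import numpy as np

def pairwise_row_distances_sketch(array1, array2, distance_metric="euclidean"):
    """Distance between row i of array1 and row i of array2, for every i."""
    a, b = np.asarray(array1, float), np.asarray(array2, float)
    if a.shape != b.shape:
        raise ValueError("array1 and array2 must have the same shape")
    if distance_metric == "euclidean":
        return np.linalg.norm(a - b, axis=1)
    if distance_metric == "manhattan":
        return np.abs(a - b).sum(axis=1)
    if distance_metric == "cosine":
        # Row-wise dot products without forming the full a @ b.T matrix
        dots = np.einsum("ij,ij->i", a, b)
        return 1.0 - dots / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    raise ValueError(f"Unsupported distance metric: {distance_metric!r}")

x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[3.0, 4.0], [1.0, 2.0]])
print(pairwise_row_distances_sketch(x, y))  # [5. 1.]
```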

Batch


calculate_distances_batch

 calculate_distances_batch (single_points:numpy.ndarray,
                            points_array:numpy.ndarray,
                            distance_metric:str='euclidean')

Calculates the distances between one or more query points and an array of data points, based on the specified distance metric.

:param single_points: A single data point or a batch of data points.
:param points_array: Array of data points to compare against.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’).
:return: Array of distances.
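Below is a minimal broadcasting sketch of the batch computation. It assumes 2D inputs of shape [n_queries, d] and [n_points, d], and a result of shape [n_queries, n_points]. `distances_batch_sketch` is a hypothetical name, and DTW is omitted.

```python
import numpy as np

def distances_batch_sketch(single_points, points_array, distance_metric="euclidean"):
    """Distances from each query point to every point in points_array.

    Returns shape [n_queries, n_points]; a single 1D query is treated as a
    batch of one (an assumption, the library may return a 1D array instead).
    """
    q = np.atleast_2d(np.asarray(single_points, dtype=float))  # [n_queries, d]
    p = np.asarray(points_array, dtype=float)                  # [n_points, d]
    if distance_metric in ("euclidean", "manhattan"):
        diff = q[:, None, :] - p[None, :, :]                   # [n_queries, n_points, d]
        if distance_metric == "euclidean":
            return np.sqrt((diff ** 2).sum(axis=-1))
        return np.abs(diff).sum(axis=-1)
    if distance_metric == "cosine":
        norms = np.linalg.norm(q, axis=1)[:, None] * np.linalg.norm(p, axis=1)[None, :]
        return 1.0 - (q @ p.T) / norms
    raise ValueError(f"Unsupported distance metric: {distance_metric!r}")

print(distances_batch_sketch([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]))  # [[5. 1. 2.]]
```

Broadcasting builds an [n_queries, n_points, d] temporary, so very large batches may need to be processed in chunks.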

Nearest Points


find_nearest_points

 find_nearest_points (single_point:numpy.ndarray,
                      points_array:numpy.ndarray, n:int,
                      distance_metric:str='euclidean')

find_nearest_points_batch

 find_nearest_points_batch (single_points:numpy.ndarray,
                            points_array:numpy.ndarray, n:int,
                            distance_metric:str='euclidean')

Finds the indices of, and distances to, the nearest points in an array of data points for each point in a batch of query points, based on the specified distance metric.

:param single_points: Array of query data points (2D array).
:param points_array: Array of data points to compare against (2D array).
:param n: Number of nearest points to retrieve for each query point.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’).
:return: Tuple of nearest indices and nearest distances for each query point.
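Nearest-neighbor search reduces to sorting each row of the query-to-points distance matrix. The minimal sketch below assumes Euclidean distance only and 2D inputs. `find_nearest_points_batch_sketch` is hypothetical. A stable sort keeps ties in index order.

```python
import numpy as np

def find_nearest_points_batch_sketch(single_points, points_array, n):
    q = np.atleast_2d(np.asarray(single_points, dtype=float))  # [n_queries, d]
    p = np.asarray(points_array, dtype=float)                  # [n_points, d]
    # Euclidean distance matrix: [n_queries, n_points]
    dists = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=-1)
    # Indices of the n smallest distances in each row, nearest first
    order = np.argsort(dists, axis=1, kind="stable")[:, :n]
    nearest = np.take_along_axis(dists, order, axis=1)
    return order, nearest

idx, d = find_nearest_points_batch_sketch([[0, 0], [9, 0]], [[0, 0], [10, 0], [1, 0], [5, 0]], n=2)
print(idx.tolist(), d.tolist())  # [[0, 2], [1, 3]] [[0.0, 1.0], [1.0, 4.0]]
```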

Orbits Distance


orbits_distances

 orbits_distances (orbit_data1:numpy.ndarray, orbit_data2:numpy.ndarray,
                   distance_metric:str)

Calculates distances between orbits in two datasets using a specified distance metric.

Either input may be a 3D array of orbits or a 2D array representing a single orbit; a 2D input is automatically converted to a 3D array with one sample. This allows single orbits and sets of orbits to be compared in any combination.

:param orbit_data1: First set of orbits (shape: [n_samples1, n_features, n_time_steps] or [n_features, n_time_steps]).
:param orbit_data2: Second set of orbits or a single orbit (shape: [n_samples2, n_features, n_time_steps] or [n_features, n_time_steps]).
:param distance_metric: A string representing the distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’).
:return: NumPy array of distances.
  - If one input is a single orbit and the other contains multiple orbits: shape [n_samples1] or [n_samples2].
  - If both inputs contain multiple orbits: shape [n_samples1, n_samples2].

Type Details
orbit_data1 ndarray Shape: [n_samples1, n_features, n_time_steps] or [n_features, n_time_steps]
orbit_data2 ndarray Shape: [n_samples2, n_features, n_time_steps] or [n_features, n_time_steps]
distance_metric str String representing the distance metric (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’)
Returns ndarray
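The shape rules above can be sketched as follows. The sketch assumes Euclidean distance computed on each orbit flattened to a vector of length n_features * n_time_steps; the library may instead compare orbits per feature or over time. `orbits_distances_sketch` is hypothetical. When both inputs are single orbits, this sketch returns a [1, 1] array, a case the docstring does not cover.

```python
import numpy as np

def orbits_distances_sketch(orbit_data1, orbit_data2):
    """Euclidean distances between orbits, each flattened to one vector."""
    a = np.asarray(orbit_data1, dtype=float)
    b = np.asarray(orbit_data2, dtype=float)
    single1, single2 = a.ndim == 2, b.ndim == 2
    # Promote a single orbit to a batch of one: [1, n_features, n_time_steps]
    if single1:
        a = a[None]
    if single2:
        b = b[None]
    fa = a.reshape(len(a), -1)
    fb = b.reshape(len(b), -1)
    d = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=-1)  # [n_samples1, n_samples2]
    # Drop the singleton axis introduced by a single-orbit input
    if single1 and not single2:
        return d[0]
    if single2 and not single1:
        return d[:, 0]
    return d

orbits = np.zeros((3, 2, 4))
orbits[1] += 1.0
orbits[2] += 2.0
print(orbits_distances_sketch(np.zeros((2, 4)), orbits).shape)  # (3,)
```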

Get the Closest Orbits


find_nearest_orbits

 find_nearest_orbits (single_orbit:numpy.ndarray,
                      orbit_data:numpy.ndarray, n:int,
                      distance_metric:str='euclidean')

Finds the n closest orbits in orbit_data to the single_orbit based on the specified distance metric.

:param single_orbit: The reference orbit (shape: [n_features, n_time_steps]).
:param orbit_data: The dataset of orbits (shape: [n_samples, n_features, n_time_steps]).
:param n: The number of closest orbits to return.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’). Defaults to ‘euclidean’.
:return: A tuple containing:
  - Indices of the n closest orbits in orbit_data.
  - Distances of the n closest orbits.
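Below is a minimal sketch of the single-orbit search. It assumes Euclidean distance on flattened orbits and 1 <= n. `find_nearest_orbits_sketch` is hypothetical. `np.argpartition` selects the n smallest distances without fully sorting all n_samples, and only those n are then sorted.

```python
import numpy as np

def find_nearest_orbits_sketch(single_orbit, orbit_data, n):
    ref = np.asarray(single_orbit, dtype=float).ravel()                  # [n_features * n_time_steps]
    flat = np.asarray(orbit_data, dtype=float).reshape(len(orbit_data), -1)  # [n_samples, ...]
    dists = np.linalg.norm(flat - ref, axis=1)
    n = min(n, len(dists))
    # After partitioning, the first n positions hold the n smallest distances (unordered)
    idx = np.argpartition(dists, n - 1)[:n]
    # Order just those n from nearest to farthest
    idx = idx[np.argsort(dists[idx])]
    return idx, dists[idx]

data = np.stack([np.full((2, 2), c) for c in (3.0, 0.0, 2.0, 1.0)])
idx, d = find_nearest_orbits_sketch(np.zeros((2, 2)), data, n=2)
print(idx.tolist(), d.tolist())  # [1, 3] [0.0, 2.0]
```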


find_nearest_orbits_batch

 find_nearest_orbits_batch (single_orbits:numpy.ndarray,
                            orbit_data:numpy.ndarray, n:int,
                            distance_metric:str='euclidean')

Iteratively finds the n closest orbits in orbit_data for each orbit in single_orbits.

:param single_orbits: The reference orbits (shape: [num_single_orbits, n_features, n_time_steps]).
:param orbit_data: The dataset of orbits to search within (shape: [n_samples, n_features, n_time_steps]).
:param n: The number of closest orbits to return for each single_orbit.
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’). Defaults to ‘euclidean’.
:return: A tuple containing:
  - A 2D array of shape [num_single_orbits, n] with indices of the n closest orbits.
  - A 2D array of shape [num_single_orbits, n] with distances of the n closest orbits.

Type Default Details
single_orbits ndarray Shape: [num_single_orbits, n_features, n_time_steps]
orbit_data ndarray Shape: [n_samples, n_features, n_time_steps]
n int Number of nearest orbits to find
distance_metric str euclidean Distance metric
Returns tuple
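Following the docstring's iterative description, the minimal sketch below loops over the reference orbits and fills the two [num_single_orbits, n] output arrays. It assumes Euclidean distance on flattened orbits and `n <= n_samples`. `find_nearest_orbits_batch_sketch` is hypothetical.

```python
import numpy as np

def find_nearest_orbits_batch_sketch(single_orbits, orbit_data, n):
    flat_data = np.asarray(orbit_data, dtype=float).reshape(len(orbit_data), -1)
    refs = np.asarray(single_orbits, dtype=float).reshape(len(single_orbits), -1)
    if n > len(flat_data):
        raise ValueError("n cannot exceed the number of orbits in orbit_data")
    all_idx = np.empty((len(refs), n), dtype=int)
    all_dist = np.empty((len(refs), n))
    # One reference orbit at a time keeps the working memory at O(n_samples)
    for i, ref in enumerate(refs):
        d = np.linalg.norm(flat_data - ref, axis=1)
        order = np.argsort(d, kind="stable")[:n]
        all_idx[i], all_dist[i] = order, d[order]
    return all_idx, all_dist

data = np.stack([np.full((2, 2), c) for c in (3.0, 0.0, 2.0, 1.0)])
refs = np.stack([np.zeros((2, 2)), np.full((2, 2), 3.0)])
idx, dist = find_nearest_orbits_batch_sketch(refs, data, n=2)
print(idx.tolist())  # [[1, 3], [0, 2]]
```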

Calculate Pairwise distances


calculate_pairwise_orbit_distances

 calculate_pairwise_orbit_distances (orbit_data1:numpy.ndarray,
                                     orbit_data2:numpy.ndarray,
                                     distance_metric:str='euclidean')

Calculates the distance between corresponding orbits in two orbit datasets.

:param orbit_data1: The first set of orbits (shape: [n_samples, n_features, n_time_steps]).
:param orbit_data2: The second set of orbits (shape: [n_samples, n_features, n_time_steps]).
:param distance_metric: The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’). Defaults to ‘euclidean’.
:return: An array of distances with shape [n_samples].

Type Default Details
orbit_data1 ndarray Shape: [n_samples, n_features, n_time_steps]
orbit_data2 ndarray Shape: [n_samples, n_features, n_time_steps]
distance_metric str euclidean Distance metric
Returns ndarray
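Since orbit i is compared only with orbit i, the Euclidean case is a single vectorized expression. The minimal sketch below assumes distance on flattened orbits. `pairwise_orbit_distances_sketch` is hypothetical.

```python
import numpy as np

def pairwise_orbit_distances_sketch(orbit_data1, orbit_data2):
    a = np.asarray(orbit_data1, dtype=float)
    b = np.asarray(orbit_data2, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Flatten each orbit to one vector, then compare orbit i with orbit i only
    return np.linalg.norm((a - b).reshape(len(a), -1), axis=1)

a = np.zeros((2, 3, 5))
b = np.zeros((2, 3, 5))
b[1] = 1.0
print(pairwise_orbit_distances_sketch(a, b).shape)  # (2,)
```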

Evaluate Distance Metrics


evaluate_distance_metrics_and_clustering

 evaluate_distance_metrics_and_clustering (orbit_data:numpy.ndarray,
                                           true_labels:numpy.ndarray,
                                           distance_metrics:Optional[list]=None,
                                           clustering_algorithms:Optional[list]=None,
                                           evaluation_metrics:Optional[list]=None,
                                           n_clusters:Optional[int]=None,
                                           plot_results:bool=True)

Evaluates combinations of distance metrics and clustering algorithms on orbit data.

Type Default Details
orbit_data ndarray Orbit data as multivariate time series [n_samples, n_features, n_time_steps] or point data [n_samples, n_features]
true_labels ndarray Array of true labels for the orbit data
distance_metrics Optional[list] None List of distance metrics to use. If None, uses all available metrics
clustering_algorithms Optional[list] None List of clustering algorithms to use. If None, uses all available algorithms
evaluation_metrics Optional[list] None List of evaluation metrics to use. If None, uses all available metrics
n_clusters Optional[int] None Number of clusters. If None, inferred from the true labels
plot_results bool True If True, plot heatmaps of results
Returns dict
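The evaluation loops over every (distance metric, clustering algorithm) pair. Below is a minimal sketch of that grid. The distance, clustering and scoring functions are hypothetical callables supplied by the caller, not the library's built-in choices. When `n_clusters` is None, the number of clusters is inferred from the labels, as the parameter table describes. Each distance matrix is computed once and reused across algorithms. `evaluate_grid_sketch` is hypothetical.

```python
from itertools import product

import numpy as np

def evaluate_grid_sketch(data, true_labels, distance_fns, clustering_fns, metric_fns,
                         n_clusters=None):
    """Score every (distance, algorithm) pair; results[(dist, algo)][metric] -> float."""
    if n_clusters is None:
        # Infer the number of clusters from the ground-truth labels
        n_clusters = len(np.unique(true_labels))
    # Compute each distance matrix once and reuse it for every algorithm
    matrices = {name: fn(data) for name, fn in distance_fns.items()}
    results = {}
    for dist_name, algo_name in product(distance_fns, clustering_fns):
        pred = clustering_fns[algo_name](matrices[dist_name], n_clusters)
        results[(dist_name, algo_name)] = {
            metric: float(fn(true_labels, pred)) for metric, fn in metric_fns.items()
        }
    return results

# Toy usage with hypothetical callables
data = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 0, 1, 1])
res = evaluate_grid_sketch(
    data,
    labels,
    distance_fns={"euclidean": lambda X: np.linalg.norm(X[:, None] - X[None, :], axis=-1)},
    clustering_fns={"threshold": lambda D, k: (D[0] > 1.0).astype(int)},  # toy rule, ignores k
    metric_fns={"accuracy": lambda t, p: np.mean(t == p)},
)
print(res)  # {('euclidean', 'threshold'): {'accuracy': 1.0}}
```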

Machine Learning


machine_learning_evaluation

 machine_learning_evaluation (X:numpy.ndarray, y:numpy.ndarray,
                              print_results:bool=False,
                              return_best_model:bool=False,
                              scale_data:bool=True)

Evaluates multiple machine learning algorithms on the provided dataset.

Type Default Details
X ndarray Features array, expected to be 2D. Arrays with more dimensions are reshaped to 2D where possible
y ndarray Target labels
print_results bool False If True, visualizes the evaluation results
return_best_model bool False If True, returns the best model based on accuracy
scale_data bool True If True, scales the features using StandardScaler
Returns Union
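Below is a minimal scikit-learn sketch of this kind of comparison. It assumes three example classifiers, 5-fold cross-validated accuracy, and selection of the best model by mean accuracy. The library's actual model list and evaluation protocol are not documented here, and `ml_evaluation_sketch` is hypothetical. Inputs with more than two dimensions are flattened per sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def ml_evaluation_sketch(X, y, scale_data=True, cv=5):
    X = np.asarray(X)
    if X.ndim > 2:
        # e.g. [n_samples, n_features, n_time_steps] -> [n_samples, n_features * n_time_steps]
        X = X.reshape(len(X), -1)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=0),
        "knn": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in models.items():
        # The scaler sits inside the pipeline, so it is fit on each training fold only
        est = make_pipeline(StandardScaler(), model) if scale_data else model
        scores[name] = float(cross_val_score(est, X, y, cv=cv, scoring="accuracy").mean())
    best = max(scores, key=scores.get)
    return scores, best

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.5, (30, 2, 3)), rng.normal(3.0, 0.5, (30, 2, 3))])
y = np.repeat([0, 1], 30)
scores, best = ml_evaluation_sketch(X, y)
print(best, scores[best])
```

Keeping `StandardScaler` inside the pipeline prevents statistics from the held-out fold from leaking into the scores.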