Evaluation
Auxiliary Functions
plot_metric_heatmaps
plot_metric_heatmaps (results:dict, distance_metrics:list, clustering_algorithms:list, evaluation_metrics:list)
Plots a heatmap for each evaluation metric, showing scores across combinations of distance metrics and clustering algorithms.
| | Type | Details |
|---|---|---|
| results | dict | Dictionary containing results for each combination of distance metric and clustering algorithm |
| distance_metrics | list | List of distance metric names |
| clustering_algorithms | list | List of clustering algorithm names |
| evaluation_metrics | list | List of evaluation metric names |
| Returns | None | |
find_non_matching_elements
find_non_matching_elements (main_array:numpy.ndarray, check_array:numpy.ndarray)
Finds elements in check_array that are not present in main_array.
| | Type | Details |
|---|---|---|
| main_array | ndarray | The main array, containing the larger set of elements |
| check_array | ndarray | The array whose elements are checked against the main array |
| Returns | ndarray | |
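The behavior described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the name `find_non_matching_sketch` is hypothetical, and the use of `np.isin` is an assumption about one reasonable way to do the membership test.

```python
import numpy as np

def find_non_matching_sketch(main_array: np.ndarray, check_array: np.ndarray) -> np.ndarray:
    """Return the elements of check_array that do not occur in main_array."""
    # np.isin marks which entries of check_array appear anywhere in main_array
    present = np.isin(check_array, main_array)
    # Keep only the entries that were not found
    return check_array[~present]

main = np.array([1, 2, 3, 4, 5])
check = np.array([2, 6, 4, 7])
missing = find_non_matching_sketch(main, check)
print(missing)  # [6 7]
```

Boolean masking preserves the order and any duplicates of `check_array`; whether the library function does the same is not stated in this reference.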
Evaluate Clustering with Multiple Labels
evaluate_clustering_multiple_labels
evaluate_clustering_multiple_labels (latent_representations:numpy.ndarray, list_of_labels:list, clustering_method:str='kmeans', label_names:list=None, **kwargs)
Evaluates the clustering quality of the latent representations for one or multiple sets of labels.
| | Type | Default | Details |
|---|---|---|---|
| latent_representations | ndarray | | The latent space data. |
| list_of_labels | list | | List of true-label arrays, or a single true-label array. |
| clustering_method | str | kmeans | The clustering algorithm to use (‘kmeans’, ‘gmm’, ‘dbscan’). |
| label_names | list | None | Optional names for the label sets. |
| kwargs | VAR_KEYWORD | | |
| Returns | dict | | Dictionary with clustering metrics. |
Physical Distances
Euclidean
euclidean_distance
euclidean_distance (point1:numpy.ndarray, point2:numpy.ndarray)
Manhattan
manhattan_distance
manhattan_distance (point1:numpy.ndarray, point2:numpy.ndarray)
Cosine
cosine_distance
cosine_distance (point1:numpy.ndarray, point2:numpy.ndarray)
Dynamic Time Warping
dtw_distance
dtw_distance (point1:numpy.ndarray, point2:numpy.ndarray)
Generic
calculate_distance
calculate_distance (point1:numpy.ndarray, point2:numpy.ndarray, distance_metric:str='euclidean')
Calculates the distance between two points based on the specified distance metric.
| | Type | Default | Details |
|---|---|---|---|
| point1 | ndarray | | First data point array |
| point2 | ndarray | | Second data point array |
| distance_metric | str | euclidean | The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | float | | Distance as a float |
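To make the four metric names concrete, here is a minimal sketch of a dispatcher in plain NumPy. Everything in it is an assumption for illustration: the function names are hypothetical, the DTW variant is the textbook dynamic-programming recurrence on 1D sequences with an absolute-difference cost, and the library may compute any of these differently.

```python
import numpy as np

def euclidean(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

def manhattan(p, q):
    return float(np.sum(np.abs(p - q)))

def cosine(p, q):
    # 1 - cosine similarity; not defined when either vector is all zeros
    return float(1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def dtw(p, q):
    # Textbook O(n*m) dynamic time warping on 1D sequences
    n, m = len(p), len(q)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(p[i - 1] - q[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Hypothetical lookup table keyed by the metric names listed in this reference
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine, "dtw": dtw}

def distance_sketch(point1, point2, distance_metric="euclidean"):
    if distance_metric not in METRICS:
        raise ValueError(f"Unknown distance metric: {distance_metric!r}")
    a = np.asarray(point1, dtype=float)
    b = np.asarray(point2, dtype=float)
    return METRICS[distance_metric](a, b)

d_euclidean = distance_sketch([1, 2, 3], [4, 6, 3])               # 5.0
d_manhattan = distance_sketch([1, 2, 3], [4, 6, 3], "manhattan")  # 7.0
d_cosine = distance_sketch([1, 0], [0, 1], "cosine")              # orthogonal vectors
d_dtw = distance_sketch([0, 1, 2], [0, 1, 1, 2], "dtw")           # 0.0
```

Note that DTW, unlike the other three metrics, can compare sequences of different lengths, which is why the last example yields zero for a sequence and its time-stretched copy.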
calculate_pairwise_distances
calculate_pairwise_distances (array1:numpy.ndarray, array2:numpy.ndarray, distance_metric:str='euclidean')
Calculates the distance between corresponding pairs of points from two arrays using the specified distance metric.
| | Type | Default | Details |
|---|---|---|---|
| array1 | ndarray | | A 2D numpy array where each row represents a data point |
| array2 | ndarray | | A 2D numpy array where each row represents a data point |
| distance_metric | str | euclidean | The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | ndarray | | A 1D numpy array containing the distances between corresponding pairs |
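"Corresponding pairs" means row i of the first array is compared only with row i of the second, giving one distance per row. The sketch below shows this row-wise pairing for three of the metrics; the function name is hypothetical, the vectorized formulation is an assumption, and DTW is left out for brevity.

```python
import numpy as np

def pairwise_rows_sketch(array1, array2, distance_metric="euclidean"):
    a = np.asarray(array1, dtype=float)
    b = np.asarray(array2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Both arrays must have the same shape")
    if distance_metric == "euclidean":
        return np.linalg.norm(a - b, axis=1)
    if distance_metric == "manhattan":
        return np.abs(a - b).sum(axis=1)
    if distance_metric == "cosine":
        # Row-wise dot products and norms; undefined for all-zero rows
        dots = np.einsum("ij,ij->i", a, b)
        norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        return 1.0 - dots / norms
    raise ValueError(f"Metric not covered by this sketch: {distance_metric!r}")

first = np.array([[0.0, 0.0], [1.0, 1.0]])
second = np.array([[3.0, 4.0], [1.0, 2.0]])
row_euclidean = pairwise_rows_sketch(first, second)               # [5. 1.]
row_manhattan = pairwise_rows_sketch(first, second, "manhattan")  # [7. 1.]
```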
Batch
calculate_distances_batch
calculate_distances_batch (single_points:numpy.ndarray, points_array:numpy.ndarray, distance_metric:str='euclidean')
Calculates the distances between single data points and an array of data points based on the specified distance metric.
| | Type | Default | Details |
|---|---|---|---|
| single_points | ndarray | | Single data point array or a batch of data points |
| points_array | ndarray | | Array of data points to compare against |
| distance_metric | str | euclidean | The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | ndarray | | Array of distances |
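One way to handle both a single point and a batch of points is NumPy broadcasting. The sketch below covers the Euclidean case only. The function name is hypothetical, and the output layout (one row per query point, one column per reference point) is an assumption, since this reference does not state the returned shape.

```python
import numpy as np

def distances_batch_sketch(single_points, points_array):
    # Promote a single point of shape (d,) to a batch of shape (1, d)
    queries = np.atleast_2d(np.asarray(single_points, dtype=float))
    points = np.asarray(points_array, dtype=float)
    # Broadcasting: (k, 1, d) - (1, m, d) -> (k, m, d)
    diff = queries[:, np.newaxis, :] - points[np.newaxis, :, :]
    # Reduce over the feature axis to get a (k, m) distance matrix
    return np.linalg.norm(diff, axis=2)

points = np.array([[3.0, 4.0], [6.0, 8.0]])
single = distances_batch_sketch(np.array([0.0, 0.0]), points)
batch = distances_batch_sketch(np.array([[0.0, 0.0], [3.0, 4.0]]), points)
print(single)  # [[ 5. 10.]]
```

The intermediate `(k, m, d)` array can be large for big inputs; chunking the queries is a common workaround.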
Nearest Points
find_nearest_points
find_nearest_points (single_point:numpy.ndarray, points_array:numpy.ndarray, n:int, distance_metric:str='euclidean')
find_nearest_points_batch
find_nearest_points_batch (single_points:numpy.ndarray, points_array:numpy.ndarray, n:int, distance_metric:str='euclidean')
Finds the nearest indices and distances for a batch of single data points to an array of data points based on the specified distance metric.
| | Type | Default | Details |
|---|---|---|---|
| single_points | ndarray | | Array of single data points (2D array) |
| points_array | ndarray | | Array of data points to compare against (2D array) |
| n | int | | Number of nearest points to retrieve for each single point |
| distance_metric | str | euclidean | The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | tuple | | Tuple of nearest indices and nearest distances for each single point |
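A nearest-neighbour lookup of this kind can be sketched as a distance matrix followed by a per-row sort. The code below is a hypothetical illustration for the Euclidean metric, not the library's implementation; in particular, returning neighbours sorted from nearest to farthest is an assumption.

```python
import numpy as np

def nearest_points_batch_sketch(single_points, points_array, n):
    queries = np.asarray(single_points, dtype=float)
    points = np.asarray(points_array, dtype=float)
    # (k, m) matrix of Euclidean distances between every query and every point
    dists = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    # Indices of the n smallest distances in each row, nearest first
    nearest_idx = np.argsort(dists, axis=1)[:, :n]
    # Gather the matching distances row by row
    nearest_dist = np.take_along_axis(dists, nearest_idx, axis=1)
    return nearest_idx, nearest_dist

queries = np.array([[0.0, 0.0], [10.0, 10.0]])
points = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0], [5.0, 5.0]])
idx, dist = nearest_points_batch_sketch(queries, points, n=2)
print(idx)  # [[0 1]
            #  [2 3]]
```

For large point sets, `np.argpartition` avoids a full sort when only the n smallest entries are needed.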
Orbits Distance
orbits_distances
orbits_distances (orbit_data1:numpy.ndarray, orbit_data2:numpy.ndarray, distance_metric:str)
Calculates distances between orbits in two datasets using a specified distance metric.
The function accepts both 2D and 3D inputs. A 2D array (representing a single orbit) is automatically converted to a 3D array with one sample, so single orbits and sets of orbits can be compared interchangeably.
| | Type | Details |
|---|---|---|
| orbit_data1 | ndarray | First set of orbits. Shape: [n_samples1, n_features, n_time_steps] or [n_features, n_time_steps] |
| orbit_data2 | ndarray | Second set of orbits or a single orbit. Shape: [n_samples2, n_features, n_time_steps] or [n_features, n_time_steps] |
| distance_metric | str | String representing the distance metric (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | ndarray | |
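The 2D-to-3D promotion described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, each orbit is flattened to a single vector before comparison, only the Euclidean metric is shown, and the result is assumed to be an [n_samples1, n_samples2] matrix.

```python
import numpy as np

def as_orbit_batch(orbit_data):
    arr = np.asarray(orbit_data, dtype=float)
    if arr.ndim == 2:
        # A single orbit [n_features, n_time_steps] becomes [1, n_features, n_time_steps]
        arr = arr[np.newaxis, ...]
    if arr.ndim != 3:
        raise ValueError("Expected a 2D or 3D array of orbit data")
    return arr

def orbit_distance_matrix_sketch(orbit_data1, orbit_data2):
    a = as_orbit_batch(orbit_data1)
    b = as_orbit_batch(orbit_data2)
    # Assumption: treat each orbit as one long vector of length n_features * n_time_steps
    flat_a = a.reshape(a.shape[0], -1)
    flat_b = b.reshape(b.shape[0], -1)
    # Euclidean distance between every orbit in the first set and every orbit in the second
    return np.linalg.norm(flat_a[:, None, :] - flat_b[None, :, :], axis=2)

single_orbit = np.zeros((2, 3))
dataset = np.stack([np.zeros((2, 3)), np.ones((2, 3))])
orbit_dists = orbit_distance_matrix_sketch(single_orbit, dataset)
print(orbit_dists.shape)  # (1, 2)
```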
Get the Closest Orbits
find_nearest_orbits
find_nearest_orbits (single_orbit:numpy.ndarray, orbit_data:numpy.ndarray, n:int, distance_metric:str='euclidean')
Finds the n closest orbits in orbit_data to the single_orbit based on the specified distance metric.
| | Type | Default | Details |
|---|---|---|---|
| single_orbit | ndarray | | The reference orbit. Shape: [n_features, n_time_steps] |
| orbit_data | ndarray | | The dataset of orbits. Shape: [n_samples, n_features, n_time_steps] |
| n | int | | The number of closest orbits to return |
| distance_metric | str | euclidean | The distance metric to use (‘euclidean’, ‘manhattan’, ‘cosine’, ‘dtw’) |
| Returns | tuple | | Indices of the n closest orbits in orbit_data, and the distances of the n closest orbits |
find_nearest_orbits_batch
find_nearest_orbits_batch (single_orbits:numpy.ndarray, orbit_data:numpy.ndarray, n:int, distance_metric:str='euclidean')
Iteratively finds the n closest orbits in orbit_data for each orbit in single_orbits.
| | Type | Default | Details |
|---|---|---|---|
| single_orbits | ndarray | | Shape: [num_single_orbits, n_features, n_time_steps] |
| orbit_data | ndarray | | Shape: [n_samples, n_features, n_time_steps] |
| n | int | | Number of nearest orbits to find |
| distance_metric | str | euclidean | Distance metric |
| Returns | tuple | | |
Calculate Pairwise Distances
calculate_pairwise_orbit_distances
calculate_pairwise_orbit_distances (orbit_data1:numpy.ndarray, orbit_data2:numpy.ndarray, distance_metric:str='euclidean')
Calculates the distance between corresponding orbits in two orbit datasets.
| | Type | Default | Details |
|---|---|---|---|
| orbit_data1 | ndarray | | Shape: [n_samples, n_features, n_time_steps] |
| orbit_data2 | ndarray | | Shape: [n_samples, n_features, n_time_steps] |
| distance_metric | str | euclidean | Distance metric |
| Returns | ndarray | | |
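For the Euclidean case, comparing corresponding orbits amounts to reducing the squared differences over the feature and time axes, which leaves one distance per sample. The sketch below is a hypothetical illustration; the function name and the choice to treat each orbit as a flat vector are assumptions.

```python
import numpy as np

def pairwise_orbit_distances_sketch(orbit_data1, orbit_data2):
    a = np.asarray(orbit_data1, dtype=float)
    b = np.asarray(orbit_data2, dtype=float)
    if a.ndim != 3 or a.shape != b.shape:
        raise ValueError("Expected two 3D arrays of identical shape")
    # Sum squared differences over features and time steps, keeping the sample axis
    return np.sqrt(((a - b) ** 2).sum(axis=(1, 2)))

first_set = np.zeros((3, 2, 4))
second_set = np.stack([np.zeros((2, 4)), np.ones((2, 4)), 2 * np.ones((2, 4))])
paired = pairwise_orbit_distances_sketch(first_set, second_set)
print(paired.shape)  # (3,)
```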
Evaluate Distance Metrics
evaluate_distance_metrics_and_clustering
evaluate_distance_metrics_and_clustering (orbit_data:numpy.ndarray, true_labels:numpy.ndarray, distance_metrics:Optional[list]=None, clustering_algorithms:Optional[list]=None, evaluation_metrics:Optional[list]=None, n_clusters:Optional[int]=None, plot_results:bool=True)
Evaluates combinations of distance metrics and clustering algorithms on orbit data.
| | Type | Default | Details |
|---|---|---|---|
| orbit_data | ndarray | | Orbit data as multivariate time series [n_samples, n_features, n_time_steps] or point data [n_samples, n_features] |
| true_labels | ndarray | | Array of true labels for the orbit data |
| distance_metrics | Optional | None | List of distance metrics to use. If None, uses all available metrics |
| clustering_algorithms | Optional | None | List of clustering algorithms to use. If None, uses all available algorithms |
| evaluation_metrics | Optional | None | List of evaluation metrics to use. If None, uses all available metrics |
| n_clusters | Optional | None | Number of clusters. If None, inferred from labels |
| plot_results | bool | True | If True, plot heatmaps of results |
| Returns | dict | | |
Machine Learning
machine_learning_evaluation
machine_learning_evaluation (X:numpy.ndarray, y:numpy.ndarray, print_results:bool=False, return_best_model:bool=False, scale_data:bool=True)
Evaluates multiple machine learning algorithms on the provided dataset.
| | Type | Default | Details |
|---|---|---|---|
| X | ndarray | | Features array, expected to be 2D. Will attempt to reshape if higher dimensions |
| y | ndarray | | Target labels |
| print_results | bool | False | If True, visualizes the evaluation results |
| return_best_model | bool | False | If True, returns the best model based on accuracy |
| scale_data | bool | True | If True, scales the features using StandardScaler |
| Returns | Union | | |
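The preprocessing implied by the `X` and `scale_data` descriptions (flatten higher-dimensional input to 2D, then standardize each feature) can be sketched in plain NumPy. This is a hypothetical illustration rather than the library's code: the function name is invented, and the standardization is written by hand to mirror what scikit-learn's `StandardScaler` computes by default (zero mean, unit population variance per column).

```python
import numpy as np

def flatten_and_scale_sketch(X, scale_data=True):
    X = np.asarray(X, dtype=float)
    if X.ndim > 2:
        # Collapse every axis after the sample axis into one feature axis
        X = X.reshape(X.shape[0], -1)
    if not scale_data:
        return X
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Leave constant columns unscaled instead of dividing by zero
    std[std == 0] = 1.0
    return (X - mean) / std

rng = np.random.default_rng(0)
orbits = rng.normal(size=(5, 3, 4))  # [n_samples, n_features, n_time_steps]
features = flatten_and_scale_sketch(orbits)
print(features.shape)  # (5, 12)
```

In a real evaluation the scaling statistics would normally be fitted on the training split only and then applied to the test split, to avoid leaking information.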