Dataset

Scripts to build the different datasets used for modeling

Read Data



get_orbit_data_from_hdf5

 get_orbit_data_from_hdf5 (file_path:str)

Load orbit data from an HDF5 file.

Type Details
file_path str Path to the HDF5 file.
Returns Tuple Dictionary of orbits with numerical keys.

Get Features

Orbit Features



get_orbit_features_from_hdf5

 get_orbit_features_from_hdf5 (file_path:str)

Load orbit DataFrame from an HDF5 file.

Type Details
file_path str Path to the HDF5 file.
Returns DataFrame DataFrame containing orbit features.


get_orbit_features_from_folder

 get_orbit_features_from_folder (folder_path:str)

Concatenate the orbit DataFrames from all HDF5 files in a folder, preserving each original index and adding a system column.

Type Details
folder_path str Path to the folder
Returns DataFrame DataFrame containing concatenated orbit features.
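
The concatenation described here can be illustrated with in-memory DataFrames standing in for the per-file results. This is a minimal sketch under stated assumptions, not the library's implementation: the helper `concat_with_system`, the system names `EM` and `SE`, and the `period` column are all hypothetical.

```python
import pandas as pd


def concat_with_system(frames_by_system: dict) -> pd.DataFrame:
    """Concatenate per-system DataFrames, keeping each original index.

    Hypothetical helper for illustration only.
    """
    tagged = []
    for system, df in frames_by_system.items():
        df = df.copy()          # avoid mutating the caller's frame
        df["system"] = system   # record which system each row came from
        tagged.append(df)
    # ignore_index defaults to False, so the original row labels are kept
    return pd.concat(tagged)


# Toy stand-ins for what two HDF5 files might yield (assumed column name)
frames = {
    "EM": pd.DataFrame({"period": [1.0, 2.0]}, index=[0, 1]),
    "SE": pd.DataFrame({"period": [3.0]}, index=[0]),
}
features = concat_with_system(frames)
print(features)
```

Because the original index is preserved, row labels can repeat across systems; the `system` column is what disambiguates them.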

System Features

import os
from typing import Dict

import h5py
import pandas as pd


def get_system_data_from_hdf5(file_path: str              # Path to the HDF5 file.
                             ) -> Dict[str, float]:       # Dictionary containing system features.
    """
    Load system data from an HDF5 file.
    """
    with h5py.File(file_path, 'r') as file:
        # Extract system features and labels
        system_features = file['system_features'][:]
        system_labels = file['system_labels'][:].astype(str)

        # Map each label to the first value of its feature row
        system_dict = {label: feature[0] for label, feature in zip(system_labels.flatten().tolist(), system_features)}

    return system_dict


def get_system_features_from_folder(folder_path: str    # Path to the folder
                                   ) -> pd.DataFrame:   # DataFrame containing concatenated system features.
    """
    Build a DataFrame with one row per HDF5 file in a folder, adding a system column.
    """
    all_systems = []  # List to store individual system dictionaries

    # Iterate over all files in the folder
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)

        # Only process HDF5 files
        if file_name.endswith(('.h5', '.hdf5')):
            # Get the system dictionary from the HDF5 file
            system_dict = get_system_data_from_hdf5(file_path)

            # The system name is the file name up to the first underscore
            system_dict['system'] = os.path.splitext(file_name)[0].split('_')[0]

            # Append the dictionary to the list
            all_systems.append(system_dict)

    # Convert the list of dictionaries to a DataFrame (one row per file)
    return pd.DataFrame(all_systems)
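
The two key steps of the functions above, pairing labels with feature values and deriving the system name from the file name, can be tried without any HDF5 file. The following sketch uses in-memory NumPy arrays as stand-ins for the `system_labels` and `system_features` datasets; the array shapes, the numeric values, and the file name `EM_example.h5` are assumptions made for illustration.

```python
import os

import numpy as np

# In-memory stand-ins for the two HDF5 datasets (values are made up)
system_labels = np.array([[b"mu"], [b"LU"], [b"TU"]])            # byte strings, shape (3, 1)
system_features = np.array([[0.0125], [389703.0], [382981.0]])  # shape (3, 1)

# Byte strings become str, then the 2D array is flattened to a plain list
labels = system_labels.astype(str).flatten().tolist()

# Each row of system_features holds one value, so feature[0] extracts the scalar
system_dict = {label: feature[0] for label, feature in zip(labels, system_features)}

# The system name is everything before the first underscore of the file stem
file_name = "EM_example.h5"  # hypothetical file name
system_dict["system"] = os.path.splitext(file_name)[0].split("_")[0]

print(system_dict)
```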

Get Classes



substitute_values_from_df

 substitute_values_from_df (values:List[Any],
                            df:pandas.core.frame.DataFrame,
                            goal_column:str, id_column:str='Id')

Substitute values in the given list using the mapping from a DataFrame’s id column to its goal column.

Type Default Details
values List List of values to be substituted.
df DataFrame DataFrame containing the mapping from id_column to goal_column.
goal_column str Column in the DataFrame to get the substitution values from.
id_column str Id Column in the DataFrame to match the values with.
Returns List A list with substituted values from the DataFrame’s goal_column.
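
One way such a substitution could work is to turn the two columns into a lookup dictionary and map each value through it. The sketch below is an assumption about the approach, not the library's code: `substitute_sketch` is a hypothetical name, and the small DataFrame only reuses the ids and labels shown in the `get_orbit_classes` example on this page.

```python
from typing import Any, List

import pandas as pd


def substitute_sketch(values: List[Any], df: pd.DataFrame,
                      goal_column: str, id_column: str = "Id") -> List[Any]:
    """Map each value to df[goal_column] on the row where df[id_column] matches."""
    mapping = df.set_index(id_column)[goal_column].to_dict()
    # A value missing from id_column raises KeyError in this sketch
    return [mapping[value] for value in values]


# Toy mapping table (a three-row excerpt, assumed for illustration)
df = pd.DataFrame({"Id": [1, 7, 23], "Label": ["S_BN", "S_L1_A", "S_L4_LP"]})
result = substitute_sketch([23, 1, 1], df, "Label")
print(result)
```

The output keeps the order and length of `values`, so repeated ids yield repeated substitutions.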


get_orbit_classes

 get_orbit_classes (values:List[Any])

Get orbit classes by substituting values with their corresponding Label, Type, Subtype and Direction.

Type Details
values List List of values to be substituted with orbit classifications
Returns Tuple Lists of labels, types, subtypes and directions, in that order.

Example:

values = [1,7,23]
get_orbit_classes(values)

(['S_BN', 'S_L1_A', 'S_L4_LP'],
 ['System-wide', 'L1', 'L4'],
 ['Butterfly', 'Axial', 'Long Period'],
 ['North', 'No specification', 'No specification'])

Get Periods



get_periods_of_orbit_dict

 get_periods_of_orbit_dict (orbits:Dict[int,numpy.ndarray],
                            propagated_periods:Dict[int,int],
                            desired_periods:int)

Process the orbits to extract the desired periods and print the percentage of the dataset returned.

Type Details
orbits Dict Dictionary of orbits with numerical keys.
propagated_periods Dict Dictionary of propagated periods for each orbit.
desired_periods int Desired number of periods.
Returns Dict Processed dictionary of orbits.
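
To make the idea concrete, here is a minimal sketch of trimming orbits to a desired number of periods. It rests on assumptions that the page does not confirm: each orbit is an array of shape `(n_features, n_steps)`, the time steps are spread evenly over the propagated periods, and orbits with fewer propagated periods than requested are dropped. The name `trim_to_periods` is hypothetical.

```python
from typing import Dict

import numpy as np


def trim_to_periods(orbits: Dict[int, np.ndarray],
                    propagated_periods: Dict[int, int],
                    desired_periods: int) -> Dict[int, np.ndarray]:
    """Keep the first `desired_periods` periods of each orbit that has enough of them."""
    processed = {}
    for key, orbit in orbits.items():
        available = propagated_periods[key]
        if available < desired_periods:
            continue  # not enough periods propagated; drop this orbit
        steps_per_period = orbit.shape[-1] // available
        processed[key] = orbit[..., : steps_per_period * desired_periods]
    pct = 100.0 * len(processed) / len(orbits) if orbits else 0.0
    print(f"Returned {pct:.1f}% of the dataset")
    return processed


# Toy data: orbit 0 spans 3 periods over 300 steps, orbit 1 spans 1 period
orbits = {0: np.zeros((7, 300)), 1: np.zeros((7, 100))}
propagated = {0: 3, 1: 1}
trimmed = trim_to_periods(orbits, propagated, desired_periods=2)
```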

Get Dataset

Fixed Period



get_first_period_of_fixed_period_dataset

 get_first_period_of_fixed_period_dataset (file_path:str)

Load and process orbit data from an HDF5 file for the first period.

Type Details
file_path str Path to the HDF5 file.
Returns Tuple 3D numpy array of padded orbits.

Fixed Step



get_full_fixed_step_dataset

 get_full_fixed_step_dataset (file_path:str, segment_length:int)

Load and process orbit data from an HDF5 file, splitting each orbit into segments of the specified length.

Type Details
file_path str Path to the HDF5 file.
segment_length int Desired length of each segment.
Returns Tuple 3D numpy array of segmented orbits.
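
Segmenting a single orbit into fixed-length pieces can be sketched with a reshape. This is an illustration under assumptions, not the library's implementation: the orbit is taken to have shape `(n_features, n_steps)`, trailing steps that do not fill a whole segment are dropped, and `segment_orbit` is a hypothetical name.

```python
import numpy as np


def segment_orbit(orbit: np.ndarray, segment_length: int) -> np.ndarray:
    """Split (n_features, n_steps) into (n_segments, n_features, segment_length)."""
    n_features, n_steps = orbit.shape
    n_segments = n_steps // segment_length
    usable = orbit[:, : n_segments * segment_length]  # drop the incomplete tail
    # Reshape the time axis into (segment, step), then move the segment axis first
    return usable.reshape(n_features, n_segments, segment_length).transpose(1, 0, 2)


orbit = np.arange(30).reshape(3, 10)   # 3 features, 10 time steps
segments = segment_orbit(orbit, 4)     # two full segments; the last 2 steps are dropped
print(segments.shape)
```

Stacking the segments of every orbit along the first axis would then give a 3D array like the one described in the table above.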


get_first_period_fixed_step_dataset

 get_first_period_fixed_step_dataset (file_path:str, segment_length:int)

Load and process first-period orbit data from an HDF5 file, splitting each orbit into segments of the specified length.

Type Details
file_path str Path to the HDF5 file.
segment_length int Desired length of each segment.
Returns Tuple 3D numpy array of segmented orbits.

First Period



get_first_period_dataset

 get_first_period_dataset (file_path:str,
                           segment_length:Optional[int]=100)

Load orbit data based on the file path. Calls the appropriate function depending on the name of the file.

Type Default Details
file_path str Path to the HDF5 file.
segment_length Optional 100 Desired length of each segment, optional.
Returns Tuple Memmap of segmented orbits.
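
Dispatching on the file name might look like the sketch below. The naming scheme is an assumption: the page does not say which markers the real function looks for, so `fixed_period` and `fixed_step` are hypothetical, as are the stand-in loaders, which only return a description instead of reading data.

```python
import os
from typing import Optional


def load_fixed_period(file_path: str, segment_length: Optional[int]) -> str:
    # Stand-in loader; ignores segment_length in this sketch
    return f"fixed-period loader <- {os.path.basename(file_path)}"


def load_fixed_step(file_path: str, segment_length: Optional[int]) -> str:
    # Stand-in loader; a real one would segment orbits to segment_length
    return f"fixed-step loader <- {os.path.basename(file_path)} ({segment_length} steps)"


def load_by_name(file_path: str, segment_length: Optional[int] = 100) -> str:
    """Pick a loader from a marker in the file name (assumed naming scheme)."""
    name = os.path.basename(file_path).lower()
    if "fixed_period" in name:
        return load_fixed_period(file_path, segment_length)
    if "fixed_step" in name:
        return load_fixed_step(file_path, segment_length)
    raise ValueError(f"Cannot infer dataset type from file name: {name!r}")


print(load_by_name("data/EM_fixed_step.h5", 50))
```

Raising an error for unrecognized names, rather than guessing a default loader, keeps a misnamed file from being processed silently with the wrong routine.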


get_first_period_dataset_all_systems

 get_first_period_dataset_all_systems (folder_path:str,
                                       segment_length:Optional[int]=100)

Process all system files in a folder and concatenate their data, maintaining the order of the files.

Type Default Details
folder_path str Path to the folder containing system files
segment_length Optional 100 Desired length of each segment
Returns Tuple Concatenated orbits as a 3D NumPy memmap
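
Order-preserving concatenation across systems can be sketched with in-memory arrays. The shapes, the system names `EM` and `SE`, and the idea of building a parallel array of system labels are assumptions for illustration; the real function works on files and returns a memmap.

```python
import numpy as np

# Toy per-system arrays of shape (n_orbits, n_features, segment_length)
per_system = {
    "EM": np.zeros((2, 7, 100)),
    "SE": np.ones((3, 7, 100)),
}

# Stack along the orbit axis; dict insertion order fixes the system order
orbits = np.concatenate(list(per_system.values()), axis=0)

# One label per orbit, repeated to line up with the rows of `orbits`
counts = [len(array) for array in per_system.values()]
system_labels = np.repeat(list(per_system.keys()), counts)

print(orbits.shape, system_labels.tolist())
```

Keeping a label array aligned with the concatenated orbits makes it possible to look up per-system values for each orbit later.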

Get Constants



get_system_constants

 get_system_constants (system_dict:Dict[str,float],
                       system_labels:numpy.ndarray, constant:str)

Extracts values for a specified constant from a system dictionary based on system labels.

Type Details
system_dict Dict Dictionary containing system constants for different systems
system_labels ndarray Array of system labels
constant str The constant to extract (e.g., ‘mu’, ‘LU’, etc.)
Returns ndarray Values of the requested constant, one per system label.
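
A lookup of this kind can be sketched as follows. The dictionary layout is an assumption, since the page does not show it: here `system_dict` maps each constant name to a `{system label: value}` mapping, the numeric values are made up, and `constants_for_labels` is a hypothetical name.

```python
from typing import Dict

import numpy as np


def constants_for_labels(system_dict: Dict[str, Dict[str, float]],
                         system_labels: np.ndarray,
                         constant: str) -> np.ndarray:
    """Return the value of `constant` for every entry of `system_labels`."""
    per_system = system_dict[constant]
    return np.array([per_system[label] for label in system_labels])


# Assumed layout: {constant name: {system label: value}}; values are made up
system_dict = {
    "mu": {"EM": 0.01, "SE": 0.000003},
    "LU": {"EM": 1.0, "SE": 2.0},
}
system_labels = np.array(["EM", "SE", "EM"])

mu_values = constants_for_labels(system_dict, system_labels, "mu")
print(mu_values)
```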