cryoswath.misc module
- cryoswath.misc.convert_all_esri_to_feather(dir_path: str = None) None [source]
Converts ESRI/ArcGIS formatted files to feathers
Finds all .shp files in the given directory. Not recursive.
- Parameters:
dir_path (str, optional) – Root directory. Defaults to None.
- cryoswath.misc.cs_id_to_time(cs_id: str) Timestamp [source]
Formats CryoSat-2 file time tag as timestamp.
- Parameters:
cs_id (str) – CryoSat-2 file time tag.
- Returns:
Timestamp.
- Return type:
pd.Timestamp
- cryoswath.misc.cs_time_to_id(time: Timestamp) str [source]
Converts timestamp to CryoSat-2 file time tag.
- Parameters:
time (pd.Timestamp) – Timestamp.
- Returns:
CryoSat-2 file time tag.
- Return type:
str
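A minimal sketch of the round trip between time tag and timestamp, using only the standard library. It assumes the time tag has the form YYYYMMDDTHHMMSS, as it appears in CryoSat-2 file names. The tag format and the helper names tag_to_time and time_to_tag are assumptions for illustration, and datetime.datetime stands in for pd.Timestamp.

```python
from datetime import datetime

# Assumed layout of the time tag, e.g. "20140817T123456".
TAG_FORMAT = "%Y%m%dT%H%M%S"


def tag_to_time(cs_id: str) -> datetime:
    """Parse a time tag into a datetime (hypothetical helper)."""
    return datetime.strptime(cs_id, TAG_FORMAT)


def time_to_tag(time: datetime) -> str:
    """Format a datetime as a time tag (hypothetical helper)."""
    return time.strftime(TAG_FORMAT)


stamp = tag_to_time("20140817T123456")
print(stamp.isoformat())
print(time_to_tag(stamp))
```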
- cryoswath.misc.extend_filename(file_name: str, extension: str) str [source]
Adds string at end of file name, before last “.”
- Parameters:
file_name (str) – File name or path.
extension (str) – String to insert at end.
- Returns:
As input, including extension.
- Return type:
str
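The described behavior can be sketched with os.path.splitext, which splits at the last “.” of the final path component. The name extend_filename_sketch is hypothetical, and the handling of names without a “.” is an assumption, not necessarily what cryoswath does.

```python
import os


def extend_filename_sketch(file_name: str, extension: str) -> str:
    """Insert `extension` before the last "." of `file_name`."""
    # splitext returns ("data/l2", ".h5") for "data/l2.h5"
    root, suffix = os.path.splitext(file_name)
    return root + extension + suffix


print(extend_filename_sketch("data/l2.h5", "_backup"))
```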
- cryoswath.misc.filter_kwargs(func: callable, kwargs: dict, *, blacklist: list[str] = None, whitelist: list[str] = None) dict [source]
Automatically reduces dict to accepted inputs
Detects expected key-word arguments of a function and only passes those. Use black- and whitelists to refine.
- Parameters:
func (callable) – Target function.
kwargs (dict) – KW-args to be filtered.
blacklist (list[str], optional) – Blacklist undesired arguments. Defaults to None.
whitelist (list[str], optional) – Include extra arguments that are not part of the function’s signature. Defaults to None.
- Returns:
Filtered kw-args.
- Return type:
dict
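One way to implement such a filter is to read the accepted parameter names with inspect.signature. The following is a sketch under assumptions, not the library’s code: the name filter_kwargs_sketch is hypothetical, and letting the blacklist take precedence over the whitelist is a guess.

```python
import inspect


def filter_kwargs_sketch(func, kwargs, *, blacklist=None, whitelist=None):
    """Keep only the entries of `kwargs` that `func` appears to accept."""
    accepted = set(inspect.signature(func).parameters)
    accepted |= set(whitelist or [])  # extra names to let through
    accepted -= set(blacklist or [])  # assumed to win over the whitelist
    return {key: value for key, value in kwargs.items() if key in accepted}


def target(a, b=1):
    return a + b


options = {"a": 1, "b": 2, "c": 3}
print(target(**filter_kwargs_sketch(target, options)))
```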
- cryoswath.misc.find_region_id(location: any, scope: str = 'o2') str [source]
Returns RGI id for multitude of inputs
Special behavior in Greenland! If o2 region is requested, return id of “custom” subregion: 05-11–05-15 for N, W, SW, SE, E. See geo-feathers in data/auxiliary/RGI/05-1*.feather.
- Parameters:
location (any) – Can be a geo-referenced xarray.DataArray, a geopandas.GeoDataFrame or Series, or a shapely.Geometry.
scope (str, optional) – One of “o1”, “o2”, or “basin”. Defaults to “o2”.
- Raises:
Exception – If scope is “o2” and location is in Greenland but either not in any of the custom subregions or in more than one custom subregion.
- Returns:
RGI id.
- Return type:
str
- cryoswath.misc.flag_outliers(data, *, weights=None, stat: callable = <function median>, deviation_factor: float = 3, scaling_factor: float = np.float64(1.3489795003921636))[source]
Flags data that is considered outlier given a set of assumptions
Data too far from a reference point is marked. This works analogously to comparing data to its mean in terms of standard deviations.
The function was meant to be versatile. However, I’m not sure it makes sense to use it with statistics other than the “usual” ones: mean and median.
It defaults to marking data further from the median than 3 scaled MADs.
- Parameters:
data (ArrayLike) – If data is an array, outliers will be flagged along first dimension (given stat works like most numpy functions).
weights (ArrayLike) – If weights are provided, they are passed to stat as a keyword argument.
stat (callable, optional) – Function to return first and second reference points. Defaults to np.median.
deviation_factor (float, optional) – Allowed number of reference point distances between data and first reference point. Defaults to 3.
scaling_factor (float, optional) – Reference distance scaling. Defaults to 2*2**.5*scipy.special.erfinv(.5).
- Returns:
Mask that is positive for outliers.
- Return type:
bool, shaped like input
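The default behavior (median as reference, 3 scaled MADs as limit) can be illustrated with the standard library. The default scaling factor 2*2**.5*erfinv(.5) equals twice the 75 % quantile of the standard normal distribution, which is what the sketch computes. How the factor enters the threshold (multiplied with the MAD) is an assumption, and flag_outliers_sketch is a hypothetical name working on plain lists.

```python
from statistics import NormalDist, median

# 2 * sqrt(2) * erfinv(0.5) == 2 * (75 % quantile of the standard normal)
DEFAULT_SCALING = 2 * NormalDist().inv_cdf(0.75)


def flag_outliers_sketch(data, deviation_factor=3, scaling_factor=DEFAULT_SCALING):
    """Return True for values far from the median, measured in scaled MADs."""
    center = median(data)
    # median absolute deviation: the "second reference point" in this sketch
    mad = median(abs(x - center) for x in data)
    limit = deviation_factor * scaling_factor * mad  # assumed combination
    return [abs(x - center) > limit for x in data]


print(flag_outliers_sketch([1, 2, 3, 4, 100]))
```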
- cryoswath.misc.flag_translator(cs_l1b_flag)[source]
Retrieves the meaning of a flag from the attributes.
If attributes contain “flag_masks”, it converts the value to a binary mask and returns a list of flags. Else it expects “flag_values” and interprets and returns the flag as one of a set of options.
This works for CryoSat-2 L1b netCDF data. It depends on the attribute structure and names.
- Parameters:
cs_l1b_flag (0-dim xarray.DataArray) – Flag variable of waveform.
- Returns:
List of flags or single option, depending on flag.
- Return type:
list or string
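The two cases can be sketched on a plain attribute dictionary. The sketch assumes the meanings are stored as a space-separated “flag_meanings” string, as in the CF conventions; that attribute name, the function name translate_flag_sketch, and the example attributes are assumptions and not taken from the CryoSat-2 product.

```python
def translate_flag_sketch(value, attrs):
    """Translate an integer flag using CF-style attributes."""
    meanings = attrs["flag_meanings"].split()
    if "flag_masks" in attrs:
        # bit field: collect every meaning whose mask bit is set
        return [name for mask, name in zip(attrs["flag_masks"], meanings)
                if value & mask]
    # enumeration: exactly one of the listed values applies
    return meanings[list(attrs["flag_values"]).index(value)]


# Made-up attributes for illustration only.
bit_attrs = {"flag_masks": [1, 2, 4], "flag_meanings": "a b c"}
enum_attrs = {"flag_values": [0, 1, 2], "flag_meanings": "a b c"}
print(translate_flag_sketch(5, bit_attrs))
print(translate_flag_sketch(1, enum_attrs))
```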
- cryoswath.misc.gauss_filter_DataArray(da: DataArray, dim: str, window_extent: int, std: int) DataArray [source]
Low-pass filters input array.
Convolves each vector of an array along the specified dimension with a normalized gauss-function having the specified standard deviation.
- Parameters:
da (xr.DataArray) – Data to be filtered.
dim (str) – Dimension to apply filter along.
window_extent (int) – Window width. If it is not odd, it is increased.
std (int) – Standard deviation of gauss-filter.
- Returns:
Low-pass filtered data.
- Return type:
xr.DataArray
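The filter can be illustrated on a plain list. The sketch builds a normalized Gaussian kernel and convolves a single vector with it. Increasing an even window by exactly one and renormalizing the kernel at the edges are assumptions; gauss_kernel and smooth are hypothetical names.

```python
import math


def gauss_kernel(window_extent, std):
    """Normalized Gaussian weights over an odd-sized window."""
    if window_extent % 2 == 0:
        window_extent += 1  # assumed: even widths grow by one
    half = window_extent // 2
    weights = [math.exp(-0.5 * (i / std) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, window_extent, std):
    """Convolve `values` with the kernel, renormalizing near the edges."""
    kernel = gauss_kernel(window_extent, std)
    half = len(kernel) // 2
    result = []
    for i in range(len(values)):
        acc, norm = 0.0, 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += weight * values[j]
                norm += weight
        result.append(acc / norm)
    return result


print(smooth([0.0, 0.0, 1.0, 0.0, 0.0], window_extent=3, std=1))
```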
- cryoswath.misc.get_dem_reader(data: any = None) DatasetReader [source]
Determines which DEM to use
Attempts to determine location of data and returns appropriate rasterio.io.DatasetReader. Only implemented for ArcticDEM and REMA.
- Parameters:
data (any) – Defaults to None.
- Raises:
NotImplementedError – If region can’t be inferred.
- Returns:
Reader pointing to the file.
- Return type:
rasterio.DatasetReader
- cryoswath.misc.interpolate_hypsometrically(ds: Dataset, main_var: str, error: str, elev: str = 'ref_elev', weights: str = 'weights', outlier_replace: bool = False) Dataset [source]
Fills data gaps by hypsometrical interpolation
If sufficient data is provided, this routine sorts and bins the data by elevation bands and fits a third-order polynomial to the weighted averages.
Sufficient data requires 4 or more bands, with an effective sample size of 6 or larger, that span at least 2/3 of the total elevation range. The weights used to calculate the weighted average are the reciprocal squared errors if no weights are provided.
If dimension “time” exists, recurse into time steps and interpolate per time step.
- Parameters:
ds (xr.Dataset) – Input with voids.
main_var (str) – Name of variable to interpolate.
error (str) – Name of errors. Only used, if weights are not provided.
elev (str, optional) – Name of variable that contains the reference elevation used for binning. If the variable does not exist, the routine attempts to read the reference elevations from disk. Defaults to “ref_elev”.
weights (str, optional) – Provide name of variable that contains the weights. The weights will be passed to numpy.average and should be 1/variance or similar. Defaults to “weights”.
outlier_replace (bool, optional) – If enabled, also interpolates outliers. Defaults to False.
- Returns:
Filled dataset.
- Return type:
xr.Dataset
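A rough sketch of the first step only: binning samples into elevation bands and averaging each band with reciprocal squared errors as weights. The band width of 50 m, the use of None for voids, and the name band_averages are assumptions; the sufficiency checks and the third-order polynomial fit are not shown.

```python
from collections import defaultdict


def band_averages(elev, values, errors, band_width=50.0):
    """Weighted mean per elevation band, with weights 1/error**2."""
    sums = defaultdict(lambda: [0.0, 0.0])  # band -> [sum(w*v), sum(w)]
    for z, value, error in zip(elev, values, errors):
        if value is None:  # void, nothing to average
            continue
        weight = 1.0 / error**2
        band = int(z // band_width)
        sums[band][0] += weight * value
        sums[band][1] += weight
    # key each band by its lower elevation bound
    return {band * band_width: wv / w for band, (wv, w) in sorted(sums.items())}


print(band_averages([10, 20, 60, 70], [1.0, 3.0, 5.0, None], [1.0, 1.0, 2.0, 1.0]))
```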
- cryoswath.misc.load_basins(rgi_ids: list[str]) GeoDataFrame [source]
Loads RGI v7 basin outlines and meta data
- Parameters:
rgi_ids (list[str]) – RGI basin ids, all within the same RGI o1 region.
- Returns:
Queried RGI data with geometry column containing the outlines.
- Return type:
gpd.GeoDataFrame
- cryoswath.misc.load_cs_full_file_names(update: str = 'no') Series [source]
Loads a pandas.Series of the original CryoSat-2 L1b file names.
Having the file names available can be handy to organize your local data.
This function can be used to update your local list by setting update.
- Parameters:
update (str, optional) – One of “no”, “quick”, “regular”, or “full”. “quick” continues from the last locally known file name, “regular” checks for changes between the stages OFFL and LTA, and “full” replaces the local data base with a new one. Defaults to “no”.
- Returns:
Full L1b file names without path or extension.
- Return type:
pd.Series
- cryoswath.misc.load_cs_ground_tracks(region_of_interest: str | Polygon = None, start_datetime: str | Timestamp = '2010', end_datetime: str | Timestamp = '2030', *, buffer_period_by: relativedelta = None, buffer_region_by: float = None, update: str = 'no', n_threads: int = 8) GeoDataFrame [source]
Read the GeoDataFrame of CryoSat-2 tracks from disk.
If desired, you can query certain extents or periods by specifying arguments.
Further, you can update the database by setting update to “regular” or “full”. Mind that this typically takes some time (“regular” on the order of minutes, “full” on the order of hours).
- Parameters:
region_of_interest (str | shapely.Polygon, optional) – Can be any RGI code or a polygon in lat/lon (CRS EPSG:4326). If requesting o1 regions, provide the long code, e.g., “01_alaska”. Defaults to None.
start_datetime (str | pd.Timestamp, optional) – Defaults to “2010”.
end_datetime (str | pd.Timestamp, optional) – Defaults to “2030”.
buffer_period_by (relativedelta, optional) – Extends the period to both sides. Handy if you use this function to query tracks for an aggregated product. Defaults to None.
buffer_region_by (float, optional) – Handy to also query tracks in the proximity that may return elevation estimates for your region of interest. Units are meters. CryoSat’s footprint extends ±7.5 km to both sides, so values above 30_000 do not make much sense. Defaults to None.
update (str, optional) – If you are interested in the latest tracks, update frequently with update=“regular”. If you believe tracks are missing for some reason, choose update=“full” (be aware this takes a while). Defaults to “no”.
n_threads (int, optional) – Number of parallel ftp connections. If you choose too many, ESA will refuse the connection. Defaults to 8.
- Raises:
ValueError – For invalid update arguments.
- Returns:
CryoSat-2 tracks.
- Return type:
gpd.GeoDataFrame
- cryoswath.misc.load_glacier_outlines(identifier: str | list[str], product: str = 'complexes', union: bool = True, crs: int | CRS = None) MultiPolygon [source]
Loads RGI v7 basin or complex outlines and meta data
- Parameters:
identifier (str | list[str]) – RGI id: either o1, o2, or basin/complex id.
product (str, optional) – Either “glaciers” or “complexes”. Defaults to “complexes”.
union (bool, optional) – For backward compatibility, if enabled (by default) only return union of all shapes. If disabled, return full GeoDataFrame. Defaults to True.
crs (int | CRS, optional) – Convenience option to reproject shape(s) to crs. Defaults to None.
- Raises:
ValueError – If identifier was not understood.
- Returns:
Union of basin shapes. If union is disabled, instead return geopandas.GeoDataFrame including the full data.
- Return type:
shapely.MultiPolygon
- cryoswath.misc.load_o1region(o1code: str, product: str = 'complexes') GeoDataFrame [source]
Loads RGI v7 basin or complex outlines and meta data
- Parameters:
o1code (str) – RGI o1 code, starting with “01” to “20”.
product (str, optional) – Either “glaciers” or “complexes”. Defaults to “complexes”.
- Raises:
ValueError – If o1code can’t be recognized.
FileNotFoundError – If RGI data is missing.
- Returns:
Queried RGI data with geometry column containing the outlines.
- Return type:
gpd.GeoDataFrame
- cryoswath.misc.load_o2region(o2code: str, product: str = 'complexes') GeoDataFrame [source]
Loads RGI v7 basin or complex outlines and meta data
- Parameters:
o2code (str) – RGI o2 code.
product (str, optional) – Either “glaciers” or “complexes”. Defaults to “complexes”.
- Returns:
Queried RGI data with geometry column containing the outlines.
- Return type:
gpd.GeoDataFrame
- cryoswath.misc.merge_l2_cache(source_glob: str, destination_file_name: str, exclude_endswith: list[str] = ['backup', 'collection']) None [source]
Append cached l2 data from various hdf files into one.
Tests whether data is present in destination; if not, copies the data.
This function is very specifically for cached l2 data as created by l3.build_dataset.
- Parameters:
source_glob (str) – Unix-like glob pattern to match source files in misc.tmp_path (default: data/tmp/).
destination_file_name (str) – … in misc.tmp_path.
exclude_endswith (list[str], optional) – Do not include files with the specified ending. Useful to exclude backups. Defaults to [“backup”, “collection”].
- cryoswath.misc.nan_unique(data: np.typing.ArrayLike) list [source]
Returns unique values that are not nan.
- Parameters:
data (np.typing.ArrayLike) – Input data.
- Returns:
List of unique values.
- Return type:
list
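A sketch of the idea on a plain list, using math.isnan to drop NaN entries. Returning the values sorted is an assumption made here so the output is deterministic; nan_unique_sketch is a hypothetical name.

```python
import math


def nan_unique_sketch(data):
    """Unique values of `data`, ignoring NaN."""
    unique = {x for x in data
              if not (isinstance(x, float) and math.isnan(x))}
    return sorted(unique)  # sorting is an assumption of this sketch


print(nan_unique_sketch([1.0, float("nan"), 2.0, 1.0]))
```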
- cryoswath.misc.repair_l2_cache(filepath: str, *, region_of_interest: MultiPolygon = None, force: bool = False) None [source]
Attempts to repair corrupted l2 cache files.
The caching logic is not 100% safe. To repair a cache, this function removes duplicates and sorts the data index. If the note names for some reason
- Parameters:
filepath (str) – Path to l2 cache file.
region_of_interest (shapely.Geometry, optional) – EPSG:4326 outline of considered region. If provided, removes chunks with no points inside projected bounding box of outline.
force (bool) – Disregard file size safety, e.g., if you expect less than 2/3 of the data to remain.
- cryoswath.misc.request_workers(task_func: callable, n_workers: int, result_queue: Queue = None) Queue [source]
Creates workers and provides queue to assign work
- Parameters:
task_func (callable) – Task.
n_workers (int) – Number of requested workers.
result_queue (queue.Queue, optional) – Queue in which to drop results. Defaults to None.
- Returns:
Task queue.
- Return type:
queue.Queue
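The pattern described here, a pool of workers fed through a task queue, might look like the following with threading and queue from the standard library. It is a sketch under assumptions: each queued item is passed to task_func as its single argument, workers are daemon threads that never shut down, and request_workers_sketch is a hypothetical name.

```python
import queue
import threading


def request_workers_sketch(task_func, n_workers, result_queue=None):
    """Start `n_workers` threads and return the queue that feeds them."""
    task_queue = queue.Queue()

    def worker():
        while True:
            task = task_queue.get()
            try:
                result = task_func(task)
                if result_queue is not None:
                    result_queue.put(result)
            finally:
                task_queue.task_done()  # lets task_queue.join() return

    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()
    return task_queue


results = queue.Queue()
tasks = request_workers_sketch(lambda x: x * x, n_workers=2, result_queue=results)
for number in range(5):
    tasks.put(number)
tasks.join()  # block until every task has been processed
print(sorted(results.queue))
```

Calling join() on the returned task queue is what makes it safe to read the results afterwards.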
- cryoswath.misc.rgi_code_translator(input: str | list[str], out_type: str = 'full_name') str [source]
Translate o1 or o2 codes to region names
- Parameters:
input (str) – RGI o1 or o2 codes.
out_type (str, optional) – Either “full_name” or “long_code”. Defaults to “full_name”.
- Raises:
ValueError – If input is not understood.
- Returns:
Either full name or RGI “long_code”.
- Return type:
str
- cryoswath.misc.rgi_o1region_translator(input: int, out_type: str = 'full_name') str [source]
Finds region name for given RGI o1 number.
- Parameters:
input (int) – RGI o1 number.
out_type (str, optional) – Either “full_name” or “long_code”. Defaults to “full_name”.
- Returns:
Either full name or RGI “long_code”.
- Return type:
str
- cryoswath.misc.rgi_o2region_translator(o1: int, o2: int, out_type: str = 'full_name') str [source]
Finds subregion name for given RGI o1 and o2 number.
- Parameters:
o1 (int) – RGI o1 number.
o2 (int) – RGI o2 number.
out_type (str, optional) – Either “full_name” or “long_code”. Defaults to “full_name”.
- Returns:
Either full name or RGI “long_code”.
- Return type:
str
- cryoswath.misc.weighted_mean_excl_outliers(df: DataFrame | Dataset = None, weights: ndarray | str = 'weights', *, values: ndarray | str = None, deviation_factor: int = 5, return_mask: bool = False) float [source]
Calculates the weighted average after excluding outliers.
- Note: This function uses np.average, which expects weights similar to 1/variance, in contrast to np.lstsq and derivatives, which expect 1/std and square the weights internally.
- Parameters:
df (DataFrame) – DataFrame containing values and weights.
values (1d-numpy array) – Values to average or name of dataframe column to average.
weights (1d-numpy array) – Weights to apply to values or name of dataframe column to use.
deviation_factor (int, optional) – Factor to apply to standard deviation. Values further apart from the average are excluded. Defaults to 5.
- Returns:
Weighted average excluding outliers. If return_mask is enabled, returns a boolean mask that is true where outliers were detected. The mask has the same type as the input.
- Return type:
float
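To illustrate the weighting convention from the note, the sketch below derives weights as 1/error**2 and averages after a single exclusion pass. Using the weighted standard deviation as the yardstick, running only one pass, and always returning the mask are assumptions; weighted_mean_excl_outliers_sketch is a hypothetical name working on plain lists.

```python
import math


def weighted_mean_excl_outliers_sketch(values, weights, deviation_factor=5):
    """Weighted mean after dropping values far from the first-pass mean."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    limit = deviation_factor * math.sqrt(variance)
    mask = [abs(v - mean) > limit for v in values]  # True marks an outlier
    kept = [(v, w) for v, w, out in zip(values, weights, mask) if not out]
    kept_total = sum(w for _, w in kept)
    return sum(w * v for v, w in kept) / kept_total, mask


errors = [0.5] * 6
weights = [1 / e**2 for e in errors]  # 1/variance, as np.average expects
values = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
mean, mask = weighted_mean_excl_outliers_sketch(values, weights, deviation_factor=2)
print(round(mean, 3), mask)
```

With only six samples a single value cannot deviate by five standard deviations, which is why the usage example lowers deviation_factor to 2.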
- cryoswath.misc.xycut(data: GeoDataFrame, x_chunk_meter=60000, y_chunk_meter=60000) list[dict[str, float | GeoDataFrame]] [source]
Chunk point data in planar reference system
This mainly is a helper function for l3.build_dataset() that takes many data points and chunks them based on their location. However, it may be helpful in other contexts.
- Returns:
List of dicts of which each contains the x and y extents of the current chunk and the GeoDataFrame or Series of the point data.
- Return type:
list
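The chunking idea can be sketched on plain (x, y) tuples: each point is assigned to a grid cell of the requested size, and each non-empty cell becomes one dict. Aligning the grid to the coordinate origin and the key names x_min, x_max, y_min, y_max, and points are assumptions; xycut_sketch is a hypothetical name.

```python
import math
from collections import defaultdict


def xycut_sketch(points, x_chunk_meter=60_000, y_chunk_meter=60_000):
    """Group planar points into rectangular chunks."""
    cells = defaultdict(list)
    for x, y in points:
        cell = (math.floor(x / x_chunk_meter), math.floor(y / y_chunk_meter))
        cells[cell].append((x, y))
    return [
        {
            "x_min": ix * x_chunk_meter,
            "x_max": (ix + 1) * x_chunk_meter,
            "y_min": iy * y_chunk_meter,
            "y_max": (iy + 1) * y_chunk_meter,
            "points": members,
        }
        for (ix, iy), members in sorted(cells.items())
    ]


chunks = xycut_sketch([(10_000, 10_000), (70_000, 10_000), (20_000, 5_000)])
for chunk in chunks:
    print(chunk["x_min"], chunk["x_max"], len(chunk["points"]))
```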