smart_geocubes.datasets.alphaearth
Predefined accessor for AlphaEarth Embeddings data.
Classes:
-
AlphaEarthEmbeddings–Accessor for AlphaEarth Embeddings data.
AlphaEarthEmbeddings
AlphaEarthEmbeddings(
storage: Storage | Path | str,
create_icechunk_storage: bool = True,
backend: Literal["threaded", "simple"] = "threaded",
)
Bases: GEEMosaicAccessor
Accessor for AlphaEarth Embeddings data.
Attributes:
-
extent(GeoBox) –The extent of the datacube represented by a GeoBox.
-
chunk_size(int) –The chunk size of the datacube.
-
channels(list) –The channels of the datacube.
-
storage(Storage) –The icechunk storage.
-
repo(Repository) –The icechunk repository.
-
title(str) –The title of the datacube.
-
stopuhr(StopUhr) –The benchmarking timer from the stopuhr library.
-
zgeobox(GeoBox) –The geobox of the zarr array. Should be equal to the extent geobox.
-
created(bool) –True if the datacube already exists in the storage.
Initialize base class for remote accessors.
Warning
In a multiprocessing environment, it is strongly recommended to set create_icechunk_storage=False.
Parameters:
-
storage(Storage) –The icechunk storage of the datacube.
-
create_icechunk_storage(bool, default:True) –Whether to create an icechunk repository at the provided storage if none exists. This should be disabled in a multiprocessing environment. Defaults to True.
-
backend(Literal['threaded', 'simple'], default:'threaded') –The backend to use for downloading data. Currently, only "threaded" is supported. Defaults to "threaded".
Raises:
-
ValueError–If the storage is not an icechunk.Storage.
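A minimal sketch of constructing the accessor on a local directory. It assumes icechunk's `local_filesystem_storage` helper and already configured Google Earth Engine credentials, neither of which is stated on this page. The names `accessor_kwargs` and `open_alphaearth` and the `multiprocess` flag are hypothetical; the flag only applies the parameter description of `create_icechunk_storage`, which says repository creation should be disabled in a multiprocessing environment.

```python
from pathlib import Path


def accessor_kwargs(multiprocess: bool) -> dict:
    """Constructor keyword arguments for single- or multi-process use."""
    return {
        # Worker processes should not race to create the repository.
        "create_icechunk_storage": not multiprocess,
        # Per the parameter description, only "threaded" is supported for now.
        "backend": "threaded",
    }


def open_alphaearth(store_dir: Path, multiprocess: bool = False):
    """Hypothetical helper: build an AlphaEarthEmbeddings accessor on local disk."""
    # Imported lazily so the sketch can be imported without the packages installed.
    import icechunk
    from smart_geocubes.datasets.alphaearth import AlphaEarthEmbeddings

    storage = icechunk.local_filesystem_storage(str(store_dir))
    return AlphaEarthEmbeddings(storage, **accessor_kwargs(multiprocess))
```

In a worker process one would call `open_alphaearth(path, multiprocess=True)` after the repository has been created once by the parent process.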
Methods:
-
adjacent_patches–Get the adjacent patches for the given geobox.
-
assert_created–Assert that the datacube exists in the storage.
-
assert_temporal_cube–Assert that the datacube has a temporal dimension.
-
create–Create an empty datacube and write it to the store.
-
current_state–Get info about currently stored tiles.
-
download_patch–Download the data for the given patch.
-
load–Load the data for the given geobox.
-
load_like–Load the data for the given reference dataset or dataarray.
-
loaded_patches–Get the ids of already (down-)loaded patches.
-
log_benchmark_summary–Log the benchmark summary.
-
open_xarray–Open the xarray datacube in read-only mode.
-
open_zarr–Open the zarr datacube in read-only mode.
-
post_create–Post create actions. Can be overwritten by the dataset accessor.
-
post_init–Post init actions. Can be overwritten by the dataset accessor.
-
procedural_download–Download tiles procedurally.
-
visualize_state–Visualize the extent of the datacube, i.e. the data already downloaded and filled.
Source code in src/smart_geocubes/core/accessor.py
created
property
Check if the datacube already exists in the storage.
Returns:
-
bool(bool) –True if the datacube already exists in the storage.
is_temporal
property
Check if the datacube has a temporal dimension.
Returns:
-
bool(bool) –True if the datacube has a temporal dimension.
adjacent_patches
Get the adjacent patches for the given geobox.
Must be implemented by the Accessor.
Parameters:
-
roi(Geometry | GeoBox | GeoDataFrame) –The reference geometry, geobox, or geodataframe.
-
toi(TOI) –The time of interest to download.
Returns:
Raises:
-
ValueError–If the ROI type is invalid.
-
ValueError–If the datacube is not temporal, but a time of interest is provided.
Source code in src/smart_geocubes/accessors/gee.py
assert_created
assert_temporal_cube
Assert that the datacube has a temporal dimension.
Raises:
-
ValueError–If the datacube has no temporal dimension.
Source code in src/smart_geocubes/core/accessor.py
create
Create an empty datacube and write it to the store.
Parameters:
-
overwrite(bool, default:False) –Allow overwriting an existing datacube. Has no effect if exists_ok is True. Defaults to False.
-
exists_ok(bool, default:False) –Do not raise an error if the datacube already exists. Defaults to False.
Raises:
-
FileExistsError–If a datacube already exists at the location and exists_ok is False.
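The interaction of the two flags can be sketched as a small decision function. This is only one reading of the parameter descriptions above, not the library's implementation: `plan_create` and its return labels are hypothetical, and the assumption that `overwrite=True` avoids the `FileExistsError` is not confirmed by this page.

```python
def plan_create(exists: bool, overwrite: bool = False, exists_ok: bool = False) -> str:
    """Return the action implied by the documented flags (illustrative only)."""
    if not exists:
        return "create"
    if exists_ok:
        # exists_ok takes precedence, so overwrite has no effect here.
        return "keep"
    if overwrite:
        # Assumption: overwriting replaces the existing datacube instead of raising.
        return "overwrite"
    raise FileExistsError("A datacube already exists at this location.")
```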
Source code in src/smart_geocubes/core/accessor.py
current_state
Get info about currently stored tiles.
Returns:
-
GeoDataFrame | None–gpd.GeoDataFrame: Tiles from odc.geo.GeoboxTiles. None if datacube is empty.
Source code in src/smart_geocubes/accessors/gee.py
download_patch
Download the data for the given patch.
Must be implemented by the Accessor.
Parameters:
-
idx(PatchIndex[Item]) –The reference patch to download the data for.
Returns:
-
Dataset–xr.Dataset: The downloaded patch data.
Source code in src/smart_geocubes/accessors/gee.py
load
load(
aoi: Geometry | GeoBox,
toi: TOI = None,
persist: bool = True,
create: bool = False,
) -> xr.Dataset
Load the data for the given geobox.
Parameters:
-
aoi(Geometry | GeoBox) –The reference geometry to load the data for. If a GeoBox is provided, its extent is used.
-
toi(TOI, default:None) –The temporal slice to load. Defaults to None.
-
persist(bool, default:True) –Whether to persist the data in memory. If False, a Dask-backed Dataset is returned. Defaults to True.
-
create(bool, default:False) –Create a new zarr array at the defined storage if it does not exist. This is not recommended, because it can have side effects in a multi-process environment. Defaults to False.
Returns:
-
Dataset–xr.Dataset: The loaded dataset with the same resolution and extent as the geobox.
Source code in src/smart_geocubes/core/accessor.py
load_like
Load the data for the given reference dataset or dataarray.
Parameters:
-
ref(Dataset | DataArray) –The reference dataarray or dataset to load the data for.
-
**kwargs(Unpack[LoadParams], default:{}) –The load parameters (buffer, persist, create, concurrency_mode).
Other Parameters:
-
buffer(int) –The buffer around the projected geobox in pixels. Defaults to 0.
-
persist(bool) –Whether to persist the data in memory. If False, a Dask-backed Dataset is returned. Defaults to True.
-
create(bool) –Create a new zarr array at the defined storage if it does not exist. This is not recommended, because it can have side effects in a multi-process environment. Defaults to False.
Returns:
-
Dataset–xr.Dataset: The loaded dataset with the same resolution and extent as the geobox.
Source code in src/smart_geocubes/core/accessor.py
loaded_patches
Get the ids of already (down-)loaded patches.
Returns:
Source code in src/smart_geocubes/core/accessor.py
log_benchmark_summary
open_xarray
open_zarr
Open the zarr datacube in read-only mode.
Returns:
-
Group–zarr.Group: The zarr datacube.
post_create
post_init
procedural_download
Download tiles procedurally.
Warning
This method is intended for single-process use. It can, in theory, be used in a multi-process environment, but multiple processes may then try to write concurrently, which results in a conflict. In such cases, the download is retried until it succeeds or the maximum number of tries is reached.
Parameters:
-
aoi(Geometry | GeoBox) –The geometry of the AOI to download. If a GeoBox is provided, its extent is used.
-
toi(TOI) –The time of interest to download.
Raises:
-
ValueError–If no adjacent tiles are found. This can happen if the geobox is out of the dataset bounds.
-
ValueError–If not all downloads were successful.
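The retry behaviour described in the warning can be illustrated with a generic retry loop. This is a sketch of the general technique, not the library's code: `WriteConflict`, `retry_on_conflict`, `max_tries`, and `delay` are hypothetical names, and the real conflict exception and retry limit are not given on this page.

```python
import time


class WriteConflict(Exception):
    """Hypothetical stand-in for a conflict raised by a concurrent write."""


def retry_on_conflict(write, max_tries: int = 3, delay: float = 0.0):
    """Call `write` until it succeeds or `max_tries` attempts have been made."""
    for attempt in range(1, max_tries + 1):
        try:
            return write()
        except WriteConflict:
            if attempt == max_tries:
                # Give up and surface the last conflict to the caller.
                raise
            time.sleep(delay)


attempts = []


def flaky_write():
    """Fails twice with a conflict, then succeeds on the third attempt."""
    attempts.append(1)
    if len(attempts) < 3:
        raise WriteConflict("another process committed first")
    return "written"


print(retry_on_conflict(flaky_write, max_tries=5))  # written
print(len(attempts))                                # 3
```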
Source code in src/smart_geocubes/core/accessor.py
visualize_state
Visualize the extent of the datacube, i.e. the data already downloaded and filled.
Parameters:
-
ax(Axes | None, default:None) –The axes to draw on. If None, a new figure and axes are created.
Returns:
-
Figure | Axes–plt.Figure | plt.Axes: The figure with the visualization if no axes were provided, else the axes.
Raises:
-
ValueError–If the datacube is empty.