Using 'open virtual dataset' capability to work with TEMPO Level 3 data¶
Summary¶
In this tutorial, we will use the earthaccess.open_virtual_mfdataset()
function to open a week's worth of granules from the Nitrogen Dioxide (NO2) Level-3 data collection of the TEMPO air quality mission (link). We will then calculate means for a subset of the data and visualize the results.
Note that this same approach can be used for a date range of any length, within the mission's duration. Running this notebook for a year's worth of TEMPO Level-3 data took approximately 15 minutes.
Prerequisites¶
AWS US-West-2 Environment: This tutorial has been designed to run in an AWS cloud compute instance in AWS region us-west-2. However, if you want to run it from your laptop or workstation, everything should work just fine but without the speed benefits of in-cloud access.
Earthdata Account: A (free!) Earthdata Login account is required to access data from the NASA Earthdata system. Before requesting TEMPO data, we first need to set up our Earthdata Login authentication, as described in the Earthdata Cookbook's earthaccess tutorial (link).
Packages:
cartopy
dask
earthaccess
version 0.14.0 or greatermatplotlib
numpy
xarray
Setup¶
import cartopy.crs as ccrs
import earthaccess
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from matplotlib import rcParams
%config InlineBackend.figure_format = 'jpeg'
rcParams["figure.dpi"] = (
80 # Reduce figure resolution to keep the saved size of this notebook low.
)
Login using the Earthdata Login¶
auth = earthaccess.login()
if not auth.authenticated:
# Ask for credentials and persist them in a .netrc file.
auth.login(strategy="interactive", persist=True)
print(earthaccess.__version__)
0.14.0
Search for data granules¶
We search for TEMPO Nitrogen Dioxide (NO2) data for a week-long period (note: times are in UTC), between January 11th and 18th, 2024.
results = earthaccess.search_data(
short_name="TEMPO_NO2_L3",
version="V03",
temporal=("2024-01-11 12:00", "2024-01-18 12:00"),
)
print(f"Number of results: {len(results)}")
Number of results: 81
Opening Virtual Multifile Datasets¶
First we set the argument options to be used by earthaccess.open_virtual_mfdataset
.
load
argument considerations:
load=True
works. Withinearthaccess.open_virtual_mfdataset
, a temporary virtual reference file (a "virtual dataset") is created and then immediately loaded with kerchunk. This is because the function assumes the user is making this request for the first time and the combined manifest file needs to be generated first. In the future, however,earthaccess.open_virtual_mfdataset
may provide a way to save the combined manifest file, at which point you could then avoid repeating these steps, and proceed directly to loading with kerchunk/virtualizarr.load=False
results inKeyError: "no index found for coordinate 'longitude'"
because it createsManifestArray
s without indexes (see the earthaccess documentation here (link))
open_options = {
"access": "direct",
"load": True,
"concat_dim": "time",
"coords": "minimal",
"compat": "override",
"combine_attrs": "override",
"parallel": True,
}
Because TEMPO data are processed and archived in a netCDF4 format using a group hierarchy, we open each group ā i.e., 'root', 'product', and 'geolocation' ā and then afterwards merge them together.
%%time
result_root = earthaccess.open_virtual_mfdataset(granules=results, **open_options)
result_product = earthaccess.open_virtual_mfdataset(
granules=results, group="product", **open_options
)
result_geolocation = earthaccess.open_virtual_mfdataset(
granules=results, group="geolocation", **open_options
)
/home/docs/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/earthaccess/dmrpp_zarr.py:127: FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars=None. This is likely to lead to different results when multiple datasets have matching variables with overlapping values. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set data_vars explicitly. vds = xr.combine_nested(vdatasets, **xr_combine_nested_kwargs)
CPU times: user 1.27 s, sys: 150 ms, total: 1.42 s Wall time: 10.7 s
--------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) Cell In[5], line 1 ----> 1 get_ipython().run_cell_magic('time', '', 'result_root = earthaccess.open_virtual_mfdataset(granules=results, **open_options)\nresult_product = earthaccess.open_virtual_mfdataset(\n granules=results, group="product", **open_options\n)\nresult_geolocation = earthaccess.open_virtual_mfdataset(\n granules=results, group="geolocation", **open_options\n)\n') File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2565, in InteractiveShell.run_cell_magic(self, magic_name, line, cell) 2563 with self.builtin_trap: 2564 args = (magic_arg_s, cell) -> 2565 result = fn(*args, **kwargs) 2567 # The code below prevents the output from being displayed 2568 # when using magics with decorator @output_can_be_silenced 2569 # when the last Python token in the expression is a ';'. 2570 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1470, in ExecutionMagics.time(self, line, cell, local_ns) 1468 if interrupt_occured: 1469 if exit_on_interrupt and captured_exception: -> 1470 raise captured_exception 1471 return 1472 return out File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1434, in ExecutionMagics.time(self, line, cell, local_ns) 1432 st = clock2() 1433 try: -> 1434 exec(code, glob, local_ns) 1435 out = None 1436 # multi-line %%time case File <timed exec>:1 File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/earthaccess/dmrpp_zarr.py:127, in open_virtual_mfdataset(granules, group, access, load, preprocess, parallel, **xr_combine_nested_kwargs) 125 vds = vdatasets[0] 126 else: --> 127 vds = xr.combine_nested(vdatasets, **xr_combine_nested_kwargs) 128 if load: 129 refs = vds.virtualize.to_kerchunk(filepath=None, format="dict") File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/combine.py:637, in combine_nested(datasets, concat_dim, compat, data_vars, coords, fill_value, join, combine_attrs) 634 concat_dim = [concat_dim] 636 # The IDs argument tells _nested_combine that datasets aren't yet sorted --> 637 return _nested_combine( 638 datasets, 639 concat_dims=concat_dim, 640 compat=compat, 641 data_vars=data_vars, 642 coords=coords, 643 ids=False, 644 fill_value=fill_value, 645 join=join, 646 combine_attrs=combine_attrs, 647 ) File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/combine.py:389, in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join, combine_attrs) 386 _check_shape_tile_ids(combined_ids) 388 # Apply series of concatenate or merge operations along each dimension --> 389 combined = _combine_nd( 390 combined_ids, 391 concat_dims=concat_dims, 392 compat=compat, 393 data_vars=data_vars, 394 coords=coords, 395 fill_value=fill_value, 396 join=join, 397 combine_attrs=combine_attrs, 398 ) 399 return combined File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/combine.py:254, in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join, combine_attrs) 250 # Each iteration of this loop reduces the length of the tile_ids tuples 251 # by one. It always combines along the first dimension, removing the first 252 # element of the tuple 253 for concat_dim in concat_dims: --> 254 combined_ids = _combine_all_along_first_dim( 255 combined_ids, 256 dim=concat_dim, 257 data_vars=data_vars, 258 coords=coords, 259 compat=compat, 260 fill_value=fill_value, 261 join=join, 262 combine_attrs=combine_attrs, 263 ) 264 (combined_ds,) = combined_ids.values() 265 return combined_ds File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/combine.py:286, in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join, combine_attrs) 284 combined_ids = dict(sorted(group)) 285 datasets = combined_ids.values() --> 286 new_combined_ids[new_id] = _combine_1d( 287 datasets, 288 concat_dim=dim, 289 compat=compat, 290 data_vars=data_vars, 291 coords=coords, 292 fill_value=fill_value, 293 join=join, 294 combine_attrs=combine_attrs, 295 ) 296 return new_combined_ids File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/combine.py:316, in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join, combine_attrs) 314 if concat_dim is not None: 315 try: --> 316 combined = concat( 317 datasets, 318 dim=concat_dim, 319 data_vars=data_vars, 320 coords=coords, 321 compat=compat, 322 fill_value=fill_value, 323 join=join, 324 combine_attrs=combine_attrs, 325 ) 326 except ValueError as err: 327 if "encountered unexpected variable" in str(err): File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/concat.py:295, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs, create_index_for_new_dim) 282 return _dataarray_concat( 283 objs, 284 dim=dim, (...) 292 create_index_for_new_dim=create_index_for_new_dim, 293 ) 294 elif isinstance(first_obj, Dataset): --> 295 return _dataset_concat( 296 objs, 297 dim=dim, 298 data_vars=data_vars, 299 coords=coords, 300 compat=compat, 301 positions=positions, 302 fill_value=fill_value, 303 join=join, 304 combine_attrs=combine_attrs, 305 create_index_for_new_dim=create_index_for_new_dim, 306 ) 307 else: 308 raise TypeError( 309 "can only concatenate xarray Dataset and DataArray " 310 f"objects, got {type(first_obj)}" 311 ) File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/concat.py:769, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs, create_index_for_new_dim) 767 result_vars[k] = v 768 else: --> 769 combined_var = concat_vars( 770 vars, dim_name, positions, combine_attrs=combine_attrs 771 ) 772 # reindex if variable is not present in all datasets 773 if len(variable_index) < concat_index_size: File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/core/variable.py:3102, in concat(variables, dim, positions, shortcut, combine_attrs) 3054 def concat( 3055 variables, 3056 dim="concat_dim", (...) 3059 combine_attrs="override", 3060 ): 3061 """Concatenate variables along a new or existing dimension. 3062 3063 Parameters (...) 3100 along the given dimension. 3101 """ -> 3102 variables = list(variables) 3103 if all(isinstance(v, IndexVariable) for v in variables): 3104 return IndexVariable.concat(variables, dim, positions, shortcut, combine_attrs) File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/structure/concat.py:691, in _dataset_concat.<locals>.ensure_common_dims(vars, concat_dim_lengths) 689 if var.dims != common_dims: 690 common_shape = tuple(dims_sizes.get(d, dim_len) for d in common_dims) --> 691 var = var.set_dims(common_dims, common_shape) 692 yield var File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/util/deprecation_helpers.py:144, in deprecate_dims.<locals>.wrapper(*args, **kwargs) 136 emit_user_level_warning( 137 f"The `{old_name}` argument has been renamed to `dim`, and will be removed " 138 "in the future. This renaming is taking place throughout xarray over the " (...) 141 PendingDeprecationWarning, 142 ) 143 kwargs["dim"] = kwargs.pop(old_name) --> 144 return func(*args, **kwargs) File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/xarray/core/variable.py:1476, in Variable.set_dims(self, dim, shape) 1469 elif shape is None or all( 1470 s == 1 for s, e in zip(shape, dim, strict=True) if e not in self_dims 1471 ): 1472 # "Trivial" broadcasting, i.e. simply inserting a new dimension 1473 # This is typically easier for duck arrays to implement 1474 # than the full "broadcast_to" semantics 1475 indexer = (None,) * (len(expanded_dims) - self.ndim) + (...,) -> 1476 expanded_data = self.data[indexer] 1477 else: # elif shape is not None: 1478 dims_map = dict(zip(dim, shape, strict=True)) File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/virtualizarr/manifests/array.py:226, in ManifestArray.__getitem__(self, key) 224 return self 225 else: --> 226 raise NotImplementedError(f"Doesn't support slicing with {indexer}") NotImplementedError: Doesn't support slicing with (None, slice(None, None, None))
Merge root groups with subgroups.
result_merged = xr.merge([result_root, result_product, result_geolocation])
result_merged
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[6], line 1 ----> 1 result_merged = xr.merge([result_root, result_product, result_geolocation]) 2 result_merged NameError: name 'result_root' is not defined
Temporal Mean - a map showing an annual average¶
temporal_mean_ds = (
result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)})
.where(result_merged["main_data_quality_flag"] == 0)
.mean(dim=("time"))
)
temporal_mean_ds
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[7], line 2 1 temporal_mean_ds = ( ----> 2 result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)}) 3 .where(result_merged["main_data_quality_flag"] == 0) 4 .mean(dim=("time")) 5 ) 6 temporal_mean_ds NameError: name 'result_merged' is not defined
%%time
mean_vertical_column_trop = temporal_mean_ds["vertical_column_troposphere"].compute()
CPU times: user 6 μs, sys: 1 μs, total: 7 μs Wall time: 10 μs
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[8], line 1 ----> 1 get_ipython().run_cell_magic('time', '', 'mean_vertical_column_trop = temporal_mean_ds["vertical_column_troposphere"].compute()\n') File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2565, in InteractiveShell.run_cell_magic(self, magic_name, line, cell) 2563 with self.builtin_trap: 2564 args = (magic_arg_s, cell) -> 2565 result = fn(*args, **kwargs) 2567 # The code below prevents the output from being displayed 2568 # when using magics with decorator @output_can_be_silenced 2569 # when the last Python token in the expression is a ';'. 2570 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1470, in ExecutionMagics.time(self, line, cell, local_ns) 1468 if interrupt_occured: 1469 if exit_on_interrupt and captured_exception: -> 1470 raise captured_exception 1471 return 1472 return out File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1434, in ExecutionMagics.time(self, line, cell, local_ns) 1432 st = clock2() 1433 try: -> 1434 exec(code, glob, local_ns) 1435 out = None 1436 # multi-line %%time case File <timed exec>:1 NameError: name 'temporal_mean_ds' is not defined
fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
mean_vertical_column_trop.squeeze().plot.contourf(ax=ax)
ax.coastlines()
ax.gridlines(
draw_labels=True,
dms=True,
x_inline=False,
y_inline=False,
)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[9], line 3 1 fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()}) ----> 3 mean_vertical_column_trop.squeeze().plot.contourf(ax=ax) 4 ax.coastlines() 5 ax.gridlines( 6 draw_labels=True, 7 dms=True, 8 x_inline=False, 9 y_inline=False, 10 ) NameError: name 'mean_vertical_column_trop' is not defined
Spatial mean - a time series of area averages¶
spatial_mean_ds = (
result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)})
.where(result_merged["main_data_quality_flag"] == 0)
.mean(dim=("longitude", "latitude"))
)
spatial_mean_ds
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[10], line 2 1 spatial_mean_ds = ( ----> 2 result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)}) 3 .where(result_merged["main_data_quality_flag"] == 0) 4 .mean(dim=("longitude", "latitude")) 5 ) 6 spatial_mean_ds NameError: name 'result_merged' is not defined
%%time
spatial_mean_vertical_column_trop = spatial_mean_ds[
"vertical_column_troposphere"
].compute()
CPU times: user 5 μs, sys: 1 μs, total: 6 μs Wall time: 9.78 μs
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[11], line 1 ----> 1 get_ipython().run_cell_magic('time', '', 'spatial_mean_vertical_column_trop = spatial_mean_ds[\n "vertical_column_troposphere"\n].compute()\n') File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2565, in InteractiveShell.run_cell_magic(self, magic_name, line, cell) 2563 with self.builtin_trap: 2564 args = (magic_arg_s, cell) -> 2565 result = fn(*args, **kwargs) 2567 # The code below prevents the output from being displayed 2568 # when using magics with decorator @output_can_be_silenced 2569 # when the last Python token in the expression is a ';'. 2570 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1470, in ExecutionMagics.time(self, line, cell, local_ns) 1468 if interrupt_occured: 1469 if exit_on_interrupt and captured_exception: -> 1470 raise captured_exception 1471 return 1472 return out File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1434, in ExecutionMagics.time(self, line, cell, local_ns) 1432 st = clock2() 1433 try: -> 1434 exec(code, glob, local_ns) 1435 out = None 1436 # multi-line %%time case File <timed exec>:1 NameError: name 'spatial_mean_ds' is not defined
spatial_mean_vertical_column_trop.plot()
plt.show()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[12], line 1 ----> 1 spatial_mean_vertical_column_trop.plot() 2 plt.show() NameError: name 'spatial_mean_vertical_column_trop' is not defined
Single scan subset¶
subset_ds = result_merged.sel(
{
"longitude": slice(-78, -74),
"latitude": slice(35, 39),
"time": slice(
np.datetime64("2024-01-11T13:00:00"), np.datetime64("2024-01-11T14:00:00")
),
}
).where(result_merged["main_data_quality_flag"] == 0)
subset_ds
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[13], line 1 ----> 1 subset_ds = result_merged.sel( 2 { 3 "longitude": slice(-78, -74), 4 "latitude": slice(35, 39), 5 "time": slice( 6 np.datetime64("2024-01-11T13:00:00"), np.datetime64("2024-01-11T14:00:00") 7 ), 8 } 9 ).where(result_merged["main_data_quality_flag"] == 0) 10 subset_ds NameError: name 'result_merged' is not defined
%%time
subset_vertical_column_trop = subset_ds["vertical_column_troposphere"].compute()
CPU times: user 6 μs, sys: 1 μs, total: 7 μs Wall time: 11 μs
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[14], line 1 ----> 1 get_ipython().run_cell_magic('time', '', 'subset_vertical_column_trop = subset_ds["vertical_column_troposphere"].compute()\n') File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2565, in InteractiveShell.run_cell_magic(self, magic_name, line, cell) 2563 with self.builtin_trap: 2564 args = (magic_arg_s, cell) -> 2565 result = fn(*args, **kwargs) 2567 # The code below prevents the output from being displayed 2568 # when using magics with decorator @output_can_be_silenced 2569 # when the last Python token in the expression is a ';'. 2570 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1470, in ExecutionMagics.time(self, line, cell, local_ns) 1468 if interrupt_occured: 1469 if exit_on_interrupt and captured_exception: -> 1470 raise captured_exception 1471 return 1472 return out File ~/checkouts/readthedocs.org/user_builds/earthaccess/envs/1047/lib/python3.11/site-packages/IPython/core/magics/execution.py:1434, in ExecutionMagics.time(self, line, cell, local_ns) 1432 st = clock2() 1433 try: -> 1434 exec(code, glob, local_ns) 1435 out = None 1436 # multi-line %%time case File <timed exec>:1 NameError: name 'subset_ds' is not defined
fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
subset_vertical_column_trop.squeeze().plot.contourf(ax=ax)
ax.coastlines()
ax.gridlines(
draw_labels=True,
dms=True,
x_inline=False,
y_inline=False,
)
plt.show()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[15], line 3 1 fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()}) ----> 3 subset_vertical_column_trop.squeeze().plot.contourf(ax=ax) 4 ax.coastlines() 5 ax.gridlines( 6 draw_labels=True, 7 dms=True, 8 x_inline=False, 9 y_inline=False, 10 ) NameError: name 'subset_vertical_column_trop' is not defined