Summary
An inappropriate fill value is set when l2ss-py creates an empty dataset copy. Subsequent processing then fails because the dataset is not truly empty: a data variable holds a "valid" value instead of a true fill value.
Description of the problem
When there are no data points that match the requested spatiotemporal conditions, l2ss-py creates an empty dataset copy here. @ank1m and I discovered an edge case where a valid value is being placed in the new, copied variable, instead of the expected null or fill value. This occurred for the following "ground_pixel_quality_flag" variable, which notably has an integer type (int32) and has no declared '_FillValue' attribute:
Here is a screenshot showing the variable, in a TEMPO collection:
Since this variable, "support_data/ground_pixel_quality_flag", doesn't have a '_FillValue', l2ss-py tries to create an empty array using np.nan instead. But because this variable is of type 'int32', it can't use np.nan!
Instead, the code raises a
RuntimeWarning: invalid value encountered in cast multiarray.copyto(a, fill_value, casting='unsafe')
and then falls back to using 0 instead of np.nan.
However, 0 is a valid value for this variable (see the valid_min and valid_max attributes in the above screenshot), so subsequent operations see a valid array, rather than an empty, or all-fill-value, array.
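The behaviour described above can be reproduced outside l2ss-py with NumPy alone. This is a minimal sketch, not the l2ss-py code itself: it assumes the empty array is built with something equivalent to np.full (which is where the copyto call in the warning comes from), and the shape (1,) is chosen only for illustration. Whether the warning is emitted, and which integer ends up in the array, may depend on the NumPy version and platform.

```python
import warnings

import numpy as np

# Ask for an int32 array filled with NaN, as happens when no _FillValue
# is declared and np.nan is used as the stand-in fill value.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    arr = np.full((1,), np.nan, dtype=np.int32)

# NaN has no int32 representation, so the array holds some ordinary integer.
# The issue reports 0; the exact value is not guaranteed.
print("dtype:", arr.dtype, "value:", arr[0])

# Recent NumPy versions report the lossy cast as a RuntimeWarning.
for w in caught:
    print(w.category.__name__, "-", w.message)
```

Whatever integer is stored, it is indistinguishable from real data unless it happens to match a declared fill value, which this variable does not have.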
Impact
This causes a failure during the service chain call below, when the "Stitchee" service tries to determine whether the files coming from l2ss-py are empty. Stitchee considers the file "not empty" here because the variable's single value is neither a fill value nor null.
Steps to reproduce
The following request currently fails: https://harmony.uat.earthdata.nasa.gov/C1262899916-LARC_CLOUD/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?forceAsync=true&granuleId=G1269044803-LARC_CLOUD%2CG1269044708-LARC_CLOUD%2CG1269044681-LARC_CLOUD%2CG1269044688-LARC_CLOUD%2CG1269044514-LARC_CLOUD%2CG1269044741-LARC_CLOUD%2CG1269044710-LARC_CLOUD%2CG1269044439-LARC_CLOUD%2CG1269044715-LARC_CLOUD%2CG1269044815-LARC_CLOUD%2CG1269044726-LARC_CLOUD%2CG1269044787-LARC_CLOUD%2CG1269044827-LARC_CLOUD%2CG1269044658-LARC_CLOUD%2CG1269044679-LARC_CLOUD%2CG1269044727-LARC_CLOUD&subset=lat(32.56485%3A42.82943)&subset=lon(-135.7248%3A-52.76692)&subset=time(%222024-08-02T00%3A00%3A00.000Z%22%3A%222024-08-02T10%3A39%3A37.000Z%22)&concatenate=true&skipPreview=true
Desired change
An appropriate fill or null value for each variable's dtype is used when creating an "empty" dataset.
I think that means the dataset copy in l2ss should either:
- take into account the dtype, and use the appropriate default _FillValue for that dtype to begin with (such as from netCDF4.default_fillvals), or
- catch the invalid type warning, and then determine an appropriate _FillValue
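The first option could look roughly like the sketch below. It is only an illustration under stated assumptions: pick_fill_value is a hypothetical helper, not an existing l2ss-py function. It keeps NaN for floating-point variables, and for integer variables looks up netCDF4.default_fillvals using keys of the form "i4" or "u2". If netCDF4 is not importable, it falls back to the dtype's extreme value, which is an arbitrary choice made here so the example still runs.

```python
import numpy as np

try:
    # netCDF4 publishes per-dtype default fill values keyed like "i4", "f8".
    from netCDF4 import default_fillvals
except ImportError:
    default_fillvals = None  # assumption: use an arbitrary sentinel instead


def pick_fill_value(dtype, declared_fill=None):
    """Return a fill value representable in `dtype` (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if declared_fill is not None:
        # A declared _FillValue always wins.
        return dtype.type(declared_fill)
    if dtype.kind == "f":
        # Floats can hold NaN, so the existing behaviour is fine for them.
        return dtype.type(np.nan)
    if dtype.kind not in "iu":
        raise TypeError(f"no fill value rule for dtype {dtype}")
    key = f"{dtype.kind}{dtype.itemsize}"  # e.g. "i4" for int32
    if default_fillvals is not None and key in default_fillvals:
        return dtype.type(default_fillvals[key])
    # Fallback sentinel when netCDF4 is unavailable (illustrative only).
    info = np.iinfo(dtype)
    return dtype.type(info.min if dtype.kind == "i" else info.max)


fill = pick_fill_value(np.int32)
empty = np.full((1,), fill, dtype=np.int32)
print(fill, empty)
```

A real fix would probably also write the chosen value to the variable's _FillValue attribute, so that downstream services such as Stitchee can recognise the array as all-fill, and might check that the value lies outside valid_min and valid_max.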