Description
This is a spin-off issue from the conversation in #180 so we don't lose track of it and also don't distract from the discussion in that PR.
Original suggestion from @knaaptime:
Categoricals are important, for example, to interpolate rasters (e.g., land use), and having the functionality out in the wild would help it get tested.
It would be useful to see whether this can provide a boost to the existing functionality we have for vectorizing rasters.
And response from @darribas:
It’s slightly different. We could think of a way of vectorizing pixels and doing a spatial dissolve with dask. I don’t know if that’d be faster (it'd be at least parallel/out-of-core), but it’s definitely different code (though similar philosophy), so I'd be tempted to leave that for a different PR, perhaps create an issue to remember this option in case we have bandwidth (or need) in the future to explore it.
In the case suggested above, a strategy to use Dask would be:
- Read in the raster with `rioxarray`
- Extract pixel centroids with `to_pandas` (there might be a way to go directly into a `dask.DataFrame`)
- Turn into a `dask_geopandas.GeoDataFrame`
- Build pixels as vectors with `buffer(xxx, cap_style=3)`
- Dissolve vector pixels by value
Once we enter a Dask data structure, all computations are lazy and only run, in parallel, when `.compute()` is called, which is what provides the scalability. But I'm not sure if that will make it faster than `rasterio`'s vectorisation, which I imagine relies on GEOS? It might, because the dissolve should be a fast one, since all polygons to dissolve are four-point squares. One worth a shot for sure.