
Improve performance when subsetting, including shapefile subsetting #316

Open
@jamesfwood

Description

Investigate possible performance improvements to the shapefile subsetting capability. The current implementation is brute force, and there may be a more elegant solution. Because this is L2 data, many of the existing tools for doing this, such as rioxarray, may not work.

The following library may be useful in achieving this: https://github.com/xarray-contrib/xoak
Also, see the following discussion on this exact topic: corteva/rioxarray#202

==============================================

This came from Chris Durbin on the Harmony team, from when they built a tool to subset using AI tools. They found that l2ss-py was the slowest piece of their tool and could use speed improvements.

Here are some comments from Chris regarding this:

The requests that took minutes when we provided a shapefile were l2ss-py. We tried limiting the number of points in the shapefile to a small number, but were still seeing minutes to complete. We saw maskfill requests would only take a couple of seconds with a shapefile, so we were wondering if it would be possible to leverage it to perform the subsetting. I asked Owen briefly and he said maskfill only works on gridded data, which probably makes it much faster to work with a shapefile for subsetting. In any case, it's probably worth a ticket to see if anything can be done to speed up the shapefile subsetting in l2ss-py. I know when I worked on CMR we used some tricks with bounding boxes, the minimum bounding rectangle (MBR) and the largest interior rectangle (LR), to quickly find things that must be in the region or can't be in the region, and then only performed more expensive intersection searches for points that were in the MBR but not in the LR.
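To make the MBR/LR idea concrete, here is a minimal pure-Python sketch. It is not l2ss-py or CMR code: the names `bounding_rect`, `in_rect`, `point_in_polygon`, and `subset_mask` are hypothetical, the polygon is assumed to be a simple ring of (lon, lat) vertices, and the interior rectangle is assumed to be supplied by the caller (it only needs to lie fully inside the polygon; it does not have to be the largest one). A real implementation would presumably vectorize the rectangle tests over the L2 lat/lon arrays.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]               # (lon, lat)
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_rect(polygon: Sequence[Point]) -> Rect:
    """Minimum bounding rectangle (MBR) of the polygon vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_rect(p: Point, r: Rect) -> bool:
    """Cheap axis-aligned containment test."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: the expensive exact test."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_mask(
    points: Sequence[Point],
    polygon: Sequence[Point],
    interior_rect: Rect,  # assumption: caller guarantees it lies inside polygon
) -> Tuple[List[bool], int]:
    """Return (mask, number of exact tests actually performed)."""
    mbr = bounding_rect(polygon)
    mask: List[bool] = []
    exact_tests = 0
    for p in points:
        if not in_rect(p, mbr):
            mask.append(False)           # outside the MBR: cannot be inside
        elif in_rect(p, interior_rect):
            mask.append(True)            # inside the interior rect: must be inside
        else:
            exact_tests += 1             # ambiguous band: run the exact test
            mask.append(point_in_polygon(p, polygon))
    return mask, exact_tests


# Illustrative data only: an octagon and a rectangle known to sit inside it.
octagon = [(0, 2), (2, 0), (4, 0), (6, 2), (6, 4), (4, 6), (2, 6), (0, 4)]
inner = (1.5, 1.5, 4.5, 4.5)
pixels = [(10, 10), (3, 3), (0.2, 0.2), (5, 3)]
mask, exact_tests = subset_mask(pixels, octagon, inner)
```

In this toy case only two of the four points fall between the two rectangles and need the exact test. How much this helps in practice depends on how much of the swath lies outside the MBR or inside the interior rectangle, which would need to be measured on real granules.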

