Skip to content

Improve processing when nodes have large numbers of children #2080

Open
@cbevan-figma

Description

Problem

I have an SVG with ~25k children under the same parent, and it takes ~9 seconds to optimize. Unfortunately these sorts of SVGs are not entirely unusual for my use case :( Notably though, with a small change I reduced the time to ~3 seconds.

Suggested solution

You're probably aware of the two operations that turn the overall traversal into O(N2): detachNodeFromParent() (which recreates the array while removing just one element) and the if (parentNode.children.includes(node)) line in xast.js. I addressed these in a local change by deferring the node removals using a set. The changes I made were hardly revolutionary, but I'm posting them below for clarity. The downside of course is that they'd technically introduce an incompatibility with plugins that remove siblings without calling detachNodeFromParent.

Maybe something like this is worth including in a future major change though, one where the release could announce the small difference in behavior?

export type XastElement = {
  // existing stuff
  pendingRemovals?: Set<XastChild>;
};
// similarly for XastRoot
// visit() in xast.js
if (node.type === 'element') {
  if (!parentNode.pendingRemovals?.has(node)) {
    for (const child of node.children) {
      visit(child, visitor, node);
    }
    // apply pending child removals
    if (node.pendingRemovals) {
      const childrenToRemove = node.pendingRemovals;
      node.children = node.children.filter(
        (child) => !childrenToRemove.has(child),
      );
      node.pendingRemovals = undefined;
    }
  }
}
const detachNodeFromParent = (node, parentNode) => {
  // defer removal to avoid O(N^2) complexity - can be a big hit for large files
  if (parentNode.pendingRemovals) {
    parentNode.pendingRemovals.add(node);
  } else {
    parentNode.pendingRemovals = new Set([node]);
  }
};

Workaround

I'm using this deferred removal in some custom plugins to speed them up (just adding an apply step in the exit callback). I also tried working around the issue by splitting the ~25k siblings into subgroups as a first step, but the grouping causes problems with some other parsing we do later down the line, so it's not ideal.

Related

Note this is a similar issue to #1969, so feel free to merge if you like, but I'm initially posting it here for visibility so it doesn't just disappear as a comment.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions