
Investigate ReGrid Performance vs Node #96

Open
@bchavez

Description

It seems Node's ReGrid gets about 3x more writes than the .NET version, yielding a faster upload wall time. See the image below (credit: @buskila):

[Image: pasted_image_at_2016_08_24_13_03]

Test setup

Upload only:
File size: 1 GB
Server: RethinkDB on Linux (Ubuntu 14), 3 nodes
Client: .NET Core on Linux

Chunk size: default
Batch size: default (8 -> 32)

They tried both a single connection and connection pooling; there was no difference.

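Upload code, using stream I/O:

using System;     // Guid
using System.IO;  // File, FileMode, FileAccess

// Upload a file using an IO stream.
// `bucket` is an existing ReGrid bucket (setup not shown).
Guid uploadId;
using( var fileStream = File.Open("C:\\video.mp4", FileMode.Open, FileAccess.Read) )
using( var uploadStream = bucket.OpenUploadStream("/video.mp4") )
{
    uploadId = uploadStream.FileInfo.Id;
    // CopyTo copies through a buffer of at most 81,920 bytes by default,
    // so the upload stream receives the file as many small Write() calls.
    fileStream.CopyTo(uploadStream);
}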

Suspicion

The stream upload code may be doing too much chunk calculation. Try to parallelize or simplify some of it, especially when the caller passes a byte[].
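When the caller already holds the whole payload as a byte[], the chunk boundaries can be computed up front and each chunk handed out as a view into the original array, with no intermediate buffering or copying. This is a minimal sketch assuming a fixed chunk size; `SliceIntoChunks` is a hypothetical helper, not part of the ReGrid API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not ReGrid API): yields zero-copy views over `data`,
// each at most `chunkSize` bytes long; only the last one may be shorter.
static IEnumerable<ArraySegment<byte>> SliceIntoChunks(byte[] data, int chunkSize)
{
    if (chunkSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(chunkSize));

    // `long` offset so offset + chunkSize cannot overflow near int.MaxValue.
    for (long offset = 0; offset < data.Length; offset += chunkSize)
    {
        // Clamp the final chunk to whatever bytes remain.
        int length = (int)Math.Min(chunkSize, data.Length - offset);
        yield return new ArraySegment<byte>(data, (int)offset, length);
    }
}

// Usage: 1,000 bytes in 256-byte chunks -> lengths 256, 256, 256, 232.
var payload = new byte[1000];
foreach (var segment in SliceIntoChunks(payload, 256))
    Console.WriteLine($"offset={segment.Offset} length={segment.Count}");
```

If the insert path needs a standalone byte[] per chunk, each segment still has to be copied once, so the saving is mainly in the chunk bookkeeping rather than in memory traffic.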

Node's ReGrid upload code is here:
https://github.com/internalfx/regrid/blob/master/lib/upload.js

Other notes

This should come after #77 is done.

After some discussion with @internalfx (thanks a bunch), it turns out the upload code uses Node streams. Info on Node streams via @buskila:

Using .pipe() has other benefits too, like handling backpressure automatically so that node won't buffer chunks into memory needlessly when the remote client is on a really slow or high-latency connection.

https://github.com/substack/stream-handbook

Currently, @internalfx's implementation keeps 10 network requests in flight at any given time. If network latency were infinite, Node would not write any more data to the ReGrid API until at least one of those requests completed.

Cool. I think we could do the same in .NET: run 10 async tasks that lay down bytes over a connection pool, and read more bytes from the source only as those network requests complete.
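The idea of a fixed number of writes in flight, with the source read only as writes complete, can be sketched with a `SemaphoreSlim` acting as the in-flight window. This gives backpressure similar to what Node's `.pipe()` provides. It is a minimal sketch, not the ReGrid implementation: `UploadWithBackpressureAsync` and its `writeChunkAsync` callback are hypothetical names standing in for "insert one chunk over a pooled connection", and they are not RethinkDb.Driver or ReGrid APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: copy `source` in `chunkSize` pieces, keeping at most
// `maxInFlight` chunk writes outstanding. `writeChunkAsync(index, bytes)` stands
// in for "insert this chunk into RethinkDB over a pooled connection".
static async Task UploadWithBackpressureAsync(
    Stream source, int chunkSize, int maxInFlight,
    Func<long, byte[], Task> writeChunkAsync)
{
    using var slots = new SemaphoreSlim(maxInFlight, maxInFlight);
    var pending = new List<Task>();
    long chunkIndex = 0;

    // Runs one write and frees its slot whether the write succeeds or fails.
    async Task WriteAndReleaseAsync(long index, byte[] bytes)
    {
        try { await writeChunkAsync(index, bytes); }
        finally { slots.Release(); }
    }

    while (true)
    {
        // Backpressure: wait here until one of the in-flight writes completes.
        await slots.WaitAsync();

        // ReadAsync may return fewer bytes than requested, so fill the chunk.
        var buffer = new byte[chunkSize];
        int filled = 0;
        while (filled < chunkSize)
        {
            int read = await source.ReadAsync(buffer, filled, chunkSize - filled);
            if (read == 0) break;
            filled += read;
        }

        if (filled == 0)
        {
            slots.Release(); // end of stream: give back the unused slot
            break;
        }
        if (filled < chunkSize)
            Array.Resize(ref buffer, filled); // last, shorter chunk

        pending.Add(WriteAndReleaseAsync(chunkIndex++, buffer));
    }

    // Wait for the remaining in-flight writes; rethrows if any write failed.
    await Task.WhenAll(pending);
}

// Usage: 1,000,000 bytes in 261,120-byte chunks (an illustrative size), 10 writes
// in flight, against a fake sink that only simulates network latency.
var data = new byte[1_000_000];
var stored = new ConcurrentDictionary<long, byte[]>();
await UploadWithBackpressureAsync(new MemoryStream(data), 261_120, 10, async (seq, chunkBytes) =>
{
    await Task.Delay(5); // simulated network round trip
    stored[seq] = chunkBytes;
});
Console.WriteLine($"chunks stored: {stored.Count}"); // chunks stored: 4
```

The semaphore bounds how many chunks are buffered and in flight at once, however slow the sink is. Error handling is deliberately minimal: a real version would stop reading once a write fails and would not keep every completed task in `pending`.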


Other Research Findings

RethinkDB Limitations

  • Query size (419554663) greater than maximum (134217727).
    So the batch size can't be too big: the maximum query size is 134,217,727 bytes (2^27 - 1, just under 128 MiB, roughly 134 MB), so a single batch can carry at most that much.
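A rough way to pick a safe batch size from that limit is sketched below, under two labelled assumptions. First, chunk bytes are assumed to travel base64-encoded inside the JSON query (RethinkDB's binary pseudo-type), so each chunk costs about 4/3 of its raw size toward the limit. Second, a fixed `overheadBytes` allowance is assumed to cover the rest of each chunk document and the query envelope. `MaxChunksPerBatch` is a hypothetical helper, and any overhead figure passed to it is a guess, not a measured value.

```csharp
using System;

// Hypothetical helper: how many chunks of `chunkSizeBytes` fit in one insert
// without exceeding RethinkDB's maximum query size.
// ASSUMPTION: binary data is base64-encoded in the JSON query (4 output bytes
// per 3 input bytes, padded), and `overheadBytes` is a guessed allowance for
// everything else in the query.
static int MaxChunksPerBatch(int chunkSizeBytes, long overheadBytes)
{
    const long maxQuerySizeBytes = 134_217_727; // from the error above: 2^27 - 1

    if (chunkSizeBytes <= 0)
        throw new ArgumentOutOfRangeException(nameof(chunkSizeBytes));

    long encodedChunkBytes = 4L * ((chunkSizeBytes + 2L) / 3); // padded base64 length
    long budget = maxQuerySizeBytes - overheadBytes;
    return budget <= 0 ? 0 : (int)(budget / encodedChunkBytes);
}

// Usage: 255 KiB chunks (an illustrative size, not necessarily ReGrid's default).
Console.WriteLine(MaxChunksPerBatch(261_120, 0));         // 385
Console.WriteLine(MaxChunksPerBatch(261_120, 1_048_576)); // 382
```

Under these assumptions, with 255 KiB chunks, batch sizes such as 8 or 32 stay far below the ceiling; the limit only matters once chunk size times batch size approaches the ~128 MiB mark.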
