Any faster way of copying arrays in C#?
I have three arrays that need to be combined into one array with three columns (a two-dimensional `[n, 3]` array). The following code shows slow performance in Performance Explorer. Is there a faster solution?
    for (int i = 0; i < sortedIndex.Length; i++)
    {
        if (i < num_in_left)
        {
            // add instance to the left child
            leftnode[i, 0] = sortedIndex[i];
            leftnode[i, 1] = sortedInstances[i];
            leftnode[i, 2] = sortedLabels[i];
        }
        else
        {
            // add instance to the right child
            rightnode[i - num_in_left, 0] = sortedIndex[i];
            rightnode[i - num_in_left, 1] = sortedInstances[i];
            rightnode[i - num_in_left, 2] = sortedLabels[i];
        }
    }
Update:
I'm trying to do something like the following:

    // given three 1D arrays
    double[] sortedIndex, sortedInstances, sortedLabels;
    // copy them over into one array (forget about rightnode for now)
    double[,] leftnode = new double[sortedIndex.Length, 3];
    // magic happens here (pseudocode, this does not compile)
    leftnode = {sortedIndex, sortedInstances, sortedLabels};
Use Buffer.BlockCopy. Its entire purpose is to perform fast (see Buffer):
"This class provides better performance for manipulating primitive types than similar methods in the System.Array class."
Admittedly, I haven't done any benchmarks, but that's what the documentation says. It also works on multidimensional arrays; just make sure that you're always specifying how many bytes to copy, not how many elements, and that you're working on an array of primitives.
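As a minimal sketch of the byte-count point above: a block copy moves one contiguous run of memory, so it cannot fill a column of an `[n, 3]` array, whose columns are interleaved in memory. The example below therefore assumes you can flip the layout to `[3, n]`, so that each source array becomes one contiguous row. The `NodeSplitter` class and `PackRows` method are hypothetical names made up for illustration; only `Buffer.BlockCopy` comes from the framework.

```csharp
using System;

static class NodeSplitter
{
    // Hypothetical helper (not part of the original question): packs the slice
    // [start, start + count) of three source arrays into a new double[3, count].
    // Row 0 holds index values, row 1 instances, row 2 labels.
    public static double[,] PackRows(
        double[] index, double[] instances, double[] labels, int start, int count)
    {
        var node = new double[3, count];

        // Buffer.BlockCopy takes offsets and lengths in BYTES, not elements.
        int rowBytes = count * sizeof(double);
        int srcOffset = start * sizeof(double);

        // A rectangular array is stored row by row, so row r begins
        // r * rowBytes bytes into the destination.
        Buffer.BlockCopy(index, srcOffset, node, 0 * rowBytes, rowBytes);
        Buffer.BlockCopy(instances, srcOffset, node, 1 * rowBytes, rowBytes);
        Buffer.BlockCopy(labels, srcOffset, node, 2 * rowBytes, rowBytes);
        return node;
    }
}
```

With the question's variables, the left child would be `NodeSplitter.PackRows(sortedIndex, sortedInstances, sortedLabels, 0, num_in_left)` and the right child `NodeSplitter.PackRows(sortedIndex, sortedInstances, sortedLabels, num_in_left, sortedIndex.Length - num_in_left)`. Elements are then read as `node[column, i]` instead of `node[i, column]`. If the `[n, 3]` layout has to stay, the element-by-element loop is still needed.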
Also, I have not tested this, but you might be able to squeeze a bit more performance out of the system if you bind a delegate to System.Buffer.memcpyimpl and call it directly. The signature is:

    internal static unsafe void memcpyimpl(byte* src, byte* dest, int len)

It does require pointers, but I believe it's optimized for the highest speed possible, so I don't think there's any way to get faster than that, even if you had assembly at hand.
Update:
Due to requests (and to satisfy my curiosity), I tested this:
    using System;
    using System.Diagnostics;
    using System.Reflection;

    unsafe delegate void MemCpyImpl(byte* src, byte* dest, int len);

    static class Temp
    {
        // There really should be a generic CreateDelegate<T>() method... -___-
        // Binds a delegate to the internal Buffer.memcpyimpl via reflection.
        static MemCpyImpl memcpyimpl = (MemCpyImpl)Delegate.CreateDelegate(
            typeof(MemCpyImpl),
            typeof(Buffer).GetMethod("memcpyimpl",
                BindingFlags.Static | BindingFlags.NonPublic));

        // 32 copies of 32 MiB per test
        const int COUNT = 32, SIZE = 32 << 20;

        // Use different buffers to help avoid CPU cache effects
        static byte[]
            aSource = new byte[SIZE], aTarget = new byte[SIZE],
            bSource = new byte[SIZE], bTarget = new byte[SIZE],
            cSource = new byte[SIZE], cTarget = new byte[SIZE];

        static unsafe void TestUnsafe()
        {
            Stopwatch sw = Stopwatch.StartNew();
            // Pin both arrays so the GC cannot move them during the copy
            fixed (byte* pSrc = aSource)
            fixed (byte* pDest = aTarget)
                for (int i = 0; i < COUNT; i++)
                    memcpyimpl(pSrc, pDest, SIZE);
            sw.Stop();
            Console.WriteLine("Buffer.memcpyimpl: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void TestBlockCopy()
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < COUNT; i++)
                Buffer.BlockCopy(bSource, 0, bTarget, 0, SIZE);
            sw.Stop();
            Console.WriteLine("Buffer.BlockCopy: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void TestArrayCopy()
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < COUNT; i++)
                Array.Copy(cSource, 0, cTarget, 0, SIZE);
            sw.Stop();
            Console.WriteLine("Array.Copy: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void Main(string[] args)
        {
            for (int i = 0; i < 10; i++)
            {
                TestArrayCopy();
                TestBlockCopy();
                TestUnsafe();
                Console.WriteLine();
            }
        }
    }
The results:

    Buffer.BlockCopy: 469,151 ticks
    Array.Copy: 469,972 ticks
    Buffer.memcpyimpl: 496,541 ticks

    Buffer.BlockCopy: 421,011 ticks
    Array.Copy: 430,694 ticks
    Buffer.memcpyimpl: 410,933 ticks

    Buffer.BlockCopy: 425,112 ticks
    Array.Copy: 420,839 ticks
    Buffer.memcpyimpl: 411,520 ticks

    Buffer.BlockCopy: 424,329 ticks
    Array.Copy: 420,288 ticks
    Buffer.memcpyimpl: 405,598 ticks

    Buffer.BlockCopy: 422,410 ticks
    Array.Copy: 427,826 ticks
    Buffer.memcpyimpl: 414,394 ticks
Now change the order:

    Array.Copy: 419,750 ticks
    Buffer.memcpyimpl: 408,919 ticks
    Buffer.BlockCopy: 419,774 ticks

    Array.Copy: 430,529 ticks
    Buffer.memcpyimpl: 412,148 ticks
    Buffer.BlockCopy: 424,900 ticks

    Array.Copy: 424,706 ticks
    Buffer.memcpyimpl: 427,861 ticks
    Buffer.BlockCopy: 421,929 ticks

    Array.Copy: 420,556 ticks
    Buffer.memcpyimpl: 421,541 ticks
    Buffer.BlockCopy: 436,430 ticks

    Array.Copy: 435,297 ticks
    Buffer.memcpyimpl: 432,505 ticks
    Buffer.BlockCopy: 441,493 ticks
Now change the order again:

    Buffer.memcpyimpl: 430,874 ticks
    Buffer.BlockCopy: 429,730 ticks
    Array.Copy: 432,746 ticks

    Buffer.memcpyimpl: 415,943 ticks
    Buffer.BlockCopy: 423,809 ticks
    Array.Copy: 428,703 ticks

    Buffer.memcpyimpl: 421,270 ticks
    Buffer.BlockCopy: 428,262 ticks
    Array.Copy: 434,940 ticks

    Buffer.memcpyimpl: 423,506 ticks
    Buffer.BlockCopy: 427,220 ticks
    Array.Copy: 431,606 ticks

    Buffer.memcpyimpl: 422,900 ticks
    Buffer.BlockCopy: 439,280 ticks
    Array.Copy: 432,649 ticks
Or, in other words: they're very competitive. As a general rule, memcpyimpl is the fastest, but the difference is only a few percent, so it's not necessarily worth worrying about.