gcc - How do you load/store from/to an array of doubles with GNU C Vector Extensions?
I'm using GNU C vector extensions, not Intel's `_mm_*` intrinsics.

I want the same thing Intel's `_mm256_loadu_pd` intrinsic does. Assigning the values one at a time is slow: GCC produces code with 4 load instructions, rather than 1 single `vmovupd` (which `_mm256_loadu_pd` generates).
```c
typedef double vector __attribute__((vector_size(4 * sizeof(double))));

int main(int argc, char **argv) {
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    vector v;
    /* load a into v, one element at a time */
    v[0] = a[0];
    v[1] = a[1];
    v[2] = a[2];
    v[3] = a[3];
}
```

I want to do this:
```c
v = (vector)(a);
```

or

```c
v = *((vector *)(a));
```

but neither works: the first fails to compile with "can't convert value to a vector", while the second segfaults.
Update: I see you're using GNU C's native vector syntax, not Intel intrinsics. Are you avoiding Intel intrinsics for portability to non-x86? GCC currently does a bad job compiling code that uses GNU C vectors wider than the target machine supports. (You'd hope it would just use two 128-bit vectors and operate on each separately, but apparently it's worse than that.)

Anyway, this answer shows how you can use Intel x86 intrinsics to load data into GNU C vector-syntax types.
First of all, looking at compiler output at less than `-O2` is a waste of time if you're trying to learn what compiles to good code. Your `main()` will optimize to just a `ret` at `-O2`.

Besides that, it's not totally surprising that you get bad asm from assigning the elements of a vector one at a time.
Aside: normal people call the type `v4df` (vector of 4 double float) or something like that, not `vector`, so they don't go insane when also using C++ `std::vector`. For single-precision, `v8sf`. IIRC, GCC uses type names like that internally for `__m256d`.
On x86, Intel intrinsic types (like `__m256d`) are implemented on top of GNU C vector syntax (which is why you can write `v1 * v2` in GNU C instead of `_mm256_mul_pd(v1, v2)`). You can convert freely between `__m256d` and `v4df`, as I've done here.
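To make the `v1 * v2` point concrete, here is a minimal sketch using only GNU C vector syntax, no `<immintrin.h>` (the `v4df_mul` name is mine, for illustration):

```c
/* GNU C vector of 4 doubles, same layout as __m256d on x86 */
typedef double v4df __attribute__((vector_size(4 * sizeof(double))));

/* Element-wise multiply: GCC compiles this to vmulpd with -mavx,
   or a pair of mulpd instructions on baseline x86-64. */
static v4df v4df_mul(v4df x, v4df y) {
    return x * y;
}
```

The same syntax works for `+`, `-`, `/`, comparisons, and indexing with `v[i]`, which is the whole appeal of the native vector extensions over spelling out each intrinsic.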
I've wrapped both sane ways to do this in functions, so we can look at their asm. Notice that they're not loading from an array defined inside the same function, so the compiler can't optimize the load away.

I put them on the Godbolt compiler explorer so you can look at the asm with various compile options and compiler versions.
```c
typedef double v4df __attribute__((vector_size(4 * sizeof(double))));
#include <immintrin.h>

// note the return types.  gcc6.1 compiles these with no warnings, even at -Wall -Wextra
v4df load_4_doubles_intel(const double *p) { return _mm256_loadu_pd(p); }
```

```asm
    vmovupd ymm0, YMMWORD PTR [rdi]   # tmp89, *p
    ret
```

```c
v4df avx_constant() { return _mm256_setr_pd( 1.0, 2.0, 3.0, 4.0 ); }
```

```asm
    vmovapd ymm0, YMMWORD PTR .LC0[rip]
    ret
```

If the args to `_mm_set*` intrinsics aren't compile-time constants, the compiler will do the best it can to make efficient code that gets all the elements into a single vector. Doing that is usually better than writing C that stores to a tmp array and loads from it, because that's not always the best strategy. (A store-forwarding failure from multiple narrow stores feeding a wide load costs ~10 cycles (IIRC) of latency on top of the usual store-forwarding delay. If your doubles are already in registers, it's usually best to shuffle them together.)
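GNU C also lets you build a vector from scalars with a compound-literal-style initializer, which leaves the "get the elements into one vector" strategy up to the compiler, just like `_mm256_setr_pd` with non-constant args. A small sketch (the `make_v4df` name is mine):

```c
typedef double v4df __attribute__((vector_size(4 * sizeof(double))));

/* Build a vector from 4 scalars.  With runtime args, GCC generates
   shuffles/unpacks rather than a store-to-array-and-reload sequence. */
static v4df make_v4df(double a, double b, double c, double d) {
    return (v4df){ a, b, c, d };
}
```

With compile-time-constant args this folds to a single load from `.rodata`, same as the `avx_constant()` example above.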
See *Is it possible to cast floats directly to `__m128` if they are 16 byte aligned?* for a list of various intrinsics for getting a single scalar into a vector. The x86 tag wiki has links to Intel's manuals, and to their intrinsics finder.
Load/store GNU C vectors without Intel intrinsics:

I'm not sure how you're "supposed" to do that. This Q&A suggests casting a pointer to the memory you want to load, using a vector type like `typedef char __attribute__ ((vector_size (16), aligned (1))) unaligned_byte16;` (note the `aligned (1)` attribute).
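As a concrete sketch of that approach for doubles (the `v4df_u` typedef and the helper-function names are mine, not from the linked Q&A):

```c
/* Naturally-aligned (32-byte) vector of 4 doubles */
typedef double v4df __attribute__ ((vector_size (4 * sizeof(double))));

/* Same layout, but alignment lowered to 1, so dereferencing a pointer
   to it is safe at any address.  GCC emits an unaligned load/store. */
typedef double v4df_u __attribute__ ((vector_size (4 * sizeof(double)),
                                      aligned (1)));

static v4df load_4_doubles(const double *p) {
    return *(const v4df_u *)p;     /* unaligned vector load */
}

static void store_4_doubles(double *p, v4df v) {
    *(v4df_u *)p = v;              /* unaligned vector store */
}
```

Note that GCC's vector types are allowed to alias their underlying element type, so the pointer casts here don't violate strict aliasing the way an arbitrary struct cast would.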
You get a segfault from `*(v4df *)a` because presumably `a` isn't aligned on a 32-byte boundary, but you're using a vector type that does assume natural alignment. (Just like `__m256d` would if you dereferenced a pointer to it instead of using load/store intrinsics to communicate the alignment info to the compiler.)
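Conversely, if you raise the array's alignment to 32 bytes explicitly, the plain pointer-cast load becomes safe. A minimal sketch (the `load_from_aligned` name is mine):

```c
typedef double v4df __attribute__((vector_size(4 * sizeof(double))));

static v4df load_from_aligned(void) {
    /* aligned(32) guarantees the cast below doesn't fault */
    static double a[4] __attribute__((aligned(32))) = {1.0, 2.0, 3.0, 4.0};
    return *(v4df *)a;   /* aligned vector load: vmovapd with -mavx */
}
```

This is the same trade-off as `_mm256_load_pd` vs. `_mm256_loadu_pd`: the aligned form is only correct when you actually control the data's alignment.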