AVX-512-Elementary Math Functions-YMM

_mm256_mask_sqrt_pd

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256d
Param Types:: __m256d src, __mmask8 k, __m256d a
Param ETypes:: FP64 src, MASK k, FP64 a

__m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k,
                            __m256d a);

Intel Description

Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 3
        i := j*64
        IF k[j]
                dst[i+63:i] := SQRT(a[i+63:i])
        ELSE
                dst[i+63:i] := src[i+63:i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_sqrt_pd

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256d
Param Types:: __mmask8 k, __m256d a
Param ETypes:: MASK k, FP64 a

__m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a);

Intel Description

Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 3
        i := j*64
        IF k[j]
                dst[i+63:i] := SQRT(a[i+63:i])
        ELSE
                dst[i+63:i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_mask_sqrt_ps

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256
Param Types:: __m256 src, __mmask8 k, __m256 a
Param ETypes:: FP32 src, MASK k, FP32 a

__m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k,
                           __m256 a);

Intel Description

Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 7
        i := j*32
        IF k[j]
                dst[i+31:i] := SQRT(a[i+31:i])
        ELSE
                dst[i+31:i] := src[i+31:i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_sqrt_ps

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256
Param Types:: __mmask8 k, __m256 a
Param ETypes:: MASK k, FP32 a

__m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a);

Intel Description

Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 7
        i := j*32
        IF k[j]
                dst[i+31:i] := SQRT(a[i+31:i])
        ELSE
                dst[i+31:i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_rsqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h a
Param ETypes:: FP16 a

__m256h _mm256_rsqrt_ph(__m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:256] := 0

_mm256_mask_rsqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h src, __mmask16 k, __m256h a
Param ETypes:: FP16 src, MASK k, FP16 a

__m256h _mm256_mask_rsqrt_ph(__m256h src, __mmask16 k,
                             __m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_rsqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __mmask16 k, __m256h a
Param ETypes:: MASK k, FP16 a

__m256h _mm256_maskz_rsqrt_ph(__mmask16 k, __m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_sqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h a
Param ETypes:: FP16 a

__m256h _mm256_sqrt_ph(__m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:256] := 0

_mm256_mask_sqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h src, __mmask16 k, __m256h a
Param ETypes:: FP16 src, MASK k, FP16 a

__m256h _mm256_mask_sqrt_ph(__m256h src, __mmask16 k,
                            __m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := SQRT(a.fp16[i])
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0
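The `k` argument is a plain 16-bit integer type (`__mmask16`), one bit per lane. When per-lane decisions are made in scalar code, a helper like the hypothetical `mask16_from_flags` packs them with flag i going to bit i; the result can then be passed as `k` to `_mm256_mask_sqrt_ph` or any other 16-lane masked intrinsic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack 16 per-lane flags into a mask, flag i -> bit i. */
uint16_t mask16_from_flags(const bool flags[16])
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i)
        if (flags[i])
            k |= (uint16_t)(1u << i);
    return k;
}
```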

_mm256_maskz_sqrt_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __mmask16 k, __m256h a
Param ETypes:: MASK k, FP16 a

__m256h _mm256_maskz_sqrt_ph(__mmask16 k, __m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := SQRT(a.fp16[i])
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_rcp_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h a
Param ETypes:: FP16 a

__m256h _mm256_rcp_ph(__m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:256] := 0
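Approximate reciprocals are often paired with a Newton-Raphson step. This is a general technique, not something `_mm256_rcp_ph` performs itself, and since the FP16 result is already within 1.5*2^-12, a step is mainly useful after widening the result to float. A scalar sketch with hypothetical names: if y0 = (1/x)(1 - e), then x*y0 = 1 - e and one step gives y1 = (1/x)(1 - e)(1 + e) = (1/x)(1 - e^2), so the relative error is roughly squared.

```c
#include <math.h>

/* One Newton-Raphson step for 1/x: y1 = y0 * (2 - x*y0). */
float refine_rcp(float x, float y0)
{
    return y0 * (2.0f - x * y0);
}

/* Relative error of `approx` with respect to a nonzero `exact`. */
float rel_err(float approx, float exact)
{
    return fabsf(approx - exact) / fabsf(exact);
}
```

Starting from 0.3333 as an estimate of 1/3 (relative error about 1e-4), one step brings the error down to roughly the level of float rounding.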

_mm256_mask_rcp_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __m256h src, __mmask16 k, __m256h a
Param ETypes:: FP16 src, MASK k, FP16 a

__m256h _mm256_mask_rcp_ph(__m256h src, __mmask16 k,
                           __m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / a.fp16[i])
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_rcp_ph

Tech:: AVX-512
Category:: Elementary Math Functions
Header:: immintrin.h
Searchable:: AVX-512-Elementary Math Functions-YMM
Register:: YMM 256 bit
Return Type:: __m256h
Param Types:: __mmask16 k, __m256h a
Param ETypes:: MASK k, FP16 a

__m256h _mm256_maskz_rcp_ph(__mmask16 k, __m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / a.fp16[i])
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0
