AVX-512-Elementary Math Functions-YMM

_mm256_mask_sqrt_pd

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256d

Param Types:

__m256d src, __mmask8 k, __m256d a

Param ETypes:

FP64 src, MASK k, FP64 a

__m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k,
                            __m256d a);

Intel Description

Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 3
        i := j*64
        IF k[j]
                dst[i+63:i] := SQRT(a[i+63:i])
        ELSE
                dst[i+63:i] := src[i+63:i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_sqrt_pd

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256d

Param Types:

__mmask8 k, __m256d a

Param ETypes:

MASK k, FP64 a

__m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a);

Intel Description

Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 3
        i := j*64
        IF k[j]
                dst[i+63:i] := SQRT(a[i+63:i])
        ELSE
                dst[i+63:i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_mask_sqrt_ps

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256

Param Types:

__m256 src, __mmask8 k, __m256 a

Param ETypes:

FP32 src, MASK k, FP32 a

__m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k,
                           __m256 a);

Intel Description

Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 7
        i := j*32
        IF k[j]
                dst[i+31:i] := SQRT(a[i+31:i])
        ELSE
                dst[i+31:i] := src[i+31:i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_sqrt_ps

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256

Param Types:

__mmask8 k, __m256 a

Param ETypes:

MASK k, FP32 a

__m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a);

Intel Description

Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR j := 0 to 7
        i := j*32
        IF k[j]
                dst[i+31:i] := SQRT(a[i+31:i])
        ELSE
                dst[i+31:i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_rsqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h a

Param ETypes:

FP16 a

__m256h _mm256_rsqrt_ph(__m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:256] := 0

_mm256_mask_rsqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h src, __mmask16 k, __m256h a

Param ETypes:

FP16 src, MASK k, FP16 a

__m256h _mm256_mask_rsqrt_ph(__m256h src, __mmask16 k,
                             __m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_rsqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__mmask16 k, __m256h a

Param ETypes:

MASK k, FP16 a

__m256h _mm256_maskz_rsqrt_ph(__mmask16 k, __m256h a);

Intel Description

Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_sqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h a

Param ETypes:

FP16 a

__m256h _mm256_sqrt_ph(__m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:256] := 0

_mm256_mask_sqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h src, __mmask16 k, __m256h a

Param ETypes:

FP16 src, MASK k, FP16 a

__m256h _mm256_mask_sqrt_ph(__m256h src, __mmask16 k,
                            __m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := SQRT(a.fp16[i])
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_sqrt_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__mmask16 k, __m256h a

Param ETypes:

MASK k, FP16 a

__m256h _mm256_maskz_sqrt_ph(__mmask16 k, __m256h a);

Intel Description

Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := SQRT(a.fp16[i])
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_rcp_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h a

Param ETypes:

FP16 a

__m256h _mm256_rcp_ph(__m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:256] := 0

_mm256_mask_rcp_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__m256h src, __mmask16 k, __m256h a

Param ETypes:

FP16 src, MASK k, FP16 a

__m256h _mm256_mask_rcp_ph(__m256h src, __mmask16 k,
                           __m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / a.fp16[i])
        ELSE
                dst.fp16[i] := src.fp16[i]
        FI
ENDFOR
dst[MAX:256] := 0

_mm256_maskz_rcp_ph

Tech:

AVX-512

Category:

Elementary Math Functions

Header:

immintrin.h

Searchable:

AVX-512-Elementary Math Functions-YMM

Register:

YMM 256 bit

Return Type:

__m256h

Param Types:

__mmask16 k, __m256h a

Param ETypes:

MASK k, FP16 a

__m256h _mm256_maskz_rcp_ph(__mmask16 k, __m256h a);

Intel Description

Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

Intel Implementation Pseudo-Code

FOR i := 0 to 15
        IF k[i]
                dst.fp16[i] := (1.0 / a.fp16[i])
        ELSE
                dst.fp16[i] := 0
        FI
ENDFOR
dst[MAX:256] := 0
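
One plausible use of the zeromask form is to skip lanes whose input is zero, so that those lanes yield 0 instead of an infinity. The sketch below models that idea in scalar C under two stated simplifications: `float` stands in for the 16-bit element type, and an exact division stands in for the hardware approximation. Both helper names are hypothetical. With 16 lanes, the mask is 16 bits wide, one bit per element.

```c
#include <stdint.h>

/* Hypothetical helper: build a 16-bit mask with bit i set when a[i]
   is nonzero. */
static uint16_t nonzero_mask16(const float a[16])
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i) {
        if (a[i] != 0.0f) {
            k |= (uint16_t)(1u << i);
        }
    }
    return k;
}

/* Hypothetical helper: scalar model of the zeromask reciprocal
   pseudo-code. float stands in for fp16, and the division is exact
   rather than approximate. */
static void ref_maskz_rcp(float dst[16], uint16_t k, const float a[16])
{
    for (int i = 0; i < 16; ++i) {
        dst[i] = ((k >> i) & 1u) ? 1.0f / a[i] : 0.0f;
    }
}
```

Passing the result of `nonzero_mask16(a)` as `k` leaves every zero-input lane at 0 and computes the reciprocal only where the input is nonzero.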