AVX-512-Elementary Math Functions-YMM
_mm256_mask_sqrt_pd
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256d
- Param Types: __m256d src, __mmask8 k, __m256d a
- Param ETypes: FP64 src, MASK k, FP64 a
__m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k, __m256d a);
Intel Description
Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR j := 0 to 3
    i := j*64
    IF k[j]
        dst[i+63:i] := SQRT(a[i+63:i])
    ELSE
        dst[i+63:i] := src[i+63:i]
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_maskz_sqrt_pd
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256d
- Param Types: __mmask8 k, __m256d a
- Param ETypes: MASK k, FP64 a
__m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a);
Intel Description
Compute the square root of packed double-precision (64-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR j := 0 to 3
    i := j*64
    IF k[j]
        dst[i+63:i] := SQRT(a[i+63:i])
    ELSE
        dst[i+63:i] := 0
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_mask_sqrt_ps
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256
- Param Types: __m256 src, __mmask8 k, __m256 a
- Param ETypes: FP32 src, MASK k, FP32 a
__m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k, __m256 a);
Intel Description
Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR j := 0 to 7
    i := j*32
    IF k[j]
        dst[i+31:i] := SQRT(a[i+31:i])
    ELSE
        dst[i+31:i] := src[i+31:i]
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_maskz_sqrt_ps
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256
- Param Types: __mmask8 k, __m256 a
- Param ETypes: MASK k, FP32 a
__m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a);
Intel Description
Compute the square root of packed single-precision (32-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR j := 0 to 7
    i := j*32
    IF k[j]
        dst[i+31:i] := SQRT(a[i+31:i])
    ELSE
        dst[i+31:i] := 0
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_rsqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h a
- Param ETypes: FP16 a
__m256h _mm256_rsqrt_ph(__m256h a);
Intel Description
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:256] := 0
_mm256_mask_rsqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h src, __mmask16 k, __m256h a
- Param ETypes: FP16 src, MASK k, FP16 a
__m256h _mm256_mask_rsqrt_ph(__m256h src, __mmask16 k, __m256h a);
Intel Description
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
    ELSE
        dst.fp16[i] := src.fp16[i]
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_maskz_rsqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __mmask16 k, __m256h a
- Param ETypes: MASK k, FP16 a
__m256h _mm256_maskz_rsqrt_ph(__mmask16 k, __m256h a);
Intel Description
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
    ELSE
        dst.fp16[i] := 0
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_sqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h a
- Param ETypes: FP16 a
__m256h _mm256_sqrt_ph(__m256h a);
Intel Description
Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:256] := 0
_mm256_mask_sqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h src, __mmask16 k, __m256h a
- Param ETypes: FP16 src, MASK k, FP16 a
__m256h _mm256_mask_sqrt_ph(__m256h src, __mmask16 k, __m256h a);
Intel Description
Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := SQRT(a.fp16[i])
    ELSE
        dst.fp16[i] := src.fp16[i]
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_maskz_sqrt_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __mmask16 k, __m256h a
- Param ETypes: MASK k, FP16 a
__m256h _mm256_maskz_sqrt_ph(__mmask16 k, __m256h a);
Intel Description
Compute the square root of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set).
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := SQRT(a.fp16[i])
    ELSE
        dst.fp16[i] := 0
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_rcp_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h a
- Param ETypes: FP16 a
__m256h _mm256_rcp_ph(__m256h a);
Intel Description
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst”. The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:256] := 0
_mm256_mask_rcp_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __m256h src, __mmask16 k, __m256h a
- Param ETypes: FP16 src, MASK k, FP16 a
__m256h _mm256_mask_rcp_ph(__m256h src, __mmask16 k, __m256h a);
Intel Description
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using writemask “k” (elements are copied from “src” when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := (1.0 / a.fp16[i])
    ELSE
        dst.fp16[i] := src.fp16[i]
    FI
ENDFOR
dst[MAX:256] := 0
_mm256_maskz_rcp_ph
- Tech: AVX-512
- Category: Elementary Math Functions
- Header: immintrin.h
- Searchable: AVX-512-Elementary Math Functions-YMM
- Register: YMM 256 bit
- Return Type: __m256h
- Param Types: __mmask16 k, __m256h a
- Param ETypes: MASK k, FP16 a
__m256h _mm256_maskz_rcp_ph(__mmask16 k, __m256h a);
Intel Description
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in “a”, and store the results in “dst” using zeromask “k” (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
Intel Implementation Pseudo-Code
FOR i := 0 to 15
    IF k[i]
        dst.fp16[i] := (1.0 / a.fp16[i])
    ELSE
        dst.fp16[i] := 0
    FI
ENDFOR
dst[MAX:256] := 0