🔬This is a nightly-only experimental API. (stdsimd #48556)
Available on target_arch="nvptx" or target_arch="nvptx64" only.
Allocate memory dynamically from a fixed-size heap in global memory.
The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request.
The returned pointer is guaranteed to be aligned to a 16-byte boundary.
The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA thread, even from subsequent kernel launches.
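The NULL-on-failure contract and the pairing with free() can be wrapped in a pair of small helpers. What follows is a minimal sketch, not part of this API: the helper names alloc_zeroed_u32 and release_u32 are hypothetical, and the crate root is assumed to carry #![no_std] and #![feature(stdsimd)] when building for the device. So that the sketch can also be type-checked off-device, it assumes a stand-in binding to the host C library's malloc/free, which have the same shape as the device functions.

```rust
// Device builds (nightly, nvptx/nvptx64 target) use the intrinsics documented here.
#[cfg(any(target_arch = "nvptx", target_arch = "nvptx64"))]
use core::arch::nvptx::{free, malloc};

// Assumption, for off-device checking only: bind the host C library's
// malloc/free as stand-ins. These are NOT the device heap functions.
#[cfg(not(any(target_arch = "nvptx", target_arch = "nvptx64")))]
unsafe extern "C" {
    fn malloc(size: usize) -> *mut u8;
    fn free(ptr: *mut u8);
}

/// Hypothetical helper: allocate `len` zero-initialized `u32` values.
/// Returns `None` if the byte count overflows or `malloc` returns NULL.
///
/// # Safety
/// The returned pointer must eventually be passed to `release_u32` exactly once.
pub unsafe fn alloc_zeroed_u32(len: usize) -> Option<*mut u32> {
    let bytes = len.checked_mul(core::mem::size_of::<u32>())?;
    let raw = unsafe { malloc(bytes) };
    if raw.is_null() {
        // Heap exhausted: report failure instead of dereferencing NULL.
        return None;
    }
    // The documented 16-byte alignment is stricter than u32 requires,
    // so the cast is sound.
    let ptr = raw as *mut u32;
    unsafe { core::ptr::write_bytes(ptr, 0, len) };
    Some(ptr)
}

/// Hypothetical helper: return a block obtained from `alloc_zeroed_u32`.
///
/// # Safety
/// `ptr` must come from `alloc_zeroed_u32` and must not be used afterwards.
pub unsafe fn release_u32(ptr: *mut u32) {
    unsafe { free(ptr as *mut u8) };
}
```

Because an allocation outlives the thread that made it, whichever thread or later kernel launch ends up owning the pointer is the one responsible for calling the release helper; nothing frees the block automatically before the CUDA context is destroyed.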
Sources: Programming Guide, PTX Interoperability.