Module accel::execution
Traits for CUDA Kernel launching
Launchable traits
Launchable traits, i.e. Launchable0, Launchable1, ..., implement the launch function, which launches a kernel on the device.
    use accel::{*, error::Result};

    // Trait for 2-arg kernel
    pub trait Launchable2 {
        // Type of arg1 on device
        type Target1;
        // Type of arg2 on device
        type Target2;

        // Launch kernel code on device
        fn launch<
            Arg1 /* Type of arg1 on host */,
            Arg2 /* Type of arg2 on host */
        >(
            &self,
            grid: impl Into<Grid>,
            block: impl Into<Block>,
            (arg1, arg2): (Arg1, Arg2)
        ) -> Result<()>
        where
            // Types on host and on device are bundled by DeviceSend trait
            Arg1: DeviceSend<Target=Self::Target1>,
            Arg2: DeviceSend<Target=Self::Target2>,
        {
            // default impl which uses crate-internal features
            todo!() // skip for document
        }

        // Specify entry point (see following example)
        fn get_kernel(&self) -> Result<Kernel>;
    }
These traits are generated by the accel_derive::define_launchable! proc-macro.
Launchable traits are specialized for N-argument functions because launch takes its kernel arguments as a single tuple (Arg1, Arg2, ..., ArgN), so each arity needs its own trait.
The DeviceSend trait specifies how a host value is sent to the device.
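The pairing of host and device types can be illustrated without a GPU. The following is a minimal sketch, not accel's implementation: `DeviceSend` here is a hypothetical stand-in with an assumed `to_target` method, `Launch2` drops the grid and block parameters, and the mock `launch` simply returns the device-side values instead of launching anything.

```rust
/// Hypothetical stand-in for accel's `DeviceSend`: ties a host type to the
/// type the kernel sees on the device. `to_target` is an assumption made
/// for this sketch, not part of accel's API.
trait DeviceSend {
    type Target;
    fn to_target(&self) -> Self::Target;
}

// A plain integer is passed by value, so host and device types coincide.
impl DeviceSend for i32 {
    type Target = i32;
    fn to_target(&self) -> i32 {
        *self
    }
}

// A host slice is seen by the kernel as a raw pointer.
impl<'a> DeviceSend for &'a [f32] {
    type Target = *const f32;
    fn to_target(&self) -> *const f32 {
        self.as_ptr()
    }
}

/// Simplified 2-argument launch trait (no grid/block, no real device).
trait Launch2 {
    type Target1;
    type Target2;

    // Arguments arrive as one tuple, so the arity is fixed by the trait.
    // Instead of launching anything, the mock returns the device-side values.
    fn launch<Arg1, Arg2>(&self, (arg1, arg2): (Arg1, Arg2)) -> (Self::Target1, Self::Target2)
    where
        Arg1: DeviceSend<Target = Self::Target1>,
        Arg2: DeviceSend<Target = Self::Target2>,
    {
        (arg1.to_target(), arg2.to_target())
    }
}

/// Mock kernel whose device signature is `(i32, *const f32)`.
struct MockKernel;

impl Launch2 for MockKernel {
    type Target1 = i32;
    type Target2 = *const f32;
}

fn main() {
    let data = [1.0_f32, 2.0, 3.0];
    let (n, ptr) = MockKernel.launch((3_i32, &data[..]));
    assert_eq!(n, 3);
    assert_eq!(ptr, data.as_ptr());
    // `MockKernel.launch((3_i32, 1.0_f32))` would not compile, because
    // `f32` has no `DeviceSend` impl with `Target = *const f32`.
    println!("marshalled a slice of {} elements", n);
}
```

Because the `where` bounds compare each argument's `Target` with the kernel's associated types, a mismatched argument is rejected at compile time rather than at launch.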
One of the Launchable traits is implemented automatically by accel::kernel for an auto-generated Module struct:
    #[accel::kernel]
    fn f(a: i32) {}
This simple definition creates a sub-module f (with the same name as the function):
    mod f { // sub-module with the same name as the function
        pub const PTX_STR: &str = "{{ PTX string generated by rustc/nvptx64-nvidia-cuda }}";

        // wrapper that implements one of the Launchable traits
        pub struct Module(::accel::Module);

        // impl Launchable1 because the number of arguments is 1
        impl ::accel::execution::Launchable1<'_> for Module {
            type Target1 = i32; // first argument of `f`

            // How to get kernel PTX code
            fn get_kernel(&self) -> ::accel::error::Result<::accel::Kernel> {
                self.0.get_kernel("f")
            }
        }
    }
For a function which takes N arguments, Launchable{N} will be implemented for the corresponding module.
Note that this sub-module is generated in the same place where f is defined.
get_kernel and the default implementation of launch are kept separate so that unsafe code stays inside this crate: generated code only has to provide get_kernel.
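The split between a required method and a provided method can be sketched as follows. This is an illustrative mock under stated assumptions, not accel's code: the `runtime` module, the `Kernel` struct with a `name` field, and `launch_raw` are hypothetical, and `launch_raw` only formats a string where the real crate would call into the CUDA driver.

```rust
mod runtime {
    /// Hypothetical kernel handle, used only for this sketch.
    pub struct Kernel {
        pub name: String,
    }

    // Stand-in for the crate-internal part that would call the CUDA driver
    // inside an `unsafe` block. It is private to this module.
    fn launch_raw(kernel: &Kernel, threads: u32) -> String {
        format!("launched {} with {} threads", kernel.name, threads)
    }

    pub trait Launchable0 {
        // Required: the only method that generated code has to provide.
        fn get_kernel(&self) -> Result<Kernel, String>;

        // Provided: implementors inherit this and never touch `launch_raw`.
        fn launch(&self, threads: u32) -> Result<String, String> {
            let kernel = self.get_kernel()?;
            Ok(launch_raw(&kernel, threads))
        }
    }
}

use runtime::{Kernel, Launchable0};

/// Plays the role of the auto-generated `Module` struct.
struct Module;

impl Launchable0 for Module {
    fn get_kernel(&self) -> Result<Kernel, String> {
        Ok(Kernel { name: "f".to_string() })
    }
}

fn main() {
    let report = Module.launch(4).expect("mock launch cannot fail");
    println!("{}", report);
}
```

With this layout the implementor writes only safe code, while everything that would need `unsafe` stays behind the provided method in the defining module.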
Traits
DeviceSend | Type which can be sent to device |
Launchable0 | Launchable Kernel with N-arguments |
Launchable1 | Launchable Kernel with N-arguments |
Launchable2 | Launchable Kernel with N-arguments |
Launchable3 | Launchable Kernel with N-arguments |
Launchable4 | Launchable Kernel with N-arguments |
Launchable5 | Launchable Kernel with N-arguments |
Launchable6 | Launchable Kernel with N-arguments |
Launchable7 | Launchable Kernel with N-arguments |
Launchable8 | Launchable Kernel with N-arguments |
Launchable9 | Launchable Kernel with N-arguments |
Launchable10 | Launchable Kernel with N-arguments |
Launchable11 | Launchable Kernel with N-arguments |
Launchable12 | Launchable Kernel with N-arguments |