Module accel::execution

Traits for CUDA Kernel launching

Launchable traits

The Launchable traits (Launchable0, Launchable1, ..., Launchable12) each provide a launch function, which launches a kernel on the device.

use accel::{*, error::Result};

// Trait for 2-arg kernel
pub trait Launchable2 {
    // Type of arg1 on device
    type Target1;
    // Type of arg2 on device
    type Target2;

    // Launch kernel code on device
    fn launch<
        Arg1 /* Type of arg1 on host */,
        Arg2 /* Type of arg2 on host */
    >(
        &self,
        grid:  impl Into<Grid>,
        block: impl Into<Block>,
        (arg1, arg2): (Arg1, Arg2)
    ) -> Result<()>
    where
        // Types on host and on device are bundled by DeviceSend trait
        Arg1: DeviceSend<Target=Self::Target1>,
        Arg2: DeviceSend<Target=Self::Target2>,
    {
        // Default impl, which uses crate-internal features
        todo!() // body omitted in this documentation
    }

    // Specify entry point (see following example)
    fn get_kernel(&self) -> Result<Kernel>;
}

These traits are generated by the accel_derive::define_launchable! proc-macro. Each Launchable trait is specialized for functions with a fixed number of arguments N, because launch takes its arguments as a tuple (Arg1, Arg2, ..., ArgN). The DeviceSend trait specifies how a host value is sent to the device.
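The pairing of host and device types described above can be illustrated without a GPU. The following is a minimal, self-contained sketch, not code from this crate: DeviceSendModel, Launchable2Model, ScaleKernel, and convert_demo are hypothetical stand-ins, and the modeled launch only converts the argument tuple instead of launching anything.

```rust
// Stand-in model only: none of these items come from the accel crate.

/// Hypothetical analogue of DeviceSend: maps a host type to a device-side type.
trait DeviceSendModel {
    type Target;
    fn to_device(&self) -> Self::Target;
}

// A plain scalar is passed by value, so host and device types coincide.
impl DeviceSendModel for i32 {
    type Target = i32;
    fn to_device(&self) -> i32 {
        *self
    }
}

// A host slice is modeled here as being sent as a pointer to its first element.
impl<'a> DeviceSendModel for &'a [f32] {
    type Target = *const f32;
    fn to_device(&self) -> *const f32 {
        self.as_ptr()
    }
}

/// Hypothetical 2-argument launcher: the tuple fixes the arity at compile time.
trait Launchable2Model {
    type Target1;
    type Target2;

    // Returns the converted arguments instead of launching a kernel.
    fn launch<Arg1, Arg2>(&self, (arg1, arg2): (Arg1, Arg2)) -> (Self::Target1, Self::Target2)
    where
        Arg1: DeviceSendModel<Target = Self::Target1>,
        Arg2: DeviceSendModel<Target = Self::Target2>,
    {
        (arg1.to_device(), arg2.to_device())
    }
}

// Stand-in for a kernel taking (i32, *const f32) on the device side.
struct ScaleKernel;

impl Launchable2Model for ScaleKernel {
    type Target1 = i32;
    type Target2 = *const f32;
}

// Converts a host tuple and reports the length and whether the pointer matches.
fn convert_demo() -> (i32, bool) {
    let data = [1.0_f32, 2.0, 3.0];
    let (n, ptr) = ScaleKernel.launch((3_i32, &data[..]));
    (n, ptr == data.as_ptr())
}
```

In this model, passing a tuple whose element types do not map to Target1 and Target2 is rejected at compile time, which is the property the where clause on launch is meant to provide.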

One of the Launchable traits will be implemented automatically by accel::kernel for an auto-generated Module struct:

#[accel::kernel]
fn f(a: i32) {}

This simple definition creates a submodule f (with the same name as the function):

mod f { // sub-module with the same name as the function

    pub const PTX_STR: &str = "{{ PTX string generated by rustc/nvptx64-nvidia-cuda }}";

    // Wrapper that implements one of the Launchable traits
    pub struct Module(::accel::Module);

    // impl Launchable1 because the number of arguments is 1
    impl ::accel::execution::Launchable1<'_> for Module {
        type Target1 = i32; // first argument of `f`

        // How to get kernel PTX code
        fn get_kernel(&self) -> ::accel::error::Result<::accel::Kernel> {
            self.0.get_kernel("f")
        }
    }
}

For a function that takes N arguments, Launchable{N} will be implemented for the corresponding module. Note that this sub-module is generated where f is defined. get_kernel and the default implementation of launch are kept separate so that unsafe code stays inside this crate.
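The split between a required get_kernel and a provided launch can be sketched with plain Rust traits. This is a simplified model under stated assumptions rather than the crate's implementation: KernelModel, Launchable1Model, FModule, and launch_demo are hypothetical names, errors are plain strings, and the modeled launch only formats a message where a real runtime would perform the unsafe device call.

```rust
// Stand-in model only: none of these items come from the accel crate.

/// Hypothetical handle for a kernel entry point.
struct KernelModel {
    name: String,
}

/// Hypothetical 1-argument launcher, owned by the "runtime" side.
trait Launchable1Model {
    type Target1: std::fmt::Debug;

    // Required method: generated code only has to name the entry point.
    fn get_kernel(&self) -> Result<KernelModel, String>;

    // Provided method: the launch logic lives next to the trait definition,
    // so implementors never need to write it (or any unsafe code) themselves.
    fn launch(&self, arg: Self::Target1) -> Result<String, String> {
        let kernel = self.get_kernel()?;
        Ok(format!("launched {} with {:?}", kernel.name, arg))
    }
}

/// Stand-in for the auto-generated `f::Module` wrapper.
struct FModule;

impl Launchable1Model for FModule {
    type Target1 = i32; // first argument of `f`

    fn get_kernel(&self) -> Result<KernelModel, String> {
        Ok(KernelModel { name: "f".to_string() })
    }
}

// Calls the provided `launch` through the generated-style wrapper.
fn launch_demo() -> Result<String, String> {
    FModule.launch(7)
}
```

Because the implementor supplies only get_kernel, the generated code stays free of launch details, which matches the motivation given above for keeping the two methods separate.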

Traits

DeviceSend

Type which can be sent to device

Launchable0

Launchable Kernel with N-arguments

Launchable1

Launchable Kernel with N-arguments

Launchable2

Launchable Kernel with N-arguments

Launchable3

Launchable Kernel with N-arguments

Launchable4

Launchable Kernel with N-arguments

Launchable5

Launchable Kernel with N-arguments

Launchable6

Launchable Kernel with N-arguments

Launchable7

Launchable Kernel with N-arguments

Launchable8

Launchable Kernel with N-arguments

Launchable9

Launchable Kernel with N-arguments

Launchable10

Launchable Kernel with N-arguments

Launchable11

Launchable Kernel with N-arguments

Launchable12

Launchable Kernel with N-arguments