Blobs

When a view is created, it needs to be given an array of blobs. A blob is an object representing a contiguous region of memory where each byte is accessible using the subscript operator. The number of blobs and the alignment and size of each blob are properties determined by the mapping used by the view. All this is handled by llama::allocView(), but it needs to be given a blob allocator to handle the actual allocation of each blob.

auto blobAllocator = ...;
auto view = llama::allocView(mapping, blobAllocator);

Every time a view is copied, its array of blobs is copied too. Depending on the type of blobs used, this can have different effects. If, e.g., std::vector<std::byte> is used, the full storage will be copied. Conversely, if a std::shared_ptr<std::byte[]> is used, the storage is shared between all copies of the view.
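The difference between the two blob types can be illustrated without LLAMA at all. The following is a minimal sketch using only the standard library (C++17); the function names are made up for illustration, and the two local variables merely stand in for the blob stored inside a view and inside its copy.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Copying a std::vector<std::byte> duplicates the storage (deep copy).
inline bool vectorBlobCopyIsIndependent()
{
    std::vector<std::byte> original(16, std::byte{0});
    auto copy = original; // allocates a second buffer and copies 16 bytes
    copy[0] = std::byte{42};
    return original[0] == std::byte{0}; // the original is untouched
}

// Copying a std::shared_ptr<std::byte[]> only copies the handle.
inline bool sharedPtrBlobCopyAliases()
{
    std::shared_ptr<std::byte[]> original(new std::byte[16]{});
    auto copy = original; // both handles refer to the same 16 bytes
    copy[0] = std::byte{42};
    return original[0] == std::byte{42}; // the write is visible through the original
}
```

Both functions return true: a write through the copied vector does not reach the original, while a write through the copied shared pointer does.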

Blob allocators

A blob allocator is a callable which returns an appropriately sized blob given a desired compile-time alignment and a runtime allocation size in bytes. Choosing the right compile-time alignment affects the read/write speed on some CPU architectures, and improperly aligned data may even lead to CPU exceptions. A blob allocator is called like this:

auto blobAllocator = ...;
auto blob = blobAllocator(std::integral_constant<std::size_t, FieldAlignment>{}, size);
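To make the calling convention concrete, here is a sketch of what a user-written blob allocator could look like. The names AlignedDeleter and AlignedUniqueAllocator are hypothetical and not part of LLAMA; the sketch only assumes the call signature shown above and uses C++17 aligned operator new to honor the requested alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical deleter releasing memory obtained from aligned operator new[].
template<std::size_t Alignment>
struct AlignedDeleter
{
    void operator()(std::byte* p) const
    {
        ::operator delete[](p, std::align_val_t{Alignment});
    }
};

// Hypothetical blob allocator: a callable taking the alignment as a
// compile-time constant and the size in bytes as a runtime value.
struct AlignedUniqueAllocator
{
    template<std::size_t Alignment>
    auto operator()(std::integral_constant<std::size_t, Alignment>, std::size_t size) const
    {
        // The returned bytes are uninitialized.
        auto* p = static_cast<std::byte*>(::operator new[](size, std::align_val_t{Alignment}));
        return std::unique_ptr<std::byte[], AlignedDeleter<Alignment>>{p};
    }
};
```

Because the alignment arrives as a type (std::integral_constant), it can be used as a template argument inside the allocator, here to select the matching deleter.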

There are a number of built-in blob allocators:

Vector

llama::bloballoc::Vector is a blob allocator creating blobs of type std::vector<std::byte>. This means every time a view is copied, the whole memory is copied too. When the view is moved, no extra allocation or copy operation happens.

Shared pointer

llama::bloballoc::SharedPtr is a blob allocator creating blobs of type std::shared_ptr<std::byte[]>. These blobs will be shared between all copies of the view and only destroyed when the last view is destroyed.

Unique pointer

llama::bloballoc::UniquePtr is a blob allocator creating blobs of type std::unique_ptr<std::byte[], ...>. These blobs will be uniquely owned by a single view, so the view cannot be copied, only moved.
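Why a view with uniquely owned blobs is move-only can be checked with type traits. In the sketch below, ToyView is a hypothetical stand-in for a view that simply stores its blobs by value; it is not LLAMA's view type, and the default deleter is used instead of whatever deleter LLAMA chooses.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a view: it just stores its blobs by value.
template<typename Blob>
struct ToyView
{
    std::array<Blob, 1> blobs;
};

using UniqueBlob = std::unique_ptr<std::byte[]>;

// The blob type's copy/move semantics propagate to the type holding it.
static_assert(!std::is_copy_constructible_v<ToyView<UniqueBlob>>);
static_assert(std::is_move_constructible_v<ToyView<UniqueBlob>>);

inline bool moveTransfersOwnership()
{
    ToyView<UniqueBlob> a;
    a.blobs[0] = UniqueBlob{new std::byte[8]{}};
    std::byte* storage = a.blobs[0].get();

    ToyView<UniqueBlob> b = std::move(a); // no allocation, no byte copy
    return b.blobs[0].get() == storage && a.blobs[0] == nullptr;
}
```

After the move, the new object owns the very same storage and the moved-from object holds a null pointer.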

Array

When working with small amounts of memory or temporary views created frequently, it is usually beneficial to store the data directly inside the view, avoiding a heap allocation.

llama::bloballoc::Array addresses this issue and creates blobs of type llama::Array<std::byte, N>, where N is a compile-time value passed to the allocator. These blobs are copied every time their view is copied. llama::One uses this facility. In many such cases, the extents of the array dimensions are also known at compile time, so they can be specified in the template argument list of llama::ArrayExtents.

Creating a small view of \(4 \times 4\) may look like this:

using ArrayExtents = llama::ArrayExtents<int, 4, 4>;
constexpr ArrayExtents extents{};

using Mapping = /* a simple mapping */;
// the trailing {} creates an allocator object from the allocator type
auto blobAllocator = llama::bloballoc::Array<
    extents[0] * extents[1] * llama::sizeOf<RecordDim>::value
>{};
auto miniView = llama::allocView(Mapping{extents}, blobAllocator);

// or in case the mapping is constexpr and produces just 1 blob:
constexpr auto mapping = Mapping{extents};
auto miniView = llama::allocView(mapping, llama::bloballoc::Array<mapping.blobSize(0)>{});
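The effect of storing the bytes inside the view can be sketched with std::array as a stand-in for llama::Array. Pixel and InlineView below are hypothetical names for illustration only; the size computation mirrors the extents-times-record-size expression used above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical record type with three float fields.
struct Pixel
{
    float r, g, b;
};

constexpr std::size_t rows = 4;
constexpr std::size_t cols = 4;
constexpr std::size_t blobSize = rows * cols * sizeof(Pixel); // known at compile time

// Hypothetical stand-in for a view whose single blob lives inside the object.
struct InlineView
{
    std::array<std::byte, blobSize> blob{};
};

// The storage is part of the object itself, so no heap allocation is needed.
static_assert(sizeof(InlineView) >= blobSize);

inline bool inlineCopyIsIndependent()
{
    InlineView a;
    InlineView b = a; // copies all blobSize bytes
    b.blob[0] = std::byte{1};
    return a.blob[0] == std::byte{0};
}
```

The trade-off is that every copy of such an object copies all of its bytes, which is why this approach suits small or short-lived views.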

For \(N\)-dimensional one-record views a shortcut exists, returning a view with just one record on the stack:

auto tempView = llama::allocScalarView<N, RecordDim>();

CudaMalloc

llama::bloballoc::CudaMalloc is a blob allocator creating blobs of type std::unique_ptr<std::byte[], ...>. The memory is allocated using cudaMalloc, and the unique pointer frees it using cudaFree. This allocator is automatically available if the <cuda_runtime.h> header is available.

AlpakaBuf

llama::bloballoc::AlpakaBuf is a blob allocator for creating alpaka buffers as blobs. This allocator is automatically available if the <alpaka/alpaka.hpp> header is available.

auto view = llama::allocView(mapping, llama::bloballoc::AlpakaBuf{alpakaDev});

Using this blob allocator is essentially the same as:

auto view = llama::allocView(mapping, [&alpakaDev](auto align, std::size_t size){
    return alpaka::allocBuf<std::byte, std::size_t>(alpakaDev, size);
});

You may want to use the latter version in case the buffer creation is more complex.

Non-owning blobs

If a view is needed on top of already allocated memory, the view can also be constructed directly with an array of blobs, e.g. an array of std::byte* pointers or std::span<std::byte> objects referring to the existing memory regions. Any blob type works here as long as the view can subscript it like blob[offset]. One needs to be careful though, since the ownership of the blobs is now decoupled from the view. It is the user's responsibility to ensure that the blobs outlive the views based on them.
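The subscript requirement and the lifetime caveat can be sketched with plain standard-library code. storeByte is a hypothetical helper that accesses a blob only through blob[offset], which is the one operation the text above relies on.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the only thing required of a blob is blob[offset].
template<typename Blob>
void storeByte(Blob& blob, std::size_t offset, std::byte value)
{
    blob[offset] = value;
}

inline bool nonOwningBlobWritesThrough()
{
    std::vector<std::byte> storage(32); // the owner; must outlive the pointer below
    std::byte* blob = storage.data();   // non-owning blob referring to the same bytes

    storeByte(blob, 3, std::byte{9});
    return storage[3] == std::byte{9};  // the write landed in the owner's memory
}
```

If storage were destroyed or reallocated while blob was still in use, every access through blob would be undefined behavior; nothing in the pointer type prevents this.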

Alpaka

LLAMA features some examples using alpaka to abstract the parallelization of computations. Alpaka has its own memory allocation functions for different memory regions (e.g. host, device and shared memory). Additionally, there are some CUDA-inherited rules which make sharing memory regions hard (e.g. a std::shared_ptr cannot be used on a GPU).

Alpaka creates and manages memory using buffers. A pointer to the underlying storage of a buffer can be obtained, which may be used for a LLAMA view:

auto buffer = alpaka::allocBuf<std::byte, std::size_t>(dev, size);
auto view = llama::View<Mapping, std::byte*>{mapping, {alpaka::getPtrNative(buffer)}};

This is an alternative to the llama::bloballoc::AlpakaBuf blob allocator, if the user wants to decouple buffer allocation and view creation.

Shared memory is created by alpaka using a special function returning a reference to a shared variable. To allocate storage for LLAMA, we can allocate a shared byte array using alpaka and then pass the address of the first element to a LLAMA view.

auto& sharedMem = alpaka::declareSharedVar<std::byte[sharedMemSize], __COUNTER__>(acc);
auto view = llama::View<Mapping, std::byte*>{mapping, {&sharedMem[0]}};

Shallow copy

The type of a view’s blobs determines part of the semantics of the view. It is sometimes useful to strip this type information from a view and create a new view reusing the same memory as the old one, but using a plain referential blob type (e.g. a std::byte*). This is what llama::shallowCopy is for.
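Conceptually, such a shallow copy replaces each owning blob by a plain pointer to the same bytes. The overload set below is a hypothetical illustration of that idea for three standard blob types; it is not how llama::shallowCopy is implemented, and referToBlob is a made-up name.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helpers: obtain a non-owning pointer to an owning blob's bytes.
inline std::byte* referToBlob(std::vector<std::byte>& blob)
{
    return blob.data();
}

inline std::byte* referToBlob(std::unique_ptr<std::byte[]>& blob)
{
    return blob.get();
}

inline std::byte* referToBlob(std::shared_ptr<std::byte[]>& blob)
{
    return blob.get();
}

inline bool shallowReferenceAliasesOwner()
{
    std::vector<std::byte> owner(8);
    std::byte* shallow = referToBlob(owner); // same memory, no ownership
    shallow[1] = std::byte{3};
    return owner[1] == std::byte{3};
}
```

The resulting pointer is trivially copyable, which is what makes it convenient to pass by value, but it is only valid while the owning blob is alive.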

This is especially useful when passing views with more complicated blob types to accelerators. E.g. views using the llama::bloballoc::CudaMalloc allocator:

E.g. views using alpaka buffers as blobs: