API

Users only need to include llama.hpp to make all functionality available. All basic functionality of the library is in the namespace llama or its sub-namespaces.

Useful helpers

template<typename T>
struct NrAndOffset
template<typename T>
constexpr auto llama::structName(T = {}) -> std::string_view
using llama::CopyConst = std::conditional_t<std::is_const_v<FromT>, const ToT, ToT>

Alias for ToT, adding const if FromT is const qualified.
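The behaviour of this alias can be checked at compile time. The following is a minimal sketch that re-declares an alias with the same definition under the hypothetical name CopyConstSketch, so that it compiles without llama.hpp:

```cpp
#include <type_traits>

// Hypothetical stand-in with the same definition as the alias documented above.
template<typename FromT, typename ToT>
using CopyConstSketch = std::conditional_t<std::is_const_v<FromT>, const ToT, ToT>;

// Non-const source type: ToT is passed through unchanged.
static_assert(std::is_same_v<CopyConstSketch<int, float>, float>);
// Const source type: const is added to ToT.
static_assert(std::is_same_v<CopyConstSketch<const int, float>, const float>);
// A pointer to const is not itself const qualified, so nothing is added.
static_assert(std::is_same_v<CopyConstSketch<const int*, float>, float>);
```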

template<typename Derived, typename ValueType>
struct ProxyRefOpMixin

CRTP mixin for proxy reference types to support all compound assignment and increment/decrement operators.
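To illustrate the idea, here is a minimal sketch of such a mixin. It is not LLAMA's implementation: the names CompoundOpsSketch and LowNibbleRef are hypothetical, and only a few operators are shown. The sketch assumes the derived proxy type provides a conversion to ValueType (load) and an assignment from ValueType (store).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mixin: builds compound operators from the load/store of Derived.
template<typename Derived, typename ValueType>
struct CompoundOpsSketch {
    Derived& self() { return static_cast<Derived&>(*this); }

    Derived& operator+=(const ValueType& rhs) {
        ValueType v = static_cast<ValueType>(self()); // load
        v += rhs;
        self() = v; // store
        return self();
    }
    Derived& operator*=(const ValueType& rhs) {
        ValueType v = static_cast<ValueType>(self());
        v *= rhs;
        self() = v;
        return self();
    }
    Derived& operator++() { return *this += ValueType{1}; }
    ValueType operator++(int) {
        const ValueType old = static_cast<ValueType>(self());
        ++*this;
        return old;
    }
};

// Hypothetical proxy reference to the low 4 bits of a byte.
struct LowNibbleRef : CompoundOpsSketch<LowNibbleRef, int> {
    std::uint8_t& byte;
    explicit LowNibbleRef(std::uint8_t& b) : byte(b) {}

    operator int() const { return byte & 0x0F; }
    LowNibbleRef& operator=(int v) {
        byte = static_cast<std::uint8_t>((byte & 0xF0) | (v & 0x0F));
        return *this;
    }
};
```

The proxy only has to implement load and store; all read-modify-write operators come from the mixin.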

Array

template<typename T, std::size_t N>
struct Array

Array class like std::array but suitable for use with offloading devices like GPUs.

Template Parameters
  • T – type of array elements.

  • N – rank of the array.

template<typename T, std::size_t N>
inline constexpr auto llama::pushFront([[maybe_unused]] Array<T, N> a, T v) -> Array<T, N + 1>
template<typename T, std::size_t N>
inline constexpr auto llama::pushBack([[maybe_unused]] Array<T, N> a, T v) -> Array<T, N + 1>
template<typename T, std::size_t N>
inline constexpr auto llama::popFront([[maybe_unused]] Array<T, N> a)
template<typename T, std::size_t N>
inline constexpr auto llama::popBack([[maybe_unused]] Array<T, N> a)
template<typename T, std::size_t N>
inline constexpr auto llama::product(Array<T, N> a) -> T
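The helpers above change the array size in the type. A minimal sketch of how such helpers can be written, using a hypothetical MiniArray stand-in rather than llama::Array:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size, constexpr-friendly array.
template<typename T, std::size_t N>
struct MiniArray {
    T elems[N > 0 ? N : 1]; // avoid a zero-sized array for N == 0
    constexpr T& operator[](std::size_t i) { return elems[i]; }
    constexpr const T& operator[](std::size_t i) const { return elems[i]; }
};

template<typename T, std::size_t N>
constexpr auto pushFront(MiniArray<T, N> a, T v) -> MiniArray<T, N + 1> {
    MiniArray<T, N + 1> r{};
    r[0] = v;
    for (std::size_t i = 0; i < N; ++i) r[i + 1] = a[i];
    return r;
}

template<typename T, std::size_t N>
constexpr auto pushBack(MiniArray<T, N> a, T v) -> MiniArray<T, N + 1> {
    MiniArray<T, N + 1> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = a[i];
    r[N] = v;
    return r;
}

template<typename T, std::size_t N>
constexpr auto popFront(MiniArray<T, N> a) -> MiniArray<T, N - 1> {
    static_assert(N > 0, "cannot pop from an empty array");
    MiniArray<T, N - 1> r{};
    for (std::size_t i = 1; i < N; ++i) r[i - 1] = a[i];
    return r;
}

// Product of all elements; 1 for an empty array.
template<typename T, std::size_t N>
constexpr auto product(MiniArray<T, N> a) -> T {
    T p = 1;
    for (std::size_t i = 0; i < N; ++i) p *= a[i];
    return p;
}

static_assert(product(pushBack(MiniArray<int, 2>{{2, 3}}, 4)) == 24);
```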

Tuple

template<typename ...Elements>
struct Tuple
template<std::size_t I, typename ...Elements>
inline constexpr auto llama::get(Tuple<Elements...> &tuple) -> auto&
template<typename Tuple1, typename Tuple2>
inline constexpr auto llama::tupleCat(const Tuple1 &t1, const Tuple2 &t2)
template<std::size_t Pos, typename Tuple, typename Replacement>
inline constexpr auto llama::tupleReplace(Tuple &&tuple, Replacement &&replacement)

Creates a copy of a tuple with the element at position Pos replaced by replacement.

template<typename ...Elements, typename Functor>
inline constexpr auto llama::tupleTransform(const Tuple<Elements...> &tuple, const Functor &functor)

Applies a functor to every element of a tuple, creating a new tuple with the result of the element transformations. The functor needs to implement a template operator() to which all tuple elements are passed.
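The following sketch shows the same two operations on std::tuple instead of llama::Tuple. The names tupleTransformSketch, tupleReplaceSketch and Doubler are hypothetical. Note that the functor has a template operator(), as required above, so it can accept elements of different types.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Applies f to every element and collects the results in a new tuple.
template<typename... Elements, typename Functor>
auto tupleTransformSketch(const std::tuple<Elements...>& t, const Functor& f) {
    return std::apply([&](const auto&... e) { return std::make_tuple(f(e)...); }, t);
}

// Returns element I of t, or the replacement if I is the position to replace.
template<std::size_t Pos, std::size_t I, typename Tuple, typename Replacement>
auto pickOrReplace(const Tuple& t, const Replacement& r) {
    if constexpr (I == Pos)
        return r;
    else
        return std::get<I>(t);
}

template<std::size_t Pos, typename Tuple, typename Replacement, std::size_t... Is>
auto tupleReplaceImpl(const Tuple& t, const Replacement& r, std::index_sequence<Is...>) {
    return std::make_tuple(pickOrReplace<Pos, Is>(t, r)...);
}

// Copy of t with the element at position Pos replaced by r.
template<std::size_t Pos, typename... Elements, typename Replacement>
auto tupleReplaceSketch(const std::tuple<Elements...>& t, const Replacement& r) {
    return tupleReplaceImpl<Pos>(t, r, std::index_sequence_for<Elements...>{});
}

// Example functor with a template operator().
struct Doubler {
    template<typename T>
    auto operator()(const T& v) const { return v + v; }
};
```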

template<typename ...Elements>
inline constexpr auto llama::popFront(const Tuple<Elements...> &tuple)

Returns a copy of the tuple without the first element.

Array dimensions

template<typename T = std::size_t, T... Sizes>
struct ArrayExtents : public llama::Array<T, ((Sizes == dyn) + ... + 0)>

ArrayExtents holding compile-time and runtime extents. This is conceptually equivalent to the std::extents of std::mdspan (see https://wg21.link/P0009), including the changes to make the size_type controllable.

Subclassed by llama::ArrayIndexRange< ArrayExtents >

using llama::ArrayExtentsDynamic = ArrayExtentsNCube<SizeType, N, dyn>

N-dimensional ArrayExtents where all values are dynamic.

using llama::ArrayExtentsNCube = decltype(internal::makeArrayExtents<SizeType, Extent>(std::make_index_sequence<N>{}))

N-dimensional ArrayExtents where all N extents are Extent.
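The base class in the ArrayExtents declaration above is sized by the number of extents equal to dyn, which suggests that only runtime extents occupy storage. A minimal sketch of that idea, with hypothetical names (dynSketch, ExtentsSketch) and std::size_t as the fixed size type:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical marker value for an extent that is only known at run time.
inline constexpr std::size_t dynSketch = std::numeric_limits<std::size_t>::max();

template<std::size_t... Sizes>
struct ExtentsSketch {
    static constexpr std::size_t rank = sizeof...(Sizes);
    static constexpr std::size_t rankDynamic = ((Sizes == dynSketch) + ... + 0);

    // Only the dynamic extents are stored.
    std::array<std::size_t, rankDynamic> dynamicValues{};

    constexpr std::size_t operator[](std::size_t i) const {
        constexpr std::array<std::size_t, rank> staticValues{Sizes...};
        std::size_t d = 0; // number of dynamic extents before position i
        for (std::size_t j = 0; j < i; ++j)
            if (staticValues[j] == dynSketch) ++d;
        return staticValues[i] == dynSketch ? dynamicValues[d] : staticValues[i];
    }
};

// A 3 x ? x ? extent object stores just two values.
static_assert(sizeof(ExtentsSketch<3, dynSketch, dynSketch>) == 2 * sizeof(std::size_t));
```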

template<typename T, std::size_t Dim>
struct ArrayIndex : public llama::Array<T, Dim>

Represents a run-time index into the array dimensions.

Template Parameters

Dim – Compile-time number of dimensions.

template<typename ArrayExtents>
struct ArrayIndexIterator

Iterator supporting ArrayIndexRange.

template<typename ArrayExtents>
struct ArrayIndexRange : private llama::ArrayExtents<T, Sizes>

Range for iterating over all indices in an ArrayExtents.

template<typename SizeType, SizeType... Sizes, typename Func>
inline void llama::forEachADCoord(ArrayExtents<SizeType, Sizes...> extents, Func &&func)
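Visiting every index of an N-dimensional extent can be sketched as an odometer-style loop. The function below is a hypothetical stand-in built on std::array. It assumes that the last dimension varies fastest, which may differ from the iteration order LLAMA uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Calls func once for every index within the given extents.
template<std::size_t Dim, typename Func>
void forEachIndexSketch(const std::array<std::size_t, Dim>& extents, Func&& func) {
    for (std::size_t e : extents)
        if (e == 0) return; // empty index space

    std::array<std::size_t, Dim> idx{};
    while (true) {
        func(idx);
        // Increment like an odometer, starting at the last dimension.
        std::size_t d = Dim;
        while (d > 0) {
            --d;
            if (++idx[d] < extents[d]) break; // no carry needed
            idx[d] = 0;                       // wrap around and carry
            if (d == 0) return;               // carried out of the first dimension
        }
        if (Dim == 0) return; // a zero-dimensional space has exactly one index
    }
}
```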

Record dimension

template<typename ...Fields>
struct Record

A type list of Fields which may be used to define a record dimension.

template<typename Tag, typename Type>
struct Field

Record dimension tree node which may either be a leaf or refer to a child tree presented as another Record.

Template Parameters
  • Tag – Name of the node. May be any type (struct, class).

  • Type – Type of the node. May be one of three cases:

    1. Another sub tree consisting of a nested Record.
    2. An array of static size of any type, in which case a Record with as many Fields as the array size is created, named RecordCoord specialized on consecutive numbers I.
    3. A scalar type different from Record, making this node a leaf of this type.
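A record dimension is thus a compile-time tree of tag/type pairs. The sketch below uses hypothetical stand-ins (RecordSketch, FieldSketch, LeafCount) to show a nested record and a metafunction that counts its leaves. It covers only the nested-record and scalar cases, not static arrays.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for the Record and Field templates.
template<typename... Fields>
struct RecordSketch {};
template<typename Tag, typename Type>
struct FieldSketch {};

// Tags are plain empty types.
struct Pos {};
struct X {};
struct Y {};
struct Mass {};

// A particle with a nested position record and a scalar mass.
using Particle = RecordSketch<
    FieldSketch<Pos, RecordSketch<FieldSketch<X, float>, FieldSketch<Y, float>>>,
    FieldSketch<Mass, double>>;

// Any type that is not a record counts as one leaf.
template<typename T>
struct LeafCount : std::integral_constant<std::size_t, 1> {};

// A record has as many leaves as all of its fields together.
template<typename... Tags, typename... Types>
struct LeafCount<RecordSketch<FieldSketch<Tags, Types>...>>
    : std::integral_constant<std::size_t, (LeafCount<Types>::value + ... + 0)> {};

static_assert(LeafCount<Particle>::value == 3); // Pos.X, Pos.Y, Mass
```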

struct NoName

Anonymous naming for a Field.

using llama::GetFieldTag = mp_first<Field>

Get the tag from a Field.

using llama::GetFieldType = mp_second<Field>

Get the type from a Field.

template<typename RecordDim, typename RecordCoord, bool Align = false>
constexpr std::size_t llama::offsetOf = flatOffsetOf<FlatRecordDim<RecordDim>, flatRecordCoord<RecordDim, RecordCoord>, Align>

The byte offset of an element in a record dimension if it were laid out as a normal struct.

Template Parameters
  • RecordDim – Record dimension tree.

  • RecordCoord – Record coordinate of an element in the record dimension tree.

template<typename T, bool Align = false, bool IncludeTailPadding = true>
constexpr std::size_t llama::sizeOf = sizeof(T)

The size of a type T.

template<typename T>
constexpr std::size_t llama::alignOf = alignof(T)

The alignment of a type T.
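The effect of the Align flag can be sketched on a flat list of types. The helpers below are hypothetical (offsetOfSketch, sizeOfSketch) and work on a parameter pack instead of a record dimension. With alignment enabled they are compared against the layout the compiler picks for an equivalent struct.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t roundUp(std::size_t v, std::size_t multiple) {
    return (v + multiple - 1) / multiple * multiple;
}

// Byte offset of the index-th type, either tightly packed or with alignment padding.
template<bool Align, typename... Ts>
constexpr std::size_t offsetOfSketch(std::size_t index) {
    constexpr std::size_t sizes[] = {sizeof(Ts)...};
    constexpr std::size_t aligns[] = {alignof(Ts)...};
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) {
        if (Align) offset = roundUp(offset, aligns[i]);
        offset += sizes[i];
    }
    return Align ? roundUp(offset, aligns[index]) : offset;
}

// Total size; with alignment this includes tail padding.
template<bool Align, typename... Ts>
constexpr std::size_t sizeOfSketch() {
    constexpr std::size_t sizes[] = {sizeof(Ts)...};
    constexpr std::size_t aligns[] = {alignof(Ts)...};
    std::size_t size = 0;
    std::size_t maxAlign = 1;
    for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
        if (Align) {
            size = roundUp(size, aligns[i]);
            if (aligns[i] > maxAlign) maxAlign = aligns[i];
        }
        size += sizes[i];
    }
    return Align ? roundUp(size, maxAlign) : size;
}

// Reference struct laid out by the compiler.
struct Ref {
    std::uint8_t a;
    std::uint32_t b;
    std::uint16_t c;
};

static_assert(offsetOfSketch<true, std::uint8_t, std::uint32_t, std::uint16_t>(1) == offsetof(Ref, b));
static_assert(sizeOfSketch<true, std::uint8_t, std::uint32_t, std::uint16_t>() == sizeof(Ref));
static_assert(offsetOfSketch<false, std::uint8_t, std::uint32_t, std::uint16_t>(1) == 1);
```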

using llama::GetTags = typename internal::GetTagsImpl<RecordDim, RecordCoord>::type

Get the tags of all Fields from the root of the record dimension tree down to the node identified by RecordCoord.

using llama::GetTag = typename internal::GetTagImpl<RecordDim, RecordCoord>::type

Get the tag of the Field at a RecordCoord inside the record dimension tree.

template<typename RecordDimA, typename LocalA, typename RecordDimB, typename LocalB>
constexpr auto llama::hasSameTags = []() constexpr {
    if constexpr(LocalA::size != LocalB::size)
        return false;
    else if constexpr(LocalA::size == 0 && LocalB::size == 0)
        return true;
    else
        return std::is_same_v<GetTags<RecordDimA, LocalA>, GetTags<RecordDimB, LocalB>>;
}()

Is true if, starting at two coordinates in two record dimensions, all subsequent nodes in the record dimension tree have the same tag.

Template Parameters
  • RecordDimA – First record dimension.

  • LocalA – RecordCoord based on StartA along which the tags are compared.

  • RecordDimB – Second record dimension.

  • LocalB – RecordCoord based on StartB along which the tags are compared.

using llama::GetCoordFromTags = typename internal::GetCoordFromTagsImpl<RecordDim, RecordCoord<>, TagsOrTagList...>::type

Converts a series of tags, or a list of tags, navigating down a record dimension into a RecordCoord. A RecordCoord will be passed through unmodified.

using llama::GetType = typename internal::GetTypeImpl<RecordDim, RecordCoordOrTags...>::type

Returns the type of a node in a record dimension tree identified by a given RecordCoord or a series of tags.

using llama::FlatRecordDim = typename internal::FlattenRecordDimImpl<RecordDim>::type

Returns a flat type list containing all leaf field types of the given record dimension.

template<typename RecordDim, typename RecordCoord>
constexpr std::size_t llama::flatRecordCoord = 0

The equivalent zero based index into a flat record dimension (FlatRecordDim) of the given hierarchical record coordinate.

using llama::LeafRecordCoords = typename internal::LeafRecordCoordsImpl<RecordDim, RecordCoord<>>::type

Returns a flat type list containing all record coordinates to all leaves of the given record dimension.

using llama::TransformLeaves = TransformLeavesWithCoord<RecordDim, internal::MakePassSecond<FieldTypeFunctor>::template fn>

Creates a new record dimension where each new leaf field’s type is the result of applying FieldTypeFunctor to the original leaf field’s type.

using llama::MergedRecordDims = typename decltype(internal::mergeRecordDimsImpl(mp_identity<RecordDimA>{}, mp_identity<RecordDimB>{}))::type

Creates a merged record dimension, where duplicated, nested fields are unified.

template<typename RecordDim, typename Functor, typename ...Tags>
inline constexpr void llama::forEachLeafCoord(Functor &&functor, Tags...)

Iterates over the record dimension tree and calls a functor on each element.

Parameters
  • functor – Functor to execute at each element of the record dimension. Needs to have operator() with a template parameter for the RecordCoord in the record dimension tree.

  • baseTags – Tags used to define where the iteration should be started. The functor is called on elements beneath this coordinate.

template<typename RecordDim, typename Functor, std::size_t... Coords>
inline constexpr void llama::forEachLeafCoord(Functor &&functor, RecordCoord<Coords...> baseCoord)

Iterates over the record dimension tree and calls a functor on each element.

Parameters
  • functor – Functor to execute at each element of the record dimension. Needs to have operator() with a template parameter for the RecordCoord in the record dimension tree.

  • baseCoord – RecordCoord at which the iteration should be started. The functor is called on elements beneath this coordinate.

template<typename RecordDim, std::size_t... Coords>
constexpr auto llama::prettyRecordCoord(RecordCoord<Coords...> = {}) -> std::string_view

Returns a pretty representation of the record coordinate inside the given record dimension. Tags are separated by ‘.’ and arrays are represented using subscript notation (“[123]”).

Record coordinates

template<std::size_t... Coords>
struct RecordCoord

Represents a coordinate for a record inside the record dimension tree.

Template Parameters

Coords... – the compile time coordinate.

Public Types

using List = mp_list_c<std::size_t, Coords...>

The list of integral coordinates as mp_list.

using llama::RecordCoordFromList = internal::mp_unwrap_values_into<L, RecordCoord>

Converts a type list of integral constants into a RecordCoord.

using llama::Cat = RecordCoordFromList<mp_append<typename RecordCoords::List...>>

Concatenate a set of RecordCoords.

using llama::PopFront = RecordCoordFromList<mp_pop_front<typename RecordCoord::List>>

RecordCoord without first coordinate component.

template<typename First, typename Second>
constexpr auto llama::recordCoordCommonPrefixIsBigger = internal::recordCoordCommonPrefixIsBiggerImpl(First{}, Second{})

Checks whether the first RecordCoord is bigger than the second.

template<typename First, typename Second>
constexpr auto llama::recordCoordCommonPrefixIsSame = internal::recordCoordCommonPrefixIsSameImpl(First{}, Second{})

Checks whether two RecordCoords are the same or one is the prefix of the other.
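The coordinate operations in this section are pure type computations. The sketch below re-creates three of them with hypothetical names (CoordSketch, CatSketch, PopFrontSketch, commonPrefixIsSame) and plain template specialization instead of the mp_ list utilities that appear in the signatures above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a compile-time coordinate.
template<std::size_t... Coords>
struct CoordSketch {};

// Concatenation of two coordinates.
template<typename A, typename B>
struct CatImpl;
template<std::size_t... As, std::size_t... Bs>
struct CatImpl<CoordSketch<As...>, CoordSketch<Bs...>> {
    using type = CoordSketch<As..., Bs...>;
};
template<typename A, typename B>
using CatSketch = typename CatImpl<A, B>::type;

// Coordinate without its first component.
template<typename C>
struct PopFrontImpl;
template<std::size_t First, std::size_t... Rest>
struct PopFrontImpl<CoordSketch<First, Rest...>> {
    using type = CoordSketch<Rest...>;
};
template<typename C>
using PopFrontSketch = typename PopFrontImpl<C>::type;

// True if both coordinates agree on their common prefix,
// i.e. they are the same or one is a prefix of the other.
template<std::size_t... As, std::size_t... Bs>
constexpr bool commonPrefixIsSame(CoordSketch<As...>, CoordSketch<Bs...>) {
    constexpr std::size_t a[] = {As..., 0}; // trailing 0 avoids zero-sized arrays
    constexpr std::size_t b[] = {Bs..., 0};
    constexpr std::size_t n = sizeof...(As) < sizeof...(Bs) ? sizeof...(As) : sizeof...(Bs);
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

static_assert(std::is_same_v<CatSketch<CoordSketch<0, 1>, CoordSketch<2>>, CoordSketch<0, 1, 2>>);
static_assert(std::is_same_v<PopFrontSketch<CoordSketch<0, 1, 2>>, CoordSketch<1, 2>>);
```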

Views

template<typename Mapping, typename Allocator = bloballoc::Vector, typename Accessor = accessor::Default>
inline auto llama::allocView(Mapping mapping = {}, const Allocator &alloc = {}, Accessor accessor = {}) -> View<Mapping, internal::AllocatorBlobType<Allocator, typename Mapping::RecordDim>, Accessor>

Creates a view based on the given mapping, e.g. AoS or SoA. For allocating the view’s underlying memory, the specified allocator callable is used (or the default one, which is bloballoc::Vector). The allocator callable is called with the alignment and the number of bytes to allocate for each blob of the mapping. Value-initialization is performed for all fields by calling constructFields. This function is the preferred way to create a View. See also allocViewUninitialized.

template<typename Mapping, typename BlobType, typename Accessor>
inline void llama::constructFields(View<Mapping, BlobType, Accessor> &view)

Value-initializes all fields reachable through the given view. That is, constructors are run and fundamental types are zero-initialized. Computed fields are constructed if they return l-value references and assigned a default constructed value if they return a proxy reference.

template<typename Mapping, typename Allocator = bloballoc::Vector, typename Accessor = accessor::Default>
inline auto llama::allocViewUninitialized(Mapping mapping = {}, const Allocator &alloc = {}, Accessor accessor = {})

Same as allocView but does not run field constructors.

template<std::size_t Dim, typename RecordDim>
inline auto llama::allocViewStack() -> decltype(auto)

Allocates a View holding a single record backed by stack memory (bloballoc::Array).

Template Parameters

Dim – Dimension of the ArrayExtents of the View.

using llama::One = RecordRef<decltype(allocViewStack<0, RecordDim>()), RecordCoord<>, true>

A RecordRef that owns and holds a single value.

template<typename View, typename BoundRecordCoord, bool OwnView>
inline auto llama::copyRecord(const RecordRef<View, BoundRecordCoord, OwnView> &rr)

Returns a One with the same record dimension as the given record ref, with values copied from rr.

template<typename View, typename TransformBlobFunc, typename = std::enable_if_t<isView<std::decay_t<View>>>>
inline auto llama::transformBlobs(View &view, const TransformBlobFunc &transformBlob)

Applies the given transformation to the blobs of a view and creates a new view with the transformed blobs and the same mapping and accessor as the old view.

template<typename View, typename NewBlobType = CopyConst<View, std::byte>*, typename = std::enable_if_t<isView<std::decay_t<View>>>>
inline auto llama::shallowCopy(View &view)

Creates a shallow copy of a view. This copy must not outlive the view, since it references its blob array.

Template Parameters

NewBlobType – The blob type of the shallow copy. Must be a non-owning, pointer-like type.

Returns

A new view with the same mapping as view, where each blob refers to the blob in view.

template<typename NewMapping, typename Mapping, typename BlobType, typename Accessor>
inline auto llama::withMapping(View<Mapping, BlobType, Accessor> view, NewMapping newMapping = {})
template<typename NewAccessor, typename Mapping, typename BlobType, typename OldAccessor>
inline auto llama::withAccessor(View<Mapping, BlobType, OldAccessor> view, NewAccessor newAccessor = {})

Blob allocators

struct Vector

Allocates heap memory managed by a std::vector for a View, which is copied each time a View is copied.

struct SharedPtr

Allocates heap memory managed by a std::shared_ptr for a View. This memory is shared between all copies of a View.

struct UniquePtr

Allocates heap memory managed by a std::unique_ptr for a View. This memory can only be uniquely owned by a single View.
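The difference between these allocators lies mainly in what happens to the blob when it is copied. The sketch below uses two hypothetical allocator callables that take only a byte count. This is a simplified signature assumed for illustration and ignores the alignment argument.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical allocator returning a blob with value semantics.
struct VectorAllocSketch {
    auto operator()(std::size_t bytes) const { return std::vector<std::byte>(bytes); }
};

// Hypothetical allocator returning a blob with shared ownership.
struct SharedPtrAllocSketch {
    auto operator()(std::size_t bytes) const {
        return std::shared_ptr<std::byte[]>(new std::byte[bytes]{});
    }
};

// Copying a vector blob duplicates the storage.
inline bool vectorCopyIsIndependent() {
    auto blob = VectorAllocSketch{}(8);
    auto copy = blob;
    copy[0] = std::byte{42};
    return blob[0] == std::byte{0};
}

// Copying a shared_ptr blob shares the storage.
inline bool sharedCopyAliases() {
    auto blob = SharedPtrAllocSketch{}(8);
    auto copy = blob;
    copy[0] = std::byte{42};
    return blob[0] == std::byte{42};
}
```

A std::unique_ptr blob would not be copyable at all, which matches the unique ownership described above.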

template<std::size_t BytesToReserve>
struct Array

Allocates statically sized memory for a View, which is copied each time a View is copied.

Template Parameters

BytesToReserve – the amount of memory to reserve.

template<std::size_t Alignment>
struct AlignedArray : public llama::Array<std::byte, BytesToReserve>

Mappings

template<typename TArrayExtents, typename TRecordDim, FieldAlignment TFieldAlignment = FieldAlignment::Align, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, template<typename> typename FlattenRecordDim = FlattenRecordDimInOrder>
struct AoS : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>

Array of struct mapping. Used to create a View via allocView.

using llama::mapping::AlignedAoS = AoS<ArrayExtents, RecordDim, FieldAlignment::Align, LinearizeArrayDimsFunctor>

Array of struct mapping preserving the alignment of the field types by inserting padding.

See also

AoS

using llama::mapping::MinAlignedAoS = AoS<ArrayExtents, RecordDim, FieldAlignment::Align, LinearizeArrayDimsFunctor, FlattenRecordDimMinimizePadding>

Array of struct mapping preserving the alignment of the field types by inserting padding and permuting the field order to minimize this padding.

See also

AoS

using llama::mapping::PackedAoS = AoS<ArrayExtents, RecordDim, FieldAlignment::Pack, LinearizeArrayDimsFunctor>

Array of struct mapping packing the field types tightly, violating the type’s alignment requirements.

See also

AoS

using llama::mapping::MultiBlobSoA = SoA<ArrayExtents, RecordDim, Blobs::OnePerField, SubArrayAlignment::Pack, LinearizeArrayDimsFunctor>

Struct of array mapping storing each attribute of the record dimension in a separate blob.

See also

SoA
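The difference between the AoS and SoA mappings comes down to how a (record, field) pair is turned into a storage location. The sketch below assumes a record with three equally sized leaf fields and computes slot numbers in units of one field. All names are hypothetical, and alignment and linearization of multi-dimensional indices are left out.

```cpp
#include <cstddef>

// Number of equally sized leaf fields per record in this sketch (e.g. x, y, z).
constexpr std::size_t fieldCount = 3;

// AoS: the fields of one record are adjacent, records follow each other.
constexpr std::size_t aosSlot(std::size_t record, std::size_t field) {
    return record * fieldCount + field;
}

// SoA in a single blob: one sub array per field, placed one after another.
constexpr std::size_t soaSingleBlobSlot(std::size_t record, std::size_t field, std::size_t recordCount) {
    return field * recordCount + record;
}

// SoA with one blob per field: the field selects the blob, the record the slot.
struct BlobSlot {
    std::size_t blob;
    std::size_t slot;
};
constexpr BlobSlot soaMultiBlobSlot(std::size_t record, std::size_t field) {
    return {field, record};
}

static_assert(aosSlot(2, 1) == 7);
static_assert(soaSingleBlobSlot(2, 1, 10) == 12);
static_assert(soaMultiBlobSlot(2, 1).blob == 1 && soaMultiBlobSlot(2, 1).slot == 2);
```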

template<typename TArrayExtents, typename TRecordDim, typename TArrayExtents::value_type Lanes, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, template<typename> typename FlattenRecordDim = FlattenRecordDimInOrder>
struct AoSoA : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>

Array of struct of arrays mapping. Used to create a View via allocView.
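An AoSoA layout groups records into blocks of Lanes records and stores each block in SoA fashion. A minimal sketch of the slot computation, assuming equally sized leaf fields and using the hypothetical name aosoaSlot:

```cpp
#include <cstddef>

// Slot (in units of one field) of a field of a record in an AoSoA layout.
constexpr std::size_t aosoaSlot(std::size_t record, std::size_t field, std::size_t fieldCount, std::size_t lanes) {
    const std::size_t block = record / lanes; // which block of `lanes` records
    const std::size_t lane = record % lanes;  // position inside the block
    return (block * fieldCount + field) * lanes + lane;
}

// Record 5, field 1 of 3, 4 lanes: block 1, lane 1.
static_assert(aosoaSlot(5, 1, 3, 4) == 17);
// With a single lane the layout degenerates to AoS.
static_assert(aosoaSlot(5, 1, 3, 1) == 5 * 3 + 1);
```

With as many lanes as there are records, the same formula yields a single-blob SoA layout.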

template<typename RecordDim, std::size_t VectorRegisterBits>
constexpr std::size_t llama::mapping::maxLanes = []() constexpr {
    auto max = std::numeric_limits<std::size_t>::max();
    forEachLeafCoord<RecordDim>([&](auto rc) {
        using AttributeType = GetType<RecordDim, decltype(rc)>;
        max = std::min(max, VectorRegisterBits / (sizeof(AttributeType) * CHAR_BIT));
    });
    return max;
}()

The maximum number of vector lanes that can be used to fetch each leaf type in the record dimension into a vector register of the given size in bits.
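The same computation can be sketched over a plain list of leaf types. The function name maxLanesSketch is hypothetical and the leaf types are passed directly instead of being taken from a record dimension.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// Smallest number of elements of any leaf type that fit into one vector register.
template<std::size_t VectorRegisterBits, typename... LeafTypes>
constexpr std::size_t maxLanesSketch() {
    std::size_t lanes = std::numeric_limits<std::size_t>::max();
    ((lanes = std::min(lanes, VectorRegisterBits / (sizeof(LeafTypes) * CHAR_BIT))), ...);
    return lanes;
}

// A 256-bit register holds 8 x 32-bit or 4 x 64-bit values; the widest type limits the result.
static_assert(maxLanesSketch<256, std::uint32_t, std::uint64_t, std::uint8_t>() == 4);
```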

template<typename TArrayExtents, typename TRecordDim, typename Bits = typename TArrayExtents::value_type, SignBit SignBit = SignBit::Keep, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, template<typename> typename FlattenRecordDim = FlattenRecordDimInOrder, typename TStoredIntegral = internal::StoredUnsignedFor<TRecordDim>>
struct BitPackedIntAoS : public internal::BitPackedIntCommon<TArrayExtents, TRecordDim, Bits, SignBit, TLinearizeArrayDimsFunctor, TStoredIntegral>

Array of struct mapping using bit packing to reduce size/precision of integral data types. If your record dimension contains non-integral types, split them off using the Split mapping first.

Template Parameters
  • Bits – If Bits is llama::Constant<N>, the compile-time N specifies the number of bits to use. If Bits is an integral type T, the number of bits is specified at runtime, passed to the constructor and stored as type T. Must not be zero and must not be bigger than the bits of TStoredIntegral.

  • SignBit – When set to SignBit::Discard, discards the sign bit when storing signed integers. All numbers will be read back positive.

  • TLinearizeArrayDimsFunctor – Defines how the array dimensions should be mapped into linear numbers and how big the linear domain gets.

  • FlattenRecordDim – Defines how the record dimension’s fields should be flattened.

  • TStoredIntegral – Integral type used as storage of reduced precision integers. Must be std::uint32_t or std::uint64_t.

template<typename TArrayExtents, typename TRecordDim, typename Bits = typename TArrayExtents::value_type, SignBit SignBit = SignBit::Keep, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, typename TStoredIntegral = internal::StoredUnsignedFor<TRecordDim>>
struct BitPackedIntSoA : public internal::BitPackedIntCommon<TArrayExtents, TRecordDim, Bits, SignBit, TLinearizeArrayDimsFunctor, TStoredIntegral>

Struct of array mapping using bit packing to reduce size/precision of integral data types. If your record dimension contains non-integral types, split them off using the Split mapping first.

Template Parameters
  • Bits – If Bits is llama::Constant<N>, the compile-time N specifies the number of bits to use. If Bits is an integral type T, the number of bits is specified at runtime, passed to the constructor and stored as type T. Must not be zero and must not be bigger than the bits of TStoredIntegral.

  • SignBit – When set to SignBit::Discard, discards the sign bit when storing signed integers. All numbers will be read back positive.

  • TLinearizeArrayDimsFunctor – Defines how the array dimensions should be mapped into linear numbers and how big the linear domain gets.

  • TStoredIntegral – Integral type used as storage of reduced precision integers. Must be std::uint32_t or std::uint64_t.
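The core of a bit-packed integer mapping is storing values with fewer bits than their type, back to back across word boundaries. The class below is a minimal sketch of that technique for unsigned values only. It is not LLAMA's implementation: the name PackedUIntsSketch is hypothetical, 32-bit storage words are assumed, and sign handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores `count` unsigned values using `bits` bits each (1 <= bits <= 32).
class PackedUIntsSketch {
public:
    PackedUIntsSketch(std::size_t count, unsigned bits)
        : bits_(bits), words_((count * bits + 31) / 32 + 1) {} // one spare word simplifies straddling accesses

    void store(std::size_t i, std::uint32_t value) {
        const std::uint64_t mask = (std::uint64_t{1} << bits_) - 1;
        const std::size_t bitPos = i * bits_;
        const std::size_t word = bitPos / 32;
        const unsigned shift = static_cast<unsigned>(bitPos % 32);
        // Work on a 64-bit window so values may straddle two words.
        std::uint64_t window = load64(word);
        window = (window & ~(mask << shift)) | ((value & mask) << shift);
        words_[word] = static_cast<std::uint32_t>(window);
        words_[word + 1] = static_cast<std::uint32_t>(window >> 32);
    }

    std::uint32_t load(std::size_t i) const {
        const std::uint64_t mask = (std::uint64_t{1} << bits_) - 1;
        const std::size_t bitPos = i * bits_;
        const unsigned shift = static_cast<unsigned>(bitPos % 32);
        return static_cast<std::uint32_t>((load64(bitPos / 32) >> shift) & mask);
    }

private:
    std::uint64_t load64(std::size_t word) const {
        return words_[word] | (std::uint64_t{words_[word + 1]} << 32);
    }

    unsigned bits_;
    std::vector<std::uint32_t> words_;
};
```

Values that do not fit into the chosen number of bits lose their upper bits on store, which is the precision reduction the mapping description refers to.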

template<typename TArrayExtents, typename TRecordDim, typename ExponentBits = typename TArrayExtents::value_type, typename MantissaBits = ExponentBits, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, template<typename> typename FlattenRecordDim = FlattenRecordDimInOrder, typename TStoredIntegral = internal::StoredIntegralFor<TRecordDim>>
struct BitPackedFloatAoS : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>, public llama::internal::BoxedValue<ExponentBits, 0>, public llama::internal::BoxedValue<MantissaBits, 1>
template<typename TArrayExtents, typename TRecordDim, typename ExponentBits = typename TArrayExtents::value_type, typename MantissaBits = ExponentBits, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, typename TStoredIntegral = internal::StoredIntegralFor<TRecordDim>>
struct BitPackedFloatSoA : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>, public llama::internal::BoxedValue<ExponentBits, 0>, public llama::internal::BoxedValue<MantissaBits, 1>

Struct of array mapping using bit packing to reduce size/precision of floating-point data types. The bit layout is [1 sign bit, exponentBits bits from the exponent, mantissaBits bits from the mantissa]+ and tries to follow IEEE 754. Infinity and NAN are supported. If the packed exponent bits are not big enough to hold a number, it will be set to infinity (preserving the sign). If your record dimension contains non-floating-point types, split them off using the Split mapping first.

Template Parameters
  • ExponentBits – If ExponentBits is llama::Constant<N>, the compile-time N specifies the number of bits to use to store the exponent. If ExponentBits is llama::Value<T>, the number of bits is specified at runtime, passed to the constructor and stored as type T. Must not be zero.

  • MantissaBits – Like ExponentBits but for the mantissa bits. Must not be zero (otherwise values turn INF).

  • TLinearizeArrayDimsFunctor – Defines how the array dimensions should be mapped into linear numbers and how big the linear domain gets.

  • TStoredIntegral – Integral type used as storage of reduced precision floating-point values.

template<typename TArrayExtents, typename TRecordDim, template<typename, typename> typename InnerMapping>
struct Bytesplit : private InnerMapping<TArrayExtents, internal::SplitBytes<TRecordDim>>

Meta mapping splitting each field in the record dimension into an array of bytes and mapping the resulting record dimension using a further mapping.

template<typename RC, typename BlobArray>
struct Reference : public llama::ProxyRefOpMixin<Reference<RC, BlobArray>, GetType<TRecordDim, RC>>
template<typename ArrayExtents, typename RecordDim, template<typename, typename> typename InnerMapping>
struct Byteswap : public llama::mapping::Projection<ArrayExtents, RecordDim, InnerMapping, internal::MakeByteswapProjectionMap<RecordDim>>

Mapping that swaps the byte order of all values when loading/storing.
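What happens to a single value on load or store can be sketched for a 32-bit integer. The function name byteswap32 is hypothetical. Swapping twice restores the original value, so applying the swap on both store and load is transparent to the user.

```cpp
#include <cstdint>

// Reverses the byte order of a 32-bit value.
constexpr std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

static_assert(byteswap32(0x11223344u) == 0x44332211u);
static_assert(byteswap32(byteswap32(0xDEADBEEFu)) == 0xDEADBEEFu);
```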

template<typename ArrayExtents, typename RecordDim, template<typename, typename> typename InnerMapping, typename ReplacementMap>
struct ChangeType : public llama::mapping::Projection<ArrayExtents, RecordDim, InnerMapping, internal::MakeProjectionMap<RecordDim, ReplacementMap>>

Mapping that changes the type in the record domain for a different one in storage. Conversions happen during load and store.

Template Parameters

ReplacementMap – A type list of binary type lists (a map) specifying which type or the type at a RecordCoord (map key) to replace by which other type (mapped value).

template<typename Mapping, typename Mapping::ArrayExtents::value_type Granularity = 1, typename TCountType = std::size_t>
struct Heatmap : private Mapping

Forwards all calls to the inner mapping. Counts all accesses made to blocks inside the blobs, so that a heatmap can be extracted.

Template Parameters
  • Mapping – The type of the inner mapping.

  • Granularity – The granularity in bytes at which to count accesses. A value of 1 counts every byte individually. A value of e.g. 64 counts accesses per 64-byte block.

  • TCountType – Data type used to count the number of accesses. Atomic increments must be supported for this type.
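The role of the granularity can be sketched with a small counter class. HeatmapSketch is a hypothetical name, and the rule that an access increments every block it touches is an assumption of this sketch rather than a statement about LLAMA's counting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts accesses to a blob of `blobSize` bytes per block of `granularity` bytes.
class HeatmapSketch {
public:
    HeatmapSketch(std::size_t blobSize, std::size_t granularity)
        : granularity_(granularity), counts_((blobSize + granularity - 1) / granularity) {}

    // Records an access to `size` (> 0) bytes starting at byte `offset`.
    void access(std::size_t offset, std::size_t size) {
        const std::size_t first = offset / granularity_;
        const std::size_t last = (offset + size - 1) / granularity_;
        for (std::size_t block = first; block <= last; ++block)
            ++counts_[block];
    }

    const std::vector<std::size_t>& counts() const { return counts_; }

private:
    std::size_t granularity_;
    std::vector<std::size_t> counts_;
};
```

With a granularity of 64, an 8-byte access at offset 60 touches blocks 0 and 1, while a granularity of 1 would count each of the 8 bytes separately.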

Public Functions

template<typename Blobs, typename OStream>
inline void writeGnuplotDataFileAscii(const Blobs &blobs, OStream &&os, bool trimEnd = true, std::size_t wrapAfterBlocks = 64) const

Writes a data file suitable for gnuplot containing the heatmap data. You can use the script provided by gnuplotScriptAscii to plot this data file.

Parameters
  • blobs – The blobs of the view containing this mapping

  • os – The stream to write the data to. Should be some form of std::ostream.

Public Static Attributes

static constexpr std::string_view gnuplotScriptAscii = R"(#!/bin/bash
gnuplot -p <<EOF
file = '${1:-plot.bin}'
set xtics format ""
set x2tics autofreq 32
set yrange [] reverse
set link x2; set link y2
set x2label "Byte"
plot file matrix with image pixels axes x2y1
EOF
)"

An example script for plotting the ASCII heatmap data using gnuplot.

static constexpr std::string_view gnuplotScriptBinary = R"(#!/bin/bash
gnuplot -p <<EOF
file = '${1:-plot.bin}'
rowlength = '${2:-64}'
maxrows = '${3:-all}'
format = '${4:-%uint64}'
counts = system('stat -c "%s" ${1:-plot.bin}')/8
rows = counts/rowlength
rows = maxrows eq 'all' ? rows : (rows < maxrows ? rows : maxrows)
set xtics format ""
set x2tics autofreq 32
set yrange [] reverse
set link x2; set link y2
set x2label "Byte"
plot file binary array=(rowlength,rows) format=format with image pixels axes x2y1
EOF
)"

An example script for plotting the binary heatmap data using gnuplot.

template<typename TArrayExtents, typename TRecordDim>
struct Null : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>

The Null mapping maps all elements to nothing. Writing data through a reference obtained from the Null mapping discards the value. Reading through such a reference returns a default constructed object.

template<typename TArrayExtents, typename TRecordDim, FieldAlignment TFieldAlignment = FieldAlignment::Align, template<typename> typename FlattenRecordDim = FlattenRecordDimMinimizePadding>
struct One : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>

Maps all array dimension indices to the same location and lays out struct members consecutively. This mapping is used for temporary, single element views.

template<typename TArrayExtents, typename TRecordDim, template<typename, typename> typename InnerMapping, typename TProjectionMap>
struct Projection : private InnerMapping<TArrayExtents, internal::ReplaceTypesByProjectionResults<TRecordDim, TProjectionMap>>

Mapping that projects types in the record domain to different types. Projections are executed during load and store.

Template Parameters

TProjectionMap – A type list of binary type lists (a map) specifying a projection (map value) for a type or the type at a RecordCoord (map key). A projection is a type with two functions: struct Proj { static auto load(auto&& fromMem); static auto store(auto&& toMem); };
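A projection in the sense described above can be sketched as follows. DoubleAsFloatProj and ProjectedRefSketch are hypothetical names. The projection uses concrete parameter types instead of auto&& so that it compiles as C++17, and the small proxy reference only illustrates when load and store would run.

```cpp
#include <cassert>

// Hypothetical projection: values are doubles for the user but stored as floats.
struct DoubleAsFloatProj {
    static auto load(float fromMem) -> double { return fromMem; }
    static auto store(double toMem) -> float { return static_cast<float>(toMem); }
};

// Hypothetical proxy reference that applies a projection on every access.
template<typename Proj, typename Stored, typename Value>
struct ProjectedRefSketch {
    Stored& storage;

    operator Value() const { return Proj::load(storage); } // runs on read
    ProjectedRefSketch& operator=(const Value& v) {        // runs on write
        storage = Proj::store(v);
        return *this;
    }
};

// Writes 0.1 through the projection and reads it back with float precision.
inline double roundTripThroughFloat() {
    float mem = 0.0f;
    ProjectedRefSketch<DoubleAsFloatProj, float, double> ref{mem};
    ref = 0.1;
    return ref;
}
```

The value read back differs slightly from 0.1 because it went through float storage, which is the intended trade of precision for space.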

template<typename TArrayExtents, typename TRecordDim, Blobs TBlobs = Blobs::OnePerField, SubArrayAlignment TSubArrayAlignment = TBlobs == Blobs::Single ? SubArrayAlignment::Align : SubArrayAlignment::Pack, typename TLinearizeArrayDimsFunctor = LinearizeArrayDimsCpp, template<typename> typename FlattenRecordDimSingleBlob = FlattenRecordDimInOrder>
struct SoA : public llama::mapping::MappingBase<TArrayExtents, TRecordDim>

Struct of array mapping. Used to create a View via allocView. We recommend using multiple blobs when the array extents are dynamic and an aligned single blob version when they are static.

Template Parameters
  • TBlobs – If OnePerField, every element of the record dimension is mapped to its own blob.

  • TSubArrayAlignment – Only relevant when TBlobs == Single, ignored otherwise. If Align, aligns the sub arrays created within the single blob by inserting padding. If the array extents are dynamic, this may add some overhead to the mapping logic.

  • TLinearizeArrayDimsFunctor – Defines how the array dimensions should be mapped into linear numbers and how big the linear domain gets.

  • FlattenRecordDimSingleBlob – Defines how the record dimension’s fields should be flattened if Blobs is Single. See FlattenRecordDimInOrder, FlattenRecordDimIncreasingAlignment, FlattenRecordDimDecreasingAlignment and FlattenRecordDimMinimizePadding.

template<typename TArrayExtents, typename TRecordDim, typename TSelectorForMapping1, template<typename...> typename MappingTemplate1, template<typename...> typename MappingTemplate2, bool SeparateBlobs = false>
struct Split

Mapping which splits off a part of the record dimension and maps it differently than the rest.

Template Parameters
  • TSelectorForMapping1 – Selects a part of the record dimension to be mapped by MappingTemplate1. Can be a RecordCoord, a type list of RecordCoords, a type list of tags (selecting one field), or a type list of type list of tags (selecting one field per sub list).

  • MappingTemplate1 – The mapping used for the selected part of the record dimension.

  • MappingTemplate2 – The mapping used for the not selected part of the record dimension.

  • SeparateBlobs – If true, both pieces of the record dimension are mapped to separate blobs.

template<typename Mapping, typename TCountType = std::size_t, bool MyCodeHandlesProxyReferences = true>
struct FieldAccessCount : public Mapping

Forwards all calls to the inner mapping. Counts all accesses made through this mapping and allows printing a summary.

Template Parameters
  • Mapping – The type of the inner mapping.

  • TCountType – The type used for counting the number of accesses.

  • MyCodeHandlesProxyReferences – If false, FieldAccessCount will avoid proxy references but can then only count the number of address computations.

struct FieldHitsArray : public llama::Array<AccessCounts<CountType>, flatFieldCount<RecordDim>>

Public Functions

inline auto totalBytes() const

When MyCodeHandlesProxyReferences is true, returns a pair of the total read and written bytes. If false, returns the total bytes of accessed data as a single value.

struct TotalBytes

Accessors

struct Default

Default accessor. Passes through the given reference.

struct ByValue

Allows only read access and returns values instead of references to memory.

struct Const

Allows only read access by qualifying the references to memory with const.

struct Restrict

Qualifies references to memory with __restrict. Only works on l-value references.

RecordDim flattener

template<typename RecordDim>
struct FlattenRecordDimInOrder

Flattens the record dimension in the order fields are written.

template<typename RecordDim, template<typename, typename> typename Less>
struct FlattenRecordDimSorted

Flattens the record dimension by sorting the fields according to a given predicate on the field types.

Template Parameters

Less – A binary predicate accepting two field types, which exposes a member value. Value must be true if the first field type is less than the second one, otherwise false.

using llama::mapping::FlattenRecordDimIncreasingAlignment = FlattenRecordDimSorted<RecordDim, internal::LessAlignment>

Flattens and sorts the record dimension by increasing alignment of its fields.

using llama::mapping::FlattenRecordDimDecreasingAlignment = FlattenRecordDimSorted<RecordDim, internal::MoreAlignment>

Flattens and sorts the record dimension by decreasing alignment of its fields.

using llama::mapping::FlattenRecordDimMinimizePadding = FlattenRecordDimIncreasingAlignment<RecordDim>

Flattens and sorts the record dimension by the alignment of its fields to minimize padding.
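Why field order matters for padding can be sketched with plain size and alignment pairs. The names FieldInfo, paddedSize and sortedByIncreasingAlignment are hypothetical, and tail padding is ignored. The numbers in the comments hold for this particular example only, not for every field list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FieldInfo {
    std::size_t size;
    std::size_t align;
};

// Size of the fields laid out in the given order, each placed at its alignment.
inline std::size_t paddedSize(const std::vector<FieldInfo>& fields) {
    std::size_t offset = 0;
    for (const auto& f : fields) {
        offset = (offset + f.align - 1) / f.align * f.align; // insert padding
        offset += f.size;
    }
    return offset;
}

// Returns the fields stably sorted by increasing alignment.
inline std::vector<FieldInfo> sortedByIncreasingAlignment(std::vector<FieldInfo> fields) {
    std::stable_sort(fields.begin(), fields.end(),
                     [](const FieldInfo& a, const FieldInfo& b) { return a.align < b.align; });
    return fields;
}

// Example fields: 1, 8, 2, 4 and 1 bytes, each aligned to its size.
inline std::vector<FieldInfo> exampleFields() {
    return {{1, 1}, {8, 8}, {2, 2}, {4, 4}, {1, 1}};
}
// In declaration order this example needs 25 bytes, sorted it needs 16.
```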

Common utilities

struct LinearizeArrayDimsCpp

Functor that maps an ArrayIndex into linear numbers the way C++ arrays work. The fast moving index of the ArrayIndex object should be the last one. E.g. ArrayIndex<3> a; stores 3 indices where a[2] should be incremented in the innermost loop.

Public Functions

template<typename ArrayExtents>
inline constexpr auto operator()(const typename ArrayExtents::Index &ai, const ArrayExtents &extents) const -> typename ArrayExtents::value_type
Parameters
  • ai – Index in the array dimensions.

  • extents – Total size of the array dimensions.

Returns

Linearized index.
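
As a minimal sketch of what C++-style (row-major) linearization computes, the following standalone function uses std::array instead of LLAMA's ArrayIndex and ArrayExtents types. The name linearizeRowMajor is hypothetical and this is not LLAMA's implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: row-major linearization, where the last index has stride 1.
template<std::size_t Rank>
constexpr auto linearizeRowMajor(
    const std::array<std::size_t, Rank>& index,
    const std::array<std::size_t, Rank>& extents) -> std::size_t
{
    std::size_t linear = 0;
    for(std::size_t d = 0; d < Rank; ++d)
        linear = linear * extents[d] + index[d]; // Horner scheme over the dimensions
    return linear;
}

// With extents {2, 3, 4}, the strides are 12, 4 and 1.
static_assert(linearizeRowMajor<3>({0, 0, 1}, {2, 3, 4}) == 1);
static_assert(linearizeRowMajor<3>({0, 1, 0}, {2, 3, 4}) == 4);
static_assert(linearizeRowMajor<3>({1, 0, 0}, {2, 3, 4}) == 12);
```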

struct LinearizeArrayDimsFortran

Functor that maps an ArrayIndex into linear numbers the way Fortran arrays work. The fast moving index of the ArrayIndex object should be the first one. E.g. ArrayIndex<3> a; stores 3 indices where a[0] should be incremented in the innermost loop.

Public Functions

template<typename ArrayExtents>
inline constexpr auto operator()(const typename ArrayExtents::Index &ai, const ArrayExtents &extents) const -> typename ArrayExtents::value_type
Parameters
  • ai – Index in the array dimensions.

  • extents – Total size of the array dimensions.

Returns

Linearized index.
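
For comparison, Fortran arrays are stored in column-major order, so the first index has stride 1. The following standalone sketch shows that computation using std::array. The name linearizeColumnMajor is hypothetical and this is not LLAMA's implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: column-major linearization, where the first index has stride 1.
template<std::size_t Rank>
constexpr auto linearizeColumnMajor(
    const std::array<std::size_t, Rank>& index,
    const std::array<std::size_t, Rank>& extents) -> std::size_t
{
    std::size_t linear = 0;
    std::size_t stride = 1;
    for(std::size_t d = 0; d < Rank; ++d)
    {
        linear += index[d] * stride;
        stride *= extents[d]; // the next dimension skips over all previous ones
    }
    return linear;
}

// With extents {2, 3, 4}, the strides are 1, 2 and 6.
static_assert(linearizeColumnMajor<3>({1, 0, 0}, {2, 3, 4}) == 1);
static_assert(linearizeColumnMajor<3>({0, 1, 0}, {2, 3, 4}) == 2);
static_assert(linearizeColumnMajor<3>({0, 0, 1}, {2, 3, 4}) == 6);
```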

struct LinearizeArrayDimsMorton

Functor that maps an ArrayIndex into linear numbers using the Z-order space filling curve (Morton codes).

Public Functions

template<typename ArrayExtents>
inline constexpr auto operator()(const typename ArrayExtents::Index &ai, [[maybe_unused]] const ArrayExtents &extents) const -> typename ArrayExtents::value_type
Parameters
  • ai – Coordinate in the array dimensions.

  • extents – Total size of the array dimensions.

Returns

Linearized index.
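
A Morton code interleaves the bits of the coordinates. The standalone sketch below shows this for two dimensions. The function name mortonEncode2D is hypothetical, and placing x in the even bits and y in the odd bits is an arbitrary choice for this sketch, not a statement about LLAMA's bit order.

```cpp
#include <cstdint>

// Hypothetical helper: interleaves the bits of x (even positions) and y (odd positions).
constexpr auto mortonEncode2D(std::uint32_t x, std::uint32_t y) -> std::uint64_t
{
    std::uint64_t code = 0;
    for(unsigned bit = 0; bit < 32; ++bit)
    {
        code |= static_cast<std::uint64_t>((x >> bit) & 1u) << (2 * bit);
        code |= static_cast<std::uint64_t>((y >> bit) & 1u) << (2 * bit + 1);
    }
    return code;
}

// The first four codes trace the characteristic "Z" through a 2x2 block.
static_assert(mortonEncode2D(0, 0) == 0);
static_assert(mortonEncode2D(1, 0) == 1);
static_assert(mortonEncode2D(0, 1) == 2);
static_assert(mortonEncode2D(1, 1) == 3);
static_assert(mortonEncode2D(2, 0) == 4);
```

Because the code depends only on the coordinates, the extents are not needed, which matches the [[maybe_unused]] annotation on the extents parameter above.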

Tree mapping (deprecated)

template<typename TArrayExtents, typename TRecordDim, typename TreeOperationList>
struct Mapping : private TArrayExtents

An experimental attempt to provide a general purpose description of a mapping. Array and record dimensions are represented by a compile time tree data structure. This tree is mapped into memory by means of a breadth-first tree traversal. By specifying additional tree operations, the tree can be modified at compile time before being mapped to memory.

For a detailed description of the tree mapping concept, have a look at LLAMA tree mapping.

Tree mapping functors

struct Idem

Functor for tree::Mapping. Does nothing with the mapping tree. Is used for testing.

struct LeafOnlyRT

Functor for tree::Mapping. Moves all run time parts to the leaves, creating a SoA layout.

template<typename TreeCoord, typename Amount = std::size_t>
struct MoveRTDown

Functor for tree::Mapping. Moves the run time part of a node one level down in the direction of the leaves by the given amount (runtime or compile time value).

See also

tree::Mapping

Template Parameters

TreeCoord – tree coordinate in the mapping tree whose run time part shall be moved down one level

Dumping

Data access

template<typename TMapping, typename TBlobType, typename TAccessor = accessor::Default>
struct View : private TMapping, private TAccessor

Central LLAMA class holding memory for storage and giving access to values stored there defined by a mapping. A view should be created using allocView.

Template Parameters
  • TMapping – The mapping used by the view to map accesses into memory.

  • TBlobType – The storage type used by the view holding memory.

  • TAccessor – The accessor to use when an access is made through this view.

Public Functions

View() = default

Performs default initialization of the blob array.

inline explicit View(Mapping mapping, Array<BlobType, Mapping::blobCount> blobs = {}, Accessor accessor = {})

Creates a LLAMA View manually. Prefer the allocation functions allocView and allocViewUninitialized if possible.

Parameters
  • mapping – The mapping used by the view to map accesses into memory.

  • blobs – An array of blobs providing storage space for the mapped data.

  • accessor – The accessor to use when an access is made through this view.

inline auto operator()(ArrayIndex ai) const -> decltype(auto)

Retrieves the RecordRef at the given ArrayIndex index.

template<typename ...Indices, std::enable_if_t<std::conjunction_v<std::is_convertible<Indices, size_type>...>, int> = 0>
inline auto operator()(Indices... indices) const -> decltype(auto)

Retrieves the RecordRef at the ArrayIndex index constructed from the passed component indices.

inline auto operator[](ArrayIndex ai) const -> decltype(auto)

Retrieves the RecordRef at the given ArrayIndex index.

inline auto operator[](size_type index) const -> decltype(auto)

Retrieves the RecordRef at the 1D ArrayIndex index constructed from the passed index.

template<typename TStoredParentView>
struct SubView

Like a View, but array indices are shifted.

Template Parameters

TStoredParentView – Type of the underlying view. May be cv qualified and/or a reference type.

Public Types

using ParentView = std::remove_const_t<std::remove_reference_t<StoredParentView>>

type of the parent view

using Mapping = typename ParentView::Mapping

mapping of the parent view

using ArrayExtents = typename Mapping::ArrayExtents

array extents of the parent view

using ArrayIndex = typename Mapping::ArrayIndex

array index of the parent view

Public Functions

template<typename StoredParentViewFwd>
inline SubView(StoredParentViewFwd &&parentView, ArrayIndex offset)

Creates a SubView given a parent View and offset.

inline auto operator()(ArrayIndex ai) const -> decltype(auto)

Same as View::operator()(ArrayIndex), but shifted by the offset of this SubView.

template<typename ...Indices>
inline auto operator()(Indices... indices) const -> decltype(auto)

Same as corresponding operator in View, but shifted by the offset of this SubView.

Public Members

const ArrayIndex offset

offset by which this view’s ArrayIndex indices are shifted when passed to the parent view.
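
The index shifting can be pictured with a one-dimensional, standalone sketch. OffsetViewSketch is a hypothetical type that wraps any container with operator[]. It only mirrors the idea of forwarding a shifted index to the parent and is not LLAMA's SubView.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D analogue of a sub view: forwards accesses to the parent, shifted by offset.
template<typename ParentView>
struct OffsetViewSketch
{
    ParentView& parent; // the sub view stores no data itself
    std::size_t offset;

    auto operator()(std::size_t i) const -> decltype(auto)
    {
        return parent[i + offset];
    }
};

// Usage: index 0 of the sub view refers to index 2 of the parent.
inline auto firstOfShifted(std::vector<int>& data) -> int&
{
    OffsetViewSketch<std::vector<int>> sub{data, 2};
    return sub(0);
}
```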

template<typename TView, typename TBoundRecordCoord, bool OwnView>
struct RecordRef : private ArrayIndex

Record reference type returned by View after resolving an array dimensions coordinate or partially resolving a RecordCoord. A record reference does not hold data itself, it just binds enough information (array dimensions coord and partial record coord) to retrieve it later from a View. Record references should not be created by the user. They are returned from various access functions in View and RecordRef itself.

Public Types

using View = TView

View this record reference points into.

using BoundRecordCoord = TBoundRecordCoord

Record coords into View::RecordDim which are already bound by this RecordRef.

using AccessibleRecordDim = GetType<RecordDim, BoundRecordCoord>

Subtree of the record dimension of View starting at BoundRecordCoord. If BoundRecordCoord is RecordCoord<> (default) AccessibleRecordDim is the same as Mapping::RecordDim.

Public Functions

inline RecordRef()

Creates an empty RecordRef. Only available if the view is owned. Used by llama::One.

template<typename OtherView, typename OtherBoundRecordCoord, bool OtherOwnView>
inline RecordRef(const RecordRef<OtherView, OtherBoundRecordCoord, OtherOwnView> &recordRef)

Creates a RecordRef from a different RecordRef. Only available if the view is owned. Used by llama::One.

template<typename T, typename = std::enable_if_t<!isRecordRef<T>>>
inline explicit RecordRef(const T &scalar)

Creates a RecordRef from a scalar. Only available if the view is owned. Used by llama::One.

template<std::size_t... Coord>
inline auto operator()(RecordCoord<Coord...>) const -> decltype(auto)

Access a record in the record dimension underneath the current record reference using a RecordCoord. If the access resolves to a leaf, an l-value reference to a variable inside the View storage is returned, otherwise another RecordRef.

template<typename ...Tags>
inline auto operator()(Tags...) const -> decltype(auto)

Access a record in the record dimension underneath the current record reference using a series of tags. If the access resolves to a leaf, an l-value reference to a variable inside the View storage is returned, otherwise another RecordRef.
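
The proxy idea behind a record reference (binding a storage and an index, and resolving to an l-value reference only when a leaf is accessed) can be sketched without LLAMA. All names below (ParticleStorage, ParticleRefSketch, recordAt) are hypothetical, and the storage layout is fixed to a simple struct of arrays for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays storage with two leaf fields.
struct ParticleStorage
{
    std::vector<float> x;
    std::vector<float> mass;
};

// Hypothetical proxy: holds no particle data, only where to find it.
struct ParticleRefSketch
{
    ParticleStorage& storage;
    std::size_t index;

    // Leaf accesses resolve to l-value references into the storage.
    auto x() const -> float& { return storage.x[index]; }
    auto mass() const -> float& { return storage.mass[index]; }
};

// Resolving the array index yields a proxy, not a copy of the record.
inline auto recordAt(ParticleStorage& storage, std::size_t index) -> ParticleRefSketch
{
    return {storage, index};
}
```

Writing through the proxy, as in recordAt(storage, 1).x() = 5.0f, modifies the underlying storage.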

struct Loader
struct LoaderConst

Copying

template<typename SrcMapping, typename SrcBlob, typename DstMapping, typename DstBlob>
void llama::copy(const View<SrcMapping, SrcBlob> &srcView, View<DstMapping, DstBlob> &dstView, std::size_t threadId = 0, std::size_t threadCount = 1)

Copy data from source view to destination view. Both views need to have the same array and record dimensions. Delegates to Copy to choose an implementation.

Parameters
  • threadId – Optional. Zero-based id of calling thread for multi-threaded invocations.

  • threadCount – Optional. Thread count in case of multi-threaded invocation.
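
One way to picture the threadId and threadCount parameters is that each calling thread copies its own share of the data. The standalone sketch below uses std::vector instead of views. The function copyChunk and the even partitioning scheme are assumptions made for illustration and do not describe how LLAMA divides the work.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the share of src that belongs to the calling thread.
// dst must already have at least src.size() elements.
inline void copyChunk(
    const std::vector<int>& src,
    std::vector<int>& dst,
    std::size_t threadId = 0,
    std::size_t threadCount = 1)
{
    const std::size_t n = src.size();
    const auto begin = static_cast<std::ptrdiff_t>(n * threadId / threadCount);
    const auto end = static_cast<std::ptrdiff_t>(n * (threadId + 1) / threadCount);
    std::copy(src.begin() + begin, src.begin() + end, dst.begin() + begin);
}
```

With the default arguments a single call copies everything. With threadCount = 3, the three calls with threadId 0, 1 and 2 together cover each element exactly once.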

template<typename SrcMapping, typename DstMapping, typename SFINAE = void>
struct Copy

Generic implementation of copy defaulting to fieldWiseCopy. LLAMA provides several specializations of this construct for specific mappings. Users are encouraged to also specialize this template with better copy algorithms for further combinations of mappings, if they can and want to provide a better implementation.
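
The specialization pattern described above can be sketched with a standalone template of the same shape. CopySketch, the layout tag types, and the choice to specialize for identical layouts are hypothetical. The sketch only shows how partial specialization selects a different strategy at compile time and makes no claim about which specializations LLAMA ships.

```cpp
#include <string_view>

// Hypothetical tags standing in for mapping types.
struct AoSLayoutTag {};
struct SoALayoutTag {};

// Primary template: the generic fallback.
template<typename SrcLayout, typename DstLayout, typename SFINAE = void>
struct CopySketch
{
    static constexpr auto strategy() -> std::string_view { return "field-wise"; }
};

// Partial specialization chosen when source and destination layouts are identical.
template<typename Layout>
struct CopySketch<Layout, Layout>
{
    static constexpr auto strategy() -> std::string_view { return "blob memcpy"; }
};

static_assert(CopySketch<AoSLayoutTag, SoALayoutTag>::strategy() == "field-wise");
static_assert(CopySketch<AoSLayoutTag, AoSLayoutTag>::strategy() == "blob memcpy");
```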

template<typename SrcMapping, typename SrcBlob, typename DstMapping, typename DstBlob>
void llama::fieldWiseCopy(const View<SrcMapping, SrcBlob> &srcView, View<DstMapping, DstBlob> &dstView, std::size_t threadId = 0, std::size_t threadCount = 1)

Field-wise copy from source to destination view. Both views need to have the same array and record dimensions.

Parameters
  • threadId – Optional. Thread id in case of multi-threaded copy.

  • threadCount – Optional. Thread count in case of multi-threaded copy.
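
As a rough picture of what a field-wise copy between two layouts does, the following standalone sketch copies from an array-of-structs container into a struct-of-arrays container one field at a time. The types and the function fieldWiseCopySketch are hypothetical and do not use LLAMA views.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record with two fields, stored as array of structs.
struct ParticleAoS
{
    float x;
    float mass;
};

// The same fields stored as struct of arrays.
struct ParticlesSoA
{
    std::vector<float> x;
    std::vector<float> mass;
};

// Copies every field of every record individually, independent of the two layouts.
inline void fieldWiseCopySketch(const std::vector<ParticleAoS>& src, ParticlesSoA& dst)
{
    dst.x.resize(src.size());
    dst.mass.resize(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        dst.x[i] = src[i].x;
        dst.mass[i] = src[i].mass;
    }
}
```

This approach works for any pair of layouts, but it moves one value at a time, which is why layout-specific strategies can be faster.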

template<typename SrcMapping, typename SrcBlob, typename DstMapping, typename DstBlob>
void llama::aosoaCommonBlockCopy(const View<SrcMapping, SrcBlob> &srcView, View<DstMapping, DstBlob> &dstView, bool readOpt, std::size_t threadId = 0, std::size_t threadCount = 1)

AoSoA copy strategy which transfers data in common blocks. SoA mappings are also allowed for at most one of the two views.

Parameters
  • threadId – Optional. Zero-based id of calling thread for multi-threaded invocations.

  • threadCount – Optional. Thread count in case of multi-threaded invocation.

SIMD

template<typename Simd, typename SFINAE = void>
struct SimdTraits

Traits of a specific Simd implementation. Please specialize this template for the SIMD types you are going to use in your program. Each specialization SimdTraits<Simd> must provide:

  • an alias value_type to indicate the element type of the Simd.

  • a static constexpr size_t lanes variable holding the number of SIMD lanes of the Simd.

  • a static auto loadUnaligned(const value_type* mem) -> Simd function, loading a Simd from the given memory address.

  • a static void storeUnaligned(Simd simd, value_type* mem) function, storing the given Simd to a given memory address.
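
A specialization meeting the four requirements above might look like the following sketch. To keep it standalone, it declares its own primary template SimdTraits with the same shape instead of including llama.hpp, and Float4 is a hypothetical four-lane type standing in for a real SIMD vector. The load function is spelled loadUnaligned here, mirroring storeUnaligned.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the primary template, declared here only to keep the sketch standalone.
template<typename Simd, typename SFINAE = void>
struct SimdTraits;

// Hypothetical SIMD-like type with four float lanes.
struct Float4
{
    float lane[4];
};

template<>
struct SimdTraits<Float4>
{
    using value_type = float;
    static constexpr std::size_t lanes = 4;

    // Loads four consecutive floats. memcpy places no alignment requirement on mem.
    static auto loadUnaligned(const value_type* mem) -> Float4
    {
        Float4 v;
        std::memcpy(v.lane, mem, sizeof(v.lane));
        return v;
    }

    // Stores the four lanes to consecutive floats starting at mem.
    static void storeUnaligned(Float4 simd, value_type* mem)
    {
        std::memcpy(mem, simd.lane, sizeof(simd.lane));
    }
};
```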

template<typename Simd, typename SFINAE = void>
constexpr auto llama::simdLanes = SimdTraits<Simd>::lanes

The number of SIMD lanes the given SIMD vector or Simd<T> has. If Simd is not a structural Simd or SimdN, this is a shortcut for SimdTraits<Simd>::lanes.

using llama::SimdizeN = typename internal::SimdizeNImpl<RecordDim, N, MakeSizedSimd>::type

Transforms the given record dimension into a SIMD version of it. Each leaf field type will be replaced by a sized SIMD vector with length N, as determined by MakeSizedSimd. If N is 1, SimdizeN<T, 1, …> is an alias for T.

using llama::Simdize = TransformLeaves<RecordDim, MakeSimd>

Transforms the given record dimension into a SIMD version of it. Each leaf field type will be replaced by a SIMD vector, as determined by MakeSimd.

using llama::SimdN = typename std::conditional_t<isRecordDim<T>, std::conditional_t<N == 1, mp_identity<One<T>>, mp_identity<One<SimdizeN<T, N, MakeSizedSimd>>>>, std::conditional_t<N == 1, mp_identity<T>, mp_identity<SimdizeN<T, N, MakeSizedSimd>>>>::type

Creates a SIMD version of the given type. If T is a record dimension, creates a One where each field is a SIMD type of the original field type. The SIMD vectors have length N. If N is 1, an ordinary One of the record dimension T is created. If T is not a record dimension, a SIMD vector with value T and length N is created. If N is 1 (and T is not a record dimension), then T is produced.

using llama::Simd = typename std::conditional_t<isRecordDim<T>, mp_identity<One<Simdize<T, MakeSimd>>>, mp_identity<Simdize<T, MakeSimd>>>::type

Creates a SIMD version of the given type. If T is a record dimension, creates a One where each field is a SIMD type of the original field type.

template<typename T, typename Simd>
inline void llama::loadSimd(const T &srcRef, Simd &dstSimd)

Loads SIMD vectors of data starting from the given record reference to dstSimd. Only field tags occurring in RecordRef are loaded. If Simd contains multiple fields of SIMD types, a SIMD vector will be fetched for each of the fields. The number of elements fetched per SIMD vector depends on the SIMD width of the vector. Simd is allowed to have different vector lengths per element.

template<typename Simd, typename T>
inline void llama::storeSimd(const Simd &srcSimd, T &&dstRef)

Stores SIMD vectors of element data from the given srcSimd into memory starting at the provided record reference. Only field tags occurring in RecordRef are stored. If Simd contains multiple fields of SIMD types, a SIMD vector will be stored for each of the fields. The number of elements stored per SIMD vector depends on the SIMD width of the vector. Simd is allowed to have different vector lengths per element.

template<std::size_t N, template<typename, auto> typename MakeSizedSimd, typename View, typename UnarySimdFunction>
void llama::simdForEachN(View &view, UnarySimdFunction f)
template<template<typename> typename MakeSimd, template<typename, auto> typename MakeSizedSimd, typename View, typename UnarySimdFunction>
void llama::simdForEach(View &view, UnarySimdFunction f)

Macros

LLAMA_INDEPENDENT_DATA

May be put in front of a loop statement. Indicates that all (!) data access inside the loop is independent, so the loop can be safely vectorized. Example:

LLAMA_INDEPENDENT_DATA
for(int i = 0; i < N; ++i)
    // because of LLAMA_INDEPENDENT_DATA the compiler knows that a and b
    // do not overlap and the operation can safely be vectorized
    a[i] += b[i];

LLAMA_FORCE_INLINE

Forces the compiler to inline a function annotated with this macro.

LLAMA_FORCE_INLINE_RECURSIVE

Forces the compiler to recursively inline the call hierarchy started by the subsequent function call.

LLAMA_UNROLL(...)

Requests the compiler to unroll the loop following this directive. An optional unrolling count may be provided as argument, which must be a constant expression.

LLAMA_HOST_ACC

Some offloading parallelization language extensions such as CUDA, OpenACC or OpenMP 4.5 need to specify whether a class, struct, function or method “resides” on the host, the accelerator (the offloading device) or both. LLAMA supports this by marking every function needed on an accelerator with LLAMA_HOST_ACC.
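
A common way to build such a macro is to expand to the CUDA/HIP execution space specifiers when compiling with a device compiler and to nothing otherwise. The sketch below uses the hypothetical name HOST_ACC_SKETCH. The exact conditions and expansion of LLAMA_HOST_ACC are not shown here and may differ.

```cpp
// Hypothetical macro: expands to execution space specifiers only under a CUDA or HIP compiler.
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define HOST_ACC_SKETCH __host__ __device__
#else
#    define HOST_ACC_SKETCH
#endif

// The annotated function is callable from host code and, under CUDA/HIP, from device code too.
HOST_ACC_SKETCH inline auto square(int x) -> int
{
    return x * x;
}
```

With a plain host compiler the macro expands to nothing, so the same source compiles unchanged.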

LLAMA_FN_HOST_ACC_INLINE
LLAMA_LAMBDA_INLINE

Gives a strong indication to the compiler to inline the attributed lambda.

LLAMA_COPY(x)

Forces a copy of a value. This is useful to prevent ODR usage of constants when compiling for GPU targets.
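
The sketch below shows the kind of situation this addresses. Passing a constant to a function that takes its argument by const reference odr-uses the constant, while passing a copy does not. COPY_SKETCH and its definition are assumptions made for this example, as are Config and clampedBlockSize. The actual definition of LLAMA_COPY may differ.

```cpp
#include <algorithm>

// Hypothetical macro: creates a temporary copy of x with the same type.
#define COPY_SKETCH(x) decltype(x)(x)

struct Config
{
    static constexpr int blockSize = 128;
};

inline auto clampedBlockSize(int requested) -> int
{
    // std::min takes its arguments by const reference. Binding a reference directly to
    // Config::blockSize would odr-use it. The copy is a temporary, so only its value is read.
    return std::min(requested, COPY_SKETCH(Config::blockSize));
}
```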