Prerequisites
- Part 1 of this series (concepts, GF(2) intuition, and basis vectors)
- A working knowledge of CUDA-thread hierarchy (CTA → warp → lane) or an equivalent AMD mental model
What you will learn here
- How to map hardware dimensions to layout inputs without memorising folklore
- When to pick each constructor (identity1D, strided1D, zeros1D, and manual bases)
- How to combine layouts safely through composition, inversion, and reshape calls
- End-to-end examples that you can adapt straight into lowering code
1. Dimension Naming Conventions and Semantics
Understanding dimension names is the gateway to reading and writing LinearLayout code fluently. This section maps Triton’s encoding attributes to the input/output dimension vocabulary you will use throughout the API.
1.1. Input Dimensions — Hardware Hierarchy
Input dimensions describe where a value lives in hardware. The standard progression follows the GPU execution model from finest to coarsest granularity:
Distributed (Register-based) Layouts
"register" → Index into per-thread register slots (values held by one thread)
"lane" → Thread ID within a warp (0–31 on NVIDIA, 0–63 on AMD)
"warp" → Warp ID within a CTA (Cooperative Thread Array)
"block" → Block (CTA) ID within the grid/cluster
Why this order?
- register is the most minor dimension. It represents data you can permute cheaply inside a single thread without cross-lane communication.
- lane is the next level. Lanes execute in lockstep within a warp, so this dimension often maps to contiguous memory strides.
- warp and block scale the same pattern across the SM and grid. They typically control coarser-grained tiling.
Conversion example: BlockedEncodingAttr → LinearLayout
// sizePerThread=[4,2], threadsPerWarp=[8,4], warpsPerCTA=[2,2]
auto blocked = BlockedEncodingAttr::get(ctx,
/*sizePerThread=*/{4, 2},
/*threadsPerWarp=*/{8, 4},
/*warpsPerCTA=*/{2, 2},
/*order=*/{1, 0},
CTALayoutAttr::get(/*...*/));
auto ll = blocked.toLinearLayout(shape);
// Resulting input dimensions (ins):
// - register: 0..7 (4×2 = 8 values per thread)
// - lane: 0..31 (8×4 = 32 threads per warp)
// - warp: 0..3 (2×2 = 4 warps per CTA)
// - block: size depends on launch grid
These attributes decompose into basis vectors. For a quick sanity check, evaluate ll.apply at a few known points:
// Thread 0, register 0 should map to tensor index (0, 0)
assert(ll.apply({{S("register"), 0}, {S("lane"), 0}, {S("warp"), 0}})[0].second == 0);
// Thread 0, register 1 should map to the next element along the fastest-varying
// axis, which is dim1 (result index 1) because order={1,0}
assert(ll.apply({{S("register"), 1}, {S("lane"), 0}, {S("warp"), 0}})[1].second == 1);
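The same decomposition can be sketched in plain Python. This is an illustrative model, not Triton's implementation: it assumes the tensor shape equals exactly one CTA tile (64×16 for the parameters above), so no wrapping or broadcasting occurs, and `blocked_to_coords` is a hypothetical helper name.

```python
# Hypothetical model of a blocked layout with order=[1, 0] (dim1 varies fastest).
SIZE_PER_THREAD = (4, 2)    # extents along (dim0, dim1)
THREADS_PER_WARP = (8, 4)
WARPS_PER_CTA = (2, 2)
ORDER = (1, 0)              # tensor axes listed minor-to-major


def blocked_to_coords(register, lane, warp):
    """Map hardware indices to (dim0, dim1), assuming shape == one CTA tile."""
    coords = [0, 0]
    extent = [1, 1]  # how much of each tensor axis is already covered
    for index, sizes in ((register, SIZE_PER_THREAD),
                         (lane, THREADS_PER_WARP),
                         (warp, WARPS_PER_CTA)):
        for axis in ORDER:                  # peel the fastest axis first
            digit = index % sizes[axis]
            index //= sizes[axis]
            coords[axis] += digit * extent[axis]
            extent[axis] *= sizes[axis]
    return tuple(coords)


print(blocked_to_coords(0, 0, 0))  # (0, 0)
print(blocked_to_coords(1, 0, 0))  # (0, 1): register bit 0 walks along dim1
print(blocked_to_coords(2, 0, 0))  # (1, 0): the next register bit moves to dim0
print(blocked_to_coords(0, 1, 0))  # (0, 2): lanes continue where registers stop
```

Because every size is a power of two, the mixed-radix arithmetic here is equivalent to assigning individual input bits to output bits, which is what the basis vectors record.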
Shared Memory Layouts
Shared encodings replace the fine-grained hardware axes (register, lane, warp) with a single offset dimension:
"offset" → Linear index in shared memory (measured in elements, not bytes)
"block" → CTA ID (optional; used when reasoning about multi-CTA scenarios)
Swizzled shared example
auto shared = SwizzledSharedEncodingAttr::get(
ctx,
/*vec=*/8,
/*perPhase=*/4,
/*maxPhase=*/8,
/*order=*/{1, 0},
CTALayoutAttr::get(/*...*/));
auto ll = swizzledSharedToLinearLayout(shape, shared);
// Input dimensions: offset, block
// Output dimensions: dim0, dim1
// The offset → (dim0, dim1) mapping encodes the swizzle pattern to avoid bank conflicts
Use this form when you need to reason about shared-memory bank conflicts at the IR level, before lowering to LLVM.
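To make the swizzle concrete, here is a minimal Python sketch of one common XOR-swizzling scheme. It is an assumption for illustration, not a transcription of swizzledSharedToLinearLayout: the tile is taken to be row-major, the phase is derived from the row index, and `swizzled_offset` and `NUM_COLS` are hypothetical names.

```python
def swizzled_offset(row, col, num_cols, vec=8, per_phase=4, max_phase=8):
    """Element offset of (row, col) in a row-major tile with XOR swizzling.

    Assumed scheme: columns are grouped into vectors of `vec` elements, and the
    vector index is XOR-ed with a phase that advances every `per_phase` rows.
    """
    phase = (row // per_phase) % max_phase
    vec_index = (col // vec) ^ phase
    return row * num_cols + vec_index * vec + col % vec


NUM_COLS = 64  # 8 vectors per row, so (vec_index ^ phase) stays inside the row
offsets = {swizzled_offset(r, c, NUM_COLS)
           for r in range(32) for c in range(NUM_COLS)}
print(len(offsets) == 32 * NUM_COLS)      # True: the swizzle is a permutation
print(swizzled_offset(0, 0, NUM_COLS))    # 0
print(swizzled_offset(4, 0, NUM_COLS))    # 264: row 4 starts one vector later
```

Since the XOR only mixes bits of the row index into bits of the column index, the mapping stays linear over GF(2), which is why it can be expressed as a LinearLayout at all.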
1.2. Output Dimensions — Logical Tensor Axes
Output dimensions always correspond to the logical tensor axes in their original order. The layout’s order field controls internal memory layout (row-major vs. column-major), but the output dimension names remain canonical:
"dim0" → First tensor axis
"dim1" → Second tensor axis
"dim2" → Third tensor axis
...
Key property: All layouts—regardless of their internal order—speak the same output language. This uniformity is why LinearLayout composes so cleanly: every transformation operates on dim0, dim1, etc., and you never need to track multiple coordinate systems.
Example:
// Both layouts have order={1,0} (dim1 is fastest-varying, i.e. row-major), but outputs are still named dim0, dim1
auto blocked = BlockedEncodingAttr::get(/*..., order={1,0}*/).toLinearLayout(shape);
auto shared = swizzledSharedToLinearLayout(shape, sharedAttr);
// Both use the same output names:
// blocked: (register,lane,warp) → (dim0, dim1)
// shared: (offset) → (dim0, dim1)
1.3. Dimension Ordering, Flattening, and Reshaping
All LinearLayout operations interpret dimensions in minor-to-major order:
- The first dimension is the most minor (changes fastest in memory).
- The last dimension is the most major (changes slowest).
This convention affects how flattenIns, flattenOuts, reshapeIns, and reshapeOuts combine or split dimensions.
Flatten Rule of Thumb
// Input dimensions: ["register", "lane", "warp"]
// Minor-to-major: register is most minor, warp is most major
auto flattened = layout.flattenIns();
// Result: single dimension with size = register_size × lane_size × warp_size
// Elements are ordered: register changes fastest, then lane, then warp
If you want a different flattening order (e.g., lane as the most minor dimension), transpose first:
auto laneMinor = layout.transposeIns({S("lane"), S("register"), S("warp")}).flattenIns();
// Now lane changes fastest
Reshape Checklist
Before calling reshapeIns or reshapeOuts:
- Verify total size: The product of new dimension sizes must equal the product of old dimension sizes.
- Plan transposes: reshapeIns always flattens in minor-to-major order before splitting. If you need a different minor axis, transpose first.
- Check dimension counts: Use getInDimSize(name) or getOutDimSize(name) to confirm sizes before reshaping. Silent dimension drops are a common source of bugs.
Example workflow:
// Original: (register:4, lane:8, warp:2) — total size 64
auto flat = layout.flattenIns();
// Flat: (register:64)
auto reshaped = flat.reshapeIns({{S("thread"), 32}, {S("block"), 2}});
// Reshaped: (thread:32, block:2)
// If you wanted warp to be minor instead of register, do this:
auto transposed = layout.transposeIns({S("warp"), S("lane"), S("register")});
auto flatWarpMinor = transposed.flattenIns();
// Now warp changes fastest
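The flatten and reshape rules reduce to mixed-radix index arithmetic. The sketch below models only the index bookkeeping, assuming the (register:4, lane:8, warp:2) sizes from the example; `flatten` and `split` are hypothetical helpers, not LinearLayout methods.

```python
SIZES = {"register": 4, "lane": 8, "warp": 2}


def flatten(index, order):
    """Flatten named indices; the first name in `order` changes fastest."""
    flat, stride = 0, 1
    for name in order:
        flat += index[name] * stride
        stride *= SIZES[name]
    return flat


def split(flat, new_dims):
    """Split a flat index into new (name, size) dims, most minor dim first."""
    out = {}
    for name, size in new_dims:
        out[name] = flat % size
        flat //= size
    return out


idx = {"register": 1, "lane": 2, "warp": 1}
flat = flatten(idx, ["register", "lane", "warp"])
print(flat)                                          # 41 = 1 + 4*2 + 32*1
print(split(flat, [("thread", 32), ("block", 2)]))   # {'thread': 9, 'block': 1}
print(flatten(idx, ["warp", "lane", "register"]))    # 21 = 1 + 2*2 + 16*1
```

The last line shows why the transpose matters: the same hardware coordinates land on a different flat index once warp becomes the minor axis.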
2. Practical Examples: Step-by-Step Construction
The following scenarios show how the APIs above compose into real Triton workflows. Each example walks through the mental model explicitly.
Arithmetic view (bitfield intuition): When two inputs contribute to the same output dimension via operator*, the right-hand (outer) operand's contribution is placed above the bits already occupied by the left-hand (inner) operand. The bitfields are therefore disjoint, and XOR accumulation is equivalent to standard integer addition. For example,
auto L1 = LinearLayout::identity1D(4, S("lane"), S("dim0"));
auto L2 = LinearLayout::identity1D(8, S("register"), S("dim0"));
auto L = L1 * L2; // register bits land above lane's 2 bits
// Effective arithmetic form (disjoint bitfields):
// dim0 = (lane % 4) + ((register % 8) << 2)
// = (lane % 4) + (register % 8) * 4
This works because lane occupies the lower 2 bits of dim0, while register occupies the next 3 bits. If the bitfields overlap (e.g., a hand-written basis deliberately places both inputs on the same bit positions), the combination is true XOR, not integer addition.
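The claim about disjoint versus overlapping bitfields is easy to check numerically. A small Python sketch (the helper name `combine_xor` is made up for this illustration):

```python
def combine_xor(lane, register, shift):
    """XOR lane with register after moving register up by `shift` bits."""
    return lane ^ (register << shift)


# Disjoint bitfields: lane uses bits 0-1, register uses bits 2-4.
disjoint_ok = all(combine_xor(lane, reg, 2) == lane + reg * 4
                  for lane in range(4) for reg in range(8))
print(disjoint_ok)            # True: XOR and + agree everywhere

# Overlapping bitfields: both operands use the low bits, so XOR and + disagree.
print(combine_xor(3, 1, 0))   # 2
print(3 + 1)                  # 4
```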
2.1. Example 1 — Simple 1D Distribution Across Lanes
Scenario: Distribute 32 logical elements across 4 threads so that each thread owns 8 elements in a strided pattern.
Distribution pattern:
- Thread 0: elements [0, 4, 8, 12, 16, 20, 24, 28]
- Thread 1: elements [1, 5, 9, 13, 17, 21, 25, 29]
- Thread 2: elements [2, 6, 10, 14, 18, 22, 26, 30]
- Thread 3: elements [3, 7, 11, 15, 19, 23, 27, 31]
Construction:
auto L1 = LinearLayout::identity1D(4, S("lane"), S("dim0"));
auto L2 = LinearLayout::identity1D(8, S("register"), S("dim0"));
auto layout = L1 * L2;
// Verification
assert(layout.apply({{S("register"), 0}, {S("lane"), 0}})[0].second == 0);
assert(layout.apply({{S("register"), 1}, {S("lane"), 0}})[0].second == 4);
assert(layout.apply({{S("register"), 0}, {S("lane"), 1}})[0].second == 1);
assert(layout.apply({{S("register"), 2}, {S("lane"), 3}})[0].second == 11);
dim0 = (lane % 4) + ((register % 8) << 2)
= (lane & 0b11) + ((register & 0b111) << 2)
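As a cross-check, the closed-form expression can be evaluated in Python to reproduce the per-thread ownership lists from the distribution pattern. This is a stand-alone sketch of the arithmetic only; `dim0` is a hypothetical helper, not part of the LinearLayout API.

```python
def dim0(register, lane):
    """dim0 = lane + 4 * register for lane in [0, 4) and register in [0, 8)."""
    return (lane & 0b11) + ((register & 0b111) << 2)


owned = {lane: [dim0(reg, lane) for reg in range(8)] for lane in range(4)}
for lane, elems in owned.items():
    print(f"Thread {lane}: {elems}")
# Thread 0: [0, 4, 8, 12, 16, 20, 24, 28]
# Thread 1: [1, 5, 9, 13, 17, 21, 25, 29]
# Thread 2: [2, 6, 10, 14, 18, 22, 26, 30]
# Thread 3: [3, 7, 11, 15, 19, 23, 27, 31]
```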
2.2. Example 2 — 2D Blocked Layout (MMA Tile)
Scenario: A CTA cooperatively computes a 32×32 accumulator tile. Each warp handles a 16×8 MMA fragment (m16n8k16), and each lane packs two accumulator values in registers. We want a LinearLayout that maps (register, lane) coordinates back to logical (row, col) indices while respecting the register packing and warp tiling order used by NVIDIA’s nvidiaMmaTile.
Image source: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#mma-16816-c-i8
Goal: Produce a layout equivalent to the relevant portion of LinearLayout.cpp::nvidiaMmaTile, but stripped down to just the accumulator mapping. The fragment is parameterised by
- tileShape = {m=16, n=8}
- repOrder = {dimM, dimN}
- kWidth = 2
// Step 1: trivial CTA layout with the desired output dimension order
int rank = repOrder.size();
auto dimNames = standardOutDimNames(ctx, rank);
auto trivialShape = SmallVector<unsigned>(rank, 1);
LinearLayout ctaLayout =
identityStandardND(S("register"), trivialShape, repOrder);
// Step 2: identify inner/outer logical axes
// (`order` is not defined in this excerpt; it is assumed to be passed in alongside repOrder)
assert(rank == 2);
auto inner = order[0];
auto outer = order[1];
// Step 3: accumulate the blocked structure
assert(tileShape.size() == rank);
int m = tileShape[outer]; // rows handled per warp
int n = tileShape[inner]; // columns handled per warp
assert(m % 8 == 0);
assert(n % (kWidth * 4) == 0);
ctaLayout = ctaLayout *
LinearLayout::identity1D(kWidth, S("register"), dimNames[inner]) *
LinearLayout::identity1D(4, S("lane"), dimNames[inner]) *
LinearLayout::identity1D(8, S("lane"), dimNames[outer]) *
LinearLayout::identity1D(m / 8, S("register"), dimNames[outer]) *
LinearLayout::identity1D(n / (kWidth * 4), S("register"),
dimNames[inner]);
What each factor contributes
- identity1D(kWidth, register → dimN) walks across the kWidth columns that a single lane updates.
- identity1D(4, lane → dimN) folds the 4-lane quad (threads 0–3, 4–7, …) into neighbouring columns.
- identity1D(8, lane → dimM) groups lanes into eight-row strips (lane >> 2).
- identity1D(m/8, register → dimM) repeats the strip across the vertical extent of the tile.
- identity1D(n/(kWidth*4), register → dimN) handles column repetitions when n > kWidth*4.
Together with the trivial seed layout, these five factors pack lane quads into rows, distribute row groups across registers, and make the layout cyclic over the 16×8 accumulator tile contained in each warp.
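The factor list can be modelled in a few lines of Python. The sketch assumes that each identity1D factor consumes the next low bits of its input dimension and places them above the bits already assigned to its output dimension. It then compares the result with the row/column formulas that the PTX documentation tabulates for the m16n8 accumulator fragment. `mma_tile` and `ptx_fragment` are hypothetical helper names.

```python
def mma_tile(register, lane, kWidth=2, m=16, n=8):
    """Apply the five identity1D factors in product order (most minor first)."""
    factors = [
        ("register", kWidth, "col"),
        ("lane", 4, "col"),
        ("lane", 8, "row"),
        ("register", m // 8, "row"),
        ("register", n // (kWidth * 4), "col"),
    ]
    remaining = {"register": register, "lane": lane}
    coord = {"row": 0, "col": 0}
    extent = {"row": 1, "col": 1}
    for in_dim, size, out_dim in factors:
        digit = remaining[in_dim] % size            # next low bits of the input
        remaining[in_dim] //= size
        coord[out_dim] += digit * extent[out_dim]   # placed above earlier bits
        extent[out_dim] *= size
    return coord["row"], coord["col"]


def ptx_fragment(i, lane):
    """Row/col of accumulator element c_i for the m16n8 C/D fragment."""
    group_id, thread_in_group = lane >> 2, lane % 4
    return group_id + 8 * (i // 2), thread_in_group * 2 + (i & 1)


matches = all(mma_tile(i, lane) == ptx_fragment(i, lane)
              for i in range(4) for lane in range(32))
print(matches)          # True
print(mma_tile(3, 5))   # (9, 3)
```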
Mapper / demapper sanity checks
You can test the layout by reproducing the PTX warp-tile mapping in Python. The helper below mirrors how the layout turns (register, lane) pairs into (outer=row, inner=col) coordinates and back:
import numpy as np
from collections import defaultdict
def mapper(register, lane, kWidth=2, m=16, n=8):
    """(register, lane) → (outer=row, inner=col)"""
    m_prime = m // 8                      # row repetition (two 8-row halves)
    n_prime = max(1, n // (kWidth * 4))   # column repetition
    reg0 = register % kWidth                            # column inside a lane's run
    reg1 = (register // kWidth) % m_prime               # which 8-row strip
    reg2 = (register // (kWidth * m_prime)) % n_prime   # which column repetition
    # Same factor order as ctaLayout: register bits, lane quad, then repetitions
    inner = reg0 + kWidth * (lane % 4) + kWidth * 4 * reg2
    outer = (lane // 4) % 8 + 8 * reg1
    return outer, inner
def demapper(outer, inner, kWidth=2, m=16, n=8):
    """(outer=row, inner=col) → (lane, register)"""
    m_prime = max(1, m // 8)
    n_prime = max(1, n // (kWidth * 4))
    reg0 = inner % kWidth
    reg1 = (outer // 8) % m_prime
    reg2 = (inner // (kWidth * 4)) % n_prime
    lane = (outer % 8) * 4 + (inner // kWidth) % 4
    register = reg0 + kWidth * (reg1 + m_prime * reg2)
    return lane, register
Running demapper across a 16×16 accumulator viewport prints the lane/register assignments for each row. You should see the familiar PTX fragment pattern where each lane owns kWidth consecutive columns, a four-lane quad covers kWidth × 4 columns, and each successive row advances the lane ID by four:
kWidth, m, n = 2, 16, 16
array = [[demapper(row, col, kWidth=kWidth, m=m, n=n) for col in range(n)]
         for row in range(m)]
def tie(row):
    # Group the lanes that touch this row by register index
    d = defaultdict(list)
    for lane, reg in row:  # demapper returns (lane, register)
        d[reg].append(lane)
    return [(r, d[r]) for r in sorted(d.keys())]
for row, r in enumerate(array):
    print(f"row {row:2d}", tie(r))
The output shows that each row is served by a single four-lane quad (lane >> 2 equals row % 8) and that registers cycle every kWidth columns. This confirms that the simplified ctaLayout reproduces the PTX accumulator mapping, which is what you need before calling invertAndCompose to stitch the accumulator to shared-memory staging buffers.
3. Core API Detailed Guide
This section walks through the core LinearLayout constructors and combinators, showing you how to build, manipulate, and apply layouts in practice, with extra commentary and quick sanity checks you can paste into your own code.
3.1. Basic Constructors — Build 1D Pieces
The factory helpers (identity1D, strided1D, and zeros1D) each return a one-dimensional layout. You stack them with operator* to produce richer, multi-dimensional structures.
3.1.1. identity1D — Pass-Through Mapping
static LinearLayout identity1D(
int32_t size, // Input domain size (must be power of two)
StringAttr inDim, // Input dimension name
StringAttr outDim // Output dimension name
);
Semantics: L(x) = x for x ∈ [0, size)
Example:
auto L = LinearLayout::identity1D(8, S("lane"), S("dim0"));
// L(0) = 0, L(1) = 1, ..., L(7) = 7
// Basis: [L(1)=1, L(2)=2, L(4)=4]
assert(L.apply({{S("lane"), 5}})[0].second == 5);
When to use:
- Contiguous, one-to-one mappings (e.g., register indices inside a single thread)
- As the “seed” layout when assembling more complex multi-dimensional structures
3.1.2. zeros1D — Broadcast / Replication
static LinearLayout zeros1D(
int32_t size,
StringAttr inDim,
StringAttr outDim,
int32_t outDimSize = 1
);
Semantics: L(x) = 0 for all x
Every input value maps to zero in the chosen output dimension. This models broadcasting: all threads share the same tensor index along this axis.
Example:
auto L = LinearLayout::zeros1D(8, S("lane"), S("dim1"));
// L(0) = L(1) = ... = L(7) = 0
// Basis: [0, 0, 0] — all basis vectors are zero
assert(L.apply({{S("lane"), 5}})[0].second == 0);
When to use:
- Broadcasting a single value across all threads (e.g., a column vector broadcast)
- Modeling data replication in multi-CTA scenarios
Optional parameter: outDimSize lets you record the logical extent even though all values collapse to zero. This is useful when you later compose or invert the layout and need to track the full codomain.
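The 1D constructors differ only in the basis vectors they produce. The Python sketch below lists those bases side by side, assuming power-of-two sizes and assuming that strided1D(size, stride, ...) means L(x) = stride · x with a power-of-two stride. The helper names are made up for illustration.

```python
def identity1d_bases(size):
    """Bases of L(x) = x: input bit i maps to output value 2**i."""
    return [1 << i for i in range(size.bit_length() - 1)]


def strided1d_bases(size, stride):
    """Bases of L(x) = stride * x (stride assumed to be a power of two)."""
    return [stride << i for i in range(size.bit_length() - 1)]


def zeros1d_bases(size):
    """Bases of L(x) = 0: every input bit contributes nothing."""
    return [0] * (size.bit_length() - 1)


def apply_1d(bases, x):
    """XOR together the bases selected by the set bits of x."""
    out = 0
    for i, basis in enumerate(bases):
        if (x >> i) & 1:
            out ^= basis
    return out


print(identity1d_bases(8), apply_1d(identity1d_bases(8), 5))      # [1, 2, 4] 5
print(strided1d_bases(8, 4), apply_1d(strided1d_bases(8, 4), 5))  # [4, 8, 16] 20
print(zeros1d_bases(8), apply_1d(zeros1d_bases(8), 5))            # [0, 0, 0] 0
```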
3.2. Direct Construction from Basis Vectors
When the helper constructors can’t express your pattern (swizzles, MMA layouts, custom permutations), build the layout manually from basis vectors.
LinearLayout(
BasesT bases, // map<inDim, vector<vector<int>>>
ArrayRef<StringAttr> outDimNames // ordered output dimension names
);
Basis format:
- bases[inDim][i] stores the value of L(inDim = 2^i, others = 0)
- bases[inDim][i][j] gives the contribution to the j-th output dimension (outDimNames[j])
Example: 2D Swizzle
LinearLayout swizzle({
{S("offset"), {
{0, 1}, {0, 2}, {0, 4}, {0, 8}, // Column basis (no swizzle)
{1, 0}, {2, 0}, {4, 4}, {8, 8} // Row basis (swizzled)
}}
}, {S("dim0"), S("dim1")});
// Basis interpretation:
// L(offset=1) = (0, 1) — column bit 0
// L(offset=2) = (0, 2) — column bit 1
// ...
// L(offset=16) = (1, 0) — row bit 0
// L(offset=32) = (2, 0) — row bit 1
// L(offset=64) = (4, 4) — row bit 2 with swizzle!
Verification tip: Whenever you hand-roll a basis, test it with a few manually computed points to ensure your phase/stride arithmetic is correct:
// offset=17 → binary decomposition: 16 + 1
auto result = swizzle.apply({{S("offset"), 17}});
// L(17) = L(16) ⊕ L(1) = (1,0) ⊕ (0,1) = (1,1)
assert(result == SmallVector{{S("dim0"), 1}, {S("dim1"), 1}});
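The same hand check can be scripted. This Python sketch re-evaluates the swizzle bases from the example by XOR-ing the basis vectors selected by the set bits of the offset; `apply_bases` is a hypothetical stand-in for LinearLayout::apply.

```python
SWIZZLE_BASES = [
    (0, 1), (0, 2), (0, 4), (0, 8),   # offset bits 0-3 -> column bits
    (1, 0), (2, 0), (4, 4), (8, 8),   # offset bits 4-7 -> row bits (+ swizzle)
]


def apply_bases(bases, x):
    """Return (dim0, dim1) for input x by XOR-ing the selected basis vectors."""
    dim0 = dim1 = 0
    for i, (b0, b1) in enumerate(bases):
        if (x >> i) & 1:
            dim0 ^= b0
            dim1 ^= b1
    return dim0, dim1


print(apply_bases(SWIZZLE_BASES, 17))   # (1, 1)
print(apply_bases(SWIZZLE_BASES, 64))   # (4, 4)
print(apply_bases(SWIZZLE_BASES, 65))   # (4, 5)
coords = {apply_bases(SWIZZLE_BASES, x) for x in range(256)}
print(len(coords))                      # 256: every offset hits a distinct cell
```

Checking that all 256 offsets map to distinct coordinates is a cheap way to confirm that a hand-written basis is still a bijection after the swizzle bits are added.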
3.3. Layout Combination with operator*
friend LinearLayout operator*(LinearLayout inner, LinearLayout outer);
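operator* builds the direct sum of two layouts. Think of it as:
- Concatenating the input spaces (inner inputs remain independent of outer inputs)
- Stacking contributions to shared output dimensions: the outer operand's outputs are shifted past the bits the inner operand already occupies
Rules of thumb:
- Input dimensions can overlap: overlapping in-dims are merged, with the outer operand's bits placed above the inner operand's. Use disjoint dims when you want Cartesian products.
- Shared output dimensions grow: the result's size along a shared out-dim is the product of the operands' sizes, and the two contributions occupy disjoint bitfields.
- The left operand is more minor, so its dimensions change faster.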
Example 1: Building a 2D identity
auto L = LinearLayout::identity1D(8, S("register"), S("dim0")) *
LinearLayout::identity1D(4, S("lane"), S("dim1"));
// Inputs: (register, lane), with register minor (left operand) and lane major
// Outputs: (dim0, dim1)
// L(register=3, lane=2) = (dim0=3, dim1=2)
Example 2: Sharing an output dimension (stacked bitfields)
auto L1 = LinearLayout::identity1D(4, S("lane"), S("dim0"));
auto L2 = LinearLayout::identity1D(8, S("register"), S("dim0"));
auto L = L1 * L2;
// Both contribute to dim0. L2 (outer) is shifted past L1's 2 bits, so:
// L(lane=2, register=3) = (dim0 = 2 + (3 << 2) = 14)
assert(L.apply({{S("lane"), 2}, {S("register"), 3}})[0].second == 14);
Example 3: Broadcast combined with identity
auto L = LinearLayout::zeros1D(4, S("lane"), S("dim1")) *
LinearLayout::identity1D(8, S("register"), S("dim0"));
// L(lane=?, register=5) = (dim0=5, dim1=0)
// All lanes share dim1=0 (broadcast), but register contributes to dim0
Mental model: Use operator* whenever two hardware dimensions co-exist at runtime (e.g., register × lane, warp × block). The result is a layout that maps the Cartesian product of inputs to outputs, with contributions to a shared output dimension stacked in disjoint bitfields.
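A toy Python model of the product makes the bookkeeping explicit. It assumes the behaviour seen in Example 1 of Section 2.1, where the right-hand operand's contribution to a shared output dimension lands above the left-hand operand's bits. The dictionary representation and the names `identity1d`, `zeros1d`, `mul`, and `apply` are illustrative, not Triton's data structures.

```python
def identity1d(size, in_dim, out_dim):
    bits = size.bit_length() - 1
    return {"bases": {in_dim: [{out_dim: 1 << i} for i in range(bits)]},
            "out_bits": {out_dim: bits}}


def zeros1d(size, in_dim, out_dim):
    bits = size.bit_length() - 1
    return {"bases": {in_dim: [{out_dim: 0} for _ in range(bits)]},
            "out_bits": {out_dim: 0}}


def mul(inner, outer):
    """Toy product: inner is more minor; outer's outputs sit above inner's."""
    bases = {d: [dict(b) for b in bs] for d, bs in inner["bases"].items()}
    out_bits = dict(inner["out_bits"])
    for in_dim, outer_bases in outer["bases"].items():
        for basis in outer_bases:
            shifted = {o: v << inner["out_bits"].get(o, 0)
                       for o, v in basis.items()}
            bases.setdefault(in_dim, []).append(shifted)
    for o, bits in outer["out_bits"].items():
        out_bits[o] = out_bits.get(o, 0) + bits
    return {"bases": bases, "out_bits": out_bits}


def apply(layout, **ins):
    out = {o: 0 for o in layout["out_bits"]}
    for in_dim, x in ins.items():
        for i, basis in enumerate(layout["bases"][in_dim]):
            if (x >> i) & 1:
                for o, v in basis.items():
                    out[o] ^= v
    return out


shared = mul(identity1d(4, "lane", "dim0"), identity1d(8, "register", "dim0"))
print(apply(shared, lane=2, register=3))     # {'dim0': 14}

grid = mul(identity1d(8, "register", "dim0"), identity1d(4, "lane", "dim1"))
print(apply(grid, register=3, lane=2))       # {'dim0': 3, 'dim1': 2}

bcast = mul(zeros1d(4, "lane", "dim1"), identity1d(8, "register", "dim0"))
print(apply(bcast, lane=3, register=5))      # {'dim1': 0, 'dim0': 5}
```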
3.4. Function Composition — compose
LinearLayout compose(const LinearLayout &outer) const;
Semantics: (outer ∘ this)(x) = outer(this(x))
Use compose when the output of one layout feeds directly into the input of another, creating a pipeline.
Requirements:
- this’s output dimension names must match outer’s input dimension names (order does not matter)
- this->getOutDimSize(d) ≤ outer.getInDimSize(d) for all output dimensions d
Example:
// L1: (register) → (offset)
auto L1 = LinearLayout::identity1D(32, S("register"), S("offset"));
// L2: (offset) → (dim0, dim1)
auto L2 = /* some swizzled shared layout */;
// L3: (register) → (dim0, dim1)
auto L3 = L1.compose(L2);
// Now you can map register IDs directly to tensor coordinates:
auto coords = L3.apply({{S("register"), 5}});
// coords = {{S("dim0"), ...}, {S("dim1"), ...}}
Why use compose?
- Modularity: Test L1 and L2 independently before stitching them together.
- Clarity: Express multi-stage transformations (register → offset → tensor) explicitly in code.
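Because both layouts are linear, composing them only requires pushing each basis vector of the first layout through the second. The Python sketch below shows this for single-input, single-output layouts with outputs packed into one integer. The basis values are hypothetical and chosen only to make the effect visible.

```python
def apply_bases(bases, x):
    """Evaluate a linear layout; bases[i] is the image of input 2**i."""
    out = 0
    for i, basis in enumerate(bases):
        if (x >> i) & 1:
            out ^= basis
    return out


def compose(inner_bases, outer_bases):
    """Bases of outer(inner(x)): map every inner basis vector through outer."""
    return [apply_bases(outer_bases, b) for b in inner_bases]


# inner: register -> offset, swapping the two lowest bits (hypothetical)
register_to_offset = [2, 1, 4, 8, 16]
# outer: offset -> packed coordinate, where bit 4 also flips bit 0 (hypothetical)
offset_to_coord = [1, 2, 4, 8, 17]

register_to_coord = compose(register_to_offset, offset_to_coord)
print(register_to_coord)                    # [2, 1, 4, 8, 17]
print(apply_bases(register_to_coord, 21))   # 23
```

The composed bases agree with evaluating the two stages one after the other for every input, which is the property a unit test for a real pipeline would assert.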
3.5. Invert and Compose — invertAndCompose
LinearLayout invertAndCompose(const LinearLayout &outer) const;
Semantics: Solve this(x) = outer(C(x)) for C
This is the workhorse function for data layout conversions. It computes a layout C that maps from this’s input space to outer’s input space, ensuring both layouts produce the same tensor coordinates.
Typical use case: Register → Shared Memory Store
// Source layout: (register, lane, warp) → (dim0, dim1)
auto regLayout = blocked.toLinearLayout(shape);
// Destination layout: (offset, block) → (dim0, dim1)
auto sharedLayout = swizzledSharedToLinearLayout(shape, sharedAttr);
// Conversion layout: (register, lane, warp) → (offset, block)
auto cvt = regLayout.invertAndCompose(sharedLayout);
// Meaning: "Which shared memory offset should (register=r, lane=l, warp=w) write to?"
Requirements:
- outer must be surjective (cover all possible outputs)
- outer’s codomain ≥ this’s codomain
- outer can be non-injective (multiple inputs map to the same output); the algorithm picks the smallest pre-image
Detailed example:
// Illustrative values only. A linear layout always maps the all-zero input to
// the all-zero output, so the example uses a non-zero hardware index.
// 1) Register layout: lane 1, register 2 holds tensor[2,3]
auto regLayout = toLinearLayout(blockedEncoding);
assert(regLayout.apply({{S("register"),2}, {S("lane"),1}, {S("warp"),0}})
== SmallVector{{S("dim0"),2}, {S("dim1"),3}});
// 2) Shared layout: offset 10 corresponds to tensor[2,3]
auto memLayout = toLinearLayout(sharedEncoding);
assert(memLayout.apply({{S("offset"),10}})
== SmallVector{{S("dim0"),2}, {S("dim1"),3}});
// 3) Conversion: (register=2, lane=1, warp=0) → offset=10
auto cvt = regLayout.invertAndCompose(memLayout);
assert(cvt.apply({{S("register"),2}, {S("lane"),1}, {S("warp"),0}})
== SmallVector{{S("offset"),10}});
Why it works: Both regLayout and memLayout map to the same tensor space (dim0, dim1). The inversion finds the shared memory offset that corresponds to the same tensor element.
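The defining equation this(x) = outer(C(x)) can be solved by brute force on small layouts, which is handy for checking intuition. The sketch below is not the algorithm Triton uses. It tabulates the smallest pre-image under outer for every output and maps each basis vector of this through that table. Inputs and outputs are packed into single integers, and the basis values are hypothetical.

```python
def apply_bases(bases, x):
    """Evaluate a linear layout; bases[i] is the image of input 2**i."""
    out = 0
    for i, basis in enumerate(bases):
        if (x >> i) & 1:
            out ^= basis
    return out


def invert_and_compose(this_bases, outer_bases):
    """Bases of C such that this(x) == outer(C(x)), found by exhaustive search."""
    smallest_preimage = {}
    for y in range(1 << len(outer_bases)):   # ascending, so first hit is smallest
        smallest_preimage.setdefault(apply_bases(outer_bases, y), y)
    return [smallest_preimage[b] for b in this_bases]


# this: flattened (register, lane) -> tensor index; register bits hold 4 and 8
reg_layout = [4, 8, 1, 2]
# outer: offset -> tensor index, with a swizzle on the top bit
mem_layout = [1, 2, 4, 12]

cvt = invert_and_compose(reg_layout, mem_layout)
print(cvt)   # [4, 12, 1, 2]
print(all(apply_bases(mem_layout, apply_bases(cvt, x)) == apply_bases(reg_layout, x)
          for x in range(16)))   # True
```

Solving for the basis vectors is enough: linearity extends the equality from the bases to every input.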
3.6. Shape Transformations — Flatten, Reshape, Transpose
These utilities let you reorganize the input or output dimensions without changing the underlying mapping.
flattenIns / flattenOuts
LinearLayout flattenIns() const;
LinearLayout flattenOuts() const;
Merge all input (or output) dimensions into a single dimension. Dimensions are flattened in minor-to-major order.
Example:
// Input: (register:4, lane:8, warp:2)
auto flat = layout.flattenIns();
// Output: (register:64) — size = 4×8×2
// Order: register changes fastest, then lane, then warp
reshapeIns / reshapeOuts
LinearLayout reshapeIns(
ArrayRef<std::pair<StringAttr, int32_t>> newInDims
) const;
LinearLayout reshapeOuts(
ArrayRef<std::pair<StringAttr, int32_t>> newOutDims
) const;
Flatten all dimensions, then split into new named dimensions. The total size must remain the same.
Example:
auto reshaped = layout.reshapeIns({
{S("thread"), 32},
{S("block"), 2}
});
// Flattens (register:4, lane:8, warp:2) → 64 elements
// Then splits into (thread:32, block:2)
transposeIns / transposeOuts
LinearLayout transposeIns(ArrayRef<StringAttr> newOrder) const;
LinearLayout transposeOuts(ArrayRef<StringAttr> newOrder) const;
Reorder dimensions explicitly. Typically used before reshape when you need a non-default minor axis.
Example:
// Original: (register:4, lane:8)
auto transposed = layout.transposeIns({S("lane"), S("register")});
// Result: (lane:8, register:4) — now lane is minor
auto flat = transposed.flattenIns();
// Flattens with lane changing fastest
Workflow tip:
// If you want to flatten with warp as the minor axis:
auto transposed = layout.transposeIns({S("warp"), S("lane"), S("register")});
auto flat = transposed.flattenIns();
// Now warp changes fastest, then lane, then register
3.7. Evaluating Layouts — apply and applyLinearLayout
Two complementary helpers let you inspect or materialize a layout.
Integer apply — For Testing and Debugging
SmallVector<std::pair<StringAttr, int32_t>>
apply(ArrayRef<std::pair<StringAttr, int32_t>> ins) const;
Feed integer inputs, get integer outputs. Perfect for:
- Assertions in unit tests
- REPL exploration to understand a layout
- Manual verification of basis correctness
Example:
auto result = layout.apply({
{S("register"), 3},
{S("lane"), 5},
{S("warp"), 1}
});
// result: {{S("dim0"), ...}, {S("dim1"), ...}}
assert(result[0].second == expectedDim0);
assert(result[1].second == expectedDim1);
MLIR applyLinearLayout — For LLVM Lowering
// Defined in include/triton/Conversion/TritonGPUToLLVM/Utility.h
SmallVector<std::pair<StringAttr, Value>>
applyLinearLayout(
Location loc,
RewriterBase &rewriter,
const LinearLayout &layout,
ArrayRef<std::pair<StringAttr, Value>> indices
);
Consumes MLIR Values (SSA values), emits LLVM IR to compute the layout function at runtime.
Example (during lowering):
// `b` is assumed to be an op-builder helper already constructed from (loc, rewriter)
Value regId = b.i32_val(0);
Value laneId = getLaneId(rewriter, loc);
Value warpId = getWarpId(rewriter, loc);
auto offsets = applyLinearLayout(loc, rewriter, cvtLayout, {
{S("register"), regId},
{S("lane"), laneId},
{S("warp"), warpId}
});
Value sharedOffset = offsets[0].second; // MLIR Value representing offset
// Use sharedOffset in a GEP or store instruction
Critical reminder: Always keep the dimension names in sync between applyLinearLayout inputs and the layout you created. Mismatched strings lead to silent wrong-code bugs!
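To picture what runtime evaluation involves, the Python sketch below turns a list of basis values into a branch-free expression built from shifts, masks, and XORs. It shows one plausible shape for such arithmetic under the GF(2) model, not the IR that applyLinearLayout actually emits; `emit_expr` and the basis values are hypothetical.

```python
def emit_expr(bases, var):
    """Build a branch-free expression string computing the layout for `var`."""
    terms = [f"((({var} >> {i}) & 1) * {b})"
             for i, b in enumerate(bases) if b]
    return " ^ ".join(terms) if terms else "0"


expr = emit_expr([1, 2, 4, 8, 17], "lane")
print(expr)
print(eval(expr, {"lane": 21}))   # 20
```

Zero basis vectors contribute no term at all, so broadcast dimensions cost nothing at runtime in this model.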
Conclusion
Armed with the patterns above, LinearLayout stops being a black box and becomes a practical tool you can reach for daily.