# MacroAssembler and Assembler Code Generation Behaviours

This design document describes the way that the Assembler and MacroAssembler
generate code. This has wide implications, including buffer management,
instruction-precise generation and related control over code generation.

**NOTE**: This document describes design decisions, but the code does not
implement or match everything that is described in this document.

TODO: Work on the code to achieve what is expressed in this document, and
update the documentation.

## Basic Use-Cases

### Simple Code Generation

For normal code generation, the MacroAssembler should be used. We recommend this
even if the caller doesn't require the macro behaviour, because it acts as a
(partial) fail-safe in case calling code accidentally passes out-of-range
immediates and suchlike. The MacroAssembler is also able to check and emit
literal pools.

Note that the MacroAssembler is allowed to emit an arbitrary amount of code in
order to achieve the requested effect. (Note that "arbitrary" includes no code,
for macros with no effect other than to advance the PC.) In practical terms, a
macro can effectively generate a huge amount of code if it needs to emit a
literal pool (for example).

### Precise Code Generation

Sometimes, the caller needs to generate a very precise code sequence. The
typical use-case is where code needs to be patched. In these cases, the
MacroAssembler must not be used, but the Assembler can be called directly.

Since the caller most likely has a MacroAssembler object, we provide
`ExactAssemblyScope` to restrict any MacroAssembler methods from being used for
its duration, and allow access to the Assembler.

### Fuzzy Cases: Approximate Size Limits

Sometimes the caller wants the convenience and fail-safe features of the
MacroAssembler, and isn't worried about the precise code sequence used, but
needs to ensure that the total code size does not exceed the range of a branch
(or similar PC-relative instruction).
Veneers simplify many cases like this, but not all (and not necessarily
optimally). For example, VIXL32's Switch-Case macros probably do not have
sufficient range to cope with a literal pool in the middle.

For these cases, we'd like to ensure that the MacroAssembler doesn't emit _too
much_ code. This is very fuzzy, and in practice means avoiding pools, but
allowing standard macros. The catch is that the caller must specify the upper
limit on the size of the generated code.

A corner-case that is relevant for VIXL32 (but mostly irrelevant for VIXL64) is
that the protected region could easily be larger than the range of some
load-literal instructions, so we should not actually _block_ the pool. For
example, `vldr` has a range of about 1KB, but a `tbh`-based Switch-Case sequence
can easily exceed this range. If one of the Cases generates an FP literal-load,
the MacroAssembler needs to put the pool in the middle of the Switch-Case
sequence. This case is currently accommodated by
`MacroAssembler::EnsureEmitFor`.

## Proposal

These behaviours are similar to (or the same as) existing cases to avoid
breaking backwards compatibility. Several potentially-unsafe scopes have been
deprecated, and a few have been given more flexibility.

Each scope utility will behave in the same way for VIXL64 as for VIXL32, even if
the implementations differ.

### `CodeBufferCheckScope(Assembler* assm, size_t size, ...)`

- Allow code emission from the specified `Assembler`.
- Optionally reserve space in the `CodeBuffer` (if it is managed by VIXL).
- Optionally, on destruction, check the size of the generated code.
  (The size can be either exact or a maximum size.)

This scope exists so that callers can use an Assembler by itself, without even
instantiating a MacroAssembler.

### `CodeBufferCheckScope(MacroAssembler* masm, ...)`

- DEPRECATED

Otherwise, this is the same as `CodeBufferCheckScope(Assembler*)`.

It is unfortunate that this scope allows the Assembler and MacroAssembler to be
mixed freely; this can cause numerous problems.
For example, the Assembler doesn't know about the pools, so use of the Assembler
can push the pools out of range. This was acceptable in VIXL64, where the pool
range is very large, but not in VIXL32.

We should retain the existing functionality for a while, but mark the
`MacroAssembler*` form as DEPRECATED. A suitable replacement is
`EmissionCheckScope`, which allows the Assembler and MacroAssembler to be mixed,
but also blocks pools and therefore avoids the problems that
`CodeBufferCheckScope` has.

### `EmissionCheckScope(MacroAssembler* masm, size_t size, AssertPolicy ...)`

- Do the same as `CodeBufferCheckScope`, but:
  - If managed by VIXL, always reserve space in the `CodeBuffer`.
  - Always check the (exact or maximum) size of the generated code on
    destruction.
- Emit pools if the specified size would push them out of range.
- Block pool emission for the duration of the scope.

This scope allows the `Assembler` and `MacroAssembler` to be freely and safely
mixed for its duration. The MacroAssembler uses this to implement its own
macros.

### `ExactAssemblyScope(MacroAssembler* masm, ...)`

- Do the same as `EmissionCheckScope`.
- Block access to the MacroAssemblerInterface (using run-time assertions).

This scope allows safely generating exact assembly code. Compared to
`CodeBufferCheckScope`, it disables the `MacroAssembler`, and guarantees that no
pools will be emitted during code generation.

This replaces VIXL64's `InstructionAccurateScope`.

### `BlockPoolsScope` (and variants)

- DEPRECATED
- Block the pools for the duration.

These scopes really shouldn't be used outside VIXL itself. Since uses inside
VIXL are minimal, we should mark them as DEPRECATED and replace our own uses
with `EmissionCheckScope` or manual `MacroAssembler::BlockPools()` calls.

Note that this scope made sense in VIXL64, where pool ranges are large and we
have a large contingency region built into the pool checks.
In VIXL32, where the ranges are tight, we can't generally afford to block the
constant pools at arbitrary points, even for short sequences of instructions.

### `InstructionAccurateScope`

- DEPRECATED
- Replaced by `ExactAssemblyScope`.

When generating T32, we need something like `InstructionAccurateScope` to check
the code _size_, rather than the instruction count, since the instruction size
is much more likely to vary in a way that matters. However, it's not safe to
just change `InstructionAccurateScope`'s behaviour because the constructor
prototype would be unchanged, so there would be no compile-time warning for
users.

### `MacroAssembler::EnsureEmitFor`

- Private to the MacroAssembler (but available, in a DEPRECATED form, to VIXL64
  users).
- Ensure that there is space in the CodeBuffer so that `size` bytes can be
  emitted contiguously. Pools are emitted if `size` bytes would push them out of
  range, but they are not actually blocked; a pool can still be emitted inside
  the specified range if literals are added to it within that range.

For example:

    __ EnsureEmitFor(4096);   // Might dump pools.
    __ Add(...);
    __ Add(...);              // These macros will not dump pools. They might
    __ Add(...);              // emit multiple instructions.
    __ Add(...);
    __ Vldr(d0, 12345.0);     // Adds an entry to the literal pool (range ~1KB).
    __ Add(...);
    ...
    __ Add(...);
    __ Add(...);              // The pool containing 12345.0 will be dumped
    __ Add(...);              // before the end of the EnsureEmitFor range.
    __ Add(...);

This is a one-shot call, not a scope utility, so there is no size checking
available. For that reason, it is risky, but still useful in certain cases.
There are also tricky corner-cases to consider. Most notably, if literals are
added to the pool during the `EnsureEmitFor` range, a pool might still be
generated in that range. This can be avoided by including the size of new
literals in the size check, but because this is not a scope utility and has no
destruction checks, we cannot assert that the usage was safe.
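
The corner case above can be made concrete with a small stand-alone model. This
is a sketch under simplifying assumptions, not VIXL code: `ToyPoolModel`,
`LoadLiteral`, `Emit` and `RunEnsureEmitForExample` are hypothetical names, the
model tracks only byte offsets, and it ignores the branch over an emitted pool
and any alignment padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Toy model for illustration; none of these names are VIXL APIs.
class ToyPoolModel {
 public:
  // Emit `size` bytes of ordinary code, dumping the pool first if needed.
  void Emit(std::size_t size) {
    EnsureEmitFor(size);
    cursor_ += size;
  }

  // Emit a literal load of `insn_size` bytes. The pool holding its
  // `literal_size`-byte literal must start within `range` bytes of the load.
  void LoadLiteral(std::size_t insn_size, std::size_t literal_size,
                   std::size_t range) {
    EnsureEmitFor(insn_size);
    deadline_ = std::min(deadline_, cursor_ + range);
    cursor_ += insn_size;
    pending_bytes_ += literal_size;
  }

  // One-shot check: dump the pool now if `size` more bytes of code would push
  // it out of range. Nothing is blocked afterwards.
  void EnsureEmitFor(std::size_t size) {
    if (pending_bytes_ != 0 && cursor_ + size > deadline_) DumpPool();
  }

  std::size_t cursor() const { return cursor_; }
  const std::vector<std::size_t>& pool_offsets() const { return pool_offsets_; }

 private:
  void DumpPool() {
    pool_offsets_.push_back(cursor_);  // Record where the pool starts.
    cursor_ += pending_bytes_;
    pending_bytes_ = 0;
    deadline_ = std::numeric_limits<std::size_t>::max();
  }

  std::size_t cursor_ = 0;
  std::size_t pending_bytes_ = 0;
  std::size_t deadline_ = std::numeric_limits<std::size_t>::max();
  std::vector<std::size_t> pool_offsets_;
};

// Mirrors the sequence above: request 4096 bytes, then add a literal with a
// 1KB range part-way through. Returns the offsets at which pools were dumped.
std::vector<std::size_t> RunEnsureEmitForExample() {
  ToyPoolModel masm;
  masm.EnsureEmitFor(4096);                    // Pool is empty: nothing dumped.
  for (int i = 0; i < 4; i++) masm.Emit(4);
  masm.LoadLiteral(4, 8, 1024);                // Load at offset 16.
  for (int i = 0; i < 500; i++) masm.Emit(4);  // Forces the pool mid-range.
  return masm.pool_offsets();
}
```

In this model the single pool lands at offset 1040, well inside the 4096-byte
range requested at offset 0. If a literal is already pending when
`EnsureEmitFor` is called, the pool is dumped immediately instead, and the
following code is emitted without interruption.
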
Also note that `EnsureEmitFor` does not acquire the CodeBuffer, so it is not
possible to use the Assembler after using this utility alone.

## Usage Examples

### Basic Usage

    void fn(MacroAssembler* masm) {
      // - Uses delegates if necessary.
      // - Arbitrary length (including 0, potentially).
      // - Can automatically reserve space in the code buffer.
      // - Can automatically dump pools.
      masm->Add(...);
    }

    void fn(MacroAssembler* masm) {
      // - No delegates allowed.
      //   - If a delegate is called, it should crash even in release mode.
      //     (This helps to avoid security bugs derived from data-dependent code
      //     generation.)
      // - Always generates exactly one instruction.
      // - No automatic buffer growth, but does check that there is space. (In
      //   VIXL64, this is done by the CodeBuffer.)
      // - In VIXL64, this requires that the code buffer has been "acquired".
      //   Any of the EmissionCheckScopes can do this, as can ExactAssemblyScope
      //   and CodeBufferCheckScope.
      // The scope is a named local so that it lives until the end of the block.
      SingleEmissionCheckScope guard(masm);
      masm->add(...);
    }

    void fn(Assembler* assm) {
      // Identical to the MacroAssembler::add example, except that we must use
      // CodeBufferCheckScope to acquire the buffer.
      CodeBufferCheckScope guard(assm, ...);
      assm->add(...);
    }

### Macros: Simple Code Generation

    void MacroAssembler::Add(...) {
      // A macro no larger than
      // `MacroEmissionCheckScope::kTypicalMacroInstructionMaxSize`.
      MacroEmissionCheckScope scope(...);
      ...
    }

    void MacroAssembler::Printf(...) {
      // A macro larger than
      // `MacroEmissionCheckScope::kTypicalMacroInstructionMaxSize`.
      // We start no scope, but rely only on upper-case macros which create
      // their own MacroEmissionCheckScopes. Pools can be emitted during
      // this large macro.
      Add(...);
      Ldr(...);
      ...
    }

### Patchable Regions: Precise Code Generation

    void fn(MacroAssembler* masm) {
      __ Add(...);
      __ Add(...);
      __ Add(...);
      {
        // We want this sequence of instructions to be patched later, so we need
        // to use instruction-accurate code generation with a predictable size.
        // It is forbidden to use macros during this scope.
        // The scope must be a named local; an unnamed temporary would be
        // destroyed at the end of its own statement.
        ExactAssemblyScope scope(masm, 4 * kInstructionSize);
        __ bind(&patch_location);
        __ add(...);
        __ add(...);
        __ add(...);
        __ add(...);
      }
      __ Add(...);
      __ Add(...);
      __ Add(...);
    }
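
The size check that `CodeBufferCheckScope`, `EmissionCheckScope` and
`ExactAssemblyScope` are described as performing on destruction can be
illustrated with a small stand-alone model. This is a sketch only, not VIXL
code: `ToyBuffer`, `ToySizeCheckScope`, `SizePolicy` and
`EmitPatchableSequence` are hypothetical names, and a failed check is counted
rather than asserted so that the behaviour is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not VIXL classes.
enum class SizePolicy { kNoCheck, kExactSize, kMaximumSize };

class ToyBuffer {
 public:
  // Append one 32-bit instruction word, least significant byte first.
  void Emit32(std::uint32_t instruction) {
    for (int shift = 0; shift < 32; shift += 8) {
      bytes_.push_back(static_cast<std::uint8_t>(instruction >> shift));
    }
  }
  void Reserve(std::size_t extra) { bytes_.reserve(bytes_.size() + extra); }
  std::size_t Size() const { return bytes_.size(); }

  int violations = 0;  // Incremented by a scope whose size check fails.

 private:
  std::vector<std::uint8_t> bytes_;
};

// RAII scope: reserves space on construction, checks the emitted size on
// destruction according to `policy`.
class ToySizeCheckScope {
 public:
  ToySizeCheckScope(ToyBuffer* buffer, std::size_t size, SizePolicy policy)
      : buffer_(buffer), start_(buffer->Size()), size_(size), policy_(policy) {
    buffer_->Reserve(size);
  }
  ~ToySizeCheckScope() {
    std::size_t emitted = buffer_->Size() - start_;
    bool ok = true;
    if (policy_ == SizePolicy::kExactSize) ok = (emitted == size_);
    if (policy_ == SizePolicy::kMaximumSize) ok = (emitted <= size_);
    if (!ok) buffer_->violations++;
  }
  ToySizeCheckScope(const ToySizeCheckScope&) = delete;
  ToySizeCheckScope& operator=(const ToySizeCheckScope&) = delete;

 private:
  ToyBuffer* buffer_;
  std::size_t start_;
  std::size_t size_;
  SizePolicy policy_;
};

// Emit `count` filler instructions inside a scope that expects exactly four.
// Returns the number of size violations recorded on the buffer so far.
int EmitPatchableSequence(ToyBuffer* buffer, int count) {
  constexpr std::size_t kInstructionSize = 4;
  {
    ToySizeCheckScope scope(buffer, 4 * kInstructionSize,
                            SizePolicy::kExactSize);
    for (int i = 0; i < count; i++) buffer->Emit32(0xd503201f);  // A64 NOP.
  }
  return buffer->violations;
}
```

With this model, emitting four instructions leaves `violations` at 0, while
emitting three or five under the same exact-size scope records one violation
when the scope closes. A maximum-size policy accepts any sequence up to the
stated size, which corresponds to the fuzzy use-case described earlier.
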