Courgette Internals
===================
Patch Generation
----------------
![Patch Generation](generation.png)
- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
generation by calling ensemble\_create.cc:GenerateEnsemblePatch
- The files are read into courgette:SourceStream objects.
- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
uses MakeGenerator to create
patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 instances.
- PatchGeneratorX86\_32's Transform method transforms the input file
using Courgette's core techniques that make the bsdiff delta
smaller. The steps it takes are the following:
  - _disassemble_ the old and new binaries into AssemblyProgram
    objects,
  - _adjust_ the new AssemblyProgram object, and
  - _encode_ the AssemblyProgram object back into raw bytes.
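
The core idea behind these steps can be illustrated with a toy example. The sketch below is not Courgette code: `ToyInstr`, `ReplaceAddressesWithIndices`, and `CountDifferences` are hypothetical names, and a program is modeled as a flat list of plain bytes and 32-bit absolute addresses. It shows that when a function moves, every instruction referencing it changes in the raw form, but not once addresses are replaced by indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy model, not Courgette's AssemblyProgram: each element is
// either a plain machine byte or a 32-bit absolute address.
struct ToyInstr {
  bool is_address;  // true if `value` holds an address (or, later, an index)
  uint32_t value;   // a byte value (0..255) or an address / label index
};

// Replaces each address with a label index, numbering distinct addresses in
// order of first use, roughly as the disassembler numbers referenced addresses.
std::vector<ToyInstr> ReplaceAddressesWithIndices(const std::vector<ToyInstr>& in) {
  std::map<uint32_t, uint32_t> index_of;  // address -> label index
  std::vector<ToyInstr> out;
  out.reserve(in.size());
  for (const ToyInstr& ins : in) {
    if (!ins.is_address) {
      out.push_back(ins);  // plain bytes pass through unchanged
      continue;
    }
    auto it = index_of.find(ins.value);
    if (it == index_of.end()) {
      uint32_t next = static_cast<uint32_t>(index_of.size());
      it = index_of.emplace(ins.value, next).first;
    }
    out.push_back(ToyInstr{true, it->second});
  }
  return out;
}

// Counts positions where two streams differ, plus any difference in length.
std::size_t CountDifferences(const std::vector<ToyInstr>& a,
                             const std::vector<ToyInstr>& b) {
  std::size_t common = a.size() < b.size() ? a.size() : b.size();
  std::size_t diffs = a.size() > b.size() ? a.size() - b.size() : b.size() - a.size();
  for (std::size_t i = 0; i < common; ++i)
    if (a[i].is_address != b[i].is_address || a[i].value != b[i].value) ++diffs;
  return diffs;
}
```

Numbering by first use only lines up when both programs reference their targets in the same order; the _adjust_ step exists to restore that alignment when they do not.
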
### Disassemble
- The input is a pointer to a buffer containing the raw bytes of the
input file.
- Disassembly converts certain machine instructions that reference
addresses into Courgette instructions. It is not true disassembly,
but this is the term the codebase uses. Specifically, it detects
instructions that use absolute addresses given by the binary file's
relocation table, and relative addresses used in relative branches.
- Done by disassemble:ParseDetectedExecutable, which selects the
appropriate Disassembler subclass by looking at the binary file's
headers.
  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler
  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler
  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler
- The Disassembler replaces the relocation table with a Courgette
instruction that can regenerate the relocation table.
- The Disassembler builds a list of addresses referenced by the
machine code, numbering each one.
- The Disassembler replaces each address used in machine instructions
with its index number.
- The output is an assembly\_program.h:AssemblyProgram class, which
contains a list of instructions, machine or Courgette, and a mapping
of indices to actual addresses.
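
ParseDetectedExecutable's header-based choice of Disassembler can be sketched as follows. This is an illustrative sketch, not Courgette's implementation: the `ToyExecutableType` enum and `DetectExecutableType` function are hypothetical, and only the header fields named in the comments (the DOS `MZ` stub, `e_lfanew`, the `PE\0\0` signature and COFF Machine field, and the ELF `e_ident` and `e_machine` fields) come from the PE and ELF formats themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum for this sketch; not Courgette's own type names.
enum class ToyExecutableType { kUnknown, kWin32X86, kElf32X86, kElf32Arm };

// Little-endian readers; callers must bounds-check `offset` first.
static uint16_t ReadLE16(const std::vector<uint8_t>& b, std::size_t offset) {
  return static_cast<uint16_t>(b[offset] | (b[offset + 1] << 8));
}

static uint32_t ReadLE32(const std::vector<uint8_t>& b, std::size_t offset) {
  return static_cast<uint32_t>(b[offset]) |
         (static_cast<uint32_t>(b[offset + 1]) << 8) |
         (static_cast<uint32_t>(b[offset + 2]) << 16) |
         (static_cast<uint32_t>(b[offset + 3]) << 24);
}

// Picks an executable type from header bytes, the way a detector would choose
// which Disassembler subclass to instantiate.
ToyExecutableType DetectExecutableType(const std::vector<uint8_t>& b) {
  // PE/COFF: "MZ" DOS stub; e_lfanew (offset 0x3C) points at "PE\0\0",
  // which is followed by the 2-byte COFF Machine field.
  if (b.size() >= 0x40 && b[0] == 'M' && b[1] == 'Z') {
    std::size_t pe = ReadLE32(b, 0x3C);
    if (pe <= b.size() && b.size() - pe >= 6 && b[pe] == 'P' &&
        b[pe + 1] == 'E' && b[pe + 2] == 0 && b[pe + 3] == 0 &&
        ReadLE16(b, pe + 4) == 0x014C)  // IMAGE_FILE_MACHINE_I386
      return ToyExecutableType::kWin32X86;
  }
  // ELF: 0x7F 'E' 'L' 'F' magic, EI_CLASS == 1 (32-bit), EI_DATA == 1
  // (little-endian), then e_machine at offset 18.
  if (b.size() >= 20 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' &&
      b[3] == 'F' && b[4] == 1 && b[5] == 1) {
    uint16_t machine = ReadLE16(b, 18);
    if (machine == 3) return ToyExecutableType::kElf32X86;   // EM_386
    if (machine == 40) return ToyExecutableType::kElf32Arm;  // EM_ARM
  }
  return ToyExecutableType::kUnknown;
}
```

A real detector also validates section tables and sizes before trusting these offsets; the sketch only bounds-checks the bytes it reads.
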
### Adjust
- This step takes the AssemblyProgram objects for the old and new
files and reassigns the indices in the new program that map to
actual addresses. It is performed by adjustment_method.cc:Adjust().
- The goal is to match the indices from the old program to the new
program as closely as possible.
- When matched correctly, machine instructions that jump to the same
function in both the new and old binary will look the same to
bsdiff, even if the function is located in a different part of the
binary.
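
A deliberately simplified adjustment makes the goal concrete. Courgette's adjustment_method.cc uses its own matching strategy; the sketch below swaps in a much simpler one. It assumes each label carries a stable `key` describing its target (for example, a hash of the function body), and `ToyLabel` and `ToyAdjust` are hypothetical names. New labels whose key also appears in the old program inherit the old index; the rest receive fresh indices after the old range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy label: an absolute address plus a stable key for its
// target. The key (e.g. a hash of the target's bytes) is an assumption of
// this sketch; matching below uses only the key, not the address.
struct ToyLabel {
  uint32_t address;
  uint64_t key;
};

// Returns one index per new label. Old labels are numbered by their position
// in `old_labels`. A new label whose key appears in the old program inherits
// that old index (each old index is handed out at most once); every other new
// label gets a fresh index past the end of the old range.
std::vector<uint32_t> ToyAdjust(const std::vector<ToyLabel>& old_labels,
                                const std::vector<ToyLabel>& new_labels) {
  std::unordered_map<uint64_t, uint32_t> old_index_by_key;
  for (std::size_t i = 0; i < old_labels.size(); ++i)
    old_index_by_key.emplace(old_labels[i].key, static_cast<uint32_t>(i));  // first wins

  std::unordered_set<uint32_t> taken;
  uint32_t next_fresh = static_cast<uint32_t>(old_labels.size());
  std::vector<uint32_t> result(new_labels.size());
  for (std::size_t j = 0; j < new_labels.size(); ++j) {
    auto it = old_index_by_key.find(new_labels[j].key);
    if (it != old_index_by_key.end() && taken.insert(it->second).second)
      result[j] = it->second;    // matched: reuse the old index
    else
      result[j] = next_fresh++;  // unmatched or already claimed: fresh index
  }
  return result;
}
```

For instance, if the old program's labels are keyed 111, 222, 333 and the new program inserts a label keyed 999 before them, `ToyAdjust` returns the indices 3, 0, 1, 2 for the new labels, so references to the three surviving targets encode exactly as before.
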
### Encode
- This step takes an AssemblyProgram object and encodes both the
instructions and the mapping of indices to addresses as byte
vectors. This format can be written to a file directly, and is also
more appropriate for bsdiffing. It is done by
AssemblyProgram.Encode().
- encoded_program.h:EncodedProgram defines the binary format and a
WriteTo method that writes to a file.
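
The kind of serialization this step performs can be sketched with the address table alone. The layout here is an assumption, not the format defined in encoded_program.h: `AppendVarint32`, `ReadVarint32`, `EncodeAddressTable`, and `DecodeAddressTable` are hypothetical names, and the sketch stores the index-to-address table as variable-length deltas between consecutive entries, which keeps the bytes small and repetitive when addresses are close together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as an unsigned LEB128-style varint: 7 bits per byte, with
// the high bit set on every byte except the last.
void AppendVarint32(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Reads a varint starting at `pos` and advances `pos` past it.
uint32_t ReadVarint32(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint32_t value = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    uint8_t byte = in.at(pos++);
    value |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if (!(byte & 0x80)) break;
  }
  return value;
}

// Hypothetical layout: entry count, then each address as a delta from the
// previous one. Unsigned wraparound keeps backward steps lossless.
std::vector<uint8_t> EncodeAddressTable(const std::vector<uint32_t>& addresses) {
  std::vector<uint8_t> out;
  AppendVarint32(out, static_cast<uint32_t>(addresses.size()));
  uint32_t prev = 0;
  for (uint32_t addr : addresses) {
    AppendVarint32(out, addr - prev);
    prev = addr;
  }
  return out;
}

std::vector<uint32_t> DecodeAddressTable(const std::vector<uint8_t>& in) {
  std::size_t pos = 0;
  uint32_t count = ReadVarint32(in, pos);
  std::vector<uint32_t> addresses;
  uint32_t prev = 0;
  for (uint32_t i = 0; i < count; ++i) {
    prev += ReadVarint32(in, pos);  // undo the delta
    addresses.push_back(prev);
  }
  return addresses;
}
```

For example, the table {0x1000, 0x1040, 0x1080} encodes to the five bytes 03 80 20 40 40: a count, the first address, and two identical deltas.
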
### bsdiff
- The bsdiff step is performed by simple_delta.cc:GenerateSimpleDelta.
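
bsdiff itself is more involved, but the property Courgette relies on is visible in its "diff" block. The sketch below is not simple_delta.cc: `DiffAligned` and `ApplyAligned` are hypothetical names, and it handles only one region already aligned with the old file, omitting bsdiff's suffix-sort match search, control triples, and extra block. Each output byte is the new byte minus the old byte modulo 256, so unchanged bytes become zeros that compress well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// bsdiff-style diff block for a single pre-aligned region (hypothetical
// helper): diff[i] = (new[i] - old[i]) mod 256. Bytes past the end of the old
// region are diffed against zero.
std::vector<uint8_t> DiffAligned(const std::vector<uint8_t>& old_region,
                                 const std::vector<uint8_t>& new_region) {
  std::vector<uint8_t> diff(new_region.size());
  for (std::size_t i = 0; i < new_region.size(); ++i) {
    uint8_t old_byte = i < old_region.size() ? old_region[i] : 0;
    diff[i] = static_cast<uint8_t>(new_region[i] - old_byte);
  }
  return diff;
}

// Inverse of DiffAligned: new[i] = (old[i] + diff[i]) mod 256.
std::vector<uint8_t> ApplyAligned(const std::vector<uint8_t>& old_region,
                                  const std::vector<uint8_t>& diff) {
  std::vector<uint8_t> out(diff.size());
  for (std::size_t i = 0; i < diff.size(); ++i) {
    uint8_t old_byte = i < old_region.size() ? old_region[i] : 0;
    out[i] = static_cast<uint8_t>(old_byte + diff[i]);
  }
  return out;
}
```

Applying the diff to the same old region restores the new bytes exactly, and the fewer bytes that differ after the Courgette transform, the more zeros the diff block contains.
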

Patch Application
-----------------
![Patch Application](application.png)
- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application
by calling ensemble\_apply.cc:ApplyEnsemblePatch
- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the
patch's header, then calls the overloaded version of
ensemble\_apply.cc:ApplyEnsemblePatch.
- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
object, which generates a set of patcher_x86_32.h:PatcherX86_32
objects for the sections in the patch.
- The original file is disassembled and encoded via a call to
EnsemblePatchApplication.TransformUp, which in turn calls
patcher_x86_32.h:PatcherX86_32.Transform.
- The transformed file is then bspatched via
EnsemblePatchApplication.SubpatchTransformedElements, which calls
EnsemblePatchApplication.SubpatchStreamSets, which calls
simple_delta.cc:ApplySimpleDelta, Courgette's built-in
implementation of bspatch.
- Finally, EnsemblePatchApplication.TransformDown assembles the
patched binary data, i.e., reverses the encoding and disassembly.
This is done by calling PatcherX86_32.Reform, which in turn calls
the global function encoded_program.cc:Assemble, which calls
EncodedProgram.AssembleTo.
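
The disassemble/assemble round trip that TransformDown relies on can be shown in miniature. This sketch is not Courgette's format: `ToyProgram`, `ToyDisassemble`, and `ToyAssemble` are hypothetical, and it assumes 32-bit little-endian absolute addresses at known relocation offsets, zeroing each address slot and recording a label index for it. Assembling writes each label's address back, so assembling the disassembled program reproduces the original bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy program, not Courgette's AssemblyProgram/EncodedProgram.
struct ToyProgram {
  std::vector<uint8_t> bytes;          // machine bytes with address slots zeroed
  std::vector<std::size_t> slots;      // offsets of 4-byte address slots
  std::vector<uint32_t> slot_indices;  // label index stored for each slot
  std::vector<uint32_t> labels;        // label index -> absolute address
};

static uint32_t Load32(const std::vector<uint8_t>& b, std::size_t off) {
  return static_cast<uint32_t>(b.at(off)) |
         (static_cast<uint32_t>(b.at(off + 1)) << 8) |
         (static_cast<uint32_t>(b.at(off + 2)) << 16) |
         (static_cast<uint32_t>(b.at(off + 3)) << 24);
}

static void Store32(std::vector<uint8_t>& b, std::size_t off, uint32_t v) {
  for (std::size_t i = 0; i < 4; ++i) b.at(off + i) = static_cast<uint8_t>(v >> (8 * i));
}

// Toy "disassembly": pull each relocated address out and number it.
ToyProgram ToyDisassemble(const std::vector<uint8_t>& image,
                          const std::vector<std::size_t>& relocs) {
  ToyProgram p;
  p.bytes = image;
  std::map<uint32_t, uint32_t> index_of;  // address -> label index
  for (std::size_t off : relocs) {
    uint32_t addr = Load32(image, off);
    auto it = index_of.find(addr);
    if (it == index_of.end()) {
      it = index_of.emplace(addr, static_cast<uint32_t>(p.labels.size())).first;
      p.labels.push_back(addr);
    }
    p.slots.push_back(off);
    p.slot_indices.push_back(it->second);
    Store32(p.bytes, off, 0);  // the address now lives only in `labels`
  }
  return p;
}

// Toy "assembly": the inverse, writing each slot's address back from `labels`.
std::vector<uint8_t> ToyAssemble(const ToyProgram& p) {
  std::vector<uint8_t> image = p.bytes;
  for (std::size_t i = 0; i < p.slots.size(); ++i)
    Store32(image, p.slots[i], p.labels.at(p.slot_indices[i]));
  return image;
}
```

In this toy, the assembled bytes depend only on the stored bytes, slot indices, and label table, which is why patching can operate on those structures instead of on raw addresses.
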

Glossary
--------

**Adjust**: Reassign address indices in the new program so that they
more closely match those from the old program.

**Assembly program**: The output of _disassembly_. Contains a list of
machine and _Courgette instructions_ and a table mapping indices to
target addresses.

**Assemble**: Convert an _assembly program_ back into an object file
by evaluating the _Courgette instructions_ and leaving the machine
instructions in place.

**Courgette instruction**: Replaces machine instructions in the
program. Courgette instructions replace branch targets with indices
into the table of target addresses, and replace part of the
relocation table.

**Disassembler**: Takes a binary file and produces an _assembly
program_.

**Encode**: Convert an _assembly program_ into an _encoded program_ by
serializing its data structures into byte vectors more appropriate
for storage in a file.

**Encoded program**: The output of encoding.

**Ensemble**: A Courgette-style patch containing sections for the list
of branch addresses and the encoded program. It supports patching
multiple object files at once.

**Opcode**: The number corresponding to either a machine or _Courgette
instruction_.