LEGO Island Decompilation Tools
Accuracy to the game's original code is the main goal of this project. To facilitate the decompilation effort and maintain overall quality, we have devised a set of annotations, to be embedded in the source code, which allow us to automatically verify the accuracy of re-compiled functions' assembly, virtual tables, variable offsets and more.
In order for contributions to be accepted, the annotations must be used in accordance with the rules outlined here. Proper use is enforced by GitHub Actions which run the Python tools found in this folder. It is recommended to integrate these tools into your local development workflow as well.
Overview
We are continually working on extending the capabilities of our "decompilation language" and the toolset around it. Some of the following annotations have not made it into formal verification and thus are not technically enforced on the source code level yet (marked as WIP). Nevertheless, it is recommended to use them since it is highly likely they will eventually be fully integrated.
Functions
All non-inlined functions in the code base with the exception of 3rd party code must be annotated with one of the following markers, which include the module name and address of the function as found in the original binaries. This information is then used to compare the recompiled assembly with the original assembly, resulting in an accuracy score. Functions in a given compilation unit must be ordered by their address in ascending order.
The annotations can be attached to the function implementation, which is the most common case, or use the "comment" syntax (see examples below) for functions that cannot be referred to directly (such as templated, synthetic or non-inlined inline functions). The latter should only ever appear in .h files.
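The ordering rule can be illustrated with a short Python sketch. This is not the check implemented by decomplint; the regular expression is inferred from the examples in this document, and tracking addresses per module is our assumption.

```python
import re

# Assumed marker shape, taken from the examples in this document:
#   // FUNCTION: LEGO1 0x100b12c0
MARKER_RE = re.compile(
    r"^\s*//\s*(FUNCTION|STUB|TEMPLATE|SYNTHETIC|LIBRARY):"
    r"\s*(\w+)\s+0x([0-9a-fA-F]+)\s*$"
)


def find_out_of_order(source):
    """Return (line_number, module, address) for markers that break ascending order."""
    highest = {}  # module name -> highest address seen so far (assumption: per module)
    problems = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = MARKER_RE.match(line)
        if match is None:
            continue
        module = match.group(2)
        address = int(match.group(3), 16)
        if address < highest.get(module, -1):
            problems.append((line_number, module, address))
        else:
            highest[module] = address
    return problems
```

An empty result means every marker in the given source text has an address at least as high as the previous marker of the same module.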
FUNCTION
Functions with a reasonably complete implementation which are not templated or synthetic (see below) should be annotated with FUNCTION.
// FUNCTION: LEGO1 0x100b12c0
MxCore* MxObjectFactory::Create(const char* p_name)
{
    // implementation
}
// FUNCTION: LEGO1 0x100140d0
// MxCore::IsA
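To make the two annotation styles concrete, here is a rough Python sketch that reads markers and reports whether each one is attached to an implementation or uses the "comment" syntax. It is an illustration only and not the isledecomp parser; the function name read_annotations and the tuple layout are hypothetical.

```python
import re

# Marker line, e.g. "// FUNCTION: LEGO1 0x100b12c0" (shape inferred from the examples).
MARKER_RE = re.compile(r"^\s*//\s*([A-Z]+):\s*(\w+)\s+(0x[0-9a-fA-F]+)\s*$")
# Any other single-line comment, e.g. "// MxCore::IsA".
COMMENT_RE = re.compile(r"^\s*//\s*(\S.*)$")


def read_annotations(source):
    """Return (marker, module, address, style, target) tuples for a source text."""
    lines = source.splitlines()
    found = []
    for index, line in enumerate(lines):
        marker = MARKER_RE.match(line)
        if marker is None:
            continue
        # Skip over any further marker lines directly below (defensive assumption).
        nxt = index + 1
        while nxt < len(lines) and MARKER_RE.match(lines[nxt]):
            nxt += 1
        following = lines[nxt] if nxt < len(lines) else ""
        comment = COMMENT_RE.match(following)
        if comment is not None:
            style, target = "comment", comment.group(1).strip()
        else:
            style, target = "attached", following.strip()
        found.append(
            (marker.group(1), marker.group(2), int(marker.group(3), 16), style, target)
        )
    return found
```

Applied to the FUNCTION example above, the first marker is reported as "attached" to MxObjectFactory::Create and the second as a "comment" annotation for MxCore::IsA.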
STUB
Functions with no or a very incomplete implementation should be annotated with STUB. These will not be compared to the original assembly.
// STUB: LEGO1 0x10011d50
LegoCameraController::LegoCameraController()
{
    // TODO
}
TEMPLATE
Templated functions should be annotated with TEMPLATE. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of a templated class.
// TEMPLATE: LEGO1 0x100c0ee0
// list<MxNextActionDataStart *,allocator<MxNextActionDataStart *> >::_Buynode
// TEMPLATE: LEGO1 0x100c0fc0
// MxStreamListMxDSSubscriber::~MxStreamListMxDSSubscriber
// TEMPLATE: LEGO1 0x100c1010
// MxStreamListMxDSAction::~MxStreamListMxDSAction
SYNTHETIC
Synthetic functions should be annotated with SYNTHETIC. A synthetic function is generated by the compiler; the most common is the "scalar deleting destructor" found in virtual tables. Other cases include default destructors and assignment operators. Note: SYNTHETIC takes precedence over TEMPLATE.
// SYNTHETIC: LEGO1 0x10003210
// Helicopter::`scalar deleting destructor'
// SYNTHETIC: LEGO1 0x100c4f50
// MxCollection<MxRegionLeftRight *>::`scalar deleting destructor'
// SYNTHETIC: LEGO1 0x100c4fc0
// MxList<MxRegionLeftRight *>::`scalar deleting destructor'
LIBRARY
Functions located in 3rd party libraries should be annotated with LIBRARY. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of every statically linked library, including the MSVC standard libraries.
// LIBRARY: ISLE 0x4061b0
// _MemPoolInit@4
// LIBRARY: ISLE 0x406520
// _MemPoolSetPageSize@8
// LIBRARY: ISLE 0x406630
// _MemPoolSetBlockSizeFS@8
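The marker rules above can be summarised as a small decision function. This is only a sketch: the document states that SYNTHETIC takes precedence over TEMPLATE, while checking LIBRARY first and treating STUB as the fallback for incomplete code are our assumptions. The function name choose_marker and its parameters are hypothetical.

```python
def choose_marker(*, library=False, synthetic=False, templated=False, complete=True):
    """Pick a function marker name from a few yes/no properties of the function."""
    if library:
        # Assumption: 3rd party code is always LIBRARY, whatever else applies.
        return "LIBRARY"
    if synthetic:
        # Stated in this document: SYNTHETIC takes precedence over TEMPLATE.
        return "SYNTHETIC"
    if templated:
        return "TEMPLATE"
    # Reasonably complete implementations are compared; stubs are not.
    return "FUNCTION" if complete else "STUB"
```

For example, the scalar deleting destructor of a templated class such as MxList is both synthetic and templated, and the function returns "SYNTHETIC" for it.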
Virtual tables
Classes with a virtual table should be annotated using the VTABLE marker, which includes the module name and address of the virtual table. Additionally, virtual function declarations should be annotated with a comment indicating their relative offset. Please use the following example as a reference.
// VTABLE: LEGO1 0x100dc900
class MxEventManager : public MxMediaManager {
public:
    MxEventManager();
    virtual ~MxEventManager() override;
    virtual void Destroy() override; // vtable+0x18
    virtual MxResult Create(MxU32 p_frequencyMS, MxBool p_createThread); // vtable+0x28
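As a rough illustration of how the relative offsets relate to vtable slots, the following Python sketch converts between the two. It assumes 4-byte vtable entries, which holds for a 32-bit binary; the helper names are hypothetical.

```python
POINTER_SIZE = 4  # assumption: 32-bit binary, so each vtable entry is 4 bytes


def vtable_slot(offset):
    """Convert a relative offset such as 0x18 into a zero-based slot index."""
    if offset % POINTER_SIZE != 0:
        raise ValueError(f"offset 0x{offset:x} is not a multiple of {POINTER_SIZE}")
    return offset // POINTER_SIZE


def vtable_comment(slot):
    """Format the offset comment for the virtual function in the given slot."""
    return f"// vtable+0x{slot * POINTER_SIZE:02x}"


def entry_address(vtable_address, slot):
    """Address of the vtable entry for a slot, given the VTABLE marker address."""
    return vtable_address + slot * POINTER_SIZE
```

With these assumptions, Destroy() at vtable+0x18 occupies slot 6 and its entry in the table at 0x100dc900 sits at 0x100dc918.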
Class size
Classes should be annotated using the SIZE marker to indicate their size. If you are unsure about the class size in the original binary, please use the currently available information (known member variables) and detail the circumstances in an extra comment if necessary.
// SIZE 0x1c
class MxCriticalSection {
public:
    MxCriticalSection();
    ~MxCriticalSection();
    static void SetDoMutex();
Furthermore, add DECOMP_SIZE_ASSERT(MxCriticalSection, 0x1c) to the respective .cpp file (if the class has no dedicated .cpp file, use any appropriate .cpp file where the class is used).
Member variables
Member variables should be annotated with their relative offsets.
class MxDSObject : public MxCore {
private:
    MxU32 m_sizeOnDisk; // 0x8
    MxU16 m_type; // 0xc
    char* m_sourceName; // 0x10
    undefined4 m_unk0x14; // 0x14
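The offsets in the example follow from the member sizes and alignment. The Python sketch below models the visible part of the layout with ctypes to show where the 2 bytes of padding after m_type come from. Several things here are assumptions: that the MxCore base occupies the first 8 bytes (inferred from the first member sitting at 0x8), that MxU32 and MxU16 are 32-bit and 16-bit unsigned integers, and that pointers are 4 bytes wide. Only the members shown above are modelled, so the structure size is not the real class size.

```python
import ctypes


class MxDSObjectLayout(ctypes.Structure):
    """Model of the members shown above, not of the complete class."""

    _fields_ = [
        ("base_MxCore", ctypes.c_uint8 * 0x8),  # assumed size of the MxCore part
        ("m_sizeOnDisk", ctypes.c_uint32),      # MxU32
        ("m_type", ctypes.c_uint16),            # MxU16, followed by 2 bytes of padding
        ("m_sourceName", ctypes.c_uint32),      # 32-bit pointer modelled as an integer
        ("m_unk0x14", ctypes.c_uint32),         # undefined4
    ]


def member_offsets():
    """Map each modelled member variable to its relative offset."""
    return {
        name: getattr(MxDSObjectLayout, name).offset
        for name, _ in MxDSObjectLayout._fields_
        if name.startswith("m_")
    }
```

The resulting offsets match the comments in the example: 0x8, 0xc, 0x10 and 0x14.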
Global variables
Global variables should be annotated using the GLOBAL marker, which includes the module name and address of the variable.
// GLOBAL: LEGO1 0x100f456c
MxAtomId* g_jukeboxScript = NULL;
// GLOBAL: LEGO1 0x100f4570
MxAtomId* g_pz5Script = NULL;
// GLOBAL: LEGO1 0x100f4574
MxAtomId* g_introScript = NULL;
Strings
String values should be annotated using the STRING marker, which includes the module name and address of the string.
inline virtual const char* ClassName() const override // vtable+0x0c
{
    // STRING: LEGO1 0x100f03fc
    return "Act2PoliceStation";
}
Tooling
Use pip to install the required packages to be able to use the Python tools found in this folder:
pip install -r tools/requirements.txt
The example usages below assume that the current working directory is this repository's root and that the retail binaries have been copied to ./legobin.
- decomplint: Checks the decompilation annotations (see above)
  - e.g.
    py -m tools.decomplint.decomplint --module LEGO1 LEGO1
- isledecomp: A library that implements a parser to identify the decompilation annotations (see above)
- ncc: Checks naming conventions based on a set of rules
- reccmp: Compares an original binary with a recompiled binary, provided a PDB file. For example:
  - Display the diff for a single function:
    py -m tools.reccmp.reccmp --verbose 0x100ae1a0 legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
  - Generate an HTML report:
    py -m tools.reccmp.reccmp --html output.html legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
  - Create a base file for diffs:
    py -m tools.reccmp.reccmp --json base.json --silent legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
  - Diff against a base file:
    py -m tools.reccmp.reccmp --diff base.json legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
- roadmap: Compares symbol locations in an original binary with the same symbol locations of a recompiled binary
- verexp: Verifies exports by comparing the exports of the original DLL and the recompiled DLL
- vtable: Asserts virtual table correctness by comparing a recompiled binary with the original
  - e.g.
    py -m tools.vtable.vtable legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
- datacmp.py: Compares global data found in the original with the recompiled version
  - e.g.
    py -m tools.datacmp legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .
- patch_c2.py: Patches C2.EXE (part of MSVC 4.20) to get rid of a bugged warning
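If you want to drive reccmp from your own scripts, the four invocations listed above can be assembled programmatically. The sketch below only builds the argument list and uses only the flags shown in this document. The helper name reccmp_command is hypothetical, sys.executable stands in for the py launcher, and we assume the final "." argument refers to the repository root.

```python
import sys

RECCMP_MODULE = "tools.reccmp.reccmp"


def reccmp_command(
    mode,
    value,
    original="legobin/LEGO1.DLL",
    recompiled="build/LEGO1.DLL",
    pdb="build/LEGO1.PDB",
    code_dir=".",  # assumption: the trailing "." is the repository root
):
    """Build the argument list for one of the reccmp usages listed above."""
    flags = {
        "verbose": ["--verbose", value],       # value: function address, e.g. "0x100ae1a0"
        "html": ["--html", value],             # value: output file, e.g. "output.html"
        "json": ["--json", value, "--silent"], # value: base file to create
        "diff": ["--diff", value],             # value: base file to diff against
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return [sys.executable, "-m", RECCMP_MODULE, *flags[mode],
            original, recompiled, pdb, code_dir]
```

The returned list can be passed to subprocess.run(command, check=True) with the repository root as the working directory.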
Testing
isledecomp comes with a suite of tests. Install pytest and run it, passing in the directory:
pip install pytest
pytest tools/isledecomp/tests/
Tool Development
In order to keep the Python code clean and consistent, we use pylint and black:
pip install black pylint
Run pylint (ignoring the build and ncc directories)
pylint tools/ --ignore=build,ncc
Check Python code formatting without rewriting files
black --check tools/
Apply Python code formatting
black tools/
Modules
The following is a list of all the modules found in the annotations (e.g. // FUNCTION: [module] [address]) and which binaries they refer to. See this list of all known versions of the game.
Retail v1.1.0.0 (v1.1)
- LEGO1 -> LEGO1.DLL
- CONFIG -> CONFIG.EXE
- ISLE -> ISLE.EXE
These modules are the most important ones and refer to the English retail version 1.1.0.0 (often shortened to v1.1), which is the most widely released one. These are the ones we attempt to decompile and match as best as possible.
BETA v1.0
- BETA10 -> LEGO1D.DLL
The Beta 1.0 version contains a debug build of the game. While it does not have debug symbols, it still has a number of benefits:
- It is built with little or no optimisation, leading to better decompilations in Ghidra
- Far fewer functions are inlined by the compiler, so it can be used to recognise inlined functions
- It contains assertions that tell us original variable names and code file paths
It is therefore advisable to search for the corresponding function in BETA10 when decompiling a function in LEGO1. Finding the correct function can be tricky, but is usually worth it, especially for longer functions.
Unfortunately, some code has been changed after this beta version was created. Therefore, we are not aiming for a perfect binary match of BETA10. In case of discrepancies, LEGO1 (as defined above) is our "gold standard" for matching.
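For scripts that need to resolve a module name from an annotation to a binary, the mapping described so far can be written down as a simple lookup. This is a sketch with hypothetical names; it covers only the modules whose binaries are named in this document.

```python
# Module name (as used in annotations) -> original binary file name.
MODULE_BINARIES = {
    "LEGO1": "LEGO1.DLL",
    "CONFIG": "CONFIG.EXE",
    "ISLE": "ISLE.EXE",
    "BETA10": "LEGO1D.DLL",
}

# The module that decides in case of discrepancies with BETA10.
REFERENCE_MODULE = "LEGO1"


def binary_for(module):
    """Return the original binary file name for an annotation module name."""
    try:
        return MODULE_BINARIES[module]
    except KeyError:
        raise ValueError(f"unknown module: {module!r}") from None
```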
Re-compiling a beta build (WIP)
If you want to match the code against BETA10, use the following cmake setup to create a debug build:
cmake <path-to-source> -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Debug -DISLE_USE_SMARTHEAP=OFF
TODO: If you can figure out how to make a debug build with SmartHeap enabled, please add it here.
If you want to run scripts to compare your debug build to BETA10 (e.g. reccmp), it is advisable to add a copy of LEGO1D.DLL to ./legobin and rename it to BETA10.DLL.
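The copy-and-rename step can be scripted. A minimal Python sketch follows, assuming the beta DLL is available somewhere on disk and that legobin is the directory that already holds the retail binaries; the function name install_beta10 is hypothetical.

```python
import shutil
from pathlib import Path


def install_beta10(lego1d_path, legobin_dir="legobin"):
    """Copy LEGO1D.DLL into the legobin directory under the name BETA10.DLL."""
    source = Path(lego1d_path)
    if not source.is_file():
        raise FileNotFoundError(source)
    target_dir = Path(legobin_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "BETA10.DLL"
    # copyfile leaves the original LEGO1D.DLL untouched.
    shutil.copyfile(source, target)
    return target
```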
Finding matching functions
This is not a recipe, but rather a list of things you can try.
- If you are working on a virtual function in a class, try to find the class' vtable. Many (but not all) classes implement ClassName(). These functions are usually easy to find by searching the memory for the string consisting of the class name. Keep in mind that not all child classes override this function, so if the function you found is used in multiple vtables (or if you found multiple ClassName()-like functions), make sure you actually have the parent's vtable.
- If that does not help, you can try to walk up the call tree and try to locate a function that calls the function you are interested in.
- Assertions can also help you: most .cpp file names have already been matched based on BETA10, so you can search for the name of your .cpp file and check all the assertions in that file. While that does not find all functions in a given source file, it usually finds the more complex ones.
- If you have found any other strategies, please add them here.
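Searching for a class name string is normally done in Ghidra, but the idea can be sketched in a few lines of Python. The helper below (a hypothetical name) looks for a NUL-terminated ASCII string in a raw byte buffer and rejects matches that are merely the tail of a longer string. It returns offsets into the buffer, not virtual addresses; translating those would require the section table of the binary, which is out of scope here.

```python
def find_c_string(image, text):
    """Return buffer offsets where `text` occurs as a complete NUL-terminated string."""
    needle = text.encode("ascii") + b"\x00"
    offsets = []
    start = 0
    while True:
        position = image.find(needle, start)
        if position == -1:
            break
        # Heuristic: a complete string starts the buffer or follows a NUL byte.
        # This skips e.g. "PoliceStation" inside "Act2PoliceStation".
        if position == 0 or image[position - 1] == 0:
            offsets.append(position)
        start = position + 1
    return offsets
```

A typical use would be find_c_string(Path("legobin/LEGO1.DLL").read_bytes(), "Act2PoliceStation"), followed by looking up references to the resulting location in your disassembler.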
Others (WIP)
- ALPHA (only used twice)