Remove decomp tools

2024-11-22 07:28:00 -05:00 · 2024-05-16 22:20:47 -04:00 · 2024-05-16 22:20:47 -04:00 · 3fdf45968e
commit 3fdf45968e
parent a548770f7e
81 changed files with 0 additions and 12562 deletions
--- a/tools/README.md
+++ b/tools/README.md
@@ -1,200 +0,0 @@
 # LEGO Island Decompilation Tools
 Accuracy to the game's original code is the main goal of this project. To facilitate the decompilation effort and maintain overall quality, we have devised a set of annotations, to be embedded in the source code, which allow us to automatically verify the accuracy of re-compiled functions' assembly, virtual tables, variable offsets and more.
 In order for contributions to be accepted, the annotations must be used in accordance with the rules outlined here. Proper use is enforced by [GitHub Actions](/.github/workflows) which run the Python tools found in this folder. It is recommended to integrate these tools into your local development workflow as well.
 # Overview
 We are continually working on extending the capabilities of our "decompilation language" and the toolset around it. Some of the following annotations have not made it into formal verification and thus are not technically enforced on the source code level yet (marked as **WIP**). Nevertheless, it is recommended to use them since it is highly likely they will eventually be fully integrated.
 ## Functions
 All non-inlined functions in the code base with the exception of [3rd party code](/3rdparty) must be annotated with one of the following markers, which include the module name and address of the function as found in the original binaries. This information is then used to compare the recompiled assembly with the original assembly, resulting in an accuracy score. Functions in a given compilation unit must be ordered by their address in ascending order.
 The annotations can be attached to the function implementation, which is the most common case, or use the "comment" syntax (see examples below) for functions that cannot be referred to directly (such as templated, synthetic or non-inlined inline functions). The latter should only ever appear in `.h` files.
 ### `FUNCTION`
 Functions with a reasonably complete implementation which are not templated or synthetic (see below) should be annotated with `FUNCTION`.
 ```
 // FUNCTION: LEGO1 0x100b12c0
 MxCore* MxObjectFactory::Create(const char* p_name)
 {
  // implementation
 }
 // FUNCTION: LEGO1 0x100140d0
 // MxCore::IsA
 ```
 ### `STUB`
 Functions with no or a very incomplete implementation should be annotated with `STUB`. These will not be compared to the original assembly.
 ```
 // STUB: LEGO1 0x10011d50
 LegoCameraController::LegoCameraController()
 {
  // TODO
 }
 ```
 ### `TEMPLATE`
 Templated functions should be annotated with `TEMPLATE`. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of a templated class.
 ```
 // TEMPLATE: LEGO1 0x100c0ee0
 // list<MxNextActionDataStart *,allocator<MxNextActionDataStart *> >::_Buynode
 // TEMPLATE: LEGO1 0x100c0fc0
 // MxStreamListMxDSSubscriber::~MxStreamListMxDSSubscriber
 // TEMPLATE: LEGO1 0x100c1010
 // MxStreamListMxDSAction::~MxStreamListMxDSAction
 ```
 ### `SYNTHETIC`
 Synthetic functions should be annotated with `SYNTHETIC`. A synthetic function is generated by the compiler; most common is the "scalar deleting destructor" found in virtual tables. Other cases include default destructors and assignment operators. Note: `SYNTHETIC` takes precedence over `TEMPLATE`.
 ```
 // SYNTHETIC: LEGO1 0x10003210
 // Helicopter::`scalar deleting destructor'
 // SYNTHETIC: LEGO1 0x100c4f50
 // MxCollection<MxRegionLeftRight *>::`scalar deleting destructor'
 // SYNTHETIC: LEGO1 0x100c4fc0
 // MxList<MxRegionLeftRight *>::`scalar deleting destructor'
 ```
 ### `LIBRARY`
 Functions located in 3rd party libraries should be annotated with `LIBRARY`. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of every statically linked library, including the MSVC standard libraries.
 ```
 // LIBRARY: ISLE 0x4061b0
 // _MemPoolInit@4
 // LIBRARY: ISLE 0x406520
 // _MemPoolSetPageSize@8
 // LIBRARY: ISLE 0x406630
 // _MemPoolSetBlockSizeFS@8
 ```
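The function markers above all share one shape: a `//` comment holding the marker type, the module name and a hexadecimal address. As a rough illustration only (this is not the parser in `isledecomp`; the regex and the names `Marker`, `parse_marker` and `is_ascending` are hypothetical), the markers in one file could be read and checked for ascending address order like this:

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical pattern for illustration; the real parser may accept more variants.
MARKER_RE = re.compile(
    r"^\s*//\s*(FUNCTION|STUB|TEMPLATE|SYNTHETIC|LIBRARY):\s*(\w+)\s+0x([0-9a-fA-F]+)\s*$"
)


class Marker(NamedTuple):
    kind: str
    module: str
    address: int


def parse_marker(line: str) -> Optional[Marker]:
    """Return the marker on this line, or None if the line is not a marker."""
    m = MARKER_RE.match(line)
    if m is None:
        return None
    return Marker(kind=m.group(1), module=m.group(2), address=int(m.group(3), 16))


def is_ascending(markers: List[Marker]) -> bool:
    """True if addresses strictly increase from one marker to the next."""
    return all(a.address < b.address for a, b in zip(markers, markers[1:]))


lines = [
    "// LIBRARY: ISLE 0x4061b0",
    "// _MemPoolInit@4",
    "// LIBRARY: ISLE 0x406520",
    "// _MemPoolSetPageSize@8",
]
markers = [m for m in map(parse_marker, lines) if m is not None]
print(markers[0].kind, markers[0].module, hex(markers[0].address))
print(is_ascending(markers))
```

Lines that carry only a symbol name, such as `// _MemPoolInit@4`, are not markers themselves and are skipped by this sketch.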
 ## Virtual tables
 Classes with a virtual table should be annotated using the `VTABLE` marker, which includes the module name and address of the virtual table. Additionally, virtual function declarations should be annotated with a comment indicating their relative offset. Please use the following example as a reference.
 ```
 // VTABLE: LEGO1 0x100dc900
 class MxEventManager : public MxMediaManager {
 public:
 	MxEventManager();
 	virtual ~MxEventManager() override;
 	virtual void Destroy() override;                                     // vtable+0x18
 	virtual MxResult Create(MxU32 p_frequencyMS, MxBool p_createThread); // vtable+0x28
 ```
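The `vtable+0x..` comments are byte offsets into the virtual table. Assuming 4-byte function pointers (the addresses in this document suggest a 32-bit target), the offset is the slot index times four, so `vtable+0x18` would be slot 6 and `vtable+0x28` slot 10. A small sketch of that arithmetic, with hypothetical helper names:

```python
POINTER_SIZE = 4  # assumption: 32-bit target, so each vtable entry is 4 bytes


def vtable_offset(slot: int) -> int:
    """Byte offset of the given zero-based vtable slot."""
    return slot * POINTER_SIZE


def vtable_slot(offset: int) -> int:
    """Zero-based slot index for a byte offset such as 0x18."""
    if offset < 0 or offset % POINTER_SIZE != 0:
        raise ValueError(f"0x{offset:x} is not a valid vtable offset")
    return offset // POINTER_SIZE


print(vtable_slot(0x18), vtable_slot(0x28))
print(f"vtable+0x{vtable_offset(3):02x}")
```

An offset that is not a multiple of the pointer size is most likely a typo in the annotation, which is why the sketch rejects it.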
 ## Class size (**WIP**)
 Classes should be annotated using the `SIZE` marker to indicate their size. If you are unsure about the class size in the original binary, please use the currently available information (known member variables) and detail the circumstances in an extra comment if necessary.
 ```
 // SIZE 0x1c
 class MxCriticalSection {
 public:
 	MxCriticalSection();
 	~MxCriticalSection();
 	static void SetDoMutex();
 ```
 ## Member variables (**WIP**)
 Member variables should be annotated with their relative offsets.
 ```
 class MxDSObject : public MxCore {
 private:
 	MxU32 m_sizeOnDisk;   // 0x8
 	MxU16 m_type;         // 0xc
 	char* m_sourceName;   // 0x10
 	undefined4 m_unk0x14; // 0x14
 ```
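Member offsets follow from the sizes and alignment of the preceding members. The sketch below mirrors the example with `ctypes`, using fixed-width integers in place of pointers so that the layout does not depend on the host platform. The 8-byte `MxCore` base (a vtable pointer plus one 4-byte field) is an assumption made for illustration, as are the class name `MxDSObjectLayout` and the field names `vtable` and `base_field`.

```python
import ctypes


class MxDSObjectLayout(ctypes.Structure):
    # Pointers are modelled as c_uint32 (assumption: 32-bit target).
    _fields_ = [
        ("vtable", ctypes.c_uint32),        # assumed MxCore base, 0x0
        ("base_field", ctypes.c_uint32),    # assumed MxCore base, 0x4
        ("m_sizeOnDisk", ctypes.c_uint32),  # 0x8
        ("m_type", ctypes.c_uint16),        # 0xc
        ("m_sourceName", ctypes.c_uint32),  # 0x10 (a char* on the target)
        ("m_unk0x14", ctypes.c_uint32),     # 0x14
    ]


# Print each field with the offset that ctypes computed for it.
for name, _ in MxDSObjectLayout._fields_:
    offset = getattr(MxDSObjectLayout, name).offset
    print(f"{name:14} 0x{offset:x}")
```

The two bytes of padding after the 16-bit `m_type` are what place `m_sourceName` at `0x10` rather than `0xe`.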
 ## Global variables
 Global variables should be annotated using the `GLOBAL` marker, which includes the module name and address of the variable.
 ```
 // GLOBAL: LEGO1 0x100f456c
 MxAtomId* g_jukeboxScript = NULL;
 // GLOBAL: LEGO1 0x100f4570
 MxAtomId* g_pz5Script = NULL;
 // GLOBAL: LEGO1 0x100f4574
 MxAtomId* g_introScript = NULL;
 ```
 ## Strings
 String values should be annotated using the `STRING` marker, which includes the module name and address of the string.
 ```
 inline virtual const char* ClassName() const override // vtable+0x0c
 {
 	// STRING: LEGO1 0x100f03fc
 	return "Act2PoliceStation";
 }
 ```
 # Tooling
 Use `pip` to install the required packages to be able to use the Python tools found in this folder:
 ```
 pip install -r tools/requirements.txt
 ```
 * [`decomplint`](/tools/decomplint): Checks the decompilation annotations (see above)
 * [`isledecomp`](/tools/isledecomp): A library that implements a parser to identify the decompilation annotations (see above)
 * [`ncc`](/tools/ncc): Checks naming conventions based on a set of rules
 * [`reccmp`](/tools/reccmp): Compares an original binary with a recompiled binary, provided a PDB file
 * [`roadmap`](/tools/roadmap): Compares symbol locations in an original binary with the same symbol locations of a recompiled binary
 * [`verexp`](/tools/verexp): Verifies exports by comparing the exports of the original DLL and the recompiled DLL
 * [`vtable`](/tools/vtable): Asserts virtual table correctness by comparing a recompiled binary with the original
 * [`datacmp.py`](/tools/datacmp.py): Compares global data found in the original with the recompiled version 
 * [`patch_c2.py`](/tools/patch_c2.py): Patches `C2.EXE` (part of MSVC 4.20) to get rid of a bugged warning
 ## Testing
 `isledecomp` comes with a suite of tests. Install `pytest` and run it, passing in the directory:
 ```
 pip install pytest
 pytest tools/isledecomp/tests/
 ```
 ## Development
 In order to keep the code clean and consistent, we use `pylint` and `black`:
 `pip install black pylint`
 ### Run pylint (ignores build and virtualenv)
 `pylint tools/ --ignore=build,bin,lib`
 ### Check code formatting without rewriting files
 `black --check tools/`
 ### Apply code formatting
 `black tools/`
--- a/tools/datacmp.py
+++ b/tools/datacmp.py
@@ -1,361 +0,0 @@
 # (New) Data comparison.
 import os
 import argparse
 import logging
 from enum import Enum
 from typing import Iterable, List, NamedTuple, Optional, Tuple
 from struct import unpack
 from isledecomp.compare import Compare as IsleCompare
 from isledecomp.compare.db import MatchInfo
 from isledecomp.cvdump import Cvdump
 from isledecomp.cvdump.types import (
    CvdumpKeyError,
    CvdumpIntegrityError,
 )
 from isledecomp.bin import Bin as IsleBin
 import colorama
 colorama.just_fix_windows_console()
 # Ignore all compare-db messages.
 logging.getLogger("isledecomp.compare").addHandler(logging.NullHandler())
 def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Comparing data values.")
    parser.add_argument(
        "original", metavar="original-binary", help="The original binary"
    )
    parser.add_argument(
        "recompiled", metavar="recompiled-binary", help="The recompiled binary"
    )
    parser.add_argument(
        "pdb", metavar="recompiled-pdb", help="The PDB of the recompiled binary"
    )
    parser.add_argument(
        "decomp_dir", metavar="decomp-dir", help="The decompiled source tree"
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action=argparse.BooleanOptionalAction,
        default=False,
         help="Also show the compared values that match",
    )
    parser.add_argument(
        "--no-color", "-n", action="store_true", help="Do not color the output"
    )
    parser.add_argument(
        "--all",
        "-a",
        dest="show_all",
        action="store_true",
         help="Show all variables, including those that match",
    )
    parser.add_argument(
        "--print-rec-addr",
        action="store_true",
         help="Print addresses of recompiled variables too",
    )
    (args, _) = parser.parse_known_args()
    if not os.path.isfile(args.original):
        parser.error(f"Original binary {args.original} does not exist")
    if not os.path.isfile(args.recompiled):
        parser.error(f"Recompiled binary {args.recompiled} does not exist")
    if not os.path.isfile(args.pdb):
        parser.error(f"Symbols PDB {args.pdb} does not exist")
    if not os.path.isdir(args.decomp_dir):
        parser.error(f"Source directory {args.decomp_dir} does not exist")
    return args
 class CompareResult(Enum):
    MATCH = 1
    DIFF = 2
    ERROR = 3
    WARN = 4
 class ComparedOffset(NamedTuple):
    offset: int
    # name is None for scalar types
    name: Optional[str]
    match: bool
    values: Tuple[str, str]
 class ComparisonItem(NamedTuple):
    """Each variable that was compared"""
    orig_addr: int
    recomp_addr: int
    name: str
    # The list of items that were compared.
    # For a complex type, these are the members.
    # For a scalar type, this is a list of size one.
    # If we could not retrieve type information, this is
    # a list of size one but without any specific type.
    compared: List[ComparedOffset]
    # If present, the error message from the types parser.
    error: Optional[str] = None
    # If true, there is no type specified for this variable. (i.e. non-public)
    # In this case, we can only compare the raw bytes.
    # This is different from the situation where a type id _is_ given, but
    # we could not retrieve it for some reason. (This is an error.)
    raw_only: bool = False
    @property
    def result(self) -> CompareResult:
        if self.error is not None:
            return CompareResult.ERROR
        if all(c.match for c in self.compared):
            return CompareResult.MATCH
        # Prefer WARN for a diff without complete type information.
        return CompareResult.WARN if self.raw_only else CompareResult.DIFF
 def create_comparison_item(
    var: MatchInfo,
    compared: Optional[List[ComparedOffset]] = None,
    error: Optional[str] = None,
    raw_only: bool = False,
 ) -> ComparisonItem:
    """Helper to create the ComparisonItem from the fields in MatchInfo."""
    if compared is None:
        compared = []
    return ComparisonItem(
        orig_addr=var.orig_addr,
        recomp_addr=var.recomp_addr,
        name=var.name,
        compared=compared,
        error=error,
        raw_only=raw_only,
    )
 def do_the_comparison(args: argparse.Namespace) -> Iterable[ComparisonItem]:
    """Run through each variable in our compare DB, then do the comparison
    according to the variable's type. Emit the result."""
    with IsleBin(args.original, find_str=True) as origfile, IsleBin(
        args.recompiled
    ) as recompfile:
        isle_compare = IsleCompare(origfile, recompfile, args.pdb, args.decomp_dir)
        # TODO: We don't currently retain the type information of each variable
        # in our compare DB. To get those, we build this mini-lookup table that
        # maps recomp addresses to their type.
        # We still need to build the full compare DB though, because we may
        # need the matched symbols to compare pointers (e.g. on strings)
        mini_cvdump = Cvdump(args.pdb).globals().types().run()
        recomp_type_reference = {
            recompfile.get_abs_addr(g.section, g.offset): g.type
            for g in mini_cvdump.globals
            if recompfile.is_valid_section(g.section)
        }
        for var in isle_compare.get_variables():
            type_name = recomp_type_reference.get(var.recomp_addr)
            # Start by assuming we can only compare the raw bytes
            data_size = var.size
            is_type_aware = type_name is not None
            if is_type_aware:
                try:
                    # If we are type-aware, we can get the precise
                    # data size for the variable.
                    data_type = mini_cvdump.types.get(type_name)
                    data_size = data_type.size
                except (CvdumpKeyError, CvdumpIntegrityError) as ex:
                    yield create_comparison_item(var, error=repr(ex))
                    continue
            orig_raw = origfile.read(var.orig_addr, data_size)
            recomp_raw = recompfile.read(var.recomp_addr, data_size)
            # If either read exceeded the raw data size for the section,
            # assume the entire variable is uninitialized.
            # TODO: This is not correct, strictly speaking. However,
            # it is probably impossible for a variable to exceed
            # the virtual size of the section, so all that is left is
            # the uninitialized data.
            # If the variable falls at the end of the section like this,
            # it is highly likely to be uninitialized.
            if orig_raw is not None and len(orig_raw) < data_size:
                orig_raw = None
            if recomp_raw is not None and len(recomp_raw) < data_size:
                recomp_raw = None
            # If both variables are uninitialized, we consider them equal.
            # Otherwise, this is a diff but there is nothing to compare.
            if orig_raw is None or recomp_raw is None:
                match = orig_raw is None and recomp_raw is None
                orig_value = "(uninitialized)" if orig_raw is None else "(initialized)"
                recomp_value = (
                    "(uninitialized)" if recomp_raw is None else "(initialized)"
                )
                yield create_comparison_item(
                    var,
                    compared=[
                        ComparedOffset(
                            offset=0,
                            name=None,
                            match=match,
                            values=(orig_value, recomp_value),
                        )
                    ],
                )
                continue
            if not is_type_aware:
                # If there is no specific type information available
                # (i.e. if this is a static or non-public variable)
                # then we can only compare the raw bytes.
                yield create_comparison_item(
                    var,
                    compared=[
                        ComparedOffset(
                            offset=0,
                            name="(raw)",
                            match=orig_raw == recomp_raw,
                            values=(orig_raw, recomp_raw),
                        )
                    ],
                    raw_only=True,
                )
                continue
            # If we are here, we can do the type-aware comparison.
            compared = []
            compare_items = mini_cvdump.types.get_scalars_gapless(type_name)
            format_str = mini_cvdump.types.get_format_string(type_name)
            orig_data = unpack(format_str, orig_raw)
            recomp_data = unpack(format_str, recomp_raw)
            def pointer_display(addr: int, is_orig: bool) -> str:
                """Helper to streamline pointer textual display."""
                if addr == 0:
                    return "nullptr"
                ptr_match = (
                    isle_compare.get_by_orig(addr)
                    if is_orig
                    else isle_compare.get_by_recomp(addr)
                )
                if ptr_match is not None:
                    return f"Pointer to {ptr_match.match_name()}"
                # This variable did not match if we do not have
                # the pointer target in our DB.
                return f"Unknown pointer 0x{addr:x}"
            # Could zip here
            for i, member in enumerate(compare_items):
                if member.is_pointer:
                    match = isle_compare.is_pointer_match(orig_data[i], recomp_data[i])
                    value_a = pointer_display(orig_data[i], True)
                    value_b = pointer_display(recomp_data[i], False)
                    values = (value_a, value_b)
                else:
                    match = orig_data[i] == recomp_data[i]
                    values = (orig_data[i], recomp_data[i])
                compared.append(
                    ComparedOffset(
                        offset=member.offset,
                        name=member.name,
                        match=match,
                        values=values,
                    )
                )
            yield create_comparison_item(var, compared=compared)
 def value_get(value: Optional[str], default: str):
    return value if value is not None else default
 def main():
    args = parse_args()
    def display_match(result: CompareResult) -> str:
        """Helper to return color string or not, depending on user preference"""
        if args.no_color:
            return result.name
        match_color = (
            colorama.Fore.GREEN
            if result == CompareResult.MATCH
            else (
                colorama.Fore.YELLOW
                if result == CompareResult.WARN
                else colorama.Fore.RED
            )
        )
        return f"{match_color}{result.name}{colorama.Style.RESET_ALL}"
    var_count = 0
    problems = 0
    for item in do_the_comparison(args):
        var_count += 1
        if item.result in (CompareResult.DIFF, CompareResult.ERROR):
            problems += 1
        if not args.show_all and item.result == CompareResult.MATCH:
            continue
        address_display = (
            f"0x{item.orig_addr:x} / 0x{item.recomp_addr:x}"
            if args.print_rec_addr
            else f"0x{item.orig_addr:x}"
        )
        print(f"{item.name[:80]} ({address_display}) ... {display_match(item.result)} ")
        if item.error is not None:
            print(f"  {item.error}")
        for c in item.compared:
            if not args.verbose and c.match:
                continue
            (value_a, value_b) = c.values
            if c.match:
                print(f"  {c.offset:5} {value_get(c.name, '(value)'):30} {value_a}")
            else:
                print(
                    f"  {c.offset:5} {value_get(c.name, '(value)'):30} {value_a} : {value_b}"
                )
        if args.verbose:
            print()
    print(
        f"{os.path.basename(args.original)} - Variables: {var_count}. Issues: {problems}"
    )
    return 0 if problems == 0 else 1
 if __name__ == "__main__":
    raise SystemExit(main())
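The type-aware path in `do_the_comparison` unpacks both byte strings with one `struct` format string and compares the results member by member. A stripped-down sketch of that idea follows; the function `compare_members`, the format string, the member names and the sample bytes are all made up for illustration, whereas the real code obtains the format and members from the PDB through `isledecomp`.

```python
from struct import calcsize, unpack
from typing import List, Tuple


def compare_members(
    fmt: str, names: List[str], orig_raw: bytes, recomp_raw: bytes
) -> List[Tuple[str, bool, Tuple[int, int]]]:
    """Unpack both buffers with the same format and compare each member."""
    orig = unpack(fmt, orig_raw)
    recomp = unpack(fmt, recomp_raw)
    return [(name, a == b, (a, b)) for name, a, b in zip(names, orig, recomp)]


fmt = "<IHxxI"  # hypothetical layout: u32, u16, 2 padding bytes, u32
names = ["m_size", "m_type", "m_flags"]
orig_raw = bytes.fromhex("10000000" "0200" "0000" "01000000")
recomp_raw = bytes.fromhex("10000000" "0300" "0000" "01000000")

# Both buffers must be exactly as long as the format describes.
assert calcsize(fmt) == len(orig_raw) == len(recomp_raw)

for name, match, (a, b) in compare_members(fmt, names, orig_raw, recomp_raw):
    print(f"{name:10} {'MATCH' if match else 'DIFF'} {a} : {b}")
```

Padding bytes (`x` in the format) produce no value, so only the named members take part in the comparison.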
--- a/tools/decomplint/decomplint.py
+++ b/tools/decomplint/decomplint.py
@@ -1,103 +0,0 @@
 #!/usr/bin/env python3
 import os
 import sys
 import argparse
 import colorama
 from isledecomp.dir import walk_source_dir, is_file_cpp
 from isledecomp.parser import DecompLinter
 colorama.just_fix_windows_console()
 def display_errors(alerts, filename):
    sorted_alerts = sorted(alerts, key=lambda a: a.line_number)
    for alert in sorted_alerts:
        error_type = (
            f"{colorama.Fore.RED}error: "
            if alert.is_error()
            else f"{colorama.Fore.YELLOW}warning: "
        )
        components = [
            colorama.Fore.LIGHTWHITE_EX,
            filename,
            ":",
            str(alert.line_number),
            " : ",
            error_type,
            colorama.Fore.LIGHTWHITE_EX,
            alert.code.name.lower(),
        ]
        print("".join(components))
        if alert.line is not None:
            print(f"{colorama.Fore.WHITE}  {alert.line}")
 def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(
        description="Syntax checking and linting for decomp annotation markers."
    )
    p.add_argument("target", help="The file or directory to check.")
    p.add_argument(
        "--module",
        required=False,
        type=str,
        help="If present, run targeted checks for markers from the given module.",
    )
    p.add_argument(
        "--warnfail",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Fail if syntax warnings are found.",
    )
    (args, _) = p.parse_known_args()
    return args
 def process_files(files, module=None):
    warning_count = 0
    error_count = 0
    linter = DecompLinter()
    for filename in files:
        success = linter.check_file(filename, module)
        warnings = [a for a in linter.alerts if a.is_warning()]
        errors = [a for a in linter.alerts if a.is_error()]
        error_count += len(errors)
        warning_count += len(warnings)
        if not success:
            display_errors(linter.alerts, filename)
            print()
    return (warning_count, error_count)
 def main():
    args = parse_args()
    files_to_check = []
    if os.path.isdir(args.target):
        files_to_check = list(walk_source_dir(args.target))
    elif os.path.isfile(args.target) and is_file_cpp(args.target):
        files_to_check = [args.target]
    else:
        sys.exit("Invalid target")
    (warning_count, error_count) = process_files(files_to_check, module=args.module)
    print(colorama.Style.RESET_ALL, end="")
    would_fail = error_count > 0 or (warning_count > 0 and args.warnfail)
    if would_fail:
        return 1
    return 0
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/tools/isledecomp/.gitignore
+++ b/tools/isledecomp/.gitignore
@@ -1,2 +0,0 @@
 isledecomp.egg-info/
 build
--- a/tools/isledecomp/isledecomp/__init__.py
+++ b/tools/isledecomp/isledecomp/__init__.py
@@ -1,4 +0,0 @@
 from .bin import *
 from .dir import *
 from .parser import *
 from .utils import *
--- a/tools/isledecomp/isledecomp/bin.py
+++ b/tools/isledecomp/isledecomp/bin.py
@@ -1,558 +0,0 @@
 import logging
 import struct
 import bisect
 from functools import cached_property
 from typing import Iterator, List, Optional, Tuple
 from dataclasses import dataclass
 from collections import namedtuple
 class MZHeaderNotFoundError(Exception):
    """MZ magic string not found at the start of the binary."""
 class PEHeaderNotFoundError(Exception):
    """PE magic string not found at the offset given in 0x3c."""
 class SectionNotFoundError(KeyError):
    """The specified section was not found in the file."""
 class InvalidVirtualAddressError(IndexError):
    """The given virtual address is too high or low
    to point to something in the binary file."""
 PEHeader = namedtuple(
    "PEHeader",
    [
        "Signature",
        "Machine",
        "NumberOfSections",
        "TimeDateStamp",
        "PointerToSymbolTable",  # deprecated
        "NumberOfSymbols",  # deprecated
        "SizeOfOptionalHeader",
        "Characteristics",
    ],
 )
 ImageSectionHeader = namedtuple(
    "ImageSectionHeader",
    [
        "name",
        "virtual_size",
        "virtual_address",
        "size_of_raw_data",
        "pointer_to_raw_data",
        "pointer_to_relocations",
        "pointer_to_line_numbers",
        "number_of_relocations",
        "number_of_line_numbers",
        "characteristics",
    ],
 )
 @dataclass
 class Section:
    name: str
    virtual_size: int
    virtual_address: int
    view: memoryview
    @cached_property
    def size_of_raw_data(self) -> int:
        return len(self.view)
    @cached_property
    def extent(self):
        """Get the highest possible offset of this section"""
        return max(self.size_of_raw_data, self.virtual_size)
    def match_name(self, name: str) -> bool:
        return self.name == name
    def contains_vaddr(self, vaddr: int) -> bool:
        return self.virtual_address <= vaddr < self.virtual_address + self.extent
    def read_virtual(self, vaddr: int, size: int) -> memoryview:
        ofs = vaddr - self.virtual_address
        # Negative index will read from the end, which we don't want
        if ofs < 0:
            raise InvalidVirtualAddressError
        try:
            return self.view[ofs : ofs + size]
        except IndexError as ex:
            raise InvalidVirtualAddressError from ex
    def addr_is_uninitialized(self, vaddr: int) -> bool:
        """We cannot rely on the IMAGE_SCN_CNT_UNINITIALIZED_DATA flag (0x80) in
        the characteristics field so instead we determine it this way."""
        if not self.contains_vaddr(vaddr):
            return False
        # Should include the case where size_of_raw_data == 0,
        # meaning the entire section is uninitialized
        return (self.virtual_size > self.size_of_raw_data) and (
            vaddr - self.virtual_address >= self.size_of_raw_data
        )
 logger = logging.getLogger(__name__)
 class Bin:
    """Parses a PE format EXE and allows reading data from a virtual address.
    Reference: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format"""
    # pylint: disable=too-many-instance-attributes
    def __init__(self, filename: str, find_str: bool = False) -> None:
        logger.debug('Parsing headers of "%s"... ', filename)
        self.filename = filename
        self.view: memoryview = None
        self.imagebase = None
        self.entry = None
        self.sections: List[Section] = []
        self._section_vaddr: List[int] = []
        self.find_str = find_str
        self._potential_strings = {}
        self._relocations = set()
        self._relocated_addrs = set()
        self.imports = []
        self.thunks = []
        self.exports: List[Tuple[int, str]] = []
        self.is_debug: bool = False
    def __enter__(self):
        logger.debug("Bin %s Enter", self.filename)
        with open(self.filename, "rb") as f:
            self.view = memoryview(f.read())
        (mz_str,) = struct.unpack("2s", self.view[0:2])
        if mz_str != b"MZ":
            raise MZHeaderNotFoundError
        # Skip to PE header offset in MZ header.
        (pe_header_start,) = struct.unpack("<I", self.view[0x3C:0x40])
        # PE header offset is absolute, so seek there
        pe_header_view = self.view[pe_header_start:]
        pe_hdr = PEHeader(*struct.unpack("<2s2x2H3I2H", pe_header_view[:0x18]))
        if pe_hdr.Signature != b"PE":
            raise PEHeaderNotFoundError
        optional_hdr = pe_header_view[0x18:]
        (self.imagebase,) = struct.unpack("<i", optional_hdr[0x1C:0x20])
        (entry,) = struct.unpack("<i", optional_hdr[0x10:0x14])
        self.entry = entry + self.imagebase
        (number_of_rva,) = struct.unpack("<i", optional_hdr[0x5C:0x60])
        data_dictionaries = [
            *struct.iter_unpack("<2I", optional_hdr[0x60 : 0x60 + number_of_rva * 8])
        ]
        # Check for presence of .debug subsection in .rdata
        try:
            if data_dictionaries[6][0] != 0:
                self.is_debug = True
        except IndexError:
            pass
        headers_view = optional_hdr[
            pe_hdr.SizeOfOptionalHeader : pe_hdr.SizeOfOptionalHeader
            + 0x28 * pe_hdr.NumberOfSections
        ]
        section_headers = [
            ImageSectionHeader(*h) for h in struct.iter_unpack("<8s6I2HI", headers_view)
        ]
        self.sections = [
            Section(
                name=hdr.name.decode("ascii").rstrip("\x00"),
                virtual_address=self.imagebase + hdr.virtual_address,
                virtual_size=hdr.virtual_size,
                view=self.view[
                    hdr.pointer_to_raw_data : hdr.pointer_to_raw_data
                    + hdr.size_of_raw_data
                ],
            )
            for hdr in section_headers
        ]
        # bisect does not support key on the github CI version of python
        self._section_vaddr = [section.virtual_address for section in self.sections]
        self._populate_relocations()
        self._populate_imports()
        self._populate_thunks()
        # Export dir is always first
        self._populate_exports(*data_dictionaries[0])
         # This is a (semi) expensive lookup that is not necessary in every case.
        # We can find strings in the original if we have coverage using STRING markers.
        # For the recomp, we can find strings using the PDB.
        if self.find_str:
            self._prepare_string_search()
        logger.debug("... Parsing finished")
        return self
    def __exit__(self, exc_type, exc_value, exc_traceback):
        logger.debug("Bin %s Exit", self.filename)
        self.view.release()
    def get_relocated_addresses(self) -> List[int]:
        return sorted(self._relocated_addrs)
     def find_string(self, target: bytes) -> Optional[int]:
        # Pad with null terminator to make sure we don't
        # match on a subset of the full string
        if not target.endswith(b"\x00"):
            target += b"\x00"
        c = target[0]
        if c not in self._potential_strings:
            return None
        for addr in self._potential_strings[c]:
            if target == self.read(addr, len(target)):
                return addr
        return None
    def is_relocated_addr(self, vaddr) -> bool:
        return vaddr in self._relocated_addrs
    def _prepare_string_search(self):
         """We are interested in deduplicated string constants found in the
        .rdata and .data sections. For each relocated address in these sections,
        read the first byte and save the address if that byte is an ASCII character.
        When we search for an arbitrary string later, we can narrow down the list
        of potential locations by a lot."""
        def is_ascii(b):
            return b" " <= b < b"\x7f"
        sect_data = self.get_section_by_name(".data")
        sect_rdata = self.get_section_by_name(".rdata")
        potentials = filter(
            lambda a: sect_data.contains_vaddr(a) or sect_rdata.contains_vaddr(a),
            self.get_relocated_addresses(),
        )
        for addr in potentials:
            c = self.read(addr, 1)
            if c is not None and is_ascii(c):
                k = ord(c)
                if k not in self._potential_strings:
                    self._potential_strings[k] = set()
                self._potential_strings[k].add(addr)
    def _populate_relocations(self):
        """The relocation table in .reloc gives each virtual address where the next four
        bytes are, itself, another virtual address. During loading, these values will be
        patched according to the virtual address space for the image, as provided by Windows.
        We can use this information to get a list of where each significant "thing"
        in the file is located. Anything that is referenced absolutely (i.e. excluding
        jump destinations given by local offset) will be here.
        One use case is to tell whether an immediate value in an operand represents
        a virtual address or just a big number."""
        reloc = self.get_section_by_name(".reloc").view
        ofs = 0
        reloc_addrs = []
        # Parse the structure in .reloc to get the list of locations to check.
        # The first 8 bytes are 2 dwords that give the base page address
        # and the total block size (including this header).
        # The page address is used to compact the list; each entry is only
        # 2 bytes, and these are added to the base to get the full location.
        # If the entry read in is zero, we are at the end of this section and
        # these are padding bytes.
        while ofs + 8 <= len(reloc):
            (page_base, block_size) = struct.unpack("<2I", reloc[ofs : ofs + 8])
            if block_size == 0:
                break
            # HACK: ignore the relocation type for now (the top 4 bits of the value).
            values = list(struct.iter_unpack("<H", reloc[ofs + 8 : ofs + block_size]))
            reloc_addrs += [
                self.imagebase + page_base + (v[0] & 0xFFF) for v in values if v[0] != 0
            ]
            ofs += block_size
        # We are now interested in the relocated addresses themselves. Seek to the
        # address where there is a relocation, then read the four bytes into our set.
        reloc_addrs.sort()
        self._relocations = set(reloc_addrs)
        for section_id, offset in map(self.get_relative_addr, reloc_addrs):
            section = self.get_section_by_index(section_id)
            (relocated_addr,) = struct.unpack("<I", section.view[offset : offset + 4])
            self._relocated_addrs.add(relocated_addr)
    def find_float_consts(self) -> Iterator[Tuple[int, int, float]]:
        """Floating point instructions that refer to a memory address can
        point to constant values. Search the code sections to find FP
        instructions and check whether the pointer address refers to
        read-only data."""
        # TODO: Should check any section that has code, not just .text
        text = self.get_section_by_name(".text")
        rdata = self.get_section_by_name(".rdata")
        # These are the addresses where a relocation occurs.
        # Meaning: it points to an absolute address of something
        for addr in self._relocations:
            if not text.contains_vaddr(addr):
                continue
            # Read the two bytes before the relocated address.
            # We will check against possible float opcodes
            raw = text.read_virtual(addr - 2, 6)
            (opcode, opcode_ext, const_addr) = struct.unpack("<BBL", raw)
            # Skip right away if this is not const data
            if not rdata.contains_vaddr(const_addr):
                continue
            if opcode_ext in (0x5, 0xD, 0x15, 0x1D, 0x25, 0x2D, 0x35, 0x3D):
                if opcode in (0xD8, 0xD9):
                    # dword ptr -- single precision
                    (float_value,) = struct.unpack("<f", self.read(const_addr, 4))
                    yield (const_addr, 4, float_value)
                elif opcode in (0xDC, 0xDD):
                    # qword ptr -- double precision
                    (float_value,) = struct.unpack("<d", self.read(const_addr, 8))
                    yield (const_addr, 8, float_value)
    def _populate_imports(self):
        """Parse .idata to find imported DLLs and their functions."""
        idata_ofs = self.get_section_offset_by_name(".idata")
        def iter_image_import():
            ofs = idata_ofs
            while True:
                # Read 5 dwords until all are zero.
                image_import_descriptor = struct.unpack("<5I", self.read(ofs, 20))
                ofs += 20
                if all(x == 0 for x in image_import_descriptor):
                    break
                (rva_ilt, _, __, dll_name, rva_iat) = image_import_descriptor
                # Convert relative virtual addresses into absolute
                yield (
                    self.imagebase + rva_ilt,
                    self.imagebase + dll_name,
                    self.imagebase + rva_iat,
                )
        image_import_descriptors = list(iter_image_import())
        def iter_imports():
            # ILT = Import Lookup Table
            # IAT = Import Address Table
            # ILT gives us the symbol name of the import.
            # IAT gives the address. The compiler generated a thunk function
            # that jumps to the value of this address.
            for start_ilt, dll_addr, start_iat in image_import_descriptors:
                dll_name = self.read_string(dll_addr).decode("ascii")
                ofs_ilt = start_ilt
                # Address of "__imp__*" symbols.
                ofs_iat = start_iat
                while True:
                    (lookup_addr,) = struct.unpack("<L", self.read(ofs_ilt, 4))
                    (import_addr,) = struct.unpack("<L", self.read(ofs_iat, 4))
                    if lookup_addr == 0 or import_addr == 0:
                        break
                    # MSB set if this is an ordinal import
                    if lookup_addr & 0x80000000 != 0:
                        ordinal_num = lookup_addr & 0xFFFF
                        symbol_name = f"Ordinal_{ordinal_num}"
                    else:
                        # Skip the "Hint" field, 2 bytes
                        name_ofs = lookup_addr + self.imagebase + 2
                        symbol_name = self.read_string(name_ofs).decode("ascii")
                    yield (dll_name, symbol_name, ofs_iat)
                    ofs_ilt += 4
                    ofs_iat += 4
        self.imports = list(iter_imports())
    def _populate_thunks(self):
        """For each imported function, we generate a thunk function. The only
        instruction in the function is a jmp to the address in .idata.
        Search .text to find these functions."""
        text_sect = self.get_section_by_name(".text")
        text_start = text_sect.virtual_address
        # If this is a debug build, read the thunks at the start of .text
        # Terminated by a big block of 0xcc padding bytes before the first
        # real function in the section.
        if self.is_debug:
            ofs = 0
            while True:
                (opcode, operand) = struct.unpack("<Bi", text_sect.view[ofs : ofs + 5])
                if opcode != 0xE9:
                    break
                thunk_ofs = text_start + ofs
                jmp_ofs = text_start + ofs + 5 + operand
                self.thunks.append((thunk_ofs, jmp_ofs))
                ofs += 5
        # Now check for import thunks which are present in debug and release.
        # These use an absolute JMP with the 2 byte opcode: 0xff 0x25
        idata_sect = self.get_section_by_name(".idata")
        ofs = text_start
        for shift in (0, 2, 4):
            window = text_sect.view[shift:]
            win_end = 6 * (len(window) // 6)
            for i, (b0, b1, jmp_ofs) in enumerate(
                struct.iter_unpack("<2BL", window[:win_end])
            ):
                if (b0, b1) == (0xFF, 0x25) and idata_sect.contains_vaddr(jmp_ofs):
                    # Record the address of the jmp instruction and the destination in .idata
                    thunk_ofs = ofs + shift + i * 6
                    self.thunks.append((thunk_ofs, jmp_ofs))
    def _populate_exports(self, export_rva: int, _: int):
        """If you are missing a lot of annotations in your file
        (e.g. debug builds) then you can at least match up the
        export symbol names."""
        # Null = no exports
        if export_rva == 0:
            return
        export_start = self.imagebase + export_rva
        # TODO: namedtuple
        export_table = struct.unpack("<2L2H7L", self.read(export_start, 40))
        # TODO: if the number of functions doesn't match the number of names,
        # are the remaining functions ordinals?
        n_functions = export_table[6]
        func_start = export_start + 40
        func_addrs = [
            self.imagebase + rva
            for rva, in struct.iter_unpack("<L", self.read(func_start, 4 * n_functions))
        ]
        name_start = func_start + 4 * n_functions
        name_addrs = [
            self.imagebase + rva
            for rva, in struct.iter_unpack("<L", self.read(name_start, 4 * n_functions))
        ]
        combined = zip(func_addrs, name_addrs)
        self.exports = [
            (func_addr, self.read_string(name_addr))
            for (func_addr, name_addr) in combined
        ]
    def get_section_by_name(self, name: str) -> Section:
        section = next(
            filter(lambda section: section.match_name(name), self.sections),
            None,
        )
        if section is None:
            raise SectionNotFoundError
        return section
    def get_section_by_index(self, index: int) -> Section:
        """Convert 1-based index into 0-based."""
        return self.sections[index - 1]
    def get_section_extent_by_index(self, index: int) -> int:
        return self.get_section_by_index(index).extent
    def get_section_offset_by_index(self, index: int) -> int:
        """The symbols output from cvdump gives addresses in this format: AAAA.BBBBBBBB
        where A is the index (1-based) into the section table and B is the local offset.
        This will return the virtual address for the start of the section at the given index
        so you can get the virtual address for whatever symbol you are looking at.
        """
        return self.get_section_by_index(index).virtual_address
    def get_section_offset_by_name(self, name: str) -> int:
        """Same as above, but use the section name as the lookup"""
        section = self.get_section_by_name(name)
        return section.virtual_address
    def get_abs_addr(self, section: int, offset: int) -> int:
        """Convenience function for converting section:offset pairs from cvdump
        into an absolute vaddr."""
        return self.get_section_offset_by_index(section) + offset
    def get_relative_addr(self, addr: int) -> Tuple[int, int]:
        """Convert an absolute address back into a (section, offset) pair."""
        i = bisect.bisect_right(self._section_vaddr, addr) - 1
        i = max(0, i)
        section = self.sections[i]
        if section.contains_vaddr(addr):
            return (i + 1, addr - section.virtual_address)
        raise InvalidVirtualAddressError(f"{self.filename} : {hex(addr)}")
    def is_valid_section(self, section_id: int) -> bool:
        """The PDB will refer to sections that are not listed in the headers,
        so we should ignore these references."""
        try:
            _ = self.get_section_by_index(section_id)
            return True
        except IndexError:
            return False
    def is_valid_vaddr(self, vaddr: int) -> bool:
        """Does this virtual address point to anything in the exe?"""
        try:
            (_, __) = self.get_relative_addr(vaddr)
        except InvalidVirtualAddressError:
            return False
        return True
    def read_string(self, offset: int, chunk_size: int = 1000) -> Optional[bytes]:
        """Read until we find a zero byte."""
        b = self.read(offset, chunk_size)
        if b is None:
            return None
        try:
            return b[: b.index(b"\x00")]
        except ValueError:
            # No terminator found, just return what we have
            return b
    def read(self, vaddr: int, size: int) -> Optional[bytes]:
        """Read (at most) the given number of bytes at the given virtual address.
        If we return None, the given address points to uninitialized data."""
        (section_id, offset) = self.get_relative_addr(vaddr)
        section = self.sections[section_id - 1]
        if section.addr_is_uninitialized(vaddr):
            return None
        # Clamp the read within the extent of the current section.
        # Reading off the end will most likely misrepresent the virtual addressing.
        _size = min(size, section.size_of_raw_data - offset)
        return bytes(section.view[offset : offset + _size])
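The block layout described in `_populate_relocations` above can be tried out in isolation. The following is a minimal sketch, not the project's implementation: `parse_reloc_blocks` and the synthetic blob are hypothetical names and data made up for illustration, and the loop stops when fewer than 8 header bytes remain instead of relying on trailing padding.

```python
import struct
from typing import List


def parse_reloc_blocks(reloc: bytes, imagebase: int) -> List[int]:
    """Return the sorted virtual addresses named by a .reloc-style blob.

    Hypothetical helper for illustration only."""
    addrs = []
    ofs = 0
    # Each block starts with two dwords: page base and total block size
    # (header included). Stop on a zero size or when no header fits.
    while ofs + 8 <= len(reloc):
        page_base, block_size = struct.unpack_from("<2I", reloc, ofs)
        if block_size == 0:
            break
        for (entry,) in struct.iter_unpack("<H", reloc[ofs + 8 : ofs + block_size]):
            # A zero entry is padding. The top 4 bits hold the relocation
            # type, which this sketch ignores, as the original does.
            if entry != 0:
                addrs.append(imagebase + page_base + (entry & 0xFFF))
        ofs += block_size
    return sorted(addrs)


# Synthetic data: one block for page 0x1000 with two entries and one
# padding entry, followed by an all-zero terminator header.
entries = [0x3004, 0x3010, 0x0000]
blob = struct.pack("<2I", 0x1000, 8 + 2 * len(entries))
blob += struct.pack("<3H", *entries)
blob += struct.pack("<2I", 0, 0)

print([hex(a) for a in parse_reloc_blocks(blob, 0x10000000)])
```

The returned values are the locations where a relocation occurs. The original code then reads the four bytes at each location to collect the relocated addresses themselves.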
--- a/tools/isledecomp/isledecomp/compare/__init__.py
+++ b/tools/isledecomp/isledecomp/compare/__init__.py
@ -1 +0,0 @@
 from .core import Compare
--- a/tools/isledecomp/isledecomp/compare/asm/__init__.py
+++ b/tools/isledecomp/isledecomp/compare/asm/__init__.py
@ -1,2 +0,0 @@
 from .parse import ParseAsm
 from .swap import can_resolve_register_differences
--- a/tools/isledecomp/isledecomp/compare/asm/const.py
+++ b/tools/isledecomp/isledecomp/compare/asm/const.py
@ -1,27 +0,0 @@
 # Duplicates removed, according to the mnemonics capstone uses.
 # e.g. je and jz are the same instruction. capstone uses je.
 # See: /arch/X86/X86GenAsmWriter.inc in the capstone repo.
 JUMP_MNEMONICS = {
    "ja",
    "jae",
    "jb",
    "jbe",
    "jcxz",  # unused?
    "je",
    "jecxz",
    "jg",
    "jge",
    "jl",
    "jle",
    "jmp",
    "jne",
    "jno",
    "jnp",
    "jns",
    "jo",
    "jp",
    "js",
 }
 # Guaranteed to be a single operand.
 SINGLE_OPERAND_INSTS = {"push", "call", *JUMP_MNEMONICS}
--- a/tools/isledecomp/isledecomp/compare/asm/fixes.py
+++ b/tools/isledecomp/isledecomp/compare/asm/fixes.py
@ -1,302 +0,0 @@
 import re
 from typing import List, Tuple, Set
 DiffOpcode = Tuple[str, int, int, int, int]
 REG_FIND = re.compile(r"(?: |\[)(e?[a-d]x|e?[sd]i|[a-d][lh]|e?[bs]p)")
 ALLOWED_JUMP_SWAPS = (
    ("ja", "jb"),
    ("jae", "jbe"),
    ("jb", "ja"),
    ("jbe", "jae"),
    ("jg", "jl"),
    ("jge", "jle"),
    ("jl", "jg"),
    ("jle", "jge"),
    ("je", "je"),
    ("jne", "jne"),
 )
 def jump_swap_ok(a: str, b: str) -> bool:
    """For the instructions a,b, are they both jump instructions
    that are compatible with a swapped cmp operand order?"""
    # Grab the mnemonic
    (jmp_a, _, __) = a.partition(" ")
    (jmp_b, _, __) = b.partition(" ")
    return (jmp_a, jmp_b) in ALLOWED_JUMP_SWAPS
 def is_operand_swap(a: str, b: str) -> bool:
    """This is a hack to avoid parsing the operands. It's not as simple as
    breaking on the comma because templates or string literals interfere
    with this. Instead we check:
        1. Do both strings use the exact same set of characters?
        2. If we do break on ', ', is the first token of each different?
    2 is needed to catch an edge case like:
        cmp eax, dword ptr [ecx + 0x1234]
        cmp ecx, dword ptr [eax + 0x1234]
    """
    return a.partition(", ")[0] != b.partition(", ")[0] and sorted(a) == sorted(b)
 def can_cmp_swap(orig: List[str], recomp: List[str]) -> bool:
    # Make sure we have 1 cmp and 1 jmp for both
    if len(orig) != 2 or len(recomp) != 2:
        return False
    if not orig[0].startswith("cmp") or not recomp[0].startswith("cmp"):
        return False
    if not orig[1].startswith("j") or not recomp[1].startswith("j"):
        return False
    # Checking two things:
    # Are the cmp operands flipped?
    # Is the jump instruction compatible with a flip?
    return is_operand_swap(orig[0], recomp[0]) and jump_swap_ok(orig[1], recomp[1])
 def patch_jump(a: str, b: str) -> str:
    """For jump instructions a, b, return `(mnemonic_a) (operand_b)`.
    The reason to do it this way (instead of just returning `a`) is that
    the jump instructions might use different displacement offsets
    or labels. If we just replace `b` with `a`, this diff would be
    incorrectly eliminated."""
    (mnemonic_a, _, __) = a.partition(" ")
    (_, __, operand_b) = b.partition(" ")
    return mnemonic_a + " " + operand_b
 def patch_cmp_swaps(
    codes: List[DiffOpcode], orig_asm: List[str], recomp_asm: List[str]
 ) -> Set[int]:
    """Can we resolve the diffs between orig and recomp by patching
    swapped cmp instructions?
    For example:
        cmp eax, ebx            cmp ebx, eax
        je .label               je .label
        cmp eax, ebx            cmp ebx, eax
        ja .label               jb .label
    """
    fixed_lines = set()
    for code, i1, i2, j1, j2 in codes:
        # To save us the trouble of finding "compatible" cmp instructions
        # use the diff information we already have.
        if code != "replace":
            continue
        # If the ranges in orig and recomp are not equal, use the shorter one
        for i, j in zip(range(i1, i2), range(j1, j2)):
            if can_cmp_swap(orig_asm[i : i + 2], recomp_asm[j : j + 2]):
                # Patch cmp
                fixed_lines.add(j)
                # Patch the jump if necessary
                patched = patch_jump(orig_asm[i + 1], recomp_asm[j + 1])
                # We only register a fix if it actually matches
                if orig_asm[i + 1] == patched:
                    fixed_lines.add(j + 1)
    return fixed_lines
 def effective_match_possible(orig_asm: List[str], recomp_asm: List[str]) -> bool:
    # We can only declare an effective match based on the text
    # so you need the same amount of "stuff" in each
    if len(orig_asm) != len(recomp_asm):
        return False
    # mnemonic_orig = [inst.partition(" ")[0] for inst in orig_asm]
    # mnemonic_recomp = [inst.partition(" ")[0] for inst in recomp_asm]
    # Cannot change mnemonics. Must be same starting list
    # TODO: Fine idea but this will exclude jump swaps for cmp operand order
    # if sorted(mnemonic_orig) != sorted(mnemonic_recomp):
    #    return False
    return True
 def find_regs_used(inst: str) -> List[str]:
    return REG_FIND.findall(inst)
 def find_regs_changed(a: str, b: str) -> List[Tuple[str, str]]:
    """For instructions a, b, return the pairs of registers that were used.
    This is not a very precise way to compare the instructions, so it depends
    on the input being two instructions that would match *except* for
    the register choice."""
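    # Materialize the pairs: callers test membership more than once,
    # and a bare zip iterator would be consumed by the first test.
    return list(zip(REG_FIND.findall(a), REG_FIND.findall(b)))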
 def bad_register_swaps(
    swaps: Set[int], orig_asm: List[str], recomp_asm: List[str]
 ) -> Set[int]:
    """The list of recomp indices in `swaps` tells which instructions are
    a match for orig except for the registers used. From that list, check
    whether a register swap should not be allowed.
    For now, this means checking for `push` instructions where the register
    was not used in any other register swaps on previous instructions."""
    rejects = set()
    # For each `push` instruction where we have excused the diff
    pushes = [j for j in swaps if recomp_asm[j].startswith("push")]
    for j in pushes:
        okay = False
        # Get the operands in each
        reg = (orig_asm[j].partition(" ")[2], recomp_asm[j].partition(" ")[2])
        # If this isn't a register at all, ignore it
        try:
            int(reg[0], 16)
            continue
        except ValueError:
            pass
        # For every other excused diff that is *not* a push:
        # Assumes same index in orig as in recomp, but so does our naive match
        for k in swaps.difference(pushes):
            changed_regs = find_regs_changed(orig_asm[k], recomp_asm[k])
            if reg in changed_regs or reg[::-1] in changed_regs:
                okay = True
                break
        if not okay:
            rejects.add(j)
    return rejects
 # Instructions that result in a change to the first operand
 MODIFIER_INSTRUCTIONS = ("adc", "add", "lea", "mov", "neg", "sbb", "sub", "pop", "xor")
 def instruction_alters_regs(inst: str, regs: Set[str]) -> bool:
    (mnemonic, _, op_str) = inst.partition(" ")
    (first_operand, _, __) = op_str.partition(", ")
    return (mnemonic in MODIFIER_INSTRUCTIONS and first_operand in regs) or (
        mnemonic == "call" and "eax" in regs
    )
 def relocate_instructions(
    codes: List[DiffOpcode], orig_asm: List[str], recomp_asm: List[str]
 ) -> Set[int]:
    """Collect the list of instructions deleted from orig and inserted
    into recomp, according to the diff opcodes. Using this list, match up
    any pairs of instructions that we assume to be relocated and return
    the indices in recomp where this has occurred.
    For now, we are checking only for an exact match on the instruction.
    We are not checking whether the given instruction can be moved from
    point A to B. (i.e. does this set a register that is used by the
    instructions between A and B?)"""
    deletes = {
        i for code, i1, i2, _, __ in codes for i in range(i1, i2) if code == "delete"
    }
    inserts = [
        j for code, _, __, j1, j2 in codes for j in range(j1, j2) if code == "insert"
    ]
    relocated = set()
    for j in inserts:
        line = recomp_asm[j]
        recomp_regs_used = set(find_regs_used(line))
        for i in deletes:
            # Check for exact match.
            # TODO: This will grab the first instruction that matches.
            # We should probably use the nearest index instead, if it matters
            if orig_asm[i] == line:
                # To account for a move in either direction
                reloc_start = min(i, j)
                reloc_end = max(i, j)
                if not any(
                    instruction_alters_regs(orig_asm[k], recomp_regs_used)
                    for k in range(reloc_start, reloc_end)
                ):
                    relocated.add(j)
                    deletes.remove(i)
                    break
    return relocated
 DWORD_REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp")
 WORD_REGS = ("ax", "bx", "cx", "dx", "si", "di", "bp", "sp")
 BYTE_REGS = ("ah", "al", "bh", "bl", "ch", "cl", "dh", "dl")
 def naive_register_replacement(orig_asm: List[str], recomp_asm: List[str]) -> Set[int]:
    """Replace all registers of the same size with a placeholder string.
    After doing that, compare orig and recomp again.
    Return indices from recomp that are now equal to the same index in orig.
    This requires orig and recomp to have the same number of instructions,
    but this is already a requirement for effective match."""
    orig_raw = "\n".join(orig_asm)
    recomp_raw = "\n".join(recomp_asm)
    # TODO: hardly the most elegant way to do this.
    for rdw in DWORD_REGS:
        orig_raw = orig_raw.replace(rdw, "~reg4")
        recomp_raw = recomp_raw.replace(rdw, "~reg4")
    for rw in WORD_REGS:
        orig_raw = orig_raw.replace(rw, "~reg2")
        recomp_raw = recomp_raw.replace(rw, "~reg2")
    for rb in BYTE_REGS:
        orig_raw = orig_raw.replace(rb, "~reg1")
        recomp_raw = recomp_raw.replace(rb, "~reg1")
    orig_scrubbed = orig_raw.split("\n")
    recomp_scrubbed = recomp_raw.split("\n")
    return {
        j for j in range(len(recomp_scrubbed)) if orig_scrubbed[j] == recomp_scrubbed[j]
    }
 def find_effective_match(
    codes: List[DiffOpcode], orig_asm: List[str], recomp_asm: List[str]
 ) -> bool:
    """Check whether the two sequences of instructions are an effective match.
    Meaning: do they differ only by instruction order or register selection?"""
    if not effective_match_possible(orig_asm, recomp_asm):
        return False
    already_equal = {
        j for code, _, __, j1, j2 in codes for j in range(j1, j2) if code == "equal"
    }
    # We need to come up with some answer for each of these lines
    recomp_lines_disputed = {
        j
        for code, _, __, j1, j2 in codes
        for j in range(j1, j2)
        if code in ("insert", "replace")
    }
    cmp_swaps = patch_cmp_swaps(codes, orig_asm, recomp_asm)
    # This naive result includes lines that already match, so remove those
    naive_swaps = naive_register_replacement(orig_asm, recomp_asm).difference(
        already_equal
    )
    relocates = relocate_instructions(codes, orig_asm, recomp_asm)
    bad_swaps = bad_register_swaps(naive_swaps, orig_asm, recomp_asm)
    corrections = set().union(
        naive_swaps.difference(bad_swaps),
        cmp_swaps,
        relocates,
    )
    return corrections.issuperset(recomp_lines_disputed)
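The register scrubbing done by `naive_register_replacement` above could also be expressed with a single regular expression. This is a sketch of an alternative technique, not the code this commit removes: `scrub` and `register_insensitive_matches` are hypothetical names, and the word-boundary matching is an assumption that capstone-style operand text separates registers with spaces, commas or brackets.

```python
import re
from typing import List, Set

# Register names grouped by size in bytes (same groups as the original).
REG_SIZES = {
    4: ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"),
    2: ("ax", "bx", "cx", "dx", "si", "di", "bp", "sp"),
    1: ("ah", "al", "bh", "bl", "ch", "cl", "dh", "dl"),
}
PLACEHOLDERS = {
    reg: f"~reg{size}" for size, regs in REG_SIZES.items() for reg in regs
}
# Word boundaries keep "al" from matching inside "call", for example.
REG_RE = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def scrub(inst: str) -> str:
    """Replace each register name with a placeholder for its size."""
    return REG_RE.sub(lambda m: PLACEHOLDERS[m.group(1)], inst)


def register_insensitive_matches(orig: List[str], recomp: List[str]) -> Set[int]:
    """Indices where orig and recomp are equal once registers are scrubbed."""
    return {
        j for j, (o, r) in enumerate(zip(orig, recomp)) if scrub(o) == scrub(r)
    }


orig = ["mov eax, dword ptr [ecx + 4]", "push eax", "call 0x1000"]
recomp = ["mov edx, dword ptr [ecx + 4]", "push edx", "call 0x2000"]
print(register_insensitive_matches(orig, recomp))
```

As in the original, the result includes lines that were already identical, so a caller would still subtract the indices the diff reported as equal.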
--- a/tools/isledecomp/isledecomp/compare/asm/instgen.py
+++ b/tools/isledecomp/isledecomp/compare/asm/instgen.py
@ -1,235 +0,0 @@
 """Pre-parser for x86 instructions. Will identify data/jump tables used with
 switch statements and local jump/call destinations."""
 import re
 import bisect
 import struct
 from enum import Enum, auto
 from collections import namedtuple
 from typing import List, NamedTuple, Optional, Tuple, Union
 from capstone import Cs, CS_ARCH_X86, CS_MODE_32
 from .const import JUMP_MNEMONICS
 disassembler = Cs(CS_ARCH_X86, CS_MODE_32)
 DisasmLiteInst = namedtuple("DisasmLiteInst", "address, size, mnemonic, op_str")
 displacement_regex = re.compile(r".*\+ (0x[0-9a-f]+)\]")
 class SectionType(Enum):
    CODE = auto()
    DATA_TAB = auto()
    ADDR_TAB = auto()
 class FuncSection(NamedTuple):
    type: SectionType
    contents: List[Union[DisasmLiteInst, Tuple[str, int]]]
 class InstructGen:
    # pylint: disable=too-many-instance-attributes
    def __init__(self, blob: bytes, start: int) -> None:
        self.blob = blob
        self.start = start
        self.end = len(blob) + start
        self.section_end: int = self.end
        self.code_tracks: List[List[DisasmLiteInst]] = []
        # Todo: Could be refactored later
        self.cur_addr: int = 0
        self.cur_section_type: SectionType = SectionType.CODE
        self.section_start = start
        self.sections: List[FuncSection] = []
        self.confirmed_addrs = {}
        self.analysis()
    def _finish_section(self, type_: SectionType, stuff):
        sect = FuncSection(type_, stuff)
        self.sections.append(sect)
    def _insert_confirmed_addr(self, addr: int, type_: SectionType):
        # Ignore address outside the bounds of the function
        if not self.start <= addr < self.end:
            return
        self.confirmed_addrs[addr] = type_
        # This newly inserted address might signal the end of this section.
        # For example, a jump table at the end of the function means we should
        # stop reading instructions once we hit that address.
        # However, if there is a jump table in between code sections, we might
        # read a jump to an address back to the beginning of the function
        # (e.g. a loop that spans the entire function)
        # so ignore this address because we have already passed it.
        if type_ != self.cur_section_type and addr > self.cur_addr:
            self.section_end = min(self.section_end, addr)
    def _next_section(self, addr: int) -> Optional[SectionType]:
        """We have reached the start of a new section. Tell what kind of
        data we are looking at (code or other) and how much we should read."""
        # Assume the start of every function is code.
        if addr == self.start:
            self.section_end = self.end
            return SectionType.CODE
        # The start of a new section must be an address that we've seen.
        new_type = self.confirmed_addrs.get(addr)
        if new_type is None:
            return None
        self.cur_section_type = new_type
        # The confirmed addrs dict is sorted by insertion order
        # i.e. the order in which we read the addresses
        # So we have to sort and then find the next item
        # to see where this section should end.
        # If we are in a CODE section, ignore contiguous CODE addresses.
        # These are not the start of a new section.
        # However: if we are not in CODE, any upcoming address is a new section.
        # Do this so we can detect contiguous non-CODE sections.
        confirmed = [
            conf_addr
            for (conf_addr, conf_type) in sorted(self.confirmed_addrs.items())
            if self.cur_section_type != SectionType.CODE
            or conf_type != self.cur_section_type
        ]
        index = bisect.bisect_right(confirmed, addr)
        if index < len(confirmed):
            self.section_end = confirmed[index]
        else:
            self.section_end = self.end
        return new_type
    def _get_code_for(self, addr: int) -> List[DisasmLiteInst]:
        """Start disassembling at the given address."""
        # If we are reading a code block beyond the first, see if we already
        # have disassembled instructions beginning at the specified address.
        # For a CODE/ADDR/CODE function, we might get lucky and produce the
        # correct instruction after the jump table's junk instructions.
        for track in self.code_tracks:
            for i, inst in enumerate(track):
                if inst.address == addr:
                    return track[i:]
        # If we are here, we don't have the instructions.
        # Todo: Could try to be clever here and disassemble only
        # as much as we probably need (i.e. if a jump table is between CODE
        # blocks, there are probably only a few bad instructions after the
        # jump table is finished. We could disassemble up to the next verified
        # code address and stitch it together)
        blob_cropped = self.blob[addr - self.start :]
        instructions = [
            DisasmLiteInst(*inst)
            for inst in disassembler.disasm_lite(blob_cropped, addr)
        ]
        self.code_tracks.append(instructions)
        return instructions
    def _handle_jump(self, inst: DisasmLiteInst):
        # If this is a regular jump and its destination is within the
        # bounds of the binary data (i.e. presumed function size)
        # add it to our list of confirmed addresses.
        if inst.op_str[0] == "0":
            value = int(inst.op_str, 16)
            self._insert_confirmed_addr(value, SectionType.CODE)
        # If this is jumping into a table of addresses, save the destination
        elif (match := displacement_regex.match(inst.op_str)) is not None:
            value = int(match.group(1), 16)
            self._insert_confirmed_addr(value, SectionType.ADDR_TAB)
    def analysis(self):
        self.cur_addr = self.start
        while (sect_type := self._next_section(self.cur_addr)) is not None:
            self.section_start = self.cur_addr
            if sect_type == SectionType.CODE:
                instructions = self._get_code_for(self.cur_addr)
                # If we didn't get any instructions back, something is wrong.
                # i.e. We can only read part of the full instruction that is up next.
                if len(instructions) == 0:
                    # Nudge the current addr so we will eventually move on to the
                    # next section.
                    # Todo: Maybe we could just call it quits here
                    self.cur_addr += 1
                    break
                for inst in instructions:
                    # section_end is updated as we read instructions.
                    # If we are into a jump/data table and would read
                    # a junk instruction, stop here.
                    if self.cur_addr >= self.section_end:
                        break
                    # print(f"{inst.address:x} : {inst.mnemonic} {inst.op_str}")
                    if inst.mnemonic in JUMP_MNEMONICS:
                        self._handle_jump(inst)
                        # Todo: log calls too (unwind section)
                    elif inst.mnemonic == "mov":
                        # Todo: maintain pairing of data/jump tables
                        if (match := displacement_regex.match(inst.op_str)) is not None:
                            value = int(match.group(1), 16)
                            self._insert_confirmed_addr(value, SectionType.DATA_TAB)
                    # Do this instead of copying instruction address.
                    # If there is only one instruction, we would get stuck here.
                    self.cur_addr += inst.size
                # End of for loop on instructions.
                # We are at the end of the section or the entire function.
                # Cut out only the valid instructions for this section
                # and save it for later.
                # Todo: don't need to iter on every instruction here.
                # They are already in order.
                instruction_slice = [
                    inst for inst in instructions if inst.address < self.section_end
                ]
                self._finish_section(SectionType.CODE, instruction_slice)
            elif sect_type == SectionType.ADDR_TAB:
                # Clamp to multiple of 4 (dwords)
                read_size = ((self.section_end - self.cur_addr) // 4) * 4
                offsets = range(self.section_start, self.section_start + read_size, 4)
                dwords = self.blob[
                    self.cur_addr - self.start : self.cur_addr - self.start + read_size
                ]
                addrs = [addr for (addr,) in struct.iter_unpack("<L", dwords)]
                for addr in addrs:
                    # Todo: the fact that these are jump table destinations
                    # should factor into the label name.
                    self._insert_confirmed_addr(addr, SectionType.CODE)
                jump_table = list(zip(offsets, addrs))
                # for (t0,t1) in jump_table:
                #     print(f"{t0:x} : --> {t1:x}")
                self._finish_section(SectionType.ADDR_TAB, jump_table)
                self.cur_addr = self.section_end
            else:
                # Todo: variable data size?
                read_size = self.section_end - self.cur_addr
                offsets = range(self.section_start, self.section_start + read_size)
                bytes_ = self.blob[
                    self.cur_addr - self.start : self.cur_addr - self.start + read_size
                ]
                data = [b for (b,) in struct.iter_unpack("<B", bytes_)]
                data_table = list(zip(offsets, data))
                # for (t0,t1) in data_table:
                #     print(f"{t0:x} : value {t1:02x}")
                self._finish_section(SectionType.DATA_TAB, data_table)
                self.cur_addr = self.section_end
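Review note on the `ADDR_TAB` branch above: a minimal, standalone sketch of how a jump table is decoded, assuming the table is a run of little-endian dwords. The helper name `read_jump_table` and the sample addresses are hypothetical and are not part of the removed module.

```python
import struct


def read_jump_table(blob: bytes, table_start: int):
    """Pair each dword's address with the little-endian value stored there."""
    # Clamp to a multiple of 4 so iter_unpack never sees a partial dword.
    read_size = (len(blob) // 4) * 4
    offsets = range(table_start, table_start + read_size, 4)
    addrs = [addr for (addr,) in struct.iter_unpack("<L", blob[:read_size])]
    return list(zip(offsets, addrs))


# Two complete entries followed by two stray bytes that are dropped by the clamp.
blob = struct.pack("<LL", 0x10001000, 0x10001020) + b"\xcc\xcc"
table = read_jump_table(blob, 0x10002000)
for offset, dest in table:
    print(f"{offset:x} : --> {dest:x}")
```

Each destination found this way is what the real code feeds back in as a confirmed `CODE` address, which is how the analysis learns where the instructions after a table resume.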
--- a/tools/isledecomp/isledecomp/compare/asm/parse.py
+++ b/tools/isledecomp/isledecomp/compare/asm/parse.py
@ -1,243 +0,0 @@
 """Converts x86 machine code into text (i.e. assembly). The end goal is to
 compare the code in the original and recomp binaries, using longest common
 subsequence (LCS), i.e. difflib.SequenceMatcher.
 The capstone library takes the raw bytes and gives us the mnemonic
 and operand(s) for each instruction. We need to "sanitize" the text further
 so that virtual addresses are replaced by a symbol name or a generic
 placeholder string."""
 import re
 import struct
 from functools import cache
 from typing import Callable, List, Optional, Tuple
 from collections import namedtuple
 from .const import JUMP_MNEMONICS, SINGLE_OPERAND_INSTS
 from .instgen import InstructGen, SectionType
 ptr_replace_regex = re.compile(r"\[(0x[0-9a-f]+)\]")
 displace_replace_regex = re.compile(r"\+ (0x[0-9a-f]+)\]")
 # For matching an immediate value on its own.
 # Preceded by start-of-string (first operand) or comma-space (second operand)
 immediate_replace_regex = re.compile(r"(?:^|, )(0x[0-9a-f]+)")
 DisasmLiteInst = namedtuple("DisasmLiteInst", "address, size, mnemonic, op_str")
 @cache
 def from_hex(string: str) -> Optional[int]:
    try:
        return int(string, 16)
    except ValueError:
        pass
    return None
 def bytes_to_dword(b: bytes) -> Optional[int]:
    if len(b) == 4:
        return struct.unpack("<L", b)[0]
    return None
 class ParseAsm:
    def __init__(
        self,
        relocate_lookup: Optional[Callable[[int], bool]] = None,
        name_lookup: Optional[Callable[[int, bool], Optional[str]]] = None,
        bin_lookup: Optional[Callable[[int, int], Optional[bytes]]] = None,
    ) -> None:
        self.relocate_lookup = relocate_lookup
        self.name_lookup = name_lookup
        self.bin_lookup = bin_lookup
        self.replacements = {}
        self.number_placeholders = True
    def reset(self):
        self.replacements = {}
    def is_relocated(self, addr: int) -> bool:
        if callable(self.relocate_lookup):
            return self.relocate_lookup(addr)
        return False
    def lookup(
        self, addr: int, use_cache: bool = True, exact: bool = False
    ) -> Optional[str]:
        """Return a replacement name for this address if we find one."""
        if use_cache and (cached := self.replacements.get(addr, None)) is not None:
            return cached
        if callable(self.name_lookup):
            if (name := self.name_lookup(addr, exact)) is not None:
                if use_cache:
                    self.replacements[addr] = name
                return name
        return None
    def replace(self, addr: int) -> str:
        """Same function as lookup above, but here we return a placeholder
        if there is no better name to use."""
        if (name := self.lookup(addr)) is not None:
            return name
        # The placeholder number corresponds to the number of addresses we have
        # already replaced. This is so the number will be consistent across the diff
        # if we can replace some symbols with actual names in recomp but not orig.
        idx = len(self.replacements) + 1
        placeholder = f"<OFFSET{idx}>" if self.number_placeholders else "<OFFSET>"
        self.replacements[addr] = placeholder
        return placeholder
    def hex_replace_always(self, match: re.Match) -> str:
        """If a pointer value was matched, always insert a placeholder"""
        value = int(match.group(1), 16)
        return match.group(0).replace(match.group(1), self.replace(value))
    def hex_replace_relocated(self, match: re.Match) -> str:
        """For replacing immediate value operands. We only want to
        use the placeholder if we are certain that this is a valid address.
        We can check the relocation table to find out."""
        value = int(match.group(1), 16)
        if self.is_relocated(value):
            return match.group(0).replace(match.group(1), self.replace(value))
        return match.group(0)
    def hex_replace_annotated(self, match: re.Match) -> str:
        """For replacing immediate value operands. Here we replace the value
        only if the name lookup returns something. Do not use a placeholder."""
        value = int(match.group(1), 16)
        placeholder = self.lookup(value, use_cache=False)
        if placeholder is not None:
            return match.group(0).replace(match.group(1), placeholder)
        return match.group(0)
    def hex_replace_indirect(self, match: re.Match) -> str:
        """Edge case for hex_replace_always. The context of the instruction
        tells us that the pointer value is an absolute indirect.
        So we go to that location in the binary to get the address.
        If we cannot identify the indirect address, fall back to a lookup
        on the original pointer value so we might display something useful."""
        value = int(match.group(1), 16)
        indirect_value = None
        if callable(self.bin_lookup):
            indirect_value = self.bin_lookup(value, 4)
        if indirect_value is not None:
            indirect_addr = bytes_to_dword(indirect_value)
            if (
                indirect_addr is not None
                and self.lookup(indirect_addr, use_cache=False) is not None
            ):
                return match.group(0).replace(
                    match.group(1), "->" + self.replace(indirect_addr)
                )
        return match.group(0).replace(match.group(1), self.replace(value))
    def sanitize(self, inst: DisasmLiteInst) -> Tuple[str, str]:
        # For jumps or calls, if the entire op_str is a hex number, the value
        # is a relative offset.
        # Otherwise (i.e. it looks like `dword ptr [address]`) it is an
        # absolute indirect that we will handle below.
        # Providing the starting address of the function to capstone.disasm has
        # automatically resolved relative offsets to an absolute address.
        # We will have to undo this for some of the jumps or they will not match.
        if (
            inst.mnemonic in SINGLE_OPERAND_INSTS
            and (op_str_address := from_hex(inst.op_str)) is not None
        ):
            if inst.mnemonic == "call":
                return (inst.mnemonic, self.replace(op_str_address))
            if inst.mnemonic == "push":
                if self.is_relocated(op_str_address):
                    return (inst.mnemonic, self.replace(op_str_address))
                # To avoid falling into jump handling
                return (inst.mnemonic, inst.op_str)
            if inst.mnemonic == "jmp":
                # The unwind section contains JMPs to other functions.
                # If we have a name for this address, use it. If not,
                # do not create a new placeholder. We will instead
                # fall through to generic jump handling below.
                potential_name = self.lookup(op_str_address, exact=True)
                if potential_name is not None:
                    return (inst.mnemonic, potential_name)
            # Else: this is any jump
            # Show the jump offset rather than the absolute address
            jump_displacement = op_str_address - (inst.address + inst.size)
            return (inst.mnemonic, hex(jump_displacement))
        if inst.mnemonic == "call":
            # Special handling for absolute indirect CALL.
            op_str = ptr_replace_regex.sub(self.hex_replace_indirect, inst.op_str)
        else:
            op_str = ptr_replace_regex.sub(self.hex_replace_always, inst.op_str)
            # We only want relocated addresses for pointer displacement.
            # i.e. ptr [register + something]
            # Otherwise we would use a placeholder for every stack variable,
            # vtable call, or this->member access.
            op_str = displace_replace_regex.sub(self.hex_replace_relocated, op_str)
        # In the event of pointer comparison, only replace the immediate value
        # if it is a known address.
        if inst.mnemonic == "cmp":
            op_str = immediate_replace_regex.sub(self.hex_replace_annotated, op_str)
        else:
            op_str = immediate_replace_regex.sub(self.hex_replace_relocated, op_str)
        return (inst.mnemonic, op_str)
    def parse_asm(
        self, data: bytes, start_addr: Optional[int] = 0
    ) -> List[Tuple[str, str]]:
        asm = []
        ig = InstructGen(data, start_addr)
        for sect_type, sect_contents in ig.sections:
            if sect_type == SectionType.CODE:
                for inst in sect_contents:
                    # Use heuristics to disregard some differences that aren't representative
                    # of the accuracy of a function (e.g. global offsets)
                    # If there is no pointer or immediate value in the op_str,
                    # there is nothing to sanitize.
                    # This leaves us with cases where a small immediate value or
                    # small displacement (this.member or vtable calls) appears.
                    # If we assume that instructions we want to sanitize need to be 5
                    # bytes -- 1 for the opcode and 4 for the address -- exclude cases
                    # where the hex value could not be an address.
                    # The exception is jumps which are as small as 2 bytes
                    # but are still useful to sanitize.
                    if "0x" in inst.op_str and (
                        inst.mnemonic in JUMP_MNEMONICS or inst.size > 4
                    ):
                        result = self.sanitize(inst)
                    else:
                        result = (inst.mnemonic, inst.op_str)
                    # mnemonic + " " + op_str
                    asm.append((hex(inst.address), " ".join(result)))
            elif sect_type == SectionType.ADDR_TAB:
                asm.append(("", "Jump table:"))
                for i, (ofs, _) in enumerate(sect_contents):
                    asm.append((hex(ofs), f"Jump_dest_{i}"))
            elif sect_type == SectionType.DATA_TAB:
                asm.append(("", "Data table:"))
                for ofs, b in sect_contents:
                    asm.append((hex(ofs), hex(b)))
        return asm
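Review note on the placeholder numbering in `ParseAsm.replace`: the sketch below is a simplified stand-in, not the removed class, and only handles the `[0x...]` pointer form. `PlaceholderSketch`, `sanitize_operand` and the sample addresses are hypothetical. It is meant to show why counting every replaced address (named or not) keeps `<OFFSETn>` numbers aligned between orig and recomp when only one side has a symbol name.

```python
import re

PTR_REGEX = re.compile(r"\[(0x[0-9a-f]+)\]")


class PlaceholderSketch:
    def __init__(self, names):
        self.names = names  # addr -> known symbol name (assumed input)
        self.replacements = {}  # addr -> text already substituted

    def replace(self, addr: int) -> str:
        if addr in self.replacements:
            return self.replacements[addr]
        name = self.names.get(addr)
        if name is None:
            # Number by how many addresses have been replaced so far,
            # whether they received a real name or a placeholder.
            name = f"<OFFSET{len(self.replacements) + 1}>"
        self.replacements[addr] = name
        return name

    def sanitize_operand(self, op_str: str) -> str:
        def sub(match: re.Match) -> str:
            value = int(match.group(1), 16)
            return match.group(0).replace(match.group(1), self.replace(value))

        return PTR_REGEX.sub(sub, op_str)


orig = PlaceholderSketch({})
recomp = PlaceholderSketch({0x2000: "g_counter"})

first_orig = orig.sanitize_operand("eax, dword ptr [0x1000]")
first_recomp = recomp.sanitize_operand("eax, dword ptr [0x2000]")
second_orig = orig.sanitize_operand("ecx, dword ptr [0x1004]")
second_recomp = recomp.sanitize_operand("ecx, dword ptr [0x2004]")
print(first_orig, "|", first_recomp)
print(second_orig, "|", second_recomp)
```

The first pair differs only because recomp knows a name; the second pair gets the same placeholder number on both sides, so that line does not show up as a spurious diff.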
--- a/tools/isledecomp/isledecomp/compare/asm/swap.py
+++ b/tools/isledecomp/isledecomp/compare/asm/swap.py
@ -1,80 +0,0 @@
 import re
 REGISTER_LIST = set(
    [
        "ax",
        "bp",
        "bx",
        "cx",
        "di",
        "dx",
        "eax",
        "ebp",
        "ebx",
        "ecx",
        "edi",
        "edx",
        "esi",
        "esp",
        "si",
        "sp",
    ]
 )
 WORDS = re.compile(r"\w+")
 def get_registers(line: str):
    to_replace = []
    # use words regex to find all matching positions:
    for match in WORDS.finditer(line):
        reg = match.group(0)
        if reg in REGISTER_LIST:
            to_replace.append((reg, match.start()))
    return to_replace
 def replace_register(
    lines: list[str], start_line: int, reg: str, replacement: str
 ) -> list[str]:
    return [
        line.replace(reg, replacement) if i >= start_line else line
        for i, line in enumerate(lines)
    ]
 # Is it possible to make new_asm the same as original_asm by swapping registers?
 def can_resolve_register_differences(original_asm, new_asm):
    # Split the ASM on spaces to get more granularity, and so
    # that we don't modify the original arrays passed in.
    original_asm = [part for line in original_asm for part in line.split()]
    new_asm = [part for line in new_asm for part in line.split()]
    # Swapping ain't gonna help if the lengths are different
    if len(original_asm) != len(new_asm):
        return False
    # Look for the mismatching lines
    for i, original_line in enumerate(original_asm):
        new_line = new_asm[i]
        if new_line != original_line:
            # Find all the registers to replace
            to_replace = get_registers(original_line)
            for replace in to_replace:
                (reg, reg_index) = replace
                replacing_reg = new_line[reg_index : reg_index + len(reg)]
                if replacing_reg in REGISTER_LIST:
                    if replacing_reg != reg:
                        # Do a three-way swap replacing in all the subsequent lines
                        temp_reg = "&" * len(reg)
                        new_asm = replace_register(new_asm, i, replacing_reg, temp_reg)
                        new_asm = replace_register(new_asm, i, reg, replacing_reg)
                        new_asm = replace_register(new_asm, i, temp_reg, reg)
                else:
                    # No replacement to do, different code, bail out
                    return False
    # Check if the lines are now the same
    for i, original_line in enumerate(original_asm):
        if new_asm[i] != original_line:
            return False
    return True
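Review note on the three-way swap used by `can_resolve_register_differences`: a minimal sketch of the same idea on a tokenized listing. The helper `swap_registers` and the sample tokens are hypothetical; the temporary marker mirrors the `"&" * len(reg)` trick above, which works because `&` is assumed never to appear in the assembly text.

```python
def swap_registers(tokens, start, reg_a, reg_b):
    """Exchange reg_a and reg_b in every token from index `start` onward."""
    temp = "&" * len(reg_a)  # marker assumed not to occur in real asm text

    def swap(token):
        # reg_a -> temp, reg_b -> reg_a, temp -> reg_b
        return token.replace(reg_a, temp).replace(reg_b, reg_a).replace(temp, reg_b)

    return [swap(tok) if i >= start else tok for i, tok in enumerate(tokens)]


recomp = ["mov", "ebx,", "ecx", "push", "ebx"]
swapped = swap_registers(recomp, 1, "ebx", "eax")
print(swapped)
```

Because `str.replace` works on substrings, swapping a 16-bit name such as `ax` would also touch `eax`; the sketch shares that limitation with the substring-based `replace_register` shown above.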
--- a/tools/isledecomp/isledecomp/compare/core.py
+++ b/tools/isledecomp/isledecomp/compare/core.py
@ -1,766 +0,0 @@
 import os
 import logging
 import difflib
 import struct
 import uuid
 from dataclasses import dataclass
 from typing import Callable, Iterable, List, Optional
 from isledecomp.bin import Bin as IsleBin, InvalidVirtualAddressError
 from isledecomp.cvdump.demangler import demangle_string_const
 from isledecomp.cvdump import Cvdump, CvdumpAnalysis
 from isledecomp.parser import DecompCodebase
 from isledecomp.dir import walk_source_dir
 from isledecomp.types import SymbolType
 from isledecomp.compare.asm import ParseAsm
 from isledecomp.compare.asm.fixes import find_effective_match
 from .db import CompareDb, MatchInfo
 from .diff import combined_diff
 from .lines import LinesDb
 logger = logging.getLogger(__name__)
 @dataclass
 class DiffReport:
    # pylint: disable=too-many-instance-attributes
    match_type: SymbolType
    orig_addr: int
    recomp_addr: int
    name: str
    udiff: Optional[List[str]] = None
    ratio: float = 0.0
    is_effective_match: bool = False
    is_stub: bool = False
    @property
    def effective_ratio(self) -> float:
        return 1.0 if self.is_effective_match else self.ratio
    def __str__(self) -> str:
        """For debug purposes. Proper diff printing (with coloring) is in another module."""
        return f"{self.name} (0x{self.orig_addr:x}) {self.ratio*100:.02f}%{'*' if self.is_effective_match else ''}"
 def create_reloc_lookup(bin_file: IsleBin) -> Callable[[int], bool]:
    """Function generator for relocation table lookup"""
    def lookup(addr: int) -> bool:
        return addr > bin_file.imagebase and bin_file.is_relocated_addr(addr)
    return lookup
 def create_bin_lookup(bin_file: IsleBin) -> Callable[[int, int], Optional[bytes]]:
    """Function generator for reading from the bin file"""
    def lookup(addr: int, size: int) -> Optional[bytes]:
        try:
            return bin_file.read(addr, size)
        except InvalidVirtualAddressError:
            return None
    return lookup
 class Compare:
    # pylint: disable=too-many-instance-attributes
    def __init__(
        self, orig_bin: IsleBin, recomp_bin: IsleBin, pdb_file: str, code_dir: str
    ):
        self.orig_bin = orig_bin
        self.recomp_bin = recomp_bin
        self.pdb_file = pdb_file
        self.code_dir = code_dir
        # Controls whether we dump the asm output to a file
        self.debug: bool = False
        self.runid: str = uuid.uuid4().hex[:8]
        self._lines_db = LinesDb(code_dir)
        self._db = CompareDb()
        self._load_cvdump()
        self._load_markers()
        self._find_original_strings()
        self._find_float_const()
        self._match_imports()
        self._match_exports()
        self._match_thunks()
        self._find_vtordisp()
    def _load_cvdump(self):
        logger.info("Parsing %s ...", self.pdb_file)
        cv = (
            Cvdump(self.pdb_file)
            .lines()
            .globals()
            .publics()
            .symbols()
            .section_contributions()
            .types()
            .run()
        )
        res = CvdumpAnalysis(cv)
        for sym in res.nodes:
            # The PDB might contain sections that do not line up with the
            # actual binary. The symbol "__except_list" is one example.
            # In these cases, just skip this symbol and move on because
            # we can't do much with it.
            if not self.recomp_bin.is_valid_section(sym.section):
                continue
            addr = self.recomp_bin.get_abs_addr(sym.section, sym.offset)
            # If this symbol is the final one in its section, we were not able to
            # estimate its size because we didn't have the total size of that section.
            # We can get this estimate now and assume that the final symbol occupies
            # the remainder of the section.
            if sym.estimated_size is None:
                sym.estimated_size = (
                    self.recomp_bin.get_section_extent_by_index(sym.section)
                    - sym.offset
                )
            if sym.node_type == SymbolType.STRING:
                string_info = demangle_string_const(sym.decorated_name)
                if string_info is None:
                    logger.debug(
                        "Could not demangle string symbol: %s", sym.decorated_name
                    )
                    continue
                # TODO: skip unicode for now. will need to handle these differently.
                if string_info.is_utf16:
                    continue
                raw = self.recomp_bin.read(addr, sym.size())
                try:
                    # We use the string length reported in the mangled symbol as the
                    # data size, but this is not always accurate with respect to the
                    # null terminator.
                    # e.g. ??_C@_0BA@EFDM@MxObjectFactory?$AA@
                    # reported length: 16 (includes null terminator)
                    # c.f. ??_C@_03DPKJ@enz?$AA@
                    # reported length: 3 (does NOT include terminator)
                    # This will handle the case where the entire string consists of
                    # "\x00" bytes, because that is distinct from the empty string of length 0.
                    decoded_string = raw.decode("latin1")
                    rstrip_string = decoded_string.rstrip("\x00")
                    if decoded_string != "" and rstrip_string != "":
                        sym.friendly_name = rstrip_string
                    else:
                        sym.friendly_name = decoded_string
                except UnicodeDecodeError:
                    pass
            self._db.set_recomp_symbol(
                addr, sym.node_type, sym.name(), sym.decorated_name, sym.size()
            )
        for (section, offset), (filename, line_no) in res.verified_lines.items():
            addr = self.recomp_bin.get_abs_addr(section, offset)
            self._lines_db.add_line(filename, line_no, addr)
        # The _entry symbol is referenced in the PE header so we get this match for free.
        self._db.set_function_pair(self.orig_bin.entry, self.recomp_bin.entry)
    def _load_markers(self):
        # Assume module name is the base filename of the original binary.
        (module, _) = os.path.splitext(os.path.basename(self.orig_bin.filename))
        codefiles = list(walk_source_dir(self.code_dir))
        codebase = DecompCodebase(codefiles, module.upper())
        def orig_bin_checker(addr: int) -> bool:
            return self.orig_bin.is_valid_vaddr(addr)
        # If the address of any annotation would cause an exception,
        # remove it and report an error.
        bad_annotations = codebase.prune_invalid_addrs(orig_bin_checker)
        for sym in bad_annotations:
            logger.error(
                "Invalid address 0x%x on %s annotation in file: %s",
                sym.offset,
                sym.type.name,
                sym.filename,
            )
        # Match lineref functions first because this is a guaranteed match.
        # If we have two functions that share the same name, and one is
        # a lineref, we can match the nameref correctly because the lineref
        # was already removed from consideration.
        for fun in codebase.iter_line_functions():
            recomp_addr = self._lines_db.search_line(fun.filename, fun.line_number)
            if recomp_addr is not None:
                self._db.set_function_pair(fun.offset, recomp_addr)
                if fun.should_skip():
                    self._db.mark_stub(fun.offset)
        for fun in codebase.iter_name_functions():
            self._db.match_function(fun.offset, fun.name)
            if fun.should_skip():
                self._db.mark_stub(fun.offset)
        for var in codebase.iter_variables():
            if var.is_static and var.parent_function is not None:
                self._db.match_static_variable(
                    var.offset, var.name, var.parent_function
                )
            else:
                self._db.match_variable(var.offset, var.name)
        for tbl in codebase.iter_vtables():
            self._db.match_vtable(tbl.offset, tbl.name, tbl.base_class)
        for string in codebase.iter_strings():
            # Not that we don't trust you, but we're checking the string
            # annotation to make sure it is accurate.
            try:
                # TODO: would presumably fail for wchar_t strings
                orig = self.orig_bin.read_string(string.offset).decode("latin1")
                string_correct = string.name == orig
            except UnicodeDecodeError:
                string_correct = False
            if not string_correct:
                logger.error(
                    "Data at 0x%x does not match string %s",
                    string.offset,
                    repr(string.name),
                )
                continue
            self._db.match_string(string.offset, string.name)
    def _find_original_strings(self):
        """Go to the original binary and look for the specified string constants
        to find a match. This is a (relatively) expensive operation so we only
        look at strings that we have not already matched via a STRING annotation."""
        for string in self._db.get_unmatched_strings():
            addr = self.orig_bin.find_string(string.encode("latin1"))
            if addr is None:
                escaped = repr(string)
                logger.debug("Failed to find this string in the original: %s", escaped)
                continue
            self._db.match_string(addr, string)
    def _find_float_const(self):
        """Add floating point constants in each binary to the database.
        We are not matching anything right now because these values are not
        deduped like strings."""
        for addr, size, float_value in self.orig_bin.find_float_consts():
            self._db.set_orig_symbol(addr, SymbolType.FLOAT, str(float_value), size)
        for addr, size, float_value in self.recomp_bin.find_float_consts():
            self._db.set_recomp_symbol(
                addr, SymbolType.FLOAT, str(float_value), None, size
            )
    def _match_imports(self):
        """We can match imported functions based on the DLL name and
        function symbol name."""
        orig_byaddr = {
            addr: (dll.upper(), name) for (dll, name, addr) in self.orig_bin.imports
        }
        recomp_byname = {
            (dll.upper(), name): addr for (dll, name, addr) in self.recomp_bin.imports
        }
        # Combine these two dictionaries. We don't care about imports from recomp
        # not found in orig because:
        # 1. They shouldn't be there
        # 2. They are already identified via cvdump
        orig_to_recomp = {
            addr: recomp_byname.get(pair, None) for addr, pair in orig_byaddr.items()
        }
        # Now that the IAT offsets from both binaries are matched up, we need
        # to make the connection between the thunk functions.
        # We already have the symbol name we need from the PDB.
        for orig, recomp in orig_to_recomp.items():
            if orig is None or recomp is None:
                continue
            # Match the __imp__ symbol
            self._db.set_pair(orig, recomp, SymbolType.POINTER)
            # Read the relative address from .idata
            try:
                (recomp_rva,) = struct.unpack("<L", self.recomp_bin.read(recomp, 4))
                (orig_rva,) = struct.unpack("<L", self.orig_bin.read(orig, 4))
            except (struct.error, ValueError):
                # Bail out if there's a problem with struct.unpack
                continue
            # Strictly speaking, this is a hack to support asm sanitize.
            # When calling an import, we will recognize that the address for the
            # CALL instruction is a pointer to the actual address, but this is
            # not only not the address of a function, it is not an address at all.
            # To make the asm display work correctly (i.e. to match what you see
            # in ghidra) create a function match on the RVA. This is not a valid
            # virtual address because it is before the imagebase, but it will
            # do what we need it to do in the sanitize function.
            (dll_name, func_name) = orig_byaddr[orig]
            fullname = dll_name + ":" + func_name
            self._db.set_recomp_symbol(
                recomp_rva, SymbolType.FUNCTION, fullname, None, 4
            )
            self._db.set_pair(orig_rva, recomp_rva, SymbolType.FUNCTION)
            self._db.skip_compare(orig_rva)
    def _match_thunks(self):
        """Thunks are (by nature) matched by indirection. If a thunk from orig
        points at a function we have already matched, we can find the matching
        thunk in recomp because it points to the same place."""
        # Turn this one inside out for easy lookup
        recomp_thunks = {
            func_addr: thunk_addr for (thunk_addr, func_addr) in self.recomp_bin.thunks
        }
        # Mark all recomp thunks first. This allows us to use their name
        # when we sanitize the asm.
        for recomp_thunk, recomp_addr in self.recomp_bin.thunks:
            recomp_func = self._db.get_by_recomp(recomp_addr)
            if recomp_func is None:
                continue
            self._db.create_recomp_thunk(recomp_thunk, recomp_func.name)
        for orig_thunk, orig_addr in self.orig_bin.thunks:
            orig_func = self._db.get_by_orig(orig_addr)
            if orig_func is None:
                continue
            # Check whether the thunk destination is a matched symbol
            recomp_thunk = recomp_thunks.get(orig_func.recomp_addr)
            if recomp_thunk is None:
                self._db.create_orig_thunk(orig_thunk, orig_func.name)
                continue
            self._db.set_function_pair(orig_thunk, recomp_thunk)
            # Don't compare thunk functions for now. The comparison isn't
            # "useful" in the usual sense. We are only looking at the
            # bytes of the jmp instruction and not the larger context of
            # where this function is. Also: these will always match 100%
            # because we are searching for a match to register this as a
            # function in the first place.
            self._db.skip_compare(orig_thunk)
    def _match_exports(self):
        # invert for name lookup
        orig_exports = {y: x for (x, y) in self.orig_bin.exports}
        for recomp_addr, export_name in self.recomp_bin.exports:
            orig_addr = orig_exports.get(export_name)
            if orig_addr is None:
                continue
            try:
                # Check whether either of the addresses is actually a thunk.
                # This is a quirk of the debug builds. Technically the export
                # *is* the thunk, but it's more helpful to mark the actual function.
                # It could be the case that only one side is a thunk, but we can
                # deal with that.
                (opcode, rel_addr) = struct.unpack(
                    "<Bl", self.recomp_bin.read(recomp_addr, 5)
                )
                if opcode == 0xE9:
                    recomp_addr += 5 + rel_addr
                (opcode, rel_addr) = struct.unpack(
                    "<Bl", self.orig_bin.read(orig_addr, 5)
                )
                if opcode == 0xE9:
                    orig_addr += 5 + rel_addr
            except (struct.error, ValueError):
                # Bail out if there's a problem with struct.unpack
                continue
            if self._db.set_pair_tentative(orig_addr, recomp_addr):
                logger.debug("Matched export %s", repr(export_name))
    def _find_vtordisp(self):
        """If there are any cases of virtual inheritance, we can read
        through the vtables for those classes and find the vtable thunk
        functions (vtordisp).
        Our approach is this: walk both vtables and check where we have a
        vtordisp in the recomp table. Inspect the function at that vtable
        position (in both) and check whether we jump to the same function.
        One potential pitfall here is that the virtual displacement could
        differ between the thunks. We are not (yet) checking for this, so the
        result is that the vtable will appear to match but we will have a diff
        on the thunk in our regular function comparison.
        We could do this differently and check only the original vtable,
        construct the name of the vtordisp function and match based on that."""
        for match in self._db.get_matches_by_type(SymbolType.VTABLE):
            # We need some method of identifying vtables that
            # might have thunks, and this ought to work okay.
            if "{for" not in match.name:
                continue
            # TODO: We might want to fix this at the source (cvdump) instead.
            # Any problem will be logged later when we compare the vtable.
            vtable_size = 4 * (match.size // 4)
            orig_table = self.orig_bin.read(match.orig_addr, vtable_size)
            recomp_table = self.recomp_bin.read(match.recomp_addr, vtable_size)
            raw_addrs = zip(
                [t for (t,) in struct.iter_unpack("<L", orig_table)],
                [t for (t,) in struct.iter_unpack("<L", recomp_table)],
            )
            # Now walk both vtables looking for thunks.
            for orig_addr, recomp_addr in raw_addrs:
                if not self._db.is_vtordisp(recomp_addr):
                    continue
                thunk_fn = self.get_by_recomp(recomp_addr)
                # Read the function bytes here.
                # In practice, the adjuster thunk will be under 16 bytes.
                # If we have thunks of unequal size, we can still tell whether
                # they are thunking the same function by grabbing the
                # JMP instruction at the end.
                thunk_presumed_size = max(thunk_fn.size, 16)
                # Strip off MSVC padding 0xcc bytes.
                # This should be safe to do; it is highly unlikely that
                # the MSB of the jump displacement would be 0xcc. (huge jump)
                orig_thunk_bin = self.orig_bin.read(
                    orig_addr, thunk_presumed_size
                ).rstrip(b"\xcc")
                recomp_thunk_bin = self.recomp_bin.read(
                    recomp_addr, thunk_presumed_size
                ).rstrip(b"\xcc")
                # Read jump opcode and displacement (last 5 bytes)
                (orig_jmp, orig_disp) = struct.unpack("<Bi", orig_thunk_bin[-5:])
                (recomp_jmp, recomp_disp) = struct.unpack("<Bi", recomp_thunk_bin[-5:])
                # Make sure it's a JMP
                if orig_jmp != 0xE9 or recomp_jmp != 0xE9:
                    continue
                # Calculate jump destination from the end of the JMP instruction
                # i.e. the end of the function
                orig_actual = orig_addr + len(orig_thunk_bin) + orig_disp
                recomp_actual = recomp_addr + len(recomp_thunk_bin) + recomp_disp
                # If they are thunking the same function, then this must be a match.
                if self.is_pointer_match(orig_actual, recomp_actual):
                    if len(orig_thunk_bin) != len(recomp_thunk_bin):
                        logger.warning(
                            "Adjuster thunk %s (0x%x) is not exact",
                            thunk_fn.name,
                            orig_addr,
                        )
                    self._db.set_function_pair(orig_addr, recomp_addr)
    def _dump_asm(self, orig_combined, recomp_combined):
        """Append the provided assembly output to the debug files"""
        with open(f"orig-{self.runid}.txt", "a", encoding="utf-8") as f:
            for addr, line in orig_combined:
                f.write(f"{addr}: {line}\n")
        with open(f"recomp-{self.runid}.txt", "a", encoding="utf-8") as f:
            for addr, line in recomp_combined:
                f.write(f"{addr}: {line}\n")
    def _compare_function(self, match: MatchInfo) -> DiffReport:
        # Detect when the recomp function size would cause us to read
        # enough bytes from the original function that we cross into
        # the next annotated function.
        next_orig = self._db.get_next_orig_addr(match.orig_addr)
        if next_orig is not None:
            orig_size = min(next_orig - match.orig_addr, match.size)
        else:
            orig_size = match.size
        orig_raw = self.orig_bin.read(match.orig_addr, orig_size)
        recomp_raw = self.recomp_bin.read(match.recomp_addr, match.size)
        # It's unlikely that a function other than an adjuster thunk would
        # start with a SUB instruction, so alert to a possible wrong
        # annotation here.
        # There's probably a better place to do this, but we're reading
        # the function bytes here already.
        try:
            if orig_raw[0] == 0x2B and recomp_raw[0] != 0x2B:
                logger.warning(
                    "Possible thunk at 0x%x (%s)", match.orig_addr, match.name
                )
        except IndexError:
            pass
        def orig_lookup(addr: int, exact: bool) -> Optional[str]:
            m = self._db.get_by_orig(addr, exact)
            if m is None:
                return None
            if m.orig_addr == addr:
                return m.match_name()
            offset = addr - m.orig_addr
            if m.compare_type != SymbolType.DATA or offset >= m.size:
                return None
            return m.offset_name(offset)
        def recomp_lookup(addr: int, exact: bool) -> Optional[str]:
            m = self._db.get_by_recomp(addr, exact)
            if m is None:
                return None
            if m.recomp_addr == addr:
                return m.match_name()
            offset = addr - m.recomp_addr
            if m.compare_type != SymbolType.DATA or offset >= m.size:
                return None
            return m.offset_name(offset)
        orig_should_replace = create_reloc_lookup(self.orig_bin)
        recomp_should_replace = create_reloc_lookup(self.recomp_bin)
        orig_bin_lookup = create_bin_lookup(self.orig_bin)
        recomp_bin_lookup = create_bin_lookup(self.recomp_bin)
        orig_parse = ParseAsm(
            relocate_lookup=orig_should_replace,
            name_lookup=orig_lookup,
            bin_lookup=orig_bin_lookup,
        )
        recomp_parse = ParseAsm(
            relocate_lookup=recomp_should_replace,
            name_lookup=recomp_lookup,
            bin_lookup=recomp_bin_lookup,
        )
        orig_combined = orig_parse.parse_asm(orig_raw, match.orig_addr)
        recomp_combined = recomp_parse.parse_asm(recomp_raw, match.recomp_addr)
        if self.debug:
            self._dump_asm(orig_combined, recomp_combined)
        # Detach addresses from asm lines for the text diff.
        orig_asm = [x[1] for x in orig_combined]
        recomp_asm = [x[1] for x in recomp_combined]
        diff = difflib.SequenceMatcher(None, orig_asm, recomp_asm)
        ratio = diff.ratio()
        if ratio != 1.0:
            # Check whether we can resolve register swaps which are actually
            # perfect matches modulo compiler entropy.
            codes = diff.get_opcodes()
            is_effective_match = find_effective_match(codes, orig_asm, recomp_asm)
            unified_diff = combined_diff(
                diff, orig_combined, recomp_combined, context_size=10
            )
        else:
            is_effective_match = False
            unified_diff = []
        return DiffReport(
            match_type=SymbolType.FUNCTION,
            orig_addr=match.orig_addr,
            recomp_addr=match.recomp_addr,
            name=match.name,
            udiff=unified_diff,
            ratio=ratio,
            is_effective_match=is_effective_match,
        )
    def _compare_vtable(self, match: MatchInfo) -> DiffReport:
        vtable_size = match.size
        # The vtable size should always be a multiple of 4 because that
        # is the pointer size. If it is not (for whatever reason)
        # it would cause iter_unpack to blow up so let's just fix it.
        if vtable_size % 4 != 0:
            logger.warning(
                "Vtable for class %s has irregular size %d", match.name, vtable_size
            )
            vtable_size = 4 * (vtable_size // 4)
        orig_table = self.orig_bin.read(match.orig_addr, vtable_size)
        recomp_table = self.recomp_bin.read(match.recomp_addr, vtable_size)
        raw_addrs = zip(
            [t for (t,) in struct.iter_unpack("<L", orig_table)],
            [t for (t,) in struct.iter_unpack("<L", recomp_table)],
        )
        def match_text(m: Optional[MatchInfo], raw_addr: Optional[int] = None) -> str:
            """Format the function reference at this vtable index as text.
            If we have not identified this function, we have the option to
            display the raw address. This is only worth doing for the original addr
            because we should always be able to identify the recomp function.
            If the original function is missing then this probably means that the class
            should override the given function from the superclass, but we have not
            implemented this yet.
            """
            if m is not None:
                orig = hex(m.orig_addr) if m.orig_addr is not None else "no orig"
                recomp = (
                    hex(m.recomp_addr) if m.recomp_addr is not None else "no recomp"
                )
                return f"({orig} / {recomp})  :  {m.name}"
            if raw_addr is not None:
                return f"0x{raw_addr:x} from orig not annotated."
            return "(no match)"
        orig_text = []
        recomp_text = []
        ratio = 0
        n_entries = 0
        # Now compare each pointer from the two vtables.
        for i, (raw_orig, raw_recomp) in enumerate(raw_addrs):
            orig = self._db.get_by_orig(raw_orig)
            recomp = self._db.get_by_recomp(raw_recomp)
            if (
                orig is not None
                and recomp is not None
                and orig.recomp_addr == recomp.recomp_addr
            ):
                ratio += 1
            n_entries += 1
            index = f"vtable0x{i*4:02x}"
            orig_text.append((index, match_text(orig, raw_orig)))
            recomp_text.append((index, match_text(recomp)))
        ratio = ratio / float(n_entries) if n_entries > 0 else 0
        # context_size=100: Show the entire table if there is a diff to display.
        # Otherwise it would be confusing if the table got cut off.
        sm = difflib.SequenceMatcher(
            None,
            [x[1] for x in orig_text],
            [x[1] for x in recomp_text],
        )
        unified_diff = combined_diff(sm, orig_text, recomp_text, context_size=100)
        return DiffReport(
            match_type=SymbolType.VTABLE,
            orig_addr=match.orig_addr,
            recomp_addr=match.recomp_addr,
            name=match.name,
            udiff=unified_diff,
            ratio=ratio,
        )
    def _compare_match(self, match: MatchInfo) -> Optional[DiffReport]:
        """Router for comparison type"""
        if match.size is None or match.size == 0:
            return None
        options = self._db.get_match_options(match.orig_addr)
        if options.get("skip", False):
            return None
        if options.get("stub", False):
            return DiffReport(
                match_type=match.compare_type,
                orig_addr=match.orig_addr,
                recomp_addr=match.recomp_addr,
                name=match.name,
                is_stub=True,
            )
        if match.compare_type == SymbolType.FUNCTION:
            return self._compare_function(match)
        if match.compare_type == SymbolType.VTABLE:
            return self._compare_vtable(match)
        return None
    ## Public API
    def is_pointer_match(self, orig_addr, recomp_addr) -> bool:
        """Check whether these pointers point at the same thing"""
        # Null pointers considered matching
        if orig_addr == 0 and recomp_addr == 0:
            return True
        match = self._db.get_by_orig(orig_addr)
        if match is None:
            return False
        return match.recomp_addr == recomp_addr
    def get_by_orig(self, addr: int) -> Optional[MatchInfo]:
        return self._db.get_by_orig(addr)
    def get_by_recomp(self, addr: int) -> Optional[MatchInfo]:
        return self._db.get_by_recomp(addr)
    def get_all(self) -> List[MatchInfo]:
        return self._db.get_all()
    def get_functions(self) -> List[MatchInfo]:
        return self._db.get_matches_by_type(SymbolType.FUNCTION)
    def get_vtables(self) -> List[MatchInfo]:
        return self._db.get_matches_by_type(SymbolType.VTABLE)
    def get_variables(self) -> List[MatchInfo]:
        return self._db.get_matches_by_type(SymbolType.DATA)
    def compare_address(self, addr: int) -> Optional[DiffReport]:
        match = self._db.get_one_match(addr)
        if match is None:
            return None
        return self._compare_match(match)
    def compare_all(self) -> Iterable[DiffReport]:
        for match in self._db.get_matches():
            diff = self._compare_match(match)
            if diff is not None:
                yield diff
    def compare_functions(self) -> Iterable[DiffReport]:
        for match in self.get_functions():
            diff = self._compare_match(match)
            if diff is not None:
                yield diff
    def compare_variables(self):
        pass
    def compare_pointers(self):
        pass
    def compare_strings(self):
        pass
    def compare_vtables(self) -> Iterable[DiffReport]:
        for match in self.get_vtables():
            diff = self._compare_match(match)
            if diff is not None:
                yield diff
--- a/tools/isledecomp/isledecomp/compare/db.py
+++ b/tools/isledecomp/isledecomp/compare/db.py
@ -1,549 +0,0 @@
 """Wrapper for database (here an in-memory sqlite database) that collects the
 addresses/symbols that we want to compare between the original and recompiled binaries."""
 import sqlite3
 import logging
 from typing import List, Optional
 from isledecomp.types import SymbolType
 from isledecomp.cvdump.demangler import get_vtordisp_name
 _SETUP_SQL = """
    DROP TABLE IF EXISTS `symbols`;
    DROP TABLE IF EXISTS `match_options`;
    CREATE TABLE `symbols` (
        compare_type int,
        orig_addr int,
        recomp_addr int,
        name text,
        decorated_name text,
        size int
    );
    CREATE TABLE `match_options` (
        addr int not null,
        name text not null,
        value text,
        primary key (addr, name)
    ) without rowid;
    CREATE VIEW IF NOT EXISTS `match_info`
    (compare_type, orig_addr, recomp_addr, name, size) AS
        SELECT compare_type, orig_addr, recomp_addr, name, size
        FROM `symbols`
        ORDER BY orig_addr NULLS LAST;
    CREATE INDEX `symbols_or` ON `symbols` (orig_addr);
    CREATE INDEX `symbols_re` ON `symbols` (recomp_addr);
    CREATE INDEX `symbols_na` ON `symbols` (name);
 """
 class MatchInfo:
    def __init__(
        self,
        ctype: Optional[int],
        orig: Optional[int],
        recomp: Optional[int],
        name: Optional[str],
        size: Optional[int],
    ) -> None:
        self.compare_type = SymbolType(ctype) if ctype is not None else None
        self.orig_addr = orig
        self.recomp_addr = recomp
        self.name = name
        self.size = size
    def match_name(self) -> Optional[str]:
        """Combination of the name and compare type.
        Intended for name substitution in the diff. If there is a diff,
        it will be more obvious what this symbol indicates."""
        if self.name is None:
            return None
        ctype = self.compare_type.name if self.compare_type is not None else "UNK"
        name = repr(self.name) if ctype == "STRING" else self.name
        return f"{name} ({ctype})"
    def offset_name(self, ofs: int) -> Optional[str]:
        if self.name is None:
            return None
        return f"{self.name}+{ofs} (OFFSET)"
 def matchinfo_factory(_, row):
    return MatchInfo(*row)
 logger = logging.getLogger(__name__)
 class CompareDb:
    # pylint: disable=too-many-public-methods
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.executescript(_SETUP_SQL)
    def set_orig_symbol(
        self,
        addr: int,
        compare_type: Optional[SymbolType],
        name: Optional[str],
        size: Optional[int],
    ):
        # Ignore collisions here.
        if self._orig_used(addr):
            return
        compare_value = compare_type.value if compare_type is not None else None
        self._db.execute(
            "INSERT INTO `symbols` (orig_addr, compare_type, name, size) VALUES (?,?,?,?)",
            (addr, compare_value, name, size),
        )
    def set_recomp_symbol(
        self,
        addr: int,
        compare_type: Optional[SymbolType],
        name: Optional[str],
        decorated_name: Optional[str],
        size: Optional[int],
    ):
        # Ignore collisions here. The same recomp address can have
        # multiple names (e.g. _strlwr and __strlwr)
        if self._recomp_used(addr):
            return
        compare_value = compare_type.value if compare_type is not None else None
        self._db.execute(
            "INSERT INTO `symbols` (recomp_addr, compare_type, name, decorated_name, size) VALUES (?,?,?,?,?)",
            (addr, compare_value, name, decorated_name, size),
        )
    def get_unmatched_strings(self) -> List[str]:
        """Return any strings not already identified by STRING markers."""
        cur = self._db.execute(
            "SELECT name FROM `symbols` WHERE compare_type = ? AND orig_addr IS NULL",
            (SymbolType.STRING.value,),
        )
        return [string for (string,) in cur.fetchall()]
    def get_all(self) -> List[MatchInfo]:
        cur = self._db.execute("SELECT * FROM `match_info`")
        cur.row_factory = matchinfo_factory
        return cur.fetchall()
    def get_matches(self) -> List[MatchInfo]:
        cur = self._db.execute(
            """SELECT * FROM `match_info`
            WHERE orig_addr IS NOT NULL
            AND recomp_addr IS NOT NULL
            """,
        )
        cur.row_factory = matchinfo_factory
        return cur.fetchall()
    def get_one_match(self, addr: int) -> Optional[MatchInfo]:
        cur = self._db.execute(
            """SELECT * FROM `match_info`
            WHERE orig_addr = ?
            AND recomp_addr IS NOT NULL
            """,
            (addr,),
        )
        cur.row_factory = matchinfo_factory
        return cur.fetchone()
    def _get_closest_orig(self, addr: int) -> Optional[int]:
        value = self._db.execute(
            """SELECT max(orig_addr) FROM `symbols`
            WHERE ? >= orig_addr
            LIMIT 1
            """,
            (addr,),
        ).fetchone()
        return value[0] if value is not None else None
    def _get_closest_recomp(self, addr: int) -> Optional[int]:
        value = self._db.execute(
            """SELECT max(recomp_addr) FROM `symbols`
            WHERE ? >= recomp_addr
            LIMIT 1
            """,
            (addr,),
        ).fetchone()
        return value[0] if value is not None else None
    def get_by_orig(self, addr: int, exact: bool = True) -> Optional[MatchInfo]:
        if not exact and not self._orig_used(addr):
            addr = self._get_closest_orig(addr)
            if addr is None:
                return None
        cur = self._db.execute(
            """SELECT * FROM `match_info`
            WHERE orig_addr = ?
            """,
            (addr,),
        )
        cur.row_factory = matchinfo_factory
        return cur.fetchone()
    def get_by_recomp(self, addr: int, exact: bool = True) -> Optional[MatchInfo]:
        if not exact and not self._recomp_used(addr):
            addr = self._get_closest_recomp(addr)
            if addr is None:
                return None
        cur = self._db.execute(
            """SELECT * FROM `match_info`
            WHERE recomp_addr = ?
            """,
            (addr,),
        )
        cur.row_factory = matchinfo_factory
        return cur.fetchone()
    def get_matches_by_type(self, compare_type: SymbolType) -> List[MatchInfo]:
        cur = self._db.execute(
            """SELECT * FROM `match_info`
            WHERE compare_type = ?
            AND orig_addr IS NOT NULL
            AND recomp_addr IS NOT NULL
            """,
            (compare_type.value,),
        )
        cur.row_factory = matchinfo_factory
        return cur.fetchall()
    def _orig_used(self, addr: int) -> bool:
        cur = self._db.execute("SELECT 1 FROM symbols WHERE orig_addr = ?", (addr,))
        return cur.fetchone() is not None
    def _recomp_used(self, addr: int) -> bool:
        cur = self._db.execute("SELECT 1 FROM symbols WHERE recomp_addr = ?", (addr,))
        return cur.fetchone() is not None
    def set_pair(
        self, orig: int, recomp: int, compare_type: Optional[SymbolType] = None
    ) -> bool:
        if self._orig_used(orig):
            logger.error("Original address %s not unique!", hex(orig))
            return False
        compare_value = compare_type.value if compare_type is not None else None
        cur = self._db.execute(
            "UPDATE `symbols` SET orig_addr = ?, compare_type = ? WHERE recomp_addr = ?",
            (orig, compare_value, recomp),
        )
        return cur.rowcount > 0
    def set_pair_tentative(
        self, orig: int, recomp: int, compare_type: Optional[SymbolType] = None
    ) -> bool:
        """Declare a match for the original and recomp addresses given, but only if:
        1. The original address is not used elsewhere (as with set_pair)
        2. The recomp address has not already been matched
        If the compare_type is given, update this also, but only if NULL in the db.
        The purpose here is to set matches found via some automated analysis
        but to not overwrite a match provided by the human operator."""
        if self._orig_used(orig):
            # Probable and expected situation. Just ignore it.
            return False
        compare_value = compare_type.value if compare_type is not None else None
        cur = self._db.execute(
            """UPDATE `symbols`
            SET orig_addr = ?, compare_type = coalesce(compare_type, ?)
            WHERE recomp_addr = ?
            AND orig_addr IS NULL""",
            (orig, compare_value, recomp),
        )
        return cur.rowcount > 0
    def set_function_pair(self, orig: int, recomp: int) -> bool:
        """For lineref match or _entry"""
        return self.set_pair(orig, recomp, SymbolType.FUNCTION)
    def create_orig_thunk(self, addr: int, name: str) -> bool:
        """Create a thunk function reference using the orig address.
        We are here because we have a match on the thunked function,
        but it is not thunked in the recomp build."""
        if self._orig_used(addr):
            return False
        thunk_name = f"Thunk of '{name}'"
        # Assuming relative jump instruction for thunks (5 bytes)
        cur = self._db.execute(
            """INSERT INTO `symbols`
            (orig_addr, compare_type, name, size)
            VALUES (?,?,?,?)""",
            (addr, SymbolType.FUNCTION.value, thunk_name, 5),
        )
        return cur.rowcount > 0
    def create_recomp_thunk(self, addr: int, name: str) -> bool:
        """Create a thunk function reference using the recomp address.
        We start from the recomp side for this because we are guaranteed
        to have full information from the PDB. We can use a regular function
        match later to pull in the orig address."""
        if self._recomp_used(addr):
            return False
        thunk_name = f"Thunk of '{name}'"
        # Assuming relative jump instruction for thunks (5 bytes)
        cur = self._db.execute(
            """INSERT INTO `symbols`
            (recomp_addr, compare_type, name, size)
            VALUES (?,?,?,?)""",
            (addr, SymbolType.FUNCTION.value, thunk_name, 5),
        )
        return cur.rowcount > 0
    def _set_opt_bool(self, addr: int, option: str, enabled: bool = True):
        if enabled:
            self._db.execute(
                """INSERT OR IGNORE INTO `match_options`
                (addr, name)
                VALUES (?, ?)""",
                (addr, option),
            )
        else:
            self._db.execute(
                """DELETE FROM `match_options` WHERE addr = ? AND name = ?""",
                (addr, option),
            )
    def mark_stub(self, orig: int):
        self._set_opt_bool(orig, "stub")
    def skip_compare(self, orig: int):
        self._set_opt_bool(orig, "skip")
    def get_match_options(self, addr: int) -> dict:
        cur = self._db.execute(
            """SELECT name, value FROM `match_options` WHERE addr = ?""", (addr,)
        )
        return {
            option: value if value is not None else True
            for (option, value) in cur.fetchall()
        }
    def is_vtordisp(self, recomp_addr: int) -> bool:
        """Check whether this function is a vtordisp based on its
        decorated name. If its demangled name is missing the vtordisp
        indicator, correct that."""
        row = self._db.execute(
            """SELECT name, decorated_name
            FROM `symbols`
            WHERE recomp_addr = ?""",
            (recomp_addr,),
        ).fetchone()
        if row is None:
            return False
        (name, decorated_name) = row
        if name is not None and "`vtordisp" in name:
            return True
        new_name = get_vtordisp_name(decorated_name)
        if new_name is None:
            return False
        self._db.execute(
            """UPDATE `symbols`
            SET name = ?
            WHERE recomp_addr = ?""",
            (new_name, recomp_addr),
        )
        return True
    def _find_potential_match(
        self, name: str, compare_type: SymbolType
    ) -> Optional[int]:
        """Name lookup"""
        match_decorate = compare_type != SymbolType.STRING and name.startswith("?")
        if match_decorate:
            sql = """
            SELECT recomp_addr
            FROM `symbols`
            WHERE orig_addr IS NULL
            AND decorated_name = ?
            AND (compare_type IS NULL OR compare_type = ?)
            LIMIT 1
            """
        else:
            sql = """
            SELECT recomp_addr
            FROM `symbols`
            WHERE orig_addr IS NULL
            AND name = ?
            AND (compare_type IS NULL OR compare_type = ?)
            LIMIT 1
            """
        row = self._db.execute(sql, (name, compare_type.value)).fetchone()
        return row[0] if row is not None else None
    def _find_static_variable(
        self, variable_name: str, function_sym: str
    ) -> Optional[int]:
        """Get the recomp address of a static function variable.
        Matches using a LIKE clause on the combination of:
        1. The variable name read from decomp marker.
        2. The decorated name of the enclosing function.
        For example, the variable "g_startupDelay" from function "IsleApp::Tick"
        has symbol: `?g_startupDelay@?1??Tick@IsleApp@@QAEXH@Z@4HA`
        The function's decorated name is: `?Tick@IsleApp@@QAEXH@Z`"""
        row = self._db.execute(
            """SELECT recomp_addr FROM `symbols`
            WHERE decorated_name LIKE '%' || ? || '%' || ? || '%'
            AND orig_addr IS NULL
            AND (compare_type = ? OR compare_type = ? OR compare_type IS NULL)""",
            (
                variable_name,
                function_sym,
                SymbolType.DATA.value,
                SymbolType.POINTER.value,
            ),
        ).fetchone()
        return row[0] if row is not None else None
    def _match_on(self, compare_type: SymbolType, addr: int, name: str) -> bool:
        # Update the compare_type here too since the marker tells us what we should do
        # Truncate the name to 255 characters. It will not be possible to match a name
        # longer than that because MSVC truncates the debug symbols to this length.
        # See also: warning C4786.
        name = name[:255]
        logger.debug("Looking for %s %s", compare_type.name.lower(), name)
        recomp_addr = self._find_potential_match(name, compare_type)
        if recomp_addr is None:
            return False
        return self.set_pair(addr, recomp_addr, compare_type)
    def get_next_orig_addr(self, addr: int) -> Optional[int]:
        """Return the original address (matched or not) that follows
        the one given. If our recomp function size would cause us to read
        too many bytes for the original function, we can adjust it."""
        result = self._db.execute(
            """SELECT orig_addr
            FROM `symbols`
            WHERE orig_addr > ?
            ORDER BY orig_addr
            LIMIT 1""",
            (addr,),
        ).fetchone()
        return result[0] if result is not None else None
    def match_function(self, addr: int, name: str) -> bool:
        did_match = self._match_on(SymbolType.FUNCTION, addr, name)
        if not did_match:
            logger.error("Failed to find function symbol with name: %s", name)
        return did_match
    def match_vtable(
        self, addr: int, name: str, base_class: Optional[str] = None
    ) -> bool:
        # Set up our potential match names
        bare_vftable = f"{name}::`vftable'"
        for_name = base_class if base_class is not None else name
        for_vftable = f"{name}::`vftable'{{for `{for_name}'}}"
        # Only allow a match against "Class::`vftable'"
        # if this is the derived class.
        if base_class is None or base_class == name:
            name_options = (for_vftable, bare_vftable)
        else:
            name_options = (for_vftable, for_vftable)
        row = self._db.execute(
            """
            SELECT recomp_addr
            FROM `symbols`
            WHERE orig_addr IS NULL
            AND (name = ? OR name = ?)
            AND (compare_type = ?)
            LIMIT 1
            """,
            (*name_options, SymbolType.VTABLE.value),
        ).fetchone()
        if row is not None and self.set_pair(addr, row[0], SymbolType.VTABLE):
            return True
        logger.error("Failed to find vtable for class: %s", name)
        return False
    def match_static_variable(self, addr: int, name: str, function_addr: int) -> bool:
        """Matching a static function variable by combining the variable name
        with the decorated (mangled) name of its parent function."""
        cur = self._db.execute(
            """SELECT name, decorated_name
            FROM `symbols`
            WHERE orig_addr = ?""",
            (function_addr,),
        )
        if (result := cur.fetchone()) is None:
            logger.error("No function for static variable: %s", name)
            return False
        # Get the friendly name for the "failed to match" error message
        (function_name, decorated_name) = result
        recomp_addr = self._find_static_variable(name, decorated_name)
        if recomp_addr is not None:
            # TODO: This variable could be a pointer, but I don't think we
            # have a way to tell that right now.
            if self.set_pair(addr, recomp_addr, SymbolType.DATA):
                return True
        logger.error(
            "Failed to match static variable %s from function %s",
            name,
            function_name,
        )
        return False
    def match_variable(self, addr: int, name: str) -> bool:
        did_match = self._match_on(SymbolType.DATA, addr, name) or self._match_on(
            SymbolType.POINTER, addr, name
        )
        if not did_match:
            logger.error("Failed to find variable: %s", name)
        return did_match
    def match_string(self, addr: int, value: str) -> bool:
        did_match = self._match_on(SymbolType.STRING, addr, value)
        if not did_match:
            escaped = repr(value)
            logger.error("Failed to find string: %s", escaped)
        return did_match
--- a/tools/isledecomp/isledecomp/compare/diff.py
+++ b/tools/isledecomp/isledecomp/compare/diff.py
@ -1,98 +0,0 @@
 from difflib import SequenceMatcher
 from typing import Dict, List, Tuple
 CombinedDiffInput = List[Tuple[str, str]]
 CombinedDiffOutput = List[Tuple[str, List[Dict[str, Tuple[str, str]]]]]
 def combined_diff(
    diff: SequenceMatcher,
    orig_combined: CombinedDiffInput,
    recomp_combined: CombinedDiffInput,
    context_size: int = 3,
 ) -> CombinedDiffOutput:
    """We want to diff the original and recomp assembly. The "combined" assembly
    input has two components: the address of the instruction and the assembly text.
    We have already diffed the text only. This is the SequenceMatcher object.
    The SequenceMatcher can generate "opcodes" that describe how to turn "Text A"
    into "Text B". These refer to list indices of the original arrays, so we can
    use those to create the final diff and include the address for each line of assembly.
    This is almost the same procedure as the difflib.unified_diff function, but we
    are reusing the already generated SequenceMatcher object.
    """
    unified_diff = []
    for group in diff.get_grouped_opcodes(context_size):
        subgroups = []
        # Keep track of the addresses we've seen in this diff group.
        # This helps create the "@@" line (the hunk header of a unified diff).
        # Do it this way because not every line in each list will have an
        # address. If our context begins or ends on a line that does not
        # have one, we will have an incomplete range string.
        orig_addrs = set()
        recomp_addrs = set()
        first, last = group[0], group[-1]
        orig_range = len(orig_combined[first[1] : last[2]])
        recomp_range = len(recomp_combined[first[3] : last[4]])
        for code, i1, i2, j1, j2 in group:
            if code == "equal":
                # The sections are equal, so the list slices are guaranteed
                # to have the same length. We only need the diffed value (asm text)
                # from one of the lists, but we need the addresses from both.
                # Use zip to put the two lists together and then take out what we want.
                both = [
                    (a, b, c)
                    for ((a, b), (c, _)) in zip(
                        orig_combined[i1:i2], recomp_combined[j1:j2]
                    )
                ]
                for orig_addr, _, recomp_addr in both:
                    if orig_addr is not None:
                        orig_addrs.add(orig_addr)
                    if recomp_addr is not None:
                        recomp_addrs.add(recomp_addr)
                subgroups.append({"both": both})
            else:
                for orig_addr, _ in orig_combined[i1:i2]:
                    if orig_addr is not None:
                        orig_addrs.add(orig_addr)
                for recomp_addr, _ in recomp_combined[j1:j2]:
                    if recomp_addr is not None:
                        recomp_addrs.add(recomp_addr)
                subgroups.append(
                    {
                        "orig": orig_combined[i1:i2],
                        "recomp": recomp_combined[j1:j2],
                    }
                )
        orig_sorted = sorted(orig_addrs)
        recomp_sorted = sorted(recomp_addrs)
        # We could get a diff group that has no original addresses.
        # This might happen for a stub function where we are not able to
        # produce even a single instruction from the original.
        # In that case, show the best slug line that we can.
        def peek_front(list_, default=""):
            try:
                return list_[0]
            except IndexError:
                return default
        orig_first = peek_front(orig_sorted)
        recomp_first = peek_front(recomp_sorted)
        diff_slug = f"@@ -{orig_first},{orig_range} +{recomp_first},{recomp_range} @@"
        unified_diff.append((diff_slug, subgroups))
    return unified_diff
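The hunk header logic in `combined_diff` can be tried in isolation. The following is a minimal sketch, not the project's code: the `(address, text)` pairs are made up, and `hunk_headers` is a hypothetical helper that keeps only the part of the function that builds the `@@` line from a reused `SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical (address, asm text) pairs; real input would come from a disassembler.
orig = [
    ("0x1000", "push ebp"),
    ("0x1001", "mov ebp, esp"),
    ("0x1003", "xor eax, eax"),
    ("0x1005", "ret"),
]
recomp = [
    ("0x2000", "push ebp"),
    ("0x2001", "mov ebp, esp"),
    ("0x2003", "mov eax, 1"),
    ("0x2008", "ret"),
]

# Diff the text only; the addresses are looked up afterwards by list index.
matcher = SequenceMatcher(
    None, [text for _, text in orig], [text for _, text in recomp], autojunk=False
)


def hunk_headers(diff, orig_combined, recomp_combined, context_size=3):
    """Build one "@@" line per opcode group, using the first known address."""
    headers = []
    for group in diff.get_grouped_opcodes(context_size):
        first, last = group[0], group[-1]
        orig_slice = orig_combined[first[1] : last[2]]
        recomp_slice = recomp_combined[first[3] : last[4]]
        # Not every line is guaranteed to carry an address, so filter out None.
        orig_addrs = sorted(a for a, _ in orig_slice if a is not None)
        recomp_addrs = sorted(a for a, _ in recomp_slice if a is not None)
        orig_first = orig_addrs[0] if orig_addrs else ""
        recomp_first = recomp_addrs[0] if recomp_addrs else ""
        headers.append(
            f"@@ -{orig_first},{len(orig_slice)} +{recomp_first},{len(recomp_slice)} @@"
        )
    return headers


print(hunk_headers(matcher, orig, recomp))
```

Because the opcodes index into the text lists, the same indices can slice the combined lists, which is why no second diff pass is needed.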
--- a/tools/isledecomp/isledecomp/compare/lines.py
+++ b/tools/isledecomp/isledecomp/compare/lines.py
@ -1,69 +0,0 @@
 """Database used to match (filename, line_number) pairs
 between FUNCTION markers and PDB analysis."""
 import sqlite3
 import logging
 from functools import cache
 from typing import Optional
 from pathlib import Path
 from isledecomp.dir import PathResolver
 _SETUP_SQL = """
    DROP TABLE IF EXISTS `lineref`;
    CREATE TABLE `lineref` (
        path text not null,
        filename text not null,
        line int not null,
        addr int not null
    );
    CREATE INDEX `file_line` ON `lineref` (filename, line);
 """
 logger = logging.getLogger(__name__)
@cache
 def my_samefile(path: str, source_path: str) -> bool:
    return Path(path).samefile(source_path)
@cache
 def my_basename_lower(path: str) -> str:
    return Path(path).name.lower()
 class LinesDb:
    def __init__(self, code_dir) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.executescript(_SETUP_SQL)
        self._path_resolver = PathResolver(code_dir)
    def add_line(self, path: str, line_no: int, addr: int):
        """To be added from the LINES section of cvdump."""
        sourcepath = self._path_resolver.resolve_cvdump(path)
        filename = my_basename_lower(sourcepath)
        self._db.execute(
            "INSERT INTO `lineref` (path, filename, line, addr) VALUES (?,?,?,?)",
            (sourcepath, filename, line_no, addr),
        )
    def search_line(self, path: str, line_no: int) -> Optional[int]:
        """Using path and line number from FUNCTION marker,
        get the address of this function in the recomp."""
        filename = my_basename_lower(path)
        cur = self._db.execute(
            "SELECT path, addr FROM `lineref` WHERE filename = ? AND line = ?",
            (filename, line_no),
        )
        for source_path, addr in cur.fetchall():
            if my_samefile(path, source_path):
                return addr
        logger.error(
            "Failed to find function symbol with filename and line: %s:%d",
            path,
            line_no,
        )
        return None
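The two-step lookup in `LinesDb` (a cheap indexed match on the lowercased basename, then a stricter path check) can be sketched without the rest of the package. This is an illustration under assumptions: the paths and addresses are invented, `PathResolver` is left out, and a case-insensitive `PureWindowsPath` comparison stands in for `Path.samefile` so the example runs without the files existing on disk.

```python
import sqlite3
from pathlib import PureWindowsPath
from typing import Optional

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lineref (path text, filename text, line int, addr int)")
db.execute("CREATE INDEX file_line ON lineref (filename, line)")


def basename_lower(path: str) -> str:
    return PureWindowsPath(path).name.lower()


def add_line(path: str, line_no: int, addr: int) -> None:
    db.execute(
        "INSERT INTO lineref (path, filename, line, addr) VALUES (?,?,?,?)",
        (path, basename_lower(path), line_no, addr),
    )


def search_line(path: str, line_no: int) -> Optional[int]:
    cur = db.execute(
        "SELECT path, addr FROM lineref WHERE filename = ? AND line = ?",
        (basename_lower(path), line_no),
    )
    for source_path, addr in cur.fetchall():
        # Assumption: path equality replaces Path.samefile in this sketch.
        if PureWindowsPath(source_path) == PureWindowsPath(path):
            return addr
    return None


# Two files share a basename; only the full path tells them apart.
add_line(r"C:\isle\LEGO1\viewroi.cpp", 27, 0x34EC0)
add_line(r"C:\isle\ISLE\viewroi.cpp", 27, 0x1000)
print(search_line(r"c:\isle\lego1\VIEWROI.CPP", 27))
```

The index on `(filename, line)` keeps the first step fast, and the second step only runs on the few rows that share a basename and line number.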
--- a/tools/isledecomp/isledecomp/cvdump/init.py
+++ b/tools/isledecomp/isledecomp/cvdump/init.py
@ -1,4 +0,0 @@
 from .analysis import CvdumpAnalysis
 from .parser import CvdumpParser
 from .runner import Cvdump
 from .types import CvdumpTypesParser
--- a/tools/isledecomp/isledecomp/cvdump/analysis.py
+++ b/tools/isledecomp/isledecomp/cvdump/analysis.py
@ -1,178 +0,0 @@
 """For collating the results from parsing cvdump.exe into a more directly useful format."""
 from typing import Dict, List, Tuple, Optional
 from isledecomp.types import SymbolType
 from .parser import CvdumpParser
 from .demangler import demangle_string_const, demangle_vtable
 from .types import CvdumpKeyError, CvdumpIntegrityError
 class CvdumpNode:
    # pylint: disable=too-many-instance-attributes
    # These two are required and allow us to identify the symbol
    section: int
    offset: int
    # aka the mangled name from the PUBLICS section
    decorated_name: Optional[str] = None
    # optional "nicer" name (e.g. of a function from SYMBOLS section)
    friendly_name: Optional[str] = None
    # To be determined by context after inserting data, unless the decorated
    # name makes this obvious. (i.e. string constants or vtables)
    # We choose not to assume that section 1 (probably ".text") contains only
    # functions. Smacker functions are linked to their own section "_UNSTEXT"
    node_type: Optional[SymbolType] = None
    # Function size can be read from the LINES section so use this over any
    # other value if we have it.
    # TYPES section can tell us the size of structs and other complex types.
    confirmed_size: Optional[int] = None
    # Estimated by reading the distance between this symbol and the one that
    # follows in the same section.
    # If this is the last symbol in the section, we cannot estimate a size.
    estimated_size: Optional[int] = None
    # Size as reported by SECTION CONTRIBUTIONS section. Not guaranteed to be
    # accurate.
    section_contribution: Optional[int] = None
    def __init__(self, section: int, offset: int) -> None:
        self.section = section
        self.offset = offset
    def set_decorated(self, name: str):
        self.decorated_name = name
        if self.decorated_name.startswith("??_7"):
            self.node_type = SymbolType.VTABLE
            self.friendly_name = demangle_vtable(self.decorated_name)
        elif self.decorated_name.startswith("??_8"):
            # This is the `vbtable' symbol for virtual inheritance.
            # Should be okay to reuse demangle_vtable. We still want to
            # remove things like "const" from the output.
            self.node_type = SymbolType.DATA
            self.friendly_name = demangle_vtable(self.decorated_name)
        elif self.decorated_name.startswith("??_C@"):
            self.node_type = SymbolType.STRING
            # demangle_string_const returns None if the symbol cannot be parsed.
            info = demangle_string_const(self.decorated_name)
            if info is not None:
                self.confirmed_size = info.len
        elif not self.decorated_name.startswith("?") and "@" in self.decorated_name:
            # C mangled symbol. The trailing at-sign with number tells the number of bytes
            # in the parameter list for __stdcall, __fastcall, or __vectorcall
            # For __cdecl it is more ambiguous and we would have to know which section we are in.
            # https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170#FormatC
            self.node_type = SymbolType.FUNCTION
    def name(self) -> Optional[str]:
        """Prefer "friendly" name if we have it.
        This is what we have been using to match functions."""
        return (
            self.friendly_name
            if self.friendly_name is not None
            else self.decorated_name
        )
    def size(self) -> Optional[int]:
        if self.confirmed_size is not None:
            return self.confirmed_size
        # Better to undershoot the size because we can identify a comparison gap easily
        if self.estimated_size is not None and self.section_contribution is not None:
            return min(self.estimated_size, self.section_contribution)
        # Return whichever one we have, or neither
        return self.estimated_size or self.section_contribution
 class CvdumpAnalysis:
    """Collects the results from CvdumpParser into a list of nodes (i.e. symbols).
    These can then be analyzed by a downstream tool."""
    nodes: List[CvdumpNode]
    verified_lines: Dict[Tuple[int, int], Tuple[str, int]]
    def __init__(self, parser: CvdumpParser):
        """Read in as much information as we have from the parser.
        The more sections we have, the better our information will be."""
        node_dict = {}
        # PUBLICS is our roadmap for everything that follows.
        for pub in parser.publics:
            key = (pub.section, pub.offset)
            if key not in node_dict:
                node_dict[key] = CvdumpNode(*key)
            node_dict[key].set_decorated(pub.name)
        for sizeref in parser.sizerefs:
            key = (sizeref.section, sizeref.offset)
            if key not in node_dict:
                node_dict[key] = CvdumpNode(*key)
            node_dict[key].section_contribution = sizeref.size
        for glo in parser.globals:
            key = (glo.section, glo.offset)
            if key not in node_dict:
                node_dict[key] = CvdumpNode(*key)
            node_dict[key].node_type = SymbolType.DATA
            node_dict[key].friendly_name = glo.name
            try:
                # Check our types database for type information.
                # If we did not parse the TYPES section, we can only
                # get information for built-in "T_" types.
                g_info = parser.types.get(glo.type)
                node_dict[key].confirmed_size = g_info.size
                # Previously we set the symbol type to POINTER here if
                # the variable was known to be a pointer. We can derive this
                # information later when it's time to compare the variable,
                # so let's set these to symbol type DATA instead.
                # POINTER will be reserved for non-variable pointer data.
                # e.g. thunks, unwind section.
            except (CvdumpKeyError, CvdumpIntegrityError):
                # No big deal if we don't have complete type information.
                pass
        for key in parser.lines:
            # Here we only set if the section:offset already exists
            # because our values include offsets inside of the function.
            if key in node_dict:
                node_dict[key].node_type = SymbolType.FUNCTION
        # The LINES section contains every code line in the file, naturally.
        # There isn't an obvious separation between functions, so we have to
        # read everything. However, any function that would be in LINES
        # has to be somewhere else in the PDB (probably PUBLICS).
        # Isolate the lines that we actually care about for matching.
        self.verified_lines = {
            key: value for (key, value) in parser.lines.items() if key in node_dict
        }
        for sym in parser.symbols:
            key = (sym.section, sym.offset)
            if key not in node_dict:
                node_dict[key] = CvdumpNode(*key)
            if sym.type == "S_GPROC32":
                node_dict[key].friendly_name = sym.name
                node_dict[key].confirmed_size = sym.size
                node_dict[key].node_type = SymbolType.FUNCTION
        self.nodes = [node for _, node in sorted(node_dict.items())]
        self._estimate_size()
    def _estimate_size(self):
        """Get the distance between one section:offset value and the next one
        in the same section. This gives a rough estimate of the size of the symbol.
        If we have information from SECTION CONTRIBUTIONS, take whichever one is
        less to get the best approximate size."""
        for i in range(len(self.nodes) - 1):
            this_node = self.nodes[i]
            next_node = self.nodes[i + 1]
            # If they are in different sections, we can't compare them
            if this_node.section != next_node.section:
                continue
            this_node.estimated_size = next_node.offset - this_node.offset
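The size logic in `CvdumpNode.size` and `_estimate_size` comes down to two small rules. Here is a minimal sketch with hypothetical helper names (`estimate_sizes`, `best_size`) that work on plain `(section, offset)` tuples instead of `CvdumpNode` objects.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (section, offset)


def estimate_sizes(keys: List[Key]) -> Dict[Key, int]:
    """Distance from each symbol to the next one in the same section."""
    ordered = sorted(keys)
    sizes: Dict[Key, int] = {}
    for this_key, next_key in zip(ordered, ordered[1:]):
        # Symbols in different sections cannot be compared.
        if this_key[0] != next_key[0]:
            continue
        sizes[this_key] = next_key[1] - this_key[1]
    return sizes


def best_size(
    confirmed: Optional[int],
    estimated: Optional[int],
    contribution: Optional[int],
) -> Optional[int]:
    """Prefer a confirmed size, then the smaller of the two approximations."""
    if confirmed is not None:
        return confirmed
    if estimated is not None and contribution is not None:
        return min(estimated, contribution)
    return estimated if estimated is not None else contribution


print(estimate_sizes([(1, 0x10), (1, 0x00), (2, 0x00)]))
print(best_size(None, 16, 12))
```

The last symbol of each section gets no estimate, which matches the note on `estimated_size` in `CvdumpNode`.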
--- a/tools/isledecomp/isledecomp/cvdump/demangler.py
+++ b/tools/isledecomp/isledecomp/cvdump/demangler.py
@ -1,121 +0,0 @@
 """For demangling a subset of MSVC mangled symbols.
 Some unofficial information about the mangling scheme is here:
 https://en.wikiversity.org/wiki/Visual_C%2B%2B_name_mangling
 """
 import re
 from collections import namedtuple
 from typing import Optional
 import pydemangler
 class InvalidEncodedNumberError(Exception):
    pass
 _encoded_number_translate = str.maketrans("ABCDEFGHIJKLMNOP", "0123456789ABCDEF")
 def parse_encoded_number(string: str) -> int:
    # TODO: assert string ends in "@"?
    if string.endswith("@"):
        string = string[:-1]
    try:
        return int(string.translate(_encoded_number_translate), 16)
    except ValueError as e:
        raise InvalidEncodedNumberError(string) from e
 string_const_regex = re.compile(
    r"\?\?_C@\_(?P<is_utf16>[0-1])(?P<len>\d|[A-P]+@)(?P<hash>\w+)@(?P<value>.+)@"
 )
 StringConstInfo = namedtuple("StringConstInfo", "len is_utf16")
 def demangle_string_const(symbol: str) -> Optional[StringConstInfo]:
    """Don't bother to decode the string text from the symbol.
    We can just read it from the binary once we have the length."""
    match = string_const_regex.match(symbol)
    if match is None:
        return None
    try:
        strlen = (
            parse_encoded_number(match.group("len"))
            if "@" in match.group("len")
            else int(match.group("len"))
        )
    except (ValueError, InvalidEncodedNumberError):
        return None
    is_utf16 = match.group("is_utf16") == "1"
    return StringConstInfo(len=strlen, is_utf16=is_utf16)
 def get_vtordisp_name(symbol: str) -> Optional[str]:
    # pylint: disable=c-extension-no-member
    """For adjuster thunk functions, the PDB will sometimes use a name
    that contains "vtordisp" but often will just reuse the name of the
    function being thunked. We want to use the vtordisp name if possible."""
    name = pydemangler.demangle(symbol)
    if name is None:
        return None
    if "`vtordisp" not in name:
        return None
    # Now we remove the parts of the friendly name that we don't need
    try:
        # Assuming this is the last of the function prefixes
        thiscall_idx = name.index("__thiscall")
        # To match the end of the `vtordisp{x,y}' string
        end_idx = name.index("}'")
        return name[thiscall_idx + 11 : end_idx + 2]
    except ValueError:
        return name
 def demangle_vtable(symbol: str) -> str:
    # pylint: disable=c-extension-no-member
    """Get the class name referenced in the vtable symbol."""
    raw = pydemangler.demangle(symbol)
    if raw is None:
        # TODO: This shouldn't happen if MSVC behaves.
        # Fall back to the mangled name instead of failing on None below.
        return symbol
    # Remove storage class and other stuff we don't care about
    return (
        raw.replace("<class ", "<")
        .replace("<struct ", "<")
        .replace("const ", "")
        .replace("volatile ", "")
    )
 def demangle_vtable_ourselves(symbol: str) -> str:
    """Parked implementation of MSVC symbol demangling.
    We only use this for vtables and it works okay with the simple cases or
    templates that refer to other classes/structs. Some namespace support.
    Does not support backrefs, primitive types, or vtables with
    virtual inheritance."""
    # Seek ahead 4 chars to strip off "??_7" prefix
    t = symbol[4:].split("@")
    # "?$" indicates a template class
    if t[0].startswith("?$"):
        class_name = t[0][2:]
        # PA = Pointer/reference
        # V or U = class or struct
        if t[1].startswith("PA"):
            generic = f"{t[1][3:]} *"
        else:
            generic = t[1][1:]
        return f"{class_name}<{generic}>::`vftable'"
    # If we have two classes listed, it is a namespace hierarchy.
    # @@6B@ is a common generic suffix for these vtable symbols.
    if t[1] != "" and t[1] != "6B":
        return t[1] + "::" + t[0] + "::`vftable'"
    return t[0] + "::`vftable'"
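The translation table used by `parse_encoded_number` treats the letters `A` through `P` as the hex digits `0` through `F`. A minimal standalone sketch of that decoding follows; `decode_number` is a hypothetical name, the inputs are made up, and the sketch raises a plain `ValueError` rather than `InvalidEncodedNumberError`.

```python
_ENCODED_DIGITS = str.maketrans("ABCDEFGHIJKLMNOP", "0123456789ABCDEF")


def decode_number(encoded: str) -> int:
    """Decode an A-P encoded number, with or without the trailing at-sign."""
    digits = encoded[:-1] if encoded.endswith("@") else encoded
    # Letters outside A-P are left as-is by translate(), so int() rejects them.
    return int(digits.translate(_ENCODED_DIGITS), 16)


print(decode_number("BA@"))  # "BA" translates to hex "10"
print(decode_number("P@"))   # "P" translates to hex "F"
```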
--- a/tools/isledecomp/isledecomp/cvdump/parser.py
+++ b/tools/isledecomp/isledecomp/cvdump/parser.py
@ -1,199 +0,0 @@
 import re
 from typing import Iterable, Tuple
 from collections import namedtuple
 from .types import CvdumpTypesParser
 # e.g. `*** PUBLICS`
 _section_change_regex = re.compile(r"\*\*\* (?P<section>[A-Z/ ]{2,})")
 # e.g. `     27 00034EC0     28 00034EE2     29 00034EE7     30 00034EF4`
 _line_addr_pairs_findall = re.compile(r"\s+(?P<line_no>\d+) (?P<addr>[A-F0-9]{8})")
 # We assume no spaces in the file name
 # e.g. `  Z:\lego-island\isle\LEGO1\viewmanager\viewroi.cpp (None), 0001:00034E90-00034E97, line/addr pairs = 2`
 _lines_subsection_header = re.compile(
    r"^\s*(?P<filename>\S+).*?, (?P<section>[A-F0-9]{4}):(?P<start>[A-F0-9]{8})-(?P<end>[A-F0-9]{8}), line/addr pairs = (?P<len>\d+)"
 )
 # e.g. `S_PUB32: [0001:0003FF60], Flags: 00000000, __read`
 _publics_line_regex = re.compile(
    r"^(?P<type>\w+): \[(?P<section>\w{4}):(?P<offset>\w{8})], Flags: (?P<flags>\w{8}), (?P<name>\S+)"
 )
 # e.g. `(00008C) S_GPROC32: [0001:00034E90], Cb: 00000007, Type:             0x1024, ViewROI::IntrinsicImportance`
 _symbol_line_regex = re.compile(
    r"\(\w+\) (?P<type>\S+): \[(?P<section>\w{4}):(?P<offset>\w{8})\], Cb: (?P<size>\w+), Type:\s+\S+, (?P<name>.+)"
 )
 # e.g. `         Debug start: 00000008, Debug end: 0000016E`
 _gproc_debug_regex = re.compile(
    r"\s*Debug start: (?P<start>\w{8}), Debug end: (?P<end>\w{8})"
 )
 # e.g. `  00DA  0001:00000000  00000073  60501020`
 _section_contrib_regex = re.compile(
    r"\s*(?P<module>\w{4})  (?P<section>\w{4}):(?P<offset>\w{8})  (?P<size>\w{8})  (?P<flags>\w{8})"
 )
 # e.g. `S_GDATA32: [0003:000004A4], Type:   T_32PRCHAR(0470), g_set`
 _gdata32_regex = re.compile(
    r"S_GDATA32: \[(?P<section>\w{4}):(?P<offset>\w{8})\], Type:\s*(?P<type>\S+), (?P<name>.+)"
 )
 # e.g. 0003 "CMakeFiles/isle.dir/ISLE/res/isle.rc.res"
 # e.g. 0004 "C:\work\lego-island\isle\3rdparty\smartheap\SHLW32MT.LIB" "check.obj"
 _module_regex = re.compile(r"(?P<id>\w{4})(?: \"(?P<lib>.+?)\")?(?: \"(?P<obj>.+?)\")")
 # User functions only
 LinesEntry = namedtuple("LinesEntry", "filename line_no section offset")
 # Strings, vtables, functions
 # superset of everything else
 # only place you can find the C symbols (library functions, smacker, etc)
 PublicsEntry = namedtuple("PublicsEntry", "type section offset flags name")
 # S_GPROC32 = functions
 SymbolsEntry = namedtuple("SymbolsEntry", "type section offset size name")
 # (Estimated) size of any symbol
 SizeRefEntry = namedtuple("SizeRefEntry", "module section offset size")
 # global variables
 GdataEntry = namedtuple("GdataEntry", "section offset type name")
 ModuleEntry = namedtuple("ModuleEntry", "id lib obj")
 class CvdumpParser:
    # pylint: disable=too-many-instance-attributes
    def __init__(self) -> None:
        self._section: str = ""
        self._lines_function: Tuple[str, int] = ("", 0)
        self.lines = {}
        self.publics = []
        self.symbols = []
        self.sizerefs = []
        self.globals = []
        self.modules = []
        self.types = CvdumpTypesParser()
    def _lines_section(self, line: str):
        """Parsing entries from the LINES section. We only care about the pairs of
        line_number and address and the subsection header to indicate which code file
        we are in."""
        # Subheader indicates a new function and possibly a new code filename.
        # Save the section here because it is not given on the lines that follow.
        if (match := _lines_subsection_header.match(line)) is not None:
            self._lines_function = (
                match.group("filename"),
                int(match.group("section"), 16),
            )
            return
        # Match any pairs as we find them
        for line_no, offset in _line_addr_pairs_findall.findall(line):
            key = (self._lines_function[1], int(offset, 16))
            self.lines[key] = (self._lines_function[0], int(line_no))
    def _publics_section(self, line: str):
        """Match each line from PUBLICS and pull out the symbol information.
        These are MSVC mangled symbol names. String constants and vtable
        addresses can only be found here."""
        if (match := _publics_line_regex.match(line)) is not None:
            self.publics.append(
                PublicsEntry(
                    type=match.group("type"),
                    section=int(match.group("section"), 16),
                    offset=int(match.group("offset"), 16),
                    flags=int(match.group("flags"), 16),
                    name=match.group("name"),
                )
            )
    def _globals_section(self, line: str):
        """S_PROCREF may be useful later.
        Right now we just want S_GDATA32 symbols because it is the simplest
        way to access global variables."""
        if (match := _gdata32_regex.match(line)) is not None:
            self.globals.append(
                GdataEntry(
                    section=int(match.group("section"), 16),
                    offset=int(match.group("offset"), 16),
                    type=match.group("type"),
                    name=match.group("name"),
                )
            )
    def _symbols_section(self, line: str):
        """We are interested in S_GPROC32 symbols only."""
        if (match := _symbol_line_regex.match(line)) is not None:
            if match.group("type") == "S_GPROC32":
                self.symbols.append(
                    SymbolsEntry(
                        type=match.group("type"),
                        section=int(match.group("section"), 16),
                        offset=int(match.group("offset"), 16),
                        size=int(match.group("size"), 16),
                        name=match.group("name"),
                    )
                )
    def _section_contributions(self, line: str):
        """Gives the size of elements across all sections of the binary.
        This is the easiest way to get the data size for .data and .rdata
        members that do not have a primitive data type."""
        if (match := _section_contrib_regex.match(line)) is not None:
            self.sizerefs.append(
                SizeRefEntry(
                    module=int(match.group("module"), 16),
                    section=int(match.group("section"), 16),
                    offset=int(match.group("offset"), 16),
                    size=int(match.group("size"), 16),
                )
            )
    def _modules_section(self, line: str):
        """Record the object file (and lib file, if used) linked into the binary.
        The auto-incrementing id is cross-referenced in SECTION CONTRIBUTIONS
        (and perhaps other locations)"""
        if (match := _module_regex.match(line)) is not None:
            self.modules.append(
                ModuleEntry(
                    id=int(match.group("id"), 16),
                    lib=match.group("lib"),
                    obj=match.group("obj"),
                )
            )
    def read_line(self, line: str):
        if (match := _section_change_regex.match(line)) is not None:
            self._section = match.group(1)
            return
        if self._section == "TYPES":
            self.types.read_line(line)
        elif self._section == "SYMBOLS":
            self._symbols_section(line)
        elif self._section == "LINES":
            self._lines_section(line)
        elif self._section == "PUBLICS":
            self._publics_section(line)
        elif self._section == "SECTION CONTRIBUTIONS":
            self._section_contributions(line)
        elif self._section == "GLOBALS":
            self._globals_section(line)
        elif self._section == "MODULES":
            self._modules_section(line)
    def read_lines(self, lines: Iterable[str]):
        for line in lines:
            self.read_line(line)
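The parser is a small state machine: a `***` header switches the current section, and each following line is handed to that section's regex. The sketch below shows this for PUBLICS only. The two regexes and the sample lines are copied from the comments in `parser.py`; `parse_publics` is a hypothetical helper, and the sample is not real cvdump output beyond those lines.

```python
import re

section_change = re.compile(r"\*\*\* (?P<section>[A-Z/ ]{2,})")
publics_line = re.compile(
    r"^(?P<type>\w+): \[(?P<section>\w{4}):(?P<offset>\w{8})], "
    r"Flags: (?P<flags>\w{8}), (?P<name>\S+)"
)

sample = """*** PUBLICS
S_PUB32: [0001:0003FF60], Flags: 00000000, __read
*** GLOBALS
S_GDATA32: [0003:000004A4], Type:   T_32PRCHAR(0470), g_set
"""


def parse_publics(text: str):
    """Return (section, offset, name) for each symbol in the PUBLICS section."""
    current = ""
    found = []
    for line in text.splitlines():
        if (match := section_change.match(line)) is not None:
            current = match.group("section")
            continue
        if current != "PUBLICS":
            continue
        if (match := publics_line.match(line)) is not None:
            found.append(
                (
                    int(match.group("section"), 16),
                    int(match.group("offset"), 16),
                    match.group("name"),
                )
            )
    return found


print(parse_publics(sample))
```

Lines from other sections are skipped here; the real `read_line` dispatches them to their own handlers instead.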
--- a/tools/isledecomp/isledecomp/cvdump/runner.py
+++ b/tools/isledecomp/isledecomp/cvdump/runner.py
@ -1,83 +0,0 @@
 import io
 from os import name as os_name
 from enum import Enum
 from typing import List
 import subprocess
 from isledecomp.lib import lib_path_join
 from isledecomp.dir import winepath_unix_to_win
 from .parser import CvdumpParser
 class DumpOpt(Enum):
    LINES = 0
    SYMBOLS = 1
    GLOBALS = 2
    PUBLICS = 3
    SECTION_CONTRIB = 4
    MODULES = 5
    TYPES = 6
 cvdump_opt_map = {
    DumpOpt.LINES: "-l",
    DumpOpt.SYMBOLS: "-s",
    DumpOpt.GLOBALS: "-g",
    DumpOpt.PUBLICS: "-p",
    DumpOpt.SECTION_CONTRIB: "-seccontrib",
    DumpOpt.MODULES: "-m",
    DumpOpt.TYPES: "-t",
 }
 class Cvdump:
    def __init__(self, pdb: str) -> None:
        self._pdb: str = pdb
        self._options = set()
    def lines(self):
        self._options.add(DumpOpt.LINES)
        return self
    def symbols(self):
        self._options.add(DumpOpt.SYMBOLS)
        return self
    def globals(self):
        self._options.add(DumpOpt.GLOBALS)
        return self
    def publics(self):
        self._options.add(DumpOpt.PUBLICS)
        return self
    def section_contributions(self):
        self._options.add(DumpOpt.SECTION_CONTRIB)
        return self
    def modules(self):
        self._options.add(DumpOpt.MODULES)
        return self
    def types(self):
        self._options.add(DumpOpt.TYPES)
        return self
    def cmd_line(self) -> List[str]:
        cvdump_exe = lib_path_join("cvdump.exe")
        flags = [cvdump_opt_map[opt] for opt in self._options]
        if os_name == "nt":
            return [cvdump_exe, *flags, self._pdb]
        return ["wine", cvdump_exe, *flags, winepath_unix_to_win(self._pdb)]
    def run(self) -> CvdumpParser:
        parser = CvdumpParser()
        call = self.cmd_line()
        with subprocess.Popen(call, stdout=subprocess.PIPE) as proc:
            for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
                # Blank lines are there to help the reader; they carry no meaning for the parser
                if line != "\n":
                    parser.read_line(line)
        return parser
--- a/tools/isledecomp/isledecomp/cvdump/types.py
+++ b/tools/isledecomp/isledecomp/cvdump/types.py
@ -1,453 +0,0 @@
 import re
 from typing import Dict, List, NamedTuple, Optional
 class CvdumpTypeError(Exception):
    pass
 class CvdumpKeyError(KeyError):
    pass
 class CvdumpIntegrityError(Exception):
    pass
 class FieldListItem(NamedTuple):
    """Member of a class or structure"""
    offset: int
    name: str
    type: str
 class ScalarType(NamedTuple):
    offset: int
    name: Optional[str]
    type: str
    @property
    def size(self) -> int:
        return scalar_type_size(self.type)
    @property
    def format_char(self) -> str:
        return scalar_type_format_char(self.type)
    @property
    def is_pointer(self) -> bool:
        return scalar_type_pointer(self.type)
 class TypeInfo(NamedTuple):
    key: str
    size: int
    name: Optional[str] = None
    members: Optional[List[FieldListItem]] = None
    def is_scalar(self) -> bool:
        # TODO: distinction between a class with zero members and no vtable?
        return self.members is None
 def normalize_type_id(key: str) -> str:
    """Helper for TYPES parsing to ensure a consistent format.
    If key begins with "T_" it is a built-in type.
    Else it is a hex string. We prefer lower case letters and
    no leading zeroes. (UDT identifier pads to 8 characters.)"""
    if key[0] == "0":
        return f"0x{key[-4:].lower()}"
    # Remove numeric value for "T_" type. We don't use this.
    return key.partition("(")[0]
 def scalar_type_pointer(type_name: str) -> bool:
    return type_name.startswith("T_32P")
 def scalar_type_size(type_name: str) -> int:
    if scalar_type_pointer(type_name):
        return 4
    if "CHAR" in type_name:
        return 2 if "WCHAR" in type_name else 1
    if "SHORT" in type_name:
        return 2
    if "QUAD" in type_name or "64" in type_name:
        return 8
    return 4
 def scalar_type_signed(type_name: str) -> bool:
    if scalar_type_pointer(type_name):
        return False
    # According to cvinfo.h, T_WCHAR is unsigned
    return not type_name.startswith("T_U") and not type_name.startswith("T_W")
 def scalar_type_format_char(type_name: str) -> str:
    if scalar_type_pointer(type_name):
        return "L"
    # "Really a char"
    if type_name.startswith("T_RCHAR"):
        return "c"
    # floats
    if type_name.startswith("T_REAL"):
        return "d" if "64" in type_name else "f"
    size = scalar_type_size(type_name)
    char = ({1: "b", 2: "h", 4: "l", 8: "q"}).get(size, "l")
    return char if scalar_type_signed(type_name) else char.upper()
 def member_list_to_struct_string(members: List[ScalarType]) -> str:
    """Create a string for use with struct.unpack"""
    format_string = "".join(m.format_char for m in members)
    if len(format_string) > 0:
        return "<" + format_string
    return ""
 def join_member_names(parent: str, child: Optional[str]) -> str:
    """Helper method to combine parent/child member names.
    Child member name is None if the child is a scalar type."""
    if child is None:
        return parent
    # If the child is an array index, join without the dot
    if child.startswith("["):
        return f"{parent}{child}"
    return f"{parent}.{child}"
 class CvdumpTypesParser:
    """Parser for cvdump output, TYPES section.
    Tricky enough that it demands its own parser."""
    # Marks the start of a new type
    INDEX_RE = re.compile(r"(?P<key>0x\w+) : .* (?P<type>LF_\w+)")
    # LF_FIELDLIST class/struct member (1/2)
    LIST_RE = re.compile(
        r"\s+list\[\d+\] = LF_MEMBER, (?P<scope>\w+), type = (?P<type>.*), offset = (?P<offset>\d+)"
    )
    # LF_FIELDLIST vtable indicator
    VTABLE_RE = re.compile(r"^\s+list\[\d+\] = LF_VFUNCTAB")
    # LF_FIELDLIST superclass indicator
    SUPERCLASS_RE = re.compile(
        r"^\s+list\[\d+\] = LF_BCLASS, (?P<scope>\w+), type = (?P<type>.*), offset = (?P<offset>\d+)"
    )
    # LF_FIELDLIST member name (2/2)
    MEMBER_RE = re.compile(r"^\s+member name = '(?P<name>.*)'$")
    # LF_ARRAY element type
    ARRAY_ELEMENT_RE = re.compile(r"^\s+Element type = (?P<type>.*)")
    # LF_ARRAY total array size
    ARRAY_LENGTH_RE = re.compile(r"^\s+length = (?P<length>\d+)")
    # LF_CLASS/LF_STRUCTURE field list reference
    CLASS_FIELD_RE = re.compile(
        r"^\s+# members = \d+,  field list type (?P<field_type>0x\w+),"
    )
    # LF_CLASS/LF_STRUCTURE name and other info
    CLASS_NAME_RE = re.compile(
        r"^\s+Size = (?P<size>\d+), class name = (?P<name>.+), UDT\((?P<udt>0x\w+)\)"
    )
    # LF_MODIFIER, type being modified
    MODIFIES_RE = re.compile(r".*modifies type (?P<type>.*)$")
    MODES_OF_INTEREST = {
        "LF_ARRAY",
        "LF_CLASS",
        "LF_ENUM",
        "LF_FIELDLIST",
        "LF_MODIFIER",
        "LF_POINTER",
        "LF_STRUCTURE",
    }
    def __init__(self) -> None:
        self.mode: Optional[str] = None
        self.last_key = ""
        self.keys = {}
    def _new_type(self):
        """Prepare a new dict for the type we just parsed.
        The id is self.last_key and the "type" of type is self.mode.
        e.g. LF_CLASS"""
        self.keys[self.last_key] = {"type": self.mode}
    def _set(self, key: str, value):
        self.keys[self.last_key][key] = value
    def _add_member(self, offset: int, type_: str):
        obj = self.keys[self.last_key]
        if "members" not in obj:
            obj["members"] = []
        obj["members"].append({"offset": offset, "type": type_})
    def _set_member_name(self, name: str):
        """Set name for most recently added member."""
        obj = self.keys[self.last_key]
        obj["members"][-1]["name"] = name
    def _get_field_list(self, type_obj: Dict) -> List[FieldListItem]:
        """Return the field list for the given LF_CLASS/LF_STRUCTURE reference"""
        if type_obj.get("type") == "LF_FIELDLIST":
            field_obj = type_obj
        else:
            field_list_type = type_obj.get("field_list_type")
            field_obj = self.keys[field_list_type]
        members: List[FieldListItem] = []
        super_id = field_obj.get("super")
        if super_id is not None:
            # May need to resolve forward ref.
            superclass = self.get(super_id)
            if superclass.members is not None:
                members = superclass.members
        raw_members = field_obj.get("members", [])
        members += [
            FieldListItem(
                offset=m["offset"],
                type=m["type"],
                name=m["name"],
            )
            for m in raw_members
        ]
        return sorted(members, key=lambda m: m.offset)
    def _mock_array_members(self, type_obj: Dict) -> List[FieldListItem]:
        """LF_ARRAY elements provide the element type and the total size.
        We want the list of "members" as if this was a struct."""
        if type_obj.get("type") != "LF_ARRAY":
            raise CvdumpTypeError("Type is not an LF_ARRAY")
        array_type = type_obj.get("array_type")
        if array_type is None:
            raise CvdumpIntegrityError("No array element type")
        array_element_size = self.get(array_type).size
        n_elements = type_obj["size"] // array_element_size
        return [
            FieldListItem(
                offset=i * array_element_size,
                type=array_type,
                name=f"[{i}]",
            )
            for i in range(n_elements)
        ]
    def get(self, type_key: str) -> TypeInfo:
        """Convert our dictionary values read from the cvdump output
        into a consistent format for the given type."""
        # Scalar type. Handled here because it makes the recursive steps
        # much simpler.
        if type_key.startswith("T_"):
            size = scalar_type_size(type_key)
            return TypeInfo(
                key=type_key,
                size=size,
            )
        # Go to our dictionary to find it.
        obj = self.keys.get(type_key.lower())
        if obj is None:
            raise CvdumpKeyError(type_key)
        # These type references are just a wrapper around a scalar
        if obj.get("type") == "LF_ENUM":
            return self.get("T_INT4")
        if obj.get("type") == "LF_POINTER":
            return self.get("T_32PVOID")
        if obj.get("is_forward_ref", False):
            # Get the forward reference to follow.
            # If this is LF_CLASS/LF_STRUCTURE, it is the UDT value.
            # For LF_MODIFIER, it is the type being modified.
            forward_ref = obj.get("udt", None) or obj.get("modifies", None)
            if forward_ref is None:
                raise CvdumpIntegrityError(f"Null forward ref for type {type_key}")
            return self.get(forward_ref)
        # Else it is not a forward reference, so build out the object here.
        if obj.get("type") == "LF_ARRAY":
            members = self._mock_array_members(obj)
        else:
            members = self._get_field_list(obj)
        return TypeInfo(
            key=type_key,
            size=obj.get("size"),
            name=obj.get("name"),
            members=members,
        )
    def get_by_name(self, name: str) -> TypeInfo:
        """Find the complex type with the given name."""
        # TODO
        raise NotImplementedError
    def get_scalars(self, type_key: str) -> List[ScalarType]:
        """Reduce the given type to a list of scalars so we can
        compare each component value."""
        obj = self.get(type_key)
        if obj.is_scalar():
            # Use obj.key here for alias types like LF_POINTER
            return [ScalarType(offset=0, type=obj.key, name=None)]
        # Narrow the Optional type for mypy
        assert obj.members is not None
        # Dedupe repeated offsets if this is a union type
        unique_offsets = {m.offset: m for m in obj.members}
        unique_members = [m for _, m in unique_offsets.items()]
        return [
            ScalarType(
                offset=m.offset + cm.offset,
                type=cm.type,
                name=join_member_names(m.name, cm.name),
            )
            for m in unique_members
            for cm in self.get_scalars(m.type)
        ]
    def get_scalars_gapless(self, type_key: str) -> List[ScalarType]:
        """Reduce the given type to a list of scalars, filling any gap
        between members with single-byte "(padding)" entries."""
        obj = self.get(type_key)
        total_size = obj.size
        scalars = self.get_scalars(type_key)
        output = []
        last_extent = total_size
        # Walk the scalar list in reverse; we assume a gap could not
        # come at the start of the struct.
        for scalar in scalars[::-1]:
            this_extent = scalar.offset + scalar_type_size(scalar.type)
            size_diff = last_extent - this_extent
            # We need to add the gap fillers in reverse here
            for i in range(size_diff - 1, -1, -1):
                # Push to front
                output.insert(
                    0,
                    ScalarType(
                        offset=this_extent + i,
                        name="(padding)",
                        type="T_UCHAR",
                    ),
                )
            output.insert(0, scalar)
            last_extent = scalar.offset
        return output
    def get_format_string(self, type_key: str) -> str:
        members = self.get_scalars_gapless(type_key)
        return member_list_to_struct_string(members)
    def read_line(self, line: str):
        if (match := self.INDEX_RE.match(line)) is not None:
            type_ = match.group(2)
            if type_ not in self.MODES_OF_INTEREST:
                self.mode = None
                return
            # Don't need to normalize, it's already in the format we want
            self.last_key = match.group(1)
            self.mode = type_
            self._new_type()
            return
        if self.mode is None:
            return
        if self.mode == "LF_MODIFIER":
            if (match := self.MODIFIES_RE.match(line)) is not None:
                # For convenience, because this is essentially the same thing
                # as an LF_CLASS forward ref.
                self._set("is_forward_ref", True)
                self._set("modifies", normalize_type_id(match.group("type")))
        elif self.mode == "LF_ARRAY":
            if (match := self.ARRAY_ELEMENT_RE.match(line)) is not None:
                self._set("array_type", normalize_type_id(match.group("type")))
            elif (match := self.ARRAY_LENGTH_RE.match(line)) is not None:
                self._set("size", int(match.group("length")))
        elif self.mode == "LF_FIELDLIST":
            # If this class has a vtable, create a mock member at offset 0
            if (match := self.VTABLE_RE.match(line)) is not None:
                # For our purposes, any pointer type will do
                self._add_member(0, "T_32PVOID")
                self._set_member_name("vftable")
            # Superclass is set here in the fieldlist rather than in LF_CLASS
            elif (match := self.SUPERCLASS_RE.match(line)) is not None:
                self._set("super", normalize_type_id(match.group("type")))
            # Member offset and type given on the first of two lines.
            elif (match := self.LIST_RE.match(line)) is not None:
                self._add_member(
                    int(match.group("offset")), normalize_type_id(match.group("type"))
                )
            # Name of the member read on the second of two lines.
            elif (match := self.MEMBER_RE.match(line)) is not None:
                self._set_member_name(match.group("name"))
        else:  # LF_CLASS or LF_STRUCTURE
            # Match the reference to the associated LF_FIELDLIST
            if (match := self.CLASS_FIELD_RE.match(line)) is not None:
                if match.group("field_type") == "0x0000":
                    # Not redundant. UDT might not match the key.
                    # These cases get reported as UDT mismatch.
                    self._set("is_forward_ref", True)
                else:
                    field_list_type = normalize_type_id(match.group("field_type"))
                    self._set("field_list_type", field_list_type)
            # Last line has the vital information.
            # If this is a FORWARD REF, we need to follow the UDT pointer
            # to get the actual class details.
            elif (match := self.CLASS_NAME_RE.match(line)) is not None:
                self._set("name", match.group("name"))
                self._set("udt", normalize_type_id(match.group("udt")))
                self._set("size", int(match.group("size")))
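Review note: the gap-filling walk in `get_scalars_gapless` and the format string built by `member_list_to_struct_string` can be sketched in isolation as follows. This is a minimal sketch, not the removed implementation: the `(offset, size, format_char)` tuples are a hypothetical stand-in for `ScalarType`, and `fill_gaps` is an illustrative name.

```python
import struct
from typing import List, Tuple

# Hypothetical stand-in for ScalarType: (offset, size in bytes, struct format char)
Scalar = Tuple[int, int, str]


def fill_gaps(scalars: List[Scalar], total_size: int) -> List[Scalar]:
    """Walk the members in reverse and insert one unsigned byte ("B")
    for every byte not covered by a member."""
    output: List[Scalar] = []
    last_extent = total_size
    for offset, size, char in reversed(scalars):
        this_extent = offset + size
        padding = [(this_extent + i, 1, "B") for i in range(last_extent - this_extent)]
        output = [(offset, size, char)] + padding + output
        last_extent = offset
    return output


# Assumed layout: a signed char at offset 0 and a 4-byte int at offset 4 (8 bytes total).
members = [(0, 1, "b"), (4, 4, "l")]
gapless = fill_gaps(members, 8)

# "<" selects little-endian with standard sizes and no implicit alignment.
format_string = "<" + "".join(char for _, _, char in gapless)
values = struct.unpack(format_string, bytes([0x01, 0xCC, 0xCC, 0xCC, 0x2A, 0, 0, 0]))
```

Because every byte of the type is accounted for, `struct.calcsize(format_string)` equals the total size of the type, which is what makes the raw bytes directly unpackable.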
--- a/tools/isledecomp/isledecomp/dir.py
+++ b/tools/isledecomp/isledecomp/dir.py
@ -1,103 +0,0 @@
 import os
 import subprocess
 import sys
 import pathlib
 from typing import Iterator
 def winepath_win_to_unix(path: str) -> str:
    return subprocess.check_output(["winepath", path], text=True).strip()
 def winepath_unix_to_win(path: str) -> str:
    return subprocess.check_output(["winepath", "-w", path], text=True).strip()
 class PathResolver:
    """Intended to resolve Windows/Wine paths used in the PDB (cvdump) output
    into a "canonical" format to be matched against code file paths from os.walk.
    MSVC may include files from the parent dir using `..`. We eliminate those and create
    an absolute path so that information about the same file under different names
    will be combined into the same record. (i.e. line_no/addr pairs from LINES section.)
    """
    def __init__(self, basedir) -> None:
        """basedir is the root path of the code directory in the format for your OS.
        We will convert it to a PureWindowsPath to be platform-independent
        and match that to the paths from the PDB."""
        # Memoize the converted paths. We will need to do this for each path
        # in the PDB, for each function in that file. (i.e. lots of repeated work)
        self._memo = {}
        # Convert basedir to an absolute path if it is not already.
        # If it is not absolute, we cannot do the path swap on unix.
        self._realdir = pathlib.Path(basedir).resolve()
        self._is_unix = os.name != "nt"
        if self._is_unix:
            self._basedir = pathlib.PureWindowsPath(
                winepath_unix_to_win(str(self._realdir))
            )
        else:
            self._basedir = self._realdir
    def _memo_wrapper(self, path_str: str) -> str:
        """Wrapper so we can memoize from the public caller method"""
        path = pathlib.PureWindowsPath(path_str)
        if not path.is_absolute():
            # pathlib syntactic sugar for path concat
            path = self._basedir / path
        if self._is_unix:
            # If the given path is relative to the basedir, deconstruct the path
            # and swap in our unix path to avoid an expensive call to winepath.
            try:
                # Will raise ValueError if we are not relative to the base.
                section = path.relative_to(self._basedir)
                # Should combine to pathlib.PosixPath
                mockpath = (self._realdir / section).resolve()
                if mockpath.is_file():
                    return str(mockpath)
            except ValueError:
                pass
            # We are not relative to the basedir, or our path swap attempt
            # did not point at an actual file. Either way, we are forced
            # to call winepath using our original path.
            return winepath_win_to_unix(str(path))
        # We must be on Windows. Convert back to WindowsPath.
        # The resolve() call will eliminate intermediate backdir references.
        return str(pathlib.Path(path).resolve())
    def resolve_cvdump(self, path_str: str) -> str:
        """path_str is in Windows/Wine path format.
        We will return a path in the format for the host OS."""
        if path_str not in self._memo:
            self._memo[path_str] = self._memo_wrapper(path_str)
        return self._memo[path_str]
 def is_file_cpp(filename: str) -> bool:
    (_, ext) = os.path.splitext(filename)
    return ext.lower() in (".h", ".cpp")
 def walk_source_dir(source: str, recursive: bool = True) -> Iterator[str]:
    """Generator to walk the given directory (recursively by default)
    and yield any C++ files found."""
    source = os.path.abspath(source)
    for subdir, _, files in os.walk(source):
        for file in files:
            if is_file_cpp(file):
                yield os.path.join(subdir, file)
        if not recursive:
            break
 def get_file_in_script_dir(fn):
    return os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])), fn)
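Review note: the base-directory swap that `PathResolver` uses to avoid calling `winepath` can be illustrated with pure paths only. This is a sketch under assumptions: `swap_base` is a hypothetical helper, the `Z:\home\user\isle` mapping is an invented example, and the `is_file()` check and `winepath` fallback from the removed code are left out.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional

# Assumed example mapping between the Wine view and the host view of the code directory.
WIN_BASE = PureWindowsPath(r"Z:\home\user\isle")
UNIX_BASE = PurePosixPath("/home/user/isle")


def swap_base(path_str: str) -> Optional[str]:
    """Return the host path if path_str lies under WIN_BASE, else None."""
    path = PureWindowsPath(path_str)
    try:
        # Raises ValueError if the path is not under the base directory.
        section = path.relative_to(WIN_BASE)
    except ValueError:
        return None
    # Rebuild from the individual parts so that no backslashes survive.
    return str(UNIX_BASE.joinpath(*section.parts))


inside = swap_base(r"Z:\home\user\isle\LEGO1\legoworld.cpp")
outside = swap_base(r"C:\msvc\include\stdio.h")
```

A `None` result corresponds to the case where the removed code falls back to `winepath`.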
--- a/tools/isledecomp/isledecomp/lib/DUMPBIN.EXE
+++ b/tools/isledecomp/isledecomp/lib/DUMPBIN.EXE
--- a/tools/isledecomp/isledecomp/lib/LINK.EXE
+++ b/tools/isledecomp/isledecomp/lib/LINK.EXE
--- a/tools/isledecomp/isledecomp/lib/MSPDB41.DLL
+++ b/tools/isledecomp/isledecomp/lib/MSPDB41.DLL
--- a/tools/isledecomp/isledecomp/lib/__init__.py
+++ b/tools/isledecomp/isledecomp/lib/__init__.py
@ -1,13 +0,0 @@
 """Provides a reference point for redistributed tools found in this directory.
 This allows you to get the path for these tools from a script run anywhere."""
 from os.path import join, dirname
 def lib_path() -> str:
    """Returns the directory for this module."""
    return dirname(__file__)
 def lib_path_join(name: str) -> str:
    """Convenience wrapper for os.path.join."""
    return join(lib_path(), name)
--- a/tools/isledecomp/isledecomp/lib/cvdump.exe
+++ b/tools/isledecomp/isledecomp/lib/cvdump.exe
--- a/tools/isledecomp/isledecomp/parser/__init__.py
+++ b/tools/isledecomp/isledecomp/parser/__init__.py
@ -1,3 +0,0 @@
 from .codebase import DecompCodebase
 from .parser import DecompParser
 from .linter import DecompLinter
--- a/tools/isledecomp/isledecomp/parser/codebase.py
+++ b/tools/isledecomp/isledecomp/parser/codebase.py
@ -1,57 +0,0 @@
 """For aggregating decomp markers read from an entire directory and for a single module."""
 from typing import Callable, Iterable, Iterator, List
 from .parser import DecompParser
 from .node import (
    ParserSymbol,
    ParserFunction,
    ParserVtable,
    ParserVariable,
    ParserString,
 )
 class DecompCodebase:
    def __init__(self, filenames: Iterable[str], module: str) -> None:
        self._symbols: List[ParserSymbol] = []
        parser = DecompParser()
        for filename in filenames:
            parser.reset()
            with open(filename, "r", encoding="utf-8") as f:
                parser.read_lines(f)
            for sym in parser.iter_symbols(module):
                sym.filename = filename
                self._symbols.append(sym)
    def prune_invalid_addrs(self, is_valid: Callable[[int], bool]) -> List[ParserSymbol]:
        """Some decomp annotations might have an invalid address.
        Return the list of symbols that fail the is_valid check,
        and remove those from our list of symbols."""
        invalid_symbols = [sym for sym in self._symbols if not is_valid(sym.offset)]
        self._symbols = [sym for sym in self._symbols if is_valid(sym.offset)]
        return invalid_symbols
    def iter_line_functions(self) -> Iterator[ParserFunction]:
        """Return lineref functions separately from nameref. Assuming the PDB matches
        the state of the source code, a line reference is a guaranteed match, even if
        multiple functions share the same name. (i.e. polymorphism)"""
        return filter(
            lambda s: isinstance(s, ParserFunction) and not s.is_nameref(),
            self._symbols,
        )
    def iter_name_functions(self) -> Iterator[ParserFunction]:
        return filter(
            lambda s: isinstance(s, ParserFunction) and s.is_nameref(), self._symbols
        )
    def iter_vtables(self) -> Iterator[ParserVtable]:
        return filter(lambda s: isinstance(s, ParserVtable), self._symbols)
    def iter_variables(self) -> Iterator[ParserVariable]:
        return filter(lambda s: isinstance(s, ParserVariable), self._symbols)
    def iter_strings(self) -> Iterator[ParserString]:
        return filter(lambda s: isinstance(s, ParserString), self._symbols)
--- a/tools/isledecomp/isledecomp/parser/error.py
+++ b/tools/isledecomp/isledecomp/parser/error.py
@ -1,97 +0,0 @@
 from enum import Enum
 from typing import Optional
 from dataclasses import dataclass
 # TODO: poorly chosen name, should be AlertType or AlertCode or something
 class ParserError(Enum):
    # WARN: Stub function exceeds some line number threshold
    UNLIKELY_STUB = 100
    # WARN: Decomp marker is close enough to be recognized, but does not follow syntax exactly
    BAD_DECOMP_MARKER = 101
    # WARN: Multiple markers in sequence do not have distinct modules
    DUPLICATE_MODULE = 102
    # WARN: Detected a duplicate module/offset pair in the current file
    DUPLICATE_OFFSET = 103
    # WARN: We read a line that matches the decomp marker pattern, but we are not set up
    # to handle it
    BOGUS_MARKER = 104
    # WARN: New function marker appeared while we were inside a function
    MISSED_END_OF_FUNCTION = 105
    # WARN: If we find a curly brace right after the function declaration
    # this is wrong but we still have enough to make a match with reccmp
    MISSED_START_OF_FUNCTION = 106
    # WARN: A blank line appeared between the end of FUNCTION markers
    # and the start of the function. We can ignore it, but the line shouldn't be there
    UNEXPECTED_BLANK_LINE = 107
    # WARN: We called the finish() method for the parser but had not reached the starting
    # state of SEARCH
    UNEXPECTED_END_OF_FILE = 108
    # WARN: We found a marker to be referenced by name outside of a header file.
    BYNAME_FUNCTION_IN_CPP = 109
    # WARN: A GLOBAL marker appeared over a variable without the g_ prefix
    GLOBAL_MISSING_PREFIX = 110
    # WARN: GLOBAL marker points at something other than variable declaration.
    # We can't match global variables based on position, but the goal here is
    # to ignore things like string literals that are not variables.
    GLOBAL_NOT_VARIABLE = 111
    # WARN: A marked static variable inside a function needs to have its
    # function marked too, and in the same module.
    ORPHANED_STATIC_VARIABLE = 112
    # This code or higher is an error, not a warning
    DECOMP_ERROR_START = 200
    # ERROR: We found a marker unexpectedly
    UNEXPECTED_MARKER = 200
    # ERROR: We found a marker where we expected to find one, but it is incompatible
    # with the preceding markers.
    # For example, a GLOBAL cannot follow FUNCTION/STUB
    INCOMPATIBLE_MARKER = 201
    # ERROR: The line following an explicit by-name marker was not a comment
    # We assume a syntax error here rather than try to use the next line
    BAD_NAMEREF = 202
    # ERROR: This function offset comes before the previous offset from the same module
    # This hopefully gives some hint about which functions need to be rearranged.
    FUNCTION_OUT_OF_ORDER = 203
    # ERROR: The line following an explicit by-name marker that does _not_ expect
    # a comment -- i.e. VTABLE or GLOBAL -- could not extract the name
    NO_SUITABLE_NAME = 204
    # ERROR: Two STRING markers have the same module and offset, but the strings
    # they annotate are different.
    WRONG_STRING = 205
    # ERROR: This lineref FUNCTION marker is next to a function declaration or
    # forward reference. The correct place for the marker is where the function
    # is implemented so we can match with the PDB.
    NO_IMPLEMENTATION = 206
@dataclass
 class ParserAlert:
    code: ParserError
    line_number: int
    line: Optional[str] = None
    def is_warning(self) -> bool:
        return self.code.value < ParserError.DECOMP_ERROR_START.value
    def is_error(self) -> bool:
        return self.code.value >= ParserError.DECOMP_ERROR_START.value
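Review note: because `DECOMP_ERROR_START` and `UNEXPECTED_MARKER` share the value 200, Python's `Enum` treats the second name as an alias of the first. The sketch below uses a reduced, hypothetical copy of the enum (`AlertCode`) to show what that means for the warning/error threshold and for iteration.

```python
from enum import Enum


class AlertCode(Enum):
    """Hypothetical reduced copy of ParserError, for illustration only."""

    BAD_DECOMP_MARKER = 101
    # This code or higher is an error, not a warning
    DECOMP_ERROR_START = 200
    UNEXPECTED_MARKER = 200  # same value, so this becomes an alias
    INCOMPATIBLE_MARKER = 201


def is_error(code: AlertCode) -> bool:
    return code.value >= AlertCode.DECOMP_ERROR_START.value


# The alias resolves to the first member defined with that value.
alias_name = AlertCode.UNEXPECTED_MARKER.name
# Iterating over an Enum skips aliases.
member_names = [code.name for code in AlertCode]
```

The threshold comparison is unaffected, but any report that prints the member name of an `UNEXPECTED_MARKER` alert would show `DECOMP_ERROR_START`.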
--- a/tools/isledecomp/isledecomp/parser/linter.py
+++ b/tools/isledecomp/isledecomp/parser/linter.py
@ -1,144 +0,0 @@
 from typing import List, Optional
 from .parser import DecompParser
 from .error import ParserAlert, ParserError
 from .node import ParserSymbol, ParserString
 def get_checkorder_filter(module):
    """Return a filter function on implemented functions in the given module"""
    return lambda fun: fun.module == module and not fun.lookup_by_name
 class DecompLinter:
    def __init__(self) -> None:
        self.alerts: List[ParserAlert] = []
        self._parser = DecompParser()
        self._filename: str = ""
        self._module: Optional[str] = None
        # Set of (str, int) tuples for each module/offset pair seen while scanning.
        # This is _not_ reset between files and is intended to report offset reuse
        # when scanning the entire directory.
        self._offsets_used = set()
        # Keep track of strings we have seen. Persists across files.
        # Module/offset can be repeated for string markers but the strings must match.
        self._strings = {}
    def reset(self, full_reset: bool = False):
        self.alerts = []
        self._parser.reset()
        self._filename = ""
        self._module = None
        if full_reset:
            self._offsets_used.clear()
            self._strings = {}
    def file_is_header(self):
        return self._filename.lower().endswith(".h")
    def _load_offsets_from_list(self, marker_list: List[ParserSymbol]):
        """Helper for loading (module, offset) tuples while the DecompParser
        has them broken up into several different lists."""
        for marker in marker_list:
            is_string = isinstance(marker, ParserString)
            value = (marker.module, marker.offset)
            if value in self._offsets_used:
                if is_string:
                    if self._strings[value] != marker.name:
                        self.alerts.append(
                            ParserAlert(
                                code=ParserError.WRONG_STRING,
                                line_number=marker.line_number,
                                line=f"0x{marker.offset:08x}, {repr(self._strings[value])} vs. {repr(marker.name)}",
                            )
                        )
                else:
                    self.alerts.append(
                        ParserAlert(
                            code=ParserError.DUPLICATE_OFFSET,
                            line_number=marker.line_number,
                            line=f"0x{marker.offset:08x}",
                        )
                    )
            else:
                self._offsets_used.add(value)
                if is_string:
                    self._strings[value] = marker.name
    def _check_function_order(self):
        """Rules:
        1. Only markers that are implemented in the file are considered. This means we
        only look at markers that are cross-referenced with cvdump output by their line
        number. Markers with the lookup_by_name flag set are ignored because we cannot
        directly influence their order.
        2. Order should be considered for a single module only. If we have multiple
        markers for a single function (i.e. for LEGO1 functions linked statically to
        ISLE) then the virtual address space will be very different. If we don't check
        for one module only, we would incorrectly report that the file is out of order.
        """
        if self._module is None:
            return
        checkorder_filter = get_checkorder_filter(self._module)
        last_offset = None
        for fun in filter(checkorder_filter, self._parser.functions):
            if last_offset is not None:
                if fun.offset < last_offset:
                    self.alerts.append(
                        ParserAlert(
                            code=ParserError.FUNCTION_OUT_OF_ORDER,
                            line_number=fun.line_number,
                        )
                    )
            last_offset = fun.offset
    def _check_offset_uniqueness(self):
        self._load_offsets_from_list(self._parser.functions)
        self._load_offsets_from_list(self._parser.vtables)
        self._load_offsets_from_list(self._parser.variables)
        self._load_offsets_from_list(self._parser.strings)
    def _check_byname_allowed(self):
        if self.file_is_header():
            return
        for fun in self._parser.functions:
            if fun.lookup_by_name:
                self.alerts.append(
                    ParserAlert(
                        code=ParserError.BYNAME_FUNCTION_IN_CPP,
                        line_number=fun.line_number,
                    )
                )
    def check_lines(self, lines, filename, module=None):
        """`lines` is a generic iterable to allow for testing with a list of strings.
        We assume lines has the entire contents of the compilation unit."""
        self.reset(False)
        self._filename = filename
        self._module = module
        self._parser.read_lines(lines)
        self._parser.finish()
        self.alerts = self._parser.alerts[::]
        self._check_offset_uniqueness()
        if self._module is not None:
            self._check_byname_allowed()
            if not self.file_is_header():
                self._check_function_order()
        return len(self.alerts) == 0
    def check_file(self, filename, module=None):
        """Convenience method for decomplint cli tool"""
        with open(filename, "r", encoding="utf-8") as f:
            return self.check_lines(f, filename, module)
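Review note: the ordering rule applied by `_check_function_order` compares each function only against its immediate predecessor in the same module. A minimal sketch with plain integers follows; `find_out_of_order` and the sample addresses are hypothetical.

```python
from typing import List, Optional


def find_out_of_order(offsets: List[int]) -> List[int]:
    """Return the indices of offsets that are lower than the one before them."""
    flagged: List[int] = []
    last_offset: Optional[int] = None
    for index, offset in enumerate(offsets):
        if last_offset is not None and offset < last_offset:
            flagged.append(index)
        # Updated even after a failure, so one misplaced function
        # does not flag everything that follows it.
        last_offset = offset
    return flagged


flagged = find_out_of_order([0x1000, 0x1040, 0x1020, 0x1030])
```

Only index 2 is reported here: `0x1030` is compared against `0x1020`, not against the earlier `0x1040`.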
--- a/tools/isledecomp/isledecomp/parser/marker.py
+++ b/tools/isledecomp/isledecomp/parser/marker.py
@ -1,146 +0,0 @@
 import re
 from typing import Optional, Tuple
 from enum import Enum
 class MarkerCategory(Enum):
    """For the purposes of grouping multiple different DecompMarkers together,
    assign a rough "category" for the MarkerType values below.
    It's really only the function types that have to get folded down, but
    we'll do that in a structured way to permit future expansion."""
    FUNCTION = 1
    VARIABLE = 2
    STRING = 3
    VTABLE = 4
    ADDRESS = 100  # i.e. no comparison required or possible
 class MarkerType(Enum):
    UNKNOWN = -100
    FUNCTION = 1
    STUB = 2
    SYNTHETIC = 3
    TEMPLATE = 4
    GLOBAL = 5
    VTABLE = 6
    STRING = 7
    LIBRARY = 8
 markerRegex = re.compile(
    r"\s*//\s*(?P<type>\w+):\s*(?P<module>\w+)\s+(?P<offset>0x[a-f0-9]+) *(?P<extra>\S.+\S)?",
    flags=re.I,
 )
 markerExactRegex = re.compile(
    r"\s*// (?P<type>[A-Z]+): (?P<module>[A-Z0-9]+) (?P<offset>0x[a-f0-9]+)(?: (?P<extra>\S.+\S))?\n?$"
 )
 class DecompMarker:
    def __init__(
        self, marker_type: str, module: str, offset: int, extra: Optional[str] = None
    ) -> None:
        try:
            self._type = MarkerType[marker_type.upper()]
        except KeyError:
            self._type = MarkerType.UNKNOWN
        # Convert to upper here. A lot of other analysis depends on this name
        # being consistent and predictable. If the name is _not_ capitalized
        # we will emit a syntax error.
        self._module: str = module.upper()
        self._offset: int = offset
        self._extra: Optional[str] = extra
    @property
    def type(self) -> MarkerType:
        return self._type
    @property
    def module(self) -> str:
        return self._module
    @property
    def offset(self) -> int:
        return self._offset
    @property
    def extra(self) -> Optional[str]:
        return self._extra
    @property
    def category(self) -> MarkerCategory:
        if self.is_vtable():
            return MarkerCategory.VTABLE
        if self.is_variable():
            return MarkerCategory.VARIABLE
        if self.is_string():
            return MarkerCategory.STRING
        # TODO: worth another look if we add more types, but this covers it
        if self.is_regular_function() or self.is_explicit_byname():
            return MarkerCategory.FUNCTION
        return MarkerCategory.ADDRESS
    @property
    def key(self) -> Tuple[MarkerCategory, str, Optional[str]]:
        """For use with the MarkerDict. To detect/avoid marker collision."""
        return (self.category, self.module, self.extra)
    def is_regular_function(self) -> bool:
        """Regular function, meaning: not an explicit byname lookup. FUNCTION
        markers can be _implicit_ byname.
        FUNCTION and STUB markers are (currently) the only heterogeneous marker types that
        can be lumped together, although the reasons for doing so are a little vague."""
        return self._type in (MarkerType.FUNCTION, MarkerType.STUB)
    def is_explicit_byname(self) -> bool:
        return self._type in (
            MarkerType.SYNTHETIC,
            MarkerType.TEMPLATE,
            MarkerType.LIBRARY,
        )
    def is_variable(self) -> bool:
        return self._type == MarkerType.GLOBAL
    def is_synthetic(self) -> bool:
        return self._type == MarkerType.SYNTHETIC
    def is_template(self) -> bool:
        return self._type == MarkerType.TEMPLATE
    def is_vtable(self) -> bool:
        return self._type == MarkerType.VTABLE
    def is_library(self) -> bool:
        return self._type == MarkerType.LIBRARY
    def is_string(self) -> bool:
        return self._type == MarkerType.STRING
    def allowed_in_func(self) -> bool:
        return self._type in (MarkerType.GLOBAL, MarkerType.STRING)
 def match_marker(line: str) -> Optional[DecompMarker]:
    match = markerRegex.match(line)
    if match is None:
        return None
    return DecompMarker(
        marker_type=match.group("type"),
        module=match.group("module"),
        offset=int(match.group("offset"), 16),
        extra=match.group("extra"),
    )
 def is_marker_exact(line: str) -> bool:
    return markerExactRegex.match(line) is not None
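The following is a minimal sketch of the matching step performed by `match_marker`, not the project's implementation. The pattern `MARKER_RE`, the `Marker` tuple, and the function name `match_marker_sketch` are hypothetical stand-ins; the real `markerRegex` is defined earlier in `marker.py` and may accept a different syntax. The sketch only shows the general idea of pulling the type, module, and hexadecimal offset out of a comment line.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for illustration only. The real markerRegex lives
# earlier in marker.py and may differ (e.g. in how it treats extra fields).
MARKER_RE = re.compile(
    r"\s*//\s*(?P<type>\w+):\s*(?P<module>\w+)\s+(?P<offset>0x[0-9a-fA-F]+)"
)


class Marker(NamedTuple):
    """Simplified stand-in for DecompMarker."""

    type: str
    module: str
    offset: int


def match_marker_sketch(line: str) -> Optional[Marker]:
    """Return a Marker if the line looks like an annotation, else None."""
    match = MARKER_RE.match(line)
    if match is None:
        return None
    return Marker(
        type=match.group("type").upper(),
        module=match.group("module").upper(),
        # int() with base 16 accepts the "0x" prefix.
        offset=int(match.group("offset"), 16),
    )
```

A line that is not a comment returns `None`, which is the signal the parser uses to treat the line as ordinary code.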
--- a/tools/isledecomp/isledecomp/parser/node.py
+++ b/tools/isledecomp/isledecomp/parser/node.py
@ -1,63 +0,0 @@
 from typing import Optional
 from dataclasses import dataclass
 from .marker import MarkerType
@dataclass
 class ParserSymbol:
    """Exported decomp marker with all information (except the code filename) required to
    cross-reference with cvdump data."""
    type: MarkerType
    line_number: int
    module: str
    offset: int
    name: str
    # The parser doesn't (currently) know about the code filename, but if you
    # wanted to set it here after the fact, here's the spot.
    filename: Optional[str] = None
    def should_skip(self) -> bool:
        """The default is to compare any symbols we have"""
        return False
    def is_nameref(self) -> bool:
        """All symbols default to name lookup"""
        return True
@dataclass
 class ParserFunction(ParserSymbol):
    # We are able to detect the closing line of a function with some reliability.
    # This isn't used for anything right now, but perhaps later it will be.
    end_line: Optional[int] = None
    # All marker types are referenced by name except FUNCTION/STUB. These can also be
    # referenced by name, but only if this flag is true.
    lookup_by_name: bool = False
    def should_skip(self) -> bool:
        return self.type == MarkerType.STUB
    def is_nameref(self) -> bool:
        return (
            self.type in (MarkerType.SYNTHETIC, MarkerType.TEMPLATE, MarkerType.LIBRARY)
            or self.lookup_by_name
        )
@dataclass
 class ParserVariable(ParserSymbol):
    is_static: bool = False
    parent_function: Optional[int] = None
@dataclass
 class ParserVtable(ParserSymbol):
    base_class: Optional[str] = None
@dataclass
 class ParserString(ParserSymbol):
    pass
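As a rough illustration of how the symbol classes above override `should_skip` and `is_nameref`, here is a reduced sketch. `MarkerType` is redefined locally with assumed member values, and the class names `Symbol` and `Function` are shortened stand-ins with fewer fields than the originals.

```python
from dataclasses import dataclass
from enum import Enum


class MarkerType(Enum):
    """Local stand-in; the member values are assumptions."""

    FUNCTION = 1
    STUB = 2
    SYNTHETIC = 3
    TEMPLATE = 4
    LIBRARY = 5


@dataclass
class Symbol:
    type: MarkerType
    offset: int
    name: str

    def should_skip(self) -> bool:
        # By default every symbol is compared.
        return False

    def is_nameref(self) -> bool:
        # By default every symbol is looked up by name.
        return True


@dataclass
class Function(Symbol):
    lookup_by_name: bool = False

    def should_skip(self) -> bool:
        # Stubs are recorded but not compared.
        return self.type == MarkerType.STUB

    def is_nameref(self) -> bool:
        # FUNCTION/STUB are looked up by name only when flagged.
        return (
            self.type
            in (MarkerType.SYNTHETIC, MarkerType.TEMPLATE, MarkerType.LIBRARY)
            or self.lookup_by_name
        )
```

Under these assumptions a plain `FUNCTION` symbol is matched by address, while `SYNTHETIC`, `TEMPLATE`, and `LIBRARY` symbols are always matched by name.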
--- a/tools/isledecomp/isledecomp/parser/parser.py
+++ b/tools/isledecomp/isledecomp/parser/parser.py
@ -1,556 +0,0 @@
 # C++ file parser
 from typing import List, Iterable, Iterator, Optional
 from enum import Enum
 from .util import (
    get_class_name,
    get_variable_name,
    get_synthetic_name,
    remove_trailing_comment,
    get_string_contents,
    sanitize_code_line,
    scopeDetectRegex,
 )
 from .marker import (
    DecompMarker,
    MarkerCategory,
    match_marker,
    is_marker_exact,
 )
 from .node import (
    ParserSymbol,
    ParserFunction,
    ParserVariable,
    ParserVtable,
    ParserString,
 )
 from .error import ParserAlert, ParserError
 class ReaderState(Enum):
    SEARCH = 0
    WANT_SIG = 1
    IN_FUNC = 2
    IN_TEMPLATE = 3
    WANT_CURLY = 4
    IN_GLOBAL = 5
    IN_FUNC_GLOBAL = 6
    IN_VTABLE = 7
    IN_SYNTHETIC = 8
    IN_LIBRARY = 9
    DONE = 100
 class MarkerDict:
    def __init__(self) -> None:
        self.markers: dict = {}
    def insert(self, marker: DecompMarker) -> bool:
        """Return True if this insert would overwrite"""
        if marker.key in self.markers:
            return True
        self.markers[marker.key] = marker
        return False
    def query(
        self, category: MarkerCategory, module: str, extra: Optional[str] = None
    ) -> Optional[DecompMarker]:
        return self.markers.get((category, module, extra))
    def iter(self) -> Iterator[DecompMarker]:
        for _, marker in self.markers.items():
            yield marker
    def empty(self):
        self.markers = {}
 class CurlyManager:
    """Overly simplified scope manager"""
    def __init__(self):
        self._stack = []
    def reset(self):
        self._stack = []
    def _pop(self):
        """Pop stack safely"""
        try:
            self._stack.pop()
        except IndexError:
            pass
    def get_prefix(self, name: Optional[str] = None) -> str:
        """Return the prefix for where we are."""
        scopes = [t for t in self._stack if t != "{"]
        if len(scopes) == 0:
            return name if name is not None else ""
        if name is not None and name not in scopes:
            scopes.append(name)
        return "::".join(scopes)
    def read_line(self, raw_line: str):
        """Read a line of code and update the stack."""
        line = sanitize_code_line(raw_line)
        if (match := scopeDetectRegex.match(line)) is not None:
            if not line.endswith(";"):
                self._stack.append(match.group("name"))
        change = line.count("{") - line.count("}")
        if change > 0:
            for _ in range(change):
                self._stack.append("{")
        elif change < 0:
            for _ in range(-change):
                self._pop()
            if len(self._stack) == 0:
                return
            last = self._stack[-1]
            if last != "{":
                self._pop()
 class DecompParser:
    # pylint: disable=too-many-instance-attributes
    # Could combine output lists into a single list to get under the limit,
    # but not right now
    def __init__(self) -> None:
        # The lists to be populated as we parse
        self._symbols: List[ParserSymbol] = []
        self.alerts: List[ParserAlert] = []
        self.line_number: int = 0
        self.state: ReaderState = ReaderState.SEARCH
        self.last_line: str = ""
        self.curly = CurlyManager()
        # To allow for multiple markers where code is shared across different
        # modules, save lists of compatible markers that appear in sequence
        self.fun_markers = MarkerDict()
        self.var_markers = MarkerDict()
        self.tbl_markers = MarkerDict()
        # To handle functions that are entirely indented (i.e. those defined
        # in class declarations), remember how many whitespace characters
        # came before the opening curly brace and match that up at the end.
        # This should give us the same or better accuracy for a well-formed file.
        # The alternative is counting the curly braces on each line
        # but that's probably too cumbersome.
        self.curly_indent_stops: int = 0
        # For non-synthetic functions, save the line number where the function begins
        # (i.e. where we see the curly brace) along with the function signature.
        # We will need both when we reach the end of the function.
        self.function_start: int = 0
        self.function_sig: str = ""
    def reset(self):
        self._symbols = []
        self.alerts = []
        self.line_number = 0
        self.state = ReaderState.SEARCH
        self.last_line = ""
        self.fun_markers.empty()
        self.var_markers.empty()
        self.tbl_markers.empty()
        self.curly_indent_stops = 0
        self.function_start = 0
        self.function_sig = ""
        self.curly.reset()
    @property
    def functions(self) -> List[ParserFunction]:
        return [s for s in self._symbols if isinstance(s, ParserFunction)]
    @property
    def vtables(self) -> List[ParserVtable]:
        return [s for s in self._symbols if isinstance(s, ParserVtable)]
    @property
    def variables(self) -> List[ParserVariable]:
        return [s for s in self._symbols if isinstance(s, ParserVariable)]
    @property
    def strings(self) -> List[ParserString]:
        return [s for s in self._symbols if isinstance(s, ParserString)]
    def iter_symbols(self, module: Optional[str] = None) -> Iterator[ParserSymbol]:
        for s in self._symbols:
            if module is None or s.module == module:
                yield s
    def _recover(self):
        """We hit a syntax error and need to reset temp structures"""
        self.state = ReaderState.SEARCH
        self.fun_markers.empty()
        self.var_markers.empty()
        self.tbl_markers.empty()
    def _syntax_warning(self, code):
        self.alerts.append(
            ParserAlert(
                line_number=self.line_number,
                code=code,
                line=self.last_line.strip(),
            )
        )
    def _syntax_error(self, code):
        self._syntax_warning(code)
        self._recover()
    def _function_starts_here(self):
        self.function_start = self.line_number
    def _function_marker(self, marker: DecompMarker):
        if self.fun_markers.insert(marker):
            self._syntax_warning(ParserError.DUPLICATE_MODULE)
        self.state = ReaderState.WANT_SIG
    def _nameref_marker(self, marker: DecompMarker):
        """Functions explicitly referenced by name are set here"""
        if self.fun_markers.insert(marker):
            self._syntax_warning(ParserError.DUPLICATE_MODULE)
        if marker.is_template():
            self.state = ReaderState.IN_TEMPLATE
        elif marker.is_synthetic():
            self.state = ReaderState.IN_SYNTHETIC
        else:
            self.state = ReaderState.IN_LIBRARY
    def _function_done(self, lookup_by_name: bool = False, unexpected: bool = False):
        end_line = self.line_number
        if unexpected:
            # If we missed the end of the previous function, assume it ended
            # on the previous line and that whatever we are tracking next
            # begins on the current line.
            end_line -= 1
        for marker in self.fun_markers.iter():
            self._symbols.append(
                ParserFunction(
                    type=marker.type,
                    line_number=self.function_start,
                    module=marker.module,
                    offset=marker.offset,
                    name=self.function_sig,
                    lookup_by_name=lookup_by_name,
                    end_line=end_line,
                )
            )
        self.fun_markers.empty()
        self.curly_indent_stops = 0
        self.state = ReaderState.SEARCH
    def _vtable_marker(self, marker: DecompMarker):
        if self.tbl_markers.insert(marker):
            self._syntax_warning(ParserError.DUPLICATE_MODULE)
        self.state = ReaderState.IN_VTABLE
    def _vtable_done(self, class_name: Optional[str] = None):
        if class_name is None:
            # Best we can do
            class_name = self.last_line.strip()
        for marker in self.tbl_markers.iter():
            self._symbols.append(
                ParserVtable(
                    type=marker.type,
                    line_number=self.line_number,
                    module=marker.module,
                    offset=marker.offset,
                    name=self.curly.get_prefix(class_name),
                    base_class=marker.extra,
                )
            )
        self.tbl_markers.empty()
        self.state = ReaderState.SEARCH
    def _variable_marker(self, marker: DecompMarker):
        if self.var_markers.insert(marker):
            self._syntax_warning(ParserError.DUPLICATE_MODULE)
        if self.state in (ReaderState.IN_FUNC, ReaderState.IN_FUNC_GLOBAL):
            self.state = ReaderState.IN_FUNC_GLOBAL
        else:
            self.state = ReaderState.IN_GLOBAL
    def _variable_done(
        self, variable_name: Optional[str] = None, string_value: Optional[str] = None
    ):
        if variable_name is None and string_value is None:
            self._syntax_error(ParserError.NO_SUITABLE_NAME)
            return
        for marker in self.var_markers.iter():
            if marker.is_string():
                self._symbols.append(
                    ParserString(
                        type=marker.type,
                        line_number=self.line_number,
                        module=marker.module,
                        offset=marker.offset,
                        name=string_value,
                    )
                )
            else:
                parent_function = None
                is_static = self.state == ReaderState.IN_FUNC_GLOBAL
                # If this is a static variable, we need to get the function
                # where it resides so that we can match it up later with the
                # mangled names of both variable and function from cvdump.
                if is_static:
                    fun_marker = self.fun_markers.query(
                        MarkerCategory.FUNCTION, marker.module
                    )
                    if fun_marker is None:
                        self._syntax_warning(ParserError.ORPHANED_STATIC_VARIABLE)
                        continue
                    parent_function = fun_marker.offset
                self._symbols.append(
                    ParserVariable(
                        type=marker.type,
                        line_number=self.line_number,
                        module=marker.module,
                        offset=marker.offset,
                        name=self.curly.get_prefix(variable_name),
                        is_static=is_static,
                        parent_function=parent_function,
                    )
                )
        self.var_markers.empty()
        if self.state == ReaderState.IN_FUNC_GLOBAL:
            self.state = ReaderState.IN_FUNC
        else:
            self.state = ReaderState.SEARCH
    def _handle_marker(self, marker: DecompMarker):
        # Cannot handle any markers between function sig and opening curly brace
        if self.state == ReaderState.WANT_CURLY:
            self._syntax_error(ParserError.UNEXPECTED_MARKER)
            return
        # If we are inside a function, the only markers we accept are:
        # GLOBAL, indicating a static variable
        # STRING, indicating a literal string.
        # Otherwise we assume that the parser missed the end of the function
        # and we have moved on to something else.
        # This is unlikely to occur with well-formed code, but
        # we can recover easily by just ending the function here.
        if self.state == ReaderState.IN_FUNC and not marker.allowed_in_func():
            self._syntax_warning(ParserError.MISSED_END_OF_FUNCTION)
            self._function_done(unexpected=True)
        # TODO: How uncertain are we of detecting the end of a function
        # in a clang-formatted file? For now we assume we have missed the
        # end if we detect a non-GLOBAL marker while state is IN_FUNC.
        # Maybe these cases should be syntax errors instead
        if marker.is_regular_function():
            if self.state in (
                ReaderState.SEARCH,
                ReaderState.WANT_SIG,
            ):
                # We will allow multiple offsets if we have just begun
                # the code block, but not after we hit the curly brace.
                self._function_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        elif marker.is_template():
            if self.state in (ReaderState.SEARCH, ReaderState.IN_TEMPLATE):
                self._nameref_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        elif marker.is_synthetic():
            if self.state in (ReaderState.SEARCH, ReaderState.IN_SYNTHETIC):
                self._nameref_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        elif marker.is_library():
            if self.state in (ReaderState.SEARCH, ReaderState.IN_LIBRARY):
                self._nameref_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        # Strings and variables are almost the same thing
        elif marker.is_string() or marker.is_variable():
            if self.state in (
                ReaderState.SEARCH,
                ReaderState.IN_GLOBAL,
                ReaderState.IN_FUNC,
                ReaderState.IN_FUNC_GLOBAL,
            ):
                self._variable_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        elif marker.is_vtable():
            if self.state in (ReaderState.SEARCH, ReaderState.IN_VTABLE):
                self._vtable_marker(marker)
            else:
                self._syntax_error(ParserError.INCOMPATIBLE_MARKER)
        else:
            self._syntax_warning(ParserError.BOGUS_MARKER)
    def read_line(self, line: str):
        if self.state == ReaderState.DONE:
            return
        self.last_line = line  # TODO: Useful or hack for error reporting?
        self.line_number += 1
        marker = match_marker(line)
        if marker is not None:
            # TODO: what's the best place for this?
            # Does it belong with reading or marker handling?
            if not is_marker_exact(self.last_line):
                self._syntax_warning(ParserError.BAD_DECOMP_MARKER)
            self._handle_marker(marker)
            return
        self.curly.read_line(line)
        line_strip = line.strip()
        if self.state in (
            ReaderState.IN_SYNTHETIC,
            ReaderState.IN_TEMPLATE,
            ReaderState.IN_LIBRARY,
        ):
            # Explicit nameref functions provide the function name
            # on the next line (in a // comment)
            name = get_synthetic_name(line)
            if name is None:
                self._syntax_error(ParserError.BAD_NAMEREF)
            else:
                self.function_sig = name
                self._function_starts_here()
                self._function_done(lookup_by_name=True)
        elif self.state == ReaderState.WANT_SIG:
            # Ignore blanks on the way to function start or function name
            if len(line_strip) == 0:
                self._syntax_warning(ParserError.UNEXPECTED_BLANK_LINE)
            elif line_strip.startswith("//"):
                # If we found a comment, assume implicit lookup-by-name
                # function and end here. We know this is not a decomp marker
                # because it would have been handled already.
                self.function_sig = get_synthetic_name(line)
                self._function_starts_here()
                self._function_done(lookup_by_name=True)
            elif line_strip == "{":
                # We missed the function signature but we can recover from this
                self.function_sig = "(unknown)"
                self._function_starts_here()
                self._syntax_warning(ParserError.MISSED_START_OF_FUNCTION)
                self.state = ReaderState.IN_FUNC
            else:
                # Inline functions may end with a comment. Strip that out
                # to help parsing.
                self.function_sig = remove_trailing_comment(line_strip)
                # Now check to see if the opening curly bracket is on the
                # same line. clang-format should prevent this (BraceWrapping)
                # but it is easy to detect.
                # If the entire function is on one line, handle that too.
                if self.function_sig.endswith("{"):
                    self._function_starts_here()
                    self.state = ReaderState.IN_FUNC
                elif self.function_sig.endswith("}") or self.function_sig.endswith(
                    "};"
                ):
                    self._function_starts_here()
                    self._function_done()
                elif self.function_sig.endswith(");"):
                    # Detect forward reference or declaration
                    self._syntax_error(ParserError.NO_IMPLEMENTATION)
                else:
                    self.state = ReaderState.WANT_CURLY
        elif self.state == ReaderState.WANT_CURLY:
            if line_strip == "{":
                self.curly_indent_stops = line.index("{")
                self._function_starts_here()
                self.state = ReaderState.IN_FUNC
        elif self.state == ReaderState.IN_FUNC:
            if line_strip.startswith("}") and line[self.curly_indent_stops] == "}":
                self._function_done()
        elif self.state in (ReaderState.IN_GLOBAL, ReaderState.IN_FUNC_GLOBAL):
            # TODO: Known problem that an error here will cause us to abandon a
            # function we have already parsed if state == IN_FUNC_GLOBAL.
            # However, we are not tolerant of _any_ syntax problems in our
            # CI actions, so the solution is to just fix the invalid marker.
            variable_name = None
            global_markers_queued = any(
                m.is_variable() for m in self.var_markers.iter()
            )
            if len(line_strip) == 0:
                self._syntax_warning(ParserError.UNEXPECTED_BLANK_LINE)
                return
            if global_markers_queued:
                # Not the greatest solution, but a consequence of combining GLOBAL and
                # STRING markers together. If the marker precedes a return statement, it is
                # valid for a STRING marker to be here, but not a GLOBAL. We need to look
                # ahead and tell whether this *would* fail.
                if line_strip.startswith("return"):
                    self._syntax_error(ParserError.GLOBAL_NOT_VARIABLE)
                    return
                if line_strip.startswith("//"):
                    # If we found a comment, assume the variable name is given
                    # there and read it. We know this is not a decomp marker
                    # because it would have been handled already.
                    variable_name = get_synthetic_name(line)
                else:
                    variable_name = get_variable_name(line)
            string_name = get_string_contents(line)
            self._variable_done(variable_name, string_name)
        elif self.state == ReaderState.IN_VTABLE:
            vtable_class = get_class_name(line)
            if vtable_class is not None:
                self._vtable_done(class_name=vtable_class)
    def read_lines(self, lines: Iterable):
        for line in lines:
            self.read_line(line)
    def finish(self):
        if self.state != ReaderState.SEARCH:
            self._syntax_warning(ParserError.UNEXPECTED_END_OF_FILE)
        self.state = ReaderState.DONE
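The state transitions in `DecompParser.read_line` can be hard to follow in full, so here is a heavily reduced sketch of the function-tracking path only (SEARCH, WANT_SIG, WANT_CURLY, IN_FUNC). It is not the parser itself: the names `State` and `find_functions` are hypothetical, only `// FUNCTION:` comments are recognized, and the closing brace is assumed to sit in column 0, whereas the original remembers the indent of the opening brace.

```python
from enum import Enum
from typing import List, Tuple


class State(Enum):
    SEARCH = 0
    WANT_SIG = 1
    WANT_CURLY = 2
    IN_FUNC = 3


def find_functions(lines: List[str]) -> List[Tuple[str, int, int]]:
    """Return (signature, start_line, end_line) for each annotated function."""
    state = State.SEARCH
    sig = ""
    start = 0
    found = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("// FUNCTION:"):
            # A marker means the next non-blank line is the signature.
            state = State.WANT_SIG
        elif state == State.WANT_SIG and stripped:
            sig = stripped
            state = State.WANT_CURLY
        elif state == State.WANT_CURLY and stripped == "{":
            # The function is considered to start at its opening brace.
            start = number
            state = State.IN_FUNC
        elif state == State.IN_FUNC and line.startswith("}"):
            # Only an unindented closing brace ends the function.
            found.append((sig, start, number))
            state = State.SEARCH
    return found
```

Nested blocks are ignored because their closing braces are indented, which is the same idea as the `curly_indent_stops` check in the original.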
--- a/tools/isledecomp/isledecomp/parser/util.py
+++ b/tools/isledecomp/isledecomp/parser/util.py
@ -1,141 +0,0 @@
 # C++ Parser utility functions and data structures
 import re
 from typing import Optional
 from ast import literal_eval
 # The goal here is to just read whatever is on the next line, so some
 # flexibility in the formatting seems OK
 templateCommentRegex = re.compile(r"\s*//\s+(.*)")
 # To remove any comment (//) or block comment (/*) and its leading spaces
 # from the end of a code line
 trailingCommentRegex = re.compile(r"(\s*(?://|/\*).*)$")
 # Get char contents, ignore escape characters
 singleQuoteRegex = re.compile(r"('(?:[^\'\\]|\\.)')")
 # Match contents of block comment on one line
 blockCommentRegex = re.compile(r"(/\*.*?\*/)")
 # Match contents of single comment on one line
 regularCommentRegex = re.compile(r"(//.*)")
 # Get string contents, ignore escape characters that might interfere
 doubleQuoteRegex = re.compile(r"(\"(?:[^\"\\]|\\.)*\")")
 # Detect a line that would cause us to enter a new scope
 scopeDetectRegex = re.compile(r"(?:class|struct|namespace) (?P<name>\w+).*(?:{)?")
 def get_synthetic_name(line: str) -> Optional[str]:
    """Synthetic names appear on a single line comment on the line after the marker.
    If that's not what we have, return None"""
    template_match = templateCommentRegex.match(line)
    if template_match is not None:
        return template_match.group(1)
    return None
 def sanitize_code_line(line: str) -> str:
    """Helper for scope manager. Removes sections from a code line
    that would cause us to incorrectly detect curly brackets.
    This is a very naive implementation and fails entirely on multi-line
    strings or comments."""
    line = singleQuoteRegex.sub("''", line)
    line = doubleQuoteRegex.sub('""', line)
    line = blockCommentRegex.sub("", line)
    line = regularCommentRegex.sub("", line)
    return line.strip()
 def remove_trailing_comment(line: str) -> str:
    return trailingCommentRegex.sub("", line)
 def is_blank_or_comment(line: str) -> bool:
    """Helper to read ahead after the offset comment is matched.
    There could be blank lines or other comments before the
    function signature, and we want to skip those."""
    line_strip = line.strip()
    return (
        len(line_strip) == 0
        or line_strip.startswith("//")
        or line_strip.startswith("/*")
        or line_strip.endswith("*/")
    )
 template_regex = re.compile(r"<(?P<type>[\w]+)\s*(?P<asterisks>\*+)?\s*>")
 class_decl_regex = re.compile(
    r"\s*(?:\/\/)?\s*(?:class|struct) ((?:\w+(?:<.+>)?(?:::)?)+)"
 )
 def template_replace(match: re.Match) -> str:
    (type_name, asterisks) = match.groups()
    if asterisks is None:
        return f"<{type_name}>"
    return f"<{type_name} {asterisks}>"
 def fix_template_type(class_name: str) -> str:
    """For template classes, we should reformat the class name so it matches
    the output from cvdump: one space between the template type and any asterisks
    if it is a pointer type."""
    if "<" not in class_name:
        return class_name
    return template_regex.sub(template_replace, class_name)
 def get_class_name(line: str) -> Optional[str]:
    """For VTABLE markers, extract the class name from the code line or comment
    where it appears."""
    match = class_decl_regex.match(line)
    if match is not None:
        return fix_template_type(match.group(1))
    return None
 global_regex = re.compile(r"(?P<name>(?:\w+::)*g_\w+)")
 less_strict_global_regex = re.compile(r"(?P<name>(?:\w+::)*\w+)(?:\)\(|\[.*|\s*=.*|;)")
 def get_variable_name(line: str) -> Optional[str]:
    """Grab the name of the variable annotated with the GLOBAL marker.
    Correct syntax would have the variable start with the prefix "g_"
    but we will try to match regardless."""
    if (match := global_regex.search(line)) is not None:
        return match.group("name")
    if (match := less_strict_global_regex.search(line)) is not None:
        return match.group("name")
    return None
 def get_string_contents(line: str) -> Optional[str]:
    """Return the first C string seen on this line.
    We have to unescape the string, and a simple way to do that is to use
    Python's ast.literal_eval. I'm sure there are many pitfalls to doing
    it this way, but hopefully the regex will ensure reasonably sane input."""
    try:
        if (match := doubleQuoteRegex.search(line)) is not None:
            return literal_eval(match.group(1))
    # pylint: disable=broad-exception-caught
    # No way to predict what kind of exception could occur.
    except Exception:
        pass
    return None
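To make the unescaping step in `get_string_contents` concrete, here is a small self-contained sketch that reuses the double-quote pattern shown above. The names `DOUBLE_QUOTE_RE` and `first_c_string` are hypothetical. Note the assumption baked into the approach: C and Python escape sequences overlap for common cases such as `\n`, `\t`, and `\"`, but they are not identical, so unusual escapes may be decoded differently or rejected.

```python
import re
from ast import literal_eval
from typing import Optional

# Same pattern as doubleQuoteRegex above: a quoted run in which a
# backslash always consumes the character that follows it.
DOUBLE_QUOTE_RE = re.compile(r"(\"(?:[^\"\\]|\\.)*\")")


def first_c_string(line: str) -> Optional[str]:
    """Return the first double-quoted string on the line, unescaped."""
    match = DOUBLE_QUOTE_RE.search(line)
    if match is None:
        return None
    try:
        # Treat the quoted text as a Python string literal to decode escapes.
        value = literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        # Escapes that are valid in C but not in Python end up here.
        return None
    return value if isinstance(value, str) else None
```

For example, the escaped quote in `"a\"b"` does not terminate the match, and the result is the three-character string `a"b`.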
--- a/tools/isledecomp/isledecomp/types.py
+++ b/tools/isledecomp/isledecomp/types.py
@ -1,13 +0,0 @@
 """Types shared by other modules"""
 from enum import Enum
 class SymbolType(Enum):
    """Broadly tells us what kind of comparison is required for this symbol."""
    FUNCTION = 1
    DATA = 2
    POINTER = 3
    STRING = 4
    VTABLE = 5
    FLOAT = 6
--- a/tools/isledecomp/isledecomp/utils.py
+++ b/tools/isledecomp/isledecomp/utils.py
@ -1,308 +0,0 @@
 import os
 import sys
 from datetime import datetime
 import logging
 import colorama
 def print_combined_diff(udiff, plain: bool = False, show_both: bool = False):
    if udiff is None:
        return
    # We don't know how long the address string will be ahead of time.
    # Set this value for each address to try to line things up.
    padding_size = 0
    for slug, subgroups in udiff:
        if plain:
            print("---")
            print("+++")
            print(slug)
        else:
            print(f"{colorama.Fore.RED}---")
            print(f"{colorama.Fore.GREEN}+++")
            print(f"{colorama.Fore.BLUE}{slug}")
            print(colorama.Style.RESET_ALL, end="")
        for subgroup in subgroups:
            equal = subgroup.get("both") is not None
            if equal:
                for orig_addr, line, recomp_addr in subgroup["both"]:
                    padding_size = max(padding_size, len(orig_addr))
                    if show_both:
                        print(f"{orig_addr} / {recomp_addr} : {line}")
                    else:
                        print(f"{orig_addr} : {line}")
            else:
                for orig_addr, line in subgroup["orig"]:
                    padding_size = max(padding_size, len(orig_addr))
                    addr_prefix = (
                        f"{orig_addr} / {'':{padding_size}}" if show_both else orig_addr
                    )
                    if plain:
                        print(f"{addr_prefix} : -{line}")
                    else:
                        print(
                            f"{addr_prefix} : {colorama.Fore.RED}-{line}{colorama.Style.RESET_ALL}"
                        )
                for recomp_addr, line in subgroup["recomp"]:
                    padding_size = max(padding_size, len(recomp_addr))
                    addr_prefix = (
                        f"{'':{padding_size}} / {recomp_addr}"
                        if show_both
                        else " " * padding_size
                    )
                    if plain:
                        print(f"{addr_prefix} : +{line}")
                    else:
                        print(
                            f"{addr_prefix} : {colorama.Fore.GREEN}+{line}{colorama.Style.RESET_ALL}"
                        )
        # Blank line after each slug's block of subgroups.
        print()
 def print_diff(udiff, plain):
    """Print diff in difflib.unified_diff format."""
    if udiff is None:
        return False
    has_diff = False
    for line in udiff:
        has_diff = True
        color = ""
        if line.startswith(("++", "@@", "--")):
            # Skip unneeded parts of the diff for the brief view
            continue
        # Work out color if we are printing color
        if not plain:
            if line.startswith("+"):
                color = colorama.Fore.GREEN
            elif line.startswith("-"):
                color = colorama.Fore.RED
        print(color + line)
        # Reset color if we're printing in color
        if not plain:
            print(colorama.Style.RESET_ALL, end="")
    return has_diff
 def get_percent_color(value: float) -> str:
    """Return colorama ANSI escape character for the given decimal value."""
    if value == 1.0:
        return colorama.Fore.GREEN
    if value > 0.8:
        return colorama.Fore.YELLOW
    return colorama.Fore.RED
 def percent_string(
    ratio: float, is_effective: bool = False, is_plain: bool = False
 ) -> str:
    """Helper to construct a percentage string from the given ratio.
    If is_effective (i.e. effective match), indicate that with the asterisk.
    If is_plain, don't use colorama ANSI codes."""
    percenttext = f"{(ratio * 100):.2f}%"
    effective_star = "*" if is_effective else ""
    if is_plain:
        return percenttext + effective_star
    return "".join(
        [
            get_percent_color(ratio),
            percenttext,
            colorama.Fore.RED if is_effective else "",
            effective_star,
            colorama.Style.RESET_ALL,
        ]
    )
 def diff_json_display(show_both_addrs: bool = False, is_plain: bool = False):
    """Generate a function that will display the diff according to
    the reccmp display preferences."""
    def formatter(orig_addr, saved, new) -> str:
        old_pct = "new"
        new_pct = "gone"
        name = ""
        recomp_addr = "n/a"
        if new is not None:
            new_pct = (
                "stub"
                if new.get("stub", False)
                else percent_string(
                    new["matching"], new.get("effective", False), is_plain
                )
            )
            # Prefer the current name of this function if we have it.
            # We are using the original address as the key.
            # A function being renamed is not of interest here.
            name = new.get("name", "")
            recomp_addr = new.get("recomp", "n/a")
        if saved is not None:
            old_pct = (
                "stub"
                if saved.get("stub", False)
                else percent_string(
                    saved["matching"], saved.get("effective", False), is_plain
                )
            )
            if name == "":
                name = saved.get("name", "")
        if show_both_addrs:
            addr_string = f"{orig_addr} / {recomp_addr:10}"
        else:
            addr_string = orig_addr
        # The ANSI codes from colorama count towards string length,
        # so aligning this output in spreadsheet-like columns
        # (using f-string width formatting) would take some effort.
        return f"{addr_string} - {name} ({old_pct} -> {new_pct})"
    return formatter
 def diff_json(
    saved_data,
    new_data,
    orig_file: str,
    show_both_addrs: bool = False,
    is_plain: bool = False,
 ):
    """Using a saved copy of the diff summary and the current data, print a
    report showing which functions/symbols have changed match percentage."""
    # Don't try to diff a report generated for a different binary file
    base_file = os.path.basename(orig_file).lower()
    if saved_data.get("file") != base_file:
        logging.getLogger().error(
            "Diff report for '%s' does not match current file '%s'",
            saved_data.get("file"),
            base_file,
        )
        return
    if "timestamp" in saved_data:
        now = datetime.now().replace(microsecond=0)
        then = datetime.fromtimestamp(saved_data["timestamp"]).replace(microsecond=0)
        print(
            " ".join(
                [
                    "Saved diff report generated",
                    then.strftime("%B %d %Y, %H:%M:%S"),
                    f"({str(now - then)} ago)",
                ]
            )
        )
        print()
    # Convert to dict, using orig_addr as key
    saved_invert = {obj["address"]: obj for obj in saved_data["data"]}
    new_invert = {obj["address"]: obj for obj in new_data}
    all_addrs = set(saved_invert.keys()).union(new_invert.keys())
    # Put all the information in one place so we can decide how each item changed.
    combined = {
        addr: (
            saved_invert.get(addr),
            new_invert.get(addr),
        )
        for addr in sorted(all_addrs)
    }
    # The criteria for diff judgement are in these dict comprehensions:
    # Any function not in the saved file
    new_functions = {
        key: (saved, new) for key, (saved, new) in combined.items() if saved is None
    }
    # Any function in the saved file that is now missing from the current data
    # or a non-stub -> stub conversion
    dropped_functions = {
        key: (saved, new)
        for key, (saved, new) in combined.items()
        if new is None
        or (
            new is not None
            and saved is not None
            and new.get("stub", False)
            and not saved.get("stub", False)
        )
    }
    # TODO: move these two into functions if the assessment gets more complex
    # Any function with increased match percentage
    # or stub -> non-stub conversion
    improved_functions = {
        key: (saved, new)
        for key, (saved, new) in combined.items()
        if saved is not None
        and new is not None
        and (
            new["matching"] > saved["matching"]
            or (not new.get("stub", False) and saved.get("stub", False))
        )
    }
    # Any non-stub function with decreased match percentage
    degraded_functions = {
        key: (saved, new)
        for key, (saved, new) in combined.items()
        if saved is not None
        and new is not None
        and new["matching"] < saved["matching"]
        and not saved.get("stub")
        and not new.get("stub")
    }
    # Any function that is a 100% match in both reports
    # but whose "effective" match flag has changed
    entropy_functions = {
        key: (saved, new)
        for key, (saved, new) in combined.items()
        if saved is not None
        and new is not None
        and new["matching"] == 1.0
        and saved["matching"] == 1.0
        and new.get("effective", False) != saved.get("effective", False)
    }
    get_diff_str = diff_json_display(show_both_addrs, is_plain)
    for diff_name, diff_dict in [
        ("New", new_functions),
        ("Increased", improved_functions),
        ("Decreased", degraded_functions),
        ("Dropped", dropped_functions),
        ("Compiler entropy", entropy_functions),
    ]:
        if len(diff_dict) == 0:
            continue
        print(f"{diff_name} ({len(diff_dict)}):")
        for addr, (saved, new) in diff_dict.items():
            print(get_diff_str(addr, saved, new))
        print()
 def get_file_in_script_dir(fn):
    return os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])), fn)
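The bucket rules used by `diff_json` above can be sketched in isolation. This is a minimal illustration, not the project's code: `classify` and the sample dictionaries are hypothetical, and reading a missing `"matching"` key as 0.0 is an assumption made here so that stub entries without a percentage do not raise.

```python
def classify(saved, new):
    """Return the set of report buckets for one address (hypothetical helper)."""
    if saved is None:
        return {"New"}
    if new is None:
        return {"Dropped"}

    saved_stub = saved.get("stub", False)
    new_stub = new.get("stub", False)
    # Assumption: treat a missing percentage as 0.0
    saved_pct = saved.get("matching", 0.0)
    new_pct = new.get("matching", 0.0)

    buckets = set()
    if new_stub and not saved_stub:
        buckets.add("Dropped")
    if new_pct > saved_pct or (saved_stub and not new_stub):
        buckets.add("Increased")
    if new_pct < saved_pct and not saved_stub and not new_stub:
        buckets.add("Decreased")
    if (
        new_pct == 1.0
        and saved_pct == 1.0
        and new.get("effective", False) != saved.get("effective", False)
    ):
        buckets.add("Compiler entropy")
    return buckets


# Made-up sample data, keyed by original address
saved_data = {
    "0x1000": {"matching": 0.5},
    "0x2000": {"matching": 1.0},
    "0x3000": {"matching": 0.9},
}
new_data = {
    "0x1000": {"matching": 0.75},
    "0x3000": {"matching": 0.8},
    "0x4000": {"matching": 1.0},
}

report = {
    addr: classify(saved_data.get(addr), new_data.get(addr))
    for addr in sorted(set(saved_data) | set(new_data))
}
```

As in the original comprehensions, the buckets are not mutually exclusive, which is why the sketch returns a set rather than a single label.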
--- a/tools/isledecomp/setup.py
+++ b/tools/isledecomp/setup.py
@ -1,11 +0,0 @@
 from setuptools import setup, find_packages
 setup(
    name="isledecomp",
    version="0.1.0",
    description="Python tools for the isledecomp project",
    packages=find_packages(),
    tests_require=["pytest"],
    include_package_data=True,
    package_data={"isledecomp.lib": ["*.exe", "*.dll"]},
 )
--- a/tools/isledecomp/tests/__init__.py
+++ b/tools/isledecomp/tests/__init__.py
--- a/tools/isledecomp/tests/conftest.py
+++ b/tools/isledecomp/tests/conftest.py
@ -1,3 +0,0 @@
 def pytest_addoption(parser):
    """Allow the option to run tests against the original LEGO1.DLL."""
    parser.addoption("--lego1", action="store", help="Path to LEGO1.DLL")
--- a/tools/isledecomp/tests/samples/basic_class.cpp
+++ b/tools/isledecomp/tests/samples/basic_class.cpp
@ -1,30 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // A very simple class
 // VTABLE: TEST 0x1001002
 class TestClass {
 public:
  TestClass();
  virtual ~TestClass() override;
  virtual MxResult Tickle() override; // vtable+08
  // FUNCTION: TEST 0x12345678
  inline const char* ClassName() const // vtable+0c
  {
    // 0xabcd1234
    return "TestClass";
  }
  // FUNCTION: TEST 0xdeadbeef
  inline MxBool IsA(const char* name) const override // vtable+10
  {
    return !strcmp(name, TestClass::ClassName());
  }
 private:
  int m_hello;
  int m_hiThere;
 };
--- a/tools/isledecomp/tests/samples/basic_file.cpp
+++ b/tools/isledecomp/tests/samples/basic_file.cpp
@ -1,22 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // A very simple well-formed code file
 // FUNCTION: TEST 0x1234
 void function01()
 {
  // TODO
 }
 // FUNCTION: TEST 0x2345
 void function02()
 {
  // TODO
 }
 // FUNCTION: TEST 0x3456
 void function03()
 {
  // TODO
 }
--- a/tools/isledecomp/tests/samples/global_variables.cpp
+++ b/tools/isledecomp/tests/samples/global_variables.cpp
@ -1,14 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // Global variables inside and outside of functions
 // GLOBAL: TEST 0x1000
 const char *g_message = "test";
 // FUNCTION: TEST 0x1234
 void function01()
 {
  // GLOBAL: TEST 0x5555
  static int g_hello = 123;
 }
--- a/tools/isledecomp/tests/samples/inline.cpp
+++ b/tools/isledecomp/tests/samples/inline.cpp
@ -1,8 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // FUNCTION: TEST 0x10000001
 inline const char* OneLineWithComment() const { return "MxDSObject"; }; // hi there
 // FUNCTION: TEST 0x10000002
 inline const char* OneLine() const { return "MxDSObject"; };
--- a/tools/isledecomp/tests/samples/missing_offset.cpp
+++ b/tools/isledecomp/tests/samples/missing_offset.cpp
@ -1,16 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 #include <stdio.h>
 int no_offset_comment()
 {
  static int dummy = 123;
  return -1;
 }
 // FUNCTION: TEST 0xdeadbeef
 void regular_ole_function()
 {
  printf("hi there");
 }
--- a/tools/isledecomp/tests/samples/multiple_offsets.cpp
+++ b/tools/isledecomp/tests/samples/multiple_offsets.cpp
@ -1,25 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // Handling multiple offset markers
 // FUNCTION: TEST 0x1234
 // FUNCTION: HELLO 0x5555
 void different_modules()
 {
  // TODO
 }
 // FUNCTION: TEST 0x2345
 // FUNCTION: TEST 0x1234
 void same_module()
 {
  // TODO
 }
 // FUNCTION: TEST 0x2002
 // FUNCTION: test 0x1001
 void same_case_insensitive()
 {
  // TODO
 }
--- a/tools/isledecomp/tests/samples/oneline_function.cpp
+++ b/tools/isledecomp/tests/samples/oneline_function.cpp
@ -1,12 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // FUNCTION: TEST 0x1234
 void short_function() { static char* msg = "oneliner"; }
 // FUNCTION: TEST 0x5555
 void function_after_one_liner()
 {
  // This function comes after the previous that is on a single line.
  // Do we report the offset for this one correctly?
 }
--- a/tools/isledecomp/tests/samples/out_of_order.cpp
+++ b/tools/isledecomp/tests/samples/out_of_order.cpp
@ -1,20 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // FUNCTION: TEST 0x1001
 void function_order01()
 {
    // TODO
 }
 // FUNCTION: TEST 0x1003
 void function_order03()
 {
    // TODO
 }
 // FUNCTION: TEST 0x1002
 void function_order02()
 {
    // TODO
 }
--- a/tools/isledecomp/tests/samples/poorly_formatted.cpp
+++ b/tools/isledecomp/tests/samples/poorly_formatted.cpp
@ -1,23 +0,0 @@
 // Sample for python unit tests
 // Not part of the decomp
 // While it's reasonable to expect a well-formed file (and clang-format
 // will make sure we get one), this will put the parser through its paces.
 // FUNCTION: TEST 0x1234
 void curly_with_spaces()
  {
  static char* msg = "hello";
  }
 // FUNCTION: TEST 0x5555
 void weird_closing_curly()
 {
  int x = 123; }
 // FUNCTION: HELLO 0x5656
 void bad_indenting() {
  if (0)
 {
  int y = 5;
 }}
--- a/tools/isledecomp/tests/test_compare_db.py
+++ b/tools/isledecomp/tests/test_compare_db.py
@ -1,82 +0,0 @@
 """Testing compare database behavior, particularly matching"""
 import pytest
 from isledecomp.compare.db import CompareDb
@pytest.fixture(name="db")
 def fixture_db():
    return CompareDb()
 def test_ignore_recomp_collision(db):
    """Duplicate recomp addresses are ignored"""
    db.set_recomp_symbol(0x1234, None, "hello", None, 100)
    db.set_recomp_symbol(0x1234, None, "alias_for_hello", None, 100)
    syms = db.get_all()
    assert len(syms) == 1
 def test_orig_collision(db):
    """Don't match if the original address is not unique"""
    db.set_recomp_symbol(0x1234, None, "hello", None, 100)
    assert db.match_function(0x5555, "hello") is True
    # Second run on same address fails
    assert db.match_function(0x5555, "hello") is False
    # Call set_pair directly without wrapper
    assert db.set_pair(0x5555, 0x1234) is False
 def test_name_match(db):
    db.set_recomp_symbol(0x1234, None, "hello", None, 100)
    assert db.match_function(0x5555, "hello") is True
    match = db.get_by_orig(0x5555)
    assert match.name == "hello"
    assert match.recomp_addr == 0x1234
 def test_match_decorated(db):
    """Should match using decorated name even though regular name is null"""
    db.set_recomp_symbol(0x1234, None, None, "?_hello", 100)
    assert db.match_function(0x5555, "?_hello") is True
    match = db.get_by_orig(0x5555)
    assert match is not None
 def test_duplicate_name(db):
    """If recomp name is not unique, match only one row"""
    db.set_recomp_symbol(0x100, None, "_Construct", None, 100)
    db.set_recomp_symbol(0x200, None, "_Construct", None, 100)
    db.set_recomp_symbol(0x300, None, "_Construct", None, 100)
    db.match_function(0x5555, "_Construct")
    matches = db.get_matches()
    # We aren't testing _which_ one would be matched, just that only one _was_ matched
    assert len(matches) == 1
 def test_static_variable_match(db):
    """Set up a situation where we can match a static function variable, then match it."""
    # We need a matched function to start with.
    db.set_recomp_symbol(0x1234, None, "Isle::Tick", "?Tick@IsleApp@@QAEXH@Z", 100)
    db.match_function(0x5555, "Isle::Tick")
    # Decorated variable name from PDB.
    db.set_recomp_symbol(
        0x2000, None, None, "?g_startupDelay@?1??Tick@IsleApp@@QAEXH@Z@4HA", 4
    )
    # Provide variable name and orig function address from decomp markers
    assert db.match_static_variable(0xBEEF, "g_startupDelay", 0x5555) is True
 def test_match_options_bool(db):
    """Test handling of boolean match options"""
    # You don't actually need an existing orig addr for this.
    assert db.get_match_options(0x1234) == {}
    db.mark_stub(0x1234)
    assert "stub" in db.get_match_options(0x1234)
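The matching rules exercised by these tests (duplicate recomp addresses are ignored, an original address can be paired only once, and a non-unique name matches a single row) can be sketched with an in-memory table. Everything below is an assumption for illustration: `TinyCompareDb`, its schema, and the choice of `sqlite3` are hypothetical and say nothing about how the real `CompareDb` is built.

```python
import sqlite3


class TinyCompareDb:
    """Hypothetical stand-in for a compare database, for illustration only."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        # UNIQUE columns still allow many NULLs in SQLite, so unmatched rows
        # can all have orig_addr = NULL.
        self._db.execute(
            "CREATE TABLE symbols ("
            "orig_addr INTEGER UNIQUE, recomp_addr INTEGER UNIQUE, name TEXT)"
        )

    def set_recomp_symbol(self, recomp_addr, name):
        # A second symbol at an address we already know is silently dropped.
        self._db.execute(
            "INSERT OR IGNORE INTO symbols (recomp_addr, name) VALUES (?, ?)",
            (recomp_addr, name),
        )

    def match_function(self, orig_addr, name):
        # Refuse to reuse an original address that is already paired.
        used = self._db.execute(
            "SELECT 1 FROM symbols WHERE orig_addr = ?", (orig_addr,)
        ).fetchone()
        if used:
            return False
        # Pick exactly one unmatched row with this name.
        row = self._db.execute(
            "SELECT recomp_addr FROM symbols "
            "WHERE name = ? AND orig_addr IS NULL "
            "ORDER BY recomp_addr LIMIT 1",
            (name,),
        ).fetchone()
        if row is None:
            return False
        self._db.execute(
            "UPDATE symbols SET orig_addr = ? WHERE recomp_addr = ?",
            (orig_addr, row[0]),
        )
        return True

    def get_all(self):
        return self._db.execute("SELECT * FROM symbols").fetchall()

    def get_matches(self):
        return self._db.execute(
            "SELECT * FROM symbols WHERE orig_addr IS NOT NULL"
        ).fetchall()
```

The sketch picks the lowest unmatched recomp address when a name is duplicated. The tests above deliberately do not check which row is chosen, so that ordering is a design choice of this sketch only.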
--- a/tools/isledecomp/tests/test_curly.py
+++ b/tools/isledecomp/tests/test_curly.py
@ -1,73 +0,0 @@
 # nyuk nyuk nyuk
 import pytest
 from isledecomp.parser.parser import CurlyManager
 from isledecomp.parser.util import sanitize_code_line
@pytest.fixture(name="curly")
 def fixture_curly():
    return CurlyManager()
 def test_simple(curly):
    curly.read_line("namespace Test {")
    assert curly.get_prefix() == "Test"
    curly.read_line("}")
    assert curly.get_prefix() == ""
 def test_oneliner(curly):
    """Should not go down into a scope for a class forward reference"""
    curly.read_line("class LegoEntity;")
    assert curly.get_prefix() == ""
    # Now make sure that we still would not consider that class name
    # even after reading the opening curly brace
    curly.read_line("if (true) {")
    assert curly.get_prefix() == ""
 def test_ignore_comments(curly):
    curly.read_line("namespace Test {")
    curly.read_line("// }")
    assert curly.get_prefix() == "Test"
@pytest.mark.xfail(reason="todo: need a real lexer")
 def test_ignore_multiline_comments(curly):
    curly.read_line("namespace Test {")
    curly.read_line("/*")
    curly.read_line("}")
    curly.read_line("*/")
    assert curly.get_prefix() == "Test"
    curly.read_line("}")
    assert curly.get_prefix() == ""
 def test_nested(curly):
    curly.read_line("namespace Test {")
    curly.read_line("namespace Foo {")
    assert curly.get_prefix() == "Test::Foo"
    curly.read_line("}")
    assert curly.get_prefix() == "Test"
 sanitize_cases = [
    ("", ""),
    ("   ", ""),
    ("{", "{"),
    ("// comments {", ""),
    ("{ // why comment here", "{"),
    ("/* comments */ {", "{"),
    ('"curly in a string {"', '""'),
    ('if (!strcmp("hello { there }", g_test)) {', 'if (!strcmp("", g_test)) {'),
    ("'{'", "''"),
    ("weird_function('\"', hello, '\"')", "weird_function('', hello, '')"),
 ]
@pytest.mark.parametrize("start, end", sanitize_cases)
 def test_sanitize(start: str, end: str):
    """Make sure that we can remove curly braces in places where they should
    not be considered as part of the semantic structure of the file.
    i.e. inside strings or chars, and inside comments"""
    assert sanitize_code_line(start) == end
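The behaviour that `sanitize_cases` describes (drop comments, empty out string and char literals, trim whitespace) can be approximated with one regular-expression pass. This is a sketch under stated assumptions: `sanitize_sketch` is a hypothetical name, it is not the project's `sanitize_code_line`, and like the tests above it handles only single-line input (a `/*` without its closing `*/` on the same line is left alone).

```python
import re

# One alternation, scanned left to right, so a quote inside a comment or a
# comment marker inside a string is consumed by whichever construct starts first.
_TOKEN = re.compile(
    "|".join(
        [
            r"(?P<block>/\*.*?\*/)",       # complete /* ... */ on one line
            r"(?P<line>//.*$)",            # // comment to end of line
            r'(?P<dq>"(?:\\.|[^"\\])*")',  # double-quoted string literal
            r"(?P<sq>'(?:\\.|[^'\\])*')",  # single-quoted char literal
        ]
    )
)


def sanitize_sketch(line: str) -> str:
    """Remove comments and literal contents so braces inside them are not counted."""

    def replace(match):
        if match.group("dq") is not None:
            return '""'
        if match.group("sq") is not None:
            return "''"
        return ""  # comments disappear entirely

    return _TOKEN.sub(replace, line).strip()
```

For example, `sanitize_sketch('if (!strcmp("hello { there }", g_test)) {')` returns `'if (!strcmp("", g_test)) {'`, leaving only the brace that actually opens a scope.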
--- a/tools/isledecomp/tests/test_cvdump.py
+++ b/tools/isledecomp/tests/test_cvdump.py
@ -1,59 +0,0 @@
 import pytest
 from isledecomp.cvdump.types import (
    scalar_type_size,
    scalar_type_pointer,
    scalar_type_signed,
 )
 # These are all the types seen in the cvdump.
 # We have char, short, int, long and long long represented in both signed
 # and unsigned forms, plus float and double, which are always signed.
 # We can also identify a 4-byte pointer by the T_32P prefix.
 # The type T_VOID is used to designate a function's return type.
 # T_NOTYPE is specified as the type of "this" for a static function in a class.
 # For reference: https://github.com/microsoft/microsoft-pdb/blob/master/include/cvinfo.h
 # fmt: off
 # Fields are: type_name, size, is_signed, is_pointer
 type_check_cases = (
    ("T_32PINT4",      4,  False,  True),
    ("T_32PLONG",      4,  False,  True),
    ("T_32PRCHAR",     4,  False,  True),
    ("T_32PREAL32",    4,  False,  True),
    ("T_32PUCHAR",     4,  False,  True),
    ("T_32PUINT4",     4,  False,  True),
    ("T_32PULONG",     4,  False,  True),
    ("T_32PUSHORT",    4,  False,  True),
    ("T_32PVOID",      4,  False,  True),
    ("T_CHAR",         1,  True,   False),
    ("T_INT4",         4,  True,   False),
    ("T_LONG",         4,  True,   False),
    ("T_QUAD",         8,  True,   False),
    ("T_RCHAR",        1,  True,   False),
    ("T_REAL32",       4,  True,   False),
    ("T_REAL64",       8,  True,   False),
    ("T_SHORT",        2,  True,   False),
    ("T_UCHAR",        1,  False,  False),
    ("T_UINT4",        4,  False,  False),
    ("T_ULONG",        4,  False,  False),
    ("T_UQUAD",        8,  False,  False),
    ("T_USHORT",       2,  False,  False),
    ("T_WCHAR",        2,  False,  False),
 )
 # fmt: on
@pytest.mark.parametrize("type_name, size, _, __", type_check_cases)
 def test_scalar_size(type_name: str, size: int, _, __):
    assert scalar_type_size(type_name) == size
@pytest.mark.parametrize("type_name, _, is_signed, __", type_check_cases)
 def test_scalar_signed(type_name: str, _, is_signed: bool, __):
    assert scalar_type_signed(type_name) == is_signed
@pytest.mark.parametrize("type_name, _, __, is_pointer", type_check_cases)
 def test_scalar_pointer(type_name: str, _, __, is_pointer: bool):
    assert scalar_type_pointer(type_name) == is_pointer
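The table in `type_check_cases` follows a regular naming pattern, which a small lookup can reproduce. The helpers below are hypothetical (`sketch_is_pointer`, `sketch_size`, `sketch_is_signed`) and are inferred only from the rows listed in the test. They are not the implementation in `isledecomp.cvdump.types`.

```python
# Sizes in bytes for the non-pointer names listed in the test table,
# keyed by the part after the "T_" prefix.
_SIZES = {
    "CHAR": 1, "RCHAR": 1, "UCHAR": 1,
    "SHORT": 2, "USHORT": 2, "WCHAR": 2,
    "INT4": 4, "UINT4": 4, "LONG": 4, "ULONG": 4, "REAL32": 4,
    "QUAD": 8, "UQUAD": 8, "REAL64": 8,
}


def sketch_is_pointer(type_name: str) -> bool:
    # Every pointer row in the table starts with T_32P (32-bit near pointer).
    return type_name.startswith("T_32P")


def sketch_size(type_name: str) -> int:
    if sketch_is_pointer(type_name):
        return 4
    return _SIZES[type_name[2:]]  # strip the leading "T_"


def sketch_is_signed(type_name: str) -> bool:
    if sketch_is_pointer(type_name):
        return False
    base = type_name[2:]
    # In the table, unsigned names begin with "U"; T_WCHAR is also unsigned.
    return not base.startswith("U") and base != "WCHAR"
```

A name outside the table raises `KeyError` in `sketch_size`, which is a simplification of this sketch rather than a statement about how the real functions handle unknown types.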
--- a/tools/isledecomp/tests/test_cvdump_types.py
+++ b/tools/isledecomp/tests/test_cvdump_types.py
@ -1,465 +0,0 @@
 """Specifically testing the Cvdump TYPES parser
 and type dependency tree walker."""
 import pytest
 from isledecomp.cvdump.types import (
    CvdumpTypesParser,
    CvdumpKeyError,
    CvdumpIntegrityError,
 )
 TEST_LINES = """
 0x1028 : Length = 10, Leaf = 0x1001 LF_MODIFIER
    const, modifies type T_REAL32(0040)
 0x103b : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = T_REAL32(0040)
    Index type = T_SHORT(0011)
    length = 16
    Name =
 0x103c : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = 0x103B
    Index type = T_SHORT(0011)
    length = 64
    Name =
 0x10e0 : Length = 86, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_MEMBER, public, type = T_REAL32(0040), offset = 0
        member name = 'x'
    list[1] = LF_MEMBER, public, type = T_REAL32(0040), offset = 0
        member name = 'dvX'
    list[2] = LF_MEMBER, public, type = T_REAL32(0040), offset = 4
        member name = 'y'
    list[3] = LF_MEMBER, public, type = T_REAL32(0040), offset = 4
        member name = 'dvY'
    list[4] = LF_MEMBER, public, type = T_REAL32(0040), offset = 8
        member name = 'z'
    list[5] = LF_MEMBER, public, type = T_REAL32(0040), offset = 8
        member name = 'dvZ'
 0x10e1 : Length = 34, Leaf = 0x1505 LF_STRUCTURE
    # members = 6,  field list type 0x10e0,
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 12, class name = _D3DVECTOR, UDT(0x000010e1)
 0x10e4 : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = T_UCHAR(0020)
    Index type = T_SHORT(0011)
    length = 8
    Name = 
 0x10ea : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = 0x1028
    Index type = T_SHORT(0011)
    length = 12
    Name = 
 0x11f0 : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 0,  field list type 0x0000, FORWARD REF, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 0, class name = MxRect32, UDT(0x00001214)
 0x11f2 : Length = 10, Leaf = 0x1001 LF_MODIFIER
    const, modifies type 0x11F0
 0x1213 : Length = 530, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_METHOD, count = 5, list = 0x1203, name = 'MxRect32'
    list[1] = LF_ONEMETHOD, public, VANILLA, index = 0x1205, name = 'operator='
    list[2] = LF_ONEMETHOD, public, VANILLA, index = 0x11F5, name = 'Intersect'
    list[3] = LF_ONEMETHOD, public, VANILLA, index = 0x1207, name = 'SetPoint'
    list[4] = LF_ONEMETHOD, public, VANILLA, index = 0x1207, name = 'AddPoint'
    list[5] = LF_ONEMETHOD, public, VANILLA, index = 0x1207, name = 'SubtractPoint'
    list[6] = LF_ONEMETHOD, public, VANILLA, index = 0x11F5, name = 'UpdateBounds'
    list[7] = LF_ONEMETHOD, public, VANILLA, index = 0x1209, name = 'IsValid'
    list[8] = LF_ONEMETHOD, public, VANILLA, index = 0x120A, name = 'IntersectsWith'
    list[9] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetWidth'
    list[10] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetHeight'
    list[11] = LF_ONEMETHOD, public, VANILLA, index = 0x120C, name = 'GetPoint'
    list[12] = LF_ONEMETHOD, public, VANILLA, index = 0x120D, name = 'GetSize'
    list[13] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetLeft'
    list[14] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetTop'
    list[15] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetRight'
    list[16] = LF_ONEMETHOD, public, VANILLA, index = 0x120B, name = 'GetBottom'
    list[17] = LF_ONEMETHOD, public, VANILLA, index = 0x120E, name = 'SetLeft'
    list[18] = LF_ONEMETHOD, public, VANILLA, index = 0x120E, name = 'SetTop'
    list[19] = LF_ONEMETHOD, public, VANILLA, index = 0x120E, name = 'SetRight'
    list[20] = LF_ONEMETHOD, public, VANILLA, index = 0x120E, name = 'SetBottom'
    list[21] = LF_METHOD, count = 3, list = 0x1211, name = 'CopyFrom'
    list[22] = LF_ONEMETHOD, private, STATIC, index = 0x1212, name = 'Min'
    list[23] = LF_ONEMETHOD, private, STATIC, index = 0x1212, name = 'Max'
    list[24] = LF_MEMBER, private, type = T_INT4(0074), offset = 0
        member name = 'm_left'
    list[25] = LF_MEMBER, private, type = T_INT4(0074), offset = 4
        member name = 'm_top'
    list[26] = LF_MEMBER, private, type = T_INT4(0074), offset = 8
        member name = 'm_right'
    list[27] = LF_MEMBER, private, type = T_INT4(0074), offset = 12
        member name = 'm_bottom'
 0x1214 : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 34,  field list type 0x1213, CONSTRUCTOR, OVERLOAD, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 16, class name = MxRect32, UDT(0x00001214)
 0x1220 : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 0,  field list type 0x0000, FORWARD REF, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 0, class name = MxCore, UDT(0x00004060)
 0x14db : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 0,  field list type 0x0000, FORWARD REF, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 0, class name = MxString, UDT(0x00004db6)
 0x19b0 : Length = 34, Leaf = 0x1505 LF_STRUCTURE
    # members = 0,  field list type 0x0000, FORWARD REF, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 0, class name = ROIColorAlias, UDT(0x00002a76)
 0x19b1 : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = 0x19B0
    Index type = T_SHORT(0011)
    length = 440
    Name =
 0x2a75 : Length = 98, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_MEMBER, public, type = T_32PRCHAR(0470), offset = 0
        member name = 'm_name'
    list[1] = LF_MEMBER, public, type = T_INT4(0074), offset = 4
        member name = 'm_red'
    list[2] = LF_MEMBER, public, type = T_INT4(0074), offset = 8
        member name = 'm_green'
    list[3] = LF_MEMBER, public, type = T_INT4(0074), offset = 12
        member name = 'm_blue'
    list[4] = LF_MEMBER, public, type = T_INT4(0074), offset = 16
        member name = 'm_unk0x10'
 0x2a76 : Length = 34, Leaf = 0x1505 LF_STRUCTURE
    # members = 5,  field list type 0x2a75, 
    Derivation list type 0x0000, VT shape type 0x0000
    Size = 20, class name = ROIColorAlias, UDT(0x00002a76)
 0x22d4 : Length = 154, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_VFUNCTAB, type = 0x20FC
    list[1] = LF_METHOD, count = 3, list = 0x22D0, name = 'MxVariable'
    list[2] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x1F0F, 
        vfptr offset = 0, name = 'GetValue'
    list[3] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x1F10, 
        vfptr offset = 4, name = 'SetValue'
    list[4] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x1F11, 
        vfptr offset = 8, name = '~MxVariable'
    list[5] = LF_ONEMETHOD, public, VANILLA, index = 0x22D3, name = 'GetKey'
    list[6] = LF_MEMBER, protected, type = 0x14DB, offset = 4
        member name = 'm_key'
    list[7] = LF_MEMBER, protected, type = 0x14DB, offset = 20
        member name = 'm_value'
 0x22d5 : Length = 34, Leaf = 0x1504 LF_CLASS
    # members = 10,  field list type 0x22d4, CONSTRUCTOR, 
    Derivation list type 0x0000, VT shape type 0x20fb
    Size = 36, class name = MxVariable, UDT(0x00004041)
 0x3cc2 : Length = 38, Leaf = 0x1507 LF_ENUM
    # members = 64,  type = T_INT4(0074) field list type 0x3cc1
 NESTED,     enum name = JukeBox::JukeBoxScript, UDT(0x00003cc2)
 0x3fab : Length = 10, Leaf = 0x1002 LF_POINTER
    Pointer (NEAR32), Size: 0
    Element type : 0x3FAA
 0x405f : Length = 158, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_VFUNCTAB, type = 0x2090
    list[1] = LF_ONEMETHOD, public, VANILLA, index = 0x176A, name = 'MxCore'
    list[2] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x176A, 
        vfptr offset = 0, name = '~MxCore'
    list[3] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x176B, 
        vfptr offset = 4, name = 'Notify'
    list[4] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x2087, 
        vfptr offset = 8, name = 'Tickle'
    list[5] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x202F, 
        vfptr offset = 12, name = 'ClassName'
    list[6] = LF_ONEMETHOD, public, INTRODUCING VIRTUAL, index = 0x2030, 
        vfptr offset = 16, name = 'IsA'
    list[7] = LF_ONEMETHOD, public, VANILLA, index = 0x2091, name = 'GetId'
    list[8] = LF_MEMBER, private, type = T_UINT4(0075), offset = 4
        member name = 'm_id'
 0x4060 : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 9,  field list type 0x405f, CONSTRUCTOR, 
    Derivation list type 0x0000, VT shape type 0x1266
    Size = 8, class name = MxCore, UDT(0x00004060)
 0x4262 : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = 0x3CC2
    Index type = T_SHORT(0011)
    length = 24
    Name = 
 0x432f : Length = 14, Leaf = 0x1503 LF_ARRAY
    Element type = T_INT4(0074)
    Index type = T_SHORT(0011)
    length = 12
    Name =
 0x4db5 : Length = 246, Leaf = 0x1203 LF_FIELDLIST
    list[0] = LF_BCLASS, public, type = 0x1220, offset = 0
    list[1] = LF_METHOD, count = 3, list = 0x14E3, name = 'MxString'
    list[2] = LF_ONEMETHOD, public, VIRTUAL, index = 0x14DE, name = '~MxString'
    list[3] = LF_METHOD, count = 2, list = 0x14E7, name = 'operator='
    list[4] = LF_ONEMETHOD, public, VANILLA, index = 0x14DE, name = 'ToUpperCase'
    list[5] = LF_ONEMETHOD, public, VANILLA, index = 0x14DE, name = 'ToLowerCase'
    list[6] = LF_ONEMETHOD, public, VANILLA, index = 0x14E8, name = 'operator+'
    list[7] = LF_ONEMETHOD, public, VANILLA, index = 0x14E9, name = 'operator+='
    list[8] = LF_ONEMETHOD, public, VANILLA, index = 0x14EB, name = 'Compare'
    list[9] = LF_ONEMETHOD, public, VANILLA, index = 0x14EC, name = 'GetData'
    list[10] = LF_ONEMETHOD, public, VANILLA, index = 0x4DB4, name = 'GetLength'
    list[11] = LF_MEMBER, private, type = T_32PRCHAR(0470), offset = 8
        member name = 'm_data'
    list[12] = LF_MEMBER, private, type = T_USHORT(0021), offset = 12
        member name = 'm_length'
 0x4db6 : Length = 30, Leaf = 0x1504 LF_CLASS
    # members = 16,  field list type 0x4db5, CONSTRUCTOR, OVERLOAD, 
    Derivation list type 0x0000, VT shape type 0x1266
    Size = 16, class name = MxString, UDT(0x00004db6)
 """
@pytest.fixture(name="parser")
 def types_parser_fixture():
    parser = CvdumpTypesParser()
    for line in TEST_LINES.split("\n"):
        parser.read_line(line)
    return parser
 def test_basic_parsing(parser):
    obj = parser.keys["0x4db6"]
    assert obj["type"] == "LF_CLASS"
    assert obj["name"] == "MxString"
    assert obj["udt"] == "0x4db6"
    assert len(parser.keys["0x4db5"]["members"]) == 2
 def test_scalar_types(parser):
    """Full tests on the scalar_* methods are in another file.
    Here we are just testing the passthrough of the "T_" types."""
    assert parser.get("T_CHAR").name is None
    assert parser.get("T_CHAR").size == 1
    assert parser.get("T_32PVOID").name is None
    assert parser.get("T_32PVOID").size == 4
 def test_resolve_forward_ref(parser):
    # Non-forward ref
    assert parser.get("0x22d5").name == "MxVariable"
    # Forward ref
    assert parser.get("0x14db").name == "MxString"
    assert parser.get("0x14db").size == 16
 def test_members(parser):
    """Return the list of items to compare for a given complex type.
    If the class has a superclass, add those members too."""
    # MxCore field list
    mxcore_members = parser.get_scalars("0x405f")
    assert mxcore_members == [
        (0, "vftable", "T_32PVOID"),
        (4, "m_id", "T_UINT4"),
    ]
    # MxCore class id. Should be the same members
    assert mxcore_members == parser.get_scalars("0x4060")
    # MxString field list. Should add inherited members from MxCore
    assert parser.get_scalars("0x4db5") == [
        (0, "vftable", "T_32PVOID"),
        (4, "m_id", "T_UINT4"),
        (8, "m_data", "T_32PRCHAR"),
        (12, "m_length", "T_USHORT"),
    ]
 def test_members_recursive(parser):
    """Make sure that we unwrap the dependency tree correctly."""
    # MxVariable field list
    assert parser.get_scalars("0x22d4") == [
        (0, "vftable", "T_32PVOID"),
        (4, "m_key.vftable", "T_32PVOID"),
        (8, "m_key.m_id", "T_UINT4"),
        (12, "m_key.m_data", "T_32PRCHAR"),
        (16, "m_key.m_length", "T_USHORT"),  # with padding
        (20, "m_value.vftable", "T_32PVOID"),
        (24, "m_value.m_id", "T_UINT4"),
        (28, "m_value.m_data", "T_32PRCHAR"),
        (32, "m_value.m_length", "T_USHORT"),  # with padding
    ]
 def test_struct(parser):
    """Basic test for converting type into struct.unpack format string."""
    # MxCore: vftable and uint32. The vftable pointer is read as uint32.
    assert parser.get_format_string("0x4060") == "<LL"
    # _D3DVECTOR, three floats. Union types should already be removed.
    assert parser.get_format_string("0x10e1") == "<fff"
    # MxRect32, four signed ints.
    assert parser.get_format_string("0x1214") == "<llll"
 def test_struct_padding(parser):
    """For data comparison purposes, make sure we have no gaps in the
    list of scalar types. Any gap is filled by an unsigned char."""
    # MxString, padded to 16 bytes. 4 actual members. 2 bytes of padding.
    assert len(parser.get_scalars("0x4db6")) == 4
    assert len(parser.get_scalars_gapless("0x4db6")) == 6
    # MxVariable, with two MxStrings (and a vtable)
    # Fill in the middle gap and the outer gap.
    assert len(parser.get_scalars("0x22d5")) == 9
    assert len(parser.get_scalars_gapless("0x22d5")) == 13
 def test_struct_format_string(parser):
    """Generate the struct.unpack format string using the
    list of scalars with padding filled in."""
    # MxString, padded to 16 bytes.
    assert parser.get_format_string("0x4db6") == "<LLLHBB"
    # MxVariable, with two MxString members.
    assert parser.get_format_string("0x22d5") == "<LLLLHBBLLLHBB"
 def test_array(parser):
    """LF_ARRAY members are created dynamically based on the
    total array size and the size of one element."""
    # unsigned char[8]
    assert parser.get_scalars("0x10e4") == [
        (0, "[0]", "T_UCHAR"),
        (1, "[1]", "T_UCHAR"),
        (2, "[2]", "T_UCHAR"),
        (3, "[3]", "T_UCHAR"),
        (4, "[4]", "T_UCHAR"),
        (5, "[5]", "T_UCHAR"),
        (6, "[6]", "T_UCHAR"),
        (7, "[7]", "T_UCHAR"),
    ]
    # float[4]
    assert parser.get_scalars("0x103b") == [
        (0, "[0]", "T_REAL32"),
        (4, "[1]", "T_REAL32"),
        (8, "[2]", "T_REAL32"),
        (12, "[3]", "T_REAL32"),
    ]
 def test_2d_array(parser):
    """Make sure 2d array elements are named as we expect."""
    # float[4][4]
    float_array = parser.get_scalars("0x103c")
    assert len(float_array) == 16
    assert float_array[0] == (0, "[0][0]", "T_REAL32")
    assert float_array[1] == (4, "[0][1]", "T_REAL32")
    assert float_array[4] == (16, "[1][0]", "T_REAL32")
    assert float_array[-1] == (60, "[3][3]", "T_REAL32")
 def test_enum(parser):
    """LF_ENUM should equal 4-byte int"""
    assert parser.get("0x3cc2").size == 4
    assert parser.get_scalars("0x3cc2") == [(0, None, "T_INT4")]
    # Now look at an array of enum, 24 bytes
    enum_array = parser.get_scalars("0x4262")
    assert len(enum_array) == 6  # 24 / 4
    assert enum_array[0].size == 4
 def test_lf_pointer(parser):
    """LF_POINTER is just a wrapper for scalar pointer type"""
    assert parser.get("0x3fab").size == 4
    # assert parser.get("0x3fab").is_pointer is True  # TODO: ?
    assert parser.get_scalars("0x3fab") == [(0, None, "T_32PVOID")]
 def test_key_not_exist(parser):
    """Accessing a non-existent type id should raise our exception"""
    with pytest.raises(CvdumpKeyError):
        parser.get("0xbeef")
    with pytest.raises(CvdumpKeyError):
        parser.get_scalars("0xbeef")
 def test_broken_forward_ref(parser):
    """Raise an exception if we cannot follow a forward reference"""
    # Verify forward reference on MxCore
    parser.get("0x1220")
    # Delete the MxCore LF_CLASS
    del parser.keys["0x4060"]
    # Forward ref via 0x1220 will fail
    with pytest.raises(CvdumpKeyError):
        parser.get("0x1220")
 def test_null_forward_ref(parser):
    """If the forward ref object is invalid and has no forward ref id,
    raise an exception."""
    # Test MxString forward reference
    parser.get("0x14db")
    # Delete the UDT for MxString
    del parser.keys["0x14db"]["udt"]
    # Cannot complete the forward reference lookup
    with pytest.raises(CvdumpIntegrityError):
        parser.get("0x14db")
 def test_broken_array_element_ref(parser):
    # Test LF_ARRAY of ROIColorAlias
    parser.get("0x19b1")
    # Delete ROIColorAlias
    del parser.keys["0x19b0"]
    # Type reference lookup will fail
    with pytest.raises(CvdumpKeyError):
        parser.get("0x19b1")
 def test_lf_modifier(parser):
    """Is this an alias for another type?"""
    # Modifies float
    assert parser.get("0x1028").size == 4
    assert parser.get_scalars("0x1028") == [(0, None, "T_REAL32")]
    mxrect = parser.get_scalars("0x1214")
    # Modifies MxRect32 via forward ref
    assert mxrect == parser.get_scalars("0x11f2")
 def test_union_members(parser):
    """If there is a union somewhere in our dependency list, we can
    expect to see duplicated member offsets and names. This is ok for
    the TypeInfo tuple, but the list of ScalarType items should have
    unique offset to simplify comparison."""
    # D3DVector type with duplicated offsets
    d3dvector = parser.get("0x10e1")
    assert len(d3dvector.members) == 6
    assert len([m for m in d3dvector.members if m.offset == 0]) == 2
    # Deduplicated comparison list
    vector_items = parser.get_scalars("0x10e1")
    assert len(vector_items) == 3
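Note on `test_struct_padding` and `test_struct_format_string`: the gap-filling idea can be sketched as below. This is a minimal illustration and not the isledecomp implementation. The `Scalar` tuple, the `fill_gaps` and `format_string` helpers, the size and format tables, and the member names in `layout` are all assumptions made for the example; only the expected results (4 scalars, 6 gapless scalars, `"<LLLHBB"`) come from the tests.

```python
import struct
from typing import List, NamedTuple, Optional


class Scalar(NamedTuple):
    """Hypothetical stand-in for one (offset, name, type) scalar entry."""

    offset: int
    name: Optional[str]
    type: str


# Assumed sizes and struct format characters for the types used below.
SIZES = {"T_32PVOID": 4, "T_UINT4": 4, "T_USHORT": 2, "T_UCHAR": 1}
FORMAT_CHARS = {"T_32PVOID": "L", "T_UINT4": "L", "T_USHORT": "H", "T_UCHAR": "B"}


def fill_gaps(scalars: List[Scalar], total_size: int) -> List[Scalar]:
    """Insert one unnamed T_UCHAR for every byte not covered by a scalar."""
    result = []
    cursor = 0
    for scalar in sorted(scalars, key=lambda s: s.offset):
        while cursor < scalar.offset:
            result.append(Scalar(cursor, None, "T_UCHAR"))
            cursor += 1
        result.append(scalar)
        cursor = scalar.offset + SIZES[scalar.type]
    while cursor < total_size:
        result.append(Scalar(cursor, None, "T_UCHAR"))
        cursor += 1
    return result


def format_string(scalars: List[Scalar]) -> str:
    """Build a little-endian struct format string with no implicit alignment."""
    return "<" + "".join(FORMAT_CHARS[s.type] for s in scalars)


# Illustrative 16-byte layout with 4 members and 2 trailing padding bytes.
layout = [
    Scalar(0, "vftable", "T_32PVOID"),
    Scalar(4, "m_id", "T_UINT4"),
    Scalar(8, "m_data", "T_32PVOID"),
    Scalar(12, "m_length", "T_USHORT"),
]
gapless = fill_gaps(layout, 16)
fmt = format_string(gapless)
```

With the `<` prefix, `struct.calcsize(fmt)` counts exactly the bytes listed, so a gapless list always produces a format string whose size equals the size of the type.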
--- a/tools/isledecomp/tests/test_demangler.py
+++ b/tools/isledecomp/tests/test_demangler.py
@ -1,83 +0,0 @@
 import pytest
 from isledecomp.cvdump.demangler import (
    demangle_string_const,
    demangle_vtable,
    parse_encoded_number,
    InvalidEncodedNumberError,
    get_vtordisp_name,
 )
 string_demangle_cases = [
    ("??_C@_08LIDF@December?$AA@", 8, False),
    ("??_C@_0L@EGPP@english?9nz?$AA@", 11, False),
    (
        "??_C@_1O@POHA@?$AA?$CI?$AAn?$AAu?$AAl?$AAl?$AA?$CJ?$AA?$AA?$AA?$AA?$AA?$AH?$AA?$AA?$AA?$AA?$AA?$AA?$AA?$9A?$AE?$;I@",
        14,
        True,
    ),
    ("??_C@_00A@?$AA@", 0, False),
    ("??_C@_01A@?$AA?$AA@", 1, False),
 ]
@pytest.mark.parametrize("symbol, strlen, is_utf16", string_demangle_cases)
 def test_strings(symbol, is_utf16, strlen):
    s = demangle_string_const(symbol)
    assert s.len == strlen
    assert s.is_utf16 == is_utf16
 encoded_numbers = [
    ("A@", 0),
    ("AA@", 0),  # would never happen?
    ("P@", 15),
    ("BA@", 16),
    ("BCD@", 291),
 ]
@pytest.mark.parametrize("string, value", encoded_numbers)
 def test_encoded_numbers(string, value):
    assert parse_encoded_number(string) == value
 def test_invalid_encoded_number():
    with pytest.raises(InvalidEncodedNumberError):
        parse_encoded_number("Hello")
 vtable_cases = [
    ("??_7LegoCarBuildAnimPresenter@@6B@", "LegoCarBuildAnimPresenter::`vftable'"),
    ("??_7?$MxCollection@PAVLegoWorld@@@@6B@", "MxCollection<LegoWorld *>::`vftable'"),
    (
        "??_7?$MxPtrList@VLegoPathController@@@@6B@",
        "MxPtrList<LegoPathController>::`vftable'",
    ),
    ("??_7Renderer@Tgl@@6B@", "Tgl::Renderer::`vftable'"),
    ("??_7LegoExtraActor@@6B0@@", "LegoExtraActor::`vftable'{for `LegoExtraActor'}"),
    (
        "??_7LegoExtraActor@@6BLegoAnimActor@@@",
        "LegoExtraActor::`vftable'{for `LegoAnimActor'}",
    ),
    (
        "??_7LegoAnimActor@@6B?$LegoContainer@PAM@@@",
        "LegoAnimActor::`vftable'{for `LegoContainer<float *>'}",
    ),
 ]
@pytest.mark.parametrize("symbol, class_name", vtable_cases)
 def test_vtable(symbol, class_name):
    assert demangle_vtable(symbol) == class_name
 def test_vtordisp():
    """Make sure we can accurately detect an adjuster thunk symbol"""
    assert get_vtordisp_name("") is None
    assert get_vtordisp_name("?ClassName@LegoExtraActor@@UBEPBDXZ") is None
    assert (
        get_vtordisp_name("?ClassName@LegoExtraActor@@$4PPPPPPPM@A@BEPBDXZ") is not None
    )
    # A function called vtordisp
    assert get_vtordisp_name("?vtordisp@LegoExtraActor@@UBEPBDXZ") is None
--- a/tools/isledecomp/tests/test_instgen.py
+++ b/tools/isledecomp/tests/test_instgen.py
@ -1,212 +0,0 @@
 from isledecomp.compare.asm.instgen import InstructGen, SectionType
 def test_ret():
    """Make sure we can handle a function with one instruction."""
    ig = InstructGen(b"\xc3", 0)
    assert len(ig.sections) == 1
 SCORE_NOTIFY = (
    b"\x53\x56\x57\x8b\xd9\x33\xff\x8b\x74\x24\x10\x56\xe8\xbf\xe1\x01"
    b"\x00\x80\xbb\xf6\x00\x00\x00\x00\x0f\x84\x9c\x00\x00\x00\x8b\x4e"
    b"\x04\x49\x83\xf9\x17\x0f\x87\x8f\x00\x00\x00\x33\xc0\x8a\x81\xec"
    b"\x14\x00\x10\xff\x24\x85\xd4\x14\x00\x10\x8b\xcb\xbf\x01\x00\x00"
    b"\x00\xe8\x7a\x05\x00\x00\x8b\xc7\x5f\x5e\x5b\xc2\x04\x00\x56\x8b"
    b"\xcb\xe8\xaa\x00\x00\x00\x8b\xf8\x8b\xc7\x5f\x5e\x5b\xc2\x04\x00"
    b"\x80\x7e\x18\x20\x75\x07\x8b\xcb\xe8\xc3\xfe\xff\xff\xbf\x01\x00"
    b"\x00\x00\x8b\xc7\x5f\x5e\x5b\xc2\x04\x00\x56\x8b\xcb\xe8\x3e\x02"
    b"\x00\x00\x8b\xf8\x8b\xc7\x5f\x5e\x5b\xc2\x04\x00\x6a\x09\xa1\x4c"
    b"\x45\x0f\x10\x6a\x07\x50\xe8\x35\x45\x01\x00\x83\xc4\x0c\x8b\x83"
    b"\xf8\x00\x00\x00\x85\xc0\x74\x0d\x50\xe8\xa2\x42\x01\x00\x8b\xc8"
    b"\xe8\x9b\x9b\x03\x00\xbf\x01\x00\x00\x00\x8b\xc7\x5f\x5e\x5b\xc2"
    b"\x04\x00\x8b\xff\x4a\x14\x00\x10\x5e\x14\x00\x10\x70\x14\x00\x10"
    b"\x8a\x14\x00\x10\x9c\x14\x00\x10\xca\x14\x00\x10\x00\x01\x05\x05"
    b"\x05\x05\x02\x05\x05\x05\x05\x05\x05\x05\x05\x05\x03\x05\x05\x05"
    b"\x05\x05\x05\x04\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
 )
 def test_score_notify():
    """Score::Notify function from 0x10001410 in LEGO1.
    Good representative function for jump table (at 0x100014d4)
    and switch data (at 0x100014ec)."""
    ig = InstructGen(SCORE_NOTIFY, 0x10001410)
    # Did we get everything?
    assert len(ig.sections) == 3
    types_only = tuple(s.type for s in ig.sections)
    assert types_only == (SectionType.CODE, SectionType.ADDR_TAB, SectionType.DATA_TAB)
    # CODE section stopped at correct place?
    instructions = ig.sections[0].contents
    assert instructions[-1].address == 0x100014D2
    # n.b. 0x100014d2 is the dummy instruction `mov edi, edi`
    # Ghidra does more thorough analysis and ignores this.
    # The last real instruction should be at 0x100014cf. Not a big deal
    # to include this because it is not junk data.
    # 6 switch addresses
    assert len(ig.sections[1].contents) == 6
    # TODO: The data table at the end includes all of the 0xCC padding bytes.
 SMACK_CASE = (
    # LEGO1: 0x100cdc43 (modified so jump table points at +0x1016)
    b"\x2e\xff\x24\x8d\x16\x10\x00\x00"
    # LEGO1: 0x100cdb62 (instructions before and after jump table)
    b"\x8b\xf8\xeb\x1a\x87\xdb\x87\xc9\x87\xdb\x87\xc9\x87\xdb\x50\xdc"
    b"\x0c\x10\xd0\xe2\x0c\x10\xb0\xe8\x0c\x10\x50\xe9\x0c\x10\xa0\x10"
    b"\x27\x10\x10\x3c\x11\x77\x17\x8a\xc8"
 )
 def test_smack_case():
    """Case where we have code / jump table / code.
    Need to properly separate code sections, eliminate junk instructions
    and continue disassembling at the proper address following the data."""
    ig = InstructGen(SMACK_CASE, 0x1000)
    assert len(ig.sections) == 3
    assert ig.sections[0].type == ig.sections[2].type == SectionType.CODE
    # Make sure we captured the instruction immediately after
    assert ig.sections[2].contents[0].mnemonic == "mov"
 # BETA10 0x1004c9cc
 BETA_FUNC = (
    b"\x55\x8b\xec\x83\xec\x08\x53\x56\x57\x89\x4d\xfc\x8b\x45\xfc\x33"
    b"\xc9\x8a\x88\x19\x02\x00\x00\x89\x4d\xf8\xe9\x1e\x00\x00\x00\xe9"
    b"\x41\x00\x00\x00\xe9\x3c\x00\x00\x00\xe9\x37\x00\x00\x00\xe9\x32"
    b"\x00\x00\x00\xe9\x2d\x00\x00\x00\xe9\x28\x00\x00\x00\x83\x7d\xf8"
    b"\x04\x0f\x87\x1e\x00\x00\x00\x8b\x45\xf8\xff\x24\x85\x1d\xca\x04"
    b"\x10\xeb\xc9\x04\x10\xf0\xc9\x04\x10\xf5\xc9\x04\x10\xfa\xc9\x04"
    b"\x10\xff\xc9\x04\x10\xb0\x01\xe9\x00\x00\x00\x00\x5f\x5e\x5b\xc9"
    b"\xc2\x04\x00"
 )
 def test_beta_case():
    """Complete (and short) function with CODE / ADDR / CODE"""
    ig = InstructGen(BETA_FUNC, 0x1004C9CC)
    # The JMP into the jump table immediately precedes the jump table.
    # We have to detect this and switch sections correctly or we will only
    # get 1 section.
    assert len(ig.sections) == 3
    assert ig.sections[0].type == ig.sections[2].type == SectionType.CODE
    # Make sure we captured the instruction immediately after
    assert ig.sections[2].contents[0].mnemonic == "mov"
 # LEGO1 0x1000fb50
 # TODO: The test data here is longer than it needs to be.
 THUNK_TEST = (
    b"\x2b\x49\xfc\xe9\x08\x00\x00\x00\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
    b"\x56\x8b\xf1\xe8\xd8\xc5\x00\x00\x8b\xce\xe8\xb1\xdc\x01\x00\xf6"
    b"\x44\x24\x08\x01\x74\x0c\x8d\x46\xe0\x50\xe8\xe1\x66\x07\x00\x83"
    b"\xc4\x04\x8d\x46\xe0\x5e\xc2\x04\x00\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
    b"\x2b\x49\xfc\xe9\x08\x00\x00\x00\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
    b"\xb8\x7c\x05\x0f\x10\xc3\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
    b"\x2b\x49\xfc\xe9\x08\x00\x00\x00\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
    b"\x8b\x54"
    # The problem is here: the last two bytes are the start of the next
    # function 0x1000fbc0. This is not enough data to read an instruction.
 )
 def test_thunk_case():
    """Adjuster thunk incorrectly annotated.
    We are reading way more bytes than we should for this function."""
    ig = InstructGen(THUNK_TEST, 0x1000FB50)
    # No switch cases here, so the only section is code.
    # This caused an infinite loop during testing so the goal is just to finish.
    assert len(ig.sections) == 1
    # TODO: We might detect the 0xCC padding bytes and cut off the function.
    # If we did that, we would correctly read only 2 instructions.
    # assert len(ig.sections[0].contents) == 2
 # LEGO1 0x1006f080, Infocenter::HandleEndAction
 HANDLE_END_ACTION = (
    b"\x53\x56\x57\x8b\xf1\x8b\x5c\x24\x10\x8b\x0d\x84\x45\x0f\x10\x8b"
    b"\x7b\x0c\x8b\x47\x20\x39\x01\x75\x29\x81\x7f\x1c\xf3\x01\x00\x00"
    b"\x75\x20\xe8\x59\x66\xfa\xff\x6a\x00\x8b\x40\x18\x6a\x00\x6a\x10"
    b"\x50\xff\x15\x38\xb5\x10\x10\xb8\x01\x00\x00\x00\x5f\x5e\x5b\xc2"
    b"\x04\x00\x39\x46\x0c\x0f\x85\xa2\x00\x00\x00\x8b\x47\x1c\x83\xf8"
    b"\x28\x74\x18\x83\xf8\x29\x74\x13\x83\xf8\x2a\x74\x0e\x83\xf8\x2b"
    b"\x74\x09\x83\xf8\x2c\x0f\x85\x82\x00\x00\x00\x66\x8b\x86\xd4\x01"
    b"\x00\x00\x66\x85\xc0\x74\x09\x66\x48\x66\x89\x86\xd4\x01\x00\x00"
    b"\x66\x83\xbe\xd4\x01\x00\x00\x00\x75\x63\x6a\x0b\xe8\xff\x67\xfa"
    b"\xff\x66\x8b\x86\xfc\x00\x00\x00\x83\xc4\x04\x50\xe8\x3f\x66\xfa"
    b"\xff\x8b\xc8\xe8\x58\xa6\xfc\xff\x0f\xbf\x86\xfc\x00\x00\x00\x48"
    b"\x83\xf8\x04\x77\x2f\xff\x24\x85\x78\xf4\x06\x10\x68\x1d\x02\x00"
    b"\x00\xeb\x1a\x68\x1e\x02\x00\x00\xeb\x13\x68\x1f\x02\x00\x00\xeb"
    b"\x0c\x68\x20\x02\x00\x00\xeb\x05\x68\x21\x02\x00\x00\x8b\xce\xe8"
    b"\x9c\x21\x00\x00\x6a\x01\x8b\xce\xe8\x53\x1c\x00\x00\x8d\x8e\x0c"
    b"\x01\x00\x00\x53\x8b\x01\xff\x50\x04\x85\xc0\x0f\x85\xef\x02\x00"
    b"\x00\x8b\x56\x0c\x8b\x4f\x20\x3b\xd1\x74\x0e\x8b\x1d\x74\x45\x0f"
    b"\x10\x39\x0b\x0f\x85\xd7\x02\x00\x00\x81\x7f\x1c\x02\x02\x00\x00"
    b"\x75\x1a\x6a\x00\x52\x6a\x10\xe8\xa4\x65\xfa\xff\x8b\xc8\xe8\x0d"
    b"\xa2\xfb\xff\x66\xc7\x86\xd6\x01\x00\x00\x00\x00\x8b\x96\x00\x01"
    b"\x00\x00\x8d\x42\x74\x8b\x18\x83\xfb\x0c\x0f\x87\x9b\x02\x00\x00"
    b"\x33\xc9\x8a\x8b\xac\xf4\x06\x10\xff\x24\x8d\x8c\xf4\x06\x10\x8b"
    b"\x86\x08\x01\x00\x00\x83\xf8\x05\x77\x07\xff\x24\x85\xbc\xf4\x06"
    b"\x10\x8b\xce\xe8\xb8\x1a\x00\x00\x8b\x86\x00\x01\x00\x00\x68\xf4"
    b"\x01\x00\x00\x8b\xce\xc7\x40\x74\x0b\x00\x00\x00\xe8\xef\x20\x00"
    b"\x00\x8b\x86\x00\x01\x00\x00\xc7\x86\x08\x01\x00\x00\xff\xff\xff"
    b"\xff\x83\x78\x78\x00\x0f\x85\x40\x02\x00\x00\xb8\x01\x00\x00\x00"
    b"\x5f\x66\xc7\x86\xd2\x01\x00\x00\x01\x00\x5e\x5b\xc2\x04\x00\x6a"
    b"\x00\x8b\xce\x6a\x01\xe8\xd6\x19\x00\x00\xb8\x01\x00\x00\x00\x5f"
    b"\x5e\x5b\xc2\x04\x00\x6a\x01\x8b\xce\x6a\x02\xe8\xc0\x19\x00\x00"
    b"\xb8\x01\x00\x00\x00\x5f\x5e\x5b\xc2\x04\x00\x8b\xce\xe8\x3e\x1a"
    b"\x00\x00\x8b\x86\x00\x01\x00\x00\x68\x1c\x02\x00\x00\x8b\xce\xc7"
    b"\x40\x74\x0b\x00\x00\x00\xe8\x75\x20\x00\x00\xb8\x01\x00\x00\x00"
    b"\x5f\xc7\x86\x08\x01\x00\x00\xff\xff\xff\xff\x5e\x5b\xc2\x04\x00"
    b"\x8b\xce\xe8\x09\x1a\x00\x00\x8b\x86\x00\x01\x00\x00\x68\x1b\x02"
    b"\x00\x00\x8b\xce\xc7\x40\x74\x0b\x00\x00\x00\xe8\x40\x20\x00\x00"
    b"\xb8\x01\x00\x00\x00\x5f\xc7\x86\x08\x01\x00\x00\xff\xff\xff\xff"
    b"\x5e\x5b\xc2\x04\x00\xc7\x00\x0b\x00\x00\x00\x8b\x86\x08\x01\x00"
    b"\x00\x83\xf8\x04\x74\x0c\x83\xf8\x05\x74\x0e\x68\xf4\x01\x00\x00"
    b"\xeb\x0c\x68\x1c\x02\x00\x00\xeb\x05\x68\x1b\x02\x00\x00\x8b\xce"
    b"\xe8\xfb\x1f\x00\x00\xb8\x01\x00\x00\x00\x5f\xc7\x86\x08\x01\x00"
    b"\x00\xff\xff\xff\xff\x5e\x5b\xc2\x04\x00\x6a\x00\xa1\xa0\x76\x0f"
    b"\x10\x50\xe8\x39\x65\xfa\xff\x83\xc4\x08\xa1\xa4\x76\x0f\x10\x6a"
    b"\x00\x50\xe8\x29\x65\xfa\xff\x83\xc4\x08\xe8\xf1\x63\xfa\xff\x8b"
    b"\xc8\xe8\x6a\x02\x01\x00\xb8\x01\x00\x00\x00\x5f\x5e\x5b\xc2\x04"
    b"\x00\x8b\x47\x1c\x83\xf8\x46\x74\x09\x83\xf8\x47\x0f\x85\x09\x01"
    b"\x00\x00\x6a\x00\x6a\x00\x6a\x32\x6a\x03\xe8\x91\x65\xfa\xff\x8b"
    b"\xc8\xe8\xfa\xc7\xfd\xff\x8b\x86\x00\x01\x00\x00\x5f\x5e\x5b\xc7"
    b"\x40\x74\x0e\x00\x00\x00\xb8\x01\x00\x00\x00\xc2\x04\x00\x8b\x47"
    b"\x1c\x39\x86\xf8\x00\x00\x00\x0f\x85\xce\x00\x00\x00\xe8\xbe\x63"
    b"\xfa\xff\x83\x78\x10\x02\x74\x19\x66\x8b\x86\xfc\x00\x00\x00\x66"
    b"\x85\xc0\x74\x0d\x50\xe8\xa6\x63\xfa\xff\x8b\xc8\xe8\xbf\xa3\xfc"
    b"\xff\x6a\x00\x6a\x00\x6a\x32\x6a\x03\xe8\x32\x65\xfa\xff\x8b\xc8"
    b"\xe8\x9b\xc7\xfd\xff\x8b\x86\x00\x01\x00\x00\x5f\x5e\x5b\xc7\x40"
    b"\x74\x0e\x00\x00\x00\xb8\x01\x00\x00\x00\xc2\x04\x00\x83\x7a\x78"
    b"\x00\x75\x32\x8b\x86\xf8\x00\x00\x00\x83\xf8\x28\x74\x27\x83\xf8"
    b"\x29\x74\x22\x83\xf8\x2a\x74\x1d\x83\xf8\x2b\x74\x18\x83\xf8\x2c"
    b"\x74\x13\x66\xc7\x86\xd0\x01\x00\x00\x01\x00\x6a\x0b\xe8\xee\x64"
    b"\xfa\xff\x83\xc4\x04\x8b\x86\x00\x01\x00\x00\x6a\x01\x68\xdc\x44"
    b"\x0f\x10\xc7\x40\x74\x02\x00\x00\x00\xe8\x22\x64\xfa\xff\x83\xc4"
    b"\x08\xb8\x01\x00\x00\x00\x5f\x5e\x5b\xc2\x04\x00\x8b\x47\x1c\x39"
    b"\x86\xf8\x00\x00\x00\x75\x14\x6a\x00\x6a\x00\x6a\x32\x6a\x03\xe8"
    b"\x9c\x64\xfa\xff\x8b\xc8\xe8\x05\xc7\xfd\xff\xb8\x01\x00\x00\x00"
    b"\x5f\x5e\x5b\xc2\x04\x00\x8b\xff\x3c\xf1\x06\x10\x43\xf1\x06\x10"
    b"\x4a\xf1\x06\x10\x51\xf1\x06\x10\x58\xf1\x06\x10\xdf\xf1\x06\x10"
    b"\xd5\xf2\x06\x10\x1a\xf3\x06\x10\x51\xf3\x06\x10\x8e\xf3\x06\x10"
    b"\xed\xf3\x06\x10\x4c\xf4\x06\x10\x6b\xf4\x06\x10\x00\x01\x02\x07"
    b"\x03\x04\x07\x07\x07\x07\x07\x05\x06\x8d\x49\x00\x3f\xf2\x06\x10"
    b"\x55\xf2\x06\x10\xf1\xf1\x06\x10\xf1\xf1\x06\x10\x6b\xf2\x06\x10"
    b"\xa0\xf2\x06\x10\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc"
 )
 def test_action_case():
    """3 switches: 3 jump tables, 1 data table"""
    ig = InstructGen(HANDLE_END_ACTION, 0x1006F080)
    # Two of the jump tables (0x1006f478 with 5, 0x1006f48c with 8)
    # are contiguous.
    assert len(ig.sections) == 5
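Note on the jump table tests: the `ADDR_TAB` section in `test_score_notify` is a run of little-endian 32-bit addresses. The sketch below decodes the 24 bytes that sit at 0x100014d4 in `SCORE_NOTIFY` using only the standard `struct` module. It does not use `InstructGen`, and the helper name `read_jump_table` is an assumption for illustration.

```python
import struct
from typing import List

FUNC_START = 0x10001410
JUMP_TABLE_ADDR = 0x100014D4

# Copied from SCORE_NOTIFY, starting at offset 0xc4 (0x100014d4 - 0x10001410).
JUMP_TABLE_BYTES = (
    b"\x4a\x14\x00\x10\x5e\x14\x00\x10\x70\x14\x00\x10"
    b"\x8a\x14\x00\x10\x9c\x14\x00\x10\xca\x14\x00\x10"
)


def read_jump_table(data: bytes, count: int) -> List[int]:
    """Unpack `count` little-endian unsigned 32-bit addresses."""
    return list(struct.unpack(f"<{count}L", data[: 4 * count]))


targets = read_jump_table(JUMP_TABLE_BYTES, 6)

# Every switch target should point back into the code before the table.
all_in_code = all(FUNC_START <= t < JUMP_TABLE_ADDR for t in targets)
```

A range check like `all_in_code` is one plausible way to tell a jump table apart from ordinary instructions, although the tests do not say how `InstructGen` makes that decision.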
--- a/tools/isledecomp/tests/test_islebin.py
+++ b/tools/isledecomp/tests/test_islebin.py
@ -1,152 +0,0 @@
 """Tests for the Bin (or IsleBin) module that:
 1. Parses relevant data from the PE header and other structures.
 2. Provides an interface to read from the DLL or EXE using a virtual address.
 These are some basic smoke tests."""
 import hashlib
 from typing import Tuple
 import pytest
 from isledecomp.bin import (
    Bin as IsleBin,
    SectionNotFoundError,
    InvalidVirtualAddressError,
 )
 # LEGO1.DLL: v1.1 English, September
 LEGO1_SHA256 = "14645225bbe81212e9bc1919cd8a692b81b8622abb6561280d99b0fc4151ce17"
@pytest.fixture(name="binfile", scope="session")
 def fixture_binfile(pytestconfig) -> IsleBin:
    filename = pytestconfig.getoption("--lego1")
    # Skip this if we have not provided the path to LEGO1.dll.
    if filename is None:
        pytest.skip(allow_module_level=True, reason="No path to LEGO1")
    with open(filename, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
        if digest != LEGO1_SHA256:
            pytest.fail(reason="Did not match expected LEGO1.DLL")
    with IsleBin(filename, find_str=True) as islebin:
        yield islebin
 def test_basic(binfile: IsleBin):
    assert binfile.entry == 0x1008C860
    assert len(binfile.sections) == 6
    with pytest.raises(SectionNotFoundError):
        binfile.get_section_by_name(".hello")
 SECTION_INFO = (
    (".text", 0x10001000, 0xD2A66, 0xD2C00),
    (".rdata", 0x100D4000, 0x1B5B6, 0x1B600),
    (".data", 0x100F0000, 0x1A734, 0x12C00),
    (".idata", 0x1010B000, 0x1006, 0x1200),
    (".rsrc", 0x1010D000, 0x21D8, 0x2200),
    (".reloc", 0x10110000, 0x10C58, 0x10E00),
 )
@pytest.mark.parametrize("name, v_addr, v_size, raw_size", SECTION_INFO)
 def test_sections(name: str, v_addr: int, v_size: int, raw_size: int, binfile: IsleBin):
    section = binfile.get_section_by_name(name)
    assert section.virtual_address == v_addr
    assert section.virtual_size == v_size
    assert section.size_of_raw_data == raw_size
 DOUBLE_PI_BYTES = b"\x18\x2d\x44\x54\xfb\x21\x09\x40"
 # Now that's a lot of pi
 PI_ADDRESSES = (
    0x100D4000,
    0x100D4700,
    0x100D7180,
    0x100DB8F0,
    0x100DC030,
 )
@pytest.mark.parametrize("addr", PI_ADDRESSES)
 def test_read_pi(addr: int, binfile: IsleBin):
    assert binfile.read(addr, 8) == DOUBLE_PI_BYTES
 def test_unusual_reads(binfile: IsleBin):
    """Reads that return an error or some specific value based on context"""
    # Reading an address earlier than the imagebase
    with pytest.raises(InvalidVirtualAddressError):
        binfile.read(0, 1)
    # Really big address
    with pytest.raises(InvalidVirtualAddressError):
        binfile.read(0xFFFFFFFF, 1)
    # Uninitialized part of .data
    assert binfile.read(0x1010A600, 4) is None
    # Past the end of virtual size in .text
    assert binfile.read(0x100D3A70, 4) == b"\x00\x00\x00\x00"
 STRING_ADDRESSES = (
    (0x100DB588, b"November"),
    (0x100F0130, b"Helicopter"),
    (0x100F0144, b"HelicopterState"),
    (0x100F0BE4, b"valerie"),
    (0x100F4080, b"TARGET"),
 )
@pytest.mark.parametrize("addr, string", STRING_ADDRESSES)
 def test_strings(addr: int, string: bytes, binfile: IsleBin):
    """Test string read utility function and the string search feature"""
    assert binfile.read_string(addr) == string
    assert binfile.find_string(string) == addr
 def test_relocation(binfile: IsleBin):
    # n.b. This is not the number of *relocations* read from .reloc.
    # It is the set of unique addresses in the binary that get relocated.
    assert len(binfile.get_relocated_addresses()) == 14066
    # Score::Score is referenced only by CALL instructions. No need to relocate.
    assert binfile.is_relocated_addr(0x10001000) is False
    # MxEntity::SetEntityId is in the vtable and must be relocated.
    assert binfile.is_relocated_addr(0x10001070) is True
 # Not sanitizing dll name case. Do we care?
 IMPORT_REFS = (
    ("KERNEL32.dll", "CreateMutexA", 0x1010B3D0),
    ("WINMM.dll", "midiOutPrepareHeader", 0x1010B550),
 )
@pytest.mark.parametrize("import_ref", IMPORT_REFS)
 def test_imports(import_ref: Tuple[str, str, int], binfile: IsleBin):
    assert import_ref in binfile.imports
 # Location of the JMP instruction and the import address.
 THUNKS = (
    (0x100D3728, 0x1010B32C),  # DirectDrawCreate
    (0x10098F9E, 0x1010B3D4),  # RtlUnwind
 )
@pytest.mark.parametrize("thunk_ref", THUNKS)
 def test_thunks(thunk_ref: Tuple[int, int], binfile: IsleBin):
    assert thunk_ref in binfile.thunks
 def test_exports(binfile: IsleBin):
    assert len(binfile.exports) == 130
    assert (0x1003BFB0, b"??0LegoBackgroundColor@@QAE@PBD0@Z") in binfile.exports
    assert (0x10091EE0, b"_DllMain@12") in binfile.exports
--- a/tools/isledecomp/tests/test_linter.py
+++ b/tools/isledecomp/tests/test_linter.py
@ -1,144 +0,0 @@
 import pytest
 from isledecomp.parser import DecompLinter
 from isledecomp.parser.error import ParserError
@pytest.fixture(name="linter")
 def fixture_linter():
    return DecompLinter()
 def test_simple_in_order(linter):
    lines = [
        "// FUNCTION: TEST 0x1000",
        "void function1() {}",
        "// FUNCTION: TEST 0x2000",
        "void function2() {}",
        "// FUNCTION: TEST 0x3000",
        "void function3() {}",
    ]
    assert linter.check_lines(lines, "test.cpp", "TEST") is True
 def test_simple_not_in_order(linter):
    lines = [
        "// FUNCTION: TEST 0x1000",
        "void function1() {}",
        "// FUNCTION: TEST 0x3000",
        "void function3() {}",
        "// FUNCTION: TEST 0x2000",
        "void function2() {}",
    ]
    assert linter.check_lines(lines, "test.cpp", "TEST") is False
    assert len(linter.alerts) == 1
    assert linter.alerts[0].code == ParserError.FUNCTION_OUT_OF_ORDER
    # N.B. Line number given is the start of the function, not the marker
    assert linter.alerts[0].line_number == 6
 def test_byname_ignored(linter):
    """Should ignore lookup-by-name markers when checking order."""
    lines = [
        "// FUNCTION: TEST 0x1000",
        "void function1() {}",
        "// FUNCTION: TEST 0x3000",
        "// MyClass::MyMethod",
        "// FUNCTION: TEST 0x2000",
        "void function2() {}",
    ]
    # This will fail because byname lookup does not belong in the cpp file
    assert linter.check_lines(lines, "test.cpp", "TEST") is False
    # but it should not fail for function order.
    assert all(
        alert.code != ParserError.FUNCTION_OUT_OF_ORDER for alert in linter.alerts
    )
 def test_module_isolation(linter):
    """Should check the order of markers from a single module only."""
    lines = [
        "// FUNCTION: ALPHA 0x0001",
        "// FUNCTION: TEST 0x1000",
        "void function1() {}",
        "// FUNCTION: ALPHA 0x0002",
        "// FUNCTION: TEST 0x2000",
        "void function2() {}",
        "// FUNCTION: ALPHA 0x0003",
        "// FUNCTION: TEST 0x3000",
        "void function3() {}",
    ]
    assert linter.check_lines(lines, "test.cpp", "TEST") is True
    linter.reset(True)
    assert linter.check_lines(lines, "test.cpp", "ALPHA") is True
 def test_byname_headers_only(linter):
    """Markers that ar referenced by name with cvdump belong in header files only."""
    lines = [
        "// FUNCTION: TEST 0x1000",
        "// MyClass::~MyClass",
    ]
    assert linter.check_lines(lines, "test.h", "TEST") is True
    linter.reset(True)
    assert linter.check_lines(lines, "test.cpp", "TEST") is False
    assert linter.alerts[0].code == ParserError.BYNAME_FUNCTION_IN_CPP
 def test_duplicate_offsets(linter):
    """The linter will retain module/offset pairs found until we do a full reset."""
    lines = [
        "// FUNCTION: TEST 0x1000",
        "// FUNCTION: HELLO 0x1000",
        "// MyClass::~MyClass",
    ]
    # Should not fail for duplicate offset 0x1000 because the modules are unique.
    assert linter.check_lines(lines, "test.h", "TEST") is True
    # Simulate a failure by reading the same file twice.
    assert linter.check_lines(lines, "test.h", "TEST") is False
    # Two errors because offsets from both modules are duplicated
    assert len(linter.alerts) == 2
    assert all(a.code == ParserError.DUPLICATE_OFFSET for a in linter.alerts)
    # Partial reset will retain the list of seen offsets.
    linter.reset(False)
    assert linter.check_lines(lines, "test.h", "TEST") is False
    # Full reset will forget seen offsets.
    linter.reset(True)
    assert linter.check_lines(lines, "test.h", "TEST") is True
 def test_duplicate_strings(linter):
    """Duplicate string markers are okay if the string value is the same."""
    string_lines = [
        "// STRING: TEST 0x1000",
        'return "hello world";',
    ]
    # No problem to use this marker twice.
    assert linter.check_lines(string_lines, "test.h", "TEST") is True
    assert linter.check_lines(string_lines, "test.h", "TEST") is True
    different_string = [
        "// STRING: TEST 0x1000",
        'return "hi there";',
    ]
    # Same address but the string is different
    assert linter.check_lines(different_string, "greeting.h", "TEST") is False
    assert len(linter.alerts) == 1
    assert linter.alerts[0].code == ParserError.WRONG_STRING
    same_addr_reused = [
        "// GLOBAL:TEXT 0x1000",
        "int g_test = 123;",
    ]
    # This will fail like any other offset reuse.
    assert linter.check_lines(same_addr_reused, "other.h", "TEST") is False
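Note on the ordering tests: the rule exercised by `test_simple_not_in_order` and `test_module_isolation` can be sketched as a per-module scan over `FUNCTION` markers. This is a simplified illustration, not the `DecompLinter` logic. The regular expression and the `out_of_order` helper are assumptions, and unlike the linter it reports the line of the marker rather than the line where the function starts.

```python
import re
from typing import List, Tuple

# Assumed exact marker syntax: "// FUNCTION: MODULE 0xaddr"
MARKER = re.compile(r"^// FUNCTION: (?P<module>[A-Z0-9]+) (?P<offset>0x[0-9a-f]+)$")


def out_of_order(lines: List[str], module: str) -> List[Tuple[int, int]]:
    """Return (line number, offset) for markers below the highest offset seen."""
    problems = []
    highest = -1
    for number, line in enumerate(lines, start=1):
        match = MARKER.match(line.strip())
        if match is None or match["module"] != module:
            # Not a marker, or a marker for some other module.
            continue
        offset = int(match["offset"], 16)
        if offset < highest:
            problems.append((number, offset))
        else:
            highest = offset
    return problems


unordered = [
    "// FUNCTION: TEST 0x1000",
    "void function1() {}",
    "// FUNCTION: TEST 0x3000",
    "void function3() {}",
    "// FUNCTION: TEST 0x2000",
    "void function2() {}",
]
interleaved = [
    "// FUNCTION: ALPHA 0x0001",
    "// FUNCTION: TEST 0x1000",
    "void function1() {}",
    "// FUNCTION: ALPHA 0x0002",
    "// FUNCTION: TEST 0x2000",
    "void function2() {}",
]
```

Filtering by module before comparing offsets is what lets markers for two binaries share one source file without tripping the order check.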
--- a/tools/isledecomp/tests/test_parser.py
+++ b/tools/isledecomp/tests/test_parser.py
@ -1,773 +0,0 @@
 import pytest
 from isledecomp.parser.parser import (
    ReaderState,
    DecompParser,
 )
 from isledecomp.parser.error import ParserError
@pytest.fixture(name="parser")
 def fixture_parser():
    return DecompParser()
 def test_missing_sig(parser):
    """In the hopefully rare scenario that the function signature and marker
    are swapped, we still have enough to match with reccmp"""
    parser.read_lines(
        [
            "void my_function()",
            "// FUNCTION: TEST 0x1234",
            "{",
            "}",
        ]
    )
    assert parser.state == ReaderState.SEARCH
    assert len(parser.functions) == 1
    assert parser.functions[0].line_number == 3
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.MISSED_START_OF_FUNCTION
 def test_not_exact_syntax(parser):
    """Alert to inexact syntax right here in the parser instead of kicking it downstream.
    Doing this means we don't have to save the actual text."""
    parser.read_line("// function: test 0x1234")
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.BAD_DECOMP_MARKER
 def test_invalid_marker(parser):
    """We matched a decomp marker, but it's not one we care about"""
    parser.read_line("// BANANA: TEST 0x1234")
    assert parser.state == ReaderState.SEARCH
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.BOGUS_MARKER
 def test_incompatible_marker(parser):
    """The marker we just read cannot be handled in the current parser state"""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "// GLOBAL: TEST 0x5000",
        ]
    )
    assert parser.state == ReaderState.SEARCH
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.INCOMPATIBLE_MARKER
 def test_variable(parser):
    """Should identify a global variable"""
    parser.read_lines(
        [
            "// GLOBAL: HELLO 0x1234",
            "int g_value = 5;",
        ]
    )
    assert len(parser.variables) == 1
 def test_synthetic_plus_marker(parser):
    """Marker tracking preempts synthetic name detection.
    Should fail with error and not log the synthetic"""
    parser.read_lines(
        [
            "// SYNTHETIC: HEY 0x555",
            "// FUNCTION: HOWDY 0x1234",
        ]
    )
    assert len(parser.functions) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.INCOMPATIBLE_MARKER
 def test_different_markers_different_module(parser):
    """Does it make any sense for a function to be a stub in one module,
    but not in another? I don't know. But it's no problem for us."""
    parser.read_lines(
        [
            "// FUNCTION: HOWDY 0x1234",
            "// STUB: SUP 0x5555",
            "void interesting_function() {",
            "}",
        ]
    )
    assert len(parser.alerts) == 0
    assert len(parser.functions) == 2
 def test_different_markers_same_module(parser):
    """Now, if something is a regular function but then a stub,
    what do we say about that?"""
    parser.read_lines(
        [
            "// FUNCTION: HOWDY 0x1234",
            "// STUB: HOWDY 0x5555",
            "void interesting_function() {",
            "}",
        ]
    )
    # Use first marker declaration, don't replace
    assert len(parser.functions) == 1
    assert parser.functions[0].should_skip() is False
    # Should alert to this
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.DUPLICATE_MODULE
 def test_unexpected_synthetic(parser):
    """FUNCTION then SYNTHETIC should fail to report either one"""
    parser.read_lines(
        [
            "// FUNCTION: HOWDY 0x1234",
            "// SYNTHETIC: HOWDY 0x5555",
            "void interesting_function() {",
            "}",
        ]
    )
    assert parser.state == ReaderState.SEARCH
    assert len(parser.functions) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.INCOMPATIBLE_MARKER
@pytest.mark.skip(reason="not implemented yet")
 def test_duplicate_offset(parser):
    """Repeating the same module/offset in the same file is probably a typo"""
    parser.read_lines(
        [
            "// GLOBAL: HELLO 0x1234",
            "int x = 1;",
            "// GLOBAL: HELLO 0x1234",
            "int y = 2;",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.DUPLICATE_OFFSET
 def test_multiple_variables(parser):
    """Theoretically the same global variable can appear in multiple modules"""
    parser.read_lines(
        [
            "// GLOBAL: HELLO 0x1234",
            "// GLOBAL: WUZZUP 0x555",
            "const char *g_greeting;",
        ]
    )
    assert len(parser.alerts) == 0
    assert len(parser.variables) == 2
 def test_multiple_variables_same_module(parser):
    """Should not overwrite offset"""
    parser.read_lines(
        [
            "// GLOBAL: HELLO 0x1234",
            "// GLOBAL: HELLO 0x555",
            "const char *g_greeting;",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.DUPLICATE_MODULE
    assert len(parser.variables) == 1
    assert parser.variables[0].offset == 0x1234
 def test_multiple_vtables(parser):
    parser.read_lines(
        [
            "// VTABLE: HELLO 0x1234",
            "// VTABLE: TEST 0x5432",
            "class MxString : public MxCore {",
        ]
    )
    assert len(parser.alerts) == 0
    assert len(parser.vtables) == 2
    assert parser.vtables[0].name == "MxString"
 def test_multiple_vtables_same_module(parser):
    """Should not overwrite offset"""
    parser.read_lines(
        [
            "// VTABLE: HELLO 0x1234",
            "// VTABLE: HELLO 0x5432",
            "class MxString : public MxCore {",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.DUPLICATE_MODULE
    assert len(parser.vtables) == 1
    assert parser.vtables[0].offset == 0x1234
 def test_synthetic(parser):
    parser.read_lines(
        [
            "// SYNTHETIC: TEST 0x1234",
            "// TestClass::TestMethod",
        ]
    )
    assert len(parser.functions) == 1
    assert parser.functions[0].lookup_by_name is True
    assert parser.functions[0].name == "TestClass::TestMethod"
 def test_synthetic_same_module(parser):
    parser.read_lines(
        [
            "// SYNTHETIC: TEST 0x1234",
            "// SYNTHETIC: TEST 0x555",
            "// TestClass::TestMethod",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.DUPLICATE_MODULE
    assert len(parser.functions) == 1
    assert parser.functions[0].offset == 0x1234
 def test_synthetic_no_comment(parser):
    """Synthetic marker followed by a code line (i.e. non-comment)"""
    parser.read_lines(
        [
            "// SYNTHETIC: TEST 0x1234",
            "int x = 123;",
        ]
    )
    assert len(parser.functions) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.BAD_NAMEREF
    assert parser.state == ReaderState.SEARCH
 def test_single_line_function(parser):
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "int hello() { return 1234; }",
        ]
    )
    assert len(parser.functions) == 1
    assert parser.functions[0].line_number == 2
    assert parser.functions[0].end_line == 2
 def test_indented_function(parser):
    """Track the number of whitespace characters when we begin the function
    and check that against each closing curly brace we read.
    Should not report a syntax warning if the function is indented"""
    parser.read_lines(
        [
            "    // FUNCTION: TEST 0x1234",
            "    void indented()",
            "    {",
            "        // TODO",
            "    }",
            "    // FUNCTION: NEXT 0x555",
        ]
    )
    assert len(parser.alerts) == 0
@pytest.mark.xfail(reason="todo")
 def test_indented_no_curly_hint(parser):
    """Same as above, but opening curly brace is on the same line.
    Without the hint of how many whitespace characters to check, can we
    still identify the end of the function?"""
    parser.read_lines(
        [
            "    // FUNCTION: TEST 0x1234",
            "    void indented() {",
            "    }",
            "    // FUNCTION: NEXT 0x555",
        ]
    )
    assert len(parser.alerts) == 0
 def test_implicit_lookup_by_name(parser):
    """FUNCTION (or STUB) offsets must directly precede the function signature.
    If we detect a comment instead, we assume that this is a lookup-by-name
    function and end here."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "// TestClass::TestMethod()",
        ]
    )
    assert parser.state == ReaderState.SEARCH
    assert len(parser.functions) == 1
    assert parser.functions[0].lookup_by_name is True
    assert parser.functions[0].name == "TestClass::TestMethod()"
 def test_function_with_spaces(parser):
    """There should not be any blank lines between the end of the FUNCTION
    markers and the function signature or name. If we find a blank line, we
    can safely ignore it but should alert to this."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "   ",
            "inline void test_function() { };",
        ]
    )
    assert len(parser.functions) == 1
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.UNEXPECTED_BLANK_LINE
 def test_function_with_spaces_implicit(parser):
    """Same as above, but for implicit lookup-by-name"""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "   ",
            "// Implicit::Method",
        ]
    )
    assert len(parser.functions) == 1
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.UNEXPECTED_BLANK_LINE
@pytest.mark.xfail(reason="will assume implicit lookup-by-name function")
 def test_function_is_commented(parser):
    """In an ideal world, we would recognize that there is no code here.
    Some editors (or users) might comment the function on each line like this
    but hopefully it is rare."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "// int my_function()",
            "// {",
            "//     return 5;",
            "// }",
        ]
    )
    assert len(parser.functions) == 0
 def test_unexpected_eof(parser):
    """If a decomp marker finds its way to the last line of the file,
    report that we could not get anything from it."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "// Cls::Method",
            "// FUNCTION: TEST 0x5555",
        ]
    )
    parser.finish()
    assert len(parser.functions) == 1
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.UNEXPECTED_END_OF_FILE
@pytest.mark.xfail(reason="no longer applies")
 def test_global_variable_prefix(parser):
    """Global and static variables should have the g_ prefix."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            'const char* g_msg = "hello";',
        ]
    )
    assert len(parser.variables) == 1
    assert len(parser.alerts) == 0
    parser.read_lines(
        [
            "// GLOBAL: TEXT 0x5555",
            "int test = 5;",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.GLOBAL_MISSING_PREFIX
    # In spite of that, we should still grab the variable name.
    assert parser.variables[1].name == "test"
 def test_global_nomatch(parser):
    """We do our best to grab the variable name, even without the g_ prefix
    but this (by design) will not match everything."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "FunctionCall();",
        ]
    )
    assert len(parser.variables) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.NO_SUITABLE_NAME
 def test_static_variable(parser):
    """We can detect whether a variable is a static function variable
    based on the parser's state when we detect it.
    Checking for the word `static` alone is not a good test.
    Static class variables are filed as S_GDATA32, same as regular globals."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "int g_test = 1234;",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].is_static is False
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x5555",
            "void test_function() {",
            "// GLOBAL: TEST 0x8888",
            "static int g_internal = 0;",
            "}",
        ]
    )
    assert len(parser.variables) == 2
    assert parser.variables[1].is_static is True
 def test_reject_global_return(parser):
    """Previously we had annotated strings with the GLOBAL marker.
    For example: if a function returned a string. We now want these to be
    annotated with the STRING marker."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x5555",
            "void test_function() {",
            "  // GLOBAL: TEST 0x8888",
            '  return "test";',
            "}",
        ]
    )
    assert len(parser.variables) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.GLOBAL_NOT_VARIABLE
 def test_global_string(parser):
    """We now allow GLOBAL and STRING markers for the same item."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "// STRING: TEXT 0x5555",
            'char* g_test = "hello";',
        ]
    )
    assert len(parser.variables) == 1
    assert len(parser.strings) == 1
    assert len(parser.alerts) == 0
    assert parser.variables[0].name == "g_test"
    assert parser.strings[0].name == "hello"
 def test_comment_variables(parser):
    """Match on hidden variables from libraries."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "// g_test",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].name == "g_test"
 def test_flexible_variable_prefix(parser):
    """Don't alert to library variables that lack the g_ prefix.
    This is out of our control."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "// some_other_variable",
        ]
    )
    assert len(parser.variables) == 1
    assert len(parser.alerts) == 0
    assert parser.variables[0].name == "some_other_variable"
 def test_string_ignore_g_prefix(parser):
    """String annotations above a regular variable should not alert to
    the missing g_ prefix. This is only required for GLOBAL markers."""
    parser.read_lines(
        [
            "// STRING: TEST 0x1234",
            'const char* value = "";',
        ]
    )
    assert len(parser.strings) == 1
    assert len(parser.alerts) == 0
 def test_class_variable(parser):
    """We should accurately name static variables that are class members."""
    parser.read_lines(
        [
            "class Test {",
            "protected:",
            "  // GLOBAL: TEST 0x1234",
            "  static int g_test;",
            "};",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].name == "Test::g_test"
 def test_namespace_variable(parser):
    """We should identify a namespace surrounding any global variables"""
    parser.read_lines(
        [
            "namespace Test {",
            "// GLOBAL: TEST 0x1234",
            "int g_test = 1234;",
            "}",
            "// GLOBAL: TEST 0x5555",
            "int g_second = 2;",
        ]
    )
    assert len(parser.variables) == 2
    assert parser.variables[0].name == "Test::g_test"
    assert parser.variables[1].name == "g_second"
 def test_namespace_vtable(parser):
    parser.read_lines(
        [
            "namespace Tgl {",
            "// VTABLE: TEST 0x1234",
            "class Renderer {",
            "};",
            "}",
            "// VTABLE: TEST 0x5555",
            "class Hello { };",
        ]
    )
    assert len(parser.vtables) == 2
    assert parser.vtables[0].name == "Tgl::Renderer"
    assert parser.vtables[1].name == "Hello"
@pytest.mark.xfail(reason="no longer applies")
 def test_global_prefix_namespace(parser):
    """Should correctly identify namespaces before checking for the g_ prefix"""
    parser.read_lines(
        [
            "class Test {",
            "  // GLOBAL: TEST 0x1234",
            "  static int g_count = 0;",
            "  // GLOBAL: TEST 0x5555",
            "  static int count = 0;",
            "};",
        ]
    )
    assert len(parser.variables) == 2
    assert parser.variables[0].name == "Test::g_count"
    assert parser.variables[1].name == "Test::count"
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.GLOBAL_MISSING_PREFIX
 def test_nested_namespace(parser):
    parser.read_lines(
        [
            "namespace Tgl {",
            "class Renderer {",
            "  // GLOBAL: TEST 0x1234",
            "  static int g_count = 0;",
            "};",
            "};",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].name == "Tgl::Renderer::g_count"
 def test_match_qualified_variable(parser):
    """If a variable belongs to a scope and we use a fully qualified reference
    below a GLOBAL marker, make sure we capture the full name."""
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x1234",
            "int MxTest::g_count = 0;",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].name == "MxTest::g_count"
    assert len(parser.alerts) == 0
 def test_static_variable_parent(parser):
    """Report the address of the parent function that contains a static variable."""
    parser.read_lines(
        [
            "// FUNCTION: TEST 0x1234",
            "void test()",
            "{",
            "   // GLOBAL: TEST 0x5555",
            "   static int g_count = 0;",
            "}",
        ]
    )
    assert len(parser.variables) == 1
    assert parser.variables[0].is_static is True
    assert parser.variables[0].parent_function == 0x1234
@pytest.mark.xfail(
    reason="""Without the FUNCTION marker we don't know that we are inside a function,
    so we do not identify this variable as static."""
 )
 def test_static_variable_no_parent(parser):
    """If the function that contains a static variable is not marked, we
    cannot match it with cvdump so we should skip it and report an error."""
    parser.read_lines(
        [
            "void test()",
            "{",
            "   // GLOBAL: TEST 0x5555",
            "   static int g_count = 0;",
            "}",
        ]
    )
    # No way to match this variable so don't report it
    assert len(parser.variables) == 0
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.ORPHANED_STATIC_VARIABLE
 def test_static_variable_incomplete_coverage(parser):
    """If the function that contains a static variable is marked, but
    not for each module used for the variable itself, this is an error."""
    parser.read_lines(
        [
            "// FUNCTION: HELLO 0x1234",
            "void test()",
            "{",
            "   // GLOBAL: HELLO 0x5555",
            "   // GLOBAL: TEST 0x5555",
            "   static int g_count = 0;",
            "}",
        ]
    )
    # Match for HELLO module
    assert len(parser.variables) == 1
    # Failed for TEST module
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.ORPHANED_STATIC_VARIABLE
 def test_header_function_declaration(parser):
    """This is either a forward reference or a declaration in a header file.
    Meaning: The implementation is not here. This is not the correct place
    for the FUNCTION marker and it will probably not match anything."""
    parser.read_lines(
        [
            "// FUNCTION: HELLO 0x1234",
            "void sample_function(int);",
        ]
    )
    assert len(parser.alerts) == 1
    assert parser.alerts[0].code == ParserError.NO_IMPLEMENTATION
 def test_extra(parser):
    """Allow a fourth field in the decomp annotation. Its use will vary
    depending on the marker type. Currently this is only used to identify
    a vtable with virtual inheritance."""
    # Intentionally using non-vtable markers here.
    # We might want to emit a parser warning for unnecessary extra info.
    parser.read_lines(
        [
            "// GLOBAL: TEST 0x5555 Haha",
            "int g_variable = 0;",
            "// FUNCTION: TEST 0x1234 Something",
            "void Test() { g_variable++; }",
            "// LIBRARY: TEST 0x8080 Printf",
            "// _printf",
        ]
    )
    # We don't use this information (yet) but this is all fine.
    assert len(parser.alerts) == 0
 def test_virtual_inheritance(parser):
    """Indicate the base class for a vtable where the class uses
    virtual inheritance."""
    parser.read_lines(
        [
            "// VTABLE: HELLO 0x1234",
            "// VTABLE: HELLO 0x1238 Greetings",
            "// VTABLE: HELLO 0x123c Howdy",
            "class HiThere : public virtual Greetings {",
            "};",
        ]
    )
    assert len(parser.alerts) == 0
    assert len(parser.vtables) == 3
    assert parser.vtables[0].base_class is None
    assert parser.vtables[1].base_class == "Greetings"
    assert parser.vtables[2].base_class == "Howdy"
    assert all(v.name == "HiThere" for v in parser.vtables)
 def test_namespace_in_comment(parser):
    parser.read_lines(
        [
            "// VTABLE: HELLO 0x1234",
            "// class Tgl::Object",
            "// VTABLE: HELLO 0x5555",
            "// class TglImpl::RendererImpl<D3DRMImpl::D3DRM>",
        ]
    )
    assert len(parser.vtables) == 2
    assert parser.vtables[0].name == "Tgl::Object"
    assert parser.vtables[1].name == "TglImpl::RendererImpl<D3DRMImpl::D3DRM>"
--- a/tools/isledecomp/tests/test_parser_samples.py
+++ b/tools/isledecomp/tests/test_parser_samples.py
@ -1,141 +0,0 @@
 import os
 from typing import List, TextIO
 import pytest
 from isledecomp.parser import DecompParser
 from isledecomp.parser.node import ParserSymbol
 SAMPLE_DIR = os.path.join(os.path.dirname(__file__), "samples")
 def sample_file(filename: str) -> TextIO:
    """Wrapper for opening the samples from the directory that does not
    depend on the cwd where we run the test"""
    full_path = os.path.join(SAMPLE_DIR, filename)
    return open(full_path, "r", encoding="utf-8")
 def code_blocks_are_sorted(blocks: List[ParserSymbol]) -> bool:
    """Helper to make this more idiomatic"""
    just_offsets = [block.offset for block in blocks]
    return just_offsets == sorted(just_offsets)
@pytest.fixture(name="parser")
 def fixture_parser():
    return DecompParser()
 # Tests are below #
 def test_sanity(parser):
    """Read a very basic file"""
    with sample_file("basic_file.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 3
    assert code_blocks_are_sorted(parser.functions) is True
    # n.b. The parser returns line numbers as 1-based
    # Function starts when we see the opening curly brace
    assert parser.functions[0].line_number == 8
    assert parser.functions[0].end_line == 10
 def test_oneline(parser):
    """(Assuming clang-format permits this) This sample has a function
    on a single line. This will test the end-of-function detection"""
    with sample_file("oneline_function.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 2
    assert parser.functions[0].line_number == 5
    assert parser.functions[0].end_line == 5
 def test_missing_offset(parser):
    """What if the function doesn't have an offset comment?"""
    with sample_file("missing_offset.cpp") as f:
        parser.read_lines(f)
    # TODO: For now, the function without the offset will just be ignored.
    # Would be the same outcome if the comment was present but mangled and
    # we failed to match it. We should detect these cases in the future.
    assert len(parser.functions) == 1
 def test_jumbled_case(parser):
    """The parser just reports what it sees. It is the responsibility of
    the downstream tools to do something about a jumbled file.
    Just verify that we are reading it correctly."""
    with sample_file("out_of_order.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 3
    assert code_blocks_are_sorted(parser.functions) is False
 def test_bad_file(parser):
    with sample_file("poorly_formatted.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 3
 def test_indented(parser):
    """Offsets for functions inside of a class will probably be indented."""
    with sample_file("basic_class.cpp") as f:
        parser.read_lines(f)
    # TODO: We don't properly detect the end of these functions
    # because the closing brace is indented. However... knowing where each
    # function ends is less important (for now) than capturing
    # all the functions that are there.
    assert len(parser.functions) == 2
    assert parser.functions[0].offset == int("0x12345678", 16)
    assert parser.functions[0].line_number == 16
    # assert parser.functions[0].end_line == 19
    assert parser.functions[1].offset == int("0xdeadbeef", 16)
    assert parser.functions[1].line_number == 23
    # assert parser.functions[1].end_line == 25
 def test_inline(parser):
    with sample_file("inline.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 2
    for fun in parser.functions:
        assert fun.line_number is not None
        assert fun.line_number == fun.end_line
 def test_multiple_offsets(parser):
    """If multiple offset markers appear before a code block, take them
    all but ensure that each module name (case-insensitive) is distinct.
    Use the first occurrence of a module in case of duplicates."""
    with sample_file("multiple_offsets.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 4
    assert parser.functions[0].module == "TEST"
    assert parser.functions[0].line_number == 9
    assert parser.functions[1].module == "HELLO"
    assert parser.functions[1].line_number == 9
    # Duplicate modules are ignored
    assert parser.functions[2].line_number == 16
    assert parser.functions[2].offset == 0x2345
    assert parser.functions[3].module == "TEST"
    assert parser.functions[3].offset == 0x2002
 def test_variables(parser):
    with sample_file("global_variables.cpp") as f:
        parser.read_lines(f)
    assert len(parser.functions) == 1
    assert len(parser.variables) == 2
--- a/tools/isledecomp/tests/test_parser_statechange.py
+++ b/tools/isledecomp/tests/test_parser_statechange.py
@ -1,141 +0,0 @@
 from typing import Optional
 import pytest
 from isledecomp.parser.parser import (
    ReaderState as _rs,
    DecompParser,
 )
 from isledecomp.parser.error import ParserError as _pe
 # fmt: off
 state_change_marker_cases = [
    (_rs.SEARCH,          "FUNCTION",   _rs.WANT_SIG,        None),
    (_rs.SEARCH,          "GLOBAL",     _rs.IN_GLOBAL,       None),
    (_rs.SEARCH,          "STUB",       _rs.WANT_SIG,        None),
    (_rs.SEARCH,          "SYNTHETIC",  _rs.IN_SYNTHETIC,    None),
    (_rs.SEARCH,          "TEMPLATE",   _rs.IN_TEMPLATE,     None),
    (_rs.SEARCH,          "VTABLE",     _rs.IN_VTABLE,       None),
    (_rs.SEARCH,          "LIBRARY",    _rs.IN_LIBRARY,      None),
    (_rs.SEARCH,          "STRING",     _rs.IN_GLOBAL,       None),
    (_rs.WANT_SIG,        "FUNCTION",   _rs.WANT_SIG,        None),
    (_rs.WANT_SIG,        "GLOBAL",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_SIG,        "STUB",       _rs.WANT_SIG,        None),
    (_rs.WANT_SIG,        "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_SIG,        "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_SIG,        "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_SIG,        "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_SIG,        "STRING",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC,         "FUNCTION",   _rs.WANT_SIG,        _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "GLOBAL",     _rs.IN_FUNC_GLOBAL,  None),
    (_rs.IN_FUNC,         "STUB",       _rs.WANT_SIG,        _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "SYNTHETIC",  _rs.IN_SYNTHETIC,    _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "TEMPLATE",   _rs.IN_TEMPLATE,     _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "VTABLE",     _rs.IN_VTABLE,       _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "LIBRARY",    _rs.IN_LIBRARY,      _pe.MISSED_END_OF_FUNCTION),
    (_rs.IN_FUNC,         "STRING",     _rs.IN_FUNC_GLOBAL,  None),
    (_rs.IN_TEMPLATE,     "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "GLOBAL",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "TEMPLATE",   _rs.IN_TEMPLATE,     None),
    (_rs.IN_TEMPLATE,     "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_TEMPLATE,     "STRING",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.WANT_CURLY,      "FUNCTION",   _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "GLOBAL",     _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "STUB",       _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "SYNTHETIC",  _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "TEMPLATE",   _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "VTABLE",     _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "LIBRARY",    _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.WANT_CURLY,      "STRING",     _rs.SEARCH,          _pe.UNEXPECTED_MARKER),
    (_rs.IN_GLOBAL,       "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "GLOBAL",     _rs.IN_GLOBAL,       None),
    (_rs.IN_GLOBAL,       "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_GLOBAL,       "STRING",     _rs.IN_GLOBAL,       None),
    (_rs.IN_FUNC_GLOBAL,  "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "GLOBAL",     _rs.IN_FUNC_GLOBAL,  None),
    (_rs.IN_FUNC_GLOBAL,  "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_FUNC_GLOBAL,  "STRING",     _rs.IN_FUNC_GLOBAL,  None),
    (_rs.IN_VTABLE,       "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "GLOBAL",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "VTABLE",     _rs.IN_VTABLE,       None),
    (_rs.IN_VTABLE,       "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_VTABLE,       "STRING",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "GLOBAL",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "SYNTHETIC",  _rs.IN_SYNTHETIC,    None),
    (_rs.IN_SYNTHETIC,    "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "LIBRARY",    _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_SYNTHETIC,    "STRING",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "FUNCTION",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "GLOBAL",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "STUB",       _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "SYNTHETIC",  _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "TEMPLATE",   _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "VTABLE",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
    (_rs.IN_LIBRARY,      "LIBRARY",    _rs.IN_LIBRARY,      None),
    (_rs.IN_LIBRARY,      "STRING",     _rs.SEARCH,          _pe.INCOMPATIBLE_MARKER),
 ]
 # fmt: on
@pytest.mark.parametrize(
    "state, marker_type, new_state, expected_error", state_change_marker_cases
 )
 def test_state_change_by_marker(
    state: _rs, marker_type: str, new_state: _rs, expected_error: Optional[_pe]
 ):
    p = DecompParser()
    p.state = state
    mock_line = f"// {marker_type}: TEST 0x1234"
    p.read_line(mock_line)
    assert p.state == new_state
    if expected_error is not None:
        assert len(p.alerts) > 0
        assert p.alerts[0].code == expected_error
 # Reading any of these lines should have no effect in ReaderState.SEARCH
 search_lines_no_effect = [
    "",
    "\t",
    "    ",
    "int x = 0;",
    "// Comment",
    "/*",
    "*/",
    "/* Block comment */",
    "{",
    "}",
 ]
@pytest.mark.parametrize("line", search_lines_no_effect)
 def test_state_search_line(line: str):
    p = DecompParser()
    p.read_line(line)
    assert p.state == _rs.SEARCH
    assert len(p.alerts) == 0
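
The parametrized table above can be read as a transition function from (state, marker type) to (next state, optional error). The following is a minimal sketch of that reading, not the `DecompParser` implementation: `State`, `TRANSITIONS`, and `next_state` are hypothetical names, errors are plain strings instead of `ParserError` members, and only a few rows of the table are reproduced.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class State(Enum):
    """Hypothetical stand-in for a subset of ReaderState."""

    SEARCH = auto()
    WANT_SIG = auto()
    IN_GLOBAL = auto()


# (next state, error name or None)
Transition = Tuple[State, Optional[str]]

# A few rows copied from state_change_marker_cases, keyed for direct lookup.
TRANSITIONS: Dict[Tuple[State, str], Transition] = {
    (State.SEARCH, "FUNCTION"): (State.WANT_SIG, None),
    (State.SEARCH, "GLOBAL"): (State.IN_GLOBAL, None),
    (State.WANT_SIG, "FUNCTION"): (State.WANT_SIG, None),
    (State.WANT_SIG, "GLOBAL"): (State.SEARCH, "INCOMPATIBLE_MARKER"),
    (State.IN_GLOBAL, "GLOBAL"): (State.IN_GLOBAL, None),
    (State.IN_GLOBAL, "FUNCTION"): (State.SEARCH, "INCOMPATIBLE_MARKER"),
}


def next_state(state: State, marker_type: str) -> Transition:
    """Look up the state to enter after reading a marker of the given type."""
    return TRANSITIONS[(state, marker_type)]
```

Under this reading, an incompatible marker always resets the reader to `SEARCH`, which is why the error rows in the table share the same target state.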
--- a/tools/isledecomp/tests/test_parser_util.py
+++ b/tools/isledecomp/tests/test_parser_util.py
@ -1,209 +0,0 @@
 import pytest
 from isledecomp.parser.parser import MarkerDict
 from isledecomp.parser.marker import (
    DecompMarker,
    MarkerType,
    match_marker,
    is_marker_exact,
 )
 from isledecomp.parser.util import (
    is_blank_or_comment,
    get_class_name,
    get_variable_name,
    get_string_contents,
 )
 blank_or_comment_param = [
    (True, ""),
    (True, "\t"),
    (True, "    "),
    (False, "\tint abc=123;"),
    (True, "// OFFSET: LEGO1 0xdeadbeef"),
    (True, "   /* Block comment beginning"),
    (True, "Block comment ending */   "),
    # TODO: does clang-format have anything to say about these cases?
    (False, "x++; // Comment follows"),
    (False, "x++; /* Block comment begins"),
 ]
@pytest.mark.parametrize("expected, line", blank_or_comment_param)
 def test_is_blank_or_comment(line: str, expected: bool):
    assert is_blank_or_comment(line) is expected
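
The cases in `blank_or_comment_param` suggest a simple rule: a line counts as blank or comment only if, after stripping whitespace, it is empty, starts a comment, or ends a block comment. The sketch below is one way to satisfy those cases; `is_blank_or_comment_sketch` is a hypothetical name and this is not necessarily how `isledecomp.parser.util.is_blank_or_comment` is written.

```python
def is_blank_or_comment_sketch(line: str) -> bool:
    """Return True if the line has no code on it, judged by its edges only."""
    stripped = line.strip()
    return (
        stripped == ""
        or stripped.startswith("//")
        or stripped.startswith("/*")
        or stripped.endswith("*/")
    )
```

Because only the start and end of the line are inspected, a line with code followed by a comment is reported as code, which matches the two `False` cases at the end of the table.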
 marker_samples = [
    # (can_parse: bool, exact_match: bool, line: str)
    (True, True, "// FUNCTION: LEGO1 0xdeadbeef"),
    (True, True, "// FUNCTION: ISLE 0x12345678"),
    # No trailing spaces allowed
    (True, False, "// FUNCTION: LEGO1 0xdeadbeef  "),
    # Must have exactly one space between elements
    (True, False, "//FUNCTION: ISLE 0xdeadbeef"),
    (True, False, "// FUNCTION:ISLE 0xdeadbeef"),
    (True, False, "//  FUNCTION: ISLE 0xdeadbeef"),
    (True, False, "// FUNCTION:  ISLE 0xdeadbeef"),
    (True, False, "// FUNCTION: ISLE  0xdeadbeef"),
    # Must have 0x prefix for hex number to match at all
    (False, False, "// FUNCTION: ISLE deadbeef"),
    # Marker type and module name must be uppercase
    (True, False, "// function: ISLE 0xdeadbeef"),
    (True, False, "// function: isle 0xdeadbeef"),
    # Hex string must be lowercase
    (True, False, "// FUNCTION: ISLE 0xDEADBEEF"),
    # TODO: How flexible should we be with matching the module name?
    (True, True, "// FUNCTION: OMNI 0x12345678"),
    (True, True, "// FUNCTION: LEG01 0x12345678"),
    (True, False, "// FUNCTION: hello 0x12345678"),
    # Not close enough to match
    (False, False, "// FUNCTION: ISLE0x12345678"),
    (False, False, "// FUNCTION: 0x12345678"),
    (False, False, "// LEGO1: 0x12345678"),
    # Hex string shorter than 8 characters
    (True, True, "// FUNCTION: LEGO1 0x1234"),
    # TODO: These match but shouldn't.
    # (False, False, '// FUNCTION: LEGO1 0'),
    # (False, False, '// FUNCTION: LEGO1 0x'),
    # Extra field
    (True, True, "// VTABLE: HELLO 0x1234 Extra"),
    # Extra with spaces
    (True, True, "// VTABLE: HELLO 0x1234 Whatever<SubClass *>"),
    # Extra, no space (if the first non-hex character is not in [a-f])
    (True, False, "// VTABLE: HELLO 0x1234Hello"),
    # Extra, many spaces
    (True, False, "// VTABLE: HELLO 0x1234    Hello"),
 ]
@pytest.mark.parametrize("match, _, line", marker_samples)
 def test_marker_match(line: str, match: bool, _):
    did_match = match_marker(line) is not None
    assert did_match is match
@pytest.mark.parametrize("_, exact, line", marker_samples)
 def test_marker_exact(line: str, exact: bool, _):
    assert is_marker_exact(line) is exact
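
The two columns in `marker_samples` distinguish a lenient match (good enough to read the marker) from an exact match (what the style rules require). One way to get that split is a pair of regular expressions, sketched below. The patterns and the `*_sketch` names are assumptions made for illustration and are not taken from `isledecomp.parser.marker`.

```python
import re

# Lenient: flexible whitespace, any letter case, anything may follow the offset.
LENIENT_RE = re.compile(r"\s*//\s*(\w+):\s*(\w+)\s+(0x[0-9a-f]+)", re.IGNORECASE)

# Exact: single spaces, uppercase type and module, lowercase hex digits,
# optional extra field separated by one space, nothing trailing.
EXACT_RE = re.compile(
    r"\s*// ([A-Z]+): ([A-Z0-9]+) (0x[0-9a-f]+)(?: (\S.*\S|\S))?"
)


def match_marker_sketch(line: str) -> bool:
    """True if the line is close enough to a marker to be parsed."""
    return LENIENT_RE.match(line) is not None


def is_marker_exact_sketch(line: str) -> bool:
    """True if the whole line follows the strict marker format."""
    return EXACT_RE.fullmatch(line) is not None
```

Keeping the two checks separate lets a tool accept a slightly malformed marker while still reporting it as a formatting problem.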
 def test_marker_dict_simple():
    d = MarkerDict()
    d.insert(DecompMarker("FUNCTION", "TEST", 0x1234))
    markers = list(d.iter())
    assert len(markers) == 1
 def test_marker_dict_ofs_replace():
    d = MarkerDict()
    d.insert(DecompMarker("FUNCTION", "TEST", 0x1234))
    d.insert(DecompMarker("FUNCTION", "TEST", 0x555))
    markers = list(d.iter())
    assert len(markers) == 1
    assert markers[0].offset == 0x1234
 def test_marker_dict_type_replace():
    d = MarkerDict()
    d.insert(DecompMarker("FUNCTION", "TEST", 0x1234))
    d.insert(DecompMarker("STUB", "TEST", 0x1234))
    markers = list(d.iter())
    assert len(markers) == 1
    assert markers[0].type == MarkerType.FUNCTION
 class_name_match_cases = [
    ("struct MxString {", "MxString"),
    ("class MxString {", "MxString"),
    ("// class MxString", "MxString"),
    ("class MxString : public MxCore {", "MxString"),
    ("class MxPtrList<MxPresenter>", "MxPtrList<MxPresenter>"),
    # To match the symbol MxList<LegoPathController *>::`vftable' we need the
    # class name in the same form. If the template type is a pointer,
    # the asterisk and class name are separated by one space.
    ("// class MxList<LegoPathController *>", "MxList<LegoPathController *>"),
    ("// class MxList<LegoPathController*>", "MxList<LegoPathController *>"),
    ("// class MxList<LegoPathController* >", "MxList<LegoPathController *>"),
    # I don't know if this would ever come up, but sure, why not?
    ("// class MxList<LegoPathController**>", "MxList<LegoPathController **>"),
    ("// class Many::Name::Spaces", "Many::Name::Spaces"),
 ]
@pytest.mark.parametrize("line, class_name", class_name_match_cases)
 def test_get_class_name(line: str, class_name: str):
    assert get_class_name(line) == class_name
 class_name_no_match_cases = [
    "MxString { ",
    "clas MxString",
    "// MxPtrList<MxPresenter>::`scalar deleting destructor'",
 ]
@pytest.mark.parametrize("line", class_name_no_match_cases)
 def test_get_class_name_none(line: str):
    assert get_class_name(line) is None
 variable_name_cases = [
    # with prefix for easy access
    ("char* g_test;", "g_test"),
    ("g_test;", "g_test"),
    ("void (*g_test)(int);", "g_test"),
    ("char g_test[50];", "g_test"),
    ("char g_test[50] = {1234,", "g_test"),
    ("int g_test = 500;", "g_test"),
    # no prefix
    ("char* hello;", "hello"),
    ("hello;", "hello"),
    ("void (*hello)(int);", "hello"),
    ("char hello[50];", "hello"),
    ("char hello[50] = {1234,", "hello"),
    ("int hello = 500;", "hello"),
 ]
@pytest.mark.parametrize("line,name", variable_name_cases)
 def test_get_variable_name(line: str, name: str):
    assert get_variable_name(line) == name
 string_match_cases = [
    ('return "hello world";', "hello world"),
    ('"hello\\\\"', "hello\\"),
    ('"hello \\"world\\""', 'hello "world"'),
    ('"hello\\nworld"', "hello\nworld"),
    # Only match first string if there are multiple options
    ('Method("hello", "world");', "hello"),
 ]
@pytest.mark.parametrize("line, string", string_match_cases)
 def test_get_string_contents(line: str, string: str):
    assert get_string_contents(line) == string
 def test_marker_extra_spaces():
    """The extra field can contain spaces"""
    marker = match_marker("// VTABLE: TEST 0x1234 S p a c e s")
    assert marker.extra == "S p a c e s"
    # Trailing spaces removed
    marker = match_marker("// VTABLE: TEST 0x8888 spaces    ")
    assert marker.extra == "spaces"
    # Trailing newline removed if present
    marker = match_marker("// VTABLE: TEST 0x5555 newline\n")
    assert marker.extra == "newline"
 def test_marker_trailing_spaces():
    """Should ignore trailing spaces. (Invalid extra field)
    Offset field not truncated, extra field set to None."""
    marker = match_marker("// VTABLE: TEST 0x1234     ")
    assert marker is not None
    assert marker.offset == 0x1234
    assert marker.extra is None
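The pointer-spacing cases in `class_name_match_cases` above can be illustrated with a small normalization step. This is a minimal sketch, not the actual `get_class_name` implementation: the helper name `normalize_pointer_spacing` and its regex are assumptions made for illustration only.

```python
import re

# Hypothetical helper; the real isledecomp parser may normalize differently.
# Matches a run of asterisks together with any whitespace around it.
_POINTER_RE = re.compile(r"\s*(\*+)\s*")


def normalize_pointer_spacing(class_name: str) -> str:
    """Put exactly one space before each run of asterisks and none after."""
    return _POINTER_RE.sub(r" \1", class_name)
```

Names without a pointer in the template argument pass through unchanged, because the pattern requires at least one asterisk to match.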
--- a/tools/isledecomp/tests/test_path_resolver_nt.py
+++ b/tools/isledecomp/tests/test_path_resolver_nt.py
@ -1,32 +0,0 @@
 from os import name as os_name
 import pytest
 from isledecomp.dir import PathResolver
 if os_name != "nt":
    pytest.skip(reason="Skip Windows-only tests", allow_module_level=True)
@pytest.fixture(name="resolver")
 def fixture_resolver_win():
    yield PathResolver("C:\\isle")
 def test_identity(resolver):
    assert resolver.resolve_cvdump("C:\\isle\\test.h") == "C:\\isle\\test.h"
 def test_outside_basedir(resolver):
    assert resolver.resolve_cvdump("C:\\lego\\test.h") == "C:\\lego\\test.h"
 def test_relative(resolver):
    assert resolver.resolve_cvdump(".\\test.h") == "C:\\isle\\test.h"
    assert resolver.resolve_cvdump("..\\test.h") == "C:\\test.h"
 def test_intermediate_relative(resolver):
    """These paths may not register as `relative` paths, but we want to
    produce a single absolute path for each."""
    assert resolver.resolve_cvdump("C:\\isle\\test\\..\\test.h") == "C:\\isle\\test.h"
    assert resolver.resolve_cvdump(".\\subdir\\..\\test.h") == "C:\\isle\\test.h"
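The behavior these Windows tests expect (relative paths joined to the base directory, intermediate `..` segments collapsed) can be sketched with the standard library's `ntpath` module, which applies Windows path rules on any host OS. The function name `resolve_windows_path` is hypothetical, and this is not necessarily how `PathResolver.resolve_cvdump` is implemented.

```python
import ntpath


def resolve_windows_path(basedir: str, path: str) -> str:
    """Return a single absolute Windows path for `path`.

    Illustrative only: relative paths are joined onto `basedir`,
    then '.' and '..' segments are collapsed.
    """
    if not ntpath.isabs(path):
        path = ntpath.join(basedir, path)
    return ntpath.normpath(path)
```

Using `ntpath` directly rather than `os.path` keeps the sketch independent of the platform it runs on.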
--- a/tools/isledecomp/tests/test_path_resolver_posix.py
+++ b/tools/isledecomp/tests/test_path_resolver_posix.py
@ -1,69 +0,0 @@
 from os import name as os_name
 from unittest.mock import patch
 import pytest
 from isledecomp.dir import PathResolver
 if os_name == "nt":
    pytest.skip(reason="Skip Posix-only tests", allow_module_level=True)
@pytest.fixture(name="resolver")
 def fixture_resolver_posix():
    # Skip the call to winepath by using a patch, although this is not strictly necessary.
    with patch("isledecomp.dir.winepath_unix_to_win", return_value="Z:\\usr\\isle"):
        yield PathResolver("/usr/isle")
@patch("isledecomp.dir.winepath_win_to_unix")
 def test_identity(winepath_mock, resolver):
    """Test with an absolute Wine path where a path swap is possible."""
    # In this and upcoming tests, patch is_file so we always assume there is
    # a file at the given unix path. We want to test the conversion logic only.
    with patch("pathlib.Path.is_file", return_value=True):
        assert resolver.resolve_cvdump("Z:\\usr\\isle\\test.h") == "/usr/isle/test.h"
    winepath_mock.assert_not_called()
    # Without the patch, this should call the winepath_mock, but we have
    # memoized the value from the previous run.
    assert resolver.resolve_cvdump("Z:\\usr\\isle\\test.h") == "/usr/isle/test.h"
    winepath_mock.assert_not_called()
@patch("isledecomp.dir.winepath_win_to_unix")
 def test_file_does_not_exist(winepath_mock, resolver):
    """These test files (probably) don't exist, so we always assume
    the path swap failed and defer to winepath."""
    resolver.resolve_cvdump("Z:\\usr\\isle\\test.h")
    winepath_mock.assert_called_once_with("Z:\\usr\\isle\\test.h")
@patch("isledecomp.dir.winepath_win_to_unix")
 def test_outside_basedir(winepath_mock, resolver):
    """Test an absolute path where we cannot do a path swap."""
    with patch("pathlib.Path.is_file", return_value=True):
        resolver.resolve_cvdump("Z:\\lego\\test.h")
    winepath_mock.assert_called_once_with("Z:\\lego\\test.h")
@patch("isledecomp.dir.winepath_win_to_unix")
 def test_relative(winepath_mock, resolver):
    """Test relative paths inside and outside of the base dir."""
    with patch("pathlib.Path.is_file", return_value=True):
        assert resolver.resolve_cvdump("./test.h") == "/usr/isle/test.h"
        # This works because we will resolve "/usr/isle/../test.h"
        assert resolver.resolve_cvdump("../test.h") == "/usr/test.h"
    winepath_mock.assert_not_called()
@patch("isledecomp.dir.winepath_win_to_unix")
 def test_intermediate_relative(winepath_mock, resolver):
    """We can resolve intermediate backdirs if they are relative to the basedir."""
    with patch("pathlib.Path.is_file", return_value=True):
        assert (
            resolver.resolve_cvdump("Z:\\usr\\isle\\test\\..\\test.h")
            == "/usr/isle/test.h"
        )
        assert resolver.resolve_cvdump(".\\subdir\\..\\test.h") == "/usr/isle/test.h"
    winepath_mock.assert_not_called()
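The "path swap" mentioned in these tests replaces the Wine-side base directory with its Unix counterpart, so that `winepath` is needed only for paths outside the base. A minimal sketch of that idea using `pathlib`'s pure path classes follows. The function name `swap_base` and its signature are assumptions; the real `PathResolver` additionally checks that the file exists and memoizes results, which this sketch omits.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def swap_base(win_path: str, win_base: str, unix_base: str) -> Optional[str]:
    """Map a Wine path under `win_base` to the same file under `unix_base`.

    Returns None when `win_path` is not inside `win_base`, in which case a
    caller would have to fall back to another conversion method.
    """
    try:
        relative = PureWindowsPath(win_path).relative_to(PureWindowsPath(win_base))
    except ValueError:
        return None
    return str(PurePosixPath(unix_base).joinpath(*relative.parts))
```

Pure paths never touch the file system, so the sketch runs on any platform. It does not collapse `..` segments; those would need to be normalized before the swap.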
--- a/tools/isledecomp/tests/test_sanitize.py
+++ b/tools/isledecomp/tests/test_sanitize.py
@ -1,296 +0,0 @@
 from typing import Optional
 import pytest
 from isledecomp.compare.asm.parse import DisasmLiteInst, ParseAsm
 def mock_inst(mnemonic: str, op_str: str) -> DisasmLiteInst:
    """Mock up the named tuple DisasmLiteInst from just a mnemonic and op_str.
    For tests of sanitize that do not require the instruction address
    or size, i.e. any non-jump instruction."""
    return DisasmLiteInst(0, 0, mnemonic, op_str)
 identity_cases = [
    ("", ""),
    ("sti", ""),
    ("push", "ebx"),
    ("ret", ""),
    ("ret", "4"),
    ("mov", "eax, 0x1234"),
 ]
@pytest.mark.parametrize("mnemonic, op_str", identity_cases)
 def test_identity(mnemonic, op_str):
    """Confirm that nothing is substituted."""
    p = ParseAsm()
    inst = mock_inst(mnemonic, op_str)
    result = p.sanitize(inst)
    assert result == (mnemonic, op_str)
 ptr_replace_cases = [
    ("byte ptr [0x5555]", "byte ptr [<OFFSET1>]"),
    ("word ptr [0x5555]", "word ptr [<OFFSET1>]"),
    ("dword ptr [0x5555]", "dword ptr [<OFFSET1>]"),
    ("qword ptr [0x5555]", "qword ptr [<OFFSET1>]"),
    ("eax, dword ptr [0x5555]", "eax, dword ptr [<OFFSET1>]"),
    ("dword ptr [0x5555], eax", "dword ptr [<OFFSET1>], eax"),
    ("dword ptr [0x5555], 0", "dword ptr [<OFFSET1>], 0"),
    ("dword ptr [0x5555], 8", "dword ptr [<OFFSET1>], 8"),
    # Same value, assumed to be an addr in the first appearance
    # because it is designated as 'ptr', but we have not provided the
    # relocation table lookup method so we do not replace the second appearance.
    ("dword ptr [0x5555], 0x5555", "dword ptr [<OFFSET1>], 0x5555"),
 ]
@pytest.mark.parametrize("start, end", ptr_replace_cases)
 def test_ptr_replace(start, end):
    """Anything in square brackets (with the 'ptr' prefix) will always be replaced."""
    p = ParseAsm()
    inst = mock_inst("", start)
    (_, op_str) = p.sanitize(inst)
    assert op_str == end
 call_replace_cases = [
    ("ebx", "ebx"),
    ("0x1234", "<OFFSET1>"),
    ("dword ptr [0x1234]", "dword ptr [<OFFSET1>]"),
    ("dword ptr [ecx + 0x10]", "dword ptr [ecx + 0x10]"),
 ]
@pytest.mark.parametrize("start, end", call_replace_cases)
 def test_call_replace(start, end):
    """Call with hex operand is always replaced.
    Otherwise, ptr replacement rules apply, but skip `this` calls."""
    p = ParseAsm()
    inst = mock_inst("call", start)
    (_, op_str) = p.sanitize(inst)
    assert op_str == end
 def test_jump_displacement():
    """Display jump displacement (offset from end of jump instruction)
    instead of destination address."""
    p = ParseAsm()
    inst = DisasmLiteInst(0x1000, 2, "je", "0x1000")
    (_, op_str) = p.sanitize(inst)
    assert op_str == "-0x2"
 def test_jmp_table():
    """To ignore cases where it would be inappropriate to replace pointer
    displacement (i.e. the vast majority of them) we require the address
    to be relocated. This excludes any address less than the imagebase."""
    p = ParseAsm()
    inst = mock_inst("jmp", "dword ptr [eax*4 + 0x5555]")
    (_, op_str) = p.sanitize(inst)
    # i.e. no change
    assert op_str == "dword ptr [eax*4 + 0x5555]"
    def relocate_lookup(addr: int) -> bool:
        return addr == 0x5555
    # Now add the relocation lookup
    p = ParseAsm(relocate_lookup=relocate_lookup)
    (_, op_str) = p.sanitize(inst)
    # Should replace it now
    assert op_str == "dword ptr [eax*4 + <OFFSET1>]"
 name_replace_cases = [
    ("byte ptr [0x5555]", "byte ptr [_substitute_]"),
    ("word ptr [0x5555]", "word ptr [_substitute_]"),
    ("dword ptr [0x5555]", "dword ptr [_substitute_]"),
    ("qword ptr [0x5555]", "qword ptr [_substitute_]"),
 ]
@pytest.mark.parametrize("start, end", name_replace_cases)
 def test_name_replace(start, end):
    """Make sure the name lookup function is called if present"""
    def substitute(_: int, __: bool) -> str:
        return "_substitute_"
    p = ParseAsm(name_lookup=substitute)
    inst = mock_inst("mov", start)
    (_, op_str) = p.sanitize(inst)
    assert op_str == end
 def test_replacement_cache():
    p = ParseAsm()
    inst = mock_inst("inc", "dword ptr [0x1234]")
    (_, op_str) = p.sanitize(inst)
    assert op_str == "dword ptr [<OFFSET1>]"
    (_, op_str) = p.sanitize(inst)
    assert op_str == "dword ptr [<OFFSET1>]"
 def test_replacement_numbering():
    """If we can use the name lookup for the first address but not the second,
    the second replacement should be <OFFSET2> not <OFFSET1>."""
    def substitute_1234(addr: int, _: bool) -> Optional[str]:
        return "_substitute_" if addr == 0x1234 else None
    p = ParseAsm(name_lookup=substitute_1234)
    (_, op_str) = p.sanitize(mock_inst("inc", "dword ptr [0x1234]"))
    assert op_str == "dword ptr [_substitute_]"
    (_, op_str) = p.sanitize(mock_inst("inc", "dword ptr [0x5555]"))
    assert op_str == "dword ptr [<OFFSET2>]"
 def test_relocate_lookup():
    """Immediate values would be relocated if they are actually addresses.
    So we can use the relocation table to check whether a given value is an
    address or just some number."""
    def relocate_lookup(addr: int) -> bool:
        return addr == 0x1234
    p = ParseAsm(relocate_lookup=relocate_lookup)
    (_, op_str) = p.sanitize(mock_inst("mov", "eax, 0x1234"))
    assert op_str == "eax, <OFFSET1>"
    (_, op_str) = p.sanitize(mock_inst("mov", "eax, 0x5555"))
    assert op_str == "eax, 0x5555"
 def test_jump_to_function():
    """A jmp instruction can lead us directly to a function. This can be found
    in the unwind section at the end of a function. However: we do not want to
    assume this is the case for all jumps. Only replace the jump with a name
    if we can find it using our lookup."""
    def substitute_1234(addr: int, _: bool) -> Optional[str]:
        return "_substitute_" if addr == 0x1234 else None
    p = ParseAsm(name_lookup=substitute_1234)
    inst = DisasmLiteInst(0x1000, 2, "jmp", "0x1234")
    (_, op_str) = p.sanitize(inst)
    assert op_str == "_substitute_"
    # Should not replace this jump.
    # 0x1000 (start addr)
    # + 2 (size of jump instruction)
    # + 0x5555 (displacement, the value we want)
    # = 0x6557
    inst = DisasmLiteInst(0x1000, 2, "jmp", "0x6557")
    (_, op_str) = p.sanitize(inst)
    assert op_str == "0x5555"
@pytest.mark.skip(reason="changed implementation")
 def test_float_replacement():
    """Floating point constants often appear as pointers to data.
    A good example is ViewROI::IntrinsicImportance and the subclass override
    LegoROI::IntrinsicImportance. Both return 0.5, but this is done via the
    FLD instruction and a dword value at 0x100dbdec. In this case it is more
    valuable to just read the constant value rather than use a placeholder.
    The float constants don't appear to be deduplicated (like strings are)
    because there is another 0.5 at 0x100d40b0."""
    def bin_lookup(addr: int, _: int) -> Optional[bytes]:
        return b"\xdb\x0f\x49\x40" if addr == 0x1234 else None
    p = ParseAsm(bin_lookup=bin_lookup)
    inst = DisasmLiteInst(0x1000, 6, "fld", "dword ptr [0x1234]")
    (_, op_str) = p.sanitize(inst)
    # Single-precision float. struct.unpack("<f", struct.pack("<f", math.pi))
    assert op_str == "dword ptr [3.1415927410125732 (FLOAT)]"
@pytest.mark.skip(reason="changed implementation")
 def test_float_variable():
    """If there is a variable at the address referenced by a float instruction,
    use the name instead of calling into the float replacement handler."""
    def name_lookup(addr: int, _: bool) -> Optional[str]:
        return "g_myFloatVariable" if addr == 0x1234 else None
    p = ParseAsm(name_lookup=name_lookup)
    inst = DisasmLiteInst(0x1000, 6, "fld", "dword ptr [0x1234]")
    (_, op_str) = p.sanitize(inst)
    assert op_str == "dword ptr [g_myFloatVariable]"
 def test_pointer_compare():
    """A loop on an array could get optimized into comparing on the address
    that immediately follows the array. This may or may not be a valid address
    and it may or may not be annotated. To avoid a situation where an
    erroneous address value would get replaced with a placeholder and silently
    pass the comparison check, we will only replace an immediate value on the
    CMP instruction if it is a known address."""
    # 0x1234 and 0x5555 are relocated and so are considered to be addresses.
    def relocate_lookup(addr: int) -> bool:
        return addr in (0x1234, 0x5555)
    # Only 0x5555 is a "known" address
    def name_lookup(addr: int, _: bool) -> Optional[str]:
        return "hello" if addr == 0x5555 else None
    p = ParseAsm(relocate_lookup=relocate_lookup, name_lookup=name_lookup)
    # Will always replace on MOV instruction
    (_, op_str) = p.sanitize(mock_inst("mov", "eax, 0x1234"))
    assert op_str == "eax, <OFFSET1>"
    (_, op_str) = p.sanitize(mock_inst("mov", "eax, 0x5555"))
    assert op_str == "eax, hello"
    # n.b. We have already cached the replacement for 0x1234, but the
    # special handling for CMP should skip the cache and not use it.
    # Do not replace here
    (_, op_str) = p.sanitize(mock_inst("cmp", "eax, 0x1234"))
    assert op_str == "eax, 0x1234"
    # Should replace here
    (_, op_str) = p.sanitize(mock_inst("cmp", "eax, 0x5555"))
    assert op_str == "eax, hello"
 def test_absolute_indirect():
    """The instruction `call dword ptr [0x1234]` means we call the function
    whose address is at 0x1234. (i.e. absolute indirect addressing mode)
    It is probably more useful to show the name of the function itself if
    we have it, but there are some circumstances where we want to replace
    with the pointer's name (e.g. an imported function)."""
    def name_lookup(addr: int, _: bool) -> Optional[str]:
        return {
            0x1234: "Hello",
            0x4321: "xyz",
            0x5555: "Test",
        }.get(addr)
    def bin_lookup(addr: int, _: int) -> Optional[bytes]:
        return (
            {
                0x1234: b"\x55\x55\x00\x00",
                0x4321: b"\x99\x99\x00\x00",
            }
        ).get(addr)
    p = ParseAsm(name_lookup=name_lookup, bin_lookup=bin_lookup)
    # If we know the indirect address (0x5555)
    # Arrow to indicate this is an indirect replacement
    (_, op_str) = p.sanitize(mock_inst("call", "dword ptr [0x1234]"))
    assert op_str == "dword ptr [->Test]"
    # If we do not know the indirect address (0x9999)
    (_, op_str) = p.sanitize(mock_inst("call", "dword ptr [0x4321]"))
    assert op_str == "dword ptr [xyz]"
    # If we can't read the indirect address
    (_, op_str) = p.sanitize(mock_inst("call", "dword ptr [0x5555]"))
    assert op_str == "dword ptr [Test]"
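Two of the rules exercised by the sanitize tests above are simple enough to sketch in isolation: the jump displacement arithmetic and the numbering of `<OFFSETn>` placeholders. The names `jump_displacement` and `PlaceholderCache` are hypothetical, and the single-argument `name_lookup` is a simplification of the two-argument callback used in the tests. This is an illustration of the expected behavior, not the `ParseAsm` implementation.

```python
from typing import Callable, Dict, Optional


def jump_displacement(inst_addr: int, inst_size: int, target: int) -> str:
    """Displacement is measured from the end of the jump instruction."""
    return hex(target - (inst_addr + inst_size))


class PlaceholderCache:
    """Assign a stable replacement string to each address seen.

    The counter advances for every new address, even when a name is found,
    so a later unnamed address gets <OFFSET2> rather than <OFFSET1>.
    """

    def __init__(self, name_lookup: Optional[Callable[[int], Optional[str]]] = None):
        self._lookup = name_lookup
        self._replacements: Dict[int, str] = {}

    def replace(self, addr: int) -> str:
        if addr not in self._replacements:
            index = len(self._replacements) + 1
            name = self._lookup(addr) if self._lookup else None
            self._replacements[addr] = name if name is not None else f"<OFFSET{index}>"
        return self._replacements[addr]
```

Caching by address means the same address always yields the same placeholder within one function, so two listings can be compared line by line.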
--- a/tools/ncc/ncc.py
+++ b/tools/ncc/ncc.py
@ -1,661 +0,0 @@
 #!/usr/bin/env python
 # MIT License
 #
 # Copyright (c) 2018 Nithin Nellikunnu (nithin.nn@gmail.com)
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
 # in the Software without restriction, including without limitation the rights
 # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 # copies of the Software, and to permit persons to whom the Software is
 # furnished to do so, subject to the following conditions:
 #
 # The above copyright notice and this permission notice shall be included in all
 # copies or substantial portions of the Software.
 #
 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
 import logging
 import argparse
 import yaml
 import re
 import sys
 import difflib
 import os
 import fnmatch
 from clang.cindex import Index
 from clang.cindex import CursorKind
 from clang.cindex import StorageClass
 from clang.cindex import TypeKind
 from clang.cindex import Config
 # Clang cursor kind to ncc Defined cursor map
 default_rules_db = {}
 clang_to_user_map = {}
 special_kind = {CursorKind.STRUCT_DECL: 1, CursorKind.CLASS_DECL: 1}
 file_extensions = [".c", ".cpp", ".h", ".hpp"]
 class Rule(object):
    def __init__(self, name, clang_kind, parent_kind=None, pattern_str='^.*$'):
        self.name = name
        self.clang_kind = clang_kind
        self.parent_kind = parent_kind
        self.pattern_str = pattern_str
        self.pattern = re.compile(pattern_str)
        self.includes = []
        self.excludes = []
    def evaluate(self, node, scope=None):
        if not self.pattern.match(node.spelling):
            fmt = '{}:{}:{}: "{}" does not match "{}" associated with {}\n'
            msg = fmt.format(node.location.file.name, node.location.line, node.location.column,
                             node.displayname, self.pattern_str, self.name)
            sys.stderr.write(msg)
            return False
        return True
 class ScopePrefixRule(object):
    def __init__(self, pattern_obj):
        self.name = "ScopePrefixRule"
        self.rule_names = ["Global", "Static", "ClassMember", "StructMember"]
        self.global_prefix = ""
        self.static_prefix = ""
        self.class_member_prefix = ""
        self.struct_member_prefix = ""
        try:
            for key, value in pattern_obj.items():
                if key == "Global":
                    self.global_prefix = value
                elif key == "Static":
                    self.static_prefix = value
                elif key == "ClassMember":
                    self.class_member_prefix = value
                elif key == "StructMember":
                    self.struct_member_prefix = value
                else:
                    raise ValueError(key)
        except ValueError as e:
            sys.stderr.write('{} is not a valid rule name\n'.format(e))
            fixit = difflib.get_close_matches(str(e), self.rule_names, n=1, cutoff=0.8)
            if fixit:
                sys.stderr.write('Did you mean rule name: {} ?\n'.format(fixit[0]))
            sys.exit(1)
 class DataTypePrefixRule(object):
    def __init__(self, pattern_obj):
        self.name = "DataTypePrefix"
        self.rule_names = ["String", "Integer", "Bool", "Pointer"]
        self.string_prefix = ""
        self.integer_prefix = ""
        self.bool_prefix = ""
        self.pointer_prefix = ""
        try:
            for key, value in pattern_obj.items():
                if key == "String":
                    self.string_prefix = value
                elif key == "Integer":
                    self.integer_prefix = value
                elif key == "Bool":
                    self.bool_prefix = value
                elif key == "Pointer":
                    self.pointer_prefix = value
                else:
                    raise ValueError(key)
        except ValueError as e:
            sys.stderr.write('{} is not a valid rule name\n'.format(e))
            fixit = difflib.get_close_matches(str(e), self.rule_names, n=1, cutoff=0.8)
            if fixit:
                sys.stderr.write('Did you mean rule name: {} ?\n'.format(fixit[0]))
            sys.exit(1)
 class VariableNameRule(object):
    def __init__(self, pattern_obj=None):
        self.name = "VariableName"
        self.pattern_str = "^.*$"
        self.rule_names = ["ScopePrefix", "DataTypePrefix", "Pattern"]
        self.scope_prefix_rule = None
        self.datatype_prefix_rule = None
        try:
            for key, value in pattern_obj.items():
                if key == "ScopePrefix":
                    self.scope_prefix_rule = ScopePrefixRule(value)
                elif key == "DataTypePrefix":
                    self.datatype_prefix_rule = DataTypePrefixRule(value)
                elif key == "Pattern":
                    self.pattern_str = value
                else:
                    raise ValueError(key)
        except ValueError as e:
            sys.stderr.write('{} is not a valid rule name\n'.format(e))
            fixit = difflib.get_close_matches(str(e), self.rule_names, n=1, cutoff=0.8)
            if fixit:
                sys.stderr.write('Did you mean rule name: {} ?\n'.format(fixit[0]))
            sys.exit(1)
        except re.error as e:
            sys.stderr.write('{} is not a valid pattern\n'.format(e))
            sys.exit(1)
    def get_scope_prefix(self, node, scope=None):
        if node.storage_class == StorageClass.STATIC:
            return self.scope_prefix_rule.static_prefix
        elif (scope is None) and (node.storage_class == StorageClass.EXTERN or
                                  node.storage_class == StorageClass.NONE):
            return self.scope_prefix_rule.global_prefix
        elif (scope is CursorKind.CLASS_DECL) or (scope is CursorKind.CLASS_TEMPLATE):
            return self.scope_prefix_rule.class_member_prefix
        elif (scope is CursorKind.STRUCT_DECL):
            return self.scope_prefix_rule.struct_member_prefix
        return ""
    def get_datatype_prefix(self, node):
        if node.type.kind is TypeKind.ELABORATED:
            if node.type.spelling.startswith('std::string'):
                return self.datatype_prefix_rule.string_prefix
            elif (node.type.spelling.startswith('std::unique_ptr') or
                  node.type.spelling.startswith("std::shared_ptr")):
                return self.datatype_prefix_rule.pointer_prefix
        elif node.type.kind is TypeKind.POINTER:
            return self.datatype_prefix_rule.pointer_prefix
        else:
            if node.type.spelling == "int":
                return self.datatype_prefix_rule.integer_prefix
            elif node.type.spelling.startswith('bool'):
                return self.datatype_prefix_rule.bool_prefix
        return ""
    def evaluate(self, node, scope=None):
        pattern_str = self.pattern_str
        scope_prefix = self.get_scope_prefix(node, scope)
        datatype_prefix = self.get_datatype_prefix(node)
        pattern_str = pattern_str[0] + scope_prefix + datatype_prefix + pattern_str[1:]
        pattern = re.compile(pattern_str)
        if not pattern.match(node.spelling):
            fmt = '{}:{}:{}: "{}" does not have the pattern {} associated with Variable name\n'
            msg = fmt.format(node.location.file.name, node.location.line, node.location.column,
                             node.displayname, pattern_str)
            sys.stderr.write(msg)
            return False
        return True
 # All supported rules
 default_rules_db["StructName"] = Rule("StructName", CursorKind.STRUCT_DECL)
 default_rules_db["UnionName"] = Rule("UnionName", CursorKind.UNION_DECL)
 default_rules_db["ClassName"] = Rule("ClassName", CursorKind.CLASS_DECL)
 default_rules_db["EnumName"] = Rule("EnumName", CursorKind.ENUM_DECL)
 default_rules_db["EnumConstantName"] = Rule("EnumConstantName", CursorKind.ENUM_CONSTANT_DECL)
 default_rules_db["FunctionName"] = Rule("FunctionName", CursorKind.FUNCTION_DECL)
 default_rules_db["ParameterName"] = Rule("ParameterName", CursorKind.PARM_DECL)
 default_rules_db["TypedefName"] = Rule("TypedefName", CursorKind.TYPEDEF_DECL)
 default_rules_db["CppMethod"] = Rule("CppMethod", CursorKind.CXX_METHOD)
 default_rules_db["Namespace"] = Rule("Namespace", CursorKind.NAMESPACE)
 default_rules_db["ConversionFunction"] = Rule("ConversionFunction", CursorKind.CONVERSION_FUNCTION)
 default_rules_db["TemplateTypeParameter"] = Rule(
    "TemplateTypeParameter", CursorKind.TEMPLATE_TYPE_PARAMETER)
 default_rules_db["TemplateNonTypeParameter"] = Rule(
    "TemplateNonTypeParameter", CursorKind.TEMPLATE_NON_TYPE_PARAMETER)
 default_rules_db["TemplateTemplateParameter"] = Rule(
    "TemplateTemplateParameter", CursorKind.TEMPLATE_TEMPLATE_PARAMETER)
 default_rules_db["FunctionTemplate"] = Rule("FunctionTemplate", CursorKind.FUNCTION_TEMPLATE)
 default_rules_db["ClassTemplate"] = Rule("ClassTemplate", CursorKind.CLASS_TEMPLATE)
 default_rules_db["ClassTemplatePartialSpecialization"] = Rule(
    "ClassTemplatePartialSpecialization", CursorKind.CLASS_TEMPLATE_PARTIAL_SPECIALIZATION)
 default_rules_db["NamespaceAlias"] = Rule("NamespaceAlias", CursorKind.NAMESPACE_ALIAS)
 default_rules_db["UsingDirective"] = Rule("UsingDirective", CursorKind.USING_DIRECTIVE)
 default_rules_db["UsingDeclaration"] = Rule("UsingDeclaration", CursorKind.USING_DECLARATION)
 default_rules_db["TypeAliasName"] = Rule("TypeAliasName", CursorKind.TYPE_ALIAS_DECL)
 default_rules_db["ClassAccessSpecifier"] = Rule(
    "ClassAccessSpecifier", CursorKind.CXX_ACCESS_SPEC_DECL)
 default_rules_db["TypeReference"] = Rule("TypeReference", CursorKind.TYPE_REF)
 default_rules_db["CxxBaseSpecifier"] = Rule("CxxBaseSpecifier", CursorKind.CXX_BASE_SPECIFIER)
 default_rules_db["TemplateReference"] = Rule("TemplateReference", CursorKind.TEMPLATE_REF)
 default_rules_db["NamespaceReference"] = Rule("NamespaceReference", CursorKind.NAMESPACE_REF)
 default_rules_db["MemberReference"] = Rule("MemberReference", CursorKind.MEMBER_REF)
 default_rules_db["LabelReference"] = Rule("LabelReference", CursorKind.LABEL_REF)
 default_rules_db["OverloadedDeclarationReference"] = Rule(
    "OverloadedDeclarationReference", CursorKind.OVERLOADED_DECL_REF)
 default_rules_db["VariableReference"] = Rule("VariableReference", CursorKind.VARIABLE_REF)
 default_rules_db["InvalidFile"] = Rule("InvalidFile", CursorKind.INVALID_FILE)
 default_rules_db["NoDeclarationFound"] = Rule("NoDeclarationFound", CursorKind.NO_DECL_FOUND)
 default_rules_db["NotImplemented"] = Rule("NotImplemented", CursorKind.NOT_IMPLEMENTED)
 default_rules_db["InvalidCode"] = Rule("InvalidCode", CursorKind.INVALID_CODE)
 default_rules_db["UnexposedExpression"] = Rule("UnexposedExpression", CursorKind.UNEXPOSED_EXPR)
 default_rules_db["DeclarationReferenceExpression"] = Rule(
    "DeclarationReferenceExpression", CursorKind.DECL_REF_EXPR)
 default_rules_db["MemberReferenceExpression"] = Rule(
    "MemberReferenceExpression", CursorKind.MEMBER_REF_EXPR)
 default_rules_db["CallExpression"] = Rule("CallExpression", CursorKind.CALL_EXPR)
 default_rules_db["BlockExpression"] = Rule("BlockExpression", CursorKind.BLOCK_EXPR)
 default_rules_db["IntegerLiteral"] = Rule("IntegerLiteral", CursorKind.INTEGER_LITERAL)
 default_rules_db["FloatingLiteral"] = Rule("FloatingLiteral", CursorKind.FLOATING_LITERAL)
 default_rules_db["ImaginaryLiteral"] = Rule("ImaginaryLiteral", CursorKind.IMAGINARY_LITERAL)
 default_rules_db["StringLiteral"] = Rule("StringLiteral", CursorKind.STRING_LITERAL)
 default_rules_db["CharacterLiteral"] = Rule("CharacterLiteral", CursorKind.CHARACTER_LITERAL)
 default_rules_db["ParenExpression"] = Rule("ParenExpression", CursorKind.PAREN_EXPR)
 default_rules_db["UnaryOperator"] = Rule("UnaryOperator", CursorKind.UNARY_OPERATOR)
 default_rules_db["ArraySubscriptExpression"] = Rule(
    "ArraySubscriptExpression", CursorKind.ARRAY_SUBSCRIPT_EXPR)
 default_rules_db["BinaryOperator"] = Rule("BinaryOperator", CursorKind.BINARY_OPERATOR)
 default_rules_db["CompoundAssignmentOperator"] = Rule(
    "CompoundAssignmentOperator", CursorKind.COMPOUND_ASSIGNMENT_OPERATOR)
 default_rules_db["ConditionalOperator"] = Rule(
    "ConditionalOperator", CursorKind.CONDITIONAL_OPERATOR)
 default_rules_db["CstyleCastExpression"] = Rule(
    "CstyleCastExpression", CursorKind.CSTYLE_CAST_EXPR)
 default_rules_db["CompoundLiteralExpression"] = Rule(
    "CompoundLiteralExpression", CursorKind.COMPOUND_LITERAL_EXPR)
 default_rules_db["InitListExpression"] = Rule("InitListExpression", CursorKind.INIT_LIST_EXPR)
 default_rules_db["AddrLabelExpression"] = Rule("AddrLabelExpression", CursorKind.ADDR_LABEL_EXPR)
 default_rules_db["StatementExpression"] = Rule("StatementExpression", CursorKind.StmtExpr)
 default_rules_db["GenericSelectionExpression"] = Rule(
    "GenericSelectionExpression", CursorKind.GENERIC_SELECTION_EXPR)
 default_rules_db["GnuNullExpression"] = Rule("GnuNullExpression", CursorKind.GNU_NULL_EXPR)
 default_rules_db["CxxStaticCastExpression"] = Rule(
    "CxxStaticCastExpression", CursorKind.CXX_STATIC_CAST_EXPR)
 default_rules_db["CxxDynamicCastExpression"] = Rule(
    "CxxDynamicCastExpression", CursorKind.CXX_DYNAMIC_CAST_EXPR)
 default_rules_db["CxxReinterpretCastExpression"] = Rule(
    "CxxReinterpretCastExpression", CursorKind.CXX_REINTERPRET_CAST_EXPR)
 default_rules_db["CxxConstCastExpression"] = Rule(
    "CxxConstCastExpression", CursorKind.CXX_CONST_CAST_EXPR)
 default_rules_db["CxxFunctionalCastExpression"] = Rule(
    "CxxFunctionalCastExpression", CursorKind.CXX_FUNCTIONAL_CAST_EXPR)
 default_rules_db["CxxTypeidExpression"] = Rule("CxxTypeidExpression", CursorKind.CXX_TYPEID_EXPR)
 default_rules_db["CxxBoolLiteralExpression"] = Rule(
    "CxxBoolLiteralExpression", CursorKind.CXX_BOOL_LITERAL_EXPR)
 default_rules_db["CxxNullPointerLiteralExpression"] = Rule(
    "CxxNullPointerLiteralExpression", CursorKind.CXX_NULL_PTR_LITERAL_EXPR)
 default_rules_db["CxxThisExpression"] = Rule("CxxThisExpression", CursorKind.CXX_THIS_EXPR)
 default_rules_db["CxxThrowExpression"] = Rule("CxxThrowExpression", CursorKind.CXX_THROW_EXPR)
 default_rules_db["CxxNewExpression"] = Rule("CxxNewExpression", CursorKind.CXX_NEW_EXPR)
 default_rules_db["CxxDeleteExpression"] = Rule("CxxDeleteExpression", CursorKind.CXX_DELETE_EXPR)
 default_rules_db["CxxUnaryExpression"] = Rule("CxxUnaryExpression", CursorKind.CXX_UNARY_EXPR)
 default_rules_db["PackExpansionExpression"] = Rule(
    "PackExpansionExpression", CursorKind.PACK_EXPANSION_EXPR)
 default_rules_db["SizeOfPackExpression"] = Rule(
    "SizeOfPackExpression", CursorKind.SIZE_OF_PACK_EXPR)
 default_rules_db["LambdaExpression"] = Rule("LambdaExpression", CursorKind.LAMBDA_EXPR)
 default_rules_db["ObjectBoolLiteralExpression"] = Rule(
    "ObjectBoolLiteralExpression", CursorKind.OBJ_BOOL_LITERAL_EXPR)
 default_rules_db["ObjectSelfExpression"] = Rule("ObjectSelfExpression", CursorKind.OBJ_SELF_EXPR)
 default_rules_db["UnexposedStatement"] = Rule("UnexposedStatement", CursorKind.UNEXPOSED_STMT)
 default_rules_db["LabelStatement"] = Rule("LabelStatement", CursorKind.LABEL_STMT)
 default_rules_db["CompoundStatement"] = Rule("CompoundStatement", CursorKind.COMPOUND_STMT)
 default_rules_db["CaseStatement"] = Rule("CaseStatement", CursorKind.CASE_STMT)
 default_rules_db["DefaultStatement"] = Rule("DefaultStatement", CursorKind.DEFAULT_STMT)
 default_rules_db["IfStatement"] = Rule("IfStatement", CursorKind.IF_STMT)
 default_rules_db["SwitchStatement"] = Rule("SwitchStatement", CursorKind.SWITCH_STMT)
 default_rules_db["WhileStatement"] = Rule("WhileStatement", CursorKind.WHILE_STMT)
 default_rules_db["DoStatement"] = Rule("DoStatement", CursorKind.DO_STMT)
 default_rules_db["ForStatement"] = Rule("ForStatement", CursorKind.FOR_STMT)
 default_rules_db["GotoStatement"] = Rule("GotoStatement", CursorKind.GOTO_STMT)
 default_rules_db["IndirectGotoStatement"] = Rule(
    "IndirectGotoStatement", CursorKind.INDIRECT_GOTO_STMT)
 default_rules_db["ContinueStatement"] = Rule("ContinueStatement", CursorKind.CONTINUE_STMT)
 default_rules_db["BreakStatement"] = Rule("BreakStatement", CursorKind.BREAK_STMT)
 default_rules_db["ReturnStatement"] = Rule("ReturnStatement", CursorKind.RETURN_STMT)
 default_rules_db["AsmStatement"] = Rule("AsmStatement", CursorKind.ASM_STMT)
 default_rules_db["CxxCatchStatement"] = Rule("CxxCatchStatement", CursorKind.CXX_CATCH_STMT)
 default_rules_db["CxxTryStatement"] = Rule("CxxTryStatement", CursorKind.CXX_TRY_STMT)
 default_rules_db["CxxForRangeStatement"] = Rule(
    "CxxForRangeStatement", CursorKind.CXX_FOR_RANGE_STMT)
 default_rules_db["MsAsmStatement"] = Rule("MsAsmStatement", CursorKind.MS_ASM_STMT)
 default_rules_db["NullStatement"] = Rule("NullStatement", CursorKind.NULL_STMT)
 default_rules_db["DeclarationStatement"] = Rule("DeclarationStatement", CursorKind.DECL_STMT)
 default_rules_db["TranslationUnit"] = Rule("TranslationUnit", CursorKind.TRANSLATION_UNIT)
 default_rules_db["UnexposedAttribute"] = Rule("UnexposedAttribute", CursorKind.UNEXPOSED_ATTR)
 default_rules_db["CxxFinalAttribute"] = Rule("CxxFinalAttribute", CursorKind.CXX_FINAL_ATTR)
 default_rules_db["CxxOverrideAttribute"] = Rule(
    "CxxOverrideAttribute", CursorKind.CXX_OVERRIDE_ATTR)
 default_rules_db["AnnotateAttribute"] = Rule("AnnotateAttribute", CursorKind.ANNOTATE_ATTR)
 default_rules_db["AsmLabelAttribute"] = Rule("AsmLabelAttribute", CursorKind.ASM_LABEL_ATTR)
 default_rules_db["PackedAttribute"] = Rule("PackedAttribute", CursorKind.PACKED_ATTR)
 default_rules_db["PureAttribute"] = Rule("PureAttribute", CursorKind.PURE_ATTR)
 default_rules_db["ConstAttribute"] = Rule("ConstAttribute", CursorKind.CONST_ATTR)
 default_rules_db["NoduplicateAttribute"] = Rule(
    "NoduplicateAttribute", CursorKind.NODUPLICATE_ATTR)
 default_rules_db["PreprocessingDirective"] = Rule(
    "PreprocessingDirective", CursorKind.PREPROCESSING_DIRECTIVE)
 default_rules_db["MacroDefinition"] = Rule("MacroDefinition", CursorKind.MACRO_DEFINITION)
 default_rules_db["MacroInstantiation"] = Rule("MacroInstantiation", CursorKind.MACRO_INSTANTIATION)
 default_rules_db["InclusionDirective"] = Rule("InclusionDirective", CursorKind.INCLUSION_DIRECTIVE)
 default_rules_db["TypeAliasTeplateDeclaration"] = Rule(
    "TypeAliasTeplateDeclaration", CursorKind.TYPE_ALIAS_TEMPLATE_DECL)
 # Reverse lookup map. The parser identifies Clang cursor kinds, which must be mapped
 # to user-defined types
 for key, value in default_rules_db.items():
    clang_to_user_map[value.clang_kind] = key
 default_rules_db["VariableName"] = Rule("VariableName", CursorKind.VAR_DECL)
 clang_to_user_map[CursorKind.FIELD_DECL] = "VariableName"
 clang_to_user_map[CursorKind.VAR_DECL] = "VariableName"
 class AstNodeStack(object):
    def __init__(self):
        self.stack = []
    def pop(self):
        return self.stack.pop()
    def push(self, kind):
        self.stack.append(kind)
    def peek(self):
        if len(self.stack) > 0:
            return self.stack[-1]
        return None
 class Options:
    def __init__(self):
        self.args = None
        self._style_file = None
        self.file_exclusions = None
        self._skip_file = None
        self.parser = argparse.ArgumentParser(
            prog="ncc.py",
            description="ncc is a development tool to help programmers "
            "write C/C++ code that adheres to some naming conventions. It automates the "
            "process of checking C code to spare humans of this boring "
            "(but important) task. This makes it ideal for projects that want "
            "to enforce a coding standard.")
        self.parser.add_argument('--recurse', action='store_true', dest="recurse",
                                 help="Read all files under each directory, recursively")
        self.parser.add_argument('--style', dest="style_file",
                                 help="Read rules from the specified file. If the user does not "
                                 "provide a style file, ncc will use all style rules. To print "
                                 "all style rules, use the --dump option")
        self.parser.add_argument('--include', dest='include', nargs="+", help="User defined "
                                 "header file path; this is the same as the -I argument to the compiler")
        self.parser.add_argument('--definition', dest='definition', nargs="+", help="User specified "
                                 "definitions; this is the same as the -D argument to the compiler")
        self.parser.add_argument('--dump', dest='dump', action='store_true',
                                 help="Dump all available options")
        self.parser.add_argument('--output', dest='output', help="Output file name where "
                                 "naming convention violations will be stored")
        self.parser.add_argument('--filetype', dest='filetype', help="File extension types "
                                 "that are applicable for naming convention validation")
        self.parser.add_argument('--clang-lib', dest='clang_lib',
                                 help="Custom location of clang library")
        self.parser.add_argument('--exclude', dest='exclude', nargs="+", help="Skip files "
                                 "matching the pattern specified from recursive searches. It "
                                 "matches a specified pattern according to the rules used by "
                                 "the Unix shell")
        self.parser.add_argument('--skip', '-s', dest="skip_file",
                                 help="Read list of items to ignore during the check. "
                                 "User can use the skip file to specify character sequences that should "
                                 "be ignored by ncc")
        # self.parser.add_argument('--exclude-dir', dest='exclude_dir', help="Skip the directories"
        #                          "matching the pattern specified")
        self.parser.add_argument('--path', dest='path', nargs="+",
                                 help="Path of file or directory")
    def parse_cmd_line(self):
        self.args = self.parser.parse_args()
        if self.args.dump:
            self.dump_all_rules()
        if self.args.style_file:
            self._style_file = self.args.style_file
            if not os.path.exists(self._style_file):
                sys.stderr.write("Style file '{}' not found!\n".format(self._style_file))
                sys.exit(1)
        if self.args.skip_file:
            self._skip_file = self.args.skip_file
            if not os.path.exists(self._skip_file):
                sys.stderr.write("Skip file '{}' not found!\n".format(self._skip_file))
                sys.exit(1)
    def dump_all_rules(self):
        print("----------------------------------------------------------")
        print("{:<35} | {}".format("Rule Name", "Pattern"))
        print("----------------------------------------------------------")
        for (key, value) in default_rules_db.items():
            print("{:<35} : {}".format(key, value.pattern_str))
 class SkipDb(object):
    def __init__(self, skip_file=None):
        self.__skip_db = {}
        if skip_file:
            self.build_skip_db(skip_file)
    def build_skip_db(self, skip_file):
        with open(skip_file) as stylefile:
            style_rules = yaml.safe_load(stylefile)
            for (skip_string, skip_comment) in style_rules.items():
                self.__skip_db[skip_string] = skip_comment
    def check_skip_db(self, input_query):
        if input_query in self.__skip_db.keys():
            return 1
        else:
            return 0
 class RulesDb(object):
    def __init__(self, style_file=None):
        self.__rule_db = {}
        self.__clang_db = {}
        if style_file:
            self.build_rules_db(style_file)
        else:
            self.__rule_db = default_rules_db
            self.__clang_db = clang_to_user_map
    def build_rules_db(self, style_file):
        with open(style_file) as stylefile:
            style_rules = yaml.safe_load(stylefile)
        for (rule_name, pattern_str) in style_rules.items():
            try:
                clang_kind = default_rules_db[rule_name].clang_kind
                if clang_kind:
                    if rule_name == "VariableName":
                        self.__rule_db[rule_name] = VariableNameRule(pattern_str)
                        self.__clang_db[CursorKind.FIELD_DECL] = rule_name
                        self.__clang_db[CursorKind.VAR_DECL] = rule_name
                    else:
                        self.__rule_db[rule_name] = default_rules_db[rule_name]
                        self.__rule_db[rule_name].pattern_str = pattern_str
                        self.__rule_db[rule_name].pattern = re.compile(pattern_str)
                        self.__clang_db[clang_kind] = rule_name
            except KeyError as e:
                # Exceptions have no .message attribute in Python 3; the missing key is args[0]
                missing = e.args[0]
                sys.stderr.write('{} is not a valid C/C++ construct name\n'.format(missing))
                fixit = difflib.get_close_matches(missing, default_rules_db.keys(),
                                                  n=1, cutoff=0.8)
                if fixit:
                    sys.stderr.write('Did you mean rule name: {} ?\n'.format(fixit[0]))
                sys.exit(1)
            except re.error as e:
                # str(e) carries the regex error text on both Python 2 and Python 3
                sys.stderr.write('"{}" pattern {} has {} \n'.
                                 format(rule_name, pattern_str, e))
                sys.exit(1)
    def is_rule_enabled(self, kind):
        if self.__clang_db.get(kind):
            return True
        return False
    def get_rule_names(self, kind):
        """
        Multiple user-defined rules can be configured against one Clang cursor kind,
        e.g. ClassMemberVariable and StructMemberVariable are both kinds of FIELD_DECL
        """
        return self.__clang_db.get(kind)
    def get_rule(self, rule_name):
        return self.__rule_db.get(rule_name)
 class Validator(object):
    def __init__(self, rule_db, filename, options, skip_db=None):
        self.filename = filename
        self.rule_db = rule_db
        self.skip_db = skip_db
        self.options = options
        self.node_stack = AstNodeStack()
        index = Index.create()
        args = []
        args.append('-x')
        args.append('c++')
        args.append('-D_GLIBCXX_USE_CXX11_ABI=0')
        if self.options.args.definition:
            for item in self.options.args.definition:
                definition = r'-D' + item
                args.append(definition)
        if self.options.args.include:
            for item in self.options.args.include:
                inc = r'-I' + item
                args.append(inc)
        self.cursor = index.parse(filename, args).cursor
    def validate(self):
        return self.check(self.cursor)
    def check(self, node):
        """
        Recursively visit all nodes of the AST and match them against the patterns provided by
        the user. Return the total number of errors caught in the file
        """
        errors = 0
        for child in node.get_children():
            if self.is_local(child, self.filename):
                # This is the case when typedef of struct is causing double reporting of error
                # TODO: Find a better way to handle it
                parent = self.node_stack.peek()
                if (parent and parent == CursorKind.TYPEDEF_DECL and
                        child.kind == CursorKind.STRUCT_DECL):
                    return 0
                errors += self.evaluate(child)
                # Members of structs, classes, and unions must be treated differently,
                # so the parent AST node kind is pushed onto the stack.
                # Once all its children are validated, pop it off the stack
                self.node_stack.push(child.kind)
                errors += self.check(child)
                self.node_stack.pop()
        return errors
    def evaluate(self, node):
        """
        Get the node's rule and match the pattern. Report an error if pattern
        matching fails
        """
        if not self.rule_db.is_rule_enabled(node.kind):
            return 0
        # If the pattern is in the skip list, ignore it
        if self.skip_db.check_skip_db(node.displayname):
            return 0
        rule_name = self.rule_db.get_rule_names(node.kind)
        rule = self.rule_db.get_rule(rule_name)
        if rule.evaluate(node, self.node_stack.peek()) is False:
            return 1
        return 0
    def is_local(self, node, filename):
        """ Returns True if the node belongs to the file being validated and not to an include file """
        if node.location.file and node.location.file.name in filename:
            return True
        return False
 def do_validate(options, filename):
    """
    Returns True if the file should be validated
    - Check that it is a C/C++ file
    - Check that the file is not excluded
    """
    path, extension = os.path.splitext(filename)
    if extension not in file_extensions:
        return False
    if options.args.exclude:
        for item in options.args.exclude:
            if fnmatch.fnmatch(filename, item):
                return False
    return True
 if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s',
                        filename='log.txt', filemode='w')
    """ Parse all command line arguments and validate """
    op = Options()
    op.parse_cmd_line()
    if op.args.path is None:
        sys.exit(1)
    if op.args.clang_lib:
        Config.set_library_file(op.args.clang_lib)
    """ Creating the rules database """
    rules_db = RulesDb(op._style_file)
    """ Creating the skip database """
    skip_db = SkipDb(op._skip_file)
    """ Check the source code against the configured rules """
    errors = 0
    for path in op.args.path:
        if os.path.isfile(path):
            if do_validate(op, path):
                v = Validator(rules_db, path, op, skip_db)
                errors += v.validate()
        elif os.path.isdir(path):
            for (root, subdirs, files) in os.walk(path):
                for filename in files:
                    path = root + '/' + filename
                    if do_validate(op, path):
                        v = Validator(rules_db, path, op, skip_db)
                        errors += v.validate()
                if not op.args.recurse:
                    break
        else:
            sys.stderr.write("File '{}' not found!\n".format(path))
            sys.exit(1)
    if errors:
        print("Total number of errors = {}".format(errors))
        sys.exit(1)
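The file filter in `do_validate` above combines an extension check with shell-style exclusion patterns. A minimal standalone sketch of that idea follows. The `FILE_EXTENSIONS` set and the `should_validate` name are hypothetical stand-ins for illustration, since the real `file_extensions` value is defined outside this excerpt.

```python
import fnmatch
import os

# Assumption: a plausible set of C/C++ extensions; the real list is not shown here.
FILE_EXTENSIONS = {".c", ".cpp", ".h", ".hpp"}


def should_validate(filename, exclude_patterns=()):
    """Return True if the file has a C/C++ extension and matches no exclusion pattern."""
    _, extension = os.path.splitext(filename)
    if extension not in FILE_EXTENSIONS:
        return False
    # fnmatch applies Unix shell wildcard rules; '*' also matches path separators.
    return not any(fnmatch.fnmatch(filename, pattern) for pattern in exclude_patterns)


if __name__ == "__main__":
    for name in ("LEGO1/legoworld.cpp", "README.md", "3rdparty/lib/x.cpp"):
        print(name, should_validate(name, ["3rdparty/*"]))
```

Because `fnmatch` lets `*` cross directory separators, a single pattern such as `3rdparty/*` excludes the whole subtree.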
--- a/tools/ncc/ncc.style
+++ b/tools/ncc/ncc.style
@ -1,21 +0,0 @@
 ClassName: '^[A-Z][a-zA-Z0-9]+$'
 CppMethod: '^operator|^FUN_[a-f0-9]{8}$|^VTable0x[a-f0-9]{1,8}$|^(?!VTable)[A-Z][a-zA-Z0-9]+$'
 EnumName: '^\(unnamed|^[A-Z][a-zA-Z0-9]+$'
 EnumConstantName: '^(c_|e_)[a-z][a-zA-Z0-9]+$'
 FunctionName: '^operator|^FUN_[a-f0-9]{8}$|^VTable0x[a-f0-9]{1,8}$|^(?!VTable)[A-Z][a-zA-Z0-9]+$'
 ParameterName: '^p_(unk0x[a-f0-9]{1,8}$|(?!unk)[a-z][a-zA-Z0-9]*)$|^$'
 StructName: '^\(anon|^\(unnamed|^[A-Z][a-zA-Z0-9]+$'
 TypedefName: '^[A-Z][a-zA-Z0-9]+$'
 UnionName: '^\(anon|^[A-Z][a-zA-Z0-9]+$'
 VariableName:
    ScopePrefix:
        Global: 'g_'
        Static: 'g_'
        ClassMember: 'm_'
        StructMember: 'm_'
    DataTypePrefix:
        String: ''
        Integer: ''
        Bool: ''
        Pointer: ''
    Pattern: '^(unk0x[a-f0-9]{1,8}$|(?!unk)[a-z][a-zA-Z0-9]*|str[a-zA-Z0-9_]*)$'
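To see what the style patterns above accept and reject, they can be tried directly with Python's `re` module. This is only a sketch: it assumes ncc applies each pattern with `re.match` against the identifier's spelling (the `Rule.evaluate` implementation is not part of this excerpt), and the `conforms` helper is a hypothetical name.

```python
import re

# Patterns copied verbatim from ncc.style above.
PATTERNS = {
    "FunctionName": r"^operator|^FUN_[a-f0-9]{8}$|^VTable0x[a-f0-9]{1,8}$|^(?!VTable)[A-Z][a-zA-Z0-9]+$",
    "ParameterName": r"^p_(unk0x[a-f0-9]{1,8}$|(?!unk)[a-z][a-zA-Z0-9]*)$|^$",
    "EnumConstantName": r"^(c_|e_)[a-z][a-zA-Z0-9]+$",
}


def conforms(rule_name, identifier):
    """Return True if the identifier matches the pattern for the given rule."""
    return re.match(PATTERNS[rule_name], identifier) is not None


if __name__ == "__main__":
    samples = [
        ("FunctionName", "FUN_100a1b2c"),
        ("FunctionName", "VTableFoo"),
        ("ParameterName", "p_unk0x1c"),
        ("ParameterName", "p_unknown"),
        ("EnumConstantName", "c_LOCATIONS_NUM"),
    ]
    for rule, name in samples:
        print(rule, name, conforms(rule, name))
```

The negative lookaheads `(?!VTable)` and `(?!unk)` reserve those prefixes for placeholder names that carry a hexadecimal address or offset, so an ordinary name cannot start with them.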
--- a/tools/ncc/skip.yml
+++ b/tools/ncc/skip.yml
@ -1,32 +0,0 @@
 configureLegoAnimationManager(MxS32): 'DLL exported function'
 configureLegoBuildingManager(MxS32): 'DLL exported function'
 configureLegoModelPresenter(MxS32): 'DLL exported function'
 configureLegoPartPresenter(MxS32, MxS32): 'DLL exported function'
 configureLegoROI(int): 'DLL exported function'
 configureLegoWorldPresenter(MxS32): 'DLL exported function'
 GetNoCD_SourceName(): 'DLL exported function'
 m_3dView: 'Allow this variable name'
 m_3dManager: 'Allow this variable name'
 m_16bitPal: 'Allow this variable name'
 m_HWDesc: 'Allow this variable name'
 m_HELDesc: 'Allow this variable name'
 p_HWDesc: 'Allow this variable name'
 p_HELDesc: 'Allow this variable name'
 e_RAMStream: 'Allow this enum constant'
 p_milliseconds: 'Probably a bug with function call'
 m_increaseAmount: "Can't currently detect member in union"
 m_increaseFactor: "Can't currently detect member in union"
 delta_rad: "Allow original naming from 1996"
 delta_pos: "Allow original naming from 1996"
 rot_mat: "Allow original naming from 1996"
 new_pos: "Allow original naming from 1996"
 new_dir: "Allow original naming from 1996"
 p_AnimTreePtr: "Allow original naming from beta"
 m_AnimTreePtr: "Allow original naming from beta"
 m_BADuration: "Allow original naming from beta"
 m_assAnimP: "Allow original naming from beta"
 m_disAnimP: "Allow original naming from beta"
 i_activity: "Allow original naming from beta"
 i_actor: "Allow original naming from beta"
 score: "Allow original naming from beta"
 c_LOCATIONS_NUM: "Allow original naming from beta"
--- a/tools/patch_c2.py
+++ b/tools/patch_c2.py
@ -1,67 +0,0 @@
 #!/usr/bin/env python
 import argparse
 import hashlib
 import pathlib
 import shutil
 ORIGINAL_C2_MD5 = "dcd69f1dd28b02dd03dd7ed02984299a"  # original C2.EXE
 C2_MD5 = (
    ORIGINAL_C2_MD5,
    "e70acde41802ddec06c4263bb357ac30",  # patched C2.EXE
 )
 C2_SIZE = 549888
 def main():
    parser = argparse.ArgumentParser(
        allow_abbrev=False,
        description="Patch C2.EXE of Microsoft Visual Studio 4.2.0 to disable C4786 warning",
    )
    parser.add_argument("path", type=pathlib.Path, help="Path of C2.EXE")
    parser.add_argument(
        "-f", dest="force", default=False, action="store_true", help="force"
    )
    args = parser.parse_args()
    if not args.path.is_file():
        parser.error("Input is not a file")
    binary = bytearray(args.path.read_bytes())
    md5 = hashlib.md5(binary).hexdigest()
    print(md5, C2_MD5)
    msg_cb = parser.error if not args.force else print
    if len(binary) != C2_SIZE:
        msg_cb("file size is not correct")
    if md5 not in C2_MD5:
        msg_cb("md5 checksum does not match")
    if md5 == ORIGINAL_C2_MD5:
        backup = f"{args.path}.BAK"
        print(f'Creating backup "{backup}"')
        shutil.copyfile(args.path, backup)
    def nop_patch(start, count, expected=None):
        replacement = [0x90] * count
        if expected:
            current = list(binary[start : start + count])
            assert len(expected) == count
            assert current in (expected, replacement)
        print(f"Nopping {count} bytes at 0x{start:08x}")
        binary[start : start + count] = replacement
    print(
        "Disable C4786 warning: '%Fs' : identifier was truncated to '%d' characters in the debug information"
    )
    nop_patch(0x52F07, 5, [0xE8, 0x4F, 0xB3, 0xFE, 0xFF])  # 0x00453b07
    nop_patch(0x74832, 5, [0xE8, 0x24, 0x9A, 0xFC, 0xFF])  # 0x00475432
    args.path.write_bytes(binary)
    print("done")
 if __name__ == "__main__":
    raise SystemExit(main())
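The NOP-patching step in the script above can be tried on a small in-memory buffer instead of a real executable. The sketch below is a standalone variant under stated assumptions: the 7-byte `image` is made up for illustration, and the function raises `ValueError` rather than using `assert`, which is a deliberate deviation from the original.

```python
NOP = 0x90  # x86 single-byte no-op instruction


def nop_patch(binary, start, count, expected=None):
    """Overwrite `count` bytes at `start` with NOPs, optionally verifying the old bytes."""
    replacement = bytes([NOP]) * count
    if expected is not None:
        current = bytes(binary[start:start + count])
        # Accept either the original bytes or an already-patched region,
        # so that running the patch twice is harmless.
        if current not in (bytes(expected), replacement):
            raise ValueError("unexpected bytes at 0x{:08x}".format(start))
    binary[start:start + count] = replacement
    return binary


if __name__ == "__main__":
    # Hypothetical buffer: push ebp; call rel32; ret
    image = bytearray([0x55, 0xE8, 0x4F, 0xB3, 0xFE, 0xFF, 0xC3])
    nop_patch(image, 1, 5, [0xE8, 0x4F, 0xB3, 0xFE, 0xFF])
    print(image.hex())
```

Checking the expected bytes before writing guards against patching a different build of the binary at the same offsets.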
--- a/tools/reccmp/config.png
+++ b/tools/reccmp/config.png
--- a/tools/reccmp/isle.png
+++ b/tools/reccmp/isle.png
--- a/tools/reccmp/lego1.png
+++ b/tools/reccmp/lego1.png
--- a/tools/reccmp/reccmp.js
+++ b/tools/reccmp/reccmp.js
@ -1,867 +0,0 @@
 // reccmp.js
 /* global data */
 // Unwrap array of functions into a dictionary with address as the key.
 const dataDict = Object.fromEntries(data.map(row => [row.address, row]));
 function getDataByAddr(addr) {
  return dataDict[addr];
 }
 //
 // Pure functions
 //
 function formatAsm(entries, addrOption) {
  const output = [];
  const createTh = (text) => {
    const th = document.createElement('th');
    th.innerText = text;
    return th;
  };
  const createTd = (text, className = '') => {
    const td = document.createElement('td');
    td.innerText = text;
    td.className = className;
    return td;
  };
  entries.forEach(obj => {
    // These won't all be present. You get "both" for an equal node
    // and orig/recomp for a diff.
    const { both = [], orig = [], recomp = [] } = obj;
    output.push(...both.map(([addr, line, recompAddr]) => {
      const tr = document.createElement('tr');
      tr.appendChild(createTh(addr));
      tr.appendChild(createTh(recompAddr));
      tr.appendChild(createTd(line));
      return tr;
    }));
    output.push(...orig.map(([addr, line]) => {
      const tr = document.createElement('tr');
      tr.appendChild(createTh(addr));
      tr.appendChild(createTh(''));
      tr.appendChild(createTd(`-${line}`, 'diffneg'));
      return tr;
    }));
    output.push(...recomp.map(([addr, line]) => {
      const tr = document.createElement('tr');
      tr.appendChild(createTh(''));
      tr.appendChild(createTh(addr));
      tr.appendChild(createTd(`+${line}`, 'diffpos'));
      return tr;
    }));
  });
  return output;
 }
 // Special internal values to ensure this sort order for matching column:
 // 1. Stub
 // 2. Any match percentage [0.0, 1.0)
 // 3. Effective match
 // 4. Actual 100% match
 function matchingColAdjustment(row) {
  if ('stub' in row) {
    return -1;
  }
  if ('effective' in row) {
    return 1.0;
  }
  if (row.matching === 1.0) {
    return 1000;
  }
  return row.matching;
 }
 function getCppClass(str) {
  const idx = str.indexOf('::');
  if (idx !== -1) {
    return str.slice(0, idx);
  }
  return str;
 }
 // Clamp string length to specified length and pad with ellipsis
 function stringTruncate(str, maxlen = 20) {
  str = getCppClass(str);
  if (str.length > maxlen) {
    return `${str.slice(0, maxlen)}...`;
  }
  return str;
 }
 function getMatchPercentText(row) {
  if ('stub' in row) {
    return 'stub';
  }
  if ('effective' in row) {
    return '100.00%*';
  }
  return (row.matching * 100).toFixed(2) + '%';
 }
 function countDiffs(row) {
  const { diff = '' } = row;
  if (diff === '') {
    return '';
  }
  const diffs = diff.map(([slug, subgroups]) => subgroups).flat();
  const diffLength = diffs.filter(d => !('both' in d)).length;
  const diffWord = diffLength === 1 ? 'diff' : 'diffs';
  return diffLength === 0 ? '' : `${diffLength} ${diffWord}`;
 }
 // Helper for this set/remove attribute block
 function setBooleanAttribute(element, attribute, value) {
  if (value) {
    element.setAttribute(attribute, '');
  } else {
    element.removeAttribute(attribute);
  }
 }
 function copyToClipboard(value) {
  navigator.clipboard.writeText(value);
 }
 const PAGE_SIZE = 200;
 //
 // Global state
 //
 class ListingState {
  constructor() {
    this._query = '';
    this._sortCol = 'address';
    this._filterType = 1;
    this._sortDesc = false;
    this._hidePerfect = false;
    this._hideStub = false;
    this._showRecomp = false;
    this._expanded = {};
    this._page = 0;
    this._listeners = [];
    this._results = [];
    this.updateResults();
  }
  addListener(fn) {
    this._listeners.push(fn);
  }
  callListeners() {
    for (const fn of this._listeners) {
      fn();
    }
  }
  isExpanded(addr) {
    return addr in this._expanded;
  }
  toggleExpanded(addr) {
    this.setExpanded(addr, !this.isExpanded(addr));
  }
  setExpanded(addr, value) {
    if (value) {
      this._expanded[addr] = true;
    } else {
      delete this._expanded[addr];
    }
  }
  updateResults() {
    const filterFn = this.rowFilterFn.bind(this);
    const sortFn = this.rowSortFn.bind(this);
    this._results = data.filter(filterFn).sort(sortFn);
    // Set _page directly to avoid double call to listeners.
    this._page = this.pageClamp(this.page);
    this.callListeners();
  }
  pageSlice() {
    return this._results.slice(this.page * PAGE_SIZE, (this.page + 1) * PAGE_SIZE);
  }
  resultsCount() {
    return this._results.length;
  }
  pageCount() {
    return Math.ceil(this._results.length / PAGE_SIZE);
  }
  maxPage() {
    return Math.max(0, this.pageCount() - 1);
  }
  // A list showing the range of each page based on the sort column and direction.
  pageHeadings() {
    if (this._results.length === 0) {
      return [];
    }
    const headings = [];
    for (let i = 0; i < this.pageCount(); i++) {
      const startIdx = i * PAGE_SIZE;
      const endIdx = Math.min(this._results.length, ((i + 1) * PAGE_SIZE)) - 1;
      let start = this._results[startIdx][this.sortCol];
      let end = this._results[endIdx][this.sortCol];
      if (this.sortCol === 'matching') {
        start = getMatchPercentText(this._results[startIdx]);
        end = getMatchPercentText(this._results[endIdx]);
      }
      headings.push([i, stringTruncate(start), stringTruncate(end)]);
    }
    return headings;
  }
  rowFilterFn(row) {
    // Destructuring sets defaults for optional values from this object.
    const {
      effective = false,
      stub = false,
      diff = '',
      name,
      address,
      matching
    } = row;
    if (this.hidePerfect && (effective || matching >= 1)) {
      return false;
    }
    if (this.hideStub && stub) {
      return false;
    }
    if (this.query === '') {
      return true;
    }
    // Name/addr search
    if (this.filterType === 1) {
      return (
        address.includes(this.query) ||
        name.toLowerCase().includes(this.query)
      );
    }
    // no diff for review.
    if (diff === '') {
      return false;
    }
    // special matcher for combined diff
    const anyLineMatch = ([addr, line]) => line.toLowerCase().trim().includes(this.query);
    // Flatten all diff groups for the search
    const diffs = diff.map(([slug, subgroups]) => subgroups).flat();
    for (const subgroup of diffs) {
      const { both = [], orig = [], recomp = [] } = subgroup;
      // If search includes context
      if (this.filterType === 2 && both.some(anyLineMatch)) {
        return true;
      }
      if (orig.some(anyLineMatch) || recomp.some(anyLineMatch)) {
        return true;
      }
    }
    return false;
  }
  rowSortFn(rowA, rowB) {
    const valA = this.sortCol === 'matching'
      ? matchingColAdjustment(rowA)
      : rowA[this.sortCol];
    const valB = this.sortCol === 'matching'
      ? matchingColAdjustment(rowB)
      : rowB[this.sortCol];
    if (valA > valB) {
      return this.sortDesc ? -1 : 1;
    } else if (valA < valB) {
      return this.sortDesc ? 1 : -1;
    }
    return 0;
  }
  pageClamp(page) {
    return Math.max(0, Math.min(page, this.maxPage()));
  }
  get page() {
    return this._page;
  }
  set page(page) {
    this._page = this.pageClamp(page);
    this.callListeners();
  }
  get filterType() {
    return parseInt(this._filterType);
  }
  set filterType(value) {
    value = parseInt(value);
    if (value >= 1 && value <= 3) {
      this._filterType = value;
    }
    this.updateResults();
  }
  get query() {
    return this._query;
  }
  set query(value) {
    // Normalize search string
    this._query = value.toLowerCase().trim();
    this.updateResults();
  }
  get showRecomp() {
    return this._showRecomp;
  }
  set showRecomp(value) {
    // Don't sort by the recomp column we are about to hide
    if (!value && this.sortCol === 'recomp') {
      this._sortCol = 'address';
    }
    this._showRecomp = value;
    this.callListeners();
  }
  get sortCol() {
    return this._sortCol;
  }
  set sortCol(column) {
    if (column === this._sortCol) {
      this._sortDesc = !this._sortDesc;
    } else {
      this._sortCol = column;
    }
    this.updateResults();
  }
  get sortDesc() {
    return this._sortDesc;
  }
  set sortDesc(value) {
    this._sortDesc = value;
    this.updateResults();
  }
  get hidePerfect() {
    return this._hidePerfect;
  }
  set hidePerfect(value) {
    this._hidePerfect = value;
    this.updateResults();
  }
  get hideStub() {
    return this._hideStub;
  }
  set hideStub(value) {
    this._hideStub = value;
    this.updateResults();
  }
 }
 const appState = new ListingState();
 //
 // Custom elements
 //
 // Sets sort indicator arrow based on element attributes.
 class SortIndicator extends window.HTMLElement {
  static observedAttributes = ['data-sort'];
  attributeChangedCallback(name, oldValue, newValue) {
    if (newValue === null) {
      // Reserve space for blank indicator so column width stays the same
      this.innerHTML = '&nbsp;';
    } else {
      this.innerHTML = newValue === 'asc' ? '&#9650;' : '&#9660;';
    }
  }
 }
 class FuncRow extends window.HTMLElement {
  connectedCallback() {
    if (this.shadowRoot !== null) {
      return;
    }
    const template = document.querySelector('template#funcrow-template').content;
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.appendChild(template.cloneNode(true));
    shadow.querySelector(':host > div[data-col="name"]').addEventListener('click', evt => {
      this.dispatchEvent(new Event('name-click'));
    });
  }
  get address() {
    return this.getAttribute('data-address');
  }
 }
 class NoDiffMessage extends window.HTMLElement {
  connectedCallback() {
    if (this.shadowRoot !== null) {
      return;
    }
    const template = document.querySelector('template#nodiff-template').content;
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.appendChild(template.cloneNode(true));
  }
 }
 class CanCopy extends window.HTMLElement {
  connectedCallback() {
    if (this.shadowRoot !== null) {
      return;
    }
    const template = document.querySelector('template#can-copy-template').content;
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.appendChild(template.cloneNode(true));
    const el = shadow.querySelector('slot').assignedNodes()[0];
    el.addEventListener('mouseout', evt => { this.copied = false; });
    el.addEventListener('click', evt => {
      copyToClipboard(evt.target.textContent);
      this.copied = true;
    });
  }
  get copied() {
    return this.getAttribute('copied');
  }
  set copied(value) {
    if (value) {
      setTimeout(() => { this.copied = false; }, 2000);
    }
    setBooleanAttribute(this, 'copied', value);
  }
 }
 // Displays asm diff for the given @data-address value.
 class DiffRow extends window.HTMLElement {
  connectedCallback() {
    if (this.shadowRoot !== null) {
      return;
    }
    const template = document.querySelector('template#diffrow-template').content;
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.appendChild(template.cloneNode(true));
  }
  get address() {
    return this.getAttribute('data-address');
  }
  set address(value) {
    this.setAttribute('data-address', value);
  }
 }
 class DiffDisplayOptions extends window.HTMLElement {
  static observedAttributes = ['data-option'];
  connectedCallback() {
    if (this.shadowRoot !== null) {
      return;
    }
    const shadow = this.attachShadow({ mode: 'open' });
    shadow.innerHTML = `
      <style>
        fieldset {
          align-items: center;
          display: flex;
          margin-bottom: 20px;
        }
        label {
          margin-right: 10px;
          user-select: none;
        }
        label, input {
          cursor: pointer;
        }
      </style>
      <fieldset>
        <legend>Address display:</legend>
        <input type="radio" id="showNone" name="addrDisplay" value=0>
        <label for="showNone">None</label>
        <input type="radio" id="showOrig" name="addrDisplay" value=1>
        <label for="showOrig">Original</label>
        <input type="radio" id="showBoth" name="addrDisplay" value=2>
        <label for="showBoth">Both</label>
      </fieldset>`;
    shadow.querySelectorAll('input[type=radio]').forEach(radio => {
      const checked = this.option === radio.getAttribute('value');
      setBooleanAttribute(radio, 'checked', checked);
      radio.addEventListener('change', evt => (this.option = evt.target.value));
    });
  }
  set option(value) {
    this.setAttribute('data-option', parseInt(value));
  }
  get option() {
    return this.getAttribute('data-option') ?? '1';
  }
  attributeChangedCallback(name, oldValue, newValue) {
    if (name !== 'data-option') {
      return;
    }
    this.dispatchEvent(new Event('change'));
  }
 }
 class DiffDisplay extends window.HTMLElement {
  static observedAttributes = ['data-option'];
  connectedCallback() {
    if (this.querySelector('diff-display-options') !== null) {
      return;
    }
    const optControl = new DiffDisplayOptions();
    optControl.option = this.option;
    optControl.addEventListener('change', evt => (this.option = evt.target.option));
    this.appendChild(optControl);
    const div = document.createElement('div');
    const obj = getDataByAddr(this.address);
    const createHeaderLine = (text, className) => {
      const div = document.createElement('div');
      div.textContent = text;
      div.className = className;
      return div;
    };
    const groups = obj.diff;
    groups.forEach(([slug, subgroups]) => {
      const secondTable = document.createElement('table');
      secondTable.classList.add('diffTable');
      const hdr = document.createElement('div');
      hdr.appendChild(createHeaderLine('---', 'diffneg'));
      hdr.appendChild(createHeaderLine('+++', 'diffpos'));
      hdr.appendChild(createHeaderLine(slug, 'diffslug'));
      div.appendChild(hdr);
      const tbody = document.createElement('tbody');
      secondTable.appendChild(tbody);
      const diffs = formatAsm(subgroups, this.option);
      for (const el of diffs) {
        tbody.appendChild(el);
      }
      div.appendChild(secondTable);
    });
    this.appendChild(div);
  }
  get address() {
    return this.getAttribute('data-address');
  }
  set address(value) {
    this.setAttribute('data-address', value);
  }
  get option() {
    return this.getAttribute('data-option') ?? '1';
  }
  set option(value) {
    this.setAttribute('data-option', value);
  }
 }
 class ListingOptions extends window.HTMLElement {
  constructor() {
    super();
    // Register to receive updates
    appState.addListener(() => this.onUpdate());
    const input = this.querySelector('input[type=search]');
    input.oninput = evt => (appState.query = evt.target.value);
    const hidePerf = this.querySelector('input#cbHidePerfect');
    hidePerf.onchange = evt => (appState.hidePerfect = evt.target.checked);
    hidePerf.checked = appState.hidePerfect;
    const hideStub = this.querySelector('input#cbHideStub');
    hideStub.onchange = evt => (appState.hideStub = evt.target.checked);
    hideStub.checked = appState.hideStub;
    const showRecomp = this.querySelector('input#cbShowRecomp');
    showRecomp.onchange = evt => (appState.showRecomp = evt.target.checked);
    showRecomp.checked = appState.showRecomp;
    this.querySelector('button#pagePrev').addEventListener('click', evt => {
      appState.page = appState.page - 1;
    });
    this.querySelector('button#pageNext').addEventListener('click', evt => {
      appState.page = appState.page + 1;
    });
    this.querySelector('select#pageSelect').addEventListener('change', evt => {
      appState.page = evt.target.value;
    });
    this.querySelectorAll('input[name=filterType]').forEach(radio => {
      const checked = appState.filterType === parseInt(radio.getAttribute('value'));
      setBooleanAttribute(radio, 'checked', checked);
      radio.onchange = evt => (appState.filterType = radio.getAttribute('value'));
    });
    this.onUpdate();
  }
  onUpdate() {
    // Update input placeholder based on search type
    this.querySelector('input[type=search]').placeholder = appState.filterType === 1
      ? 'Search for offset or function name...'
      : 'Search for instruction...';
    // Update page number and max page
    this.querySelector('fieldset#pageDisplay > legend').textContent = `Page ${appState.page + 1} of ${Math.max(1, appState.pageCount())}`;
    // Disable prev/next buttons on first/last page
    setBooleanAttribute(this.querySelector('button#pagePrev'), 'disabled', appState.page === 0);
    setBooleanAttribute(this.querySelector('button#pageNext'), 'disabled', appState.page === appState.maxPage());
    // Update page select dropdown
    const pageSelect = this.querySelector('select#pageSelect');
    setBooleanAttribute(pageSelect, 'disabled', appState.resultsCount() === 0);
    pageSelect.innerHTML = '';
    if (appState.resultsCount() === 0) {
      const opt = document.createElement('option');
      opt.textContent = '- no results -';
      pageSelect.appendChild(opt);
    } else {
      for (const row of appState.pageHeadings()) {
        const opt = document.createElement('option');
        opt.value = row[0];
        if (appState.page === row[0]) {
          opt.setAttribute('selected', '');
        }
        const [start, end] = [row[1], row[2]];
        opt.textContent = `${appState.sortCol}: ${start} to ${end}`;
        pageSelect.appendChild(opt);
      }
    }
    // Update row count
    this.querySelector('#rowcount').textContent = `${appState.resultsCount()}`;
  }
 }
 // Main application.
 class ListingTable extends window.HTMLElement {
  constructor() {
    super();
    // Register to receive updates
    appState.addListener(() => this.somethingChanged());
  }
  setDiffRow(address, shouldExpand) {
    const tbody = this.querySelector('tbody');
    const funcrow = tbody.querySelector(`func-row[data-address="${address}"]`);
    if (funcrow === null) {
      return;
    }
    const existing = tbody.querySelector(`diff-row[data-address="${address}"]`);
    if (existing !== null) {
      if (!shouldExpand) {
        tbody.removeChild(existing);
      }
      return;
    }
    const diffrow = document.createElement('diff-row');
    diffrow.address = address;
    // Decide what goes inside the diff row.
    const obj = getDataByAddr(address);
    if ('stub' in obj) {
      const msg = document.createElement('no-diff');
      const p = document.createElement('div');
      p.innerText = 'Stub. No diff.';
      msg.appendChild(p);
      diffrow.appendChild(msg);
    } else if (obj.diff.length === 0) {
      const msg = document.createElement('no-diff');
      const p = document.createElement('div');
      p.innerText = 'Identical function - no diff';
      msg.appendChild(p);
      diffrow.appendChild(msg);
    } else {
      const dd = new DiffDisplay();
      dd.option = '1';
      dd.address = address;
      diffrow.appendChild(dd);
    }
    // Insert the diff row after the parent func row.
    tbody.insertBefore(diffrow, funcrow.nextSibling);
  }
  connectedCallback() {
    const thead = this.querySelector('thead');
    const headers = thead.querySelectorAll('th:not([data-no-sort])'); // TODO
    headers.forEach(th => {
      const col = th.getAttribute('data-col');
      if (col) {
        const span = th.querySelector('span');
        if (span) {
          span.addEventListener('click', evt => { appState.sortCol = col; });
        }
      }
    });
    this.somethingChanged();
  }
  somethingChanged() {
    // Toggle recomp/diffs column
    setBooleanAttribute(this.querySelector('table'), 'show-recomp', appState.showRecomp);
    this.querySelectorAll('func-row[data-address]').forEach(row => {
      setBooleanAttribute(row, 'show-recomp', appState.showRecomp);
    });
    const thead = this.querySelector('thead');
    const headers = thead.querySelectorAll('th');
    // Update sort indicator
    headers.forEach(th => {
      const col = th.getAttribute('data-col');
      const indicator = th.querySelector('sort-indicator');
      if (indicator === null) {
        return;
      }
      if (appState.sortCol === col) {
        indicator.setAttribute('data-sort', appState.sortDesc ? 'desc' : 'asc');
      } else {
        indicator.removeAttribute('data-sort');
      }
    });
    // Add the rows
    const tbody = this.querySelector('tbody');
    tbody.innerHTML = ''; // ?
    for (const obj of appState.pageSlice()) {
      const row = document.createElement('func-row');
      row.setAttribute('data-address', obj.address); // ?
      row.addEventListener('name-click', evt => {
        appState.toggleExpanded(obj.address);
        this.setDiffRow(obj.address, appState.isExpanded(obj.address));
      });
      setBooleanAttribute(row, 'show-recomp', appState.showRecomp);
      setBooleanAttribute(row, 'expanded', appState.isExpanded(obj.address));
      const items = [
        ['address', obj.address],
        ['recomp', obj.recomp],
        ['name', obj.name],
        ['diffs', countDiffs(obj)],
        ['matching', getMatchPercentText(obj)]
      ];
      items.forEach(([slotName, content]) => {
        const div = document.createElement('span');
        div.setAttribute('slot', slotName);
        div.innerText = content;
        row.appendChild(div);
      });
      tbody.appendChild(row);
      if (appState.isExpanded(obj.address)) {
        this.setDiffRow(obj.address, true);
      }
    }
  }
 }
 window.onload = () => {
  window.customElements.define('listing-table', ListingTable);
  window.customElements.define('listing-options', ListingOptions);
  window.customElements.define('diff-display', DiffDisplay);
  window.customElements.define('diff-display-options', DiffDisplayOptions);
  window.customElements.define('sort-indicator', SortIndicator);
  window.customElements.define('func-row', FuncRow);
  window.customElements.define('diff-row', DiffRow);
  window.customElements.define('no-diff', NoDiffMessage);
  window.customElements.define('can-copy', CanCopy);
 };
--- a/tools/reccmp/reccmp.py
+++ b/tools/reccmp/reccmp.py
@ -1,344 +0,0 @@
 #!/usr/bin/env python3
 import argparse
 import base64
 import json
 import logging
 import os
 from datetime import datetime
 from isledecomp import (
    Bin,
    get_file_in_script_dir,
    print_combined_diff,
    diff_json,
    percent_string,
 )
 from isledecomp.compare import Compare as IsleCompare
 from isledecomp.types import SymbolType
 from pystache import Renderer
 import colorama
 colorama.just_fix_windows_console()
 def gen_json(json_file: str, orig_file: str, data):
    """Create a JSON file that contains the comparison summary"""
    # If the structure of the JSON file ever changes, we would run into a problem
    # reading an older format file in the CI action. Mark which version we are
    # generating so we could potentially address this down the road.
    json_format_version = 1
    # Remove the diff field
    reduced_data = [
        {key: value for (key, value) in obj.items() if key != "diff"} for obj in data
    ]
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump(
            {
                "file": os.path.basename(orig_file).lower(),
                "format": json_format_version,
                "timestamp": datetime.now().timestamp(),
                "data": reduced_data,
            },
            f,
        )
 def gen_html(html_file, data):
    js_path = get_file_in_script_dir("reccmp.js")
    with open(js_path, "r", encoding="utf-8") as f:
        reccmp_js = f.read()
    output_data = Renderer().render_path(
        get_file_in_script_dir("template.html"), {"data": data, "reccmp_js": reccmp_js}
    )
    with open(html_file, "w", encoding="utf-8") as htmlfile:
        htmlfile.write(output_data)
 def gen_svg(svg_file, name_svg, icon, svg_implemented_funcs, total_funcs, raw_accuracy):
    icon_data = None
    if icon:
        with open(icon, "rb") as iconfile:
            icon_data = base64.b64encode(iconfile.read()).decode("utf-8")
    total_statistic = raw_accuracy / total_funcs
    full_percentbar_width = 127.18422
    output_data = Renderer().render_path(
        get_file_in_script_dir("template.svg"),
        {
            "name": name_svg,
            "icon": icon_data,
            "implemented": f"{(svg_implemented_funcs / total_funcs * 100):.2f}% ({svg_implemented_funcs}/{total_funcs})",
            "accuracy": f"{(raw_accuracy / svg_implemented_funcs * 100):.2f}%",
            "progbar": total_statistic * full_percentbar_width,
            "percent": f"{(total_statistic * 100):.2f}%",
        },
    )
    with open(svg_file, "w", encoding="utf-8") as svgfile:
        svgfile.write(output_data)
 def print_match_verbose(match, show_both_addrs: bool = False, is_plain: bool = False):
    percenttext = percent_string(
        match.effective_ratio, match.is_effective_match, is_plain
    )
    if show_both_addrs:
        addrs = f"0x{match.orig_addr:x} / 0x{match.recomp_addr:x}"
    else:
        addrs = hex(match.orig_addr)
    if match.is_stub:
        print(f"{addrs}: {match.name} is a stub. No diff.")
        return
    if match.effective_ratio == 1.0:
        ok_text = (
            "OK!"
            if is_plain
            else (colorama.Fore.GREEN + "✨ OK! ✨" + colorama.Style.RESET_ALL)
        )
        if match.ratio == 1.0:
            print(f"{addrs}: {match.name} 100% match.\n\n{ok_text}\n\n")
        else:
            print(
                f"{addrs}: {match.name} Effective 100% match. (Differs in register allocation only)\n\n{ok_text} (still differs in register allocation)\n\n"
            )
    else:
        print_combined_diff(match.udiff, is_plain, show_both_addrs)
        print(
            f"\n{match.name} is only {percenttext} similar to the original, diff above"
        )
 def print_match_oneline(match, show_both_addrs: bool = False, is_plain: bool = False):
    percenttext = percent_string(
        match.effective_ratio, match.is_effective_match, is_plain
    )
    if show_both_addrs:
        addrs = f"0x{match.orig_addr:x} / 0x{match.recomp_addr:x}"
    else:
        addrs = hex(match.orig_addr)
    if match.is_stub:
        print(f"  {match.name} ({addrs}) is a stub.")
    else:
        print(f"  {match.name} ({addrs}) is {percenttext} similar to the original")
 def parse_args() -> argparse.Namespace:
    def virtual_address(value) -> int:
        """Helper method for argparse, verbose parameter"""
        return int(value, 16)
    parser = argparse.ArgumentParser(
        allow_abbrev=False,
        description="Recompilation Compare: compare an original EXE with a recompiled EXE + PDB.",
    )
    parser.add_argument(
        "original", metavar="original-binary", help="The original binary"
    )
    parser.add_argument(
        "recompiled", metavar="recompiled-binary", help="The recompiled binary"
    )
    parser.add_argument(
        "pdb", metavar="recompiled-pdb", help="The PDB of the recompiled binary"
    )
    parser.add_argument(
        "decomp_dir", metavar="decomp-dir", help="The decompiled source tree"
    )
    parser.add_argument(
        "--total",
        "-T",
        metavar="<count>",
        help="Total number of expected functions (improves total accuracy statistic)",
    )
    parser.add_argument(
        "--verbose",
        "-v",
        metavar="<offset>",
        type=virtual_address,
        help="Print assembly diff for specific function (original file's offset)",
    )
    parser.add_argument(
        "--json",
        metavar="<file>",
        help="Generate JSON file with match summary",
    )
    parser.add_argument(
        "--diff",
        metavar="<file>",
        help="Diff against summary in JSON file",
    )
    parser.add_argument(
        "--html",
        "-H",
        metavar="<file>",
        help="Generate searchable HTML summary of status and diffs",
    )
    parser.add_argument(
        "--no-color", "-n", action="store_true", help="Do not color the output"
    )
    parser.add_argument(
        "--svg", "-S", metavar="<file>", help="Generate SVG graphic of progress"
    )
    parser.add_argument("--svg-icon", metavar="icon", help="Icon to use in SVG (PNG)")
    parser.add_argument(
        "--print-rec-addr",
        action="store_true",
        help="Print addresses of recompiled functions too",
    )
    parser.add_argument(
        "--silent",
        action="store_true",
        help="Don't display text summary of matches",
    )
    parser.set_defaults(loglevel=logging.INFO)
    parser.add_argument(
        "--debug",
        action="store_const",
        const=logging.DEBUG,
        dest="loglevel",
        help="Print script debug information",
    )
    args = parser.parse_args()
    if not os.path.isfile(args.original):
        parser.error(f"Original binary {args.original} does not exist")
    if not os.path.isfile(args.recompiled):
        parser.error(f"Recompiled binary {args.recompiled} does not exist")
    if not os.path.isfile(args.pdb):
        parser.error(f"Symbols PDB {args.pdb} does not exist")
    if not os.path.isdir(args.decomp_dir):
        parser.error(f"Source directory {args.decomp_dir} does not exist")
    return args
 def main():
    args = parse_args()
    logging.basicConfig(level=args.loglevel, format="[%(levelname)s] %(message)s")
    with Bin(args.original, find_str=True) as origfile, Bin(
        args.recompiled
    ) as recompfile:
        if args.verbose is not None:
            # Mute logger events from compare engine
            logging.getLogger("isledecomp.compare.db").setLevel(logging.CRITICAL)
            logging.getLogger("isledecomp.compare.lines").setLevel(logging.CRITICAL)
        isle_compare = IsleCompare(origfile, recompfile, args.pdb, args.decomp_dir)
        if args.loglevel == logging.DEBUG:
            isle_compare.debug = True
        print()
        ### Compare one or none.
        if args.verbose is not None:
            match = isle_compare.compare_address(args.verbose)
            if match is None:
                print(f"Failed to find a match at address 0x{args.verbose:x}")
                return
            print_match_verbose(
                match, show_both_addrs=args.print_rec_addr, is_plain=args.no_color
            )
            return
        ### Compare everything.
        function_count = 0
        total_accuracy = 0
        total_effective_accuracy = 0
        htmlinsert = []
        for match in isle_compare.compare_all():
            if not args.silent and args.diff is None:
                print_match_oneline(
                    match, show_both_addrs=args.print_rec_addr, is_plain=args.no_color
                )
            if match.match_type == SymbolType.FUNCTION and not match.is_stub:
                function_count += 1
                total_accuracy += match.ratio
                total_effective_accuracy += match.effective_ratio
            # Record this match for the JSON, HTML, and diff reports
            html_obj = {
                "address": f"0x{match.orig_addr:x}",
                "recomp": f"0x{match.recomp_addr:x}",
                "name": match.name,
                "matching": match.effective_ratio,
            }
            if match.is_effective_match:
                html_obj["effective"] = True
            if match.udiff is not None:
                html_obj["diff"] = match.udiff
            if match.is_stub:
                html_obj["stub"] = True
            htmlinsert.append(html_obj)
        # Compare with saved diff report.
        if args.diff is not None:
            with open(args.diff, "r", encoding="utf-8") as f:
                saved_data = json.load(f)
                diff_json(
                    saved_data,
                    htmlinsert,
                    args.original,
                    show_both_addrs=args.print_rec_addr,
                    is_plain=args.no_color,
                )
        ## Generate files and show summary.
        if args.json is not None:
            gen_json(args.json, args.original, htmlinsert)
        if args.html is not None:
            gen_html(args.html, json.dumps(htmlinsert))
        implemented_funcs = function_count
        if args.total:
            function_count = int(args.total)
        if function_count > 0:
            effective_accuracy = total_effective_accuracy / function_count * 100
            actual_accuracy = total_accuracy / function_count * 100
            print(
                f"\nTotal effective accuracy {effective_accuracy:.2f}% across {function_count} functions ({actual_accuracy:.2f}% actual accuracy)"
            )
            if args.svg is not None:
                gen_svg(
                    args.svg,
                    os.path.basename(args.original),
                    args.svg_icon,
                    implemented_funcs,
                    function_count,
                    total_effective_accuracy,
                )
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/tools/reccmp/template.html
+++ b/tools/reccmp/template.html
@ -1,365 +0,0 @@
 <!DOCTYPE html>
 <html>
  <head>
    <title>Decompilation Status</title>
    <style>
      body {
        background: #202020;
        color: #f0f0f0;
        font-family: sans-serif;
      }
      h1 {
        text-align: center;
      }
      .main {
        width: 800px;
        max-width: 100%;
        margin: auto;
      }
      #search {
        width: 100%;
        box-sizing: border-box;
        background: #303030;
        color: #f0f0f0;
        border: 1px #f0f0f0 solid;
        padding: 0.5em;
        border-radius: 0.5em;
      }
      #search::placeholder {
        color: #b0b0b0;
      }
      #listing {
        width: 100%;
        border-collapse: collapse;
        font-family: monospace;
      }
      func-row:hover {
        background: #404040 !important;
      }
      func-row:nth-child(odd of :not([hidden])), #listing > thead th {
        background: #282828;
      }
      func-row:nth-child(even of :not([hidden])) {
        background: #383838;
      }
      table#listing {
        border: 1px #f0f0f0 solid;
      }
      #listing > thead th {
        padding: 0.5em;
        user-select: none;
        width: 10%;
        text-align: left;
      }
      #listing:not([show-recomp]) > thead th[data-col="recomp"] {
        display: none;
      }
      #listing > thead th > div {
        display: flex;
        column-gap: 0.5em;
      }
      #listing > thead th > div > span {
        cursor: pointer;
      }
      #listing > thead th > div > span:hover {
        text-decoration: underline;
        text-decoration-style: dotted;
      }
      #listing > thead th:last-child > div {
        justify-content: right;
      }
      #listing > thead th[data-col="name"] {
        width: 60%;
      }
      .diffneg {
        color: #FF8080;
      }
      .diffpos {
        color: #80FF80;
      }
      .diffslug {
        color: #8080FF;
      }
      .identical {
        font-style: italic;
        text-align: center;
      }
      sort-indicator {
        user-select: none;
      }
      .filters {
        align-items: flex-start;
        display: flex;
        font-size: 10pt;
        justify-content: space-between;
        margin: 0.5em 0 1em 0;
      }
      .filters > fieldset {
        /* checkbox and radio buttons v-aligned with text */
        align-items: center;
        display: flex;
      }
      .filters > fieldset > input, .filters > fieldset > label {
        cursor: pointer;
      }
      .filters > fieldset > label {
        margin-right: 10px;
      }
      table.diffTable {
        border-collapse: collapse;
      }
      table.diffTable:not(:last-child) {
        /* visual gap *between* diff context groups */
        margin-bottom: 40px;
      }
      table.diffTable td, table.diffTable th {
        border: 0 none;
        padding: 0 10px 0 0;
      }
      table.diffTable th {
        /* don't break address if asm line is long */
        word-break: keep-all;
      }
      diff-display[data-option="0"] th:nth-child(1) {
        display: none;
      }
      diff-display[data-option="0"] th:nth-child(2),
      diff-display[data-option="1"] th:nth-child(2) {
        display: none;
      }
      label {
        user-select: none;
      }
      #pageDisplay > button {
        cursor: pointer;
        padding: 0.25em 0.5em;
      }
      #pageDisplay select {
        cursor: pointer;
        padding: 0.25em;
        margin: 0 0.5em;
      }
      p.rowcount {
        align-self: flex-end;
        font-size: 1.2em;
        margin-bottom: 0;
      }
    </style>
    <script>var data = {{{data}}};</script>
    <script>{{{reccmp_js}}}</script>
  </head>
  <body>
    <div class="main">
      <h1>Decompilation Status</h1>
      <listing-options>
        <input id="search" type="search" placeholder="Search for offset or function name...">
        <div class="filters">
          <fieldset>
            <legend>Options:</legend>
            <input type="checkbox" id="cbHidePerfect" />
            <label for="cbHidePerfect">Hide 100% match</label>
            <input type="checkbox" id="cbHideStub" />
            <label for="cbHideStub">Hide stubs</label>
            <input type="checkbox" id="cbShowRecomp" />
            <label for="cbShowRecomp">Show recomp address</label>
          </fieldset>
          <fieldset>
            <legend>Search filters on:</legend>
            <input type="radio" name="filterType" id="filterName" value=1 checked />
            <label for="filterName">Name/address</label>
            <input type="radio" name="filterType" id="filterAsm" value=2 />
            <label for="filterAsm">Asm output</label>
            <input type="radio" name="filterType" id="filterDiff" value=3 />
            <label for="filterDiff">Asm diffs only</label>
          </fieldset>
        </div>
        <div class="filters">
          <p class="rowcount">Results: <span id="rowcount"></span></p>
          <fieldset id="pageDisplay">
            <legend>Page</legend>
            <button id="pagePrev">prev</button>
            <select id="pageSelect">
            </select>
            <button id="pageNext">next</button>
          </fieldset>
        </div>
      </listing-options>
      <listing-table>
        <table id="listing">
          <thead>
            <tr>
              <th data-col="address">
                <div>
                  <span>Address</span>
                  <sort-indicator></sort-indicator>
                </div>
              </th>
              <th data-col="recomp">
                <div>
                  <span>Recomp</span>
                  <sort-indicator></sort-indicator>
                </div>
              </th>
              <th data-col="name">
                <div>
                  <span>Name</span>
                  <sort-indicator></sort-indicator>
                </div>
              </th>
              <th data-col="diffs" data-no-sort></th>
              <th data-col="matching">
                <div>
                  <sort-indicator></sort-indicator>
                  <span>Matching</span>
                </div>
              </th>
            </tr>
          </thead>
          <tbody>
          </tbody>
        </table>
      </listing-table>
    </div>
    <template id="funcrow-template">
      <style>
        :host(:not([hidden])) {
          display: table-row;
        }
        :host(:not([show-recomp])) > div[data-col="recomp"] {
          display: none;
        }
        div[data-col="name"]:hover {
          cursor: pointer;
        }
        div[data-col="name"]:hover > ::slotted(*) {
          text-decoration: underline;
          text-decoration-style: dotted;
        }
        ::slotted(*:not([slot="name"])) {
          white-space: nowrap;
        }
        :host > div {
          border-top: 1px #f0f0f0 solid;
          display: table-cell;
          padding: 0.5em;
          word-break: break-all !important;
        }
        :host > div:last-child {
          text-align: right;
        }
      </style>
      <div data-col="address"><can-copy><slot name="address"></slot></can-copy></div>
      <div data-col="recomp"><can-copy><slot name="recomp"></slot></can-copy></div>
      <div data-col="name"><slot name="name"></slot></div>
      <div data-col="diffs"><slot name="diffs"></slot></div>
      <div data-col="matching"><slot name="matching"></slot></div>
    </template>
    <template id="diffrow-template">
      <style>
        :host(:not([hidden])) {
          display: table-row;
          contain: paint;
        }
        td.singleCell {
          border: 1px #f0f0f0 solid;
          border-bottom: 0px none;
          display: table-cell;
          padding: 0.5em;
          word-break: break-all !important;
        }
      </style>
      <td class="singleCell" colspan="5">
        <slot></slot>
      </td>
    </template>
    <template id="nodiff-template">
      <style>
        ::slotted(*) {
          font-style: italic;
          text-align: center;
        }
      </style>
      <slot></slot>
    </template>
    <template id="can-copy-template">
      <style>
        :host {
          position: relative;
        }
        ::slotted(*) {
          cursor: pointer;
        }
        slot::after {
          background-color: #fff;
          color: #222;
          display: none;
          font-size: 12px;
          padding: 1px 2px;
          width: fit-content;
          border-radius: 1px;
          text-align: center;
          bottom: 120%;
          box-shadow: 0 4px 14px 0 rgba(0,0,0,.2), 0 0 0 1px rgba(0,0,0,.05);
          position: absolute;
          white-space: nowrap;
          transition: .1s;
          content: 'Copy to clipboard';
        }
        ::slotted(*:hover) {
          text-decoration: underline;
          text-decoration-style: dotted;
        }
        slot:hover::after {
          display: block;
        }
        :host([copied]) > slot:hover::after {
          content: 'Copied!';
        }
      </style>
      <slot></slot>
    </template>
  </body>
 </html>
--- a/tools/reccmp/template.svg
+++ b/tools/reccmp/template.svg
@ -1,119 +0,0 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <!-- Created with Inkscape (http://www.inkscape.org/) -->
 <svg
   width="640"
   height="480"
   viewBox="0 0 169.33333 127"
   version="1.1"
   id="svg5"
   xml:space="preserve"
   sodipodi:docname="template.svg"
   inkscape:version="1.2.2 (b0a8486541, 2022-12-01)"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:svg="http://www.w3.org/2000/svg"><sodipodi:namedview
     id="namedview26"
     pagecolor="#505050"
     bordercolor="#eeeeee"
     borderopacity="1"
     inkscape:showpageshadow="0"
     inkscape:pageopacity="0"
     inkscape:pagecheckerboard="0"
     inkscape:deskcolor="#505050"
     showgrid="false"
     inkscape:zoom="1.6046875"
     inkscape:cx="158.90944"
     inkscape:cy="220.6037"
     inkscape:window-width="2560"
     inkscape:window-height="1379"
     inkscape:window-x="0"
     inkscape:window-y="0"
     inkscape:window-maximized="1"
     inkscape:current-layer="g1273" /><defs
     id="defs5">
        <clipPath
   id="progBarCutoff">
          <rect
   width="{{progbar}}"
   height="8.6508904"
   x="21.118132"
   y="134.05507"
   id="rect2" />
        </clipPath>
      </defs><g
     id="g1273"
     transform="matrix(1.2683581,0,0,1.2683581,-22.720969,-65.913871)"><image
       width="53.066437"
       height="53.066437"
       preserveAspectRatio="none"
       style="image-rendering:optimizeSpeed"
       xlink:href="data:image/png;base64,{{icon}}"
       id="image1060"
       x="58.13345"
       y="51.967873" /><text
       xml:space="preserve"
       style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:12.7px;font-family:monospace;-inkscape-font-specification:mono;text-align:center;text-anchor:middle;fill:#ffffff;stroke:#000000;stroke-width:1.25161812;stroke-opacity:1;stroke-dasharray:none;paint-order:stroke fill markers"
       x="84.666656"
       y="118.35877"
       id="text740"><tspan
         id="tspan738"
         style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:monospace;-inkscape-font-specification:mono;text-align:center;text-anchor:middle;stroke:#000000;stroke-width:1.25161812;stroke-opacity:1;stroke-dasharray:none;paint-order:stroke fill markers"
         x="84.666656"
         y="118.35877">{{name}}</tspan></text><g
       id="g1250"
       transform="translate(-0.04358834,8.1397473)"><rect
         style="display:inline;fill:none;fill-opacity:1;stroke:#000000;stroke-width:2.50324;stroke-dasharray:none;stroke-opacity:1"
         id="rect1619"
         width="127.18422"
         height="8.6508904"
         x="21.118132"
         y="134.05507" /><rect
         style="display:inline;fill:#000000;fill-opacity:1;stroke:#ffffff;stroke-width:0.87411;stroke-dasharray:none;stroke-opacity:1"
         id="rect1167"
         width="127.18422"
         height="8.6508904"
         x="21.118132"
         y="134.05507" /><text
         xml:space="preserve"
         style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:4.23333px;font-family:monospace;-inkscape-font-specification:mono;text-align:start;text-anchor:start;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1.05833;stroke-dasharray:none;stroke-opacity:1"
         x="76.884926"
         y="139.89182"
         id="text2152"><tspan
           style="font-size:4.23333px;fill:#ffffff;fill-opacity:1;stroke-width:1.05833"
           x="76.884926"
           y="139.89182"
           id="tspan2150">{{percent}}</tspan></text><rect
         style="display:inline;fill:#ffffff;stroke:none;stroke-width:2.6764"
         id="rect1169"
         width="127.18422"
         height="8.6508904"
         x="21.118132"
         y="134.05507"
         clip-path="url(#progBarCutoff)" /><text
         xml:space="preserve"
         style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:4.23333px;font-family:monospace;-inkscape-font-specification:mono;text-align:start;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1.05833;stroke-dasharray:none;stroke-opacity:1"
         x="76.884926"
         y="139.89182"
         id="text18"
         clip-path="url(#progBarCutoff)"
         inkscape:label="text18"><tspan
           style="font-size:4.23333px;fill:#000000;fill-opacity:1;stroke-width:1.05833"
           x="76.884926"
           y="139.89182"
           id="tspan16">{{percent}}</tspan></text></g><text
       xml:space="preserve"
       style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:4.23333px;font-family:monospace;-inkscape-font-specification:mono;text-align:start;text-anchor:start;fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.83441208;stroke-dasharray:none;stroke-opacity:1;opacity:1;stroke-linejoin:miter;stroke-linecap:butt;paint-order:stroke fill markers"
       x="46.947659"
       y="129.67447"
       id="text1260"><tspan
         id="tspan1258"
         style="font-size:4.23333px;stroke-width:0.83441208;stroke:#000000;stroke-opacity:1;stroke-dasharray:none;stroke-linejoin:miter;stroke-linecap:butt;paint-order:stroke fill markers"
         x="46.947659"
         y="129.67447">Implemented: {{implemented}}</tspan><tspan
         style="font-size:4.23333px;stroke-width:0.83441208;stroke:#000000;stroke-opacity:1;stroke-dasharray:none;stroke-linejoin:miter;stroke-linecap:butt;paint-order:stroke fill markers"
         x="46.947659"
         y="134.96613"
         id="tspan1262">Accuracy:    {{accuracy}}</tspan></text></g></svg>
--- a/tools/requirements.txt
+++ b/tools/requirements.txt
@ -1,11 +0,0 @@
 tools/isledecomp
 capstone
 clang==16.*
 colorama>=0.4.6
 isledecomp
 pystache
 pyyaml
 git+https://github.com/wbenny/pydemangler.git
 # requirement of capstone due to python dropping distutils.
 # see: https://github.com/capstone-engine/capstone/issues/2223
 setuptools ; python_version >= "3.12"
--- a/tools/roadmap/roadmap.py
+++ b/tools/roadmap/roadmap.py
@ -1,492 +0,0 @@
 """For all addresses matched by code annotations or recomp pdb,
 report how "far off" the recomp symbol is from its proper place
 in the original binary."""
 import os
 import argparse
 import logging
 import statistics
 import bisect
 from typing import Iterator, List, Optional, Tuple
 from collections import namedtuple
 from isledecomp import Bin as IsleBin
 from isledecomp.bin import InvalidVirtualAddressError
 from isledecomp.cvdump import Cvdump
 from isledecomp.compare import Compare as IsleCompare
 from isledecomp.types import SymbolType
 # Ignore all compare-db messages.
 logging.getLogger("isledecomp.compare").addHandler(logging.NullHandler())
 def or_blank(value) -> str:
    """Helper for dealing with potential None values in text output."""
    return "" if value is None else str(value)
 class ModuleMap:
    """Load a subset of sections from the pdb to allow you to look up the
    module number based on the recomp address."""
    def __init__(self, pdb, binfile) -> None:
        cvdump = Cvdump(pdb).section_contributions().modules().run()
        self.module_lookup = {m.id: (m.lib, m.obj) for m in cvdump.modules}
        self.library_lookup = {m.obj: m.lib for m in cvdump.modules}
        self.section_contrib = [
            (
                binfile.get_abs_addr(sizeref.section, sizeref.offset),
                sizeref.size,
                sizeref.module,
            )
            for sizeref in cvdump.sizerefs
            if binfile.is_valid_section(sizeref.section)
        ]
        # For bisect performance enhancement
        self.contrib_starts = [start for (start, _, __) in self.section_contrib]
    def get_lib_for_module(self, module: str) -> Optional[str]:
        return self.library_lookup.get(module)
    def get_all_cmake_modules(self) -> List[str]:
        return [
            obj
            for (_, (__, obj)) in self.module_lookup.items()
            if obj.startswith("CMakeFiles")
        ]
    def get_module(self, addr: int) -> Optional[Tuple[str, str]]:
        i = bisect.bisect_left(self.contrib_starts, addr)
        # If the addr matches the section contribution start, we are in the
        # right spot. Otherwise, we need to subtract one here.
        # We don't want the insertion point given by bisect, but the
        # section contribution that contains the address.
        # bisect_left returns len(list) when addr is past every start,
        # so check the bound before indexing.
        if i >= len(self.section_contrib) or self.section_contrib[i][0] != addr:
            i -= 1
        # No contribution can contain an addr below the first start.
        # This also covers an empty list.
        if i < 0:
            return None
        (start, size, module_id) = self.section_contrib[i]
        if start <= addr < start + size:
            if (module := self.module_lookup.get(module_id)) is not None:
                return module
        return None
 def print_sections(sections):
    print("    name |    start |   v.size | raw size")
    print("---------|----------|----------|----------")
    for sect in sections:
        name = sect.name
        print(
            f"{name:>8} | {sect.virtual_address:8x} | {sect.virtual_size:8x} | {sect.size_of_raw_data:8x}"
        )
    print()
 ALLOWED_TYPE_ABBREVIATIONS = ["fun", "dat", "poi", "str", "vta", "flo"]
 def match_type_abbreviation(mtype: Optional[SymbolType]) -> str:
    """Return abbreviation of the given SymbolType name"""
    if mtype is None:
        return ""
    return mtype.name.lower()[:3]
 def get_cmakefiles_prefix(module: str) -> str:
    """For the given .obj, get the "CMakeFiles/something.dir/" prefix.
    For lack of a better option, this is the library for this module."""
    if module.startswith("CMakeFiles"):
        return "/".join(module.split("/", 2)[:2]) + "/"
    return module
 def truncate_module_name(prefix: str, module: str) -> str:
    """Remove the CMakeFiles prefix and the .obj suffix for the given module.
    Input: CMakeFiles/lego1.dir/, CMakeFiles/lego1.dir/LEGO1/define.cpp.obj
    Output: LEGO1/define.cpp"""
    if module.startswith(prefix):
        module = module[len(prefix) :]
    if module.endswith(".obj"):
        module = module[:-4]
    return module
 def avg_remove_outliers(entries: List[int]) -> int:
    """Compute the average from this list of entries (addresses)
    after removing outlier values."""
    if len(entries) == 1:
        return entries[0]
    avg = statistics.mean(entries)
    sd = statistics.pstdev(entries)
    return int(statistics.mean([e for e in entries if abs(e - avg) <= 2 * sd]))
 RoadmapRow = namedtuple(
    "RoadmapRow",
    [
        "orig_sect_ofs",
        "recomp_sect_ofs",
        "orig_addr",
        "recomp_addr",
        "displacement",
        "sym_type",
        "size",
        "name",
        "module",
    ],
 )
 class DeltaCollector:
    """Reads each row of the results and aggregates information about the
    placement of each module."""
    def __init__(self, match_type: str = "fun") -> None:
        # The displacement for each symbol from each module
        self.disp_map = {}
        # Each address for each module
        self.addresses = {}
        # The earliest address for each module
        self.earliest = {}
        # String abbreviation for which symbol type we are checking
        self.match_type = "fun"
        match_type = str(match_type).strip().lower()[:3]
        if match_type in ALLOWED_TYPE_ABBREVIATIONS:
            self.match_type = match_type
    def read_row(self, row: RoadmapRow):
        if row.module is None:
            return
        if row.sym_type != self.match_type:
            return
        if row.orig_addr is not None:
            if row.module not in self.addresses:
                self.addresses[row.module] = []
            self.addresses[row.module].append(row.orig_addr)
            if row.orig_addr < self.earliest.get(row.module, 0xFFFFFFFFF):
                self.earliest[row.module] = row.orig_addr
        if row.displacement is not None:
            if row.module not in self.disp_map:
                self.disp_map[row.module] = []
            self.disp_map[row.module].append(row.displacement)
    def iter_sorted(self) -> Iterator[Tuple[int, str]]:
        """Compute the average address for each module, then generate them
        in ascending order."""
        avg_address = {
            mod: avg_remove_outliers(values) for mod, values in self.addresses.items()
        }
        for mod, avg in sorted(avg_address.items(), key=lambda x: x[1]):
            yield (avg, mod)
 def suggest_order(results: List[RoadmapRow], module_map: ModuleMap, match_type: str):
    """Suggest the order of modules for CMakeLists.txt"""
    dc = DeltaCollector(match_type)
    for row in results:
        dc.read_row(row)
    # First, show the order of .obj files for the "CMake Modules"
    # Meaning: the modules where the .obj file begins with "CMakeFiles".
    # These are the libraries where we directly control the order.
    # The library name (from cvdump) doesn't make it obvious that these are
    # our libraries so we derive the name based on the CMakeFiles prefix.
    leftover_modules = set(module_map.get_all_cmake_modules())
    # A little convoluted, but we want to take the first two tokens
    # of the string with '/' as the delimiter.
    # i.e. CMakeFiles/isle.dir/
    # The idea is to print exactly what appears in CMakeLists.txt.
    cmake_prefixes = sorted(set(get_cmakefiles_prefix(mod) for mod in leftover_modules))
    # Save this off because we'll use it again later.
    computed_order = list(dc.iter_sorted())
    for prefix in cmake_prefixes:
        print(prefix)
        last_earliest = 0
        # Show modules ordered by the computed average of addresses
        for _, module in computed_order:
            if not module.startswith(prefix):
                continue
            leftover_modules.remove(module)
            avg_displacement = None
            displacements = dc.disp_map.get(module)
            if displacements is not None and len(displacements) > 0:
                avg_displacement = int(statistics.mean(displacements))
            # Call attention to any modules where ordering by earliest
            # address is different from the computed order we display.
            earliest = dc.earliest.get(module)
            ooo_mark = "*" if earliest < last_earliest else " "
            last_earliest = earliest
            code_file = truncate_module_name(prefix, module)
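            # avg_displacement is None when no symbol in this module has a
            # displacement, so format it through or_blank().
            print(f"0x{earliest:08x}{ooo_mark} {or_blank(avg_displacement):>10}  {code_file}")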
        # These modules are included in the final binary (in some form) but
        # don't contribute any symbols of the type we are checking.
        # n.b. There could still be other modules that are part of
        # CMakeLists.txt but are not included in the pdb for whatever reason.
        # In other words: don't take the list we provide as the final word on
        # what should or should not be included.
        # This is merely a suggestion of the order.
        for module in leftover_modules:
            if not module.startswith(prefix):
                continue
            # aligned with previous print
            code_file = truncate_module_name(prefix, module)
            print(f"      no suggestion     {code_file}")
        print()
    # Now display the order of all libraries in the final file.
    library_order = {}
    for start, module in computed_order:
        lib = module_map.get_lib_for_module(module)
        if lib is None:
            lib = get_cmakefiles_prefix(module)
        if start < library_order.get(lib, 0xFFFFFFFFF):
            library_order[lib] = start
    print("Library order (average address shown):")
    for lib, start in sorted(library_order.items(), key=lambda x: x[1]):
        # Strip off any OS path for brevity
        if not lib.startswith("CMakeFiles"):
            lib = os.path.basename(lib)
        print(f"{lib:40} {start:08x}")
 def print_text_report(results: List[RoadmapRow]):
    """Print the result with original and recomp addresses."""
    for row in results:
        print(
            "  ".join(
                [
                    f"{or_blank(row.orig_sect_ofs):14}",
                    f"{or_blank(row.recomp_sect_ofs):14}",
                    f"{or_blank(row.displacement):>8}",
                    f"{row.sym_type:3}",
                    f"{or_blank(row.size):6}",
                    or_blank(row.name),
                ]
            )
        )
 def print_diff_report(results: List[RoadmapRow]):
    """Print only entries where we have the recomp address.
    This is intended for generating a file to diff against.
    The recomp addresses are always changing so we hide those."""
    for row in results:
        if row.orig_addr is None or row.recomp_addr is None:
            continue
        print(
            "  ".join(
                [
                    f"{or_blank(row.orig_sect_ofs):14}",
                    f"{or_blank(row.displacement):>8}",
                    f"{row.sym_type:3}",
                    f"{or_blank(row.size):6}",
                    or_blank(row.name),
                ]
            )
        )
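 def export_to_csv(csv_file: str, results: List[RoadmapRow]):
    # Local import keeps this function self-contained.
    import csv
    # csv.writer quotes fields that contain commas or quotes, which
    # symbol names (e.g. template arguments) and strings often do.
    with open(csv_file, "w+", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(
            [
                "orig_sect_ofs", "recomp_sect_ofs", "orig_addr", "recomp_addr",
                "displacement", "row_type", "size", "name", "module",
            ]
        )
        for row in results:
            writer.writerow(map(or_blank, row))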
 def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Show all addresses from original and recomp."
    )
    parser.add_argument(
        "original", metavar="original-binary", help="The original binary"
    )
    parser.add_argument(
        "recompiled", metavar="recompiled-binary", help="The recompiled binary"
    )
    parser.add_argument(
        "pdb", metavar="recompiled-pdb", help="The PDB of the recompiled binary"
    )
    parser.add_argument(
        "decomp_dir", metavar="decomp-dir", help="The decompiled source tree"
    )
    parser.add_argument("--csv", metavar="<file>", help="If set, export to CSV")
    parser.add_argument(
        "--verbose", "-v", action="store_true", help="Show recomp addresses in output"
    )
    parser.add_argument(
        "--order",
        const="fun",
        nargs="?",
        type=str,
        help="Show suggested order of modules (using the specified symbol type)",
    )
    (args, _) = parser.parse_known_args()
    if not os.path.isfile(args.original):
        parser.error(f"Original binary {args.original} does not exist")
    if not os.path.isfile(args.recompiled):
        parser.error(f"Recompiled binary {args.recompiled} does not exist")
    if not os.path.isfile(args.pdb):
        parser.error(f"Symbols PDB {args.pdb} does not exist")
    if not os.path.isdir(args.decomp_dir):
        parser.error(f"Source directory {args.decomp_dir} does not exist")
    return args
 def main():
    args = parse_args()
    with IsleBin(args.original, find_str=True) as orig_bin, IsleBin(
        args.recompiled
    ) as recomp_bin:
        engine = IsleCompare(orig_bin, recomp_bin, args.pdb, args.decomp_dir)
        module_map = ModuleMap(args.pdb, recomp_bin)
        def is_same_section(orig: int, recomp: int) -> bool:
            """Compare the section name instead of the index.
            LEGO1.dll adds extra sections for some reason. (Smacker library?)"""
            try:
                orig_name = orig_bin.sections[orig - 1].name
                recomp_name = recomp_bin.sections[recomp - 1].name
                return orig_name == recomp_name
            except IndexError:
                return False
        def to_roadmap_row(match):
            orig_sect = None
            orig_ofs = None
            orig_sect_ofs = None
            recomp_sect = None
            recomp_ofs = None
            recomp_sect_ofs = None
            orig_addr = None
            recomp_addr = None
            displacement = None
            module_name = None
            if match.recomp_addr is not None:
                if (module_ref := module_map.get_module(match.recomp_addr)) is not None:
                    (_, module_name) = module_ref
            row_type = match_type_abbreviation(match.compare_type)
            name = (
                repr(match.name)
                if match.compare_type == SymbolType.STRING
                else match.name
            )
            if match.orig_addr is not None:
                orig_addr = match.orig_addr
                (orig_sect, orig_ofs) = orig_bin.get_relative_addr(match.orig_addr)
                orig_sect_ofs = f"{orig_sect:04}:{orig_ofs:08x}"
            if match.recomp_addr is not None:
                recomp_addr = match.recomp_addr
                (recomp_sect, recomp_ofs) = recomp_bin.get_relative_addr(
                    match.recomp_addr
                )
                recomp_sect_ofs = f"{recomp_sect:04}:{recomp_ofs:08x}"
            if (
                orig_sect is not None
                and recomp_sect is not None
                and is_same_section(orig_sect, recomp_sect)
            ):
                displacement = recomp_ofs - orig_ofs
            return RoadmapRow(
                orig_sect_ofs,
                recomp_sect_ofs,
                orig_addr,
                recomp_addr,
                displacement,
                row_type,
                match.size,
                name,
                module_name,
            )
        def roadmap_row_generator(matches):
            for match in matches:
                try:
                    yield to_roadmap_row(match)
                except InvalidVirtualAddressError:
                    # This is here to work around the fact that we have RVA
                    # values (i.e. not real virtual addrs) in our compare db.
                    pass
        results = list(roadmap_row_generator(engine.get_all()))
        if args.order is not None:
            suggest_order(results, module_map, args.order)
            return
        if args.csv is None:
            if args.verbose:
                print("ORIG sections:")
                print_sections(orig_bin.sections)
                print("RECOMP sections:")
                print_sections(recomp_bin.sections)
                print_text_report(results)
            else:
                print_diff_report(results)
        if args.csv is not None:
            export_to_csv(args.csv, results)
 if __name__ == "__main__":
    main()
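Review note on `ModuleMap.get_module`: the lookup finds the section contribution whose address range contains a given address. A minimal standalone sketch of that idea follows, assuming a list of `(start, size, module)` tuples sorted by start address. The data and the name `find_contribution` are hypothetical and not part of the removed tool. The sketch uses `bisect_right` minus one rather than `bisect_left` with a correction step.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical section contributions: (start address, size, module name),
# sorted by start address. These values are made up for illustration.
CONTRIBS: List[Tuple[int, int, str]] = [
    (0x1000, 0x200, "a.obj"),
    (0x1200, 0x100, "b.obj"),
    (0x2000, 0x080, "c.obj"),
]
STARTS = [start for (start, _, __) in CONTRIBS]


def find_contribution(addr: int) -> Optional[str]:
    """Return the module whose [start, start + size) range contains addr."""
    # bisect_right gives the insertion point after any equal start,
    # so the candidate range is always the entry just before it.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # addr is below the first contribution
    start, size, module = CONTRIBS[i]
    return module if start <= addr < start + size else None


print(find_contribution(0x1000))  # start of the first range
print(find_contribution(0x12FF))  # last byte of the second range
print(find_contribution(0x1300))  # gap between ranges
print(find_contribution(0x3000))  # past every range
```

Addresses that fall in a gap between contributions, or outside all of them, yield `None` instead of the nearest module.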
--- a/tools/verexp/verexp.py
+++ b/tools/verexp/verexp.py
@ -1,75 +0,0 @@
 #!/usr/bin/env python3
 import argparse
 import difflib
 import subprocess
 import os
 from isledecomp.lib import lib_path_join
 from isledecomp.utils import print_diff
 def main():
    parser = argparse.ArgumentParser(
        allow_abbrev=False,
        description="Verify Exports: Compare the exports of two DLLs.",
    )
    parser.add_argument(
        "original", metavar="original-binary", help="The original binary"
    )
    parser.add_argument(
        "recompiled", metavar="recompiled-binary", help="The recompiled binary"
    )
    parser.add_argument(
        "--no-color", "-n", action="store_true", help="Do not color the output"
    )
    args = parser.parse_args()
    if not os.path.isfile(args.original):
        parser.error(f"Original binary file {args.original} does not exist")
    if not os.path.isfile(args.recompiled):
        parser.error(f"Recompiled binary {args.recompiled} does not exist")
    def get_exports(file):
        call = [lib_path_join("DUMPBIN.EXE"), "/EXPORTS"]
        if os.name != "nt":
            call.insert(0, "wine")
            file = (
                subprocess.check_output(["winepath", "-w", file])
                .decode("utf-8")
                .strip()
            )
        call.append(file)
        raw = subprocess.check_output(call).decode("utf-8").split("\r\n")
        exports = []
        start = False
        for line in raw:
            if not start:
                if line == "            ordinal hint   name":
                    start = True
            else:
                if line:
                    exports.append(line[27 : line.rindex("  (")])
                elif exports:
                    break
        return exports
    og_exp = get_exports(args.original)
    re_exp = get_exports(args.recompiled)
    udiff = difflib.unified_diff(og_exp, re_exp)
    has_diff = print_diff(udiff, args.no_color)
    return 1 if has_diff else 0
 if __name__ == "__main__":
    raise SystemExit(main())
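Review note on `verexp.py`: the script reduces to taking a unified diff of two export lists and reporting whether anything differs. A minimal sketch of that step using only the standard library is shown here. The helper name `exports_differ` and the export names are hypothetical, and plain `print` stands in for the project's `print_diff` helper, whose exact behavior is not shown in this diff.

```python
import difflib
from typing import List


def exports_differ(original: List[str], recompiled: List[str]) -> bool:
    """Print a unified diff of two export lists; return True if they differ."""
    # lineterm="" keeps the header lines free of trailing newlines,
    # matching entries that have no newline of their own.
    udiff = list(difflib.unified_diff(original, recompiled, lineterm=""))
    for line in udiff:
        print(line)
    # unified_diff yields nothing at all when the inputs are identical.
    return len(udiff) > 0


# Made-up export names, for illustration only.
same = exports_differ(["Foo", "Bar"], ["Foo", "Bar"])
changed = exports_differ(["Foo", "Bar"], ["Foo", "Baz"])
```

Returning a boolean lets the caller map the result to a process exit code, as the script does with `return 1 if has_diff else 0`.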
--- a/tools/vtable/vtable.py
+++ b/tools/vtable/vtable.py
@ -1,111 +0,0 @@
 #!/usr/bin/env python3
 import os
 import argparse
 import logging
 from typing import List
 import colorama
 from isledecomp.bin import Bin as IsleBin
 from isledecomp.compare import Compare as IsleCompare
 from isledecomp.utils import print_combined_diff
 # Ignore all compare-db messages.
 logging.getLogger("isledecomp.compare").addHandler(logging.NullHandler())
 colorama.just_fix_windows_console()
 def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Comparing vtables.")
    parser.add_argument(
        "original", metavar="original-binary", help="The original binary"
    )
    parser.add_argument(
        "recompiled", metavar="recompiled-binary", help="The recompiled binary"
    )
    parser.add_argument(
        "pdb", metavar="recompiled-pdb", help="The PDB of the recompiled binary"
    )
    parser.add_argument(
        "decomp_dir", metavar="decomp-dir", help="The decompiled source tree"
    )
    parser.add_argument(
        "--verbose", "-v", action="store_true", help="Show more detailed information"
    )
    parser.add_argument(
        "--no-color", "-n", action="store_true", help="Do not color the output"
    )
    (args, _) = parser.parse_known_args()
    if not os.path.isfile(args.original):
        parser.error(f"Original binary {args.original} does not exist")
    if not os.path.isfile(args.recompiled):
        parser.error(f"Recompiled binary {args.recompiled} does not exist")
    if not os.path.isfile(args.pdb):
        parser.error(f"Symbols PDB {args.pdb} does not exist")
    if not os.path.isdir(args.decomp_dir):
        parser.error(f"Source directory {args.decomp_dir} does not exist")
    return args
 def show_vtable_diff(udiff: List, _: bool = False, plain: bool = False):
    print_combined_diff(udiff, plain)
 def print_summary(vtable_count: int, problem_count: int):
    if problem_count == 0:
        print(f"Vtables found: {vtable_count}.\n100% match.")
        return
    print(f"Vtables found: {vtable_count}.\nVtables not matching: {problem_count}.")
 def main():
    args = parse_args()
    vtable_count = 0
    problem_count = 0
    with IsleBin(args.original) as orig_bin, IsleBin(args.recompiled) as recomp_bin:
        engine = IsleCompare(orig_bin, recomp_bin, args.pdb, args.decomp_dir)
        for tbl_match in engine.compare_vtables():
            vtable_count += 1
            if tbl_match.ratio < 1:
                problem_count += 1
                udiff = list(tbl_match.udiff)
                print(
                    tbl_match.name,
                    f": orig 0x{tbl_match.orig_addr:x}, recomp 0x{tbl_match.recomp_addr:x}",
                )
                show_vtable_diff(udiff, args.verbose, args.no_color)
                print()
        print_summary(vtable_count, problem_count)
        # Now compare adjuster thunk functions, if there are any.
        # These matches are generated by the compare engine.
        # They should always match 100%. If not, there is a problem
        # with the inheritance or an overridden function.
        for fun_match in engine.get_functions():
            if "`vtordisp" not in fun_match.name:
                continue
            diff = engine.compare_address(fun_match.orig_addr)
            if diff.ratio < 1.0:
                problem_count += 1
                print(
                    f"Problem with adjuster thunk {fun_match.name} (0x{fun_match.orig_addr:x} / 0x{fun_match.recomp_addr:x})"
                )
    return 1 if problem_count > 0 else 0
 if __name__ == "__main__":
    raise SystemExit(main())
@ -1,2 +0,0 @@
 from .parse import ParseAsm
 from .swap import can_resolve_register_differences