@mkrasnitski
Contributor

Now that platforms can specify their own address size, it would be nice for Binja to support ILP32 binaries. This PR adds initial support for recognizing when AARCH64 binaries are stored in a 32-bit ELF, based on the implementation for X32 binaries. Currently I am unsure if any underlying changes to the arm64 architecture itself are needed.

@mkrasnitski
Contributor Author

Doing some testing with this PR reveals some additional issues with ILP32 compatibility that I think might be at the aarch64 level. I have the following code:

#include <stdio.h>

int main(void);

void * MAIN_ADDRESS = &main;

int main(void) {
  printf("Sizes int=%lu long=%lu ptr=%lu\n", sizeof(int), sizeof(long), sizeof(int *));
  printf("main addr: %p\n", MAIN_ADDRESS);
  return 0;
}
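
For reference, the three sizes this program prints identify the data model: 4/4/4 is ILP32 and 4/8/8 is LP64. A minimal C++ sketch of that mapping follows; `DataModelName` and `HostDataModel` are hypothetical names for illustration, not part of any toolchain or of Binary Ninja.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: names the C data model implied by the three sizes
// the test program prints (int, long, pointer).
std::string DataModelName(std::size_t intSize, std::size_t longSize, std::size_t ptrSize)
{
	if (intSize == 4 && longSize == 4 && ptrSize == 4)
		return "ILP32";
	if (intSize == 4 && longSize == 8 && ptrSize == 8)
		return "LP64";
	if (intSize == 4 && longSize == 4 && ptrSize == 8)
		return "LLP64";
	return "other";
}

// Data model of whatever toolchain compiles this file.
std::string HostDataModel()
{
	return DataModelName(sizeof(int), sizeof(long), sizeof(void*));
}

// Usage: the two binaries in this thread should report these models.
void ExampleDataModels()
{
	assert(DataModelName(4, 4, 4) == "ILP32");
	assert(DataModelName(4, 8, 8) == "LP64");
}
```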

I generated 32-bit and 64-bit AArch64 ELFs using gcc (the only 32-bit toolchain I could find online was https://snapshots.linaro.org/components/toolchain/binaries/7.5-2019.12-rc1/aarch64-linux-gnu_ilp32). In the ILP32 binary there were still a couple of places where pointer sizes were wrong:

[two screenshots]

Any help diagnosing these issues would be appreciated.

@zznop
Member

zznop commented Aug 22, 2025

Hey @mkrasnitski, thanks for the PR! I'll take a closer look soon, but it looks like this gets us most of the way there. From your screenshots, it looks like clang is still parsing types as 64-bit, which is why you see the _ptr64 param for printf. Should be an easy fix.

@plafosse requested a review from zznop August 28, 2025 14:02
@mkrasnitski
Contributor Author

mkrasnitski commented Aug 28, 2025

> From your screenshots, it looks like clang is still parsing types as 64-bit, which is why you see the _ptr64 param for printf. Should be an easy fix.

I took a look at the clang debug output by setting BN_DEBUG_CLANG=1, and it looks like the target triple for ILP32 binaries is being explicitly set to --target=arm64-unknown-linux-ilp32-unknown. I tried adding an implementation of AdjustTypeParserInput that sets it to --target=aarch64-unknown-linux-gnu_ilp32, but that didn't seem to solve the problem. Here's what I tried:

virtual void AdjustTypeParserInput(
	Ref<TypeParser> parser,
	std::vector<std::string>& arguments,
	std::vector<std::pair<std::string, std::string>>& sourceFiles
) override
{
	if (parser->GetName() != "ClangTypeParser")
	{
		return;
	}

	for (auto& arg: arguments)
	{
		if (arg.find("--target=") == 0 && arg.find("-unknown-") != std::string::npos)
		{
			arg = "--target=aarch64-unknown-linux-gnu_ilp32";
		}
	}
}

The debug output confirms that the platform ends up modifying the clang args; however, the same problems persist: the GOT entries for printf and __gmon_start__ still seem to be using 64-bit pointers.
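
The argument-rewriting step can also be checked in isolation from the Binary Ninja API. The following is a standalone sketch under that assumption; `RewriteTargetTriple` is a hypothetical helper that mirrors the loop above, and the triple strings are the ones quoted in this comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the loop in AdjustTypeParserInput: any
// `--target=` argument containing "-unknown-" is replaced with the given
// triple. Returns true if at least one argument was rewritten.
bool RewriteTargetTriple(std::vector<std::string>& arguments, const std::string& triple)
{
	const std::string prefix = "--target=";
	bool changed = false;
	for (auto& arg : arguments)
	{
		// compare(0, n, prefix) == 0 tests whether arg starts with prefix
		const bool isTarget = arg.compare(0, prefix.size(), prefix) == 0;
		if (isTarget && arg.find("-unknown-") != std::string::npos)
		{
			arg = prefix + triple;
			changed = true;
		}
	}
	return changed;
}

// Usage: the triple reported by BN_DEBUG_CLANG=1 is swapped for the GNU ILP32 one.
void ExampleRewrite()
{
	std::vector<std::string> args = {"-x", "c", "--target=arm64-unknown-linux-ilp32-unknown"};
	assert(RewriteTargetTriple(args, "aarch64-unknown-linux-gnu_ilp32"));
	assert(args[2] == "--target=aarch64-unknown-linux-gnu_ilp32");
}
```

This only confirms that the arguments are rewritten as intended; it says nothing about how clang interprets the resulting triple.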

@zznop
Member

zznop commented Jan 30, 2026

Sorry for sitting on this for so long. I think we're in good shape now. The pointer width for imported address symbols was still using the architecture's pointer width instead of the platform's. Also, I fixed up the AArch64 function recognizer to detect ILP32 .plt functions and propagate types.

[screenshot]

Should be on dev soon

@mkrasnitski
Contributor Author

> I fixed up the AArch64 function recognizer to detect ILP32 .plt functions and propagate types.

Is this code part of the core or will it be pushed to arch/arm64 or platform/linux?

@zznop
Member

zznop commented Jan 31, 2026

Here's the commit for the function recognizer changes

6be9a1f

@zznop
Member

zznop commented Feb 2, 2026

This has been merged. ILP32 support is included in 5.3.9025-dev. Thanks for the contribution.

@zznop closed this Feb 2, 2026
@mkrasnitski
Contributor Author

Sorry, just now getting around to testing this. I'm still running into a couple of bugs around the PLT and function parameters. Specifically, __libc_start_main is given an empty parameter list, and I think the jumps from the PLT to the GOT are still being lifted with the wrong address size, probably because they still branch using br x17 instead of br w17. I will try my hand at debugging this, but I'm attaching two sample binaries for the code snippet above (one ILP32, one AARCH64) in case you also want to take a look.

samples.zip

@zznop
Member

zznop commented Feb 2, 2026

Yes, I noticed this as well. I'm fairly confident the unresolved jumps from the PLT to the GOT are caused by a lifting bug: the 32-bit address is loaded into w17, but Binja is not zeroing the upper 32 bits of x17.

00401dc0    uint32_t strtoul(char const* str, char** endptr, int32_t base)
00401dc0  adrp    x16, getspnam
00401dc4  ldr     w17, [x16, #0x10c]  {strtoul}
00401dc8  add     w16, w16, #0x10c  {strtoul}
❓00401dcc  br      x17
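
To illustrate the semantics in question: on AArch64, a write to a W register zeroes the upper 32 bits of the corresponding X register, so after `ldr w17, [...]` the target of `br x17` is the zero-extended 32-bit GOT value. The following is a minimal model of that rule only, not Binary Ninja's lifter code; `GpReg` and `BranchTargetAfterLoad` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one AArch64 general-purpose register (not Binary Ninja code).
struct GpReg
{
	uint64_t x = 0;  // full 64-bit view (Xn)

	// Architectural behavior: writing Wn zeroes bits 63:32 of Xn.
	void WriteW(uint32_t value) { x = static_cast<uint64_t>(value); }
};

// Target of `br x17` after `ldr w17, [...]` loaded gotValue,
// regardless of what x17 held before the load.
uint64_t BranchTargetAfterLoad(uint64_t staleX17, uint32_t gotValue)
{
	GpReg x17;
	x17.x = staleX17;
	x17.WriteW(gotValue);
	return x17.x;
}

// Usage: stale upper bits do not survive the 32-bit load.
void ExampleZeroExtension()
{
	assert(BranchTargetAfterLoad(0xdeadbeef00000000ull, 0x00401dc0u) == 0x00401dc0ull);
}
```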

This shouldn't impact analysis much, though, since I updated the function recognizer to identify the .plt.got thunks. I do plan to look into it. The relevant code is here: https://github.com/Vector35/binaryninja-api/blob/dev/arch/arm64/il.cpp#L889

As for no params being passed to __libc_start_main, that's because __libc_start_main doesn't have an auto-type and the caller-based heuristics fail to recognize the params. Hitting y on __libc_start_main and applying the following type seems to work:

int __libc_start_main(int *(main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));

[screenshot]

@zznop
Member

zznop commented Feb 2, 2026

I added platform types in 5.3.9028, which fixes the issue for __libc_start_main. I was also able to verify that the unresolved branch targets from the PLT stubs are indeed caused by a lifting bug. With this gross hack applied, you can see that it works correctly.

diff --git a/arch/arm64/il.cpp b/arch/arm64/il.cpp
index f02cdd2d..ab6ff084 100644
--- a/arch/arm64/il.cpp
+++ b/arch/arm64/il.cpp
@@ -905,8 +905,11 @@ static void LoadStoreOperand(LowLevelILFunction& il, bool load,
                            ILSETREG_O(operand1, il.Operand(1, il.Load(load_store_sz, ILREG_O(operand2)))));
                        break;
                case MEM_OFFSET:
-                       if (!load_store_sz)
-                               load_store_sz = REGSZ_O(operand1);
+                       if ((operand1.reg[0] >= REG_W0 && operand1.reg[0] <= REG_WSP) || (operand1.reg[0] >= REG_S0 && operand1.reg[0] <= REG_S31))
+                       {
+                               BNRegisterInfo regInfo = il.GetArchitecture()->GetRegisterInfo(operand1.reg[0]);
+                               operand1.reg[0] = (Register)regInfo.fullWidthRegister;
+                       }

                        // operand1.reg = [operand2.reg + operand2.imm]
                        if (IMM_O(operand2) == 0)

However, this isn't a good solution and we should be correctly handling the zero-extension from w17 to x17 in core. I'll create a separate issue for this.

@zznop
Member

zznop commented Feb 2, 2026

#7930
