-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ghidra plugin sometimes generates broken BinExport files #118
Comments
Thank you for this thorough analysis (and happy new year!). The suggested short term solution SGTM, and we should implement this in the Ghidra exporter. TBH, when I started implementing BinExport for Ghidra, I didn't have much experience with its API while at the same time choosing a different approach for writing the proto. So it's not too surprising that BinExport cannot deal with partially broken disassembly. Longer term, we can modify BinDiff to check whether a certain field is present - since we're using The fundamental underlying problem (at least how I see it) is a more philosophical one: Do most researchers want a tool that operates without any further input (thus having to deal with all kinds of edge cases in "broken" or incomplete disassembly)? Or do they want a tool that just works on a disassembly that has been cleaned up? IMO, in recent years the majority opinion seems to have shifted towards making most decisions and clean up automatic, which is why we have this impedance mismatch. The best way forward would thus be to modify the tools to match people's expectations. |
Happy New Year to you, too!
Oh, that's good! I wasn't sure that was possible, but testing with this script it seems to work to find the problem instructions: #!/usr/bin/env python3
import sys
import binexport2_pb2
be = binexport2_pb2.BinExport2()
be.ParseFromString(open(sys.argv[1], "rb").read())
print("Problem addresses/instructions:")
for flow_graph in be.flow_graph:
if not flow_graph.HasField("entry_basic_block_index"):
instr = be.instruction[
be.basic_block[flow_graph.basic_block_index[0]]
.instruction_index[0]
.begin_index
]
if instr.HasField("address"):
print("-", hex(instr.address))
else:
print("-", be.mnemonic[instr.mnemonic_index].name, instr.raw_bytes.hex())
I don't know if it's an issue of philosophy so much as one of UX (for me, at least). I can totally understand if my disassembly is too broken or incomplete for a tool to handle, but if that's the case then I need the tool to 1) tell me that my disassembly needs to be cleaned up, 2) tell me which parts of it I need to clean-up, and 3) not produce a file that the consuming application will error out on. So I think all that really needs to be done with the BinExport plugin is to have the export process present an error message when it encounters one or more problems, to list the problems it does encounter so the user can fix them (it should be possible to output this information to the Ghidra console, where it's easy to click the listed addresses to jump straight to the problem areas), and to not produce any BinExport file if an error was encountered.
Just to be clear, so long as signing the Google CLA remains a requirement for directly contributing code to this repository, I am unfortunately unable to do so. But if you (or anyone else who is able to contribute code to this repo) would like to make the suggested changes, I'd be happy to offer any assistance for which signing the CLA is not required. |
Thanks to the release of the BinDiff source code, I was finally able to solve an issue I was having when diffing bare-metal binaries (this specific instance happened with an ARM binary, but I've seen this happen with 8051 binaries as well).
Problem
The issue I was troubleshooting was causing the
bindiff
binary to exit with the following error:What's happening here is:
entry_basic_block_index
for aflow_graph
is not set, it defaults to zero.FlowGraph::Read
, if theentry_basic_block_index
of theflow_graph
is zero,entry_point_address_
will be set to the address of the 0thbasic_block
(really of the 0thinstruction
of that 0thbasic_block
).CallGraph::AttachFlowGraph
, if there is novertex
whose address is equal to the address of that 0thbasic_block
,CallGraph::GetVertex
will fail to find thevertex
and an exception will be thrown.I think the reason
entry_basic_block_index
wasn't getting set was because the function theflow_graph
came from was incompletely disassembled, which in this specific instance happened because Ghidra's disassembler had incorrectly tried disassembling Thumb-2 instructions as full-size ARM instructions. I used the following Python script to identify the starting addresses of the problemflow_graph
messages:Some of the addresses listed were zero, which was strange because I knew the code at that address was already correctly disassembled. Regardless, I fixed up the disassembly at the non-zero addresses and ran
bindiff
again. Unfortunately, I got the same error message. After some more debugging, I discovered the problem--the instructions whose addresses were set to zero were causing the following issue:FlowGraph::Read
, if the 0thinstruction
of thebasic_block
doesn't have an address set,entry_point_address_
will be set to zero.vertex
with an address matching theentry_point_address_
, an exception will be thrown.Fixing this issue required identifying those instructions that had no address set, searching for the mnemonic/bytes in the disassembly listing, and then fixing the disassembly (once again, Ghidra tried disassembling Thumb-2 instructions as ARM instructions). I used the following Python script to identify the problem instructions:
With all the improperly-disassembled instructions now properly-disassembled, the BinDiff exporter for Ghidra was able to generate a file that BinDiff was able to use. Yay!
Solution
Some suggestions for solutions to the described problems (better solutions may be available):
Near-term
flow_graph
if it can't set theentry_basic_block_index
.begin_index
is always set to the correct address.Long-term
I think the root cause of all these problems is that certain fields are optional when they really shouldn't be, especially since (please correct me if I'm wrong) it isn't possible for a protobuf-consuming application to tell the difference between an optional field whose value is zero and an optional field that is missing. Because BinDiff can't tell if a field has been set or is missing, it can't detect the problem when it happens and either handle it properly or bail early--it can only assume the value is correct and hope for the best. So, either the optional fields need to be made mandatory and always included in files generated by BinExport, or BinDiff needs to be modified to properly handle situations where those fields are missing.
The text was updated successfully, but these errors were encountered: