apkinspector-1.3.2/LICENSE

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

apkinspector-1.3.2/README.md

![apkInspector](https://i.imgur.com/hTzyIDG.png)
![PyPI - Version](https://img.shields.io/pypi/v/apkInspector)
[![CI](https://github.com/erev0s/apkInspector/actions/workflows/ci.yml/badge.svg)](https://github.com/erev0s/apkInspector/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/erev0s/apkInspector/graph/badge.svg?token=A3YXHGXUXF)](https://codecov.io/gh/erev0s/apkInspector)

# apkInspector

apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering the capability to extract content and decode the AndroidManifest.xml file. What sets apkInspector apart is its adherence to the zip specification during APK parsing, eliminating the need for reliance on external libraries.
This independence allows apkInspector to be highly adaptable, effectively emulating Android's installation process for APKs that cannot be parsed using standard libraries. The main goal is to enable users to conduct static analysis on APKs that employ evasion techniques, especially when conventional methods prove ineffective.

Please check [this blog post](https://erev0s.com/blog/unmasking-evasive-threats-with-apkinspector/) for more details.

## How to install

[apkInspector is available through PyPI](https://pypi.org/project/apkInspector/)

~~~~
pip install apkInspector
~~~~

or you can clone this repository and build and install locally:

~~~~
git clone https://github.com/erev0s/apkInspector.git
cd apkInspector
poetry build
pip install dist/apkInspector-Version_here.tar.gz
~~~~

## Documentation

Documentation generated from the docstrings with Sphinx is available at: https://erev0s.github.io/apkInspector/

## CLI

apkInspector offers a command line tool with the same name, with the following options:

~~~~
$ apkInspector -h
usage: apkInspector [-h] [-apk APK] [-f FILENAME] [-ll] [-lc] [-la] [-e] [-x] [-xa] [-m] [-sm SPECIFY_MANIFEST] [-a] [-v]

apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering the capability to extract content and decode the AndroidManifest.xml file.

options:
  -h, --help            show this help message and exit
  -apk APK              APK to inspect
  -f FILENAME, --filename FILENAME
                        Filename to provide info for
  -ll, --list-local     List all files by name from local headers
  -lc, --list-central   List all files by name from central directory header
  -la, --list-all       List all files from both central directory and local headers
  -e, --export          Export to JSON.
                        What you list from the other flags, will be exported
  -x, --extract         Attempt to extract the file specified by the -f flag
  -xa, --extract-all    Attempt to extract all files detected in the central directory header
  -m, --manifest        Extract and decode the AndroidManifest.xml
  -sm SPECIFY_MANIFEST, --specify-manifest SPECIFY_MANIFEST
                        Pass an encoded AndroidManifest.xml file to be decoded
  -a, --analyze         Check an APK for static analysis evasion techniques
  -v, --version         Retrieves version information
~~~~

## Library

The library component of apkInspector is designed with extensibility in mind, allowing other tools to integrate its functionality within their own applications and workflows.

All primary methods carry docstrings that describe their functionality, expected arguments, and return values, so developers can quickly understand apkInspector's core features and incorporate them into their projects.

### Features offered

- Find end of central directory record
- Parse central directory of APK and get details about each entry
- Get details of the local header for each entry
- Extract single or all files within an APK
- Decode AndroidManifest.xml file
- Identify Tampering Indicators:
  - End of Central Directory record defined multiple times
  - Unknown compression methods
  - Compressed entry with empty filename
  - Unexpected starting signature of AndroidManifest.xml
  - Tampered StringCount value
  - Strings surpassing maximum length
  - Invalid data between elements
  - Unexpected attribute size
  - Unexpected attribute names or values
  - Zero size header for namespace end nodes

The command-line interface (CLI) serves as a practical illustration of how the library's methods are used.
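
As a rough illustration of the first feature in the list above, the end of central directory (EOCD) record can be located by searching backwards for its signature and unpacking the fixed 22-byte structure defined by the zip specification. The sketch below uses only the Python standard library and is not apkInspector's own implementation; `find_eocd` and `build_sample_zip` are hypothetical names chosen for this example.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk number, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset, comment length
EOCD_STRUCT = struct.Struct("<4s4H2LH")  # 22 bytes, little-endian


def find_eocd(data: bytes) -> dict:
    """Locate the last EOCD record in `data` and unpack its fixed-size fields."""
    offset = data.rfind(EOCD_SIGNATURE)
    if offset == -1:
        raise ValueError("End of central directory record not found")
    if len(data) - offset < EOCD_STRUCT.size:
        raise ValueError("Truncated end of central directory record")
    (_, _, _, _, total_entries,
     cd_size, cd_offset, comment_length) = EOCD_STRUCT.unpack_from(data, offset)
    return {
        "eocd_offset": offset,
        "total_entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_length": comment_length,
        # More than one signature hit may hint at tampering (or at the same
        # bytes happening to occur inside file data).
        "signature_count": data.count(EOCD_SIGNATURE),
    }


def build_sample_zip() -> bytes:
    """Build a tiny in-memory zip archive standing in for an APK (hypothetical sample)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"placeholder")
        archive.writestr("classes.dex", b"placeholder")
    return buffer.getvalue()


sample = build_sample_zip()
eocd = find_eocd(sample)
print(eocd)
```

Counting signature occurrences is one simple way to approximate the "End of Central Directory record defined multiple times" indicator listed above, although a real check would need to account for the signature bytes appearing inside file data.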
## Reliability

Please take [a look at the results](https://github.com/erev0s/apkInspector/tree/main/tests/top_apps) from testing apkInspector against a set of top Play Store applications.

## Planned to-do

- Improve documentation (add examples)
- Improve code coverage

## Contributions

We welcome contributions from the open-source community to help improve and enhance apkInspector. Whether you're a developer, tester, or documentation enthusiast, your contributions are valuable.

## :rocket: apkInspector is being used by :rocket: :

- [androguard](https://github.com/androguard/androguard/)
- [medusa](https://github.com/Ch0pin/medusa)

## Presentation of the tool and the research behind it

- Defcon 32 | [PDF](docs/presentation/apkinspector-Defon32-presentation.pdf)

## Disclaimer

apkInspector is an evolving project and a work in progress. Users should expect occasional bugs, as well as updates and upgrades as the tool matures. Your feedback and contributions are highly appreciated as we work together to improve and refine its capabilities.

apkinspector-1.3.2/apkInspector/__init__.py

__version__ = "1.3.2"

apkinspector-1.3.2/apkInspector/axml.py

import io
import logging
import struct
import random

from .extract import extract_file_based_on_header_info
from .headers import ZipEntry
from .helpers import escape_xml_entities

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d -> %(funcName)s : %(message)s'
)


class ResChunkHeader:
    """
    Chunk header used throughout the axml.
This header is essential as it contains information about the header size but also the total size of the chunk the header belongs to. """ def __init__(self, header_type, header_size, total_size, data): self.type = header_type self.header_size = header_size self.total_size = total_size self.data = data @classmethod def parse(cls, file): """ Read the header type (2 bytes), header size (2 bytes), and entry size (4 bytes). :param file: the xml file e.g. with open('/path/AndroidManifest.xml', 'rb') as file :type file: bytesIO :return: Returns an instance of itself :rtype: ResChunkHeader """ header_data = file.read(8) if len(header_data) < 8: # End of file return None header_type, header_size, total_size = struct.unpack(' end_stringpool_offset: return "" # TODO:a check for non null-terminated strings should be here as well content = file.read(real_length * 2).decode('utf-16le') else: # Handle UTF-8 encoded strings u16len = struct.unpack('B', file.read(1))[0] file.read(1) u8len = u16len content = file.read(u8len).decode('utf-8', errors='replace') # TODO: fixup is needed here as well, like for the utf16 case return content @classmethod def read_strings(cls, file, string_offsets, strings_start, is_utf8): """ Gets the actual strings based on the offsets retrieved from read_string_offsets(). 
:param file: the xml file right after the string pool offsets have been read :type file: bytesIO :param string_offsets: see -> read_string_offsets() :type string_offsets: list :param strings_start: the offset at which the string data starts :type strings_start: int :param is_utf8: boolean to check if a utf8 string is expected :type is_utf8: bool :return: Returns a list of the string data :rtype: list """ strings = [] for offset in string_offsets: # Calculate the absolute offset within the string data +8 for the file header absolute_offset = strings_start + offset + 8 # TODO: update this to get the file header size # Move the file pointer to the start of the string file.seek(absolute_offset) # Read the length of the string (in bytes) content = cls.decode_stringpool_mixed_string(file, is_utf8, strings_start + string_offsets[-1]) strings.append(content) return strings @classmethod def read_string(cls, file, string_offset, strings_start, is_utf8, end_stringpool_offset): """ Read a string from the string pool when the offset of it is known already. 
:param file: the string pool data parsed as bytes :type file: io.bytesIO :param string_offset: the offset at which the string is located in the string pool :type string_offset: int :param strings_start: the offset at which the string data starts :type strings_start: int :param is_utf8: boolean to check if a utf8 string is expected :type is_utf8: bool :param end_stringpool_offset: the offset at which the string pool ends :type end_stringpool_offset: int :return: Returns the string or None :rtype: str or None """ absolute_offset = strings_start + string_offset - 28 if file.getbuffer().nbytes < absolute_offset: return None file.seek(absolute_offset) string = cls.decode_stringpool_mixed_string(file, is_utf8, end_stringpool_offset) return string @classmethod def get_string_from_pool(cls, position, string_pool_data, end_stringpool_offset, strings_start, is_utf8): """ Retrieve a single string from the String Pool, given its position. It first gets the correct offset of the string and then reads the string. :param position: The position of the string to be retrieved :type position: int :param string_pool_data: the string pool data parsed as bytes :type string_pool_data: io.BytesIO :param end_stringpool_offset: the offset at which the string pool ends :type end_stringpool_offset: int :param strings_start: the offset at which the string data starts :type strings_start: int :param is_utf8: boolean to check if a utf8 string is expected :type is_utf8: bool :return: Returns the string or None :rtype: str or None """ try: string_offset = cls.read_string_offset(string_pool_data, position) if string_offset is None: return None return cls.read_string(string_pool_data, string_offset, strings_start, is_utf8, end_stringpool_offset) except Exception as e: logging.exception(f"Exception while retrieving string from pool: {e}") return None @classmethod def parse_lite(cls, file): """ A 'lite' parser that gets the header and then reads the rest of the chunk as a blob of bytes. 
        :param file: the AndroidManifest.xml file
        :type file: bytesIO
        :return: returns the header and the chunk data
        :rtype: tuple(ResStringPoolHeader, bytes)
        """
        ResStringPool_header = ResStringPoolHeader.parse(file)
        string_pool_data = read_remaining(file, ResStringPool_header.header)
        while True:  # read any null bytes remaining
            cur_pos = file.tell()
            if file.read(2) == b'\x80\x01':
                file.seek(cur_pos)
                break
            file.seek(cur_pos)
            file.read(1)
        return ResStringPool_header, string_pool_data

    @classmethod
    def parse(cls, file):
        """
        Parse the string pool to acquire the strings used within the axml.

        :param file: the xml file right after the file header is read
        :type file: bytesIO
        :return: Returns an instance of itself
        :rtype: StringPoolType
        """
        string_pool_header = ResStringPoolHeader.parse(file)
        string_pool_start = file.tell()
        size_of_strings_offsets = string_pool_header.strings_start - 28
        # it should be divisible by 4, as 4 bytes are per offset, so we can get accurately the # of strings
        num_of_strings = size_of_strings_offsets // 4
        if not (size_of_strings_offsets / 4).is_integer():
            logging.warning("The number of strings in the string pool is not an integer.")
        string_offsets = cls.read_string_offsets(file, num_of_strings, string_pool_header.strings_start + 8)
        is_utf8 = bool(string_pool_header.flags & (1 << 8))
        string_list = cls.read_strings(file, string_offsets, string_pool_header.strings_start, is_utf8)
        while True:  # read any null bytes remaining
            cur_pos = file.tell()
            if file.read(2) == b'\x80\x01':
                file.seek(cur_pos)
                break
            file.seek(cur_pos)
            file.read(1)
            if file.getbuffer().nbytes < file.tell() + 8:
                raise ValueError("Resource Map header was not detected.")
        string_pool_end = file.tell()
        file.seek(string_pool_start)
        string_pool_data = file.read(string_pool_end - string_pool_start)
        return cls(
            string_pool_header,
            string_offsets,
            string_list,
            string_pool_data
        )


class XmlResourceMapType:
    """
    Resource map class, with the header and the resource IDs.
""" def __init__(self, header, resids, resids_data): self.header = header self.resids = resids self.data = resids_data @classmethod def parse_lite(cls, file): """ A 'lite' parser that gets the header and then reads the rest of the chunk as a blob of bytes. :param file: the AndroidManifest.xml file :type file: bytesIO :return: returns the header and the chunk data :rtype: tuple(ResChunkHeader, bytes) """ resource_map_header = ResChunkHeader.parse(file) resource_map_data = read_remaining(file, resource_map_header) return resource_map_header, resource_map_data @classmethod def parse(cls, file): """ Parse the resource map and get the resource IDs. :param file: the xml file right after the string pool is read :type file: bytesIO :return: Returns an instance of itself :rtype: XmlResourceMapType """ header = ResChunkHeader.parse(file) num_resids = (header.total_size - header.header_size) // 4 resids_data = file.read(num_resids * 4) chunks = [resids_data[i:i + 4] for i in range(0, len(resids_data), 4)] resids = [struct.unpack(' 8: header_data = file.read(header.header_size - 8) return cls(header, header_data) class XmlStartNamespace: """ The actual start of the xml, after this the elements of the xml will be found. 
""" def __init__(self, header: ResXMLHeader, ext, ext_data): self.header = header self.ext = ext # [prefix_index, uri_index] self.data = ext_data @classmethod def parse(cls, file, header_t: ResXMLHeader): """ Parse the starting element of a Namespace :param file: the axml already pointing at the right offset :type file: bytesIO :param header_t: the already read header of the chunk :type header_t: ResXMLHeader :return: an instance of itself :rtype: XmlStartNamespace """ num_exts = (header_t.header.total_size - header_t.header.header_size) // 4 ext_data = file.read(num_exts * 4) chunks = [ext_data[i:i + 4] for i in range(0, len(ext_data), 4)] ext = [struct.unpack('= min_size: return True file.read(1) @staticmethod def parse_next_header(file): """ Dispatcher method to parse the next available header. It takes into account to move on past the header if it contains extra info besides the standard ones. The dispatcher automatically picks the correct processing method for each chunk type. :param file: the axml that will be processed :type file: bytesIO :raises NotImplementedError: The chunk type identified is not supported :return: Dispatches to the appropriate processing method for each chunk type. """ chunk_header_total = ResXMLHeader.parse(file) chunk_header = chunk_header_total.header if chunk_header is None: # end of file return None chunk_type = hex(chunk_header.type) if chunk_type in chunk_type_handlers: return chunk_type_handlers[chunk_type](file, chunk_header_total) else: raise NotImplementedError(f"Unsupported chunk type: {chunk_type}") @staticmethod def process_elements(file, num_of_elements=None): """ It starts processing the remaining chunks **after** the resource map chunk. :param file: the axml that will be processed :type file: BytesIO :param num_of_elements: how many elements should it process :type num_of_elements: int :return: Returns all the elements found as their corresponding classes and whether dummy data were found in between. 
        :rtype: list
        """
        elements = []
        while True:
            cur_pos = file.tell()
            if file.getbuffer().nbytes < cur_pos + 8:
                # we reached the end of the file
                break
            ManifestStruct.check_reached_element(file)
            resXMLTree_node = ResXMLHeader.parse(file)
            cur_elem_data = read_remaining(file, resXMLTree_node.header)
            elem_data = resXMLTree_node.header.data + resXMLTree_node.data + cur_elem_data
            elements.append(ManifestStruct.parse_next_header(io.BytesIO(elem_data)))
            if num_of_elements is None:
                continue
            if len(elements) == num_of_elements:
                break
        return elements

    def get_manifest(self):
        """
        Method to return the AndroidManifest created from this instance

        :return: The AndroidManifest.xml as a string
        :rtype: str
        """
        manifest = create_manifest(self.elements, self.string_pool.string_list)
        return manifest

    @staticmethod
    def parse_lite(manifest, num_of_elements=None):
        """
        Parse the AndroidManifest with a limit on the elements to be parsed after the string pool. The goal of this
        method is to make it possible to partially parse the AndroidManifest and allow faster parsing when needed.
        Only the header is parsed from each chunk, and the rest are there as blobs of bytes.

        :param manifest: The manifest to be processed
        :type manifest: bytesIO
        :param num_of_elements: How many elements of the manifest to process. Usually 3 are enough to get basic info about it.
        :type num_of_elements: int
        :return: A tuple containing four elements: ResChunkHeader, [ResStringPoolHeader, string_pool_data], [ResChunkHeader, resource_map_data], elements
        :rtype: tuple (ResChunkHeader_init, [ResStringPoolHeader, bytes], [ResChunkHeader, bytes], list of bytes)
        """
        ResChunkHeader_init = ResChunkHeader.parse(manifest)
        ResStringPool_header, string_pool_data = StringPoolType.parse_lite(manifest)
        resource_map_header, resource_map_data = XmlResourceMapType.parse_lite(manifest)
        elements = ManifestStruct.process_elements(manifest, num_of_elements=num_of_elements)
        return ResChunkHeader_init, [ResStringPool_header, string_pool_data], [resource_map_header, resource_map_data], elements

    @classmethod
    def parse(cls, file):
        """
        A composition of the rest of the classes available in the apkInspector.axml module, to form the
        AndroidManifest structure.

        :param file: the axml that will be processed
        :type file: bytesIO
        :return: an instance of itself
        :rtype: ManifestStruct
        """
        header = ResChunkHeader.parse(file)
        string_pool = StringPoolType.parse(file)
        resource_map = XmlResourceMapType.parse(file)
        elements = cls.process_elements(file)
        return cls(header, string_pool, resource_map, elements)


chunk_type_handlers = {
    '0x100': XmlStartNamespace.parse,  # RES_XML_START_NAMESPACE_TYPE
    '0x101': XmlEndNamespace.parse,  # RES_XML_END_NAMESPACE_TYPE
    '0x102': XmlStartElement.parse,  # RES_XML_START_ELEMENT_TYPE
    '0x103': XmlEndElement.parse,  # RES_XML_END_ELEMENT_TYPE
    '0x104': XmlcDataElement.parse,  # RES_XML_CDATA_TYPE
}


def read_remaining(file: io.BytesIO, header: ResChunkHeader):
    """
    :param file: the current file that is being processed
    :type file: io.BytesIO
    :param header: the header of the current chunk of instance ResChunkHeader
    :type header: ResChunkHeader
    :return: Returns the remaining bytes of the chunk except the header
    :rtype: bytes
    """
    remaining_to_be_read = header.total_size - header.header_size
    return file.read(remaining_to_be_read)


def process_attributes(attributes,
                       string_list, ns_dict):
    """
    Helps in processing the representation of attributes found in each element of the axml. It should be noted that
    not all datatypes are taken into account, meaning that the values of certain attributes might not be represented
    properly.

    :param attributes: the attributes of an XmlStartElement object as returned by XmlAttributeElement.parse()
    :type attributes: list
    :param string_list: the string data list from the String Pool
    :type string_list: list
    :param ns_dict: a namespace dictionary based on the XmlStartNamespace elements found
    :type ns_dict: dict
    :return: returns a string of all the attributes with their values
    :rtype: str
    """
    attribute_list = []
    for attr in attributes:
        name = string_list[attr.name_index]
        if not name:
            # It happens that the attr.name_index points to an empty string in StringPool and you have to use
            # the public.xml. It falls outside the scope of the tool, so I am not going to solve it for now.
            name = f'Unknown_Attribute_Name_{random.randint(1000, 9999)}'
        if attr.typed_value_datatype == 1:  # reference type
            value = f"@{attr.typed_value_data}"
        elif attr.typed_value_datatype == 3:  # string type
            try:
                value = escape_xml_entities(string_list[attr.typed_value_data])
            except Exception:
                # fall back to the raw index if the string cannot be resolved or escaped
                value = attr.typed_value_data
        elif attr.typed_value_datatype == 17:  # int-hex type
            value = "0x{:08X}".format(attr.typed_value_data)
        elif attr.typed_value_datatype == 18:  # boolean type
            value = "true" if bool(attr.typed_value_data) else "false"
        elif attr.typed_value_datatype == 0:  # null, used for CData
            return name
        else:
            # TODO: Not accurate enough, values should be represented based on which datatype. Good enough for now
            value = str(attr.typed_value_data)
        if attr.full_namespace_index < len(string_list):
            namespace = string_list[attr.full_namespace_index]
            if not namespace:
                # Same as with the empty name, points to an empty string in StringPool.
namespace = 'android' try: attribute_list.append(f'{ns_dict[namespace]}:{name}="{value}"') except: attribute_list.append(f'{namespace.split("/")[-1]}:{name}="{value}"') else: attribute_list.append(f'{name}="{value}"') return ' '.join(attribute_list) def create_manifest(elements, string_list): """ Method to create the readable XML AndroidManifest.xml file based on the elements discovered from the processed APK :param elements: The parsed elements as returned by process_elements()[0] :type elements: list :param string_list: The string pool data :type string_list: list :return: The AndroidManifest.xml as a string :rtype: str """ android_manifest_xml = [] namespaces = {} ns_dict = {} ns_declared = [] for element in elements: if isinstance(element, XmlStartNamespace): namespaces[ string_list[ element.ext[0]]] = f'xmlns:{string_list[element.ext[0]]}="{string_list[element.ext[1]]}"' ns_dict[string_list[element.ext[1]]] = string_list[element.ext[0]] elif isinstance(element, XmlStartElement): attributes = process_attributes(element.attributes, string_list, ns_dict) attr_ns_list = set(ns.split(':')[0] for ns in attributes.split(' ') if ':' in ns) tmp_ns = [] # TODO: Somewhat hacky way to add namespaces/ Maybe improve in future depending on needs for vl in attr_ns_list: if vl not in ns_declared: if vl in namespaces: tmp_ns.append(namespaces[vl]) elif vl == 'android': tmp_ns.append(f'xmlns:android="http://schemas.android.com/apk/res/android"') ns_declared.append(vl) if tmp_ns: tag_line = f"<{string_list[element.attrext[1]]} {' '.join(tmp_ns)} {attributes}>\n" if attributes else f"<{string_list[element.attrext[1]]}>\n" else: tag_line = f"<{string_list[element.attrext[1]]} {attributes}>\n" if attributes else f"<{string_list[element.attrext[1]]}>\n" android_manifest_xml.append(tag_line) elif isinstance(element, XmlcDataElement): if android_manifest_xml[-1][-1] == '\n': android_manifest_xml[-1] = android_manifest_xml[-1].replace('\n', string_list[element.data_index]) elif 
isinstance(element, XmlEndElement):
            name = string_list[element.attrext[1]]
            # Emit the closing tag; the final </manifest> gets no trailing newline
            closing_tag = f"</{name}>" if name == "manifest" else f"</{name}>\n"
            android_manifest_xml.append(closing_tag)
    return ''.join(android_manifest_xml)


def get_manifest(raw_manifest):
    """
    Helper method to directly return the AndroidManifest file as created by create_manifest()

    :param raw_manifest: expects the encoded AndroidManifest.xml file as a file-like object
    :type raw_manifest: bytesIO
    :return: returns the decoded AndroidManifest file
    :rtype: str
    """
    manifest_object = ManifestStruct.parse(raw_manifest)
    return manifest_object.get_manifest()


def parse_apk_for_manifest(inc_apk, raw: bool = False, lite: bool = False, num_of_elements: int = 3):
    """
    Helper method to retrieve the AndroidManifest directly from an APK, either by providing the APK itself or the path.

    :param inc_apk: The path of the APK file or the APK itself
    :type inc_apk: str or file-like object
    :param raw: If True, inc_apk is treated as an already loaded file-like object; if False, it is treated as a
        path and the file is read from disk
    :type raw: bool
    :param lite: Boolean parameter to define whether the lite parsing would occur or not
    :type lite: bool
    :param num_of_elements: Number of elements to parse from the manifest when lite parsing is used
    :type num_of_elements: int
    :return: Returns the AndroidManifest.xml as a string, or a dictionary of attributes when lite is True
    :rtype: str or dict
    """
    if raw:
        apk_file = inc_apk
    else:
        with open(inc_apk, 'rb') as apk:
            apk_file = io.BytesIO(apk.read())
    entry_manifest = ZipEntry.parse_single(apk_file, "AndroidManifest.xml")
    manifest_local = entry_manifest.local_headers["AndroidManifest.xml"].to_dict()
    manifest_bytes = extract_file_based_on_header_info(apk_file, manifest_local,
                                                       entry_manifest.central_directory.entries[
                                                           "AndroidManifest.xml"].to_dict())[0]
    if lite:
        manifest = get_manifest_lite(io.BytesIO(manifest_bytes), num_of_elements=num_of_elements)
    else:
        manifest = get_manifest(io.BytesIO(manifest_bytes))
    return manifest


def get_manifest_lite(manifest: io.BytesIO, num_of_elements: int):
    """
    A method to provide 'lite' parsing of the AndroidManifest in order to
retrieve a few details as fast as possible. Based on the integer 'num_of_elements' being passed as a parameter, it will attempt to fetch this many chunks right after the 'resource map' chunk and will get the attributes values of these elements if they are of instance XmlStartElement :param manifest: The manifest to be processed :type manifest: io.BytesIO :param num_of_elements: :type num_of_elements: int :return: Returns a dictionary of the attributes discovered :rtype: dict """ (ResChunkHeader_init, [string_pool_ResChunkHeader, string_pool_data], [resource_map_header, resource_map_data], elements) = ManifestStruct.parse_lite(manifest, num_of_elements=num_of_elements) end_stringpool_offset = string_pool_ResChunkHeader.header.total_size + 8 strings_start = string_pool_ResChunkHeader.strings_start is_utf8 = bool(string_pool_ResChunkHeader.flags & (1 << 8)) attributes_dict = {} for element in elements: if isinstance(element, XmlStartElement): for attr in element.attributes: if isinstance(attr, XmlAttributeElement): attr_name = StringPoolType.get_string_from_pool(attr.name_index, io.BytesIO(string_pool_data), end_stringpool_offset, strings_start, is_utf8) attribute_value = get_attribute_value(attr_name, attr, end_stringpool_offset, strings_start, is_utf8, io.BytesIO(string_pool_data)) attributes_dict[attr_name] = attribute_value return attributes_dict def get_attribute_value(attr_name, attribute, end_stringpool_offset, strings_start, is_utf8, string_pool_data): """ Gets the value for a single attribute :param attr_name: The attribute name as it has been retrieved by the string pool :type attr_name: str :param attribute: the parsed attribute itself :type attribute: XmlAttributeElement :param end_stringpool_offset: The end of string pool offset :type end_stringpool_offset: int :param strings_start: the strings start offset for the string pool :type strings_start: int :param is_utf8: boolean to check if a utf8 string is expected :type is_utf8: bool :param 
string_pool_data: The string pool data as io.BytesIO :type string_pool_data: io.BytesIO :return: returns the attribute value :rtype: str """ try: if attribute.typed_value_datatype == 1: # reference type return f"@{attribute.typed_value_data}" elif attribute.typed_value_datatype == 3: # string type str_pool_loc = StringPoolType.get_string_from_pool(attribute.typed_value_data, string_pool_data, end_stringpool_offset, strings_start, is_utf8) return escape_xml_entities(str_pool_loc) if str_pool_loc else str(attribute.typed_value_data) elif attribute.typed_value_datatype == 4: # float type str_pool_loc = StringPoolType.get_string_from_pool(attribute.typed_value_data, string_pool_data, end_stringpool_offset, strings_start, is_utf8) if not str_pool_loc: str_pool_loc = StringPoolType.get_string_from_pool(attribute.raw_value_index, string_pool_data, end_stringpool_offset, strings_start, is_utf8) return str_pool_loc if str_pool_loc else str(attribute.typed_value_data) elif attribute.typed_value_datatype == 17: # int-hex type return f"0x{attribute.typed_value_data:08X}" elif attribute.typed_value_datatype == 18: # boolean type return "true" if attribute.typed_value_data else "false" else: return str(attribute.typed_value_data) except Exception as e: logging.exception(f"Exception processing attribute {attr_name}: {e}") return str(attribute.typed_value_data) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9924638 apkinspector-1.3.2/apkInspector/extract.py0000644000000000000000000001224614673222122015641 0ustar00import zlib import os def extract_file_based_on_header_info(apk_file, local_header_info, central_directory_info): """ Extracts a single file from the apk_file based on the information provided from the offset and the header_info. It takes into account that the compression method provided might not be STORED or DEFLATED! The returned 'indicator', shows what compression method was used. 
    Besides the standard STORED/DEFLATED it may return 'DEFLATED_TAMPERED', which means that the compression method
    found was not DEFLATED(8) but it should have been, and 'STORED_TAMPERED' which means that the compression method
    found was not STORED(0) but should have been.

    :param apk_file: The APK file e.g. with open('test.apk', 'rb') as apk_file
    :type apk_file: bytesIO
    :param local_header_info: The local header dictionary info for that specific filename
    :type local_header_info: dict
    :param central_directory_info: The central directory entry for that specific filename
    :type central_directory_info: dict
    :return: Returns the actual extracted data for that file along with an indication of whether a static analysis
        evasion technique was used or not.
    :rtype: tuple(bytes, str)
    """
    filename_length = local_header_info["file_name_length"]
    # Fall back to the central directory sizes when the local header reports a zero size
    if local_header_info["compressed_size"] == 0 or local_header_info["uncompressed_size"] == 0:
        compressed_size = central_directory_info["compressed_size"]
        uncompressed_size = central_directory_info["uncompressed_size"]
    else:
        compressed_size = local_header_info["compressed_size"]
        uncompressed_size = local_header_info["uncompressed_size"]
    extra_field_length = local_header_info["extra_field_length"]
    compression_method = local_header_info["compression_method"]
    # Skip the offset + local header to reach the compressed data
    local_header_size = 30  # Size of the fixed part of the local header in bytes
    offset = central_directory_info["relative_offset_of_local_file_header"]
    apk_file.seek(offset + local_header_size + filename_length + extra_field_length)
    if compression_method == 0:  # Stored (no compression)
        uncompressed_data = apk_file.read(uncompressed_size)
        extracted_data = uncompressed_data
        indicator = 'STORED'
    elif compression_method == 8:
        compressed_data = apk_file.read(compressed_size)
        # wbits=-15: raw deflate stream with no zlib header or trailer
        extracted_data = zlib.decompress(compressed_data, -15)
        indicator = 'DEFLATED'
    elif compressed_size == uncompressed_size:
        compressed_data = apk_file.read(uncompressed_size)
        extracted_data = compressed_data
        indicator = 'STORED_TAMPERED'
    else:
        # Unknown compression method: try raw deflate first, then fall back to treating the data as stored
        cur_loc = apk_file.tell()
        try:
            compressed_data = apk_file.read(compressed_size)
            extracted_data = zlib.decompress(compressed_data, -15)
            indicator = 'DEFLATED_TAMPERED'
        except Exception:
            apk_file.seek(cur_loc)
            compressed_data = apk_file.read(uncompressed_size)
            extracted_data = compressed_data
            indicator = 'STORED_TAMPERED'
    return extracted_data, indicator


def extract_all_files_from_central_directory(apk_file, central_directory_entries, local_header_entries, output_dir):
    """
    Extracts all files from an APK based on the entries detected in the central_directory_entries.

    :param apk_file: The APK file e.g. with open('test.apk', 'rb') as apk_file
    :type apk_file: bytesIO
    :param central_directory_entries: The dictionary with all the entries for the central directory
    :type central_directory_entries: dict
    :param local_header_entries: The dictionary with all the local header entries
    :type local_header_entries: dict
    :param output_dir: The output directory where to save the files.
    :type output_dir: str
    :return: Returns 0 if no errors, 1 if an exception and 2 if the output directory already exists
    :rtype: int
    """
    try:
        # Check if the output directory already exists
        if os.path.exists(output_dir):
            print("Extraction aborted.
Output directory already exists.") return 2 # Create the output directory or overwrite if it already exists os.makedirs(output_dir, exist_ok=True) # Iterate over central directory entries for filename, cd_header_info in central_directory_entries.items(): if not filename: # to account for the cases where an empty filename entry is added continue # Extract the file using the local header information extracted_data = \ extract_file_based_on_header_info(apk_file, local_header_entries[filename], cd_header_info)[0] # Construct the output file path output_path = os.path.join(output_dir, filename) # Create directories if necessary os.makedirs(os.path.dirname(output_path), exist_ok=True) # Write the extracted data to the output file with open(output_path, 'wb') as output_file: output_file.write(extracted_data) return 0 except Exception as e: print(f"Error extracting files: {e}") return 1 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9934638 apkinspector-1.3.2/apkInspector/headers.py0000644000000000000000000006542514673222122015611 0ustar00import io import os import struct from typing import Dict from .extract import extract_file_based_on_header_info, extract_all_files_from_central_directory from .helpers import pretty_print_header, save_to_json, save_data_to_file class EndOfCentralDirectoryRecord: """ A class to provide details about the end of central directory record. 
""" def __init__(self, signature, number_of_this_disk, disk_where_central_directory_starts, number_of_central_directory_records_on_this_disk, total_number_of_central_directory_records, size_of_central_directory, offset_of_start_of_central_directory, comment_length, comment): self.signature = signature self.number_of_this_disk = number_of_this_disk self.disk_where_central_directory_starts = disk_where_central_directory_starts self.number_of_central_directory_records_on_this_disk = number_of_central_directory_records_on_this_disk self.total_number_of_central_directory_records = total_number_of_central_directory_records self.size_of_central_directory = size_of_central_directory self.offset_of_start_of_central_directory = offset_of_start_of_central_directory self.comment_length = comment_length self.comment = comment @classmethod def parse(cls, apk_file): """ Method to locate the "end of central directory record signature" as the first step of the correct process of reading a ZIP archive. Should be noted that certain APKs do not follow the zip specification and declare multiple "end of central directory records". For this reason the search for the corresponding signature of the eocd starts from the end of the apk. :param apk_file: The already read/loaded data of the APK file e.g. with open('test.apk', 'rb') as apk_file :type apk_file: bytesIO :return: Returns the end of central directory record with all the information available if the corresponding signature is found. If not, then it returns None. 
:rtype: EndOfCentralDirectoryRecord or None """ chunk_size = 1024 offset = 0 signature_offset = -1 file_size = apk_file.seek(0, 2) while offset < file_size: position = max(0, file_size - offset - chunk_size) apk_file.seek(position) chunk = apk_file.read(chunk_size) if not chunk: break signature_offset = chunk.rfind(b'\x50\x4b\x05\x06') # EOCD signature if signature_offset != -1: eo_central_directory_offset = position + signature_offset break # Found EOCD signature # Adjust offset to overlap by 4 bytes offset += chunk_size - 4 if signature_offset == -1: raise ValueError("End of central directory record (EOCD) signature not found") apk_file.seek(eo_central_directory_offset) signature = apk_file.read(4) number_of_this_disk = struct.unpack(' Dict[str, CentralDirectoryEntry]: """ List of information about the entries in the central directory. :return: returns a dictionary where the keys are the filenames and the values are each an instance of the CentralDirectoryEntry :rtype: dict """ return self.central_directory.entries def namelist(self): """ List of the filenames included in the central directory. :return: returns the list of the filenames :rtype: list """ return [vl for vl in self.central_directory.to_dict()] def extract_all(self, extract_path, apk_name): """ Extracts all the contents of the APK. :param extract_path: where to extract it :type extract_path: str :param apk_name: the name of the apk :type apk_name: str """ output_path = os.path.join(extract_path, apk_name) if not extract_all_files_from_central_directory(self.zip, self.to_dict()["central_directory"], self.to_dict()["local_headers"], output_path): print(f"Extraction successful for: {apk_name}") def print_headers_of_filename(cd_h_of_file, local_header_of_file): """ Prints out the details for both the central directory header and the local file header. Useful for the CLI. 
:param cd_h_of_file: central directory header of a filename as it may be retrieved from headers_of_filename :type cd_h_of_file: dict :param local_header_of_file: local header dictionary of a filename as it may be retrieved from headers_of_filename :type local_header_of_file: dict """ if not cd_h_of_file or not local_header_of_file: print("Are you sure the filename exists?") return pretty_print_header("CENTRAL DIRECTORY") for k in cd_h_of_file: if k == 'Relative offset of local file header' or k == 'Offset in the central directory header': print(f"{k:40} : {hex(int(cd_h_of_file[k]))} | {cd_h_of_file[k]}") else: print(f"{k:40} : {cd_h_of_file[k]}") pretty_print_header("LOCAL HEADER") for k in local_header_of_file: print(f"{k:40} : {local_header_of_file[k]}") def show_and_save_info_of_headers(entries, apk_name, header_type: str, export: bool, show: bool): """ Print information for each entry for the central directory header and allow to possibly export to JSON. :param entries: The dictionary with all the entries for the central directory :type entries: dict :param apk_name: String with the name of the APK, so it can be used for the export. 
:type apk_name: str :param header_type: What type of header that is, either central_directory or local, to be used for the export :type header_type: str :param export: Boolean for exporting or not to JSON :type export: bool :param show: Boolean for printing or not the entries :type show: bool """ if show: for entry in entries: pretty_print_header(entry) print(entries[entry]) if export: save_to_json(f"{apk_name}_{header_type}_header.json", entries) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9934638 apkinspector-1.3.2/apkInspector/helpers.py0000644000000000000000000000326514673222122015632 0ustar00import json def pretty_print_header(header_text, width=50, char='-'): """ Formatting output used for the CLI :param header_text: The text to be displayed :type header_text: str :param width: total width of the display :type width: int :param char: which char to be used as a filler :type char: str """ padding = max(0, width - len(header_text)) // 2 formatted_header = f"\n{char * padding} {header_text} {char * padding}" print(formatted_header) def save_data_to_file(filename, data): """ Write data to file :param data: the actual data :type data: bytes :param filename: file to be saved in :type filename: str """ try: with open(filename, 'wb') as output_file: output_file.write(data) print(f"Data saved to {filename}") except Exception as e: print(f"Error while saving data to {filename}: {e}") def save_to_json(filename, dictionary): """ Simple method to save a dictionary as JSON into the filename. :param filename: the name of the file to be saved as :type filename: str :param dictionary: the dictionary to be saved as JSON :type dictionary: dict """ with open(filename, "w") as h_file: json.dump(dictionary, h_file, indent=4) def escape_xml_entities(data): """ Escaping characters that cant be included within an XML file. 
    :param data: The string to escape
    :type data: str
    :return: The escaped output
    :rtype: str
    """
    replacements = {
        '<': '&lt;',
        '>': '&gt;',
        '&': '&amp;',
        '"': '&quot;',
        "'": '&apos;'
    }
    return ''.join(replacements.get(c, c) for c in data)
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9934638 apkinspector-1.3.2/apkInspector/indicators.py0000644000000000000000000002214114673222122016321 0ustar00
import io
import struct

from .extract import extract_file_based_on_header_info
from .headers import ZipEntry
from .axml import ResChunkHeader, StringPoolType, XmlResourceMapType, XmlStartElement, ManifestStruct


def count_eocd(apk_file):
    """
    Counter for the number of times the end of central directory record was found.

    :param apk_file: The APK file e.g. with open('test.apk', 'rb') as apk_file
    :type apk_file: bytesIO
    :return: The count of how many times the end of central directory record was found
    :rtype: int
    """
    apk_file.seek(0)
    content = apk_file.read()
    return content.count(b'\x50\x4b\x05\x06')


def zip_tampering_indicators(apk_file, strict: bool):
    """
    Method to check for indicators of tampering in the ZIP structure of the APK. Such tampering in the ZIP
    structure serves as a method of evasion against static analysis tools.

    :param apk_file: The APK file e.g. with open('test.apk', 'rb') as apk_file
    :type apk_file: bytesIO
    :param strict: Whether to be checking strictly or not. Utilizing the application set that was also used for the
        tests here https://github.com/erev0s/apkInspector/tree/main/tests/top_apps, we tested what kind of indicators
        would be returned. It turns out that in some cases the local header and the central directory entry for the
        same file do not have the same values for some keys. So the strict checking was added, to be able to exclude
        these rare but possible occasions.
    :type strict: bool
    :return: Returns a dictionary with the detected indicators.
:rtype: dict """ zip_tampering_indicators_dict = {} if strict: # This is added as strict as a few legitimate APKs do have it for some reason count = count_eocd(apk_file) if count > 1: zip_tampering_indicators_dict['eocd_count'] = count zipentry_dict = ZipEntry.parse(apk_file).to_dict() empty_keys = any(k == "" or k is None for k in zipentry_dict["central_directory"].keys()) if empty_keys: zip_tampering_indicators_dict['empty_keys'] = empty_keys unique_keys = list(zipentry_dict["central_directory"].keys() ^ zipentry_dict["local_headers"].keys()) common_keys = list(set(zipentry_dict["central_directory"].keys()) & set(zipentry_dict["local_headers"].keys())) if unique_keys: zip_tampering_indicators_dict['unique_entries'] = unique_keys for key in common_keys: cd_entry = zipentry_dict["central_directory"][key] lh_entry = zipentry_dict["local_headers"][key] temp = {} if cd_entry['compression_method'] not in [0, 8]: temp['central compression method'] = cd_entry['compression_method'] if lh_entry['compression_method'] not in [0, 8]: temp['local compression method'] = lh_entry['compression_method'] if cd_entry['compression_method'] not in [0, 8] or lh_entry['compression_method'] not in [0, 8]: indicator = \ extract_file_based_on_header_info(apk_file, lh_entry, cd_entry)[ 1] temp['actual compression method'] = indicator df_keys = local_and_central_header_discrepancies(cd_entry, lh_entry, strict) if df_keys: temp['differing headers'] = df_keys if not temp: continue zip_tampering_indicators_dict[key] = temp return zip_tampering_indicators_dict def local_and_central_header_discrepancies(dict1, dict2, strict: bool): """ Checking discrepancies between local header values and central directory values :param dict1: the central directory dictionary :type dict1: dict :param dict2: the local headers dictionary :type dict2: dict :param strict: Boolean for strict checking the headers or not :type strict: bool :return: Returns a list with the common keys between the dictionaries that have 
different values. :rtype: list """ common_keys = set(dict1.keys()) & set(dict2.keys()) differences = {key: (dict1[key], dict2[key]) for key in common_keys if dict1[key] != dict2[key]} # Display the keys with differing values keys = [] for key, values in dict(sorted(differences.items())).items(): # strict checking or not: excluding these as they differ often if not strict and key in ['extra_field', 'extra_field_length', 'crc32_of_uncompressed_data', 'compressed_size', 'uncompressed_size']: continue keys.append(key) return keys def process_elements_indicators(file): """ It starts processing the remaining chunks **after** the resource map chunk. It also returns whether dummy data have been found between the elements, so it can be reported that the apk employed this evasion technique. The difference between the process_elements method found in the axml module is that in this case it does not take into account the total size of the element as stated in the header, but tries to parse the contents regardless. This means that it will detect any dummy data injected after the actual data. :param file: the axml that will be processed :type file: BytesIO :return: Returns all the elements found as their corresponding classes and whether dummy data were found in between. 
:rtype: set(list, set(bool, bool)) """ elements = [] dummy_data_between_elements = False wrong_end_namespace_size = False possible_types = {256, 257, 258, 259, 260} min_size = 8 while True: cur_pos = file.tell() if file.getbuffer().nbytes < cur_pos + min_size: # we reached the end of the file break _type, _header_size, _size = struct.unpack('= min_size): if _size < min_size: if _type == 257: wrong_end_namespace_size = True if file.getbuffer().nbytes <= cur_pos + 24: break file.read(1) continue chunk_type = ManifestStruct.parse_next_header(file) elements.append(chunk_type) continue file.read(1) dummy_data_between_elements = True return elements, (dummy_data_between_elements, wrong_end_namespace_size) def manifest_tampering_indicators(manifest): """ Method to check for indicators of tampering in the AndroidManifest.xml :param manifest: The AndroidManifest file to check :type manifest: bytesIO :return: Returns a dictionary with the indicators of tampering for the AndroidManifest :rtype: dict """ chunkHeader = ResChunkHeader.parse(manifest) manifest_tampering_indicators_dict = {} if chunkHeader.type != 3: manifest_tampering_indicators_dict['unexpected_starting_signature_of_androidmanifest'] = hex(chunkHeader.type) string_pool = StringPoolType.parse(manifest) if len(string_pool.string_offsets) != string_pool.str_header.string_count: manifest_tampering_indicators_dict['string_pool'] = {'string_count': string_pool.str_header.string_count, 'real_string_count': len(string_pool.string_offsets)} XmlResourceMapType.parse(manifest) elements, dummy = process_elements_indicators(manifest) for element in elements: if isinstance(element, XmlStartElement): for attr in element.attributes: if element.attrext[3] != 20: manifest_tampering_indicators_dict['unexpected_attribute_size'] = True if 0 <= attr.name_index < len(string_pool.string_list): if string_pool.string_list[attr.name_index] == "": manifest_tampering_indicators_dict['unexpected_attribute_names'] = True if dummy[0]: 
manifest_tampering_indicators_dict['invalid_data_between_elements'] = True if dummy[1]: manifest_tampering_indicators_dict['zero_size_header_for_namespace_end_nodes'] = True return manifest_tampering_indicators_dict def apk_tampering_check(apk_file, strict: bool): """ Method to combine the check for tampering in the zip structure and in the AndroidManifest and return the results. :param apk_file: The apk file to check :type apk_file: bytesIO :param strict: A boolean to strictly check all fields or not. Suggested value: False :type strict: bool :return: Returns a combined dictionary with the results from the zip_tampering_indicators and the manifest_tampering_indicators :rtype: dict """ zip_tampering_indicators_dict = zip_tampering_indicators(apk_file, strict) zipentry = ZipEntry.parse(apk_file) cd_h_of_file = zipentry.get_central_directory_entry_dict("AndroidManifest.xml") local_header_of_file = zipentry.get_local_header_dict("AndroidManifest.xml") manifest = io.BytesIO(extract_file_based_on_header_info(apk_file, local_header_of_file, cd_h_of_file)[0]) manifest_tampering_indicators_dict = manifest_tampering_indicators(manifest) return {'zip tampering': zip_tampering_indicators_dict, 'manifest tampering': manifest_tampering_indicators_dict} ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9934638 apkinspector-1.3.2/apkInspectorCLI/__init__.py0000644000000000000000000000000114673222122016240 0ustar00 ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817361.9934638 apkinspector-1.3.2/apkInspectorCLI/main.py0000644000000000000000000002262114673222122015441 0ustar00import argparse import io import os from apkInspector import __version__ as version from apkInspector.extract import extract_file_based_on_header_info, extract_all_files_from_central_directory from apkInspector.headers import print_headers_of_filename, ZipEntry, show_and_save_info_of_headers from apkInspector.helpers import 
save_data_to_file, pretty_print_header from apkInspector.indicators import apk_tampering_check from apkInspector.axml import get_manifest def print_nested_dict(dictionary, parent_key=''): for key, value in dictionary.items(): if isinstance(value, dict): print_nested_dict(value, parent_key=f"{parent_key}") else: full_key = f"{parent_key}->{key}" if parent_key else key print(f"{full_key}: {value}") def get_apk_files(path): # If the path is a single file, return it as a list if it's an APK if os.path.isfile(path) and path.endswith('.apk'): return [path] # If the path is a directory, return a list of all APK files in it elif os.path.isdir(path): return [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.apk') and os.path.isfile(os.path.join(path, f))] # If the path is invalid or not an APK file, return an empty list return [] def main(): parser = argparse.ArgumentParser(description='apkInspector is a tool designed to provide detailed insights into ' 'the zip structure of APK files, offering the ' 'capability to extract content and decode the AndroidManifest.xml ' 'file.') parser.add_argument('-apk', help='APK to inspect') parser.add_argument('-f', '--filename', help='Filename to provide info for') parser.add_argument('-ll', '--list-local', action='store_true', help='List all files by name from local headers') parser.add_argument('-lc', '--list-central', action='store_true', help='List all files by name from central ' 'directory header') parser.add_argument('-la', '--list-all', action='store_true', help='List all files from both central directory and local headers') parser.add_argument('-e', '--export', action='store_true', help='Export to JSON. 
What you list from the other flags, will be exported') parser.add_argument('-x', '--extract', action='store_true', help='Attempt to extract the file specified by the -f ' 'flag') parser.add_argument('-xa', '--extract-all', action='store_true', help='Attempt to extract all files detected in ' 'the central directory header') parser.add_argument('-m', '--manifest', action='store_true', help='Extract and decode the AndroidManifest.xml') parser.add_argument('-sm', '--specify-manifest', help='Pass an encoded AndroidManifest.xml file to be decoded') parser.add_argument('-a', '--analyze', action='store_true', help='Check an APK for static analysis evasion techniques') parser.add_argument('-v', '--version', action='store_true', help='Retrieves version information') args = parser.parse_args() if args.version: mm = """ # _ _____ _ # | | |_ _| | | # __ _ _ __ | | __ | | _ __ ___ _ __ ___ ___ | |_ ___ _ __ # / _` || '_ \ | |/ / | | | '_ \ / __|| '_ \ / _ \ / __|| __| / _ \ | '__| # | (_| || |_) || < _| |_ | | | |\__ \| |_) || __/| (__ | |_ | (_) || | # \__,_|| .__/ |_|\_\ \___/ |_| |_||___/| .__/ \___| \___| \__| \___/ |_| # | | | | # |_| |_| """ print(mm) print(f"apkInspector Library Version: {version}") print(f"Copyright 2024 erev0s \n") return print(f"apkInspector Version: {version}") print(f"Copyright 2024 erev0s \n") if args.apk is None and args.specify_manifest is None: parser.error('APK file or AndroidManifest.xml file is required') if not (args.specify_manifest is None) != (args.apk is None): parser.error( 'Please specify an apk file with flag "-apk" or an AndroidManifest.xml file with flag "-sm", but not both.') if args.apk: apk_files = get_apk_files(args.apk) if not apk_files: print(f"No APK files found at: {args.apk}") return for apk in apk_files: pretty_print_header(f"Results for {apk}:") apk_name = os.path.splitext(apk)[0] with open(apk, 'rb') as apk_file: zipentry = ZipEntry.parse(apk_file) if args.filename and args.extract: cd_h_of_file = 
zipentry.get_central_directory_entry_dict(args.filename)
                    if cd_h_of_file is None:
                        print(f"It appears that file: {args.filename} is not among the entries of the central directory!")
                        return
                    local_header_of_file = zipentry.get_local_header_dict(args.filename)
                    print_headers_of_filename(cd_h_of_file, local_header_of_file)
                    extracted_data = extract_file_based_on_header_info(apk_file, local_header_of_file, cd_h_of_file)[0]
                    save_data_to_file(f"EXTRACTED_{args.filename}", extracted_data)
                elif args.filename:
                    cd_h_of_file = zipentry.get_central_directory_entry_dict(args.filename)
                    if cd_h_of_file is None:
                        print(f"It appears that file: {args.filename} is not among the entries of the central directory!")
                        return
                    local_header_of_file = zipentry.get_local_header_dict(args.filename)
                    print_headers_of_filename(cd_h_of_file, local_header_of_file)
                elif args.extract_all:
                    print(f"Number of entries: {len(zipentry.central_directory.entries)}")
                    # A return value of 0 means the extraction finished without errors
                    if not extract_all_files_from_central_directory(apk_file, zipentry.to_dict()["central_directory"],
                                                                    zipentry.to_dict()["local_headers"], apk_name):
                        print(f"Extraction successful for: {apk_name}")
                elif args.list_local:
                    show_and_save_info_of_headers(zipentry.to_dict()["local_headers"], apk_name, "local", args.export,
                                                  True)
                    print(f"Local headers list complete. Export: {args.export}")
                elif args.list_central:
                    show_and_save_info_of_headers(zipentry.to_dict()["central_directory"], apk_name, "central",
                                                  args.export, True)
                    print(f"Central header list complete. Export: {args.export}")
                elif args.list_all:
                    # Use distinct header types so the central export is not overwritten by the local one
                    show_and_save_info_of_headers(zipentry.to_dict()["central_directory"], apk_name, "central",
                                                  args.export, True)
                    show_and_save_info_of_headers(zipentry.to_dict()["local_headers"], apk_name, "local", args.export,
                                                  True)
                    print(f"Central and local headers list complete.
 Export: {args.export}")
                elif args.manifest:
                    cd_h_of_file = zipentry.get_central_directory_entry_dict("AndroidManifest.xml")
                    local_header_of_file = zipentry.get_local_header_dict("AndroidManifest.xml")
                    extracted_data = io.BytesIO(
                        extract_file_based_on_header_info(apk_file, local_header_of_file, cd_h_of_file)[0])
                    manifest = get_manifest(extracted_data)
                    with open("decoded_AndroidManifest.xml", "w", encoding="utf-8") as xml_file:
                        xml_file.write(manifest)
                    print("AndroidManifest was saved as: decoded_AndroidManifest.xml")
                elif args.analyze:
                    tamperings = apk_tampering_check(zipentry.zip, False)
                    if tamperings['zip tampering']:
                        print("\nThe zip structure was tampered with using the following patterns:\n")
                        print_nested_dict(tamperings['zip tampering'])
                    else:
                        print("No files were detected where tampering in the zip structure was present.")
                    if tamperings['manifest tampering']:
                        print("\n\nThe AndroidManifest.xml file was tampered with using the following patterns:\n")
                        print_nested_dict(tamperings['manifest tampering'])
                    else:
                        print("The AndroidManifest.xml file does not seem to be structurally tampered with.")
                else:
                    parser.print_help()
    elif args.specify_manifest:
        with open(args.specify_manifest, 'rb') as enc_manifest:
            manifest = get_manifest(io.BytesIO(enc_manifest.read()))
        with open("decoded_AndroidManifest.xml", "w", encoding="utf-8") as xml_file:
            xml_file.write(manifest)
        print("AndroidManifest was saved as: decoded_AndroidManifest.xml")


if __name__ == '__main__':
    main()
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726817362.0434644 apkinspector-1.3.2/pyproject.toml0000644000000000000000000000124314673222122014062 0ustar00
[tool.poetry]
name = "apkInspector"
version = "1.3.2"
description = "apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering the capability to extract content and decode the AndroidManifest.xml file."
authors = ["erev0s <projects@erev0s.com>"]
license = "Apache-2.0"
readme = "README.md"
repository = "https://github.com/erev0s/apkInspector"
packages = [
    { include = "apkInspector" },
    { include = "apkInspectorCLI" },
]

[tool.poetry.dependencies]
python = "^3.5"

[tool.poetry.scripts]
apkInspector = "apkInspectorCLI.main:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

apkinspector-1.3.2/PKG-INFO
Metadata-Version: 2.1
Name: apkInspector
Version: 1.3.2
Summary: apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering the capability to extract content and decode the AndroidManifest.xml file.
Home-page: https://github.com/erev0s/apkInspector
License: Apache-2.0
Author: erev0s
Author-email: projects@erev0s.com
Requires-Python: >=3.5,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Project-URL: Repository, https://github.com/erev0s/apkInspector
Description-Content-Type: text/markdown

![apkInspector](https://i.imgur.com/hTzyIDG.png)

![PyPI - Version](https://img.shields.io/pypi/v/apkInspector)
[![CI](https://github.com/erev0s/apkInspector/actions/workflows/ci.yml/badge.svg)](https://github.com/erev0s/apkInspector/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/erev0s/apkInspector/graph/badge.svg?token=A3YXHGXUXF)](https://codecov.io/gh/erev0s/apkInspector)

# apkInspector

apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering
the capability to extract content and decode the AndroidManifest.xml file. What sets apkInspector apart is its adherence to the zip specification during APK parsing, which removes the need to rely on external libraries. This independence allows apkInspector to be highly adaptable, effectively emulating Android's installation process for APKs that cannot be parsed using standard libraries. The main goal is to enable users to conduct static analysis on APKs that employ evasion techniques, especially when conventional methods prove ineffective.

Please check [this blog post](https://erev0s.com/blog/unmasking-evasive-threats-with-apkinspector/) for more details.

## How to install

[apkInspector is available through PyPI](https://pypi.org/project/apkInspector/)

~~~~
pip install apkInspector
~~~~

or you can clone this repository, then build and install locally:

~~~~
git clone https://github.com/erev0s/apkInspector.git
cd apkInspector
poetry build
pip install dist/apkInspector-Version_here.tar.gz
~~~~

## Documentation

Documentation generated from the docstrings with Sphinx is available at: https://erev0s.github.io/apkInspector/

## CLI

apkInspector offers a command line tool with the same name, with the following options:

~~~~
$ apkInspector -h
usage: apkInspector [-h] [-apk APK] [-f FILENAME] [-ll] [-lc] [-la] [-e] [-x] [-xa] [-m] [-sm SPECIFY_MANIFEST] [-a] [-v]

apkInspector is a tool designed to provide detailed insights into the zip structure of APK files, offering the capability to extract content and decode the AndroidManifest.xml file.

options:
  -h, --help            show this help message and exit
  -apk APK              APK to inspect
  -f FILENAME, --filename FILENAME
                        Filename to provide info for
  -ll, --list-local     List all files by name from local headers
  -lc, --list-central   List all files by name from central directory header
  -la, --list-all       List all files from both central directory and local headers
  -e, --export          Export to JSON. What you list from the other flags, will be exported
  -x, --extract         Attempt to extract the file specified by the -f flag
  -xa, --extract-all    Attempt to extract all files detected in the central directory header
  -m, --manifest        Extract and decode the AndroidManifest.xml
  -sm SPECIFY_MANIFEST, --specify-manifest SPECIFY_MANIFEST
                        Pass an encoded AndroidManifest.xml file to be decoded
  -a, --analyze         Check an APK for static analysis evasion techniques
  -v, --version         Retrieves version information
~~~~

## Library

The library component of apkInspector is designed with extensibility in mind, so that other tools can integrate its functionality into their own applications and workflows. All primary methods carry docstrings that describe what they do, the arguments they expect, and the values they return, so developers can quickly understand apkInspector's core features and incorporate them into their projects.

### Features offered

- Find end of central directory record
- Parse central directory of APK and get details about each entry
- Get details of the local header for each entry
- Extract single or all files within an APK
- Decode AndroidManifest.xml file
- Identify Tampering Indicators:
  - End of Central Directory record defined multiple times
  - Unknown compression methods
  - Compressed entry with empty filename
  - Unexpected starting signature of AndroidManifest.xml
  - Tampered StringCount value
  - Strings surpassing maximum length
  - Invalid data between elements
  - Unexpected attribute size
  - Unexpected attribute names or values
  - Zero size header for namespace end nodes

The command-line interface (CLI) serves as a practical illustration of how the methods provided by the library have been employed.
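
To make the first feature and the first tampering indicator more concrete, here is a minimal sketch that uses only the Python standard library. It is not apkInspector's implementation and does not call its API; the helper names (`find_eocd_offsets`, `parse_eocd`, `build_sample_zip`) are hypothetical and exist only for this illustration. It assumes a small archive without ZIP64 records, scans a buffer for the End of Central Directory signature, parses the fixed 22-byte record, and shows how a duplicated record becomes visible.

```python
import io
import struct
import zipfile

# End of Central Directory (EOCD) signature and fixed-size layout from the zip specification:
# signature, disk number, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset, comment length.
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_STRUCT = struct.Struct("<4sHHHHIIH")  # 22 bytes, little-endian


def find_eocd_offsets(data):
    """Return every offset at which the EOCD signature occurs (hypothetical helper)."""
    offsets = []
    pos = data.find(EOCD_SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(EOCD_SIGNATURE, pos + 1)
    return offsets


def parse_eocd(data, offset):
    """Unpack the fixed part of the EOCD record located at `offset` (hypothetical helper)."""
    fields = EOCD_STRUCT.unpack_from(data, offset)
    return {
        "total_entries": fields[4],
        "cd_size": fields[5],
        "cd_offset": fields[6],
        "comment_length": fields[7],
    }


def build_sample_zip():
    """Build a tiny in-memory archive with placeholder content, standing in for an APK."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in ("AndroidManifest.xml", "classes.dex"):
            # A fixed timestamp keeps the generated bytes reproducible.
            info = zipfile.ZipInfo(name, date_time=(2024, 1, 1, 0, 0, 0))
            zf.writestr(info, b"placeholder")
    return buf.getvalue()


sample = build_sample_zip()
offsets = find_eocd_offsets(sample)
eocd = parse_eocd(sample, offsets[-1])
print("EOCD records found:", len(offsets))
print("Entries declared in the EOCD:", eocd["total_entries"])

# Simulate the "EOCD defined multiple times" indicator by appending a second copy of the record.
tampered = sample + sample[offsets[-1]:]
tampered_offsets = find_eocd_offsets(tampered)
print("EOCD records found after tampering:", len(tampered_offsets))
```

A plain signature scan like this can also match bytes that merely look like the signature inside file data or the archive comment, so a real check would need to validate each candidate record before reporting it.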

## Reliability

Please take [a look at the results](https://github.com/erev0s/apkInspector/tree/main/tests/top_apps) from testing apkInspector against a set of top Play Store applications.

## Planned to-do

- Improve documentation (add examples)
- Improve code coverage

## Contributions

We welcome contributions from the open-source community to help improve and enhance apkInspector. Whether you're a developer, tester, or documentation enthusiast, your contributions are valuable.

## :rocket: apkInspector is being used by :rocket: :

- [androguard](https://github.com/androguard/androguard/)
- [medusa](https://github.com/Ch0pin/medusa)

## Presentation of the tool and the research behind it

- Defcon 32 | [PDF](docs/presentation/apkinspector-Defon32-presentation.pdf)

## Disclaimer

apkInspector is an evolving project and a work in progress. Users should expect occasional bugs, as well as updates and upgrades as the tool matures and its functionality grows. Your feedback and contributions to apkInspector are highly appreciated as we work together to improve and refine its capabilities.