File Folding.

File Folding.

File Folding is a technique that moves a file into hex, and that hex is broken into folder file names in a fashion that can be reconstructed.

I had a little idea that I needed out of my head this week...

The idea is to break a file down into hexidecimal, then calculate how many folders are needed to place strings of that hexidecimal file into the folder name, including the means to support a lambda function that will keep track of the order of those strings, what you will be left with is many folders with folder names that represent a part of the file you have broken down into named folders

For example, below is taking the infamous Mimikatz.exe and 'Folding' the file, this leaves us with 19534 directories, 0 files, that's a lot of folders!

Created 19533 folders in /Users/carroll/Desktop/mimikatz.exe based on the file /Users/carroll/Downloads/mimikatz.exe Original file size: 1250056 bytes SHA-256 hash of the original file: 92804faaab2175dc501d73e814663058c78c0a042675a8937266357bcfb96c50 carroll@PewPew5 Desktop % tree mimikatz.exe mimikatz.exe β”œβ”€β”€ 0000_4d5a90000300000004000000ffff0000b80000000000000040000000000000000000000000000000000000000000000000000000000000000000000020010000 β”œβ”€β”€ 0001_0e1fba0e00b409cd21b8014ccd21546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f64652e0d0d0a2400000000000000 β”œβ”€β”€ 0002_34c8d82270a9b67170a9b67170a9b67179d1237172a9b67179d135714fa9b67179d1327160a9b67179d1257172a9b6716b342a7172a9b67116477d7174a9b671

To rebuild,

β€Œcarroll@PewPew5 Desktop % python3 Folding.py rebuild /Users/carroll/Desktop/mimikatz.exe --output /Users/carroll/Desktop/mimikatz.exe/mimi.exe

Rebuilds the mimikatz.exe to mimi.exe from the mimikatz.exe/ folder

Why bother ?

Well, it's really just for fun at this point, but there is utility here, simply because ingress and egress will miss this, and where there are new offensive motivations there has to be new countermeasures, signatures or visibility, will this get adopted by offence ? will defence need to factor this in and can they?

It would be nice to see a powershell version, it would be nice to see extra switches that rebuild in memory and execute, and all that evasive fun.

bypassing AMSI and others aside, for basic file theft or indroducing malicious or covert files into an organisation the current script will get you there

You can find it here:β€Œβ€Œhttps://gist.github.com/yosignals/1edd935d21d1210596ea5538679e359b

import os import argparse import hashlib def file_to_hex(filename): """Convert file content to a hex string.""" with open(filename, 'rb') as file: content = file.read() return content.hex(), content def create_folders_for_hex(hex_data, base_path, max_length=254): """Create folders with names based on segments of the hexadecimal data, including a sequence number.""" num_folders = (len(hex_data) + max_length - 1) // max_length for i in range(num_folders): folder_name = f"{i:04d}_{hex_data[i * max_length:(i + 1) * max_length]}" os.makedirs(os.path.join(base_path, folder_name), exist_ok=True) return num_folders def hex_to_binary(hex_str): """Convert hexadecimal string to binary data.""" return bytes.fromhex(hex_str) def rebuild_file_from_folders(base_path, output_filename): """Rebuild a file from folders with hexadecimal names that include a sequence number.""" folders = [name for name in os.listdir(base_path) if os.path.isdir(os.path.join(base_path, name))] # Extract the sequence number and sort by it folders.sort(key=lambda x: int(x.split('_')[0])) hex_data = ''.join(name.split('_')[1] for name in folders) binary_data = hex_to_binary(hex_data) with open(output_filename, 'wb') as file: file.write(binary_data) def calculate_sha256(file_content): """Calculate SHA-256 hash of the given file content.""" sha256 = hashlib.sha256() sha256.update(file_content) return sha256.hexdigest() def main(): parser = argparse.ArgumentParser(description="Create or Rebuild files from hex-named folders.") parser.add_argument('mode', choices=['create', 'rebuild'], help='Operation mode: "create" or "rebuild"') parser.add_argument('path', help='Path to the file or directory') parser.add_argument('--output', help='Output file or directory', required=True) args = parser.parse_args() if args.mode == 'create': hex_data, original_content = file_to_hex(args.path) total_folders = create_folders_for_hex(hex_data, args.output) file_size = os.path.getsize(args.path) file_sha256 = calculate_sha256(original_content) print(f"Created {total_folders} folders in {args.output} based on the file {args.path}") print(f"Original file size: {file_size} bytes") print(f"SHA-256 hash of the original file: {file_sha256}") elif args.mode == 'rebuild': rebuild_file_from_folders(args.path, args.output) print(f"Rebuilt file {args.output} from folders in {args.path}") if __name__ == "__main__": main() Β  Β  Β  Β β€Œ

Mode Selection: The script can operate in two modes:

  • Create Mode: This mode is used to convert a specified file's content into hexadecimal format and distribute these hex values into multiple folders. Each folder's name contains a portion of the hexadecimal data along with a sequence number. This can be useful for data storage and organization, breaking large data into manageable parts.
  • Rebuild Mode: This mode allows for the reconstruction of the original file from folders named with hexadecimal sequences. This is particularly useful for restoring previously segmented data back into its original form.

Functionality Details:

  • Hexadecimal Conversion: Converts file content to a hex string, which simplifies data handling or transmission.
  • Folder Creation for Data Segmentation: Segments the hex data into parts and stores each part in a separate folder, which helps in managing large datasets by breaking them into smaller, organized units.
  • Data Rebuilding: Reassembles the original file from the folders, ensuring that data segmented for storage can be accurately restored.
  • Data Integrity Check: Calculates and displays the SHA-256 hash of the original file to ensure data integrity. This hash function is a cryptographic tool that verifies the data has not been altered.

β€Œβ€ŒWhen viewed through the lens of a more offensive, covert, and adversarial use case, the script's capabilities pose several challenges for cybersecurity and defense. Here are key aspects to consider:

  1. Data Exfiltration and Concealment: The script's ability to break down a file into hexadecimal segments and distribute these across multiple directories can be utilized for covert data exfiltration. By disguising data as normal directory names and spreading it out, malicious actors can evade traditional detection mechanisms that scan for unusual file types or large file transfers. This technique complicates the tracking and identification of exfiltrated data.
  2. Segmentation of Malicious Payloads: Attackers might use the script to segment malicious payloads into smaller, less detectable pieces. These segments can be individually harmless but, when reassembled, form a harmful executable or script. This method could bypass security systems that are configured to detect known signatures of malware but may not recognize fragmented pieces.
  3. Data Persistence and Retrieval: By storing data in a segmented format across various system folders, malicious actors can achieve persistence on a network. This method makes data recovery by the attacker simpler and more stealthy, while also complicating the cleanup process for defenders.
  4. Challenges in Forensic Analysis: The reconstruction of original data from folders requires knowledge of the sequence and method used in segmentation. Without this, forensic analysts may struggle to piece together data remnants found during an investigation, thus obscuring the full scope of a data breach or malicious activity.
  5. Bypassing Data Leakage Prevention (DLP) Systems: DLP systems are designed to detect and prevent the unauthorized use and transmission of confidential information. By using this script to segment and transform data into hexadecimal formats, malicious actors might evade DLP rules that are not configured to monitor and analyze folder names or directory-based data storage and transmission.

So, Do we need a response to this (yet?)β€Œβ€Œ

β€Œ

Let's see if I can convince some of my friends to make this problem more problemy Β  β€Œ