Merging multiple logical recoveries
14.10.2020, 11:42 - Author: Mark B.
If you run multiple logical recovery attempts to maximize the number of usable files, you need to merge the resulting folders while preserving the folder structure, and you also need to check whether files differ from each other. This script merges the folders and checks each file's MD5 sum to distinguish alternative versions of a file from duplicates.
To do that, you should create a folder structure like this:
D:\DE\case12345\a
D:\DE\case12345\b
D:\DE\case12345\c
... each recovery with each tool goes into its own folder, and those folders are collected in a case folder.
The folders (a, b and c in this example) can be named as you like. They are processed alphabetically, so if you prefer to start with a specific attempt, make sure its folder is the first one processed!
When you start the tool, you get a prompt to select the base folder. In the example above, that would be D:\DE\case12345.
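Directory listings are not guaranteed to come back sorted on every platform, so it can help to check the processing order before a long run. The following is a minimal sketch and not part of the tool; the helper name `recovery_folders` is an assumption made for illustration.

```python
import os


def recovery_folders(base_folder, output_folder_name="000_merged"):
    """Return the names of the recovery folders in the order they sort.

    Hypothetical helper for illustration only. The output folder is
    excluded, and plain files in the base folder are ignored.
    """
    names = []
    for entry in os.listdir(base_folder):
        path = os.path.join(base_folder, entry)
        if os.path.isdir(path) and entry != output_folder_name:
            names.append(entry)
    # sorted() compares strings by code point, so "B" sorts before "a"
    return sorted(names)
```

Calling `recovery_folders(r"D:\DE\case12345")` on the example layout would list a, b and c in that order. If the preferred attempt does not come first, renaming its folder is probably the simplest fix.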
The tool creates a folder named 000_merged, which is skipped when processing, and a log file named different_files.log that lists every file stored under a new name because its content differed from an already merged file with the same name. So if you have to stop the tool and rerun it later, the sorting stays intact and the tool skips all previously processed files.
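The per-file decision described above can be sketched as a small function. This is a simplified illustration under the assumption that the target path inside 000_merged is already known; the names `file_md5` and `classify` are hypothetical and do not appear in the tool.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(source, target):
    """Decide what to do with source, given its planned target path.

    Returns "move" when the target name is still free, "skip" when an
    identical file was already merged, and "rename" when a file with
    the same name but different content is already there.
    """
    if not os.path.isfile(target):
        return "move"
    if file_md5(source) == file_md5(target):
        return "skip"
    return "rename"
```

MD5 is used here only to tell files apart, not for security, so its known collision weaknesses should matter little for this purpose.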
Source:
import hashlib
import os
import time
import tkinter as tk
from tkinter import filedialog


def md5(fpath):
    """Return the MD5 hex digest of a file, read in 4 KiB chunks."""
    hash_md5 = hashlib.md5()
    with open(fpath, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


root = tk.Tk()
root.withdraw()
base_folder = filedialog.askdirectory(parent=root, initialdir="/", title='Please select the base-directory')

output_folder_name = "000_merged"
base_subfolders = []
mctr = 0
sctr = 0

# askdirectory returns an empty value when the dialog is cancelled
if base_folder:
    output_folder = os.path.join(base_folder, output_folder_name)

    # Check if output-folder exists and create it if not
    if not os.path.isdir(output_folder):
        os.mkdir(output_folder)

    # Get folders in Basedir (everything except the output folder)
    for d in os.listdir(base_folder):
        d_path = os.path.join(base_folder, d)
        if os.path.isdir(d_path) and d != output_folder_name:
            base_subfolders.append(d_path)

    # iterate over folders
    for subdir, dirs, files in os.walk(base_folder):
        # sort in place so os.walk visits the folders alphabetically
        dirs.sort()
        # don't process output folder
        if subdir == base_folder and output_folder_name in dirs:
            dirs.remove(output_folder_name)

        for filename in files:
            fpath = os.path.join(subdir, filename)
            target_fpath = fpath
            for d in base_subfolders:
                # only swap the prefix when fpath really lies inside d,
                # so that folder "a" does not match files in folder "ab"
                if fpath.startswith(d + os.sep):
                    target_fpath = output_folder + fpath[len(d):]
                    break
            else:
                # file lies directly in the base folder (e.g. the log), nothing to merge
                continue

            # Check if target file exist
            if os.path.isfile(target_fpath):
                # Check if source file exist
                if not os.path.isfile(fpath):
                    continue

                # Check md5 sum of files
                md5_source = md5(fpath)
                md5_target = md5(target_fpath)

                if md5_source != md5_target:
                    # Content differs: store the incoming file under a new name
                    time_stamp = str(time.time()).encode("utf-8")
                    tmp = os.path.basename(target_fpath)
                    tmp = list(os.path.splitext(tmp))
                    tmp[0] = tmp[0] + "_" + hashlib.md5(time_stamp).hexdigest()
                    new_filename = "".join(tmp)

                    # Create new path and log the filename
                    target_fpath = os.path.join(os.path.dirname(target_fpath), new_filename)
                    with open(os.path.join(base_folder, "different_files.log"), "a", encoding="utf-8") as logfile:
                        logfile.write(target_fpath + "\n")
                else:
                    # Identical content is already merged
                    sctr += 1
                    print("SKIPPING: " + fpath)
                    continue

            # Create folder-structure in the output-folder
            os.makedirs(os.path.dirname(target_fpath), exist_ok=True)

            # move file; count it only when the move succeeded
            try:
                os.rename(fpath, target_fpath)
                mctr += 1
                print("MOVING: " + target_fpath)
            except FileNotFoundError:
                print("ERROR: " + target_fpath)

print()
print("UNIQ FILES: " + str(mctr))
print("SKIPPED FILES: " + str(sctr))
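The script preserves the folder structure by swapping the recovery folder at the start of each path for 000_merged. One alternative way to express that mapping, shown here only as a sketch, is to work on the path relative to the base folder. The helper name `merged_path` is hypothetical, and the sketch assumes the file lies inside one of the recovery folders.

```python
import os


def merged_path(fpath, base_folder, output_folder_name="000_merged"):
    """Map a file inside a recovery folder to its place in the output folder.

    Hypothetical helper: the first component of the path relative to
    base_folder (the recovery folder, e.g. "a") is dropped and the
    remainder is re-rooted under the output folder.
    """
    rel = os.path.relpath(fpath, base_folder)
    parts = rel.split(os.sep)
    if len(parts) < 2 or parts[0] == os.pardir:
        raise ValueError("file is not inside a recovery folder: " + fpath)
    return os.path.join(base_folder, output_folder_name, *parts[1:])
```

With the example layout, a file under a\Pictures and a file of the same name under b\Pictures would both map to 000_merged\Pictures, which is exactly where the MD5 comparison decides between skipping and renaming.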
System requirements:
Windows, OSX or Linux with