Find duplicates

By comparing the documents' metadata and file properties, this tool groups documents that are candidates for being duplicates. PDFs with the same number of pages, the same creation date and the same file size are obvious duplicate candidates. Other file and metadata properties may be more appropriate in other scenarios.
This narrows down the selection, and the grouping can then be fine-tuned with a CRC (Cyclic Redundancy Check) comparison and with manual, side-by-side visualization, so we can easily determine whether two documents are duplicates.

Find duplicates tool screenshot

The tool has options to delete, copy or move files, which make it easy to manage the duplicates found.

The operation begins by starting the tool with the list of PDFs we want to compare.

Upon starting, the tool uses the default (or last used) list of comparison properties and groups together the documents that share the same values for those properties. If no matches are found, no groups are shown, but the list of files remains loaded internally. We can always change the list of properties used in the comparison to try other possibilities.
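The grouping step can be sketched as keying each file by the tuple of its selected property values. This is a minimal Python sketch, not the tool's implementation: the PdfInfo record and the group_by_properties function are hypothetical, and it assumes a group requires all selected properties to be equal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfInfo:
    # Hypothetical record holding a few properties the tool can compare.
    path: str
    pages: int
    created: str  # creation date, e.g. as an ISO-8601 string
    size: int     # file size in bytes


def group_by_properties(files, props):
    """Group files whose values for every property in `props` are equal.

    Only groups with two or more members are returned. The `files` list
    itself is not modified, so it stays available for later comparisons
    with a different list of properties.
    """
    groups = defaultdict(list)
    for f in files:
        key = tuple(getattr(f, p) for p in props)
        groups[key].append(f)
    return [g for g in groups.values() if len(g) > 1]
```

For example, calling group_by_properties(files, ["pages", "created", "size"]) returns one list per set of matching files, and an empty list when nothing matches, while files is left unchanged.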

The list of properties to compare is built with the related buttons in the top toolbar. The leftmost toolbar button keeps the last used comparisons for easy reuse; the plus (+) button adds more items; and each item's button has options to change it to another property or to remove it from the comparison. This menu also includes an item named "scripts", which provides access to custom script functions. Script functions are created with the built-in script editor, opened from the "manage scripts" item. A script function should be written following the same rules as the rename tool's scripting functionality, and should return a string that will be compared against the corresponding values of all the other files. This could be, for example, a checksum of the PDF text or a value representing the page sizes.
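The scripting language and its rules are those of the rename tool, which are not described here, so the following is only a Python illustration of the contract a script function fulfils: it receives data about one file and returns a string, and files returning equal strings end up in the same group. Both function names are hypothetical, and the page texts and page sizes (in points) are assumed to have already been extracted from the PDF.

```python
import hashlib


def text_checksum_key(page_texts):
    """Return a SHA-256 hex digest of the document text.

    Whitespace is normalized first, so differences in spacing or line
    breaks do not prevent two documents from matching.
    """
    normalized = " ".join(" ".join(page_texts).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def page_sizes_key(page_sizes):
    """Return a string such as '612x792;612x792' describing every page.

    Sizes are rounded to whole points to absorb tiny floating-point noise.
    """
    return ";".join(f"{round(w)}x{round(h)}" for w, h in page_sizes)
```

Rounding and whitespace normalization are design choices: they make the key tolerant of insignificant differences, at the cost of possibly matching documents that differ only in those details.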

The button named Apply runs a new comparison using the current list of comparison items.

The CRC comparison is executed per group, and excludes from each group all the files that do not have the same CRC value. When a file is excluded by the CRC or property comparison, it is not removed from the internal list of files, so later comparisons will show it again if it matches once more.
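A per-group CRC refinement might look like the sketch below, which uses CRC-32 from Python's zlib module. It is an assumption that the tool uses CRC-32 specifically, and the function names are hypothetical. The sketch also assumes that if a group contains two different sets of matching CRCs, each set is kept as its own group, while files with an unshared CRC are dropped.

```python
import zlib


def file_crc32(path, chunk_size=65536):
    """Compute the CRC-32 of a file's bytes, reading it in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def refine_by_crc(groups, crc_of=file_crc32):
    """Split each property group by CRC and drop files whose CRC is unshared.

    Only the returned groups change; the caller's master list of files is
    untouched, so excluded files can reappear in a later comparison.
    """
    refined = []
    for group in groups:
        by_crc = {}
        for path in group:
            by_crc.setdefault(crc_of(path), []).append(path)
        refined.extend(g for g in by_crc.values() if len(g) > 1)
    return refined
```

The crc_of parameter only exists so the checksum function can be swapped, for example to test the grouping logic without real files.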

The delete and move operations remove the affected files from the internal list of documents, so later comparisons will not include them.

The export to CSV button exports the displayed duplicate groups to a CSV file, so the results can be processed by external applications.
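The exact columns of the exported file are not described here. As a hedged illustration, the sketch below assumes one row per file with its group number, written with Python's standard csv module; groups_to_csv is a hypothetical name.

```python
import csv
import io


def groups_to_csv(groups, out):
    """Write a header, then one row per file: group number and file path."""
    writer = csv.writer(out)
    writer.writerow(["group", "path"])
    for number, group in enumerate(groups, start=1):
        for path in group:
            writer.writerow([number, path])


if __name__ == "__main__":
    # Write to an in-memory buffer; for a real file, open it with newline="".
    buf = io.StringIO()
    groups_to_csv([["a.pdf", "b.pdf"]], buf)
    print(buf.getvalue())
```

A layout with one row per file keeps the output easy to filter or pivot by group number in a spreadsheet, and the csv module takes care of quoting paths that contain commas.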