Hi, Philipp-
"ijk" refers to integer-based indices for locating oneself in an ordered collection, like a list/array. These numbers are have no units, and go from 0,.., N-1, where "N" is the size of the collection along the given dimension. These are independent of voxel size.
"xyz" refers to physical coordinates of the dset, each of which has units of millimeters, depends on voxel sizes, and are generally floating point valued. In one of my least favorite aspects of neuroimaging data processing, there are maaaany conventions for what signs mean on these: "-4" in the x-axis might mean left or right, depending on convention, and many people still don't report these correctly in papers. Sigh. Life is hard.
In general, I would not try to mix these two conventions: equality/matching with floating point numbers is not a fun task numerically. Different datasets that technically "match" or are on the same grid might have different rounding errors in floating point coordinates, for example. The dataset orientation can play a confusing role, as well, such as moving between RAI-DICOM or LPI-SPM conventions. That is why I recommended verifying that dsets that were going to have information merged in Python should be checked for matching first, and then just use the coordinates of one dset. The header information being correct is key, and Python won't really deal with that header info---it is a recipe for confusion, instability and possibly disaster.
--pt