
DNA Alignment Help | How to handle N's in binary rdf files


We advise novice users to ignore the binary option and instead use their multistate data directly to produce a Median-Joining network.

DNA Alignment allows the user to save a multistate file as a binary file. With DNA Alignment 1.2 or older versions, or with the old binary *.rdf option in DNA Alignment 1.3, multistate nucleotides are replaced with Ns in the saved file.

When this file is imported into Network's Reduced Median calculation, each N is automatically replaced with either zero or one, whichever matches the most similar sequence type in the rest of the data set. The data then consist purely of zeroes and ones, and can be understood by the Reduced Median calculation. The same binarisation of Ns is achieved if the binary rdf file containing Ns is imported into Network's rdf data editor and all Ns are adjusted to the most similar sequence types in the rest of the data set using the "Replace ambiguous states" button.

However, replacing multistate positions with binary positions can easily introduce artefacts which obscure the evolutionary signal in the data and lead to inaccurate networks. Therefore, the user should critically examine the columns containing Ns before proceeding with a Network calculation. If there are many columns with Ns (say, more than 5 percent of the variable nucleotide positions), these columns should be deleted entirely before starting any network analysis.
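The 5 percent rule of thumb can be checked outside Network with a short script. The following Python sketch is only an illustration and is not part of DNA Alignment or Network. It assumes the binary data have already been extracted into equal-length strings of '0', '1' and 'N', one string per sequence type, with every column representing one variable nucleotide position. It does not read or write the rdf format, and the function names are hypothetical.

```python
def columns_with_n(sequences):
    """Return the 0-based indices of columns that contain at least one 'N'."""
    if not sequences or not sequences[0]:
        raise ValueError("need at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    return [i for i in range(length) if any(s[i] == "N" for s in sequences)]


def drop_columns(sequences, columns):
    """Return copies of the sequences with the given columns removed."""
    skip = set(columns)
    return ["".join(c for i, c in enumerate(s) if i not in skip)
            for s in sequences]


def screen_n_columns(sequences, threshold=0.05):
    """Apply the rule of thumb: delete all N columns if they exceed the threshold.

    Assumption: every column is a variable position, so the fraction is
    (columns containing N) / (total columns).
    Returns (sequences, n_columns, fraction). The sequences are returned
    unchanged when the fraction does not exceed the threshold, so that the
    listed N columns can still be examined by hand.
    """
    n_cols = columns_with_n(sequences)
    fraction = len(n_cols) / len(sequences[0])
    if fraction > threshold:
        return drop_columns(sequences, n_cols), n_cols, fraction
    return list(sequences), n_cols, fraction


if __name__ == "__main__":
    data = ["01N0", "0110", "1N10"]  # made-up example data
    cleaned, n_cols, fraction = screen_n_columns(data)
    print("Columns with N:", n_cols)
    print("Fraction of columns with N:", fraction)
    print("Data after screening:", cleaned)
```

In the made-up example, two of the four columns contain an N, which is well above the threshold, so both columns are deleted. When the fraction stays at or below the threshold, the data are returned unchanged and the listed columns should still be inspected individually before the Ns are replaced.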

A different binarisation method is used when saving as Network 4.5 binary rdf from DNA Alignment 1.3. This option, too, requires expert attention to multistate nucleotides.

For more information, see "Why binary rdf files instead of multistate rdf files?"