DNA Alignment Help | Why binary rdf files instead of multistate rdf files?
We advise novice users to ignore the binary option and instead use their multistate data directly to produce a Median-Joining network.
DNA Alignment allows the user to save multistate data as a binary file, which Network can then use to produce a Reduced Median network or a Reduced Median Median-Joining network. Binarisation should never be the first choice: the more accurate procedure is to use the original multistate data for a Median-Joining network calculation.
Experienced users may experiment with binarisation in the following typical cases:
- They are dissatisfied with their Median-Joining network and want a "second opinion" from a Reduced Median network.
- They want to resolve superfluous network links by running a Reduced Median calculation to produce an *.rmf file, which can then be imported into a Median-Joining network calculation.
- They want to run the Star Contraction calculation on binarised amino acid data to produce an *.sco file, which can be imported into the Reduced Median calculation or the Median-Joining calculation.
Experienced users who wish to experiment with binarisation should note the following:
- Binarisation can easily introduce artefacts that obscure the evolutionary signal in the data and lead to inaccurate networks.
  To avoid errors, the user should identify and critically examine multistate nucleotides before proceeding with a Network calculation.
  Multistate nucleotides are most easily identified by saving in the old binary *.rdf format and searching for Ns in the data.
  If many columns contain N (say, more than 5 percent of the variable nucleotide positions), these nucleotide positions should be deleted entirely before starting any network analysis.
- Using DNA Alignment 1.2 or older versions, or the old binary *.rdf option in DNA Alignment 1.3:
  This binary format option replaces any multistate nucleotides with Ns, which may require careful manual editing (e.g. in the Network rdf data editor).
  When this file is imported into Network's Reduced Median calculation, each N is automatically replaced with either zero or one, according to the most similar sequence type in the rest of the data set. The data then consist purely of zeroes and ones, which the Reduced Median calculation can process.
  The same binarisation of Ns is achieved by importing the binary rdf file containing Ns into Network's rdf data editor and adjusting all Ns to the most similar sequence types in the rest of the data set with the "Replace ambiguous states" button.
- DNA Alignment 1.3 can optionally save Network 4.5 binary *.rdf files, which contain no Ns. In these files, each sequence has a "1" at every nucleotide position where it differs from the first sequence, regardless of whether the position is multistate or binary; this differs from Network's method of replacing Ns. This binarisation may also lead to inaccurate networks (see the first note on binarisation).