To obtain a FASTA file containing all Skunavirus genomes, the NCBI Virus database was used:
https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Skunavirus,%20taxid:1623305

On that page, select Download, then select the data type (Sequence Data / FASTA Format).
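
As a quick sanity check after downloading, the number of records in the FASTA file can be compared with the number of genomes listed on the NCBI page. The following is a minimal sketch, not part of the original workflow; the function names and the example records are made up for illustration.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count sequence records: every FASTA record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path: str) -> int:
    """Count records in a FASTA file on disk (the path is supplied by the caller)."""
    with open(path) as handle:
        return count_fasta_records(handle)


# Tiny in-memory example with two made-up records.
example = [
    ">genome_A example\n",
    "ATGC\n",
    "GGTA\n",
    ">genome_B example\n",
    "TTGA\n",
]
print(count_fasta_records(example))
```

To check a real download, call `count_fasta_file` with the path of the downloaded file.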

The rest of the data is generated with the following workflow:

1. Using the package prodigal to generate a database file
2. Using awk to filter this
3. Using mcl to generate protein families
4. Using a custom python script to do pairwise comparisons
5. Using mcl again on the files generated in the previous step to build clusters of genomes
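
To illustrate how steps 4 and 5 can fit together, here is a minimal sketch of a pairwise genome comparison. It is not the custom script from this repository: it assumes that each genome is described by the set of protein families it contains and that similarity is measured with the Jaccard index, both of which are assumptions made for illustration. The genome and family names are hypothetical. The output is one tab-separated line per genome pair (label, label, weight), which is the layout mcl reads in its `--abc` mode.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared families divided by all families in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_abc(genome_families: dict) -> list:
    """Return (genome1, genome2, similarity) for every pair sharing a family."""
    rows = []
    for g1, g2 in combinations(sorted(genome_families), 2):
        score = jaccard(genome_families[g1], genome_families[g2])
        if score > 0:  # pairs with nothing in common add no edge
            rows.append((g1, g2, round(score, 3)))
    return rows


# Hypothetical input: genome name -> protein families found in that genome.
genome_families = {
    "phage_A": {"fam1", "fam2", "fam3"},
    "phage_B": {"fam2", "fam3", "fam4"},
    "phage_C": {"fam5"},
}

rows = pairwise_abc(genome_families)
for g1, g2, score in rows:
    print(f"{g1}\t{g2}\t{score}")
```

Genomes that share no family with any other genome, such as `phage_C` here, produce no line and would have to be added back separately if they should appear as singleton clusters.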

More details can be found in LMCLUST.sh.
