snpio.GenotypeEncoder
- class snpio.GenotypeEncoder(genotype_data)
Encode genotypes to various formats suitable for machine learning.
The supported encodings are 012, one-hot, and integer, and each has a corresponding inverse (decoding) operation.
Example
>>> # Import necessary modules
>>> from snpio import VCFReader, GenotypeEncoder
>>>
>>> # Initialize VCFReader and GenotypeEncoder objects
>>> gd = VCFReader(filename="my_vcf.vcf", popmapfile="my_popmap.txt")
>>> ge = GenotypeEncoder(gd)
>>>
>>> # Encode genotypes to 012, one-hot, and integer formats
>>> # (these are attributes, so they are read without call parentheses)
>>> gt_012 = ge.genotypes_012
>>> gt_onehot = ge.genotypes_onehot
>>> gt_int = ge.genotypes_int
>>>
>>> # Inverse operations
>>> ge.genotypes_012 = gt_012
>>> ge.genotypes_onehot = gt_onehot
>>> ge.genotypes_int = gt_int
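For readers unfamiliar with the 012 scheme, the idea can be sketched without SNPio: each call is replaced by its count of alternate alleles (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate). The helper name `encode_012`, the `A/G` call format, and the `-9` missing sentinel are assumptions made for this illustration, not SNPio's implementation.

```python
import numpy as np

MISSING = -9  # assumed sentinel for missing calls, chosen for this sketch


def encode_012(genotypes, ref, alt):
    """Count alternate alleles per call: 0 = hom-ref, 1 = het, 2 = hom-alt."""
    out = np.full((len(genotypes), len(ref)), MISSING, dtype=np.int8)
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            alleles = call.split("/")
            known = (ref[j], alt[j])
            if len(alleles) != 2 or any(a not in known for a in alleles):
                continue  # missing or unexpected calls stay MISSING
            out[i, j] = sum(a == alt[j] for a in alleles)
    return out


# Two samples, three loci (hypothetical data)
genotypes = [
    ["A/A", "C/T", "G/G"],
    ["A/G", "T/T", "N/N"],
]
ref = ["A", "C", "G"]
alt = ["G", "T", "A"]

encoded = encode_012(genotypes, ref, alt)
print(encoded)
```

The result has one row per sample and one column per locus, which is the layout most machine learning libraries expect for a feature matrix.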
- plot_format
Plot format for the data.
- Type:
str
- prefix
Prefix for the output directory.
- Type:
str
- verbose
If True, display verbose output.
- Type:
bool
- snp_data
List of lists of SNPs.
- Type:
List[List[str]]
- samples
List of sample IDs.
- Type:
List[str]
- filetype
File type of the data.
- Type:
str
- missing_vals
List of missing values.
- Type:
List[str]
- replace_vals
List of values to replace missing values with.
- Type:
List[str]
- __init__(genotype_data)
Initialize the GenotypeEncoder object.
- Parameters:
genotype_data (GenotypeData) – Initialized GenotypeData object.
Note
The GenotypeData object must be initialized before creating an instance of this class.
Methods
__init__(genotype_data) – Initialize the GenotypeEncoder object.
convert_012(snps) – Convert genotype strings to 012 encoding.
convert_int_iupac(snp_data[, encodings_dict]) – Convert input data to integer-encoded format (0-9) based on IUPAC codes.
convert_onehot(snp_data[, encodings_dict]) – Convert input data to one-hot encoded format.
decode_012(X[, is_nuc]) – Decode 012 encodings to IUPAC characters with metadata repair.
decode_alleles_two_channel(allele1, allele2) – Convert two integer allele matrices back into IUPAC-encoded genotypes.
encode_alleles_two_channel(snp_data) – Convert IUPAC genotypes to two integer allele matrices.
inverse_int_iupac(int_encoded_data[, ...]) – Convert integer-encoded data back to original format.
inverse_onehot(onehot_data[, encodings_dict]) – Convert one-hot encoded data back to original format.
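The convert/inverse pairs listed above are round trips between IUPAC characters and numeric arrays. The following is a minimal standalone sketch of that idea, not SNPio's code: the integer order in `IUPAC`, the `-9` missing value, the 0.5/0.5 split for heterozygotes, and all function names are assumptions for illustration, and SNPio's default `encodings_dict` may differ.

```python
import numpy as np

# Hypothetical code order; ten IUPAC codes map to the integers 0-9.
IUPAC = ["A", "C", "G", "T", "W", "R", "M", "K", "Y", "S"]
TO_INT = {code: i for i, code in enumerate(IUPAC)}
MISSING_INT = -9  # assumed missing-value sentinel

BASES = ["A", "C", "G", "T"]  # one-hot channel order (assumed)
HET = {"W": "AT", "R": "AG", "M": "AC", "K": "GT", "Y": "CT", "S": "CG"}


def vector_for(code):
    """Return the 4-channel vector for one IUPAC code (all zeros if missing)."""
    vec = [0.0] * 4
    if code in BASES:
        vec[BASES.index(code)] = 1.0
    elif code in HET:
        for base in HET[code]:
            vec[BASES.index(base)] = 0.5  # heterozygote: half weight per allele
    return tuple(vec)


ONEHOT = {code: vector_for(code) for code in IUPAC + ["N"]}
FROM_ONEHOT = {vec: code for code, vec in ONEHOT.items()}


def to_int(snp_data):
    return np.array([[TO_INT.get(c, MISSING_INT) for c in row] for row in snp_data])


def from_int(arr):
    return [[IUPAC[int(v)] if 0 <= v < len(IUPAC) else "N" for v in row] for row in arr]


def to_onehot(snp_data):
    return np.array([[ONEHOT.get(c, ONEHOT["N"]) for c in row] for row in snp_data])


def from_onehot(onehot):
    return [[FROM_ONEHOT[tuple(vec.tolist())] for vec in row] for row in onehot]


snp_data = [["A", "R", "N"], ["T", "Y", "C"]]  # 2 samples x 3 loci
int_encoded = to_int(snp_data)
onehot = to_onehot(snp_data)
print(int_encoded)
print(onehot.shape)
```

Because every code maps to a distinct integer and a distinct vector, both inverses recover the original characters exactly; a custom encodings dictionary would need the same one-to-one property for the round trip to hold.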
Attributes
genotypes_012 – 012-encoded genotypes as a NumPy array.
genotypes_int – Integer-encoded SNP data (0-9, covering the IUPAC characters).
genotypes_onehot – One-hot encoded SNP data of shape (n_samples, n_loci, 4).
two_channel_alleles – Two-channel allele matrices.
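A two-channel representation stores each diploid genotype as two integer matrices, one per allele. The sketch below shows one way this could work and is not taken from SNPio: the allele codes (A=0, C=1, G=2, T=3), the `-1` missing value, and the function names `encode_two_channel` and `decode_two_channel` are all assumptions.

```python
import numpy as np

BASES = ["A", "C", "G", "T"]  # assumed allele codes 0-3
MISSING = -1  # assumed missing-allele sentinel

# Each IUPAC code expands to an unordered pair of alleles.
IUPAC_TO_PAIR = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}
PAIR_TO_IUPAC = {frozenset(pair): code for code, pair in IUPAC_TO_PAIR.items()}


def encode_two_channel(snp_data):
    """Split IUPAC genotypes into two (n_samples, n_loci) allele matrices."""
    shape = (len(snp_data), len(snp_data[0]))
    allele1 = np.full(shape, MISSING, dtype=np.int8)
    allele2 = np.full(shape, MISSING, dtype=np.int8)
    for i, row in enumerate(snp_data):
        for j, code in enumerate(row):
            pair = IUPAC_TO_PAIR.get(code)
            if pair is None:
                continue  # unknown or missing code stays MISSING in both channels
            allele1[i, j] = BASES.index(pair[0])
            allele2[i, j] = BASES.index(pair[1])
    return allele1, allele2


def decode_two_channel(allele1, allele2):
    """Rebuild IUPAC genotypes; allele order within a pair does not matter."""
    decoded = []
    for row1, row2 in zip(allele1, allele2):
        row = []
        for x, y in zip(row1, row2):
            if x < 0 or y < 0:
                row.append("N")
            else:
                row.append(PAIR_TO_IUPAC[frozenset((BASES[int(x)], BASES[int(y)]))])
        decoded.append(row)
    return decoded


snp_data = [["A", "R"], ["N", "Y"]]  # 2 samples x 2 loci
a1, a2 = encode_two_channel(snp_data)
restored = decode_two_channel(a1, a2)
print(a1, a2, restored, sep="\n")
```

Keeping the alleles in separate channels preserves which bases are present at a heterozygous site, which a single 0/1/2 count does not.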