snpio.GenotypeEncoder

class snpio.GenotypeEncoder(genotype_data)

Encode genotypes to various formats suitable for machine learning.

The supported encodings are 012, one-hot, and integer (IUPAC-based). Each encoding has a corresponding inverse operation that decodes the data again.

Example

>>> # Import necessary modules
>>> from snpio import VCFReader, GenotypeEncoder
>>>
>>> # Initialize VCFReader and GenotypeEncoder objects
>>> gd = VCFReader(filename="my_vcf.vcf", popmapfile="my_popmap.txt")
>>> ge = GenotypeEncoder(gd)
>>>
>>> # Encode genotypes to 012, one-hot, and integer formats.
>>> # These are attributes (see "Attributes" below), so they are read, not called.
>>> gt_012 = ge.genotypes_012
>>> gt_onehot = ge.genotypes_onehot
>>> gt_int = ge.genotypes_int
>>>
>>> # Inverse operations
>>> ge.genotypes_012 = gt_012
>>> ge.genotypes_onehot = gt_onehot
>>> ge.genotypes_int = gt_int

plot_format

Plot format for the data.

Type:

str

prefix

Prefix for the output directory.

Type:

str

verbose

If True, display verbose output.

Type:

bool

snp_data

List of lists of SNPs.

Type:

List[List[str]]

samples

List of sample IDs.

Type:

List[str]

filetype

File type of the data.

Type:

str

missing_vals

List of missing values.

Type:

List[str]

replace_vals

List of values to replace missing values with.

Type:

List[str]

__init__(genotype_data)

Initialize the GenotypeEncoder object.

Parameters:

genotype_data (GenotypeData) – Initialized GenotypeData object.

Note

The GenotypeData object must be initialized before creating an instance of this class.

Methods

__init__(genotype_data)

Initialize the GenotypeEncoder object.

convert_012(snps)

Convert genotype strings to 012 encoding.

convert_int_iupac(snp_data[, encodings_dict])

Convert input data to integer-encoded format (0-9) based on IUPAC codes.

convert_onehot(snp_data[, encodings_dict])

Convert input data to one-hot encoded format.

decode_012(X[, is_nuc])

Decode 012 encodings to IUPAC characters with metadata repair.

decode_alleles_two_channel(allele1, allele2)

Convert two integer allele matrices back into IUPAC-encoded genotypes.

encode_alleles_two_channel(snp_data)

Convert IUPAC genotypes to two integer allele matrices.

inverse_int_iupac(int_encoded_data[, ...])

Convert integer-encoded data back to original format.

inverse_onehot(onehot_data[, encodings_dict])

Convert one-hot encoded data back to original format.
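
To make the 012 and integer encodings above concrete, here is a minimal pure-Python sketch. It does not call SNPio and is not SNPio's implementation: the helper names (sketch_encode_012, sketch_encode_int, sketch_decode_int), the ref_alleles argument, the specific IUPAC-to-integer mapping, and the -9 missing-data code are all assumptions made for illustration. The only point taken from the text above is that the integer encoding spans 0-9, which matches four bases plus six two-base IUPAC ambiguity codes.

```python
# Illustrative sketch only; the mappings and helper names are assumptions,
# not SNPio's implementation.
MISSING = -9  # assumed missing-data code

# Four bases plus six two-base IUPAC ambiguity codes give ten codes (0-9).
IUPAC_TO_INT = {
    "A": 0, "T": 1, "G": 2, "C": 3,
    "W": 4, "R": 5, "M": 6, "K": 7, "Y": 8, "S": 9,
    "N": MISSING,
}
INT_TO_IUPAC = {code: base for base, code in IUPAC_TO_INT.items()}
HETEROZYGOUS = set("WRMKYS")


def sketch_encode_int(snp_data):
    """Map each IUPAC character to an integer; unknown characters become MISSING."""
    return [[IUPAC_TO_INT.get(gt, MISSING) for gt in sample] for sample in snp_data]


def sketch_decode_int(int_data):
    """Invert sketch_encode_int; unknown codes become 'N'."""
    return [[INT_TO_IUPAC.get(code, "N") for code in sample] for sample in int_data]


def sketch_encode_012(snp_data, ref_alleles):
    """Encode 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.

    ref_alleles is a hypothetical per-locus list of reference bases.
    """
    encoded = []
    for sample in snp_data:
        row = []
        for gt, ref in zip(sample, ref_alleles):
            if gt not in IUPAC_TO_INT or gt == "N":
                row.append(MISSING)
            elif gt in HETEROZYGOUS:
                row.append(1)
            elif gt == ref:
                row.append(0)
            else:
                row.append(2)
        encoded.append(row)
    return encoded


# Rows are samples, columns are loci.
snps = [["A", "C", "N"], ["R", "C", "T"], ["G", "C", "T"]]
ref_alleles = ["A", "C", "T"]

gt_012 = sketch_encode_012(snps, ref_alleles)
gt_int = sketch_encode_int(snps)
roundtrip = sketch_decode_int(gt_int)
```

In this sketch the integer encoding round-trips exactly, because every IUPAC code has its own integer. The 012 encoding does not record which bases were involved, so decoding it needs extra per-locus information such as the reference and alternate alleles.
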

Attributes

genotypes_012

Genotypes in 012 encoding, as a NumPy array.

genotypes_int

SNPs in integer-encoded format (0-9, including IUPAC ambiguity codes).

genotypes_onehot

SNPs in one-hot encoded format, with shape (n_samples, n_loci, 4).

two_channel_alleles

Two-channel allele matrices.
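
The shapes behind the one-hot and two-channel attributes can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SNPio's code: the helper names, the A/T/G/C channel order, the choice of 0.5 per allele for heterozygotes, the all-zero vector for missing data, and the -1 missing code in the allele matrices are all hypothetical. The (n_samples, n_loci, 4) shape and the idea of two integer allele matrices come from the descriptions above.

```python
# Illustrative sketch only; channel order, weights, and missing codes are assumptions.
BASES = ("A", "T", "G", "C")  # assumed channel order
MISSING = -1                  # assumed missing code for allele matrices

# Standard IUPAC codes expanded into their two alleles.
IUPAC_TO_ALLELES = {
    "A": ("A", "A"), "T": ("T", "T"), "G": ("G", "G"), "C": ("C", "C"),
    "W": ("A", "T"), "R": ("A", "G"), "M": ("A", "C"),
    "K": ("G", "T"), "Y": ("C", "T"), "S": ("C", "G"),
}
ALLELES_TO_IUPAC = {frozenset(pair): code for code, pair in IUPAC_TO_ALLELES.items()}


def sketch_onehot(snp_data):
    """Return nested lists of shape (n_samples, n_loci, 4); missing is all zeros."""
    encoded = []
    for sample in snp_data:
        row = []
        for gt in sample:
            vec = [0.0] * len(BASES)
            for allele in IUPAC_TO_ALLELES.get(gt, ()):
                vec[BASES.index(allele)] += 0.5  # each allele contributes half
            row.append(vec)
        encoded.append(row)
    return encoded


def sketch_encode_two_channel(snp_data):
    """Split IUPAC genotypes into two integer allele matrices."""
    allele1, allele2 = [], []
    for sample in snp_data:
        pairs = [IUPAC_TO_ALLELES.get(gt) for gt in sample]
        allele1.append([BASES.index(p[0]) if p else MISSING for p in pairs])
        allele2.append([BASES.index(p[1]) if p else MISSING for p in pairs])
    return allele1, allele2


def sketch_decode_two_channel(allele1, allele2):
    """Rebuild IUPAC genotypes from two allele matrices; missing becomes 'N'."""
    decoded = []
    for row1, row2 in zip(allele1, allele2):
        row = []
        for a, b in zip(row1, row2):
            if a == MISSING or b == MISSING:
                row.append("N")
            else:
                row.append(ALLELES_TO_IUPAC[frozenset((BASES[a], BASES[b]))])
        decoded.append(row)
    return decoded


snps = [["A", "R", "N"], ["T", "Y", "C"]]  # 2 samples, 3 loci

onehot = sketch_onehot(snps)
shape = (len(onehot), len(onehot[0]), len(onehot[0][0]))
allele1, allele2 = sketch_encode_two_channel(snps)
decoded = sketch_decode_two_channel(allele1, allele2)
```

Here shape is (2, 3, 4), and decoding the two allele matrices recovers the original genotypes. Compared with one-hot, a two-channel layout keeps the two alleles of a heterozygote in separate matrices rather than spreading them across base channels.
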