snpio.Plotting

class snpio.Plotting(genotype_data, show=False, plot_format='png', dpi=300, plot_fontsize=18, plot_title_fontsize=22, despine=True, verbose=False, debug=False)

Class containing various methods for generating plots based on genotype data.

This class is initialized with a GenotypeData object containing necessary data. The class attributes are set based on the provided values, the GenotypeData object, or default values.

genotype_data

Initialized GenotypeData object containing necessary data.

Type:

GenotypeData

prefix

Prefix string for output directories and files.

Type:

str

output_dir

Output directory for saving plots.

Type:

Path

show

Whether to display the plots.

Type:

bool

plot_format

Format in which to save the plots.

Type:

str

dpi

Resolution of the saved plots.

Type:

int

plot_fontsize

Font size for the plot labels.

Type:

int

plot_title_fontsize

Font size for the plot titles.

Type:

int

despine

Whether to remove the top and right plot axis spines.

Type:

bool

verbose

Whether to enable verbose logging.

Type:

bool

debug

Whether to enable debug logging.

Type:

bool

logger

Logger object for logging messages.

Type:

logging.Logger

boolean_filter_methods

List of boolean filter methods.

Type:

list

missing_filter_methods

List of missing data filter methods.

Type:

list

maf_filter_methods

List of MAF filter methods.

Type:

list

mpl_params

Default Matplotlib parameters for the plots.

Type:

dict

plot_sankey_filtering_report()

Plot a Sankey diagram for the filtering report.

plot_pca()

Plot a PCA scatter plot with 2 or 3 dimensions, colored by missing data proportions, and labeled by population with symbols for each sample.

plot_summary_statistics()

Plot summary statistics per sample and per population on the same figure. The summary statistics are plotted as lines for each statistic (Ho, He, Pi, Fst).

plot_dapc()

Plot a DAPC scatter plot with 2 or 3 dimensions, colored by population, and labeled by population with symbols for each sample.

plot_fst_heatmap()

Plot a heatmap of Fst values between populations, sorted by highest Fst and displaying only the lower triangle.

plot_fst_outliers()

Plot a heatmap of Fst values for outlier SNPs, highlighting contributing population pairs.

plot_d_statistics()

Create plots for D-statistics with multiple-testing corrections.

_set_logger()

Set the logger object based on the debug attribute. If debug is True, the logger will log debug messages.

_get_attribute_value()

Determine the value for an attribute based on the provided argument, genotype_data attribute, or default value. If a value is provided during initialization, it is used. Otherwise, the genotype_data attribute is used if available. If neither is available, the default value is used.

_plot_summary_statistics_per_sample()

Plot summary statistics per sample. If an axis is provided, the plot is drawn on that axis.

_plot_summary_statistics_per_population()

Plot summary statistics per population. If an axis is provided, the plot is drawn on that axis.

__init__(genotype_data, show=False, plot_format='png', dpi=300, plot_fontsize=18, plot_title_fontsize=22, despine=True, verbose=False, debug=False)

Initialize the Plotting class.

This class contains various methods for generating plots based on genotype data. The class is initialized with a GenotypeData object containing necessary data. The class attributes are set based on the provided values, the GenotypeData object, or default values.

Parameters:
  • genotype_data (GenotypeData) – Initialized GenotypeData object containing necessary data.

  • show (bool) – Whether to display the plots. Defaults to genotype_data.show if available, otherwise False.

  • plot_format (str) – The format in which to save the plots (e.g., ‘png’, ‘svg’). Defaults to genotype_data.plot_format if available, otherwise ‘png’.

  • dpi (int) – The resolution of the saved plots. Unused for vector plot_format types. Defaults to genotype_data.dpi if available, otherwise 300.

  • plot_fontsize (int) – The font size for the plot labels. Defaults to genotype_data.plot_fontsize if available, otherwise 18.

  • plot_title_fontsize (int) – The font size for the plot titles. Defaults to genotype_data.plot_title_fontsize if available, otherwise 22.

  • despine (bool) – Whether to remove the top and right plot axis spines. Defaults to genotype_data.despine if available, otherwise True.

  • verbose (bool) – Whether to enable verbose logging. Defaults to genotype_data.verbose if available, otherwise False.

  • debug (bool) – Whether to enable debug logging. Defaults to genotype_data.debug if available, otherwise False.
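As a rough illustration of how the verbose and debug flags above could translate into a logger configuration, here is a sketch using only the standard logging module. The make_logger helper and the mapping of verbose to INFO and of neither flag to WARNING are assumptions; the documentation only states that debug enables debug messages.

```python
import logging


def make_logger(name: str, verbose: bool = False, debug: bool = False) -> logging.Logger:
    """Hypothetical helper: pick a log level from the verbose/debug flags."""
    if debug:
        level = logging.DEBUG  # documented: debug=True logs debug messages
    elif verbose:
        level = logging.INFO  # assumption: verbose maps to INFO
    else:
        level = logging.WARNING  # assumption: quiet default

    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        logger.addHandler(handler)
    return logger


debug_logger = make_logger("plotting_sketch.debug", debug=True)
debug_logger.debug("debug messages are emitted when debug=True")
```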

Note

  • The show, plot_format, dpi, plot_fontsize, plot_title_fontsize, despine, verbose, and debug attributes are set based on the provided values, the genotype_data object, or default values.

  • The output_dir attribute is set to the prefix_output/nremover/plots directory if the genotype data was filtered before the Plotting class was initialized, or to the prefix_output/plots directory if it was not.

  • The mpl_params dictionary contains the default Matplotlib parameters for the plots, and the Matplotlib settings are updated with this dictionary.

  • The plotting object is used to set the attributes based on the provided values, the genotype_data object, or default values.
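The output_dir rule in the notes above can be sketched with pathlib. The resolve_plot_dir helper and its was_filtered flag are hypothetical names, and the sketch assumes that "prefix_output" means the prefix string followed by "_output"; the library's actual logic may differ.

```python
from pathlib import Path


def resolve_plot_dir(prefix: str, was_filtered: bool) -> Path:
    """Hypothetical helper mirroring the documented output_dir rule."""
    base = Path(f"{prefix}_output")  # assumption: "<prefix>_output"
    if was_filtered:
        # Filtered genotype data: plots go under the nremover subdirectory.
        return base / "nremover" / "plots"
    # Unfiltered genotype data: plots go directly under the output directory.
    return base / "plots"


filtered_dir = resolve_plot_dir("example", was_filtered=True)
unfiltered_dir = resolve_plot_dir("example", was_filtered=False)
```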

Methods

__init__(genotype_data[, show, plot_format, ...])

Initialize the Plotting class.

plot_allele_summary(summary[, figsize])

Plot allele summary statistics from summarize_alleles output.

plot_d_statistics(df, method)

Create and save D-statistics plots and MultiQC reports.

plot_d_statistics_heatmap(df[, method_name])

Plots a heatmap of D-statistics colored by -log10(P-value).

plot_dist_matrix(df, *[, pvals, palette, ...])

Plot distance matrix.

plot_dstat_chi_square_distribution(df, ...)

Plots the distribution of Chi-square values for D-statistics.

plot_dstat_pvalue_distribution(df, method_name)

Plots the distribution of -log10(P-values) for D-statistics.

plot_dstat_significance_counts(df, method_name)

Plots the number of significant results per D-statistic.

plot_fst_outliers(outlier_snps, method[, ...])

Create a heatmap of Fst values for outlier SNPs, highlighting contributing population pairs.

plot_gt_distribution(df[, annotation_size])

Plot the distribution of genotype counts.

plot_permutation_dist(obs_fst, dist, ...[, ...])

Plot the permutation distribution of Fst values.

plot_pop_counts(populations)

Plot the population counts.

plot_search_results(df_combined)

Plot and save the filtering results based on the available data.

plot_stacked_significance_barplot(df, ...)

Creates a stacked bar plot of significance categories.

plot_summary_statistics(summary_statistics)

Plot summary statistics per sample and per population.

visualize_missingness(df[, prefix, zoom, ...])

Visualize missing data across loci, individuals, and populations.
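Several of the D-statistics plots listed above are described in terms of -log10(P-value). The following sketch shows that transformation on plain Python floats. The neg_log10 helper, the floor used to avoid taking the logarithm of zero, and the 0.05 threshold are illustrative assumptions, not part of the SNPio API.

```python
import math


def neg_log10(p_values, floor=1e-300):
    """Return -log10(p) for each P-value, clamping tiny values at `floor`."""
    return [-math.log10(max(p, floor)) for p in p_values]


p_values = [0.05, 0.001, 1.0]
scores = neg_log10(p_values)

# Smaller P-values map to larger scores, so a significance cutoff such as
# P < 0.05 becomes "score above -log10(0.05)" on the transformed scale.
threshold = -math.log10(0.05)
significant = [score > threshold for score in scores]
```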