genetic_patterns Module
Genetic pattern matching and filtering.
Overview
The genetic_patterns module implements the syntax and logic for matching genotypes and haploid genomes against symbolic patterns.
Complete Module Reference
natal.genetic_patterns
Genetic pattern matching system for genotypes and haploid genomes.
This module provides regex-like pattern matching for genetic sequences:
- PatternElement: Base class for allele-level matching
- HaplotypePath: Pattern for a single DNA strand of one chromosome
- ChromosomePairPattern: Pattern for a pair of homologous chromosomes
- GenotypePattern: Pattern for a complete diploid genotype
- HaploidGenomePattern: Pattern for a complete haploid genome
- GenotypePatternParser: Parser for pattern syntax strings
- GenotypeSelector: Unified genotype selector for observation/filtering
PatternParseError
Bases: Exception
Error raised during genotype pattern parsing.
PatternElement
Bases: ABC
Base class for all pattern elements representing allele-level matching.
matches
abstractmethod
Check if a single allele matches this pattern element.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | Optional[Gene] | The Gene object to match, or None. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the gene matches this pattern element. |
Source code in src/natal/genetic_patterns.py
AllelePattern
Bases: PatternElement
Exact match for a single allele name.
WildcardPattern
SetPattern
Bases: PatternElement
Set pattern - matches alleles in a set, with optional negation.
Initialize a set pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alleles | Set[str] | Set of allele names to match. | required |
| negate | bool | If True, match alleles NOT in this set. | False |
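The element types differ only in the predicate they apply to a single allele. The following is a minimal standalone sketch of that idea, not the natal implementation: it matches plain allele-name strings rather than Gene objects, the function names are hypothetical, and the handling of a missing allele (None) is an assumption.

```python
from typing import Callable, Optional, Set

# Hypothetical stand-in: alleles are plain strings here, not natal Gene objects.
Matcher = Callable[[Optional[str]], bool]

def allele(name: str) -> Matcher:
    """Exact match on one allele name."""
    return lambda a: a == name

def wildcard() -> Matcher:
    """Match any allele that is present (assumption: None does not match)."""
    return lambda a: a is not None

def allele_set(names: Set[str], negate: bool = False) -> Matcher:
    """Match alleles in `names`, or alleles outside it when negate=True."""
    def match(a: Optional[str]) -> bool:
        if a is None:
            return False  # assumption: a missing allele never matches a set
        return (a in names) != negate
    return match

in_ab = allele_set({"A", "B"})
not_a = allele_set({"A"}, negate=True)
print(in_ab("A"), in_ab("C"))   # True False
print(not_a("C"), not_a("A"))   # True False
```

Negation is expressed as a single comparison against the membership test, so the same closure serves both the plain and the negated set.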
LocusPattern
LocusPattern(maternal_pattern: PatternElement, paternal_pattern: PatternElement, unordered: bool = False)
Pattern for a single locus (two homologous chromosomes).
Initialize a locus pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| maternal_pattern | PatternElement | PatternElement for maternal allele. | required |
| paternal_pattern | PatternElement | PatternElement for paternal allele. | required |
| unordered | bool | If True, use :: ordering (match either maternal\|paternal or paternal\|maternal). | False |
matches
Check if a pair of alleles matches this locus pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mat_gene | Optional[Gene] | Maternal allele. | required |
| pat_gene | Optional[Gene] | Paternal allele. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the allele pair matches. |
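The difference between ordered and unordered (::) matching at a locus can be illustrated with a short standalone sketch. It is an assumption-laden stand-in, not the library's code: alleles are plain strings, the element patterns are ordinary predicates, and `locus_matches` is a hypothetical name.

```python
from typing import Callable, Optional

Matcher = Callable[[Optional[str]], bool]

def locus_matches(mat: Matcher, pat: Matcher,
                  mat_allele: Optional[str], pat_allele: Optional[str],
                  unordered: bool = False) -> bool:
    """Try the ordered assignment first; if unordered, also try the swap."""
    if mat(mat_allele) and pat(pat_allele):
        return True
    return unordered and mat(pat_allele) and pat(mat_allele)

def is_A(a: Optional[str]) -> bool:
    return a == "A"

def is_a(a: Optional[str]) -> bool:
    return a == "a"

# Pattern A|a against the genotype a|A
print(locus_matches(is_A, is_a, "a", "A"))                  # False
print(locus_matches(is_A, is_a, "a", "A", unordered=True))  # True
```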
HaplotypePath
Pattern for a single Haplotype (one copy of a pair of homologous chromosomes).
Initialize a haplotype pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| locus_patterns | Sequence[PatternElement] | Sequence of PatternElement for each locus in order. Each PatternElement matches a single allele at that locus. | required |
matches
Check if a haplotype matches this pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| haplotype | Haplotype | The Haplotype to match. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if all loci match. |
to_filter
Convert to a filter function.
Returns:
| Type | Description |
|---|---|
| Callable[[Haplotype], bool] | A callable that takes a Haplotype and returns bool. |
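A haplotype pattern is an ordered list of element patterns, one per locus, and to_filter exposes the match as a plain predicate. The sketch below shows that shape under stated assumptions: a haplotype is modeled as a tuple of allele names, `haplotype_filter` is a hypothetical helper, and treating a length mismatch as a non-match is a guess rather than documented behavior.

```python
from typing import Callable, Optional, Sequence

Matcher = Callable[[Optional[str]], bool]

def haplotype_filter(locus_patterns: Sequence[Matcher]) -> Callable[[Sequence[str]], bool]:
    """Build a predicate that is True when every locus matches its element."""
    def matches(haplotype: Sequence[str]) -> bool:
        # Assumption: a haplotype with a different number of loci does not match.
        if len(haplotype) != len(locus_patterns):
            return False
        return all(p(a) for p, a in zip(locus_patterns, haplotype))
    return matches

# Roughly "A1/*": A1 at the first locus, anything at the second.
keep = haplotype_filter([lambda a: a == "A1", lambda a: a is not None])

haplotypes = [("A1", "B1"), ("A2", "B1"), ("A1", "B2")]
selected = [h for h in haplotypes if keep(h)]
print(selected)  # [('A1', 'B1'), ('A1', 'B2')]
```

Returning a closure keeps the pattern usable anywhere a predicate is expected, for example in a list comprehension or the built-in filter().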
ChromosomePairPattern
ChromosomePairPattern(maternal_pattern: HaplotypePath, paternal_pattern: HaplotypePath, unordered: bool = False, explicit_grouping: bool = False)
Pattern for a pair of homologous chromosomes.
Initialize a chromosome pair pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| maternal_pattern | HaplotypePath | HaplotypePath for maternal haplotype. | required |
| paternal_pattern | HaplotypePath | HaplotypePath for paternal haplotype. | required |
| unordered | bool | If True, use :: ordering (match either order). | False |
| explicit_grouping | bool | If True, this pattern was explicitly grouped with (). | False |
matches
Check if a pair of haplotypes (one chromosome pair) matches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| haplotype_pair | Tuple[Haplotype, Haplotype] | Tuple of (maternal_haplotype, paternal_haplotype). | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the haplotype pair matches. |
GenotypePattern
Complete genotype pattern matching multiple chromosomes.
Initialize a complete genotype pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chromosome_patterns | List[Optional[ChromosomePairPattern]] | List of ChromosomePairPattern (or None for omitted chromosomes). None means that chromosome is not constrained by the pattern. | required |
matches
Check if a genotype matches this pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genotype | Genotype | The Genotype to match. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the genotype matches all specified chromosome patterns. |
to_filter
Convert to a filter function for use in rules.
Returns:
| Type | Description |
|---|---|
| Callable[[Genotype], bool] | A callable that takes a Genotype and returns bool. |
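The role of None entries in chromosome_patterns can be sketched as follows. This is a simplified illustration, not natal's code: each chromosome is reduced to a single (maternal, paternal) allele pair, and `genotype_matches`, `to_filter` and `is_het` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical simplification: one (maternal, paternal) allele pair per chromosome.
Pair = Tuple[str, str]
PairMatcher = Callable[[Pair], bool]

def genotype_matches(patterns: List[Optional[PairMatcher]],
                     genotype: Sequence[Pair]) -> bool:
    """True when every constrained chromosome matches; None entries are skipped."""
    return all(p is None or p(pair) for p, pair in zip(patterns, genotype))

def to_filter(patterns: List[Optional[PairMatcher]]) -> Callable[[Sequence[Pair]], bool]:
    """Wrap the pattern list as a reusable predicate."""
    return lambda genotype: genotype_matches(patterns, genotype)

def is_het(pair: Pair) -> bool:
    """Heterozygous A/a in either order."""
    return set(pair) == {"A", "a"}

keep = to_filter([is_het, None])          # second chromosome left unconstrained
print(keep([("A", "a"), ("B", "b")]))     # True
print(keep([("A", "A"), ("B", "b")]))     # False
```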
HaploidGenomePattern
Pattern for a complete HaploidGenome (one DNA strand of an individual).
Initialize a haploid genome pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| haplotype_patterns | List[Optional[HaplotypePath]] | List of HaplotypePath for each chromosome. None means that chromosome is not constrained. | required |
matches
Check if a haploid genome matches this pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| haploid_genome | HaploidGenome | The HaploidGenome to match. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the haploid genome matches all specified patterns. |
to_filter
Convert to a filter function.
Returns:
| Type | Description |
|---|---|
| Callable[[HaploidGenome], bool] | A callable that takes a HaploidGenome and returns bool. |
GenotypePatternParser
Parses genotype pattern strings into GenotypePattern objects.
Initialize parser for a specific species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | Species | The Species object to use for validation and context. | required |
parse
Parse a pattern string into a GenotypePattern.
Supported syntax includes:
- ; separates chromosomes (outside parentheses)
- | separates maternal (left) and paternal (right)
- / separates loci within a chromosome
- * matches any allele
- {A,B,C} matches any allele in the set
- !A matches any allele except A
- :: matches unordered pair (A::B matches A|B or B|A)
- () groups loci within a chromosome
- , or ; inside () separates loci
- Omitted chromosomes default to wildcard matching (optional)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern_str | str | The pattern string to parse. | required |
Returns:
| Type | Description |
|---|---|
| GenotypePattern | A GenotypePattern object. |
Raises:
| Type | Description |
|---|---|
| PatternParseError | If the pattern is invalid. |
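To make the element-level syntax concrete, here is a small standalone sketch that parses a single-locus string such as A|a or A::a into a predicate. It covers only *, {A,B,C}, !A, | and ::, works on plain strings, and raises ValueError where the real parser is documented to raise PatternParseError; `parse_element` and `parse_locus` are hypothetical names, not part of natal.

```python
import re
from typing import Callable

Matcher = Callable[[str], bool]

def parse_element(token: str) -> Matcher:
    """Turn one allele token into a predicate on an allele name."""
    token = token.strip()
    if token == "*":
        return lambda a: True
    if token.startswith("!"):
        banned = token[1:].strip()
        return lambda a: a != banned
    if token.startswith("{") and token.endswith("}"):
        names = {n.strip() for n in token[1:-1].split(",")}
        return lambda a: a in names
    if not re.fullmatch(r"\w+", token):
        raise ValueError(f"invalid allele token: {token!r}")
    return lambda a: a == token

def parse_locus(text: str) -> Callable[[str, str], bool]:
    """Parse 'X|Y' (ordered) or 'X::Y' (unordered) for a single locus."""
    unordered = "::" in text
    left, sep, right = text.partition("::" if unordered else "|")
    if not sep:
        raise ValueError(f"expected '|' or '::' in {text!r}")
    m, p = parse_element(left), parse_element(right)

    def matches(mat: str, pat: str) -> bool:
        if m(mat) and p(pat):
            return True
        return unordered and m(pat) and p(mat)
    return matches

print(parse_locus("A|a")("a", "A"))      # False
print(parse_locus("A::a")("a", "A"))     # True
print(parse_locus("*|{A,B}")("x", "B"))  # True
```

Chromosome separators, locus separators and parenthesized groups are left out here to keep the example short.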
parse_haplotype_pattern
Parse a complete haplotype pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern_str | str | Pattern string for a single haplotype (e.g., "A1/B1; C1") | required |
Returns:
| Type | Description |
|---|---|
| HaplotypePath | HaplotypePath object with all loci patterns combined. |
Raises:
| Type | Description |
|---|---|
| PatternParseError | If the pattern is invalid. |
parse_haploid_genome_pattern
Parse a haploid genome pattern (single DNA strand of individual).
For haploid genomes:
- ; at top level separates different chromosomes
- () brackets represent a single haplotype (one DNA strand)
- Inside brackets, ; separates different loci on that strand
- / is not used inside brackets for haploid (it's only for diploid)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern_str | str | Pattern string (e.g., "A1/B1; C1" or "(A1; B1); C1") | required |
Returns:
| Type | Description |
|---|---|
| HaploidGenomePattern | HaploidGenomePattern object. |
Raises:
| Type | Description |
|---|---|
| PatternParseError | If the pattern is invalid. |
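The rule that ; separates chromosomes only at the top level, while ; inside () separates loci, amounts to a depth-aware split. The following sketch shows one way to do that split; `split_top_level` is a hypothetical helper, and it raises ValueError for unbalanced parentheses where the library is documented to raise PatternParseError.

```python
from typing import List

def split_top_level(pattern: str, sep: str = ";") -> List[str]:
    """Split on `sep` only where the parenthesis depth is zero."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '('")
    parts.append("".join(current).strip())
    return parts

print(split_top_level("(A1; B1); C1"))  # ['(A1; B1)', 'C1']
print(split_top_level("A1/B1; C1"))     # ['A1/B1', 'C1']
```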
get_allowed_alleles
Get all allowed allele names for a pattern element.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern_element | PatternElement | The PatternElement to analyze. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of allowed allele names. |
GenotypeSelector
Unified genotype selector for observation and filtering.
This class provides a unified interface for selecting genotypes using various input formats, leveraging the existing pattern matching system.
Initialize genotype selector for a specific species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | Species | The Species object to use for pattern parsing. | required |
resolve_genotype_indices
resolve_genotype_indices(gen_spec: Optional[Iterable[Any]], diploid_genotypes: Optional[Sequence[Any]], unordered: bool = False) -> List[int]
Resolve genotype selectors into a list of indices.
This method provides the same functionality as observation.py's _resolve_genotype_list() but uses the pattern matching system.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gen_spec | Optional[Iterable[Any]] | Genotype selector specification. One of: None (select all genotypes), an int (genotype index), a str (genotype pattern string), a Genotype object, or an iterable of any of these. | required |
| diploid_genotypes | Optional[Sequence[Any]] | Sequence of diploid genotypes for resolution. | required |
| unordered | bool | Whether to treat genotypes as unordered (A\|a == a\|A). | False |
Returns:
| Type | Description |
|---|---|
| List[int] | List of resolved genotype indices. |
Raises:
| Type | Description |
|---|---|
| ValueError | If diploid_genotypes is required but missing. |
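Resolving a mixed selector list into indices can be pictured with the standalone sketch below. Several things in it are assumptions made for illustration only: genotypes are plain strings, pattern strings are matched with the standard library's fnmatchcase as a stand-in for the real pattern parser (so only * is supported), Genotype objects are not handled, and returning sorted, de-duplicated indices is a guess about the ordering. `resolve_indices` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Any, Iterable, List, Optional, Sequence

def resolve_indices(gen_spec: Optional[Iterable[Any]],
                    genotypes: Optional[Sequence[str]]) -> List[int]:
    """Resolve None, ints and pattern strings into genotype indices."""
    if genotypes is None:
        # Assumption: this sketch always needs the genotype list.
        raise ValueError("genotypes are required to resolve selectors")
    if gen_spec is None:
        return list(range(len(genotypes)))      # None selects everything
    if isinstance(gen_spec, (int, str)):
        gen_spec = [gen_spec]                   # accept a bare selector
    selected = set()
    for sel in gen_spec:
        if isinstance(sel, int):
            if not 0 <= sel < len(genotypes):
                raise IndexError(f"genotype index out of range: {sel}")
            selected.add(sel)
        elif isinstance(sel, str):
            # Stand-in for pattern parsing: shell-style match on the string form.
            selected.update(i for i, g in enumerate(genotypes) if fnmatchcase(g, sel))
        else:
            raise TypeError(f"unsupported selector: {sel!r}")
    return sorted(selected)

genotypes = ["A|A", "A|a", "a|A", "a|a"]
print(resolve_indices(None, genotypes))         # [0, 1, 2, 3]
print(resolve_indices([3, "A|*"], genotypes))   # [0, 1, 3]
```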
create_filter_function
create_filter_function(gen_spec: Optional[Iterable[Any]], unordered: bool = False) -> Callable[[Any], bool]
Create a filter function for genotype selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gen_spec | Optional[Iterable[Any]] | Genotype selector specification. | required |
| unordered | bool | Whether to use unordered matching. | False |
Returns:
| Type | Description |
|---|---|
| Callable[[Any], bool] | A callable that takes a genotype and returns True if it matches. |
get_pattern_for_selector
Convert a selector to a GenotypePattern if possible.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selector | Any | Genotype selector. | required |
Returns:
| Type | Description |
|---|---|
| Optional[GenotypePattern] | GenotypePattern if selector can be converted, None otherwise. |