Iterator exercises
Iterable or not?
For each object below, answer:
Is it an iterable, an iterator, both, or neither? Can it be used in a for loop? Will it work with next?
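One way to check your answers is with the abstract base classes in collections.abc. The objects below are hypothetical stand-ins chosen for illustration, not the objects of the exercise, and the classify helper is an assumed name. Treat this as a sketch rather than a reference solution.

```python
from collections.abc import Iterable, Iterator


def classify(obj):
    """Describe obj as an iterator, a plain iterable, or neither."""
    # Every iterator is also an iterable, so test Iterator first.
    if isinstance(obj, Iterator):
        return "iterator (and iterable)"
    if isinstance(obj, Iterable):
        return "iterable only"
    return "neither"


# Hypothetical sample objects, used only for illustration.
samples = {
    "list": [1, 2, 3],
    "str": "ACGT",
    "range": range(3),
    "list iterator": iter([1, 2, 3]),
    "generator expression": (x for x in range(3)),
    "int": 42,
}

for name, obj in samples.items():
    print(f"{name}: {classify(obj)}")
```

Anything reported as iterable can be used in a for loop, but only iterators can be passed directly to next.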
Reusable or not?
Before running the code, predict the output and write down what you think will be printed.
Why do these two pieces of code behave differently?
What will this code print and why?
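As a point of comparison, the sketch below contrasts a list with a generator when each is consumed twice. The variable names and values are assumptions made for illustration, not the code of the exercise.

```python
numbers_list = [1, 2, 3]
numbers_gen = (n for n in [1, 2, 3])

# A list hands out a fresh iterator on every pass, so it can be reused.
first_list_pass = sum(numbers_list)
second_list_pass = sum(numbers_list)

# A generator is its own iterator: the first pass exhausts it,
# so the second pass sees no items at all.
first_gen_pass = sum(numbers_gen)
second_gen_pass = sum(numbers_gen)

print(first_list_pass, second_list_pass)
print(first_gen_pass, second_gen_pass)
```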
Even numbers
Write a generator function called even_numbers(n) that yields even numbers from 0 up to n (inclusive).
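One possible shape for this function, shown as a minimal sketch that assumes n is an integer:

```python
def even_numbers(n):
    """Yield the even numbers from 0 up to n, inclusive."""
    # range's stop is exclusive, so use n + 1 to include n itself.
    for number in range(0, n + 1, 2):
        yield number


print(list(even_numbers(10)))
```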
The Fibonacci generator
Create an infinite Fibonacci sequence generator.
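A minimal sketch of such a generator follows. Starting the sequence at 0 is an assumption, and itertools.islice is used only to take a finite slice of the infinite stream.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever, starting from 0."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Never call list() directly on an infinite generator; slice it first.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```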
DNA sequence generator
Create a generator of random DNA sequences.
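The exercise leaves the details open, so the sketch below makes some assumptions: every sequence has the same fixed length, the four bases are equally likely, and an optional seed makes the output reproducible. The name random_dna_sequences is hypothetical.

```python
import random
from itertools import islice


def random_dna_sequences(length, seed=None):
    """Yield random DNA strings of the given length, forever."""
    # A private Random instance keeps the generator independent
    # of the global random state.
    rng = random.Random(seed)
    while True:
        yield "".join(rng.choices("ACGT", k=length))


# Take only the first three sequences from the infinite stream.
for seq in islice(random_dna_sequences(length=12, seed=42), 3):
    print(seq)
```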
List vs generator
Rewrite this code using a generator expression:
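The general pattern looks like the sketch below. The sequences list is a hypothetical example, not the code of the exercise.

```python
# Hypothetical input data, used only for illustration.
sequences = ["ACGT", "GGCCTA", "AT"]

# Eager version: builds the whole list of lengths in memory first.
total_with_list = sum([len(seq) for seq in sequences])

# Lazy version: the generator expression feeds sum one length at a time.
total_with_generator = sum(len(seq) for seq in sequences)

print(total_with_list, total_with_generator)
```

Both versions give the same result; the generator expression simply avoids materializing the intermediate list.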
Use zip to pair lists
Use the zip function to pair the values in two lists, then use a for loop to print the name of each sample along with its gc_content.
Using zip and dict, create a dictionary from the same two lists, with the sample names as keys.
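Both steps might look like the sketch below. The sample names and GC values are invented for illustration.

```python
# Hypothetical data: the lists of the exercise may differ.
sample_names = ["sample_a", "sample_b", "sample_c"]
gc_contents = [0.41, 0.52, 0.38]

# zip yields one (name, gc_content) pair per position.
for name, gc_content in zip(sample_names, gc_contents):
    print(f"{name}: {gc_content}")

# dict accepts any iterable of (key, value) pairs, including a zip object.
gc_by_sample = dict(zip(sample_names, gc_contents))
print(gc_by_sample)
```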
Divide the VCF parser
We are parsing a VCF file and we want a numeric index for each variant. This is our first attempt:
This code works but it has several issues.
- It is creating a list in memory with all the variants.
- The read_and_index_variants function does two tasks: parsing the variants and indexing them.
It would be better to create a generator that only parses the VCF file and to index the variants outside that generator. Moreover, a generator yields one variant at a time, so the whole file never has to be held in memory.
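The split described above could be sketched as follows. The parse_vcf name, the dictionary layout of each variant, and the tiny in-memory VCF text are all assumptions made for illustration; a real parser would handle more columns and edge cases.

```python
from io import StringIO

# Hypothetical, heavily simplified VCF content.
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\t.
chr1\t250\t.\tC\tT\t60\tPASS\t.
chr2\t75\t.\tG\tA\t40\tPASS\t.
"""


def parse_vcf(lines):
    """Yield one variant dict per data line; parsing only, no indexing."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information (##) and the header (#CHROM).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield {
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alt": fields[4],
        }


# Indexing happens outside the parser; enumerate is lazy as well.
for index, variant in enumerate(parse_vcf(StringIO(vcf_text))):
    print(index, variant)
```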
Count the number of variants per chromosome
Use the parser created in the last exercise to count the number of variants per chromosome.
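collections.Counter accepts any iterable, so it can consume a generator directly. In the sketch below, parse_vcf is a reduced stand-in for your own parser and the VCF text is hypothetical.

```python
from collections import Counter
from io import StringIO

# Hypothetical, heavily simplified VCF content.
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\t.
chr1\t250\t.\tC\tT\t60\tPASS\t.
chr2\t75\t.\tG\tA\t40\tPASS\t.
"""


def parse_vcf(lines):
    """Stand-in parser: yield (chrom, pos) for each data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        yield chrom, int(pos)


# The generator expression extracts the chromosome of each variant lazily.
variants_per_chrom = Counter(
    chrom for chrom, _pos in parse_vcf(StringIO(vcf_text))
)
print(variants_per_chrom)
```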
Fix the bug
The following code is supposed to compute the mean of some values. What is wrong with this code? Fix it without changing the input data.
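A frequent bug of this kind, given the theme of these exercises, is consuming an iterator twice, for example once to sum the values and once to count them. Assuming that is the problem here (the code of the exercise may differ), a single-pass mean avoids it without touching the input. The mean function below is a hypothetical sketch.

```python
def mean(values):
    """Compute the mean in a single pass, so one-shot iterators work too."""
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("mean requires at least one value")
    return total / count


# Hypothetical input: an iterator that can only be traversed once.
values = iter([2, 4, 6, 8])
result = mean(values)
print(result)
```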
Write a fasta file parser that yields one sequence at a time
Filter short sequences and calc GC content
Filter the sequences generated by the fasta parser: remove the ones whose length is below a threshold, then calculate the mean GC content of the remaining ones.
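The pipeline can stay lazy from end to end. The sketch below assumes the parser yields (name, sequence) tuples and uses a small hypothetical iterator in its place; the function names and the min_len parameter are also assumptions.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_by_length(records, min_len):
    """Lazily keep the records whose sequence has at least min_len bases."""
    return (record for record in records if len(record[1]) >= min_len)


def mean_gc(records):
    """Mean GC content over the records in one pass; None if there are none."""
    total = 0.0
    count = 0
    for _name, seq in records:
        total += gc_content(seq)
        count += 1
    return total / count if count else None


# Hypothetical records standing in for the output of the fasta parser.
records = iter([("seq1", "ACGTACGT"), ("seq2", "GGCC"), ("seq3", "ATATATGC")])
long_records = filter_by_length(records, min_len=8)
result = mean_gc(long_records)
print(result)
```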
Sliding Window (The k-mer Generator)
In bioinformatics, we often analyze “k-mers” (subsequences of length k). Write a generator function generate_kmers(sequence, k) that takes a DNA string and an integer k and yields every possible k-mer as you slide along the sequence. Use it to analyze a whole fasta file and print the final k-mer counts.
    from io import StringIO
    from collections import Counter

    fasta_lines = """>seq1
    ACTGTGCGTCTAGCTAGCTG
    >seq2
    CTAGCTAGTGCTGATGCTGAT
    CGTACTAGTCTA
    >seq3
    CAGTCTGATCTAGCGT"""


    def parse_fasta(file):
        # Yield one (name, sequence) tuple per record.
        seq = None
        for line in file:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                # A new header closes the previous record, if there is one.
                if seq:
                    yield tuple(seq)
                name = line.split()[0][1:]
                seq = [name, ""]
            else:
                seq[1] += line
        # Yield the last record once the file is exhausted.
        if seq:
            yield tuple(seq)


    def generate_kmers(seq, kmer_len):
        # seq is a (name, sequence) record as yielded by parse_fasta.
        seq = seq[1]
        return (
            seq[i : i + kmer_len]
            for i in range(len(seq) - kmer_len + 1)
        )


    kmer_len = 5
    file = StringIO(fasta_lines)
    seqs = parse_fasta(file)
    # Chain the k-mers of every record into a single lazy stream.
    kmers = (kmer for seq in seqs for kmer in generate_kmers(seq, kmer_len))
    kmer_counts = Counter(kmers)
    print(kmer_counts)