1. Recipe 1: Kmers parsing and counting
1.2. Description
- Create an empty kDataFrame with kmerSize = 21
- Load a fastq file into a kDataFrame
- Save the kDataFrame on disk
1.3. Implementation
1.3.1. Importing
[1]:
import kProcessor as kp
1.3.2. Create an empty kDataFrame
[2]:
kf1 = kp.kDataFrameMQF(21)
1.3.3. Parse the fastq file into the kf1 kDataFrame
[10]:
# kp.parseSequencesFromFile(kDataFrame, mode, params, file_path, chunk_size)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size" : 21}, "data/test.fastq", 1000)
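To make the idea behind the "kmers" mode concrete, here is a minimal pure-Python sketch of kmer counting over sequences processed in batches. It is an illustration only, not kProcessor's implementation: the helper names `chunks` and `count_kmers` are hypothetical, the chunk size is assumed to mean the number of sequences handled per batch, and details such as canonical kmers or the handling of non-ACGT characters are not modeled.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def count_kmers(sequences, k, chunk_size):
    """Count every length-k substring, reading the sequences batch by batch."""
    counts = Counter()
    for batch in chunks(sequences, chunk_size):
        for seq in batch:
            # A sequence of length n contains n - k + 1 overlapping kmers.
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGTACGT", "CGTA"], k=4, chunk_size=1)
print(counts["CGTA"])  # 2
print(len(counts))     # 4
```

In this sketch the chunk size only bounds how many sequences are held in memory at once; the resulting counts are the same for any chunk size.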
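1.3.4. Iterating over the first 10 kmers
Note:
Call kDataFrameIterator.next() after reading each kmer to advance the iterator to the next kmer position. Without it, the iterator stays where it is and the loop prints the same kmer every time.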
[11]:
it = kf1.begin()
for i in range(10):
    print(it.getKmer())
    it.next()
CCCAACAGAATTAAAAAGTCA
AAATTAAATAACTTTAGCGCA
CCAAATTACAACAAAATTTGG
TTAATCATTTGGTATAATTGC
ACCTCGTATAACTTCGTATAA
AACAATTCAACAGAGAAGGAC
AGGCTAATCGAACAAAACATC
AGGAAAAACTCCAGCCAGTAA
TACGGGTCGCAGTGACCAGGC
CCAGGTAGTACAGCAATCGTA
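The role of next() can be illustrated with a toy cursor-style iterator in plain Python. `ToyKmerCursor` is a hypothetical stand-in written for this example and is not part of kProcessor; it only mirrors the getKmer()/next() calling pattern used above.

```python
class ToyKmerCursor:
    """Hypothetical cursor over a list of kmers; not kProcessor code."""

    def __init__(self, kmers):
        self._kmers = list(kmers)
        self._pos = 0

    def getKmer(self):
        # Reading does not move the cursor.
        return self._kmers[self._pos]

    def next(self):
        # Only an explicit call advances to the next position.
        self._pos += 1


kmers = ["AAA", "AAC", "ACG"]

# Advancing after each read visits every kmer once.
it = ToyKmerCursor(kmers)
with_next = []
for _ in range(3):
    with_next.append(it.getKmer())
    it.next()

# Forgetting next() reads the same position again and again.
it = ToyKmerCursor(kmers)
without_next = [it.getKmer() for _ in range(3)]

print(with_next)     # ['AAA', 'AAC', 'ACG']
print(without_next)  # ['AAA', 'AAA', 'AAA']
```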
1.3.5. Save the kDataFrame on disk under the name “kf1”
[12]:
# This will save the file with the extension ".mqf"
kf1.save("kf1")
1.4. Complete Script
import kProcessor as kp
# Creating an empty kDataFrameMQF with kmer size 21
kf1 = kp.kDataFrameMQF(21)
# kp.parseSequencesFromFile(kDataFrame, mode, params, file_path, chunk_size)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size" : 21}, "data/test.fastq", 1000)
kf1.save("kf1")
it = kf1.begin()
for i in range(10):
    print(it.getKmer())
    it.next()