1. Recipe 1: Kmers parsing and counting

1.1. Data

Lightweight sample data for running the examples.

Click here to download

1.2. Description

  1. Create an empty kDataFrame with kmerSize = 21
  2. Parse a FASTQ file into the kDataFrame
  3. Print the first 10 kmers
  4. Save the kDataFrame to disk

1.3. Implementation

1.3.1. Importing

[1]:
import kProcessor as kp

1.3.2. Create an empty kDataFrame

[2]:
kf1 = kp.kDataFrameMQF(21)

1.3.3. Parse the FASTQ file into the kf1 kDataFrame

[10]:
# kp.parseSequencesFromFile(kDataFrame, mode, params, file_path, chunk size)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size" : 21}, "data/test.fastq", 1000)
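To make the "kmers" mode concrete, here is a minimal pure-Python sketch of sliding-window kmer counting over reads taken in chunks. It is not kProcessor's implementation: the helper names (read_fastq_sequences, chunked, count_kmers) and the toy data are hypothetical, the chunk size is assumed to mean the number of reads handled per batch, and each FASTQ record is assumed to span exactly four lines. The library may also differ in details such as how it treats reverse complements or ambiguous bases.

```python
from collections import Counter
from itertools import islice


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:  # line 2 of every record holds the bases
            yield line.strip().upper()


def chunked(iterable, chunk_size):
    """Yield lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_kmers(lines, k_size, chunk_size):
    """Count every k-length substring across all reads, one chunk at a time."""
    counts = Counter()
    for chunk in chunked(read_fastq_sequences(lines), chunk_size):
        for sequence in chunk:
            for start in range(len(sequence) - k_size + 1):
                kmer = sequence[start:start + k_size]
                if "N" not in kmer:  # assumption: skip windows with ambiguous bases
                    counts[kmer] += 1
    return counts


# Toy in-memory FASTQ text (hypothetical data, two reads).
toy_fastq = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTNACG", "+", "IIIIIIII",
]
toy_counts = count_kmers(toy_fastq, k_size=4, chunk_size=1)
print(toy_counts["ACGT"])  # prints 3
```

A read of length n contributes at most n - k + 1 kmers, so the 8-base toy reads above yield at most five 4-mers each.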

1.3.4. Iterate over the first 10 kmers

Note:

Call kDataFrameIterator.next() after reading each kmer. getKmer() does not advance the iterator, so without next() the loop would print the same kmer every time.

[11]:
it = kf1.begin()

for i in range(10):
    print(it.getKmer())
    it.next()
CCCAACAGAATTAAAAAGTCA
AAATTAAATAACTTTAGCGCA
CCAAATTACAACAAAATTTGG
TTAATCATTTGGTATAATTGC
ACCTCGTATAACTTCGTATAA
AACAATTCAACAGAGAAGGAC
AGGCTAATCGAACAAAACATC
AGGAAAAACTCCAGCCAGTAA
TACGGGTCGCAGTGACCAGGC
CCAGGTAGTACAGCAATCGTA
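The explicit-advance pattern from the note can be illustrated with a small stand-in class. This is a hypothetical sketch, not kProcessor code: ToyKmerIterator only mimics the getKmer()/next() calls used above, and at_end() is an invented helper that shows how a loop could stop early when fewer kmers exist than requested.

```python
class ToyKmerIterator:
    """Hypothetical stand-in for a cursor-style iterator (not kProcessor code)."""

    def __init__(self, kmers):
        self._kmers = list(kmers)
        self._position = 0

    def getKmer(self):
        # Reading never moves the cursor.
        return self._kmers[self._position]

    def next(self):
        # Only this call advances to the following kmer.
        self._position += 1

    def at_end(self):
        # Hypothetical helper for bounds checking.
        return self._position >= len(self._kmers)


def first_kmers(iterator, limit):
    """Collect up to `limit` kmers, stopping early if the data runs out."""
    collected = []
    while len(collected) < limit and not iterator.at_end():
        collected.append(iterator.getKmer())
        iterator.next()
    return collected


toy_kmers = ["AAA", "AAC", "ACG"]

# Without next(), every read returns the same kmer.
stuck = ToyKmerIterator(toy_kmers)
repeated = [stuck.getKmer() for _ in range(3)]

# With next() and a bound check, asking for 10 yields only the 3 available.
advanced = first_kmers(ToyKmerIterator(toy_kmers), limit=10)

print(repeated)  # prints ['AAA', 'AAA', 'AAA']
print(advanced)  # prints ['AAA', 'AAC', 'ACG']
```

The fixed range(10) loop in the cell above assumes the kDataFrame holds at least 10 kmers; a bound check like the one sketched here is worth considering for very small inputs.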

1.3.5. Save the kDataFrame to disk under the name "kf1"

[12]:
# This will save the file with the extension ".mqf"
kf1.save("kf1")

1.4. Complete Script

import kProcessor as kp

# Creating an empty kDataFrameMQF with kmer size 21
kf1 = kp.kDataFrameMQF(21)

# kp.parseSequencesFromFile(kDataFrame, mode, params, file_path, chunk size)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size" : 21}, "data/test.fastq", 1000)

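# Save the kDataFrame to disk; the file gets the extension ".mqf"
kf1.save("kf1")

# Print the first 10 kmers; next() moves the iterator to the next kmer
it = kf1.begin()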

for i in range(10):
    print(it.getKmer())
    it.next()