3. Recipe 3: Play with kDataFrame

3.1. Description

In Recipe 1 we saved a kDataFrame to disk with the file name kf1.mqf. In this recipe we build a small kDataFrame in memory instead and manipulate it directly:

  1. Create a kDataFrame with kmerSize = 21
  2. Insert some kmers with chosen counts
  3. Query each kmer and check its count
  4. Insert a pre-existing kmer again and check its count
  5. Erase a kmer and check its count
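The steps above can be modeled without kProcessor installed. The sketch below uses a hypothetical pure-Python stand-in, `ToyKDataFrame`, which is not part of kProcessor. It assumes that `insert(kmer, count)` adds `count` to any existing count, that `count()` returns 0 for absent kmers, and that `size()` returns the number of distinct kmers. These assumptions match how the recipe uses the real API, but they do not describe the MQF implementation.

```python
from collections import Counter


class ToyKDataFrame:
    """Hypothetical dict-backed stand-in for a kDataFrame (NOT the kProcessor API)."""

    def __init__(self, kmer_size):
        self.kmer_size = kmer_size
        self._counts = Counter()

    def insert(self, kmer, count=1):
        # Assumed semantics: add `count` occurrences, creating the kmer if absent
        if len(kmer) != self.kmer_size:
            raise ValueError(f"expected a {self.kmer_size}-mer, got length {len(kmer)}")
        self._counts[kmer] += count
        return True

    def count(self, kmer):
        # Counter returns 0 for missing keys without storing them
        return self._counts[kmer]

    def erase(self, kmer):
        # Remove the kmer entirely; report whether it was present
        return self._counts.pop(kmer, None) is not None

    def size(self):
        # Number of distinct kmers currently stored
        return len(self._counts)


kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
kf = ToyKDataFrame(21)                          # create with kmerSize = 21
kf.insert(kmers[0], 1)                          # insert with explicit counts
kf.insert(kmers[1], 10)
print(kf.count(kmers[0]), kf.count(kmers[1]))   # → 1 10
kf.insert(kmers[0])                             # insert a pre-existing kmer again
print(kf.count(kmers[0]))                       # → 2
kf.erase(kmers[0])                              # erase it
print(kf.count(kmers[0]), kf.size())            # → 0 1
```

The length check in `insert` mirrors the fixed kmerSize passed at construction. It is an illustrative guard, not a claim about how kProcessor handles mismatched lengths.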

3.2. Implementation

3.2.1. Importing

[1]:
import kProcessor as kp

3.2.2. Kmers List

[2]:
kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
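Both kmers are 21 bases long, which matches the kmerSize = 21 used below. The description also mentions random kmers. A sketch that generates random 21-mers with random counts, using only Python's standard `random` module, could look like this. The helpers `random_kmer` and `is_valid_kmer` are hypothetical and are not kProcessor functions.

```python
import random

ALPHABET = "ACGT"


def random_kmer(k, rng):
    """Return a random DNA string of length k."""
    return "".join(rng.choice(ALPHABET) for _ in range(k))


def is_valid_kmer(kmer, k):
    """True if kmer has length k and contains only A, C, G, T."""
    return len(kmer) == k and set(kmer) <= set(ALPHABET)


rng = random.Random(42)  # fixed seed so the run is reproducible
k = 21

# Map each random kmer to a random count between 1 and 100
random_kmers = {random_kmer(k, rng): rng.randint(1, 100) for _ in range(5)}

kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
print(all(is_valid_kmer(km, k) for km in kmers))         # → True
print(all(is_valid_kmer(km, k) for km in random_kmers))  # → True
```

Each key/count pair could then be passed to `kf.insert(kmer, count)` as in the insertion cell of this recipe.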

3.2.3. Instantiate an object from the class kDataFrameMQF

[3]:
kf = kp.kDataFrameMQF(21)  # Empty kDataFrame with kmerSize = 21

3.3. Insert the first kmer with count 1 and the second kmer with count 10

[5]:
print("Inserting 2 kmers")
kf.insert(kmers[0], 1)
kf.insert(kmers[1], 10)
Inserting 2 kmers
[5]:
True
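The cell above only shows the return value of the last `insert` call. The complete script at the end of this recipe reads the counts back with `kf.count(...)`. Assuming exact counting, the expected state can be modeled with a plain `collections.Counter`. This is a stand-in for illustration, not kProcessor.

```python
from collections import Counter

kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]

# Counter stands in for the kDataFrame: kmer -> count
kf_model = Counter()
kf_model[kmers[0]] += 1   # mirrors kf.insert(kmers[0], 1)
kf_model[kmers[1]] += 10  # mirrors kf.insert(kmers[1], 10)

print(f"Kmer1 Count: {kf_model[kmers[0]]}")  # → Kmer1 Count: 1
print(f"Kmer2 Count: {kf_model[kmers[1]]}")  # → Kmer2 Count: 10
print(f"kf size: {len(kf_model)}")           # → kf size: 2
```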

3.3.2. Insert a duplicate kmer without a count

[7]:
print("[*] Inserting kmer 1 again")
kf.insert(kmers[0])
[*] Inserting kmer 1 again
[7]:
True
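As we understand kProcessor's documented behavior, `insert(kmer)` without a count inserts the kmer once, or increments its count by one if the kmer already exists. Under that assumption, kmer1's count should go from 1 to 2, and the number of distinct kmers should not change. The `Counter` model below reproduces this. It is a stand-in, not the library.

```python
from collections import Counter

kmer1 = "CTGACTACTCAGAGCTAGCCT"

kf_model = Counter({kmer1: 1})  # state after kf.insert(kmers[0], 1)
kf_model[kmer1] += 1            # mirrors kf.insert(kmers[0]) with an assumed default count of 1

print(f"Kmer1 Count: {kf_model[kmer1]}")  # → Kmer1 Count: 2
print(f"kf size: {len(kf_model)}")        # → kf size: 1 (a duplicate adds no new entry)
```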

3.3.4. Erase kmer1 from the kDataFrame

[9]:
print("[*] Erasing kmer1")
kf.erase(kmers[0])
[*] Erasing kmer1
[9]:
True
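After `kf.erase(kmers[0])`, the complete script prints kmer1's count and the frame size again. Assume that `erase` removes the kmer entirely and that `size()` counts distinct kmers. Under those assumptions the expected values are a count of 0 and a size of 1. The `Counter` sketch below models this behavior and is not the kProcessor implementation.

```python
from collections import Counter

kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]

# State before erasing: kmer1 inserted twice in total, kmer2 with count 10
kf_model = Counter({kmers[0]: 2, kmers[1]: 10})

del kf_model[kmers[0]]  # mirrors kf.erase(kmers[0])

print(f"Kmer1 Count: {kf_model[kmers[0]]}")  # → Kmer1 Count: 0 (missing keys read as 0)
print(f"kf size: {len(kf_model)}")           # → kf size: 1
```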

3.4. Complete Script

import kProcessor as kp

kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
kf = kp.kDataFrameMQF(21)  # Empty kDataFrame with kmerSize = 21
print(f"kf size: {kf.size()}")

# Insert kmer1 with count 1 and kmer2 with count 10
print("Inserting 2 kmers")
kf.insert(kmers[0], 1)
kf.insert(kmers[1], 10)

print(f"kf size: {kf.size()}")

# Query the count of each kmer
print(f"Kmer1 Count: {kf.count(kmers[0])}")
print(f"Kmer2 Count: {kf.count(kmers[1])}")

# Insert kmer1 again without a count; this should add one occurrence
print("[*] Inserting kmer 1 again")
kf.insert(kmers[0])

print(f"Kmer1 Count: {kf.count(kmers[0])}")

# Erase kmer1, then check its count and the kDataFrame size
print("[*] Erasing kmer1")
kf.erase(kmers[0])

print(f"Kmer1 Count: {kf.count(kmers[0])}")
print(f"kf size: {kf.size()}")