3. Recipe 3: Play with kDataFrame¶
3.1. Description¶
In Recipe 1 we saved a kDataFrame to disk under the file name kf1.mqf. This recipe instead starts from an empty kDataFrame and works through the following steps:
- Create kDataFrame with kmerSize = 21
- Insert some random kmers with random counts
- Query by each kmer and check the count
- Insert an already-existing kmer and check its count
- Erase a kmer and check its count and the kDataFrame size
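The walkthrough below uses two fixed kmers, but the "random kmers with random counts" step can be sketched in plain Python. This is a minimal illustration that does not touch kProcessor; the names `random_kmer` and `random_kmer_counts`, the seed, and the count range are assumptions made for the example.

```python
import random

KMER_SIZE = 21  # matches kmerSize in this recipe


def random_kmer(rng, size=KMER_SIZE):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(size))


def random_kmer_counts(n, seed=0, max_count=100):
    """Map n distinct random kmers to random counts in [1, max_count]."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    counts = {}
    while len(counts) < n:  # loop until n distinct kmers were drawn
        kmer = random_kmer(rng)
        counts[kmer] = rng.randint(1, max_count)
    return counts


sample = random_kmer_counts(5)
for kmer, count in sample.items():
    print(kmer, count)
```

Each `(kmer, count)` pair produced this way could then be passed to `kf.insert(kmer, count)` in the same way as the fixed kmers used in the implementation below.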
3.2. Implementation¶
3.2.1. Importing¶
[1]:
import kProcessor as kp
3.2.2. Kmers List¶
[2]:
kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
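Since the kDataFrame is created with kmerSize = 21, it can be worth confirming that every string in the list really has 21 characters before inserting it. The check below is a plain-Python sketch; `is_valid_kmer` is a hypothetical helper, and this recipe does not show how kProcessor itself reacts to kmers of the wrong length.

```python
KMER_SIZE = 21  # must match the kmerSize passed to the kDataFrame
kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]


def is_valid_kmer(kmer, size=KMER_SIZE):
    """True if kmer has the expected length and only contains A, C, G, T."""
    return len(kmer) == size and set(kmer) <= set("ACGT")


# Collect any kmer that fails the check
invalid = [k for k in kmers if not is_valid_kmer(k)]
print("invalid kmers:", invalid)  # prints: invalid kmers: []
```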
3.2.3. Instantiate an object from the class kDataFrameMQF¶
[3]:
kf = kp.kDataFrameMQF(21) # Empty kDataFrame
3.3. Insert the first kmer with count 1 and the second kmer with count 10¶
[5]:
print("Inserting 2 kmers")
kf.insert(kmers[0], 1)
kf.insert(kmers[1], 10)
Inserting 2 kmers
[5]:
True
3.3.1. Print the size of the kDataFrame & counts of kmer1 and kmer2¶
[6]:
print("kf size: %d" % kf.size())
# Print the counts of both kmers
print("Kmer1 Count: %d" % kf.count(kmers[0]))
print("Kmer2 Count: %d" % kf.count(kmers[1]))
kf size: 2
Kmer1 Count: 1
Kmer2 Count: 10
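With more than a couple of kmers, the "query each kmer and check the count" step is easier to run as a loop. The helper below is a sketch that relies only on a `count(kmer)` method, the one call it shares with the kDataFrame used above; `check_counts` and the `DictFrame` stub are hypothetical names introduced so that the example runs without kProcessor installed.

```python
def check_counts(frame, expected):
    """Return (kmer, expected, observed) for every kmer whose count differs."""
    mismatches = []
    for kmer, want in expected.items():
        got = frame.count(kmer)
        if got != want:
            mismatches.append((kmer, want, got))
    return mismatches


class DictFrame:
    """Hypothetical stub exposing only count(); stands in for a kDataFrame."""

    def __init__(self, counts):
        self._counts = dict(counts)

    def count(self, kmer):
        return self._counts.get(kmer, 0)  # absent kmers count as 0


expected = {"CTGACTACTCAGAGCTAGCCT": 1, "CGTAACCTATGCTAGCTAGAT": 10}
stub = DictFrame(expected)
print(check_counts(stub, expected))  # prints: []
```

Given the counts printed above, calling `check_counts(kf, expected)` on the real kDataFrame should likewise return an empty list.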
3.3.2. Insert a duplicate kmer without specifying a count¶
[7]:
print("[*] Inserting kmer 1 again")
kf.insert(kmers[0])
[*] Inserting kmer 1 again
[7]:
True
3.3.3. Print the first kmer count again¶
[8]:
print("Kmer1 Count: %d" % kf.count(kmers[0]))
Kmer1 Count: 2
3.3.4. Erase kmer1 from the kDataFrame¶
[9]:
print("[*] Erasing kmer1")
kf.erase(kmers[0])
[*] Erasing kmer1
[9]:
True
3.3.5. Print the first kmer count again and the kDataFrame size¶
[10]:
print("Kmer1 Count: %d" % kf.count(kmers[0]))
print("kf size: %d" % kf.size())
Kmer1 Count: 0
kf size: 1
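The behaviour observed in the cells above can be summarised with a small dictionary-based model. This is only a stand-in for reasoning about the outputs, not the kDataFrameMQF implementation: `ToyKmerCounter` is a hypothetical class, and it assumes that `insert` adds to an existing count (the recipe only inserts an explicit count for kmers that are not yet present, so adding and overwriting cannot be told apart here).

```python
class ToyKmerCounter:
    """Plain-dict stand-in that mimics the outputs shown in this recipe."""

    def __init__(self, kmer_size):
        self.kmer_size = kmer_size
        self._counts = {}

    def insert(self, kmer, count=1):
        # Assumption: the count is added to whatever is already stored
        self._counts[kmer] = self._counts.get(kmer, 0) + count
        return True

    def count(self, kmer):
        return self._counts.get(kmer, 0)  # absent kmers report 0

    def erase(self, kmer):
        # Assumption: report whether the kmer was actually present
        return self._counts.pop(kmer, None) is not None

    def size(self):
        return len(self._counts)  # number of distinct kmers


kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
toy = ToyKmerCounter(21)
toy.insert(kmers[0], 1)
toy.insert(kmers[1], 10)
print(toy.size(), toy.count(kmers[0]), toy.count(kmers[1]))  # 2 1 10
toy.insert(kmers[0])          # duplicate insert without a count
print(toy.count(kmers[0]))    # 2
toy.erase(kmers[0])
print(toy.count(kmers[0]), toy.size())  # 0 1
```

The printed values match the cell outputs above: size 2 with counts 1 and 10, then count 2 after the duplicate insert, then count 0 and size 1 after erasing.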
3.4. Complete Script¶
import kProcessor as kp
kmers = ["CTGACTACTCAGAGCTAGCCT", "CGTAACCTATGCTAGCTAGAT"]
kf = kp.kDataFrameMQF(21) # Empty kDataFrame
print("kf size: %d" % kf.size())
# Insert the first kmer with count 1 and the second kmer with count 10
print("Inserting 2 kmers")
kf.insert(kmers[0], 1)
kf.insert(kmers[1], 10)
print("kf size: %d" % kf.size())
# Print the counts of both kmers
print("Kmer1 Count: %d" % kf.count(kmers[0]))
print("Kmer2 Count: %d" % kf.count(kmers[1]))
# Insert kmer1 again without a count and print its count (2 in the run above)
print("[*] Inserting kmer 1 again")
kf.insert(kmers[0])
print("Kmer1 Count: %d" % kf.count(kmers[0]))
# Erase kmer1, then print its count and the kDataFrame size (0 and 1 in the run above)
print("[*] Erasing kmer1")
kf.erase(kmers[0])
print("Kmer1 Count: %d" % kf.count(kmers[0]))
print("kf size: %d" % kf.size())