4. Recipe 4: Iteration on kDataFrame kmers
4.1. Description
- Create a kDataFrame with kmerSize = 21
- Insert four kmers with random counts
- Iterate over the kDataFrame's kmers and print each kmer and its count
- Save the results in a dictionary and write them to a TSV file
4.2. Implementation
4.2.1. Importing
[1]:
import kProcessor as kp
import random
4.2.2. Create a list of four kmers
[2]:
kmers = ["ATCATACTGATCGATCGATGC", "CGTAACCTATGCTAGCTAGAT", "CTGACTACTCAGAGCTAGCCT","CAATCGCTGATACGATACGTA"]
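Because the kDataFrame is created with kmerSize = 21, it can help to confirm that every string in the list really is a 21-mer before inserting it. The sketch below is plain Python and does not use kProcessor; `is_valid_kmer` is a hypothetical helper introduced here for illustration, and the restriction to the letters A, C, G, T is an assumption about what counts as a valid kmer.

```python
K = 21  # assumed to match the kmerSize passed to the kDataFrame

kmers = ["ATCATACTGATCGATCGATGC", "CGTAACCTATGCTAGCTAGAT",
         "CTGACTACTCAGAGCTAGCCT", "CAATCGCTGATACGATACGTA"]


def is_valid_kmer(kmer, k=K):
    """Hypothetical helper: True if kmer has length k and only A, C, G, T."""
    return len(kmer) == k and set(kmer) <= set("ACGT")


# Collect any kmers that would not fit a 21-mer kDataFrame
invalid = [kmer for kmer in kmers if not is_valid_kmer(kmer)]
print("invalid kmers:", invalid)
```

All four kmers in this recipe pass the check, so `invalid` is empty.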
4.2.3. Create an empty kDataFrame
[3]:
kf2 = kp.kDataFrameMQF(21)
4.2.4. Insert all kmers using a for loop
[4]:
for kmer in kmers:
    random_count = random.randint(1, 100)  # generate a random count between 1 and 100 (inclusive)
    print("Inserting kmer: %s with count %d" % (kmer, random_count))
    kf2.insert(kmer, random_count)
Inserting kmer: ATCATACTGATCGATCGATGC with count 99
Inserting kmer: CGTAACCTATGCTAGCTAGAT with count 22
Inserting kmer: CTGACTACTCAGAGCTAGCCT with count 35
Inserting kmer: CAATCGCTGATACGATACGTA with count 82
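The counts shown here come from `random.randint`, so they change on every run. If reproducible counts are wanted, one option is a seeded generator. The sketch below uses only the standard library; `draw_counts` and the seed value are hypothetical names chosen for illustration and are not part of kProcessor.

```python
import random

kmers = ["ATCATACTGATCGATCGATGC", "CGTAACCTATGCTAGCTAGAT",
         "CTGACTACTCAGAGCTAGCCT", "CAATCGCTGATACGATACGTA"]


def draw_counts(kmers, seed):
    """Hypothetical helper: map each kmer to a count in [1, 100], reproducibly."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    return {kmer: rng.randint(1, 100) for kmer in kmers}


counts = draw_counts(kmers, seed=42)
for kmer, count in counts.items():
    print("kmer: %s count: %d" % (kmer, count))
```

Calling `draw_counts` again with the same seed returns the same counts, which makes the later output easier to compare between runs.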
4.2.5. Iterate over all kmers, print their counts, and save them in a dictionary
[5]:
# Create an empty dictionary
kf2_data = {}
# Create an iterator at the first position in the kDataFrame
it = kf2.begin()
while it != kf2.end():
    # Get the kmer string
    kmer = it.getKmer()
    # Get the kmer count
    count = it.getCount()
    # Print the data
    print("retrieved kmer: %s with count: %d" % (kmer, count))
    # Save the data in the dictionary
    kf2_data[kmer] = count
    # Advance the iterator; without this call the loop never terminates
    it.next()
retrieved kmer: AGGCTAGCTCTGAGTAGTCAG with count: 35
retrieved kmer: ATCTAGCTAGCATAGGTTACG with count: 22
retrieved kmer: ATCATACTGATCGATCGATGC with count: 99
retrieved kmer: CAATCGCTGATACGATACGTA with count: 82
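Two of the retrieved kmers differ from the strings that were inserted, while the counts still match. The output is consistent with each kmer being stored in a canonical form, that is, as either the kmer itself or its reverse complement. The recipe does not state which rule kProcessor applies, so the following is only a sketch under the assumption that the lexicographically smaller of the two strings is kept; that assumption reproduces the four kmers printed above. `reverse_complement` and `canonical` are hypothetical helpers, not kProcessor functions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Assumed rule: keep the lexicographically smaller of kmer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


inserted = ["ATCATACTGATCGATCGATGC", "CGTAACCTATGCTAGCTAGAT",
            "CTGACTACTCAGAGCTAGCCT", "CAATCGCTGATACGATACGTA"]

for kmer in inserted:
    print("%s -> %s" % (kmer, canonical(kmer)))
```

Under this assumption, `CGTAACCTATGCTAGCTAGAT` maps to `ATCTAGCTAGCATAGGTTACG` and `CTGACTACTCAGAGCTAGCCT` maps to `AGGCTAGCTCTGAGTAGTCAG`, while the other two kmers are unchanged. This also means the iteration order need not match the insertion order.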
4.2.6. Dump the dictionary data to a file
[6]:
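# Use a separate name for the file handle so the kDataFrame `kf2` is not overwritten
with open("kf2_data.tsv", 'w') as out_file:
    out_file.write("kmer\tcount\n")
    for kmer, count in kf2_data.items():
        out_file.write("%s\t%d\n" % (kmer, count))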
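To check that a file in this layout (a `kmer` column and a `count` column separated by a tab) can be loaded back into a dictionary, the standard `csv` module is one option. The sketch below is self-contained: it writes the example values from this recipe's run to a temporary file rather than touching `kf2_data.tsv`, and the variable names are chosen for illustration.

```python
import csv
import os
import tempfile

# Example values taken from the run shown in this recipe (counts are random per run)
kf2_data = {
    "AGGCTAGCTCTGAGTAGTCAG": 35,
    "ATCTAGCTAGCATAGGTTACG": 22,
    "ATCATACTGATCGATCGATGC": 99,
    "CAATCGCTGATACGATACGTA": 82,
}

path = os.path.join(tempfile.mkdtemp(), "kf2_data.tsv")

# Write a header row followed by one kmer/count pair per line
with open(path, "w", newline="") as out_file:
    writer = csv.writer(out_file, delimiter="\t")
    writer.writerow(["kmer", "count"])
    writer.writerows(kf2_data.items())

# Read the file back, converting counts from text to integers
with open(path, newline="") as in_file:
    reader = csv.DictReader(in_file, delimiter="\t")
    loaded = {row["kmer"]: int(row["count"]) for row in reader}

print("round trip ok:", loaded == kf2_data)
```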
4.3. Complete Script
import kProcessor as kp
import random

kmers = ["ATCATACTGATCGATCGATGC", "CGTAACCTATGCTAGCTAGAT", "CTGACTACTCAGAGCTAGCCT", "CAATCGCTGATACGATACGTA"]
kf2 = kp.kDataFrameMQF(21)

for kmer in kmers:
    random_count = random.randint(1, 100)  # generate a random count between 1 and 100 (inclusive)
    print("Inserting kmer: %s with count %d" % (kmer, random_count))
    kf2.insert(kmer, random_count)

# Create an empty dictionary
kf2_data = {}
# Create an iterator at the first position in the kDataFrame
it = kf2.begin()
while it != kf2.end():
    # Get the kmer string
    kmer = it.getKmer()
    # Get the kmer count
    count = it.getCount()
    # Print the data
    print("retrieved kmer: %s with count: %d" % (kmer, count))
    # Save the data in the dictionary
    kf2_data[kmer] = count
    # Advance the iterator; without this call the loop never terminates
    it.next()

# Use a separate name for the file handle so the kDataFrame `kf2` is not overwritten
with open("kf2_data.tsv", 'w') as out_file:
    out_file.write("kmer\tcount\n")
    for kmer, count in kf2_data.items():
        out_file.write("%s\t%d\n" % (kmer, count))
4.4. Output TSV
[7]:
%%bash
cat kf2_data.tsv
kmer count
AGGCTAGCTCTGAGTAGTCAG 35
ATCTAGCTAGCATAGGTTACG 22
ATCATACTGATCGATCGATGC 99
CAATCGCTGATACGATACGTA 82