Okay, my CS professor has given the class an assignment about substrings and DNA. We have been given a txt file containing DNA sequences, each followed by a specific pattern that our Java program has to look for in that sequence. This is what is in the txt file:
AATTGCCTTTTAAAAA
ATTG
AATTGCCTTTTAAAAA
TG_
AATTGCCTTTTAAAAA
AAG
AATTGCCTTTTAAAAA
TT_C
AATTGCCTTTTAAAAA
TTT
AATTGCCTTTTAAAAA
AA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GC
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
CGGTA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGT
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
T__T
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
C_T_T
Basically, we get a sequence on one line and its pattern on the next line, and so on. So, for example:
AATTGCCTTTTAAAAA
ATTG
We are to find the pattern ATTG in its sequence, and the Java program I write should output every position at which the pattern matches the sequence. Here, position 2 is the only match. For something like this:
AATTGCCTTTTAAAAA
AA
the pattern can be found at positions 1, 12, 13, 14, and 15 in its sequence. If a pattern contains "_", as in "TG_", the blank spot can be filled with anything, so the program effectively only has to look for TG followed by any one letter. Anyway, those are known as wildcards, and I can take care of them myself.
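For reference, here is a minimal sketch of how I currently picture the wildcard check, assuming "_" stands for exactly one letter. The class name WildcardMatch and the method matchesAt are names I made up for illustration, not anything from the assignment.

```java
class WildcardMatch {
    // Returns true if pattern matches sequence starting at the 0-based index start.
    // Assumption: '_' in the pattern matches any single letter.
    static boolean matchesAt(String sequence, String pattern, int start) {
        // The pattern must fit entirely inside the sequence.
        if (start < 0 || start + pattern.length() > sequence.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '_' && p != sequence.charAt(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "TG_" lines up with "TGC" at 0-based index 3.
        System.out.println(matchesAt("AATTGCCTTTTAAAAA", "TG_", 3));
        // "TT_C" lines up with "TTGC" at 0-based index 2.
        System.out.println(matchesAt("AATTGCCTTTTAAAAA", "TT_C", 2));
    }
}
```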
So now the most important thing: my professor suggested using the substring command to approach this and gave us the following to put in a loop:
int position = 0;
int x = Sequence.substring(position, pattern);
position = x + 1;
x basically stands for the position at which a matching pattern is found, and position is set to x + 1 because you want the program to keep scanning the line letter by letter. So the first letter in a sequence is position 0, and position = x + 1 lets the program continue scanning the letters after a match is found.
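My best guess at what the loop is supposed to do, as a rough sketch and not necessarily what the professor intended: slide over the sequence one letter at a time, cut out a window with substring(start, end) that is as long as the pattern, and compare it to the pattern. The names SubstringScan and findPositions are my own, and I am assuming the output positions are counted from 1 (as in the examples above) while Java indexes from 0. This version only handles exact patterns, with no wildcards.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringScan {
    // Collects the 1-based positions at which pattern occurs in sequence.
    // Overlapping matches are reported because the scan advances one letter at a time.
    static List<Integer> findPositions(String sequence, String pattern) {
        List<Integer> positions = new ArrayList<>();
        int len = pattern.length();
        for (int position = 0; position + len <= sequence.length(); position++) {
            // substring(begin, end) returns the letters from begin up to, but not including, end.
            String window = sequence.substring(position, position + len);
            if (window.equals(pattern)) {
                positions.add(position + 1); // convert the 0-based index to a 1-based position
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findPositions("AATTGCCTTTTAAAAA", "ATTG")); // prints [2]
        System.out.println(findPositions("AATTGCCTTTTAAAAA", "AA"));   // prints [1, 12, 13, 14, 15]
    }
}
```

Note that substring returns a String, not an int, which is part of why the snippet we were given confuses me.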
Please help... I am so very stumped.