
Java help ! Substring...

Status
Not open for further replies.

NetMapel

Guilty White Male Mods Gave Me This Tag
Okay, my CS professor has given the class an assignment on substrings and DNA. We have been given a txt file containing DNA sequences, each followed by a pattern that our Java program has to look for in that sequence. This is what's in the txt file:

AATTGCCTTTTAAAAA
ATTG
AATTGCCTTTTAAAAA
TG_
AATTGCCTTTTAAAAA
AAG
AATTGCCTTTTAAAAA
TT_C
AATTGCCTTTTAAAAA
TTT
AATTGCCTTTTAAAAA
AA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GC
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
CGGTA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGT
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
GCCGTTCAGA
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
T__T
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
C_T_T
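One way to read this file in sequence/pattern pairs, as a sketch: it assumes the lines strictly alternate sequence, then pattern (blank lines are ignored), and the file name dna.txt and the helper pairUp are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DnaInput {
    // Pairs up the file's lines: a sequence line followed by its pattern line.
    // Blank lines are skipped; a final sequence with no pattern is ignored.
    static List<String[]> pairUp(List<String> lines) {
        List<String> nonBlank = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                nonBlank.add(trimmed);
            }
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < nonBlank.size(); i += 2) {
            pairs.add(new String[] {nonBlank.get(i), nonBlank.get(i + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // "dna.txt" is a hypothetical file name; pass the real one as an argument.
        String fileName = args.length > 0 ? args[0] : "dna.txt";
        List<String> lines = Files.readAllLines(Paths.get(fileName));
        for (String[] pair : pairUp(lines)) {
            System.out.println(pair[0] + " / " + pair[1]);
        }
    }
}
```

Each element of the returned list holds the sequence at index 0 and its pattern at index 1, so the matching code can stay separate from the file handling.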

Basically, we get a sequence on one line and a pattern on the next, and so on. So, for example:

AATTGCCTTTTAAAAA
ATTG

We are to find the pattern ATTG in its sequence, and the Java program I write should output every position where the pattern matches the sequence. Here, position 2 (counting from 1) is the only match. For something like this:

AATTGCCTTTTAAAAA
AA

the pattern can be found at positions 1, 12, 13, 14, and 15 of its sequence. If a pattern contains "_", as in "TG_", the blank spot can be filled with any character, so for "TG_" the program only has to find TG followed by one more character. These are known as wildcards; I can take care of that myself.

So now, the most important thing: my professor suggested using the substring method to approach this, and gave us this to put in a loop:

int position = 0;
int lx = Sequence.substring(position, pattern);
position = position + x

x stands for the position at which a match of the pattern is found, and position is then set to x + 1 so the program keeps scanning the line letter by letter. The first letter in a sequence is at position 0, and position = x + 1 lets the program continue scanning the letters after a match is found.
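The substring idea can work if substring is given two int indexes (begin and end) and the resulting String is compared with the pattern using equals(). Below is a minimal sketch of that reading of the professor's loop; the names nextMatch and allMatches are hypothetical, it handles exact matches only (no "_" wildcards), and it reports 0-based indexes.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringScan {
    // Returns the 0-based index of the first exact match of pattern at or after
    // 'from', or -1 if there is none. substring(begin, end) takes two int
    // indexes and returns a String, so the comparison uses equals().
    static int nextMatch(String sequence, String pattern, int from) {
        for (int i = from; i + pattern.length() <= sequence.length(); i++) {
            if (sequence.substring(i, i + pattern.length()).equals(pattern)) {
                return i;
            }
        }
        return -1;
    }

    // Collects every 0-based match index, resuming at x + 1 after each match
    // so that overlapping matches (like "AA" in "AAAAA") are still found.
    static List<Integer> allMatches(String sequence, String pattern) {
        List<Integer> found = new ArrayList<>();
        int position = 0;
        int x;
        while ((x = nextMatch(sequence, pattern, position)) != -1) {
            found.add(x);
            position = x + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints 0-based indexes; add 1 to each for the assignment's numbering.
        System.out.println(allMatches("AATTGCCTTTTAAAAA", "AA")); // [0, 11, 12, 13, 14]
    }
}
```

Resuming at x + 1 rather than x + pattern.length() is what keeps overlapping matches such as the run of A's at the end of the first sequence.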

Please help... :( I am so very stumped.
 

Ghost

Chili Con Carnage!
If it was the Needleman-Wunsch global alignment algorithm (as I thought it was) I could maybe have helped, but sorry, I've got no idea with this (subtle bump).

What you've got makes sense to me, though: try to match the pattern against the string starting at the first character; if it doesn't match, move on to the second character; if it does, output the character number (to a file or with println) and go on to the next character.
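A minimal sketch of that brute-force approach, including the "_" wildcard from the first post. It assumes positions are reported counting from 1 (as in the examples above) and that a "_" must still line up with a real character, so "TG_" needs one letter after TG. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardMatch {
    // True if pattern matches sequence starting at 0-based index start.
    // '_' in the pattern matches any single character.
    // The caller guarantees start + pattern.length() <= sequence.length().
    static boolean matchesAt(String sequence, String pattern, int start) {
        for (int j = 0; j < pattern.length(); j++) {
            char p = pattern.charAt(j);
            if (p != '_' && p != sequence.charAt(start + j)) {
                return false;
            }
        }
        return true;
    }

    // Tries every start position where the whole pattern still fits and
    // returns the 1-based positions of all matches.
    static List<Integer> findPositions(String sequence, String pattern) {
        List<Integer> positions = new ArrayList<>();
        for (int start = 0; start + pattern.length() <= sequence.length(); start++) {
            if (matchesAt(sequence, pattern, start)) {
                positions.add(start + 1); // convert to 1-based for output
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String seq = "AATTGCCTTTTAAAAA";
        System.out.println(findPositions(seq, "ATTG")); // [2]
        System.out.println(findPositions(seq, "AA"));   // [1, 12, 13, 14, 15]
        System.out.println(findPositions(seq, "TG_"));  // [4]
    }
}
```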
 

NetMapel

Guilty White Male Mods Gave Me This Tag
I know that, but the thing is, my program won't compile because int x = Sequence.substring(n, Pattern); gives a "cannot resolve symbol" error. More specifically, "Sequence" cannot be resolved...
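A "cannot resolve symbol" error on Sequence usually means no variable with exactly that name was declared where it is used (Java names are case-sensitive, so a variable declared as sequence does not count). Separately, substring(int, int) returns a String, so its result cannot be stored in an int. A minimal sketch; the helper windowAt is a hypothetical name.

```java
public class DeclareFirst {
    // Returns the piece of 'sequence' that lines up with 'pattern' at 'start'.
    // substring(begin, end) takes two ints and returns a String, so the
    // result goes into a String, not an int.
    static String windowAt(String sequence, String pattern, int start) {
        return sequence.substring(start, start + pattern.length());
    }

    public static void main(String[] args) {
        // Declaring the variable is what makes the name resolvable; using
        // a name with no matching declaration gives the "symbol" error.
        String sequence = "AATTGCCTTTTAAAAA";
        String pattern = "ATTG";
        String window = windowAt(sequence, pattern, 1);
        System.out.println(window + " " + window.equals(pattern)); // ATTG true
    }
}
```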
 

NetMapel

Guilty White Male Mods Gave Me This Tag
http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html#substring(int)

Code:
substring
public String substring(int beginIndex,
                        int endIndex)
Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex. 
Examples: 

 "hamburger".substring(4, 8) returns "urge"
 "smiles".substring(1, 5) returns "mile"
 Parameters:
beginIndex - the beginning index, inclusive.
endIndex - the ending index, exclusive.
Returns:
the specified substring.
Throws:
IndexOutOfBoundsException - if the beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.

Yeah, I was like, wtf... int !? But uh... I am not exactly following you there, please elaborate :( I am teh stupid.
 

Ferrio

Banned
Ya, your professor must have been confused, because that's not a good method to go about this.

Hell, all you really need is

Code:
public boolean contains(CharSequence s)

    Returns true if and only if this string contains the specified sequence of char values.

    Parameters:
        s - the sequence to search for 
    Returns:
        true if this string contains s, false otherwise 
    Throws:
        NullPointerException - if s is null
    Since:
        1.5


but that's cheating.
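A quick illustration of why contains() alone falls short for this assignment: it answers yes or no but reports no position, and it treats "_" as a literal character rather than a wildcard. The class and method names are hypothetical.

```java
public class ContainsCheck {
    // contains() (Java 1.5+) only says whether the pattern occurs somewhere.
    static boolean hasPattern(String sequence, String pattern) {
        return sequence.contains(pattern);
    }

    public static void main(String[] args) {
        String sequence = "AATTGCCTTTTAAAAA";
        System.out.println(hasPattern(sequence, "ATTG")); // true, but no position
        System.out.println(hasPattern(sequence, "AAG"));  // false
        // '_' is matched literally, so wildcard patterns need their own handling.
        System.out.println(hasPattern(sequence, "TG_"));  // false
    }
}
```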


But ya, basically you want to use

Code:
public int indexOf(String str,
                   int fromIndex)

    Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:

     k >= Math.min(fromIndex, str.length()) && this.startsWith(str, k)
 

    If no such value of k exists, then -1 is returned.

    Parameters:
        str - the substring for which to search.
        fromIndex - the index from which to start the search. 
    Returns:
        the index within this string of the first occurrence of the specified substring, starting at the specified index.
 

Ferrio

Banned
Sorry, I don't know Java, but this method looks like it would help:

Code:
public char charAt(int index)

    Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.

    If the char value specified by the index is a surrogate, the surrogate value is returned.

    Specified by:
        charAt in interface CharSequence

    Parameters:
        index - the index of the char value. 
    Returns:
        the char value at the specified index of this string. The first char value is at index 0. 
    Throws:
        IndexOutOfBoundsException - if the index argument is negative or not less than the length of this string.


So let's see. Here's some pseudocode; I don't know if it works. I don't know Java, and I haven't fully implemented it, but that's the basic concept.


Code:
import java.util.ArrayList;
import java.util.List;

public class PatternFinder {
    // Returns the 0-based start of every match; '_' in the pattern matches any character.
    // Assumes the pattern is not empty.
    static List<Integer> findMatches(String thestring, String pattern) {
        List<Integer> matches = new ArrayList<>();
        int stringposition = 0;   // Where in the string the first letter of the pattern is lined up.
        int patternposition = 0;  // Which pattern character we're checking; stringposition + patternposition is the string character being checked.

        while (stringposition + pattern.length() <= thestring.length()) {  // Stop once the pattern no longer fits.
            char p = pattern.charAt(patternposition);
            if (p == '_' || thestring.charAt(stringposition + patternposition) == p) {  // Characters match, or wildcard.
                patternposition++;  // Try the next character in the pattern.
                if (patternposition == pattern.length()) {  // No next character: the whole pattern matched.
                    matches.add(stringposition);  // Store the start position.
                    stringposition++;             // Then keep looking for further matches.
                    patternposition = 0;
                }
            } else {
                stringposition++;    // Mismatch: move one character along the string.
                patternposition = 0; // Restart at the beginning of the pattern.
            }
        }
        return matches;
    }
}
 

Ghost

Chili Con Carnage!
NetMapel said:
http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html#substring(int)

Yeah, I was like, wtf... int !? But uh... I am not exactly following you there, please elaborate :( I am teh stupid.

Instead of

int lx = Sequence.substring(position, pattern);


You need


int lx = Sequence.indexOf(pattern, position);


That'll return the starting position of the match in the sequence (indexes start at 0, so if the match starts at the 3rd character, lx = 2), or -1 if there is no further match.

Then you need to set position to lx + 1 and search again.
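A sketch of the full loop around that call, assuming Java's String.indexOf(String, int) and 1-based output as in the assignment's examples. It matches exact text only, so "_" wildcards would still need a character-by-character check. The class and method names are hypothetical, and the variable is spelled sequence rather than Sequence.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfScan {
    // Returns every 1-based position where pattern occurs in sequence, using
    // String.indexOf(String, int), which returns -1 once no match is left.
    static List<Integer> positions(String sequence, String pattern) {
        List<Integer> result = new ArrayList<>();
        if (pattern.isEmpty()) {
            return result; // an empty pattern matches everywhere and the loop would never finish
        }
        int position = 0;
        int lx = sequence.indexOf(pattern, position);
        while (lx != -1) {
            result.add(lx + 1);  // indexOf counts from 0; the assignment counts from 1
            position = lx + 1;   // resume just after the start of this match
            lx = sequence.indexOf(pattern, position);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("AATTGCCTTTTAAAAA", "AA"));   // [1, 12, 13, 14, 15]
        System.out.println(positions("AATTGCCTTTTAAAAA", "ATTG")); // [2]
    }
}
```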
 

iapetus

Scary Euro Man
If you're conforming to Java style guidelines, then 'Sequence' should start with a lower case letter. Upper case for classes and constants only.
 