How would you code this in Perl/PHP etc.?
Please note that if you submit an answer to this question, your code will be used in a NOT-FOR-PROFIT scientific research program, and you will be credited for the code in any resulting publication.
1. You have a string S of 20-80 characters over the alphabet A-Z. (This string is drawn at random from a database DS of strings over A-Z.)
2. You have grouped the letters A-Z into the following three groups a, j, and t:
a = A-I, j = J-S, t = T-Z.
3. Within the string S, you identify all doublets (two consecutive letters) in which both letters are in group a or group j, that is, neither letter of the doublet is in group t.
4. Since you are only interested in these doublets within S, you represent S (for example) as the string S':
AKS . . . . . BL . . . . . . MD . . . . . EE (where each "." represents a letter in group t.)
where the "group-level" representation of the string S' is the string S":
ajj . . . . . aj . . . . . . ja . . . . . aa
5. You want to search the database DS and return all strings that have the form S" at the group-level.
6. Also, and MOST IMPORTANTLY, you want to return a string Sz from the database even if there is only a rough spacing correspondence between Sz and the template string S". The "allowable spacing difference" rule is that for a doublet Dz in Sz to match a doublet Ds" in S" (at the group-level), there must be no more than four characters between the end of Dz and the start of Ds", or between the end of Ds" and the start of Dz.
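For illustration, steps 2 to 6 could be sketched roughly as follows. This is a minimal Python sketch rather than Perl or PHP, though the logic should port without much trouble. All function names (to_groups, doublets, group_form, gap, matches, search) are hypothetical. It rests on several assumptions that the problem statement leaves open: an isolated a or j letter that is not part of any doublet is masked as "." just like a t letter; the spacing rule is read as "at most four characters lie strictly between the two doublets", so their start positions may differ by up to 6; every doublet in the template must find a partner in the candidate with the same group pair; one candidate doublet may serve as partner for more than one template doublet; and extra doublets in the candidate are ignored.

```python
import string

# Step 2: A-I -> 'a' (9 letters), J-S -> 'j' (10 letters), T-Z -> 't' (7 letters).
GROUP_TABLE = str.maketrans(string.ascii_uppercase, "a" * 9 + "j" * 10 + "t" * 7)


def to_groups(s):
    """Map every letter of s to its group letter a, j, or t."""
    return s.translate(GROUP_TABLE)


def doublets(s):
    """Step 3: list (start, group_pair) for each pair of consecutive letters
    that are both in group a or j. Overlapping doublets are all reported."""
    g = to_groups(s)
    return [
        (i, g[i:i + 2])
        for i in range(len(g) - 1)
        if g[i] in "aj" and g[i + 1] in "aj"
    ]


def group_form(s):
    """Step 4: group-level form of s. Letters inside a doublet keep their
    group letter; everything else becomes '.' (assumption: this includes
    isolated a/j letters, not only t letters)."""
    out = ["."] * len(s)
    for i, pair in doublets(s):
        out[i], out[i + 1] = pair[0], pair[1]
    return "".join(out)


def gap(p, q):
    """Number of characters strictly between two doublets starting at p and q;
    0 when they touch, overlap, or coincide."""
    return max(0, abs(p - q) - 2)


def matches(template, candidate, max_gap=4):
    """Step 6 under the assumptions above: every template doublet needs a
    candidate doublet with the same group pair and at most max_gap
    characters in between."""
    cand = doublets(candidate)
    return all(
        any(pair == cpair and gap(pos, cpos) <= max_gap for cpos, cpair in cand)
        for pos, pair in doublets(template)
    )


def search(template, database, max_gap=4):
    """Steps 5 and 6: return the database strings that match the template."""
    return [s for s in database if matches(template, s, max_gap)]
```

Using T for every group-t letter in the example from step 4, group_form("AKSTTTTTBLTTTTTTMDTTTTTEE") gives "ajj.....aj......ja.....aa". Calling search(S, DS) then returns the strings of DS that satisfy the spacing rule under the reading above, and search(S, DS, max_gap=0) tightens it so that matching doublets must touch or overlap in position. A strict one-to-one, order-preserving pairing of doublets would need a different matching step.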
Edited by: David Halitsky on Jun 25, 2010 10:03 PM
Edited by: David Halitsky on Jun 25, 2010 10:05 PM