How would you code this in Perl/PHP etc.?

Please note that if you submit an answer to this question, your code will be used in a NOT-FOR-PROFIT scientific research program, and you will be given credit for the code in any publication etc.

Problem Statement:

1. You have a string S of 20-80 characters over the alphabet A-Z. (This string is drawn at random from a database DS of strings over A-Z.)

2. You have grouped the letters A-Z into the following three groups a, j, and t:

a = A-I, j = J-S, t = T-Z.

3. Within the string S, you identify all doublets (two consecutive letters) such that the first letter of the doublet is within group a or group j, and the second letter of the doublet is in group a or j.

4. Since you are only interested in these doublets within S, you represent S (for example) as the string S':

AKS . . . . . BL . . . . . . MD . . . . . EE (where each "." represents a letter in group t.)

where the "group-level" representation of the string S' is the string S":

ajj . . . . . aj . . . . . . ja . . . . . aa
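Steps 2-4 can be sketched roughly as follows. This is in Python rather than Perl/PHP, purely as an illustration that should port easily. The function names group_of, doublets and group_template are made up for this sketch. One assumption not stated in the problem: an a/j letter that has no a/j neighbour (so it is not part of any doublet) is also shown as "." in the group-level string.

```python
def group_of(ch):
    """Map a letter A-Z to its group: 'a' (A-I), 'j' (J-S) or 't' (T-Z)."""
    if "A" <= ch <= "I":
        return "a"
    if "J" <= ch <= "S":
        return "j"
    if "T" <= ch <= "Z":
        return "t"
    raise ValueError(f"expected a letter A-Z, got {ch!r}")


def doublets(s):
    """Return (start_index, group_pair) for every pair of consecutive
    letters in which both letters are in group a or group j."""
    groups = [group_of(c) for c in s]
    return [
        (i, groups[i] + groups[i + 1])
        for i in range(len(s) - 1)
        if groups[i] != "t" and groups[i + 1] != "t"
    ]


def group_template(s):
    """Build the group-level string S'': letters that belong to a doublet
    become their group letter, everything else becomes '.'.
    ASSUMPTION: isolated a/j letters are also written as '.'."""
    groups = [group_of(c) for c in s]
    in_doublet = [False] * len(s)
    for i, _ in doublets(s):
        in_doublet[i] = in_doublet[i + 1] = True
    return "".join(g if keep else "." for g, keep in zip(groups, in_doublet))


# Short example string (real strings would be 20-80 characters long).
print(doublets("AKSTUVWXBL"))        # [(0, 'aj'), (1, 'jj'), (8, 'aj')]
print(group_template("AKSTUVWXBL"))  # ajj.....aj
```

Note that a run of three letters such as AKS yields two overlapping doublets (AK and KS), which is why it appears as "ajj" at the group level.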

5. You want to search the database DS and return all strings that have the form S" at the group-level.

6. Also, and MOST IMPORTANTLY, you want to return a string Sz from the database even if there is only a rough spacing correspondence between Sz and the template string S". The "allowable spacing difference" rule is that for a doublet Dz in Sz to match a doublet Ds" in S" (at the group-level), there must be no more than four characters between the end of Dz and the start of Ds", or vice-versa.
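One possible reading of steps 5-6, again as a Python sketch rather than a definitive answer. It rests on several assumptions that the problem statement does not settle: (a) positions are compared as offsets from the start of each string; (b) "match at the group-level" means the two doublets have the same group pair (aj matches aj, not ja); (c) every doublet of the template must have at least one matching doublet in Sz, while extra doublets in Sz are ignored; (d) one-to-one pairing and ordering of doublets are not enforced. The names MAX_GAP, matches_template and search are made up for this sketch.

```python
MAX_GAP = 4  # "no more than four characters" between the two doublets


def group_of(ch):
    """Map a letter A-Z to its group: 'a' (A-I), 'j' (J-S) or 't' (T-Z)."""
    if "A" <= ch <= "I":
        return "a"
    if "J" <= ch <= "S":
        return "j"
    if "T" <= ch <= "Z":
        return "t"
    raise ValueError(f"expected a letter A-Z, got {ch!r}")


def doublets(s):
    """Return (start_index, group_pair) for each a/j doublet in s."""
    groups = [group_of(c) for c in s]
    return [
        (i, groups[i] + groups[i + 1])
        for i in range(len(s) - 1)
        if groups[i] != "t" and groups[i + 1] != "t"
    ]


def matches_template(template, candidate, max_gap=MAX_GAP):
    """True if every doublet of the template string has a doublet with the
    same group pair in the candidate, close enough under the spacing rule.

    For doublets starting at offsets p and q, the number of characters
    between the end of the earlier one and the start of the later one is
    abs(p - q) - 2 (zero or negative when they touch or overlap).
    ASSUMPTION: offsets are measured from the start of each string."""
    found = doublets(candidate)
    for q, pair in doublets(template):
        if not any(
            other == pair and abs(p - q) - 2 <= max_gap for p, other in found
        ):
            return False
    return True


def search(database, template, max_gap=MAX_GAP):
    """Return all database strings that match the template string."""
    return [s for s in database if matches_template(template, s, max_gap)]


# Short example strings (real strings would be 20-80 characters long).
db = ["TTTAKSTUVWXBL", "TTTTTTTTTTAK", "TUVWXYZTUV"]
print(search(db, "AKSTUVWXBL"))  # ['TTTAKSTUVWXBL']
```

If each doublet in Sz may be used only once, or the doublets must appear in the same order as in S", the any(...) test would need to be replaced by a pairing step. A template with no doublets at all matches every string under this reading.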

Edited by: David Halitsky on Jun 25, 2010 10:03 PM


