Solved: Problem with REGEXP_SUBSTR

Former Member · ‎01-02-2015

I am having a problem with REGEXP_SUBSTR(). I am trying to extract "captureThis" from the following string:

Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) blah, blah Token2 captureThis blah blah

The rule is that the target must be preceded by "Token1", followed by arbitrary text, terminated by "Token2". The exception is that if the same pattern appears in parentheses within the arbitrary text, everything in the parentheses should be ignored.

If I run the following REGEXP_SUBSTR() statement, I get a result that correctly ends in "captureThis":

select REGEXP_SUBSTR('Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) blah, blah Token2 captureThis blah blah',
'Token1\\s+((\\([^\\)]+\\))|(((?!Token2).)+))*\\s+Token2\\s+\\S*' ,
1,1);

Result: Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) blah, blah Token2 captureThis

However, if I use a Positive Lookbehind Zero-Width Assertion to filter out the tokens, I get a different result:

select REGEXP_SUBSTR('Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) blah, blah Token2 captureThis blah blah',
'(?<=Token1\\s+((\\([^\\)]+\\))|(((?!Token2).)+))*\\s+Token2\\s+)\\S*'
, 1,1);

Result: ignoreThis (I need it to be "captureThis")

I also found an interesting result by playing with the fourth parameter, occurrence-number. In the first statement above, occurrence-number 1 is a string ending in "captureThis", and occurrence-number 2 is a string ending in "ignoreThis".

However, in the second statement, with the Lookbehind, the order is reversed. Occurrence-number 1 is "ignoreThis", and occurrence-number 2 is "captureThis".

Is there any way to alter the regular expression in the second statement so occurrence-number 1 will be "captureThis"?

Thanks,

Eric

Ali_Chalhoub · ‎01-20-2015

Hello,

I have discussed this with my development team and here is our analysis:

Here is our breakdown and annotation of the expression:

'(?<=Token1\\s+((\\([^\\)]+\\))|(((?!Token2).)+))*\\s+Token2\\s+)\\S*'

'
(?<=                  zero-width look-behind assertion
    Token1            match Token1
    \\s+              one or more spaces
    (
      (               alternative 1
        \\(           match '('
        [^\\)]+       match anything but ')'
        \\)           match ')'
      )
     |
      (               alternative 2
        (
          (?!Token2)  look ahead and do _not_ match 'Token2' (negative zero-width look-ahead assertion)
          .           match any character
        )+            match one or more, i.e. match any character up to (but not including) 'Token2'
      )
    )*                match zero or more of the two alternatives
    \\s+
    Token2
    \\s+
)                     end zero-width look-behind assertion
\\S*
'

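Now note that the zero-width look-behind assertion will match
Token1 blah, blah, (Token1 blah
because the '(' in this string is matched by alternative 2: the dot accepts any character at a position where 'Token2' does not begin, and that includes '('. The look-behind is then completed by the 'Token2' inside the parentheses, so \\S* matches "ignoreThis".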
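
To make this point concrete, here is a small sketch in Python rather than SQL Anywhere. It is only an illustration under assumptions: Python's re module is a different regex engine from the one behind REGEXP_SUBSTR, and it rejects variable-width look-behind, so the sketch uses a capture group instead. The first part runs alternative 2 on its own and shows that it consumes the '('. The second part is a hypothetical variant (not something from the development team) in which the dot is replaced by [^(], so that an opening parenthesis can only be consumed by alternative 1. Whether the same change behaves identically inside a look-behind in SQL Anywhere has not been verified here.

```python
import re

TEXT = ("Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) "
        "blah, blah Token2 captureThis blah blah")

# Alternative 2 from the original pattern, on its own: any character at a
# position where "Token2" does not begin. Nothing stops it matching "(".
alt2 = re.compile(r"(?:(?!Token2).)+")
after_token1 = TEXT[len("Token1 "):]
swallowed = alt2.match(after_token1).group(0)

# Hypothetical variant: the dot becomes [^(], so "(" can only be consumed
# by the parenthesised alternative, which skips the group as a unit.
variant = re.compile(
    r"Token1\s+"
    r"(?:\([^)]*\)"       # alternative 1: a whole parenthesised group
    r"|(?!Token2)[^(]"    # alternative 2: one char, not "(", not starting Token2
    r")*"
    r"Token2\s+(\S+)"     # group 1: the word after the terminating Token2
)
captured = variant.search(TEXT).group(1)

print(repr(swallowed))    # text consumed by alternative 2, including the "("
print(captured)           # word following the Token2 outside the parentheses
```

In this sketch the first print shows 'blah, blah, (Token1 blah ' and the second shows captureThis. The design choice is simply to make the two alternatives mutually exclusive on '(', so the parenthesised text can never be entered character by character.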

Hence REGEXP_SUBSTR is working as intended.

If that answers your question, please mark the question as answered.

Thank you

Ali_Chalhoub · ‎01-20-2015

So you want the result of the second expression to be like the first:

Result: Token1 blah, blah, (Token1 blah Token2 ignoreThis blah) blah, blah Token2 captureThis

Please confirm!!!
