Making PCRE compatible or similar to other RegExps

Post by sergioabreu »

PCRE is nice but a little "peculiar".

Its properties and methods behave differently from regular expressions in other popular languages (PHP, JavaScript, etc.).

I am not criticizing, just reporting facts, and at the end I offer a suggestion that makes it compatible: a "real" findall method, which could be called allMatches.

RegExp.Text is actually "Match[0]", but it is not included in the Count value, which is a bit confusing if you haven't read the documentation carefully.
Checking Count = 0 leads a beginner to think "Oh, there are no matches..." and to overlook RegExp.Text, which actually holds Match[0].
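
To make the point above concrete, here is a minimal sketch of the situation I mean. It assumes the gb.pcre component is enabled in the project; the Main sub, the sample text and the pattern are only made up for illustration. The pattern has no parentheses, so there are no subgroups to count, yet the whole match is still available in Text.

```gambas
' Sketch only: assumes gb.pcre is enabled; the sample text and pattern are made up.
Public Sub Main()

  Dim reg As New RegExp("abc 123", "\\d+")

  ' Count covers the parenthesized subgroups only, not the whole match
  Print reg.Count

  ' Text holds the whole match, which other languages expose as match 0
  Print reg.Text

End
```

So a check on Count alone is not a safe "did it match?" test; look at Text as well.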

FindAll cannot be run against a full pattern that contains parenthesized groups. It only works if we give it the "subgroup pattern" on its own.
For example:
RegExp.findall( "the whole text 1234", "[\\w\\ ]+(\\d+)", RegExp.Extended)

It will find only the whole string, skipping the subgroup match. But if the pattern is only \\w+, it finds 4 tokens.

I wrote this piece of code that makes it behave like other languages.
Using the example above, this function returns the full matched text AND all its submatches, with a real Count = 2: the whole match plus the submatch.

Public Function allMatches(source As String, pattern As String, Optional regOptions As Integer = 11) As String[]

  Dim sx As New String[]
  Dim i As Integer
  Dim reg As New RegExp(source, pattern, regOptions) '11 = extended + multiline + case insensitive

  If reg.text Then 
     sx.Add(reg.text) ' <= Match 0
  Endif
  
  If reg.Count > 0 Then  '  <= Subgroup Matches 1, 2, etc...  
    i = 1    
    While i <= reg.Count
      sx.Add(reg[i].text)
      i += 1
    Wend    
  Endif   
  
  Return sx
  
End
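
A possible way to call it, as a sketch: this assumes allMatches lives in the same module and that gb.pcre is enabled. The Main sub is only for illustration. Index 0 of the returned array is the whole match and the following entries are the subgroups, which is the layout PHP and JavaScript use.

```gambas
' Sketch only: assumes allMatches (above) is in the same module and gb.pcre is enabled.
Public Sub Main()

  Dim matches As String[]
  Dim i As Integer

  ' Same text and pattern as the findall example, with the default options (11)
  matches = allMatches("the whole text 1234", "[\\w\\ ]+(\\d+)")

  ' Number of entries: the whole match plus one per subgroup
  Print matches.Count

  ' Index 0 is the whole match, index 1 and up are the subgroups
  For i = 0 To matches.Max
    Print i; ": "; matches[i]
  Next

End
```

Note that the first part of the pattern is greedy, so the (\\d+) subgroup only gets whatever digits are left over after backtracking, not necessarily the whole number.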



An allMatches method like this could be added in a future version; please show this to Benoir.