Using REGEX to return text from an html block

Posted: Saturday 29th April 2023 3:03am
by bazzvn
I want to retrieve all the text between the beginning and end of an HTML block, for example between the tags <mycontent> and </mycontent>. The text between the opening and closing tags could be of arbitrary length and could contain any characters. It should be simple with regex, but despite trying to make sense of the examples in the Gambas help files and the vast amount of material online, I still haven't found a simple explanation of regex codes that I can get my aging brain around. Could anyone help with a simple suggestion?

Re: Using REGEX to return text from an html block

Posted: Saturday 29th April 2023 1:41pm
by cogier
Can you post an example file on the site, so we can see what you are trying to clean up?

Re: Using REGEX to return text from an html block

Posted: Sunday 30th April 2023 8:21am
by BruceSteers
Why bother? Just use the Gambas functions InStr() and Mid().


' Get text in between 2 text patterns
Public Sub TextBetween(Text As String, StartString As String, EndString As String) As String
  
  Dim iStart, iEnd As Integer
  iStart = InStr(Text, StartString)
  ' InStr() returns 0 (not -1) when the pattern is not found
  If iStart = 0 Then Return ""
  iEnd = InStr(Text, EndString, iStart)
  If iEnd = 0 Then Return ""
  iStart += StartString.Len
  Return Mid(Text, iStart, iEnd - iStart)
  
End


Public Sub Form_Open()

  Print TextBetween("hello this <hm>some test text</hm>", "<hm>", "</hm>")

End



Or make it specifically for HTML tag names like this...


Public Sub GetHtmlTag(Text As String, HtmlTagName As String) As String
  
  Dim iStart, iEnd As Integer
  iStart = InStr(Text, "<" & HtmlTagName & ">")
  ' InStr() returns 0 (not -1) when the pattern is not found
  If iStart = 0 Then Return ""
  iEnd = InStr(Text, "</" & HtmlTagName & ">", iStart)
  If iEnd = 0 Then Return ""
  iStart += HtmlTagName.Len + 2
  Return Mid(Text, iStart, iEnd - iStart)
  
End


Public Sub Form_Open()

  Print "'"; GetHtmlTag("hello this <mycontent>some test text</mycontent>", "mycontent"); "'"

End


Re: Using REGEX to return text from an html block

Posted: Sunday 30th April 2023 8:46am
by thatbruce
... or just use the gb.xml.html component, which has all the tools needed to extract stuff from an HTML source without having to write it yourself.
b

Re: Using REGEX to return text from an html block SOLVED

Posted: Sunday 30th April 2023 12:30pm
by bazzvn
Thank you all for your suggestions. Yes, I realized before reading your responses that it was unnecessary to use regex, and that InStr() and Mid() would do the trick. I am not familiar with the gb.xml.html component, but maybe I should also play around with that when I have a bit more time. I'm sorry for taking your time with a 'red herring'.
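
For anyone who lands here still wanting the regex route from the thread title, here is a minimal sketch using the gb.pcre component. It is only an illustration and has not been tested against every Gambas version. RegexBetween is a made-up helper name, the tag name is assumed to contain no regex special characters, and the check on Offset assumes it is -1 when nothing matched (please confirm against the gb.pcre documentation for your version).

```gambas
' Needs the gb.pcre component enabled in the project properties.
' RegexBetween is a hypothetical helper name, not something from the thread.
Public Function RegexBetween(Text As String, TagName As String) As String

  Dim rx As RegExp

  ' "(.*?)" is a lazy capture group: it stops at the FIRST closing tag
  ' instead of running on to the last one in the text.
  ' RegExp.DotAll lets "." match line breaks too, so multi-line blocks work.
  rx = New RegExp(Text, "<" & TagName & ">(.*?)</" & TagName & ">", RegExp.DotAll)

  ' Assumption: Offset is -1 when there was no match.
  If rx.Offset = -1 Then Return ""

  ' rx[1] is the first capture group, i.e. the text between the tags.
  Return rx[1].Text

End
```

It would be called the same way as GetHtmlTag() above, for example RegexBetween("hello this <mycontent>some test text</mycontent>", "mycontent"). For a single, un-nested tag pair the InStr()/Mid() version does the same job without the extra component.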