Using REGEX to return text from an html block

bazzvn
Posts: 18
Joined: Wednesday 22nd February 2017 11:06am
Location: Vietnam

Post by bazzvn »

I want to retrieve all the text between the beginning and end of an HTML block, for example between the tags <mycontent> and </mycontent>. The text between the opening and closing tags could be of arbitrary length and could contain any characters. It should be simple with regex, but despite trying to make sense of the examples in the Gambas help files and the vast amount of material online, I still haven't found a simple explanation of regex codes that I can get my aging brain around. Could anyone help with a simple suggestion?
cogier
Site Admin
Posts: 1118
Joined: Wednesday 21st September 2016 2:22pm
Location: Guernsey, Channel Islands

Re: Using REGEX to return text from an html block

Post by cogier »

Can you post an example file on the site, so we can see what you are trying to clean up?
BruceSteers
Posts: 1523
Joined: Thursday 23rd July 2020 5:20pm
Location: Isle of Wight

Re: Using REGEX to return text from an html block

Post by BruceSteers »

Why bother? Just use the Gambas functions InStr() and Mid().


' Get text in between 2 text patterns
Public Sub TextBetween(Text As String, StartString As String, EndString As String) As String
  
  Dim iStart, iEnd As Integer
  ' InStr() returns 0 (not -1) when the pattern is not found
  iStart = InStr(Text, StartString)
  If iStart = 0 Then Return ""
  ' Skip past the start pattern, then search for the end pattern from there
  iStart += StartString.Len
  iEnd = InStr(Text, EndString, iStart)
  If iEnd = 0 Then Return ""
  Return Mid(Text, iStart, iEnd - iStart)
  
End


Public Sub Form_Open()

  Print TextBetween("hello this <hm>some test text</hm>", "<hm>", "</hm>")

End



Or make it specifically for HTML tag names, like this...


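' Get the text inside a simple HTML tag pair (the opening tag must have no attributes)
Public Sub GetHtmlTag(Text As String, HtmlTagName As String) As String
  
  Dim iStart, iEnd As Integer
  ' InStr() returns 0 (not -1) when the pattern is not found
  iStart = InStr(Text, "<" & HtmlTagName & ">")
  If iStart = 0 Then Return ""
  ' Skip past the opening tag: the tag name plus "<" and ">"
  iStart += HtmlTagName.Len + 2
  iEnd = InStr(Text, "</" & HtmlTagName & ">", iStart)
  If iEnd = 0 Then Return ""
  Return Mid(Text, iStart, iEnd - iStart)
  
End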


Public Sub Form_Open()

  Print "'"; GetHtmlTag("hello this <mycontent>some test text</mycontent>", "mycontent"); "'"

End
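
If the block can appear more than once in the text, the same InStr()/Mid() idea can be put in a loop. This is only a sketch: AllTextBetween is a made-up name, and it assumes the blocks are not nested inside each other.

```gambas
' Sketch only: AllTextBetween is a hypothetical helper, not a built-in function.
' Collects every piece of text found between StartString and EndString.
Public Sub AllTextBetween(Text As String, StartString As String, EndString As String) As String[]

  Dim aResult As New String[]
  Dim iStart, iEnd As Integer
  Dim iPos As Integer = 1

  Do
    ' InStr() returns 0 when nothing more is found
    iStart = InStr(Text, StartString, iPos)
    If iStart = 0 Then Break
    ' Move to the first character after the start pattern
    iStart += Len(StartString)
    iEnd = InStr(Text, EndString, iStart)
    If iEnd = 0 Then Break
    aResult.Add(Mid(Text, iStart, iEnd - iStart))
    ' Carry on searching after the end pattern
    iPos = iEnd + Len(EndString)
  Loop

  Return aResult

End
```

Called with "<hm>" and "</hm>", it returns one array element per block, or an empty array if there are none.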

If at first you don't succeed, try doing something differently.
BruceS
thatbruce
Posts: 161
Joined: Saturday 4th September 2021 11:29pm

Re: Using REGEX to return text from an html block

Post by thatbruce »

... or just use the gb.xml.html component, which has all the tools needed to extract stuff from an HTML source without having to write it yourself.
b
Have you ever noticed that software is never advertised using the adjective "spreadable"?
bazzvn
Posts: 18
Joined: Wednesday 22nd February 2017 11:06am
Location: Vietnam

Re: Using REGEX to return text from an html block SOLVED

Post by bazzvn »

Thank you all for your suggestions. Yes, I realized before reading your responses that it was unnecessary to use regex, and that InStr() and Mid() would do the trick. I am not familiar with the gb.xml.html component, but maybe I should also play around with that when I have a bit more time. I'm sorry for taking your time with a 'red herring'.
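
For anyone who lands here still wanting the regex route asked about at the top of the thread, here is a minimal sketch. It assumes the gb.pcre component is enabled in the project. GetTagRegex is a made-up name, and the tag name is assumed to be plain letters and digits with no regex special characters. The (?s) prefix lets "." match line breaks as well, and (.*?) stops at the first closing tag rather than the last. As far as I know the Offset property is -1 when nothing matched, but that is worth checking against the gb.pcre help.

```gambas
' Sketch only: requires the gb.pcre component. GetTagRegex is a hypothetical helper.
Public Sub GetTagRegex(Text As String, TagName As String) As String

  Dim hRegExp As RegExp

  ' (?s) makes "." match newlines too; (.*?) is a non-greedy capture group
  hRegExp = New RegExp(Text, "(?s)<" & TagName & ">(.*?)</" & TagName & ">")
  ' Assumption: Offset is -1 when the pattern did not match
  If hRegExp.Offset = -1 Then Return ""
  ' Submatch 1 is the text captured by the parentheses
  Return hRegExp[1].Text

End
```

It would be called the same way as GetHtmlTag(), for example GetTagRegex(sHtml, "mycontent"), where sHtml is whatever string holds the page text.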