Using REGEX to return text from an html block
I want to retrieve all the text between the beginning and end of an HTML block, for example between the tags <mycontent> and </mycontent>. The text between the opening and closing tags could be of arbitrary length and could contain any characters. It should be simple with regex, but despite trying to make sense of the examples in the Gambas help files and the vast amount of material online, I still haven't found a simple explanation of regex codes that I can get my aging brain around. Could anyone help with a simple suggestion?
- cogier
- Site Admin
- Posts: 1157
- Joined: Wednesday 21st September 2016 2:22pm
- Location: Guernsey, Channel Islands
Re: Using REGEX to return text from an html block
Can you post an example file on the site, so we can see what you are trying to clean up.
- BruceSteers
- Posts: 1790
- Joined: Thursday 23rd July 2020 5:20pm
- Location: Isle of Wight
Re: Using REGEX to return text from an html block
Why bother? Just use the Gambas functions InStr() and Mid().
For example, to get the text between any two patterns...
' Get the text between two text patterns
Public Sub TextBetween(Text As String, StartString As String, EndString As String) As String

  Dim iStart, iEnd As Integer

  iStart = InStr(Text, StartString)
  ' InStr() returns 0 (not -1) when the pattern is not found
  If iStart = 0 Then Return ""
  ' Skip past the start pattern, then look for the end pattern from there
  iStart += StartString.Len
  iEnd = InStr(Text, EndString, iStart)
  If iEnd = 0 Then Return ""
  Return Mid(Text, iStart, iEnd - iStart)

End

Public Sub Form_Open()

  Print TextBetween("hello this <hm>some test text</hm>", "<hm>", "</hm>")

End
Or make it specifically for Html tag names like this...
Public Sub GetHtmlTag(Text As String, HtmlTagName As String) As String

  Dim iStart, iEnd As Integer

  iStart = InStr(Text, "<" & HtmlTagName & ">")
  ' InStr() returns 0 (not -1) when the tag is not found
  If iStart = 0 Then Return ""
  ' Skip past "<", the tag name and ">"
  iStart += HtmlTagName.Len + 2
  iEnd = InStr(Text, "</" & HtmlTagName & ">", iStart)
  If iEnd = 0 Then Return ""
  Return Mid(Text, iStart, iEnd - iStart)

End

Public Sub Form_Open()

  Print "'"; GetHtmlTag("hello this <mycontent>some test text</mycontent>", "mycontent"); "'"

End
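The functions above return only the first matching block. If the source may contain the tag several times, the same InStr()/Mid() idea can be put in a loop. This is only a sketch: GetAllHtmlTags is a made-up name, and it assumes plain tags with no attributes and no nesting.

```gambas
' Hypothetical helper: collect the text of every <HtmlTagName>...</HtmlTagName> block
Public Sub GetAllHtmlTags(Text As String, HtmlTagName As String) As String[]

  Dim aResult As New String[]
  Dim sOpen As String = "<" & HtmlTagName & ">"
  Dim sClose As String = "</" & HtmlTagName & ">"
  Dim iStart, iEnd As Integer

  iStart = InStr(Text, sOpen)
  While iStart > 0
    ' Move to the first character after the opening tag
    iStart += Len(sOpen)
    iEnd = InStr(Text, sClose, iStart)
    ' Stop if the opening tag has no matching closing tag
    If iEnd = 0 Then Break
    aResult.Add(Mid(Text, iStart, iEnd - iStart))
    ' Carry on searching after the closing tag
    iStart = InStr(Text, sOpen, iEnd + Len(sClose))
  Wend

  Return aResult

End
```

It could be called with something like Print GetAllHtmlTags(sHtml, "mycontent").Join(", "), where sHtml holds the page source.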
If at first you don't succeed, try doing something differently.
BruceS
Re: Using REGEX to return text from an html block
... or just use the gb.xml.html component, which has all the tools needed to extract stuff from an HTML source without having to write it yourself.
b
Re: Using REGEX to return text from an html block SOLVED
Thank you all for your suggestions. Yes, I realized before reading your responses that it was unnecessary to use regex, and that InStr() and Mid() would do the trick. I am not familiar with the gb.xml.html component, but maybe I should also play around with that when I have a bit more time. I'm sorry for taking your time with a 'red herring'.
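For anyone who lands here still wanting the regex route from the original question, a rough sketch with the gb.pcre component might look like the following. It assumes gb.pcre is enabled in the project, that the tag has no attributes, and that the tag name contains no regex special characters. GetHtmlTagRegex is a made-up name, so check the RegExp class documentation before relying on it.

```gambas
' Hypothetical helper, requires the gb.pcre component
Public Sub GetHtmlTagRegex(Text As String, HtmlTagName As String) As String

  Dim hRegExp As RegExp

  ' (.*?) captures as little as possible up to the first closing tag.
  ' RegExp.DotAll lets "." match newlines, so multi-line blocks work too.
  hRegExp = New RegExp(Text, "<" & HtmlTagName & ">(.*?)</" & HtmlTagName & ">", RegExp.DotAll)

  ' Offset is -1 when the pattern did not match
  If hRegExp.Offset = -1 Then Return ""

  ' Index 1 is the first captured group, i.e. the text between the tags
  Return hRegExp[1].Text

End
```

The pattern reads as: the opening tag, then any characters (as few as possible), then the closing tag. Without the "?" after ".*" the match would be greedy and would run on to the last closing tag in the text.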