all of the punctuation, dialogue tags and other formatting codes so that you can write analysis functions on the remaining text.
I'm 'trying' to create a Gambas application to help me organise my SciFi/Fantasy story writing.
(A bit of a cross between World Anvil and Grammarly.)
Private Function CleanTextLine(InLine As String) As String
InLine = Replace(InLine, String.Chr(34), "") 'Unicode Character 'QUOTATION MARK' (U+0022)
InLine = Replace(InLine, ",", "") 'Normal comma
InLine = Replace(InLine, ".", "") 'Normal period
InLine = Replace(InLine, String.Chr(8230), "") 'Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026); a typed "..." is already gone after the period line
InLine = Replace(InLine, "'", "") 'Normal single quote
InLine = Replace(InLine, String.Chr(8220), "") 'Left slanted double quote
InLine = Replace(InLine, String.Chr(8221), "") 'Right slanted double quote
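InLine = Replace(InLine, String.Chr(8216), "") 'Left single quote (U+2018)
InLine = Replace(InLine, String.Chr(8217), "") 'Right single quote / curly apostrophe (U+2019)
InLine = Replace(InLine, String.Chr(8218), "") 'Single low-9 quote (U+201A)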
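InLine = Replace(InLine, "?", "") 'Question mark
InLine = Replace(InLine, "!", "") 'Exclamation mark
InLine = Replace(InLine, ":", "") 'Colon
InLine = Replace(InLine, ";", "") 'Semicolon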
' Whitespace and dashes become a space rather than nothing, so the words on either side stay separate
InLine = Replace(InLine, Gb.NewLine, " ") '\n (line feed)
InLine = Replace(InLine, String.Chr(9), " ") 'Tab
InLine = Replace(InLine, String.Chr(160), " ") 'Unicode Character 'NO-BREAK SPACE' (U+00A0)
InLine = Replace(InLine, String.Chr(8211), " ") 'Unicode Character 'EN DASH' (U+2013)
InLine = Replace(InLine, String.Chr(8212), " ") 'Unicode Character 'EM DASH' (U+2014)
Return InLine
End
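As a sketch of the kind of analysis function this makes possible, here is a word-frequency counter. CountWords is a hypothetical name of my own, not part of Gambas. It assumes the line has already been through CleanTextLine, splits it on spaces, and tallies each lower-cased word in a Collection. String.LCase is used instead of LCase because the plain LCase only handles ASCII.

```gambas
' Hypothetical helper: tallies word frequencies in a line already cleaned by CleanTextLine().
' Returns a Collection keyed by lower-case word; each value is that word's count.
Private Function CountWords(CleanLine As String) As Collection

  Dim cCounts As New Collection
  Dim aWords As String[]
  Dim sWord As String
  Dim sKey As String

  ' 4th argument IgnoreVoid = True drops the empty entries that double spaces would produce
  aWords = Split(CleanLine, " ", "", True)

  For Each sWord In aWords
    sKey = String.LCase(sWord)
    If cCounts.Exist(sKey) Then
      cCounts[sKey] = cCounts[sKey] + 1
    Else
      cCounts[sKey] = 1
    Endif
  Next

  Return cCounts

End
```

You would call it as CountWords(CleanTextLine(sLine)); to list the results, loop over the returned Collection with For Each and read the current word from its Key property.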
Converting the .odt document to HTML so that you can display it in a WebView control
The exported HTML carries a bit of useless 'swarf' (redundant inline styles and spans), and I've found that removing it in bulk helps:
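' soffice writes <name>.html into --outdir (the current directory if omitted), so load that file rather than the .odt
Exec ["soffice", "--headless", "--convert-to", "html:HTML", "--outdir", File.Dir({TargetFilePath}), {TargetFilePath}] Wait
TmpText = File.Load(File.SetExt({TargetFilePath}, "html"))
MyWebViewControl.HTML = TmpText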
To remove:
style="margin-bottom: 0cm; line-height: 100%"
style="font-style: normal"
<span style="background: transparent">
</span>
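Those removals can be done in one pass before the HTML is handed to the WebView. This is a minimal sketch: StripSwarf is a hypothetical name, and it assumes the strings appear in the export exactly as listed above, so a different margin value or spacing won't be caught. One caution: stripping every </span> also removes the closing tags of any spans you meant to keep, so check the result if your document has other styled spans.

```gambas
' Hypothetical helper: removes the LibreOffice export "swarf" listed above from an HTML string.
' Matching is exact; any variation in the attribute text is left untouched.
Private Function StripSwarf(sHtml As String) As String

  Dim aSwarf As String[] = ["style=\"margin-bottom: 0cm; line-height: 100%\"", "style=\"font-style: normal\"", "<span style=\"background: transparent\">", "</span>"]
  Dim sItem As String

  ' Remove every occurrence of each unwanted string in turn
  For Each sItem In aSwarf
    sHtml = Replace(sHtml, sItem, "")
  Next

  Return sHtml

End
```

It slots in between loading and display, e.g. MyWebViewControl.HTML = StripSwarf(TmpText).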