A reader writes....

Post by Cedron »

I recently got a letter from a Dorothy in Kansas. She writes:
Gosh, Mr. Dawg, why do you hate tabs so much?
Well, Dottie, may I call you Dottie? First a shout out to farm country, you are feeding the world, and we appreciate you so much. Hug a farmer today. It's also in the middle of tornado alley, so please be careful out there.

Whatever gave you the idea that I hate tabs? I love tabs, they are the best at what they do, no other character can replace them. When tabs are outlawed, only outlaws will use tabs.

So, anticipating your next question, what are tabs good for?

They are absolutely the best delimiter to use when passing text files to a spreadsheet program. Commas, which are commonly used for this, come with a host of problems:

Numeric-looking fields that are actually character strings (you are in a text file after all), like 1,234.56, or the convention of using commas for decimal points, will throw off a parser, so those fields have to be wrapped in quotes. But, oh no, what if that field also has a quote in it? Then the quote has to be escaped (replaced with a sequence that will parse correctly). There are two prevalent methods: using a \" or doubling up the quote "". You can see a parser has to be one or the other; it can't be both and work properly. So, if you are writing a program that is dealing with unknown values, you have to take all these things into consideration. What a pain.
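To make that pain concrete, here is a minimal sketch of the quote-doubling convention, assuming the reader on the other end expects "" rather than \". The name QuoteCsvField is made up for illustration, and the sketch quotes every field whether it needs it or not.

```gambas
' Hypothetical helper, not a built-in: wrap a field in quotes for a CSV file,
' doubling any quote characters already inside it.
Public Function QuoteCsvField(Raw As String) As String

  Return "\"" & Replace(Raw, "\"", "\"\"") & "\""

End

Public Sub Main()

  ' Prints: "1,234.56","Howdy, ""folks"""
  Print QuoteCsvField("1,234.56"); ","; QuoteCsvField("Howdy, \"folks\"")

End
```

A parser that expects \" instead would misread that second field, which is the whole problem.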

Now, let's bring in the tab. Stick a tab between each field and all those issues go away, poof. A tab is considered a white space character, so it is invisible when a document is sent to a printer. If there is even the possibility that there might be a tab character coming in, all you have to do is replace it with the standard escape sequence \t. Most spreadsheet programs will interpret this literally, so that is what you will see when it is loaded in a cell, but it will be in the right cell. Spreadsheets don't handle tabs as characters in values very well either, so you are not likely to encounter this.
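That replacement can be sketched in a couple of lines. EscapeTabs is a hypothetical name, and the sketch assumes the only troublemaker in the value is the tab itself; it does nothing about backslashes or line ends.

```gambas
' Hypothetical helper, not a built-in: swap each real tab character
' for the two visible characters backslash and t.
Public Function EscapeTabs(Raw As String) As String

  Return Replace(Raw, gb.Tab, "\\t")

End

Public Sub Main()

  Dim D As String

  D = gb.Tab

  ' The first field is written with a literal \t in it, so the only real
  ' tab on the line is the delimiter and the second field stays in column 2.
  Print EscapeTabs("Embedded \t tab"); D; "I'm in column 2"

End
```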

To demonstrate, here is a code sample of a writer reading:
' Wrapped in Main so it can be pasted into a command-line project as is.
Public Sub Main()

  Dim IO As File          ' Input/Output
  Dim FileName As String
  Dim D As String         ' Delimiter Character
  Dim Cell As String
  Dim Cells As String[]
  Dim InputLine As String
  Dim Row As Integer
  Dim Col As Integer

  ' First pass: comma delimited
  FileName = "~/test.csv"
  D = ","
  GoSub WriteFile
  GoSub ReadFile

  ' Second pass: same data, tab delimited
  FileName = "~/test.tsv"
  D = gb.Tab
  GoSub WriteFile
  GoSub ReadFile

  Return

WriteFile:

  Print
  Print FileName

  IO = Open FileName For Output Create

  ' Three fields per line; nothing is quoted or escaped on purpose
  Print #IO, "1,234.56"; D; "Howdy, folks"; D; "I'm in column 3"
  Print #IO, "Embedded \t tab"; D; " Howdy, \"folks\""; D; "I'm in column 3"

  Close #IO

  Return

ReadFile:

  IO = Open FileName

  ' Split each line on the delimiter and print row, column, value
  Row = 1
  Do Until IO.EndOfFile
    Line Input #IO, InputLine
    Col = 1
    Cells = Split(InputLine, D)
    For Each Cell In Cells
      Print Row, Col, Cell
      Inc Col
    Next
    Inc Row
  Loop

  Close #IO

  Return

End
Your output should look like this:

~/test.csv
1       1       1
1       2       234.56
1       3       Howdy
1       4        folks
1       5       I'm in column 3
2       1       Embedded         tab
2       2        Howdy
2       3        "folks"
2       4       I'm in column 3

~/test.tsv
1       1       1,234.56
1       2       Howdy, folks
1       3       I'm in column 3
2       1       Embedded 
2       2        tab
2       3        Howdy, "folks"
2       4       I'm in column 3
So you see, Dottie, neither approach is foolproof, but one is easier to defend against hackers than the other, ummmmm, I mean easier to fix. Try each in your favorite spreadsheet program to see how they react.

Just keep the tabs in your text files and out of your source code. And watch out for all those \r and \n's too, but that will have to wait. A replace "\\" with "\\\\" should fix all that. But that is a story for another time. Thanks for reading, and writing, Dottie. Remember it's always a good time to go for a walk.

Sincerely,

Mr. Dawg
.... and carry a big stick!