Exposing bytes for what they really are

Cedron
Posts: 156
Joined: Thursday 21st February 2019 5:02pm
Location: The Mitten State

Post by Cedron »

Bytes are the fundamental unit of storage in most modern computers. They are composed of eight bits, which can be considered as two groupings of four. For convenience's sake, binary numbers can be written in shorthand using hex numbers, where each grouping of four bits corresponds to one hex digit. The two groupings in a byte are sometimes referred to as nybbles.

ASCII (American Standard Code for Information Interchange) is the mapping of the byte values from 0 to 127 to the common printed characters. The values from 0 to 31 are control characters, meaning they coordinate transmission on a teletype. The values from 48 to 57 are the decimal digit characters "0" to "9". The values from 65 to 90 are "A" to "Z", and 97 to 122 are "a" to "z".

Gambas makes dealing with byte values fairly simple. The sample program and its output demonstrate some of the concepts and syntax.

'=============================================================================
Public Sub Main()

'---- Sample string of ordinary characters

        Dim theSample As String = "0123 ABCD Gambas Zz"
        
        For p As Integer = 1 To Len(theSample)
          Dim theByteValue As Byte = Asc(Mid(theSample, p, 1))
          DisplayByteValue(theByteValue)
        Next

'---- Some ASCII characters

        Print
        For d As Integer = 0 To 5
          Print d, Chr(48 + d), Chr(64 + d), Chr(96 + d)
        Next

'---- Some special characters
        
        Print
        Print "   Null aka \\0 = "; Asc("\0")
        Print "   Tab  aka \\t = "; Asc("\t"), Asc(gb.Tab)
        Print "   LF   aka \\n = "; Asc("\n"), Asc(gb.Lf)
        Print "   CR   aka \\r = "; Asc("\r"), Asc(gb.Cr)

End
'=============================================================================
Private Sub DisplayByteValue(argByteValue As Byte)

        Print Chr(argByteValue); "  ";
        Print Right("00000000" & Bin(argByteValue), 8); "   ";
        Print "&"; Right("00" & Hex(argByteValue), 2); "&   ";
        Print Right("   " & Str(argByteValue), 3); "     ";
        Print HexSums(argByteValue);
        Print BinarySums(argByteValue)

End
'=============================================================================
Private Sub HexSums(argByteValue As Byte) As String

        Dim theHighNybble As Byte = Shr(argByteValue, 4)
        Dim theLowNybble As Byte = argByteValue And &0F&
        
        Dim r As String = Str(theHighNybble) & " * 16 + " & Str(theLowNybble)

        Return Left(r & "                      ", 16)

End
'=============================================================================
Private Sub BinarySums(argByteValue As Byte) As String

        If argByteValue = 0 Then Return "0"

        Dim theMaskValue As Integer = 128 ' 10000000b
        
        Dim theResult As String
        
        For b As Integer = 7 To 0 Step -1
          If (argByteValue And theMaskValue) > 0 Then
             theResult &= " + " & Str(theMaskValue)
          End If    
          ' Alternative: theMaskValue = Shr(theMaskValue, 1)
          theMaskValue /= 2
        Next 

        Return Mid(theResult, 4)
End
'=============================================================================

Here is the output:

Code: Select all

0  00110000   &30&    48     3 * 16 + 0      32 + 16
1  00110001   &31&    49     3 * 16 + 1      32 + 16 + 1
2  00110010   &32&    50     3 * 16 + 2      32 + 16 + 2
3  00110011   &33&    51     3 * 16 + 3      32 + 16 + 2 + 1
   00100000   &20&    32     2 * 16 + 0      32
A  01000001   &41&    65     4 * 16 + 1      64 + 1
B  01000010   &42&    66     4 * 16 + 2      64 + 2
C  01000011   &43&    67     4 * 16 + 3      64 + 2 + 1
D  01000100   &44&    68     4 * 16 + 4      64 + 4
   00100000   &20&    32     2 * 16 + 0      32
G  01000111   &47&    71     4 * 16 + 7      64 + 4 + 2 + 1
a  01100001   &61&    97     6 * 16 + 1      64 + 32 + 1
m  01101101   &6D&   109     6 * 16 + 13     64 + 32 + 8 + 4 + 1
b  01100010   &62&    98     6 * 16 + 2      64 + 32 + 2
a  01100001   &61&    97     6 * 16 + 1      64 + 32 + 1
s  01110011   &73&   115     7 * 16 + 3      64 + 32 + 16 + 2 + 1
   00100000   &20&    32     2 * 16 + 0      32
Z  01011010   &5A&    90     5 * 16 + 10     64 + 16 + 8 + 2
z  01111010   &7A&   122     7 * 16 + 10     64 + 32 + 16 + 8 + 2

0       0       @       `
1       1       A       a
2       2       B       b
3       3       C       c
4       4       D       d
5       5       E       e

   Null aka \0 = 0
   Tab  aka \t = 9      9
   LF   aka \n = 10     10
   CR   aka \r = 13     13
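
As a cross-check on the HexSums and BinarySums columns above, here is a minimal sketch in Python rather than Gambas. The function names split_nybbles and bit_weights are made up for this illustration; the only point is that the two nybbles and the set bit weights each add back up to the byte value.

```python
def split_nybbles(byte_value):
    """Return the (high, low) nybbles of a value in the range 0..255."""
    if not 0 <= byte_value <= 255:
        raise ValueError("not a byte value")
    return byte_value >> 4, byte_value & 0x0F


def bit_weights(byte_value):
    """Return the powers of two that are set in the byte, largest first."""
    return [1 << bit for bit in range(7, -1, -1) if byte_value & (1 << bit)]


for ch in "0AmZ":
    value = ord(ch)
    high, low = split_nybbles(value)
    # Both decompositions must reassemble to the original byte value.
    assert high * 16 + low == value
    assert sum(bit_weights(value)) == value
    print(ch, format(value, "08b"), format(value, "02X"), value,
          high, low, bit_weights(value))
```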

Here is a related post from long ago:

viewtopic.php?p=1553
.... and carry a big stick!

Re: Exposing bytes for what they really are

Post by Cedron »

You may have wondered why I wrapped my hex values in ampersands rather than use the traditional '&H' prefix.

Consider the following Gambas code:

        Print Val("&Babe&"), &Babe&

        Print Val("&HBabe"), &HBabe

        Print Val("&HBabe&"), &HBabe&


A = 10, B = 11, and E = 14 so the result should be 11*16^3 + 10*16^2 + 11*16 + 14, right?

The experts snicker.

Somewhere, stored in memory, it would look something like this:

Code: Select all

          Address  Hex    Binary
          ::::::::
          ######9E ??
          ######9F ??
Varptr--> ######A0 BE  1011 1110 
          ######A1 BA  1011 1010
          ######A2 ??
          ######A3 ??
          ######A4 ??
          ######A5 ??
          ::::::::
Think of it as a big byte array where the address is the index.

This is a little endian representation of a two byte integer value: the least significant byte sits at the lowest address. In Gambas this is the variable type 'Short'. The common integer type is a four byte version, also little endian. (BTW, serial transmissions are also little endian bitwise, with the most significant bit, often used as a parity bit, coming last.)
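
To see the byte order without poking at memory, here is a small sketch in Python rather than Gambas. The helper name memory_layout is hypothetical; it lists the bytes of a value in the order they would sit at increasing addresses on a little endian machine.

```python
def memory_layout(value, size):
    """Hex bytes of an unsigned value, lowest address first (little endian)."""
    return [format(b, "02X") for b in value.to_bytes(size, "little")]


print(memory_layout(0xBABE, 2))   # two byte layout, low byte first
print(memory_layout(0xBABE, 4))   # four byte layout, padded with zero bytes

# Reading the bytes back in the same order recovers the value.
assert int.from_bytes(bytes.fromhex("BEBA"), "little") == 0xBABE
```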

In a signed integer variable, the highest order bit determines the sign. If it is set, the number is negative. The difference between signed and unsigned integers can be understood by looking at a three bit example.

Code: Select all

    000    0   0
    001    1   1
    010    2   2
    011    3   3
    100    4  -4
    101    5  -3
    110    6  -2
    111    7  -1
Same bit patterns, different interpretations. Now, if you want to store the three bit value in an eight bit byte, you would put the three bits in the lowest positions.

Code: Select all

    00000XXX 
As a byte, those will have values from 0 to 7 inclusive. As a signed conversion, the highest order bit needs to be "sign extended", so the results look like this:

Code: Select all

    000000XX  Positive values
    111111XX  Negative values
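
The three bit table and the sign extension pattern can be reproduced with a short sketch, written in Python rather than Gambas. The names as_signed and sign_extend are invented for this example, and it assumes the usual two's complement interpretation.

```python
def as_signed(pattern, width):
    """Read the low `width` bits of pattern as a two's complement number."""
    pattern &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return pattern - (1 << width) if pattern & sign_bit else pattern


def sign_extend(pattern, from_width, to_width):
    """Copy the sign bit of a from_width pattern into the added high bits."""
    return as_signed(pattern, from_width) & ((1 << to_width) - 1)


# Columns: bit pattern, unsigned reading, signed reading, extended to 8 bits.
for pattern in range(8):
    print(format(pattern, "03b"), pattern, as_signed(pattern, 3),
          format(sign_extend(pattern, 3, 8), "08b"))
```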
When short integer values are put into integer variables, they are sign extended.

Code: Select all

          ######9F ??
Varptr--> ######A0 BE  1011 1110 
          ######A1 BA  1011 1010
          ######A2 FF  1111 1111
          ######A3 FF  1111 1111
          ######A4 ??
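
The same two layouts can be produced outside Gambas. This is a sketch using Python's struct module, not a statement about how Gambas does it internally; the helper name dump is made up, and little endian packing is requested explicitly with "<".

```python
import struct


def dump(fmt, value):
    """Pack value with the given struct format and show the bytes as hex."""
    return " ".join(format(b, "02X") for b in struct.pack(fmt, value))


# Store the raw 16 bit pattern, then read the same two bytes as a signed short.
short_bytes = struct.pack("<H", 0xBABE)
(as_short,) = struct.unpack("<h", short_bytes)

print(as_short)                # the signed reading of the same bits
print(dump("<h", as_short))    # two byte layout
print(dump("<i", as_short))    # four byte layout after sign extension
print(dump("<i", 0xBABE))      # four byte layout of the unsigned reading
```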
But what does the "Val" function do with a value represented by a string of characters?

Let's check the output of the program:

Code: Select all

Priceless!   Priceless!
Priceless!   Priceless!
Priceless!   Priceless!
Okay, I've got a strange computer, yours probably printed numbers.

Re: Exposing bytes for what they really are

Post by Cedron »

Just for completeness, here are the printable* ASCII characters, arranged by their byte value codes.

Code: Select all

                   0 1 2 3 4 5 6 7 8 9 A B C D E F

00100000 &20&  32    ! " # $ % & ' ( ) * + , - . /
00110000 &30&  48  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
01000000 &40&  64  @ A B C D E F G H I J K L M N O
01010000 &50&  80  P Q R S T U V W X Y Z [ \ ] ^ _
01100000 &60&  96  ` a b c d e f g h i j k l m n o
01110000 &70& 112  p q r s t u v w x y z { | } ~ 
(*) Note, 127 isn't exactly printable. It is the DEL control character, also known as "rub out".

Here is the code that produced it.
        Dim theHighNybble, theLowNybble, theHighValue, theByteValue As Integer

        Print "                  ";

        For theLowNybble = 0 To 15
          Print " "; Hex(theLowNybble);
        Next

        Print
        Print

        For theHighNybble = 2 To 7
          theHighValue = theHighNybble * 16 ' &10&  00010000b  shl by 4

          Print Right("00000000" & Bin(theHighValue), 8); " ";
          Print "&" & Right("00" & Hex(theHighValue), 2); "& ";
          Print Right("        " & Str(theHighValue), 3); " ";

          For theLowNybble = 0 To 15
            theByteValue = theHighValue + theLowNybble  
            Print " "; Chr(theByteValue);
          Next

          Print
        Next    


Re: Exposing bytes for what they really are

Post by Cedron »

Here is a demonstration of little endianness, and the answer to the above.

Priceless.
'=============================================================================
Public Sub Main()

        DisplayMemoryOfInteger(&Babe&)
        DisplayMemoryOfInteger(&HBabe)

End
'=============================================================================
Public Sub DisplayMemoryOfInteger(ArgIntegerValue As Integer)

        Dim theAddress As Pointer = VarPtr(ArgIntegerValue)

        Print 
        
        For m As Integer = 0 To 3
          Dim theByteValue As Byte = Byte@(theAddress)
          Dim theBinary As String = Right("00000000" & Bin(theByteValue), 8)
        
          Print Hex(theAddress); ": "; Right("00" & Hex(theByteValue), 2);
          Print "  "; Left(theBinary, 4); " "; Right(theBinary, 4)
          
          Inc theAddress
        Next

End
'=============================================================================

Code: Select all

FFFF92797028: BE  1011 1110
FFFF92797029: BA  1011 1010
FFFF9279702A: 00  0000 0000
FFFF9279702B: 00  0000 0000

FFFF92797028: BE  1011 1110
FFFF92797029: BA  1011 1010
FFFF9279702A: FF  1111 1111
FFFF9279702B: FF  1111 1111
Negative sign? Did anybody see a negative sign? I always like to treat hex constants as unsigned integers.
Print -&100&, -&FF&

Those are negative signs, and they work as expected.

Code: Select all

-256    -255
Gambas treats Bytes as unsigned.
Dim theByte As Byte = -1

Print theByte, Bin(theByte)

Code: Select all

255     11111111
Here is the official narrative on the matter:
http://gambaswiki.org/wiki/lang/type/integer

Re: Exposing bytes for what they really are

Post by Cedron »

This example should convince you that wrapping hex values in the &form&, like quotation marks, is better practice than using the &Hform. Besides, in ordinary code with constant values, it is easier to read and understand.
Print Hex(15), Val("&" & Hex(15) & "&"), Val("&H" & Hex(15))
Print Hex(250), Val("&" & Hex(250) & "&"), Val("&H" & Hex(250))
Print Hex(4013), Val("&" & Hex(4013) & "&"), Val("&H" & Hex(4013))
Print Hex(64222), Val("&" & Hex(64222) & "&"), Val("&H" & Hex(64222))
Print Hex(1027565), Val("&" & Hex(1027565) & "&"), Val("&H" & Hex(1027565))

Code: Select all

F       15      15
FA      250     250
FAD     4013    4013
FADE    64222   -1314
FADED   1027565 1027565
Spot the "Gotcha!" lurking in there?
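
It is in the FADE row: with the &H prefix, four hex digits with the top bit set come back negative. Here is a sketch in Python of a rule that reproduces the five rows above. The rule (up to four digits are read as a signed 16 bit value) is an assumption inferred from this output, not taken from the Gambas source, and the function names are hypothetical. Values wider than five digits are not modelled.

```python
def val_amp_form(digits):
    """Model of Val("&" & digits & "&"): a plain unsigned reading."""
    return int(digits, 16)


def val_h_form(digits):
    """Model of Val("&H" & digits), ASSUMING up to 4 digits means signed 16 bit."""
    value = int(digits, 16)
    if len(digits) <= 4 and value >= 0x8000:
        value -= 0x10000   # sign bit set: wrap into the negative range
    return value


for digits in ("F", "FA", "FAD", "FADE", "FADED"):
    print(digits, val_amp_form(digits), val_h_form(digits))
```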


Here is another illustration of the boundary (or the odometer rollover) for signed integers.
        For theValueAsInteger As Integer = -4 To 3
          Dim theByte As Byte = theValueAsInteger
          Dim theShort As Short = theValueAsInteger
          Dim theLong As Long = theValueAsInteger
        
          Print theValueAsInteger,
          Print Bin(theValueAsInteger, 3); "  ";
          Print Bin(theByte, 8); "  ";
          Print Hex(theByte, 2); "  ";
          Print Hex(theByte),
          Print Hex(theShort); " ";
          Print Hex(theValueAsInteger); " ";
          Print Hex(theLong)
        Next

Note the quirky behavior of Hex(Long) vs Hex(Byte).

Code: Select all

-4      100  11111100  FC  FC   FFFFFFFFFFFFFFFC FFFFFFFFFFFFFFFC FFFFFFFFFFFFFFFC
-3      101  11111101  FD  FD   FFFFFFFFFFFFFFFD FFFFFFFFFFFFFFFD FFFFFFFFFFFFFFFD
-2      110  11111110  FE  FE   FFFFFFFFFFFFFFFE FFFFFFFFFFFFFFFE FFFFFFFFFFFFFFFE
-1      111  11111111  FF  FF   FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF
0       000  00000000  00  0    0 0 0
1       001  00000001  01  1    1 1 1
2       010  00000010  02  2    2 2 2
3       011  00000011  03  3    3 3 3
It is just like taking a brand new car with zero miles and driving it in reverse for a mile: with an odometer that allowed rollback, it would read "999999", or all nines for however many digits there are.

Or like grouping decimal numbers with commas (U.S. style) to effectively make a base 1000 numbering system.
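
The rollover can be checked at any width by masking, as in this sketch in Python rather than Gambas. The helper name hex_at_width is invented here; it shows the two's complement pattern a value would have in a variable of the given number of bits, which is not necessarily what the Gambas Hex function prints for each type.

```python
def hex_at_width(value, bits):
    """Two's complement bit pattern of value at the given width, as hex."""
    return format(value & ((1 << bits) - 1), "0{}X".format(bits // 4))


for value in (-4, -1, 0, 3):
    print(value, hex_at_width(value, 8), hex_at_width(value, 16),
          hex_at_width(value, 64))

# Odometer analogy: six decimal digits, driven one mile in reverse from zero.
print((0 - 1) % 10**6)
```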

In summary:

Code: Select all

Number       Function     String of characters

Byte     ---->   Chr        ---->  Character          
Byte     <----   Asc        <----  Character          

Integer  ---->   Str        ---->  Text Decimal Representation
Integer  <----   Val        <----  Text Decimal Representation

Integer  ---->   Hex        ---->  Text Hexadecimal Representation
Integer  <----   Val(& &)   <----  Text Hexadecimal Representation
These functions, and more, can be found at the language index page of the Gambas Wiki:

http://gambaswiki.org/wiki/lang

That's where I found that Bin and Hex can take a second argument specifying the zero-padded length. The previous code in this post would look better converted, but I'm not going to change it.

The next steps are how floating point numbers can be stored in the same bit patterns and, on the character side, how UTF-8 works. Fixed point formats are just integers with an implied whole/fraction partition. ("Decimal point" doesn't fit, and "Binary point" just doesn't seem to apply.)

Then how strings and objects are stored. After that, you'll be ready to write, or at least understand, function calls to shared libraries. Even write shared libraries of your own.