Go - Slices cont. and Unicode

Contemporary Programming Languages - CS2001 - 31 October 2017

(You may assume all snippets are embedded in otherwise compiling programs) What is the output of the following code snippet?

s := []float64{2, 3, 5, 7, 11, 13}
t := s[1 : len(s)-1]
fmt.Println("s:", s, len(s), cap(s)) // s: [2 3 5 7 11 13] 6 6
fmt.Println("t:", t, len(t), cap(t)) // t: [3 5 7 11] 4 5
s[3] = 17 // t shares s's underlying array, so t[2] changes too
fmt.Println("s:", s, len(s), cap(s)) // s: [2 3 5 17 11 13] 6 6
fmt.Println("t:", t, len(t), cap(t)) // t: [3 5 17 11] 4 5

PRIME TEST MATERIAL

Slices cont.

Appending to Slices

  • func append(slice []T, elems ...T) []T
    • Built-in function
    • Appends elements to the end of a slice
      • If the slice has sufficient capacity, it is resliced to accommodate the new elements (same underlying array)
      • If there isn’t enough room, a new underlying array is allocated and the existing elements are copied into it
    • append returns the updated slice, which may or may not share the original's underlying array
    • It is necessary to hang on to the return value
    // Simple example
    s := []int{1, 2}
    s = append(s, 3)
    fmt.Println(s) // [1 2 3]
    
    // Reallocating a Slice
    s := []int{1, 2}
    fmt.Println(s, len(s), cap(s)) // [1 2] 2 2
    t := append(s, 3)
    fmt.Println(t, len(t), cap(t)) // [1 2 3] 3 4 (new capacity is chosen by the runtime and may vary)
    

    The capacity chosen on reallocation is an implementation detail: it can differ between Go versions and element types, so code should not depend on it.

    // Reslicing
    s := make([]int, 2, 10)
    s[0], s[1] = 1, 2
    fmt.Println(s, len(s), cap(s)) // [1 2] 2 10
    t := append(s, 3)
    fmt.Println(t, len(t), cap(t)) // [1 2 3] 3 10
    s[0] = 5 // no reallocation happened, so t shares s's array
    fmt.Println(s, t) // [5 2] [5 2 3]
    

    You will almost always overwrite your slice with the result: s = append(s, 3, 4, 3). append is variadic, so any number of elements can follow the slice. WILL BE TESTED ON APPEND

Unicode

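string is not an alias of []byte: it is its own immutable type holding a read-only sequence of bytes (byte itself is an alias of uint8)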
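Since a string is immutable, converting it to []byte copies the bytes, and changing the copy leaves the string alone. A minimal sketch; bytesCopy is a hypothetical helper name:

```go
package main

import "fmt"

// bytesCopy converts s to a []byte, changes the first byte of the copy, and
// returns both. s must be non-empty.
func bytesCopy(s string) (string, []byte) {
	b := []byte(s) // conversion copies the bytes into a new, mutable array
	b[0] = 'j'
	return s, b
}

func main() {
	s, b := bytesCopy("hello")
	fmt.Println(s, string(b)) // hello jello
	// s[0] = 'j' would not compile: strings cannot be modified in place
}
```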

  • Unicode is a collection of symbols including letters, numbers, emoji, accents, etc.
    • The Unicode repertoire has more than 128,000 code points.
  • A code point is not necessarily a character.
    • U+0041 is A
    • U+030A is COMBINING RING ABOVE (a small circle drawn over the preceding character)
    • U+0041 followed by U+030A renders as Å (A with a circle on top)
    • Any Unicode code point can be stored in Go as a rune (an alias of int32).
  • Unicode can be encoded in several ways using UTF, which is the Unicode Transformation Format
    • UTF-32
      • Fixed width
      • Each code point is directly indexable
      • []rune
    • UTF-16
      • Each code point gets 1 or 2 16-bit units
      • Variable width
      • []uint16
    • UTF-8
      • Each code point gets 1, 2, 3, or 4 8-bit units
      • Variable width
      • []uint8 or []byte or string
      • The 128 ASCII characters are encoded in UTF-8 as the same single bytes, so ASCII text is already valid UTF-8
        // requires import "unicode/utf16"
        u32 := []rune{'h', 'e', 'l', 'l', 'o', '😐'}
        fmt.Printf("%x\n", u32) // [68 65 6c 6c 6f 1f610]  6 runes * 4 bytes = 24 bytes
        u16 := utf16.Encode(u32)
        fmt.Printf("%x\n", u16) // [68 65 6c 6c 6f d83d de10]  7 units * 2 bytes = 14 bytes
        u8 := string(u32)
        fmt.Printf("% x\n", u8) // 68 65 6c 6c 6f f0 9f 98 90  9 bytes
        

On strings

  • A string is a read-only sequence of bytes
  • Go does not require those bytes to be ASCII, UTF-8, or any other encoding; a string can hold arbitrary bytes
  • Go source code is UTF-8, so string literals written as plain text are UTF-8 encoded (byte escapes such as \xff can still produce invalid UTF-8).
    s := "hello 😐" // a UTF-8 encoded string
    
    r := 'o'
    s := string(r)
    t := []byte(s)
    fmt.Println(s) // o
    fmt.Println(len(s)) // 1
    fmt.Println(t) // [111]
    
    r := 'ö'
    s := string(r)
    t := []byte(s)
    fmt.Println(s) // ö
    fmt.Println(len(s)) // 2 (bytes, not characters)
    fmt.Println(t) // [195 182]
    c := utf8.RuneCountInString(s) // requires import "unicode/utf8"
    fmt.Println(c) // 1