Count and Iterate Over a Go String

A simple example that shows how we count and iterate over a string by rune and byte.

Unicode can be complex, but it isn't rocket science. Here's a short primer to understand the basics of Unicode and text encoding. Also, read the blog post on strings, bytes, runes, and characters on the official Go website.

Definitions

Term | Description
--- | ---
String | A read-only slice of arbitrary bytes. A string literal, absent byte-level escapes, always holds valid UTF-8 text.
Code point | A numerical value that maps to a character. The mapping depends on the character set, e.g. ASCII or Unicode.
Rune | Go's term for a Unicode code point. The rune type is an alias for int32.
Byte | A unit of digital information made up of 8 bits.
Character | An abstract representation of a symbol. The term is often ambiguous and depends on context, because the same character can be represented in different ways.

Byte Count vs Rune Count

Let's use this string "你好世界" as an example. In Go, we can represent it in the following ways:

// Both assignments store the same 12 bytes.
str := "你好世界"   // As a UTF-8 string literal.
str = "\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c"   // As bytes, using byte-level escapes.
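The equivalence of the two forms can be checked directly. The following is a minimal sketch, not part of the repository's source; the helper name sameBytes is hypothetical and exists only for illustration. It also shows that a string may hold bytes that are not valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sameBytes is a hypothetical helper: it reports whether two strings
// hold identical byte sequences. String comparison in Go is byte-wise.
func sameBytes(a, b string) bool {
	return a == b
}

func main() {
	literal := "你好世界"
	escaped := "\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c"

	fmt.Println(sameBytes(literal, escaped)) // true
	fmt.Printf("% x\n", literal)             // e4 bd a0 e5 a5 bd e4 b8 96 e7 95 8c

	// A byte-level escape can also produce a string that is not valid UTF-8.
	fmt.Println(utf8.ValidString(escaped)) // true
	fmt.Println(utf8.ValidString("\xff"))  // false
}
```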

How big is a string? It depends. You can count the size by number of bytes or by number of runes.

fmt.Println(len(str))                       // Prints 12
fmt.Println(utf8.RuneCountInString(str))    // Prints 4

See source code for details.
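As a self-contained illustration of the two counts, here is a minimal sketch. The function name counts and the extra sample strings are assumptions made for this example and are not taken from the repository.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper that returns the size of s
// measured in bytes and in runes.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "h\u00e9llo" spells "héllo" with a precomposed é (2 bytes in UTF-8).
	for _, s := range []string{"hello", "你好世界", "h\u00e9llo"} {
		b, r := counts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For ASCII-only text the two counts agree; they diverge as soon as a rune needs more than one byte in UTF-8.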

Iterate by Byte vs Iterate by Rune

To iterate over a string by byte:

for i := 0; i < len(str); i++ {
	fmt.Printf("%x ", str[i])
}

for _, b := range []byte(str) {
	fmt.Printf("%x ", b)
}

To iterate over a string by rune:

for i, r := range str {
	fmt.Printf("%q starts at byte position %d\n", r, i)
}

Pay close attention to the index i: it is the byte offset at which the rune starts, not the rune's ordinal position in the string.

See source code for details.
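To make the byte offsets concrete, here is a minimal sketch. The helper name runeOffsets is hypothetical; the second loop decodes runes by hand with utf8.DecodeRuneInString, which for valid UTF-8 visits the same runes at the same offsets as the range loop.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper that returns the byte offset
// at which each rune in s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	str := "你好世界"
	fmt.Println(runeOffsets(str)) // [0 3 6 9]

	// Manual decoding: advance by the width of each decoded rune.
	for i := 0; i < len(str); {
		r, size := utf8.DecodeRuneInString(str[i:])
		fmt.Printf("%q starts at byte %d and is %d bytes wide\n", r, i, size)
		i += size
	}
}
```

Each of the four runes in this string takes 3 bytes in UTF-8, so the offsets advance in steps of 3.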

Setup

  1. Run the program.

    $ make run

Credits and Reference