Any of the natural numbers, the negatives of these numbers, or zero.
-
Ranges
-
Signed: -(2n-1) to (2n-1 - 1)
-
Unsigned: 0 to (2n - 1)
-
-
Fast hexadecimal ⇒ binary conversions because log2(16) = 4
-
Fast operations: Comparison, Negation, Addition, Multiplication
| Real Value | Memory Value |
|---|---|
0x0 |
0x0 |
0x1 |
0x1 |
-0x1 |
0xFFFFFFFF |
-0x10 |
0xFFFFFFF0 |
-0x100 |
0xFFFFFF00 |
-0x1000 |
0xFFFFF000 |
-0x10000 |
0xFFFF0000 |
-0x100000 |
0xFFF00000 |
-0x1000000 |
0xFF000000 |
-0x10000000 |
0xF0000000 |
-0x80000000 |
0x80000000 |
0x7FFFFFFF |
0x7FFFFFFF |
This representation mechanism is called Two’s Complement. Sum of a positive number and its negative counterpart gives 2n (except zero, n is number of bits). The conversion between negative and positive (multiplication with -1) can be done very fast as bit negation (ones' complement) and increase by 1.
Specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in computer programming environments. IEEE = Institute of Electrical and Electronics Engineers
There are some reserved values for signed zero, infinity, and NaN (not a number). For non-reserved values, numeric representation can be obtained with the following formula:
#include <stdio.h>
int main()
{
double a;
double b;
a = 1;
b = a + 1;
if(a == b) {
printf("this would be weird\n");
}
a = 10000000000000000.0;
b = a + 1;
if(a == b) {
printf("yes, this is perfectly normal\n");
}
}See https://en.wikipedia.org/wiki/Double-precision_floating-point_format for detailed information.
string name = "Antonín Dvořák";
//name.Length == 14UCS = Universal Coded Character Set; UTF = UCS Transformation Format
UTF-8 encoded string occupies 17 bytes.
00000000: 41 6e 74 6f 6e c3 ad 6e 20 44 76 6f c5 99 c3 a1 Anton..n Dvo.... 00000010: 6b k
UTF-32 encoded string occupies 56 bytes.
00000000: 00 00 00 41 00 00 00 6e 00 00 00 74 00 00 00 6f ...A...n...t...o 00000010: 00 00 00 6e 00 00 00 ed 00 00 00 6e 00 00 00 20 ...n.......n... 00000020: 00 00 00 44 00 00 00 76 00 00 00 6f 00 00 01 59 ...D...v...o...Y 00000030: 00 00 00 e1 00 00 00 6b .......k...
-
UTF-32 uses fixed four bytes
-
UTF-8 uses a byte at the minimum in encoding the characters
-
UTF-8 encoded file tends to be smaller
-
UTF-8 is compatible with ASCII
Consider problems like sorting, error detection, length determination, and conversions between distinct encodings.
A regular expression, regex or regexp is a string that defines a search pattern.
Such patterns can be used for match or replace operations on strings. The
control syntax uses a limited set of characters: {}[]()^$.|+?-
The implementations can vary, look at Regular expressions in Java, etc.
A vertical bar separates alternatives. apple|orange
Parentheses are used to define the scope and precedence of the operators. For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey". Grouping is also used to specify and extract specific data within the regex match.
A quantifier how often the previous element must precede to match.
-
?optional occurrence, colou?r matches both "color" and "colour", also used for greediness control -
*zero or more occurrence; ab*c matches "ac", "abc", "abbc", "abbbc", and so on. -
+at least one occurrence; ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac". -
{n}The preceding item is matched exactlyntimes. -
{min,}The preceding item is matched at leastmintimes. -
{min,max}The preceding item is matched at leastmintimes, but less thanmaxtimes.
-
^beginning of a line (also used as a negation, see below) -
$end of line -
.match any character -
[]determines a group of characters,-interval,^is negation-
[a-z]match any character between a and z -
[0123456789]or[0-9]match any decimal digit -
[^A-Z]match all characters except all between A and Z
-
The environment (programming language, text editor) usually defines some specific abbreviations or macros.
-
\ddecimal digit -
\w,\Wword character, non-word character -
\s,\Sspace character, non-space character -
\b,\<,\>word boundary, beginning/end of a word
Meta-characters need to be escaped if they should be matched.
Regex \\d matches string \d, regex *+\+\? matches string *++?
Notation: example regular expressions are enclosed between two slashes /regex/.
/[a-z0-9_-]{3,16}/ # match a username
/[a-z0-9_-]{6,18}/ # match a password
/0x?([a-f0-9]+)/ # match a hexadecimal number
/(\d\d):(\d\d):(\d\d)/ # match a date in hh:mm:ss format
# match a web address
/(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/Experiment with your own regular expressions at https://regexr.com/
Consider a string containing multiple space separated expressions. An expression is either a single word or a compound expression consisting of multiple words between double quotes.
Example input:
apple orange banana "honey pie" sun "high noon"
Example output:
all expressions: 6 compound expressions: 2

