首页 > 其他分享> 文章详细

Study Notes of CS:APP (Chapter 2)

2022-01-16 01:33:13 阅读：178 来源： 互联网

标签：Chapter Study earlier unsigned xw two CS bit complement

Study Notes of CS:APP

Part I Program Structure and Execution

How application programs are represented and executed.

Chapter 2 Representing and Manipulating Information

[22-01]

Two-valued Signals versus Decimal Notation

• Computers store and process information represented as two-valued signals.

• Lowly binary digits, or bits

• Decimal, or base-10, representation: natural for humans.

• Two-valued signals can readily be represented, stored, and transmitted

Encodings

• When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, we can represent the elements of any finite set.

• 3 most important representations of numbers:

• Unsigned encodings

• Based on traditional binary notation

• Represent numbers ≥ 0

• Two's-complement encodings

• Represent signed integers

• Either positive or negative

• Floating-point encodings

• Base-2 version of scientific notation for representing real numbers

• Arithmetic operations implemented

• Properties of arithmetic operations

• Properties of integer computer (vs. true integer) arithmetic

• Difference: Use a limited number of bits, hence some operations can overflow when the results are too large to be represented.

• Similarity: Multiplication is associative and commutative

• Properties of floating-point (vs. integer) arithmetic

• Different:

• Product of a set of positive numbers will always be positive

• Not associative due to the finite precision

• Stem from the difference in how they handle the finiteness

• Integer representations encode a comparatively small range of values precisely

• Floating-point representations encode a wide range of values approximately

• A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.

2.1 Information Storage [22-01]

Virtual Memory

• Computers use bytes (blocks of 8 bits) as the smallest addressable unit of memory.

• A machine-level program views memory as virtual memory (a very large array of bytes).

• Every byte of memory is identified by its address (a unique number).

• Virtual address space: set of all possible addresses

Storing Program Objects

• Program objects: program data, instructions, and control information

• Value of a pointer in C is the virtual address of the first byte of some block of storage.

• C compiler also associate type information with each pointer.

• Actual machine-level program simply treats each program object as a block of bytes and the program itself as a sequence of bytes.

2.1.1 Hexadecimal Notation

A Byte in Different Notations

• Byte = 8 bits

• Binary

• 00000000₂ to 11111111₂

• Decimal

• 0₁₀ to 255₁₀

• Hexadecimal (or "hex", base-16):

• Convenient for describing bit patterns

• Use '0' through '9' along with 'A' through 'F'

• 00₁₆ to FF₁₆

•

• In C, starts with 0x or 0X

• Example: write FA1D37B16 as 0xFA1D37B, as 0xfa1d37b, or even 0xFa1D37b

Converting Between Decimal, Binary, and Hexadecimal

• Converting between binary and hexadecimal

• Straightforward, by referring to chart

• Simple trick: Memorize digital equivalents of A, C, F

• Convert hexadecimal to binary

• Example:

• Hexadecimal: 0x173A4C

•

• Convert binary to hexadecimal

• Split binary into 4-bit groups and translate each

• Note: Make leftmost group with ≤4 bits

• Example:

• Binary: 1111001010110110110011

•

• When x = 2ⁿ, binary of x: 10…0₂ (n 0s).

• For n = i + 4j, where 0 ≤ i ≤ 3, hex of x: 0x2ⁱ0…0 (j 0s).

• Example: x = 2,048 = 2¹¹

• n = 11= 3 + 4 · 2

• Hexadecimal: 0x800

• Converting between decimal and hexadecimal: multiplication or division

• Convert decimal to hexadecimal

• Convert decimal x to hexadecimal:

• Let x = q · 16 + r

• Use r as the least significant digit and generate the remaining digits by repeating the process on q

• Example

• Decimal: 314,156

• Hexadecimal: 0x4CB2C

• Convert hexadecimal to decimal

• Multiply each of the hex digits by the appropriate power of 16

• Example

• Hexadecimal: 0x7AF

• Decimal: 7 · 162 + 10 · 16 + 15 = 7 · 256 + 10 · 16 + 15 = 1,967

2.1.2 Data Sizes

The Word Size

• Every computer has a word size (nominal size of pointer data)

• For w-bit machine, virtual addresses range from 0 to 2^w – 1

• There's been a widespread shift from 32-bit machines to 64-bit ones

• 32-bit: virtual address space of 4 GB

• 64-bit: virtual address space of 16 EB

• 1 exabyte = 2⁶⁰

• Distinction between 32-bit programs and 64-bit programs: compilation, rather than machine type

Data Formats

• Computers and compilers support multiple data formats using different ways to encode data.

• C supports multiple data formats for both integer and floating-point data.

• Integer data:

• Signed: negative, zero, and positive values

• Unsigned: nonnegative values

• char: single byte

• Stores character / integer values

• short, int, and long: provide a range of sizes.

• Pointer (e.g., char *): full word size

• Two different floating-point formats:

• float: single precision, 4 bytes

• double: double precision, 8 bytes

• Fixed-size integer types: int32_t and int64_t

• Used to have close control over data representations

• Signed and unsigned values

• Mostly signed

• Unsigned prefixed by unsigned or uint-.

• char: mostly signed but not guaranteed

• signed char: 1-byte signed value

• In many contexts, insensitive

• C allows a variety of ways:

• To order the keywords

• To include or omit optional keywords

• One aspect of portability: to make the program insensitive to the exact sizes of the different data types.

2.1.3 Addressing and Byte Ordering

• For multi-byte objects, we must establish two conventions:

• Addressing

• Byte ordering

Addressing

• Multi-byte object is stored as a contiguous sequence of bytes

• Address: smallest address of the bytes used

• Example: int x at 0x100

• (Assuming 32-bit) 4 bytes would be stored in 0x100, 0x101, 0x102, and 0x103

Byte Ordering

• Two common conventions.

• Consider w-bit integer [x_w −1, x_w −2, . . . , x₁, x₀], where x_w−1 is the most significant bit, x₀ the least.

• Assuming w is multiple of 8

• Most significant byte: [x_w−1, x_w−2, …, x_w−8]

• Least significant byte: [x₇, x₆, …, x₀]

• Other bytes: bits from the middle

• Little endian: least significant byte comes first

• Big endian: most significant byte comes first

• Example: int x at 0x100 has value of 0x01234567

• Ordering depends on the type of machine:

• Note: high-order byte is 0x01, while low-order byte is 0x67.

• Machines:

• Little-endian: most Intel-compatible machines, Android, iOS (OSs)

• Big-endian: most machines from IBM and Oracle

• Bi-endian: ARM microprocessors (hardware)

• Cases when byte-ordering becomes an issue:

1. When binary data are communicated over a network between different machines.

2. When looking at the byte sequences representing integer data.

• Occurs often when inspecting machine-level programs

• Example: line of a disassembly an Intel x86-64 processor

4004d3: 01 05 43 0b 20 00 add %eax,0x200b43(%rip)

• Generated by a disassembler

• Disassembler: tool that determines the instruction sequence represented by an executable program file

• Bytes are in reverse order for disassembly generated for little-endian machines

3. When programs are written that circumvent the normal type system.

• In C, using a cast or a union

• C code that uses casting to print the byte representations of program objects.

• size_t: preferred data type for expressing the sizes of data structures

• sizeof(T) returns # bytes required to store an object of type T.

• Writing portable code

• Running code on different machines

• Different machine/OS configurations use different conventions for storage allocation

2.1.4 Representing Strings

• String in C

• Encoded by an array of characters

• Each character is most commonly represented by the ASCII character code.

• ASCII code for decimal digit x happens to be 0x3x.

• Terminated by terminating byte (0x00)

2.1.5 Representing Code

• Binary code is seldom portable across different combinations of machine and OS.

• Fundamental concept: A program, from the perspective of the machine, is simply a sequence of bytes.

2.1.6 Introduction to Boolean Algebra

• Boolean algebra.

• Work of George Boole

• true, false encoded as 1, 0

Boolean Operations

• Simplest Boolean algebra is defined over {0, 1}.

• 4 Boolean operations

Boolean operation	Corresponding logical operation
~	not	¬
&	and	∧
\|	or	∨
^	exclusive-or	⊕

Boolean Operations over Bit Vectors

• Extending the 4 Boolean operations to bit vectors (strings of 0's and 1's of fixed length w).

• Defined according to applications to the matching elements

• Examples

• Application: Represent finite sets

• Encode A ⊆ {0, 1, …, w− 1} with [a_w₋₁, …, a₁, a₀], a_i= 1 if and only if i ∈ A.

• Example: a = [01101001] encodes A = {0, 3, 5, 6}, b = [01010101] encodes B = {0, 2, 4, 6}.

• Corresponding set operations of Boolean operations

Boolean operation	Corresponding set operation
\|	Set union
&	Set intersection
~	Set complement

• Continuing the example: a & b yields [01000001], A ∩ B = {0, 6}.

• Practical applications example:

• Signal can interrupt the execution of a program.

• Applications can explicitly block and unblock selected signals by specifying the blocked bit-vector mask

2.1.7 Bit-Level Operations in C

• C supports bitwise Boolean operations.

• Symbols

C symbol	Corresponding logical operation
~	not
&	and
\|	or
^	exclusive-or

• Can be applied to any "integral" data type

• Examples: expression evaluation for char

• Evaluating bit-level expression:

(1) Expand hexadecimal to binary

(2) Perform the operations

(3) Convert back to hexadecimal

• Use: Implement masking operations

• Mask: bit pattern that indicates a selected set of bits within a word.

• Example:

• x & 0xFF: least significant byte of x and 0's (all others).

• Examples:

• x = 0x89ABCDEF: 0x000000EF

• ~0: all 1's

2.1.8 Logical Operations in C

• Logical operators provided by C

Logical operator	Corresponding logic operation
\|\|	or
&&	and
!	not

• Argument: nonzero is true, 0 is false

• Return either 1 (true) or 0 (false)<span style="font-variant: small-caps">

• </span>Examples of expression evaluation

•

Bitwise Operations vs. their Logical Counterparts

• Matches only when arguments are restricted to 0 or 1.

• '&&' and '||' do not evaluate 2nd argument if result of can be determined by evaluating 1st

• Examples: a && 5/a and p && *p++ will never cause an exception

2.1.9 Shift Operations in C

• Shift operations shift bit patterns to the left and to the right.

• x: [x_w₋₁, x_w₋₂, …, x₀]

• Left shift operation: x << k

• [x_w_−k−1, x_w_−k−2, …, x₀, 0, …, 0]

• x is shifted k bits to the left, dropping off k x_w₋₁s and filling the right end with k 0's.

• 0 ≤ k ≤ w − 1

• Associate from left to right:

• x << j << k ⇔ (x << j) << k

• Right shift operation: x >> k (2 forms)

• Logical

• Left end is filled with k 0's

• [0, …, 0, x_w₋₁, x_w₋₂, …, x_k]

• Arithmetic

• Left end is filled with k x_w₋₁s

• [x_w₋₁, …, x_w₋₁, x_w₋₁, x_w₋₂, …, x_k]

• Examples (italicized fill):

• In C, no precise definition, either

• In practice

• For signed data, arithmetic

• For unsigned data, logical (must)

• Java definition

• x >> k: arithmetic

• x >>> k: logical

2.2 Integer Representations [earlier]

2.2.1 Integral Data Types [earlier]

Integral data types represent finite ranges of integers.

• Size: char, short, long

• All nonnegative or possibly negative: unsigned or the default

• The only machine-dependent: long (8-byte with 64-bit, 4-byte with 32-bit)

• Asymmetric

• ≤ the typical

• Symmetric except the fixed-size data types

• int could be 2-byte (mostly for 16-bit), long can be 4-byte (typically for 32-bit)

• The fixed-size data types

• Ranges = those of typical numbers

• Asymmetric

2.2.2 Unsigned Encodings [earlier]

Consider an integer data type of w bits.

• A bit vector is written as:

• , to denote the entire vector

• [x_w₋₁, x_w₋₂, ..., x₀], to denote the individual bits within the vector

• The unsigned interpretation of : treated as binary.

• x_i = 0 or 1 (2ⁱ is part of the value).

principle: Definition of unsigned encoding

For vector = [x_w₋₁, x_w₋₂, ..., x₀]:

B2U_w() ≐ x_i2ⁱ

• ≐: the left-hand side is defined to be equal to the right-hand side.

• Examples:

• UMax_w ≐ 2ⁱ =2^w − 1

• A mapping: B2U_w: {0, 1}^w→{0, … , UMax_w}.

principle: Uniqueness of unsigned encoding

Function B2U_w is a bijection.

• Bijection: a function f that goes two ways: it maps a value x to a value y where y = f (x), but it can also operate in reverse, since for every y, there is a unique value x such that f (x) = y, which is given by the inverse function x = f⁻¹(y).

• U2B_w: the inverse of B2U_w

2.2.3 Two's-Complement Encodings [earlier]

Two's-complement form: the most common representation of signed numbers

principle: Definition of two's-complement encoding

For vector = [x_w₋₁, x_w₋₂, ..., x₀]:

B2T_w() ≐ −x_w₋₁2^w−1 +x_i2ⁱ

The sign bit: x_w₋₁

• "Weight": −2^w−1

• 1: negative; 0: nonnegative.

• TMin_w ≐ −2^w−1, TMax_w ≐ 2ⁱ =2^w−1 − 1.

• A mapping: B2T_w: {0, 1}^w →{TMin_w, …, TMax_w}.

principle: Uniqueness of two's-complement encoding

Function B2T_w is a bijection.

• T2B_w: the inverse of B2T_w

Drop w: UMax, TMin, and TMax

Points worth highlighting:

• Asymmetric Range: |TMin| = |TMax| + 1

• UMax = 2TMax + 1

• −1: same representation as UMax

• 0: a string of all 0's in both

Two's-complement in languages:

• Not required by C. <limits.h> defines a set of constants delimiting the ranges of the different integer data types for the particular machine.

• Example: for a two's-complement machine, INT_MAX = TMax_w, INT_MIN = TMin_w and UINT_MAX = UMax_w.

• Required by Java.

Example to get a better understanding:

2.2.4 Conversions between Signed and Unsigned [earlier]

Casting example:

• int x, unsigned u

• (unsigned) x converts x to unsigned, (int) u converts u to int.

A general rule of handling conversions between signed and unsigned numbers with the same word size—the numeric values might change, but the bit patterns do not.

U2B_w and T2B_w [earlier]

• U2B_w

• 0 ≤ x ≤ UMax_w, U2B_w(x)

• Unique unsigned

• T2B_w

• TMin_w ≤ x ≤ TMax_w, T2B_w(x)

• Unique two's-complement

T2U_w and U2T_w [earlier]

• T2U_w

• TMin_w ≤ x ≤ TMax_w, T2U_w(x) ≐ B2U_w(T2B_w(x))

• 0 ≤ T2U_w(x) ≤ UMax_w, same representation

principle: Conversion from two's complement to unsigned

For x such that TMin_w ≤ x ≤ TMax_w:

T2U_w(x) =

derivation: Conversion from two's complement to unsigned

B2U_w(T2B_w(x)) = T2U_w(x) = x + x_w₋₁2^w

In a two's-complement representation of x, bit x_w₋₁ determines whether or not x is negative.

• Examples:

• The behavior of T2U:

• < 0: converted to large positive

• ≥ 0: unchanged

• U2T_w(u)

• U2T_w 0 ≤ x ≤ UMax_w, U2T_w(x) ≐ B2T_w (U2B_w(x))

• 0 ≤ U2T_w(x) ≤ UMax_w, same representation

principle: Unsigned to two's-complement conversion

For u such that 0 ≤ u ≤ UMax_w:

U2T_w(u) =

derivation: Unsigned to two's-complement conversion

U2T_w(u) = −u_w₋₁2^w + u

In the unsigned representation of u, bit u_w₋₁ determines whether or not u is greater than TMax_w

= 2^w−1 − 1.

• The behavior of U2T:

• ≤ TMax_w, unchanged

• > TMax_w, converted to negative

Summary [earlier]

The effects of converting in both directions:

• 0 ≤ x ≤ TMax_w

• T2U_w(x) = x, U2T_w(x) = x, identical

• Outside

• +/− 2^w

• 2 extremes:

• T2U_w (−1) = UMax_w

• T2U_w(TMin_w) = TMax_w + 1

2.2.5 Signed versus Unsigned in C [earlier]

Numbers

• Signed: by default

• Example: 12345 or 0x1A2B

• Unsigned: adding 'U' or 'u' as a suffix

• Example: 12345U or 0x1A2Bu

Conversion between:

• Allowed but not specified. Mostly U2T_w and T2U_w.

• Explicit casting and implicit casting (when an expression of one type is assigned to a variable of another)

• printf does not use type information.

• Possibly nonintuitive behavior:

2.2.6 Expanding the Bit Representation of a Number [earlier]

One common operation: to convert between integers having different word sizes while retaining the same numeric value. Converting from a smaller to a larger data type should always be possible.

Converting Unsigned to Larger [earlier]

Zero extension: adding leading zeros to the representation.

principle: Expansion of an unsigned number by zero extension

Define bit vectors = [u_w₋₁, u_w₋₂, …, u₀] of width w and ′ = [0, …, 0, u_w₋₁, u_w₋₂, …, u₀] of width w, where w′>w. Then B2U_w_′() = B2U_w(′).

Converting Two's-complement to Larger [earlier]

Sign extension: adding copies of the most significant bit to the representation.

principle: Expansion of a two's-complement number by sign extension

Define bit vectors = [x_w₋₁, x_w₋₂, …, x₀] of width w and ′ = [x_w₋₁, …, x_w₋₁, x_w₋₁, x_w₋₂, …, x₀] of width w, where w′>w. Then B2T_w (x) = B2T_w_′(x).

• The value is preserved:

derivation: Expansion of a two's-complement number by sign extension

• The relative order of conversion: The program first changes the size and then the type.

2.2.7 Truncating Numbers [earlier]

Truncating = [x_w₋₁, x_w₋₂, …, x₀] to k-bit: drop the high-order w − k bits, ′ = [x_k₋₁, x_k₋₂, …, x₀]. Truncating a number can alter its value—a form of overflow.

Truncating Unsigned [earlier]

principle: Truncation of an unsigned number

Let be the bit vector [x_w₋₁, x_w₋₂, …, x₀], and let ′ be the result of truncating it to k bits: ′ = [x_k₋₁, x_k₋₂, …, x₀]. Let x = B2U_w() and x′ = B2U_k(′). Then x = x′ mod 2^k.

Truncating Two's-complement [earlier]

principle: Truncation of a two's-complement number

Let be the bit vector [x_w₋₁, x_w₋₂, …, x₀], and let ′ be the result of truncating it to k bits: ′ = [x_k₋₁, x_k₋₂, …, x₀]. Let x = B2U_w() and x′ = B2T_k(′). Then x = U2T_k(x mod 2^k).

derivation: Truncation of a two's-complement number

B2T_w ([x_w₋₁, x_w₋₂, …, x₀]) mod 2^k = B2U_k ([x_k₋₁, x_k₋₂, …, x₀])

Summary [earlier]

The effect of truncation

• For unsigned:

B2U_k([x_k₋₁, x_k₋₂, …, x₀]) = B2U_w([x_w₋₁, x_w₋₂, …, x₀]) mod 2^k

• For two's-complement:

B2T_k([x_k₋₁, x_k₋₂, …, x₀]) = U2T_k(B2U_w ([x_w₋₁, x_w₋₂, …, x₀]) mod 2^k)

2.2.8 Advice on Signed versus Unsigned [earlier]

The implicit conversion can lead to errors or vulnerabilities.

• One way to avoid: to never use unsigned.

• Example: Java.

Unsigned values are useful

• When words = collections of bits. Example:

• Packing a word with flags describing various Boolean conditions

• Addresses (naturally unsigned)

• When implementing mathematical packages for modular / multiprecision arithmetic.

2.3 Integer Arithmetic [earlier]

2.3.1 Unsigned Addition [earlier]

"Word size inflation":

• Some programming languages support arbitrary size arithmetic;
More commonly, programming languages support fixed-size arithmetic.

x y for arguments x and y, where 0 ≤ x, y < 2^w: the result of truncating the integer sum x + y to be w bits long, viewed as an unsigned number.

• Characterized as a form of modular arithmetic: discarding any bits with weight > 2^w−1

principle: Unsigned addition

For x and y such that 0 ≤ x, y < 2^w:

x y =

• Illustration:

Overflow: an arithmetic operation whose integer result cannot fit within the word size limits of the data type.

• Occurs when the two operands sum to 2^w or more

• Not signaled as errors

principle: Detecting overflow of unsigned addition

For x and y in the range 0 ≤ x, y ≤ UMax_w, let s ≐ x y. Then the computation of s overflowed if and only if s < x (or equivalently, s < y).

Modular addition forms an abelian group (a mathematical structure).

• Commutative and associative

• The identity element: 0; every element has an additive inverse

• Value x for every value x: x x = 0

principle: Unsigned negation

For any number x such that 0 ≤ x < 2^w, its w-bit unsigned negation x is given by the following:

x =

2.3.2 Two's-Complement Addition [earlier]

x y, given integer values x and y where −2^w−1≤ x, y ≤ 2^w−1 − 1: the result of truncating the integer sum x + y to w bits, viewed as a two's-complement number.

principle: Two's-complement addition

For integer values x and y in the range −2^w−1≤ x, y ≤ 2^w−1 − 1:

• Illustration:

• Positive overflow: x + y exceeds TMax_w (case 4)

• Negative overflow: x + y is less than TMin_w (case 1)

• Has the same bit-level representation as the unsigned sum

• Examples:

principle: Detecting overflow in two's-complement addition

For x and y in the range TMin_w ≤ x, y ≤ TMax_w, let s ≐ x y. Then the computation of s has had positive overflow if and only if x > 0 and y > 0 but s ≤ 0. The computation has had negative overflow if and only if x < 0 and y <0 but s ≥ 0.

• Illustrations:

2.3.3 Two's-Complement Negation [earlier]

x: the additive inverse under

principle: Two's-complement negation

For x in the range TMin_w ≤ x ≤ TMax_w, its two's-complement negation x is given by the formula

x =

2.3.4 Unsigned Multiplication [earlier]

x y for integers x and y where 0 ≤ x, y ≤ 2^w − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as an unsigned number.

principle: Unsigned multiplication

For x and y such that 0 ≤ x, y ≤ UMax_w:

x y = (x · y) mod 2^w

2.3.5 Two's-Complement Multiplication [earlier]

x y for integers x and y where −2^w-1 ≤ x, y ≤ 2^w-1 − 1: the result of truncating the 2w-bit product x · y to w bits, viewed as a two's-complement number.

principle: Two's-complement multiplication

For x and y such that TMin_w ≤ x, y ≤ TMax_w:

x y = U2T_w((x · y) mod 2^w)

principle: Bit-level equivalence of unsigned and two's-complement multiplication

Let and be bit vectors of length w. Define integers x and y as the values represented by these bits in two's-complement form: x = B2T_w(x) and y = B2T_w(y). Define nonnegative integers x′ and y′ as the values represented by these bits in unsigned form: x = B2U_w(x) and y = B2U_w(y). Then

T2B_w(x y) = U2B_w(x′ y′)

• Illustrations:

2.3.6 Multiplying by Constants [earlier]

Integer multiply is slow. Optimization: to replace multiplications by constants with combinations of shift and addition operations.

principle: Multiplication by a power of 2

Let x be the unsigned integer represented by bit pattern [x_w₋₁, x_w₋₂, …, x₀]. Then for any k ≥ 0, the w + k-bit unsigned representation of x^2k is given by [x_w₋₁, x_w₋₂, …, x₀, 0, …, 0], where k zeros have been added to the right.

principle: Unsigned multiplication by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x << k yields the value x 2^k.

principle: Two's-complement multiplication by a power of 2

For C variables x and k with two's-complement value x and unsigned value k, such that 0 ≤ k < w, the C expression x << k yields the value x 2^k.

The task of generating code for the expression x * K, for some constant K.

• K: [(0…0) (1…1) (0…0) . . . (1…1)].

• Compute a run of 1's from bit position n down to bit position m (n ≥ m) using either form:

• Form A: (x<<n) + (x<<(n − 1)) + . . . + (x<<m)

• Form B: (x<<(n + 1)) - (x<<m)

• Compute x * K by adding together the results for each run.

2.3.7 Dividing by Powers of 2 [earlier]

Performed using a right shift:

• Logical: unsigned

• Arithmetic: two's-complement

Integer division always rounds toward 0. Some notation for any real number a:

• ⌊a⌋: the unique integer a′ such that a′ ≤ a < a′ + 1.

• ⌈a⌉: the unique integer a′ such that a′ − 1< a′ ≤ a′.

Dividing by a Power of 2 with Unsigned Arithmetic [earlier]

principle: Unsigned division by a power of 2

For C variables x and k with unsigned values x and k, such that 0 ≤ k < w, the C expression x >> k yields the value ⌊x/2^k⌋.

• Examples:

Dividing by a Power of 2 with Two's-complement Arithmetic [earlier]

Using an arithmetic right shift:

principle: Two's-complement division by a power of 2, rounding down

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression x >> k, when the shift is performed arithmetically, yields the value ⌊x/2^k⌋.

• Examples:

Correcting for the improper rounding that occurs when a negative number is shifted right by "biasing" the value before shifting.

principle: Two's-complement division by a power of 2, rounding up

Let C variables x and k have two's-complement value x and unsigned value k, respectively, such that 0 ≤ k < w. The C expression (x + (1 << k) - 1) >> k, when the shift is performed arithmetically, yields the value ⌈x/2^k⌉.

• Demonstration:

(x<0 ? x+(1<<k)-1 : x) >> k will compute x/2^k.

2.3.8 Final Thoughts on Integer Arithmetic [earlier]

2.4 Floating Point [earlier]

2.4.1 Fractional Binary Numbers [earlier]

2.4.2 IEEE Floating-Point Representation [earlier]

2.4.3 Example Numbers [earlier]

2.4.4 Rounding

2.4.5 Floating-Point Operations [earlier]

2.4.6 Floating Point in C [earlier]

2.5 Summary [earlier]

Computers encode information as bits, generally organized as sequences of bytes. Different encodings are used for representing integers, real numbers, and character strings. Different models of computers use different conventions for encoding numbers and for ordering the bytes within multi-byte data.

The C language is designed to accommodate a wide range of different implementations in terms of word sizes and numeric encodings. Machines with 64-bit word sizes have become increasingly common, replacing the 32-bit machines that dominated the market for around 30 years. Because 64-bit machines can also run programs compiled for 32-bit machines, we have focused on the distinction between 32- and 64-bit programs, rather than machines. The advantage of 64-bit programs is that they can go beyond the 4 GB address limitation of 32-bit programs.

Most machines encode signed numbers using a two's-complement representation and encode floating-point numbers using IEEE Standard 754. Understanding these encodings at the bit level, as well as understanding the mathematical characteristics of the arithmetic operations, is important for writing programs that operate correctly over the full range of numeric values.

When casting between signed and unsigned integers of the same size, most C implementations follow the convention that the underlying bit pattern does not change. On a two's-complement machine, this behavior is characterized by functions T2U_w and U2T_w, for a w-bit value. The implicit casting of C gives results that many programmers do not anticipate, often leading to program bugs.

Due to the finite lengths of the encodings, computer arithmetic has properties quite different from conventional integer and real arithmetic. The finite length can cause numbers to overflow, when they exceed the range of the representation. Floating-point values can also underflow, when they are so close to 0.0 that they are changed to zero.

The finite integer arithmetic implemented by C, as well as most other programming languages, has some peculiar properties compared to true integer arithmetic. For example, the expression x*x can evaluate to a negative number due to overflow. Nonetheless, both unsigned and two's-complement arithmetic satisfy many of the other properties of integer arithmetic, including associativity, commutativity, and distributivity. This allows compilers to do many optimizations. For example, in replacing the expression 7*x by (x<<3)-x, we make use of the associative, commutative, and distributive properties, along with the relationship between shifting and multiplying by powers of 2.

We have seen several clever ways to exploit combinations of bit-level operations and arithmetic operations. For example, we saw that with two's-complement arithmetic, ~x+1 is equivalent to -x. As another example, suppose we want a bit pattern of the form [0, …, 0, 1, …, 1], consisting of w − k zeros followed by k ones. Such bit patterns are useful for masking operations. This pattern can be generated by the C expression (1<<k)-1, exploiting the property that the desired bit pattern has numeric value 2^k − 1. For example, the expression (1<<8)-1 will generate the bit pattern 0xFF.

Floating-point representations approximate real numbers by encoding numbers of the form x × 2^y. IEEE Standard 754 provides for several different precisions, with the most common being single (32 bits) and double (64 bits). IEEE floating point also has representations for special values representing plus and minus infinity, as well as not-a-number.

Floating-point arithmetic must be used very carefully, because it has only limited range and precision, and because it does not obey common mathematical properties such as associativity.

标签：Chapter,Study,earlier,unsigned,xw,two,CS,bit,complement
来源： https://www.cnblogs.com/zjnu/p/csapp_2.html

本站声明： 1. iCode9 技术分享网（下文简称本站）提供的所有内容，仅供技术学习、探讨和分享；
2. 关于本站的所有留言、评论、转载及引用，纯属内容发起人的个人观点，与本站观点和立场无关；
3. 关于本站的所有言论和文字，纯属内容发起人的个人观点，与本站观点和立场无关；
4. 本站文章均是网友提供，不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属；如您发现该文章侵犯了您的权益，可联系我们第一时间进行删除；
5. 本站为非盈利性的个人网站，所有内容不会用来进行牟利，也不会利用任何形式的广告来间接获益，纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

ICode9

Study Notes of CS:APP (Chapter 2)

Part I Program Structure and Execution

Chapter 2 Representing and Manipulating Information

[22-01]

Two-valued Signals versus Decimal Notation

Encodings

2.1 Information Storage [22-01]

Virtual Memory

Storing Program Objects

2.1.1 Hexadecimal Notation

A Byte in Different Notations

Converting Between Decimal, Binary, and Hexadecimal

2.1.2 Data Sizes

The Word Size

Data Formats

2.1.3 Addressing and Byte Ordering

Addressing

Byte Ordering

2.1.4 Representing Strings

2.1.5 Representing Code

2.1.6 Introduction to Boolean Algebra

Boolean Operations

Boolean Operations over Bit Vectors

2.1.7 Bit-Level Operations in C

2.1.8 Logical Operations in C

Bitwise Operations vs. their Logical Counterparts

2.1.9 Shift Operations in C

2.2 Integer Representations [earlier]

2.2.1 Integral Data Types [earlier]

2.2.2 Unsigned Encodings [earlier]

2.2.3 Two's-Complement Encodings [earlier]

2.2.4 Conversions between Signed and Unsigned [earlier]

U2Bw and T2Bw [earlier]

T2Uw and U2Tw [earlier]

Summary [earlier]

2.2.5 Signed versus Unsigned in C [earlier]

2.2.6 Expanding the Bit Representation of a Number [earlier]

Converting Unsigned to Larger [earlier]

Converting Two's-complement to Larger [earlier]

2.2.7 Truncating Numbers [earlier]

Truncating Unsigned [earlier]

Truncating Two's-complement [earlier]

Summary [earlier]

2.2.8 Advice on Signed versus Unsigned [earlier]

2.3 Integer Arithmetic [earlier]

2.3.1 Unsigned Addition [earlier]

2.3.2 Two's-Complement Addition [earlier]

2.3.3 Two's-Complement Negation [earlier]

2.3.4 Unsigned Multiplication [earlier]

2.3.5 Two's-Complement Multiplication [earlier]

2.3.6 Multiplying by Constants [earlier]

2.3.7 Dividing by Powers of 2 [earlier]

Dividing by a Power of 2 with Unsigned Arithmetic [earlier]

Dividing by a Power of 2 with Two's-complement Arithmetic [earlier]

2.3.8 Final Thoughts on Integer Arithmetic [earlier]

2.4 Floating Point [earlier]

2.4.1 Fractional Binary Numbers [earlier]

2.4.2 IEEE Floating-Point Representation [earlier]

2.4.3 Example Numbers [earlier]

2.4.4 Rounding

2.4.5 Floating-Point Operations [earlier]

2.4.6 Floating Point in C [earlier]

2.5 Summary [earlier]

U2B_w and T2B_w [earlier]

T2U_w and U2T_w [earlier]