CPUs and Registers

TL;DR

·      CPU architecture defines how instructions are executed:

o  CISC = complex but fewer instructions

o  RISC = simple but more instructions

·      Processor bitness (16-bit, 32-bit, 64-bit, etc.) depends on the size of registers
and memory addresses, not the instruction length.

·      Registers are the CPU’s fastest memory:

o  General-Purpose Registers store operands, intermediate results, and memory addresses for computations.

o  Instruction Pointer (IP/PC) tracks the next instruction.

o  Flags Register tracks CPU state for decision-making (Zero, Sign, Carry, Overflow, Parity, Auxiliary).

·      Implementation evolution:

o  8-bit CPUs: 8-bit A (accumulator) register.

o  16-bit CPUs: AX (16-bit), accessible as AL (low 8 bits) and AH (high 8 bits);
only AX, BX, CX, DX allow high/low byte access.

o  32-bit CPUs: EAX (32-bit), still accessible as AX, AL, AH; same restrictions for
EAX, EBX, ECX, EDX.

o  64-bit CPUs: RAX (64-bit), accessible as RAX (64-bit), EAX (32-bit), AX (16-bit), AL/AH (8-bit); new low-byte registers (SIL, DIL, SPL, BPL) added for the other general-purpose registers SI, DI, SP, BP, giving access to their low 8 bits only.

·      Types of registers: General-Purpose, Special-Purpose, Segment Registers (CS, DS, SS,
ES, FS, GS). Some registers not covered here for simplicity. 

·      AMD introduced 64-bit architecture and created a new naming convention.

Intro

This is a very basic post about CPUs and registers. It discusses the evolution of CPUs over time and how the implementation of registers has changed. This information serves as a starting point for learning assembly language and reverse engineering. Efforts were made to avoid unexplained terms, but you may come across some unfamiliar terms, such as the stack, in a few places. Please don’t worry if some terms are unclear; they will be covered in future posts. If you are completely new to this topic, you may not fully understand the uses of different registers, and that’s perfectly fine. These are concepts that one becomes comfortable with through practice and familiarity. Use this post as a starting point to introduce yourself to the very basics and to have an idea about CPU architecture and the types of registers and their implementation across different processors.   

Memory Hierarchy

Throughout its operation, a CPU needs different types of memory depending on how quickly the data must be accessible. Having the same type of memory for all use cases won’t be efficient. Using only the fastest memory type for everything, such as storing a movie, would be inefficient, because fast memory is expensive and gigabytes of it would be costly. Volatility also plays a role. Not all information needs to persist, so using non-volatile storage everywhere is not the right way either, as it sacrifices speed for persistence that is often unnecessary. This is why the CPU utilizes different types of memory depending on the specific requirements.

Memory Hierarchy:

1.     CPU Registers

2.     Cache Memory

3.     Main Memory (RAM)

4.     Secondary Storage (Hard drives, SSDs)

5.     Tertiary Storage (Magnetic tapes, optical disks)

Use Cases for Each Level

·      CPU Registers: These are the smallest and fastest volatile memory located inside the CPU. The CPU uses registers to hold the data, instructions, and addresses it is actively processing. Registers provide the immediate workspace for instruction execution, enabling extremely fast operations critical for assembly language and low-level programming.

·      Cache Memory: Cache acts as a fast-access intermediary storing frequently used data and instructions recently fetched from main memory. The CPU first checks cache for needed data to minimize the latency of memory access, which helps accelerate instruction execution when registers do not contain the required data.

·      Main Memory (RAM): RAM stores programs and data currently in use. When the required data is not present in cache, CPU fetches it from RAM. While slower than cache, RAM provides a large, volatile storage space for active processes and running applications.

·      Secondary Storage: This includes hard disk drives and solid-state drives that provide non-volatile long-term data storage. The CPU accesses secondary storage via slower I/O
processes only when data is not found in main memory.

·      Tertiary Storage: Used primarily for archival or backup purposes, tertiary storage such as magnetic tapes or optical disks is very large but slow and rarely accessed directly by the CPU, primarily supporting long-term retention.
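The lookup order described above can be pictured with a small toy model. This is only a sketch: the level contents and the find() helper are hypothetical names made up for illustration, and real hardware performs this search in circuitry rather than in a loop.

```python
# Toy model of the hierarchy: each level is searched in order, fastest first.
# The stored names and values are made up for illustration.
HIERARCHY = [
    ("registers", {"x": 1}),
    ("cache", {"y": 2}),
    ("ram", {"z": 3}),
    ("secondary storage", {"movie": 4}),
]


def find(name):
    """Return (level, value) for the first level that holds `name`."""
    for level, contents in HIERARCHY:
        if name in contents:
            return level, contents[name]
    raise KeyError(name)


print(find("x"))      # found immediately in the registers
print(find("movie"))  # only found after every faster level missed
```

The further down the list a value is found, the more levels had to be checked first, which is the intuition behind keeping actively used data as close to the CPU as possible.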

CPU Architectures

The history of CPU architecture is a very interesting journey especially if you see the evolution from 4-bit to 64-bit processors. The architecture of a CPU defines how a processor executes instructions and performs computations, balancing speed, power consumption, complexity, and efficiency. CPU architectures fall into two main categories:

·      CISC (Complex Instruction Set Computing) architectures: These processors are designed to handle complex instructions, allowing them to complete tasks with fewer instructions.
However, this complexity leads to higher power consumption. Because of their ability to process intricate tasks efficiently despite using more power, CISC processors are well-suited for personal computing and are commonly found in desktops, laptops, and servers.

·      RISC (Reduced Instruction Set Computing) architectures: These processors use simpler instructions and often require more instructions to complete a task. However, this simplicity allows them to execute instructions faster and use less power. RISC architectures like ARM and RISC-V have become very popular, especially in mobile devices, embedded systems, and are increasingly being used in laptops and servers.

In addition to the classification based on the type of instruction set (CISC and RISC), processors are also classified by their bitness. Historically, CPUs started with 4-bit designs (such as the Intel 4004, used in early calculators), then moved to 8-bit (such as the Intel 8080 and the MOS 6502, common in early home computers), followed by 16-bit (like the Intel 8086, whose 8088 variant powered the first IBM PC). The industry later shifted to 32-bit processors (such as Intel’s 80386 and the Pentium series), and today, most modern processors are 64-bit (like Intel Core i7, AMD Ryzen, and ARM-based Apple M-series chips).

Each step in this progression allowed processors to handle larger numbers, access bigger
memory spaces, and perform more complex operations more efficiently.

Understanding Instruction Size and Processor Bitness

The bitness of a processor (16-bit, 32-bit, 64-bit, etc.) defines how much data the CPU can
handle in a single operation, which depends on the size of its registers, as well as the size of the memory addresses it can work with.

·      A 16-bit processor can work with data chunks and memory addresses up to 16 bits (2
bytes) wide.

·      A 32-bit processor can work with up to 32 bits (4 bytes).

·      A 64-bit processor can work with up to 64 bits (8 bytes).
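To get a feel for what these widths mean in practice, the short sketch below computes the largest unsigned value and the number of distinct addresses for each width. It assumes, as the simplified description above does, that the address width equals the bitness and that memory is byte-addressable; limits() is a hypothetical helper name.

```python
def limits(bits):
    """Return (largest unsigned value, number of distinct addresses)."""
    return 2**bits - 1, 2**bits


for bits in (16, 32, 64):
    max_value, address_count = limits(bits)
    print(f"{bits}-bit: max unsigned value {max_value:#x}, "
          f"{address_count} addressable bytes")
```

Under these assumptions, a 16-bit address reaches 64 KiB of memory and a 32-bit address reaches 4 GiB.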

There can be a misunderstanding that when we say a processor has a 16-bit or 32-bit
instruction set, it means every instruction is exactly 16 or 32 bits long. Instead, it refers to the default size of operands and memory addresses that the processor is designed to work with.

The actual length of an instruction in machine code is variable. When we say the length of
an instruction, it is not just the size of the instruction itself but also includes the size of its operands and any memory addresses it references. 

On 32-bit x86 processors, an instruction can be anywhere from 1 byte up to 15 bytes in size. 15 bytes equals 120 bits, which is clearly larger than 32 bits.

This shows that instruction size is not the same as processor bitness—what matters is the
size of the data the instruction is handling, not the total bytes taken by the instruction itself.

For example:

·      MOV EAX, 5 → Here, 5 is the operand (the value being moved) and EAX is the register that receives it. Since no memory address is involved, the instruction is short and only takes a few bytes.

·      MOV DWORD PTR [0x12345678], 5 → In this case, the operand 5 must be stored at a specific memory address (0x12345678). To represent this instruction, the machine code must include the entire memory address along with the value, so the instruction becomes much longer.

But in both cases, if the processor is 32-bit, the size of the operand (5) or the memory address (0x12345678) cannot exceed 32 bits.
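The difference in length can be made concrete by building the machine-code bytes by hand. The sketch below assumes 32-bit x86, that the first instruction moves 5 into EAX, and that the second is a 32-bit store to address 0x12345678; the two helper function names are hypothetical.

```python
import struct


def mov_eax_imm32(value):
    # Opcode B8 = MOV EAX, imm32, followed by the 4-byte little-endian value.
    return b"\xB8" + struct.pack("<I", value)


def mov_mem32_imm32(address, value):
    # Opcode C7 with ModRM byte 05 = MOV dword [disp32], imm32,
    # followed by the 4-byte address and then the 4-byte value.
    return b"\xC7\x05" + struct.pack("<II", address, value)


short = mov_eax_imm32(5)
long_ = mov_mem32_imm32(0x12345678, 5)
print(short.hex(" "), "->", len(short), "bytes")
print(long_.hex(" "), "->", len(long_), "bytes")
```

The register form comes out at 5 bytes and the memory form at 10 bytes, yet in both the value and the address each occupy exactly 32 bits.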

Registers

When a CPU carries out a task, it uses registers in several ways:

1.     Holding Operands: Registers store the data values (operands) that will be used in arithmetic or logical operations. For example, to add two numbers, the CPU loads them into registers first.

2.     Storing Instructions: Some registers hold the current instruction being executed or point to the next instruction to fetch (instruction pointer).

3.     Addressing Memory: Registers can hold memory addresses, helping the CPU quickly locate data in RAM or cache.

4.     Temporary Results: During computations, intermediate results are kept in registers for quick access as the CPU works through instructions.

5.     Control and Status: Certain registers keep track of the state of the CPU and control the flow of operations, such as flags that indicate conditions like zero or overflow.

By using registers to hold data close to the processing units, the CPU avoids slower memory accesses, greatly speeding up task execution.
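The roles listed above can be seen together in a tiny simulated machine. This is a toy sketch, not a model of any real CPU: the register names A, B, C, the four opcodes, and the run() function are all made up for illustration.

```python
def run(program):
    """Execute a list of (opcode, *args) tuples on a toy register machine."""
    regs = {"A": 0, "B": 0, "C": 0}  # general-purpose registers
    ip = 0                           # instruction pointer: index of next instruction
    zero = False                     # a single status flag
    while ip < len(program):
        op, *args = program[ip]
        ip += 1                      # by default, continue with the following instruction
        if op == "load":             # load reg, constant
            regs[args[0]] = args[1]
        elif op == "add":            # add dst, src (operands come from registers)
            regs[args[0]] += regs[args[1]]
        elif op == "dec":            # decrement reg and update the zero flag
            regs[args[0]] -= 1
            zero = regs[args[0]] == 0
        elif op == "jnz":            # jump to a target index if the zero flag is clear
            if not zero:
                ip = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Add B to A three times, using C as the loop counter.
program = [
    ("load", "A", 0),
    ("load", "B", 5),
    ("load", "C", 3),
    ("add", "A", "B"),   # index 3: loop body
    ("dec", "C"),
    ("jnz", 3),          # repeat until C reaches zero
]
print(run(program))
```

Here A holds the intermediate and final result, B holds an operand, C acts as a counter, the flag records the outcome of the last decrement, and the instruction pointer decides what runs next.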

Types of Registers

Different registers serve different purposes, and we usually group them by function. This grouping isn’t strict: some sources call a register general-purpose while others list it as special-purpose. Before proceeding, note that this post does not cover all existing registers; it focuses only on the ones most relevant for understanding CPU operations and the topics discussed here.

Types of registers:

·      General-Purpose Registers

·      Special-Purpose Registers

·      Segment Registers

General Purpose Registers

These registers temporarily hold the data the CPU is actively working on: operands, intermediate results, memory addresses, and control information. They are the main working registers used by the CPU for most computations.

·      AX (Accumulator): Used for arithmetic and logic operations. Results of operations and syscalls are typically stored in this register.

·      BX (Base): Often serves as a base pointer for memory references to data that is to be read or written.

·      CX (Counter): Used mainly for loop counting and string operations.

·      DX (Data): Used for I/O operations and extended arithmetic.

·      SI (Source Index) and DI (Destination Index): Used for string and array manipulation.

·      BP (Base Pointer): Points to the base of the current stack frame for accessing function variables.

·      SP (Stack Pointer): Points to the top of the stack, essential for managing function calls and local variables.

·      R8 to R15: Additional registers in 64-bit architectures for increased operational capacity.

Special Purpose Registers

·      IP (Instruction Pointer): The CPU has to know which instruction to execute next. This register, also known as the Program Counter (PC), keeps track of the address of the next instruction. Each time the CPU finishes an instruction, the instruction pointer updates to the address of the following instruction in the program. This allows the CPU to step through instructions one by one in the correct order. If the program needs to jump to a different part (for example, in loops, branches, or function calls), the instruction pointer is updated to point to the new location, so the CPU knows where to continue.

·      Flags: The flags (or status) register is a special register in the CPU that is used for decision-making and flow control, allowing instructions to “know” what happened previously and respond accordingly. It keeps track of important information about the outcome of operations and the current state of the processor. It consists of individual bits called “flags,” and each flag records a specific condition, such as whether the result of a calculation was zero.

These flags are automatically set or cleared by the CPU after arithmetic or logic instructions. One simple use case is the family of conditional jump instructions (such as JZ and JNZ), which alter the flow of the program depending on a flag’s value; the plain ‘JMP’ instruction, by contrast, jumps unconditionally and ignores the flags. Basically, all the “if-else” statements in our high-level code make use of this register underneath.

Common flags include:

o  Zero Flag (ZF): Set if an operation’s result is zero.

o  Sign Flag (SF): Shows if the result is negative.

o  Carry Flag (CF): Indicates if an unsigned arithmetic operation produced a carry out of, or a borrow into, the highest bit.

o  Overflow Flag (OF): Shows if a signed operation produced a result outside the range the register can hold.

o  Parity Flag (PF): Set if the number of set bits in the low byte of the result is even.

o  Auxiliary Carry Flag (AF): Used in specialized arithmetic (e.g., BCD).
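As a rough illustration of how these flags come out of a single operation, the sketch below adds two 8-bit values and derives each flag following the x86 rules for ADD. The add8_flags() function is a hypothetical helper written for this post, not part of any library.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (8-bit result, dict of flag values)."""
    total = a + b
    result = total & 0xFF
    flags = {
        "ZF": result == 0,                       # result is zero
        "SF": bool(result & 0x80),               # top bit set: negative if signed
        "CF": total > 0xFF,                      # carry out of bit 7 (unsigned overflow)
        # Signed overflow: both inputs share a sign that the result lacks.
        "OF": bool((a ^ result) & (b ^ result) & 0x80),
        "PF": bin(result).count("1") % 2 == 0,   # even number of set bits
        "AF": (a & 0x0F) + (b & 0x0F) > 0x0F,    # carry out of bit 3
    }
    return result, flags


print(add8_flags(0xFF, 0x01))  # wraps around to 0: ZF and CF are set
print(add8_flags(0x7F, 0x01))  # 127 + 1 as signed bytes: OF and SF are set
```

The two calls show why CF and OF are separate flags: 0xFF + 1 overflows as an unsigned sum but not as a signed one (-1 + 1 = 0), while 0x7F + 1 is fine unsigned but overflows the signed range.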

Segment Registers

Segment registers are special registers in the CPU that hold the base (starting) addresses of specific memory segments.

The segment registers are:

·      Code Segment (CS): Contains the executable program instructions.

·      Data Segment (DS): Stores variables and program data.

·      Stack Segment (SS): Used for the stack, which handles function calls and local variables.

·      Extra Segment (ES), FS, GS: Additional data segments used for various purposes.

Each segment register stores the base address of its corresponding segment. When the CPU
accesses memory, it combines the segment base address from the segment register with an offset address to calculate the full physical memory location.

One thing to keep in mind is that so far we have been looking at the case where memory is segmented. There is another case where memory is not segmented; which one applies depends on the memory model in use. A memory model describes how memory is laid out from the program’s point of view. There are two types: Segmented and Flat.

Segmented Memory Model

The memory is segmented into different segments. These segments are logical divisions of the
system memory, each used to organize different types of data or instructions. As discussed above, the CPU combines segment registers with offsets to calculate physical addresses. This segmentation was essential in early CPUs with limited address spaces.
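As a concrete case of combining a segment register with an offset, the sketch below uses the 8086 real-mode rule, which this post does not spell out and is assumed here: the 16-bit segment value is multiplied by 16 and added to the 16-bit offset, giving a 20-bit physical address. physical_address() is a hypothetical helper name.

```python
def physical_address(segment, offset):
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0x1235, 0x0000)))
```

Both calls produce the same physical address, which shows that segments in this scheme overlap. The scheme let a CPU with 16-bit registers reach 1 MiB of memory, which is the limited-address-space problem mentioned above.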

Flat Memory Model

The memory is treated as one continuous block. The CPU then uses simple linear addresses for
memory access, simplifying programming and increasing flexibility. Here, segment registers are typically all set to point to the same base address (often zero), effectively disabling segmentation for software. Segment registers remain present mainly for backward compatibility.

Implementation of Registers

Now that we know what registers and the different CPU architectures are, let’s see how registers have been implemented and how they were extended for 16-bit and later processors.

8-bit Architecture

Initially, in Intel’s 8-bit 8008 processor (pre-x86), the accumulator was an 8-bit register named A.

16-bit Architecture

With the 16-bit 8086 architecture, the 8-bit A register was extended into the 16-bit AX register, where the “X” stands for “extended.” The AX register could still be accessed as two separate 8-bit registers: AL (lower 8 bits) and AH (higher 8 bits). However, this high/low byte access is limited to the four registers AX, BX, CX, and DX and is not available for the rest.

32-bit Architecture

With the 32-bit architecture, AX was further extended to EAX (Extended AX), creating a 32-bit register. Assembly instructions could still access the 16-bit AX or the 8-bit AL and AH parts. As in the 16-bit architecture, high/low byte access is limited to the four registers EAX, EBX, ECX, and EDX and is not available for the rest.

64-bit Architecture

AMD was the first to develop the 64-bit extension of the x86 architecture, which was later adopted by Intel. In AMD’s 64-bit extension, EAX was extended to RAX, a 64-bit register where “R” is commonly said to stand for “register” or “really wide.” In this architecture, it is possible to access RAX (64-bit), EAX (32-bit), AX (16-bit), as well as the 8-bit AL and AH registers.
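The nesting of these names can be illustrated with plain bit masks. The sketch below only models which bits each name refers to; it is not how one reads real registers, and views() is a hypothetical helper name.

```python
def views(rax):
    """Return the value seen through each sub-register name of RAX."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8..15 (high byte of AX)
        "AL": rax & 0xFF,                 # bits 0..7 (low byte of AX)
    }


for name, value in views(0x1122334455667788).items():
    print(f"{name}: {value:#x}")
```

Each smaller name is simply a window onto the low end of the same storage, so writing to AL, for example, changes the lowest byte of AX, EAX, and RAX as well.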

However, AMD also introduced the ability to individually access the least significant (lower) 8 bits of all general-purpose registers, not just AX, BX, CX, and DX. This was done by adding new 8-bit registers such as SIL (low 8 bits of RSI), DIL (low 8 bits of RDI), BPL (low 8 bits of RBP), and SPL (low 8 bits of RSP). Unlike AX, BX, CX, and DX, these new registers only provide access to their low 8 bits and do not have separate high-byte counterparts.

Naming Conventions

CPU Architecture

The x86 architecture refers to the instruction set family originally developed by Intel, starting with the 16-bit 8086; today the term usually means its 32-bit form. x86-64 is the common name for the 64-bit extension of the x86 architecture. The term x64 is simply Microsoft’s name for the same x86-64 architecture.

The term x86 originates from the naming pattern of these Intel processors, many of which
ended with “86” (like 8086, 80286, 80386, and 80486). This family of processors and their successors became collectively known as the x86 architecture.

Registers

Following the introduction of the 64-bit processor, AMD introduced a new naming convention to provide a consistent way to refer to the expanded set of registers in the 64-bit architecture. In this scheme, the traditional register names were mapped to a numeric system: for example, the 64-bit RAX register could also be called R0, representing register 0. Similarly, the 32-bit EAX became R0D, where the “D” indicates a double word (32-bit) size. In x86, a word is 16 bits (2 bytes), so a double word equals 32 bits (4 bytes).

This pattern extended to all 16 general-purpose registers, named R0 through R15. Each register could be accessed in multiple sizes by appending suffixes: for the 64-bit full register (e.g., R8), the 32-bit lower portion (R8D), 16-bit word (R8W), and 8-bit byte (R8B). This naming system created uniformity across registers and sizes, improving clarity and simplifying assembly programming.
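The suffix rule is regular enough to express in a few lines. The sketch below generates the numeric-style names described above; the SUFFIXES table and numbered_name() function are hypothetical helpers for illustration only.

```python
# Size in bits -> suffix appended to the register number.
SUFFIXES = {64: "", 32: "D", 16: "W", 8: "B"}


def numbered_name(index, bits=64):
    """Build the numeric-style name for register `index` at the given size."""
    if not 0 <= index <= 15:
        raise ValueError("register index must be between 0 and 15")
    return f"R{index}{SUFFIXES[bits]}"


for bits in (64, 32, 16, 8):
    print(bits, numbered_name(8, bits))
```

For register 8 this prints R8, R8D, R8W, and R8B, matching the sizes listed above.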

Despite this new numeric naming convention, most disassemblers and programmers typically
continue to use the traditional alphabetical names (such as RAX, RBX, RCX) as mnemonics for ease of understanding and continuity with older architectures.

Conclusion

This post has covered the basics of registers, their types, and their use cases. The initial plan was to also include basic assembly instructions and simple assembly code examples, but to keep each post clear and focused, those topics will be covered in the next post to prevent cramming too much content at once. I hope this post was on point and has given you an overview of the concepts, helping you better understand the upcoming topics.