How does a CPU work?

Let’s explain, in the simplest and most understandable way possible and without getting lost in a jungle of terms each more intimidating than the last, how a CPU works, i.e., the processor in your laptop or PC (much of this also applies to the microprocessors in other devices).

You can think of the processor in your computer or laptop as a super orchestra conductor. It receives information (or instructions, if you will), processes it quickly, and sends it where it needs to go so that everything functions harmoniously.

Note: This article is not the most academic or detailed explanation. But it should clarify things and explain, broadly, how a processor works.

1. Processor Core

A processor can have one or more cores. Each core can process information independently of the others. More cores mean more processing power, theoretically speaking. However, the number of cores alone is not a good way to determine how ‘good’ a processor is for you (see the performance chapter later in this article).

1.1 Core Operation

Each processor core can function independently, having its own set of registers and often its own L1 and L2 caches. Registers store temporary data. Caches are fast memories used to reduce the access time to frequently used data. Cores may share some resources, such as the L3 cache and the memory controller.

The execution process in a core involves several steps:

Fetch – the core retrieves instructions from the cache or main memory.

Decode – the instructions are decoded into a format the core can understand.

Execute – the core executes the decoded instructions.

Write-back – the results are written back into the cache or main memory.
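To make the cycle concrete, here is a minimal sketch in Python that mimics the four steps on a made-up toy instruction set (the opcodes and the single-register design are invented for illustration; real instruction sets are far more complex):

```python
# A toy illustration of the fetch-decode-execute-write-back cycle.
# The "instruction set" here is invented purely for the example.
MEMORY = {0: ("LOAD", 5), 1: ("ADD", 3), 2: ("STORE", "result"), 3: ("HALT", None)}
registers = {"ACC": 0}  # a single accumulator register
data = {}

pc = 0  # program counter
while True:
    instruction = MEMORY[pc]          # Fetch: read the instruction from memory
    opcode, operand = instruction     # Decode: split into operation + operand
    if opcode == "LOAD":              # Execute: perform the operation
        registers["ACC"] = operand
    elif opcode == "ADD":
        registers["ACC"] += operand
    elif opcode == "STORE":
        data[operand] = registers["ACC"]  # Write-back: store the result
    elif opcode == "HALT":
        break
    pc += 1

print(data)  # {'result': 8}
```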

1.2 Multi-Core and Benefits

A multi-core processor brings a series of benefits, but there are also technological considerations and compromises the manufacturer must make to fit multiple cores within the same CPU.

Multi-core benefits:

Improved performance: with more cores, a processor can handle more tasks simultaneously. This is ideal for modern software designed to be multi-threaded.

Energy efficiency: multi-core processors can be more energy-efficient than simply increasing the clock speed of a single core, as they can distribute the load among cores running at a lower, more efficient frequency.

Reliability: in critical systems, more cores and high energy efficiency can reduce the chances of overheating.

Other considerations:

Scalability: adding more cores does not always mean a linear improvement in performance. How well performance scales depends on the software’s ability to exploit the additional cores. For example, if an application can only use up to 8 cores, moving to 16 cores will not significantly improve its performance (assuming other variables, such as per-core performance, remain the same).

Software design: applications must be written or optimized to take advantage of a multi-core architecture. If an application does not know how to use more than one core, we will not see performance increases in that application no matter how many execution threads or cores we add (a minimal sketch of this idea follows below).
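As a rough illustration of that point, here is a sketch using Python’s standard concurrent.futures module; the workload function is a made-up stand-in for real CPU-bound work:

```python
from concurrent.futures import ProcessPoolExecutor

def heavy_task(n: int) -> int:
    # CPU-bound work: sum of squares (a stand-in for a real workload)
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    workloads = [2_000_000] * 8

    # Serial: a single core does all the work, one task at a time
    serial = [heavy_task(n) for n in workloads]

    # Parallel: the pool spreads tasks across available cores; the
    # speedup depends on core count and scales sub-linearly in practice
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(heavy_task, workloads))

    assert serial == parallel
```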

Technology and chip space: there are always trade-offs. Space is limited on the chip. Caches and certain components take up space. That’s why we see all kinds of differences between generations of processors: attempts are made to maximize space and performance through the best architectures and the most advanced manufacturing processes.

1.3 Architecture

The architecture of a CPU defines its fundamental structure. The architecture also defines how the processor processes information, executes instructions, and interacts with other hardware components.

Mainly, we have two pairs of architectures:

Von Neumann Architecture vs Harvard Architecture

RISC (Reduced Instruction Set Computing) vs CISC (Complex Instruction Set Computing)

As an idea, modern processors from Intel and AMD use principles from both pairs. They use a variation of the Harvard Architecture, known as the Modified Harvard Architecture. In this configuration, the processor has separate caches for instructions and data (similar to Harvard), but these are fed from a common memory (similar to Von Neumann). This allows a balance between the efficiency of separating data and instructions for rapid access and the flexibility of managing a single memory.

Modern processors from Intel and AMD are often seen as based on CISC architecture due to their complex sets of instructions that allow the execution of advanced operations in a few instructions. However, the lines between RISC and CISC have become less clear due to developments in microarchitectural design.

1.4 Die Size

In reviews or presentations, you might hear or read the term “die size”. For example, the AMD Ryzen 5 3600X has a die size of 74 mm². What does this mean?

The term die size refers to the physical surface area of the semiconductor chip, measured in square millimeters. This is actually the individual silicon chip on which the circuits and transistors of a processor or other type of integrated circuit are etched.

The die size is an important factor in the design and manufacture of semiconductors, influencing both production cost and component performance.

This specification, die size, affects:

Production Cost – A smaller die allows manufacturers to obtain more functional chips from a single silicon wafer, reducing material costs per unit; a larger die is also more susceptible to defects, which can reduce production yield (see the rough calculation after this list).

Performance – A smaller die reduces the distances that signals must travel between different components of the chip, such as the processor cores, cache, and memory controller. This can lead to improved performance and energy efficiency.

Heat dissipation – in general, a smaller, more efficient design generates less heat, which is easier to manage.

Portability – Processors with smaller dies are essential for mobile devices such as smartphones and tablets. Space is limited in such a device, and energy efficiency is crucial for a longer battery life.
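Here is a rough sketch of the cost intuition from the list above, using a common first-order approximation for how many die candidates fit on a wafer (the formula and the 300 mm wafer size are standard industry figures, but actual yields depend heavily on defect density):

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order approximation: area ratio minus an edge-loss term."""
    r = wafer_diameter_mm / 2
    gross = math.pi * r**2 / die_area_mm2                      # pure area ratio
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# A 74 mm^2 die (the Ryzen 5 3600X figure above) on a 300 mm wafer:
print(dies_per_wafer(300, 74))   # roughly 870 candidate dies
# A hypothetical large 600 mm^2 die on the same wafer:
print(dies_per_wafer(300, 600))  # roughly 90, so each defect costs far more
```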

1.5 Transistor Size

When you see something like 7nm or 14nm in a processor’s specifications, the manufacturer is referring to the physical size of the transistors used in the chip.

The term ‘nm’ stands for nanometers. A nanometer is one billionth of a meter. So it’s small. In semiconductor terms, this number traditionally indicated the gate length of the transistors on the chip (in modern processes it has become more of a marketing label than an exact physical measurement).

Why does transistor size matter?

Improved Performance: Smaller transistors can switch faster between on and off states, which allows for higher clock frequencies and, therefore, better performance. Reducing the size of transistors also reduces latency and can improve response to processing tasks.

Higher Density: The smaller the transistors, the more can be incorporated on a single chip. This means that processors can have more cores, larger caches, and generally more integrated functions on the same silicon area, which leads to better overall performance and energy efficiency (see the quick calculation after this list).

Energy Efficiency: Smaller transistors consume less energy, as they require a lower voltage to operate and have lower electrical resistance.
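A quick back-of-the-envelope calculation of the density point above; note again that modern node names no longer map cleanly onto physical transistor dimensions, so this is only the first-order geometric argument:

```python
# If a transistor's linear dimensions shrink from 14 nm to 7 nm, the
# area each one occupies drops by the square of the ratio, so roughly
# 4x more fit in the same silicon area. (Idealized geometry only.)
old_nm, new_nm = 14, 7
density_gain = (old_nm / new_nm) ** 2
print(f"~{density_gain:.0f}x more transistors per mm^2")  # ~4x
```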

However, as transistor sizes decrease, manufacturers face significant challenges such as:

Leakage currents – leakage current can increase as transistors become smaller, leading to energy losses and increased heat generation.

Process variability – the precision in manufacturing extremely small transistors can lead to variability between chips, which affects production yield and performance consistency.

Physical limitations – there is a limit to how small transistor sizes can be reduced, known as the scaling limit. This has led manufacturers to explore other technologies, such as 3D transistors.

2. Cache

Cache memory is a type of fast memory located directly on the processor (thus on the die).

The main role of the cache is to reduce the access time to data frequently used by the processor, thus improving the overall performance of the system.

2.1 Types of Cache

Cache memory is usually divided into three levels, each with various characteristics and purposes:

L1 Cache (Level 1)

Speed: It is the fastest form of cache memory.
Size: Usually relatively small (on the order of tens of kilobytes per core).
Purpose: Stores instructions and data that the processor needs to access most quickly.
Location: Usually, each processor core has its own L1 cache.

L2 Cache (Level 2)

Speed: Slower than L1, but faster than main memory (RAM).
Size: Larger than L1, usually a few hundred kilobytes to a few megabytes.
Purpose: Acts as a buffer between the fast L1 cache and the slower L3 cache or RAM.
Location: Can be integrated into each core or shared between a group of cores.

L3 Cache (Level 3)

Speed: Slower than L1 and L2, but significantly faster than RAM.
Size: The largest of the caches, can reach several megabytes to tens of megabytes.
Purpose: Reduces access time to data frequently used by all the processor’s cores.
Location: Usually shared between all the processor’s cores.

2.2 Cache Operation

The process of using the cache is quite simple in theory, but complex in implementation.

Basic steps:

  1. Initial Access: when the processor needs data, it first checks the L1 cache.
  2. Hits and Misses:
    Cache Hit: if the data is found in the cache (a ‘hit’), it is quickly delivered to the processor.
    Cache Miss: if the data is not in the cache (a ‘miss’), the processor checks the next levels of cache (L2, then L3). If the data is not found in these levels either, it is brought in from RAM.
  3. Cache Update: data brought in from RAM is stored in the cache for future accesses, replacing less frequently used data based on replacement algorithms such as Least Recently Used (LRU).
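Since the steps above mention LRU, here is a minimal sketch of that replacement policy in Python; real hardware caches implement this (or approximations of it) in silicon, organized into sets and ways rather than a flat dictionary:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of the Least Recently Used policy mentioned above."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)   # mark as most recently used
            return self.store[key]        # cache hit
        return None                       # cache miss: fetch from next level

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[key] = value

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used entry
cache.put("c", 3)         # evicts "b", not "a"
print(list(cache.store))  # ['a', 'c']
```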

2.3 Impact on Performance

Efficient use of the cache can have a significant impact on system performance. Access to cached data is much faster than access to RAM, reducing latency and improving data processing speed. An imperfect but illustrative example is the Ryzen 7 5800X vs. the 5800X3D. The X3D adds a large stacked cache (AMD’s 3D V-Cache), and as benchmarks show, the 5800X3D is significantly superior in applications that can exploit it: more of the frequently used data stays close to the cores, so fewer slow trips to RAM are needed.
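You can observe cache effects even from a high-level language. The sketch below sums the same list in sequential versus random order; the random walk defeats the hardware prefetcher and causes many more cache misses, so it typically runs noticeably slower (interpreter overhead muddies the numbers, but the gap is usually visible):

```python
import random
import time

n = 5_000_000
data = list(range(n))
sequential = list(range(n))
shuffled = sequential[:]
random.shuffle(shuffled)

def total(indices):
    # Same amount of work either way; only the memory access pattern differs
    s = 0
    for i in indices:
        s += data[i]
    return s

for name, order in (("sequential", sequential), ("random", shuffled)):
    start = time.perf_counter()
    total(order)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```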

3. Clock Speed

Clock speed is the rate at which a processor’s clock ticks, measured in gigahertz (GHz). It indicates the number of cycles a processor can perform in a second. A cycle is essentially a clock pulse during which the processor can perform basic operations, such as reading instructions, executing them, and writing the results. For example, a processor with a clock speed of 3 GHz performs 3 billion cycles per second.

3.1 How Does It Work?

The processor receives a clock signal from a crystal oscillator, which generates a constant and precise rhythm. This clock signal is used to synchronize the processor’s internal operations. Each clock pulse allows the processor to advance in executing the instructions it processes.

3.2 Impact of Clock Speed on Performance

Better Performance: The higher the clock speed, the more instructions the processor can process in a shorter period of time. This translates into better performance, especially for applications that depend on the processing speed of a single execution thread (single-thread).

Energy Efficiency and Heat: Higher clock speeds can also lead to higher energy consumption and heat production. Processor manufacturers must find a balance between clock speed, energy consumption, and heat dissipation.

Diminishing Benefits: Not all applications benefit equally from a higher clock speed. Programs that can use multiple cores simultaneously may benefit more from a multi-core processor than from one with an extremely high clock speed but with fewer cores and higher heat output.

3.3 Limitations

As the transistors in chips become smaller and smaller, it becomes increasingly difficult to increase the clock speed without significantly increasing energy consumption and heat generation.

4. Pipelines

Pipelining is an essential concept in modern CPU architecture. Pipelining allows the processor to execute multiple instructions simultaneously by dividing the execution process into several stages. Each stage is executed by a different part of the processor, which increases overall efficiency.

4.1 Typical Stages of a Pipeline

In a simple pipeline, the stages might be:

IF (Instruction Fetch): The processor reads the instruction from the cache or RAM.

ID (Instruction Decode): The processor decodes the instruction to determine what operation it needs to perform.

EX (Execute): The processor executes the operation specified by the instruction.

MEM (Memory Access): The processor accesses memory to read or write data, if necessary.

WB (Write Back): The results are written back into a register or memory.
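A small sketch that prints how instructions overlap across those five stages; instruction i simply enters stage s at cycle i + s, which is the idealized case with no hazards or stalls:

```python
# Idealized 5-stage pipeline: once the pipeline is full, one
# instruction completes every cycle even though each instruction
# still takes five cycles end to end.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instructions = 5
cycles = instructions + len(STAGES) - 1

print("cycle: " + " ".join(f"{c:>4}" for c in range(1, cycles + 1)))
for i in range(instructions):
    row = ["    "] * cycles
    for s, stage in enumerate(STAGES):
        row[i + s] = f"{stage:>4}"   # instruction i is in stage s at cycle i+s
    print(f"  I{i + 1}:  " + " ".join(row))
```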

4.2 Advantages of Pipelining

Improved Efficiency – by working on multiple instructions at the same time, the pipeline increases the total number of instructions the processor can complete in a unit of time.

Better Resource Use – each component of the processor can be used continuously, without waiting for other stages to complete.

4.3 Disadvantages and Challenges

Hazards: There are three main types of hazards in a pipeline: structural (when the hardware does not support multiple instructions simultaneously), data (when instructions depend on each other), and control (when jumps and decisions affect the flow of instructions).

Design Complexity: Implementing an efficient pipeline requires sophisticated design to properly manage dependencies and hazards.

Performance Penalties: In the case of a control hazard, for example, the pipeline may be forced to discard some partially processed instructions, leading to performance penalties.

5. Hyper-Threading and SMT

5.1 Hyper-Threading – Intel

Hyper-Threading is a technology developed by Intel that allows a single physical processor core to execute two threads (two streams of instructions) simultaneously. This is achieved by duplicating certain sections of the processor that store the architectural state, without duplicating the entire physical core.

How Hyper-Threading Works

Each physical core of the processor has two separate sets of registers (small, fast data stores) and processor states. The operating system sees each physical core as two logical cores. The processor can switch between these two threads on a core depending on processing needs, which allows for more efficient use of the processor’s resources.

When one thread is blocked or waiting for resources (for example, waiting for data from RAM), the processor can quickly switch to the other thread to continue working, thus reducing downtime.

5.2 Simultaneous Multithreading – AMD

Simultaneous Multithreading, implemented by AMD under the name “SMT”, works in a similar way to Intel’s Hyper-Threading. The goal is the same: to allow each physical core to execute multiple threads simultaneously to improve efficiency and overall performance.

Differences and Similarities: Both technologies aim to maximize the efficiency of the processor by allowing a single core to manage multiple tasks simultaneously.

SMT and Hyper-Threading are just different commercial names for the same fundamental idea of allowing a core to execute multiple threads simultaneously.

The specific implementation may vary between Intel and AMD, but the basic principle remains the same.
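One way to check from software whether SMT/Hyper-Threading is active is to compare logical and physical core counts. The sketch below uses the third-party psutil library for the physical count, since os.cpu_count() alone only reports logical cores:

```python
import os
import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # logical cores (hardware threads)
physical = psutil.cpu_count(logical=False)  # physical cores

print(f"physical cores: {physical}, logical cores: {logical}")
if physical and logical and logical > physical:
    print("SMT / Hyper-Threading appears to be enabled "
          f"({logical // physical} threads per core)")
```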

5.3 Impact on Performance

Using Hyper-Threading or SMT can lead to significant performance improvements in multi-threaded scenarios, where there are many tasks that can be executed in parallel. Depending on the application, performance improvements can vary:

Applications optimized for multi-threading, such as video editing and scientific calculations, can benefit significantly from multiple execution threads. Games and applications that depend more on the performance of a single thread may not see significant improvements and, in some cases, may even experience a slight decrease in performance due to the additional complexity in managing threads and/or inefficiency in using multiple threads.

6. TDP – Thermal Design Power

TDP represents the maximum amount of heat generated by a processor (the term is also used for other components, such as the video card or GPU) that the cooling system must be able to dissipate under standard working conditions. It is expressed in watts (W) and indicates energy consumption and heat production under typical use, though not necessarily under maximum load.

6.1 Why Does TDP Matter?

Cooling System Design: TDP helps determine the type and size of the cooling system needed to keep components at safe operating temperatures. Inadequate cooling can lead to overheating, which can reduce performance and the lifespan of components.

Component Selection: When building or upgrading a system, it is important to consider the TDP of components to ensure compatibility with the cooling capacity of the case and the cooling system.

Energy Efficiency: TDP can provide an idea of the energy efficiency of a processor or graphics card. Components with a lower TDP are usually more energy-efficient, which can be important for systems that run 24/7, such as servers, or for portable devices, where battery life is critical. For example, for a home lab, home server, or NAS, we would want the lowest possible TDP and the highest possible energy efficiency to have the lowest possible electricity costs.

6.2 How Is TDP Measured?

TDP is determined by hardware manufacturers based on a typical usage scenario, which does not necessarily include the maximum possible load.

Manufacturers test components under various loads and measure the heat generated to establish a TDP that reflects the cooling needs in the most common usage scenarios. For this reason, we usually want a cooling system that exceeds the TDP of our processor in terms of cooling capacity. Especially for gaming systems or other intense and long-lasting work scenarios.
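The headroom advice above can be summed up in a trivial check; the 25% margin below is an assumption for illustration, not a manufacturer specification:

```python
def cooler_ok(cpu_tdp_w: float, cooler_capacity_w: float,
              headroom: float = 1.25) -> bool:
    """Rule-of-thumb check: is the cooler rated comfortably above the TDP?"""
    return cooler_capacity_w >= cpu_tdp_w * headroom

print(cooler_ok(105, 150))  # True: 150 W cooler for a 105 W TDP CPU
print(cooler_ok(105, 110))  # False: too little margin for sustained load
```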

7. How Do We Determine the Processing Power of a Processor?

Let’s approach this discussion about performance in two ways.

First, IPC, because in the news we often see a manufacturer claim that its new generation of processors has, say, 20% higher IPC. Then, performance as perceived by us, the users: how many FPS does a processor get in a given game, or how many seconds does it take to render a particular clip?

7.1 IPC – Instructions Per Cycle

When manufacturers such as AMD or Intel mention that “IPC” has increased by a certain percentage from one generation to the next, they are referring to “Instructions Per Cycle”. IPC is a crucial performance indicator in CPU architecture, indicating the average number of instructions a CPU can execute in a clock cycle. Let’s detail what this means and how it influences the overall performance of the CPU.

IPC measures the efficiency with which a CPU executes instructions. It is a key indicator of the architectural efficiency of a processor, independent of its clock speed.

Here’s why IPC is important:

Efficiency: Higher IPC means that the processor can do more work with each clock cycle. This efficiency leads to better performance, especially in scenarios where clock speed is similar between CPUs.

Scaling Performance: Even if clock speed (GHz) does not increase, improvements in IPC can still provide better performance. This is crucial for achieving gains in computing power without pushing the limits of energy consumption and heat generation.
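The interplay of IPC and clock speed can be expressed as a simple product; the IPC values below are invented purely to illustrate why a lower-clocked chip can still win:

```python
# Instructions per second = IPC x clock frequency.
def throughput(ipc: float, ghz: float) -> float:
    return ipc * ghz * 1e9  # instructions per second

cpu_a = throughput(ipc=2.0, ghz=5.0)  # older core, higher clock
cpu_b = throughput(ipc=3.0, ghz=4.0)  # newer core, 50% higher IPC

print(f"CPU A: {cpu_a:.1e} instr/s")  # 1.0e10
print(f"CPU B: {cpu_b:.1e} instr/s")  # 1.2e10, faster despite lower GHz
```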

Ways to Improve IPC

Manufacturers achieve improvements in IPC through architectural improvements such as:

  • Optimized Execution Paths: Reducing the number of steps or the complexity of operations required to execute instructions.
  • Improved Branch Prediction: More accurately predicting the paths that programs will take, thus preparing the CPU to process the most likely outcomes more quickly.
  • Larger, More Efficient Caches: Allowing faster access to frequently used data and instructions, reducing the time CPUs wait.
  • Enhanced Out-of-Order Execution Capabilities: Allowing CPUs to process instructions as resources are available rather than strictly sequentially.
  • Increasing the Number of Cores and Threads: Although this does not directly increase IPC, it allows for the simultaneous processing of multiple instructions, complementing IPC improvements.

7.2 How do I find out what performance a processor has? How do I know which processor is better?

When you want to buy a processor, you first need to decide on a budget and how you will use it.

The way you plan to use the processor matters a lot. For example, if the program/game you want to run is not optimized for multiple cores or for more than x cores, you might need a processor that has faster individual cores, even if it does not have many cores. And vice versa.

The easiest and most accurate way to determine a processor’s performance is to look for benchmarks with it and compare the results with other processors or with the processor you currently have.

Yes, you can also look at specifications such as clock speed (GHz), number of cores, TDP, and manufacturing process. Still, ultimately, the benchmark is the most accurate performance indicator you can rely on.
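Comparing benchmark results usually comes down to normalizing scores against a baseline; the scores below are invented for illustration, not real results:

```python
# Hypothetical benchmark scores (higher is better).
scores = {"CPU A": 14200, "CPU B": 17750}
baseline = scores["CPU A"]
for name, score in scores.items():
    print(f"{name}: {score} ({score / baseline:.0%} of CPU A)")
# CPU A: 14200 (100% of CPU A)
# CPU B: 17750 (125% of CPU A)
```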

I hope you found this article useful, and if you did, make sure to share it!