There has been a lot of criticism of how little processor performance improves with each generation change, and there will certainly be more. Under Intel’s leadership, we saw that the performance difference between generations was minimal over the years, right up to the present day.
Meanwhile, GPUs have multiplied their performance several times over. How is this possible when, in the case of AMD for example, the same lithographic process is used for both processors and graphics cards? What are the reasons?
There are many explanations for this curious discrepancy, but the main one, and the basis of everything, is the role of each type of hardware and the software approach to it. With this in mind, a number of explanations open up that deserve closer scrutiny as we move from the lithographic process all the way to software developers.
GPUs Are Always Way Ahead of CPUs – Here’s Why
The first reason is what processors and graphics cards are designed for. As we well know, the CPU is an extremely complex component, the heart of the system, but when we talk about workloads that stress performance per thread, and therefore IPC, we have to consider that the limiting factor is frequency.
And, along with frequency, the power constraint. Architectural improvements, such as a more optimized Front-End and Back-End and better access to caches and registers, usually raise performance, but we cannot forget the parallelism that these modern processors require.
If we add all of the above, we get a bottleneck that is always primarily tied to the lithographic process. Packing more transistors per mm² is most useful if you want to add more cores and thus increase overall performance, but at the single-thread level we still have to push each thread as fast as possible.
We are currently running at around 5 GHz with Intel, so with this limit in place, if we apply Amdahl’s Law (the serial portion of a workload caps the achievable speedup, so the harder a task is to parallelize, the less it benefits from more cores), we face a difficulty that can become severe in some cases.
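Amdahl’s Law can be stated in a few lines. The sketch below (illustrative numbers, not measurements of any real chip) shows why adding cores stops paying off once the serial fraction of a workload dominates:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's Law: overall speedup when only part of a workload parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even if 90% of the work parallelizes perfectly, 8 cores give less than 5x:
print(round(amdahl_speedup(0.90, 8), 2))   # 4.71
# Diminishing returns: 64 cores still fall short of the ceiling.
print(round(amdahl_speedup(0.90, 64), 2))  # 8.77
# The serial 10% imposes a hard limit of 10x, no matter how many cores.
print(round(1 / (1 - 0.90), 2))            # 10.0
```

This is exactly the asymmetry the article describes: the GPU’s workloads sit near a parallel fraction of 1.0, while typical CPU workloads do not.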
Another point to discuss is, of course, the execution units and instructions added to the CPU, where performance can be optimized and improved in more or less complex ways, but these are usually direct improvements per core or per thread.
But of course the CPU also exploits parallelism through technologies such as speculative execution or out-of-order execution, not to mention more available cores, larger caches, faster access to RAM, and technologies such as HT or SMT.
In the end, all of these technologies are trying to do a very simple thing: keep each CPU core and thread busy for as long as possible, in the best possible order for each task, so that there are no stalls waiting for data. Why is this, and how does the CPU differ from the GPU?
Superscalar Execution And Parallelism Are The Key Differences Between CPU And GPU
The CPU must perform a large number of different tasks, both simple and complex, and it must also interact with every component of the PC, which means receiving information and transferring it over different buses at the highest possible speed. The GPU, on the other hand, works in a different and, in fact, simpler way.
Switching between different kinds of work is called a context switch, and here the GPU has a big advantage: by its very nature, the work it has to do requires very few context switches, since it is extremely parallelized and the workloads are usually uniform.
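A rough illustration of what a uniform workload looks like: a GPU-style kernel applies the same simple arithmetic to every element, with no data-dependent branching and no shared state between elements, so the work splits trivially across any number of workers. This is a hypothetical sketch in plain Python, not real GPU code:

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x: float) -> float:
    # The same arithmetic for every element: no branching, no dependencies,
    # so thousands of "shaders" could each process one element independently.
    return x * 2.0 + 1.0

data = [float(i) for i in range(8)]

# Because each element is independent, no context switching between task
# types is ever needed; the pool just maps the one kernel over everything.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(kernel, data))

print(result)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```

A CPU, by contrast, juggles many unrelated task types, so it cannot rely on this one-kernel-over-many-elements pattern.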
Developers also work differently, since the GPU has as many cores as the shaders integrated into its silicon, so parallelization is extremely easy: a chip can integrate up to 6912 real shaders without too much trouble (NVIDIA A100), where each shader acts almost like an independent processor core.
Therefore, we have a large number of cores to work with, whose performance is logically limited by the speed of the node each chip is built on and, at the same time, by the chip’s efficiency. Keep in mind that with GPUs we are talking about huge dies, with power consumption that would be unthinkable for a CPU.
The tradeoff is lower clock speeds, due to the nature of the architecture, but the parallelization is unmatched, so it is easier to scale performance this way. Finally, we have to consider Dennard scaling, which we have discussed more than once, and which puts efficiency at the center: power consumption stays proportional to the area of the chip, so power density remains roughly constant.
Therefore, if a series of tasks can be parallelized, it becomes very simple to scale performance much further by adding more cores to the GPU, where, in addition, the number of transistors is much higher, and with it the power consumption, which is, however, dissipated over a larger area.
Since the GPU does not push up against the node’s frequency limit, it is constrained not by clock speed but by efficiency, and because it has more headroom there than the CPU, it gains more, once we combine everything explained above.
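The whole argument can be summarized with a back-of-the-envelope throughput comparison: a CPU pushing a few fast cores versus a GPU pushing thousands of slower ones. The numbers below are illustrative assumptions (the A100-like shader count comes from the article; the clocks and operations per cycle are hypothetical round figures, not real chip specs):

```python
def peak_throughput(cores: int, clock_ghz: float, ops_per_cycle: int = 2) -> float:
    """Rough peak throughput in billions of operations per second."""
    return cores * clock_ghz * ops_per_cycle

# Hypothetical CPU: 8 cores at 5 GHz (the frequency ceiling discussed above).
cpu = peak_throughput(cores=8, clock_ghz=5.0)

# Hypothetical GPU: 6912 shaders at a much lower 1.4 GHz clock.
gpu = peak_throughput(cores=6912, clock_ghz=1.4)

print(cpu)        # 80.0
print(gpu)        # far higher, despite the much lower clock
print(gpu / cpu)  # the GPU's edge comes entirely from core count
```

Of course this only holds for workloads that actually parallelize; plug the same core counts into Amdahl’s Law with a serial fraction and the GPU’s advantage shrinks accordingly.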