[Topic discussion] Emulator Hardcore Research Series 5: Graphics Synthesizer (GS), Graphics Processing Units (GPUs), and Dual Cores

This article was published on August 7, 2006
Graphics Synthesizer (GS), Graphics Processing Units (GPUs), and Dual Cores

Original article: http://www.pcsx2.net/blog.php?p=2
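It was said early on that the GS plugin would become a major bottleneck in 3D scenes. This is not because the GS plugin takes a lot of CPU time, but because it has to communicate with the graphics card, which means the graphics driver stalls unnecessarily whenever the GPU and CPU synchronize. During a stall the CPU basically goes to lunch and simply waits for the GPU to be ready. Knowing this, graphics drivers and libraries communicate with the graphics card as little as possible. They usually cache render state changes, shader changes, and texture changes until actual geometry is rendered. They also use FIFOs (first-in, first-out buffers): the CPU only writes to the FIFO and the GPU only reads from it, which avoids the CPU waiting while the GPU is busy and the GPU waiting while the CPU is busy.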

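When designing games and hardcore applications that need to use the GPU to its full potential, the biggest challenge is to minimize graphics driver stalls at all costs. What kills games is not sending geometry down the graphics pipeline. It is switching render targets, using a render target as a texture in the next draw call, locking textures, and transferring render targets from video memory to main memory (in that last case the CPU not only goes to lunch, it stays for dinner too). Almost every game developers conference has talks on GPU optimization, and there are many papers on the net, so there is far more to the story than what is listed here.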

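All of this means that a single-threaded application needs well-designed GPU algorithms to reach a high FPS. Unfortunately, that is not possible for Pcsx2 and the GS plugin. The GS plugin has to draw geometry in the order in which it was received. This rules out almost all of the caching techniques used by modern games, because the GS and PC GPUs have completely different performance bottlenecks. A modern GPU works best when as much geometry as possible is handled in a single draw call. The GS has no such bottleneck. The GS also has two different contexts (translator's note: "context" has several Chinese renderings and usage is not standardized), which doubles the complexity. ZeroGS can only do a limited amount of work before compatibility suffers, so the only remaining option is to try to multithread the GS. Note that calling graphics libraries from multiple threads is not a trivial task.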

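Fortunately, the GS plugin is a special case. In 99% of the communication between the GS plugin and the other system components, the GS is receiving data. The only times the EE needs to synchronize with the GS are when the EE reads back the FINISH/SIGNAL registers and part of the 4 MB GS memory. Register readbacks are used frequently, which suggests that the EE and GS need tight synchronization. GS memory readbacks are less frequent; however, they require some special consideration for virtual memory and DMA. The remaining 99% of communication, in which the GS receives data, is handled with a GS FIFO.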

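When first started, Pcsx2 creates a GS thread and reserves special memory for the GS FIFO. The GS plugin then creates the Direct3D device/GL context for that thread only. Once the game is running, the EE copies all of its GS packets into the FIFO and notifies the GS thread. The GS thread then checks whether the FIFO has data and passes it to the GS plugin. This sounds easier than it actually is, because very tight synchronization is needed to make sure nothing in the FIFO is overwritten. The FINISH/SIGNAL register synchronization does not actually cross the EE and GS thread boundary. Instead, the EE thread looks at all the packets ahead of time and handles them in its own routines.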

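What makes the "Dual Core" option special is the notification step in the explanation above. The GS thread can sleep while waiting for a notification from the EE, which can be done with the WaitForSingleObject and SetEvent functions. Or it can check over and over, without ever stopping, whether the GS FIFO is empty. The latter approach kills single cores but is much faster on dual cores. The results of clicking the MTGS and DC options on a dual-core processor are stunning. Frame rates usually go up, sometimes by more than 2x.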

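Multithreading in games is going to be very common. The days when the CPU did everything and the GPU only rendered are over. The biggest problem is deciding which part of game processing goes into which thread, and how those threads communicate with each other. Many of these issues are still open, and game companies today are working hard to deal with the extra complexity that concurrent execution brings.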

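Moral of the blog: GPUs have become so powerful that people are starting to use them for tasks like stereo vision and general-purpose computation. To learn how to use them, I recommend ShaderX3: Advanced Rendering with DirectX and OpenGL by Wolfgang Engel and GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation by Matt Pharr, Randima Fernando, and the 20+ researchers who contributed to it.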



Graphics Synthesizer, GPUs, and Dual Cores

It was apparent early on in the project that the GS plugin was going to be a big bottleneck during 3D scenes. It isn't that the GS plugin itself does a lot of computation on the CPU, but the fact that it needs to communicate with the graphics card means that unnecessary stalls will occur in the graphics driver as the GPU and CPU are synchronized. During these stalls, the CPU basically goes to lunch until the GPU is ready. Graphics drivers and libraries are aware of this and try to communicate with the graphics card as little as possible. They usually cache render state changes, shader changes, and texture changes up until actual geometry is rendered. They also take advantage of FIFOs (first in, first out buffers). The CPU just writes to the FIFO and the GPU just reads from it; this makes all the difference in terms of keeping the GPU active while the CPU isn't and vice versa.
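To make the "cache state changes until geometry is rendered" idea concrete, here is a minimal sketch. The StateCache class, its integer keys and values, and the returned count are all hypothetical; no real driver or graphics API is being modeled. State writes are only recorded, and the redundant ones are dropped when a draw is finally issued.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical driver-side cache: set() never talks to the "GPU",
// it only remembers the latest requested value for each state key.
class StateCache {
public:
    void set(std::uint32_t key, std::uint32_t value) { pending_[key] = value; }

    // Called when geometry is actually drawn. Returns how many state
    // writes really had to be sent, i.e. those that differ from what
    // the "GPU" already has.
    std::size_t draw() {
        std::size_t sent = 0;
        for (const auto& kv : pending_) {
            auto it = committed_.find(kv.first);
            if (it == committed_.end() || it->second != kv.second) {
                committed_[kv.first] = kv.second;  // the one real write
                ++sent;
            }
        }
        pending_.clear();
        return sent;
    }

private:
    std::map<std::uint32_t, std::uint32_t> pending_;    // requested since last draw
    std::map<std::uint32_t, std::uint32_t> committed_;  // what the "GPU" holds
};
```

Setting the same key twice before a draw costs one write, and re-setting a value the GPU already holds costs none, which is the whole point of deferring.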

The biggest challenge when designing games and hardcore applications that need to use the GPU to its full potential is to keep graphics driver stalls to a minimum at all costs. What kills games isn't sending geometry down the graphics pipeline. It is when render targets are switched, when render targets are used as textures in the next draw call, when textures are locked, and when render targets are transferred from GPU memory to CPU memory (in the last case, the CPU not only goes to lunch, but has dinner also). GPU optimization talks appear at nearly every Game Developers Conference and there are many papers on them on the net, so there is a lot more to the story than is written here.

All this means that single-threaded applications really need to design their GPU algorithms well to see fast frame rates. This unfortunately is not possible with Pcsx2 and the GS plugin. The GS plugin has to draw geometry in the same order as it was received. This kills almost all caching techniques used by modern games because the GS and PC GPUs have very different performance bottlenecks. In modern GPUs, it is advantageous to group as much geometry as possible in one draw call. The GS doesn't suffer from such bottlenecks. The GS also has two different contexts, which makes things twice as difficult. ZeroGS can only do a limited number of workarounds before compatibility starts dropping, so the only other option is to try to multithread the GS. Note that using graphics libraries from multiple threads is not a trivial task.
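A toy count shows why the in-order requirement hurts. In this sketch each integer is a made-up render-state ID for one primitive, and both function names are hypothetical. An emulator that must keep arrival order can only merge neighbours that share a state, while a native game that is free to reorder can sort by state first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Draw calls needed when primitives must be submitted in arrival order:
// only adjacent primitives with the same state can share a draw call.
std::size_t draw_calls_in_order(const std::vector<int>& states) {
    std::size_t calls = 0;
    for (std::size_t i = 0; i < states.size(); ++i) {
        if (i == 0 || states[i] != states[i - 1]) ++calls;
    }
    return calls;
}

// Draw calls needed when reordering is allowed: sort by state, then
// issue one call per distinct state. Takes the vector by value so the
// caller's ordering is left untouched.
std::size_t draw_calls_if_reordered(std::vector<int> states) {
    std::sort(states.begin(), states.end());
    auto last = std::unique(states.begin(), states.end());
    return static_cast<std::size_t>(last - states.begin());
}
```

For a stream that alternates between two states six times, the in-order count is six draw calls and the reordered count is two. Real reordering is only valid when the results don't depend on draw order, which is exactly what the GS plugin cannot assume.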

Fortunately, the GS plugin is unusual in its nature. 99% of the communication that happens between the GS plugin and the rest of the system's components happens in the direction of the GS. The only times the EE needs to synchronize with the GS are when it reads back the FINISH/SIGNAL registers and part of the 4 MB GS memory. Register readbacks are used frequently, so this suggests that tight synchronization will be needed with the GS. The GS memory readbacks aren't as frequent; however, they require some special considerations with Virtual Memory and DMAs. The rest of the 99% of communication that goes to the GS happens with a GS FIFO.

When first started, Pcsx2 creates a GS thread and reserves special memory for the GS FIFO. The GS plugin then creates the Direct3D device/GL context only for that thread. Then when the game runs, the EE copies all its GS packets to the FIFO and then notifies the GS thread. The GS thread then checks if the FIFO has data, and then sends it to the GS plugin. This sounds easier than it actually is because very tight synchronization needs to happen to make sure no overwriting occurs in the FIFO. The FINISH/SIGNAL register synchronization actually doesn't happen across the EE and GS thread boundaries. Instead the EE thread peeks at all the packets ahead of time and handles them in its own routines.
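As a rough illustration of the "no overwriting" requirement, here is a minimal single-producer/single-consumer ring buffer in portable C++. It is a sketch only and is not Pcsx2's actual code: the GsFifo class, the 64-bit "packet" type, and run_fifo_demo are all assumptions made for the example. The producer refuses to write when the buffer is full, so data the consumer has not read yet is never clobbered.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical FIFO: exactly one writer thread and one reader thread.
class GsFifo {
public:
    explicit GsFifo(std::size_t capacity) : buf_(capacity), mask_(capacity - 1) {
        // Power-of-two size lets us wrap indices with a mask.
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer ("EE thread") side. Returns false when the FIFO is full.
    bool try_push(std::uint64_t packet) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == buf_.size()) return false;          // full: do not overwrite
        buf_[w & mask_] = packet;
        write_.store(w + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer ("GS thread") side. Returns false when the FIFO is empty.
    bool try_pop(std::uint64_t& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;                        // empty
        out = buf_[r & mask_];
        read_.store(r + 1, std::memory_order_release);   // free the slot
        return true;
    }

private:
    std::vector<std::uint64_t> buf_;
    const std::size_t mask_;
    std::atomic<std::size_t> write_{0};  // total packets written
    std::atomic<std::size_t> read_{0};   // total packets read
};

// Pushes 0..count-1 from this thread, pops them on a second thread, and
// reports whether they arrived in the same order.
bool run_fifo_demo(std::uint64_t count) {
    GsFifo fifo(8);
    bool in_order = true;
    std::thread gs([&] {
        std::uint64_t expected = 0;
        while (expected < count) {
            std::uint64_t packet = 0;
            if (fifo.try_pop(packet)) {
                in_order = in_order && (packet == expected);
                ++expected;
            } else {
                std::this_thread::yield();
            }
        }
    });
    for (std::uint64_t i = 0; i < count; ++i) {
        while (!fifo.try_push(i)) std::this_thread::yield();
    }
    gs.join();
    return in_order;
}
```

Keeping separate read and write counters, each written by only one thread, is what lets this version get away without a lock. A real FIFO for variable-sized GS packets would need more care than this fixed-size slot version.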

What makes the "Dual Core" option special is the notification part of the last explanation. The GS thread can either sleep waiting for a notification from the EE, which can be done with the WaitForSingleObject and SetEvent functions, or it can continually check if the GS FIFO is empty without ever stopping. The latter option kills single cores but goes much faster on dual cores. The results of clicking on the MTGS and DC options on dual cores are phenomenal. Frame rates usually go up, sometimes by more than 2x.
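The two waiting strategies can be sketched side by side. To keep the example portable, std::condition_variable stands in for the WaitForSingleObject/SetEvent pair; that substitution is mine, not what Pcsx2 uses. The Mailbox struct, the packet counter, and all function names are hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the GS FIFO: just a count of pending packets.
struct Mailbox {
    std::atomic<std::uint64_t> pending{0};
    std::atomic<bool> done{false};
    std::mutex m;
    std::condition_variable cv;
};

// Strategy A: sleep until notified. Costs no CPU while idle, but every
// wake-up goes through the OS scheduler.
std::uint64_t consume_sleeping(Mailbox& box) {
    std::uint64_t handled = 0;
    for (;;) {
        std::unique_lock<std::mutex> lock(box.m);
        box.cv.wait(lock, [&] { return box.pending.load() > 0 || box.done.load(); });
        const bool finished = box.done.load();  // read before draining
        lock.unlock();
        handled += box.pending.exchange(0);
        if (finished) return handled;
    }
}

// Strategy B: poll without ever stopping. Reacts fastest, but keeps one
// core fully busy, which is why it only pays off with a spare core.
std::uint64_t consume_spinning(Mailbox& box) {
    std::uint64_t handled = 0;
    for (;;) {
        const bool finished = box.done.load();  // read before draining
        handled += box.pending.exchange(0);
        if (finished) return handled;
    }
}

// Wakes a sleeping consumer. Taking the lock first closes the window in
// which the consumer has checked its predicate but is not yet asleep.
void wake(Mailbox& box) {
    { std::lock_guard<std::mutex> lock(box.m); }
    box.cv.notify_one();
}

// Posts `count` packets from this thread and returns how many the
// consumer thread handled.
std::uint64_t run_wait_demo(bool spin, std::uint64_t count) {
    Mailbox box;
    std::uint64_t handled = 0;
    std::thread gs([&] {
        handled = spin ? consume_spinning(box) : consume_sleeping(box);
    });
    for (std::uint64_t i = 0; i < count; ++i) {
        box.pending.fetch_add(1);
        if (!spin) wake(box);  // the spinner needs no notification
    }
    box.done.store(true);
    if (!spin) wake(box);
    gs.join();
    return handled;
}
```

Both variants handle every packet. The difference is where the time goes: the sleeping version pays for a notification on each post, while the spinning version pays with a core that never rests.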

Multithreading in games is going to be very big in the future. The times have passed when one CPU did everything and one GPU just rendered. The biggest problem is which game processing to divide into which thread, and how these threads will communicate with each other. Many of these issues are still open, and current game companies are struggling with the added complication of concurrent execution.

Moral of the blog: GPUs have become so powerful that people are starting to do tasks like stereo vision and general purpose computation with them. Learn how to use them. I recommend ShaderX3: Advanced Rendering with DirectX and OpenGL by Wolfgang Engel and GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation by Matt Pharr, Randima Fernando, and the 20+ researchers that contributed to it.




posted by wap, platform: Chrome
Some PS2 games suddenly display half a field while loading from disc

