Originally posted by JimmyC on 2011-2-4 23:25: First take a look at how Coarse-grained Z and Fine-grained Z are classified. Coarse-grained Z: Coarse Z, Hierarchical Z, Hi-Z, or ZCULL. Fine-grained Z: Fine Z, Early Z, Early Z Checking, Early Z Out. Then whether G70 actually has Fine- ...
Early-Z Optimization

Early-z optimization (sometimes called “z-cull”) improves performance by avoiding the rendering of occluded surfaces. If the occluded surfaces have expensive shaders applied to them, z-cull can save a large amount of computation time. To take advantage of z-cull, follow these guidelines:
• Don’t create triangles with holes in them (that is, avoid alpha test or texkill)
• Don’t modify depth (that is, allow the GPU to use the interpolated depth value)
Violating these rules can invalidate the data the GPU uses for early optimization, and can disable z-cull until the depth buffer is cleared again.
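To make the quoted guidelines concrete, here is a minimal toy model of early depth rejection. It is only a sketch: `Surface`, `count_shaded_fragments` and the one-row "framebuffer" are hypothetical names invented for illustration, the holes produced by alpha test are not actually modelled, and real hardware (tile-based z-cull, per-sample ZROP) is far more involved. It only shows why submission order and alpha test / texkill change how many fragments reach the shader.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    depth: float                 # constant depth over the row; smaller is nearer
    uses_discard: bool = False   # True stands in for alpha test / texkill


def count_shaded_fragments(surfaces, width=4, early_z=True):
    """Count fragment-shader invocations in a toy one-row pipeline."""
    depth_buffer = [float("inf")] * width
    early_z_valid = early_z
    shaded = 0
    for surface in surfaces:
        if surface.uses_discard:
            # Assumption: once a surface with holes is drawn, early rejection
            # stays off for the rest of the frame (until the next clear).
            early_z_valid = False
        for x in range(width):
            passes = surface.depth < depth_buffer[x]
            if early_z_valid and not passes:
                continue          # rejected before the shader runs
            shaded += 1           # shader runs; a late depth test follows
            if passes:
                depth_buffer[x] = surface.depth
    return shaded


front_to_back = [Surface(1.0), Surface(2.0), Surface(3.0)]
back_to_front = list(reversed(front_to_back))
with_alpha_test = [Surface(1.0, uses_discard=True), Surface(2.0), Surface(3.0)]
```

In this model the front-to-back list shades only the nearest surface, the back-to-front list shades everything, and a single alpha-tested surface at the start makes the front-to-back list behave as if early rejection did not exist.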
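Originally posted by TG春上春 on 2011-2-4 23:53: You guys really can argue, and you even make it look convincing. :D Z-cull and early-z were never the same thing. Z-cull lives in the rasterizer; it is called coarse because it does the depth test per tile, not per sample. The per-sample depth test is done by ZROP, the so-called fine-grained test. ZRO ...
Originally posted by hourousha on 2011-2-4 23:56: So you have discovered a new continent again, heh. Too bad you only know half the story. This early-z rejection refers to a behavior: 'culling fragments that would fail the z-test anyway before they enter the fragment shader, avoiding unnecessary computation'. As for ...
Originally posted by JimmyC on 2011-2-5 00:15: By your standard, even Tegra 1 now supports "real" HSR, not crippled HSR... (Tegra supports early-z rejection.) Sigh... In that case I have nothing more to say...
Originally posted by hourousha on 2011-2-5 00:25: I don't know the details of Tegra, so don't drag me into that. RacingPHT has an account on this forum too; why not just ask him about this directly? In that thread he plainly said 'because, first of all, z-cull can also count as early-z'. In other words, G70's Z-Cull is itself a form of Early-Z, it is just that later ...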
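Originally posted by JimmyC on 2011-2-5 00:55: I just found SCEE's official developer documentation PDF, 2009 edition. Under the right conditions, following every step and without violating the recommendations, RSX's Early Z-cull can save a whole 10% of the GPU! Haha, fine, I concede: RSX's HSR is "real" HSR, "non-crippled" HSR, even though its efficiency is only half that of G8X and one sixth that of a TBDR (calculated at x2.5).
RSX: 2 z/stencil. SGX543MP4+: 64 z/stencil. The actual HSR efficiency of the two differs by a factor of 32. Even if RSX's HSR only saves 10%, the point stands: RSX's is "real" HSR, "non-crippled" HSR.
Speaking of which, who was it that started the "What's so special about PowerVR having TBDR, RSX has HSR too" topic? Now we have the answer, heh.
I don't know RacingPHT well; you can go ask him.
I see you are thoroughly skeptical that CLX2 hardware-accelerates alpha test while doing TBDR.
In fact an Imgtec employee is a regular on the beyond3d forums. He is the one who said CLX2 has hardware-accelerated alpha test and twice the performance of Neon250 at the same clock. You can ask him how they managed it twelve years ago. (Although downloading any DC emulator already shows you the zwrite / alpha test zwrite options.)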
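Originally posted by hourousha on 2011-2-5 11:31: Please post the original text for the 10% and its preconditions. If the scene's depth complexity is only 1, or everything rendered is a transparent object, you save nothing at all. Quit joking around. Multiplying by 2.5, and one sixth of a TBDR, that is laughable. Either your arithmetic is superb or your brain is just too good; really ...
I believe what SimonF says.... Give me the link where I doubted PVR CLX2; don't start making things up just because you got flustered...
Also MBX-based, Sega's Aurora (a 2005 product) has dedicated optimization for transparent/incomplete triangles. Back in the PowerVR Series 2 days, Dreamcast likewise did alpha test with HW front, and was twice as fast as the PC version at the same clock.
Optimizing transparent triangles? Go look at the Insider FAQ I gave you; it is covered there. Let me quote it for you again .... It tells developers to split blended geometry into an opaque set and a translucent set beforehand, to keep the amount of blending as small as possible. Is that what you call hardware optimization of transparent/cut-out triangles? Laughable...
Transparent-object rendering has nothing to do with HSR in the first place.
Who started it, then?
I already discussed this question with him back in '08.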
Originally posted by JimmyC on 2011-2-5 15:08: The 10%: there is nothing more. That page of the original has just these six lines; you can choose not to believe it, heh.
Many games are fragment shader bound
• Rendering Z only ‘primes’ the RSX™ Z-cull unit
  – Very fast, 16 pixels/clock rather than 8
  – Render entire scene,
  – Or ‘large’ meshes only
  – Easily save 10% GPU
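The slide describes a Z-only prepass. A rough sketch of the idea follows, assuming a one-row framebuffer and draws given as hypothetical `(x_start, x_end, depth)` spans; `shade_cost` is an invented helper, and the cost of the prepass itself is deliberately left out. It does not reproduce the 10% figure, which depends on the scene.

```python
def shade_cost(draws, width, z_prepass):
    """Count expensive fragment-shader invocations for a list of span draws.

    draws: list of (x_start, x_end, depth) in submission order,
           smaller depth is nearer.
    """
    depth = [float("inf")] * width

    if z_prepass:
        # Pass 1: depth only. No fragment shader runs, so nothing is counted.
        for x0, x1, z in draws:
            for x in range(x0, x1):
                depth[x] = min(depth[x], z)

    shaded = 0
    for x0, x1, z in draws:
        for x in range(x0, x1):
            if z_prepass:
                # Pass 2: depth is already final, so only the visible
                # fragment at each pixel passes an "equal" test.
                if z == depth[x]:
                    shaded += 1
            elif z < depth[x]:
                # Single pass: anything nearer than what is stored gets
                # shaded, even if a later draw covers it.
                depth[x] = z
                shaded += 1
    return shaded


# Three layers submitted back to front over an 8-pixel row.
scene = [(0, 8, 3.0), (0, 8, 2.0), (2, 6, 1.0)]
```

With this back-to-front scene the single pass shades every layer, while the primed pass shades each pixel once. With a depth complexity of 1 the two counts are equal, which is the objection raised earlier in the thread about preconditions.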
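Why not directly calculate how much GPU SGX and RSX each save thanks to TBDR / z-cull? For RSX, SCEE has already given the answer directly: 10% of the GPU. SGX treats 400MP/s as 1000MP/s, right? How much does that save? How do you calculate it? I don't know, heh.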
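One possible reading of the x2.5 and one-sixth figures, written out as arithmetic. This is an assumption about how the numbers relate, not something the thread states: if 400MP/s of raw fill behaves like 1000MP/s, the fraction of fragment work skipped would be 1 - 400/1000. `skipped_fraction` is a hypothetical helper, and the two inputs were not measured on the same scene, so the ratio only illustrates the calculation.

```python
from fractions import Fraction


def skipped_fraction(raw_rate, effective_rate):
    """Fraction of fragment work avoided if raw_rate acts like effective_rate."""
    return 1 - Fraction(raw_rate, effective_rate)


sgx_skipped = skipped_fraction(400, 1000)   # from the 400MP/s vs 1000MP/s claim
rsx_saved = Fraction(10, 100)               # from the "save 10% GPU" slide line
ratio = rsx_saved / sgx_skipped             # RSX saving relative to the SGX figure
speedup = Fraction(1000, 400)               # the x2.5 multiplier
```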
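So citing the twelve-year-old CLX2 / six-year-old MBX in defense of USSE2 is not allowed, but bashing USSE2 with the six-year-old USSE is fine, heh.
Can't you read?
Then post it.
And please don't back out: where is the evidence that I doubted CLX2?
I see you are thoroughly skeptical that CLX2 hardware-accelerates alpha test while doing TBDR. In fact an Imgtec employee is a regular on the beyond3d forums. He is the one who said CLX2 has hardware-accelerated alpha test and twice the performance of Neon250 at the same clock. You can ask him how they managed it twelve years ago.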
Optimizing transparent triangles? Go look at the Insider FAQ I gave you; it is covered there. Let me quote it for you again.
Quote:
For sprites with transparent areas, create polygons that are optimal for the visible area and exclude fragments that are completely transparent. If an application was to render a simple triangular shaped tree texture on a quad polygon, there would be large, empty areas that would need to be blended. A better approach in this situation would be to use a triangle that tightly fits the shape of the texture. By doing so, most of the empty area that would have to be blended when using a quad to render the tree sprite can be removed, which means there are fewer fragments to blend. Geometry used to tightly fit sprites in a given application should be kept as simple as possible while eliminating as many unwanted fragments as possible. Finding the balance between geometric complexity and the empty space that will be removed by using more complex geometry is a balance that is very application and platform specific. A tool such as the one described here: http://www.humus.name/index.php?page=Cool&ID=8 can be used to generate the geometry required.
For further optimisation, when rendering sprites with partially transparent areas, break each object down into an area that can be rendered as an opaque sprite and a second area of partial transparency that can be blended. By taking this approach, the number of fragments that need to be blended for each sprite can be significantly reduced, which allows the HSR process to provide a "super" fill rate. In order to maintain sprite ordering, use of the depth buffer will be required - each sprite will need a unique offset to avoid artefacts. Generating the areas for this technique can be done with a similar tool to that mentioned above, but this time looking for opaque pixels instead of completely transparent.
As stated previously, the opaque objects should be drawn first followed by the blended objects as this will allow the blended objects to gain the most benefit possible from the hardware's HSR process.
It tells developers to split blended geometry into an opaque set and a translucent set beforehand, to keep the amount of blending as small as possible. Is that what you call hardware optimization of transparent/cut-out triangles? Laughable...
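The effect of the two recommendations in the quoted FAQ can be estimated by counting texels in a sprite's alpha channel. This is an idealized sketch: `classify` and `blended_fragments` are hypothetical helpers, the 4x4 mask is made up, and real geometry can never hug the texels exactly, so the counts are lower bounds rather than what a mesh-fitting tool would achieve.

```python
def classify(alpha_rows):
    """Count fully transparent, partially transparent and opaque texels."""
    counts = {"transparent": 0, "partial": 0, "opaque": 0}
    for row in alpha_rows:
        for alpha in row:
            if alpha == 0:
                counts["transparent"] += 1
            elif alpha == 255:
                counts["opaque"] += 1
            else:
                counts["partial"] += 1
    return counts


def blended_fragments(alpha_rows):
    """Fragments that must be blended under three drawing strategies."""
    counts = classify(alpha_rows)
    return {
        # Whole quad drawn with blending enabled.
        "full_quad": sum(counts.values()),
        # Geometry trimmed to exclude fully transparent texels.
        "tight_fit": counts["partial"] + counts["opaque"],
        # Opaque area drawn without blending, only the fringe is blended.
        "opaque_split": counts["partial"],
    }


# Made-up 4x4 alpha mask of a roughly triangular sprite.
tree_alpha = [
    [0,   0,   128, 0],
    [0,   128, 255, 128],
    [128, 255, 255, 255],
    [0,   0,   255, 0],
]
result = blended_fragments(tree_alpha)
```

Note that all three strategies are decided by the application when it builds its geometry; nothing in the sketch depends on what the hardware does with alpha test, which is the distinction being argued over in the thread.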
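Originally posted by JimmyC on 2011-2-5 17:40: Question 1. Do you believe SimonF?
Question 2. Do CLX2/MBX actually have hardware-accelerated alpha test or not?
Question 3. Can alpha test be hardware-accelerated under HSR rendering?
Question 4. Did Imgtec ever have a design for hardware-accelerating alpha test under HSR rendering?
Question 5. Why do you use that software workaround passage from PowerVR Insider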