Demon Lord Satan
Originally posted by JimmyC on 2011-2-3 19:27: Early z-rejection is only truly supported from G80 onward. http://www.gamedev.net/topic/576 ... on-on-g70-hardware/
To take advantage of this, all 3D and 2D applications should use opaque objects (blending off, alpha test off, no discard in shader) as much as possible so that the HSR process can reduce fragment processing to a minimum. These should be rendered first, before any objects with transparency. Examples of this kind of sprite could be background graphics, terrain tiles, pop-up message windows.
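Originally posted by JimmyC on 2011-2-4 14:30:
It's not only that. It also stops working in at least these cases (fps drops to 1/10~1/30):
-use kill/clip in pixelshader
-change compare func
-modify depth
Fine, if you insist that this still counts as complete HSR, there is nothing I can do. In that case the official G80 documentation and the Nvidia GPU Programming Guide were written for nothing.
USSE2's TBDR performance is already double that of USSE (16z:8z).
Even within MBX, Sega's Aurora (a 2008 product) has dedicated optimization for transparent/incomplete triangles.
Back in the PowerVR2 generation, the Dreamcast also did alpha test with HW front, and was twice as fast as the PC version at the same clock.
You can't rule out SGX543MP4+ having hardware-accelerated alpha test. Even if it doesn't, it still has 64z, which is eight times the Galaxy S.
How big is the GPU performance gap between the 200MHz Galaxy S (SGX540) and the 240MHz Tegra2? Even if you are not an Nvidia fan, you can refer to the promotional PDF Nvidia released on January 26 this year, which says 110~150%; in practice it is about 110~125%.
Nvidia then claims that Tegra2's GPU performance is that of a low-end G80 (Tegra1 being a low-end GeForce 6).
If you want to bash, please bash NV as well. After all, the same-clock performance of SGX543MP4+ is more than eight times that of this "low-end G80".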
For sprites with transparent areas, create polygons that are optimal for the visible area and exclude fragments that are completely transparent. If an application was to render a simple triangular shaped tree texture on a quad polygon, there would be large, empty areas that would need to be blended. A better approach in this situation would be to use a triangle that tightly fits the shape of the texture. By doing so, most of the empty area that would have to be blended when using a quad to render the tree sprite can be removed, which means there are fewer fragments to blend. Geometry used to tightly fit sprites in a given application should be kept as simple as possible while eliminating as many unwanted fragments as possible. The right balance between geometric complexity and the empty space removed by using more complex geometry is very application and platform specific. A tool such as the one described here: http://www.humus.name/index.php?page=Cool&ID=8 can be used to generate the geometry required.

For further optimisation, when rendering sprites with partially transparent areas, break each object down into an area that can be rendered as an opaque sprite and a second area of partial transparency that can be blended. By taking this approach, the number of fragments that need to be blended for each sprite can be significantly reduced, which allows the HSR process to provide a "super" fill rate. In order to maintain sprite ordering, use of the depth buffer will be required - each sprite will need a unique offset to avoid artefacts. Generating the areas for this technique can be done with a similar tool to that mentioned above, but this time looking for opaque pixels instead of completely transparent ones. As stated previously, the opaque objects should be drawn first followed by the blended objects, as this will allow the blended objects to gain the most benefit possible from the hardware's HSR process.
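The ordering advice in the quoted passage (opaque first, blended afterwards, each sprite with its own depth offset) can be sketched roughly as below. This is only an illustration, not code from the PowerVR documentation: the `Sprite` type, the `layer` convention, `build_draw_list` and the `depth_step` value are all hypothetical names and assumptions, and it does not cover splitting one sprite into an opaque core plus a blended fringe.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    layer: int    # assumption: a larger layer means closer to the viewer
    opaque: bool  # True when blending, alpha test and discard are all off


def build_draw_list(sprites, depth_step=1.0 / 1024):
    """Return (name, depth, blended) tuples in submission order.

    Assumes a LESS depth test (smaller depth = nearer) and that
    len(sprites) * depth_step stays below 1.0.
    """
    # Sort far to near and give every sprite its own depth value, so the
    # intended ordering survives once the draws are regrouped by type.
    far_to_near = sorted(sprites, key=lambda s: s.layer)
    with_depth = [(s, 1.0 - (i + 1) * depth_step)
                  for i, s in enumerate(far_to_near)]

    opaque = [(s, d) for s, d in with_depth if s.opaque]
    blended = [(s, d) for s, d in with_depth if not s.opaque]

    # Opaque sprites go first (near to far here; the depth buffer resolves
    # them in any order). Blended sprites follow, far to near, so that
    # blending composites correctly.
    draw = [(s.name, d, False) for s, d in reversed(opaque)]
    draw += [(s.name, d, True) for s, d in blended]
    return draw


if __name__ == "__main__":
    scene = [Sprite("background", 0, True), Sprite("tree", 1, False),
             Sprite("message_window", 2, True)]
    for entry in build_draw_list(scene):
        print(entry)
```

The point of the unique depth per sprite is that a blended sprite sitting behind an opaque one is still rejected by the depth test even though it is submitted later.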
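Originally posted by JimmyC on 2011-2-4 16:39:
early-z exists since gf3, like mentioned before. it is disabled if you
-enable alpha test
-use kill/clip in pixelshader
-change compare func
in order to get speed again on G70, you need to work around your alpha-testing. this is critical, otherwise you pretty much run without optimization and then you're easily 10 to 30 times slower.

Go and look up the documentation of any Dreamcast emulator yourself. The PowerVR2 used in the DC has separate commands such as ZWrite and Alpha ZWrite. The latter can raise fps several times over. This hardware-accelerated command exists only in the DC version of PowerVR2; the Neon250 graphics card does not have it. The MBX used in Sega arcade boards has this command too, but the one used in the iPhone 2G/3G does not. That proves Imgtec had a solution early on but did not adopt it everywhere.
Isn't it too early to bash this point before the SGX543MP4+ specification is even known?
The material at PowerVR Insider has nothing on SGX543, let alone SGX543MP4+, and nothing on console chips either. The most recent item is the SGX540 development recommendations published in 2007.
Compared with USSE, USSE2 doubles per-pipeline shader/TBDR/hidden surface removal performance (8z>16z, 1D>2D, Vec2>Vec4) and supports more hardware acceleration at the same time.
It is quite something that you can use 2005 USSE material to bash the 2009 USSE2 without blushing.
Who is going off topic?
RSX: a cut-down G70 (7800) (8:24:24:8)
The 240MHz Tegra2, clocked 20% higher than SGX543MP4+ and 10~25% faster: a low-end G80; the lowest-end G80 is the 8300GS (8:8:4)
You don't dare bash the former point, but the moment someone says that SGX543MP4+, with more than eight times the same-clock performance of Tegra2, comes close to the 8600GT (32:16:8)/RSX, you start bashing.
The funny part is that the SGX543MP4+ clock is not even known yet. The 2011Q1 OMAP4440 (45nm) already runs at 380MHz, and you are still bashing with 200MHz figures.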
Originally posted by JimmyC on 2011-2-4 20:21:
G70 and earlier can only do coarse level Z and Stencil culling.
G80 and later can do fine-grained Z and Stencil culling.
Coarse-grained Z: Coarse Z, Hierarchical Z, Hi-Z, or ZCULL
Fine-grained Z: Fine Z, Early Z, Early Z Checking, Early Z Out
Fine, so this is not a cut-down. Fine-grained Z and Stencil culling is superfluous, and "skip the shading of occluded pixels" is actually a useless junk feature. G70 without it is already complete HSR. G70 without it is the true HSR, and G80 with it is the fake HSR. Have I got that right?
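The 1/7 and 1/10~1/30 figures are what other people got when actually programming with HSR on G70. Nvidia naturally won't say outright how much slower it gets, but a quick search turns up plenty of discussion on this.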
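When I post discussion links I get bashed because they are just search results, unofficial and therefore not authoritative. But I can't write code myself, so why don't you write some yourself and see? Also, MBX is a product from five years ago. Weren't you the one using the 2005 USSE to bash the 2009 USSE2?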
Originally posted by JimmyC on 2011-2-4 23:25: First look at how Coarse-grained Z and Fine-grained Z are classified. Coarse-grained Z: Coarse Z, Hierarchical Z, Hi-Z, or ZCULL. Fine-grained Z: Fine Z, Early Z, Early Z Checking, Early Z Out. So does G70 actually have Fine- ...
Early-Z Optimization

Early-z optimization (sometimes called “z-cull”) improves performance by avoiding the rendering of occluded surfaces. If the occluded surfaces have expensive shaders applied to them, z-cull can save a large amount of computation time. To take advantage of z-cull, follow these guidelines:
- Don’t create triangles with holes in them (that is, avoid alpha test or texkill)
- Don’t modify depth (that is, allow the GPU to use the interpolated depth value)
Violating these rules can invalidate the data the GPU uses for early optimization, and can disable z-cull until the depth buffer is cleared again.
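As a rough illustration of the two guidelines quoted above, here is a toy model of when z-cull stays usable across a frame. It is a sketch under stated assumptions, not a description of any real driver or GPU: `DrawState`, `ZCullTracker` and their fields are hypothetical names, and the rule "a violating draw disables z-cull until the next depth clear" is taken at face value from the quoted text, which itself only says this "can" happen.

```python
from dataclasses import dataclass


@dataclass
class DrawState:
    alpha_test: bool = False           # punches holes in triangles
    shader_discards: bool = False      # texkill / kill / clip / discard
    shader_writes_depth: bool = False  # replaces the interpolated depth


def zcull_hazards(state):
    """List which of the quoted guidelines a draw call would violate."""
    hazards = []
    if state.alpha_test:
        hazards.append("alpha test")
    if state.shader_discards:
        hazards.append("texkill/discard")
    if state.shader_writes_depth:
        hazards.append("depth modified in shader")
    return hazards


class ZCullTracker:
    """Toy model: one violating draw turns z-cull off until the next clear."""

    def __init__(self):
        self.enabled = True

    def clear_depth(self):
        # Clearing the depth buffer makes the z-cull data valid again.
        self.enabled = True

    def draw(self, state):
        """Return True if this draw could still benefit from z-cull."""
        if zcull_hazards(state):
            self.enabled = False
        return self.enabled


if __name__ == "__main__":
    frame = ZCullTracker()
    print(frame.draw(DrawState()))                 # plain opaque draw
    print(frame.draw(DrawState(alpha_test=True)))  # violates the first rule
    print(frame.draw(DrawState()))                 # later draws are affected too
```

In this model the cost of one alpha-tested draw is paid by every draw submitted after it in the same frame, which is why the guideline reads as an ordering problem as much as a shader problem.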
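Originally posted by JimmyC on 2011-2-5 00:15: By your standard, even Tegra1 now supports true HSR, non-cut-down HSR... (Tegra supports early-z rejection) Sigh... if that's how it is, I have nothing more to say...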
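Originally posted by JimmyC on 2011-2-5 00:55: I just found SCEE's official developer documentation PDF, 2009 edition. Under the right conditions, following every step and violating none of the recommendations, RSX's Early Z-cull can save a whole 10% of GPU! Haha, fine, I concede. RSX's HSR is "true" HSR, "non-cut-down" HSR, even though its efficiency is only half that of G8X and one sixth that of TBDR (calculated at x2.5).
RSX: 2 z/stencil. SGX543MP4+: 64 z/stencil. The actual HSR efficiency of the two differs by a factor of 32. Even if RSX's HSR can only save 10%, RSX's HSR is "true" HSR, "non-cut-down" HSR, and that's that.
Come to think of it, who was it that started the "What's so great about PowerVR having TBDR? RSX has HSR too" topic? Now we have the answer, heh.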
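I don't know RacingPHT well; you can try asking him.
I see you are thoroughly skeptical that CLX2 hardware-accelerates alpha test while doing TBDR.
Actually, there is an Imgtec employee who is a regular on the beyond3d forums. He is the one who said CLX2 has hardware-accelerated alpha test and twice the same-clock performance of Neon250. You can ask him how exactly that was done twelve years ago (although if you download any DC emulator you can already see the zwrite/alpha test zwrite options).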
Originally posted by JimmyC on 2011-2-5 15:08: On the 10%, there is nothing more. That page of the original has just these six lines. You don't have to believe it, heh.
Many games are fragment shader bound
•Rendering Z only ‘primes’ the RSX™ Z-cull unit
–Very fast, 16 pixels/clock rather than 8
–Render entire scene,
–Or ‘large’ meshes only
–Easily save 10% GPU
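To make the "rendering Z only primes the Z-cull unit" idea from the quoted slide concrete, here is a small toy model that counts how many fragments would reach the fragment shader with and without a depth-only pre-pass. It is only a sketch: `shaded_fragments` and its input format are hypothetical, rejection is modeled per pixel rather than per tile, and it makes no attempt to reproduce the 10% figure or the 16 pixels/clock rate from the slide.

```python
def shaded_fragments(draws, z_prepass):
    """Count fragment shader invocations for a list of draws.

    draws: list of {pixel: depth} dicts in submission order,
           where a smaller depth is nearer to the viewer.
    """
    depth_buffer = {}

    if z_prepass:
        # Pass 1: depth only, no fragment shading at all.
        for draw in draws:
            for pixel, z in draw.items():
                if z < depth_buffer.get(pixel, float("inf")):
                    depth_buffer[pixel] = z
        # Pass 2: shade only fragments that match the primed depth.
        return sum(1 for draw in draws
                   for pixel, z in draw.items()
                   if z <= depth_buffer[pixel])

    # No pre-pass: every fragment that passes the depth test at the
    # moment it is submitted gets shaded, even if it is covered later.
    count = 0
    for draw in draws:
        for pixel, z in draw.items():
            if z < depth_buffer.get(pixel, float("inf")):
                depth_buffer[pixel] = z
                count += 1
    return count


if __name__ == "__main__":
    # Three full-screen layers over four pixels, submitted back to front,
    # which is the worst case for a renderer without a pre-pass.
    layers = [{p: z for p in range(4)} for z in (0.9, 0.5, 0.1)]
    print(shaded_fragments(layers, z_prepass=False))
    print(shaded_fragments(layers, z_prepass=True))
```

The pre-pass costs an extra geometry pass, so whether it pays off depends on how expensive the fragment shaders are and how much overdraw the scene has, which matches the slide's "entire scene, or 'large' meshes only" wording.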
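Why not directly calculate how much GPU SGX and RSX each save thanks to TBDR/z-cull? For RSX, SCEE has already given the answer outright: 10% of GPU. SGX treats 400MP/s as 1000MP/s, right? How much does that save? How do you calculate it? I don't know, heh.
So citing the twelve-year-old CLX2 or the six-year-old MBX in USSE2's favor is not allowed, but using the six-year-old USSE to bash USSE2 is fine, heh heh.
Can't you read it yourself?
Then post it.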
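Originally posted by JimmyC on 2011-2-5 17:40: Question 1. Do you trust SimonF?
Question 2. Does CLX2/MBX actually have hardware-accelerated alpha test or not?
Question 3. Can alpha test be hardware-accelerated under HSR rendering?
Question 4. Did Imgtec ever have a design for hardware-accelerating alpha test under HSR rendering?
Question 5. Why do you want to use that software workaround passage from PowerVR Insider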