
Finally figured out why Vega had to have 64 CUs

posted by wap, platform: Chrome
At a high level, Vega 10’s compute core is configured almost exactly like Fiji. This means we’re looking at 64 CUs spread out over 4 shader engines. Or as AMD is now calling them, compute engines. Each compute engine in turn is further allocated a portion of Vega 10’s graphics resources, amounting to one geometry engine and rasterizer bundle at the front end, and 16 ROPs (or rather 4 actual ROP units with a 4 pix/clock throughput rate) at the back end. Not assigned to any compute engine, but closely aligned with the compute engines is the command processor frontend, which like Fiji before it, is a single command processor paired with 4 ACEs and another 2 Hardware Schedulers.

On a brief aside, the number of compute engines has been an unexpectedly interesting point of discussion over the years. Back in 2013 we learned that the then-current iteration of GCN had a maximum compute engine count of 4, which AMD has stuck to ever since, including the new Vega 10.  Which in turn has fostered discussions about scalability in AMD’s designs, and compute/texture-to-ROP ratios.

Talking to AMD’s engineers about the matter, they haven’t taken any steps with Vega to change this. They have made it clear that 4 compute engines is not a fundamental limitation – they know how to build a design with more engines – however to do so would require additional work. In other words, the usual engineering trade-offs apply, with AMD’s engineers focusing on addressing things like HBCC and rasterization as opposed to doing the replumbing necessary for additional compute engines in Vega 10.


In short, 4 compute engines / 64 CUs / 64 ROPs is the best rasterization-to-compute balance AMD's engineers could find; there are designs that would allow more CUs, but they would cost extra transistors and power.

So the GCN architecture is effectively maxed out at 64 CUs: adding CUs past that point costs more than the performance it buys back. Unless a new graphics architecture comes along to supersede GCN, AMD can only keep working on HBCC and general-purpose compute.

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2
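As a back-of-the-envelope illustration of that balance, here is a minimal Python sketch that derives Vega 64's theoretical FP32 throughput and pixel fill rate from the 64 CU / 64 ROP layout quoted above. The ~1.5 GHz boost clock is my own assumption, not a figure from the article.

```python
# Rough check of the 64 CU / 64 ROP balance quoted above.
# The boost clock is an assumed round number (~1.5 GHz for Vega 64);
# the unit counts follow the GCN layout described in the article.

BOOST_CLOCK_GHZ = 1.5      # assumed boost clock
CUS = 64                   # compute units
SHADERS_PER_CU = 64        # GCN stream processors per CU
ROPS = 64                  # 4 compute engines x 16 ROPs each

# FP32 throughput: shaders * 2 ops per clock (FMA) * clock
fp32_tflops = CUS * SHADERS_PER_CU * 2 * BOOST_CLOCK_GHZ / 1000
# Pixel fill rate: one pixel per ROP per clock
fill_gpix = ROPS * BOOST_CLOCK_GHZ

print(f"theoretical FP32: {fp32_tflops:.1f} TFLOPS")
print(f"pixel fill rate:  {fill_gpix:.0f} Gpix/s")
```

Piling on more CUs would only raise the first number; with the shader-engine count fixed at 4, geometry throughput and fill rate stay put, which is roughly the imbalance the post is getting at.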



posted by wap, platform: Chrome
Quote:
Originally posted by @yfl2 on 2017-8-17 12:45
Not a single sentence in the passage you quoted says this is a limitation...

You can see the limitation in the power draw and transistor budget: right now it takes 300W+ and a ~500mm² die just to reach GTX 1080 level.

To reach Titan Xp level it would take at least 650mm²+ and a 4096-bit HBM2 interface; a product like that would cost well over $1000 and would make no commercial sense to build.
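To put that power/area argument into rough numbers, here is a small Python sketch using approximate launch specs; the TDP, die size, and boost FP32 ratings are ballpark public figures, not measurements.

```python
# Rough perf-per-watt and perf-per-mm^2 comparison from approximate launch specs.
# All three figures per card are ballpark public numbers, not measurements.
cards = {
    # name: (FP32 TFLOPS, board power W, die size mm^2)
    "RX Vega 64": (12.7, 295, 486),
    "GTX 1080":   (8.9,  180, 314),
}

for name, (tflops, watts, mm2) in cards.items():
    print(f"{name:>10}: {tflops / watts * 1000:4.0f} GFLOPS/W, "
          f"{tflops / mm2 * 1000:4.0f} GFLOPS/mm^2")
```

On paper the gap in FLOPS per watt and per mm² looks small; the thread's complaint is about delivered gaming performance, where the GTX 1080 reaches similar frame rates from far fewer theoretical FLOPS, so the effective perf/W and perf/area gap is much wider.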




posted by wap, platform: Chrome
Quote:
Originally posted by @Nemo_theCaptain on 2017-8-17 17:36
AMD's flagship efficiency was never great in the GCN era.
Scaling the design up won't improve it, but scaling it down might.
Back then Tahiti's perf-per-watt was crap next to Pitcairn.

That's because rasterized rendering doesn't run on shader math alone; ROPs, TMUs, and even vertex throughput all have a big impact on frame rate.

My view hasn't changed: AMD isn't bad at raw floating-point compute, but at making the whole rasterization pipeline faster and more power-efficient it is miles behind NVIDIA.

Most of the world's top rasterization experts are at NVIDIA these days, and that isn't something money alone can buy.
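A toy way to see the point about ROPs and the front end: a minimal Python sketch in which the frame rate is capped by the slowest of three stages. The hardware rates match the Vega-64-like numbers from the earlier sketch; the per-frame workload costs are invented purely for illustration.

```python
# Toy bottleneck model: frame rate is capped by the slowest stage.
# Hardware rates are Vega-64-like round numbers; per-frame costs are invented.

def fps_ceiling(gflops_per_frame, gpix_per_frame, gtris_per_frame,
                fp32_gflops=12288, fill_gpix=96, tri_grate=6):
    """Return the limiting stage and the frame rate it allows."""
    ceilings = {
        "shader":   fp32_gflops / gflops_per_frame,
        "ROP fill": fill_gpix / gpix_per_frame,
        "geometry": tri_grate / gtris_per_frame,
    }
    stage = min(ceilings, key=ceilings.get)
    return stage, ceilings[stage]

# Invented per-frame costs: shader work, shaded pixels (with overdraw), triangles.
stage, fps = fps_ceiling(gflops_per_frame=150.0,
                         gpix_per_frame=2.0,
                         gtris_per_frame=0.05)
print(f"bottleneck: {stage}, ~{fps:.0f} fps")
# Doubling the CU count only raises the shader ceiling; with these invented
# costs the ROP stage still caps the frame rate.
```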



posted by wap, platform: Chrome
Honestly, GPU R&D is much harder than CPU R&D. CPU development is already very mature; even Apple can ship a CPU core with quite good single-threaded performance.

GPUs are a completely different story. Without a team that really understands rasterization, never mind performance, you can't even get frames to render correctly.

Intel's GPUs are exactly like that: impressive floating-point numbers on paper, yet a mess in actual games, with rendering bugs everywhere.

