[Topic Discussion] Emulator Hardcore Research Series 4: Nightmare on Floating-Point Street

SSforME

#1 Posted on 2021-7-13 13:46

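Note: this article was originally published on July 25, 2006.
Nightmare on Floating-Point Street

Original: http://www.pcsx2.net/blog.php?p=3
It is very hard to emulate the R5900 floating-point unit (FPU) and the vector units (VUs) on an x86 CPU, because the PlayStation 2 does not follow the IEEE standard. Multiplying two numbers on the FPU, on a VU, and on an x86 processor can give three different results that differ by a few bits. Square root and division are even less accurate.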

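At first we thought a difference of a few bits would not matter, since game developers surely would not rely on such precise calculations. Floating-point numbers are mostly used for world-coordinate transformations or interpolation, so nobody should care if the Holy Sword of Armageddon sits 0.00001 meters off from the hero's hand. In short, we guessed wrong: game developers are crazier than we imagined, and merely changing the floating-point rounding mode can stop a game from running.
Rounding mode is a problem, but floating-point infinities are the real nightmare. The IEEE standard says that when a result overflows (that is, its magnitude exceeds 3.4028234663852886E+38), it becomes infinity. Any nonzero number multiplied by infinity is still infinity, and 0 x infinity gives NaN rather than 0. That rule looks fine until you discover that the VUs do not support infinity at all; instead, they clamp every large number to the largest representable floating-point value. This difference makes many games misbehave.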
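The difference between overflowing to infinity and clamping can be sketched in scalar C. This is only an illustration under stated assumptions, not Pcsx2 code: the helper names `clamp_to_max` and `mul_clamped` are made up here, and the real emulator works on SSE registers rather than on single floats.

```c
#include <float.h>

/* Hypothetical helper: map IEEE infinities onto the largest finite float,
   mimicking the clamping described above. NaN fails both comparisons and
   is passed through unchanged. */
float clamp_to_max(float x)
{
    if (x > FLT_MAX)
        return FLT_MAX;    /* +infinity */
    if (x < -FLT_MAX)
        return -FLT_MAX;   /* -infinity */
    return x;
}

/* Multiply the IEEE way, then clamp the written result. */
float mul_clamped(float a, float b)
{
    return clamp_to_max(a * b);
}
```

With plain IEEE arithmetic, `1e30f * 1e30f` overflows to infinity, while `mul_clamped(1e30f, 1e30f)` returns `FLT_MAX`.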
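For example, suppose a game developer normalizes a zero vector (whose length is 0) by dividing it by its length. On the VU the result is (0,0,0), whereas on x86/IEEE the result is (NaN, NaN, NaN), because 0/0 is NaN. Now if the developer uses this vector to perturb faces for an artificial hair effect, or in some kind of animation, the final positions stay exactly where they were on the PS2, but on x86 they all become NaN... which accounts for the glitches seen in the game's graphics.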
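The simplest solution is to clamp the vector written by the current instruction. This takes two SSE operations, is slow, and sometimes does not even solve the problem. Worst of all, you can never rule out that a game loads overflowed floating-point values into the VUs in the first place, and some games clear a vector by multiplying it by zero. In that case the VU does not care what overflowed value the vector held, but x86 does.
These two problems make it hard for floating-point emulation to be both fast and accurate. The resulting bugs are varied, from screen flicker when a character fades out to the common spiky polygon syndrome (widely known as SPS).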

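In the end, Pcsx2 handles all floating-point operations with SSE, because that makes it easier to cache registers. The FPU and the VUs use two different rounding modes. Overflow is checked whenever the FPU performs a division or a reciprocal square root (rsqrt), and it is checked far more often on the VUs. Because the VUs keep both integer and floating-point data in the same SSE register, the checks take a little longer. In the future, Pcsx2 will read rounding-mode and overflow settings from patch files, so that every game can use the best or the fastest settings.
Moral of the blog: when comparing two floating-point numbers a and b, never use a == b; use something like the following instead:
fabs(a-b) < epsilon
where epsilon is a very small number.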
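The comparison above can be wrapped in a small function. This is a minimal sketch; the name `nearly_equal` and the choice to pass `epsilon` as a parameter are assumptions made for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Absolute-tolerance comparison: true when a and b differ by less
   than epsilon. The caller picks an epsilon suited to the magnitude
   of the values being compared. */
bool nearly_equal(double a, double b, double epsilon)
{
    return fabs(a - b) < epsilon;
}
```

In double precision `0.1 + 0.2 == 0.3` is false, while `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true. A fixed absolute epsilon does not scale with the size of the operands, so very large or very small values may need a different tolerance.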

SSforME

#2 Posted on 2021-7-13 13:48

Nightmare on Floating-Point Street
It is very hard to emulate the floating-point calculations of the R5900 FPU and the Vector Units on an x86 CPU because the PlayStation 2 does not follow the IEEE standard. Multiplying two numbers on the FPU, a VU, and an x86 processor can give you 3 different results, all differing by a couple of bits! Operations like square root and division are even more imprecise.

Originally, we thought that a couple of bits shouldn't matter, that game developers would be crazy to rely on such precise calculation. Floating-point numbers are mostly used for world transformations or interpolation calculations, so no one would care if their Holy Sword of Armageddon was 0.00001 meters off from the main player's hand. In short, we were wrong and game developers are crazier than we thought. Games started breaking just by changing the floating-point rounding mode!
While rounding mode is a problem, the bigger nightmare is the floating-point infinities. The IEEE standard states that when a number overflows (meaning that its magnitude is larger than 3.4028234663852886E+38), the result will be infinity. Any nonzero number multiplied by infinity is infinity, and 0 * infinity is NaN rather than 0. That sounds great until you figure out that the VUs don't support infinities. Instead they clamp all large numbers to the max floating-point value possible. This discrepancy breaks a lot of games!
For example, let's say a game developer tries to normalize a zero vector by dividing by its length, which is 0. On the VU, the end result will be (0,0,0). On x86/IEEE, the result will be (NaN, NaN, NaN), since 0/0 is NaN. Now if the game developer uses this vector to perturb some faces for artificial hair or some type of animation, all final positions on the PS2 will remain the same. All final positions on x86 will become NaN... and there goes the game's graphics; now figure out where the problem occurred.

The simplest solution is to clamp the written vector of the current instruction. This requires 2 SSE operations and is SLOW, and sometimes it doesn't even work. To top it off, you can never dismiss the fact that game developers can be loading bad floating-point data to the VUs to begin with! Some games zero out vectors by multiplying them by zero, so the VU doesn't care at all what kind of garbage the original vector holds, but x86 does.

These two problems make floating-point emulation very hard to do both fast and accurately. The bugs range from screen flickering when a fade occurs, to disappearing characters, to spiky polygon syndrome (the most common problem, widely known as SPS).
In the end Pcsx2 does all its floating-point operations with SSE since it is easier to cache the registers. Two different rounding modes are used for the FPU and VUs. Whenever a divide or rsqrt occurs on the FPU, overflow is checked. Overflow is checked much more frequently with the VUs. The fact that VUs handle both integer and floating-point data in the same SSE register makes the checking a little longer. In the future, Pcsx2 will read the rounding mode and overflow settings from the patch files, so that all games can be accommodated with the best/fastest settings.

Moral of the blog: when comparing two floating-point numbers a and b, never use a == b. Instead use something along the lines of

fabs(a-b) < epsilon

where epsilon is some very small number.

SSforME

#3 Posted on 2021-7-13 13:50

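This article is mainly about how the PS2 represents floating-point numbers differently from the PC.

The PC's floating-point representation is based on the IEEE 754 standard.

The PS2 differs slightly.

The main difference is what happens when the exponent exceeds 127: the PC reserves that encoding for infinity and NaN, whereas the PS2 simply treats an exponent of 128 as an ordinary value.

Also, the PC has a representation for infinity and the PS2 does not.

And when a value overflows, for example when dividing by 0, the PC yields infinity (consistent with the mathematical notion), but the PS2 yields 0.

The PS2 does not follow IEEE 754 because that standard was designed around mathematical rigor (after all, PCs are not only for playing games), and graphics processing clearly does not need that much precision.

For a game console the PS2 approach is more efficient, but for an emulator it means checking the data before and after every calculation, which adds a lot of overhead.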
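The point about the exponent can be made concrete by decoding one 32-bit pattern in two ways. This is a rough sketch, not a description of real hardware: `ps2_style_value` assumes that the top exponent encoding (unbiased 128) is treated as an ordinary exponent and that patterns with a zero exponent field read as zero. Both function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Value of a 32-bit pattern under IEEE 754 rules: reinterpret the bits. */
float ieee_value(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Value of the same pattern under the assumed PS2-style rules, returned
   as a double so that magnitudes above FLT_MAX remain representable. */
double ps2_style_value(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFFu;   /* biased exponent field */
    uint32_t frac = bits & 0x7FFFFFu;       /* 23 fraction bits */
    if (exp == 0)
        return 0.0;                         /* assumption: no denormals */
    double value = 1.0 + (double)frac / 8388608.0;   /* 8388608 = 2^23 */
    int e = (int)exp - 127;
    for (; e > 0; --e)
        value *= 2.0;
    for (; e < 0; ++e)
        value *= 0.5;
    return (bits >> 31) ? -value : value;
}
```

Under these assumptions the pattern 0x7F800000 is +infinity for `ieee_value` but a finite 2^128 for `ps2_style_value`, and 0x7FFFFFFF is a NaN on the PC side but the largest finite value on the PS2 side.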
