It’s the era of big data, and every day more and more businesses are trying to leverage their data to make informed decisions. Many businesses are turning to Python’s powerful data science ecosystem to analyze their data, as evidenced by Python’s rising popularity in the data science realm.
One thing every data science practitioner must keep in mind is how a dataset may be biased. Drawing conclusions from biased data can lead to costly mistakes.
There are many ways bias can creep into a dataset. If you’ve studied some statistics, you’re probably familiar with terms like reporting bias, selection bias and sampling bias. There is another type of bias that plays an important role when you are dealing with numeric data: rounding bias.
In this article, you will learn:
This article is not a treatise on numeric precision in computing, although we will touch briefly on the subject. Only a familiarity with the fundamentals of Python is necessary, and the math involved here should feel comfortable to anyone familiar with the equivalent of high school algebra.
Let’s start by looking at Python’s built-in rounding mechanism.
Python’s Built-in round() Function
Python has a built-in round() function that takes two numeric arguments, n and ndigits, and returns the number n rounded to ndigits decimal places. The ndigits argument is optional, and leaving it out results in a number rounded to the nearest integer. As you’ll see, round() may not work quite as you expect.
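As a quick illustration of the two arguments, the sketch below calls round() with and without ndigits. The sample values are arbitrary and were picked to avoid ties, so the results should be unsurprising.

```python
# Calling round() with and without the optional ndigits argument.
print(round(3.14159))        # 3, an int because ndigits was omitted
print(round(3.14159, 2))     # 3.14
print(round(1234.5678, -2))  # 1200.0, negative ndigits rounds left of the decimal point

# The return type depends on whether ndigits is supplied.
print(type(round(2.7)), type(round(2.7, 0)))
```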
The way most people are taught to round a number goes something like this:
1. Round the number n to p decimal places by first shifting the decimal point in n by p places by multiplying n by 10ᵖ (10 raised to the pth power) to get a new number m.
2. Then look at the digit d in the first decimal place of m. If d is less than 5, round m down to the nearest integer. Otherwise, round m up.
3. Finally, shift the decimal point back p places by dividing m by 10ᵖ.
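The three steps can be sketched directly in code. The function name schoolbook_round() is made up for this illustration, and the sketch assumes a non-negative input so that "round down" and "round up" are unambiguous.

```python
import math

def schoolbook_round(n, p=0):
    """Round a non-negative n to p decimal places the way it is taught by hand."""
    m = n * 10 ** p        # step 1: shift the decimal point p places
    d = int(m * 10) % 10   # step 2: digit in the first decimal place of m
    m = math.floor(m) if d < 5 else math.ceil(m)
    return m / 10 ** p     # step 3: shift the decimal point back

print(schoolbook_round(2.5))      # 3.0
print(schoolbook_round(1.64, 1))  # 1.6
```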
It’s a straightforward algorithm! For example, the number 2.5 rounded to the nearest whole number is 3. The number 1.64 rounded to one decimal place is 1.6.
Now open up an interpreter session and round 2.5 to the nearest whole number using Python’s built-in round() function:
>>> round(2.5)
2
Gasp!
How does round() handle the number 1.5?
>>> round(1.5)
2
So, round() rounds 1.5 up to 2, and 2.5 down to 2!
Before you go raising an issue on the Python bug tracker, let me assure you that round(2.5) is supposed to return 2. There is a good reason why round() behaves the way it does.
In this article, you’ll learn that there are more ways to round a number than you might expect, each with unique advantages and disadvantages. round() behaves according to a particular rounding strategy, which may or may not be the one you need for a given situation.
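Trying a few more ties suggests a pattern rather than a bug. The list of sample values is only an illustration; each one ends in exactly .5, so it is stored without any representation error.

```python
# Built-in round() applied to several values that end in exactly .5.
ties = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [round(x) for x in ties]
print(rounded)  # [0, 2, 2, 4, 4]
```

In every case the tie lands on an even integer.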
How Much Impact Can Rounding Have?
You might be wondering, “Can the way I round numbers really have that much of an impact?” Let’s take a look at just how extreme the effects of rounding can be.
Suppose you have an incredibly lucky day and find $100 on the ground. Rather than spending all your money at once, you decide to play it smart and invest your money by buying some shares of different stocks.
The value of a stock depends on supply and demand. The more people there are who want to buy a stock, the more value that stock has, and vice versa. In high volume stock markets, the value of a particular stock can fluctuate on a second-by-second basis.
Let’s run a little experiment. We’ll pretend the overall value of the stocks you purchased fluctuates by some small random number each second, say between $0.05 and -$0.05. This fluctuation may not necessarily be a nice value with only two decimal places. For example, the overall value may increase by $0.031286 one second and decrease the next second by $0.028476.
You don’t want to keep track of your value to the fifth or sixth decimal place, so you decide to chop everything off after the third decimal place. In rounding jargon, this is called truncating the number to the third decimal place. There’s some error to be expected here, but by keeping three decimal places, this error couldn’t be substantial. Right?
To run our experiment using Python, let’s start by writing a truncate() function that truncates a number to three decimal places:
>>> def truncate(n):
...     return int(n * 1000) / 1000
The truncate() function works by first shifting the decimal point in the number n three places to the right by multiplying n by 1000. The integer part of this new number is taken with int(). Finally, the decimal point is shifted three places back to the left by dividing the result by 1000.
Next, let’s define the initial parameters of the simulation. You’ll need two variables: one to keep track of the actual value of your stocks after the simulation is complete and one for the value of your stocks after you’ve been truncating to three decimal places at each step.
Start by initializing these variables to 100:
>>> actual_value, truncated_value = 100, 100
Now let’s run the simulation for 1,000,000 seconds (approximately 11.5 days). For each second, generate a random value between -0.05 and 0.05 with the uniform() function in the random module, and then update actual_value and truncated_value:
>>> import random
>>> random.seed(100)
>>> for _ in range(1000000):
...     randn = random.uniform(-0.05, 0.05)
...     actual_value = actual_value + randn
...     truncated_value = truncate(truncated_value + randn)
...
>>> actual_value
96.45273913513529
>>> truncated_value
0.239
The meat of the simulation takes place in the for loop, which loops over the range(1000000) of numbers between 0 and 999,999. The value taken from range() at each step is stored in the variable _, which we use here because we don’t actually need this value inside of the loop.
At each step of the loop, a new random number between -0.05 and 0.05 is generated using random.uniform() and assigned to the variable randn. The new value of your investment is calculated by adding randn to actual_value, and the truncated total is calculated by adding randn to truncated_value and then truncating this value with truncate().
As you can see by inspecting the actual_value variable after running the loop, you only lost about $3.55. However, if you’d been looking at truncated_value, you’d have thought that you’d lost almost all of your money!
Note: In the above example, the random.seed() function is used to seed the pseudo-random number generator so that you can reproduce the output shown here.
To learn more about randomness in Python, check out Real Python’s Generating Random Data in Python (Guide).
Ignoring for the moment that round() doesn’t behave quite as you expect, let’s try re-running the simulation. We’ll use round() this time to round to three decimal places at each step, and seed() the simulation again to get the same results as before:
>>> random.seed(100)
>>> actual_value, rounded_value = 100, 100
>>> for _ in range(1000000):
...     randn = random.uniform(-0.05, 0.05)
...     actual_value = actual_value + randn
...     rounded_value = round(rounded_value + randn, 3)
...
>>> actual_value
96.45273913513529
>>> rounded_value
96.258
What a difference!
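One way to see why the two runs diverge is to measure the average error each method introduces in a single step. The sketch below is a separate, simplified experiment, not part of the simulation above: the seed, the sample size, and the 0 to 100 range are arbitrary assumptions.

```python
import random

random.seed(0)  # arbitrary seed, only so the run is repeatable
samples = [random.uniform(0, 100) for _ in range(100_000)]

# Average signed error introduced when keeping three decimal places.
truncation_error = sum(int(x * 1000) / 1000 - x for x in samples) / len(samples)
rounding_error = sum(round(x, 3) - x for x in samples) / len(samples)

print(f"mean truncation error: {truncation_error:.6f}")
print(f"mean rounding error:   {rounding_error:.6f}")
```

For positive values, truncation loses roughly half a thousandth on average at every step, and over 1,000,000 steps that adds up to far more than the $100 you started with. The rounding errors point in both directions and largely cancel out.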
Shocking as it may seem, this exact error caused quite a stir in the early 1980s when the system designed for recording the value of the Vancouver Stock Exchange truncated the overall index value to three decimal places instead of rounding. Rounding errors have swayed elections and even resulted in the loss of life.
How you round numbers is important, and as a responsible developer and software designer, you need to know what the common issues are and how to deal with them. Let’s dive in and investigate what the different rounding methods are and how you can implement each one in pure Python.
There are a plethora of rounding strategies, each with advantages and disadvantages. In this section, you’ll learn about some of the most common techniques, and how they can influence your data.
The simplest, albeit crudest, method for rounding a number is to truncate the number to a given number of digits. When you truncate a number, you replace each digit after a given position with 0. Here are some examples:
Value | Truncated To | Result
---|---|---
12.345 | Tens place | 10
12.345 | Ones place | 12
12.345 | Tenths place | 12.3
12.345 | Hundredths place | 12.34
You’ve already seen one way to implement this in the truncate() function from the How Much Impact Can Rounding Have? section. In that function, the input number was truncated to three decimal places by:
- Multiplying the number by 1000 to shift the decimal point three places to the right
- Taking the integer part of that new number with int()
- Shifting the decimal point three places back to the left by dividing by 1000
You can generalize this process by replacing 1000 with the number 10ᵖ (10 raised to the pth power), where p is the number of decimal places to truncate to:
def truncate(n, decimals=0):
    multiplier = 10 ** decimals
    return int(n * multiplier) / multiplier
In this version of truncate(), the second argument defaults to 0 so that if no second argument is passed to the function, then truncate() returns the integer part of whatever number is passed to it.
The truncate() function works well for both positive and negative numbers:
>>> truncate(12.5)
12.0
>>> truncate(-5.963, 1)
-5.9
>>> truncate(1.625, 2)
1.62
You can even pass a negative number to decimals to truncate to digits to the left of the decimal point:
>>> truncate(125.6, -1)
120.0
>>> truncate(-1374.25, -3)
-1000.0
When you truncate a positive number, you are rounding it down. Likewise, truncating a negative number rounds that number up. In a sense, truncation is a combination of rounding methods depending on the sign of the number you are rounding.
Let’s take a look at each of these rounding methods individually, starting with rounding up.
The second rounding strategy we’ll look at is called “rounding up.” This strategy always rounds a number up to a specified number of digits. The following table summarizes this strategy:
Value | Round Up To | Result
---|---|---
12.345 | Tens place | 20
12.345 | Ones place | 13
12.345 | Tenths place | 12.4
12.345 | Hundredths place | 12.35
To implement the “rounding up” strategy in Python, we’ll use the ceil() function from the math module.
The ceil() function gets its name from the term “ceiling,” which is used in mathematics to describe the nearest integer that is greater than or equal to a given number.
Every number that is not an integer lies between two consecutive integers. For example, the number 1.2 lies in the interval between 1 and 2. The “ceiling” is the greater of the two endpoints of the interval. The lesser of the two endpoints is called the “floor.” Thus, the ceiling of 1.2 is 2, and the floor of 1.2 is 1.
In mathematics, a special function called the ceiling function maps every number to its ceiling. To allow the ceiling function to accept integers, the ceiling of an integer is defined to be the integer itself. So the ceiling of the number 2 is 2.
In Python, math.ceil() implements the ceiling function and always returns the nearest integer that is greater than or equal to its input:
>>> import math
>>> math.ceil(1.2)
2
>>> math.ceil(2)
2
>>> math.ceil(-0.5)
0
Notice that the ceiling of -0.5 is 0, not -1. This makes sense because 0 is the nearest integer to -0.5 that is greater than or equal to -0.5.
Let’s write a function called round_up() that implements the “rounding up” strategy:
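A minimal sketch of round_up(), reconstructed from the description that follows and mirroring the truncate() function above. Treat it as an assumed definition rather than the canonical listing:

```python
import math

def round_up(n, decimals=0):
    # Shift the decimal point, round up to an integer, then shift back.
    multiplier = 10 ** decimals
    return math.ceil(n * multiplier) / multiplier
```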
You may notice that round_up() looks a lot like truncate(). First, the decimal point in n is shifted the correct number of places to the right by multiplying n by 10 ** decimals. This new value is rounded up to the nearest integer using math.ceil(), and then the decimal point is shifted back to the left by dividing by 10 ** decimals.
This pattern of shifting the decimal point, applying some rounding method to round to an integer, and then shifting the decimal point back will come up over and over again as we investigate more rounding methods. This is, after all, the mental algorithm we humans use to round numbers by hand.
Let’s look at how well round_up() works for different inputs:
>>> round_up(1.1)
2.0
>>> round_up(1.23, 1)
1.3
>>> round_up(1.543, 2)
1.55
Just like truncate(), you can pass a negative value to decimals:
>>> round_up(22.45, -1)
30.0
>>> round_up(1352, -2)
1400.0
When you pass a negative number to decimals, the number in the first argument of round_up() is rounded to the correct number of digits to the left of the decimal point.
Take a guess at what round_up(-1.5) returns:
>>> round_up(-1.5)
-1.0
Is -1.0 what you expected?
If you examine the logic used in defining round_up(), and in particular the way the math.ceil() function works, then it makes sense that round_up(-1.5) returns -1.0. However, some people naturally expect symmetry around zero when rounding numbers, so that if 1.5 gets rounded up to 2, then -1.5 should get rounded up to -2.
Let’s establish some terminology. For our purposes, we’ll use the terms “round up” and “round down” according to the following diagram:
Figure: Round up to the right and down to the left. (Image: David Amos)
Rounding up always rounds a number to the right on the number line, and rounding down always rounds a number to the left on the number line.
The counterpart to “rounding up” is the “rounding down” strategy, which always rounds a number down to a specified number of digits. Here are some examples illustrating this strategy:
Value | Rounded Down To | Result
---|---|---
12.345 | Tens place | 10
12.345 | Ones place | 12
12.345 | Tenths place | 12.3
12.345 | Hundredths place | 12.34
To implement the “rounding down” strategy in Python, we can follow the same algorithm we used for both truncate() and round_up(). First shift the decimal point, then round to an integer, and finally shift the decimal point back.
In round_up(), we used math.ceil() to round up to the ceiling of the number after shifting the decimal point. For the “rounding down” strategy, though, we need to round to the floor of the number after shifting the decimal point.
Lucky for us, the math module has a floor() function that returns the floor of its input:
>>> math.floor(1.2)
1
>>> math.floor(-0.5)
-1
Here’s the definition of round_down():
def round_down(n, decimals=0):
    multiplier = 10 ** decimals
    return math.floor(n * multiplier) / multiplier
That looks just like round_up(), except math.ceil() has been replaced with math.floor().
You can test round_down() on a few different values:
>>> round_down(1.5)
1.0
>>> round_down(1.37, 1)
1.3
>>> round_down(-0.5)
-1.0
The effects of round_up() and round_down() can be pretty extreme. By rounding the numbers in a large dataset up or down, you could potentially remove a ton of precision and drastically alter computations made from the data.
Before we discuss any more rounding strategies, let’s stop and take a moment to talk about how rounding can make your data biased.
You’ve now seen three rounding methods: truncate(), round_up(), and round_down(). All three of these techniques are rather crude when it comes to preserving a reasonable amount of precision for a given number.
There is one important difference between truncate() on the one hand and round_up() and round_down() on the other, and it highlights an important aspect of rounding: symmetry around zero.
Recall that round_up() isn’t symmetric around zero. In mathematical terms, a function f(x) is symmetric around zero if, for any value of x, f(x) + f(-x) = 0. For example, round_up(1.5) returns 2, but round_up(-1.5) returns -1. The round_down() function isn’t symmetric around 0, either.
On the other hand, the truncate() function is symmetric around zero. This is because, after shifting the decimal point to the right, truncate() chops off the remaining digits. When the initial value is positive, this amounts to rounding the number down. Negative numbers are rounded up. So, truncate(1.5) returns 1, and truncate(-1.5) returns -1.
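The symmetry condition f(x) + f(-x) = 0 is easy to spot-check in code. In the sketch below, is_symmetric() is a hypothetical helper and the sample points are arbitrary; the three rounding functions are repeated (with round_up() in the form described earlier) so the block runs on its own. Passing the check for a handful of samples is evidence, not a proof.

```python
import math

def truncate(n, decimals=0):
    multiplier = 10 ** decimals
    return int(n * multiplier) / multiplier

def round_up(n, decimals=0):
    multiplier = 10 ** decimals
    return math.ceil(n * multiplier) / multiplier

def round_down(n, decimals=0):
    multiplier = 10 ** decimals
    return math.floor(n * multiplier) / multiplier

def is_symmetric(func, samples=(0.5, 1.5, 2.25, 7.9)):
    # Check f(x) + f(-x) == 0 at each sample point.
    return all(func(x) + func(-x) == 0 for x in samples)

for func in (truncate, round_up, round_down):
    print(func.__name__, is_symmetric(func))
```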
The concept of symmetry introduces the notion of rounding bias, which describes how rounding affects numeric data in a dataset.
The “rounding up” strategy has a round towards positive infinity bias, because the value is always rounded up in the direction of positive infinity. Likewise, the “rounding down” strategy has a round towards negative infinity bias.
The “truncation” strategy exhibits a round towards negative infinity bias on positive values and a round towards positive infinity for negative values. Rounding functions with this behavior are said to have a round towards zero bias, in general.
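The standard library makes the round towards zero behavior easy to observe. This sketch uses math.trunc(), which is not the truncate() function defined in this article but drops the fractional part in the same way; the value 2.7 is an arbitrary choice.

```python
import math

for x in (2.7, -2.7):
    # trunc() agrees with floor() for positive input
    # and with ceil() for negative input.
    print(x, math.trunc(x), math.floor(x), math.ceil(x))
```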
Let’s see how this works in practice. Consider the following list of floats:
>>> data = [1.25, -2.67, 0.43, -1.79, 4.32, -8.19]
Let’s compute the mean value of the values in data using the statistics.mean() function:
>>> import statistics
>>> statistics.mean(data)
-1.1083333333333332
Now apply each of round_up(), round_down(), and truncate() in a list comprehension to round each number in data to one decimal place and calculate the new mean:
>>> ru_data = [round_up(n, 1) for n in data]
>>> ru_data
[1.3, -2.6, 0.5, -1.7, 4.4, -8.1]
>>> statistics.mean(ru_data)
-1.0333333333333332
>>> rd_data = [round_down(n, 1) for n in data]
>>> statistics.mean(rd_data)
-1.1333333333333333
>>> tr_data = [truncate(n, 1) for n in data]
>>> statistics.mean(tr_data)
-1.0833333333333333
After every number in data is rounded up, the new mean is about -1.033, which is greater than the actual mean of about -1.108. Rounding down shifts the mean downwards to about -1.133. The mean of the truncated values is about -1.083 and is the closest to the actual mean.
This example does not imply that you should always truncate when you need to round individual values while preserving a mean value as closely as possible. The data list contains an equal number of positive and negative values. The truncate() function would behave just like round_down() on a list of all positive values, and just like round_up() on a list of all negative values.
What this example does illustrate is the effect rounding bias has on values computed from data that has been rounded. You will need to keep these effects in mind when drawing conclusions from data that has been rounded.
Typically, when rounding, you are interested in rounding to the nearest number with some specified precision, instead of just rounding everything up or down.
For example, if someone asks you to round the numbers 1.23 and 1.28 to one decimal place, you would probably respond quickly with 1.2 and 1.3. The truncate(), round_up(), and round_down() functions don’t do anything like this.
What about the number 1.25? You probably immediately think to round this to 1.3, but in reality, 1.25 is equidistant from 1.2 and 1.3. In a sense, 1.2 and 1.3 are both the nearest numbers to 1.25 with single decimal place precision. The number 1.25 is called a tie with respect to 1.2 and 1.3. In cases like this, you must assign a tiebreaker.
The way that most people are taught to break ties is by rounding to the greater of the two possible numbers.
The “rounding half up” strategy rounds every number to the nearest number with the specified precision, and breaks ties by rounding up. Here are some examples:
Value | Round Half Up To | Result
---|---|---
13.825 | Tens place | 10
13.825 | Ones place | 14
13.825 | Tenths place | 13.8
13.825 | Hundredths place | 13.83
To implement the “rounding half up” strategy in Python, you start as usual by shifting the decimal point to the right by the desired number of places. At this point, though, you need a way to determine whether the digit just after the shifted decimal point is less than 5, or greater than or equal to 5.
One way to do this is to add 0.5 to the shifted value and then round down with math.floor(). This works because:
- If the digit in the first decimal place of the shifted value is less than five, then adding 0.5 won’t change the integer part of the shifted value, so the floor is equal to the integer part.
- If the first digit after the decimal place is greater than or equal to 5, then adding 0.5 will increase the integer part of the shifted value by 1, so the floor is equal to this larger integer.
Here’s what this looks like in Python:
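A minimal sketch of round_half_up(), reconstructed from the add 0.5 and floor description above and patterned on round_down(). Treat it as an assumed definition rather than the canonical listing:

```python
import math

def round_half_up(n, decimals=0):
    # Shift the decimal point, add 0.5, round down, then shift back.
    multiplier = 10 ** decimals
    return math.floor(n * multiplier + 0.5) / multiplier
```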
Notice that round_half_up() looks a lot like round_down(). This might be somewhat counter-intuitive, but internally round_half_up() only rounds down. The trick is to add the 0.5 after shifting the decimal point so that the result of rounding down matches the expected value.
Let’s test round_half_up() on a couple of values to see that it works:
>>> round_half_up(1.23, 1)
1.2
>>> round_half_up(1.28, 1)
1.3
>>> round_half_up(1.25, 1)
1.3
Since round_half_up() always breaks ties by rounding to the greater of the two possible values, negative values like -1.5 round to -1, not to -2:
>>> round_half_up(-1.5)
-1.0
>>> round_half_up(-1.25, 1)
-1.2
Great! You can now finally get the result that the built-in round() function denied you:
>>> round_half_up(2.5)
3.0
Before you get too excited though, let’s see what happens when you try to round -1.225 to 2 decimal places:
>>> round_half_up(-1.225, 2)
-1.23
Wait. We just discussed how ties get rounded to the greater of the two possible values. -1.225 is smack in the middle of -1.22 and -1.23. Since -1.22 is the greater of these two, round_half_up(-1.225, 2) should return -1.22. But instead, we got -1.23.
Is there a bug in the round_half_up() function?
When round_half_up() rounds -1.225 to two decimal places, the first thing it does is multiply -1.225 by 100. Let’s make sure this works as expected:
>>> -1.225 * 100
-122.50000000000001
Well… that’s wrong! But it does explain why round_half_up(-1.225, 2) returns -1.23. Let’s continue the round_half_up() algorithm step-by-step, utilizing _ in the REPL to recall the last value output at each step:
>>> _ + 0.5
-122.00000000000001
>>> math.floor(_)
-123
>>> _ / 100
-1.23
Even though -122.00000000000001 is really close to -122, the nearest integer that is less than or equal to it is -123. When the decimal point is shifted back to the left, the final value is -1.23.
Well, now you know how round_half_up(-1.225, 2) returns -1.23 even though there is no logical error, but why does Python say that -1.225 * 100 is -122.50000000000001? Is there a bug in Python?
Aside: In a Python interpreter session, type the following:
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
Seeing this for the first time can be pretty shocking, but this is a classic example of floating-point representation error. It has nothing to do with Python. The error has to do with how machines store floating-point numbers in memory.
Most modern computers store floating-point numbers as binary fractions with 53-bit precision. Only numbers that have finite binary representations that can be expressed in 53 bits are stored as an exact value. Not every number has a finite binary representation.
For example, the decimal number 0.1 has a finite decimal representation, but an infinite binary representation. Just like the fraction 1/3 can only be represented in decimal as the infinitely repeating decimal 0.333..., the fraction 1/10 can only be expressed in binary as the infinitely repeating fraction 0.0001100110011....
A value with an infinite binary representation is rounded to an approximate value to be stored in memory. The method that most machines use to round is determined according to the IEEE-754 standard, which specifies rounding to the nearest representable binary fraction.
The Python docs have a section called Floating Point Arithmetic: Issues and Limitations which has this to say about the number 0.1:
On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead
>>> 1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction. (Source)
For a more in-depth treatise on floating-point arithmetic, check out David Goldberg’s article What Every Computer Scientist Should Know About Floating-Point Arithmetic, originally published in the journal ACM Computing Surveys, Vol. 23, No. 1, March 1991.
The fact that Python says that -1.225 * 100 is -122.50000000000001 is an artifact of floating-point representation error. You might be asking yourself, “Okay, but is there a way to fix this?” A better question to ask yourself is “Do I need to fix this?”
Floating-point numbers do not have exact precision, and therefore should not be used in situations where precision is paramount. For applications where exact precision is necessary, you can use the Decimal class from Python’s decimal module. You’ll learn more about the Decimal class below.
If you have determined that Python’s standard float class is sufficient for your application, some occasional errors in round_half_up() due to floating-point representation error shouldn’t be a concern.
Now that you’ve gotten a taste of how machines round numbers in memory, let’s continue our discussion on rounding strategies by looking at another way to break a tie.
The “rounding half down” strategy rounds to the nearest number with the desired precision, just like the “rounding half up” method, except that it breaks ties by rounding to the lesser of the two numbers. Here are some examples:
Value | Round Half Down To | Result |
---|---|---|
13.825 | Tens place | 10 |
13.825 | Ones place | 14 |
13.825 | Tenths place | 13.8 |
13.825 | Hundredths place | 13.82 |
You can implement the “rounding half down” strategy in Python by replacing math.floor() in the round_half_up() function with math.ceil() and subtracting 0.5 instead of adding:
def round_half_down(n, decimals=0):
    multiplier = 10 ** decimals
    return math.ceil(n * multiplier - 0.5) / multiplier
Let’s check round_half_down() against a few test cases:
>>> round_half_down(1.5)
1.0
>>> round_half_down(-1.5)
-2.0
>>> round_half_down(2.25, 1)
2.2
Both round_half_up() and round_half_down() have no bias in general. However, rounding data with lots of ties does introduce a bias. For an extreme example, consider the following list of numbers:
>>> data = [-2.15, 1.45, 4.35, -12.75]
Let’s compute the mean of these numbers:
>>> import statistics
>>> statistics.mean(data)
-2.275
Next, compute the mean on the data after rounding to one decimal place with round_half_up() and round_half_down():
>>> rhu_data = [round_half_up(n, 1) for n in data]
>>> statistics.mean(rhu_data)
-2.2249999999999996
>>> rhd_data = [round_half_down(n, 1) for n in data]
>>> statistics.mean(rhd_data)
-2.325
Every number in data is a tie with respect to rounding to one decimal place. The round_half_up() function introduces a bias towards positive infinity, and round_half_down() introduces a bias towards negative infinity.
The remaining rounding strategies we’ll discuss all attempt to mitigate these biases in different ways.
If you examine round_half_up() and round_half_down() closely, you’ll notice that neither of these functions is symmetric around zero:
>>> round_half_up(1.5)
2.0
>>> round_half_up(-1.5)
-1.0
>>> round_half_down(1.5)
1.0
>>> round_half_down(-1.5)
-2.0
One way to introduce symmetry is to always round a tie away from zero. The following table illustrates how this works:
Value | Round Half Away From Zero To | Result |
---|---|---|
15.25 | Tens place | 20 |
15.25 | Ones place | 15 |
15.25 | Tenths place | 15.3 |
-15.25 | Tens place | -20 |
-15.25 | Ones place | -15 |
-15.25 | Tenths place | -15.3 |
To implement the “rounding half away from zero” strategy on a number n, you start as usual by shifting the decimal point to the right a given number of places. Then you look at the digit d immediately to the right of the decimal point in this new number. At this point, there are four cases to consider:
If n is positive and d >= 5, round up
If n is positive and d < 5, round down
If n is negative and d >= 5, round down
If n is negative and d < 5, round up
After rounding according to one of the above four rules, you then shift the decimal point back to the left.
Given a number n and a value for decimals, you could implement this in Python by using round_half_up() and round_half_down():
That’s easy enough, but there’s actually a simpler way!
If you first take the absolute value of n using Python’s built-in abs() function, you can just use round_half_up() to round the number. Then all you need to do is give the rounded number the same sign as n. One way to do this is using the math.copysign() function.
math.copysign() takes two numbers a and b and returns a with the sign of b:
>>> math.copysign(1, -2)
-1.0
Notice that math.copysign() returns a float, even though both of its arguments were integers.
Using abs(), round_half_up() and math.copysign(), you can implement the “rounding half away from zero” strategy in just two lines of Python:
def round_half_away_from_zero(n, decimals=0):
    rounded_abs = round_half_up(abs(n), decimals)
    return math.copysign(rounded_abs, n)
In round_half_away_from_zero(), the absolute value of n is rounded to decimals decimal places using round_half_up() and this result is assigned to the variable rounded_abs. Then the original sign of n is applied to rounded_abs using math.copysign(), and this final value with the correct sign is returned by the function.
Checking round_half_away_from_zero() on a few different values shows that the function behaves as expected:
>>> round_half_away_from_zero(1.5)
2.0
>>> round_half_away_from_zero(-1.5)
-2.0
>>> round_half_away_from_zero(-12.75, 1)
-12.8
The round_half_away_from_zero() function rounds numbers the way most people tend to round numbers in everyday life. Besides being the most familiar rounding function you’ve seen so far, round_half_away_from_zero() also mitigates rounding bias well in datasets that have an equal number of positive and negative ties.
Let’s check how well round_half_away_from_zero() mitigates rounding bias in the example from the previous section:
>>> data = [-2.15, 1.45, 4.35, -12.75]
>>> statistics.mean(data)
-2.275
>>> rhaz_data = [round_half_away_from_zero(n, 1) for n in data]
>>> statistics.mean(rhaz_data)
-2.2750000000000004
The mean value of the numbers in data is preserved almost exactly when you round each number in data to one decimal place with round_half_away_from_zero()!
However, round_half_away_from_zero() will exhibit a rounding bias when the dataset you round has only positive ties, only negative ties, or more ties of one sign than the other. Bias is only mitigated well if there are a similar number of positive and negative ties in the dataset.
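To make that limitation concrete, here is a small sketch that compares a dataset with balanced ties against one with only positive ties. The lists balanced and positive_only are made-up illustrative data, and the function is repeated in a single-body form so the snippet runs on its own.

```python
import math
import statistics

def round_half_away_from_zero(n, decimals=0):
    # Same approach as in the article: round |n| half up, then restore the sign.
    multiplier = 10 ** decimals
    rounded_abs = math.floor(abs(n) * multiplier + 0.5) / multiplier
    return math.copysign(rounded_abs, n)

# Every value below is an exact tie when rounding to an integer.
balanced = [-1.5, -0.5, 0.5, 1.5]     # as many negative ties as positive ones
positive_only = [0.5, 1.5, 2.5, 3.5]  # positive ties only

for data in (balanced, positive_only):
    rounded = [round_half_away_from_zero(n) for n in data]
    # Print the mean before and after rounding.
    print(statistics.mean(data), statistics.mean(rounded))
```

For the balanced list the mean is unchanged at 0.0, while for the positive-only list it moves from 2.0 to 2.5, because every tie is pushed in the same direction.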
How do you handle situations where the numbers of positive and negative ties are drastically different? The answer to this question brings us full circle to the function that deceived us at the beginning of this article: Python’s built-in round() function.
One way to mitigate rounding bias when rounding values in a dataset is to round ties to the nearest even number at the desired precision. Here are some examples of how to do that:
Value | Round Half To Even To | Result |
---|---|---|
15.255 | Tens place | 20 |
15.255 | Ones place | 15 |
15.255 | Tenths place | 15.3 |
15.255 | Hundredths place | 15.26 |
The “rounding half to even” strategy is the strategy used by Python’s built-in round() function and is the default rounding rule in the IEEE-754 standard. This strategy works under the assumption that the probabilities of a tie in a dataset being rounded down or rounded up are equal. In practice, this is usually the case.
Now you know why round(2.5) returns 2. It’s not a mistake. It is a conscious design decision based on solid recommendations.
To prove to yourself that round() really does round to even, try it on a few different values:
>>> round(4.5)
4
>>> round(3.5)
4
>>> round(1.75, 1)
1.8
>>> round(1.65, 1)
1.6
The round() function is nearly free from bias, but it isn’t perfect. For example, rounding bias can still be introduced if the majority of the ties in your dataset round up to even instead of rounding down. Strategies that mitigate bias even better than “rounding half to even” do exist, but they are somewhat obscure and only necessary in extreme circumstances.
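A quick way to see why rounding ties to even helps is to compare it with “rounding half up” on a run of consecutive ties. This is only a sketch: the list ties is illustrative data, and the one-argument round_half_up() below is a simplified integer-only version of the function defined earlier.

```python
import math
import statistics

def round_half_up(n):
    # Integer-only version of the article's round_half_up(), repeated here
    # so the snippet is self-contained.
    return math.floor(n + 0.5)

ties = [0.5, 1.5, 2.5, 3.5]  # all exactly representable as floats

half_up = [round_half_up(n) for n in ties]  # every tie moves up
half_even = [round(n) for n in ties]        # ties alternate between down and up

print(half_up, statistics.mean(half_up))      # the mean shifts from 2.0 to 2.5
print(half_even, statistics.mean(half_even))  # the mean stays at 2
```

With “rounding half to even,” half of these ties go down and half go up, so the errors cancel and the mean is preserved.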
Finally, round() suffers from the same hiccups that you saw in round_half_up() thanks to floating-point representation error:
>>> # Expected value: 2.68
>>> round(2.675, 2)
2.67
You shouldn’t be concerned with these occasional errors if floating-point precision is sufficient for your application.
When precision is paramount, you should use Python’s Decimal class.
The Decimal Class
Python’s decimal module is one of those “batteries-included” features of the language that you might not be aware of if you’re new to Python. The guiding principle of the decimal module can be found in the documentation:
Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.” – excerpt from the decimal arithmetic specification. (Source)
The benefits of the decimal module include:
0.1 is actually 0.1, and 0.1 + 0.1 + 0.1 - 0.3 returns 0, as you’d expect.
When you add 1.20 and 2.50, the result is 3.70 with the trailing zero maintained to indicate significance.
The default precision of the decimal module is twenty-eight digits, but this value can be altered by the user to match the problem at hand.
Let’s explore how rounding works in the decimal module. Start by typing the following into a Python REPL:
>>> import decimal
>>> decimal.getcontext()
Context(
prec=28,
rounding=ROUND_HALF_EVEN,
Emin=-999999,
Emax=999999,
capitals=1,
clamp=0,
flags=[],
traps=[
InvalidOperation,
DivisionByZero,
Overflow
]
)
decimal.getcontext() returns a Context object representing the current context of the decimal module, which is the default context unless you have changed it. The context includes the default precision and the default rounding strategy, among other things.
As you can see in the example above, the default rounding strategy for the decimal module is ROUND_HALF_EVEN. This aligns with the built-in round() function and should be the preferred rounding strategy for most purposes.
Let’s declare a number using the decimal module’s Decimal class. To do so, create a new Decimal instance by passing a string containing the desired value:
>>> from decimal import Decimal
>>> Decimal("0.1")
Decimal('0.1')
Note: It is possible to create a Decimal instance from a floating-point number, but doing so introduces floating-point representation error right off the bat. For example, check out what happens when you create a Decimal instance from the floating-point number 0.1:
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
In order to maintain exact precision, you must create Decimal instances from strings containing the decimal numbers you need.
Just for fun, let’s test the assertion that Decimal maintains exact decimal representation:
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
Decimal('0.3')
Ahhh. That’s satisfying, isn’t it?
Rounding a Decimal is done with the .quantize() method:
>>> Decimal("1.65").quantize(Decimal("1.0"))
Decimal('1.6')
Okay, that probably looks a little funky, so let’s break that down. The Decimal("1.0") argument in .quantize() determines the number of decimal places to round the number to. Since 1.0 has one decimal place, the number 1.65 rounds to a single decimal place. The default rounding strategy is “rounding half to even,” so the result is 1.6.
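The sketch below shows how the exponent of the argument controls the result, and how a single call can use a different strategy through the rounding keyword of .quantize(). It assumes the context still uses the default ROUND_HALF_EVEN strategy, and the variable name value is just for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

value = Decimal("1.65")

# The exponent of the first argument sets how many decimal places are kept.
print(value.quantize(Decimal("1.0")))  # 1.6  (tie goes to the even digit)
print(value.quantize(Decimal("1")))    # 2    (no decimal places; not a tie)

# A rounding mode passed to quantize() applies to this call only.
print(value.quantize(Decimal("1.0"), rounding=ROUND_HALF_UP))  # 1.7
```

Passing the strategy per call avoids changing the context, which is convenient when most of your code should keep the default behavior.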
Recall that the round() function, which also uses the “rounding half to even” strategy, failed to round 2.675 to two decimal places correctly. Instead of 2.68, round(2.675, 2) returns 2.67. Thanks to the decimal module’s exact decimal representation, you won’t have this issue with the Decimal class:
>>> Decimal("2.675").quantize(Decimal("1.00"))
Decimal('2.68')
Another benefit of the decimal module is that rounding after performing arithmetic is taken care of automatically, and significant digits are preserved. To see this in action, let’s change the default precision from twenty-eight digits to two, and then add the numbers 1.23 and 2.32:
>>> decimal.getcontext().prec = 2
>>> Decimal("1.23") + Decimal("2.32")
Decimal('3.6')
To change the precision, you call decimal.getcontext() and set the .prec attribute. If setting an attribute on the result of a function call looks odd to you, it works because .getcontext() returns a special Context object that represents the current internal context containing the default parameters used by the decimal module.
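If you would rather not change the precision for the rest of your program, the decimal module also provides decimal.localcontext(), which scopes a change to a with block. The following is a minimal sketch, with before and total as illustrative variable names:

```python
import decimal
from decimal import Decimal

before = decimal.getcontext().prec

# localcontext() hands back a copy of the current context and restores the
# original context when the with block ends.
with decimal.localcontext() as ctx:
    ctx.prec = 2
    total = Decimal("1.23") + Decimal("2.32")

print(total)                                # 3.6
print(decimal.getcontext().prec == before)  # True: the precision was restored
```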
The exact value of 1.23 plus 2.32 is 3.55. Since the precision is now two digits, and the rounding strategy is set to the default of “rounding half to even,” the value 3.55 is automatically rounded to 3.6.
To change the default rounding strategy, you can set the decimal.getcontext().rounding attribute to any one of several flags. The following table summarizes these flags and which rounding strategy they implement:
Flag | Rounding Strategy |
---|---|
decimal.ROUND_CEILING | Rounding up |
decimal.ROUND_FLOOR | Rounding down |
decimal.ROUND_DOWN | Truncation |
decimal.ROUND_UP | Rounding away from zero |
decimal.ROUND_HALF_UP | Rounding half away from zero |
decimal.ROUND_HALF_DOWN | Rounding half towards zero |
decimal.ROUND_HALF_EVEN | Rounding half to even |
decimal.ROUND_05UP | Rounding up and rounding towards zero |
The first thing to notice is that the naming scheme used by the decimal module differs from what we agreed to earlier in the article. For example, decimal.ROUND_UP implements the “rounding away from zero” strategy, which actually rounds negative numbers down.
Secondly, some of the rounding strategies mentioned in the table may look unfamiliar since we haven’t discussed them. You’ve already seen how decimal.ROUND_HALF_EVEN works, so let’s take a look at each of the others in action.
The decimal.ROUND_CEILING strategy works just like the round_up() function we defined earlier:
>>> decimal.getcontext().rounding = decimal.ROUND_CEILING
>>> Decimal("1.32").quantize(Decimal("1.0"))
Decimal('1.4')
>>> Decimal("-1.32").quantize(Decimal("1.0"))
Decimal('-1.3')
Notice that the results of decimal.ROUND_CEILING are not symmetric around zero.
The decimal.ROUND_FLOOR strategy works just like our round_down() function:
>>> decimal.getcontext().rounding = decimal.ROUND_FLOOR
>>> Decimal("1.32").quantize(Decimal("1.0"))
Decimal('1.3')
>>> Decimal("-1.32").quantize(Decimal("1.0"))
Decimal('-1.4')
Like decimal.ROUND_CEILING, the decimal.ROUND_FLOOR strategy is not symmetric around zero.
The decimal.ROUND_DOWN and decimal.ROUND_UP strategies have somewhat deceptive names. Both ROUND_DOWN and ROUND_UP are symmetric around zero:
>>> decimal.getcontext().rounding = decimal.ROUND_DOWN
>>> Decimal("1.32").quantize(Decimal("1.0"))
Decimal('1.3')
>>> Decimal("-1.32").quantize(Decimal("1.0"))
Decimal('-1.3')
>>> decimal.getcontext().rounding = decimal.ROUND_UP
>>> Decimal("1.32").quantize(Decimal("1.0"))
Decimal('1.4')
>>> Decimal("-1.32").quantize(Decimal("1.0"))
Decimal('-1.4')
The decimal.ROUND_DOWN strategy rounds numbers towards zero, just like the truncate() function. On the other hand, decimal.ROUND_UP rounds everything away from zero. This is a clear break from the terminology we agreed to earlier in the article, so keep that in mind when you are working with the decimal module.
There are three strategies in the decimal module that allow for more nuanced rounding. The decimal.ROUND_HALF_UP method rounds everything to the nearest number and breaks ties by rounding away from zero:
>>> decimal.getcontext().rounding = decimal.ROUND_HALF_UP
>>> Decimal("1.35").quantize(Decimal("1.0"))
Decimal('1.4')
>>> Decimal("-1.35").quantize(Decimal("1.0"))
Decimal('-1.4')
Notice that decimal.ROUND_HALF_UP works just like our round_half_away_from_zero() and not like round_half_up().
There is also a decimal.ROUND_HALF_DOWN strategy that breaks ties by rounding towards zero:
>>> decimal.getcontext().rounding = decimal.ROUND_HALF_DOWN
>>> Decimal("1.35").quantize(Decimal("1.0"))
Decimal('1.3')
>>> Decimal("-1.35").quantize(Decimal("1.0"))
Decimal('-1.3')
The final rounding strategy available in the decimal module is very different from anything we have seen so far:
>>> decimal.getcontext().rounding = decimal.ROUND_05UP
>>> Decimal("1.38").quantize(Decimal("1.0"))
Decimal('1.3')
>>> Decimal("1.35").quantize(Decimal("1.0"))
Decimal('1.3')
>>> Decimal("-1.35").quantize(Decimal("1.0"))
Decimal('-1.3')
In the above examples, it looks as if decimal.ROUND_05UP rounds everything towards zero. In fact, this is exactly how decimal.ROUND_05UP works, unless the result of rounding towards zero ends in a 0 or 5. In that case, the number gets rounded away from zero:
>>> Decimal("1.49").quantize(Decimal("1.0"))
Decimal('1.4')
>>> Decimal("1.51").quantize(Decimal("1.0"))
Decimal('1.6')
In the first example, the number 1.49 is first rounded towards zero to one decimal place, producing 1.4. Since 1.4 does not end in a 0 or a 5, it is left as is. On the other hand, 1.51 is rounded towards zero to one decimal place, resulting in the number 1.5. This ends in a 5, so the first decimal place is then rounded away from zero, giving 1.6.
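The rule is easier to see with a few more inputs. In this sketch, round_05up() is a hypothetical helper, not part of the decimal module. It passes ROUND_05UP straight to .quantize(), so the result does not depend on the current context.

```python
from decimal import Decimal, ROUND_05UP

def round_05up(text, places="1.0"):
    # Hypothetical helper: quantize with ROUND_05UP regardless of the context.
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_05UP)

print(round_05up("1.49"))  # 1.4  truncation ends in 4, so the value is left alone
print(round_05up("1.51"))  # 1.6  truncation ends in 5, so round away from zero
print(round_05up("1.01"))  # 1.1  truncation ends in 0, so round away from zero
print(round_05up("1.00"))  # 1.0  nothing is discarded, so nothing changes
```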
In this section, we have only focused on the rounding aspects of the decimal module. There are a large number of other features that make decimal an excellent choice for applications where the standard floating-point precision is inadequate, such as banking and some problems in scientific computing.
For more information on Decimal, check out the Quick-start Tutorial in the Python docs.
Next, let’s turn our attention to two staples of Python’s scientific computing and data science stacks: NumPy and Pandas.
In the domains of data science and scientific computing, you often store your data as a NumPy array. One of NumPy’s most powerful features is its use of vectorization and broadcasting to apply operations to an entire array at once instead of one element at a time.
Let’s generate some data by creating a 3×4 NumPy array of pseudo-random numbers:
>>> import numpy as np
>>> np.random.seed(444)
>>> data = np.random.randn(3, 4)
>>> data
array([[ 0.35743992, 0.3775384 , 1.38233789, 1.17554883],
[-0.9392757 , -1.14315015, -0.54243951, -0.54870808],
[ 0.20851975, 0.21268956, 1.26802054, -0.80730293]])
First, we seed the np.random module so that you can easily reproduce the output. Then a 3×4 NumPy array of floating-point numbers is created with np.random.randn().
Note: You’ll need to pip3 install numpy before typing the above code into your REPL if you don’t already have NumPy in your environment. If you installed Python with Anaconda, you’re already set!
If you haven’t used NumPy before, you can get a quick introduction in the Getting Into Shape section of Brad Solomon’s Look Ma, No For-Loops: Array Programming With NumPy here at Real Python.
For more information on NumPy’s random module, check out the PRNG’s for Arrays section of Brad’s Generating Random Data in Python (Guide).
To round all of the values in the data array, you can pass data as the argument to the np.around() function. The desired number of decimal places is set with the decimals keyword argument. The round half to even strategy is used, just like Python’s built-in round() function.
For example, the following rounds all of the values in data to three decimal places:
>>> np.around(data, decimals=3)
array([[ 0.357, 0.378, 1.382, 1.176],
[-0.939, -1.143, -0.542, -0.549],
[ 0.209, 0.213, 1.268, -0.807]])
np.around() is at the mercy of floating-point representation error, just like round() is.
Be careful not to blame representation error for results that are simply correct, though. For example, the value in the third row of the first column in the data array is 0.20851975. This is not a tie: it lies above the midpoint 0.2085, so every round-to-nearest strategy, including “rounding half to even,” gives 0.209, which is what you see in the output from np.around(). In the same way, the value 0.3775384 in the first row of the second column rounds to 0.378. Representation error only shows up for values that sit on or extremely close to a midpoint, as 2.675 did with round() earlier.
If you need to round the data in your array to integers, NumPy offers several options:
numpy.ceil()
numpy.floor()
numpy.trunc()
numpy.rint()
The np.ceil() function rounds every value in the array to the nearest integer greater than or equal to the original value:
>>> np.ceil(data)
array([[ 1., 1., 2., 2.],
[-0., -1., -0., -0.],
[ 1., 1., 2., -0.]])
Hey, we discovered a new number! Negative zero!
Actually, the IEEE-754 standard requires the implementation of both a positive and negative zero. What possible use is there for something like this? Wikipedia knows the answer:
Informally, one may use the notation “−0” for a negative value that was rounded to zero. This notation may be useful when a negative sign is significant; for example, when tabulating Celsius temperatures, where a negative sign means below freezing. (Source)
To round every value down to the nearest integer, use np.floor():
>>> np.floor(data)
array([[ 0., 0., 1., 1.],
[-1., -2., -1., -1.],
[ 0., 0., 1., -1.]])
You can also truncate each value to its integer component with np.trunc():
>>> np.trunc(data)
array([[ 0., 0., 1., 1.],
[-0., -1., -0., -0.],
[ 0., 0., 1., -0.]])
Finally, to round to the nearest integer using the “rounding half to even” strategy, use np.rint():
>>> np.rint(data)
array([[ 0., 0., 1., 1.],
[-1., -1., -1., -1.],
[ 0., 0., 1., -1.]])
You might have noticed that a lot of the rounding strategies we discussed earlier are missing here. For the vast majority of situations, the around() function is all you need. If you need to implement another strategy, such as round_half_up(), you can do so with a simple modification:
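One plausible form of that modification, sketched here under the assumption that round_half_up() was originally written with math.floor() as described earlier, is to swap in np.floor() so the function accepts whole arrays. The array values is illustrative data.

```python
import numpy as np

def round_half_up(n, decimals=0):
    multiplier = 10 ** decimals
    # np.floor works element-wise on arrays (and on plain numbers too),
    # unlike math.floor, which accepts only a single number.
    return np.floor(n * multiplier + 0.5) / multiplier

values = np.array([0.5, 1.5, -0.5, 2.25])
print(round_half_up(values))  # each element is rounded half up to an integer
```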
Thanks to NumPy’s vectorized operations, this works just as you expect:
>>> round_half_up(data, decimals=2)
array([[ 0.36, 0.38, 1.38, 1.18],
[-0.94, -1.14, -0.54, -0.55],
[ 0.21, 0.21, 1.27, -0.81]])
Now that you’re a NumPy rounding master, let’s take a look at Python’s other data science heavy-weight: the Pandas library.
Rounding Pandas Series and DataFrame
The Pandas library has become a staple for data scientists and data analysts who work in Python. In the words of Real Python’s own Joe Wyndham:
Pandas is a game-changer for data science and analytics, particularly if you came to Python because you were searching for something more powerful than Excel and VBA. (Source)
Note: Before you continue, you’ll need to pip3 install pandas if you don’t already have it in your environment. As was the case for NumPy, if you installed Python with Anaconda, you should be ready to go!
The two main Pandas data structures are the DataFrame, which in very loose terms works sort of like an Excel spreadsheet, and the Series, which you can think of as a column in a spreadsheet. Both Series and DataFrame objects can also be rounded efficiently using the Series.round() and DataFrame.round() methods:
>>> import pandas as pd
>>> # Re-seed np.random if you closed your REPL since the last example
>>> np.random.seed(444)
>>> series = pd.Series(np.random.randn(4))
>>> series
0 0.357440
1 0.377538
2 1.382338
3 1.175549
dtype: float64
>>> series.round(2)
0 0.36
1 0.38
2 1.38
3 1.18
dtype: float64
>>> df = pd.DataFrame(np.random.randn(3, 3), columns=["A", "B", "C"])
>>> df
A B C
0 -0.939276 -1.143150 -0.542440
1 -0.548708 0.208520 0.212690
2 1.268021 -0.807303 -3.303072
>>> df.round(3)
A B C
0 -0.939 -1.143 -0.542
1 -0.549 0.209 0.213
2 1.268 -0.807 -3.303
The DataFrame.round() method can also accept a dictionary or a Series to specify a different precision for each column. For instance, the following examples show how to round the first column of df to one decimal place, the second to two, and the third to three decimal places:
>>> # Specify column-by-column precision with a dictionary
>>> df.round({"A": 1, "B": 2, "C": 3})
A B C
0 -0.9 -1.14 -0.542
1 -0.5 0.21 0.213
2 1.3 -0.81 -3.303
>>> # Specify column-by-column precision with a Series
>>> decimals = pd.Series([1, 2, 3], index=["A", "B", "C"])
>>> df.round(decimals)
A B C
0 -0.9 -1.14 -0.542
1 -0.5 0.21 0.213
2 1.3 -0.81 -3.303
If you need more rounding flexibility, you can apply NumPy’s floor(), ceil(), and rint() functions to Pandas Series and DataFrame objects:
>>> np.floor(df)
A B C
0 -1.0 -2.0 -1.0
1 -1.0 0.0 0.0
2 1.0 -1.0 -4.0
>>> np.ceil(df)
A B C
0 -0.0 -1.0 -0.0
1 -0.0 1.0 1.0
2 2.0 -0.0 -3.0
>>> np.rint(df)
A B C
0 -1.0 -1.0 -1.0
1 -1.0 0.0 0.0
2 1.0 -1.0 -3.0
The modified round_half_up() function from the previous section will also work here:
>>> round_half_up(df, decimals=2)
A B C
0 -0.94 -1.14 -0.54
1 -0.55 0.21 0.21
2 1.27 -0.81 -3.30
Congratulations, you’re well on your way to rounding mastery! You now know that there are more ways to round a number than there are taco combinations. (Well… maybe not!) You can implement numerous rounding strategies in pure Python, and you have sharpened your skills on rounding NumPy arrays and Pandas Series and DataFrame objects.
There’s just one more step: knowing when to apply the right strategy.
The last stretch on your road to rounding virtuosity is understanding when to apply your newfound knowledge. In this section, you’ll learn some best practices to make sure you round your numbers the right way.
When you deal with large sets of data, storage can be an issue. In most relational databases, each column in a table is designed to store a specific data type, and numeric data types are often assigned precision to help conserve memory.
For example, a temperature sensor may report the temperature in a long-running industrial oven every ten seconds, accurate to eight decimal places. The readings from this sensor are used to detect abnormal fluctuations in temperature that could indicate the failure of a heating element or some other component. So, there might be a Python script running that compares each incoming reading to the previous one to check for large fluctuations.
The readings from this sensor are also stored in a SQL database so that the daily average temperature inside the oven can be computed each day at midnight. The manufacturer of the heating element inside the oven recommends replacing the component whenever the daily average temperature drops 0.05 degrees below normal.
For this calculation, you only need three decimal places of precision. But you know from the incident at the Vancouver Stock Exchange that removing too much precision can drastically affect your calculation.
If you have the space available, you should store the data at full precision. If storage is an issue, a good rule of thumb is to store at least two or three more decimal places of precision than you need for your calculation.
Finally, when you compute the daily average temperature, you should calculate it to the full precision available and round the final answer.
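To see why rounding late matters, here is a small sketch. The four readings are made up for illustration, and one decimal place stands in for an overly aggressive storage precision:

```python
from statistics import fmean

# Hypothetical oven readings at eight decimal places (invented for this sketch)
readings = [350.04449012, 350.04312877, 350.04498231, 350.04401564]

# Round late: average at full precision, then round the final answer
late = round(fmean(readings), 3)

# Round early: throw away precision before averaging
early = round(fmean([round(r, 1) for r in readings]), 3)

print(late)   # 350.044
print(early)  # 350.0
```

With these readings, rounding early shifts the average by about 0.044 degrees, which is almost the entire 0.05-degree threshold from the example above.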
When you order a cup of coffee for $2.40 at the coffee shop, the merchant typically adds a required tax. The amount of that tax depends a lot on where you are geographically, but for the sake of argument, let’s say it’s 6%. The tax to be added comes out to $0.144. Should you round this up to $0.15 or down to $0.14? The answer probably depends on the regulations set forth by the local government!
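The coffee example can be worked through with the decimal module from the standard library. This is only a sketch of how the candidate answers arise; which rounding mode applies is a legal question that the code cannot settle:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_UP

price = Decimal("2.40")
tax_rate = Decimal("0.06")
tax = price * tax_rate  # exactly 0.1440, with no floating-point error

cent = Decimal("0.01")
tax_up = tax.quantize(cent, rounding=ROUND_UP)            # always away from zero
tax_down = tax.quantize(cent, rounding=ROUND_DOWN)        # always toward zero
tax_half_up = tax.quantize(cent, rounding=ROUND_HALF_UP)  # nearest, ties away from zero

print(tax_up, tax_down, tax_half_up)  # 0.15 0.14 0.14
```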
Situations like this can also arise when you are converting one currency to another. In 1999, the European Commission on Economic and Financial Affairs codified the use of the “rounding half away from zero” strategy when converting currencies to the Euro, but other currencies may have adopted different regulations.
Another scenario, “Swedish rounding”, occurs when the minimum unit of currency at the accounting level in a country is smaller than the lowest unit of physical currency. For example, if a cup of coffee costs $2.54 after tax, but there are no 1-cent coins in circulation, what do you do? The buyer won’t have the exact amount, and the merchant can’t make exact change.
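One way this kind of cash rounding might be sketched is shown below. The helper name round_to_coin(), the 5-cent smallest coin, and the tie-breaking rule are all assumptions made for illustration, since the actual rules differ from country to country:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_coin(amount, coin=Decimal("0.05")):
    """Round a cash total to the nearest multiple of `coin` (hypothetical helper)."""
    # Count how many coins fit, rounding that count to a whole number
    # (ties go away from zero), then convert the count back to money.
    steps = (amount / coin).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * coin


print(round_to_coin(Decimal("2.54")))  # 2.55
print(round_to_coin(Decimal("2.52")))  # 2.50
```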
How situations like this are handled is typically determined by a country’s government. You can find a list of rounding methods used by various countries on Wikipedia.
If you are designing software for calculating currencies, you should always check the local laws and regulations in your users’ locations.
When you are rounding numbers in large datasets that are used in complex computations, the primary concern is limiting the growth of the error due to rounding.
Of all the methods we’ve discussed in this article, the “rounding half to even” strategy minimizes rounding bias the best. Fortunately, Python, NumPy, and Pandas all default to this strategy, so by using the built-in rounding functions you’re already well protected!
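A tiny illustration of the difference, using a made-up list in which every value is a tie. The built-in round() resolves ties to the even integer, while the floor-based expression always rounds ties up:

```python
import math

values = [0.5, 1.5, 2.5, 3.5]  # every value sits exactly halfway

half_even = [round(v) for v in values]           # ties go to the even integer
half_up = [math.floor(v + 0.5) for v in values]  # ties always go up

print(sum(values))     # 8.0
print(sum(half_even))  # 8
print(sum(half_up))    # 10
```

On this deliberately extreme input, rounding half to even preserves the sum, while rounding half up inflates it by 0.5 per value.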
Whew! What a journey this has been!
In this article, you learned that:
There are various rounding strategies, which you now know how to implement in pure Python.
Every rounding strategy inherently introduces a rounding bias, and the “rounding half to even” strategy mitigates this bias well, most of the time.
The way in which computers store floating-point numbers in memory naturally introduces a subtle rounding error, but you learned how to work around this with the decimal module in Python’s standard library.
You can round NumPy arrays and Pandas Series and DataFrame objects.
There are best practices for rounding with real-world data.
If you are interested in learning more and digging into the nitty-gritty details of everything we’ve covered, the links below should keep you busy for quite a while.
At the very least, if you’ve enjoyed this article and learned something new from it, pass it on to a friend or team member! Be sure to share your thoughts with us in the comments. We’d love to hear some of your own rounding-related battle stories!
Happy Pythoning!
Rounding strategies and bias:
Floating-point and decimal specifications:
Interesting Reads:
Translated from: https://www.pybloggers.com/2018/10/how-to-round-numbers-in-python/