cumei1658

Pandas Tutorial: Data Analysis with Python: Part 1

Python is a great language for data analysis, primarily because of its fantastic ecosystem of data-centric packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis and visualization work.

In this introduction, we’ll use Pandas to analyze data on video game reviews from IGN, a popular video game review site. The data was scraped by Eric Grinstein, and can be found here. As we analyze the video game reviews, we’ll learn key Pandas concepts like indexing.

Do games like the Witcher 3 tend to get better reviews on the PS4 than the Xbox One? This dataset can help us find out.

Just as a note, we’ll be using Python 3.5 and Jupyter Notebook to do our analysis.

Importing Data with Pandas

The first step we’ll take is to read the data in. The data is stored as a comma-separated values, or csv, file, where each row is separated by a new line, and each column by a comma (,). Here are the first few rows of the ign.csv file:

,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day
0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012,9,12
1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero Edition,/games/littlebigplanet-ps-vita-marvel-super-hero-edition/vita-20027059,PlayStation Vita,9.0,Platformer,Y,2012,9,12
2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012,9,12
3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012,9,11

As you can see above, each row in the data represents a single game that was reviewed by IGN. The columns contain information about that game:

score_phrase – how IGN described the game in one word. This is linked to the score it received.
title – the name of the game.
url – the URL where you can see the full review.
platform – the platform the game was reviewed on (PC, PS4, etc).
score – the score for the game, from 1.0 to 10.0.
genre – the genre of the game.
editors_choice – N if the game wasn’t an editor’s choice, Y if it was. This is tied to score.
release_year – the year the game was released.
release_month – the month the game was released.
release_day – the day the game was released.

There’s also a leading column that contains row index values. We can safely ignore this column, but we’ll dive into what index values are later on. In order to be able to work with the data in Python, we’ll need to read the csv file into a Pandas DataFrame. A DataFrame is a way to represent and work with tabular data. Tabular data has rows and columns, just like our csv file.

In order to read in the data, we’ll need to use the pandas.read_csv function. This function will take in a csv file and return a DataFrame. The below code will:

Import the pandas library. We rename it to pd so it’s faster to type out.
Read ign.csv into a DataFrame, and assign the result to reviews.
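
The two steps above can be sketched roughly as follows. Since ign.csv may not be on hand, this sketch reads a few of the rows shown earlier from an in-memory string; SAMPLE_CSV is a stand-in introduced here for illustration, and with the real file the call would simply be pd.read_csv("ign.csv").

```python
import io

import pandas as pd

# Stand-in for ign.csv: the header and a few of the rows shown earlier.
# (SAMPLE_CSV is an assumption of this sketch, not part of the dataset.)
SAMPLE_CSV = """,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day
0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012,9,12
2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012,9,12
3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012,9,11
"""

# With the real file on disk this would be: reviews = pd.read_csv("ign.csv")
reviews = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The header starts with a comma, so pandas names the leading column itself.
print(reviews.columns[0])
print(reviews.shape)
```

Because the first header cell is empty, pandas labels that column "Unnamed: 0", which is the leading index column mentioned above.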

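Once we read in a DataFrame, Pandas gives us two methods that make it fast to print out the data. These methods are:

pandas.DataFrame.head – prints the first N rows of a DataFrame (5 by default).
pandas.DataFrame.tail – prints the last N rows of a DataFrame (5 by default).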

We’ll use the head method to see what’s in reviews:

reviews.head()

	Unnamed: 0	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
0	0	Amazing	LittleBigPlanet PS Vita	/games/littlebigplanet-vita/vita-98907	PlayStation Vita	9.0	Platformer	Y	2012	9	12
1	1	Amazing	LittleBigPlanet PS Vita -- Marvel Super Hero E…	/games/littlebigplanet-ps-vita-marvel-super-he…	PlayStation Vita	9.0	Platformer	Y	2012	9	12
2	2	Great	Splice: Tree of Life	/games/splice/ipad-141070	iPad	8.5	Puzzle	N	2012	9	12
3	3	Great	NHL 13	/games/nhl-13/xbox-360-128182	Xbox 360	8.5	Sports	N	2012	9	11
4	4	Great	NHL 13	/games/nhl-13/ps3-128181	PlayStation 3	8.5	Sports	N	2012	9	11

We can also access the pandas.DataFrame.shape property to see how many rows and columns are in reviews:

reviews.shape

(18625, 11)

As you can see, everything has been read in properly – we have 18625 rows and 11 columns.

One of the big advantages of Pandas vs just using NumPy is that Pandas allows you to have columns with different data types. reviews has columns that store float values, like score, string values, like score_phrase, and integers, like release_year.

Now that we’ve read the data in properly, let’s work on indexing reviews to get the rows and columns that we want.

Indexing DataFrames with Pandas

Earlier, we used the head method to print the first 5 rows of reviews. We could accomplish the same thing using the pandas.DataFrame.iloc method. The iloc method allows us to retrieve rows and columns by position. In order to do that, we’ll need to specify the positions of the rows that we want, and the positions of the columns that we want as well.

The below code will replicate reviews.head():

reviews.iloc[0:5,:]

	Unnamed: 0	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
0	0	Amazing	LittleBigPlanet PS Vita	/games/littlebigplanet-vita/vita-98907	PlayStation Vita	9.0	Platformer	Y	2012	9	12
1	1	Amazing	LittleBigPlanet PS Vita -- Marvel Super Hero E…	/games/littlebigplanet-ps-vita-marvel-super-he…	PlayStation Vita	9.0	Platformer	Y	2012	9	12
2	2	Great	Splice: Tree of Life	/games/splice/ipad-141070	iPad	8.5	Puzzle	N	2012	9	12
3	3	Great	NHL 13	/games/nhl-13/xbox-360-128182	Xbox 360	8.5	Sports	N	2012	9	11
4	4	Great	NHL 13	/games/nhl-13/ps3-128181	PlayStation 3	8.5	Sports	N	2012	9	11

As you can see above, we specified that we wanted rows 0:5. This means that we wanted the rows from position 0 up to, but not including, position 5. The first row is considered to be in position 0. This gives us the rows at positions 0, 1, 2, 3, and 4.

If we leave off the first position value, like :5, it’s assumed we mean 0. If we leave off the last position value, like 0:, it’s assumed we mean the last row or column in the DataFrame.

We wanted all of the columns, so we specified just a colon (:), without any positions. This gave us the columns from 0 to the last column.

Here are some indexing examples, along with the results:

reviews.iloc[:5,:] – the first 5 rows, and all of the columns for those rows.
reviews.iloc[:,:] – the entire DataFrame.
reviews.iloc[5:,5:] – rows from position 5 onwards, and columns from position 5 onwards.
reviews.iloc[:,0] – the first column, and all of the rows for the column.
reviews.iloc[9,:] – the 10th row, and all of the columns for that row.
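
The patterns in the list above can be tried on a small frame. This is only a sketch: df is a made-up 4-row, 3-column DataFrame rather than the IGN data, so the slice bounds are scaled down to fit it.

```python
import pandas as pd

# A tiny made-up frame (not the IGN data) with 4 rows and 3 columns.
df = pd.DataFrame(
    {"a": [10, 11, 12, 13], "b": [20, 21, 22, 23], "c": [30, 31, 32, 33]}
)

first_two_rows = df.iloc[:2, :]   # rows at positions 0 and 1, all columns
everything = df.iloc[:, :]        # the entire DataFrame
lower_right = df.iloc[2:, 1:]     # rows from position 2 on, columns from position 1 on
first_column = df.iloc[:, 0]      # the first column, returned as a Series
third_row = df.iloc[2, :]         # the third row, returned as a Series

print(first_two_rows.shape)       # (2, 3)
print(lower_right.shape)          # (2, 2)
print(first_column.tolist())      # [10, 11, 12, 13]
print(third_row.tolist())         # [12, 22, 32]
```

Selecting a single position on one axis, as in the last two lines, returns a one-dimensional Series rather than a DataFrame.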

Indexing by position is very similar to NumPy indexing. If you want to learn more, you can read our NumPy tutorial here.

Now that we know how to index by position, let’s remove the first column, which doesn’t have any useful information:

reviews = reviews.iloc[:,1:]
reviews.head()

	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
0	Amazing	LittleBigPlanet PS Vita	/games/littlebigplanet-vita/vita-98907	PlayStation Vita	9.0	Platformer	Y	2012	9	12
1	Amazing	LittleBigPlanet PS Vita -- Marvel Super Hero E…	/games/littlebigplanet-ps-vita-marvel-super-he…	PlayStation Vita	9.0	Platformer	Y	2012	9	12
2	Great	Splice: Tree of Life	/games/splice/ipad-141070	iPad	8.5	Puzzle	N	2012	9	12
3	Great	NHL 13	/games/nhl-13/xbox-360-128182	Xbox 360	8.5	Sports	N	2012	9	11
4	Great	NHL 13	/games/nhl-13/ps3-128181	PlayStation 3	8.5	Sports	N	2012	9	11

Indexing Using Labels in Pandas

Now that we know how to retrieve rows and columns by position, it’s worth looking into the other major way to work with DataFrames, which is to retrieve rows and columns by label.

A major advantage of Pandas over NumPy is that each of the columns and rows has a label. Working with column positions is possible, but it can be hard to keep track of which number corresponds to which column.

We can work with labels using the pandas.DataFrame.loc method, which allows us to index using labels instead of positions.

We can display the first rows of reviews using the loc method. Unlike iloc, a loc slice includes its end label, so 0:5 returns six rows (labels 0 through 5):

reviews.loc[0:5,:]

	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
0	Amazing	LittleBigPlanet PS Vita	/games/littlebigplanet-vita/vita-98907	PlayStation Vita	9.0	Platformer	Y	2012	9	12
1	Amazing	LittleBigPlanet PS Vita -- Marvel Super Hero E…	/games/littlebigplanet-ps-vita-marvel-super-he…	PlayStation Vita	9.0	Platformer	Y	2012	9	12
2	Great	Splice: Tree of Life	/games/splice/ipad-141070	iPad	8.5	Puzzle	N	2012	9	12
3	Great	NHL 13	/games/nhl-13/xbox-360-128182	Xbox 360	8.5	Sports	N	2012	9	11
4	Great	NHL 13	/games/nhl-13/ps3-128181	PlayStation 3	8.5	Sports	N	2012	9	11
5	Good	Total War Battles: Shogun	/games/total-war-battles-shogun/mac-142565	Macintosh	7.0	Strategy	N	2012	9	11

The above doesn’t actually look much different from reviews.iloc[0:5,:]. This is because while row labels can take on any values, our row labels match the positions exactly. You can see the row labels on the very left of the table above (they’re in bold). You can also see them by accessing the index property of a DataFrame. We’ll display the row indexes for reviews:

reviews.index

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')

Indexes don’t always have to match up with positions, though. In the below code cell, we’ll:

Get rows 10 through 19 of reviews (positions 10 up to, but not including, 20), and assign the result to some_reviews.
Display the first 5 rows of some_reviews.

some_reviews = reviews.iloc[10:20,]
some_reviews.head()

	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
10	Good	Tekken Tag Tournament 2	/games/tekken-tag-tournament-2/ps3-124584	PlayStation 3	7.5	Fighting	N	2012	9	11
11	Good	Tekken Tag Tournament 2	/games/tekken-tag-tournament-2/xbox-360-124581	Xbox 360	7.5	Fighting	N	2012	9	11
12	Good	Wild Blood	/games/wild-blood/iphone-139363	iPhone	7.0	NaN	N	2012	9	10
13	Amazing	Mark of the Ninja	/games/mark-of-the-ninja-135615/xbox-360-129276	Xbox 360	9.0	Action, Adventure	Y	2012	9	7
14	Amazing	Mark of the Ninja	/games/mark-of-the-ninja-135615/pc-143761	PC	9.0	Action, Adventure	Y	2012	9	7

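As you can see above, in some_reviews, the row indexes start at 10 and end at 19. In the pandas version used here, passing loc a slice whose bounds fall outside those labels, such as 9:21, results in an error. (More recent pandas versions accept out-of-range slice bounds on a sorted index and simply return the matching rows; looking up a single missing label, as in some_reviews.loc[9], still raises a KeyError.)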

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
 in ()
----> 1 some_reviews.loc[9:21,:]

/Users/vik/python_envs/dsserver/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1198     def __getitem__(self, key):
   1199         if type(key) is tuple:
-> 1200             return self._getitem_tuple(key)
   1201         else:
   1202             return self._getitem_axis(key, axis=0)

/Users/vik/python_envs/dsserver/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    702 
    703         # no multi-index, so validate all of the indexers
--> 704         self._has_valid_tuple(tup)
    705 
    706         # ugly hack for GH #836

/Users/vik/python_envs/dsserver/lib/python3.4/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    129             if i >= self.obj.ndim:
    130                 raise IndexingError('Too many indexers')
--> 131             if not self._has_valid_type(k, i):
    132                 raise ValueError("Location based indexing can only have [%s] "
    133                                  "types" % self._valid_types)

/Users/vik/python_envs/dsserver/lib/python3.4/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1258                         raise KeyError(
   1259                             "start bound [%s] is not the [%s]" %
-> 1260                             (key.start, self.obj._get_axis_name(axis))
   1261                         )
   1262                 if key.stop is not None:

KeyError: 'start bound [9] is not the [index]'
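
The gap between labels and positions can be reproduced on a small frame. This is a minimal sketch using made-up scores rather than the IGN data, and it sticks to behavior that should hold across pandas versions: labels survive a positional slice, a loc label slice includes both endpoints, and a single missing label raises a KeyError.

```python
import pandas as pd

# Made-up scores with default row labels 0..29 (not the IGN data).
reviews = pd.DataFrame({"score": [float(i % 10) for i in range(30)]})

# Positions 10 up to, but not including, 20. The labels 10..19 come along.
some_reviews = reviews.iloc[10:20, :]
print(some_reviews.index.tolist())

# iloc counts from 0 again, so position 0 is the row labelled 10.
print(some_reviews.iloc[0].name)

# loc looks up labels, and a label slice includes both endpoints.
print(some_reviews.loc[10:12, "score"].tolist())

# A single label that is absent from the index raises KeyError.
try:
    some_reviews.loc[9]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```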

As we mentioned earlier, column labels can make life much easier when you’re working with data. We can specify column labels in the loc method to retrieve columns by label instead of by position.

reviews.loc[:5,"score"]

0    9.0
1    9.0
2    8.5
3    8.5
4    8.5
5    7.0
Name: score, dtype: float64

We can also specify more than one column at a time by passing in a list:

reviews.loc[:5,["score", "release_year"]]

	score	release_year
0	9.0	2012
1	9.0	2012
2	8.5	2012
3	8.5	2012
4	8.5	2012
5	7.0	2012

Pandas Series Objects

We can retrieve an individual column in Pandas a few different ways. So far, we’ve seen two types of syntax for this:

reviews.iloc[:,1] – will retrieve the second column.
reviews.loc[:,"score_phrase"] – will also retrieve the second column.

There’s a third, even easier, way to retrieve a whole column. We can just specify the column name in square brackets, like with a dictionary:

reviews["score"]

0     9.0
1     9.0
2     8.5
3     8.5
4     8.5
5     7.0
6     3.0
7     9.0
8     3.0
9     7.0
10    7.5
11    7.5
12    7.0
13    9.0
14    9.0
...
18610     6.0
18611     5.8
18612     7.8
18613     8.0
18614     9.2
18615     9.2
18616     7.5
18617     8.4
18618     9.1
18619     7.9
18620     7.6
18621     9.0
18622     5.8
18623    10.0
18624    10.0
Name: score, Length: 18625, dtype: float64

We can also use lists of columns with this method:

reviews[["score", "release_year"]]

	score	release_year
0	9.0	2012
1	9.0	2012
2	8.5	2012
3	8.5	2012
4	8.5	2012
5	7.0	2012
6	3.0	2012
7	9.0	2012
8	3.0	2012
9	7.0	2012
10	7.5	2012
11	7.5	2012
12	7.0	2012
13	9.0	2012
14	9.0	2012
15	6.5	2012
16	6.5	2012
17	8.0	2012
18	5.5	2012
19	7.0	2012
20	7.0	2012
21	7.5	2012
22	7.5	2012
23	7.5	2012
24	9.0	2012
25	7.0	2012
26	9.0	2012
27	7.5	2012
28	8.0	2012
29	6.5	2012
…	…	…
18595	4.4	2016
18596	6.5	2016
18597	4.9	2016
18598	6.8	2016
18599	7.0	2016
18600	7.4	2016
18601	7.4	2016
18602	7.4	2016
18603	7.8	2016
18604	8.6	2016
18605	6.0	2016
18606	6.4	2016
18607	7.0	2016
18608	5.4	2016
18609	8.0	2016
18610	6.0	2016
18611	5.8	2016
18612	7.8	2016
18613	8.0	2016
18614	9.2	2016
18615	9.2	2016
18616	7.5	2016
18617	8.4	2016
18618	9.1	2016
18619	7.9	2016
18620	7.6	2016
18621	9.0	2016
18622	5.8	2016
18623	10.0	2016
18624	10.0	2016

18625 rows × 2 columns

When we retrieve a single column, we’re actually retrieving a Pandas Series object. A DataFrame stores tabular data, but a Series stores a single column or row of data.

We can verify that a single column is a Series:

type(reviews["score"])

pandas.core.series.Series

We can create a Series manually to better understand how it works. To create a Series, we pass a list or NumPy array into the Series object when we instantiate it:

s1 = pd.Series([1,2])
s1

0    1
1    2
dtype: int64

A Series can contain any type of data, including mixed types. Here, we create a Series that contains string objects:

s2 = pd.Series(["Boris Yeltsin", "Mikhail Gorbachev"])
s2

0        Boris Yeltsin
1    Mikhail Gorbachev
dtype: object
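
To make the point about mixed types concrete, here is a small sketch. The names numbers, names, and mixed are chosen for this illustration only; the third Series combines an integer, a string, and a float in one column.

```python
import pandas as pd

numbers = pd.Series([1, 2])
names = pd.Series(["Boris Yeltsin", "Mikhail Gorbachev"])

# One Series holding an int, a string, and a float at the same time.
mixed = pd.Series([1, "Boris Yeltsin", 2.5])

print(numbers.dtype)   # int64
print(mixed.dtype)     # object
print(names.iloc[0])   # Boris Yeltsin
```

When the values do not share a single numeric type, pandas falls back to the generic object dtype, which stores plain Python objects.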

Creating A DataFrame in Pandas

We can create a DataFrame by passing multiple Series into the DataFrame class. Here, we pass in the two Series objects we just created, s1 as the first row, and s2 as the second row:

pd.DataFrame([s1,s2])

	0	1
0	1	2
1	Boris Yeltsin	Mikhail Gorbachev

We can also accomplish the same thing with a list of lists. Each inner list is treated as a row in the resulting DataFrame:

pd.DataFrame(
    [
        [1,2],
        ["Boris Yeltsin", "Mikhail Gorbachev"]
    ]
)

	0	1
0	1	2
1	Boris Yeltsin	Mikhail Gorbachev

We can specify the column labels when we create a DataFrame:

pd.DataFrame(
    [
        [1,2],
        ["Boris Yeltsin", "Mikhail Gorbachev"]
    ],
    columns=["column1", "column2"]
)

	column1	column2
0	1	2
1	Boris Yeltsin	Mikhail Gorbachev

We can specify the row labels (the index) as well:

frame = pd.DataFrame(
    [
        [1,2],
        ["Boris Yeltsin", "Mikhail Gorbachev"]
    ],
    index=["row1", "row2"],
    columns=["column1", "column2"]
)
frame

	column1	column2
row1	1	2
row2	Boris Yeltsin	Mikhail Gorbachev

We’re then able to index the DataFrame using the labels:

frame.loc["row1":"row2", "column1"]

row1                1
row2    Boris Yeltsin
Name: column1, dtype: object
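
As a self-contained sketch of label-based lookups on such a frame: the construction below assumes the same row1/row2 and column1/column2 labels shown in the output above, and from_dict is a name introduced only for this example.

```python
import pandas as pd

frame = pd.DataFrame(
    [[1, 2], ["Boris Yeltsin", "Mikhail Gorbachev"]],
    index=["row1", "row2"],
    columns=["column1", "column2"],
)

# Label slices include both endpoints, so this returns both rows.
column1 = frame.loc["row1":"row2", "column1"]
print(column1.tolist())

# A single cell can be read by row label and column label.
print(frame.loc["row2", "column2"])

# The same values built from a dictionary of columns instead of a list of rows.
from_dict = pd.DataFrame(
    {"column1": [1, "Boris Yeltsin"], "column2": [2, "Mikhail Gorbachev"]},
    index=["row1", "row2"],
)
print(from_dict.values.tolist() == frame.values.tolist())
```

A list of lists is read row by row, while a dictionary is read column by column, which is why the dictionary values look transposed relative to the inner lists.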

We can skip specifying the columns keyword argument if we pass a dictionary into the DataFrame constructor. This will automatically set up column names:

pd.DataFrame(
    {
        "column1": [1, "Boris Yeltsin"],
        "column2": [2, "Mikhail Gorbachev"]
    }
)

	column1	column2
0	1	2
1	Boris Yeltsin	Mikhail Gorbachev

Pandas DataFrame Methods

As we mentioned earlier, each column in a DataFrame is a Series object:

type(reviews["title"])

pandas.core.series.Series

We can call most of the same methods on a Series object that we can on a DataFrame, including head:

reviews["title"].head()

0                              LittleBigPlanet PS Vita
1    LittleBigPlanet PS Vita -- Marvel Super Hero E...
2                                 Splice: Tree of Life
3                                               NHL 13
4                                               NHL 13
Name: title, dtype: object

Pandas Series and DataFrames also have other methods that make calculations simpler. For example, we can use the pandas.Series.mean method to find the mean of a Series:

reviews["score"].mean()

6.950459060402685

We can also call the similar pandas.DataFrame.mean method, which finds the mean of each numerical column in a DataFrame. (In the pandas version used here, non-numeric columns are skipped by default; recent versions need numeric_only=True for that.)

reviews.mean()

score               6.950459
release_year     2006.515329
release_month       7.138470
release_day        15.603866
dtype: float64

We can modify the axis keyword argument to mean in order to compute the mean of each row or of each column. By default, axis is equal to 0, and will compute the mean of each column. We can also set it to 1 to compute the mean of each row. Note that this will only compute the mean of the numerical values in each row:

reviews.mean(axis=1)

0     510.500
1     510.500
2     510.375
3     510.125
4     510.125
5     509.750
6     508.750
7     510.250
8     508.750
9     509.750
10    509.875
11    509.875
12    509.500
13    509.250
14    509.250
...
18610    510.250
18611    508.700
18612    509.200
18613    508.000
18614    515.050
18615    515.050
18616    508.375
18617    508.600
18618    515.025
18619    514.725
18620    514.650
18621    515.000
18622    513.950
18623    515.000
18624    515.000
Length: 18625, dtype: float64
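
The effect of axis can be checked by hand on a tiny frame. This is a sketch with made-up rows, not the IGN data. It passes numeric_only=True explicitly, on the assumption that the installed pandas may not skip text columns on its own, as recent versions do not.

```python
import pandas as pd

# Made-up rows shaped like a few of the reviews columns.
df = pd.DataFrame(
    {
        "title": ["Game A", "Game B"],
        "score": [9.0, 7.0],
        "release_year": [2012, 2016],
    }
)

# numeric_only=True leaves the text column out of the calculation.
column_means = df.mean(numeric_only=True)        # axis=0: one value per column
row_means = df.mean(axis=1, numeric_only=True)   # axis=1: one value per row

print(column_means["score"])          # 8.0
print(column_means["release_year"])   # 2014.0
print(row_means.tolist())             # [1010.5, 1011.5]
```

The row means mix scores and years into one number, which mirrors why the row-wise output above sits near 510: it is an average of unrelated quantities and is shown only to illustrate the axis argument.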

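There are quite a few methods on Series and DataFrames that behave like mean. Here are some handy ones:

pandas.DataFrame.corr – finds the correlation between columns in a DataFrame.
pandas.DataFrame.count – counts the number of non-null values in each DataFrame column.
pandas.DataFrame.max – finds the highest value in each column.
pandas.DataFrame.min – finds the lowest value in each column.
pandas.DataFrame.median – finds the median of each column.
pandas.DataFrame.std – finds the standard deviation of each column.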

We can use the corr method to see if any columns correlate with score. For instance, this would tell us if games released more recently have been getting higher reviews (release_year), or if games released towards the end of the year score better (release_month):

reviews.corr()

	score	release_year	release_month	release_day
score	1.000000	0.062716	0.007632	0.020079
release_year	0.062716	1.000000	-0.115515	0.016867
release_month	0.007632	-0.115515	1.000000	-0.067964
release_day	0.020079	0.016867	-0.067964	1.000000

As you can see above, none of our numeric columns correlates with score, meaning that release timing doesn’t linearly relate to review score.
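
For experimenting with corr on a frame that also holds text columns, one option is to select the numeric columns first. The sketch below uses made-up rows (not the IGN data), and the select_dtypes step is a choice made for this example rather than something the analysis above relies on.

```python
import pandas as pd

# Made-up rows: scores fall as the release year rises.
df = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "score": [9.0, 8.5, 7.0, 6.0],
        "release_year": [2012, 2013, 2014, 2015],
        "release_month": [9, 9, 3, 12],
    }
)

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

print(corr.shape)     # (3, 3)
print(corr["score"])  # correlation of every numeric column with score
```

Each value lies between -1 and 1, the diagonal is 1 because every column correlates perfectly with itself, and the matrix is symmetric.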

DataFrame Math with Pandas

We can also perform math operations on Series or DataFrame objects. For example, we can divide every value in the score column by 2 to switch the scale from 0–10 to 0–5:

reviews["score"] / 2

0     4.50
1     4.50
2     4.25
3     4.25
4     4.25
5     3.50
6     1.50
7     4.50
8     1.50
9     3.50
10    3.75
11    3.75
12    3.50
13    4.50
14    4.50
...
18610    3.00
18611    2.90
18612    3.90
18613    4.00
18614    4.60
18615    4.60
18616    3.75
18617    4.20
18618    4.55
18619    3.95
18620    3.80
18621    4.50
18622    2.90
18623    5.00
18624    5.00
Name: score, Length: 18625, dtype: float64

All the common mathematical operators that work in Python, like +, -, *, /, and ** (exponentiation), will work, and will apply to each element in a DataFrame or a Series. Note that ^ is Python’s bitwise XOR operator, not exponentiation.
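
A quick sketch of element-wise arithmetic, using three made-up scores. It uses ** for powers, since in Python ^ means bitwise XOR rather than exponentiation.

```python
import pandas as pd

scores = pd.Series([9.0, 8.5, 7.0])

halved = scores / 2      # every element divided by 2
shifted = scores + 1     # every element increased by 1
squared = scores ** 2    # every element raised to the power 2

print(halved.tolist())   # [4.5, 4.25, 3.5]
print(shifted.tolist())  # [10.0, 9.5, 8.0]
print(squared.tolist())  # [81.0, 72.25, 49.0]
```

Each operation returns a new Series and leaves scores itself unchanged.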

Boolean Indexing in Pandas

As we saw above, the mean of all the values in the score column of reviews is around 7. What if we wanted to find all the games that got an above average score? We could start by doing a comparison. The comparison compares each value in a Series to a specified value, then generates a Series full of Boolean values indicating the status of the comparison. For example, we can see which of the rows have a score value higher than 7:

score_filter = reviews["score"] > 7
score_filter

0      True
1      True
2      True
3      True
4      True
5     False
6     False
7      True
8     False
9     False
10     True
11     True
12    False
13     True
14     True
...
18610    False
18611    False
18612     True
18613     True
18614     True
18615     True
18616     True
18617     True
18618     True
18619     True
18620     True
18621     True
18622    False
18623     True
18624     True
Name: score, Length: 18625, dtype: bool

Once we have a Boolean Series, we can use it to select only the rows in a DataFrame where the Series contains the value True. So, we can select only the rows in reviews where score is greater than 7:

filtered_reviews = reviews[score_filter]
filtered_reviews.head()

	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
0	Amazing	LittleBigPlanet PS Vita	/games/littlebigplanet-vita/vita-98907	PlayStation Vita	9.0	Platformer	Y	2012	9	12
1	Amazing	LittleBigPlanet PS Vita -- Marvel Super Hero E…	/games/littlebigplanet-ps-vita-marvel-super-he…	PlayStation Vita	9.0	Platformer	Y	2012	9	12
2	Great	Splice: Tree of Life	/games/splice/ipad-141070	iPad	8.5	Puzzle	N	2012	9	12
3	Great	NHL 13	/games/nhl-13/xbox-360-128182	Xbox 360	8.5	Sports	N	2012	9	11
4	Great	NHL 13	/games/nhl-13/ps3-128181	PlayStation 3	8.5	Sports	N	2012	9	11

It’s possible to use multiple conditions for filtering. Let’s say we want to find games released for the Xbox One that have a score of more than 7. In the code below, we:

Set up a filter with two conditions:
- Check if score is greater than 7.
- Check if platform equals Xbox One.
Apply the filter to reviews to get only the rows we want.
Use the head method to print the first 5 rows of filtered_reviews.
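The steps above can be sketched as follows. This is a minimal example under assumptions: reviews is a hypothetical four-row stand-in for the real DataFrame (three rows borrow values shown in this section, the fourth is made up so that one Xbox One game fails the score condition), and the name xbox_one_filter is illustrative.

```python
import pandas as pd

# Hypothetical stand-in for reviews; "Example Game" is an invented row for illustration.
reviews = pd.DataFrame({
    "title": ["Gone Home", "Dead Rising 3", "NHL 13", "Example Game"],
    "platform": ["Xbox One", "Xbox One", "Xbox 360", "Xbox One"],
    "score": [9.5, 8.3, 8.5, 6.0],
})

# Set up a filter with two conditions, each wrapped in parentheses.
xbox_one_filter = (reviews["score"] > 7) & (reviews["platform"] == "Xbox One")

# Apply the filter to keep only the matching rows.
filtered_reviews = reviews[xbox_one_filter]

# Print the first 5 rows (only 2 rows match in this small sample).
print(filtered_reviews.head())
```

Only rows where both conditions are True survive the filter, and the original index labels are preserved.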

	score_phrase	title	url	platform	score	genre	editors_choice	release_year	release_month	release_day
17137	Amazing	Gone Home	/games/gone-home/xbox-one-20014361	Xbox One	9.5	Simulation	Y	2013	8	15
17197	Amazing	Rayman Legends	/games/rayman-legends/xbox-one-20008449	Xbox One	9.5	Platformer	Y	2013	8	26
17295	Amazing	LEGO Marvel Super Heroes	/games/lego-marvel-super-heroes/xbox-one-20000826	Xbox One	9.0	Action	Y	2013	10	22
17313	Great	Dead Rising 3	/games/dead-rising-3/xbox-one-124306	Xbox One	8.3	Action	N	2013	11	18
17317	Great	Killer Instinct	/games/killer-instinct-2013/xbox-one-20000538	Xbox One	8.4	Fighting	N	2013	11	18

When filtering with multiple conditions, it’s important to put each condition in parentheses, and separate them with a single ampersand (&).
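The parentheses matter because & binds more tightly than comparison operators such as > and == in Python. The sketch below, using a hypothetical three-row stand-in for reviews, shows what happens when they are left out. The exact exception type and message may vary by pandas version, so the example catches both TypeError and ValueError.

```python
import pandas as pd

# Hypothetical stand-in for reviews, for illustration only.
reviews = pd.DataFrame({
    "platform": ["Xbox One", "PlayStation 4", "Xbox One"],
    "score": [9.5, 9.0, 6.0],
})

# Without parentheses, Python evaluates 7 & reviews["platform"] first,
# which is not the comparison we meant and fails.
try:
    reviews[reviews["score"] > 7 & reviews["platform"] == "Xbox One"]
    raised = False
except (TypeError, ValueError) as err:
    raised = True
    print("failed with", type(err).__name__)

# With parentheses, each comparison is evaluated first, then combined.
mask = (reviews["score"] > 7) & (reviews["platform"] == "Xbox One")
print(mask.tolist())  # [True, False, False]
```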

Pandas Plotting

Now that we know how to filter, we can create plots to compare the review distribution for the Xbox One with the review distribution for the PlayStation 4. We can do this via histograms, which plot the frequencies for different score ranges. This will tell us which console has more highly reviewed games.

We can make a histogram for each console using the pandas.DataFrame.plot method. This method utilizes matplotlib, the popular Python plotting library, under the hood to generate good-looking plots.

The plot method defaults to drawing a line graph. We’ll need to pass in the keyword argument kind="hist" to draw a histogram instead.

In the code below, we:

Call %matplotlib inline to set up plotting inside a Jupyter notebook.
Filter reviews to only have data about the Xbox One.
Plot the score column.

%matplotlib inline
reviews[reviews["platform"] == "Xbox One"]["score"].plot(kind="hist")

We can also do the same for the PS4:
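A minimal sketch of the PS4 version, under stated assumptions: reviews is a hypothetical four-row stand-in for the real DataFrame, the platform label "PlayStation 4" is assumed to match the value used in the dataset, and the Agg backend is selected only so the example runs as a plain script (inside a Jupyter notebook, %matplotlib inline would be used instead).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside a notebook
import pandas as pd

# Hypothetical stand-in for reviews; the scores are invented for illustration.
reviews = pd.DataFrame({
    "platform": ["PlayStation 4", "PlayStation 4", "PlayStation 4", "Xbox One"],
    "score": [9.0, 8.5, 7.0, 9.5],
})

# Filter to PS4 rows, select the score column, and draw a histogram.
ps4_scores = reviews[reviews["platform"] == "PlayStation 4"]["score"]
ax = ps4_scores.plot(kind="hist")
ax.set_xlabel("score")
```

The plot call returns a matplotlib Axes object, which can be used to adjust labels or to save the figure.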

It appears from our histogram that the PlayStation 4 has many more highly rated games than the Xbox One.

filtered_reviews["score"].hist()

Further Reading

Translated from: https://www.pybloggers.com/2016/10/pandas-tutorial-data-analysis-with-python-part-1/
