钱甫新

Simple pandas Exercises

Creating an xlsx (Excel) file

import pandas as pd

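'''
Parameters
----------
excel_writer : str or ExcelWriter object
    File path or an existing ExcelWriter object.
sheet_name : str, default 'Sheet1'
    Name of the worksheet that will hold the DataFrame.
na_rep : str, default ''
    How missing data is represented.
float_format : str, optional
    Format string for floating-point numbers.
    ``float_format="%.2f"`` will format 0.1234 to 0.12.
columns : sequence or list of str, optional
    The columns to write, chosen from the existing columns.
header : bool or list of str, default True
    Write out the column names. A list of str replaces the column names and
    must match the number of columns (five columns need a list of five str).
index : bool, default True
    Whether to write the row labels (index).
index_label : str or sequence, optional
    Column label(s) for the index column(s). A sequence is needed when the
    DataFrame uses a MultiIndex.
startrow : int, default 0
    Row of the upper-left cell at which writing starts.
startcol : int, default 0
    Column of the upper-left cell at which writing starts.
engine : str, optional
    Write engine to use: 'openpyxl' or 'xlsxwriter'. It can also be set
    through the options ``io.excel.xlsx.writer``, ``io.excel.xls.writer``
    and ``io.excel.xlsm.writer``.
merge_cells : bool, default True
    Write MultiIndex and hierarchical rows as merged cells.
encoding : str, optional
    Encoding of the file. Only needed for xlwt; the other writers support
    Unicode natively. (Deprecated in pandas 1.5 and removed in 2.0.)
inf_rep : str, default 'inf'
    How infinity is represented.
verbose : bool, default True
    Show more information in the error logs.
    (Deprecated in pandas 1.5 and removed in 2.0.)
freeze_panes : tuple of int (length 2), optional
    The bottommost row and rightmost column to freeze.
'''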

if __name__ == '__main__':
    # [['a', 'b'], ['c', 'd']] are the two rows of data,
    # ['row 1', 'row 2'] are the index values,
    # ['col 1', 'col 2'] are the column names.
    df1 = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
    df1.to_excel('demo1.xlsx', sheet_name='第一薄')

    # Write several worksheets into one Excel file
    df2 = pd.DataFrame([['1', '2'], ['3', '4']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
    df3 = pd.DataFrame([['+', '-'], ['*', '/']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
    with pd.ExcelWriter('output.xlsx') as writer:
        df2.to_excel(writer, sheet_name='第二薄')
        df3.to_excel(writer, sheet_name='第三薄')

Reading an xlsx (Excel) file

import pandas as pd

'''io : str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid
    URL schemes include http, ftp, s3, and file. For file URLs, a host is
    expected. A local file could be: ``file://localhost/path/to/table.xlsx``.
    If you want to pass in a path object, pandas accepts any ``os.PathLike``.

    By file-like object, we refer to objects with a ``read()`` method,
    such as a file handler (e.g. via builtin ``open`` function)
    or ``StringIO``.
    
    The path to the file: either a local file or a file on the network.
  
sheet_name : str, int, list, or None, default 0
    Strings are used for sheet names. Integers are used in zero-indexed
    sheet positions. Lists of strings/integers are used to request
    multiple sheets. Specify None to get all sheets.

    Available cases:
    * Defaults to ``0``: 1st sheet as a `DataFrame`
    * ``1``: 2nd sheet as a `DataFrame`
    * ``"Sheet1"``: Load sheet with name "Sheet1"
    * ``[0, 1, "Sheet5"]``: Load first, second and sheet named "Sheet5"
      as a dict of `DataFrame`
    * None: All sheets.
    
    The default 0 is the first worksheet, 1 is the second, and so on.
    A worksheet name can be given instead.
    None selects every worksheet.

header : int, list of int, default 0
    Row (0-indexed) to use for the column labels of the parsed
    DataFrame. If a list of integers is passed those row positions will
    be combined into a ``MultiIndex``. Use None if there is no header.
    
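    The default 0 treats the first row as the header, that is, the row of column names.
    None means the sheet has no header row.

    When is header useful? If the first few rows are blank, point header at the row that holds the column names to skip them.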
    
names : array-like, default None
    List of column names to use. If file contains no header row,
    then you should explicitly pass header=None.
    
    Renames the columns. If the sheet has no header row, pass header=None as well.
    
index_col : int, list of int, default None
    Column (0-indexed) to use as the row labels of the DataFrame.
    Pass None if there is no such column.  If a list is passed,
    those columns will be combined into a ``MultiIndex``.  If a
    subset of data is selected with ``usecols``, index_col
    is based on the subset.
    
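    The default None uses no column of the sheet as the index; a new integer index is generated instead.
    0 uses the first column as the index, and so on.
    A column name can be used in place of the number.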
    
usecols : int, str, list-like, or callable default None
    * If None, then parse all columns.
    * If str, then indicates comma separated list of Excel column letters
      and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
      both sides.
    * If list of int, then indicates list of column numbers to be parsed.
    * If list of string, then indicates list of column names to be parsed.

      .. versionadded:: 0.24.0

    * If callable, then evaluate each column name against it and parse the
      column if the callable returns ``True``.

    Returns a subset of the columns according to behavior above.

      .. versionadded:: 0.24.0

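    The columns you want to read, in one of these forms:
    1. None reads all columns
    2. 'A:F' reads columns A through F (both A and F included)
    3. [0, 1, 2] reads the first, second and third columns
    4. ['ID', 'money'] reads the columns named ID and money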
    
squeeze : bool, default False
    If the parsed data only contains one column then return a Series.
    
    If the parsed data has only one column, a Series is returned.
    
dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
    Use `object` to preserve data as stored in Excel and not interpret dtype.
    If converters are specified, they will be applied INSTEAD
    of dtype conversion.
    
    Changes the data type of selected columns.
    dtype={'ID': np.float64, 'money': np.int32} reads the ID column as float64 and the money column as int32 (np is numpy).
    
engine : str, default None
    If io is not a buffer or path, this must be set to identify io.
    Acceptable values are None, "xlrd", "openpyxl" or "odf".
    
    The library used to parse the file.
    
converters : dict, default None
    Dict of functions for converting values in certain columns. Keys can
    either be integers or column labels, values are functions that take one
    input argument, the Excel cell content, and return the transformed
    content.
    
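    Transformations to apply to the values of selected columns.
    converters={'time': lambda x: x.split('.')[0]} splits each value of the time column on '.' and keeps the first piece.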
    
true_values : list, default None

    Values to consider as True.
    
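    Values to read as True. Best used together with false_values.
    A column is converted to booleans only when every value in it maps to True or False; otherwise the column is left unchanged.
    For example, with four exam grades a, b, c, d where a and b pass and c and d fail:
    true_values=['a', 'b'], false_values=['c', 'd']
    
false_values : list, default None

    Values to consider as False.
    
    Values to read as False. Best used together with true_values.
    A column is converted to booleans only when every value in it maps to True or False; otherwise the column is left unchanged.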

    
skiprows : list-like
    Rows to skip at the beginning (0-indexed).
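    Skips rows at the start of the sheet.
    skipfooter does the opposite and skips rows at the end.
    
nrows : int, default None
    Number of rows to parse.
    .. versionadded:: 0.23.0
    
    The number of rows to parse; in effect, only the first nrows rows are read.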

na_values : scalar, str, list-like, or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific
    per-column NA values. By default the following values are interpreted
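    as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN',
    '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a',
    'nan', 'null'.
    
    The values you want treated as NA.
    For example, with four exam grades A, B, C, D, to treat D as NA:
    na_values='D'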
    
keep_default_na : bool, default True
    Whether or not to include the default NaN values when parsing the data.
    Depending on whether `na_values` is passed in, the behavior is as follows:

    * If `keep_default_na` is True, and `na_values` are specified, `na_values`
      is appended to the default NaN values used for parsing.
    * If `keep_default_na` is True, and `na_values` are not specified, only
      the default NaN values are used for parsing.
    * If `keep_default_na` is False, and `na_values` are specified, only
      the NaN values specified `na_values` are used for parsing.
    * If `keep_default_na` is False, and `na_values` are not specified, no
      strings will be parsed as NaN.

    Note that if `na_filter` is passed in as False, the `keep_default_na` and
    `na_values` parameters will be ignored.
    
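    If na_filter is False, no value is treated as NaN.
    The combinations work as follows:
    keep_default_na    na_values        result
       True            specified        the default values and the added values are NaN
       True            not specified    only the default values are NaN
       False           specified        only the added values are NaN
       False           not specified    no value is NaN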
     
na_filter : bool, default True
    Detect missing value markers (empty strings and the value of na_values). In
    data without any NAs, passing na_filter=False can improve the performance
    of reading a large file.
    
    Detects missing values, at some cost. If the data contains no NAs, setting na_filter=False speeds up reading a large file.
    
verbose : bool, default False
    Indicate number of NA values placed in non-numeric columns.
    
    Prints extra information while parsing, namely how many NA values were placed in non-numeric columns.
    
parse_dates : bool, list-like, or dict, default False
    The behavior is as follows:

    * bool. If True -> try parsing the index.
    * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
      each as a separate date column.
    * list of lists. e.g.  If [[1, 3]] -> combine columns 1 and 3 and parse as
      a single date column.
    * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call
      result 'foo'

    If a column or index contains an unparseable date, the entire column or
    index will be returned unaltered as an object data type. If you don`t want to
    parse some cells as date just change their type in Excel to "Text".
    For non-standard datetime parsing, use ``pd.to_datetime`` after ``pd.read_excel``.

    Note: A fast-path exists for iso8601-formatted dates.
    
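    With parse_dates=True, pandas tries to parse the index as dates and leaves it unchanged if that fails.
    With parse_dates=['one', 'two'], pandas tries to parse those two columns as dates and leaves them unchanged if that fails.
    With parse_dates=[['year', 'month', 'day']], pandas combines the three columns and parses them as a single date column; use this when year, month and day are stored in separate columns.
    With parse_dates={'年月日': ['year', 'month', 'day']}, the three columns are combined into one and the new column is named '年月日'.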
    
date_parser : function, optional
    Function to use for converting a sequence of string columns to an array of
    datetime instances. The default uses ``dateutil.parser.parser`` to do the
    conversion. Pandas will try to call `date_parser` in three different ways,
    advancing to the next if an exception occurs: 1) Pass one or more arrays
    (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the
    string values from the columns defined by `parse_dates` into a single array
    and pass that; and 3) call `date_parser` once for each row using one or
    more strings (corresponding to the columns defined by `parse_dates`) as
    arguments.
    
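    For pandas to read a column as dates, the values must be in a format it recognizes, such as 2020-01-01 or 2020.01.01.
    A column written like 2020年1月1日 (Chinese date text) is not recognized, so supply a parser:
    parse_dates=['测试'],
    date_parser=lambda x: pd.to_datetime(x, format='%Y年%m月%d日')
    PS: '测试' is the column that holds values such as 2020年1月1日.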
    
thousands : str, default None
    Thousands separator for parsing string columns to numeric.  Note that
    this parameter is only necessary for columns stored as TEXT in Excel,
    any numeric columns will automatically be parsed, regardless of display
    format.
    
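    thousands handles numbers written with a thousands separator, and only applies to cells stored as text.
    Long numbers are often split with ',' to make them easier to read,
    for example 123,111,222
    The ',' is not part of the data. Passing thousands=',' strips it, much like a replace that turns every comma into nothing.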
    
comment : str, default None
    Comments out remainder of line. Pass a character or characters to this
    argument to indicate comments in the input file. Any data between the
    comment string and the end of the current line is ignored.
    
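    The same idea as comments in a programming language, which are not compiled.
    If comment is set to some string, that string and everything after it on the same row are read as None.
    
skipfooter : int, default 0
    Rows at the end to skip (0-indexed).
    
    Skips this many rows at the end of the sheet.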
    
convert_float : bool, default True
    Convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
    data will be read in as floats: Excel stores all numbers as floats
    internally.
    
    With convert_float=False, every numeric value is read as a float.
    
mangle_dupe_cols : bool, default True
    Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
    'X'...'X'. Passing in False will cause data to be overwritten if there
    are duplicate names in the columns.
    
    With mangle_dupe_cols=True, a repeated column name X is renamed X.1, X.2, ... so every column is kept.
    With mangle_dupe_cols=False, columns that share a name overwrite one another (the later column replaces the earlier one).
    
**kwds : optional
        Optional keyword arguments can be passed to ``TextFileReader``.

Returns
-------
DataFrame or dict of DataFrames
    DataFrame from the passed in Excel file. See notes in sheet_name
    argument for more information on when a dict of DataFrames is returned.

See Also
--------
to_excel : Write DataFrame to an Excel file.
to_csv : Write DataFrame to a comma-separated values (csv) file.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_fwf : Read a table of fixed-width formatted lines into DataFrame.

Examples
--------
The file can be read using the file name as string or an open file object:

>>> pd.read_excel('tmp.xlsx', index_col=0)  # doctest: +SKIP
       Name  Value
0   string1      1
1   string2      2
2  #Comment      3

>>> pd.read_excel(open('tmp.xlsx', 'rb'),
...               sheet_name='Sheet3')  # doctest: +SKIP
   Unnamed: 0      Name  Value
0           0   string1      1
1           1   string2      2
2           2  #Comment      3

Index and header can be specified via the `index_col` and `header` arguments

>>> pd.read_excel('tmp.xlsx', index_col=None, header=None)  # doctest: +SKIP
     0         1      2
0  NaN      Name  Value
1  0.0   string1      1
2  1.0   string2      2
3  2.0  #Comment      3

Column types are inferred but can be explicitly specified

>>> pd.read_excel('tmp.xlsx', index_col=0,
...               dtype={'Name': str, 'Value': float})  # doctest: +SKIP
       Name  Value
0   string1    1.0
1   string2    2.0
2  #Comment    3.0

True, False, and NA values, and thousands separators have defaults,
but can be explicitly specified, too. Supply the values you would like
as strings or lists of strings!

>>> pd.read_excel('tmp.xlsx', index_col=0,
...               na_values=['string1', 'string2'])  # doctest: +SKIP
       Name  Value
0       NaN      1
1       NaN      2
2  #Comment      3

Comment lines in the excel input file can be skipped using the `comment` kwarg

>>> pd.read_excel('tmp.xlsx', index_col=0, comment='#')  # doctest: +SKIP
      Name  Value
0  string1    1.0
1  string2    2.0
2     None    NaN
'''

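# convert_float was deprecated in pandas 1.3 and removed in 2.0; drop the argument on newer versions
df1 = pd.read_excel('/Users/apple/Desktop/test.xlsx', convert_float=False)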

print(df1)
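
Several of the parameters annotated above (na_values, true_values, false_values) are also accepted by pd.read_csv, so their effect can be tried without an Excel file. The following is only a sketch under that assumption; the in-memory CSV text and its column names are made up for the illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, standing in for a worksheet
csv_text = "name,grade,passed\nAnn,A,yes\nBob,D,no\nCid,B,yes\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"grade": ["D"]},  # treat grade D as missing, in that column only
    true_values=["yes"],         # read "yes" as True
    false_values=["no"],         # read "no" as False
)
print(df)
```

The passed column becomes boolean because every value in it is covered by true_values or false_values; the name column is left alone.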

Working with rows, columns and cells (Series)

import pandas as pd

# A Series is one row (or one column) of data. Whether it ends up as a row or a
# column depends on whether it is passed to the DataFrame in a list or in a dict.
s1 = pd.Series([1, 2, 3], index=[1, 2, 3], name='A')
s2 = pd.Series([10, 20, 30], index=[1, 2, 3], name='B')
s3 = pd.Series([100, 200, 300], index=[1, 2, 3], name='C')

# Insert as rows
data = pd.DataFrame([s1, s2, s3])

# Insert as columns
data = pd.DataFrame({s1.name: s1, s2.name: s2, s3.name: s3})
print(data)
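
To make the row/column distinction concrete, here is a small sketch that builds both layouts from the same two Series and compares them. The variable names by_rows and by_cols are illustrative only.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[1, 2, 3], name='A')
s2 = pd.Series([10, 20, 30], index=[1, 2, 3], name='B')

# A list of Series: each Series becomes one row, labeled by its name
by_rows = pd.DataFrame([s1, s2])

# A dict of Series: each Series becomes one column, labeled by its key
by_cols = pd.DataFrame({s1.name: s1, s2.name: s2})

print(by_rows)
print(by_cols)
```

The two results are transposes of each other.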

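Filling with numbers and dates


from datetime import date, datetime

import pandas as pd

readPath=''

test = pd.read_excel(readPath)

# Locate a cell, then assign to it by whatever rule you need.
# To read a cell, test['ID'].at[i] is equivalent to test.at[i, 'ID'].
# To write one, use test.at[i, 'ID'] so the assignment reaches the DataFrame itself.

# A column that contains NaN is read as floats by default

# Fill with consecutive numbers
for i in test.index:
    test.at[i, 'ID'] = i + 1

# Fill with dates

# Add a date column (three values, so the sheet is expected to have three rows)
s1 = pd.Series(['2020-11-01', '', ''], index=[0, 1, 2], name='time')
test[s1.name] = s1.values

# Take the first cell and parse it as a date
firstData = datetime.strptime(test.at[0, 'time'], "%Y-%m-%d")

# Fill by adding one year per row
for i in test.index:
    test.at[i, 'time'] = date(firstData.year + i, firstData.month, firstData.day)

print(test)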

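# Fill by adding one month per row. December rolls over into the next year, so a small helper is needed.
def add_month(cellDate, addMonthNumber):
    year = cellDate.year + addMonthNumber // 12
    month = cellDate.month + addMonthNumber % 12
    # Carry into the next year when the month runs past December
    if month > 12:
        year += month // 12
        month = month % 12
    day = cellDate.day
    # date() raises ValueError if the day does not exist in the target month (e.g. day 31 in a 30-day month)
    return date(year, month, day)

for i in test.index:
    test.at[i, 'time'] = add_month(firstData, i)
print(test)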

# Fill by adding one day per row

# Counting days by hand means listing the length of all twelve months and
# telling leap years from common years, which is easy to get wrong at month
# and year boundaries (for example Jan 31 plus 30 days, or a span that crosses
# Feb 29). The addition is therefore delegated to datetime.timedelta.
from datetime import timedelta


# Return True if the year is a leap year
def judgeYears(years):
    return (years % 4 == 0 and years % 100 != 0) or (years % 400 == 0)


def add_day(cellDate, addDayNumber):
    # timedelta takes care of month lengths, year rollover and leap years
    start = date(cellDate.year, cellDate.month, cellDate.day)
    return start + timedelta(days=addDayNumber)


for i in test.index:
    test.at[i, 'time'] = add_day(firstData, i)

print(test)
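
As an alternative to hand-written date arithmetic, pandas can generate such sequences itself. The sketch below assumes a start date of 2020-11-01, as in the example above, and uses pd.date_range with pd.DateOffset; the variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2020-11-01")

# One value per row, stepping by a day, a month or a year
by_day = pd.date_range(start, periods=3, freq="D")
by_month = pd.date_range(start, periods=3, freq=pd.DateOffset(months=1))
by_year = pd.date_range(start, periods=3, freq=pd.DateOffset(years=1))

# DateOffset clips to the last valid day of the target month
end_of_jan = pd.Timestamp("2020-01-31")
one_month_later = end_of_jan + pd.DateOffset(months=1)

print(by_day, by_month, by_year, one_month_later, sep="\n")
```

A column can then be filled in one assignment, for example test['time'] = by_month, provided the lengths match.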

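Arithmetic between columns (discounted price = unit price * discount)

import pandas as pd

readPath=''

test = pd.read_excel(readPath)

# Column names in the sheet: '单价' = unit price, '销售额' = sales, '总金额' = total amount
# Multiplying whole columns is very concise, but it cannot target only part of the rows
test['总金额'] = test['单价'] * test['销售额']

# Operate on a chosen range of rows
for i in range(5, 10):
    test.at[i, '总金额'] = test.at[i, '单价'] * test.at[i, '销售额'] * 10

# apply runs the same operation on every element
# lambda can be thought of as a shorthand function
test['总金额'] = test['单价'].apply(lambda x: x + 2)
print(test)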
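
The row-range update can also be written without a loop by selecting the rows with loc. This is a minimal sketch on made-up data; the English column names price, qty and total are stand-ins for the columns of the real sheet.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0], "qty": [10, 20, 30, 40]})

# Whole-column product
df["total"] = df["price"] * df["qty"]

# Only rows labeled 1 through 2 (loc slices include both ends)
df.loc[1:2, "total"] = df.loc[1:2, "price"] * df.loc[1:2, "qty"] * 10
print(df)
```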

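Sorting data

import pandas as pd
readPath=''
sort=pd.read_excel(readPath)
print(sort)
# by: the column(s) to sort on; pass a list for several columns
# ascending: False sorts in descending order; pass a list for several columns
# inplace: True sorts the DataFrame in place instead of returning a new one
sort.sort_values(by=['单价','销售额'],ascending=[False,False],inplace=True)
print(sort)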
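
Each sort column can have its own direction. A small sketch on made-up data (the column names price and sales are illustrative) that sorts by price descending and breaks ties by sales ascending:

```python
import pandas as pd

df = pd.DataFrame({"price": [3, 5, 5], "sales": [9, 2, 1]})

# price from high to low; rows with equal price ordered by sales from low to high
ordered = df.sort_values(by=["price", "sales"], ascending=[False, True]).reset_index(drop=True)
print(ordered)
```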

Selecting and filtering data

import pandas as pd

readPath=''
filter = pd.read_excel(readPath)
# Drop the rows whose total amount ('总金额') is below 600
# loc plays the role of WHERE in SQL

filter = filter.loc[filter.总金额.apply(lambda x: x >= 600)]
print(filter)
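
The same filter can be expressed as a boolean mask, which avoids calling a Python function per element. A sketch on made-up data, with total standing in for the '总金额' column:

```python
import pandas as pd

df = pd.DataFrame({"total": [500, 600, 700]})

# The comparison yields a boolean Series; loc keeps the rows where it is True
mask = df["total"] >= 600
kept = df.loc[mask]
print(kept)
```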

Row operations

import pandas as pd

readPah = '/Users/apple/Desktop/test.xlsx'
re1 = pd.read_excel(readPah, sheet_name='Sheet1', dtype={'time': str})
re2 = pd.read_excel(readPah, sheet_name='Sheet2', dtype={'time': str})
# Concatenate two DataFrames and rebuild the index
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used throughout)
re1 = pd.concat([re1, re2]).reset_index(drop=True)

# Append a new Series as the last row of the table
newSeries = pd.Series({'ID': 41, 'money': 41, 'category': 3, 'time': '2020.1.1'})
re1 = pd.concat([re1, pd.DataFrame([newSeries.to_dict()])], ignore_index=True)

# Change one cell: a row label and a column name locate the cell, and at assigns to it
re1.at[37, 'category'] = 2

# Change cells by replacing the whole row
newSeries = pd.Series({'ID': 41, 'money': 50, 'category': 2, 'time': '2020.1.1'})
# iloc means "integer location": rows are addressed by position
re1.iloc[37] = newSeries

# Insert a row in the middle: cut the table at the insertion point, then stitch the pieces together
newSeries = pd.Series({'ID': 2, 'money': 50, 'category': 2, 'time': '2020.1.1'})
part1 = re1[:2]
part2 = re1[2:]
re1 = pd.concat([part1, pd.DataFrame([newSeries.to_dict()]), part2]).reset_index(drop=True)

# Delete rows by index label
re1.drop(index=[0, 1, 2], inplace=True)

# Delete rows by condition
# Delete every row whose category is 1
deleteRows = re1.loc[re1.category == 1]
re1.drop(index=deleteRows.index, inplace=True)
# The index has gaps after deletion, so rebuild it
re1 = re1.reset_index(drop=True)
print(re1)
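
The cut-and-stitch insertion can be wrapped in a small helper. The function insert_row below is a hypothetical name introduced for this sketch, not part of pandas, and the data is made up.

```python
import pandas as pd


def insert_row(df, position, row):
    """Return a copy of df with `row` (a dict) inserted before `position`."""
    new_row = pd.DataFrame([row])
    # Slice by position, place the new row between the halves, renumber the index
    return pd.concat([df.iloc[:position], new_row, df.iloc[position:]], ignore_index=True)


df = pd.DataFrame({"ID": [1, 3], "money": [10, 30]})
out = insert_row(df, 1, {"ID": 2, "money": 20})
print(out)
```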

Column operations

import numpy as np
import pandas as pd

readPah = '/Users/apple/Desktop/test.xlsx'
re1 = pd.read_excel(readPah, sheet_name='Sheet1', dtype={'time': str})
re2 = pd.read_excel(readPah, sheet_name='Sheet2', dtype={'time': str})
# concat with axis=1 joins side by side
# concat with axis=0 (the default) stacks top to bottom
re3 = pd.concat([re1, re2]).reset_index(drop=True)
# Add a column; a single value fills the whole column
# re3['newColumn']='ceshi'
# Add a column filled with numbers counting up from 0
re3['newColumn'] = np.arange(0, len(re3))
# Delete a column
re3.drop(columns='newColumn', inplace=True)
# Insert a column at position 1, that is, right after the first column
re3.insert(1, column='ceshi', value='3')
# Rename a column: ceshi becomes noceshi
re3.rename(columns={'ceshi': 'noceshi'}, inplace=True)
# Drop missing values (NaN): each row is scanned and removed entirely if it contains a NaN
re3.dropna(inplace=True)
print(re3)
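
The effect of the axis argument of pd.concat, and how it interacts with dropna, is easiest to see on two tiny frames. A sketch with made-up data:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"y": [3, 4]})

# axis=0 stacks the rows; columns missing on one side are filled with NaN
stacked = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the index
side_by_side = pd.concat([a, b], axis=1)

print(stacked)
print(side_by_side)
# dropna removes every row holding a NaN, which here removes all rows of `stacked`
print(stacked.dropna())
```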

Bar chart

import pandas as pd
import matplotlib.pyplot as plt
readPah='/Users/apple/Desktop/test.xlsx'
barShape=pd.read_excel(readPah)
print(barShape)

# If the values are in no particular order, sort them before charting
# Plot with matplotlib.pyplot
plt.bar(barShape.name,barShape.money,color='red')
# The keyword is fontsize (lowercase); fontSize is not accepted
plt.title('nameAndPrice',fontsize=30)

# Plot from the DataFrame
# barShape.plot.bar(x='name',y='money',title='nameAndPrice',color='red')

plt.tight_layout()

plt.show()

Grouped bar chart

import pandas as pd
import matplotlib.pyplot as plt
readPah='/Users/apple/Desktop/test.xlsx'
barShape=pd.read_excel(readPah)

# DataFrame.plot.bar is a wrapper over matplotlib, so it exposes a limited set of options,
# but matplotlib.pyplot can fill in what DataFrame.plot.bar leaves out

# Plot with DataFrame.plot.bar; pass a list to plot several y columns
barShape.plot.bar(x='name',y=['money','moreMoney'],color=['red','green'])

# Use matplotlib.pyplot to add what DataFrame.plot.bar leaves out
# Title: text, font size, font weight
plt.title('more bar',fontsize=20,fontweight='bold')
# x label: text, font weight, font size
plt.xlabel('name',fontweight='bold',fontsize=20)
# y label: text, font weight, font size
plt.ylabel('all money',fontweight='bold',fontsize=20)
# Get the current axes
ax=plt.gca()
# Set the x tick labels: rotation and horizontal alignment
ax.set_xticklabels(barShape.name,rotation=45,ha='right')
# Get the current figure
f=plt.gcf()
# Adjust the layout: leave 20% on the left and 42% at the bottom
f.subplots_adjust(left=0.2,bottom=0.42)

plt.show()
print(barShape)

Stacked bar chart

import pandas as pd
import matplotlib.pyplot as plt
readPah='/Users/apple/Desktop/test.xlsx'
barShape=pd.read_excel(readPah)
# To order the bars by the stacked total of several columns, add a column holding their sum and sort by it

# barShape.plot.barh draws horizontal bars
# barShape.plot.bar draws vertical bars
# stacked=True gives a stacked chart; stacked=False gives a grouped chart
barShape.plot.barh(x='name',y=['money','moreMoney'],stacked=True,title='stacked bar')
plt.tight_layout()
plt.show()
print(barShape)
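
Ordering the bars by their stacked total, as suggested in the comment above, could look like the sketch below. The data is made up and the helper column name total is an assumption for the example; plotting is left out so the snippet needs only pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "money": [5, 1, 3],
    "moreMoney": [1, 1, 9],
})

# Helper column with the per-row sum of the stacked columns
df["total"] = df[["money", "moreMoney"]].sum(axis=1)

# Largest total first; pass this frame to plot.barh(..., stacked=True)
ordered = df.sort_values("total", ascending=False)
print(ordered)
```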

Pie chart

import pandas as pd
import matplotlib.pyplot as plt
readPah='/Users/apple/Desktop/test.xlsx'
# A pie chart labels its slices with the index by default; without an index column the slices are labeled 0, 1, 2, 3, ...
pieShape=pd.read_excel(readPah,index_col='name')

# counterclock=True (the default, can be omitted) lays the slices out counterclockwise
# counterclock=False lays the slices out clockwise
# startangle=-200 starts the first slice at -200 degrees
pieShape.money.plot.pie(fontsize=12,counterclock=False,startangle=-200)
print(pieShape)
plt.title('pie',fontsize=18)
# Set the y-axis label
plt.ylabel('Y-Label',fontsize=20,fontweight='bold')
plt.show()

Line chart and stacked area chart

import pandas as pd
import matplotlib.pyplot as plt
readPah='/Users/apple/Desktop/test.xlsx'
lineShape=pd.read_excel(readPah,index_col='name')
print(lineShape)
# A line chart is plot
# lineShape.plot(y=['money','moreMoney'])

# A stacked line (area) chart is plot.area
lineShape.plot.area(y=['money','moreMoney'])
plt.title('line',fontsize=20,fontweight='bold')
plt.ylabel('about money',fontsize=20,fontweight='bold')
plt.show()

Scatter plot

import matplotlib.pyplot as plt
import pandas as pd

# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
scatterShape = pd.read_excel(readPah, index_col='name')
print(scatterShape)
# A scatter plot is plot.scatter
scatterShape.plot.scatter(x='money', y='moreMoney')
plt.show()

Histogram

import matplotlib.pyplot as plt
import pandas as pd

# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
squareShape = pd.read_excel(readPah, index_col='name')
# The larger bins is, the more and the finer the intervals
squareShape.money.plot.hist(bins=10)
# Set the x ticks. The x axis of the histogram is money, so the range is taken from that column
plt.xticks(range(0, int(squareShape.money.max()) + 1, 5), fontsize=20, rotation=90)
plt.show()
print(squareShape)

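Density plot

import matplotlib.pyplot as plt
import pandas as pd

# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
kdeShape = pd.read_excel(readPah, index_col='name')
# A density plot is plot.kde (it requires scipy to be installed)
kdeShape.money.plot.kde()
# Set the x ticks. The x axis of the density plot is money, so the range is taken from that column
plt.xticks(range(0, int(kdeShape.money.max()) + 1, 3), fontsize=10, rotation=90)
plt.show()
print(kdeShape)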

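Correlation between columns

import matplotlib.pyplot as plt
import pandas as pd

# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, index_col='name')
# Show the pairwise correlation between columns, that is, how close to linear their relationship is
# (on pandas 2.0 and later, non-numeric columns must be excluded first, e.g. with numeric_only=True)
print(re.corr())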
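
To show what the correlation matrix reports, here is a sketch on made-up columns where y rises exactly linearly with x and z falls exactly linearly with x:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # y = 2 * x, perfectly positively correlated with x
    "z": [4, 3, 2, 1],  # z = 5 - x, perfectly negatively correlated with x
})

corr = df.corr()
print(corr)
```

Values close to 1 or -1 indicate a strong linear relationship; values near 0 indicate none.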

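Combining tables (merge and join)

import pandas as pd

# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
sheet1 = pd.read_excel(readPah, index_col='name', sheet_name='Sheet1')
sheet2 = pd.read_excel(readPah, index_col='name', sheet_name='Sheet2')
# print(sheet1)
# print(sheet2)
'''
1. merge works like the VLOOKUP function in Excel.
2. Only the case where index_col is set explicitly is covered here.
3. how: with how='left' every row of the left table is kept, with how='right' every row of the right table.
   With unique keys, three rows on the left and five on the right give three rows for 'left' and five for 'right'.
4. left_on and right_on are the columns to join on.
5. fillna(0) fills NA values with 0.
'''
#
# allSheet=sheet1.merge(sheet2,how='left',left_on=sheet1.index,right_on=sheet2.index).fillna(0)
# # Convert the data type to int
# allSheet.mergeMoney=allSheet.mergeMoney.astype(int)

'''
1. join has no left_on and right_on, only on: the `on` column (or index level) of the caller
   is matched against the index of the other table.
'''
allSheet=sheet1.join(sheet2,how='left',on='name').fillna(0)
# Convert the data type to int
allSheet.mergeMoney=allSheet.mergeMoney.astype(int)

print(allSheet)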
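
The difference between merge and join is easier to see on small in-memory tables. This is a sketch with made-up data; only the column names name, money and mergeMoney are taken from the example above.

```python
import pandas as pd

left = pd.DataFrame({"name": ["a", "b", "c"], "money": [1, 2, 3]})
right = pd.DataFrame({"name": ["a", "c", "d"], "mergeMoney": [10, 30, 40]})

# merge: match on a column; how='left' keeps every row of `left`
merged = left.merge(right, how="left", on="name").fillna(0)
merged["mergeMoney"] = merged["mergeMoney"].astype(int)

# join: match the caller's index against the other table's index
joined = left.set_index("name").join(right.set_index("name"), how="left").fillna(0)

print(merged)
print(joined)
```

Row "b" has no match on the right, so its mergeMoney is NaN before fillna(0); row "d" exists only on the right and is dropped by how='left'.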

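Validating data values

import pandas as pd

# Validation function: money must lie between 10 and 30 inclusive
def checkInvalidData(row):
    try:
        valid = 10 <= row.money <= 30
    except TypeError:
        # Non-numeric values count as invalid
        valid = False
    if not valid:
        print(f'Invalid data: [name]:{row.name}\t[money]:{row.money}')


# With many columns, print truncates the output; the line below makes print show every column
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, index_col='name')
# axis=1 passes the data to the function one row at a time
# Run the check
re.apply(checkInvalidData, axis=1)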
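
For a simple range check like this one, a vectorized alternative is Series.between, which is inclusive on both ends by default. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"money": [5, 10, 30, 31]},
    index=pd.Index(["a", "b", "c", "d"], name="name"),
)

# True where money lies in [10, 30]; ~ inverts the mask to select the invalid rows
invalid = df[~df["money"].between(10, 30)]
print(invalid)
```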

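Splitting a full-name column into surname and given-name columns (one column into two)

import pandas as pd

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, sheet_name='Sheet2', index_col='ID')
print(re)
# expand=True returns the pieces as separate columns of a DataFrame
df = re.name.str.split(" ", expand=True)
# Create the new columns ('姓' = surname, '名' = given name)
# The surname is converted to upper case
re["姓"] = df[0].str.upper()
re["名"] = df[1]
print(re)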
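
What str.split(..., expand=True) returns, including the case where a value has no second part, can be checked on a small made-up Series:

```python
import pandas as pd

names = pd.Series(["Zhang San", "Li Si", "Wang"])

# One column per piece; values with fewer pieces are padded with missing values
parts = names.str.split(" ", expand=True)
upper_first = parts[0].str.upper()

print(parts)
print(upper_first)
```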

Sums, means and other aggregates

import pandas as pd

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID')
# Take out the columns to compute on
aboutMoney = re[['money', 'moreMoney']]

# axis=1 computes across each row (left to right); axis=0 computes down each column (top to bottom) and is the default
# mean is the average
row_mean = aboutMoney.mean(axis=1)
# Sum
row_sum = aboutMoney.sum(axis=1)
# Add columns
re['sum'] = row_sum
re['mean'] = row_mean
# Add a row
col_mean = re[['money', 'moreMoney', 'sum', 'mean']].mean()
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
re = pd.concat([re, col_mean.to_frame().T], ignore_index=True)
print(re)
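
The two axis values are easy to mix up, so here is a sketch on a made-up two-row table showing which direction each one aggregates in:

```python
import pandas as pd

df = pd.DataFrame({"money": [10, 20], "moreMoney": [30, 40]})

# axis=1: one result per row (money + moreMoney)
row_sum = df.sum(axis=1)

# axis=0 (the default): one result per column
col_mean = df.mean(axis=0)

print(row_sum)
print(col_mean)
```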

Finding and removing duplicates

import pandas as pd

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID')
# Find the duplicated rows
result = re.duplicated(subset='money')
result = result[result]
# result.index holds ID labels, so select with loc (label-based) rather than iloc (position-based)
print('Duplicated rows')
print(re.loc[result.index])
# Remove the duplicates
# subset names the column in which duplicates are not allowed
# keep chooses which duplicate survives: the first or the last occurrence
re.drop_duplicates(subset='money', inplace=True, keep='last')
print('Table after removing duplicates')
print(re)
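
The behavior of duplicated and of the keep argument can be verified on a small table with a non-default index, similar to an ID column. The data is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"money": [1, 2, 1, 3, 2]}, index=[10, 11, 12, 13, 14])

# True for every occurrence after the first
dup_mask = df.duplicated(subset="money")
dup_rows = df.loc[dup_mask]

# keep='last' retains the last occurrence of each value instead of the first
deduped = df.drop_duplicates(subset="money", keep="last")

print(dup_rows)
print(deduped)
```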

Transposing (rows to columns, columns to rows)

import pandas as pd

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID')
# Swap rows and columns with transpose
result=re.transpose()
print(result)

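Reading other file formats

import pandas as pd
# csv (comma-separated by default), tsv (tab-separated) and txt (plain text) can all be read with read_csv; sep is the separator and index_col names the index column
pd.read_csv('',sep='',index_col='')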
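
As an illustration of the sep and index_col arguments, the sketch below reads tab-separated text from memory. The sample content and the column names id and name are made up for the example.

```python
import io

import pandas as pd

tsv_text = "id\tname\n1\ta\n2\tb\n"

# sep='\t' reads tab-separated values; index_col picks the index column by name
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="id")
print(df)
```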

Pivot tables

import numpy as np
import pandas as pd

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID')

# Extract the year from the time column
re['year'] = pd.DatetimeIndex(re.time).year

# index gives the rows, columns gives the columns, values is the value at each row/column intersection, and aggfunc says how those values are aggregated
# A pivot table crosses rows with columns and shows the value for each pair
# Here: for each category and year, the sum of money

p1 = re.pivot_table(index='category', columns='year', values='money', aggfunc='sum')
print('pivot_table result')
print(p1)

# Grouping with groupby, just like GROUP BY in SQL
# A pivot table crosses a row key with a column key to give one value
# groupby puts both keys in the index: (category, year) gives one value
gb = re.groupby(['category', 'year'])
# Compute sum and count
s = gb.money.sum()
c = gb.money.count()
re2 = pd.DataFrame({'Sum': s, 'Count': c})
print('groupby result')
print(re2)
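
The relationship between the two results can be checked on a few made-up rows: the pivot table puts year in the columns, while groupby keeps both keys in a two-level index.

```python
import pandas as pd

df = pd.DataFrame({
    "category": [1, 1, 2, 2],
    "year": [2019, 2020, 2019, 2019],
    "money": [10, 20, 30, 40],
})

# Rows = category, columns = year, cells = sum of money
pivot = df.pivot_table(index="category", columns="year", values="money", aggfunc="sum")

# Same numbers, indexed by (category, year)
grouped = df.groupby(["category", "year"])["money"].sum()

print(pivot)
print(grouped)
```

Combinations that never occur, such as category 2 in 2020, appear as NaN in the pivot table and are simply absent from the groupby result.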

Predicting data with linear regression

import matplotlib.pyplot as plt
# import numpy as np
import pandas as pd
# scipy is a scientific computing package; its linear regression function is used here
from scipy.stats import linregress

pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
# Read the time column as str
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID', dtype={'time': str})

'''
    What each return value means
    slope : float
        Slope of the regression line.
    intercept : float
        Intercept of the regression line.
    rvalue : float
        Correlation coefficient.
    pvalue : float
        Two-sided p-value for a hypothesis test whose null hypothesis is
        that the slope is zero, using Wald Test with t-distribution of
        the test statistic. A small value means a slope this far from
        zero is unlikely to arise by chance.
    stderr : float
        Standard error of the estimated gradient.
'''
# Slope, intercept, correlation coefficient, p-value, standard error
slope, intercept, rvalue, pvalue, stderr = linregress(re.index, re.money)

# Compute the predicted values
exceptionValue=slope*re.index+intercept

# Draw the scatter plot
plt.scatter(re.index, re.money)

# Draw the regression line
plt.plot(re.index,exceptionValue,color='red')
# Set the title
plt.title('Y='+str(slope)+'*X+'+str(intercept))
# Set the x ticks, replacing the index values with the time values
plt.xticks(re.index, re.time, rotation=90)
# Adjust the layout
plt.tight_layout()
# With slope and intercept known, any x gives a y; that is the prediction
plt.show()
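
Once slope and intercept are known, predicting a value for a new x is a single expression. The sketch below uses numpy.polyfit with degree 1 instead of scipy's linregress (a swapped-in technique that yields the same least-squares line) on made-up points lying exactly on y = 2x + 1.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([3, 5, 7, 9])  # exactly y = 2 * x + 1

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)

# Predict the value at a point outside the observed range
new_x = 5
predicted = slope * new_x + intercept
print(slope, intercept, predicted)
```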

Conditional formatting in Jupyter (coloring data)

Installing and using Jupyter

1. Search for the Jupyter package in PyCharm and install it
2. In the PyCharm terminal, run python -m IPython notebook
3. After the browser page opens

Source code and result preview

import pandas as pd

# Values below 40 are shown in red
def low_money_red(score):
    color='red' if score<40 else 'black'
    return f'color:{color}'
# The highest value in each column gets a green background
def hightest_number_green(column):
    return ['background-color:lime' if number==column.max() else 'background-color:white' for number in column]
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
# Read the time column as str
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID', dtype={'time': str})
# apply passes a whole Series (a row or a column) to the function; applymap passes one element at a time, the smallest unit of a DataFrame
re.style.applymap(low_money_red,subset='money').apply(hightest_number_green,subset='money')
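
The difference between the two styling callbacks is what each one receives and must return. The sketch below calls them directly on plain values, outside of Styler, to show the shapes involved; the sample numbers are made up.

```python
import pandas as pd


def low_money_red(score):
    # Element-wise callback: one value in, one CSS string out
    color = 'red' if score < 40 else 'black'
    return f'color:{color}'


def hightest_number_green(column):
    # Column-wise callback: a Series in, one CSS string per element out
    return ['background-color:lime' if number == column.max() else 'background-color:white'
            for number in column]


cell_css = low_money_red(39)
column_css = hightest_number_green(pd.Series([10, 50, 30]))
print(cell_css)
print(column_css)
```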

Conditional formatting in Jupyter (color gradient)

import pandas as pd
import seaborn as sns
# Build the color map
color_map=sns.light_palette('green',as_cmap=True)
pd.options.display.max_columns = 777
readPah = '/Users/apple/Desktop/test.xlsx'
# Read the time column as str
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID', dtype={'time': str})
# Shade the background along a color gradient
re.style.background_gradient(color_map,subset='money')

Conditional formatting in Jupyter (data bars)

import pandas as pd

readPah = '/Users/apple/Desktop/test.xlsx'
# Read the time column as str
re = pd.read_excel(readPah, sheet_name='Sheet1', index_col='ID', dtype={'time': str})
# Set the bar color
re.style.bar(color='orange',subset='money')

pandas and databases

import pandas
import pymysql

# Connection settings for the local MySQL server
conn = pymysql.connect(
    host="localhost",
    user="root",
    password="123456",
    database="QianFuxin",
    charset="utf8")
# The SQL statement
sql = 'select * from job'
# Run the query and turn the result into a DataFrame
job = pandas.read_sql_query(sql, conn)
# Print the data
print(job.head(100))
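
The same read_sql_query call can be tried without a MySQL server by using Python's built-in sqlite3 module. This is a swapped-in backend for illustration only; the table contents and the columns id and title are made up.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO job VALUES (?, ?)", [(1, "engineer"), (2, "analyst")])
conn.commit()

# Run the query and load the result into a DataFrame
job = pd.read_sql_query("select * from job", conn)
conn.close()
print(job.head(100))
```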

More complex logic (functions)

import pandas as pd


# Multiply money by the square of category
def moneyMultiplyCategory(money, category):
    return money * (category ** 2)


readPath = '/Users/apple/Desktop/test.xlsx'
re1 = pd.read_excel(readPath, sheet_name='Sheet1', dtype={
     'time': str})
# axis=1 applies the function to each row
# row is one row; row['money'] is that row's value in the money column
# apply calls the given function
# lambda defines a small anonymous function
re1['allMoney'] = re1.apply(lambda row: moneyMultiplyCategory(row['money'], row['category']), axis=1)

print(re1)
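Row-wise apply calls the Python function once per row. For plain arithmetic like this, the same function can also be given whole columns, which lets pandas compute every row in one step and is usually faster on large tables. A small sketch on made-up data that compares both approaches (the column name allMoneyVectorized is invented here):

```python
import pandas as pd


def moneyMultiplyCategory(money, category):
    """Multiply money by the square of category (works on scalars and Series)."""
    return money * (category ** 2)


# Hypothetical data standing in for test.xlsx
df = pd.DataFrame({"money": [10, 20, 30], "category": [1, 2, 3]})

# Row by row: the lambda receives one row at a time
df["allMoney"] = df.apply(
    lambda row: moneyMultiplyCategory(row["money"], row["category"]), axis=1
)

# Whole columns at once: the function receives two Series
df["allMoneyVectorized"] = moneyMultiplyCategory(df["money"], df["category"])

print(df)
```

Row-wise apply is still the right tool when the function contains branching or other logic that cannot be expressed on whole columns.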

