This post is translated from: “Large data” work flows using pandas
I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons.
One day I hope to replace my use of SAS with python and pandas, but I currently lack an out-of-core workflow for large datasets. I'm not talking about "big data" that requires a distributed network, but rather files too large to fit in memory but small enough to fit on a hard drive.
My first thought is to use HDFStore to hold large datasets on disk and pull only the pieces I need into dataframes for analysis. Others have mentioned MongoDB as an easier to use alternative. My question is this:
What are some best-practice workflows for accomplishing the following:
Real-world examples would be much appreciated, especially from anyone who uses pandas on "large data".
Edit -- an example of how I would like this to work:
I am trying to find a best-practice way of performing these steps. Reading links about pandas and pytables, it seems that appending a new column could be a problem.
Edit -- Responding to Jeff's questions specifically:
For example:

if var1 > 2 then newvar = 'A' elif var2 = 4 then newvar = 'B'

The result of these operations is a new column for every record in my dataset. It is rare that I would ever add rows to the dataset. I will nearly always be creating new columns (variables or features in statistics/machine learning parlance).
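For reference, a minimal pandas sketch of that kind of conditional column creation (the column names var1, var2 and newvar come from the example above; the use of np.select is my own illustration, not something from the thread):

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [1, 3, 0], 'var2': [4, 1, 4]})

# equivalent of: if var1 > 2 then newvar = 'A' elif var2 = 4 then newvar = 'B'
conditions = [df['var1'] > 2, df['var2'] == 4]
choices = ['A', 'B']
df['newvar'] = np.select(conditions, choices, default='')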
Reference: https://stackoom.com/question/xqJF/使用熊猫的-大数据-工作流程
I routinely use tens of gigabytes of data in just this fashion, e.g. I have tables on disk that I read via queries, create data and append back.
It's worth reading the docs and later in this thread for several suggestions for how to store your data.
Details which will affect how you store your data, like:
Give as much detail as you can; and I can help you develop a structure.
Ensure you have pandas at least 0.10.1 installed.
Read about iterating files chunk-by-chunk and multiple table queries.
Since pytables is optimized to operate row-wise (which is what you query on), we will create a table for each group of fields. This way it's easy to select a small group of fields (this will work with a big table too, but it's more efficient to do it this way... I think I may be able to fix this limitation in the future... it is more intuitive anyhow):
(The following is pseudocode.)
import numpy as np
import pandas as pd

# create a store
store = pd.HDFStore('mystore.h5')

# this is the key to your storage:
# this maps your fields to a specific group, and defines
# what you want to have as data_columns.
# you might want to create a nice class wrapping this
# (as you will want to have this map and its inversion)
group_map = dict(
    A = dict(fields = ['field_1','field_2',.....], dc = ['field_1',....,'field_5']),
    B = dict(fields = ['field_10',...... ], dc = ['field_10']),
    .....
    REPORTING_ONLY = dict(fields = ['field_1000','field_1001',...], dc = []),
)

group_map_inverted = dict()
for g, v in group_map.items():
    group_map_inverted.update(dict([ (f,g) for f in v['fields'] ]))
Reading in the files and creating the storage (essentially doing what append_to_multiple does):
for f in files:
    # read in the file, additional options may be necessary here
    # the chunksize is not strictly necessary, you may be able to slurp each
    # file into memory in which case just eliminate this part of the loop
    # (you can also change chunksize if necessary)
    for chunk in pd.read_table(f, chunksize=50000):

        # we are going to append to each table by group
        # we are not going to create indexes at this time
        # but we *ARE* going to create (some) data_columns

        # figure out the field groupings
        for g, v in group_map.items():

            # create the frame for this group
            frame = chunk.reindex(columns = v['fields'], copy = False)

            # append it
            store.append(g, frame, index=False, data_columns = v['dc'])
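Since the appends are done with index=False, one follow-up you may want (a sketch based on my reading of the HDFStore API, not something spelled out in the answer) is to build the table indexes in a single pass once all files have been loaded:

# once all files are appended, create the indexes for each table
for g in group_map:
    store.create_table_index(g, optlevel=9, kind='full')
store.flush()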
Now you have all of the tables in the file (actually you could store them in separate files if you wish; you would probably have to add the filename to the group_map, but that probably isn't necessary).
This is how you get columns and create new ones:
frame = store.select(group_that_I_want)
# you can optionally specify:
# columns = a list of the columns IN THAT GROUP (if you wanted to
#     select only, say, 3 out of the 20 columns in this sub-table)
# and a where clause if you want a subset of the rows

# do calculations on this frame
new_frame = cool_function_on_frame(frame)

# to 'add columns', create a new group (you probably want to
# limit the columns in this new_group to be only NEW ones,
# e.g. so you don't overlap with the other tables)

# add this info to the group_map
store.append(new_group, new_frame.reindex(columns = new_columns_created, copy = False), data_columns = new_columns_created)
When you are ready for post_processing:
# This may be a bit tricky; and depends what you are actually doing.
# I may need to modify this function to be a bit more general:
report_data = store.select_as_multiple([groups_1,groups_2,.....], where =['field_1>0', 'field_1000=foo'], selector = group_1)
About data_columns: you don't actually need to define ANY data_columns; they allow you to sub-select rows based on the column. E.g. something like:
store.select(group, where = ['field_1000=foo', 'field_1001>0'])
They may be most interesting to you in the final report generation stage (essentially a data column is segregated from other columns, which might impact efficiency somewhat if you define a lot).
You also might want to:
Let me know when you have questions!
This is the case for pymongo. I have also prototyped using sql server, sqlite, HDF, ORM (SQLAlchemy) in python. First and foremost pymongo is a document-based DB, so each person would be a document (a dict of attributes). Many people form a collection and you can have many collections (people, stock market, income).
pd.DataFrame -> pymongo. Note: I use the chunksize in read_csv to keep it to 5 to 10k records (pymongo drops the socket if larger).
aCollection.insert((a[1].to_dict() for a in df.iterrows()))
querying: gt = greater than...
pd.DataFrame(list(mongoCollection.find({'anAttribute':{'$gt':2887000, '$lt':2889000}})))
.find() returns an iterator so I commonly use ichunked to chop it into smaller iterators.
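A minimal sketch of that pattern, under the assumption that ichunked comes from the more_itertools package (the answer does not say which library it uses) and reusing the mongoCollection and query from above:

import pandas as pd
from more_itertools import ichunked

cursor = mongoCollection.find({'anAttribute': {'$gt': 2887000, '$lt': 2889000}})

# walk the cursor in slices of 10k documents so memory stays bounded
for chunk in ichunked(cursor, 10000):
    df = pd.DataFrame(list(chunk))
    # ... work on this chunk's DataFrame ...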
How about a join since I normally get 10 data sources to paste together:
aJoinDF = pandas.DataFrame(list(mongoCollection.find({'anAttribute':{'$in':Att_Keys}})))
then (in my case sometimes I have to agg on aJoinDF first before it is "mergeable").
df = pandas.merge(df, aJoinDF, on=aKey, how='left')
And you can then write the new info back to your main collection via the update method below (logical collection vs physical data sources).
collection.update({primarykey:foo},{key:change})
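For illustration, a sketch of that write-back using the newer pymongo update_one with $set (the field names primarykey and newvar, and iterating the DataFrame row by row, are my own assumptions; the answer itself uses the older update call shown above):

# write a computed column back to the main collection, one document per row
for _, row in df.iterrows():
    collection.update_one(
        {'primarykey': row['primarykey']},     # match the document
        {'$set': {'newvar': row['newvar']}},   # add/overwrite the computed field
    )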
On smaller lookups, just denormalize. For example, you have code in the document and you just add the field code text and do a dict lookup as you create documents.
Now that you have a nice dataset based around a person, you can unleash your logic on each case and make more attributes. Finally you can read your 3-to-memory-max key indicators into pandas and do pivots/agg/data exploration. This works for me for 3 million records with numbers/big text/categories/codes/floats/...
You can also use the two methods built into MongoDB (MapReduce and the aggregation framework). See here for more info about the aggregation framework, as it seems to be easier than MapReduce and looks handy for quick aggregate work. Notice I didn't need to define my fields or relations, and I can add items to a document. At the current state of the rapidly changing numpy, pandas, python toolset, MongoDB helps me just get to work :)
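As a small illustration of the aggregation framework mentioned above (the field names category, amount and anAttribute are hypothetical, and the pipeline is my own example rather than one from the answer):

import pandas as pd

# group documents by a category field and sum a numeric field server-side
pipeline = [
    {'$match': {'anAttribute': {'$gt': 0}}},
    {'$group': {'_id': '$category', 'total': {'$sum': '$amount'}}},
]
agg_df = pd.DataFrame(list(mongoCollection.aggregate(pipeline)))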
I spotted this a little late, but I work with a similar problem (mortgage prepayment models). My solution has been to skip the pandas HDFStore layer and use straight pytables. I save each column as an individual HDF5 array in my final file.
My basic workflow is to first get a CSV file from the database. I gzip it, so it's not as huge. Then I convert that to a row-oriented HDF5 file, by iterating over it in python, converting each row to a real data type, and writing it to an HDF5 file. That takes some tens of minutes, but it doesn't use any memory, since it only operates row-by-row. Then I "transpose" the row-oriented HDF5 file into a column-oriented HDF5 file.
The table transpose looks like:
import logging

import numpy as np
import pandas as pd
import tables

# imports above are shared by the two helpers below
logger = logging.getLogger(__name__)


def transpose_table(h_in, table_path, h_out, group_name="data", group_path="/"):
    # Get a reference to the input data.
    tb = h_in.getNode(table_path)
    # Create the output group to hold the columns.
    grp = h_out.createGroup(group_path, group_name, filters=tables.Filters(complevel=1))
    for col_name in tb.colnames:
        logger.debug("Processing %s", col_name)
        # Get the data.
        col_data = tb.col(col_name)
        # Create the output array.
        arr = h_out.createCArray(grp,
                                 col_name,
                                 tables.Atom.from_dtype(col_data.dtype),
                                 col_data.shape)
        # Store the data.
        arr[:] = col_data
    h_out.flush()
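A usage sketch under assumed file and node names (input.h5 with a row-oriented table at /rows; the calls follow the older camelCase PyTables API that the answer uses):

h_in = tables.openFile('input.h5', mode='r')
h_out = tables.openFile('columns.h5', mode='w')
transpose_table(h_in, '/rows', h_out, group_name='data')
h_in.close()
h_out.close()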
Reading it back in then looks like:
def read_hdf5(hdf5_path, group_path="/data", columns=None):
    """Read a transposed data set from a HDF5 file."""
    if isinstance(hdf5_path, tables.file.File):
        hf = hdf5_path
    else:
        hf = tables.openFile(hdf5_path)

    grp = hf.getNode(group_path)
    if columns is None:
        data = [(child.name, child[:]) for child in grp]
    else:
        data = [(child.name, child[:]) for child in grp
                if child.name in columns]

    # Convert any float32 columns to float64 for processing.
    for i in range(len(data)):
        name, vec = data[i]
        if vec.dtype == np.float32:
            data[i] = (name, vec.astype(np.float64))

    if not isinstance(hdf5_path, tables.file.File):
        hf.close()
    return pd.DataFrame.from_items(data)
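And a usage sketch for pulling a subset of columns back in (the file and column names are invented; note that DataFrame.from_items only exists in older pandas, so on a recent version you would build the frame from a dict instead):

df = read_hdf5('columns.h5', group_path='/data',
               columns=['loan_age', 'balance', 'rate'])
print(df.head())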
Now, I generally run this on a machine with a ton of memory, so I may not be careful enough with my memory usage. For example, by default the load operation reads the whole data set.
This generally works for me, but it's a bit clunky, and I can't use the fancy pytables magic.
Edit: The real advantage of this approach, over the array-of-records pytables default, is that I can then load the data into R using h5r, which can't handle tables. Or, at least, I've been unable to get it to load heterogeneous tables.
If your datasets are between 1 and 20GB, you should get a workstation with 48GB of RAM. Then Pandas can hold the entire dataset in RAM. I know it's not the answer you're looking for here, but doing scientific computing on a notebook with 4GB of RAM isn't reasonable.
I think the answers above are missing a simple approach that I've found very useful.
When I have a file that is too large to load in memory, I break up the file into multiple smaller files (either by rows or columns).
Example: in the case of 30 days' worth of trading data of ~30GB size, I break it into one file per day of ~1GB size. I subsequently process each file separately and aggregate the results at the end.
One of the biggest advantages is that it allows parallel processing of the files (either multiple threads or processes); see the sketch after this list.
The other advantage is that file manipulation (like adding/removing dates in the example) can be accomplished by regular shell commands, which is not possible with more advanced/complicated file formats.
This approach doesn't cover all scenarios, but it is very useful in a lot of them.
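A minimal sketch of the split-then-process-in-parallel idea, assuming one gzipped CSV per day and a hypothetical per-file summary step (the file pattern, the column names symbol/volume, and the use of multiprocessing are my own choices, not the answer's):

import glob
from multiprocessing import Pool

import pandas as pd

def process_one_file(path):
    # load one ~1GB daily file and reduce it to a small per-symbol summary
    df = pd.read_csv(path)
    return df.groupby('symbol')['volume'].sum()

if __name__ == '__main__':
    files = sorted(glob.glob('trades_2013-01-*.csv.gz'))
    with Pool(processes=4) as pool:
        partials = pool.map(process_one_file, files)
    # aggregate the per-file results at the end
    totals = pd.concat(partials).groupby(level=0).sum()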