程序员-不秃头的阿焕

(Roundup) 25 Python Text-Processing Examples Worth Bookmarking

This post collects 25 Python text-processing examples worth bookmarking. Text processing is one of the most common things Python is used for, so keep the list handy; sooner or later you will need it.

1 Extracting PDF content

# pip install PyPDF2  (install PyPDF2; the names below are the PyPDF2 3.x API)
from PyPDF2 import PdfReader

# Open the PDF in binary mode; the with-block closes the file automatically.
with open("test.pdf", "rb") as pdf:
    # Create a PDF reader object.
    pdf_reader = PdfReader(pdf)

    # Check the total number of pages in the PDF file.
    print("Total number of pages:", len(pdf_reader.pages))

    # Get a page object (pages are zero-indexed, so 0 is the first page).
    page = pdf_reader.pages[0]

    # Extract the text of that page.
    print(page.extract_text())

2 Extracting Word content

# pip install python-docx  (install python-docx)
import docx


def main():
    try:
        doc = docx.Document('test.docx')  # Create the Word reader object.
        full_text = []
        for para in doc.paragraphs:
            full_text.append(para.text)
        data = '\n'.join(full_text)
        print(data)
    except IOError:
        print('There was an error opening the file!')
        return


if __name__ == '__main__':
    main()

3 Extracting web page content

# pip install bs4  (install bs4)
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1',
              headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

# Parse the HTML.
soup = BeautifulSoup(webpage, 'html.parser')

# Format the parsed HTML.
strhtm = soup.prettify()

# Print the first 500 characters.
print(strhtm[:500])

# Extract the title and a meta tag value.
print(soup.title.string)
print(soup.find('meta', attrs={'property': 'og:description'}))

# Extract anchor tag values.
for x in soup.find_all('a'):
    print(x.string)

# Extract paragraph tag values.
for x in soup.find_all('p'):
    print(x.text)

4 Reading JSON data

import requests

import json

r = requests.get("https://support.oneskyapp.com/hc/en-us/article_attachments/202761727/example_2.json")

res = r.json()

# Extract specific node content.

print(res['quiz']['sport'])

# Dump data as string

data = json.dumps(res)

print(data)
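
When the JSON is already in a string or a local file, the same json module parses it without any network request. The following is a minimal sketch, assuming a small hypothetical payload whose nesting mirrors the quiz document above; the keys and values are made up for illustration.

```python
import json

# Hypothetical payload; only the nesting mirrors the example above.
raw = '{"quiz": {"sport": {"q1": {"question": "Which one is correct?", "answer": "B"}}}}'

res = json.loads(raw)          # str -> dict
sport = res["quiz"]["sport"]   # same kind of node lookup as above
print(sport["q1"]["answer"])

# .get() avoids a KeyError when a node may be missing.
maths = res["quiz"].get("maths", {})
print(maths)

# indent= pretty-prints; ensure_ascii=False keeps non-ASCII text readable.
pretty = json.dumps(res, indent=2, ensure_ascii=False)
print(pretty)
```

For a file on disk, json.load(f) on an open file object plays the role of json.loads here.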

5 Reading CSV data

import csv

with open('test.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # Skip the header row
    for row in reader:
        print(row)
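
If the first row is a header, csv.DictReader can use it as the keys of each row instead of skipping it by hand. This is a small sketch, assuming hypothetical CSV text in place of test.csv; the column names are invented for the example.

```python
import csv
import io

# Hypothetical CSV text standing in for test.csv.
sample = "name,score\nalice,90\nbob,85\n"

# DictReader consumes the header row and uses it as the keys of each row.
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["name"], row["score"])

# Values are always strings; convert explicitly when numbers are needed.
total = sum(int(row["score"]) for row in rows)
print(total)
```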

6 Removing punctuation from a string

import re
import string

data = "Stuning even for the non-gamer: This sound track was beautiful! \
It paints the senery in your mind so well I would recomend \
it even to people who hate vid. game music! I have played the game Chrono \
Cross but out of all of the games I have ever played it has the best music! \
It backs away from crude keyboarding and takes a fresher step with grate \
guitars and soulful orchestras. \
It would impress anyone who cares to listen!"

# Method 1: regex
# Remove the listed special characters from the string.
no_specials_string = re.sub('[!#?,.:";]', '', data)
print(no_specials_string)

# Method 2: translate()
# Make a translator object that deletes every character in string.punctuation.
translator = str.maketrans('', '', string.punctuation)
data = data.translate(translator)
print(data)
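
The two methods do not remove the same characters: the regex only deletes what is listed in its character class, while string.punctuation also covers hyphens and apostrophes. A short sketch on a made-up sentence shows the difference.

```python
import re
import string

text = "non-gamer: it's great, isn't it?!"

# Method 1 removes only the characters listed in the class.
by_regex = re.sub('[!#?,.:";]', '', text)

# Method 2 removes every character in string.punctuation,
# which also includes '-' and "'".
by_translate = text.translate(str.maketrans('', '', string.punctuation))

print(by_regex)
print(by_translate)
```

Which one fits depends on whether words such as "non-gamer" should stay intact.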

7 Removing stop words with NLTK

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # One-time download of the stop word lists

data = ['Stuning even for the non-gamer: This sound track was beautiful! \
It paints the senery in your mind so well I would recomend \
it even to people who hate vid. game music! I have played the game Chrono \
Cross but out of all of the games I have ever played it has the best music! \
It backs away from crude keyboarding and takes a fresher step with grate \
guitars and soulful orchestras. \
It would impress anyone who cares to listen!']

# Remove stop words (a separate name avoids shadowing the stopwords module)
stop_words = set(stopwords.words('english'))

output = []
for sentence in data:
    temp_list = []
    for word in sentence.split():
        if word.lower() not in stop_words:
            temp_list.append(word)
    output.append(' '.join(temp_list))

print(output)

8 Correcting spelling with TextBlob

from textblob import TextBlob

data = "Natural language is a cantral part of our day to day life, and it's so antresting to work on any problem related to langages."

output = TextBlob(data).correct()

print(output)

9 Word tokenization with NLTK and TextBlob

import nltk

from textblob import TextBlob

data = "Natural language is a central part of our day to day life, and it's so interesting to work on any problem related to languages."

nltk_output = nltk.word_tokenize(data)

textblob_output = TextBlob(data).words

print(nltk_output)

print(textblob_output)

Output:

['Natural', 'language', 'is', 'a', 'central', 'part', 'of', 'our', 'day', 'to', 'day', 'life', ',', 'and', 'it', "'s", 'so', 'interesting', 'to', 'work', 'on', 'any', 'problem', 'related', 'to', 'languages', '.']
['Natural', 'language', 'is', 'a', 'central', 'part', 'of', 'our', 'day', 'to', 'day', 'life', 'and', 'it', "'s", 'so', 'interesting', 'to', 'work', 'on', 'any', 'problem', 'related', 'to', 'languages']

10 Stemming the words of a sentence or phrase with NLTK

from nltk.stem import PorterStemmer

st = PorterStemmer()

text = ['Where did he learn to dance like that?',
        'His eyes were dancing with humor.',
        'She shook her head and danced away',
        'Alex was an excellent dancer.']

output = []
for sentence in text:
    output.append(" ".join([st.stem(i) for i in sentence.split()]))

for item in output:
    print(item)

print("-" * 50)
print(st.stem('jumping'), st.stem('jumps'), st.stem('jumped'))

Output:

where did he learn to danc like that?
hi eye were danc with humor.
she shook her head and danc away
alex wa an excel dancer.
--------------------------------------------------
jump jump jump

11 Lemmatizing a sentence or phrase with NLTK

# Requires the WordNet data: nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

text = ['She gripped the armrest as he passed two cars at a time.',
        'Her car was in full view.',
        'A number of cars carried out of state license plates.']

output = []
for sentence in text:
    output.append(" ".join([wnl.lemmatize(i) for i in sentence.split()]))

for item in output:
    print(item)

print("*" * 10)
print(wnl.lemmatize('jumps', 'n'))
print(wnl.lemmatize('jumping', 'v'))
print(wnl.lemmatize('jumped', 'v'))

print("*" * 10)
print(wnl.lemmatize('saddest', 'a'))
print(wnl.lemmatize('happiest', 'a'))
print(wnl.lemmatize('easiest', 'a'))

Output:

She gripped the armrest a he passed two car at a time.
Her car wa in full view.
A number of car carried out of state license plates.
**********
jump
jump
jump
**********
sad
happy
easy

12 Finding the frequency of each word in a text file with NLTK

import nltk
from nltk.corpus import webtext

nltk.download('webtext')

# testing.txt is not one of the bundled webtext files; it is assumed to have
# been copied into the downloaded webtext corpus folder.
wt_words = webtext.words('testing.txt')
data_analysis = nltk.FreqDist(wt_words)

# Keep only the words that are longer than 3 characters.
filter_words = dict([(m, n) for m, n in data_analysis.items() if len(m) > 3])

for key in sorted(filter_words):
    print("%s: %s" % (key, filter_words[key]))

data_analysis = nltk.FreqDist(filter_words)
data_analysis.plot(25, cumulative=False)

Output:

[nltk_data] Downloading package webtext to
[nltk_data] C:\Users\amit\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\webtext.zip.
1989: 1
Accessing: 1
Analysis: 1
Anyone: 1
Chapter: 1
Coding: 1
Data: 1
...
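
For a quick count without an NLTK corpus, collections.Counter from the standard library does the same job. This is only a sketch: the text is a hypothetical stand-in for the contents of testing.txt, and unlike the NLTK version it lowercases the words before counting.

```python
import re
from collections import Counter

# Hypothetical text standing in for the contents of testing.txt.
text = "Data analysis with Python. Python makes data analysis simple."

# Lowercase and keep alphabetic runs as words.
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

# Same filter as above: keep words longer than 3 characters.
long_words = {w: n for w, n in freq.items() if len(w) > 3}

for word in sorted(long_words):
    print(f"{word}: {long_words[word]}")
```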

13 Creating a word cloud from a corpus

import nltk

from nltk.corpus import webtext

from nltk.probability import FreqDist

from wordcloud import WordCloud

import matplotlib.pyplot as plt

nltk.download('webtext')

wt_words = webtext.words('testing.txt') # Sample data

data_analysis = nltk.FreqDist(wt_words)

filter_words = dict([(m, n) for m, n in data_analysis.items() if len(m) > 3])

wcloud = WordCloud().generate_from_frequencies(filter_words)

# Plotting the wordcloud

plt.imshow(wcloud, interpolation="bilinear")

plt.axis("off")

plt.show()

14 Lexical dispersion plot with NLTK

import nltk
from nltk.corpus import webtext
import matplotlib.pyplot as plt

words = ['data', 'science', 'dataset']

nltk.download('webtext')
wt_words = webtext.words('testing.txt')  # Sample data

# One (offset, word index) point for every occurrence of a target word.
points = [(x, y) for x in range(len(wt_words))
          for y in range(len(words)) if wt_words[x] == words[y]]

if points:
    x, y = zip(*points)
else:
    x = y = ()

plt.plot(x, y, "rx", scalex=.1)
plt.yticks(range(len(words)), words, color="b")
plt.ylim(-1, len(words))
plt.title("Lexical Dispersion Plot")
plt.xlabel("Word Offset")
plt.show()

15 Converting text to numbers with CountVectorizer

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Sample data for analysis
data1 = "Java is a language for programming that develops a software for several platforms. A compiled code or bytecode on Java application can run on most of the operating systems including Linux, Mac operating system, and Linux. Most of the syntax of Java is derived from the C++ and C languages."
data2 = "Python supports multiple programming paradigms and comes up with a large standard library, paradigms included are object-oriented, imperative, functional and procedural."
data3 = "Go is typed statically compiled language. It was created by Robert Griesemer, Ken Thompson, and Rob Pike in 2009. This language offers garbage collection, concurrency of CSP-style, memory safety, and structural typing."

# Note: the 'Go' column reuses data2, which is why the Go and Python columns
# in the output are identical. Pass data3 to vectorize the Go text instead.
df1 = pd.DataFrame({'Java': [data1], 'Python': [data2], 'Go': [data2]})

# Initialize
vectorizer = CountVectorizer()
doc_vec = vectorizer.fit_transform(df1.iloc[0])

# Create the DataFrame (get_feature_names_out() replaces get_feature_names(),
# which newer scikit-learn releases no longer provide)
df2 = pd.DataFrame(doc_vec.toarray().transpose(),
                   index=vectorizer.get_feature_names_out())

# Change column headers
df2.columns = df1.columns
print(df2)

Output:

             Go  Java  Python
and           2     2       2
application   0     1       0
are           1     0       1
bytecode      0     1       0
can           0     1       0
code          0     1       0
comes         1     0       1
compiled      0     1       0
derived       0     1       0
develops      0     1       0
for           0     2       0
from          0     1       0
functional    1     0       1
imperative    1     0       1
...
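
Single-letter words such as "a" are missing from the vocabulary because CountVectorizer, by default, lowercases the text and keeps only tokens of two or more word characters. The sketch below approximates that default in plain Python on two hypothetical documents; it illustrates the idea and is not scikit-learn's implementation.

```python
import re
from collections import Counter

# Two short hypothetical documents.
docs = {
    "Java": "Java is a language. Java runs on a JVM.",
    "Python": "Python is a language with a large library.",
}

# Approximates the default: lowercase, tokens of 2+ word characters.
token_re = re.compile(r"\b\w\w+\b")
counts = {name: Counter(token_re.findall(text.lower())) for name, text in docs.items()}

# The vocabulary is the sorted union of all tokens.
vocab = sorted(set().union(*counts.values()))

# One row per term, one column per document.
for term in vocab:
    print(term, [counts[name][term] for name in docs])
```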

16 Creating a document-term matrix with TF-IDF

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample data for analysis
# data1 (the Java text) is reused from example 15.
data2 = "Python supports multiple programming paradigms and comes up with a large standard library, paradigms included are object-oriented, imperative, functional and procedural."

# Note: the 'Go' column reuses data2, which is why the Go and Python columns
# in the output are identical.
df1 = pd.DataFrame({'Java': [data1], 'Python': [data2], 'Go': [data2]})

# Initialize
vectorizer = TfidfVectorizer()
doc_vec = vectorizer.fit_transform(df1.iloc[0])

# Create the DataFrame (get_feature_names_out() replaces get_feature_names(),
# which newer scikit-learn releases no longer provide)
df2 = pd.DataFrame(doc_vec.toarray().transpose(),
                   index=vectorizer.get_feature_names_out())

# Change column headers
df2.columns = df1.columns
print(df2)

Output:

                   Go      Java    Python
and          0.323751  0.137553  0.323751
application  0.000000  0.116449  0.000000
are          0.208444  0.000000  0.208444
bytecode     0.000000  0.116449  0.000000
can          0.000000  0.116449  0.000000
code         0.000000  0.116449  0.000000
comes        0.208444  0.000000  0.208444
compiled     0.000000  0.116449  0.000000
derived      0.000000  0.116449  0.000000
develops     0.000000  0.116449  0.000000
for          0.000000  0.232898  0.000000
...
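
To see where such weights come from, the sketch below computes TF-IDF by hand, assuming the TfidfVectorizer defaults as I understand them: a smoothed idf of ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each document vector. The tokenized documents are hypothetical. Under this formula a word found in every document still gets idf 1 rather than 0, which is consistent with the Java column above, where "and" (count 2) over "application" (count 1) is about 1.18, close to 2 / (ln 2 + 1).

```python
import math
from collections import Counter

# Hypothetical pre-tokenized documents.
docs = [
    ["java", "and", "bytecode", "and"],
    ["python", "and", "library"],
    ["go", "and", "concurrency"],
]

n_docs = len(docs)
# Document frequency: in how many documents each term appears.
df = Counter(term for doc in docs for term in set(doc))


def tfidf(doc):
    tf = Counter(doc)
    # Smoothed idf (assumed default): ln((1 + n) / (1 + df)) + 1.
    raw = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
    # L2-normalize so each document vector has unit length.
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}


weights = tfidf(docs[0])
for term in sorted(weights):
    print(f"{term}: {weights[term]:.6f}")
```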

17 Generating N-grams for a given sentence

Natural Language Toolkit: NLTK

import nltk
from nltk.util import ngrams


# Function to generate n-grams from sentences.
def extract_ngrams(data, num):
    n_grams = ngrams(nltk.word_tokenize(data), num)
    return [' '.join(grams) for grams in n_grams]


data = 'A class is a blueprint for the object.'

print("1-gram: ", extract_ngrams(data, 1))
print("2-gram: ", extract_ngrams(data, 2))
print("3-gram: ", extract_ngrams(data, 3))
print("4-gram: ", extract_ngrams(data, 4))

Text-processing tool: TextBlob

from textblob import TextBlob


# Function to generate n-grams from sentences.
def extract_ngrams(data, num):
    n_grams = TextBlob(data).ngrams(num)
    return [' '.join(grams) for grams in n_grams]


data = 'A class is a blueprint for the object.'

print("1-gram: ", extract_ngrams(data, 1))
print("2-gram: ", extract_ngrams(data, 2))
print("3-gram: ", extract_ngrams(data, 3))
print("4-gram: ", extract_ngrams(data, 4))

Output:

1-gram: ['A', 'class', 'is', 'a', 'blueprint', 'for', 'the', 'object']
2-gram: ['A class', 'class is', 'is a', 'a blueprint', 'blueprint for', 'for the', 'the object']
3-gram: ['A class is', 'class is a', 'is a blueprint', 'a blueprint for', 'blueprint for the', 'for the object']
4-gram: ['A class is a', 'class is a blueprint', 'is a blueprint for', 'a blueprint for the', 'blueprint for the object']
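
The output above has no '.' token, which matches the TextBlob variant; nltk.word_tokenize keeps the final period as its own token, so the NLTK variant would be expected to include it in the last n-grams. The sliding-window idea itself needs no library. Here is a minimal sketch, assuming simple whitespace tokenization with the trailing period stripped (a simplification of what either library does).

```python
def ngrams(tokens, n):
    """Return the n-grams over tokens as space-joined strings."""
    # zip over n staggered views of the token list; it stops at the shortest one.
    return [" ".join(gram) for gram in zip(*(tokens[i:] for i in range(n)))]


# Whitespace tokenization with the trailing period stripped.
tokens = "A class is a blueprint for the object.".rstrip(".").split()

print("2-gram:", ngrams(tokens, 2))
print("3-gram:", ngrams(tokens, 3))
```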

18 Specifying a bigram vocabulary with sklearn CountVectorizer

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Sample data for analysis
data1 = "Machine language is a low-level programming language. It is easily understood by computers but difficult to read by people. This is why people use higher level programming languages. Programs written in high-level languages are also either compiled and/or interpreted into machine language so that computers can execute them."
data2 = "Assembly language is a representation of machine language. In other words, each assembly language instruction translates to a machine language instruction. Though assembly language statements are readable, the statements are still low-level. A disadvantage of assembly language is that it is not portable, because each platform comes with a particular Assembly Language"

df1 = pd.DataFrame({'Machine': [data1], 'Assembly': [data2]})

# Initialize: ngram_range=(2, 2) keeps bigrams only
vectorizer = CountVectorizer(ngram_range=(2, 2))
doc_vec = vectorizer.fit_transform(df1.iloc[0])

# Create the DataFrame (get_feature_names_out() replaces get_feature_names(),
# which newer scikit-learn releases no longer provide)
df2 = pd.DataFrame(doc_vec.toarray().transpose(),
                   index=vectorizer.get_feature_names_out())

# Change column headers
df2.columns = df1.columns
print(df2)

Output:

                   Assembly  Machine
also either               0        1
and or                    0        1
are also                  0        1
are readable              1        0
are still                 1        0
assembly language         5        0
because each              1        0
but difficult             0        1
by computers              0        1
by people                 0        1
can execute               0        1
...

19 Extracting noun phrases with TextBlob

from textblob import TextBlob

# Extract noun phrases
blob = TextBlob("Canada is a country in the northern part of North America.")

for nouns in blob.noun_phrases:
    print(nouns)

Output:

canada
northern part
america

20 Computing a word-word co-occurrence matrix

import numpy as np
import nltk
from nltk import bigrams
import itertools
import pandas as pd


def generate_co_occurrence_matrix(corpus):
    vocab = set(corpus)
    vocab = list(vocab)
    vocab_index = {word: i for i, word in enumerate(vocab)}

    # Create bigrams from all words in the corpus
    bi_grams = list(bigrams(corpus))

    # Frequency distribution of bigrams ((word1, word2), num_occurrences)
    bigram_freq = nltk.FreqDist(bi_grams).most_common(len(bi_grams))

    # Initialise the co-occurrence matrix
    # co_occurrence_matrix[current][previous]
    co_occurrence_matrix = np.zeros((len(vocab), len(vocab)))

    # Loop through the bigrams taking the current and previous word,
    # and the number of occurrences of the bigram.
    for bigram in bigram_freq:
        current = bigram[0][1]
        previous = bigram[0][0]
        count = bigram[1]
        pos_current = vocab_index[current]
        pos_previous = vocab_index[previous]
        co_occurrence_matrix[pos_current][pos_previous] = count

    co_occurrence_matrix = np.matrix(co_occurrence_matrix)

    # Return the matrix and the index
    return co_occurrence_matrix, vocab_index


text_data = [['Where', 'Python', 'is', 'used'],
             # There is no comma between 'Python' and 'used', so Python joins
             # the two literals into the single token 'Pythonused' seen in the
             # output. Add the comma to keep them as two words.
             ['What', 'is', 'Python' 'used', 'in'],
             ['Why', 'Python', 'is', 'best'],
             ['What', 'companies', 'use', 'Python']]

# Create one list using many lists
data = list(itertools.chain.from_iterable(text_data))
matrix, vocab_index = generate_co_occurrence_matrix(data)

data_matrix = pd.DataFrame(matrix, index=vocab_index,
                           columns=vocab_index)
print(data_matrix)

Output:

              best     use    What   Where     ...      in      is  Python    used
best           0.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     1.0
use            0.0     0.0     0.0     0.0     ...     0.0     1.0     0.0     0.0
What           1.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     0.0
Where          0.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     0.0
Pythonused     0.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     1.0
Why            0.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     1.0
companies      0.0     1.0     0.0     1.0     ...     1.0     0.0     0.0     0.0
in             0.0     0.0     0.0     0.0     ...     0.0     0.0     1.0     0.0
is             0.0     0.0     1.0     0.0     ...     0.0     0.0     0.0     0.0
Python         0.0     0.0     0.0     0.0     ...     0.0     0.0     0.0     0.0
used           0.0     0.0     1.0     0.0     ...     0.0     0.0     0.0     0.0

[11 rows x 11 columns]
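
Because the sentences are flattened into one list before the bigrams are built, the matrix above also counts pairs that straddle a sentence boundary (the last word of one sentence followed by the first word of the next). If that is not wanted, the pairs can be counted sentence by sentence. The following is a sketch using only the standard library; the two sentences are taken from the example, everything else is illustrative.

```python
from collections import Counter

sentences = [
    ["Where", "Python", "is", "used"],
    ["Why", "Python", "is", "best"],
]

# Count (previous, current) pairs within each sentence only.
pair_counts = Counter()
for sent in sentences:
    pair_counts.update(zip(sent, sent[1:]))

vocab = sorted({w for sent in sentences for w in sent})

# Rows are the current word, columns the previous word, as in the matrix above.
matrix = [[pair_counts[(prev, cur)] for prev in vocab] for cur in vocab]

print(vocab)
for cur, row in zip(vocab, matrix):
    print(cur, row)
```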

21 Sentiment analysis with TextBlob

from textblob import TextBlob


def sentiment(polarity):
    # Classify using the value passed in rather than the global blob.
    if polarity < 0:
        print("Negative")
    elif polarity > 0:
        print("Positive")
    else:
        print("Neutral")


blob = TextBlob("The movie was excellent!")
print(blob.sentiment)
sentiment(blob.sentiment.polarity)

blob = TextBlob("The movie was not bad.")
print(blob.sentiment)
sentiment(blob.sentiment.polarity)

blob = TextBlob("The movie was ridiculous.")
print(blob.sentiment)
sentiment(blob.sentiment.polarity)

Output:

Sentiment(polarity=1.0, subjectivity=1.0)
Positive
Sentiment(polarity=0.3499999999999999, subjectivity=0.6666666666666666)
Positive
Sentiment(polarity=-0.3333333333333333, subjectivity=1.0)
Negative

22 Language translation with Goslate

import goslate

text = "Comment vas-tu?"

gs = goslate.Goslate()

translatedText = gs.translate(text, 'en')

print(translatedText)

translatedText = gs.translate(text, 'zh')

print(translatedText)

translatedText = gs.translate(text, 'de')

print(translatedText)

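23 Language detection and translation with TextBlob

# Note: as far as I know, detect_language() and translate() were deprecated in
# TextBlob 0.16 and removed in 0.18, so this example needs an older release.
from textblob import TextBlob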

blob = TextBlob("Comment vas-tu?")

print(blob.detect_language())

print(blob.translate(to='es'))

print(blob.translate(to='en'))

print(blob.translate(to='zh'))

Output:

fr
¿Como estas tu?
How are you?
你好吗？

24 Getting definitions and synonyms with TextBlob

from textblob import TextBlob
from textblob import Word

text_word = Word('safe')
print(text_word.definitions)

synonyms = set()
for synset in text_word.synsets:
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())

print(synonyms)

Output:

['strongbox where valuables can be safely kept', 'a ventilated or refrigerated cupboard for securing provisions from pests', 'contraceptive device consisting of a sheath of thin rubber or latex that is worn over the penis during intercourse', 'free from danger or the risk of harm', '(of an undertaking) secure from risk', 'having reached a base without being put out', 'financially sound']
{'secure', 'rubber', 'good', 'safety', 'safe', 'dependable', 'condom', 'prophylactic'}

25 Getting a list of antonyms with TextBlob

from textblob import TextBlob
from textblob import Word

text_word = Word('safe')

antonyms = set()
for synset in text_word.synsets:
    for lemma in synset.lemmas():
        if lemma.antonyms():
            antonyms.add(lemma.antonyms()[0].name())

print(antonyms)

Output:

{'dangerous', 'out'}
