The str string type and its common functions

str strings

- str
- Escape characters
- Formatting
- Built-in functions

Strings

  • Represent textual information
  • Are enclosed in single quotes, double quotes, or triple quotes
s = 'hello world'
print(s)
hello world
s = "hello world"
print(s)
hello world
s = """
撒开了多久
阿萨
达拉克斯基的
圣诞节阿松
"""
print(s)
撒开了多久
阿萨
达拉克斯基的
圣诞节阿松

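Escape characters

  • A special notation for content that is awkward to type directly, such as a carriage return, a line feed, or a backspace
  • Built on the backslash character \ : once a backslash appears in a string, the one or more characters that follow it no longer carry their literal meaning; they are escaped
  • Whenever a backslash appears in a string, take extra care, because an escape sequence may be present
  • Different systems represent a line break differently
    • Windows: \r\n
    • Linux: \n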
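# Escape character examples
# Goal: express Let's Go
# Using an escape character
s = 'Let\'s Go'
print(s)

# Nesting a single quote inside double quotes
s = "Let's Go"
print(s)

# Representing a backslash
# For example, C:\User\Augsnano
s = "C:\\User\\Augsnano"
print(s)

# Line breaks
# The desired output is
# ich
# lieb
# xxxx
# On Windows \r\n can also be used, with the same effect
s = "ich\nlieb\nxxxx"
print(s)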
Let's Go
Let's Go
C:\User\Augsnano
ich
lieb
xxxx
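
As a small check of the escaping rules above, the sketch below counts the characters in an escaped string and compares an escaped path with a raw string. The raw-string prefix r is not used in the examples above; it is shown here only as an alternative way to write backslashes, and the variable names are illustrative.

```python
# An escape sequence is two characters in the source but one in the string.
newline_text = "a\nb"
print(len(newline_text))         # 3: 'a', the newline, 'b'

# repr() displays escapes instead of interpreting them.
print(repr(newline_text))        # 'a\nb'

# A raw string (prefix r) leaves backslashes untouched,
# so it equals the version written with doubled backslashes.
raw_path = r"C:\User\Augsnano"
escaped_path = "C:\\User\\Augsnano"
print(raw_path == escaped_path)  # True
```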

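Common escape characters

    Escape sequence    Description
    \ (at line end)    Line continuation
    \\                 Backslash
    \'                 Single quote
    \"                 Double quote
    \a                 Bell
    \b                 Backspace
    \x1b               Escape (ESC); Python has no \e escape sequence
    \000               Null character
    \n                 Line feed (newline)
    \v                 Vertical tab
    \t                 Horizontal tab
    \r                 Carriage return
    \f                 Form feed
    \ooo               Character with octal value ooo, for example \012 is a newline
    \xhh               Character with hexadecimal value hh, for example \x0a is a newline
    \other             Any other character is left as written, backslash included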
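
To make the numeric escapes concrete, here is a short sketch showing that the octal, hexadecimal, and named forms describe the same character. The variable names are illustrative only.

```python
# Octal 012, hexadecimal 0a and \n are all code point 10 (line feed).
same_newline = "\012" == "\x0a" == "\n"
print(same_newline)        # True
print(ord("\n"))           # 10

# Octal 101 and hexadecimal 41 are both the letter A (code point 65).
octal_a = "\101"
hex_a = "\x41"
print(octal_a, hex_a)      # A A

# A tab escape is a single character that separates the columns.
row = "name\tage"
print(len(row))            # 8
```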

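# Using a single backslash
# In Python, a single backslash at the end of a line means the statement is not finished and continues on the next line, which keeps long lines readable
# The definition below is equivalent to: def myDemo(x, y, z):
def myDemo(x, \
           y, \
           z):
    print("sadajhdiashda")

myDemo(1,2,3)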

sadajhdiashda
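
As a side note to the continuation example, inside parentheses, brackets, or braces Python already continues the line, so the backslash is only needed elsewhere. The sketch below uses a hypothetical function my_demo to show both forms.

```python
# Inside parentheses the line continues implicitly; no backslash is needed.
def my_demo(x,
            y,
            z):
    return x + y + z

# Outside brackets, a trailing backslash joins the next line explicitly.
total = 1 + 2 + \
        3

print(my_demo(1, 2, 3))  # 6
print(total)             # 6
```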

Formatting

- Printing or filling a string according to a given format
- Kinds of formatting:
    - Traditional formatting
    - format

# Filling
s = "i am bb"
print(s)

i am bb

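Traditional string formatting

  • Uses % for formatting
  • % (the percent sign) introduces a placeholder
    %s: string, converted with str()
    %r: string, but converted with repr() instead of str()
    %c: integer converted to a single character
    %d: decimal integer
    %u: obsolete, identical to %d
    %o: octal integer
    %x: hexadecimal, lowercase letters
    %X: hexadecimal, uppercase letters
    %e: floating point in exponent form (lowercase e), for example 2.87e+12
    %E: floating point in exponent form (uppercase E), for example 2.87E+12
    %f, %F: floating point in decimal form
    %g, %G: floating point, switching between decimal and exponent form automatically
    An integer before the format character sets the minimum field width of the placeholder
    A '-' before the format character means left-aligned (right-aligned is the default)
    A '+' before the format character means the sign is always shown, for positive numbers too
    A '0' means numbers shorter than the width are padded with '0'
    width is the field width
    precision is the number of digits after the decimal point, written after a '.'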
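
The width, flag, and precision rules are easier to see side by side. The following is a minimal sketch with illustrative values; the square brackets are only there to make the padding visible.

```python
# Each result shows one flag from the list of % formatting options.
width_default = "[%5d]" % 42     # width 5, right-aligned by default
width_left = "[%-5d]" % 42       # '-' left-aligns within the width
signed = "[%+d]" % 42            # '+' always shows the sign
zero_padded = "[%05d]" % 42      # '0' pads with zeros instead of spaces
precise = "[%8.3f]" % 3.14159    # width 8, three digits after the point

for result in (width_default, width_left, signed, zero_padded, precise):
    print(result)

# Several placeholders take a tuple of values.
bases = "%x %X %o" % (255, 255, 8)
print(bases)                     # ff FF 10

# %s uses str(), %r uses repr(), so %r keeps the quotes.
quoted = "%s vs %r" % ("hi", "hi")
print(quoted)                    # hi vs 'hi'
```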
# %s stands for a plain string
# A string containing a placeholder can be used on its own

s = "I love %s"
print(s)

print(s%"LQ")

print(s%"WCP")

I love %s
I love LQ
I love WCP

print("I Love %s" % "xiaojing")
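# A placeholder generally accepts only a value of its own type, or one that can be converted to it
# %s is the lenient case: it calls str() on its argument, so an integer works too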
print("I Love %s" %100)

I Love xiaojing
I Love 100

s = "I am %d years old."
print(s % 17)

print(s % "19")

I am 17 years old.

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

 in 
      2 print(s % 17)
      3 
----> 4 print(s % "19")

TypeError: %d format: a number is required, not str
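
One way to deal with the TypeError shown in the traceback is to convert the string to an integer before formatting. This is only a sketch; the names template and age_text are illustrative.

```python
template = "I am %d years old"
age_text = "19"

# %d refuses a str argument, so the first attempt raises TypeError.
try:
    print(template % age_text)
except TypeError as exc:
    print("formatting failed:", exc)

# Converting the text to int first satisfies %d.
converted = template % int(age_text)
print(converted)  # I am 19 years old
```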

s = "I am %fKG weight, %fm height"
print(s)

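# To format several values, wrap them in parentheses (a tuple)
# The call below uses the default precision, which prints a lot of trailing zeros
print(s % (76.4,1.84))

# The number of values supplied after the percent sign must match the number of placeholders, otherwise a TypeError is raised
# The example below has two placeholders and two values; .2f keeps two decimal places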
s = "I am %.2fKG weight, %.2fm height"
print(s%(76.4,1.84))

I am %fKG weight, %fm height
I am 76.400000KG weight, 1.840000m height
I am 76.40KG weight, 1.84m height

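Formatting with format

  • Formats through a method call, replacing the older percent sign
# No positions specified; arguments are read in order
# Form 1
s = "{} {}!"
print(s.format("hello","world"))

# Form 2
s = "{} {}!".format("hello","world")
print(s)

# Explicit positions
s = "{0} {1}".format("hello","world")
print(s)

# Explicit positions
s = "{1} {0}".format("hello","world")
print(s)

# Explicit positions
s = "{0} {0}".format("hello")
print(s)
# The case below raises IndexError; compare it with the case above
# Two automatic positions but only one argument
#s = "{} {}".format("hello")
#print(s)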

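# Passing a dict for named fields requires unpacking
# Using named arguments
s = "We are {team_name}, our slogan is {slogan}, and {name} is the most handsome"

s_dict = {"team_name":"pd", "slogan": "Unity is strength", "name": "wcp"}

#s = s.format(team_name = "pd",\
#             slogan = "Unity is strength",\
#             name = "wcp")
# ** is the unpacking operator, covered later
s = s.format(**s_dict)
print(s)

hello world!
hello world!
hello world
world hello
hello hello
We are pd, our slogan is Unity is strength, and wcp is the most handsome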
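
The commented-out case and the dict unpacking can be checked with a short sketch. It assumes a small illustrative dict named profile, and it uses str.format_map, which appears in the help(str) listing further down, as an alternative to ** unpacking.

```python
# Two automatic fields but one argument: format raises IndexError.
try:
    "{} {}".format("hello")
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc
print("error:", mismatch_error)

# Named fields can be filled by unpacking a dict or with format_map.
profile = {"team_name": "pd", "name": "wcp"}
by_unpacking = "{team_name}/{name}".format(**profile)
by_format_map = "{team_name}/{name}".format_map(profile)
print(by_unpacking, by_format_map)  # pd/wcp pd/wcp
```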

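# Formatting numbers uses a format spec after the colon
s = "Liu Dana is {:.2f}m tall,{:.2f}KG weight"
print(s.format(1.84,76.45))

# ^, <, > mean centered, left-aligned, and right-aligned respectively, followed by the width
# A fill character may follow the colon, before the alignment symbol; it must be a single character and defaults to a space
# + shows + before positive numbers and - before negative numbers; a space puts a space before positive numbers
# b, d, o, x are binary, decimal, octal, and hexadecimal respectively
# In addition, doubling the braces, {{ and }}, escapes them; the escape takes effect when format is called
s = "The format function uses {{}} as its placeholder"
print(s.format())

Liu Dana is 1.84m tall,76.45KG weight
The format function uses {} as its placeholder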
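
The alignment, fill, sign, and base options described in the comments can be tried out directly. The following is a minimal sketch with illustrative values; the square brackets only make the padding visible.

```python
centered = "[{:^9}]".format("mid")      # centered in 9 columns
left = "[{:<9}]".format("left")         # left-aligned
right = "[{:>9}]".format("right")       # right-aligned
filled = "[{:*^9}]".format("mid")       # '*' as the fill character
signs = "{:+d} {:+d}".format(5, -5)     # sign shown for both numbers
bases = "{:b} {:d} {:o} {:x}".format(10, 10, 10, 10)
escaped = "{{}} and {}".format("value")  # doubled braces print as {}

for result in (centered, left, right, filled, signs, bases, escaped):
    print(result)
```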

str built-in functions

  • Many languages call the string type string, but Python calls it str
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(...)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  ...
 |  
 |  count(...)
 |      S.count(sub[, start[, end]]) -> int
 |      
 |      Return the number of non-overlapping occurrences of substring sub in
 |      string S[start:end].  Optional arguments start and end are
 |      interpreted as in slice notation.
 |  
 |  encode(self, /, encoding='utf-8', errors='strict')
 |      Encode the string using the codec registered for encoding.
 |      
 |      encoding
 |        The encoding in which to encode the string.
 |      errors
 |        The error handling scheme to use for encoding errors.
 |        The default is 'strict' meaning that encoding errors raise a
 |        UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
 |        'xmlcharrefreplace' as well as any other name registered with
 |        codecs.register_error that can handle UnicodeEncodeErrors.
 |  
 |  endswith(...)
 |      S.endswith(suffix[, start[, end]]) -> bool
 |      
 |      Return True if S ends with the specified suffix, False otherwise.
 |      With optional start, test S beginning at that position.
 |      With optional end, stop comparing S at that position.
 |      suffix can also be a tuple of strings to try.
 |  
 |  expandtabs(self, /, tabsize=8)
 |      Return a copy where all tab characters are expanded using spaces.
 |      
 |      If tabsize is not given, a tab size of 8 characters is assumed.
 |  
 |  find(...)
 |      S.find(sub[, start[, end]]) -> int
 |      
 |      Return the lowest index in S where substring sub is found,
 |      such that sub is contained within S[start:end].  Optional
 |      arguments start and end are interpreted as in slice notation.
 |      
 |      Return -1 on failure.
 |  
 |  format(...)
 |      S.format(*args, **kwargs) -> str
 |      
 |      Return a formatted version of S, using substitutions from args and kwargs.
 |      The substitutions are identified by braces ('{' and '}').
 |  
 |  format_map(...)
 |      S.format_map(mapping) -> str
 |      
 |      Return a formatted version of S, using substitutions from mapping.
 |      The substitutions are identified by braces ('{' and '}').
 |  
 |  index(...)
 |      S.index(sub[, start[, end]]) -> int
 |      
 |      Return the lowest index in S where substring sub is found, 
 |      such that sub is contained within S[start:end].  Optional
 |      arguments start and end are interpreted as in slice notation.
 |      
 |      Raises ValueError when the substring is not found.
 |  
 |  isalnum(self, /)
 |      Return True if the string is an alpha-numeric string, False otherwise.
 |      
 |      A string is alpha-numeric if all characters in the string are alpha-numeric and
 |      there is at least one character in the string.
 |  
 |  isalpha(self, /)
 |      Return True if the string is an alphabetic string, False otherwise.
 |      
 |      A string is alphabetic if all characters in the string are alphabetic and there
 |      is at least one character in the string.
 |  
 |  isascii(self, /)
 |      Return True if all characters in the string are ASCII, False otherwise.
 |      
 |      ASCII characters have code points in the range U+0000-U+007F.
 |      Empty string is ASCII too.
 |  
 |  isdecimal(self, /)
 |      Return True if the string is a decimal string, False otherwise.
 |      
 |      A string is a decimal string if all characters in the string are decimal and
 |      there is at least one character in the string.
 |  
 |  isdigit(self, /)
 |      Return True if the string is a digit string, False otherwise.
 |      
 |      A string is a digit string if all characters in the string are digits and there
 |      is at least one character in the string.
 |  
 |  isidentifier(self, /)
 |      Return True if the string is a valid Python identifier, False otherwise.
 |      
 |      Use keyword.iskeyword() to test for reserved identifiers such as "def" and
 |      "class".
 |  
 |  islower(self, /)
 |      Return True if the string is a lowercase string, False otherwise.
 |      
 |      A string is lowercase if all cased characters in the string are lowercase and
 |      there is at least one cased character in the string.
 |  
 |  isnumeric(self, /)
 |      Return True if the string is a numeric string, False otherwise.
 |      
 |      A string is numeric if all characters in the string are numeric and there is at
 |      least one character in the string.
 |  
 |  isprintable(self, /)
 |      Return True if the string is printable, False otherwise.
 |      
 |      A string is printable if all of its characters are considered printable in
 |      repr() or if it is empty.
 |  
 |  isspace(self, /)
 |      Return True if the string is a whitespace string, False otherwise.
 |      
 |      A string is whitespace if all characters in the string are whitespace and there
 |      is at least one character in the string.
 |  
 |  istitle(self, /)
 |      Return True if the string is a title-cased string, False otherwise.
 |      
 |      In a title-cased string, upper- and title-case characters may only
 |      follow uncased characters and lowercase characters only cased ones.
 |  
 |  isupper(self, /)
 |      Return True if the string is an uppercase string, False otherwise.
 |      
 |      A string is uppercase if all cased characters in the string are uppercase and
 |      there is at least one cased character in the string.
 |  
 |  join(self, iterable, /)
 |      Concatenate any number of strings.
 |      
 |      The string whose method is called is inserted in between each given string.
 |      The result is returned as a new string.
 |      
 |      Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
 |  
 |  ljust(self, width, fillchar=' ', /)
 |      Return a left-justified string of length width.
 |      
 |      Padding is done using the specified fill character (default is a space).
 |  
 |  lower(self, /)
 |      Return a copy of the string converted to lowercase.
 |  
 |  lstrip(self, chars=None, /)
 |      Return a copy of the string with leading whitespace removed.
 |      
 |      If chars is given and not None, remove characters in chars instead.
 |  
 |  partition(self, sep, /)
 |      Partition the string into three parts using the given separator.
 |      
 |      This will search for the separator in the string.  If the separator is found,
 |      returns a 3-tuple containing the part before the separator, the separator
 |      itself, and the part after it.
 |      
 |      If the separator is not found, returns a 3-tuple containing the original string
 |      and two empty strings.
 |  
 |  replace(self, old, new, count=-1, /)
 |      Return a copy with all occurrences of substring old replaced by new.
 |      
 |        count
 |          Maximum number of occurrences to replace.
 |          -1 (the default value) means replace all occurrences.
 |      
 |      If the optional argument count is given, only the first count occurrences are
 |      replaced.
 |  
 |  rfind(...)
 |      S.rfind(sub[, start[, end]]) -> int
 |      
 |      Return the highest index in S where substring sub is found,
 |      such that sub is contained within S[start:end].  Optional
 |      arguments start and end are interpreted as in slice notation.
 |      
 |      Return -1 on failure.
 |  
 |  rindex(...)
 |      S.rindex(sub[, start[, end]]) -> int
 |      
 |      Return the highest index in S where substring sub is found,
 |      such that sub is contained within S[start:end].  Optional
 |      arguments start and end are interpreted as in slice notation.
 |      
 |      Raises ValueError when the substring is not found.
 |  
 |  rjust(self, width, fillchar=' ', /)
 |      Return a right-justified string of length width.
 |      
 |      Padding is done using the specified fill character (default is a space).
 |  
 |  rpartition(self, sep, /)
 |      Partition the string into three parts using the given separator.
 |      
 |      This will search for the separator in the string, starting at the end. If
 |      the separator is found, returns a 3-tuple containing the part before the
 |      separator, the separator itself, and the part after it.
 |      
 |      If the separator is not found, returns a 3-tuple containing two empty strings
 |      and the original string.
 |  
 |  rsplit(self, /, sep=None, maxsplit=-1)
 |      Return a list of the words in the string, using sep as the delimiter string.
 |      
 |        sep
 |          The delimiter according which to split the string.
 |          None (the default value) means split according to any whitespace,
 |          and discard empty strings from the result.
 |        maxsplit
 |          Maximum number of splits to do.
 |          -1 (the default value) means no limit.
 |      
 |      Splits are done starting at the end of the string and working to the front.
 |  
 |  rstrip(self, chars=None, /)
 |      Return a copy of the string with trailing whitespace removed.
 |      
 |      If chars is given and not None, remove characters in chars instead.
 |  
 |  split(self, /, sep=None, maxsplit=-1)
 |      Return a list of the words in the string, using sep as the delimiter string.
 |      
 |      sep
 |        The delimiter according which to split the string.
 |        None (the default value) means split according to any whitespace,
 |        and discard empty strings from the result.
 |      maxsplit
 |        Maximum number of splits to do.
 |        -1 (the default value) means no limit.
 |  
 |  splitlines(self, /, keepends=False)
 |      Return a list of the lines in the string, breaking at line boundaries.
 |      
 |      Line breaks are not included in the resulting list unless keepends is given and
 |      true.
 |  
 |  startswith(...)
 |      S.startswith(prefix[, start[, end]]) -> bool
 |      
 |      Return True if S starts with the specified prefix, False otherwise.
 |      With optional start, test S beginning at that position.
 |      With optional end, stop comparing S at that position.
 |      prefix can also be a tuple of strings to try.
 |  
 |  strip(self, chars=None, /)
 |      Return a copy of the string with leading and trailing whitespace remove.
 |      
 |      If chars is given and not None, remove characters in chars instead.
 |  
 |  swapcase(self, /)
 |      Convert uppercase characters to lowercase and lowercase characters to uppercase.
 |  
 |  title(self, /)
 |      Return a version of the string where each word is titlecased.
 |      
 |      More specifically, words start with uppercased characters and all remaining
 |      cased characters have lower case.
 |  
 |  translate(self, table, /)
 |      Replace each character in the string using the given translation table.
 |      
 |        table
 |          Translation table, which must be a mapping of Unicode ordinals to
 |          Unicode ordinals, strings, or None.
 |      
 |      The table must implement lookup/indexing via __getitem__, for instance a
 |      dictionary or list.  If this operation raises LookupError, the character is
 |      left untouched.  Characters mapped to None are deleted.
 |  
 |  upper(self, /)
 |      Return a copy of the string converted to uppercase.
 |  
 |  zfill(self, width, /)
 |      Pad a numeric string with zeros on the left, to fill a field of the given width.
 |      
 |      The string is never truncated.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  maketrans(x, y=None, z=None, /)
 |      Return a translation table usable for str.translate().
 |      
 |      If there is only one argument, it must be a dictionary mapping Unicode
 |      ordinals (integers) or characters to Unicode ordinals, strings or None.
 |      Character keys will be then converted to ordinals.
 |      If there are two arguments, they must be strings of equal length, and
 |      in the resulting dictionary, each character in x will be mapped to the
 |      character at the same position in y. If there is a third argument, it
 |      must be a string, whose characters will be mapped to None in the result.

help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.

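str built-in functions

  • String search functions: find, index, rfind
  • find: checks whether a string contains a substring and returns the index of the first match
  • index: the only difference from find is that index raises an exception when nothing is found
  • find, rfind: find returns the leftmost match, rfind the rightmost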
s = "wcp love lq and wcplq"
s1 = "lq"
# Returns the position where the substring is first found
s.find(s1)

# A return value of -1 means it was not found
s2 = "wcww"
s.find(s2)

-1

# index raises an exception
s.index(s2)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

 in 
      1 # index raises an exception
----> 2 s.index(s2)

ValueError: substring not found

# A search range can also be given
s = "wo love xie dai ma,wo love exercises."
s1 = "love"

# Search starting from index 25 and see whether it can be found
s.find(s1,25)

-1

help(str.rfind)

Help on method_descriptor:

rfind(...)
    S.rfind(sub[, start[, end]]) -> int
    
    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
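
To compare find, rfind, and index on the same text, here is a small sketch. The helper safe_index is hypothetical and only illustrates wrapping index in a try/except block.

```python
text = "wo love xie dai ma,wo love exercises."

first = text.find("love")          # leftmost match
last = text.rfind("love")          # rightmost match
after_first = text.find("love", 4)  # search starts at index 4
missing = text.find("love", 25)    # no match from index 25 onward
print(first, last, after_first, missing)  # 3 22 22 -1

def safe_index(haystack, needle):
    """Hypothetical helper: like index, but returns None instead of raising."""
    try:
        return haystack.index(needle)
    except ValueError:
        return None

print(safe_index(text, "wcww"))    # None

# When only a yes/no answer is needed, the in operator is enough.
print("love" in text)              # True
```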

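Predicate functions

  • These functions generally start with is, for example islower

  • isalpha: checks whether all characters are letters. Two points to note:

    • The string must contain at least one character; for an empty string it returns False
    • Chinese characters count as alpha, so this function cannot tell English letters from Chinese characters; use Unicode code points to distinguish them
    • Keep this difference in mind to avoid surprises
  • isdigit, isnumeric, isdecimal: three functions that check for digits

    • These functions are not recommended; later, when writing crawlers, prefer a regular expression to check whether a string is a number

    A summary of the three functions:
    isdigit:
    True: Unicode decimal digits, byte digits (single-byte), full-width digits (double-byte), digit-like characters such as superscripts
    False: Roman numerals, Chinese numerals
    Error: none
    isdecimal:
    True: Unicode decimal digits, full-width digits (double-byte)
    False: superscript digits, Roman numerals, Chinese numerals
    Error: byte digits (single-byte), because bytes has no isdecimal method
    isnumeric:
    True: Unicode decimal digits, full-width digits (double-byte), Roman numerals, Chinese numerals
    False: none
    Error: byte digits (single-byte), because bytes has no isnumeric method

  • islower: checks whether all cased characters in the string are lowercase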
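
Since isalpha accepts Chinese characters, telling the two apart needs a code point check. The sketch below uses two hypothetical helpers for single characters; the CJK range covers only the basic CJK Unified Ideographs block, which is an assumption made to keep the example short.

```python
def is_ascii_letter(ch):
    """Hypothetical helper: True for a single ASCII letter a-z or A-Z."""
    return ch.isascii() and ch.isalpha()

def is_cjk_ideograph(ch):
    """Hypothetical helper: True for one character in U+4E00..U+9FFF."""
    return len(ch) == 1 and "\u4e00" <= ch <= "\u9fff"

print("".isalpha())           # False: an empty string is never alpha
print("abc".isalpha())        # True
print("汉字".isalpha())        # True: Chinese characters count as alpha
print(is_ascii_letter("a"), is_ascii_letter("汉"))    # True False
print(is_cjk_ideograph("汉"), is_cjk_ideograph("a"))  # True False
```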

help(str.islower)

Help on method_descriptor:

islower(self, /)
    Return True if the string is a lowercase string, False otherwise.
    
    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

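# Note: depending on the input method, a typed Roman numeral may be plain letters rather than a Roman numeral character, so the result may not be what we expect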
chin_num = "一二三四"
print(chin_num.isdigit())
print(chin_num.isnumeric())
print(chin_num.isdecimal())

False
True
False
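
The same comparison can be run over several kinds of digit characters, followed by the regular-expression approach the text recommends. This is a sketch: the sample labels and the helper is_ascii_integer are illustrative, and the pattern [0-9]+ is one possible choice that accepts ASCII digits only.

```python
import re

# Escapes are used so the exact characters are unambiguous.
samples = {
    "ASCII digits": "123",
    "full-width digits": "\uff11\uff12\uff13",
    "superscript two": "\u00b2",
    "Roman numeral four": "\u2163",
    "Chinese numerals": "一二三四",
}

# Each entry maps to (isdigit, isdecimal, isnumeric).
results = {}
for label, value in samples.items():
    results[label] = (value.isdigit(), value.isdecimal(), value.isnumeric())
    print(label, results[label])

# bytes objects offer isdigit() but neither isdecimal() nor isnumeric().
print(b"123".isdigit(), hasattr(b"123", "isdecimal"))  # True False

def is_ascii_integer(text):
    """Hypothetical helper: True only for one or more ASCII digits 0-9."""
    return re.fullmatch(r"[0-9]+", text) is not None

print(is_ascii_integer("2024"), is_ascii_integer("\uff11"))  # True False
```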

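Content checks

  • startswith/endswith: whether the string starts or ends with xxx
    • Checks whether a string starts (or ends) with a given string; three parameters are commonly used
    • prefix (suffix for endswith): the string to look for, required
    • start: where the checked range begins
    • end: where the checked range ends
  • islower/isupper: whether the string is lowercase or uppercase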
wcp = "Wang CaiPo"
lq = "Liao Qing"

s = "Wang CaiPo really love Liao Qing"

print(s.startswith(wcp))
print(s.endswith(lq))

True
True
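
The optional start and end parameters, and the tuple form mentioned in the help text, can be sketched as follows. The variable names are illustrative.

```python
sentence = "Wang CaiPo really love Liao Qing"

# A tuple of candidates matches if any one of them fits.
any_prefix = sentence.startswith(("Liao", "Wang"))

# start moves the beginning of the checked range to index 5.
from_offset = sentence.startswith("CaiPo", 5)

# start and end restrict the check to sentence[0:22].
within_range = sentence.endswith("love", 0, 22)

print(any_prefix, from_offset, within_range)  # True True True
```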

s1 = "CaiPo love liaoqing"
s2 = "CaiPoloveliaoqing"
s3 = "caipoloveliaoqing"
# s4 contains spaces, but spaces are not cased characters and do not affect the result
s4 = "caipo love liaoqing"
s5 = "坡坡同学是爱清清的"

print(s1.islower())
print(s2.islower())
print(s3.islower())
print(s4.islower())
print(s5.islower())
# Chinese characters have no case, so both islower and isupper return False
print(s5.isupper())

False
False
True
True
False
False

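Manipulation functions

  • format: used for formatting
  • strip: removes whitespace from both ends of the string. It also lets you specify which characters to remove from both ends; if none are given, whitespace is removed by default. lstrip and rstrip work the same way, where l and r stand for left and right, that is, they remove the given characters (whitespace by default) from the left or the right end only. Note that removal is not limited to one character: every consecutive matching character, counted from the end inward, is removed.
  • join: concatenates strings. It takes an iterable as its argument (iteration is introduced later; for now think of it as a list) and joins the strings in the iterable, using the string it is called on as the separator.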
c = "DDDDDDsadas dsfsf vvvvv "
# Whether the whitespace on both ends was removed cannot be seen directly, so end= appends a visible marker
print(c.strip(),end="--------")
print()
print(c.strip('D'),end="--------")

DDDDDDsadas dsfsf vvvvv--------
sadas dsfsf vvvvv --------

help(str.strip)

Help on method_descriptor:

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace remove.
    
    If chars is given and not None, remove characters in chars instead.
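
One detail of the chars argument is easy to miss: it is treated as a set of characters, not as a prefix or suffix string. The sketch below uses illustrative values to show that, together with lstrip and rstrip.

```python
padded = "  \t hello \n"
stripped = padded.strip()            # default: all surrounding whitespace
print(repr(stripped))                # 'hello'

left_only = "xxhixx".lstrip("x")     # only the left side is trimmed
right_only = "xxhixx".rstrip("x")    # only the right side is trimmed
print(left_only, right_only)         # hixx xxhi

# chars is a set: any of 'a', 'b', 'c' is removed from both ends,
# in any order, until a character outside the set is reached.
set_strip = "abcHELLOcba".strip("abc")
print(set_strip)                     # HELLO
```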

# join example: use s1, s2, s3 as separators to join the contents of ss
s1 = "$"
s2 = "-"
s3 = " "
ss = ["wcp","love","liao","qingqing"]

print(s1.join(ss))
print(s2.join(ss))
print(s3.join(ss))

wcp$love$liao$qingqing
wcp-love-liao-qingqing
wcp love liao qingqing
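
Two things are worth checking with join: it is the inverse of split for the same separator, and every element must already be a string. The sketch below uses illustrative names and converts numbers with str before joining.

```python
words = ["wcp", "love", "liao", "qingqing"]

joined = "-".join(words)
print(joined)                        # wcp-love-liao-qingqing
print(joined.split("-") == words)    # True: split undoes join

# join accepts only strings, so numbers must be converted first.
numbers = [1, 2, 3]
try:
    ",".join(numbers)
    join_error = None
except TypeError as exc:
    join_error = exc
print("error:", join_error)

csv_line = ",".join(str(n) for n in numbers)
print(csv_line)                      # 1,2,3
```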
