Python String Operations

String data shows up almost everywhere in Python programs, so it pays to be fluent in string handling.

Built-in String Functions and Methods

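No.	Method	Description
1	capitalize()	Converts the first character to uppercase and the remaining characters to lowercase
2	count(sub)	Counts the non-overlapping occurrences of the substring sub
3	find(sub)	Returns the index of the first occurrence of sub, or -1 if sub is not found
4	join(seq)	Joins the elements of the sequence seq into a new string, using the string it is called on as the separator
5	len(string)	Built-in function; returns the length of the string
6	replace(old, new[, count])	Replaces old with new; if count is given, at most count occurrences are replaced
7	split(sep=None, maxsplit=-1)	Splits the string on sep (runs of whitespace by default) and returns a list; if maxsplit is given, at most maxsplit splits are made
8	upper()	Converts all lowercase letters to uppercase
9	lower()	Converts all uppercase letters to lowercase
10	max(string)	Built-in function; returns the character with the largest code point
11	min(string)	Built-in function; returns the character with the smallest code point
12	strip([chars])	Removes the given characters from both ends of the string (whitespace by default)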

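capitalize()

The first character is converted to uppercase and all remaining characters are converted to lowercase.
If the first character is not a letter, it is left unchanged, while all other characters are still converted to lowercase.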

>>> s = "hello PYTHON"
>>> s.capitalize()
'Hello python'

>>> s = "123 Hello Python"
>>> s.capitalize()
'123 hello python'

count()

>>> url = "https://www.python.org"  # avoid naming a variable str, which shadows the built-in type
>>> sub = "o"
>>> url.count(sub)
2
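Two details of count() are easy to miss: it counts non-overlapping occurrences, and it accepts optional start and end positions. The following is a minimal sketch; the variable names are only illustrative.

```python
url = "https://www.python.org"

# count() accepts any substring, not just a single character
www_count = url.count("www")          # 1

# An optional start argument restricts the search to url[18:], i.e. ".org"
o_in_suffix = url.count("o", 18)      # 1

# Occurrences are counted without overlapping
overlap = "aaaa".count("aa")          # 2, not 3

print(www_count, o_in_suffix, overlap)
```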

find()

>>> info = 'abca'

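>>> print(info.find('a'))  # search from index 0 for the first occurrence of the substring
0

>>> print(info.find('a', 1))  # search from index 1 for the first occurrence of the substring
3

>>> print(info.find('3'))  # returns -1 when the substring is not found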
-1
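Because find() returns an index, its result should be compared with -1 rather than used directly as a condition: a match at position 0 is falsy, while -1 is truthy. The sketch below contrasts find() with the in operator, index(), and rfind(), all of which are standard str features.

```python
info = "abca"

# Compare against -1 instead of treating the result as a boolean
position = info.find("a")
found_with_find = position != -1      # True, even though position is 0

# The "in" operator is the idiomatic way to test for membership only
found_with_in = "a" in info           # True

# index() behaves like find() but raises ValueError when nothing is found
try:
    info.index("3")
    index_raised = False
except ValueError:
    index_raised = True

# rfind() returns the index of the last occurrence
last_position = info.rfind("a")       # 3

print(position, found_with_find, found_with_in, index_raised, last_position)
```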

join()

>>> s1 = ""                               # separator (an empty string here)
>>> seq = ('P', 'Y', 'T', 'H', 'O', 'N')  # sequence of strings

>>> s1.join(seq)
'PYTHON'

len()

>>> text = 'I Love it.'
>>> len(text)
10

replace()

>>> site = "www.baidu.com"
>>> site.replace('baidu.com', 'python.org')
'www.python.org'
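The table above mentions an optional limit on the number of replacements. A small sketch of that third argument, using a made-up sample string:

```python
text = "one one one"

# Without a count, every occurrence is replaced
replaced_all = text.replace("one", "two")        # 'two two two'

# With a count, only the first `count` occurrences are replaced
replaced_once = text.replace("one", "two", 1)    # 'two one one'

# Strings are immutable: replace() returns a new string and the original is unchanged
print(replaced_all, replaced_once, text, sep=" | ")
```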

upper()

>>> site = "www.baidu.com"
>>> site.upper()
'WWW.BAIDU.COM'

lower()

>>> title = "ORACLE ACE"
>>> title.lower()
'oracle ace'

max()

>>> site = "www.baidu.com"
>>> max(site)
'w'

min()

>>> site = "www.baidu.com"
>>> min(site)
'.'
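The results above follow from the fact that max() and min() compare characters by Unicode code point, which is why "." counts as smaller than any letter. The sketch below uses ord() to make the comparison visible; the "aZ" sample is only an illustration.

```python
domain = "www.baidu.com"

# ord() shows the code point that max() and min() compare
largest = max(domain)                 # 'w'
smallest = min(domain)                # '.'
print(largest, ord(largest))          # w 119
print(smallest, ord(smallest))        # . 46

# Uppercase letters have smaller code points than lowercase letters
mixed_max = max("aZ")                 # 'a', because ord('a') == 97 and ord('Z') == 90
mixed_min = min("aZ")                 # 'Z'
```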

strip()

>>> banner = "*****this is **string** example....wow!!!*****"
>>> banner.strip('*')
'this is **string** example....wow!!!'
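A few related behaviors, sketched with made-up sample strings: strip() with no argument removes whitespace, the chars argument is treated as a set of characters rather than a prefix or suffix, and lstrip() and rstrip() work on one side only.

```python
padded = "  \thello world \n"

# With no argument, leading and trailing whitespace (spaces, tabs, newlines) is removed
stripped = padded.strip()                       # 'hello world'

# chars is a set of characters: any mix of "x" and "y" is removed from both ends
as_set = "xyxhelloyx".strip("xy")               # 'hello'

# lstrip() and rstrip() only touch one side
left_only = "***title***".lstrip("*")           # 'title***'
right_only = "***title***".rstrip("*")          # '***title'
```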

String Formatting

%s style

>>> print('hello %s and %s' %('Python', 'Oracle'))
hello Python and Oracle

%s with a dictionary

>>> print('hello %(first)s and %(second)s' %{'first':'Python', 'second':'Oracle'})
hello Python and Oracle

.format() style

>>> print('hello {0} and {1}'.format('Python', 'Oracle'))
hello Python and Oracle

>>> print('hello {first} and {second}'.format(first='Python', second='Oracle'))
hello Python and Oracle
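Python 3.6 and later also support formatted string literals (f-strings), which accept the same format specifications as .format(). This is a brief sketch that goes slightly beyond the styles listed above; the variable names are illustrative.

```python
first = "Python"
second = "Oracle"

# f-string: expressions inside {} are evaluated in place (Python 3.6+)
greeting = f"hello {first} and {second}"        # 'hello Python and Oracle'

# Format specifications work the same way in .format() and in f-strings
price = 3.14159
rounded = "{:.2f}".format(price)                # '3.14'
padded = f"{first:>10}"                         # right-aligned in a field of width 10

print(greeting, rounded, repr(padded))
```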

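String Slicing

Slicing extracts a substring from a string. A slice is written with a pair of square brackets containing a start offset (start), an end offset (end), and an optional step (step).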

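No.	Syntax	Description
1	[:]	Extracts the whole string, from the beginning (start defaults to 0) to the end (end defaults to the length of the string)
2	[start:]	Extracts from position start to the end of the string
3	[:end]	Extracts from the beginning up to position end - 1
4	[start:end]	Extracts from position start up to position end - 1
5	[start:end:step]	Extracts from position start up to position end - 1, taking every step-th character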

Basic usage (indexing a single character):

>>> letter = 'abcd'

>>> letter[0]
'a'

>>> letter[-1]
'd'

Extract the last N characters:

>>> letter = 'abcdefghijklmnopqrstuvwxyz'

>>> letter[-3:]
'xyz'

From beginning to end with a step of N:

>>> letter[::5]
'afkpuz'

Reverse a string by using a negative step:

>>> letter[::-1]
'zyxwvutsrqponmlkjihgfedcba'
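The remaining slice forms from the table can be sketched in the same way. The example also shows that slices clip out-of-range bounds, whereas plain indexing raises IndexError.

```python
letter = "abcdefghijklmnopqrstuvwxyz"

first_three = letter[:3]          # 'abc'   (indices 0, 1, 2)
middle = letter[2:5]              # 'cde'   (index 5 is excluded)
every_other = letter[0:10:2]      # 'acegi' (indices 0, 2, 4, 6, 8)
drop_last = letter[:-1]           # everything except the final 'z'

# Slices never raise IndexError; out-of-range bounds are clipped
clipped = letter[20:100]          # 'uvwxyz'

# Indexing, in contrast, raises IndexError when out of range
try:
    letter[100]
    raised = False
except IndexError:
    raised = True
```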

Splitting Strings

Splitting without arguments

If no separator is specified, .split() splits on runs of whitespace (spaces, tabs, newlines) and returns a list.

>>> s = "this is my string"

>>> s.split()
['this', 'is', 'my', 'string']

>>> type(s.split())
<class 'list'>
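Calling .split() with no argument is not the same as calling .split(' '). A short sketch of the difference, using made-up sample strings:

```python
messy = "  this   is\tmy\nstring  "

# No argument: split on runs of any whitespace and drop empty strings
by_whitespace = messy.split()          # ['this', 'is', 'my', 'string']

# Explicit ' ': split at every single space, so empty strings appear
by_space = "a  b".split(" ")           # ['a', '', 'b']

print(by_whitespace)
print(by_space)
```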

Specifying a separator

When a separator is given, .split() splits the string at every occurrence of that separator.

>>> s = '<a href="www.baidu.com">baidu</a>'

>>> s.split('"')[1].split('.')
['www', 'baidu', 'com']

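Limiting the number of splits with maxsplit

.split() takes an optional parameter named maxsplit. By default, .split() makes every possible split; when maxsplit is set, it makes at most that many splits, so the result contains at most maxsplit + 1 elements.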

>>> s = "this is my string"

>>> s.split(maxsplit=1)
['this', 'is my string']
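maxsplit counts splits from the left. When the last piece is the one of interest, str.rsplit() applies the same limit from the right. A minimal sketch with a made-up path string:

```python
path = "usr/local/lib/python3"

# maxsplit=1 splits once from the left: at most 2 elements
head, rest = path.split("/", maxsplit=1)       # 'usr', 'local/lib/python3'

# rsplit() counts splits from the right instead
parent, name = path.rsplit("/", maxsplit=1)    # 'usr/local/lib', 'python3'

# maxsplit=-1 (the default) means "no limit"
all_parts = path.split("/", maxsplit=-1)       # ['usr', 'local', 'lib', 'python3']
```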

Exercise: splitting CSV data with .split()

CSV file:

Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1980 XG Amsterdam, NL

Desired result:

[
    ['Mike Smith', '15554218841', '123 Nice St, Roy, NM, USA'],
    ['Anita Hernandez', '15557789941', '425 Sunny St, New York, NY, USA'],
    ['Guido van Rossum', '315558730', 'Science Park 123, 1980 XG Amsterdam, NL']
]

Python code:

>>> s = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1980 XG Amsterdam, NL"""

>>> def str_split(unsplit):
...     results = []
...     for line in unsplit.split('\n')[1:]:
...         results.append(line.split(',', maxsplit=2))
...     return results
...

>>> print(str_split(s))
[['Mike Smith', '15554218841', '123 Nice St, Roy, NM, USA'], ['Anita Hernandez', '15557789941', '425 Sunny St, New York, NY, USA'], ['Guido van Rossum', '315558730', 'Science Park 123, 1980 XG Amsterdam, NL']]
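As a possible variation on the exercise, each row can be mapped onto the header names instead of hard-coding maxsplit=2. This is only a sketch: rows_as_dicts is a hypothetical helper, and it assumes that only the last column may contain commas, as in the sample data.

```python
s = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1980 XG Amsterdam, NL"""

def rows_as_dicts(unsplit):
    """Hypothetical helper: map each data row onto the header names."""
    lines = unsplit.splitlines()                 # handles \n and \r\n alike
    header = lines[0].split(",")                 # ['Name', 'Phone', 'Address']
    rows = []
    for line in lines[1:]:
        # Split at most len(header) - 1 times so the address keeps its commas
        fields = line.split(",", maxsplit=len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows

records = rows_as_dicts(s)
print(records[0]["Address"])                     # 123 Nice St, Roy, NM, USA
```

deriving maxsplit from the header keeps the function working if columns are added before the address column.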

String Concatenation and Joining

Concatenating strings with the + operator

The + operator concatenates two or more strings.

>>> s1 = "hello"
>>> s2 = "world"

>>> s = s1 + ' ' +  s2

>>> s
'hello world'

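Joining strings with .join()

The join() method concatenates the elements of a sequence of strings into a new string, placing the given separator between them.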

>>> s = "python"

>>> ','.join(s)
'p,y,t,h,o,n'

>>> li = ['hello', 'python']

>>> ','.join(li)
'hello,python'
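join() only accepts strings, so other values have to be converted first. The sketch below shows the TypeError and a common fix, plus the usual pattern of collecting pieces in a list and joining once; the sample data is made up.

```python
numbers = [1, 2, 3]

# join() only accepts strings; passing integers raises TypeError
try:
    ",".join(numbers)
    raised = False
except TypeError:
    raised = True

# Convert each element to str first
joined = ",".join(str(n) for n in numbers)     # '1,2,3'

# Building a string from many pieces: collect them, then join once
pieces = []
for word in ("hello", "python", "world"):
    pieces.append(word.upper())
sentence = " ".join(pieces)                    # 'HELLO PYTHON WORLD'
```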

String Encoding

The following code shows the default string encoding in Python 3:

>>> import sys

>>> sys.getdefaultencoding()
'utf-8'

In the Python interpreter, encode() and decode() convert between str and bytes (encode() uses UTF-8 when no encoding is given):

>>> '字符串'.encode()
b'\xe5\xad\x97\xe7\xac\xa6\xe4\xb8\xb2'

>>> b'\xe5\xad\x97\xe7\xac\xa6\xe4\xb8\xb2'.decode('utf-8')
'字符串'
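The same text produces different bytes under different encodings, and decoding with the wrong codec fails. A minimal sketch, assuming the standard gbk and ascii codecs that ship with CPython:

```python
text = "字符串"

utf8_bytes = text.encode("utf-8")      # 3 bytes per character for this text
gbk_bytes = text.encode("gbk")         # 2 bytes per character for this text
print(len(text), len(utf8_bytes), len(gbk_bytes))   # 3 9 6

# Decoding must use the encoding the bytes were produced with
round_trip = gbk_bytes.decode("gbk")   # '字符串'

# A mismatched codec raises UnicodeDecodeError unless an error handler is given
try:
    utf8_bytes.decode("ascii")
    raised = False
except UnicodeDecodeError:
    raised = True

# errors="replace" substitutes U+FFFD for each undecodable byte
replaced = utf8_bytes.decode("ascii", errors="replace")
```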

Original article. Please credit the source when reposting: http://www.opcoder.cn/article/31/