目录

python-正则表达式-re模块-练习

python 正则表达式 re模块 练习

python 正则表达式 re模块 练习

练习1:验证电子邮箱格式

任务:编写正则表达式验证电子邮箱格式是否正确。

代码:

import re

def validate_email(email):
    # 此处编写正则表达式赋值给 pattern 
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+[a-zA-Z]{2,}$'
    if re.match(pattern, email):
        return True
    else:
        return False

# 测试
emails = ["example@gmail.com", "invalid-email", "test.email+alias@gmail.com"]
for email in emails:
    print(f"{email}: {'Valid' if validate_email(email) else 'Invalid'}")
example@gmail.com: Valid
invalid-email: Invalid
test.email+alias@gmail.com: Valid

练习2:提取电话号码

任务:从文本中提取标准的电话号码(如:13812345678)。

代码:

import re

text = "我的电话是13812345678,欢迎联系!"
# 此处编写正则表达式赋值给 pattern 
pattern = r'(?<!\d)1[3-9]\d{9}(?!\d)'
matches = re.findall(pattern, text)
print("Extracted phone numbers:", matches)
Extracted phone numbers: ['13812345678']

练习3:提取日期

任务:从文本中提取日期(格式为YYYY-MM-DD)。

代码:

import re

text = "今天的日期是2023-10-10,明天是2023-10-11。"
# 此处编写正则表达式赋值给 pattern 
pattern = r'(?<!\d)\d{4}[-/]\d{2}[-/]\d{2}(?!\d)'
matches = re.findall(pattern, text)
print("Extracted dates:", matches)
Extracted dates: ['2023-10-10', '2023-10-11']

练习4:验证密码强度

任务:编写正则表达式验证密码是否符合以下规则:长度至少8位,包含字母和数字。

代码:

import re

def validate_password(password):
    # 此处编写正则表达式赋值给 pattern 
    pattern = r'^(?=.*[a-zA-Z])(?=.*\d).{8,}$'
    return bool(re.match(pattern, password))

# 测试
passwords = ["password123", "weak", "Strong1234"]
for pwd in passwords:
    print(f"{pwd}: {'Valid' if validate_password(pwd) else 'Invalid'}")
password123: Valid
weak: Invalid
Strong1234: Valid

练习5:提取URL

任务:从文本中提取URL。

代码:

import re

text = "访问 https://www.example.com 或 http://example.org 查看详情。"
# 此处编写正则表达式赋值给 pattern 
pattern = r'https?://[^\s]+'
matches = re.findall(pattern, text)
print("Extracted URLs:", matches)
Extracted URLs: ['https://www.example.com', 'http://example.org']

练习6:替换文本中的敏感词

任务:将文本中的敏感词替换为 ***

代码:

import re

text = "这是一个包含敏感词的文本。"
sensitive_words = ["敏感词"]
# 此处编写正则表达式赋值给 pattern 
pattern = r'(' + '|'.join(sensitive_words) + r')'
cleaned_text = re.sub(pattern, "***", text)
print("Cleaned text:", cleaned_text)
Cleaned text: 这是一个包含***的文本。

练习7:提取IP地址

任务:从文本中提取IPv4地址。

代码:

import re

text = "服务器的IP地址是192.168.1.1,备用地址是10.0.0.1。"
# 此处编写正则表达式赋值给 pattern 
pattern = r'[0-9]+(?:[0-9]+){3}|[0-9a-fA-F:]+'
matches = re.findall(pattern, text)
print("Extracted IP addresses:", matches)
Extracted IP addresses: ['192.168.1.1', '10.0.0.1']

练习8:验证身份证号码

任务:编写正则表达式验证18位身份证号码。

代码:

import re

def validate_id_card(id_card):
    # 此处编写正则表达式赋值给 pattern 
    pattern = r'^\d{17}[\dXx]$'
    return bool(re.match(pattern, id_card))

# 测试
id_cards = ["123456789012345678", "12345678901234567X"]
for id_card in id_cards:
    print(f"{id_card}: {'Valid' if validate_id_card(id_card) else 'Invalid'}")
123456789012345678: Valid
12345678901234567X: Valid

练习9:提取HTML标签

任务:从HTML文本中提取所有标签。

代码:

import re

html = "<html><head><title>Test</title></head><body><p>Hello, world!</p></body></html>"
# 此处编写正则表达式赋值给 pattern 
pattern = r'<[^>]+>'
matches = re.findall(pattern, html)
print("Extracted HTML tags:", matches)
Extracted HTML tags: ['<html>', '<head>', '<title>', '</title>', '</head>', '<body>', '<p>', '</p>', '</body>', '</html>']

练习10:提取信用卡号

任务:从文本中提取信用卡号(16位数字)。

代码:

import re

text = "我的信用卡号是1234567890123456,请勿泄露!"
# 此处编写正则表达式赋值给 pattern 
pattern = r'\d{16}'
matches = re.findall(pattern, text)
print("Extracted credit card numbers:", matches)
Extracted credit card numbers: ['1234567890123456']