String Processing in Python

Uncle D. 2023. 6. 23. 20:49

This post introduces the features and techniques Python provides for working with string (str) data, one of its basic data types. The tools most commonly used for string operations are the built-in functions, the methods of the str type, and the re module for regular expressions.

Built-in Functions

These functions are built into Python and are always available without an import. Several of them accept or return strings.

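len(string): returns the length of the string.

str(), repr(): convert a value to its string representation.

format(): the built-in format(value, format_spec) formats a single value according to a format specification. Placeholder substitution is done with the str.format() method, which uses {} placeholders; %d, %s, and similar codes belong to the older % operator.

The following are often used alongside the built-ins, but strictly speaking they are methods of str, called as text.split() and so on (see the String Method section):

split(): splits a string.

join(): joins strings.

strip(): removes surrounding whitespace.

replace(): replaces part of a string.

lower(), upper(): change the case.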
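
As a minimal sketch of the built-ins listed above, the snippet below compares len(), str(), repr(), and the three formatting styles. The sample values ("Ann", 30, and so on) are made up for illustration only.

```python
# Built-in functions that accept or produce strings.
text = "Hello, world!"

length = len(text)          # number of characters
as_text = str(3.14)         # readable string form of a value
as_repr = repr("a\nb")      # unambiguous form: quotes and escapes stay visible
padded = format(42, "05d")  # built-in format(): one value plus a format spec

# Placeholder substitution: str.format() uses {}, the % operator uses %s / %d.
templated = "{} is {} years old".format("Ann", 30)
old_style = "%s is %d years old" % ("Ann", 30)

print(length, as_text, as_repr, padded)
print(templated)
print(old_style)
```

Note that format() here is the built-in function, while "{}".format(...) is a method of str; both follow the same format specification mini-language.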

For more details, see the link below.

https://docs.python.org/3/library/functions.html

String Method

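These are methods of the str type. They offer somewhat more extensive functionality than the built-in functions.

string.upper(), string.lower(): return a new string with all characters converted to uppercase or lowercase.

string.strip(), string.rstrip(), string.lstrip(): remove leading and/or trailing whitespace, or the specified characters, from the string.

string.split(sep): splits the string at each occurrence of the separator and returns a list of substrings.

string.join(iterable): joins the elements of an iterable into a single string, using the string as the separator.

string.replace(old, new): replaces occurrences of a substring with a new substring.

string.find(sub), string.rfind(sub): return the index of the first or last occurrence of the substring, or -1 if it is not found.

string.startswith(prefix), string.endswith(suffix): check whether the string starts with the given prefix or ends with the given suffix.

string.isalpha(), string.isdigit(), string.isalnum(), etc.: check whether the string consists only of alphabetic characters, digits, alphanumeric characters, and so on.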
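
The methods above that the example code at the end of this post does not cover (join, find/rfind, strip with explicit characters, and the is...() checks) can be sketched as follows. The sample strings are arbitrary and chosen only for illustration.

```python
# join(): the string the method is called on acts as the separator.
words = ["red", "green", "blue"]
joined = ", ".join(words)

# find() / rfind(): index of the first / last occurrence, -1 when absent.
first_e = joined.find("e")
last_e = joined.rfind("e")
missing = joined.find("z")

# strip() family: whitespace by default, or a set of characters to remove.
trimmed = "--title--".strip("-")
left_trimmed = "  x  ".lstrip()
right_trimmed = "  x  ".rstrip()

# is...() checks return True only if every character qualifies.
checks = (
    "abc".isalpha(),
    "123".isdigit(),
    "abc123".isalnum(),
    "abc 123".isalnum(),  # the space is not alphanumeric
)

print(joined, first_e, last_e, missing)
print(repr(trimmed), repr(left_trimmed), repr(right_trimmed))
print(checks)
```

Because strings are immutable, each of these methods returns a new value and leaves the original string unchanged.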

https://docs.python.org/3/library/string.html

Regular Expression (re module)

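Regular expressions let you process strings by pattern. This requires importing the re module. As the patterns listed later in this section show, it can match and manipulate strings in a wide variety of formats, which makes it far more flexible than the fixed-string methods.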

Functions such as re.match(), re.search(), re.findall(), and re.sub() perform pattern matching and manipulation on strings using regular expressions.
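
To make the difference between these four functions concrete, here is a small sketch that applies the same pattern with each of them. The sample text is invented for illustration.

```python
import re

text = "order 66 shipped, order 7 pending"

# match(): only succeeds if the pattern matches at the very start.
at_start = re.match(r"\d+", text)

# search(): returns the first match found anywhere in the string.
anywhere = re.search(r"\d+", text)

# findall(): returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# sub(): replaces every match with the given replacement.
masked = re.sub(r"\d+", "#", text)

print(at_start)
print(anywhere.group())
print(all_numbers)
print(masked)
```

re.match() and re.search() return a match object or None, so the result should be checked before calling .group() on it.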

For more on regular expressions, see my earlier post below.

Using Regular Expressions in Python - re module

blogwhatever.tistory.com

In practice, when processing string data with the re module, it is useful to compile the patterns below into a pattern object and then use it to find or replace the matching parts of the string.

Pattern	Description
.	Matches any character except a newline.
^	Matches the start of a string.
$	Matches the end of a string.
\A	Matches only at the start of a string.
\Z	Matches only at the end of a string.
[abc]	Matches any single character from the given set.
[^abc]	Matches any single character that is not in the given set.
[0-9]	Matches any digit from 0 to 9.
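\d	Matches any decimal digit (equivalent to [0-9] for ASCII text; for str patterns it also matches other Unicode digits unless re.ASCII is set).
\D	Matches any non-digit character.
\w	Matches any word character (letters, digits, and the underscore).
\W	Matches any non-word character.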
\s	Matches any whitespace character (space, tab, newline, etc.).
\S	Matches any non-whitespace character.
*	Matches zero or more occurrences of the preceding pattern.
+	Matches one or more occurrences of the preceding pattern.
?	Matches zero or one occurrence of the preceding pattern.
{n}	Matches exactly n occurrences of the preceding pattern.
{n,}	Matches n or more occurrences of the preceding pattern.
{n,m}	Matches between n and m occurrences of the preceding pattern.
(...)	Creates a capturing group. The matched substring can be extracted using .group() method.
(?P<name>...)	Creates a named capturing group. The matched substring can be extracted using .group('name') method.
(?i)	Enables case-insensitive matching.
(?m)	Enables multi-line mode, where ^ and $ match the start and end of each line.
(?s)	Enables dot-all mode, where . matches any character, including a newline.
(?x)	Enables verbose mode, allowing whitespace and comments within the pattern.
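
The snippet below is a sketch of how several entries from the table combine in a compiled pattern object: named groups, quantifiers, verbose mode, and inline flags. The date pattern and the sample log line are hypothetical and serve only as illustration.

```python
import re

# Hypothetical pattern: YYYY-MM-DD dates, written in verbose mode so that
# whitespace and comments inside the pattern are ignored.
date_re = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)

log = "created 2023-06-22, updated 2023-06-23"

first = date_re.search(log)
year = first.group("year")   # access a group by name
month = first.group(2)       # or by position
all_dates = date_re.findall(log)  # list of (year, month, day) tuples
swapped = date_re.sub(r"\g<day>/\g<month>/\g<year>", log)

# Inline flags from the table, placed at the start of the pattern.
case_insensitive = re.findall(r"(?i)python", "Python PYTHON python")
line_starts = re.findall(r"(?m)^\w+", "one a\ntwo b")
dot_matches_newline = re.search(r"(?s)a.b", "a\nb") is not None

print(year, month, all_dates)
print(swapped)
print(case_insensitive, line_starts, dot_matches_newline)
```

Compiling once and reusing the pattern object keeps the pattern in one place when the same expression is used for searching and replacing.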

'string' module

The string module provides constants such as string.ascii_letters and string.digits, along with helper functions, for working with strings.
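
Beyond the two constants named above, the module also has a few helpers. The following is a small sketch using string.punctuation, string.hexdigits, string.capwords(), and string.Template. The sample inputs are made up for illustration.

```python
import string

# capwords(): split on whitespace, capitalize each word, rejoin with one space.
title = string.capwords("python   string processing")

# Constants are plain strings, so they work with membership tests
# and with str.maketrans() / str.translate().
remove_punct = str.maketrans("", "", string.punctuation)
clean = "Hello, world!".translate(remove_punct)
is_hex = all(ch in string.hexdigits for ch in "1aF")

# Template: simple $name substitution.
greeting = string.Template("Hi, $name").substitute(name="Ann")

print(title)
print(clean)
print(is_hex)
print(greeting)
```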

Example Code

import string
import re

# String functions
text = "Hello, world!"
print(len(text))  # Output: 13
print(text.upper())  # Output: HELLO, WORLD!
print(text.split(","))  # Output: ['Hello', ' world!']
print(text.replace("Hello", "Hi"))  # Output: Hi, world!

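# String methods
text = "   Python is fun!   "
print(text.strip())  # Output: Python is fun!
print(text.startswith("Python"))  # Output: False (the string starts with spaces)
print(text.endswith("!"))  # Output: False (the string ends with spaces)
print(text.strip().endswith("!"))  # Output: True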

# Regular expressions
pattern = r"\b[A-Z]\w+\b"  # Matches capitalized words
text = "Python is an Amazing Language"
matches = re.findall(pattern, text)
print(matches)  # Output: ['Python', 'Amazing', 'Language']

# String module
print(string.ascii_letters)  # Output: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)  # Output: 0123456789

That's all.

Thank you for reading. If I find anything to add while reviewing this post, I will keep updating it.

License: Attribution, NonCommercial, NoDerivatives
