Strings¶

The Python string data type is a sequence of zero or more individual characters, which may be letters, digits, whitespace characters, or symbols. Because a string is a sequence, it can be accessed in the same ways as other sequence types: through indexing and slicing.

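r'ABC' - raw string. A raw string literal treats the backslash (\) as a literal character rather than the start of an escape sequence

u'ABC' or just 'ABC' - Unicode string. A sequence of Unicode characters. In Python 3 the u prefix is optional, and each character may take several bytes when encoded

b'ABC' - bytes string. Each element is a single byte (an integer from 0 to 255). This is the form that is written to disk or sent over a network

f'ABC' - format string (f-string). Expressions inside curly braces are evaluated and inserted into the string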
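The four literal forms listed above can be compared side by side. This is a minimal sketch; the variable names are illustrative only and do not come from the original notebook:

```python
raw = r'a\nb'           # backslash stays a literal character
text = 'a\nb'           # \n is a single newline character
data = b'ABC'           # bytes: each element is an integer 0-255
name = 'ABC'
formatted = f'{name}!'  # the expression in braces is evaluated

print(len(raw), len(text))     # 4 3
print(type(text), type(data))  # <class 'str'> <class 'bytes'>
print(data[0])                 # 65, the byte value of 'A'
print(formatted)               # ABC!
print(u'ABC' == 'ABC')         # True, the u prefix changes nothing in Python 3
```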

Raw String

print(r'a\nb') # raw string: the backslash is kept, so this prints a\nb

print('a\nb')  # \n is read as a newline, so a and b are printed on separate lines

Encode / Decode

# Unicode to bytes (UTF-8 is the default encoding)
a = 'Строка на русском'  # "A string in Russian"
a.encode()

# Bytes to Unicode (these bytes decode to 'Привет', "Hello")
b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'.decode()
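As a small round-trip sketch of the two calls above, the snippet below names the encoding explicitly and compares the number of characters with the number of bytes. The variable names are assumptions made for illustration:

```python
text = 'Привет'
encoded = text.encode('utf-8')     # str -> bytes
decoded = encoded.decode('utf-8')  # bytes -> str

print(encoded)
print(len(text), len(encoded))     # 6 12: each Cyrillic letter takes 2 bytes in UTF-8
print(decoded == text)             # True
```

Passing the encoding explicitly is optional here because UTF-8 is the default for both methods, but it makes the intent visible.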

Format string

# Format a string without str.format() or the % operator: expressions go directly inside the braces

a = 'some variable'
print(f'y = {a + " Y"} + x')
print(a + " Y")

y = some variable Y + x
some variable Y
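An f-string can also carry a format specification after a colon inside the braces. The sketch below uses made-up values (price, count, label) purely for illustration:

```python
price = 3.14159
count = 7
label = 'total'

# .2f -> two decimal places, 03d -> zero-padded to width 3, .1f -> one decimal place
line = f'{label}: {price:.2f} x {count:03d} = {price * count:.1f}'
print(line)  # total: 3.14 x 007 = 22.0
```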

Arithmetic operations¶

Two arithmetic operators are available when working with strings: + (concatenation) and * (repetition)

# Concatenation
'abc' + 'def'

# Duplication
'abc' * 3

'abcabcabc'
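One detail worth a short sketch: + only joins a string with another string, so a number has to be converted with str() first. The names below are illustrative assumptions:

```python
count = 3

# 'abc' + count would raise TypeError, so convert the number explicitly
joined = 'abc' + str(count)
repeated = '-' * 10

print(joined)    # abc3
print(repeated)  # ----------
```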

Size of the string¶

len('abcdef')
list('abcdef')

['a', 'b', 'c', 'd', 'e', 'f']

Indexing¶


a = 'abcdef'
for symbol in a:
  print(symbol)

for index in range(len(a)):
  print(index, a[index])
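The index-based loop above is often written with enumerate, which yields the index and the character together. A minimal sketch, which also shows that negative indices count from the end:

```python
a = 'abcdef'

print(a[0], a[-1])  # a f

pairs = list(enumerate(a))  # [(0, 'a'), (1, 'b'), ...]
for index, symbol in pairs:
    print(index, symbol)
```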

Strings in Python are immutable objects¶

String operations never modify the original string; those that produce text return a new string. It is impossible to change a string in place

a = 'abcdef'
#    012345
# a[0] = 'A'     # would raise TypeError: strings do not support item assignment
b = 'A' + a[1:]  # build a new string instead
print(b)

Abcdef
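To see the immutability in action, the sketch below attempts the in-place assignment and catches the resulting error. The variable message is introduced only for this illustration:

```python
a = 'abcdef'
message = ''

try:
    a[0] = 'A'  # strings do not allow item assignment
except TypeError as error:
    message = str(error)
    print(message)

b = 'A' + a[1:]  # a new string; the original is untouched
print(a, b)      # abcdef Abcdef
```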

Slices¶

We can also select a range of characters from a string. A slice is written as two index numbers separated by a colon, [x:y]: it starts at index x and stops just before index y

# selections and slices
a = 'abcdef'
a[4]

# The first four characters (indices 0 to 3)
a[:4]

# Indexing from the end
a[-3]

We can also specify the step of the slice as a third value, [x:y:step]. This parameter is called the stride

# Step of the slice
a[1:4:2]

# Reverse the string
a[::-1]
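The slice forms from this section, with the value each one produces, can be summarised in one short sketch:

```python
a = 'abcdef'

print(a[1:4])    # bcd    (indices 1, 2, 3)
print(a[:4])     # abcd   (start defaults to 0)
print(a[-3:])    # def    (the last three characters)
print(a[1:4:2])  # bd     (indices 1 and 3)
print(a[::2])    # ace    (every second character)
print(a[::-1])   # fedcba (negative step walks backwards)
print(a[2:100])  # cdef   (out-of-range slice bounds are clipped)
```

Unlike indexing with a single number, slicing never raises IndexError when a bound lies outside the string.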

Sequences and strings¶

# joining string
",".join(['Hello', 'student'])

'Hello,student'

# splitting string
a = "Hello, student"
a.split()  # split on whitespace

['Hello,', 'student']

# list() splits a string into individual characters
list(a)

['H', 'e', 'l', 'l', 'o', ',', ' ', 's', 't', 'u', 'd', 'e', 'n', 't']
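split and join are inverse operations when the same separator is used for both. A minimal sketch with illustrative variable names:

```python
sentence = 'Hello, student'

words = sentence.split()       # whitespace separator: ['Hello,', 'student']
parts = sentence.split(', ')   # explicit separator:   ['Hello', 'student']
rejoined = ', '.join(parts)    # back to 'Hello, student'

print(words)
print(parts)
print(rejoined == sentence)    # True
```

Note that split is a method of str, not of list, so it must be called on the string itself.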

Case functions¶

# upper
a = 'abcdef'
a.upper()

'ABCDEF'

# lower
a.lower()

'abcdef'

# Title case: first letter of each word upper, the rest lower
a = "mysevEralwordtitle"
a.title()

'Myseveralwordtitle'

# Change case
a.swapcase()

'MYSEVeRALWORDTITLE'

String type¶

# True if every character in the string is a letter
a.isalpha()

True

# True if the first character is a digit
a[0].isdigit()

False
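These checks apply to the whole string, so a single non-matching character makes the result False. The sketch below runs three of the built-in checks over a few sample strings chosen for illustration:

```python
samples = ['abc', 'abc1', '123', '12.5', '']

# Map each sample to (isalpha, isdigit, isalnum)
checks = {s: (s.isalpha(), s.isdigit(), s.isalnum()) for s in samples}

for s, result in checks.items():
    print(repr(s), result)
# 'abc'  (True, False, True)
# 'abc1' (False, False, True)
# '123'  (False, True, True)
# '12.5' (False, False, False)
# ''     (False, False, False)
```

The empty string fails every check because there is no character to satisfy it.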

Find and replace¶

# Number of occurrences of a substring
a.count('e')

# Index of the first occurrence, or -1 if the substring is not found
a.find('e')

# New string with every occurrence replaced
a.replace('e', 'AAAAA')
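The results of these calls for the string used in this section are sketched below, together with two related details: find returns -1 when nothing is found, and replace accepts an optional maximum number of replacements:

```python
a = 'mysevEralwordtitle'

print(a.count('e'))             # 2 (the upper-case 'E' is not counted)
print(a.find('e'))              # 3
print(a.rfind('e'))             # 17, searching from the right
print(a.find('z'))              # -1, not found
print(a.replace('e', 'AAAAA'))  # mysAAAAAvEralwordtitlAAAAA
print(a.replace('e', '_', 1))   # mys_vEralwordtitle (only the first occurrence)
```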

in / not in¶

b_string = a
'a' not in b_string

False

'Z' not in a

True

Removing excess symbols¶

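# Remove leading and trailing whitespace
a = '    some noisy string       \n'
print(a.strip())

some noisy string

# rstrip('\n') removes only trailing newlines, so this string is unchanged;
# a.strip('!') would remove the exclamation marks
a = '!!!!!some noisy string!!!!!!!'
a.rstrip('\n')

'!!!!!some noisy string!!!!!!!'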
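The three stripping methods differ only in which end of the string they clean, and each accepts an optional set of characters to remove. A minimal sketch with illustrative variable names:

```python
noisy = '   some noisy string   \n'
marked = '!!!!!some noisy string!!!!!!!'

print(repr(noisy.strip()))      # 'some noisy string'
print(repr(noisy.lstrip()))     # 'some noisy string   \n'
print(repr(noisy.rstrip()))     # '   some noisy string'
print(repr(marked.strip('!')))  # 'some noisy string'
print(repr(marked.rstrip('!'))) # '!!!!!some noisy string'
```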

index = a.find('s')

Exercise¶

Repetition encoding (also known as run-length encoding) is one way to compress strings: each run of identical characters is replaced with a single character followed by the number of repetitions.

'A6B5C11' -> 'AAAAAABBBBBCCCCCCCCCCC'

Assume the string contains only the alphabetic characters A-Z or a-z.

Read an input string that was compressed using repetition encoding, and decode it.
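To make the format concrete, here is a sketch of the opposite direction, compressing a plain string, which can be used to generate test inputs for the exercise. The helper name rle_encode is hypothetical, and using itertools.groupby is just one possible approach; decoding is left as the exercise:

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical characters with the character and the run length."""
    # groupby yields (character, iterator over the consecutive run of that character)
    return ''.join(f'{char}{len(list(run))}' for char, run in groupby(text))


plain = 'A' * 6 + 'B' * 5 + 'C' * 11
print(rle_encode(plain))  # A6B5C11
```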

Part A¶

Repetition counts are single digits, 1-9

Part B¶

Repetition counts may be any positive integer, including multi-digit numbers

a.isalpha() # check if all characters of a string are letters
a.isdigit() # check if all characters of a string are digits

Second maximum¶

Find the second largest value in a sequence of numbers (as before, the numbers are entered one at a time; 0 marks the end of the sequence and is not part of it)

Example code for reading the input is given below

current = int(input())
minimum = current
while current != 0:
    # your code
    current = int(input())

print(minimum)