Sets¶

new_set = {1,2,3,1,2,3} # set literal in braces; duplicates are dropped
new_set

another_set = set([3,1,2])
print(another_set)

new_set == another_set

print(set(range(10)))
print(set('qwertyqwerty')) # iteration order is arbitrary
print(set(map(abs,(1,2,3,-1,-2,-3))))

mixed_set = {'2',0,False,(3,2,1)} # elements must be hashable, so a list cannot be an element
mixed_set
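One detail of the set above is easy to miss: `0` and `False` compare equal and have the same hash, so a set stores only one of them. A minimal sketch of that behaviour, reusing the same literal:

```python
# 0 == False and hash(0) == hash(False), so the set treats them as one element.
mixed_set = {'2', 0, False, (3, 2, 1)}
print(len(mixed_set))                        # 3, not 4

# The same applies to 1, 1.0 and True.
ones = {1, 1.0, True}
print(len(ones))                             # 1
print(hash(1) == hash(1.0) == hash(True))    # True
```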

mixed_set = {1,'2',True,(1,2,3), [1, 2]} # raises TypeError: a list is unhashable, so it cannot be a set element
mixed_set
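The failure above can be caught and inspected. The sketch below also shows the usual workaround of replacing the list with an immutable (and therefore hashable) container such as a tuple or a `frozenset`. The names `bad_set`, `ok_set` and `error_message` are only illustrative.

```python
# A list is mutable and unhashable, so building the set raises TypeError.
try:
    bad_set = {1, '2', [1, 2]}
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Tuples and frozensets are immutable and hashable, so they are allowed.
ok_set = {1, '2', (1, 2), frozenset([1, 2])}
print(len(ok_set))   # 4
```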

A hash function does not guarantee that if $A > B$ then $h(A) > h(B)$¶

print(sorted(set('qwertyqwertyaa')))

for i in set((1,2,3,4,5,11,2,23,1,2,-1,-1)):
    print(i,end='\t')
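To illustrate the heading above: the hashes of increasing values need not be increasing, and for strings they typically change between interpreter runs because Python salts string hashes per process by default. The sketch below uses hypothetical sample words and relies on `sorted` whenever a deterministic order is needed.

```python
words = ['apple', 'banana', 'cherry']          # already in increasing order
hashes = [hash(word) for word in words]

# Whether the hashes happen to be increasing varies from run to run.
hashes_increasing = hashes == sorted(hashes)
print(hashes_increasing)

unique_words = set(words)
print(list(unique_words))      # arbitrary order, do not rely on it
print(sorted(unique_words))    # ['apple', 'banana', 'cherry']
```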

if 1 in {1,2,3}:
    print('YES')

if 1 in {2,3} & {1,5}:
    print('YES')

if 1 in {2,3} | {1,5}:
    print('YES')

print({2,3,1} | {1,5})

if 1 in {2,3} - {1,5}:
    print('YES')

if 1 in {2,3} ^ {1,5}: # in A|B but not in A&B (xor)
    print('YES')
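Each operator used above has a method counterpart. The sketch below pairs them up and adds the subset and disjointness checks; the variable names `a`, `b` and `c` are only illustrative.

```python
a = {2, 3}
b = {1, 5}
c = {1, 2, 3}

print(a & b == a.intersection(b))            # True, both are empty sets
print(a | b == a.union(b))                   # True, both are {1, 2, 3, 5}
print(c - a == c.difference(a))              # True, both are {1}
print(a ^ c == a.symmetric_difference(c))    # True, both are {1}

# The methods accept any iterable; the operators require sets on both sides.
from_list = a.union([1, 5])
print(from_list == {1, 2, 3, 5})             # True

print(a <= c)                                # True: a is a subset of c
print(a.isdisjoint(b))                       # True: no common elements
```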

Set methods¶

t_set = set(range(5))
t_set

len(t_set)

t_set.discard(1)

Removing¶

t_set.remove(3) # raises KeyError if the element is missing (e.g. 1, which was discarded above)
t_set.discard(2) # does nothing if the element is missing
t_set.pop() # removes and returns an arbitrary element

t_set.clear()
t_set

del t_set
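The difference between the removal methods is easiest to see side by side. This is a small self-contained sketch that rebuilds `t_set` from scratch; the name `popped` is only illustrative.

```python
t_set = set(range(5))

t_set.discard(1)           # removes 1
t_set.discard(1)           # 1 is already gone: silently does nothing

try:
    t_set.remove(1)        # remove() insists that the element exists
except KeyError as err:
    print('KeyError:', err)

popped = t_set.pop()       # removes and returns an arbitrary element
print(popped in t_set)     # False

t_set.clear()
print(t_set)               # set()
```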

Dictionaries¶

# A JSON-like structure: string keys mapped to a string or a list of strings
phones = {"Mom": ["+79999999999","+79999999998"],
          "Bro":"+71111111111",
          "Dad":"+77777777777"}
print(*phones)

Mom Bro Dad

print(phones['Mom'],phones.get('Mom'))
print(phones.get('Mom','No such number'))
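Indexing and `get` differ only when the key is missing. The sketch below uses a hypothetical key `'Sis'` and a made-up number to show the three common ways of handling that case.

```python
phones = {"Mom": ["+79999999999", "+79999999998"],
          "Bro": "+71111111111",
          "Dad": "+77777777777"}

# Indexing a missing key raises KeyError.
try:
    phones['Sis']
except KeyError as err:
    print('KeyError:', err)

# get() returns None, or the supplied default, instead of raising.
missing = phones.get('Sis')
fallback = phones.get('Sis', 'No such number')
print(missing, fallback)

# setdefault() inserts the default if the key is absent and returns the stored value.
phones.setdefault('Sis', []).append('+72222222222')   # hypothetical number
print(phones['Sis'])
```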

squares = {}
squares[0] = 0
squares[1] = 1
squares[2] = 4

squares['surprise'] = 'here'
print(squares)

sqrts = {}
sqrts[1] = 1
sqrts[4] = 2
sqrts[9] = 3
sqrts

del sqrts

dict([[1,2],(2,3),['no','yes']])

{1: 2, 2: 3, 'no': 'yes'}

print(phones.items())
my_list = [4124, 1251, 5321]
for index, value in enumerate(my_list):
    print(index, value)
for name,number in phones.items():
    print(name,number,sep=' <<<<<<->>>>>> ')

dict_items([('Mom', ['+79999999999', '+79999999998']), ('Bro', '+71111111111'), ('Dad', '+77777777777')])
0 4124
1 1251
2 5321
Mom <<<<<<->>>>>> ['+79999999999', '+79999999998']
Bro <<<<<<->>>>>> +71111111111
Dad <<<<<<->>>>>> +77777777777

phones.values(),phones.keys()

(dict_values([['+79999999999', '+79999999998'], '+71111111111', '+77777777777']),
 dict_keys(['Mom', 'Bro', 'Dad']))

phones['Mo'] = '+70000000000'
phones

for i in phones:
    print(i)
    print(phones[i])

Mom
['+79999999999', '+79999999998']
Bro
+71111111111
Dad
+77777777777

for key, value in phones.items():
    print(key, 'is', value)

Mom is ['+79999999999', '+79999999998']
Bro is +71111111111
Dad is +77777777777

phones.items()

deleted = phones.pop('Bro')

sorted(phones) # keys
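`pop` also accepts a default, which avoids a `KeyError` when the key has already been removed, and `sorted` can order the entries by value instead of by key. A minimal sketch with a simplified copy of the phone book (one number per person is an assumption made here for brevity):

```python
phones = {"Mom": "+79999999999", "Bro": "+71111111111", "Dad": "+77777777777"}

deleted = phones.pop('Bro')            # removes the entry and returns its value
print(deleted)                         # +71111111111

missing = phones.pop('Bro', None)      # key is gone: the default is returned
print(missing)                         # None

print(sorted(phones))                  # ['Dad', 'Mom'], sorted keys

# Sort (key, value) pairs by value rather than by key.
by_number = sorted(phones.items(), key=lambda item: item[1])
print(by_number)
```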

Capitals = dict()


Capitals['Russia'] = 'Moscow'
Capitals['Ukraine'] = 'Kiev'
Capitals['USA'] = 'Washington'

Countries = ['Russia', 'France', 'USA', 'Russia']

for country in Countries:
    if country in Capitals:
        print('Capital of ' + country + ': ' + Capitals[country])
    else:
        print('Not found for ' + country)

Capital of Russia: Moscow
Not found for France
Capital of USA: Washington
Capital of Russia: Moscow

# Four equivalent ways to build the same dictionary:
Capitals = {'Russia': 'Moscow', 'Ukraine': 'Kiev', 'USA': 'Washington'}
Capitals = dict(Russia = 'Moscow', Ukraine = 'Kiev', USA = 'Washington')
Capitals = dict([("Russia", "Moscow"), ("Ukraine", "Kiev"), ("USA", "Washington")])
Capitals = dict(zip(["Russia", "Ukraine", "USA"], ["Moscow", "Kiev", "Washington"]))
print(list(zip(["Russia", "Ukraine", "USA"], ["Moscow", "Kiev", "Washington"])))

[('Russia', 'Moscow'), ('Ukraine', 'Kiev'), ('USA', 'Washington')]

A = {'ab' : 'ba', 'aa' : 'aa', 'bb' : 'bb', 'ba' : 'ab', 'ac' : 'ca'}

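key = 'ac'
if key in A:        # option 1: check that the key exists before deleting
    del A[key]
try:                # option 2: attempt the deletion and handle the failure
    del A[key]      # the key was already removed above, so this raises KeyError
except KeyError:
    print('There is no element with key "' + key + '" in dict')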
print(A)

A = dict(zip('abcdef', range(6)))
for key in A:
    print(key, A[key])

A = dict(zip('abcdef', range(6)))
for key, val in A.items():
    print(key, val)
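The same mapping can also be written as a dict comprehension, which additionally makes it easy to invert a dictionary whose values are unique. A brief sketch; the names `B` and `inverse` are only illustrative.

```python
A = dict(zip('abcdef', range(6)))

# Equivalent construction with a dict comprehension.
B = {letter: index for index, letter in enumerate('abcdef')}
print(A == B)          # True

# Swap keys and values (safe here because all values are distinct).
inverse = {val: key for key, val in A.items()}
print(inverse[3])      # d
```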

Tasks¶

Task 1¶

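You are given two texts, each of which may contain many words. Output the number of unique words that appear in both texts, followed by the sorted list of those words. Words are separated by whitespace, so punctuation attached to a word counts as part of it (as in the example output below). Hint: use sets, then sorted(my_set).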

Example

Text A

Hurricane Gonzalo was the second tropical cyclone, after Hurricane Fay, to directly strike the island of Bermuda in a one-week time frame in October 2014, and was the first Category 4 Atlantic hurricane since Hurricane Ophelia in 2011. At the time, it was the strongest hurricane in the Atlantic since Igor in 2010. Gonzalo struck Bermuda less than a week after the surprisingly fierce Hurricane Fay; 2014 was the first season in recorded history to feature two hurricane landfalls in Bermuda. A powerful Atlantic tropical cyclone that wrought destruction in the Leeward Islands and Bermuda, Gonzalo was the seventh named storm, sixth and final hurricane and only the second major hurricane of the below-average 2014 Atlantic hurricane season. The storm formed from a tropical wave on October 12, while located east of the Lesser Antilles. It made landfall on Antigua, Saint Martin, and Anguilla as a Category 1 hurricane, causing damage on those and nearby islands. Antigua and Barbuda sustained US$40 million in losses, and boats were abundantly damaged or destroyed throughout the northern Leeward Islands. The storm killed three people on Saint Martin and Saint Barthélemy. Gonzalo tracked northwestward as it intensified into a major hurricane. Eyewall replacement cycles led to fluctuations in the hurricanes structure and intensity, but on October 16, Gonzalo peaked with maximum sustained winds of 145 mph (230 km/h).

Text B

Hurricane Ophelia was the most intense hurricane of the 2011 Atlantic hurricane season. The seventeenth tropical cyclone, sixteenth tropical storm, fifth hurricane, and third major hurricane, Ophelia originated in a tropical wave in the central Atlantic, forming approximately midway between the Cape Verde Islands and the Lesser Antilles on September 17. Tracking generally west-northwestward, Ophelia was upgraded to a tropical storm on September 21, and reached an initial peak of 65 mph (100 km/h) on September 22. As the storm entered a region of higher wind shear it began to weaken, and was subsequently downgraded to a remnant low on September 25. The following day, however, the remnants of the system began to reorganize as wind shear lessened, and on September 27, the National Hurricane Center once again began advisories on the system. Moving northward, Ophelia regained tropical storm status early on September 28, and rapidly deepened to attain its peak intensity with maximum sustained winds of 140 mph (220 km/h) several days later. The system weakened as it entered cooler sea surface temperatures and began a gradual transition to an extratropical cyclone, a process it completed by October 3.

Output

31
Atlantic Hurricane Islands Lesser October Ophelia The a and as cyclone, hurricane hurricane, in it major maximum mph of on season. storm storm, sustained the to tropical was wave winds with

Task 1a¶

Modify the solution to the previous task so that the same word written in a different case (e.g. 'Appear' and 'appear') is counted as the same word.

When outputting the list of common words, you may use any case.

Hash functions¶

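def myHASHer(string):
    # Toy example: encode every character as its zero-padded 3-digit code point.
    # This is a reversible encoding rather than a real hash, and it assumes code points below 1000.
    hash_string = ''
    for i in string:
        hash_string += '{:03d}'.format(ord(i))
    return hash_string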

hash_string = myHASHer('Hello, HSE!)_:2020')
hash_string

def unHASHer(hash_string):
    # Reverse myHASHer: read the digits in groups of three and convert each group back to a character.
    string = ''
    for i in range(len(hash_string)//3):
        string += chr(int(hash_string[3*i:3*i+3]))
    return string

unHASHer(hash_string)
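Because the toy function above can be reversed, it behaves like an encoding, not like a cryptographic hash. The sketch below restates the same idea under hypothetical names (`encode_code_points`, `decode_code_points`) and shows where the fixed 3-digit width stops working: a character whose code point is 1000 or above produces four digits and the decoder loses alignment.

```python
def encode_code_points(text):
    # Each character becomes its code point, zero-padded to at least 3 digits.
    return ''.join('{:03d}'.format(ord(ch)) for ch in text)

def decode_code_points(encoded):
    # Read the digits back in groups of three.
    return ''.join(chr(int(encoded[i:i + 3])) for i in range(0, len(encoded), 3))

message = 'Hello, HSE!'
encoded = encode_code_points(message)
print(encoded)
print(decode_code_points(encoded) == message)    # True: fully reversible
print(len(encoded) == 3 * len(message))          # True: output grows with the input

# ord('€') is 8364, which needs four digits, so the round trip fails.
price = '€100'
round_trip = decode_code_points(encode_code_points(price))
print(round_trip == price)                       # False
```

A real hash function, by contrast, produces a fixed-size output and is designed so that the input cannot be recovered from it.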

import hashlib
hashlib.algorithms_available

a ='''
{'blake2b',
 'blake2b512',
 'blake2s',
 'blake2s256',
 'md4',
 'md5',
 'md5-sha1',
 'ripemd160',
 'sha1',
 'sha224',
 'sha256',
 'sha3-224',
 'sha3-256',
 'sha3-384',
 'sha3-512',
 'sha384',
 'sha3_224',
 'sha3_256',
 'sha3_384',
 'sha3_512',
 'sha512',
 'sha512-224',
 'sha512-256',
 'shake128',
 'shake256',
 'shake_128',
 'shake_256',
 'sm3',
 'whirlpool'}
 '''
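print(hashlib.sha256(a.encode()).hexdigest())        # a long input still gives a 64-character hex digest
print(hashlib.sha256('hellp'.encode()).hexdigest())  # differs from 'hello' by one character, yet the digest is unrelated
print(hashlib.sha256('hello'.encode()).hexdigest())
print(hashlib.sha256('hello'.encode()).hexdigest())  # same input, same digest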


The ideal cryptographic hash function has five main properties:¶

it is deterministic so the same message always results in the same hash
it is quick to compute the hash value for any given message
it is infeasible to generate a message from its hash value except by trying all possible messages
a small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value
it is infeasible to find two different messages with the same hash value
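Some of the properties listed above can be observed directly with `hashlib.sha256`. This is only an illustration, not a proof of any security property; the helper `sha256_hex` is a hypothetical convenience wrapper.

```python
import hashlib

def sha256_hex(text):
    # Hex digest of the UTF-8 encoded text.
    return hashlib.sha256(text.encode()).hexdigest()

h_hello = sha256_hex('hello')
h_hellp = sha256_hex('hellp')

# Deterministic: the same message always gives the same digest.
print(h_hello == sha256_hex('hello'))                    # True

# Fixed size: 64 hex digits regardless of the input length.
print(len(h_hello), len(sha256_hex('hello' * 1000)))     # 64 64

# A one-character change alters the digest in most positions.
differing = sum(x != y for x, y in zip(h_hello, h_hellp))
print(differing, 'of', len(h_hello), 'hex digits differ')
```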

https://en.wikipedia.org/wiki/Cryptographic_hash_function