Hi everyone! this is Abdul from Pythonest

dictionaries are playing a crucial role in the entire python’s implementation

so it is very important to understand this data structure very well in order

to write the performant programs in Python. so let’s try to deeply understand

this data structure. Dictionaries are very good at element

insertion, deletion and retrieving and provide 0(1) time complexity for

all of these operations.Hash Tables are the engines behind python’s

high-performance dictionaries let’s take a look at an example to understand how

the implementation of dictionaries works behind the scene, so we have a dictionary

of Fruit names so behind the scene a hash function will be applied to this

data structure and creates a hash table where keys have been assigned an integer

index by applying a hash function so the hash function is the most important part

of this entire workflow a hash function Maps a large space of inputs to

the smallest space of output so it means that the hash function can be used to

convert a large space of keys to a small space of integer indices of indexes the

built-in hash function works directly with built-in types and falls back to

calling the dunder hash for user-defined types if two objects compare equal

their hash values must also be equal otherwise the hash table algorithm

doesn’t work for example because one equal to equal to 1.0 is true in Python

so the hash of 1 equal to equal to hash of 1.0 must also be true even though the

internal representations of an integer and a float are very different so let’s

dive a little bit deeper into hash table in the context of inserting and

retrieving for dictionaries in order to create a hash table from scratch we

start with some allocated memory blocks similar to what we started for arrays

the placement of the new data is dependent on two properties of the data

for inserting the hash value of the key and how the value compares to other

objects this is because when we insert data the key is first hashed

and masked so that it turns into an effective index in an array the mask

makes sure that the hash value which can take the values of any integer fits with

the allocated number of buckets so in short the mask is the reserved number

of buckets minus one so for example if we have allocated eight blocks of memory and our hash value is for example twenty eight thousand nine hundred and seventy

five we will consider the bucket at index twenty thousand nine hundred and

seventy-five module 0b111 and 0b triple one is the binary representation

of number 7 which we can get from the number of allocated memory blocks

minus 1 and if our dictionary has grown to require 512 blocks of memory then the

mask becomes 0b triple 1 triple 1 and triple 1 which is the binary

representation of number 511 so actually we have done with the first step of the

insertion operation now we must check if this bucket is

already in use, if it is empty we can insert the key and value into this block

of memory, if it is used and the value of that bucket is equal to the value we

wish to insert then the key-value pair is already in the hash table and we can

simply return that but one another scenario is also existed here so think

about if both of these values are not equal or not similar then we must find

a new place to put the data to find the new index. we compute a new index using a simple linear function a method called probing python’s probing mechanism adds a contribution for the higher bits of the original hash. Using these higher-order

bits gives each hash a different sequence of next possible hashes which

helps to avoid future collision there’s a lot of freedom when picking the

algorithm to generate a new index however it’s quite important that the

scheme visit every possible index in order to evenly distribute the data in

the table how well distributed the data is throughout the hash table is called

the load factor. The linear probing only deals with the last several bytes of the

hash and disregards the rest but the PERTURB scheme that Python uses will

start taking into consideration more bits from the item’s hashes in order to

resolve this problem a similar procedure is done when we are performing lookups

on a specific key the given key is transformed into an index and that index

is examined if the key in that matches then we can return that value if it

doesn’t we keep creating new indices using the same scheme until either find

the data or hit an empty bucket and in the case of an empty bucket we can

conclude that the data doesn’t exist in the table so I think that’s enough

explanation about the entire process let’s try to elaborate it with the help of

an example so let’s say we have a dictionary with the following items as Roam for value 4, San Francisco for value 3 New York for value 5 and

Barcelona for value 2 let’s perform the process of insertion and lookups

step by step so in the very first step we will reserve a continuous sequence of

buckets in memory let’s save reserve 8 then we have to

define the mask so by taking 8 blocks of memory means we have a mask 8 – 1 is equal to 7 and its binary representation is 0b111 then in

the third step to insert the first item which is Roam we will calculate the hash

of the Roam with the help of hash function and our mask and it will be

2 and we will perform the same process for San Francisco New York and Barcelona

but for the Barcelona if we calculate its index once again using the hash

function and mask it will also be 2 which is similar to the index of Roam so at

this stage we have a collision problem here

so to solve this issue we’ll use the PERTURB scheme which will calculate the

new index using the previous hashes so I will place it before the New York don’t

worry about it right now you will see the reason

shortly so that’s the complete process of insertion and lookups let’s talk

about the deletion process when a value is deleted from a hash table we cannot

simply write null to that bucket of memory this is because we have used

nulls as a sentinel value while probing for hash collision as a result we must

write a special value that signifies that the bucket is empty but there still

may be a value after it to consider when resolving a hash collision these empty

slots can be written to it in the future and are removed when the hash

table is resized so that was the reason for placing Barcelona just before the

New York instead of just after San Francisco okay let’s talk about the

resizing as more items are inserted into the hash table the table itself must be

resized to accommodate it it can be shown that a table that is no more than

2/3 full will have optimal space savings while still having a good bound on the

number of collisions to expect. Thus when a table reaches this critical point it

is grown in order to do this a larger table is allocated the mask is adjusted

to fit the new table and all elements of the old table re-inserted into the new

one this requires recomputing indices since the change to mask will change

the resulting index as a result resizing large hash tables can be quite expensive

however since we only do this resizing operation when the table is too small as

opposed to on every insert the amortized cost of an insert is still O(1) by

default the smallest size of a dictionary or set is 8

on resize the number of buckets increases by four times until we reach

50,000 elements after which the sizes increased by two times this gives the

following possible sizes eight then it’s goes to the 32 then it’s going to 128

then 512 then 2048 and goes on it is important to note that resizing can

happen to make a hash table larger or smaller that is if sufficiently many

elements of a hash table are deleted the table can be scaled down in size

however resizing only happens during an insert operation that’s great we have

learned a lot so let’s try to write some code and implement our own custom

dictionary in Python so you will be able to understand all the concepts

practically so I will define a class named as MyDict

I’m not using any base class then lets initialize some values like size and

hashmap inside the __init__ magic function it will take the context and

inside that function I will define the size of the dictionary if we remember

that the default size at the beginning is 8 and I will use the same size so I

will say self.size=8 then we will reserve the memory buckets of

that size as a hash map okay let’s define our hashing function named

as custom_hash which will take the context and the key and we

will produce the index by calculating the hash of the key divided by the size of

the reserved buckets – 1 and return that key after that I will define the

insert function named as insert_item which will take the

context, key and value now inside this function we will get the index from our

custom_hash function and set the key_exists to false

because we are assuming that the memory bucket is empty then get that bucket of

memory from hash map as bucket equal to self.hashmap[hash_key ] and

then iterate the key value in the case of existence using a for loop and set the

key_exists to true and break the loop or save the item in the

either case we will append the new key value into the bucket that’s for the

insertion now I will define a function to get an item named as get_item

which will take the context and the key for lookups and inside that function

once again we will get the index and memory buckets and return the value in

case of existence or the key error in case of not exists by using a for loop

that’s great we almost done with our own implementation of a dictionary let’s

create an instance of our custom dictionary class

as d=MyDict() and insert an item by calling its insert_ item

method and print out the result you can see the result here if you remember

there in pythons’ implementation of dictionary we use lists like syntax to

insert or get our items but here in the case of our own custom implementation we are calling corresponding functions directly to achieve this functionality

we have to utilize some magic functions from Python like __set_item__ and __get_item__ and point the relative functions to them that’s great

we have learned a lot I’m stopping it here just to avoid more complexity but

will implement the delete function also in the text version of this tutorial

I’ll post the link in the description below so check it out if you want more

explanation and the delete function so that’s enough for this video if you

liked this video be sure to subscribe to my channel and hit the bell icon so you

will never miss any fantastic video in the future thanks for watching

very well explained, thank You!

Great waiting for more videos