Tree and Hash


TREE AND HASH


In this post, we will talk about TREE AND HASH in Data Structure. If you have learned about linked lists, queues, and stacks, we know that those data structures are called "linear" data structures because they all have a logical start and a logical end. However, tree and hash is not a "linear" data structure. 

TREE


A tree is a collection of nodes connected by directed (or undirected) edges. A tree stored its data in a hierarchy way. The easiest way to imagine it is by using a family tree. A tree can be empty with no nodes or a tree is a structure consisting of one node called the root and zero or one or more subtrees. 


A tree has the following general properties:
  • One node is distinguished as a root.
  • Every node (exclude a root) is connected by a directed edge from exactly one other node; A direction is: parent -> children 

Terminology summary
Root is the topmost node of the tree
·      Edge is the link between two nodes
·      Child is a node that has a parent node
·      Parent is a node that has an edge to a child node
·      Leaf is a node that does not have a child node in the tree
·      Height is the length of the longest path to a leaf
·      Depth is the length of the path to its root

Binary Trees

A binary tree is a tree data structure in which each node has at the most two children, which are referred to as the left child and the right child. A tree is represented by a pointer to the topmost node in tree. If the tree is empty, then the value of root is NULL. A Binary Tree node contains following parts:
1.     Data
2.     Pointer to left child
3.     Pointer to right child

A Binary Tree can be traversed in two ways:
1.     Depth First Traversal: 
·    In a preorder transversal, we visit any given node before we visit its children
·    In a postorder traversal, we visit each node only after we visit its children (and their subtrees).
·    In a inorder traversal, we first visit the left child (including its enitre subtree), the visit the parent node and finally visit the right child (including its entire subtree).
2.     Breadth-First Traversal: Level Order Traversal

There are three different types of binary trees that will be discussed in this lesson:
1.       Full binary tree: Every node other than leaf nodes has 2 child nodes.
2.       Complete binary tree: All levels are filled except possibly the last one, and all nodes are filled in as far left as possible.
3.       Perfect binary tree: All nodes have two children and all leaves are at the same level.

Binary Search Tree


In Binary Search Tree is a Binary Tree with following additional properties:
·      The left subtree of a node contains only nodes with keys less than the node’s key.
·      The right subtree of a node contains only nodes with keys greater than the node’s key.
·      The left and right subtree each must also be a binary search tree.


Binary Heap

A Binary Heap is a Binary Tree with following properties.
1.     It’s a complete tree (All levels are completely filled except possibly the last level and the last level has all keys as left as possible). This property of Binary Heap makes them suitable to be stored in an array.
2.     A Binary Heap is either Min Heap or Max Heap. In a Min Binary Heap, the key at root must be minimum among all keys present in Binary Heap. The same property must be recursively true for all nodes in Binary Tree. Max Binary Heap is similar to Min Heap. It is mainly implemented using array.


HASH


If in tree the data is stored hierarchically, hash table stores data in an associative manner.  In a hash table, data is stored in an array format, where each data value has its own unique index value. Thus, it becomes a data structure in which insertion and search operations are very fast irrespective of the size of the data.


Hashing 

Hashing is a technique to convert a range of key values into a range of indexes of an array. We're going to use modulo operator to get a range of key values. Hashing has two main applications. Hashed values can be used to speed data retrieval, and can be used to check the validity of data.


Hash Table 

A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0.




Hash Function

A hash function is any function that can be used to map a data set of an arbitrary size to a data set of a fixed size, which falls into the hash table. To achieve a good hashing mechanism, It is important to have a good hash function with the following basic requirements: 
·      Easy to compute: It should be easy to compute and must not become an algorithm in itself.
·      Uniform distribution: It should provide a uniform distribution across the hash table and should not result in clustering.
·      Less collisions: Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided.

Following are the basic primary operations of a hash table:

  • Search − Searches an element in a hash table.
  •  Insert − inserts an element in a hash table.
  • Delete − Deletes an element from a hash table.

Collision 


Since a hash function gets us a small number for a key which is a big integer or string, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique.

Following are the ways to handle collisions:
1.     Open hashing or Separate chaining. Separate chaining is one of the most commonly used collision resolution techniques. It is usually implemented using linked lists. In separate chaining, each element of the hash table is a linked list. To store an element in the hash table you must insert it into a specific linked list. If there is any collision (i.e. two different elements have same hash value) then store both the elements in the same linked list.

Advantages:
·         Simple to implement.
·         Hash table never fills up, we can always add more elements to the chain.
·         Less sensitive to the hash function or load factors.
·         It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.

Disadvantages:
·         Cache performance of chaining is not good as keys are stored using a linked list. Open addressing provides better cache performance as everything is stored in the same table.
·         Wastage of Space (Some Parts of hash table are never used).
·         If the chain becomes long, then search time can become O(n) in the worst case.
·         Uses extra space for links.

2.     Open Addressing. In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NULL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table.

·         Linear probing. In open addressing, instead of in linked lists, all entry records are stored in the array itself. When a new entry has to be inserted, the hash index of the hashed value is computed and then the array is examined (starting with the hashed index). If the slot at the hashed index is unoccupied, then the entry record is inserted in slot at the hashed index else it proceeds in some probe sequence until it finds an unoccupied slot.
·         Quadratic probing. Quadratic probing is similar to linear probing and the only difference is the interval between successive probes or entry slots. Here, when the slot at a hashed index for an entry record is already occupied, you must start traversing until you find an unoccupied slot. The interval between slots is computed by adding the successive value of an arbitrary polynomial in the original hashed index.
·         Double hashing is similar to linear probing and the only difference is the interval between successive probes. Here, the interval between probes is computed by using two hash functions.

Comments

Popular posts from this blog

Even Semester Full Summary

Linked List and Stack and Queue

First Part of Even Semester