How to Draw Tree Given Keys
Practice: Binary Search Trees¶
Difficulty assessment:
(*) Basic. You should be able to easily do this.
(**) Standard. Core material to test your understanding.
(***) Advanced. Possibly slightly above the level of the assignments.
Exercise 1 ( * ):
(a) Insert items with the following keys (in the given order) into an initially empty binary search tree: $30$, $40$, $24$, $58$, $48$, $26$, $11$, $13$. Draw the tree after any two insertions.
(b) Choose a set of $7$ distinct, positive, integer keys. Draw binary search trees for your set of height $2$, $5$, and $6$.
Solution:
Any set of seven distinct integers will work. Let's choose $\{1, 2, 3, 4, 5, 6, 7\}$. We will tackle this problem by considering what a final solution would look like. So, we will first construct three trees of heights $2$, $5$, and $6$, and then will fill in the values in the nodes of the trees in the order of inorder walk, thus ensuring that the binary-search-tree property is satisfied.
First, for height $2$, the only option is the complete binary tree:
For height $5$, we start with a chain of six nodes (which will give us a tree of height $5$), and add the last node such that we don't increase the height. For example, we can add the last node as the second child of the root:
For height $6$, the only option is the chain of seven nodes:
Exercise 2 ( * ):
For each of the algorithms PreorderTreeWalk, InorderTreeWalk, and PostorderTreeWalk answer the following questions:
(a) Does the algorithm print the keys of elements stored in a binary search tree in a sorted order? Why or why not?
Solution:
InorderTreeWalk does (as discussed in the lecture), PreorderTree- Walk and PostorderTreeWalk do not. For example, for the tree from Exercise 1, we get the following orders:
PreorderTreeWalk will print the sequence $30, 24, 11, 13, 26, 40, 58, 48$, and PostorderTreeWalk will print the sequence $13, 11, 26, 24, 48, 58, 40, 30$.
(b) Does the algorithm print the keys of elements stored in a min-heap in a sorted order? Why or why not?
Solution:
No for all three. For example, for the following min-heap:
-
InorderTreeWalk will print $42, 40, 46, 24, 26, 11, 48, 30, 58$,
-
PreorderTreeWalk will print the sequence $11, 24, 40, 42, 46, 26, 30, 48, 58$, and
-
PostorderTreeWalk will print the sequence $42, 46, 40, 26, 24, 48, 58, 30, 11$.
Note: Recall the $\Omega(n \log n)$ lower bound on comparison-based sorting. What would the answer yes to this problem imply for the lower bound on sorting?
Exercise 3 ( ** )::
(a) The height of a node $v$ in a rooted tree $T$ is defined as the number of edges on the longest simple downwards path from the node $v$ to a leaf. Write an algorithm that calculates the height of all nodes in a binary tree and has running time $O(n)$. Do not forget to prove the correctness of your algorithm and to argue that it indeed runs in $O(n)$ time.
Solution:
When considering binary search trees, we have a choice: to ignore the existence of None-leafs or not. In this problem we will take the None leafs of a binary search tree into account, and count the number of edges from a node to a None-leaf to compute the node's height. Then, the height of a None-leaf is 0. If we know the heights $h_\ell$ and $h_r$ of a node's left and right child, we can compute the height of the node itself by taking the maximum of $h_\ell$ and $h_r$, and adding 1. So, we can modify the PostorderTreeWalk to compute the heights of all nodes in the following way:
Proof of correctness: We prove the correctness of the algorithm using strong induction on the height $h$ of the tree. (Note: There are other options here, like induction on the number of nodes, or structural induction.)
Namely, we claim that for any value of $h \le 0$, calling ComputeHeight on the root $x$ of a subtree of height $h$ correctly determines the height values $y.h$ for all nodes $y$ in the subtree rooted at $x$, and moreover, it correctly returns the height of the subtree rooted at $x$.
Base case: $h = 0$. In this case $x$ is a None-leaf, and the algorithm correctly returns $0$.
Inductive step:
Induction hypothesis: Assume that ComputeHeight works correctly for all subtrees of height $0 \le k < h$ for some fixed $h > 0$.
Let $x$ denote the root of the subtree $T_x$ and assume that $T_x$ has height $h$. Let $T_\ell$ and $T_r$ denote the subtrees rooted at x.left and x.right, respectively. By definition, $height(T_x) = \max(height(T_ell); height(T_r)) + 1$. Since $T_x$ has height greater than 0, the algorithm does not return in the first line of the algorithm but proceeds to line 2. The heights of the subtrees $T_\ell$ and $T_r$ are strictly smaller than $h$, and therefore, by induction hypothesis, ComputeHeight correctly determines the values $h_\ell$ and $h_r$ in lines $2$ and $3$.
Therefore, the algorithm also computes x.h correctly, and thus also returns the correct value in line $5$.
Running time: The algorithm calls ComputeHeight exactly once on every node. Each call takes constant time (excluding the recursive calls which we charge to the roots of the corresponding subtrees). Thus the overall running time is $O(1)\cdot n = O(n)$.
Note: The exercise asks to argue the running time. In this case I don't expect a formal proof; an argument as given above is sufficient. Below I include some notes on a formal proof:
Let $T(n)$ denote the running time of the algorithm on a tree with $n$ nodes. When the ComputeHeight is called on a leaf, it returns in the first line. Therefore, it takes a constant time to run. So, let $T(1) = c$ for some positive constant $c$. Suppose now ComputeHeight is called on a node $x$ with $n$ nodes in its subtree. Then, ComputeHeight performs some constant amount of work, and then recursively is called on the two children of $x$. Let $k$ be the size of the subtree rooted at $x$'s left child. Then we get the following recursion:
$T(n) = T(k) + T(n - k - 1) + d$,
for some positive constant $d$. We can show that $T(n) = O(n)$ by substitution method. Note: In a formal proof you would then here additionally prove this statement using an induction.
(b) The depth of a node $v$ in a rooted tree $T$ is defined as the number of edges on the simple path from the root of $T$ to the node $v$. Write an algorithm that calculates the depths of all nodes in a binary tree and has running time $O(n)$. Do not forget to argue the correctness of your algorithm and to argue that it indeed runs in $O(n)$ time.
Solution:
Unlike the height of a node, the depth of a node $x$ does not depend on the subtree rooted at $x$, but on the path from $x$ to the root of the tree. Thus, we need to know the depth of a parent of $x$ to compute the depth of $x$. We will modify the PreorderTreeWalk in the following way:
Proof of correctness: The correctness follows from the definition of the depth. The length of a simple path from a node $x$ to the root of the tree is one larger than the length of the path from the parent of $x$ to the root of the tree. When computing the depth of a node $x$, the depth of a parent of $x$ has already been computed, as the depth of a node is computed, before the recursive call to ComputeDepth on its left and right children.
Running time: The algorithm calls ComputeDepth exactly once on every node. Each call takes constant time (excluding the recursive calls which we charge to the roots of the corresponding subtrees). Thus the overall running time is $O(1)\cdot n = O(n)$.
Exercise 4 ( ** ):
Given a set of elements with unique integer keys stored in a binary search tree $T$, propose an algorithm that takes an integer $k$ as an input and returns the element from $T$ with the minimum key among all the elements whose keys are not smaller than $k$ (or None if no such key exists). (Hint: Modify the Tree-Search or Iterative-Tree-Search algorithm.)
Solution:
Recall that TreeSearch returns the node with key $k$ if it exists, and None otherwise. If it returns None, it does so when reaching a None-leaf. If there is a node with key $k$, we still want to return the corresponding node.
Thus we only need to change the return-value for the case that the search reaches a leaf. Imagine inserting a node $x_k$ with the key $k$, we would do so at the None-leaf that the search ends in. Now, the node we want to return is the successor of $x_k$. Since $x_k$, does not have a right child, the successor is the lowest ancestor such that $x_k$ is in the subtree of the left child. All ancestors of $x_k$ are on the search path. Thus, in other words, the successor is the last node on the search path for which the search continued in the left subtree.
We do not actually insert $x_k$, but during the search we store an additional variable $succ$, which we initially set to Null, and always update to the current node, if we continue into the left subtree. This gives the following algorithm:
Correctness:
There are three cases:
1.) There is a node with key $k$. Then the search returns the same value as the regular TreeSearch, namely a node with key $k$. Thus, in this case the algorithm is correct.
2.) $k$ is larger than all keys in the tree. In this case the search eventually hits a leaf, and returns $succ$. The variable $succ$ is intially set to None and never updated, since the search always goes to the right subtree. Thus, the algorithm returns None, which is correct for this case
3.) $k$ is not an existing key, and there are keys larger than $k$. By the argument given with the explanation of the algorithm, we want to return the successor of $k$ (if we would insert a node with key $k$). As argued, it is the last node $x$, at which the algorithm goes into the left subtree. This is also the last time that the algorithm updates the variable $succ$, namely to $succ = x$. Thus, $x$ is returned, which is correct for this case.
Running time: Up to a constant factor the running time is the same as for TreeSearch. Thus, the running time is $O(h)$, which is $O(\log n)$ if the tree is a balanced binary search tree.
Source: https://www.win.tue.nl/~kbuchin/teaching/JBP030/notebooks/P-bst-sol.html