\documentclass[letterpaper]{article}
\usepackage{amsmath,amssymb,verbatim}
% Note: order of packages is important
\usepackage[pdftex,letterpaper=true]{hyperref}
\hypersetup{pdfpagelayout=SinglePage,pdfstartview=FitV,
pdfpagemode=None,pdftitle=CSCI6220/4030 HW4,
pdfauthor=Alex Gittens}
\usepackage[letterpaper,height=8.8in]{geometry}
\usepackage{algorithm}
\usepackage{graphicx}
\usepackage{enumitem}
\usepackage[noend]{algpseudocode}
% make the option "empty" to get rid of page numbers
%\pagestyle{empty}
\newcommand{\R}{\ensuremath{\mathbb{R}}}
\newcommand{\E}{\ensuremath{\mathbb{E}}}
\newcommand{\Prob}{\ensuremath{\mathbb{P}}}
\newcommand{\Var}{\ensuremath{\mathrm{Var}}}
\newcommand{\e}{\ensuremath{\mathrm{e}}}
\DeclareMathOperator{\Div}{div}
\DeclareMathOperator{\median}{median}
\makeatletter
\def\pairmod#1#2{#1\mkern3mu{\operator@font mod}\mkern3mu #2}
\makeatother
\begin{document}
\begin{center}
\Large{CSCI 6220/4030: Homework 4}
\end{center}
Assigned Monday October 22 2018. Due at beginning of class Monday November 5 2018.
Remember to typeset your submission, and label it with your name. Please start early so you have ample time to see me during office hours.
Provide mathematically convincing arguments for the following problems. Ask me if you are unclear whether your arguments
are acceptable.
\begin{enumerate}[label=\arabic*.]
\item Arlie did well in a randomized algorithms course and goes on to land a well-paying job at company A. The first solo project Arlie is assigned is to design a load balancing scheme
for company A's new private cloud computing platform. To prevent information leakage, the load balancing algorithm must not use knowledge of the execution times of each
job, and because execution times could be inferred from the length of the job queues, it also must not use knowledge of the size of the job queues.
\begin{enumerate}
\item Arlie's first thought is to randomly assign jobs to servers as they come in. The CIO is convinced when Arlie argues that with high probability over the assignment of
the jobs, if $m$ jobs are assigned to $n$ servers, then the maximum loading is $O(\tfrac{m\log(n)}{n})$.
Prove Arlie correct: find constants $C$ and $c > 1$ such that the probability that the most loaded
server has more than $C\frac{m}{n}\log_2 n$ jobs is smaller than $n^{1-c\frac{m}{n}}.$
\item Unfortunately, the team assigned to Arlie's project incorrectly implements Arlie's algorithm. Their implementation actually assigns the $i$th job to a
uniformly randomly chosen one of the first $i$ servers. Surprisingly, this error is not caught in test runs. The CIO is very concerned when this error comes to light after the product's release, but Arlie argues that
randomized algorithms tend to be robust.
Prove Arlie correct: show that when this slightly different algorithm is used to assign $n$ jobs to $n$ servers, the maximum loading
is $O(\log n).$ Specifically, find constants $C$ and $c > 1$ such that the most loaded server has less than $C \log_2 n$ jobs with probability at least $n^{-c}.$
\end{enumerate}
\item Consider a random treap $T$ with $n$ vertices. As in the lectures, order the vertices in decreasing order of their search keys, so $x_1.k > \cdots > x_n.k$, and
the priorities of the vertices are i.i.d $\text{Uniform}[0,1]$ random variables.
\begin{enumerate}
\item What is the expected number of leaves in $T$?
\item What is the expected number of nodes in $T$ with two children?
\item What is the expected number of nodes in $T$ with exactly one child?
\item Prove that the expected number of proper descendants of any node in $T$ is exactly equal to the expected depth of that node.
\item What is the probability that $x_j$ is a common ancestor of $x_i$ and $x_k$?
\item What is the expected length of the unique path from $x_i$ to $x_k$ in $T$?
\end{enumerate}
\item Suppose we are given two skip lists, one storing a set $A$ of $m$ keys, the other storing a set $B$ of $n$ keys. Describe and analyze an algorithm to merge
these into a single skip list storing the set $A \cup B$ in $O(n+m)$ expected time. We do not assume that every key in $A$ is smaller than every key in $B$; the
two sets may be arbitrarily intermixed.
\item Suppose that we are using an open-addressed hash table of size m to store n items, where $n \leq m/2$. Assume an ideal random hash function. For any $i$, let $X_i$ denote the
number of probes required for the $i$th insertion into the table, and let $X = \max_i X_i$ denote the length of the longest probe sequence.
\begin{enumerate}
\item Prove that $\mathbb{P}[X_i > k] \leq \frac{1}{2^k}$ for all $i$ and $k$.
\item Prove that $\mathbb{P}[X_i > 2 \ln n] \leq \frac{1}{n^2}$ for all $i$.
\item Prove that $\mathbb{P}[X > 2 \ln n] \leq \frac{1}{n}$.
\item Prove that $\E[X] = O(\ln n)$.
\end{enumerate}
\item A radix tree over an alphabet of size $m$ is a tree data structure
where each node has up to $m$ children, each corresponding to one of
the letters in some alphabet. A word is represented by a node at the
end of a path whose edges are labeled with the letters in the word in
order. The only nodes created in the radix tree are those corresponding
to stored keys or ancestors of stored keys. Radix trees are
particularly useful for string matching and string completion problems.
See Figure~\ref{fig:radixtreeexample} for an example.
Suppose you have a radix tree into which you have already inserted $n$
strings of length $k$ from an alphabet of size $m$, generated uniformly
at random with replacement. What is the expected number of new nodes
you need to create to insert a new string of length $k$?
\begin{figure}
\centerline{\includegraphics[scale=.75]{radixtreeillust}}
\caption{A radix tree containing the words ``HEMP'', ``HOUR'', ``RAFT'', ``RAMP'', ``RIOT'', ``TOUR'', and ``TRIP''.}
\label{fig:radixtreeexample}
\end{figure}
\begin{comment}
\item Suppose that we start with a hash table of size 1 and double its size whenever two consecutive items hash to the same location. Assume that we are using an independently chosen
2-universal hash function for each table size. Show that the expected final table size is $O(n)$.
\end{comment}
\item {[{\bf required only for CSCI6220}]} Consider the following variant of multiplicative hashing, which uses slightly longer salt parameters. For any integers $a, b \in [2^{w + \ell}]$ where $a$ is odd, let
\[
h_{a,b}(x) = \left\lfloor \frac{\pairmod{(a x + b)}{2^{w + \ell}}}{2^w} \right\rfloor,
\]
and let $\mathcal{MB}^+ = \left\{ h_{a,b} \mid a,b \in [2^{w + \ell}]\text{ and $a$ is odd}\right\}$. Prove that the family of hash functions $\mathcal{MB}^+$ is strongly near-universal
(aka near-uniform):
\[
\mathbb{P}\left[ \{h(x) = i\} \cap \{h(y)=j\}\right] \leq \frac{2}{m^2}
\]
for all items $x \neq y$ and all (possibly equal) hash values $i$ and $j$.
\end{enumerate}
\end{document}