We now take a short detour away from mirror descent, and instead examine how a special class of metric spaces called ultrametrics control the competitive ratio of MTS in general metric spaces. Hold onto your seats; we’re about to compress two decades and 10 papers into a few paragraphs. For the sake of continuity, bibliographic remarks are held to the end of the post. $\def\e{\varepsilon}$

Metric approximations

For a metric space $(X,d)$, let $\alpha_{\mathrm{mts}}(X,d)$ denote the best (randomized) competitive ratio for MTS on $(X,d)$. Suppose $D$ is some other metric on $X$ that satisfies

$$\label{eq:distortion} \frac{D(x,y)}{K} \leq d(x,y) \leq D(x,y) \qquad \forall x,y \in X\,.$$

Then it is straightforward to verify that $\alpha_{\mathrm{mts}}(X,d) \leq K \cdot \alpha_{\mathrm{mts}}(X,D)$.

But there is a weaker form of approximation that still allows for such a conclusion. Suppose that $\mathbf{D}$ is a random metric that satisfies, for every $x,y \in X$:

1. With probability one, $\mathbf{D}(x,y) \geq d(x,y)$,
2. $\mathbb{E}\left[\mathbf{D}(x,y)\right] \leq K \cdot d(x,y)$.

If $\alpha_{\mathrm{mts}}(X,\mathbf{D}) \leq \alpha$ with probability one, we claim that $\alpha_{\mathrm{mts}}(X,d) \leq K \alpha$.

The algorithm achieving this is as follows: Sample the random metric $\mathbf{D}$. Now given a cost sequence $\left\langle c_t : X \to \mathbb{R}_+ \mid t \geq 1\right\rangle$, let $\left\langle x_0, x_1, x_2, \ldots \right\rangle$ be the random sequence of points produced by an $\alpha$-competitive randomized algorithm for $(X,\mathbf{D})$. Then:

$\mathbb{E}\left[\sum_{t=1}^T \left.\vphantom{\bigoplus} \mathbf{D}(x_t, x_{t-1}) \ \right| \mathbf{D}\right] \leq \alpha \sum_{t=1}^T \mathbf{D}\!\left(x_t^{*}, x_{t-1}^{*}\right) + O(1)\,,$

where $\left\langle x_t^* : t \geq 0\right\rangle$ is an optimal offline sequence for $(X,d)$. (This may not be the optimal sequence for $(X,\mathbf{D})$, but the algorithm is certainly competitive against non-optimal sequences as well.)

Note that the expectation here is taken only with respect to the randomness in the online algorithm. If we take expectation with respect to $\mathbf{D}$ as well, it follows that

$\mathbb{E}\left[\sum_{t=1}^T d(x_t, x_{t-1})\right] \leq \alpha \mathbb{E}\left[\sum_{t=1}^T \mathbf{D}\left(x_t^{*}, x_{t-1}^{*}\right)\right] + O(1) \leq K \alpha \sum_{t=1}^T d\!\left(x_t^{*}, x_{t-1}^{*}\right) + O(1)\,,$

where we have used property (1) of $\mathbf{D}$ for the LHS and property (2) to bound the RHS.

Ultrametrics

Let $T=(V,E)$ be a finite, rooted tree, and let $\mathcal{L}$ denote the leaves of $T$. Suppose $w : V\setminus \mathcal{L} \to \mathbb{R}_+$ is a function that assigns positive weights to the internal vertices of $T$ such that the vertex weights are non-increasing along root-leaf paths. Then one can define a distance on $\mathcal{L}$ by [ d_w\left(\ell,\ell’\right) \mathrel{\vcenter{:}}= w\left(\mathrm{lca}(\ell,\ell’)\right). ] This is an ultrametric on $\mathcal{L}$ (and all finite ultrametrics arise in this way).

It turns out that ultrametrics essentially control the competitive ratio for metrical task systems on finite metric spaces. This follows from the next two facts that hold for an arbitrary $n$-point metric space $(X,d)$.

There is a random ultrametric $\mathbf{D}$ on $(X,d)$ such that (1) and (2) are satisfied with $K \leq O(\log n)$.

By our earlier remarks, this implies that the competitive ratio for MTS on $(X,d)$ is at most $O(\log n)$ times the competitive ratio for $n$-point ultrametrics.

There is a subset $X' \subseteq X$ with $|X'| \geq \sqrt{n}$ and an ultrametric $D$ on $X'$ such that \eqref{eq:distortion} is satisfied with $K \leq O(1)$.

Since $\alpha_{\mathrm{mts}}(X,d) \geq \alpha_{\mathrm{mts}}(X’,d) \geq \Omega(\alpha_{\mathrm{mts}}(X’,D))$, lower bounds on the competitive ratio for ultrametrics yield lower bounds for $(X,d)$ as well. Finally, we remark that MTS on ultrametrics is now well-understood.

If $(X,D)$ is an $n$-point ultrametric, then $\Omega\left(\frac{\log n}{\log \log n}\right) \leq \alpha_{\mathrm{mts}}(X,D) \leq O(\log n)\,.$

There are ultrametrics (e.g., as we have seen already, when $(X,D)$ is the uniform metric) for which the $O(\log n)$ upper bound in Theorem 3 is tight. Whether the LHS can be made $\Omega(\log n)$ is an intriguing open problem; we will address it in the next lecture. In conjunction with Theorem 1 and Theorem 2, this yields:

For any $n$-point metric space $(X,d)$: $\Omega\left(\frac{\log n}{\log \log n}\right) \leq \alpha_{\mathrm{mts}}(X,d) \leq O((\log n)^2)\,.$

Perhaps the central remaining open question for MTS on general metric spaces is whether the upper bound can be improved to $O(\log n)$.

Ultrametrics from partitions

Consider a partition $P$ of $X$. Say that $P$ is $\Delta$-bounded if $S \in P \implies \mathrm{diam}_X(S) \leq \Delta$. For a point $x \in X$, we write $P(x)$ for the unique set in $P$ that contains $x$.

Suppose now that $\mathcal{P} = \{ P_j : j \in \mathbb{Z} \}$ is a sequence of partitions of $X$ such that $P_j$ is $8^j$-bounded for every $j \in \mathbb{Z}$, and define the metric

$D_{\mathcal{P}}(x,y) \mathrel{\vcenter{:}}= \max \left\{ 8^{j+1} : P_j(x) \neq P_j(y) \right\}.$

One can check that $D_{\mathcal{P}}$ is an ultrametric on $X$ (imagine the tree structure induced by the partitions), and moreover

$$\label{eq:exp} D_{\mathcal{P}}(x,y) \geq d(x,y) \qquad \forall x,y \in X\,.$$

This follows from: $D_{\mathcal{P}}(x,y) \leq 8^j \implies P_j(x)=P_j(y) \implies d(x,y) \leq 8^j$.

Define $B(x,r) \mathrel{\vcenter{:}}= \{ y \in X : d(x,y) \leq r \}$ to be the ball of radius $r$ about $x \in X$.

Suppose $(X,d)$ is an $n$-point metric space. Then for every $\Delta > 0$, there is a random $\Delta$-bounded partition $P$ of $X$ such that for every $r \leq \Delta/8$:

$$\label{eq:rp} \mathbb{P}\left[\vphantom{\bigoplus} B(x,r) \subseteq P(x)\right] \geq \exp\left(\frac{-8r}{\Delta} \log \frac{|B(x,\Delta)|}{|B(x,\Delta/8)|}\right).$$

Remarkably, the random partitioning lemma can be used to prove both Theorem 1 and Theorem 2. We will first establish these consequences and then prove the lemma.

Approximation by a random ultrametric

Let’s prove Theorem 1. Let $\mathcal{P} = \{ P_j : j \in \mathbb{Z} \}$ be the random sequence where $P_j$ results from the random partitioning lemma applied with $\Delta = 8^j$ and we take the partitions to be mutually independent. Then from \eqref{eq:exp}, we know $\mathbf{D}_{\mathcal{P}}(x,y) \geq d(x,y)$ for all $x,y \in X$. Now fix $x \neq y \in X$, and let $j_0 \mathrel{\vcenter{:}}= \min \{ j : 8^{j} \geq d(x,y) \}$. Then:

$\mathbb{E}\left[\mathbf{D}_{\mathcal{P}}(x,y)\right] \leq 8^{j_0+1} + \sum_{j > j_0} \mathbb{P}[P_j(x) \neq P_j(y)] 8^{j+1},$

and using $\mathbb{P}[P_j(x) = P_j(y)] \geq \mathbb{P}[B(x, d(x,y)) \subseteq P_j(x)]$ yields

\begin{align*} \sum_{j > j_0} \mathbb{P}[P_j(x) \neq P_j(y)] 8^{j+1} &\leq \sum_{j \in \mathbb{Z}} 8^{j+2} \frac{d(x,y)}{8^{j}} \log \frac{|B(x, 8^j)|}{|B(x,8^{j-1})|} \\ &\leq 64\, d(x,y) \sum_{j \in \mathbb{Z}} \log \frac{|B(x, 8^j)|}{|B(x,8^{j-1}|} \\ &= 64 \,d(x,y) \log n\,. \end{align*}

where the second inequality follows from \eqref{eq:rp} and the fact that $e^{-x} \geq 1-x$, and in the last inequality we evaluate a telescoping sum.

Finding a large approximate ultrametric inside $X$

Now we prove Theorem 2. Let $\mathcal{P}$ be the same random partition sequence chosen above and fix some $0 < \e < 1/8$. Define the random subset:

$\mathbf{S} \mathrel{\vcenter{:}}= \left\{ x \in X : B(x, \e 8^j) \subseteq P_j(x) \ \forall j \in \mathbb{Z}\right\}\,.$

We claim that

$D_{\mathcal{P}}(x,y) \geq d(x,y) \geq \frac{\e}{8} D_{\mathcal{P}}(x,y) \qquad \forall x,y \in \mathbf{S}\,.$

The LHS is simply from \eqref{eq:exp}. For the RHS, observe that if $D_{\mathcal{P}}(x,y)=8^{j+1}$, then $P_j(x) \neq P_j(y)$, hence $B(x,\e 8^j) \subseteq P_j(x) \implies d(x,y) \geq \e 8^j$.

Now the random partitioning lemma gives, for any $x \in X$,

$\mathbb{P}[x \in \mathbf{S}] \geq \prod_{j \in \mathbb{Z}} \exp\left(- 8\e \log \frac{|B(x,8^j)|}{|B(x,8^{j-1})|}\right) = \exp\left(-8 \e \log n\right) = n^{-8\e}\,.$

By linearity of expectation: $\mathbb{E}[|\mathbf{S}|] \geq n^{1-8\e}$. Taking $\e \mathrel{\vcenter{:}}= 1/16$ completes the proof.

Proof of the random partitioning lemma

Let $\mathbf{R} \in [\Delta/4,\Delta/2]$ be chosen uniformly, and let $X = \{x_1, x_2, \ldots, x_n\}$ be a uniformly random ordering of the points in $X$. Our random partitioning is formed by iteratively carving out balls:

$$\label{eq:ckr} P \mathrel{\vcenter{:}}= \left\{ B(x_i, \mathbf{R}) \setminus \bigcup_{j < i} B(x_j, \mathbf{R}) : i=1,2,\ldots,n\right\}.$$

Clearly $P$ is $\Delta$-bounded by construction.

Fix $r \leq \Delta/8$ and observe first that

$\mathbb{P}\left[B(x,r) \subseteq P(x) \mid \mathbf{R}\right] \geq \frac{|B(x,\mathbf{R}-r)|}{|B(x,\mathbf{R}+r)|}.$

This follows because if we condition on $\mathbf{R}$, then only centers in $B(x,\mathbf{R}+r)$ can decide the fate of $B(x,r)$, and the corresponding cluster will contain all of $B(x,r)$ if the center lies in $B(x,\mathbf{R}-r)$.

Thus we have:

\begin{align*} \mathbb{P}\left[B(x,r) \subseteq P(x)\right] &\geq \mathbb{E}\left[\frac{|B(x,\mathbf{R}-r)|}{|B(x,\mathbf{R}+r)|}\right] \\ &= \mathbb{E}\left[\exp\left(- \log \frac{|B(x,\mathbf{R}+r)|}{|B(x,\mathbf{R}-r)|}\right)\right] \\ &\geq \exp \left(\mathbb{E}\left[- \log \frac{|B(x,\mathbf{R}+r)|}{|B(x,\mathbf{R}-r)|}\right]\right) \\ &\geq \exp\left(\frac{- 8 r}{\Delta} \log \frac{|B(x,\Delta)|}{|B(x,\Delta/8)|}\right)\,, \end{align*}

where the second inequality uses convexity of $e^{-x}$ and the last inequality comes from the calculation

\begin{align*} \mathbb{E}\left[ \log \frac{|B(x,\mathbf{R}+r)|}{|B(x,\mathbf{R}-r)|}\right] &= \frac{4}{\Delta} \int_{\Delta/4}^{\Delta/2} \log \frac{|B(x,R+r)|}{|B(x,R-r)|}\,dR \\ &\leq \frac{8r}{\Delta} \log \frac{|B(x,\Delta/2+r)|}{|B(x,\Delta/4-r)|} \\ &\leq \frac{8r}{\Delta} \log \frac{|B(x,\Delta)|}{|B(x,\Delta/8)|}\,, \end{align*}

using $r \leq \Delta/8$.

Historical remarks

The random embedding theorem (Theorem 1) is due to Fakcharoenphol, Rao, and Talwar. The first such result was proved by Bartal, and he later obtained a near-optimal bound of $O(\log n \log \log n)$ on the distortion. The use of random tree approximations of arbitrary metric spaces for online algorithms arose somewhat earlier in the work of Alon, Karp, Peleg, and West, specifically in relation to the $k$-server problem.

The metric Ramsey theorem (Theorem 2) is a result of Bartal, Linial, Mendel, and Naor, improving over an earlier bound of Bartal, Bollobas, and Mendel who established that one can take $|X’| \geq \exp(c \sqrt{\log n})$ for some $c > 0$.

The upper bound in Theorem 3 is from a forthcoming paper with Bubeck, Cohen, and Y. T. Lee, and improves the $O(\log n \log \log n)$ upper bound of Fiat and Mendel to the optimal value (up to a constant factor). The lower bound in Theorem 3 is from the aforementioned paper of Bartal, Bollobas, and Mendel; we will discuss it in the next lecture.

The random partitioning lemma and the proof of Theorem 2 presented here come from a paper of Mendel and Naor. This proof of the random partitioning lemma is somewhat cleaner than the one presented there. The (now famous) distribution on random partitions described in \eqref{eq:ckr} is from Calinescu, Karloff, and Rabani.