We define \[ v_t = \nabla \log P_{1-t} f(W_t) = \frac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}\,, \] where $\{P_t\}$ is the Brownian semigroup defined by \[ P_t f(x) = \mathbb{E}[f(x + B_t)]\,. \]
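As a concrete illustration in dimension $n=1$, the drift can be evaluated numerically. The Python sketch below computes $P_s f(x)$ by Gauss-Hermite quadrature and $\nabla P_s f(x)$ by Gaussian integration by parts, $\frac{d}{dx}P_s f(x) = \mathbb{E}[f(x+\sqrt{s}\,Z)\,Z]/\sqrt{s}$ for $s>0$ and $Z\sim N(0,1)$. The helper names and the 60-node rule are our own illustrative choices, not part of the construction.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-z^2/2); the raw weights sum to sqrt(2*pi).
Z_NODES, Z_WEIGHTS = hermegauss(60)
Z_WEIGHTS = Z_WEIGHTS / np.sqrt(2.0 * np.pi)  # now a quadrature rule for E[g(Z)], Z ~ N(0, 1)


def heat_semigroup(f, s, x):
    """P_s f(x) = E[f(x + B_s)] in one dimension, writing B_s = sqrt(s) * Z."""
    return np.sum(Z_WEIGHTS * f(x + np.sqrt(s) * Z_NODES))


def grad_heat_semigroup(f, s, x):
    """d/dx P_s f(x) = E[f(x + sqrt(s) Z) Z] / sqrt(s) (Gaussian integration by parts), s > 0."""
    return np.sum(Z_WEIGHTS * f(x + np.sqrt(s) * Z_NODES) * Z_NODES) / np.sqrt(s)


def drift(f, t, x):
    """v_t at position x, i.e. grad log P_{1-t} f(x), for 0 <= t < 1."""
    s = 1.0 - t
    return grad_heat_semigroup(f, s, x) / heat_semigroup(f, s, x)
```

For example, $f(x)=x^2$ is a density with respect to $\gamma_1$ and $P_s f(x) = x^2+s$, so the sketch should return $v_t(x) = 2x/(x^2+1-t)$.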

Note that, conditioned on the past, $v_t$ is almost surely constant (it is a function of $W_t$); hence the chain rule yields \begin{equation}\label{eq:chain} D(W_{[0,1]} \,\|\, B_{[0,1]}) = \frac12 \int_0^1 \mathbb{E}\,\|v_t\|^2\,dt\,. \end{equation} (See line (7) of Lemma 2 in the previous post; here $h(v_t)=0$ because $v_t$ is deterministic given the past.) We are left to show that $W_1$ has law $f \,d\gamma_n$ and that $D(W_{[0,1]} \,\|\, B_{[0,1]}) = D(f \,d\gamma_n \,\|\,d\gamma_n)$.
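Both claims can be sanity-checked by simulation in a simple case. The Python sketch below assumes $n=1$, $W_0=0$, and the illustrative density $f(x) = e^{-A^2/2}\cosh(Ax)$ with respect to $\gamma_1$, so that $f\,d\gamma_1$ is the mixture $\frac12 N(A,1)+\frac12 N(-A,1)$. For this $f$ one computes $P_s f(x) = e^{-A^2/2 + A^2 s/2}\cosh(Ax)$, hence $v_t = A\tanh(AW_t)$. The sketch discretizes $dW_t = dB_t + v_t\,dt$ with the Euler-Maruyama scheme and compares $\frac12\int_0^1 \mathbb{E}\|v_t\|^2\,dt$ with $D(f\,d\gamma_1\,\|\,d\gamma_1) = \int f\log f\,d\gamma_1$; agreement is only up to Monte Carlo and discretization error, and the value of $A$ and the sample sizes are arbitrary.

```python
import numpy as np

A = 1.5  # illustrative parameter of the test density


def f(x):
    """Density w.r.t. gamma_1: f(x) = exp(-A^2/2) cosh(A x), so f dgamma_1 = (N(A,1) + N(-A,1)) / 2."""
    return np.exp(-A**2 / 2) * np.cosh(A * x)


def v(t, x):
    """grad log P_{1-t} f(x) for this f; it happens not to depend on t."""
    return A * np.tanh(A * x)


def simulate(num_paths=20000, num_steps=200, seed=0):
    """Euler-Maruyama for dW_t = dB_t + v_t dt with W_0 = 0.

    Returns samples of W_1 and a Riemann-sum estimate of (1/2) int_0^1 E|v_t|^2 dt.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / num_steps
    w = np.zeros(num_paths)
    energy = 0.0
    for k in range(num_steps):
        drift = v(k * dt, w)
        energy += 0.5 * np.mean(drift**2) * dt
        w = w + drift * dt + np.sqrt(dt) * rng.standard_normal(num_paths)
    return w, energy


def relative_entropy():
    """D(f dgamma_1 || dgamma_1) = int f log f dgamma_1, via a Riemann sum on a fine grid."""
    x = np.linspace(-15.0, 15.0, 30001)
    dx = x[1] - x[0]
    gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    fx = f(x)
    return np.sum(fx * np.log(fx) * gauss) * dx
```

One can also compare the empirical second moment of the simulated $W_1$ with $1+A^2$, the second moment of $f\,d\gamma_1$.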

We will prove the first fact using Girsanov's theorem to argue about the change of measure between $\{W_t\}$ and $\{B_t\}$. As in the previous post, we will argue somewhat informally, using the heuristic that $dB_t$ is a Gaussian random vector in $\mathbb R^n$ with covariance $dt \cdot I$. Itô's formula is what justifies this heuristic (see our use of the formula below).
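In coordinates, this heuristic is the Itô multiplication rule $dB^i_t\,dB^j_t = \delta_{ij}\,dt$: for instance, the sum of $\|B_{t_{k+1}}-B_{t_k}\|^2$ over a fine grid of $[0,t]$ concentrates at $nt$. A minimal Python sketch, with the grid size and seed chosen arbitrarily:

```python
import numpy as np


def quadratic_variation(n=3, t=1.0, num_steps=100000, seed=0):
    """Sum of ||B_{t_{k+1}} - B_{t_k}||^2 over a uniform grid of [0, t], for Brownian motion in R^n."""
    rng = np.random.default_rng(seed)
    dt = t / num_steps
    # Each increment dB is drawn as a Gaussian vector with covariance dt * I.
    increments = np.sqrt(dt) * rng.standard_normal((num_steps, n))
    return np.sum(increments**2)
```

With the defaults the result should be close to $n t = 3$, with fluctuations of order $\sqrt{2n/\text{num\_steps}}$.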

The following lemma says that, given any sample path $\{W_s : s \in [0,t]\}$ of our process up to time $t$, the probability that our process follows this path is $\frac{1}{M_t}$ times the probability that Brownian motion (without drift) would have ``done the same thing.''

We can also compute the (time-inhomogeneous) transition kernel $q_t$ of $\{W_t\}$: \[ q_t(x,y) = \frac{e^{-\|v_t\,dt + x - y\|^2/(2\,dt)}}{(2\pi\,dt)^{n/2}} = p(x,y)\, e^{-\frac12 \|v_t\|^2 dt}\, e^{-\langle v_t, x-y\rangle}\,, \] where $p(x,y) = \frac{e^{-\|x-y\|^2/(2\,dt)}}{(2\pi\,dt)^{n/2}}$ is the transition kernel of Brownian motion over time $dt$. Here we are using that $dW_t = dB_t + v_t\,dt$ and that $v_t$ is deterministic conditioned on the past; thus, conditioned on the past, $dW_t$ is normal with mean $v_t\,dt$ and covariance $dt \cdot I$.
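The factorization of $q_t$ is an exact identity between Gaussian densities, so it is easy to confirm numerically. In the Python sketch below, $p$ is the $N(x, dt\cdot I)$ density (the Brownian kernel over time $dt$), and $v$ is frozen at a fixed vector, matching the fact that $v_t$ is deterministic given the past; the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, dt, mean_shift):
    """Density at y of N(x + mean_shift, dt * I) in R^n."""
    n = x.shape[0]
    diff = y - x - mean_shift
    return np.exp(-(diff @ diff) / (2 * dt)) / (2 * np.pi * dt) ** (n / 2)


def p(x, y, dt):
    """Brownian transition kernel over time dt."""
    return gaussian_kernel(x, y, dt, np.zeros_like(x))


def q(x, y, dt, v):
    """Kernel of dW = dB + v dt over time dt; note ||y - x - v dt|| = ||v dt + x - y||."""
    return gaussian_kernel(x, y, dt, v * dt)


def q_factored(x, y, dt, v):
    """Right-hand side of the identity: p(x, y) exp(-||v||^2 dt / 2) exp(-<v, x - y>)."""
    return p(x, y, dt) * np.exp(-0.5 * (v @ v) * dt) * np.exp(-(v @ (x - y)))
```

Expanding $\|v\,dt + x - y\|^2 = \|x-y\|^2 + 2\,dt\,\langle v, x-y\rangle + dt^2\|v\|^2$ and dividing by $2\,dt$ gives the two exponential factors, which is what the two functions compute.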

To avoid confusion with derivatives, let us write $\alpha_t$ for the density of $\mu_t$, the law of $W_{[0,t]}$, and $\beta_t$ for the density of Brownian motion $B_{[0,t]}$ (recall that these are densities on paths). Now let us relate the density $\alpha_{t+dt}$ to the density $\alpha_{t}$. We use the notation $\{\hat W_t, \hat v_t, \hat B_t\}$ to denote a fixed (non-random) sample path of $\{W_t\}$ together with the corresponding drift and driving Brownian path: \begin{align*} \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \alpha_t(\hat W_{[0,t]}) q_t(\hat W_t, \hat W_{t+dt}) \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{-\frac12 \|\hat v_t\|^2\,dt-\langle \hat v_t,\hat W_t-\hat W_{t+dt}\rangle} \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{-\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t,d \hat W_t\rangle} \\ &= \alpha_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t, d \hat B_t\rangle}\,, \end{align*} where the third line uses $d\hat W_t = \hat W_{t+dt} - \hat W_t$ and the last line uses $d\hat W_t = d\hat B_t + \hat v_t\,dt$.
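Multiplying these one-step factors along a path, $\log\alpha_1 - \log\beta_1$ is the sum over steps of $\log q_t - \log p$, and by the computation above each term equals $\frac12\|\hat v_t\|^2\,dt + \langle \hat v_t, d\hat B_t\rangle$. The Python sketch below checks this bookkeeping on one Euler path, started at the origin, with an arbitrary (hypothetical) drift function supplied by the caller.

```python
import numpy as np


def log_density_ratio_two_ways(v_fn, num_steps=1000, n=2, seed=0):
    """Simulate one Euler path of dW = dB + v dt from W_0 = 0; return log(alpha_1 / beta_1) two ways."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / num_steps
    w = np.zeros(n)
    via_kernels = 0.0  # sum over steps of log q_t(W_t, W_{t+dt}) - log p(W_t, W_{t+dt})
    via_formula = 0.0  # sum over steps of ||v_t||^2 dt / 2 + <v_t, dB_t>
    for k in range(num_steps):
        v = v_fn(k * dt, w)  # drift depends only on the past
        db = np.sqrt(dt) * rng.standard_normal(n)
        w_next = w + db + v * dt
        dw = w_next - w
        # Both kernels are Gaussian with covariance dt * I; their means are w + v dt and w.
        via_kernels += (np.sum(dw**2) - np.sum((dw - v * dt) ** 2)) / (2 * dt)
        via_formula += 0.5 * np.sum(v**2) * dt + np.dot(v, db)
        w = w_next
    return via_kernels, via_formula
```

The two sums agree up to floating-point rounding for any drift that depends only on the past.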

Now by ``heuristic'' induction, we can assume $\alpha_t(\hat W_{[0,t]})=\frac{1}{M_t} \beta_t(\hat W_{[0,t]})$, yielding \begin{align*} \alpha_{t+dt}(\hat W_{[0,t+dt]}) &= \frac{1}{M_t} \beta_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) e^{\frac12 \|\hat v_t\|^2\,dt+\langle \hat v_t, d \hat B_t\rangle} \\ &= \frac{1}{M_{t+dt}} \beta_t(\hat W_{[0,t]}) p(\hat W_t, \hat W_{t+dt}) \\ &= \frac{1}{M_{t+dt}} \beta_{t+dt}(\hat W_{[0,t+dt]})\,. \end{align*} The second equality uses the multiplicative increment $M_{t+dt} = M_t\, e^{-\frac12 \|\hat v_t\|^2\,dt-\langle \hat v_t, d \hat B_t\rangle}$. In the last line, we used the fact that $p$ is the infinitesimal transition kernel for Brownian motion, so that $\beta_{t+dt}(\hat W_{[0,t+dt]}) = \beta_t(\hat W_{[0,t]})\, p(\hat W_t, \hat W_{t+dt})$.
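The conclusion $\alpha_1 = \beta_1/M_1$ says that reweighting paths of $\{W_t\}$ by $M_1$ should reproduce Brownian motion; in particular $\mathbb{E}[M_1]=1$ and $\mathbb{E}[M_1\|W_1\|^2] = \mathbb{E}\|B_1\|^2 = n$. The Python sketch below checks this by Monte Carlo in dimension $n=1$, assuming $W_0=0$ and the illustrative drift $v_t = a\tanh(aW_t)$, which is the drift obtained from $f(x)=e^{-a^2/2}\cosh(ax)$. It accumulates $M_t$ through $M_{t+dt} = M_t\,e^{-\frac12\|v_t\|^2dt - \langle v_t, dB_t\rangle}$, which is what the second equality above requires; the parameter values are arbitrary.

```python
import numpy as np


def girsanov_check(a=1.0, num_paths=50000, num_steps=100, seed=1):
    """Euler scheme for dW = dB + v dt, W_0 = 0, with v_t = a tanh(a W_t), tracking log M_t.

    Returns (E[M_1], E[M_1 W_1^2], E[W_1^2]) estimated over the simulated paths.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / num_steps
    w = np.zeros(num_paths)
    log_m = np.zeros(num_paths)
    for _ in range(num_steps):
        v = a * np.tanh(a * w)
        db = np.sqrt(dt) * rng.standard_normal(num_paths)
        # Multiplicative increment of M_t: exp(-|v|^2 dt / 2 - <v, dB>).
        log_m += -0.5 * v**2 * dt - v * db
        w = w + db + v * dt
    m = np.exp(log_m)
    return np.mean(m), np.mean(m * w**2), np.mean(w**2)
```

The reweighted second moment should be close to $1$, the value for Brownian motion, while without the weights the second moment of $W_1$ is instead close to $1+a^2$, the second moment of $f\,d\gamma_1$.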

The log-Sobolev inequality yields quantitative convergence in relative entropy. To see this, define the \emph{Fisher information}
\[
I(f) = \int \frac{\|\nabla f\|^2}{f} \,d\gamma_n\,.
\]
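For instance, for the Gaussian tilt $f(x) = e^{ax-a^2/2}$ in dimension one (a density with respect to $\gamma_1$, with $f\,d\gamma_1 = N(a,1)$) we have $\nabla f/f = a$, so $I(f)=a^2$, while $\mathrm{Ent}_{\gamma_1}(f) = a^2/2$. The Python sketch below evaluates both integrals on a uniform grid. It uses the standard definition $\mathrm{Ent}_{\gamma}(f) = \int f\log f\,d\gamma - \left(\int f\,d\gamma\right)\log\int f\,d\gamma$, which reduces to $\int f\log f\,d\gamma$ for densities; the grid parameters are our own choice.

```python
import numpy as np

# Uniform grid for integrals against the standard Gaussian measure gamma_1.
X = np.linspace(-15.0, 15.0, 30001)
DX = X[1] - X[0]
GAUSS = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)


def fisher_information(f, df):
    """I(f) = int |f'|^2 / f dgamma_1, with the derivative df supplied analytically."""
    return np.sum(df(X) ** 2 / f(X) * GAUSS) * DX


def entropy(f):
    """Ent_gamma(f) = int f log f dgamma_1 - (int f dgamma_1) log(int f dgamma_1)."""
    fx = f(X)
    mass = np.sum(fx * GAUSS) * DX
    return np.sum(fx * np.log(fx) * GAUSS) * DX - mass * np.log(mass)
```

Since $\mathrm{Ent}_\gamma(cf) = c\,\mathrm{Ent}_\gamma(f)$ for $c>0$, the second function also accepts unnormalized $f$.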

One can check that \[ \frac{d}{dt} \mathrm{Ent}_{\gamma_n} (U_t f)\Big|_{t=0} = - I(f)\,, \] thus the Fisher information describes the instantaneous decay of the relative entropy of $f$ under diffusion.
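A numerical check needs a concrete semigroup. Assuming, for this sketch, that $\{U_t\}$ is the Ornstein-Uhlenbeck semigroup $U_t f(x) = \mathbb{E}\big[f(e^{-t}x + \sqrt{1-e^{-2t}}\,Z)\big]$ with $Z\sim N(0,1)$, the family of densities $g_b(x) = e^{-b^2/2}\cosh(bx)$ is preserved: $U_t g_a = g_{ae^{-t}}$. The Python sketch below compares a forward difference of $t\mapsto\mathrm{Ent}_{\gamma_1}(U_t g_a)$ at $t=0$ with $-I(g_a)$; the value of $a$, the step $h$, and the grid are arbitrary.

```python
import numpy as np

# Uniform grid for integrals against gamma_1.
X = np.linspace(-15.0, 15.0, 30001)
DX = X[1] - X[0]
GAUSS = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)


def g(b, x):
    """g_b(x) = exp(-b^2/2) cosh(b x), a density with respect to gamma_1."""
    return np.exp(-b**2 / 2) * np.cosh(b * x)


def ou_flow(a, t):
    """Under the assumed Ornstein-Uhlenbeck semigroup, U_t g_a = g_{a e^{-t}}; return the new parameter."""
    return a * np.exp(-t)


def entropy_of(b):
    """Ent_gamma(g_b) = int g_b log g_b dgamma_1 (g_b has total mass 1)."""
    gx = g(b, X)
    return np.sum(gx * np.log(gx) * GAUSS) * DX


def fisher_of(b):
    """I(g_b); since g_b' = b tanh(b x) g_b, the integrand |g_b'|^2 / g_b is b^2 tanh(b x)^2 g_b."""
    return np.sum(b**2 * np.tanh(b * X) ** 2 * g(b, X) * GAUSS) * DX
```

Usage: with $a=1.2$ and $h=10^{-6}$, the quotient $\big(\mathrm{Ent}(U_h g_a) - \mathrm{Ent}(g_a)\big)/h$ should match $-I(g_a)$ to several digits.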

So we can rewrite the log-Sobolev inequality as \[ - \frac{d}{dt} \mathrm{Ent}_{\gamma_n}(U_t f)\Big|_{t=0} \geq \mathrm{Ent}_{\gamma_n}(f)\,. \] This expresses the intuitive fact that the relative entropy decays toward equilibrium at a rate at least as large as the relative entropy itself: the larger the entropy, the faster it decays.
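Because $U_s U_t = U_{s+t}$, the inequality applies at every time along the flow, and integrating it (Grönwall's lemma) gives $\mathrm{Ent}_{\gamma_n}(U_t f) \le e^{-t}\,\mathrm{Ent}_{\gamma_n}(f)$, which is the quantitative convergence referred to above. The Python sketch below checks this bound for the family $g_a(x) = e^{-a^2/2}\cosh(ax)$ in dimension one, assuming that $U_t$ is the Ornstein-Uhlenbeck semigroup, for which $U_t g_a = g_{ae^{-t}}$; the value $a=1.5$ and the time grid are arbitrary.

```python
import numpy as np

# Uniform grid for integrals against gamma_1.
X = np.linspace(-15.0, 15.0, 30001)
DX = X[1] - X[0]
GAUSS = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)


def entropy_of(b):
    """Ent_gamma(g_b) for the density g_b(x) = exp(-b^2/2) cosh(b x) with respect to gamma_1."""
    gx = np.exp(-b**2 / 2) * np.cosh(b * X)
    return np.sum(gx * np.log(gx) * GAUSS) * DX


def entropy_along_flow(a, times):
    """Ent_gamma(U_t g_a) at each t in times, using the assumed closed form U_t g_a = g_{a e^{-t}}."""
    return np.array([entropy_of(a * np.exp(-t)) for t in times])
```

Along a time grid, the computed entropies should be strictly decreasing and lie below $e^{-t}$ times their initial value.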