I

submitted a preprint on matrix factorisations and linear filters. I managed to derive some factorisation algorithms as linear filtering algorithms.

In the paper, I left a discussion to here, estimating parameters via maximising marginal likelihood. So here it is.

In the paper, I consider the following probabilistic model,
\begin{align}
p(c) &= \NPDF(c; c_0, V_0 \otimes I) \label{priorC} \\
p(y_k|c,x_k) &= \NPDF(y_k; (x_k \otimes I_m) c, \lambda \otimes I). \label{LikelihoodCform1}
\end{align}
where each $x_k$ is a static model parameter vector, and $C$ is the latent matrix. Note that $(x_k^\top \otimes I_m) c = C x_k$ (see the paper). We would like to solve the following problem to estimate $x_k$ for each $k$,
\begin{align*}
x_k^* = \argmax_{x_k} \log p(y_k | y_{1:k-1},x_k).
\end{align*}
So we would like to obtain the incremental marginal likelihood $p(y_k | y_{1:k-1},x_k)$. We define it as,
\begin{align*}
p(y_k | y_{1:k-1},x_k) = \int p(y_k | c, x_k) p(c | y_{1:k-1}) \mbox{d}c
\end{align*}
where $c = \vect(C)$. The important thing here is $p(c | y_{1:k-1})$ is not the exact distribution but it is an approximation to the true posterior as one has to use $x_{k-1}^*$ to obtain the distribution (If $X$ were known, it would be exact). So we know from the paper,
\begin{align*}
p(c | y_{1:k-1}) = \NPDF(c; c_{k-1}, V_{k-1} \otimes I_m),
\end{align*}
and,
\begin{align*}
p(y_k | c, x_k) = \NPDF(y_k; (x_k \otimes I_m) c, \lambda \otimes I_m).
\end{align*}
Using formulas in Bishop's PRML (2006, page 93), we can arrive the marginal $p(y_k|y_{1:k-1},x_k)$ as the following,
\begin{align*}
p(y_k | y_{1:k-1},x_k) = \NPDF(y_k; (x_k^\top \otimes I_m) c_k, (\lambda \otimes I_m) + (x_k^\top \otimes I_m) (V_{k-1} \otimes I_m) (x_k \otimes I_m)).
\end{align*}
Using properties given in the Section I.A of the paper, we compactly obtain,
\begin{align}\label{MarginalLikelihoodOfMFRLF}
p(y_k | y_{1:k-1},x_k) = \NPDF(y_k; C_{k-1} x_k, (\lambda + x_k^\top V_{k-1} x_k) \otimes I_m)
\end{align}
Although integration is analytic, maximising the logarithm of \eqref{MarginalLikelihoodOfMFRLF} is not possible analytically. However the likelihood is differentiable, so one can use nonlinear optimisation methods. But in terms of performance this does not bring any advantage: One gains a bit in the overall error part but loses too much in terms of computation time. I have tried quasi-Newton methods (BFGS), and conjugate gradients but no cure.