Brad Bell
Senior Principal Mathematician
bradbell@uw.edu
Phone: 206-543-6855

Research Interests

Optimization, Statistics, Numerical Methods, Software Engineering

Biosketch

Dr. Bell's graduate research was in optimization. Since then much of his research has concerned applications of optimization and numerical methods to data analysis. This includes procedures for properly weighting data from various sources, Kalman smoothing algorithms for the nonlinear case, and procedures for estimating signal arrival times.

He designed a compartmental modeling program that is used by researchers around the world to analyze the kinetics of drugs and naturally occurring compounds in humans and other animals. He is currently working on algorithms that estimate the variation between realizations of scientific models, with special emphasis on modeling pharmacokinetics. He joined APL-UW in 1978.

Education

B.A. Mathematics and Physics, Saint Lawrence University, 1973

M.A. Mathematics, University of Washington, 1976

Ph.D. Mathematics, University of Washington, 1984

Publications
2000-present and while at APL-UW

Aravkin, A.Y., B.M. Bell, J.V. Burke, and G. Pillonetto, "The connection between Bayesian estimation of a Gaussian random field and RKHS," IEEE Trans. Neural Networks Learn. Syst., 26, 15-18, doi:10.1109/TNNLS.2014.2337939, 2015.

Reconstruction of a function from noisy data is key in machine learning and is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution suitably balances adherence to the observed data and the corresponding RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this brief, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (that includes where the data were collected), the maximum a posteriori estimate for the signal samples is given by the RKHS estimate evaluated at the sampling locations. This connection establishes a firm statistical foundation for several stochastic approaches used to estimate unknown regularization parameters. To illustrate this, we develop a numerical scheme that implements a Bayesian estimator with an absolute value loss. This estimator is used to learn a function from measurements contaminated by outliers.
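
The equivalence for the quadratic loss can be illustrated directly: the kernel (RKHS-regularized) estimate at the sampling locations is the same formula as a Gaussian process posterior mean. A minimal sketch with an illustrative squared-exponential kernel and made-up data (the kernel choice, length scale, and noise level are assumptions, not from the paper):

```python
import numpy as np

def rbf_kernel(x, y, length=0.2):
    # Squared-exponential kernel matrix between 1-D point sets x and y.
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / length ** 2)

def gp_posterior_mean(x_train, y_train, x_eval, noise_var=0.1):
    # Posterior mean of a zero-mean Gaussian random field observed with
    # i.i.d. Gaussian noise; identical to the kernel ridge (RKHS) estimate.
    K = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_eval, x_train) @ alpha

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
f_hat = gp_posterior_mean(x, y, x)
```

The same coefficient vector `alpha` solves the regularized least squares problem over the RKHS, which is the connection the brief generalizes to non-quadratic losses.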

Pillonetto, G., B.M. Bell, and S. Del Favero, "Distributed Kalman smoothing in static Bayesian networks," Automatica, 49, 1001-1011, doi:10.1016/j.automatica.2013.01.016, 2013.

This paper considers the state-space smoothing problem in a distributed fashion. In the scenario of sensor networks, we assume that the nodes can be ordered in space and have access to noisy measurements relative to different but correlated states. The goal of each node is to obtain the minimum variance estimate of its own state conditional on all the data collected by the network using only local exchanges of information. We present a cooperative smoothing algorithm for Gauss–Markov linear models and provide an exact convergence analysis for the algorithm, also clarifying its advantages over the Jacobi algorithm. Extensions of the numerical scheme able to perform field estimation using a grid of sensors are also derived.

Bell, B., G. Pillonetto, and J. Burke, "An l1-Laplace robust Kalman smoother," IEEE Trans. Autom. Control, 56, 2898-2911, doi:10.1109/TAC.2011.2141430, 2011.

Robustness is a major problem in Kalman filtering and smoothing that can be solved using heavy-tailed distributions, e.g., the l1-Laplace distribution.

Aravkin, A., B. Bell, J.V. Burke, and G. Pillonetto, "Learning using state space kernel machines," in Proc., 18th World Congress of the International Federation of Automatic Control, 28 August - 2 September, Milan, Italy, doi:10.3182/20110828-6-1002.02726 (IFAC/Elsevier, 2011).

Reconstruction of a function from noisy data is often formulated as a regularized optimization problem whose solution closely matches an observed data set and also has a small reproducing kernel Hilbert space norm. The loss functions that measure agreement between the data and the function are often smooth (e.g. the least squares penalty), but non-smooth loss functions are of interest in many applications. Using the least squares penalty, large machine learning problems with kernels amenable to a stochastic state space representation (which we call state space kernel machines) have been solved using a Kalman smoothing approach. In this paper we extend this approach for state space kernel machines to the Vapnik penalty, a particularly important non-smooth penalty that is robust to outlier noise in the measurements and induces a sparse representation of the reconstructed function. We exploit the structure of such models using interior point methods that efficiently solve the functional recovery problem with the Vapnik penalty, and demonstrate the effectiveness of the method on a numerical experiment. The computational effort of our approach scales linearly with the number of data points.
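
The Vapnik penalty is the epsilon-insensitive loss: zero inside a band of half-width eps around the data and linear outside it, which is what makes it both robust to outliers and sparsity-inducing. A minimal sketch of the loss itself (not the paper's interior point solver; the residuals and eps value are made up for illustration):

```python
import numpy as np

def vapnik_loss(residual, eps=0.1):
    # Epsilon-insensitive (Vapnik) penalty: residuals inside the band
    # [-eps, eps] cost nothing; outside it the cost grows linearly.
    return np.maximum(np.abs(residual) - eps, 0.0)

r = np.array([-0.05, 0.05, 0.3, -2.0])
losses = vapnik_loss(r)  # [0.0, 0.0, 0.2, 1.9]
```

The flat region is what produces a sparse representation: data points whose residuals fall inside the band contribute nothing to the objective gradient.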

Andrew, R.K., J.A. Mercer, B.M. Bell, A.A. Ganse, L. Buck, T. Wen, and T.M. McGinnis, "PhilSea10 APL-UW Cruise Report: 5-29 May 2010," APL-UW TR 1001, October 2010.

A team from the Applied Physics Laboratory of the University of Washington (APL-UW) conducted underwater sound propagation exercises from 5 to 29 May 2010 aboard the R/V Roger Revelle in the Philippine Sea. This research cruise was part of a larger multi-cruise, multi-institution effort, the PhilSea10 Experiment, sponsored by the Office of Naval Research, to investigate the deterministic and stochastic properties of long-range deep ocean sound propagation in a region of energetic oceanographic processes. The primary objective of the APL-UW cruise was to transmit acoustic signals from electro-acoustic transducers suspended from the R/V Roger Revelle to an autonomous distributed vertical line array (DVLA) deployed in March by a team from the Scripps Institution of Oceanography (SIO). The DVLA will be recovered in March 2011.

Bell, B.M., J.V. Burke, and G. Pillonetto, "An inequality constrained nonlinear Kalman-Bucy smoother by interior point likelihood maximization," Automatica, 45, 25-33, doi:10.1016/j.automatica.2008.05.029, 2009.

Kalman–Bucy smoothers are often used to estimate the state variables as a function of time in a system with stochastic dynamics and measurement noise. This is accomplished using an algorithm for which the number of numerical operations grows linearly with the number of time points. All of the randomness in the model is assumed to be Gaussian. Including other available information, for example a bound on one of the state variables, is nontrivial because it does not fit into the standard Kalman–Bucy smoother algorithm. In this paper we present an interior point method that maximizes the likelihood with respect to the sequence of state vectors satisfying inequality constraints. The method obtains the same decomposition that is normally obtained for the unconstrained Kalman–Bucy smoother, hence the resulting number of operations grows linearly with the number of time points. We present two algorithms: the first is for the affine case and the second is for the nonlinear case. Neither algorithm requires the optimization to start at a feasible sequence of state vector values. Both the unconstrained affine and unconstrained nonlinear Kalman–Bucy smoothers are special cases of the class of problems that can be handled by these algorithms.
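
The interior point idea can be illustrated on a toy scalar problem (not the paper's Kalman–Bucy decomposition; the objective and constraint here are invented for illustration): maximize a concave log-likelihood subject to an inequality constraint by adding a log barrier and shrinking its weight mu toward zero.

```python
def barrier_maximize(mu, iters=100):
    # Toy problem: maximize -(x^2)/2 subject to x >= 1, via the barrier
    # objective -(x^2)/2 + mu*log(x - 1), maximized by Newton's method.
    x = 2.0  # strictly feasible starting point
    for _ in range(iters):
        g = -x + mu / (x - 1.0)           # gradient of the barrier objective
        h = -1.0 - mu / (x - 1.0) ** 2    # second derivative (negative)
        x -= g / h                         # Newton step
        x = max(x, 1.0 + 1e-12)            # stay strictly feasible
    return x
```

As mu shrinks, the barrier maximizer approaches the constrained optimum x = 1; for mu = 1 the stationarity condition x(x - 1) = mu is solved by the golden ratio.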

Bell, B.M., and J.V. Burke, "Algorithmic differentiation of implicit functions and optimal values," in Advances in Automatic Differentiation, eds., C. Bischof, H. Bucker, P. Hovland, U. Naumann, and J. Utke, 67-78, doi:10.1007/978-3-540-68942-3_7 (Springer, 2008).

Bell, B.M., and G. Pillonetto, "Approximating the Bayesian inverse for nonlinear dynamical systems," J. Phys.: Conf. Ser., 124, doi:10.1088/1742-6596/012009, 2008.

We are given a sequence of measurement vectors and a possibly nonlinear relation to a corresponding sequence of state vectors. We are also given a possibly nonlinear model for the dynamics of the state vectors. The goal is to estimate (invert for) the state vectors when there is noise in both the measurements and the dynamics of the state. In general, obtaining the minimum variance (Bayesian) estimate of the state vectors is difficult because it requires evaluations of high dimensional integrals with no closed analytic form. We use a block tridiagonal Cholesky algorithm to simulate from the Laplace approximation for the posterior of the state vectors. These simulations are used as a proposal density for the random-walk Metropolis algorithm to obtain samples from the actual posterior. This provides a means to approximate the minimum variance estimate, as well as confidence intervals, for the state vector sequence. Simulations of a fed-batch bio-reactor model are used to demonstrate that this approach obtains better estimates and confidence intervals than the iterated Kalman smoother.
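
The random-walk Metropolis step at the heart of such schemes can be sketched in one dimension (the proposal here is a plain Gaussian step rather than the paper's Laplace-approximation proposal, and the target density is made up for illustration):

```python
import numpy as np

def metropolis(log_post, x0, n_samples, step=1.0, seed=0):
    # Random-walk Metropolis: propose a Gaussian step, then accept it
    # with probability min(1, posterior ratio).
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Target: a standard normal "posterior" (log density up to a constant).
s = metropolis(lambda x: -0.5 * x ** 2, x0=0.0, n_samples=20000)
```

Averages over the retained samples approximate posterior expectations; a well-matched proposal, such as one drawn from a Laplace approximation, mainly improves the acceptance rate and mixing.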

Pillonetto, G., and B.M. Bell, "Optimal smoothing of non-linear dynamic systems via Monte Carlo Markov chains," Automatica, 44, 1676-1685, doi:10.1016/j.automatica.2007.10.028, 2008.

We consider the smoothing problem of estimating a sequence of state vectors given a nonlinear state space model with additive white Gaussian noise, and measurements of the system output. The system output may also be nonlinearly related to the system state. Often, obtaining the minimum variance state estimates conditioned on output data is analytically intractable. To tackle this difficulty, a Markov chain Monte Carlo technique is presented. The proposal density for this method efficiently draws samples from the Laplace approximation of the posterior distribution of the state sequence given the measurement sequence. This proposal density is combined with the Metropolis-Hastings algorithm to generate realizations of the state sequence that converge to the proper posterior distribution. The minimum variance estimate and confidence intervals are approximated using these realizations. Simulations of a fed-batch bioreactor model are used to demonstrate that the proposed method can obtain significantly better estimates than the iterated Kalman-Bucy smoother.

Pillonetto, G., and B.M. Bell, "Bayes and empirical Bayes semi-blind deconvolution using eigenfunctions of a prior covariance," Automatica, 43, 1698-1712, doi:10.1016/j.automatica.2007.02.025, 2007.

We consider the semi-blind deconvolution problem; i.e., estimating an unknown input function to a linear dynamical system using a finite set of linearly related measurements where the dynamical system is known up to some system parameters. Without further assumptions, this problem is often ill-posed and ill-conditioned. We overcome this difficulty by modeling the unknown input as a realization of a stochastic process with a covariance that is known up to some finite set of covariance parameters. We first present an empirical Bayes method where the unknown parameters are estimated by maximizing the marginal likelihood/posterior and subsequently the input is reconstructed via a Tikhonov estimator (with the parameters set to their point estimates). Next, we introduce a Bayesian method that recovers the posterior probability distribution, and hence the minimum variance estimates, for both the unknown parameters and the unknown input function. Both of these methods use the eigenfunctions of the random process covariance to obtain an efficient representation of the unknown input function and its probability distributions. Simulated case studies are used to test the two methods and compare their relative performance.
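
The eigenfunction representation can be illustrated with a discrete analogue (not the paper's construction; the covariance, grid, and variance threshold below are assumptions): take the eigendecomposition of a prior covariance matrix and keep only the leading eigenvectors needed to capture most of the variance.

```python
import numpy as np

n = 100
t = np.linspace(0.0, 1.0, n)
# Example prior covariance: squared-exponential with length scale 0.1.
C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1 ** 2)

w, V = np.linalg.eigh(C)
order = np.argsort(w)[::-1]  # sort eigenvalues in descending order
w, V = w[order], V[:, order]

# Number of eigenvectors needed to capture 99.9% of the total variance.
k = int(np.searchsorted(np.cumsum(w) / w.sum(), 0.999)) + 1
C_approx = V[:, :k] @ np.diag(w[:k]) @ V[:, :k].T
```

For a smooth covariance the eigenvalues decay rapidly, so k is far smaller than n, which is what makes the truncated representation efficient.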

Dash, R.K., B.M. Bell, M.J. Kushmerick, and P. Vicini, "Estimating in vitro mitochondrial oxygen consumption during muscle contraction and recovery: A novel approach that accounts for diffusion," Ann. Biomed. Eng., 33, 343-355, 2005.

A deconvolution algorithm, based on a Bayesian statistical framework and smoothing spline technique, is applied to reconstructing input functions from noisy measurements in biological systems. Deconvolution is usually ill-posed. However, placing a Bayesian prior distribution on the input function can make the problem well-posed. Using this algorithm and a computational model of diffusional oxygen transport in an approximately cylindrical muscle (about 0.5-mm diameter and 10-mm long mouse leg muscle), the time course of muscle oxygen uptake and mitochondrial oxygen consumption, both during isometric twitch contractions (at various frequencies) and the recovery period, is estimated from polarographic measurements of oxygen concentration on the muscle surface. An important feature of our experimental protocol is the availability of data for the apparatus characteristics. From these time courses, the actual mitochondrial consumption rates during resting and exercise states can be estimated. Mitochondrial oxygen consumption rate increased during stimulation to a maximum steady state value approximately five times the resting value of 0.63 nmol/s/g wet weight for the stimulation conditions studied. Diffusion slowed the kinetic responses to the contraction but not the steady state fluxes during the stimulation interval.

Powers, R.K., Y. Dai, B.M. Bell, D.B. Percival, and M.D. Binder, "Contributions of the input signal and prior activation history to the discharge behaviour of rat motoneurones," J. Physiol., 562, 707-724, doi:10.1113/jphysiol.2004.069039, 2005.

Bell, B.M., and G. Pillonetto, "Estimating parameters and stochastic functions of one variable using nonlinear measurement models," Inverse Probl., 20, 627-646, doi:10.1088/0266-5611/20/3/001, 2004.

Estimating an unknown function of one variable from a finite set of measurements is an ill-posed inverse problem. Placing a Bayesian prior on a function space is one way to make this problem well-posed. The problem can remain well-posed even if the relationship between the unknown function and the measurements, as well as the function space prior, contains unknown parameters. We present a method for estimating the unknown parameters by maximizing an approximation of the marginal likelihood where the unknown function has been integrated out. This is an extension of marginal likelihood estimators for the regularization parameter because we allow for a nonlinear relationship between the unknown function and the measurements. The estimate of the function is then obtained by maximizing its posterior probability with the parameters fixed at their estimates.

Pillonetto, G., and B.M. Bell, "Deconvolution of non-stationary physical signals: A smooth variance model for insulin secretion rate," Inverse Probl., 20, 367-383, doi:10.1088/0266-5611/20/2/004, 2004.

Deconvolution is the process of estimating a system's input using measurements of a causally related output where the relationship between the input and output is known and linear. Regularization parameters are used to balance smoothness of the estimated input with accuracy of the measurement values. In this paper we present a maximum marginal likelihood method for estimating unknown regularization (and other) parameters used during deconvolution of dynamical systems. Our computational approach uses techniques that were developed for Kalman filters and smoothers. As an example application we consider estimating insulin secretion rate (ISR) following an intravenous glucose stimulus. This procedure is referred to in the medical literature as an intravenous glucose tolerance test (IVGTT). This estimation problem is difficult because ISR is a strongly non-stationary signal; it presents a fast peak in the first minutes of the experiment, followed by a smoother release. We use three regularization parameters to define a smooth model for ISR variance. This model takes into account the rapid variation of ISR during the beginning of an IVGTT and its slower variation as time progresses. Simulations are used to assess marginal likelihood estimation of these regularization parameters as well as of other parameters in the system. Simulations are also used to compare our model for ISR variance with other stochastic ISR models. In addition, we apply maximum marginal likelihood and our ISR variance model to real data that have previous ISR estimation results reported in the literature.
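
The deconvolution setup can be sketched on a toy problem (a generic Tikhonov estimator with a hand-picked regularization weight, not the paper's Kalman-based implementation or its ISR variance model; the convolution kernel and signal are invented for illustration):

```python
import numpy as np

def tikhonov_deconvolve(H, y, lam):
    # Estimate input u from y = H u + noise by balancing data fit
    # against a first-difference smoothness penalty on u.
    n = H.shape[1]
    D = np.diff(np.eye(n), axis=0)  # first-difference penalty matrix
    return np.linalg.solve(H.T @ H + lam * D.T @ D, H.T @ y)

# Known linear system: causal convolution with a short kernel.
n = 50
kernel = np.array([0.5, 0.3, 0.2])
H = np.zeros((n, n))
for i in range(n):
    for j, tap in enumerate(kernel):
        if i - j >= 0:
            H[i, i - j] = tap

rng = np.random.default_rng(1)
u_true = np.sin(np.linspace(0.0, np.pi, n))
y = H @ u_true + 0.01 * rng.standard_normal(n)
u_hat = tikhonov_deconvolve(H, y, lam=0.1)
```

In the marginal likelihood approach described above, the weight `lam` (and any other variance parameters) would be estimated from the data rather than fixed by hand.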

Bell, B.M., "Approximating the marginal likelihood estimate for models with random parameters," Appl. Math. Comput., 119, 57-75, 2001.

Often a model for the mean and variance of a measurement set is naturally expressed in terms of both deterministic and random parameters. Each of the deterministic parameters has one fixed value while the random parameters come from a distribution of values. We restrict our attention to the case where the random parameters and the measurement error have a Gaussian distribution. In this case, the joint likelihood of the data and random parameters is an extended least squares function. The likelihood of the data alone is the integral of this extended least squares function with respect to the random parameters. This is the likelihood that we would like to optimize, but we do not have a closed form expression for the integral. We use Laplace's method to obtain an approximation for the likelihood of the data alone. Maximizing this approximation is less computationally demanding than maximizing the integral expression, but this yields a different estimator. In addition, evaluation of the approximation requires second derivatives of the original model functions. If we were to use this approximation as our objective function, evaluation of the derivative of the objective would require third derivatives of the original model functions. We present modified approximations that are expressed using only values of the original model functions. Evaluation of the derivative of the modified approximations only requires first derivatives of the original model functions. We use Monte Carlo techniques to approximate the difference between an arbitrary estimator and the estimator that maximizes the likelihood of the data alone. In addition, we approximate the information matrix corresponding to the estimator that maximizes the likelihood of the data alone.
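
Laplace's method itself is easy to demonstrate on an integral with a known value (this toy example is not the paper's model): expand the exponent to second order around its maximizer x*, so that the integral of exp(f(x)) is approximately exp(f(x*)) * sqrt(2*pi / -f''(x*)).

```python
import numpy as np
from math import factorial

def laplace_approx(f_at_max, d2f_at_max):
    # Second-order (Gaussian) approximation of integral exp(f(x)) dx
    # around the maximizer, where d2f_at_max = f''(x*) < 0.
    return np.exp(f_at_max) * np.sqrt(2 * np.pi / -d2f_at_max)

# Example: n! = integral over x > 0 of exp(n*log(x) - x).
# Here f(x) = n*log(x) - x has its maximum at x* = n with f''(x*) = -1/n,
# and Laplace's method reproduces Stirling's approximation.
n = 10
approx = laplace_approx(n * np.log(n) - n, -1.0 / n)
exact = factorial(n)
```

For n = 10 the approximation is already within about 1% of the exact value, and the relative error shrinks as the exponent becomes more sharply peaked.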

Bell, B.M., "The marginal likelihood for parameters in a discrete Gauss-Markov process," IEEE Trans. Signal Process., 48, 870-873, 2000.

We use Laplace's method to approximate the marginal likelihood for parameters in a Gauss-Markov process. This approximation requires the determinant of a matrix whose dimensions are equal to the number of state variables times the number of time points. We reduce this to sequential evaluation of determinants and inverses of smaller matrices and show that this method is numerically stable.
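
The reduction can be sketched for a symmetric block tridiagonal matrix (an illustrative discrete analogue, not necessarily the paper's exact recursion): the determinant factors through a Schur-complement recursion over the small diagonal blocks, checked here against the dense determinant.

```python
import numpy as np

def block_tridiag_logdet(A_blocks, B_blocks):
    # A_blocks: list of (m, m) diagonal blocks; B_blocks: list of (m, m)
    # sub-diagonal blocks, B_blocks[k] coupling block k+1 to block k.
    # Recursion: D_1 = A_1, D_{k+1} = A_{k+1} - B_k D_k^{-1} B_k^T, and
    # log det of the full matrix is the sum of log det D_k.
    D = A_blocks[0]
    logdet = np.linalg.slogdet(D)[1]
    for A, B in zip(A_blocks[1:], B_blocks):
        D = A - B @ np.linalg.solve(D, B.T)
        logdet += np.linalg.slogdet(D)[1]
    return logdet

# Check against the dense determinant on a small random SPD example.
rng = np.random.default_rng(2)
m, N = 2, 4
A_blocks = [np.eye(m) * 3 + 0.1 * rng.standard_normal((m, m)) for _ in range(N)]
A_blocks = [0.5 * (A + A.T) for A in A_blocks]  # symmetrize
B_blocks = [0.1 * rng.standard_normal((m, m)) for _ in range(N - 1)]

dense = np.zeros((m * N, m * N))
for k in range(N):
    dense[k*m:(k+1)*m, k*m:(k+1)*m] = A_blocks[k]
for k in range(N - 1):
    dense[(k+1)*m:(k+2)*m, k*m:(k+1)*m] = B_blocks[k]
    dense[k*m:(k+1)*m, (k+1)*m:(k+2)*m] = B_blocks[k].T
```

Each step touches only m-by-m matrices, so the cost grows linearly with the number of time points rather than cubically with the full matrix dimension.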