James Pitton

Senior Principal Engineer

Affiliate Associate Professor, Electrical Engineering

Email

pitton@apl.washington.edu

Phone

206-685-1843

Research Interests

Statistical Signal Processing, Digital Communications, Auditory Science, Psychoacoustics

Biosketch

James Pitton is a Senior Principal Engineer at APL-UW and Affiliate Associate Professor of Electrical Engineering at the University of Washington. From 2007 until 2010, he was the Associate Director for Ocean and Undersea Science at the US Office of Naval Research Global (ONR Global) in London, UK. Before his ONR Global appointment, Dr. Pitton joined APL-UW in 1999 and headed its Environmental and Information Systems Department from 2002 until 2007. Dr. Pitton received his Ph.D. in Electrical Engineering from the University of Washington in Seattle in 1994, and has also held research positions at AT&T Bell Laboratories in Murray Hill, NJ, and the Statistical Sciences Division of MathSoft in Seattle, WA. He has served on the organizing committees of numerous workshops and conferences, including the "Workshop on Machine Intelligence for Autonomous Operations" organized jointly by ONR Global, UK DSTL, and NURC. His ongoing research interests are focused on algorithms for information processing and autonomous systems, with an emphasis on sonar, automatic classification, nonstationary signal processing, and array processing.

Education

B.S. Electrical Engineering, University of Michigan, 1985

M.S. Electrical Engineering, University of Michigan, 1986

Ph.D. Electrical Engineering, University of Washington, 1994

Publications

2000-present and while at APL-UW

Robust human tracking based on DPM constrained multiple-kernel from a moving camera

Hou, L., W. Wan, K.-H. Lee, J.-N. Hwang, G. Okopal, and J. Pitton, "Robust human tracking based on DPM constrained multiple-kernel from a moving camera," J. Sign. Process. Syst., 86, 27-39, doi:10.1007/s11265-015-1097-y, 2017.

1 Jan 2017

In this paper, we attempt to solve the challenging task of precise and robust human tracking from a moving camera. We propose an innovative human tracking approach, which efficiently integrates the deformable part model (DPM) into multiple-kernel tracking from a moving camera. The proposed approach consists of a two-stage tracking procedure. For each frame, we first iteratively mean-shift several spatially weighted color histograms, called kernels, from the current frame to the next frame. Each kernel corresponds to a part model of a DPM-detected human. In the second step, conditioned on the tracking results of these kernels on the later frame, we then iteratively mean-shift the part models on that frame. The part models are represented by histogram of gradient (HOG) features, and the deformation cost of each part model provided by the trained DPM detector is used to constrain the movement of each detected body part from the first step. The proposed approach takes advantage of not only the low computation of kernel-based tracking, but also the robustness of the DPM detector, without the need for laborious human detection in each frame. Experimental results have shown that the proposed approach tracks humans robustly and with high accuracy under different scenarios from a moving camera.

Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels

Lee, K.-H., J.-N. Hwang, G. Okopal, and J. Pitton, "Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels," IEEE Trans. Intell. Transp. Syst., 17, 3602-3612, doi:10.1109/TITS.2016.2557763, 2016.

1 Dec 2016

This paper proposes a robust ground-moving-platform-based human tracking system, which effectively integrates visual simultaneous localization and mapping (V-SLAM), human detection, ground plane estimation, and kernel-based tracking techniques. The proposed system systematically detects humans from recorded video frames of a moving camera and tracks the humans in the V-SLAM-inferred 3-D space via a tracking-by-detection scheme. To efficiently associate the detected humans frame by frame, we propose a novel human tracking framework, combining constrained-multiple-kernel tracking and the estimated 3-D information (depth), to globally optimize the data association between consecutive frames. By taking advantage of the appearance model and 3-D information, the proposed system not only achieves high effectiveness but also handles occlusion well during tracking. Experimental results show the favorable performance of the proposed system, which efficiently tracks humans from a camera mounted on a ground-moving platform, such as a dash camera or an unmanned ground vehicle.

On spectral noncircularity of natural signals

Wisdom, S., L. Atlas, and J. Pitton, "On spectral noncircularity of natural signals," Proc., IEEE Sensor Array and Multichannel Signal Processing Workshop, 10-13 July, Rio de Janeiro, doi:10.1109/SAM.2016.7569672 (IEEE, 2016).

19 Sep 2016

Natural signals are typically nonstationary. The complex-valued frequency spectra of nonstationary signals do not have zero spectral correlation, as is assumed for wide-sense stationary processes. Instead, these spectra have non-zero second-order noncircular statistics (that is, they are not rotationally invariant) that are potentially useful for detection, classification, and enhancement. These noncircular statistics are especially significant for transient events, which are common in many natural signals. In this paper we provide practical and effective estimators for spectral noncircularity and spectral correlation. We illustrate the behavior of our spectral noncircularity estimators for synthetic signals. Then, we derive a generalized likelihood ratio test using both circular and noncircular models and show how estimates of spectral noncircularity provide performance improvements for detection of natural acoustic events.

Deformable multiple-kernel based human tracking using a moving camera

Hou, L., W. Wan, K.-H. Lee, J.-N. Hwang, G. Okopal, and J. Pitton, "Deformable multiple-kernel based human tracking using a moving camera," Proc., 2015 IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), 19-24 April, South Brisbane, Queensland, 2249-2253, doi:10.1109/ICASSP.2015.7178371 (IEEE, 2015).

19 Apr 2015

In this paper, we propose an innovative human tracking algorithm, which efficiently integrates the deformable part model (DPM) into multiple-kernel based tracking using a moving camera. By representing each part model of a DPM-detected human as a kernel, the proposed algorithm iteratively mean-shifts the kernels (i.e., part models) based on color appearance and histogram of gradient (HOG) features. More specifically, the color appearance features, in terms of kernel histograms, are used for tracking each body part from one frame to the next, and the deformation cost provided by the DPM detector is further used to constrain the movement of each body kernel based on the HOG features. The proposed deformable multiple-kernel (DMK) tracking algorithm takes advantage of not only the low computation of kernel-based tracking, but also the robustness of the DPM detector. Experimental results have shown the favorable performance of the proposed algorithm, which can track humans from a moving camera more accurately under different scenarios.

Voice activity detection using subband noncircularity

Wisdom, S., G. Okopal, L. Atlas, and J. Pitton, "Voice activity detection using subband noncircularity," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 19-24 April, South Brisbane, Queensland, 4505-4509, doi:10.1109/ICASSP.2015.7178823 (IEEE, 2015).

19 Apr 2015

Many voice activity detection (VAD) systems use the magnitude of complex-valued spectral representations. However, using only the magnitude often does not fully characterize the statistical behavior of the complex values. We present two novel methods for performing VAD on single- and dual-channel audio that do completely account for the second-order statistical behavior of complex data. Our methods exploit the second-order noncircularity (also known as impropriety) of complex subbands of speech and noise. Since speech tends to be more improper than noise, higher impropriety suggests speech activity. Our single-channel method is blind in the sense that it is unsupervised and, unlike many VAD systems, does not rely on non-speech periods for noise parameter estimation. Our methods achieve improved performance over other state-of-the-art magnitude-based VADs on the QUT-NOISE-TIMIT corpus, which indicates that impropriety is a compelling new feature for voice activity detection.

Extending coherence for optimal detection of nonstationary harmonic signals

Wisdom, S., J. Pitton, and L. Atlas, "Extending coherence for optimal detection of nonstationary harmonic signals," in 2014 48th Asilomar Conference on Signals, Systems and Computers, 2-5 November, Pacific Grove, CA, 1784-1788, doi:10.1109/ACSSC.2014.7094774 (IEEE, 2014).

2 Nov 2014

This paper describes an improved detector for nonstationary harmonic signals. The performance improvement is accomplished by using a novel method for extending the coherence time of such signals. This method applies a transformation to a noisy signal that attempts to fit a simple model to the signal's slowly changing fundamental frequency over the analysis duration. By matching the change in the signal's fundamental frequency, analysis is more coherent with the signal over longer durations, which allows the use of longer windows and thus improves detection performance.

Driving recorder based on-road pedestrian tracking using visual SLAM and constrained multiple-kernel

Lee, K.-H., J.-N. Hwang, G. Okopal, and J. Pitton, "Driving recorder based on-road pedestrian tracking using visual SLAM and constrained multiple-kernel," in 2014 IEEE 17th International Conference on Intelligent Transportation System (ITSC), 8-11 October, Qingdao, 2629-2635, doi:10.1109/ITSC.2014.6958111 (IEEE, 2014).

8 Oct 2014

This paper proposes a robust driving-recorder-based on-road pedestrian tracking system, which effectively integrates Visual Simultaneous Localization And Mapping (V-SLAM), pedestrian detection, ground plane estimation, and kernel-based tracking techniques. The proposed system systematically detects pedestrians from recorded video frames and tracks them in the V-SLAM-inferred 3-D space via a tracking-by-detection scheme. In order to efficiently associate the detected pedestrians frame by frame, we propose a novel tracking framework, combining Constrained Multiple-Kernel (CMK) tracking and the estimated 3-D (depth) information, to globally optimize the data association between consecutive frames. By taking advantage of the appearance model and 3-D information, the proposed system not only achieves high effectiveness but also handles occlusion well during tracking. Experimental results show the favorable performance of the proposed system, which efficiently tracks on-road pedestrians from a moving camera mounted on a vehicle.

Enhancement of reverberant and noisy speech by extending its coherence

Wisdom, S., T. Powers, L. Atlas, and J. Pitton, "Enhancement of reverberant and noisy speech by extending its coherence," in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 4-9 May, Florence, Italy, p1.11 (IEEE, 2014).

10 May 2014

We introduce a novel speech enhancement algorithm for removing reverberation and noise from recorded speech data. Our approach centers around using a single-channel minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator, which applies gain coefficients in a time-frequency domain to suppress noise and reverberation. The main contribution of this paper is that the enhancement is done in a time-frequency domain that is coherent with speech signals over longer analysis durations than the short-time Fourier transform (STFT) domain. This extended coherence is gained by using a linear model of fundamental frequency variation over the analysis frame. In the multichannel case, we preprocess the data with either a minimum variance distortionless response (MVDR) beamformer, or a delay-and-sum beamformer (DSB). We evaluate our algorithm on the REVERB challenge dataset. Compared to the same processing done in the STFT domain, our approach achieves significant improvement on the REVERB challenge objective metrics, and according to informal listening tests, results in fewer artifacts in the enhanced speech.

Extending coherence time for analysis of modulated random process

Wisdom, S., L. Atlas, and J. Pitton, "Extending coherence time for analysis of modulated random process," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4-9 May, Florence, Italy, 340-344, doi:10.1109/ICASSP.2014.6853614 (IEEE, 2014).

4 May 2014

In this paper, we relax a commonly used assumption about a class of nonstationary random processes composed of modulated wide-sense stationary random processes: that the fundamental frequency of the modulator is stationary within the analysis window. To compensate for the relaxation of this assumption, we define the generalized DEMON ("demodulated noise") spectrum representing modulation frequency, which we use to increase the coherence time of such signals. Increased coherence time means longer analysis windows, which provide higher-SNR estimators. We use the example of detection on both synthetic and real-world passive sonar signals to demonstrate this increase.

Adaptive multi-taper array processing in range-bearing space

Pitton, J., "Adaptive multi-taper array processing in range-bearing space," POMA, 17, Proceedings, 11th European Conference on Underwater Acoustics, Edinburgh, Scotland, 2-6 July, doi:10.1121/1.4773193, 2012.

10 Dec 2012

This paper presents an extension of Thomson's multitaper method for spectral estimation to estimate the spatial spectrum of an acoustic source in range and bearing. To begin, we solve the energy concentration problem in range and bearing for a linear array. This problem corresponds to finding the array shading (taper) whose array response is maximally concentrated in range and bearing. The Fresnel approximation for near-field sources is used to simplify the formulation of the maximization problem (making it equivalent to the Fractional Fourier Transform). This maximization reduces to an eigenvalue problem, similar to the case producing the prolate spheroidal wavefunctions. The resulting eigenvectors correspond to the optimal array shadings. The array data is then beamformed with each eigenvector over the set of ranges and bearings of interest. The beamformer outputs are then combined to form a multitaper estimate of the spatial spectrum. This initial estimate is used to compute adaptive weights for each beamformer through an iterative least-squares minimization procedure, reducing sidelobe leakage. The adaptive weights are then used to form an adaptive multitaper estimate of the spatial spectrum in range and bearing space. Results of the method will be presented for simulated acoustic array data.

Time-frequency tracking using multi-window local phase analysis

Dadouchi, F., J.W. Pitton, C. Ioana, and C. Gervaise, "Time-frequency tracking using multi-window local phase analysis," Proc., ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 3401-3404 (IEEE, 2012).

25 Mar 2012

The analysis of signals consisting of multiple components with non-linear frequency modulation is required in a large number of applications, including the study of marine mammal vocalizations. This analysis has multiple motivations, such as investigating the impact of anthropogenic noise on marine mammal behavior, and species identification to avoid collisions between ships and marine mammals. Such applications are normally conducted in a Passive Acoustic Monitoring (PAM) context, where there is low SNR together with very little a priori knowledge of the signals being analyzed. Recently, time-frequency tracking based on local analysis of the instantaneous phase has been successfully applied to underwater signals. In this paper, we present a robust version of this method, based on the use of multiple analysis windows. The results provided for simulated and real signals demonstrate improved tracking of the instantaneous frequency in noise.

Distributed environmental inversion for multi-static sonar tracking

Pitton, J., A. Ganse, G. Anderson, and D.W. Krout, "Distributed environmental inversion for multi-static sonar tracking," Proc., 9th International Conference on Information Fusion, 10-13 July, Florence, Italy, 6 pp., doi:10.1109/ICIF.2006.301710 (IEEE, 2006).

10 Jul 2006

This paper presents an approach for adapting a tracking algorithm to the acoustic propagation environment. This adaptation is performed by incorporating the expected target signal-to-noise ratio (SNR) into the data association step through the measured contact amplitude. In this work, expected SNR is provided via acoustic modeling; estimates of bottom loss and scattering strength, required by the acoustic model, are obtained via inversion of the acoustic model based on measured multi-static sonar reverberation data. This paper shows that the use of distributed sensors provides improved estimates of the environmental parameters, and hence better estimates of the expected SNR.

Inventions

3D Reconstruction of Dental Caries Using the Scanning-fiber Endoscope

Record of Invention Number: 47625

Eric Seibel, Matthew Carson, Yuanzheng Gong, James Pitton

Disclosure

16 Feb 2016

Multi-perspective Infrared Interrogation of Teeth

Record of Invention Number: 47417

James Pitton, Eric Seibel

Disclosure

27 Jul 2015

Enhancement of Noisy and Reverberant Speech Using Beamforming and Suppression

Record of Invention Number: 46870

James Pitton

Disclosure

6 Mar 2014
