Quester Tangent Company|Transit|Marine|Search|Contact Us
Making Data Intelligent
Quester Tangent  >  Seabed Classification  >  Marine R&D  >  Projects  >  Objective Measure of Acoustic Diversity
Train Photo

Objective Measure of Acoustic Diversity

Objective Measures of Acoustic Diversity

J.M. Preston, A.C. Christney,
L.S. Beran, and W.T. Collins

Quester Tangent

  R.A. McConnaughey NMFS Alaska Fisheries Science Center
7600 Sand Point Way NE
Seattle, WA, USA
Adobe PDF  PDF Version Get Adobe Reader  

Acoustic Seabed Classification     [ TOP ]

Acoustic seabed classification is the organization of the sea floor and shallow subsurface sediment into seabed types or classes based on characteristics of acoustic responses. In maps of acoustic classes, a highly complex data set is reduced to one that can be understood and applied.

Classification of echoes from single-beam sounders is today an established technique. For example, the QTC VIEW™ systems (Quester Tangent Corp.) are in use in 23 countries. Both the current versions, Series 4 and 5, are hardware systems that connect to a wide range of echo sounders and acquire digital representations of echoes. Seabed classification follows. The window surrounding only the first echo is analyzed by a series of algorithms that derive 166 feature descriptors. Some of these features are based on echo shape, and others on spectral characteristics. With a fairly wide beam, 15° or more, echo shape is rich in sediment information with nadir echoes because backscatter coefficients and their variation with angle of incidence differ markedly between sediments near normal incidence.

Principal components analysis (PCA) is used to reduce the information to three "Q" values representing uncorrelated linear combinations of the features most useful in distinguishing seabed types. Points defined by the triad of Q coordinates are plotted in 3-dimensional space for visual inspection and clustering. Class assignments are based on multivariate distances between echoes to be classified and clusters representing the acoustic properties of selected seabed types. In addition, a value is calculated for each point representing the confidence in its class assignment. Quester Tangent also supplies a software suite, QTC IMPACT™, that performs all this processing starting with echoes recorded by any of several commercial systems, including QTC VIEW™ Series 5.

Acoustic seabed classification using multibeam or sidescan upload/images is a more recent development. With upload/images, classification in QTC MULTIVIEW™ and QTC SIDEVIEW™ starts with dividing the image into rectangular patches, and then generating feature values based on the statistics of the backscatter in each. Feature vectors are then processed just as they would be if they were based on single-beam echoes: principal components analysis of the set of feature vectors, and identifying clusters having a similar acoustic response. Thus clustering is common to classification with single-beam and multibeam systems.

This approach to acoustic bottom classification relies on the geological and biological diversity of the sea bottom being expressed in acoustic diversity of echoes. Exploiting this concept requires appropriate acoustic and data acquisition equipment to capture any echo details that carry bottom information.

To relate classes based on acoustic diversity to the biological or geological differences of interest, one must obtain appropriate samples, using divers, grab samplers, or the like. The acoustic classification is deemed successful if samples from several areas within the same acoustic class are distinct, in the required sense, from the samples taken from areas in other acoustic classes. When this is so, as it usually is, the acoustic classifications are both useful and valuable, because acoustic survey can systematically cover large areas at a wide range of spatial resolutions. The user can confidently assign typical bottom characteristics of the point samples from any location in each acoustic class to the entire area within that class.


Measuring Distances for Clustering     [ TOP ]

In a clustering process, each datum is assigned to the class whose centre it is closest to. Closeness could be simply the Euclidean distance between the datum and the cluster centre, or a Bayesian metric.

An implicit assumption when using Euclidean distance is that all clusters have approximately the same variance. With echo data, this is often not true. Cluster variances and covariances become involved in class assignments if a Bayesian, rather than Euclidean, metric is used. This metric is better suited because it accounts for variations in the covariance between clusters, as well as adding terms to account for the a priori probability that a new observation will be a member of a particular cluster.

For a vector x of dimensionality M, the Bayesian metric, dk (x), between x and a cluster defined by mean mk, covariance matrix Ck , and probability p(wk) is

Mathematical Formula

In practice the probability of cluster membership, p(wk), is estimated from the memberships themselves, i.e. p(wk) = Nk/N , where Nk is the number of members in the kthcluster and N is the total number of observations.


Objective Clustering Based on Information Criteria     [ TOP ]

The true number of clusters necessarily depends on the model. In terms of the unknown true model, f (x), and an estimated model, g(x), the Kullback-Leibler Information Quantity, I (f,g), is defined to be

Mathematical Formula

where theta are the model parameters which in practice we must estimate. The range of integration is over all possible values of x. This quantity is a measure of the mismatch between the true distribution and the corresponding model. Hence, the "best" choice of model parameters will produce the minimal value of I.

Through the work of Akaike and others, we are led to use Ex{log(g(x|theta(y)))} as a measure of the goodness-of-fit of the model, g (x|theta), to the unknown true probability density, f (x), where the observed data, y, are used to provide the estimated model parameters, theta(y). In particular, if we assume a Gaussian multimodal model (GMM), we can derive expressions that allow us to calculate this expectation value.

The Bayesian Information Criterion (BIC) is one such expression, and has been used in this work to find the "best" choice of model parameters. For a GMM, the BIC as a function of the number of classes, K, is given by

Mathematical Formula

where mK is the total number of parameters required by the model. For a GMM in three dimensions, we have K - 1 independent class probabilities, 3K class means, and 6K independent covariance matrix elements, thus mK= 10K - 1.


Finding the Minimum Bayesian Information Criterion (BIC)     [ TOP ]

The optimal number of clusters can be thought of as the best balance between information content and confidence in the classifications. The former is high with many clusters, the latter high with few clusters. For each data set, that optimal number is that value of K for which the BIC is minimum. Two processes were used to find optimal K values: a random Monte Carlo process and a search for the global BIC minimum using simulated annealing. Each process also generates the list of clusters to which each datum belongs, from which means, covariances, and populations can be calculated.

Monte Carlo

To start the Monte Carlo process, the data were divided into K groups. With purely random assignments many CPU cycles were wasted investigating arrays of clusters that had very little chance of evolving into sensible models. Initialization techniques were developed to steer these random initial assignments toward realistic clusters. After "dealing" K random clusters, the K-means algorithm was used to find K means. These means and memberships were then used as the starting seed for the K-statistics kernel, as used in QTC IMPACT™, which uses the Bayesian metric in assigning points to the nearest cluster. This process fits well with the BIC, which also uses a Bayesian model for the cluster distribution. After the K-means and K-statistics processes stabilize, a BIC value is available, as well as the memberships and the associated means, variances, and covariances. For each K, ten random starting assignments were used. The multiplicity of results indicates the complexity of the search space and the challenge of finding the global minimum, rather than local minima.

BC10

Bayesian Information Criterion of 10 models for K from 3 to 30. Each BIC result was generated independently from a random initialization. (Because results can be identical, especially at low K, a small horizontal jitter offsets sequential results.) The red box surrounds the least BIC for any K, showing that the optimal model for this data set has 9 clusters.

Simulated Annealing

Simulated annealing is a global search algorithm that works analogously to the cooling of an alloy to its lowest energy configuration. The algorithm involves random perturbations of model parameters (cluster memberships in our case). Each candidate perturbation has an associated "energy" change, deltaE. All perturbations with deltaE<0 are accepted, and the others may or may not be accepted, according to a comparison between a random number and exp(-deltaE/tau), where tau is a "temperature" that diminishes as the search proceeds. In the early phases, at high temperature, large perturbations are often accepted, which helps the emerging solution escape local minima. As the temperature is lowered, the algorithm is more likely to reject uphill perturbations and will gradually freeze at a (hopefully global) minimum. This technique is widely used to find global minima of multi-dimensional functions.

To find the global minimum of a BIC, we used a variant of simulated annealing specialized to clusters. Perturbing cluster means and covariances was not effective because they are correlated and change discontinuously (as a cluster mean is changed, the points assigned to that cluster change and so do all the cluster characteristics). Instead, we found success with an approach we call thief cluster, in which the candidate perturbations (which may or may not be accepted) are thefts of points by one cluster from the others.

For each number of clusters, K, the simulated annealing algorithm continues to search until a convergence criterion is met. This figure shows the progress of a typical search, stopping after 9200 accepted perturbations. Individual searches were done for each K, and the optimal K is the number of clusters that gives the lowest BIC found with that data set.

BIC

Bayesian Information Criterion of 9200 models, evolving by thief perturbations and the simulated annealing process. The resulting BIC, after convergence, is then compared with results for all other values of K to find the optimal clustering of this data set.

User-Guided Clustering     [ TOP ]

The feature-generation and PCA process ascribes three Q coordinates to each datum. Clustering divides these points into classes that correspond to sediment types, based on their acoustic characteristics. Currently the user has to guide the clustering process.

Of the many different clustering techniques available, QTC uses a variation of the K-means algorithm known as K-statistics, as shown in the block diagram.

The basic K-means algorithm derives its name from the fact that it tries to find K clusters in the data, and that the clusters are described by the mean positions of the cluster members. Each datum is assigned to the cluster it is "closest" to, according to some metric. It is an iterative process. The algorithm begins by guessing, usually in a guided way, K means as new cluster centres. Each datum is then assigned to the closest centre. These cluster centres are then wrong, because they are not the mean positions of all the points assigned to that cluster, so they are recalculated. These new means are then used in another assignment step. The process continues until the means and the cluster memberships stabilize.

With the K-statistics algorithm, the initial guess requires not only K means, but also K covariance matrices. The combined process starts as described above. Each time the means have stabilised, the covariance matrix of each cluster is recalculated. Then the K-means part of the algorithm is performed again, using the covariances in the metric, that is, in the closeness decisions. This process continues until the memberships, means, and covariances have all stabilised.

The combination of K-means and K-statistics is an efficient stable method of dividing a data set into its natural clusters. Nevertheless, the user has to choose which cluster to split next, how to split it, and when to stop splitting. Various summary statistics are provided to aid these decisions, but a subjective element remains and results differ between users.

Q-space

A typical clustering process, plotted in Q-space. The left hand image shows the initial state of the data. The right hand image shows the data at a later clustering stage

Flow chart

Flow chart of K-statistics clustering, showing the relationship between the K-means and K-statistics processes.

Bering Sea Survey     [ TOP ]

The largest acoustic classification survey ever done covers the eastern Bering Sea. Almost 14 million echoes from a Simrad EK-500 echo sounder were collected by two QTC VIEW™ 4 systems on the tracks shown below.

In 1996, Quester Tangent Corporation (QTC) and the NMFS Alaska Fisheries Science Center (Seattle) formed a strategic alliance to apply QTC seabed classification technology to the problem of groundfish habitat descriptions. The Bering Sea supports about 300 fish species, many of which are demersal species. This stock supports sizeable fisheries, as indicated by landings at the Bering Sea port of Dutch Harbor: in excess of 579 million pounds in 1996 and 834 million pounds in 2001, the highest landed weight in the U.S., worth $129 Million. Fifteen years earlier, in 1981, the landing at Dutch Harbor was 73 million pounds. This rapid increase in catch highlights the need for effective management to ensure sustainability.

Approximately 18,000 line miles of dual-frequency seabed acoustic data were collected from the eastern Bering Sea over two cruises of the NOAA ship Miller Freeman. Echoes were generated by a Simrad EK-500 scientific echo sounder operating at both 38 kHz and 120 kHz. Acquisition of the echo envelopes was done with a QTC VIEW™ at each sonar frequency for amplification and envelope formation, supplying a Quester Tangent multi-channel digitization and acquisition system.

As deployed aboard the NOAA ship Miller Freeman, the QTC VIEW™ systems amplified and digitized echoes, then passed them to a data acquisition system for display and logging.

Acoustic footprint diagram

The length and area of a footprint of a stack of pings depends on ship speed, v, on depth, d, on ping rate, f, and on sonar beamwidth, q.

The 1999 data were processed subjectively. All the processing methods used gave reasonable classifications, so the optimum processing method could not be identified unequivocally. A selection based on experience and less-than-adequate statistical indicators was not acceptable. A more rigorous and objective statistical treatment of the data using the Bayesian Information Criterion was then implemented. This new procedure led to an objective ranking of the processing methods and the identification of an optimal classification scheme for both frequencies. We believe this is one of the first applications of this criterion to large data sets, and certainly a first use in acoustic classification.

For both frequencies, 38 and 120 kHz, using BIC to select an optimal number of clusters followed by ranking using total score showed that the resolution in classes was improved with larger stacks. This could be interpreted either as improving the signal to noise ratio by suppressing echo variability, or as an artifact of adding together echoes from both sides of a boundary. In view of the very large size of this survey, the former hypothesis is preferred.

Spatial resolution suffers as the number of pings stacked increases. If there are n echoes in a stack, the area, A, and the length, L, of the stack footprint are as shown in the diagram. For the largest stacks, 50, and in the deepest water, 150 m, this distance is about 240 m, which is 0.002° of latitude. The track plots cover 9° of latitude. For many purposes, this spatial resolution of about two parts in ten thousand will be adequate. When it is adequate, the stack size can be chosen to optimise the classification, as the trade-off in resolution would be acceptable. However, it clearly is important to have higher resolution in smaller, more dynamic, environments like ports and reefs. Ultimately, choosing how many echoes to stack has to be a compromise.


Sediment classes

A set of sediment classes determined acoustically from data collected aboard Miller Freeman. These data were recorded at minimal cost and without interference with normal operations, including a mid-water hydroacoustic survey. Class colours indicate cluster positions in Q-space.

Using BIC to Select Optimal Classification Processes     [ TOP ]

Classification results depend on sonar frequency and on processing details. Each combination gave a unique data set, for which BIC was used to identify the optimal result. A statistic based on chi-squared allowed comparison between data sets.

Choices made during survey and processing can affect the classification results. Sonar frequency and beamwidth are two examples, while processing choices include the number of pings that are stacked together before features are generated, to reduce the effects of echo variability. While BIC can be used to find the optimal number of classes with each processing method, other indicators are needed to make comparisons between methods. The reason BIC cannot be used to compare methods directly is that the first term in the Kullback-Leibler.

Information Quantity depends on the unknown true distribution, which is different for each method. Methods were therefore ranked by their total score per datum. The total score is the sum over all classes of the product of the class population and its chi-squared test statistic. The chi-squared statistic is a measure of how well the class distribution fits the assumed distribution model, in this case Gaussian.

Optimal processes for the Bering Sea data set were sought. The table shows that stacking the largest number of pings, 50, brought out details in Q-space. These data sets had the largest optimal numbers of clusters and the smallest scores per datum. Reference depth has less effect. Optimal processes are highlighted in yellow. The BIC searches that gave those optimal results are shown below, with the corresponding track plots. Minima in these BIC plots are sometimes unconvincing, as in these examples. A method was developed to estimate BIC for unrealistically large numbers of clusters, since neither Monte Carlo nor simulated annealing are practical for very large K. Examples showed a broad BIC minimum in the K range we had investigated with Monte Carlo and simulated annealing.


Frequency (kHz) Number Stacked Reference Depth (m) Optimal K Total Score /N
38 5 90 10 1.83
38 15 90 9 2.18
38 50 90 20 1.20
38 5 150 7 2.08
38 15 150 10 1.85
38 50 150 24 1.28
120 5 90 4 5.36
120 15 90 6 3.92
120 50 90 23 1.34
120 5 150 4 4.37
120 15 150 9 2.58
120 50 150 24 1.18

BIC plot

BIC over a wide range of numbers of clusters, for the data set at 38 kHz, stacks of 50, but a deeper reference depth than the plots on the right. These results increase our confidence that the plots to the right do indeed indicate BIC minima.
 

Minimum BIC plots

The search for minimum BIC. Red lines show the best (lowest) and worst Monte Carlo results; blue lines the results from simulated annealing (small process variants shown dashed). The black line is a test for any simpler model being statistically significantly different from the optimal model, shown boxed.
 

Trackplots

Trackplots for the Bering Sea data set and the two best processing options, shown highlighted in the table, for both frequencies. Class colours indicate cluster positions in Q-space; thus all green classes have similar acoustic responses, as do all red, blue,...
    © 2010 Quester Tangent