
statistical analysis. However, from the point of view of terminology, the situation is slightly confused, because multivariate statistical analysis (MSA) is the name generally used for representing a specific group of methods dealing with the analysis of data sets by linear methods. However, linear methods (such as principal components analysis

MULTIVARIATE ANALYSIS OF MICROSCOPE IMAGE SERIES 3

It should be stressed that some of the concepts and methods described in this paper (linear MSA, classification of images in the representation space) were introduced in electron microscopy by researchers working in the field of three-dimensional reconstruction of macromolecules (van Heel & Frank, 1981; Frank & van Heel, 1982; van Heel, 1984, 1989; Frank, 1990; Borland & van Heel, 1990). Although these methodological contributions, and the great success they have had in recent years in elucidating the 3D structure of important biological macromolecules, are not described in this paper, their importance in the diffusion of related (although different) techniques for materials science applications should be recognized.

The outline of the paper is the following. In the next section, I give some examples of multivariate physical data sets and I introduce those that I will use for illustration in the rest of the paper. The following section is devoted to linear multivariate statistical analysis. Since these methods are described in many textbooks (Lebart et al., 1984) and are already in use in several laboratories, only a brief description of them will be given. Emphasis will be put on the extension of orthogonal MSA to oblique MSA. The section thereafter will be devoted to nonlinear mapping, an extension of linear MSA. Several approaches will be discussed and illustrated, ranging from the minimization of a criterion (cost function) to neural networks approaches. The last section will be devoted to automatic classification. I will concentrate on unsupervised classification, which does not mean that supervised classification does not deserve attention. After briefly describing some classical statistical classification techniques, which make assumptions concerning the shape of clusters in the parameter space, I will put emphasis on new methods which do not make assumptions concerning the shapes of classes.

Some examples of multivariate data sets

Multivariate data sets produced in the domain of physical sciences, as well as in other scientific domains, are very diverse in nature. They can be simple data, series of spectra, series of two- or three-dimensional images, spectrum-images, etc.

Examples of simple data are:

• the results of measurements (concentrations of different elements, for instance) made at different positions on a specimen. Examples are described in Quintana (1991) and Quintana & Bonnet (1994a, b).

• different preparation conditions related to some characteristics of the specimens obtained; see for instance the paper by Simeonova et al. (1996), which concerns the conditions of preparation of high-temperature superconducting thin films.

Examples of multivariate spectra are:

• sets of spectra recorded as a function of time (time-resolved spectroscopy; Ellis et al., 1985). Examples of multivariate statistical analysis of such data sets can be found in Bonnet et al. (1991) and Jbara et al. (1995).

• sets of spectra recorded as a function of position, through an interface for instance (Tencé et al., 1995). Examples of multivariate statistical analysis of such data sets can be found in Gatts et al. (1995), Müllejans & Bruley (1995), Brun et al. (1996) and Titchmarsh & Dumbill (1996).

Multivariate two-dimensional (2D) image sets include (Bonnet, 1995a):

• sets of different elemental or chemical maps of a specimen, recorded in different microanalytical modes (Auger, EELS, X-ray emission, X-ray fluorescence, X-ray differential absorption). Examples of the processing of this type of data can be found in Bonnet et al. (1992), Prutton et al. (1990, 1996), Cazaux (1993), Quintana & Bonnet (1994a, b), Colliex et al. (1994) and Trebbia et al. (1995). As an illustration of this type of data, I have selected a series of 14 X-ray fluorescence maps of a specimen of granite (courtesy of K. Janssens and collaborators, Department of Chemistry, University of Antwerp; Wekemans et al., 1997). The series is displayed in Fig. 1.

• sets of images of unit cells recorded by high-resolution transmission electron microscopy (HRTEM) of interfaces between two crystals. Such data sets have been analysed, with the purpose of visualizing the gradual change of composition across the interface, by pattern recognition techniques (Ourmazd et al., 1990; De Jong & Van Dyck, 1990; Kisielowski et al., 1995). Their analysis by multivariate statistical methods has also begun (Rouvière & Bonnet, 1993; Aebersold et al., 1996). I will also show some possibilities of this technique. Such a data set is displayed in Fig. 2.

Multivariate three-dimensional (3D) images can also be obtained with some microanalytical techniques: secondary ion mass spectroscopy (SIMS) (Van Espen et al., 1992) or fluorescence confocal laser microscopy, for instance. Techniques working with 2D images can be extended to 3D images relatively easily, thanks to the increasing capabilities of computers.

Four-dimensional (4D) image data sets are, for instance, 3D images recorded as a function of time, a mode which begins to be feasible in fluorescence (confocal or not) videomicroscopy.

Spectrum-images, or a variant of them, image-spectra, will be the multivariate data sets of choice at the beginning of the 21st century (Jeanguillaume & Colliex, 1989). Combining spatial and full spectral information, they will mix the advantages of spectroscopy and microscopy. Although several acquisition procedures are already in use in different fields of physics, chemistry, biology and teledetection, the procedures for analysing the data sets are still in their infancy and I will not address them in this paper.

© 1998 The Royal Microscopical Society, Journal of Microscopy, 190, 2-18

4 N. BONNET

Fig. 1. Example of multivariate image set. The image series consists of 14 X-ray fluorescence maps of a specimen of granite (courtesy of K. Janssens and collaborators, University of Antwerp). The aim of the analysis is to segment the specimen area into regions of homogeneous composition, which means labelling pixels according to their content in the different images (pixels are represented by vectors in a 14-dimensional space). If this labelling can be performed successfully, further quantification and characterization of the specimen can take place (the percentage of area occupied by the different phases can be computed, for instance).

Fig. 2. Another example of a multivariate image set. The image series is composed of 190 subimages (unit cells) extracted from a high-resolution transmission electron microscope image of a GaAs/GaAlAs interface. Here, the aim is to analyse the differences between subimages and to deduce from them the variation of composition across the interface. Subimages are digitized as 25 × 25 pixels and can thus be described by a vector in a 625-dimensional space.
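The two data layouts described in the captions of Figs 1 and 2 can be sketched as follows. This is only an illustration with random placeholder arrays: the 32 × 32 map size and all variable names are assumptions, since the captions give only the number of maps (14), the number of subimages (190) and the subimage size (25 × 25).

```python
import numpy as np

rng = np.random.default_rng(0)

# Fig. 1 layout: 14 co-registered maps. Each pixel is an object described
# by 14 values (one per map). The 32 x 32 map size is a placeholder.
maps = rng.random((14, 32, 32))
pixels_as_objects = maps.reshape(14, -1).T       # shape (n_pixels, 14)

# Fig. 2 layout: 190 subimages of 25 x 25 pixels. Each subimage is an
# object described by its 625 pixel intensities.
subimages = rng.random((190, 25, 25))
images_as_objects = subimages.reshape(190, -1)   # shape (190, 625)

print(pixels_as_objects.shape, images_as_objects.shape)
```

In the first layout the rows are pixels, which suits segmentation; in the second the rows are whole subimages, which suits the comparison of unit cells.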

Linear multivariate statistical analysis (LMSA)

The purpose of LMSA is to reduce the number of components of the objects studied. This is useful, and sometimes necessary, because a multivariate data set always contains redundant information: the N measurements are never completely independent and some correlation or anticorrelation is always present. LMSA can help both to reduce redundancy and to define a new representation space onto which the components of the objects are less correlated. This step is performed on the basis of the variance-covariance matrix: classically, the concept of variance is supposed to be one of the concepts connected to the information. However, other information descriptors can also be used (Bonnet, in preparation).

Consider X as the data set arranged as a matrix: rows are objects (or individuals) and columns are descriptors (or features, or variables) of these objects. For instance, if one wants to classify a set of images, images are individuals and pixel intensities are features. But if one wants to classify pixels (image segmentation), rows are composed of pixels and columns represent the different image contents for every pixel.

$$Y = X \cdot X^{t} \qquad (1)$$

where $X^{t}$ indicates the transposed matrix. Y is the variance-covariance matrix: variances of the images are along the diagonal and covariances, which represent the exchange of information between pairs of images, are off the diagonal.

The next step consists of computing the eigenvalues and eigenvectors of the variance-covariance matrix. The eigenvectors then correspond to the new representation space. The associated eigenvalues are proportional to the strength of the corresponding eigenvectors in the variance-covariance matrix, that is to say the amount of information carried by the new direction of representation; thus, eigenvalues are sorted in descending order. Note that the nature of an eigenvector is analogous to that of an original individual (spectrum, image). They can thus be displayed as spectra (eigen-spectra) or images (eigen-images), which can help in their interpretation.

Now, an original individual can be described as a linear combination of the eigenvectors:

$$X_i = \sum_j a_{ij} A_j \qquad (2)$$

where $a_{ij}$ represents the weight (or score) of the object i on the axis (eigenvector) number j. The scores of the different


objects can thus also be displayed: for display purposes, two scores $a_{ij}$ and $a_{ij'}$ are often represented simultaneously for all objects i = 1 ... N. The two display possibilities (of objects and of objects descriptors) often help to interpret the complete data set in terms of sources of information: how many sources are present? what do they represent?

It should be stressed that the true number of sources of information (M) is often smaller than the number of components (N) in the experimental set. Thus, a large compression of information can be performed. Since the eigenvectors are orthogonal, noise (uncorrelated with the useful signal) is rejected into specific components, which can thus be eliminated after proper inspection. The same is true for experimental artefacts (Hannequin & Bonnet, 1988; Trebbia & Mory, 1990). Therefore, the next step may be to reconstitute the data set after selecting some useful components and discarding some useless ones. Since the decomposition is linear, there is no difficulty in following the reverse path (Bretaudiere & Frank, 1986) for reconstituting a filtered data set.

As an example, I will consider the application of LMSA to sets of images. After its introduction in electron microscopy by the groups around Frank and van Heel, this technique has also been applied successfully in the domain of materials science and physics (Trebbia & Mory, 1990; Van Espen et al., 1992; Geladi, 1992; Rouvière & Bonnet, 1993; Quintana & Bonnet, 1994a, b; Aebersold et al., 1996; Trebbia, 1996). Thus, I will just describe briefly how MSA can be applied to an example such as the one represented in Fig. 1 (series of microanalytical maps). Then, I will introduce the discussion concerning the need to go further than the orthogonal LMSA described above.

Fig. 3. Results of applying correspondence analysis to the series of 14 microanalytical images (Fig. 1). (a) First four factorial images. (b) Scores of the 14 images onto the first two factorial axes. See text for the interpretation of these results.

The results of applying the orthogonal MSA to the 14 images of Fig. 1 are displayed in Fig. 3. Fig. 3(a) represents the first four factorial images obtained after applying correspondence analysis (one variant of LMSA; see Trebbia & Bonnet, 1990, for an extended description) to this data set. Figure 3(b) represents the scores of the 14 images on the first two factorial axes, which account for 71% and 20% of the total variance, respectively. Altogether, these two figures allow us to understand the content of the data set.
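The orthogonal LMSA steps described in this section (matrix Y, eigen-decomposition, scores, and reconstitution of a filtered data set from selected components) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for Fig. 3: it follows Y = X · Xᵗ literally, with objects in rows and no centering or normalization, it obtains the eigen-images by projecting the data onto the eigenvectors of Y, and the function names `lmsa` and `reconstitute` and the synthetic test data are hypothetical.

```python
import numpy as np

def lmsa(X):
    """Orthogonal decomposition of X (objects in rows) through Y = X X^t."""
    Y = X @ X.T                                    # N x N matrix, Eq. (1)
    eigvals, U = np.linalg.eigh(Y)                 # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    eigvals, U = eigvals[order], U[:, order]
    keep = eigvals > eigvals[0] * 1e-12            # drop numerically null axes
    eigvals, U = eigvals[keep], U[:, keep]
    scores = U * np.sqrt(eigvals)                  # a_ij of Eq. (2)
    eigen_objects = (U.T @ X) / np.sqrt(eigvals)[:, None]   # A_j, unit norm
    return eigvals, scores, eigen_objects

def reconstitute(scores, eigen_objects, selected):
    """Reverse path: rebuild the data set from the selected components."""
    return scores[:, selected] @ eigen_objects[selected]

# Synthetic example (assumption): 6 objects built from 2 sources plus noise.
rng = np.random.default_rng(0)
sources = rng.random((2, 400))
weights = rng.random((6, 2))
X = weights @ sources + 0.01 * rng.standard_normal((6, 400))

eigvals, scores, eigen_objects = lmsa(X)
X_full = reconstitute(scores, eigen_objects, slice(None))   # all components
X_filtered = reconstitute(scores, eigen_objects, [0, 1])    # first two only

print(np.round(eigvals / eigvals.sum(), 4))   # share of each component
```

Keeping all components reproduces X, while keeping only the first two discards the components that mostly carry noise in this synthetic case. Which components are "useful" remains a decision for the user after inspection, as stated above.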

Fig. 4. An illustration of the need to go from orthogonal MSA to oblique MSA when quantitative results are expected. (a,b) Two basic reference images. (c-h) Six images obtained from a linear combination of the two basic images. These images are supposed to represent an experimental multivariate data set. Poisson noise was added to each image independently. (i,j) The first two orthogonal factorial images were obtained after principal components analysis. The corresponding scores are displayed in Fig. 5. (k,l) The two oblique factorial images were obtained after oblique analysis. They compare very well with the original basic images (a,b).
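The protocol summarized in the caption of Fig. 4 can be imitated with a small simulation. The sketch below is hypothetical throughout: the Gaussian-blob basic images, the image size, the count levels and all function names are assumptions, and only the first two rows of the weighting table reuse values quoted in the paper (1 and 0.5; 1.2 and 0.4). For the oblique step it uses one interactive variant, in which the weighting factors of two images are supplied by the user; the source does not state which rotation produced panels (k,l).

```python
import numpy as np

def simulate(noise, seed=0):
    """Two basic images, six linear combinations, optional Poisson noise."""
    yy, xx = np.mgrid[0:64, 0:64]

    def blob(cx):
        # Gaussian blob on a constant background (placeholder source image)
        return 5000.0 * np.exp(-((xx - cx) ** 2 + (yy - 32) ** 2) / 200.0) + 500.0

    basic = np.stack([blob(24), blob(40)]).reshape(2, -1)    # shape (2, 4096)
    weights = np.array([[1.0, 0.5], [1.2, 0.4], [0.2, 1.0],
                        [0.6, 0.6], [0.9, 0.1], [0.3, 1.3]])
    X = weights @ basic
    if noise:
        X = np.random.default_rng(seed).poisson(X).astype(float)
    return basic, weights, X

def pca_two_axes(X):
    """PCA without centering or normalization, truncated to two axes."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * s[:2], Vt[:2]          # scores (6, 2), eigen-images (2, P)

def oblique_from_known_weights(scores, eigen_images, known_idx, known_weights):
    """Find T with scores @ T = weights, using the images whose weights are known."""
    T = np.linalg.solve(scores[known_idx], known_weights)
    est_weights = scores @ T                      # scores on the oblique axes
    est_basic = np.linalg.solve(T, eigen_images)  # oblique factorial images
    return est_weights, est_basic

basic, weights, X = simulate(noise=True)
scores, eigen_images = pca_two_axes(X)
est_weights, est_basic = oblique_from_known_weights(
    scores, eigen_images, [0, 1], weights[:2])

print(np.round(scores, 1))        # orthogonal scores: not the weighting factors
print(np.round(est_weights, 2))   # oblique scores: close to the weighting factors
```

The orthogonal scores depend on the arbitrary orientation of the principal axes, whereas the oblique scores should approach the simulated weighting factors, provided the data really are linear combinations of two sources.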


Fig. 5. Scores of the six images (Fig. 4c-h, numbered 1-6) on the first two orthogonal factorial axes (0 and 1) obtained after principal components analysis. These scores (coordinates on orthogonal axes) do not correspond to the weighting factors used for the simulation. Thus, PCA is not a quantitative method in this situation, because the two sources of information (Fig. 4a,b) are partly correlated. After oblique analysis, the two axes (0′ and 1′) are obtained. The scores (coordinates on these two axes) then correspond perfectly to the weighting factors. Thus, oblique analysis is a quantitative method, provided of course the experimental images satisfy the underlying assumption of linear combination.

The first source of information, represented by the factorial axis number 1, opposes Ca and Sr (right of Fig. 3b) to Fe, Ti and Mn (left of Fig. 3b). This corresponds to the spatial localization displayed in the first factorial image in Fig. 3(a) (left). The second source of information, represented by the factorial axis 2, opposes Ca, Fe, Ti, Mn (top of Fig. 3b) to K (bottom of Fig. 3b). This corresponds to the spatial localization displayed in the second factorial image in Fig. 3(a). Altogether, we can anticipate that there are finally three groups of different regions within the analysed area. This will become more evident when other tools are used for analysing this data set (see the following sections).

For introducing the need to go towards oblique analysis, I will first choose a simple example. I would like to consider the set of simulated images displayed in Fig. 4. These images were simulated according to the following protocol: first, two basic images (representing two sources of information) are created (Fig. 4a,b); then these basic images are linearly combined to produce six new images, which are supposed to constitute the experimental data set. Poisson noise was added in order to produce a more realistic data set (Fig. 4c-h). These six images were submitted to principal components analysis (PCA), which means that no normalization was applied to the data set before the variance-covariance analysis. The first two eigen-images obtained are displayed in Fig. 4(i,j), while the scores of the different images on the principal axes 1 and 2 are displayed in Fig. 5 (numbers 1-6 represent the six images 4c-h). One can notice that, although displaying some similarity with the two basic images, the two eigen-images are not strictly equivalent: the eigen-images are still a mixture of the two sources. Thus, the scores on the two principal axes cannot be used as estimates of the weighting factors. One can reformulate this by saying that, in general, orthogonal LMSA is not a quantitative method (see also, for instance, the comment of T. Walker in the discussion of Trebbia, 1996).

The reason for this drawback is that LMSA decomposes the data set into orthogonal components. On the other hand, the real sources of information have very little chance to be orthogonal. Thus, the eigenvectors do not, in general, represent the basic sources of information faithfully and the scores on the principal components do not correctly represent the weighting factors.

If one wants to access the elementary sources of information, one must perform an additional step after the orthogonal decomposition. This step is often called oblique analysis (or factor analysis; Malinowski & Howery, 1980), because it consists of rotating the significant orthogonal axes until they are consistent with the nature of the basic information sources. Contrary to the orthogonal analysis, which is completely assumption free (except for the choice of one of the variants of LMSA), oblique analysis implies that some additional information is injected by the end user. This extra information (see Trebbia & Bonnet, 1990, for a discussion on the role of extra information in image processing) can take many different forms, indicating that many variants of oblique analysis have been suggested, ranging from completely interactive forms to completely automatic variants. In completely interactive variants, the end users must supply the basic components or, equivalently, they must specify the scores of M images onto the M different basic components. Examples of this way of proceeding can be found in Garenström (1986) and Sarkar et al. (1993). In chemometrics, this kind of approach is called partial least squares (PLS) (Geladi & Kowalski, 1986) or principal component regression (PCR) (Esbensen et al., 1992).

For performing completely automatic oblique analyses, some additional knowledge concerning the basic components and the scores of the individuals on them must be assumed. For instance, a frequent assumption is that they both verify a positivity constraint (Di Paola et al., 1982; Benali et al., 1994). With this assumption, several algorithms can be used for performing the rotation of axes automatically (oblimax method, varimax method, etc.).

Returning to the example illustrated in Figs 4 and 5, we can improve the results if we can provide some extra knowledge. For instance, if we know that image 1 is a linear combination of the two basic images with weighting factors 1 and 0.5 and image 2 is a linear combination with weighting factors 1.2 and 0.4, the solution to the problem
