*a. theoretical formulation**b. numerical implementation**c. more descriptions on the analysis method
and products*

*a. theoretical formulation*

The governing equations of the large-scale atmospheric fields are

(1)

(2)

(3)

(4)

with boundary conditions

(5)

and

(6)

where is the wind, s=CpT+gz
is the dry static energy, q is the mixing ratio of water vapor, and ps
is the surface pressure. Large-scale is also referred as grid-scale, defined
as the size of a sounding array comprising of several stations. Prime denotes
unresolvable motions of the observational network. *Q*rad
is the net radiative heating related with clouds. *C* is the condensation
of water vapor, *E* is the evaporation of cloud water and rain water.
Phase changes associated with ice can be included but are neglected for
simplicity. *q*l is the cloud liquid water content.

All variables on the left hand side of (1)-(4) can be calculated from coordinated sounding measurements of winds, temperature and humidity over a network of a few stations. Terms on the right hand side of (1)-(4) constitute the unknowns. They are the subject of parameterizations in a large-scale model. Their validation often requires the diagnostics of the left hand side terms. With no prior knowledge of the unknowns, vertical integration of the above equations yields the following constraints:

(7)

(8)

(9)

(10)

where

These are column-integrated conservations of mass,
water vapor, dry static energy and momentum. *R* is the net downward
radiative flux at the TOA and at the surface (SRF).
is the surface wind stress. *Prec* is precipitation. *SH* is
the sensible heat flux, and *E*s is the surface
evaporation. Horizontal advection of hygrometers and all horizontal eddy
covariance terms have been neglected due to insufficient knowledge. These
omissions are not expected to seriously affect the present study if the
scale of the phenomena to be studied is much larger than the scale of the
observational network.

Terms on the right hand side of (7)-(10) are available from surface and satellite measurements. They are the area-averaged fluxes within the observational network. The strategy of objective analysis in this study is to constrain the atmospheric variables (, s, q) to satisfy (7)-(10) with minimum adjustments to direct sounding measurements. The adjustment is justifiable after consideration of instrument and measurement uncertainties, errors from handling of missing sounding data, and aliasing of small-scale features to large-scale fields in the instantaneous soundings.

The analyzed product, denoted as (u* and v*) s*, q*, is derived by minimizing the following cost function:

(11)

with (7)-(10) as strong constraints, where subscript "o" denotes direct measurements, is the weighting function (discussed later). The integration will be replaced by summation over the stations and on vertical layers. It should be pointed out that alternatively we can also minimize the time integration of the above cost function over the whole measurement period to perform data analysis. Such a choice, however, is not well justified if the data uncertainties remain constant with time; and thus it is not used here.

As always, the analyzed data are typically neither measurements nor true values of the variables. An important aspect of the analysis procedure is, therefore, to justify the magnitude of adjustments made to the direct measurements.

Since the general constraints of (7)-(10) apply to all models, this analysis strategy is not particular to any one model. It should be pointed out that most available assimilated products are derived using (1)-(4) with particular parameterization of the unknowns. The technique of adjoint model is often used to minimize the difference between model-generated variables and observations (e.g., Ghil and Malanotte-Rizzoli, 1991; Zou et al., 1993; Xu, 1996). Another approach is to include a nudge term in (1)-(4) to relax the model variables to observations (Daley, 1991). While these approaches have been very useful in filling observational gaps, the utility of the assimilated data, however, is inherently limited for validating the parameterization of the unknowns of a model in (1)-(4). The current approach differs from these assimilation practices in that it relies entirely on measurements.

*b. numerical implementation*

For *I* stations in the sounding network, each
with *K* layers, we use *x*ik to denote
a state variable at station *i* and layer *k*, and use column
vector *X* to denote this variable (u, v, s, q) at all grids:

(12)

where superscript T means transpose. The cost function of (11) can be written as

(13)

where Q is the weighting matrix related with error
covariances of a variable. The analyzed data* *are subject to the
strong constraints of (7)-(10). They can be written in the following discrete
forms:

(14)

(15)

(16)

(17)

where

and subscript *m *represents average over the
area covered by the *I* stations. Geopotential height can be derived
from the virtual temperature analysis using the hydrostatic balance

(18)

Surface pressure *p*s is
not included in the cost function for adjustment, nor are surface evaporation,
sensible heat flux, surface wind stress and cloud liquid content, although
their inclusion for adjustment is possible.

The variational equations (Euler-Lagrange equations) for the analyzed variables of are:

(19)

where
stands for any one of the variables among .
are the Lagrange multipliers. Each variable has *IxK *grids. With
a total of four variables and five Lagrange multipliers, the total number
of variables to calculate at any given time is *4xIxK+5. *They are
solved from the *4xIxK* equations in (19) and the five equations in
(14)-(17). It is noted here that the Lagrange multipliers are only functions
of time because the constraints are vertically integrated budgets.

We assume measurement errors at different locations and for different variables uncorrelated. The covariance matrix is then diagonal. The diagonal elements are the reciprocal of error variances . Thus, (19) becomes

(20)

or

(21)

The analyzed product is equal to the corresponding measurement plus an adjustment term that is associated with the sensitivity of the column-integrated mass, water vapor, moist static energy, and momentum.

Numerical calculation of (21) and (14)-(17) is carried
out in an iterative mode. The iteration is performed for the whole measurement
period. To simplify the description, we describe the iteration procedure
for a single time level. The iteration for the whole measurement period
can be carried by executing an iteration step to all successive time levels
of measurements and then proceeding to the next iteration step up to the
completion of all steps. The iteration, when described to a single time
level, contains three steps. First, the previous estimate or original measurements
are used to calculate each partial derivative to *x*ik
on the right hand side of (21) using the formulae in (14)-(17). Actual
numerical calculation is simplified because the partial derivative does
not require vertical summation. The total number of partial derivatives
at any given time is 5x4xIxK, with five constraints (Aps,
Aq, As, Au,
Av) taking partial derivatives to four variables ()
at each station and layer.

A general form of constraint in (14)-(17) can be written as

(22)

In an iteration, it can be written as

(23)

where *n* denotes the iteration index. When
(21) is written as

(24)

where are the partial derivatives from the first step, substitution of (24) into (23) yields a linearized set of equations for . More specifically, the substitution gives

. (25)

Because of the linearity of the above operator, it can be further written as

. (26)

This set of five equations for the five constraints (Aps, Aq, As, Au, Av) is used to solve for at any given time. This constitutes the second step in the iteration.

In the third step, the adjustments are calculated by using the newly obtained in (24). After that, the next iteration is performed. In our calculation, good convergence of iteration occurs within 20 cycles of iterations for the whole measurement period, which is judged by column-integrated balance of mass, moisture, heat and momentum to the order of better than 0.1 with units Pa/day, W/m2 and N/m2 in (14)-(17) at all time levels.

Note that time derivatives appear in (14)-(17); therefore, more than one measurement time is involved each time. The time derivatives in (14)-(17) are calculated using a central difference scheme using values from the previous iteration cycle. That is why the actual iteration needs to be done for the whole measurement period. It is also noted that we do not adjust the first and last time levels of measurements because of the central time differencing, and budgets are not derived for these two time levels. It is possible to include them in the analysis if forward and backward time difference schemes are used for these two end time levels.

We also note that there might be other algorithms available to obtained the minimization solution of (13) with constraints of (14)-(17), such as using a maximum gradient descent approach. The adjoint method cannot be easily applied here because (14)-(17) are not a model that can be integrated in time. The iteration procedure described above only involves solving linear equations and seems to be quite efficient for our purpose. Nevertheless, we welcome the readers to suggest other more efficient algorithms.

The divergence terms are calculated by first assuming
that field vectors vary linearly along the sides of the outmost domain
formed by connecting the stations, and then taking the convergent fluxes
perpendicular to the sides divided by the area of the domain. This represents
an area-averaged divergence. Gradient is calculated by first taking a least
squares fit of a multi-linear field to the data, and then calculating the
derivatives. Detailed procedures are the same as the flux method described
in Davies-Jones (1993). As pointed out in Davis-Jones (1993), this procedure,
when used for a triangle defined by three stations, is equivalent to several
methods of divergence and gradient calculation including Bellamy's graphical
method (Bellamy, 1949) and spatial linear fitting. It should be pointed
out that even though the analysis is independent of any particular model,
there is the possibility that the analysis depends on the numerical approximations.
A sensitivity study is performed by using various existing numerical schemes
(Davies-Jones, 1993) to calculate (14)-(17). It is found that these numerical
approximations do not affect the analysis to a degree to be concerned.

*c. more descriptions on the analysis method
and products*

**(More detailed descriptions and discussions can
be found in Zhang and Lin (JAS, 1997)
and Zhang et al. (MWR, 2001))**