Total probability of the wrong decision in a transport junction

Russian Journal of Logistics and Transport Management, Vol. 2, No. 1, 2015
© Wieslaw Pasewicz
West Pomeranian University of Technology
The problem of selecting the best decision from a set of possible ones is critical for managing complex objects such as transport junctions. The discrimination model used in operating transport junctions can often lead to a wrong decision, so finding the probability of such a mistake is important. In multivariate discriminant analysis, considerable attention has been paid to the quadratic discriminant function (QDF) and the linear discriminant function (LDF). In general, the distribution of the QDF is not known even when its parameters are known, so exact expressions for the misclassification probabilities are not available. In the case of the LDF, the distribution is well known when its parameters are known. When the parameters of the LDF are unknown, however, training samples are required to estimate them. The probabilities of wrong decisions (misclassifications) can then be expressed as multiple integrals, which are difficult for a practitioner to use. In the author's previous article (Pasewicz, 2007), the probabilities of wrong decisions were found for the case of several univariate normal populations. The current paper considers a practical approximation method for the multiple integrals associated with the LDF for several normal populations, using Rao's method of scoring.
Keywords: discriminant function, wrong decision, maximum score, Mahalanobis distance, total probability.
1 Introduction
Suppose that $X$ is a $p$-variate normal vector with mean $\mu_i$ and covariance matrix $\Sigma_i$ from one of $k$ populations (groups) $\pi_i$, i.e. $X \sim N_p(\mu_i, \Sigma_i)$, $i = 1, \ldots, k$. Let $q_i$ be the prior probability that an observation $x$ belongs to $\pi_i$, and let $C(i \mid j)$ be the cost of a wrong decision (misclassification) of $x$ into $\pi_i$ when, in fact, it belongs to $\pi_j$, $i, j = 1, \ldots, k$, $i \neq j$. The optimal decision rule in terms of minimizing the Bayes risk (Lachenbruch and Goldstein, 1979) is to classify $x$ into $\pi_i$ or $\pi_j$ according as:
$$U(x) = (x - \mu_j)' \Sigma_j^{-1} (x - \mu_j) - (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) > c, \text{ or } (< c), \quad (1)$$

for $i, j = 1, \ldots, k$, and $i \neq j$, where:

$$c = \ln\frac{|\Sigma_i|}{|\Sigma_j|} + 2 \ln\frac{q_j\, C(i \mid j)}{q_i\, C(j \mid i)}. \quad (2)$$
It is a matter of indifference what action is taken when equality holds in (1). The expression $U(x)$ is called the quadratic discriminant function (QDF), and the distribution of the random variable $U(x)$ is not known. Approximate expressions for the QDF have been obtained under certain restrictions on the covariance matrices of the two populations. For example, Aiuppa and Bargmann (1977) approximated the distribution of $U(x)$ for $i, j = 1, 2$, $i \neq j$, by a Pearson curve, using the first four moments. Bayne et al. (1984) obtained approximations for the QDF when the covariance matrices of two bivariate normal populations were the identity matrix and a diagonal matrix. In the case when the covariance matrices are equal, i.e. $\Sigma_i = \Sigma$ ($i = 1, \ldots, k$), the optimal decision rule (1) is to classify $x$ into $\pi_i$ or $\pi_j$ according as:
$$W(x) = \left[x - \tfrac{1}{2}(\mu_i + \mu_j)\right]' \Sigma^{-1} (\mu_i - \mu_j) > c \text{ or } (< c), \quad (3)$$

$i, j = 1, \ldots, k$, and $i \neq j$.
The function $W(x)$ is known as the linear discriminant function (LDF) (Fisher, 1936). In what follows, we will assume throughout that the prior probabilities $q_i$ are equal and that the costs of misclassification are equal. Then we can use an equivalent method (Rao, 1965) of classifying an observation into one of these populations. The method consists of assigning the scores:
$$L_i(x) = \mu_i' \Sigma^{-1} x - \tfrac{1}{2} \mu_i' \Sigma^{-1} \mu_i, \quad i = 1, \ldots, k \quad (4)$$

to this observation and assigning the observation to the population with the maximum score. Until now, we have assumed that the mean vectors $\mu_i$ and the common covariance matrix $\Sigma$ are known. In practice, however, not all (if any) of the parameters are known. We need training samples of sizes $N_i$ ($i = 1, \ldots, k$) from these populations to estimate $\mu_i$ and $\Sigma$ for use in the scores (4). Although the use of $L_i$ minimizes the probability of a wrong decision, the use of an estimator $\hat{L}_i$ of $L_i$ cannot be justified similarly. However, as Anderson (1958, p. 137) states, 'it seems intuitively reasonable that $\hat{L}_i$ should give good results'. As Streit (1977) shows, arguments of a decision-theoretic nature may be given to justify the use of $\hat{L}_i$. Several authors, e.g. John (1973), Lachenbruch and Mickey (1968), have investigated the probability of wrong decisions by the linear discriminant function in the case of two normal populations. Denote by $P_i$ ($i = 1, \ldots, k$) the probability of a wrong decision when the observed vector $x$ actually belongs to $\pi_i$. Then, the total probability of a wrong decision (TP) is:
$$TP = \frac{1}{k} \sum_{i=1}^{k} P_i. \quad (5)$$
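To make the rule concrete, here is a minimal Python sketch of the scoring method (4): each population receives a linear score, and the observation is assigned to the population with the maximum score. The helper names and the means and covariance matrix below are ours, for illustration only; they are not data from the paper.

```python
# A minimal sketch of Rao's scoring rule (4) with known parameters:
# L_i(x) = mu_i' Sigma^{-1} x - 0.5 * mu_i' Sigma^{-1} mu_i,
# and x is assigned to the population with the maximum score.
import numpy as np

def rao_scores(x, means, sigma):
    """Scores L_i(x) from (4) for each of the k populations."""
    sigma_inv = np.linalg.inv(sigma)
    return np.array([m @ sigma_inv @ x - 0.5 * m @ sigma_inv @ m
                     for m in means])

def classify(x, means, sigma):
    """Assign x to the population with the maximum score."""
    return int(np.argmax(rao_scores(x, means, sigma)))

# Illustrative two-dimensional example with k = 3 populations.
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([4.0, 0.0])]
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(classify(np.array([1.9, 0.8]), means, sigma))  # -> 1
```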
2 Multivariate normal populations
We first calculate the probability of a wrong decision (misclassification) $T_i$, using Rao's method of scoring, when the population parameters $\mu_i$ and $\Sigma$ are known. The probability of a wrong decision $T_i$ when $x$ belongs to $\pi_i$ is:
$$T_i = 1 - \text{probability of a correct decision} = 1 - \Pr\{L_i(x) > L_j(x);\ j = 1, \ldots, k;\ j \neq i\}. \quad (6)$$
We can express it as a multiple integral:

$$T_i = 1 - \int_0^\infty \cdots \int_0^\infty g_i \, dy_{i1} \cdots dy_{i,i-1}\, dy_{i,i+1} \cdots dy_{ik}, \quad i = 1, \ldots, k, \quad (7)$$
where $g_i$ is the density function of a $(k-1)$-dimensional multivariate normal distribution. Schervish (1981) applies the method of asymptotic expansions to this integral, but the result appears to be difficult for a practitioner to use. We will approximate the probability of a wrong decision $T_i$ by ignoring the correlations between the variables $y_{ij} = L_i(x) - L_j(x)$. Then, using the distribution of $L_i(x) - L_j(x)$, we obtain:
$$\Pr[L_i(x) < L_j(x)] = \Pr[L_i(x) - L_j(x) < 0] = \Phi\left(-\tfrac{1}{2}\Delta_{ij}\right), \quad (8)$$

where $\Phi$ denotes the cumulative distribution function of an $N(0,1)$ variable and $\Delta_{ij}$ is the Mahalanobis distance between the two normal populations, given by:

$$\Delta_{ij}^2 = (\mu_i - \mu_j)' \Sigma^{-1} (\mu_i - \mu_j). \quad (9)$$
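As a computational illustration of (8) and (9), the following minimal Python sketch computes the Mahalanobis distance between two populations and the resulting pairwise probability of a wrong decision. The parameters are illustrative values of our own choosing.

```python
# A minimal sketch of (8)-(9): the pairwise misclassification probability
# between two normal populations is Phi(-Delta_ij / 2), where Delta_ij is
# the Mahalanobis distance between their means.
import numpy as np
from scipy.stats import norm

def mahalanobis(mu_i, mu_j, sigma):
    """Mahalanobis distance Delta_ij from (9)."""
    d = mu_i - mu_j
    return float(np.sqrt(d @ np.linalg.inv(sigma) @ d))

mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 1.0])
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
delta = mahalanobis(mu_i, mu_j, sigma)
print(norm.cdf(-0.5 * delta))  # Pr[L_i(x) < L_j(x)] when x is from pi_i
```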
Using Kimball's inequality (Kimball, 1951), we can write:

$$T_i \leq 1 - \prod_{j=1,\, j \neq i}^{k} \left[1 - \Phi\left(-\tfrac{1}{2}\Delta_{ij}\right)\right], \quad i = 1, \ldots, k. \quad (10)$$
As Hochberg and Tamhane (1987, p. 63) state, this approximation is at least as good as the Bonferroni bound used for similar results in multiple comparisons in the analysis of variance.
The total probability of a wrong decision is then approximately:

$$TP \approx \frac{1}{k} \sum_{i=1}^{k} T_i. \quad (11)$$
If we now replace every $\Delta_{ij}$ in (10) by:

$$\Delta_{\min} = \min_{i \neq j}(\Delta_{ij}), \quad (12)$$

the total probability of a wrong decision may be evaluated as follows:

$$TP \leq \frac{1}{k} \sum_{i=1}^{k} \left\{1 - \left[1 - \Phi\left(-\tfrac{1}{2}\Delta_{\min}\right)\right]^{k-1}\right\} \leq \frac{1}{k} \sum_{i=1}^{k} \left\{1 - \left[1 - (k-1)\,\Phi\left(-\tfrac{1}{2}\Delta_{\min}\right)\right]\right\}$$

and, finally:

$$TP \approx (k-1)\,\Phi\left(-\tfrac{1}{2}\Delta_{\min}\right). \quad (13)$$
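A minimal Python sketch of (12) and (13), with the same illustrative parameters as above: all pairwise Mahalanobis distances are computed, and the smallest one is plugged into $(k-1)\Phi(-\Delta_{\min}/2)$.

```python
# A minimal sketch of (12)-(13): take the smallest pairwise Mahalanobis
# distance and approximate the total probability of a wrong decision by
# (k - 1) * Phi(-Delta_min / 2).
import itertools
import numpy as np
from scipy.stats import norm

def total_probability(means, sigma):
    sigma_inv = np.linalg.inv(sigma)
    deltas = [np.sqrt((mi - mj) @ sigma_inv @ (mi - mj))
              for mi, mj in itertools.combinations(means, 2)]
    k = len(means)
    return (k - 1) * norm.cdf(-0.5 * min(deltas))

means = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([4.0, 0.0])]
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(total_probability(means, sigma))
```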
We now propose that the unknown parameters in the final expression (13) be replaced by the corresponding sample estimates. Let us assume that $X_{i1}, \ldots, X_{iN_i}$ ($i = 1, \ldots, k$) are mutually independent random training samples, drawn from the $p$-dimensional normal populations $N_p(\mu_i, \Sigma)$. Denote by:
$$\bar{X}_i = \frac{1}{N_i} \sum_{u=1}^{N_i} X_{iu}, \quad i = 1, \ldots, k, \quad (14)$$

the corresponding sample means, and estimate $\Sigma$ using the pooled covariance matrix:

$$S = \frac{1}{N-k} \sum_{i=1}^{k} (N_i - 1)\, S_i, \quad (15)$$

where:

$$S_i = \frac{1}{N_i - 1} \sum_{u=1}^{N_i} (X_{iu} - \bar{X}_i)(X_{iu} - \bar{X}_i)' \quad (16)$$

is the sample covariance matrix corresponding to the $i$-th sample, for $i = 1, \ldots, k$, and $N = \sum_{i=1}^{k} N_i$, with $N - k - p - 1 > 0$. We substitute the estimates $\bar{X}_i$, $S$ for the parameters $\mu_i$, $\Sigma$, respectively, in (4) and obtain:
$$\hat{L}_i(x) = \bar{X}_i' S^{-1} x - \tfrac{1}{2} \bar{X}_i' S^{-1} \bar{X}_i. \quad (17)$$
The observation $x$ is then assigned to $\pi_i$ if:

$$\hat{L}_i(x) = \max\left[\hat{L}_1(x), \ldots, \hat{L}_k(x)\right]. \quad (18)$$
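A minimal Python sketch of the plug-in rule (14)-(18) follows; the training samples are simulated here purely for illustration, and the helper names are ours.

```python
# A minimal sketch of (14)-(18): sample means and the pooled covariance
# matrix replace the unknown parameters in the scores.
import numpy as np

def pooled_estimates(samples):
    """Sample means (14) and pooled covariance matrix S from (15)-(16)."""
    means = [s.mean(axis=0) for s in samples]
    N, k = sum(len(s) for s in samples), len(samples)
    S = sum((len(s) - 1) * np.cov(s, rowvar=False) for s in samples) / (N - k)
    return means, S

def classify_plugin(x, means, S):
    """Assign x by the maximum plug-in score L_hat_i(x) from (17)-(18)."""
    S_inv = np.linalg.inv(S)
    scores = [m @ S_inv @ x - 0.5 * m @ S_inv @ m for m in means]
    return int(np.argmax(scores))

# Simulated training samples: k = 3 bivariate normal populations.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=mu, scale=1.0, size=(30, 2))
           for mu in ([0.0, 0.0], [2.0, 1.0], [4.0, 0.0])]
means, S = pooled_estimates(samples)
print(classify_plugin(np.array([1.9, 0.8]), means, S))
```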
Let $\bar{N} = N/k$ be the average sample size, and denote $A = (N-k)S$. We will show that $\hat{\Delta}_{ij}^2$ of the form:

$$\hat{\Delta}_{ij}^2 = \frac{N-k-p-1}{N-k}\, (\bar{X}_i - \bar{X}_j)' S^{-1} (\bar{X}_i - \bar{X}_j) - \frac{2pk}{N} \quad (19)$$

is an unbiased estimate of $\Delta_{ij}^2$ given by (9). Indeed, according to Kshirsagar (1972, p. 72):

$$E(A^{-1}) = (N-k-p-1)^{-1}\, \Sigma^{-1}, \quad (20)$$

where $A$ has a Wishart distribution, i.e. $A \sim W_p(N-k, \Sigma)$. Since $N - k - p - 1 > 0$, we have:

$$E(\hat{\Delta}_{ij}^2) = (N-k-p-1)\, E\left\{E_{A^{-1}}\left[(\bar{X}_i - \bar{X}_j)' A^{-1} (\bar{X}_i - \bar{X}_j) \mid \bar{X}_i, \bar{X}_j\right]\right\} - \frac{2pk}{N}. \quad (21)$$

It is known that (Anderson, 1958, p. 56):

$$\bar{X}_i - \bar{X}_j \sim N_p\left(\mu_i - \mu_j,\ \frac{2k}{N}\,\Sigma\right), \quad (22)$$

so that:

$$\frac{N}{2k}\, (\bar{X}_i - \bar{X}_j)' \Sigma^{-1} (\bar{X}_i - \bar{X}_j) \sim \chi_p^2(\lambda), \quad (23)$$

where $\chi_p^2(\lambda)$ denotes the noncentral $\chi^2$ distribution with $p$ degrees of freedom and:

$$\lambda = \frac{N}{2k}\, (\mu_i - \mu_j)' \Sigma^{-1} (\mu_i - \mu_j) \quad (24)$$

is the noncentrality parameter. Therefore, according to Enis and Geisser (1970):

$$E(\hat{\Delta}_{ij}^2) = E\left[(\bar{X}_i - \bar{X}_j)' \Sigma^{-1} (\bar{X}_i - \bar{X}_j)\right] - \frac{2pk}{N} = \frac{2k}{N}(\lambda + p) - \frac{2pk}{N} = \Delta_{ij}^2. \quad (25)$$

Thus, in practice, we can use an estimator of the total probability of a wrong decision (misclassification) TP given by (11) of the following form:

$$\widehat{TP} \approx (k-1)\, \Phi\left(-\tfrac{1}{2}\hat{\Delta}_{\min}\right),$$

where $\hat{\Delta}_{\min} = \min_{i \neq j}(\hat{\Delta}_{ij})$ and $\hat{\Delta}_{ij}^2$ is defined by (19).
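Putting the pieces together, the following minimal Python sketch estimates TP from training data: it computes the bias-corrected distances (19) for every pair of samples and evaluates the final expression, assuming (as in the derivation above) approximately equal sample sizes. The guard against a negative estimate is our own safeguard, not part of the paper's formula.

```python
# A minimal sketch of (19) and the final estimator: bias-corrected squared
# Mahalanobis distances for every pair of samples, then
# TP_hat ~ (k - 1) * Phi(-Delta_hat_min / 2).
import itertools
import numpy as np
from scipy.stats import norm

def tp_estimate(samples):
    means = [s.mean(axis=0) for s in samples]
    k, p = len(samples), samples[0].shape[1]
    N = sum(len(s) for s in samples)
    S = sum((len(s) - 1) * np.cov(s, rowvar=False) for s in samples) / (N - k)
    S_inv = np.linalg.inv(S)
    # Unbiased estimate (19) of Delta_ij^2 for every pair (i, j).
    d2 = [(N - k - p - 1) / (N - k) * ((mi - mj) @ S_inv @ (mi - mj))
          - 2.0 * p * k / N
          for mi, mj in itertools.combinations(means, 2)]
    d_min = np.sqrt(max(min(d2), 0.0))  # guard: (19) can come out negative
    return (k - 1) * norm.cdf(-0.5 * d_min)

rng = np.random.default_rng(0)
samples = [rng.normal(loc=mu, scale=1.0, size=(30, 2))
           for mu in ([0.0, 0.0], [2.0, 1.0], [4.0, 0.0])]
print(tp_estimate(samples))
```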
References

Aiuppa, T. A. & Bargmann, R. E. (1977). Distribution of a linear combination of quadratic forms, a four-moment approximation. In Proceedings of the Business and Economics Statistics Section, American Statistical Association, 706-710.
Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, New York, USA.
Bayne, C. K., Beauchamp, J. J. & Kane, V. E. (1984). Misclassification probabilities for second-order discriminant functions used to classify normal bivariate populations. Communications in Statistics - Simulation and Computation, 13(5), 669-682.
Enis, P. & Geisser, S. (1970). Sample discriminants that minimize posterior squared error loss. South African Statistical Journal, 4, 85-93.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.
Hochberg, Y. & Tamhane, A. C. (1987). Multiple Comparison Procedures. John Wiley, New York, USA.
John, S. (1973). On inferring the probability of misclassification by the linear discriminant function. Annals of the Institute of Statistical Mathematics, 25(1), 363-372.
Kimball, A. W. (1951). On dependent tests of significance in the analysis of variance. The Annals of Mathematical Statistics, 600-602.
Kshirsagar, A. M. (1972). Multivariate Analysis. M. Dekker, New York, USA.
Lachenbruch, P. A. & Goldstein, M. (1979). Discriminant analysis. Biometrics, 69-85.
Lachenbruch, P. A. & Mickey, M. R. (1968). Estimation of error rates in discriminant analysis. Technometrics, 10(1), 1-11.
Pasewicz, W. (2007). Total Chance of Misclassification for Several Univariate Normal Populations in Transport Junction. St. Petersburg: International Transport Academy, Russia.
Rao, C. R. (1965). Linear Statistical Inference and Its Applications. John Wiley, New York, USA.
Schervish, M. J. (1981). Asymptotic expansions for the means and variances of error rates. Biometrika, 68(1), 295-299.
Streit, F. (1977). Identification rules based on partial information on the parameters. Recent Developments in Statistics, Proceedings of the European Meeting of Statisticians, 797-806.
