Topic: Science > Physics
User: "OsherD"
Date: 23 Jun 2005 06:34:41 PM
Object: Rotation vs Expansion-Contraction 4
From Osher Doctorow
Why would a "polynomial coefficient lower than second degree" condition
hold for Probable Influence (PI) Maximum Entropy probability
distributions in their densities or distribution functions via the
Riccati Differential equation?
Let's look at the Riccati Differential equation:
1) dy/dt = A(t) + B(t)y + C(t)y^2
where t is ordinarily time but can stand for any variable quantity x
(in other words, x = t). The equation can also be generalized from
y = f(t), for some function f of t, to y = f(u, v) for two or more
variables u, v, etc. In that case dy/dt becomes a partial derivative
with respect to t, or else a total derivative if all of u, v, etc., are
functions of the single variable t (in fact, both generalizations can
be made at once).
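To make equation (1) concrete, here is a minimal numerical sketch. The helper names (riccati_rhs, rk4_solve) and the particular coefficients A(t) = 1, B(t) = 0, C(t) = 1 are illustrative assumptions, not taken from the post; they were chosen only because the resulting equation dy/dt = 1 + y^2 with y(0) = 0 has the known solution y = tan(t), which gives something to check against.

```python
import math

def riccati_rhs(t, y, A, B, C):
    """Right-hand side of dy/dt = A(t) + B(t)*y + C(t)*y^2."""
    return A(t) + B(t) * y + C(t) * y * y

def rk4_solve(A, B, C, y0, t0, t1, steps=1000):
    """Integrate the Riccati equation from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = riccati_rhs(t, y, A, B, C)
        k2 = riccati_rhs(t + h / 2, y + h * k1 / 2, A, B, C)
        k3 = riccati_rhs(t + h / 2, y + h * k2 / 2, A, B, C)
        k4 = riccati_rhs(t + h, y + h * k3, A, B, C)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# Assumed coefficients (all polynomials of degree 0, so within "degree <= 1"):
# A(t) = 1, B(t) = 0, C(t) = 1, giving dy/dt = 1 + y^2 with solution tan(t).
y_end = rk4_solve(lambda t: 1.0, lambda t: 0.0, lambda t: 1.0,
                  y0=0.0, t0=0.0, t1=1.0)
print(y_end, math.tan(1.0))
```

Any other coefficient functions of t can be passed in the same way; the constant ones here are just the simplest case with a closed-form answer.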
Recall that in the Einstein Field Equation's proof (well, one of them
anyway via Steven Weinberg (1972)), derivatives of order greater than 2
are excluded by "convention". Since (1) above already represents
dy/dt, it makes "analogous" sense to exclude coefficients A(t), B(t),
C(t) which are polynomials of degree greater than 1. Of course, order
of a differential equation is very different in general from degree of
a polynomial, but notice that for a degree 3 polynomial A(t):
2) A(t) = at^3 + bt^2 + ct + d
the first and second and third derivatives all exist and are in general
nonzero with a, b, c, d constants and t continuous on an open interval,
while for a degree 1 polynomial A(t) we have:
3) A(t) = kt + h
4) dA(t)/dt = k not equal 0 in general
5) A''(t) = 0 for all t in the open interval
So for polynomial functions there is a rather immediate relationship
between excluding higher than second order derivatives and excluding
higher than first degree polynomials, although it ultimately is
somewhat intuitive from the fact that:
6) d(x^n)/dx = nx^(n-1)
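The relationship between polynomial degree and vanishing derivatives can be checked mechanically. The sketch below represents a polynomial as a list of coefficients in ascending powers (a representation chosen here for illustration) and applies the power rule repeatedly. The numeric values of a, b, c, d, k, h are arbitrary placeholders.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (ascending powers)."""
    result = [n * c for n, c in enumerate(coeffs)][1:]
    return result or [0]  # the derivative of a constant is the zero polynomial

def nth_derivative(coeffs, n):
    """Apply the power rule n times."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs

# Degree 3: A(t) = a t^3 + b t^2 + c t + d with placeholder values a=1, b=2, c=3, d=4
cubic = [4, 3, 2, 1]
# Degree 1: A(t) = k t + h with placeholder values k=7, h=5
linear = [5, 7]

print(nth_derivative(cubic, 3))   # third derivative of the cubic: the constant 6a
print(nth_derivative(linear, 1))  # first derivative of the linear polynomial: k
print(nth_derivative(linear, 2))  # second derivative of the linear polynomial: zero
```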
Osher Doctorow

User: "OsherD"
Title: Re: Rotation vs Expansion-Contraction 4
Date: 23 Jun 2005 07:15:58 PM

From Osher Doctorow
In general, I think that we are heading toward a different idea of
"complexity" or perhaps more accurately "complication" in mathematics
and physics to which Einstein was intuitively and explicitly opposed
with his emphasis on simplicity.
The probability distributions (curves or graphs roughly speaking for at
least the univariate cases) that are selected as Maximum Entropy by
either the Shannon or PI (Probable Influence) conditions turn out to
have remarkable Simplicity.
For Shannon entropy, the Maximum Entropy results for the continuous
case are as follows:
A. For 2 unknown parameters out of 2, the uniform distribution is
MaxEnt (short here for Maximum Entropy). The uniform distribution on
the interval (a, b) has two constant endpoints (which change for
different members of the uniform distribution family), its density is
fX(x) = 1/(b - a) = constant, and its mean or expectation
E(X) = (a + b)/2 and its variance both depend on a and b, so the
uniform distribution can be considered a 2-parameter distribution.
B. For 1 unknown parameter out of 1, the exponential distribution,
which depends on one parameter, turns out to be MaxEnt.
C. For 0 unknown parameters out of 2 (both known), the normal/Gaussian
distribution, which depends on two parameters, is MaxEnt.
This is what I was essentially saying in the post that Chris Hillman
thought was hysterically funny (that is to say I was saying A, B, and C
above). Since Claude Shannon was a communications engineer, I suppose
that he wouldn't exactly appreciate that.
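As a rough numerical check of case C above, the sketch below uses the standard closed-form differential entropies (in nats) of the three distributions and compares them when all three are scaled to the same variance. The function names and the choice of unit variance are assumptions made for illustration. It only illustrates case C (both mean and variance fixed); it says nothing about the ranking under other constraints.

```python
import math

def uniform_entropy(a, b):
    """Differential entropy of the uniform distribution on (a, b)."""
    return math.log(b - a)

def exponential_entropy(rate):
    """Differential entropy of the exponential distribution with the given rate."""
    return 1.0 - math.log(rate)

def normal_entropy(sigma):
    """Differential entropy of the normal distribution with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# Scale each distribution to variance 1 (an assumption for this comparison):
# uniform: (b - a)^2 / 12 = 1, exponential: 1 / rate^2 = 1, normal: sigma = 1.
width = math.sqrt(12.0)
h_uniform = uniform_entropy(0.0, width)
h_exponential = exponential_entropy(1.0)
h_normal = normal_entropy(1.0)

print(h_uniform, h_exponential, h_normal)
```

With the variance held fixed, the normal distribution comes out with the largest of the three entropies, which is what case C asserts.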
Osher Doctorow

User: "OsherD"
Title: Re: Rotation vs Expansion-Contraction 4
Date: 23 Jun 2005 07:49:43 PM

From Osher Doctorow
For Probable Influence (PI), there are at present about 3 different
ways of obtaining results regarding Maximum Entropy, and the 3 ways
give more or less identical results. They are:
A. The quantity E(X-->Y)(x,y) of Probable Influence (PI) replaces
Shannon Entropy, where we have:
1) E(X-->Y)(x,y) = I[y fX-->Y(x,y)]dy or I[y FX-->Y(x,y)]dy (where
I[...]dy denotes the Lebesgue integral)
2) fX-->Y(x,y) = 1 + f(x,y) - fX(x)
3) FX-->Y(x,y) = 1 + F(x,y) - FX(x) = P(X-->Y)(x,y)
The number 1 in (2) and (3) results in infinite or in undefined
integrals on the whole real line, but for finite interval distributions
everything is finite. Therefore, PI MaxEnt gives "highest" MaxEnt to
finite interval distributions like the uniform or beta distributions or
the finitely truncated or censored distributions which are now so
popular. This would correspond to the uniform distribution being
Shannon MaxEnt for continuous 2-parameter distributions. Note that for
the Shannon scenario the respective A, B, C cases of the previous
posting are for 2 unknown out of 2, 1 unknown out of 1, and 0 unknown
out of 2 parameters, which I abbreviate as "2 or less unknown out of 2
possible parameters" for the continuous cases.
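To see why a finite interval keeps everything finite, here is a small sketch of the density form of (1) using formula (2). It assumes, purely for illustration, that X and Y are independent and uniform on (0, T), so that f(x,y) = 1/T^2 and fX(x) = 1/T, and that the integral over y runs over (0, T). The function name and the midpoint-rule integration are likewise assumptions of this sketch, not part of the post.

```python
def pi_entropy_uniform(T, n=1000):
    """Midpoint-rule estimate of I[y * (1 + f(x,y) - fX(x))]dy over (0, T),
    assuming X and Y are independent and uniform on (0, T)."""
    joint_density = 1.0 / (T * T)   # f(x, y), assumed constant
    marginal_density = 1.0 / T      # fX(x), assumed constant
    weight = 1.0 + joint_density - marginal_density  # formula (2)
    dy = T / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy          # midpoint of the i-th subinterval
        total += y * weight * dy
    return total

value_T1 = pi_entropy_uniform(1.0)  # closed form: (T^2 / 2) * (1 + 1/T^2 - 1/T)
value_T2 = pi_entropy_uniform(2.0)
print(value_T1, value_T2)
```

Under these assumptions the result is finite for every finite T, and the contribution of the constant 1 alone is T^2/2, which grows without bound as T increases.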
The next highest MaxEnt PI distributions have to be assigned to the
infinite or else to the undefined (e.g., infinity - infinity, infinity
times 0, etc.) scenarios of E(X-->Y). Since distributions defined on
the whole nonnegative real line but 0 elsewhere have +infinity as their
E(X-->Y), and in general contribute nonnegative integrals by
monotonicity of Lebesgue integration in Real Analysis, these
distributions are ranked second in MaxEnt to the finite interval
distributions. These nonnegative real line distributions include the
gamma distributions (which contain the Shannon MaxEnt
one-unknown-parameter exponential distribution as well as the
chi-square and other distributions), the F distribution so important in
ANOVA (analysis of variance) in experimental research and in
regression, the Pareto/Power distribution, etc.
Finally, what about symmetric distributions defined and nonzero on the
whole real line, like the normal/Gaussian and Student's t and Cauchy
distributions? The normal/Gaussian distribution is mainly useful as a
large-sample approximation to finite mean-variance distributions and
as an error-minimizing distribution. The large-sample approximation goes
against the grain of PI, which is especially useful for Rare Events.
Error-minimization is also "against the grain" of PI to some extent
because PI is especially related to engineering reliability via the
survival functions and reliabilities P(X > x) and P(X > x, Y > y) which
in turn relate especially to nonnegative real line distributions since
for example survival time is nonnegative (although the normal/Gaussian
distribution has filtered through from its habitual use elsewhere).
I'll try to continue this shortly.
Osher Doctorow

User: "OsherD"
Title: Re: Rotation vs Expansion-Contraction 4
Date: 23 Jun 2005 08:03:34 PM

From Osher Doctorow
What about the actual value of E(X-->Y)(x,y) on the real line
(-infinity, +infinity)? The difficulty is the I(y)dy term, which is
the integral of y from -infinity to +infinity. This would be y^2/2
"evaluated at infinity" minus y^2/2 "evaluated at -infinity", or
strictly speaking infinity - infinity. This is not merely +infinity
but undefined, period. The other terms are in general finite, which I
won't take the time to prove here.
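The "infinity - infinity" problem can be illustrated with the antiderivative y^2/2 directly. The sketch below (function name and cutoff values are illustrative choices) evaluates the integral of y over finite intervals and shows that the answer depends entirely on how the two endpoints are sent to infinity, whereas on the nonnegative half-line it simply grows.

```python
def integral_of_y(lower, upper):
    """Exact integral of y dy from lower to upper, using the antiderivative y^2/2."""
    return (upper ** 2 - lower ** 2) / 2.0

for T in (10.0, 100.0, 1000.0):
    symmetric = integral_of_y(-T, T)        # endpoints grow at the same rate
    lopsided = integral_of_y(-T, 2.0 * T)   # upper endpoint grows twice as fast
    half_line = integral_of_y(0.0, T)       # nonnegative half-line only
    print(T, symmetric, lopsided, half_line)
```

The symmetric cutoff always gives 0 while the lopsided cutoff gives 1.5 T^2, so no single value can be assigned on the whole real line; the half-line value T^2/2 just increases toward +infinity.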
So the symmetric distributions including the normal/Gaussian, Student's
t, and Cauchy distributions on the whole real line are lowest in PI
MaxEnt, corresponding to their being lowest in the Shannon ranking from
uniform to exponential to normal/Gaussian distributions. Notice that
the exponential is "in between", just as the gamma and F are "in
between" in PI MaxEnt where the gamma includes the exponential and
chi-square distributions and others.
B. The "not higher than 1st degree polynomial coefficients" in the
Riccati Differential equation rule discussed previously yields
essentially the same ordering of continuous MaxEnt distributions under
Shannon and PI MaxEnt. For example, the uniform distribution pdf fX(x)
is a constant polynomial; the exponential distribution pdf fX(x) is a
very simple exponential function of type k exp(-kx); and the
normal/Gaussian distribution pdf fX(x) is a more complicated
exponential function of type k exp(-cx^2) or k exp(-c(x - EX)^2), with
k and c constants and EX the population mean or expectation of X.
Also, x^2 obscures the discrimination of negative
and positive values of x (it lumps them together, which is fine for
minimizing errors that can be either positive or negative but not for
other applications).
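One possible reading of how these densities connect to the Riccati equation (1) is that each satisfies dy/dx = B(x) y, that is, (1) with A = 0 and C = 0: B(x) = -k (degree 0) for the exponential type and B(x) = -2cx (degree 1) for the Gaussian type. This interpretation is an assumption of the sketch, not something stated explicitly in the post. The code checks it by finite differences, with placeholder constants k and c and an arbitrary test point.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

k, c = 2.0, 0.5  # placeholder constants, chosen only for illustration

def exponential_type(x):
    return k * math.exp(-k * x)

def gaussian_type(x):
    return k * math.exp(-c * x ** 2)

x = 0.7  # arbitrary test point

# Residual of dy/dx - B(x) * y, with B(x) = -k (a degree-0 polynomial)
residual_exponential = (numerical_derivative(exponential_type, x)
                        - (-k) * exponential_type(x))
# Residual of dy/dx - B(x) * y, with B(x) = -2 c x (a degree-1 polynomial)
residual_gaussian = (numerical_derivative(gaussian_type, x)
                     - (-2 * c * x) * gaussian_type(x))

print(residual_exponential, residual_gaussian)
```

Both residuals come out at the level of finite-difference error, consistent with the coefficient B(x) being of degree at most 1 in each case.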
Osher Doctorow