## 275 - Positive polynomials

When studying local extreme points of functions of several (real) variables, the textbook asks in one of the exercises to consider the polynomial

$P(x,y)=x^2+3xy+3y^2-6x+3y-6.$

Here we have $P_x=2x+3y-6$ and $P_y=3x+6y+3$, so the only critical point of $P$ is $(15,-8).$ Since $P_{xx}=2>0$ and the determinant of the Hessian of $P$ is $P_{xx}P_{yy}-P_{xy}^2=2\times 6-3^2=3>0$, it follows that $(15,-8)$ is a local minimum of $P$. Moreover, since $P$ is a quadratic polynomial with positive definite quadratic part, it is convex, so this local minimum is in fact an absolute minimum, with $P(15,-8)=-63.$

$P$ being a polynomial, it is reasonable to expect that there is an algebraic explanation as to why $-63$ is its minimum, and why the minimum lies at $(15,-8)$. After all, this is what happens in one variable: If $p(x)=ax^2+bx+c$ and $a\ne0$, then

$\displaystyle p(x)=a\left(x+\frac b{2a}\right)^2+\frac{4ac-b^2}{4a},$

and it is clear that if $a>0$, then $p$ has a minimum at $x=-b/(2a)$, and this minimum value is $(4ac-b^2)/(4a).$

The polynomial $P$ of the example above can be analyzed this way as well. A bit of algebra shows that we can write

$\displaystyle P(x,y)=\left(x-3+\frac32 y\right)^2+3\left(\frac y2+4\right)^2-63,$

and it follows immediately that $P(x,y)$ has a minimum value of $-63$, achieved precisely when both $x-3+3y/2=0$ and $y/2+4=0$, i.e., at $(15,-8).$
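This computation is easy to sanity-check by machine. The following Python sketch (my own, not part of the original exercise; all names are mine) verifies with exact rational arithmetic that the two expressions for $P$ agree and that the minimum $-63$ is attained at $(15,-8)$:

```python
from fractions import Fraction as F
import random

def P(x, y):
    # the polynomial from the exercise
    return x**2 + 3*x*y + 3*y**2 - 6*x + 3*y - 6

def P_squares(x, y):
    # the completed-squares form
    return (x - 3 + F(3, 2)*y)**2 + 3*(y/2 + 4)**2 - 63

# The two expressions agree at random rational points:
for _ in range(200):
    x, y = F(random.randint(-50, 50)), F(random.randint(-50, 50))
    assert P(x, y) == P_squares(x, y)

# The minimum -63 is attained at (15, -8):
assert P(15, -8) == -63
assert all(P(15 + dx, -8 + dy) >= -63
           for dx in range(-3, 4) for dy in range(-3, 4))
print("ok")
```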

(One can go further, and explain how to go about the ‘bit of algebra’ that led to the representation of $P$ above in a systematic way, but I will leave that for a future occasion.)

What we did with $P$ is not a mere coincidence. The 17th of the 23 problems in Hilbert’s famous address to the Second International Congress of Mathematicians in Paris, 1900, asks whether every polynomial $P(x_1,\dots,x_n)$ with real coefficients which is non-negative for all (real) choices of $x_1,\dots,x_n$ is actually a sum of squares of rational functions. (A rational function is a quotient of polynomials.) A non-negative polynomial is usually called positive definite, but I won’t use this terminology here.

If Hilbert’s problem had an affirmative solution, this would provide a clear explanation as to why a polynomial such as $P+63$ above is non-negative.

Hilbert himself showed that a non-negative polynomial $P$ is a sum of squares of polynomials in either of the following cases:

• $P$ is a polynomial in only one variable,
• $P$ is a quadratic polynomial (as in our example above), in any number $n$ of variables,
• $P$ is a quartic polynomial in two variables.

Emil Artin solved Hilbert’s problem (in the affirmative) in 1927. The stronger conclusion, that $P$ is a sum of squares of polynomials, is unfortunately false in general. Here is a counterexample:

$Q(x,y,z)=z^6+x^4y^2+x^2y^4-3x^2y^2z^2.$

That this polynomial is non-negative follows from the inequality between the arithmetic and the geometric means; see example 1 below. This polynomial appears in the paper of Marie-Françoise Roy, The role of Hilbert’s problems in real algebraic geometry, in Proceedings of the ninth EWM Meeting, Loccum, Germany 1999, and one can also check that it is not a sum of squares of polynomials by applying the algorithm I refer to below.
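As a quick check (a sketch of mine, not from Roy’s paper), one can confirm the non-negativity of $Q$ numerically, using exact integer arithmetic; the AM-GM argument is recorded in the comments:

```python
import random

def Q(x, y, z):
    return z**6 + x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 * z**2

# AM-GM applied to the three terms z^6, x^4 y^2, x^2 y^4: their arithmetic
# mean is at least (z^6 * x^4 y^2 * x^2 y^4)^(1/3) = x^2 y^2 z^2, which is
# exactly the term being subtracted, so Q >= 0 everywhere.
# Spot-check at random integer points (exact arithmetic):
for _ in range(5000):
    x, y, z = (random.randint(-10, 10) for _ in range(3))
    assert Q(x, y, z) >= 0

# Equality holds when the three terms coincide, e.g. at (1, 1, 1):
assert Q(1, 1, 1) == 0
print("ok")
```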

Although Hilbert himself was aware that some non-negative polynomials $P$ are not sums of squares of polynomials, I believe it was the logician Raphael Robinson who exhibited the first explicit example of such a $P$, in his nice paper Some definite polynomials which are not sums of squares of real polynomials, in Selected questions of algebra and logic (collection dedicated to the memory of A. I. Malcev) (Russian), pp. 264–282, Izdat. “Nauka” Sibirsk. Otdel., Novosibirsk, 1973.

Here I quote from the Mathscinet review by W. Ellison:

The key result is the following lemma: A polynomial $f(x, y)$ of degree at most 3 that vanishes at 8 of the 9 points $(x, y)$ with $x, y\in\{-1,0,1\}$ must also vanish at the 9th point.

The polynomial $f(x, y) =x^2(x^2-1)^2+y^2(y^2-1)^2$ vanishes to the 2nd order at each of the 9 points and we can find a sextic curve $g(x, y) = 0$ that does not pass through $(0,0),$ but does have double points at each of the other 8 points, for example $g(x, y)=(x^2-1)(y^2-1)(x^2+y^2+1).$ It follows that the function $g(x, y)/f(x, y)$ is bounded above in the whole $(x, y)$ plane. Thus, for a suitable $\alpha >0$ (in fact one can take $\alpha= 1$) it follows that $S_\alpha(x,y)=f(x,y)-\alpha g(x,y)$ is positive definite and that $S_\alpha(0,0)\ne0.$ It is now clear that we cannot have $S_{\alpha} (x,y)=\sum f_r^2 (x,y)$ with real polynomials $f_r,$ for these polynomials are at most cubics and so would have to vanish at $(0,0),$ since they vanish at the other 8 points.

It is interesting to see that several logicians have worked on this problem; notably Robinson, Henkin and Kreisel.

This is a good place to mention that there are algorithms that, given a non-negative polynomial, decide whether it admits a representation as a sum of squares of polynomials and, if so, find such a sum. This is the main result of the nice paper An algorithm for sums of squares of real polynomials, by Victoria Powers and Thorsten Wörmann, J. Pure Appl. Algebra 127 (1998), 99–104. (As of this writing, the paper is also available at Dr. Powers’s webpage.)

Many natural inequalities that appear frequently in analysis can effectively be stated as asserting that certain polynomials are non-negative, so in fact one can (at least in principle) prove these inequalities by simply displaying a representation of the given polynomials as sums of squares.

Examples

1. The inequality between the arithmetic and the geometric means asserts that

$\displaystyle\frac{x_1+\dots+x_n}n\ge\sqrt[n]{x_1\dots x_n}$

for all nonnegative numbers $x_1,\dots,x_n$, with equality only when $x_1=\dots=x_n$. This classical result is one of the most quoted inequalities in analysis. Notice that it is equivalent to the assertion that the polynomial

$\displaystyle P(x_1,...,x_n)=\frac{x_1^{2n} + x_2^{2n} +\dots+x_n^{2n}}n -x_1^2\dots x_n^2$

is expressible as a sum of squares (here, I replaced each $x_i$ with $x_i^{2n}$, which clears the $n$-th root and dispenses with the restriction that the $x_i$ must be non-negative).

As suggested, one may try to prove this inequality by simply exhibiting $P$ as a sum of squares. Such a representation was actually found by Hurwitz in 1891, in his paper Über den Vergleich des arithmetischen und des geometrischen Mittels, which can be found in his Math. Werke, 505–507, Basel, Birkhäuser, 1933.

To write the corresponding representation in a palatable way, let’s introduce some notation. Given a function $f(x_1,\dots,x_n)$, let’s write ${\mathcal P}f(x_1,\dots,x_n)$ for the result of adding all the expressions of the form $f(x_{i_1},\dots,x_{i_n})$ for all possible rearrangements $(i_1,\dots,i_n)$ of the indices $(1,\dots,n)$ (there are $n!$ such possible rearrangements, where $n!$, $n$-factorial, is defined by setting $0!=1$ and $n!=1\times\dots\times n$ for $n=1,2,\dots$).

For example, ${\mathcal P}x_1=(n-1)!(x_1+\dots+x_n)$ and ${\mathcal P}(x_1\dots x_n)=n!x_1\dots x_n$.
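For concreteness, here is a small Python sketch of the operator ${\mathcal P}$ (the function name `sym` is my own choice), confirming the two sample computations just given:

```python
from itertools import permutations
from math import factorial, prod

def sym(f, xs):
    # the operator P: sum f over all n! rearrangements of xs
    return sum(f(*p) for p in permutations(xs))

xs = (2, 3, 5, 7)
n = len(xs)

# P x_1 = (n-1)!(x_1 + ... + x_n): each x_j occupies the first
# slot in exactly (n-1)! of the n! rearrangements
assert sym(lambda *v: v[0], xs) == factorial(n - 1) * sum(xs)

# P(x_1 ... x_n) = n! x_1 ... x_n: the product is the same for every rearrangement
assert sym(lambda *v: prod(v), xs) == factorial(n) * prod(xs)
print("ok")
```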

Write

$\displaystyle \phi_k(x_1,\dots,x_n)={\mathcal P}\left((x_1^{n-k}-x_2^{n-k})(x_1-x_2)x_3 x_4\dots x_{k+1}\right)$

for $k=1,\dots,n-1$ and

$f_k=\phi_k(x_1^2,\dots,x_n^2).$

Notice that each $f_k$ is clearly non-negative, since

$\displaystyle (x_1^{n-k}-x_2^{n-k})(x_1-x_2)x_3 x_4\dots x_{k+1}=(x_1-x_2)^2\left(\sum_{j=0}^{n-k-1}x_1^{n-k-1-j}x_2^j\right)x_3\dots x_{k+1},$

and substituting $x_i^2$ for each $x_i$ makes every factor on the right non-negative.

Then

$\displaystyle P(x_1,\dots,x_n)=\frac1{2(n!)}\sum_{k=1}^{n-1}f_k.$

Notice that this expression makes it obvious that equality can only occur when the $x_i^2$ are all equal (which, for the non-negative inputs of the original inequality, is the usual equality condition).
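The whole identity is easy to test mechanically. The sketch below (my own code; all names are mine) implements $\phi_k$ and $f_k$ and checks Hurwitz’s representation against the polynomial $P$ at random integer points for small $n$, using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod
import random

def sym(f, xs):
    # the operator P: sum f over all rearrangements of xs
    return sum(f(p) for p in permutations(xs))

def phi(k, xs):
    # phi_k = P((x1^{n-k} - x2^{n-k})(x1 - x2) x3 x4 ... x_{k+1})
    n = len(xs)
    return sym(lambda v: (v[0]**(n - k) - v[1]**(n - k)) * (v[0] - v[1])
                         * prod(v[2:k + 1]), xs)

def hurwitz_rhs(xs):
    # (1/(2 n!)) sum_{k=1}^{n-1} f_k, where f_k = phi_k(x_1^2, ..., x_n^2)
    n = len(xs)
    sq = tuple(x * x for x in xs)
    return Fraction(sum(phi(k, sq) for k in range(1, n)), 2 * factorial(n))

def amgm_poly(xs):
    # P = (x_1^{2n} + ... + x_n^{2n})/n - x_1^2 ... x_n^2
    n = len(xs)
    return Fraction(sum(x**(2 * n) for x in xs), n) - prod(x * x for x in xs)

for n in (2, 3, 4, 5):
    for _ in range(20):
        xs = tuple(random.randint(-5, 5) for _ in range(n))
        assert amgm_poly(xs) == hurwitz_rhs(xs)
print("ok")
```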

For example,

$\displaystyle \frac{x_1^2+x_2^2}2-x_1x_2=\frac12(x_1-x_2)^2,$

$\displaystyle \frac{x_1^3+x_2^3+x_3^3}3-x_1x_2x_3=\frac16\bigl((x_1-x_2)^2x_3+(x_2-x_3)^2x_1+(x_3-x_1)^2x_2+(x_1-x_2)^2(x_1+x_2)+(x_1-x_3)^2(x_1+x_3)+(x_2-x_3)^2(x_2+x_3)\bigr),$

and
$\displaystyle \frac{x_1^4+x_2^4+x_3^4+x_4^4}4-x_1x_2x_3x_4=\frac{(x_1^2-x_2^2)^2+(x_3^2-x_4^2)^2}4+\frac{(x_1x_2-x_3x_4)^2}2.$
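These two displayed identities can likewise be spot-checked by machine; the following short sketch (my code, with names of my choosing) verifies them at random integer points with exact arithmetic:

```python
from fractions import Fraction as F
import random

def lhs3(a, b, c):
    return F(a**3 + b**3 + c**3, 3) - a*b*c

def rhs3(a, b, c):
    return F((a - b)**2 * c + (b - c)**2 * a + (c - a)**2 * b
             + (a - b)**2 * (a + b) + (a - c)**2 * (a + c)
             + (b - c)**2 * (b + c), 6)

def lhs4(a, b, c, d):
    return F(a**4 + b**4 + c**4 + d**4, 4) - a*b*c*d

def rhs4(a, b, c, d):
    return F((a**2 - b**2)**2 + (c**2 - d**2)**2, 4) + F((a*b - c*d)**2, 2)

for _ in range(500):
    a, b, c, d = (random.randint(-9, 9) for _ in range(4))
    assert lhs3(a, b, c) == rhs3(a, b, c)
    assert lhs4(a, b, c, d) == rhs4(a, b, c, d)
print("ok")
```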

2. Another frequently used classical inequality is the Cauchy-Schwarz inequality

$\displaystyle \sum_{i=1}^n x_iy_i\le\sqrt{\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)},$

with equality only if the tuples $(x_1,\dots,x_n)$ and $(y_1,\dots,y_n)$ are proportional. This result is usually shown by analyzing the discriminant of a non-negative quadratic polynomial, and it is also widely presented in the more compact form $|\vec x\cdot \vec y|\le\|\vec x\|\|\vec y\|$ for $\vec x,\vec y$ arbitrary vectors in ${\mathbb R}^n$ and $\cdot$ the usual dot product. Letting

$\displaystyle P(\vec x,\vec y)=\left(\sum_{i=1}^n x_i^2\right) \left(\sum_{i=1}^n y_i^2\right)-\left(\sum_{i=1}^n x_iy_i\right)^2,$

the inequality is seen to be equivalent to the assertion that the polynomial $P$ is non-negative, and so (as in the previous example) it should be possible to prove this by exhibiting $P$ as a sum of squares. The resulting identity is due to Lagrange, namely,

$\displaystyle P(x_1,\dots,x_n,y_1,\dots,y_n)=\sum_{1\le i<j\le n}(x_iy_j-x_jy_i)^2,$

from which the case of equality also follows immediately.
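Lagrange’s identity is also easy to verify mechanically; the sketch below (my own code) checks it at random integer tuples, including the equality case for proportional tuples:

```python
import random

def lagrange_lhs(xs, ys):
    # (sum x_i^2)(sum y_i^2) - (sum x_i y_i)^2
    return (sum(x*x for x in xs) * sum(y*y for y in ys)
            - sum(x*y for x, y in zip(xs, ys))**2)

def lagrange_rhs(xs, ys):
    # sum over i < j of (x_i y_j - x_j y_i)^2
    n = len(xs)
    return sum((xs[i]*ys[j] - xs[j]*ys[i])**2
               for i in range(n) for j in range(i + 1, n))

for _ in range(200):
    n = random.randint(1, 6)
    xs = [random.randint(-9, 9) for _ in range(n)]
    ys = [random.randint(-9, 9) for _ in range(n)]
    assert lagrange_lhs(xs, ys) == lagrange_rhs(xs, ys)

# proportional tuples give equality (the sum of squares vanishes)
assert lagrange_lhs([1, 2, 3], [2, 4, 6]) == 0
print("ok")
```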

3. Let me close with an example of a classical inequality that leads to non-negative polynomials for which I do not know whether they admit representations as a sum of squares of polynomials. This inequality is due to Minkowski: For non-negative $x_i,y_i$, we have

$\displaystyle \left(\prod_{i=1}^n(x_i+y_i)\right)^{1/n}\ge\left(\prod_{i=1}^n x_i\right)^{1/n}+\left(\prod_{i=1}^n y_i\right)^{1/n}.$

As in example 1 above, we can rewrite this as the assertion that a polynomial is non-negative, namely,

$\displaystyle P(x_1,\dots,x_n,y_1,\dots,y_n)$ $\displaystyle =\prod_{i=1}^n(x_i^{2n}+y_i^{2n})-\left(\prod_{i=1}^n x_i^2+\prod_{i=1}^n y_i^2\right)^n.$

That the inequality holds is a consequence of the inequality between the arithmetic and the geometric means. Indeed, expanding the first product, one finds exactly $n\choose k$ terms with exactly $k$ factors of the form $x_i^{2n}$ and $n-k$ factors of the form $y_i^{2n}$. For any $j$ with $1\le j\le n,$ precisely ${n-1}\choose{k-1}$ of these terms contain the factor $x_j^{2n}$, so, by the arithmetic mean-geometric mean inequality, the sum of these terms is at least $n\choose k$ times the product of the $x_i$ to the power $2k$ times the product of the $y_i$ to the power $2(n-k)$ (since $2n {{n-1}\choose{k-1}}/ {n\choose k} =2k$), which is precisely the corresponding term when we expand the $n$-th power being subtracted in $P$.
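The counting in this paragraph, and the non-negativity of $P$, can both be spot-checked by machine. In the sketch below (my code), the first loop verifies the exponent bookkeeping (that $2n$ times the binomial coefficient $n-1$ over $k-1$ equals $2k$ times $n$ over $k$), and the second samples $P$ at random integer points:

```python
from math import comb, prod
import random

def minkowski_poly(xs, ys):
    # P = prod(x_i^{2n} + y_i^{2n}) - (prod x_i^2 + prod y_i^2)^n
    n = len(xs)
    return (prod(x**(2*n) + y**(2*n) for x, y in zip(xs, ys))
            - (prod(x*x for x in xs) + prod(y*y for y in ys))**n)

# exponent bookkeeping used in the argument: 2n C(n-1, k-1) = 2k C(n, k)
for n in range(1, 10):
    for k in range(1, n + 1):
        assert 2*n * comb(n - 1, k - 1) == 2*k * comb(n, k)

# spot-check non-negativity at random integer points (exact arithmetic)
for _ in range(500):
    n = random.randint(1, 4)
    xs = [random.randint(-4, 4) for _ in range(n)]
    ys = [random.randint(-4, 4) for _ in range(n)]
    assert minkowski_poly(xs, ys) >= 0
print("ok")
```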

However, it is not clear to me that $P$ is a sum of squares of polynomials. The difficulty in trying to apply the representation I displayed in example 1 is similar to the one encountered in trying to apply that representation to the polynomial $Q(x,y,z)=z^6+x^4y^2+x^2y^4-3x^2y^2z^2$: since, for example, $x^4y^2$ is not a perfect sixth power, the representation from example 1 gives us $Q$ as a sum of squares of functions involving cube roots rather than polynomials.

2. Many thanks for all of this. I was thinking about this problem and, actually, I produced another expression for $P(x_1,\dots,x_n)$ as a sum of squares, which is more complicated than Hurwitz’s. So, thanks again for your link to Hurwitz’s work.
Now, concerning your question at the end, there is a well-known algorithm that determines whether a positive polynomial is a sum of squares. Have you already tested it, say for $n = 3,4,5$?