1. We all know how to solve a linear equation such as $ax+b=0$, namely $x=-b/a$ (assuming $a\ne 0$; if $a=0$ then either $b=0$ and any $x$ is a solution, or else there are no solutions). This was known to Babylonian and Persian mathematicians (with the usual caveats about the signs of $a$ and $b$, since the notion of negative numbers had not been introduced yet).
This is trivial, but there is a subtle point here:
- Some equations have no solutions.
If we are interested in solving polynomial equations in general, at some point we will need an argument justifying that we can. For now, let us proceed formally, assuming that we will always find solutions.
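Since this kind of case analysis will keep reappearing, here is a minimal sketch of it in Python (the function name and return conventions are mine):

```python
def solve_linear(a, b):
    """Solve a*x + b = 0, following the case analysis above.

    Returns the unique solution -b/a when a != 0, the string 'all'
    when every x is a solution, and None when there is no solution."""
    if a != 0:
        return -b / a
    return 'all' if b == 0 else None

print(solve_linear(2, -6))   # 3.0, the unique solution
print(solve_linear(0, 0))    # 'all': any x works
print(solve_linear(0, 5))    # None: no solutions
```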
Just as with linear equations, we all know as well how to solve quadratics, such as $ax^2+bx+c=0$. Namely, we can factor out $a$ (if $a=0$ we are in the linear case, so let’s assume that this is not the case) and then complete the square. We get
$$a\left(\left(x+\frac{b}{2a}\right)^2-\frac{b^2-4ac}{4a^2}\right)=0,$$
so $ax^2+bx+c=0$ iff $\left(x+\frac{b}{2a}\right)^2=\frac{b^2-4ac}{4a^2}$, or
$$x=\frac{-b\pm\sqrt{b^2-4ac}}{2a},$$
the well-known quadratic formula.
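The formula is easy to check by machine. In the Python sketch below (names and sample coefficients are mine), `cmath.sqrt` returns one square root of the discriminant even when it is negative, anticipating the discussion of complex numbers that follows:

```python
import cmath

def solve_quadratic(a, b, c):
    """The two roots of a*x^2 + b*x + c = 0 (a != 0).

    cmath.sqrt picks one square root of the (possibly negative)
    discriminant; the two signs below resolve the +/- ambiguity."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

r1, r2 = solve_quadratic(1, -5, 6)    # x^2 - 5x + 6 = (x - 2)(x - 3)
```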
Another small subtlety appears here: there is some inherent ambiguity in the meaning of the expression $\sqrt{b^2-4ac}$. We usually resolve this by “choosing a sign” of the square root. As long as we are looking at quadratic polynomials with integer (or rational, or real, or even complex) coefficients, there is a standard way of making this choice. In more general situations (in arbitrary fields) there is no such standard procedure.
Besides this subtlety, a more serious one needs to be faced. Nowadays we are used to working with complex numbers, so the sight of a square root of a negative number does not cause confusion, but this was a serious issue for many centuries, and when complex numbers were first used, many were skeptical of whether they actually made sense. It wasn’t until Gauss’ presentation of complex numbers as pairs of reals that their use became mainstream. This is related to the question of whether one can always solve a quadratic equation. The answer was “no” until complex numbers were introduced and accepted, and then it became “yes.”
2. Cubic equations. The history of the solutions to cubic and quartic equations is full of melodrama; the historical note at the end of Chapter 2 of the book is well worth reading (see also this).
Here is a brief description of the method. I recommend that you look at the book for additional remarks and examples.
We begin with a general observation: the ideas discussed here (the algebra of polynomial equations) constituted a large part of a typical Abstract Algebra course at the beginning of the 20th century. The “symmetries” that are at the center of the concept of group are apparent in the equations that one obtains.
Suppose $p(x)$ is a polynomial of degree $n$, and let $r$ be a root, so $p(r)=0$ and $p(x)=(x-r)q(x)$, where $q$ is a polynomial of degree $n-1$; here we are writing $p(x)=p(x)-p(r)$ as a combination of differences of powers, and using the identities $x^k-r^k=(x-r)(x^{k-1}+x^{k-2}r+\dots+xr^{k-2}+r^{k-1})$.
Assuming that we can find roots of $q$ as well, and continuing in this fashion, the above shows that we can factor $p(x)=(x-r_1)(x-r_2)\cdots(x-r_n)$ (assuming, as we may, that $p$ is monic). But then, expanding, we also must have
$$x^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0=(x-r_1)(x-r_2)\cdots(x-r_n),$$
which relates the coefficients of $p$ to the so-called elementary symmetric functions of the roots of $p$, namely:
- $a_{n-1}$ (the coefficient of $x^{n-1}$) is minus the sum of the roots,
- $a_{n-2}$ is the sum of all the products $r_ir_j$ with $i<j$,
- $a_{n-3}$ is minus the sum of all the triple products $r_ir_jr_k$ with $i<j<k$, etc.
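These relations are easy to verify by machine. Here is a small Python sketch (the function name and the sample roots are mine) that computes the coefficients of the monic polynomial directly from the elementary symmetric functions of its roots:

```python
from itertools import combinations
from math import prod

def coeffs_from_roots(roots):
    """Coefficients [a_{n-1}, ..., a_0] of the monic polynomial
    x^n + a_{n-1} x^{n-1} + ... + a_0 with the given roots:
    a_{n-k} = (-1)^k * (k-th elementary symmetric function of the roots)."""
    n = len(roots)
    return [(-1) ** k * sum(prod(c) for c in combinations(roots, k))
            for k in range(1, n + 1)]

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
print(coeffs_from_roots([1, 2, 3]))   # [-6, 11, -6]
```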
Now consider a cubic equation $x^3+ax^2+bx+c=0$ (just as in the previous cases, we can begin by dividing by the leading coefficient, so we may as well assume that the leading coefficient is 1).
It turns out that solving the equation is simpler if the coefficient of $x^2$, called $a$ above, is zero. In order to achieve this, we “translate” the equation by replacing $x$ with $x-t$ for a suitable $t$. If $r_1,r_2,r_3$ are the roots of the original equation, the roots of the translated equation would be $r_1+t,r_2+t,r_3+t$, and the coefficient of $x^2$ would be $-(r_1+r_2+r_3)-3t=a-3t$, so it is zero iff $t=a/3$.
In effect, if $x=y-a/3$, the equation becomes
$$y^3+\left(b-\frac{a^2}{3}\right)y+\left(c-\frac{ab}{3}+\frac{2a^3}{27}\right)=0,$$
which we will write in the more palatable form $y^3=py+q$, with $p=\frac{a^2}{3}-b$ and $q=\frac{ab}{3}-\frac{2a^3}{27}-c$.
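As a sanity check on the translation, the following Python sketch (the names, the sample coefficients, and the sign convention $y^3=py+q$ are as chosen here) verifies that substituting $x=y-a/3$ really removes the quadratic term:

```python
def depress_cubic(a, b, c):
    """The pair (p, q) such that substituting x = y - a/3 turns
    x^3 + a x^2 + b x + c = 0 into y^3 = p*y + q."""
    p = a * a / 3 - b
    q = a * b / 3 - 2 * a ** 3 / 27 - c
    return p, q

a, b, c = 3.0, -1.0, 2.0            # a sample cubic, chosen arbitrarily
p, q = depress_cubic(a, b, c)
f = lambda x: x ** 3 + a * x ** 2 + b * x + c
for y in (-2.0, 0.5, 4.0):
    # the translated polynomial agrees with f at the shifted point
    assert abs(f(y - a / 3) - (y ** 3 - p * y - q)) < 1e-9
```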
[Of course, we could have done this directly without the analysis of the roots above, but (beyond the motivation above) it will eventually become relevant through the course that certain functions of the roots are associated to the coefficients of the polynomial.]
In order to solve this equation, we try a new substitution. There are several ways of motivating it (“because it works,” for example). The new substitution replaces $y$ with $z+\frac{k}{z}$, for a carefully chosen constant $k$. This map is closely related to the Joukowski transformation $z\mapsto z+1/z$ from complex analysis, and one can use a bit of complex analysis to indicate why it would be reasonable to try something like that.
[I remember a few years ago somebody gave a talk on this motivation in the Graduate Student Seminar at UC Berkeley, but I have not been able to find a reference, and I would appreciate any tips.]
The idea behind this substitution is that hopefully it will transform the equation in $y$ into an equation in $z$ of the form $z^3+\frac{D}{z^3}=E$, which is equivalent to $z^6-Ez^3+D=0$. This new equation is actually a quadratic in $z^3$, and can therefore be solved.
Let’s try to see if this works: If $y=z+\frac{k}{z}$, then $y^3=py+q$ becomes
$$z^3+(3k-p)z+(3k^2-pk)\frac{1}{z}+\frac{k^3}{z^3}=q,$$
and (incredibly!) if we choose $k=p/3$, then the coefficients of both $z$ and $1/z$ vanish, and the equation indeed takes the form $z^3+\frac{D}{z^3}=E$, where $D=\frac{p^3}{27}$ and $E=q$.
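To see the method go end to end, here is a small Python sketch (the name and the sample equation are mine; it assumes $p\ne 0$, so that the substitution makes sense) extracting one root by solving the quadratic in $z^3$:

```python
import cmath

def one_root(p, q):
    """One root of y^3 = p*y + q (assumes p != 0): z^3 satisfies the
    quadratic w^2 - q*w + p^3/27 = 0, and then y = z + p/(3z)."""
    w = (q + cmath.sqrt(q * q - 4 * p ** 3 / 27)) / 2   # one value of z^3
    z = w ** (1 / 3)                                    # one cube root of it
    return z + p / (3 * z)

y = one_root(p=1, q=0)   # y^3 = y has roots 0, 1, -1
```

Note that the intermediate values $w$ and $z$ are genuinely complex here even though the root produced is real, a point we return to below.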
[There are of course deeper reasons (than magic) why this trick works, but they will have to wait for a little bit until we develop enough of the theory to make sense of them. Historically, the argument was for a while just seen as a trick, and the explanation of why it made sense did not appear until the development of Galois theory.]
As pointed out above, this is a quadratic in $z^3$, which we can then proceed to solve, to find two possible values of $z^3$. To each such value we can associate three possible values of $z$, corresponding to the three possible cube roots of $z^3$ (see below). Each of these six values can then be used to find a value of $y$, and thus a value of $x$ for which the original polynomial vanishes.
As before, there is a new subtlety here. It looks like we have six possibilities for $y$, but a cubic can only have three roots. It takes a little bit of chasing through the equations to see that indeed only three roots are produced: the book shows that at a crucial step a choice of sign in a square root actually does not change the root one obtains. I recommend that you work this out on your own before reading the explanation in the book.
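The count can be illustrated numerically. The Python sketch below (the name and the sample equation $y^3=7y+6$, whose roots are $-1$, $-2$, $3$, are mine) computes all six candidates and checks that only three distinct values appear, each twice:

```python
import cmath

def all_candidates(p, q):
    """All six candidate roots of y^3 = p*y + q: the two values of z^3
    from the quadratic, times the three cube roots of each (p != 0)."""
    omega = (-1 + cmath.sqrt(-3)) / 2              # primitive cube root of 1
    disc = cmath.sqrt(q * q - 4 * p ** 3 / 27)
    ys = []
    for w in ((q + disc) / 2, (q - disc) / 2):     # the two values of z^3
        z0 = w ** (1 / 3)                          # one cube root of w
        for z in (z0, z0 * omega, z0 * omega ** 2):
            ys.append(z + p / (3 * z))             # y = z + p/(3z)
    return ys

cands = all_candidates(p=7, q=6)                   # y^3 = 7y + 6
distinct = {round(y.real, 6) + round(y.imag, 6) * 1j for y in cands}
```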
If we were adventurous, we could actually write the solutions explicitly in terms of the coefficients $a$, $b$, $c$. I’ll skip this step.
(And just to be self-contained: say that $t$ is a real number and we want to solve the equation $z^3=t$. Let $s$ be the unique real number such that $s^3=t$. Then $0=z^3-s^3=(z-s)(z^2+sz+s^2)$, so $z=s$ or $z^2+sz+s^2=0$, so either $z=s$ or $z=s\omega$ or $z=s\bar\omega$, where $\omega=\frac{-1+\sqrt{-3}}{2}$ and $\bar\omega=\frac{-1-\sqrt{-3}}{2}$.)
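In Python, with complex arithmetic, this looks as follows (the sample value $t=8$ is mine):

```python
import cmath

omega = (-1 + cmath.sqrt(-3)) / 2     # omega = (-1 + sqrt(-3))/2 as above

t = 8.0                               # solve z^3 = t for a sample real t
s = abs(t) ** (1 / 3) * (1 if t >= 0 else -1)   # the unique real cube root
roots = [s, s * omega, s * omega ** 2]
for z in roots:
    assert abs(z ** 3 - t) < 1e-9     # each candidate satisfies z^3 = t
```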
There is an important point to notice here. Assume that $p(x)$ is a cubic polynomial with real coefficients. From calculus (the intermediate value theorem), we know that $p$ has at least one real root $r$. This might not be apparent from the formulas we obtain; see Example 2.1 in the book for an illustration of this issue. It is a useful exercise (do it!) to check that at least one of the roots the formulas produce must indeed be a real number.
3. Quartic polynomials. The argument in this case begins in a similar way. Say we want to solve the equation
$$x^4+ax^3+bx^2+cx+d=0.$$
As before, it turns out to be useful to try to remove the coefficient of $x^3$. This is achieved by a translation, just as in the previous case: Set $x=y-a/4$. Then the equation becomes
$$y^4+\left(b-\frac{3a^2}{8}\right)y^2+\left(c-\frac{ab}{2}+\frac{a^3}{8}\right)y+\left(d-\frac{ac}{4}+\frac{a^2b}{16}-\frac{3a^4}{256}\right)=0,$$
which, in order to keep our sanity, we write as $y^4+py^2+qy+r=0$.
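As with the cubic, the translation is easy to verify by machine. In the Python sketch below (names and sample coefficients are mine), the formulas for $p$, $q$, $r$ come from expanding the translated polynomial:

```python
def depress_quartic(a, b, c, d):
    """The triple (p, q, r) such that substituting x = y - a/4 turns
    x^4 + a x^3 + b x^2 + c x + d = 0 into y^4 + p y^2 + q y + r = 0."""
    p = b - 3 * a * a / 8
    q = c - a * b / 2 + a ** 3 / 8
    r = d - a * c / 4 + a * a * b / 16 - 3 * a ** 4 / 256
    return p, q, r

a, b, c, d = 4.0, 1.0, -2.0, 3.0    # a sample quartic, chosen arbitrarily
p, q, r = depress_quartic(a, b, c, d)
f = lambda x: x ** 4 + a * x ** 3 + b * x ** 2 + c * x + d
for y in (-1.0, 0.5, 2.0):
    # the cubic term is gone after the translation
    assert abs(f(y - a / 4) - (y ** 4 + p * y ** 2 + q * y + r)) < 1e-9
```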
The trick in this case is to see that this equation can be written as a difference of squares, which leads to its factorization as the product of two quadratics, and can therefore be solved. In order to find the squares that lead to this factorization, a parameter is introduced, and one checks that the restrictions it must satisfy lead to a cubic equation. The details will be presented next lecture.