derivations-0.53.20120414.orig/0000755000000000000000000000000011742576343014335 5ustar rootrootderivations-0.53.20120414.orig/README0000644000000000000000000000564311742576317015226 0ustar rootroot ------------------------------------------------------------------------ Derivations of Applied Mathematics ------------------------------------------------------------------------ Understandably, program sources rarely derive the mathematical formulas they use. Not wishing to take the formulas on faith, a user might nevertheless reasonably wish to see such formulas somewhere derived. Derivations of Applied Mathematics is a book which documents and derives many of the mathematical formulas and methods implemented in free software or used in science and engineering generally. It documents and derives the Taylor series (used to calculate trigonometrics), the Newton-Raphson method (used to calculate square roots), the Pythagorean theorem (used to calculate distances) and many others. Among other ways, you can read the book on your computer screen by opening the file /usr/share/doc/derivations/derivations.ps.gz with the gv(1) program under X(7). To print the book on a standard postscript printer, just zcat(1) then lpr(1) the same file. The book is written by Thaddeus H. Black, who also maintains the Debian package 'derivations' in which the book is distributed. Users who need to contact the author in his role as Debian package maintainer can reach him at . However, most e-mail will naturally be about the book itself: this should be sent to . ------------------------------------------------------------------------ Copyright (C) 1983-2010 Thaddeus H. Black License: This book, its source and all the files packaged with them are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License, version 2, as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this package; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. On Debian systems, the complete text of the GNU General Public License can be found in `/usr/share/common-licenses/GPL-2'. The tex/xkeyval.{sty,tex} and associated files are Copyright (C) 2004-2008 Hendri Adriaens and licensed as follows. This work may be distributed and/or modified under the conditions of the LaTeX Project Public License, either version 1.3 of this license or (at your option) any later version. The latest version of this license is in http://www.latex-project.org/lppl.txt and version 1.3 or later is part of all distributions of LaTeX version 2003/12/01 or later. ------------------------------------------------------------------------ Thaddeus H. Black Wed, 10 Mar 2010 00:00:00 +0000 derivations-0.53.20120414.orig/tex/0000755000000000000000000000000011742576346015140 5ustar rootrootderivations-0.53.20120414.orig/tex/xkeyval.tex0000644000000000000000000006107011742575144017344 0ustar rootroot%% %% This is file `xkeyval.tex', %% generated with the docstrip utility. 
%% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvtex') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% \csname XKeyValLoaded\endcsname \let\XKeyValLoaded\endinput \edef\XKVcatcodes{% \catcode`\noexpand\@\the\catcode`\@\relax \catcode`\noexpand\=\the\catcode`\=\relax \catcode`\noexpand\,\the\catcode`\,\relax \catcode`\noexpand\:\the\catcode`\:\relax \let\noexpand\XKVcatcodes\relax } \catcode`\@11\relax \catcode`\=12\relax \catcode`\,12\relax \catcode`\:12\relax \newtoks\XKV@toks \newtoks\XKV@tempa@toks \newcount\XKV@depth \newif\ifXKV@st \newif\ifXKV@sg \newif\ifXKV@pl \newif\ifXKV@knf \newif\ifXKV@rkv \newif\ifXKV@inpox \newif\ifXKV@preset \let\XKV@rm\@empty \ifx\ProvidesFile\@undefined \message{2008/08/13 v2.6a key=value parser (HA)} \input xkvtxhdr \else \ProvidesFile{xkeyval.tex}[2008/08/13 v2.6a key=value parser (HA)] \@addtofilelist{xkeyval.tex} \fi \long\def\@firstoftwo#1#2{#1} \long\def\@secondoftwo#1#2{#2} \long\def\XKV@afterfi#1\fi{\fi#1} \long\def\XKV@afterelsefi#1\else#2\fi{\fi#1} \ifx\ifcsname\@undefined\XKV@afterelsefi \def\XKV@ifundefined#1{% \begingroup\expandafter\expandafter\expandafter\endgroup \expandafter\ifx\csname#1\endcsname\relax \expandafter\@firstoftwo \else \expandafter\@secondoftwo \fi } \else \def\XKV@ifundefined#1{% \ifcsname#1\endcsname \expandafter\@secondoftwo \else \expandafter\@firstoftwo \fi } \fi \XKV@ifundefined{ver@keyval.sty}{ \input keyval \expandafter\def\csname ver@keyval.sty\endcsname{1999/03/16} }{} \long\def\@ifnextcharacter#1#2#3{% \@ifnextchar\bgroup {\@ifnextchar{#1}{#2}{#3}}% {\@ifncharacter{#1}{#2}{#3}}% } \long\def\@ifncharacter#1#2#3#4{% \if\string#1\string#4% \expandafter\@firstoftwo \else \expandafter\@secondoftwo \fi {#2}{#3}#4% } \long\def\XKV@for@n#1#2#3{% \XKV@tempa@toks{#1}\edef#2{\the\XKV@tempa@toks}% \ifx#2\@empty \XKV@for@break \else \expandafter\XKV@f@r \fi #2{#3}#1,\@nil,% } \long\def\XKV@f@r#1#2#3,{% \XKV@tempa@toks{#3}\edef#1{\the\XKV@tempa@toks}% \ifx#1\@nnil \expandafter\@gobbletwo \else #2\expandafter\XKV@f@r \fi #1{#2}% } \long\def\XKV@for@break #1\@nil,{\fi} \long\def\XKV@for@o#1{\expandafter\XKV@for@n\expandafter{#1}} \long\def\XKV@for@en#1#2#3{\XKV@f@r#2{#3}#1,\@nil,} \long\def\XKV@for@eo#1#2#3{% \def#2{\XKV@f@r#2{#3}}\expandafter#2#1,\@nil,% } \long\def\XKV@whilist#1#2#3\fi#4{% #3\expandafter\XKV@wh@list#1,\@nil,\@nil\@@#2#3\fi{#4}{}\fi } 
\long\def\XKV@wh@list#1,#2\@@#3#4\fi#5#6{% \def#3{#1}% \ifx#3\@nnil \def#3{#6}\expandafter\XKV@wh@l@st \else #4% #5\expandafter\expandafter\expandafter\XKV@wh@list \else \def#3{#6}\expandafter\expandafter\expandafter\XKV@wh@l@st \fi \fi #2\@@#3#4\fi{#5}{#1}% } \long\def\XKV@wh@l@st#1\@@#2#3\fi#4#5{} \def\XKV@addtomacro@n#1#2{% \XKV@tempa@toks\expandafter{#1#2}% \edef#1{\the\XKV@tempa@toks}% } \def\XKV@addtomacro@o#1#2{% \expandafter\XKV@addtomacro@n\expandafter#1\expandafter{#2}% } \def\XKV@addtolist@n#1#2{% \ifx#1\@empty \XKV@addtomacro@n#1{#2}% \else \XKV@addtomacro@n#1{,#2}% \fi } \def\XKV@addtolist@o#1#2{% \ifx#1\@empty \XKV@addtomacro@o#1#2% \else \XKV@addtomacro@o#1{\expandafter,#2}% \fi } \def\XKV@addtolist@x#1#2{\edef#1{#1\ifx#1\@empty\else,\fi#2}} \def\@selective@sanitize{\@testopt\@s@lective@sanitize\@M} \def\@s@lective@sanitize[#1]#2#3{% \begingroup \count@#1\relax\advance\count@\@ne \XKV@toks\expandafter{#3}% \def#3{#2}\@onelevel@sanitize#3% \edef#3{{#3}{\the\XKV@toks}}% \expandafter\@s@l@ctive@sanitize\expandafter#3#3% \expandafter\XKV@tempa@toks\expandafter{#3}% \expandafter\endgroup\expandafter\toks@\expandafter{\the\XKV@tempa@toks}% \edef#3{\the\toks@}% } \def\@s@l@ctive@sanitize#1#2#3{% \def\@i{\futurelet\@@tok\@ii}% \def\@ii{% \expandafter\@iii\meaning\@@tok\relax \ifx\@@tok\@s@l@ctive@sanitize \let\@@cmd\@gobble \else \ifx\@@tok\@sptoken \XKV@toks\expandafter{#1}\edef#1{\the\XKV@toks\space}% \def\@@cmd{\afterassignment\@i\let\@@tok= }% \else \let\@@cmd\@iv \fi \fi \@@cmd }% \def\@iii##1##2\relax{\if##1\@backslashchar\let\@@tok\relax\fi}% \def\@iv##1{% \toks@\expandafter{#1}\XKV@toks{##1}% \ifx\@@tok\bgroup \advance\count@\m@ne \ifnum\count@>\z@ \begingroup \def#1{\expandafter\@s@l@ctive@sanitize \csname\string#1\endcsname{#2}}% \expandafter#1\expandafter{\the\XKV@toks}% \XKV@toks\expandafter\expandafter\expandafter {\csname\string#1\endcsname}% \edef#1{\noexpand\XKV@toks{\the\XKV@toks}}% \expandafter\endgroup#1% \fi \edef#1{\the\toks@{\the\XKV@toks}}% \advance\count@\@ne \let\@@cmd\@i \else \edef#1{\expandafter\string\the\XKV@toks}% \expandafter\in@\expandafter{#1}{#2}% \edef#1{\the\toks@\ifin@#1\else \ifx\@@tok\@sptoken\space\else\the\XKV@toks\fi\fi}% \edef\@@cmd{\noexpand\@i\ifx\@@tok\@sptoken\the\XKV@toks\fi}% \fi \@@cmd }% \let#1\@empty\@i#3\@s@l@ctive@sanitize } \def\XKV@checksanitizea#1#2{% \XKV@ch@cksanitize{#1}#2=% \ifin@\else\XKV@ch@cksanitize{#1}#2,\fi \ifin@\@selective@sanitize[0]{,=}#2\fi } \def\XKV@checksanitizeb#1#2{% \XKV@ch@cksanitize{#1}#2,% \ifin@\@selective@sanitize[0],#2\fi } \def\XKV@ch@cksanitize#1#2#3{% \XKV@tempa@toks{#1}\edef#2{\the\XKV@tempa@toks}% \@onelevel@sanitize#2% \@expandtwoargs\in@#3{#2}% \ifin@ \def#2##1#3##2\@nil{% \XKV@tempa@toks{##2}\edef#2{\the\XKV@tempa@toks}% \ifx#2\@empty\else\in@false\fi }% #2#1#3\@nil \fi \XKV@tempa@toks{#1}\edef#2{\the\XKV@tempa@toks}% } \def\XKV@sp@deflist#1#2{% \let#1\@empty \XKV@for@n{#2}\XKV@resa{% \expandafter\KV@@sp@def\expandafter\XKV@resa\expandafter{\XKV@resa}% \XKV@addtomacro@o#1{\expandafter,\XKV@resa}% }% \ifx#1\@empty\else \def\XKV@resa,##1\@nil{\def#1{##1}}% \expandafter\XKV@resa#1\@nil \fi } \def\XKV@merge#1#2#3{% \XKV@checksanitizea{#2}\XKV@tempa \XKV@for@o\XKV@tempa\XKV@tempa{% \XKV@pltrue #3\XKV@tempa\XKV@tempb \let\XKV@tempc#1% \let#1\@empty \XKV@for@o\XKV@tempc\XKV@tempc{% #3\XKV@tempc\XKV@tempd \ifx\XKV@tempb\XKV@tempd \XKV@plfalse \XKV@addtolist@o#1\XKV@tempa \else \XKV@addtolist@o#1\XKV@tempc \fi }% \ifXKV@pl\XKV@addtolist@o#1\XKV@tempa\fi }% \ifXKV@st\global\let#1#1\fi } 
\def\XKV@delete#1#2#3{% \XKV@checksanitizeb{#2}\XKV@tempa \let\XKV@tempb#1% \let#1\@empty \XKV@for@o\XKV@tempb\XKV@tempb{% #3\XKV@tempb\XKV@tempc \@expandtwoargs\in@{,\XKV@tempc,}{,\XKV@tempa,}% \ifin@\else\XKV@addtolist@o#1\XKV@tempb\fi }% \ifXKV@st\global\let#1#1\fi } \def\XKV@warn#1{\message{xkeyval warning: #1}} \def\XKV@err#1{\errmessage{xkeyval error: #1}} \def\KV@errx{\XKV@err} \let\KV@err\KV@errx \def\XKV@ifstar#1{\@ifnextcharacter*{\@firstoftwo{#1}}} \def\XKV@ifplus#1{\@ifnextcharacter+{\@firstoftwo{#1}}} \def\XKV@makepf#1{% \KV@@sp@def\XKV@prefix{#1}% \def\XKV@resa{XKV}% \ifx\XKV@prefix\XKV@resa \XKV@err{`XKV' prefix is not allowed}% \let\XKV@prefix\@empty \else \edef\XKV@prefix{\ifx\XKV@prefix\@empty\else\XKV@prefix @\fi}% \fi } \def\XKV@makehd#1{% \expandafter\KV@@sp@def\expandafter\XKV@header\expandafter{#1}% \edef\XKV@header{% \XKV@prefix\ifx\XKV@header\@empty\else\XKV@header @\fi }% } \def\XKV@srstate#1#2{% \ifx\@empty#2\@empty\advance\XKV@depth\@ne\fi \XKV@for@n{XKV@prefix,XKV@fams,XKV@tkey,XKV@na,% ifXKV@st,ifXKV@pl,ifXKV@knf}\XKV@resa{% \expandafter\let\csname\XKV@resa#1\expandafter \endcsname\csname\XKV@resa#2\endcsname }% \ifx\@empty#1\@empty\advance\XKV@depth\m@ne\fi } \def\XKV@testopta#1{% \XKV@ifstar{\XKV@sttrue\XKV@t@stopta{#1}}% {\XKV@stfalse\XKV@t@stopta{#1}}% } \def\XKV@t@stopta#1{\XKV@ifplus{\XKV@pltrue#1}{\XKV@plfalse#1}} \def\XKV@testoptb#1{\@testopt{\XKV@t@stoptb#1}{KV}} \def\XKV@t@stoptb#1[#2]#3{% \XKV@makepf{#2}% \XKV@makehd{#3}% \KV@@sp@def\XKV@tfam{#3}% #1% } \def\XKV@testoptc#1{\@testopt{\XKV@t@stoptc#1}{KV}} \def\XKV@t@stoptc#1[#2]#3{% \XKV@makepf{#2}% \XKV@checksanitizeb{#3}\XKV@fams \expandafter\XKV@sp@deflist\expandafter \XKV@fams\expandafter{\XKV@fams}% \@testopt#1{}% } \def\XKV@testoptd#1#2{% \XKV@testoptb{% \edef\XKV@tempa{#2\XKV@header}% \def\XKV@tempb{\@testopt{\XKV@t@stoptd#1}}% \expandafter\XKV@tempb\expandafter{\XKV@tempa}% }% } \def\XKV@t@stoptd#1[#2]#3{% \@ifnextchar[{\XKV@sttrue#1{#2}{#3}}{\XKV@stfalse#1{#2}{#3}[]}% } \def\XKV@ifcmd#1#2#3{% \def\XKV@@ifcmd##1#2##2##3\@nil##4{% \def##4{##2}\ifx##4\@nnil \def##4{##1}\expandafter\@secondoftwo \else \expandafter\@firstoftwo \fi }% \XKV@@ifcmd#1#2{\@nil}\@nil#3% } \def\XKV@getkeyname#1#2{\expandafter\XKV@g@tkeyname#1=\@nil#2} \def\XKV@g@tkeyname#1=#2\@nil#3{% \XKV@ifcmd{#1}\savevalue#3{\XKV@rkvtrue\XKV@sgfalse}{% \XKV@ifcmd{#1}\gsavevalue#3% {\XKV@rkvtrue\XKV@sgtrue}{\XKV@rkvfalse\XKV@sgfalse}% }% } \def\XKV@getsg#1#2{% \expandafter\XKV@ifcmd\expandafter{#1}\global#2\XKV@sgtrue\XKV@sgfalse } \def\XKV@define@default#1#2{% \expandafter\def\csname\XKV@header#1@default\expandafter \endcsname\expandafter{\csname\XKV@header#1\endcsname{#2}}% } \def\define@key{\XKV@testoptb\XKV@define@key} \def\XKV@define@key#1{% \@ifnextchar[{\XKV@d@fine@k@y{#1}}{% \expandafter\def\csname\XKV@header#1\endcsname####1% }% } \def\XKV@d@fine@k@y#1[#2]{% \XKV@define@default{#1}{#2}% \expandafter\def\csname\XKV@header#1\endcsname##1% } \def\define@cmdkey{\XKV@testoptd\XKV@define@cmdkey{cmd}} \def\XKV@define@cmdkey#1#2[#3]#4{% \ifXKV@st\XKV@define@default{#2}{#3}\fi \def\XKV@tempa{\expandafter\def\csname\XKV@header#2\endcsname####1}% \begingroup\expandafter\endgroup\expandafter\XKV@tempa\expandafter {\expandafter\def\csname#1#2\endcsname{##1}#4}% } \def\define@cmdkeys{\XKV@testoptd\XKV@define@cmdkeys{cmd}} \def\XKV@define@cmdkeys#1#2[#3]{% \XKV@sp@deflist\XKV@tempa{#2}% \XKV@for@o\XKV@tempa\XKV@tempa{% \edef\XKV@tempa{\noexpand\XKV@define@cmdkey{#1}{\XKV@tempa}}% \XKV@tempa[#3]{}% }% } 
\def\define@choicekey{\XKV@testopta{\XKV@testoptb\XKV@define@choicekey}} \def\XKV@define@choicekey#1{\@testopt{\XKV@d@fine@choicekey{#1}}{}} \def\XKV@d@fine@choicekey#1[#2]#3{% \toks@{#2}% \XKV@sp@deflist\XKV@tempa{#3}\XKV@toks\expandafter{\XKV@tempa}% \@ifnextchar[{\XKV@d@fine@ch@icekey{#1}}{\XKV@d@fine@ch@ic@key{#1}}% } \def\XKV@d@fine@ch@icekey#1[#2]{% \XKV@define@default{#1}{#2}% \XKV@d@fine@ch@ic@key{#1}% } \def\XKV@d@fine@ch@ic@key#1{% \ifXKV@pl\XKV@afterelsefi \expandafter\XKV@d@f@ne@ch@ic@k@y \else\XKV@afterfi \expandafter\XKV@d@f@ne@ch@ic@key \fi \csname\XKV@header#1\endcsname } \def\XKV@d@f@ne@ch@ic@key#1#2{\XKV@d@f@n@@ch@ic@k@y#1{{#2}}} \def\XKV@d@f@ne@ch@ic@k@y#1#2#3{\XKV@d@f@n@@ch@ic@k@y#1{{#2}{#3}}} \def\XKV@d@f@n@@ch@ic@k@y#1#2{% \edef#1##1{% \ifXKV@st\noexpand\XKV@sttrue\else\noexpand\XKV@stfalse\fi \ifXKV@pl\noexpand\XKV@pltrue\else\noexpand\XKV@plfalse\fi \noexpand\XKV@checkchoice[\the\toks@]{##1}{\the\XKV@toks}% }% \def\XKV@tempa{\def#1####1}% \expandafter\XKV@tempa\expandafter{#1{##1}#2}% } \def\define@boolkey{\XKV@t@stopta{\XKV@testoptd\XKV@define@boolkey{}}} \def\XKV@define@boolkey#1#2[#3]{% \ifXKV@pl\XKV@afterelsefi \expandafter\XKV@d@f@ne@boolkey \else\XKV@afterfi \expandafter\XKV@d@fine@boolkey \fi \csname\XKV@header#2\endcsname{#2}{#1#2}{#3}% } \def\XKV@d@fine@boolkey#1#2#3#4#5{% \XKV@d@f@ne@b@olkey#1{#2}{#3}{#4}% {{\csname#3\XKV@resa\endcsname#5}}% } \def\XKV@d@f@ne@boolkey#1#2#3#4#5#6{% \XKV@d@f@ne@b@olkey#1{#2}{#3}{#4}% {{\csname#3\XKV@resa\endcsname#5}{#6}}% } \def\XKV@d@f@ne@b@olkey#1#2#3#4#5{% \expandafter\newif\csname if#3\endcsname \ifXKV@st\XKV@define@default{#2}{#4}\fi \ifXKV@pl \def#1##1{\XKV@pltrue\XKV@sttrue \XKV@checkchoice[\XKV@resa]{##1}{true,false}#5% }% \else \def#1##1{\XKV@plfalse\XKV@sttrue \XKV@checkchoice[\XKV@resa]{##1}{true,false}#5% }% \fi } \def\define@boolkeys{\XKV@plfalse\XKV@testoptd\XKV@define@boolkeys{}} \def\XKV@define@boolkeys#1#2[#3]{% \XKV@sp@deflist\XKV@tempa{#2}% \XKV@for@o\XKV@tempa\XKV@tempa{% \expandafter\XKV@d@fine@boolkeys\expandafter{\XKV@tempa}{#1}{#3}% }% } \def\XKV@d@fine@boolkeys#1#2#3{% \expandafter\XKV@d@f@ne@b@olkey\csname\XKV@header#1\endcsname {#1}{#2#1}{#3}{{\csname#2#1\XKV@resa\endcsname}}% } \def\XKV@cc{\XKV@testopta{\@testopt\XKV@checkchoice{}}} \def\XKV@checkchoice[#1]#2#3{% \def\XKV@tempa{#1}% \ifXKV@st\lowercase{\fi \ifx\XKV@tempa\@empty \def\XKV@tempa{\XKV@ch@ckch@ice\@nil{#2}{#3}}% \else \def\XKV@tempa{\XKV@ch@ckchoice#1\@nil{#2}{#3}}% \fi \ifXKV@st}\fi\XKV@tempa } \def\XKV@ch@ckchoice#1#2\@nil#3#4{% \def\XKV@tempa{#2}% \ifx\XKV@tempa\@empty\XKV@afterelsefi \XKV@ch@ckch@ice#1{#3}{#4}% \else\XKV@afterfi \XKV@@ch@ckchoice#1#2{#3}{#4}% \fi } \def\XKV@ch@ckch@ice#1#2#3{% \def\XKV@tempa{#1}% \ifx\XKV@tempa\@nnil\let\XKV@tempa\@empty\else \def\XKV@tempa{\def#1{#2}}% \fi \in@{,#2,}{,#3,}% \ifin@ \ifXKV@pl \XKV@addtomacro@n\XKV@tempa\@firstoftwo \else \XKV@addtomacro@n\XKV@tempa\@firstofone \fi \else \ifXKV@pl \XKV@addtomacro@n\XKV@tempa\@secondoftwo \else \XKV@toks{#2}% \XKV@err{value `\the\XKV@toks' is not allowed}% \XKV@addtomacro@n\XKV@tempa\@gobble \fi \fi \XKV@tempa } \def\XKV@@ch@ckchoice#1#2#3#4{% \edef\XKV@tempa{\the\count@}\count@\z@ \def\XKV@tempb{#3}% \def\XKV@tempc##1,{% \def#1{##1}% \ifx#1\@nnil \def#1{#3}\def#2{-1}\count@\XKV@tempa \ifXKV@pl \let\XKV@tempd\@secondoftwo \else \XKV@toks{#3}% \XKV@err{value `\the\XKV@toks' is not allowed}% \let\XKV@tempd\@gobble \fi \else \ifx#1\XKV@tempb \edef#2{\the\count@}\count@\XKV@tempa \ifXKV@pl \let\XKV@tempd\XKV@@ch@ckch@ice \else 
\let\XKV@tempd\XKV@@ch@ckch@ic@ \fi \else \advance\count@\@ne \let\XKV@tempd\XKV@tempc \fi \fi \XKV@tempd }% \XKV@tempc#4,\@nil,% } \def\XKV@@ch@ckch@ice#1\@nil,{\@firstoftwo} \def\XKV@@ch@ckch@ic@#1\@nil,{\@firstofone} \def\key@ifundefined{\@testopt\XKV@key@ifundefined{KV}} \def\XKV@key@ifundefined[#1]#2{% \XKV@makepf{#1}% \XKV@checksanitizeb{#2}\XKV@fams \expandafter\XKV@sp@deflist\expandafter \XKV@fams\expandafter{\XKV@fams}% \XKV@key@if@ndefined } \def\XKV@key@if@ndefined#1{% \XKV@knftrue \KV@@sp@def\XKV@tkey{#1}% \XKV@whilist\XKV@fams\XKV@tfam\ifXKV@knf\fi{% \XKV@makehd\XKV@tfam \XKV@ifundefined{\XKV@header\XKV@tkey}{}{\XKV@knffalse}% }% \ifXKV@knf \expandafter\@firstoftwo \else \expandafter\@secondoftwo \fi } \def\disable@keys{\XKV@testoptb\XKV@disable@keys} \def\XKV@disable@keys#1{% \XKV@checksanitizeb{#1}\XKV@tempa \XKV@for@o\XKV@tempa\XKV@tempa{% \XKV@ifundefined{\XKV@header\XKV@tempa}{% \XKV@err{key `\XKV@tempa' undefined}% }{% \edef\XKV@tempb{% \noexpand\XKV@warn{key `\XKV@tempa' has been disabled}% }% \XKV@ifundefined{\XKV@header\XKV@tempa @default}{% \edef\XKV@tempc{\noexpand\XKV@define@key{\XKV@tempa}}% }{% \edef\XKV@tempc{\noexpand\XKV@define@key{\XKV@tempa}[]}% }% \expandafter\XKV@tempc\expandafter{\XKV@tempb}% }% }% } \def\presetkeys{\XKV@stfalse\XKV@testoptb\XKV@presetkeys} \def\gpresetkeys{\XKV@sttrue\XKV@testoptb\XKV@presetkeys} \def\XKV@presetkeys#1#2{% \XKV@pr@setkeys{#1}{preseth}% \XKV@pr@setkeys{#2}{presett}% } \def\XKV@pr@setkeys#1#2{% \XKV@ifundefined{XKV@\XKV@header#2}{% \XKV@checksanitizea{#1}\XKV@tempa \ifXKV@st\expandafter\global\fi\expandafter\def\csname XKV@\XKV@header#2\expandafter\endcsname\expandafter{\XKV@tempa}% }{% \expandafter\XKV@merge\csname XKV@\XKV@header #2\endcsname{#1}\XKV@getkeyname }% } \def\delpresetkeys{\XKV@stfalse\XKV@testoptb\XKV@delpresetkeys} \def\gdelpresetkeys{\XKV@sttrue\XKV@testoptb\XKV@delpresetkeys} \def\XKV@delpresetkeys#1#2{% \XKV@d@lpresetkeys{#1}{preseth}% \XKV@d@lpresetkeys{#2}{presett}% } \def\XKV@d@lpresetkeys#1#2{% \XKV@ifundefined{XKV@\XKV@header#2}{% \XKV@err{no presets defined for `\XKV@header'}% }{% \expandafter\XKV@delete\csname XKV@\XKV@header #2\endcsname{#1}\XKV@getkeyname }% } \def\unpresetkeys{\XKV@stfalse\XKV@testoptb\XKV@unpresetkeys} \def\gunpresetkeys{\XKV@sttrue\XKV@testoptb\XKV@unpresetkeys} \def\XKV@unpresetkeys{% \XKV@ifundefined{XKV@\XKV@header preseth}{% \XKV@err{no presets defined for `\XKV@header'}% }{% \ifXKV@st\expandafter\global\fi\expandafter\let \csname XKV@\XKV@header preseth\endcsname\@undefined \ifXKV@st\expandafter\global\fi\expandafter\let \csname XKV@\XKV@header presett\endcsname\@undefined }% } \def\savekeys{\XKV@stfalse\XKV@testoptb\XKV@savekeys} \def\gsavekeys{\XKV@sttrue\XKV@testoptb\XKV@savekeys} \def\XKV@savekeys#1{% \XKV@ifundefined{XKV@\XKV@header save}{% \XKV@checksanitizeb{#1}\XKV@tempa \ifXKV@st\expandafter\global\fi\expandafter\def\csname XKV@% \XKV@header save\expandafter\endcsname\expandafter{\XKV@tempa}% }{% \expandafter\XKV@merge\csname XKV@\XKV@header save\endcsname{#1}\XKV@getsg }% } \def\delsavekeys{\XKV@stfalse\XKV@testoptb\XKV@delsavekeys} \def\gdelsavekeys{\XKV@sttrue\XKV@testoptb\XKV@delsavekeys} \def\XKV@delsavekeys#1{% \XKV@ifundefined{XKV@\XKV@header save}{% \XKV@err{no save keys defined for `\XKV@header'}% }{% \expandafter\XKV@delete\csname XKV@\XKV@header save\endcsname{#1}\XKV@getsg }% } \def\unsavekeys{\XKV@stfalse\XKV@testoptb\XKV@unsavekeys} \def\gunsavekeys{\XKV@sttrue\XKV@testoptb\XKV@unsavekeys} \def\XKV@unsavekeys{% \XKV@ifundefined{XKV@\XKV@header 
save}{% \XKV@err{no save keys defined for `\XKV@header'}% }{% \ifXKV@st\expandafter\global\fi\expandafter\let \csname XKV@\XKV@header save\endcsname\@undefined }% } \def\setkeys{\XKV@testopta{\XKV@testoptc\XKV@setkeys}} \def\XKV@setkeys[#1]#2{% \XKV@checksanitizea{#2}\XKV@resb \let\XKV@naa\@empty \XKV@for@o\XKV@resb\XKV@tempa{% \expandafter\XKV@g@tkeyname\XKV@tempa=\@nil\XKV@tempa \XKV@addtolist@x\XKV@naa\XKV@tempa }% \ifnum\XKV@depth=\z@\let\XKV@rm\@empty\fi \XKV@usepresetkeys{#1}{preseth}% \expandafter\XKV@s@tkeys\expandafter{\XKV@resb}{#1}% \XKV@usepresetkeys{#1}{presett}% \let\CurrentOption\@empty } \def\XKV@usepresetkeys#1#2{% \XKV@presettrue \XKV@for@eo\XKV@fams\XKV@tfam{% \XKV@makehd\XKV@tfam \XKV@ifundefined{XKV@\XKV@header#2}{}{% \XKV@toks\expandafter\expandafter\expandafter {\csname XKV@\XKV@header#2\endcsname}% \@expandtwoargs\XKV@s@tkeys{\the\XKV@toks}% {\XKV@naa\ifx\XKV@naa\@empty\else,\fi#1}% }% }% \XKV@presetfalse } \def\XKV@s@tkeys#1#2{% \XKV@sp@deflist\XKV@na{#2}% \XKV@for@n{#1}\CurrentOption{% \expandafter\XKV@s@tk@ys\CurrentOption==\@nil }% } \def\XKV@s@tk@ys#1=#2=#3\@nil{% \XKV@g@tkeyname#1=\@nil\XKV@tkey \expandafter\KV@@sp@def\expandafter\XKV@tkey\expandafter{\XKV@tkey}% \ifx\XKV@tkey\@empty \XKV@toks{#2}% \ifcat$\the\XKV@toks$\else \XKV@err{no key specified for value `\the\XKV@toks'}% \fi \else \@expandtwoargs\in@{,\XKV@tkey,}{,\XKV@na,}% \ifin@\else \XKV@knftrue \KV@@sp@def\XKV@tempa{#2}% \ifXKV@preset\XKV@s@tk@ys@{#3}\else \ifXKV@pl \XKV@for@eo\XKV@fams\XKV@tfam{% \XKV@makehd\XKV@tfam \XKV@s@tk@ys@{#3}% }% \else \XKV@whilist\XKV@fams\XKV@tfam\ifXKV@knf\fi{% \XKV@makehd\XKV@tfam \XKV@s@tk@ys@{#3}% }% \fi \fi \ifXKV@knf \ifXKV@inpox \ifx\XKV@doxs\relax \ifx\@currext\@clsextension\else \let\CurrentOption\XKV@tkey\@unknownoptionerror \fi \else\XKV@doxs\fi \else \ifXKV@st \XKV@addtolist@o\XKV@rm\CurrentOption \else \XKV@err{`\XKV@tkey' undefined in families `\XKV@fams'}% \fi \fi \else \ifXKV@inpox\ifx\XKV@testclass\XKV@documentclass \expandafter\XKV@useoption\expandafter{\CurrentOption}% \fi\fi \fi \fi \fi } \def\XKV@s@tk@ys@#1{% \XKV@ifundefined{\XKV@header\XKV@tkey}{}{% \XKV@knffalse \XKV@ifundefined{XKV@\XKV@header save}{}{% \expandafter\XKV@testsavekey\csname XKV@\XKV@header save\endcsname\XKV@tkey }% \ifXKV@rkv \ifXKV@sg\expandafter\global\fi\expandafter\let \csname XKV@\XKV@header\XKV@tkey @value\endcsname\XKV@tempa \fi \expandafter\XKV@replacepointers\expandafter{\XKV@tempa}% \ifx\@empty#1\@empty\XKV@afterelsefi \XKV@ifundefined{\XKV@header\XKV@tkey @default}{% \XKV@err{no value specified for key `\XKV@tkey'}% }{% \expandafter\expandafter\expandafter\XKV@default \csname\XKV@header\XKV@tkey @default\endcsname\@nil }% \else\XKV@afterfi \XKV@srstate{@\romannumeral\XKV@depth}{}% \csname\XKV@header\XKV@tkey\expandafter \endcsname\expandafter{\XKV@tempa}\relax \XKV@srstate{}{@\romannumeral\XKV@depth}% \fi }% } \def\XKV@testsavekey#1#2{% \ifXKV@rkv\else \XKV@for@o#1\XKV@resa{% \expandafter\XKV@ifcmd\expandafter{\XKV@resa}\global\XKV@resa{% \ifx#2\XKV@resa \XKV@rkvtrue\XKV@sgtrue \fi }{% \ifx#2\XKV@resa \XKV@rkvtrue\XKV@sgfalse \fi }% }% \fi } \def\XKV@replacepointers#1{% \let\XKV@tempa\@empty \let\XKV@resa\@empty \XKV@r@placepointers#1\usevalue\@nil } \def\XKV@r@placepointers#1\usevalue#2{% \XKV@addtomacro@n\XKV@tempa{#1}% \def\XKV@tempb{#2}% \ifx\XKV@tempb\@nnil\else\XKV@afterfi \XKV@ifundefined{XKV@\XKV@header#2@value}{% \XKV@err{no value recorded for key `#2'; ignored}% \XKV@r@placepointers }{% \@expandtwoargs\in@{,#2,}{,\XKV@resa,}% \ifin@\XKV@afterelsefi 
\XKV@err{back linking pointers; pointer replacement canceled}% \else\XKV@afterfi \XKV@addtolist@x\XKV@resa{#2}% \expandafter\expandafter\expandafter\XKV@r@placepointers \csname XKV@\XKV@header#2@value\endcsname \fi }% \fi } \def\XKV@default#1#2\@nil{% \expandafter\edef\expandafter\XKV@tempa \expandafter{\expandafter\@gobble\string#1}% \edef\XKV@tempb{\XKV@header\XKV@tkey}% \@onelevel@sanitize\XKV@tempb \ifx\XKV@tempa\XKV@tempb \begingroup \expandafter\def\csname\XKV@header\XKV@tkey\endcsname##1{% \gdef\XKV@tempa{##1}% }% \csname\XKV@header\XKV@tkey @default\endcsname \endgroup \XKV@ifundefined{XKV@\XKV@header save}{}{% \expandafter\XKV@testsavekey\csname XKV@\XKV@header save\endcsname\XKV@tkey }% \ifXKV@rkv \ifXKV@sg\expandafter\global\fi\expandafter\let \csname XKV@\XKV@header\XKV@tkey @value\endcsname\XKV@tempa \fi \expandafter\XKV@replacepointers\expandafter {\XKV@tempa}\XKV@afterelsefi \XKV@srstate{@\romannumeral\XKV@depth}{}% \expandafter#1\expandafter{\XKV@tempa}\relax \XKV@srstate{}{@\romannumeral\XKV@depth}% \else\XKV@afterfi \XKV@srstate{@\romannumeral\XKV@depth}{}% \csname\XKV@header\XKV@tkey @default\endcsname\relax \XKV@srstate{}{@\romannumeral\XKV@depth}% \fi } \def\setrmkeys{\XKV@testopta{\XKV@testoptc\XKV@setrmkeys}} \def\XKV@setrmkeys[#1]{% \def\XKV@tempa{\XKV@setkeys[#1]}% \expandafter\XKV@tempa\expandafter{\XKV@rm}% } \XKVcatcodes \endinput %% %% End of file `xkeyval.tex'. derivations-0.53.20120414.orig/tex/taylor.tex0000644000000000000000000032236111742566274017202 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The Taylor series} \label{taylor} \index{Taylor series} \index{series!Taylor} \index{Taylor, Brook (1685--1731)} \index{function!fitting of} The Taylor series is a power series which fits a function in a limited domain neighborhood. Fitting a function in such a way brings two advantages: \bi \item it lets us take derivatives and integrals in the same straightforward way~(\ref{drvtv:240:polyderivz}) we take them with any power series; and \item it implies a simple procedure to calculate the function numerically. \ei This chapter introduces the Taylor series and some of its incidents. It also derives Cauchy's integral formula. The chapter's early sections prepare the ground for the treatment of the Taylor series proper in \S~\ref{taylor:310}.% \footnote{ Because even at the applied level the proper derivation of the Taylor series involves mathematical induction, analytic continuation and the matter of convergence domains, no balance of rigor the chapter might strike seems wholly satisfactory. The chapter errs maybe toward too much rigor; for, with a little less, most of \S\S~\ref{taylor:314}, \ref{taylor:317}, \ref{taylor:320} and~\ref{taylor:330} would cease to be necessary. For the impatient, to read only the following sections might not be an unreasonable way to shorten the chapter: \S\S~% \ref{taylor:310}, \ref{taylor:325}, \ref{taylor:350}, \ref{taylor:315} and~% \ref{taylor:355}, plus the introduction of \S~\ref{taylor:314}. From another point of view, the chapter errs maybe toward too little rigor. Some pretty constructs of pure mathematics serve the Taylor series and Cauchy's integral formula. However, such constructs drive the applied mathematician on too long a detour. The chapter as written represents the most nearly satisfactory compromise the writer has been able to attain. 
} % ---------------------------------------------------------------------- \section{The power-series expansion of $1/(1-z)^{n+1}$} \label{taylor:314} \index{expansion of $1/(1-z)^{n+1}$} Before approaching the Taylor series proper in \S~\ref{taylor:310}, we shall find it both interesting and useful to demonstrate that \bq{taylor:314:70} \frac{1}{(1-z)^{n+1}} = \sum_{k=0}^{\infty} \cmb{n+k}{n} z^k, \ \ n\ge 0. \eq The demonstration comes in three stages. Of the three, it is the second stage (\S~\ref{taylor:314.20}) which actually proves~(\ref{taylor:314:70}). The first stage (\S~\ref{taylor:314.10}) comes up with the formula for the second stage to prove. The third stage (\S~\ref{taylor:314.25}) establishes the sum's convergence. In all the section, \[ i,j,k,m,n,K \in \mathbb Z. \] \subsection{The formula} \label{taylor:314.10} In \S~\ref{alggeo:228.30} we found that \[ \frac{1}{1-z} = \sum_{k=0}^{\infty} z^k = 1 + z + z^2 + z^3 + \cdots \] for $\left|z\right| < 1$. What about $1/(1-z)^2$, $1/(1-z)^3$, $1/(1-z)^4$, and so on? By the long-division procedure of Table~\ref{alggeo:228:tbl-up}, one can calculate the first few terms of $1/(1-z)^2$ to be \[ \frac{1}{(1-z)^2} = \frac{1}{1-2z+z^2} = 1 + 2z + 3z^2 + 4z^3 + \cdots \] whose coefficients $1,2,3,4,\ldots$ happen to be the numbers down the first diagonal of Pascal's triangle (Fig.~\ref{drvtv:pasc} on page~\pageref{drvtv:pasc}; see also Fig.~\ref{drvtv:pasc0}). Dividing $1/(1-z)^3$ seems to produce the coefficients $1,3,6,\mbox{0xA},\ldots$ down the second diagonal; dividing $1/(1-z)^4$, the coefficients down the third. A curious pattern seems to emerge, worth investigating more closely. The pattern recommends the conjecture~(\ref{taylor:314:70}). To motivate the conjecture a bit more formally (though without actually proving it yet), suppose that $1/(1-z)^{n+1}$, $n \ge 0$, is expandable in the power series \bq{taylor:314:10} \frac{1}{(1-z)^{n+1}} = \sum_{k=0}^{\infty} a_{nk} z^k, \eq where the~$a_{nk}$ are coefficients to be determined. Multiplying by $1-z$, we have that \[ \frac{1}{(1-z)^n} = \sum_{k=0}^{\infty} [a_{nk}-a_{n(k-1)}] z^k. \] This is to say that \[ a_{(n-1)k} = a_{nk}-a_{n(k-1)}, \] or in other words that \bq{taylor:314:30} a_{n(k-1)} + a_{(n-1)k} = a_{nk}. \eq Thinking of Pascal's triangle,~(\ref{taylor:314:30}) reminds one of~(\ref{drvtv:220:37}), transcribed here in the symbols \bq{taylor:314:50} \cmb{m-1}{j-1} + \cmb{m-1}{j} = \cmb{m}{j}, \eq except that~(\ref{taylor:314:30}) is not $a_{(m-1)(j-1)} + a_{(m-1)j} = a_{mj}$. \index{false try} Various changes of variable are possible to make~(\ref{taylor:314:50}) better match~(\ref{taylor:314:30}). We might try at first a few false ones, but eventually the change \[ \begin{split} n + k &\la m, \\ k &\la j, \end{split} \] recommends itself. Thus changing in~(\ref{taylor:314:50}) gives \[ \cmb{n+k-1}{k-1} + \cmb{n+k-1}{k} = \cmb{n+k}{k}. \] Transforming according to the rule~(\ref{drvtv:220:31}), this is \bq{taylor:314:51} \cmb{n+[k-1]}{n} + \cmb{[n-1]+k}{n-1} = \cmb{n+k}{n}, \eq which fits~(\ref{taylor:314:30}) perfectly. Hence we conjecture that \bq{taylor:314:60} a_{nk} = \cmb{n+k}{n}, \eq which coefficients, applied to~(\ref{taylor:314:10}), yield~(\ref{taylor:314:70}). Equation~(\ref{taylor:314:70}) is thus suggestive. It works at least for the important case of $n=0$; this much is easy to test. In light of~(\ref{taylor:314:30}), it seems to imply a relationship between the $1/(1-z)^{n+1}$ series and the $1/(1-z)^n$ series for any~$n$. 
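Indeed, the case $n=1$ is nearly as easy to test as the case $n=0$ was: for $n=1$, the conjectured coefficients are
\[
	\cmb{1+k}{1} = 1, 2, 3, 4, \ldots,
\]
precisely the coefficients the long division of $1/(1-z)^2$ has already turned up above.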
But \emph{to seem} is not \emph{to be.} At this point, all we can say is that~(\ref{taylor:314:70}) seems right. We will establish that it is right in the next subsection. \subsection{The proof by induction} \label{taylor:314.20} \index{induction} \index{proof!by induction} Equation~(\ref{taylor:314:70}) is proved by induction as follows. Consider the sum \bq{taylor:315:10} S_n \equiv \sum_{k=0}^{\infty} \cmb{n+k}{n} z^k. \eq Multiplying by $1-z$ yields \[ (1-z)S_n = \sum_{k=0}^{\infty} \left[ \cmb{n+k}{n} - \cmb{n+[k-1]}{n} \right] z^k. \] Per~(\ref{taylor:314:51}), this is \bq{taylor:315:11} (1-z)S_n = \sum_{k=0}^{\infty} \cmb{[n-1]+k}{n-1} z^k. \eq Now suppose that~(\ref{taylor:314:70}) is true for $n=i-1$ (where~$i$ denotes an integer rather than the imaginary unit): \bq{taylor:315:15} \frac{1}{(1-z)^i} = \sum_{k=0}^{\infty} \cmb{[i-1]+k}{i-1} z^k. \eq In light of~(\ref{taylor:315:11}), this means that \[ \frac{1}{(1-z)^i} = (1-z)S_i. \] Dividing by $1-z$, \[ \frac{1}{(1-z)^{i+1}} = S_i. \] Applying~(\ref{taylor:315:10}), \bq{taylor:315:20} \frac{1}{(1-z)^{i+1}} = \sum_{k=0}^{\infty} \cmb{i+k}{i} z^k. \eq Evidently~(\ref{taylor:315:15}) implies~(\ref{taylor:315:20}). In other words, if~(\ref{taylor:314:70}) is true for $n=i-1$, then it is also true for $n=i$. Thus \emph{by induction,} if it is true for any one~$n$, then it is also true for all greater~$n$. The ``if'' in the last sentence is important. Like all inductions, this one needs at least one \emph{start case} to be valid (many inductions actually need a consecutive pair of start cases). The $n=0$ supplies the start case \[ \frac{1}{(1-z)^{0+1}} = \sum_{k=0}^{\infty} \cmb{k}{0} z^k = \sum_{k=0}^{\infty} z^k, \] which per~(\ref{alggeo:228:45}) we know to be true. \subsection{Convergence} \label{taylor:314.25} \index{convergence} \index{iff} \index{Weierstrass, Karl Wilhelm Theodor (1815--1897)} The question remains as to the domain over which the sum~(\ref{taylor:314:70}) converges.% \footnote{ The meaning of the verb \emph{to converge} may seem clear enough from the context and from earlier references, but if explanation here helps: a series converges if and only if it approaches a specific, finite value after many terms. A more rigorous way of saying the same thing is as follows: the series \[ S = \sum_{k=0}^{\infty} \tau_k \] converges iff (if and only if), for all possible positive constants~$\ep$, there exists a finite $K \ge -1$ such that \[ \left|\sum_{k=K+1}^{n} \tau_k \right| < \ep, \] for all $n \ge K$ (of course it is also required that the~$\tau_k$ be finite, but you knew that already). The professional mathematical literature calls such convergence ``uniform convergence,'' distinguishing it through a test devised by Weierstrass from the weaker ``pointwise convergence'' \cite[\S~1.5]{Andrews}\@. The applied mathematician can profit substantially by learning the professional view in the matter, but the effect of trying to teach the professional view in a book like this would not be pleasing. Here, we avoid error by keeping a clear view of the physical phenomena the mathematics is meant to model. It is interesting nevertheless to consider an example of an integral for which convergence is not so simple, such as Frullani's integral of \S~\ref{inttx:460}. } To answer the question, consider that per~(\ref{drvtv:220:44}), \[ \cmb{m}{j} = \frac{m}{m-j}\cmb{m-1}{j}, \ \ m > 0. 
\] With the substitution $n + k \la m$, $n \la j$, this means that \[ \cmb{n+k}{n} = \frac{n+k}{k} \cmb{n+[k-1]}{n}, \] or more tersely, \[ a_{nk} = \frac{n+k}{k} a_{n(k-1)}, \] where \[ a_{nk} \equiv \cmb{n+k}{n} \] are the coefficients of the power series~(\ref{taylor:314:70}). Rearranging factors, \bq{taylor:314:80} \frac{a_{nk}}{a_{n(k-1)}} = \frac{n+k}{k} = 1 + \frac{n}{k}. \eq \index{majorization} Multiplying~(\ref{taylor:314:80}) by $z^k/z^{k-1}$ gives the ratio \[ \frac{a_{nk}z^k}{a_{n(k-1)}z^{k-1}} = \left(1 + \frac{n}{k}\right)z, \] which is to say that the $k$th term of~(\ref{taylor:314:70}) is $(1 + n/k)z$ times the $(k-1)$th term. So long as the criterion% \footnote{ Although one need not ask the question to understand the proof, the reader may nevertheless wonder why the simpler $\left|(1 + n/k)z\right| < 1$ is not given as a criterion. The surprising answer is that not all series $\sum \tau_k$ with $\left|\tau_k/\tau_{k-1}\right| < 1$ converge! For example, the extremely simple $\sum 1/k$ does not converge. As we see however, all series $\sum \tau_k$ with $\left|\tau_k/\tau_{k-1}\right| < 1-\delta$ do converge. The distinction is subtle but rather important. The really curious reader may now ask why $\sum 1/k$ does not converge. Answer: it \emph{majorizes} $\int_1^{x} (1/\tau)\,d\tau = \ln x$. See~(\ref{cexp:225:dln}) and \S~\ref{taylor:316}. } \[ \left|\left(1 + \frac{n}{k}\right)z\right| \le 1-\delta \] is satisfied for all sufficiently large $k>K$---where $0 < \delta \ll 1$ is a small positive constant---then the series evidently converges (see \S~\ref{alggeo:228.30} and eqn.~\ref{trig:278:triangle}). But we can bind $1 + n/k$ as close to unity as desired by making~$K$ sufficiently large, so to meet the criterion it suffices that \bq{taylor:314:83} \left|z\right|<1. \eq The bound~(\ref{taylor:314:83}) thus establishes a sure convergence domain for~(\ref{taylor:314:70}). \subsection{General remarks on mathematical induction} \label{taylor:314.30} We have proven~(\ref{taylor:314:70}) by means of a mathematical induction. The virtue of induction as practiced in \S~\ref{taylor:314.20} is that it makes a logically clean, airtight case for a formula. Its vice is that it conceals the subjective process which has led the mathematician to consider the formula in the first place. Once you obtain a formula somehow, maybe you can prove it by induction; but the induction probably does not help you to obtain the formula! A good inductive proof usually begins by motivating the formula proven, as in \S~\ref{taylor:314.10}. \index{Hamming, Richard~W. (1915--1998)} \index{rigor} \index{applied mathematics} \index{mathematics!applied} Richard~W.\ Hamming once said of mathematical induction, \begin{quote} The theoretical difficulty the student has with mathematical induction arises from the reluctance to ask seriously, ``How could I prove a formula for an infinite number of cases when I know that testing a finite number of cases is not enough?'' Once you really face this question, you will understand the ideas behind mathematical induction. It is only when you grasp the problem clearly that the method becomes clear.\ \cite[\S~2.3]{Hamming} \end{quote} Hamming also wrote, \begin{quote} The function of rigor is mainly critical and is seldom constructive. 
Rigor is the hygiene of mathematics, which is needed to protect us against careless thinking.\ \cite[\S~1.6]{Hamming} \end{quote} The applied mathematician may tend to avoid rigor for which he finds no immediate use, but he does not disdain mathematical rigor on principle. The style lies in exercising rigor at the right level for the problem at hand. Hamming, a professional mathematician who sympathized with the applied mathematician's needs, wrote further, \begin{quote} Ideally, when teaching a topic the degree of rigor should follow the student's perceived need for it\mdots It is necessary to require a gradually rising level of rigor so that when faced with a real need for it you are not left helpless. As a result, [one cannot teach] a uniform level of rigor, but rather a gradually rising level. Logically, this is indefensible, but psychologically there is little else that can be done.\ \cite[\S~1.6]{Hamming} \end{quote} Applied mathematics holds that the practice \emph{is} defensible, on the ground that the math serves the model; but Hamming nevertheless makes a pertinent point. Mathematical induction is a broadly applicable technique for constructing mathematical proofs. We will not always write inductions out as explicitly in this book as we have done in the present section---often we will leave the induction as an implicit exercise for the interested reader---but this section's example at least lays out the general pattern of the technique. % ---------------------------------------------------------------------- \section{Shifting a power series' expansion point} \label{taylor:317} \index{Taylor series!converting a power series to} \index{expansion point!shifting of} \index{shifting an expansion point} \index{power series!shifting the expansion point of} One more question we should treat before approaching the Taylor series proper in \S~\ref{taylor:310} concerns the shifting of a power series' expansion point. How can the expansion point of the power series \bqa f(z) &=& \sum_{k=K}^{\infty} (a_k)(z-z_o)^k, \label{taylor:317:30} \\ \ds (k,K) &\in& \mathbb Z, \ \ K\le 0, \xn \eqa which may have terms of negative order, be shifted from $z=z_o$ to $z=z_1$? The first step in answering the question is straightforward: one rewrites % bad break (\ref{taylor:317:30}) in the form \[ f(z) = \sum_{k=K}^{\infty} (a_k)([z-z_1]-[z_o-z_1])^k, \] then changes the variables \bq{taylor:317:50} \begin{split} w &\la \frac{z-z_1}{z_o-z_1}, \\ c_k &\la [-(z_o-z_1)]^ka_k, \end{split} \eq to obtain \bq{taylor:317:51} f(z) = \sum_{k=K}^{\infty} (c_k)(1-w)^k. \eq Splitting the $k<0$ terms from the $k \ge 0$ terms in~(\ref{taylor:317:51}), we have that \bqa f(z) &=& f_-(z) + f_+(z), \label{taylor:317:52} \\ f_-(z) &\equiv& \sum_{k=0}^{-(K+1)} \frac{c_{[-(k+1)]}}{(1-w)^{k+1}}, \xn\\ f_+(z) &\equiv& \sum_{k=0}^{\infty} (c_k)(1-w)^k. \xn \eqa Of the two subseries, the $f_-(z)$ is expanded term by term using~(\ref{taylor:314:70}), after which combining like powers of~$w$ yields the form \bq{taylor:317:52q} \begin{split} f_-(z) &= \sum_{k=0}^{\infty} q_k w^k, \\ q_k &\equiv \sum_{n=0}^{-(K+1)} (c_{[-(n+1)]})\cmb{n+k}{n}. \end{split} \eq The $f_+(z)$ is even simpler to expand: one need only multiply the series out term by term per~(\ref{drvtv:230:binthe}), combining like powers of~$w$ to reach the form \bq{taylor:317:52p} \begin{split} f_+(z) &= \sum_{k=0}^{\infty} p_k w^k, \\ p_k &\equiv \sum_{n=k}^{\infty}(c_n)\cmb{n}{k}. 
\end{split} \eq Equations~(\ref{taylor:317:30}) through~(\ref{taylor:317:52p}) serve to shift a power series' expansion point, calculating the coefficients of a power series for $f(z)$ about $z=z_1$, given those of a power series about $z=z_o$. Notice that---unlike the original, $z=z_o$ power series---the new, $z=z_1$ power series has terms $(z-z_1)^k$ only for $k\ge 0$; it has no terms of negative order. At the price per~(\ref{taylor:314:83}) of restricting the convergence domain to $\left|w\right|<1$, shifting the expansion point away from the pole at $z=z_o$ has resolved the $k<0$ terms. \index{forbidden point} The method fails if $z=z_1$ happens to be a pole or other nonanalytic point of $f(z)$. The convergence domain vanishes as~$z_1$ approaches such a forbidden point. (Examples of such forbidden points include $z=0$ in $h[z] = 1/z$ and in $g[z]=\sqrt z$. See \S\S~\ref{taylor:320} through~\ref{taylor:350}.) Furthermore, even if~$z_1$ does represent a fully analytic point of $f(z)$, it also must lie within the convergence domain of the original, $z=z_o$ series for the shift to be trustworthy as derived. The attentive reader might observe that we have formally established the convergence neither of $f_-(z)$ in~(\ref{taylor:317:52q}) nor of $f_+(z)$ in~(\ref{taylor:317:52p}). Regarding the former convergence, that of $f_-(z)$, we have strategically framed the problem so that one needn't worry about it, running the sum in~(\ref{taylor:317:30}) from the finite $k=K \le 0$ rather than from the infinite $k=-\infty$; and since according to~(\ref{taylor:314:83}) each term of the original $f_-(z)$ of~(\ref{taylor:317:52}) converges for $\left|w\right|<1$, the reconstituted $f_-(z)$ of~(\ref{taylor:317:52q}) safely converges in the same domain. The latter convergence, that of $f_+(z)$, is harder to establish in the abstract because that subseries has an infinite number of terms. As we will see by pursuing a different line of argument in \S~\ref{taylor:310}, however, the $f_+(z)$ of~(\ref{taylor:317:52p}) can be nothing other than the Taylor series about $z=z_1$ of the function $f_+(z)$ in any event, enjoying the same convergence domain any such Taylor series enjoys.% \footnote{ A rigorous argument can be constructed without appeal to \S~\ref{taylor:310} if desired, from the ratio $n/(n-k)$ of~(\ref{drvtv:220:44}) and its brethren, which ratio approaches unity with increasing~$n$. A more elegant rigorous argument can be made indirectly by way of a complex contour integral. In applied mathematics, however, one does not normally try to shift the expansion point of an \emph{unspecified} function $f(z)$, anyway. Rather, one shifts the expansion point of some concrete function like $\sin z$ or $\ln(1-z)$. The imagined difficulty (if any) vanishes in the concrete case. Appealing to \S~\ref{taylor:310}, the important point is the one made in the narrative: $f_+(z)$ can be nothing other than the Taylor series in any event. } % ---------------------------------------------------------------------- \section{Expanding functions in Taylor series} \label{taylor:310} \index{Taylor series} \index{series!Taylor} \index{Taylor, Brook (1685--1731)} Having prepared the ground, we now stand in position to treat the Taylor series proper. The treatment begins with a question: if you had to express some function $f(z)$ by a power series \[ f(z) = \sum_{k=0}^{\infty} (a_k)(z-z_o)^k, \] with terms of nonnegative order $k \ge 0$ only, how would you do it? 
The procedure of \S~\ref{taylor:314} worked well enough in the case of $f(z)=1/(1-z)^{n+1}$, but it is not immediately obvious that the same procedure works more generally. What if $f(z)=\sin z$, for example?% \footnote{ The actual Taylor series for $\sin z$ is given in \S~\ref{taylor:315}. } Fortunately a different way to attack the power-series expansion problem is known. It works by asking the question: what power series, having terms of nonnegative order only, most resembles $f(z)$ in the immediate neighborhood of $z=z_o$? To resemble $f(z)$, the desired power series should have $a_0 = f(z_o)$; otherwise it would not have the right value at $z=z_o$. Then it should have $a_1=f'(z_o)$ for the right slope. Then, $a_2=f''(z_o)/2$ for the right second derivative, and so on. With this procedure, \bq{taylor:310:20} f(z) = \sum_{k=0}^{\infty} \left(\left.\frac {d^kf} {dz^k}\right|_{z=z_o}\right)\frac{(z-z_o)^k}{k!}. \eq Equation~(\ref{taylor:310:20}) is the \emph{Taylor series.} Where it converges, it has all the same derivatives $f(z)$ has, so if $f(z)$ is infinitely differentiable then the Taylor series is an exact representation of the function.% % diagn: this long footnote would like one last review. \footnote{\label{taylor:310:fn1}% The professional mathematician reading such words is likely to blanch. To him, such words want rigor. To him, such words want pages and whole chapters \cite{Arnold:1997}\cite{Fisher}\cite{Spiegel}\cite{Hildebrand} of rigor. Before submitting unreservedly to the professional's scruples in the matter, however, let us not forget (\S~\ref{intro:284.2}) that the professional's approach is founded in postulation, whereas that ours is founded in physical metaphor. Our means differ from his for this reason alone. Still, an alternate way to think about the Taylor series' sufficiency might interest some readers. It begins with an infinitely differentiable function $F(z)$ and its Taylor series $f(z)$ about~$z_o$, letting $\Delta F(z) \equiv F(z) - f(z)$ be the part of $F(z)$ not representable as a Taylor series about~$z_o$. If $\Delta F(z)$ is the part of $F(z)$ not representable as a Taylor series, then $\Delta F(z_o)$ and all its derivatives at~$z_o$ must be identically zero (otherwise by the Taylor series formula of eqn.~\ref{taylor:310:20}, one could construct a nonzero Taylor series for $\Delta F[z_o]$ from the nonzero derivatives). However, if $F(z)$ is infinitely differentiable and if \emph{all} the derivatives of $\Delta F(z)$ are zero at $z=z_o$ then, by the unbalanced definition of the derivative from \S~\ref{drvtv:240}, all the derivatives must also be zero at $z=z_o\pm\ep$, hence also at $z=z_o\pm2\ep$, and so on. This means that $\Delta F(z) = 0$. In other words, there is no part of $F(z)$ not representable as a Taylor series. A more formal way to make the same argument would be to suppose that $\left. d^n\,\Delta F/dz^n \right|_{z=z_o+\ep} = h$ for some integer $n \ge 0$; whereas that this would mean that $\left. d^{n+1}\,\Delta F/dz^{n+1} \right|_{z=z_o} = h/\ep$; but that, inasmuch as the latter is one of the derivatives of $\Delta F(z)$ at $z=z_o$, it follows that $h=0$. The interested reader can fill the details in, but basically that is how the alternate argument begins. 
%One the one hand, in some respects the argument is %not very good, for it does not directly take into account the %prospect of an ill-behaved function \cite[``Extremum'']{EWW} like %$f(z) = \sin(1/z)$ for which it is not clear that a proper Taylor %series even exists (nor are such functions merely of academic %interest: see %[section not yet written].% %) %On the other hand, the ill-behaved function $f(z) = \sin(1/z)$ %behaves well enough after the obvious change of variable $u \la 1/x$; %and, anyway After all, if the first, second, third derivatives and so forth, evaluated at some expansion point, indicate anything at all, then must they not indicate how the function and its several derivatives will evolve from that point? And if they do indicate that, then what could null derivatives indicate but null evolution? Yet such arguments even if accepted are satisfactory only from a certain point of view, and yet slightly less so once one considers the asymptotic series of % diagn [chapter not yet written] later in the book. A more elegant rigorous argument, preferred by the professional mathematicians~\cite{Kohler-lecture}\cite{Arnold:1997}\cite{Fisher} but needing significant theoretical preparation, involves integrating over a complex contour about the expansion point. Appendix~\ref{purec} sketches that proof. The discussion of rigor is confined here to a footnote not to deprecate rigor as such, but to deprecate insistence on rigor which serves little known purpose in applications. Applied mathematicians normally regard mathematical functions to be imprecise analogs of, or metaphors for, physical quantities of interest. Since the functions are imprecise analogs in any case, the applied mathematician is logically free implicitly \emph{to define} the functions he uses as Taylor series in the first place; that is, to restrict the set of infinitely differentiable functions used in the model to the subset of such functions representable as Taylor series. With such an implicit definition, whether there actually exist any infinitely differentiable functions not representable as Taylor series is more or less beside the point---at least until a concrete need for such a hypothetical function should present itself. In applied mathematics, the definitions serve the model, not the other way around. Other than to divert the interested reader to Appendix~\ref{purec}, this footnote will leave the matter in that form. (It is entertaining incidentally to consider \cite[``Extremum'']{EWW} the Taylor series of the function $\sin[1/x]$---although in practice this particular function is readily expanded after the obvious change of variable $u \la 1/x$.) } \index{Maclaurin series} \index{Maclaurin, Colin (1698--1746)} The Taylor series is not guaranteed to converge outside some neighborhood near $z=z_o$, but where it does converge it is precise. When $z_o=0$, the series is also called the \emph{Maclaurin series.} By either name, the series is a construct of great importance and tremendous practical value, as we shall soon see. 
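Before moving on, we can try~(\ref{taylor:310:20}) on a function whose expansion \S~\ref{alggeo:228.30} has already given us, namely $f(z)=1/(1-z)$ about $z_o=0$. The function's $k$th derivative is $d^kf/dz^k = k!/(1-z)^{k+1}$, which at $z=z_o=0$ evaluates to $k!$, so~(\ref{taylor:310:20}) yields
\[
	\frac{1}{1-z}
	= \sum_{k=0}^{\infty} (k!)\frac{(z-0)^k}{k!}
	= \sum_{k=0}^{\infty} z^k,
\]
recovering the familiar geometric series (a Maclaurin series, since $z_o=0$), just as it should.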
% ---------------------------------------------------------------------- \section{Analytic continuation} \label{taylor:320} \index{analytic continuation} \index{analytic function} \index{function!analytic} \index{infinite differentiability} \index{Taylor series!transposition of to a different expansion point} As earlier mentioned in \S~\ref{alggeo:225.3}, an \emph{analytic function} is a function which is infinitely differentiable in the domain neighborhood of interest---or, maybe more appropriately for our applied purpose, a function expressible as a Taylor series in that neighborhood. As we have seen, only one Taylor series about~$z_o$ is possible for a given function $f(z)$: \[ f(z) = \sum_{k=0}^{\infty} (a_k) (z-z_o)^k. \] However, nothing prevents one from transposing the series to a different expansion point $z=z_1$ by the method of \S~\ref{taylor:317}, except that the transposed series may there enjoy a different convergence domain. As it happens, this section's purpose finds it convenient to swap symbols $z_o \lra z_1$, transposing rather from expansion about $z=z_1$ to expansion about $z=z_o$. In the swapped notation, so long as the expansion point $z=z_o$ lies fully within (neither outside nor right on the edge of) the $z=z_1$ series' convergence domain, the two series evidently describe the selfsame underlying analytic function. \index{neighborhood} \index{domain neighborhood} Since an analytic function $f(z)$ is infinitely differentiable and enjoys a unique Taylor expansion $f_o(z-z_o)=f(z)$ about each point~$z_o$ in its domain, it follows that if two Taylor series $f_1(z-z_1)$ and $f_2(z-z_2)$ find even a small neighborhood $\left|z-z_o\right| < \ep$ which lies in the domain of both, then the two can both be transposed to the common $z=z_o$ expansion point. If the two are found to have the same Taylor series there, then~$f_1$ and~$f_2$ both represent the same function. Moreover, if a series~$f_3$ is found whose domain overlaps that of~$f_2$, then a series~$f_4$ whose domain overlaps that of~$f_3$, and so on, and if each pair in the chain matches at least in a small neighborhood in its region of overlap, then the whole chain of overlapping series necessarily represents the same underlying analytic function $f(z)$. The series~$f_1$ and the series~$f_n$ represent the same analytic function even if their domains do not directly overlap at all. \index{pole} \index{nonanalytic point} \index{forbidden point} \index{single-valued function} \index{multiple-valued function} \index{function!single- and multiple-valued} \index{Argand domain and range planes} This is a manifestation of the principle of \emph{analytic continuation.} The principle holds that if two analytic functions are the same within some domain neighborhood $\left|z-z_o\right| < \ep$, then they are the same everywhere.% \footnote{ The writer hesitates to mention that he is given to understand~\cite{Spiegel} that the domain neighborhood can technically be reduced to a domain contour of nonzero length but zero width. Having never met a significant application of this extension of the principle, the writer has neither researched the extension's proof nor asserted its truth. He does not especially recommend that the reader worry over the point. The domain neighborhood $\left|z-z_o\right| < \ep$ suffices. } Observe however that the principle fails at poles and other nonanalytic points, because the function is not differentiable there. 
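A concrete instance of the principle may help. About $z=0$, the function $1/(1-z)$ expands as the geometric series $\sum_{k=0}^{\infty} z^k$ of \S~\ref{alggeo:228.30}, convergent only for $\left|z\right|<1$; about $z=-1$, the same function expands as
\[
	\frac{1}{1-z}
	= \frac{1}{2-(z+1)}
	= \sum_{k=0}^{\infty} \frac{(z+1)^k}{2^{k+1}},
\]
convergent for $\left|z+1\right|<2$. In a neighborhood where both series converge (about $z=-1/2$, for example) the two agree, so per the principle they represent the selfsame underlying analytic function; the second series merely reaches parts of the domain the first cannot. Neither expansion, of course, can be centered on the function's pole at $z=1$.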
The result of \S~\ref{taylor:317}, which shows general power series to be expressible as Taylor series except at their poles and other nonanalytic points, extends the analytic continuation principle to cover power series in general, including power series with terms of negative order. Now, observe: though all convergent power series are indeed analytic, one need not actually expand every analytic function in a power series. Sums, products and ratios of analytic functions are no less differentiable than the functions themselves---as also, by the derivative chain rule, is an analytic function of analytic functions. For example, where $g(z)$ and $h(z)$ are analytic, there also is $f(z) \equiv g(z)/h(z)$ analytic (except perhaps at isolated points where $h[z] = 0$). Besides, given Taylor series for $g(z)$ and $h(z)$ one can make a power series for $f(z)$ by long division if desired, so that is all right. Section~\ref{taylor:385} speaks further on the point. The subject of analyticity is rightly a matter of deep concern to the professional mathematician. It is also a long wedge which drives pure and applied mathematics apart. When the professional mathematician speaks generally of a ``function,'' he means \emph{any function at all.} One can construct some pretty unreasonable functions if one wants to, such as \bqb f( [2k+1]{2^m} ) &\equiv& (-)^m, \ \ (k,m) \in \mathbb Z; \\ f(z) &\equiv& 0\ \mbox{otherwise.} \eqb However, neither functions like this~$f(z)$ nor more subtly unreasonable functions normally arise in the modeling of physical phenomena. When such functions do arise, one transforms, approximates, reduces, replaces and/or avoids them. The full theory which classifies and encompasses---or explicitly excludes---such functions is thus of limited interest to the applied mathematician, and this book does not cover it.% \footnote{\label{taylor:320:fn10}% Many books do cover it in varying degrees, including~\cite{Fisher}\cite{Spiegel}\cite{Hildebrand} and numerous others. The foundations of the pure theory of a complex variable, though abstract, are beautiful, and though they do not comfortably fit a book like this even an applied mathematician can profit substantially by studying them. The few pages of Appendix~\ref{purec} trace only the pure theory's main thread. However that may be, the pure theory is probably best appreciated after one already understands its chief conclusions. Though not for the explicit purpose of serving the pure theory, the present chapter does develop just such an understanding. } %% Hm. I have mixed my metaphors here. How to unmix them? %Though any mathematician, pure or applied, can profit substantially by %studying the pure theory of analytic functions of a complex variable, %the applied mathematician is advised not to let the pure theory snare %him---or, at least, the reader of this book is advised not to let it %snare him. The pure theory is a deep hole to dive into. One finds %oneself---splash!---underwater down there, swimming complex contours of %various dimly seen shapes in pursuit of the relevant theorems, which %theorems in turn buoy other theorems, unrooted in application. The %fascination of the pure theory, the daunting aspect of the hole, %conspire to delay and often effectively to prevent the applied %mathematician from exploring the day-lit mathematical landscape past the %hole's far rim. That's not good. Avoiding the hole itself, this book %leads the reader about the rim along a safe path at a prudent pace, then %onward. 
% %To the reader who still wants to dive straightway into the hole: the %writer salutes him. Will he hear one more word of counsel? Applied %mathematics usually concerns itself with the specific functions used to %model some physical phenomenon, not with functions generally.% %\footnote{ % There is the question of a solution's uniqueness, which asks whether % any function other than a particular $f(z)$ solves a given set of % equations. However, the question is typically answered by % contradiction, assuming falsely that there existed an $\tilde f(z) % \neq f(z)$, then forming the difference $\Delta f(z) \equiv \tilde % f(z) - f(z)$ and showing that $\Delta f(z)$ cannot but everywhere be % null. %} %Before the reader dives, let him consider whether there is some specific %function he knows, useful in modeling a physical phenomenon, which %function the applied theory inadequately treats, for which function one %cannot or should not rearrange the problem to evade the difficulty. %Section~\ref{taylor:385} contrives some ideas along these lines, but at %the moment the writer writes these words, he can think of no such %function. The reader who, too, can think of none% %\footnote{ % The reader who \emph{can} think of one is most cordially asked to % e-mail it to the author, at the address on the title page's back. %} %might consider delaying the dive. This does not mean that the scientist or engineer never encounters nonanalytic functions. On the contrary, he encounters several, but they are not subtle: $\left|z\right|$; $\arg z$; $z^{*}$; $\Re(z)$; $\Im(z)$; $u(t)$; $\delta(t)$. Refer to \S\S~\ref{alggeo:225} and~\ref{integ:670}. Such functions are nonanalytic either because they lack proper derivatives in the Argand plane according to~(\ref{drvtv:defz}) or because one has defined them only over a real domain. % ---------------------------------------------------------------------- \section{Branch points} \label{taylor:325} \index{branch point} The function $g(z)=\sqrt z$ is an interesting, troublesome function. Its derivative is $dg/dz = 1/2\sqrt z$, so even though the function is finite at $z=0$, its derivative is not finite there. Evidently $g(z)$ has a nonanalytic point at $z=0$, yet the point is not a pole. What is it? \index{contour} \index{domain contour} \index{range contour} We call it a \emph{branch point.} The defining characteristic of the branch point is that, given a function $f(z)$ with such a point at $z=z_o$, if one encircles% \footnote{ For readers whose native language is not English, ``to encircle'' means ``to surround'' or ``to enclose.'' The verb does not require the boundary to have the shape of an actual, geometrical circle; any closed shape suffices. However, the circle is a typical shape, probably the most fitting shape to imagine when thinking of the concept abstractly. } the point once alone (that is, without also encircling some other branch point) by a closed contour in the Argand domain plane, while simultaneously tracking $f(z)$ in the Argand range plane---and if one demands that~$z$ and $f(z)$ move smoothly, that neither suddenly skip from one spot to another---then one finds that $f(z)$ ends in a different place than it began, even though~$z$ itself has returned precisely to its own starting point. The range contour remains open even though the domain contour is closed. 
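
The reader who cares to watch the syndrome happen can do so numerically. The
following fragment, written in Python (a sketch only; the thousand-step
contour is an arbitrary choice), walks~$z$ once counterclockwise about the
unit circle while tracking $g(z)=\sqrt z$ smoothly, at each step accepting
whichever of the two available square roots lies nearer the value last
accepted.
\begin{verbatim}
# A numerical sketch of the branch point of g(z) = sqrt(z) at z = 0
# (illustrative only).  The domain contour is the unit circle,
# traversed once; the range value w is tracked smoothly by always
# choosing, of the two square roots of z, the root nearer the last w.
import cmath

M = 1000                       # steps about the closed domain contour
w = 1.0 + 0.0j                 # g(1), the starting range value
for m in range(1, M + 1):
    z = cmath.exp(2j * cmath.pi * m / M)  # a point on the unit circle
    r = cmath.sqrt(z)                     # one of z's two square roots
    w = r if abs(r - w) < abs(-r - w) else -r
print(w)                       # approximately -1, though z is back at 1
\end{verbatim}
The printed $g(z)$ comes to $-1$ although~$z$ has returned to exactly where
it began; a second trip about the contour would restore $g(z)$ to $+1$.
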
\begin{quote} In complex analysis, a branch point may be thought of informally as a point~$z_o$ at which a ``multiple-valued function'' changes values when one winds once around~$z_o$.\ \cite[``Branch point,'' 18:10, 16 May 2006]{wikip} \end{quote} \index{single-valued function} \index{multiple-valued function} \index{function!single- and multiple-valued} An analytic function like $g(z)=\sqrt z$ having a branch point evidently is not single-valued. It is multiple-valued. For a single~$z$ more than one distinct $g(z)$ is possible. \index{pole} An analytic function like $h(z)=1/z$, by contrast, is single-valued even though it has a pole. This function does not suffer the syndrome described. When a domain contour encircles a pole, the corresponding range contour is properly closed. Poles do not cause their functions to be multiple-valued and thus are not branch points. Evidently $f(z)\equiv(z-z_o)^a$ has a branch point at $z=z_o$ if and only if~$a$ is not an integer. If $f(z)$ does have a branch point---if~$a$ is not an integer---then the mathematician must draw a distinction between $z_1=z_o+\rho e^{i\phi}$ and $z_2=z_o+\rho e^{i(\phi+2\pi)}$, \emph{even though the two are exactly the same number.} Indeed $z_1=z_2$, but paradoxically $f(z_1)\neq f(z_2)$. This is difficult. It is confusing, too, until one realizes that the fact of a branch point says nothing whatsoever about the argument~$z$. As far as~$z$ is concerned, there really is no distinction between $z_1=z_o+\rho e^{i\phi}$ and $z_2=z_o+\rho e^{i(\phi+2\pi)}$---none at all. What draws the distinction is the multiple-valued function $f(z)$ which uses the argument. \index{Pfufnik, Gorbag} It is as though I had a mad colleague who called me Thaddeus Black, until one day I happened to walk past behind his desk (rather than in front as I usually did), whereupon for some reason he began calling me Gorbag Pfufnik. I had not changed at all, but now the colleague calls me by a different name. The change isn't really in me, is it? It's in my colleague, who seems to suffer a branch point. If it is important to me to be sure that my colleague really is addressing me when he cries, ``Pfufnik!'' then I had better keep a running count of how many times I have turned about his desk, hadn't I, even though the number of turns is personally of no import to me. \index{branch point!strategy to avoid} The usual analysis strategy when one encounters a branch point is simply to avoid the point. Where an integral follows a closed contour as in \S~\ref{taylor:350}, the strategy is to compose the contour to exclude the branch point, to shut it out. Such a strategy of avoidance usually prospers.% \footnote{ Traditionally associated with branch points in complex variable theory are the notions of \emph{branch cuts} and \emph{Riemann sheets.} These ideas are interesting, but are not central to the analysis as developed in this book and are not covered here. The interested reader might consult a book on complex variables or advanced calculus like~\cite{Hildebrand}, among many others. } % ---------------------------------------------------------------------- \section{Entire and meromorphic functions} \label{taylor:330} \index{function!entire} \index{function!meromorphic} \index{entire function} \index{meromorphic function} Though an applied mathematician is unwise to let abstract definitions enthrall his thinking, pure mathematics nevertheless brings some technical definitions the applied mathematician can use. 
Two such are the definitions of \emph{entire} and \emph{meromorphic} functions.% \footnote{\cite{EWW}} A function $f(z)$ which is analytic for all finite~$z$ is an \emph{entire function.} Examples include $f(z) = z^2$ and $f(z) = \exp z$, but not $f(z) = 1/z$ which has a pole at $z=0$. \index{essential singularity} \index{singularity!essential} A function $f(z)$ which is analytic for all finite~$z$ except at isolated poles (which can be $n$-fold poles if~$n$ is a finite, positive integer), which has no branch points, of which no circle of finite radius in the Argand domain plane encompasses an infinity of poles, is a \emph{meromorphic function.} Examples include $f(z) = 1/z$, $f(z) = 1/(z+2) + 1/(z-1)^3 + 2z^2$ and $f(z) = \tan z$---the last of which has an infinite number of poles, but of which the poles nowhere cluster in infinite numbers. The function $f(z) = \tan (1/z)$ is not meromorphic since it has an infinite number of poles within the Argand unit circle. Even the function $f(z) = \exp (1/z)$ is not meromorphic: it has only the one, isolated nonanalytic point at $z=0$, and that point is no branch point; but the point is an \emph{essential singularity,} having the character of an infinitifold ($\infty$-fold) pole.% \footnote{\cite{Kohler-lecture}} If it seems unclear that the singularities of $\tan z$ are actual poles, incidentally, then consider that \[ \tan z = \frac{\sin z}{\cos z} = -\frac{\cos w}{\sin w}, \] wherein we have changed the variable \[ w \la z - (2n+1)\frac{2\pi}{4}, \ \ n \in \mathbb Z. \] Section~\ref{taylor:315} and its Table~\ref{taylor:315:tbl}, below, give Taylor series for $\cos z$ and $\sin z$, with which \[ \tan z = \frac{-1+w^2/2-w^4/\mbox{0x18}-\cdots}{w-w^3/6+w^5/\mbox{0x78}-\cdots}. \] By long division, \[ \tan z = -\frac{1}{w} + \frac{w/3-w^3/\mbox{0x1E}+\cdots}{1-w^2/6+w^4/\mbox{0x78}-\cdots}. \] (On the other hand, if it is unclear that $z = [2n+1][2\pi/4]$ are the only singularities $\tan z$ has---that it has no singularities of which $\Im[z] \neq 0$---then consider that the singularities of $\tan z$ occur where $\cos z=0$, which by Euler's formula, eqn.~\ref{cexp:250:cos}, occurs where $\exp[+iz] = \exp[-iz]$. This in turn is possible only if $\left|\exp[+iz]\right| = \left|\exp[-iz]\right|$, which happens only for real~$z$.) Sections~\ref{taylor:380}, \ref{taylor:385} and~\ref{inttx:260} speak further of the matter. % ---------------------------------------------------------------------- \section{Extrema over a complex domain} \label{taylor:335} \index{extremum} If a function $f(z)$ is expanded by~(\ref{taylor:310:20}) or by other means about an analytic expansion point $z=z_o$ such that \[ f(z) = f(z_o) + \sum_{k=1}^\infty (a_k) (z-z_o)^k; \] and if \bqb a_k &=& 0 \ \ \ \mbox{for $k < K$, but} \\ a_K &\neq& 0, \\ (k,K) &\in& \mathbb Z, \ \ 0 < K < \infty, \eqb such that~$a_K$ is the series' first nonzero coefficient; then, in the immediate neighborhood of the expansion point, \[ f(z) \approx f(z_o) + (a_K) (z-z_o)^K, \ \ \left|z-z_o\right| \ll 1. \] Changing $\rho' e^{i\phi'} \la z-z_o$, this is \bq{taylor:335:10} f(z) \approx f(z_o) + a_K\rho'^K e^{iK\phi'}, \ \ 0 \le \rho' \ll 1. \eq Evidently one can shift the output of an analytic function $f(z)$ slightly in any desired Argand direction by shifting slightly the function's input~$z$. Specifically according to~(\ref{taylor:335:10}), to shift $f(z)$ by $\Delta f \approx \ep e^{i\psi}$, one can shift~$z$ by $\Delta z \approx (\ep/a_K)^{1/K}e^{i(\psi+n2\pi)/K}$, $n \in \mathbb Z$. 
Except at a nonanalytic point of $f(z)$ or in the trivial case that~$f(z)$ were everywhere constant, this always works---even where $[df/dz]_{z=z_o}=0$. That one can shift an analytic function's output smoothly in any Argand direction whatsoever has the significant consequence that neither the real nor the imaginary part of the function---nor for that matter any linear combination $\Re[e^{-i\omega} f(z)]$ of the real and imaginary parts---can have an extremum within the interior of a domain over which the function is fully analytic. That is, \emph{a function's extrema over a bounded analytic domain never lie within the domain's interior but always on its boundary.}% \footnote{ Professional mathematicians tend to define the domain and its boundary more carefully. }$\mbox{}^,$% \footnote{ \cite{deSturler-lecture}% \cite{Kohler-lecture} } % ---------------------------------------------------------------------- \section{Cauchy's integral formula} \label{taylor:350} \index{Cauchy's integral formula} \index{Cauchy, Augustin Louis (1789--1857)} \index{integral!closed complex contour} \index{contour integration!closed complex} In \S~\ref{integ:260} we considered the problem of vector contour integration, in which the sum value of an integration depends not only on the integration's endpoints but also on the path, or \emph{contour,} over which the integration is done, as in Fig.~\ref{integ:260:fig}. Because real scalars are confined to a single line, no alternate choice of path is possible where the variable of integration is a real scalar, so the contour problem does not arise in that case. It does however arise where the variable of integration is a \emph{complex} scalar, because there again different paths are possible. Refer to the Argand plane of Fig.~\ref{alggeo:225:fig}. Consider the integral \bq{taylor:350:10} S_n = \int_{z_1}^{z_2} z^{n-1} \,dz, \ \ n \in \mathbb Z. \eq If~$z$ were always a real number, then by the antiderivative (\S~\ref{integ:230}) this integral would evaluate to $(z_2^n-z_1^n)/n$; or, in the case of $n=0$, to $\ln(z_2/z_1)$. Inasmuch as~$z$ is complex, however, the correct evaluation is less obvious. To evaluate the integral sensibly in the latter case, one must consider some specific path of integration in the Argand plane. One must also consider the meaning of the symbol~$dz$. \subsection{The meaning of the symbol~$dz$} \label{taylor:350.10} \index{$dz$} \index{infinitesimal!dropping of when negligible} The symbol~$dz$ represents an infinitesimal step in some direction in the Argand plane: \bqb dz &=& \left[z+dz\right] - \left[z\right] \\ &=& \left[(\rho+d\rho)e^{i(\phi+d\phi)}\right] -\left[\rho e^{i\phi}\right] \\ &=& \left[(\rho+d\rho)e^{i\,d\phi}e^{i\phi}\right] -\left[\rho e^{i\phi}\right] \\ &=& \left[(\rho+d\rho)(1+i\,d\phi)e^{i\phi}\right] -\left[\rho e^{i\phi}\right]. \eqb Since the product of two infinitesimals is negligible even on infinitesimal scale, we can drop the $d\rho\,d\phi$ term.% \footnote{ The dropping of second-order infinitesimals like $d\rho\,d\phi$, added to first order infinitesimals like~$d\rho$, is a standard calculus technique. One cannot \emph{always} drop them, however. Occasionally one encounters a sum in which not only do the finite terms cancel, but also the first-order infinitesimals. In such a case, the second-order infinitesimals dominate and cannot be dropped. An example of the type is \[ \lim_{\ep\ra 0}\frac{(1-\ep)^3+3(1+\ep)-4}{\ep^2} = \lim_{\ep\ra 0}\frac{(1-3\ep+3\ep^2)+(3+3\ep)-4}{\ep^2} = 3. 
\] One typically notices that such a case has arisen when the dropping of second-order infinitesimals has left an ambiguous~$0/0$. To fix the problem, you simply go back to the step where you dropped the infinitesimal and you restore it, then you proceed from there. Otherwise there isn't much point in carrying second-order infinitesimals around. In the relatively uncommon event that you need them, you'll know it. The math itself will tell you. } After canceling finite terms, we are left with the peculiar but fine formula \bq{taylor:350:dz} \index{$dz$} dz = (d\rho+i\rho\,d\phi)e^{i\phi}. \eq \subsection{Integrating along the contour} \label{taylor:350.20} \index{integral!complex contour} \index{contour integration!complex} \index{contour!complex} Now consider the integration~(\ref{taylor:350:10}) along the contour of Fig.~\ref{taylor:350:fig1}. \begin{figure} \caption[A complex contour of integration in two segments.]{A contour of integration in the Argand plane, in two segments: constant-$\rho$ ($z_a$ to~$z_b$); and constant-$\phi$ ($z_b$ to~$z_c$).} \label{taylor:350:fig1} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxp{1.3} \nc\xxq{26.565} \nc\xxqq{13.283} \nc\xxr{2.2361} \nc\xxrr{1.1180} \nc\xxrrr{2.3} \nc\xxs{2.6} \nc\xxc{1.80} \nc\xxda{10.0} \nc\xxdb{50.0} \nc\xxdab{30.0} \nc\xxe{3.2} \psline[linewidth=0.5pt](-\xx,0)(\xx,0) \psline[linewidth=0.5pt](0,-\xx)(0,\xx) \psarc[linewidth=2.0pt]{*-}(0,0){\xxr}{\xxda}{\xxdb} \pscircle[linewidth=0.5pt,linestyle=dashed](0,0){\xxr} \rput{\xxdb}(0,0){ \psline[linewidth=2.0pt]{C->}(\xxr,0)(\xxe,0) \psline[linewidth=0.5pt,linestyle=dashed](0,0)(\xxr,0) } \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{0}{\xxdb} \rput{\xxdab}(0,0){ \uput[r](\xxp,0){ \rput{*0}(0,0){$\phi$} } } \rput{\xxdb}(0,0){ \uput[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } } \rput(2.5,0.5){$z_a$} \rput(1.1,1.7){$z_b$} \rput(2.3,2.2){$z_c$} \rput[l](\xxs,0){$x=\Re(z)$} \rput[b](0,\xxs){$y=\Im(z)$} \end{pspicture} } \ec \end{figure} In\-te\-grat\-ing along the constant-$\phi$ segment, \bqb \int_{z_b}^{z_c} z^{n-1} \,dz &=& \int_{\rho_b}^{\rho_c} (\rho e^{i\phi})^{n-1} (d\rho+i\rho\,d\phi)e^{i\phi} \\&=& \int_{\rho_b}^{\rho_c} (\rho e^{i\phi})^{n-1} (d\rho)e^{i\phi} \\&=& e^{in\phi}\int_{\rho_b}^{\rho_c} \rho^{n-1} d\rho \\&=& \frac{e^{in\phi}}{n}\left(\rho_c^n-\rho_b^n\right) \\&=& \frac{z_c^n-z_b^n}{n}. \eqb In\-te\-grat\-ing along the constant-$\rho$ arc, \bqb \int_{z_a}^{z_b} z^{n-1} \,dz &=& \int_{\phi_a}^{\phi_b} (\rho e^{i\phi})^{n-1} (d\rho+i\rho\,d\phi)e^{i\phi} \\&=& \int_{\phi_a}^{\phi_b} (\rho e^{i\phi})^{n-1} (i\rho\,d\phi)e^{i\phi} \\&=& i\rho^{n}\int_{\phi_a}^{\phi_b} e^{in\phi} \,d\phi \\&=& \frac{i\rho^n}{in}\left(e^{in\phi_b}-e^{in\phi_a}\right) \\&=& \frac{z_b^n-z_a^n}{n}. \eqb Adding the two, we have that \[ \int_{z_a}^{z_c} z^{n-1} \,dz = \frac{z_c^n-z_a^n}{n}, \] surprisingly the same as for real~$z$. Since any path of integration between any two complex numbers~$z_1$ and~$z_2$ is approximated arbitrarily closely by a succession of short constant-$\rho$ and constant-$\phi$ segments, it follows generally that \bq{taylor:350:20} \int_{z_1}^{z_2} z^{n-1} \,dz = \frac{z_2^n-z_1^n}{n},\ \ n \in \mathbb Z, \ n\neq 0. \eq The applied mathematician might reasonably ask, ``Was~(\ref{taylor:350:20}) really worth the trouble? We knew \emph{that} already. 
It's the same as for real numbers.'' Well, we really didn't know it before deriving it, but the point is well taken nevertheless. However, notice the exemption of $n=0$. Equation~(\ref{taylor:350:20}) does not hold in that case. Consider the $n=0$ integral \[ S_0 = \int_{z_1}^{z_2} \frac{dz}{z}. \] Following the same steps as before and using~(\ref{cexp:225:dln}) and~(\ref{alggeo:230:311}), we find that \bq{taylor:350:31} \int_{\rho_1}^{\rho_2} \frac{dz}{z} = \int_{\rho_1}^{\rho_2} \frac{(d\rho+i\rho\,d\phi)e^{i\phi}}{\rho e^{i\phi}} = \int_{\rho_1}^{\rho_2} \frac{d\rho}{\rho} = \ln\frac{\rho_2}{\rho_1}. \eq This is always real-valued, but otherwise it brings no surprise. However, \bq{taylor:350:32} \int_{\phi_1}^{\phi_2} \frac{dz}{z} = \int_{\phi_1}^{\phi_2} \frac{(d\rho+i\rho\,d\phi)e^{i\phi}}{\rho e^{i\phi}} = i\int_{\phi_1}^{\phi_2} d\phi = i (\phi_2-\phi_1). \eq The odd thing about this is in what happens when the contour closes a complete loop in the Argand plane about the $z=0$ pole. In this case, $\phi_2=\phi_1+2\pi$, thus \[ S_0 = i2\pi, \] \emph{even though the integration ends where it begins.} Generalizing, we have that \bq{taylor:cauchy0} \renewcommand{\arraystretch}{2.0} \br{rcl} \ds\oint (z-z_o)^{n-1} \,dz &=& 0,\ \ n \in \mathbb Z, \ n \neq 0; \\ \ds\oint \frac{dz}{z-z_o} &=& i2\pi; \er \eq where as in \S~\ref{integ:260} the symbol~$\oint$ represents integration about a closed contour that ends where it begins, and where it is implied that the contour loops positively (counterclockwise, in the direction of increasing~$\phi$) exactly once about the $z=z_o$ pole. Notice that the formula's~$i2\pi$ does not depend on the precise path of integration, but only on the fact that the path loops once positively about the pole. Notice also that nothing in the derivation of~(\ref{taylor:350:20}) actually requires that~$n$ be an integer, so one can write \bq{taylor:350:21} \int_{z_1}^{z_2} z^{a-1} \,dz = \frac{z_2^a-z_1^a}{a},\ \ a\neq 0. \eq However,~(\ref{taylor:cauchy0}) does not hold in the latter case; its integral comes to zero for nonintegral~$a$ only if the contour does not enclose the branch point at $z=z_o$. \index{pole} \index{nonanalytic point} For a closed contour \emph{which encloses no pole or other nonanalytic point,} % bad break (\ref{taylor:350:21}) has that $\oint z^{a-1} \,dz = 0,$ or with the change of variable $z-z_o \la z$, \[ \oint (z-z_o)^{a-1} \,dz = 0. \] But because any analytic function can be expanded in the form $f(z) = \sum_k (c_k)(z-z_o)^{a_k-1}$ (which is just a Taylor series if the~$a_k$ happen to be positive integers), this means that \bq{taylor:cauchyf} \oint f(z) \,dz = 0 \eq if $f(z)$ is everywhere analytic within the contour.% \footnote{ The careful reader will observe that~(\ref{taylor:cauchyf})'s derivation does not explicitly handle an $f(z)$ represented by a Taylor series with an infinite number of terms and a finite convergence domain (for example, $f[z] = \ln[1-z]$). However, by \S~\ref{taylor:317} one can transpose such a series from~$z_o$ to an overlapping convergence domain about~$z_1$. Let the contour's interior be divided into several cells, each of which is small enough to enjoy a single convergence domain. Integrate about each cell. Because the cells share boundaries within the contour's interior, each interior boundary is integrated twice, once in each direction, canceling. The original contour---each piece of which is an exterior boundary of some cell---is integrated once piecewise. 
This is the basis on which a more rigorous proof is constructed. } \subsection{The formula} \label{taylor:350.30} The combination of~(\ref{taylor:cauchy0}) and~(\ref{taylor:cauchyf}) is powerful. Consider the closed contour integral \[ \oint \frac{f(z)}{z-z_o} \,dz, \] where the contour encloses no nonanalytic point of $f(z)$ itself but does enclose the pole of $f(z)/(z-z_o)$ at $z=z_o$. If the contour were a tiny circle of infinitesimal radius about the pole, then the integrand would reduce to $f(z_o)/(z-z_o)$; and then per~(\ref{taylor:cauchy0}), \bq{taylor:cauchy} \oint \frac{f(z)}{z-z_o} \,dz = i2\pi f(z_o). \eq But if the contour were not an infinitesimal circle but rather the larger contour of Fig.~\ref{taylor:350:fig2}? \begin{figure} \caption{A Cauchy contour integral.} \label{taylor:350:fig2} \index{contour!complex} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \nc\polexy{0.10} \nc\pole{ { \psset{linewidth=1.0pt} \psline(-\polexy,-\polexy)( \polexy, \polexy) \psline( \polexy,-\polexy)(-\polexy, \polexy) } } \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxt{0.30} \nc\xxtx{0.7} \nc\xxty{0.5} \nc\xxs{2.6} \psline[linewidth=0.5pt](-\xx,0)(\xx,0) \psline[linewidth=0.5pt](0,-\xx)(0,\xx) % Here are just some randomish points to make a lumpy closed curve. \psccurve[linewidth=2.0pt](-2.3,-0.4)(-1.1,1.1)(0,2.1)(1.8,0.5)(2.0,-0.3)(0.5,-1.9)(-0.7,-1.7) \rput(\xxtx,\xxty){ \pole \psarc[linewidth=0.5pt,linestyle=dashed](0,0){\xxt}{5.711}{354.289} \rput{15}(0,0){ \psline[linewidth=0.5pt](0,0.16)(0,0.70) \rput{*0}(0,0.90){$z_o$} } } { \nc\xxu[1]{\psline[linewidth=0.5pt,linestyle=dashed](1.0,#1)(1.8,#1)} \xxu{0.47} \xxu{0.53} } \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$\Im(z)$} \end{pspicture} } \ec \end{figure} In this case, if the dashed detour which excludes the pole is taken, then according to~(\ref{taylor:cauchyf}) the resulting integral totals zero; but the two straight integral segments evidently cancel; and similarly as we have just reasoned, the \emph{reverse-directed} integral about the tiny detour circle is $-i2\pi f(z_o)$; so to bring the total integral to zero the integral about the main contour must be $i2\pi f(z_o)$. Thus,~(\ref{taylor:cauchy}) holds for any positively-directed contour which once encloses a pole and no other nonanalytic point, whether the contour be small or large. Equation~(\ref{taylor:cauchy}) is \emph{Cauchy's integral formula.} \index{linear superposition} \index{superposition} \index{residue} \index{regular part} If the contour encloses multiple poles (\S\S~\ref{alggeo:250} and~\ref{inttx:260.20}), then by the principle of linear superposition (\S~\ref{integ:240.05}), \bq{taylor:cauchyn} \oint \left[ f_o(z) + \sum_k \frac{f_k(z)}{z-z_k} \right] \,dz = i2\pi \sum_k f_k(z_k), \eq where the $f_o(z)$ is a \emph{regular part};% \footnote{\cite[\S~1.1]{Lebedev}} and again, where neither $f_o(z)$ nor any of the several $f_k(z)$ has a pole or other nonanalytic point within (or on) the contour. The values $f_k(z_k)$, which represent the strengths of the poles, are called \emph{residues}. In words,~(\ref{taylor:cauchyn}) says that an integral about a closed contour in the Argand plane comes to~$i2\pi$ times the sum of the residues of the poles (if any) thus enclosed. (Note however that eqn.~\ref{taylor:cauchyn} does not handle branch points. If there is a branch point, the contour must exclude it or the formula will not work.) 
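
Cauchy's integral formula is easily checked numerically, too. The following
fragment, written in Python (a sketch only; the entire function $\exp z$, the
pole's location and the number of steps are arbitrary choices), sums
$f(z)\,dz/(z-z_o)$ over small steps~$dz$ about the unit circle and compares
the total against the $i2\pi f(z_o)$ of~(\ref{taylor:cauchy}).
\begin{verbatim}
# A numerical check of Cauchy's integral formula (illustrative only).
# The contour is the unit circle, traversed once counterclockwise; the
# integrand f(z)/(z - zo) uses the entire function f(z) = exp z and an
# arbitrarily chosen pole location zo inside the contour.
import cmath

M  = 400                       # steps about the closed contour
zo = 0.3 + 0.2j                # the pole, inside the unit circle
total = 0j
for m in range(M):
    z  = cmath.exp(2j * cmath.pi * m / M)  # a point on the contour
    dz = 2j * cmath.pi * z / M             # the step i*z*dphi
    total += cmath.exp(z) / (z - zo) * dz
print(total)                   # approximately i*2*pi*exp(zo)
print(2j * cmath.pi * cmath.exp(zo))
\end{verbatim}
The two lines printed agree. The total is insensitive to the contour's
precise shape, so long as the contour still loops exactly once, positively,
about the pole and encloses no other nonanalytic point.
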
As we shall see in \S~\ref{inttx:250}, whether in the form of~(\ref{taylor:cauchy}) or of~(\ref{taylor:cauchyn}) Cauchy's integral formula is an extremely useful result.% \footnote{% \cite[\S~10.6]{Hildebrand}% \cite{Spiegel}% \cite[``Cauchy's integral formula,'' 14:13, 20 April 2006]{wikip} } \subsection{Enclosing a multiple pole} \label{taylor:350.40} \index{pole!multiple, enclosing a} \index{multiple pole!enclosing} \index{complex contour!about a multiple pole} \index{closed contour!about a multiple pole} \index{contour!complex, about a multiple pole} When a complex contour of integration encloses a double, triple or other $n$-fold pole, the integration can be written \[ S = \oint \frac{f(z)}{(z-z_o)^{m+1}} \,dz, \ \ m \in \mathbb Z, \ m \ge 0, \] where $m+1=n$. Expanding $f(z)$ in a Taylor series~(\ref{taylor:310:20}) about $z=z_o$, \[ S = \oint \sum_{k=0}^{\infty} \left(\left.\frac {d^kf} {dz^k}\right|_{z=z_o}\right) \frac{dz}{(k!)(z-z_o)^{m-k+1}}. \] But according to~(\ref{taylor:cauchy0}), only the $k=m$ term contributes, so \bqb S &=& \oint \left(\left.\frac {d^mf} {dz^m}\right|_{z=z_o}\right) \frac{dz}{(m!)(z-z_o)} \\ \\&=& \frac{1}{m!}\left(\left.\frac {d^mf} {dz^m}\right|_{z=z_o}\right) \oint\frac{dz}{(z-z_o)} \\ \\&=& \frac{i2\pi}{m!}\left(\left.\frac {d^mf} {dz^m}\right|_{z=z_o}\right), \eqb where the integral is evaluated in the last step according to~(\ref{taylor:cauchy}). Altogether, \bq{taylor:350:30} \oint \frac{f(z)}{(z-z_o)^{m+1}} \,dz = \frac{i2\pi}{m!}\left(\left.\frac {d^mf} {dz^m}\right|_{z=z_o}\right),\ \ m \in \mathbb Z, \ m \ge 0. \eq Equation~(\ref{taylor:350:30}) evaluates a contour integral about an $n$-fold pole as~(\ref{taylor:cauchy}) does about a single pole. (When $m=0$, the two equations are the same.)% \footnote{\cite{Kohler-lecture}\cite{Spiegel}} % ---------------------------------------------------------------------- \section{Taylor series for specific functions} \label{taylor:315} \index{Taylor series!for specific functions} With the general Taylor series formula~(\ref{taylor:310:20}), the derivatives of Tables~\ref{cexp:drv} and~\ref{cexp:drvi}, and the observation from~(\ref{drvtv:240.30:10}) that \[ \frac{d(z^a)}{dz} = az^{a-1}, \] one can calculate Taylor series for many functions. For instance, expanding about $z=1$, \settowidth\tla{$\ds\left. \frac{-(-)^k(k-1)!}{z^k} \right|_{z=1}$} \settowidth\tlb{$-1$} \bqb \left. \ln z \right|_{z=1} = \makebox[\tla][r] {$\ds\left. \ln z \right|_{z=1}$} &=& \makebox[\tlb][r]{$0$}, \\ \left. \frac{d}{dz}\ln z \right|_{z=1} = \makebox[\tla][r] {$\ds\left. \frac{1}{z} \right|_{z=1}$} &=& \makebox[\tlb][r]{$1$}, \\ \left. \frac{d^2}{dz^2}\ln z \right|_{z=1} = \makebox[\tla][r] {$\ds\left. \frac{-1}{z^2} \right|_{z=1}$} &=& \makebox[\tlb][r]{$-1$}, \\ \left. \frac{d^3}{dz^3}\ln z \right|_{z=1} = \makebox[\tla][r] {$\ds\left. \frac{2}{z^3} \right|_{z=1}$} &=& \makebox[\tlb][r]{$2$}, \\ &\vdots& \\ \left. \frac{d^k}{dz^k}\ln z \right|_{z=1} = \makebox[\tla][r] {$\ds\left. \frac{-(-)^k(k-1)!}{z^k} \right|_{z=1}$} &=& -(-)^k(k-1)!,\ \ k > 0. \eqb With these derivatives, the Taylor series about $z=1$ is \[ \ln z = \sum_{k=1}^{\infty} \left[-(-)^k(k-1)! \right]\frac{(z-1)^k}{k!} = -\sum_{k=1}^{\infty} \frac{(1-z)^k}{k}, \] evidently convergent for $\left|1-z\right| < 1$. (And if~$z$ lies outside the convergence domain? Several strategies are then possible. One can expand the Taylor series about a different point; but cleverer and easier is to take advantage of some convenient relationship like $\ln w = -\ln[1/w]$. 
Section~\ref{taylor:316.80} elaborates.) Using such Taylor series, one can relatively efficiently calculate actual numerical values for $\ln z$ and many other functions. Table~\ref{taylor:315:tbl} lists Taylor series for a few functions of interest. All the series converge for $\left|z\right|<1$. The $\exp z$, $\sin z$ and $\cos z$ series converge for all complex~$z$. \begin{table} \caption{Taylor series.} \label{taylor:315:tbl} \bqb f(z) &=& \sum_{k=0}^{\infty} \left(\left.\frac {d^kf} {dz^k}\right|_{z=z_o}\right)\prod_{j=1}^k\frac{z-z_o}{j} \\ (1+z)^{a-1} &=& \sum_{k=0}^\infty \prod_{j=1}^k \left(\frac{a}{j}-1\right)z \\ \exp z &=& \sum_{k=0}^{\infty} \prod_{j=1}^k \frac{z}{j} = \sum_{k=0}^{\infty} \frac{z^k}{k!} \\ \sin z &=& \sum_{k=0}^{\infty} \left[ z \prod_{j=1}^k \frac{-z^2}{(2j)(2j+1)} \right] \\ \cos z &=& \sum_{k=0}^{\infty} \prod_{j=1}^k \frac{-z^2}{(2j-1)(2j)} \\ \sinh z &=& \sum_{k=0}^{\infty} \left[ z \prod_{j=1}^k \frac{z^2}{(2j)(2j+1)} \right] \\ \cosh z &=& \sum_{k=0}^{\infty} \prod_{j=1}^k \frac{z^2}{(2j-1)(2j)} \\ -\ln (1-z) &=& \sum_{k=1}^{\infty} \frac{1}{k} \prod_{j=1}^k z = \sum_{k=1}^{\infty} \frac{z^k}{k} \\ \arctan z &=& \sum_{k=0}^{\infty} \frac 1{2k+1} \left[ z \prod_{j=1}^k (-z^2) \right] = \sum_{k=0}^{\infty} \frac{(-)^k z^{2k+1}}{2k+1} \eqb \end{table} Among the several series, the series for $\arctan z$ is computed indirectly% \footnote{\cite[\S~11-7]{Shenk}} by way of Table~\ref{cexp:drvi} and~(\ref{alggeo:228:40}): \bqb \arctan z &=& \int_0^z \frac{1}{1+w^2} \,dw \\ &=& \int_0^z \sum_{k=0}^\infty (-)^k w^{2k} \,dw \\ &=& \sum_{k=0}^\infty \frac{(-)^k z^{2k+1}}{2k+1}. \eqb \index{Taylor expansion, first-order} \index{approximation to first order} \index{first-order approximation} \index{sine!approximation of to first order} \index{exponential!approximation of to first order} It is interesting to observe from Table~\ref{taylor:315:tbl} the useful first-order approximations that \bq{taylor:315:60} \settowidth\tla{$\exp z$} \begin{split} \lim_{z \ra 0} \makebox[\tla][r]{$\exp z$} &= 1 + z, \\ \lim_{z \ra 0} \makebox[\tla][r]{$\sin z$} &= z, \end{split} \eq among others. % ---------------------------------------------------------------------- \section{Error bounds} \label{taylor:316} \index{bound on a power series} \index{power series!bounds on} \index{error bound} \index{term!finite number of} One naturally cannot actually sum a Taylor series to an infinite number of terms. One must add some finite number of terms, then quit---which raises the question: how many terms are enough? How can one know that one has added adequately many terms; that the remaining terms, which constitute the tail of the series, are sufficiently insignificant? How can one set error bounds on the truncated sum? \subsection{Examples} \label{taylor:316.20} \index{alternating signs} \index{sign!alternating} \index{partial sum} \index{sum!partial} \index{truncation} \index{series!truncation of} Some series alternate sign. For these it is easy if the numbers involved happen to be real. For example, from Table~\ref{taylor:315:tbl}, \[ \ln \frac 3 2 = \ln \left( 1 + \frac 1 2 \right) = \frac 1{(1)(2^1)} - \frac 1{(2)(2^2)} + \frac 1{(3)(2^3)} - \frac 1{(4)(2^4)} + \cdots \] Each term is smaller in magnitude than the last, so the true value of $\ln(3/2)$ necessarily lies between the sum of the series to~$n$ terms and the sum to $n+1$ terms. The last and next partial sums bound the result. 
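
The bracketing is easy to watch numerically. The following fragment, written
in Python (a sketch only), accumulates the partial sums of the series just
given and prints each alongside its error against a library value of
$\ln(3/2)$; the error's sign alternates, successive partial sums pinching the
true value between them.
\begin{verbatim}
# A numerical sketch of the alternating-series bound (illustrative
# only): each new partial sum of the series for ln(3/2) lands on the
# other side of the true value, so consecutive partial sums bound it.
import math

exact = math.log(1.5)                  # the true ln(3/2)
partial, sign = 0.0, 1.0
for k in range(1, 9):
    partial += sign / (k * 2.0**k)     # the kth term, +-1/(k 2^k)
    sign = -sign
    print(k, partial, partial - exact) # the error alternates in sign
\end{verbatim}
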
Up to but not including the fourth-order term, for instance, \[ \renewcommand\arraystretch{2.0} \br{c} \ds S_4 - \frac 1{(4)(2^4)} < \ln \frac 3 2 < S_4, \\ \ds S_4 = \frac 1{(1)(2^1)} - \frac 1{(2)(2^2)} + \frac 1{(3)(2^3)}. \er \] \index{majorization} Other series however do not alternate sign. For example, \bqb \ln 2 &=& -\ln \frac 1 2 = -\ln \left( 1 - \frac 1 2 \right) = S_5 + R_5, \\ S_5 &=& \frac 1{(1)(2^1)} + \frac 1{(2)(2^2)} + \frac 1{(3)(2^3)} + \frac 1{(4)(2^4)}, \\ R_5 &=& \frac 1{(5)(2^5)} + \frac 1{(6)(2^6)} + \cdots \eqb The basic technique in such a case is to find a replacement series (or integral)~$R_n'$ which one can collapse analytically, each of whose terms equals or exceeds in magnitude the corresponding term of~$R_n$. For the example, one might choose \[ R_5' = \frac 1 5 \sum_{k=5}^\infty \frac 1{2^k} = \frac{2}{(5)(2^5)}, \] wherein~(\ref{alggeo:228:45}) had been used to collapse the summation. Then, \[ S_5 < \ln 2 < S_5 + R_5'. \] For real $0 \le x < 1$ generally, \bqb S_n &<& -\ln(1-x) \makebox[\arraycolsep]{} < \makebox[\arraycolsep]{} S_n + R_n', \\ S_n &\equiv& \sum_{k=1}^{n-1} \frac{x^k}{k}, \\ R_n' &\equiv& \sum_{k=n}^{\infty} \frac{x^k}{n} = \frac{x^{n}}{(n)(1-x)}. \eqb Many variations and refinements are possible, some of which we will meet in the rest of the section, but that is the basic technique: to add several terms of the series to establish a lower bound, then to overestimate the remainder of the series to establish an upper bound. The overestimate~$R_n'$ \emph{majorizes} the series' true remainder~$R_n$. Notice that the~$R_n'$ in the example is a fairly small number, and that it would have been a lot smaller yet had we included a few more terms in~$S_n$ (for instance, $n = \mr{0x40}$ would have bound $\ln 2$ tighter than the limit of a computer's typical \texttt{double}-type floating-point accuracy). The technique usually works well in practice for this reason. \subsection{Majorization} \label{taylor:316.25} \index{majorization} \emph{To majorize} in mathematics is to be, or to replace by virtue of being, everywhere at least as great as. This is best explained by example. Consider the summation \[ S = \sum_{k=1}^\infty \frac 1{k^2} = 1 + \frac 1{2^2} + \frac 1{3^2} + \frac 1{4^2} + \cdots \] The exact value this summation totals to is unknown to us, but the summation does rather resemble the integral (refer to Table~\ref{integ:basic-antider}) \[ I = \int_{1}^\infty \frac {dx}{x^2} = \left.-\frac{1}{x}\right|_1^\infty = 1. \] Figure~\ref{taylor:316:fig-maj} plots~$S$ and~$I$ together as areas---or more precisely, plots $S-1$ and~$I$ together as areas (the summation's first term is omitted). % This figure, here commented out, was a reasonably good figure but it % did not show the majorization quite correctly. %\begin{figure} % \caption[Majorization.] 
% {The area~$S$ under the stairstep curve $f_S(x)$ majorizes the area~$I$ under the smooth curve $f_I(x)$.} % \label{taylor:316:fig-maj} % \bc % \nc\fxa{-1.0} \nc\fxb{9.0} % \nc\fya{-0.7} \nc\fyb{3.2} % \nc\xxa{-0.5} \nc\xxb{8.1} % \nc\xya{-0.3} \nc\xyb{2.5} % \nc\xxyt{0.2cm} % \nc\xxt[1]{\psline[linewidth=0.5pt](0,0)(0,-\xxyt)\uput[d](0,-\xxyt){#1}} % \nc\xyt[1]{\psline[linewidth=0.5pt](0,0)(-\xxyt,0){\psset{labelsep=0.12cm}\uput[l](-\xxyt,0){#1}}} % \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) % %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) % { % \small % \psset{dimen=middle} % { % \psset{linewidth=0.5pt} % \psline(\xxa,0)(\xxb,0) \uput[r](\xxb,0){$x$} % \psline(0,\xya)(0,\xyb) \uput[u](0,\xyb){$y$} % \rput(2.58,0.23){$y\!=\!1/x^2$} % \psline(3.22,0.20)(3.50,0.20)(3.78,0.32) % } % { % \psset{linewidth=2.0pt,unit=1.7cm} % \psplot[linestyle=dashed,plotpoints=200]{1.0}{4.6} % { 1.0 x dup mul div } % \psline{-C}( 1,0.00000)(1,1.00000) % \psline{C-C}(1,1.00000)(2,1.00000) \psline{C-c}(2,1.00000)(2,0.25000) % \psline{c-C}(2,0.25000)(3,0.25000) \psline{C-c}(3,0.25000)(3,0.11111) % \psline{c-C}(3,0.11111)(4,0.11111) \psline{C-c}(4,0.11111)(4,0.06250) % \psline{c-}( 4,0.06250)(4.6,0.06250) % { % \psset{labelsep=0.10cm} % \uput[u](1.5,1.00000){$y=1$} % \uput[u](2.5,0.25000){$y=1/2^2$} % \uput[u](3.5,0.11111){$y=1/3^2$} % \uput[u](4.5,0.06250){$y=1/4^2$} % } % \rput(0,1){\xyt{$1$}} % \rput(1,0){\xxt{$1$}} % \rput(2,0){\xxt{$2$}} % \rput(3,0){\xxt{$3$}} % \rput(4,0){\xxt{$4$}} % } % } % \end{pspicture} % \ec %\end{figure} \begin{figure} \caption[Majorization.] {Majorization. The area~$I$ between the dashed curve and the~$x$ axis majorizes the area $S-1$ between the stairstep curve and the~$x$ axis, because the height of the dashed curve is everywhere at least as great as that of the stairstep curve.} \label{taylor:316:fig-maj} \bc \nc\fxa{-1.0} \nc\fxb{11.10} \nc\fya{-0.9} \nc\fyb{ 5.00} \nc\xxa{-0.2} \nc\xxb{ 3.15} \nc\xya{-0.1} \nc\xyb{ 1.30} \nc\xxyt{0.20cm} \nc\xxt[1]{\psline[linewidth=0.5pt](0,0)(0,-\xxyt)\uput[d](0,-\xxyt){#1}} \nc\xyt[1]{\psline[linewidth=0.5pt](0,0)(-\xxyt,0){\psset{labelsep=0.12cm}\uput[l](-\xxyt,0){#1}}} \nc\xxc{3.12} \nc\xxd{0.86} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle,unit=3.3cm} { \psset{linewidth=0.5pt} \psline(\xxa,0)(\xxb,0) \uput[r](\xxb,0){$x$} \psline(0,\xya)(0,\xyb) \uput[u](0,\xyb){$y$} \rput(2.55,0.28){$y=1/x^2$} \psline(2.32,0.28)(2.20,0.28)(2.16,0.22) } { \psset{linewidth=2.0pt} \psplot[linestyle=dashed,plotpoints=200]{1.0}{\xxc} { 1.0 x dup mul div } \psplot[linewidth=0.5pt,linestyle=dotted,plotpoints=50]{\xxd}{1.0} { 1.0 x dup mul div } \psline[linestyle=dashed]{-c}(1,0.25)(1,1) \psline{-C}( 1,0.00000)(1,0.25000) \psline{C-c}(1,0.25000)(2,0.25000) \psline{c-C}(2,0.25000)(2,0.11111) \psline{C-c}(2,0.11111)(3,0.11111) \psline{c-C}(3,0.11111)(3,0.06250) \psline{C-}( 3,0.06250)(\xxc,0.06250) { \psset{linewidth=0.5pt,labelsep=0.12cm} \psline(0,0.25000)(-0.04,0.25000)(-0.09,0.28750)(-0.13,0.28750)\rput[r](-0.15,0.28750){$1/2^2$} \psline(0,0.11111)(-0.04,0.11111)(-0.09,0.17500)(-0.13,0.17500)\rput[r](-0.15,0.17500){$1/3^2$} \psline(0,0.06250)(-0.13,0.06250)\rput[r](-0.15,0.06250){$1/4^2$} } { \psset{linewidth=0.5pt,linestyle=dotted} \psline(0,1.00000)(1,1.00000) \psline(0,0.25000)(1,0.25000) \psline(0,0.11111)(1,0.11111) \psline(1,0.11111)(2,0.11111) \psline(0,0.06250)(1,0.06250) \psline(1,0.06250)(2,0.06250) \psline(1.996,0.06250)(3,0.06250) } 
\rput(0,1){\xyt{$1$}} \rput(1,0){\xxt{$1$}} \rput(2,0){\xxt{$2$}} \rput(3,0){\xxt{$3$}} } } \end{pspicture} \ec \end{figure} As the plot shows, the unknown area $S-1$ cannot possibly be as great as the known area~$I$. In symbols, $S-1 < I = 1$; or, \[ S < 2. \] The integral~$I$ majorizes the summation $S-1$, thus guaranteeing the absolute upper limit on~$S$. (Of course $S < 2$ is a very loose limit, but that isn't the point of the example. In practical calculation, one would let a computer add many terms of the series first numerically, and only then majorize the remainder. Even so, cleverer ways to majorize the remainder of this particular series will occur to the reader, such as in representing the terms graphically---not as flat-topped rectangles---but as slant-topped trapezoids, shifted in the figure a half unit rightward.) \index{integral!and series summation} \index{summation!compared to integration} \index{minorization} Majorization serves surely to bound an unknown quantity by a larger, known quantity. Reflecting, \emph{minorization}% \footnote{ The author does not remember ever encountering the word \emph{minorization} heretofore in print, but as a reflection of \emph{majorization} the word seems logical. This book at least will use the word where needed. You can use it too if you like. } serves surely to bound an unknown quantity by a smaller, known quantity. The quantities in question are often integrals and/or series summations, the two of which are akin as Fig.~\ref{taylor:316:fig-maj} illustrates. The choice of whether to majorize a particular unknown quantity by an integral or by a series summation depends on the convenience of the problem at hand. \index{harmonic series} \index{series!harmonic} The series~$S$ of this subsection is interesting, incidentally. It is a \emph{harmonic series} rather than a power series, because although its terms do decrease in magnitude it has no~$z^k$ factor (or seen from another point of view, it does have a~$z^k$ factor, but $z=1$), and the ratio of adjacent terms' magnitudes approaches unity as~$k$ grows. Harmonic series can be hard to sum accurately, but clever majorization can help (and if majorization does not help enough, the series transformations of % diagn [chapter not yet written] can help even more). \subsection{Geometric majorization} \label{taylor:316.30} \index{majorization!geometric} \index{geometric majorization} \index{geometric series!majorization by} \index{series!geometric} \index{residual} Harmonic series can be hard to sum as \S~\ref{taylor:316.25} has observed, but more common than harmonic series are true power series, easier to sum in that they include a~$z^k$ factor in each term. There is no one, ideal bound that works equally well for all power series. However, the point of establishing a bound is not to sum a power series exactly but rather to fence the sum within some sufficiently (rather than optimally) small neighborhood. A simple, general bound which works quite adequately for most power series encountered in practice, including among many others all the Taylor series of Table~\ref{taylor:315:tbl}, is the \emph{geometric majorization} \bq{taylor:316:30} \left|\ep_n\right| < \frac{\left|\tau_n\right|}{1-\left|\rho_n\right|}. \eq Here,~$\tau_n$ represents the power series' $n$th-order term (in Table~\ref{taylor:315:tbl}'s series for $\exp z$, for example, $\tau_n = z^n/[n!]$). 
The~$\left|\rho_n\right|$ is a positive real number chosen, preferably as small as possible, such that { \settowidth\tla{for at least one} \bqa \left| \frac{\tau_{k+1}}{\tau_k} \right| &\le& \left|\rho_n\right| \ \ \mbox{\makebox[\tla][r]{for all} $k \ge n$,} \label{taylor:316:33} \\ \left| \frac{\tau_{k+1}}{\tau_k} \right| &<& \left|\rho_n\right| \ \ \mbox{for at least one $k \ge n$,} \xn \\ 0 &<& \left|\rho_n\right| \makebox[\arraycolsep]{} < \makebox[\arraycolsep]{} 1; \eqa }% which is to say, more or less, such that each term in the series' tail is smaller than the last by at least a factor of~$\left|\rho_n\right|$. Given these definitions, if% \footnote{ Some scientists and engineers---as, for example, the authors of~\cite{Peterson/Mittra:1986} and even this writer in earlier years---prefer to define $\ep_n \equiv S_n - S_\infty$, oppositely as we define it here. This choice is a matter of taste. Professional mathematicians---as, for example, the author of~\cite{vdVorst}---seem to tend toward the $\ep_n \equiv S_\infty - S_n$ of~(\ref{taylor:316:34}). } \bq{taylor:316:34} \begin{split} S_n &\equiv \sum_{k=0}^{n-1} \tau_k, \\ \ep_n &\equiv S_\infty - S_n, \end{split} \eq where~$S_\infty$ represents the true, exact (but uncalculatable, unknown) infinite series sum, then~(\ref{alggeo:228:45}) and~(\ref{trig:278:triangle}) imply the geometric majorization~(\ref{taylor:316:30}). \index{logarithm, natural!error of} \index{natural logarithm!error of} If the last paragraph seems abstract, a pair of concrete examples should serve to clarify. First, if the Taylor series \[ -\ln (1-z) = \sum_{k=1}^{\infty} \frac{z^k}{k} \] of Table~\ref{taylor:315:tbl} is truncated before the $n$th-order term, then \bqb -\ln (1-z) &\approx& \sum_{k=1}^{n-1} \frac{z^k}{k}, \\ \left|\ep_n\right| &<& \frac{\left|z^n\right|/n}{1-\left|z\right|}, \eqb where~$\ep_n$ is the error in the truncated sum.% \footnote{ This particular error bound fails for $n=0$, but that is no flaw. There is no reason to use the error bound for $n=0$ when, merely by taking one or two more terms into the truncated sum, one can quite conveniently let $n=1$ or $n=2$. } Here, $\left|\tau_{k+1}/\tau_k\right| = [k/(k+1)]\left|z\right| < \left|z\right|$ for all $k \ge n > 0$, so we have chosen $\left|\rho_n\right| = \left|z\right|$. \index{exponential!natural, error of} \index{natural exponential!error of} Second, if the Taylor series \[ \exp z = \sum_{k=0}^{\infty} \prod_{j=1}^k \frac{z}{j} = \sum_{k=0}^{\infty} \frac{z^k}{k!} \] also of Table~\ref{taylor:315:tbl} is truncated before the $n$th-order term, and if we choose to stipulate that \[ n+1 > \left|z\right|, \] then \bqb \exp z &\approx& \sum_{k=0}^{n-1} \prod_{j=1}^k \frac{z}{j} = \sum_{k=0}^{n-1} \frac{z^k}{k!}, \\ \left|\ep_n\right| &<& \frac{\left|z^n\right|/n!}{1-\left|z\right|/(n+1)}. \eqb Here, $\left|\tau_{k+1}/\tau_k\right| = \left|z\right|/(k+1)$, whose maximum value for all $k\ge n$ occurs when $k=n$, so we have chosen $\left|\rho_n\right| = \left|z\right|/(n+1)$. \subsection{Calculation outside the fast convergence domain} \label{taylor:316.80} \index{slow convergence} \index{convergence!slow} \index{domain} \index{convergence!domain of} \index{function!entire} \index{entire function} Used directly, the Taylor series of Table~\ref{taylor:315:tbl} tend to converge slowly for some values of~$z$ and not at all for others. 
The series for $-\ln(1-z)$ and $(1+z)^{a-1}$ for instance each converge for $\left| z \right| < 1$ (though slowly for $\left| z \right| \approx 1$); whereas each series diverges when asked to compute a quantity like $-\ln 3$ or $3^{a-1}$ directly. To shift the series' expansion points per \S~\ref{taylor:317} is one way to seek convergence, but for nonentire functions~(\S~\ref{taylor:330}) like these a more probably profitable strategy is to find and exploit some property of the functions to transform their arguments, such as \bqb -\ln \gamma &=& \ln\frac{1}{\gamma}, \\ \gamma^{a-1} &=& \frac{1}{(1/\gamma)^{a-1}}, \eqb which leave the respective Taylor series to compute quantities like $-\ln(1/3)$ and $(1/3)^{a-1}$ they can handle. Let $f(1+\zeta)$ be a function whose Taylor series about $\zeta=0$ converges for $\left|\zeta\right| < 1$ and which obeys properties of the forms% \footnote{ This paragraph's notation is necessarily abstract. To make it seem more concrete, consider that the function $f(1+\zeta)=-\ln(1-z)$ has $\zeta = -z$, $f(\gamma) = g[f(1/\gamma)] = -f(1/\gamma)$ and $f(\alpha\gamma) = h[f(\alpha),f(\gamma)] = f(\alpha)+f(\gamma)$; and that the function $f(1+\zeta)=(1+z)^{a-1}$ has $\zeta = z$, $f(\gamma) = g[f(1/\gamma)] = 1/f(1/\gamma)$ and $f(\alpha\gamma) = h[f(\alpha),f(\gamma)]=f(\alpha)f(\gamma)$. } \bq{taylor:316:83} \begin{split} f(\gamma) &= g\left[f\left(\frac{1}{\gamma}\right)\right], \\ f(\alpha\gamma) &= h\left[f(\alpha),f(\gamma)\right], \end{split} \eq where $g[\cdot]$ and $h[\cdot,\cdot]$ are functions we know how to compute like $g[\cdot]=-[\cdot]$ or $g[\cdot]=1/[\cdot]$; and like $h[\cdot,\cdot]=[\cdot]+[\cdot]$ or $h[\cdot,\cdot]=[\cdot][\cdot]$. Identifying \bq{taylor:316:84} \begin{split} \frac{1}{\gamma} &= 1 + \zeta, \\ \gamma &= \frac{1}{1 + \zeta}, \\ \frac{1-\gamma}{\gamma} &= \zeta, \end{split} \eq we have that \bq{taylor:316:85} f(\gamma) = g\left[f\left(1+\frac{1-\gamma}{\gamma}\right)\right], \eq whose convergence domain $\left|\zeta\right| < 1$ is $\left|1-\gamma\right|/\left|\gamma\right| < 1$, which is $\left|\gamma-1\right| < \left|\gamma\right|$ or in other words \[ \Re(\gamma) > \frac 1 2. \] Although the transformation from~$\zeta$ to~$\gamma$ has not lifted the convergence limit altogether, we see that it has apparently opened the limit to a broader domain. \index{domain!sidestepping a} Though this writer knows no way to lift the convergence limit altogether that does not cause more problems than it solves, one can take advantage of the $h[\cdot,\cdot]$ property of~(\ref{taylor:316:83}) to sidestep the limit, computing $f(\omega)$ indirectly for any~$\omega \neq 0$ by any of several tactics. One nonoptimal but entirely effective tactic is represented by the equations \bq{taylor:316:90} \begin{split} \omega &\equiv i^n2^m\gamma, \\ \left|\Im(\gamma)\right| &\le \Re(\gamma), \\ 1 \le \Re(\gamma) &< 2, \\ m,n &\in \mathbb Z, \end{split} \eq whereupon the formula \bq{taylor:316:92} f(\omega) = h[f(i^n2^m), f(\gamma)] \eq calculates $f(\omega)$ fast for any $\omega \neq 0$---provided only that we have other means to compute $f(i^n2^m)$, which not infrequently we do.% \footnote{ Equation~(\ref{taylor:316:92}) admittedly leaves open the question of how to compute~$f(i^n2^m)$, but at least for the functions this subsection has used as examples this is not hard. For the logarithm, $-\ln(i^n2^m)=m\ln(1/2)-in(2\pi/4)$. For the power, $(i^n2^m)^{a-1}=\cis[(n2\pi/4)(a-1)]/[(1/2)^{a-1}]^m$. 
The sine and cosine in the cis function are each calculated directly by Taylor series % diagn: review the following parenthetical remark. (possibly with the help of Table~\ref{trig:228:table}), as are the numbers $\ln(1/2)$ and $(1/2)^{a-1}$. The number~$2\pi$, we have not calculated yet, but will in \S~\ref{taylor:355}. } Notice how~(\ref{taylor:316:90}) fences~$\gamma$ within a comfortable zone, keeping~$\gamma$ moderately small in magnitude but never too near the $\Re(\gamma) = 1/2$ frontier in the Argand plane. In theory all finite~$\gamma$ rightward of the frontier let the Taylor series converge, but extreme~$\gamma$ of any kind let the series converge only slowly (and due to compound floating-point rounding error inaccurately) inasmuch as they imply that $\left|\zeta\right| \approx 1$. Besides allowing all~$\omega\neq 0$, the tactic~(\ref{taylor:316:90}) also thus significantly speeds series convergence. The method and tactic of~(\ref{taylor:316:83}) through~(\ref{taylor:316:92}) are useful in themselves and also illustrative generally. Of course, most nonentire functions lack properties of the specific kinds that~(\ref{taylor:316:83}) demands, but such functions may have other properties one can analogously exploit.% \footnote{ To draw another example from Table~\ref{taylor:315:tbl}, consider that \[ \begin{split} \arctan \omega &= \alpha + \arctan \zeta, \\ \zeta &\equiv \frac{ \omega\cos\alpha - \sin\alpha }{ \omega\sin\alpha + \cos\alpha }, \end{split} \] where $\arctan \omega$ is interpreted as the geometrical angle the vector $\vu x + \vu y \omega$ makes with~$\vu x$. Axes are rotated per~(\ref{trig:240:rot}) through some angle~$\alpha$ to reduce the tangent from~$\omega$ to~$\zeta$, where $\arctan \omega$ is interpreted as the geometrical angle the vector $\vu x + \vu y \omega = \vu x' (\omega\sin\alpha+\cos\alpha) +\vu y' (\omega\cos\alpha-\sin\alpha)$ makes with~$\vu x'$, thus causing the Taylor series to converge faster or indeed to converge at all. Any number of further examples and tactics of the kind will occur to the creative reader, shrinking a function's argument by some convenient means before feeding the argument to the function's Taylor series. } \subsection{Nonconvergent series} \label{taylor:316.70} Variants of this section's techniques can be used to prove that a series does not converge at all. For example, \[ \sum_{k=1}^{\infty} \frac{1}{k} \] does not converge because \[ \frac 1 k > \int_{k}^{k+1} \frac{d\tau}{\tau}; \] hence, \[ \sum_{k=1}^{\infty} \frac{1}{k} > \sum_{k=1}^{\infty} \int_{k}^{k+1} \frac{d\tau}{\tau} = \int_{1}^\infty \frac{d\tau}{\tau} = \ln\infty. \] \subsection{Remarks} \label{taylor:316.95} The study of error bounds is not a matter of rules and formulas so much as of ideas, suggestions and tactics. There is normally no such thing as an optimal error bound---with sufficient cleverness, some tighter bound can usually be discovered---but often easier and more effective than cleverness is simply to add a few extra terms into the series before truncating it (that is, to increase~$n$ a little). To eliminate the error entirely usually demands adding an infinite number of terms, which is impossible; but since eliminating the error entirely also requires recording the sum to infinite precision, which is impossible anyway, eliminating the error entirely is not normally a goal one seeks. 
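
A small numerical trial conveys the flavor. The following fragment, written
in Python (a sketch only; the series and the argument $z=1/2$ are arbitrary
choices), sums the Taylor series for $-\ln(1-z)$ of Table~\ref{taylor:315:tbl},
quitting once the geometric majorization of \S~\ref{taylor:316.30} certifies
the omitted tail to be smaller than the double type's machine epsilon.
\begin{verbatim}
# A sketch of the geometric majorization in practice (illustrative
# only): sum -ln(1-z) at z = 1/2 until the bound |eps_n| <
# (|z|**n / n) / (1 - |z|) on the omitted tail falls below the
# double type's machine epsilon, 2**-52.
import math, sys

z, S, n = 0.5, 0.0, 1
while (z**n / n) / (1.0 - z) > sys.float_info.epsilon:
    S += z**n / n              # the nth-order term of -ln(1-z)
    n += 1
print(n, S, -math.log(1.0 - z))  # n comes to 0x30; the sums agree
\end{verbatim}
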
To eliminate the error to the $\mbox{0x34}$-bit (sixteen-decimal place) precision of a computer's \texttt{double}-type floating-point representation typically requires something like~$\mbox{0x34}$ terms---if the series be wisely composed and if care be taken to keep~$z$ moderately small and reasonably distant from the edge of the series' convergence domain. Besides, few engineering applications really use much more than~$\mbox{0x10}$ bits (five decimal places) in any case. Perfect precision is impossible, but adequate precision is usually not hard to achieve. Occasionally nonetheless a series arises for which even adequate precision is quite hard to achieve. An infamous example is \[ S = -\sum_{k=1}^\infty \frac{(-)^k}{\sqrt k} = 1 - \frac 1{\sqrt 2} + \frac 1{\sqrt 3} - \frac 1{\sqrt 4} + \cdots, \] which obviously converges, but sum it if you can! It is not easy to do. Before closing the section, we ought to arrest one potential agent of terminological confusion. The ``error'' in a series summation's error bounds is unrelated to the error of probability theory (Ch.~\ref{prob}). The English word ``error'' is thus overloaded here. A series sum converges to a definite value, and to the same value every time the series is summed; no chance is involved. It is just that we do not necessarily know exactly what that value is. What we can do, by this section's techniques or perhaps by other methods, is to establish a definite neighborhood in which the unknown value is sure to lie; and we can make that neighborhood as tight as we want, merely by including a sufficient number of terms in the sum. The topic of series error bounds is what G.S.~Brown refers to as ``trick-based.''% \footnote{\cite{Brown-conversation}} There is no general answer to the error-bound problem, but there are several techniques which help, some of which this section has introduced. Other techniques, we shall meet later in the book as the need for them arises. % ---------------------------------------------------------------------- \section{Calculating~$2\pi$} \label{taylor:355} \index{$2\pi$!calculating} The Taylor series for $\arctan z$ in Table~\ref{taylor:315:tbl} implies a neat way of calculating the constant~$2\pi$. We already know that $\tan(2\pi/8) = 1$, or in other words that \[ \arctan 1 = \frac{2\pi}{8}. \] Applying the Taylor series, we have that \bq{taylor:355:10} 2\pi = 8\sum_{k=0}^{\infty} \frac{(-)^k}{2k+1}. \eq The series~(\ref{taylor:355:10}) is simple but converges extremely slowly. Much faster convergence is given by angles smaller than $2\pi/8$. For example, from Table~\ref{trig:hourtable}, \[ \arctan \frac{\sqrt 3 - 1}{\sqrt 3 + 1} = \frac{2\pi}{\mbox{0x18}}. \] Applying the Taylor series at this angle, we have that% \footnote{ The writer is given to understand that clever mathematicians have invented subtle, still much faster-converging iterative schemes toward~$2\pi$. However, there is fast and there is fast. The relatively straightforward series this section gives converges to the best accuracy of your computer's floating-point register within a paltry~\mbox{0x40} iterations or so---and, after all, you only need to compute the numerical value of~$2\pi$ once. Admittedly, the writer supposes that useful lessons lurk in the clever mathematics underlying the subtle schemes, but such schemes are not covered here. } \bq{taylor:355:15} 2\pi = \mbox{0x18}\sum_{k=0}^{\infty} \frac{(-)^k}{2k+1} \left(\frac{\sqrt 3 - 1}{\sqrt 3 + 1}\right)^{2k+1} \approx \mbox{0x6.487F}. 
\eq % ---------------------------------------------------------------------- \section{Odd and even functions} \label{taylor:365} \index{function!odd or even} \index{odd function} \index{even function} An \emph{odd function} is one for which $f(-z)=-f(z)$. Any function whose Taylor series about $z_o=0$ includes only odd-order terms is an odd function. Examples of odd functions include~$z^3$ and $\sin z$. An \emph{even function} is one for which $f(-z)=f(z)$. Any function whose Taylor series about $z_o=0$ includes only even-order terms is an even function. Examples of even functions include~$z^2$ and $\cos z$. \index{symmetry} Odd and even functions are interesting because of the symmetry they bring---the plot of a real-valued odd function being symmetric about a point, the plot of a real-valued even function being symmetric about a line. Many functions are neither odd nor even, of course, but one can always split an analytic function into two components---one odd, the other even---by the simple expedient of sorting the odd-order terms from the even-order in the function's Taylor series. For example, $\exp z = \sinh z + \cosh z$. % diagn: the rest of this paragraph is wants one last review. Alternately, \bq{taylor:365:10} \begin{split} f(z) &= f_{\mr{odd}}(z) + f_{\mr{even}}(z), \\ f_{\mr{odd}}(z) &= \frac{f(z)-f(-z)}{2}, \\ f_{\mr{even}}(z) &= \frac{f(z)+f(-z)}{2}, \end{split} \eq the latter two lines of which are verified by substituting $-z\la z$ and observing the definitions at the section's head of odd and even, then the first line of which is verified by adding the latter two. % diagn: this paragraph is new and wants review. Section~\ref{fouri:110.65} will have more to say about odd and even functions. % ---------------------------------------------------------------------- \section{Trigonometric poles} \label{taylor:382} \index{trigonometric function!poles of} \index{pole!of a trigonometric function} The singularities of the trigonometric functions are single poles of residue~$\pm 1$ or~$\pm i$. For the circular trigonometrics, all the poles lie along the real number line; for the hyperbolic trigonometrics, along the imaginary. Specifically, of the eight trigonometric functions \[ \renewcommand\arraystretch{2.0} \br{c} \ds\frac{1}{\sin z}, \ds\frac{1}{\cos z}, \ds\frac{1}{\tan z}, \ds\tan z, \\ \ds\frac{1}{\sinh z}, \ds\frac{1}{\cosh z}, \ds\frac{1}{\tanh z}, \ds\tanh z, \er \] the poles and their respective residues are \settowidth\tla{\scriptsize$z-i(k-1/2)\pi$} \bq{taylor:382:10} \begin{split} \left.\frac{z-k\pi}{\sin z}\right|_{\makebox[\tla][l]{\scriptsize$z=k\pi$}} &= (-)^k, \\ \left.\frac{z-(k-1/2)\pi}{\cos z}\right|_{\makebox[\tla][l]{\scriptsize$z=(k-1/2)\pi$}} &= (-)^k, \\ \left.\frac{z-k\pi}{\tan z}\right|_{\makebox[\tla][l]{\scriptsize$z=k\pi$}} &= 1, \\ \left.[z-(k-1/2)\pi]\tan z\right|_{\makebox[\tla][l]{\scriptsize$z=(k-1/2)\pi$}} &= -1, \\ \left.\frac{z-ik\pi}{\sinh z}\right|_{\makebox[\tla][l]{\scriptsize$z=ik\pi$}} &= (-)^k, \\ \left.\frac{z-i(k-1/2)\pi}{\cosh z}\right|_{\makebox[\tla][l]{\scriptsize$z=i(k-1/2)\pi$}} &= i(-)^k, \\ \left.\frac{z-ik\pi}{\tanh z}\right|_{\makebox[\tla][l]{\scriptsize$z=ik\pi$}} &= 1, \\ \left.[z-i(k-1/2)\pi]\tanh z\right|_{\makebox[\tla][l]{\scriptsize$z=i(k-1/2)\pi$}} &= 1, \\ k &\in \mathbb Z. \end{split} \eq \index{l'H\^opital's rule} To support~(\ref{taylor:382:10})'s claims, we shall marshal the identities of Tables~\ref{cexp:tbl-prop} and~\ref{cexp:drv} plus l'H\^opital's rule~(\ref{drvtv:260:lhopital}). 
Before calculating residues and such, however, we should like to verify that the poles~(\ref{taylor:382:10}) lists are in fact the only poles that there are; that we have forgotten no poles. Consider for instance the function $1/\sin z = i2/(e^{iz}-e^{-iz})$. This function evidently goes infinite only when $e^{iz}=e^{-iz}$, which is possible only for real~$z$; but for real~$z$, the sine function's very definition establishes the poles $z=k\pi$ (refer to Fig.~\ref{trig:226:f1}). With the observations from Table~\ref{cexp:tbl-prop} that $i\sinh z=\sin iz$ and $\cosh z=\cos iz$, similar reasoning for each of the eight trigonometrics forbids poles other than those~(\ref{taylor:382:10}) lists. Satisfied that we have forgotten no poles, therefore, we finally apply l'H\^opital's rule to each of the ratios \[ %\renewcommand\arraystretch{2.0} \br{c} \ds\frac{z-k\pi}{\sin z}, \ds\frac{z-(k-1/2)\pi}{\cos z}, \ds\frac{z-k\pi}{\tan z}, \ds\frac{z-(k-1/2)\pi}{1/\tan z}, \\ \ds\frac{z-ik\pi}{\sinh z}, \ds\frac{z-i(k-1/2)\pi}{\cosh z}, \ds\frac{z-ik\pi}{\tanh z}, \ds\frac{z-i(k-1/2)\pi}{1/\tanh z} \er \] to reach~(\ref{taylor:382:10}). \index{function!meromorphic} \index{meromorphic function} Trigonometric poles evidently are special only in that a trigonometric function has an infinite number of them. The poles are ordinary, single poles, with residues, subject to Cauchy's integral formula and so on. The trigonometrics are meromorphic functions (\S~\ref{taylor:330}) for this reason.% \footnote{\cite{Kohler-lecture}} \index{function!entire} \index{entire function} The six simpler trigonometrics $\sin z$, $\cos z$, $\sinh z$, $\cosh z$, $\exp z$ and $\cis z$---conspicuously excluded from this section's gang of eight---have no poles for finite~$z$, because $e^z$, $e^{iz}$, $e^z \pm e^{-z}$ and $e^{iz} \pm e^{-iz}$ then likewise are finite. These simpler trigonometrics are not only meromorphic but also entire. Observe however that the \emph{inverse} trigonometrics are multiple-valued and have branch points, and thus are not meromorphic at all. % ---------------------------------------------------------------------- \section{The Laurent series} \label{taylor:380} \index{Laurent series} \index{Laurent, Pierre Alphonse (1813--1854)} \index{pole} \index{long division} \index{regular part} Any analytic function can be expanded in a Taylor series, but never about a pole or branch point of the function. Sometimes one nevertheless wants to expand at least about a pole. Consider for example expanding \bq{taylor:380:06} f(z) = \frac{e^{-z}}{1-\cos z} \eq about the function's pole at $z=0$. Expanding dividend and divisor separately, \bqb f(z) &=& \frac{ 1 - z + z^2/2 - z^3/6 + \cdots }{ z^2/2 - z^4/\mbox{0x18} + \cdots } \\&=& \frac{ \sum_{j=0}^\infty \left[ (-)^j z^{j}/j! \right] }{ -\sum_{k=1}^\infty (-)^k z^{2k}/(2k)! } \\&=& \frac{ \sum_{k=0}^\infty \left[ -z^{2k}/(2k)! + z^{2k+1}/(2k+1)! \right] }{ \sum_{k=1}^\infty (-)^k z^{2k}/(2k)! }. \eqb By long division, \bqb f(z) &=& \frac{2}{z^2} - \frac{2}{z} + \left\{ \left[ -\frac{2}{z^2} + \frac{2}{z} \right] \sum_{k=1}^\infty \frac{(-)^k z^{2k}}{(2k)!} \right.\\&& \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left.\mbox{} + \sum_{k=0}^\infty \left[ -\frac{z^{2k}}{(2k)!} + \frac{z^{2k+1}}{(2k+1)!} \right] \right\} \left/ \sum_{k=1}^\infty \frac{(-)^k z^{2k}}{(2k)!} \right. 
\\&=& \frac{2}{z^2} - \frac{2}{z} + \left\{ \sum_{k=1}^\infty \left[ - \frac{(-)^k 2 z^{2k-2}}{(2k)!} + \frac{(-)^k 2 z^{2k-1}}{(2k)!} \right] \right.\\&& \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left.\mbox{} + \sum_{k=0}^\infty \left[ -\frac{z^{2k}}{(2k)!} + \frac{z^{2k+1}}{(2k+1)!} \right] \right\} \left/ \sum_{k=1}^\infty \frac{(-)^k z^{2k}}{(2k)!} \right. \\&=& \frac{2}{z^2} - \frac{2}{z} + \left\{ \sum_{k=0}^\infty \left[ \frac{(-)^k 2 z^{2k}}{(2k+2)!} - \frac{(-)^k 2 z^{2k+1}}{(2k+2)!} \right] \right.\\&& \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left.\mbox{} + \sum_{k=0}^\infty \left[ -\frac{z^{2k}}{(2k)!} + \frac{z^{2k+1}}{(2k+1)!} \right] \right\} \left/ \sum_{k=1}^\infty \frac{(-)^k z^{2k}}{(2k)!} \right. \\&=& \frac{2}{z^2} - \frac{2}{z} + \sum_{k=0}^\infty \left[ \frac{ -(2k+1)(2k+2) + (-)^k 2 }{(2k+2)!}z^{2k} \right.\\&& \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left.\mbox{} + \frac{ (2k+2) - (-)^k 2 }{(2k+2)!}z^{2k+1} \right] \left/ \sum_{k=1}^\infty \frac{(-)^k z^{2k}}{(2k)!}. \right. \eqb The remainder's $k=0$ terms now disappear as intended; so, factoring $z^2/z^2$ from the division leaves \bqa f(z) &=& \frac{2}{z^2} - \frac{2}{z} + \sum_{k=0}^\infty \left[ \frac{ (2k+3)(2k+4) + (-)^k 2 }{(2k+4)!}z^{2k} \right.\xn\\&& \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left.\mbox{} - \frac{ (2k+4) + (-)^k 2 }{(2k+4)!}z^{2k+1} \right] \left/ \sum_{k=0}^\infty \frac{(-)^k z^{2k}}{(2k+2)!}. \right. \xn\\\label{taylor:380:30} \eqa One can continue dividing to extract further terms if desired, and if all the terms \[ f(z) = \frac{2}{z^2} - \frac{2}{z} + \frac{7}{6} - \frac{z}{2} + \cdots \] are extracted the result is the \emph{Laurent series} proper, \bq{taylor:380:laurent} f(z) = \sum_{k=K}^\infty (a_k) (z-z_o)^k, \ \ (k,K) \in \mathbb Z, \ K \le 0. \eq However for many purposes (as in eqn.~\ref{taylor:380:30}) the partial Laurent series \bqa f(z) &=& \sum_{k=K}^{-1} (a_k) (z-z_o)^k + \frac{ \sum_{k=0}^\infty (b_k) (z-z_o)^k }{ \sum_{k=0}^\infty (c_k) (z-z_o)^k }, \label{taylor:380:partial} \\ (k,K) &\in& \mathbb Z, \ \ K \le 0, \ c_0 \neq 0, \xn \eqa suffices and may even be preferable. In either form, \bq{taylor:380:40} f(z) = \sum_{k=K}^{-1} (a_k) (z-z_o)^k + f_o(z-z_o), \ \ (k,K) \in \mathbb Z, \ K \le 0, \eq where, unlike $f(z)$, $f_o(z-z_o)$ is analytic at $z=z_o$. The $f_o(z-z_o)$ of~(\ref{taylor:380:40}) is $f(z)$'s \emph{regular part} at $z=z_o$. The ordinary Taylor series diverges at a function's pole. Handling the pole separately, the Laurent series remedies this defect.% \footnote{ The professional mathematician's treatment of the Laurent series usually begins by defining an annular convergence domain (a convergence domain bounded without by a large circle and within by a small) in the Argand plane. From an applied point of view however what interests us is the basic technique to remove the poles from an otherwise analytic function. Further rigor is left to the professionals. }$\mbox{}^,$\footnote{% \cite[\S~10.8]{Hildebrand}% \cite[\S~2.5]{Fisher} } Sections~\ref{inttx:250} and~\ref{inttx:260} tell more about poles generally, including multiple poles like the one in the example here. % ---------------------------------------------------------------------- \section{Taylor series in $1/z$} \label{taylor:385} \index{Taylor series!in $1/z$} A little imagination helps the Taylor series a lot. The Laurent series of \S~\ref{taylor:380} represents one way to extend the Taylor series. Several other ways are possible. 
The typical trouble one has with the Taylor series is that a function's poles and branch points limit the series' convergence domain. Thinking flexibly, however, one can often evade the trouble. \index{essential singularity} \index{singularity!essential} Consider the function \[ f(z) = \frac{\sin(1/z)}{\cos z}. \] This function has a nonanalytic point of a most peculiar nature at $z=0$. The point is an essential singularity, and one cannot expand the function directly about it. One could expand the function directly about some other point like $z=1$, but calculating the Taylor coefficients would take a lot of effort and, even then, the resulting series would suffer a straitly limited convergence domain. All that however tries too hard. Depending on the application, it may suffice to write \[ f(z) = \frac{\sin w}{\cos z}, \ \ w \equiv \frac 1 z. \] This is \[ f(z) = \frac {z^{-1} - z^{-3}/3! + z^{-5}/5! - \cdots} {1 - z^2/2! + z^4/4! - \cdots}, \] which is all one needs to calculate $f(z)$ numerically---and may be all one needs for analysis, too. As an example of a different kind, consider \[ g(z) = \frac{1}{(z-2)^2}. \] Most often, one needs no Taylor series to handle such a function (one simply does the indicated arithmetic). Suppose however that a Taylor series specifically about $z=0$ were indeed needed for some reason. Then by~(\ref{taylor:314:70}) and~(\ref{drvtv:220:30}), \[ g(z) = \frac{1/4}{(1-z/2)^2} = \frac 1 4 \sum_{k=0}^\infty \cmb{1+k}{1} \bigg[\frac z 2\bigg]^k = \sum_{k=0}^\infty \frac{k+1}{2^{k+2}} z^k. \] That expansion is good only for $\left|z\right| < 2$, but for $\left|z\right| > 2$ we also have that \[ g(z) = \frac{1/z^2}{(1-2/z)^2} = \frac 1 {z^2} \sum_{k=0}^\infty \cmb{1+k}{1} \left[\frac 2 z\right]^k = \sum_{k=2}^\infty \frac{2^{k-2}(k-1)}{z^k}, \] which expands in negative rather than positive powers of~$z$. Note that we have computed the two series for $g(z)$ without ever actually taking a derivative. Neither of the section's examples is especially interesting in itself, but their point is that it often pays to think flexibly in extending and applying the Taylor series. One is not required immediately to take the Taylor series of a function as it presents itself; one can first change variables or otherwise rewrite the function in some convenient way, then take the Taylor series either of the whole function at once or of pieces of it separately. One can expand in negative powers of~$z$ equally validly as in positive powers. And, though taking derivatives per~(\ref{taylor:310:20}) may be the canonical way to determine Taylor coefficients, any effective means to find the coefficients suffices.
% ----------------------------------------------------------------------
\section{The multidimensional Taylor series}
\label{taylor:370}
\index{Taylor series!multidimensional} \index{cross-term} \index{term!cross-} \index{cross-derivative} \index{derivative!cross-} Equation~(\ref{taylor:310:20}) has given the Taylor series for functions of a single variable. The idea of the Taylor series does not differ where there are two or more independent variables; only the details are a little more complicated. For example, consider the function $f(z_1,z_2) = z_1^2 + z_1z_2 + 2z_2$, which has terms~$z_1^2$ and~$2z_2$---these we understand---but also has the cross-term~$z_1z_2$ for which the relevant derivative is the cross-derivative $\partial^2f/\partial z_1\,\partial z_2$. Where two or more independent variables are involved, one must account for the cross-derivatives, too.
With this idea in mind, the multidimensional Taylor series is \bq{taylor:370:10} f(\ve z) = \sum_{\ve k} \left(\left.\frac {\partial^{\ve k}f} {\partial\ve z^{\ve k}}\right|_{\ve z=\ve z_o}\right) \frac{(\ve z-\ve z_o)^{\ve k}}{\ve k!}. \eq Well, that's neat. What does it mean? \bi \item \index{vector}\index{vector!generalized} The~$\ve z$ is a \emph{vector}% \footnote{ In this generalized sense of the word, a \emph{vector} is an ordered set of~$N$ elements. The geometrical vector $\ve v = \vu x x + \vu y y + \vu z z$ of \S~\ref{trig:230}, then, is a vector with $N=3$, $v_1=x$, $v_2=y$ and $v_3=z$. (Generalized vectors of arbitrary~$N$ will figure prominently in the book from Ch.~\ref{matrix} onward.) } incorporating the several independent variables $z_1,z_2,\ldots,z_N$. \item \index{vector!integer}\index{vector!nonnegative integer} The~$\ve k$ is a nonnegative integer vector of~$N$ counters---$k_1,k_2,\ldots,k_N$---one for each of the independent variables. Each of the~$k_n$ runs independently from~$0$ to~$\infty$, and every permutation is possible. For example, if $N=2$ then \bqb \ve k &=& (k_1,k_2) \\ &=& (0,0),(0,1),(0,2),(0,3),\ldots; \\ && (1,0),(1,1),(1,2),(1,3),\ldots; \\ && (2,0),(2,1),(2,2),(2,3),\ldots; \\ && (3,0),(3,1),(3,2),(3,3),\ldots; \\ && \ldots \eqb \item The $\partial^{\ve k}f/\partial\ve z^{\ve k}$ represents the $\ve k$th cross-derivative of $f(\ve z)$, meaning that \[ \frac{\partial^{\ve k}f}{\partial\ve z^{\ve k}} \equiv \left(\prod_{n=1}^{N}\frac{\partial^{k_n}}{(\partial z_n)^{k_n}}\right)f. \] \item The $(\ve z-\ve z_o)^{\ve k}$ represents \[ (\ve z-\ve z_o)^{\ve k} \equiv \prod_{n=1}^N (z_n-z_{on})^{k_n}. \] \item The~$\ve k!$ represents \[ \ve k! \equiv \prod_{n=1}^N k_n!. \] \ei With these definitions, the multidimensional Taylor series~(\ref{taylor:370:10}) yields all the right derivatives and cross-derivatives at the expansion point $\ve z = \ve z_o$. Thus within some convergence domain about $\ve z = \ve z_o$, the multidimensional Taylor series~(\ref{taylor:370:10}) represents a function $f(\ve z)$ as accurately as the simple Taylor series~(\ref{taylor:310:20}) represents a function $f(z)$, and for the same reason. derivations-0.53.20120414.orig/tex/stub.tex0000644000000000000000000001427011742566274016642 0ustar rootroot% ---------------------------------------------------------------------- % diagn: % The author has recently reworked this---what is it? plan? outline? % stub? temporary appendix?---and has left it thereby in a long-term % form. It wants therefore a close review. \chapter*{Plan} \label{stub} \setcounter{footnote}{0} { \newcounter{enumt} \setcounter{enumt}{\thechapter} The following chapters are tentatively planned to complete the book. \begin{enumerate} \setcounter{enumi}{\theenumt} \item \label{wave} The wave equation% \footnote{ Chapter~\ref{wave} might begin with Poisson's equation and the corresponding static case. After treating the wave equation proper, it might end with the parabolic wave equation. } \item \label{bessel} Cylinder functions \item \label{orthp} Orthogonal polynomials% \footnote{ Chapter~\ref{orthp} would be pretty useless if it did not treat Legendre polynomials, so presumably it will do at least this. }% $\mbox{}^{,}$% \footnote{ The author has not yet decided how to apportion the treatment of the wave equation in spherical geometries between Chs.~\ref{wave}, \ref{bessel} and~\ref{orthp}. 
}
\item \label{xssc} Transformations to speed series convergence%
\footnote{ Chapter~\ref{xssc} is tentatively to treat at least the Poisson sum formula, Mosig's summation-by-parts technique and, the author believes, the Watson transformation; plus maybe some others as seems appropriate. This might also be a good chapter in which to develop the infinite-product forms of the sine and the cosine and thence Euler's and Andrews' clever closed-form series summations from~\cite[\S~1.7 and exercises]{Andrews} and maybe from other, similar sources. }
\item \label{cgrad} The conjugate-gradient algorithm
\item \label{rmrk} Remarks
\setcounter{enumt}{\theenumi}
\end{enumerate}
Chapters are likely yet to be inserted, removed, divided, combined and shuffled, but that's the planned outline at the moment. }
The book means to stop short of hypergeometric functions, parabolic cylinder functions, selective-dimensional (Weyl and Sommerfeld) Fourier transforms, wavelets, and iterative techniques more advanced than the
% bad break (but fixed)
con\-ju\-gate-gra\-di\-ent (the advanced iterative techniques being too active an area of research for such a book as this yet to treat). However, acknowledging the uniquely seminal historical importance of Kepler's laws, the book would like to add an appendix on the topic, to precede the existing Appendix~\ref{hist}.
% Several of the tentatively planned chapters from~Ch.~\ref{iter} onward
% represent deep fields of study each wanting full books of their own. If
% written according to plan, few if any of these chapters would treat much
% more than a few general results from their respective fields.
% Yet further developments, if any, are hard to foresee.%
% \footnote{
% Any plans---I should say, any wishes---beyond the topics listed are no
% better than daydreams. However, for my own notes if for no other
% reason, plausible topics include the following:
% Hamiltonian mechanics;
% electromagnetics;
% the statics of materials;
% the mechanics of materials;
% fluid mechanics;
% advanced special functions;
% thermodynamics and the mathematics of entropy;
% quantum mechanics;
% electric circuits;
% information theory;
% statistics.
% Life should be so long, eh? Well, we shall see. Like most authors
% perhaps, I write in my spare time, the supply of which is necessarily
% limited and unpredictable.
% (Family responsibilities and other duties take precedence. My wife
% says to me, ``You have a lot of chapters to write. It will take you a
% long time.'' She understates the problem.)
% The book targets the list ending in iterative techniques as its actual
% goal.
% }
% \subsubsection*{A personal note to the reader}
% \emph{Derivations of Applied Mathematics} belongs to the open-source
% tradition, which means that you as reader have a stake in it if you
% wish. If you have read the book, or a substantial fraction of it, as
% far as it has yet gone, then you can help to improve it. Check
% \texttt{http://www.derivations.org/} for the latest revision, then write
% me at \texttt{thb@derivations.org}. I would most expressly solicit your
% feedback on typos, misprints, false or missing symbols and the like;
% such errors only mar the manuscript, so no such correction is too small.
% On a higher plane, if you have found any part of the book unnecessarily
% confusing, please tell how so. On no particular plane, if you would
% tell me what you have done with your copy of the book, what you have
% learned from it, or how you have cited it, then write at your
% discretion.
% % If you find a part of the book insufficiently rigorous, then that is % another matter. I do not discourage such criticism and would be glad % to hear it, but this book may not be well placed to meet it (the % book might compromise by including a footnote that briefly suggests the % outline of a more rigorous proof, but it tries not to distract the % narrative by formalities that do not serve applications). If you want % to detail H\"older spaces and Galois theory, or whatever, then my % response is likely to be that there is already a surfeit of fine % professional mathematics books in print; this just isn't that kind of % book. On the other hand, the book does intend to derive every one of % its results adequately from an applied perspective; if it fails to do so % in your view then maybe you and I should discuss the matter. Finding % the right balance is not always easy. % % At the time of this writing, readers are downloading the book at the % rate of about four thousand copies per year directly through % \texttt{derivations.org}. Some fraction of those, plus others who have % installed the book as a Debian package or have acquired the book through % secondary channels, actually have read it; now you stand among them. % % Write as appropriate. More to come. % \nopagebreak % % \noindent\\ % THB derivations-0.53.20120414.orig/tex/hex.tex0000644000000000000000000002006411742566274016447 0ustar rootroot% ---------------------------------------------------------------------- \chapter[Hexadecimal notation, et al.]{Hexadecimal and other notational matters} \label{hex} \index{convention} The importance of conventional mathematical notation is hard to overstate. Such notation serves two distinct purposes: it conveys mathematical ideas from writer to reader; and it concisely summarizes complex ideas on paper to the writer himself. Without the notation, one would find it difficult even to think clearly about the math; to discuss it with others, nearly impossible. The right notation is not always found at hand, of course. New mathematical ideas occasionally find no adequate pre\"established notation, when it falls to the discoverer and his colleagues to establish new notation to meet the need. A more difficult problem arises when old notation exists but is inelegant in modern use. Convention is a hard hill to climb, and rightly so. Nevertheless, slavish devotion to convention does not serve the literature well; for how else can notation improve over time, if writers will not incrementally improve it? Consider the notation of the algebraist Girolamo Cardano in his 1539 letter to Tartaglia: \begin{quote} [T]he cube of one-third of the coefficient of the unknown is greater in value than the square of one-half of the number.~\cite{mathbios} \end{quote} If Cardano lived today, surely he would express the same thought in the form \[ \left(\frac{a}{3}\right)^3 > \left(\frac{x}{2}\right)^2. \] Good notation matters. \index{$\pi$} \index{$2\pi$} Although this book has no brief to overhaul applied mathematical notation generally, it does seek to aid the honorable cause of notational evolution in a few specifics. For example, the book sometimes treats~$2\pi$ implicitly as a single symbol, so that (for instance) the quarter revolution or right angle is expressed as $2\pi/4$ rather than as the less evocative $\pi/2$. As a single symbol, of course,~$2\pi$ remains a bit awkward. One wants to introduce some new symbol $\xi=2\pi$ thereto. 
However, it is neither necessary nor practical nor desirable to leap straight to notational Utopia in one great bound. It suffices in print to improve the notation incrementally. If this book treats~$2\pi$ sometimes as a single symbol---if such treatment meets the approval of slowly evolving convention---then further steps, the introduction of new symbols~$\xi$ and such, can safely be left incrementally to future writers. % ---------------------------------------------------------------------- \section{Hexadecimal numerals} \label{hex:240.1} \index{hexadecimal} \index{$\mbox{0x}$} Treating~$2\pi$ as a single symbol is a small step, unlikely to trouble readers much. A bolder step is to adopt from the computer science literature the important notational improvement of the hexadecimal numeral. No incremental step is possible here; either we leap the ditch or we remain on the wrong side. In this book, we choose to leap. Traditional decimal notation is unobjectionable for measured quantities like 63.7~miles, $\$\:1.32$ million or $9.81\:\mr{m}/\mr{s}^2$, but its iterative tenfold structure meets little or no aesthetic support in mathematical theory. Consider for instance the decimal numeral 127, whose number suggests a significant idea to the computer scientist, but whose decimal notation does nothing to convey the notion of the largest signed integer storable in a byte. Much better is the base-sixteen hexadecimal notation 0x7F, which clearly expresses the idea of $2^7-1$. To the reader who is not a computer scientist, the aesthetic advantage may not seem immediately clear from the one example, but consider the decimal number 2,147,483,647, which is the largest signed integer storable in a standard thirty-two bit word. In hexadecimal notation, this is $\mr{0x7FFF\,FFFF}$, or in other words $2^{\mr{0x1F}}-1$. The question is: which notation more clearly captures the idea? To readers unfamiliar with the hexadecimal notation, to explain very briefly: hexadecimal represents numbers not in tens but rather in sixteens. The rightmost place in a hexadecimal numeral represents ones; the next place leftward, sixteens; the next place leftward, sixteens squared; the next, sixteens cubed, and so on. For instance, the hexadecimal numeral 0x1357 means ``seven, plus five times sixteen, plus thrice sixteen times sixteen, plus once sixteen times sixteen times sixteen.'' In hexadecimal, the sixteen symbols 0123456789ABCDEF respectively represent the numbers zero through fifteen, with sixteen being written 0x10. All this raises the sensible question: why sixteen?% \footnote{ An alternative advocated by some eighteenth-century writers was twelve. In base twelve, one quarter, one third and one half are respectively written 0.3, 0.4 and 0.6. Also, the hour angles (\S~\ref{trig:260}) come in neat increments of $(0.06)(2\pi)$ in base twelve, so there are some real advantages to that base. Hexadecimal, however, besides having momentum from the computer science literature, is preferred for its straightforward proxy of binary. } The answer is that sixteen is~$2^4$, so hexadecimal (base sixteen) is found to offer a convenient shorthand for binary (base two, the fundamental, smallest possible base). Each of the sixteen hexadecimal digits represents a unique sequence of exactly four bits (binary digits). Binary is inherently theoretically interesting, but direct binary notation is unwieldy (the hexadecimal number 0x1357 is binary $\mr{0001\,0011\,0101\,0111}$), so hexadecimal is written in proxy. 
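To see the place-value idea at a glance (a brief illustrative aside, restating the same example), one can write
\[
 \mr{0x1357} = (1)(\mr{0x10})^3 + (3)(\mr{0x10})^2 + (5)(\mr{0x10})^1 + (7)(\mr{0x10})^0,
\]
each hexadecimal digit standing in for its own group of four bits: 1 for $\mr{0001}$, 3 for $\mr{0011}$, 5 for $\mr{0101}$ and 7 for $\mr{0111}$.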
The conventional hexadecimal notation is admittedly a bit bulky and unfortunately overloads the letters~A through~F, letters which when set in italics usually represent coefficients not digits. However, the real problem with the hexadecimal notation is not in the notation itself but rather in the unfamiliarity with it. The reason it is unfamiliar is that it is not often encountered outside the computer science literature, but it is not encountered because it is not used, and it is not used because it is not familiar, and so on in a cycle. It seems to this writer, on aesthetic grounds, that this particular cycle is worth breaking, so this book uses the hexadecimal for integers larger than~9. If you have never yet used the hexadecimal system, it is worth your while to learn it. For the sake of elegance, at the risk of challenging entrenched convention, this book employs hexadecimal throughout. Observe that in some cases, such as where hexadecimal numbers are arrayed in matrices, this book may omit the cumbersome hexadecimal prefix~``0x.'' Specific numbers with physical units attached appear seldom in this book, but where they do naturally decimal not hexadecimal is used: $v_\mr{sound}=331\:\mr{m/s}$ rather than the silly-looking $v_\mr{sound}=\mr{0x14B}\:\mr{m/s}$. Combining the hexadecimal and~$2\pi$ ideas, we note here for interest's sake that \[ 2\pi \approx \mr{0x6.487F}. \] % ---------------------------------------------------------------------- \section{Avoiding notational clutter} \label{hex:270.2} \index{clutter, notational} Good applied mathematical notation is not cluttered. Good notation does not necessarily include every possible limit, qualification, superscript and subscript. For example, the sum \[ S = \sum_{i=1}^{M}\sum_{j=1}^{N} a_{ij}^2 \] might be written less thoroughly but more readably as \[ S = \sum_{i,j} a_{ij}^2 \] if the meaning of the latter were clear from the context. When to omit subscripts and such is naturally a matter of style and subjective judgment, but in practice such judgment is often not hard to render. The balance is between showing few enough symbols that the interesting parts of an equation are not obscured visually in a tangle and a haze of redundant little letters, strokes and squiggles, on the one hand; and on the other hand showing enough detail that the reader who opens the book directly to the page has a fair chance to understand what is written there without studying the whole book carefully up to that point. Where appropriate, this book often condenses notation and omits redundant symbols. derivations-0.53.20120414.orig/tex/xkvview.sty0000644000000000000000000001535011742575144017403 0ustar rootroot%% %% This is file `xkvview.sty', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvview') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. 
%% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% \NeedsTeXFormat{LaTeX2e}[1995/12/01] \ProvidesPackage{xkvview}% [2008/08/10 v1.5 viewer utility for xkeyval (HA)] \RequirePackage{xkeyval} \RequirePackage{longtable} \DeclareOptionX*{% \PackageWarning{xkvview}{Unknown option `\CurrentOption'}% } \ProcessOptionsX \newif\ifXKVV@vwkey \newif\ifXKVV@colii \newif\ifXKVV@coliii \newif\ifXKVV@coliv \newif\ifXKVV@colv \newwrite\XKVV@out \let\XKVV@db\@empty \define@cmdkeys[XKVV]{xkvview}[XKVV@]{% prefix,family,type,default,file,columns,wcolsep,weol}[\@nil] \define@boolkeys[XKVV]{xkvview}[XKVV@]{view,vlabels,wlabels}[true] \presetkeys[XKVV]{xkvview}{prefix,family,type,default,file,% columns,wcolsep=&,weol=\\,view,vlabels=false,wlabels=false}{} \def\XKVV@tabulate#1#2#3{% \def\XKV@tempa{#3}% \@onelevel@sanitize\XKV@tempa \XKV@addtolist@x\XKVV@db{#1=\ifx\XKV@prefix\@empty\else\expandafter \XKVV@t@bulate\XKV@prefix\fi=\XKV@tfam=#2=\XKV@tempa}% } \def\XKVV@t@bulate#1@{#1} \def\XKV@define@key#1{% \@ifnextchar[{\XKV@d@fine@k@y{#1}}{% \XKVV@tabulate{#1}{ordinary}{[none]}% \expandafter\def\csname\XKV@header#1\endcsname####1% }% } \def\XKV@d@fine@k@y#1[#2]{% \XKVV@tabulate{#1}{ordinary}{#2}% \XKV@define@default{#1}{#2}% \expandafter\def\csname\XKV@header#1\endcsname##1% } \def\XKV@define@cmdkey#1#2[#3]#4{% \ifXKV@st \XKVV@tabulate{#2}{command}{#3}% \XKV@define@default{#2}{#3}% \else \XKVV@tabulate{#2}{command}{[none]}% \fi \def\XKV@tempa{\expandafter\def\csname\XKV@header#2\endcsname####1}% \begingroup\expandafter\endgroup\expandafter\XKV@tempa\expandafter {\expandafter\def\csname#1#2\endcsname{##1}#4}% } \def\XKV@d@fine@ch@icekey#1[#2]{% \XKVV@tabulate{#1}{choice}{#2}% \XKV@define@default{#1}{#2}% \XKV@d@fine@ch@ic@key{#1}% } \def\XKV@d@fine@ch@ic@key#1{% \XKVV@tabulate{#1}{choice}{[none]}% \ifXKV@pl\XKV@afterelsefi \expandafter\XKV@d@f@ne@ch@ic@k@y \else\XKV@afterfi \expandafter\XKV@d@f@ne@ch@ic@key \fi \csname\XKV@header#1\endcsname } \def\XKV@d@f@ne@b@olkey#1#2#3#4#5{% \expandafter\newif\csname if#3\endcsname \ifXKV@st \XKVV@tabulate{#2}{boolean}{#4}% \XKV@define@default{#2}{#4}% \else \XKVV@tabulate{#2}{boolean}{[none]}% \fi \ifXKV@pl \def#1##1{\XKV@pltrue\XKV@sttrue \XKV@checkchoice[\XKV@resa]{##1}{true,false}#5% }% \else \def#1##1{\XKV@plfalse\XKV@sttrue \XKV@checkchoice[\XKV@resa]{##1}{true,false}#5% }% \fi } \def\xkvview#1{% \setkeys[XKVV]{xkvview}{#1}% \ifx\XKVV@default\@nnil\else\@onelevel@sanitize\XKVV@default\fi \ifx\XKVV@columns\@nnil \count@5 \XKVV@coliitrue\XKVV@coliiitrue\XKVV@colivtrue\XKVV@colvtrue \else \count@\@ne \@expandtwoargs\in@{,prefix,}{,\XKVV@columns,}% \ifin@\advance\count@\@ne\XKVV@coliitrue\else\XKVV@coliifalse\fi \@expandtwoargs\in@{,family,}{,\XKVV@columns,}% \ifin@\advance\count@\@ne\XKVV@coliiitrue\else\XKVV@coliiifalse\fi \@expandtwoargs\in@{,type,}{,\XKVV@columns,}% \ifin@\advance\count@\@ne\XKVV@colivtrue\else\XKVV@colivfalse\fi \@expandtwoargs\in@{,default,}{,\XKVV@columns,}% \ifin@\advance\count@\@ne\XKVV@colvtrue\else\XKVV@colvfalse\fi \fi \ifXKVV@view 
\protected@edef\XKV@tempa{\noexpand\begin{longtable}[l]{% *\the\count@ l}\normalfont Key\ifXKVV@colii&\normalfont Prefix% \fi\ifXKVV@coliii&\normalfont Family\fi\ifXKVV@coliv&\normalfont Type\fi\ifXKVV@colv&\normalfont Default\fi\\\noexpand\hline \noexpand\endfirsthead\noexpand\multicolumn{\the\count@}{l}{% \normalfont\emph{Continued from previous page}}\\\noexpand\hline \normalfont Key\ifXKVV@colii&\normalfont Prefix\fi\ifXKVV@coliii &\normalfont Family\fi\ifXKVV@coliv&\normalfont Type\fi \ifXKVV@colv&\normalfont Default\fi\\\noexpand\hline\noexpand \endhead\noexpand\hline\noexpand\multicolumn{\the\count@}{r}{% \normalfont\emph{Continued on next page}}\\\noexpand\endfoot \noexpand\hline\noexpand\endlastfoot }% \XKV@toks\expandafter{\XKV@tempa}% \fi \ifx\XKVV@file\@nnil\else\immediate\openout\XKVV@out\XKVV@file\fi \XKV@for@o\XKVV@db\XKV@tempa{% \XKVV@vwkeytrue\expandafter\XKVV@xkvview\XKV@tempa\@nil }% \ifXKVV@view \addto@hook\XKV@toks{\end{longtable}}% \begingroup\ttfamily\the\XKV@toks\endgroup \fi \ifx\XKVV@file\@nnil\else\immediate\closeout\XKVV@out\fi } \def\XKVV@xkvview#1=#2=#3=#4=#5\@nil{% \ifx\XKVV@prefix\@nnil\else \def\XKV@tempa{#2}% \ifx\XKV@tempa\XKVV@prefix\else\XKVV@vwkeyfalse\fi \fi \ifx\XKVV@family\@nnil\else \def\XKV@tempa{#3}% \ifx\XKV@tempa\XKVV@family\else\XKVV@vwkeyfalse\fi \fi \ifx\XKVV@type\@nnil\else \def\XKV@tempa{#4}% \ifx\XKV@tempa\XKVV@type\else\XKVV@vwkeyfalse\fi \fi \ifx\XKVV@default\@nnil\else \def\XKV@tempa{#5}% \ifx\XKV@tempa\XKVV@default\else\XKVV@vwkeyfalse\fi \fi \ifXKVV@vwkey \ifXKVV@view \edef\XKV@tempa{% #1\ifXKVV@colii\fi\ifXKVV@coliii\fi \ifXKVV@coliv\fi\ifXKVV@colv\fi \ifXKVV@vlabels\noexpand\label{#2-#3-#1}\fi }% \expandafter\addto@hook\expandafter \XKV@toks\expandafter{\XKV@tempa\\}% \fi \ifx\XKVV@file\@nnil\else \immediate\write\XKVV@out{% #1\ifXKVV@colii\XKVV@wcolsep#2\fi \ifXKVV@coliii\XKVV@wcolsep#3\fi \ifXKVV@coliv\XKVV@wcolsep#4\fi \ifXKVV@colv\XKVV@wcolsep#5\fi \ifXKVV@wlabels\string\label{#2-#3-#1}\fi \expandafter\noexpand\XKVV@weol }% \fi \fi } \endinput %% %% End of file `xkvview.sty'. derivations-0.53.20120414.orig/tex/README0000644000000000000000000000104511742566274016017 0ustar rootroot This directory contains the LaTeX and BibTeX source files for the book. In addition to TeX, LaTeX, BibTeX, AMSmath and PSTricks, you also need Rubber installed to build the source with the default Makefile. NOTES The template files are not actually source files. You could delete them without ill effect. The templates serve to remind the author while writing how to code various TeX entities correctly. When running "make check," it seems to be typographically acceptable to let entries in the index run at least up to 9.0 points too wide. derivations-0.53.20120414.orig/tex/matrix.tex0000644000000000000000000032524611742566274017201 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The matrix} \label{matrix} \index{matrix} \index{linear algebra} \index{algebra!linear} \index{matrix rudiments} \index{rudiments} \index{applied mathematics!foundations of} \index{mathematics!applied, foundations of} Chapters~\ref{alggeo} through~\ref{inttx} have laid solidly the basic foundations of applied mathematics. This chapter begins to build on those foundations, demanding some heavier mathematical lifting. Taken by themselves, most of the foundational methods of the earlier chapters have handled only one or at most a few numbers (or functions) at a time. 
However, in practical applications the need to handle large arrays of numbers at once arises often. Some nonobvious effects emerge then, as, for example, the eigenvalue of Ch.~\ref{eigen}. \index{matrix!motivation for} Regarding the eigenvalue: the eigenvalue was always there, but prior to this point in the book it was usually trivial---the eigenvalue of~$5$ is just~$5$, for instance---so we didn't bother much to talk about it. It is when numbers are laid out in orderly grids like \[ C = \mf{rrr}{ 6 & 4 & 0 \\ 3 & 0 & 1 \\ 3 & 1 & 0 } \] that nontrivial eigenvalues arise (though you cannot tell just by looking, the eigenvalues of~$C$ happen to be~$-1$ and $[7\pm\sqrt{\mbox{0x49}}]/2$). But, just what is an \emph{eigenvalue?} Answer: an eigenvalue is the value by which an object like~$C$ scales an eigenvector without altering the eigenvector's direction. Of course, we have not yet said what an \emph{eigenvector} is, either, or how~$C$ might scale something, but it is to answer precisely such questions that this chapter and the three which follow it are written. So, we are getting ahead of ourselves. Let's back up. \index{multiplier} \index{coefficient} An object like~$C$ is called a \emph{matrix.} It serves as a generalized coefficient or multiplier. Where we have used single numbers as coefficients or multipliers heretofore, one can with sufficient care often use matrices instead. The matrix interests us for this reason among others. \index{vector!generalized} \index{vector!matrix} \index{matrix vector} \index{vector!$n$-dimensional} \index{$n$-dimensional vector} \index{complex number!being a scalar not a vector} \index{element} \index{dimension} The technical name for the ``single number'' is the \emph{scalar.} Such a number, as for instance~$5$ or $-4+i3$, is called a scalar because its action alone during multiplication is simply to scale the thing it multiplies. Besides acting alone, however, scalars can also act in concert---in orderly formations---thus constituting any of three basic kinds of arithmetical object: \bi \item \index{scalar} the \emph{scalar} itself, a single number like $\alpha=5$ or $\beta=-4+i3$; \item \index{vector} the \emph{vector,} a column of~$m$ scalars like \[ \ve u = \mf{c}{ 5 \\ -4+i3 }, \] which can be written in-line with the notation $\ve u=[5\ \ \mbox{$-4+i3$}]^T$ (here there are two scalar elements,~$5$ and $-4+i3$, so in this example $m=2$); \item \index{matrix}\index{row}\index{column}\index{vector!row of} the \emph{matrix,} an $m\times n$ grid of scalars, or equivalently a row of~$n$ vectors, like \[ A = \mf{rrr}{ 0 & 6 & 2 \\ 1 & 1 & -1 }, \] which can be written in-line with the notation $A=[0\;6\;2; 1\;1\;\mbox{$-1$}]$ or the notation $A=[0\;1; 6\;1; 2\;\mbox{$-1$}]^T$ (here there are two rows and three columns of scalar elements, so in this example $m=2$ and $n=3$). \ei % An x which solves A*x==b is x=[1 1 -2]', where b==[2 4]'. Several general points are immediately to be observed about these various objects. First, despite the geometrical Argand interpretation of the complex number, a complex number is not a two-element vector but a scalar; therefore any or all of a vector's or matrix's scalar elements can be complex. Second, an $m$-element vector does not differ for most purposes from an $m\times 1$ matrix; generally the two can be regarded as the same thing. Third, the three-element (that is, three-dimensional) geometrical vector of \S~\ref{trig:230} is just an $m$-element vector with $m=3$. 
Fourth,~$m$ and~$n$ can be any nonnegative integers, even one, even zero, even infinity.% \footnote{ Fifth, though the progression \emph{scalar, vector, matrix} suggests next a ``matrix stack'' or stack of~$p$ matrices, such objects in fact are seldom used. As we shall see in \S~\ref{matrix:120}, the chief advantage of the standard matrix is that it neatly represents the linear transformation of one vector into another. ``Matrix stacks'' bring no such advantage. This book does not treat them. } Where one needs visually to distinguish a symbol like~$A$ representing a matrix, one can write it~$[A]$, in square brackets.% \footnote{ Alternate notations seen in print include~$\dyad A$ and~$\ve A$. } Normally however a simple~$A$ suffices. The matrix is a notoriously hard topic to motivate. The idea of the matrix is deceptively simple. The mechanics of matrix arithmetic are deceptively intricate. The most basic body of matrix theory, without which little or no useful matrix work can be done, is deceptively extensive. The matrix neatly encapsulates a substantial knot of arithmetical tedium and clutter, but to understand the matrix one must first understand the tedium and clutter the matrix encapsulates. As far as the author is aware, no one has ever devised a way to introduce the matrix which does not seem shallow, tiresome, irksome, even interminable at first encounter; yet the matrix is too important to ignore. Applied mathematics brings nothing else quite like it.% \footnote{ In most of its chapters, the book seeks a balance between terseness the determined beginner cannot penetrate and prolixity the seasoned veteran will not abide. The matrix upsets this balance. Part of the trouble with the matrix is that its arithmetic is just that, an arithmetic, no more likely to be mastered by mere theoretical study than was the classical arithmetic of childhood. To master matrix arithmetic, one must drill it; yet the book you hold is fundamentally one of theory not drill. The reader who has previously drilled matrix arithmetic will meet here the essential applied theory of the matrix. That reader will find this chapter and the next three tedious enough. The reader who has not previously drilled matrix arithmetic, however, is likely to find these chapters positively hostile. Only the doggedly determined beginner will learn the matrix here alone; others will find it more amenable to drill matrix arithmetic first in the early chapters of an introductory linear algebra textbook, dull though such chapters be (see~\cite{Lay} or better yet the fine, surprisingly less dull~\cite{Hefferon} for instance, though the early chapters of almost any such book give the needed arithmetical drill.) Returning here thereafter, the beginner can expect to find \emph{these} chapters still tedious but no longer impenetrable. The reward is worth the effort. That is the approach the author recommends. To the mathematical rebel, the young warrior with face painted and sword agleam, still determined to learn the matrix here alone, the author salutes his honorable defiance. Would the rebel consider alternate counsel? If so, then the rebel might compose a dozen matrices of various sizes and shapes, broad, square and tall, decomposing each carefully by pencil per the Gauss-Jordan method of \S~\ref{gjrank:341}, checking results (again by pencil; using a machine defeats the point of the exercise, and using a sword, well, it won't work) by multiplying factors to restore the original matrices. 
Several hours of such drill should build the young warrior the practical arithmetical foundation to master---with commensurate effort---the theory these chapters bring. The way of the warrior is hard, but conquest is not impossible. To the matrix veteran, the author presents these four chapters with grim enthusiasm. Substantial, logical, necessary the chapters may be, but exciting they are not. At least, the earlier parts are not very exciting (later parts are better). As a reasonable compromise, the veteran seeking more interesting reading might skip directly to Chs.~\ref{mtxinv} and~\ref{eigen}, referring back to Chs.~\ref{matrix} and \ref{gjrank} as need arises. } Chapters~\ref{matrix} through~\ref{eigen} treat the matrix and its algebra. This chapter, Ch.~\ref{matrix}, introduces the rudiments of the matrix itself.% \footnote{% \cite{Beattie}% \cite{Franklin}% \cite{Hefferon}% \cite{Lay} } % ---------------------------------------------------------------------- \section{Provenance and basic use} \label{matrix:120} \index{matrix!basic operations of} \index{matrix!motivation for} \index{matrix!provenance of} \index{matrix!basic use of} It is in the study of linear transformations that the concept of the matrix first arises. We begin there. \subsection{The linear transformation} \label{matrix:120.10} \index{linear transformation} \index{transformation, linear} \index{index} Section~\ref{integ:240.05} has introduced the idea of linearity. The \emph{linear transformation}% \footnote{ Professional mathematicians conventionally are careful to begin by drawing a clear distinction between the ideas of the linear transformation, the basis set and the simultaneous system of linear equations---proving from suitable axioms that the three amount more or less to the same thing, rather than implicitly assuming the fact. The professional approach \cite[Chs.~1 and~2]{Beattie}\cite[Chs.~1, 2 and~5]{Lay} has much to recommend it, but it is not the approach we will follow here. } is the operation of an $m\times n$ matrix~$A$, as in \bq{matrix:000:10} A\ve x = \ve b, \eq to transform an $n$-element vector~$\ve x$ into an $m$-element vector~$\ve b$, while respecting the rules of linearity \bq{matrix:000:20} \renewcommand\arraystretch{1.5} \br{rclcl} A(\ve x_1 + \ve x_2) &=& A\ve x_1 + A\ve x_2 &=& \ve b_1 + \ve b_2, \\ A(\alpha \ve x) &=& \alpha A\ve x &=& \alpha\ve b, \\ A(0) &=& 0. && \er \eq For example, \[ A = \mf{rrr}{ 0 & 6 & 2 \\ 1 & 1 & -1 } \] is the $2\times 3$ matrix which transforms a three-element vector~$\ve x$ into a two-element vector~$\ve b$ such that \[ A\ve x = \mf{c}{ 0x_1 + 6x_2 + 2x_3 \\ 1x_1 + 1x_2 - 1x_3 } = \ve b, \] where \[ \ve x = \mf{c}{ x_1 \\ x_2 \\ x_3 }, \ \ % \ve b = \mf{c}{ b_1 \\ b_2 }. \] In general, the operation of a matrix~$A$ is that% \footnote{ As observed in Appendix~\ref{greek}, there are unfortunately not enough distinct Roman and Greek letters available to serve the needs of higher mathematics. In matrix work, the Roman letters~$ijk$ conventionally serve as indices, but the same letter~$i$ also serves as the imaginary unit, which is not an index and has nothing to do with indices. Fortunately, the meaning is usually clear from the context:~$i$ in~$\sum_i$ or~$a_{ij}$ is an index;~$i$ in $-4+i3$ or~$e^{i\phi}$ is the imaginary unit. Should a case arise in which the meaning is not clear, one can use~$\ell jk$ or some other convenient letters for the indices. 
}% $\mbox{}^,$% \footnote{ Whether to let the index~$j$ run from~$0$ to $n-1$ or from~$1$ to~$n$ is an awkward question of applied mathematical style. In computers, the index normally runs from~$0$ to $n-1$, and in many ways this really is the more sensible way to do it. In mathematical theory, however, a~$0$ index normally implies something special or basic about the object it identifies. The book you are reading tends to let the index run from~$1$ to~$n$, following mathematical convention in the matter for this reason. Conceived more generally, an $m\times n$ matrix can be considered an $\infty\times\infty$ matrix with zeros in the unused cells. Here, both indices~$i$ and~$j$ run from~$-\infty$ to~$+\infty$ anyway, so the computer's indexing convention poses no dilemma in this case. See \S~\ref{matrix:180}. } \bq{matrix:000:30} b_i = \sum_{j=1}^{n} a_{ij} x_j, \eq where~$x_j$ is the $j$th element of~$\ve x$, $b_i$ is the $i$th element of~$\ve b$, and \[ a_{ij} \equiv [A]_{ij} \] is the element at the $i$th row and $j$th column of~$A$, counting from top left (in the example for instance, $a_{12}=6$). \index{equation!solving a set of simultaneously} \index{equation!simultaneous linear system of} \index{simultaneous system of linear equations} Besides representing linear transformations as such, matrices can also represent simultaneous systems of linear equations. For example, the system \[ \begin{split} 0x_1 + 6x_2 + 2x_3 &= 2, \\ 1x_1 + 1x_2 - 1x_3 &= 4, \end{split} \] is compactly represented as \[ A\ve x = \ve b, \] with~$A$ as given above and $\ve b = [2 \; 4 ]^T$. Seen from this point of view, a simultaneous system of linear equations is itself neither more nor less than a linear transformation. \subsection{Matrix multiplication (and addition)} \label{matrix:120.20} \index{matrix!multiplication of} \index{multiplication!of matrices} \index{matrix!arithmetic of} \index{arithmetic!of matrices} \index{matrix!addition of} \index{addition!of matrices} \index{associativity!of matrix multiplication} \index{commutivity!noncommutivity of matrix multiplication} \index{noncommutivity!of matrix multiplication} \index{matrix!associativity of the multiplication of} \index{matrix!noncommutivity of the multiplication of} \index{matrix!multiplication of by a scalar} \index{counterexample} Nothing prevents one from lining several vectors~$\ve x_k$ up in a row, industrial mass production-style, transforming them at once into the corresponding vectors~$\ve b_k$ by the same matrix~$A$. In this case, \bq{matrix:000:40} \br{rcrccccl} X &\equiv& [& \ve x_1 & \ve x_2 &\cdots& \ve x_p &], \\ B &\equiv& [& \ve b_1 & \ve b_2 &\cdots& \ve b_p &], \\ AX &=& \multicolumn{6}{l}{B,} \\ b_{ik} &=& \multicolumn{6}{l}{ \ds\sum_{j=1}^{n} a_{ij}x_{jk}. } \er \eq Equation~(\ref{matrix:000:40}) implies a definition for matrix multiplication. Such matrix multiplication is associative since \bqa \left[(A)(XY)\right]_{ik} &=& \sum_{j=1}^{n} a_{ij}[XY]_{jk} \xn\\&=& \sum_{j=1}^{n} a_{ij}\left[ \sum_{\ell=1}^{p} x_{j\ell} y_{\ell k} \right] \xn\\&=& \sum_{\ell=1}^{p} \sum_{j=1}^{n} a_{ij} x_{j\ell} y_{\ell k} \xn\\&=& \left[(AX)(Y)\right]_{ik}. \label{matrix:000:42} \eqa Matrix multiplication is not generally commutative, however; \bq{matrix:000:41} AX \neq XA, \eq as one can show by a suitable counterexample like $A=[0\;1; 0\;0],$ $X=[1\;0; 0\;0]$. To multiply a matrix by a scalar, one multiplies each of the matrix's elements individually by the scalar: \bq{matrix:000:45} [\alpha A]_{ij} = \alpha a_{ij}. 
\eq Evidently multiplication by a scalar is commutative: $\alpha A\ve x = A\alpha\ve x$. Matrix addition works in the way one would expect, element by element; and as one can see from~(\ref{matrix:000:40}), under multiplication, matrix addition is indeed distributive: \bq{matrix:000:48} \begin{split} [X+Y]_{ij} &= x_{ij} + y_{ij}; \\ (A)(X+Y) &= AX+AY; \\ (A+C)(X) &= AX+CX. \end{split} \eq \subsection{Row and column operators} \label{matrix:120.27} \index{row operator} \index{column operator} The matrix equation $A\ve x = \ve b$ represents the linear transformation of~$\ve x$ into~$\ve b$, as we have seen. Viewed from another perspective, however, the same matrix equation represents something else; it represents a weighted sum of the columns of~$A$, with the elements of~$\ve x$ as the weights. In this view, one writes~(\ref{matrix:000:30}) as \bq{matrix:230:05} \ve b = \sum_{j=1}^{n} [A]_{*j} x_j, \eq where~$[A]_{*j}$ is the $j$th column of~$A$. Here~$\ve x$ is not only a vector; it is also an operator. It operates on $A$'s columns. By virtue of multiplying~$A$ from the right, the vector~$\ve x$ is a \emph{column operator} acting on~$A$. If several vectors~$\ve x_k$ line up in a row to form a matrix~$X$, such that $AX = B$, then the matrix~$X$ is likewise a column operator: \bq{matrix:230:10} [B]_{*k} = \sum_{j=1}^{n} [A]_{*j} x_{jk}. \eq The $k$th column of~$X$ weights the several columns of~$A$ to yield the $k$th column of~$B$. If a matrix multiplying from the right is a column operator, is a matrix multiplying from the left a \emph{row operator?} Indeed it is. Another way to write $AX=B$, besides~(\ref{matrix:230:10}), is \bq{matrix:230:20} [B]_{i*} = \sum_{j=1}^{n} a_{ij} [X]_{j*}. \eq The $i$th row of~$A$ weights the several rows of~$X$ to yield the $i$th row of~$B$. The matrix~$A$ is a row operator. % (Observe the notation. The~${*}$ here means ``any'' or ``all.'' Hence~$[X]_{j{*}}$ means ``$j$th row, all columns of~$X$''---that is, the $j$th row of~$X$. Similarly,~$[A]_{{*}j}$ means ``all rows, $j$th column of~$A$''---that is, the $j$th column of~$A$.) \emph{Column operators attack from the right; row operators, from the left.} This rule is worth memorizing; the concept is important. In $AX=B$, the matrix~$X$ operates on $A$'s columns; the matrix~$A$ operates on $X$'s rows. Since matrix multiplication produces the same result whether one views it as a linear transformation~(\ref{matrix:000:40}), a column operation~(\ref{matrix:230:10}) or a row operation~(\ref{matrix:230:20}), one might wonder what purpose lies in defining matrix multiplication three separate ways. However, it is not so much for the sake of the mathematics that we define it three ways as it is for the sake of the mathematician. We do it for ourselves. Mathematically, the latter two do indeed expand to yield~(\ref{matrix:000:40}), but as written the three represent three different perspectives on the matrix. A tedious, nonintuitive matrix theorem from one perspective can appear suddenly obvious from another (see for example eqn.~\ref{matrix:330:25}). Results hard to visualize one way are easy to visualize another. It is worth developing the mental agility to view and handle matrices all three ways for this reason. 
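By way of a small illustrative example, take the~$A$ of \S~\ref{matrix:120.10} and choose, merely for convenience, $\ve x = [1\ \ 1\ \ \mbox{$-2$}]^T$. Read as a column operation, $\ve x$ weights $A$'s columns,
\[
 \ve b = (1)\mf{c}{ 0 \\ 1 } + (1)\mf{c}{ 6 \\ 1 } + (-2)\mf{r}{ 2 \\ -1 } = \mf{c}{ 2 \\ 4 };
\]
read row by row, the same product is
\[
 b_1 = (0)(1)+(6)(1)+(2)(-2) = 2,
 \ \ b_2 = (1)(1)+(1)(1)+(-1)(-2) = 4.
\]
Either reading, of course, yields the same~$\ve b$.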
\subsection{The transpose and the adjoint} \label{matrix:120.30} \index{transpose} \index{transpose!conjugate} \index{conjugate transpose} \index{adjoint} \index{Hermite, Charles (1822--1901)} \index{conjugate} One function peculiar to matrix algebra is the \emph{transpose} \bq{matrix:000:50} \begin{split} C &= A^{T}, \\ c_{ij} &= a_{ji}, \end{split} \eq which mirrors an $m\times n$ matrix into an $n\times m$ matrix. For example, \[ A^{T} = \mf{rr}{ 0 & 1 \\ 6 & 1 \\ 2 & -1 }. \] Similar and even more useful is the \emph{conjugate transpose} or \emph{adjoint}% \footnote{ Alternate notations sometimes seen in print for the adjoint include~$A^\dagger$ (a notation which in this book means something unrelated) and~$A^H$ (a notation which recalls the name of the mathematician Charles Hermite). However, the book you are reading writes the adjoint only as~$A^{*}$, a notation which better captures the sense of the thing in the author's view. } %\footnote{ % More couth than the noun ``transpose'' would seem to % ``transposition,'' but the former appears to be entrenched in the % literature. %} \bq{matrix:000:52} \begin{split} C &= A^{*}, \\ c_{ij} &= a_{ji}^{*}, \end{split} \eq which again mirrors an $m\times n$ matrix into an $n\times m$ matrix, but conjugates each element as it goes. The transpose is convenient notationally to write vectors and matrices in-line and to express certain matrix-arithmetical mechanics; but algebraically the transpose is artificial. It is the adjoint rather which mirrors a matrix properly. (If the transpose and adjoint functions applied to words as to matrices, then the transpose of ``derivations'' would be ``snoitavired,'' whereas the adjoint would be ``\reflectbox{derivations}.'' See the difference?) On real-valued matrices like the~$A$ in the example, of course, the transpose and the adjoint amount to the same thing. If one needed to conjugate the elements of a matrix without transposing the matrix itself, one could contrive notation like~$A^{*T}$. Such a need seldom arises, however. Observe that \bq{matrix:120:30} \begin{split} (A_2A_1)^{T} &= A_1^{T}A_2^{T}, \\ (A_2A_1)^{*} &= A_1^{*}A_2^{*}, \end{split} \eq and more generally that% \footnote{ Recall from \S~\ref{alggeo:227} that $\prod_k A_k=\cdots A_3 A_2 A_1$, whereas $\coprod_k A_k=A_1 A_2 A_3 \cdots$. } \bq{matrix:120:33} \begin{split} \left(\prod_k A_k\right)^{T} &= \coprod_k A_k^{T}, \\ \left(\prod_k A_k\right)^{*} &= \coprod_k A_k^{*}. \end{split} \eq % ---------------------------------------------------------------------- \section{The Kronecker delta} \label{matrix:150} \index{Kronecker delta} \index{Kronecker, Leopold (1823--1891)} \index{delta, Kronecker} \index{$\delta$} \index{sifting property} \index{Kronecker delta!sifting property of} \index{delta, Kronecker!sifting property of} Section~\ref{integ:670} has introduced the Dirac delta. 
The discrete analog of the Dirac delta is the \emph{Kronecker delta}% \footnote{\cite[``Kronecker delta,'' 15:59, 31 May 2006]{wikip}} \bq{matrix:150:kron1} \delta_{i} \equiv \begin{cases} 1 &\mbox{if $i=0$,} \\ 0 &\mbox{otherwise;} \end{cases} \eq or \bq{matrix:150:kron2} \delta_{ij} \equiv \begin{cases} 1 &\mbox{if $i=j$,} \\ 0 &\mbox{otherwise.} \end{cases} \eq The Kronecker delta enjoys the Dirac-like properties that \bq{matrix:150:sift0} \sum_{i=-\infty}^{\infty} \delta_i = \sum_{i=-\infty}^{\infty} \delta_{ij} = \sum_{j=-\infty}^{\infty} \delta_{ij} = 1 \eq and that \bq{matrix:150:sift} \sum_{j=-\infty}^{\infty} \delta_{ij} a_{jk} = a_{ik}, \eq the latter of which is the Kronecker sifting property. The Kronecker equations~(\ref{matrix:150:sift0}) and~(\ref{matrix:150:sift}) parallel the Dirac equations~(\ref{integ:670:sift0}) and~(\ref{integ:670:sift}). Chs.~\ref{matrix} and~\ref{eigen} will find frequent use for the Kronecker delta. Later, \S~\ref{vector:240.30} will revisit the Kronecker delta in another light. % ---------------------------------------------------------------------- \section{Dimensionality and matrix forms} \label{matrix:180} \index{dimensionality} \index{infinite dimensionality} \index{matrix!form of} \index{matrix!multiplication of} \index{multiplication!of matrices} \index{padding a matrix with zeros} \index{zero!padding a matrix with} \index{matrix!padding of with zeros} An $m \times n$ matrix like \[ X = \mf{rrr}{ -4 & 0 \\ 1 & 2 \\ 2 &-1 } \] can be viewed as the $\infty \times \infty$ matrix { \settowidth\tla{\fn$0$} \[ X = \mf{crrrrrrrc}{ \ddots& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \\ \cdots&0&0& 0 & 0 &0&0&0&\cdots \\ \cdots&0&0& 0 & 0 &0&0&0&\cdots \\ \cdots&0&0& \makebox[\tla][r]{$-\!4$} & 0 &0&0&0&\cdots \\ \cdots&0&0& 1 & 2 &0&0&0&\cdots \\ \cdots&0&0& 2 & \makebox[\tla][r]{$-\!1$} &0&0&0&\cdots \\ \cdots&0&0& 0 & 0 &0&0&0&\cdots \\ \cdots&0&0& 0 & 0 &0&0&0&\cdots \\ & \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \multicolumn{1}{c}{\vdots}& \ddots }, \] }% with zeros in the unused cells. As before, $x_{11}=-4$ and $x_{32}=-1$, but now~$x_{ij}$ exists for all integral~$i$ and~$j$; for instance, $x_{(-1)(-1)}=0$. For such a matrix, indeed for all matrices, the matrix multiplication rule~(\ref{matrix:000:40}) generalizes to \bqa B &=& AX, \xn\\ b_{ik} &=& \sum_{j=-\infty}^{\infty} a_{ij}x_{jk}. \label{matrix:180:05} \eqa \index{matrix!square} \index{square matrix} \index{matrix operator} For square matrices whose purpose is to manipulate other matrices or vectors in place, merely padding with zeros often does not suit. Consider for example the square matrix \[ A_3 = \mf{ccc}{ 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 }. \] This~$A_3$ is indeed a matrix, but when it acts~$A_3X$ as a row operator on some $3 \times p$ matrix~$X$, its effect is to add to $X$'s second row,~$5$ times the first. Further consider \[ A_4 = \mf{cccc}{ 1 & 0 & 0 & 0 \\ 5 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 }, \] which does the same to a $4 \times p$ matrix~$X$. 
We can also define $A_5, A_6, A_7, \ldots$, if we want; but, really, all these express the same operation: ``to add to the second row,~$5$ times the first.'' \index{main diagonal} \index{diagonal!main} \index{matrix!main diagonal of} The $\infty \times \infty$ matrix \[ A = \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 5 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots } \] expresses the operation generally. As before, $a_{11} = 1$ and $a_{21} = 5$, but now also $a_{(-1)(-1)} = 1$ and $a_{09} = 0$, among others. By running ones infinitely both ways out the \emph{main diagonal,} we guarantee by~(\ref{matrix:180:05}) that when~$A$ acts~$AX$ on a matrix~$X$ of any dimensionality whatsoever,~$A$ adds to the second row of~$X$, $5$ times the first---and affects no other row. (But what if~$X$ is a $1 \times p$ matrix, and \emph{has} no second row? Then the operation~$AX$ creates a new second row,~$5$ times the first---or rather so fills in $X$'s previously null second row.) \index{bit} \index{extended operator} In the infinite-dimensional view, the matrices~$A$ and~$X$ differ essentially.% \footnote{ This particular section happens to use the symbols~$A$ and~$X$ to represent certain specific matrix forms because such usage flows naturally from the usage $A\ve x = \ve b$ of \S~\ref{matrix:120}. Such usage admittedly proves awkward in other contexts. Traditionally in matrix work and elsewhere in the book, the letter~$A$ does not necessarily represent an extended operator as it does here, but rather an arbitrary matrix of no particular form. } This section explains, developing some nonstandard formalisms the derivations of later sections and chapters can use.% \footnote{ The idea of infinite dimensionality is sure to discomfit some readers, who have studied matrices before and are used to thinking of a matrix as having some definite size. There is nothing wrong with thinking of a matrix as having some definite size, only that that view does not suit the present book's development. And really, the idea of an $\infty \times 1$ vector or an $\infty \times \infty$ matrix should not seem so strange. After all, consider the vector~$\ve u$ such that \[ u_\ell = \sin \ell\ep, \] where $0 < \ep \ll 1$ and~$\ell$ is an integer, which holds all values of the function $\sin\theta$ of a real argument~$\theta$. Of course one does not actually write down or store all the elements of an infinite-dimensional vector or matrix, any more than one actually writes down or stores all the bits (or digits) of~$2\pi$. Writing them down or storing them is not the point. The point is that infinite dimensionality is all right; that the idea thereof does not threaten to overturn the reader's pre\"existing matrix knowledge; that, though the construct seem unfamiliar, no fundamental conceptual barrier rises against it. Different ways of looking at the same mathematics can be extremely useful to the applied mathematician. The applied mathematical reader who has never heretofore considered infinite dimensionality in vectors and matrices would be well served to take the opportunity to do so here. 
As we shall discover in Ch.~\ref{gjrank}, dimensionality is a poor measure of a matrix's size in any case. What really counts is not a matrix's $m \times n$ dimensionality but rather its \emph{rank.} } \subsection{The null and dimension-limited matrices} \label{matrix:180.25} \index{null matrix} \index{matrix!null} \index{dimension-limited matrix} \index{matrix!dimension-limited} \index{zero matrix} \index{zero!matrix} \index{$0$ (zero)!matrix} The \emph{null matrix} is just what its name implies: \[ 0 = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }; \] or more compactly, \[ [0]_{ij} = 0. \] Special symbols like~$\dyad 0$, $\ve 0$ or~$O$ are possible for the null matrix, but usually a simple~$0$ suffices. There are no surprises here; the null matrix brings all the expected properties of a zero, like \bqb 0 + A &=& A, \\ \mbox{} [0][X] &=& 0. \eqb \index{zero vector} \index{zero!vector} \index{$0$ (zero)!vector} The same symbol~$0$ used for the null scalar (zero) and the null matrix is used for the null vector, too. Whether the scalar~$0$, the vector~$0$ and the matrix~$0$ actually represent different things is a matter of semantics, but the three are interchangeable for most practical purposes in any case. Basically, a zero is a zero is a zero; there's not much else to it.% \footnote{ Well, of course, there's a lot else to it, when it comes to dividing by zero as in Ch.~\ref{drvtv}, or to summing an infinity of zeros as in Ch.~\ref{integ}, but those aren't what we were speaking of here. } \index{active region} Now a formality: the ordinary $m \times n$ matrix~$X$ can be viewed, infinite-dimensionally, as a variation on the null matrix, inasmuch as~$X$ differs from the null matrix only in the~$mn$ elements~$x_{ij}$, $1 \le i \le m$, $1 \le j \le n$. Though the theoretical dimensionality of~$X$ be $\infty \times \infty$, one need record only the~$mn$ elements, plus the values of~$m$ and~$n$, to retain complete information about such a matrix. So the semantics are these: when we call a matrix~$X$ an \emph{$m \times n$ matrix,} or more precisely a \emph{dimension-limited matrix} with an $m \times n$ \emph{active region,} we will mean formally that~$X$ is an $\infty \times \infty$ matrix whose elements are all zero outside the $m \times n$ rectangle: \bq{matrix:180:27} \mbox{$x_{ij} = 0$ except where $1 \le i \le m$ and $1 \le j \le n$.} \eq By these semantics, every $3 \times 2$ matrix (for example) is also formally a $4 \times 4$ matrix; but a $4 \times 4$ matrix is not in general a $3 \times 2$ matrix.
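For instance, the $3 \times 2$ matrix~$X$ which opened the section fits~(\ref{matrix:180:27}) with
$m=3$ and $n=2$:
\[
x_{31} = 2, \ \ x_{43} = 0, \ \ x_{(-1)(5)} = 0,
\]
the first element lying inside the $3 \times 2$ rectangle, the latter two outside it. The
same~$X$ satisfies~(\ref{matrix:180:27}) equally well with $m = n = 4$, just as the last paragraph
says.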
\subsection[The identity, scalar and extended matrices] {The identity and scalar matrices and the extended operator} \label{matrix:180.35} \index{extended operator} \index{identity matrix} \index{matrix!identity} \index{general identity matrix} \index{matrix!general identity} \index{scalar matrix} \index{matrix!scalar} \index{one} \index{$1$ (one)} The \emph{general identity matrix}---or simply, the \emph{identity matrix}---is \[ I = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \] or more compactly, \bq{matrix:180:31} [I]_{ij} = \delta_{ij}, \eq where~$\delta_{ij}$ is the Kronecker delta of \S~\ref{matrix:150}. The identity matrix~$I$ is a matrix~$1$, as it were,% \footnote{ In fact you can write it as~$1$ if you like. That is essentially what it is. The~$I$ can be regarded as standing for ``identity'' or as the Roman numeral~I\@. } bringing the essential property one expects of a~$1$: \bq{matrix:180:32} IX = X = XI. \eq The \emph{scalar matrix} is \[ \lambda I = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & \lambda & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & \lambda & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & \lambda & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & \lambda & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & \lambda & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \] or more compactly, \bq{matrix:180:33} [\lambda I]_{ij} = \lambda \delta_{ij}, \eq If the identity matrix~$I$ is a matrix~$1$, then the scalar matrix~$\lambda I$ is a matrix~$\lambda$, such that \bq{matrix:180:34} [\lambda I] X = \lambda X = X [\lambda I]. \eq The identity matrix is (to state the obvious) just the scalar matrix with $\lambda=1$. \index{sparsity} \index{matrix!sparse} The \emph{extended operator}~$A$ is a variation on the scalar matrix~$\lambda I$, $\lambda \neq 0$, inasmuch as~$A$ differs from~$\lambda I$ only in~$p$ specific elements, with~$p$ a finite number. Symbolically, \bqa a_{ij} &=& \begin{cases} (\lambda)(\delta_{ij} + \alpha_k) & \mbox{if $(i,j) = (i_k,j_k)$, $1 \le k \le p$,} \\ \lambda\delta_{ij} & \mbox{otherwise;} \end{cases} \label{matrix:180:29} \\ \lambda &\neq& 0. \xn \eqa The several~$\alpha_k$ control how the extended operator~$A$ differs from~$\lambda I$. One need record only the several~$\alpha_k$ along with their respective addresses $(i_k,j_k)$, plus the scale~$\lambda$, to retain complete information about such a matrix. For example, for an extended operator fitting the pattern \[ A = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & \lambda & 0 & 0 & 0 & 0 & \cdots \\ \cdots & \lambda \alpha_1 & \lambda & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & \lambda & \lambda \alpha_2 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & \lambda(1+\alpha_3) & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & \lambda & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \] one need record only the values of~$\alpha_1$, $\alpha_2$ and~$\alpha_3$, the respective addresses % bad break $(2,1)$, $(3,4)$ and~$(4,4)$, and the value of the scale~$\lambda$; this information alone implies the entire $\infty \times \infty$ matrix~$A$. 
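To make the accounting concrete, one might choose (arbitrarily, for illustration) $\lambda = 2$,
$\alpha_1 = 3$, $\alpha_2 = -1$ and $\alpha_3 = 1/2$ in the pattern above. Then
by~(\ref{matrix:180:29}),
\[
a_{21} = \lambda\alpha_1 = 6, \ \
a_{34} = \lambda\alpha_2 = -2, \ \
a_{44} = \lambda(1+\alpha_3) = 3,
\]
every other element being $a_{ij} = 2\delta_{ij}$: a~$2$ on the main diagonal, a~$0$ off it. The
four recorded values and the three addresses thus suffice to reconstruct the whole
$\infty \times \infty$ matrix.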
When we call a matrix~$A$ an \emph{extended $n \times n$ operator,} or an extended operator with an $n \times n$ \emph{active region,} we will mean formally that~$A$ is an $\infty \times \infty$ matrix and is further an extended operator for which \bq{matrix:180:28} \mbox{$1 \le i_k \le n$ and $1 \le j_k \le n$ for all $1 \le k \le p$.} \eq That is, an extended $n \times n$ operator is one whose several~$\alpha_k$ all lie within the $n \times n$ square. The~$A$ in the example is an extended $4 \times 4$ operator (and also a $5 \times 5$, a $6 \times 6$, etc., but not a $3 \times 3$). (Often in practice for smaller operators---especially in the typical case that $\lambda = 1$---one finds it easier just to record all the $n \times n$ elements of the active region. This is fine. Large matrix operators however tend to be \emph{sparse,} meaning that they depart from~$\lambda I$ in only a very few of their many elements. It would waste a lot of computer memory explicitly to store all those zeros, so one normally stores just the few elements, instead.) Implicit in the definition of the extended operator is that the identity matrix~$I$ and the scalar matrix~$\lambda I$, $\lambda \neq 0$, are extended operators with $0 \times 0$ active regions (and also $1 \times 1$, $2 \times 2$, etc.). If $\lambda = 0$, however, the scalar matrix~$\lambda I$ is just the null matrix, which is no extended operator but rather by definition a $0 \times 0$ dimension-limited matrix. \subsection{The active region} \label{matrix:180.38} \index{active region} Though maybe obvious, it bears stating explicitly that a product of % bad break di\-men\-sion-lim\-ited and/or extended-operational matrices with $n \times n$ active regions itself has an $n \times n$ active region.% \footnote{ The section's earlier subsections formally define the term \emph{active region} with respect to each of the two matrix forms. } (Remember that a matrix with an $m' \times n'$ active region also by definition has an $n \times n$ active region if $m' \le n$ and $n' \le n$.) If any of the factors has dimension-limited form then so does the product; otherwise the product is an extended operator.% \footnote{ If symbolic proof of the subsection's claims is wanted, here it is in outline: \settowidth\tla{$\sum_k a_{ik}(\lambda_b\delta{kj})$} \settowidth\tld{$\makebox[\tla][l]{$\sum_k a_{ik}(\lambda_b\delta_{kj})$} = \lambda_b a_{ij}$} \settowidth\tlc{$j$} \settowidth\tlb{$\lambda_a\lambda_b\delta_{ij}$} \bqb a_{ij} &=& \makebox[\tlb][l]{$\lambda_a\delta_{ij}$} \ \ \mbox{unless $1 \le (i,j) \le n$}, \\ b_{ij} &=& \makebox[\tlb][l]{$\lambda_b\delta_{ij}$} \ \ \mbox{unless $1 \le (i,j) \le n$}; \\ \mbox{} [AB]_{ij} &=& \ds\sum_k a_{ik}b_{kj} \\&=& \begin{cases} \makebox[\tld][l]{$\makebox[\tla][l]{$\sum_k (\lambda_a\delta_{ik})b_{kj}$} = \lambda_a b_{ij}$} \ \ \mbox{unless $1 \le \makebox[\tlc][c]{$i$} \le n$} \\ % Deliberately left the comma off the end of the last line. It looks funny. \makebox[\tld][l]{$\makebox[\tla][l]{$\sum_k a_{ik}(\lambda_b\delta_{kj})$} = \lambda_b a_{ij}$} \ \ \mbox{unless $1 \le \makebox[\tlc][c]{$j$} \le n$} \\ \end{cases} \\&=& \makebox[\tlb][l]{$\lambda_a\lambda_b \delta_{ij}$} \ \ \mbox{unless $1 \le (i,j) \le n$}. \eqb It's probably easier just to sketch the matrices and look at them, though. 
} \subsection{Other matrix forms} \label{matrix:180.45} Besides the dimension-limited form of \S~\ref{matrix:180.25} and the extended-operational form of \S~\ref{matrix:180.35}, other infinite-dimensional matrix forms are certainly possible. One could for example advantageously define a ``null sparse'' form, recording only nonzero elements and their addresses in an otherwise null matrix; or a ``tridiagonal extended'' form, bearing repeated entries not only along the main diagonal but also along the diagonals just above and just below. Section~\ref{matrix:340} introduces one worthwhile matrix which fits neither the dimension-limited nor the extended-operational form. Still, the dimension-limited and extended-operational forms are normally the most useful, and they are the ones we will principally be handling in this book. One reason to have defined specific infinite-dimensional matrix forms is to show how straightforwardly one can fully represent a practical matrix of an infinity of elements by a modest, finite quantity of information. Further reasons to have defined such forms will soon occur. \subsection{The rank-$r$ identity matrix} \label{matrix:180.22} \index{identity matrix} \index{matrix!identity} \index{identity matrix!$r$-dimensional} \index{identity matrix!rank-$r$} The \emph{rank-$r$ identity matrix}~$I_r$ is the dimension-limited matrix for which \bq{matrix:180:IM} [I_r]_{ij} = \begin{cases} \delta_{ij} &\mbox{if $1 \le i \le r$ and/or $1 \le j \le r$,} \\ 0 &\mbox{otherwise,} \end{cases} \eq where either the ``and'' or the ``or'' can be regarded (it makes no difference). The effect of~$I_r$ is that \bq{matrix:180:22} \begin{split} I_m X &= X = XI_n, \\ I_m\ve x &= \ve x, \end{split} \eq where~$X$ is an $m\times n$ matrix and~$\ve x$, an $m\times 1$ vector. Examples of~$I_r$ include \[ I_3 = \mf{rrr}{ 1&0&0 \\ 0&1&0 \\ 0&0&1 }. \] (Remember that in the infinite-dimensional view,~$I_3$, though a $3 \times 3$ matrix, is formally an $\infty\times\infty$ matrix with zeros in the unused cells. It has only the three ones and fits the $3 \times 3$ dimension-limited form of \S~\ref{matrix:180.25}. The areas of~$I_3$ not shown are all zero, even along the main diagonal.) The rank~$r$ can be any nonnegative integer, even zero (though the rank-zero identity matrix~$I_0$ is in fact the null matrix, normally just written~$0$). If alternate indexing limits are needed (for instance for a computer-indexed identity matrix whose indices run from~$0$ to $r-1$), the notation~$I_a^b$, where \bq{matrix:180:IMC} [I_a^b]_{ij} \equiv \begin{cases} \delta_{ij} &\mbox{if $a \le i \le b$ and/or $a \le j \le b$,} \\ 0 &\mbox{otherwise,} \end{cases} \eq can be used; the rank in this case is $r=b-a+1$, which is just the count of ones along the matrix's main diagonal. The name ``rank-$r$'' implies that~$I_r$ has a ``rank'' of~$r$, and indeed it does. For the moment, however, we will discern the attribute of rank only in the rank-$r$ identity matrix itself. Section~\ref{gjrank:340} defines \emph{rank} for matrices more generally. \subsection{The truncation operator} \label{matrix:180.23} \index{operator!truncation} \index{truncation operator} \index{matrix!truncating} The rank-$r$ identity matrix~$I_r$ is also the \emph{truncation operator.} Attacking from the left, as in~$I_rA$, it retains the first through $r$th rows of~$A$ but cancels other rows. Attacking from the right, as in~$AI_r$, it retains the first through $r$th columns. 
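For example, if~$X$ is the $3 \times 2$ matrix which opened \S~\ref{matrix:180}, then
\[
I_2 X = \mf{rr}{ -4 & 0 \\ 1 & 2 }, \ \
X I_1 = \mf{r}{ -4 \\ 1 \\ 2 },
\]
the zeros outside the cells shown being understood as usual.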
Such truncation is useful symbolically to reduce an extended operator to dimension-limited form. \index{commutation!of the identity matrix} \index{identity matrix!commutation of} Whether a matrix~$C$ has dimension-limited or extended-operational form (though not necessarily if it has some other form), if it has an $m \times n$ active region% \footnote{ Refer to the definitions of \emph{active region} in \S\S~\ref{matrix:180.25} and~\ref{matrix:180.35}. That a matrix has an $m \times n$ active region does not necessarily mean that it is all zero outside the $m \times n$ rectangle. (After all, if it were always all zero outside, then there would be little point in applying a truncation operator. There would be nothing there to truncate.) } and \[ \begin{split} m &\le r, \\ n &\le r, \end{split} \] then \bq{matrix:180:35} I_rC = I_rCI_r = CI_r. \eq For such a matrix,~(\ref{matrix:180:35}) says at least two things: \bi \item It is superfluous to truncate both rows and columns; it suffices to truncate one or the other. \item The rank-$r$ identity matrix~$I_r$ commutes freely past~$C$. \ei Evidently big identity matrices commute freely where small ones cannot (and the general identity matrix $I=I_\infty$ commutes freely past everything). \subsection [The elementary vector and lone-element matrix] {The elementary vector and the lone-element matrix} \label{matrix:180.30} \index{lone-element matrix} \index{matrix!lone-element} \index{elementary vector} \index{vector!elementary} The \emph{lone-element matrix}~$E_{mn}$ is the matrix with a one in the $mn$th cell and zeros elsewhere: \bq{matrix:180:E} [E_{mn}]_{ij} \equiv \delta_{im}\delta_{jn} = \begin{cases} 1 &\mbox{if $i=m$ and $j=n$,} \\ 0 &\mbox{otherwise.} \end{cases} \eq By this definition, $C = \sum_{i,j} c_{ij} E_{ij}$ for any matrix~$C$. The vector analog of the lone-element matrix is the \emph{elementary vector}~$\ve e_m$, which has a one as the $m$th element: \bq{matrix:180:e} [\ve e_{m}]_{i} \equiv \delta_{im} = \begin{cases} 1 &\mbox{if $i=m$,} \\ 0 &\mbox{otherwise.} \end{cases} \eq By this definition, $[I]_{*j} = \ve e_j$ and $[I]_{i*} = \ve e_i^T$. \subsection{Off-diagonal entries} \label{matrix:180.60} \index{off-diagonal entries} It is interesting to observe and useful to note that if \[ \left[ C_1 \right]_{i*} = \left[ C_2 \right]_{i*} = \ve e_i^T, \] then also \bq{matrix:180:61} \left[ C_1 C_2 \right]_{i*} = \ve e_i^T; \eq and likewise that if \[ \left[ C_1 \right]_{*j} = \left[ C_2 \right]_{*j} = \ve e_j, \] then also \bq{matrix:180:62} \left[ C_1 C_2 \right]_{*j} = \ve e_j. \eq The product of matrices has off-diagonal entries in a row or column only if at least one of the factors itself has off-diagonal entries in that row or column. Or, less readably but more precisely, \emph{the $i$th row or $j$th column of the product of matrices can depart from~$\ve e_i^T$ or~$\ve e_j$, respectively, only if the corresponding row or column of at least one of the factors so departs.} The reason is that in~(\ref{matrix:180:61}), $C_1$ acts as a row operator on~$C_2$; that if $C_1$'s $i$th row is~$\ve e_i^T$, then its action is merely to duplicate $C_2$'s $i$th row, which itself is just~$\ve e_i^T$. Parallel logic naturally applies to~(\ref{matrix:180:62}). % ---------------------------------------------------------------------- \section{The elementary operator} \label{matrix:320} \index{elementary operator} \index{operator!elementary} % The tactical decision is made here to count the self-interchange as a % valid interchange operator. 
The opposite tactic is equally possible, % and each tactic has minor, subtle advantages over the other. A future % revision of the book might reverse the choice. \index{elementary operator!inverse of} Section~\ref{matrix:120.27} has introduced the general row or column operator. Conventionally denoted~$T$, the \emph{elementary operator} is a simple extended row or column operator from sequences of which more complicated extended operators can be built. The elementary operator~$T$ comes in three kinds.% \footnote{ In \S~\ref{matrix:180}, the symbol~$A$ specifically represented an extended operator, but here and generally the symbol represents any matrix. } \bi \item \index{elementary operator!interchange} \index{interchange operator!elementary} The first is the \emph{interchange elementary} \bq{matrix:320:Tdefxchg} T_{[i\lra j]} = I - (E_{ii}+E_{jj}) + (E_{ij}+E_{ji}), \eq which by operating $T_{[i\lra j]}A$ or $AT_{[i\lra j]}$ respectively interchanges $A$'s $i$th row or column with its $j$th.% \footnote{% \label{matrix:320:fn10}% As a matter of definition, some authors~\cite{Lay} forbid $T_{[i\lra i]}$ as an elementary operator, where $j=i$, since after all $T_{[i\lra i]}=I$; which is to say that the operator doesn't actually do anything. There exist legitimate tactical reasons to forbid (as in \S~\ref{matrix:322}), but normally this book permits. It is good to define a concept aesthetically. One should usually do so when one can; and indeed in this case one might reasonably promote either definition on aesthetic grounds. However, an applied mathematician ought not to let a mere definition entangle him. What matters is the underlying concept. Where the definition does not serve the concept well, the applied mathematician considers whether it were not worth the effort to adapt the definition accordingly. } \item \index{elementary operator!scaling} \index{scaling operator!elementary} The second is the \emph{scaling elementary} \bq{matrix:320:Tdefsc} T_{\alpha[i]} = I + (\alpha-1) E_{ii}, \ \ \alpha \neq 0, \eq which by operating $T_{\alpha[i]}A$ or $AT_{\alpha[i]}$ scales (multiplies) $A$'s $i$th row or column, respectively, by the factor~$\alpha$. \item \index{elementary operator!addition} \index{addition operator!elementary} The third and last is the \emph{addition elementary} \bq{matrix:320:Tdefadd} T_{\alpha[ij]} = I + \alpha E_{ij}, \ \ i\neq j, \eq which by operating $T_{\alpha[ij]}A$ adds to the $i$th row of~$A$, $\alpha$ times the $j$th row; or which by operating $AT_{\alpha[ij]}$ adds to the $j$th column of~$A$, $\alpha$ times the $i$th column. 
\ei Examples of the elementary operators include \bqb T_{[1\lra 2]} &=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \\ T_{5[4]} &=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 5 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \\ T_{5[21]} &=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 5 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }. \eqb Note that none of these, and in fact no elementary operator of any kind, differs from~$I$ in more than four elements. \subsection{Properties} \label{matrix:320.10} \index{invertibility!of the elementary operator} \index{elementary operator!invertibility of} Significantly, elementary operators as defined above are always invertible (which is to say, reversible in effect), with \bq{matrix:320:30} \begin{split} T_{[i\lra j]}^{-1} &= T_{[j\lra i]} = T_{[i\lra j]}, \\ T_{\alpha[i]}^{-1} &= T_{(1/\alpha)[i]}, \\ T_{\alpha[ij]}^{-1} &= T_{-\alpha[ij]}, \end{split} \eq being themselves elementary operators such that \bq{matrix:320:33} T^{-1}T = I = TT^{-1} \eq in each case.% \footnote{ The addition elementary $T_{\alpha[ii]}$ and the scaling elementary $T_{0[i]}$ are forbidden precisely because they are not generally invertible. } This means that any sequence of elementaries $\prod_k T_k$ can safely be undone by the reverse sequence $\coprod_k T_k^{-1}$: \bq{matrix:320:34} \coprod_k T_k^{-1} \prod_k T_k = I = \prod_k T_k \coprod_k T_k^{-1}. \eq The rank-$r$ identity matrix~$I_r$ is no elementary operator,% \footnote{ If the statement seems to contradict statements of some other books, it is only a matter of definition. This book finds it convenient to define the elementary operator in infinite-dimensional, extended-operational form. The other books are not wrong; their underlying definitions just differ slightly. } nor is the lone-element matrix~$E_{mn}$; but the general identity matrix~$I$ is indeed an elementary operator. The last can be considered a distinct, fourth kind of elementary operator if desired; but it is probably easier just to regard it as an elementary of any of the first three kinds, since $I=T_{[i\lra i]}=T_{1[i]}=T_{0[ij]}$. From~(\ref{matrix:180:35}), we have that \bq{matrix:320:39} I_rT = I_rTI_r = TI_r \ \ \mbox{if}\ 1\le i\le r\ \mbox{and}\ 1\le j\le r \eq for any elementary operator~$T$ which operates within the given bounds. Equation~(\ref{matrix:320:39}) lets an identity matrix with sufficiently high rank pass through a sequence of elementaries as needed. In general, the transpose of an elementary row operator is the corresponding elementary column operator. Curiously, the interchange elementary is its own transpose and adjoint: \bq{matrix:320:50} T_{[i\lra j]}^{*} = T_{[i\lra j]} = T_{[i\lra j]}^{T}. 
\eq \subsection{Commutation and sorting} \label{matrix:320.20} \index{commutation!of elementary operators} \index{elementary operator!commutation of} \index{elementary operator!sorting of} Elementary operators often occur in long chains like \[ A = T_{-4[32]} T_{[2\lra 3]} T_{(1/5)[3]} T_{(1/2)[31]} T_{5[21]} T_{[1\lra 3]}, \] with several elementaries of all kinds intermixed. Some applications demand that the elementaries be sorted and grouped by kind, as \[ A = \left( T_{[2\lra 3]} T_{[1\lra 3]} \right) \left( T_{-4[21]} T_{(1/\mbox{\scriptsize 0xA})[13]} T_{5[23]} \right) \left( T_{(1/5)[1]} \right) \] or as \[ A = \left( T_{-4[32]} T_{(1/\mbox{\scriptsize 0xA})[21]} T_{5[31]} \right) \left( T_{(1/5)[2]} \right) \left( T_{[2\lra 3]} T_{[1\lra 3]} \right), \] among other possible orderings. Though you probably cannot tell just by looking, the three products above are different orderings of the same elementary chain; they yield the same~$A$ and thus represent exactly the same matrix operation. Interesting is that the act of reordering the elementaries has altered some of them into other elementaries of the same kind, but has changed the kind of none of them. \index{commutation!hierarchy of} \index{elementary operator!two, of different kinds} One sorts a chain of elementary operators by repeatedly exchanging adjacent pairs. This of course supposes that one can exchange adjacent pairs, which seems impossible since matrix multiplication is not commutative: $A_1A_2 \neq A_2A_1$. However, at the moment we are dealing in elementary operators only; and for most pairs~$T_1$ and~$T_2$ of elementary operators, though indeed $T_1T_2 \neq T_2T_1$, it so happens that there exists either a~$T_1'$ such that $T_1T_2 = T_2T_1'$ or a~$T_2'$ such that $T_1T_2 = T_2'T_1$, where~$T_1'$ and~$T_2'$ are elementaries of the same kinds respectively as~$T_1$ and~$T_2$. The attempt sometimes fails when both~$T_1$ and~$T_2$ are addition elementaries, but all other pairs commute in this way. Significantly, \emph{elementaries of different kinds always commute.} And, though commutation can alter one (never both) of the two elementaries, it changes the kind of neither. Many qualitatively distinct pairs of elementaries exist; we will list these exhaustively in a moment. First, however, we should like to observe a natural hierarchy among the three kinds of elementary: (i)~interchange; (ii)~scaling; (iii)~addition. \bi \item The interchange elementary is the strongest. Itself subject to alteration only by another interchange elementary, it can alter any elementary by commuting past. When an interchange elementary commutes past another elementary of any kind, what it alters are the other elementary's indices~$i$ and/or~$j$ (or~$m$ and/or~$n$, or whatever symbols happen to represent the indices in question). When two interchange elementaries commute past one another, only one of the two is altered. (Which one? Either. The mathematician chooses.) Refer to Table~\ref{matrix:Txchg2}. \item Next in strength is the scaling elementary. Only an interchange elementary can alter it, and it in turn can alter only an addition elementary. Scaling elementaries do not alter one another during commutation. When a scaling elementary commutes past an addition elementary, what it alters is the latter's scale~$\alpha$ (or~$\beta$, or whatever symbol happens to represent the scale in question). Refer to Table~\ref{matrix:Txchg3}. 
\item The addition elementary, last and weakest, is subject to alteration by either of the other two, itself having no power to alter any elementary during commutation. A pair of addition elementaries are the only pair that can altogether fail to commute---they fail when the row index of one equals the column index of the other---but when they do commute, neither alters the other. Refer to Table~\ref{matrix:Txchg1}. \ei { \nc\tcaption[2]{ \caption[Elementary operators: {#1}.] {Inverting, commuting, combining and expanding elementary operators: {#1}. In the table, $i\neq j\neq m\neq n$; no two indices are the same. {#2}} } \index{elementary operator!inverse of} \index{elementary operator!combination of} \index{elementary operator!expansion of} Tables~\ref{matrix:Txchg2}, \ref{matrix:Txchg3} and~\ref{matrix:Txchg1} list all possible pairs of elementary operators, as the reader can check. The only pairs that fail to commute are the last three of Table~\ref{matrix:Txchg1}. \begin{table} \tcaption{interchange}{ Notice that the effect an interchange elementary $T_{[m\lra n]}$ has in passing any other elementary, even another interchange elementary, is simply to replace~$m$ by~$n$ and~$n$ by~$m$ among the indices of the other elementary. } \label{matrix:Txchg2} \index{elementary operator!interchange} \[ \renewcommand\arraystretch{1.5} \br{rcl} T_{[m\lra n]} &=& T_{[n\lra m]} \\ T_{[m\lra m]} &=& I \\ I T_{[m\lra n]} &=& T_{[m\lra n]} I \\ T_{[m\lra n]}T_{[m\lra n]} &=& T_{[m\lra n]}T_{[n\lra m]} = T_{[n\lra m]}T_{[m\lra n]} = I \\ T_{[m\lra n]}T_{[i\lra n]} &=& T_{[i\lra n]}T_{[m\lra i]} = T_{[i\lra m]}T_{[m\lra n]} \\ &=& \left(T_{[i\lra n]}T_{[m\lra n]}\right)^2 \\ T_{[m\lra n]}T_{[i\lra j]} &=& T_{[i\lra j]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[m]} &=& T_{\alpha[n]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[i]} &=& T_{\alpha[i]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[in]} &=& T_{\alpha[im]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[mj]} &=& T_{\alpha[nj]}T_{[m\lra n]} \\ T_{[m\lra n]}T_{\alpha[mn]} &=& T_{\alpha[nm]}T_{[m\lra n]} \er \] \end{table} \begin{table} \tcaption{scaling}{} \label{matrix:Txchg3} \index{elementary operator!scaling} \[ \renewcommand\arraystretch{1.5} \br{rcl} T_{1[m]} &=& I \\ I T_{\beta[m]} &=& T_{\beta[m]} I \\ T_{(1/\beta)[m]}T_{\beta[m]} &=& I \\ T_{\beta[m]}T_{\alpha[m]} &=& T_{\alpha[m]}T_{\beta[m]} = T_{\alpha\beta[m]} \\ T_{\beta[m]}T_{\alpha[i]} &=& T_{\alpha[i]}T_{\beta[m]} \\ T_{\beta[m]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\beta[m]} \\ T_{\beta[m]}T_{\alpha\beta[im]} &=& T_{\alpha[im]}T_{\beta[m]} \\ T_{\beta[m]}T_{\alpha[mj]} &=& T_{\alpha\beta[mj]}T_{\beta[m]} \er \] \end{table} \begin{table} \tcaption{addition}{The last three lines give pairs of addition elementaries that do not commute.} \label{matrix:Txchg1} \index{elementary operator!addition} \[ \renewcommand\arraystretch{1.5} \br{rcl} T_{0[ij]} &=& I \\ I T_{\alpha[ij]} &=& T_{\alpha[ij]} I \\ T_{-\alpha[ij]} T_{\alpha[ij]} &=& I \\ T_{\beta[ij]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\beta[ij]} = T_{(\alpha+\beta)[ij]} \\ T_{\beta[mj]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\beta[mj]} \\ T_{\beta[in]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\beta[in]} \\ T_{\beta[mn]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\beta[mn]} \\ T_{\beta[mi]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{\alpha\beta[mj]}T_{\beta[mi]} \\ T_{\beta[jn]}T_{\alpha[ij]} &=& T_{\alpha[ij]}T_{-\alpha\beta[in]}T_{\beta[jn]} \\ T_{\beta[ji]}T_{\alpha[ij]} &\neq& T_{\alpha[ij]}T_{\beta[ji]} \er \] 
\end{table} } % ---------------------------------------------------------------------- \section{Inversion and similarity (introduction)} \label{matrix:321} \index{inversion} \index{matrix!inversion of} \index{similarity transformation} \index{commutation!of matrices} \index{matrix!commutation of} If Tables~\ref{matrix:Txchg2}, \ref{matrix:Txchg3} and~\ref{matrix:Txchg1} exhaustively describe the commutation of one elementary past another elementary, then what can one write of the commutation of an elementary past the general matrix~$A$? With some matrix algebra, \[ \begin{split} TA &= (TA)(I) = (TA)(T^{-1}T), \\ AT &= (I)(AT) = (TT^{-1})(AT), \end{split} \] one can write that \bq{matrix:321:59} \begin{split} TA &= [TAT^{-1}]T, \\ AT &= T[T^{-1}AT], \end{split} \eq where~$T^{-1}$ is given by~(\ref{matrix:320:30}). An elementary commuting rightward changes % bad break $A$ to~$TAT^{-1}$; commuting leftward, to~$T^{-1}AT$. First encountered in \S~\ref{matrix:320}, the notation~$T^{-1}$ means the \emph{inverse} of the elementary operator~$T$, such that \[ T^{-1}T = I = TT^{-1}. \] Matrix inversion is not for elementary operators only, though. Many more general matrices~$C$ also have inverses such that \bq{matrix:321:10} C^{-1}C = I = CC^{-1}. \eq (Do all matrices have such inverses? No. For example, the null matrix has no such inverse.) The broad question of how to invert a general matrix~$C$, we leave for Chs.~\ref{gjrank} and~\ref{mtxinv} to address. For the moment however we should like to observe three simple rules involving matrix inversion. First, nothing in the logic leading to~(\ref{matrix:321:59}) actually requires the matrix~$T$ there to be an elementary operator. Any matrix~$C$ for which~$C^{-1}$ is known can fill the role. Hence, \bq{matrix:321:60} \begin{split} CA &= [CAC^{-1}]C, \\ AC &= C[C^{-1}AC]. \end{split} \eq The transformation~$CAC^{-1}$ or~$C^{-1}AC$ is called a \emph{similarity transformation.} Sections~\ref{gjrank:337} and~\ref{eigen:505} speak further of this. \index{inverse!of a matrix transpose or adjoint} \index{transpose!of a matrix inverse} \index{conjugate transpose!of a matrix inverse} \index{adjoint!of a matrix inverse} Second, \settowidth\tla{$C^{-T}$} \bq{matrix:321:70} \begin{split} \big(C^{T}\big)^{-1} &= \makebox[\tla][l]{$C^{-T}$} = \left(C^{-1}\right)^{T}, \\ \big(C^{*}\big)^{-1} &= \makebox[\tla][l]{$C^{-*}$} = \left(C^{-1}\right)^{*}, \end{split} \eq where~$C^{-*}$ is condensed notation for conjugate transposition and inversion in either order and~$C^{-T}$ is of like style. Equation~(\ref{matrix:321:70}) is a consequence of~(\ref{matrix:120:30}), since for conjugate transposition \[ \left(C^{-1}\right)^{*} C^{*} = \left[CC^{-1}\right]^{*} = \left[I\right]^{*} = I = \left[I\right]^{*} = \left[C^{-1}C\right]^{*} = C^{*} \left(C^{-1}\right)^{*} \] and similarly for nonconjugate transposition. \index{inverse!of a matrix product} Third, \bq{matrix:321:80} \left( \prod_k C_k \right)^{-1} = \coprod_k C_k^{-1}. \eq This rule emerges upon repeated application of~(\ref{matrix:321:10}), which yields \[ \coprod_k C_k^{-1} \prod_k C_k = I = \prod_k C_k \coprod_k C_k^{-1}. \] \index{matrix!rank-$r$ inverse of} \index{rank-$r$ inverse} A more limited form of the inverse exists than the infinite-dimensional form of~(\ref{matrix:321:10}). This is the rank-$r$ inverse, a matrix $C^{-1(r)}$ such that \bq{matrix:321:20} C^{-1(r)}C = I_r = CC^{-1(r)}. \eq The full notation $C^{-1(r)}$ is not standard and usually is not needed, since the context usually implies the rank. 
When so, one can abbreviate the notation to~$C^{-1}$. In either notation,~(\ref{matrix:321:70}) and~(\ref{matrix:321:80}) apply equally for the rank-$r$ inverse as for the infinite-dimensional inverse. Because of~(\ref{matrix:180:35}), eqn.~(\ref{matrix:321:60}) too applies for the rank-$r$ inverse if $A$'s active region is limited to $r \times r$. (Section~\ref{mtxinv:230} uses the rank-$r$ inverse to solve an exactly determined linear system. This is a famous way to use the inverse, with which many or most readers will already be familiar; but before using it so in Ch.~\ref{mtxinv}, we shall first learn how to compute it reliably in Ch.~\ref{gjrank}.) Table~\ref{matrix:321:tbl} summarizes. \begin{table} \caption[Matrix inversion properties.] {Matrix inversion properties. (The properties work equally for $C^{-1(r)}$ as for~$C^{-1}$ if~$A$ honors an $r \times r$ active region. The full notation $C^{-1(r)}$ for the rank-$r$ inverse incidentally is not standard, usually is not needed, and normally is not used.)} \label{matrix:321:tbl} \index{matrix!inversion properties of} \[ \renewcommand\arraystretch{1.3} \br{rcccl} C^{-1}C &=& I &=& CC^{-1} \\ C^{-1(r)}C &=& I_r &=& CC^{-1(r)} \\ \big(C^{T}\big)^{-1} &=& C^{-T} &=& \left(C^{-1}\right)^{T} \\ \big(C^{*}\big)^{-1} &=& C^{-*} &=& \left(C^{-1}\right)^{*} \er \] \bqb CA &=& [CAC^{-1}]C \\ AC &=& C[C^{-1}AC] \\ \left( \prod_k C_k \right)^{-1} &=& \coprod_k C_k^{-1} \eqb \end{table} % ---------------------------------------------------------------------- \section{Parity} \label{matrix:322} \index{parity} \index{oddness} \index{evenness} Consider the sequence of integers or other objects $1,2,3,\ldots,n$. By successively interchanging pairs of the objects (any pairs, not just adjacent pairs), one can achieve any desired permutation (\S~\ref{drvtv:220.20}). For example, beginning with $1,2,3,4,5$, one can achieve the permutation $3,5,1,4,2$ by interchanging first the~$1$ and~$3$, then the~$2$ and~$5$. Now contemplate all possible pairs: \settowidth\tla{$(0,n)$} \bqb \br{lllcl} (1,2) & (1,3) & (1,4) & \cdots & (1,n); \\ & (2,3) & (2,4) & \cdots & (2,n); \\ & & (3,4) & \cdots & (3,n); \\ & & & \ddots & \makebox[\tla][c]{\vdots} \\ & & & & (n-1,n). \er \eqb In a given permutation (like $3,5,1,4,2$), some pairs will appear in correct order with respect to one another, while others will appear in incorrect order. (In $3,5,1,4,2$, the pair $[1,2]$ appears in correct order in that the larger~$2$ stands to the right of the smaller~$1$; but the pair $[1,3]$ appears in incorrect order in that the larger~$3$ stands to the \emph{left} of the smaller~$1$.) If~$p$ is the number of pairs which appear in incorrect order (in the example, $p=6$), and if~$p$ is even, then we say that the permutation has \emph{even} or \emph{positive parity;} if odd, then \emph{odd} or \emph{negative parity.}% \footnote{ For readers who learned arithmetic in another language than English, the \emph{even} integers are $\ldots, -4, -2, 0, 2, 4, 6, \ldots$; the \emph{odd} integers are $\ldots, -3, -1, 1, 3, 5, 7, \ldots$\,. } Now consider: every interchange of adjacent elements must either increment or decrement~$p$ by one, reversing parity. Why? Well, think about it. If two elements are adjacent and their order is correct, then interchanging falsifies the order, but only of that pair (no other element interposes, thus the interchange affects the ordering of no other pair). Complementarily, if the order is incorrect, then interchanging rectifies the order. 
Either way, an adjacent interchange alters~$p$ by exactly~$\pm 1$, thus reversing parity. What about nonadjacent elements? Does interchanging a pair of these reverse parity, too? To answer the question, let~$u$ and~$v$ represent the two elements interchanged, with $a_1,a_2,\ldots,a_m$ the elements lying between. Before the interchange: \[ \ldots,u,a_1,a_2,\ldots,a_{m-1},a_m,v,\ldots \] After the interchange: \[ \ldots,v,a_1,a_2,\ldots,a_{m-1},a_m,u,\ldots \] The interchange reverses with respect to one another just the pairs \[ \br{lllll} (u,a_1) & (u,a_2) & \cdots & (u,a_{m-1}) & (u,a_m) \\ (a_1,v) & (a_2,v) & \cdots & (a_{m-1},v) & (a_m,v) \\ (u,v) \er \] The number of pairs reversed is odd. Since each reversal alters~$p$ by~$\pm 1$, the net change in~$p$ apparently also is odd, reversing parity. It seems that regardless of how distant the pair, \emph{interchanging any pair of elements reverses the permutation's parity.} The sole exception arises when an element is interchanged with itself. This does not change parity, but it does not change anything else, either, so in parity calculations we ignore it.% \footnote{ This is why some authors forbid self-interchanges, as explained in footnote~\ref{matrix:320:fn10}. } All other interchanges reverse parity. We discuss parity in this, a chapter on matrices, because parity concerns the elementary interchange operator of \S~\ref{matrix:320}. The rows or columns of a matrix can be considered elements in a sequence. If so, then the interchange operator $T_{[i\lra j]}$, $i\neq j$, acts precisely in the manner described, interchanging rows or columns and thus reversing parity. It follows that if $i_k\neq j_k$ and~$q$ is odd, then $\prod_{k=1}^q T_{[i_k\lra j_k]} \neq I$. However, it is possible that $\prod_{k=1}^q T_{[i_k\lra j_k]} = I$ if~$q$ is even. In any event, even~$q$ implies even~$p$, which means even (positive) parity; odd~$q$ implies odd~$p$, which means odd (negative) parity. We shall have more to say about parity in \S\S~\ref{matrix:325.10} and~\ref{eigen:310}. % ---------------------------------------------------------------------- \section{The quasielementary operator} \label{matrix:325} \index{quasielementary operator} \index{operator!quasielementary} Multiplying sequences of the elementary operators of \S~\ref{matrix:320}, one can form much more complicated operators, which per~(\ref{matrix:320:34}) are always invertible. Such complicated operators are not trivial to analyze, however, so one finds it convenient to define an intermediate class of operators, called in this book the \emph{quasielementary operators,} more complicated than elementary operators but less so than arbitrary matrices. A quasielementary operator is composed of elementaries only of a single kind. There are thus three kinds of quasielementary---interchange, scaling and addition---to match the three kinds of elementary. With respect to interchange and scaling, any sequences of elementaries of the respective kinds are allowed. With respect to addition, there are some extra rules, explained in \S~\ref{matrix:325.30}. The three subsections which follow respectively introduce the three kinds of quasielementary operator. 
% bad break \subsection[The general interchange operator] {The interchange quasielementary or general inter-\linebreak change operator} \label{matrix:325.10} \index{quasielementary operator!interchange} \index{interchange quasielementary} \index{general interchange operator} \index{operator!general interchange} \index{interchange operator!general} \index{permutation matrix} \index{permutor} Any product~$P$ of zero or more interchange elementaries, \bq{325:05} P = \prod_k T_{[i_k\lra j_k]}, \eq constitutes an \emph{interchange quasielementary,} \emph{permutation matrix}, \emph{permutor} or \emph{general interchange operator.}% \footnote{ The letter~$P$ here recalls the verb ``to permute.'' } An example is \[ P = T_{[2\lra 5]}T_{[1\lra 3]} = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }. \] This operator resembles~$I$ in that it has a single one in each row and in each column, but the ones here do not necessarily run along the main diagonal. The effect of the operator is to shuffle the rows or columns of the matrix it operates on, without altering any of the rows or columns it shuffles. By~(\ref{matrix:320:34}), (\ref{matrix:320:30}), (\ref{matrix:320:50}) and~(\ref{matrix:120:33}), the inverse of the general interchange operator is \bqa P^{-1} &=& \left( \prod_k T_{[i_k\lra j_k]} \right)^{-1} = \coprod_k T^{-1}_{[i_k\lra j_k]} \xn\\&=& \coprod_k T_{[i_k\lra j_k]} \xn\\&=& \coprod_k T_{[i_k\lra j_k]}^{*} = \left( \prod_k T_{[i_k\lra j_k]} \right)^{*} \xn\\&=& P^{*} = P^{T} \label{matrix:325:10} \eqa (where $P^{*}=P^T$ because~$P$ has only real elements). The inverse, transpose and adjoint of the general interchange operator are thus the same: \bq{matrix:325:11} P^{T}P = P^{*}P = I = P P^{*} = P P^{T}. \eq A significant attribute of the general interchange operator~$P$ is its parity: positive or even parity if the number of interchange elementaries $T_{[i_k\lra j_k]}$ which compose it is even; negative or odd parity if the number is odd. This works precisely as described in \S~\ref{matrix:322}. For the purpose of parity determination, only interchange elementaries $T_{[i_k\lra j_k]}$ for which $i_k \neq j_k$ are counted; any $T_{[i\lra i]}=I$ noninterchanges are ignored. Thus the example's~$P$ above has even parity (two interchanges), as does~$I$ itself (zero interchanges), but $T_{[i\lra j]}$ alone (one interchange) has odd parity if $i\neq j$. As we shall see in \S~\ref{eigen:310}, the positive (even) and negative (odd) parities sometimes lend actual positive and negative senses to the matrices they describe. The parity of the general interchange operator~$P$ concerns us for this reason. Parity, incidentally, is a property of the matrix~$P$ itself, not just of the operation~$P$ represents. No interchange quasielementary~$P$ has positive parity as a row operator but negative as a column operator. The reason is that, regardless of whether one ultimately means to use~$P$ as a row or column operator, the matrix is nonetheless composable as a definite sequence of interchange elementaries. It is the number of interchanges, not the use, which determines $P$'s parity. 
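Incidentally, since the two interchange elementaries composing the example's~$P$ involve four
distinct indices, Table~\ref{matrix:Txchg2} lets them commute without alteration,
\[
P = T_{[2\lra 5]}T_{[1\lra 3]} = T_{[1\lra 3]}T_{[2\lra 5]};
\]
either factorization counts two interchanges, so either way the parity comes out even, consistent
with the last paragraph's point that parity belongs to the matrix~$P$ itself.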
\subsection[The general scaling operator] {The scaling quasielementary or general scaling operator} \label{matrix:325.20} \index{quasielementary operator!scaling} \index{scaling quasielementary} \index{general scaling operator} \index{operator!general scaling} \index{scaling operator!general} \index{diagonal matrix} Like the interchange quasielementary~$P$ of \S~\ref{matrix:325.10}, the \emph{scaling quasielementary,} \emph{diagonal matrix} or \emph{general scaling operator}~$D$ consists of a product of zero or more elementary operators, in this case elementary scaling operators:% \footnote{ The letter~$D$ here recalls the adjective ``diagonal.'' } \bq{matrix:325:15} D = \prod_{i=-\infty}^\infty T_{\alpha_i[i]} = \coprod_{i=-\infty}^\infty T_{\alpha_i[i]} = \sum_{i=-\infty}^\infty \alpha_{i} E_{ii} = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots &{*}& 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 &{*}& 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 &{*}& 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 &{*}& 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}& \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots } \eq (of course it might be that $\alpha_i = 1$, hence that $T_{\alpha_i[i]} = I$, for some, most or even all~$i$; however, $\alpha_i = 0$ is forbidden by the definition of the scaling elementary). An example is \settowidth\tla{\fn$0$} \[ D = T_{-5[4]}T_{4[2]}T_{7[1]} = \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 7 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 4 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & \makebox[\tla][r]{$-\!5$} & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }. \] This operator resembles~$I$ in that all its entries run down the main diagonal; but these entries, though never zeros, are not necessarily ones, either. They are nonzero scaling factors. The effect of the operator is to scale the rows or columns of the matrix it operates on. The general scaling operator is a particularly simple matrix. Its inverse is evidently \bq{matrix:325:20} D^{-1} = \coprod_{i=-\infty}^\infty T_{(1/\alpha_i)[i]} = \prod_{i=-\infty}^\infty T_{(1/\alpha_i)[i]} = \sum_{i=-\infty}^\infty \frac{E_{ii}}{\alpha_i}, \eq where each element down the main diagonal is individually inverted. \index{diagonal matrix} \index{diag notation, the} A superset of the general scaling operator is the \emph{diagonal matrix,} defined less restrictively that $[A]_{ij} = 0$ for $i \neq j$, where zeros along the main diagonal are allowed. The conventional notation \bqa \left[ \mopx{diag}\{\ve x\} \right]_{ij} &\equiv& \delta_{ij}x_i = \delta_{ij}x_j, \label{matrix:325:diag}\\ \mopx{diag}\{\ve x\} &=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & x_1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & x_2 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & x_3 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & x_4 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & x_5 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \xn \eqa converts a vector~$\ve x$ into a diagonal matrix. The diagonal matrix in general is not invertible and is no quasielementary operator, but is sometimes useful nevertheless. 
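By way of illustration,~(\ref{matrix:325:20}) gives the example's $D = T_{-5[4]}T_{4[2]}T_{7[1]}$
above the inverse
\[
D^{-1} = T_{(1/7)[1]}T_{(1/4)[2]}T_{(-1/5)[4]},
\]
a diagonal matrix bearing $1/7$, $1/4$, $1$, $-1/5$ and~$1$ in the diagonal places where~$D$ bears
$7$, $4$, $1$, $-5$ and~$1$; each scaling factor meets its reciprocal, whence $D^{-1}D = I = DD^{-1}$.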
\subsection{Addition quasielementaries} \label{matrix:325.30} \index{quasielementary operator!addition} \index{addition quasielementary} \index{multitarget addition operator} \index{operator!multitarget addition} \index{addition operator!multitarget} Any product of interchange elementaries (\S~\ref{matrix:325.10}), any product of scaling elementaries (\S~\ref{matrix:325.20}), qualifies as a quasielementary operator. Not so, any product of addition elementaries. To qualify as a quasielementary, a product of elementary addition operators must meet some additional restrictions. \index{row addition quasielementary} \index{addition quasielementary!row} \index{quasielementary operator!row addition} Four types of addition quasielementary are defined:% \footnote{ In this subsection the explanations are briefer than in the last two, but the pattern is similar. The reader can fill in the details. } \bi \item \index{downward multitarget addition operator} \index{multitarget addition operator!downward} \index{operator!downward multitarget addition} \index{addition operator!downward multitarget} the \emph{downward multitarget row addition operator,}% \footnote{ The letter~$L$ here recalls the adjective ``lower.'' } \bqa L_{[j]} &=& \prod_{i=j+1}^\infty T_{\alpha_{ij}[ij]} = \coprod_{i=j+1}^\infty T_{\alpha_{ij}[ij]} \label{matrix:325:41} \\&=& I + \sum_{i=j+1}^\infty \alpha_{ij} E_{ij} \xn \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 &{*}& 1 & 0 & \cdots \\ \cdots & 0 & 0 &{*}& 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \xn \eqa whose inverse is \bqa L_{[j]}^{-1} &=& \coprod_{i=j+1}^\infty T_{-\alpha_{ij}[ij]} = \prod_{i=j+1}^\infty T_{-\alpha_{ij}[ij]} \label{matrix:325:42} \\&=& I - \sum_{i=j+1}^\infty \alpha_{ij} E_{ij} = 2I - L_{[j]}; \xn \eqa \item \index{upward multitarget addition operator} \index{multitarget addition operator!upward} \index{operator!upward multitarget addition} \index{addition operator!upward multitarget} the \emph{upward multitarget row addition operator,}% \footnote{ The letter~$U$ here recalls the adjective ``upper.'' } \bqa U_{[j]} &=& \coprod_{i=-\infty}^{j-1} T_{\alpha_{ij}[ij]} = \prod_{i=-\infty}^{j-1} T_{\alpha_{ij}[ij]} \label{matrix:325:43} \\&=& I + \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij} \xn \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 &{*}& 0 & 0 & \cdots \\ \cdots & 0 & 1 &{*}& 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \xn \eqa whose inverse is \bqa U_{[j]}^{-1} &=& \prod_{i=-\infty}^{j-1} T_{-\alpha_{ij}[ij]} = \coprod_{i=-\infty}^{j-1} T_{-\alpha_{ij}[ij]} \label{matrix:325:44} \\&=& I - \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij} = 2I - U_{[j]}; \xn \eqa \item \index{rightward multitarget addition operator} \index{multitarget addition operator!rightward} \index{operator!rightward multitarget addition} \index{addition operator!rightward multitarget} the \emph{rightward multitarget column addition operator,} which is the transpose~$L_{[j]}^{T}$ of the downward operator; and \item \index{leftward multitarget addition operator} \index{multitarget addition operator!leftward} \index{operator!leftward multitarget addition} \index{addition operator!leftward multitarget} the 
\emph{leftward multitarget column addition operator,} which is the transpose~$U_{[j]}^{T}$ of the upward operator. \ei % ---------------------------------------------------------------------- \section{The unit triangular matrix} \label{matrix:330} \index{unit triangular matrix} \index{matrix!unit triangular} \index{unit lower triangular matrix} \index{unit upper triangular matrix} \index{lower triangular matrix} \index{upper triangular matrix} \index{matrix!unit lower triangular} \index{matrix!unit upper triangular} Yet more complicated than the quasielementary of \S~\ref{matrix:325} is the \emph{unit triangular matrix,} with which we draw this necessary but tedious chapter toward a long close: \bqa L &=& I + \sum_{i=-\infty}^\infty \, \sum_{j=-\infty}^{i-1} \alpha_{ij}E_{ij} = I + \sum_{j=-\infty}^\infty \, \sum_{i=j+1}^\infty \alpha_{ij}E_{ij} \label{matrix:330:11} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots &{*}& 1 & 0 & 0 & 0 & \cdots \\ \cdots &{*}&{*}& 1 & 0 & 0 & \cdots \\ \cdots &{*}&{*}&{*}& 1 & 0 & \cdots \\ \cdots &{*}&{*}&{*}&{*}& 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }; \xn\\ U &=& I + \sum_{i=-\infty}^\infty \, \sum_{j=i+1}^\infty \alpha_{ij}E_{ij} = I + \sum_{j=-\infty}^\infty \, \sum_{i=-\infty}^{j-1} \alpha_{ij}E_{ij} \label{matrix:330:12} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 &{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 1 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 1 &{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 1 &{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }. \xn \eqa The former is a \emph{unit lower triangular matrix;} the latter, a \emph{unit upper triangular matrix.} The unit triangular matrix is a generalized addition quasielementary, which adds not only to multiple targets but also from multiple sources---but in one direction only: downward or leftward for~$L$ or~$U^T$ (or~$U^{*}$); upward or rightward for~$U$ or~$L^T$ (or~$L^{*}$). % The following belonged to a deleted passage. The author kind of liked % the expression, though, so it is retained here, commented out. % %\cdots So, this being a book of derivations, %though derivation be tedious, derive the formulas we must, derive the %formulas we will, as follows. \index{triangular matrix} \index{general triangular matrix} \index{strictly triangular matrix} \index{Schur, Issai (1875--1941)} The general \emph{triangular matrix}~$L_S$ or~$U_S$, which by definition can have any values along its main diagonal, is sometimes of interest, as in the Schur decomposition of \S~\ref{eigen:520}.% \footnote{ The subscript~$S$ here stands for Schur. Other books typically use the symbols~$L$ and~$U$ for the general triangular matrix of Schur, but this book distinguishes by the subscript. } The \emph{strictly triangular matrix} $L-I$ or $U-I$ is likewise sometimes of interest, as in Table~\ref{matrix:330:t18}.% \footnote{\cite[``Schur decomposition,'' 00:32, 30~Aug. 2007]{wikip}} However, such matrices cannot in general be expressed as products of elementary operators and this section does not treat them. This section presents and derives the basic properties of the unit triangular matrix. 
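To fix the three forms in mind, compare, with arbitrarily chosen entries (showing only a
representative $3 \times 3$ corner of each),
\[
L = \mf{rrr}{ 1 & 0 & 0 \\ 5 & 1 & 0 \\ -2 & 3 & 1 }, \ \
L_S = \mf{rrr}{ 7 & 0 & 0 \\ 5 & 2 & 0 \\ -2 & 3 & -1 }, \ \
L - I = \mf{rrr}{ 0 & 0 & 0 \\ 5 & 0 & 0 \\ -2 & 3 & 0 },
\]
of which only the first, carrying ones down its main diagonal, is a unit triangular matrix of the
kind this section treats.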
\subsection{Construction} \label{matrix:330.20} \index{unit triangular matrix!construction of} \index{triangular matrix!construction of} \index{matrix!triangular, construction of} To make a unit triangular matrix is straightforward: \bq{matrix:330:20} \begin{split} L &= \coprod_{j=-\infty}^\infty L_{[j]}; \\ U &= \prod_{j=-\infty}^\infty U_{[j]}. \end{split} \eq So long as the multiplication is done in the order indicated,% \footnote{ Recall again from \S~\ref{alggeo:227} that $\prod_k A_k=\cdots A_3 A_2 A_1$, whereas $\coprod_k A_k=A_1 A_2 A_3 \cdots$. This means that $(\prod_k A_k)(C)$ applies first~$A_1$, then~$A_2$, $A_3$ and so on, as row operators to~$C$; whereas $(C)(\coprod_k A_k)$ applies first~$A_1$, then~$A_2$, $A_3$ and so on, as column operators to~$C$. The symbols~$\prod$ and~$\coprod$ as this book uses them can thus be thought of respectively as row and column sequencers. } then conveniently, \bq{matrix:330:25} \begin{split} \Big[L\Big]_{ij} &= \Big[L_{[j]}\Big]_{ij}, \\ \Big[U\Big]_{ij} &= \Big[U_{[j]}\Big]_{ij}, \end{split} \eq which is to say that the entries of~$L$ and~$U$ are respectively nothing more than the relevant entries of the several~$L_{[j]}$ and~$U_{[j]}$. Equation~(\ref{matrix:330:25}) enables one to use~(\ref{matrix:330:20}) immediately and directly, without calculation, to build any unit triangular matrix desired. The correctness of~(\ref{matrix:330:25}) is most easily seen if the several~$L_{[j]}$ and~$U_{[j]}$ are regarded as column operators acting sequentially on~$I$: \[ \begin{split} L &= (I)\left(\coprod_{j=-\infty}^\infty L_{[j]}\right); \\ U &= (I)\left(\prod_{j=-\infty}^\infty U_{[j]}\right). \end{split} \] The reader can construct an inductive proof symbolically on this basis without too much difficulty if desired, but just thinking about how~$L_{[j]}$ adds columns leftward and~$U_{[j]}$, rightward, then considering the order in which the several~$L_{[j]}$ and~$U_{[j]}$ act,~(\ref{matrix:330:25}) follows at once. \subsection{The product of like unit triangular matrices} \label{matrix:330.30} The product of like unit triangular matrices, \bq{matrix:330:30} \begin{split} L_1 L_2 &= L, \\ U_1 U_2 &= U, \end{split} \eq is another unit triangular matrix of the same type. The proof for unit lower and unit upper triangular matrices is the same. In the unit lower triangular case, one starts from a form of the definition of a unit lower triangular matrix: \[ [L_1]_{ij}\ \mbox{or}\ [L_2]_{ij} = \begin{cases} 0 &\mbox{if $i < j$,} \\ 1 &\mbox{if $i = j$.} \end{cases} \] Then, \[ [ L_1 L_2 ]_{ij} = \sum_{m=-\infty}^\infty [L_1]_{im} [L_2]_{mj}. \] But as we have just observed,~$[L_1]_{im}$ is null when $i < m$, and~$[L_2]_{mj}$ is null when $m < j$. Therefore, \[ [ L_1 L_2 ]_{ij} = \begin{cases} 0 &\mbox{if $i < j$,} \\ \sum_{m=j}^i [L_1]_{im} [L_2]_{mj} &\mbox{if $i \ge j$.} \end{cases} \] Inasmuch as this is true, nothing prevents us from weakening the statement to read \[ [ L_1 L_2 ]_{ij} = \begin{cases} 0 &\mbox{if $i < j$,} \\ \sum_{m=j}^i [L_1]_{im} [L_2]_{mj} &\mbox{if $i = j$.} \end{cases} \] But this is just \[ [ L_1 L_2 ]_{ij} = \begin{cases} 0 &\mbox{if $i < j$,} \\ [L_1]_{ij} [L_2]_{ij} = [L_1]_{ii} [L_2]_{ii} = (1)(1) = 1 &\mbox{if $i = j$,} \end{cases} \] which again is the very definition of a unit lower triangular matrix. Hence % bad break (\ref{matrix:330:30}). 
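As a numerical check of~(\ref{matrix:330:30}), a reader might run the following Python sketch. It is an illustrative aside, not part of the proof; it assumes the NumPy library, and the helper name \texttt{unit\_lower} is the sketch's own. It multiplies two random finite unit lower triangular matrices and confirms that the product is again unit lower triangular.
\begin{verbatim}
# Illustrative sketch, not from the book: a finite check of (330:30).
# Assumes Python with NumPy.
import numpy as np

def unit_lower(n):
    # ones on the main diagonal, arbitrary entries strictly below it, zeros above
    return np.eye(n) + np.tril(np.random.randn(n, n), -1)

L1, L2 = unit_lower(5), unit_lower(5)
L = L1 @ L2
print(np.allclose(np.triu(L, 1), 0.0))   # True: the strictly upper part vanishes
print(np.allclose(np.diag(L), 1.0))      # True: the main diagonal is all ones
\end{verbatim}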
\subsection{Inversion} \label{matrix:330.40} Inasmuch as any unit triangular matrix can be constructed from addition quasielementaries by~(\ref{matrix:330:20}), inasmuch as~(\ref{matrix:330:25}) supplies the specific quasielementaries, and inasmuch as~(\ref{matrix:325:42}) or~(\ref{matrix:325:44}) gives the inverse of each such quasielementary, one can always invert a unit triangular matrix easily by \bq{matrix:330:40} \begin{split} L^{-1} &= \prod_{j=-\infty}^\infty L_{[j]}^{-1}, \\ U^{-1} &= \coprod_{j=-\infty}^\infty U_{[j]}^{-1}. \end{split} \eq In view of~(\ref{matrix:330:30}), therefore, \emph{the inverse of a unit lower triangular matrix is another unit lower triangular matrix; and the inverse of a unit upper triangular matrix, another unit upper triangular matrix.} It is plain to see but still interesting to note that---unlike the inverse---the adjoint or transpose of a unit lower triangular matrix is a unit upper triangular matrix; and that the adjoint or transpose of a unit upper triangular matrix is a unit lower triangular matrix. The adjoint reverses the sense of the triangle. \subsection{The parallel unit triangular matrix} \label{matrix:330.50} \index{parallel unit triangular matrix} \index{triangular matrix!unit parallel} \index{matrix!unit parallel triangular} If a unit triangular matrix fits the special, restricted form \settoheight\tlj{{\scriptsize $k$}} \bqa \settowidth\tla{$,$} L_\|^{\{k\}} &=& I + \sum_{\rule{0pt}{\tlj}j=-\infty}^k \, \sum_{i=k+1}^\infty \alpha_{ij} E_{ij} \label{matrix:330:51} \\ &=& \mf{ccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&0&0&0&0&\cdots\\ \cdots&0&1&0&0&0&\cdots\\ \cdots&0&0&1&0&0&\cdots\\ \cdots&{*}&{*}&{*}&1&0&\cdots\\ \cdots&{*}&{*}&{*}&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }\makebox[\tla][l]{} \xn \eqa or \settoheight\tlj{{\scriptsize $k$}} \bqa U_\|^{\{k\}} &=& I + \sum_{j=k}^{\infty} \, \sum_{\rule{0pt}{\tlj}i=-\infty}^{k-1} \alpha_{ij} E_{ij} \label{matrix:330:52} \\ &=& \mf{ccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&0&{*}&{*}&{*}&\cdots\\ \cdots&0&1&{*}&{*}&{*}&\cdots\\ \cdots&0&0&1&0&0&\cdots\\ \cdots&0&0&0&1&0&\cdots\\ \cdots&0&0&0&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }\makebox[\tla][l]{,} \xn \eqa confining its nonzero elements to a rectangle within the triangle as shown, then it is a \emph{parallel unit triangular matrix} and has some special properties the general unit triangular matrix lacks. \index{source} \index{target} \index{frontier} The general unit lower triangular matrix~$L$ acting~$LA$ on a matrix~$A$ adds the rows of~$A$ downward. The parallel unit lower triangular matrix $L_\|^{\{k\}}$ acting $L_\|^{\{k\}}A$ also adds rows downward, but with the useful restriction that it makes no row of~$A$ both source and target. The addition is \emph{from} $A$'s rows through the $k$th, \emph{to} $A$'s $(k+1)$th row onward. A horizontal frontier separates source from target, which thus march in~$A$ as separate squads. Similar observations naturally apply with respect to the parallel unit upper triangular matrix $U_\|^{\{k\}}$, which acting $U_\|^{\{k\}}A$ adds rows upward, and also with respect to $L_\|^{\{k\}T}$ and $U_\|^{\{k\}T}$, which acting $AL_\|^{\{k\}T}$ and $AU_\|^{\{k\}T}$ add columns respectively rightward and leftward (remembering that $L_\|^{\{k\}T}$ is no unit lower but a unit upper triangular matrix; that $U_\|^{\{k\}T}$ is the lower). Each separates source from target in the matrix~$A$ it operates on. 
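To see the separation of source from target concretely, one might try the following Python sketch. It is an illustrative aside, not the book's; it assumes the NumPy library and truncates the operator of~(\ref{matrix:330:51}) to a finite $6\times 6$ active region, with the frontier after the third row. The source rows of~$A$ pass through unchanged, while each target row receives a sum of source rows only.
\begin{verbatim}
# Illustrative sketch, not from the book: a finite parallel unit lower triangular
# matrix per (330:51), acting as a row operator.  Assumes Python with NumPy.
import numpy as np

n, k = 6, 3
Lpar = np.eye(n)
Lpar[k:, :k] = np.random.randn(n - k, k)   # nonzero block: target rows, source columns

A = np.random.randn(n, n)
B = Lpar @ A
print(np.allclose(B[:k], A[:k]))                         # True: source rows unchanged
print(np.allclose(B[k:], A[k:] + Lpar[k:, :k] @ A[:k]))  # True: targets receive sources only
\end{verbatim}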
The reason we care about the separation of source from target is that, in matrix arithmetic generally, where source and target are not separate but remain intermixed, the sequence matters in which rows or columns are added. That is, in general, \[ T_{\alpha_1[i_1j_1]}T_{\alpha_2[i_2j_2]} \neq I + \alpha_1E_{i_1j_1} + \alpha_2E_{i_2j_2} \neq T_{\alpha_2[i_2j_2]}T_{\alpha_1[i_1j_1]}. \] It makes a difference whether the one addition comes before, during or after the other---but only because the target of the one addition might be the source of the other. The danger is that $i_1=j_2$ or $i_2=j_1$. Remove this danger, and the sequence ceases to matter (refer to Table~\ref{matrix:Txchg1}). That is exactly what the parallel unit triangular matrix does: it separates source from target and thus removes the danger. It is for this reason that the parallel unit triangular matrix brings the useful property that \settoheight\tlj{{\scriptsize $k$}} \bq{matrix:330:55} \begin{split} L_\|^{\{k\}} &= I + \sum_{\rule{0pt}{\tlj}j=-\infty}^k \, \sum_{i=k+1}^\infty \alpha_{ij} E_{ij} \\ &= \coprod_{\rule{0pt}{\tlj}j=-\infty}^k \, \prod_{i=k+1}^\infty T_{\alpha_{ij}[ij]} = \coprod_{\rule{0pt}{\tlj}j=-\infty}^k \, \coprod_{i=k+1}^\infty T_{\alpha_{ij}[ij]} \\ &= \prod_{\rule{0pt}{\tlj}j=-\infty}^k \, \prod_{i=k+1}^\infty T_{\alpha_{ij}[ij]} = \prod_{\rule{0pt}{\tlj}j=-\infty}^k \, \coprod_{i=k+1}^\infty T_{\alpha_{ij}[ij]} \\ &= \prod_{i=k+1}^\infty \, \coprod_{\rule{0pt}{\tlj}j=-\infty}^k T_{\alpha_{ij}[ij]} = \coprod_{i=k+1}^\infty \, \coprod_{\rule{0pt}{\tlj}j=-\infty}^k T_{\alpha_{ij}[ij]} \\ &= \prod_{i=k+1}^\infty \, \prod_{\rule{0pt}{\tlj}j=-\infty}^k T_{\alpha_{ij}[ij]} = \coprod_{i=k+1}^\infty \, \prod_{\rule{0pt}{\tlj}j=-\infty}^k T_{\alpha_{ij}[ij]}, \\ U_\|^{\{k\}} &= I + \sum_{j=k}^\infty \, \sum_{\rule{0pt}{\tlj}i=-\infty}^{k-1} \alpha_{ij} E_{ij} \\ &= \prod_{j=k}^\infty \, \coprod_{\rule{0pt}{\tlj}i=-\infty}^{k-1} T_{\alpha_{ij}[ij]} = \cdots, \end{split} \eq which says that one can build a parallel unit triangular matrix equally well in any sequence---in contrast to the case of the general unit triangular matrix, whose construction per~(\ref{matrix:330:20}) one must sequence carefully. (Though eqn.~\ref{matrix:330:55} does not show them, even more sequences are possible. You can scramble the factors' ordering any random way you like. The multiplication is fully commutative.) Under such conditions, the inverse of the parallel unit triangular matrix is particularly simple:% \footnote{ There is some odd parochiality at play in applied mathematics when one calls such collections of symbols as~(\ref{matrix:330:56}) ``particularly simple.'' Nevertheless, in the present context the idea~(\ref{matrix:330:56}) represents is indeed simple: that one can multiply constituent elementaries in any order and still reach the same parallel unit triangular matrix; that the elementaries in this case do not interfere. 
}
\settoheight\tlj{{\scriptsize $k$}}
\bq{matrix:330:56}
\begin{split}
 L_\|^{\{k\}\,\mbox{\scriptsize$-1$}} &= I - \sum_{\rule{0pt}{\tlj}j=-\infty}^k \, \sum_{i=k+1}^\infty \alpha_{ij} E_{ij} = 2I - L_\|^{\{k\}} \\
 &= \prod_{\rule{0pt}{\tlj}j=-\infty}^k \, \coprod_{i=k+1}^\infty T_{-\alpha_{ij}[ij]} = \cdots, \\
 U_\|^{\{k\}\,\mbox{\scriptsize$-1$}} &= I - \sum_{j=k}^{\infty} \, \sum_{\rule{0pt}{\tlj}i=-\infty}^{k-1} \alpha_{ij} E_{ij} = 2I - U_\|^{\{k\}} \\
 &= \coprod_{j=k}^{\infty} \, \prod_{\rule{0pt}{\tlj}i=-\infty}^{k-1} T_{-\alpha_{ij}[ij]} = \cdots,
\end{split}
\eq
where again the elementaries can be multiplied in any order. Pictorially,
{
 \settowidth\tla{\fn${0}$}
 \nc\xx{\makebox[\tla][r]{$-{*}$}}
 \bqb
  L_\|^{\{k\}\,\mbox{\scriptsize$-1$}} &=&
  \mf{ccccccc}{
   \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
   \cdots&\,1\,&\,0\,&\,0\,&\,0\,&\,0\,&\cdots\\
   \cdots&0&1&0&0&0&\cdots\\
   \cdots&0&0&1&0&0&\cdots\\
   \cdots&\xx&\xx&\xx&1&0&\cdots\\
   \cdots&\xx&\xx&\xx&0&1&\cdots\\
   &\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
  },
  \\
  U_\|^{\{k\}\,\mbox{\scriptsize$-1$}} &=&
  \mf{ccccccc}{
   \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
   \cdots&1&0&\xx&\xx&\xx&\cdots\\
   \cdots&0&1&\xx&\xx&\xx&\cdots\\
   \cdots&0&0&1&0&0&\cdots\\
   \cdots&0&0&0&1&0&\cdots\\
   \cdots&\,0\,&\,0\,&\,0\,&\,0\,&\,1\,&\cdots\\
   &\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
  }.
 \eqb
}%
The inverse of a parallel unit triangular matrix is just the matrix itself, only with each element off the main diagonal negated. Table~\ref{matrix:330:t18} records a few properties that come immediately of the last observation and from the parallel unit triangular matrix's basic layout.
% This book source has many idiosyncratic labels, but ":t18"? Well, the
% table was originally equation ":18", but after several revisions it
% had grown too long and became Table ":t18". Thus the label. In
% general, the source's idiosyncratic labels---like this comment, never
% seen by the reader---are slightly annoying but not nearly annoying
% enough to merit a labor-intensive general revision. Really good
% labels would have given a clue of what their equations are about, but
% *exactly* what an equation is about is sometimes not entirely clear to
% the author until he has typeset the equation and the surrounding
% narrative. More importantly, revisions of the text often leave
% meaningful labels obsolete. No. Experience teaches that in a
% document of this size, meaningless numeric labels can be more
% practical for most numbered equations, and are acceptable for many
% tables, figures, etc., as well. Experience also teaches that it is
% usually not a good idea to risk revising labels after the labels have
% been assigned, because every label revision risks breaking---or,
% worse, mislinking---some forgotten reference from somewhere else in
% the book. In this, a document source differs from a program:
% programs benefit from tighter organization.
\begin{table}
\caption[Properties of the parallel unit triangular matrix.]{
 Properties of the parallel unit triangular matrix. (In the table, the notation~$I_a^b$ represents the generalized dimension-limited identity matrix or truncator of eqn.~\ref{matrix:180:IMC}. Note that the inverses $L_\|^{\{k\}\,\mbox{\scriptsize$-1$}} = L_\|^{\{k\}'}$ and $U_\|^{\{k\}\,\mbox{\scriptsize$-1$}} = U_\|^{\{k\}'}$ are parallel unit triangular matrices themselves, such that the table's properties hold for them, too.)
} \label{matrix:330:t18} \index{parallel unit triangular matrix!properties of} \index{unit triangular matrix!parallel, properties of} \index{triangular matrix!parallel, properties of} \index{matrix!parallel unit triangular, properties of} \[ \frac{ L_\|^{\{k\}} + L_\|^{\{k\}\,\mbox{\scriptsize$-1$}} }{2} = I = \frac{ U_\|^{\{k\}} + U_\|^{\{k\}\,\mbox{\scriptsize$-1$}} }{2} \] \[ \renewcommand\arraystretch{1.3} \br{rcccl} \ds I_{k+1}^\infty L_\|^{\{k\}} I_{-\infty}^k &=& \ds L_\|^{\{k\}}-I &=& \ds I_{k+1}^\infty (L_\|^{\{k\}}-I) I_{-\infty}^k \\ \ds I_{-\infty}^{k-1} U_\|^{\{k\}} I_k^\infty &=& \ds U_\|^{\{k\}}-I &=& \ds I_{-\infty}^{k-1} (U_\|^{\{k\}}-I) I_k^\infty \er \] \nc\xxa{If $L_\|^{\{k\}}$ honors an $n \times n$ active region, then} \nc\xxb{$(I_n-I_k) L_\|^{\{k\}} I_k = L_\|^{\{k\}}-I = (I_n-I_k) (L_\|^{\{k\}}-I) I_k$} \nc\xxe{and $(I-I_n)(L_\|^{\{k\}}-I) = 0 = (L_\|^{\{k\}}-I)(I-I_n)$.} \nc\xxc{If $U_\|^{\{k\}}$ honors an $n \times n$ active region, then} \nc\xxd{$I_{k-1} U_\|^{\{k\}} (I_n-I_{k-1}) = U_\|^{\{k\}}-I = I_{k-1} (U_\|^{\{k\}}-I) (I_n-I_{k-1})$} \nc\xxf{and $(I-I_n)(U_\|^{\{k\}}-I) = 0 = (U_\|^{\{k\}}-I)(I-I_n)$.} \settowidth\tla{\xxc} \settowidth\tlb{\xxd} \settowidth\tlc{\xxf} \[ \renewcommand\arraystretch{1.3} \br{l} \makebox[\tla][l]{\xxa} \\\ \ \ \ \makebox[\tlb][l]{\xxb} \\\ \ \ \ \makebox[\tlc][l]{\xxe} \er \] \[ \renewcommand\arraystretch{1.3} \br{l} \makebox[\tla][l]{\xxc} \\\ \ \ \ \makebox[\tlb][l]{\xxd} \\\ \ \ \ \makebox[\tlc][l]{\xxf} \er \] \end{table} \subsection{The partial unit triangular matrix} \label{matrix:330.05} \index{matrix!unit triangular, partial} \index{unit triangular matrix!partial} \index{triangular matrix!partial} \index{partial unit triangular matrix} Besides the notation~$L$ and~$U$ for the general unit lower and unit upper triangular matrices and the notation~$L_\|^{\{k\}}$ and~$U_\|^{\{k\}}$ for the parallel unit lower and unit upper triangular matrices, we shall find it useful to introduce the additional notation \settoheight\tlj{{\scriptsize $k$}} \bqa L^{[k]} &=& I + \sum_{j=k}^\infty \, \sum_{\rule{0pt}{\tlj}i=j+1}^\infty \alpha_{ij}E_{ij} \label{matrix:330:13} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 &{*}& 1 & 0 & \cdots \\ \cdots & 0 & 0 &{*}&{*}& 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \xn\\ U^{[k]} &=& I + \sum_{j=-\infty}^k \, \sum_{i=-\infty}^{j-1} \alpha_{ij}E_{ij} \label{matrix:330:14} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 &{*}&{*}& 0 & 0 & \cdots \\ \cdots & 0 & 1 &{*}& 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots } \xn \eqa for unit triangular matrices whose off-diagonal content is confined to a narrow wedge and \settoheight\tlj{{\scriptsize $k$}} \bqa \xn \\ L^{\{k\}} &=& I + \sum_{j=-\infty}^k \, \sum_{i=j+1}^\infty \alpha_{ij}E_{ij} \label{matrix:330:15} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots &{*}& 1 & 0 & 0 & 0 & \cdots \\ \cdots &{*}&{*}& 1 & 0 & 0 & \cdots \\ \cdots &{*}&{*}&{*}& 1 & 0 & \cdots \\ \cdots &{*}&{*}&{*}& 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots }, \xn\\ U^{\{k\}} &=& I + \sum_{j=k}^\infty \, 
\sum_{\rule{0pt}{\tlj}i=-\infty}^{j-1} \alpha_{ij}E_{ij} \label{matrix:330:16} \\&=& \mf{ccccccc}{ \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 1 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 1 &{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 1 &{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots } \xn \eqa for the supplementary forms.% \footnote{ The notation is arguably imperfect in that $L^{\{k\}}+L^{[k]}-I \neq L$ but rather that $L^{\{k\}}+L^{[k+1]}-I = L$. The conventional notation $\sum_{k=a}^b f(k) + \sum_{k=b}^c f(k) \neq \sum_{k=a}^c f(k)$ suffers the same arguable imperfection. } Such notation is not standard in the literature, but it serves a purpose in this book and is introduced here for this reason. If names are needed for $L^{[k]}$, $U^{[k]}$, $L^{\{k\}}$ and $U^{\{k\}}$, the former pair can be called \emph{minor partial unit triangular matrices,} and the latter pair, \emph{major partial unit triangular matrices.} Whether minor or major, the partial unit triangular matrix is a matrix which leftward or rightward of the $k$th column resembles~$I$. Of course partial unit triangular matrices which resemble~$I$ above or below the $k$th \emph{row} are equally possible, and can be denoted $L^{[k]T}$, $U^{[k]T}$, $L^{\{k\}T}$ and $U^{\{k\}T}$. Observe that the parallel unit triangular matrices~$L_\|^{\{k\}}$ and~$U_\|^{\{k\}}$ of \S~\ref{matrix:330.50} are in fact also major partial unit triangular matrices, as the notation suggests. % ---------------------------------------------------------------------- \section{The shift operator} \label{matrix:340} \index{shift operator} Not all useful matrices fit the dimension-limited and extended-operational forms of \S~\ref{matrix:180}. An exception is the \emph{shift operator}~$H_k$, defined that \bq{matrix:340:10} [H_k]_{ij} = \delta_{i(j+k)}. \eq For example, \[ H_2 = \mf{ccccccccc}{ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&0&0&0&0&0&0&0&\cdots\\ \cdots&0&0&0&0&0&0&0&\cdots\\ \cdots&1&0&0&0&0&0&0&\cdots\\ \cdots&0&1&0&0&0&0&0&\cdots\\ \cdots&0&0&1&0&0&0&0&\cdots\\ \cdots&0&0&0&1&0&0&0&\cdots\\ \cdots&0&0&0&0&1&0&0&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots& }. \] Operating~$H_kA$, $H_k$ shifts $A$'s rows downward~$k$ steps. Operating~$AH_k$, $H_k$ shifts $A$'s columns leftward~$k$ steps. Inasmuch as the shift operator shifts all rows or columns of the matrix it operates on, its active region is $\infty \times \infty$ in extent. Obviously, the shift operator's inverse, transpose and adjoint are the same: \bq{matrix:340:60} \renewcommand\arraystretch{1.3} \br{c} H_k^T H_k = H_k^{*} H_k = I = H_k H_k^{*} = H_k H_k^T, \\ H_k^{-1} = H_k^{T} = H_k^{*} = H_{-k}. \er \eq Further obvious but useful identities include that \bq{matrix:340:61} \begin{split} (I_\ell-I_k)H_k &= H_kI_{\ell-k}, \\ H_{-k}(I_\ell-I_k) &= I_{\ell-k}H_{-k}. \end{split} \eq % ---------------------------------------------------------------------- \section{The Jacobian derivative} \label{matrix:350} \index{Jacobian derivative} \index{derivative!Jacobian} \index{Jacobi, Carl Gustav Jacob (1804--1851)} Chapter~\ref{drvtv} has introduced the derivative of a function with respect to a scalar variable. One can also take the derivative of a function with respect to a vector variable, and the function itself can be vector-valued. 
The derivative is \bq{matrix:350:Jacobian} \left[\frac{d\ve f}{d\ve x}\right]_{ij} = \frac{\partial f_i}{\partial x_j}. \eq For instance, if~$\ve x$ has three elements and~$\ve f$ has two, then \[ \renewcommand\arraystretch{2.0} \frac{d\ve f}{d\ve x} = \mf{ccc}{ \ds\frac{\partial f_1}{\partial x_1} & \ds\frac{\partial f_1}{\partial x_2} & \ds\frac{\partial f_1}{\partial x_3} \\ \ds\frac{\partial f_2}{\partial x_1} & \ds\frac{\partial f_2}{\partial x_2} & \ds\frac{\partial f_2}{\partial x_3} }. \] This is called the \emph{Jacobian derivative,} the \emph{Jacobian matrix,} or just the \emph{Jacobian.}% \footnote{\cite[``Jacobian,'' 00:50, 15 Sept. 2007]{wikip}} Each of its columns is the derivative with respect to one element of~$\ve x$. The Jacobian derivative of a vector with respect to itself is \bq{matrix:350:Jacobian-self} \frac{d\ve x}{d\ve x} = I. \eq The derivative is not~$I_n$ as one might think, because, even if~$\ve x$ has only~$n$ elements, still, one could vary $x_{n+1}$ in principle, and $\partial x_{n+1}/\partial x_{n+1} \neq 0$. \index{derivative!product rule for} \index{product rule, derivative} The Jacobian derivative obeys the derivative product rule~(\ref{drvtv:prod2}) in the form% \footnote{ Notice that the last term on~(\ref{matrix:350:Jacobian-prod})'s second line is transposed, not adjointed. } \settowidth\tla{\scriptsize${*}$} \bq{matrix:350:Jacobian-prod} \begin{split} \frac{d}{d\ve x}\big(\ve g^{\makebox[\tla][l]{\scriptsize${T}$}} A \ve f\big) &= \Bigg[ \ve g^{\makebox[\tla][l]{\scriptsize${T}$}} A \left(\frac{d\ve f}{d\ve x}\right) \Bigg] + \Bigg[ \left(\frac{d\ve g}{d\ve x}\right)^{\makebox[\tla][l]{\scriptsize${T}$}} A \ve f \Bigg]^{T}, \\ \frac{d}{d\ve x}\big(\ve g^{\makebox[\tla][l]{\scriptsize${*}$}} A \ve f\big) &= \Bigg[ \ve g^{\makebox[\tla][l]{\scriptsize${*}$}} A \left(\frac{d\ve f}{d\ve x}\right) \Bigg] + \Bigg[ \left(\frac{d\ve g}{d\ve x}\right)^{\makebox[\tla][l]{\scriptsize${*}$}} A \ve f \Bigg]^{T}, \end{split} \eq valid for any constant matrix~$A$---as is seen by applying the definition~(\ref{drvtv:defz}) of the derivative, which here is \[ \frac{\partial \left(\ve g^{*} A \ve f\right)}{\partial x_j} = \lim_{\partial x_j\rightarrow 0} \frac{ (\ve g + \partial \ve g/2)^{*}A(\ve f + \partial \ve f/2) -(\ve g - \partial \ve g/2)^{*}A(\ve f - \partial \ve f/2) }{\partial x_j}, \] and simplifying. % ---------------------------------------------------------------------- The shift operator of \S~\ref{matrix:340} and the Jacobian derivative of this section complete the family of matrix rudiments we shall need to begin to do increasingly interesting things with matrices in Chs.~\ref{mtxinv} and~\ref{eigen}. Before doing interesting things, however, we must treat two more foundational matrix matters. The two are the Gauss-Jordan decomposition and the matter of matrix rank, which will be the subjects of Ch.~\ref{gjrank}, next. derivations-0.53.20120414.orig/tex/xkvtxhdr.tex0000644000000000000000000000541411742575144017543 0ustar rootroot%% %% This is file `xkvtxhdr.tex', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvheader') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. 
%% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% %% %% Taken from latex.ltx. %% \message{2005/02/22 v1.1 xkeyval TeX header (HA)} \def\@nnil{\@nil} \def\@empty{} \def\newif#1{% \count@\escapechar \escapechar\m@ne \let#1\iffalse \@if#1\iftrue \@if#1\iffalse \escapechar\count@} \def\@if#1#2{% \expandafter\def\csname\expandafter\@gobbletwo\string#1% \expandafter\@gobbletwo\string#2\endcsname {\let#1#2}} \long\def\@ifnextchar#1#2#3{% \let\reserved@d=#1% \def\reserved@a{#2}% \def\reserved@b{#3}% \futurelet\@let@token\@ifnch} \def\@ifnch{% \ifx\@let@token\@sptoken \let\reserved@c\@xifnch \else \ifx\@let@token\reserved@d \let\reserved@c\reserved@a \else \let\reserved@c\reserved@b \fi \fi \reserved@c} \def\:{\let\@sptoken= } \: % this makes \@sptoken a space token \def\:{\@xifnch} \expandafter\def\: {\futurelet\@let@token\@ifnch} \let\kernel@ifnextchar\@ifnextchar \long\def\@testopt#1#2{% \kernel@ifnextchar[{#1}{#1[{#2}]}} \long\def\@firstofone#1{#1} \long\def \@gobble #1{} \long\def \@gobbletwo #1#2{} \def\@expandtwoargs#1#2#3{% \edef\reserved@a{\noexpand#1{#2}{#3}}\reserved@a} \edef\@backslashchar{\expandafter\@gobble\string\\} \newif\ifin@ \def\in@#1#2{% \def\in@@##1#1##2##3\in@@{% \ifx\in@##2\in@false\else\in@true\fi}% \in@@#2#1\in@\in@@} \def\strip@prefix#1>{} \def \@onelevel@sanitize #1{% \edef #1{\expandafter\strip@prefix \meaning #1}% } \endinput %% %% End of file `xkvtxhdr.tex'. derivations-0.53.20120414.orig/tex/cubic.tex0000644000000000000000000007437111742566274016762 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Cubics and quartics} \label{cubic} \index{root} \index{cubic expression} \index{quartic expression} \index{algebra!higher-order} \index{higher-order algebra} Under the heat of noonday, between the hard work of the morning and the heavy lifting of the afternoon, one likes to lay down one's burden and rest a spell in the shade. Chapters~\ref{alggeo} through~\ref{inttx} have established the applied mathematical foundations upon which coming chapters will build; and Ch.~\ref{matrix}, hefting the weighty topic of the matrix, will indeed begin to build on those foundations. But in this short chapter which rests between, we shall refresh ourselves with an interesting but lighter mathematical topic: the topic of cubics and quartics. \index{linear expression} \index{quadratic expression} \index{Newton-Raphson iteration} \index{antiquity} The expression \[ z + a_0 \] is a \emph{linear} polynomial, the lone root $z = -a_0$ of which is plain to see. The \emph{quadratic} polynomial \[ z^2 + a_1z + a_0 \] has of course two roots, which though not plain to see the quadratic formula~(\ref{alggeo:240}) extracts with little effort. 
So much algebra has been known since antiquity. The roots of higher-order polynomials, the Newton-Raphson iteration~(\ref{drvtv:NR}) locates swiftly, but that is an approximate iteration rather than an exact formula like~(\ref{alggeo:240}), and as we have seen in \S~\ref{drvtv:270} it can occasionally fail to converge. One would prefer an actual formula to extract the roots. \index{Cardano, Girolamo (also known as Cardanus or Cardan, 1501--1576)} \index{Tartaglia, Niccol\`o Fontana (1499--1557)} \index{Ferrari, Lodovico (1522--1565)} \index{Vieta, Franciscus (Fran\c cois Vi\`ete, 1540--1603)} \index{$n$th-order expression} \index{16th century} No general formula to extract the roots of the $n$th-order polynomial seems to be known.% \footnote{ Refer to Ch.~\ref{noth}'s footnote~\ref{noth:320:fn20}. } However, to extract the roots of the \emph{cubic} and \emph{quartic} polynomials \bqb z^3 + a_2z^2 + a_1z + a_0, && \\ z^4 + a_3z^3 + a_2z^2 + a_1z + a_0, && \eqb though the ancients never discovered how, formulas do exist. The 16th-century algebraists Ferrari, Vieta, Tartaglia and Cardano have given us the clever technique. This chapter explains.% \footnote{% \cite[``Cubic equation'']{EWW}% \cite[``Quartic equation'']{EWW}% \cite[``Quartic equation,'' 00:26, 9~Nov.\ 2006]{wikip}% \cite[``Fran\c cois Vi\`ete,'' 05:17, 1~Nov.\ 2006]{wikip}% \cite[``Gerolamo Cardano,'' 22:35, 31~Oct.\ 2006]{wikip}% \cite[\S~1.5]{SRW} } % There is a controversy regarding Cardano, as to whether he plagiarized % his most celebrated results. The author does not know the facts of it % but feels some unaccountable suspicion regarding the accusation. Why % we should believe the accuser and not the accused in this case % remains unclear to the author, whose intuition warns him to suspect % magnification of the controversy by someone who likes to magnify. (If % Tartaglia really did tell Cardano his secrets and then swear Cardano % to secrecy, we have only Tartaglia's putative word for it as far as % the author knows. Besides, swearing someone to secrecy? What is % Tartaglia supposed to have been, a would-be member of the ancient % Pythagorean cult? If so, that's a bit weird. Why didn't Tartaglia % just publish his formula, if he was so worried about losing credit for % it, eh? Well, the author wonders if Tartaglia and Cardano weren't % both pretty normal, in fact, and if the story of their conflict hasn't % received some---shall we say---dramatic embellishment.) The author % does not wish to be unnecessarily credulous, nor is he really % interested in researching the matter; so, in this book, Cardano is % treated as innocent until proven guilty. Tartaglia, too. % % Well, enough of that. Back to the book. % ---------------------------------------------------------------------- \section{Vieta's transform} \label{cubic:200} \index{Vieta's transform} \index{Vieta, Franciscus (Fran\c cois Vi\`ete, 1540--1603)} \index{Vieta's substitution} There is a sense to numbers by which~$1/2$ resembles~$2$, $1/3$ resembles~$3$, $1/4$ resembles~$4$, and so forth. To capture this sense, one can transform a function $f(z)$ into a function $f(w)$ by the change of variable% \footnote{ This change of variable broadly recalls the sum-of-exponentials form~(\ref{cexp:250:cosh}) of the $\cosh(\cdot)$ function, inasmuch as $\exp[-\phi] = 1/\exp\phi$. } \[ w + \frac{1}{w} \la z, \] or, more generally, \bq{cubic:200:10} w + \frac{w_o^2}{w} \la z. 
\eq Equation~(\ref{cubic:200:10}) is \emph{Vieta's transform.}% \footnote{ Also called ``Vieta's substitution.''\ % \cite[``Vieta's substitution'']{EWW} } \index{corner value} For $\left|w\right| \gg \left|w_o\right|$, we have that $z \approx w$; but as~$\left|w\right|$ approaches~$\left|w_o\right|$ this ceases to be true. For $\left|w\right| \ll \left|w_o\right|$, $z \approx w_o^2/w$. The constant~$w_o$ is the \emph{corner value,} in the neighborhood of which~$w$ transitions from the one domain to the other. Figure~\ref{cubic:200:Vieta-fig} plots Vieta's transform for real~$w$ in the case $w_o=1$. \begin{figure} \caption[Vieta's transform, plotted logarithmically.] {Vieta's transform~(\ref{cubic:200:10}) for $w_o=1$, plotted logarithmically.} \label{cubic:200:Vieta-fig} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.0} \nc\fyb{3.7} \nc\xxa{3.0} \nc\xxb{0.6} \nc\xxc{2.3} \nc\xxd{2.4} \nc\xxm{0.10} \nc\xxn{0.90} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxa,0)(\xxa,0) \psline(0,-\xxb)(0,\xxa) \uput[r](\xxa,0){$\ln w$} \uput[u](0,\xxa){$\ln z$} \psline[linestyle=dashed](0,0)( \xxc,\xxc) \psline[linestyle=dashed](0,0)(-\xxc,\xxc) \psline( \xxn,-\xxm)( \xxn,\xxm) \psline(-\xxn,-\xxm)(-\xxn,\xxm) \psline(-\xxm,\xxn)(\xxm,\xxn) } \psplot[linewidth=2.0pt,plotpoints=200]{-\xxd}{\xxd}{ 2.7183 x 0.9 div exp dup 1.0 exch div add ln 0.9 mul } } \end{pspicture} \ec \end{figure} \index{Vieta's parallel transform} \index{parallel addition} \index{addition!parallel} An interesting alternative to Vieta's transform is \bq{cubic:200:15} w \,\|\, \frac{w_o^2}{w} \la z, \eq which in light of \S~\ref{noth:420} might be named \emph{Vieta's parallel transform.} Section~\ref{cubic:220} shows how Vieta's transform can be used. % ---------------------------------------------------------------------- \section{Cubics} \label{cubic:220} \index{cubic expression} The general cubic polynomial is too hard to extract the roots of directly, so one begins by changing the variable \bq{cubic:220:10} x + h \la z \eq to obtain the polynomial \[ x^3 + (a_2+3h)x^2 + (a_1+2ha_2+3h^2)x + (a_0+ha_1+h^2a_2+h^3). \] The choice \bq{cubic:220:15} h \equiv -\frac{a_2}{3} \eq casts the polynomial into the improved form \[ x^3 + \left[a_1-\frac{a_2^2}{3}\right]x + \left[a_0-\frac{a_1a_2}{3}+2\left(\frac{a_2}{3}\right)^3\right], \] or better yet \[ x^3 - px - q, \] where \bq{cubic:220:20} \begin{split} p &\equiv -a_1+\frac{a_2^2}{3}, \\ q &\equiv -a_0+\frac{a_1a_2}{3}-2\left(\frac{a_2}{3}\right)^3. \end{split} \eq The solutions to the equation \bq{cubic:220:25} x^3 = px + q, \eq then, are the cubic polynomial's three roots. \index{Vieta's transform} So we have struck the~$a_2z^2$ term. That was the easy part; what to do next is not so obvious. If one could strike the~$px$ term as well, then the roots would follow immediately, but no very simple substitution like~(\ref{cubic:220:10}) achieves this---or rather, such a substitution does achieve it, but at the price of reintroducing an unwanted~$x^2$ or~$z^2$ term. That way is no good. Lacking guidance, one might try many, various substitutions, none of which seems to help much; but after weeks or months of such frustration one might eventually discover Vieta's transform~(\ref{cubic:200:10}), with the idea of balancing the equation between offsetting~$w$ and~$1/w$ terms. This works. 
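(A reader who wishes first to verify the reduction numerically might run the short Python sketch below. It is an illustrative aside only, not part of the derivation; it assumes the NumPy library and borrows NumPy's numerical root finder merely to confirm, for a sample cubic, that each root shifted per~(\ref{cubic:220:10}) satisfies~(\ref{cubic:220:25}) with the~$p$ and~$q$ of~(\ref{cubic:220:20}).)
\begin{verbatim}
# Illustrative sketch, not from the book: numerical check of (220:10)-(220:25).
# Assumes Python with NumPy.  The sample cubic is z^3 - z^2 + z - 1.
import numpy as np

a2, a1, a0 = -1.0, 1.0, -1.0
p = -a1 + a2**2/3                        # per (220:20)
q = -a0 + a1*a2/3 - 2*(a2/3)**3

for z in np.roots([1.0, a2, a1, a0]):    # the three roots of the original cubic
    x = z + a2/3                         # undo the shift (220:10), z = x - a2/3
    print(np.isclose(x**3, p*x + q))     # True: the shifted root satisfies (220:25)
\end{verbatim}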
Vieta-transforming~(\ref{cubic:220:25}) by the change of variable \bq{cubic:200:28} w + \frac{w_o^2}{w} \la x \eq we get the new equation \bq{cubic:220:35} w^3 + (3w_o^2-p)w + (3w_o^2-p)\frac{w_o^2}{w} + \frac{w_o^6}{w^3} = q, \eq which invites the choice \bq{cubic:220:40} w_o^2 \equiv \frac{p}{3}, \eq reducing~(\ref{cubic:220:35}) to read \[ w^3 + \frac{(p/3)^3}{w^3} = q. \] Multiplying by~$w^3$ and rearranging terms, we have the quadratic equation \bq{cubic:220:45} (w^3)^2 = 2\left(\frac{q}{2}\right)w^3 - \left(\frac{p}{3}\right)^3, \eq which by~(\ref{alggeo:240:quad}) we know how to solve. \index{quadratic expression} Vieta's transform has reduced the original cubic to a quadratic. \index{coefficient!inscrutable} The careful reader will observe that~(\ref{cubic:220:45}) seems to imply six roots, double the three the fundamental theorem of algebra (\S~\ref{noth:320.30}) allows a cubic polynomial to have. We shall return to this point in \S~\ref{cubic:235}. For the moment, however, we should like to improve the notation by defining% \footnote{ Why did we not define~$P$ and~$Q$ so to begin with? Well, before unveiling~(\ref{cubic:220:45}), we lacked motivation to do so. To define inscrutable coefficients unnecessarily before the need for them is apparent seems poor applied mathematical style. } \bq{cubic:220:50} \begin{split} P &\la -\frac{p}{3}, \\ Q &\la +\frac{q}{2}, \end{split} \eq with which~(\ref{cubic:220:25}) and~(\ref{cubic:220:45}) are written \bqa x^3 &=& 2Q - 3Px, \label{cubic:220:55} \\ (w^3)^2 &=& 2Qw^3 + P^3. \label{cubic:220:58} \eqa Table~\ref{cubic:cubic-table} summarizes the complete cubic polynomial root extraction meth\-od in the revised notation---including a few fine points regarding superfluous roots and edge cases, treated in \S\S~\ref{cubic:235} and~\ref{cubic:240} below. \begin{table} \caption[A method to extract the three roots of the general cubic.] {A method to extract the three roots of the general cubic polynomial. (In the definition of~$w^3$, one can choose either sign.)} \label{cubic:cubic-table} \index{cubic expression!roots of} \index{root extraction!from a cubic polynomial} \index{cubic formula} \settowidth\tla{$Q \pm \sqrt{ Q^2 + P^3 }$} \bqb 0 &=& z^3 + a_2z^2 + a_1z + a_0 \\ P &\equiv& \frac{a_1}{3}-\left(\frac{a_2}{3}\right)^2 \\ Q &\equiv& \frac{1}{2}\left[-a_0+3\left(\frac{a_1}{3}\right)\left(\frac{a_2}{3}\right)-2\left(\frac{a_2}{3}\right)^3\right] \\ w^3 &\equiv& \left\{ \br{ll} 2Q & \mbox{if $P=0$,} \\ Q \pm \sqrt{ Q^2 + P^3 } & \mbox{otherwise.} \er \right. \\ x &\equiv& \left\{ \br{ll} \makebox[\tla][l]{$0$} & \mbox{if $P=0$ and $Q=0$}, \\ w - P/w & \mbox{otherwise.} \er \right. \\ z &=& x - \frac{a_2}{3} \eqb \end{table} % ---------------------------------------------------------------------- \section{Superfluous roots} \label{cubic:235} \index{superfluous root} \index{root!superfluous} As \S~\ref{cubic:220} has observed, the equations of Table~\ref{cubic:cubic-table} seem to imply six roots, double the three the fundamental theorem of algebra (\S~\ref{noth:320.30}) allows a cubic polynomial to have. However, what the equations really imply is not six distinct roots but six distinct~$w$. The definition $x \equiv w - P/w$ maps two~$w$ to any one~$x$, so in fact the equations imply only three~$x$ and thus three roots~$z$. The question then is: of the six~$w$, which three do we really need and which three can we ignore as superfluous? 
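Before answering symbolically, one can at least experiment. The following Python sketch is an illustrative aside only, not part of the proof; it needs nothing beyond the standard library's \texttt{cmath} module, and the function name \texttt{cubic\_roots} is the sketch's own. It implements the method of Table~\ref{cubic:cubic-table} and finds, for a sample cubic, that either sign in the table's~$w^3$ returns the same three roots~$z$, just as the section is about to prove.
\begin{verbatim}
# Illustrative sketch, not from the book: Table cubic-table's method in Python,
# using only the standard library's cmath module.
import cmath

def cubic_roots(a2, a1, a0, sign=+1):
    # roots of z^3 + a2 z^2 + a1 z + a0
    P = a1/3 - (a2/3)**2
    Q = (-a0 + 3*(a1/3)*(a2/3) - 2*(a2/3)**3) / 2
    if P == 0 and Q == 0:
        return [-a2/3]*3                          # corner case: the triple root
    w3 = 2*Q if P == 0 else Q + sign*cmath.sqrt(Q*Q + P**3)
    w1 = complex(w3)**(1/3)                       # any one cube root of w^3 serves
    roots = []
    for m in range(3):
        w = w1*cmath.exp(2j*cmath.pi*m/3)         # the three cube roots of w^3
        x = w - P/w
        roots.append(x - a2/3)
    return roots

key = lambda z: (round(z.real, 6), round(z.imag, 6))
plus  = sorted(cubic_roots(-1.0, 1.0, -1.0, +1), key=key)   # z^3 - z^2 + z - 1
minus = sorted(cubic_roots(-1.0, 1.0, -1.0, -1), key=key)
print(all(abs(a - b) < 1e-9 for a, b in zip(plus, minus)))  # True: same roots 1, +i, -i
\end{verbatim}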
The six~$w$ naturally come in two groups of three: one group of three from the one~$w^3$ and a second from the other. For this reason, we will guess---and logically it is only a guess---that a single~$w^3$ generates three distinct~$x$ and thus (because~$z$ differs from~$x$ only by a constant offset) all three roots~$z$. If the guess is right, then the second~$w^3$ cannot but yield the same three roots, which means that the second~$w^3$ is superfluous and can safely be overlooked. But is the guess right? Does a single~$w^3$ in fact generate three distinct~$x$? \index{squaring} \index{cubing} To prove that it does, let us suppose that it did not. Let us suppose that a single~$w^3$ did generate two~$w$ which led to the same~$x$. Letting the symbol~$w_1$ represent the third~$w$, then (since all three~$w$ come from the same~$w^3$) the two~$w$ are $e^{+i2\pi/3}w_1$ and $e^{-i2\pi/3}w_1$. Because $x \equiv w - P/w$, by successive steps, \bqb e^{+i2\pi/3}w_1 - \frac{P}{e^{+i2\pi/3}w_1} &=& e^{-i2\pi/3}w_1 - \frac{P}{e^{-i2\pi/3}w_1}, \\ e^{+i2\pi/3}w_1 + \frac{P}{e^{-i2\pi/3}w_1} &=& e^{-i2\pi/3}w_1 + \frac{P}{e^{+i2\pi/3}w_1}, \\ e^{+i2\pi/3}\left( w_1 + \frac{P}{w_1} \right) &=& e^{-i2\pi/3}\left( w_1 + \frac{P}{w_1} \right), \eqb which can only be true if \[ w_1^2 = -P. \] Cubing% \footnote{ The verb \emph{to cube} in this context means ``to raise to the third power,'' as to change~$y$ to~$y^3$, just as the verb \emph{to square} means ``to raise to the second power.'' } the last equation, \[ w_1^6 = -P^3; \] but squaring the table's~$w^3$ definition for $w=w_1$, \[ w_1^6 = 2Q^2 + P^3 \pm 2Q\sqrt{ Q^2 + P^3 }. \] Combining the last two on~$w_1^6$, \[ -P^3 = 2Q^2 + P^3 \pm 2Q\sqrt{ Q^2 + P^3 }, \] or, rearranging terms and halving, \[ Q^2 + P^3 = \mp Q\sqrt{ Q^2 + P^3 }. \] Squaring, \[ Q^4 + 2Q^2P^3 + P^6 = Q^4 + Q^2P^3, \] then canceling offsetting terms and factoring, \[ (P^3)(Q^2 + P^3) = 0. \] The last equation demands rigidly that either $P=0$ or $P^3=-Q^2$. Some cubic polynomials do meet the demand---\S~\ref{cubic:240} will treat these and the reader is asked to set them aside for the moment---but most cubic polynomials do not meet it. For most cubic polynomials, then, the contradiction proves false the assumption which gave rise to it. The assumption: that the three~$x$ descending from a single~$w^3$ were not distinct. Therefore, provided that $P \neq 0$ and $P^3 \neq -Q^2$, the three~$x$ descending from a single~$w^3$ are indeed distinct, as was to be demonstrated. The conclusion: \emph{either, not both, of the two signs in the table's quadratic solution $w^3 \equiv Q \pm \sqrt{ Q^2 + P^3 }$ demands to be considered.} One can choose either sign; it matters not which.% \footnote{ Numerically, it can matter. As a simple rule, because~$w$ appears in the denominator of $x$'s definition, when the two~$w^3$ differ in magnitude one might choose the larger. } The one sign alone yields all three roots of the general cubic polynomial. In calculating the three~$w$ from~$w^3$, one can apply the Newton-Raphson iteration~(\ref{drvtv:270:35}), the Taylor series of Table~\ref{taylor:315:tbl}, or any other convenient root-finding technique to find a single root~$w_1$ such that $w_1^3 = w^3$. Then the other two roots come easier. They are $e^{\pm i2\pi/3}w_1$; but $e^{\pm i2\pi/3} = (-1 \pm i\sqrt 3)/2$, so \bq{cubic:235:60} w = w_1, \frac{-1 \pm i\sqrt 3}{2} w_1. 
\eq \index{double root} \index{root!double} We should observe, incidentally, that nothing prevents two actual roots of a cubic polynomial from having the same value. This certainly is possible, and it does not mean that one of the two roots is superfluous or that the polynomial has fewer than three roots. For example, the cubic polynomial $(z-1)(z-1)(z-2) = z^3 - 4z^2 + 5z - 2$ has roots at~$1$, $1$ and~$2$, with a single root at $z=2$ and a double root---that is, two roots---at $z=1$. When this happens, the method of Table~\ref{cubic:cubic-table} properly yields the single root once and the double root twice, just as it ought to do. % ---------------------------------------------------------------------- \section{Edge cases} \label{cubic:240} \index{edge case} Section~\ref{cubic:235} excepts the edge cases $P=0$ and $P^3=-Q^2$. Mostly the book does not worry much about edge cases, but the effects of these cubic edge cases seem sufficiently nonobvious that the book might include here a few words about them, if for no other reason than to offer the reader a model of how to think about edge cases on his own. Table~\ref{cubic:cubic-table} gives the quadratic solution \[ w^3 \equiv Q \pm \sqrt{ Q^2 + P^3 }, \] in which \S~\ref{cubic:235} generally finds it sufficient to consider either of the two signs. In the edge case $P=0$, \[ w^3 = 2Q \ \mbox{or}\ 0. \] In the edge case $P^3=-Q^2$, \[ w^3 = Q. \] Both edge cases are interesting. In this section, we shall consider first the edge cases themselves, then their effect on the proof of \S~\ref{cubic:235}. The edge case $P=0$, like the general non-edge case, gives two distinct quadratic solutions~$w^3$. One of the two however is $w^3=Q-Q=0$, which is awkward in light of Table~\ref{cubic:cubic-table}'s definition that $x \equiv w-P/w$. For this reason, in applying the table's method when $P=0$, one chooses the other quadratic solution, $w^3 = Q + Q = 2Q$. The edge case $P^3=-Q^2$ gives only the one quadratic solution $w^3=Q$; or more precisely, it gives two quadratic solutions which happen to have the same value. This is fine. One merely accepts that $w^3=Q$, and does not worry about choosing one~$w^3$ over the other. \index{triple root} \index{root!triple} \index{corner case} The double edge case, or \emph{corner case,} arises where the two edges meet---where $P=0$ and $P^3=-Q^2$, or equivalently where $P=0$ and $Q=0$. At the corner, the trouble is that $w^3 = 0$ and that no alternate~$w^3$ is available. However, according to~(\ref{cubic:220:55}), $x^3 = 2Q - 3Px$, which in this case means that $x^3 = 0$ and thus that $x = 0$ absolutely, no other~$x$ being possible. This implies the triple root $z=-a_2/3$. Section~\ref{cubic:235} has excluded the edge cases from its proof of the sufficiency of a single~$w^3$. Let us now add the edge cases to the proof. In the edge case $P^3=-Q^2$, both~$w^3$ are the same, so the one~$w^3$ suffices by default because the other~$w^3$ brings nothing different. The edge case $P=0$ however does give two distinct~$w^3$, one of which is $w^3=0$, which puts an awkward $0/0$ in the table's definition of~$x$. We address this edge in the spirit of l'H\^opital's rule, by sidestepping it, changing~$P$ infinitesimally from $P=0$ to $P=\ep$. Then, choosing the~$-$ sign in the definition of~$w^3$, \bqb w^3 &=& Q - \sqrt{Q^2+\ep^3} = Q - (Q)\left(1+\frac{\ep^3}{2Q^2}\right) = -\frac{\ep^3}{2Q}, \\ w &=& -\frac{\ep}{(2Q)^{1/3}}, \\ x &=& w - \frac{\ep}{w} = -\frac{\ep}{(2Q)^{1/3}} + (2Q)^{1/3} = (2Q)^{1/3}. 
\eqb But choosing the~$+$ sign, \bqb w^3 &=& Q + \sqrt{Q^2+\ep^3} = 2Q, \\ w &=& (2Q)^{1/3}, \\ x &=& w - \frac{\ep}{w} = (2Q)^{1/3} - \frac{\ep}{(2Q)^{1/3}} = (2Q)^{1/3}. \eqb Evidently the roots come out the same, either way. This completes the proof. % ---------------------------------------------------------------------- \section{Quartics} \label{cubic:250} \index{quartic expression} \index{Ferrari, Lodovico (1522--1565)} Having successfully extracted the roots of the general cubic polynomial, we now turn our attention to the general quartic. The kernel of the cubic technique lay in reducing the cubic to a quadratic. The kernel of the quartic technique lies likewise in reducing the quartic to a cubic. The details differ, though; and, strangely enough, in some ways the quartic reduction is actually the simpler.% \footnote{% \label{cubic:250:09}% Even stranger, historically Ferrari discovered it earlier \cite[``Quartic equation'']{EWW}\@. Apparently Ferrari discovered the quartic's resolvent cubic~(\ref{cubic:250:30}), which he could not solve until Tartaglia applied Vieta's transform to it. What motivated Ferrari to chase the quartic solution while the cubic solution remained still unknown, this writer does not know, but one supposes that it might make an interesting story. The reason the quartic is simpler to reduce is probably related to the fact that $(1)^{1/4} = \pm 1, \pm i$, whereas $(1)^{1/3} = 1, (-1 \pm i\sqrt 3)/2$. The $(1)^{1/4}$ brings a much neater result, the roots lying nicely along the Argand axes. This may also be why the quintic is intractable---but here we trespass the professional mathematician's territory and stray from the scope of this book. See Ch.~\ref{noth}'s footnote~\ref{noth:320:fn20}. } \index{cleverness} As with the cubic, one begins solving the quartic by changing the variable \bq{cubic:250:10} x + h \la z \eq to obtain the equation \bq{cubic:250:15} x^4 = sx^2 + px + q, \eq where \bq{cubic:250:20} \nc\xh{\ensuremath{\frac{a_3}{4}}} \nc\yh{\ensuremath{\left(\xh\right)}} \begin{split} h &\equiv -\xh, \\ %s &\equiv 6h^2 + 3h a_3 + a_2, \\ %p &\equiv 4h^3 + 3h^2a_3 + 2h a_2 + a_1, \\ %q &\equiv h^4 + h^3a_3 + h^2a_2 + ha_1 + a_0. s &\equiv -a_2 + 6\yh^2, \\ p &\equiv -a_1 + 2a_2\yh - 8\yh^3, \\ q &\equiv -a_0 + a_1\yh - a_2\yh^2 + 3\yh^4. \end{split} \eq To reduce~(\ref{cubic:250:15}) further, one must be cleverer. Ferrari% \footnote{\cite[``Quartic equation'']{EWW}} supplies the cleverness. The clever idea is to transfer some but not all of the~$sx^2$ term to the equation's left side by \[ x^4 + 2ux^2 = (2u+s)x^2 + px + q, \] where~$u$ remains to be chosen; then to complete the square on the equation's left side as in \S~\ref{alggeo:240}, but with respect to~$x^2$ rather than~$x$, as \bq{cubic:250:24} \left( x^2 + u \right)^2 = k^2x^2 + px + j^2, \eq where \bq{cubic:250:25} \begin{split} k^2 &\equiv 2u+s, \\ j^2 &\equiv u^2+q. \end{split} \eq Now, one must regard~(\ref{cubic:250:24}) and~(\ref{cubic:250:25}) properly. In these equations,~$s$, $p$ and~$q$ have definite values fixed by~(\ref{cubic:250:20}), but not so~$u$, $j$ or~$k$. The variable~$u$ is completely free; we have introduced it ourselves and can assign it any value we like. And though~$j^2$ and~$k^2$ depend on~$u$, still, even after specifying~$u$ we remain free at least to choose signs for~$j$ and~$k$. As for~$u$, though no choice would truly be wrong, one supposes that a wise choice might at least render~(\ref{cubic:250:24}) easier to simplify. 
\index{constraint} So, what choice for~$u$ would be wise? Well, look at~(\ref{cubic:250:24}). The left side of that equation is a perfect square. The right side would be, too, if it were that $p = \pm 2jk$; so, arbitrarily choosing the~$+$ sign, we propose the constraint that \bq{cubic:250:27} p = 2jk, \eq or, better expressed, \bq{cubic:250:32} j = \frac{p}{2k}. \eq Squaring~(\ref{cubic:250:27}) and substituting for~$j^2$ and~$k^2$ from~(\ref{cubic:250:25}), we have that \[ p^2 = 4(2u+s)(u^2+q); \] or, after distributing factors, rearranging terms and scaling, that \bq{cubic:250:30} \index{quartic expression!resolvent cubic of} \index{resolvent cubic} 0 = u^3 + \frac{s}{2}u^2 + qu + \frac{4sq-p^2}{8}. \eq Equation~(\ref{cubic:250:30}) is the \emph{resolvent cubic,} which we know by Table~\ref{cubic:cubic-table} how to solve for~$u$, and which we now specify as a second constraint. If the constraints~(\ref{cubic:250:32}) and~(\ref{cubic:250:30}) are both honored, then we can safely substitute~(\ref{cubic:250:27}) into~(\ref{cubic:250:24}) to reach the form \[ \left( x^2 + u \right)^2 = k^2x^2 + 2jkx + j^2, \] which is \bq{cubic:250:35} \big( x^2 + u \big)^2 = \big( kx + j \big)^2. \eq The resolvent cubic~(\ref{cubic:250:30}) of course yields three~$u$ not one, but the resolvent cubic is a voluntary constraint, so we can just pick one~$u$ and ignore the other two. Equation~(\ref{cubic:250:25}) then gives~$k$ (again, we can just pick one of the two signs), and~(\ref{cubic:250:32}) then gives~$j$. With~$u$, $j$ and~$k$ established,~(\ref{cubic:250:35}) implies the quadratic \bq{cubic:250:40} x^2 = \pm ( kx + j ) - u, \eq which~(\ref{alggeo:240:quad}) solves as \bq{cubic:250:50} x = \pm \frac{k}{2} \,\pm_o \sqrt{ \left(\frac{k}{2}\right)^2 \pm j - u }, \eq wherein the two~$\pm$ signs are tied together but the third,~$\pm_o$ sign is independent of the two. Equation~(\ref{cubic:250:50}), with the other equations and definitions of this section, reveals the four roots of the general quartic polynomial. In view of~(\ref{cubic:250:50}), the change of variables \bq{cubic:250:60} \begin{split} K &\la \frac{k}{2}, \\ J &\la j, \end{split} \eq improves the notation. Using the improved notation, Table~\ref{cubic:quartic-table} summarizes the complete quartic polynomial root extraction method. \begin{table} \caption[A method to extract the four roots of the general quartic.] {A method to extract the four roots of the general quartic polynomial. (In the table, the resolvent cubic is solved for~$u$ by the method of Table~\ref{cubic:cubic-table}, where any one of the three resulting~$u$ serves. Either of the two~$K$ similarly serves. 
Of the three~$\pm$ signs in $x$'s definition, the~$\pm_o$ is independent but the other two are tied together, the four resulting combinations giving the four roots of the general quartic.)} \label{cubic:quartic-table} \index{quartic expression!roots of} \index{root extraction!from a quartic polynomial} \index{quartic formula} \nc\xh{\ensuremath{\frac{a_3}{4}}} \nc\yh{\ensuremath{\left(\xh\right)}} \bqb 0 &=& z^4 + a_3z^3 + a_2z^2 + a_1z + a_0 \\ s &\equiv& -a_2 + 6\yh^2 \\ p &\equiv& -a_1 + 2a_2\yh - 8\yh^3 \\ q &\equiv& -a_0 + a_1\yh - a_2\yh^2 + 3\yh^4 \\ 0 &=& u^3 + \frac{s}{2}u^2 + qu + \frac{4sq-p^2}{8} \\ K &\equiv& \pm \frac{\sqrt{2u+s}}{2} \\ J &\equiv& \begin{cases} \pm \sqrt{u^2+q} &\mbox{if $K=0$,} \\ p/4K &\mbox{otherwise.} \end{cases} \\ x &\equiv& \pm K \,\pm_o \sqrt{ K^2 \pm J - u } \\ z &=& x - \xh \eqb \end{table} % ---------------------------------------------------------------------- \section{Guessing the roots} \label{cubic:700} \index{guessing roots} \index{root!guessing of} \index{baroquity} It is entertaining to put pencil to paper and use Table~\ref{cubic:cubic-table}'s method to extract the roots of the cubic polynomial \[ 0 = [z-1][z-i][z+i] = z^3 - z^2 + z - 1. \] One finds that % P = 2/9, Q = 0xA/0x1B \bqb z &=& w+\frac{1}{3}-\frac{2}{3^2w}, \\ w^3 &\equiv& \frac{2 \left(5 + \sqrt{3^3}\right)}{3^3}, \eqb which says indeed that $z=1,\pm i$, but just you try to simplify it! A more baroque, more impenetrable way to write the number~$1$ is not easy to conceive. One has found the number~$1$ but cannot recognize it. Figuring the square and cube roots in the expression numerically, the root of the polynomial comes mysteriously to $1.0000$, but why? The root's symbolic form gives little clue. \index{quintic expression} In general no better way is known;% \footnote{ At least, no better way is known to this author. If any reader can straightforwardly simplify the expression without solving a cubic polynomial of some kind, the author would like to hear of it. } we are stuck with the cubic baroquity. However, to the extent to which a cubic, a quartic, a quintic or any other polynomial has real, rational roots, a trick is known to sidestep Tables~\ref{cubic:cubic-table} and~\ref{cubic:quartic-table} and guess the roots directly. Consider for example the quintic polynomial \[ z^5 - \frac 7 2 z^4 + 4 z^3 + \frac 1 2 z^2 - 5 z + 3. \] Doubling to make the coefficients all integers produces the polynomial \[ 2z^5 - 7z^4 + 8z^3 + 1z^2 - \mbox{0xA}z + 6, \] which naturally has the same roots. If the roots are complex or irrational, they are hard to guess; but if any of the roots happens to be real and rational, it must belong to the set \[ \left\{ \pm 1, \pm 2, \pm 3, \pm 6, \pm \frac 1 2, \pm \frac 2 2, \pm \frac 3 2, \pm \frac 6 2 \right\}. \] No other real, rational root is possible. Trying the several candidates on the polynomial, one finds that~$1$, $-1$ and~$3/2$ are indeed roots. Dividing these out leaves a quadratic which is easy to solve for the remaining roots. \index{relative primeness} \index{prime number!relative} \index{root!rational} \index{rational root} The real, rational candidates are the factors of the polynomial's trailing coefficient (in the example,~$6$, whose factors are~$\pm 1$, $\pm 2$, $\pm 3$ and~$\pm 6$) divided by the factors of the polynomial's leading coefficient (in the example,~$2$, whose factors are~$\pm 1$ and~$\pm 2$). The reason no other real, rational root is possible is seen% \footnote{ The presentation here is quite informal. 
We do not want to spend many pages on this. } by writing $z=p/q$---where~$p,q \in \mathbb Z$ are integers and the fraction $p/q$ is fully reduced---then multiplying the $n$th-order polynomial by~$q^n$ to reach the form \[ a_np^n + a_{n-1}p^{n-1}q + \cdots + a_1pq^{n-1} + a_0q^n = 0, \] where all the coefficients~$a_k$ are integers. Moving the~$q^n$ term to the equation's right side, we have that \[ \left( a_np^{n-1} + a_{n-1}p^{n-2}q + \cdots + a_1q^{n-1} \right) p = -a_0q^n, \] which implies that~$a_0q^n$ is a multiple of~$p$. But by demanding that the fraction $p/q$ be fully reduced, we have defined~$p$ and~$q$ to be \emph{relatively prime} to one another---that is, we have defined them to have no factors but~$\pm 1$ in common---so, not only~$a_0q^n$ but~$a_0$ itself is a multiple of~$p$. By similar reasoning,~$a_n$ is a multiple of~$q$. But if~$a_0$ is a multiple of~$p$, and~$a_n$, a multiple of~$q$, then~$p$ and~$q$ are factors of~$a_0$ and~$a_n$ respectively. We conclude for this reason, as was to be demonstrated, that no real, rational root is possible except a factor of~$a_0$ divided by a factor of~$a_n$.% \footnote{\cite[\S~3.2]{SRW}} Such root-guessing is little more than an algebraic trick, of course, but it can be a pretty useful trick if it saves us the embarrassment of inadvertently expressing simple rational numbers in ridiculous ways. One could write much more about higher-order algebra, but now that the reader has tasted the topic he may feel inclined to agree that, though the general methods this chapter has presented to solve cubics and quartics are interesting, further effort were nevertheless probably better spent elsewhere. The next several chapters turn to the topic of the matrix, harder but much more profitable, toward which we mean to put substantial effort. derivations-0.53.20120414.orig/tex/inttx.tex0000644000000000000000000014457511742566274017047 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Integration techniques} \label{inttx} \index{integration techniques} Equation~(\ref{drvtv:defz}) implies a general technique for calculating a derivative symbolically. Its counterpart~(\ref{integ:def}), unfortunately, implies a general technique only for calculating an integral \emph{numerically}---and even for this purpose it is imperfect; for, when it comes to adding an infinite number of infinitesimal elements, how is one actually to do the sum? It turns out that there is no one general answer to this question. Some functions are best integrated by one technique, some by another. It is hard to guess in advance which technique might work best. This chapter surveys several weapons of the intrepid mathematician's arsenal against the integral. % ---------------------------------------------------------------------- \section{Integration by antiderivative} \label{inttx:210} \index{integration!by antiderivative} \index{antiderivative} \index{integrand} The simplest way to solve an integral is just to look at it, recognizing its integrand to be the derivative of something already known:% \footnote{ The notation $f(\tau)|_a^z$ or $[f(\tau)]_a^z$ means $f(z)-f(a)$. } \bq{inttx:antider} \int_a^z \frac{df}{d\tau} \,d\tau = f(\tau)|_a^z. \eq For instance, \[ \int_1^x \frac{1}{\tau} \,d\tau = \ln\tau|_1^x = \ln x. \] One merely looks at the integrand $1/\tau$, recognizing it to be the derivative of $\ln\tau$, then directly writes down the solution $\ln \tau|_1^x$. Refer to \S~\ref{integ:230}. The technique by itself is pretty limited. 
However, the frequent object of other integration techniques is to transform an integral into a form to which this basic technique can be applied. \index{derivative!of $z^a/a$} Besides the essential \bq{inttx:210:10} \tau^{a-1} = \frac{d}{d\tau} \left(\frac{\tau^a}{a}\right), \eq Tables~\ref{integ:basic-antider}, \ref{cexp:drv}, \ref{cexp:drvi} and~\ref{inttx:470:tbl} provide several further good derivatives this antiderivative technique can use. \index{natural logarithm!and the antiderivative} \index{logarithm, natural!and the antiderivative} \index{antiderivative!and the natural logarithm} One particular, nonobvious, useful variation on the antiderivative technique seems worth calling out specially here. If $z=\rho e^{i\phi}$, then~(\ref{taylor:350:31}) and~(\ref{taylor:350:32}) have that \bq{inttx:intinvz} \int_{z_1}^{z_2} \frac{dz}{z} = \ln\frac{\rho_2}{\rho_1} + i(\phi_2-\phi_1). \eq This helps, for example, when~$z_1$ and~$z_2$ are real but negative numbers. % ---------------------------------------------------------------------- \section{Integration by substitution} \label{inttx:220} \index{integration!by substitution} Consider the integral \[ S = \int_{x_1}^{x_2}\frac{x\,dx}{1+x^2}. \] This integral is not in a form one immediately recognizes. However, with the change of variable \[ u \la 1 + x^2, \] whose differential is (by successive steps) \bqb d(u) &=& d(1+x^2), \\ du &=& 2x\,dx, \eqb the integral is \bqb S &=& \int_{x=x_1}^{x_2}\frac{x\,dx}{u} \\&=& \int_{x=x_1}^{x_2}\frac{2x\,dx}{2u} \\&=& \int_{u=1+x_1^2}^{1+x_2^2}\frac{du}{2u} \\&=& \left.\frac 1 2 \ln u\right|_{u=1+x_1^2}^{1+x_2^2} \\&=& \frac 1 2 \ln\frac{1+x_2^2}{1+x_1^2}. \eqb To check the result, we can take the derivative per \S~\ref{integ:245} of the final expression with respect to~$x_2$: \bqb \left. \ppx{x_2} \frac 1 2 \ln\frac{1+x_2^2}{1+x_1^2} \right|_{x_2=x} &=& \left[ \frac 1 2 \ppx{x_2} \left\{ \ln\left(1+x_2^2\right) - \ln\left(1+x_1^2\right) \right\} \right]_{x_2=x} \\&=& \frac{x}{1+x^2}, \eqb which indeed has the form of the integrand we started with. The technique is \emph{integration by substitution.} It does not solve all integrals but it does solve many, whether alone or in combination with other techniques. % ---------------------------------------------------------------------- \section{Integration by parts} \label{inttx:230} \index{integration!by parts} \index{product rule, derivative} \index{derivative!product rule for} Integration by parts is a curious but very broadly applicable technique which begins with the derivative product rule~(\ref{drvtv:prod2}), \[ d(uv) = u\,dv + v\,du, \] where $u(\tau)$ and $v(\tau)$ are functions of an independent variable~$\tau$. Reordering terms, \[ u\,dv = d(uv) - v\,du. \] Integrating, \bq{inttx:parts} \int_{\tau=a}^b u\,dv = \left. uv \right|_{\tau=a}^b - \int_{\tau=a}^b v\,du. \eq Equation~(\ref{inttx:parts}) is the rule of \emph{integration by parts.} For an example of the rule's operation, consider the integral \[ S(x) = \int_0^x \tau\cos\alpha\tau\,d\tau. \] Unsure how to integrate this, we can begin by integrating \emph{part} of it. We can begin by integrating the $\cos\alpha\tau\,d\tau$ part. Letting \[ \begin{split} u &\la \tau, \\ dv &\la \cos\alpha\tau\,d\tau, \end{split} \] we find that% \footnote{ The careful reader will observe that $v=(\sin\alpha\tau)/\alpha + C$ matches the chosen~$dv$ for any value of~$C$, not just for $C=0$. This is true. However, nothing in the integration by parts technique requires us to consider all possible~$v$. 
Any convenient~$v$ suffices. In this case, we choose $v=(\sin\alpha\tau)/\alpha$. } \[ \begin{split} du &= d\tau, \\ v &= \frac{\sin\alpha\tau}{\alpha}. \end{split} \] According to~(\ref{inttx:parts}), then, \[ S(x) = \left. \frac{\tau\sin\alpha\tau}{\alpha} \right|_0^x - \int_0^x \frac{\sin\alpha\tau}{\alpha} \,d\tau = \frac{x}{\alpha} \sin\alpha x + \frac{\cos\alpha x - 1}{\alpha^2}. \] Though integration by parts is a powerful technique, one should understand clearly what it does and does not do. The technique does not just integrate each part of an integral separately. It isn't that simple. What it does is to integrate one part of an integral separately---whichever part one has chosen to identify as~$dv$---while contrarily differentiating the other part~$u$, upon which it rewards the mathematician only with a whole new integral $\int v\,du$. The new integral may or may not be easier to integrate than was the original $\int u\,dv$. The virtue of the technique lies in that one often can find a part~$dv$ which does yield an easier $\int v\,du$. The technique is powerful for this reason. \index{gamma function} \index{factorial} For another kind of example of the rule's operation, consider the definite integral% \footnote{\cite{Lebedev}} \bq{inttx:230:gamma} \Gamma(z) \equiv \int_0^\infty e^{-\tau} \tau^{z-1} \,d\tau, \ \ \Re(z) > 0. \eq Letting \[ \begin{split} u &\la e^{-\tau}, \\ dv &\la \tau^{z-1} \,d\tau, \end{split} \] we evidently have that \[ \begin{split} du &= -e^{-\tau} \,d\tau, \\ v &= \frac{\tau^z}{z}. \end{split} \] Substituting these according to~(\ref{inttx:parts}) into~(\ref{inttx:230:gamma}) yields \bqb \Gamma(z) &=& \left[ e^{-\tau}\frac{\tau^z}{z} \right]_{\tau=0}^{\infty} - \int_0^\infty \left(\frac{\tau^z}{z}\right) \left(-e^{-\tau} \,d\tau\right) \\&=& [0-0] + \int_0^\infty \frac{\tau^z}{z} e^{-\tau} \,d\tau \\&=& \frac{\Gamma(z+1)}{z}. \eqb When written \bq{inttx:230:20} \Gamma(z+1) = z\Gamma(z), \eq this is an interesting result. Since per~(\ref{inttx:230:gamma}) \[ \Gamma(1) = \int_0^\infty e^{-\tau} \,d\tau = \left[ -e^{-\tau} \right]_0^\infty = 1, \] it follows by induction on~(\ref{inttx:230:20}) that \bq{inttx:230:30} (n-1)! = \Gamma(n). \eq Thus~(\ref{inttx:230:gamma}), called the \emph{gamma function,} can be taken as an extended definition of the factorial $(z-1)!$ for all~$z$, $\Re(z) > 0$. Integration by parts has made this finding possible. % ---------------------------------------------------------------------- \section{Integration by unknown coefficients} \label{inttx:240} \index{coefficient!unknown} \index{unknown coefficient} \index{integration!by unknown coefficients} One of the more powerful integration techniques is relatively inelegant, yet it easily cracks some integrals that give other techniques trouble. The technique is the \emph{method of unknown coefficients,} and it is based on the antiderivative~(\ref{inttx:antider}) plus intelligent guessing. It is best illustrated by example. \index{guessing the form of a solution} \index{solution!guessing the form of} \index{antiderivative!guessing} Consider the integral (which arises in probability theory) \bq{inttx:240:21} S(x) = \int_0^x e^{-(\rho/\sigma)^2/2} \rho \,d\rho.
\eq If one does not know how to solve the integral in a more elegant way, one can \emph{guess} a likely-seeming antiderivative form, such as \[ e^{-(\rho/\sigma)^2/2} \rho = \frac{d}{d\rho}ae^{-(\rho/\sigma)^2/2}, \] where the~$a$ is an \emph{unknown coefficient.} Having guessed, one has no guarantee that the guess is right, but see: if the guess \emph{were} right, then the antiderivative would have the form \bqb e^{-(\rho/\sigma)^2/2} \rho &=& \frac{d}{d\rho}ae^{-(\rho/\sigma)^2/2} \\ &=& -\frac{a\rho}{\sigma^2} e^{-(\rho/\sigma)^2/2}, \eqb implying that \[ a = -\sigma^2 \] (evidently the guess is right, after all). Using this value for~$a$, one can write the specific antiderivative \[ e^{-(\rho/\sigma)^2/2} \rho = \frac{d}{d\rho}\left[-\sigma^2e^{-(\rho/\sigma)^2/2}\right], \] with which one can solve the integral, concluding that \bq{inttx:240:22} S(x) = \left[-\sigma^2 e^{-(\rho/\sigma)^2/2}\right]_0^x = \left(\sigma^2\right)\left[1-e^{-(x/\sigma)^2/2}\right]. \eq \index{differential equation} \index{differential equation!solution of by unknown coefficients} \index{boundary condition} \index{loan} \index{borrower} \index{interest} \index{payment rate} \index{amortization} The same technique solves differential equations, too. Consider for example the differential equation \bq{inttx:240:26} dx = (Ix-P)\,dt, \ \ x|_{t=0} = x_o,\ x|_{t=T} = 0, \eq which conceptually represents% \footnote{ Real banks (in the author's country, at least) by law or custom actually use a needlessly more complicated formula---and not only more complicated, but mathematically slightly incorrect, too. } the changing balance~$x$ of a bank loan account over time~$t$, where~$I$ is the loan's interest rate and~$P$ is the borrower's payment rate. If it is desired to find the correct payment rate~$P$ which pays the loan off in the time~$T$, then (perhaps after some bad guesses) we guess the form \[ x(t) = Ae^{\alpha t} + B, \] where~$\alpha$, $A$ and~$B$ are unknown coefficients. The guess' derivative is \[ dx = \alpha Ae^{\alpha t}\,dt. \] Substituting the last two equations into~(\ref{inttx:240:26}) and dividing by~$dt$ yields \[ \alpha Ae^{\alpha t} = IAe^{\alpha t} + IB - P, \] which at least is satisfied if both of the equations \[ \begin{split} \alpha Ae^{\alpha t} &= IAe^{\alpha t}, \\ 0 &= IB - P, \end{split} \] are satisfied. Evidently good choices for~$\alpha$ and~$B$, then, are \[ \begin{split} \alpha &= I, \\ B &= \frac{P}{I}. \end{split} \] Substituting these coefficients into the $x(t)$ equation above yields the general solution \bq{inttx:240:27} x(t) = Ae^{It} + \frac{P}{I} \eq to~(\ref{inttx:240:26}). The constants~$A$ and~$P$, we establish by applying the given \emph{boundary conditions} $x|_{t=0} = x_o$ and $x|_{t=T} = 0$. For the former condition,~(\ref{inttx:240:27}) is \[ x_o = Ae^{(I)(0)} + \frac{P}{I} = A + \frac{P}{I}; \] and for the latter condition, \[ 0 = Ae^{IT} + \frac{P}{I}. \] Solving the last two equations simultaneously, we have that \bq{inttx:240:29} \begin{split} A &= \frac{-e^{-IT}x_o}{1-e^{-IT}}, \\ P &= \frac{Ix_o}{1-e^{-IT}}. \end{split} \eq Applying these to the general solution~(\ref{inttx:240:27}) yields the specific solution \bq{inttx:240:28} x(t) = \frac{x_o}{1-e^{-IT}} \left[1-e^{(I)(t-T)}\right] \eq to~(\ref{inttx:240:26}) meeting the boundary conditions, with the payment rate~$P$ required of the borrower given by~(\ref{inttx:240:29}). 
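As a rough check on~(\ref{inttx:240:29})---not part of the derivation, only a sanity check added here---one can consider a short-term loan, $IT \ll 1$. Expanding the exponential, \[ P = \frac{Ix_o}{1-e^{-IT}} \approx \frac{Ix_o}{IT-(IT)^2/2} \approx \frac{x_o}{T}\left(1+\frac{IT}{2}\right), \ \ IT \ll 1, \] which says that the borrower retires the principal at the rate $x_o/T$ while paying interest, roughly, on half the original balance---about what one would expect, since the balance of such a short-term loan falls nearly linearly from~$x_o$ to zero over the term~$T$.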
The virtue of the method of unknown coefficients lies in that it permits one to try an entire family of candidate solutions at once, with the family members distinguished by the values of the coefficients. If a solution exists anywhere in the family, the method usually finds it. The method of unknown coefficients is an elephant. Slightly inelegant the method may be, but it is pretty powerful, too---and it has surprise value (for some reason people seem not to expect it). Such are the kinds of problems the method can solve. % ---------------------------------------------------------------------- \section{Integration by closed contour} \label{inttx:250} \index{Cauchy's integral formula} \index{Cauchy, Augustin Louis (1789--1857)} \index{integral!closed complex contour} \index{contour integration!closed complex} \index{integration!by closed contour} \index{dummy variable} We pass now from the elephant to the falcon, from the inelegant to the sublime. Consider the definite integral% \footnote{\cite[\S~1.2]{Lebedev}} \[ S = \int_0^\infty \frac{\tau^a}{\tau+1} \,d\tau, \ \ -1 < a < 0. \] This is a hard integral. No obvious substitution, no evident factoring into parts, seems to solve the integral; but there is a way. The integrand has a pole at $\tau=-1$. Observing that~$\tau$ is only a dummy integration variable, if one writes the same integral using the complex variable~$z$ in place of the real variable~$\tau$, then Cauchy's integral formula~(\ref{taylor:cauchy}) has that integrating once counterclockwise about a closed complex contour, with the contour enclosing the pole at $z=-1$ but shutting out the branch point at $z=0$, yields \[ I = \oint \frac{z^a}{z+1} \,dz = i2\pi z^a|_{z=-1} = i2\pi \left(e^{ i2\pi/2 }\right)^a = i2\pi e^{ i2\pi a/2}. \] The trouble, of course, is that the integral~$S$ does not go about a closed complex contour. One can however construct a closed complex contour~$I$ of which~$S$ is a part, as in Fig~\ref{inttx:250:fig1}. \begin{figure} \caption{Integration by closed contour.} \label{inttx:250:fig1} \index{contour!complex} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxr{2.2361} \nc\xxra{2.561} \nc\xxrb{357.8} \nc\xxrr{0.30} \nc\xxrra{18.435} \nc\xxrrb{341.565} \nc\xxrc{60} \nc\xxa{0.62} \nc\xxb{0.10} \nc\xxba{0.2846} \nc\xxbb{2.2338} \nc\xxs{2.6} \nc\polexy{0.10} \nc\pole{ { \psset{linewidth=1.0pt} \psline(-\polexy,-\polexy)( \polexy, \polexy) \psline( \polexy,-\polexy)(-\polexy, \polexy) } } \psline[linewidth=0.5pt](-\xx,0)(\xx,0) \psline[linewidth=0.5pt](0,-\xx)(0,\xx) \psarc[linewidth=2.0pt]{cc-cc}(0,0){\xxr}{\xxra}{\xxrb} \psarc[linewidth=2.0pt]{cc->}(0,0){\xxr}{\xxra}{\xxrc} \psline[linewidth=2.0pt]{c-c}( \xxba, \xxb)(\xxbb, \xxb) \psline[linewidth=2.0pt]{c-c}( \xxba,-\xxb)(\xxbb,-\xxb) \psarc[linewidth=2.0pt]{cc-cc}(0,0){\xxrr}{\xxrra}{\xxrrb} \rput(-\xxa,0){ \pole \rput{15}(0,0){ \psline[linewidth=0.5pt](0,0.16)(0,0.70) \rput{*0}(0,0.90){$z=-1$} } } \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$\Im(z)$} \rput(1.15,0.35){$I_1$} \rput(1.20,2.20){$I_2$} \rput(1.15,-0.39){$I_3$} \rput(-0.30,-0.45){$I_4$} \end{pspicture} } \ec \end{figure} If the outer circle in the figure is of infinite radius and the inner, of infinitesimal, then the closed contour~$I$ is composed of the four parts \bqb I &=& I_1 + I_2 + I_3 + I_4 \\ &=& (I_1 + I_3) + I_2 + I_4. 
\eqb % The figure tempts one to make the mistake of writing that $I_1=S=-I_3$, but besides being incorrect this defeats the purpose of the closed contour technique. More subtlety is needed. One must take care to interpret the four parts correctly. The integrand $z^a/(z+1)$ is multiple-valued; so, in fact, the two parts $I_1 + I_3 \neq 0$ do not cancel. The integrand has a branch point at $z=0$, which, in passing from~$I_3$ through~$I_4$ to~$I_1$, the contour has circled. Even though~$z$ itself takes on the same values along~$I_3$ as along~$I_1$, the multiple-valued integrand $z^a/(z+1)$ does not. Indeed, \[ \renewcommand\arraystretch{2.2} \br{rclcrcl} \ds I_1 &=& \ds \int_0^\infty \frac{(\rho e^{i0})^a}{(\rho e^{i0})+1} \,d\rho &=& \ds \int_0^\infty \frac{\rho^a}{\rho+1} \,d\rho &=& \ds S, \\ \ds -I_3 &=& \ds \int_0^\infty \frac{(\rho e^{i2\pi})^a}{(\rho e^{i2\pi})+1} \,d\rho &=& \ds e^{i2\pi a} \int_0^\infty \frac{\rho^a}{\rho+1} \,d\rho &=& \ds e^{i2\pi a}S. \er \] Therefore, \bqb I &=& I_1 + I_2 + I_3 + I_4 \\ &=& (I_1 + I_3) + I_2 + I_4 \\ &=& (1-e^{i2\pi a}) S + \lim_{\rho\ra\infty} \int_{\phi=0}^{2\pi} \frac{z^a}{z+1} \,dz - \lim_{\rho\ra 0} \int_{\phi=0}^{2\pi} \frac{z^a}{z+1} \,dz \\ &=& (1-e^{i2\pi a}) S + \lim_{\rho\ra\infty} \int_{\phi=0}^{2\pi} z^{a-1} \,dz - \lim_{\rho\ra 0} \int_{\phi=0}^{2\pi} z^a \,dz \\ &=& (1-e^{i2\pi a}) S + \lim_{\rho\ra\infty} \left.\frac{z^a}{a} \right|_{\phi=0}^{2\pi} - \lim_{\rho\ra 0} \left.\frac{z^{a+1}}{a+1} \right|_{\phi=0}^{2\pi}. \eqb Since $a<0$, the first limit vanishes; and because $a>-1$, the second limit vanishes, too, leaving \[ I = (1-e^{i2\pi a}) S. \] But by Cauchy's integral formula we have already found an expression for~$I$. Substituting this expression into the last equation yields, by successive steps, \bqb i2\pi e^{ i2\pi a/2} &=& (1-e^{i2\pi a}) S, \\ S &=& \frac{i2\pi e^{ i2\pi a/2}}{1-e^{i2\pi a}}, \\ S &=& \frac{i2\pi}{e^{-i2\pi a/2}-e^{i2\pi a/2}}, \\ S &=& -\frac{2\pi/2}{\sin(2\pi a/2)}. \eqb That is, \bq{inttx:250:20} \int_0^\infty \frac{\tau^a}{\tau+1} \,d\tau = -\frac{2\pi/2}{\sin(2\pi a/2)}, \ \ -1 < a < 0, \eq an astonishing result.% \footnote{ So astonishing is the result, that one is unlikely to believe it at first encounter. However, straightforward (though computationally highly inefficient) numerical integration per~(\ref{integ:def}) confirms the result, as the interested reader and his computer can check. Such results vindicate the effort we have spent in deriving Cauchy's integral formula~(\ref{taylor:cauchy}). } Another example% \footnote{\cite{Kohler-lecture}} is \[ T = \int_0^{2\pi} \frac{d\theta}{1+a\cos\theta}, \ \ \Im(a)=0,\ \left|\Re(a)\right| < 1. \] As in the previous example, here again the contour is not closed. The previous example closed the contour by extending it, excluding the branch point. In this example there is no branch point to exclude, nor need one extend the contour. Rather, one changes the variable \[ z \la e^{i\theta} \] and takes advantage of the fact that~$z$, unlike~$\theta$, \emph{begins and ends the integration at the same point.} One thus obtains the equivalent integral \bqb T &=& \oint \frac{dz/iz}{1+(a/2)(z+1/z)} = -\frac{i2}{a} \oint \frac{dz}{z^2+2z/a+1} \\&=& -\frac{i2}{a} \oint \frac{dz}{ \left[z-\left(-1+\sqrt{1-a^2}\right)/a\right] \left[z-\left(-1-\sqrt{1-a^2}\right)/a\right] }, \eqb whose contour is the unit circle in the Argand plane. 
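In case the substitution's details are wanted, they are these: because $z = e^{i\theta}$, \[ d\theta = \frac{dz}{iz} \ \ \mbox{and} \ \ \cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2} = \frac{z+1/z}{2}, \] which together yield the equivalent integral's first form; multiplying that form's numerator and denominator by $2z/a$ then yields the second.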
The integrand evidently has poles at \[ z = \frac{-1\pm\sqrt{1-a^2}}{a}, \] whose magnitudes are such that \[ \left|z\right|^2 = \frac{ 2-a^2 \mp 2\sqrt{1-a^2} }{a^2}. \] One of the two magnitudes is less than unity and one is greater, meaning that one of the two poles lies within the contour and one lies without, as is seen by the successive steps% \footnote{ These steps are perhaps best read from bottom to top. See Ch.~\ref{noth}'s footnote~\ref{noth:420:85}. } \[ \renewcommand{\arraystretch}{1.5} \settowidth\tla{$-(1-a^2)$} \br{rcccl} a^2 &<& \multicolumn{1}{l}{\makebox[\tla][l]{$1,$}} && \\ 0 &<& \multicolumn{1}{l}{\makebox[\tla][l]{$1 - a^2,$}} && \\ (-a^2)(0) &>& \multicolumn{1}{l}{\makebox[\tla][l]{$(-a^2)(1 - a^2),$}} && \\ 0 &>& \multicolumn{1}{l}{\makebox[\tla][l]{$-a^2 + a^4,$}} && \\ 1-a^2 &>& \multicolumn{1}{l}{\makebox[\tla][l]{$1 - 2a^2 + a^4,$}} && \\ 1-a^2 &>& \multicolumn{1}{l}{\makebox[\tla][l]{$\left(1 - a^2\right)^2,$}} && \\ \sqrt{1-a^2} &>& \multicolumn{1}{l}{\makebox[\tla][l]{$1 - a^2,$}} && \\ -\sqrt{1-a^2} &<& -(1 - a^2) &<& \sqrt{1-a^2}, \\ 1 - \sqrt{1-a^2} &<& a^2 &<& 1 + \sqrt{1-a^2}, \\ 2 - 2\sqrt{1-a^2} &<& 2a^2 &<& 2 + 2\sqrt{1-a^2}, \\ 2 -a^2 - 2\sqrt{1-a^2} &<& a^2 &<& 2 -a^2 + 2\sqrt{1-a^2}, \\ \ds \frac{2 -a^2 - 2\sqrt{1-a^2}}{a^2} &<& 1 &<& \ds \frac{2 -a^2 + 2\sqrt{1-a^2}}{a^2}. \er \] Per Cauchy's integral formula~(\ref{taylor:cauchy}), integrating about the pole within the contour yields \[ T = \left. i2\pi \frac{-i2/a}{ z-\left(-1-\sqrt{1-a^2}\right)/a }\right|_{z=\left(-1+\sqrt{1-a^2}\right)/a} = \frac{2\pi}{ \sqrt{1-a^2} }. \] Observe that by means of a complex variable of integration, each example has indirectly evaluated an integral whose integrand is purely real. If it seems unreasonable to the reader to expect so flamboyant a technique actually to work, this seems equally unreasonable to the writer---but work it does, nevertheless. It is a great technique. The technique, \emph{integration by closed contour,} is found in practice to solve many integrals other techniques find almost impossible to crack. The key to making the technique work lies in closing a contour one knows how to treat. The robustness of the technique lies in that any contour of any shape will work, so long as the contour encloses appropriate poles in the Argand domain plane while shutting branch points out. \index{magnitude!of an integral} \index{integral!magnitude of} \index{integrand!magnitude of} \index{triangle inequalities!complex} The extension \bq{inttx:250:triangle} \left| \int_{z_1}^{z_2} f(z) \,dz \right| \le \int_{z_1}^{z_2} \left| f(z) \,dz \right| \eq of the complex triangle sum inequality~(\ref{trig:278:triangle}) from the discrete to the continuous case sometimes proves useful in evaluating integrals by this section's technique, as in \S~\ref{fours:160.35}. % ---------------------------------------------------------------------- \section{Integration by partial-fraction expansion} \label{inttx:260} \index{integration!by partial-fraction expansion} \index{partial-fraction expansion} This section treats integration by partial-fraction expansion. It introduces the expansion itself first.% \footnote{% \cite[Appendix~F]{Phillips/Parr}% \cite[\S\S~2.7 and~10.12]{Hildebrand} } Throughout the section, \[ j,j',k,\ell,m,n,p,p_{(\cdot)},M,N \in \mathbb Z. 
\] \subsection{Partial-fraction expansion} \label{inttx:260.10} \index{partial-fraction expansion} \index{fraction} \index{numerator} \index{denominator} \index{ratio} \index{quotient} Consider the function \[ f(z) = \frac{-4}{z-1} + \frac{5}{z-2}. \] Combining the two fractions over a common denominator% \footnote{ Terminology (you probably knew this already): A \emph{fraction} is the ratio of two numbers or expressions $B/A$. In the fraction,~$B$ is the \emph{numerator} and~$A$ is the \emph{denominator.} The \emph{quotient} is $Q=B/A$. } yields \[ f(z) = \frac{z+3}{(z-1)(z-2)}. \] Of the two forms, the former is probably the more amenable to analysis. For example, using~(\ref{inttx:intinvz}), \bqb \int_{-1}^0 f(\tau)\,d\tau &=& \int_{-1}^0\frac{-4}{\tau-1}\,d\tau + \int_{-1}^0\frac{5}{\tau-2}\,d\tau \\ &=& \left[ -4\ln(1-\tau) +5\ln(2-\tau) \right]_{-1}^0. \eqb The trouble is that one is not always given the function in the amenable form. \index{residue} \index{function!rational} \index{rational function} Given a \emph{rational function} \bq{inttx:260:10} f(z) = \frac{ \sum_{k=0}^{N-1} b_k z^k }{ \prod_{j=1}^N (z-\alpha_j) } \eq in which no two of the several poles~$\alpha_j$ are the same, the \emph{partial-fraction expansion} has the form \bq{inttx:260:20} f(z) = \sum_{k=1}^N \frac{A_k}{z-\alpha_k}, \eq where multiplying each fraction of~(\ref{inttx:260:20}) by \[ \frac{\left[\prod_{j=1}^N (z-\alpha_j)\right]/(z-\alpha_k)} {\left[\prod_{j=1}^N (z-\alpha_j)\right]/(z-\alpha_k)} \] puts the several fractions over a common denominator, yielding~(\ref{inttx:260:10}). Dividing~(\ref{inttx:260:10}) by~(\ref{inttx:260:20}) gives the ratio \[ 1 = \left. \frac{ \sum_{k=0}^{N-1} b_k z^k }{ \prod_{j=1}^N (z-\alpha_j) } \right/ \sum_{k=1}^N \frac{A_k}{z-\alpha_k}. \] In the immediate neighborhood of $z=\alpha_m$, the $m$th term $A_m/(z-\alpha_m)$ dominates the summation of~(\ref{inttx:260:20}). Hence, \[ 1 = \lim_{z\ra\alpha_m} \left. \frac{ \sum_{k=0}^{N-1} b_k z^k }{ \prod_{j=1}^N (z-\alpha_j) } \right/ \frac{A_m}{z-\alpha_m}. \] Rearranging factors, we have that \bq{inttx:260:25} A_m = \left.\frac{ \sum_{k=0}^{N-1} b_k z^k }{ \left[ \prod_{j=1}^N (z-\alpha_j) \right] / (z-\alpha_m) }\right|_{z=\alpha_m} = \lim_{z \ra \alpha_m} \left[ (z-\alpha_m) f(z) \right], \eq where~$A_m$, the value of $f(z)$ with the pole canceled, is called the \emph{residue} of $f(z)$ at the pole $z=\alpha_m$. Equations~(\ref{inttx:260:20}) and~(\ref{inttx:260:25}) together give the partial-fraction expansion of~(\ref{inttx:260:10})'s rational function $f(z)$. \subsection{Repeated poles} \label{inttx:260.20} \index{pole!double} \index{pole!multiple} \index{pole!repeated} \index{double pole} \index{multiple pole} \index{repeated pole} \index{separation of poles} \index{pole!separation of} \index{perturbation} The weakness of the partial-fraction expansion of \S~\ref{inttx:260.10} is that it cannot directly handle repeated poles. That is, if $\alpha_n=\alpha_j$, $n\neq j$, then the residue formula~(\ref{inttx:260:25}) finds an uncanceled pole remaining in its denominator and thus fails for $A_n=A_j$ (it still works for the other~$A_m$). The conventional way to expand a fraction with repeated poles is presented in \S~\ref{inttx:260.50} below; but because at least to this writer that way does not lend much applied insight, the present subsection treats the matter in a different way. 
Here, we \emph{separate the poles.} \index{pole!circle of} \index{Parseval's principle} \index{Parseval, Marc-Antoine (1755--1836)} Consider the function \bq{inttx:260:40} g(z) = \sum_{k=0}^{N-1} \frac{Ce^{i2\pi k/N}}{z-\ep e^{i2\pi k/N}}, \ \ N>1,\ 0<\ep\ll 1, \eq where~$C$ is a real-valued constant. This function evidently has a small circle of poles in the Argand plane at $\alpha_k=\ep e^{i2\pi k/N}$. Factoring, \[ g(z) = \frac{C}{z} \sum_{k=0}^{N-1} \frac{e^{i2\pi k/N}}{1-(\ep e^{i2\pi k/N})/z}. \] Using~(\ref{alggeo:228:45}) to expand the fraction, \bqb g(z) &=& \frac{C}{z} \sum_{k=0}^{N-1} \left[ e^{i2\pi k/N} \sum_{j=0}^{\infty}\left(\frac{\ep e^{i2\pi k/N}}{z}\right)^j \right] \\ &=& C \sum_{k=0}^{N-1} \sum_{j=1}^{\infty}\frac{\ep^{j-1} e^{i2\pi jk/N}}{z^j} \\ &=& C \sum_{j=1}^{\infty} \frac{\ep^{j-1}}{z^j} \sum_{k=0}^{N-1} \left(e^{i2\pi j/N}\right)^k. \eqb But% \footnote{\label{inttx:260:fn1}% If you don't see why, then for $N=8$ and $j=3$ plot the several $(e^{i2\pi j/N})^k$ in the Argand plane. Do the same for $j=2$ then $j=8$. Only in the $j=8$ case do the terms add coherently; in the other cases they cancel. This effect---reinforcing when $j=nN$, canceling otherwise---is a classic manifestation of \emph{Parseval's principle,} which \S~\ref{fours:080} will formally introduce later in the book. } \[ \sum_{k=0}^{N-1} \left(e^{i2\pi j/N}\right)^k = \begin{cases} N &\mbox{if}\ j = mN, \\ 0 &\mbox{otherwise}, \end{cases} \] so \[ g(z) = NC \sum_{m=1}^{\infty} \frac{\ep^{mN-1}}{z^{mN}}. \] For $\left|z\right|\gg\ep$---that is, except in the immediate neighborhood of the small circle of poles---the first term of the summation dominates. Hence, \[ g(z) \approx NC \frac{\ep^{N-1}}{z^{N}}, \ \ \left|z\right|\gg\ep. \] Having achieved this approximation, if we strategically choose \[ C=\frac{1}{N\ep^{N-1}}, \] then \[ g(z) \approx \frac{1}{z^{N}}, \ \ \left|z\right|\gg\ep. \] But given the chosen value of~$C$,~(\ref{inttx:260:40}) is \[ g(z) = \frac{1}{N\ep^{N-1}} \sum_{k=0}^{N-1} \frac{e^{i2\pi k/N}}{z-\ep e^{i2\pi k/N}}, \ \ N>1,\ 0<\ep\ll 1. \] Joining the last two equations together, changing $z-z_o \la z$, and writing more formally, we have that \bq{inttx:260:30} \frac{1}{(z-z_o)^{N}} = \lim_{\ep\ra 0} \frac{1}{N\ep^{N-1}} \sum_{k=0}^{N-1} \frac{e^{i2\pi k/N}}{z-\left[z_o+\ep e^{i2\pi k/N}\right]}, \ \ N>1. \eq The significance of~(\ref{inttx:260:30}) is that it lets one replace an $N$-fold pole with a small circle of ordinary poles, which per \S~\ref{inttx:260.10} we already know how to handle. Notice incidentally that $1/N\ep^{N-1}$ is a large number not a small. The poles are close together but very strong. An example to illustrate the technique, separating a double pole: \bqb f(z) &=& \frac{z^2-z+6}{(z-1)^2(z+2)} \\ &=& \lim_{\ep\ra 0}\frac{z^2-z+6} {(z-[1+\ep e^{i2\pi(0)/2}])(z-[1+\ep e^{i2\pi(1)/2}])(z+2)} \\ &=& \lim_{\ep\ra 0}\frac{z^2-z+6} {(z-[1+\ep])(z-[1-\ep])(z+2)} \\ &=& \lim_{\ep\ra 0}\blr \left(\frac{1}{z-[1+\ep]}\right) \left[\frac{z^2-z+6}{(z-[1-\ep])(z+2)}\right]_{z=1+\ep} \right. \\&& \left.\makebox[2.0\parindent]{} + \left(\frac{1}{z-[1-\ep]}\right) \left[\frac{z^2-z+6}{(z-[1+\ep])(z+2)}\right]_{z=1-\ep} \right. \\&& \left.\makebox[2.0\parindent]{} + \left(\frac{1}{z+2}\right) \left[\frac{z^2-z+6}{(z-[1+\ep])(z-[1-\ep])}\right]_{z=-2} \brr \\ &=& \lim_{\ep\ra 0}\blr \left(\frac{1}{z-[1+\ep]}\right) \left[\frac{6+\ep}{6\ep+2\ep^2}\right] \right. \\&& \left.\makebox[2.0\parindent]{} + \left(\frac{1}{z-[1-\ep]}\right) \left[\frac{6-\ep}{-6\ep+2\ep^2}\right] \right. 
\\&& \left.\makebox[2.0\parindent]{} + \left(\frac{1}{z+2}\right) \left[\frac{\mbox{0xC}}{9}\right] \brr \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1/\ep - 1/6}{z-[1+\ep]} + \frac{-1/\ep-1/6}{z-[1-\ep]} + \frac{4/3}{z+2} \right\} \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1/\ep}{z-[1+\ep]} + \frac{-1/\ep}{z-[1-\ep]} + \frac{-1/3}{z-1} + \frac{4/3}{z+2} \right\}. \eqb Notice how the calculation has discovered an additional, single pole at $z=1$, the pole hiding under dominant, double pole there. \subsection{Integrating a rational function} \label{inttx:260.30} \index{integral!of a rational function} \index{rational function!integral of} \index{function!rational, integral of} If one can find the poles of a rational function of the form~(\ref{inttx:260:10}), then one can use~(\ref{inttx:260:20}) and~(\ref{inttx:260:25})---and, if needed,~(\ref{inttx:260:30})---to expand the function into a sum of partial fractions, each of which one can integrate individually. Continuing the example of \S~\ref{inttx:260.20}, for $0\le x<1$, \bqb \int_0^x f(\tau) \,d\tau &=& \int_0^x \frac{\tau^2-\tau+6}{(\tau-1)^2(\tau+2)} \,d\tau \\ &=& \lim_{\ep\ra 0}\int_0^x\left\{ \frac{1/\ep}{\tau-[1+\ep]} + \frac{-1/\ep}{\tau-[1-\ep]} + \frac{-1/3}{\tau-1} + \frac{4/3}{\tau+2} \right\} \,d\tau \\&=& \lim_{\ep\ra 0}\bigg\{ \frac{1}{\ep}\ln([1+\ep]-\tau) - \frac{1}{\ep}\ln([1-\ep]-\tau) \\&&\ \ \ \ \ \ \ \ \mbox{} - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \bigg\}_0^x \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1}{\ep}\ln\left(\frac{[1+\ep]-\tau}{[1-\ep]-\tau}\right) - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \right\}_0^x \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1}{\ep}\ln\left(\frac{[1-\tau]+\ep}{[1-\tau]-\ep}\right) - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \right\}_0^x \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1}{\ep}\ln\left(1 + \frac{2\ep}{1-\tau}\right) - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \right\}_0^x \\ &=& \lim_{\ep\ra 0}\left\{ \frac{1}{\ep}\left(\frac{2\ep}{1-\tau}\right) - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \right\}_0^x \\ &=& \lim_{\ep\ra 0}\left\{ \frac{2}{1-\tau} - \frac 1 3 \ln(1-\tau) + \frac 4 3 \ln(\tau+2) \right\}_0^x \\ &=& \frac{2}{1-x} - 2 - \frac 1 3 \ln(1-x) + \frac 4 3 \ln\left(\frac{x+2}{2}\right). \eqb To check (\S~\ref{integ:245}) that the result is correct, we can take the derivative of the final expression: \bqb \lefteqn{ \left[ \frac{d}{dx} \left\{ \frac{2}{1-x} - 2 - \frac 1 3 \ln(1-x) + \frac 4 3 \ln\left(\frac{x+2}{2}\right) \right\} \right]_{x=\tau} }\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ && \\&=& \frac{2}{(\tau-1)^2} + \frac{-1/3}{\tau-1} + \frac{4/3}{\tau+2} \\&=& \frac{ \tau^2 - \tau + 6 }{ (\tau-1)^2(\tau+2) }, \eqb which indeed has the form of the integrand we started with, confirming the result. (Notice incidentally how much easier it is symbolically to differentiate than to integrate!) % diagn: this paragraph is new and wants review Section~\ref{fouri:250} exercises the technique in a more sophisticated way, applying it in the context of Ch.~\ref{fouri}'s Laplace transform to solve a linear differential equation. \subsection{The derivatives of a rational function} \label{inttx:260.60} \index{derivative!of a rational function} \index{rational function!derivatives of} \index{function!rational, derivatives of} Not only the integral of a rational function interests us; its derivatives interest us, too. One needs no special technique to compute such derivatives, of course, but the derivatives do bring some noteworthy properties. 
First of interest is the property that a function in the general rational form \bq{inttx:260:80} \Phi(w) = \frac{w^p h_0(w)}{g(w)}, \ \ g(0) \neq 0, \eq enjoys derivatives in the general rational form \bq{inttx:260:83} \frac{d^k\Phi}{dw^k} = \frac{w^{p-k} h_k(w)}{\left[g(w)\right]^{k+1}}, \ \ 0 \le k \le p, \eq where~$g$ and~$h_k$ are polynomials in nonnegative powers of~$w$. The property is proved by induction. When $k=0$, (\ref{inttx:260:83}) is~(\ref{inttx:260:80}), so~(\ref{inttx:260:83}) is good at least for this case. Then, if~(\ref{inttx:260:83}) holds for $k=n-1$, \bqb \frac{d^n\Phi}{dw^n} &=& \frac{d}{dw} \left[ \frac{d^{n-1}\Phi}{dw^{n-1}} \right] = \frac{d}{dw} \left[ \frac{w^{p-n+1} h_{n-1}(w)}{\left[g(w)\right]^n} \right] = \frac{w^{p-n} h_{n}(w)}{\left[g(w)\right]^{n+1}}, \\ h_n(w) &\equiv& wg\frac{dh_{n-1}}{dw} - nwh_{n-1}\frac{dg}{dw} + (p-n+1)gh_{n-1}, \ \ 0 < n \le p, \eqb which makes~$h_n$ (like $h_{n-1}$) a polynomial in nonnegative powers of~$w$. By induction on this basis,~(\ref{inttx:260:83}) holds for all $0 \le k \le p$, as was to be demonstrated. A related property is that \bq{inttx:260:86} \left.\frac{d^k\Phi}{dw^k}\right|_{w=0} = 0 \ \ \ \ \mbox{for $0 \le k < p$}. \eq That is, the function and its first $p-1$ derivatives are all zero at $w=0$. The reason is that~(\ref{inttx:260:83})'s denominator is $\left[g(w)\right]^{k+1} \neq 0$, whereas its numerator has a $w^{p-k}=0$ factor, when $0 \le k < p$ and $w=0$. \subsection{Repeated poles (the conventional technique)} \label{inttx:260.50} \index{pole!double} \index{pole!multiple} \index{pole!repeated} \index{double pole} \index{multiple pole} \index{repeated pole} Though the technique of \S\S~\ref{inttx:260.20} and~\ref{inttx:260.30} affords extra insight, it is not the conventional technique to expand in partial fractions a rational function having a repeated pole. The conventional technique is worth learning not only because it is conventional but also because it is usually quicker to apply in practice. This subsection derives it. A rational function with repeated poles, \bqa f(z) &=& \frac{ \sum_{k=0}^{N-1} b_k z^k }{ \prod_{j=1}^M (z-\alpha_j)^{p_j} }, \label{inttx:260:50}\\ N &\equiv& \sum_{j=1}^M p_j, \xn\\ p_j &\ge& 0, \xn\\ \alpha_{j'} &\neq& \alpha_j \ \ \mbox{if $j' \neq j$}, \xn \eqa where~$j$, $k$, $M$, $N$ and the several~$p_j$ are integers, cannot be expanded solely in the first-order fractions of \S~\ref{inttx:260.10}, but can indeed be expanded if higher-order fractions are allowed: \settoheight\tla{\scriptsize $k$} \settodepth \tlb{\scriptsize $p_{j}$} \bq{inttx:260:70} f(z) = \sum_{\rule{0pt}{\tla}j=1}^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{A_{j\ell}}{(z-\alpha_j)^{p_j-\ell}}. \eq What the partial-fraction expansion~(\ref{inttx:260:70}) lacks are the values of its several coefficients~$A_{j\ell}$. One can determine the coefficients with respect to one (possibly repeated) pole at a time. To determine them with respect to the $p_m$-fold pole at $z=\alpha_m$, $1 \le m \le M$, one multiplies~(\ref{inttx:260:70}) by $(z-\alpha_m)^{p_m}$ to obtain the form \settoheight\tla{\scriptsize $\ell$} \settodepth \tlb{\scriptsize $p_{j}$} \[ (z-\alpha_m)^{p_m} f(z) = \sum_{ \stackindexdecl{ \rule{0pt}{\tla}j &=& 1, \\ j &\neq& m } }^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{(A_{j\ell})(z-\alpha_m)^{p_m}}{(z-\alpha_j)^{p_j-\ell}} + \sum_{\ell=0}^{p_m-1} (A_{m\ell})(z-\alpha_m)^{\ell}. 
\] But~(\ref{inttx:260:86}) with $w=z-\alpha_m$ reveals the double summation and its first $p_m-1$ derivatives all to be null at $z=\alpha_m$; that is, \settoheight\tla{\scriptsize $\ell$} \settodepth \tlb{\scriptsize $p_{j}$} \[ \left. \frac{d^k}{dz^k} \sum_{ \stackindexdecl{ \rule{0pt}{\tla}j &=& 1, \\ j &\neq& m } }^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{(A_{j\ell})(z-\alpha_m)^{p_m}}{(z-\alpha_j)^{p_j-\ell}} \right|_{z=\alpha_m} = 0, \ \ \ \ 0 \le k < p_m; \] so, the $(z-\alpha_m)^{p_m} f(z)$ equation's $k$th derivative reduces at that point to \bqb \frac{d^k}{dz^k} \Big[ (z-\alpha_m)^{p_m} f(z) \Big] \bigg|_{z=\alpha_m} &=& \sum_{\ell=0}^{p_m-1} \frac{d^k}{dz^k} \Big[ (A_{m\ell})(z-\alpha_m)^{\ell} \Big] \bigg|_{z=\alpha_m} \\&=& k!A_{mk}, \ \ 0 \le k < p_m. \eqb Changing $j \la m$ and $\ell \la k$ and solving for~$A_{j\ell}$ then produces the coefficients \bq{inttx:260:75} A_{j\ell} = \left. \left( \frac{1}{\ell!} \right) \frac{d^\ell}{dz^\ell} \Big[ (z-\alpha_j)^{p_j}f(z) \Big] \right|_{z=\alpha_j} , \ \ \ \ 0 \le \ell < p_j, \eq to weight the expansion~(\ref{inttx:260:70})'s partial fractions. In case of a repeated pole, these coefficients evidently depend not only on the residual function itself but also on its several derivatives, one derivative per repetition of the pole. \subsection{The existence and uniqueness of solutions} \label{inttx:260.55} \index{uniqueness} \index{existence} \index{professional mathematician} \index{mathematician!professional} Equation~(\ref{inttx:260:75}) has solved~(\ref{inttx:260:50}) and~(\ref{inttx:260:70}). A professional mathematician might object however that it has done so without first proving that a unique solution actually exists. Comes from us the reply, ``Why should we prove that a solution exists, once we have actually found it?'' Ah, but the professional's point is that we have found the solution only if in fact it does exist, and uniquely; otherwise what we have \emph{found} is a phantom. A careful review of \S~\ref{inttx:260.50}'s logic discovers no guarantee that all of~(\ref{inttx:260:75})'s coefficients actually come from the same expansion. Maybe there exist two distinct expansions, and some of the coefficients come from the one, some from the other. On the other hand, maybe there exists no expansion at all, in which event it is not even clear what~(\ref{inttx:260:75}) means. ``But these are quibbles, cavils and nitpicks!'' we are inclined to grumble. ``The present book is a book of applied mathematics.'' Well, yes, but on this occasion let us nonetheless follow the professional's line of reasoning, if only a short way. \emph{Uniqueness} is proved by positing two solutions \settoheight\tla{\scriptsize $\ell$} \settodepth \tlb{\scriptsize $p_{j}$} \[ f(z) = \sum_{\rule{0pt}{\tla}j=1}^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{A_{j\ell}}{(z-\alpha_j)^{p_j-\ell}} = \sum_{\rule{0pt}{\tla}j=1}^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{B_{j\ell}}{(z-\alpha_j)^{p_j-\ell}} \] and computing the difference \settoheight\tla{\scriptsize $\ell$} \settodepth \tlb{\scriptsize $p_{j}$} \[ \sum_{\rule{0pt}{\tla}j=1}^{\rule[-\tlb]{0pt}{0pt}M} \sum_{\ell=0}^{p_j-1} \frac{B_{j\ell}-A_{j\ell}}{(z-\alpha_j)^{p_j-\ell}} \] between them. Logically this difference must be zero for all~$z$ if the two solutions are actually to represent the same function $f(z)$. This however is seen to be possible only if $B_{j\ell} = A_{j\ell}$ for each $(j,\ell)$. (For, were the two solutions to differ in one or more of the coefficients attached to some pole~$\alpha_j$, then sufficiently near $z=\alpha_j$ the differing term of deepest order in $1/[z-\alpha_j]$ would dominate the difference, which therefore could not vanish there.) Therefore, the two solutions are one and the same.
\emph{Existence} comes of combining the several fractions of~(\ref{inttx:260:70}) over a common denominator and comparing the resulting numerator against the numerator of~(\ref{inttx:260:50}). Each coefficient~$b_k$ is seen thereby to be a linear combination of the several~$A_{j\ell}$, where the combination's weights depend solely on the locations~$\alpha_j$ and multiplicities~$p_j$ of $f(z)$'s several poles. From the~$N$ coefficients~$b_k$ and the~$N$ coefficients~$A_{j\ell}$, an $N\times N$ system of~$N$ linear equations in~$N$ unknowns results---which might for example (if, say, $N=3$) look like \bqb b_0 &=& -2A_{00} + A_{01} + 3A_{10}, \\ b_1 &=& A_{00} + A_{01} + A_{10}, \\ b_2 &=& 2A_{01} - 5A_{10}. \eqb We will show in Chs.~\ref{matrix} through~\ref{eigen} that when such a system has no solution, there always exist an alternate set of~$b_k$ for which the same system has multiple solutions. But uniqueness, which we have already established, forbids such multiple solutions in all cases. Therefore it is not possible for the system to have no solution---which is to say, the solution necessarily exists. We will not often in this book prove existence and uniqueness explicitly, but such proofs when desired tend to fit the pattern outlined here. % ---------------------------------------------------------------------- \section{Frullani's integral} \label{inttx:460} \index{Frullani's integral} \index{Frullani, Giuliano (1795--1834)} One occasionally meets an integral of the form \[ S = \int_0^\infty \frac{f(b\tau)-f(a\tau)}{\tau} \,d\tau, \] where~$a$ and~$b$ are real, positive coefficients and $f(\tau)$ is an arbitrary complex expression in~$\tau$. One wants to split such an integral in two as $\int [f(b\tau)/\tau]\,d\tau - \int [f(a\tau)/\tau]\,d\tau$; but if $f(0^+) \neq f(+\infty)$, one cannot, because each half-integral alone diverges. Nonetheless, splitting the integral in two is the right idea, provided that one first relaxes the limits of integration as \[ S = \lim_{\ep\ra 0^+} \left\{ \int_{\ep}^{1/\ep}\frac{f(b\tau)}{\tau} \,d\tau -\int_{\ep}^{1/\ep}\frac{f(a\tau)}{\tau} \,d\tau \right\}. \] Changing $\sigma\la b\tau$ in the left integral and $\sigma\la a\tau$ in the right yields \bqb S &=& \lim_{\ep\ra 0^+} \left\{ \int_{b\ep}^{b/\ep}\frac{f(\sigma)}{\sigma} \,d\sigma -\int_{a\ep}^{a/\ep}\frac{f(\sigma)}{\sigma} \,d\sigma \right\} \\ &=& \lim_{\ep\ra 0^+} \left\{ \int_{a\ep}^{b\ep}\frac{-f(\sigma)}{\sigma} \,d\sigma +\int_{b\ep}^{a/\ep}\frac{f(\sigma)-f(\sigma)}{\sigma} \,d\sigma +\int_{a/\ep}^{b/\ep}\frac{f(\sigma)}{\sigma} \,d\sigma \right\} \\ &=& \lim_{\ep\ra 0^+} \left\{ \int_{a/\ep}^{b/\ep}\frac{f(\sigma)}{\sigma} \,d\sigma -\int_{a\ep}^{b\ep}\frac{f(\sigma)}{\sigma} \,d\sigma \right\} \eqb (here on the face of it, we have split the integration as though $a \le b$, but in fact it does not matter which of~$a$ and~$b$ is the greater, as is easy to verify). So long as each of $f(\ep)$ and $f(1/\ep)$ approaches a constant value as~$\ep$ vanishes, this is \bqb S &=& \lim_{\ep\ra 0^+} \left\{ f(+\infty)\int_{a/\ep}^{b/\ep}\frac{d\sigma}{\sigma} -f(0^+)\int_{a\ep}^{b\ep}\frac{d\sigma}{\sigma} \right\} \\&=& \lim_{\ep\ra 0^+} \left\{ f(+\infty)\ln\frac{b/\ep}{a/\ep} -f(0^+)\ln\frac{b\ep}{a\ep} \right\} \\&=& \left[ f(\tau) \right]_0^\infty \ln \frac b a. 
\eqb Thus we have \emph{Frullani's integral,} \bq{inttx:460:frullani} \int_0^\infty \frac{f(b\tau)-f(a\tau)}{\tau} \,d\tau = \left[ f(\tau) \right]_0^\infty \ln \frac b a, \eq which, if~$a$ and~$b$ are both real and positive, works for any $f(\tau)$ which has definite $f(0^+)$ and $f(+\infty)$.% \footnote{% \cite[\S~1.3]{Lebedev}% \cite[\S~2.5.1]{Andrews}% \cite[``Frullani's integral'']{EWW} } % ---------------------------------------------------------------------- \section[Products of exponentials, powers and logs] {Integrating products of exponentials, powers and logarithms} \label{inttx:470} \index{integration!of a product of exponentials, powers and logarithms} \index{exponential!integrating a product of a power and} \index{logarithm!integrating a product of a power and} The products $\exp(\alpha\tau)\tau^n$ (where $n \in \mathbb Z$) and $\tau^{a-1}\ln\tau$ tend to arise% \footnote{ One could write the latter product more generally as $\tau^{a-1}\ln\beta\tau$. According to Table~\ref{alggeo:230:tbl}, however, $\ln\beta\tau = \ln\beta + \ln\tau$; wherein $\ln \beta$ is just a constant. } among other places in integrands related to special functions % diagn ([chapter not yet written]). The two occur often enough to merit investigation here. Concerning $\exp(\alpha\tau)\tau^n$, by \S~\ref{inttx:240}'s method of unknown coefficients we guess its antiderivative to fit the form \bqb \exp(\alpha\tau)\tau^n &=& \frac{d}{d\tau} \sum_{k=0}^n a_k \exp(\alpha\tau)\tau^k \\&=& \sum_{k=0}^n \alpha a_k \exp(\alpha\tau)\tau^k + \sum_{k=1}^n k a_k \exp(\alpha\tau)\tau^{k-1} \\&=& \alpha a_n \exp(\alpha\tau)\tau^n + \sum_{k=0}^{n-1} \left( \alpha a_k + [k+1]a_{k+1} \right) \exp(\alpha\tau)\tau^k. \eqb If so, then evidently \bqb a_n &=& \frac{1}{\alpha}; \\ a_k &=& - \frac{(k+1)a_{k+1}}{\alpha}, \ \ 0 \le k < n. \eqb That is, \[ a_k = \frac{1}{\alpha} \prod_{j=k+1}^{n}\left(\frac{-j}{\alpha}\right) = \frac{(-)^{n-k}(n!/k!)}{ \alpha^{n-k+1} }, \ \ 0 \le k \le n. \] Therefore,% \footnote{\cite[Appendix~2, eqn.~73]{Shenk}} \bq{inttx:470:20} \exp(\alpha\tau)\tau^n = \frac{d}{d\tau} \sum_{k=0}^n \frac{(-)^{n-k}(n!/k!)}{ \alpha^{n-k+1} } \exp(\alpha\tau)\tau^k, \ \ n \in \mathbb Z, \ n \ge 0, \ \alpha \neq 0. \eq The right form to guess for the antiderivative of $\tau^{a-1}\ln \tau$ is less obvious. Remembering however \S~\ref{cexp:228}'s observation that $\ln\tau$ is of zeroth order in~$\tau$, after maybe some false tries we eventually do strike the right form \bqb \tau^{a-1}\ln \tau &=& \frac{d}{d\tau}\tau^a[B\ln\tau + C] \\&=& \tau^{a-1}[aB\ln\tau + (B+aC)], \eqb which demands that $B=1/a$ and that $C=-1/a^2$. Therefore,% \footnote{\cite[Appendix~2, eqn.~74]{Shenk}} \bq{inttx:470:30} \tau^{a-1}\ln \tau = \frac{d}{d\tau}\frac{\tau^a}{a}\left(\ln\tau - \frac{1}{a}\right), \ \ a \neq 0. \eq Antiderivatives of terms like $\tau^{a-1}(\ln \tau)^2$, $\exp(\alpha\tau)\tau^n\ln\tau$ and so on can be computed in like manner as the need arises. Equation~(\ref{inttx:470:30}) fails when $a=0$, but in this case with a little imagination the antiderivative is not hard to guess: \bq{inttx:470:35} \frac{\ln \tau}{\tau} = \frac{d}{d\tau} \frac{(\ln\tau)^2}{2}. \eq If~(\ref{inttx:470:35}) seemed hard to guess nevertheless, then l'H\^opital's rule~(\ref{drvtv:260:lhopital}), applied to~(\ref{inttx:470:30}) as $a \ra 0$, with the observation from~(\ref{alggeo:230:316}) that \bq{inttx:470:40} \tau^a = \exp(a\ln\tau), \eq would yield the same~(\ref{inttx:470:35}). Table~\ref{inttx:470:tbl} summarizes.
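As a check on~(\ref{inttx:470:20})---merely an instance of the formula, added for verification---with $n=2$ the equation is \[ \exp(\alpha\tau)\tau^2 = \frac{d}{d\tau}\left[ \exp(\alpha\tau) \left( \frac{\tau^2}{\alpha} - \frac{2\tau}{\alpha^2} + \frac{2}{\alpha^3} \right) \right], \] which the reader can confirm by carrying out the differentiation on the right.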
\begin{table} \caption[Antiderivatives of products of exps, powers and logs.] {Antiderivatives of products of exponentials, powers and logarithms.} \label{inttx:470:tbl} \index{antiderivative!of a product of exponentials, powers and logarithms} \bqb \exp(\alpha\tau)\tau^n &=& \frac{d}{d\tau} \sum_{k=0}^n \frac{(-)^{n-k}(n!/k!)}{ \alpha^{n-k+1} } \exp(\alpha\tau)\tau^k, \ \ n \in \mathbb Z, \ n \ge 0, \ \alpha \neq 0 \\ \tau^{a-1}\ln \tau &=& \frac{d}{d\tau}\frac{\tau^a}{a}\left(\ln\tau - \frac{1}{a}\right), \ \ a \neq 0 \\ \frac{\ln \tau}{\tau} &=& \frac{d}{d\tau} \frac{(\ln\tau)^2}{2} \eqb \end{table} % ---------------------------------------------------------------------- \section{Integration by Taylor series} \label{inttx:450} \index{integration!by Taylor series} \index{Taylor series!integration by} \index{closed analytic form} \index{closed form} With sufficient cleverness the techniques of the foregoing sections solve many, many integrals. But not all. When all else fails, as sometimes it does, the Taylor series of Ch.~8 and the antiderivative of \S~\ref{inttx:210} together offer a concise, practical way to integrate some functions, at the price of losing the functions' known closed analytic forms. For example, \bqb \int_0^x \exp\left(-\frac{\tau^2}{2}\right) \,d\tau &=& \int_0^x \sum_{k=0}^{\infty} \frac{(-\tau^2/2)^k}{k!} \,d\tau \\&=& \int_0^x \sum_{k=0}^{\infty} \frac{(-)^k\tau^{2k}}{2^k k!} \,d\tau \\&=& \left[ \sum_{k=0}^{\infty} \frac{(-)^k\tau^{2k+1}}{(2k+1)2^k k!} \right]_0^x \\&=& \sum_{k=0}^{\infty} \frac{(-)^kx^{2k+1}}{(2k+1)2^k k!} = (x) \sum_{k=0}^{\infty} \frac{1}{2k+1} \prod_{j=1}^k \frac{-x^2}{2j}. \eqb The result is no function one recognizes; it is just a series. This is not necessarily bad, however. After all, when a Taylor series from Table~\ref{taylor:315:tbl} is used to calculate $\sin z$, then $\sin z$ is just a series, too. The series above converges just as accurately and just as fast. Sometimes it helps to give the series a name like \[ \mopx{myf} z \equiv \sum_{k=0}^{\infty} \frac{(-)^kz^{2k+1}}{(2k+1)2^k k!} = (z) \sum_{k=0}^{\infty} \frac{1}{2k+1} \prod_{j=1}^k \frac{-z^2}{2j}. \] Then, \[ \int_0^x \exp\left(-\frac{\tau^2}{2}\right) \,d\tau = \mopx{myf} x. \] The $\mopx{myf} z$ is no less a function than $\sin z$ is; it's just a function you hadn't heard of before. You can plot the function, or take its derivative \[ \frac{d}{d\tau} \mopx{myf} \tau = \exp\left(-\frac{\tau^2}{2}\right), \] or calculate its value, or do with it whatever else one does with functions. It works just the same. Beyond the several integration techniques this chapter has introduced, a special-purpose technique of integration by cylindrical transformation will surface in \S~\ref{fouri:130}. derivations-0.53.20120414.orig/tex/template-pspicture.tex0000644000000000000000000000053511742566274021513 0ustar rootroot\begin{figure} \caption{} \label{} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-3.0} \nc\fyb{3.0} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} } \end{pspicture} \ec \end{figure} derivations-0.53.20120414.orig/tex/pst-xkey.sty0000644000000000000000000000320211742575144017455 0ustar rootroot%% %% This is file `pst-xkey.sty', %% generated with the docstrip utility.
%% %% The original source files were: %% %% xkeyval.dtx (with options: `pxklatex') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% \NeedsTeXFormat{LaTeX2e}[1995/12/01] \ProvidesPackage{pst-xkey} [2005/11/25 v1.6 package wrapper for pst-xkey.tex (HA)] \ifx\PSTXKeyLoaded\endinput\else\input pst-xkey \fi \DeclareOptionX*{% \PackageWarning{pst-xkey}{Unknown option `\CurrentOption'}% } \ProcessOptionsX \endinput %% %% End of file `pst-xkey.sty'. derivations-0.53.20120414.orig/tex/greek.tex0000644000000000000000000001236011742566274016760 0ustar rootroot\chapter{The Greek alphabet} \label{greek} \index{Greek alphabet} \index{Roman alphabet} Mathematical experience finds the Roman alphabet to lack sufficient symbols to write higher mathematics clearly. Although not completely solving the problem, the addition of the Greek alphabet helps. See Table~\ref{greek:288:greek}.% \begin{table} \caption{The Roman and Greek alphabets.} \label{greek:288:greek} \bc \begin{tabular}{rlrlrlrl} \multicolumn{8}{c}{ROMAN}\\ $Aa$ &Aa& $Gg$ &Gg& $Mm$ &Mm& $Tt$ &Tt \\ $Bb$ &Bb& $Hh$ &Hh& $Nn$ &Nn& $Uu$ &Uu \\ $Cc$ &Cc& $Ii$ &Ii& $Oo$ &Oo& $Vv$ &Vv \\ $Dd$ &Dd& $Jj$ &Jj& $Pp$ &Pp& $Ww$ &Ww \\ $Ee$ &Ee& $Kk$ &Kk& $Qq$ &Qq& $Xx$ &Xx \\ $Ff$ &Ff& $L\ell$ &Ll& $Rr$ &Rr& $Yy$ &Yy \\ && && $Ss$ &Ss& $Zz$ &Zz \\ &&&&&&&\\ \multicolumn{8}{c}{GREEK}\\ A$\alpha$ &alpha& H$\eta$ &eta& N$\nu$ &nu& T$\tau$ &tau\\ B$\beta$ &beta& $\Theta\theta$ &theta& $\Xi\xi$ &xi& $\Upsilon\upsilon$ &upsilon\\ $\Gamma\gamma$ &gamma& I$\iota$ &iota& O$o$ &omicron& $\Phi\phi$ &phi\\ $\Delta\delta$ &delta& K$\kappa$ &kappa& $\Pi\pi$ &pi& X$\chi$ &chi\\ E$\epsilon$ &epsilon& $\Lambda\lambda$ &lambda& P$\rho$ &rho& $\Psi\psi$ &psi\\ Z$\zeta$ &zeta& M$\mu$ &mu& $\Sigma\sigma$ &sigma& $\Omega\omega$ &omega\\ \end{tabular} \ec \end{table} When first seen in mathematical writing, the Greek letters take on a wise, mysterious aura. Well, the aura is fine---the Greek letters are pretty---but don't let the Greek letters throw you. They're just letters. We use them not because we want to be wise and mysterious% \footnote{ Well, you can use them to be wise and mysterious if you want to. It's kind of fun, actually, when you're dealing with someone who doesn't understand math---if what you want is for him to go away and leave you alone. 
Otherwise, we tend to use Roman and Greek letters in various conventional ways: Greek minuscules (lower-case letters) for angles; Roman capitals for matrices;~$e$ for the natural logarithmic base;~$f$ and~$g$ for unspecified functions;~$i$, $j$, $k$, $m$, $n$, $M$ and~$N$ for integers;~$P$ and~$Q$ for metasyntactic elements; %(the mathematical equivalents of \texttt{foo} and \texttt{bar}); $t$, $T$ and~$\tau$ for time;~$d$, $\delta$ and~$\Delta$ for change;~$A$, $B$ and~$C$ for unknown coefficients; etc. } but rather because we simply do not have enough Roman letters. An equation like \[ \alpha^2 + \beta^2 = \gamma^2 \] says no more than does an equation like \[ a^2 + b^2 = c^2, \] after all. The letters are just different (though naturally one prefers to use the letters one's audience expects when one can). Applied as well as professional mathematicians tend to use Roman and Greek letters in certain long-established conventional sets:~$abcd$; $fgh$; $ijk\ell$; $mn$; $pqr$; $st$; $uvw$; $xyz$. For the Greek:~$\alpha\beta\gamma$; $\delta\epsilon$; $\kappa\lambda\mu\nu\xi$; $\rho\sigma\tau$; $\phi\chi\psi\omega$. Greek letters are frequently paired with their Roman congeners as appropriate: $a\alpha$; $b\beta$; $cg\gamma$; $d\delta$; $e\epsilon$; $f\phi$; $k\kappa$; $\ell\lambda$; $m\mu$; $n\nu$; $p\pi$; $r\rho$; $s\sigma$; $t\tau$; $x\chi$; $z\zeta$.% \footnote{ The capital pair~$Y\Upsilon$ is occasionally seen but is awkward both because the Greek minuscule~$\upsilon$ is visually almost indistinguishable from the unrelated (or distantly related) Roman minuscule~$v$; and because the ancient Romans regarded the letter~$Y$ not as a congener but as the Greek letter itself, seldom used but to spell Greek words in the Roman alphabet. To use~$Y$ and~$\Upsilon$ as separate symbols is to display an indifference to, easily misinterpreted as an ignorance of, the Graeco-Roman sense of the thing---which is silly, arguably, if you think about it, since no one objects when you differentiate~$j$ from~$i$, or~$u$ and~$w$ from~$v$---but, anyway, one is probably the wiser to tend to limit the mathematical use of the symbol~$\Upsilon$ to the very few instances in which established convention decrees it. (In English particularly, there is also an old typographical ambiguity between~$Y$ and a Germanic, non-Roman letter named ``thorn'' which has practically vanished from English today, to the point that the typeface in which you are reading these words lacks a glyph for it---but which sufficiently literate writers are still expected to recognize on sight. This is one more reason to tend to avoid~$\Upsilon$ when you can, a Greek letter that makes you look ignorant when you use it wrong and pretentious when you use it right. You can't win.) The history of the alphabets is extremely interesting. Unfortunately, a footnote in an appendix to a book on derivations of applied mathematics is probably not the right place for an essay on the topic, so we'll let the matter rest there. } Mathematicians usually avoid letters like the Greek capital~H (eta), which looks just like the Roman capital~H, even though~H (eta) is an entirely proper member of the Greek alphabet. The Greek minuscule~$\upsilon$ (upsilon) is avoided for like reason, for mathematical symbols are useful only insofar as we can visually tell them apart. 
derivations-0.53.20120414.orig/tex/eigen.tex0000644000000000000000000025517111742566274016763 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The eigenvalue} \label{eigen} \index{eigenvalue} \index{determinant} The \emph{eigenvalue} is a scalar by which a square matrix scales a vector without otherwise changing it, such that \[ A \ve v = \lambda \ve v. \] This chapter analyzes the eigenvalue and the associated \emph{eigenvector} it scales. Before treating the eigenvalue proper, the chapter gathers from across Chs.~\ref{matrix} through~\ref{eigen} several properties all invertible square matrices share, assembling them in \S~\ref{eigen:370} for reference. One of these regards the \emph{determinant,} which opens the chapter. % ---------------------------------------------------------------------- \section{The determinant} \label{eigen:310} \index{determinant} Through Chs.~\ref{matrix}, \ref{gjrank} and~\ref{mtxinv} the theory of the matrix has developed slowly but pretty straightforwardly. Here comes the first unexpected turn. \index{permutor} \index{parity} It begins with an arbitrary-seeming definition. The \emph{determinant} of an $n \times n$ square matrix~$A$ is the sum of~$n!$ terms, each term the product of~$n$ elements, no two elements from the same row or column, terms of positive parity adding to and terms of negative parity subtracting from the sum---a term's parity (\S~\ref{matrix:322}) being the parity of the permutor (\S~\ref{matrix:325.10}) marking the positions of the term's elements. \index{inversion!symbolic} Unless you already know about determinants, the definition alone might seem hard to parse, so try this. The inverse of the general $2\times 2$ square matrix \[ A_2 = \left[\br{rr} a_{11} & a_{12} \\ a_{21} & a_{22} \er\right], \] by the Gauss-Jordan method or any other convenient technique, is found to be \[ A_2^{-1} = \frac{ \left[\br{rr} a_{22} &-a_{12} \\ -a_{21} & a_{11} \er\right] }{a_{11}a_{22} - a_{12}a_{21}}. \] The quantity% \footnote{ The determinant $\det A$ used to be written~$\left|A\right|$, an appropriately terse notation for which the author confesses some nostalgia. The older notation~$\left|A\right|$ however unluckily suggests ``the magnitude of~$A$,'' which though not quite the wrong idea is not quite the right idea, either. The magnitude~$\left|z\right|$ of a scalar or~$\left|\ve u\right|$ of a vector is a real-valued, nonnegative, nonanalytic function of the elements of the quantity in question, whereas the determinant $\det A$ is a complex-valued, analytic function. The book follows convention by denoting the determinant as $\det A$ for this reason among others. } \[ \det A_2 = a_{11}a_{22} - a_{12}a_{21} \] in the denominator is defined to be the \emph{determinant} of~$A_2$. Each of the determinant's terms includes one element from each column of the matrix and one from each row, with parity giving the term its~$\pm$ sign. The determinant of the general $3 \times 3$ square matrix by the same rule is \bqb \det A_3 &=& ( a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} ) \\ &&\ \ \mbox{} - ( a_{13}a_{22}a_{31} + a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32} ); \eqb and indeed if we tediously invert such a matrix symbolically, we do find that quantity in the denominator there. \index{determinant!definition of} \index{marking quasielementary} \index{quasielementary operator!marking} \index{marking!permutor} \index{permutor!marking} The parity rule merits a more careful description. 
The parity of a term like $a_{12}a_{23}a_{31}$ is positive because the parity of the permutor, or interchange quasielementary (\S~\ref{matrix:325.10}), \[ P = \mf{ccc}{ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 } \] marking the positions of the term's elements is positive. The parity of a term like $a_{13}a_{22}a_{31}$ is negative for the same reason. The determinant comprehends all possible such terms,~$n!$ in number, half of positive parity and half of negative. (How do we know that exactly half are of positive and half, negative? Answer: by pairing the terms. For every term like $a_{12}a_{23}a_{31}$ whose marking permutor is~$P$, there is a corresponding $a_{13}a_{22}a_{31}$ whose marking permutor is $T_{[1 \lra 2]}P$, necessarily of opposite parity. The sole exception to the rule is the $1 \times 1$ square matrix, which has no second term to pair.) \index{rank-$n$ determinant} \index{determinant!rank-$n$} Normally the context implies a determinant's rank~$n$, but the nonstandard notation \[ \mopx{det}^{(n)} A \] is available especially to call the rank out, stating explicitly that the determinant has exactly~$n!$ terms. (See also \S\S~\ref{matrix:180.22} and~\ref{matrix:321} and eqn.~\ref{matrix:321:20}.% \footnote{ And further Ch.~\ref{mtxinv}'s footnotes~\ref{mtxinv:245:08} and~\ref{mtxinv:450:08}. }% ) It is admitted% \footnote{\cite[\S~1.2]{Franklin}} that we have not, as yet, actually shown the determinant to be a generally useful quantity; we have merely motivated and defined it. Historically the determinant probably emerged not from abstract considerations but for the mundane reason that the quantity it represents occurred frequently in practice (as in the~$A_2^{-1}$ example above). Nothing however logically prevents one from simply defining some quantity which, at first, one merely suspects will later prove useful. So we do here.% \footnote{\cite[Ch.~1]{Franklin}} \subsection{Basic properties} \label{eigen:310.25} \index{determinant!properties of} The determinant $\det A$ enjoys several useful basic properties. \bi \item \index{interchange} If \[ c_{i*} = \begin{cases} a_{i''*} & \mbox{when $i = i'$,} \\ a_{i'*} & \mbox{when $i = i''$,} \\ a_{i*} & \mbox{otherwise,} \end{cases} \] or if \[ c_{*j} = \begin{cases} a_{*j''} & \mbox{when $j = j'$,} \\ a_{*j'} & \mbox{when $j = j''$,} \\ a_{*j} & \mbox{otherwise,} \end{cases} \] where $i'' \neq i'$ and $j'' \neq j'$, then \bq{eigen:310:16} \det C = -\det A. \eq Interchanging rows or columns negates the determinant. \item \index{scaling} If \[ c_{i*} = \begin{cases} \alpha a_{i*} & \mbox{when $i = i'$,} \\ a_{i*} & \mbox{otherwise,} \end{cases} \] or if \[ c_{*j} = \begin{cases} \alpha a_{*j} & \mbox{when $j = j'$,} \\ a_{*j} & \mbox{otherwise,} \end{cases} \] then \bq{eigen:310:11} \det C = \alpha\det A. \eq Scaling a single row or column of a matrix scales the matrix's determinant by the same factor. (Equation~\ref{eigen:310:11} tracks the linear scaling property of \S~\ref{integ:240.05} and of eqn.~\ref{matrix:000:20}.) \item \index{superposition} If \[ c_{i*} = \begin{cases} a_{i*} + b_{i*} & \mbox{when $i = i'$,} \\ a_{i*} = b_{i*} & \mbox{otherwise,} \end{cases} \] or if \[ c_{*j} = \begin{cases} a_{*j} + b_{*j} & \mbox{when $j = j'$,} \\ a_{*j} = b_{*j} & \mbox{otherwise,} \end{cases} \] then \bq{eigen:310:12} \det C = \det A + \det B. 
\eq If one row or column of a matrix~$C$ is the sum of the corresponding rows or columns of two other matrices~$A$ and~$B$, while the three matrices remain otherwise identical, then the determinant of the one matrix is the sum of the determinants of the other two. (Equation~\ref{eigen:310:12} tracks the linear superposition property of \S~\ref{integ:240.05} and of eqn.~\ref{matrix:000:20}.) \item \index{row!null} \index{column!null} If \[ c_{i'*} = 0, \] or if \[ c_{*j'} = 0, \] then \bq{eigen:310:18} \det C = 0. \eq A matrix with a null row or column also has a null determinant. \item \index{row!scaled and repeated} \index{column!scaled and repeated} If \[ c_{i''*} = \gamma c_{i'*}, \] or if \[ c_{*j''} = \gamma c_{*j'}, \] where $i'' \neq i'$ and $j'' \neq j'$, then \bq{eigen:310:17} \det C = 0. \eq The determinant is zero if one row or column of the matrix is a multiple of another. \item \index{adjoint} \index{transpose} The determinant of the adjoint is just the determinant's conjugate, and the determinant of the transpose is just the determinant itself: \bq{eigen:310:19} \begin{split} \det C^{*} &= \left(\det C\right)^{*}; \\ \det C^{T} &= \det C. \end{split} \eq \ei These basic properties are all fairly easy to see if the definition of the determinant is clearly understood. % Equations~(\ref{eigen:310:11}), (\ref{eigen:310:12}) and~(\ref{eigen:310:18}) come because each of the~$n!$ terms in the determinant's expansion has exactly one element from row~$i'$ or column~$j'$. % Equation~(\ref{eigen:310:16}) comes because a row or column interchange reverses parity. % Equation~(\ref{eigen:310:19}) comes because according to \S~\ref{matrix:325.10}, the permutors~$P$ and~$P^{*}$ always have the same parity, and because the adjoint operation individually conjugates each element of~$C$. % Finally,~(\ref{eigen:310:17}) comes because, in this case, every term in the determinant's expansion finds an equal term of opposite parity to offset it. Or, more formally,~(\ref{eigen:310:17}) comes because the following procedure does not alter the matrix: (i) scale row~$i''$ or column~$j''$ by $1/\gamma$; (ii) scale row~$i'$ or column~$j'$ by~$\gamma$; (iii) interchange rows $i' \lra i''$ or columns $j' \lra j''$. Not altering the matrix, the procedure does not alter the determinant either; and indeed according to~(\ref{eigen:310:11}), step~(ii)'s effect on the determinant cancels that of step~(i). However, according to~(\ref{eigen:310:16}), step~(iii) negates the determinant. Hence the net effect of the procedure is to negate the determinant---to negate the very determinant the procedure is not permitted to alter. The apparent contradiction can be reconciled only if the determinant is zero to begin with. From the foregoing properties the following further property can be deduced. \bi \item \index{addition!of rows or columns} \index{row!addition of} \index{column!addition of} If \[ c_{i*} = \begin{cases} a_{i*} + \alpha a_{i'*} & \mbox{when $i = i''$,} \\ a_{i*}& \mbox{otherwise,} \end{cases} \] or if \[ c_{*j} = \begin{cases} a_{*j} + \alpha a_{*j'} & \mbox{when $j = j''$,} \\ a_{*j} & \mbox{otherwise,} \end{cases} \] where $i'' \neq i'$ and $j'' \neq j'$, then \bq{eigen:310:21} \det C = \det A. \eq Adding to a row or column of a matrix a multiple of another row or column does not change the matrix's determinant. 
\ei To derive~(\ref{eigen:310:21}) for rows (the column proof is similar), one defines a matrix~$B$ such that \[ b_{i*} \equiv \begin{cases} \alpha a_{i'*} & \mbox{when $i = i''$,} \\ a_{i*} & \mbox{otherwise.} \end{cases} \] From this definition, $b_{i''*} = \alpha a_{i'*}$ whereas $b_{i'*} = a_{i'*}$, so \[ b_{i''*} = \alpha b_{i'*}, \] which by~(\ref{eigen:310:17}) guarantees that \[ \det B=0. \] On the other hand, the three matrices~$A$, $B$ and~$C$ differ only in the ($i''$)th row, where $[C]_{i''*} = [A]_{i''*} + [B]_{i''*}$; so, according to~(\ref{eigen:310:12}), \[ \det C = \det A + \det B. \] Equation~(\ref{eigen:310:21}) results from combining the last two equations. \subsection{The determinant and the elementary operator} \label{eigen:310.35} \index{elementary operator!and the determinant} \index{determinant!and the elementary operator} Section~\ref{eigen:310.25} has it that interchanging, scaling or adding rows or columns of a matrix respectively negates, scales or does not alter the matrix's determinant. But the three operations named are precisely the operations of the three elementaries of \S~\ref{matrix:320}. Therefore, \bq{eigen:310:25} \renewcommand\arraystretch{1.3} \br{c} \br{rcrcl} \det T_{[i\lra j]}A &=& -\det A &=& \det AT_{[i\lra j]} , \\ \det T_{\alpha[i]}A &=& \alpha\det A &=& \det AT_{\alpha[j]} , \\ \det T_{\alpha[ij]}A &=& \det A &=& \det AT_{\alpha[ij]} , \er \\ 1 \le (i,j) \le n,\ i \neq j, \er \eq for any $n \times n$ square matrix~$A$. Obviously also, \bq{eigen:310:27} \renewcommand\arraystretch{1.3} \br{rcccl} \det IA &=& \det A &=& \det AI , \\ \det I_nA &=& \det A &=& \det AI_n , \\ \det I &=& 1 &=& \det I_n . \er \eq If~$A$ is taken to represent an arbitrary product of identity matrices ($I_n$ and/or~$I$) and elementary operators, then a significant consequence of~(\ref{eigen:310:25}) and~(\ref{eigen:310:27}), applied recursively, is that the determinant of a product is the product of the determinants, at least where identity matrices and elementary operators are concerned. In symbols,% \footnote{\label{eigen:310:fn1}% Notation like ``$\in$'', first met in \S~\ref{alggeo:227}, can be too fancy for applied mathematics, but it does help here. The notation $M_k \in \{ \ldots \}$ restricts~$M_k$ to be any of the things between the braces. As it happens though, in this case,~(\ref{eigen:310:30}) below is going to erase the restriction. } \bqa \det\left( \prod_k M_k \right) &=& \prod_k \det M_k , \label{eigen:310:26}\\ M_k &\in& \left\{ I_n, I, T_{[i \lra j]}, T_{\alpha[i]}, T_{\alpha[ij]} \right\}, \xn\\ 1 \ \le \ (i,j) &\le& n. \xn \eqa This matters because, as the Gauss-Jordan decomposition of \S~\ref{gjrank:341} has shown, one can build up any square matrix of full rank by applying elementary operators to~$I_n$. Section~\ref{eigen:310.40} will put the rule~(\ref{eigen:310:26}) to good use. \subsection{The determinant of a singular matrix} \label{eigen:310.37} \index{singular matrix!determinant of} \index{determinant!zero} Equation~(\ref{eigen:310:25}) gives elementary operators the power to alter a matrix's determinant almost arbitrarily---almost arbitrarily, but not quite. What an $n \times n$ elementary operator% \footnote{ That is, an elementary operator which honors an $n \times n$ active region. See \S~\ref{matrix:180.35}. } cannot do is to change an $n \times n$ matrix's determinant to or from zero. Once zero, a determinant remains zero under the action of elementary operators. Once nonzero, always nonzero. 
Elementary operators being reversible have no power to breach this barrier. Another thing $n \times n$ elementaries cannot do according to \S~\ref{gjrank:340.20} is to change an $n \times n$ matrix's rank. Nevertheless, such elementaries can reduce any $n \times n$ matrix reversibly to~$I_r$, where $r \le n$ is the matrix's rank, by the Gauss-Jordan algorithm of \S~\ref{gjrank:341}. Equation~(\ref{eigen:310:18}) has that the $n \times n$ determinant of~$I_r$ is zero if $r 0$, because according to the fundamental theorem of algebra~(\ref{noth:320:50}) the matrix's characteristic polynomial~(\ref{eigen:eigdet}) has at least one root, an eigenvalue, which by definition would be no eigenvalue if it had no eigenvector to scale, and for which~(\ref{eigen:410:10}) necessarily admits at least one nonzero solution~$\ve v$ because its matrix $A-\lambda I_n$ is degenerate. \ei % ---------------------------------------------------------------------- \section{Diagonalization} \label{eigen:423} \index{diagonalization} \index{decomposition!diagonal} \index{decomposition!eigenvalue} \index{diagonal decomposition} \index{eigenvalue decomposition} \index{eigenvalue matrix} \index{matrix!eigenvalue} Any $n \times n$ matrix with~$n$ independent eigenvectors (which class per \S~\ref{eigen:422} includes, but is not limited to, every $n \times n$ matrix with~$n$ distinct eigenvalues) can be \emph{diagonalized} as \bq{eigen:diag} A = V \Lambda V^{-1}, \eq where \[ \Lambda = \mf{ccccc}{ \lambda_1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \lambda_{n-1} & 0 \\ 0 & 0 & \cdots & 0 & \lambda_n } \] is an otherwise empty $n \times n$ matrix with the eigenvalues of~$A$ set along its main diagonal and \[ V = \left[ \br{ccccc} \ve v_1 & \ve v_2 & \cdots & \ve v_{n-1} & \ve v_n \er \right] \] is an $n \times n$ matrix whose columns are the eigenvectors of~$A$. This is so because the identity $A\ve v_j = \ve v_j \lambda_j$ holds for all $1 \le j \le n$; or, expressed more concisely, because the identity \[ AV = V\Lambda \] holds.% \footnote{ If this seems confusing, then consider that the $j$th column of the product~$AV$ is~$A\ve v_j$, whereas the $j$th column of~$\Lambda$ having just the one element acts to scale $V$'s $j$th column only. } The matrix~$V$ is invertible because its columns the eigenvectors are independent, from which~(\ref{eigen:diag}) follows. Equation~(\ref{eigen:diag}) is called the \emph{eigenvalue decomposition,} the \emph{diagonal decomposition} or the \emph{diagonalization} of the square matrix~$A$. One might object that we had shown only how to compose some matrix $V\Lambda V^{-1}$ with the correct eigenvalues and independent eigenvectors, but had failed to show that the matrix was actually~$A$. However, we need not show this, because \S~\ref{eigen:422} has already demonstrated that two matrices with the same eigenvalues and independent eigenvectors are in fact the same matrix, whereby the product $V\Lambda V^{-1}$ can be nothing other than~$A$. \index{eigenvector!independent} \index{eigenvalue!distinct} \index{diagonalizability} An $n \times n$ matrix with~$n$ independent eigenvectors (which class, again, includes every $n \times n$ matrix with~$n$ distinct eigenvalues and also includes many matrices with fewer) is called a \emph{diagonalizable} matrix. Besides factoring a diagonalizable matrix by~(\ref{eigen:diag}), one can apply the same formula to compose a diagonalizable matrix with desired eigensolutions. 
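By way of illustration (with a small matrix contrived here merely for its round numbers), the matrix
\[
	A = \mf{rr}{ 4 & 1 \\ 2 & 3 }
\]
has the two distinct eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$, scaling the respective eigenvectors
\[
	\ve v_1 = \mf{r}{ 1 \\ 1 }, \ \ 
	\ve v_2 = \mf{r}{ 1 \\ -2 },
\]
so its diagonalization~(\ref{eigen:diag}) is
\[
	A = V \Lambda V^{-1}
	= \mf{rr}{ 1 & 1 \\ 1 & -2 }
	  \mf{rr}{ 5 & 0 \\ 0 & 2 }
	  \left( \frac 13 \mf{rr}{ 2 & 1 \\ 1 & -1 } \right),
\]
as direct multiplication confirms.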
The diagonal matrix $\mopx{diag}\{\ve x\}$ of~(\ref{matrix:325:diag}) is trivially diagonalizable as $\mopx{diag}\{\ve x\}=I_n\mopx{diag}\{\ve x\}I_n$. \index{matrix!raised to a complex power} It is a curious and useful fact that \[ A^2 = (V\Lambda V^{-1})(V\Lambda V^{-1}) = V\Lambda^2 V^{-1} \] and by extension that \bq{eigen:423:30} A^k = V\Lambda^k V^{-1} \eq for any diagonalizable matrix~$A$. The diagonal matrix~$\Lambda^k$ is nothing more than the diagonal matrix~$\Lambda$ with each element individually raised to the $k$th power, such that \[ \left[ \Lambda^k \right]_{ij} = \delta_{ij} \lambda_{j}^k. \] Changing $z \la k$ implies the generalization% \footnote{ It may not be clear however according to~(\ref{cexp:230:33}) which branch of~$\lambda_j^z$ one should choose at each index~$j$, especially if~$A$ has negative or complex eigenvalues. } \settoheight\tlj{\scriptsize $k$} \bq{eigen:423:35} \begin{split} A^z &= V\Lambda^z V^{-1}, \\ \left[ \Lambda^{\rule{0em}{\tlj}z} \right]_{ij} &= \delta_{ij} \lambda_{j}^z, \end{split} \eq good for any diagonalizable~$A$ and complex~$z$. \index{matrix!nondiagonalizable versus singular} \index{singular matrix!versus a nondiagonalizable matrix} \index{nondiagonalizable matrix!versus a singular matrix} \index{eigenvector!repeated} \index{eigensolution!repeated} \index{repeated eigenvalue} Nondiagonalizable matrices are troublesome and interesting. The nondiagonalizable matrix vaguely resembles the singular matrix in that both represent edge cases and can be hard to handle numerically; but the resemblance ends there, and a matrix can be either without being the other. The $n \times n$ null matrix for example is singular but still diagonalizable. What a nondiagonalizable matrix is in essence is a matrix with a repeated eigensolution: the same eigenvalue with the same eigenvector, twice or more. More formally, a nondiagonalizable matrix is a matrix with an $n$-fold eigenvalue whose corresponding eigenvector space fewer than~$n$ eigenvectors fully characterize. Section~\ref{eigen:520.30} will have more to say about the nondiagonalizable matrix. % ---------------------------------------------------------------------- \section{Remarks on the eigenvalue} \label{eigen:429} \index{dominant eigenvalue} \index{eigenvalue!dominant} Eigenvalues and their associated eigenvectors stand among the principal reasons one goes to the considerable trouble to develop matrix theory as we have done in recent chapters. The idea that a matrix resembles a humble scalar in the right circumstance is powerful. Among the reasons for this is that a matrix can represent an iterative process, operating repeatedly on a vector~$\ve v$ to change it first to~$A \ve v$, then to~$A^2 \ve v$, $A^3 \ve v$ and so on. The \emph{dominant eigenvalue} of~$A$, largest in magnitude, tends then to transform~$\ve v$ into the associated eigenvector, gradually but relatively eliminating all other components of~$\ve v$. Should the dominant eigenvalue have greater than unit magnitude, it destabilizes the iteration; thus one can sometimes judge the stability of a physical process indirectly by examining the eigenvalues of the matrix which describes it. Then there is the edge case of the nondiagonalizable matrix, which matrix surprisingly covers only part of its domain with eigenvectors. All this is fairly deep mathematics. It brings an appreciation of the matrix for reasons which were anything but apparent from the outset of Ch.~\ref{matrix}. 
Remarks continue in \S\S~\ref{eigen:520.30} and~\ref{eigen:900}. % ---------------------------------------------------------------------- \section{Matrix condition} \label{eigen:470} \index{matrix!condition of} \index{condition} \index{eigenvalue!dominant} \index{dominant eigenvalue} The largest in magnitude of the several eigenvalues of a diagonalizable operator~$A$, denoted here~$\lambda_\mr{max}$, tends to dominate the iteration~$A^k\ve x$. Section~\ref{eigen:429} has named~$\lambda_\mr{max}$ the \emph{dominant eigenvalue} for this reason. \index{normalization} \index{unit magnitude} \index{magnitude!unit} \index{magnitude!of an eigenvalue} \index{iteration} \index{magnification} \index{eigenvalue!magnitude of} \index{eigenvalue!zero} \index{eigenvalue!small} \index{eigenvalue!large} \index{matrix!singular} \index{singular matrix} One sometimes finds it convenient to normalize a dominant eigenvalue by defining a new operator $A' \equiv A/\left|\lambda_\mr{max}\right|$, whose own dominant eigenvalue $\lambda_\mr{max}/\left|\lambda_\mr{max}\right|$ has unit magnitude. In terms of the new operator, the iteration becomes $A^k\ve x=\left|\lambda_\mr{max}\right|^kA'^k\ve x$, leaving one free to carry the magnifying effect $\left|\lambda_\mr{max}\right|^k$ separately if one prefers to do so. However, the scale factor $1/\left|\lambda_\mr{max}\right|$ scales all eigenvalues equally; thus, if $A$'s eigenvalue of \emph{smallest} magnitude is denoted~$\lambda_\mr{min}$, then the corresponding eigenvalue of~$A'$ is $\lambda_\mr{min}/\left|\lambda_\mr{max}\right|$. If zero, then both matrices according to \S~\ref{eigen:410} are singular; if nearly zero, then both matrices are ill conditioned. \index{ill-conditioned matrix} \index{matrix!ill-conditioned} \index{imprecise quantity} \index{floating-point number} \index{skepticism} \index{exact arithmetic} \index{arithmetic!exact} Such considerations lead us to define the \emph{condition} of a diagonalizable matrix quantitatively as% \footnote{ \cite{deSturler-lecture} } \bq{eigen:470:10} \kappa \equiv \left| \frac{ \lambda_\mr{max} }{ \lambda_\mr{min} } \right|, \eq by which \bq{eigen:470:20} \kappa \ge 1 \eq is always a real number of no less than unit magnitude. For best invertibility, $\kappa = 1$ would be ideal (it would mean that all eigenvalues had the same magnitude), though in practice quite a broad range of~$\kappa$ is usually acceptable. Could we always work in exact arithmetic, the value of~$\kappa$ might not interest us much as long as it stayed finite; but in computer floating point, or where the elements of~$A$ are known only within some tolerance, infinite~$\kappa$ tends to emerge imprecisely rather as large $\kappa \gg 1$. An \emph{ill-conditioned} matrix by definition% \footnote{ There is of course no definite boundary, no particular edge value of~$\kappa$, less than which a matrix is well conditioned, at and beyond which it turns ill-conditioned; but you knew that already. If I tried to claim that a matrix with a fine $\kappa=3$ were ill conditioned, for instance, or that one with a wretched $\kappa = 2^\mr{0x18}$ were well conditioned, then you might not credit me---but the mathematics nevertheless can only give the number; it remains to the mathematician to interpret it. } is a matrix of large $\kappa \gg 1$. The applied mathematician handles such a matrix with due skepticism. \index{error!in the solution to a linear system} \index{solution!error in} Matrix condition so defined turns out to have another useful application. 
Suppose that a diagonalizable matrix~$A$ is precisely known but that the corresponding driving vector~$\ve b$ is not. If \[ A(\ve x + \delta\ve x)= \ve b + \delta\ve b, \] where~$\delta\ve b$ is the error in~$\ve b$ and~$\delta\ve x$ is the resultant error in~$\ve x$, then one should like to bound the ratio $\left|\delta\ve x\right|/\left|\ve x\right|$ to ascertain the reliability of~$\ve x$ as a solution. Transferring~$A$ to the equation's right side, \[ \ve x + \delta\ve x= A^{-1}(\ve b + \delta\ve b). \] Subtracting $\ve x=A^{-1}\ve b$ and taking the magnitude, \[ \left|\delta\ve x\right| = \left|A^{-1}\,\delta\ve b\right|. \] Dividing by $\left|\ve x\right| = \left|A^{-1}\ve b\right|$, \[ \frac{\left|\delta\ve x\right|}{\left|\ve x\right|} = \frac{\left|A^{-1}\,\delta\ve b\right|}{\left|A^{-1}\ve b\right|}. \] The quantity $\left|A^{-1}\,\delta\ve b\right|$ cannot exceed $\left|\lambda_\mr{min}^{-1}\,\delta\ve b\right|$. The quantity $\left|A^{-1}\ve b\right|$ cannot fall short of $\left|\lambda_\mr{max}^{-1}\ve b\right|$. Thus, \[ \frac{\left|\delta\ve x\right|}{\left|\ve x\right|} \le \frac{\left|\lambda_\mr{min}^{-1}\,\delta\ve b\right|}{\left|\lambda_\mr{max}^{-1}\ve b\right|} = \left|\frac{\lambda_\mr{max}}{\lambda_\mr{min}}\right| \frac{\left|\delta\ve b\right|}{\left|\ve b\right|}. \] That is, \bq{eigen:470:30} \frac{\left|\delta\ve x\right|}{\left|\ve x\right|} \le \kappa \frac{\left|\delta\ve b\right|}{\left|\ve b\right|}. \eq \index{scalar!condition of} \index{condition!of a scalar} Condition, incidentally, might technically be said to apply to scalars as well as to matrices, but ill condition remains a property of matrices alone. According to~(\ref{eigen:470:10}), the condition of every nonzero scalar is happily $\kappa = 1$. % ---------------------------------------------------------------------- \section{The similarity transformation} \label{eigen:505} \index{similarity transformation} \index{basis} \index{vector!building of from basis vectors} Any collection of vectors assembled into a matrix can serve as a \emph{basis} by which other vectors can be expressed. For example, if the columns of \[ B = \mf{rr}{ 1 & -1 \\ 0 & 2 \\ 0 & 1 } \] are regarded as a basis, then the vector \[ B \mf{r}{ 5 \\ 1 } = 5\mf{r}{ 1 \\ 0 \\ 0 } + 1\mf{r}{ -1 \\ 2 \\ 1 } = \mf{r}{ 4 \\ 2 \\ 1 } \] is $(5,1)$ in the basis~$B$: five times the first basis vector plus once the second. The basis provides the units from which other vectors can be built. \index{basis!complete} \index{complete basis} \index{basis!converting to and from} Particularly interesting is the $n \times n$, invertible \emph{complete basis}~$B$, in which the~$n$ basis vectors are independent and address the same full space the columns of~$I_n$ address. If \[ \ve x = B \ve u \] then~$\ve u$ represents~$\ve x$ in the basis~$B$. Left-multiplication by~$B$ evidently converts out of the basis. Left-multiplication by~$B^{-1}$, \[ \ve u = B^{-1} \ve x, \] then does the reverse, converting into the basis. One can therefore convert any operator~$A$ to work within a complete basis~$B$ by the successive steps \bqb A\ve x &=& \ve b, \\ AB\ve u &=& \ve b, \\ \mbox{}[B^{-1}AB]\ve u &=& B^{-1}\ve b, \eqb by which the operator $B^{-1}AB$ is seen to be the operator~$A$, only transformed to work within the basis% \footnote{ The reader may need to ponder the basis concept a while to grasp it, but the concept is simple once grasped and little purpose would be served by dwelling on it here. 
Basically, the idea is that one can build the same vector from alternate building blocks, not only from the standard building blocks~$\ve e_1$, $\ve e_2$, $\ve e_3$, etc.---except that the right word for the relevant ``building block'' is \emph{basis vector.} The books~\cite{Hefferon} and~\cite{Lay} introduce the basis more gently; one might consult one of those if needed. }$\mbox{}^,$% \footnote{ The professional matrix literature sometimes distinguishes by typeface between the matrix~$B$ and the basis~$\textsf{B}$ its columns represent. Such semantical distinctions seem a little too fine for applied use, though. This book just uses~$B$. }% ~$B$. \index{similarity} \index{unitary similarity} \index{unitary transformation} The conversion from~$A$ into $B^{-1}AB$ is called a \emph{similarity transformation.} If~$B$ happens to be unitary (\S~\ref{mtxinv:465}), then the conversion is also called a \emph{unitary transformation.} The matrix $B^{-1}AB$ the transformation produces is said to be \emph{similar} (or, if~$B$ is unitary, \emph{unitarily similar}) to the matrix~$A$. We have already met the similarity transformation in \S\S~\ref{matrix:321} and~\ref{gjrank:337}. Now we have the theory to appreciate it properly. Probably the most important property of the similarity transformation is that it alters no eigenvalues. That is, if \[ A \ve x = \lambda \ve x, \] then, by successive steps, \bqa B^{-1}A(BB^{-1})\ve x &=& \lambda B^{-1} \ve x, \xn\\ \mbox{}[B^{-1}AB]\ve u &=& \lambda \ve u. \label{eigen:505:50} \eqa \emph{The eigenvalues of~$A$ and the similar $B^{-1}AB$ are the same} for any square, $n \times n$ matrix~$A$ and any invertible, square, $n \times n$ matrix~$B$. % ---------------------------------------------------------------------- \section{The Schur decomposition} \label{eigen:520} \index{decomposition!Schur} \index{Schur decomposition} \index{Schur, Issai (1875--1941)} The \emph{Schur decomposition} of an arbitrary, $n \times n$ square matrix~$A$ is \bq{eigen:schur} A = QU_SQ^{*}, \eq where~$Q$ is an $n \times n$ unitary matrix whose inverse, as for any unitary matrix (\S~\ref{mtxinv:465}), is $Q^{-1} = Q^{*}$; and where~$U_S$ is a general upper triangular matrix which can have any values (even zeros) along its main diagonal. The Schur decomposition is slightly obscure, is somewhat tedious to derive and is of limited use in itself, but serves a theoretical purpose.% \footnote{ The alternative is to develop the interesting but difficult \emph{Jordan canonical form,} which for brevity's sake this chapter prefers to omit. } We derive it here for this reason. \subsection{Derivation} \label{eigen:520.10} % At the moment this comment is written, it appears that this lengthy % subsection merits no index entries. Interesting. Suppose that% \footnote{ This subsection assigns various capital Roman letters to represent the several matrices and submatrices it manipulates. Its choice of letters except in~(\ref{eigen:schur}) is not standard and carries no meaning elsewhere. The writer had to choose some letters and these are ones he chose. This footnote mentions the fact because good mathematical style avoid assigning letters that already bear a conventional meaning in a related context (for example, this book avoids writing $A\ve x = \ve b$ as $T\ve e = \ve i$, not because the latter is wrong but because it would be extremely confusing). The Roman alphabet provides only twenty-six capitals, though, of which this subsection uses too many to be allowed to reserve any. See Appendix~\ref{greek}. 
} (for some reason, which will shortly grow clear) we have a matrix~$B$ of the form \settowidth\tla{$C \equiv Q^{*}BQ$} \settowidth\tlc{$,$} \bq{eigen:520:21} \makebox[\tla][r]{$B$} = \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots &{*}&{*}&{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 &{*}&{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 &{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 &{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }\makebox[\tlc][l]{$,$} \eq where the $i$th row and $i$th column are depicted at center. Suppose further that we wish to transform~$B$ not only similarly but unitarily into \bq{eigen:520:22} \makebox[\tla][r]{$C \equiv W^{*}BW$} = \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots &{*}&{*}&{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 &{*}&{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 &{*}&{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 &{*}&{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 &{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 &{*}&{*}& \cdots \\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }\makebox[\tlc][l]{$,$} \eq where~$W$ is an $n\times n$ unitary matrix, and where we do not mind if any or all of the~${*}$ elements change in going from~$B$ to~$C$ but we require zeros in the indicated spots. Let~$B_o$ and~$C_o$ represent the $(n-i) \times (n-i)$ submatrices in the lower right corners respectively of~$B$ and~$C$, such that \bq{eigen:520:25} \begin{split} B_o & \equiv I_{n-i}H_{-i}BH_{i}I_{n-i}, \\ C_o & \equiv I_{n-i}H_{-i}CH_{i}I_{n-i}, \end{split} \eq where~$H_k$ is the shift operator of \S~\ref{matrix:340}. Pictorially, \[ B_o = \mf{cccc}{ {*}&{*}&{*}& \cdots \\ {*}&{*}&{*}& \cdots \\ {*}&{*}&{*}& \cdots \\ \vdots&\vdots&\vdots&\ddots }, \ \ % C_o = \mf{cccc}{ {*}&{*}&{*}& \cdots \\ 0 &{*}&{*}& \cdots \\ 0 &{*}&{*}& \cdots \\ \vdots&\vdots&\vdots&\ddots }. \] Equation~(\ref{eigen:520:22}) seeks an $n \times n$ unitary matrix~$W$ to transform the matrix~$B$ into a new matrix $C \equiv W^{*}BW$ such that~$C$ fits the form~(\ref{eigen:520:22}) stipulates. The question remains as to whether a unitary~$W$ exists that satisfies the form and whether for general~$B$ we can discover a way to calculate it. To narrow the search, because we need not find every~$W$ that satisfies the form but only one such~$W$, let us look first for a~$W$ that fits the restricted template \bq{eigen:520:30} W = I_{i} + H_{i}W_oH_{-i} = \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ \cdots & 0 & 0 & 0 & 0 &{*}&{*}&{*}& \cdots \\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }, \eq which contains a smaller, $(n-i)\times(n-i)$ unitary submatrix~$W_o$ in its lower right corner and resembles~$I_n$ elsewhere. 
Beginning from~(\ref{eigen:520:22}), we have by successive, reversible steps that \bqb C &=& W^{*} B W \\&=& (I_{i} + H_{i}W_o^{*}H_{-i}) (B) (I_{i} + H_{i}W_oH_{-i}) \\&=& I_{i}BI_{i} + I_{i}BH_{i}W_oH_{-i} + H_{i}W_o^{*}H_{-i}BI_{i} \\&&\ \ \ \ \mbox{} + H_{i}W_o^{*}H_{-i}BH_{i}W_oH_{-i}. \eqb The unitary submatrix~$W_o$ has only $n-i$ columns and $n-i$ rows, so $I_{n-i}W_o = W_o = W_oI_{n-i}$. Thus, \bqb C &=& I_{i}BI_{i} + I_{i}BH_{i}W_oI_{n-i}H_{-i} + H_{i}I_{n-i}W_o^{*}H_{-i}BI_{i} \\&&\ \ \ \ \mbox{} + H_{i}I_{n-i}W_o^{*}I_{n-i}H_{-i}BH_{i}I_{n-i}W_oI_{n-i}H_{-i} \\&=& I_{i}[B]I_{i} + I_{i}[BH_{i}W_oH_{-i}](I_n-I_i) + (I_n-I_i)[H_{i}W_o^{*}H_{-i}B]I_{i} \\&&\ \ \ \ \mbox{} + (I_n-I_i)[H_{i}W_o^{*}B_oW_oH_{-i}](I_n-I_i), \eqb where the last step has used~(\ref{eigen:520:25}) and the identity~(\ref{matrix:340:61}). The four terms on the equation's right, each term with rows and columns neatly truncated, represent the four quarters of $C \equiv W^{*} B W$---upper left, upper right, lower left and lower right, respectively. The lower left term is null because \bqb (I_n-I_i)[H_{i}W_o^{*}H_{-i}B]I_i &=& (I_n-I_i)[H_{i}W_o^{*}I_{n-i}H_{-i}BI_i]I_i \\&=& (I_n-I_i)[H_{i}W_o^{*}H_{-i}][(I_n-I_i)BI_i]I_i \\&=& (I_n-I_i)[H_{i}W_o^{*}H_{-i}][0]I_i = 0, \eqb leaving \bqb C &=& I_{i}[B]I_{i} + I_{i}[BH_{i}W_oH_{-i}](I_n-I_i) \\&&\ \ \ \ \mbox{} + (I_n-I_i)[H_{i}W_o^{*}B_oW_oH_{-i}](I_n-I_i). \eqb But the upper left term makes the upper left areas of~$B$ and~$C$ the same, and the upper right term does not bother us because we have not restricted the content of $C$'s upper right area. Apparently any $(n-i)\times(n-i)$ unitary submatrix~$W_o$ whatsoever obeys~(\ref{eigen:520:22}) in the lower left, upper left and upper right. That leaves the lower right. Left- and right-multiplying~(\ref{eigen:520:22}) by the truncator $(I_n-I_i)$ to focus solely on the lower right area, we have the reduced requirement that \bq{eigen:520:38} (I_n-I_i)C(I_n-I_i) = (I_n-I_i)W^{*}BW(I_n-I_i). \eq Further left-multiplying by~$H_{-i}$, right-multiplying by~$H_i$, and applying the identity~(\ref{matrix:340:61}) yields \bqb I_{n-i}H_{-i}CH_{i}I_{n-i} = I_{n-i}H_{-i}W^{*}BWH_{i}I_{n-i}; \eqb or, substituting from~(\ref{eigen:520:25}), \[ C_o = I_{n-i}H_{-i}W^{*}BWH_{i}I_{n-i}. \] Expanding~$W$ per~(\ref{eigen:520:30}), \[ C_o = I_{n-i}H_{-i}(I_{i} + H_{i}W_o^{*}H_{-i}) B(I_{i} + H_{i}W_oH_{-i})H_{i}I_{n-i}; \] or, since $I_{n-i}H_{-i}I_{i} = 0 = I_iH_{i}I_{n-i}$, \bqb C_o &=& I_{n-i}H_{-i}(H_{i}W_o^{*}H_{-i}) B(H_{i}W_oH_{-i})H_{i}I_{n-i} \\&=& I_{n-i}W_o^{*}H_{-i} BH_{i}W_oI_{n-i} \\&=& W_o^{*}I_{n-i}H_{-i}BH_{i}I_{n-i}W_o. \eqb Per~(\ref{eigen:520:25}), this is \bq{eigen:520:40} C_o = W_o^{*}B_oW_o. \eq The steps from~(\ref{eigen:520:38}) to~(\ref{eigen:520:40}) are reversible, so the latter is as good a way to state the reduced requirement as the former is. To achieve a unitary transformation of the form~(\ref{eigen:520:22}), therefore, it suffices to satisfy~(\ref{eigen:520:40}). The increasingly well-stocked armory of matrix theory we now have to draw from makes satisfying~(\ref{eigen:520:40}) possible as follows. Observe per \S~\ref{eigen:422} that every square matrix has at least one eigensolution. Let $(\lambda_o,\ve v_o)$ represent an eigensolution of~$B_o$---\emph{any} eigensolution of~$B_o$---with~$\ve v_o$ normalized to unit magnitude. Form the broad, $(n-i)\times(n-i+1)$ matrix \[ F \equiv \left[ \br{cccccc} \ve v_o & \ve e_1 & \ve e_2 & \ve e_3 & \cdots & \ve e_{n-i} \er \right]. 
\] Decompose~$F$ by the Gram-Schmidt technique of \S~\ref{mtxinv:460.30}, choosing $p=1$ during the first instance of the algorithm's step~\ref{mtxinv:461:s20} (though choosing any permissible~$p$ thereafter), to obtain \[ F = Q_F R_F. \] Noting that the Gram-Schmidt algorithm orthogonalizes only rightward, observe that the first column of the $(n-i) \times (n-i)$ unitary matrix~$Q_F$ remains simply the first column of~$F$, which is the unit eigenvector~$\ve v_o$: \[ \left[ Q_F \right]_{*1} = Q_F \ve e_1 = \ve v_o. \] Transform~$B_o$ unitarily by~$Q_F$ to define the new matrix \[ G \equiv Q_F^{*} B_o Q_F, \] then transfer factors to reach the equation \[ Q_F G Q_F^{*} = B_o. \] Right-multiplying by $Q_F \ve e_1 = \ve v_o$ and noting that $B_o\ve v_o = \lambda_o\ve v_o$, observe that \[ Q_F G \ve e_1 = \lambda_o\ve v_o. \] Left-multiplying by~$Q_F^{*}$, \[ G \ve e_1 = \lambda_o Q_F^{*}\ve v_o. \] Noting that the Gram-Schmidt process has rendered orthogonal to~$\ve v_o$ all columns of~$Q_F$ but the first, which is~$\ve v_o$, observe that \[ G \ve e_1 = \lambda_o Q_F^{*}\ve v_o = \lambda_o \ve e_1 = \mf{c}{ \lambda_o \\ 0 \\ 0 \\ \vdots }, \] which means that \[ G = \mf{cccc}{ \lambda_o&{*}&{*}& \cdots \\ 0 &{*}&{*}& \cdots \\ 0 &{*}&{*}& \cdots \\ \vdots&\vdots&\vdots&\ddots }, \] which fits the very form~(\ref{eigen:520:25}) the submatrix~$C_o$ is required to have. Conclude therefore that \bq{eigen:520:50} \begin{split} W_o &= Q_F, \\ C_o &= G, \end{split} \eq where~$Q_F$ and~$G$ are as this paragraph develops, together constitute a valid choice for~$W_o$ and~$C_o$, satisfying the reduced requirement~(\ref{eigen:520:40}) and thus also the original requirement~(\ref{eigen:520:22}). Equation~(\ref{eigen:520:50}) completes a failsafe technique to transform unitarily any square matrix~$B$ of the form~(\ref{eigen:520:21}) into a square matrix~$C$ of the form~(\ref{eigen:520:22}). Naturally the technique can be applied recursively as \bq{eigen:520:53} B|_{i=i'} = C|_{i=i'-1}, \ \ 1 \le i' \le n, \eq because the form~(\ref{eigen:520:21}) of~$B$ at $i=i'$ is nothing other than the form~(\ref{eigen:520:22}) of~$C$ at $i=i'-1$. Therefore, if we let \bq{eigen:520:56} B|_{i=0} = A, \eq then it follows by induction that \bq{eigen:520:57} B|_{i=n} = U_S, \eq where per~(\ref{eigen:520:21}) the matrix~$U_S$ has the general upper triangular form the Schur decomposition~(\ref{eigen:schur}) requires. Moreover, because the product of unitary matrices according to~(\ref{mtxinv:465:20}) is itself a unitary matrix, we have that \bq{eigen:520:60} Q = \coprod_{i'=0}^{n-1} \left( W|_{i=i'} \right), \eq which along with~(\ref{eigen:520:57}) accomplishes the Schur decomposition. \subsection{The nondiagonalizable matrix} \label{eigen:520.30} \index{nondiagonalizable matrix} \index{matrix!nondiagonalizable} \index{triangular matrix} \index{general triangular matrix} \index{diagonal!main} \index{main diagonal} \index{matrix!main diagonal of} \index{characteristic polynomial} \index{polynomial!characteristic} The characteristic equation~(\ref{eigen:eigdet}) of the general upper triangular matrix~$U_S$ is \[ \det( U_S - \lambda I_n ) = 0. 
\] Unlike most determinants, this determinant brings only the one term \[ \det( U_S - \lambda I_n ) = \prod_{i=1}^n ( u_{Sii} - \lambda ) = 0 \] whose factors run straight down the main diagonal, where the determinant's $n!-1$ other terms are all zero because each of them includes at least one zero factor from below the main diagonal.% \footnote{ The determinant's definition in \S~\ref{eigen:310} makes the following two propositions equivalent: (i)~that a determinant's term which includes one or more factors above the main diagonal also includes one or more factors below; (ii)~that the only permutor that marks no position below the main diagonal is the one which also marks no position above. In either form, the proposition's truth might seem less than obvious until viewed from the proper angle. Consider a permutor~$P$. If~$P$ marked no position below the main diagonal, then it would necessarily have $p_{nn}=1$, else the permutor's bottom row would be empty which is not allowed. In the next-to-bottom row, $p_{(n-1)(n-1)}=1$, because the $n$th column is already occupied. In the next row up, $p_{(n-2)(n-2)}=1$; and so on, thus affirming the proposition. } Hence no element above the main diagonal of~$U_S$ even influences the eigenvalues, which apparently are \bq{eigen:520:70} \lambda_i = u_{Sii}, \eq the main-diagonal elements. According to~(\ref{eigen:505:50}), similarity transformations preserve eigenvalues. The Schur decomposition~(\ref{eigen:schur}) is in fact a similarity transformation; and, as we have seen, every matrix~$A$ has a Schur decomposition. If therefore \[ A = QU_SQ^{*}, \] then \emph{the eigenvalues of~$A$ are just the values along the main diagonal of~$U_S$.}% \footnote{ An unusually careful reader might worry that~$A$ and~$U_S$ had the same eigenvalues with different multiplicities. It would be surprising if it actually were so; but, still, one would like to give a sounder reason than the participle ``surprising.'' Consider however that \bqb \lefteqn{A - \lambda I_n = QU_SQ^{*} - \lambda I_n = Q[U_S - Q^{*}(\lambda I_n)Q]Q^{*}} && \\&&\ \ \ \ \ \ \ \ \ \ \ \ = Q[U_S - \lambda(Q^{*}I_nQ)]Q^{*} = Q[U_S - \lambda I_n]Q^{*}. \eqb According to~(\ref{eigen:310:30}) and~(\ref{eigen:310:64}), this equation's determinant is \[ \det[A - \lambda I_n] = \det\{Q[U_S - \lambda I_n]Q^{*}\} = \det Q \det [U_S - \lambda I_n] \det Q^{*} = \det [U_S - \lambda I_n], \] which says that~$A$ and~$U_S$ have not only the same eigenvalues but also the same characteristic polynomials, thus further the same eigenvalue multiplicities. } One might think that the Schur decomposition offered an easy way to calculate eigenvalues, but it is less easy than it first appears because one must calculate eigenvalues to reach the Schur decomposition in the first place. Whatever practical merit the Schur decomposition might have or lack, however, it brings at least the theoretical benefit of~(\ref{eigen:520:70}): every square matrix without exception has a Schur decomposition, whose triangular factor~$U_S$ openly lists all eigenvalues along its main diagonal. \index{perturbation} \index{eigenvalue!perturbed} This theoretical benefit pays when some of the~$n$ eigenvalues of an $n \times n$ square matrix~$A$ repeat. By the Schur decomposition, one can construct a second square matrix~$A'$, as near as desired to~$A$ but having~$n$ distinct eigenvalues, simply by perturbing the main diagonal of~$U_S$ to% \footnote{ Equation~(\ref{matrix:325:diag}) defines the $\mopx{diag}\{\cdot\}$ notation. 
} \bqa U_S' &\equiv& U_S + \ep\mopx{diag}\{\ve u\}, \label{eigen:520:75}\\ u_{i'} &\neq& u_i \ \ \mbox{if $\lambda_{i'} = \lambda_i$}, \xn \eqa where $|\ep| \ll 1$ and where~$\ve u$ is an arbitrary vector that meets the criterion given. Though infinitesimally near~$A$, the modified matrix $A' = QU_S'Q^{*}$ unlike~$A$ has~$n$ (maybe infinitesimally) distinct eigenvalues. With sufficient toil, one can analyze such perturbed eigenvalues and their associated eigenvectors similarly as \S~\ref{inttx:260.20} has analyzed perturbed poles. \index{repeated eigenvalue} \index{repeated eigensolution} \index{eigenvalue!repeated} \index{eigenvector!repeated} \index{eigensolution!repeated} Equation~(\ref{eigen:520:75}) brings us to the nondiagonalizable matrix of the subsection's title. Section~\ref{eigen:423} and its diagonalization formula~(\ref{eigen:diag}) diagonalize any matrix with distinct eigenvalues and even any matrix with repeated eigenvalues but distinct eigenvectors, but fail where eigenvectors repeat. Equation~(\ref{eigen:520:75}) separates eigenvalues, thus also eigenvectors---for according to \S~\ref{eigen:422} eigenvectors of distinct eigenvalues never depend on one another---permitting a nonunique but still sometimes usable form of diagonalization in the limit $\ep \ra 0$ even when the matrix in question is strictly nondiagonalizable. \index{characteristic polynomial} \index{polynomial!characteristic} The finding that every matrix is arbitrarily nearly diagonalizable illuminates a question the chapter has evaded up to the present point. The question: does a $p$-fold root in the characteristic polynomial~(\ref{eigen:eigdet}) necessarily imply a $p$-fold eigenvalue in the corresponding matrix? The existence of the nondiagonalizable matrix casts a shadow of doubt until one realizes that every nondiagonalizable matrix is arbitrarily nearly diagonalizable---and, better, is arbitrarily nearly diagonalizable with distinct eigenvalues. If you claim that a matrix has a triple eigenvalue and someone disputes the claim, then you can show him a nearly identical matrix with three infinitesimally distinct eigenvalues. That is the essence of the idea. We will leave the answer in that form. \index{generalized eigenvector} \index{eigenvector!generalized} \index{Hessenberg, Gerhard (1874--1925)} \index{Hessenberg matrix} Generalizing the nondiagonalizability concept leads one eventually to the ideas of the \emph{generalized eigenvector}% \footnote{\cite[Ch.~7]{Friedberg-IS}} (which solves the higher-order linear system $[A-\lambda I]^k\ve v=0$) and the \emph{Jordan canonical form,}% \footnote{\cite[Ch.~5]{Franklin}} which together roughly track the sophisticated conventional pole-separation technique of \S~\ref{inttx:260.50}. Then there is a kind of sloppy Schur form called a Hessenberg form which allows content in~$U_S$ along one or more subdiagonals just beneath the main diagonal. One could profitably propose and prove any number of useful theorems concerning the nondiagonalizable matrix and its generalized eigenvectors, or concerning the eigenvalue problem% \footnote{\cite{Wilkinson}} more broadly, in more and less rigorous ways, but for the time being we will let the matter rest there. 
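(A minimal instance, for the reader who would like one to hold in mind: the upper triangular matrix
\[
	U_S = \mf{rr}{ 5 & 1 \\ 0 & 5 }
\]
has the double eigenvalue $\lambda = 5$ but only the one eigenvector $\ve e_1$, because $[U_S - 5I_2]\ve v = 0$ forces the second element of~$\ve v$ to be null; the matrix is nondiagonalizable. Perturbing the main diagonal per~(\ref{eigen:520:75}) with $\ve u = \ve e_2$ gives
\[
	U_S' = \mf{cc}{ 5 & 1 \\ 0 & 5 + \ep },
\]
whose two infinitesimally distinct eigenvalues $5$ and $5 + \ep$ respectively enjoy the independent, though nearly parallel, eigenvectors $\ve e_1$ and $\ve e_1 + \ep\,\ve e_2$.)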
% ---------------------------------------------------------------------- \section{The Hermitian matrix} \label{eigen:550} \index{Hermitian matrix} \index{matrix!Hermitian} \index{Hermite, Charles (1822--1901)} \index{adjoint} \index{self-adjoint matrix} \index{matrix!self-adjoint} An $m \times m$ square matrix~$A$ that is its own adjoint, \bq{eigen:550:10} A^{*} = A, \eq is called a \emph{Hermitian} or \emph{self-adjoint} matrix. Properties of the Hermitian matrix include that \bi \item \index{eigenvalue!real} its eigenvalues are real, \item \index{eigenvector!orthogonal} its eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another, and \item \index{diagonalizable matrix} \index{matrix!diagonalizable} it is unitarily diagonalizable (\S\S~\ref{mtxinv:465} and~\ref{eigen:423}) such that \bq{eigen:550:30} A = V \Lambda V^{*}. \eq \ei That the eigenvalues are real is proved by letting $(\lambda, \ve v)$ represent an eigensolution of~$A$ and constructing the product $\ve v^{*}A\ve v$, for which \[ \lambda^{*} \ve v^{*}\ve v = \ve (A\ve v)^{*}\ve v = \ve v^{*}A\ve v = \ve v^{*}(A\ve v) = \lambda \ve v^{*}\ve v. \] That is, \[ \lambda^{*} = \lambda, \] which naturally is possible only if~$\lambda$ is real. That eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another is proved% \footnote{\cite[\S~8.1]{Lay}} by letting $(\lambda_1, \ve v_1)$ and $(\lambda_2, \ve v_2)$ represent eigensolutions of~$A$ and constructing the product $\ve v_2^{*}A\ve v_1$, for which \[ \ve \lambda_2^{*} \ve v_2^{*}\ve v_1 = \ve (A\ve v_2)^{*}\ve v_1 = \ve v_2^{*}A\ve v_1 = \ve v_2^{*}(A\ve v_1) = \ve \lambda_1 \ve v_2^{*}\ve v_1. \] That is, \[ \mbox{$\lambda_2^{*} = \lambda_1$\ \ or\ \ $\ve v_2^{*}\ve v_1 = 0$.} \] But according to the last paragraph all eigenvalues are real; the eigenvalues~$\lambda_1$ and~$\lambda_2$ are no exceptions. Hence, \[ \mbox{$\lambda_2 = \lambda_1$\ \ or\ \ $\ve v_2^{*}\ve v_1 = 0$.} \] To prove the last hypothesis of the three needs first some definitions as follows. Given an $m \times m$ matrix~$A$, let the~$s$ columns of the $m \times s$ matrix~$V_o$ represent the~$s$ independent eigenvectors of~$A$ such that (i)~each column has unit magnitude and~(ii) columns whose eigenvectors share the same eigenvalue lie orthogonal to one another. Let the $s \times s$ diagonal matrix~$\Lambda_o$ carry the eigenvalues on its main diagonal such that \[ AV_o = V_o\Lambda_o, \] where the distinction between the matrix~$\Lambda_o$ and the full eigenvalue matrix~$\Lambda$ of~(\ref{eigen:diag}) is that the latter always includes a $p$-fold eigenvalue~$p$ times, whereas the former includes a $p$-fold eigenvalue only as many times as the eigenvalue enjoys independent eigenvectors. Let the $m-s$ columns of the $m \times (m-s)$ matrix~$V_o^\perp$ represent the complete orthogonal complement (\S~\ref{mtxinv:450}) to~$V_o$---perpendicular to all eigenvectors, each column of unit magnitude---such that \[ V_o^{\perp{*}}V_o = 0 \ \ \mbox{and}\ \ V_o^{\perp{*}}V_o^\perp = I_{m-s}. \] Recall from \S~\ref{eigen:422} that $s \neq 0$ but $0 < s \le m$ because every square matrix has at least one eigensolution. 
Recall from \S~\ref{eigen:423} that $s=m$ if and only if~$A$ is diagonalizable.% \footnote{ A concrete example: the invertible but nondiagonalizable matrix \[ A = \mf{rrrr}{-1&0&0&0\\-6&5&\frac 5 2&-\frac 5 2\\0&0&5&0\\0&0&0&5} \] has a single eigenvalue at $\lambda=-1$ and a triple eigenvalue at $\lambda = 5$, the latter of whose eigenvector space is fully characterized by two eigenvectors rather than three such that \settowidth\tli{\footnotesize $-\frac 1{\sqrt 2}$} \settowidth\tlj{\footnotesize $\frac 1{\sqrt 2}$} \[ V_o = \mf{ccc}{\frac 1{\sqrt 2}&0&0\\\frac 1{\sqrt 2}&1&0\\ 0&0&\frac 1{\sqrt 2}\\0&0&\frac 1{\sqrt 2}}, \ \ % \Lambda_o = \mf{rrr}{-1&0&0\\0&5&0\\0&0&5}, \ \ % V_o^\perp = \mf{c}{ \makebox[\tli][r]{\makebox[\tlj][c]{$0$}}\\ \makebox[\tli][r]{\makebox[\tlj][c]{$0$}}\\ \makebox[\tli][r]{\makebox[\tlj][r]{$\frac{1}{\sqrt 2}$}}\\ \makebox[\tli][r]{\makebox[\tlj][r]{$-\frac{1}{\sqrt 2}$}} }. \] The orthogonal complement~$V_o^\perp$ supplies the missing vector, not an eigenvector but perpendicular to them all. In the example, $m=4$ and $s=3$. All vectors in the example are reported with unit magnitude. The two $\lambda = 5$ eigenvectors are reported in mutually orthogonal form, but notice that eigenvectors corresponding to distinct eigenvalues need not be orthogonal when~$A$ is not Hermitian. } With these definitions in hand, we can now prove by contradiction that all Hermitian matrices are diagonalizable, falsely supposing a nondiagonalizable Hermitian matrix~$A$, whose~$V_o^\perp$ (since~$A$ is supposed to be nondiagonalizable, implying that $s}(0,0){\xxphir}{-\xxxg}{\xxphi} } }% } \pscircle(0,0){\xxr} % The outer silhouette of the sphere. } { \psset{linewidth=0.5pt} \rput{0} (0,0){\xxaxis} % The z axis. \rput{90}(0,0){\xxaxis} % The y axis. \rput{90}(0,0){ % (Rotation and rerotation are needed to work around a LaTeX syntax issue.) \localscalebox{1.0}{\xxsinphi}{ % The vertical ellipse. \rput{*0}(0,0){ \psarc(0,0){\xxr}{-90}{90} % The constant-phi arc. \rput{90}{ % The theta angle arc. \psarc{<-}(0,0){\xxphir}{-\xxtheta}{0} } \psline[linestyle=dashed](\tlc,\tlb)(\tlc,-\tlf) % The z altitude. \rput(0,-\xxd){ % The rho dimension double arrow. \psline[linestyle=solid]{<->}(0,\tlf)(\tlc,0) \psline(\tlc,-\xxda)(\tlc,\xxdb) } \rput(\tlc,-\tlf){ \rput(0,\tlg){\psline{-C}(0,0)(-\tlg,\tlj)} \rput(-\tlg,\tlj){\psline{-C}(0,0)(0,\tlg)} } \rput{-\xxtheta}{\psline[linestyle=dashed](0,0)(0,\xxr)} }% } } } \psdot(\tld,\tlb) \psframe[linewidth=0.0pt,linecolor=white,fillstyle=solid,fillcolor=white] (0.48,-0.55)(0.77,-1.00) \rput(0.62,1.20){$r$} \rput(0.27,0.93){$\theta$} \rput(0.25,-0.26){$\phi$} \rput(0.60,-0.79){$\rho$} \rput(1.38,0.75){$z$} \rput(0,3.05){$\hat z$} \rput(3.00,0){$\hat y$} \rput(-0.38,-0.75){$\hat x$} \end{pspicture} \ec% } derivations-0.53.20120414.orig/tex/integ.tex0000644000000000000000000021302311742566274016770 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The integral} \label{integ} \index{calculus} \index{calculus!the two complementary questions of} \index{integral} Chapter~\ref{drvtv} has observed that the mathematics of calculus concerns a complementary pair of questions: \bi \item Given some function $f(t)$, what is the function's instantaneous rate of change, or \emph{derivative,} $f'(t)$? \item Interpreting some function $f'(t)$ as an instantaneous rate of change, what is the corresponding accretion, or \emph{integral,} $f(t)$? \ei Chapter~\ref{drvtv} has built toward a basic understanding of the first question. 
This chapter builds toward a basic understanding of the second. The understanding of the second question constitutes the concept of the integral, one of the profoundest ideas in all of mathematics. This chapter, which introduces the integral, is undeniably a hard chapter. \index{Hamming, Richard~W. (1915--1998)} Experience knows no reliable way to teach the integral adequately to the uninitiated except through dozens or hundreds of pages of suitable examples and exercises, yet the book you are reading cannot be that kind of book. The sections of the present chapter concisely treat matters which elsewhere rightly command chapters or whole books of their own. Concision can be a virtue---and by design, nothing essential is omitted here---but the bold novice who wishes to learn the integral from these pages alone faces a daunting challenge. It can be done. However, for less intrepid readers who quite reasonably prefer a gentler initiation,~\cite{Hamming} is warmly recommended. % ---------------------------------------------------------------------- \section{The concept of the integral} \label{integ:220} \index{integral} \index{integral!concept of} An \emph{integral} is a finite accretion or sum of an infinite number of infinitesimal elements. This section introduces the concept. \subsection{An introductory example} \label{integ:220.10} \index{integral!as accretion or area} Consider the sums \settowidth\tla{$\ds\frac 12$} \bqb S_1 &=& \rule{1\tla}{0ex}\sum_{k=0}^{\mr{0x10}-1} k, \\ S_2 &=& \frac 12\sum_{k=0}^{\mr{0x20}-1} \frac{k}{2}, \\ S_4 &=& \frac 14\sum_{k=0}^{\mr{0x40}-1} \frac{k}{4}, \\ S_8 &=& \frac 18\sum_{k=0}^{\mr{0x80}-1} \frac{k}{8}, \\ &\vdots& \\ S_n &=& \frac 1n\sum_{k=0}^{(\mr{0x10})n-1} \frac{k}{n}. \eqb What do these sums represent? One way to think of them is in terms of the shaded areas of Fig.~\ref{integ:220:fig1}. 
\begin{figure} \caption{Areas representing discrete sums.} \label{integ:220:fig1} \bc \begin{pspicture}(-4,-2)(6,4) { \small \psset{dimen=middle} \nc\xx{3.2} \nc\axes[1]{ { \psset{linewidth=0.5pt} \psline(-1.0,0)(3.5,0) \psline(0,-1.0)(0,3.5) \rput(3.8,0){$\tau$} \rput(0,3.8){$f_{#1}(\tau)$} } } \rput(-\xx,0){ \axes{1} \psset{linewidth=1.0pt} \psset{unit=0.2cm} \pspolygon[fillstyle=solid,fillcolor=lightgray,hatchwidth=0.5pt,hatchangle=30] ( 0, 0)( 1, 0) ( 1, 1)( 2, 1) ( 2, 2)( 3, 2) ( 3, 3)( 4, 3) ( 4, 4)( 5, 4) ( 5, 5)( 6, 5) ( 6, 6)( 7, 6) ( 7, 7)( 8, 7) ( 8, 8)( 9, 8) ( 9, 9)(10, 9) (10,10)(11,10) (11,11)(12,11) (12,12)(13,12) (13,13)(14,13) (14,14)(15,14) (15,15)(16,15) (16,16)(16, 0) \rput(11.5,4.5){$S_1$} { \psset{yunit=1cm,linewidth=0.5pt} \psline(16,0)(16,-0.2)\rput(16,-0.5){$\mr{0x10}$} } { \psset{linestyle=dashed,linewidth=0.5pt} \psline(0,16)(16,16) \psset{xunit=1cm} \rput[r](-0.15,16){$\mr{0x10}$} } { \psset{linewidth=0.5pt} \psline(4,5.5)(4,14) \psline(5,6.5)(5,14) \psline{->}(2.5,13.5)(4,13.5) \psline{<-}(5,13.5)(6.5,13.5) \uput{0.2}[r](6.5,13.5){$\Delta \tau\!=\!1$} } } \rput( \xx,0){ \axes{2} \psset{linewidth=1.0pt} \psset{unit=0.1cm} \pspolygon[fillstyle=solid,fillcolor=lightgray,hatchwidth=0.5pt,hatchangle=30] ( 0, 0)( 1, 0) ( 1, 1)( 2, 1) ( 2, 2)( 3, 2) ( 3, 3)( 4, 3) ( 4, 4)( 5, 4) ( 5, 5)( 6, 5) ( 6, 6)( 7, 6) ( 7, 7)( 8, 7) ( 8, 8)( 9, 8) ( 9, 9)(10, 9) (10,10)(11,10) (11,11)(12,11) (12,12)(13,12) (13,13)(14,13) (14,14)(15,14) (15,15)(16,15) (16,16)(17,16) (17,17)(18,17) (18,18)(19,18) (19,19)(20,19) (20,20)(21,20) (21,21)(22,21) (22,22)(23,22) (23,23)(24,23) (24,24)(25,24) (25,25)(26,25) (26,26)(27,26) (27,27)(28,27) (28,28)(29,28) (29,29)(30,29) (30,30)(31,30) (31,31)(32,31) (32,32)(32, 0) \rput(23, 9){$S_2$} { \psset{yunit=1cm,linewidth=0.5pt} \psline(32,0)(32,-0.2)\rput(32,-0.5){$\mr{0x10}$} } { \psset{linestyle=dashed,linewidth=0.5pt} \psline(0,32)(32,32) \psset{xunit=1cm} \rput[r](-0.15,32){$\mr{0x10}$} } { \psset{unit=0.2cm} \psset{linewidth=0.5pt} \psline(4.0,5.5)(4.0,14) \psline(4.5,6.0)(4.5,14) \psline{->}(2.5,13.5)(4.0,13.5) \psline{<-}(4.5,13.5)(6.0,13.5) \uput{0.2}[r](6.0,13.5){$\Delta \tau\!=\!\frac 1 2$} } } } \end{pspicture} \ec \end{figure} In the figure,~$S_1$ is composed of several tall, thin rectangles of width 1 and height~$k$; $S_2$, of rectangles of width~$1/2$ and height $k/2$.% \footnote{ If the reader does not fully understand this paragraph's illustration, if the relation of the sum to the area seems unclear, the reader is urged to pause and consider the illustration carefully until he does understand it. If it still seems unclear, then the reader should probably suspend reading here and go study a good basic calculus text like~\cite{Hamming}\@. The concept is important. } As~$n$ grows, the shaded region in the figure looks more and more like a triangle of base length $b=\mr{0x10}$ and height $h=\mr{0x10}$. In fact it appears that \[ \lim_{n\ra\infty} S_n = \frac{bh}{2} = \mr{0x80}, \] or more tersely \[ S_\infty = \mr{0x80}, \] is the area the increasingly fine stairsteps approach. \index{integral!as shortcut to a sum} Notice how we have evaluated~$S_\infty$, the sum of an infinite number of infinitely narrow rectangles, without actually adding anything up. We have taken a shortcut directly to the total. 
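(If the reader mistrusts the geometrical argument, the finite sum can also be worked in closed form as a check. The arithmetic series sums to $\sum_{k=0}^{N-1} k = N(N-1)/2$, so
\[
	S_n = \frac 1{n^2} \sum_{k=0}^{(\mr{0x10})n-1} k
	= \frac{[(\mr{0x10})n][(\mr{0x10})n - 1]}{2n^2}
	= \mr{0x80} - \frac 8n,
\]
which indeed approaches $\mr{0x80}$ as~$n$ grows.)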
\index{limit of integration} \index{integration!limit of} In the equation \[ S_n = \frac 1n\sum_{k=0}^{(\mr{0x10})n-1} \frac{k}{n}, \] let us now change the variables \[ \begin{split} \tau &\la \frac{k}{n}, \\ \Delta \tau &\la \frac{1}{n}, \end{split} \] to obtain the representation \[ S_n = \Delta \tau\sum_{k=0}^{(\mr{0x10})n-1}\tau; \] or more properly, \[ S_n = \sum_{k=0}^{(k|_{\tau=\mr{0x10}})-1} \tau \,\Delta \tau, \] where the notation $k|_{\tau=\mr{0x10}}$ indicates the value of~$k$ when $\tau=\mr{0x10}$. Then \[ S_\infty = \lim_{\Delta \tau\ra 0^+} \sum_{k=0}^{(k|_{\tau=\mr{0x10}})-1} \tau \,\Delta \tau, \] in which it is conventional as~$\Delta \tau$ vanishes to change the symbol $d\tau \la \Delta \tau$, where~$d\tau$ is the infinitesimal of Ch.~\ref{drvtv}: \[ S_\infty = \lim_{d\tau\ra 0^{+}} \sum_{k=0}^{(k|_{\tau=\mr{0x10}})-1} \tau\,d\tau. \] The symbol $\lim_{d\tau\ra 0^+} \sum_{k=0}^{(k|_{\tau=\mr{0x10}})-1}$ is cumbersome, so we replace it with the new symbol% \footnote{ Like the Greek~S,~$\sum$, denoting discrete summation, the seventeenth century-styled Roman~S,~$\int$, stands for Latin ``summa,'' English ``sum.'' See \cite[``Long s,'' 14:54, 7 April 2006]{wikip}. } $\int_0^{\mr{0x10}}$ to obtain the form \[ S_\infty = \int_0^{\mr{0x10}} \tau \,d\tau. \] This means, ``stepping in infinitesimal intervals of~$d\tau$, the sum of all~$\tau\,d\tau$ from $\tau=0$ to $\tau=\mr{0x10}$.'' Graphically, it is the shaded area of Fig.~\ref{integ:220:fig2}. \begin{figure} \caption[An area representing an infinite sum of infinitesimals.] {An area representing an infinite sum of infinitesimals. (Observe that the infinitesimal~$d\tau$ is now too narrow to show on this scale. Compare against~$\Delta\tau$ in Fig.~\ref{integ:220:fig1}.)} \label{integ:220:fig2} \bc \begin{pspicture}(-4,-2)(6,4) { \small \psset{dimen=middle} \nc\xx{3.2} \nc\axes{ { \psset{linewidth=0.5pt} \psline(-1.0,0)(3.5,0) \psline(0,-1.0)(0,3.5) \rput(3.8,0){$\tau$} \rput(0,3.8){$f(\tau)$} } } \rput(0,0){ \axes \psset{linewidth=1.0pt} \psset{unit=0.2cm} \pspolygon[fillstyle=solid,fillcolor=lightgray,hatchwidth=0.5pt,hatchangle=30] ( 0, 0)(16,16)(16, 0) \rput(11.5,4.5){$S_\infty$} { \psset{yunit=1cm,linewidth=0.5pt} \psline(16,0)(16,-0.2)\rput(16,-0.5){$\mr{0x10}$} } { \psset{linestyle=dashed,linewidth=0.5pt} \psline(0,16)(16,16) \psset{xunit=1cm} \rput[r](-0.15,16){$\mr{0x10}$} } } } \end{pspicture} \ec \end{figure} \subsection{Generalizing the introductory example} \label{integ:220.20} Now consider a generalization of the example of \S~\ref{integ:220.10}: \[ S_n = \frac 1n\sum_{k=an}^{bn-1} f\left(\frac{k}{n}\right). \] (In the example of \S~\ref{integ:220.10}, $f[\tau]$ was the simple $f[\tau]=\tau$, but in general it could be any function.) With the change of variables \[ \begin{split} \tau &\la \frac{k}{n}, \\ \Delta \tau &\la \frac{1}{n}, \end{split} \] whereby \[ \settowidth\tla{$k|_{\tau=a}$} \begin{split} \makebox[\tla][l]{$k|_{\tau=a}$} &= an, \\ \makebox[\tla][l]{$k|_{\tau=b}$} &= bn, \\ (k,n) &\in \mathbb Z, \ \ n \neq 0, \end{split} \] (but~$a$ and~$b$ need not be integers), this is \[ S_n = \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau) \,\Delta \tau. \] In the limit, \[ S_\infty = \lim_{d\tau\ra 0^{+}} \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau) \,d\tau = \int_a^b f(\tau) \,d\tau. \] This is the \emph{integral} of $f(\tau)$ in the interval $a < \tau < b$. It represents the area under the curve of $f(\tau)$ in that interval. 
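To exercise the generalization on a curve which is not a straight line (an example chosen here only for that purpose), let $f(\tau) = \tau^2$, $a = 0$ and $b = 1$. Using the sum $\sum_{k=0}^{N-1} k^2 = (N-1)(N)(2N-1)/6$,
\[
	S_n = \frac 1n \sum_{k=0}^{n-1} \left( \frac kn \right)^2
	= \frac{(n-1)(n)(2n-1)}{6n^3}
	= \frac 13 - \frac 1{2n} + \frac 1{6n^2},
\]
so
\[
	S_\infty = \int_0^1 \tau^2 \,d\tau = \frac 13,
\]
the area under the parabola in that interval.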
\subsection{The balanced definition and the trapezoid rule} \label{integ:220.30} \index{integral!balanced form} Actually, just as we have defined the derivative in the balanced form~(\ref{drvtv:def}), we do well to define the integral in balanced form, too: \bq{integ:def} \int_a^b f(\tau) \,d\tau \equiv \lim_{d\tau\ra 0^{+}} \left\{ \frac {f(a) \,d\tau}{2} + \sum_{k=(k|_{\tau=a})+1}^{(k|_{\tau=b})-1} f(\tau) \,d\tau + \frac {f(b) \,d\tau}{2} \right\}. \eq Here, the first and last integration samples are each balanced ``on the edge,'' half within the integration domain and half without. \index{trapezoid rule} \index{Simpson's rule} Equation~(\ref{integ:def}) is known as the \emph{trapezoid rule.} Figure~\ref{integ:220:fig-trap} depicts it. \begin{figure} \caption [Integration by the trapezoid rule.] {Integration by the trapezoid rule~(\ref{integ:def}). Notice that the shaded and dashed areas total the same.} % The author's five-year-old, Isaiah, seems to think that the % caption should be, "A house with shooters on it." \label{integ:220:fig-trap} \bc \begin{pspicture}(-2,-1.5)(7.5,4) { \small \psset{dimen=middle} \settoheight\tla{$ab\,d\tau$} { % The shaded area here cheats downward 0.02 cm at the top, to % visually emphasize the unintegrated area slightly. \psset{linewidth=1.0pt} \pspolygon[fillstyle=solid,fillcolor=lightgray] (1.0,0) (1.0,1.8050) (2.2,2.4530) (3.4,2.2370) (4.6,1.1570) (4.6,0) \psline(2.2,0)(2.2,2.4530) \psline(3.4,0)(3.4,2.2370) \psline(4.6,0)(4.6,1.1570) } { \psset{linewidth=0.5pt,linestyle=dashed} \psline(1.0,1.8250)(1.6,1.8250) \psline(1.6,2.4730)(2.8,2.4730) \psline(2.8,2.2570)(4.0,2.2570) \psline(4.0,1.1770)(4.6,1.1770) \psline(1.6,0)(1.6,2.4730) \psline(2.8,0)(2.8,2.4730) \psline(4.0,0)(4.0,2.2570) } \psline[linewidth=0.5pt](-1,0)(6,0) \psline[linewidth=0.5pt](0,-0.5)(0,3) \psplot[linewidth=2.0pt,plotpoints=200]{0.5}{5.0}{ x 2.5 sub dup mul -0.30 mul 2.5 add } { \psset{linewidth=2.0pt} \psdot(1.0,1.8250) \psdot(2.2,2.4730) \psdot(3.4,2.2570) \psdot(4.6,1.1770) } \rput(6.3,0){$\tau$} \rput(0,3.3){$f(\tau)$} \rput(1.0,-0.3){\rule{0em}{\tla}$a$} \rput(2.8,-0.3){\rule{0em}{\tla}$d\tau$} \rput(4.6,-0.3){\rule{0em}{\tla}$b$} { \psset{linewidth=0.5pt} \psline(2.2,-0.2)(2.2,-0.4) \psline(3.4,-0.2)(3.4,-0.4) \psline{->}(2.5,-0.3)(2.2,-0.3) \psline{->}(3.1,-0.3)(3.4,-0.3) } } \end{pspicture} \ec \end{figure} The name ``trapezoid'' comes of the shapes of the shaded integration elements in the figure. Observe however that it makes no difference whether one regards the shaded trapezoids or the dashed rectangles as the actual integration elements; the total integration area is the same either way.% \footnote{ % diagn: review this footnote. The trapezoid rule~(\ref{integ:def}) is perhaps the most straightforward, general, robust way to define the integral, but other schemes are possible, too. For example, taking the trapezoids in adjacent pairs---such that a pair enjoys not only a sample on each end but a third sample in the middle---one can for each pair fit a second-order curve $f(\tau) \approx (c_2)(\tau-\tau_{\mr{middle}})^2 + (c_1)(\tau-\tau_{\mr{middle}}) + c_0$ to the function, choosing the coefficients~$c_2$, $c_1$ and~$c_0$ to make the curve match the function exactly at the pair's three sample points; then substitute the area under the pair's curve (which by the end of \S~\ref{integ:241} we shall know how to calculate exactly) for the areas of the two trapezoids. 
Changing the symbol $\Delta\tau \la d\tau$ on one side of the equation to suggest coarse sampling, the result is the unexpectedly simple
\bqb
 \int_a^b f(\tau) \,\Delta\tau &\approx&
 \bigg[
  \frac{1}{3}f(a) + \frac{4}{3}f(a+\Delta\tau) + \frac{2}{3}f(a+2\,\Delta\tau)
 \\&&\ \ \mbox{}
  + \frac{4}{3}f(a+3\,\Delta\tau) + \frac{2}{3}f(a+4\,\Delta\tau)
  + \cdots + \frac{4}{3}f(b-\Delta\tau) + \frac{1}{3}f(b)
 \bigg] \,\Delta\tau,
\eqb
as opposed to the trapezoidal
\bqb
 \int_a^b f(\tau) \,\Delta\tau &\approx&
 \bigg[
  \frac{1}{2}f(a) + f(a+\Delta\tau) + f(a+2\,\Delta\tau)
 \\&&\ \ \mbox{}
  + f(a+3\,\Delta\tau) + f(a+4\,\Delta\tau)
  + \cdots + f(b-\Delta\tau) + \frac{1}{2}f(b)
 \bigg] \,\Delta\tau
\eqb
implied by~(\ref{integ:def}). The curved scheme is called \emph{Simpson's rule.} It is clever and well known. Simpson's rule had real uses in the slide-rule era when, for practical reasons, one preferred to let~$\Delta\tau$ be sloppily large, sampling a curve only a few times to estimate its integral; yet the rule is much less useful when a computer is available to do the arithmetic over an adequate number of samples. At best Simpson's rule does not help much with a computer; at worst it can yield spurious results; and because it is easy to program it tends to encourage thoughtless application. Other than in the footnote you are reading, Simpson's rule is not covered in this book.
}
The important point to understand is that the integral is conceptually just a sum. It is a sum of an infinite number of infinitesimal elements as $d\tau$ tends to vanish, but a sum nevertheless; nothing more.
\index{variable~$d\tau$}
\index{variable independent infinitesimal}
\index{independent infinitesimal!variable}
\index{infinitesimal!independent, variable}
\index{Leibnitz notation}
Nothing actually requires the integration element width~$d\tau$ to remain constant from element to element, incidentally. Constant widths are usually easiest to handle but variable widths find use in some cases. The only requirement is that~$d\tau$ remain infinitesimal. (For further discussion of the point, refer to the treatment of the Leibnitz notation in \S~\ref{drvtv:240.25}.)
% ----------------------------------------------------------------------
\section[The antiderivative]{The antiderivative and the fundamental theorem of calculus}
\label{integ:230}
\index{integral!as antiderivative}
\index{antiderivative}
\index{calculus!fundamental theorem of}
\index{fundamental theorem of calculus}
\index{accretion}
If
\[
 S(x) \equiv \int_a^x g(\tau) \,d\tau,
\]
then what is the derivative $dS/dx$? After some reflection, one sees that the derivative must be
\[
 \frac{dS}{dx} = g(x).
\]
This is so because the action of the integral is to compile or accrete the area under a curve. The integral accretes area at a rate equal to the curve's height $g(\tau)$: the higher the curve, the faster the accretion. In this way one sees that the integral and the derivative are inverse operators; the one inverts the other. The integral is the \emph{antiderivative.} More precisely,
\bq{integ:antider}
 \int_a^b \frac{df}{d\tau} \,d\tau = f(\tau)|_a^b,
\eq
where the notation $f(\tau)|_a^b$ or $[f(\tau)]_a^b$ means $f(b)-f(a)$.
\index{calculus!the two complementary questions of}
The importance of~(\ref{integ:antider}), fittingly named the \emph{fundamental theorem of calculus},%
\footnote{%
 \cite[\S~11.6]{Hamming}%
 \cite[\S~5-4]{Shenk}%
 \cite[``Fundamental theorem of calculus,'' 06:29, 23 May 2006]{wikip}
}
can hardly be overstated.
As the formula which ties together the complementary pair of questions asked at the chapter's start,~(\ref{integ:antider}) is of utmost importance in the practice of mathematics. The idea behind the formula is indeed simple once grasped, but to grasp the idea firmly in the first place is not entirely trivial.% \footnote{ Having read from several calculus books and, like millions of others perhaps including the reader, having sat years ago in various renditions of the introductory calculus lectures in school, the author has never yet met a more convincing demonstration of~(\ref{integ:antider}) than the formula itself. Somehow the underlying idea is too simple, too profound to explain. It's like trying to explain how to drink water, or how to count or to add. Elaborate explanations and their attendant constructs and formalities are indeed possible to contrive, but the idea itself is so simple that somehow such contrivances seem to obscure the idea more than to reveal it. One ponders the formula~(\ref{integ:antider}) a while, then the idea dawns on him. If you want some help pondering, try this: Sketch some arbitrary function $f(\tau)$ on a set of axes at the bottom of a piece of paper---some squiggle of a curve like \settoheight\tla{\scriptsize $b$} \bc \nc\fxa{-0.5} \nc\fxb{1.5} \nc\fya{-0.35} \nc\fyb{0.8} \nc\xxa {0.13} \nc\xxb{1.2} \nc\xxt{0.07} \nc\xxtt{0.25} \nc\xxma{0.40}\nc\xxmb{1.10} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \scriptsize \psset{dimen=middle} \psset{linewidth=0.5pt} \psline(\fxa,0)(\fxb,0) \psline(0,\fya)(0,\fyb) \psplot[linewidth=1.0pt,plotpoints=50]{\xxa}{\xxb}{ x 0.7 sub dup dup mul mul 3.0 mul 0.4 add } \psline(\xxma,-\xxt)(\xxma,\xxt) \psline(\xxmb,-\xxt)(\xxmb,\xxt) \rput(\xxma,-\xxtt){$\rule{0em}{\tla}a$} \rput(\xxmb,-\xxtt){$\rule{0em}{\tla}b$} \uput{\xxt}[r](\fxb,0){$\tau$} \uput{\xxt}[r](0,\fyb){\uput{0}[dr](0,0){$f(\tau)$}} } \end{pspicture} \ec will do nicely---then on a separate set of axes directly above the first, sketch the corresponding slope function $df/d\tau$. Mark two points~$a$ and~$b$ on the common horizontal axis; then on the upper, $df/d\tau$ plot, shade the integration area under the curve. Now consider~(\ref{integ:antider}) in light of your sketch. There. Does the idea not dawn? Another way to see the truth of the formula begins by canceling its $(1/d\tau)\,d\tau$ to obtain the form $\int_{\tau=a}^b df = f(\tau)|_a^b$. If this way works better for you, fine; but make sure that you understand it the other way, too. } The idea is simple but big. The reader is urged to pause now and ponder the formula thoroughly until he feels reasonably confident that indeed he does grasp it and the important idea it represents. One is unlikely to do much higher mathematics without this formula. As an example of the formula's use, consider that because \linebreak % bad break $(d/d\tau)(\tau^3/6)=\tau^2/2$, it follows that \[ \int_2^x \frac{\tau^2\,d\tau}{2} = \int_2^x \frac{d}{d\tau}\left(\frac{\tau^3}{6}\right)\,d\tau = \left.\frac{\tau^3}{6}\right|_{2}^{x} = \frac{x^3-8}{6}. 
\] Gathering elements from~(\ref{drvtv:240.30:10}) and from Tables~\ref{cexp:drv} and~\ref{cexp:drvi}, Table~\ref{integ:basic-antider} lists a handful of the simplest, most useful derivatives for antiderivative use.% \begin{table} \caption{Basic derivatives for the antiderivative.} \label{integ:basic-antider} \[ \int_a^b \frac{df}{d\tau} \,d\tau = f(\tau)|_a^b \] \[ \renewcommand\arraystretch{2.0} \setlength\arraycolsep{0.5ex} \br{rclcrcl} \ds\tau^{a-1} &=& \ds\frac{d}{d\tau} \ds\left(\frac{\tau^a}{a}\right), &\ \ & \ds a &\neq& 0 \\ \ds\frac{1}{\tau} &=& \ds\frac{d}{d\tau} \ln\tau, && \ds \ln 1 &=& 0 \\ \ds\exp\tau &=& \ds\frac{d}{d\tau} \exp\tau, && \ds \exp 0 &=& 1 \\ \ds\cos\tau &=& \ds\frac{d}{d\tau} \sin\tau, && \ds \sin 0 &=& 0 \\ \ds\sin\tau &=& \ds\frac{d}{d\tau} \left(-\cos\tau\right), && \ds \cos 0 &=& 1 \er \] \end{table} Section~\ref{inttx:210} speaks further of the antiderivative. % ---------------------------------------------------------------------- \section{Operators, linearity and multiple integrals} \label{integ:240} This section presents the operator concept, discusses linearity and its consequences, treats the commutivity of the summational and integrodifferential operators, and introduces the multiple integral. \subsection{Operators} \label{integ:240.03} \index{operator} An \emph{operator} is a mathematical agent that combines several values of a function. Such a definition, unfortunately, is extraordinarily unilluminating to those who do not already know what it means. A better way to introduce the operator is by giving examples. Operators include~$+$,~$-$, multiplication, division, $\sum$, $\prod$, $\int$ and~$\partial$. The essential action of an operator is to take several values of a function and combine them in some way. For example,~$\prod$ is an operator in \[ \prod_{j=1}^{5} (2j-1) = (1)(3)(5)(7)(9) = \mr{0x3B1}. \] \index{dummy variable} \index{operator!using a variable up} Notice that the operator has acted to remove the variable~$j$ from the expression $2j-1$. The~$j$ appears on the equation's left side but not on its right. The operator has used the variable up. Such a variable, used up by an operator, is a \emph{dummy variable,} as encountered earlier in \S~\ref{alggeo:227}. \subsection{A formalism} \label{integ:240.04} \index{operator!$+$ and~$-$ as} But then how are~$+$ and~$-$ operators? They don't use any dummy variables up, do they? Well, it depends on how you look at it. Consider the sum $S = 3 + 5$. One can write this as \[ S = \sum_{k=0}^1 f(k), \] where \[ f(k) \equiv \begin{cases} 3 &\mbox{if}\ k=0,\\ 5 &\mbox{if}\ k=1,\\ \mbox{undefined} &\mbox{otherwise}. \end{cases} \] Then, \[ S = \sum_{k=0}^1 f(k) = f(0) + f(1) = 3 + 5 = 8. \] By such admittedly excessive formalism, the~$+$ operator can indeed be said to use a dummy variable up. The point is that~$+$ is in fact an operator just like the others. Another example of the kind: \bqb D &=& g(z) - h(z) + p(z) + q(z) \\ &=& g(z) - h(z) + p(z) - 0 + q(z) \\ &=& \Phi(0,z) - \Phi(1,z) + \Phi(2,z) - \Phi(3,z) + \Phi(4,z) \\ &=& \sum_{k=0}^4 (-)^k \Phi(k,z), \eqb where \bqb \Phi(k,z) \equiv \begin{cases} g(z) &\mbox{if}\ k=0,\\ h(z) &\mbox{if}\ k=1,\\ p(z) &\mbox{if}\ k=2,\\ 0 &\mbox{if}\ k=3,\\ q(z) &\mbox{if}\ k=4,\\ \mbox{undefined} &\mbox{otherwise}. \end{cases} \eqb \index{formalism} Such unedifying formalism is essentially useless in applications, except as a vehicle for definition. Once you understand why~$+$ and~$-$ are operators just as~$\sum$ and~$\int$ are, you can forget the formalism. 
It doesn't help much. \subsection{Linearity} \label{integ:240.05} \index{linearity} \index{linear combination} \index{linearity!of a function} \index{function!linear} \index{function!nonlinear} \index{linear expression} \index{iff} A function $f(z)$ is \emph{linear} iff (if and only if) it has the properties \bqb f(z_1 + z_2) &=& f(z_1) + f(z_2), \\ f(\alpha z) &=& \alpha f(z), \\ f(0) &=& 0. \eqb The functions $f(z) = 3z$, $f(u,v)=2u-v$ and $f(z) = 0$ are examples of linear functions. Nonlinear functions include% \footnote{ If $3z+1$ is a \emph{linear expression,} then how is not $f(z)=3z+1$ a \emph{linear function?} Answer: it is partly a matter of purposeful definition, partly of semantics. The equation $y=3x+1$ plots a line, so the expression $3z+1$ is literally ``linear'' in this sense; but the definition has more purpose to it than merely this. When you see the linear expression $3z+1$, think $3z+1=0$, then $g(z)=3z=-1$. The $g(z)=3z$ is linear; the~$-1$ is the constant value it targets. That's the sense of it. } $f(z) = z^2$, $f(u,v) = \sqrt{uv}$, $f(t)=\cos\omega t$, $f(z) = 3z+1$ and even $f(z)=1$. \index{linearity!of an operator} \index{operator!linear} \index{operator!nonlinear} \index{linear operator} An operator~$L$ is linear iff it has the properties \bqb L(f_1 + f_2) &=& Lf_1 + Lf_2, \\ L(\alpha f) &=& \alpha Lf, \\ L(0) &=& 0. \eqb The operators~$\sum$, $\int$, $+$, $-$ and~$\partial$ are examples of linear operators. For instance,% \footnote{ You don't see~$d$ in the list of linear operators? But~$d$ in this context is really just another way of writing~$\partial$, so, yes,~$d$ is linear, too. See \S~\ref{drvtv:240.25}. } \[ \frac{d}{dz}[f_1(z) + f_2(z)] = \frac{df_1}{dz} + \frac{df_2}{dz}. \] Nonlinear operators include multiplication, division and the various trigonometric functions, among others. Section~\ref{vcalc:320.15} will have more to say about operators and their notation. \subsection{Summational and integrodifferential commutivity} \label{integ:240.10} \index{commutivity!summational and integrodifferential} \index{convergence} Consider the sum \settoheight\tla{\scriptsize$k=a\ j=p$} \[ S_1 = \sum_{k=a\rule{0em}{\tla}}^b \left[ \sum_{j=p\rule{0em}{\tla}}^q \frac{x^k}{j!} \right]. \] This is a sum of the several values of the expression $x^k/j!$, evaluated at every possible pair $(j,k)$ in the indicated domain. Now consider the sum \settoheight\tla{\scriptsize$k=a\ j=p$} \[ S_2 = \sum_{j=p\rule{0em}{\tla}}^q \left[ \sum_{k=a\rule{0em}{\tla}}^b \frac{x^k}{j!} \right]. \] This is evidently a sum of the same values, only added in a different order. Apparently $S_1 = S_2$. Reflection along these lines must soon lead the reader to the conclusion that, in general, % The footnote would nevertheless close with an eminent contrary voice, % that of the late Richard~W. Hamming: % \begin{quote} % When you yourself are responsible for some new application of % mathematics in your chosen field, then your reputation, possibly % millions of dollars and long delays in the work, and possibly even % human lives, may depend on the results you predict. It is then that % the \emph{need} for mathematical rigor will become painfully obvious % to you. 
Before this time, mathematical rigor will often seem to be % needless pedantry\mdots\ \cite[\S~1.6]{Hamming} % \end{quote} % The author has not himself experienced such scourges for the cause of % which Hamming warns, and observes that time spent conforming % mathematical results to the dictates of professional rigor may be time % not spent checking the results in more mundane ways; but this does not % in itself reduce the value of Hamming's warning. Be that as it may, % Hamming's and Knopp's fine books rather than this one would be the % proper sources from which to learn that style of mathematical rigor. \[ \sum_k\sum_j f(j,k) = \sum_j\sum_k f(j,k). \] Now consider that an integral is just a sum of many elements, and that a derivative is just a difference of two elements. Integrals and derivatives must then have the same commutative property discrete sums have. For example, \bqb \int_{v=-\infty}^{\infty}\int_{u=a}^{b} f(u,v) \,du\,dv &=& \int_{u=a}^{b} \int_{v=-\infty}^{\infty} f(u,v) \,dv\,du; \\ \int\sum_k f_k(v) \,dv &=& \sum_k\int f_k(v) \,dv; \\ \ppx{v} \int f \,du &=& \int \pp{f}{v} \,du. \eqb In general, \bq{integ:240:10} L_v L_u f(u,v) = L_u L_v f(u,v), \eq where~$L$ is any of the linear operators~$\sum$, $\int$ or~$\partial$. \index{conditional convergence} \index{convergence!conditional} \index{Euler, Leonhard (1707--1783)} \index{rigor} Some convergent summations, like \settoheight\tla{\scriptsize $k$} \[ \sum_{k=0}^\infty \sum_{\rule{0em}{\tla}j=0}^1 \frac{(-)^j}{2k+j+1}, \] diverge once reordered, as \settoheight\tla{\scriptsize $k$} \[ \sum_{\rule{0em}{\tla}j=0}^1 \sum_{k=0}^\infty \frac{(-)^j}{2k+j+1}. \] One cannot blithely swap operators here. This is not because swapping is wrong, but rather because the inner sum after the swap diverges, hence the outer sum after the swap has no concrete summand on which to work. (\emph{Why} does the inner sum after the swap diverge? Answer: $1 + 1/3 + 1/5 + \cdots = [1] + [1/3 + 1/5] + [1/7 + 1/9 + 1/\mbox{0xB} + 1/\mbox{0xD} ] + \cdots > 1[1/4] + 2[1/8] + 4[1/\mbox{0x10}] + \cdots = 1/4 + 1/4 + 1/4 + \cdots$. See also \S~\ref{taylor:316.70}.) For a more twisted example of the same phenomenon, consider% \footnote{\cite[\S~1.2.3]{Andrews}} \[ 1 - \frac 1 2 + \frac 1 3 - \frac 1 4 + \cdots = \left( 1 - \frac 1 2 - \frac 1 4 \right) + \left( \frac 1 3 - \frac 1 6 - \frac 1 8 \right) + \cdots, \] which associates two negative terms with each positive, but still seems to omit no term. Paradoxically, then, \bqb 1 - \frac 1 2 + \frac 1 3 - \frac 1 4 + \cdots &=& \left( \frac 1 2 - \frac 1 4 \right) + \left( \frac 1 6 - \frac 1 8 \right) + \cdots \\&=& \frac 1 2 - \frac 1 4 + \frac 1 6 - \frac 1 8 + \cdots \\&=& \frac 1 2 \left( 1 - \frac 1 2 + \frac 1 3 - \frac 1 4 + \cdots \right), \eqb or so it would seem, but cannot be, for it claims falsely that the sum is half itself. A better way to have handled the example might have been to write the series as \[ \lim_{n \ra \infty} \left\{ 1 - \frac 1 2 + \frac 1 3 - \frac 1 4 + \cdots + \frac 1 {2n-1} - \frac{1}{2n} \right\} \] in the first place, thus explicitly specifying equal numbers of positive and negative terms.% \footnote{\label{integ:240:fn19}% Some students of professional mathematics would assert that the false conclusion had been reached through lack of rigor. Well, maybe. This writer however does not feel sure that \emph{rigor} is quite the right word for what was lacking here. 
Professional mathematics does bring an elegant notation and a set of formalisms which serve ably to spotlight certain limited kinds of blunders, but these are blunders no less by the applied approach. The stalwart Leonhard Euler---arguably the greatest series-smith in mathematical history---wielded his heavy analytical hammer in thunderous strokes before professional mathematics had conceived the notation or the formalisms. If the great Euler did without, then you and I might not always be forbidden to follow his robust example. See also footnote~\ref{integ:240:fn20}. On the other hand, the professional approach is worth study if you have the time. Recommended introductions include~\cite{Knopp}, preceded if necessary by~\cite{Hamming} and/or \cite[Ch.~1]{Andrews}. } So specifying would have prevented the error. In the earlier example, \[ \lim_{n\ra\infty} {\sum_{k=0}^n} \sum_{\rule{0em}{\tla}j=0}^1 \frac{(-)^j}{2k+j+1} \] likewise would have prevented the error, or at least have made the error explicit. The \emph{conditional convergence}% \footnote{\cite[\S~16]{Knopp}} of the last paragraph, which can occur in integrals as well as in sums, seldom poses much of a dilemma in practice. One can normally swap summational and integrodifferential operators with little worry. The reader however should at least be aware that conditional convergence troubles can arise where a summand or integrand varies in sign or phase. \subsection{Multiple integrals} \label{integ:240.20} \index{integral!multiple} \index{surface} \index{volume} Consider the function \[ f(u,w) = \frac{u^2}{w}. \] Such a function would not be plotted as a curved line in a plane, but rather as a curved \emph{surface} in a three-dimensional space. Integrating the function seeks not the area under the curve but rather the volume under the surface: \[ V = \int_{u_1}^{u_2} \int_{w_1}^{w_2} \frac{u^2}{w} \,dw\,du. \] This is a \emph{double integral.} Inasmuch as it can be written in the form \bqb V &=& \int_{u_1}^{u_2} g(u) \,du, \\ g(u) &\equiv& \int_{w_1}^{w_2} \frac{u^2}{w} \,dw, \eqb its effect is to cut the area under the surface into flat, upright slices, then the slices crosswise into tall, thin towers. The towers are integrated over~$w$ to constitute the slice, then the slices over~$u$ to constitute the volume. % diagn: this extended paragraph wants review. \index{integral swapping} \index{integral!ill-behaved} \index{double integral!ill-behaved} In light of \S~\ref{integ:240.10}, evidently nothing prevents us from swapping the integrations:~$u$ first, then~$w$. Hence \[ V = \int_{w_1}^{w_2} \int_{u_1}^{u_2} \frac{u^2}{w} \,du\,dw. \] And indeed this makes sense, doesn't it? What difference should it make whether we add the towers by rows first then by columns, or by columns first then by rows? The total volume is the same in any case---albeit the integral over~$w$ is potentially ill-behaved% \footnote{\label{integ:240:fn20}% % diagn: inadvisable? I hope not, but maybe so anyway. A great deal of ink is spilled in the applied mathematical literature when summations and/or integrations are interchanged. The author tends to recommend saving the ink, for pure and applied mathematics want different styles. What usually matters in applications is not whether a particular summation or integration satisfies some formal test but rather whether one clearly understands the summand to be summed or the integrand to be integrated. See also footnote~\ref{integ:240:fn19}. 
} near $w=0$; so that, if for instance~$w_1$ were negative,~$w_2$ were positive, and both were real, one might rather write the double integral as% \footnote{ It is interesting to consider the effect of withdrawing the integral's limit at~$-\ep$ to~$-2\ep$, as $\lim_{\ep\ra 0^{+}} \left(\int_{w_1}^{-2\ep} + \int_{+\ep}^{w_2} \right) \int_{u_1}^{u_2} \frac{u^2}{w} \,du\,dw$; for, surprisingly---despite that the parameter~$\ep$ is vanishing anyway---the withdrawal does alter the integral unless the limit at~$+\ep$ also is withdrawn. The reason is that $\lim_{\ep\ra 0^{+}} \int_{\ep}^{2\ep} (1/w) \,dw = \ln 2 \neq 0$. } \[ V = \lim_{\ep\ra 0^{+}} \left(\int_{w_1}^{-\ep} + \int_{+\ep}^{w_2} \right) \int_{u_1}^{u_2} \frac{u^2}{w} \,du\,dw. \] \index{integral!double} \index{integral!triple} \index{double integral} \index{triple integral} \index{integral!surface} \index{integral!volume} \index{surface integration} \index{volume integration} \index{density} \index{mass density} Double integrations arise very frequently in applications. Triple integrations arise about as often. For instance, if $\mu(\ve r) = \mu(x,y,z)$ represents the variable mass density of some soil,% \footnote{ Conventionally the Greek letter~$\rho$ not~$\mu$ is used for density, but it happens that we need the letter~$\rho$ for a different purpose later in the paragraph. } then the total soil mass in some rectangular volume is \[ M = \int_{x_1}^{x_2} \int_{y_1}^{y_2} \int_{z_1}^{z_2} \mu(x,y,z) \,dz\,dy\,dx. \] As a concise notational convenience, the last is likely to be written \[ M = \int_V \mu(\ve r) \,d\ve r, \] where the~$V$ stands for ``volume'' and is understood to imply a triple integration. Similarly for the double integral, \[ V = \int_S f(\we\rho) \,d\we\rho, \] where the~$S$ stands for ``surface'' and is understood to imply a double integration. \index{space and time} \index{time and space} \index{Fourier transform!spatial} \index{fourfold integral} \index{sixfold integral} \index{integral!fourfold} \index{integral!sixfold} Even more than three nested integrations are possible. If we integrated over time as well as space, the integration would be fourfold. A spatial Fourier transform (\S~\ref{fouri:350}) implies a triple integration; and its inverse, another triple: a sixfold integration altogether. Manifold nesting of integrals is thus not just a theoretical mathematical topic; it arises in sophisticated real-world engineering models. The topic concerns us here for this reason. % ---------------------------------------------------------------------- \section{Areas and volumes} \label{integ:241} \index{volume} \index{solid!volume of} By composing and solving appropriate integrals, one can calculate the peri\-me\-ters, areas and volumes of interesting common shapes and solids. % diagn: The area under, say, a third-order curve ought to be calculated % here before trying the circle---and a diagram should be given. \subsection{The area of a circle} \label{integ:241.10} \index{area} \index{area!surface} \index{shape!area of} \index{circle!area of} Figure~\ref{integ:241:fig1} depicts an element of a circle's area. 
\begin{figure} \caption{The area of a circle.} \label{integ:241:fig1} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxc{0.25} \nc\xxp{1.8} \nc\xxr{2.2361} \nc\xxrr{1.10} \nc\xxtaa{14} \nc\xxta{24} \nc\xxtb{33} \nc\xxtbb{43} \nc\xxs{2.6} \nc\xxa{ \psline[linewidth=0.5pt](-\xx,0)(\xx,0) } \xxa \rput{90}(0,0){\xxa} \pswedge[linewidth=0.5pt,fillstyle=solid,fillcolor=lightgray](0,0){\xxr}{\xxta}{\xxtb} \pscircle[linewidth=2.0pt](0,0){\xxr} \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{\xxtaa}{\xxta} \psarc[linewidth=0.5pt]{<-}(0,0){\xxp}{\xxtb}{\xxtbb} \rput{\xxtbb}(0,0){ \uput{\xxc}[u](\xxp,0){\rput{*0}(0,0){$d\phi$}} } \rput{\xxtb}(0,0){ \uput[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } } \rput[l](\xxs,0){$x$} \rput[b](0,\xxs){$y$} \end{pspicture} } \ec \end{figure} The element has wedge shape, but inasmuch as the wedge is infinitesimally narrow, the wedge is indistinguishable from a triangle of base length~$\rho\,d\phi$ and height~$\rho$. The area of such a triangle is $A_\mr{triangle}=\rho^2\,d\phi/2$. Integrating the many triangles, we find the circle's area to be \bq{integ:241:A-circle} A_\mr{circle} = \int_{\phi=-\pi}^{\pi} \! A_\mr{triangle} = \int_{-\pi}^{\pi} \frac{\rho^2\,d\phi}{2} = \frac{2\pi \rho^2}{2}. \eq (The numerical value of~$2\pi$---the circumference or perimeter of the unit circle---we have not calculated yet. We will calculate it in \S~\ref{taylor:355}.) \subsection{The volume of a cone} \label{integ:241.20} \index{cone!volume of} \index{pyramid!volume of} \index{normal vector or line} \index{vertex} One can calculate the volume of any cone (or pyramid) if one knows its base area~$B$ and its altitude~$h$ measured normal% \footnote{ \emph{Normal} here means ``at right angles.'' } to the base. Refer to Fig.~\ref{integ:241:fig2}. \begin{figure} \caption{The volume of a cone.} \label{integ:241:fig2} \bc { \nc\xax{-5} \nc\xbx{-1.0} \nc\xcx{ 5} \nc\xdx{ 2.2} \nc\xxa{2.0} \nc\xxb{1.2} \nc\xxc{0.22} \nc\xxdx{1.3} \nc\xxdy{1.8} \nc\xxex{0.98298} \nc\xxey{0.12619} \nc\xxf{0.08} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) %\psframe[dimen=outer,linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \psset{linewidth=1.0pt} \small \psccurve[linewidth=0pt,linecolor=lightgray,fillstyle=solid,fillcolor=lightgray] (-0.8, 0.1)(-1.0,0.0)(-0.8,-0.1)(-0.3,0)(0.8,-0.1)(1.0,0.0)(0.8, 0.1) \psecurve[linestyle=solid] (-0.8, 0.1)(-1.0,0.0)(-0.8,-0.1)(-0.3,0)(0.8,-0.1)(1.0,0.0)(0.8, 0.1) \psecurve[linestyle=dashed](-0.8,-0.1)(-1.0,0.0)(-0.8, 0.1)(0.8, 0.1)(1.0,0.0)(0.8,-0.1) \psline(-1.0,0)(0.3,1.8)(1.0,0) { \psset{linewidth=0.5pt} \psline[linestyle=dashed](0.3,1.8)(0.3,0) \psline[linewidth=0.5pt](0.0,0)(0.0,0.3)(0.3,0.3) \rput(0.1,0.8){$h$} \psline(0.6,-0.03)(0.75,-0.35) \rput(0.79,-0.52){$B$} } \end{pspicture} } \ec \end{figure} A cross-section of a cone, cut parallel to the cone's base, has the same shape the base has but a different scale. If coordinates are chosen such that the altitude~$h$ runs in the~$\vu z$ direction with $z=0$ at the cone's vertex, then the cross-sectional area is evidently% \footnote{ The fact may admittedly not be evident to the reader at first glance. If it is not yet evident to you, then ponder Fig.~\ref{integ:241:fig2} a moment. Consider what it means to cut parallel to a cone's base a cross-section of the cone, and how cross-sections cut nearer a cone's vertex are smaller though the same shape. What if the base were square? 
Would the cross-sectional area not be $(B)(z/h)^2$ in that case? What if the base were a right triangle with equal legs---in other words, half a square? What if the base were some other strange shape like the base depicted in Fig.~\ref{integ:241:fig2}? Could such a strange shape not also be regarded as a definite, well-characterized part of a square? (With a pair of scissors one can cut any shape from a square piece of paper, after all.) Thinking along such lines must soon lead one to the insight that the parallel-cut cross-sectional area of a cone can be nothing other than $(B)(z/h)^2$, regardless of the base's shape. } $(B)(z/h)^2$. For this reason, the cone's volume is \bq{integ:241:V-cone} V_\mr{cone} = \int_0^h (B)\left(\frac{z}{h}\right)^2\,dz = \frac{B}{h^2} \int_0^h z^2\,dz = \frac{B}{h^2} \left(\frac{h^3}{3}\right) = \frac{Bh}{3}. \eq \subsection{The surface area and volume of a sphere} \label{integ:241.30} \index{integral!surface} \index{surface integration} \index{surface area} \index{solid!surface area of} \index{sphere!surface area of} \index{strip, tapered} \index{tapered strip} \index{equator} Of a sphere, Fig.~\ref{integ:fig-sphere}, one wants to calculate both the surface area and the volume. \begin{figure} \caption{A sphere.} \label{integ:fig-sphere} \sphere \end{figure} \begin{figure} \caption[An element of a sphere's surface.] {An element of the sphere's surface (see Fig.~\ref{integ:fig-sphere}).} %{The Death Star with a square gun.} % say the author's children, 2007 \label{integ:fig-sphere2} \bc % Figure bounds: \nc\xa{-5.0}\nc\xb{ 5.0} \nc\ya{-3.5}\nc\yb{ 3.5} \nc\xxr{2.5} % The sphere's radius. \nc\xxrb{2.8} % Axis extent. \nc\xxq{0.30} % Right-angle symbol size. \nc\xxd{0.85}\nc\xxda{0.1}\nc\xxdb{0.3} % rho dimension offsets \nc\xxs{0.20} % Perspective ratio of the x-y circle. \nc\xxphir{0.8} % Angle dimension radius. \nc\xxc{0.40} % Angles. (The xg is a perspective angle for the x axis.) \nc\xxxg {5} \nc\xxcosxg {0.99619} \nc\xxsinxg {0.08716} \nc\xxphi {50} \nc\xxcosphi {0.64279} \nc\xxsinphi {0.76604} \nc\xxtheta{75} \nc\xxcostheta{0.25882} \nc\xxsintheta{0.96593} \nc\xxphib {35} \nc\xxcosphib {0.81915} \nc\xxsinphib {0.57358} \setlength\tla{\xxr cm} % The sphere's radius. \setlength\tlb{\xxcostheta\tla} % The y coordinate of the point. \setlength\tlc{\xxsintheta\tla} % The in-ellipse x coordinate of the point. \setlength\tld{\xxsinphi\tlc} % The on-paper x coordinate of the point. \setlength\tle{\xxcosphi\tlc} % The non-x-y-perspectived y coordinate of the z-rho intersection. \setlength\tlf{\xxs\tle} % The x-y-perspectived y coordinate of the z-rho intersection. \setlength\tlg{\xxsintheta\tla} \setlength\tlh{\xxcostheta\tla} \nc\xxaxisx{ % An axis without the upper extension. { \psset{linewidth=0.5pt} \psline(0,-\xxr)(0,-\xxrb) }% } \nc\xxaxis{ % An axis (vertical by default). { \xxaxisx \psline(0,\xxr)(0,\xxrb) }% } \begin{pspicture}(\xa,\ya)(\xb,\yb) \small \psset{dimen=middle} %\psframe[dimen=outer,linewidth=0.5pt](\xa,\ya)(\xb,\yb) % A figure bounding box. %\psdot(1.42,0.25) %\psdot(1.89,0.35) %\psdot(1.89,-0.32) %\psdot(1.41,-0.41) % This is an ugly way to fill the region, but it seems to work for % this figure. 
{ \psset{linewidth=2.0pt} \pspolygon[linewidth=0.0pt,fillstyle=solid,fillcolor=lightgray] (1.42,0.25)(1.89,0.35)(1.91,0.02)(1.89,-0.32)(1.41,-0.41)(1.43,-0.08) \localscalebox{1.0}{\xxs}{ \psarc(0,0){\xxr}{-180}{0} } \rput(0,\tlh){ \localscalebox{1.0}{\xxs}{ \psarc(0,0){\tlg}{-180}{0} } } \rput(0,-\tlh){ \localscalebox{1.0}{\xxs}{ \psarc{->}(0,0){\tlg}{-70}{-58} \psarc{<-}(0,0){\tlg}{-43}{-31} } } \pscircle(0,0){\xxr} % The outer silhouette of the sphere. } { \psset{linewidth=0.5pt} \rput{0} (0,0){\xxaxis} % The z axis. \rput{90}(0,0){\xxaxis} % The y axis. \rput{90}(0,0){ % (Rotation and rerotation are needed to work around a LaTeX syntax issue.) \localscalebox{1.0}{\xxsinphi}{ % The vertical ellipse. \rput{*0}(0,0){ \psarc(0,0){\xxr}{-90}{90} % The constant-phi arc. }% } } \rput{90}(0,0){ % (Rotation and rerotation are needed to work around a LaTeX syntax issue.) \localscalebox{1.0}{\xxsinphib}{ % The vertical ellipse. \rput{*0}(0,0){ \psarc(0,0){\xxr}{-90}{90} % The constant-phi arc. }% } } } { % Again, a bit ugly, but visually it works. \psset{linewidth=0.5pt} \rput(0,0.17){$\psline{<-}(0,0)(0,\xxc)$} \rput(0,-0.50){$\psline{<-}(0,0)(0,-\xxc)$} } \rput[r](0.75,-1.15){$\rho\,d\phi$} \rput[b](0,0.68){$r\,d\theta$} \rput(0,3.05){$\hat z$} \rput(3.00,0){$\hat y$} \end{pspicture} \ec% \end{figure} For the surface area, the sphere's surface is sliced vertically down the~$z$ axis into narrow constant-$\phi$ tapered strips (each strip broadest at the sphere's equator, tapering to points at the sphere's~$\pm z$ poles) and horizontally across the~$z$ axis into narrow constant-$\theta$ rings, as in Fig.~\ref{integ:fig-sphere2}. A surface element so produced (seen as shaded in the latter figure) evidently has the area% \footnote{ % diagn: this new footnote wants review. It can be shown, incidentally---the details are left as an exercise---that $dS = -r \,dz\,d\phi$ also. The subsequent integration arguably goes a little easier if~$dS$ is accepted in this mildly clever form. The form is interesting in any event if one visualizes the specific, annular area the expression $\int_{\phi=-\pi}^\pi \,dS = -2\pi r\,dz$ represents: evidently, unexpectedly, a precisely equal portion of the sphere's surface corresponds to each equal step along the~$z$ axis, pole to pole; so, should you slice an unpeeled apple into parallel slices of equal thickness, though some slices will be bigger across and thus heavier than others, each slice curiously must take an equal share of the apple's skin. (This is true, anyway, if you judge Fig.~\ref{integ:fig-sphere2} to represent an apple. The author's children judge it to represent ``the Death Star with a square gun,'' so maybe it depends on your point of view.) } \[ dS = (r\,d\theta)(\rho\,d\phi) = r^2\sin\theta\,d\theta\,d\phi. \] The sphere's total surface area then is the sum of all such elements over the sphere's entire surface: \bqa S_\mr{sphere} &=& \int_{\phi=-\pi}^{\pi}\int_{\theta=0}^{\pi} dS \xn\\ &=& \int_{\phi=-\pi}^{\pi}\int_{\theta=0}^{\pi} r^2 \sin\theta \,d\theta \,d\phi \xn\\ &=& r^2 \int_{\phi=-\pi}^{\pi} [-\cos \theta]_0^\pi \,d\phi \xn\\ &=& r^2 \int_{\phi=-\pi}^{\pi} [2] \,d\phi \xn\\ &=& 4\pi r^2, \label{integ:241:S-sphere} \eqa where we have used the fact from Table~\ref{integ:basic-antider} that $\sin \tau = (d/d\tau)(-\cos \tau)$. 
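Since the integral is conceptually just a sum, nothing prevents one from checking such a result by brute, finite summation. The following sketch, another illustrative Python aside and no part of the book's development, accumulates the elements $r^2\sin\theta\,\Delta\theta\,\Delta\phi$ over a coarse grid and compares the total against $4\pi r^2$:
\begin{verbatim}
# Approximate the sphere's surface integral by a finite double sum
# of elements r^2 sin(theta) dtheta dphi, sampling each element at
# its middle; then compare against the exact 4*pi*r^2.
from math import sin, pi

r, n = 1.0, 200
dtheta, dphi = pi / n, 2 * pi / n
S = sum(r**2 * sin((i + 0.5) * dtheta) * dtheta * dphi
        for i in range(n) for j in range(n))
print(S, 4 * pi * r**2)
\end{verbatim}
Even so coarse a grid lands within a tiny fraction of a percent of $4\pi r^2$, as expected.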
\index{sphere!volume of} \index{integral!closed surface} \index{surface integration!closed} \index{closed surface integration} Having computed the sphere's surface area, one can find its volume just as \S~\ref{integ:241.10} has found a circle's area---except that instead of dividing the circle into many narrow triangles, one divides the sphere into many narrow \emph{cones,} each cone with base area~$dS$ and altitude~$r$, with the vertices of all the cones meeting at the sphere's center. Per~(\ref{integ:241:V-cone}), the volume of one such cone is $V_\mr{cone}=r\,dS/3$. Hence, \[ V_\mr{sphere} = \oint_S V_\mr{cone} = \oint_S \frac{r\,dS}{3} = \frac{r}{3} \oint_S dS = \frac{r}{3} S_\mr{sphere}, \] where the useful symbol \[ \oint_S \] indicates \emph{integration over a closed surface.} In light of~(\ref{integ:241:S-sphere}), the total volume is \bq{integ:241:V-sphere} V_\mr{sphere} = \frac{4\pi r^3}{3}. \eq (One can compute the same spherical volume more prosaically, without reference to cones, by writing $dV=r^2\sin\theta\,dr\,d\theta\,d\phi$ then integrating $\int_V dV$. The derivation given above, however, is preferred because it lends the additional insight that a sphere can sometimes be viewed as a great cone rolled up about its own vertex. The circular area derivation of \S~\ref{integ:241.10} lends an analogous insight: that a circle can sometimes be viewed as a great triangle rolled up about \emph{its} own vertex.) % ---------------------------------------------------------------------- \section{Checking an integration} \label{integ:245} \index{checking an integration} \index{integration!checking of} \index{checking division} \index{division!checking} Dividing $\mbox{0x46B}/\mbox{0xD} = \mbox{0x57}$ with a pencil, how does one check the result?% \footnote{ Admittedly, few readers will ever have done much such multidigit \emph{hexadecimal} arithmetic with a pencil, but, hey, go with it. In decimal, it's $1131/13 = 87$. Actually, hexadecimal is just proxy for binary (see Appendix~\ref{hex}), and long division in straight binary is kind of fun. If you have never tried it, you might. It is simpler than decimal or hexadecimal division, and it's how computers divide. The insight gained is worth the trial. } Answer: by multiplying $(\mbox{0x57})(\mbox{0xD})=\mbox{0x46B}$. Multiplication inverts division. Easier than division, multiplication provides a quick, reliable check. Likewise, integrating \[ \int_{a}^{b} \frac{\tau^2}{2} \,d\tau = \frac{b^3 - a^3}{6} \] with a pencil, how does one check the result? Answer: by differentiating \[ \left[ \frac{\partial}{\partial b} \left( \frac{b^3 - a^3}{6} \right) \right]_{b=\tau} = \frac{\tau^2}{2}. \] Differentiation inverts integration. Easier than integration, differentiation like multiplication provides a quick, reliable check. More formally, according to~(\ref{integ:antider}), \bq{integ:245:20} S \equiv \int_{a}^{b} \frac{df}{d\tau} \,d\tau = f(b) - f(a). \eq Differentiating~(\ref{integ:245:20}) with respect to~$b$ and~$a$, \settowidth{\tla}{$\ds -\frac{df}{d\tau}$} \settowidth{\tlb}{\scriptsize$a$} \bq{integ:245:24} \begin{split} \left.\frac{\partial S}{\partial b} \right|_{\makebox[\tlb][c]{\scriptsize$b$}=\tau} &= \makebox[\tla][r]{$\ds \frac{df}{d\tau}$}, \\ \left.\frac{\partial S}{\partial a} \right|_{\makebox[\tlb][c]{\scriptsize$a$}=\tau} &= \makebox[\tla][r]{$\ds -\frac{df}{d\tau}$}. \end{split} \eq Either line of~(\ref{integ:245:24}) can be used to check an integration. 
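(If a computer with a symbolic-algebra library happens to be at hand, the same check can be run mechanically. The following sketch is only an aside; it assumes Python with the sympy library, neither of which the book otherwise uses; and it applies the check to the example just worked:
\begin{verbatim}
# Check the result S = (b^3 - a^3)/6 of integrating tau^2/2 from a
# to b, by differentiating it per (integ:245:24).
import sympy as sp

a, b, tau = sp.symbols('a b tau')
S = (b**3 - a**3) / 6                # the integration result to check
print(sp.diff(S, b).subs(b, tau))    # prints tau**2/2, the integrand
print(sp.diff(S, a).subs(a, tau))    # prints -tau**2/2, its negative
\end{verbatim}
Recovering the integrand and its negative, the check passes.)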
Evaluating~(\ref{integ:245:20}) at $b=a$ yields \bq{integ:245:26} S|_{b=a} = 0, \eq which can be used to check further.% \footnote{ Using~(\ref{integ:245:26}) to check the example, $(b^3 - a^3)/6|_{b=a} = 0$. } \index{definite integral} \index{integral!definite} \index{indefinite integral} \index{integral!indefinite} As useful as~(\ref{integ:245:24}) and~(\ref{integ:245:26}) are, they nevertheless serve only integrals with variable limits. They are of little use to check \emph{definite integrals} like~(\ref{inttx:250:20}) below, which lack variable limits to differentiate. However, many or most integrals one meets in practice have or can be given variable limits. Equations~(\ref{integ:245:24}) and~(\ref{integ:245:26}) do serve such \emph{indefinite integrals.} \index{differentiation!analytical versus numeric} \index{integration!analytical versus numeric} It is a rare irony of mathematics that, although numerically differentiation is indeed harder than integration, analytically precisely the opposite is true. Analytically, differentiation is the easier. So far the book has introduced only easy integrals, but Ch.~\ref{inttx} will bring much harder ones. Even experienced mathematicians are apt to err in analyzing these. Reversing an integration by taking an easy derivative is thus an excellent way to check a hard-earned integration result. % ---------------------------------------------------------------------- \section{Contour integration} \label{integ:260} \index{integral!contour} \index{contour integration} \index{path integration} To this point we have considered only integrations in which the variable of integration advances in a straight line from one point to another: for instance, $\int_a^b f(\tau) \,d\tau$, in which the function $f(\tau)$ is evaluated at $\tau = a, a+d\tau, a+2d\tau, \ldots, b$. The integration variable is a real-valued scalar which can do nothing but make a straight line from~$a$ to~$b$. \index{$d\ell$} Such is not the case when the integration variable is a vector. Consider the integral \[ S = \int_{\ve r=\vu x\rho}^{\vu y\rho} (x^2+y^2) \,d\ell, \] where~$d\ell$ is the infinitesimal length of a step along the path of integration. What does this integral mean? Does it mean to integrate from $\ve r=\vu x\rho$ to $\ve r=0$, then from there to $\ve r=\vu y\rho$? Or does it mean to integrate along the arc of Fig.~\ref{integ:260:fig}? The two paths of integration begin and end at the same points, but they differ in between, and the integral certainly does not come out the same both ways. \begin{figure} \caption{A contour of integration.} \label{integ:260:fig} \index{contour} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxp{1.3} \nc\xxq{26.565} \nc\xxqq{13.283} \nc\xxr{2.2361} \nc\xxrr{1.1180} \nc\xxrrr{2.3} \nc\xxs{2.6} \nc\xxc{1.80} \psline[linewidth=0.5pt](-\xx,0)(\xx,0) \psline[linewidth=0.5pt](0,-\xx)(0,\xx) \psline[linewidth=0.5pt,linestyle=dashed]{cc-cc}(0,0)(2,1) \psarc[linewidth=2.0pt]{->}(0,0){\xxr}{0}{90} \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{0}{\xxq} \rput{\xxqq}(0,0){ \uput[r](\xxp,0){ \rput{*0}(0,0){$\phi$} } } \rput{\xxq}(0,0){ \uput[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } } \rput[l](\xxs,0){$x$} \rput[b](0,\xxs){$y$} \rput(\xxc,\xxc){$C$} \end{pspicture} } \ec \end{figure} Yet many other paths of integration from~$\vu x\rho$ to~$\vu y\rho$ are possible, not just these two. 
Because multiple paths are possible, we must be more specific: \[ S = \int_C (x^2+y^2) \,d\ell, \] where~$C$ stands for ``contour'' and means in this example the specific contour of Fig.~\ref{integ:260:fig}. In the example, $x^2+y^2=\rho^2$ (by the Pythagorean theorem) and $d\ell=\rho\,d\phi$, so \[ S = \int_C \rho^2 \,d\ell = \int_0^{2\pi/4} \rho^3\,d\ve\phi = \frac{2\pi}{4}\rho^3. \] \index{integral!closed contour} \index{contour integration!closed} \index{closed contour integration} In the example the contour is open, but closed contours which begin and end at the same point are also possible, indeed common. The useful symbol \[ \oint \] indicates \emph{integration over a closed contour.} It means that the contour ends where it began: the loop is closed. The contour of Fig.~\ref{integ:260:fig} would be closed, for instance, if it continued to $\ve r=0$ and then back to $\ve r=\vu x\rho$. \index{integral!vector contour} \index{contour integration!of a vector quantity} Besides applying where the variable of integration is a vector, contour integration applies equally where the variable of integration is a complex scalar. In the latter case some interesting mathematics emerge, as we shall see in \S\S~\ref{taylor:350} and~\ref{inttx:250}. % ---------------------------------------------------------------------- \section{Discontinuities} \label{integ:670} \index{discontinuity} \index{Heaviside, Oliver (1850--1925)} \index{Heaviside unit step function} \index{unit step function, Heaviside} \index{$u$} The polynomials and trigonometrics studied to this point in the book offer flexible means to model many physical phenomena of interest, but one thing they do not model gracefully is the simple discontinuity. Consider a mechanical valve opened at time $t=t_o$. The flow $x(t)$ past the valve is \[ x(t) = \begin{cases} 0, & t < t_o; \\ x_o, & t > t_o. \end{cases} \] One can write this more concisely in the form \[ x(t) = u(t-t_o) x_o, \] where $u(t)$ is the \emph{Heaviside unit step,} \bq{integ:670:10} u(t) \equiv \begin{cases} 0, & t < 0; \\ 1, & t > 0; \end{cases} \eq plotted in Fig.~\ref{integ:670:fig-u}. \begin{figure} \caption{The Heaviside unit step $u(t)$.} \label{integ:670:fig-u} \bc \nc\fxa{-3.0} \nc\fxb{3.0} \nc\fya{-0.8} \nc\fyb{2.2} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) \nc\xxxab{2.0} \nc\xxya{0.5} \nc\xxyb{1.5} \nc\xxo{0.15} \nc\xxc{1.7} \nc\xxd{1.1} \nc\xxl{0.2} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxya)(0,\xxyb) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyb){$u(t)$} \psline(-\xxl,\xxd)(\xxl,\xxd) \uput[l](-\xxl,\xxd){$1$} } { \psset{linewidth=2.0pt} \psline(-\xxc,0)(0,0)(0,\xxd)(\xxc,\xxd) } } \end{pspicture} \ec \end{figure} \index{Dirac, Paul (1902--1984)} \index{Dirac delta function} \index{delta function, Dirac} \index{impulse function} \index{$\delta$} \index{mathematics!professional or pure} \index{mathematics!applied} \index{professional mathematics} \index{pure mathematics} \index{applied mathematics} \index{Goldman, William (1931--)} \index{nobleman} \index{definition} \index{function} \index{sifting property} \index{Dirac delta function!sifting property of} \index{delta function, Dirac!sifting property of} The derivative of the Heaviside unit step is the curious \emph{Dirac delta} \bq{integ:670:20} \delta(t) \equiv \frac{d}{dt}u(t), \eq also called% \footnote{\cite[\S~19.5]{JJH}} the \emph{impulse function,} plotted in Fig.~\ref{integ:670:fig-d}. 
\begin{figure} \caption{The Dirac delta $\delta(t)$.} \label{integ:670:fig-d} \bc \nc\fxa{-3.0} \nc\fxb{3.0} \nc\fya{-0.8} \nc\fyb{2.2} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) \nc\xxxab{2.0} \nc\xxya{0.5} \nc\xxyb{1.5} \nc\xxo{0.15} \nc\xxc{1.7} \nc\xxd{1.1} \nc\xxl{0.2} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxya)(0,0) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxd){$\delta(t)$} } { \psset{linewidth=2.0pt} \psline{->}(0,0)(0,\xxd) } } \end{pspicture} \ec \end{figure} This function is zero everywhere except at $t=0$, where it is infinite, with the property that \bq{integ:670:sift0} \int_{-\infty}^{\infty} \delta(t) \,dt = 1, \eq and the interesting consequence that \bq{integ:670:sift} \int_{-\infty}^{\infty} \delta(t-t_o) f(t) \,dt = f(t_o) \eq for any function $f(t)$. (Equation~\ref{integ:670:sift} is the \emph{sifting property} of the Dirac delta.)% \footnote{ It seems inadvisable for the narrative to digress at this point to explore $u(z)$ and $\delta(z)$, the unit step and delta of a complex argument, although by means of Fourier analysis (Ch.~\ref{fouri}) or by conceiving the Dirac delta as an infinitely narrow Gaussian pulse (\S~\ref{fouri:130}) it could perhaps do so. The book has more pressing topics to treat. For the book's present purpose the interesting action of the two functions is with respect to the real argument~$t$. In the author's country at least, a sort of debate seems to have run for decades between professional and applied mathematicians over the Dirac delta $\delta(t)$. Some professional mathematicians seem to have objected that $\delta(t)$ is not a function, inasmuch as it lacks certain properties common to functions as they define them \cite[\S~2.4]{Phillips/Parr}\cite{Doetsch}\@. From the applied point of view the objection is admittedly a little hard to understand, until one realizes that it is more a dispute over methods and definitions than over facts. What the professionals seem to be saying is that $\delta(t)$ does not fit as neatly as they would like into the abstract mathematical framework they had established for functions in general before Paul Dirac came along in~1930 \cite[``Paul Dirac,'' 05:48, 25~May 2006]{wikip} and slapped his disruptive $\delta(t)$ down on the table. The objection is not so much that $\delta(t)$ is not allowed as it is that professional mathematics for years after~1930 lacked a fully coherent theory for it. It's a little like the six-fingered man in Goldman's \emph{The Princess Bride}~\cite{Goldman}\@. If I had established a definition of ``nobleman'' which subsumed ``human,'' whose relevant traits in my definition included five fingers on each hand, when the six-fingered Count Rugen appeared on the scene, then you would expect me to adapt my definition, wouldn't you? By my pre\"existing definition, strictly speaking, the six-fingered count is ``not a nobleman''; but such exclusion really tells one more about flaws in the definition than it does about the count. Whether the professional mathematician's definition of the \emph{function} is flawed, of course, is not for this writer to judge. 
Even if not, however, the fact of the Dirac delta dispute, coupled with the difficulty we applied mathematicians experience in trying to understand the reason the dispute even exists, has unfortunately surrounded the Dirac delta with a kind of mysterious aura, an elusive sense that $\delta(t)$ hides subtle mysteries---when what it really hides is an internal discussion of words and means among the professionals. The professionals who had established the theoretical framework before~1930 justifiably felt reluctant to throw the whole framework away because some scientists and engineers like us came along one day with a useful new function which didn't quite fit, but that was the professionals' problem not ours. To us the Dirac delta $\delta(t)$ is just a function. The internal discussion of words and means, we leave to the professionals, who know whereof they speak. } The Dirac delta is defined for vectors, too, such that \bq{integ:670:30} \int_V \delta(\ve r) \,d\ve r = 1. \eq % ---------------------------------------------------------------------- \section{Remarks (and exercises)} \label{integ:680} The concept of the integral is relatively simple once grasped, but its implications are broad, deep and hard. This chapter is short. One reason introductory calculus texts run so long is that they include many, many pages of integration examples and exercises. The reader who desires a gentler introduction to the integral might consult among others the textbook the chapter's introduction has recommended. \index{exercises} \index{style} Even if this book is not an instructional textbook, it seems not meet that it should include no exercises at all here. Here are a few. Some of them do need material from later chapters, so you should not expect to be able to complete them all now. The harder ones are marked with $\mbox{}^{*}\mbox{asterisks}$. Work the exercises if you like. % \begin{enumerate} \item Evaluate (a)~$\int_{0}^{x} \tau \,d\tau$;\ \ % (b)~$\int_{0}^{x} \tau^2 \,d\tau$.\ \ % (Answer: $x^2/2$; $x^3/3$.) \item Evaluate (a)~$\int_{1}^{x} (1/\tau^2) \,d\tau$;\ \ % (b)~$\int_{a}^{x} 3\tau^{-2} \,d\tau$;\ \ % (c)~$\int_{a}^{x} C\tau^n \,d\tau$;\ \ % (d)~$\int_{0}^{x} \linebreak % bad break (a_2\tau^2 + a_1\tau) \,d\tau$;\ \ % $\mbox{}^{*}$(e)~$\int_1^{x} (1/\tau) \,d\tau$. \item $\mbox{}^{*}\mbox{Evaluate}$ (a)~$\int_{0}^{x} \sum_{k=0}^{\infty} \tau^k \,d\tau$;\ \ % (b)~$\sum_{k=0}^{\infty} \int_{0}^{x} \tau^k \,d\tau$;\ \ % (c)~$\int_{0}^{x} \sum_{k=0}^{\infty} (\tau^k/k!) % bad break \linebreak \,d\tau$. \item Evaluate $\int_0^x \exp \alpha\tau \,d\tau$. \item Evaluate (a)~$\int_{-2}^5(3\tau^2-2\tau^3) \,d\tau$;\ \ % (b)~$\int_5^{-2}(3\tau^2-2\tau^3) \,d\tau$.\ \ % Work the exercise by hand in hexadecimal and give the answer in hexadecimal. \item Evaluate $\int_1^{\infty} (3/\tau^2) \,d\tau$. \item $\mbox{}^{*}\mbox{Evaluate}$ the integral of the example of \S~\ref{integ:260} along the alternate contour suggested there, from $\vu x\rho$ to~0 to $\vu y\rho$. \item Evaluate (a)~$\int_{0}^{x} \cos \omega\tau \,d\tau$;\ \ % (b)~$\int_{0}^{x} \sin \omega\tau \,d\tau$;\ \ % $\mbox{}^{*}\mbox{(c)}$% \footnote{\cite[\S~8-2]{Shenk}} $\int_{0}^{x} \tau\sin \omega\tau \,d\tau$. 
\item $\mbox{}^{*}\mbox{Evaluate}$% \footnote{\cite[\S~5-6]{Shenk}} (a)~ $\int_1^x \sqrt{1+2\tau} \,d\tau$;\ \ % (b)~ $\int_x^a [(\cos\sqrt\tau)/\sqrt\tau] \,d\tau.$ \item $\mbox{}^{*}\mbox{Evaluate}$% \footnote{\cite[back endpaper]{Shenk}} (a)~$\int_0^x [1/(1+\tau^2)] \,d\tau$ (answer: $\arctan x$);\ \ % (b)~$\int_0^x [(4+i3)/\sqrt{2-3\tau^2}] \,d\tau$ (hint: the answer involves another inverse trigonometric). \item $\mbox{}^{**}\mbox{Evaluate}$ (a)~$\int_{-\infty}^x\exp[-\tau^2/2] \,d\tau$; \ \ % (b)~$\int_{-\infty}^{\infty}\exp[-\tau^2/2] \,d\tau$. \end{enumerate} The last exercise in particular requires some experience to answer. Moreover, it requires a developed sense of applied mathematical style to put the answer in a pleasing form (the right form for part~b is very different from that for part~a). Some of the easier exercises, of course, you should be able to work right now. \index{cleverness} The point of the exercises is to illustrate how hard integrals can be to solve, and in fact how easy it is to come up with an integral which no one really knows how to solve very well. Some solutions to the same integral are better than others (easier to manipulate, faster to numerically calculate, etc.)\ yet not even the masters can solve them all in practical ways. On the other hand, integrals which arise in practice often can be solved very well with sufficient cleverness---and the more cleverness you develop, the more such integrals you can solve. The ways to solve them are myriad. The mathematical art of solving diverse integrals is well worth cultivating. Chapter~\ref{inttx} introduces some of the basic, most broadly useful integral-solving techniques. Before addressing techniques of integration, however, as promised earlier we turn our attention in Chapter~\ref{taylor} back to the derivative, applied in the form of the Taylor series. derivations-0.53.20120414.orig/tex/specf.tex0000644000000000000000000001046211742566274016764 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Introduction to special functions} \label{specf} \index{special functions} [This chapter is a rough, partial draft.] No topic more stirs the pure mathematician's imagination than that of number theory, so briefly addressed by this book's \S~\ref{noth:220}. So it is said. However, an applied mathematician is pleased to follow a lesser muse, and the needful topic that most provokes his mathematical curiosity may be that of special functions. What is a \emph{special function?} Trouble arises at once, before the first mathematical symbol strikes the page, for it is not easy to discover a precise definition of the term. N.N.~Lebedev and Larry~C. Andrews, authors respectively in Russian and English of two of the better post-World War~II books on the topic,% \footnote{ The books respectively are \cite{Lebedev} and~\cite{Andrews}, the latter originally in English, the former available at the time of this writing in Richard~A. Silverman's excellent, inexpensive English translation. } seem to decline to say what a special function \emph{is,} merely suggesting in their respective prefaces some things it \emph{does,} diving thence fluidly into the mathematics before anyone should notice that those authors have motivated but never quite delineated their topic (actually, it's not a bad approach, and reading Lebedev or Andrews one does soon perceive the shape of the thing). 
Abramowitz and Stegun, whose celebrated handbook% \footnote{\cite{A-S}} though not expressly a book on special functions is largely about them, are even silenter on the question. The sober dictionary% \footnote{\cite{Webster}} on the author's desk correctly defines such mathematical terms as ``eigenvalue'' and ``Fourier transform'' but seems unacquainted with ``special function.'' What are we to make of this? \index{elementary function} \index{canonical form} \index{integral} \index{integral equation} \index{differential equation} Here is what. A \emph{special function} is an analytic function (\S~\ref{taylor:320})---likely of a single and at most of a few complex scalar variables, harder to analyze and evaluate than the \emph{elementary functions} of Chs.~\ref{alggeo} through~\ref{cexp}, defined in a suitably canonical form---that serves to evaluate an integral, to solve an integral equation,% \footnote{ It need not interest you but in case it does, an \emph{integral equation} is an equation like \[ \int_{-\infty}^\infty g(z,w) f(w) \,dw = h(z), \] in which the unknown is not a variable but a function $f(w)$ that operates on a dummy variable of integration. Actually, we have already met integral equations in disguise, in discretized form, in matrix notation (Chs.~\ref{matrix} through~\ref{eigen}) resembling \[ G\ve f = \ve h, \] which means no more than it seems to mean; so maybe integral equations are not so strange as they look. The integral equation is just the matrix equation with the discrete vectors~$\ve f$ and~$\ve h$ replaced by their continuous versions $\Delta w\,f(j\,\Delta w)$ and $\Delta z\,h(i\,\Delta z)$ (the~$i$ representing not the imaginary unit here but just an index, as in Ch.~\ref{matrix}). } or to solve a differential equation elementary functions alone cannot evaluate or solve. Such a definition approximates at least the aspect and use of the special functions this book means to treat. The definition will do to go on with. The fascination of special functions to the scientist and engineer lies in how gracefully they analyze otherwise intractable physical models; in how reluctantly they yield their mathematical secrets; in how readily they conform to unexpected applications; in how often they connect seemingly unrelated phenomena; and in that, the more intrepidly one explores their realm, the more disquietly one feels that one had barely penetrated the realm's frontier. The topic of special functions seems inexhaustible. We surely will not begin to exhaust the topic in this book; yet, even so, useful results will start to flow from it almost at once. % ---------------------------------------------------------------------- \section{The Gaussian pulse and its moments} \label{specf:220} We have already met \[ \Omega(x) = \frac{\exp\left(-x^2/2\right)}{\sqrt{2\pi}}, \] as~(\ref{prob:normdist}). derivations-0.53.20120414.orig/tex/purec.tex0000644000000000000000000002342311742566274017003 0ustar rootroot\chapter[A sketch of pure complex theory]% {A bare sketch of the pure theory of the complex variable} \label{purec} \index{variable!complex} \index{complex variable} \index{mathematics!professional or pure} \index{professional mathematics} \index{pure mathematics} % diagn: The bib.bib might want a URL, and maybe even an entirely % reconstructed entry, for Arnold. At least three of the various disciplines of pure mathematics stand out for their pedagogical intricacy and the theoretical depth of their core results. 
The first of the three is number theory which, except for the simple results of \S~\ref{noth:220}, scientists and engineers tend to get by largely without. The second is matrix theory (Chs.~\ref{matrix} through~\ref{eigen}), a bruiser of a discipline the applied mathematician of the computer age---try though he might---can hardly escape. The third is the pure theory of the complex variable. The introduction's \S~\ref{intro:310} admires the beauty of the pure theory of the complex variable even while admitting that ``its arc takes off too late and flies too far from applications for such a book as this.'' To develop the pure theory properly is a worthy book-length endeavor of its own requiring moderately advanced preparation on its reader's part which, however, the reader who has reached the end of the present book's Ch.~\ref{inttx} possesses. If the writer doubts the strictly applied \emph{necessity} of the pure theory, still, he does not doubt its health to one's overall mathematical formation. It provides another way to think about complex numbers. Scientists and engineers with advanced credentials occasionally expect one to be acquainted with it for technical-social reasons, regardless of its practical use. Besides, the pure theory is interesting. This alone recommends some attention to it. The pivotal result of pure complex-variable theory is the Taylor series by Cauchy's impressed residue theorem. If we will let these few pages of appendix replace an entire book on the pure theory, then Cauchy's and Taylor's are the results we will sketch. The bibliography lists presentations far more complete. %(This presentation, as advertised, is just a sketch.) %(The reader who has reached the end of the %(Ch.~\ref{inttx} will understand already why the presentation is strictly %optional, interesting maybe but deemed unnecessary to the book's applied %mathematical development.) \index{Cauchy's impressed residue theorem} \index{impressed residue theorem, Cauchy's} \index{residue theorem, Cauchy's impressed} \index{cleverness} %\index{Arnold, D.N.} \emph{Cauchy's impressed residue theorem}% \footnote{ This is not a standard name. Though they name various associated results after Cauchy in one way or another, neither~\cite{Hildebrand} nor~\cite{Arnold:1997} seems to name this particular result, though both do feature it. Since~(\ref{purec:100:10}) impresses a pole and thus also a residue on a function $f(z)$ which in the domain of interest lacks them, the name \emph{Cauchy's impressed residue theorem} ought to serve this appendix's purpose ably enough. } is that \bq{purec:100:10} f(z) = \frac{1}{i2\pi} \oint \frac{f(w)}{w-z} \,dw \eq if~$z$ lies within the closed complex contour about which the integral is taken and if $f(z)$ is everywhere analytic (\S~\ref{taylor:320}) within and along the contour. More than one proof of the theorem is known, depending on the assumptions from which the mathematician prefers to start, but this writer is partial to an instructively clever proof he has learned from D.N.~Arnold% \footnote{\cite[\S~III]{Arnold:1997}} which goes as follows. 
Consider the function \[ g(z,t) \equiv \frac{1}{i2\pi} \oint \frac{f[z+(t)(w-z)]}{w-z} \,dw, \] whose derivative with respect to the parameter~$t$ is% \footnote{ The book does not often employ Newton's notation $f'(\cdot) \equiv [(d/d\zeta)f(\zeta)]_{\zeta=(\cdot)}$ of \S~\ref{drvtv:240} but the notation is handy here because it evades the awkward circumlocution of changing $\zeta \la z$ in~(\ref{purec:100:10}) and then writing \[ \frac{\pl g}{\pl t} = \frac{1}{i2\pi} \oint \frac{[(d/d\zeta)f(\zeta)]_{\zeta=z+(t)(w-z)}}{w-z} \,dw. \] } \[ \frac{\pl g}{\pl t} = \frac{1}{i2\pi} \oint f'[z+(t)(w-z)] \,dw. \] We notice that this is \bqb \frac{\pl g}{\pl t} &=& \frac{1}{i2\pi} \oint \frac{\pl}{\pl w}\left\{ \frac{f[z+(t)(w-z)]}{t} \right\} \,dw \\&=& \frac{1}{i2\pi} \left\{ \frac{f[z+(t)(w-z)]}{t} \right\}_a^b, \eqb where~$a$ and~$b$ respectively represent the contour integration's beginning and ending points. But this integration ends where it begins, so $a=b$ and the factor~$\{\cdot\}_a^b$ in braces vanishes, whereupon \[ \frac{\pl g}{\pl t} = 0, \] meaning that $g(z,t)$ does not vary with~$t$. Observing per~(\ref{taylor:cauchy0}) that \[ \frac{1}{i2\pi} \oint \frac{dw}{w-z} = 1, \] we have that \[ f(z) = \frac{f(z)}{i2\pi} \oint \frac{dw}{w-z} = g(z,0) = g(z,1) = \frac{1}{i2\pi} \oint \frac{f(w)}{w-z} \,dw \] as was to be proved. (There remains a basic question as to whether the paragraph's integration is even valid. Logically, it ought to be valid, since $f[z]$ being analytic is infinitely differentiable,% \footnote{ The professionals minimalistically actually require only that the function be once differentiable under certain conditions, from which they prove infinite differentiability, but this is a fine point which will not concern us here. } but when the integration is used as the sole theoretical support for the entire calculus of the complex variable, well, it seems an awfully slender reed to carry so heavy a load. Admittedly, maybe this is only a psychological problem, but a professional mathematician will devote many pages to preparatory theoretical constructs before even attempting the integral, the result of which lofty effort is not in the earthier spirit of applied mathematics. On the other hand, now that the reader has followed the book along its low road and the high integration is given only in reserve, now that the integration reaches a conclusion already believed and, once there, is asked to carry the far lighter load of this appendix only, the applied reader may feel easier about trusting it.) \index{Goursat, Edouard (1858--1936)} %\index{Hildebrand, F.B.} One could follow Arnold hence toward the proof of the theorem of one Goursat and further toward various other interesting results, a path of study the writer recommends to sufficiently interested readers: see~\cite{Arnold:1997}. Being in a tremendous hurry ourselves, however, we will leave Arnold and follow F.B.~Hildebrand% \footnote{\cite[\S~10.7]{Hildebrand}} directly toward the Taylor series. 
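(Incidentally, an informal check of~(\ref{purec:100:10}) is available to the reader who wants one before proceeding; the function $f(z)=z$ is chosen here merely for illustration. Since the closed contour ends where it begins, $\oint dw=0$; so, observing again per~(\ref{taylor:cauchy0}) that $(1/i2\pi)\oint dw/(w-z)=1$, we have that
\[
\frac{1}{i2\pi} \oint \frac{w}{w-z} \,dw
= \frac{1}{i2\pi} \oint \left[ 1 + \frac{z}{w-z} \right] dw
= 0 + z = z,
\]
just as~(\ref{purec:100:10}) requires. The check proves nothing the foregoing proof has not already proved, but it may reassure.)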
Positing some expansion point~$z_o$ and then expanding~(\ref{purec:100:10}) geometrically per~(\ref{alggeo:228:45}) about it, we have that
\bqb
 f(z)
 &=& \frac{1}{i2\pi} \oint \frac{f(w)}{(w-z_o)-(z-z_o)} \,dw
 \\&=& \frac{1}{i2\pi} \oint \frac{f(w)}{(w-z_o)[1 - (z-z_o)/(w-z_o)]} \,dw
 \\&=& \frac{1}{i2\pi} \oint \frac{f(w)}{w-z_o}
 \sum_{k=0}^\infty \left[\frac{z-z_o}{w-z_o}\right]^k \,dw
 \\&=& \sum_{k=0}^\infty \left\{ \left[
 \frac{1}{i2\pi} \oint \frac{f(w)}{(w-z_o)^{k+1}} \,dw
 \right] (z-z_o)^k \right\},
\eqb
which, being the power series
\bq{purec:100:40}
 \begin{split}
 f(z) &= \sum_{k=0}^\infty (a_k) (z-z_o)^k,
 \\
 a_k &\equiv \frac{1}{i2\pi} \oint \frac{f(w)}{(w-z_o)^{k+1}} \,dw,
 \end{split}
\eq
by definition constitutes the Taylor series~(\ref{taylor:310:20}) for $f(z)$ about $z=z_o$, assuming naturally that $\left|z-z_o\right| < \left|w-z_o\right|$ for all~$w$ along the contour so that the geometric expansion above will converge.
The important theoretical implication of~(\ref{purec:100:40}) is that \emph{every function has a Taylor series about any point across whose immediate neighborhood the function is analytic.} There evidently is no such thing as an analytic function without a Taylor series---a fact we already knew if we have read and believed Ch.~\ref{taylor}, but some readers may find it more convincing this way. Comparing~(\ref{purec:100:40}) against~(\ref{taylor:310:20}), incidentally, we have also that
\bq{purec:100:50}
 \left.\frac{d^kf}{dz^k}\right|_{z=z_o}
 = \frac{k!}{i2\pi} \oint \frac{f(w)}{(w-z_o)^{k+1}} \,dw,
\eq
which is an alternate way to write~(\ref{taylor:350:30}).

% diagn: This paragraph is new and wants review.
Close inspection of the reasoning by which we have reached~(\ref{purec:100:40}) reveals, quite by the way, at least one additional result which in itself tends to vindicate the pure theory's technique. It is this: that \emph{a Taylor series remains everywhere valid out to the distance of the nearest nonanalytic point.} The proposition is explained and proved as follows. For the aforementioned contour of integration nothing prevents one from choosing a circle, centered in the Argand plane on the expansion point $z=z_o$, the circle's radius just as large as it can be while still excluding all nonanalytic points. The requirement that $\left|z-z_o\right| < \left|w-z_o\right|$ for all~$w$ along the contour evidently is met for all~$z$ inside such a circle, which means that the Taylor series~(\ref{purec:100:40}) converges for all~$z$ inside the circle, which---precisely because we have stipulated that the circle be the largest possible centered on the expansion point---implies and thus proves the proposition in question. As an example of the proposition's use, consider the Taylor series Table~\ref{taylor:315:tbl} gives for $-\ln (1-z)$, whose nearest nonanalytic point at $z=1$ lies at unit distance from the series' expansion point $z=0$: according to the result of this paragraph, the series in question remains valid over the Argand circle out to unit distance, $|z| < 1$.
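Another illustration of the proposition shows it explaining something which, observed on the real axis alone, can puzzle. The function $1/(1+z^2)$ is smooth everywhere on the real axis, yet its Taylor series about $z_o=0$, the geometric series
\[
\frac{1}{1+z^2} = \sum_{k=0}^{\infty} \left(-z^2\right)^k,
\]
diverges for real $|z|>1$. On the real axis the divergence looks arbitrary; in the Argand plane it does not, for there the function has nonanalytic points at $z=\pm i$, at unit distance from the expansion point---and unit distance, according to the proposition, is precisely as far as the series can reach.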
% ---------------------------------------------------------------------- derivations-0.53.20120414.orig/tex/hist.tex0000644000000000000000000004444511742566274016643 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Manuscript history} \label{hist} \index{Black, Thaddeus~H.\ (1967--)} The book in its present form is based on various unpublished drafts and notes of mine, plus some of my wife Kristie's (n\'ee Hancock), going back to 1983 when I was fifteen years of age. What prompted the contest I can no longer remember, but the notes began one day when I challenged a high-school classmate to prove the quadratic formula. The classmate responded that he didn't need to prove the quadratic formula because the proof was in the class math textbook, then counterchallenged me to prove the Pythagorean theorem. Admittedly obnoxious (I was fifteen, after all) but not to be outdone, I whipped out a pencil and paper on the spot and started working. But I found that I could not prove the theorem that day. The next day I did find a proof in the school library,% \footnote{ A better proof is found in \S~\ref{alggeo:223}. } writing it down, adding to it the proof of the quadratic formula plus a rather inefficient proof of my own invention to the law of cosines. Soon thereafter the school's chemistry instructor happened to mention that the angle between the tetrahedrally arranged four carbon-hydrogen bonds in a methane molecule was~$109^\circ$, so from a symmetry argument I proved that result to myself, too, adding it to my little collection of proofs. That is how it started.% \footnote{ Fellow gear-heads who lived through that era at about the same age might want to date me against the disappearance of the slide rule. Answer: in my country, or at least at my high school, I was three years too young to use a slide rule. The kids born in 1964 learned the slide rule; those born in 1965 did not. I wasn't born till 1967, so for better or for worse I always had a pocket calculator in high school. My family had an eight-bit computer at home, too, as we shall see. } The book actually has earlier roots than these. In 1979, when I was twelve years old, my father bought our family's first eight-bit computer. The computer's built-in \emph{BASIC} programming-language interpreter exposed functions for calculating sines and cosines of angles. The interpreter's manual included a diagram much like Fig.~\ref{trig:226:f1} showing what sines and cosines were, but it never explained how the computer went about calculating such quantities. This bothered me at the time. Many hours with a pencil I spent trying to figure it out, yet the computer's trigonometric functions remained mysterious to me. When later in high school I learned of the use of the Taylor series to calculate trigonometrics, into my growing collection of proofs the series went. Five years after the Pythagorean incident I was serving the U.S. Army as an enlisted troop in the former West Germany. Although those were the last days of the Cold War, there was no shooting war at the time, so the duty was peacetime duty. My duty was in military signal intelligence, frequently in the middle of the German night when there often wasn't much to do. The platoon sergeant wisely condoned neither novels nor cards on duty, but he did let the troops read the newspaper after midnight when things were quiet enough. Sometimes I used the time to study my German---the platoon sergeant allowed this, too---but I owned a copy of Richard~P. 
Feynman's \emph{Lectures on Physics}~\cite{Feynman} which I would sometimes read instead. Late one night the battalion commander, a lieutenant colonel and West Point graduate, inspected my platoon's duty post by surprise. A lieutenant colonel was a highly uncommon apparition at that hour of a quiet night, so when that old man appeared suddenly with the sergeant major, the company commander and the first sergeant in tow---the last two just routed from their sleep, perhaps---surprise indeed it was. The colonel may possibly have caught some of my unlucky fellows playing cards that night---I am not sure---but me, he caught with my boots unpolished, reading the \emph{Lectures.} I snapped to attention. The colonel took a long look at my boots without saying anything, as stormclouds gathered on the first sergeant's brow at his left shoulder, then asked me what I had been reading. ``Feynman's \emph{Lectures on Physics,} sir.'' ``Why?'' ``I am going to attend the university when my three-year enlistment is up, sir.'' ``I see.'' Maybe the old man was thinking that I would do better as a scientist than as a soldier? Maybe he was remembering when he had had to read some of the \emph{Lectures} himself at West Point. Or maybe it was just the singularity of the sight in the man's eyes, as though he were a medieval knight at bivouac who had caught one of the peasant levies, thought to be illiterate, reading Cicero in the original Latin. The truth of this, we shall never know. What the old man actually said was, ``Good work, son. Keep it up.'' The stormclouds dissipated from the first sergeant's face. No one ever said anything to me about my boots (in fact as far as I remember, the first sergeant---who saw me seldom in any case---never spoke to me again). The platoon sergeant thereafter explicitly permitted me to read the \emph{Lectures} on duty after midnight on nights when there was nothing else to do, so in the last several months of my military service I did read a number of them. It is fair to say that I also kept my boots better polished. In Volume I, Chapter 6, of the \emph{Lectures} there is a lovely introduction to probability theory. It discusses the classic problem of the ``random walk'' in some detail, then states without proof that the generalization of the random walk leads to the Gaussian distribution % diagn: add an in-book reference, of the type "which you will find in % this book in Sect.~99.99." \[ p(x) = \frac{\exp(-x^2/2\sigma^2)}{\sigma\sqrt{2\pi}}. \] For the derivation of this remarkable theorem, I scanned the book in vain. One had no Internet access in those days, but besides a well-equipped gym the Army post also had a tiny library, and in one yellowed volume in the library---who knows how such a book got there?---I did find a derivation of the $1/\sigma\sqrt{2\pi}$ factor.% \footnote{ The citation is now unfortunately long lost. } The exponential factor, the volume did not derive. Several days later, I chanced to find myself in Munich with an hour or two to spare, which I spent in the university library seeking the missing part of the proof, but lack of time and unfamiliarity with such a German site defeated me. Back at the Army post, I had to sweat the proof out on my own over the ensuing weeks. Nevertheless, eventually I did obtain a proof which made sense to me. 
Writing the proof down carefully, I pulled the old high-school math notes out of my military footlocker (for some reason I had kept the notes and even brought them to Germany), dusted them off, and added to them the new Gaussian proof. That is how it has gone. To the old notes, I have added new proofs from time to time, and although somehow I have misplaced the original high-school leaves I took to Germany with me the notes have nevertheless grown with the passing years. These years have brought me the good things years can bring: marriage, family and career; a good life gratefully lived, details of which interest me and mine but are mostly unremarkable as seen from the outside. A life however can take strange turns, reprising earlier themes. I had become an industrial building construction engineer for a living % diagn: add an in-book reference (or disclaimer if none) for the % resistance reference below? (and, appropriately enough, had most lately added to the notes a mathematical justification of the standard industrial building construction technique to measure the resistance-to-ground of a new building's electrical grounding system), when at a juncture between construction projects an unexpected opportunity arose to pursue a Ph.D. in engineering at Virginia Tech, courtesy (indirectly, as it developed) of a research program not of the United States Army as last time but this time of the United States Navy. The Navy's research problem turned out to be in the highly mathematical fields of theoretical and computational electromagnetics. Such work naturally brought a blizzard of new formulas, whose proofs I sought or worked out and, either way, added to the notes---whence the manuscript and, in due time, this book. %I was delighted to %discover that Eric W. Weisstein had compiled and published~\cite{EWW} a %wide-ranging collection of mathematical results in a spirit not %entirely dissimilar to that of my own notes. A significant difference %remained, however, between Weisstein's work and my own. The difference %was and is fourfold: %\begin{enumerate} % \item Number theory, mathematical recreations and odd mathematical % \linebreak % bad break % names interest Weisstein much more than they interest me; my own % tastes run toward math directly useful in known physical % applications. The selection of topics in each body of work reflects % this difference. % \item Weisstein often includes results without proof. This is fine, % but for my own part I happen to like proofs. % \item Weisstein lists results encyclopedically, alphabetically by % name. I organize results more traditionally by topic, leaving % alphabetization to the book's index, that readers who wish to do so % can coherently read the book from front to back.% % \footnote{ % There is an ironic personal story in this. As children in the % 1970s, my brother and I had a 1959 World Book encyclopedia in our % bedroom, about twenty volumes. It was then a bit outdated (in % fact the world had changed tremendously in the fifteen or twenty % years following 1959, so the encyclopedia was more than a bit % outdated) but the two of us still used it sometimes. 
Only years % later did I learn that my father, who in 1959 was fourteen years % old, had bought the encyclopedia with money he had earned % delivering newspapers daily before dawn, \emph{and then had read % the entire encyclopedia, front to back.} My father played % linebacker on the football team and worked a job after school, % too, so where he found the time or the inclination to read an % entire encyclopedia, I'll never know. Nonetheless, it does prove % that even an encyclopedia can be read from front to back. % } % \item I have eventually developed an interest in the free-software % movement, joining it as a Debian Developer~\cite{Debian}; and by % these lights and by the standard of the Debian Free Software % Guidelines (DFSG)~\cite{DFSG}, Weisstein's work is not free. No % objection to non-free work as such is raised here, but the book you % are reading \emph{is} free in the DFSG sense. %\end{enumerate} %A different mathematical reference, even better in some ways than %Weisstein's and (if I understand correctly) indeed free in the DFSG %sense, is emerging at the time of this writing in the on-line pages of %the general-purpose encyclopedia Wikipedia~\cite{wikip}\@. Although %Wikipedia reads unevenly, remains generally un\-cit\-able,% %\footnote{ % Some ``Wikipedians'' do seem actively to be working on making % Wikipedia authoritatively citable. The underlying philosophy and % basic plan of Wikipedia admittedly tend to thwart their efforts, but % their efforts nevertheless seem to continue to progress. We shall % see. Wikipedia is a remarkable, monumental creation. %} %forms no coherent whole, and seems to suffer a certain %competition among some of its mathematical editors as to which of them %can explain a thing most reconditely, it is laden with mathematical %knowledge, including many proofs, which I have referred to more than a %few times in the preparation of this text. The book follows in the honorable tradition of Courant's and Hilbert's % bad break 1924 classic \emph{Methods of Mathematical Physics}~\cite{Courant/Hilbert}---a tradition subsequently developed by, among others, Jeffreys and Jeffreys~\cite{Jeffreys/Jeffreys}, Arfken and Weber~\cite{Arfken/Weber}, and Weisstein% \footnote{ Weisstein lists results encyclopedically, alphabetically by name. I organize results more traditionally by topic, leaving alphabetization to the book's index, that readers who wish to do so can coherently read the book from front to back. There is an ironic personal story in this. As children in the 1970s, my brother and I had a 1959 World Book encyclopedia in our bedroom, about twenty volumes. The encyclopedia was then a bit outdated (in fact the world had changed tremendously in the fifteen or twenty years following 1959, so it was more than a bit outdated) but the two of us still used it sometimes. Only years later did I learn that my father, who in 1959 was fourteen years old, had bought the encyclopedia with money he had earned delivering newspapers daily before dawn, \emph{and then had read the entire encyclopedia, front to back.} My father played linebacker on the football team and worked a job after school, too, so where he found the time or the inclination to read an entire encyclopedia, I'll never know. Nonetheless, it does prove that even an encyclopedia can be read from front to back. }~\cite{EWW}. The present book's chief intended contribution to the tradition lies in its applied-level derivations of the many results it presents. 
Its author always wanted to know why the Pythagorean theorem was so. The book is presented in this spirit. %The book is also presented in the spirit of the free-software or %open-source movement, which I have joined as a Debian %Developer~\cite{Debian}. I am an open-source contributor rather than an %open-source partisan. Not everything I write is free, but this %particular work is sufficiently unusual in nature, is sufficiently %distinctive in composition, and represents such a personal investment %that one feels reluctant to risk entrusting it to a publisher who might %pitch it to the wrong market and then, impatiently, let the book go out %of print. Publishers after all have their own problems. Besides, %desiring broad distribution even in the author's own country, one would %rather not price the book at~\$~100 a copy in the United States and~\$~5 %a copy in China. Open source thus makes %sense for this book. (Naturally, the publication of this or any book by %open source subjects the book to the not unreasonable criticism that the %book might never have been good enough to publish. I do not believe %that that is remotely the case here, but the reader, having been spared %the book's presumed~\$~100 price, must be the judge of that.) A book can follow convention or depart from it; yet, though occasional departure might render a book original, frequent departure seldom renders a book good. Whether this particular book is original or good, neither or both, is for the reader to tell, but in any case the book does both follow and depart. Convention is a peculiar thing: at its best, it evolves or accumulates only gradually, patiently storing up the long, hidden wisdom of generations past; yet herein arises the ancient dilemma. Convention, in all its richness, in all its profundity, can, sometimes, stagnate at a local maximum, a hillock whence higher ground is achievable not by gradual ascent but only by descent first---or by a leap. Descent risks a bog. A leap risks a fall. One ought not run such risks without cause, even in such an inherently unconservative discipline as mathematics. Well, the book does risk. It risks one leap at least: it employs hexadecimal numerals. This book is bound to lose at least a few readers for its unorthodox use of hexadecimal notation (``The first primes are $2,3,5,7,\mbox{0xB},\ldots$''). Perhaps it will gain a few readers for the same reason; time will tell. I started keeping my own theoretical math notes in hex a long time ago; at first to prove to myself that I could do hexadecimal arithmetic routinely and accurately with a pencil, later from aesthetic conviction that it was the right thing to do. Like other applied mathematicians, I've several own private notations, and in general these are not permitted to burden the published text. The hex notation is not my own, though. It existed before I arrived on the scene and, since I know of no math book better positioned to risk its use, I have with hesitation and no little trepidation resolved to let this book use it. Some readers will approve; some will tolerate; undoubtedly some will do neither. The views of the last group must be respected, but in the meantime the book has a mission; and crass popularity can be only one consideration, to be balanced against other factors. The book might gain even more readers, after all, had it no formulas, and painted landscapes in place of geometric diagrams! I like landscapes, too, but anyway you can see where that line of logic leads. 
More substantively: despite the book's title, adverse criticism from some quarters for lack of rigor is probably inevitable; nor is such criticism necessarily improper from my point of view. Still, serious books by professional mathematicians tend to be \emph{for} professional mathematicians, which is understandable but does not always help the scientist or engineer who wants to use the math to model something. The ideal author of such a book as this would probably hold two doctorates: one in mathematics and the other in engineering or the like. The ideal author lacking, I have written the book. So here you have my old high-school notes, extended over % diagn: how many years now? twenty-five years and through the course of % diagn: how many degrees now? two-and-a-half university degrees, now partly typed and revised for the first time as a \LaTeX\ manuscript. Where this manuscript will go in the future is hard to guess. Perhaps the revision you are reading is the last. Who can say? The manuscript met an uncommonly enthusiastic reception at Debconf~6~\cite{Debian} May~2006 at Oaxtepec, Mexico; and in August of the same year it warmly welcomed Karl Sarnow and Xplora Knoppix~\cite{Xplora} aboard as the second official distributor of the book. Such developments augur well for the book's future at least. But in the meantime, if anyone should challenge you to prove the Pythagorean theorem on the spot, why, whip this book out and turn to \S~\ref{alggeo:223}. That should confound 'em. \nopagebreak \noindent\\ THB derivations-0.53.20120414.orig/tex/trig.tex0000644000000000000000000015147011742566274016636 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Trigonometry} \label{trig} \index{trigonometry} \emph{Trigonometry} is the branch of mathematics which relates angles to lengths. This chapter introduces the trigonometric functions and derives their several properties. % ---------------------------------------------------------------------- \section{Definitions} \label{trig:226} \index{right triangle} \index{triangle!right} \index{circle} Consider the circle-inscribed right triangle of Fig.~\ref{trig:226:f1}. \index{angle} \index{radian} \index{radius} \index{circle!unit} \index{unit circle} \index{arc} \index{unit} \index{unity} \index{one} \index{$1$ (one)} \index{length!curved} \index{dimensionlessness} \index{revolution} \index{angle!square} \index{angle!right} \index{$2\pi$} In considering the circle, we will find some terminology useful: the \emph{angle}~$\phi$ in the diagram is measured in \emph{radians,} where a radian is the angle which, when centered in a \emph{unit circle,} describes an arc of unit length.% \footnote{ The word ``unit'' means ``one'' in this context. A unit length is a length of~1 (not one centimeter or one mile, just an abstract~1). A unit circle is a circle of radius~1. } Measured in radians, an angle~$\phi$ intercepts an arc of curved length~$\rho\phi$ on a circle of \emph{radius}~$\rho$ (that is, of distance~$\rho$ from the circle's center to its perimeter). An angle in radians is a dimensionless number, so one need not write ``$\phi=2\pi/4\ \mbox{radians}$''; it suffices to write ``$\phi=2\pi/4$.'' In mathematical theory, we express angles in radians. The angle of full revolution is given the symbol~$2\pi$---which thus is the circumference of a unit circle.% \footnote{ Section~\ref{taylor:355} computes the numerical value of~$2\pi$. } A quarter revolution, $2\pi/4$, is then the \emph{right angle,} or square angle. 
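For example---the numbers here being chosen arbitrarily, only to exercise the definition---an angle $\phi=2\pi/8$, centered in a circle of radius $\rho=3$, intercepts an arc of curved length
\[
\rho\phi = \frac{(3)(2\pi)}{8} = \frac{3\pi}{4};
\]
and the angle of full revolution, $\phi=2\pi$, intercepts the circle's whole perimeter, of length $2\pi\rho$, as one would expect.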
\index{sine} \index{cosine} \index{tangent} \index{slope} \index{diagonal} \index{quadrant} \index{trigonometric function} \index{trigonometric function!inverse} \index{arcsine} \index{arccosine} \index{arctangent} \index{rise}\index{run} \index{coordinates!rectangular} \index{rectangular coordinates} \index{north}\index{south}\index{east}\index{west}\index{up}\index{down} \index{origin} The trigonometric functions $\sin\phi$ and $\cos\phi$ (the ``sine'' and ``cosine'' of~$\phi$) relate the angle~$\phi$ to the lengths shown in Fig.~\ref{trig:226:f1}. \begin{figure} \caption[The sine and the cosine.]{The sine and the cosine (shown on a circle-inscribed right triangle, with the circle centered at the triangle's point).} \label{trig:226:f1} \bc { \nc\xax{-3.0} \nc\xbx{-1.0} \nc\xcx{ 6.0} \nc\xdx{ 3.0} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xxa{3.4641} \nc\xxaa{1.7321} \nc\xxaaa{2.2} \nc\xxb{0.35} \nc\xxc{2.0} \nc\xxcc{1.0} \nc\xxd{4.0} \nc\xxda{4.3} \nc\xxdc{2.6} \nc\xxdo{0.2} \nc\xxe{0.8} \nc\xxf{-12} \nc\xxg{ 45} \nc\xxh{0.35} \nc\xxi{0.8} \nc\xxj{30} \nc\xxjj{15} \psline[linewidth=2.0pt]{cc-cc}(0,0)(\xxa,0)(\xxa,\xxc) \psline[linewidth=2.0pt]{cc-cc}(0,0) (\xxa,\xxc) \psline[linewidth=0.5pt](0,0)(\xxda,0) \psline[linewidth=0.5pt](0,0)(0,\xxdc) \rput(\xxda,0){\rput(\xxdo,0){$x$}} \rput(0,\xxdc){\rput(0,\xxdo){$y$}} \psdot [linewidth=2.0pt](0,0) \psarc [linewidth=0.5pt](0,0){\xxd}{\xxf}{\xxg} \rput(\xxa,0){ \psline[linewidth=0.5pt](-\xxh,0)(-\xxh,\xxh)(0,\xxh) } \uput[120](\xxaa,\xxcc){$\rho$} \uput[u]{* 0}(\xxaaa,0 ){$\rho\cos\phi$} \uput[u]{*90}(\xxa ,\xxcc){$\rho\sin\phi$} \psarc [linewidth=0.5pt]{->}(0,0){\xxi}{0}{\xxj} \rput{\xxjj}(0,0){ \uput[r](\xxi,0){ \rput{*0}(0,0){$\phi$} } } \end{pspicture} } \ec \end{figure} The tangent function is then defined as \bq{trig:226:10} \tan\phi \equiv \frac{\sin\phi}{\cos\phi}, \eq which is the ``rise'' per unit ``run,'' or \emph{slope,} of the triangle's diagonal.% \footnote{ % diagn: this new footnote wants review. Often seen in print is the additional notation $\sec\phi \equiv 1/\cos\phi$, $\csc\phi \equiv 1/\sin\phi$ and $\cot\phi \equiv 1/\tan\phi$; respectively the ``secant,'' ``cosecant'' and ``cotangent.'' This book does not use the notation. } Inverses of the three trigonometric functions can also be defined: \[ \begin{split} \arcsin\left(\sin\phi\right) &= \phi, \\ \arccos\left(\cos\phi\right) &= \phi, \\ \arctan\left(\tan\phi\right) &= \phi. \end{split} \] When the last of these is written in the form \[ \arctan\left(\frac{y}{x}\right), \] it is normally implied that~$x$ and~$y$ are to be interpreted as rectangular coordinates% \footnote{ \emph{Rectangular coordinates} are pairs of numbers $(x,y)$ which uniquely specify points in a plane. Conventionally, the~$x$ coordinate indicates distance eastward; the~$y$ coordinate, northward. For instance, the coordinates $(3,-4)$ mean the point three units eastward and four units southward (that is,~$-4$ units northward) from the \emph{origin} $(0,0)$. A third rectangular coordinate can also be added---$(x,y,z)$---where the~$z$ indicates distance upward. } %$\mbox{}^{,}$% % diagn: consider restoring the following footnote, but probably not. %\footnote{ % Because the ``oo'' of ``coordinates'' is not the monophthongal ``oo'' % of ``boot'' and ``door,'' the old publishing convention this book % generally follows should style the word as ``co\"ordinates.'' The % book uses the word however as a technical term. 
For better or for % worse, every English-language technical publisher the author knows of % styles the technical term as ``coordinates.'' The author neither has % nor desires a mandate to reform technical publishing practice, so % ``coordinates'' the word shall be. %} and that the $\arctan$ function is to return~$\phi$ in the correct quadrant $-\pi<\phi\le\pi$ (for example, $\arctan[1/(-1)] = [+3/8][2\pi]$, whereas $\arctan[(-1)/1] = [-1/8][2\pi]$). This is similarly the usual interpretation when an equation like \[ \tan\phi = \frac{y}{x} \] is written. \index{Pythagorean theorem!and the sine and cosine functions} By the Pythagorean theorem (\S~\ref{alggeo:223}), it is seen generally that% \footnote{ The notation $\cos^2\phi$ means $\left(\cos\phi\right)^2$. } \bq{trig:226:25} \cos^2\phi + \sin^2\phi = 1. \eq \index{sinusoid} Fig.~\ref{trig:226:sinusoid} plots the sine function. The shape in the plot is called a \emph{sinusoid.} \begin{figure} \caption{The sine function.} \label{trig:226:sinusoid} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-3.0} \nc\fyb{3.0} \nc\xxx{4.5} \nc\xxy{1.6} \nc\xxa{1.3} \nc\xxb{4.3} \nc\xxt{0.15} \nc\xxp{4.0841} \nc\xxpp{2.0420} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxx,0)(\xxx,0) \psline(0,-\xxy)(0,\xxy) \uput[r](\xxx,0){$t$} \uput[u](0,\xxy){$\sin(t)$} \psline( \xxp,-\xxt)( \xxp, \xxt) \psline(-\xxp,-\xxt)(-\xxp, \xxt) \uput[d](-\xxp,-\xxt){$\ds-\frac{2\pi}{2}\ $} \uput[d]( \xxp,-\xxt){$\ds \frac{2\pi}{2}$} \psline[linestyle=dashed](0,\xxa)(\xxpp,\xxa) \uput[l](0,\xxa){$1$} } \psplot[linewidth=2.0pt,plotpoints=200]{-\xxb}{\xxb}{ x 1.3 div 57.296 mul sin 1.3 mul } } \end{pspicture} \ec \end{figure} % ---------------------------------------------------------------------- \section{Simple properties} \label{trig:228} Inspecting Fig.~\ref{trig:226:f1} and observing~(\ref{trig:226:10}) and~(\ref{trig:226:25}), one readily discovers the several simple trigonometric properties of Table~\ref{trig:228:table}. 
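For instance, the last two entries of the table come of dividing~(\ref{trig:226:25}) through by $\cos^2\phi$ and by $\sin^2\phi$, respectively, then applying~(\ref{trig:226:10}):
\[
1 + \tan^2\phi = \frac{1}{\cos^2\phi},
\ \ \ \
1 + \frac{1}{\tan^2\phi} = \frac{1}{\sin^2\phi}.
\]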
\begin{table} \caption{Simple properties of the trigonometric functions.} \index{trigonometry!properties of} \label{trig:228:table} \bc \[ \br{rclcrcl} \sin(-\phi) &=& -\sin\phi &\ & \cos(-\phi) &=& +\cos\phi \\ \sin(2\pi/4 - \phi) &=& +\cos\phi &\ & \cos(2\pi/4 - \phi) &=& +\sin\phi \\ \sin(2\pi/2 - \phi) &=& + \sin\phi &\ & \cos(2\pi/2 - \phi) &=& -\cos\phi \\ \sin(\phi\pm2\pi/4) &=& \pm\cos\phi &\ & \cos(\phi\pm2\pi/4) &=& \mp\sin\phi \\ \sin(\phi\pm2\pi/2) &=& -\sin\phi &\ & \cos(\phi\pm2\pi/2) &=& -\cos\phi \\ \sin(\phi+n2\pi) &=& \sin\phi &\ & \cos(\phi+n2\pi) &=& \cos\phi \er \] \[ \br{rcl} \tan(-\phi) &=& -\tan\phi \\ \tan(2\pi/4 - \phi) &=& +1/\tan\phi \\ \tan(2\pi/2 - \phi) &=& -\tan\phi \\ \tan(\phi\pm2\pi/4) &=& -1/\tan\phi \\ \tan(\phi\pm2\pi/2) &=& +\tan\phi \\ \tan(\phi+n2\pi) &=& \tan\phi \er \] \bqb \frac{\sin\phi}{\cos\phi} &=& \tan\phi \\ \cos^2\phi + \sin^2\phi &=& 1 \\ 1 + \tan^2 \phi &=& \frac{1}{\cos^2\phi} \\ 1 + \frac{1}{\tan^2 \phi} &=& \frac{1}{\sin^2\phi} \eqb \ec \end{table} % ---------------------------------------------------------------------- \section{Scalars, vectors, and vector notation} \label{trig:230} \index{scalar} \index{vector} \index{vector!notation for} \index{amplitude} \index{direction} \index{vector!unit basis} \index{unit basis vector} In applied mathematics, a \emph{vector} is an amplitude of some kind coupled with a direction.% \footnote{ The same word \emph{vector} is also used to indicate an ordered set of~$N$ scalars (\S~\ref{taylor:370}) or an $N\times 1$ matrix (Ch.~\ref{matrix}), but those are not the uses of the word meant here. See also the introduction to Ch.~\ref{vector}. } For example, ``55 miles per hour northwestward'' is a vector, as is the entity~$\ve u$ depicted in Fig.~\ref{trig:231}. \begin{figure} \caption[A two-dimensional vector $\ve u = \vu x x + \vu y y$.]{A two-dimensional vector $\ve u = \vu x x + \vu y y$, shown with its rectangular components.} \label{trig:231} \index{vector!two-dimensional} \bc \begin{pspicture}(-5,-3)(5,3) { \small \nc\xa{2.6} \nc\xb{2.8} \nc\xc{2.0} \nc\xd{1.4} \nc\xcc{0.90} \nc\xdd{0.92} \nc\xe{0.25} { \psset{linewidth=0.5pt} \psline(-\xa,0)(\xa,0) \psline(0,-\xa)(0,\xa) \rput(\xb,0){$x$} \rput(0,\xb){$y$} } { \psset{linewidth=2.0pt} \psline{c->}(0,0)(\xc,\xd) \rput(\xcc,\xdd){$\ve u$} \psline{c->}(0,0)(\xc,0) \rput(1.15,\xe){$\vu x x$} \psline{c->}(0,0)(0,\xd) \rput(-0.30,0.80){$\vu y y$} } { \psset{linewidth=0.5pt,linestyle=dashed} \psline(\xc,0)(\xc,\xd) \psline(0,\xd)(\xc,\xd) } } \end{pspicture} \ec \end{figure} The entity~$\ve v$ depicted in Fig.~\ref{trig:231f3} is also a vector, in this case a three-dimensional one. 
\begin{figure} \caption{A three-dimensional vector $\ve v = \vu x x + \vu y y + \vu z z$.} \label{trig:231f3} \index{vector!three-dimensional} \bc \begin{pspicture}(-5,-3)(5,3) { \small \nc\xax{0.65} \nc\xay{1.3} \nc\xaxx{0.13} \nc\xayy{0.26} \nc\xa{2.6} \nc\xb{2.8} \nc\xc{2.0} \nc\xd{1.4} \nc\xcdx{0.2} \nc\xcdy{0.4} \nc\xcc{0.90} \nc\xdd{0.92} \nc\xe{0.25} \nc\xez{0.18} { \psset{linewidth=0.5pt} \psline(-\xa,0)(\xa,0) \psline(0,-\xa)(0,\xa) \psline(\xaxx,\xayy)(-\xax,-\xay) \rput(\xb,0){$x$} \rput(0,\xb){$y$} \rput{-26.565}(-\xax,-\xay){ \rput{*0}(0,-\xez){$z$} } } { \psset{linewidth=2.0pt} \psline{c->}(0,0)(1.8,1.0) } { \psset{linewidth=0.5pt,linestyle=dashed} \psline(\xc,0)(\xc,\xd) \psline(0,\xd)(\xc,\xd) { \psset{linestyle=dashed,dash=2.7951pt 1.6771pt} \rput(\xc,\xd){ \psline(0,0)(-\xcdx,-\xcdy) } \rput(0,\xd){ \psline(0,0)(-\xcdx,-\xcdy) } \rput(\xc,0){ \psline(0,0)(-\xcdx,-\xcdy) } } \rput(-\xcdx,-\xcdy){ \psline(0,0)(0,\xd)(\xc,\xd)(\xc,0)(0,0) } \rput(1.40,0.52){$\ve v$} } } \end{pspicture} \ec \end{figure} \index{unity} \index{unit} \index{vector!unit} \index{unit vector} \index{amplitude} Many readers will already find the basic vector concept familiar, but for those who do not, a brief review: Vectors such as the \bqb \ve u &=& \vu x x + \vu y y, \\ \ve v &=& \vu x x + \vu y y + \vu z z \eqb of the figures are composed of multiples of the \emph{unit basis vectors}~$\vu x$, $\vu y$ and~$\vu z$, which themselves are vectors of unit length pointing in the cardinal directions their respective symbols suggest.% \footnote{ Printing by hand, one customarily writes a general vector like~$\ve u$ as ``$\,\vec u$\,'' or just ``\,$\overline u$\,'', and a unit vector like~$\vu x$ as ``$\,\hat x$\,''. } Any vector~$\ve a$ can be factored into an \emph{amplitude}~$a$ and a \emph{unit vector}~$\vu a$, as \[ \ve a = \vu a a = \vu a \left| \ve a \right|, \] where the~$\vu a$ represents direction only and has unit magnitude by definition, and where the~$a$ or~$\left| \ve a \right|$ represents amplitude only and carries the physical units if any.% \footnote{ The word ``unit'' here is unfortunately overloaded. As an adjective in mathematics, or in its nounal form ``unity,'' it refers to the number one~(1)---not one mile per hour, one kilogram, one Japanese yen or anything like that; just an abstract~1. The word ``unit'' itself as a noun however usually signifies a physical or financial reference quantity of measure, like a mile per hour, a kilogram or even a Japanese yen. There is no inherent mathematical unity to~1 mile per hour (otherwise known as~0.447 meters per second, among other names). By contrast, a ``unitless~1''---a~1 with no physical unit attached---does represent mathematical unity. Consider the ratio $r=h_1/h_o$ of your height~$h_1$ to my height~$h_o$. Maybe you are taller than I am and so $r=1.05$ (not 1.05~cm or 1.05~feet, just~1.05). Now consider the ratio $h_1/h_1$ of your height to your own height. That ratio is of course unity, exactly~1. There is nothing ephemeral in the concept of mathematical unity, nor in the concept of unitless quantities in general. The concept is quite straightforward and entirely practical. That $r > 1$ means neither more nor less than that you are taller than I am. In applications, one often puts physical quantities in ratio precisely to strip the physical units from them, comparing the ratio to unity without regard to physical units. } For example, $a=55\mbox{\ miles per hour}$, $\vu a=\mbox{northwestward}$. 
The unit vector~$\vu a$ itself can be expressed in terms of the unit basis vectors: for example, if~$\vu x$ points east and~$\vu y$ points north, then $\vu a = -\vu x (1/\sqrt 2) + \vu y (1/\sqrt 2)$, where per the Pythagorean theorem $(-1/\sqrt 2)^2 + (1/\sqrt 2)^2 = 1^2.$ \index{scalar!complex} A single number which is not a vector or a matrix (Ch.~\ref{matrix}) is called a \emph{scalar.} In the example, $a=55\mbox{\ miles per hour}$ is a scalar. Though the scalar~$a$ in the example happens to be real, scalars can be complex, too---which might surprise one, since scalars by definition lack direction and the Argand phase~$\phi$ of Fig.~\ref{alggeo:225:fig} so strongly resembles a direction. However, phase is not an actual direction in the vector sense (the real number line in the Argand plane cannot be said to run west-to-east, or anything like that). The~$x$, $y$ and~$z$ of Fig.~\ref{trig:231f3} are each (possibly complex) scalars; $\ve v = \vu x x + \vu y y + \vu z z$ is a vector. If~$x$, $y$ and~$z$ are complex, then% \footnote{ Some books print~$\left|\ve v\right|$ as~$\|\ve v\|$ or even~$\|\ve v\|_2$ to emphasize that it represents the real, scalar magnitude of a complex vector. The reason the last notation subscripts a numeral~2 is obscure, having to do with the professional mathematician's generalized definition of a thing he calls the ``norm.'' This book just renders it~$\left|\ve v\right|$. } \bqa \left|\ve v\right|^2 &=& \left|x\right|^2 + \left|y\right|^2 + \left|z\right|^2 = x^{*}x + y^{*}y + z^{*}z \xn\\ &=& \left[\Re(x)\right]^2 + \left[\Im(x)\right]^2 + \left[\Re(y)\right]^2 + \left[\Im(y)\right]^2 \xn\\&&\mbox{\ \ }% + \left[\Re(z)\right]^2 + \left[\Im(z)\right]^2. \label{trig:230:20} \eqa \index{vector!point} \index{point!in vector notation} \index{origin} A point is sometimes identified by the vector expressing its distance and direction from the origin of the coordinate system. That is, the point $(x,y)$ can be identified with the vector $\vu x x+\vu y y$. However, in the general case vectors are not associated with any particular origin; they represent distances and directions, not fixed positions. \index{axes} \index{right-hand rule} \index{orientation} \index{screw} Notice the relative orientation of the axes in Fig.~\ref{trig:231f3}. The axes are oriented such that if you point your flat right hand in the~$x$ direction, then bend your fingers in the~$y$ direction and extend your thumb, the thumb then points in the~$z$ direction. This is orientation by the \emph{right-hand rule.} A left-handed orientation is equally possible, of course, but as neither orientation has a natural advantage over the other, we arbitrarily but conventionally accept the right-handed one as standard.% \footnote{ The writer does not know the etymology for certain, but verbal lore in American engineering has it that the name ``right-handed'' comes from experience with a standard right-handed wood screw or machine screw. If you hold the screwdriver in your right hand and turn the screw in the natural manner clockwise, turning the screw slot from the~$x$ orientation toward the~$y$, the screw advances away from you in the~$z$ direction into the wood or hole. If somehow you came across a left-handed screw, you'd probably find it easier to drive that screw with the screwdriver in your left hand. } Sections~\ref{trig:240} and~\ref{trig:277} and Chs.~\ref{vector} and~\ref{vcalc} speak further of the vector. 
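Before the section closes, a brief numerical illustration of~(\ref{trig:230:20}) may be worth recording; the component values are picked arbitrarily. If $\ve v = \vu x(2+i2) + \vu y(1)$, then
\[
\left|\ve v\right|^2 = (2)^2 + (2)^2 + (1)^2 = 9,
\]
so $\left|\ve v\right| = 3$---a real, nonnegative magnitude, notwithstanding the complex~$x$ component.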
% ---------------------------------------------------------------------- \section{Rotation} \label{trig:240} \index{rotation} \index{vector!rotation of} \index{axes!rotation of} \index{axes!changing} \index{symmetry!appeal to} \index{dimensionality} \index{space} \index{plane} \index{line} \index{point} \index{prime mark ($'$)} \index{$'$} A fundamental problem in trigonometry arises when a vector \bq{trig:240:08} \ve u = \vu x x + \vu y y \eq must be expressed in terms of alternate unit vectors~$\vu x'$ and~$\vu y'$, where~$\vu x'$ and~$\vu y'$ stand at right angles to one another and lie in the plane% \footnote{ A \emph{plane,} as the reader on this tier undoubtedly knows, is a flat (but not necessarily level) surface, infinite in extent unless otherwise specified. Space is three-dimensional. A plane is two-dimensional. A line is one-dimensional. A point is zero-dimensional. The plane belongs to this geometrical hierarchy. } of~$\vu x$ and~$\vu y$, but are rotated from the latter by an angle~$\phi$ as depicted in Fig.~\ref{trig:240f}.% \footnote{ The ``$\,'\,$'' mark is pronounced ``prime'' or ``primed'' (for no especially good reason of which the author is aware, but anyway, that's how it's pronounced). Mathematical writing employs the mark for a variety of purposes. Here, the mark merely distinguishes the new unit vector~$\vu x'$ from the old~$\vu x$. } \begin{figure} \caption{Vector basis rotation.} \label{trig:240f} \bc \begin{pspicture}(-5,-3)(5,3) { \small \setlength\tla{1.6cm} \nc\xa{2.6} \nc\xb{2.8} \nc\xd{0.81915\tla} \nc\xe{0.57358\tla} \nc\xg{35} \nc\xgg{17} \nc\xr{0.9} \nc\xrr{1.1} \nc\xh{ \psarc{->}(0,0){\xr}{0}{\xg} \rput{\xgg}(0,0){ \rput{*0}(\xrr,0){$\phi$} } } { \psset{linewidth=0.5pt} \psline(-\xa,0)(\xa,0) \psline(0,-\xa)(0,\xa) \rput(\xb,0){$x$} \rput(0,\xb){$y$} \rput{ 0}(0,0){\xh} \rput{90}(0,0){\xh} } { \psset{linewidth=2.0pt} \psline{c->}(0,0)(0,\tla) \psline{c->}(0,0)(\tla,0) \psline{c->}(0,0)(\xd,\xe) \psline{c->}(0,0)(-\xe,\xd) \psline{c->}(0,0)(2.2,2.6) } \rput(1.55,-0.25){$\vu x$} \rput(0.25,1.55){$\vu y$} \rput(1.55,1.10){$\vu x'$} \rput(-1.05,1.50){$\vu y'$} \rput(2.10,2.10){$\ve u$} } \end{pspicture} \ec \end{figure} In terms of the trigonometric functions of \S~\ref{trig:226}, evidently \bq{trig:240:10} \begin{split} \vu x' &= +\vu x \cos\phi + \vu y \sin\phi, \\ \vu y' &= -\vu x \sin\phi + \vu y \cos\phi; \end{split} \eq and by appeal to symmetry it stands to reason that \bq{trig:240:12} \begin{split} \vu x &= +\vu x' \cos\phi - \vu y' \sin\phi, \\ \vu y &= +\vu x' \sin\phi + \vu y' \cos\phi. \end{split} \eq Substituting~(\ref{trig:240:12}) into~(\ref{trig:240:08}) yields \bq{trig:240:rot} \ve u = \vu x' (x\cos\phi+y\sin\phi) + \vu y' (-x\sin\phi+y\cos\phi), \eq which was to be derived. Equation~(\ref{trig:240:rot}) finds general application where rotations in rectangular coordinates are involved. If the question is asked, ``what happens if I rotate not the unit basis vectors but rather the vector~$\ve u$ instead?''\ the answer is that it amounts to the same thing, except that the sense of the rotation is reversed: \bq{trig:240:rot2} \ve u' = \vu x (x\cos\phi-y\sin\phi) + \vu y (x\sin\phi+y\cos\phi). \eq Whether it is the basis or the vector which rotates thus depends on your point of view.% \footnote{ This is only true, of course, with respect to the vectors themselves. When one actually rotates a physical body, the body experiences forces during rotation which might or might not change the body internally in some relevant way. 
} Much later in the book, \S~\ref{vector:210} will extend rotation in two dimensions to reorientation in three dimensions. % ---------------------------------------------------------------------- \section[Trigonometric sums and differences]{Trigonometric functions of sums and differences of angles} \label{trig:250} \index{trigonometric function!of a sum or difference of angles} \index{rotation!angle of} \index{angle!of rotation} \index{equation!solving a set of simultaneously} With the results of \S~\ref{trig:240} in hand, we now stand in a position to consider trigonometric functions of sums and differences of angles. Let \[ \begin{split} \vu a &\equiv \vu x \cos \alpha + \vu y \sin \alpha, \\ \vu b &\equiv \vu x \cos \beta + \vu y \sin \beta, \end{split} \] be vectors of unit length in the~$xy$ plane, respectively at angles~$\alpha$ and~$\beta$ from the~$x$ axis. If we wanted~$\vu b$ to coincide with~$\vu a$, we would have to rotate it by $\phi=\alpha-\beta$. According to~(\ref{trig:240:rot2}) and the definition of~$\vu b$, if we did this we would obtain \[ \br{l} \vu b' = \vu x [\cos\beta\cos(\alpha-\beta)-\sin\beta\sin(\alpha-\beta)] \\ \sh{2.0} + \vu y [\cos\beta\sin(\alpha-\beta)+\sin\beta\cos(\alpha-\beta)]. \er \] Since we have deliberately chosen the angle of rotation such that $\vu b' = \vu a$, we can separately equate the~$\vu x$ and~$\vu y$ terms in the expressions for~$\vu a$ and~$\vu b'$ to obtain the pair of equations \[ \begin{split} \cos \alpha &= \cos\beta\cos(\alpha-\beta)-\sin\beta\sin(\alpha-\beta), \\ \sin \alpha &= \cos\beta\sin(\alpha-\beta)+\sin\beta\cos(\alpha-\beta). \end{split} \] Solving the last pair simultaneously% \footnote{ \label{trig:footnote10}% The easy way to do this is \bi \item to subtract $\sin\beta$ times the first equation from $\cos\beta$ times the second, then to solve the result for $\sin(\alpha-\beta)$; \item to add $\cos\beta$ times the first equation to $\sin\beta$ times the second, then to solve the result for $\cos(\alpha-\beta)$. \ei This shortcut technique for solving a pair of equations simultaneously for a pair of variables is well worth mastering. In this book alone, it proves useful many times. } for $\sin(\alpha-\beta)$ and $\cos(\alpha-\beta)$ and observing that $\sin^2(\cdot)+\cos^2(\cdot)=1$ yields \bq{trig:250:20} \begin{split} \sin(\alpha-\beta) &= \sin\alpha\cos\beta - \cos\alpha\sin\beta, \\ \cos(\alpha-\beta) &= \cos\alpha\cos\beta + \sin\alpha\sin\beta. \end{split} \eq With the change of variable $\beta\la -\beta$ and the observations from Table~\ref{trig:228:table} that $\sin(-\phi)=-\sin\phi$ and $\cos(-\phi)=+\cos(\phi)$, eqns.~(\ref{trig:250:20}) become \bq{trig:250:22} \begin{split} \sin(\alpha+\beta) &= \sin\alpha\cos\beta + \cos\alpha\sin\beta, \\ \cos(\alpha+\beta) &= \cos\alpha\cos\beta - \sin\alpha\sin\beta. \end{split} \eq Equations~(\ref{trig:250:20}) and~(\ref{trig:250:22}) are the basic formulas for trigonometric functions of sums and differences of angles. \subsection{Variations on the sums and differences} \label{trig:250.11} { \nc\al{\left(\frac{\gamma+\delta}{2}\right)} \nc\bt{\left(\frac{\gamma-\delta}{2}\right)} Several useful variations on~(\ref{trig:250:20}) and~(\ref{trig:250:22}) are achieved by combining the equations in various straightforward ways.% \footnote{ Refer to footnote~\ref{trig:footnote10} above for the technique. 
} These include \bq{trig:250:30} \begin{split} \sin\alpha\sin\beta &= \frac{ \cos(\alpha-\beta) - \cos(\alpha+\beta) }{2},\\ \sin\alpha\cos\beta &= \frac{ \sin(\alpha-\beta) + \sin(\alpha+\beta) }{2},\\ \cos\alpha\cos\beta &= \frac{ \cos(\alpha-\beta) + \cos(\alpha+\beta) }{2}. \end{split} \eq With the change of variables $\delta \la \alpha-\beta$ and $\gamma \la \alpha+\beta$, (\ref{trig:250:20}) and~(\ref{trig:250:22}) become \[ \begin{split} \sin\delta &= \sin\al\cos\bt - \cos\al\sin\bt, \\ \cos\delta &= \cos\al\cos\bt + \sin\al\sin\bt, \\ \sin\gamma &= \sin\al\cos\bt + \cos\al\sin\bt, \\ \cos\gamma &= \cos\al\cos\bt - \sin\al\sin\bt. \end{split} \] Combining these in various ways, we have that \bq{trig:250:50} \begin{split} \sin\gamma + \sin\delta &= 2\sin\al\cos\bt, \\ \sin\gamma - \sin\delta &= 2\cos\al\sin\bt, \\ \cos\delta + \cos\gamma &= 2\cos\al\cos\bt, \\ \cos\delta - \cos\gamma &= 2\sin\al\sin\bt. \end{split} \eq } \subsection{Trigonometric functions of double and half angles} \label{trig:250.12} \index{trigonometric function!of a double or half angle} \index{double angle} \index{half angle} \index{angle!double} \index{angle!half} If $\alpha=\beta$, then eqns.~(\ref{trig:250:22}) become the \emph{double-angle formulas} \bq{trig:250:62} \begin{split} \sin2\alpha &= 2\sin\alpha\cos\alpha, \\ \cos2\alpha &= 2\cos^2\alpha - 1 = \cos^2\alpha - \sin^2\alpha = 1 - 2\sin^2\alpha. \end{split} \eq Solving~(\ref{trig:250:62}) for $\sin^2\alpha$ and $\cos^2\alpha$ yields the \emph{half-angle formulas} \bq{trig:250:64} \begin{split} \sin^2\alpha &= \frac{1-\cos 2\alpha}{2},\\ \cos^2\alpha &= \frac{1+\cos 2\alpha}{2}. \end{split} \eq % ---------------------------------------------------------------------- \section[Trigonometrics of the hour angles]{Trigonometric functions of the hour angles} \label{trig:260} \index{angle} \index{angle!hour} \index{hour angle} \index{trigonometric function!of an hour angle} \index{hour} \index{radian} \index{circle} \index{Greenwich} \index{Observatory, Old Royal} \index{Royal Observatory, Old} \index{Old Royal Observatory} \index{clock} \index{day} In general one uses the Taylor series of Ch.~\ref{taylor} to calculate trigonometric functions of specific angles. However, for angles which happen to be integral multiples of an \emph{hour}---there are twenty-four or 0x18 hours in a circle, just as there are twenty-four or 0x18 hours in a day% \footnote{\label{trig:260:fn1}% Hence an hour is~$15^\circ$, but you weren't going to write your angles in such inelegant conventional notation as ``$15^\circ$,'' were you? Well, if you were, you're in good company. The author is fully aware of the barrier the unfamiliar notation poses for most first-time readers of the book. The barrier is erected neither lightly nor disrespectfully. Consider: \bi \item There are 0x18 hours in a circle. \item There are 360 degrees in a circle. \ei Both sentences say the same thing, don't they? But even though the ``0x'' hex prefix is a bit clumsy, the first sentence nevertheless says the thing rather better. The reader is urged to invest the attention and effort to master the notation. There is a psychological trap regarding the hour. The familiar, standard clock face shows only twelve hours not twenty-four, so the angle between eleven o'clock and twelve \emph{on the clock face} is not an hour of arc! That angle is two hours of arc. This is so because the clock face's geometry is artificial. 
If you have ever been to the Old Royal Observatory at Greenwich, England, you may have seen the big clock face there with all twenty-four hours on it. It'd be a bit hard to read the time from such a crowded clock face were it not so big, but anyway, the angle between hours on the Greenwich clock is indeed an honest hour of arc.~\cite{KRHB} The hex and hour notations are recommended mostly only for theoretical math work. It is not claimed that they offer much benefit in most technical work of the less theoretical kinds. If you wrote an engineering memo describing a survey angle as 0x1.80 hours instead of 22.5 degrees, for example, you'd probably not like the reception the memo got. Nonetheless, the improved notation fits a book of this kind so well that the author hazards it. It is hoped that after trying the notation a while, the reader will approve the choice. }---% for such angles simpler expressions exist. Figure~\ref{trig260:fig1} shows the angles. \begin{figure} \caption{The 0x18 hours in a circle.} \label{trig260:fig1} \bc \begin{pspicture}(-3,-5)(3,5) { \nc\xa{2.5} \nc\xb{2.5} \nc\xd{2.8} \nc\xe{0.5} \nc\xbl{\psline{-c}(-\xb,0)(-\xe,0)\psline(\xe,0)(\xb,0)} \nc\xdl{\psline(-\xd,0)(\xd,0)} \psset{linewidth=1.0pt} \pscircle(0,0){\xa} \rput{ 0}(0,0)\xdl \rput{ 15}(0,0)\xbl \rput{ 30}(0,0)\xbl \rput{ 45}(0,0)\xbl \rput{ 60}(0,0)\xbl \rput{ 75}(0,0)\xbl \rput{ 90}(0,0)\xdl \rput{105}(0,0)\xbl \rput{120}(0,0)\xbl \rput{135}(0,0)\xbl \rput{150}(0,0)\xbl \rput{165}(0,0)\xbl } \end{pspicture} \ec \end{figure} Since such angles arise very frequently in practice, it seems worth our while to study them specially. \index{triangle!equilateral} \index{square} Table~\ref{trig:hourtable} tabulates the trigonometric functions of these \emph{hour angles.} \begin{table} \caption{Trigonometric functions of the hour angles.} \label{trig:hourtable} \bc \nc\st{\rule[-3.00ex]{0.0em}{7.00ex}} \[ \br{ccccc} \multicolumn{2}{c}{\mr{ANGLE}\ \phi}&&\\ \mr{[radians]} & \mr{[hours]} & \sin\phi & \tan\phi & \cos\phi \\ \st\ds 0 & 0 & 0 & 0 & 1 \\ \st\ds\frac{2\pi}{\mr{0x18}} & 1 & \ds\frac{\sqrt 3 - 1}{2\sqrt 2} & \ds\frac{\sqrt 3 - 1}{\sqrt 3 + 1} & \ds\frac{\sqrt 3 + 1}{2\sqrt 2} \\ \st\ds\frac{2\pi}{\mr{0xC}} & 2 & \ds\frac{1}{2} & \ds\frac{1}{\sqrt 3} & \ds\frac{\sqrt 3}{2} \\ \st\ds\frac{2\pi}{\mr{8}} & 3 & \ds\frac{1}{\sqrt 2} & 1 & \ds\frac{1}{\sqrt 2} \\ \st\ds\frac{2\pi}{\mr{6}} & 4 & \ds\frac{\sqrt 3}{2} & \ds\sqrt 3 & \ds\frac{1}{2} \\ \st\ds\frac{(5)(2\pi)}{\mr{0x18}} & 5 & \ds\frac{\sqrt 3 + 1}{2\sqrt 2} & \ds\frac{\sqrt 3 + 1}{\sqrt 3 - 1} & \ds\frac{\sqrt 3 - 1}{2\sqrt 2} \\ \st\ds\frac{2\pi}{4} & 6 & 1 & \infty & 0 \er \] \ec \end{table} To see how the values in the table are calculated, look at the square and the equilateral triangle% \footnote{ An \emph{equilateral} triangle is, as the name and the figure suggest, a triangle whose three sides all have the same length. } of Fig.~\ref{trig:260:f1}. 
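Incidentally, one can verify at a glance that the table's entries are mutually consistent; the one-hour row, chosen here merely as an example, has a sine and cosine whose squares properly sum to unity,
\[
\left(\frac{\sqrt 3 - 1}{2\sqrt 2}\right)^2
+ \left(\frac{\sqrt 3 + 1}{2\sqrt 2}\right)^2
= \frac{(4 - 2\sqrt 3) + (4 + 2\sqrt 3)}{8} = 1,
\]
and whose quotient duly reproduces the tangent $(\sqrt 3 - 1)/(\sqrt 3 + 1)$ the same row lists.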
\begin{figure} \caption[Calculating the hour trigonometrics.]{A square and an equilateral triangle for calculating trigonometric functions of the hour angles.} \label{trig:260:f1} \bc \begin{pspicture}(-5,-2)(5,2) % \psframe[dimen=middle,linewidth=0.5pt](-5,-2)(5,2) \small \setlength\tla{1.2cm} \nc\xe{0.25} \nc\xd{0.5} \nc\xq{0.3} \rput(-2.0,0){ { \psset{linewidth=2.0pt} \pspolygon(-\tla,-\tla)(\tla,-\tla)(\tla,\tla)(-\tla,\tla) \psline{c-c}(-\tla,-\tla)(\tla,\tla) } { \psset{linewidth=0.5pt} \psarc(-\tla,-\tla){\xd}{0}{45} \rput(\tla,-\tla){ \psline(-\xq,0)(-\xq,\xq)(0,\xq) } } \rput(0,-1.5){$1$} \rput(1.45,0){$1$} \rput(-0.30,0.30){$\sqrt 2$} } \rput(2.0,0){ { \psset{linewidth=2.0pt} \pspolygon(-\tla,-\tla)(0,0.73205\tla)(\tla,-\tla) \psline{c-c}(0,0.73205\tla)(0,-\tla) } { \psset{linewidth=0.5pt} \psarc(-\tla,-\tla){\xd}{0}{60} \psarc(0,0.73205\tla){\xd}{240}{270} \rput(0,-\tla){ \psline(-\xq,0)(-\xq,\xq)(0,\xq) } } \rput(0,-\tla){ \psline[linewidth=0.5pt](0,0)(0,-\xe) } \rput(-0.85,-0.00){$1$} \rput( 0.85,-0.00){$1$} \rput(-0.60,-1.50){$1/2$} \rput( 0.60,-1.50){$1/2$} \rput( 0.30,-0.65){$\frac{\sqrt 3}{2}$} } \end{pspicture} \ec \end{figure} Each of the square's four angles naturally measures six hours; and since a triangle's angles always total twelve hours (\S~\ref{alggeo:323.30}), by symmetry each of the angles of the equilateral triangle in the figure measures four. Also by symmetry, the perpendicular splits the triangle's top angle into equal halves of two hours each and its bottom leg into equal segments of length~$1/2$ each; and the diagonal splits the square's corner into equal halves of three hours each. The Pythagorean theorem (\S~\ref{alggeo:223}) then supplies the various other lengths in the figure, after which we observe from Fig.~\ref{trig:226:f1} that \bi \item the sine of a non-right angle in a right triangle is the opposite leg's length divided by the diagonal's, \item the tangent is the opposite leg's length divided by the adjacent leg's, and \item the cosine is the adjacent leg's length divided by the diagonal's. \ei With this observation and the lengths in the figure, one can calculate the sine, tangent and cosine of angles of two, three and four hours. The values for one and five hours are found by applying~(\ref{trig:250:20}) and~(\ref{trig:250:22}) against the values for two and three hours just calculated. The values for zero and six hours are, of course, seen by inspection.% \footnote{ The creative reader may notice that he can extend the table to any angle by repeated application of the various sum, difference and half-angle formulas from the preceding sections to the values already in the table. However, the Taylor series (\S~\ref{taylor:315}) offers a cleaner, quicker way to calculate trigonometrics of non-hour angles. } % ---------------------------------------------------------------------- \section{The laws of sines and cosines} \label{trig:270} \index{law of sines} \index{sine!law of} \index{variable!utility} \index{utility variable} Refer to the triangle of Fig.~\ref{trig:270:f1}. 
\begin{figure} \caption{The laws of sines and cosines.} \label{trig:270:f1} \index{triangle} \bc \begin{pspicture}(-5,-1.0)(5,2.0) \small \nc\xx{0.5} \nc\xxa{0.70} \nc\xr{0.5} \nc\xxq{0.3} \newcommand{\dimn}[3]{ { \setlength{\tla}{0.15cm} \psset{linewidth=0.5pt} \psline(-#1,\tla)(-#1,-\tla) \psline( #1,\tla)( #1,-\tla) \psline{->}(-#2,0)(-#1,0) \psline{->}( #2,0)( #1,0) \rput{*0}(0,0){#3} } } \pspolygon[linewidth=2.0pt](-2.0,0)(0.6,1.6)(2.0,0) \psline[linewidth=0.5pt,linestyle=dashed](0.6,1.6)(0.6,0) \rput(0.6,0){\psline[linewidth=0.5pt](-\xxq,0)(-\xxq,\xxq)(0,\xxq)} \psarc[linewidth=0.5pt](-2.0,0){\xr}{0}{31.608} \psarc[linewidth=0.5pt]( 2.0,0){\xr}{131.186}{180} \rput(0,-0.25){ \dimn{2.0}{0.3}{$a$} \rput{*0}(0,0){} % Why is this necessary? } \rput(0.40,0.98){$\alpha$} \psarc[linewidth=0.5pt]{<-}(0.6,1.6){0.65}{-148.392}{-120} \psarc[linewidth=0.5pt]{->}(0.6,1.6){0.65}{-96}{-48.814} \rput(0.78,0.55){$h$} \rput(-0.82,1.05){$b$} \rput( 1.45,1.05){$c$} \rput(-1.18, 0.22){$\gamma$} \rput( 1.35, 0.22){$\beta$} \rput(-3.5,0){ \psset{linewidth=0.5pt} \psline(0,\xx)(0,0)(\xx,0) \rput(0,\xxa){$y$} \rput(\xxa,0){$x$} } \end{pspicture} \ec \end{figure} By the definition of the sine function, one can write that \[ c\sin\beta = h = b\sin\gamma, \] or in other words that \[ \frac{\sin\beta}{b} = \frac{\sin\gamma}{c}. \] But there is nothing special about~$\beta$ and~$\gamma$; what is true for them must be true for~$\alpha$, too.% \footnote{ ``But,'' it is objected, ``there \emph{is} something special about~$\alpha$. The perpendicular~$h$ drops from it.'' True. However, the~$h$ is just a utility variable to help us to manipulate the equation into the desired form; we're not interested in~$h$ itself. Nothing prevents us from dropping additional perpendiculars~$h_\beta$ and~$h_\gamma$ from the other two corners and using those as utility variables, too, if we like. We can use any utility variables we want. } Hence, \bq{trig:270:sin} \frac{\sin\alpha}{a} = \frac{\sin\beta}{b} = \frac{\sin\gamma}{c}. \eq This equation is known as \emph{the law of sines.} \index{law of cosines} \index{cosine!law of} On the other hand, if one expresses~$a$ and~$b$ as vectors emanating from the point~$\gamma$,% \footnote{ Here is another example of the book's judicious relaxation of formal rigor. Of course there is no ``point~$\gamma$''; $\gamma$ is an angle not a point. However, the writer suspects in light of Fig.~\ref{trig:270:f1} that few readers will be confused as to which point is meant. The skillful applied mathematician does not multiply labels without need. } \[ \begin{split} \ve a &= \vu x a, \\ \ve b &= \vu x b \cos\gamma + \vu y b \sin\gamma, \end{split} \] then \bqb c^2 &=& \left|\ve b - \ve a\right|^2 \\ &=& ( b\cos\gamma - a )^2 + ( b\sin\gamma )^2 \\ &=& a^2 + (b^2)(\cos^2\gamma + \sin^2\gamma) - 2ab\cos\gamma. \eqb Since $\cos^2(\cdot) + \sin^2(\cdot) = 1$, this is \bq{trig:270:cos} c^2 = a^2 + b^2 - 2ab\cos\gamma, \eq known as \emph{the law of cosines.} % ---------------------------------------------------------------------- \section{Summary of properties} \label{trig:275} Table~\ref{trig:hourtable} on page~\pageref{trig:hourtable} has listed the values of trigonometric functions of the hour angles. Table~\ref{trig:228:table} on page~\pageref{trig:228:table} has summarized simple properties of the trigonometric functions. Table~\ref{trig:275:table} summarizes further properties, gathering them from \S\S~\ref{trig:240}, \ref{trig:250} and~\ref{trig:270}. 
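By way of a quick check on the laws of sines and cosines just derived, consider the equilateral triangle of Fig.~\ref{trig:260:f1}, whose sides each have length $a=b=c=1$ and whose angles each measure four hours. For it, the law of cosines correctly reports that
\[
c^2 = 1 + 1 - (2)(1)(1)\left(\frac{1}{2}\right) = 1,
\]
whereas the law of sines is satisfied trivially, each of its three ratios being $\sqrt 3/2$.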
\begin{table} \caption{Further properties of the trigonometric functions.} \index{trigonometry!properties of} \label{trig:275:table} \bqb \ve u &=& \vu x' (x\cos\phi+y\sin\phi) + \vu y' (-x\sin\phi+y\cos\phi) \\ \sin(\alpha\pm\beta) &=& \sin\alpha\cos\beta \pm \cos\alpha\sin\beta \\ \cos(\alpha\pm\beta) &=& \cos\alpha\cos\beta \mp \sin\alpha\sin\beta \\ \sin\alpha\sin\beta &=& \frac{ \cos(\alpha-\beta) - \cos(\alpha+\beta) }{2} \\ \sin\alpha\cos\beta &=& \frac{ \sin(\alpha-\beta) + \sin(\alpha+\beta) }{2} \\ \cos\alpha\cos\beta &=& \frac{ \cos(\alpha-\beta) + \cos(\alpha+\beta) }{2} \\ \sin\gamma + \sin\delta &=& 2\sin\left(\frac{\gamma+\delta}{2}\right) \cos\left(\frac{\gamma-\delta}{2}\right) \\ \sin\gamma - \sin\delta &=& 2\cos\left(\frac{\gamma+\delta}{2}\right) \sin\left(\frac{\gamma-\delta}{2}\right) \\ \cos\delta + \cos\gamma &=& 2\cos\left(\frac{\gamma+\delta}{2}\right) \cos\left(\frac{\gamma-\delta}{2}\right) \\ \cos\delta - \cos\gamma &=& 2\sin\left(\frac{\gamma+\delta}{2}\right) \sin\left(\frac{\gamma-\delta}{2}\right) \\ \sin2\alpha &=& 2\sin\alpha\cos\alpha \\ \cos2\alpha &=& 2\cos^2\alpha - 1 = \cos^2\alpha - \sin^2\alpha = 1 - 2\sin^2\alpha \\ \sin^2\alpha &=& \frac{1-\cos 2\alpha}{2} \\ \cos^2\alpha &=& \frac{1+\cos 2\alpha}{2} \\ \frac{\sin\gamma}{c} &=& \frac{\sin\alpha}{a} \makebox[\arraycolsep]{}=\makebox[\arraycolsep]{} \frac{\sin\beta}{b} \\ c^2 &=& a^2 + b^2 - 2ab\cos\gamma \eqb \end{table} % ---------------------------------------------------------------------- \section{Cylindrical and spherical coordinates} \label{trig:277} \index{coordinates} \index{coordinates!rectangular} \index{rectangular coordinates} \index{axis} \index{point} \index{orthonormal vectors} \index{vector!orthonormal} Section~\ref{trig:230} has introduced the concept of the vector \[ \ve v = \vu x x + \vu y y + \vu z z. \] The coefficients $(x,y,z)$ on the equation's right side are \emph{coordinates}---specif\-i\-cally, \emph{rectangular coordinates}---which given a specific orthonormal% \footnote{ \emph{Orthonormal} in this context means ``of unit length and at right angles to the other vectors in the set.''\ \cite[``Orthonormality,'' 14:19, 7 May 2006]{wikip} } set of unit basis vectors $[\vu x\ \vu y\ \vu z]$ uniquely identify a point (see Fig.~\ref{trig:231f3} on page~\pageref{trig:231f3}; also, much later in the book, refer to \S~\ref{vector:230}). Such rectangular coordinates are simple and general, and are convenient for many purposes. However, there are at least two broad classes of conceptually simple problems for which rectangular coordinates tend to be inconvenient: problems in which an axis or a point dominates. Consider for example an electric wire's magnetic field, whose intensity varies with distance from the wire (an axis); or the illumination a lamp sheds on a printed page of this book, which depends on the book's distance from the lamp (a point). \index{coordinates!cylindrical} \index{cylindrical coordinates} \index{coordinates!spherical} \index{spherical coordinates} \index{coordinate rotation} \index{axis} \index{point} To attack a problem dominated by an axis, the \emph{cylindrical coordinates} $(\rho;\phi,z)$ can be used instead of the rectangular coordinates $(x,y,z)$. To attack a problem dominated by a point, the \emph{spherical coordinates} $(r;\theta;\phi)$ can be used.% \footnote{ Notice that the~$\phi$ is conventionally written second in cylindrical $(\rho;\phi,z)$ but third in spherical $(r;\theta;\phi)$ coordinates. 
This odd-seeming convention is to maintain proper right-handed coordinate rotation. (The explanation will seem clearer once Chs.~\ref{vector} and~\ref{vcalc} are read.) } Refer to Fig.~\ref{trig:fig-sphere}. Such coordinates are related to one another and to the rectangular coordinates by the formulas of Table~\ref{trig:277:20}. \begin{figure} \spherecaption \label{trig:fig-sphere} \index{sphere} \sphere \end{figure} \begin{table} \caption[Rectangular, cylindrical and spherical coordinate relations.]{Relations among the rectangular, cylindrical and spherical coordinates.} \label{trig:277:20} \index{coordinates!relations among} \settowidth\tla{$\ds \rho\cos\phi$} \bqb \rho^2 &=& x^2 + y^2 \\ r^2 &=& \rho^2 + z^2 = x^2 + y^2 + z^2 \\ \tan\theta &=& \frac \rho z \\ \tan\phi &=& \frac y x \\ z &=& r\cos\theta \\ \rho &=& r\sin\theta \\ x &=& \makebox[\tla][l]{$\ds \rho\cos\phi$} = r\sin\theta\cos\phi \\ y &=& \makebox[\tla][l]{$\ds \rho\sin\phi$} = r\sin\theta\sin\phi \eqb \end{table} Cylindrical and spherical coordinates can greatly simplify the analyses of the kinds of problems they respectively fit, but they come at a price. There are no constant unit basis vectors to match them. That is, \[ \ve v = \vu x x + \vu y y + \vu z z \neq \wu \rho \rho + \wu \phi \phi + \vu z z \neq \vu r r + \wu \theta \theta + \wu \phi \phi. \] It doesn't work that way. Nevertheless, \emph{variable} unit basis vectors are defined: \bq{trig:277:30} \index{vector!unit basis, variable} \index{vector!unit basis, cylindrical} \index{vector!unit basis, spherical} \index{unit basis vector!variable} \index{unit basis vector!cylindrical} \index{unit basis vector!spherical} \settowidth\tla{$\ds + \vu x \cos\phi$} \begin{split} \wu \rho &\equiv \makebox[\tla][l]{$\ds + \vu x \cos\phi$} + \vu y \sin\phi, \\ \wu \phi &\equiv \makebox[\tla][l]{$\ds - \vu x \sin\phi$} + \vu y \cos\phi, \\ \vu r &\equiv \makebox[\tla][l]{$\ds + \vu z \cos\theta$} + \wu \rho \sin\theta, \\ \wu \theta &\equiv \makebox[\tla][l]{$\ds - \vu z \sin\theta$} + \wu \rho \cos\theta; \end{split} \eq or, substituting identities from the table, \bq{trig:277:31} \settowidth\tla{$\ds + \vu x \cos\phi$} \begin{split} \wu \rho &= \frac{\vu x x + \vu y y}{\rho}, \\ \wu \phi &= \frac{-\vu x y + \vu y x}{\rho}, \\ \vu r &= \frac{\vu z z + \wu \rho \rho}{r} = \frac{\vu x x + \vu y y + \vu z z}{r}, \\ \wu \theta &= \frac{-\vu z \rho + \wu \rho z}{r}. \end{split} \eq Such variable unit basis vectors point locally in the directions in which their respective coordinates advance. Combining pairs of (\ref{trig:277:30})'s equations appropriately, we have also that \bq{trig:277:30a} \settowidth\tla{$\ds + \wu \phi \cos\phi$} \begin{split} \vu x &= \makebox[\tla][l]{$\ds + \wu \rho \cos\phi$} - \wu \phi \sin\phi, \\ \vu y &= \makebox[\tla][l]{$\ds + \wu \rho \sin\phi$} + \wu \phi \cos\phi, \\ \vu z &= \makebox[\tla][l]{$\ds + \vu r \cos\theta$} - \wu \theta \sin\theta, \\ \wu \rho &= \makebox[\tla][l]{$\ds + \vu r \sin\theta$} + \wu \theta \cos\theta. \end{split} \eq Convention usually orients~$\vu z$ in the direction of a problem's axis. Occasionally however a problem arises in which it is more convenient to orient~$\vu x$ or~$\vu y$ in the direction of the problem's axis (usually because~$\vu z$ has already been established in the direction of some other pertinent axis). 
Changing the meanings of known symbols like~$\rho$, $\theta$ and~$\phi$ is usually not a good idea, but you can use symbols like \bq{trig:277:40} \renewcommand\arraystretch{2.0} \br{rclcrcl} \ds (\rho^x)^2 &=& \ds y^2 + z^2, &\sh{1.0}& \ds (\rho^y)^2 &=& \ds z^2 + x^2, \\ \ds \tan \theta^x &=& \ds \frac{\rho^x}{x}, &\sh{1.0}& \ds \tan \theta^y &=& \ds \frac{\rho^y}{y}, \\ \ds \tan \phi^x &=& \ds \frac{z}{y}, &\sh{1.0}& \ds \tan \phi^y &=& \ds \frac{x}{z}, \er \eq instead if needed.% \footnote{ Symbols like~$\rho^x$ are logical but, as far as this writer is aware, not standard. The writer is not aware of any conventionally established symbols for quantities like these, but \S~\ref{vector:270} at least will use the $\rho^x$-style symbology. } % ---------------------------------------------------------------------- \section{The complex triangle inequalities} \label{trig:278} \index{triangle inequalities!complex} \index{triangle inequalities!vector} If the real, two-dimensional vectors~$\ve a$, $\ve b$ and~$\ve c$ represent the three sides of a triangle such that $\ve a + \ve b + \ve c = 0$, then per~(\ref{alggeo:323:20}) \[ \left|\ve a\right|-\left|\ve b\right| \le \left|\ve a + \ve b\right| \le \left|\ve a\right| + \left|\ve b\right|. \] These are just the triangle inequalities of \S~\ref{alggeo:323.20} in vector notation.% \footnote{ Reading closely, one might note that \S~\ref{alggeo:323.20} uses the~``$<$'' sign rather than the~``$\le$,'' but that's all right. See \S~\ref{intro:284.1}. } But if the triangle inequalities hold for real vectors in a plane, then why not equally for complex scalars? Consider the geometric interpretation of the Argand plane of Fig.~\ref{alggeo:225:fig} on page~\pageref{alggeo:225:fig}. Evidently, \bq{trig:278:triangle2} \left|z_1\right|-\left|z_2\right| \le \left|z_1 + z_2\right| \le \left|z_1\right| + \left|z_2\right| \eq for any two complex numbers~$z_1$ and~$z_2$. Extending the sum inequality, we have that \bq{trig:278:triangle} \left|\sum_k z_k \right| \le \sum_k \left|z_k\right|. \eq Naturally,~(\ref{trig:278:triangle2}) and~(\ref{trig:278:triangle}) hold equally well for real numbers as for complex; one may find the latter formula useful for sums of real numbers, for example, when some of the numbers summed are positive and others negative.% \footnote{ Section~\ref{mtxinv:447} proves the triangle inequalities more generally. } \index{convergence} \index{series!convergence of} \index{summation!convergence of} \index{magnitude} An important consequence of~(\ref{trig:278:triangle}) is that if $\sum \left|z_k\right|$ converges, then $\sum z_k$ also converges. Such a consequence is important because mathematical derivations sometimes need the convergence of $\sum z_k$ established, which can be hard to do directly. Convergence of $\sum \left|z_k\right|$, which per~(\ref{trig:278:triangle}) implies convergence of $\sum z_k$, is often easier to establish. See also~(\ref{inttx:250:triangle}). Equation~(\ref{trig:278:triangle}) will find use among other places in \S~\ref{taylor:316.30}. 
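Before moving on, it may be worth trying~(\ref{trig:278:triangle2}) on a concrete pair of complex numbers, chosen here purely for illustration. If $z_1 = 4 + i3$ and $z_2 = 1 - i$, then $\left|z_1\right| = 5$, $\left|z_2\right| = \sqrt 2$ and $z_1 + z_2 = 5 + i2$, whereupon
\[
5 - \sqrt 2
\;\le\; \left|\,5 + i2\,\right| = \sqrt{\mbox{0x1D}}
\;\le\; 5 + \sqrt 2,
\]
as the rough figure $\sqrt{\mbox{0x1D}} \approx 5.4$ confirms.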
% ---------------------------------------------------------------------- \section{De~Moivre's theorem} \label{trig:280} \index{de Moivre, Abraham (1667--1754)} \index{Moivre, Abraham de (1667--1754)} \index{de Moivre's theorem} \index{complex number} \index{number!complex} \index{complex number!multiplication and division} \index{multiplication} \index{division} Compare the Argand-plotted complex number of Fig.~\ref{alggeo:225:fig} (page~\pageref{alggeo:225:fig}) against the vector of Fig.~\ref{trig:231} (page~\pageref{trig:231}). Although complex numbers are scalars not vectors, the figures do suggest an analogy between complex phase and vector direction. With reference to Fig.~\ref{alggeo:225:fig} we can write \bq{trig:280:10} z = (\rho) ( \cos\phi + i\sin\phi ) = \rho\cis\phi, \eq where \bq{trig:280:cis} \cis\phi \equiv \cos\phi + i\sin\phi. \eq If $z=x+iy$, then evidently \bq{trig:280:11} \begin{split} x&=\rho\cos\phi, \\ y&=\rho\sin\phi. \end{split} \eq Per~(\ref{alggeo:225:10}), \[ z_1z_2 = (x_1x_2 - y_1y_2) + i(y_1x_2 + x_1y_2). \] Applying~(\ref{trig:280:11}) to the equation yields \[ \frac{z_1z_2}{\rho_1\rho_2} = (\cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2) + i(\sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2). \] But according to~(\ref{trig:250:22}), this is just \[ \frac{z_1z_2}{\rho_1\rho_2} = \cos(\phi_1 + \phi_2) + i\sin(\phi_1 + \phi_2), \] or in other words \bq{trig:280:20} z_1z_2 = \rho_1\rho_2 \cis(\phi_1 + \phi_2). \eq Equation~(\ref{trig:280:20}) is an important result. It says that if you want to multiply complex numbers, it suffices \bi \item to multiply their magnitudes and \item to add their phases. \ei It follows by parallel reasoning (or by extension) that \bq{trig:280:22} \frac{z_1}{z_2} = \frac{\rho_1}{\rho_2} \cis(\phi_1 - \phi_2) \eq and by extension that \bq{trig:280:24} z^a = \rho^a \cis a\phi. \eq Equations~(\ref{trig:280:20}), (\ref{trig:280:22}) and~(\ref{trig:280:24}) are known as \emph{de~Moivre's theorem.}% \footnote{ Also called \emph{de~Moivre's formula.} Some authors apply the name of de~Moivre directly only to~(\ref{trig:280:24}), or to some variation thereof; but since the three equations express essentially the same idea, if you refer to any of them as~\emph{de~Moivre's theorem} then you are unlikely to be misunderstood. }$\mbox{}^,$\footnote{\cite{Spiegel}\cite{wikip}} We have not shown yet, but will in \S~\ref{cexp:230}, that \[ \cis \phi = \exp i\phi = e^{i\phi}, \] where $\exp(\cdot)$ is the natural exponential function and~$e$ is the natural logarithmic base, both defined in Ch.~\ref{cexp}\@. De~Moivre's theorem is most useful in this light. Section~\ref{cexp:240} will revisit the derivation of de~Moivre's theorem. derivations-0.53.20120414.orig/tex/pst-xkey.tex0000644000000000000000000000444511742575144017450 0ustar rootroot%% %% This is file `pst-xkey.tex', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `pxktex') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". 
%% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% \csname PSTXKeyLoaded\endcsname \let\PSTXKeyLoaded\endinput \edef\PSTXKeyCatcodes{% \catcode`\noexpand\@\the\catcode`\@\relax \let\noexpand\PSTXKeyCatcodes\relax } \catcode`\@=11\relax \ifx\ProvidesFile\@undefined \message{2005/11/25 v1.6 PSTricks specialization of xkeyval (HA)} \ifx\XKeyValLoaded\endinput\else\input xkeyval \fi \else \ProvidesFile{pst-xkey.tex} [2005/11/25 v1.6 PSTricks specialization of xkeyval (HA)] \@addtofilelist{pst-xkey.tex} \RequirePackage{xkeyval} \fi \def\pst@famlist{} \def\pst@addfams#1{% \XKV@for@n{#1}\XKV@tempa{% \@expandtwoargs\in@{,\XKV@tempa,}{,\pst@famlist,}% \ifin@\else\edef\pst@famlist{\pst@famlist,\XKV@tempa}\fi }% } \def\psset{% \expandafter\@testopt\expandafter\pss@t\expandafter{\pst@famlist}% } \def\pss@t[#1]#2{\setkeys+[psset]{#1}{#2}\ignorespaces} \def\@psset#1,\@nil{% \edef\XKV@tempa{\noexpand\setkeys+[psset]{\pst@famlist}}% \XKV@tempa{#1}% } \PSTXKeyCatcodes \endinput %% %% End of file `pst-xkey.tex'. derivations-0.53.20120414.orig/tex/Makefile0000644000000000000000000002153111742570762016576 0ustar rootroot # For better or for worse, the book's build procedure has grown somewhat # complicated over time. This is not because the author wanted to break # from standard LaTeX book format; he didn't and for the most part # hasn't. It is for small reasons, like the need to combine Timothy # Van Zandt's PSTricks package with proper PDF page numbering, or the # need for an extra digit of section number (10.10 instead of just 10.9) # in the book's table of contents. # # Actually, for what it does, the build procedure is a good, clean # one---or so the author believes. If you find yourself maintaining the # book and do not understand at first what the extra parts of the build # procedure do, then you can remove the parts one by one until you reach # a comfortable position, and then add the parts back one by one as you # climb the learning curve. Except in ../btool/fill-toc-ends # (a minor step which you can omit if you wish with little harm), the # build does little or nothing in a tricky way. But, still, for a LaTeX # document, there is admittedly a fair bit to building this one. The # author develops on a standard Debian platform; you might try that if # appropriate and you think that it will help. Good luck. # # On the other hand, if you do not yet know make(1), sed(1), bash(1) # and C++, well, those are standard tools; sorry. It's hard, but that # is the way it has to be. The author has tried to avoid doing things # in *nonstandard* ways, not to avoid doing them in standard ways! The # author has even refrained from inflicting Perl on you, confining its # use strictly to the optional development helpers (and has refrained # from inflicting the Autotools even on himself); and of course he has # written all narrative solely in English. 
For certain tasks, one needs # certain skills; so you will probably need to learn the standard tools # make(1), sed(1), bash(1) and maybe C++, along with LaTeX and the # English language of course, decently thoroughly to grok this source. # # Some comments on the last point seem in order. If you have peeked at # various free-software sources from time to time but have not really # understood most of it very well, well, it must be admitted that some # of it is not very understandable. However, a lot of it is # understandable; you just have to learn how to understand it. There is # no substitute for taking the trouble to learn. The author stood in # your place as recently as 1996; he had to learn, too, mostly by # reviewing sources like the one you are reading now, not understanding, # going away and reading dense manuals, returning, understanding a # little more, etc. Esoteric tools and software written in poor style, # you can mostly ignore, but you cannot afford permanently to ignore # sophisticated but standard techniques just because they look cryptic # to you right now, if you want to learn to program free software. You # want to learn increasingly sophisticated standard techniques gradually # over time. # # The author has taken moderate pains to ensure that this particular # source makes a decent model to study, if improving your ability to # maintain free software and to contribute to its development happens to # be your goal. By design, to read this source requires generally # relevant rather than generally irrelevant programming skills. The # short, cryptic-looking but actually logical ../btool/Makefile-subdir # is a particularly good example of the principle: one needs to know a # lot to read it, but almost everything one needs to know to read it is # well worth knowing (or you can just show it to your dad as evidence of # what you've been doing recently; he'll think it evidence that the # monkeys have been at your computer keyboard having another go at # typing Hamlet). # # Anyway, back in 1996 the author wished that one of the free-software # source files he was then trying to understand had taken the trouble to # explain what the preceding paragraphs have tried to explain. That # might have saved the author some grief. If the paragraphs have # illuminated anything worth illuminating in your estimation, why, there # they are, and make of them what you will. # # SHELL := /bin/bash out := derivations author := Thaddeus H. Black class := book def := thb.sty main0 := main bib := bib.bib tmpl0 := template clsdir := /usr/share/texlive/texmf-dist/tex/latex/base # The author cannot say definitively at which PDF level it is best to # build the book. Plainly, the higher the level, the greater the number # of readers who will have trouble opening the PDF. Also, incrementing # the level could break the book's build tools. The author's current # Ghostscript can build levels 1.2, 1.3 and 1.4; all these seem to # suffice. As it happens, while programming the build tools, the author # read parts of Adobe's PDF Reference 1.7, learning that level 1.5 # brings some potentially interesting features like image transparency, # but that level 1.6 brings features of another character. Level 1.6 # brings advanced features like encryption, interactive form pages and # three-dimensional rendering, which are complex (thus suspected to # break build tools) and probably mostly, maybe even wholly, useless to # the book. 
If one has no reason to do otherwise, one might refrain # from increasing $(pdflvl) beyond 1.5 for this reason, even if and when # Ghostscript gains the ability to produce PDF 1.6 or better. pdflvl := 1.4 # The following are for sed(1) to adapt $(cls). No # parameter $(l1a) is formally defined, but this only # because the corresponding length, 1.5em, is not # altered; logically, l1a := 1.5em is there. # Observe that $(l2a) = $(l1a) + $(l1b), and likewise # that $(l3a) = $(l2a) + $(l2b); otherwise the entries # in the book's table of contents would not fall # visually into proper vertical line. Observe however # that per the original $(cls0) the pattern does not # carry through to $(l4a). p1 := \\newcommand\*\\l@ p2 := \\@dottedtocline l1b := 2.5em l2a := 4.0em l2b := 3.3em l3a := 7.3em # ---------------------------------------------------------------------- main := $(main0).tex cls0 := $(class).cls cls := $(out)-$(cls0) ch := $(filter-out $(main) $(def) $(wildcard $(tmpl0)-*.tex), \ $(wildcard *.tex)) define gs touch cidfmap ; \ gs \ -dSAFER -dNOPAUSE -dBATCH \ -sDEVICE=pdfwrite -dCompatibilityLevel=$(pdflvl) \ -sPAPERSIZE=letter \ -sOutputFile=$@ $< endef define dvips dvips -t letter -o $@ $< endef .PHONY: all FORCE all : $(out).ps $(out).pdf $(out).dvi : $(main) $(def) $(cls) $(ch) $(bib) rubber $< mv -v $(main0).dvi $@ %.dvi : %.tex $(def) rubber $< %.ps : %.dvi ; $(dvips) # See the dvipdf(1), ps2pdfwr(1) and gs(1) manpages for explanation # of the following rule. %.pdf : %.ps ; $(gs) # Do you like unreadable sed(1) expressions? Here are some. Actually, # if you know sed(1) and also understand the escaping role of the dollar # sign ($) in make(1), then the expressions are not *quite* as # unreadable as they look. [If you do not, then you might learn when # you have some time: sed(1) and make(1) are standard tools worth # learning.] Comments tend to grow outdated, so you might be wary of # these words, but at the time of this writing the sole purpose of the # sed(1) operation here is to create and modify a local copy of LaTeX's # book class, to allow just a tad extra space between the section and # subsection numbers and their respective titles in the book's table of # contents (the creator of LaTeX, Leslie Lamport, evidently estimated # section number 10.10 to be unlikely, but the book Derivations happens # to have such sections). $(cls) : sed >$(cls) $(clsdir)/$(cls0) \ -e 's/^\(\\ProvidesClass{\)$(class)\(}\)[[:space:]]*$$/\1$(out)-$(class)\2/' \ -e 's/^\($(p1)section{$(p2){1}{\)\([^{}]*\)\(}{\)\([^{}]*\)\(}}\)[[:space:]]*$$/\1\2\3$(l1b)\5/' \ -e 's/^\($(p1)subsection{$(p2){2}{\)\([^{}]*\)\(}{\)\([^{}]*\)\(}}\)[[:space:]]*$$/\1$(l2a)\3$(l2b)\5/' \ -e 's/^\($(p1)subsubsection{$(p2){3}{\)\([^{}]*\)\(}{\)\([^{}]*\)\(}}\)[[:space:]]*$$/\1$(l3a)\3\4\5/' ../btool/complete-pdf ../btool/romanize: FORCE; $(MAKE) -C $(@D) $(@F) derivations.ps : derivations.dvi $(dvips) sed -i -e 's/^\(%%Title:\).*$$/\1 $(author)/;T;:1;n;b1' $@ derivations.pdf: derivations.ps ../btool/complete-pdf $(gs) ../btool/fill-toc-ends main.toc main.pgn >pdf-addendum.toc ../btool/complete-pdf $@ $< pdf-addendum.toc >pdf-addendum '$(author)' cat pdf-addendum >>$@ # To "make cleanless" removes the various intermediate TeX working # files, but leaves intact the final output document files. The "check" # target circumvents rubber(1), which quashes LaTeX warnings; with "make # check", you get an extra latex(1) run which shows any warnings. 
.PHONY: check cleanless clean check : $(out).dvi latex $(main) cleanless : rm -fv *.{cls,dvi,aux,bbl,blg,idx,ilg,ind,log,lof,lot,toc,pgn,bak} \ pdf-addendum cidfmap clean : cleanless rm -fv *.{ps,pdf} derivations-0.53.20120414.orig/tex/main.tex0000644000000000000000000000610211742566274016604 0ustar rootroot% The Makefile creates derivations-book.cls at build time % by modifying LaTeX's book.cls. This is why the class file % as such cannot be found among the source. \documentclass[11pt,twoside,openright,letterpaper]{derivations-book} \usepackage{thb} \newboolean{isdraft} \setboolean{isdraft}{true} \newcommand{\veryear}{2012} \newcommand{\verdate}{16 April 2012} % The \veryear on the next line can be changed to a concrete year % like 1970 for the second and subsequent printings, to keep from % updating the copyright date and the date of publication. \newcommand{\firstprintingyear}{\veryear} \title{Derivations of Applied Mathematics} \author{Thaddeus~H. Black} \date{\ifthenelse{\boolean{isdraft}}{Revised \verdate}{\scshape the debian project\ $\cdot$\ \firstprintingyear}} \makeindex %\includeonly{vector} \begin{document} \newwrite\locallog \immediate\openout\locallog=\jobname.pgn \newlength{\tla} \newlength{\tlb} \newlength{\tlc} \newlength{\tld} \newlength{\tle} \newlength{\tlf} \newlength{\tlg} \newlength{\tlh} \newlength{\tli} \newlength{\tlj} \include{sphere} \frontmatter \maketitle \ \vspace{\stretch{1}} \noindent\\ Thaddeus~H. Black, 1967--.\\ Derivations of Applied Mathematics.\\ % Second edition.\\ % (for example) U.S. Library of Congress class QA401. \noindent\\ Copyright \copyright\ \ifthenelse{\boolean{isdraft}}{1983--}{}\firstprintingyear\ by Thaddeus~H. Black $\langle\texttt{thb@derivations.org}\rangle$. \noindent\\ Published by the Debian Project~\cite{Debian}. \noindent\\ This book is free software. You can redistribute and/or modify it under the terms of the GNU General Public License~\cite{GPL}, version~2. 
\noindent\\ \ifthenelse{\boolean{isdraft}}{% This is a prepublished draft, dated \verdate.% }{% Version 1.01 (that is, first edition, first printing), \verdate.% } \cleardoublepage %\renewcommand\contentsname{Contents} \immediate\write\locallog{Contents \thepage} \tableofcontents \cleardoublepage \renewcommand\listtablename{List of tables} \immediate\write\locallog{List of tables \thepage} \listoftables \cleardoublepage \renewcommand\listfigurename{List of figures} \immediate\write\locallog{List of figures \thepage} \listoffigures \include{pref} \mainmatter \include{intro} \part{The calculus of a single variable} \include{alggeo} \include{trig} \include{drvtv} \include{cexp} \include{noth} \include{integ} \include{taylor} \include{inttx} \include{cubic} \part{Matrices and vectors} \include{matrix} \include{gjrank} \include{mtxinv} \include{eigen} \include{vector} \include{vcalc} \part{Transforms and special functions} \include{fours} \include{fouri} \include{specf} \include{prob} \include{stub} \appendix \cleardoublepage\addcontentsline{toc}{part}{Appendices} \part*{Appendices} \include{hex} \include{greek} \include{purec} \include{hist} %\showhyphens{} \backmatter \bibliographystyle{plain} %\renewcommand\bibname{References} \immediate\write\locallog{Bibliography \thepage} \bibliography{bib} \cleardoublepage \immediate\write\locallog{Index \thepage} {\small\printindex} \cleardoublepage \label{end} \immediate\closeout\locallog \end{document} derivations-0.53.20120414.orig/tex/bib.bib0000644000000000000000000004213111742566274016352 0ustar rootroot@book{self, author={Thaddeus H. Black}, title={{Derivations of Applied Mathematics}}, publisher={The Debian Project}, year={10 March 2010}, address={\textcomp{http://www.debian.org/}} } @book{A-S, editor={Milton Abramowitz and Irene A. Stegun}, title={{Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables}}, publisher={National Bureau of Standards and U.S. Government Printing Office}, year=1964, number=55, series={Applied Mathematics Series}, address={Washington, D.C.}, month={June}, } @book{Arfken/Weber, author={George B. Arfken and Hans J. Weber}, title={{Mathematical Methods for Physicists}}, publisher={Academic Press}, year=2005, address={Burlington, Mass.}, edition={6th}, } @article{Arnold:1997, author={Douglas N. Arnold}, title={{Complex analysis}}, journal={{\upshape Dept. of Mathematics, Penn State Univ.}}, year={1997}, note={Lecture notes.} } @book{Andrews, author={Larry C. Andrews}, title={{Special Functions of Mathematics for Engineers}}, publisher={Macmillan}, year=1985, address={New York}, } @book{Balanis, author={Constantine A. Balanis}, title={{Advanced Engineering Electromagnetics}}, publisher={John Wiley~\& Sons}, year=1989, address={New York}, } @book{Banos, author={Alfredo Ba\~nos, Jr.}, title={{Dipole Radiation in the Presence of a Conducting Half-Space}}, publisher={Pergamon Press}, year=1966, address=Oxford, } @book{Beattie, author={Christopher Beattie and John Rossi and Robert C. Rogers}, title={{Notes on Matrix Theory}}, publisher={Unpublished}, year={6~Dec. 2001}, address={Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, Va.}, } @book{vanBladel, author={Bladel, J. van}, title={{Singular Electromagnetic Fields and Sources}}, publisher={Clarendon Press}, year=1991, number=28, series={Engineering Science Series}, address={Oxford}, } @book{BSL, author={R. Byron Bird and Warren E. Stewart and Edwin N. 
Lightfoot}, title={{Transport Phenomena}}, publisher={John Wiley~\& Sons}, year=1960, address={New York}, } @book{Cheng, author={Cheng, David K.}, title={{Field and Wave Electromagnetics}}, publisher={Addison-Wesley}, year=1989, series={Series in Electrical Engineering}, address={Reading, Mass.}, edition={2nd}, } @book{Couch, author={Leon W. {Couch II}}, title={{Modern Communication Systems: Principles and Applications}}, publisher={Prentice Hall}, year=1995, address={Upper Saddle River, N.J.}, } @book{Courant/Hilbert, author={Richard Courant and David Hilbert}, title={{Methods of Mathematical Physics}}, publisher={Interscience (Wiley)}, year=1953, address={New York}, edition={{first English}}, } @book{Doetsch, author={G. Doetsch}, title={{Guide to the Applications of the Laplace and z-Transforms}}, publisher={Van Nostrand Reinhold}, year=1971, address={London}, note={Referenced indirectly by way of~\cite{Phillips/Parr}}, } @book{Feynman, author={Richard P. Feynman and Robert B. Leighton and Matthew Sands}, title={{The Feynman Lectures on Physics}}, publisher={Addison-Wesley}, year={1963--65}, address={Reading, Mass.}, note={Three volumes.}, } @book{Fisher, author={Stephen~D. Fisher}, title={Complex Variables}, publisher={Dover}, year=1990, series={Books on Mathematics}, address={Mineola, N.Y.}, edition={2nd}, } @book{Franklin, author={Joel N. Franklin}, title={{Matrix Theory}}, publisher={Dover}, year=1968, series={Books on Mathematics}, address={Mineola, N.Y.}, } @book{Friedberg-IS, author={Stephen H. Friedberg and Arnold J. Insel and Lawrence E. Spence}, title={{Linear Algebra}}, publisher={Pearson Education/Prentice-Hall}, year=2003, address={Upper Saddle River, N.J.}, edition={4th}, } @book{Gibbon, author={Edward Gibbon}, title={{The History of the Decline and Fall of the Roman Empire}}, year=1788, } @article{Gibbs:1899, author={J.W. Gibbs}, title={{Fourier series}}, journal={Nature}, year=1899, volume=59, pages={606}, note={Referenced indirectly by way of~\cite[``Gibbs Phenomenon,'' 06:12, 13~Dec. 2008]{wikip}, this letter of Gibbs completes the idea of the same author's paper on p.~200 of the same volume}, } @book{Goldman, author={William Goldman}, title={{The Princess Bride}}, publisher={Ballantine}, year=1973, address={New York}, } @book{Hamming, author={Richard W. Hamming}, title={{Methods of Mathematics Applied to Calculus, Probability, and Statistics}}, publisher={Dover}, year=1985, series={Books on Mathematics}, address={Mineola, N.Y.}, } @book{Knopp, author={Konrad Knopp}, title={{Theory and Application of Infinite Series}}, publisher={Hafner}, year=1947, address={New York}, edition={{2nd English ed., revised in accordance with the 4th German}}, } @book{Hardy, author={G.H. Hardy}, title={{Divergent Series}}, publisher={Oxford University Press}, year=1949, address={Oxford}, } @book{Harrington, author={Roger F. Harrington}, title={{Time-harmonic Electromagnetic Fields}}, publisher={McGraw-Hill}, year=1961, series={Texts in Electrical Engineering}, address={New York}, } @book{Harrington:1993, author={Roger F. Harrington}, title={{Field Computation by Moment Methods}}, publisher={IEEE Press}, address={Piscataway, N.J.}, year=1993, } @book{Hefferon, author={Jim Hefferon}, title={{Linear Algebra}}, publisher={Mathematics, St. Michael's College}, year={20~May 2006}, address={Colchester, Vt.}, note={(The book is free software, and besides is the best book of the kind the author of the book you hold has encountered. As of 3~Nov. 
2007 at least, one can download it from \textcomp{http://joshua.smcvt.edu/linearalgebra/}.)} } @article{Hestenes/Stiefel:1952, author={Magnus R. Hestenes and Eduard Stiefel}, title={{Methods of conjugate gradients for solving linear systems}}, journal={Journal of Research of the National Bureau of Standards}, year=1952, volume={49}, number=6, pages={409--36}, month={Dec.}, note={Research paper 2379.}, } @book{Hildebrand, author={Francis B. Hildebrand}, title={{Advanced Calculus for Applications}}, publisher={Prentice-Hall}, year=1976, address={Englewood Cliffs, N.J.}, edition={2nd}, } @article{Hopman, author={Theo Hopman}, title={{Introduction to indicial notation}}, journal={\textcomp{\upshape http://www. uoguelph.ca/\char`~thopman/246/indicial.pdf}}, year={28~Aug. 2002}, } @manual{Intel, title={{IA-32 Intel Architecture Software Developer's Manual}}, organization={Intel Corporation}, edition={19th}, month={March}, year=2006, } @book{Jeffreys/Jeffreys, author={H. Jeffreys and B.S. Jeffreys}, title={{Methods of Mathematical Physics}}, publisher={Cambridge University Press}, address={Cambridge}, year=1988, edition={3rd}, } @book{JJH, author={David E. Johnson and Johnny R. Johnson and John L. Hilburn}, title={{Electric Circuit Analysis}}, publisher={Prentice Hall}, year=1989, address={Englewood Cliffs, N.J.}, } @book{Jolley, author={L.B.W. Jolley}, title={{Summation of Series}}, publisher={Dover}, address={Mineola, N.Y.}, year={1961}, edition={2nd revised}, } @book{K-R, author={Brian W. Kernighan and Dennis M. Ritchie}, title={{The C Programming Language}}, publisher={Prentice Hall PTR}, year=1988, series={Software Series}, address={Englewood Cliffs, N.J.}, edition={2nd}, } @book{Lay, author={David C. Lay}, title={{Linear Algebra and Its Applications}}, publisher={Addison-Wesley}, year=1994, address={Reading, Mass.}, } @book{Lebedev, author={N.N. Lebedev}, title={{Special Functions and Their Applications}}, publisher={Dover}, year=1965, series={Books on Mathematics}, address={Mineola, N.Y.}, edition={{revised English}}, } @book{McMahon, author={McMahon, David}, title={{Quantum Mechanics Demystified}}, publisher={McGraw-Hill}, year=2006, series={Demystified Series}, address={New York}, } @article{Mosig/Alvarez:2002, author={J.R.~Mosig and A.~Alvarez Melc\'on}, title={{The summation by parts algorithm---a new efficient technique for the rapid calculation of certain series arising in shielded planar structures}}, journal={IEEE Transactions on Microwave Theory and Techniques}, year=2002, volume={50}, number=1, pages={215--18}, month={Jan.}, } @book{Nayfeh/Bal, author={Ali H. Nayfeh and Balakumar Balachandran}, title={{Applied Nonlinear Dynamics: Analytical, Computational and Experimental Methods}}, publisher={Wiley}, year=1995, address={New York}, series={Series in Nonlinear Science}, } @article{Peterson/Mittra:1986, author={Andrew F. Peterson and Raj Mittra}, title={{Convergence of the conjugate gradient method when applied to matrix equations representing electromagnetic scattering problems}}, journal={IEEE Transactions on Antennas and Propagation}, year=1986, volume={AP-34}, number=12, pages={1447--54}, month={Dec.}, } @book{Phillips/Parr, author={Charles~L. Phillips and John M. Parr}, title={{Signals, Systems and Transforms}}, publisher={Prentice-Hall}, year=1995, address={Englewood Cliffs, N.J.}, } @article{PM:1964, author={W.J. {Pierson, Jr.,} and L. Moskowitz}, title={{A proposed spectral form for fully developed wind seas based on the similarity theory of S.A. 
Kitaigorodskii}}, journal={J. Geophys. Res.}, year=1964, volume={69}, pages={5181--90}, } @book{Sadiku, author={Matthew N.O. Sadiku}, title={{Numerical Techniques in Electromagnetics}}, publisher={CRC Press}, year=2001, address={Boca Raton, Fla.}, edition={2nd}, } @book{Sagan, author={Carl Sagan}, title={{Cosmos}}, publisher={Random House}, year=1980, address={New York}, } @book{Shenk, author={Al Shenk}, title={{Calculus and Analytic Geometry}}, publisher={Scott, Foresman \& Co.}, year=1984, address={Glenview, Ill.}, edition={3rd}, } @book{Sedra/Smith, author={Adel S. Sedra and Kenneth C. Smith}, title={{Microelectronic Circuits}}, publisher={Oxford University Press}, year=1991, series={Series in Electrical Engineering}, address={New York}, edition={3rd}, } @book{Shirer, author={Shirer, William L.}, title={{The Rise and Fall of the Third Reich}}, publisher={Simon \& Schuster}, year=1960, address={New York}, } @book{Spiegel, author={Murray R. Spiegel}, title={{Complex Variables: with an Introduction to Conformal Mapping and Its Applications}}, publisher={McGraw-Hill}, year=1964, series={Schaum's Outline Series}, address={New York}, } @book{SRW, author={James Stewart and Lothar Redlin and Saleem Watson}, title={{Precalculus: Mathematics for Calculus}}, publisher={Brooks/Cole}, year=1993, address={Pacific Grove, Calif.}, edition={3rd}, } @book{Stratton, author={Julius Adams Stratton}, title={{Electromagnetic Theory}}, publisher={McGraw-Hill}, year=1941, series={International Series in Pure and Applied Physics}, address={New York}, } @book{Stroustrup, author={Bjarne Stroustrup}, title={{The C++ Programming Language}}, publisher={Addison-Wesley}, year=2000, address={Boston}, edition={``special'' (third-and-a-half?)}, } @book{Tolkien, author={J.R.R. Tolkien}, title={{The Lord of the Rings}}, publisher={Houghton Mifflin}, year=1965, address={Boston}, edition={2nd}, } @book{vdVorst, author={Vorst, Henk A. van der}, title={{Iterative Krylov Methods for Large Linear Systems}}, publisher={Cambridge University Press}, address={Cambridge}, year=2003, number=13, series={Monographs on Applied and Computational Mathematics}, } @book{Watson, author={Watson, G.N.}, title={{A Treatise on the Theory of Bessel Functions}}, publisher={Macmillan}, address={New York}, year=1944, edition={2nd}, } @book{Webster1913, author={Noah Porter}, title={{Webster's Revised Unabridged Dictionary}}, publisher={C. \& G. Merriam Co.}, year=1913, address={Springfield, Mass.}, } @book{EWW, author={Eric W. Weisstein}, title={{CRC Concise Encyclopedia of Mathematics}}, publisher={Chapman \& Hall/CRC}, year=2003, address={Boca Raton, Fla.}, edition={2nd}, } @article{Wilbraham:1848, author={Henry Wilbraham}, title={{On a certain periodic function}}, journal={Cambridge and Dublin Mathematical Journal}, year=1848, volume=3, pages={198--201}, note={Referenced indirectly by way of~\cite[``Henry Wilbraham,'' 04:06, 6~Dec. 2008]{wikip}}, } @book{Wilkinson, author={J.H. Wilkinson}, title={{The Algebraic Eigenvalue Problem}}, publisher={Clarendon Press}, year=1965, series={Monographs on Numerical Analysis}, address={Oxford}, } @misc{KRHB, author={Kristie H. Black}, howpublished={Private conversation}, year=1996, } @misc{Brown-lecture, author={Gary S. Brown}, howpublished={Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va.}, year={2004--05}, } @misc{Brown-conversation, author={Gary S. Brown}, howpublished={Private conversation}, year={2004--08}, } @misc{Kohler-lecture, author={Werner E. 
Kohler}, howpublished={Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va.}, year=2007, } @misc{Scales-lecture, author={Wayne A. Scales}, howpublished={Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va.}, year=2004, } @misc{deSturler-lecture, author={Sturler, Eric de}, howpublished={Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va.}, year=2007, } @misc{CERN-Alice-delta, author={{CERN (European Organization for Nuclear Research, author unknown).}}, key={CERN}, title={{The delta transformation}}, howpublished={\textcomp{http://aliceinfo.cern.ch/ Offline/Activities/Alignment/deltatr.html}}, note={As retrieved 24~May 2008}, } @misc{Debian, author={Debian Project, The}, howpublished={\textcomp{http://www.debian.org/}}, } @misc{DFSG, author={Debian Project, The}, title={{Debian Free Software Guidelines, version~1.1}}, howpublished={\textcomp{http://www.debian.org/social\char`_contract\#guidelines}}, } @misc{GPL, author={Free Software Foundation, The}, title={{GNU General Public License, version~2}}, howpublished={\textcomp{/usr/share/common-licenses/GPL-2} on a Debian system}, note={The Debian Project: \textcomp{http://www.debian.org/}. The Free Software Foundation: 51 Franklin St., Fifth Floor, Boston, Mass. 02110-1301, USA}, } @misc{Hahn, author={Karl Hahn}, title={\emph{Karl's Calculus Tutor}}, howpublished={\textcomp{http://http://www.karlscalculus.org/}}, note={As retrieved 18~Sept. 2007}, } @article{math21b, author={{Harvard University (author unknown).}}, key={Harvard University}, title={{Math~21B review}}, journal={\textcomp{\upshape http://www. math.harvard.edu/archive/21b\char`_fall\char`_03/final/21breview.pdf}}, year={13~Jan. 2004}, } @misc{Jones/Fjeld, author={Eric M. Jones and Paul Fjeld}, title={{Gimbal angles, gimbal lock, and a fourth gimbal for Christmas}}, howpublished={\textcomp{http://www.hq.nasa.gov/alsj/ gimbals.html}}, note={As retrieved 23~May 2008}, } @misc{def-applied-math, author={Labor Law Talk, The}, howpublished={\textcomp{http://encyclopedia.laborlawtalk.com/ Applied\char`_mathematics}}, note={As retrieved 1~Sept. 2005}, } @misc{mathbios, author={John J. O'Connor and Edmund F. Robertson}, title={\emph{The MacTutor History of Mathematics}}, howpublished={School of Mathematics and Statistics, University of St.~Andrews, Scotland, \textcomp{http://www-history.mcs. st-andrews.ac.uk/}}, note={As retrieved 12~Oct. 2005 through 2~Nov. 2005}, } @misc{planetm, author={PlanetMath.org}, title={\emph{Planet Math}}, howpublished={\textcomp{http://www.planetmath.org/}}, note={As retrieved 21~Sept. 2007 through 20~Feb. 2008}, } @article{physics321, author={{Reed College (author unknown).}}, key={Reed College}, title={{Levi-Civita symbol, lecture~1, Physics~321, electrodynamics}}, journal={\textcomp{\upshape http://academic.reed.edu/physics/ courses/Physics321/page1/files/LCHandout.pdf}}, year={Portland, Ore., 27~Aug. 2007}, } @misc{Khamsi, author={M.A. Khamsi}, title={{Gibbs' phenomenon}}, howpublished={\textcomp{http://www.sosmath.com/ fourier/fourier3/gibbs.html}}, note={As retrieved 30~Oct. 2008}, } @misc{Stepney, author={Susan Stepney}, title={{Euclid's proof that there are an infinite number of primes}}, howpublished={\textcomp{http://www-users.cs.york.ac.uk/susan/cyc/p/ primeprf.htm}}, note={As retrieved 28~April 2006}, } @misc{EWW-web, author={Eric W. Weisstein}, title={{\emph{Mathworld}}}, howpublished={\textcomp{http://mathworld.wolfram.com/}}, note={As retrieved 29~May 2006 through 20~Feb. 
2008}, } @misc{wikip, author={Wikimedia Foundation, The}, title={{\emph{Wikipedia}}}, howpublished={\textcomp{http://en.wikipedia.org/}}, } @misc{Xplora, author={Xplora}, title={{\emph{Xplora Knoppix}}}, howpublished={\textcomp{http://www.xplora.org/downloads/ Knoppix/}}, } @misc{Octave, author={John W. Eaton}, title={{\emph{GNU Octave}}}, howpublished={\textcomp{http://www.octave.org/}}, note={Software version 2.1.73}, } derivations-0.53.20120414.orig/tex/xkvltxp.sty0000644000000000000000000000660011742575144017416 0ustar rootroot%% %% This is file `xkvltxp.sty', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvltxpatch') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% %% %% Based on latex.ltx. 
%% \NeedsTeXFormat{LaTeX2e}[1995/12/01] \ProvidesPackage{xkvltxp}[2004/12/13 v1.2 LaTeX2e kernel patch (HA)] \def\@pass@ptions#1#2#3{% \def\reserved@a{#2}% \def\reserved@b{\CurrentOption}% \ifx\reserved@a\reserved@b \@ifundefined{opt@#3.#1}{\@temptokena\expandafter{#2}}{% \@temptokena\expandafter\expandafter\expandafter {\csname opt@#3.#1\endcsname}% \@temptokena\expandafter\expandafter\expandafter{% \expandafter\the\expandafter\@temptokena\expandafter,#2}% }% \else \@ifundefined{opt@#3.#1}{\@temptokena{#2}}{% \@temptokena\expandafter\expandafter\expandafter {\csname opt@#3.#1\endcsname}% \@temptokena\expandafter{\the\@temptokena,#2}% }% \fi \expandafter\xdef\csname opt@#3.#1\endcsname{\the\@temptokena}% } \def\OptionNotUsed{% \ifx\@currext\@clsextension \let\reserved@a\CurrentOption \@onelevel@sanitize\reserved@a \xdef\@unusedoptionlist{% \ifx\@unusedoptionlist\@empty\else\@unusedoptionlist,\fi \reserved@a}% \fi } \def\@use@ption{% \let\reserved@a\CurrentOption \@onelevel@sanitize\reserved@a \@expandtwoargs\@removeelement\reserved@a \@unusedoptionlist\@unusedoptionlist \csname ds@\CurrentOption\endcsname } \def\@fileswith@pti@ns#1[#2]#3[#4]{% \ifx#1\@clsextension \ifx\@classoptionslist\relax \@temptokena{#2}% \xdef\@classoptionslist{\the\@temptokena}% \def\reserved@a{% \@onefilewithoptions#3[#2][#4]#1% \@documentclasshook}% \else \def\reserved@a{% \@onefilewithoptions#3[#2][#4]#1}% \fi \else \@temptokena{#2}% \def\reserved@b##1,{% \ifx\@nil##1\relax\else \ifx\relax##1\relax\else \noexpand\@onefilewithoptions##1% [\the\@temptokena][#4]\noexpand\@pkgextension \fi \expandafter\reserved@b \fi}% \edef\reserved@a{\zap@space#3 \@empty}% \edef\reserved@a{\expandafter\reserved@b\reserved@a,\@nil,}% \fi \reserved@a} \let\@@fileswith@pti@ns\@fileswith@pti@ns \endinput %% %% End of file `xkvltxp.sty'. derivations-0.53.20120414.orig/tex/cexp.tex0000644000000000000000000015112011742566274016620 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The complex exponential} \label{cexp} \index{exponential, complex} \index{complex exponential} \index{natural exponential!complex} % diagn: this revised paragraph wants review. The complex natural exponential is ubiquitous in higher mathematics. % bad break \linebreak There seems hardly a corner of calculus, basic or advanced, in which the complex exponential does not strongly impress itself and frequently arise. Because the complex natural exponential emerges (at least pedagogically) out of the real natural exponential, this chapter introduces first the real natural exponential and its inverse, the real natural logarithm; and then proceeds to show how the two can operate on complex arguments. It derives the exponential and logarithmic functions' basic properties and explains their close relationship to the trigonometrics. It works out the functions' derivatives and the derivatives of the basic members of the trigonometric and inverse trigonometric families to which they respectively belong. % ---------------------------------------------------------------------- \section{The real exponential} \label{cexp:220} \index{exponential, real} \index{real exponential} \index{natural exponential!real} \index{natural exponential} \index{exponential, natural} \index{natural exponential!existence of} \index{exponential, natural!existence of} \index{$e$} Consider the factor \[ ( 1 + \ep )^N. \] This is the overall factor by which a quantity grows after~$N$ iterative rounds of multiplication by $(1+\ep)$. 
What happens when~$\ep$ is very small but~$N$ is very large? The really interesting question is, what happens in the limit, as $\ep\rightarrow 0$ and $N\rightarrow\infty$, while $x=\ep N$ remains a finite number? The answer is that the factor becomes \bq{cexp:220:exp} \exp x \equiv \lim_{\ep\rightarrow 0} ( 1 + \ep )^{x/\ep}. \eq Equation~(\ref{cexp:220:exp}) defines the \emph{natural exponential function}---commonly, more briefly named the \emph{exponential function.} Another way to write the same definition is \bqa \exp x &=& e^x, \label{cexp:220:26}\\ e &\equiv& \lim_{\ep\rightarrow 0} ( 1 + \ep )^{1/\ep}. \label{cexp:220:econst} \eqa \index{tangent line} Whichever form we write it in, the question remains as to whether the limit actually exists; that is, whether $0 1$, this action means for real~$x$ that \[ \exp x_1 \le \exp x_2 \ \ \mbox{if}\ \ x_1 \le x_2. \] However, a positive number remains positive no matter how many times one multiplies or divides it by $1+\ep$, so the same action also means that \[ 0 \le \exp x \] for all real~$x$. In light of~(\ref{cexp:220:dexp}), the last two equations imply further that \bqb \frac{d}{dx}\exp x_1 &\le& \frac{d}{dx}\exp x_2 \ \ \ \mbox{if $x_1 \le x_2$,} \\ 0 &\le& \frac{d}{dx}\exp x. \eqb But we have purposely defined the tangent line $y(x) = 1+x$ such that \vspace{-1.0ex} \[ \renewcommand\arraystretch{2.0} \br{rcrcl} \ds\exp 0 &=& \ds y(0) &=& \ds 1, \\ \ds\frac{d}{dx}\exp 0 &=& \ds\frac{d}{dx}y(0) &=& \ds 1; \er \] that is, such that the line just grazes the curve of $\exp x$ at $x=0$. Rightward, at $x > 0$, evidently the curve's slope only increases, bending upward away from the line. Leftward, at $x < 0$, evidently the curve's slope only decreases, again bending upward away from the line. Either way, the curve never crosses below the line for real~$x$. In symbols, \[ y(x) \le \exp x. \] Figure~\ref{cexp:220:fig1} depicts. \begin{figure} \caption{The natural exponential.} \label{cexp:220:fig1} \bc \nc\fxa{-4.5} \nc\fxb{ 3.3} \nc\fya{-1.0} \nc\fyb{ 3.2} \nc\xta{-3.8} \nc\xtb{ 1.0} \nc\xxa{-4.0} \nc\xxb{ 2.5} \nc\xya{-0.6} \nc\xyb{ 2.5} \nc\xxt{0.2} \nc\xxk{-1.0} \nc\xxl{ 1.8} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(\xxa,0)(\xxb,0) \psline(0,\xya)(0,\xyb) \uput[r](\xxb,0){$x$} \uput[u](0,\xyb){$\exp x$} \psline(0,1.0)(\xxt,1.0) \uput[r](\xxt,1.0){$1$} \psline(-1.0,0)(-1.0,-\xxt) \uput[d](-1.0,-\xxt){$-1$} } \psplot[linewidth=2.0pt,plotpoints=200]{\xta}{\xtb}{ 2.7183 x 1.0 div exp 1.0 mul } \rput(-0.5,0.5){ \psline[linewidth=1.0pt,linestyle=dashed](\xxk,\xxk)(\xxl,\xxl) } } \end{pspicture} \ec \end{figure} Evaluating the last inequality at $x=-1/2$ and $x=1$, we have that \settowidth\tla{$\ds\frac{1}{2}$} \bqb \frac{1}{2} &\le& \exp\left(-\frac{1}{2}\right), \\ \makebox[\tla][c]{$2$} &\le& \exp\left(1\right). \eqb But per~(\ref{cexp:220:26}) $\exp x = e^x$, so \settowidth\tla{$\ds\frac{1}{2}$} \bqb \frac{1}{2} &\le& e^{-1/2}, \\ \makebox[\tla][c]{$2$} &\le& e^{1}, \eqb or in other words, \bq{cexp:220:45} 2 \le e \le 4, \eq which in consideration of~(\ref{cexp:220:26}) puts the desired bound on the exponential function. The limit does exist. % diagn: this paragraph wants review. 
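Though no substitute for the foregoing argument, a brief numerical trial of the limit lends the bound some concreteness. Evaluating the factor at a few finite values of~$\ep$, \bqb \left(1+\frac{1}{2}\right)^{2} &=& \mbox{0x2.4}, \\ \left(1+\frac{1}{4}\right)^{4} &=& \mbox{0x2.71}, \\ \left(1+\frac{1}{8}\right)^{8} &\approx& \mbox{0x2.90D7}. \eqb Each value lies between~$2$ and~$4$, and each creeps upward as~$\ep$ shrinks.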
\index{logarithmic derivative} \index{derivative!logarithmic} Dividing~(\ref{cexp:220:dexp}) by $\exp x$ yields the \emph{logarithmic derivative} (\S~\ref{drvtv:240.40}) \bq{cexp:220:dexp-log} \index{natural exponential!logarithmic derivative of} \index{exponential, natural!logarithmic derivative of} \index{logarithmic derivative!of the natural exponential} \index{derivative!logarithmic of the natural exponential} \frac{d(\exp x)}{(\exp x)\,dx} = 1, \eq a form which expresses or captures the deep curiosity of the natural exponential maybe even better than does~(\ref{cexp:220:dexp}). By the Taylor series of Table~\ref{taylor:315:tbl}, the value \[ e \approx \mbox{0x2.B7E1} \] can readily be calculated, but the derivation of that series does not come until Ch.~\ref{taylor}. % ---------------------------------------------------------------------- \section{The natural logarithm} \label{cexp:225} \index{natural logarithm} \index{logarithm, natural} \index{exponential!general} \index{general exponential} In the general exponential expression~$b^x$, one can choose any base~$b$; for example, $b = 2$ is an interesting choice. As we shall see in \S~\ref{cexp:230}, however, it turns out that $b=e$, where~$e$ is the constant introduced in~(\ref{cexp:220:econst}), is the most interesting choice of all. For this reason among others, the base-$e$ logarithm is similarly interesting, such that we define for it the special notation \[ \ln(\cdot) = \log_e(\cdot), \] and call it the \emph{natural logarithm.} Just as for any other base~$b$, so also for base $b=e$; thus the natural logarithm inverts the natural exponential and vice versa: \bq{cexp:225:10} \br{rclcl} \ds\ln \exp x &=& \ds\ln e^x &=& \ds x, \\ \ds\exp \ln x &=& \ds e^{\ln x} &=& \ds x. \er \eq Figure~\ref{cexp:225:fig2} plots the natural logarithm. \begin{figure} \caption{The natural logarithm.} \label{cexp:225:fig2} \bc \nc\fya{-4.0} \nc\fyb{ 2.3} \nc\fxa{-1.2} \nc\fxb{ 4.2} \nc\xta{0.04} \nc\xtb{ 3.3} \nc\xya{-3.6} \nc\xyb{ 1.5} \nc\xxa{-0.8} \nc\xxb{ 3.5} \nc\xxt{0.2} \nc\xxk{-1.0} \nc\xxl{ 1.8} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(\xxa,0)(\xxb,0) \psline(0,\xya)(0,\xyb) \uput[r](\xxb,0){$x$} \uput[u](0,\xyb){$\ln x$} \psline(0,-1.0)(-\xxt,-1.0) \uput[l](-\xxt,-1.0){$-1$} \psline(1.0,0)(1.0,\xxt) \uput[u](1.0,\xxt){$1$} } \psplot[linewidth=2.0pt,plotpoints=200]{\xta}{\xtb}{ x 1.0 div ln 1.0 mul } \rput(0.5,-0.5){ \psline[linewidth=1.0pt,linestyle=dashed](\xxk,\xxk)(\xxl,\xxl) } } \end{pspicture} \ec \end{figure} If \[ y = \ln x, \] then \[ x = \exp y, \] and per~(\ref{cexp:220:dexp}), \[ \frac{dx}{dy} = \exp y. \] But this means that \[ \frac{dx}{dy} = x, \] the inverse of which is \[ \frac{dy}{dx} = \frac{1}{x}. \] In other words, \bq{cexp:225:dln} \index{natural logarithm!derivative of} \index{logarithm, natural!derivative of} \index{derivative!of the natural logarithm} \frac{d}{dx}\ln x = \frac{1}{x}. \eq Like many of the equations in these early chapters, here is another rather significant result.% \footnote{ Besides the result itself, the technique which leads to the result is also interesting and is worth mastering. We will use the technique more than once in this book. } One can specialize Table~\ref{alggeo:230:tbl}'s logarithmic base-conversion identity to read \bq{cexp:225:30} \log_b w = \frac{\ln w}{\ln b}. \eq This equation converts any logarithm to a natural logarithm. 
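For example, with $b=2$ and $w=8$, eqn.~(\ref{cexp:225:30}) gives \[ \log_2 8 = \frac{\ln 8}{\ln 2} = \frac{3\ln 2}{\ln 2} = 3, \] as it should, since $2^3 = 8$.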
Base $b=2$ logarithms are interesting, so we note here that \[ \ln 2 = -\ln\frac 1 2 \approx \mbox{0x0.B172}, \] which Ch.~\ref{taylor} and its Table~\ref{taylor:315:tbl} will show how to calculate. % ---------------------------------------------------------------------- \section{Fast and slow functions} \label{cexp:228} \index{exponential, natural!compared to~$x^a$} \index{logarithm, natural!compared to~$x^a$} \index{natural exponential!compared to~$x^a$} \index{natural logarithm!compared to~$x^a$} \index{function!slow}\index{slow function} \index{function!fast}\index{fast function} The exponential~$\exp x$ is a \emph{fast function.} The logarithm~$\ln x$ is a \emph{slow function.} These functions grow, diverge or decay respectively faster and slower than~$x^a$. Such claims are proved by l'H\^opital's rule~(\ref{drvtv:260:lhopital}). Applying the rule, we have that { \settowidth{\tla}{$\ds\lim_{x\ra\infty}\frac{1}{ax^a}$} \nc\xx[1]{\makebox[\tla][l]{$\ds{#1}$}} \bq{cexp:228:50} \begin{split} \lim_{x\ra\infty}\frac{\ln x}{x^a} &= \xx{\lim_{x\ra\infty}\frac{-1}{ax^a}} = \begin{cases} 0 &\mbox{if}\ a>0, \\ +\infty &\mbox{if}\ a\le 0, \end{cases} \\ \lim_{x\ra 0}\frac{\ln x}{x^a} &= \xx{\lim_{x\ra 0}\frac{-1}{ax^a}} = \begin{cases} -\infty &\mbox{if}\ a\ge 0, \\ 0 &\mbox{if}\ a<0, \end{cases} \end{split} \eq }% which reveals the logarithm to be a slow function. Since the $\exp(\cdot)$ and $\ln(\cdot)$ functions are mutual inverses, we can leverage~(\ref{cexp:228:50}) to show also that \bqb \lim_{x\ra\infty}\frac{\exp (\pm x)}{x^a} &=& \lim_{x\ra\infty}\exp\left[\ln\frac{\exp(\pm x)}{x^a}\right] \\&=& \lim_{x\ra\infty}\exp\left[\pm x-a\ln x\right] \\&=& \lim_{x\ra\infty}\exp \left[(x)\left(\pm 1-a\frac{\ln x}{x}\right)\right] \\&=& \lim_{x\ra\infty}\exp \left[(x)\left(\pm 1-0\right)\right] \\&=& \lim_{x\ra\infty}\exp \left[\pm x\right]. \eqb That is, \bq{cexp:228:55} \begin{split} \lim_{x\ra\infty}\frac{\exp (+x)}{x^a} &= \infty, \\ \lim_{x\ra\infty}\frac{\exp (-x)}{x^a} &= 0, \end{split} \eq which reveals the exponential to be a fast function. Exponentials grow or decay faster than powers; logarithms diverge slower. Such conclusions are extended to bases other than the natural base~$e$ simply by observing that $\log_b x = \ln x/\ln b$ and that $b^x = \exp( x\ln b )$. Thus exponentials generally are fast and logarithms generally are slow, regardless of the base.% \footnote{ There are of course some degenerate edge cases like $b=0$ and $b=1$. The reader can detail these as the need arises. } It is interesting and worthwhile to contrast the sequence \[ \ldots,-\frac{3!}{x^4},\frac{2!}{x^3},-\frac{1!}{x^2},\frac{0!}{x^1}, \frac{x^0}{0!},\frac{x^1}{1!},\frac{x^2}{2!},\frac{x^3}{3!},\frac{x^4}{4!},\ldots \] against the sequence \[ \ldots,-\frac{3!}{x^4},\frac{2!}{x^3},-\frac{1!}{x^2},\frac{0!}{x^1}, \ln x,\frac{x^1}{1!},\frac{x^2}{2!},\frac{x^3}{3!},\frac{x^4}{4!},\ldots \] As $x \ra +\infty$, each sequence increases in magnitude going rightward. Also, each term in each sequence is the derivative with respect to~$x$ of the term to its right---except left of the middle element in the first sequence and right of the middle element in the second. The exception is peculiar. What is going on here? 
\index{logarithm!resemblance of to~$x^0$} The answer is that~$x^0$ (which is just a constant) and $\ln x$ \emph{both are of zeroth order in~$x$.} This seems strange at first because $\ln x$ diverges as $x \ra \infty$ whereas~$x^0$ does not, but the divergence of the former is extremely slow---so slow, in fact, that per~(\ref{cexp:228:50}) $\lim_{x \ra \infty} (\ln x)/x^\ep = 0$ for any positive~$\ep$ no matter how small.% \footnote{ One does not grasp how truly slow the divergence is until one calculates a few concrete values. Consider for instance how far out~$x$ must run to make $\ln x = \mbox{0x100}$. It's a long, long way. The natural logarithm does indeed eventually diverge to infinity, in the literal sense that there is no height it does not eventually reach, but it certainly does not hurry. As we have seen, it takes practically forever just to reach~0x100. } Figure~\ref{cexp:225:fig2} has plotted $\ln x$ only for $x \sim 1$, but beyond the figure's window the curve (whose slope is $1/x$) flattens rapidly rightward, to the extent that it locally resembles the plot of a constant value; and indeed one can write \[ x^0 = \lim_{u \ra \infty} \frac{\ln (x+u)}{\ln u}, \] which casts~$x^0$ as a logarithm shifted and scaled. Admittedly, one ought not strain such logic too far, because $\ln x$ is not in fact a constant, but the point nevertheless remains that~$x^0$ and $\ln x$ often play analogous roles in mathematics. The logarithm can in some situations profitably be thought of as a ``diverging constant'' of sorts. \index{exponential!resemblance of to~$x^\infty$} Less strange-seeming perhaps is the consequence of~(\ref{cexp:228:55}) that $\exp x$ is of infinite order in~$x$, that~$x^\infty$ and $\exp x$ play analogous roles. It befits an applied mathematician subjectively to internalize~(\ref{cexp:228:50}) and % bad break (\ref{cexp:228:55}), to remember that $\ln x$ resembles~$x^0$ and that $\exp x$ resembles~$x^\infty$. A qualitative sense that logarithms are slow and exponentials, fast, helps one to grasp mentally the essential features of many mathematical models one encounters in practice. Now leaving aside fast and slow functions for the moment, we turn our attention in the next section to the highly important matter of the exponential of a complex argument. % ---------------------------------------------------------------------- % diagn: this important section, substantially rewritten, has enjoyed % some good review but would like even more. \section{Euler's formula} \label{cexp:230} \index{Euler's formula} \index{Euler, Leonhard (1707--1783)} \index{circle} \index{unit!imaginary} \index{imaginary unit} The result of \S~\ref{cexp:220} leads to one of the central questions in all of mathematics. How can one evaluate \[ \exp i\theta = \lim_{\ep\rightarrow 0} ( 1 + \ep )^{i\theta/\ep}, \] where $i^2 = -1$ is the imaginary unit introduced in \S~\ref{alggeo:225}? To begin, one can take advantage of~(\ref{drvtv:230:apxe}) to write the last equation in the form \[ \exp i\theta = \lim_{\ep\rightarrow 0} ( 1 + i\ep )^{\theta/\ep}, \] but from here it is not obvious where to go. The book's development up to the present point gives no obvious direction. In fact it appears that the interpretation of $\exp i\theta$ remains for us to define, if we can find a way to define it which fits sensibly with our existing notions of the real exponential. So, if we don't quite know where to go with this yet, what do we know? 
\index{multiplication!repeated} \index{magnitude} \index{phase} One thing we know is that if $\theta = \ep$, then \[ \exp (i\ep) = (1 + i\ep)^{\ep/\ep} = 1+i\ep. \] But per \S~\ref{cexp:220}, the essential operation of the exponential function is to multiply repeatedly by some factor, the factor being not quite exactly unity and, in this case, being $1+i\ep$. With such thoughts in mind, let us multiply a complex number $z=x+iy$ by $1+i\ep$, obtaining \[ (1+i\ep)(x+iy) = (x-\ep y) + i(y+\ep x). \] The resulting change in~$z$ is \[ \Delta z = (1+i\ep)(x+iy) - (x+iy) = (\ep)(-y+ix), \] in which it is seen that \[ \renewcommand{\arraystretch}{1.5} \br{rclcl} \ds\left|\Delta z\right| &=& \ds(\ep)\sqrt{y^2+x^2} &=& \ds\ep\rho, \\ \ds\arg(\Delta z) &=& \ds\arctan\frac{x}{-y} &=& \ds\phi + \frac{2\pi}{4}. \er \] The~$\Delta z$, $\rho=\left|z\right|$ and $\phi=\arg z$ are as shown in Fig.~\ref{cexp:230:fig}. Whether in the figure or in the equations, \emph{the change~$\Delta z$ is evidently proportional to the magnitude of~$z$, but at a right angle to $z$'s radial arm in the complex plane.} \begin{figure} \caption{The complex exponential and Euler's formula.} \label{cexp:230:fig} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxb{1.0} \nc\xxc{2.0} \nc\xxd{0.1} \nc\xxe{0.25} \nc\xxf{0.25} \nc\xxp{1.3} \nc\xxq{26.565} \nc\xxqq{13.283} \nc\xxr{2.2361} \nc\xxrr{1.1180} \nc\xxrrr{2.3} \nc\xxs{2.6} \nc\xxlsep{0.20} \nc\xxa{ \psline[linewidth=1.0pt]{<->}(-\xx,0)(\xx,0) \psline[linewidth=0.5pt]( \xxb,-\xxd)( \xxb,\xxd) \psline[linewidth=0.5pt](-\xxb,-\xxd)(-\xxb,\xxd) \psline[linewidth=0.5pt]( \xxc,-\xxd)( \xxc,\xxd) \psline[linewidth=0.5pt](-\xxc,-\xxd)(-\xxc,\xxd) } \xxa \rput{90}(0,0){\xxa} \rput[r](-\xxe,-1.9){$-i2$} \rput[r](-\xxe,-1){$-i $} \rput[r](-\xxe, 1){$ i $} \rput[r](-\xxe, 1.9){$ i2$} \rput[t](-1.9,-\xxf){$-2$} \rput[t](-1,-\xxf){$-1$} \rput[t]( 1,-\xxf){$ 1$} \rput[t]( 1.9,-\xxf){$ 2$} \psline[linewidth=2.0pt]{cc-*}(0,0)(2,1) \rput(2,1){ \psline[linewidth=2.0pt]{cc->}(0,0)(-0.2,0.4) } \pscircle[linewidth=0.5pt,linestyle=dashed](0,0){\xxr} \rput(2.25,0.90){$z$} \rput(2.15,1.50){$\Delta z$} \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{0}{\xxq} \rput{\xxqq}(0,0){ \uput[r](\xxp,0){ \rput{*0}(0,0){$\phi$} } } \rput{\xxq}(0,0){ \uput{\xxlsep}[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } } \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$i\Im(z)$} \end{pspicture} } \ec \end{figure} \index{arm, radial} \index{radial arm} \index{travel} \index{motion!about a circle} \index{motion!perpendicular to a radial arm} \index{circle!travel about} \index{infinitesimal incrementation} \index{incrementation!infinitesimal} To travel about a circle wants motion always perpendicular to the circle's radial arm, which happens to be just the kind of motion~$\Delta z$ represents. Referring to the figure and the last equations, we have then that \[ \br{rclcl} \ds\Delta \rho &\equiv& \ds\left|z+\Delta z\right| - \left|z\right| &=& \ds0, \\ \ds\Delta \phi &\equiv& \ds\arg(z+\Delta z) - \arg z = \frac{\left|\Delta z\right|}{\rho} = \frac{\ep\rho}{\rho} &=& \ds\ep, \er \] which results evidently are valid for infinitesimal $\ep \ra 0$ and, importantly, stand independently of the value of~$\rho$. % diagn: the next, new, parenthesized passage particularly wants review. (But does~$\rho$ not grow at least a little, as the last equations almost seem to suggest? 
The answer is no; or, if you prefer, the answer is that $\Delta\rho \approx \{[\sqrt{1+\ep^2}]-1\}\rho \approx \ep^2\rho/2 \approx 0$, a second-order infinitesimal inconsequential on the scale of~$\ep$ or~$\ep\rho$, utterly vanishing by comparison in the limit $\ep \ra 0$.) With such results in hand, now let us recall from earlier in the section that---as we have asserted or defined---% \[ \exp i\theta = \lim_{\ep\rightarrow 0} ( 1 + i\ep )^{\theta/\ep}, \] and that this remains so for arbitrary real~$\theta$. Yet what does such an equation do, mechanically, but to compute $\exp i\theta$ by multiplying~$1$ by $1 + i\ep$ repeatedly, $\theta/\ep$ times? The plain answer is that such an equation does precisely this and nothing else. We have recently seen how each multiplication of the kind the equation suggests increments the phase~$\phi$ by $\Delta\phi = \ep$ while not changing the magnitude~$\rho$. Since the phase~$\phi$ begins from $\arg 1 = 0$ it must become \[ \phi = \frac\theta\ep\ep = \theta \] after $\theta/\ep$ increments of~$\ep$ each, while the magnitude must remain \[ \rho = 1. \] Reversing the sequence of the last two equations and recalling that $\rho \equiv \left|\exp i\theta\right|$ and that $\phi \equiv \arg(\exp i\theta)$, \[ \begin{split} \left|\exp i\theta\right| &= \ds 1, \\ \arg(\exp i\theta) &= \theta. \end{split} \] Moreover, had we known that~$\theta$ were just $\phi \equiv \arg(\exp i\theta)$, naturally we should have represented it by the symbol~$\phi$ from the start. Changing $\phi \la \theta$ now, we have for real~$\phi$ that \bqb \left|\exp i\phi\right| &=& \ds 1, \\ \arg(\exp i\phi) &=& \phi, \eqb which equations together say neither more nor less than that \bq{cexp:euler} \exp i\phi = \cos \phi + i\sin \phi = \cis \phi, \eq where the notation $\cis(\cdot)$ is as defined in \S~\ref{trig:280}. \index{Euler, Leonhard (1707--1783)} \index{exponential} \index{complex exponent} \index{exponent!complex} Along with the Pythagorean theorem~(\ref{alggeo:pythag}), the fundamental theorem of calculus~(\ref{integ:antider}), Cauchy's integral formula~(\ref{taylor:cauchy}) and Fourier's equation % bad break (\ref{fouri:eqn}), eqn.~(\ref{cexp:euler}) is one of the most famous results in all of mathematics. It is called \emph{Euler's formula,}% \footnote{ For native English speakers who do not speak German, Leonhard Euler's name is pronounced as ``oiler.'' }$\mbox{}^,$\footnote{ An alternate derivation of Euler's formula~(\ref{cexp:euler})---less intuitive and requiring slightly more advanced mathematics, but briefer---constructs from Table~\ref{taylor:315:tbl} the Taylor series for~$\exp i\phi$, $\cos \phi$ and $i\sin \phi$, then adds the latter two to show them equal to the first of the three. Such an alternate derivation lends little insight, perhaps, but at least it builds confidence that we actually knew what we were doing when we came up with the incredible~(\ref{cexp:euler}). } and it opens the exponential domain fully to complex numbers, not just for the natural base~$e$ but for any base. How? Consider in light of Fig.~\ref{cexp:230:fig} and~(\ref{cexp:euler}) that one can express any complex number in the form \[ z = x+iy = \rho\exp i\phi. \] If a complex base~$w$ is similarly expressed in the form \[ w = u+iv = \sigma\exp i\psi, \] then it follows that \bqb w^z &=& \exp[\ln w^z] \xn\\ &=& \exp[z\ln w] \xn\\ &=& \exp[(x+iy)(i\psi + \ln\sigma)] \xn\\ &=& \exp[(x\ln\sigma-\psi y) + i(y\ln\sigma+\psi x)]. 
\xn \eqb Since $\exp(\alpha+\beta) = e^{\alpha+\beta} = \exp \alpha\exp \beta$, the last equation is \bq{cexp:230:33} w^z = \exp(x\ln\sigma-\psi y) \exp i(y\ln\sigma+\psi x), \eq where \[ \begin{split} x &= \rho \cos \phi, \\ y &= \rho \sin \phi, \\ \sigma &= \sqrt{u^2+v^2}, \\ \tan\psi &= \frac{v}{u}. \end{split} \] Equation~(\ref{cexp:230:33}) serves to raise any complex number to a complex power. %Reasoning like this section's, of a motivational rather than a deductive %character, seems to make some professional mathematicians feel slightly %uneasy. Professional mathematicians seem to tend to prefer %to take~(\ref{cexp:euler}), its Taylor series (Ch.~\ref{taylor}), or %some other, nearly related form as the \emph{definition} of the complex %exponential. The applied mathematician however tends to prefer to %emphasize motivation over definition, which is why this book has %motivated~(\ref{cexp:euler}) rather than just to define it. \index{Euler's formula!curious consequences of} \index{natural logarithm!of a complex number} Curious consequences of Euler's formula~(\ref{cexp:euler}) include that \bq{cexp:230:34} \begin{split} e^{\pm i2\pi/4} &= \pm i, \\ e^{\pm i2\pi/2} &= -1, \\ e^{i n2\pi} &= 1. \end{split} \eq For the natural logarithm of a complex number in light of Euler's formula, we have that \bq{cexp:225:dlnz} \ln w = \ln \left(\sigma e^{i\psi}\right) = \ln\sigma + i\psi. \eq % ---------------------------------------------------------------------- \section[Complex exponentials and de~Moivre]{Complex exponentials and de~Moivre's theorem} \label{cexp:240} \index{exponential, complex!and de~Moivre's theorem} \index{complex exponential!and de~Moivre's theorem} \index{de Moivre's theorem!and the complex exponential} \index{de Moivre, Abraham (1667--1754)} \index{Moivre, Abraham de (1667--1754)} \index{de Moivre's theorem} Euler's formula~(\ref{cexp:euler}) implies that complex numbers~$z_1$ and~$z_2$ can be written \bq{cexp:240:05} \begin{split} z_1 &= \rho_1 e^{i\phi_1}, \\ z_2 &= \rho_2 e^{i\phi_2}. \end{split} \eq By the basic power properties of Table~\ref{alggeo:224:t1}, then, \bq{cexp:240:10} \renewcommand\arraystretch{1.5} \br{rclcl} \ds z_1z_2 &=& \ds\rho_1\rho_2 e^{i(\phi_1 + \phi_2)} &=& \ds\rho_1\rho_2 \exp[i(\phi_1 + \phi_2)], \\ \ds\frac{z_1}{z_2} &=& \ds\frac{\rho_1}{\rho_2} e^{i(\phi_1 - \phi_2)} &=& \ds\frac{\rho_1}{\rho_2} \exp[i(\phi_1 - \phi_2)], \\ \ds z^a &=& \ds\rho^a e^{ia\phi} &=& \ds\rho^a\exp[ia\phi]. \er \eq This is de~Moivre's theorem, introduced in \S~\ref{trig:280}. % ---------------------------------------------------------------------- \section{Complex trigonometrics} \label{cexp:250} \index{trigonometrics!complex} \index{complex trigonometrics} \index{tangent!in complex exponential form} Applying Euler's formula~(\ref{cexp:euler}) to~$+\phi$ then to~$-\phi$, we have that \bqb \exp( +i\phi ) &=& \cos \phi + i\sin\phi, \\ \exp( -i\phi ) &=& \cos \phi - i\sin\phi. \eqb Adding the two equations and solving for $\cos\phi$ yields \bq{cexp:250:cos} \index{cosine!in complex exponential form} \cos\phi = \frac{\exp(+i\phi)+\exp(-i\phi)}{2}. \eq Subtracting the second equation from the first and solving for $\sin\phi$ yields \bq{cexp:250:sin} \index{sine!in complex exponential form} \sin\phi = \frac{\exp(+i\phi)-\exp(-i\phi)}{i2}. \eq Thus are the trigonometrics expressed in terms of complex exponentials. 
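As a quick check of~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}), one can try the right angle $\phi = 2\pi/4$, for which~(\ref{cexp:230:34}) has already given $e^{\pm i2\pi/4} = \pm i$. The two formulas then yield \bqb \cos\frac{2\pi}{4} &=& \frac{(+i)+(-i)}{2} = 0, \\ \sin\frac{2\pi}{4} &=& \frac{(+i)-(-i)}{i2} = 1, \eqb as expected.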
\subsection{The hyperbolic functions} \label{cexp:250.20} \index{hyperbolic functions} \index{hyperbolic trigonometrics} \index{trigonometrics!hyperbolic} \index{Pythagorean theorem!and the hyperbolic functions} The forms~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}) suggest the definition of new functions \bqa \cosh\phi &\equiv& \frac{\exp(+\phi)+\exp(-\phi)}{2}, \label{cexp:250:cosh}\\ \sinh\phi &\equiv& \frac{\exp(+\phi)-\exp(-\phi)}{2}, \label{cexp:250:sinh}\\ \tanh\phi &\equiv& \frac{\sinh\phi}{\cosh\phi}. \eqa These are called the \emph{hyperbolic functions.} Their inverses $\mopx{arccosh}$, etc., are defined in the obvious way. The Pythagorean theorem for trigonometrics~(\ref{trig:226:25}) is that $\cos^2\phi + \sin^2\phi=1$; and from~(\ref{cexp:250:cosh}) and~(\ref{cexp:250:sinh}) one can derive the hyperbolic analog: \bq{cexp:250:pythag} \begin{split} \cos^2\phi + \sin^2\phi &= 1, \\ \cosh^2\phi - \sinh^2\phi &= 1. \end{split} \eq Both lines of~(\ref{cexp:250:pythag}) hold for complex~$\phi$ as well as for real.% \footnote{ Chapter~\ref{vector} teaches that the ``dot product'' of a unit vector and its own conjugate is unity---$\vu v^{*} \cdot \vu v = 1$, in the notation of that chapter---which tempts one incorrectly to suppose by analogy that $(\cos\phi)^{*}\cos\phi + (\sin\phi)^{*}\sin\phi = 1$ and that $(\cosh\phi)^{*}\cosh\phi - (\sinh\phi)^{*}\sinh\phi = 1$ when the angle~$\phi$ is complex. However,~(\ref{cexp:250:cos}) through~(\ref{cexp:250:sinh}) can generally be true only if~(\ref{cexp:250:pythag}) holds exactly as written for complex~$\phi$ as well as for real. Hence in fact $(\cos\phi)^{*}\cos\phi + (\sin\phi)^{*}\sin\phi \neq 1$ and $(\cosh\phi)\cosh\phi - (\sinh\phi)^{*}\sinh\phi \neq 1$. Such confusion probably tempts few readers unfamiliar with the material of Ch.~\ref{vector}, so you can ignore this footnote for now. However, if later you return after reading Ch.~\ref{vector} and if the confusion then arises, then consider that the angle~$\phi$ of Fig.~\ref{trig:226:f1} is a real angle, whereas we originally derived (\ref{cexp:250:pythag})'s first line from that figure. The figure is quite handy for real~$\phi$, but what if anything the figure means when~$\phi$ is complex is not obvious. If the confusion descends directly or indirectly from the figure, then such thoughts may serve to clarify the matter. } \index{cis} The notation $\exp i(\cdot)$ or $e^{i(\cdot)}$ is sometimes felt to be too bulky. Although less commonly seen than the other two, the notation \[ \cis(\cdot) \equiv \exp i(\cdot) = \cos(\cdot) +i\sin(\cdot) \] is also conventionally recognized, as earlier seen in \S~\ref{trig:280}. Also conventionally recognized are $\sin^{-1}(\cdot)$ and occasionally $\mopx{asin}(\cdot)$ for $\arcsin(\cdot)$, and likewise for the several other trigs. \index{hyperbolic functions!properties of} Replacing $z\la \phi$ in this section's several equations implies a coherent definition for trigonometric functions of a complex variable. Then, comparing~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}) respectively to~(\ref{cexp:250:cosh}) and~(\ref{cexp:250:sinh}), we have that \bq{cexp:250:20} \begin{split} \cosh z &= \cos iz, \\ i \sinh z &= \sin iz, \\ i \tanh z &= \tan iz, \end{split} \eq by which one can immediately adapt the many trigonometric properties of Tables~\ref{trig:228:table} and~\ref{trig:275:table} to hyperbolic use. 
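For instance, if one trusts the familiar double-angle identity $\sin 2w = 2\sin w\cos w$, then putting $w = iz$ and applying~(\ref{cexp:250:20}) gives by successive steps that \bqb \sin i2z &=& 2\sin iz \cos iz, \\ i\sinh 2z &=& (2)(i\sinh z)(\cosh z), \\ \sinh 2z &=& 2\sinh z \cosh z, \eqb the identity's hyperbolic analog.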
\index{natural exponential family of functions} \index{natural logarithmic family of functions} \index{trigonometric family of functions} \index{inverse trigonometric family of functions} At this point in the development one begins to notice that the $\sin$, $\cos$, $\exp$, $\cis$, $\cosh$ and $\sinh$ functions are each really just different facets of the same mathematical phenomenon. Likewise their respective inverses: $\arcsin$, $\arccos$, $\ln$, $-i\ln$, $\mopx{arccosh}$ and $\mopx{arcsinh}$. Conventional names for these two mutually inverse families of functions are unknown to the author, but one might call them the \emph{natural exponential} and \emph{natural logarithmic families.} Or, if the various tangent functions were included, then one might call them the \emph{trigonometric} and \emph{inverse trigonometric families.} % diagn: this subsection wants a bit more review. \subsection{Inverse complex trigonometrics} \label{cexp:250.30} \index{trigonometrics!inverse complex} \index{complex trigonometrics!inverse} Since one can express the several trigonometric functions in terms of complex exponentials one would like to know, complementarily, whether one cannot express the several inverse trigonometric functions in terms of complex logarithms. As it happens, one can.% \footnote{\cite[Ch.~2]{Spiegel}} Let us consider the arccosine function, for instance. If per~(\ref{cexp:250:cos}) \[ z = \cos w = \frac{e^{iw}+e^{-iw}}{2}, \] then by successive steps \bqb e^{iw} &=& 2z - e^{-iw}, \\ \left(e^{iw}\right)^2 &=& 2z\left(e^{iw}\right) - 1, \\ e^{iw} &=& z \pm \sqrt{ z^2 - 1 }, \eqb the last step of which has used the quadratic formula~(\ref{alggeo:240:quad}). Taking the logarithm, we have that \[ w = \frac{1}{i}\ln\left(z \pm i\sqrt{ 1 - z^2 }\right); \] or, since by definition $z=\cos w$, that \bq{cexp:250:arccos} \index{arccosine!in complex exponential form} \arccos z = \frac{1}{i}\ln\left(z \pm i\sqrt{ 1 - z^2 }\right). \eq Similarly, \bq{cexp:250:arcsin} \index{arcsine!in complex exponential form} \arcsin z = \frac{1}{i}\ln\left(iz \pm \sqrt{ 1 - z^2 }\right). \eq The arctangent goes only a little differently: \bqb z = \tan w &=& -i\frac{e^{iw}-e^{-iw}}{e^{iw}+e^{-iw}}, \\ ze^{iw}+ze^{-iw} &=& -ie^{iw}+ie^{-iw}, \\ (i+z)e^{iw} &=& (i-z)e^{-iw}, \\ e^{i2w} &=& \frac{i-z}{i+z}, \eqb implying that \bq{cexp:250:arctan} \index{arctangent!in complex exponential form} \arctan z = \frac{1}{i2}\ln\frac{i-z}{i+z}. \eq By the same means, one can work out the inverse hyperbolics to be \bq{cexp:250:archyperbolic} \index{hyperbolic functions!inverse, in complex exponential form} \index{inverse hyperbolic functions!in complex exponential form} \begin{split} \mopx{arccosh} z &= \ln\left(z \pm \sqrt{ z^2 - 1 }\right), \\ \mopx{arcsinh} z &= \ln\left(z \pm \sqrt{ z^2 + 1 }\right), \\ \mopx{arctanh} z &= \frac{1}{2}\ln\frac{1+z}{1-z}. \end{split} \eq % ---------------------------------------------------------------------- \section{Summary of properties} \label{cexp:255} Table~\ref{cexp:tbl-prop} gathers properties of the complex exponential from this chapter and from \S\S~\ref{alggeo:225}, \ref{trig:280} and~\ref{drvtv:240}. 
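As a quick check of one of the gathered formulas, one might try $z=1$ in the table's expression for $\arctan z$: \[ \arctan 1 = \frac{1}{i2}\ln\frac{i-1}{i+1} = \frac{1}{i2}\ln i = \frac{1}{i2}\left(i\frac{2\pi}{4}\right) = \frac{2\pi}{8}, \] which indeed is the angle whose tangent is unity.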
\begin{table} \caption{Complex exponential properties.} \label{cexp:tbl-prop} \index{complex exponential!properties of} \settowidth\tla{$\ds\rho_1\rho_2 e^{i(\phi_1 + \phi_2)}$} \bqb i^2 &=& -1 = (-i)^2 \\ \frac{1}{i} &=& -i \\ e^{i\phi} &=& \cos \phi + i\sin \phi \\ e^{iz} &=& \cos z + i\sin z \\ z_1z_2 &=& \makebox[\tla][l]{$\ds\rho_1\rho_2 e^{i(\phi_1 + \phi_2)}$} = (x_1x_2 - y_1y_2) + i(y_1x_2 + x_1y_2) \\ \frac{z_1}{z_2} &=& \makebox[\tla][l]{$\ds\frac{\rho_1}{\rho_2} e^{i(\phi_1 - \phi_2)}$} = \frac{(x_1x_2+y_1y_2)+i(y_1x_2-x_1y_2)}{x_2^2+y_2^2} \\ z^a &=& \rho^a e^{ia\phi} \\ w^z &=& e^{x\ln\sigma-\psi y} e^{i(y\ln\sigma+\psi x)} \\ \ln w &=& \ln\sigma + i\psi \eqb \[ \renewcommand\arraystretch{2.0} \br{rclcrclcrcl} \ds\sin z &\!=\!& \ds\frac{e^{i z}-e^{-i z}}{i2} &\sh{1.0}& \ds\sin iz &\!=\!& \ds i\sinh z &\sh{1.0}& \ds\sinh z &\!=\!& \ds\frac{e^{ z}-e^{- z}}{2} \\ \ds\cos z &\!=\!& \ds\frac{e^{i z}+e^{-i z}}{2} &\sh{1.0}& \ds\cos iz &\!=\!& \ds \cosh z &\sh{1.0}& \ds\cosh z &\!=\!& \ds\frac{e^{ z}+e^{- z}}{2} \\ \ds\tan z &\!=\!& \ds\frac{\sin z}{\cos z} &\sh{1.0}& \ds\tan iz &\!=\!& \ds i\tanh z &\sh{1.0}& \ds\tanh z &\!=\!& \ds\frac{\sinh z}{\cosh z} \er \] \[ \renewcommand\arraystretch{2.0} \br{rclcrcl} \arcsin z &=& \ds\frac{1}{i}\ln\left(iz \pm \sqrt{ 1 - z^2 }\right) &\sh{1.0}& \mopx{arcsinh} z &=& \ds\ln\left(z \pm \sqrt{ z^2 + 1 }\right) \\ \arccos z &=& \ds\frac{1}{i}\ln\left(z \pm i\sqrt{ 1 - z^2 }\right) &\sh{1.0}& \mopx{arccosh} z &=& \ds\ln\left(z \pm \sqrt{ z^2 - 1 }\right) \\ \arctan z &=& \ds\frac{1}{i2}\ln\frac{i-z}{i+z} &\sh{1.0}& \mopx{arctanh} z &=& \ds\frac{1}{2}\ln\frac{1+z}{1-z} \er \] \[ \cos^2 z + \sin^2 z = 1 = \cosh^2 z - \sinh^2 z \] \[ \renewcommand\arraystretch{2.0} \br{rclcrcl} \ds z &\equiv& \ds x+iy = \rho e^{i\phi} &\sh{1.0}& \ds \frac{d}{dz}\exp z &=& \ds \exp z \\ \ds w &\equiv& \ds u+iv = \sigma e^{i\psi} &\sh{1.0}& \ds \frac{d}{dw}\ln w &=& \ds \frac{1}{w} \\ \ds \exp z &\equiv& \ds e^{z} &\sh{1.0}& \ds \frac{df/dz}{f(z)} &=& \ds \frac{d}{dz}\ln f(z) \\ \ds \cis z &\equiv& \ds \cos z + i\sin z = e^{iz} &\sh{1.0}& \ds \log_b w &=& \ds \frac{\ln w}{\ln b} \er \] \end{table} % ---------------------------------------------------------------------- \section{Derivatives of complex exponentials} \label{cexp:320} \index{derivative!of a complex exponential} \index{complex exponential!derivative of} \index{complex exponential!inverse, derivative of} \index{inverse complex exponential!derivative of} This section computes the derivatives of the various trigonometric and inverse trigonometric functions. \subsection{Derivatives of sine and cosine} \label{cexp:320.10} \index{sine!derivative of} \index{cosine!derivative of} \index{derivative!of sine and cosine} One can compute derivatives of the sine and cosine functions from~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}), but to do it in that way doesn't seem sporting. Better applied style is to find the derivatives by observing directly the circle from which the sine and cosine functions come. Refer to Fig.~\ref{cexp:320:fig}. Suppose that the point~$z$ in the figure is not fixed but travels steadily about the circle such that \bq{cexp:320:30} z(t) = (\rho)\left[\cos(\omega t+\phi_o) + i\sin(\omega t+\phi_o)\right]. \eq How fast then is the rate $dz/dt$, and in what Argand direction? Answer: \bq{cexp:320:31} \frac{dz}{dt} = (\rho)\left[\frac{d}{dt}\cos(\omega t+\phi_o) + i\frac{d}{dt}\sin(\omega t+\phi_o)\right]. 
\eq \begin{figure} \caption{The derivatives of the sine and cosine functions.} \label{cexp:320:fig} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxb{1.0} \nc\xxc{2.0} \nc\xxp{0.8} \nc\xxq{26.565} \nc\xxqq{13.283} \nc\xxr{2.2361} \nc\xxrr{1.30} \nc\xxrrr{2.3} \nc\xxs{2.6} \nc\xxlsep{0.20} \nc\xxa{ \psline[linewidth=0.5pt](-\xx,0)(\xx,0) } \xxa \rput{90}(0,0){\xxa} \psline[linewidth=2.0pt]{cc-*}(0,0)(2,1) \rput(2,1){ \psline[linewidth=2.0pt]{cc->}(0,0)(-0.2,0.4) } \pscircle[linewidth=0.5pt,linestyle=dashed](0,0){\xxr} \rput(2.25,0.90){$z$} \rput(2.10,1.70){$\ds\frac{dz}{dt}$} \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{0}{\xxq} \rput(1.45,0.25){$\omega t+\phi_o$} \rput{\xxq}(0,0){ \uput{\xxlsep}[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } } \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$\Im(z)$} \end{pspicture} } \ec \end{figure}% Evidently however, considering the figure, \bi \item the speed $\left|dz/dt\right|$ is also $(\rho)(d\phi/dt) = \rho\omega$; \item the direction is at right angles to the arm of~$\rho$, which is to say that $\arg(dz/dt) = \phi + 2\pi/4$. \ei With these observations we can write that % \footnote{ % As an applied mathematician, you really need to be able to see the % truth of this equation merely by staring hard at % Fig.~\ref{cexp:320:fig}. If you cannot see it yet, this is okay; but % stop now and stare hard at the figure until you can. The point % travels about the circle. In what direction? At what speed? Such % reflection can lead one nowhere other than to the expression given for % $dz/dt$. % } \bqa \frac{dz}{dt} &=& (\rho\omega)\left[ \cos\left(\omega t+\phi_o + \frac{2\pi}{4}\right) +i\sin\left(\omega t+\phi_o + \frac{2\pi}{4}\right) \right] \xn\\ &=& (\rho\omega)\left[-\sin(\omega t+\phi_o) + i\cos(\omega t+\phi_o)\right]. \label{cexp:320:32} \eqa Matching the real and imaginary parts of~(\ref{cexp:320:31}) against those of~(\ref{cexp:320:32}), we have that \bq{cexp:320:10} \begin{split} \frac{d}{dt}\cos (\omega t + \phi_o) &= -\omega\sin (\omega t + \phi_o), \\ \frac{d}{dt}\sin (\omega t + \phi_o) &= +\omega\cos (\omega t + \phi_o). \end{split} \eq If $\omega = 1$ and $\phi_o = 0$, these are \bq{cexp:320:12} \begin{split} \frac{d}{dt}\cos t &= -\sin t, \\ \frac{d}{dt}\sin t &= +\cos t. \end{split} \eq \subsection{Derivatives of the trigonometrics} \label{cexp:320.20} Equations~(\ref{cexp:220:dexp}) and~(\ref{cexp:320:12}) give the derivatives of $\exp(\cdot)$, $\sin(\cdot)$ and $\cos(\cdot)$. 
From these, with the help of~(\ref{cexp:250:pythag}) and the derivative chain and product rules (\S~\ref{drvtv:250}), we can calculate the several derivatives of Table~\ref{cexp:drv}.% \footnote{\cite[back endpaper]{Shenk}} \begin{table} \caption{Derivatives of the trigonometrics.} \label{cexp:drv} \index{trigonometric function!derivative of} \index{natural exponential!derivative of} \index{exponential, natural!derivative of} \index{sine!derivative of} \index{cosine!derivative of} \index{tangent!derivative of} \index{derivative!of a trigonometric} \index{derivative!of the natural exponential} \index{derivative!of sine, cosine and tangent} { \renewcommand\arraystretch{2.0} \bqb \br{rclcrcl} \ds\frac{d}{dz} \exp z &=& \ds+\exp z &\sh{1.0}& \ds\frac{d}{dz} \frac{1}{\exp z} &=& \ds-\frac{1}{\exp z} \\ \ds\frac{d}{dz} \sin z &=& \ds+\cos z &\sh{1.0}& \ds\frac{d}{dz} \frac{1}{\sin z} &=& \ds-\frac{1}{\tan z\sin z} \\ \ds\frac{d}{dz} \cos z &=& \ds-\sin z &\sh{1.0}& \ds\frac{d}{dz} \frac{1}{\cos z} &=& \ds+\frac{\tan z}{\cos z} \er \eqb \bqb \br{rclcl} \ds\frac{d}{dz} \tan z &=& +\left(\ds 1+\tan^2 z\right) &=& \ds+\frac{1}{\cos^2 z} \\ \ds\frac{d}{dz} \frac{1}{\tan z} &=& \ds-\left(1+\frac{1}{\tan^2 z}\right) &=& \ds-\frac{1}{\sin^2 z} \er \eqb \bqb \br{rclcrcl} \ds\frac{d}{dz} \sinh z &=& \ds+\cosh z &\sh{1.0}& \ds\frac{d}{dz} \frac{1}{\sinh z} &=& \ds-\frac{1}{\tanh z\sinh z} \\ \ds\frac{d}{dz} \cosh z &=& \ds+\sinh z &\sh{1.0}& \ds\frac{d}{dz} \frac{1}{\cosh z} &=& \ds-\frac{\tanh z}{\cosh z} \er \eqb \bqb \br{rclcl} \ds\frac{d}{dz} \tanh z &=& \ds 1-\tanh^2 z &=& \ds+\frac{1}{\cosh^2 z} \\ \ds\frac{d}{dz} \frac{1}{\tanh z} &=& \ds 1-\frac{1}{\tanh^2 z} &=& \ds-\frac{1}{\sinh^2 z} \er \eqb } \end{table} \subsection{Derivatives of the inverse trigonometrics} \label{cexp:320.30} Observe the pair \[ \begin{split} \frac{d}{dz}\exp z &= \exp{z}, \\ \frac{d}{dw}\ln w &= \frac{1}{w}. \end{split} \] The natural exponential $\exp z$ belongs to the trigonometric family of functions, as does its derivative. The natural logarithm $\ln w$, by contrast, belongs to the inverse trigonometric family of functions; but its derivative is simpler, not a trigonometric or inverse trigonometric function at all. In Table~\ref{cexp:drv}, one notices that all the trigonometrics have trigonometric derivatives. By analogy with the natural logarithm, do all the inverse trigonometrics have simpler derivatives? It turns out that they do. Refer to the account of the natural logarithm's derivative in \S~\ref{cexp:225}. Following a similar procedure, we have by successive steps that \bqa \arcsin w &=& z, \xn\\ w &=& \sin z, \xn\\ \frac{dw}{dz} &=& \cos z, \xn\\ \frac{dw}{dz} &=& \pm\sqrt{1 - \sin^2 z}, \xn\\ \frac{dw}{dz} &=& \pm\sqrt{1 - w^2}, \xn\\ \frac{dz}{dw} &=& \frac{\pm 1}{\sqrt{1 - w^2}}, \xn\\ \frac{d}{dw} \arcsin w &=& \frac{\pm 1}{\sqrt{1 - w^2}}. \label{cexp:320:20} \eqa Similarly, \bqa \arctan w &=& z, \xn\\ w &=& \tan z, \xn\\ \frac{dw}{dz} &=& 1 + \tan^2 z, \xn\\ \frac{dw}{dz} &=& 1 + w^2, \xn\\ \frac{dz}{dw} &=& \frac{1}{1 + w^2}, \xn\\ \frac{d}{dw} \arctan w &=& \frac{1}{1 + w^2}. \label{cexp:320:22} \eqa Derivatives of the other inverse trigonometrics are found in the same way. Table~\ref{cexp:drvi} summarizes. 
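For example, for the inverse hyperbolic cosine the derivation runs \bqb \mopx{arccosh} w &=& z, \\ w &=& \cosh z, \\ \frac{dw}{dz} &=& \sinh z = \pm\sqrt{\cosh^2 z - 1} = \pm\sqrt{w^2-1}, \\ \frac{dz}{dw} &=& \frac{\pm 1}{\sqrt{w^2-1}}, \\ \frac{d}{dw}\mopx{arccosh} w &=& \frac{\pm 1}{\sqrt{w^2-1}}, \eqb wherein~(\ref{cexp:250:pythag}) has supplied the radical, in agreement with the table's corresponding entry.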
\begin{table} \caption{Derivatives of the inverse trigonometrics.} \label{cexp:drvi} \index{trigonometric function!inverse, derivative of} \index{natural logarithm!derivative of} \index{logarithm, natural!derivative of} \index{arcsine!derivative of} \index{arccosine!derivative of} \index{arctangent!derivative of} \index{derivative!of an inverse trigonometric} \index{derivative!of the natural logarithm} \index{derivative!of arcsine, arccosine and arctangent} { \renewcommand\arraystretch{2.0} \bqb \br{rcl} \ds\frac{d}{dw} \ln w &=& \ds\frac{1}{w} \\ \ds\frac{d}{dw} \arcsin w &=& \ds\frac{\pm 1}{\sqrt{1-w^2}} \\ \ds\frac{d}{dw} \arccos w &=& \ds\frac{\mp 1}{\sqrt{1-w^2}} \\ \ds\frac{d}{dw} \arctan w &=& \ds\frac{1}{1+w^2} \\ \ds\frac{d}{dw} \mopx{arcsinh} w &=& \ds\frac{\pm 1}{\sqrt{w^2+1}} \\ \ds\frac{d}{dw} \mopx{arccosh} w &=& \ds\frac{\pm 1}{\sqrt{w^2-1}} \\ \ds\frac{d}{dw} \mopx{arctanh} w &=& \ds\frac{1}{1-w^2} \er \eqb } \end{table} % --------------------------------------------------------------------- \section{The actuality of complex quantities} \label{cexp:260} \index{complex number!actuality of} \index{number!complex, actuality of} Doing all this neat complex math, the applied mathematician can lose sight of some questions he probably ought to keep in mind: Is there really such a thing as a complex quantity in nature? If not, then hadn't we better avoid these complex quantities, leaving them to the professional mathematical theorists? \index{Heaviside, Oliver (1850--1925)} As developed by Oliver Heaviside in 1887,% \footnote{\cite{mathbios}} the answer depends on your point of view. If I have~$300$~g of grapes and~$100$~g of grapes, then I have~$400$~g altogether. Alternately, if I have~$500$~g of grapes and~$-100$~g of grapes, again I have~$400$~g altogether. (What does it mean to have~$-100$~g of grapes? Maybe that I ate some!) But what if I have~$200+i100$~g of grapes and~$200-i100$~g of grapes? Answer: again,~$400$~g. \index{wave!propagating} \index{wave!complex} \index{conjugate} \index{superposition} \index{mirror} \index{handwriting, reflected} \index{grapes} Probably you would not choose to think of~$200+i100$~g of grapes and~$200-i100$~g of grapes, but because of~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}), one often describes wave phenomena as linear superpositions (sums) of countervailing complex exponentials. Consider for instance the propagating wave \[ A\cos[\omega t-kz] = \frac{A}{2}\exp[+i(\omega t-kz)] + \frac{A}{2}\exp[-i(\omega t-kz)]. \] The benefit of splitting the real cosine into two complex parts is that while the magnitude of the cosine changes with time~$t$, the magnitude of either exponential alone remains steady (see the circle in Fig.~\ref{cexp:230:fig}). It turns out to be much easier to analyze two complex wave quantities of constant magnitude than to analyze one real wave quantity of varying magnitude. Better yet, since each complex wave quantity is the complex conjugate of the other, the analyses thereof are mutually conjugate, too; so you normally needn't actually analyze the second. The one analysis suffices for both.% \footnote{ If the point is not immediately clear, an example: Suppose that by the Newton-Raphson iteration (\S~\ref{drvtv:270}) you have found a root of the polynomial $x^3 + 2x^2 + 3x + 4$ at $x \approx -\mbox{0x0.2D}+i\mbox{0x1.8C}$. Where is there another root? Answer: at the complex conjugate, $x \approx -\mbox{0x0.2D}-i\mbox{0x1.8C}$. One need not actually run the Newton-Raphson again to find the conjugate root. 
} % The following analogy admittedly remains a little silly, but it stays % here unless and until the author thinks of something better. (It's like reflecting your sister's handwriting. To read her handwriting backward, you needn't ask her to try writing reverse with the wrong hand; you can just hold her regular script up to a mirror. Of course, this ignores the question of why one would want to reflect someone's handwriting in the first place; but anyway, reflecting---which is to say, conjugating---complex quantities often is useful.) \index{Ockham's razor!abusing} \index{Ockham!William of (c.~1287--1347)} \index{Aristotle (384--322~B.C.)} Some authors have gently denigrated the use of imaginary parts in physical applications as a mere mathematical trick, as though the parts were not actually there. Well, that is one way to treat the matter, but it is not the way this book recommends. Nothing in the mathematics \emph{requires} you to regard the imaginary parts as physically nonexistent. You need not abuse Ockham's razor! % diagn: the following parenthetical note wants review. (Ockham's razor, ``Do not multiply objects without necessity,''% \footnote{ \cite[Ch.~12]{Stroustrup} } is not a bad philosophical indicator as far as it goes, but is overused in some circles---particularly in circles in which Aristotle% \footnote{\cite{Feser}} is mistakenly believed to be vaguely outdated. More often than one likes to believe, the necessity to multiply objects remains hidden until one has ventured the multiplication, nor reveals itself to the one who wields the razor, whose hand humility should stay.) It is true by Euler's formula~(\ref{cexp:euler}) that a complex exponential $\exp i\phi$ can be decomposed into a sum of trigonometrics. However, it is equally true by the complex trigonometric formulas~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}) that \emph{a trigonometric can be decomposed into a sum of complex exponentials.} So, if each can be decomposed into the other, then which of the two is the real decomposition? Answer: it depends on your point of view. Experience seems to recommend viewing the complex exponential as the basic element---as the element of which the trigonometrics are composed---rather than the other way around. From this point of view, it is~(\ref{cexp:250:cos}) and~(\ref{cexp:250:sin}) which are the real decomposition. Euler's formula itself is secondary. % diagn: review this revised paragraph one more time. The complex exponential method of offsetting imaginary parts offers an elegant yet practical mathematical means to model physical wave phenomena. So go ahead: regard the imaginary parts as actual. Aristotle would regard them so (or so the author suspects). %\footnote{ % A famous English-language physics book of the twentieth century, which % this particular footnote will not name but which was and remains % otherwise an excellent book, has unsubtly insinuated among a % generation of scientists and engineers an impertinent contempt for % philosophy as such. Oddly, the book in question, whose genial, late % author wore Nobel laurels, seems never to denigrate ``teleology'' or % even ``metaphysics'' by name, only ``philosophy''---or % ``philosophers''---by a curiously vague terminological inexactitude. % The book in question seems to delight in the burning of philosophical % straw men in any case. 
Be that as it may, the author of the book you % are \emph{now} reading cordially dissents from the insinuation and, % for what it might be worth among philosophers, inclines toward % Aristotle rather than toward William of Ockham. %} To regard the imaginary parts as actual hurts nothing, and it helps with the math. derivations-0.53.20120414.orig/tex/mtxinv.tex0000644000000000000000000031323011742566274017210 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Inversion and orthonormalization} \label{mtxinv} \index{inversion} \index{orthonormalization} \index{matrix!inversion of} The undeniably tedious Chs.~\ref{matrix} and~\ref{gjrank} have piled the matrix theory deep while affording scant practical reward. Building upon the two tedious chapters, this chapter brings the first rewarding matrix work. \index{solution} One might be forgiven for forgetting after so many pages of abstract theory that the matrix afforded any reward or had any use at all. Uses however it has. Sections~\ref{matrix:120.10} and~\ref{gjrank:340.24} have already broached% \footnote{ The reader who has skipped Ch.~\ref{gjrank} might at least review \S~\ref{gjrank:340.24}. } the matrix's most basic use, the primary subject of this chapter, to represent a system of~$m$ linear scalar equations in~$n$ unknowns neatly as \[ A \ve x = \ve b \] and to solve the whole system at once by inverting the matrix~$A$ that characterizes it. Now, before we go on, we want to confess that such a use alone, on the surface of it---though interesting---might not have justified the whole uncomfortable bulk of Chs.~\ref{matrix} and~\ref{gjrank}. We already knew how to solve a simultaneous system of linear scalar equations in principle without recourse to the formality of a matrix, after all, as in the last step to derive~(\ref{trig:250:20}) as far back as Ch.~\ref{trig}. Why should we have suffered two bulky chapters, if only to prepare to do here something we already knew how to do? The question is a fair one, but admits at least four answers. First, the matrix neatly solves a linear system not only for a particular driving vector~$\ve b$ but for all possible driving vectors~$\ve b$ at one stroke, as this chapter explains. Second and yet more impressively, the matrix allows \S~\ref{mtxinv:320} to introduce the \emph{pseudoinverse} to approximate the solution to an unsolvable linear system and, moreover, to do so both optimally and efficiently, whereas such overdetermined systems arise commonly in applications. Third, to solve the linear system neatly is only the primary and most straightforward use of the matrix, not its only use: the even more interesting eigenvalue and its incidents await Ch.~\ref{eigen}. Fourth, specific applications aside, one should never underestimate the blunt practical benefit of reducing an arbitrarily large grid of scalars to a single symbol~$A$, which one can then manipulate by known algebraic rules. Most students first learning the matrix have wondered at this stage whether it were worth all the tedium; so, if the reader now wonders, then he stands in good company. The matrix finally begins to show its worth here. The chapter opens in \S~\ref{mtxinv:220} by inverting the square matrix to solve the exactly determined, $n \times n$ linear system in \S~\ref{mtxinv:230}. It continues in \S~\ref{mtxinv:245} by computing the rectangular matrix's kernel to solve the nonoverdetermined, $m \times n$ linear system in \S~\ref{mtxinv:240}. 
In \S~\ref{mtxinv:320}, it brings forth the aforementioned pseudoinverse, which rightly approximates the solution to the unsolvable overdetermined linear system. After briefly revisiting the Newton-Raphson iteration in \S~\ref{mtxinv:420}, it concludes by introducing the concept and practice of vector orthonormalization in \S\S~\ref{mtxinv:445} through~\ref{mtxinv:465}. % ---------------------------------------------------------------------- \section{Inverting the square matrix} \label{mtxinv:220} \index{matrix!square} \index{square matrix} \index{matrix!inversion of} \index{inversion} \index{inverse!rank-$r$} \index{rank-$r$ inverse} Consider an $n \times n$ square matrix~$A$ of full rank $r=n$. Suppose that extended operators~$G_>$, $G_<$, $G_>^{-1}$ and~$G_<^{-1}$ can be found, each with an $n \times n$ active region (\S~\ref{matrix:180.35}), such that% \footnote{ The symbology and associated terminology might disorient a reader who had skipped Chs.~\ref{matrix} and~\ref{gjrank}. In this book, the symbol~$I$ theoretically represents an $\infty \times \infty$ identity matrix. Outside the $m \times m$ or $n \times n$ square, the operators~$G_>$ and~$G_<$ each resemble the $\infty \times \infty$ identity matrix~$I$, which means that the operators affect respectively only the first~$m$ rows or~$n$ columns of the thing they operate on. (In the present section it happens that $m=n$ because the matrix~$A$ of interest is square, but this footnote uses both symbols because generally $m \neq n$.) The symbol~$I_r$ contrarily represents an identity matrix of only~$r$ ones, though it too can be viewed as an $\infty \times \infty$ matrix with zeros in the unused regions. If interpreted as an $\infty \times \infty$ matrix, the matrix~$A$ of the $m \times n$ system $A\ve x = \ve b$ has nonzero content only within the $m \times n$ rectangle. None of this is complicated, really. Its purpose is merely to separate the essential features of a reversible operation like~$G_>$ or~$G_<$ from the dimensionality of the vector or matrix on which the operation happens to operate. The definitions do however necessarily, slightly diverge from definitions the reader may have been used to seeing in other books. In this book, one can legally multiply any two matrices, because all matrices are theoretically $\infty \times \infty$, anyway (though whether it makes any sense in a given circumstance to multiply mismatched matrices is another question; sometimes it does make sense, as in eqns.~\ref{mtxinv:240:50} and~\ref{eigen:600:SVD}, but more often it does not---which naturally is why the other books tend to forbid such multiplication). To the extent that the definitions confuse, the reader might briefly review the earlier chapters, especially \S~\ref{matrix:180}. } \bq{mtxinv:220:10} \begin{split} G_>^{-1}G_> &= I = G_>G_>^{-1}, \\ G_<^{-1}G_< &= I = G_<G_<^{-1}, \\ A &= G_>I_nG_<. \end{split} \eq Observing from~(\ref{matrix:180:35}) that \[ \renewcommand\arraystretch{1.3} \setlength\arraycolsep{0.50\arraycolsep} % normally would be 0.30 \br{rcccl} I_nA &=& A &=& AI_n, \\ I_nG_<^{-1}G_>^{-1} &=& G_<^{-1}I_nG_>^{-1} &=& G_<^{-1}G_>^{-1}I_n, \er \] we find by successive steps that \bqb A &=& G_>I_nG_<, \\ I_nA &=& G_>G_<I_n, \\ G_<^{-1}G_>^{-1}I_nA &=& I_n, \\ (G_<^{-1}I_nG_>^{-1})(A) &=& I_n; \eqb or alternately that \bqb A &=& G_>I_nG_<, \\ AI_n &=& I_nG_>G_<, \\ AI_nG_<^{-1}G_>^{-1} &=& I_n, \\ (A)(G_<^{-1}I_nG_>^{-1}) &=& I_n. \eqb Either way, we have that \bq{mtxinv:220:20} \begin{split} A^{-1}A &= I_n = AA^{-1}, \\ A^{-1} &\equiv G_<^{-1}I_nG_>^{-1}.
\end{split} \eq Of course, for this to work,~$G_>$, $G_<$, $G_>^{-1}$ and~$G_<^{-1}$ must exist, be known and honor $n \times n$ active regions, which might seem a practical hurdle. However,~(\ref{gjrank:341:GJ}), (\ref{gjrank:341:GJinv}) and the body of \S~\ref{gjrank:341} have shown exactly how to find just such a~$G_>$, $G_<$, $G_>^{-1}$ and~$G_<^{-1}$ for any square matrix~$A$ of full rank, without exception; so, there is no trouble here. The factors do exist, and indeed we know how to find them. Equation~(\ref{mtxinv:220:20}) features the important matrix~$A^{-1}$, the \emph{rank-$n$ inverse} of~$A$. We have not yet much studied the rank-$n$ inverse, but we have defined it in~(\ref{matrix:321:20}), where we gave it the fuller, nonstandard notation $A^{-1(n)}$. When naming the rank-$n$ inverse in words one usually says simply, ``the inverse,'' because the rank is implied by the size of the square active region of the matrix inverted; but the rank-$n$ inverse from~(\ref{matrix:321:20}) is not quite the infinite-dimensional inverse from~(\ref{matrix:321:10}), which is what~$G_>^{-1}$ and~$G_<^{-1}$ are. According to~(\ref{mtxinv:220:20}), the product of~$A^{-1}$ and~$A$---or, written more fully, the product of $A^{-1(n)}$ and~$A$---is, not~$I$, but~$I_n$. \index{inverse!existence of} \index{inverse!uniqueness of} \index{inverse!mutual} \index{noninvertibility} \index{reciprocal pair} Properties that emerge from~(\ref{mtxinv:220:20}) include the following. \bi \item Like~$A$, the rank-$n$ inverse~$A^{-1}$ (more fully written $A^{-1(n)}$) too is an $n \times n$ square matrix of full rank $r=n$. \item Since~$A$ is square and has full rank (\S~\ref{gjrank:340.25}), its rows and, separately, its columns are linearly independent, so it has only the one, unique inverse~$A^{-1}$. No other rank-$n$ inverse of~$A$ exists. \item On the other hand, inasmuch as~$A$ is square and has full rank, it does per~(\ref{mtxinv:220:20}) indeed have an inverse~$A^{-1}$. The rank-$n$ inverse exists. \item If $B=A^{-1}$ then $B^{-1}=A$. That is,~$A$ is itself the rank-$n$ inverse of~$A^{-1}$. The matrices~$A$ and~$A^{-1}$ thus form an exclusive, reciprocal pair. \item If~$B$ is an $n \times n$ square matrix and either $BA = I_n$ or $AB = I_n$, then both equalities in fact hold; thus, $B = A^{-1}$. One can have neither equality without the other. \item Only a square, $n \times n$ matrix of full rank $r=n$ has a rank-$n$ inverse. A matrix~$A'$ which is not square, or whose rank falls short of a full $r=n$, is not invertible in the rank-$n$ sense of~(\ref{mtxinv:220:20}). \ei That~$A^{-1}$ is an $n \times n$ square matrix of full rank and that~$A$ is itself the inverse of~$A^{-1}$ proceed from the definition~(\ref{mtxinv:220:20}) of~$A^{-1}$ plus \S~\ref{gjrank:340.20}'s finding that reversible operations like~$G_>^{-1}$ and~$G_<^{-1}$ cannot change $I_n$'s rank. That the inverse exists is plain, inasmuch as the Gauss-Jordan decomposition plus~(\ref{mtxinv:220:20}) reliably calculate it. That the inverse is unique begins from \S~\ref{gjrank:340.25}'s observation that the columns (like the rows) of~$A$ are linearly independent because~$A$ is square and has full rank. From this beginning and the fact that $I_n = AA^{-1}$, it follows that $[A^{-1}]_{{*}1}$ represents% \footnote{ The notation $[A^{-1}]_{{*}j}$ means ``the $j$th column of~$A^{-1}$.'' Refer to \S~\ref{matrix:120.27}. 
} the one and only possible combination of $A$'s columns which achieves~$\ve e_1$, that $[A^{-1}]_{{*}2}$ represents the one and only possible combination of $A$'s columns which achieves~$\ve e_2$, and so on through~$\ve e_n$. One could observe likewise respecting the independent rows of~$A$. Either way,~$A^{-1}$ is unique. Moreover, no other $n \times n$ matrix $B \neq A^{-1}$ satisfies \emph{either} requirement of~(\ref{mtxinv:220:20})---that $BA = I_n$ or that $AB = I_n$---much less both. \index{uniqueness} It is not claimed that the matrix factors~$G_>$ and~$G_<$ themselves are unique, incidentally. On the contrary, many different pairs of matrix factors~$G_>$ and~$G_<$ can yield $A = G_>I_nG_<$, no less than that many different pairs of scalar factors~$\gamma_>$ and~$\gamma_<$ can yield $\alpha = \gamma_>1\gamma_<$. Though the Gauss-Jordan decomposition is a convenient means to~$G_>$ and~$G_<$, it is hardly the only means, and any proper~$G_>$ and~$G_<$ found by any means will serve so long as they satisfy~(\ref{mtxinv:220:10}). What are unique are not the factors but the~$A$ and~$A^{-1}$ they produce. \index{square matrix!degenerate} \index{matrix!degenerate} \index{matrix!singular} \index{degenerate matrix} \index{singular matrix} \index{reciprocal} What of the degenerate $n \times n$ square matrix~$A'$, of rank $r<n$? Lacking full rank, such a matrix has no inverse in the rank-$n$ sense of~(\ref{mtxinv:220:20}); it is what convention calls a \emph{singular matrix.} To derive the Gauss-Jordan kernel formula~(\ref{mtxinv:245:kernel}), one can write the system $A\ve x = \ve b$ in terms of the Gauss-Jordan decomposition $A = G_>I_rKS$ and manipulate by successive steps: \bqb G_>I_rKS\ve x &=& \ve b, \\ I_rKS\ve x &=& G_>^{-1}\ve b, \\ I_r(K-I)S\ve x + I_rS\ve x &=& G_>^{-1}\ve b. \eqb Applying an identity from Table~\ref{gjrank:341:t34} on page~\pageref{gjrank:341:t34}, \[ I_rK(I_n-I_r)S\ve x + I_rS\ve x = G_>^{-1}\ve b. \] Rearranging terms, \bq{mtxinv:245:30} I_rS\ve x = G_>^{-1}\ve b - I_rK(I_n-I_r)S\ve x. \eq \index{free element} \index{dependent element} \index{element!free and dependent} \index{driving vector} \index{vector!driving} Equation~(\ref{mtxinv:245:30}) is interesting. It has~$S\ve x$ on both sides, where~$S\ve x$ is the vector~$\ve x$ with elements reordered in some particular way. The equation has however on the left only~$I_rS\ve x$, which is the first~$r$ elements of~$S\ve x$; and on the right only $(I_n-I_r)S\ve x$, which is the remaining $n-r$ elements.% \footnote{ Notice how we now associate the factor $(I_n-I_r)$ rightward as a row truncator, though it had first entered acting leftward as a column truncator. The flexibility to reassociate operators in such a way is one of many good reasons Chs.~\ref{matrix} and~\ref{gjrank} have gone to such considerable trouble to develop the basic theory of the matrix. } No element of~$S\ve x$ appears on both sides. Naturally this is no accident; we have (probably after some trial and error not recorded here) planned the steps leading to~(\ref{mtxinv:245:30}) to achieve precisely this effect. Equation~(\ref{mtxinv:245:30}) implies \emph{that one can choose the last $n-r$ elements of~$S\ve x$ freely, but that the choice then determines the first~$r$ elements.} \index{vector space} \index{column} \index{matrix!column of} \index{domain} \index{range} The implication is significant. To express the implication more clearly we can rewrite~(\ref{mtxinv:245:30}) in the improved form \bq{mtxinv:245:32} \begin{split} \ve f &= G_>^{-1}\ve b - I_rKH_r\ve a, \\ S\ve x &= \left[\br{c}\ve f\\\ve a\er\right] = \ve f + H_r\ve a, \\ \ve f &\equiv I_rS\ve x, \\ \ve a &\equiv H_{-r}(I_n-I_r)S\ve x, \end{split} \eq where~$\ve a$ represents the $n-r$ free elements of~$S\ve x$ and~$\ve f$ represents the~$r$ dependent elements.
This makes~$\ve f$ and thereby also~$\ve x$ functions of the free parameter~$\ve a$ and the driving vector~$\ve b$: \bq{mtxinv:245:49} \begin{split} \ve f(\ve a, \ve b) &= G_>^{-1}\ve b - I_rKH_r\ve a, \\ S\ve x(\ve a, \ve b) &= \left[\br{c}\ve f(\ve a, \ve b)\\\ve a\er\right] = \ve f(\ve a, \ve b) + H_r\ve a. \end{split} \eq If $\ve b=0$ as~(\ref{mtxinv:245:05}) requires, then \[ \begin{split} \ve f(\ve a, 0) &= - I_rKH_r\ve a, \\ S\ve x(\ve a, 0) &= \left[\br{c}\ve f(\ve a, 0)\\\ve a\er\right] = \ve f(\ve a, 0) + H_r\ve a. \end{split} \] Substituting the first line into the second, \bq{mtxinv:245:48} S\ve x(\ve a, 0) = (I - I_rK)H_r\ve a. \eq In the event that $\ve a = \ve e_j$, where $1 \le j \le n-r$, \[ S\ve x(\ve e_j, 0) = (I - I_rK)H_r\ve e_j. \] For all the~$\ve e_j$ at once, \[ S\ve x(I_{n-r}, 0) = (I - I_rK)H_rI_{n-r}. \] But if all the~$\ve e_j$ at once, the columns of~$I_{n-r}$, exactly address the domain of~$\ve a$, then the columns of $\ve x(I_{n-r}, 0)$ likewise exactly address the range of $\ve x(\ve a, 0)$. Equation~(\ref{mtxinv:245:10}) has already named this range~$A^K$, by which% \footnote{ These are difficult steps. How does one justify replacing~$\ve a$ by~$\ve e_j$, then~$\ve e_j$ by~$I_{n-r}$, then~$\ve x$ by~$A^K$? One justifies them in that the columns of~$I_{n-r}$ are the several~$\ve e_j$, of which any $(n-r)$-element vector~$\ve a$ can be constructed as the linear combination \[ \ve a = I_{n-r}\ve a = [ \br{ccccc} \ve e_1 & \ve e_2 & \ve e_3 & \cdots & \ve e_{n-r} \er ] \ve a = \sum_{j=1}^{n-r} a_j \ve e_j \] weighted by the elements of~$\ve a$. Seen from one perspective, this seems trivial; from another perspective, baffling; until one grasps what is really going on here. The idea is that if we can solve the problem for each elementary vector~$\ve e_j$---that is, in aggregate, if we can solve the problem for the identity matrix~$I_{n-r}$---then we shall implicitly have solved it for every~$\ve a$ because~$\ve a$ is a weighted combination of the~$\ve e_j$ and the whole problem is linear. The solution \[ \ve x = A^K\ve a \] for a given choice of~$\ve a$ becomes a weighted combination of the solutions for each~$\ve e_j$, with the elements of~$\ve a$ again as the weights. And what are the solutions for each~$\ve e_j$? Answer: the corresponding columns of~$A^K$, which by definition are the independent values of~$\ve x$ that cause $\ve b = 0$. } \bq{mtxinv:245:41} SA^K = (I-I_rK)H_rI_{n-r}. \eq Left-multiplying by $S^{-1}=S^{*}=S^{T}$ produces the alternate kernel formula \bq{mtxinv:245:42} A^K = S^{-1}(I-I_rK)H_rI_{n-r}. \eq \index{schematic} \index{pencil} \index{hypothesis} \index{kernel!alternate formula for} The alternate kernel formula~(\ref{mtxinv:245:42}) is correct but not as simple as it could be. By the identity~(\ref{matrix:340:61}), eqn.~(\ref{mtxinv:245:41}) is \bqa SA^K &=& (I-I_rK)(I_n-I_r)H_r \xn\\ &=& [(I_n-I_r) - I_rK(I_n-I_r)]H_r \xn\\ &=& [(I_n-I_r) - (K-I)]H_r, \label{mtxinv:245:33} \eqa where we have used Table~\ref{gjrank:341:t34} again in the last step. % How to proceed symbolically from~(\ref{mtxinv:245:33}) is not obvious, but if one sketches the matrices of~(\ref{mtxinv:245:33}) schematically with a pencil, and if one remembers that~$K^{-1}$ is just~$K$ with elements off the main diagonal negated, then it appears that \bq{mtxinv:245:34} SA^K = K^{-1}H_rI_{n-r}. 
\eq The appearance is not entirely convincing,% \footnote{ Well, no, actually, the appearance pretty much is entirely convincing, but let us finish the proof symbolically nonetheless. } but~(\ref{mtxinv:245:34}) though unproven still helps because it posits a hypothesis toward which to target the analysis. Two variations on the identities of Table~\ref{gjrank:341:t34} also help. First, from the identity that \[ \frac{K+K^{-1}}{2} = I, \] we have that \bq{mtxinv:245:35} K-I = I-K^{-1}. \eq Second, right-multiplying by~$I_r$ the identity that \[ I_rK^{-1}(I_n-I_r) = K^{-1}-I \] and canceling terms, we have that \bq{mtxinv:245:36} K^{-1}I_r = I_r \eq (which actually is pretty obvious if you think about it, since all of $K$'s interesting content lies by construction right of its $r$th column). Now we have enough to go on with. Substituting~(\ref{mtxinv:245:35}) and~(\ref{mtxinv:245:36}) into~(\ref{mtxinv:245:33}) yields \[ SA^K = [(I_n-K^{-1}I_r) - (I-K^{-1})]H_r. \] Adding $0=K^{-1}I_nH_r-K^{-1}I_nH_r$ and rearranging terms, \[ SA^K = K^{-1}(I_n-I_r)H_r + [K^{-1} - K^{-1}I_n - I + I_n]H_r. \] Factoring, \[ SA^K = K^{-1}(I_n-I_r)H_r + [(K^{-1}-I)(I-I_n)]H_r. \] According to Table~\ref{gjrank:341:t34}, the quantity in square brackets is zero, so \[ SA^K = K^{-1}(I_n-I_r)H_r, \] which, considering that the identity~(\ref{matrix:340:61}) has that $(I_n-I_r)H_r = H_rI_{n-r}$, proves~(\ref{mtxinv:245:34}). The final step is to left-multiply~(\ref{mtxinv:245:34}) by $S^{-1}=S^{*}=S^{T}$, reaching~(\ref{mtxinv:245:kernel}) that was to be derived. \index{vector space!address of} \index{kernel space} One would like to feel sure that the columns of~(\ref{mtxinv:245:kernel})'s $A^K$ actually addressed the whole kernel space of~$A$ rather than only part. One would further like to feel sure that~$A^K$ had no redundant columns; that is, that it had full rank. Moreover, the definition of~$A^K$ in the section's introduction demands both of these features. In general such features would be hard to establish, but here the factors conveniently are Gauss-Jordan factors. Regarding the whole kernel space,~$A^K$ addresses it because~$A^K$ comes from all~$\ve a$. Regarding redundancy,~$A^K$ lacks it because~$SA^K$ lacks it, and~$SA^K$ lacks it because according to~(\ref{mtxinv:245:41}) the last rows of~$SA^K$ are~$H_rI_{n-r}$. So, in fact,~(\ref{mtxinv:245:kernel}) has both features and does fit the definition. \subsection{Converting between kernel matrices} \label{mtxinv:245.20} \index{kernel matrix!converting between two of} \index{vector!replacement of} If~$C$ is a reversible $(n-r) \times (n-r)$ operator by which we right-multiply~(\ref{mtxinv:245:10}), then the matrix \bq{mtxinv:245:17} A'^K = A^K C \eq like~$A^K$ evidently represents the kernel of~$A$: \[ AA'^K = A (A^K C) = (AA^K) C = 0. \] Indeed this makes sense: because the columns of~$A^K C$ address the same space the columns of~$A^K$ address, the two matrices necessarily represent the same underlying kernel. Moreover, \emph{some}~$C$ exists to convert~$A^K$ into every alternate kernel matrix~$A'^K$ of~$A$. We know this because \S~\ref{gjrank:338} lets one replace the columns of~$A^K$ with those of~$A'^K$, reversibly, one column at a time, without altering the space addressed. (It might not let one replace the columns in sequence, but if out of sequence then a reversible permutation at the end corrects the order. Refer to \S\S~\ref{gjrank:340.05} and~\ref{gjrank:340.10} for the pattern by which this is done.) 
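By way of a numerical illustration of~(\ref{mtxinv:245:17}), the following brief Python sketch builds a kernel matrix for a small rank-2 example matrix and checks that right-multiplying it by a reversible~$C$ still yields columns that~$A$ annihilates. The sketch computes the kernel by a library singular-value routine rather than by the Gauss-Jordan kernel formula~(\ref{mtxinv:245:kernel}), so the particular columns it reports differ from the formula's, though the space they address is the same; the matrices chosen are arbitrary examples.
\begin{verbatim}
import numpy as np

# An example 3 x 4 matrix of rank r = 2, so its kernel has n - r = 2 columns.
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],
              [1., 0., 1., 0.]])

# A kernel matrix A^K from the singular value decomposition: the rows of Vh
# beyond the first r span the space that A annihilates.
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))
AK = Vh[r:].conj().T                      # n x (n - r)
print(np.allclose(A @ AK, 0))             # True:  A A^K = 0

# Right-multiplying by any reversible (n-r) x (n-r) operator C gives an
# alternate kernel matrix A'^K = A^K C representing the same kernel.
C = np.array([[2., 1.],
              [0., 3.]])
print(np.allclose(A @ (AK @ C), 0))       # True:  A A'^K = 0
\end{verbatim}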
The orthonormalizing column operator~$R^{-1}$ of~(\ref{mtxinv:QR}) below incidentally tends to make a good choice for~$C$. \subsection{The degree of freedom} \label{mtxinv:245.70} \index{freedom, degree of} \index{degree of freedom} A slightly vague but extraordinarily useful concept has emerged in this section, worth pausing briefly to appreciate. The concept is the concept of the \emph{degree of freedom.} \index{artillerist} \index{cannon} \index{gunpowder} \index{battlefield} \index{azimuth} \index{elevation} \index{north} \index{south} \index{east} \index{west} \index{horse} \index{battle} \index{battlefield} \index{Napoleon} A degree of freedom is a parameter one remains free to determine within some continuous domain. For example, Napoleon's artillerist% \footnote{ The author, who has never fired an artillery piece (unless an arrow from a Boy Scout bow counts), invites any real artillerist among the readership to write in to improve the example. } might have enjoyed as many as six degrees of freedom in firing a cannonball: two in where he chose to set up his cannon (one degree in north-south position, one in east-west); two in aim (azimuth and elevation); one in muzzle velocity (as governed by the quantity of gunpowder used to propel the ball); and one in time. A seventh potential degree of freedom, the height from which the artillerist fires, is of course restricted by the lay of the land: the artillerist can fire from a high place only if the place he has chosen to fire from happens to be up on a hill, for Napoleon had no flying cannon. Yet even among the six remaining degrees of freedom, the artillerist might find some impractical to exercise. The artillerist probably preloads the cannon always with a standard charge of gunpowder because, when he finds his target in the field, he cannot spare the time to unload the cannon and alter the charge: this costs one degree of freedom. Likewise, the artillerist must limber up the cannon and hitch it to a horse to shift it to better ground; for this too he cannot spare time in the heat of battle: this costs two degrees. And Napoleon might yell, ``Fire!'' canceling the time degree as well. Two degrees of freedom remain to the artillerist; but, since exactly two degrees are needed to hit some particular target on the battlefield, the two are enough. \index{carriage wheel} Now consider what happens if the artillerist loses one of his last two remaining degrees of freedom. Maybe the cannon's carriage wheel is broken and the artillerist can no longer turn the cannon; that is, he can still choose firing elevation but no longer azimuth. In such a strait to hit some particular target on the battlefield, the artillerist needs somehow to recover another degree of freedom, for he needs two but has only one. If he disregards Napoleon's order, ``Fire!'' (maybe not a wise thing to do, but, anyway, \ldots)\ % and waits for the target to traverse the cannon's fixed line of fire, then he can still hope to hit even with the broken carriage wheel; for could he choose neither azimuth nor the moment to fire, then he would almost surely miss. \index{nonlinearity} Some apparent degrees of freedom are not real. For example, muzzle velocity gives the artillerist little control firing elevation does not also give. Other degrees of freedom are nonlinear in effect: a certain firing elevation gives maximum range; nearer targets can be hit by firing either higher or lower at the artillerist's discretion. On the other hand, too much gunpowder might break the cannon. 
\index{engineer} \index{aeronautical engineer} \index{aileron} \index{rudder} \index{control surface, aeronautical} \index{pilot} \index{control} \index{tuning} \index{electrical engineer} \index{circuit} \index{electric circuit} \index{potentiometer} \index{technician} All of this is hard to generalize in unambiguous mathematical terms, but the count of the degrees of freedom in a system is of high conceptual importance to the engineer nonetheless. Basically, the count captures the idea that to control~$n$ output variables of some system takes at least~$n$ independent input variables. The~$n$ may possibly for various reasons still not suffice---it might be wise in some cases to allow $n+1$ or $n+2$---but in no event will fewer than~$n$ do. Engineers of all kinds think in this way: an aeronautical engineer knows in advance that an airplane needs at least~$n$ ailerons, rudders and other control surfaces for the pilot adequately to control the airplane; an electrical engineer knows in advance that a circuit needs at least~$n$ potentiometers for the technician adequately to tune the circuit; and so on. \index{up} \index{down} \index{mountain road} \index{road!mountain} \index{point} \index{line} \index{plane} \index{swimming pool} \index{city street} In geometry, a line brings a single degree of freedom. A plane brings two. A point brings none. If the line bends and turns like a mountain road, it still brings a single degree of freedom. And if the road reaches an intersection? Answer: still one degree. A degree of freedom has some continuous nature, not merely a discrete choice to turn left or right. On the other hand, a swimmer in a swimming pool enjoys three degrees of freedom (up-down, north-south, east-west) even though his domain in any of the three is limited to the small volume of the pool. The driver on the mountain road cannot claim a second degree of freedom at the mountain intersection (he can indeed claim a choice, but the choice being discrete lacks the proper character of a degree of freedom), but he might plausibly claim a second degree of freedom upon reaching the city, where the web or grid of streets is dense enough to approximate access to any point on the city's surface. Just how many streets it takes to turn the driver's ``line'' experience into a ``plane'' experience is a matter for the mathematician's discretion. Reviewing~(\ref{mtxinv:245:49}), we find $n-r$ degrees of freedom in the general underdetermined linear system, represented by the $n-r$ free elements of~$\ve a$. If the underdetermined system is not also overdetermined, if it is nondegenerate such that $r=m$, then it is guaranteed to have a family of solutions~$\ve x$. This family is the topic of the next section. % ---------------------------------------------------------------------- \section{The nonoverdetermined linear system} \label{mtxinv:240} \index{linear system!nonoverdetermined} \index{nonoverdetermined linear system} \index{matrix!broad} \index{solution!family of} \index{family of solutions} The exactly determined linear system of \S~\ref{mtxinv:230} is common, but also common is the more general, nonoverdetermined linear system \bq{mtxinv:240:05} A \ve x = \ve b, \eq in which~$\ve b$ is a known, $m$-element vector;~$\ve x$ is an unknown, $n$-element vector; and~$A$ is a square or broad, $m \times n$ matrix of full row rank (\S~\ref{gjrank:340.25}) \bq{mtxinv:240:07} r = m \le n. 
\eq Except in the exactly determined edge case $r=m=n$ of \S~\ref{mtxinv:230}, the nonoverdetermined linear system has no unique solution but rather a family of solutions. This section delineates the family. \subsection{Particular and homogeneous solutions} \label{mtxinv:240.20} \index{solution!particular and homogeneous} \index{particular solution} \index{homogeneous solution} \index{driving vector} \index{vector, driving} \index{split form} The nonoverdetermined linear system~(\ref{mtxinv:240:05}) by definition admits more than one solution~$\ve x$ for a given driving vector~$\ve b$. Such a system is hard to solve all at once, though, so we prefer to split the system as \bq{mtxinv:240:20} \begin{split} A \ve x_1 &= \ve b, \\ A (A^K \ve a) &= 0, \\ \ve x &= \ve x_1 + A^K \ve a, \end{split} \eq which, when the second line is added to the first and the third is substituted, makes the whole form~(\ref{mtxinv:240:05}). Splitting the system does not change it, but it does let us treat the system's first and second lines in~(\ref{mtxinv:240:20}) separately. In the split form, the symbol~$\ve x_1$ represents any one $n$-element vector that happens to satisfy the form's first line---many are possible; the mathematician just picks one---and is called \emph{a particular solution} of~(\ref{mtxinv:240:05}). The $(n-r)$-element vector~$\ve a$ remains unspecified, whereupon~$A^K \ve a$ represents the complete family of $n$-element vectors that satisfy the form's second line. The family of vectors expressible as~$A^K \ve a$ is called \emph{the homogeneous solution} of~(\ref{mtxinv:240:05}). \index{articles ``a'' and ``the''} Notice the italicized articles \emph{a} and \emph{the.} The Gauss-Jordan kernel formula~(\ref{mtxinv:245:kernel}) has given us~$A^K$ and thereby the homogeneous solution, which renders the analysis of~(\ref{mtxinv:240:05}) already half done. To complete the analysis, it remains in \S~\ref{mtxinv:240.30} to find a particular solution. \subsection{A particular solution} \label{mtxinv:240.30} \index{solution!particular} \index{particular solution} \index{linear system!nonoverdetermined, particular solution of} \index{nonoverdetermined linear system!particular solution of} Any particular solution will do. Equation~(\ref{mtxinv:245:49}) has that \[ \begin{split} \ve f(\ve a, \ve b) &= G_>^{-1}\ve b - I_rKH_r\ve a, \\ (S) \left[ \ve x_1(\ve a, \ve b) + A^K \ve a \right] &= \left[\br{c}\ve f(\ve a, \ve b)\\\ve a\er\right] = \ve f(\ve a, \ve b) + H_r\ve a, \end{split} \] where we have substituted the last line of~(\ref{mtxinv:240:20}) for~$\ve x$. This holds for any~$\ve a$ and~$\ve b$. We are not free to choose the driving vector~$\ve b$, but since we need only one particular solution,~$\ve a$ can be anything we want. Why not \[ \ve a=0? \] Then \[ \begin{split} \ve f(0, \ve b) &= G_>^{-1}\ve b, \\ S\ve x_1(0, \ve b) &= \left[\br{c}\ve f(0, \ve b)\\0\er\right] = \ve f(0, \ve b). \end{split} \] That is, \bq{mtxinv:240:25} \ve x_1 = S^{-1}G_>^{-1}\ve b. \eq \subsection{The general solution} \label{mtxinv:240.40} \index{solution!general} \index{general solution} \index{linear system!nonoverdetermined, general solution of} \index{nonoverdetermined linear system!general solution of} Assembling~(\ref{mtxinv:245:kernel}), (\ref{mtxinv:240:20}) and~(\ref{mtxinv:240:25}) yields the general solution \bq{mtxinv:240:50} \ve x = S^{-1}(G_>^{-1}\ve b + K^{-1}H_rI_{n-r} \ve a) \eq to the nonoverdetermined linear system~(\ref{mtxinv:240:05}). 
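To make the general solution concrete, here is a minimal Python sketch, assuming an arbitrary $2 \times 4$ example system. It obtains a particular solution from a library pseudoinverse routine (anticipating \S~\ref{mtxinv:320}) rather than from~(\ref{mtxinv:240:25}), and a kernel matrix from the singular value decomposition rather than from the Gauss-Jordan kernel formula~(\ref{mtxinv:245:kernel}); either way, it verifies that $\ve x = \ve x_1 + A^K\ve a$ satisfies $A\ve x = \ve b$ no matter how the free vector~$\ve a$ is chosen.
\begin{verbatim}
import numpy as np

# A broad system A x = b with full row rank r = m = 2 < n = 4.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 3.]])
b = np.array([3., 5.])

# A particular solution (here the one the pseudoinverse gives, not the
# x_1 of the Gauss-Jordan route above).
x1 = np.linalg.pinv(A) @ b

# A kernel matrix A^K via the singular value decomposition.
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))
AK = Vh[r:].conj().T                      # n x (n - r)

# The general solution x = x1 + A^K a, for any free (n-r)-element a.
for a in (np.zeros(A.shape[1] - r), np.array([1., -2.])):
    x = x1 + AK @ a
    print(np.allclose(A @ x, b))          # True for every choice of a
\end{verbatim}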
\index{arithmetic!exact} \index{exact arithmetic} \index{rounding error} \index{error!due to rounding} \index{pivot!small} \index{matrix!large} In exact arithmetic~(\ref{mtxinv:240:50}) solves the nonoverdetermined linear system in theory exactly. Of course, practical calculations are usually done in limited precision, in which compounded rounding error in the last bit eventually disrupts~(\ref{mtxinv:240:50}) for matrices larger than some moderately large size. Avoiding unduly small pivots early in the Gauss-Jordan extends~(\ref{mtxinv:240:50})'s reach to larger matrices, and for yet larger matrices a bewildering variety of more sophisticated techniques exists to mitigate the problem, which can be vexing because the problem arises even when the matrix~$A$ is exactly known. Equation~(\ref{mtxinv:240:50}) is useful and correct, but one should at least be aware that it can in practice lose floating-point accuracy when the matrix it attacks grows too large. (It can also lose accuracy when the matrix's rows are almost dependent, but that is more the fault of the matrix than of the formula. See \S~\ref{eigen:470}, which addresses a related problem.) % ---------------------------------------------------------------------- \section{The residual} \label{mtxinv:315} \index{residual} Equations~(\ref{mtxinv:220:20}) and~(\ref{mtxinv:230:15}) solve the exactly determined linear system $A\ve x = \ve b$. Equation~(\ref{mtxinv:240:50}) broadens the solution to include the nonoverdetermined linear system. None of those equations however can handle the overdetermined linear system, because for general~$\ve b$ the overdetermined linear system \bq{mtxinv:315:05} A \ve x \approx \ve b \eq has no exact solution. (See \S~\ref{gjrank:340.24} for the definitions of \emph{underdetermined, overdetermined,} etc.) \index{linear system!overdetermined} \index{overdetermined linear system} \index{datum} One is tempted to declare the overdetermined system uninteresting because it has no solution and to leave the matter there, but this would be a serious mistake. In fact the overdetermined system is especially interesting, and the more so because it arises so frequently in applications. One seldom trusts a minimal set of data for important measurements, yet extra data imply an overdetermined system. We need to develop the mathematics to handle the overdetermined system properly. \index{squared residual norm} \index{residual!squared norm of} The quantity% \footnote{\label{mtxinv:315:95}% Alas, the alphabet has only so many letters (see Appendix~\ref{greek}). The~$\ve r$ here is unrelated to matrix rank~$r$. }$\mbox{}^,$% \footnote{ This is as~\cite{vdVorst} defines it. Some authors~\cite{Peterson/Mittra:1986} however prefer to define $\ve r(\ve x) \equiv A\ve x - \ve b$, instead. } \bq{mtxinv:residual} \ve r(\ve x) \equiv \ve b - A\ve x \eq measures how nearly some candidate solution~$\ve x$ solves the system~(\ref{mtxinv:315:05}). We call this quantity the \emph{residual,} and the smaller, the better. More precisely, the smaller the nonnegative real scalar \bq{mtxinv:residual-norm} [\ve r(\ve x)]^{*}[\ve r(\ve x)] = \sum_i \left|r_i(\ve x)\right|^2 \eq is, called the \emph{squared residual norm,} the more favorably we regard the candidate solution~$\ve x$. 
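For instance, with an arbitrarily chosen $3 \times 2$ system and two arbitrary candidate solutions, a short Python sketch might compute the residual~(\ref{mtxinv:residual}) and the squared residual norm~(\ref{mtxinv:residual-norm}) as follows.
\begin{verbatim}
import numpy as np

# An overdetermined system A x ~= b and the squared residual norm
# [r(x)]* [r(x)] of two arbitrary candidate solutions x.
A = np.array([[1., 1.],
              [2., 1.],
              [3., 1.]])
b = np.array([1., 2., 2.])

def squared_residual_norm(x):
    r = b - A @ x                   # the residual r(x) = b - A x
    return np.vdot(r, r).real       # sum over i of |r_i(x)|^2

print(squared_residual_norm(np.array([0.5, 0.5])))   # 0.25
print(squared_residual_norm(np.array([0.5, 0.6])))   # 0.18, a better candidate
\end{verbatim}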
% ---------------------------------------------------------------------- \section[The pseudoinverse and least squares] {The Moore-Penrose pseudoinverse and the least-squares problem} \label{mtxinv:320} \index{pseudoinverse} \index{Moore-Penrose pseudoinverse} \index{Moore, E.H. (1862--1932)} \index{Penrose, Roger (1931--)} \index{least squares} \index{squares, least} \index{datum} \index{line!fitting a} \index{freeway} \index{building construction} \index{contractor} \index{Saturday} \index{worker} \index{labor union} A typical problem is to fit a straight line to some data. For example, suppose that we are building-construction contractors with a unionized work force, whose labor union can supply additional, fully trained labor on demand. Suppose further that we are contracted to build a long freeway and have been adding workers to the job in recent weeks to speed construction. On Saturday morning at the end of the second week, we gather and plot the production data on the left of Fig.~\ref{mtxinv:320:fig1}. If~$u_i$ and~$b_i$ respectively represent the number of workers and the length of freeway completed during week~$i$, then we can fit a straight line $b = \sigma u + \gamma$ to the measured production data such that \[ \mf{cc}{u_1&1\\u_2&1} \mf{c}{\sigma\\\gamma} = \mf{c}{b_1\\b_2}, \] inverting the matrix per \S\S~\ref{mtxinv:220} and~\ref{mtxinv:230} to solve for $\ve x \equiv[\sigma\;\gamma]^T$, in the hope that the resulting line will predict future production accurately. \begin{figure} \caption{Fitting a line to measured data.} \label{mtxinv:320:fig1} \bc \nc\fxa{-5.8} \nc\fxb{5.8} \nc\fya{-2.5} \nc\fyb{2.5} \nc\xxb{3.5} \nc\xxc{3.0} \nc\xxd{0.30} \nc\xxf{-0.85} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} \rput(-4.2,\xxf){ { \psset{linewidth=0.5pt} \psline{<->}(0,\xxc)(0,0)(\xxb,0) \footnotesize \rput[tl](0,-\xxd) {Number of workers} \rput[bl]{90}(-\xxd,0) {\parbox{3.0cm}{Length newly\\completed\\during the week}} } { \psset{linewidth=2.0pt} \psdot(1.2,1.5) \psdot(1.5,1.6) } { \psset{linewidth=1.0pt} % slope 0.33333 % y intercept at 1.10000 \psline(0.7,1.3333)(3.2,2.1667) } } \rput(1.8,\xxf){ { \psset{linewidth=0.5pt} \psline{<->}(0,\xxc)(0,0)(\xxb,0) \footnotesize \rput[tl](0,-\xxd) {Number of workers} \rput[bl]{90}(-\xxd,0) {\parbox{3.0cm}{Length newly\\completed\\during the week}} } { \psset{linewidth=2.0pt} % slope 0.52212 % y intercept at 0.86372 \psdot(1.2,1.5) \psdot(1.5,1.6) \psdot(1.6,1.8) \psdot(1.8,1.7) \psdot(2.1,2.0) } { \psset{linewidth=1.0pt} % slope 0.52212 % y intercept at 0.86372 \psline(0.7,1.2292)(3.2,2.5345) } } } \end{pspicture} \ec \end{figure} \index{linear system!overdetermined} \index{overdetermined linear system} That is all mathematically irreproachable. By the fifth Saturday however we shall have gathered more production data, plotted on the figure's right, to which we should like to fit a better line to predict production more accurately. The added data present a problem. Statistically, the added data are welcome, but geometrically we need only two points to specify a line; what are we to do with the other three? The five points together overdetermine the linear system \[ \mf{cc}{u_1&1\\u_2&1\\u_3&1\\u_4&1\\u_5&1} \mf{c}{\sigma\\\gamma} = \mf{c}{b_1\\b_2\\b_3\\b_4\\b_5}. 
\] There is no way to draw a single straight line $b = \sigma u + \gamma$ exactly through all five, for in placing the line we enjoy only two degrees of freedom.% \footnote{ Section~\ref{mtxinv:245.70} characterized a line as enjoying only one degree of freedom. Why now two? The answer is that \S~\ref{mtxinv:245.70} discussed travel along a line rather than placement of a line as here. Though both involve lines, they differ as driving an automobile differs from washing one. Do not let this confuse you. } The proper approach is to draw among the data points a single straight line that misses the points as narrowly as possible. More precisely, the proper approach chooses parameters~$\sigma$ and~$\gamma$ to minimize the squared residual norm $[\ve r(\ve x)]^{*}[\ve r(\ve x)]$ of \S~\ref{mtxinv:315}, given that \[ A = \mf{cc}{u_1&1\\u_2&1\\u_3&1\\u_4&1\\u_5&1\\\multicolumn{2}{c}{\vdots}}, \ \ \ve x = \mf{c}{\sigma\\\gamma}, \ \ \ve b = \mf{c}{b_1\\b_2\\b_3\\b_4\\b_5\\\vdots}. \] Such parameters constitute a \emph{least-squares} solution. The matrix~$A$ in the example has two columns, data marching on the left, all ones on the right. This is a typical structure for~$A$, but in general any matrix~$A$ with any number of columns of any content might arise (because there were more than two relevant variables or because some data merited heavier weight than others, among many further reasons). Whatever matrix~$A$ might arise from whatever source, this section attacks the difficult but important problem of approximating optimally a solution to the general, possibly unsolvable linear system~(\ref{mtxinv:315:05}), $A\ve x \approx \ve b$. \subsection{Least squares in the real domain} \label{mtxinv:320.20} \index{solution!of least-squares} \index{least-squares solution} \index{Jacobian derivative} \index{derivative!Jacobian} \index{matrix!real} The least-squares problem is simplest when the matrix~$A$ enjoys full column rank and no complex numbers are involved. In this case, we seek to minimize the squared residual norm \bqb [\ve r(\ve x)]^{T}[\ve r(\ve x)] &=& (\ve b - A\ve x)^{T}(\ve b - A\ve x) \\&=& \ve x^{T}A^{T}A\ve x + \ve b^{T}\ve b - \left( \ve x^{T} A^{T}\ve b + \ve b^{T}A\ve x \right) \\&=& \ve x^{T}A^{T}A\ve x + \ve b^{T}\ve b - 2\ve x^{T}A^{T}\ve b \\&=& \ve x^{T}A^{T}\left(A\ve x - 2\ve b\right) + \ve b^{T}\ve b, \eqb in which the transpose is used interchangeably for the adjoint because all the numbers involved happen to be real. The norm is minimized where \[ \frac{d}{d\ve x}\left(\ve r^{T}\ve r\right) = 0 \] (in which $d/d\ve x$ is the Jacobian operator of \S~\ref{matrix:350}). A requirement that \[ \frac{d}{d\ve x}\left[\ve x^{T}A^{T}\left(A\ve x - 2\ve b\right) + \ve b^{T}\ve b \right] = 0 \] comes of combining the last two equations. Differentiating by the Jacobian product rule~(\ref{matrix:350:Jacobian-prod}) yields the equation \[ \ve x^{T}A^{T}A + \left[A^{T}\left(A\ve x - 2\ve b\right)\right]^{T} = 0; \] or, after transposing the equation, rearranging terms and dividing by 2, the simplified equation \[ A^{T}A\ve x = A^{T}\ve b. \] Assuming (as warranted by \S~\ref{mtxinv:320.30}, next) that the $n \times n$ square matrix~$A^{T}A$ is invertible, the simplified equation implies the approximate but optimal least-squares solution \bq{mtxinv:320:leastsq} \ve x = \left(A^{T}A\right)^{-1}A^{T}\ve b \eq to the unsolvable linear system~(\ref{mtxinv:315:05}) in the restricted but quite typical case that~$A$ and~$\ve b$ are real and~$A$ has full column rank. 
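As a concrete check, one can feed~(\ref{mtxinv:320:leastsq}) the five data points plotted on the right of Fig.~\ref{mtxinv:320:fig1}. The following minimal Python sketch does so; the closing library call merely cross-checks the normal-equation arithmetic and is no part of the derivation. It reproduces the slope of about $0.522$ and intercept of about $0.864$ of the line drawn in the figure.
\begin{verbatim}
import numpy as np

# Fit the line b = sigma*u + gamma to the five data points of the figure
# by the normal equations A^T A x = A^T b.
u = np.array([1.2, 1.5, 1.6, 1.8, 2.1])   # number of workers, week by week
b = np.array([1.5, 1.6, 1.8, 1.7, 2.0])   # length newly completed each week

A = np.column_stack([u, np.ones_like(u)]) # data on the left, ones on the right
x = np.linalg.solve(A.T @ A, A.T @ b)     # x = (A^T A)^{-1} A^T b
sigma, gamma = x
print(sigma, gamma)                       # about 0.522 and 0.864

# Cross-check against a library least-squares routine.
x_check = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_check))            # True
\end{verbatim}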
Equation~(\ref{mtxinv:320:leastsq}) plots the line on Fig.~\ref{mtxinv:320:fig1}'s right. As the reader can see, the line does not pass through all the points, for no line can; but it does pass pretty convincingly nearly among them. In fact it passes optimally nearly among them. No line can pass more nearly, in the squared-residual norm sense of~(\ref{mtxinv:residual-norm}).% \footnote{% \index{metric}% Here is a nice example of the use of the mathematical adjective \emph{optimal} in its adverbial form. ``Optimal'' means ``best.'' Many problems in applied mathematics involve discovering the best of something. What constitutes the best however can be a matter of judgment, even of dispute. We will leave to the philosopher and the theologian the important question of what constitutes objective good, for applied mathematics is a poor guide to such mysteries. The role of applied mathematics is to construct suitable models to calculate quantities needed to achieve some definite good; its role is not, usually, to identify the good as good in the first place. \index{optimality}% \index{cost function}% One generally establishes mathematical optimality by some suitable, nonnegative, real \emph{cost function} or \emph{metric,} and the less, the better. Strictly speaking, the mathematics cannot tell us which metric to use, but where no other consideration prevails the applied mathematician tends to choose the metric that best simplifies the mathematics at hand---and, really, that is about as good a way to choose a metric as any. The metric~(\ref{mtxinv:residual-norm}) is so chosen. ``But,'' comes the objection, ``what if some more complicated metric is better?'' Well, if the other metric really, objectively is better, then one should probably use it. In general however the mathematical question is: what does one mean by ``better?'' Better by which metric? Each metric is better according to itself. This is where the mathematician's experience, taste and judgment come in. In the present section's example, too much labor on the freeway job might actually slow construction rather than speed it. One could therefore seek to fit not a line but some downward-turning curve to the data. Mathematics offers many downward-turning curves. A circle, maybe? Not likely. An experienced mathematician would probably reject the circle on the aesthetic yet practical ground that the parabola $b = \alpha u^2 + \sigma u + \gamma$ lends itself to easier analysis. Yet even fitting a mere straight line offers choices. One might fit the line to the points $(b_i,u_i)$ or $(\ln u_i,\ln b_i)$ rather than to the points $(u_i,b_i)$. The three resulting lines differ subtly. They predict production differently. The adjective ``optimal'' alone evidently does not always tell us all we need to know. Section~\ref{noth:420} offers a choice between averages that resembles in spirit this footnote's choice between metrics. } \subsection{The invertibility of~$A^{*}A$} \label{mtxinv:320.30} \index{invertibility} Section~\ref{mtxinv:320.20} has assumed correctly but unwarrantedly that the product~$A^{T}A$ were invertible for real~$A$ of full column rank. For real~$A$, it happens that $A^{T}=A^{*}$, so it only broadens the same assumption to suppose that the product~$A^{*}A$ were invertible for complex~$A$ of full column rank.% \footnote{ Notice that if~$A$ is tall, then~$A^{*}A$ is a compact, $n \times n$ square, whereas~$AA^{*}$ is a big, $m \times m$ square. It is the compact square that concerns this section. 
The big square is not very interesting and in any case is not invertible. } This subsection warrants the latter assumption, thereby incidentally also warranting the former. \index{column rank} \index{rank!column} Let~$A$ be a complex, $m \times n$ matrix of full column rank $r=n \le m$. Suppose falsely that~$A^{*}A$ were not invertible but singular. Since the product~$A^{*}A$ is a square, $n \times n$ matrix, this is to suppose (\S~\ref{mtxinv:220}) that the product's rank $r'<n$ were less than full, which would imply that the columns of~$A^{*}A$ depended on one another; in other words, that some nonzero $n$-element vector~$\ve u$ existed for which $A^{*}A\ve u = 0$. Left-multiplying by~$\ve u^{*}$ would then give that $\ve u^{*}A^{*}A\ve u = (A\ve u)^{*}(A\ve u) = 0$, a sum of the nonnegative terms $\left|[A\ve u]_i\right|^2$, which could vanish only if $A\ve u = 0$; but that is impossible, for the columns of~$A$ are linearly independent inasmuch as~$A$ has full column rank (\S~\ref{gjrank:340.25}). The contradiction proves false the supposition with which the paragraph began. \emph{The product~$A^{*}A$ is invertible for any matrix~$A$ of full column rank.} \subsection{Positive definiteness} \label{mtxinv:320.34} \index{positive definiteness} An $n \times n$ matrix~$C$ is \emph{positive definite} if and only if \bq{mtxinv:320:34} \mbox{ $\Im(\ve u^{*}C\ve u) = 0$ and $\Re(\ve u^{*}C\ve u) > 0$ for all $I_n\ve u \neq 0$. } \eq As in \S~\ref{mtxinv:320.30}, here also when a matrix~$A$ has full column rank $r = n \le m$ the product $\ve u^{*}A^{*}A\ve u = (A\ve u)^{*}(A\ve u)$ is real and positive for all nonzero, $n$-element vectors~$\ve u$. Thus per~(\ref{mtxinv:320:34}) \emph{the product~$A^{*}A$ is positive definite for any matrix~$A$ of full column rank.} \index{nonnegative definiteness} An $n \times n$ matrix~$C$ is \emph{nonnegative definite} if and only if \bq{mtxinv:320:35} \mbox{ $\Im(\ve u^{*}C\ve u) = 0$ and $\Re(\ve u^{*}C\ve u) \ge 0$ for all~$\ve u$. } \eq By reasoning like the last paragraph's, \emph{the product~$A^{*}A$ is nonnegative definite for any matrix~$A$ whatsoever.} Such definitions might seem opaque, but their sense is that a positive definite operator never reverses the thing it operates on, that the product~$C\ve u$ points more in the direction of~$\ve u$ than of~$-\ve u$. Section~\ref{mtxinv:445} explains further. A positive definite operator resembles a positive scalar in this sense. \subsection{The Moore-Penrose pseudoinverse} \label{mtxinv:320.50} \index{factorization!full-rank} \index{full-rank factorization} \index{conjecture} \index{residual!minimizing the} Not every $m \times n$ matrix~$A$ enjoys full rank. According to~(\ref{gjrank:340:26}), however, every $m \times n$ matrix~$A$ of rank~$r$ can be factored into a product% \footnote{ This subsection uses the symbols~$B$ and~$\ve b$ for unrelated purposes, which is unfortunate but conventional. See footnote~\ref{mtxinv:315:95}. } \[ A = BC \] of an $m \times r$ tall or square matrix~$B$ and an $r \times n$ broad or square matrix~$C$, both of which factors themselves enjoy full rank~$r$. (If~$A$ happens to have full row or column rank, then one can just choose $B=I_m$ or $C=I_n$; but even if~$A$ lacks full rank, the Gauss-Jordan decomposition of eqn.~\ref{gjrank:341:GJ} finds at least the full-rank factorization $B=G_>I_r$, $C=I_rG_<$.) This being so, a conjecture seems warranted. Suppose that, inspired by~(\ref{mtxinv:320:leastsq}), we manipulated~(\ref{mtxinv:315:05}) by the successive steps \bqb A \ve x &\approx& \ve b, \\ BC \ve x &\approx& \ve b, \\ (B^{*}B)^{-1}B^{*}BC \ve x &\approx& (B^{*}B)^{-1}B^{*}\ve b, \\ C \ve x &\approx& (B^{*}B)^{-1}B^{*}\ve b. \eqb Then suppose that we changed \[ C^{*} \ve u \la \ve x, \] thus restricting~$\ve x$ to the space addressed by the independent columns of~$C^{*}$. Continuing, \bqb CC^{*} \ve u &\approx& (B^{*}B)^{-1}B^{*}\ve b, \\ \ve u &\approx& (CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b. \eqb Changing the variable back and (because we are conjecturing and can do as we like), altering the~``$\approx$'' sign to~``$=$,'' \bq{mtxinv:320:50} \ve x = C^{*}(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b. \eq Equation~(\ref{mtxinv:320:50}) has a pleasingly symmetrical form, and we know from \S~\ref{mtxinv:320.30} at least that the two matrices it tries to invert are invertible.
So here is our conjecture: \bi \item no~$\ve x$ enjoys a smaller squared residual norm~$\ve r^{*}\ve r$ than the~$\ve x$ of~(\ref{mtxinv:320:50}) does; and \item among all~$\ve x$ that enjoy the same, minimal squared residual norm, the~$\ve x$ of~(\ref{mtxinv:320:50}) is strictly least in magnitude. \ei The conjecture is bold, but if you think about it in the right way it is not unwarranted under the circumstance. After all,~(\ref{mtxinv:320:50}) does resemble~(\ref{mtxinv:320:leastsq}), the latter of which admittedly requires real~$A$ of full column rank but does minimize the residual when its requirements are met; and, even if there were more than one~$\ve x$ which minimized the residual, one of them might be smaller than the others: why not the~$\ve x$ of~(\ref{mtxinv:320:50})? One can but investigate. \index{deviation} \index{reverse logic} \index{logic!reverse} \index{inequality} The first point of the conjecture is symbolized \[ \ve r^{*}(\ve x) \ve r(\ve x) \le \ve r^{*}(\ve x + \Delta\ve x) \ve r(\ve x + \Delta\ve x), \] where~$\Delta\ve x$ represents the deviation, whether small, moderate or large, of some alternate~$\ve x$ from the~$\ve x$ of~(\ref{mtxinv:320:50}). According to~(\ref{mtxinv:residual}), this is \[ [\ve b-A\ve x]^{*}[\ve b-A\ve x] \le [\ve b-(A)(\ve x+\Delta\ve x)]^{*}[\ve b-(A)(\ve x+\Delta\ve x)]. \] Reorganizing, \[ [\ve b-A\ve x]^{*}[\ve b-A\ve x] \le [(\ve b-A\ve x)-A\,\Delta\ve x]^{*}[(\ve b-A\ve x)-A\,\Delta\ve x]. \] Distributing factors and canceling like terms, \[ 0 \le - \Delta\ve x^{*}\,A^{*}(\ve b - A\ve x) - (\ve b - A\ve x)^{*}A\,\Delta\ve x + \Delta\ve x^{*}\,A^{*}A\,\Delta\ve x. \] But according to~(\ref{mtxinv:320:50}) and the full-rank factorization $A=BC$, \bqb A^{*}(\ve b-A\ve x) &=& A^{*}\ve b - A^{*}A\ve x \\&=& [C^{*}B^{*}][\ve b] - [C^{*}B^{*}][BC][C^{*}(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b] \\&=& C^{*}B^{*}\ve b - C^{*}(B^{*}B)(CC^{*})(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b \\&=& C^{*}B^{*}\ve b - C^{*}B^{*}\ve b = 0, \eqb which reveals two of the inequality's remaining three terms to be zero, leaving an assertion that \[ 0 \le \Delta\ve x^{*}\,A^{*}A\,\Delta\ve x. \] Each step in the present paragraph is reversible,% \footnote{ The paragraph might inscrutably but logically instead have ordered the steps in reverse as in \S\S~\ref{noth:420.20} and~\ref{inttx:250}. See Ch.~\ref{noth}'s footnote~\ref{noth:420:85}. } so the assertion in the last form is logically equivalent to the conjecture's first point, with which the paragraph began. Moreover, the assertion in the last form is correct because the product of any matrix and its adjoint according to \S~\ref{mtxinv:320.34} is a nonnegative definite operator, thus establishing the conjecture's first point. The conjecture's first point, now established, has it that no $\ve x + \Delta\ve x$ enjoys a smaller squared residual norm than the~$\ve x$ of~(\ref{mtxinv:320:50}) does. It does not claim that no $\ve x + \Delta\ve x$ enjoys the same, minimal squared residual norm. The latter case is symbolized \[ \ve r^{*}(\ve x) \ve r(\ve x) = \ve r^{*}(\ve x + \Delta\ve x) \ve r(\ve x + \Delta\ve x), \] or equivalently by the last paragraph's logic, \[ 0 = \Delta\ve x^{*}\,A^{*}A\,\Delta\ve x; \] or in other words, \[ A\,\Delta\ve x = 0. \] But $A=BC$, so this is to claim that \[ B(C\,\Delta\ve x) = 0, \] which since~$B$ has full column rank is possible only if \[ C\,\Delta\ve x = 0. 
\] Considering the product $\Delta\ve x^{*}\,\ve x$ in light of~(\ref{mtxinv:320:50}) and the last equation, we observe that \bqb \Delta\ve x^{*}\,\ve x &=& \Delta\ve x^{*}\,[C^{*}(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b] \\&=& [C\,\Delta\ve x]^{*}[(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\ve b], \eqb which is to observe that \[ \Delta\ve x^{*}\,\ve x = 0 \] for any~$\Delta\ve x$ for which $\ve x + \Delta\ve x$ achieves minimal squared residual norm. Returning attention to the conjecture, its second point is symbolized \[ \ve x^{*}\ve x < (\ve x+\Delta\ve x)^{*}(\ve x+\Delta\ve x) \] for any \[ \Delta\ve x \neq 0 \] for which $\ve x + \Delta\ve x$ achieves minimal squared residual norm (note that it's ``$<$'' this time, not ``$\le$'' as in the conjecture's first point). Distributing factors and canceling like terms, \[ 0 < \ve x^{*}\,\Delta\ve x + \Delta\ve x^{*}\,\ve x + \Delta\ve x^{*}\,\Delta\ve x. \] But the last paragraph has found that $\Delta\ve x^{*}\,\ve x = 0$ for precisely such~$\Delta\ve x$ as we are considering here, so the last inequality reduces to read \[ 0 < \Delta\ve x^{*}\,\Delta\ve x, \] which naturally for $\Delta\ve x \neq 0$ is true. Since each step in the paragraph is reversible, reverse logic establishes the conjecture's second point. With both its points established, the conjecture is true. If $A=BC$ is a full-rank factorization, then the matrix% \footnote{ Some books print~$A^\dagger$ as~$A^{+}$. } \bq{mtxinv:psinv} A^\dagger \equiv C^{*}(CC^{*})^{-1}(B^{*}B)^{-1}B^{*} \eq of~(\ref{mtxinv:320:50}) is called the \emph{Moore-Penrose pseudoinverse} of~$A$, more briefly the \emph{pseudoinverse} of~$A$. Whether underdetermined, exactly determined, overdetermined or even degenerate, every matrix has a Moore-Penrose pseudoinverse. Yielding the optimal approximation \bq{mtxinv:320:70} \ve x = A^\dagger \ve b, \eq the Moore-Penrose solves the linear system~(\ref{mtxinv:315:05}) as well as the system can be solved---exactly if possible, with minimal squared residual norm if impossible. If~$A$ is square and invertible, then the Moore-Penrose $A^\dagger = A^{-1}$ is just the inverse, and then of course~(\ref{mtxinv:320:70}) solves the system uniquely and exactly. Nothing can solve the system uniquely if~$A$ has broad shape but the Moore-Penrose still solves the system exactly in that case as long as~$A$ has full row rank, moreover minimizing the solution's squared magnitude~$\ve x^{*}\ve x$ (which the solution of eqn.~\ref{mtxinv:240:25} fails to do). If~$A$ lacks full row rank, then the Moore-Penrose solves the system as nearly as the system can be solved (as in Fig.~\ref{mtxinv:320:fig1}) and as a side-benefit also minimizes~$\ve x^{*}\ve x$. The Moore-Penrose is thus a general-purpose solver and approximator for linear systems. It is a significant discovery.% \footnote{ \cite[\S~3.3]{Beattie}% \cite[``Moore-Penrose generalized inverse'']{planetm} } % ---------------------------------------------------------------------- \section{The multivariate Newton-Raphson iteration} \label{mtxinv:420} \index{Newton-Raphson iteration!multivariate} \index{multivariate Newton-Raphson iteration} \index{Raphson, Joseph (1648--1715)} \index{Newton, Sir Isaac (1642--1727)} \index{root!finding of numerically} When we first met the Newton-Raphson iteration in \S~\ref{drvtv:270} we lacked the matrix notation and algebra to express and handle vector-valued functions adeptly. Now that we have the notation and algebra we can write down the multivariate Newton-Raphson iteration almost at once. 
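In Python, for instance, the finished iteration~(\ref{mtxinv:NR}) below occupies only a few lines. In the following sketch the function, its Jacobian and the starting point are arbitrary examples, and the pseudoinverse comes from a library routine rather than being assembled from~(\ref{mtxinv:psinv}):
\begin{verbatim}
import numpy as np

def newton_raphson(f, jacobian, x0, steps=20):
    # x_{k+1} = x_k - [df/dx]^dagger f(x_k), the pseudoinverse supplied
    # by a library routine.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - np.linalg.pinv(jacobian(x)) @ f(x)
    return x

# An arbitrary two-variable example: intersect the circle x1^2 + x2^2 = 4
# with the line x2 = x1.
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[1] - x[0]])
jacobian = lambda x: np.array([[2*x[0], 2*x[1]], [-1.0, 1.0]])
print(newton_raphson(f, jacobian, [1.0, 0.5]))   # approaches [sqrt(2), sqrt(2)]
\end{verbatim}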
\index{Jacobian derivative} \index{derivative!Jacobian} The iteration approximates the nonlinear vector function $\ve f(\ve x)$ by its tangent \[ \tilde{\ve f}_k(\ve x) = \ve f(\ve x_k) + \left[\frac{d}{d\ve x}\ve f(\ve x)\right]_{\ve x=\ve x_k} (\ve x-\ve x_k), \] where $d\ve f/d\ve x$ is the Jacobian derivative of \S~\ref{matrix:350}. It then approximates the root $\ve x_{k+1}$ as the point at which $\tilde{\ve f}_k(\ve x_{k+1}) = 0$: \[ \tilde{\ve f}_k(\ve x_{k+1}) = 0 = \ve f(\ve x_k) + \left[\frac{d}{d\ve x}\ve f(\ve x)\right]_{\ve x=\ve x_k} (\ve x_{k+1}-\ve x_k). \] Solving for $\ve x_{k+1}$ (approximately if necessary), we have that \bq{mtxinv:NR} \ve x_{k+1} = \left. \ve x - \left[\frac{d}{d\ve x}\ve f(\ve x)\right]^\dagger \ve f(\ve x) \right|_{\ve x=\ve x_k}, \eq where~$[\cdot]^\dagger$ is the Moore-Penrose pseudoinverse of \S~\ref{mtxinv:320}---which is just the ordinary inverse~$[\cdot]^{-1}$ of \S~\ref{mtxinv:220} if~$\ve f$ and~$\ve x$ happen each to have the same number of elements. Refer to \S~\ref{drvtv:270} and Fig.~\ref{drvtv:270:fig1}.% \footnote{ \cite{Scales-lecture} } Despite the Moore-Penrose notation of~(\ref{mtxinv:NR}), the Newton-Raphson iteration is not normally meant to be applied at a value of~$\ve x$ for which the Jacobian is degenerate. The iteration intends rather in light of~(\ref{mtxinv:psinv}) that \bq{mtxinv:420:70} \left[\frac{d}{d\ve x}\ve f(\ve x)\right]^\dagger = \begin{cases} \left[d\ve f/d\ve x\right]^{*} \left( \left[d\ve f/d\ve x\right] \left[d\ve f/d\ve x\right]^{*} \right)^{-1} & \mbox{ if $r=m \le n$, } \\ \left[d\ve f/d\ve x\right]^{-1} & \mbox{ if $r=m=n$, } \\ \left( \left[d\ve f/d\ve x\right]^{*} \left[d\ve f/d\ve x\right] \right)^{-1} \left[d\ve f/d\ve x\right]^{*} & \mbox{ if $r=n \le m$, } \end{cases} \eq where $B=I_m$ in the first case and $C=I_n$ in the last. It does not intend to use the full~(\ref{mtxinv:psinv}). If both $r<m$ and $r<n$, then the Jacobian is degenerate and, as observed above, the iteration is not normally applied at such a point in any case. To establish the sum hypothesis of~(\ref{mtxinv:447:10}), that $\left|\ve a + \ve b\right| \le \left|\ve a\right| + \left|\ve b\right|$ for any two complex, $n$-dimensional vectors~$\ve a$ and~$\ve b$, suppose falsely that \[ \left|\ve a + \ve b\right| > \left|\ve a\right| + \left|\ve b\right|. \] Squaring and using~(\ref{mtxinv:445:dotmag}), \[ (\ve a + \ve b)^{*} \cdot (\ve a + \ve b) > \ve a^{*}\cdot\ve a + 2\left|\ve a\right|\left|\ve b\right| + \ve b^{*}\cdot\ve b. \] Distributing factors and canceling like terms, \[ \ve a^{*}\cdot\ve b + \ve b^{*}\cdot\ve a > 2\left|\ve a\right|\left|\ve b\right|. \] Splitting~$\ve a$ and~$\ve b$ each into real and imaginary parts on the inequality's left side and then halving both sides, \[ \Re(\ve a)\cdot\Re(\ve b) + \Im(\ve a)\cdot\Im(\ve b) > \left|\ve a\right|\left|\ve b\right|. \] Defining the new, $2n$-dimensional \emph{real} vectors \[ \ve f \equiv \mf{c}{ \Re{(\ve a_1)} \\ \Im{(\ve a_1)} \\ \Re{(\ve a_2)} \\ \Im{(\ve a_2)} \\ \vdots \\ \Re{(\ve a_n)} \\ \Im{(\ve a_n)} }, \ \ \ve g \equiv \mf{c}{ \Re{(\ve b_1)} \\ \Im{(\ve b_1)} \\ \Re{(\ve b_2)} \\ \Im{(\ve b_2)} \\ \vdots \\ \Re{(\ve b_n)} \\ \Im{(\ve b_n)} }, \] we make the inequality to be \[ \ve f \cdot \ve g > \left|\ve f\right|\left|\ve g\right|, \] in which we observe that the left side must be positive because the right side is nonnegative. (This naturally is impossible for any case in which $\ve f = 0$ or $\ve g = 0$, among others, but wishing to establish impossibility for all cases we pretend not to notice and continue reasoning as follows.) Squaring again, \[ (\ve f \cdot \ve g)^2 > (\ve f \cdot \ve f)(\ve g \cdot \ve g); \] or, in other words, \[ \sum_{i,j} f_ig_if_jg_j > \sum_{i,j} f_i^2g_j^2. \] Reordering factors, \[ \sum_{i,j} [(f_ig_j)(g_if_j)] > \sum_{i,j} (f_ig_j)^2.
\] Subtracting $\sum_i (f_ig_i)^2$ from each side, \[ \sum_{i \neq j} [(f_ig_j)(g_if_j)] > \sum_{i \neq j} (f_ig_j)^2, \] which we can cleverly rewrite in the form \[ \sum_{i < j} [2(f_ig_j)(g_if_j)] > \sum_{i < j} [(f_ig_j)^2 + (g_if_j)^2], \] where $\sum_{i < j}$ denotes the sum over all $i$ and~$j$ for which $i<j$. Transferring all terms to the inequality's right side, \[ 0 > \sum_{i < j} [(f_ig_j)^2 - 2(f_ig_j)(g_if_j) + (g_if_j)^2]. \] This is \[ 0 > \sum_{i < j} [f_ig_j - g_if_j]^2, \] which, since we have constructed the vectors~$\ve f$ and~$\ve g$ to have real elements only, is impossible in all cases. The contradiction proves false the assumption that gave rise to it, thus establishing the sum hypothesis of~(\ref{mtxinv:447:10}). The difference hypothesis that $\left|\ve a\right|-\left|\ve b\right| \le \left|\ve a + \ve b\right|$ is established by defining a vector $\ve c$ such that \[ \ve a + \ve b + \ve c = 0, \] whereupon according to the sum hypothesis (which we have already established), \[ \begin{split} \left|\ve a + \ve c\right| &\le \left|\ve a\right| + \left|\ve c\right|, \\ \left|\ve b + \ve c\right| &\le \left|\ve b\right| + \left|\ve c\right|. \end{split} \] That is, \[ \begin{split} \left|-\ve b\right| &\le \left|\ve a\right| + \left|-\ve a-\ve b\right|, \\ \left|-\ve a\right| &\le \left|\ve b\right| + \left|-\ve a-\ve b\right|, \end{split} \] which is the difference hypothesis in disguise. This completes the proof of~(\ref{mtxinv:447:10}). As in \S~\ref{trig:278}, here too we can extend the sum inequality to the even more general form \bq{mtxinv:447:20} \left|\sum_k \ve a_k \right| \le \sum_k \left| \ve a_k \right|. \eq % ---------------------------------------------------------------------- \section{The orthogonal complement} \label{mtxinv:450} \index{orthogonal complement} \index{matrix!orthogonally complementary} \index{perpendicular matrix} \index{matrix!perpendicular} The $m \times (m-r)$ kernel (\S~\ref{mtxinv:245})% \footnote{\label{mtxinv:450:08}% The symbol~$A^\perp$ \cite{Hefferon}\cite{Beattie}\cite{Lay} can be pronounced ``A perp,'' short for ``A perpendicular,'' since by~(\ref{mtxinv:450:20}) $A^\perp$ is in some sense perpendicular to~$A$. If we were really precise, we might write not~$A^{\perp}$ but $A^{\perp(m)}$. Refer to footnote~\ref{mtxinv:245:08}. } \bq{mtxinv:450:perp} A^\perp \equiv A^{*K} \eq is an interesting matrix. By definition of the kernel, the columns of~$A^{*K}$ are the independent vectors~$\ve u_j$ for which $A^{*}\ve u_j = 0$, which---inasmuch as the rows of~$A^{*}$ are the adjoints of the \emph{columns} of~$A$---is possible only when each~$\ve u_j$ lies orthogonal to every column of~$A$. This says that the columns of $A^\perp \equiv A^{*K}$ address the complete space of vectors that lie orthogonal to $A$'s columns, such that \bq{mtxinv:450:20} A^{\perp{*}} A = 0 = A^{*} A^\perp. \eq The matrix~$A^\perp$ is called the \emph{orthogonal complement}% \footnote{\cite[\S~3.VI.3]{Hefferon}} or \emph{perpendicular matrix} to~$A$. \index{rank!row} \index{full row rank} Among other uses, the orthogonal complement~$A^\perp$ supplies the columns % bad break $A$ lacks to reach full row rank. Properties include that \bq{mtxinv:450:30} \begin{split} A^{*K} &= A^\perp, \\ A^{*\perp} &= A^K.
\end{split} \eq % ---------------------------------------------------------------------- \section{Gram-Schmidt orthonormalization} \label{mtxinv:460} \index{orthonormalization} \index{vector!orthonormalization of} \index{Gram-Schmidt process} \index{Gram, J\o rgen Pedersen (1850--1916)} \index{Schmidt, Erhard (1876--1959)} \index{normalization} \index{vector!normalization of} If a vector $\ve x=A^K\ve a$ belongs to a kernel space~$A^K$ (\S~\ref{mtxinv:245}), then so equally does any~$\alpha \ve x$. If the vectors $\ve x_1=A^K\ve a_1$ and $\ve x_2=A^K\ve a_2$ both belong, then so does $\alpha_1 \ve x_1 + \alpha_2 \ve x_2$. If I claim $A^K=[3\;4\;5;\mbox{$-1$}\;1\;0]^T$ to represent a kernel, then you are not mistaken arbitrarily to rescale each column of my~$A^K$ by a separate nonzero factor, instead for instance representing the same kernel as $A^K=[6\;8\;\mbox{0xA};\frac 1 7\;\mbox{$-\frac 1 7$}\;0]^T$. Kernel vectors have no inherent scale. Style generally asks the applied mathematician to remove the false appearance of scale by using~(\ref{mtxinv:445:20}) \emph{to normalize} the columns of a kernel matrix to unit magnitude before reporting them. The same goes for the eigenvectors of Ch.~\ref{eigen} to come. \index{orthogonalization} \index{vector!orthogonalization of} Where a kernel matrix~$A^K$ has two or more columns (or a repeated eigenvalue has two or more eigenvectors), style generally asks the applied mathematician not only to normalize but also \emph{to orthogonalize} the columns before reporting them. One orthogonalizes a vector~$\ve b$ with respect to a vector~$\ve a$ by subtracting from~$\ve b$ a multiple of~$\ve a$ such that \[ \begin{split} \ve a^{*} \cdot \ve b_\perp &= 0, \\ \ve b_\perp &\equiv \ve b - \beta \ve a, \end{split} \] where the symbol~$\ve b_\perp$ represents the orthogonalized vector. Substituting the second of these equations into the first and solving for~$\beta$ yields \[ \beta = \frac{\ve a^{*} \cdot \ve b}{\ve a^{*} \cdot \ve a}. \] Hence, \bq{mtxinv:460:10} \begin{split} \ve a^{*} \cdot \ve b_\perp &= 0, \\ \ve b_\perp &\equiv \ve b - \frac{\ve a^{*} \cdot \ve b}{\ve a^{*} \cdot \ve a} \ve a. \end{split} \eq But according to~(\ref{mtxinv:445:20}), $\ve a = \vu a\sqrt{\ve a^{*} \cdot \ve a}$; and according to~(\ref{mtxinv:445:25}), $\vu a^{*}\cdot\vu a = 1$; so, \bq{mtxinv:460:34} \ve b_\perp = \ve b - \vu a (\vu a^{*} \cdot \ve b); \eq or, in matrix notation, \[ \ve b_\perp = \ve b - \vu a ( \vu a^{*} )( \ve b ). \] This is arguably better written \bq{mtxinv:460:35} \ve b_\perp = \left[I - (\vu a)(\vu a^{*})\right]\ve b \eq (observe that it's $[\vu a][\vu a^{*}]$, a matrix, rather than the scalar $[\vu a^{*}][\vu a]$). One \emph{orthonormalizes} a set of vectors by orthogonalizing them with respect to one another, then by normalizing each of them to unit magnitude. The procedure to orthonormalize several vectors \[ \left\{ \ve x_1,\ve x_2,\ve x_3,\ldots,\ve x_n\right\} \] therefore is as follows. First, normalize~$\ve x_1$ by~(\ref{mtxinv:445:20}); call the result~$\vu x_{1\perp}$. Second, orthogonalize~$\ve x_2$ with respect to~$\vu x_{1\perp}$ by~(\ref{mtxinv:460:34}) or~(\ref{mtxinv:460:35}), then normalize it; call the result~$\vu x_{2\perp}$. Third, orthogonalize~$\ve x_3$ with respect to~$\vu x_{1\perp}$ then to~$\vu x_{2\perp}$, then normalize it; call the result~$\vu x_{3\perp}$. Proceed in this manner through the several~$\ve x_j$. 
Symbolically, \bq{mtxinv:460:50} \begin{split} \vu x_{j\perp} &= \frac{ \ve x_{j\perp} }{ \sqrt{ \ve x_{j\perp}^{*} \ve x_{j\perp} } }, \\ \ve x_{j\perp} &\equiv \left[ \prod_{i=1}^{j-1} \left( I - \vu x_{i\perp} \vu x_{i\perp}^{*} \right) \right] \ve x_j. \end{split} \eq By the vector replacement principle of \S~\ref{gjrank:338} in light of~(\ref{mtxinv:460:10}), the resulting orthonormal set of vectors \[ \left\{ \vu x_{1\perp},\vu x_{2\perp}, \vu x_{3\perp},\ldots,\vu x_{n\perp}\right\} \] addresses the same space as did the original set. Orthonormalization naturally works equally for any linearly independent set of vectors, not only for kernel vectors or eigenvectors. By the technique, one can conveniently replace a set of independent vectors by an equivalent, neater, orthonormal set which addresses precisely the same space. \subsection{Efficient implementation} \label{mtxinv:460.20} \index{efficient implementation} \index{implementation!efficient} \index{computer memory} \index{memory, computer} \index{algorithm!implementation of from an equation} To turn an equation like the latter line of~(\ref{mtxinv:460:50}) into an efficient numerical algorithm sometimes demands some extra thought, in perspective of whatever it happens to be that one is trying to accomplish. If all one wants is some vectors orthonormalized, then the equation as written is neat but is overkill because the product~$\vu x_{i\perp}\vu x_{i\perp}^{*}$ is a matrix, whereas the product~$\vu x_{i\perp}^{*}\ve x_j$ implied by~(\ref{mtxinv:460:34}) is just a scalar. Fortunately, one need not apply the latter line of~(\ref{mtxinv:460:50}) exactly as written. One can instead introduce intermediate vectors~$\ve x_{ji}$, representing the~$\prod$ multiplication in the admittedly messier form \bq{mtxinv:460:55} \begin{split} \ve x_{j1} &\equiv \ve x_j, \\ \ve x_{j(i+1)} &\equiv \ve x_{ji} -\left( \vu x_{i\perp}^{*} \cdot \ve x_{ji} \right) \vu x_{i\perp}, \\ \ve x_{j\perp} &= \ve x_{jj}. \end{split} \eq Besides obviating the matrix $I - \vu x_{i\perp}\vu x_{i\perp}^{*}$ and the associated matrix multiplication, the messier form~(\ref{mtxinv:460:55}) has the significant additional practical virtue that it lets one forget each intermediate vector~$\ve x_{ji}$ immediately after using it. (A well-written orthonormalizing computer program reserves memory for one intermediate vector only, which memory it repeatedly overwrites---and, actually, probably does not even reserve that much, working rather in the memory space it has already reserved for~$\vu x_{j\perp}$.)% \footnote{ \cite[``Gram-Schmidt process,'' 04:48, 11~Aug. 2007]{wikip} } Other equations one algorithmizes can likewise benefit from thoughtful rendering. 
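In Python, for example, a minimal rendering of~(\ref{mtxinv:460:55}) itself might read as follows, keeping only the single working vector the parenthesis above describes; the routine's name and the test data are merely illustrative, the test columns being those of this section's example kernel matrix.
\begin{verbatim}
import numpy as np

def orthonormalize(X):
    # Orthonormalize the columns of X per the scheme above, keeping a
    # single working vector x that is overwritten as i advances.
    X = np.asarray(X, dtype=complex)
    m, n = X.shape
    Q = np.zeros((m, n), dtype=complex)
    for j in range(n):
        x = X[:, j].copy()                      # x_j1
        for i in range(j):
            x -= (Q[:, i].conj() @ x) * Q[:, i] # x_j(i+1) = x_ji - (x_i* . x_ji) x_i
        Q[:, j] = x / np.sqrt((x.conj() @ x).real)  # normalize to unit magnitude
    return Q

# The columns of this section's example kernel matrix, orthonormalized.
AK = np.array([[3., -1.],
               [4., 1.],
               [5., 0.]])
Q = orthonormalize(AK)
print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: columns now orthonormal
\end{verbatim}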
\subsection{The Gram-Schmidt decomposition}
\label{mtxinv:460.30}
\index{$QR$ decomposition} \index{Gram-Schmidt decomposition} \index{orthonormalizing decomposition} \index{decomposition!$QR$} \index{decomposition!Gram-Schmidt} \index{decomposition!orthonormalizing} \index{factorization!$QR$} \index{factorization!Gram-Schmidt} \index{factorization!orthonormalizing} \index{interchange!refusing an}
The orthonormalization technique this section has developed is named the \emph{Gram-Schmidt process.} One can turn it into the \emph{Gram-Schmidt decomposition} \bq{mtxinv:QR} \begin{split} A &= QR = QUDS, \\ R &\equiv UDS, \end{split} \eq also called the \emph{orthonormalizing} or \emph{QR decomposition,} by an algorithm that somewhat resembles the Gauss-Jordan algorithm of \S~\ref{gjrank:341.10}; except that~(\ref{gjrank:341:05}) here becomes \bq{mtxinv:460:60} A = \tilde Q \tilde U \tilde D \tilde S \eq and initially $\tilde Q \la A$. By elementary column operations based on~(\ref{mtxinv:460:50}) and~(\ref{mtxinv:460:55}), the algorithm gradually transforms~$\tilde Q$ into a dimension-limited, $m \times r$ matrix~$Q$ of orthonormal columns, distributing the inverse elementaries to~$\tilde U$, $\tilde D$ and~$\tilde S$ according to Table~\ref{gjrank:337:table}---where the latter three working matrices ultimately become the extended-operational factors~$U$, $D$ and~$S$ of~(\ref{mtxinv:QR}).
\index{nesting} \index{loop} \index{majority}
Borrowing the language of computer science, we observe that the indices~$i$ and~$j$ of~(\ref{mtxinv:460:50}) and~(\ref{mtxinv:460:55}) imply a two-level nested loop, one level looping over~$j$ and the other over~$i$. The equations suggest \emph{$j$-major nesting,} with the loop over~$j$ at the outer level and the loop over~$i$ at the inner, such that the several $(i,j)$ index pairs occur in the sequence (reading left to right then top to bottom) \[ \br{cccc} (1,2) &&& \\ (1,3) & (2,3) && \\ (1,4) & (2,4) & (3,4) & \\ \cdots & \cdots & \cdots & \ddots \er \] In reality, however, (\ref{mtxinv:460:55})'s middle line requires only that no~$\vu x_{i\perp}$ be used before it is fully calculated; otherwise that line does not care which $(i,j)$ pair follows which. The $i$-major nesting \[ \br{cccc} (1,2) & (1,3) & (1,4) & \cdots \\ & (2,3) & (2,4) & \cdots \\ & & (3,4) & \cdots \\ &&& \ddots \er \] bringing the very same index pairs in a different sequence, is just as valid. We choose $i$-major nesting on the subtle ground that it affords better information to the choice of column index~$p$ during the algorithm's step~\ref{mtxinv:461:s20}. The algorithm, in detail:
\begin{enumerate}
\item \label{mtxinv:461:s10} Begin by initializing \[ \br{c} \tilde U \la I, \ \tilde D \la I, \ \tilde S \la I, \\ \setlength\arraycolsep{0.30\arraycolsep} \br{rcl} \tilde Q &\la& A, \\ i &\la& 1. \er \er \]
\item \label{mtxinv:461:s15} (Besides arriving at this point from step~\ref{mtxinv:461:s10} above, the algorithm also re\"enters here from step~\ref{mtxinv:461:s80} below.) Observe that~$\tilde U$ enjoys the major partial unit triangular form $L^{\{i-1\}T}$ (\S~\ref{matrix:330.05}), that~$\tilde D$ is a general scaling operator (\S~\ref{matrix:325.20}) with $\tilde d_{jj} = 1$ for all $j \ge i$, that~$\tilde S$ is a permutor (\S~\ref{matrix:325.10}), and that the first through $(i-1)$th columns of~$\tilde Q$ consist of mutually orthonormal unit vectors.
\item \label{mtxinv:461:s20} \index{column} \index{matrix!column of} \index{null column} \index{matrix!null column of} Choose a column $p \ge i$ of~$\tilde Q$ containing at least one nonzero element. (The simplest choice is perhaps $p=i$ as long as the $i$th column does not happen to be null, but one might instead prefer to choose the column of greatest magnitude, or to choose randomly, among other heuristics.) If~$\tilde Q$ is null in and rightward of its $i$th column such that no column $p \ge i$ remains available to choose, then skip directly to step~\ref{mtxinv:461:s90}. \item \label{mtxinv:461:s30} \index{interchange} Observing that~(\ref{mtxinv:460:60}) can be expanded to read \bqb A &=& \Big( \tilde Q T_{[i \lra p]} \Big) \Big( T_{[i \lra p]} \tilde U T_{[i \lra p]} \Big) \Big( T_{[i \lra p]} \tilde D T_{[i \lra p]} \Big) \Big( T_{[i \lra p]} \tilde S \Big) \\&=& \Big( \tilde Q T_{[i \lra p]} \Big) \Big( T_{[i \lra p]} \tilde U T_{[i \lra p]} \Big) \tilde D \Big( T_{[i \lra p]} \tilde S \Big), \eqb where the latter line has applied a rule from Table~\ref{gjrank:337:table}, interchange the chosen $p$th column to the $i$th position by \[ \begin{split} \tilde Q &\la \tilde Q T_{[i \lra p]}, \\ \tilde U &\la T_{[i \lra p]} \tilde U T_{[i \lra p]}, \\ \tilde S &\la T_{[i \lra p]} \tilde S. \end{split} \] \item \label{mtxinv:461:s35} \index{normalization} Observing that~(\ref{mtxinv:460:60}) can be expanded to read \[ A = \Big( \tilde Q T_{(1/\alpha)[i]} \Big) \Big( T_{\alpha[i]} \tilde U T_{(1/\alpha)[i]} \Big) \Big( T_{\alpha[i]} \tilde D \Big) \tilde S, \] normalize the $i$th column of~$\tilde Q$ by \[ \begin{split} \tilde Q &\la \tilde Q T_{(1/\alpha)[i]}, \\ \tilde U &\la T_{\alpha[i]} \tilde U T_{(1/\alpha)[i]}, \\ \tilde D &\la T_{\alpha[i]} \tilde D, \end{split} \] where \[ \alpha = \sqrt{ \left[ \tilde Q \right]_{*i}^{*} \cdot \left[ \tilde Q \right]_{*i} }. \] \item \label{mtxinv:461:s40} Initialize \[ j \la i+1. \] \item \label{mtxinv:461:s50} \index{orthogonalization} (Besides arriving at this point from step~\ref{mtxinv:461:s40} above, the algorithm also re\"enters here from step~\ref{mtxinv:461:s60} below.) If $j>n$ then skip directly to step~\ref{mtxinv:461:s80}. Otherwise, observing that~(\ref{mtxinv:460:60}) can be expanded to read \[ A = \Big( \tilde Q T_{-\beta[ij]} \Big) \Big( T_{\beta[ij]} \tilde U \Big) \tilde D \tilde S, \] orthogonalize the $j$th column of~$\tilde Q$ per~(\ref{mtxinv:460:55}) with respect to the $i$th column by \[ \begin{split} \tilde Q &\la \tilde Q T_{-\beta[ij]}, \\ \tilde U &\la T_{\beta[ij]} \tilde U, \end{split} \] where \[ \beta = \left[ \tilde Q \right]_{*i}^{*} \cdot \left[ \tilde Q \right]_{*j}. \] \item \label{mtxinv:461:s60} Increment \[ j \la j + 1 \] and return to step~\ref{mtxinv:461:s50}. \item \label{mtxinv:461:s80} Increment \[ i \la i + 1 \] and return to step~\ref{mtxinv:461:s15}. \item \label{mtxinv:461:s90} Let \[ \br{c} Q \equiv \tilde Q, \ U \equiv \tilde U, \ D \equiv \tilde D, \ S \equiv \tilde S, \\ r = i-1. \er \] End. 
\end{enumerate}
% bad break: the index entries that follow have bad breaks
\index{Gram-Schmidt decomposition!differences of against the Gauss-\newline Jordan} \index{Gauss-Jordan decomposition!differences of against the Gram-\newline Schmidt} \index{decomposition!differences between the Gram-\newline Schmidt and Gauss-Jordan}
Though the Gram-Schmidt algorithm broadly resembles the % bad break
Gauss-\linebreak Jordan, at least two significant differences stand out: (i)~the Gram-Schmidt is one-sided because it operates only on the columns of~$\tilde Q$, never on the rows; (ii)~since $Q$ is itself dimension-limited, the Gram-Schmidt decomposition~(\ref{mtxinv:QR}) needs and has no explicit factor~$I_r$.
\index{Gram-Schmidt decomposition!factor~$S$ of}
As in \S~\ref{gjrank:340.60}, here also one sometimes prefers that $S=I$. The algorithm optionally supports this preference when the $m \times n$ matrix~$A$ has full column rank $r=n$, in which case null columns cannot arise, provided that one always chooses $p=i$ during the algorithm's step~\ref{mtxinv:461:s20}. Such optional discipline maintains $S=I$ when desired.
\index{factorization!full-rank} \index{full-rank factorization} \index{pseudoinverse} \index{Moore-Penrose pseudoinverse}
Whether $S=I$ or not, the matrix $Q=QI_r$ has only~$r$ columns, so one can write~(\ref{mtxinv:QR}) as \[ A = (QI_r)(R). \] Reassociating factors, this is \bq{mtxinv:460:70} A = (Q)(I_rR), \eq which per~(\ref{gjrank:340:26}) is a proper full-rank factorization with which one can compute the pseudoinverse~$A^\dagger$ of~$A$ (see eqn.~\ref{mtxinv:psinv}, above; but see also eqn.~\ref{mtxinv:QRinv}, below).
\index{Gram-Schmidt decomposition!factor~$Q$ of}
If the Gram-Schmidt decomposition~(\ref{mtxinv:QR}) looks useful, it is even more useful than it looks. The most interesting of its several factors is the $m \times r$ orthonormalized matrix~$Q$, whose orthonormal columns address the same space the columns of~$A$ themselves address. If~$Q$ reaches the maximum possible rank $r=m$, achieving square, $m \times m$ shape, then it becomes a \emph{unitary matrix}---the subject of \S~\ref{mtxinv:465}. Before treating the unitary matrix, however, let us pause to extract a kernel from the Gram-Schmidt decomposition in \S~\ref{mtxinv:460.40}, next.
\subsection{The Gram-Schmidt kernel formula}
\label{mtxinv:460.40}
\index{Gram-Schmidt kernel formula} \index{kernel!Gram-Schmidt formula for}
Like the Gauss-Jordan decomposition in~(\ref{mtxinv:245:kernel}), the Gram-Schmidt decomposition too brings a kernel formula. To develop and apply it, one decomposes an $m \times n$ matrix \bq{mtxinv:460:74} A=QR \eq per the Gram-Schmidt~(\ref{mtxinv:QR}) and its algorithm in \S~\ref{mtxinv:460.30}. Observing that the~$r$ independent columns of the $m \times r$ matrix~$Q$ address the same space the columns of~$A$ address, one then constructs the $m \times (r+m)$ matrix \bq{mtxinv:460:76} A' \equiv Q + I_mH_{-r} = \left[ \br{cc}Q&I_m\er \right] \eq and decomposes it too, \bq{mtxinv:460:77} A' = Q'R', \eq again by Gram-Schmidt---with the differences that, this time, one chooses $p=1,2,3,\ldots,r$ during the first~$r$ instances of the algorithm's step~\ref{mtxinv:461:s20}, and that one skips the unnecessary step~\ref{mtxinv:461:s50} for all $j \le r$; on the ground that the earlier Gram-Schmidt application of~(\ref{mtxinv:460:74}) has already orthonormalized the first~$r$ columns of~$A'$, which columns, after all, are just~$Q$.
The resulting $m \times m$, full-rank square matrix \bq{mtxinv:460:kernel0} Q' = Q + A^\perp H_{-r} = \left[ \br{cc}Q&A^\perp\er \right] \eq consists of \bi \item $r$ columns on the left that address the same space the columns of~$A$ address and \item $m-r$ columns on the right that give a complete orthogonal complement (\S~\ref{mtxinv:450})~$A^\perp$ of~$A$. \ei Each column has unit magnitude and conveniently lies orthogonal to every other column, left and right. Equation~(\ref{mtxinv:460:kernel0}) is probably the more useful form, but the % bad break \emph{Gram-\linebreak Schmidt kernel formula} as such, \bq{mtxinv:460:kernel} A^{*K} = A^\perp = Q'H_rI_{m-r}, \eq extracts the rightward columns that express the kernel, not of~$A$, but of~$A^{*}$. To compute the kernel of a matrix~$B$ by Gram-Schmidt one sets $A=B^{*}$ and applies~(\ref{mtxinv:460:74}) through~(\ref{mtxinv:460:kernel}). Refer to~(\ref{mtxinv:450:30}). In either the form~(\ref{mtxinv:460:kernel0}) or the form~(\ref{mtxinv:460:kernel}), the Gram-Schmidt kernel formula does everything the Gauss-Jordan kernel formula~(\ref{mtxinv:245:kernel}) does and in at least one sense does it better; for, if one wants a Gauss-Jordan kernel orthonormalized, then one must orthonormalize it as an extra step, whereas the Gram-Schmidt kernel comes already orthonormalized. Being square, the $m \times m$ matrix~$Q'$ is a unitary matrix, as the last paragraph of \S~\ref{mtxinv:460.30} has alluded. The unitary matrix is the subject of \S~\ref{mtxinv:465} that follows. % ---------------------------------------------------------------------- \section{The unitary matrix} \label{mtxinv:465} \index{unitary matrix} \index{matrix!unitary} When the orthonormalized matrix~$Q$ of the Gram-Schmidt decomposition % bad break (\ref{mtxinv:QR}) is square, having the maximum possible rank $r=m$, it brings one property so interesting that the property merits a section of its own. The property is that \bq{mtxinv:unitary} Q^{*}Q = I_m = QQ^{*}. \eq The reason that $Q^{*}Q = I_m$ is that $Q$'s columns are orthonormal, and that the very definition of orthonormality demands that the dot product $[Q]_{*i}^{*} \cdot [Q]_{*j}$ of orthonormal columns be zero unless $i=j$, when the dot product of a unit vector with itself is unity. That $I_m=QQ^{*}$ is unexpected, however, until one realizes% \footnote{\cite[\S~4.4]{Franklin}} that the equation $Q^{*}Q = I_m$ characterizes~$Q^{*}$ to be the rank-$m$ inverse of~$Q$, and that \S~\ref{mtxinv:220} lets any rank-$m$ inverse (orthonormal or otherwise) attack just as well from the right as from the left. Thus, \bq{mtxinv:465:15} Q^{-1} = Q^{*}, \eq a very useful property. A matrix~$Q$ that satisfies~(\ref{mtxinv:unitary}), whether derived from the Gram-Schmidt or from elsewhere, is called a \emph{unitary matrix.} (Note that the permutor of \S~\ref{matrix:325.10} enjoys the property of eqn.~\ref{mtxinv:465:15} precisely because it is unitary.) \index{orthonormal rows and columns} \index{row!orthonormal} \index{column!orthonormal} One immediate consequence of~(\ref{mtxinv:unitary}) is that \emph{a square matrix with either orthonormal columns or orthonormal rows is unitary and has both.} \emph{The product of two or more unitary matrices is itself unitary} if the matrices are of the same dimensionality. To prove it, consider the product \bq{mtxinv:465:20} Q = Q_aQ_b \eq of $m \times m$ unitary matrices~$Q_a$ and~$Q_b$. 
Let the symbols~$\ve q_j$, $\ve q_{aj}$ and~$\ve q_{bj}$ respectively represent the $j$th columns of~$Q$, $Q_a$ and~$Q_b$, and let the symbol~$q_{bij}$ represent the $i$th element of~$\ve q_{bj}$. By the columnwise interpretation (\S~\ref{matrix:120.27}) of matrix multiplication, \[ \ve q_j = \sum_i q_{bij} \ve q_{ai}. \] The adjoint dot product of any two of $Q$'s columns then is \[ \ve q_{j'}^{*} \cdot \ve q_j = \sum_{i,i'} q_{bi'j'}^{*}q_{bij}\ve q_{ai'}^{*}\cdot \ve q_{ai}. \] But $\ve q_{ai'}^{*}\cdot \ve q_{ai} = \delta_{i'i}$ because~$Q_a$ is unitary,%
\footnote{ This is true only for $1 \le i \le m$, but you knew that already. } so \[ \ve q_{j'}^{*} \cdot \ve q_j = \sum_i q_{bij'}^{*}q_{bij} = \ve q_{bj'}^{*} \cdot \ve q_{bj} = \delta_{j'j}, \] which says neither more nor less than that the columns of~$Q$ are orthonormal, which is to say that~$Q$ is unitary, as was to be demonstrated.
\index{length!preservation of} \index{magnitude!preservation of}
\emph{Unitary operations preserve length.} That is, operating on an $m$-element vector by an $m \times m$ unitary matrix does not alter the vector's magnitude. To prove it, consider the system \[ Q\ve x = \ve b. \] Multiplying the system by its own adjoint yields \[ \ve x^{*}Q^{*}Q\ve x = \ve b^{*}\ve b. \] But according to~(\ref{mtxinv:unitary}), $Q^{*}Q=I_m$; so, \[ \ve x^{*}\ve x = \ve b^{*}\ve b, \] as was to be demonstrated.
\index{$QR$ decomposition!inverting a matrix by} \index{Gram-Schmidt decomposition!inverting a matrix by} \index{orthonormalizing decomposition!inverting a matrix by} \index{decomposition!Gram-Schmidt, inverting a matrix by}
Equation~(\ref{mtxinv:465:15}) lets one use the Gram-Schmidt decomposition~(\ref{mtxinv:QR}) to invert a square matrix as \bq{mtxinv:QRinv} A^{-1} = R^{-1}Q^{*} = S^{*}D^{-1}U^{-1}Q^{*}. \eq
\index{extended operator}
Unitary extended operators are certainly possible, for if~$Q$ is an $m \times m$ dimension-limited unitary matrix, then the extended operator \[ Q_\infty = Q + (I-I_m), \] which is just~$Q$ with ones running out the main diagonal from its active region, itself meets the unitary criterion~(\ref{mtxinv:unitary}) for $m=\infty$. Unitary matrices are so easy to handle that they can sometimes justify significant effort to convert a model to work in terms of them if possible. We shall meet the unitary matrix again in \S\S~\ref{eigen:520} and~\ref{eigen:600}.
% ----------------------------------------------------------------------
The chapter as a whole has demonstrated at least in theory (and usually in practice) techniques to solve any linear system characterized by a matrix of finite dimensionality, whatever the matrix's rank or shape. It has explained how to orthonormalize a set of vectors and has derived from the explanation the useful Gram-Schmidt decomposition. As the chapter's introduction had promised, the matrix has shown its worth here; for without the matrix's notation, arithmetic and algebra, most of the chapter's findings would have lain beyond practical reach. And even so, the single most interesting agent of matrix arithmetic remains yet to be treated. This last is the eigenvalue, and it is the subject of Ch.~\ref{eigen}, next.
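
Incidentally, for the reader who wishes to verify the chapter's computational claims, the following brief sketch, written in Python with the NumPy library and offered only as an illustration, leans on a library QR routine rather than on the algorithm of \S~\ref{mtxinv:460.30} (so it yields no separate factors~$U$, $D$ and~$S$); it confirms the unitary criterion~(\ref{mtxinv:unitary}) and the inversion formula~(\ref{mtxinv:QRinv}) for one small, full-rank matrix of the sketch's own choosing.
\begin{verbatim}
# An illustrative check of (mtxinv:unitary) and (mtxinv:QRinv); the matrix
# below and the library QR routine are stand-ins, not part of the text.
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])

Q, R = np.linalg.qr(A)   # A = QR: Q's columns orthonormal, R upper triangular

print(np.allclose(Q.conj().T @ Q, np.eye(3)))   # Q*Q = I_m
print(np.allclose(Q @ Q.conj().T, np.eye(3)))   # QQ* = I_m, Q being square
print(np.allclose(np.linalg.solve(R, Q.conj().T) @ A,
                  np.eye(3)))                   # A^{-1} = R^{-1} Q*
\end{verbatim}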
derivations-0.53.20120414.orig/tex/template.bib0000644000000000000000000000127511742566274017435 0ustar rootroot@book{label % Required: author= | editor= title= publisher= year= % Optional: volume= | number= series= address= edition= month= note= } @inbook{label % Required: author= | editor= title= chapter= &| pages= publisher= year= % Optional: volume= | number= series= type= address= edition= month= note= } @article{label % Required: author= title= journal= year= % Optional: volume= number= pages= month= note= } @manual{label % Required: title= % Optional: author= organization= address= edition= month= year= note= } @misc{label % Optional: author= title= howpublished= month= year= note= } derivations-0.53.20120414.orig/tex/keyval.tex0000644000000000000000000000514111742575144017151 0ustar rootroot%% %% This is file `keyval.tex', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvkeyval') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. %% %% %% Based on keyval.sty. %% \def\XKV@tempa#1{% \def\KV@@sp@def##1##2{% \futurelet\XKV@resa\KV@@sp@d##2\@nil\@nil#1\@nil\relax##1}% \def\KV@@sp@d{% \ifx\XKV@resa\@sptoken \expandafter\KV@@sp@b \else \expandafter\KV@@sp@b\expandafter#1% \fi}% \def\KV@@sp@b#1##1 \@nil{\KV@@sp@c##1}% } \XKV@tempa{ } \def\KV@@sp@c#1\@nil#2\relax#3{\XKV@toks{#1}\edef#3{\the\XKV@toks}} \def\KV@do#1,{% \ifx\relax#1\@empty\else \KV@split#1==\relax \expandafter\KV@do\fi} \def\KV@split#1=#2=#3\relax{% \KV@@sp@def\XKV@tempa{#1}% \ifx\XKV@tempa\@empty\else \expandafter\let\expandafter\XKV@tempc \csname\KV@prefix\XKV@tempa\endcsname \ifx\XKV@tempc\relax \XKV@err{`\XKV@tempa' undefined}% \else \ifx\@empty#3\@empty \KV@default \else \KV@@sp@def\XKV@tempb{#2}% \expandafter\XKV@tempc\expandafter{\XKV@tempb}\relax \fi \fi \fi} \def\KV@default{% \expandafter\let\expandafter\XKV@tempb \csname\KV@prefix\XKV@tempa @default\endcsname \ifx\XKV@tempb\relax \XKV@err{No value specified for key `\XKV@tempa'}% \else \XKV@tempb\relax \fi} \def\KV@def#1#2[#3]{% \@namedef{KV@#1@#2@default\expandafter}\expandafter {\csname KV@#1@#2\endcsname{#3}}% \@namedef{KV@#1@#2}##1} \endinput %% %% End of file `keyval.tex'. 
derivations-0.53.20120414.orig/tex/thb.sty0000644000000000000000000003031111742574473016453 0ustar rootroot\ProvidesPackage{thb} \usepackage{ifthen} \usepackage{latexsym} \usepackage{amsmath} \usepackage{amssymb} \usepackage{graphicx} \usepackage{pst-plot} \usepackage{makeidx} % T.H. Black's general LaTeX macros. % % You can use this style for whatever you want, but the author does not % especially recommend that you use it for anything except editing the % author's works. The only theme the style's macros have in common is % that they share the same author. The style is published only because % one needs it to compile from source at least one of the author's % works. % % The style, named `thb' according to the author's initials, gathers % various macros the author has made for his own use over the years for % various, sometimes forgotten reasons. Some of the macros do % interesting things like printing a schematical vector arrowhead symbol % or a closed double-integration symbol, and some are just abbreviations % for frequently used commands, but others do things better done by % existing LaTeX standard or LaTeX package commands the author had not % yet learned at the time he created the macro. Such obsolete macros % can remain because unrevised old source text of the author's still % calls them. % % Other than by telling TeX how to hyphenate a handful of words the % author uses and TeX fumbles, the style does not alter standard LaTeX. % It does not narrow the margins or put a special footer on each page, % or anything like that. It just provides some macros, which one can % use or ignore. % % The author does not undertake to make later versions of this % particular style backward-compatible with earlier. (In fact, the % author can pretty much guarantee that later versions will introduce % incompatibilities because, as stated above, the style this file % defines is nothing more than a collection of the author's personal % macros. Like your own personal macros perhaps, the author's too tend % to change over time.) % ---------------------------------------------------------------------- % nonstandard math functions % \newcommand{\cis}{\ensuremath{\mathop\mathrm{cis}\nolimits}} \newcommand{\sinc}{\ensuremath{\mathop\mathrm{sinc}\nolimits}} \newcommand{\sinarg}{\ensuremath{\mathop\mathrm{Sa}\nolimits}} \newcommand{\sinint}{\ensuremath{\mathop\mathrm{Si}\nolimits}} \newcommand{\cosint}{\ensuremath{\mathop\mathrm{Ci}\nolimits}} \newcommand{\expint}{\ensuremath{\mathop\mathrm{Ei}\nolimits}} % % One can create further nonstandard math functions in a similar way. % If limits should be printed directly above and below the function % name in displayed formulas, as they are with the built-in \lim and % \sum functions, then eliminate the "\nolimits" from the function's % definition. Here is a predefined way of doing this: \newcommand{\mopx}[1]{\ensuremath{\mathop\mathrm{#1}\nolimits}} \newcommand{\mop }[1]{\ensuremath{\mathop\mathrm{#1} }} % other nonstandard math symbols % %\newcommand{\apxle}{\raisebox{-0.8ex}{$\ \stackrel{\textstyle <}{\sim}\ $}} %\newcommand{\apxge}{\raisebox{-0.8ex}{$\ \stackrel{\textstyle >}{\sim}\ $}} \newcommand{\apxle}{\lesssim} \newcommand{\apxge}{\gtrsim} % ---------------------------------------------------------------------- % widths, blanks and shifts % % The \sh{n} command inserts a blank space n times the width of a % standard paragraph indentation. This is useful in formatting % multiline formulas. 
Because the standard indent is properly fitted to % the font and font size, it is preferable to scale off the standard % indent than to use a hard length like "0.4in". % \newcommand{\sh}[1]{\makebox[#1\parindent]{}} % % Long before the computer age, typesetters adopted the general practice % of making all digits exactly the same width, even "1", so that tallies % and sums would line up properly. This practice continues for most % fonts today, including TeX's standard Computer Modern font. The % length \wdig is defined to work like \em or \parindent, but is set to % the width of a digit in the present font. With it, one can insert % blanks the precise width of a digit, or multiples thereof, which % sometimes helps in neatly lining columns of figures up. % \newlength {\wdig} \settowidth{\wdig}{0} % % The \mstrut{} command inserts an invisible strut the height (and % depth) of some given mathematical expression. The given expression % itself is not printed. This sometimes helps when you want LaTeX to % print taller parentheses, root-signs, etc., than it otherwise would. % \newlength{\mstruth} \newlength{\mstrutd} \newcommand{\mstrut}[1]{% {% \settoheight{\mstruth}{\ensuremath{{\displaystyle #1}}}% \settodepth {\mstrutd}{\ensuremath{{\displaystyle #1}}}% \rule{0em}{1.0\mstruth}% \rule[-\mstrutd]{0em}{\mstruth}% }% } % units and other plain roman in math % \newcommand{\un}[1]{\ensuremath{\mathrm{#1}}} \newcommand{\mr}[1]{\ensuremath{\mathrm{#1}}} % itemization % \newcommand{\bi}{\begin{itemize}} \newcommand{\ei}{\end{itemize}} % four low dots to end an incomplete sentence.... %\newcommand{\mdots}{\mbox{.\,\ldots}\ \ } \newcommand{\mdots}{\mbox{.\:.\:.\:.}\ \ } % other % \newcommand{\fn}{\footnotesize} % ---------------------------------------------------------------------- % accented characters \newcommand{\tdi}{{\makebox[-0.15ex]{}\textit{\~\i}\makebox[0.15ex]{}}} \newcommand{\tdj}{{\makebox[-0.15ex]{}\textit{\~\j}\makebox[0.15ex]{}}} \newcommand{\tdk}{{\makebox[-0.15ex]{}\textit{\~{\makebox[-1.35ex]{}\makebox{k}}}\makebox[0.15ex]{}}} % ---------------------------------------------------------------------- % vectors % % (The \we and \wu can be used with \usepackage{amsmath}.) 
% \newcommand{\ve}[1]{\ensuremath{\mathbf{#1}}} \newcommand{\vu}[1]{\ensuremath{\hat{\mathbf{#1}}}} \newcommand{\we}[1]{\ensuremath{\boldsymbol{#1}}} \newcommand{\wu}[1]{\ensuremath{\hat{\boldsymbol{#1}}}} \newcommand{\dyad}[1]{\ensuremath{\overline{\overline{#1}}}} \newcommand{\vui}{\ensuremath{\hat{\mbox{\textbf{\i}}}}} \newcommand{\vuj}{\ensuremath{\hat{\mbox{\textbf{\j}}}}} % graphical vector symbols (\usepackage{pst-plot}) % \newcommand{\vectortoward}{% \begin{pspicture}(-0.170,-0.135)(0.170,0.135)% \pscircle[linewidth=1.0pt](0,0){0.15}% \psdot[linewidth=0.6pt](0.0,0.0)% \end{pspicture}% } \newcommand{\vectoraway}{% \begin{pspicture}(-0.170,-0.135)(0.170,0.135)% \pscircle[linewidth=1.0pt](0,0){0.15}% \psline[linewidth=1.0pt](-0.105,-0.105)(0.105,0.105)% \psline[linewidth=1.0pt](-0.105,0.105)(0.105,-0.105)% \end{pspicture}% } % inversion % \newcommand{\nv}[1]{\ensuremath{{#1}^{-1}}} % Greek letters % \newcommand{\ep}{\ensuremath{\epsilon}} % arrows % \newcommand{\la}{\ensuremath{\leftarrow}} \newcommand{\ra}{\ensuremath{\rightarrow}} \newcommand{\lra}{\ensuremath{\leftrightarrow}} % equations % \newcommand{\bq}[1]{\begin{equation}\label{#1}} \newcommand{\eq}{\end{equation}} \newcommand{\bqx}{\begin{equation}} \newcommand{\eqx}{\end{equation}} \newcommand{\bqa}{\begin{eqnarray}} \newcommand{\eqa}{\end{eqnarray}} \newcommand{\bqb}{\begin{eqnarray*}} \newcommand{\eqb}{\end{eqnarray*}} \newcommand{\xn}{\nonumber} % arrays % \newcommand{\br}[1]{\begin{array}{#1}} \newcommand{\er}{\end{array}} \newcommand{\ds}{\displaystyle} \newcommand{\mf}[2]{ \ensuremath{ \mbox{% \footnotesize$ \displaystyle\left[ \begin{array}{#1} #2 \end{array} \right] $% } } } \newcommand{\mfd}[2]{ \ensuremath{ \mbox{% \footnotesize$ \displaystyle\left| \begin{array}{#1} #2 \end{array} \right| $% } } } % Stack index declarations in a subscript. \newcommand\stackindexdecl[1]{ { \ensuremath \renewcommand\arraystretch{0.80} \setlength\arraycolsep{0.00\arraycolsep} \mbox{ \scriptsize \!$ \begin{array}{rcl} #1 \end{array} $\! } } } % delimiters {[()]} of uniform large size % (Obsolete. Use AMSmath's \big, \Big, \bigg, etc., instead.) 
% \newcommand{\blp}{\ensuremath{\left(\rule{0em}{3.5ex}}} \newcommand{\brp}{\ensuremath{\rule{0em}{3.5ex}\right)}} \newcommand{\blq}{\ensuremath{\left[\rule{0em}{3.5ex}}} \newcommand{\brq}{\ensuremath{\rule{0em}{3.5ex}\right]}} \newcommand{\blr}{\ensuremath{\left\{\rule{0em}{3.5ex}}} \newcommand{\brr}{\ensuremath{\rule{0em}{3.5ex}\right\}}} % and smaller sizes \newcommand{\blpb}{\ensuremath{\left(\rule{0em}{3.0ex}}} \newcommand{\brpb}{\ensuremath{\rule{0em}{3.0ex}\right)}} \newcommand{\blqb}{\ensuremath{\left[\rule{0em}{3.0ex}}} \newcommand{\brqb}{\ensuremath{\rule{0em}{3.0ex}\right]}} \newcommand{\blrb}{\ensuremath{\left\{\rule{0em}{3.0ex}}} \newcommand{\brrb}{\ensuremath{\rule{0em}{3.0ex}\right\}}} % and smaller sizes \newcommand{\blpc}{\ensuremath{\left(\rule{0em}{2.5ex}}} \newcommand{\brpc}{\ensuremath{\rule{0em}{2.5ex}\right)}} \newcommand{\blqc}{\ensuremath{\left[\rule{0em}{2.5ex}}} \newcommand{\brqc}{\ensuremath{\rule{0em}{2.5ex}\right]}} \newcommand{\blrc}{\ensuremath{\left\{\rule{0em}{2.5ex}}} \newcommand{\brrc}{\ensuremath{\rule{0em}{2.5ex}\right\}}} % ---------------------------------------------------------------------- % double integrals (the "l" forms are for use in running text) % (these macros, and those below for triple integrals, need more robustness) % \newcommand{\intos}{\ensuremath{\int\makebox[-1.2em]{}\bigcirc\makebox[-1.2em]{}\int_S}} \newcommand{\intosl}{\ensuremath{\int\makebox[-0.62em]{}{\circ}\makebox[-0.62em]{}\int_S}} \newcommand{\intosx}{\ensuremath{\int\makebox[-1.2em]{}\bigcirc\makebox[-1.2em]{}\int}} \newcommand{\intosxl}{\ensuremath{\int\makebox[-0.62em]{}{\circ}\makebox[-0.62em]{}\int}} \newcommand{\ints}{\ensuremath{\int\!\!\!\!\int_S}} \newcommand{\intsl}{\ensuremath{\int\!\!\int_S}} \newcommand{\intsx}{\ensuremath{\int\!\!\!\!\int}} \newcommand{\intsxl}{\ensuremath{\int\!\!\int}} % triple integrals % \newcommand{\intv}{\ensuremath{\int\!\!\!\!\int\!\!\!\!\int_V}} \newcommand{\intvl}{\ensuremath{\int\!\!\int\!\!\int_V}} \newcommand{\intvx}{\ensuremath{\int\!\!\!\!\int\!\!\!\!\int}} \newcommand{\intvxl}{\ensuremath{\int\!\!\int\!\!\int}} % linear operators % \newcommand{\divg}{\ensuremath{\nabla\cdot}} \newcommand{\curl}{\ensuremath{\nabla\times}} \newcommand{\pp}[2]{\ensuremath{\frac{\partial{#1}}{\partial{#2}}}} \newcommand{\ppx}[1]{\ensuremath{\frac{\partial}{\partial{#1}}}} \newcommand{\pl}{\ensuremath\partial} % EM parameters % \newcommand{\epsc}{\ensuremath{\left[\epsilon+\frac{\sigma}{j\omega}\right]}} % Other operators and functions % \newcommand{\cmb}[2]{\ensuremath{\left(\begin{array}{c}{#1}\\{#2}\end{array}\right)}} \newcommand{\cmbl}[2]{\ensuremath{\mbox{\scriptsize$\cmb{#1}{#2}$}}} \newcommand{\fouripair}{\ensuremath{\stackrel{\mathcal{F}}{\rightarrow}}} \newcommand{\laplair}{\ensuremath{\stackrel{\mathcal{L}}{\rightarrow}}} % ---------------------------------------------------------------------- % centering % \newcommand{\bc}{\begin{center}} \newcommand{\ec}{\end{center}} % PSTricks % % (nothing yet) % Memoranda % \newcommand{\memo}[4]{ \noindent \begin{center} MEMORANDUM \end{center} \noindent \makebox[-1.2\arraycolsep]{}\begin{tabular}{ll} From: & #1 \\ Date: & #2 \\ To: & #3 \\ Subject: & #4 \\ \end{tabular}\\ } % TeX directives % \newcommand{\nc}{\newcommand} % citation % \newcommand{\textcomp}[1]{\texttt{#1}} % hyphenation % % TeX is uncannily good at hy-phen-at-ing hyphenating words. However, % occasionally a word will fool it. Add such words here, separated by % spaces. 
(If you want to see how TeX thinks a word may be hyphenated, % give the \showhyphens{} command, then examine the LaTeX run log.) % \hyphenation{ Helm-holtz iso-trop-ic an-iso-trop-ic in-te-grate in-te-grat-ing con-stit-u-ent con-stit-u-ents quasi-el-e-men-tary quasi-el-e-men-taries none-the-less manu-script manu-scripts de-riv-a-tive de-riv-a-tives equi-dis-tant equi-dis-tance pa-rab-o-la par-a-bol-ic par-a-bol-i-cal-ly pa-rab-o-loid pa-rab-o-loi-dal Py-thag-o-re-an } % ---------------------------------------------------------------------- % Bug work-arounds. % % No typesetting system is perfect. Here are some alternate definitions % you may need to build on different versions of LaTeX. % Uncomment one of the following scalebox alternatives. %\newcommand{\localscalebox}[3]{\scalebox{#1 #2}{#3}} % teTeX 2 \newcommand{\localscalebox}[3]{\scalebox{#1}[#2]{#3}} % teTeX 3 or TeXLive derivations-0.53.20120414.orig/tex/fouri.tex0000644000000000000000000025037011742566274017014 0ustar rootroot\chapter{The Fourier and Laplace transforms} \label{fouri} \index{transform} \index{Fourier transform} \index{Laplace transform} \index{Fourier, Jean Baptiste Joseph\\(1768--1830)} % bad break \index{Laplace, Pierre-Simon (1749--1827)} The Fourier series of Ch.~\ref{fours} though quite useful applies solely to waveforms that repeat. An effort to extend the Fourier series to the broader domain of nonrepeating waveforms leads to the \emph{Fourier transform,} this chapter's chief subject. [This chapter is yet only a rough draft.] % ---------------------------------------------------------------------- \section{The Fourier transform} \label{fouri:100} \index{Fourier transform} \index{transform!Fourier} This section derives and presents the Fourier transform, extending the % bad break \linebreak Fourier series. \subsection{Fourier's equation} \label{fouri:100.10} \index{Fourier's equation} \index{pulse} \index{pulse train} \index{nonrepeating waveform} \index{waveform!nonrepeating} \index{integral!as the continuous limit of a sum} \index{sum!continuous limit of} Consider the nonrepeating waveform or \emph{pulse} of Fig.~\ref{fouri:100:fig}. \begin{figure} \caption{A pulse.} \label{fouri:100:fig} \bc \nc\xxxab{4.3} \nc\xxya{-0.7} \nc\xxyb{1.6} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.1} \nc\fyb{2.5} \nc\xxaxes{% {% \psset{linewidth=0.5pt}% \psline(-\xxxab,0)(\xxxab,0)% \psline(0,\xxya)(0,\xxyb)% \uput[r](\xxxab,0){$t$}% \uput[u](0,\xxyb){$f(t)$}% }% } \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} \small { \xxaxes \psplot[linewidth=2.0pt,plotpoints=300]{-4.05}{4.05}{ /e 2.71828182845905 def /scale 0.7 def /pulse { /A exch def /sigma exch def /to exch def e x scale div to sub sigma div dup mul 2.0 div neg exp A mul scale mul } def -0.20 1.00 2.20 pulse 0.50 0.80 -1.80 pulse -2.00 0.60 1.10 pulse add add } } \end{pspicture} \ec \end{figure}% Because the pulse does not repeat it has no Fourier series, yet one can however give it something very like a Fourier series in the following way. First, convert the pulse $f(t)$ into the pulse train \[ g(t) \equiv \sum_{n=-\infty}^\infty f(t-nT_1), \] which naturally does repeat.% \footnote{ One could divert rigorously from this point to consider formal requirements against $f(t)$ but it suffices that $f(t)$ be sufficiently limited in extent that $g(t)$ exist for all $\Re(T_1) > 0$, $\Im(T_1) = 0$. 
Formally, such a condition would forbid a function like $f(t) = A \cos \omega_o t$, but one can evade this formality, among other ways, by defining the function as $f(t) = \lim_{T_2\ra\infty} \Pi(t/T_2) A \cos \omega_o t$, where $\Pi(t)$ is the rectangular pulse of~(\ref{fours:095:10}). We will leave to the professionals further consideration of formal requirements. } Second, by~(\ref{fours:100:15}), calculate the Fourier coefficients of this pulse train $g(t)$. Third, use these coefficients in the Fourier series~(\ref{fours:100:10}) to reconstruct \[ g(t) = \sum_{j=-\infty}^{\infty} \left\{ \left[ \frac{1}{T_1} \int_{-T_1/2}^{T_1/2} e^{-ij\,\Delta\omega\,\tau} g(\tau) \,d\tau \right] e^{ij \,\Delta\omega\, t} \right\}. \] Fourth, observing that $\lim_{T_1\ra\infty}g(t) = f(t)$, recover from the train the original pulse \[ f(t) = \lim_{T_1\ra\infty}\sum_{j=-\infty}^{\infty} \left\{ \left[ \frac{1}{T_1} \int_{-T_1/2}^{T_1/2} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau \right] e^{ij \,\Delta\omega\, t} \right\}; \] or, observing per~(\ref{fours:080:08}) that $\Delta\omega\,T_1 = 2\pi$ and reordering factors, \[ f(t) = \lim_{\Delta\omega\ra 0^{+}} \frac{1}{\sqrt{2\pi}} \sum_{j=-\infty}^{\infty} e^{ij \,\Delta\omega\, t} \left[ \frac{1}{\sqrt{2\pi}} \int_{-2\pi/2\Delta\omega}^{2\pi/2\Delta\omega} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau \right] \,\Delta\omega. \] Fifth, defining the symbol $\omega \equiv j\,\Delta\omega$ observe that the summation is really an integration in the limit, such that \bq{fouri:eqn} f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega t} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega\tau} f(\tau) \,d\tau \right] \,d\omega. \eq This is \emph{Fourier's equation,} a remarkable, highly significant result. \subsection{The transform and inverse transform} \label{fouri:100.20} \index{inverse Fourier transform} \index{Fourier transform!inverse} The reader may agree that Fourier's equation~(\ref{fouri:eqn}) is curious, but in what way is it remarkable? To answer, let us observe that the quantity in (\ref{fouri:eqn})'s square braces, \bq{fouri:xform} F(\omega) = \mathcal{F}\left\{f(t)\right\} \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega\tau} f(\tau) \,d\tau, \eq is a function not of~$t$ but rather of~$\omega$. We conventionally give this function the capitalized symbol $F(\omega)$ and name it the \emph{Fourier transform} of $f(t)$, introducing also the useful notation $\mathcal{F}\{\cdot\}$ (where the script letter~$\mathcal{F}$ stands for ``Fourier'' and is only coincidentally, unfortunately, the same letter here as~$f$ and~$F$) as a short form to represent the transformation~(\ref{fouri:xform}) serves to define. Substituting~(\ref{fouri:xform}) into~(\ref{fouri:eqn}) and changing $\eta \la \omega$ as the dummy variable of integration, we have that \bq{fouri:invxform} f(t) = \mathcal{F}^{-1}\left\{F(\omega)\right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\eta t} F(\eta) \,d\eta. \eq This last is the \emph{inverse Fourier transform} of the function $F(\omega)$. \index{frequency content} The Fourier transform~(\ref{fouri:xform}) serves as a continuous measure of a function's frequency content. To understand why this should be so, consider that~(\ref{fouri:invxform}) constructs a function $f(t)$ of an infinity of infinitesimally graded complex exponentials and that~(\ref{fouri:xform}) provides the weights $F(\omega)$ for the construction. 
Indeed, the Fourier transform's complementary equations~(\ref{fouri:invxform}) and~(\ref{fouri:xform}) are but continuous versions of the earlier complementary equations~(\ref{fours:100:10}) and~(\ref{fours:100:15}) of the discrete Fourier series. The transform finds even wider application than does the series.% \footnote{ Regrettably, several alternate definitions and usages of the Fourier series are broadly current in the writer's country alone. Alternate definitions~\cite{Phillips/Parr}\cite{Couch} handle the factors of $1/\sqrt{2\pi}$ differently. Alternate usages~\cite{Feynman} change $-i\la i$ in certain circumstances. The essential Fourier mathematics however remains the same in any case. The reader can adapt the book's presentation to the Fourier definition and usage his colleagues prefer at need. } Figure~\ref{fouri:100:figt} plots the Fourier transform of the pulse of Fig.~\ref{fouri:100:fig}. \begin{figure} \caption{The Fourier transform of the pulse of Fig.~\ref{fouri:100:fig}.} \label{fouri:100:figt} \bc \nc\xxxab{4.3} \nc\xxya{-1.40} \nc\xxyb{1.40} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.7} \nc\fyb{1.7} \nc\xxaxes{% {% \psset{linewidth=0.5pt}% \psline(-\xxxab,0)(\xxxab,0)% \psline(0,\xxya)(0,\xxyb)% \uput[r](\xxxab,0){$\omega$}% \psline(-0.39,0.80)(-0.51,0.92)(-0.71,0.92) \uput[l](-0.60,0.92){$\Re[F(\omega)]$} \psline(-1.03,-0.94)(-1.15,-1.06)(-1.35,-1.06) \uput[l](-1.24,-1.06){$\Im[F(\omega)]$} }% } \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} \small { \xxaxes \psplot[linewidth=2.0pt,plotpoints=300]{-4.05}{4.05}{ /e 2.71828182845905 def /rad 57.2957795130823 def /scale 0.7 def /pulse { /A exch def /sigma exch def /to exch def e x scale div sigma mul dup mul 2.0 div neg exp A mul x scale div to mul neg rad mul cos mul scale mul } def -0.20 1.00 2.20 pulse 0.50 0.80 -1.80 pulse -2.00 0.60 1.10 pulse add add } \psplot[linewidth=2.0pt,plotpoints=300,linestyle=dashed]{-4.05}{4.05}{ /e 2.71828182845905 def /rad 57.2957795130823 def /scale 0.7 def /pulse { /A exch def /sigma exch def /to exch def e x scale div sigma mul dup mul 2.0 div neg exp A mul x scale div to mul neg rad mul sin mul scale mul } def -0.20 1.00 2.20 pulse 0.50 0.80 -1.80 pulse -2.00 0.60 1.10 pulse add add } } \end{pspicture} \ec \end{figure}% \subsection{The complementary variables of transformation} \label{fouri:100.30} \index{transformation, variable of} \index{variable of transformation} \index{complementary variables of transformation, the} \index{independent variable!Fourier transform and} \index{Fourier transform!independent variable and} \index{domain!time and frequency} \index{frequency domain} \index{time domain} \index{transform domain} \index{domain!transform} If~$t$ represents time then~$\omega$ represents angular frequency as \S~\ref{fours:085} has explained. 
In this case the function $f(t)$ is said to operate in the \emph{time domain} and the corresponding transformed function $F(\omega)$, in the \emph{frequency domain.} The mutually independent variables~$\omega$ and~$t$ are then the \emph{complementary variables of transformation.} \index{Fourier transform pair} \index{transform pair} Formally, one can use any two letters in place of~$t$ and~$\omega$; and indeed one need not even use two different letters, for it is sometimes easier just to write \bq{fouri:byu} \begin{split} F(v) = \mathcal{F}\left\{f(v)\right\} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} f(\theta) \,d\theta, \\ f(v) = \mathcal{F}^{-1}\left\{F(v)\right\} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv\theta} F(\theta) \,d\theta, \\ \mathcal{F} &\equiv \mathcal{F}_{vv}, \end{split} \eq in which the~$\theta$ is in itself no variable of transformation but only a dummy variable. To emphasize the distinction between the untransformed and transformed (respectively typically time and frequency) domains, however, one can instead write \bq{fouri:byu2} \begin{split} F(\omega) = \mathcal{F}\left\{f(t)\right\} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega t} f(t) \,dt, \\ f(t) = \mathcal{F}^{-1}\left\{F(\omega)\right\} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega t} F(\omega) \,d\omega, \\ \mathcal{F} &\equiv \mathcal{F}_{\omega t}, \end{split} \eq where~(\ref{fouri:byu2}) is just~(\ref{fouri:xform}) and~(\ref{fouri:invxform}) together with appropriate changes of dummy variable. Notice here the usage of the symbol~$\mathcal F$, incidentally. As clarity demands, one can elaborate the~$\mathcal{F}$---here or wherever else it appears---as~$\mathcal{F}_{vv}$, $\mathcal{F}_{\omega t}$ or the like to identify the complementary variables of transformation explicitly. The unadorned symbol~$\mathcal{F}$ however usually acquits itself clearly enough in context (refer to \S~\ref{hex:270.2}). Whichever letter or letters might be used for the independent variable, the functions \bq{fouri:100:30} f(v) \stackrel{\mathcal F}{\ra} F(v) \eq constitute a \emph{Fourier transform pair.} \subsection{An example} \label{fouri:100.40} \index{Fourier transform!example of} \index{triangular pulse!Fourier transform of} \index{Fourier transform!of a triangular pulse} As a Fourier example, consider the triangular pulse $\Lambda(t)$ of~(\ref{fours:095:10}). Its Fourier transform according to~(\ref{fouri:byu}) is \bqb \mathcal{F}\left\{\Lambda(v)\right\} &=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} \Lambda(\theta) \,d\theta \\&=& \frac{1}{\sqrt{2\pi}} \left\{ \int_{-1}^{0} e^{-iv\theta} (1+\theta) \,d\theta + \int_{0}^{1} e^{-iv\theta} (1-\theta) \,d\theta \right\}. \eqb According to Table~\ref{inttx:470:tbl} (though it is easy enough to figure it out without recourse to the table), $\theta e^{-iv\theta} = [d/d\theta] [ e^{-iv\theta}(1+iv\theta)/v^2 ]$; so, continuing, \bqb \mathcal{F}\left\{\Lambda(v)\right\} &=& \frac{1}{v^2\sqrt{2\pi}} \bigg\{ \left[ e^{-iv\theta} [1+(iv)(1+\theta)] \right]_{-1}^{0} \\&&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \mbox{} + \left[ e^{-iv\theta} [-1+(iv)(1-\theta)] \right]_{0}^{1} \bigg\} \\&=& \frac{\sinarg^2(v/2)}{\sqrt{2\pi}}, \eqb where $\sinarg(\cdot)$ is the sine-argument function of~(\ref{fours:160:10}). Thus we find the Fourier transform pair \bq{fouri:100:41} \Lambda(v) \fouripair \frac{\sinarg^2(v/2)}{\sqrt{2\pi}}. 
\eq \index{square pulse!Fourier transform of} \index{Fourier transform!of a square pulse}
One can compute other Fourier transforms in like manner, such as that \bq{fouri:100:44} \Pi(v) \fouripair \frac{\sinarg(v/2)}{\sqrt{2\pi}}, \eq and yet further transforms by the duality rule and the other properties of \S~\ref{fouri:110}.
% ----------------------------------------------------------------------
\section{Properties of the Fourier transform}
\label{fouri:110}
\index{Fourier transform!properties of}
The Fourier transform obeys an algebra of its own, exhibiting several broadly useful properties one might grasp to wield the transform effectively. This section derives and lists the properties.
\subsection{Duality}
\label{fouri:110.10}
\index{duality} \index{Fourier transform!dual of} \index{Fourier transform!reversing the independent variable of}
Changing $-v \la v$ makes (\ref{fouri:byu})'s second line read \[ f(-v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} F(\theta) \,d\theta. \] However, according to (\ref{fouri:byu})'s first line, this says neither more nor less than that \bq{fouri:110:10} F(v) \stackrel{\mathcal F}{\ra} f(-v), \eq which is that the transform of the transform is the original function with the independent variable reversed, an interesting and useful property. It is entertaining,
%\footnote{ % At least for those of us who are easily entertained! %}
and moreover enlightening, to combine~(\ref{fouri:100:30}) and~(\ref{fouri:110:10}) to form the endless transform progression \bq{fouri:110:12} \cdots \stackrel{\mathcal F}{\ra} f(v) \stackrel{\mathcal F}{\ra} F(v) \stackrel{\mathcal F}{\ra} f(-v) \stackrel{\mathcal F}{\ra} F(-v) \stackrel{\mathcal F}{\ra} f(v) \stackrel{\mathcal F}{\ra} \cdots \eq
%A sequence of four successive transformations apparently recovers the %original function.% %\footnote{ % Fourier transformation resembles multiplication by~$i$ in this % respect. (Tangentially, see also Ch.~\ref{cubic}'s % footnote~\ref{cubic:250:09}.) %}
Equation~(\ref{fouri:110:10}), or alternately~(\ref{fouri:110:12}), expresses the Fourier transform's \emph{duality} rule.
\index{compositional duality} \index{duality!compositional} \index{Fourier transform!compositional dual of}
The Fourier transform evinces duality in another guise too, \emph{compositional duality,} expressed abstractly as \bq{fouri:110:14} \begin{split} g[v,f(h_g(v))] &\fouripair G[v,F(h_G(v))], \\ G[v,f(h_G(v))] &\fouripair g[-v,F(-h_g(-v))]. \end{split} \eq This is best introduced by example. Consider the Fourier pair $\Lambda(v) \fouripair \sinarg^2[v/2]/\sqrt{2\pi}$ mentioned in \S~\ref{fouri:100.40}, plus the Fourier identity $f(v-a) \fouripair e^{-iav}F(v)$ which we have not yet met but will in \S~\ref{fouri:110.30} below. Identifying $f(v)=\Lambda(v)$ and $F(v)=\sinarg^2[v/2]/\sqrt{2\pi}$, the identity extends the pair to $\Lambda(v-a) \fouripair e^{-iav}\sinarg^2[v/2]/\sqrt{2\pi}$. On the other hand, recognizing $h_g(v) = v-a$, $g[v,(\cdot)] = (\cdot)$, $h_G(v)=v$, and $G[v,(\cdot)]=(e^{-iav})(\cdot)$, eqn.~(\ref{fouri:110:14}) converts the identity to its compositional dual $e^{-iav}f(v) \fouripair F(v+a)$, which in turn extends the pair to $e^{-iav}\Lambda(v) \fouripair \sinarg^2[(v+a)/2]/\sqrt{2\pi}$.
Note incidentally that the direct dual of the original pair per~(\ref{fouri:110:12}) is the pair $\sinarg^2[v/2]/\sqrt{2\pi} \fouripair \Lambda(-v)$ which, since it happens that $\Lambda(-v)=\Lambda(v)$, is just the pair $\sinarg^2[v/2]/\sqrt{2\pi} \fouripair \Lambda(v)$; but that we need neither the identity nor~(\ref{fouri:110:14}) to determine this. \index{formal pair} \index{Fourier transform pair!formal} \index{transform pair!formal} So, assuming that~(\ref{fouri:110:14}) is correct, it does seem useful; but is it correct? To show that it is, take the direct dual on~$v$ of (\ref{fouri:110:14})'s first line to get the formal pair \[ G[v,F(h_G(v))] \fouripair g[-v,f(h_g(-v))], \] then change the symbols $\phi \la f$ and $\Phi \la F$ to express the same formal pair as \bq{fouri:110:15} G[v,\Phi(h_G(v))] \fouripair g[-v,\phi(h_g(-v))]. \eq Now, this as we said is merely a formal pair, which is to say that it represents no functions in particular but presents a pattern to which functions can be fitted. Therefore, $\phi([\cdot])$ might represent any function so long as $\Phi([\cdot])$ were let to represent the same function's Fourier transform on~$[\cdot]$, as% \footnote{ To be symbolically precise, the~$\mathcal F$ here is $\mathcal F_{[\cdot][\cdot]}$, such that \[ \begin{split} \phi([\cdot]) &\stackrel{\mathcal{F}_{[\cdot][\cdot]}}{\rightarrow} \Phi([\cdot]), \\ F(-[\cdot]) &\stackrel{\mathcal{F}_{[\cdot][\cdot]}}{\rightarrow} f([\cdot]); \end{split} \] whereas the~$\mathcal F$ in the formal pairs was~$\mathcal F_{vv}$, such that \[ \begin{split} g[v,f(h_g(v))] &\stackrel{\mathcal{F}_{vv}}{\rightarrow} G[v,F(h_G(v))], \\ G[v,f(h_G(v))] &\stackrel{\mathcal{F}_{vv}}{\rightarrow} g[-v,F(-h_g(-v))]. \end{split} \] Refer to \S~\ref{fouri:100.30}. } \[ \phi([\cdot]) \fouripair \Phi([\cdot]). \] Suppose some particular function $f([\cdot])$ whose Fourier transform on~$[\cdot]$ is $F([\cdot])$, for the two of which there must exist---by direct duality thrice on~$[\cdot]$---the Fourier pair \[ F(-[\cdot]) \fouripair f([\cdot]). \] Let us define \[ \Phi([\cdot]) \equiv f([\cdot]), \] whose inverse Fourier transform on~$[\cdot]$, in view of the foregoing, cannot but be \[ \phi([\cdot]) \equiv F(-[\cdot]); \] then observe that substituting these two, complementary definitions together into the formal pair~(\ref{fouri:110:15}) yields (\ref{fouri:110:14})'s second line, completing the proof. Once the proof is understood,~(\ref{fouri:110:14}) is readily extended to \bq{fouri:110:16} \begin{split} g[v,f_1(h_{g1}(v)),f_2(h_{g2}(v))] &\fouripair G[v,F_1(h_{G1}(v)),F_2(h_{G2}(v))], \\ G[v,f_1(h_{G1}(v)),f_2(h_{G2}(v))] &\fouripair g[-v,F_1(-h_{g1}(-v)),F_2(-h_{g2}(-v))]; \end{split} \eq and indeed generalized to %\footnote{ % The notation of~(\ref{fouri:110:17}) is more abstract than one would % like but once you grasp the semantics (which admittedly is not trivial % to do) you will see the necessity of the abstraction. %} \bq{fouri:110:17} \begin{split} g[v,f_k(h_{gk}(v))] &\fouripair G[v,F_k(h_{Gk}(v))], \\ G[v,f_k(h_{Gk}(v))] &\fouripair g[-v,F_k(-h_{gk}(-v))], \end{split} \eq in which $g[v,f_k(h_{gk}(v))]$ means $g[v,f_1(h_{g1}(v)),f_2(h_{g2}(v)),f_3(h_{g3}(v)),\ldots]$. Table~\ref{fouri:110:tbl10} summarizes. \begin{table} \caption[Fourier duality rules.]{Fourier duality rules. 
(Observe that the compositional rules, the table's several rules involving~$g$, transform only properties valid for all $f[v]$.)} \label{fouri:110:tbl10} \bqb f(v) &\fouripair& F(v) \\ F(v) &\fouripair& f(-v) \\ f(-v) &\fouripair& F(-v) \\ F(-v) &\fouripair& f(v) \\&&\\ g[v,f(h_g(v))] &\fouripair& G[v,F(h_G(v))] \\ G[v,f(h_G(v))] &\fouripair& g[-v,F(-h_g(-v))] \\&&\\ g[v,f_1(h_{g1}(v)),f_2(h_{g2}(v))] &\fouripair& G[v,F_1(h_{G1}(v)),F_2(h_{G2}(v))] \\ G[v,f_1(h_{G1}(v)),f_2(h_{G2}(v))] &\fouripair& g[-v,F_1(-h_{g1}(-v)),F_2(-h_{g2}(-v))] \\&&\\ g[v,f_k(h_{gk}(v))] &\fouripair& G[v,F_k(h_{Gk}(v))] \\ G[v,f_k(h_{Gk}(v))] &\fouripair& g[-v,F_k(-h_{gk}(-v))] \eqb \end{table} \subsection{Real and imaginary parts} \label{fouri:110.15} The Fourier transform of a function's conjugate according to~(\ref{fouri:byu}) is \[ \mathcal{F}\{f^{*}(v)\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-iv\theta} f^{*}(\theta)\, d\theta = \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{iv^{*}\theta} f(\theta)\, d\theta\right]^{*}, \] in which we have taken advantage of the fact that the dummy variable $\theta = \theta^{*}$ happens to be real. This implies by~(\ref{fouri:byu}) and~(\ref{fouri:110:12}) that \bq{fouri:110:18} \mathcal{F}\{f^{*}(v)\} = \mathcal{F}^{-*}\{f(v^{*})\} = \mathcal{F}^{*}\{f(-v^{*})\}, \eq where the symbology $\mathcal{F}^{-*}\{\cdot\} \equiv [\mathcal{F}^{-1}\{\cdot\}]^{*}$ is used and $ \mathcal{F}^{-1}\{g(w)\} = % bad break \linebreak \mathcal{F}\{\mathcal{F}^{-1}\{\mathcal{F}^{-1}\{g(w)\}\}\} = \mathcal{F}\{g(-w)\} $. In the arrow notation, it implies that% \footnote{ From past experience with complex conjugation, an applied mathematician might naturally have expected of~(\ref{fouri:110:18a}) that $f^{*}(v) \fouripair F^{*}(v)$, but this natural expectation would actually have been incorrect. Readers whom this troubles might consider that, unlike most of the book's mathematics before Ch.~\ref{fours}, eqns.~(\ref{fours:100:10}) and~(\ref{fours:100:15})---and thus ultimately also the Fourier transform's definition~(\ref{fouri:byu})---have arbitrarily chosen a particular sign for the~$i$ in the phasing factor $e^{-ij\,\Delta\omega\,\tau}$ or $e^{-iv\theta}$, which phasing factor the Fourier integration bakes into the transformed function $F(v)$, so to speak. The Fourier transform as such therefore does not meet \S~\ref{alggeo:225.2}'s condition for~(\ref{alggeo:225:conj2}) to hold. Fortunately,~(\ref{fouri:110:18a}) does hold. Viewed from another angle, it must be so, because Fourier transforms real functions into complex ones. See Figs.~\ref{fouri:100:fig} and~\ref{fouri:100:figt}. } \bq{fouri:110:18a} f^{*}(v) \fouripair F^{*}(-v^{*}). \eq If we express the real and imaginary parts of $f(v)$ in the style of~(\ref{alggeo:225:25}) as \[ \begin{split} \Re[f(v)] &= \frac{f(v) + f^{*}(v)}{2}, \\ \Im[f(v)] &= \frac{f(v) - f^{*}(v)}{i2}, \end{split} \] then the Fourier transforms of these parts according to~(\ref{fouri:110:18a}) are% \footnote{ The precisely orderly reader might note that a forward reference to Table~\ref{fouri:110:tbl20} is here implied, but the property referred to, Fourier superposition $A_1f_1(v) + A_2f_2(v) \fouripair A_1F_1(v) + A_2F_2(v)$, which does not depend on this subsection's results anyway, is so trivial to prove that we will not bother about the precise ordering in this case. } \bq{fouri:110:19} \begin{split} \Re[f(v)] &\fouripair \frac{F(v) + F^{*}(-v^{*})}{2}, \\ \Im[f(v)] &\fouripair \frac{F(v) - F^{*}(-v^{*})}{i2}. 
\end{split} \eq For real~$v$ and an $f(v)$ which itself is real for all real~$v$, the latter line becomes \[ 0 \fouripair \frac{F(v) - F^{*}(-v)}{i2} \ \ \mbox{if $\Im(v) = 0$ and, for all such~$v$, $\Im[f(v)] = 0$,} \] whereby \bq{fouri:110:19b} F(v) = F^{*}(-v) \ \ \mbox{if $\Im(v) = 0$ and, for all such~$v$, $\Im[f(v)] = 0$.} \eq Interpreted,~(\ref{fouri:110:19b}) says for real~$v$ and $f(v)$ that the plot of $\Re[F(v)]$ is symmetric about the vertical axis whereas the plot of $\Im[F(v)]$ is symmetric about the origin, as Fig.~\ref{fouri:100:figt} has illustrated. Table~\ref{fouri:110:tbl15} summarizes. \begin{table} \caption{Real and imaginary parts of the Fourier transform.} \label{fouri:110:tbl15} \bqb f^{*}(v) &\fouripair& F^{*}(-v^{*}) \\ \Re[f(v)] &\fouripair& \frac{F(v) + F^{*}(-v^{*})}{2} \\ \Im[f(v)] &\fouripair& \frac{F(v) - F^{*}(-v^{*})}{i2} \eqb \bc If $\Im(v) = 0$ and, for all such~$v$, $\Im[f(v)] = 0$, then \ec \[ F(v) = F^{*}(-v). \] \end{table} %Before continuing to the next subsection to study further Fourier %properties we should caution that---though the properties of %Table~\ref{fouri:110:tbl15} are indeed valid---the meaning and, indeed, %the proper existence of the Fourier transform are not necessarily %obvious when~$v$ is complex. Subtleties can arise, depending on %the function being transformed. For instance, after reading %\S\S~\ref{fouri:110.20} and~\ref{fouri:110.30} below, one might consider %the Fourier transform of $f(u) = e^{iav}$ for complex~$v$. Then, after %reading %[section not yet written]% %, %one might consider representing the Dirac delta and its transform as the %limit of a Gaussian pulse. None of this is easy. The peculiar form %of~(\ref{fouri:110:18a}) is a sign that Fourier mathematics give %complex arguments an aspect not seen earlier in the book. The Fourier %transform most often arises in applications with real~$v$ for this %reason among others. % %Section %[not yet written] %will introduce the Laplace transform, a variant of the Fourier transform %meant especially, systematically to handle complex~$v$. Students new to %the Fourier transform, if confused by the complex case, might limit %their consideration of the topic to the real case at least until after %learning Laplace. \subsection{The Fourier transform of the Dirac delta} \label{fouri:110.20} \index{delta function, Dirac!Fourier transform of} \index{Dirac delta function!Fourier transform of} \index{Fourier transform!of the Dirac Delta} \index{constant!Fourier transform of} \index{Fourier transform!of a constant} \index{$1$ (one)!Fourier transform of} Section~\ref{fouri:120} will compute several Fourier transform pairs but \S~\ref{fouri:110.30} will need one particular pair and its dual sooner, so let us pause to compute these now. Applying~(\ref{fouri:byu}) to the Dirac delta~(\ref{integ:670:20}) and invoking its sifting property~(\ref{integ:670:sift}), we find curiously that \bq{fouri:110:20} \delta(v) \fouripair \frac{1}{\sqrt{2\pi}}, \eq the dual of which according to~(\ref{fouri:110:12}) is \bq{fouri:110:22} 1 \fouripair \left(\sqrt{2\pi}\right) \delta(v) \eq inasmuch as $\delta(-v) = \delta(v)$. (The duality rule proves its worth in eqn.~\ref{fouri:110:22}, incidentally. Had we tried to calculate the Fourier transform of~$1$---that is, of $f[v]\equiv 1$---directly according to eqn.~\ref{fouri:byu} we would have found $ 1 \fouripair [1/\sqrt{2\pi}] \int_{-\infty}^{\infty} e^{-iv\theta} \,d\theta, $ an impossible integral to evaluate.) 
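
Incidentally, though the book's derivations require no computer, a skeptical reader can check a pair like~(\ref{fouri:100:41}) numerically before proceeding to further properties. The following sketch, written in Python with the NumPy library and offered only as an illustration, approximates the transform integral~(\ref{fouri:byu}) by the trapezoidal rule and compares the result against the right side of~(\ref{fouri:100:41}); its grid, limits and tolerance are arbitrary choices.
\begin{verbatim}
# An illustrative numerical check of the pair (fouri:100:41),
#   Lambda(v)  <->  Sa^2(v/2)/sqrt(2 pi);
# the grid, limits and tolerance below are arbitrary choices.
import numpy as np

def tri(t):                              # the triangular pulse Lambda(t)
    return np.clip(1.0 - np.abs(t), 0.0, None)

def fourier(f, v, T=8.0, N=200001):
    # (1/sqrt(2 pi)) times the integral of f(theta) exp(-i v theta) d theta,
    # approximated by the composite trapezoidal rule on [-T, T].
    theta, h = np.linspace(-T, T, N, retstep=True)
    g = f(theta) * np.exp(-1j * v * theta)
    return h * (np.sum(g) - (g[0] + g[-1]) / 2.0) / np.sqrt(2.0 * np.pi)

def Sa(x):                               # sine-argument function, sin(x)/x
    return np.sinc(x / np.pi)            # numpy's sinc(x) is sin(pi x)/(pi x)

v = np.linspace(-10.0, 10.0, 21)
lhs = np.array([fourier(tri, vk) for vk in v])
rhs = Sa(v / 2.0)**2 / np.sqrt(2.0 * np.pi)
print(np.max(np.abs(lhs - rhs)) < 1e-6)  # True
\end{verbatim}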
\subsection{Shifting, scaling and differentiation} \label{fouri:110.30} \index{Fourier transform!shifting of} \index{Fourier transform!of a shifted function} \index{Fourier transform!scaling of} \index{Fourier transform!differentiation of} \index{Fourier transform!of a derivative} \index{Fourier transform!linearity of} \index{derivative!of a Fourier transform} \index{derivative!Fourier transform of} Table~\ref{fouri:110:tbl20} lists several Fourier properties involving shifting, scaling and differentiation, plus an expression of the Fourier transform's linearity. \begin{table} \caption[Properties involving shifting, scaling and differentiation.]{Fourier properties involving shifting, scaling, differentiation and integration.} \label{fouri:110:tbl20} \bqb f(v-a) &\fouripair& e^{-iav}F(v) \\ e^{iav}f(v) &\fouripair& F(v-a) \\ Af(\alpha v) &\fouripair& \frac{A}{\left|\alpha\right|}F\left(\frac{v}{\alpha}\right) \ \ \mbox{if $\Im(\alpha)=0$, $\Re(\alpha)\neq 0$} \\ A_1f_1(v) + A_2f_2(v) &\fouripair& A_1F_1(v) + A_2F_2(v) \\ \frac{d}{dv}f(v) &\fouripair& ivF(v) \\ ivf(v) &\fouripair& -\frac{d}{dv}F(v) \\ \int_{-\infty}^v f(\tau) \,d\tau &\fouripair& \frac{F(v)}{iv} + \frac{2\pi}{2} F(0) \delta(v) \eqb \end{table} The table's first property is proved by applying~(\ref{fouri:byu}) to $f(v-a)$ then changing $\xi \la \theta - a$. The table's second property is proved by applying~(\ref{fouri:byu}) to $e^{iav}f(v)$; or, alternately, is proved through~(\ref{fouri:110:14}) as the composition dual of the table's first property. The table's third property is proved by applying~(\ref{fouri:byu}) to $Af(\alpha v)$ then changing $\xi\la\alpha\theta$. The table's fourth property is proved trivially. The table's fifth and sixth properties begin from the derivative of the inverse Fourier transform; that is, of (\ref{fouri:byu})'s second line. This derivative is \bqb \frac{d}{dv}f(v) &=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} i\theta e^{iv\theta} F(\theta) \,d\theta \\&=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv\theta} [i\theta F(\theta)] \,d\theta \\&=& \mathcal{F}^{-1}\{ivF(v)\}, \eqb which implies \[ \mathcal{F}\left\{\frac{d}{dv}f(v) \right\} = ivF(v), \] the table's fifth property. The sixth and last property is the compositional dual~(\ref{fouri:110:14}) of the fifth. Besides the identities this section derives, Table~\ref{fouri:110:tbl20} also includes % bad break \linebreak (\ref{fouri:125:10}), which \S~\ref{fouri:125} will prove. \subsection{Convolution and correlation} \label{fouri:110.40} \index{convolution} \index{correlation} \index{convolution!Fourier transform of} \index{correlation!Fourier transform of} \index{product!Fourier transform of} \index{Fourier transform!of a convolution} \index{Fourier transform!of a correlation} \index{Fourier transform!of a product} \index{transfer function} The concept of \emph{convolution} emerges from mechanical engineering (or from its subdisciplines electrical and chemical engineering), in which the response of a linear system to an impulse $\delta(t)$ is some characteristic \emph{transfer function} $h(t)$. Since the system is linear, it follows that its response to an arbitrary input $f(t)$ is \[ g(t) \equiv \int_{-\infty}^{\infty} h(t-\tau)f(\tau) \,d\tau; \] or, changing $t/2+\tau \la \tau$ to improve the equation's symmetry, \bq{fouri:110:40} g(t) \equiv \int_{-\infty}^{\infty} h\left(\frac{t}{2}-\tau\right) f\left(\frac{t}{2}+\tau\right) \,d\tau. 
\eq This integral defines% \footnote{\cite[\S~2.2]{Hsu:sig}} convolution of the two functions $f(t)$ and $h(t)$. Changing $v\la t$ and $\psi\la\tau$ in~(\ref{fouri:110:40}) to comport with the notation found elsewhere in this section then applying~(\ref{fouri:byu}) yields% \footnote{ See Ch.~\ref{fours}'s footnote~\ref{fours:100:fn10}. } \bqb \lefteqn{ \mathcal{F}\left\{ \int_{-\infty}^{\infty} h\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi \right\} }&&\\&=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} \int_{-\infty}^{\infty} h\left(\frac{\theta}{2}-\psi\right) f\left(\frac{\theta}{2}+\psi\right) \,d\psi \,d\theta \\&=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-iv\theta} h\left(\frac{\theta}{2}-\psi\right) f\left(\frac{\theta}{2}+\psi\right) \,d\theta \,d\psi. \eqb Now changing $\phi \la \theta/2 + \psi$, \bqb \lefteqn{ \mathcal{F}\left\{ \int_{-\infty}^{\infty} h\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi \right\} }&&\\&=& \frac{2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-iv(2\phi-2\psi)} h(\phi-2\psi) f(\phi) \,d\phi \,d\psi \\&=& \frac{2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\phi} f(\phi) \int_{-\infty}^{\infty} e^{-iv(\phi-2\psi)} h(\phi-2\psi) \,d\psi \,d\phi. \eqb Again changing $\chi \la \phi - 2\psi$, \bqb \lefteqn{ \mathcal{F}\left\{ \int_{-\infty}^{\infty} h\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi \right\} }&&\\&=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\phi} f(\phi) \int_{-\infty}^{\infty} e^{-iv\chi} h(\chi) \,d\chi \,d\phi \\&=& \left[\sqrt{2\pi}\right] \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\chi} h(\chi) \,d\chi \right] \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\phi} f(\phi) \,d\phi \right] \\&=& \left(\sqrt{2\pi}\right) H(v) F(v). \eqb That is, \bq{fouri:110:42} \int_{-\infty}^{\infty} h\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi \fouripair \left(\sqrt{2\pi}\right) H(v) F(v). \eq The compositional dual~(\ref{fouri:110:14}) of~(\ref{fouri:110:42}) is \bq{fouri:110:43} h(v) f(v) \fouripair \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H\left(\frac{v}{2}-\psi\right) F\left(\frac{v}{2}+\psi\right) \,d\psi, \eq in which we have changed $-\psi\la\psi$ as the dummy variable of integration. Whether by~(\ref{fouri:110:42}) or by~(\ref{fouri:110:43}), convolution in the one domain evidently transforms to multiplication in the other. Closely related to the convolutional integral~(\ref{fouri:110:40}) is the integral \bq{fouri:110:45} g(t) \equiv \int_{-\infty}^{\infty} h\left(\tau-\frac{t}{2}\right) f\left(\tau+\frac{t}{2}\right) \,d\tau, \eq whose transform and dual transform are computed as in the last paragraph to be \bq{fouri:110:47} \begin{split} \int_{-\infty}^{\infty} h\left(\psi-\frac{v}{2}\right) f\left(\psi+\frac{v}{2}\right) \,d\psi &\fouripair \left(\sqrt{2\pi}\right) H(-v) F(v), \\ h(-v) f(v) &\fouripair \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H\left(\psi-\frac{v}{2}\right) F\left(\psi+\frac{v}{2}\right) \,d\psi. 
\end{split} \eq Furthermore, according to~(\ref{fouri:110:18a}), $h^{*}(v) \fouripair H^{*}(-v^{*})$, so \bq{fouri:110:48} \begin{split} \int_{-\infty}^{\infty} h^{*}\left(\psi-\frac{v}{2}\right) f\left(\psi+\frac{v}{2}\right) \,d\psi &\fouripair \left(\sqrt{2\pi}\right) H^{*}(v^{*}) F(v), \\ h^{*}(v^{*}) f(v) &\fouripair \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H^{*}\left(\psi-\frac{v}{2}\right) F\left(\psi+\frac{v}{2}\right) \,d\psi, \end{split} \eq in which the second line is the compositional dual of the first with, as before, the dummy variable $-\psi\la\psi$ changed; and indeed one can do the same to the transform~(\ref{fouri:110:42}) of the convolutional integral, obtaining \bq{fouri:110:49} \begin{split} \int_{-\infty}^{\infty} h^{*}\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi &\fouripair \left(\sqrt{2\pi}\right) H^{*}(-v^{*}) F(v), \\ h^{*}(v) f(v) &\fouripair \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H^{*}\left(\frac{v}{2}-\psi\right) F\left(\frac{v}{2}+\psi\right) \,d\psi. \end{split} \eq \index{autocorrelation} \index{convolution!commutivity of} \index{convolution!associativity of} \index{commutivity!of convolution} \index{associativity!of convolution} Unlike the operation the integral~(\ref{fouri:110:40}) expresses, known as convolution, the operation the integral~(\ref{fouri:110:45}) expresses has no special name as far as the writer is aware. However, the operation its variant \bq{fouri:110:45a} g(t) \equiv \int_{-\infty}^{\infty} h^{*}\left(\tau-\frac{t}{2}\right) f\left(\tau+\frac{t}{2}\right) \,d\tau \eq expresses does have a name. It is called \emph{correlation,} being a measure of the degree to which one function tracks another with an offset in the independent variable. Reviewing this subsection, we see that in~(\ref{fouri:110:48}) we have already determined the transform and dual transform of the correlational integral~(\ref{fouri:110:45a}). Convolution and correlation arise often enough in applications to enjoy their own, peculiar notations% \footnote{\cite[\S~19.4]{JJH}} \bq{fouri:110:50} h(t) \ast f(t) \equiv \int_{-\infty}^{\infty} h\left(\frac{t}{2}-\tau\right) f\left(\frac{t}{2}+\tau\right) \,d\tau \eq for convolution and \bq{fouri:110:53} R_{fh}(t) \equiv \int_{-\infty}^{\infty} h^{*}\left(\tau-\frac{t}{2}\right) f\left(\tau+\frac{t}{2}\right) \,d\tau \eq for correlation. Nothing prevents one from correlating a function with itself, incidentally. The \emph{autocorrelation} \bq{fouri:110:54} R_{ff}(t) = \int_{-\infty}^{\infty} f^{*}\left(\tau-\frac{t}{2}\right) f\left(\tau+\frac{t}{2}\right) \,d\tau \eq proves useful at times.% \footnote{\cite[\S~1.6A]{Hsu:comm}} % diagn: the following new rest of the paragraph and % the associated new table entries want review. For convolution, the commutative and associative properties that \bq{fouri:110:51} \begin{split} f(t) \ast h(t) &= h(t) \ast f(t), \\ f(t) \ast [g(t) \ast h(t)] &= [f(t) \ast g(t)] \ast h(t), \end{split} \eq are useful, too, where the former may be demonstrated by changing $-\tau \la \tau$ in~(\ref{fouri:110:50}) and, through Fourier transformation, both may be demonstrated as $f(v) \ast [g(v) \ast h(v)] \fouripair (\sqrt{2\pi})F(v)[(\sqrt{2\pi})G(v)H(v)] = % bad break \linebreak (\sqrt{2\pi})[(\sqrt{2\pi})F(v)G(v)]H(v) \stackrel{\mathcal{F}^{-1}}{\ra} [f(v) \ast g(v)] \ast h(v)$ and similarly for the commutative property. Tables~\ref{fouri:110:tbl40} and~\ref{fouri:110:tbl41} summarize. 
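The convolutional properties can spare one an integration.  For example,
since $\Pi(v) \fouripair \sinarg(v/2)/\sqrt{2\pi}$ and
$\Lambda(v) \fouripair \sinarg^2(v/2)/\sqrt{2\pi}$ according
to~(\ref{fouri:100:44}) and~(\ref{fouri:100:41}),
eqn.~(\ref{fouri:110:42}), expressed in the notation
of~(\ref{fouri:110:50}), implies that
\[
	\Pi(v) \ast \Pi(v) \fouripair
	\left(\sqrt{2\pi}\right)
	\left[ \frac{\sinarg(v/2)}{\sqrt{2\pi}} \right]^2
	= \frac{\sinarg^2(v/2)}{\sqrt{2\pi}},
\]
whence evidently $\Pi(v) \ast \Pi(v) = \Lambda(v)$: the rectangular
pulse convolved with itself yields the triangular pulse, no integration
required.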
\begin{table} \caption{Convolution and correlation, and their Fourier properties.} \label{fouri:110:tbl40} \bqb \int_{-\infty}^{\infty} h\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi &\fouripair& \left(\sqrt{2\pi}\right) H(v) F(v) \\ h(v) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H\left(\frac{v}{2}-\psi\right) \\&&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \mbox{}\times F\left(\frac{v}{2}+\psi\right) \,d\psi \\ \int_{-\infty}^{\infty} h\left(\psi-\frac{v}{2}\right) f\left(\psi+\frac{v}{2}\right) \,d\psi &\fouripair& \left(\sqrt{2\pi}\right) H(-v) F(v) \\ h(-v) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H\left(\psi-\frac{v}{2}\right) \\&&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \mbox{}\times F\left(\psi+\frac{v}{2}\right) \,d\psi \\ \int_{-\infty}^{\infty} h^{*}\left(\frac{v}{2}-\psi\right) f\left(\frac{v}{2}+\psi\right) \,d\psi &\fouripair& \left(\sqrt{2\pi}\right) H^{*}(-v^{*}) F(v) \\ h^{*}(v) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H^{*}\left(\frac{v}{2}-\psi\right) \\&&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \mbox{}\times F\left(\frac{v}{2}+\psi\right) \,d\psi \\ \int_{-\infty}^{\infty} h^{*}\left(\psi-\frac{v}{2}\right) f\left(\psi+\frac{v}{2}\right) \,d\psi &\fouripair& \left(\sqrt{2\pi}\right) H^{*}(v^{*}) F(v) \\ h^{*}(v^{*}) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H^{*}\left(\psi-\frac{v}{2}\right) \\&&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \mbox{}\times F\left(\psi+\frac{v}{2}\right) \,d\psi \eqb \end{table} \begin{table} \caption[Convolution and correlation in their peculiar notation.]{% Convolution and correlation in their peculiar notation. % diagn: the next sentence is new and wants review. (Note that the~$\ast$ which appears in the table as $h[t] \ast f[t]$ differs in meaning from the~$\mbox{}^{*}$ in $h^{*}[v^{*}]$.)% } \label{fouri:110:tbl41} \bqb f(t) \ast h(t) = h(t) \ast f(t) &\equiv& \int_{-\infty}^{\infty} h\left(\frac{t}{2}-\tau\right) f\left(\frac{t}{2}+\tau\right) \,d\tau \\ R_{fh}(t) &\equiv& \int_{-\infty}^{\infty} h^{*}\left(\tau-\frac{t}{2}\right) f\left(\tau+\frac{t}{2}\right) \,d\tau \\ h(v) \ast f(v) &\fouripair& \left(\sqrt{2\pi}\right) H(v) F(v) \\ h(v) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}}[H(v) \ast F(v)] \\ R_{fh}(v) &\fouripair& \left(\sqrt{2\pi}\right) H^{*}(v^{*}) F(v) \\ h^{*}(v^{*}) f(v) &\fouripair& \frac{1}{\sqrt{2\pi}}R_{FH}(v) \\ f(t) \ast [g(t) \ast h(t)] &=& [f(t) \ast g(t)] \ast h(t) \eqb \end{table} \index{energy spectral density} \index{spectral density} \index{density!spectral} Before closing the section, we should take note of one entry of~\ref{fouri:110:tbl41} in particular, $R_{fh}(v) \fouripair (\sqrt{2\pi}) H^{*}(v^{*}) F(v)$. This same entry is also found in Table~\ref{fouri:110:tbl40} in other notation---indeed it is just the first line of~(\ref{fouri:110:48})---but when written in the correlation's peculiar notation it draws attention to a peculiar result. Scaled by $1/\sqrt{2\pi}$, the \emph{autocorrelation} and its Fourier transform are evidently \[ \frac{1}{\sqrt{2\pi}} R_{ff}(v) \fouripair F^{*}(v^{*}) F(v). 
\] For% \footnote{ Electrical engineers call the quantity $\left|F(v)\right|^2$ on (\ref{fouri:110:57})'s right the \emph{energy spectral density} of $f(v)$.~\cite[\S~1.6B]{Hsu:comm} } real~$v$, \bq{fouri:110:57} \frac{1}{\sqrt{2\pi}} R_{ff}(v) \fouripair \left|F(v)\right|^2 \ \ \mbox{if $\Im(v) = 0$.} \eq %Section \S~\ref{fouri:110.55} will comment on the notion of energy in %signals, but for the moment we should like to observe that, in practical %electrical control systems, it sometimes happens that an adequate %estimate of $R_{ff}(v)$ is immediately available whereas that the %history of $f(v)$ is not. When this is the case,~(\ref{fouri:110:57}) %gives an elegant way to calculate an energy spectral density. % diagn: From here on is largely new. \subsection{Parseval's theorem} \label{fouri:110.55} \index{Parseval's theorem} \index{Parseval, Marc-Antoine (1755--1836)} By successive steps, \bqb \int_{-\infty}^{\infty} h^{*}(v) f(v) \,dv &=& \int_{-\infty}^{\infty} h^{*}(v) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv\theta} F(\theta) \,d\theta \right] \,dv \\&=& \int_{-\infty}^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iv\theta} h^{*}(v) \,dv \right] F(\theta) \,d\theta \\&=& \int_{-\infty}^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} h(v) \,dv \right]^{*} F(\theta) \,d\theta. \\&=& \int_{-\infty}^{\infty} H^{*}(\theta) F(\theta) \,d\theta, \eqb in which we have used~(\ref{fouri:byu}), interchanged the integrations and assumed that the dummy variables~$v$ and~$\theta$ of integration remain real. Changing $v \la \theta$ on the right, we have that \bq{fouri:110:parseval} \int_{-\infty}^{\infty} h^{*}(v) f(v) \,dv = \int_{-\infty}^{\infty} H^{*}(v) F(v) \,dv. \eq This is \emph{Parseval's theorem.}% \footnote{ \cite[\S~2-2]{Couch}\cite[\S~1.6B]{Hsu:comm} } \index{time} \index{energy} \index{frequency} \index{frequency!angular} \index{angular frequency} \index{frequency!spatial} \index{frequency content} \index{space} \index{spatial frequency} \index{dimension} \index{quadrature} \index{channel} \index{$I$ channel} \index{$Q$ channel} Especially interesting is the special case $h(t) = f(t)$, when \bq{fouri:110:parseval2} \int_{-\infty}^{\infty} \left|f(v)\right|^2 \,dv = \int_{-\infty}^{\infty} \left|F(v)\right|^2 \,dv. \eq When this is written as \[ \int_{-\infty}^{\infty} \left|f(t)\right|^2 \,dt = \int_{-\infty}^{\infty} \left|F(\omega)\right|^2 \,d\omega, \] and~$t$, $\left|f(t)\right|^2$, $\omega$ and $\left|F(\omega)\right|^2$ respectively have physical dimensions of time, energy per unit time, angular frequency and energy per unit angular frequency, then the theorem conveys the important physical insight that energy transmitted at various times can equally well be regarded as energy transmitted at various frequencies. This works for space and spatial frequencies, too: see \S~\ref{fours:085}. 
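Incidentally, a quick check of~(\ref{fouri:110:parseval}) is available.
Letting $h(v) = \delta(v)$, which is real and whose transform according
to~(\ref{fouri:110:20}) is $H(v) = 1/\sqrt{2\pi}$, the theorem's left
side sifts to $f(0)$ while its right side becomes
\[
	\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(v) \,dv = f(0),
\]
which is nothing other than the inverse transformation
of~(\ref{fouri:byu}) evaluated at $v=0$.  The two sides agree.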
For real $f(v)$, one can write~(\ref{fouri:110:parseval2}) as \bq{fouri:110:parseval2a} \int_{-\infty}^{\infty} f^2(v) \,dv = \int_{-\infty}^{\infty} \Re^2[F(v)] \,dv + \int_{-\infty}^{\infty} \Im^2[F(v)] \,dv \eq which expresses the principle of \emph{quadrature,} conveying the additional physical insight that a single frequency can carry energy in not one but each of two distinct, independent channels; namely, a \emph{real-phased, in-phase} or~$I$ channel and an \emph{imaginary-phased, quadrature-phase} or~$Q$ channel.% \footnote{\cite[\S~5-1]{Couch}} Practical digital electronic communications systems, wired or wireless, often do precisely this---effectively transmitting two, independent streams of information at once, without conflict, in the selfsame band. \subsection{Oddness and evenness} \label{fouri:110.65} \index{function!odd or even} \index{odd function!Fourier transform of} \index{even function!Fourier transform of} \index{Fourier transform!of an odd or even function} Odd functions have odd transforms. Even functions have even transforms. Symbolically, \bi \item if $f(-v) = -f(v)$ for all~$v$, then $F(-v) = -F(v)$; \item if $f(-v) = f(v)$ for all~$v$, then $F(-v) = F(v)$. \ei In the odd case, this is seen by expressing $F(-v)$ per~(\ref{fouri:byu}) as \[ F(-v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i(-v)\theta} f(\theta) \,d\theta, \] then changing the dummy variable $-\theta \la \theta$ to get \bqb F(-v) &=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i(-v)(-\theta)} f(-\theta) \,d\theta \\&=& -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iv\theta} f(\theta) \,d\theta \\&=& -F(v). \eqb The even case is analyzed likewise. See \S~\ref{taylor:365}. % ---------------------------------------------------------------------- \section[Fourier transforms of selected functions]% {The Fourier transforms of selected functions} \label{fouri:120} \index{Fourier transform!of selected functions} \index{sine-argument function!Fourier transform of} \index{Fourier transform!of the sine-argument function} We have already computed the Fourier transforms of $\Pi(v)$, $\Lambda(v)$, $\delta(v)$ and the constant~$1$ in~(\ref{fouri:100:44}), (\ref{fouri:100:41}), (\ref{fouri:110:20}) and~(\ref{fouri:110:22}), respectively. The duals~(\ref{fouri:110:12}) of the first two of these are evidently \[ \begin{split} \sinarg\left(\frac v2\right) &\fouripair \left(\sqrt{2\pi}\right)\Pi(-v), \\ \sinarg^2\left(\frac v2\right) &\fouripair \left(\sqrt{2\pi}\right)\Lambda(-v); \end{split} \] or, since $\Pi(-v)=\Pi(v)$ and $\Lambda(-v)=\Lambda(v)$, \[ \begin{split} \sinarg\left(\frac v2\right) &\fouripair \left(\sqrt{2\pi}\right)\Pi(v), \\ \sinarg^2\left(\frac v2\right) &\fouripair \left(\sqrt{2\pi}\right)\Lambda(v), \end{split} \] which by the scaling property of Table~\ref{fouri:110:tbl20} imply that% \footnote{ In electronic communications systems, including radio, the first line of~(\ref{fouri:120:20}) implies significantly that, to spread energy evenly over an available ``baseband'' but to let no energy leak outside that band, one should transmit sine-argument-shaped pulses as in Fig.~\ref{fours:160:fig}. } \bq{fouri:120:20} \begin{split} \sinarg(v) &\fouripair \frac{\sqrt{2\pi}}{2}\Pi\left(\frac v2\right), \\ \sinarg^2(v) &\fouripair \frac{\sqrt{2\pi}}{2}\Lambda\left(\frac v2\right). \end{split} \eq % diagn: heavily revised paragraph wants close review. 
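The first line of~(\ref{fouri:120:20}) incidentally puts Parseval's
theorem~(\ref{fouri:110:parseval2}) to work.  The function $\sinarg(v)$
being real for real~$v$, and the pulse $\Pi(v/2)$ being unity over
$\left|v\right| < 1$ and null elsewhere, the theorem has that
\[
	\int_{-\infty}^{\infty} \sinarg^2(v) \,dv
	= \int_{-\infty}^{\infty}
	\left[ \frac{\sqrt{2\pi}}{2} \Pi\left(\frac v2\right) \right]^2 dv
	= \frac{2\pi}{4} \int_{-1}^{1} dv = \pi,
\]
an integral one might not have relished attacking directly.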
\index{Heaviside unit step function!Fourier transform of}
\index{unit step function, Heaviside!Fourier transform of}
\index{Fourier transform!of the Heaviside unit step}
\index{exponential decay!Fourier transform of}
\index{Fourier transform!of an exponential decay}
\index{exponential, natural!Fourier transform of}
\index{natural exponential!Fourier transform of}
\index{Fourier transform!of the natural exponential}
Applying the Fourier transform's definition~(\ref{fouri:byu}) to
$u(v)e^{-av}$, where $u(v)$ is the Heaviside unit
step~(\ref{integ:670:10}), yields
\[
	\mathcal F \left\{u(v)e^{-av}\right\} =
	\frac{1}{\sqrt{2\pi}} \int_{0}^{\infty}
	e^{-(a+iv)\theta} \,d\theta
	= \frac{1}{\sqrt{2\pi}} \left[
		\frac{e^{-(a+iv)\theta}}{-(a+iv)}
	\right]_{0}^{\infty},
\]
revealing the transform pair
\bq{fouri:120:21}
	u(v)e^{-av} \fouripair \frac{1}{\left(\sqrt{2\pi}\right)(a+iv)},
	\ \ \Re(a) > 0.
\eq
Interesting is the limit $a \ra 0^{+}$ in~(\ref{fouri:120:21}),
\[
	u(v) \fouripair \frac{1}{\left(\sqrt{2\pi}\right)(iv)} + C\delta(v),
\]
where the necessary term $C\delta(v)$, with scale~$C$ to be determined,
merely admits that we do not yet know how to
evaluate~(\ref{fouri:120:21}) when both~$a$ and~$v$ vanish at once.
What we do know from \S~\ref{fouri:110.65} is that odd functions have
odd transforms and that (as one can readily see in
Fig.~\ref{integ:670:fig-u}) one can convert $u(v)$ to an odd function by
the simple expedient of subtracting $1/2$ from it.  Since
$1/2 \fouripair (\sqrt{2\pi}/2) \delta(v)$ according
to~(\ref{fouri:110:22}), we have then that
\[
	u(v) - \frac 12 \fouripair \frac{1}{\left(\sqrt{2\pi}\right)(iv)}
	+ \left(C-\frac{\sqrt{2\pi}}{2}\right)\delta(v),
\]
which to make its right side odd demands that $C=\sqrt{2\pi}/2$.  The
transform pair
\bq{fouri:120:22}
	u(v) \fouripair \frac{1}{\left(\sqrt{2\pi}\right)iv}
	+ \frac{\sqrt{2\pi}}{2} \delta(v)
\eq
results.

On the other hand, according to Table~\ref{inttx:470:tbl},
\[
	e^{-av}v^n = \frac{d}{dv} \sum_{k=0}^n
	\frac{-(n!/k!)\,e^{-av}v^k}{ a^{n-k+1} },
\]
so
\bqb
	\mathcal F \left\{ u(v)e^{-av}v^n \right\}
	&=& \frac{1}{\sqrt{2\pi}}
	\int_0^\infty e^{-(a+iv)\theta}\theta^n \,d\theta
	\\&=& \frac{1}{\sqrt{2\pi}} \sum_{k=0}^n \left.
	\frac{ -(n!/k!)\,e^{-(a+iv)\theta}\theta^k }{ (a+iv)^{n-k+1} }
	\right|_0^\infty.
\eqb
Since all but the $k=0$ term vanish, the last equation implies the
transform pair
\bq{fouri:120:24}
	u(v)e^{-av}v^n \fouripair
	\frac{n!}{\left(\sqrt{2\pi}\right)(a+iv)^{n+1}},
	\ \ \Re(a) > 0, \ n \in \mathbb Z, \ n \ge 0.
\eq

\index{Fourier transform!of a sinusoid}
\index{sine!Fourier transform of}
\index{cosine!Fourier transform of}
The Fourier transforms of $\sin av$ and $\cos av$ are interesting and
important, and can be computed straightforwardly from the pairs
\bq{fouri:120:26}
	\begin{split}
	e^{iav} &\fouripair \left(\sqrt{2\pi}\right) \delta(v-a), \\
	e^{-iav} &\fouripair \left(\sqrt{2\pi}\right) \delta(v+a),
	\end{split}
\eq
which result by applying to~(\ref{fouri:110:22})
Table~\ref{fouri:110:tbl20}'s property that
$e^{iav}f(v) \fouripair F(v-a)$.  Composing per
Table~\ref{cexp:tbl-prop} the trigonometrics from their complex parts,
we have that
\bq{fouri:120:23}
	\begin{split}
	\sin av &\fouripair \frac{\sqrt{2\pi}}{i2}\left[
		\delta(v-a) - \delta(v+a)
	\right], \\
	\cos av &\fouripair \frac{\sqrt{2\pi}}{2}\left[
		\delta(v-a) + \delta(v+a)
	\right].
\end{split}
\eq
\index{pulse train!Fourier transform of}
\index{Dirac delta pulse train!Fourier transform of}
\index{Fourier transform!of a Dirac delta pulse train}
Curiously, the Fourier transform of the Dirac delta pulse train of
Fig.~\ref{fours:100:fig4} turns out to be another Dirac delta pulse
train.  The reason is that the Dirac delta pulse train's Fourier series
according to~(\ref{fours:100:37}) and~(\ref{fours:100:10}) is
\[
	\sum_{j=-\infty}^{\infty} \delta(v-jT_1) =
	\sum_{j=-\infty}^{\infty} \frac{e^{ij(2\pi/T_1)v}}{T_1},
\]
the transform of which according to~(\ref{fouri:120:26}) is
\bq{fouri:120:25}
	\sum_{j=-\infty}^{\infty} \delta(v-jT_1) \fouripair
	\frac{\sqrt{2\pi}}{T_1}
	\sum_{j=-\infty}^{\infty} \delta\left(v-j\frac{2\pi}{T_1}\right).
\eq
Apparently, the further the pulses of the original train, the closer the
pulses of the transformed train, and vice versa; yet, even when
transformed, the train remains a train of Dirac deltas.
Letting $T_1 = \sqrt{2\pi}$ in~(\ref{fouri:120:25}) we find the pair
\bq{fouri:120:25a}
	\sum_{j=-\infty}^{\infty} \delta\left(v-j\sqrt{2\pi}\right)
	\fouripair
	\sum_{j=-\infty}^{\infty} \delta\left(v-j\sqrt{2\pi}\right),
\eq
discovering a pulse train whose Fourier transform is itself.

% diagn: check forward references.
Table~\ref{fouri:120:tbl20} summarizes.  Besides gathering transform
pairs from this and earlier sections, the table also covers the Gaussian
pulse of \S~\ref{fouri:130}.
\begin{table}
\caption{Fourier transform pairs.}
\label{fouri:120:tbl20}
\index{Fourier transform pair}
\index{transform pair}
\bqb
	1 &\fouripair& \left(\sqrt{2\pi}\right) \delta(v)
	\\ u(v) &\fouripair& \frac{1}{\left(\sqrt{2\pi}\right)iv}
	+ \frac{\sqrt{2\pi}}{2} \delta(v)
	\\ \delta(v) &\fouripair& \frac{1}{\sqrt{2\pi}}
	\\ \Pi(v) &\fouripair& \frac{\sinarg(v/2)}{\sqrt{2\pi}}
	\\ \Lambda(v) &\fouripair& \frac{\sinarg^2(v/2)}{\sqrt{2\pi}}
	\\ u(v)e^{-av} &\fouripair&
	\frac{1}{\left(\sqrt{2\pi}\right)(a+iv)},
	\ \ \Re(a) > 0
	\\ u(v)e^{-av}v^n &\fouripair&
	\frac{n!}{\left(\sqrt{2\pi}\right)(a+iv)^{n+1}},
	\\ &&\ \ \ \ \ \ \Re(a) > 0,\ n \in \mathbb Z, \ n \ge 0
	\\ e^{iav} &\fouripair& \left(\sqrt{2\pi}\right) \delta(v-a)
	\\ \sin av &\fouripair& \frac{\sqrt{2\pi}}{i2}\left[
		\delta(v-a) - \delta(v+a) \right]
	\\ \cos av &\fouripair& \frac{\sqrt{2\pi}}{2}\left[
		\delta(v-a) + \delta(v+a) \right]
	\\ \sinarg(v) &\fouripair&
	\frac{\sqrt{2\pi}}{2}\Pi\left(\frac v2\right)
	\\ \sinarg^2(v) &\fouripair&
	\frac{\sqrt{2\pi}}{2}\Lambda\left(\frac v2\right)
	\\ \sum_{j=-\infty}^{\infty} \delta(v-jT_1) &\fouripair&
	\frac{\sqrt{2\pi}}{T_1}
	\sum_{j=-\infty}^{\infty} \delta\left(v-j\frac{2\pi}{T_1}\right)
	\\ \sum_{j=-\infty}^{\infty} \delta\left(v-j\sqrt{2\pi}\right)
	&\fouripair&
	\sum_{j=-\infty}^{\infty} \delta\left(v-j\sqrt{2\pi}\right)
	\\ \Omega(v) &\fouripair& \Omega(v)
\eqb
\end{table}

% ----------------------------------------------------------------------
\section[The Fourier transform of integration]%
{The Fourier transform of the integration operation}
\label{fouri:125}
\index{Fourier transform!of integration}
\index{integration!Fourier transform of}
Though it includes the Fourier transform of the differentiation
operation, Table~\ref{fouri:110:tbl20} omits the complementary identity
\bq{fouri:125:10}
	\int_{-\infty}^v f(\tau) \,d\tau \fouripair
	\frac{F(v)}{iv} + \frac{2\pi}{2} F(0) \delta(v),
\eq
the Fourier transform of the integration operation, for when we compiled
the table we lacked the needed theory.
We have the theory now.% \footnote{\cite[Prob.~5.33]{Hsu:sig}} To develop~(\ref{fouri:125:10}), we begin by observing that one can express the integration in question in either of the equivalent forms \[ \int_{-\infty}^v f(\tau) \,d\tau = \int_{-\infty}^\infty u\left(\frac v2 -\tau\right) f\left(\frac v2 + \tau\right) \,d\tau. \] Invoking an identity of Table~\ref{fouri:110:tbl40} on the rightward form, then substituting the leftward form, yields the Fourier pair \bqb \int_{-\infty}^v f(\tau) \,d\tau &\fouripair& \left(\sqrt{2\pi}\right) H(v) F(v), \\ H(v) &\equiv& \mathcal F\{u(v)\}. \eqb But according to Table~\ref{fouri:120:tbl20}, $\mathcal F\{u(v)\} = 1/[(\sqrt{2\pi})iv] + (\sqrt{2\pi}/2)\delta(v)$, so %\frac{1}{\left(\sqrt{2\pi}\right)iv} + \frac{\sqrt{2\pi}}{2} \delta(v) \[ \int_{-\infty}^v f(\tau) \,d\tau \fouripair \frac{F(v)}{iv} + \frac{2\pi}{2} F(v) \delta(v), \] of which sifting the rightmost term produces~(\ref{fouri:125:10}). % ---------------------------------------------------------------------- \section{The Gaussian pulse} \label{fouri:130} \index{Gauss, Carl Friedrich (1777--1855)} \index{Gaussian pulse} \index{pulse, Gaussian} While studying the derivative in Chs.~\ref{drvtv} and~\ref{cexp}, we asked among other questions whether any function could be its own derivative. We found that a sinusoid could be its own derivative after a fashion---differentiation shifted its curve leftward but did not alter its shape---but that the only nontrivial function to be exactly its own derivative was the natural exponential $f(z)=Ae^z$. We later found the same natural exponential to fill several significant mathematical roles---largely, whether directly or indirectly, because it was indeed its own derivative. Studying the Fourier transform, the question now arises again: can any function be its own transform? Well, we have already found in \S~\ref{fouri:120} that the Dirac delta pulse train can be; but this train unlike the natural exponential is abrupt and ungraceful, perhaps not the sort of function one had in mind. One should like an analytic function, and preferably not a train but a single pulse. \index{bell curve} \index{cleverness} \index{$\Omega$ as the Gaussian pulse} In Chs.~\ref{specf} and~\ref{prob}, during the study of special functions and probability, we shall encounter a most curious function, the \emph{Gaussian pulse,} also known as the \emph{bell curve} among other names. We will defer discussion of the Gaussian pulse's provenance to the coming chapters but, for now, we can just copy here the pulse's definition % diagn: check the next reference. from~(\ref{prob:normdist}) as \bq{fouri:130:10} \Omega(t) \equiv \frac{\exp\left(-t^2/2\right)}{\sqrt{2\pi}}, \eq % diagn: check the rest of the sentence, which is new. plotted on pages~\pageref{prob:normdist-fig} below and~\pageref{fours:095:fig1} above, respectively in Figs.~\ref{prob:normdist-fig} and~\ref{fours:095:fig1}. The Fourier transform of the Gaussian pulse is even trickier to compute than were the transforms of \S~\ref{fouri:120}, but known techniques to compute it include the following.% \footnote{ An alternate technique is outlined in \cite[Prob.~5.43]{Hsu:sig}. } From the definition~(\ref{fouri:byu}) of the Fourier transform, \[ \mathcal F\{ \Omega(v) \} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -\frac{\theta^2}{2} - iv \theta \right) \,d\theta. 
\]
Completing the square (\S~\ref{alggeo:240}),%
\footnote{\cite{Davis-Westin-conversation}}
\bqb
	\mathcal F\{ \Omega(v) \} &=&
	\frac{\exp\left(-v^2/2\right)}{2\pi} \int_{-\infty}^{\infty}
	\exp\left( -\frac{\theta^2}{2} - iv \theta + \frac{v^2}{2} \right)
	\,d\theta
	\\&=& \frac{\exp\left(-v^2/2\right)}{2\pi} \int_{-\infty}^{\infty}
	\exp\left[ -\frac{\left( \theta + iv \right)^2}{2} \right]
	\,d\theta.
\eqb
Changing $\xi \la \theta + iv$,
\[
	\mathcal F\{ \Omega(v) \} =
	\frac{\exp\left(-v^2/2\right)}{2\pi} \int_{-\infty+iv}^{\infty+iv}
	\exp\left( -\frac{\xi^2}{2} \right) \,d\xi.
\]
Had we not studied complex contour integration in \S~\ref{inttx:250} we
should find such an integral hard to integrate in closed form.  However,
since happily we have studied it, observing that the integrand
$\exp\left(-\xi^2/2\right)$ is an entire function (\S~\ref{taylor:330})
of~$\xi$---that is, that it is everywhere analytic---we recognize that
one can trace the path of integration from $-\infty+iv$ to $\infty+iv$
along any contour one likes.  Let us trace it along the real Argand axis
from~$-\infty$ to~$\infty$, leaving only the two, short complex segments
at the ends which (as is easy enough to see, and the formal proof is
left as an exercise to the interested reader%
\footnote{
	The short complex segments at the ends might integrate to something
	were the real part of~$\xi^2$ negative, but the real part happens to
	be positive---indeed, most extremely positive---over the domains of
	the segments in question.
}%
) lie so far away that---for this integrand---they integrate to nothing.
So tracing leaves us with
\bq{fouri:130:20}
	\mathcal F\{ \Omega(v) \} =
	\frac{\exp\left(-v^2/2\right)}{2\pi} \int_{-\infty}^{\infty}
	\exp\left( -\frac{\xi^2}{2} \right) \,d\xi.
\eq
How to proceed from here is not immediately obvious.  None of the
techniques of Ch.~\ref{inttx} seems especially suitable to evaluate
\[
	I \equiv \int_{-\infty}^{\infty}
	\exp\left( -\frac{\xi^2}{2} \right) \,d\xi,
\]
though if a search for a suitable contour of integration failed one
might fall back on the Taylor-series technique of \S~\ref{inttx:450}.
Fortunately, mathematicians have been searching hundreds of years for
clever techniques to evaluate just such integrals and, when occasionally
they should discover such a technique and reveal it to us, why, we
record it in books like this, not to forget.

\index{integral!closed complex contour}
\index{contour integration!closed complex}
\index{integration!by closed contour}
\index{function!entire}
\index{entire function}
\index{integration technique}
\index{integration!by conversion to cylindrical or polar form}
Here is the technique.%
\footnote{
	\cite[\S~I:40-4]{Feynman}
	%The author has never heard who first discovered the technique---it
	%might have been Gauss himself---and regrettably has misplaced the
	%citation by which he first learned of it in print long ago (see
	%Appendix~\ref{hist}).  However, he has since heard the technique
	%orally explained at least twice, including
	%by~\cite{Brown-conversation} if memory serves.  The technique thus
	%would not seem especially obscure.  In any event, the technique is now
	%recorded here.
}
The equations
\[
	\begin{split}
	I &= \int_{-\infty}^{\infty}
	\exp\left( -\frac{x^2}{2} \right) \,dx, \\
	I &= \int_{-\infty}^{\infty}
	\exp\left( -\frac{y^2}{2} \right) \,dy,
	\end{split}
\]
express the same integral~$I$ in two different ways, the only difference
being in the choice of letter for the dummy variable.  What if we
multiply the two?
Then
\bqb
	I^2 &=&
	\int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2} \right) \,dx
	\int_{-\infty}^{\infty} \exp\left( -\frac{y^2}{2} \right) \,dy
	\\&=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
	\exp\left( -\frac{x^2+y^2}{2} \right) \,dx\,dy.
\eqb
One, geometrical way to interpret this~$I^2$ is as a double integration
over a plane in which $(x,y)$ are rectangular coordinates.  If we
interpret thus, nothing then prevents us from double-integrating by the
cylindrical coordinates $(\rho; \phi)$, instead, as
\bqb
	I^2 &=&
	\int_{-2\pi/2}^{2\pi/2} \int_{0}^{\infty}
	\exp\left( -\frac{\rho^2}{2} \right) \rho\,d\rho\,d\phi
	\\&=& 2\pi \left[ \int_{0}^{\infty}
	\exp\left( -\frac{\rho^2}{2} \right) \rho\,d\rho \right].
\eqb
At a casual glance, the last integral in square brackets does not look
much different from the integral with which we started, but see: it is
not only that the lower limit of integration and the letter of the dummy
variable have changed, but that an extra factor of the dummy variable
has appeared---that the integrand ends not with~$d\rho$ but
with~$\rho\,d\rho$.  Once we have realized this, the integral's solution
by antiderivative (\S~\ref{inttx:210}) becomes suddenly easy to guess:
\[
	I^2 = 2\pi \left[
		-\exp\left( -\frac{\rho^2}{2} \right)
	\right]_{0}^{\infty} = 2\pi.
\]
So evidently,
\[
	I = \sqrt{2\pi},
\]
which means that
\bq{fouri:130:25}
	\int_{-\infty}^{\infty} \exp\left( -\frac{\xi^2}{2} \right) \,d\xi
	= \sqrt{2\pi}
\eq
as was to be calculated.

Finally substituting~(\ref{fouri:130:25}) into~(\ref{fouri:130:20}), we
have that
\[
	\mathcal F\{ \Omega(v) \} =
	\frac{\exp\left(-v^2/2\right)}{\sqrt{2\pi}},
\]
which in view of~(\ref{fouri:130:10}) reveals the remarkable transform
pair
\bq{fouri:130:50}
	\Omega(v) \fouripair \Omega(v).
\eq
The Gaussian pulse transforms to itself.  Old Fourier, who can twist and
knot other curves with ease, seems powerless to bend Gauss' mighty
curve.

\index{Gaussian pulse!to implement the Dirac delta by}
\index{pulse, Gaussian!to implement the Dirac delta by}
\index{Dirac delta function!as implemented by the Gaussian\\pulse} % bad break
\index{delta function, Dirac!as implemented by the Gaussian\\pulse} % bad break
It is worth observing incidentally in light of~(\ref{fouri:130:10})
and~(\ref{fouri:130:25}) that
\bq{fouri:130:60}
	\int_{-\infty}^{\infty} \Omega(t) \,dt = 1,
\eq
the same as for $\Pi(t)$, $\Lambda(t)$ and indeed $\delta(t)$.
Section~\ref{fours:095} and its~(\ref{fours:095:33}) have recommended
the shape of the Gaussian pulse, in the tall, narrow limit, to implement
the Dirac delta $\delta(t)$.  This section lends a bit more force to the
recommendation, for not only is the Gaussian pulse analytic (unlike the
Dirac delta) but it also behaves uncommonly well under Fourier
transformation (like the Dirac delta), thus rendering the Dirac delta
susceptible to an analytic limiting process which transforms amenably.
Too, the Gaussian pulse is about as tightly localized as a nontrivial,
uncontrived analytic function can be.%
\footnote{
	Consider that $\Omega(t) \approx
	\mbox{0x0.6621}, \mbox{0x0.3DF2}, \mbox{0x0.0DD2}, \mbox{0x0.0122},
	\mbox{0x0.0009}, \mbox{0x0.0000}$ at
	$t=0,\pm 1,\pm 2,\pm 3,\pm 4, \pm 5$; and that
	$\Omega(\pm 8) < 2^{-\mbox{\tiny 0x2F}}$.  Away from its middle
	region $\left|t\right| \lesssim 1$, the Gaussian pulse evidently
	vanishes rather convincingly.
}
The passion of one of the author's mentors in extolling the Gaussian
pulse as ``absolutely a beautiful function'' seems well supported by the
practical mathematical virtues exhibited by the function itself.  The
Gaussian pulse resembles the natural exponential in its general
versatility.  Indeed, though the book has required several chapters
through this Ch.~\ref{fouri} to develop the fairly deep mathematics
underlying the Gaussian pulse and supporting its basic application, now
that we have the Gaussian pulse in hand we shall find that it ably fills
all sorts of roles---not least the principal role of Ch.~\ref{prob} to
come.

% ----------------------------------------------------------------------
\section{The Laplace transform}
\label{fouri:200}
\index{Laplace transform}
\index{transform!Laplace}
\index{Laplace, Pierre-Simon (1749--1827)}
Equation~(\ref{fouri:byu2}), defining the Fourier transform in
the~$\mathcal{F}_{\omega t}$ notation, transforms pulses like those of
Figs.~\ref{fouri:100:fig} and~\ref{fours:095:fig1} straightforwardly but
stumbles on time-unlimited functions like $f(t) = \cos \omega_o t$ or
even the ridiculously simple $f(t)=1$.  Only by the indirect techniques
of \S\S~\ref{fouri:110} and~\ref{fouri:120} have we been able to
transform such functions.  Such indirect techniques are valid and even
interesting, but nonetheless can prompt the tasteful mathematician to
wonder whether a simpler alternative to the Fourier transform were not
possible.

At the sometimes acceptable cost of omitting one of the Fourier
integral's two tails,%
\footnote{
	There has been invented a version of the Laplace transform which
	omits no tail~\cite[Ch.~3]{Hsu:sig}.  This book does not treat it.
}
the \emph{Laplace transform}
\bq{fouri:laplace}
	F(s) = \mathcal L\{ f(t) \} \equiv
	\int_{0^{-}}^\infty e^{-st} f(t) \,dt
\eq
offers such an alternative.  Here, $s=j\omega$ is the transform variable
and, when~$s$ is purely imaginary, the Laplace transform is very like
the Fourier; but Laplace's advantage lies in that it encourages the use
of a complex~$s$, usually with a positive real part, which
in~(\ref{fouri:laplace})'s integrand suppresses even the tail the
transform does not omit, thus effectively converting even a
time-unlimited function to an integrable pulse---and Laplace does so
without resort to indirect techniques.%
\footnote{\cite[Ch.~3]{Hsu:sig}\cite[Ch.~7]{Phillips/Parr}\cite[Ch.~19]{JJH}}

As we said, the Laplace transform's
definition~(\ref{fouri:laplace})---quite unlike the Fourier transform's
definition~(\ref{fouri:byu2})---tends to transform simple functions
straightforwardly.  For instance, the pair
\[
	1 \laplair \frac{1}{s}
\]
(in which one can elaborate the symbol~$\mathcal L$
as~$\mathcal{L}_{st}$ if desired) comes by direct application of the
definition.  The first several of the Laplace properties of
Table~\ref{fouri:laplace-properties} likewise come by direct
application.  The differentiation property comes by
\[
	\mathcal L\left\{ \frac{d}{dt} f(t) \right\} =
	\int_{0^{-}}^\infty e^{-st} \frac{d}{dt} f(t) \,dt =
	\int_{0^{-}}^\infty e^{-st} \,d\left[f(t)\right]
\]
and thence by integration by parts (\S~\ref{inttx:230}); whereafter the
higher-order differentiation property comes by repeated application of
the differentiation property.  The integration property merely reverses
the differentiation property on the function
$g(t) \equiv \int_{0^{-}}^t f(\tau) \,d\tau$, for which $dg/dt = f(t)$
and $g(0^{-}) = 0$.
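For example, applying the definition~(\ref{fouri:laplace}) directly to
$f(t) = e^{-at}$ gives
\[
	F(s) = \int_{0^{-}}^\infty e^{-(s+a)t} \,dt = \frac{1}{s+a},
	\ \ \Re(s+a) > 0,
\]
against which one can try the differentiation property just proved:
$\mathcal L\{(d/dt)e^{-at}\} = \mathcal L\{-ae^{-at}\} = -a/(s+a)$,
while $sF(s) - f(0^{-}) = s/(s+a) - 1 = -a/(s+a)$, the same.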
The ramping property comes by differentiating and
negating~(\ref{fouri:laplace}) as
\[
	-\frac{d}{ds} F(s) =
	-\frac{d}{ds} \int_{0^{-}}^\infty e^{-st} f(t) \,dt =
	\int_{0^{-}}^\infty e^{-st} [tf(t)] \,dt =
	\mathcal L \{tf(t)\};
\]
whereafter again the higher-order property comes by repeated
application.  The convolution property comes as it did in
\S~\ref{fouri:110.40} except that here we take advantage of the presence
of Heaviside's unit step to manipulate the limits of integration,
beginning
\bqb
	\lefteqn{
		\mathcal{L}\left\{
			\int_{-\infty}^{\infty}
			u\left(\frac{t}{2}-\psi\right)
			h\left(\frac{t}{2}-\psi\right)
			u\left(\frac{t}{2}+\psi\right)
			f\left(\frac{t}{2}+\psi\right)
			\,d\psi
		\right\}
	} &&\\&=&
	\int_{-\infty}^{\infty} e^{-st}
	\int_{-\infty}^{\infty}
	u\left(\frac{t}{2}-\psi\right)
	h\left(\frac{t}{2}-\psi\right)
	u\left(\frac{t}{2}+\psi\right)
	f\left(\frac{t}{2}+\psi\right)
	\,d\psi \,dt,
\eqb
wherein evidently $u(t/2-\psi)u(t/2+\psi)=0$ for all $t < 0$, regardless
of the value of~$\psi$.  As in \S~\ref{fouri:110.40}, here also we
change $\phi \la t/2 + \psi$ and $\chi \la \phi - 2\psi$, eventually
reaching the form
\bqb
	\lefteqn{
		\mathcal{L}\left\{
			\int_{-\infty}^{\infty}
			u\left(\frac{t}{2}-\psi\right)
			h\left(\frac{t}{2}-\psi\right)
			u\left(\frac{t}{2}+\psi\right)
			f\left(\frac{t}{2}+\psi\right)
			\,d\psi
		\right\}
	} &&\\&=&
	\left[
		\int_{-\infty}^{\infty} e^{-s\chi} u(\chi)h(\chi) \,d\chi
	\right]
	\left[
		\int_{-\infty}^{\infty} e^{-s\phi} u(\phi)f(\phi) \,d\phi
	\right],
\eqb
after which once more we take advantage of Heaviside, this time to
curtail each integration to begin at~$0^{-}$ rather than at~$-\infty$,
thus completing the convolution property's proof.
\begin{table}
\caption{Properties of the Laplace transform.}
\label{fouri:laplace-properties}
\bqb
	u(t-t_o)f(t-t_o) &\laplair& e^{-st_o}F(s)
	\\ e^{-at}f(t) &\laplair& F(s+a)
	\\ Af(\alpha t) &\laplair&
	\frac{A}{\alpha}F\left(\frac{s}{\alpha}\right)
	\ \ \mbox{if $\Im(\alpha)=0$, $\Re(\alpha) > 0$}
	\\ A_1f_1(t) + A_2f_2(t) &\laplair& A_1F_1(s) + A_2F_2(s)
	\\ \frac{d}{dt}f(t) &\laplair& sF(s) - f(0^{-})
	\\ \frac{d^n}{dt^n}f(t) &\laplair&
	s^nF(s) - \sum_{k=0}^{n-1}\left\{s^k\left[
		\frac{d^{n-1-k}}{dt^{n-1-k}}f(t)
	\right]_{t=0^{-}}\right\}
	\\ \int_{0^{-}}^t f(\tau) \,d\tau &\laplair& \frac{F(s)}{s}
	\\ tf(t) &\laplair& -\frac{d}{ds}F(s)
	\\ t^nf(t) &\laplair& (-)^n\frac{d^n}{ds^n}F(s)
	\\ \left[u(t)h(t)\right] \ast \left[u(t)f(t)\right]
	&\laplair& H(s) F(s)
\eqb
\end{table}
\begin{table}
\caption{Laplace transform pairs.}
\label{fouri:laplace-pairs}
\[
	\renewcommand\arraystretch{2.0}
	\br{rclcrcl}
	\delta(t) &\laplair& 1 &\ \ & && \\
	\ds 1 &\laplair& \ds \frac{1}{s} &&
	\ds e^{-at} &\laplair& \ds \frac{1}{s+a} \\
	\ds t &\laplair& \ds \frac{1}{s^2} &&
	\ds e^{-at}t &\laplair& \ds \frac{1}{(s+a)^2} \\
	\ds t^n &\laplair& \ds \frac{n!}{s^{n+1}} &&
	\ds e^{-at}t^n &\laplair& \ds \frac{n!}{(s+a)^{n+1}} \\
	\ds \sin\omega_o t &\laplair&
	\ds \frac{\omega_o}{s^2+\omega_o^2} &&
	\ds e^{-at}\sin\omega_o t &\laplair&
	\ds \frac{\omega_o}{(s+a)^2+\omega_o^2} \\
	\ds \cos\omega_o t &\laplair&
	\ds \frac{s}{s^2+\omega_o^2} &&
	\ds e^{-at}\cos\omega_o t &\laplair&
	\ds \frac{s+a}{(s+a)^2+\omega_o^2}
	\er
\]
\end{table}
As Table~\ref{fouri:laplace-properties} lists Laplace properties, so
Table~\ref{fouri:laplace-pairs} lists Laplace transform pairs.
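The two tables moreover work together.  For instance, applying the
property $e^{-at}f(t) \laplair F(s+a)$ to the pair
$t^n \laplair n!/s^{n+1}$ recovers at once the pair
\[
	e^{-at}t^n \laplair \frac{n!}{(s+a)^{n+1}},
\]
agreeing with the entry Table~\ref{fouri:laplace-pairs} lists.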
As with the former table's entries, most of the latter table's entries,
too, come by direct application of the Laplace transform's
definition~(\ref{fouri:laplace}) (though to reach the sine and cosine
entries one should first split the sine and cosine functions per
Table~\ref{cexp:tbl-prop} into their complex exponential components).
The pair $t \laplair 1/s^2$ comes by application of the property that
$tf(t) \laplair -(d/ds)F(s)$ to the pair $1 \laplair 1/s$, and the pair
$t^n \laplair n!/s^{n+1}$ comes by repeated application of the same
property.  The pairs transforming $e^{-at}t$ and $e^{-at}t^n$ come
similarly.  In the application of either table,~$a$ may be, and~$s$
usually is, complex, but~$\alpha$ and~$t$ are normally real.

% ----------------------------------------------------------------------
\section{Solving differential equations by Laplace}
\label{fouri:250}
\index{differential equation!solution of by the Laplace transform}
\index{Laplace transform!solving a differential equation by}
\index{time domain}
\index{frequency domain}
\index{domain!time and frequency}
\index{transform domain}
\index{domain!transform}
\index{Laplace transform!comparison of against the Fourier transform}
\index{Fourier transform!comparison of against the Laplace transform}
The Laplace transform is curious, but admittedly one often finds in
practice that the more straightforward---though harder to
analyze---Fourier transform is a better tool for frequency-domain
analysis, among other reasons because Fourier brings an inverse
transformation formula~(\ref{fouri:byu2}) whereas Laplace does not.%
\footnote{
	Actually, formally, Laplace does bring an uncouth
	contour-integrating inverse transformation formula in
	footnote~\ref{fouri:250:fn30}, but we'll not use it.
}
This depends on the application.  However, by the way, another use for
the Laplace transform happens to arise.

The latter use emerges from the Laplace property of
Table~\ref{fouri:laplace-properties} that
$(d/dt)f(t) \laplair sF(s) - f(0^{-})$, according to which, evidently,
\emph{differentiation in the time (untransformed) domain corresponds to
multiplication by the transform variable~$s$ in the frequency
(transformed) domain.}

\index{initial condition}
Now, one might say the same of the Fourier transform, for it has a
differentiation property, too, $(d/dv)f(v) \fouripair ivF(v)$, which
looks rather alike.  The difference however lies in Laplace's extra
term $-f(0^{-})$ which, significantly, represents the untransformed
function's initial condition.

\index{state space}
To see the significance, consider for example the linear differential
equation%
\footnote{\cite[Example~19.31]{JJH}}$\mbox{}^{,}$%
\footnote{
	It is rather enlightening to study the same differential equation,
	written in the \emph{state-space} style \cite[Ch.~8]{Phillips/Parr}
	\[
		\frac{d}{dt}\ve f(t) = \mf{rr}{ 0 & 1 \\ -3 & -4 } \ve f(t)
		+ \mf{c}{0 \\ e^{-2t}},\ \ %
		\ve f(0) = \mf{c}{1 \\ 2},
	\]
	where $\ve f(t) \equiv [1\;d/dt]^{T}f(t)$.  The effort required to
	assimilate the notation rewards the student with significant insight
	into the manner in which initial conditions---here symbolized
	$\ve f(0)$---determine a system's subsequent evolution.
}
\bqb
	\frac{d^2}{dt^2}f(t) + 4\frac{d}{dt}f(t) + 3f(t) &=& e^{-2t},
	\\ \left. f(t) \right|_{t=0^{-}} &=& 1,
	\\ \left. \frac{d}{dt}f(t) \right|_{t=0^{-}} &=& 2.
\eqb Applying the properties of Table~\ref{fouri:laplace-properties} and transforms of Table~\ref{fouri:laplace-pairs}, term by term, yields the transformed equation \bqb \lefteqn{ \left\{ s^2F(s) - s\bigg[ f(t) \bigg]_{t=0^{-}} - \bigg[ \frac{d}{dt}f(t) \bigg]_{t=0^{-}} \right\} } &&\\&&\ \ \ \ \mbox{} + 4 \left\{ sF(s) - \bigg[ f(t) \bigg]_{t=0^{-}} \right\} + 3F(s) = \frac{1}{s+2}. \eqb That is, \[ (s^2+4s+3)F(s) - (s+4)\bigg[ f(t) \bigg]_{t=0^{-}} - \bigg[\frac{d}{dt}f(t) \bigg]_{t=0^{-}} = \frac{1}{s+2}. \] Applying the known initial conditions, \[ (s^2+4s+3)F(s) - (s+4)[1] - [2] = \frac{1}{s+2}. \] Combining like terms, \[ (s^2+4s+3)F(s) - (s+6) = \frac{1}{s+2}. \] Multiplying by $s+2$ and rearranging, \[ (s+2)(s^2+4s+3)F(s) = s^2+8s+13. \] Isolating the heretofore unknown frequency-domain function $F(s)$, \[ F(s) = \frac{s^2+8s+\mbox{0xD}}{(s+2)(s^2+4s+3)}. \] Factoring the denominator, \[ F(s) = \frac{s^2+8s+\mbox{0xD}}{(s+1)(s+2)(s+3)}. \] Expanding in partial fractions (this step being the key to the whole technique: see \S~\ref{inttx:260}), \[ F(s) = \frac{3}{s+1} - \frac{1}{s+2} - \frac{1}{s+3}. \] Though we lack an inverse transformation formula, it seems that we do not need one because---having split the frequency-domain equation into such simple terms---we can just look up the inverse transformation in Table~\ref{fouri:laplace-pairs}, term by term. The time-domain solution \[ f(t) = 3e^{-t} - e^{-2t} - e^{-3t} \] results to the differential equation with which we started.% \footnote{\label{fouri:250:fn30}% The careful reader might object that we have never proved that the Laplace transform cannot map distinct time-domain functions atop one another in the frequency domain; that is, that we have never shown the Laplace transform to be invertible. The objection has merit. Consider for instance the time-domain function $f_2(t) = u(t)[3e^{-t} - e^{-2t} - e^{-3t}]$, whose Laplace transform does not differ from that of $f(t)$. However, even the careful reader will admit that the suggested $f_2(t)$ differs from $f(t)$ only over $t < 0$, a domain Laplace ignores. What one thus ought to ask is whether the Laplace transform can map time-domain functions, the functions being \emph{distinct for $t \ge 0$,} atop one another in the frequency domain. In one sense it may be unnecessary to answer even the latter question, for one can check the correctness, and probably also the sufficiency, of any solution Laplace might offer to a particular differential equation by the expedient of substituting the solution back into the equation. However, one can answer the latter question formally nonetheless by changing $s \la i\omega$ in~(\ref{fouri:eqn}) and observing the peculiar, contour-integrating inverse of the Laplace transform, $f(t) = (1/i2\pi) \int_{-i\infty}^{i\infty} e^{st} F(s) \,ds$, which results \cite[eqn.~7.2]{Phillips/Parr}\@. To consider the choice of contours of integration and otherwise to polish the answer is left as an exercise to the interested reader; here it is noted only that, to cause the limits of integration involved to behave nicely, one might insist as a precondition to answering the question something like that $f(t) = 0$ for all $t<0$, the precondition being met by any $f(t) = u(t)g(t)$ (in which $u[t]$ is formally defined for the present purpose such that $u[0] = 1$). } This section's Laplace technique neatly solves many linear differential equations. 
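Before leaving the example, it is worth verifying the solution by
substituting it back into the differential equation, as
footnote~\ref{fouri:250:fn30} has suggested.  Since
$(d/dt)f(t) = -3e^{-t} + 2e^{-2t} + 3e^{-3t}$ and
$(d^2/dt^2)f(t) = 3e^{-t} - 4e^{-2t} - 9e^{-3t}$,
\bqb
	\left. f(t) \right|_{t=0^{-}} &=& 3 - 1 - 1 = 1,
	\\ \left. \frac{d}{dt}f(t) \right|_{t=0^{-}} &=& -3 + 2 + 3 = 2,
	\\ \frac{d^2}{dt^2}f(t) + 4\frac{d}{dt}f(t) + 3f(t)
	&=& (3-\mbox{0xC}+9)e^{-t} + (-4+8-3)e^{-2t}
	\\&&\ \ \mbox{} + (-9+\mbox{0xC}-3)e^{-3t} = e^{-2t},
\eqb
as required.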
% ---------------------------------------------------------------------- \section{Initial and final values by Laplace} \label{fouri:260} \index{initial-value theorem} \index{final-value theorem} \index{Laplace transform!initial and final values by} The method of \S~\ref{fouri:250} though effective is sometimes too much work, when all one wants to know are the initial and/or final values of a function $f(t)$, when one is not interested in the details between. The Laplace transform's \emph{initial-} and \emph{final-value theorems,} \bq{fouri:260:10} \begin{split} f(0^{+}) &= \lim_{s \ra \infty} sF(s), \\ \lim_{t\ra\infty} f(t) &= \lim_{s \ra 0} sF(s), \end{split} \eq meet this want. (Note that these are not transform pairs as in Tables~\ref{fouri:laplace-properties} and~\ref{fouri:laplace-pairs} but actual equations.) One derives the initial-value theorem by the successive steps \bqb \lim_{s \ra \infty} \mathcal L \left\{\frac{d}{dt}f(t)\right\} &=& \lim_{s \ra \infty} sF(s) - f(0^{-}), \\ \lim_{s \ra \infty} \int_{0^{-}}^\infty e^{-st} \frac{d}{dt}f(t) \,dt &=& \lim_{s \ra \infty} sF(s) - f(0^{-}), \\ \int_{0^{-}}^{0^{+}} \frac{d}{dt}f(t) \,dt &=& \lim_{s \ra \infty} sF(s) - f(0^{-}), \\ f(0^{+}) - f(0^{-}) &=& \lim_{s \ra \infty} sF(s) - f(0^{-}), \eqb which invoke the time-differentiation property of Table~\ref{fouri:laplace-properties} and the last of which implies (\ref{fouri:260:10})'s first line. For the final value, one begins \bqb \lim_{s \ra 0} \mathcal L \left\{\frac{d}{dt}f(t)\right\} &=& \lim_{s \ra 0} sF(s) - f(0^{-}), \\ \lim_{s \ra 0} \int_{0^{-}}^\infty e^{-st} \frac{d}{dt}f(t) \,dt &=& \lim_{s \ra 0} sF(s) - f(0^{-}), \\ \int_{0^{-}}^\infty \frac{d}{dt}f(t) \,dt &=& \lim_{s \ra 0} sF(s) - f(0^{-}), \\ \lim_{t\ra\infty} f(t) - f(0^{-}) &=& \lim_{s \ra 0} sF(s) - f(0^{-}), \eqb and (\ref{fouri:260:10})'s second line follows immediately.% \footnote{\cite[\S~7.5]{Phillips/Parr}} % ---------------------------------------------------------------------- \section{The spatial Fourier transform} \label{fouri:350} \index{spatial Fourier transform} \index{Fourier transform!spatial} \index{time} \index{space} \index{wave mechanics} \index{kernel} \index{spatiotemporal phase factor} \index{phase factor!spatiotemporal} \index{convention} \index{integration!volume} \index{integration!triple} In the study of wave mechanics, % diagn: add an appropriate in-book reference? physicists and engineers sometimes elaborate this chapter's kernel~$e^{iv\theta}$ or~$e^{i\omega t}$, or by whichever pair of letters is let to represent the complementary variables of transformation, into the more general, spatiotemporal phase factor% \footnote{ The choice of sign here is a matter of convention, which differs by discipline. This book tends to reflect its author's preference for $e^{i(-\omega t + \ve k\cdot\ve r)}$, convenient in electrical modeling but slightly less convenient in quantum-mechanical work. } $e^{i(\pm\omega t \mp\ve k\cdot\ve r)}$; where~$\ve k$ and~$\ve r$ are three-dimensional geometrical vectors and~$\ve r$ in particular represents a position in space. To review the general interpretation and use of such a factor lies beyond the chapter's scope but the factor's very form, \[ e^{i(\mp\omega t \pm\ve k\cdot\ve r)} = e^{i(\mp\omega t \pm k_xx \pm k_yy \pm k_zz)}, \] suggests Fourier transformation with respect not merely to time but also to space. 
There results the \emph{spatial Fourier transform} \bq{fouri:spatial} \begin{split} F(\ve k) &= \frac{1}{(2\pi)^{3/2}} \int_V e^{+i\ve k\cdot\ve r} f(\ve r) \,d\ve r, \\ f(\ve r) &= \frac{1}{(2\pi)^{3/2}} \int_V e^{-i\ve k\cdot\ve r} F(\ve k) \,d\ve k, \end{split} \eq analogous to~(\ref{fouri:byu2}) but cubing the $1/\sqrt{2\pi}$ scale factor for the triple integration and reversing the sign of the kernel's exponent. The transform variable~$\ve k$, analogous to~$\omega$, is a \emph{spatial frequency,} also for other reasons called a \emph{propagation vector.} \index{integration!surface} \index{integration!double} \index{integration!fourfold} \index{integration!sixfold} Nothing prevents one from extending~(\ref{fouri:spatial}) to four dimensions, including a fourth integration to convert time~$t$ to temporal frequency~$\omega$ while also converting position~$\ve r$ to spatial frequency~$\ve k$. On the other hand, one can restrict it to two dimensions or even one. Thus, various plausibly useful Fourier transforms include \[ \begin{split} F(\omega) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega t} f(t) \,dt, \\ F(k_z) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{+ik_zz} f(z) \,dz, \\ F(\ve k_\rho) &= \frac{1}{2\pi} \int_S e^{+i\ve k_\rho\cdot\we\rho} f(\we\rho) \,d\we\rho, \\ F(\ve k) &= \frac{1}{(2\pi)^{3/2}} \int_V e^{+i\ve k\cdot\ve r} f(\ve r) \,d\ve r, \\ F(\ve k, \ve \omega) &= \frac{1}{(2\pi)^{2}} \int_V \int_{-\infty}^{\infty} e^{i(-\omega t+\ve k\cdot\ve r)} f(\ve r,t) \,dt\,d\ve r, \end{split} \] among others. derivations-0.53.20120414.orig/tex/gjrank.tex0000644000000000000000000031556511742566274017154 0ustar rootroot% ---------------------------------------------------------------------- \chapter [Rank and the Gauss-Jordan] {Matrix rank and the Gauss-Jordan decomposition} \label{gjrank} \index{matrix rudiments} \index{rudiments} Chapter~\ref{matrix} has brought the matrix and its rudiments, the latter including \bi \item lone-element matrix~$E$ (\S~\ref{matrix:180.30}), \item the null matrix~$0$ (\S~\ref{matrix:180.25}), \item the rank-$r$ identity matrix~$I_r$ (\S~\ref{matrix:180.22}), \item the general identity matrix~$I$ and the scalar matrix~$\lambda I$ (\S~\ref{matrix:180.35}), \item the elementary operator~$T$ (\S~\ref{matrix:320}), \item the quasielementary operator~$P$, $D$, $L_{[k]}$ or $U_{[k]}$ (\S~\ref{matrix:325}), and \item the unit triangular matrix~$L$ or~$U$ (\S~\ref{matrix:330}). \ei Such rudimentary forms have useful properties, as we have seen. The general matrix~$A$ does not necessarily have any of these properties, but it turns out that one can factor any matrix whatsoever into a product of rudiments which do have the properties, and that several orderly procedures are known to do so. The simplest of these, and indeed one of the more useful, is the Gauss-Jordan decomposition. This chapter introduces it. Section~\ref{matrix:180} has de\"emphasized the concept of matrix dimensionality $m \times n$, supplying in its place the new concept of matrix rank. However, that section has actually defined rank only for the rank-$r$ identity matrix~$I_r$. In fact all matrices have rank. This chapter explains. Before treating the Gauss-Jordan decomposition and the matter of matrix rank as such, however, we shall find it helpful to prepare two preliminaries thereto: (i)~the matter of the linear independence of vectors; and (ii)~the elementary similarity transformation. The chapter begins with these. 
Except in \S~\ref{gjrank:337}, the chapter demands more rigor than one likes in such a book as this. However, it is hard to see how to avoid the rigor here, and logically the chapter cannot be omitted. We will drive through the chapter in as few pages as can be managed, and then onward to the more interesting matrix topics of Chs.~\ref{mtxinv} and~\ref{eigen}. % ---------------------------------------------------------------------- \section{Linear independence} \label{gjrank:335} \index{linear independence} \index{linear dependence} \index{independence} \index{linear combination} \index{weighted sum} \index{sum!weighted} \index{triviality} \index{coefficient!nontrivial} Linear independence is a significant possible property of a set of vectors---whether the set be the several columns of a matrix, the several rows, or some other vectors---the property being defined as follows. A vector is \emph{linearly independent} if its role cannot be served by the other vectors in the set. More formally, the~$n$ vectors of the set $\{\ve a_1,\ve a_2,\ve a_3,\ve a_4,\ve a_5,\ldots,\ve a_n\}$ are linearly independent if and only if none of them can be expressed as a linear combination---a weighted sum---of the others. That is, the several~$\ve a_k$ are linearly independent iff \bq{gjrank:335:10} \alpha_1\ve a_1 + \alpha_2\ve a_2 + \alpha_3\ve a_3 + \cdots + \alpha_n\ve a_n \neq 0 \eq for all nontrivial~$\alpha_k$, where ``nontrivial~$\alpha_k$'' means the several~$\alpha_k$, at least one of which is nonzero (\emph{trivial}~$\alpha_k$, by contrast, would be $\alpha_1=\alpha_2=\alpha_3=\cdots=\alpha_n=0$). Vectors which can combine nontrivially to reach the null vector are by definition \emph{linearly dependent.} \index{zero vector} \index{null vector} \index{$0$ (zero)!vector} \index{vector!zero or null} Linear independence is a property of vectors. Technically the property applies to scalars, too, inasmuch as a scalar resembles a one-element vector---so, any nonzero scalar alone is linearly independent---but there is no such thing as a linearly independent pair of scalars, because one of the pair can always be expressed as a complex multiple of the other. Significantly but less obviously, there is also no such thing as a linearly independent set which includes the null vector;~(\ref{gjrank:335:10}) forbids it. Paradoxically, even the single-member, $n=1$ set consisting only of $\ve a_1=0$ is, strictly speaking, not linearly independent. \index{empty set} \index{edge case} For consistency of definition, we regard the empty, $n=0$ set as linearly independent, on the technical ground that the only possible linear combination of the empty set is trivial.% \footnote{ This is the kind of thinking which typically governs mathematical edge cases. One could define the empty set to be linearly dependent if one really wanted to, but what then of the observation that adding a vector to a linearly dependent set never renders the set independent? Surely in this light it is preferable just to define the empty set as independent in the first place. Similar thinking makes $0!=1$, $\sum_{k=0}^{-1} a_kz^k = 0$, and~2 not~1 the least prime, among other examples. } \index{vector!arbitrary} If a linear combination of several independent vectors~$\ve a_k$ forms a vector~$\ve b$, then one might ask: can there exist a different linear combination of the same vectors~$\ve a_k$ which also forms~$\ve b$? 
That is, if
\[
	\beta_1\ve a_1 + \beta_2\ve a_2 + \beta_3\ve a_3
	+ \cdots + \beta_n\ve a_n = \ve b,
\]
where the several~$\ve a_k$ satisfy~(\ref{gjrank:335:10}), then is
\[
	\beta'_1\ve a_1 + \beta'_2\ve a_2 + \beta'_3\ve a_3
	+ \cdots + \beta'_n\ve a_n = \ve b
\]
possible?  To answer the question, suppose that it were possible.
The difference of the two equations then would be
\[
	(\beta_1'-\beta_1)\ve a_1 + (\beta_2'-\beta_2)\ve a_2
	+ (\beta_3'-\beta_3)\ve a_3 + \cdots + (\beta_n'-\beta_n)\ve a_n = 0.
\]
According to~(\ref{gjrank:335:10}), this could only be so if the
coefficients in the last equation were trivial---that is, only if
$\beta_1'-\beta_1=0$, $\beta_2'-\beta_2=0$, $\beta_3'-\beta_3=0$,
\ldots, $\beta_n'-\beta_n=0$.  But this says no less than that the two
linear combinations, which we had supposed to differ, were in fact one
and the same.  One concludes therefore that, \emph{if a vector~$\ve b$
can be expressed as a linear combination of several linearly
independent vectors~$\ve a_k$, then it cannot be expressed as any other
combination of the same vectors.}  The combination is unique.

\index{visualization, geometrical}
\index{geometrical visualization}
\index{abstraction}
Linear independence can apply in any dimensionality, but it helps to
visualize the concept geometrically in three dimensions, using the
three-dimensional geometrical vectors of \S~\ref{trig:230}.  Two such
vectors are independent so long as they do not lie along the same line.
A third such vector is independent of the first two so long as it does
not lie in their common plane.  A fourth such vector (unless it points
off into some unvisualizable fourth dimension) cannot possibly then be
independent of the three.

We discuss the linear independence of vectors in this, a chapter on
matrices, because (\S~\ref{matrix:120}) a matrix is essentially a
sequence of vectors---either of column vectors or of row vectors,
depending on one's point of view.  As we shall see in
\S~\ref{gjrank:340}, the important property of matrix \emph{rank}
depends on the number of linearly independent columns or rows a matrix
has.

% ----------------------------------------------------------------------
\section{The elementary similarity transformation}
\label{gjrank:337}
\index{elementary similarity transformation}
\index{similarity transformation}
Section~\ref{matrix:321} and its~(\ref{matrix:321:60}) have introduced
the \emph{similarity transformation}
% bad break
$CAC^{-1}$ or~$C^{-1}AC$, which arises when an operator~$C$ commutes
respectively rightward or leftward past a matrix~$A$.  The similarity
transformation has several interesting properties, some of which we are
now prepared to discuss, particularly in the case in which the operator
happens to be an elementary, $C=T$.  In this case, the several rules of
Table~\ref{gjrank:337:table} obtain.
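By way of a concrete instance---an illustration the reader can verify
by sketching the matrices, letting~$a$, $b$ and~$c$ stand for arbitrary
nonzero scale factors and showing only the $3 \times 3$ active
regions---let the interchange elementary $T_{[1 \lra 2]}$ act on a
general scaling operator~$D$:
\[
	T_{[1 \lra 2]}
	\left[ \br{ccc} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \er \right]
	T_{[1 \lra 2]}
	=
	\left[ \br{ccc} b & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & c \er \right].
\]
The product remains a general scaling operator---the table's~$D'$---only
with its first two diagonal elements traded, in agreement with the rule
$D' = D + \left([D]_{22}-[D]_{11}\right)E_{11}
+ \left([D]_{11}-[D]_{22}\right)E_{22}$.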
\begin{table}
\caption{Some elementary similarity transformations.}
\label{gjrank:337:table}
\settowidth\tla{${U^{\{k\}}}'$}
\nc\xx{$L,U,L^{[k]},U^{[k]},L^{\{k\}},U^{\{k\}},L_\|^{\{k\}},U_\|^{\{k\}}$}
\settowidth\tlb{\xx}
\bqb
	T_{[i \lra j]} I T_{[i \lra j]} &=& I
	\\
	T_{[i \lra j]} P T_{[i \lra j]} &=& P'
	\\
	T_{[i \lra j]} D T_{[i \lra j]} &=& D' = D
	+ \left([D]_{jj}-[D]_{ii}\right)E_{ii}
	+ \left([D]_{ii}-[D]_{jj}\right)E_{jj}
	\\
	T_{[i \lra j]} D T_{[i \lra j]} &=& \makebox[\tla][l]{$D$}
	\ \ \mbox{if $[D]_{ii} = [D]_{jj}$}
	\\
	T_{[i \lra j]} L^{[k]}T_{[i \lra j]} &=& \makebox[\tla][l]{$L^{[k]}$}
	\ \ \mbox{if $i<k$ and $j<k$}
	\\
	T_{[i \lra j]} U^{[k]}T_{[i \lra j]} &=& \makebox[\tla][l]{$U^{[k]}$}
	\ \ \mbox{if $i>k$ and $j>k$}
	\\
	T_{[i \lra j]} L^{\{k\}}T_{[i \lra j]} &=& \makebox[\tla][l]{${L^{\{k\}}}'$}
	\ \ \mbox{if $i>k$ and $j>k$}
	\\
	T_{[i \lra j]} U^{\{k\}}T_{[i \lra j]} &=& \makebox[\tla][l]{${U^{\{k\}}}'$}
	\ \ \mbox{if $i<k$ and $j<k$}
	\\
	T_{[i \lra j]} L_\|^{\{k\}}T_{[i \lra j]} &=& \makebox[\tla][l]{${L_\|^{\{k\}}}'$}
	\ \ \mbox{if $i>k$ and $j>k$}
	\\
	T_{[i \lra j]} U_\|^{\{k\}}T_{[i \lra j]} &=& \makebox[\tla][l]{${U_\|^{\{k\}}}'$}
	\ \ \mbox{if $i<k$ and $j<k$}
	\\
	T_{\alpha[ij]} L T_{-\alpha[ij]} &=& \makebox[\tla][l]{$L'$}
	\ \ \mbox{if $i > j$}
	\\
	T_{\alpha[ij]} U T_{-\alpha[ij]} &=& \makebox[\tla][l]{$U'$}
	\ \ \mbox{if $i < j$}
\eqb
\end{table}
Most of the table's rules are fairly obvious if the meaning of the
symbols is understood, though to grasp some of the rules it helps to
sketch the relevant matrices on a sheet of paper.  Of course rigorous
symbolic proofs can be constructed after the pattern of
\S~\ref{matrix:330.30}, but they reveal little or nothing sketching the
matrices does not.  The symbols~$P$, $D$, $L$ and~$U$ of course
represent the quasielementaries and unit triangular matrices of
\S\S~\ref{matrix:325} and~\ref{matrix:330}.  The symbols~$P'$, $D'$,
$L'$ and~$U'$ also represent quasielementaries and unit triangular
matrices, only not necessarily the same ones~$P$, $D$, $L$ and~$U$ do.

The rules permit one to commute some but not all elementaries past a
quasielementary or unit triangular matrix, without fundamentally
altering the character of the quasielementary or unit triangular
matrix, and sometimes without altering the matrix at all.  The rules
find use among other places in the Gauss-Jordan decomposition of
\S~\ref{gjrank:341}.

% ----------------------------------------------------------------------
\section{The Gauss-Jordan decomposition}
\label{gjrank:341}
\index{Gauss, Carl Friedrich (1777--1855)}
\index{Jordan, Wilhelm (1842--1899)}
\index{Gauss-Jordan decomposition}
\index{decomposition!Gauss-Jordan}
\index{Gauss-Jordan factorization}
\index{factorization!Gauss-Jordan}
\index{reversibility}
\index{$LU$ decomposition}
\index{decomposition!$LU$}
The \emph{Gauss-Jordan decomposition} of an arbitrary,
dimension-limited, $m \times n$ matrix~$A$ is%
\footnote{
	Most introductory linear algebra texts this writer has met call
	the Gauss-Jordan decomposition instead the ``$LU$ decomposition''
	and include fewer factors in it, typically merging~$D$ into~$L$
	and omitting~$K$ and~$S$.  They also omit~$I_r$, since their
	matrices have pre-defined dimensionality.  Perhaps the reader
	will agree that the decomposition is cleaner as presented here.
} \bq{gjrank:341:GJ} \begin{split} A = G_> I_r G_< & = P D L U I_r K S, \\ G_< &\equiv K S, \\ G_> &\equiv P D L U, \end{split} \eq where \bi \item $P$ and~$S$ are general interchange operators (\S~\ref{matrix:325.10}); \item $D$ is a general scaling operator (\S~\ref{matrix:325.20}); \item $L$ and~$U$ are respectively unit lower and unit upper triangular matrices (\S~\ref{matrix:330}); \item $K=L_\|^{\{r\}T}$ is the transpose of a parallel unit lower triangular matrix, being thus a parallel unit upper triangular matrix (\S~\ref{matrix:330.50}); \item $G_>$ and~$G_<$ are composites% \footnote{ One can pronounce~$G_>$ and~$G_<$ respectively as~``$G$ acting rightward'' and~``$G$ acting leftward.'' The letter~$G$ itself can be regarded as standing for ``Gauss-Jordan,'' but admittedly it is chosen as much because otherwise we were running out of available Roman capitals! } as defined by~(\ref{gjrank:341:GJ}); and \item $r$ is an unspecified rank. \ei The Gauss-Jordan decomposition is also called the \emph{Gauss-Jordan factorization.} Whether all possible matrices~$A$ have a Gauss-Jordan decomposition (they do, in fact) is a matter this section addresses. However---at least for matrices which do have one---because~$G_>$ and~$G_<$ are composed of invertible factors, one can left-multiply the equation $A = G_> I_r G_<$ by~$G_>^{-1}$ and right-multiply it by~$G_<^{-1}$ to obtain \bq{gjrank:341:GJinv} \begin{split} U^{-1} L^{-1} D^{-1} P^{-1} A S^{-1} K^{-1} & = G_>^{-1} A G_<^{-1} = I_r, \\ S^{-1} K^{-1} &= G_<^{-1}, \\ U^{-1} L^{-1} D^{-1} P^{-1} &= G_>^{-1}, \end{split} \eq the Gauss-Jordan's complementary form. \subsection{Motive} \label{gjrank:341.01} Equation~(\ref{gjrank:341:GJ}) seems inscrutable. The equation itself is easy enough to read, but just as there are many ways to factor a scalar ($\mbox{0xC} = [4][3] = [2]^2[3] = [2][6]$, for example), there are likewise many ways to factor a matrix. Why choose this particular way? There are indeed many ways. We shall meet some of the others in \S\S~\ref{mtxinv:460}, \ref{eigen:423}, \ref{eigen:520} and~\ref{eigen:600}. The Gauss-Jordan decomposition we meet here however has both significant theoretical properties and useful practical applications, and in any case needs less advanced preparation to appreciate than the others, and (at least as developed in this book) precedes the others logically. It emerges naturally when one posits a pair of square, $n \times n$ matrices,~$A$ and~$A^{-1}$, for which $A^{-1} A = I_n$, where~$A$ is known and~$A^{-1}$ is to be determined. (The~$A^{-1}$ here is the $A^{-1(n)}$ of eqn.~\ref{matrix:321:20}. However, it is only supposed here that $A^{-1} A = I_n$; it is not \emph{yet} claimed that $A A^{-1} = I_n$.) To determine~$A^{-1}$ is not an entirely trivial problem. The matrix~$A^{-1}$ such that $A^{-1} A = I_n$ may or may not exist (usually it does exist if~$A$ is square, but even then it may not, as we shall soon see), and even if it does exist, how to determine it is not immediately obvious. And still, if one can determine~$A^{-1}$, that is only for square~$A$; what if~$A$ is not square? In the present subsection however we are not trying to prove anything, only to motivate, so for the moment let us suppose a square~$A$ for which~$A^{-1}$ does exist, and let us seek~$A^{-1}$ by left-multiplying~$A$ by a sequence $\prod T$ of elementary row operators, each of which makes the matrix more nearly resemble~$I_n$. 
When~$I_n$ is finally achieved, then we shall have that \[ \left(\prod T\right) (A) = I_n, \] or, left-multiplying by~$I_n$ and observing that $I_n^2=I_n$, \[ (I_n) \left(\prod T\right) (A) = I_n, \] which implies that \[ A^{-1} = (I_n) \left(\prod T\right). \] The product of elementaries which transforms~$A$ to~$I_n$, truncated (\S~\ref{matrix:180.23}) to $n \times n$ dimensionality, itself constitutes~$A^{-1}$. This observation is what motivates the Gauss-Jordan decomposition. { \nc\fz[1]{\mbox{\footnotesize$\ds{#1}$}} \settowidth\tla{\footnotesize $-0$} By successive steps,% \footnote{ Theoretically, all elementary operators including the ones here have extended-operational form~(\S~\ref{matrix:180.35}), but all those~$\cdots$ ellipses clutter the page too much. Only the $2 \times 2$ active regions are shown here. } a concrete example: \bqb A &=& \fz{\left[ \br{rr} 2 & -4 \\ 3 & -1 \er \right]}, \\ \fz{ \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } A &=& \fz{\left[ \br{rr} 1 & -2 \\ 3 & -1 \er \right]}, \\ \fz{ \left[ \br{rr} 1 & 0 \\ -3 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } A &=& \fz{\left[ \br{rr} 1 & -2 \\ 0 & 5 \er \right]}, \\ \fz{ \left[ \br{rr} 1 & 0 \\ 0 & \frac 1 5 \er \right] \left[ \br{rr} 1 & 0 \\ -3 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } A &=& \fz{ \left[ \br{rr} 1 & -2 \\ 0 & 1 \er \right] }, \\ \fz{ \left[ \br{rr} 1 & 2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ 0 & \frac 1 5 \er \right] \left[ \br{rr} 1 & 0 \\ -3 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } A &=& \fz{\left[ \br{rr} 1 & \makebox[\tla][r]{$0$} \\ 0 & 1 \er \right]}, \\ \fz{ \left[ \br{rr} 1 & 0 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ 0 & \frac 1 5 \er \right] \left[ \br{rr} 1 & 0 \\ -3 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } A &=& \fz{\left[ \br{rr} 1 & \makebox[\tla][r]{$0$} \\ 0 & 1 \er \right]}. \eqb Hence, \[ \renewcommand\arraystretch{1.3} A^{-1} = \fz{ \left[ \br{rr} 1 & 0 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ 0 & \frac 1 5 \er \right] \left[ \br{rr} 1 & 0 \\ -3 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } = \fz{ \left[ \br{rr} -\frac 1 {\mathrm A} & \frac 2 5 \\ -\frac 3 {\mathrm A} & \frac 1 5 \er \right] }. \] Using the elementary commutation identity that $T_{\beta[m]}T_{\alpha[mj]} = T_{\alpha\beta[mj]}T_{\beta[m]}$, from Table~\ref{matrix:Txchg3}, to group like operators, we have that \[ \renewcommand\arraystretch{1.3} A^{-1} = \fz{ \left[ \br{rr} 1 & 0 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ -\frac 3 5 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ 0 & \frac 1 5 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & 1 \er \right] } = \fz{ \left[ \br{rr} -\frac 1 {\mathrm A} & \frac 2 5 \\ -\frac 3 {\mathrm A} & \frac 1 5 \er \right] }; \] or, multiplying the two scaling elementaries to merge them into a single general scaling operator (\S~\ref{matrix:325.20}), \[ \renewcommand\arraystretch{1.3} A^{-1} = \fz{ \left[ \br{rr} 1 & 0 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ -\frac 3 5 & 1 \er \right] \left[ \br{rr} \frac 1 2 & 0 \\ 0 & \frac 1 5 \er \right] } = \fz{ \left[ \br{rr} -\frac 1 {\mathrm A} & \frac 2 5 \\ -\frac 3 {\mathrm A} & \frac 1 5 \er \right] }. 
\] The last equation is written symbolically as \[ A^{-1} = I_2 U^{-1} L^{-1} D^{-1}, \] from which \[ A = D L U I_2 = \fz{ \left[ \br{rr} 2 & 0 \\ 0 & 5 \er \right] \left[ \br{rr} 1 & 0 \\ \frac 3 5 & 1 \er \right] \left[ \br{rr} 1 & -2 \\ 0 & 1 \er \right] \left[ \br{rr} 1 & 0 \\ 0 & 1 \er \right] } = \fz{\left[ \br{rr} 2 & -4 \\ 3 & -1 \er \right]}. \] } Now, admittedly, the equation $A = D L U I_2$ is not~(\ref{gjrank:341:GJ})---or rather, it is~(\ref{gjrank:341:GJ}), but only in the special case that $r = 2$ and $P = S = K = I$---which begs the question: why do we need the factors~$P$, $S$ and~$K$ in the first place? The answer regarding~$P$ and~$S$ is that these factors respectively gather row and column interchange elementaries, of which the example given has used none but which other examples sometimes need or want, particularly to avoid dividing by zero when they encounter a zero in an inconvenient cell of the matrix (the reader might try reducing $A=[0\;1; 1\;0]$ to~$I_2$, for instance; a row or column interchange is needed here). Regarding~$K$, this factor comes into play when~$A$ has broad rectangular rather than square shape, and also sometimes when one of the rows of~$A$ happens to be a linear combination of the others. The last point, we are not quite ready to detail yet, but at present we are only motivating not proving, so if the reader will accept the other factors and suspend judgment on~$K$ until the actual need for it emerges in \S~\ref{gjrank:341.10}, step~\ref{gjrank:341:s75}, then we will proceed on this basis. \subsection{Method} \label{gjrank:341.05} The Gauss-Jordan decomposition of a matrix~$A$ is not discovered at one stroke but rather is gradually built up, elementary by elementary. It begins with the equation \[ A = I I I I A I I, \] where the six~$I$ hold the places of the six Gauss-Jordan factors~$P$, $D$, $L$, $U$, $K$ and~$S$ of~(\ref{gjrank:341:GJ}). By successive elementary operations, the~$A$ on the right is gradually transformed into~$I_r$, while the six~$I$ are gradually transformed into the six Gauss-Jordan factors. The decomposition thus ends with the equation \[ A = P D L U I_r K S, \] which is~(\ref{gjrank:341:GJ}). In between, while the several matrices are gradually being transformed, the equation is represented as \bq{gjrank:341:05} A = \tilde P \tilde D \tilde L \tilde U \tilde I \tilde K \tilde S, \eq where the initial value of~$\tilde I$ is~$A$ and the initial values of~$\tilde P$, $\tilde D$, etc., are all~$I$. Each step of the transformation goes as follows. The matrix~$\tilde I$ is left- or right-multiplied by an elementary operator~$T$. To compensate, one of the six factors is right- or left-multiplied by~$T^{-1}$. Intervening factors are multiplied by both~$T$ and~$T^{-1}$, which multiplication constitutes an elementary similarity transformation as described in \S~\ref{gjrank:337}. For example, \[ A = \tilde P \left(\tilde DT_{(1/\alpha)[i]}\right) \left(T_{\alpha[i]}\tilde LT_{(1/\alpha)[i]}\right) \left(T_{\alpha[i]}\tilde UT_{(1/\alpha)[i]}\right) \left(T_{\alpha[i]}\tilde I\right) \tilde K \tilde S, \] which is just~(\ref{gjrank:341:05}), inasmuch as the adjacent elementaries cancel one another; then, \[ \begin{split} \tilde I &\la T_{\alpha[i]}\tilde I, \\ \tilde U &\la T_{\alpha[i]}\tilde UT_{(1/\alpha)[i]}, \\ \tilde L &\la T_{\alpha[i]}\tilde LT_{(1/\alpha)[i]}, \\ \tilde D &\la \tilde DT_{(1/\alpha)[i]}, \end{split} \] thus associating the operation with the appropriate factor---in this case,~$\tilde D$. 
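To make the bookkeeping concrete, one can replay the first step of
\S~\ref{gjrank:341.01}'s example in this notation (a simple
illustration, only the $2 \times 2$ active regions being shown).
There, initially, $\tilde I = A$ and the pivot $\tdi_{11} = 2$ wants
normalizing, so one applies $T_{(1/2)[1]}$ leftward to~$\tilde I$ and
compensates~$\tilde D$ rightward by~$T_{2[1]}$:
\[
	\begin{split}
	\tilde D &\la \tilde D T_{2[1]}
	= \left[ \br{rr} 2 & 0 \\ 0 & 1 \er \right], \\
	\tilde I &\la T_{(1/2)[1]} \tilde I
	= \left[ \br{rr} 1 & -2 \\ 3 & -1 \er \right].
	\end{split}
\]
The intervening factors~$\tilde L$ and~$\tilde U$, which at this point
still equal~$I$, are left unchanged by the similarity transformations.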
Such elementary row and column operations are repeated until $\tilde I=I_r$, at which point~(\ref{gjrank:341:05}) has become the Gauss-Jordan decomposition~(\ref{gjrank:341:GJ}). \subsection{The algorithm} \label{gjrank:341.10} \index{Gauss-Jordan algorithm} \index{algorithm!Gauss-Jordan} Having motivated the Gauss-Jordan decomposition in \S~\ref{gjrank:341.01} and having proposed a basic method to pursue it in \S~\ref{gjrank:341.05}, we shall now establish a definite, orderly, failproof algorithm to achieve it. Broadly, the algorithm \bi \item copies~$A$ into the variable working matrix~$\tilde I$ (step~\ref{gjrank:341:s10} below), \item reduces~$\tilde I$ by suitable row (and maybe column) operations to unit upper triangular form (steps~\ref{gjrank:341:s18} through~\ref{gjrank:341:s50}), \item establishes a rank~$r$ (step~\ref{gjrank:341:s55}), and \item reduces the now unit triangular~$\tilde I$ further to the rank-$r$ identity matrix~$I_r$ (steps~\ref{gjrank:341:s60} through~\ref{gjrank:341:s85}). \ei Specifically, the algorithm decrees the following steps. (The steps as written include many parenthetical remarks---so many that some steps seem to consist more of parenthetical remarks than of actual algorithm. The remarks are unnecessary to execute the algorithm's steps as such. They are however necessary to explain and to justify the algorithm's steps to the reader.) \begin{enumerate} \item \label{gjrank:341:s10} Begin by initializing \[ \br{c} \tilde P \la I, \ \tilde D \la I, \ \tilde L \la I, \ \tilde U \la I, \ \tilde K \la I, \ \tilde S \la I, \\ \setlength\arraycolsep{0.30\arraycolsep} \br{rcl} \tilde I &\la& A, \\ i &\la& 1, \er \er \] where~$\tilde I$ holds the part of~$A$ remaining to be decomposed, where~$i$ is a row index, and where the others are the variable working matrices of~(\ref{gjrank:341:05}). (The eventual goal will be to factor all of~$\tilde I$ away, leaving $\tilde I=I_r$, though the precise value of~$r$ will not be known until step~\ref{gjrank:341:s55}. Since~$A$ by definition is a dimension-limited $m \times n$ matrix, one naturally need not store~$A$ beyond the $m \times n$ active region. What is less clear until one has read the whole algorithm, but nevertheless true, is that one also need not store the dimension-limited~$\tilde I$ beyond the $m \times n$ active region. The other six variable working matrices each have extended-operational form, but they also confine their activity to well-defined regions: $m \times m$ for~$\tilde P$, $\tilde D$, $\tilde L$ and~$\tilde U$; $n \times n$ for~$\tilde K$ and~$\tilde S$. One need store none of the matrices beyond these bounds.) \item \label{gjrank:341:s18} \index{pivot} (Besides arriving at this point from step~\ref{gjrank:341:s10} above, the algorithm also re\"enters here from step~\ref{gjrank:341:s50} below. From step~\ref{gjrank:341:s10}, $\tilde I = A$ and $\tilde L = I$, so this step~\ref{gjrank:341:s18} though logical seems unneeded. The need grows clear once one has read through step~\ref{gjrank:341:s50}.) Observe that neither the $i$th row of~$\tilde I$ nor any row below it has an entry left of the $i$th column, that~$\tilde I$ is all-zero below-leftward of and directly leftward of (though not directly below) the \emph{pivot} element~$\tdi_{ii}$.% \footnote{ The notation~$\tdi_{ii}$ looks interesting, but this is accidental. The~$\tdi$ relates not to the doubled, subscribed index~$ii$ but to~$\tilde I$. 
The notation~$\tdi_{ii}$ thus means $[\tilde I]_{ii}$---in other words, it means the current $ii$th element of the variable working matrix~$\tilde I$. } Observe also that above the $i$th row, the matrix has proper unit upper triangular form (\S~\ref{matrix:330}). Regarding the other factors, notice that~$\tilde L$ enjoys the major partial unit triangular form $L^{\{i-1\}}$ (\S~\ref{matrix:330.05}) and that $\tilde d_{kk} = 1$ for all $k \ge i$. Pictorially, \bqb \tilde D &=& \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&{*}&0&0&0&0&0&0&\cdots\\ \cdots&0&{*}&0&0&0&0&0&\cdots\\ \cdots&0&0&{*}&0&0&0&0&\cdots\\ \cdots&0&0&0&1&0&0&0&\cdots\\ \cdots&0&0&0&0&1&0&0&\cdots\\ \cdots&0&0&0&0&0&1&0&\cdots\\ \cdots&0&0&0&0&0&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots },\\ \tilde L = L^{\{i-1\}} &=& \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&0&0&0&0&0&0&\cdots\\ \cdots&{*}&1&0&0&0&0&0&\cdots\\ \cdots&{*}&{*}&1&0&0&0&0&\cdots\\ \cdots&{*}&{*}&{*}&1&0&0&0&\cdots\\ \cdots&{*}&{*}&{*}&0&1&0&0&\cdots\\ \cdots&{*}&{*}&{*}&0&0&1&0&\cdots\\ \cdots&{*}&{*}&{*}&0&0&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots },\\ \tilde I &=& \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&{*}&{*}&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&1&{*}&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&1&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }, \eqb where the $i$th row and $i$th column are depicted at center. \item \label{gjrank:341:s20} \index{Intel} \index{AMD} \index{x86-class computer processor} \index{computer processor} \index{processor, computer} \index{bit} \index{floating-point number} \index{mantissa} \index{exponent, floating-point} \index{exact arithmetic} \index{arithmetic!exact} Choose a nonzero element $\tdi_{pq} \neq 0$ on or below the pivot row, where $p \ge i$ and $q \ge i$. (The easiest choice may simply be~$\tdi_{ii}$, where $p=q=i$, if $\tdi_{ii} \neq 0$; but any nonzero element from the $i$th row downward can in general be chosen. Beginning students of the Gauss-Jordan or~$LU$ decomposition are conventionally taught to choose first the least possible~$q$ then the least possible~$p$. When one has no reason to choose otherwise, that is as good a choice as any. There is however no actual need to choose so. In fact alternate choices can sometimes improve practical numerical accuracy.% \footnote{ A typical Intel or AMD x86-class computer processor represents a C/C++ \texttt{double}-type floating-point number, $x=2^pb$, in~0x40 bits of computer memory. Of the~0x40 bits,~0x34 are for the number's mantissa $2.0 \le b < 4.0$ (not $1.0 \le b < 2.0$ as one might expect),~0xB are for the number's exponent $-\mbox{0x3FF} \le p \le \mbox{0x3FE}$, and one is for the number's~$\pm$ sign. (The mantissa's high-order bit, which is always~1, is implied not stored, thus is one neither of the~0x34 nor of the~0x40 bits.) The out-of-bounds exponents $p=-\mbox{0x400}$ and $p=\mbox{0x3FF}$ serve specially respectively to encode~0 and~$\infty$. All this is standard computing practice. 
Such a floating-point representation is easily accurate enough for most practical purposes, but of course it is not generally exact.\ \cite[\S~1-4.2.2]{Intel} }$\mbox{}^,$\footnote{ The Gauss-Jordan's floating-point errors come mainly from dividing by small pivots. Such errors are naturally avoided by avoiding small pivots, at least until as late in the algorithm as possible. Smallness however is relative: a small pivot in a row and a column each populated by even smaller elements is unlikely to cause as much error as is a large pivot in a row and a column each populated by even larger elements. To choose a pivot, any of several heuristics are reasonable. The following heuristic if programmed intelligently might not be too computationally expensive: Define the pivot-smallness metric \settoheight\tla{\tiny$*$} \[ \tilde\eta_{pq}^2 \equiv \frac{2\tdi_{pq}^{*}\tdi_{pq}^{\rule{0em}{\tla}}}{ \sum_{p'=i}^m \tdi_{p'q}^{*}\tdi_{p'q}^{\rule{0em}{\tla}} + \sum_{q'=i}^n \tdi_{pq'}^{*}\tdi_{pq'}^{\rule{0em}{\tla}} }. \] Choose the~$p$ and~$q$ of least~$\tilde\eta_{pq}^2$. If two are equally least, then choose first the lesser column index~$q$, then if necessary the lesser row index~$p$. } Theoretically nonetheless, when doing exact arithmetic, the choice is quite arbitrary, so long as $\tdi_{pq} \neq 0$.) If no nonzero element is available---if all remaining rows $p \ge i$ are now null---then skip directly to step~\ref{gjrank:341:s55}. \item \label{gjrank:341:s25} \index{row} \index{matrix!row of} \index{column} \index{matrix!column of} Observing that~(\ref{gjrank:341:05}) can be expanded to read \bqb A &=& \left( \tilde P T_{[p \lra i]} \right) \left( T_{[p \lra i]} \tilde D T_{[p \lra i]} \right) \left( T_{[p \lra i]} \tilde L T_{[p \lra i]} \right) \left( T_{[p \lra i]} \tilde U T_{[p \lra i]} \right) \\ && \ \ \mbox{} \times \left( T_{[p \lra i]} \tilde I T_{[i \lra q]} \right) \left( T_{[i \lra q]} \tilde K T_{[i \lra q]} \right) \left( T_{[i \lra q]} \tilde S \right) \\ &=& \left( \tilde P T_{[p \lra i]} \right) \tilde D \left( T_{[p \lra i]} \tilde L T_{[p \lra i]} \right) \tilde U \\ && \ \ \mbox{} \times \left( T_{[p \lra i]} \tilde I T_{[i \lra q]} \right) \tilde K \left( T_{[i \lra q]} \tilde S \right), \eqb let \[ \begin{split} \tilde P &\la \tilde P T_{[p\lra i]}, \\ \tilde L &\la T_{[p\lra i]} \tilde L T_{[p\lra i]}, \\ \tilde I &\la T_{[p\lra i]} \tilde I T_{[i\lra q]}, \\ \tilde S &\la T_{[i\lra q]} \tilde S, \end{split} \] thus interchanging the $p$th with the $i$th row and the $q$th with the $i$th column, to bring the chosen element to the pivot position. (Refer to Table~\ref{gjrank:337:table} for the similarity transformations. The~$\tilde U$ and~$\tilde K$ transformations disappear because at this stage of the algorithm, still $\tilde U = \tilde K = I$. The~$\tilde D$ transformation disappears because $p \ge i$ and because $\tilde d_{kk}=1$ for all $k \ge i$. Regarding the~$\tilde L$ transformation, it does not disappear, but~$\tilde L$ has major partial unit triangular form $L^{\{i-1\}}$, which form according to Table~\ref{gjrank:337:table} it retains since $i-1 < i \le p$.) 
\item \label{gjrank:341:s40}
Observing that~(\ref{gjrank:341:05}) can be expanded to read
\bqb
	A &=&
	\tilde P
	\left( \tilde D T_{\tdi_{ii}[i]} \right)
	\left( T_{(1/\tdi_{ii})[i]} \tilde L T_{\tdi_{ii}[i]} \right)
	\left( T_{(1/\tdi_{ii})[i]} \tilde U T_{\tdi_{ii}[i]} \right)
	\\&&\ \ \mbox{}\times
	\left( T_{(1/\tdi_{ii})[i]} \tilde I \right)
	\tilde K \tilde S
	\\&=&
	\tilde P
	\left( \tilde D T_{\tdi_{ii}[i]} \right)
	\left( T_{(1/\tdi_{ii})[i]} \tilde L T_{\tdi_{ii}[i]} \right)
	\tilde U
	\left( T_{(1/\tdi_{ii})[i]} \tilde I \right)
	\tilde K \tilde S,
\eqb
normalize the new~$\tdi_{ii}$ pivot by letting
\[
	\begin{split}
	\tilde D &\la \tilde D T_{\tdi_{ii}[i]}, \\
	\tilde L &\la T_{(1/\tdi_{ii})[i]} \tilde L T_{\tdi_{ii}[i]}, \\
	\tilde I &\la T_{(1/\tdi_{ii})[i]} \tilde I.
	\end{split}
\]
This forces $\tdi_{ii} = 1$.  It also changes the value
of~$\tilde d_{ii}$.  Pictorially after this step,
\bqb
	\tilde D &=& \mf{ccccccccc}{
		\ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
		\cdots&{*}&0&0&0&0&0&0&\cdots\\
		\cdots&0&{*}&0&0&0&0&0&\cdots\\
		\cdots&0&0&{*}&0&0&0&0&\cdots\\
		\cdots&0&0&0&{*}&0&0&0&\cdots\\
		\cdots&0&0&0&0&1&0&0&\cdots\\
		\cdots&0&0&0&0&0&1&0&\cdots\\
		\cdots&0&0&0&0&0&0&1&\cdots\\
		&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
	},\\
	\tilde I &=& \mf{ccccccccc}{
		\ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
		\cdots&1&{*}&{*}&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&1&{*}&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&1&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&1&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&{*}&{*}&{*}&{*}&\cdots\\
		&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
	}.
\eqb
(Though the step changes~$\tilde L$, too, again it leaves~$\tilde L$ in
the major partial unit triangular form $L^{\{i-1\}}$, because $i-1<i$.)
\item \label{gjrank:341:s45}
Observing that~(\ref{gjrank:341:05}) can be expanded to read
\bqb
	A &=&
	\tilde P \tilde D
	\left( \tilde L T_{\tdi_{pi}[pi]} \right)
	\left( T_{-\tdi_{pi}[pi]} \tilde U T_{\tdi_{pi}[pi]} \right)
	\\&&\ \ \mbox{}\times
	\left( T_{-\tdi_{pi}[pi]} \tilde I \right)
	\tilde K \tilde S
	\\&=&
	\tilde P \tilde D
	\left( \tilde L T_{\tdi_{pi}[pi]} \right)
	\tilde U
	\left( T_{-\tdi_{pi}[pi]} \tilde I \right)
	\tilde K \tilde S,
\eqb
clear $\tilde I$'s $i$th column below the pivot by letting
\[
	\begin{split}
	\tilde L &\la
	\left( \tilde L \right)
	\left( \prod_{p=i+1}^{m} T_{\tdi_{pi}[pi]} \right), \\
	\tilde I &\la
	\left( \coprod_{p=i+1}^{m} T_{-\tdi_{pi}[pi]} \right)
	\left( \tilde I \right).
	\end{split}
\]
(The~$\tilde U$ similarity transformation in the expansion disappears
because, at this stage of the algorithm, still $\tilde U = I$.)
This forces $\tdi_{pi}=0$ for all $p>i$.  It also fills in $\tilde L$'s
$i$th column below the pivot, advancing that matrix from the
$L^{\{i-1\}}$ form to the $L^{\{i\}}$ form.  Pictorially,
\bqb
	\tilde L = L^{\{i\}} &=& \mf{ccccccccc}{
		\ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
		\cdots&1&0&0&0&0&0&0&\cdots\\
		\cdots&{*}&1&0&0&0&0&0&\cdots\\
		\cdots&{*}&{*}&1&0&0&0&0&\cdots\\
		\cdots&{*}&{*}&{*}&1&0&0&0&\cdots\\
		\cdots&{*}&{*}&{*}&{*}&1&0&0&\cdots\\
		\cdots&{*}&{*}&{*}&{*}&0&1&0&\cdots\\
		\cdots&{*}&{*}&{*}&{*}&0&0&1&\cdots\\
		&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
	},\\
	\tilde I &=& \mf{ccccccccc}{
		\ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\
		\cdots&1&{*}&{*}&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&1&{*}&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&1&{*}&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&1&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&0&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&0&{*}&{*}&{*}&\cdots\\
		\cdots&0&0&0&0&{*}&{*}&{*}&\cdots\\
		&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots
	}.
\eqb
(Note that it is not necessary actually to apply the addition
elementaries here one by one.  Together they easily form an addition
quasielementary $L_{[i]}$, thus can be applied all at once.  See
\S~\ref{matrix:325.30}.)
\item \label{gjrank:341:s50}
Increment
\[
	i\la i+1.
\]
Go back to step~\ref{gjrank:341:s18}.
\item \label{gjrank:341:s55}
Decrement
\[
	i\la i-1
\]
to undo the last instance of step~\ref{gjrank:341:s50} (even if there
never was an instance of step~\ref{gjrank:341:s50}), thus letting~$i$
point to the matrix's last nonzero row.  After decrementing, let the
rank
\[
	r \equiv i.
\]
Notice that, certainly, $r \le m$ and $r \le n$.
\item \label{gjrank:341:s60} (Besides arriving at this point from step~\ref{gjrank:341:s55} above, the algorithm also re\"enters here from step~\ref{gjrank:341:s70} below.) If $i=0$, then skip directly to step~\ref{gjrank:341:s75}. \item \label{gjrank:341:s65} Observing that~(\ref{gjrank:341:05}) can be expanded to read \[ A = \tilde P \tilde D \tilde L \left( \tilde U T_{\tdi_{pi}[pi]} \right) \left( T_{-\tdi_{pi}[pi]} \tilde I \right) \tilde K \tilde S, \] clear $\tilde I$'s $i$th column above the pivot by letting \[ \begin{split} \tilde U &\la \left( \tilde U \right) \left( \prod_{p=1}^{i-1}T_{\tdi_{pi}[pi]} \right), \\ \tilde I &\la \left( \coprod_{p=1}^{i-1}T_{-\tdi_{pi}[pi]} \right) \left( \tilde I \right). \end{split} \] This forces $\tdi_{ip}=0$ for all $p \neq i$. It also fills in $\tilde U$'s $i$th column above the pivot, advancing that matrix from the $U^{\{i+1\}}$ form to the $U^{\{i\}}$ form. Pictorially, \bqb \tilde U = U^{\{i\}} &=& \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&0&0&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&1&0&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&1&{*}&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&1&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&0&1&{*}&{*}&\cdots\\ \cdots&0&0&0&0&0&1&{*}&\cdots\\ \cdots&0&0&0&0&0&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots },\\ \tilde I &=& \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&{*}&{*}&0&0&0&0&\cdots\\ \cdots&0&1&{*}&0&0&0&0&\cdots\\ \cdots&0&0&1&0&0&0&0&\cdots\\ \cdots&0&0&0&1&0&0&0&\cdots\\ \cdots&0&0&0&0&1&0&0&\cdots\\ \cdots&0&0&0&0&0&1&0&\cdots\\ \cdots&0&0&0&0&0&0&1&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }. \eqb (As in step~\ref{gjrank:341:s45}, here again it is not necessary actually to apply the addition elementaries one by one. Together they easily form an addition quasielementary $U_{[i]}$. See \S~\ref{matrix:325.30}.) \item \label{gjrank:341:s70} Decrement $i\la i-1$. Go back to step~\ref{gjrank:341:s60}. \item \label{gjrank:341:s75} \index{extra column} \index{spare column} \index{column!spare} Notice that~$\tilde I$ now has the form of a rank-$r$ identity matrix, except with $n-r$ extra columns dressing its right edge (often $r=n$ however; then there are no extra columns). Pictorially, \[ \tilde I = \mf{ccccccccc}{ \ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\\ \cdots&1&0&0&0&{*}&{*}&{*}&\cdots\\ \cdots&0&1&0&0&{*}&{*}&{*}&\cdots\\ \cdots&0&0&1&0&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&1&{*}&{*}&{*}&\cdots\\ \cdots&0&0&0&0&0&0&0&\cdots\\ \cdots&0&0&0&0&0&0&0&\cdots\\ \cdots&0&0&0&0&0&0&0&\cdots\\ &\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\ddots }. \] Observing that~(\ref{gjrank:341:05}) can be expanded to read \[ A = \tilde P \tilde D \tilde L \tilde U \left( \tilde I T_{-\tdi_{pq}[pq]} \right) \left( T_{\tdi_{pq}[pq]} \tilde K \right) \tilde S, \] use the now conveniently elementarized columns of $\tilde I$'s main body to suppress the extra columns on its right edge by \[ \begin{split} \tilde I &\la \left( \tilde I \right) \left( \prod_{q=r+1}^n \, \coprod_{p=1}^r T_{-\tdi_{pq}[pq]} \right), \\ \tilde K &\la \left( \coprod_{q=r+1}^n \, \prod_{p=1}^r T_{\tdi_{pq}[pq]} \right) \left( \tilde K \right). \end{split} \] (Actually, entering this step, it was that $\tilde K = I$, so in fact~$\tilde K$ becomes just the product above. As in steps~\ref{gjrank:341:s45} and~\ref{gjrank:341:s65}, here again it is not necessary actually to apply the addition elementaries one by one. 
Together they easily form a parallel unit upper---not lower---triangular
matrix $L_\|^{\{r\}T}$.  See \S~\ref{matrix:330.50}.)
\item \label{gjrank:341:s85}
Notice now that $\tilde I=I_r$.  Let
\[
	P \equiv \tilde P, \
	D \equiv \tilde D, \
	L \equiv \tilde L, \
	U \equiv \tilde U, \
	K \equiv \tilde K, \
	S \equiv \tilde S.
\]
End.
\end{enumerate}
Never stalling, the algorithm cannot fail to achieve $\tilde I=I_r$ and
thus a complete Gauss-Jordan decomposition of the
form~(\ref{gjrank:341:GJ}), though what value the rank~$r$ might turn
out to have is not normally known to us in advance.  (We have not yet
proven, but will in \S~\ref{gjrank:340}, that the algorithm always
produces the same~$I_r$, the same rank $r \ge 0$, regardless of which
pivots $\tdi_{pq}\neq 0$ one happens to choose in
step~\ref{gjrank:341:s20} along the way.  We can safely ignore this
unproven fact however for the immediate moment.)

\subsection{Rank and independent rows}
\label{gjrank:341.11}
\index{rank!and independent rows}
\index{rank!maximum}
Observe that the Gauss-Jordan algorithm of \S~\ref{gjrank:341.10}
operates always within the bounds of the original $m\times n$
matrix~$A$.  Therefore, necessarily,
\bq{gjrank:341:22}
\begin{split}
	r &\le m, \\
	r &\le n.
\end{split}
\eq
The rank~$r$ exceeds the number neither of the matrix's rows nor of its
columns.  This is unsurprising.  Indeed the narrative of the
algorithm's step~\ref{gjrank:341:s55} has already noticed the fact.
Observe also however that \emph{the rank always fully reaches $r=m$ if
the rows of the original matrix~$A$ are linearly independent.}  The
reason for this observation is that the rank can fall short, $r<m$,
only if the algorithm at some stage finds that the elementary
operations have nulled one of the matrix's rows; and the operations can
null a row only if that row is a linear combination of the other
rows---that is, only if the rows are not linearly independent.

% ----------------------------------------------------------------------
\section{Rank}
\label{gjrank:340}
\index{rank}
A matrix~$A$ has rank~$r$ if and only if reversible operators~$B_>$
and~$B_<$ exist such that
\bq{gjrank:340:20}
\begin{split}
	B_>AB_< &= I_r, \\
	A &= B_>^{-1} I_r B_<^{-1}, \\
	B_>^{-1}B_> &= I = B_>B_>^{-1}, \\
	B_<^{-1}B_< &= I = B_<B_<^{-1}.
\end{split}
\eq
Suppose now that a second pair of reversible operators~$G_>$ and~$G_<$
also reduced the same matrix~$A$ to an identity matrix,
$G_>AG_< = I_s$, the rank~$s$ of which might or might not equal
the~$r$ of~(\ref{gjrank:340:20}).  Then, according
to~(\ref{gjrank:340:20}) and to the like equations for the second pair,
\[
\begin{split}
	A &= B_>^{-1} I_r B_<^{-1}, \\
	A &= G_>^{-1} I_s G_<^{-1}.
\end{split}
\]
Combining these equations,
\[
	B_>^{-1} I_r B_<^{-1} = G_>^{-1} I_s G_<^{-1}.
\]
Solving first for~$I_r$, then for~$I_s$,
\[
\begin{split}
	(B_>G_>^{-1}) I_s (G_<^{-1}B_<) &= I_r, \\
	(G_>B_>^{-1}) I_r (B_<^{-1}G_<) &= I_s.
\end{split}
\]
Were it that $r \neq s$, then one of these two equations would
constitute the demotion of an identity matrix and the other, a
promotion.  But according to \S~\ref{gjrank:340.10} and
its~(\ref{gjrank:340:10}), promotion is impossible.  Therefore
$r \neq s$ is also impossible, and
\[
	r = s
\]
is guaranteed.  No matrix has two different ranks.  \emph{Matrix rank
is unique.}

This finding has two immediate implications:
\bi
\item
	Reversible row and/or column operations exist to change any
	matrix of rank~$r$ to \emph{any other matrix} of the same rank.
	The reason is that, according to~(\ref{gjrank:340:20}),
	reversible operations exist to change both matrices to~$I_r$ and
	back.
\item
	No reversible operation can change a matrix's rank.
\ei
The discovery that every matrix has a single, unambiguous rank and the
establishment of a failproof algorithm---the Gauss-Jordan---to
ascertain that rank have not been easy to achieve, but they are
important achievements nonetheless, worth the effort thereto.  The
reason these achievements matter is that the mere dimensionality of a
matrix is a chimerical measure of the matrix's true size---as for
instance for the $3 \times 3$ example matrix at the head of the
section.  Matrix rank by contrast is an entirely solid, dependable
measure.  We will rely on it often.

Section~\ref{gjrank:340.30} comments further.
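Before moving on, a concrete instance of these ideas may help---an
illustrative example only, easily checked by hand.  Consider the
$3 \times 3$ matrix
\[
	\left[ \br{rrr} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 1 & 1 \er \right],
\]
whose second row is exactly twice its first.  Since the rows are not
linearly independent, the Gauss-Jordan algorithm of
\S~\ref{gjrank:341.10} sooner or later nulls the dependent row and
finds $r=2$ rather than $r=3$; and, the rank being unique, no other
choice of pivots---indeed, no sequence of reversible operations
whatsoever---can recover a third independent row or column.  The
matrix is $3 \times 3$ in dimensionality but only rank-2 in substance.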
\subsection{The full-rank matrix}
\label{gjrank:340.25}
\index{rank!full}
\index{full rank}
\index{matrix!full-rank}
\index{degenerate matrix}
\index{matrix!degenerate}
According to~(\ref{gjrank:341:22}), the rank~$r$ of a matrix can exceed
the number neither of the matrix's rows nor of its columns.  The
greatest rank possible for an $m\times n$ matrix is the lesser of~$m$
and~$n$.  A \emph{full-rank} matrix, then, is defined to be an
$m\times n$ matrix with maximum rank $r=m$ or $r=n$---or, if $m=n$,
both.  A matrix of less than full rank is a \emph{degenerate} matrix.

\index{linear combination}
\index{linear dependence}
Consider a tall $m\times n$ matrix~$C$, $m\ge n$, one of whose~$n$
columns is a linear combination (\S~\ref{gjrank:335}) of the others.
One could by definition target the dependent column with addition
elementaries, using multiples of the other columns to wipe the
dependent column out.  Having zeroed the dependent column, one could
then interchange it over to the matrix's extreme right, effectively
throwing the column away, shrinking the matrix to $m\times(n-1)$
dimensionality.  Shrinking the matrix necessarily also shrinks the
bound on the matrix's rank to $r\le n-1$---which is to say, to $r<n$.
The shrink, being wrought by reversible operations, leaves the rank
itself unchanged, so the original matrix~$C$, one of whose columns
depends on the others, likewise has rank $r<n$: it lacks full rank.
Consider by contrast a tall $m\times n$ matrix~$A$ whose~$n$ columns
are linearly independent.  The transpose~$A^T$ of such a matrix is a
broad matrix with~$n$ linearly independent rows, whose rank by
\S~\ref{gjrank:341.11} fully reaches $r=n$; so,
per~(\ref{gjrank:340:20}), there exist reversible
operators~$B_<^{T}$ and~$B_>^{T}$ such that
$I_n = B_<^{T}A^{T}B_>^{T}$, the transpose of which equation is
$B_>AB_< = I_n$---which in turn says that not only~$A^T$, but also~$A$
itself, has full rank $r=n$.  Parallel reasoning rules the rows of
broad matrices, $m\le n$, of course.  To square matrices, $m=n$, both
lines of reasoning apply.

\index{matrix!tall}
\index{matrix!broad}
\index{matrix!square}
Gathering findings, we have that
\bi
\item
	a tall $m\times n$ matrix, $m\ge n$, has full rank if and only if
	its columns are linearly independent;
\item
	a broad $m\times n$ matrix, $m\le n$, has full rank if and only
	if its rows are linearly independent;
\item
	a square $n\times n$ matrix, $m = n$, has full rank if and only
	if its columns and/or its rows are linearly independent; and
\item
	a square matrix has both independent columns and independent
	rows, or neither; never just one or the other.
\ei
\index{full column rank}
\index{column rank!full}
\index{rank!column}
\index{full row rank}
\index{row rank!full}
\index{rank!row}
To say that a matrix has \emph{full column rank} is to say that it is
tall or square and has full rank $r=n \le m$.  To say that a matrix has
\emph{full row rank} is to say that it is broad or square and has full
rank $r=m \le n$.  Only a square matrix can have full column rank and
full row rank at the same time, because a tall or broad matrix cannot
but include, respectively, more columns or more rows than~$I_r$.

\subsection[Under- and overdetermined systems (introduction)]
{Underdetermined and overdetermined linear systems (introduction)}
\label{gjrank:340.24}
\index{linear system!classification of}
\index{linear system!taxonomy of}
\index{taxonomy!of the linear system}
\index{linear system!underdetermined}
\index{linear system!overdetermined}
\index{linear system!degenerate}
\index{underdetermined linear system}
\index{overdetermined linear system}
\index{degenerate linear system}
The last paragraph of \S~\ref{gjrank:340.25}
% Thesaurus for provoke: awaken, call forth, call up, cultivate,
% enkindle, fire up, foment, inspire, instigate, motivate, move,
% promote, prompt, provoke, stir up.
provokes yet further terminology.  A linear system $A\ve x = \ve b$ is
\emph{underdetermined} if~$A$ lacks full column rank---that is, if
$r<n$---because, some of $A$'s columns then depending linearly on the
others, the system maps more than one distinct $n$-element
vector~$\ve x$ to the same $m$-element vector~$\ve b$, so knowledge
of~$\ve b$ does not suffice to determine~$\ve x$ fully.
Complementarily, a linear system $A\ve x = \ve b$ is
\emph{overdetermined} if~$A$ lacks full row rank---that is, if $r<m$.
\emph{An overdetermined linear system $A\ve x = \ve b$ cannot have a
solution for every possible $m$-element driving vector~$\ve b$.}  One
sees this by decomposing the system's matrix~$A$ by Gauss-Jordan,
$A = G_>I_rG_<$, and then left-multiplying the system by~$G_>^{-1}$ to
reach the form
\[
	I_rG_<\ve x = G_>^{-1}\ve b.
\]
If the $m$-element vector $\ve c \equiv G_>^{-1}\ve b$, then
$I_rG_<\ve x = \ve c,$ which is impossible unless the last $m-r$
elements of~$\ve c$ happen to be zero.  But since~$G_>$ is invertible,
each~$\ve b$ corresponds to a unique~$\ve c$ and vice versa; so,
if~$\ve b$ is an unrestricted $m$-element vector then so also
is~$\ve c$, which verifies the claim.

Complementarily, \emph{a nonoverdetermined linear system
$A\ve x = \ve b$ does have a solution for every possible $m$-element
driving vector~$\ve b$.}  This is so because in this case the last
$m-r$ elements of~$\ve c$ do happen to be zero; or, better stated,
because~$\ve c$ in this case has no nonzeros among its last $m-r$
elements, because it \emph{has} no last $m-r$ elements, for the trivial
reason that $r=m$.

It is an analytical error, and an easy one innocently to commit, to
require that
\[
	A\ve x = \ve b
\]
for unrestricted~$\ve b$ when~$A$ lacks full row rank.  The error is
easy to commit because the equation looks right, because such an
equation is indeed valid over a broad domain of~$\ve b$ and might very
well have been written correctly in that context, only not in the
context of unrestricted~$\ve b$.  Analysis including such an error can
lead to subtly absurd conclusions.  It is never such an analytical
error however to require that
\[
	A\ve x = 0
\]
because, whatever other solutions such a system might have, it has at
least the solution $\ve x = 0$.

\subsection{The full-rank factorization}
\label{gjrank:340.26}
\index{full-rank factorization}
\index{factorization!full-rank}
One sometimes finds dimension-limited matrices of less than full rank
inconvenient to handle.  However, every dimension-limited, $m \times n$
matrix of rank~$r$ can be expressed as the product of two full-rank
matrices, one $m \times r$ and the other $r \times n$, both also of
rank~$r$:
\bq{gjrank:340:26}
	A = BC.
\eq
The truncated Gauss-Jordan~(\ref{gjrank:341:GJtC}) constitutes one such
\emph{full-rank factorization:} $B=I_mG_>I_r$, $C=I_rG_<I_n$.

A related observation concerns the Gauss-Jordan factors~$K$ and~$S$ of
a matrix of full column rank, $r=n$: for such a matrix, the algorithm
of \S~\ref{gjrank:341.10} always finds $K=I$ and can always be made to
find $S=I$.  That $K=I$ is seen as follows.  Only
step~\ref{gjrank:341:s75} of the algorithm makes~$\tilde K$ anything
other than~$I$, and that step nulls only the spare columns $q>r$ that
dress~$\tilde I$'s right, but in this case~$\tilde I$ has only~$r$
columns and therefore has no spare columns to null.  Hence
step~\ref{gjrank:341:s75} does nothing and $K=I$.

\index{interchange!refusing an}
That $S=I$ comes immediately of choosing $q=i$ for pivot column during
each iterative instance of the algorithm's step~\ref{gjrank:341:s20}.
But, one must ask, can one choose so?  What if column $q=i$ were
unusable?  That is, what if the only nonzero elements remaining
in~$\tilde I$'s $i$th column stood above the main diagonal, unavailable
for step~\ref{gjrank:341:s25} to bring to pivot?  Well, \emph{were} it
so, then one would indeed have to choose $q \neq i$ to swap the
unusable column away rightward, but see: nothing in the algorithm later
fills such a column's zeros with anything else---they remain zeros---so
swapping the column away rightward could only delay the crisis.  The
column would remain unusable.
Eventually the column would reappear on pivot when no usable column
rightward remained available to swap it with, which contrary to our
assumption would mean precisely that $r<n$.  But the assumption was
that the matrix had full column rank, $r=n$; so the crisis never comes,
the choice $q=i$ is always available, and indeed $S=I$.

% ----------------------------------------------------------------------
\chapter{The Fourier series}
\label{fours}
\index{Fourier series}

\begin{figure}
\caption{A square wave of amplitude~$A$ and period~$T_1$.}
\label{fours:000:fig10}
\end{figure}

Consider the square wave of Fig.~\ref{fours:000:fig10}, a waveform
which repeats itself after each period~$T_1$.  A Fourier series expands
such a repeating waveform as a superposition of complex exponentials
or, alternately, if the waveform is real, of sinusoids.

\index{waveform!approximation of}
\index{primary frequency}
\index{frequency!primary}
Suppose that you wanted to approximate the square wave of
Fig.~\ref{fours:000:fig10} by a single sinusoid.  You might try the
sinusoid at the top of Fig.~\ref{fours:000:fig20}---which is not very
convincing, maybe, but if you added to the sinusoid another, suitably
scaled sinusoid of thrice the frequency then you would obtain the
somewhat better fitting curve in the figure's middle.  The curve at the
figure's bottom would yet result after you had added in four more
sinusoids respectively of five, seven, nine and eleven times the
primary frequency.  Algebraically,%
\bqa
	f(t) &=& \frac{8A}{2\pi}\bigg[
	\cos\frac{(2\pi) t}{T_1} - \frac 1 3 \cos\frac{3(2\pi) t}{T_1}
	\xn\\&&\ \ \ \ \ \ \ \ \ \ \mbox{}
	+ \frac 1 5 \cos\frac{5(2\pi) t}{T_1}
	- \frac 1 7 \cos\frac{7(2\pi) t}{T_1} + \cdots \bigg].
\label{fours:000:20} \eqa% \begin{figure} \caption[Superpositions of sinusoids.]{Superpositions of one, two and six sinusoids to approximate the square wave of Fig.~\ref{fours:000:fig10}.} \label{fours:000:fig20} \bc \nc\xxxab{4.3} \nc\xxyab{1.2} \setlength\tla{3.0cm} \setlength\tlb{0.5cm} \nc\xxl{0.15} \nc\xxm{0.25} \nc\xxo{0.35} \nc\xxp{0.20} \nc\fyc{3.7} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-5.3} \nc\fyb{5.8} \nc\xxaxes{% {% \psset{linewidth=0.5pt}% \psline(-\xxxab,0)(\xxxab,0)% \psline(0,-\xxyab)(0,\xxyab)% \uput[r](\xxxab,0){$t$}% \uput[u](0,\xxyab){$f(t)$}% }% } \nc\xxsquarewave{% {% \psset{linewidth=1.0pt,linestyle=dashed}% \psline% (-1.35\tla,-\tlb)% (-1.25\tla,-\tlb)(-1.25\tla,\tlb)(-0.75\tla,\tlb)(-0.75\tla,-\tlb)% (-0.25\tla,-\tlb)(-0.25\tla,\tlb)( 0.25\tla,\tlb)( 0.25\tla,-\tlb)% ( 0.75\tla,-\tlb)( 0.75\tla,\tlb)( 1.25\tla,\tlb)( 1.25\tla,-\tlb)% ( 1.35\tla,-\tlb)% }% } \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} \rput(0,\fyc){ \xxaxes \xxsquarewave \psplot[linewidth=2.0pt,plotpoints=500]{-4.05}{4.05}{ x 3.0 div 360.0 mul cos 0.5 mul 1.2732 mul } }% \rput(0,0){ \xxaxes \xxsquarewave \psplot[linewidth=2.0pt,plotpoints=500]{-4.05}{4.05}{ x 3.0 div 1.0 mul 360.0 mul cos 0.5 mul 1.0 div 1.2732 mul x 3.0 div 3.0 mul 360.0 mul cos 0.5 mul -3.0 div 1.2732 mul add } }% \rput(0,-\fyc){ \xxaxes \psplot[linewidth=2.0pt,plotpoints=500]{-4.05}{4.05}{ x 3.0 div 1.0 mul 360.0 mul cos 0.5 mul 1.0 div 1.2732 mul x 3.0 div 3.0 mul 360.0 mul cos 0.5 mul -3.0 div 1.2732 mul x 3.0 div 5.0 mul 360.0 mul cos 0.5 mul 5.0 div 1.2732 mul x 3.0 div 7.0 mul 360.0 mul cos 0.5 mul -7.0 div 1.2732 mul x 3.0 div 9.0 mul 360.0 mul cos 0.5 mul 9.0 div 1.2732 mul x 3.0 div 11.0 mul 360.0 mul cos 0.5 mul -11.0 div 1.2732 mul add add add add add } } } \end{pspicture} \ec \end{figure}% How faithfully~(\ref{fours:000:20}) really represents the repeating waveform and why its coefficients happen to be $1, -\frac 1 3, \frac 1 5, -\frac 1 7, \ldots$ are among the questions this chapter will try to answer; but, visually at least, it looks as though superimposing sinusoids worked. The chapter begins in preliminaries, starting with a discussion of Parseval's principle. % ---------------------------------------------------------------------- \section{Parseval's principle} \label{fours:080} \index{Parseval's principle} \index{Parseval, Marc-Antoine (1755--1836)} \index{step in every direction} \emph{Parseval's principle} is that \emph{a step in every direction is no step at all.} In the Argand plane (Fig.~\ref{alggeo:225:fig}), stipulated that \bq{fours:080:08} \begin{split} \Delta\omega\, T_1 &= 2\pi, \\ \Im(\Delta\omega) &= 0, \\ \Im(t_o) &= 0, \\ \Im(T_1) &= 0, \\ T_1 &\neq 0, \end{split} \eq and also that% \footnote{ That $2 \le N$ is a redundant requirement, since (\ref{fours:080:09})'s other lines imply it, but it doesn't hurt to state it anyway. } \bq{fours:080:09} \begin{split} j,n,N &\in \mathbb Z, \\ n &\neq 0, \\ \left| n \right| &< N, \\ 2 &\le N, \end{split} \eq the principle is expressed algebraically as that% \footnote{ An expression like $t_o \pm T_1/2$ means $t_o \pm (T_1/2)$, here and elsewhere in the book. } \bq{fours:080:10} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{in \,\Delta\omega\, \tau} \,d\tau = 0 \eq or alternately in discrete form as that \bq{fours:080:15} \sum_{j=0}^{N-1} e^{i 2\pi nj/N} = 0. 
\eq \index{parameter} \index{dimension} \index{physical unit} \index{unit!physical} Because the product $\Delta\omega\, T_1 = 2\pi$ relates~$\Delta\omega$ to~$T_1$, the symbols~$\Delta\omega$ and~$T_1$ together represent in~(\ref{fours:080:08}) and~(\ref{fours:080:10}) not two but only one independent parameter. If~$T_1$ bears physical units then these typically will be units of time (seconds, for instance), whereupon~$\Delta\omega$ will bear the corresponding units of angular frequency (such as radians per second). The frame offset~$t_o$ and the dummy variable~$\tau$ naturally must have the same dimensions% \footnote{ The term \emph{dimension} in this context refers to the kind of physical unit. A quantity like~$T_1$ for example, measurable in seconds or years (but not, say, in kilograms or dollars), has dimensions of time. An automobile's speed having dimensions of length divided by time can be expressed in miles per hour as well as in meters per second but not directly, say, in volts per centimeter; and so on. }% ~$T_1$ has and normally will bear the same units. This matter is discussed further in \S~\ref{fours:085}. \index{physical insight} \index{insight} \index{symmetry!appeal to} To prove~(\ref{fours:080:10}) symbolically is easy: one merely carries out the indicated integration. To prove~(\ref{fours:080:15}) symbolically is not much harder: one replaces the complex exponential $e^{i 2\pi nj/N}$ by $\lim_{\ep \ra 0^{+}} e^{(i - \ep)2\pi nj/N}$ and then uses~(\ref{alggeo:228:45}) to evaluate the summation. Notwithstanding, we can do better, for an alternate, more edifying, physically more insightful explanation of the two equations is possible as follows. Because~$n$ is a nonzero integer,~(\ref{fours:080:10}) and~(\ref{fours:080:15}) represent sums of steps in every direction---that is, steps in every phase---in the Argand plane (more precisely, eqn.~\ref{fours:080:15} represents a sum over a discrete but balanced, uniformly spaced selection of phases). An appeal to symmetry forbids such sums from favoring any one phase $n \,\Delta\omega\, \tau$ or $2\pi nj/N$ over any other. This being the case, how could the sums of~(\ref{fours:080:10}) and~(\ref{fours:080:15}) ever come to any totals other than zero? The plain answer is that they can come to no other totals. A step in every direction is indeed no step at all. This is why~(\ref{fours:080:10}) and~(\ref{fours:080:15}) are so.% \footnote{ The writer unfortunately knows of no conventionally established name for Parseval's principle. The name \emph{Parseval's principle} seems as apt as any and this is the name this book will use. A pedagogical knot seems to tangle Marc-Antoine Parseval's various namesakes. Because Parseval's principle can be extracted as a special case from Parseval's theorem (eqn.~\ref{fouri:110:parseval} in the next chapter), the literature sometimes indiscriminately applies the name ``Parseval's theorem'' to both. This is fine as far as it goes, but the knot arrives when one needs Parseval's principle to derive the Fourier series, which one needs to derive the Fourier transform, which one needs in turn to derive Parseval's theorem, at least as this book develops them. The way to untie the knot is to give Parseval's principle its own name and to let it stand as an independent result. } We have actually already met Parseval's principle, informally, in \S~\ref{inttx:260.20}. One can translate Parseval's principle from the Argand realm to the analogous realm of geometrical vectors, if needed, in the obvious way. 
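By way of a quick check before leaving the section, one can carry out
the integration the symbolic proof of~(\ref{fours:080:10}) calls for.
Since $n \neq 0$ and $\Delta\omega\,T_1 = 2\pi$,
\[
	\int_{t_o-T_1/2}^{t_o+T_1/2} e^{in \,\Delta\omega\, \tau} \,d\tau
	= \left. \frac{e^{in \,\Delta\omega\, \tau}}{in \,\Delta\omega}
	\right|_{t_o-T_1/2}^{t_o+T_1/2}
	= \frac{e^{in \,\Delta\omega\, t_o}}{in \,\Delta\omega}
	\left( e^{i\pi n} - e^{-i\pi n} \right) = 0,
\]
the parenthesized factor $e^{i\pi n} - e^{-i\pi n} = (-1)^n - (-1)^n$
vanishing for every nonzero integer~$n$.  Likewise in the discrete
form~(\ref{fours:080:15}), taking for example $N=4$ and $n=1$, the sum
runs $1 + i - 1 - i = 0$: a step in each of four directions, no step at
all.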
%Parseval's principle is a specialization of a more general mathematical %result named ``Parseval's theorem,'' the latter of which refers to %a formalized abstraction of the mathematics% %\footnote{\cite[eqns.~2-40 and~2-41]{Couch}} %that descends from the engineering observation that, given lengthy %enough a chance and hefty enough a temporary storage element (like a %counterweight, flywheel, buffer, holding tank, pressure vessel, resonant %element, tuned stub or electrical capacitor), one can insert and/or %extract mechanical or electrical power to and/or from a wave, and that %one can do so a single, discrete frequency at a time; whereas one cannot %separate the power that accords to sources operating at the same %frequency unless the sources are twain and happen to operate in %quadrature. The trouble in developing the full Parseval's theorem from %the first is that, if one prefers not to develop it in a conceptual %vacuum before concepts like ``power'' or ``quadrature'' have been %properly introduced, one will still want the theorem's main idea, %Parseval's principle, before treating power and quadrature. So, which %comes first? Power and quadrature, or Parseval's theorem? Our answer %is to introduce Parseval's principle first and then to let the full %theorem, which is not central and will be almost obvious later on in any %case, take care of itself. % ---------------------------------------------------------------------- \section{Time, space and frequency} \label{fours:085} \index{time} \index{space} \index{frequency} \index{frequency!cyclic} \index{frequency!angular} \index{cyclic frequency} \index{angular frequency} \index{dimension} \index{engine} \index{internal-combustion engine} \index{RPM} \index{hertz} A \emph{frequency} is the inverse of an associated period of time, expressing the useful concept of the rate at which a cycle repeats. For example, an internal-combustion engine whose crankshaft revolves once every~20 milliseconds---which is to say, once every $1/3000$ of a minute---runs thereby at a frequency of~3000 revolutions per minute (RPM). Frequency however comes in two styles: cyclic frequency (as in the engine's example), conventionally represented by letters like~$\nu$ and~$f$; and angular frequency, by letters like~$\omega$ and~$k$. If~$T$, $\nu$ and~$\omega$ are letters taken to stand respectively for a period of time, the associated cyclic frequency and the associated angular frequency, then by definition \bq{fours:085:10} \begin{split} \nu T &= 1, \\ \omega T &= 2\pi, \\ \omega &= 2\pi \nu. \end{split} \eq The period~$T$ will have dimensions of time like seconds. The cyclic frequency~$\nu$ will have dimensions of inverse time like hertz (cycles per second).% \footnote{ Notice incidentally, contrary to the improper verbal usage one sometimes hears, that there is no such thing as a ``hert.'' Rather, ``Hertz'' is somebody's name. The uncapitalized form ``hertz'' thus is singular as well as plural. } The angular frequency~$\omega$ will have dimensions of inverse time like radians per second. \index{countability} \index{baseball} \index{cycle} \index{radian} \index{second} The applied mathematician should make himself aware, and thereafter keep in mind, that the cycle per second and the radian per second do not differ dimensionally from one another. 
Both are technically units of $[\mbox{second}]^{-1}$, whereas the words ``cycle'' and ``radian'' in the contexts of the phrases ``cycle per second'' and ``radian per second'' are verbal cues that, in and of themselves, play no actual part in the mathematics. This is not because the cycle and the radian were ephemeral but rather because the second is unfundamental. The second is an arbitrary unit of measure. The cycle and the radian are definite, discrete, inherently countable things; and, where things are counted, it is ultimately up to the mathematician to interpret the count (consider for instance that nine baseball hats may imply nine baseball players and one baseball team, but that there is nothing in the number nine itself to tell us so). To distinguish angular frequencies from cyclic frequencies, it remains to the mathematician to lend factors of~$2\pi$ where needed. The word ``frequency'' without a qualifying adjective is usually taken to mean cyclic frequency unless the surrounding context implies otherwise. \index{frequency!spatial} \index{spatial frequency} Frequencies exist in space as well as in time: \bq{fours:085:15} k\lambda = 2\pi. \eq Here,~$\lambda$ is a \emph{wavelength} measured in meters or other units of length. The \emph{wavenumber}% \footnote{ The wavenumber~$k$ is no integer, notwithstanding that the letter~$k$ tends to represent integers in other contexts. }% ~$k$ is an angular spatial frequency measured in units like radians per meter. (Oddly, no conventional symbol for cyclic spatial frequency seems to be current. The literature just uses $k/2\pi$ which, in light of the potential for confusion between~$\nu$ and~$\omega$ in the temporal domain, is probably for the best.) \index{propagation speed} \index{speed!of propagation} Where a wave propagates the propagation speed \bq{fours:085:20} v = \frac{\lambda}{T} = \frac{\omega}{k} \eq relates periods and frequencies in space and time. \index{dimensionlessness} Now, we must admit that we fibbed when we said that~$T$ had to have dimensions of time. Physically, that is the usual interpretation, but mathematically~$T$ (and~$T_1$, $t$, $t_o$, $\tau$, etc.) can bear any units and indeed are not required to bear units at all, as \S~\ref{fours:080} has observed. The only mathematical requirement is that the product $\omega T = 2\pi$ (or $\Delta\omega\, T_1 = 2\pi$ or the like, as appropriate) be dimensionless. However, when~$T$ has dimensions of length rather than time it is conventional---indeed, it is practically mandatory if one wishes to be understood---to change $\lambda \la T$ and $k \la \omega$ as this section has done, though the essential Fourier mathematics is the same regardless of $T$'s dimensions (if any) or of whether alternate symbols like~$\lambda$ and~$k$ are used. % ---------------------------------------------------------------------- \section{The square, triangular and Gaussian pulses} \label{fours:095} \index{pulse} \index{pulse!square} \index{square pulse} \index{pulse!triangular} \index{triangular pulse} \index{Dirac delta function!implementation of} \index{delta function, Dirac!implementation of} \index{pulse!Gaussian} \index{Gaussian pulse} \index{$\Pi$ as the rectangular pulse} \index{$\Lambda$ as the triangular pulse} \index{$\Omega$ as the Gaussian pulse} The Dirac delta of \S~\ref{integ:670} and of Fig.~\ref{integ:670:fig-d} is useful for the unit area it covers among other reasons, but for some purposes its curve is too sharp. 
One occasionally finds it expedient to substitute either the \emph{square} or the \emph{triangular pulse} of Fig.~\ref{fours:095:fig1}, \bq{fours:095:10} \settowidth\tlb{$1 - \left| t \right|$} \begin{split} \Pi (t) &\equiv \begin{cases} \makebox[\tlb][l]{$1$} &\mbox{if $\left|t\right| \le 1/2$,} \\ 0 &\mbox{otherwise;} \end{cases} \\ \Lambda(t) &\equiv \begin{cases} \makebox[\tlb][l]{$1 - \left| t \right|$} &\mbox{if $\left|t\right| \le 1$,} \\ 0 &\mbox{otherwise;} \end{cases} \end{split} \eq for the Dirac delta, \begin{figure} \caption{The square, triangular and Gaussian pulses.} \label{fours:095:fig1} \bc \nc\xxxab{4.3} \nc\xxyab{1.8} \nc\xxyac{0.4} \setlength\tla{3.0cm} \setlength\tlb{1.2cm} \nc\xxl{0.15} \nc\xxo{0.80} \nc\xxp{0.25} \nc\xxq{1.90} \nc\xxqqq{5.70} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-5.6} \nc\fyb{4.5} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} \settowidth\tlj{$\Pi$} \rput(0, \xxq){% { \psset{linewidth=0.5pt} \settowidth\tlc{$\frac{1}{2}$} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyac)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$\makebox[\tlj][r]{$\Pi$}(t)$} \uput[ur](0,\tlb){$1$} \psline( 0.5\tlb, \xxl)( 0.5\tlb,-\xxl) \uput[d]( 0.5\tlb,-\xxl){\makebox[\tlc][r]{$\frac{1}{2}$}} \psline(-0.5\tlb, \xxl)(-0.5\tlb,-\xxl) \uput[d](-0.5\tlb,-\xxl){\makebox[\tlc][r]{$-\frac{1}{2}$}} }% { \psset{linewidth=2.0pt} \psline% (-1.35\tla,0)% (-0.50\tlb,0)(-0.50\tlb,\tlb)( 0.50\tlb,\tlb)( 0.50\tlb,0)% ( 1.35\tla,0) }% }% \rput(0,-\xxq){% { \psset{linewidth=0.5pt} \settowidth\tlc{$1$} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyac)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$\makebox[\tlj][r]{$\Lambda$}(t)$} \psline(-\xxl,\tlb)( \xxl,\tlb) \uput[ur](0,\tlb){$1$} \psline( 1.0\tlb, \xxl)( 1.0\tlb,-\xxl) \uput[d]( 1.0\tlb,-\xxl){\makebox[\tlc][r]{$1$}} \psline(-1.0\tlb, \xxl)(-1.0\tlb,-\xxl) \uput[d](-1.0\tlb,-\xxl){\makebox[\tlc][r]{$-1$}} }% { \psset{linewidth=2.0pt} \psline% (-1.35\tla,0)% (-1.00\tlb,0)(0,\tlb)( 1.00\tlb,0)% ( 1.35\tla,0) }% }% \rput(0,-\xxqqq){% { \psset{linewidth=0.5pt} \settowidth\tlc{$1$} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyac)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$\makebox[\tlj][r]{$\Omega$}(t)$} \uput[ur](0,0.39894\tlb){$\frac{1}{\sqrt{2\pi}}$} }% { \psset{linewidth=2.0pt} \psplot[plotpoints=300]{-4.05}{4.05}{ /twopi 6.28318530717959 def /e 2.71828182845905 def /scale 1.2 def e x scale div dup mul 2.0 div neg exp twopi sqrt div scale mul } }% }% } \end{pspicture} \ec \end{figure} both of which pulses evidently share Dirac's property that \bq{fours:095:20} \settowidth\tla{$\Pi$} \begin{split} \int_{-\infty}^\infty \frac{1}{T} \makebox[\tla][c]{$\delta $}\left(\frac{\tau - t_o}{T}\right) \,d\tau &= 1, \\ \int_{-\infty}^\infty \frac{1}{T} \makebox[\tla][c]{$\Pi $}\left(\frac{\tau - t_o}{T}\right) \,d\tau &= 1, \\ \int_{-\infty}^\infty \frac{1}{T} \makebox[\tla][c]{$\Lambda$}\left(\frac{\tau - t_o}{T}\right) \,d\tau &= 1, \end{split} \eq for any real $T>0$ and real~$t_o$. In the limit, %it may be observed that \bq{fours:095:30} \settowidth\tla{$\Pi$} \begin{split} \lim_{T \ra 0^{+}} \frac{1}{T} \makebox[\tla][c]{$\Pi $}\left(\frac{t - t_o}{T}\right) &= \delta(t - t_o), \\ \lim_{T \ra 0^{+}} \frac{1}{T} \makebox[\tla][c]{$\Lambda$}\left(\frac{t - t_o}{T}\right) &= \delta(t - t_o), \end{split} \eq constituting at least two possible implementations of the Dirac delta in case such an implementation were needed. 
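% A rough Octave sketch (the grid, the widths T and the offset t_o are
% arbitrary choices) checking numerically that (1/T) Pi((tau-t_o)/T)
% and (1/T) Lambda((tau-t_o)/T) each enclose unit area, as
% eqn. fours:095:20 claims.  Each printed area should be close to 1.
% Pi_  = @(t) double(abs(t) <= 1/2);
% Lam_ = @(t) (abs(t) <= 1) .* (1 - abs(t));
% tau = linspace(-20, 20, 400001);
% t_o = 0.7;
% for T = [0.1 1 3]
%   aP = trapz(tau, Pi_((tau - t_o)/T) / T);
%   aL = trapz(tau, Lam_((tau - t_o)/T) / T);
%   printf("T = %.1f: Pi area = %.3f, Lambda area = %.3f\n", T, aP, aL);
% end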
Looking ahead, if we may further abuse the Greek capitals to let them represent pulses whose shapes they accidentally resemble, then a third, subtler implementation---more complicated to handle but analytic (\S~\ref{taylor:320}) and therefore preferable for some purposes---is the \emph{Gaussian pulse} \bqa \lim_{T \ra 0^{+}} \frac{1}{T} \Omega\left(\frac{t - t_o}{T}\right) &=& \delta(t - t_o), \label{fours:095:33} \\ \Omega(t) &\equiv& \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right),\xn \eqa the mathematics of which \S~\ref{fouri:130} and % diagn: check the following reference. Chs.~\ref{specf} and~\ref{prob} will unfold. %but see Fig.~\ref{prob:normdist-fig} on page~\pageref{prob:normdist-fig} %for a plot. % ---------------------------------------------------------------------- \section[Expanding waveforms in Fourier series]% {Expanding repeating waveforms in Fourier series} \label{fours:100} \index{Fourier series} \index{series!Fourier} \index{sinusoid} \index{complex exponential} \index{exponential!complex} \index{Fourier coefficient} \index{coefficient!Fourier} The Fourier series represents a repeating waveform~(\ref{fours:000:11}) %like the waveform of Fig.~\ref{fours:000:fig10} as a superposition of sinusoids. More precisely, inasmuch as Euler's formula~(\ref{cexp:250:cos}) makes a sinusoid a sum of two complex exponentials, the Fourier series supposes that a repeating waveform were a superposition \bq{fours:100:10} f(t) = \sum_{j=-\infty}^{\infty} a_j e^{ij \,\Delta\omega\, t} \eq of many complex exponentials, in which~(\ref{fours:080:08}) is obeyed yet in which neither the several Fourier coefficients~$a_j$ nor the waveform $f(t)$ itself need be real. Whether one can properly represent every repeating waveform as a superposition~(\ref{fours:100:10}) of complex exponentials is a question \S\S~\ref{fours:100.80} and~\ref{fours:170} will address later; but, at least to the extent to which one can properly represent such a waveform, we will now assert that one can recover any or all of the waveform's Fourier coefficients~$a_j$ by choosing an arbitrary frame offset~$t_o$ ($t_o = 0$ is a typical choice) and then integrating \bq{fours:100:15} a_j = \frac{1}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau. \eq \subsection{Derivation of the Fourier-coefficient formula} \label{fours:100.10} \index{integration!over a complete cycle} \index{cycle!integration over a complete} \index{frequency!shifting of} \index{Fourier coefficient!recovery of} But why should~(\ref{fours:100:15}) work? How is it to recover a Fourier coefficient~$a_j$? The answer is that it recovers a Fourier coefficient~$a_j$ by isolating it, and that it isolates it by shifting frequencies and integrating. Equation~(\ref{fours:100:10}) has proposed to express a repeating waveform as a series of complex exponentials, each exponential of the form $a_j e^{ij \,\Delta\omega\, t}$ in which~$a_j$ is a weight to be determined. Unfortunately,~(\ref{fours:100:10}) can hardly be very useful until the several~$a_j$ actually are determined, whereas how to determine~$a_j$ from~(\ref{fours:100:10}) for a given value of~$j$ is not immediately obvious. The trouble with using~(\ref{fours:100:10}) to determine the several coefficients~$a_j$ is that it includes all the terms of the series and, hence, all the coefficients~$a_j$ at once. 
To determine~$a_j$ for a given value of~$j$, one should like to suppress the entire series except the single element $a_j e^{ij \,\Delta\omega\, t}$, isolating this one element for analysis. Fortunately, Parseval's principle~(\ref{fours:080:10}) gives us a way to do this, as we shall soon see. Now, to prove~(\ref{fours:100:15}) we mean to use~(\ref{fours:100:15}), a seemingly questionable act. Nothing prevents us however from taking only the right side of~(\ref{fours:100:15})---not as an equation but as a mere expression---and doing some algebra with it to see where the algebra leads, for if the algebra should lead to the \emph{left} side of~(\ref{fours:100:15}) then we should have proven the equation. Accordingly, changing dummy variables $\tau \la t$ and $\ell \la j$ in~(\ref{fours:100:10}) and then substituting into~(\ref{fours:100:15})'s right side the resulting expression for $f(\tau)$, we have by successive steps that% \footnote{\label{fours:100:fn10}% It is unfortunately conventional to footnote steps like these with some formal remarks on convergence and the swapping of summational/integrodifferential operators. Refer to \S\S~\ref{integ:240.10} and~\ref{integ:240.20}. } \bqb \lefteqn{\frac{1}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau} && \\&=& \frac{1}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{-ij\,\Delta\omega\,\tau} \sum_{\ell = -\infty}^{\infty} a_\ell e^{i\ell \,\Delta\omega\, \tau } \,d\tau \\&=& \frac{1}{T_1} \sum_{\ell = -\infty}^{\infty} a_\ell \int_{t_o-T_1/2}^{t_o+T_1/2} e^{i(\ell-j)\,\Delta\omega\,\tau} \,d\tau \\&=& \frac{a_j}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{i(j-j)\,\Delta\omega\,\tau} \,d\tau \\&=& \frac{a_j}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} \,d\tau = a_j, \eqb in which Parseval's principle~(\ref{fours:080:10}) has killed all but the $\ell=j$ term in the summation. Thus is~(\ref{fours:100:15}) formally proved. Though the foregoing formally completes the proof, the idea behind the formality remains more interesting than the formality itself, for one would like to know not only the fact that~(\ref{fours:100:15}) is true but also the thought which leads one to propose the equation in the first place. The thought is as follows. Assuming that~(\ref{fours:100:10}) indeed can represent the waveform $f(t)$ properly, one observes that the transforming factor $e^{-ij\,\Delta\omega\,\tau}$ of~(\ref{fours:100:15}) serves to shift the waveform's $j$th component $a_j e^{ij \,\Delta\omega\, t}$---whose angular frequency evidently is $\omega = j \,\Delta\omega$---down to a frequency of zero, incidentally shifting the waveform's several other components to various nonzero frequencies as well. Significantly, the transforming factor leaves each shifted frequency to be a whole multiple of the waveform's fundamental frequency~$\Delta\omega$. By Parseval's principle, (\ref{fours:100:15})'s integral then kills all the thus frequency-shifted components except the zero-shifted one \emph{by integrating the components over complete cycles,} passing only the zero-shifted component which, once shifted, has no cycle. Such is the thought which has given rise to the equation. 
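% A rough Octave sketch of the recovery just described (the test
% coefficients a_true, T1 = 1, t_o = 0 and the trapezoid-rule grid are
% arbitrary choices): build a waveform from known coefficients per
% eqn. fours:100:10, then recover each coefficient by evaluating
% eqn. fours:100:15 numerically.  The recovered values should
% reproduce a_true.
% T1 = 1; dw = 2*pi/T1;
% n = (-2:2).';                        % harmonic indices of the test
% a_true = [0.3i; -1; 2; 0.5; 0.25i];  % a_{-2} ... a_{+2}
% tau = linspace(-T1/2, T1/2, 20001);
% f = sum(a_true .* exp(1i*n*dw*tau), 1);  % the sampled waveform
% for j = -2:2
%   aj = trapz(tau, exp(-1i*j*dw*tau) .* f) / T1;
%   printf("a(%+d) = %+.4f %+.4fi\n", j, real(aj), imag(aj));
% end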
\subsection{The square wave} \label{fours:100.20} \index{square wave} \index{wave!square} According to~(\ref{fours:100:15}), the Fourier coefficients of Fig.~\ref{fours:000:fig10}'s square wave are, if $t_o=T_1/4$ is chosen and by successive steps, \bqb a_j &=& \frac{1}{T_1} \int_{-T_1/4}^{3T_1/4} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau \\&=& \frac{A}{T_1} \left[ \mbox{$\ds \int_{-T_1/4}^{T_1/4} -\int_{T_1/4}^{3T_1/4} $} \right] e^{-ij\,\Delta\omega\,\tau} \,d\tau \\&=& \frac{iA}{2\pi j} e^{-ij\,\Delta\omega\,\tau} \left[ \bigg|_{-T_1/4}^{T_1/4} - \bigg|_{T_1/4}^{3T_1/4} \right]. \eqb But% {% \settowidth\tla{\scriptsize $-T_1/4$}% \bqb \left. e^{-ij\,\Delta\omega\,\tau} \right|_{\tau = -T_1/4 } = \left. e^{-ij\,\Delta\omega\,\tau} \right|_{\tau = \makebox[\tla][l]{\scriptsize $3T_1/4$}} &=& i^j, \\ \left. e^{-ij\,\Delta\omega\,\tau} \right|_{\tau = \makebox[\tla][l]{\scriptsize $ T_1/4$}} &=& (-i)^j, \eqb% }% so \bqb \lefteqn{ e^{-ij\,\Delta\omega\,\tau} \left[ \bigg|_{-T_1/4}^{T_1/4} - \bigg|_{T_1/4}^{3T_1/4} \right] } && \\&=& [(-i)^j - i^j] - [i^j - (-i)^j] = 2[(-i)^j - i^j] \\&=& \mbox{$\ldots,-i4,0,i4,0,-i4,0,i4,\ldots$ for $j=\ldots,-3,-2,-1,0,1,2,3,\ldots$} \eqb Therefore, \bq{fours:100:20} \begin{split} a_j &= \left[(-i)^j - i^j\right] \frac{i2A}{2\pi j} \\ &= \begin{cases} (-)^{(j-1)/2}4A/2\pi j &\mbox{for odd~$j$,} \\ 0 &\mbox{for even~$j$} \end{cases} \end{split} \eq are the square wave's Fourier coefficients which, when the coefficients are applied to~(\ref{fours:100:10}) and when~(\ref{cexp:250:cos}) is invoked, indeed yield the specific series of sinusoids~(\ref{fours:000:20}) and Fig.~\ref{fours:000:fig20} have proposed. \subsection{The rectangular pulse train} \label{fours:100.30} \index{pulse train} \index{pulse train!rectangular} \index{rectangular pulse train} The square wave of \S~\ref{fours:100.20} is an important, canonical case and~(\ref{fours:000:20}) is arguably worth memorizing. After the square wave however the variety of possible repeating waveforms has no end. Whenever an unfamiliar repeating waveform arises, one can calculate its Fourier coefficients~(\ref{fours:100:15}) on the spot by the straightforward routine of \S~\ref{fours:100.20}. There seems little point therefore in trying to tabulate waveforms here. \index{duty cycle} One variant on the square wave nonetheless is interesting enough to attract special attention. This variant is the \emph{pulse train} of Fig.~\ref{fours:100:fig3}, \bq{fours:100:27} f(t) = A \sum_{j=-\infty}^{\infty} \Pi\left(\frac{t-jT_1}{\eta T_1}\right); \eq where $\Pi(\cdot)$ is the square pulse of~(\ref{fours:095:10}); the symbol~$A$ represents the pulse's full height rather than the half-height of Fig.~\ref{fours:000:fig10}; and the dimensionless factor $0 \le \eta \le 1$ is the train's \emph{duty cycle,} the fraction of each cycle its pulse is as it were on duty. 
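% A rough Octave sketch (A = 1, T1 = 1 and the integration grid are
% arbitrary choices) comparing the square wave's coefficients of
% eqn. fours:100:20 above against a direct numerical evaluation of
% eqn. fours:100:15 with t_o = T1/4.  The imaginary parts should come
% out negligible and the real parts should match the closed form.
% A = 1; T1 = 1; dw = 2*pi/T1;
% tau = linspace(-T1/4, 3*T1/4, 200001);
% f = A*(abs(tau) < T1/4) - A*(tau > T1/4 & tau < 3*T1/4);
% for j = 1:2:7
%   aj = trapz(tau, exp(-1i*j*dw*tau) .* f) / T1;
%   aj_closed = (-1)^((j-1)/2) * 4*A/(2*pi*j);
%   printf("j = %d: integral %+.4f %+.4fi, closed form %+.4f\n", j, real(aj), imag(aj), aj_closed);
% end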
\begin{figure} \caption{A rectangular pulse train.} \label{fours:100:fig3} \bc \nc\xxxab{4.3} \nc\xxyab{1.6} \nc\xxyac{0.6} \setlength\tla{3.0cm} \setlength\tlb{1.2cm} \nc\xxl{0.15} \nc\xxm{0.35} \nc\xxma{0.15} \nc\xxmb{0.08} \nc\xxo{0.80} \nc\xxp{0.25} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.0} \nc\fyb{2.5} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyac)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$f(t)$} \rput(\xxo\tla,\tlb){\psline(-\xxl,0)(\xxl,0)\psline{->}(0,-\xxm)(0,0)} \rput(\xxo\tla,0.5\tlb){$A$} \rput(\xxo\tla,0){\psline{->}(0,\xxm)(0,0)} \rput(-0.50\tla,0){% \rput(0,-0.1\tla){% \psline(-0.5\tla,\xxl)(-0.5\tla,-\xxl)% \psline{->}( \xxp,0)( 0.5\tla,0)% \psline{->}(-\xxp,0)(-0.5\tla,0)% \rput(0,0){$T_1$}% }% } \rput(1.00\tla,0){% \rput(0,-0.1\tla){% \psline(-0.1\tla,\xxl)(-0.1\tla,-\xxl)% \psline( 0.1\tla,\xxl)( 0.1\tla,-\xxl)% %\rput(-0.1\tla,0){\psline{->}(-\xxma,0)(0,0)}% %\rput( 0.1\tla,0){\psline{->}( \xxma,0)(0,0)}% %\rput(0.00,0){\footnotesize$\eta T_1$}% \psline{<->}(-0.1\tla,0)(0.1\tla,0)% \rput( 0.1\tla,0){% \psline(\xxma,0)(0,0)% \rput(\xxma,0){% \rput[l](\xxmb,0){$\eta T_1$}% }% }% }% } } { \psset{linewidth=2.0pt} \psline% (-1.35\tla,0)% (-1.10\tla,0)(-1.10\tla,\tlb)(-0.90\tla,\tlb)(-0.90\tla,0)% (-0.10\tla,0)(-0.10\tla,\tlb)( 0.10\tla,\tlb)( 0.10\tla,0)% ( 0.90\tla,0)( 0.90\tla,\tlb)( 1.10\tla,\tlb)( 1.10\tla,0)% ( 1.35\tla,0) } } \end{pspicture} \ec \end{figure} By the routine of \S~\ref{fours:100.20}, \bqb a_j &=& \frac{1}{T_1} \int_{-T_1/2}^{T_1/2} e^{-ij\,\Delta\omega\,\tau} f(\tau) \,d\tau \\&=& \frac{A}{T_1} \int_{-\eta T_1/2}^{\eta T_1/2} e^{-ij\,\Delta\omega\,\tau} \,d\tau \\&=& \frac{iA}{2\pi j} e^{-ij\,\Delta\omega\,\tau} \bigg|_{-\eta T_1/2}^{\eta T_1/2} = \frac{2A}{2\pi j} \sin \frac{2\pi\eta j}{2} \eqb for $j \neq 0$. On the other hand, \[ a_0 = \frac{1}{T_1} \int_{-T_1/2}^{T_1/2} f(\tau) \,d\tau = \frac{A}{T_1} \int_{-\eta T_1/2}^{\eta T_1/2} \,d\tau = \eta A \] is the waveform's mean value. Altogether for the pulse train, \bq{fours:100:30} a_j = \begin{cases} \ds \frac{2A}{2\pi j} \sin \frac{2\pi\eta j}{2} & \mbox{if $j \neq 0$,} \\ \eta A & \mbox{if $j = 0$} \end{cases} \eq (though eqn.~\ref{fours:160:05} will improve the notation later). \index{area} An especially interesting special case occurs when the duty cycle grows very short. Since $\lim_{\eta \ra 0^{+}} \sin (2\pi\eta j/2) = 2\pi\eta j/2$ according to~(\ref{taylor:315:60}), it follows from~(\ref{fours:100:30}) that \bq{fours:100:35} \lim_{\eta \ra 0^{+}} a_j = \eta A, \eq the same for every index~$j$. As the duty cycle~$\eta$ tends to vanish the pulse tends to disappear and the Fourier coefficients along with it; but we can compensate for vanishing duty if we wish by increasing the pulse's amplitude~$A$ proportionally, maintaining the product \bq{fours:100:36} \eta T_1 A = 1 \eq of the pulse's width~$\eta T_1$ and its height~$A$, thus preserving unit area% \footnote{ In light of the discussion of time, space and frequency in \S~\ref{fours:085}, we should clarify that we do not here mean a physical area measurable in square meters or the like. We merely mean the dimensionless product of the width (probably measured in units of time like seconds) and the height (correspondingly probably measured in units of frequency like inverse seconds) of the rectangle a single pulse encloses in Fig.~\ref{fours:100:fig3}. 
Though it is not a physical area the rectangle one sketches on paper to represent it, as in the figure, of course does have an area. The word \emph{area} here is meant in the latter sense. } under the pulse. In the limit $\eta \ra 0^{+}$, the pulse then by definition becomes the Dirac delta of Fig.~\ref{integ:670:fig-d}, and the pulse train by construction becomes the \emph{Dirac delta pulse train} of Fig.~\ref{fours:100:fig4}. \begin{figure} \caption{A Dirac delta pulse train.} \label{fours:100:fig4} \index{Dirac delta pulse train} \bc \nc\xxxab{4.3} \nc\xxyab{1.6} \nc\xxyac{0.6} \setlength\tla{3.0cm} \setlength\tlb{1.2cm} \nc\xxl{0.15} \nc\xxm{0.35} \nc\xxma{0.15} \nc\xxmb{0.08} \nc\xxo{0.80} \nc\xxp{0.25} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.0} \nc\fyb{2.5} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyac)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$f(t)$} \rput(-0.50\tla,0){% \rput(0,-0.1\tla){% \psline(-0.5\tla,\xxl)(-0.5\tla,-\xxl)% \psline{->}( \xxp,0)( 0.5\tla,0)% \psline{->}(-\xxp,0)(-0.5\tla,0)% \rput(0,0){$T_1$}% }% } } { \psset{linewidth=2.0pt} \psline{->}(-1.00\tla,0)(-1.00\tla,\tlb) \psline{->}( 0.00\tla,0)( 0.00\tla,\tlb) \psline{->}( 1.00\tla,0)( 1.00\tla,\tlb) } } \end{pspicture} \ec \end{figure} Enforcing~(\ref{fours:100:36}) on~(\ref{fours:100:35}) yields the Dirac delta pulse train's Fourier coefficients \bq{fours:100:37} a_j = \frac{1}{T_1}. \eq \subsection{Linearity and sufficiency} \label{fours:100.80} \index{Fourier series!sufficiency of} \index{Fourier series!linearity of} \index{linearity!of the Fourier series} The Fourier series is evidently linear according to the rules of \S~\ref{integ:240.05}. That is, if the Fourier coefficients of $f_1(t)$ are~$a_{j1}$ and the Fourier coefficients of $f_2(t)$ are~$a_{j2}$, and if the two waveforms $f_1(t)$ and $f_2(t)$ share the same fundamental period~$T_1$, then the Fourier coefficients of $f(t) = f_1(t) + f_2(t)$ are $a_j = a_{j1} + a_{j2}$. Likewise, the Fourier coefficients of $\alpha f(t)$ are~$\alpha a_j$ and the Fourier coefficients of the null waveform $f_{\mr{null}}(t) \equiv 0$ are themselves null, thus satisfying the conditions of linearity. All this however supposes that the Fourier series actually works.% \footnote{ The remainder of this dense subsection can be regarded as optional reading. } % The following bad break depends not only on the wording of the % sentence but probably also on the index of the foregoing footnote. % It is thus a chancy bad break, needing monitoring. % bad break \linebreak Though Fig.~\ref{fours:000:fig20} is suggestive, the figure alone hardly serves to demonstrate that every repeating waveform were representable as a Fourier series. To try to consider every repeating waveform at once would be too much to try at first in any case, so let us start from a more limited question: does there exist any continuous, repeating waveform% \footnote{ As elsewhere in the book, the notation $f(t) \neq 0$ here forbids only the all-zero waveform. It does not forbid waveforms like $f(t) = A \sin \omega t$ that happen to take a zero value at certain values of~$t$. } $f(t) \neq 0$ of period~$T_1$ whose Fourier coefficients $a_j = 0$ are identically zero? 
If the waveform $f(t)$ in question is continuous then nothing prevents us from discretizing~(\ref{fours:100:15}) as \bqb a_j &=& \lim_{M \ra \infty} \frac{1}{T_1} \sum_{\ell=-M}^M e^{(-ij\,\Delta\omega)(t_o + \ell\,\Delta\tau_M)} f(t_o + \ell\,\Delta\tau_M) \,\Delta\tau_M, \\ \Delta\tau_M &\equiv& \frac{T_1}{2M+1}, \eqb and further discretizing the waveform itself as % The $\,\Pi$ here is odd, but the equation does not look right without it. \[ f(t) = \lim_{M \ra \infty} \sum_{p=-\infty}^\infty f(t_o + p\,\Delta\tau_M) \,\Pi\left[\frac{t - (t_o + p\,\Delta\tau_M)}{\Delta\tau_M}\right], \] in which $\Pi[\cdot]$ is the square pulse of~(\ref{fours:095:10}). Substituting the discretized waveform into the discretized formula for~$a_j$, we have that \settoheight\tlj{{\scriptsize $M$}} \bqb a_j &=& \lim_{M \ra \infty} \frac{\Delta\tau_M}{T_1} \sum_{\ell=-M}^M \sum_{\rule{0em}{\tlj}p=-\infty}^\infty e^{(-ij\,\Delta\omega)(t_o + \ell\,\Delta\tau_M)} f(t_o + p\,\Delta\tau_M) \Pi(\ell-p) \\&=& \lim_{M \ra \infty} \frac{\Delta\tau_M}{T_1} \sum_{\ell=-M}^M e^{(-ij\,\Delta\omega)(t_o + \ell\,\Delta\tau_M)} f(t_o + \ell\,\Delta\tau_M). \eqb If we define the $(2M+1)$-element vectors and $(2M+1)\times(2M+1)$ matrix \[ \begin{split} [\ve f_M]_\ell &\equiv f(t_o + \ell\,\Delta\tau_M), \\ [\ve a_M]_j &\equiv a_j, \\ [C_M]_{j\ell} &\equiv \frac{\Delta\tau_M}{T_1}e^{(-ij\,\Delta\omega)(t_o + \ell\,\Delta\tau_M)}, \\ -M \le (j,\ell) &\le M, \end{split} \] then matrix notation renders the last equation as \[ \lim_{M \ra \infty} \ve a_M = \lim_{M \ra \infty} C_M \ve f_M, \] whereby \[ \lim_{M \ra \infty} \ve f_M = \lim_{M \ra \infty} C_M^{-1} \ve a_M, \] assuming that~$C_M$ is invertible. But is~$C_M$ invertible? This seems a hard question to answer until we realize that the rows of~$C_M$ consist of sampled complex exponentials which repeat over the interval~$T_1$ and thus stand subject to Parseval's principle~(\ref{fours:080:15}). Realizing this, we can do better than merely to state that~$C_M$ is invertible: we can write down its actual inverse, \[ [C_M^{-1}]_{\ell j} = \frac{T_1}{(2M+1)\,\Delta\tau_M}e^{(+ij\,\Delta\omega)(t_o + \ell\,\Delta\tau_M)}, \] such that% \footnote{ Equation~(\ref{matrix:180:IMC}) has defined the notation~$I_{-M}^M$, representing a $(2M+1)$-dimensional identity matrix whose string of ones extends along its main diagonal from $j=\ell=-M$ through $j=\ell=M$. } $C_MC_M^{-1} = I_{-M}^M$ and thus per~(\ref{mtxinv:220:20}) also that $C_M^{-1}C_M = I_{-M}^M$. So, the answer to our question is that, yes,~$C_M$ is invertible. Because~$C_M$ is invertible, \S~\ref{eigen:370} has it that neither~$\ve f_M$ nor~$\ve a_M$ can be null unless both are. In the limit $M \ra \infty$, this implies that no continuous, repeating waveform $f(t) \neq 0$ exists whose Fourier coefficients $a_j = 0$ are identically zero. Now consider a continuous, repeating waveform $F(t)$ and its Fourier series $f(t)$. Let $\Delta F(t) \equiv F(t) - f(t)$ be the part of $F(t)$ unrepresentable as a Fourier series, continuous because both $F(t)$ and $f(t)$ are continuous. Being continuous and unrepresentable as a Fourier series, $\Delta F(t)$ has null Fourier coefficients; but as the last paragraph has concluded this can only be so if $\Delta F(t) = 0$. Hence, $\Delta F(t) = 0$ indeed, which implies% \footnote{ Chapter~\ref{taylor}'s footnote~\ref{taylor:310:fn1} has argued in a similar style, earlier in the book. } that $f(t) = F(t)$. 
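% A rough Octave sketch (M = 4, T1 = 1, t_o = 0.2 are arbitrary
% choices) building the matrix C_M of the argument above together with
% the proposed inverse, and confirming numerically that their product
% is the (2M+1)-dimensional identity; the printed norm should be of
% the order of machine rounding.
% M = 4; T1 = 1; dw = 2*pi/T1; t_o = 0.2;
% dtau = T1/(2*M + 1);
% jj = (-M:M).';               % harmonic index j, as a column
% ll = -M:M;                   % sample index ell, as a row
% C    = (dtau/T1) * exp(-1i*dw*(jj*(t_o + ll*dtau)));
% Cinv = (T1/((2*M+1)*dtau)) * exp(+1i*dw*(jj*(t_o + ll*dtau))).';
% printf("|| C*Cinv - I || = %.1e\n", norm(C*Cinv - eye(2*M+1)));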
In other words, \emph{every continuous, repeating waveform is representable as a Fourier series.} And what of discontinuous waveforms? Well, the square wave of Figs. % bad break \ref{fours:000:fig10} and~\ref{fours:000:fig20} this chapter has posed as its principal example is a repeating waveform but, of course, not a continuous one. A truly discontinuous waveform would admittedly invalidate the discretization above of $f(t)$, but see: nothing prevents us from approximating the square wave's discontinuity by an arbitrarily steep slope, whereupon this subsection's conclusion again applies.% \footnote{ Where this subsection's conclusion cannot be made to apply is where unreasonable waveforms like $A\sin[B/\sin\omega t]$ come into play. We will leave to the professional mathematician the classification of such unreasonable waveforms, the investigation of the waveforms' Fourier series and the provision of greater rigor generally. } The better, more subtle, more complete answer to the question though is that a discontinuity incurs Gibbs' phenomenon, which \S~\ref{fours:170} will derive. \subsection{The trigonometric form} \label{fours:100.90} \index{Fourier series!in trigonometric form} It is usually best, or at least neatest and cleanest, and moreover more evocative, to calculate Fourier coefficients and express Fourier series in terms of complex exponentials as~(\ref{fours:100:10}) and~(\ref{fours:100:15}) do. Occasionally, though, when the repeating waveform $f(t)$ is real, one prefers to work in sines and cosines rather than in complex exponentials. One writes~(\ref{fours:100:10}) by Euler's formula~(\ref{cexp:euler}) as \[ f(t) = a_0 + \sum_{j=1}^{\infty} \left[ (a_j+a_{-j})\cos j \,\Delta\omega\, t + i(a_j-a_{-j})\sin j \,\Delta\omega\, t \right]. \] Then, superimposing coefficients in~(\ref{fours:100:15}), \bq{fours:100:91} \settowidth\tla{$i(a_j - a_{-j})$} \begin{split} a_0 &= \frac{1}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} f(\tau) \,d\tau, \\ b_j \equiv \makebox[\tla][r]{$(a_j + a_{-j})$} &= \frac{2}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} \cos(j \,\Delta\omega\, \tau) f(\tau) \,d\tau, \\ c_j \equiv \makebox[\tla][r]{$i(a_j - a_{-j})$} &= \frac{2}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} \sin(j \,\Delta\omega\, \tau) f(\tau) \,d\tau, \end{split} \eq which give the Fourier series the trigonometric form \bq{fours:100:90} \begin{split} f(t) &= a_0 + \sum_{j=1}^{\infty} \left( b_j\cos j \,\Delta\omega\, t + c_j\sin j \,\Delta\omega\, t \right). \end{split} \eq \index{waveform!real} The complex conjugate of~(\ref{fours:100:15}) is \[ a_j^{*} = \frac{1}{T_1} \int_{t_o-T_1/2}^{t_o+T_1/2} e^{+ij\,\Delta\omega\,\tau} f^{*}(\tau) \,d\tau. \] If the waveform happens to be real then $f^{*}(t) = f(t)$, which in light of the last equation and~(\ref{fours:100:15}) implies that \bq{fours:100:93} a_{-j} = a_j^{*} \ \mbox{if} \ \Im[f(t)] = 0. \eq Combining~(\ref{fours:100:91}) and~(\ref{fours:100:93}), we have that \bq{fours:100:94} \left. \setlength\arraycolsep{0.30\arraycolsep} \renewcommand\arraystretch{1.3} \br{rcr} b_j &=& 2\Re(a_j) \\ c_j &=& -2\Im(a_j) \er \right\} \mbox{if} \ \Im[f(t)] = 0. 
\eq % ---------------------------------------------------------------------- \section{The sine-argument function} \label{fours:160} \index{sine-argument function} \index{sinc function} \index{sine-argument function!Taylor series for} \index{Taylor series!for the sine-argument function} Equation~(\ref{fours:100:30}) gives the pulse train of Fig.~\ref{fours:100:fig3} its Fourier coefficients, but a better notation for~(\ref{fours:100:30}) is \bq{fours:160:05} a_j = \eta A \sinarg \frac{2\pi\eta j}{2}, \eq where \bq{fours:160:10} \sinarg z \equiv \frac{\sin z}{z} \eq is the \emph{sine-argument function,}% \footnote{ Many (including the author himself in other contexts) call it the \emph{sinc function,} denoting it $\sinc(\cdot)$ and pronouncing it as ``sink.'' Unfortunately, some \cite[\S~4.3]{Phillips/Parr}% \cite[\S~2.2]{Couch}% \cite{Octave} use the $\sinc(\cdot)$ notation for another function, \[ \sinc_\mr{alternate} z \equiv \sinarg\frac{2\pi z}{2} = \frac{\sin(2\pi z/2)}{2\pi z/2}. \] The unambiguous $\sinarg(\cdot)$ suits this particular book better, anyway, so this is the notation we will use. } plotted in Fig.~\ref{fours:160:fig}. \begin{figure} \caption{The sine-argument function.} \label{fours:160:fig} \bc \setlength\tla{2.0420cm} \settowidth\tlb{$2\pi$} \settowidth\tlc{$\frac{2\pi}{2}$} \nc\xxa{5.6} \nc\xxb{1.10} \nc\xxc{0.3} \nc\xxd{5.5} \nc\xxda{0.02} \nc\xxdb{0.022} \nc\xxg{0.10} \nc\fxa{-6.0} \nc\fxb{6.0} \nc\fya{-0.6} \nc\fyb{2.2} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{3.0}{3.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxa,0)(\xxa,0) \psline(0,-\xxc)(0,\xxb) \uput[r](\xxa,0){$t$} \uput[u](0,\xxb){$\ds \sinarg t \equiv \frac{\sin t}{t}$} \psline(-2\tla,\xxg)(-2\tla,-\xxg) \psline(-1\tla,\xxg)(-1\tla,-\xxg) \psline( 1\tla,\xxg)( 1\tla,-\xxg) \psline( 2\tla,\xxg)( 2\tla,-\xxg) \uput[u](-2\tla,\xxg){\makebox[\tlb][r]{$-2\pi$}} \uput[u](-1\tla,\xxg){\makebox[\tlc][r]{$-\frac{2\pi}{2}$}} \uput[u]( 1\tla,\xxg){$ \frac{2\pi}{2}$} \uput[u]( 2\tla,\xxg){$ 2\pi$} \uput[ur](0,0.65){$1$} } { \psset{linewidth=2.0pt} \psplot[plotpoints=250]{-\xxd}{-\xxda}{ x 0.65 div dup 57.296 mul sin exch div 0.65 mul } \psline(-\xxdb,0.65)(\xxdb,0.65) \psplot[plotpoints=250]{\xxda}{\xxd}{ x 0.65 div dup 57.296 mul sin exch div 0.65 mul } } } \end{pspicture} \ec \end{figure} The function's Taylor series is \bq{fours:160:20} \sinarg z = \sum_{j=0}^{\infty} \left[ \prod_{m=1}^j \frac{-z^2}{(2m)(2m+1)} \right], \eq the Taylor series of $\sin z$ from Table~\ref{taylor:315:tbl}, divided by~$z$. This section introduces the sine-argument function and some of its properties, plus also the related sine integral.% \footnote{ Readers interested in Gibbs' phenomenon, \S~\ref{fours:170}, will read the present section because Gibbs depends on its results. Among other readers however some, less interested in special functions than in basic Fourier theory, may find this section unprofitably tedious. They can skip ahead to the start of the next chapter without great loss. 
} \subsection{Derivative and integral} \label{fours:160.20} \index{sine-argument function!derivative of} \index{sine-argument function!integral of} \index{derivative!of the sine-argument function} \index{integral!of the sine-argument function} \index{sine integral} \index{sine integral!Taylor series for} \index{Taylor series!for the sine integral} The sine-argument function's derivative is computed from the definition % bad break (\ref{fours:160:10}) and the derivative product rule~(\ref{drvtv:proddiv}) to be \bq{fours:160:30} \frac{d}{dz} \sinarg z = \frac{\cos z - \sinarg z}{z}. \eq The function's integral is expressed as a Taylor series after integrating the function's own Taylor series~(\ref{fours:160:20}) term by term to obtain the form \bq{fours:160:35} \sinint z \equiv \int_0^z \sinarg \tau \,d\tau = \sum_{j=0}^{\infty} \left[ \frac{z}{2j+1} \prod_{m=1}^j \frac{-z^2}{(2m)(2m+1)} \right], \eq plotted in Fig.~\ref{fours:160:figi}. Convention gives this integrated function its own name and notation: it calls it the \emph{sine integral}% \footnote{\cite[\S~3.3]{Lebedev}}% $\mbox{}^{,}$% \footnote{ The name ``sine-argument'' incidentally seems to have been back-constructed from the name ``sine integral.'' } and denotes it by $\sinint(\cdot)$. \begin{figure} \caption{The sine integral.} \label{fours:160:figi} \bc \setlength\tla{2.0420cm} \settowidth\tlb{$2\pi$} \settowidth\tlc{$\frac{2\pi}{2}$} \nc\xxa{5.6} \nc\xxb{1.50} \nc\xxc{0.3} \nc\xxd{5.5} \nc\xxda{0.02} \nc\xxdb{0.022} \nc\xxe{1.02102} \nc\xxg{0.10} \nc\fxa{-6.0} \nc\fxb{6.0} \nc\fya{-1.80} \nc\fyb{2.40} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{3.0}{3.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxa,0)(\xxa,0) \psline(0,-\xxb)(0,\xxb) \uput[r](\xxa,0){$t$} \uput[u](0,\xxb){$\sinint t \equiv \int_0^t \sinarg \tau \,d\tau$} { \psset{linestyle=dashed} \psline(0, \xxe)( \xxa, \xxe) \psline(0,-\xxe)(-\xxa,-\xxe) } \psline(-2\tla,\xxg)(-2\tla,-\xxg) \psline(-1\tla,\xxg)(-1\tla,-\xxg) \psline( 1\tla,\xxg)( 1\tla,-\xxg) \psline( 2\tla,\xxg)( 2\tla,-\xxg) \uput[u](-2\tla, \xxg){\makebox[\tlb][r]{$-2\pi$}} \uput[u](-1\tla, \xxg){\makebox[\tlc][r]{$-\frac{2\pi}{2}$}} \uput[d]( 1\tla,-\xxg){$ \frac{2\pi}{2}$} \uput[d]( 2\tla,-\xxg){$ 2\pi$} \psline(-\xxg, 0.65)(\xxg, 0.65) \psline(-\xxg,-0.65)(\xxg,-0.65) \psline(-\xxg, \xxe)(\xxg, \xxe) \psline(-\xxg,-\xxe)(\xxg,-\xxe) \uput[l](-\xxg, 0.65){ $1$} \uput[r]( \xxg,-0.65){$-1$} \uput[l](-\xxg, 1.18){ $\frac{2\pi}{4}$} \uput[r]( \xxg,-1.18){$-\frac{2\pi}{4}$} } { \psset{linewidth=2.0pt} \psplot[plotpoints=500]{-\xxd}{\xxd}{ /myscale 0.65 def /myN 24 def /myt x myscale div def /myt2 myt myt mul neg def /mytn myt def 0 0 1 myN { /myk exch def /myk20 myk 2 mul def /myk21 myk20 1 add def /myk22 myk20 2 add def /myk23 myk20 3 add def mytn myk21 div add /mytn mytn myt2 myk22 myk23 mul div mul def } for myscale mul } } } \end{pspicture} \ec \end{figure} \subsection{Properties of the sine-argument function} \label{fours:160.30} \index{sine-argument function!properties of} Sine-argument properties include the following. \bi \item The sine-argument function is real over the real domain. That is, if $\Im(t) = 0$ then $\Im(\sinarg t) = 0$. \item The zeros of $\sinarg z$ occur at $z = n\pi$, $n \neq 0$, $n \in \mathbb Z$. 
\item \index{extremum!of the sine-argument function} \index{extremum!global} It is that $\left|\sinarg t\right| < 1$ over the real domain $\Im(t) = 0$ except at the global maximum $t = 0$, where \bq{fours:160:38} \sinarg 0 = 1. \eq \item \index{alternating signs} \index{sign!alternating} Over the real domain $\Im(t) = 0$, the function $\sinarg t$ alternates between distinct, positive and negative lobes. Specifically, $(-)^n\sinarg(\pm t) > 0$ over $n\pi < t < (n+1)\pi$ for each $n \ge 0$, $n \in \mathbb Z$. \item Each of the sine-argument's lobes has but a single peak. That is, over the real domain $\Im(t) = 0$, the derivative $(d/dt)\sinarg t = 0$ is zero at only a single value of~$t$ on each lobe. \item The sine-argument function and its derivative converge toward \bq{fours:160:40} \begin{split} \lim_{t \ra \pm \infty} \sinarg t &= 0, \\ \lim_{t \ra \pm \infty} \frac{d}{dt} \sinarg t &= 0. \end{split} \eq \ei Some of these properties are obvious in light of the sine-argument function's definition~(\ref{fours:160:10}). Among the less obvious properties, that $\left|\sinarg t\right| < 1$ says merely that $\left|\sin t\right| < \left|t\right|$ for nonzero~$t$; which must be true since~$t$, interpreted as an angle---which is to say, as a curved distance about a unit circle---can hardly be shorter than $\sin t$, interpreted as the corresponding direct shortcut to the axis (see Fig.~\ref{trig:226:f1}). For $t=0$, (\ref{taylor:315:60}) obtains---or, if you prefer,~(\ref{fours:160:20}). \index{sketch, proof by} \index{proof!by sketch} That each of the sine-argument function's lobes should have but a single peak seems right in view of Fig.~\ref{fours:160:fig} but is nontrivial to prove. To assert that each lobe has but a single peak is to assert that $(d/dt)\sinarg t = 0$ exactly once in each lobe; or, equivalently---after setting (\ref{fours:160:30})'s left side to zero, multiplying by $z^2/\cos z$ and changing $t \la z$---it is to assert that \[ \tan t = t \] exactly once in each interval \[ n\pi \le t < (n+1)\pi, \ n \ge 0, \] for $t \ge 0$; and similarly for $t \le 0$. But according to Table~\ref{cexp:drv} \[ \frac{d}{dt} \tan t = \frac{1}{\cos^2 t} \ge 1, \] whereas $dt/dt = 1$, implying that $\tan t$ is everywhere at least as steep as~$t$ is---and, well, the most concise way to finish the argument is to draw a picture of it, as in Fig.~\ref{fours:160:fig3}, where the curves evidently cannot but intersect exactly once in each interval. 
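% A rough Octave sketch locating the single peak of each of the first
% few positive lobes by solving tan t = t numerically.  The bracketing
% intervals (n*pi, (n + 1/2)*pi), shrunk slightly to keep clear of the
% endpoints and of the tangent's pole, each contain exactly one sign
% change, so fzero can bisect; the bracket margins 0.1 and 0.01 are
% arbitrary choices.
% for n = 1:3
%   tpk = fzero(@(t) tan(t) - t, [n*pi + 0.1, (n + 0.5)*pi - 0.01]);
%   printf("lobe %d: peak at t = %.4f, sinarg t = %+.4f\n", n, tpk, sin(tpk)/tpk);
% end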
\begin{figure} \caption{The points at which~$t$ intersects $\tan t$.} \label{fours:160:fig3} \index{tangent!compared against its argument} \bc \nc\xxa{2.2} \nc\xxb{2.6} \nc\xxc{2.0} \nc\xxg{0.15} \nc\xxh{1.25664} \nc\fxa{-2.8} \nc\fxb{2.8} \nc\fya{-2.9} \nc\fyb{3.4} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxa,0)(\xxa,0) \psline(0,-\xxb)(0,\xxb) \uput[r](\xxa,0){$t$} \uput[u](0,\xxb){$f(t)$} \psline( \xxh,\xxg)( \xxh,-\xxg) \psline(-\xxh,\xxg)(-\xxh,-\xxg) \uput[dr]( \xxh,0){$ \frac{2\pi}{2}$} \uput[ul](-\xxh,0){$-\frac{2\pi}{2}$} \rput(-2.18,-2.06){$t$} \rput(-0.60,-2.40){$\tan t$} } { \psset{linewidth=2.0pt} \psline(-\xxc,-\xxc)(\xxc,\xxc) \psplot[plotpoints=200]{-0.70032}{-1.81296}{ /myscale 0.40 def /myrad 57.296 def x myscale div myrad mul dup sin exch cos div myscale mul } \psplot[plotpoints=200]{-0.55632}{ 0.55632}{ /myscale 0.40 def /myrad 57.296 def x myscale div myrad mul dup sin exch cos div myscale mul } \psplot[plotpoints=200]{ 0.70032}{ 1.81296}{ /myscale 0.40 def /myrad 57.296 def x myscale div myrad mul dup sin exch cos div myscale mul } } } \end{pspicture} \ec \end{figure} \subsection{Properties of the sine integral} \label{fours:160.33} \index{sine integral!properties of} Properties of the sine integral $\sinint t$ of~(\ref{fours:160:35}) include the following. \bi \item Over the real domain $\Im(t) = 0$, the sine integral $\sinint t$ is positive for positive~$t$, negative for negative~$t$ and, of course, zero for $t=0$. \item \index{extremum!local} The local extrema of $\sinint t$ over the real domain $\Im(t) = 0$ occur at the zeros of $\sinarg t$. \item \index{extremum!of the sine integral} The global maximum and minimum of $\sinint t$ over the real domain $\Im(t) = 0$ occur respectively at the first positive and negative zeros of $\sinarg t$, which are $t = \pm\pi$. \item The sine integral converges toward \bq{fours:160:41} \lim_{t \ra \pm \infty} \sinint t = \pm\frac{2\pi}{4}. \eq \ei That the sine integral should reach its local extrema at the sine-argument's zeros ought to be obvious to the extent to which the concept of integration is understood. To explain the other properties it helps first to have expressed the sine integral in the form \bqb \sinint t &=& S_n + \int_{n\pi}^t \sinarg\tau \,d\tau, \\ S_n &\equiv& \sum_{j=0}^{n-1} U_j, \\ U_j &\equiv& \int_{j\pi}^{(j+1)\pi} \sinarg\tau \,d\tau, \\ n\pi &\le& t \makebox[\arraycolsep]{} < \makebox[\arraycolsep]{} (n+1)\pi, \\ 0 &\le& n, \ \ (j,n) \in \mathbb Z, \eqb where each partial integral~$U_j$ integrates over a single lobe of the sine-argument. The several~$U_j$ alternate in sign but, because each lobe majorizes the next (\S~\ref{taylor:316.25})---that is, because,% \footnote{ More rigorously, to give the reason perfectly unambiguously, one could fuss here for a third of a page or so over signs, edges and the like. To do so is left as an exercise to those who aspire to the pure mathematics profession. } %\footnote{ % To be extra precise: it is so because % $\left| \sinarg \tau \right| > \left| \sinarg \tau+\pi \right|$ % for all $\tau \ge 0$ except $\tau = n\pi$, $n > 0$, for which % $\left| \sinarg \tau \right| = \left| \sinarg \tau+\pi \right| = 0$. 
% To be extra, extra precise, one could take special note of the case % that $n=0$, but this is all rather too much pedantry for a book on % applied mathematics; and the narrative, as written, adequately % explains the concept without the help of footnotes like this one. %} in the integrand, $\left| \sinarg \tau \right| \ge \left| \sinarg \tau+\pi \right|$ for all $\tau \ge 0$---the magnitude of the area under each lobe exceeds that under the next, such that% {% \setlength\arraycolsep{0.40\arraycolsep}% \bqb 0 &\le& (-)^j\int_{j\pi}^t \sinarg \tau \,d\tau < (-)^jU_j < (-)^{j-1}U_{j-1}, \\ j\pi &\le& t \makebox[\arraycolsep]{} < \makebox[\arraycolsep]{} (j+1)\pi, \\ 0 &\le& j, \ j \in \mathbb Z \eqb% }% (except that the $U_{j-1}$ term of the inequality does not apply when $j=0$, since there is no~$U_{-1}$) and thus that \bqb & 0 = S_0 < S_{2m} < S_{2m+2} < S_{\infty} < S_{2m+3} < S_{2m+1} < S_1 &\\& \mbox{for all $m>0$, $m \in \mathbb Z$.} & \eqb The foregoing applies only when $t \ge 0$ but naturally one can reason similarly for $t \le 0$, concluding that the integral's global maximum and minimum over the real domain occur respectively at the sine-argument function's first positive and negative zeros, $t = \pm \pi$; and further concluding that the integral is positive for all positive~$t$ and negative for all negative~$t$. Equation~(\ref{fours:160:41}) wants some cleverness to calculate and will be the subject of the next subsection. \subsection{The sine integral's limit by complex contour} \label{fours:160.35} \index{sine integral!evaluation of by complex contour} % This paragraph seems worth saving here. % %\index{Wilbraham, Henry (1825--1883)} %The first two lines of~(\ref{fours:160:40}) are plain enough but its %last line is nonobvious. The sine-argument function is arguably the book's first instance %of a \emph{special function,} though because~(\ref{fours:160:10}) %expresses it in closed form it has not traditionally been classified as %such. We will have much to say about special functions in %[chapters not yet written], %but for the moment we will only observe that one often wants unusually %clever techniques to discover constants like the~% %$2\pi/4$ of~(\ref{fours:160:40}). Fortunately, some unusually clever %mathematicians have toiled, in the words of G.S.~Brown, ``a very long %time, in very dark rooms,''% %\footnote{\cite{Brown-lecture}} %over the past few centuries, precisely to solve problems like this one. %Other such problems have found their solutions quite by accident when a %mathematician (often an applied one), working on something else %entirely, had the considerable wit to realize that the answer to some %apparently unrelated but nonetheless interesting question had surfaced %unexpectedly in his algebra. The specific problem %of~(\ref{fours:160:40}) found its solution in~1848 and~1899 in the %latter manner at the hands respectively of Henry Wilbraham and %J.~Willard Gibbs. Section~\ref{fours:170} will explain. Equation~(\ref{fours:160:41}) has proposed that the sine integral converges toward a value of $2\pi/4$, but why? The integral's Taylor series~(\ref{fours:160:35}) is impractical to compute for large~$t$ and is useless for $t \ra \infty$, so it cannot answer the question. To evaluate the integral in the infinite limit, we shall have to think of something cleverer. 
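% Before the cleverer argument below, a rough numerical check can at
% least lend confidence in the claimed limit: integrate the
% sine-argument function out to a few large endpoints and compare
% against 2*pi/4.  (An Octave sketch; the endpoints and the grid
% density are arbitrary choices, and the totals only oscillate slowly
% in toward the limit.)
% for T = [50 200 800]
%   t = linspace(0, T, 200*T + 1);
%   y = ones(size(t));
%   y(2:end) = sin(t(2:end)) ./ t(2:end);  % sinarg, with sinarg 0 = 1
%   printf("Si(%4d) = %.4f   (2*pi/4 = %.4f)\n", T, trapz(t, y), 2*pi/4);
% end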
Noticing per~(\ref{cexp:250:sin}) that \[ \sinarg z = \frac{e^{+iz} - e^{-iz}}{i2z}, \] rather than trying to integrate the sine-argument function all at once let us first try to integrate just one of its two complex terms, leaving the other term aside to handle later, for the moment computing only \[ I_1 \equiv \int_0^\infty \frac{e^{iz} \,dz}{i2z}. \] To compute the integral~$I_1$, we will apply the closed-contour technique of \S~\ref{inttx:250}, choosing a contour in the Argand plane that incorporates~$I_1$ but shuts out the integrand's pole at $z=0$. \index{false try} Many contours are possible and one is unlikely to find an amenable contour on the first attempt, but perhaps after several false tries we discover and choose the contour of Fig.~\ref{fours:160:fig-contour}. \begin{figure} \caption{A complex contour about which to integrate $e^{iz}/i2z$.} \label{fours:160:fig-contour} \index{contour!complex} \bc { \nc\xax{-5} \nc\xbx{-1.2} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) %\localscalebox{3.0}{3.0} { \nc\xx{2.5} \nc\xxc{0.7} \nc\xxr{2.2361} \nc\xxra{2.561} \nc\xxrb{177.439} \nc\xxrr{0.30} \nc\xxrra{18.435} \nc\xxrrb{161.565} \nc\xxrc{60} \nc\xxa{0.62} \nc\xxb{0.10} \nc\xxba{0.2846} \nc\xxbb{2.2338} \nc\xxs{2.6} \nc\polexy{0.10} \nc\pole{% {% \psset{linewidth=1.0pt}% \psline(-\polexy,-\polexy)( \polexy, \polexy)% \psline( \polexy,-\polexy)(-\polexy, \polexy)% }% } \psline[linewidth=0.5pt](-\xx,0)(\xx,0) \psline[linewidth=0.5pt](0,-\xxc)(0,\xx) \psline[linewidth=2.0pt]{c-c}% (\xxba,\xxb)(\xxbb,\xxb)(\xxbb,\xxbb)(-\xxbb,\xxbb)(-\xxbb,\xxb)(-\xxba,\xxb) \psline[linewidth=2.0pt]{cc->}(-0.7,\xxbb)(-0.8,\xxbb) \psarc[linewidth=2.0pt]{cc-cc}(0,0){\xxrr}{\xxrra}{\xxrrb} \rput(0,0){\pole} \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$\Im(z)$} \rput( 1.15,0.35){$I_1$} \rput( 1.95,1.10){$I_2$} \rput( 0.40,1.95){$I_3$} \rput(-1.95,1.10){$I_4$} \rput(-1.15,0.35){$I_5$} \rput(-0.30,0.55){$I_6$} } \end{pspicture} } \ec \end{figure} The integral about the inner semicircle of this contour is \[ I_6 = \int_{C_6} \frac{e^{iz}\,dz}{i2z} = \lim_{\rho \ra 0^{+}} \int_{2\pi/2}^0 \frac{e^{iz}(i\rho e^{i\phi} \, d\phi)}{i2(\rho e^{i\phi})} = \int_{2\pi/2}^0 \frac{e^{i0} \, d\phi}{2} = -\frac{2\pi}{4}. \] The integral across the contour's top segment is \[ I_3 = \int_{C_3} \frac{e^{iz} \,dz}{i2z} = \lim_{a \ra \infty} \int_{a}^{-a} \frac{e^{i(x+ia)} \,dx}{i2z} = \lim_{a \ra \infty} \int_{-a}^{a} \frac{-e^{ix}e^{-a} \,dx}{i2z}, \] from which, according to the continuous triangle inequality~(\ref{inttx:250:triangle}), \[ \left| I_3 \right| \le \lim_{a \ra \infty} \int_{-a}^{a} \left| \frac{-e^{ix}e^{-a} \,dx}{i2z} \right| = \lim_{a \ra \infty} \int_{-a}^{a} \frac{e^{-a} \,dx}{2 \left| z \right|}; \] which, since $0 < a \le \left| z \right|$ across the segment, we can weaken to read \[ \left| I_3 \right| \le \lim_{a \ra \infty} \int_{-a}^{a} \frac{e^{-a} \,dx}{2a} = \lim_{a \ra \infty} e^{-a} = 0, \] only possible if \[ I_3 = 0. 
\] The integral up the contour's right segment is \[ I_2 = \int_{C_2} \frac{e^{iz} \,dz}{i2z} = \lim_{a \ra \infty} \int_{0}^{a} \frac{e^{i(a+iy)} \,dy}{2z} = \lim_{a \ra \infty} \int_{0}^{a} \frac{e^{ia}e^{-y} \,dy}{2z}, \] from which, according to the continuous triangle inequality, \[ \left| I_2 \right| \le \lim_{a \ra \infty} \int_{0}^{a} \left| \frac{e^{ia}e^{-y} \,dy}{2z} \right| = \lim_{a \ra \infty} \int_{0}^{a} \frac{e^{-y} \,dy}{2\left| z \right|}; \] which, since $0 < a \le \left| z \right|$ across the segment, we can weaken to read \[ \left| I_2 \right| \le \lim_{a \ra \infty} \int_{0}^{a} \frac{e^{-y} \,dy}{2a} = \lim_{a \ra \infty} \frac{1}{2a} = 0, \] only possible if \[ I_2 = 0. \] The integral down the contour's left segment is \[ I_4 = 0 \] for like reason. Because the contour encloses no pole, \[ \oint \frac{e^{iz} \,dz}{i2z} = I_1 + I_2 + I_3 + I_4 + I_5 + I_6 = 0, \] which in light of the foregoing calculations implies that \[ I_1 + I_5 = \frac{2\pi}{4}. \] Now, \[ I_1 = \int_{C_1} \frac{e^{iz} \,dz}{i2z} = \int_0^\infty \frac{e^{ix} \,dx}{i2x} \] is the integral we wanted to compute in the first place, but what is that~$I_5$? Answer: \[ I_5 = \int_{C_5} \frac{e^{iz} \,dz}{i2z} = \int_{-\infty}^0 \frac{e^{ix} \,dx}{i2x}; \] or, changing $-x \la x$, \[ I_5 = \int_0^{\infty} \frac{-e^{-ix} \,dx}{i2x}, \] which fortuitously happens to integrate the heretofore neglected term of the sine-argument function we started with. Thus, \[ \lim_{t \ra \infty} \sinint t = \int_0^\infty \sinarg x \,dx = \int_0^\infty \frac{e^{+ix} - e^{-ix}}{i2x} \,dx = I_1 + I_5 = \frac{2\pi}{4}, \] which was to be computed.% \footnote{ Integration by closed contour is a subtle technique, is it not? What a finesse this subsection's calculation has been! The author rather strongly sympathizes with the reader who still somehow cannot quite believe that contour integration actually works, but in the case of the sine integral another, quite independent method to evaluate the integral is known and it finds the same number $2\pi/4$. The interested reader can extract this other method from Gibbs' calculation in \S~\ref{fours:170}, which refers a sine integral to the known amplitude of a square wave. We said that it was fortuitous that~$I_5$, which we did not know how to eliminate, turned out to be something we needed anyway; but is it really merely fortuitous, once one has grasped the technique? An integration of $-e^{-iz}/i2z$ is precisely the sort of thing an experienced applied mathematician would expect to fall out as a byproduct of the contour integration of $e^{iz}/i2z$. The trick is to discover the contour from which it actually does fall out, the discovery being a process of informed trial and error. } % ---------------------------------------------------------------------- \section{Gibbs' phenomenon} \label{fours:170} \index{Wilbraham, Henry (1825--1883)} \index{Gibbs, Josiah Willard (1839--1903)} \index{Gibbs phenomenon} \index{discontinuous waveform} \index{waveform!discontinuous} Section~\ref{fours:100.80} has shown how the Fourier series suffices to represent a continuous, repeating waveform. Paradoxically, the chapter's examples have been of discontinuous waveforms like the square wave. At least in Fig.~\ref{fours:000:fig20} the Fourier series seems to work for such discontinuous waveforms, though we have never exactly demonstrated that it should work for them, or how. So, what does all this imply? \index{overshot} \index{oscillation} In one sense, it does not imply much of anything.
One can represent a discontinuity by a relatively sharp continuity---as for instance one can represent the Dirac delta of Fig.~\ref{integ:670:fig-d} by the triangular pulse of Fig.~\ref{fours:095:fig1}, with its sloped edges, if~$T$ in~(\ref{fours:095:30}) is sufficiently small---and, considered in this light, the Fourier series works. Mathematically however one is more likely to approximate a Fourier series by truncating it after some finite number~$N$ of terms; and, indeed, so-called% \footnote{ So called because they pass low frequencies while suppressing high ones, though systems encountered in practice admittedly typically suffer a middle frequency domain through which frequencies are only partly suppressed. } ``low-pass'' physical systems that naturally suppress high frequencies% \footnote{\cite[\S~15.2]{JJH}} are common, in which case to truncate the series is more or less the right thing to do. Yet, a significant thing happens when one truncates the Fourier series. \emph{At a discontinuity, the Fourier series oscillates and overshoots.}% \footnote{\cite{Khamsi}} \index{ringing} Henry Wilbraham investigated this phenomenon as early as~1848\@. % bad break \linebreak J.~Willard Gibbs explored its engineering implications in~1899.% \footnote{% \cite{Wilbraham:1848}% \cite{Gibbs:1899}% } Let us along with them refer to the square wave of Fig.~\ref{fours:000:fig20} on page~\pageref{fours:000:fig20}. As further Fourier components are added the Fourier waveform better approximates the square wave, but, as we said, it oscillates about and overshoots---it ``rings about'' in the electrical engineer's vernacular---the square wave's discontinuities (the verb ``to ring'' here recalls the ringing of a bell or steel beam). This oscillation and overshot turn out to be irreducible, and moreover they can have significant physical effects. \index{integration!as summation} \index{summation!as integration} Changing $t - T_1/4 \la t$ in~(\ref{fours:000:20}) to delay the square wave by a quarter cycle yields \[ f(t) = \frac{8A}{2\pi} \sum_{j=0}^{\infty} \frac{1}{2j+1} \sin \left[\frac{(2j+1)(2\pi)t}{T_1}\right], \] which we can, if we like, write as \[ f(t) = \lim_{N \ra \infty} \frac{8A}{2\pi} \sum_{j=0}^{N-1} \frac{1}{2j+1} \sin \left[\frac{(2j+1)(2\pi)t}{T_1}\right]. \] Again changing \[ \Delta v \la \frac{2(2\pi)t}{T_1} \] makes this \[ f\left[\frac{T_1}{2(2\pi)} \,\Delta v\right] = \lim_{N\ra\infty} \frac{4A}{2\pi} \sum_{j=0}^{N-1} \sinarg \left[\left(j+\frac{1}{2}\right)\,\Delta v\right] \,\Delta v. \] Stipulating that~$\Delta v$ be infinitesimal, \[ 0 < \Delta v \ll 1, \] (which in light of the definition of~$\Delta v$ is to stipulate that $0 < t \ll T_1$) such that $dv \equiv \Delta v$ and, therefore, that the summation become an integration; and further defining \[ u \equiv N \,\Delta v; \] we have that \bq{fours:170:20} \lim_{N\ra\infty} f\left[\frac{T_1}{2(2\pi)N} u\right] = \frac{4A}{2\pi} \int_0^u \sinarg v \,dv = \frac{4A}{2\pi} \sinint u. \eq Equation~(\ref{fours:160:41}) gives us that $\lim_{u \ra \infty} \sinint u = 2\pi/4$, so~(\ref{fours:170:20}) as it should has it that $f(t) \approx A$ when% \footnote{ Here is an exotic symbol:~$\gnapprox$. It means what it appears to mean, that $t > 0$ and $t \not\approx 0$. } $t \gnapprox 0$. When $t \approx 0$ however it gives the waveform locally the sine integral's shape of Fig.~\ref{fours:160:figi}. 
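% A rough Octave sketch (A = 1, T1 = 1, the evaluation grid and the
% truncation points N are arbitrary choices) summing the delayed
% square-wave series truncated at N terms and printing the peak value
% just after the discontinuity at t = 0.  The peaks should settle near
% (4*A/(2*pi))*Si(2*pi/2), about 1.179*A, rather than near A.
% A = 1; T1 = 1;
% t = linspace(0, T1/8, 20001);
% for N = [10 100 1000]
%   f = zeros(size(t));
%   for j = 0:N-1
%     f += (8*A/(2*pi)) * sin((2*j+1)*(2*pi)*t/T1) / (2*j+1);
%   end
%   printf("N = %4d: peak of truncated series = %.4f\n", N, max(f));
% end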
\index{engineer} \index{mechanical engineer} \index{electrical engineer} Though unexpected the effect can and does actually arise in physical systems. When it does, the maximum value of $f(t)$ is of interest to mechanical and electrical engineers among others because, if an element in an engineered system will overshoot its designed position, the engineer wants to allow safely for the overshot. According to \S~\ref{fours:160.33}, the sine integral $\sinint u$ reaches its maximum at \[ u = \frac{2\pi}{2}, \] where according to~(\ref{fours:160:35}) % An Octave script to calculate f_max: % f=0; % term=4/(2*pi); % jmax=12; % for j = 0:jmax % if (j>0) % term *= -(2*pi/2)^2/((2*j)*(2*j+1)); % end % f += ((2*pi/2)/(2*j+1)) * term % end \[ f_\mr{max} = \frac{4A}{2\pi} \sinint \frac{2\pi}{2} = \frac{4A}{2\pi} \sum_{j=0}^{\infty} \left[ \frac{2\pi/2}{2j+1} \prod_{m=1}^j \frac{-(2\pi/2)^2}{(2m)(2m+1)} \right] \approx (\mbox{0x1.2DD2})A. \] This overshot, peaking momentarily at $(\mbox{0x1.2DD2})A$, and the associated sine-integral ringing constitute \emph{Gibbs' phenomenon,} as Fig.~\ref{fours:170:fig10} depicts. \begin{figure} \caption{Gibbs' phenomenon.} \label{fours:170:fig10} \bc \nc\xxxab{4.3} \nc\xxyab{1.2} \setlength\tla{3.0cm} \setlength\tlb{0.5cm} \nc\xxl{0.15} \nc\xxm{0.25} \nc\xxo{-0.10} \nc\xxgibbs{1.17898} \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.6} \nc\fyb{2.1} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=0.5pt} \psline(-\xxxab,0)(\xxxab,0) \psline(0,-\xxyab)(0,\xxyab) \uput[r](\xxxab,0){$t$} \uput[u](0,\xxyab){$f(t)$} \rput(\xxo\tla,\tlb){\psline(-\xxl,0)(\xxl,0)\psline{->}(0,\xxm)(0,0)} \rput(\xxo\tla,0.5\tlb){$A$} \rput(\xxo\tla,0){\psline{->}(0,-\xxm)(0,0)} } { \psset{linewidth=2.0pt} \psline(-1.35\tla,-\tlb)(-1.00\tla,-\tlb) \psline(-1.00\tla, \tlb)(-0.50\tla, \tlb) \psline(-0.50\tla,-\tlb)( 0.00\tla,-\tlb) \psline( 0.00\tla, \tlb)( 0.50\tla, \tlb) \psline( 0.50\tla,-\tlb)( 1.00\tla,-\tlb) \psline( 1.00\tla, \tlb)( 1.35\tla, \tlb) \nc\xxgibbsline[1]{\psline{c-c}(#1\tla,-\xxgibbs\tlb)(#1\tla,\xxgibbs\tlb)} \xxgibbsline{-1.00} \xxgibbsline{-0.50} \xxgibbsline{ 0.00} \xxgibbsline{ 0.50} \xxgibbsline{ 1.00} } } \end{pspicture} \ec \end{figure} We have said that Gibbs' phenomenon is irreducible, and indeed strictly this is so: a true discontinuity, if it is to obey Fourier, must overshoot according to Gibbs. Admittedly as earlier alluded, one can sometimes substantially evade Gibbs by softening a discontinuity's edge, giving it a steep but not vertical slope and maybe rounding its corners a little;% \footnote{ If the applied mathematician is especially exacting he might represent a discontinuity by the probability integral of % diagn [not yet written] or maybe (if slightly less exacting) as an arctangent, and indeed there are times at which he might do so. However, such extra-fine mathematical craftsmanship is unnecessary to this section's purpose. } or, alternately, by rolling the Fourier series off gradually rather than truncating it exactly at~$N$ terms. Engineers may do one or the other, or both, explicitly or implicitly, which is why the full Gibbs is not always observed in engineered systems. Nature may do likewise. Neither however is the point. 
The point is that sharp discontinuities do not behave in the manner one might na\"ively have expected, yet that one can still analyze them profitably, adapting this section's subtle technique as the circumstance might demand. A good engineer or other applied mathematician will make himself aware of Gibbs' phenomenon and of the mathematics behind it for this reason. % ---------------------------------------------------------------------- derivations-0.53.20120414.orig/tex/intro.tex0000644000000000000000000004154311742566274017023 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Introduction} \label{intro} \index{derivation} \index{proof} This is a book of applied mathematical proofs. If you have seen a mathematical result, if you want to know why the result is so, you can look for the proof here. The book's purpose is to convey the essential ideas underlying the derivations of a large number of mathematical results useful in the modeling of physical systems. To this end, the book emphasizes main threads of mathematical argument and the motivation underlying the main threads, de\"emphasizing formal mathematical rigor. It derives mathematical results from the purely applied perspective of the scientist and the engineer. The book's chapters are topical. This first chapter treats a few introductory matters of general interest. % ---------------------------------------------------------------------- \section{Applied mathematics} \label{intro:220} \index{applied mathematics} \index{mathematics!applied} \index{mathematician!applied} What is applied mathematics? % \begin{quote} Applied mathematics is a branch of mathematics that concerns itself with the application of mathematical knowledge to other domains\mdots The question of what is applied mathematics does not answer to logical classification so much as to the sociology of professionals who use mathematics.~\cite{def-applied-math} \end{quote} % That is about right, on both counts. In this book we shall define \emph{applied mathematics} to be correct mathematics useful to scientists, engineers and the like; proceeding not from reduced, well-defined sets of axioms but rather directly from a nebulous mass of natural arithmetical, geometrical and classical-algebraic idealizations of physical systems; demonstrable but generally lacking the detailed rigor of the professional mathematician. % ---------------------------------------------------------------------- \section{Rigor} \label{intro:284} \index{rigor} \index{professional mathematics} \index{pure mathematics} \index{mathematics!professional or pure} \index{mathematician!professional} It is impossible to write such a book as this without some discussion of mathematical rigor. Applied and pure mathematics differ principally and essentially in the layer of abstract definitions the latter subimposes beneath the physical ideas the former seeks to model. Notions of mathematical rigor fit far more comfortably in the abstract realm of the professional mathematician; they do not always translate so gracefully to the applied realm. The applied mathematical reader should be aware of this difference. \subsection{Axiom and definition} \label{intro:284.2} \index{axiom} \index{definition} \index{irreducibility} Ideally, a professional mathematician knows or precisely specifies in advance the set of fundamental axioms he means to use to derive a result. 
A prime aesthetic here is irreducibility: no axiom in the set should overlap the others or be specifiable in terms of the others. Geometrical argument---proof by sketch---is distrusted. The professional mathematical literature discourages undue pedantry indeed, but its readers do implicitly demand a convincing assurance that its writers \emph{could} derive results in pedantic detail if called upon to do so. Precise definition here is critically important, which is why the professional mathematician tends not to accept blithe statements such as that \[ \frac{1}{0} = \infty, \] without first inquiring as to exactly what is meant by symbols like~$0$ and~$\infty$. \index{model} \index{geometrical argument} \index{proof!by sketch} \index{sketch, proof by} The applied mathematician begins from a different base. His ideal lies not in precise definition or irreducible axiom, but rather in the elegant modeling of the essential features of some physical system. Here, mathematical definitions tend to be made up \emph{ad hoc} along the way, based on previous experience solving similar problems, adapted implicitly to suit the model at hand. If you ask the applied mathematician exactly what his axioms are, which symbolic algebra he is using, he usually doesn't know; what he knows is that the bridge is founded in certain soils with specified tolerances, suffers such-and-such a wind load, etc. To avoid error, the applied mathematician relies not on abstract formalism but rather on a thorough mental grasp of the essential physical features of the phenomenon he is trying to model. An equation like \[ \frac{1}{0} = \infty \] may make perfect sense without further explanation to an applied mathematical readership, depending on the physical context in which the equation is introduced. Geometrical argument---proof by sketch---is not only trusted but treasured. Abstract definitions are wanted only insofar as they smooth the analysis of the particular physical problem at hand; such definitions are seldom promoted for their own sakes. \index{Heaviside, Oliver (1850--1925)} \index{Hilbert, David (1862--1943)} \index{Courant, Richard (1888--1972)} \index{physicist} The irascible Oliver Heaviside, responsible for the important applied mathematical technique of phasor analysis, once said, \begin{quote} It is shocking that young people should be addling their brains over mere logical subtleties, trying to understand the proof of one obvious fact in terms of something equally \ldots\ obvious.~\cite{mathbios} \end{quote} Exaggeration, perhaps, but from the applied mathematical perspective \linebreak % bad break Heaviside nevertheless had a point. The professional mathematicians \linebreak % bad break Richard Courant and David Hilbert put it more soberly in~1924 when they wrote, \begin{quote} Since the seventeenth century, physical intuition has served as a vital source for mathematical problems and methods. Recent trends and fashions have, however, weakened the connection between mathematics and physics; mathematicians, turning away from the roots of mathematics in intuition, have concentrated on refinement and emphasized the postulational side of mathematics, and at times have overlooked the unity of their science with physics and other fields. 
In many cases, physicists have ceased to appreciate the attitudes of mathematicians.\ \cite[Preface]{Courant/Hilbert} \end{quote} Although the present book treats ``the attitudes of mathematicians'' with greater deference than some of the unnamed~1924 physicists might have done, still, Courant and Hilbert could have been speaking for the engineers and other applied mathematicians of our own day as well as for the physicists of theirs. To the applied mathematician, the mathematics is not principally meant to be developed and appreciated for its own sake; it is meant to be \emph{used.} This book adopts the Courant-Hilbert perspective. %\footnote{ % Section~\ref{taylor:310} and Ch.~\ref{taylor}'s % footnote~\ref{taylor:310:fn1} pose a particularly typical instance of % the distinction. %} \index{style} The introduction you are now reading is not the right venue for an essay on why both kinds of mathematics---applied and professional (or pure)---are needed. Each kind has its place; and although it is a stylistic error to mix the two indiscriminately, clearly the two have much to do with one another. However this may be, this book is a book of derivations of applied mathematics. The derivations here proceed by a purely applied approach. \subsection{Mathematical extension} \label{intro:284.1} \index{extension} Profound results in mathematics are occasionally achieved simply by extending results already known. For example, negative integers and their properties can be discovered by counting backward---3, 2, 1, 0---then asking what follows (precedes?) 0 in the countdown and what properties this new, negative integer must have to interact smoothly with the already known positives. The astonishing Euler's formula (\S~\ref{cexp:230}) is discovered by a similar but more sophisticated mathematical extension. More often, however, the results achieved by extension are unsurprising and not very interesting in themselves. Such extended results are the faithful servants of mathematical rigor. Consider for example the triangle on the left of Fig.~\ref{intro:284:fig}. \begin{figure} \caption{Two triangles.} \label{intro:284:fig} \bc \begin{pspicture}(-4.2,-1.1)(5.8,3) \small \nc\xa{3.0} \nc\xb{2.0} \nc\xc{0.6} \nc\xd{3.6} \nc\xe{2.0} \nc\xee{0.9} \nc\xq{0.3} \nc\xf{0.15} \nc\xk{0.3} \nc\xkk{0.2} \nc\xkkk{0.25} \nc\xg{0.2} \nc\xj[2]{ \psline[linewidth=0.5pt](-#1,\xf)(-#1,-\xf) \psline[linewidth=0.5pt]( #1,\xf)( #1,-\xf) \psline[linewidth=0.5pt]{->}(-\xg,0)(-#1,0) \psline[linewidth=0.5pt]{->}( \xg,0)( #1,0) \rput(0,0){#2} } \rput(-\xa,0){ \pspolygon[linewidth=2.0pt](-\xb,0)(\xb,0)(\xc,\xe) \psline[linewidth=0.5pt,linestyle=dashed](\xc,0)(\xc,\xe) \rput(\xc,0){\psline[linewidth=0.5pt](-\xq,0)(-\xq,\xq)(0,\xq)} \rput(0,-\xk){\xj{\xb}{$b$}} \rput(\xc,\xee){\rput(\xkk,0){$h$}} \rput(-0.6,0){\rput(0,\xkkk){$b_1$}} \rput( 1.2,0){\rput(0,\xkkk){$b_2$}} } \rput( \xa,0){ \psline[linewidth=2.0pt]{c-c}(\xd,\xe)(-\xb,0)(\xb,0)(\xd,\xe) \psline[linewidth=0.5pt,linestyle=dashed](0,0)(\xd,0)(\xd,\xe) \rput(\xd,0){\psline[linewidth=0.5pt](-\xq,0)(-\xq,\xq)(0,\xq)} \rput(0,-\xk){\xj{\xb}{$b$}} \rput(\xd,\xee){\rput(\xkk,0){$h$}} \rput( 2.9,0){\rput(0,\xkkk){$-b_2$}} } \end{pspicture} \ec \end{figure} This triangle is evidently composed of two right triangles of areas \bqb A_1&=&\frac{b_1h}{2}, \\ A_2&=&\frac{b_2h}{2} \eqb (each right triangle is exactly half a rectangle). Hence the main triangle's area is \[ A = A_1 + A_2 = \frac{(b_1+b_2)h}{2} = \frac{bh}{2}. \] Very well. What about the triangle on the right? 
Its~$b_1$ is not shown on the figure, and what is that~$-b_2$, anyway? Answer: the triangle is composed of the \emph{difference} of two right triangles, with~$b_1$ the base of the larger, overall one: $b_1=b+(-b_2)$. The~$b_2$ is negative because the sense of the small right triangle's area in the proof is negative: the small area is subtracted from the large rather than added. By extension on this basis, the main triangle's area is again seen to be $A = bh/2$. The proof is exactly the same. In fact, once the central idea of adding two right triangles is grasped, the extension is really rather obvious---too obvious to be allowed to burden such a book as this. \index{edge case} Excepting the uncommon cases where extension reveals something interesting or new, this book generally leaves the mere extension of proofs---including the validation of edge cases and over-the-edge cases---as an exercise to the interested reader. % ---------------------------------------------------------------------- \section{Complex numbers and complex variables} \label{intro:310} \index{complex number} \index{number!complex} \index{complex variable} \index{variable!complex} More than a mastery of mere logical details, it is an holistic view of the mathematics and of its use in the modeling of physical systems which is the mark of the applied mathematician. A \emph{feel} for the math is the great thing. Formal definitions, axioms, symbolic algebras and the like, though often useful, are felt to be secondary. The book's rapidly staged development of complex numbers and complex variables is planned on this sensibility. Sections~\ref{alggeo:225}, \ref{trig:278}, \ref{trig:280}, \ref{drvtv:230.35}, \ref{drvtv:240}, \ref{noth:320}, \ref{inttx:250} and~\ref{inttx:260.50}, plus all of Chs.~\ref{cexp} and~\ref{taylor}, constitute the book's principal stages of complex development. In these sections and throughout the book, the reader comes to appreciate that most mathematical properties which apply for real numbers apply equally for complex, that few properties concern real numbers alone. % ********************************** % Do not forget that purec.tex quotes the following paragraph. If you % change it here, change it there, too. % ********************************** Pure mathematics develops an abstract theory of the complex variable.% \footnote{\cite{Arnold:1997}\cite{Fisher}\cite{Spiegel}\cite{Hildebrand}} The abstract theory is quite beautiful. However, its arc takes off too late and flies too far from applications for such a book as this. Less beautiful but more practical paths to the topic exist;% \footnote{ See Ch.~\ref{taylor}'s footnote~\ref{taylor:320:fn10}. } this book leads the reader along one of these. For supplemental reference, a bare sketch of the abstract theory of the complex variable is found in Appendix~\ref{purec}. %% ---------------------------------------------------------------------- % %\section{To undergraduate readers} %\label{intro:320} % %Technical books written for a postgraduate readership tend to present %their mathematics tersely, as this book does in most places. Though %the book is not intended solely for postgraduates, it does leave %significant responsibility on the reader's shoulders. A topic a %lower-division college text would develop in twenty pages---or an %upper-division text, in five---this book develops in two. Readers %unused to such density may find the book hard to read at first. 
% %Karl Hahn has written, ``[A]s far as the \ldots complaint \ldots that %math is hard, I can't help that. It is hard.'' Hahn is right. % %To spend twenty pages to develop two pages' worth of mathematics however %does not make the math much easier, however; it mostly just makes the %pages turn faster. What the extra pages do is to guide the student %\emph{who has not yet fully learned how to learn.} Such a book as the %one you are now reading can hardly afford the pages. % %This book omits no essential theory by design, but neither does it bring %much in the way of application, reflection, practice or exercise, which %of course are at least as necessary as the theory is. Postgraduate %technical books mostly expect their readers to provide their own %application, reflection, practice and exercise. This book expects that, %too. % %If you have never tried to read a book of this kind, then you might, %starting now. It is a useful skill. Allow plenty of time per page. % % ---------------------------------------------------------------------- \section{On the text} \label{intro:290} The book gives numerals in hexadecimal. It denotes variables in Greek letters as well as Roman. Readers unfamiliar with the hexadecimal notation will find a brief orientation thereto in Appendix~\ref{hex}. Readers unfamiliar with the Greek alphabet will find it in Appendix~\ref{greek}. \index{GNU General Public License} \index{General Public License, GNU} \index{GNU GPL} \index{GPL} \index{Debian} \index{Debian Free Software Guidelines} \index{DFSG (Debian Free Software Guidelines)} Licensed to the public under the GNU General Public Licence~\cite{GPL}, version~2, this book meets the Debian Free Software Guidelines~\cite{DFSG}. %\index{citation} %If you cite an equation, section, chapter, figure or other item from %this book, it is recommended that you include in your citation the %book's precise %draft %date as given on the title page. The reason is %that equation numbers, chapter numbers and the like are numbered %automatically by the \LaTeX\ typesetting software: such numbers can %change arbitrarily from draft to draft. If an exemplary citation helps, %see~\cite{self} in the bibliography. A book of mathematical derivations by its nature can tend to make dry, even gray reading. Delineations of black and white become the book's duty. Mathematics however should serve the demands not only of deduction but equally of insight, by the latter of which alone mathematics derives either feeling or use. Yet, though \emph{this} book does try---at some risk to strict consistency of tone---to add color in suitable shades, to strike an appropriately lively balance between the opposite demands of logical progress and literary relief; nonetheless, neither every sequence of equations nor every conjunction of figures is susceptible to an apparent hue the writer can openly paint upon it, but only to that abeyant hue, that luster which reveals or reflects the fire of the reader's own mathematical imagination, which color otherwise remains unobserved. The book's subject and purpose thus restrict its overt style. % The following paragraph comes across as scolding. 
% %The reader unfamiliar with such a style %% (a relatively accessibly species of what publishers in the writer's %% country call a ``graduate-level'' style) %may prefer to begin reading with a pencil and notebook in hand; for few %practices will help more to confirm understanding, or to discover the %latent train of thought in technical prose---indeed, even to perceive %the aforementioned abeyant hue---than taking notes. The book begins by developing the calculus of a single variable. derivations-0.53.20120414.orig/tex/xkeyval.sty0000644000000000000000000001150611742575144017362 0ustar rootroot%% %% This is file `xkeyval.sty', %% generated with the docstrip utility. %% %% The original source files were: %% %% xkeyval.dtx (with options: `xkvlatex') %% %% --------------------------------------- %% Copyright (C) 2004-2008 Hendri Adriaens %% --------------------------------------- %% %% This work may be distributed and/or modified under the %% conditions of the LaTeX Project Public License, either version 1.3 %% of this license or (at your option) any later version. %% The latest version of this license is in %% http://www.latex-project.org/lppl.txt %% and version 1.3 or later is part of all distributions of LaTeX %% version 2003/12/01 or later. %% %% This work has the LPPL maintenance status "maintained". %% %% This Current Maintainer of this work is Hendri Adriaens. %% %% This work consists of the file xkeyval.dtx and derived files %% keyval.tex, xkvtxhdr.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, pst-xkey.tex, pst-xkey.sty, xkveca.cls, xkvecb.cls, %% xkvesa.sty, xkvesb.sty, xkvesc.sty, xkvex1.tex, xkvex2.tex, %% xkvex3.tex and xkvex4.tex. %% %% The following files constitute the xkeyval bundle and must be %% distributed as a whole: readme, xkeyval.pdf, keyval.tex, %% pst-xkey.sty, pst-xkey.tex, xkeyval.sty, xkeyval.tex, xkvview.sty, %% xkvltxp.sty, xkvtxhdr.tex, pst-xkey.dtx and xkeyval.dtx. 
%% \NeedsTeXFormat{LaTeX2e}[1995/12/01] \ProvidesPackage{xkeyval} [2008/08/13 v2.6a package option processing (HA)] \ifx\XKeyValLoaded\endinput\else\input xkeyval \fi \edef\XKVcatcodes{% \catcode`\noexpand\=\the\catcode`\=\relax \catcode`\noexpand\,\the\catcode`\,\relax \let\noexpand\XKVcatcodes\relax } \catcode`\=12\relax \catcode`\,12\relax \let\XKV@doxs\relax \def\XKV@warn#1{\PackageWarning{xkeyval}{#1}} \def\XKV@err#1{\PackageError{xkeyval}{#1}\@ehc} \XKV@whilist\@filelist\XKV@tempa\ifx\XKV@documentclass\@undefined\fi{% \filename@parse\XKV@tempa \ifx\filename@ext\@clsextension \XKV@ifundefined{opt@\filename@area\filename@base.\filename@ext }{}{% \edef\XKV@documentclass{% \filename@area\filename@base.\filename@ext }% }% \fi } \ifx\XKV@documentclass\@undefined \XKV@err{xkeyval loaded before \protect\documentclass}% \let\XKV@documentclass\@empty \let\XKV@classoptionslist\@empty \else \let\XKV@classoptionslist\@classoptionslist \def\XKV@tempa#1{% \let\@classoptionslist\@empty \XKV@for@n{#1}\XKV@tempa{% \expandafter\in@\expandafter=\expandafter{\XKV@tempa}% \ifin@\else\XKV@addtolist@o\@classoptionslist\XKV@tempa\fi }% } \expandafter\XKV@tempa\expandafter{\@classoptionslist} \fi \def\XKV@testopte#1{% \XKV@ifstar{\XKV@sttrue\XKV@t@stopte#1}{\XKV@stfalse\XKV@t@stopte#1}% } \def\XKV@t@stopte#1{\@testopt{\XKV@t@st@pte#1}{KV}} \def\XKV@t@st@pte#1[#2]{% \XKV@makepf{#2}% \@ifnextchar<{\XKV@@t@st@pte#1}% {\XKV@@t@st@pte#1<\@currname.\@currext>}% } \def\XKV@@t@st@pte#1<#2>{% \XKV@sp@deflist\XKV@fams{#2}% \@testopt#1{}% } \def\DeclareOptionX{% \let\@fileswith@pti@ns\@badrequireerror \XKV@ifstar\XKV@dox\XKV@d@x } \long\def\XKV@dox#1{\XKV@toks{#1}\edef\XKV@doxs{\the\XKV@toks}} \def\XKV@d@x{\@testopt\XKV@@d@x{KV}} \def\XKV@@d@x[#1]{% \@ifnextchar<{\XKV@@@d@x[#1]}{\XKV@@@d@x[#1]<\@currname.\@currext>}% } \def\XKV@@@d@x[#1]<#2>#3{\@testopt{\define@key[#1]{#2}{#3}}{}} \def\ExecuteOptionsX{\XKV@stfalse\XKV@plfalse\XKV@t@stopte\XKV@setkeys} \def\ProcessOptionsX{\XKV@plfalse\XKV@testopte\XKV@pox} \def\XKV@pox[#1]{% \let\XKV@tempa\@empty \XKV@inpoxtrue \let\@fileswith@pti@ns\@badrequireerror \edef\XKV@testclass{\@currname.\@currext}% \ifx\XKV@testclass\XKV@documentclass \let\@unusedoptionlist\XKV@classoptionslist \XKV@ifundefined{ver@xkvltxp.sty}{}{% \@onelevel@sanitize\@unusedoptionlist }% \else \ifXKV@st \def\XKV@tempb##1,{% \def\CurrentOption{##1}% \ifx\CurrentOption\@nnil\else \XKV@g@tkeyname##1=\@nil\CurrentOption \XKV@key@if@ndefined{\CurrentOption}{}{% \XKV@useoption{##1}% \XKV@addtolist@n\XKV@tempa{##1}% }% \expandafter\XKV@tempb \fi }% \expandafter\XKV@tempb\XKV@classoptionslist,\@nil,% \fi \fi \expandafter\XKV@addtolist@o\expandafter \XKV@tempa\csname opt@\@currname.\@currext\endcsname \def\XKV@tempb{\XKV@setkeys[#1]}% \expandafter\XKV@tempb\expandafter{\XKV@tempa}% \let\XKV@doxs\relax \let\XKV@rm\@empty \XKV@inpoxfalse \let\@fileswith@pti@ns\@@fileswith@pti@ns \AtEndOfPackage{\let\@unprocessedoptions\relax}% } \def\XKV@useoption#1{% \def\XKV@resa{#1}% \XKV@ifundefined{ver@xkvltxp.sty}{}{% \@onelevel@sanitize\XKV@resa }% \@expandtwoargs\@removeelement{\XKV@resa}% {\@unusedoptionlist}\@unusedoptionlist } \DeclareOptionX*{% \PackageWarning{xkeyval}{Unknown option `\CurrentOption'}% } \ProcessOptionsX \XKVcatcodes \endinput %% %% End of file `xkeyval.sty'. 
derivations-0.53.20120414.orig/tex/noth.tex0000644000000000000000000007321211742566274016636 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Primes, roots and averages} \label{noth} This chapter gathers a few significant topics, each of whose treatment seems too brief for a chapter of its own. % ---------------------------------------------------------------------- \section{Prime numbers} \label{noth:220} \index{prime number} \index{composite number} \index{integer!prime} \index{integer!composite} A \emph{prime number}---or simply, a \emph{prime}---is an integer greater than one, divisible only by one and itself. A \emph{composite number} is an integer greater than one and not prime. A composite number can be composed as a product of two or more prime numbers. All positive integers greater than one are either composite or prime. \index{number theory} \index{cryptography} The mathematical study of prime numbers and their incidents constitutes \emph{number theory,} and it is a deep area of mathematics. The deeper results of number theory seldom arise in applications,% \footnote{ The deeper results of number theory do arise in cryptography, or so the author has been led to understand. Although cryptography is literally an application of mathematics, its spirit is that of pure mathematics rather than of applied. If you seek cryptographic derivations, this book is probably not the one you want. } however, so we will confine our study of number theory in this book to one or two of its simplest, most broadly interesting results. \subsection{The infinite supply of primes} \label{noth:220.10} \index{prime number!infinite supply of} The first primes are evidently $2,3,5,7,\mbox{0xB},\ldots$\,. Is there a last prime? To show that there is not, suppose that there were. More precisely, suppose that there existed exactly~$N$ primes, with~$N$ finite, letting $p_1,p_2,\ldots,p_N$ represent these primes from least to greatest. Now consider the product of all the primes, \[ C=\prod_{j=1}^{N}p_j. \] What of $C+1$? Since $p_1=2$ divides~$C$, it cannot divide $C+1$. Similarly, since $p_2=3$ divides~$C$, it also cannot divide $C+1$. The same goes for $p_3=5$, $p_4=7$, $p_5=\mbox{0xB}$, etc. Apparently none of the primes in the~$p_j$ series divides $C+1$, which implies either that $C+1$ itself is prime, or that $C+1$ is composed of primes not in the series. But the latter is assumed impossible on the ground that the~$p_j$ series includes all primes; and the former is assumed impossible on the ground that $C+1>C>p_N$, with~$p_N$ the greatest prime. The contradiction proves false the assumption which gave rise to it. The false assumption: that there were a last prime. Thus there is no last prime. No matter how great a prime number one finds, a greater can always be found. 
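% A small Octave check of the construction (illustrative only; the
% primes are hard-coded and the figures are decimal):
%   factor(prod([2 3 5 7 11]) + 1)      % 2311, itself prime
%   factor(prod([2 3 5 7 11 13]) + 1)   % (59)(509), primes not in the list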
The supply of primes is infinite.% \footnote{\cite{Stepney}} \index{Euclid (c.~300~B.C.)} \index{\emph{reductio ad absurdum}} \index{proof!by contradiction} \index{contradiction, proof by} Attributed to the ancient geometer Euclid, the foregoing proof is a classic example of mathematical \emph{reductio ad absurdum,} or as usually styled in English, \emph{proof by contradiction.}% \footnote{% \cite[Appendix~1]{Sagan}% \cite[``Reductio ad absurdum,'' 02:36, 28 April 2006]{wikip} } \subsection{Compositional uniqueness} \label{noth:220.20} \index{composite number!compositional uniqueness of} \index{integer!compositional uniqueness of} \index{plausible assumption} \index{prime factorization} \index{factorization!prime} Occasionally in mathematics, plausible assumptions can hide subtle logical flaws. One such plausible assumption is the assumption that every positive integer has a unique \emph{prime factorization.} It is readily seen that the first several positive integers---$1=()$, $2=(2^1)$, $3=(3^1)$, $4=(2^2)$, $5=(5^1)$, $6=(2^1)(3^1)$, $7=(7^1)$, $8=(2^3)$, \ldots---each have unique prime factorizations, but is this necessarily true of all positive integers? To show that it is true, suppose that it were not.% \footnote{ Unfortunately the author knows no more elegant proof than this, yet cannot even cite this one properly. The author encountered the proof in some book over a decade ago. The identity of that book is now long forgotten. } More precisely, suppose that there did exist positive integers factorable each in two or more distinct ways, with the symbol~$C$ representing the least such integer. Noting that~$C$ must be composite (prime numbers by definition are each factorable only one way, like $5=[5^1]$), let \bqb C_p &\equiv& \prod_{j=1}^{N_p} p_j, \\ C_q &\equiv& \prod_{k=1}^{N_q} q_k, \\ C_p = C_q &=& C, \\ p_j &\le& p_{j+1}, \\ q_k &\le& q_{k+1}, \\ p_1 &\le& q_1, \\ N_p &>& 1, \\ N_q &>& 1, \eqb where~$C_p$ and~$C_q$ represent two distinct prime factorizations of the same number~$C$ and where the~$p_j$ and~$q_k$ are the respective primes ordered from least to greatest. We see that \[ p_j \neq q_k \] for any~$j$ and~$k$---that is, that the same prime cannot appear in both factorizations---because if the same prime~$r$ did appear in both then $C/r$ either would be prime (in which case both factorizations would be $[r][C/r]$, defying our assumption that the two differed) or would constitute an ambiguously factorable composite integer less than~$C$ when we had already defined~$C$ to represent the least such. Among other effects, the fact that $p_j \neq q_k$ strengthens the definition $p_1\le q_1$ to read \[ p_1 < q_1. \] Let us now rewrite the two factorizations in the form \bqb C_p &=& p_1 A_p, \\ C_q &=& q_1 A_q, \\ C_p = C_q &=& C, \\ A_p &\equiv& \prod_{j=2}^{N_p} p_j, \\ A_q &\equiv& \prod_{k=2}^{N_q} q_k, \eqb where~$p_1$ and~$q_1$ are the least primes in their respective factorizations. Since~$C$ is composite and since $p_1 < q_1$, we have that \bqb 1 < p_1 < q_1 \le \sqrt C \le A_q < A_p < C, \eqb which implies that \[ p_1q_1 < C. \] The last inequality lets us compose the new positive integer \[ B = C - p_1q_1, \] which might be prime or composite (or unity), but which either way enjoys a unique prime factorization because $B0,\ p>0,\ q>1. \] Squaring the equation, we have that \[ \frac{p^2}{q^2} = n, \] which form is evidently also fully reduced. But if $q>1$, then the fully reduced $n=p^2/q^2$ is not an integer as we had assumed that it was. 
The contradiction proves false the assumption which gave rise to it. Hence there exists no rational, nonintegral $\sqrt{n}$, as was to be demonstrated. The proof is readily extended to show that any $x=n^{j/k}$ is irrational if nonintegral, the extension by writing $p^k/q^k=n^j$ then following similar steps as those this paragraph outlines. That's all the number theory the book treats; but in applied math, so little will take you pretty far. Now onward we go to other topics. % ---------------------------------------------------------------------- \section[The existence and number of roots]{The existence and number of polynomial roots} \label{noth:320} \index{polynomial} \index{root} This section shows that an $N$th-order polynomial must have exactly~$N$ roots. \subsection{Polynomial roots} \label{noth:320.20} \index{long division!by $z-\alpha$} \index{remainder!after division by $z-\alpha$} \index{remainder!zero} Consider the quotient $B(z)/A(z)$, where \bqb A(z) &=& z-\alpha, \\ B(z) &=& \sum_{k=0}^N b_k z^k, \ \ N>0,\ b_N \neq 0, \\ B(\alpha) &=& 0. \eqb In the long-division symbology of Table~\ref{alggeo:228:tbl-down}, \[ B(z) = A(z)Q_0(z) + R_0(z), \] where $Q_0(z)$ is the quotient and $R_0(z)$, a remainder. In this case the divisor $A(z)=z-\alpha$ has first order, and as \S~\ref{alggeo:228.20} has observed, first-order divisors leave zeroth-order, constant remainders $R_0(z) = \rho$. Thus substituting yields \[ B(z) = (z-\alpha)Q_0(z) + \rho. \] When $z=\alpha$, this reduces to \[ B(\alpha) = \rho. \] But $B(\alpha) = 0$ by assumption, so \[ \rho = 0. \] Evidently the division leaves no remainder~$\rho$, which is to say that \emph{$z-\alpha$ exactly divides every polynomial $B(z)$ of which $z=\alpha$ is a root.} Note that if the polynomial $B(z)$ has order~$N$, then the quotient~$Q(z) = B(z)/(z-\alpha)$ has exactly order~$N-1$. That is, the leading,~$z^{N-1}$ term of the quotient is never null. The reason is that if the leading term were null, if $Q(z)$ had order less than $N-1$, then $B(z) = (z-\alpha)Q(z)$ could not possibly have order~$N$ as we have assumed. \subsection{The fundamental theorem of algebra} \label{noth:320.30} \index{algebra!fundamental theorem of} \index{fundamental theorem of algebra} \index{polynomial!of order~$N$ having~$N$ roots} The \emph{fundamental theorem of algebra} holds that any polynomial $B(z)$ of order~$N$ can be factored \bq{noth:320:50} B(z) = \sum_{k=0}^{N} b_kz^k = b_N \prod_{j=1}^{N} (z-\alpha_j), \ \ b_N\neq 0, \eq where the~$\alpha_k$ are the~$N$ roots of the polynomial.% \footnote{ Professional mathematicians typically state the theorem in a slightly different form. They also prove it in rather a different way.\ \cite[Ch.~10, Prob.~74]{Hildebrand} } To prove the theorem, it suffices to show that all polynomials of order $N>0$ have at least one root; for if a polynomial of order~$N$ has a root~$\alpha_N$, then according to \S~\ref{noth:320.20} one can divide the polynomial by $z-\alpha_N$ to obtain a new polynomial of order $N-1$. To the new polynomial the same logic applies: if it has at least one root $\alpha_{N-1}$, then one can divide \emph{it} by $z-\alpha_{N-1}$ to obtain yet another polynomial of order $N-2$; and so on, one root extracted at each step, factoring the polynomial step by step into the desired form $b_N \prod_{j=1}^{N} (z-\alpha_j)$. \index{locus} \index{polynomial!having at least one root} It remains however to show that there exists no polynomial $B(z)$ of order $N>0$ lacking roots altogether. 
To show that there is no such polynomial, consider the locus% \footnote{ A \emph{locus} is the geometric collection of points which satisfy a given criterion. For example, the locus of all points in a plane at distance~$\rho$ from a point~$O$ is a circle; the locus of all points in three-dimensional space equidistant from two points~$P$ and~$Q$ is a plane; etc. } of all $B(\rho e^{i\phi})$ in the Argand range plane (Fig.~\ref{alggeo:225:fig}), where $z=\rho e^{i\phi}$,~$\rho$ is held constant, and~$\phi$ is variable. Because $e^{i(\phi+n2\pi)} = e^{i\phi}$ and no fractional powers of~$z$ appear in~(\ref{noth:320:50}), this locus forms a closed loop. At very large~$\rho$, the~$b_Nz^N$ term dominates $B(z)$, so the locus there evidently has the general character of $b_N\rho^N e^{iN\phi}$. As such, the locus is nearly but not quite a circle at radius~$b_N\rho^N$ from the Argand origin $B(z)=0$, revolving~$N$ times at that great distance before exactly repeating. On the other hand, when $\rho = 0$ the entire locus collapses on the single point $B(0)=b_0$. Now consider the locus at very large~$\rho$ again, but this time let~$\rho$ slowly shrink. Watch the locus as~$\rho$ shrinks. The locus is like a great string or rubber band, joined at the ends and looped in~$N$ great loops. As~$\rho$ shrinks smoothly, the string's shape changes smoothly. Eventually~$\rho$ disappears and the entire string collapses on the point $B(0)=b_0$. Since the string originally has looped~$N$ times at great distance about the Argand origin, but at the end has collapsed on a single point, then at some time between it must have swept through the origin and every other point within the original loops. After all, $B(z)$ is everywhere differentiable, so the string can only \emph{sweep} as~$\rho$ decreases; it can never skip. The Argand origin lies inside the loops at the start but outside at the end. If so, then the values of~$\rho$ and~$\phi$ precisely where the string has swept through the origin by definition constitute a root $B(\rho e^{i\phi}) = 0$. Thus as we were required to show, $B(z)$ does have at least one root, which observation completes the applied demonstration of the fundamental theorem of algebra. \index{quadratic expression} \index{cubic expression} \index{quartic expression} \index{quintic expression} The fact that the roots exist is one thing. Actually finding the roots numerically is another matter. For a quadratic (second order) polynomial,~(\ref{alggeo:240:quad}) gives the roots. For cubic (third order) and quartic (fourth order) polynomials, formulas for the roots are known (see Ch.~\ref{cubic}) though seemingly not so for quintic (fifth order) and higher-order polynomials;% \footnote{\label{noth:320:fn20}% In a celebrated theorem of pure mathematics \cite[``Abel's impossibility theorem'']{EWW-web}, it is said to be shown that no such formula even exists, given that the formula be constructed according to certain rules. Undoubtedly the theorem is interesting to the professional mathematician, but to the applied mathematician it probably suffices to observe merely that no such formula is known. } but the Newton-Raphson iteration (\S~\ref{drvtv:270}) can be used to locate a root numerically in any case. The Newton-Raphson is used to extract one root (\emph{any} root) at each step as described above, reducing the polynomial step by step until all the roots are found. 
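% An illustrative Octave sketch of the procedure just described (a
% rough example only; the cubic, its roots 1, 2 and 3, and the
% starting guess are arbitrary, and polyval, polyder and deconv are
% standard Octave functions):
%   b = [1 -6 11 -6];              % B(z) = z^3 - 6z^2 + 11z - 6
%   for n = 1:3
%     z = 0;                       % starting guess
%     for k = 1:40                 % the Newton-Raphson iteration
%       z = z - polyval(b,z)/polyval(polyder(b),z);
%     end
%     alpha(n) = z;                % a root has been extracted
%     b = deconv(b, [1 -z]);       % divide out z - alpha as described above
%   end
%   alpha                          % approximately 1, 2, 3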
The reverse problem, finding the polynomial given the roots, is much easier: one just multiplies out $\prod_j (z-\alpha_j)$, as in~(\ref{noth:320:50}). % ---------------------------------------------------------------------- \section{Addition and averages} \label{noth:420} This section discusses the two basic ways to add numbers and the three basic ways to calculate averages of them. \subsection{Serial and parallel addition} \label{noth:420.10} \index{parallel addition} \index{addition!parallel} \index{serial addition} \index{series addition} \index{addition!serial} \index{addition!series} \index{mason} Consider the following problem. There are three masons. The strongest and most experienced of the three, Adam, lays 120 bricks per hour.% \footnote{ The figures in the example are in decimal notation. } Next is Brian who lays 90. Charles is new; he lays only 60. Given eight hours, how many bricks can the three men lay? Answer: \[ (8\ \mbox{hours})( 120 + 90 + 60\ \mbox{bricks per hour} ) = 2160\ \mbox{bricks}. \] Now suppose that we are told that Adam can lay a brick every 30 seconds; Brian, every 40 seconds; Charles, every 60 seconds. How much time do the three men need to lay 2160 bricks? Answer: \bqb \frac{2160\ \mbox{bricks}}{ \frac{1}{30} + \frac{1}{40} + \frac{1}{60} \ \mbox{bricks per second} } &=& \mbox{28,800 \mbox{seconds}} \left(\frac{1\ \mbox{hour}}{3600\ \mbox{seconds}}\right) \\ &=& 8\ \mbox{hours}. \eqb The two problems are precisely equivalent. Neither is stated in simpler terms than the other. The notation used to solve the second is less elegant, but fortunately there exists a better notation: \[ (2160\ \mbox{bricks})(30 \,\|\, 40 \,\|\, 60\ \mbox{seconds per brick}) = 8\ \mbox{hours}, \] where \[ \frac{1}{ 30 \,\|\, 40 \,\|\, 60 } = \frac{1}{30} + \frac{1}{40} + \frac{1}{60}. \] The operator~$\|$ is called the \emph{parallel addition} operator. It works according to the law \bq{noth:420:10} \frac{1}{ a \,\|\, b } = \frac{1}{a} + \frac{1}{b}, \eq where the familiar operator~$+$ is verbally distinguished from the~$\|$ when necessary by calling the~$+$ the \emph{serial addition} or \emph{series addition} operator. With~(\ref{noth:420:10}) and a bit of arithmetic, the several parallel-addition identities of Table~\ref{noth:420:tbl-plad} are soon derived. 
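% In Octave one might sketch the parallel sum thus (an illustrative
% definition only, since Octave has no built-in operator for it; the
% figures are decimal):
%   par = @(x) 1/sum(1./x);
%   par([30 40 60])            % 120/9, about 13.3 seconds per brick
%   2160 * par([30 40 60])     % 28,800 seconds, which is 8 hours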
\begin{table} \caption{Parallel and serial addition identities.} \label{noth:420:tbl-plad} \[ \renewcommand{\arraystretch}{2.0} \br{rclcrcl} \ds\frac{1}{a \,\|\, b} &=& \ds\frac{1}{a} + \frac{1}{b} &\ \ \ & \ds\frac{1}{a + b} &=& \ds\frac{1}{a} \,\|\, \frac{1}{b} \\ \ds a \,\|\, b &=& \ds \frac{ab}{a+b} &\sh{1.0}& \ds a + b &=& \ds \frac{ab}{a\,\|\,b} \\ \ds a \,\|\, \frac{1}{b} &=& \ds\frac{a}{1+ab} &\sh{1.0}& \ds a + \frac{1}{b} &=& \ds\frac{a}{1\,\|\,ab} \\ \ds a \,\|\, b &=& \ds b \,\|\, a &\sh{1.0}& \ds a + b &=& \ds b + a \\ \ds a \,\|\, (b\,\|\,c) &=& \ds (a\,\|\,b) \,\|\,c &\sh{1.0}& \ds a + (b+c) &=& \ds (a+b) + c \\ \ds a \,\|\, \infty = \ds \infty \,\|\, a &=& \ds a &\sh{1.0}& \ds a + 0 = \ds 0 + a &=& \ds a \\ \ds a \,\|\, (-a) &=& \ds \infty &\sh{1.0}& \ds a + (-a) &=& \ds 0 \\ \ds (a)(b\,\|\,c) &=& \ds ab \,\|\, ac &\sh{1.0}& \ds (a)(b+c) &=& \ds ab + ac \\ \ds \frac{1}{\sum_k\|\, a_k} &=& \ds \sum_k \frac{1}{a_k} &\sh{1.0}& \ds \frac{1}{\sum_k a_k} &=& \ds \sum_k\|\, \frac{1}{a_k} \er \] \end{table} The writer knows of no conventional notation for parallel sums of series, but suggests that the notation which appears in the table, \[ \sum_{k=a}^{b}\|\,f(k) \equiv f(a) \,\|\, f(a+1) \,\|\, f(a+2) \,\|\, \cdots \,\|\, f(b), \] might serve if needed. \index{iff} Assuming that none of the values involved is negative, one can readily show that% \footnote{ The word \emph{iff} means, ``if and only if.'' } \bq{noth:420:28} a\,\|\,x \le b\,\|\,x \ \ \mbox{iff}\ \ a \le b. \eq This is intuitive. Counterintuitive, perhaps, is that \bq{noth:420:29} a\,\|\,x \le a. \eq Because we have all learned as children to count in the sensible manner $1,2,3,4,5,\ldots$---rather than as $1,\frac 1 2,\frac 1 3,\frac 1 4, \frac 1 5,\ldots$---serial addition ($+$) seems more natural than parallel addition ($\|$) does. The psychological barrier is hard to breach, yet for many purposes parallel addition is in fact no less fundamental. Its rules are inherently neither more nor less complicated, as Table~\ref{noth:420:tbl-plad} illustrates; yet outside the electrical engineering literature the parallel addition notation is seldom seen.% \footnote{ In electric circuits, loads are connected in parallel as often as, in fact probably more often than, they are connected in series. Parallel addition gives the electrical engineer a neat way of adding the impedances of parallel-connected loads. } Now that you have seen it, you can use it. There is profit in learning to think both ways. (Exercise: counting from zero serially goes $0,1,2,3,4,5,\ldots$; how does the parallel analog go?)% \footnote{\cite[eqn.~1.27]{Sedra/Smith}} \index{parallel subtraction} \index{subtraction!parallel} Convention brings no special symbol for parallel subtraction, incidentally. One merely writes \[ a\,\|\,(-b), \] which means exactly what it appears to mean. \subsection{Averages} \label{noth:420.20} \index{average} \index{mean} \index{productivity} \index{businessman} \index{contract} Let us return to the problem of the preceding section. Among the three masons, what is their average productivity? The answer depends on how you look at it. On the one hand, \[ \frac{120 + 90 + 60\ \mbox{bricks per hour}}{3} = 90\ \mbox{bricks per hour}. \] On the other hand, \[ \frac{30 + 40 + 60\ \mbox{seconds per brick}}{3} = 43{\textstyle\frac{1}{3}}\ \mbox{seconds per brick}. \] These two figures are not the same. That is, $1/(43{\textstyle\frac{1}{3}}\ \mbox{seconds per brick}) \neq 90\ \mbox{bricks per hour}$. Yet both figures are valid. 
Which figure you choose depends on what you want to calculate. A common mathematical error among businessmen seems to be to fail to realize that both averages are possible and that they yield different numbers (if the businessman quotes in bricks per hour, the productivities average one way; if in seconds per brick, the other way; yet some businessmen will never clearly consider the difference). Realizing this, the clever businessman might negotiate a contract so that the average used worked to his own advantage.% \footnote{ ``And what does the author know about business?'' comes the rejoinder. The rejoinder is fair enough. If the author wanted to demonstrate his business acumen (or lack thereof) he'd do so elsewhere not here! There are a lot of good business books out there and this is not one of them. The fact remains nevertheless that businessmen sometimes use mathematics in peculiar ways, making relatively easy problems harder and more mysterious than the problems need to be. If you have ever encountered the little monstrosity of an approximation banks (at least in the author's country) actually use in place of~(\ref{inttx:240:29}) to accrue interest and amortize loans, then you have met the difficulty. Trying to convince businessmen that their math is wrong, incidentally, is in the author's experience usually a waste of time. Some businessmen are mathematically rather sharp---as you presumably are if you are in business and are reading these words---but as for most: when real mathematical ability is needed, that's what they hire engineers, architects and the like for. The author is not sure, but somehow he doubts that many boards of directors would be willing to bet the company on a financial formula containing some mysterious-looking~$e^x$. Business demands other talents. } \index{United States} \index{American} \index{House of Representatives} \index{Representatives, House of} \index{representative} \index{seat} \index{apportionment} \index{Constitution of the United States} \index{statute} \index{population} \index{republic} When it is unclear which of the two averages is more appropriate, a third average is available, the \emph{geometric mean} \[ \left[(120)(90)(60)\right]^{1/3} \ \mbox{bricks per hour}. \] The geometric mean does not have the problem either of the two averages discussed above has. The inverse geometric mean \[ \left[(30)(40)(60)\right]^{1/3} \ \mbox{seconds per brick} \] implies the same average productivity. The mathematically savvy sometimes prefer the geometric mean over either of the others for this reason.% % diagn: review the following footnote briefly once again.% \footnote{ The writer, an American, was recently, pleasantly surprised to learn that the formula his country's relevant federal statute stipulates to implement the Constitutional requirement that representation in the country's federal House of Representatives be apportioned by population actually, properly always apportions the next available seat in the House to the state whose \emph{geometric} mean of population per representative before and after apportionment would be greatest. Now, admittedly, the Republic does not rise or fall on the niceties of averaging techniques; but, nonetheless, some American who knew his mathematics was involved in the drafting of that statute! 
} \index{mean!arithmetic} \index{arithmetic mean} \index{mean!geometric} \index{geometric mean} \index{harmonic mean} \index{mean!harmonic} Generally, the \emph{arithmetic, geometric} and \emph{harmonic means} are defined \bqa \mu &\equiv& \frac{\sum_k w_kx_k}{\sum_k w_k} = \left(\sum_k\|\, \frac{1}{w_k}\right) \left(\sum_k w_kx_k\right), \label{noth:420:31}\\ \mu_\Pi &\equiv& \left[ \prod_j x_j^{w_j} \right]^{1/\sum_k w_k} = \left[ \prod_j x_j^{w_j} \right]^{\sum_k\|\, 1/w_k}, \label{noth:420:32}\\ \mu_\| &\equiv& \frac{\sum_k\|\, x_k/w_k}{\sum_k\|\, 1/w_k} = \left(\sum_k w_k\right) \left(\sum_k\|\, \frac{x_k}{w_k}\right), \label{noth:420:33} \eqa where the~$x_k$ are the several samples and the~$w_k$ are weights. For two samples weighted equally, these are \bqa \mu &=& \frac{a+b}{2}, \label{noth:420:36}\\ \mu_\Pi &=& \sqrt{ab}, \label{noth:420:37}\\ \mu_\| &=& 2 (a\,\|\,b). \label{noth:420:38} \eqa \index{proving backward} If $a\ge 0$ and $b\ge 0$, then by successive steps,% \footnote{\label{noth:420:85}% The steps are logical enough, but the motivation behind them remains inscrutable until the reader realizes that the writer originally worked the steps out backward with his pencil, from the last step to the first. Only then did he reverse the order and write the steps formally here. The writer had no idea that he was supposed to start from $0 \le (a-b)^2$ until his pencil working backward showed him. ``Begin with the end in mind,'' the saying goes. In this case the saying is right. The same reading strategy often clarifies inscrutable math. When you can follow the logic but cannot understand what could possibly have inspired the writer to conceive the logic in the first place, try reading backward. }% \[ \renewcommand{\arraystretch}{2.0} \br{rcccl} \ds 0 &\le& \multicolumn{3}{l}{\ds (a-b)^2,} \\ \ds 0 &\le& \multicolumn{3}{l}{\ds a^2-2ab+b^2,} \\ \ds 4ab &\le& \multicolumn{3}{l}{\ds a^2+2ab+b^2,} \\ \ds 2\sqrt{ab} &\le& \multicolumn{3}{l}{\ds a+b,} \\ \ds \frac{2\sqrt{ab}}{a+b} &\le& \ds 1 &\le& \ds \frac{a+b}{2\sqrt{ab}}, \\ \ds \frac{2ab}{a+b} &\le& \ds \sqrt{ab} &\le& \ds \frac{a+b}{2}, \\ \ds 2 (a\,\|\,b) &\le& \ds \sqrt{ab} &\le& \ds \frac{a+b}{2}. \er \] That is, \bq{noth:420:40} \mu_\| \le \mu_\Pi \le \mu. \eq The arithmetic mean is greatest and the harmonic mean, least; with the geometric mean falling between. Does~(\ref{noth:420:40}) hold when there are several nonnegative samples of various nonnegative weights? To show that it does, consider the case of $N=2^m$ nonnegative samples of equal weight. Nothing prevents one from dividing such a set of samples in half, considering each subset separately, for if~(\ref{noth:420:40}) holds for each subset individually then surely it holds for the whole set (this is so because the average of the whole set is itself the \emph{average of the two subset averages,} where the word ``average'' signifies the arithmetic, geometric or harmonic mean as appropriate). But each subset can further be divided in half, then each subsubset can be divided in half again, and so on until each smallest group has two members only---in which case we already know that~(\ref{noth:420:40}) obtains. Starting there and recursing back, we have that~(\ref{noth:420:40}) obtains for the entire set. Now consider that a sample of any weight can be approximated arbitrarily closely by several samples of weight $1/2^m$, provided that~$m$ is sufficiently large. 
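% An Octave check of the ordering of the three means against the
% masons' figures (illustrative only; equal weights, decimal figures):
%   x = [120 90 60];                  % bricks per hour
%   mu_par = length(x)/sum(1./x)      % harmonic mean, about 83.1
%   mu_Pi  = prod(x)^(1/length(x))    % geometric mean, about 86.5
%   mu     = mean(x)                  % arithmetic mean, exactly 90
% The three come out in the order the inequality requires.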
By this reasoning,~(\ref{noth:420:40}) holds for any nonnegative weights of nonnegative samples, which was to be demonstrated. derivations-0.53.20120414.orig/tex/vector.tex0000644000000000000000000031402011742566274017163 0ustar rootroot% ---------------------------------------------------------------------- % This chapter's title is named again later in the book. Thus this hack % (which will not work if the title is named again earlier in the book; % I should look up the right way to do this). ---THB--- \newcommand\chaptertitlevector{Vector analysis} \chapter{\chaptertitlevector} \label{vector} \index{vector} \index{vector analysis} \index{vector!three-dimensional geometrical} \index{three-dimensional geometrical vector} \index{geometrical vector} \index{vector algebra} \index{algebra!of the vector} \index{physical world} \index{world, physical} Leaving the matrix, this chapter and the next turn to a curiously underappreciated agent of applied mathematics, the three-dimensional geometrical vector, first met in \S\S~\ref{trig:230}, \ref{trig:240} and~\ref{trig:277}. Seen from one perspective, the three-dimensional geometrical vector is the $n=3$ special case of the general, $n$-dimensional vector of Chs.~\ref{matrix} through~\ref{eigen}. Because its three elements represent the three dimensions of the physical world, however, the three-dimensional geometrical vector merits closer attention and special treatment.% \footnote{\cite[Ch.~2]{Cheng}} \index{vector!matrix} \index{matrix vector} \index{vector!$n$-dimensional} \index{$n$-dimensional vector} It also merits a shorter name. Where the geometrical context is clear---as it is in this chapter and the next---we will call the three-dimensional geometrical vector just a \emph{vector.} A name like ``matrix vector'' or ``$n$-dimensional vector'' can disambiguate the vector of Chs.~\ref{matrix} through~\ref{eigen} where necessary but, since the three-dimensional geometrical vector is in fact a vector, it usually is not necessary to disambiguate. The lone word \emph{vector} serves. \index{amplitude} \index{direction} \index{coordinate} \index{rectangular coordinates} \index{cylindrical coordinates} \index{spherical coordinates} In the present chapter's context and according to \S~\ref{trig:230}, a vector consists of an amplitude of some kind plus a direction. Per \S~\ref{trig:277}, three scalars called \emph{coordinates} suffice together to specify the amplitude and direction and thus the vector, the three being $(x,y,z)$ in the rectangular coordinate system, $(\rho;\phi,z)$ in the cylindrical coordinate system, or $(r;\theta;\phi)$ in the spherical coordinate system---as Fig.~\ref{vector:fig-sphere} illustrates and Table~\ref{trig:277:20} on page~\pageref{trig:277:20} interrelates---among other, more exotic possibilities (\S~\ref{vector:280}).% \begin{figure} \caption[A point on a sphere.]{% A point on a sphere, in spherical $(r;\theta;\phi)$ and cylindrical $(\rho;\phi,z)$ coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the~$\hat z$ axis from the cylindrical coordinate~$z$. See also Fig.~\ref{vector:230:fig-spherical-basis}.)% }% \label{vector:fig-sphere} \index{sphere} \sphere \end{figure} \index{vector notation} \index{notation!of the vector} \index{rotation} \index{axes!rotation of} The vector brings an elegant notation. This chapter and Ch.~\ref{vcalc} detail it.
Without the notation, one would write an expression like \[ \frac{ (z-z') -\left[\partial z'/\partial x\right]_{x=x',y=y'} (x-x') -\left[\partial z'/\partial y\right]_{x=x',y=y'} (y-y') }{ \sqrt{ \left[ 1 + (\partial z'/\partial x)^2 + (\partial z'/\partial y)^2 \right]_{x=x',y=y'} \left[(x-x')^2 + (y-y')^2 + (z-z')^2\right] } } \] for the aspect coefficient relative to a local surface normal (and if the sentence's words do not make sense to you yet, don't worry; just look the symbols over and appreciate the expression's bulk). The same coefficient in standard vector notation is \[ \vu n \cdot \Delta \vu r. \] Besides being more evocative (once one has learned to read it) and much more compact, the standard vector notation brings the major advantage of freeing a model's geometry from reliance on any particular coordinate system. Reorienting axes (\S~\ref{vector:210}) for example knots the former expression like spaghetti but does not disturb the latter expression at all. \index{vector!two-dimensional geometrical} \index{two-dimensional geometrical vector} Two-dimensional geometrical vectors arise in practical modeling about as often as three-dimensional geometrical vectors do. Fortunately, the two-dimensional case needs little special treatment, for it is just the three-dimensional with $z=0$ or $\theta = 2\pi/4$ (see however \S~\ref{vector:270}). \index{number!real} \index{real number} \index{coordinate!real} \index{coordinate!complex} \index{real coordinate} \index{complex coordinate} Here at the outset, a word on complex numbers seems in order. Unlike most of the rest of the book this chapter and the next will work chiefly in real numbers, or at least in real coordinates. Notwithstanding, complex coordinates are possible. Indeed, in the rectangular coordinate system complex coordinates are perfectly appropriate and are straightforward enough to handle. The cylindrical and spherical systems however, which these chapters also treat, were not conceived with complex coordinates in mind; and, although it might with some theoretical subtlety be possible to treat complex radii, azimuths and elevations consistently as three-dimensional coordinates, these chapters will not try to do so.% \footnote{ The author would be interested to learn if there existed an uncontrived scientific or engineering application that actually used complex, nonrectangular coordinates. } (This is not to say that you cannot have a complex vector like, say, $\wu\rho[3+j2] - \wu\phi[1/4]$ in a nonrectangular basis. You can have such a vector, it is fine, and these chapters will not avoid it. What these chapters will avoid are complex nonrectangular \emph{coordinates} like $[3+j2;-1/4,0]$.) \index{vector!addition of} \index{addition!of vectors} Vector addition will already be familiar to the reader from Ch.~\ref{trig} or (quite likely) from earlier work outside this book. This chapter therefore begins with the reorientation of axes in \S~\ref{vector:210} and vector multiplication in \S~\ref{vector:220}. % ---------------------------------------------------------------------- \section{Reorientation} \label{vector:210} \index{reorientation} \index{axes!reorientation of} Matrix notation expresses the rotation of axes~(\ref{trig:240:10}) as \[ \mf{c}{\vu x' \\ \vu y' \\ \vu z'} = \mf{ccc}{ \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{c}{\vu x \\ \vu y \\ \vu z}. \] In three dimensions however one can do more than just to rotate the~$x$ and~$y$ axes about the~$z$. 
One can reorient the three axes generally as follows. \subsection{The Tait-Bryan rotations} \label{vector:210.20} \index{rotation!Tait-Bryan or Cardano} \index{Tait-Bryan rotations} \index{Cardano, Girolamo (also known as Cardanus or Cardan, 1501--1576)} \index{Cardano rotations} \index{Tait, Peter Guthrie (1831--1901)} \index{Bryan, George Hartley (1864--1928)} \index{yaw} \index{pitch} \index{roll} With a \emph{yaw} and a \emph{pitch} to point the~$x$ axis in the desired direction plus a \emph{roll} to position the~$y$ and~$z$ axes as desired about the new~$x$ axis,% \footnote{ The English maritime verbs \emph{to yaw, to pitch} and \emph{to roll} describe the rotational motion of a vessel at sea. For a vessel to yaw is for her to rotate about her vertical axis, so her bow (her forwardmost part) yaws from side to side. For a vessel to pitch is for her to rotate about her ``beam axis,'' so her bow pitches up and down. For a vessel to roll is for her to rotate about her ``fore-aft axis'' such that she rocks or lists (leans) without changing the direction she points~\cite[``Glossary of nautical terms,'' 23:00, 20 May 2008]{wikip}\@. In the Tait-Bryan rotations as explained in this book, to yaw is to rotate about the~$z$ axis, to pitch about the~$y$, and to roll about the~$x$~\cite{Jones/Fjeld}. In the Euler rotations as explained in this book later in the present section, however, the axes are assigned to the vessel differently such that to yaw is to rotate about the~$x$ axis, to pitch about the~$y$, and to roll about the~$z$. This implies that the Tait-Bryan vessel points $x$-ward whereas the Euler vessel points $z$-ward. The reason to shift perspective so is to maintain the semantics of the symbols~$\theta$ and~$\phi$ (though not~$\psi$) according to Fig.~\ref{vector:fig-sphere}. If this footnote seems confusing, then read~(\ref{vector:210:20}) and~(\ref{vector:210:40}) which are correct. } one can reorient the three axes generally: \bq{vector:210:20} \mf{c}{\vu x' \\ \vu y' \\ \vu z'} = \mf{ccc}{ 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi } \mf{ccc}{ \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta } \mf{ccc}{ \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{c}{\vu x \\ \vu y \\ \vu z}; \eq or, inverting per~(\ref{trig:240:12}), \bq{vector:210:25} \mf{c}{\vu x \\ \vu y \\ \vu z} = \mf{ccc}{ \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{ccc}{ \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta } \mf{ccc}{ 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi } \mf{c}{\vu x' \\ \vu y' \\ \vu z'}. \eq These are called the \emph{Tait-Bryan rotations,} or alternately the \emph{Cardano rotations.}% \footnote{ The literature seems to agree on no standard order among the three Tait-Bryan rotations; and, though the rotational angles are usually named~$\phi$, $\theta$ and~$\psi$, which angle gets which name admittedly depends on the author. }$\mbox{}^{,}$% \footnote{ \cite{CERN-Alice-delta} } Notice in~(\ref{vector:210:20}) and~(\ref{vector:210:25}) that the transpose (though curiously not the adjoint) of each $3 \times 3$ Tait-Bryan factor is also its inverse. In concept, the Tait-Bryan equations~(\ref{vector:210:20}) and~(\ref{vector:210:25}) say nearly all one needs to say about reorienting axes in three dimensions; but, still, the equations can confuse the uninitiated. Consider a vector \bq{vector:210:v} \ve v = \vu x x + \vu y y + \vu z z. 
\eq It is not the vector one reorients but rather the axes used to describe the vector. Envisioning the axes as in Fig.~\ref{vector:fig-sphere} with the~$z$ axis upward, one first yaws the~$x$ axis through an angle~$\phi$ toward the~$y$ then pitches it downward through an angle~$\theta$ away from the~$z$. Finally, one rolls the~$y$ and~$z$ axes through an angle~$\psi$ about the new~$x$, all the while maintaining the three axes rigidly at right angles to one another. These three Tait-Bryan rotations can orient axes any way. Yet, even once one has clearly visualized the Tait-Bryan sequence, the prospect of applying~(\ref{vector:210:25}) (which inversely represents the sequence) to~(\ref{vector:210:v}) can still seem daunting until one rewrites the latter equation in the form \bq{vector:210:28} \ve v = \left[ \br{ccc} \vu x & \vu y & \vu z \er \right] \left[ \br{c} x\\y\\z \er \right], \eq after which the application is straightforward. There results \[ \ve v' = \vu x' x' + \vu y' y' + \vu z' z', \] where \bq{vector:210:30} \mf{c}{x' \\ y' \\ z'} \equiv \mf{ccc}{ 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi } \mf{ccc}{ \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta } \mf{ccc}{ \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{c}{x \\ y \\ z}, \eq and where Table~\ref{trig:277:20} converts to cylindrical or spherical coordinates if and as desired. Since~(\ref{vector:210:30}) resembles~(\ref{vector:210:20}), it comes as no surprise that its inverse, \bq{vector:210:35} \mf{c}{x \\ y \\ z} = \mf{ccc}{ \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{ccc}{ \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta } \mf{ccc}{ 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi } \mf{c}{x' \\ y' \\ z'}, \eq resembles~(\ref{vector:210:25}). \subsection{The Euler rotations} \label{vector:210.25} \index{rotation!Euler} \index{Euler rotations} \index{Euler, Leonhard (1707--1783)} A useful alternative to the Tait-Bryan rotations are the \emph{Euler rotations,} which view the problem of reorientation from the perspective of the~$z$ axis rather than of the~$x$. The Euler rotations consist of a roll and a pitch followed by another roll, without any explicit yaw:% \footnote{ As for the Tait-Bryan, for the Euler also the literature agrees on no standard sequence. What one author calls a pitch, another might call a yaw, and some prefer to roll twice about the~$x$ axis rather than the~$z$. What makes a reorientation an Euler rather than a Tait-Bryan is that the Euler rolls twice. } \bq{vector:210:40} \mf{c}{\vu x' \\ \vu y' \\ \vu z'} = \mf{ccc}{ \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 } \mf{ccc}{ \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta } \mf{ccc}{ \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{c}{\vu x \\ \vu y \\ \vu z}; \eq and inversely \bq{vector:210:45} \mf{c}{\vu x \\ \vu y \\ \vu z} = \mf{ccc}{ \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 } \mf{ccc}{ \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta } \mf{ccc}{ \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 } \mf{c}{\vu x' \\ \vu y' \\ \vu z'}. \eq Whereas the Tait-Bryan point the~$x$ axis first, the Euler tactic is to point first the~$z$. So, that's it. 
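(Two quick, informal checks---no part of the formal development---may help to fix the ideas. First, setting $\theta = 0$ and $\psi = 0$ in~(\ref{vector:210:20}) reduces its left and middle factors to identity matrices, recovering the plain rotation about the~$z$ axis with which the section began. Second, setting $\theta = 0$ in~(\ref{vector:210:40}) leaves two successive rolls about the same axis, which the angle-sum trigonometrics merge into a single roll through $\psi + \phi$; had all three rotations shared one axis, they would have merged likewise, leaving most orientations unreachable---hence the requirement, stated next, that the middle rotation turn about a different axis than its neighbors.)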
One can reorient three axes arbitrarily by rotating them in pairs about the~$z$, $y$ and~$x$ or the~$z$, $y$ and~$z$ axes in sequence---or, generalizing, in pairs about any of the three axes so long as the axis of the middle rotation differs from the axes (Tait-Bryan) or axis (Euler) of the first and last.
A firmer grasp of the reorientation of axes in three dimensions comes with practice, but those are the essentials of it.
% ----------------------------------------------------------------------
\section{Multiplication}
\label{vector:220}
\index{vector!multiplication of} \index{multiplication!of a vector} \index{product!of vectors} \index{multiplication!of a vector by a scalar} \index{scalar multiplication!of a vector} \index{vector!scalar multiplication of} \index{product!of a vector and a scalar}
One can multiply a vector in any of three ways.
The first, scalar multiplication, is trivial: if a vector~$\ve v$ is as defined by~(\ref{vector:210:v}), then
\bq{vector:220:02}
 \psi \ve v = \vu x \psi x + \vu y \psi y + \vu z \psi z.
\eq
Such scalar multiplication evidently scales a vector's length without diverting its direction.
The other two forms of vector multiplication involve multiplying a vector by another vector and are the subjects of the two subsections that follow.
\subsection{The dot product}
\label{vector:220.20}
\index{dot product} \index{vector!dot or inner product of two} \index{cosine} \index{angle} \index{invariance under the reorientation of axes} \index{reorientation!invariance under} \index{axes!invariance under the reorientation of} \index{orthogonal vector} \index{vector!orthogonal} \index{vector!angle between two}
We first met the dot product in \S~\ref{mtxinv:445}.
It works similarly for the geometrical vectors of this chapter as for the matrix vectors of Ch.~\ref{mtxinv}:
\bq{vector:dot1}
 \ve v_1 \cdot \ve v_2 = x_1 x_2 + y_1 y_2 + z_1 z_2,
\eq
which, if the vectors~$\ve v_1$ and~$\ve v_2$ are real, is the product of the two vectors to the extent to which they run in the same direction.
It is the product to the extent to which the vectors run in the same direction because one can reorient axes to point~$\vu x'$ in the direction of~$\ve v_1$, whereupon $\ve v_1 \cdot \ve v_2 = x_1'x_2'$ since~$y_1'$ and~$z_1'$ have vanished.
Naturally, to be valid, the dot product must not vary under a reorientation of axes; and indeed if we write~(\ref{vector:dot1}) in matrix notation,
\bq{vector:dotm}
 \ve v_1 \cdot \ve v_2 = \left[ \br{ccc} x_1 & y_1 & z_1 \er \right] \left[ \br{c} x_2 \\ y_2 \\ z_2 \er \right],
\eq
and then expand each of the two factors on the right according to~(\ref{vector:210:35}), we see that the dot product does not in fact vary.
As in~(\ref{mtxinv:445:35}) of \S~\ref{mtxinv:445}, here too the relationship
\bq{vector:220:23}
 \begin{split}
  \ve v_1^{*} \cdot \ve v_2 &= v_1^{*} v_2 \cos \theta, \\
  \vu v_1^{*} \cdot \vu v_2 &= \cos \theta,
 \end{split}
\eq
gives the angle~$\theta$ between two vectors according to Fig.~\ref{trig:226:f1}'s cosine if the vectors are real, by definition hereby if complex.
Consequently, the two vectors are mutually orthogonal---that is, the vectors run at right angles $\theta = 2\pi/4$ to one another---if and only if
\[
 \ve v_1^{*} \cdot \ve v_2 = 0.
\]
\index{commutivity!of the dot product} \index{dot product!commutivity of}
That the dot product is commutative,
\bq{vector:220:26}
 \ve v_2 \cdot \ve v_1 = \ve v_1 \cdot \ve v_2,
\eq
is obvious from~(\ref{vector:dot1}).
Fig.~\ref{vector:220:fig-dot} illustrates the dot product.
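For a small, contrived example of~(\ref{vector:dot1}) and~(\ref{vector:220:23})---the numbers being chosen arbitrarily for illustration---let $\ve v_1 = \vu x + \vu y$ and $\ve v_2 = \vu x 2$, both real. Then
\[
 \ve v_1 \cdot \ve v_2 = (1)(2) + (1)(0) + (0)(0) = 2,
\]
while $v_1 = \sqrt 2$ and $v_2 = 2$, whence $\cos\theta = 2/(2\sqrt 2) = 1/\sqrt 2$ and $\theta = 2\pi/8$, just as a sketch of the two vectors would lead one to expect.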
\begin{figure}
\caption{The dot product.}
\label{vector:220:fig-dot}
\bc
\nc\fxa{-0.7} \nc\fxb{6.1} \nc\fya{-0.8} \nc\fyb{2.0}
\begin{pspicture}(\fxa,\fya)(\fxb,\fyb)
%\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb)
{
 \small
 \psset{dimen=middle}
 \psline[linewidth=2.0pt]{<->}(4,0)(0,0)(2.7713,1.6)
 {
  \psset{linewidth=0.5pt}
  \psarc{->}(0,0){0.8}{0}{30}
  \rput(1.00,0.27){$\theta$}
  \rput(2.7713,0){
   \nc\xxa{0.3}
   \psline[linestyle=dashed](0,1.6)(0,0)
   \psline(-\xxa,0)(-\xxa,\xxa)(0,\xxa)
  }
  \rput(1.3256,-0.25){
   \nc\xxa{1.4457} \nc\xxb{0.1} \nc\xxc{0.65}
   \psline{->}( \xxc,0)( \xxa,0)
   \psline{->}(-\xxc,0)(-\xxa,0)
   \psline( \xxa,\xxb)( \xxa,-\xxb)
   \psline(-\xxa,\xxb)(-\xxa,-\xxb)
   \rput[c](0,0){$b\cos\theta$}
  }
 }
 \rput(4.25,0.00){$\ve a$}
 \rput(1.45,1.20){$\ve b$}
 \rput[l](3.50,0.90){$\ve a \cdot \ve b = ab \cos \theta$}
}
\end{pspicture}
\ec
\end{figure}
\subsection{The cross product}
\label{vector:220.30}
\index{cross product} \index{mnemonic}
The dot product of two vectors according to \S~\ref{vector:220.20} is a scalar.
One can also multiply two vectors to obtain a vector, however, and it is often useful to do so.
As the dot product is the product of two vectors to the extent to which they run in the same direction, the \emph{cross product} is the product of two vectors to the extent to which they run in different directions.
Unlike the dot product the cross product is a vector, defined in rectangular coordinates as
\bqa
 \ve v_1 \times \ve v_2 &=& \left| \br{ccc} \vu x & \vu y & \vu z \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \er \right|
 \label{vector:cross1}
 \\&\equiv&
 % The following throws the above off-center, which visually does
 % not look right, but available alternatives seem worse.
 \vu x (y_1z_2 - z_1y_2) + \vu y (z_1x_2 - x_1z_2) + \vu z (x_1y_2 - y_1x_2),
 \xn
\eqa
where the~$\left|\cdot\right|$ notation is a mnemonic (actually a pleasant old determinant notation \S~\ref{eigen:310} could have but did not happen to use) whose semantics are as shown.
\index{invariance under the reorientation of axes} \index{reorientation!invariance under} \index{axes!invariance under the reorientation of}
As the dot product, the cross product too is invariant under reorientation.
One could demonstrate this fact by multiplying out~(\ref{vector:210:25}) and~(\ref{vector:210:35}) then substituting the results into~(\ref{vector:cross1}): a lengthy, unpleasant exercise.
Fortunately, it is also an unnecessary exercise; for, inasmuch as a reorientation consists of three rotations in sequence, it suffices merely that rotation about one axis not alter the cross product.
One proves the proposition in the latter form by setting any two of~$\phi$, $\theta$ and~$\psi$ to zero before multiplying out and substituting.
Several facets of the cross product draw attention to themselves.
\bi
\item
\index{cyclic progression of coordinates} \index{progression of coordinates, cyclic} \index{coordinates!cyclic progression of} \index{parity} \index{right-hand rule}
The cyclic progression
\bq{vector:220:36}
 \cdots \ra x \ra y \ra z \ra x \ra y \ra z \ra x \ra y \ra \cdots
\eq
of~(\ref{vector:cross1}) arises again and again in vector analysis.
Where the progression is honored, as in~$\vu z x_1y_2$, the associated term bears a~$+$ sign, otherwise a~$-$ sign, due to \S~\ref{matrix:322}'s parity principle and the right-hand rule.
\item
\index{commutivity!noncommutivity of the cross product} \index{noncommutivity!of the cross product} \index{cross product!noncommutivity of}
The cross product is not commutative.
In fact, \bq{vector:220:32} \ve v_2 \times \ve v_1 = -\ve v_1 \times \ve v_2, \eq which is a direct consequence of the previous point regarding parity, or which can be seen more prosaically in~(\ref{vector:cross1}) by swapping the places of~$\ve v_1$ and~$\ve v_2$. \item \index{associativity!nonassociativity of the cross product} \index{nonassociativity!of the cross product} \index{cross product!nonassociativity of} The cross product is not associative. That is, \[ (\ve v_1 \times \ve v_2) \times \ve v_3 \neq \ve v_1 \times (\ve v_2 \times \ve v_3), \] as is proved by a suitable counterexample like $\ve v_1 = \ve v_2 = \vu x$, $\ve v_3 = \vu y$. \item \index{cross product!perpendicularity of} The cross product runs perpendicularly to each of its two factors if the vectors involved are real. That is, \bq{vector:220:35} \ve v_1 \cdot ( \ve v_1 \times \ve v_2 ) = 0 = \ve v_2 \cdot ( \ve v_1 \times \ve v_2 ), \eq as is seen by substituting~(\ref{vector:cross1}) into~(\ref{vector:dot1}) with an appropriate change of variables and simplifying. \item \index{space!three-dimensional} \index{three-dimensional space} \index{space!two-dimensional} \index{two-dimensional space} Unlike the dot product, the cross product is closely tied to three-dimensional space. Two-dimensional space (a plane) can have a cross product so long as one does not mind that the product points off into the third dimension, but to speak of a cross product in four-dimensional space would require arcane definitions and would otherwise make little sense. Fortunately, the physical world is three-dimensional (or, at least, the space in which we model all but a few, exotic physical phenomena is three-dimensional), so the cross product's limitation as defined here to three dimensions will seldom if ever disturb us. \item \index{sine} \index{angle} \index{electromagnetic power} \index{power!electromagnetic} Section~\ref{vector:220.20} %Equation~(\ref{vector:220:23}) has related the cosine of the angle between vectors to the dot product. One can similarly relate the angle's sine to the cross product if the vectors involved are real, as \bq{vector:220:33} \begin{split} |\ve v_1 \times \ve v_2| &= v_1v_2 \sin \theta, \\ |\vu v_1 \times \vu v_2| &= \sin \theta, \end{split} \eq demonstrated by reorienting axes such that $\vu v_1 = \vu x'$, that~$\vu v_2$ has no component in the~$\vu z'$ direction, and that~$\vu v_2$ has only a nonnegative component in the~$\vu y'$ direction; by remembering that reorientation cannot alter a cross product; and finally by applying~(\ref{vector:cross1}) and comparing the result against Fig.~\ref{trig:226:f1}'s sine. (If the vectors involved are complex then nothing prevents the operation $|\ve v_1^{*} \times \ve v_2|$ by analogy with eqn.~\ref{vector:220:23}---in fact the operation $\ve v_1^{*} \times \ve v_2$ without the magnitude sign is used routinely to calculate electromagnetic power flow% \footnote{\cite[eqn.~1-51]{Harrington}}---% but each of the cross product's three rectangular components has its own complex phase which the magnitude operation flattens, so the result's relationship to the sine of an angle is not immediately clear.) \ei Fig.~\ref{vector:220:fig-cross} illustrates the cross product. \begin{figure} \caption{The cross product.} \label{vector:220:fig-cross} \bc \nc\fxa{-2.5} \nc\fxb{4.5} \nc\fya{-1.7} \nc\fyb{3.4} \nc\xxs{0.20} % Perspective ratio of the x-y circle. \nc\xxphir{0.45} % Angle dimension radius. 
\nc\xxa{4.0} \nc\xxb{3.0} \nc\xxu{-75} \nc\xxv{60} \nc\xxw{2.8pt}
\begin{pspicture}(\fxa,\fya)(\fxb,\fyb)
%\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb)
\localscalebox{1.0}{1.0}{
 \small
 \psset{dimen=middle}
 % The slope of the plane's sides on the page is 4:1.
 \pspolygon[linewidth=0.5pt,linecolor=lightgray,fillstyle=solid,fillcolor=lightgray] (-1.5,1.1)(4.1,1.1)(3.5,-1.3)(-2.1,-1.3)
 \localscalebox{1.0}{\xxs}{
  \psarc[linewidth=1.0pt]{->}(0,0){\xxphir}{\xxu}{\xxv}
  \rput{\xxu}(0,0){\psline[linewidth=\xxw]{c->}(0,0)(\xxa,0)}
  \rput{\xxv}(0,0){\psline[linewidth=\xxw]{c->}(0,0)(\xxa,0)}
 }
 \psline[linewidth=2.0pt]{->}(0,0)(0,\xxb)
 {
  \psset{linewidth=0.5pt}
  % Here is an Octave function to calculate the coordinates for
  % the right-angle symbols:
  % function res = coords(r,phi,a)
  % p = phi/180*pi;
  % res(1)=0;
  % res(2)=r;
  % res(3)=r*cos(p);
  % res(4)=res(2)+a*r*sin(p);
  % res(5)=res(3);
  % res(6)=res(4)-r;
  % end
  \psline(0.00000,0.45000)(0.11647,0.36307)(0.11647,-0.08693)
  \psline(0.00000,0.45000)(0.22500,0.52794)(0.22500,0.07794)
 }
 \rput(0.62,-0.08){$\theta$}
 \rput(1.25,-0.95){$\ve a$}
 \rput(2.25,0.85){$\ve b$}
 \rput[l](0.00,2.40){
  \setlength\arraycolsep{0.15em}
  $
  \br{rcl}
   \ve c &=& \ve a \times \ve b \\
   &=& \vu c ab \sin \theta
  \er
  $
 }
}
\end{pspicture}
\ec
\end{figure}
% ----------------------------------------------------------------------
\section{Orthogonal bases}
\label{vector:230}
\index{orthogonal basis} \index{basis, orthogonal}
A vector exists independently of the components by which one expresses it, for, whether $\ve q = \vu x x + \vu y y + \vu z z$ or $\ve q = \vu x' x' + \vu y' y' + \vu z' z'$, it remains the same vector~$\ve q$.
However, where a model involves a circle, a cylinder or a sphere, where a model involves a contour or a curved surface of some kind, to choose~$\vu x'$, $\vu y'$ and~$\vu z'$ wisely can immensely simplify the model's analysis.
Normally one requires that~$\vu x'$, $\vu y'$ and~$\vu z'$ each retain unit length, run perpendicularly to one another, and obey the right-hand rule (\S~\ref{trig:230}), but otherwise any~$\vu x'$, $\vu y'$ and~$\vu z'$ can serve.
Moreover, a model can specify various~$\vu x'$, $\vu y'$ and~$\vu z'$ under various conditions, for nothing requires the three to be constant.
\index{winding road} \index{road!winding} \index{automobile} \index{speed} \index{velocity} \index{wind} \index{headwind} \index{crosswind}
Recalling the constants and variables of \S~\ref{alggeo:231}, such a concept is flexible enough to confuse the uninitiated severely and soon.
As in \S~\ref{alggeo:231}, here too an example affords perspective.
Imagine driving your automobile down a winding road, where~$q$ represented your speed%
\footnote{\label{vector:230:fn1}%
 Conventionally one would prefer the letter~$v$ to represent speed, with velocity as~$\ve v$ which in the present example would happen to be $\ve v = \wu\ell v$.
 However, this section will require the letter~$v$ for an unrelated purpose.
}
and~$\wu\ell$ represented the direction the road ran, not generally, but just at the spot along the road at which your automobile momentarily happened to be.
That your velocity were~$\wu\ell q$ meant that you kept skilfully to your lane; on the other hand, that your velocity were $(\wu\ell \cos\psi + \vu v \sin\psi)q$---where~$\vu v$, at right angles to~$\wu\ell$, represented the direction right-to-left across the road---would have you drifting out of your lane at an angle~$\psi$.
A headwind had velocity~$-\wu\ell q_{\mr{wind}}$; a crosswind,~$\pm\vu v q_{\mr{wind}}$.
A car a mile ahead of you had velocity $\wu\ell_2q_2=(\wu\ell \cos\beta + \vu v \sin\beta)q_2$, where~$\beta$ represented the difference (assuming that the other driver kept skilfully to his own lane) between the road's direction a mile ahead and its direction at your spot. For all these purposes the unit vector~$\wu\ell$ would remain constant. However, fifteen seconds later, after you had rounded a bend in the road, the symbols~$\wu\ell$ and~$\vu v$ would by definition represent different vectors than before, with respect to which one would express your new velocity as~$\wu\ell q$ but would no longer express the headwind's velocity as~$-\wu\ell q_{\mr{wind}}$ because, since the road had turned while the wind had not, the wind would no longer be a headwind. And this is where confusion can arise: your own velocity had changed while the expression representing it had not; whereas the wind's velocity had not changed while the expression representing \emph{it} had. This is not because~$\wu\ell$ differs from place to place at a given moment, for like any other vector the vector~$\wu\ell$ (as defined in this particular example) is the same vector everywhere. Rather, it is because~$\wu\ell$ is defined relative to the road at your automobile's location, which location changes as you drive. \index{basis!constant} \index{basis!variable} \index{orthogonal basis!variable} \index{model} If a third unit vector~$\vu w$ were defined, perpendicular both to~$\wu\ell$ and to~$\vu v$ such that $[\wu\ell\;\vu v\;\vu w]$ obeyed the right-hand rule, then the three together would constitute an \emph{orthogonal basis.} Any three real,% \footnote{ A complex orthogonal basis is also theoretically possible but is normally unnecessary in geometrical applications and involves subtleties in the cross product. This chapter, which specifically concerns three-dimensional geometrical vectors rather than the general, $n$-dimensional vectors of Ch.~\ref{matrix}, is content to consider real bases only. Note that one can express a complex vector in a real basis. } right-handedly mutually perpendicular unit vectors $[\vu x'\;\vu y'\;\vu z']$ in three dimensions, whether constant or variable, for which \bq{vector:230:10} \index{right-hand rule} \setlength\arraycolsep{0.30\arraycolsep} \br{rclcrclcrcl} \vu y' \cdot \vu z' &=& 0, &\ \ \ \ & \vu y' \times \vu z' &=& \vu x', &\ \ \ \ & \Im(\vu x') &=& 0,\\ \vu z' \cdot \vu x' &=& 0, && \vu z' \times \vu x' &=& \vu y', && \Im(\vu y') &=& 0,\\ \vu x' \cdot \vu y' &=& 0, && \vu x' \times \vu y' &=& \vu z', && \Im(\vu z') &=& 0, \er \eq constitutes such an orthogonal basis, from which other vectors can be built. The geometries of some models suggest no particular basis, when one usually just uses a constant $[\vu x\;\vu y\;\vu z]$. The geometries of other models however do suggest a particular basis, often a variable one. \bi \item \index{contour} Where the model features a contour like the example's winding road, an $[\wu\ell\;\vu v\;\vu w]$ basis (or a $[\vu u\;\vu v\;\wu\ell]$ basis or even a $[\vu u\;\wu\ell\;\vu w]$ basis) can be used, where~$\wu\ell$ locally follows the contour. 
The variable unit vectors~$\vu v$ and~$\vu w$ (or~$\vu u$ and~$\vu v$, etc.)\ can be defined in any convenient way so long as they remain perpendicular to one another and to~$\wu\ell$---such that $(\vu z \times \wu\ell) \cdot \vu w = 0$ for instance (that is, such that~$\vu w$ lay in the plane of~$\vu z$ and~$\wu\ell$)---but if the geometry suggests a particular~$\vu v$ or~$\vu w$ (or~$\vu u$), like the direction right-to-left across the example's road, then that~$\vu v$ or~$\vu w$ should probably be used. The letter~$\ell$ here stands for ``longitudinal.''% \footnote{The assertion wants a citation, which the author lacks.} \item \index{normal unit vector} \index{perpendicular unit vector} \index{unit vector!normal or perpendicular} \index{surface!orientation of} \index{surface normal} \index{unit normal} \index{sea!wavy surface of} \index{wavy sea} Where the model features a curved surface like the surface of a wavy sea,% \footnote{\cite{PM:1964}} a $[\vu u\;\vu v\;\vu n]$ basis (or a $[\vu u\;\vu n\;\vu w]$ basis, etc.)\ % can be used, where~$\vu n$ points locally perpendicularly to the surface. The letter~$n$ here stands for ``normal,'' a synonym for ``perpendicular.'' Observe, incidentally but significantly, that such a unit normal~$\vu n$ tells one everything one needs to know about its surface's local orientation. \item Combining the last two, where the model features a contour along a curved surface, an $[\wu\ell\;\vu v\;\vu n]$ basis can be used. One need not worry about choosing a direction for~$\vu v$ in this case since necessarily $\vu v = \vu n \times \wu\ell$. \item \index{circle} \index{cylinder} \index{cylindrical basis} \index{basis!cylindrical} \index{axis!of a cylinder or circle} \index{azimuth} \index{up} \index{south} \index{east} Where the model features a circle or cylinder, a $[\wu\rho\;\wu\phi\;\vu z]$ basis can be used, where~$\vu z$ is constant and runs along the cylinder's axis (or perpendicularly through the circle's center),~$\wu\rho$ is variable and points locally away from the axis, and~$\wu\phi$ is variable and runs locally along the circle's perimeter in the direction of increasing azimuth~$\phi$. Refer to \S~\ref{trig:277} and Fig.~\ref{vector:230:fig-cylindrical-basis}. \begin{figure} \caption[The cylindrical basis.]{% The cylindrical basis. (The conventional symbols~\vectortoward\ and~\vectoraway\ respectively represent vectors pointing out of the page toward the reader and into the page away from the reader. Thus, this figure shows the constant basis vector~$\vu z$ pointing out of the page toward the reader. 
The dot in the middle of the~\vectortoward\ is supposed to look like the tip of an arrowhead.)% } \label{vector:230:fig-cylindrical-basis} \bc \nc\fxa{-1.2} \nc\fxb{4.0} \nc\fya{-1.2} \nc\fyb{4.0} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} {% \psset{linewidth=0.5pt}% \psline(-0.5,0)(3.5,0)% \psline(0,-0.5)(0,3.5)% \rput{30}(0,0){% \psline[linestyle=dashed](0,0)(2.85,0)% \rput(3.0,0){% \psset{linewidth=2.0pt}% \psline{->}(0.15,0)(0.8,0)% \rput{*0}(1.00,0){$\wu\rho$}% \rput{*0}(0.30,0.85){$\wu\phi$}% \psline{->}(0,0.15)(0,0.8)% \rput(0,0){\vectortoward}% \rput{*0}(0,0){% \psline[linewidth=0.5pt](-0.06471,-0.24148)(-0.25882,-0.96593)(-0.35882,-0.96593)% \rput(-0.52,-0.92){$\vu z$}% }% }% }% \psarc{->}{0.8}{0}{30}% \rput(1.00,0.27){$\phi$}% \rput(1.45,1.10){$\rho$}% }% \psarc[linewidth=1.0pt](0,0){3}{-15}{27.135}% \psarc[linewidth=1.0pt](0,0){3}{32.865}{105}% } \end{pspicture} \ec \end{figure} \item \index{sphere} \index{spherical basis} \index{basis!spherical} \index{azimuth} \index{elevation} Where the model features a sphere, an $[\vu r\;\wu\theta\;\wu\phi]$ basis can be used, where~$\vu r$ is variable and points locally away from the sphere's center,~$\wu\theta$ is variable and runs locally tangentially to the sphere's surface in the direction of increasing elevation~$\theta$ (that is, though not usually in the~$-\vu z$ direction itself, as nearly as possible to the~$-\vu z$ direction without departing from the sphere's surface), and~$\wu\phi$ is variable and runs locally tangentially to the sphere's surface in the direction of increasing azimuth~$\phi$ (that is, along the sphere's surface perpendicularly to~$\vu z$). Standing on the earth's surface, with the earth as the sphere,~$\vu r$ would be up,~$\wu\theta$ south, and~$\wu\phi$ east. Refer to \S~\ref{trig:277} and Fig.~\ref{vector:230:fig-spherical-basis}. \begin{figure} \caption[The spherical basis.]{% The spherical basis (see also Fig.~\ref{vector:fig-sphere}).% } \label{vector:230:fig-spherical-basis} \bc \nc\fxa{0.0} \nc\fxb{6.0} \nc\fya{-2.0} \nc\fyb{5.0} \nc\gxa{0.5} \nc\gxb{5.5} \nc\gya{-1.5} \nc\gyb{4.5} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) { \small \psset{dimen=middle} { \psset{linewidth=1.0pt} \psclip{\psframe[linewidth=0.0pt,linestyle=none](\gxa,\gya)(\gxb,0)}% \psellipse[linewidth=0.5pt](0,0)(5.0,1.0)% \endpsclip \psclip{\psframe[linewidth=0.0pt,linestyle=none](\gxa,\gya)(\gxb,\gyb)}% \psellipse[linewidth=0.5pt](0,0)(4.0958,5.0)% \pscircle(0,0){5.0}% \endpsclip }% \rput(3.1375,3.2139){% theta = 50, phi = 55 \psset{linewidth=2.0pt}% \psdot(0,0)% \psline{->}(0,0)( 0.44122, 0.12602)% \psline{->}(0,0)( 0.42123,-0.61284)% \psline{->}(0,0)( 0.50201, 0.51423)% \rput( 0.68, 0.72){$\vu r$} \rput(-0.02,-0.45){$\wu\theta$} \psline[linewidth=0.5pt](0.32,-0.07)(0.47,-0.27) \rput( 0.60,-0.30){$\wu\phi$} }% } \end{pspicture} \ec \end{figure} \item \index{secondary circle} \index{secondary cylindrical basis} \index{basis!secondary cylindrical} \index{circle!secondary} Occasionally a model arises with two circles that share a center but whose axes stand perpendicular to one another. 
In such a model one conventionally establishes~$\vu z$ as the direction of the principal circle's axis but then is left with~$\vu x$ or~$\vu y$ as the direction of the secondary circle's axis, upon which an $[\vu x\;\wu\rho^x\;\wu\phi^x]$, $[\wu\phi^x\;\vu r\;\wu\theta^x]$, $[\wu\phi^y\;\vu y\;\wu\rho^y]$ or $[\wu\theta^y\;\wu\phi^y\;\vu r]$ basis can be used locally as appropriate. Refer to \S~\ref{trig:277}. \ei Many other orthogonal bases are possible (as in \S~\ref{vector:280}, for instance), but the foregoing are the most common. Whether listed here or not, each orthogonal basis orders its three unit vectors by the right-hand rule~(\ref{vector:230:10}). \index{quiz} \index{azimuth} Quiz: what does the vector expression $\wu\rho 3 - \wu\phi(1/4) + \vu z 2$ mean? Wrong answer: it meant the cylindrical coordinates $(3;-1/4,2)$; or, it meant the position vector $\vu x 3 \cos(-1/4) + \vu y 3 \sin(-1/4) + \vu z 2$ associated with those coordinates. Right answer: the expression means nothing certain in itself but acquires a definite meaning only when an azimuthal coordinate~$\phi$ is also supplied, after which the expression indicates the ordinary rectangular vector $\vu x' 3 - \vu y'(1/4) + \vu z' 2$, where $\vu x'=\wu\rho=\vu x\cos\phi + \vu y\sin\phi$, $\vu y'=\wu\phi=-\vu x\sin\phi + \vu y\cos\phi$, and $\vu z'=\vu z$. But, if this is so---if the cylindrical basis $[\wu\rho\;\wu\phi\;\vu z]$ is used solely to express \emph{rectangular} vectors---then why should we name this basis ``cylindrical''? Answer: only because cylindrical coordinates (supplied somewhere) determine the actual directions of its basis vectors. Once directions are determined such a basis is used purely rectangularly, like any other orthogonal basis. \index{jet engine} \index{axle} \index{velocity!local} \index{vector!local} \index{vector!position} \index{position vector} This can seem confusing until one has grasped what the so-called nonrectangular bases are for. Consider the problem of air flow in a jet engine. It probably suits such a problem that instantaneous local air velocity within the engine cylinder be expressed in cylindrical coordinates, with the~$z$ axis oriented along the engine's axle; but this does not mean that the air flow within the engine cylinder were everywhere $\vu z$-directed. On the contrary, a local air velocity of $\ve q = [-\wu\rho 5.0 + \wu\phi 30.0 - \vu z 250.0]$ m/s would have air moving through the point in question at 250.0~m/s aftward along the axle, 5.0~m/s inward toward the axle and 30.0~m/s circulating about the engine cylinder. In this model, it is true that the basis vectors~$\wu\rho$ and~$\wu\phi$ indicate different directions at different positions within the cylinder, but at a particular position the basis vectors are still used rectangularly to express~$\ve q$, the instantaneous local air velocity at that position. It's just that the ``rectangle'' is rotated locally to line up with the axle. Naturally, you cannot make full sense of an air-velocity vector~$\ve q$ unless you have also the coordinates $(\rho;\phi,z)$ of the position within the engine cylinder at which the air has the velocity the vector specifies---yet this is when confusion can arise, for besides the air-velocity vector there is also, separately, a position vector $\ve r = \vu x \rho\cos\phi + \vu y \rho\sin\phi + \vu z z$. 
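To make the distinction concrete---the positions here being chosen arbitrarily for illustration---suppose that the point in question lay at azimuth $\phi = 0$, where $\wu\rho = \vu x$ and $\wu\phi = \vu y$. There, the cylindrical-basis expression above would mean $\ve q = [-\vu x 5.0 + \vu y 30.0 - \vu z 250.0]$ m/s in rectangular terms. Had the point lain instead at azimuth $\phi = 2\pi/4$, where $\wu\rho = \vu y$ and $\wu\phi = -\vu x$, the very same expression would have meant $\ve q = [-\vu x 30.0 - \vu y 5.0 - \vu z 250.0]$ m/s. Either way the basis is used rectangularly; only the actual directions of~$\wu\rho$ and~$\wu\phi$, which the position fixes, differ.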
One may denote the air-velocity vector as%
\footnote{
 Conventionally, one is much more likely to denote a velocity vector as $\ve u(\ve r)$ or $\ve v(\ve r)$, except that the present chapter is (as footnote~\ref{vector:230:fn1} has observed) already using the letters~$u$ and~$v$ for an unrelated purpose.
 To denote position as~$\ve r$ however is entirely standard.
}
$\ve q(\ve r)$, a function of position; yet, though the position vector is as much a vector as the velocity vector is, one nonetheless handles it differently.
One will not normally express the position vector~$\ve r$ in the cylindrical basis.
\index{tautology}
It would make little sense to try to express the position vector~$\ve r$ in the cylindrical basis because the position vector is the very thing that \emph{determines} the cylindrical basis.
In the cylindrical basis, after all, the position vector is necessarily $\ve r = \wu\rho\rho + \vu z z$ (and consider: in the spherical basis it is the even more cryptic $\ve r = \vu r r$), and how useful is that, really?
Well, maybe it is useful in some situations, but for the most part to express the position vector in the cylindrical basis would be as to say, ``My house is zero miles away from home.''
Or, ``The time is presently now.''
Such statements may be tautologically true, perhaps, but they are confusing because they only seem to give information.
The position vector~$\ve r$ determines the basis, after which one expresses things other than position, like instantaneous local air velocity~$\ve q$, in that basis.
In fact, the only basis normally suitable to express a position vector is a fixed rectangular basis like $[\vu x\;\vu y\;\vu z]$.
Otherwise, one uses cylindrical coordinates $(\rho;\phi,z)$, but not a cylindrical basis $[\wu\rho\;\wu\phi\;\vu z]$, to express a position~$\ve r$ in a cylindrical geometry.
Maybe the nonrectangular bases were more precisely called ``rectangular bases of the nonrectangular coordinate systems,'' but those are too many words and, anyway, that is not how the usage has evolved.
Chapter~\ref{vcalc} will elaborate the story by considering spatial derivatives of quantities like air velocity, when one must take the variation in~$\wu\rho$ and~$\wu\phi$ from point to point into account, but the foregoing is the basic story nevertheless.
% ----------------------------------------------------------------------
\section{Notation}
\label{vector:240}
\index{vector!concise notation for} \index{notation!for the vector, concise} \index{concision} \index{prolixity}
The vector notation of \S\S~\ref{vector:210} and~\ref{vector:220} is correct, familiar and often expedient but sometimes inconveniently prolix.
This admittedly difficult section augments the notation to render it much more concise.
\subsection{Components by subscript}
\label{vector:240.10}
\index{components of a vector by subscript} \index{subscript!indicating the components of a vector by} \index{dot product!abbreviated notation for}
The notation
\[
 \settowidth\tla{$\wu\phi$}
 \nc\xxa[1]{\makebox[\tla][c]{$#1$} \cdot \ve a}
 \setlength\arraycolsep{0.30\arraycolsep}
 \br{lclclcl}
  a_x &\equiv& \xxa{\vu x}, &\ \ \ \ & a_\rho &\equiv& \xxa{\wu\rho}, \\
  a_y &\equiv& \xxa{\vu y}, && a_r &\equiv& \xxa{\vu r}, \\
  a_z &\equiv& \xxa{\vu z}, && a_\theta &\equiv& \xxa{\wu\theta}, \\
  a_n &\equiv& \xxa{\vu n}, && a_\phi &\equiv& \xxa{\wu\phi},
 \er
\]
and so forth abbreviates the indicated dot product.
That is to say, the notation represents the component of a vector~$\ve a$ in the indicated direction.
Generically,
\bq{vector:240:10}
 a_\alpha \equiv \wu\alpha \cdot \ve a.
\eq
Applied mathematicians use subscripts for several unrelated or vaguely related purposes, so the full dot-product notation $\wu\alpha \cdot \ve a$ is often clearer in print than the abbreviation~$a_\alpha$ is, but the abbreviation especially helps when several such dot products occur together in the same expression.
\index{prime mark ($'$)} \index{$'$} \index{primed coordinate} \index{coordinate!primed and unprimed} \index{clutter, notational}
Since%
\footnote{
 ``Wait!'' comes the objection. ``I thought that you said that~$a_x$ meant $\vu x \cdot \ve a$. Now you claim that it means the~$x$ component of~$\ve a$?''
 But there is no difference between $\vu x \cdot \ve a$ and the~$x$ component of~$\ve a$. The two are one and the same.
}
\[
 \begin{split}
  \ve a &= \vu x a_x + \vu y a_y + \vu z a_z, \\
  \ve b &= \vu x b_x + \vu y b_y + \vu z b_z,
 \end{split}
\]
the abbreviation lends a more amenable notation to the dot and cross products of~(\ref{vector:dot1}) and~(\ref{vector:cross1}):
\bqa
 \ve a \cdot \ve b &=& a_x b_x + a_y b_y + a_z b_z;
 \label{vector:dot}
 \\
 \ve a \times \ve b &=& \left| \br{ccc} \vu x & \vu y & \vu z \\ a_x & a_y & a_z \\ b_x & b_y & b_z \er \right|.
 \label{vector:cross}
\eqa
In fact---because, as we have seen, reorientation of axes cannot alter the dot and cross products---any orthogonal basis $[\vu x'\;\vu y'\;\vu z']$ (\S~\ref{vector:230}) can serve here, so one can write more generally that
\bqa
 \ve a \cdot \ve b &=& a_{x'} b_{x'} + a_{y'} b_{y'} + a_{z'} b_{z'};
 \label{vector:dotp}
 \\
 \ve a \times \ve b &=& \left| \br{ccc} \vu x' & \vu y' & \vu z' \\ a_{x'} & a_{y'} & a_{z'} \\ b_{x'} & b_{y'} & b_{z'} \er \right|.
 \label{vector:crossp}
\eqa
Because all those prime marks burden the notation and for professional mathematical reasons, the general forms~(\ref{vector:dotp}) and~(\ref{vector:crossp}) are sometimes rendered
\bqb
 \ve a \cdot \ve b &=& a_1 b_1 + a_2 b_2 + a_3 b_3,
 \\
 \ve a \times \ve b &=& \left| \br{ccc} \vu e_1 & \vu e_2 & \vu e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \er \right|,
\eqb
but you have to be careful about that in applied usage because people are not always sure whether a symbol like~$a_3$ means ``the third component of the vector~$\ve a$'' (as it does here) or ``the third vector's component in the~$\vu a$ direction'' (as it would in eqn.~\ref{vector:dot1}).
Typically, applied mathematicians will write in the manner of~(\ref{vector:dot}) and~(\ref{vector:cross}) with the implied understanding that they really mean~(\ref{vector:dotp}) and~(\ref{vector:crossp}) but prefer not to burden the notation with extra little strokes---that is, with the implied understanding that~$x$, $y$ and~$z$ could just as well be~$\rho$, $\phi$ and~$z$ or the coordinates of any other orthogonal, right-handed, three-dimensional basis.
Some pretty powerful confusion can afflict the student regarding the roles of the cylindrical symbols~$\rho$, $\phi$ and~$z$; or, worse, of the spherical symbols~$r$, $\theta$ and~$\phi$.
Such confusion reflects a pardonable but remediable lack of understanding of the relationship between coordinates like~$\rho$, $\phi$ and~$z$ and their corresponding unit vectors~$\wu\rho$, $\wu\phi$ and~$\vu z$.
Section~\ref{vector:230} has already written of the matter; but, further to dispel the confusion, one can now ask the student what the cylindrical coordinates of the vectors~$\wu\rho$, $\wu\phi$ and~$\vu z$ are.
The correct answer: $(1;\phi,0)$, $(1;\phi+2\pi/4,0)$ and $(0;0,1)$, respectively. Then, to reinforce, one can ask the student which cylindrical coordinates the variable vectors~$\wu\rho$ and~$\wu\phi$ are functions of. The correct answer: both are functions of the coordinate~$\phi$ only ($\vu z$, a constant vector, is not a function of anything). What the student needs to understand is that, among the cylindrical coordinates,~$\phi$ is a different kind of thing than~$z$ and~$\rho$ are: \bi \item $z$ and~$\rho$ are lengths whereas~$\phi$ is an angle; \item but~$\wu\rho$, $\wu\phi$ and~$\vu z$ are all the same kind of thing, unit vectors; \item and, separately,~$a_\rho$, $a_\phi$ and~$a_z$ are all the same kind of thing, lengths. \ei Now to ask the student a harder question: in the cylindrical basis, what is the vector representation of $(\rho_1;\phi_1,z_1)$? The correct answer: $\wu\rho\rho_1\cos(\phi_1-\phi) + \wu\phi\rho_1\sin(\phi_1-\phi) + \vu z z_1$. The student that gives this answer probably grasps the cylindrical symbols. If the reader feels that the notation begins to confuse more than it describes, the writer empathizes but regrets to inform the reader that the rest of the section, far from granting the reader a comfortable respite to absorb the elaborated notation as it stands, will not delay to elaborate the notation yet further! The confusion however is subjective. The trouble with vector work is that one has to learn to abbreviate or the expressions involved grow repetitive and unreadably long. For vectors, the abbreviated notation really is the proper notation. Eventually one accepts the need and takes the trouble to master the conventional vector abbreviation this section presents; and, indeed, the abbreviation is rather elegant once one becomes used to it. So, study closely and take heart! The notation is not actually as impenetrable as it at first will seem. \subsection{Einstein's summation convention} \label{vector:240.20} \index{Einstein, Albert (1879--1955)} \index{Einstein's summation convention} \index{summation convention, Einstein's} \index{Einstein notation} \emph{Einstein's summation convention} is this: \emph{that repeated indices are implicitly summed over.}% \footnote{\cite{Hopman}} For instance, where the convention is in force, the equation% \footnote{ Some professional mathematicians now write a superscript~$a^i$ in certain cases in place of a subscript~$a_i$, where the superscript bears some additional semantics~% \cite[``Einstein notation,'' 05:36, 10 February 2008]{wikip}. Scientists and engineers however tend to prefer Einstein's original, subscript-only notation. } \bq{vector:240:20} \ve a \cdot \ve b = a_i b_i \eq means that \[ \ve a \cdot \ve b = \sum_{i} a_i b_i \] or more fully that \[ \ve a \cdot \ve b = \!\!\!\sum_{i=x',y',z'}\!\!\! a_i b_i = a_{x'} b_{x'} + a_{y'} b_{y'} + a_{z'} b_{z'}, \] which is~(\ref{vector:dotp}), except that Einstein's form~(\ref{vector:240:20}) expresses it more succinctly. Likewise, \bq{vector:240:22} \ve a \times \ve b = \vui ( a_{i+1} b_{i-1} - b_{i+1} a_{i-1} ) \eq is~(\ref{vector:crossp})---although an experienced applied mathematician would probably apply the Levi-Civita epsilon of \S~\ref{vector:240.30}, below, to further abbreviate this last equation to the form of~(\ref{vector:240:32}) before presenting it. Einstein's summation convention is also called the \emph{Einstein notation,} a term sometimes taken loosely to include also the Kronecker delta and Levi-Civita epsilon of \S~\ref{vector:240.30}. 
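To see the convention at work---in an illustrative expansion, not a new result---one can write out the~$x'$ term of~(\ref{vector:240:22}) explicitly. The cyclic progression~(\ref{vector:220:36}), read in primed coordinates, supplies $i+1 = y'$ and $i-1 = z'$ when $i = x'$, so that term is
\[
 \vu x' ( a_{y'} b_{z'} - b_{y'} a_{z'} ),
\]
which is precisely the~$\vu x'$ part of the determinant of~(\ref{vector:crossp}), expanded along the determinant's top row; the~$y'$ and~$z'$ terms of the implied sum reproduce the rest.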
\index{accountant} What is important to understand about Einstein's summation convention is that, in and of itself, it brings no new mathematics. It is rather a notational convenience.% \footnote{\cite[``Einstein summation'']{EWW-web}} It asks a reader to regard a repeated index like the~$i$ in~``$a_ib_i$'' as a dummy index (\S~\ref{alggeo:227}) and thus to read~``$a_ib_i$'' as ``$\sum_i a_ib_i$.'' It does not magically create a summation where none existed; it just hides the summation sign to keep it from cluttering the page. It is the kind of notational trick an accountant might appreciate. Under the convention, the summational operator~$\sum_i$ is implied not written, but the operator is still there. Admittedly confusing on first encounter, the convention's utility and charm are felt after only a little practice. Incidentally, nothing requires you to invoke Einstein's summation convention everywhere and for all purposes. You can waive the convention, writing the summation symbol out explicitly whenever you like.% \footnote{\cite{physics321}} In contexts outside vector analysis, to invoke the convention at all may make little sense. Nevertheless, you should indeed learn the convention---if only because you must learn it to understand the rest of this chapter---but once having learned it you should naturally use it only where it actually serves to clarify. Fortunately, in vector work, it often does just that. \index{quiz} Quiz:% \footnote{\cite{Hopman}} if~$\delta_{ij}$ is the Kronecker delta of \S~\ref{matrix:150}, then what does the symbol~$\delta_{ii}$ represent where Einstein's summation convention is in force? \subsection{The Kronecker delta and the Levi-Civita epsilon} \label{vector:240.30} \index{Kronecker delta} \index{delta, Kronecker} \index{Kronecker, Leopold (1823--1891)} \index{Levi-Civita, Tullio (1873--1941)} \index{Levi-Civita epsilon} \index{epsilon, Levi-Civita} \index{$\delta$} \index{$\epsilon$} \index{repetition, unseemly} \index{parity!and the Levi-Civita epsilon} Einstein's summation convention expresses the dot product~(\ref{vector:240:20}) neatly but, as we have seen in~(\ref{vector:240:22}), does not by itself wholly avoid unseemly repetition in the cross product. The \emph{Levi-Civita epsilon}% \footnote{ Also called the \emph{Levi-Civita symbol, tensor,} or \emph{permutor.} For native English speakers who do not speak Italian, the ``ci'' in Levi-Civita's name is pronounced as the ``chi'' in ``children.'' }% ~$\ep_{ijk}$ mends this, rendering the cross-product as \bq{vector:240:32} \ve a \times \ve b = \ep_{ijk} \vui a_j b_k, \eq where% \footnote{\cite[``Levi-Civita permutation symbol'']{planetm}} \bq{vector:240:30} \settowidth\tla{$+1$} \ep_{ijk} \equiv \begin{cases} \makebox[\tla][r]{$+1$} & \ \mbox{if $(i,j,k) = (x',y',z')$, $(y',z',x')$ or $(z',x',y')$;} \\ \makebox[\tla][r]{$-1$} & \ \mbox{if $(i,j,k) = (x',z',y')$, $(y',x',z')$ or $(z',y',x')$;} \\ \makebox[\tla][r]{$ 0$} & \ \mbox{otherwise [for instance if $(i,j,k) = (x',x',y')$].} \end{cases} \eq In the language of \S~\ref{matrix:322}, the Levi-Civita epsilon quantifies parity. (Chapters~\ref{matrix} and~\ref{eigen} did not use it, but the Levi-Civita notation applies in any number of dimensions, not only three as in the present chapter. In this more general sense the Levi-Civita is the determinant of the permutor whose ones hold the indicated positions---which is a formal way of saying that it's a~$+$ sign for even parity and a~$-$ sign for odd. 
For instance, in the four-dimensional, $4 \times 4$ case $\ep_{1234}=1$ whereas $\ep_{1243}=-1$: refer to \S\S~\ref{matrix:322}, \ref{matrix:325.10} and~\ref{eigen:310}. Table~\ref{vector:240:tbl1}, however, as the rest of this section and chapter, concerns the three-dimensional case only.) Technically, the Levi-Civita epsilon and Einstein's summation convention are two separate, independent things, but a canny reader takes the Levi-Civita's appearance as a hint that Einstein's convention is probably in force, as in~(\ref{vector:240:32}). The two tend to go together.% \footnote{ The writer has heard the apocryphal belief expressed that the letter~$\epsilon$, a Greek~$e$, stood in this context for ``Einstein.'' As far as the writer knows,~$\epsilon$ is merely the letter after~$\delta$, which represents the name of Paul Dirac---though the writer does not claim his indirected story to be any less apocryphal than the other one (the capital letter~$\Delta$ has a point on top that suggests the pointy nature of the Dirac delta of Fig.~\ref{integ:670:fig-d}, which makes for yet another plausible story). In any event, one sometimes hears Einstein's summation convention, the Kronecker delta and the Levi-Civita epsilon together referred to as ``the Einstein notation,'' which though maybe not quite terminologically correct is hardly incorrect enough to argue over and is clear enough in practice. } The Levi-Civita epsilon~$\ep_{ijk}$ relates to the Kronecker delta~$\delta_{ij}$ of \S~\ref{matrix:150} approximately as the cross product relates to the dot product. Both delta and epsilon find use in vector work. For example, one can write~(\ref{vector:240:20}) alternately in the form \[ \ve a \cdot \ve b = \delta_{ij} a_i b_j. \] Table~\ref{vector:240:tbl1} lists several relevant properties,% \footnote{\cite{physics321}} each as with Einstein's summation convention in force.% \footnote{The table incidentally answers \S~\ref{vector:240.20}'s quiz.} \begin{table} \caption[Properties of the Kronecker delta and Levi-Civita epsilon.]{% Properties of the Kronecker delta and the Levi-Civita epsilon, with Einstein's summation convention in force. } \label{vector:240:tbl1} \index{Kronecker delta!properties of} \index{delta, Kronecker!properties of} \index{Levi-Civita epsilon!properties of} \index{epsilon, Levi-Civita!properties of} \bqb \delta_{jk} &=& \delta_{kj} \\ \delta_{ij}\delta_{jk} &=& \delta_{ik} \\ \delta_{ii} &=& 3 \\ \delta_{jk}\ep_{ijk} &=& 0 \\ \delta_{nk}\ep_{ijk} &=& \ep_{ijn} \\ \ep_{ijk} = \ep_{jki} = \ep_{kij} &=& -\ep_{ikj} = -\ep_{jik} = -\ep_{kji} \\ \ep_{ijk}\ep_{ijk} &=& 6 \\ \ep_{ijn}\ep_{ijk} &=& 2\delta_{nk} \\ \ep_{imn}\ep_{ijk} &=& \delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj} \eqb \end{table} Of the table's several properties, the property that $\ep_{imn}\ep_{ijk} = \delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}$ is proved by observing that, in the case that $i=x'$, either $(j,k)=(y',z')$ or $(j,k)=(z',y')$, and also either $(m,n)=(y',z')$ or $(m,n)=(z',y')$; and similarly in the cases that $i=y'$ and $i=z'$ (more precisely, in each case the several indices can take any values, but combinations other than the ones listed drive~$\ep_{imn}$ or~$\ep_{ijk}$, or both, to zero, thus contributing nothing to the sum). This implies that either $(j,k)=(m,n)$ or $(j,k)=(n,m)$---which, when one takes parity into account, is exactly what the property in question asserts. 
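A concrete check of the property, with indices chosen arbitrarily for illustration, may lend confidence. Let $(m,n) = (y',z')$ and $(j,k) = (y',z')$: only the $i=x'$ term of the implied sum survives, so $\ep_{imn}\ep_{ijk} = \ep_{x'y'z'}\ep_{x'y'z'} = 1$, while $\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj} = (1)(1) - (0)(0) = 1$. Taking $(j,k) = (z',y')$ instead gives $\ep_{x'y'z'}\ep_{x'z'y'} = -1$ on the one side and $(0)(0) - (1)(1) = -1$ on the other, as parity demands.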
The property that $\ep_{ijn}\ep_{ijk} = 2\delta_{nk}$ is proved by observing that, in any given term of the Einstein sum,~$i$ is either~$x'$ or~$y'$ or~$z'$ and that~$j$ is one of the remaining two, which leaves the third to be shared by both~$k$ and~$n$. The factor~$2$ appears because, for $k=n=x'$, an $(i,j)=(y',z')$ term and an $(i,j)=(z',y')$ term both contribute positively to the sum; and similarly for $k=n=y'$ and again for $k=n=z'$. Unfortunately, the last paragraph likely makes sense to few who do not already know what it means. A concrete example helps. Consider the compound product $\ve c \times (\ve a \times \ve b)$. In this section's notation and with the use of~(\ref{vector:240:32}), the compound product is \bqb \ve c \times (\ve a \times \ve b) &=& \ve c \times (\ep_{ijk}\vui a_j b_k) \xn\\&=& \ep_{mni} \vu m c_n (\ep_{ijk}\vui a_j b_k)_i \xn\\&=& \ep_{mni} \ep_{ijk} \vu m c_n a_j b_k \xn\\&=& \ep_{imn} \ep_{ijk} \vu m c_n a_j b_k \xn\\&=& (\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}) \vu m c_n a_j b_k \xn\\&=& \delta_{mj}\delta_{nk} \vu m c_n a_j b_k - \delta_{mk}\delta_{nj} \vu m c_n a_j b_k \xn\\&=& \vuj c_k a_j b_k - \vu k c_j a_j b_k \xn\\&=& (\vuj a_j)(c_k b_k) - (\vu k b_k)(c_j a_j). \xn \eqb That is, in light of~(\ref{vector:240:20}), \bq{vector:240:35} \ve c \times (\ve a \times \ve b) = \ve a(\ve c \cdot \ve b) - \ve b(\ve c \cdot \ve a), \eq a useful vector identity. Written without the benefit of Einstein's summation convention, the example's central step would have been \bqb \ve c \times (\ve a \times \ve b) &=& \sum_{i,j,k,m,n} \ep_{imn} \ep_{ijk} \vu m c_n a_j b_k \xn\\&=& \sum_{j,k,m,n} (\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}) \vu m c_n a_j b_k, \eqb which makes sense if you think about it hard enough,% \footnote{ If thinking about it hard enough does not work, then here it is in interminable detail: \bqb \lefteqn{\sum_{i,j,k,m,n} \ep_{imn} \ep_{ijk} f(j,k,m,n)} \\&=& \ep_{x'y'z'} \ep_{x'y'z'} f(y',z',y',z') + \ep_{x'y'z'} \ep_{x'z'y'} f(y',z',z',y') \\&&\ \ \mbox{} + \ep_{x'z'y'} \ep_{x'y'z'} f(z',y',y',z') + \ep_{x'z'y'} \ep_{x'z'y'} f(z',y',z',y') \\&&\ \ \mbox{} + \ep_{y'z'x'} \ep_{y'z'x'} f(z',x',z',x') + \ep_{y'z'x'} \ep_{y'x'z'} f(z',x',x',z') \\&&\ \ \mbox{} + \ep_{y'x'z'} \ep_{y'z'x'} f(x',z',z',x') + \ep_{y'x'z'} \ep_{y'x'z'} f(x',z',x',z') \\&&\ \ \mbox{} + \ep_{z'x'y'} \ep_{z'x'y'} f(x',y',x',y') + \ep_{z'x'y'} \ep_{z'y'x'} f(x',y',y',x') \\&&\ \ \mbox{} + \ep_{z'y'x'} \ep_{z'x'y'} f(y',x',x',y') + \ep_{z'y'x'} \ep_{z'y'x'} f(y',x',y',x') \\&=& f(y',z',y',z') - f(y',z',z',y') - f(z',y',y',z') + f(z',y',z',y') \\&&\ \ \mbox{} + f(z',x',z',x') - f(z',x',x',z') - f(x',z',z',x') + f(x',z',x',z') \\&&\ \ \mbox{} + f(x',y',x',y') - f(x',y',y',x') - f(y',x',x',y') + f(y',x',y',x') \\&=& \big[ f(y',z',y',z') + f(z',x',z',x') + f(x',y',x',y') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(z',y',z',y') + f(x',z',x',z') + f(y',x',y',x') \big] \\&&\ \ \mbox{} - \big[ f(y',z',z',y') + f(z',x',x',z') + f(x',y',y',x') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(z',y',y',z') + f(x',z',z',x') + f(y',x',x',y') \big] \\&=& \big[ f(y',z',y',z') + f(z',x',z',x') + f(x',y',x',y') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(z',y',z',y') + f(x',z',x',z') + f(y',x',y',x') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(x',x',x',x') + f(y',y',y',y') + f(z',z',z',z') \big] \\&&\ \ \mbox{} - \big[ f(y',z',z',y') + f(z',x',x',z') + f(x',y',y',x') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(z',y',y',z') + f(x',z',z',x') + f(y',x',x',y') \\&&\ \ \ \ \ \ \ \ \mbox{} + f(x',x',x',x') + f(y',y',y',y') + f(z',z',z',z') \big] \\&=& 
\sum_{j,k,m,n} (\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}) f(j,k,m,n). \eqb That is for the property that $\ep_{imn}\ep_{ijk} = \delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}$. For the property that $\ep_{ijn}\ep_{ijk} = 2\delta_{nk}$, the corresponding calculation is \bqb \lefteqn{\sum_{i,j,k,n} \ep_{ijn} \ep_{ijk} f(k,n)} \\&=& \ep_{y'z'x'} \ep_{y'z'x'} f(x',x') + \ep_{z'y'x'} \ep_{z'y'x'} f(x',x') \\&&\ \ \mbox{} + \ep_{z'x'y'} \ep_{z'x'y'} f(y',y') + \ep_{x'z'y'} \ep_{x'z'y'} f(y',y') \\&&\ \ \mbox{} + \ep_{x'y'z'} \ep_{x'y'z'} f(z',z') + \ep_{y'x'z'} \ep_{y'x'z'} f(z',z') \\&=& f(x',x') + f(x',x') + f(y',y') + f(y',y') + f(z',z') + f(z',z') \\&=& 2\big[ f(x',x') + f(y',y') + f(z',z') \big] \\&=& 2\sum_{k,n} \delta_{nk} f(k,n). \eqb For the property that $\ep_{ijk}\ep_{ijk} = 6$, \[ \sum_{i,j,k} \ep_{ijk} \ep_{ijk} = \ep_{x'y'z'}^2 + \ep_{y'z'x'}^2 + \ep_{z'x'y'}^2 + \ep_{x'z'y'}^2 + \ep_{y'x'z'}^2 + \ep_{z'y'x'}^2 = 6. \] It is precisely to encapsulate such interminable detail that we use the Kronecker delta, the Levi-Civita epsilon and the properties of Table~\ref{vector:240:tbl1}. } and justifies the table's claim that $\ep_{imn}\ep_{ijk} = \delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj}$. (Notice that the compound Kronecker operator~$\delta_{mj}\delta_{nk}$ includes nonzero terms for the case that $j=k=m=n=x'$, for the case that $j=k=m=n=y'$ and for the case that $j=k=m=n=z'$, whereas the compound Levi-Civita operator~$\ep_{imn}\ep_{ijk}$ does not. However, the compound Kronecker operator~$-\delta_{mk}\delta_{nj}$ includes canceling terms for these same three cases. This is why the table's claim is valid as written.) To belabor the topic further here would serve little purpose. The reader who does not feel entirely sure that he understands what is going on might work out the table's several properties with his own pencil, in something like the style of the example, until he is satisfied that he adequately understands the several properties and their correct use. \index{projection!onto a plane} \index{plane!projection onto} \index{vector!projection of onto a plane} Section~\ref{vcalc:425} will refine the notation for use when derivatives with respect to angles come into play but, before leaving the present section, we might pause for a moment to appreciate~(\ref{vector:240:35}) in the special case that $\ve b = \ve c = \vu n$: \bq{vector:240:36} -\vu n \times (\vu n \times \ve a) = \ve a - \vu n(\vu n \cdot \ve a). \eq The difference $\ve a - \vu n(\vu n \cdot \ve a)$ evidently projects a vector~$\ve a$ onto the plane whose unit normal is~$\vu n$. Equation~(\ref{vector:240:36}) reveals that the double cross product $-\vu n \times (\vu n \times \ve a)$ projects the same vector onto the same plane. Figure~\ref{vector:240:fig-proj} illustrates. \begin{figure} \caption{A vector projected onto a plane.} \label{vector:240:fig-proj} \bc \nc\fxa{-2.7} \nc\fxb{6.8} \nc\fya{-1.7} \nc\fyb{4.0} %\nc\xxs{0.20} % Perspective ratio of the x-y circle. \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) \localscalebox{1.0}{1.0}{ \small \psset{dimen=middle} % The slope of the plane's sides on the page is 4:1. 
\pspolygon[linewidth=0.5pt,linecolor=lightgray,fillstyle=solid,fillcolor=lightgray] (-1.7,1.1)(3.9,1.1)(3.3,-1.3)(-2.3,-1.3) \psline[linewidth=2.0pt]{<->}(0.00000,1.50000)(0.00000,0.00000)(2.00000,0.69282) \psline[linewidth=2.0pt]{cc->}(0.00000,0.00000)(2.00000,3.69282) \psline[linewidth=1.0pt,linestyle=dashed](2.00000,0.69282)(2.00000,3.69282) { \psset{linewidth=0.5pt} \rput(2.00000,0.69282){% \psline(0.00000,0.45000)(-0.22500,0.37206)(-0.22500,-0.07794) } %\psline(0.00000,0.45000)(-0.22500,0.37206)(-0.22500,-0.07794) %\psline(0.00000,0.45000)(0.22500,0.52794)(0.22500,0.07794) \psline(0.00000,0.45000)(-0.45000,0.45000)(-0.45000,-0.02) } \rput(0.00,1.75){$\vu n$} \rput(1.20,2.80){$\ve a$} \rput[l](2.15,2.10){$\vu n(\vu n \cdot \ve a)$} \psline[linewidth=0.5pt](1.05,0.25)(1.15,0.15)(4.20,0.15) \rput[l](4.10,-0.05){ $ \br{l} -\vu n \times (\vu n \times \ve a) \\ = \ve a - \vu n(\vu n \cdot \ve a) \er $ } } \end{pspicture} \ec \end{figure} % ---------------------------------------------------------------------- \section{Algebraic identities} \label{vector:250} \index{identity!vector, algebraic} \index{vector!algebraic identities of} Vector algebra is not in principle very much harder than scalar algebra is, but with three distinct types of product it has more rules controlling the way its products and sums are combined. Table~\ref{vector:tbl-algid} lists several of these.% \footnote{\cite[Appendix~II]{Stratton}\cite[Appendix~A]{Harrington}}% $\mbox{}^,$% \footnote{ Nothing in any of the table's identities requires the vectors involved to be real. The table is equally as valid when vectors are complex. } \begin{table} \caption{Algebraic vector identities.} \label{vector:tbl-algid} \[ \br{c} \br{rclcrclcrcl} \psi\ve a &=& \vui\psi a_i &\ \ & \ve a \cdot \ve b &\equiv& a_i b_i &\ \ & \ve a \times \ve b &\equiv& \ep_{ijk}\vui a_j b_k \er \\ \br{rclcrcl} \ve a^{*} \cdot \ve a &=& \left| a \right|^2 && (\psi)(\ve a + \ve b) &=& \psi \ve a + \psi \ve b \\ \ve b \cdot \ve a &=& \ve a \cdot \ve b && \ve b \times \ve a &=& -\ve a \times \ve b \\ \ve c \cdot ( \ve a + \ve b ) &=& \ve c \cdot \ve a + \ve c \cdot \ve b &\ \ & \ve c \times ( \ve a + \ve b ) &=& \ve c \times \ve a + \ve c \times \ve b \\ \ve a \cdot ( \psi \ve b ) &=& ( \psi ) ( \ve a \cdot \ve b ) && \ve a \times ( \psi \ve b ) &=& ( \psi ) ( \ve a \times \ve b ) \er \\ \br{rclcl} \ve c \cdot ( \ve a \times \ve b ) &=& \ve a \cdot ( \ve b \times \ve c ) &=& \ve b \cdot ( \ve c \times \ve a ) \\ \ve c \times ( \ve a \times \ve b ) &=& \multicolumn{3}{l}{\ve a ( \ve c \cdot \ve b) - \ve b ( \ve c \cdot \ve a)} \\ -\vu n \times (\vu n \times \ve a) &=& \multicolumn{3}{l}{\ve a - \vu n(\vu n \cdot \ve a)} \er \er \] \end{table} Most of the table's identities are plain by the formulas~(\ref{vector:220:02}), (\ref{vector:dot}) and~(\ref{vector:cross}) respectively for the scalar, dot and cross products, and two were proved as~(\ref{vector:240:35}) and~(\ref{vector:240:36}). The remaining identity is proved in the notation of \S~\ref{vector:240} as \[ \br{rclclcl} \ep_{ijk} c_i a_j b_k &=& \ep_{ijk} c_i a_j b_k &=& \ep_{kij} c_k a_i b_j &=& \ep_{jki} c_j a_k b_i \\&=& \ep_{ijk} c_i a_j b_k &=& \ep_{ijk} a_i b_j c_k &=& \ep_{ijk} b_i c_j a_k \\&=& \ve c \cdot ( \ep_{ijk} \vui a_j b_k ) &=& \ve a \cdot ( \ep_{ijk} \vui b_j c_k ) &=& \ve b \cdot ( \ep_{ijk} \vui c_j a_k ). \er \] That is, \bq{vector:250:30} \ve c \cdot ( \ve a \times \ve b ) = \ve a \cdot ( \ve b \times \ve c ) = \ve b \cdot ( \ve c \times \ve a ). 
\eq Besides the several vector identities, the table also includes the three vector products in Einstein notation.% \footnote{ If the reader's native language is English, then he is likely to have heard of the unfortunate ``back cab rule,'' which actually is not a rule but an unhelpful mnemonic for one of Table~\ref{vector:tbl-algid}'s identities. The mnemonic is mildly orthographically clever but, when learned, significantly impedes real understanding of the vector. The writer recommends that the reader forget the rule if he has heard of it for, in mathematics, spelling-based mnemonics are seldom if ever a good idea. } Each definition and identity of Table~\ref{vector:tbl-algid} is invariant under reorientation of axes. % ---------------------------------------------------------------------- \section{Isotropy} \label{vector:270} \index{isotropy} \index{coordinates!isotropic} A real,% \footnote{ The reader is reminded that one can licitly express a complex vector in a real basis. } three-dimensional coordinate system% \footnote{ This chapter's footnote~\ref{vector:280:fn1} and Ch.~\ref{vcalc}'s footnote~\ref{vcalc:420:fn1} explain the usage of semicolons as coordinate delimiters. } $(\alpha;\beta;\gamma)$ is \emph{isotropic} at a point $\ve r=\ve r_1$ if and only if \bq{vector:270:10} \begin{split} \wu\beta(\ve r_1) \cdot \wu\gamma(\ve r_1) &= 0, \\ \wu\gamma(\ve r_1) \cdot \wu\alpha(\ve r_1) &= 0, \\ \wu\alpha(\ve r_1) \cdot \wu\beta(\ve r_1) &= 0, \end{split} \eq and \bq{vector:270:15} \left|\frac{\pl\ve r}{\pl \alpha}\right|_{\ve r = \ve r_1} = \left|\frac{\pl\ve r}{\pl \beta}\right|_{\ve r = \ve r_1} = \left|\frac{\pl\ve r}{\pl \gamma}\right|_{\ve r = \ve r_1}. \eq That is, a three-dimensional system is isotropic if its three coordinates advance locally at right angles to one another but at the same rate. Of the three basic three-dimensional coordinate systems---indeed, of all the three-dimensional coordinate systems this book treats---only the rectangular is isotropic according to~(\ref{vector:270:10}) and~(\ref{vector:270:15}).% \footnote{ Whether it is even possible to construct an isotropic, nonrectangular coordinate system in three dimensions is a question we will leave to the professional mathematician. The author has not encountered such a system. } Isotropy admittedly would not be a very interesting property if that were all there were to it. However, there is also \emph{two-dimensional isotropy,} more interesting because it arises oftener. \index{coordinates!logarithmic cylindrical} \index{logarithmic cylindrical coordinates} A real, two-dimensional coordinate system $(\alpha;\beta)$ is isotropic at a point $\we\rho^\gamma=\we\rho^\gamma_1$ if and only if \bq{vector:270:20} \wu \alpha(\we\rho^\gamma_1) \cdot \wu \beta(\we\rho^\gamma_1) = 0 \eq and \bq{vector:270:25} \left|\frac{\pl\we\rho^\gamma}{\pl \alpha}\right|_{\we\rho^\gamma = \we\rho^\gamma_1} = \left|\frac{\pl\we\rho^\gamma}{\pl \beta}\right|_{\we\rho^\gamma = \we\rho^\gamma_1}, \eq where $\we\rho^\gamma = \wu\alpha\alpha + \wu\beta\beta$ represents position in the~$\alpha$-$\beta$ plane. (If the~$\alpha$-$\beta$ plane happens to be the~$x$-$y$ plane, as is often the case, then $\we\rho^\gamma = \we\rho^z = \we\rho$ and per eqn.~\ref{trig:277:40} one can omit the superscript.) The two-dimensional rectangular system $(x,y)$ naturally is isotropic. 
Because $\left|\pl\we\rho/\pl\phi\right| = (\rho)\left|\pl\we\rho/\pl\rho\right|$ the standard two-dimensional cylindrical system $(\rho;\phi)$ as such is nonisotropic, but the change of coordinate \bq{vector:270:30} \lambda \equiv \ln\frac{\rho}{\rho_o}, \eq where~$\rho_o$ is some arbitrarily chosen reference radius, converts the system straightforwardly into the \emph{logarithmic cylindrical} system $(\lambda;\phi)$ which is isotropic everywhere in the plane except at the origin $\rho = 0$. Further two-dimensionally isotropic coordinate systems include the parabolic system of \S~\ref{vector:280.10}, to follow. % ---------------------------------------------------------------------- \section{Parabolic coordinates} \label{vector:280} \index{parabolic coordinates} \index{coordinates!parabolic} \index{coordinates!special} Scientists and engineers find most spatial-geometrical problems they encounter in practice to fall into either of two categories. The first category comprises problems of simple geometry conforming to any one of the three basic coordinate systems---rectangular, cylindrical or spherical. The second category comprises problems of complicated geometry, analyzed in the rectangular system not because the problems' geometries fit that system but rather because they fit no system and thus give one little reason to depart from the rectangular. One however occasionally encounters problems of a third category, whose geometries are simple but, though simple, nevertheless fit none of the three basic coordinate systems. Then it may fall to the scientist or engineer to devise a special coordinate system congenial to the problem. \index{root-length} This section will treat the \emph{parabolic} coordinate systems which, besides being arguably the most useful of the various special systems, serve as good examples of the kind of special system a scientist or engineer might be called upon to devise. The two three-dimensional parabolic systems are the \emph{parabolic cylindrical} system $(\sigma,\tau,z)$ of \S~\ref{vector:280.51} and the \emph{circular paraboloidal} system% \footnote{\label{vector:280:fn1}% The reader probably will think nothing of it now, but later may wonder why the circular paraboloidal coordinates are $(\eta;\phi,\xi)$ rather than $(\xi;\phi,\eta)$ or $(\eta,\xi;\phi)$. The peculiar ordering is to honor the right-hand rule (\S~\ref{trig:230} and eqn.~\ref{vector:230:10}), since $\wu\eta \times \wu\xi = -\wu\phi$ rather than~$+\wu\phi$. See \S~\ref{vector:280.56}. (Regarding the semicolon~``;'' delimiter, it doesn't mean much. This book arbitrarily uses a semicolon when the following coordinate happens to be an angle, which helps to distinguish rectangular coordinates from cylindrical from spherical. Admittedly, such a notational convention ceases to help much when parabolic coordinates arrive, but we will continue to use it for inertia's sake. See also Ch.~\ref{vcalc}'s footnote~\ref{vcalc:420:fn1}.) } $(\eta;\phi,\xi)$ of \S~\ref{vector:280.56}, where the angle~$\phi$ and the length~$z$ are familiar to us but~$\sigma$, $\tau$, $\eta$ and~$\xi$---neither angles nor lengths but root-lengths (that is, coordinates having dimensions of $[\mbox{length}]^{1/2}$)---are new.% \footnote{ The letters~$\sigma$, $\tau$, $\eta$ and~$\xi$ are available letters this section happens to use, not necessarily standard parabolic symbols. See Appendix~\ref{greek}. 
} Both three-dimensional parabolic systems derive from the two-dimensional parabolic system $(\sigma,\tau)$ of \S~\ref{vector:280.10}.% \footnote{ \cite[``Parabolic coordinates,'' 09:59, 19~July 2008]{wikip} } However, before handling any parabolic system we ought formally to introduce the parabola itself, next. \subsection{The parabola} \label{vector:280.05} \index{parabola} Parabolic coordinates are based on a useful geometrical curve called the \emph{parabola,} which many or most readers will have met long before opening this book's covers. The parabola, simple but less obvious than the circle, may however not be equally well known to all readers, and even readers already acquainted with it might appreciate a re\"examination. This subsection reviews the parabola. \index{focus} \index{directrix} \index{equidistance} \emph{Given a point, called the \emph{focus,} and a line, called the \emph{directrix,}% \footnote{ Whether the parabola's definition ought to forbid the directrix to pass through the focus is a stylistic question this book will leave unanswered. } plus the plane in which the focus and the directrix both lie, the associated \emph{parabola} is that curve which lies in the plane everywhere equidistant from both focus and directrix.}% \footnote{\cite[\S~12-1]{Shenk}} See Fig.~\ref{vector:280:fig-parabola}. \begin{figure} \caption{The parabola.} \label{vector:280:fig-parabola} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.7} \nc\fyb{2.5} \nc\xxb{1.2} \nc\xxc{3.5} \nc\xxd{0.12} \nc\xxe{1.5492} \nc\xxf{0.4} \nc\xxl{% \psline[linewidth=0.5pt]{<-}(0.06,0)(0.63,0)% \psline[linewidth=0.5pt]{<-}(1.60,0)(1.03,0)% \rput{*0}(0.83,0){$a$}% } \nc\xxm{2.4} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=2.0pt} \psline[linewidth=1.0pt](-\xxc,-\xxb)(\xxc,-\xxb) \psline(0,-\xxd)(0,\xxd) \psline(-\xxd,0)(\xxd,0) \psdot(\xxe,\xxf) } { \psset{linewidth=0.5pt} \psline(-0.25,0)(-0.55,0) \psline{<-}(-0.4,0)(-0.4,-0.35) \psline{<-}(-0.4,-1.2)(-0.4,-0.85) \rput(-0.4,-0.6){$\sigma^2$} } \rput{-90}(\xxe,\xxf)\xxl \rput{-165.522}(\xxe,\xxf)\xxl { \psset{linewidth=2.0pt,plotpoints=80} \psplot{-\xxm}{-0.65}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } \psplot{-0.15}{\xxe}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } \psplot{\xxe}{\xxm}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } } } \end{pspicture} \ec \end{figure} \index{projectile} \index{parabolic arc} Referring to the figure, if rectangular coordinates are established such that~$\vu x$ and~$\vu y$ lie in the plane, that the parabola's focus lies at $(x,y)=(0,k)$, and that the equation $y=k-\sigma^2$ describes the parabola's directrix, then the equation \[ x^2 + (y-k)^2 = (y-k+\sigma^2)^2 \] evidently expresses the equidistance rule of the parabola's definition. Solving for $y-k$ and then, from that solution, for~$y$, we have that \bq{vector:280:05} y = \frac{x^2}{2\sigma^2} + \left( k - \frac{\sigma^2}{2} \right). \eq With the definitions that \bq{vector:280:07} \begin{split} \mu &\equiv \frac{1}{2\sigma^2}, \\ \kappa &\equiv k - \frac{\sigma^2}{2}, \end{split} \eq given which \bq{vector:280:08} \begin{split} \sigma^2 &= \frac{1}{2\mu}, \\ k &= \kappa + \frac{1}{4\mu}, \end{split} \eq eqn.~(\ref{vector:280:05}) becomes \bq{vector:280:06} y = \mu x^2 + \kappa. \eq Equations fitting the general form~(\ref{vector:280:06}) arise extremely commonly in applications. 
To choose a particularly famous example, the equation that describes a projectile's flight in the absence of air resistance fits the form. %\footnote{ % Though the chapter is not about projectiles, briefly: Gravity~$g$ % accelerates a projectile as % \[ % \frac{d^2y}{dt^2} = -g, % \] % where~$y$ represents the projectile's altitude and~$t$ represents % time. The equation's solution is that % \[ % y = -\frac{gt^2}{2} + v_{yo}t + y_o, % \] % where the parameter~$v_{yo}$ represents the projectile's initial % vertical velocity and the parameter~$y_o$ represents its initial % height. Neglecting air resistance, the projectile does not accelerate % in the~$x$ direction, so % \[ % x = v_x t + x_o. % \] % Combining the last two equations, we have that % \[ % y = -\frac{g}{2}\left(\frac{x-x_o}{v_x}\right)^2 + v_{yo}\left(\frac{x-x_o}{v_x}\right) + y_o, % \] % which, once its terms are distributed and collected and~$x$ is % appropriately offset, indeed fits the general parabolic % form~(\ref{vector:280:06}). %} Any equation that fits the form can be plotted as a parabola, which for example is why projectiles fly in parabolic arcs. \index{bisection} \index{satellite dish antenna} \index{dish antenna} \index{parabolic antenna} \index{antenna!satellite dish} \index{antenna!parabolic} \index{cross-section!parabolic} \index{parabolic cross-section} \index{mirror!parabolic} \index{mirror!parabolic} \index{parabolic mirror} \index{ray} \index{light ray} Observe that the parabola's definition does not actually require the directrix to be $\vu y$-oriented: the directrix can be $\vu x$-oriented or, indeed, oriented any way (though naturally in that case eqns.~\ref{vector:280:05} and~\ref{vector:280:06} would have to be modified). Observe also the geometrical fact that \emph{the parabola's track necessarily bisects the angle between the two line segments % The following should be "labeled~$a$", but visually that does not work. labeled~``$a$\!'' in Fig.~\ref{vector:280:fig-parabola}.} One of the consequences of this geometrical fact---a fact it seems better to let the reader visualize and ponder than to try to justify in so many words% \footnote{ If it helps nevertheless, some words:\ \ Consider that the two line segments labeled~$a$ in the figure run in the directions of increasing distance respectively from the focus and from the directrix. If you want to draw away from the directrix at the same rate as you draw away from the focus, thus maintaining equal distances, then your track cannot but exactly bisect the angle between the two segments. Once you grasp the idea, the bisection is obvious, though to grasp the idea can take some thought. \emph{To bisect} a thing, incidentally---if the context has not already made the meaning plain---is to divide the thing at its middle into two equal parts. }% ---is that a parabolic mirror reflects precisely% \footnote{ Well, actually, physically, the ray model of light implied here is valid only insofar as $\lambda \ll \sigma^2$, where~$\lambda$ represents the light's characteristic wavelength. Also, regardless of~$\lambda$, the ray model breaks down in the immediate neighborhood of the mirror's focus. Such wave-mechanical considerations are confined to a footnote not because they were untrue but rather because they do not concern the present geometrical discussion. Insofar as rays are concerned, the focusing is precise. 
} toward its focus all light rays that arrive perpendicularly to its directrix (which for instance is why satellite dish antennas have parabolic cross-sections). \subsection{Parabolic coordinates in two dimensions} \label{vector:280.10} \index{parabolic coordinates!in two dimensions} \index{coordinates!parabolic, in two dimensions} \index{two-dimensional space} \index{parabolic track} Parabolic coordinates are most easily first explained in the two-dimensional case that $z=0$. In two dimensions, the parabolic coordinates $(\sigma,\tau)$ represent the point in the~$x$-$y$ plane that lies equidistant \bi \item from the line $y=-\sigma^2$, \item from the line $y=+\tau^2$, and \item from the point $\rho=0$, \ei where the parameter~$k$ of \S~\ref{vector:280.05} has been set to $k=0$. Figure~\ref{vector:280:fig2} depicts the construction described. \begin{figure} \caption[Locating a point by parabolic construction.]% {Locating a point in two dimensions by parabolic construction.} \label{vector:280:fig2} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-1.7} \nc\fyb{2.5} \nc\xxa{2.0} \nc\xxb{1.2} \nc\xxc{3.5} \nc\xxd{0.12} \nc\xxe{1.5492} \nc\xxf{0.4} \nc\xxl{% \psline[linewidth=0.5pt]{<-}(0.06,0)(0.63,0)% \psline[linewidth=0.5pt]{<-}(1.60,0)(1.03,0)% \rput{*0}(0.83,0){$a$}% } \nc\xxm{2.4} \nc\xxn{37.761} \nc\xxg{0.3} \nc\xxo{0.7} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { \psset{linewidth=2.0pt} \psline(-\xxc,\xxa)(\xxc,\xxa) \psline(-\xxc,-\xxb)(\xxc,-\xxb) \psline(0,-\xxd)(0,\xxd) \psline(-\xxd,0)(\xxd,0) \psdot(\xxe,\xxf) } { \psset{linewidth=0.5pt} \psline(-0.25,0)(-0.55,0) \psline{<-}(-0.4,0)(-0.4,0.75) \psline{<-}(-0.4,2.0)(-0.4,1.25) \rput(-0.4,1.0){$\tau^2$} \psline{<-}(-0.4,0)(-0.4,-0.35) \psline{<-}(-0.4,-1.2)(-0.4,-0.85) \rput(-0.4,-0.6){$\sigma^2$} \psline(-0.10,0.10)(-0.20,0.20)(-0.20,0.55)(0.05,0.55) \rput[l](0.07,0.55){$\rho\!=\!0$} \rput{\xxn}(-\xxe,\xxf){% \psline(-\xxo,0)(\xxo,0)% \psline(0,-\xxo)(0,\xxo)% \psline(0,\xxg)(-\xxg,\xxg)(-\xxg,0)% } } \rput{90}(\xxe,\xxf)\xxl \rput{-90}(\xxe,\xxf)\xxl \rput{-165.522}(\xxe,\xxf)\xxl { \psset{linewidth=0.5pt,linestyle=dotted,plotpoints=80} \psplot{-\xxm}{-0.65}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } \psplot{-0.15}{\xxe}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } \psplot{\xxe}{\xxm}{ 0.6 dup 4.0 mul x dup mul exch div sub neg } \psplot{-\xxm}{-0.65}{ -1.0 dup 4.0 mul x dup mul exch div sub neg } \psplot{-0.15}{\xxe}{ -1.0 dup 4.0 mul x dup mul exch div sub neg } \psplot{\xxe}{\xxm}{ -1.0 dup 4.0 mul x dup mul exch div sub neg } } } \end{pspicture} \ec \end{figure} In the figure are two dotted curves, one of which represents the point's parabolic track if~$\sigma$ were varied while~$\tau$ were held constant and the other of which represents the point's parabolic track if~$\tau$ were varied while~$\sigma$ were held constant. Observe according to \S~\ref{vector:280.05}'s bisection finding that each parabola necessarily bisects the angle between two of the three line segments labeled~$a$ in the figure. Observe further that the two angles' sum is the straight angle $2\pi/2$, from which one can conclude, significantly, that \emph{the two parabolas cross precisely at right angles to one another.} Figure~\ref{vector:280:fig1} lays out the parabolic coordinate grid. 
\begin{figure} \caption{The parabolic coordinate grid in two dimensions.} \label{vector:280:fig1} \index{grid, parabolic coordinate} \index{coordinate grid!parabolic} \index{parabolic coordinate grid} \bc \nc\fxa{-5.0} \nc\fxb{5.0} \nc\fya{-3.6} \nc\fyb{3.6} \nc\tta{2.85} \nc\ttb{1.30} \nc\ttc{2.58} \nc\ttd{3.85} \nc\tte{5.00} \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} { \small \psset{dimen=middle} { { \psset{linewidth=0.5pt,plotpoints=200} { \psset{linecolor=gray} \psplot{-2.925}{-1.950}{ 3.25 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-3.150}{-2.100}{ 3.50 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-3.375}{-2.250}{ 3.75 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-2.700}{-2.025}{ 2.25 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \psplot{-3.000}{-2.250}{ 2.50 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \psplot{-3.300}{-2.475}{ 2.75 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } } \psline(0,-\tta)(0,\tta) \psplot{-\ttb}{\ttb}{ 1.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-\ttc}{\ttc}{ 2.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-\ttd}{\ttd}{ 3.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-\tte}{\tte}{ 4.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul } \psplot{-\ttb}{\ttb}{ 1.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \psplot{-\ttc}{\ttc}{ 2.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \psplot{-\ttd}{\ttd}{ 3.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \psplot{-\tte}{\tte}{ 4.0 0.30 x 1 index div dup mul 2 index dup mul 2.0 mul 1.0 exch div mul 3 2 roll dup mul 2.0 div sub mul neg } \rput[r](0.08, 3.15){$\sigma = 0$} \rput[r](0.08,-3.15){$\tau = 0$} \rput(1.36, 2.99){$\pm 1$} \rput(1.36,-2.99){$\pm 1$} \rput(2.69, 2.51){$\pm 2$} \rput(2.69,-2.51){$\pm 2$} \rput(4.02, 1.70){$\pm 3$} \rput(4.02,-1.70){$\pm 3$} } } } \end{pspicture} \ec \end{figure} Notice in the figure that one of the grid's several cells is subdivided at its quarter-marks for illustration's sake, to show how one can subgrid at need to locate points like, for example, $(\sigma,\tau) = (\frac 7 2,-\frac 9 4)$ visually. (That the subgrid's cells approach square shape implies that the parabolic system is isotropic, a significant fact \S~\ref{vector:280.30} will demonstrate formally.) Using the Pythagorean theorem, one can symbolically express the equidistant construction rule above as \bq{vector:280:11} \begin{split} a &= \sigma^2 + y = \tau^2 - y, \\ a^2 &= \rho^2 = x^2 + y^2. \end{split} \eq From the first line of~(\ref{vector:280:11}), \bq{vector:280:12} y = \frac{\tau^2 - \sigma^2}{2}. 
\eq On the other hand, combining the two lines of~(\ref{vector:280:11}), \[ (\sigma^2 + y)^2 = x^2 + y^2 = (\tau^2 - y)^2, \] or, subtracting~$y^2$, \[ \sigma^4 + 2\sigma^2y = x^2 = \tau^4 - 2\tau^2y. \] Substituting~(\ref{vector:280:12})'s expression for~$y$, \[ x^2 = (\sigma\tau)^2. \] That either $x = +\sigma\tau$ or $x = -\sigma\tau$ would satisfy this equation. Arbitrarily choosing the~$+$ sign gives us that \bq{vector:280:13} x = \sigma\tau. \eq Also, since $\rho^2 = x^2 + y^2$,~(\ref{vector:280:12}) and~(\ref{vector:280:13}) together imply that \bq{vector:280:19} \rho = \frac{\tau^2 + \sigma^2}{2}. \eq Combining~(\ref{vector:280:12}) and~(\ref{vector:280:19}) to isolate~$\sigma^2$ and~$\tau^2$ yields \bq{vector:280:19a} \begin{split} \sigma^2 &= \rho - y, \\ \tau^2 &= \rho + y. \end{split} \eq \subsection{Properties} \label{vector:280.30} \index{parabolic coordinates!properties of} \index{coordinates!parabolic, properties of} \index{isotropy!of parabolic coordinates} \index{parabolic coordinates!isotropy of} \index{coordinates!parabolic, isotropy of} The derivatives of~(\ref{vector:280:13}), (\ref{vector:280:12}) and~(\ref{vector:280:19}) are \bq{vector:280:21} \begin{split} dx &= \sigma\,d\tau + \tau\,d\sigma, \\ dy &= \tau\,d\tau - \sigma\,d\sigma, \\ d\rho &= \tau\,d\tau + \sigma\,d\sigma. \end{split} \eq Solving the first two lines of~(\ref{vector:280:21}) simultaneously for~$d\sigma$ and~$d\tau$ and then collapsing the resultant subexpression $\tau^2 + \sigma^2$ per~(\ref{vector:280:19}) yields \bq{vector:280:22} \begin{split} d\sigma &= \frac{\tau\,dx - \sigma\,dy}{2\rho}, \\ d\tau &= \frac{\sigma\,dx + \tau\,dy}{2\rho}, \end{split} \eq from which it is apparent that \[ \begin{split} \wu\sigma &= \frac{\vu x \tau - \vu y \sigma}{\sqrt{\tau^2 + \sigma^2}}, \\ \wu\tau &= \frac{\vu x \sigma + \vu y \tau}{\sqrt{\tau^2 + \sigma^2}}; \end{split} \] or, collapsing again per~(\ref{vector:280:19}), that \bq{vector:280:17} \begin{split} \wu\sigma &= \frac{\vu x \tau - \vu y \sigma}{\sqrt{2\rho}}, \\ \wu\tau &= \frac{\vu x \sigma + \vu y \tau}{\sqrt{2\rho}}, \end{split} \eq of which the dot product \bq{vector:280:17t} \wu\sigma \cdot \wu\tau = 0 \ \mbox{if} \ \rho \neq 0 \eq is null, confirming our earlier finding that the various grid parabolas cross always at right angles to one another. Solving~(\ref{vector:280:17}) simultaneously for~$\vu x$ and~$\vu y$ then produces \bq{vector:280:17a} \begin{split} \vu x = \frac{\wu\tau \sigma + \wu\sigma \tau}{\sqrt{2\rho}}, \\ \vu y = \frac{\wu\tau \tau - \wu\sigma \sigma}{\sqrt{2\rho}}. \end{split} \eq One can express an infinitesimal change in position in the plane as \bqb d\we\rho &=& \vu x \,dx + \vu y \,dy \\&=& \vu x (\sigma\,d\tau + \tau\,d\sigma) + \vu y (\tau\,d\tau - \sigma\,d\sigma) \\&=& (\vu x \tau - \vu y \sigma) \,d\sigma + (\vu x \sigma + \vu y \tau) \,d\tau, \eqb in which~(\ref{vector:280:21}) has expanded the differentials and from which \[ \begin{split} \frac{\pl\we\rho}{\pl\sigma} &= \vu x \tau - \vu y \sigma, \\ \frac{\pl\we\rho}{\pl\tau} &= \vu x \sigma + \vu y \tau, \end{split} \] and thus \bq{vector:280:17u} \left|\frac{\pl\we\rho}{\pl\sigma}\right| = \left|\frac{\pl\we\rho}{\pl\tau}\right|. \eq Equations~(\ref{vector:280:17t}) and~(\ref{vector:280:17u}) respectively meet the requirements~(\ref{vector:270:20}) and~(\ref{vector:270:25}), implying that \emph{the two-dimensional parabolic coordinate system is isotropic} except at $\rho = 0$. 
Table~\ref{vector:280:t30} summarizes, gathering parabolic coordinate properties from this subsection and \S~\ref{vector:280.10}. \begin{table} \caption{Parabolic coordinate properties.} \label{vector:280:t30} \nc\tta{0.38} \nc\ttb{0.02} \bc \parbox[t]{\tta\textwidth}{ \bqb x &=& \sigma\tau \\ y &=& \frac{\tau^2 - \sigma^2}{2} \\ \rho &=& \frac{\tau^2 + \sigma^2}{2} \\ \rho^2 &=& x^2 + y^2 \\ \sigma^2 &=& \rho - y \\ \tau^2 &=& \rho + y \eqb } \hspace{\ttb\textwidth} \parbox[t]{\tta\textwidth}{ \bqb \vu x &=& \frac{\wu\tau \sigma + \wu\sigma \tau}{\sqrt{2\rho}} \\ \vu y &=& \frac{\wu\tau \tau - \wu\sigma \sigma}{\sqrt{2\rho}} \\ \wu\sigma &=& \frac{\vu x \tau - \vu y \sigma}{\sqrt{2\rho}} \\ \wu\tau &=& \frac{\vu x \sigma + \vu y \tau}{\sqrt{2\rho}} \\ \wu\sigma \times \wu\tau &=& \vu z \\ \wu\sigma \cdot \wu\tau &=& 0 \\ \left|\frac{\pl\we\rho}{\pl\sigma}\right| &=& \left|\frac{\pl\we\rho}{\pl\tau}\right| \eqb } \ec \end{table} \subsection{The parabolic cylindrical coordinate system} \label{vector:280.51} \index{parabolic cylindrical coordinates} \index{coordinates!parabolic cylindrical} \index{cylindrical coordinates!parabolic} \index{three-dimensional space} \index{parabolic cylinder} \index{cylinder!parabolic} Two-dimensional parabolic coordinates are trivially extended to three dimensions by adding a~$z$ coordinate, thus constituting the \emph{parabolic cylindrical} coordinate system $(\sigma,\tau,z)$. The surfaces of constant~$\sigma$ and of constant~$\tau$ in this system are \emph{parabolic cylinders} (and the surfaces of constant~$z$ naturally are planes). All the properties of Table~\ref{vector:280:t30} apply. Observe however that the system is isotropic only in two dimensions not three. The orthogonal parabolic cylindrical basis is $[\wu\sigma\;\wu\tau\;\vu z]$. \subsection{The circular paraboloidal coordinate system} \label{vector:280.56} \index{circular paraboloidal coordinates} \index{paraboloidal coordinates} \index{coordinates!circular paraboloidal} Sometimes one would like to extend the parabolic system to three dimensions by adding an azimuth~$\phi$ rather than a height~$z$. This is possible, but then one tends to prefer the parabolas, foci and directrices of Figs.~\ref{vector:280:fig2} and~\ref{vector:280:fig1} to run in the~$\rho$-$z$ plane rather than in the~$x$-$y$. Therefore, one defines the coordinates~$\eta$ and~$\xi$ to represent in the~$\rho$-$z$ plane what the letters~$\sigma$ and~$\tau$ have represented in the~$x$-$y$. The properties of Table~\ref{vector:280:t56} result, which are just the properties of Table~\ref{vector:280:t30} with coordinates changed. \begin{table} \caption{Circular paraboloidal coordinate properties.} \label{vector:280:t56} \nc\tta{0.42} \nc\ttb{0.02} \bc \parbox[t]{\tta\textwidth}{ \bqb \rho &=& \eta\xi \\ z &=& \frac{\xi^2 - \eta^2}{2} \\ r &=& \frac{\xi^2 + \eta^2}{2} \\ r^2 &=& \rho^2 + z^2 = x^2 + y^2 + z^2 \\ \eta^2 &=& r - z \\ \xi^2 &=& r + z \eqb } \hspace{\ttb\textwidth} \parbox[t]{\tta\textwidth}{ \bqb \wu\rho &=& \frac{\wu\xi \eta + \wu\eta \xi}{\sqrt{2r}} \\ \vu z &=& \frac{\wu\xi \xi - \wu\eta \eta}{\sqrt{2r}} \\ \wu\eta &=& \frac{\wu\rho \xi - \vu z \eta}{\sqrt{2r}} \\ \wu\xi &=& \frac{\wu\rho \eta + \vu z \xi}{\sqrt{2r}} \\ \wu\eta \times \wu\xi &=& -\wu\phi \\ \wu\eta \cdot \wu\xi &=& 0 \\ \left|\frac{\pl\ve r}{\pl\eta}\right| &=& \left|\frac{\pl\ve r}{\pl\xi}\right| \eqb } \ec \end{table} The system is the \emph{circular paraboloidal system} $(\eta;\phi,\xi)$. 
\index{paraboloid} The surfaces of constant~$\eta$ and of constant~$\xi$ in the circular paraboloidal system are \emph{paraboloids,} parabolas rotated about the~$z$ axis (and the surfaces of constant~$\phi$ are planes, or half planes if you like, just as in the cylindrical system). Like the parabolic cylindrical system, the circular paraboloidal system too is isotropic in two dimensions. \index{right-hand rule} Notice that, given the usual definition of the~$\wu\phi$ unit basis vector, $\wu\eta \times \wu\xi = -\wu\phi$ rather than~$+\wu\phi$ as one might first guess. The correct, right-handed sequence of the orthogonal circular paraboloidal basis therefore would be $[\wu\eta\;\wu\phi\;\wu\xi]$.% \footnote{ See footnote~\ref{vector:280:fn1}. } % ---------------------------------------------------------------------- This concludes the present chapter on the algebra of vector analysis. Chapter~\ref{vcalc}, next, will venture hence into the larger and even more interesting realm of vector calculus. derivations-0.53.20120414.orig/tex/drvtv.tex0000644000000000000000000015305711742566274017041 0ustar rootroot% ---------------------------------------------------------------------- \chapter{The derivative} \label{drvtv} \index{calculus} \index{calculus!the two complementary questions of} \index{derivative} \index{rate} \index{change, rate of} \index{Newton, Sir Isaac (1642--1727)} \index{Leibnitz, Gottfried Wilhelm (1646--1716)} The mathematics of \emph{calculus} concerns a complementary pair of questions:% \footnote{ Although once grasped the concept is relatively simple, to understand this pair of questions, so briefly stated, is no trivial thing. They are the pair which eluded or confounded the most brilliant mathematical minds of the ancient world. The greatest conceptual hurdle---the stroke of brilliance---probably lies simply in stating the pair of questions clearly. Sir Isaac Newton and G.W.~Leibnitz cleared this hurdle for us in the seventeenth century, so now at least we know the right pair of questions to ask. With the pair in hand, the calculus beginner's first task is quantitatively to understand the pair's interrelationship, generality and significance. Such an understanding constitutes the basic calculus concept. It cannot be the role of a book like this one to lead the beginner gently toward an apprehension of the basic calculus concept. Once grasped, the concept is simple and briefly stated. In this book we necessarily state the concept briefly, then move along. Many instructional textbooks---\cite{Hamming} is a worthy example---have been written to lead the beginner gently. Although a sufficiently talented, dedicated beginner could perhaps obtain the basic calculus concept directly here, he would probably find it quicker and more pleasant to begin with a book like the one referenced. } \bi \item Given some function $f(t)$, what is the function's instantaneous rate of change, or \emph{derivative,} $f'(t)$? \item Interpreting some function $f'(t)$ as an instantaneous rate of change, what is the corresponding accretion, or \emph{integral,} $f(t)$? \ei This chapter builds toward a basic understanding of the first question. % ---------------------------------------------------------------------- \section{Infinitesimals and limits} \label{drvtv:210} Calculus systematically treats numbers so large and so small, they lie beyond the reach of our mundane number system. 
\subsection{The infinitesimal} \label{drvtv:210.001} \index{infinitesimal} \index{infinity} \index{number!very large or very small} A number~$\ep$ is an \emph{infinitesimal} if it is so small that \[ 0 < \left|\ep\right| < a \] for all possible mundane positive numbers~$a$. This is somewhat a difficult concept, so if it is not immediately clear then let us approach the matter colloquially. Let me propose to you that I have an infinitesimal. ``How big is your infinitesimal?'' you ask. ``Very, very small,'' I reply. ``How small?'' ``Very small.'' ``Smaller than 0x0.01?'' ``Smaller than what?'' ``Than~$2^{-8}$. You said that we should use hexadecimal notation in this book, remember?'' ``Sorry. Yes, right, smaller than 0x0.01.'' ``What about 0x0.0001? Is it smaller than that?'' ``Much smaller.'' ``Smaller than $\mr{0x0.0000\,0000\,0000\,0001}$?'' ``Smaller.'' ``Smaller than $2^{-\mr{0x1\,0000\,0000\,0000\,0000}}$?'' ``Now \emph{that} is an impressively small number. Nevertheless, my infinitesimal is smaller still.'' ``Zero, then.'' ``Oh, no. Bigger than that. My infinitesimal is definitely bigger than zero.'' This is the idea of the infinitesimal. It is a definite number of a certain nonzero magnitude, but its smallness conceptually lies beyond the reach of our mundane number system. If~$\ep$ is an infinitesimal, then $1/\ep$ can be regarded as an \emph{infinity:} a very large number much larger than any mundane number one can name. \index{$\delta$} \index{$\epsilon$} \index{dividing by zero} \index{zero!dividing by} \index{$0$ (zero)!dividing by} \index{infinitesimal!practical size of} The principal advantage of using symbols like~$\ep$ rather than~$0$ for infinitesimals is in that it permits us conveniently to compare one infinitesimal against another, to add them together, to divide them, etc. For instance, if $\delta = 3\ep$ is another infinitesimal, then the quotient $\delta/\ep$ is not some unfathomable~$0/0$; rather it is $\delta/\ep = 3$. In physical applications, the infinitesimals are often not true mathematical infinitesimals but rather relatively very small quantities such as the mass of a wood screw compared to the mass of a wooden house frame, or the audio power of your voice compared to that of a jet engine. The additional cost of inviting one more guest to the wedding may or may not be infinitesimal, depending on your point of view. The key point is that the infinitesimal quantity be negligible by comparison, whatever ``negligible'' means in the context.% \footnote{ Among scientists and engineers who study wave phenomena, there is an old rule of thumb that sinusoidal waveforms be discretized not less finely than ten points per wavelength. In keeping with this book's adecimal theme (Appendix~\ref{hex}) and the concept of the hour of arc (\S~\ref{trig:260}), we should probably render the rule as \emph{twelve} points per wavelength here. In any case, even very roughly speaking, a quantity greater then $1/\mbox{0xC}$ of the principal to which it compares probably cannot rightly be regarded as infinitesimal. On the other hand, a quantity less than $1/\mbox{0x10000}$ of the principal is indeed infinitesimal for most practical purposes (but not all: for example, positions of spacecraft and concentrations of chemical impurities must sometimes be accounted more precisely). For quantities between $1/\mbox{0xC}$ and $1/\mbox{0x10000}$, it depends on the accuracy one seeks. 
} \index{infinitesimal!second- and higher-order} The second-order infinitesimal~$\ep^2$ is so small on the scale of the common, first-order infinitesimal~$\ep$ that the even latter cannot measure it. The~$\ep^2$ is an infinitesimal to the infinitesimals. Third- and higher-order infinitesimals are likewise possible. \index{$\ll$ and~$\gg$} The notation $u \ll v$, or $v \gg u$, indicates that~$u$ is much less than~$v$, typically such that one can regard the quantity $u/v$ to be an infinitesimal. In fact, one common way to specify that~$\ep$ be infinitesimal is to write that $\ep \ll 1$. \subsection{Limits} \label{drvtv:210.20} \index{limit} The notation $\lim_{z\rightarrow z_o}$ indicates that~$z$ draws as near to~$z_o$ as it possibly can. When written $\lim_{z\rightarrow z_o^+}$, the implication is that~$z$ draws toward~$z_o$ from the positive side such that $z > z_o$. Similarly, when written $\lim_{z\rightarrow z_o^-}$, the implication is that~$z$ draws toward~$z_o$ from the negative side. The reason for the notation is to provide a way to handle expressions like \[ \frac{3z}{2z} \] as~$z$ vanishes: \[ \lim_{z\rightarrow 0}\frac{3z}{2z} = \frac{3}{2}. \] The symbol ``$\lim_Q$'' is short for ``in the limit as~$Q$.'' Notice that $\lim$ is not a function like $\log$ or $\sin$. It is just a reminder that a quantity approaches some value, used when saying that the quantity \emph{equaled} the value would be confusing. Consider that to say \[ \lim_{z\rightarrow 2^-} ( z + 2 ) = 4 \] is just a fancy way of saying that $2+2=4.$ The $\lim$ notation is convenient to use sometimes, but it is not magical. Don't let it confuse you. % ---------------------------------------------------------------------- \section{Combinatorics} \label{drvtv:220} \index{combinatorics} In its general form, the problem of selecting~$k$ specific items out of a set of~$n$ available items belongs to probability theory (Ch.~\ref{prob}). In its basic form, however, the same problem also applies to the handling of polynomials or power series. This section treats the problem in its basic form.% \footnote{\cite{Hamming}} \subsection{Combinations and permutations} \label{drvtv:220.20} \index{combination} \index{permutation} \index{choice of wooden blocks} \index{selection from among wooden blocks} \index{block, wooden} \index{wooden block} Consider the following scenario. I have several small wooden blocks of various shapes and sizes, painted different colors so that you can clearly tell each block from the others. If I offer you the blocks and you are free to take all, some or none of them at your option, if you can take whichever blocks you want, then how many distinct choices of blocks do you have? Answer: you have~$2^n$ choices, because you can accept or reject the first block, then accept or reject the second, then the third, and so on. Now, suppose that what you want is exactly~$k$ blocks, neither more nor fewer. Desiring exactly~$k$ blocks, you select your favorite block first: there are~$n$ options for this. Then you select your second favorite: for this, there are $n-1$ options (why not~$n$ options? because you have already taken one block from me; I have only $n-1$ blocks left). Then you select your third favorite---for this there are $n-2$ options---and so on until you have~$k$ blocks. There are evidently \bq{drvtv:220:20} P\cmb{n}{k} \equiv n!/(n-k)! \eq ordered ways, or \emph{permutations,} available for you to select exactly~$k$ blocks. 
However, some of these distinct permutations put exactly the same \emph{combination} of blocks in your hand; for instance, the permutations red-green-blue and green-red-blue constitute the same combination, whereas red-white-blue is a different combination entirely. For a single combination of~$k$ blocks (red, green, blue), evidently~$k!$ permutations are possible (% red-green-blue, red-blue-green, green-red-blue, green-blue-red, blue-red-green, blue-green-red% ). Hence dividing the number of permutations~(\ref{drvtv:220:20}) by~$k!$ yields the number of combinations \bq{drvtv:220:30} \cmb{n}{k} \equiv \frac{n!/(n-k)!}{k!}. \eq \index{combination!properties of} \index{Pascal's triangle!neighbors in} Properties of the number {$\cmbl{n}{k}$} of combinations include that \bqa \cmb{n}{n-k} &=& \cmb{n}{k},\label{drvtv:220:31}\\ \sum_{k=0}^{n} \cmb{n}{k} &=& 2^n,\label{drvtv:220:34}\\ \cmb{n-1}{k-1} + \cmb{n-1}{k} &=& \cmb{n}{k},\label{drvtv:220:37}\\ \cmb{n}{k} &=& \frac{n-k+1}{k}\cmb{n}{k-1}\label{drvtv:220:41}\\ &=& \frac{k+1}{n-k}\cmb{n}{k+1}\label{drvtv:220:42}\\ &=& \frac{n}{k}\cmb{n-1}{k-1}\label{drvtv:220:43}\\ &=& \frac{n}{n-k}\cmb{n-1}{k}.\label{drvtv:220:44} \eqa Equation~(\ref{drvtv:220:31}) results from changing the variable $k\la n-k$ in~(\ref{drvtv:220:30}). Equation~(\ref{drvtv:220:34}) comes directly from the observation (made at the head of this section) that~$2^n$ total combinations are possible if any~$k$ is allowed. Equation~(\ref{drvtv:220:37}) is seen when an $n$th block---let us say that it is a black block---is added to an existing set of $n-1$ blocks; to choose~$k$ blocks then, you can either choose~$k$ from the original set, or the black block plus $k-1$ from the original set. Equations~(\ref{drvtv:220:41}) through~(\ref{drvtv:220:44}) come directly from the definition~(\ref{drvtv:220:30}); they relate combinatoric coefficients to their neighbors in Pascal's triangle (\S~\ref{drvtv:220.30}). Because one can choose neither fewer than zero nor more than~$n$ from~$n$ blocks, \bq{drvtv:220:48} \cmb n k = 0 \ \ \mbox{unless}\ 0 \le k \le n. \eq For $\cmbl n k$ when $n<0$, there is no obvious definition. \subsection{Pascal's triangle} \label{drvtv:220.30} \index{Pascal's triangle} \index{Pascal, Blaise (1623--1662)} Consider the triangular layout in Fig.~\ref{drvtv:pasc0} of the various possible $\cmbl{n}{k}$. \begin{figure} \caption{The plan for Pascal's triangle.} \label{drvtv:pasc0} \bc \[ \br{c} \cmbl{0}{0} \\ \cmbl{1}{0} \cmbl{1}{1} \\ \cmbl{2}{0} \cmbl{2}{1} \cmbl{2}{2} \\ \cmbl{3}{0} \cmbl{3}{1} \cmbl{3}{2} \cmbl{3}{3} \\ \cmbl{4}{0} \cmbl{4}{1} \cmbl{4}{2} \cmbl{4}{3} \cmbl{4}{4} \\ \vdots \er \] \ec \end{figure} Evaluated, this yields Fig.~\ref{drvtv:pasc}, \emph{Pascal's triangle.} Notice how each entry in the triangle is the sum of the two entries immediately above, as~(\ref{drvtv:220:37}) predicts. (In fact this is the easy way to fill Pascal's triangle out: for each entry, just add the two entries above.) 
\begin{figure} \nc\di[1]{\makebox[\wdig][c]{#1}} \caption{Pascal's triangle.} \label{drvtv:pasc} \bc \[ \br{c} 1 \\ 1\ \ 1 \\ 1\ \ 2\ \ 1 \\ 1\ \ 3\ \ 3\ \ 1 \\ 1\ \ 4\ \ 6\ \ 4\ \ 1 \\ 1\ \ 5\ \ \di A\ \ \di A\ \ 5\ \ 1 \\ 1\ \ 6\ \ \di F\ \ \di{14}\ \ \di F\ \ 6\ \ 1 \\ 1\ \ 7\ \ \di{15}\ \ \di{23}\ \ \di{23}\ \ \di{15}\ \ 7\ \ 1 \\ \vdots \er \] \ec \end{figure} % ---------------------------------------------------------------------- \section{The binomial theorem} \label{drvtv:230} \index{binomial theorem} This section presents the binomial theorem and one of its significant consequences. \subsection{Expanding the binomial} \label{drvtv:230.20} The \emph{binomial theorem} holds that% \footnote{ % diagn: this revised footnote wants one last review. The author is given to understand that, by an heroic derivational effort,~(\ref{drvtv:230:binth}) can be extended to nonintegral~$n$. However, since applied mathematics does not usually concern itself with hard theorems of little known practical use, the extension as such is not covered in this book. What is covered---in Table~\ref{taylor:315:tbl}---is the Taylor series for $(1+z)^{a-1}$ for complex~$z$ and complex~$a$, which amounts to much the same thing. } \bq{drvtv:230:binth} (a+b)^n = \sum_{k=0}^n \cmb{n}{k} a^{n-k}b^k. \eq In the common case that $a=1$, $b=\ep$, $\left| \ep \right| \ll 1$, this is \bq{drvtv:230:binthe} (1+\ep)^n = \sum_{k=0}^n \cmb{n}{k} \ep^k \eq (actually this holds for any~$\ep$, small or large; but the typical case of interest has $|\ep| \ll 1$). In either form, the binomial theorem is a direct consequence of the combinatorics of \S~\ref{drvtv:220}. Since \[ (a+b)^n = (a+b)(a+b)\cdots(a+b)(a+b), \] each $(a+b)$ factor corresponds to one of the ``wooden blocks,'' where~$a$ means rejecting the block and~$b$, accepting it. \subsection{Powers of numbers near unity} \label{drvtv:230.30} \index{$\approx$} Since $\cmbl{n}{0} = 1$ and $\cmbl{n}{1} = n$, it follows from~(\ref{drvtv:230:binthe}) for \[ (m,n) \in \mathbb Z, \ m > 0, \ n \ge 0, \ \left|\delta\right| \ll 1, \ \left|\ep\right| \ll 1, \ \left|\ep_o\right| \ll 1, \] that% \footnote{ The symbol~$\approx$ means ``approximately equals.'' } \[ 1 + m\ep_o \approx (1+\ep_o)^m \] to excellent precision. Furthermore, raising the equation to the $1/m$ power then changing $\delta \la m\ep_o$, we have that \[ (1 + \delta)^{1/m} \approx 1+\frac{\delta}{m}. \] Changing $1+\delta \la (1+\ep)^n$ and observing from the $(1+\ep_o)^m$ equation above that this implies that $\delta \approx n\ep$, we have that \[ (1 + \ep)^{n/m} \approx 1+\frac{n}{m}\ep. \] Inverting this equation yields \[ (1+\ep)^{-n/m} \approx \frac{1}{1 + (n/m)\ep} = \frac{[1-(n/m)\ep]}{[1-(n/m)\ep][1+(n/m)\ep]} \approx 1-\frac{n}{m}\ep. \] Taken together, the last two equations imply that \bq{drvtv:230:apxex} (1+\ep)^x \approx 1 + x\ep \eq for any real~$x$. \index{Taylor expansion, first-order} The writer knows of no conventional name% \footnote{ Actually, ``the first-order Taylor expansion'' is a conventional name for it, but so unwieldy a name does not fit the present context. Ch.~\ref{taylor} will introduce the Taylor expansion as such. } for~(\ref{drvtv:230:apxex}), but named or unnamed it is an important equation. The equation offers a simple, accurate way of approximating any real power of numbers in the near neighborhood of~1. 
\subsection{Complex powers of numbers near unity} \label{drvtv:230.35} \index{power!complex} \index{complex power} Equation~(\ref{drvtv:230:apxex}) is fine as far as it goes, but its very form suggests the question: what if~$\ep$ or~$x$, or both, are complex? Changing the symbol $z\la x$ and observing that the infinitesimal~$\ep$ may also be complex, one wants to know whether \bq{drvtv:230:apxe} (1+\ep)^z \approx 1 + z\ep \eq still holds. No work we have yet done in the book answers the question, because although a complex infinitesimal~$\ep$ poses no particular problem, the action of a complex power~$z$ remains undefined. Still, for consistency's sake, one would like~(\ref{drvtv:230:apxe}) to hold. In fact nothing prevents us from defining the action of a complex power such that~(\ref{drvtv:230:apxe}) does hold, which we now do, logically extending the known result~(\ref{drvtv:230:apxex}) into the new domain. Section~\ref{cexp:230} will investigate the extremely interesting effects which arise when $\Re(\ep)=0$ and the power~$z$ in~(\ref{drvtv:230:apxe}) grows large, but for the moment we shall use the equation in a more ordinary manner to develop the concept and basic application of the derivative, as follows. % ---------------------------------------------------------------------- \section{The derivative} \label{drvtv:240} \index{derivative!definition of} \index{derivative!Newton notation for} \index{Newton, Sir Isaac (1642--1727)} \index{derivative!balanced form} \index{derivative!unbalanced form} Having laid down~(\ref{drvtv:230:apxe}), we now stand in a position properly to introduce the chapter's subject, the derivative. What is the derivative? The \emph{derivative} is the instantaneous rate or slope of a function. In mathematical symbols and for the moment using real numbers, \bq{drvtv:def} f'(t) \equiv \lim_{\ep\rightarrow 0^+} \frac{f(t+\ep/2)-f(t-\ep/2)}{\ep}. \eq Alternately, one can define the same derivative in the unbalanced form \[ f'(t) = \lim_{\ep\rightarrow 0^+} \frac{f(t+\ep)-f(t)}{\ep}, \] but this book generally prefers the more elegant balanced form~(\ref{drvtv:def}), which we will now use in developing the derivative's several properties through the rest of the chapter.% \footnote{ From this section through \S~\ref{drvtv:260}, the mathematical notation grows a little thick. There is no helping this. The reader is advised to tread through these sections line by stubborn line, in the good trust that the math thus gained will prove both interesting and useful. } \subsection{The derivative of the power series} \label{drvtv:240.20} \index{power series!derivative of} In the very common case that $f(t)$ is the power series \bq{drvtv:240:30} f(t) = \sum_{k=-\infty}^{\infty} c_k t^k, \eq where the~$c_k$ are in general complex coefficients,~(\ref{drvtv:def}) says that \bqb f'(t) &=& \sum_{k=-\infty}^{\infty} \lim_{\ep\rightarrow 0^+} \frac{ (c_k)(t+\ep/2)^k - (c_k)(t-\ep/2)^k }{\ep} \\ &=& \sum_{k=-\infty}^{\infty} \lim_{\ep\rightarrow 0^+} c_kt^k \frac{ (1+\ep/2t)^k - (1-\ep/2t)^k }{\ep}. \eqb Applying~(\ref{drvtv:230:apxe}), this is \[ f'(t) = \sum_{k=-\infty}^{\infty} \lim_{\ep\rightarrow 0^+} c_kt^k \frac{ (1+k\ep/2t) - (1-k\ep/2t) }{\ep}, \] which simplifies to \bq{drvtv:240:polyderiv} f'(t) = \sum_{k=-\infty}^{\infty} c_kkt^{k-1}. 
\eq Equation~(\ref{drvtv:240:polyderiv}) gives the general derivative of the power series.% \footnote{ Equation~(\ref{drvtv:240:polyderiv}) admittedly has not explicitly considered what happens when the real~$t$ becomes the complex~$z$, but \S~\ref{drvtv:240.50} will remedy the oversight. } \subsection{The Leibnitz notation} \label{drvtv:240.25} \index{Leibnitz, Gottfried Wilhelm (1646--1716)} \index{Leibnitz notation} \index{derivative!Leibnitz notation for} \index{infinitesimal!and the Leibnitz notation} \index{dependent variable} \index{independent variable} \index{variable!dependent} \index{variable!independent} The $f'(t)$ notation used above for the derivative is due to Sir Isaac Newton, and is easier to start with. Usually better on the whole, however, is G.W.~Leibnitz's notation% \footnote{ This subsection is likely to confuse many readers the first time they read it. The reason is that Leibnitz elements like~$dt$ and~$\partial f$ usually tend to appear in practice in certain specific relations to one another, like $\partial f/\partial z$. As a result, many users of applied mathematics have never developed a clear understanding as to precisely what the individual symbols mean. Often they have developed positive misunderstandings. Because there is significant practical benefit in learning how to handle the Leibnitz notation correctly---particularly in applied complex variable theory---this subsection seeks to present each Leibnitz element in its correct light. } \bqb dt &=& \ep,\\ df &=& f(t+dt/2)-f(t-dt/2), \eqb such that per~(\ref{drvtv:def}), \bq{drvtv:240:50} f'(t) = \frac{df}{dt}. \eq Here~$dt$ is the infinitesimal, and~$df$ is a dependent infinitesimal whose size \emph{relative to~$dt$} depends on the independent variable~$t$. For the independent infinitesimal~$dt$, conceptually, one can choose any infinitesimal size~$\ep$. Usually the exact choice of size does not matter, but occasionally when there are two independent variables it helps the analysis to adjust the size of one of the independent infinitesimals with respect to the other. The meaning of the symbol~$d$ unfortunately depends on the context. In~(\ref{drvtv:240:50}), the meaning is clear enough: $d(\cdot)$ signifies how much $(\cdot)$ changes when the independent variable~$t$ increments by~$dt$.% \footnote{ If you do not fully understand this sentence, reread it carefully with reference to~(\ref{drvtv:def}) and~(\ref{drvtv:240:50}) until you do; it's important. } Notice, however, that the notation~$dt$ itself has two distinct meanings:% \footnote{ This is difficult, yet the author can think of no clearer, more concise way to state it. The quantities~$dt$ and~$df$ represent coordinated infinitesimal changes in~$t$ and~$f$ respectively, so there is usually no trouble with treating~$dt$ and~$df$ as though they were the same kind of thing. However, at the fundamental level they really aren't. If~$t$ is an independent variable, then~$dt$ is just an infinitesimal of some kind, whose specific size could be a function of~$t$ but more likely is just a constant. If a constant, then~$dt$ does not fundamentally have anything to do with~$t$ as such. In fact, if~$s$ and~$t$ are both independent variables, then we can (and in complex analysis sometimes do) say that $ds=dt=\ep$, after which nothing prevents us from using the symbols~$ds$ and~$dt$ interchangeably. Maybe it would be clearer in some cases to write~$\ep$ instead of~$dt$, but the latter is how it is conventionally written. 
By contrast, if~$f$ is a dependent variable, then~$df$ or $d(f)$ is the amount by which~$f$ changes as~$t$ changes by~$dt$. The~$df$ is infinitesimal but not constant; it is a function of~$t$. Maybe it would be clearer in some cases to write~$d_tf$ instead of~$df$, but for most cases the former notation is unnecessarily cluttered; the latter is how it is conventionally written. Now, most of the time, what we are interested in is not~$dt$ or~$df$ as such, but rather the ratio $df/dt$ or the sum $\sum_k f(k\,dt) \,dt = \int f(t) \,dt$. For this reason, we do not usually worry about which of~$df$ and~$dt$ is the independent infinitesimal, nor do we usually worry about the precise value of~$dt$. This leads one to forget that~$dt$ does indeed have a precise value. What confuses is when one changes perspective in mid-analysis, now regarding~$f$ as the independent variable. Changing perspective is allowed and perfectly proper, but one must take care: the~$dt$ and~$df$ after the change are not the same as the~$dt$ and~$df$ before the change. However, the ratio $df/dt$ remains the same in any case. Sometimes when writing a differential equation like the potential-kinetic energy equation $ma\,dx=mv\,dv$, we do not necessarily have either~$v$ or~$x$ in mind as the independent variable. This is fine. The important point is that~$dv$ and~$dx$ be coordinated so that the ratio $dv/dx$ has a definite value no matter which of the two be regarded as independent, or whether the independent be some third variable (like~$t$) not in the equation. One can avoid the confusion simply by keeping the $dv/dx$ or $df/dt$ always in ratio, never treating the infinitesimals individually. Many applied mathematicians do precisely that. That is okay as far as it goes, but it really denies the entire point of the Leibnitz notation. One might as well just stay with the Newton notation in that case. Instead, this writer recommends that you learn the Leibnitz notation properly, developing the ability to treat the infinitesimals individually. Because the book is a book of applied mathematics, this footnote does not attempt to say everything there is to say about infinitesimals. For instance, it has not yet pointed out (but does so now) that even if~$s$ and~$t$ are equally independent variables, one can have $dt = \ep(t)$, $ds = \delta(s,t)$, such that~$dt$ has prior independence to~$ds$. The point is not to fathom all the possible implications from the start; you can do that as the need arises. The point is to develop a clear picture in your mind of what a Leibnitz infinitesimal really is. Once you have the picture, you can go from there. } \bi \item the independent infinitesimal $dt=\ep$; and \item $d(t)$, which is how much $(t)$ changes as~$t$ increments by~$dt$. \ei At first glance, the distinction between~$dt$ and $d(t)$ seems a distinction without a difference; and for most practical cases of interest, so indeed it is. However, when switching perspective in mid-analysis as to which variables are dependent and which are independent, or when changing multiple independent complex variables simultaneously, the math can get a little tricky. In such cases, it may be wise to use the symbol~$dt$ to mean $d(t)$ only, introducing some unambiguous symbol like~$\ep$ to represent the independent infinitesimal. In any case you should appreciate the conceptual difference between $dt=\ep$ and $d(t)$, both of which nonetheless normally are written~$dt$. 
\index{partial derivative} \index{derivative!partial} Where two or more independent variables are at work in the same equation, it is conventional to use the symbol~$\partial$ instead of~$d$, as a reminder that the reader needs to pay attention to which~$\partial$ tracks which independent variable.% \footnote{ The writer confesses that he remains unsure why this minor distinction merits the separate symbol~$\partial$, but he accepts the notation as conventional nevertheless. } A derivative $\partial f/\partial t$ or $\partial f/\partial s$ in this case is sometimes called by the slightly misleading name of \emph{partial derivative}. (If needed or desired, one can write $\partial_t f$ when tracking~$t$, $\partial_s f$ when tracking~$s$, etc. Use discretion, though. Such notation appears only rarely in the literature, so your audience might not understand it when you write it.) Conventional shorthand for $d(df)$ is~$d^2f$; for $(dt)^2$, $dt^2$; so \[ \frac{d(df/dt)}{dt} = \frac{d^2f}{dt^2} \] is a derivative of a derivative, or \emph{second derivative.} By extension, the notation \[ \frac{d^kf}{dt^k} \] represents the $k$th derivative. \subsection{The derivative of a function of a complex variable} \label{drvtv:240.50} \index{derivative!of a function of a complex variable} \index{complex variable} \index{variable!complex} \index{function!of a complex variable} For~(\ref{drvtv:def}) to be robust, written here in the slightly more general form \bq{drvtv:defz} \frac{df}{dz} = \lim_{\ep\rightarrow 0} \frac{f(z+\ep/2)-f(z-\ep/2)}{\ep}, \eq one should like it to evaluate the same in the limit regardless of the complex phase of~$\ep$. That is, if~$\delta$ is a positive real infinitesimal, then it should be equally valid to let $\ep=\delta$, $\ep=-\delta$, $\ep=i\delta$, $\ep=-i\delta$, $\ep=(4-i3)\delta$ or any other infinitesimal value, so long as $0<\left|\ep\right|\ll 1$. One should like the derivative~(\ref{drvtv:defz}) to come out the same regardless of the Argand direction from which~$\ep$ approaches 0 (see Fig.~\ref{alggeo:225:fig}). In fact for the sake of robustness, one normally demands that derivatives do come out the same regardless of the Argand direction; and~(\ref{drvtv:defz}) rather than~(\ref{drvtv:def}) is the definition we normally use for the derivative for this reason. Where the limit~(\ref{drvtv:defz}) is sensitive to the Argand direction or complex phase of~$\ep$, there we normally say that the derivative does not exist. \index{differentiability} Where the derivative~(\ref{drvtv:defz}) does exist---where the derivative is finite and insensitive to Argand direction---there we say that the function $f(z)$ is \emph{differentiable.}% \footnote{ % diagn: this footnote still wants review. The unbalanced definition of the derivative from \S~\ref{drvtv:240}, whose complex form is \[ \frac{df}{dz} = \lim_{\ep\rightarrow 0} \frac{f(z+\ep)-f(z)}{\ep}, \] does not always serve applications as well as does the balanced definition~(\ref{drvtv:defz}) this book prefers. Professional mathematicians have different needs, though. They seem to prefer the unbalanced nonetheless. In the professionals' favor, one acknowledges that the balanced definition strictly misjudges the modulus function $f(z) = \left|z\right|$ to be differentiable solely at the point $z=0$, whereas that the unbalanced definition, probably more sensibly, judges the modulus to be differentiable nowhere---though the writer is familiar with no significant applied-mathematical implication of the distinction. 
(Would it co\"ordinate the two definitions to insist that a derivative exist not only at a point but everywhere in the point's immediate, complex neighborhood? The writer does not know. It is a question for the professionals.) Scientists and engineers tend to prefer the balanced definition among other reasons because it more reliably approximates the derivative of a function for which only discrete samples are available~\cite[\S\S~I:9.6 and~I:9.7]{Feynman}\@. Moreover, for this writer at least the balanced definition just better captures the subjective sense of the thing. } Excepting the nonanalytic parts of complex numbers ($\left|\cdot\right|$, $\arg[\cdot]$, $[\cdot]^{*}$, $\Re[\cdot]$ and $\Im[\cdot]$; see \S~\ref{alggeo:225.3}), plus the Heaviside unit step $u(t)$ and the Dirac delta $\delta(t)$ (\S~\ref{integ:670}), most functions encountered in applications do meet the criterion~(\ref{drvtv:defz}) except at isolated nonanalytic points (like $z=0$ in $h[z]=1/z$ or $g[z]=\sqrt z$). Meeting the criterion, such functions are fully differentiable except at their poles (where the derivative goes infinite in any case) and other nonanalytic points. Particularly, the key formula~(\ref{drvtv:230:apxe}), written here as \[ (1+\ep)^w \approx 1 + w\ep, \] works without modification when~$\ep$ is complex; so the derivative~(\ref{drvtv:240:polyderiv}) of the general power series, \bq{drvtv:240:polyderivz} \frac{d}{dz} \sum_{k=-\infty}^{\infty} c_kz^k = \sum_{k=-\infty}^{\infty} c_kkz^{k-1} \eq holds equally well for complex~$z$ as for real. \subsection{The derivative of~$z^a$} \label{drvtv:240.30} \index{derivative!of~$z^a$} Inspection of \S~\ref{drvtv:240.20}'s logic in light of~(\ref{drvtv:230:apxe}) reveals that nothing prevents us from replacing the real~$t$, real~$\ep$ and integral~$k$ of that section with arbitrary complex~$z$, $\ep$ and~$a$. That is, \bqb \frac{d(z^a)}{dz} &=& \lim_{\ep\rightarrow 0} \frac{ (z+\ep/2)^a - (z-\ep/2)^a }{\ep} \\ &=& \lim_{\ep\rightarrow 0} z^a \frac{ (1+\ep/2z)^a - (1-\ep/2z)^a }{\ep} \\ &=& \lim_{\ep\rightarrow 0} z^a \frac{ (1+a\ep/2z) - (1-a\ep/2z) }{\ep}, \eqb which simplifies to \bq{drvtv:240.30:10} \frac{d(z^a)}{dz} = az^{a-1} \eq for any complex~$z$ and~$a$. How exactly to evaluate~$z^a$ or $z^{a-1}$ when~$a$ is complex is another matter, treated in \S~\ref{cexp:230} and its~(\ref{cexp:230:33}); but in any case you can use~(\ref{drvtv:240.30:10}) for real~$a$ right now. \subsection{The logarithmic derivative} \label{drvtv:240.40} \index{logarithmic derivative} \index{derivative!logarithmic} \index{rate!relative} \index{relative rate} \index{interest} \index{bond} Sometimes one is more interested in knowing the rate of $f(t)$ \emph{relative to the value of $f(t)$} than in knowing the absolute rate itself. For example, if you inform me that you earn $\$\:1000$ a year on a bond you hold, then I may commend you vaguely for your thrift but otherwise the information does not tell me much. However, if you inform me instead that you earn~10 percent a year on the same bond, then I might want to invest. The latter figure is a relative rate, or \emph{logarithmic derivative,} \bq{drvtv:240.40:10} \frac{df/dt}{f(t)} = \frac{d}{dt}\ln f(t). \eq The investment principal grows at the absolute rate $df/dt$, but the bond's interest rate is $(df/dt)/f(t)$. 
The natural logarithmic notation $\ln f(t)$ may not mean much to you yet, as we'll not introduce it formally until \S~\ref{cexp:225}, so you can ignore the right side of~(\ref{drvtv:240.40:10}) for the moment; but the equation's left side at least should make sense to you. It expresses the significant concept of a relative rate, like~10 percent annual interest on a bond. % ---------------------------------------------------------------------- \section{Basic manipulation of the derivative} \label{drvtv:250} \index{derivative!manipulation of} This section introduces the derivative chain and product rules. \subsection{The derivative chain rule} \label{drvtv:250.20} \index{derivative!chain rule for} \index{chain rule, derivative} If~$f$ is a function of~$w$, which itself is a function of~$z$, then% \footnote{ For example, one can rewrite \[ f(z) = \sqrt{3z^2 - 1} \] in the form \bqb f(w) &=& w^{1/2}, \\ w(z) &=& 3z^2 - 1. \eqb Then \bqb \frac{df}{dw} &=& \frac{1}{2w^{1/2}} = \frac{1}{2\sqrt{3z^2 - 1}}, \\ \frac{dw}{dz} &=& 6z, \eqb so by~(\ref{drvtv:chain}), \[ \frac{df}{dz} = \left(\frac{df}{dw}\right) \left(\frac{dw}{dz}\right) = \frac{ 6z }{ 2\sqrt{3z^2 - 1} } = \frac{ 3z }{ \sqrt{3z^2 - 1} }. \] } \bq{drvtv:chain} \frac{df}{dz} = \left(\frac{df}{dw}\right) \left(\frac{dw}{dz}\right). \eq Equation~(\ref{drvtv:chain}) is the \emph{derivative chain rule.}% \footnote{ It bears emphasizing to readers who may inadvertently have picked up unhelpful ideas about the Leibnitz notation in the past: the~$dw$ factor in the denominator cancels the~$dw$ factor in the numerator; a thing divided by itself is~1. That's it. There is nothing more to the proof of the derivative chain rule than that. } \subsection{The derivative product rule} \label{drvtv:250.30} \index{derivative!product rule for} \index{product rule, derivative} In general per~(\ref{drvtv:defz}), \[ d\left[\prod_j f_j(z)\right] = \prod_j f_j\left(z+\frac{dz}{2}\right) - \prod_j f_j\left(z-\frac{dz}{2}\right). \] But to first order, \[ f_j\left(z\pm\frac{dz}{2}\right) \approx f_j(z) \pm \left(\frac{df_j}{dz}\right)\left(\frac{dz}{2}\right) = f_j(z) \pm \frac{df_j}{2}; \] so, in the limit, \[ d\left[\prod_j f_j(z)\right] = \prod_j \left(f_j(z) + \frac{df_j}{2}\right) - \prod_j \left(f_j(z) - \frac{df_j}{2}\right). \] Since the product of two or more~$df_j$ is negligible compared to the first-order infinitesimals to which they are added here, this simplifies to \[ d\left[\prod_j f_j(z)\right] = \left[ \prod_j f_j(z) \right] \left[\sum_k \frac{df_k}{2f_k(z)} \right] - \left[ \prod_j f_j(z) \right] \left[\sum_k \frac{-df_k}{2f_k(z)} \right], \] or in other words \bq{drvtv:prod} d\prod_j f_j = \left[ \prod_j f_j \right]\left[ \sum_k \frac{df_k}{f_k} \right]. \eq In the common case of only two~$f_j$, this comes to \bq{drvtv:prod2} d(f_1f_2) = f_2\,df_1 + f_1\,df_2. \eq On the other hand, if $f_1(z) = f(z)$ and $f_2(z) = 1/g(z)$, then by the derivative chain rule~(\ref{drvtv:chain}), $df_2 = -dg/g^2$; so, \bq{drvtv:proddiv} d\left(\frac f g\right) = \frac{g\,df - f\,dg}{g^2}. \eq Equation~(\ref{drvtv:prod}) is the \emph{derivative product rule.} After studying the complex exponential in Ch.~\ref{cexp}, we shall stand in a position to write~(\ref{drvtv:prod}) in the slightly specialized but often useful form% \footnote{ This paragraph is extra. You can skip it for now if you prefer. 
} \bqa \lefteqn{d\left[\prod_j g_j^{a_j} \prod_j e^{b_jh_j} \prod_j \ln c_jp_j \right]} && \xn\\&=& \left[ \prod_j g_j^{a_j} \prod_j e^{b_jh_j} \prod_j \ln c_jp_j \right] \xn\\&& \ \ \mbox{}\times \left[ \sum_k a_k\frac{dg_k}{g_k} + \sum_k b_k\,dh_k + \sum_k \frac{dp_k}{p_k\ln c_kp_k}\right]. \label{drvtv:prod3} \eqa where the~$a_k$, $b_k$ and~$c_k$ are arbitrary complex coefficients and the~$g_k$, $h_k$ and~$p_k$ are arbitrary functions.% \footnote{ The subsection is sufficiently abstract that it is a little hard to understand unless one already knows what it means. An example may help: \[ d\left[\frac{u^2v^3}{z} e^{-5t} \ln 7s \right] = \left[\frac{u^2v^3}{z} e^{-5t} \ln 7s \right] \left[ 2\frac{du}{u} + 3\frac{dv}{v} - \frac{dz}{z} - 5\,dt + \frac{ds}{s\ln 7s} \right]. \] } \subsection{A derivative product pattern} \label{drvtv:250.35} \index{pattern!derivative product} \index{product rule, derivative!a pattern of} \index{derivative!product rule for} \index{derivative product!a pattern of} \index{derivative pattern} According to~(\ref{drvtv:prod2}) and~(\ref{drvtv:240.30:10}), the derivative of the product $z^af(z)$ with respect to its independent variable~$z$ is \[ \frac{d}{dz}[z^af(z)] = z^a\frac{df}{dz} + az^{a-1}f(z). \] Swapping the equation's left and right sides then dividing through by~$z^a$ yields \bq{drvtv:250:35} \frac{df}{dz} + a\frac{f}{z} = \frac{d(z^af)}{z^a \,dz}, \eq a pattern worth committing to memory, emerging among other places in \S~\ref{vcalc:440}. % ---------------------------------------------------------------------- \section{Extrema and higher derivatives} \label{drvtv:255} \index{extremum} \index{minimum} \index{maximum} \index{derivative!higher} \index{function!extremum of} \index{evaluation} One problem which arises very frequently in applied mathematics is the problem of finding a local \emph{extremum}---that is, a local minimum or max\-imum---of a real-valued function $f(x)$. Refer to Fig.~\ref{drvtv:255:fig1}. \begin{figure} \caption{A local extremum.} \label{drvtv:255:fig1} \bc \begin{pspicture}(-2,-1.5)(7.5,4) { \small \psline[linewidth=0.5pt](-1,0)(6,0) \psline[linewidth=0.5pt](0,-0.5)(0,3) \psplot[linewidth=2.0pt,plotpoints=200]{0.5}{5.0}{ x 2.5 sub dup mul -0.15 mul 2 add } \psdot[linewidth=2.0pt](2.5,2.0) \psline[linestyle=dashed,linewidth=0.5pt](0,2.0)(4.5,2.0) \psline[linestyle=dashed,linewidth=0.5pt](2.5,0)(2.5,2.0) \rput(2.5,-0.3){$x_o$} \rput[r](-0.15,2.0){$f(x_o)$} \rput(5.4,0.9){$f(x)$} \rput(6.3,0){$x$} \rput(0,3.3){$y$} } \end{pspicture} \ec \end{figure} The almost distinctive characteristic of the extremum $f(x_o)$ is that% \footnote{ The notation $P|_Q$ means ``$P$ when $Q$,'' ``$P$, given $Q$,'' or ``$P$ evaluated at $Q$.'' Sometimes it is alternately written $P|Q$ or $[P]_Q$. } \bq{drvtv:255:10} \left.\frac{df}{dx}\right|_{x=x_o} = 0. \eq At the extremum, the slope is zero. The curve momentarily runs level there. One solves~(\ref{drvtv:255:10}) to find the extremum. \index{slope} \index{derivative!second} \index{second derivative} \index{inflection} Whether the extremum be a minimum or a maximum depends on wheth\-er the curve turn from a downward slope to an upward, or from an upward slope to a downward, respectively. If from downward to upward, then the derivative of the slope is evidently positive; if from upward to downward, then negative. But the derivative of the slope is just the derivative of the derivative, or second derivative. 
Hence if $df/dx=0$ at $x=x_o$, then \bqb \left.\frac{d^2f}{dx^2}\right|_{x=x_o} &>& 0 \ \ \mbox{implies a local minimum at~$x_o$;} \\ \left.\frac{d^2f}{dx^2}\right|_{x=x_o} &<& 0 \ \ \mbox{implies a local maximum at~$x_o$.} \eqb Regarding the case \[ \left.\frac{d^2f}{dx^2}\right|_{x=x_o} = 0, \] this might be either a minimum or a maximum but more probably is neither, being rather a \emph{level inflection point} as depicted in Fig.~\ref{drvtv:255:fig2}.% \footnote{ Of course if the first and second derivatives are zero not just at $x=x_o$ but everywhere, then $f(x) = y_o$ is just a level straight line, but you knew that already. Whether one chooses to call some random point on a level straight line an inflection point or an extremum, or both or neither, would be a matter of definition, best established not by prescription but rather by the needs of the model at hand. } \begin{figure} \caption{A level inflection.} \label{drvtv:255:fig2} \bc \begin{pspicture}(-2,-1.5)(7.5,4) { \small \psline[linewidth=0.5pt](-1,0)(6,0) \psline[linewidth=0.5pt](0,-0.5)(0,3) \psplot[linewidth=2.0pt,plotpoints=200]{0.5}{5.0}{ x 2.5 sub dup dup mul mul -0.15 mul 2 add } \psdot[linewidth=2.0pt](2.5,2.0) \psline[linestyle=dashed,linewidth=0.5pt](0,2.0)(4.5,2.0) \psline[linestyle=dashed,linewidth=0.5pt](2.5,0)(2.5,2.0) \rput(2.5,-0.3){$x_o$} \rput[r](-0.15,2.0){$f(x_o)$} \rput(5.1,0.9){$f(x)$} \rput(6.3,0){$x$} \rput(0,3.3){$y$} } \end{pspicture} \ec \end{figure} (In general the term \emph{inflection point} signifies a point at which the second derivative is zero. The inflection point of Fig.~\ref{drvtv:255:fig2} is \emph{level} because its first derivative is zero, too.) % ---------------------------------------------------------------------- \section{L'H\^opital's rule} \label{drvtv:260} \index{l'H\^opital's rule} \index{l'H\^opital, Guillaume de (1661--1704)} \index{root} \index{pole} If $z=z_o$ is a root of both $f(z)$ and $g(z)$, or alternately if $z=z_o$ is a pole of both functions---that is, if both functions go to zero or infinity together at $z=z_o$---then \emph{l'H\^opital's rule} holds that \bq{drvtv:260:lhopital} \lim_{z\ra z_o} \frac{f(z)}{g(z)} = \left. \frac{df/dz}{dg/dz} \right|_{z=z_o}. \eq In the case where $z=z_o$ is a root, l'H\^opital's rule is proved by reasoning% \footnote{ Partly with reference to \cite[``L'Hopital's rule,'' 03:40, 5 April 2006]{wikip}. } \bqb \lefteqn{ \lim_{z\ra z_o} \frac{f(z)}{g(z)} = \lim_{z\ra z_o} \frac{f(z)-0}{g(z)-0} } &&\\&& = \lim_{z\ra z_o} \frac{f(z)-f(z_o)}{g(z)-g(z_o)} = \lim_{z\ra z_o} \frac{df}{dg} = \lim_{z\ra z_o} \frac{df/dz}{dg/dz}. \eqb In the case where $z=z_o$ is a pole, new functions $F(z)\equiv 1/f(z)$ and $G(z)\equiv 1/g(z)$ of which $z=z_o$ is a root are defined, with which \[ \lim_{z\ra z_o} \frac{f(z)}{g(z)} = \lim_{z\ra z_o} \frac{G(z)}{F(z)} = \lim_{z\ra z_o} \frac{dG}{dF} = \lim_{z\ra z_o} \frac{-dg/g^2}{-df/f^2}, \] where we have used the fact from~(\ref{drvtv:240.30:10}) that $d(1/u)=-du/u^2$ for any~$u$. Canceling the minus signs and multiplying by $g^2/f^2$, we have that \[ \lim_{z\ra z_o} \frac{g(z)}{f(z)} = \lim_{z\ra z_o} \frac{dg}{df}. \] Inverting, \[ \lim_{z\ra z_o} \frac{f(z)}{g(z)} = \lim_{z\ra z_o} \frac{df}{dg} = \lim_{z\ra z_o} \frac{df/dz}{dg/dz}. \] And if~$z_o$ itself is infinite? 
Then, whether it represents a root or a pole, we define the new variable $Z=1/z$ and the new functions $\Phi(Z)=f(1/Z)=f(z)$ and $\Gamma(Z)=g(1/Z)=g(z)$, with which we apply l'H\^opital's rule for $Z\ra 0$ to obtain \bqb \lefteqn{ \lim_{z\ra \infty} \frac{f(z)}{g(z)} = \lim_{Z\ra 0} \frac{\Phi(Z)}{\Gamma(Z)} = \lim_{Z\ra 0} \frac{d\Phi/dZ}{d\Gamma/dZ} = \lim_{Z\ra 0} \frac {df/dZ} {dg/dZ} } &&\\&& = \lim_{ \stackindexdecl{ z &\ra& \infty, \\ Z &\ra& 0 } } \frac{(df/dz)(dz/dZ)}{(dg/dz)(dz/dZ)} = \lim_{z\ra \infty} \frac{(df/dz)(-z^2)}{(dg/dz)(-z^2)} = \lim_{z\ra \infty} \frac{df/dz}{dg/dz}. \eqb Nothing in the derivation requires that~$z$ or~$z_o$ be real. Nothing prevents one from applying l'H\^opital's rule recursively, should the occasion arise.% \footnote{ Consider for example the ratio $\lim_{x\ra 0} (x^3+x)^2/x^2$, which is $0/0$. The easier way to resolve this particular ratio would naturally be to cancel a factor of~$x^2$ from it; but just to make the point let us apply l'H\^opital's rule instead, reducing the ratio to $\lim_{x\ra 0} 2(x^3+x)(3x^2+1)/2x$, which is still $0/0$. Applying l'H\^opital's rule again to the result yields $\lim_{x\ra 0} 2[(3x^2+1)^2+(x^3+x)(6x)]/2 = 2/2 = 1$. Where expressions involving trigonometric or special functions (Chs.~\ref{trig}, \ref{cexp} and % diagn [not yet written]) appear in ratio, a recursive application of l'H\^opital's rule can be just the thing one needs. Observe that one must stop applying l'H\^opital's rule once the ratio is no longer $0/0$ or $\infty/\infty$. In the example, applying the rule a third time would have ruined the result. } \index{indeterminate form} L'H\^opital's rule is used in evaluating indeterminate forms of the kinds % bad break $0/0$ and $\infty/\infty$, plus related forms like $(0)(\infty)$ which can be recast into either of the two main forms. Good examples of the use require math from Ch.~\ref{cexp} and later, but if we may borrow from~(\ref{cexp:225:dln}) the natural logarithmic function and its derivative,% \footnote{ This paragraph is optional reading for the moment. You can read Ch.~\ref{cexp} first, then come back here and read the paragraph if you prefer. } \[ \frac{d}{dx}\ln x = \frac{1}{x}, \] then a typical l'H\^opital example is% \footnote{\cite[\S~10-2]{Shenk}} \[ \lim_{x\ra\infty}\frac{\ln x}{\sqrt x} = \lim_{x\ra\infty}\frac{1/x}{1/2\sqrt x} = \lim_{x\ra\infty}\frac{2}{\sqrt x} = 0. \] The example incidentally shows that natural logarithms grow slower than square roots, an instance of a more general principle we shall meet in \S~\ref{cexp:228}. Section~\ref{cexp:228} will put l'H\^opital's rule to work. % ---------------------------------------------------------------------- \section{The Newton-Raphson iteration} \label{drvtv:270} \index{Newton-Raphson iteration} \index{Raphson, Joseph (1648--1715)} \index{Newton, Sir Isaac (1642--1727)} \index{root!finding of numerically} \index{iteration} The \emph{Newton-Raphson iteration} is a powerful, fast converging, broadly applicable method for finding roots numerically. Given a function $f(z)$ of which the root is desired, the Newton-Raphson iteration is \bq{drvtv:NR} z_{k+1} = \left. z - \frac{f(z)}{\frac{d}{dz}f(z)}\right|_{z=z_k}. \eq One begins the iteration by guessing the root and calling the guess~$z_0$. Then~$z_1$, $z_2$, $z_3$, etc., calculated in turn by the iteration~(\ref{drvtv:NR}), give successively better estimates of the true root~$z_\infty$. 
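Readers who like to experiment will find the iteration nearly as short to program as to state.
The following sketch, in the Python language, is offered for illustration only and is no part of the book's development; the names \texttt{newton\_raphson}, \texttt{f} and \texttt{df} are merely illustrative.
\begin{verbatim}
# A minimal sketch of the Newton-Raphson iteration.  The names are
# illustrative: f is the function whose root is sought, df its derivative.
def newton_raphson(f, df, z0, steps=8):
    """Iterate z <- z - f(z)/df(z), beginning from the guess z0."""
    z = z0
    for _ in range(steps):
        z = z - f(z) / df(z)
    return z

# Example: the positive root of f(z) = z**2 - 2.
root = newton_raphson(lambda z: z*z - 2.0, lambda z: 2.0*z, z0=1.0)
print(root)   # prints approximately 1.41421356..., the square root of 2
\end{verbatim}
A handful of iterations from the guess $z_0=1$ already approach the root as closely as ordinary floating-point arithmetic permits, which hints at the fast convergence claimed above.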
\index{tangent line}
To understand the Newton-Raphson iteration, consider the function $y=f(x)$ of Fig.~\ref{drvtv:270:fig1}.
\begin{figure}
\caption{The Newton-Raphson iteration.}
\label{drvtv:270:fig1}
\bc
\begin{pspicture}(-2,-1.5)(7.5,4)
{
\small
\psline[linewidth=0.5pt](-1,0)(6,0)
\psline[linewidth=0.5pt](0,-0.5)(0,3)
\psplot[linewidth=2.0pt,plotpoints=200]{1.0}{5.0}{ x 6.0 sub dup mul 0.15 mul -1 add }
\psdot[linewidth=2.0pt](1.35,2.2434)
\psline[linestyle=dashed,linewidth=1.0pt](1.35,2.2434)(3.3000,-0.47687)
\psline[linestyle=dashed,linewidth=0.5pt](1.35,0)(1.35,2.0)
\psline[linewidth=0.5pt](1.35,-0.2)(1.35,0)
\psline[linewidth=0.5pt](2.9582,-0.2)(2.9582,0)
\rput(1.35,-0.4){$x_k$}
\rput(2.8,-0.4){$x_{k+1}$}
\rput(4.50,-1.0){$f(x)$}
\rput(6.3,0){$x$}
\rput(0,3.3){$y$}
}
\end{pspicture}
\ec
\end{figure}
The iteration approximates the curve $f(x)$ by its tangent line%
\footnote{
A \emph{tangent} line, also just called a \emph{tangent,} is the line which most nearly approximates a curve at a given point.
The tangent touches the curve at the point, and in the neighborhood of the point it goes in the same direction the curve goes.
The dashed line in Fig.~\ref{drvtv:270:fig1} is a good example of a tangent line.
The relationship between the tangent line and the trigonometric tangent function of Ch.~\ref{trig} is slightly obscure, maybe more of linguistic interest than of mathematical.
The trigonometric tangent function is named from a variation on Fig.~\ref{trig:226:f1} in which the triangle's bottom leg is extended to unit length, leaving the rightward leg tangent to the circle.
}
(shown as the dashed line in the figure):
\[
\tilde f_k(x) = f(x_k) + \left[\frac{d}{dx}f(x)\right]_{x=x_k} (x-x_k).
\]
It then approximates the root $x_{k+1}$ as the point at which $\tilde f_k(x_{k+1}) = 0$:
\[
\tilde f_k(x_{k+1}) = 0 = f(x_k) + \left[\frac{d}{dx}f(x)\right]_{x=x_k} (x_{k+1}-x_k).
\]
Solving for $x_{k+1}$, we have that
\[
x_{k+1} = \left. x - \frac{f(x)}{\frac{d}{dx}f(x)}\right|_{x=x_k},
\]
which is~(\ref{drvtv:NR}) with $x\la z$.
Although the illustration uses real numbers, nothing forbids complex~$z$ and $f(z)$.
The Newton-Raphson iteration works just as well for these.
The principal limitation of the Newton-Raphson arises when the function has more than one root, as most interesting functions do.
The iteration often converges on the root nearest the initial guess~$z_0$ but does not always, and in any case there is no guarantee that the root it finds is the one you wanted.
The most straightforward way to beat this problem is to find \emph{all} the roots: first you find some root~$\alpha$, then you remove that root (without affecting any of the other roots) by dividing $f(z)/(z-\alpha)$, then you find the next root by iterating on the new function $f(z)/(z-\alpha)$, and so on until you have found all the roots.
If this procedure is not practical (perhaps because the function has a large or infinite number of roots), then you should probably take care to make a sufficiently accurate initial guess if you can.
A second limitation of the Newton-Raphson is that, if you happen to guess~$z_0$ especially unfortunately, then the iteration might never converge at all.
For example, the roots of $f(z)=z^2+2$ are $z=\pm i\sqrt 2$, but if you guess that $z_0 = 1$ then the iteration has no way to leave the real number line, so it never converges%
\footnote{
It is entertaining to try this on a computer.
Then try again with $z_0 = 1+i2^{-\mr{0x10}}$.
} (and if you guess that $z_0 = \sqrt{2}$---well, try it with your pencil and see what~$z_2$ comes out to be). You can fix the problem with a different, possibly complex initial guess. A third limitation arises where there is a multiple root. In this case, the Newton-Raphson normally still converges, but relatively slowly. For instance, the Newton-Raphson converges relatively slowly on the triple root of $f(z) = z^3$. However, even the relatively slow convergence is still pretty fast and is usually adequate, even for calculations by hand. Usually in practice, the Newton-Raphson iteration works very well. For most functions, once the Newton-Raphson finds the root's neighborhood, it converges on the actual root remarkably quickly. Figure~\ref{drvtv:270:fig1} shows why: in the neighborhood, the curve hardly departs from the straight line. \index{square root!calculation of by Newton-Raphson} \index{$n$th root!calculation of by Newton-Raphson} The Newton-Raphson iteration is a champion square root calculator, incidentally. Consider \[ f(x) = x^2 - p, \] whose roots are \[ x=\pm\sqrt p. \] Per~(\ref{drvtv:NR}), the Newton-Raphson iteration for this is \bq{drvtv:270:30} x_{k+1} = \frac 1 2 \left[ x_k + \frac{p}{x_k} \right]. \eq If you start by guessing \[ x_0=1 \] and iterate several times, the iteration~(\ref{drvtv:270:30}) converges on $x_\infty = \sqrt p$ fast. To calculate the $n$th root $x=p^{1/n}$, let \[ f(x) = x^n - p \] and iterate% \footnote{ Equations~(\ref{drvtv:270:30}) and~(\ref{drvtv:270:35}) work not only for real~$p$ but also usually for complex. Given $x_0 = 1$, however, they converge reliably and orderly only for real, nonnegative~$p$. (To see why, sketch $f[x]$ in the fashion of Fig.~\ref{drvtv:270:fig1}.) If reliable, orderly convergence is needed for complex $p = u + iv = \sigma \cis \psi$, $\sigma \ge 0$, you can decompose $p^{1/n}$ per de~Moivre's theorem~(\ref{trig:280:24}) as $p^{1/n} = \sigma^{1/n} \cis(\psi/n)$, in which $\cis(\psi/n) = \cos(\psi/n) + i\sin(\psi/n)$ is calculated by the Taylor series of Table~\ref{taylor:315:tbl}. Then~$\sigma$ is real and nonnegative, upon which~(\ref{drvtv:270:35}) reliably, orderly computes~$\sigma^{1/n}$. The Newton-Raphson iteration however excels as a \emph{practical} root-finding technique, so it often pays to be a little less theoretically rigid in applying it. If so, then don't bother to decompose; seek $p^{1/n}$ directly, using complex~$z_k$ in place of the real~$x_k$. In the uncommon event that the direct iteration does not seem to converge, start over again with some randomly chosen complex $z_0$. This saves effort and usually works. }$\mbox{}^,$\footnote{\cite[\S~4-9]{Shenk}\cite[\S~6.1.1]{Nayfeh/Bal}\cite{EWW}} \bq{drvtv:270:35} x_{k+1} = \frac 1 n \left[ (n-1)x_k + \frac{p}{x_k^{n-1}} \right]. \eq Section~\ref{mtxinv:420} generalizes the Newton-Raphson iteration to handle vector-valued functions. % ---------------------------------------------------------------------- This concludes the chapter. Chapter~\ref{taylor}, treating the Taylor series, will continue the general discussion of the derivative. derivations-0.53.20120414.orig/tex/alggeo.tex0000644000000000000000000024272611742566274017134 0ustar rootroot% ---------------------------------------------------------------------- \chapter{Classical algebra and geometry} \label{alggeo} \index{classical algebra} \index{algebra!classical} One learns arithmetic and the simplest elements of classical algebra and geometry as a child. 
Few readers presumably, on the present book's tier, would wish the book to begin with a treatment of $1+1=2$, or of how to solve $3x - 2 = 7$, or of the formal consequences of the congruence of the several angles produced when a line intersects some parallels. However, there are some basic points which do seem worth touching. The book starts with these. % ---------------------------------------------------------------------- \section{Basic arithmetic relationships} \label{alggeo:222} \index{arithmetic} This section states some arithmetical rules. \subsection[Commutivity, associativity, distributivity]{Commutivity, associativity, distributivity, identity and inversion} \label{alggeo:222.1} \index{commutivity} \index{associativity} \index{distributivity} \index{identity, arithmetic} \index{inversion, arithmetic} \index{zero} \index{$0$ (zero)} \index{one} \index{$1$ (one)} \index{unity} \index{rectangle} \index{box} \index{area} \index{volume} Table~\ref{alggeo:222:table} lists several arithmetical rules, each of which applies not only to real numbers but equally to the complex numbers of \S~\ref{alggeo:225}. \begin{table} \caption{Basic properties of arithmetic.} \label{alggeo:222:table} \bc \[ \br{rcll} a+b&=&b+a&\mbox{Additive commutivity}\\ a+(b+c)&=&(a+b)+c&\mbox{Additive associativity}\\ a+0=0+a&=&a&\mbox{Additive identity}\\ a+(-a)&=&0&\mbox{Additive inversion}\\ ab&=&ba&\mbox{Multiplicative commutivity}\\ (a)(bc)&=&(ab)(c)&\mbox{Multiplicative associativity}\\ (a)(1)=(1)(a)&=&a&\mbox{Multiplicative identity}\\ (a)(1/a)&=&1&\mbox{Multiplicative inversion}\\ (a)(b+c)&=&ab+ac&\mbox{Distributivity} \er \] \ec \end{table} Most of the rules are appreciated at once if the meaning of the symbols is understood. In the case of multiplicative commutivity, one imagines a rectangle with sides of lengths~$a$ and~$b$, then the same rectangle turned on its side, as in Fig.~\ref{alggeo:222:20}: since the area of the rectangle is the same in either case, and since the area is the length times the width in either case (the area is more or less a matter of counting the little squares), evidently multiplicative commutivity holds. \begin{figure} \caption{Multiplicative commutivity.} \label{alggeo:222:20} \bc { \nc\xax{-5} \nc\xbx{-2} \nc\xcx{ 5} \nc\xdx{ 2} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xxg{2.0} \nc\xxu{1.0} \nc\xxv{0.6} \nc\xxs{0.4} \nc\xxp{6} \nc\xxq{4} \nc\xxb{ \psline[linewidth=0.5pt](-\xxu,0)(\xxu,0) } \nc\xxc{ \psline[linewidth=0.5pt](0,-\xxv)(0,\xxv) } \nc\xxa{ \multips(0,-\xxv)(0,\xxs){\xxq}{\xxb} \multips(-\xxu,0)(\xxs,0){\xxp}{\xxc} \psframe[linewidth=2.0pt](-\xxu,-\xxv)(\xxu,\xxv) } \rput(-\xxg,0){ \rput{ 0}(0,0){\xxa} \uput[d](0,-\xxv){$a$} \uput[l](-\xxu,0){$b$} } \rput( \xxg,0){ \rput{90}(0,0){\xxa} \uput[d](0,-\xxu){$b$} \uput[l](-\xxv,0){$a$} } \end{pspicture} } \ec \end{figure} A similar argument validates multiplicative associativity, except that here we compute the \emph{volume} of a three-dimensional rectangular box, which box we turn various ways.% \footnote{\cite[Ch.~1]{Spiegel}} \index{inversion!multiplicative} \index{multiplicative inversion} \index{reciprocal} Multiplicative inversion lacks an obvious interpretation when $a=0$. Loosely, \[ \frac{1}{0}=\infty. \] But since $3/0 = \infty$ also, surely either the zero or the infinity, or both, somehow differ in the latter case. 
\index{C and C++} Looking ahead in the book, we note that the multiplicative properties do not always hold for more general linear transformations. For example, matrix multiplication is not commutative and vector cross-multiplication is not associative. Where associativity does not hold and parentheses do not otherwise group, right-to-left association is notationally implicit:% \footnote{ The fine~C and~C++ programming languages are unfortunately stuck with the reverse order of association, along with division inharmoniously on the same level of syntactic precedence as multiplication. Standard mathematical notation is more elegant: \[ abc/uvw = \frac{(a)(bc)}{(u)(vw)}. \] }$\mbox{}^,$% \footnote{ The nonassociative \emph{cross product} $\ve B \times \ve C$ is introduced in \S~\ref{vector:220.30}. } \[ \ve A \times \ve B \times \ve C = \ve A \times ( \ve B \times \ve C ). \] The sense of it is that the thing on the left ($\ve A\times\mbox{}$) \emph{operates} on the thing on the right ($\ve B \times \ve C$). (In the rare case in which the question arises, you may want to use parentheses anyway.) \subsection{Negative numbers} \label{alggeo:222.20} \label{negative number} Consider that \bqb (+a)(+b) &=& +ab, \\ (+a)(-b) &=& -ab, \\ (-a)(+b) &=& -ab, \\ (-a)(-b) &=& +ab. \eqb The first three of the four equations are unsurprising, but the last is interesting. Why would a negative count~$-a$ of a negative quantity~$-b$ come to a positive product~$+ab$? To see why, consider the progression \settowidth\tla{$\ds+$} \bqb &\vdots& \\ (+3)(-b) &=& -3b, \\ (+2)(-b) &=& -2b, \\ (+1)(-b) &=& -1b, \\ ( 0)(-b) &=& \makebox[\tla]{}0b, \\ (-1)(-b) &=& +1b, \\ (-2)(-b) &=& +2b, \\ (-3)(-b) &=& +3b, \\ &\vdots& \eqb The logic of arithmetic demands that the product of two negative numbers be positive for this reason. \subsection{Inequality} \label{alggeo:222.6} \index{inequality} If% \footnote{ Few readers attempting this book will need to be reminded that~$<$ means ``is less than,'' that~$>$ means ``is greater than,'' or that~$\le$ and~$\ge$ respectively mean ``is less than or equal to'' and ``is greater than or equal to.'' } \[ a < b, \] then necessarily \[ a + x < b + x. \] However, the relationship between~$ua$ and~$ub$ depends on the sign of~$u$: \bqb ua < ub && \mbox{if $u > 0$;} \\ ua > ub && \mbox{if $u < 0$.} \eqb Also, \[ \frac{1}{a} > \frac{1}{b}. \] \subsection{The change of variable} \label{alggeo:222.40} \index{variable!change of} \index{change of variable} \index{variable!assignment} \index{assignment} \index{$\leftarrow$} \index{C and C++} The applied mathematician very often finds it convenient \emph{to change variables,} introducing new symbols to stand in place of old. For this we have the \emph{change of variable} or \emph{assignment} notation% \footnote{ There appears to exist no broadly established standard mathematical notation for the change of variable, other than the~$=$ equal sign, which regrettably does not fill the role well. One can indeed use the equal sign, but then what does the change of variable $k=k+1$ mean? It looks like a claim that~$k$ and $k+1$ are the same, which is impossible. The notation $k\la k+1$ by contrast is unambiguous; it means to increment~$k$ by one. However, the latter notation admittedly has seen only scattered use in the literature. The~C and~C++ programming languages use~\texttt{==} for equality and~\texttt{=} for assignment (change of variable), as the reader may be aware. } \[ Q \la P. 
\] This means, ``in place of~$P$, put~$Q$''; or, ``let~$Q$ now equal~$P$.'' For example, if $a^2 + b^2 = c^2$, then the change of variable $2\mu \la a$ yields the new form $(2\mu)^2 + b^2 = c^2$. \index{variable!definition notation for} \index{definition notation} \index{$\equiv$} Similar to the change of variable notation is the \emph{definition} notation \[ Q \equiv P. \] This means, ``let the new symbol~$Q$ represent~$P$.''% \footnote{ One would never write $k\equiv k+1$. Even $k\la k+1$ can confuse readers inasmuch as it appears to imply two different values for the same symbol~$k$, but the latter notation is sometimes used anyway when new symbols are unwanted or because more precise alternatives (like $k_n = k_{n-1} + 1$) seem overwrought. Still, usually it is better to introduce a new symbol, as in $j\la k+1$. In some books,~$\equiv$ is printed %as~$\stackrel{\triangle}{=}$. as~$\triangleq$. } The two notations logically mean about the same thing. Subjectively, $Q \equiv P$ identifies a quantity~$P$ sufficiently interesting to be given a permanent name~$Q$, whereas $Q\la P$ implies nothing especially interesting about~$P$ or~$Q$; it just introduces a (perhaps temporary) new symbol~$Q$ to ease the algebra. The concepts grow clearer as examples of the usage arise in the book. % ---------------------------------------------------------------------- \section{Quadratics} \label{alggeo:240} \index{quadratics} \index{factorization} \index{root} \index{constant expression} \index{linear expression} \index{quadratic expression} \index{cubic expression} \index{quartic expression} \index{quintic expression} \index{order} Differences and sums of squares are conveniently factored as \bq{alggeo:224:30} \index{squares, sum or difference of} \begin{split} a^2-b^2 &= (a+b)(a-b), \\ a^2+b^2 &= (a+ib)(a-ib), \\ a^2-2ab+b^2 &= (a-b)^2, \\ a^2+2ab+b^2 &= (a+b)^2 \end{split} \eq (where~$i$ is the \emph{imaginary unit,} a number defined such that $i^2 = -1$, introduced in more detail in \S~\ref{alggeo:225} below). Useful as these four forms are, however, none of them can directly factor the more general quadratic% \footnote{ The adjective \emph{quadratic} refers to the algebra of expressions in which no term has greater than second order. Examples of quadratic expressions include~$x^2$, $2x^2-7x+3$ and $x^2+2xy+y^2$. By contrast, the expressions $x^3-1$ and~$5x^2y$ are \emph{cubic} not quadratic because they contain third-order terms. First-order expressions like $x+1$ are \emph{linear;} zeroth-order expressions like~$3$ are \emph{constant.} Expressions of fourth and fifth order are \emph{quartic} and \emph{quintic,} respectively. (If not already clear from the context, \emph{order} basically refers to the number of variables multiplied together in a term. The term $5x^2y=5[x][x][y]$ is of third order, for instance.) } expression \[ z^2 - 2\beta z + \gamma^2. \] To factor this, we \emph{complete the square,} writing \bqb \index{square, completing the} \index{completing the square} z^2 - 2\beta z + \gamma^2 &=& z^2 - 2\beta z + \gamma^2 + (\beta^2 - \gamma^2) - (\beta^2 - \gamma^2) \xn\\ &=& z^2 - 2\beta z + \beta^2 - (\beta^2 - \gamma^2) \xn\\ &=& (z-\beta)^2 - (\beta^2 - \gamma^2). \eqb The expression evidently has roots% \footnote{ A \emph{root} of $f(z)$ is a value of~$z$ for which $f(z)=0$. See \S~\ref{alggeo:250}. 
} where \[ (z-\beta)^2 = (\beta^2 - \gamma^2), \] or in other words where% \footnote{ The symbol~$\pm$ means~``$+$ or~$-$.'' In conjunction with this symbol, the alternate symbol~$\mp$ occasionally also appears, meaning~``$-$ or~$+$''---which is the same thing except that, where the two symbols appear together, $(\pm z) + (\mp z) = 0$. } \bq{alggeo:240:quad} \index{quadratic formula} \index{root extraction!from a quadratic polynomial} z = \beta \pm \sqrt{\beta^2 - \gamma^2}. \eq This suggests the factoring% \footnote{ It suggests it because the expressions on the left and right sides of~(\ref{alggeo:240:50}) are both quadratic (the highest power is~$z^2$) and have the same roots. Substituting into the equation the values of~$z_1$ and~$z_2$ and simplifying proves the suggestion correct. } \bq{alggeo:240:50} z^2 - 2\beta z + \gamma^2 = (z-z_1)(z-z_2), \eq where~$z_1$ and~$z_2$ are the two values of~$z$ given by~(\ref{alggeo:240:quad}). It follows that the two solutions of the quadratic equation \bq{alggeo:240:quadeq} z^2 = 2\beta z - \gamma^2 \eq are those given by~(\ref{alggeo:240:quad}), which is called \emph{the quadratic formula.}% \footnote{ The form of the quadratic formula which usually appears in print is \[ x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}, \] which solves the quadratic $ax^2 + bx + c = 0$. However, this writer finds the form~(\ref{alggeo:240:quad}) easier to remember. For example, by~(\ref{alggeo:240:quad}) in light of~(\ref{alggeo:240:quadeq}), the quadratic \[ z^2 = 3z - 2 \] has the solutions \[ z = \frac{3}{2} \pm \sqrt{ \left( \frac{3}{2} \right)^2 - 2 } = 1\ \mbox{or}\ 2. \] } % (\emph{Cubic} and \emph{quartic formulas} also exist respectively to extract the roots of polynomials of third and fourth order, but they are much harder. See Ch.~\ref{cubic} and its Tables~\ref{cubic:cubic-table} and~\ref{cubic:quartic-table}.) % ---------------------------------------------------------------------- \section{Integer and series notation} \label{alggeo:227} \index{series} \index{series!notation for} \index{integer} \index{series!sum of} \index{series!product of} \index{sum} \index{summation} \index{product} \index{multiplication} \index{dummy variable} \index{index} \index{index!of summation} \index{summation!index of} \index{index!of multiplication} \index{multiplication!index of} \index{loop counter} Sums and products of series arise so frequently in mathematical work that one finds it convenient to define terse notations to express them. The summation notation \[ \sum_{k=a}^{b} f(k) \] means to let~$k$ equal each of the integers $a,a+1,a+2,\ldots,b$ in turn, evaluating the function $f(k)$ at each~$k$, then adding the several $f(k)$. For example,% \footnote{ The hexadecimal numeral~$\mbox{0x56}$ represents the same number the decimal numeral~$86$ represents. The book's preface explains why the book represents such numbers in hexadecimal. Appendix~\ref{hex} tells how to read the numerals. } \[ \sum_{k=3}^{6} k^2 = 3^2 + 4^2 + 5^2 + 6^2 = \mr{0x56}. \] The similar multiplication notation \[ \prod_{j=a}^{b} f(j) \] means \emph{to multiply} the several $f(j)$ rather than to add them. 
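Incidentally, such sums and products are as terse to compute by machine as they are to write.
The following lines, in the Python language, merely check the summation example above and illustrate the corresponding product; they are offered as an aside, not as part of the book's development.
\begin{verbatim}
# A small check of the summation and multiplication notations.
from math import prod

total = sum(k**2 for k in range(3, 7))    # k = 3, 4, 5, 6
print(total, hex(total))                  # prints: 86 0x56

product = prod(j for j in range(2, 6))    # j = 2, 3, 4, 5
print(product)                            # prints: 120, i.e. (2)(3)(4)(5)
\end{verbatim}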
The symbols~$\sum$ and~$\prod$ come respectively from the Greek letters for~S and~P, and may be regarded as standing for ``Sum'' and ``Product.'' The~$j$ or~$k$ is a \emph{dummy variable, index of summation} or \emph{loop counter}---a variable with no independent existence, used only to facilitate the addition or multiplication of the series.% \footnote{ Section~\ref{integ:240} speaks further of the dummy variable. } (Nothing prevents one from writing~$\prod_k$ rather than~$\prod_j$, incidentally. For a dummy variable, one can use any letter one likes. However, the general habit of writing~$\sum_k$ and~$\prod_j$ proves convenient at least in \S~\ref{drvtv:250.30} and Ch.~\ref{taylor}, so we start now.) \index{\char 33} \index{factorial} The product shorthand \bqb n! &\equiv& \prod_{j=1}^{n} j,\\ n!/m! &\equiv& \prod_{j=m+1}^{n} j, \eqb is very frequently used. The notation~$n!$ is pronounced ``$n$ factorial.'' Regarding the notation $n!/m!$, this can of course be regarded correctly as~$n!$ divided by~$m!$ , but it usually proves more amenable to regard the notation as a single unit.% \footnote{ One reason among others for this is that factorials rapidly multiply to extremely large sizes, overflowing computer registers during numerical computation. If you can avoid unnecessary multiplication by regarding $n!/m!$ as a single unit, this is a win. } \index{series!multiplication order of} Because multiplication in its more general sense as linear transformation (\S~\ref{matrix:120.10}) is not always commutative, we specify that \[ \prod_{j=a}^{b} f(j) = [f(b)] [f(b-1)] [f(b-2)] \cdots [f(a+2)] [f(a+1)] [f(a)] \] rather than the reverse order of multiplication.% \footnote{ The extant mathematical literature lacks an established standard on the order of multiplication implied by the ``$\prod$'' symbol, but this is the order we will use in this book. } Multiplication proceeds from right to left. In the event that the reverse order of multiplication is needed, we will use the notation \[ \coprod_{j=a}^{b} f(j) = [f(a)] [f(a+1)] [f(a+2)] \cdots [f(b-2)] [f(b-1)] [f(b)]. \] Note that for the sake of definitional consistency, \[ \begin{split} \sum _{k=N+1}^{N} f(k) &= 0 + \sum _{k=N+1}^{N} f(k) = 0, \\ \prod_{j=N+1}^{N} f(j) &= (1) \prod_{j=N+1}^{N} f(j) = 1. \end{split} \] This means among other things that \bq{alggeo:227:40} 0!=1. \eq \index{belonging} \index{membership} \index{set} \index{$\in \mathbb Z$} \index{Fortran} Context tends to make the notation \[ N,j,k \in \mathbb Z \] unnecessary, but if used (as here and in \S~\ref{alggeo:224}) it states explicitly that~$N$, $j$ and~$k$ are integers. (The symbol~$\mathbb Z$ represents% \footnote{ The letter~$\mathbb Z$ recalls the transitive and intransitive German verb \emph{z\"ahlen,} ``to count.'' } the set of all integers: $\mathbb Z \equiv \{\ldots,-5,-4,-3,-2,-1,0,1,2,3,4,5,\ldots\}$. The symbol~$\in$ means ``belongs to'' or ``is a member of.'' Integers conventionally get the letters% \footnote{ Though Fortran is perhaps less widely used a computer programming language than it once was, it dominated applied-mathematical computer programming for decades, during which the standard way to declare an integer variable to the Fortran compiler was simply to let its name begin with~\texttt{I}, \texttt{J}, \texttt{K}, \texttt{L}, \texttt{M} or~\texttt{N}; so, this alphabetical convention is fairly well cemented in practice. 
}% ~$i$, $j$, $k$, $m$, $n$, $M$ and~$N$ when available---though~$i$ is sometimes avoided because the same letter represents the imaginary unit of \S~\ref{alggeo:225}. Where additional letters are needed~$\ell$, $p$ and~$q$, plus the capitals of these and the earlier listed letters, can be pressed into service, occasionally joined even by~$r$ and~$s$. Greek letters are avoided, as---ironically in light of the symbol~$\mathbb Z$---are the Roman letters~$x$, $y$ and~$z$. Refer to Appendix~\ref{greek}.) On first encounter, the~$\sum$ and~$\prod$ notation seems a bit overwrought, whether or not the $\in\mathbb Z$ notation also is used. Admittedly it is easier for the beginner to read ``$f(1)+f(2)+\cdots+f(N)$'' than ``$\sum_{k=1}^{N} f(k)$.'' However, experience shows the latter notation to be extremely useful in expressing more sophisticated mathematical ideas. We will use such notation extensively in this book. % ---------------------------------------------------------------------- \section{The arithmetic series} \label{alggeo:229} \index{arithmetic series} \index{series!arithmetic} A simple yet useful application of the series sum of \S~\ref{alggeo:227} is the \emph{arithmetic series} \[ \sum_{k=a}^{b} k = a + (a+1) + (a+2) + \cdots + b. \] Pairing~$a$ with~$b$, then $a+1$ with $b-1$, then $a+2$ with $b-2$, etc., the average of each pair is $[a+b]/2$; thus the average of the entire series is $[a+b]/2$. (The pairing may or may not leave an unpaired element at the series midpoint $k=[a+b]/2$, but this changes nothing.) The series has $b-a+1$ terms. Hence, \bq{alggeo:229:10} \sum_{k=a}^{b} k = (b-a+1)\frac{a+b}{2}. \eq Success with this arithmetic series leads one to wonder about the \emph{geometric series} $\sum_{k=0}^\infty z^k$. Section~\ref{alggeo:228.30} addresses that point. % ---------------------------------------------------------------------- \section{Powers and roots} \label{alggeo:224} \index{power} This necessarily tedious section discusses powers and roots. It offers no surprises. Table~\ref{alggeo:224:t1} summarizes its definitions and results. Readers seeking more rewarding reading may prefer just to glance at the table then to skip directly to the start of the next section. \begin{table} \caption{Power properties and definitions.} \label{alggeo:224:t1} \index{power!properties of} \bqb \ds z^n &\equiv& \prod_{j=1}^{n}z, \ \ n \ge 0 \\ z&=&(z^{1/n})^n=(z^n)^{1/n} \\ \sqrt z &\equiv& z^{1/2} \\ (uv)^a &=& u^a v^a \\ z^{p/q} &=& (z^{1/q})^p=(z^p)^{1/q} \\ z^{ab} &=& (z^a)^b = (z^b)^a \\ z^{a+b} &=& z^az^b \\ z^{a-b} &=& \frac{z^a}{z^b} \\ z^{-b} &=& \frac{1}{z^b} \\ j,n,p,q &\in& \mathbb Z \eqb \end{table} \index{integer} In this section, the exponents \[ j,k,m,n,p,q,r,s \in \mathbb Z \] are integers, but the exponents~$a$ and~$b$ are arbitrary real numbers. \subsection{Notation and integral powers} \label{alggeo:224int} \index{power!notation for} \index{power!integral} \index{exponent} The power notation \[ z^n \] indicates the number~$z$, multiplied by itself~$n$ times. More formally, when the \emph{exponent}~$n$ is a nonnegative integer,% \footnote{ % Strictly, the comma here should go inside the quotes, but in this % context that might confuse the reader, so we bend the style rule % here. The symbol~``$\equiv$'' means~``$=$'', but it further usually indicates that the expression on its right serves to define the expression on its left. Refer to \S~\ref{alggeo:222.40}. } \bq{alggeo:224:10} z^n \equiv \prod_{j=1}^{n}z. 
\eq For example,% \footnote{ The case~$0^0$ is interesting because it lacks an obvious interpretation. The specific interpretation depends on the nature and meaning of the two zeros. For interest, if $E \equiv 1/\ep$, then \[ \lim_{\ep\rightarrow 0^+} \ep^\ep = \lim_{E\rightarrow \infty} \left(\frac{1}{E}\right)^{1/E} = \lim_{E\rightarrow \infty} E^{-1/E} = \lim_{E\rightarrow \infty} e^{-(\ln E)/E} = e^0 = 1. \] } \bqb z^3 &=& (z)(z)(z), \\ z^2 &=& (z)(z), \\ z^1 &=& z, \\ z^0 &=& 1. \eqb Notice that in general, \[ z^{n-1} = \frac{z^n}{z}. \] This leads us to extend the definition to negative integral powers with \bq{alggeo:224:0} z^{-n} = \frac{1}{z^n}. \eq From the foregoing it is plain that \bq{alggeo:224:1} \begin{split} z^{m+n} &= z^mz^n, \\ z^{m-n} &= \frac{z^m}{z^n}, \end{split} \eq for any integral~$m$ and~$n$. For similar reasons, \bq{alggeo:224:2} z^{mn}=(z^m)^n=(z^n)^m. \eq On the other hand from multiplicative associativity and commutivity, \bq{alggeo:224:2a} (uv)^n = u^n v^n. \eq \subsection{Roots} \label{alggeo:224roots} \index{root} \index{power!fractional} Fractional powers are not something we have defined yet, so for consistency with~(\ref{alggeo:224:2}) we let \[ (u^{1/n})^n = u. \] This has $u^{1/n}$ as the number which, raised to the $n$th power, yields~$u$. Setting \[ v = u^{1/n}, \] it follows by successive steps that \bqb v^n &=& u, \\ (v^n)^{1/n} &=& u^{1/n}, \\ (v^n)^{1/n} &=& v. \eqb Taking the~$u$ and~$v$ formulas together, then, \bq{alggeo:224:11} (z^{1/n})^n = z = (z^n)^{1/n} \eq for any~$z$ and integral~$n$. \index{square root} The number $z^{1/n}$ is called the \emph{$n$th root} of~$z$---or in the very common case $n=2$, the \emph{square root} of~$z$, often written \[ \sqrt{z}. \] When~$z$ is real and nonnegative, the last notation is usually implicitly taken to mean the real, nonnegative square root. In any case, the power and root operations mutually invert one another. \index{power!real} \index{real number!approximation of as a ratio of integers} \index{ratio} What about powers expressible neither as~$n$ nor as $1/n$, such as the~$3/2$ power? If~$z$ and~$w$ are numbers related by \[ w^q=z, \] then \[ w^{pq}=z^p. \] Taking the $q$th root, \[ w^p=(z^p)^{1/q}. \] But $w=z^{1/q}$, so this is \[ (z^{1/q})^p=(z^p)^{1/q}, \] which says that it does not matter whether one applies the power or the root first; the result is the same. Extending~(\ref{alggeo:224:2}) therefore, we define $z^{p/q}$ such that \bq{alggeo:224:12} (z^{1/q})^p=z^{p/q}=(z^p)^{1/q}. \eq Since any real number can be approximated arbitrarily closely by a ratio of integers,~(\ref{alggeo:224:12}) implies a power definition for all real exponents. Equation~(\ref{alggeo:224:12}) is this subsection's main result. However, \S~\ref{alggeo:224pppp} will find it useful if we can also show here that \bq{alggeo:224:5} (z^{1/q})^{1/s}=z^{1/qs}=(z^{1/s})^{1/q}. \eq The proof is straightforward. If \[ w \equiv z^{1/qs}, \] then raising to the~$qs$ power yields \[ (w^s)^q = z. \] Successively taking the $q$th and $s$th roots gives \[ w = (z^{1/q})^{1/s}. \] By identical reasoning, \[ w = (z^{1/s})^{1/q}. \] But since $w \equiv z^{1/qs}$, the last two equations imply~(\ref{alggeo:224:5}), as we have sought. \subsection{Powers of products and powers of powers} \label{alggeo:224pppp} \index{power!of a product} \index{power!of a power} Per~(\ref{alggeo:224:2a}), \[ (uv)^p = u^p v^p. 
\] Raising this equation to the $1/q$ power, we have that \bqb (uv)^{p/q} &=& \left[u^p v^p\right]^{1/q} \\ &=& \left[(u^p)^{q/q} (v^p)^{q/q}\right]^{1/q} \\ &=& \left[(u^{p/q})^q (v^{p/q})^q\right]^{1/q} \\ &=& \left[(u^{p/q}) (v^{p/q})\right]^{q/q} \\ &=& u^{p/q} v^{p/q}. \eqb In other words \bq{alggeo:224:4} (uv)^a = u^a v^a \eq for any real~$a$. On the other hand, per~(\ref{alggeo:224:2}), \[ z^{pr} = (z^p)^r. \] Raising this equation to the $1/qs$ power and applying~(\ref{alggeo:224:2}), (\ref{alggeo:224:12}) and % bad break (\ref{alggeo:224:5}) to reorder the powers, we have that \[ z^{(p/q)(r/s)} = (z^{p/q})^{r/s}. \] By identical reasoning, \[ z^{(p/q)(r/s)} = (z^{r/s})^{p/q}. \] Since $p/q$ and $r/s$ can approximate any real numbers with arbitrary precision, this implies that \bq{alggeo:224:4a} (z^a)^b=z^{ab}=(z^b)^a \eq for any real~$a$ and~$b$. \subsection{Sums of powers} \label{alggeo:224sdp} \index{power!sum of} With~(\ref{alggeo:224:1}), (\ref{alggeo:224:4}) and~(\ref{alggeo:224:4a}), one can reason that \[ z^{(p/q)+(r/s)} = (z^{ps+rq})^{1/qs} = (z^{ps}z^{rq})^{1/qs} = z^{p/q}z^{r/s}, \] or in other words that \bq{alggeo:224:6} z^{a+b} = z^az^b. \eq In the case that $a=-b$, \[ 1 = z^{-b+b} = z^{-b}z^b, \] which implies that \bq{alggeo:224:7} z^{-b} = \frac{1}{z^b}. \eq But then replacing $-b \la b$ in~(\ref{alggeo:224:6}) leads to \[ z^{a-b} = z^az^{-b}, \] which according to~(\ref{alggeo:224:7}) is \bq{alggeo:224:8} z^{a-b} = \frac{z^a}{z^b}. \eq \subsection{Summary and remarks} \label{alggeo:224summary} Table~\ref{alggeo:224:t1} on page~\pageref{alggeo:224:t1} summarizes the section's definitions and results. Looking ahead to \S~\ref{alggeo:225}, \S~\ref{trig:280} and Ch.~\ref{cexp}, we observe that nothing in the foregoing analysis requires the base variables~$z$, $w$, $u$ and~$v$ to be real numbers; if complex (\S~\ref{alggeo:225}), the formulas remain valid. Still, the analysis does imply that the various exponents~$m$, $n$, $p/q$, $a$, $b$ and so on are real numbers. This restriction, we shall remove later, purposely defining the action of a complex exponent to comport with the results found here. With such a definition the results apply not only for all bases but also for all exponents, real or complex. % ---------------------------------------------------------------------- \section{Multiplying and dividing power series} \label{alggeo:228} \index{power series} \index{polynomial} \index{Laurent series} \index{Laurent, Pierre Alphonse (1813--1854)} \index{power series!with negative powers} A \emph{power series}% \footnote{ Another name for the \emph{power series} is \emph{polynomial.} The word ``polynomial'' usually connotes a power series with a finite number of terms, but the two names in fact refer to essentially the same thing. Professional mathematicians use the terms more precisely. Equation~(\ref{alggeo:228:05}), they call a ``power series'' only if $a_k=0$ for all $k<0$---in other words, technically, not if it includes negative powers of~$z$. They call it a ``polynomial'' only if it is a ``power series'' with a finite number of terms. They call~(\ref{alggeo:228:05}) in general a \emph{Laurent series.} The name ``Laurent series'' is a name we shall meet again in \S~\ref{taylor:380}. In the meantime however we admit that the professionals have vaguely daunted us by adding to the name some pretty sophisticated connotations, to the point that we applied mathematicians (at least in the author's country) seem to feel somehow unlicensed actually to use the name. 
We tend to call~(\ref{alggeo:228:05}) a ``power series with negative powers,'' or just ``a power series.'' This book follows the last usage. You however can call~(\ref{alggeo:228:05}) a \emph{Laurent series} if you prefer (and if you pronounce it right: ``lor-ON''). That is after all exactly what it is. Nevertheless if you do use the name ``Laurent series,'' be prepared for people subjectively---for no particular reason---to expect you to establish complex radii of convergence, to sketch some annulus in the Argand plane, and/or to engage in other maybe unnecessary formalities. If that is not what you seek, then you may find it better just to call the thing by the less lofty name of ``power series''---or better, if it has a finite number of terms, by the even humbler name of ``polynomial.'' Semantics. % bad break (To remove the following \pagebreak indirectly causes a % page with almost nothing on it to appear. This however is subject % to the content of the entire chapter up to and about this point. % It is a temperamental bad break.) \pagebreak All these names mean about the same thing, but one is expected most carefully always to give the right name in the right place. What a bother! (Someone once told the writer that the Japanese language can give different names to the same object, depending on whether the \emph{speaker} is male or female. The power-series terminology seems to share a spirit of that kin.) If you seek just one word for the thing, the writer recommends that you call it a ``power series'' and then not worry too much about it until someone objects. When someone does object, you can snow him with the big word ``Laurent series,'' instead. The experienced scientist or engineer may notice that the above vocabulary omits the name ``Taylor series.'' The vocabulary omits the name because that name fortunately remains unconfused in usage---it means quite specifically a power series without negative powers and tends to connote a representation of some particular function of interest---as we shall see in Ch.~\ref{taylor}. } is a weighted sum of integral powers: \bq{alggeo:228:05} A(z) = \sum_{k=-\infty}^{\infty} a_kz^k, \eq where the several~$a_k$ are arbitrary constants. This section discusses the multiplication and division of power series. \subsection{Multiplying power series} \label{alggeo:228.10} \index{power series!multiplying} Given two power series \bq{alggeo:228:07} \begin{split} A(z) &= \sum_{k=-\infty}^{\infty} a_kz^k, \\ B(z) &= \sum_{k=-\infty}^{\infty} b_kz^k, \end{split} \eq the product of the two series is evidently \settoheight\tla{\scriptsize$k=-\infty\ j=-\infty$} \bq{alggeo:228:prod} P(z) \equiv A(z)B(z) = \sum_{k=-\infty\rule{0em}{\tla}}^{\infty} \,\sum_{j=-\infty\rule{0em}{\tla}}^{\infty} a_jb_{k-j} z^k. \eq \subsection{Dividing power series} \label{alggeo:228.20} \index{power series!division of} \index{long division} \index{remainder} \index{quotient} \index{dividend} \index{divisor} \index{numerator} \index{denominator} The quotient $Q(z) = B(z)/A(z)$ of two power series is a little harder to calculate, and there are at least two ways to do it. Section~\ref{alggeo:228.50} below will do it by matching coefficients, but this subsection does it by long division. For example, \bqb \frac{2z^2 - 3z + 3}{z-2} &=& \frac{2z^2 - 4z}{z-2} + \frac{z+3}{z-2} = 2z + \frac{z+3}{z-2} \\ &=& 2z + \frac{z-2}{z-2} + \frac{5}{z-2} = 2z + 1 + \frac{5}{z-2}. 
\eqb The strategy is to take the dividend% \footnote{ If $Q(z)$ is a \emph{quotient} and $R(z)$ a \emph{remainder,} then $B(z)$ is a \emph{dividend} (or \emph{numerator}) and $A(z)$ a \emph{divisor} (or \emph{denominator}). Such are the Latin-derived names of the parts of a long division. } $B(z)$ piece by piece, purposely choosing pieces easily divided by $A(z)$. If you feel that you understand the example, then that is really all there is to it, and you can skip the rest of the subsection if you like. One sometimes wants to express the long division of power series more formally, however. That is what the rest of the subsection is about. (Be advised however that the cleverer technique of \S~\ref{alggeo:228.50}, though less direct, is often easier and faster.) Formally, we prepare the long division $B(z)/A(z)$ by writing \bq{alggeo:228:20} B(z) = A(z)Q_n(z) + R_n(z), \eq where $R_n(z)$ is a \emph{remainder} (being the part of $B[z]$ \emph{remaining} to be divided); and \bq{alggeo:228:08} \begin{split} A(z) &= \sum_{k=-\infty}^{K} a_kz^k,\ \ a_K \neq 0,\\ B(z) &= \sum_{k=-\infty}^{N} b_kz^k, \\ R_N(z) &= B(z), \\ Q_N(z) &= 0, \\ R_n(z) &= \sum_{k=-\infty}^{n} r_{nk}z^k, \\ Q_n(z) &= \sum_{k=n-K+1}^{N-K} q_{k}z^k, \end{split} \eq where~$K$ and~$N$ identify the greatest orders~$k$ of~$z^k$ present in $A(z)$ and $B(z)$, respectively. Well, that is a lot of symbology. What does it mean? The key to understanding it lies in understanding~(\ref{alggeo:228:20}), which is not one but several equations---one equation for each value of~$n$, where $n=N,N-1,N-2,\ldots$\,. The dividend $B(z)$ and the divisor $A(z)$ stay the same from one~$n$ to the next, but the quotient $Q_n(z)$ and the remainder $R_n(z)$ change. At start, $Q_N(z)=0$ while $R_N(z)=B(z)$, but the thrust of the long division process is to build $Q_n(z)$ up by wearing~$R_n(z)$ down. The goal is to grind $R_n(z)$ away to nothing, to make it disappear as $n\ra-\infty$. As in the example, we pursue the goal by choosing from $R_n(z)$ an easily divisible piece containing the whole high-order term of~$R_n(z)$. The piece we choose is $(r_{nn}/a_K)z^{n-K}A(z)$, which we add and subtract from~(\ref{alggeo:228:20}) to obtain \[ B(z) = A(z)\left[Q_n(z)+\frac{r_{nn}}{a_K}z^{n-K}\right] + \left[ R_n(z) - \frac{r_{nn}}{a_K}z^{n-K}A(z) \right]. \] Matching this equation against the desired iterate \[ B(z) = A(z)Q_{n-1}(z) + R_{n-1}(z) \] and observing from the definition of $Q_n(z)$ that $Q_{n-1}(z) = Q_n(z) + \linebreak % bad break q_{n-K}z^{n-K}$, we find that \bq{alggeo:228:25} \begin{split} q_{n-K} &= \frac{r_{nn}}{a_K}, \\ R_{n-1}(z) &= R_n(z) - q_{n-K} z^{n-K}A(z), \end{split} \eq where no term remains in $R_{n-1}(z)$ higher than a~$z^{n-1}$ term. To begin the actual long division, we initialize \[ R_N(z) = B(z), \] for which~(\ref{alggeo:228:20}) is trivially true. Then we iterate per~(\ref{alggeo:228:25}) as many times as desired. If an infinite number of times, then so long as $R_n(z)$ tends to vanish as $n \ra -\infty$, it follows from~(\ref{alggeo:228:20}) that \bq{alggeo:228:30} \frac{B(z)}{A(z)} = Q_{-\infty}(z). \eq Iterating only a finite number of times leaves a remainder, \bq{alggeo:228:35} \frac{B(z)}{A(z)} = Q_{n}(z) + \frac{R_{n}(z)}{A(z)}, \eq except that it may happen that $R_{n}(z) = 0$ for sufficiently small~$n$. 
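The iteration is likewise straightforward to carry out by machine, at least when the dividend and the divisor have finitely many terms.
The following Python sketch implements the iteration of~(\ref{alggeo:228:25}) and is offered for illustration only; the representation of a series as a map from powers to coefficients, and the name \texttt{divide\_power\_series}, are merely convenient choices, not the book's.
\begin{verbatim}
# A sketch of the long-division iteration, dividing through successively
# smaller powers of z.  A series with finitely many terms is represented
# as a dict mapping each power k to its coefficient.
def divide_power_series(B, A, terms):
    """Return quotient coefficients of B(z)/A(z), plus the remainder."""
    K = max(A)              # greatest order present in A(z); a_K nonzero
    R = dict(B)             # R_N(z) = B(z)
    Q = {}
    for _ in range(terms):
        if not R:
            break
        n = max(R)          # greatest order remaining in R_n(z)
        q = R[n] / A[K]     # q_{n-K} = r_{nn}/a_K
        Q[n - K] = q
        for k, a in A.items():   # R_{n-1} = R_n - q z^{n-K} A(z)
            R[n - K + k] = R.get(n - K + k, 0.0) - q * a
        R = {k: c for k, c in R.items() if abs(c) > 1e-12}
    return Q, R

# The example above: (2z^2 - 3z + 3)/(z - 2) = 2z + 1 + 5/(z - 2).
Q, R = divide_power_series({2: 2.0, 1: -3.0, 0: 3.0},
                           {1: 1.0, 0: -2.0}, terms=2)
print(Q, R)   # prints: {1: 2.0, 0: 1.0} {0: 5.0}
\end{verbatim}
(Of course the formal procedure above also admits divisors and dividends with infinitely many terms, extending the quotient as far as one pleases; the sketch handles merely the finite case.)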
Table~\ref{alggeo:228:tbl-down} summarizes the long-division procedure.% \footnote{\cite[\S~3.2]{SRW}} \begin{table} \caption{Dividing power series through successively smaller powers.} \label{alggeo:228:tbl-down} \index{long division!procedure for} \bqb B(z) &=& A(z)Q_n(z) + R_n(z) \\ A(z) &=& \sum_{k=-\infty}^{K} a_kz^k,\ \ a_K \neq 0 \\ B(z) &=& \sum_{k=-\infty}^{N} b_kz^k \\ R_N(z) &=& B(z) \\ Q_N(z) &=& 0 \\ R_n(z) &=& \sum_{k=-\infty}^{n} r_{nk}z^k \\ Q_n(z) &=& \sum_{k=n-K+1}^{N-K} q_{k}z^k \\ q_{n-K} &=& \frac{r_{nn}}{a_K} = \frac{1}{a_K} \left( b_n - \sum_{k=n-K+1\rule{0em}{\tla}}^{N-K} a_{n-k} q_k \right) \\ R_{n-1}(z) &=& R_n(z) - q_{n-K} z^{n-K}A(z) \\ \frac{B(z)}{A(z)} &=& Q_{-\infty}(z) \eqb \end{table} In its $q_{n-K}$ equation, the table includes also the result of \S~\ref{alggeo:228.50} below. It should be observed in light of Table~\ref{alggeo:228:tbl-down} that if% \footnote{ The notations~$K_o$, $a_k$ and~$z^k$ are usually pronounced, respectively, as ``$K$ naught,'' ``$a$ sub $k$'' and ``$z$ to the~$k$'' (or, more fully, ``$z$ to the $k$th power'')---at least in the author's country. } \bqb A(z) &=& \sum_{k=K_o}^{K} a_kz^k, \\ B(z) &=& \sum_{k=N_o}^{N} b_kz^k, \eqb then \bq{alggeo:228:37} R_n(z) = \sum_{k=n-(K-K_o)+1}^n r_{nk} z^k \ \ \mbox{for all $n < N_o+(K-K_o)$.} \eq That is, the remainder has order one less than the divisor has. The reason for this, of course, is that we have strategically planned the long-division iteration precisely to cause the leading term of the divisor to cancel the leading term of the remainder at each step.% \footnote{ If a more formal demonstration of~(\ref{alggeo:228:37}) is wanted, then consider per~(\ref{alggeo:228:25}) that \[ R_{m-1}(z) = R_m(z) - \frac{r_{mm}}{a_K} z^{m-K}A(z). \] If the least-order term of $R_m(z)$ is a~$z^{N_o}$ term (as clearly is the case at least for the initial remainder $R_N[z] = B[z]$), then according to the equation so also must the least-order term of $R_{m-1}(z)$ be a~$z^{N_o}$ term, unless an even lower-order term be contributed by the product $z^{m-K}A(z)$. But that very product's term of least order is a $z^{m-(K-K_o)}$ term. Under these conditions, evidently the least-order term of $R_{m-1}(z)$ is a $z^{m-(K-K_o)}$ term when $m-(K-K_o) \le N_o$; otherwise a~$z^{N_o}$ term. This is better stated after the change of variable $n+1\la m$: the least-order term of $R_n(z)$ is a $z^{n-(K-K_o)+1}$ term when $n < N_o + (K-K_o)$; otherwise a~$z^{N_o}$ term. The greatest-order term of $R_n(z)$ is by definition a~$z^n$ term. So, in summary, when~$n < N_o + (K-K_o)$, the terms of $R_n(z)$ run from $z^{n-(K-K_o)+1}$ through~$z^n$, which is exactly the claim~(\ref{alggeo:228:37}) makes. } The long-division procedure of Table~\ref{alggeo:228:tbl-down} extends the quotient $Q_n(z)$ through successively smaller powers of~$z$. Often, however, one prefers to extend the quotient through successively \emph{larger} powers of~$z$, where a~$z^K$ term is $A(z)$'s term of \emph{least} order. In this case, the long division goes by the complementary rules of Table~\ref{alggeo:228:tbl-up}. 
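For example (an illustration only), dividing $B(z)=1$ by $A(z)=1-z$ through successively larger powers, for which $K=0$, $N=0$ and $a_K=1$, one initializes $R_0(z)=1$ and $Q_0(z)=0$, whereupon the iteration of Table~\ref{alggeo:228:tbl-up} runs
\bqb
q_0 &=& \frac{r_{00}}{a_0} = 1,
\ \ \ \ R_1(z) = 1 - (1)(1-z) = z, \\
q_1 &=& \frac{r_{11}}{a_0} = 1,
\ \ \ \ R_2(z) = z - (z)(1-z) = z^2, \\
q_2 &=& \frac{r_{22}}{a_0} = 1,
\ \ \ \ R_3(z) = z^2 - (z^2)(1-z) = z^3,
\eqb
and so on, the remainder's least power climbing by one at each step. Evidently $Q_\infty(z) = 1 + z + z^2 + z^3 + \cdots$, a series \S~\ref{alggeo:228.30} will meet again.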
\begin{table} \caption{Dividing power series through successively larger powers.} \label{alggeo:228:tbl-up} \index{long division!procedure for} \bqb B(z) &=& A(z)Q_n(z) + R_n(z) \\ A(z) &=& \sum_{k=K}^{\infty} a_kz^k,\ \ a_K \neq 0 \\ B(z) &=& \sum_{k=N}^{\infty} b_kz^k \\ R_N(z) &=& B(z) \\ Q_N(z) &=& 0 \\ R_n(z) &=& \sum_{k=n}^{\infty} r_{nk}z^k \\ Q_n(z) &=& \sum_{k=N-K}^{n-K-1} q_{k}z^k \\ q_{n-K} &=& \frac{r_{nn}}{a_K} = \frac{1}{a_K} \left( b_n - \sum_{k=N-K\rule{0em}{\tla}}^{n-K-1} a_{n-k} q_k \right) \\ R_{n+1}(z) &=& R_n(z) - q_{n-K} z^{n-K}A(z) \\ \frac{B(z)}{A(z)} &=& Q_{\infty}(z) \eqb \end{table} \subsection{Dividing power series by matching coefficients} \label{alggeo:228.50} \index{power series!division of by matching coefficients} \index{division!by matching coefficients} \index{coefficient!matching of} \index{matching coefficients} \index{quotient} There is another, sometimes quicker way to divide power series than by the long division of \S~\ref{alggeo:228.20}. One can divide them by matching coefficients.% \footnote{\cite{Kohler-lecture}\cite[\S~2.5]{Fisher}} If \bq{alggeo:228:60} Q_\infty(z) = \frac{B(z)}{A(z)}, \eq where \bqb A(z) &=& \sum_{k=K}^\infty a_kz^k, \ \ a_K \neq 0, \\ B(z) &=& \sum_{k=N}^\infty b_kz^k \eqb are known and \[ Q_\infty(z) = \sum_{k=N-K}^\infty q_kz^k \] is to be calculated, then one can multiply~(\ref{alggeo:228:60}) through by $A(z)$ to obtain the form \[ A(z)Q_\infty(z) = B(z). \] Expanding the left side according to~(\ref{alggeo:228:prod}) and changing the index $n \la k$ on the right side, \settoheight\tla{\scriptsize $k$} \[ \sum_{n=N\rule{0em}{\tla}}^{\infty} \,\sum_{k=N-K\rule{0em}{\tla}}^{n-K} a_{n-k} q_k z^n = \sum_{n=N}^\infty b_nz^n. \] But for this to hold for all~$z$, the coefficients must match for each~$n$: \[ \sum_{k=N-K\rule{0em}{\tla}}^{n-K} a_{n-k} q_k = b_n, \ \ n \ge N. \] Transferring all terms but $a_Kq_{n-K}$ to the equation's right side and dividing by~$a_K$, we have that \bq{alggeo:228:65} q_{n-K} = \frac{1}{a_K} \left( b_n - \sum_{k=N-K\rule{0em}{\tla}}^{n-K-1} a_{n-k} q_k \right), \ \ n \ge N. \eq Equation~(\ref{alggeo:228:65}) computes the coefficients of $Q(z)$, each coefficient depending on the coefficients earlier computed. The coefficient-matching technique of this subsection is easily adapted to the division of series in decreasing, rather than increasing, powers of~$z$ if needed or desired. The adaptation is left as an exercise to the interested reader, but Tables~\ref{alggeo:228:tbl-down} and~\ref{alggeo:228:tbl-up} incorporate the technique both ways. Admittedly, the fact that~(\ref{alggeo:228:65}) yields a sequence of coefficients does not necessarily mean that the resulting power series $Q_\infty(z)$ converges to some definite value over a given domain. Consider for instance~(\ref{alggeo:228:45}), which diverges when% \footnote{ See footnote~\ref{alggeo:228:fn1}. } $\left|z\right| > 1$, even though all its coefficients are known. At least~(\ref{alggeo:228:65}) is correct when $Q_\infty(z)$ does converge. Even when $Q_\infty(z)$ as such does not converge, however, often what interest us are only the series' first several terms \[ Q_n(z) = \sum_{k=N-K}^{n-K-1} q_kz^k. \] In this case, \bq{alggeo:228:68} Q_\infty(z) = \frac{B(z)}{A(z)} = Q_n(z) + \frac{R_n(z)}{A(z)} \eq and convergence is not an issue. Solving~(\ref{alggeo:228:68}) for $R_n(z)$, \bq{alggeo:228:70} R_n(z) = B(z) - A(z)Q_n(z). 
\eq \subsection [Common quotients and the geometric series] {Common power-series quotients and the geometric series} \label{alggeo:228.30} \index{power series!common quotients of} Frequently encountered power-series quotients, calculated by the long division of \S~\ref{alggeo:228.20}, computed by the coefficient matching of \S~\ref{alggeo:228.50}, and/or verified by multiplying, include% \footnote{\label{alggeo:228:fn1}% The notation~$\left|z\right|$ represents the magnitude of~$z$. For example, $\left|5\right| = 5$, but also $\left|-5\right| = 5$. } \bq{alggeo:228:40} \frac{1}{1\pm z} = \begin{cases} \ds\sum_{k=0}^\infty (\mp z)^k, &\left|z\right| < 1; \\ \ds-\!\!\!\sum_{k=-\infty}^{-1}\!\!\!(\mp z)^k, &\left|z\right| > 1. \end{cases} \eq \index{geometric series} \index{series!geometric} Equation~(\ref{alggeo:228:40}) almost incidentally answers a question which has arisen in \S~\ref{alggeo:229} and which often arises in practice: to what total does the infinite \emph{geometric series} $\sum_{k=0}^{\infty} z^k$, $\left|z\right|<1$, sum? Answer: it sums exactly to $1/(1-z)$. However, there is a simpler, more aesthetic, more instructive way to demonstrate the same thing, as follows. Let \[ S \equiv \sum_{k=0}^\infty z^{k}, \ \ \left|z\right| < 1. \] Multiplying by~$z$ yields \[ zS \equiv \sum_{k=1}^\infty z^{k}. \] Subtracting the latter equation from the former leaves \[ (1-z)S = 1, \] which, after dividing by $1-z$, implies that \bq{alggeo:228:45} S \equiv \sum_{k=0}^\infty z^{k} = \frac{1}{1-z}, \ \ \left|z\right| < 1, \eq as was to be demonstrated. \subsection{Variations on the geometric series} \label{alggeo:228.40} \index{geometric series!variations on} \index{series!geometric, variations on} \index{power series!extending the technique of} \index{Planck, Max (1858--1947)} \index{blackbody radiation} Besides being more aesthetic than the long division of \S~\ref{alggeo:228.20}, the difference technique of \S~\ref{alggeo:228.30} permits one to extend the basic geometric series in several ways. For instance, the sum \[ S_1 \equiv \sum_{k=0}^\infty k z^{k}, \ \ \left|z\right| < 1 \] (which arises in, among others, Planck's quantum blackbody radiation calculation% \footnote{\cite{McMahon}}% ), we can compute as follows. We multiply the unknown~$S_1$ by~$z$, producing \[ zS_1 = \sum_{k=0}^\infty k z^{k+1} = \sum_{k=1}^\infty (k-1) z^k. \] We then subtract~$zS_1$ from~$S_1$, leaving \[ (1-z)S_1 = \sum_{k=0}^\infty k z^{k} - \sum_{k=1}^\infty (k-1)z^k = \sum_{k=1}^\infty z^k = z \sum_{k=0}^\infty z^k = \frac{z}{1-z}, \] where we have used~(\ref{alggeo:228:45}) to collapse the last sum. Dividing by $1-z$, we arrive at \bq{alggeo:228:50} S_1 \equiv \sum_{k=0}^\infty k z^{k} = \frac{z}{(1-z)^2}, \ \ \left|z\right| < 1, \eq which was to be found. Further series of the kind, such as $\sum_k k^2 z^k$, $\sum_k (k+1)(k) z^k$, $\sum_k k^3 z^k$, etc., can be calculated in like manner as the need for them arises. % ---------------------------------------------------------------------- % bad break \section[Constants and variables]{Indeterminate constants, independent vari- \linebreak ables and dependent variables} \label{alggeo:231} \index{constant} \index{constant, indeterminate} \index{variable} \index{variable!independent} \index{variable!dependent} \index{sound} Mathematical models use \emph{indeterminate constants,} \emph{independent variables} and \emph{dependent variables.} The three are best illustrated by example. 
Consider the time~$t$ a sound needs to travel from its source to a distant listener: \[ t=\frac{\Delta r}{v_\mr{sound}}, \] where~$\Delta r$ is the distance from source to listener and $v_\mr{sound}$ is the speed of sound. Here, $v_\mr{sound}$ is an indeterminate constant (given particular atmospheric conditions, it doesn't vary),~$\Delta r$ is an independent variable, and~$t$ is a dependent variable. The model gives~$t$ as a function of~$\Delta r$; so, if you tell the model how far the listener sits from the sound source, the model returns the time needed for the sound to propagate from one to the other. Note that the abstract validity of the model does not necessarily depend on whether we actually know the right figure for $v_\mr{sound}$ (if I tell you that sound goes at $500\:\mr{m/s}$, but later you find out that the real figure is $331\:\mr{m/s}$, it probably doesn't ruin the theoretical part of your analysis; you just have to recalculate numerically). Knowing the figure is not the point. The point is that conceptually there pre\"exists some right figure for the indeterminate constant; that sound goes at some constant speed---whatever it is---and that we can calculate the delay in terms of this. \index{concert hall} Although there exists a definite philosophical distinction between the three kinds of quantity, nevertheless it cannot be denied that which particular quantity is an indeterminate constant, an independent variable or a dependent variable often depends upon one's immediate point of view. The same model in the example would remain valid if atmospheric conditions were changing ($v_\mr{sound}$ would then be an independent variable) or if the model were used in designing a musical concert hall% \footnote{ Math books are funny about examples like this. Such examples remind one of the kind of calculation one encounters in a childhood arithmetic textbook, as of the quantity of air contained in an astronaut's round helmet. One could calculate the quantity of water in a kitchen mixing bowl just as well, but astronauts' helmets are so much more interesting than bowls, you see. %(Some editor believes that if a %kid feels that he is doing astronautical calculations, then he may %grow up to be a famous scientist some day. Well, maybe. It's worth a %moderate try, anyway.) The chance that the typical reader will ever specify the dimensions of a real musical concert hall is of course vanishingly small. However, it is the idea of the example that matters here, because the chance that the typical reader will ever specify \emph{something} technical is quite large. Although sophisticated models with many factors and terms do indeed play a major role in engineering, the great majority of practical engineering calculations---for quick, day-to-day decisions where small sums of money and negligible risk to life are at stake---are done with models hardly more sophisticated than the one shown here. So, maybe the concert-hall example is not so unreasonable, after all. } to suffer a maximum acceptable sound time lag from the stage to the hall's back row ($t$ would then be an independent variable;~$\Delta r$, dependent). Occasionally we go so far as deliberately to change our point of view in mid-analysis, now regarding as an independent variable what a moment ago we had regarded as an indeterminate constant, for instance (a typical case of this arises in the solution of differential equations by the method of unknown coefficients, \S~\ref{inttx:240}). 
Such a shift of viewpoint is fine, so long as we remember that there is a difference between the three kinds of quantity and we keep track of which quantity is which kind to us at the moment. The main reason it matters which symbol represents which of the three kinds of quantity is that in calculus, one analyzes how change in independent variables affects dependent variables as indeterminate constants remain fixed. (Section~\ref{alggeo:227} has introduced the dummy variable, which the present section's threefold taxonomy seems to exclude. However, in fact, most dummy variables are just independent variables---a few are dependent variables---whose scope is restricted to a particular expression. Such a dummy variable does not seem very ``independent,'' of course; but its dependence is on the operator controlling the expression, not on some other variable within the expression. Within the expression, the dummy variable fills the role of an independent variable; without, it fills no role because logically it does not exist there. Refer to \S\S~\ref{alggeo:227} and~\ref{integ:240}.) % ---------------------------------------------------------------------- \section{Exponentials and logarithms} \label{alggeo:230} \index{exponential} \index{exponent} In \S~\ref{alggeo:224} we have considered the power operation~$z^a,$ where (in \S~\ref{alggeo:231}'s language) the independent variable~$z$ is the base and the indeterminate constant~$a$ is the exponent. There is another way to view the power operation, however. One can view it as the \emph{exponential} operation \[ a^z, \] where the variable~$z$ is the exponent and the constant~$a$ is the base. \subsection{The logarithm} \label{alggeo:230.10} \index{logarithm} The exponential operation follows the same laws the power operation follows, but because the variable of interest is now the exponent rather than the base, the inverse operation is not the root but rather the \emph{logarithm:} \bq{alggeo:230:logdef} \log_a (a^z) = z. \eq The logarithm $\log_a w$ answers the question, ``What power must I raise~$a$ to, to get~$w$?'' Raising~$a$ to the power of the last equation, we have that \[ a^{\log_a (a^z)} = a^z. \] With the change of variable $w \la a^z$, this is \bq{alggeo:230:logdef2} a^{\log_a w} = w. \eq Hence, the exponential and logarithmic operations mutually invert one another. \subsection{Properties of the logarithm} \label{alggeo:230.20} \index{logarithm!properties of} The basic properties of the logarithm include that \bqa \log_a uv &=& \log_a u + \log_a v, \label{alggeo:230:310}\\ \log_a \frac{u}{v} &=& \log_a u - \log_a v, \label{alggeo:230:311}\\ \log_a (w^z) &=& z \log_a w, \label{alggeo:230:315}\\ w^z &=& a^{z\log_a w}, \label{alggeo:230:316}\\ \log_b w &=& \frac{ \log_a w }{ \log_a b }. \label{alggeo:230:318} \eqa Of these,~(\ref{alggeo:230:310}) follows from the steps \bqb (uv) &=& (u)(v), \\ (a^{\log_a uv}) &=& (a^{\log_a u})(a^{\log_a v}), \\ a^{\log_a uv} &=& a^{\log_a u + \log_a v}; \eqb and~(\ref{alggeo:230:311}) follows by similar reasoning. Equations~(\ref{alggeo:230:315}) and~(\ref{alggeo:230:316}) follow from the steps \bqb \settowidth\tla{$\ds a^{\log_a (w^z)}$} \settowidth\tla{$\ds a^{\log_a (w^z)}$} w^z = \makebox[\tla][r]{$\ds(w^z)$} &=& (w)^z, \\ w^z = a^{\log_a (w^z)} &=& (a^{\log_a w})^z, \\ w^z = a^{\log_a (w^z)} &=& a^{z \log_a w}. \eqb Equation~(\ref{alggeo:230:318}) follows from the steps \bqb w &=& b^{\log_b w}, \\ \log_a w &=& \log_a( b^{\log_b w}), \\ \log_a w &=& \log_b w \log_a b. 
\eqb Among other purposes,~(\ref{alggeo:230:310}) through~(\ref{alggeo:230:318}) serve respectively to transform products to sums, quotients to differences, powers to products, exponentials to differently based exponentials, and logarithms to differently based logarithms. Table~\ref{alggeo:230:tbl} repeats the equations along with~(\ref{alggeo:230:logdef}) and~(\ref{alggeo:230:logdef2}) (which also emerge as restricted forms of eqns.~\ref{alggeo:230:315} and~\ref{alggeo:230:316}), thus summarizing the general properties of the logarithm. \begin{table} \caption{General properties of the logarithm.} \label{alggeo:230:tbl} \bqb \log_a uv &=& \log_a u + \log_a v \\ \log_a \frac{u}{v} &=& \log_a u - \log_a v \\ \log_a (w^z) &=& z \log_a w \\ w^z &=& a^{z\log_a w} \\ \log_b w &=& \frac{ \log_a w }{ \log_a b } \\ \log_a (a^z) &=& z \\ w &=& a^{\log_a w} \eqb \end{table} % ---------------------------------------------------------------------- \section{Triangles and other polygons: simple facts} \label{alggeo:323} \index{geometry} \index{triangle} \index{polygon} This section gives simple facts about triangles and other polygons. \subsection{The area of a triangle} \label{alggeo:323.10} \index{triangle!area of} \index{right triangle} \index{triangle!right} \index{rectangle!splitting of down the diagonal} The area of a \emph{right} triangle% \footnote{ A \emph{right triangle} is a triangle, one of whose three angles is perfectly square. } is half the area of the corresponding rectangle. This is seen by splitting a rectangle down its diagonal into a pair of right triangles of equal size. The fact that \emph{any} triangle's area is half its base length times its height is seen by dropping a perpendicular from one point of the triangle to the opposite side (see Fig.~\ref{intro:284:fig} on page~\pageref{intro:284:fig}), dividing the triangle into two right triangles, for each of which the fact is true. In algebraic symbols, \bq{alggeo:323:10} A=\frac{bh}{2}, \eq where~$A$ stands for area,~$b$ for base length, and~$h$ for perpendicular height. \subsection{The triangle inequalities} \label{alggeo:323.20} \index{triangle inequalities} \index{proof!by sketch} \index{sketch, proof by} Any two sides of a triangle together are longer than the third alone, which itself is longer than the difference between the two. In symbols, \bq{alggeo:323:20} \left|a-b\right| < c < a+b, \eq where~$a$, $b$ and~$c$ are the lengths of a triangle's three sides. These are the \emph{triangle inequalities.} The truth of the sum inequality $c < a+b$, is seen by sketching some triangle on a sheet of paper and asking: if~$c$ is the direct route between two points and $a+b$ is an indirect route, then how can $a+b$ not be longer? Of course the sum inequality is equally good on any of the triangle's three sides, so one can write $a}(-3,0)(0,0)(-2.5,1.5) \psset{linewidth=0.5pt} \psline(0,0)(3,0) \psarc{->}(0,0){0.5}{0}{149.04} \rput(0.4,0.7){$\phi$} \psarc{-}(0,0){0.7}{149.04}{180} \rput(-1.1,0.3){$\psi$} \end{pspicture} } \ec \end{figure} In mathematical notation, \bqb \phi_1+\phi_2+\phi_3 &=& 2\pi,\\ \phi_k+\psi_k &=& \frac{2\pi}{2},\ \ k=1,2,3, \eqb where~$\psi_k$ and~$\phi_k$ are respectively the triangle's inner angles and the angles through which the car turns. Solving the latter equations for~$\phi_k$ and substituting into the former yields \bq{alggeo:323:50} \psi_1+\psi_2+\psi_3 = \frac{2\pi}{2}, \eq which was to be demonstrated. 
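As a quick, illustrative check of the result, consider an equilateral triangle, whose symmetry demands that its three inner angles be equal. In that case,
\[
\psi_1 = \psi_2 = \psi_3 = \frac{1}{3}\,\frac{2\pi}{2} = \frac{2\pi}{6},
\]
the familiar angle found at each corner of such a triangle.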
\index{angle!of a polygon} Extending the same technique to the case of an $n$-sided polygon, we have that \bqb \sum_{k=1}^n \phi_k &=& 2\pi,\\ \phi_k+\psi_k &=& \frac{2\pi}{2}. \eqb Solving the latter equations for~$\phi_k$ and substituting into the former yields \[ \sum_{k=1}^n \left( \frac{2\pi}{2} - \psi_k \right) = 2\pi, \] or in other words \bq{alggeo:323:60} \sum_{k=1}^n \psi_k = (n-2)\frac{2\pi}{2}. \eq Equation~(\ref{alggeo:323:50}) is then seen to be a special case of~(\ref{alggeo:323:60}) with $n=3$. % ---------------------------------------------------------------------- \section{The Pythagorean theorem} \label{alggeo:223} \index{Pythagorean theorem} \index{Pythagoras (c.~580--c.~500~B.C.)} \index{diagonal} \index{hypotenuse} \index{leg} Along with Euler's formula~(\ref{cexp:euler}), the fundamental theorem of calculus % bad break (\ref{integ:antider}), Cauchy's integral formula~(\ref{taylor:cauchy}) and Fourier's equation~(\ref{fouri:eqn}), the Pythagorean theorem is one of the most famous results in all of mathematics. The theorem holds that \bq{alggeo:pythag} a^2 + b^2 = c^2, \eq where~$a$, $b$ and~$c$ are the lengths of the legs and diagonal of a right triangle, as in Fig.~\ref{alggeo:223:25}. \begin{figure} \caption{A right triangle.} \label{alggeo:223:25} \bc { \nc\xax{-5} \nc\xbx{-1} \nc\xcx{ 4} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \psset{linewidth=2.0pt} \psline{C-C}(0,0)(0,2.6)(-1.4,0)(0,0) \rput(-.7,-.3){$a$} \rput(.3,1.3){$b$} \rput(-1.0,1.45){$c$} \nc\tmpd{0.3} \psline[linewidth=0.5pt](-\tmpd,0)(-\tmpd,\tmpd)(0,\tmpd) \end{pspicture} } \ec \end{figure} Many proofs of the theorem are known. \index{square!tilted} \index{area} \index{Euclid (c.~300~B.C.)} One such proof posits a square of side length $a+b$ with a tilted square of side length~$c$ inscribed as in Fig.~\ref{alggeo:223:35}. \begin{figure} \caption{The Pythagorean theorem.} \label{alggeo:223:35} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \psset{linewidth=2.0pt} \nc\tmpa{0.6} \nc\tmpb{1.97} \nc\tmpc{2.0} \psline{C-C}( \tmpa,-\tmpb)( \tmpb, \tmpa)(-\tmpa, \tmpb)(-\tmpb,-\tmpa)( \tmpa,-\tmpb) \psline{C-C}( \tmpc,-\tmpc)( \tmpc, \tmpc)(-\tmpc, \tmpc)(-\tmpc,-\tmpc)( \tmpc,-\tmpc) \rput(1.3,-2.3){$a$} \rput(2.3,-.7){$b$} \rput(-.7,-2.3){$b$} \psline[linewidth=0.5pt](\tmpa,-2.0)(\tmpa,-2.3) \rput(1.0,-.55){$c$} \nc\tmpd{0.3} \rput(2,-2){ \psline[linewidth=0.5pt](-\tmpd,0)(-\tmpd,\tmpd)(0,\tmpd) } \rput(\tmpa,-\tmpb) { \rput{-28.30}(0,0) { \psline[linewidth=0.5pt](-\tmpd,0)(-\tmpd,\tmpd)(0,\tmpd) } } \end{pspicture} } \ec \end{figure} The area of each of the four triangles in the figure is evidently $ab/2$. The area of the tilted inner square is~$c^2$. The area of the large outer square is $(a+b)^2$. But the large outer square is comprised of the tilted inner square plus the four triangles, hence the area of the large outer square equals the area of the tilted inner square plus the areas of the four triangles. In mathematical symbols, this is \[ (a+b)^2 = c^2 + 4\left(\frac{ab}{2}\right), \] which simplifies directly to~(\ref{alggeo:pythag}).% \footnote{ This elegant proof is far simpler than the one famously given by the ancient geometer Euclid, yet more appealing than alternate proofs often found in print. 
Whether Euclid was acquainted with the simple proof given here this writer does not know, but it is possible \cite[``Pythagorean theorem,'' 02:32, 31 March 2006]{wikip} that Euclid chose his proof because it comported better with the restricted set of geometrical elements he permitted himself to work with. Be that as it may, the present writer encountered the proof this section gives somewhere years ago and has never seen it in print since, so can claim no credit for originating it. Unfortunately the citation is now long lost. A current, electronic source for the proof is~\cite{wikip} as cited earlier in this footnote. } \index{Pythagorean theorem!in three dimensions} \index{diagonal!three-dimensional} \index{altitude} The Pythagorean theorem is readily extended to three dimensions as \bq{alggeo:223:40} a^2 + b^2 + h^2 = r^2, \eq where~$h$ is an altitude perpendicular to both~$a$ and~$b$, thus also to~$c$; and where~$r$ is the corresponding three-dimensional diagonal: the diagonal of the right triangle whose legs are~$c$ and~$h$. Inasmuch as~(\ref{alggeo:pythag}) applies to any right triangle, it follows that $c^2+h^2=r^2$, which equation expands directly to yield~(\ref{alggeo:223:40}). % ---------------------------------------------------------------------- \section{Functions} \label{alggeo:250} \index{function} \index{mapping} This book is not the place for a gentle introduction to the concept of the function. Briefly, however, a \emph{function} is a mapping from one number (or vector of several numbers) to another. For example, $f(x) = x^2 - 1$ is a function which maps~1 to~0 and~$-3$ to~$8$, among others. \index{domain} \index{range} One often speaks of \emph{domains} and \emph{ranges} when discussing functions. The \emph{domain} of a function is the set of numbers one can put into it. The \emph{range} of a function is the corresponding set of numbers one can get out of it. In the example, if the domain is restricted to real~$x$ such that $\left|x\right|\le 3$, then the corresponding range is $-1 \le f(x) \le 8$. % diagn: this new paragraph wants review. \index{inverse!of a function} \index{function!inverse of} \index{invertibility} The notation $f^{-1}(\cdot)$ indicates the \emph{inverse} of the function $f(\cdot)$ such that \bq{alggeo:250:30} f^{-1}[f(x)] = x = f[f^{-1}(x)], \eq thus swapping the function's range with its domain. Though only a function which never maps distinct values from its domain together onto a single value in its range is strictly \emph{invertible}---which means that the $f(x) = x^2 - 1$ of the example is not strictly invertible, since it maps both $x=-u$ and $x=u$ onto $f(x) = u^2 - 1$---it often does not pay to interpret the requirement too rigidly, for context tends to choose between available values (see for instance \S~\ref{taylor:325}). Inconsistently, inversion's notation $f^{-1}(\cdot)$ clashes with the similar-looking but different-meaning notation $f^2(\cdot) \equiv [f(\cdot)]^2$, whereas $f^{-1}(\cdot) \neq [f(\cdot)]^{-1}$: both notations are conventional and both are used in this book. \index{root} \index{zero} \index{pole} \index{pole!multiple} \index{multiple pole} \index{singularity} \index{divergence to infinity} Other terms which arise when discussing functions are \emph{root} (or \emph{zero}), \emph{singularity} and \emph{pole}. A \emph{root} (or \emph{zero}) of a function is a domain point at which the function evaluates to zero (the example has roots at $x=\pm 1$). 
A \emph{singularity} of a function is a domain point at which the function's output \emph{diverges;} that is, where the function's output is infinite.% \footnote{ % diagn: this revised footnote wants review. Here is one example of the book's deliberate lack of formal mathematical rigor. A more precise formalism to say that ``the function's output is infinite'' might be \[ \lim_{x\ra x_o}\left|f(x)\right| = \infty, \] and yet preciser formalisms than this are conceivable, and occasionally even are useful. Be that as it may, the applied mathematician tends to avoid such formalisms where there seems no immediate need for them. } A \emph{pole} is a singularity that behaves locally like $1/x$ (rather than, for example, like $1/\sqrt x$). A singularity that behaves as $1/x^N$ is a \emph{multiple pole}, which (\S~\ref{inttx:260.20}) can be thought of as~$N$ poles. The example's function $f(x)$ has no singularities for finite~$x$; however, the function $h(x)=1/(x^2-1)$ has poles at $x=\pm 1$. \index{branch point} \index{essential singularity} \index{singularity!essential} (Besides the root, the singularity and the pole, there is also the troublesome \emph{branch point,} an infamous example of which is $z=0$ in the function $g[z]=\sqrt{z}$. Branch points are important, but the book must lay a more extensive foundation before introducing them properly in \S~\ref{taylor:325}.% \footnote{ There is further the \emph{essential singularity,} an example of which is $z=0$ in $p(z)=\exp(1/z)$, but the best way to handle such unreasonable singularities is almost always to change a variable, as $w \la 1/z$, or otherwise to frame the problem such that one need not approach the singularity. % diagn: review the rest of the footnote; check the vague reference Except tangentially much later when it treats asymptotic series, this book will have little to say of such singularities. }) % ---------------------------------------------------------------------- \section{Complex numbers (introduction)} \label{alggeo:225} \index{number} \index{real number} \index{imaginary number} \index{complex number} \index{number!real} \index{number!imaginary} \index{number!complex} \index{square root} \index{unit} \index{unit!real} \index{unit!imaginary} \index{imaginary unit} \index{conjugate} \index{elf} \index{$i$} Section~\ref{alggeo:224roots} has introduced square roots. What it has not done is to tell us how to regard a quantity such as $\sqrt{-1}$. Since there exists no real number~$i$ such that \bq{alggeo:225:i} i^2 = -1 \eq and since the quantity~$i$ thus defined is found to be critically important across broad domains of higher mathematics, we accept~(\ref{alggeo:225:i}) as the definition of a fundamentally new kind of quantity: \emph{the imaginary number.}% \footnote{ The English word \emph{imaginary} is evocative, but perhaps not of quite the right concept in this usage. Imaginary numbers are not to mathematics as, say, imaginary elfs are to the physical world. In the physical world, imaginary elfs are (presumably) not substantial objects. However, in the mathematical realm, imaginary numbers \emph{are} substantial. The word \emph{imaginary} in the mathematical sense is thus more of a technical term than a descriptive adjective. The number~$i$ is just a concept, of course, but then so is the number~$1$ (though you and I have often met one \emph{of something}---one apple, one chair, one summer afternoon, etc.---neither of us has ever met just~1). 
The reason imaginary numbers are called ``imaginary'' probably has to do with the fact that they emerge from mathematical operations only, never directly from counting things. Notice, however, that the number 1/2 never emerges directly from counting things, either. If for some reason the $i$year were offered as a unit of time, then the period separating your fourteenth and twenty-first birthdays could have been measured as $-i7$~$i$years. Madness? No, let us not call it that; let us call it a useful formalism, rather. The unpersuaded reader is asked to suspend judgment a while. He will soon see the use. } Imaginary numbers are given their own number line, plotted at right angles to the familiar real number line as in Fig.~\ref{alggeo:225:fig}. \begin{figure} \caption[The complex (or Argand) plane.]{The complex (or Argand) plane, and a complex number~$z$ therein.} \label{alggeo:225:fig} \index{complex plane} \index{Argand plane} \index{Argand, Jean-Robert (1768--1822)} \bc { \nc\xax{-5} \nc\xbx{-3} \nc\xcx{ 5} \nc\xdx{ 3} \begin{pspicture}(\xax,\xbx)(\xcx,\xdx) \psset{dimen=middle} \small %\psframe[linewidth=0.5pt](\xax,\xbx)(\xcx,\xdx) \nc\xx{2.5} \nc\xxb{1.0} \nc\xxc{2.0} \nc\xxd{0.1} \nc\xxe{0.25} \nc\xxf{0.25} \nc\xxp{1.3} \nc\xxq{26.565} \nc\xxqq{13.283} \nc\xxr{2.2361} \nc\xxrr{1.1180} \nc\xxrrr{2.3} \nc\xxs{2.6} \nc\xxlsep{0.20} \nc\xxa{ \psline[linewidth=1.0pt]{<->}(-\xx,0)(\xx,0) \psline[linewidth=0.5pt]( \xxb,-\xxd)( \xxb,\xxd) \psline[linewidth=0.5pt](-\xxb,-\xxd)(-\xxb,\xxd) \psline[linewidth=0.5pt]( \xxc,-\xxd)( \xxc,\xxd) \psline[linewidth=0.5pt](-\xxc,-\xxd)(-\xxc,\xxd) } \xxa \rput{90}(0,0){\xxa} \rput[r](-\xxe,-2){$-i2$} \rput[r](-\xxe,-1){$-i $} \rput[r](-\xxe, 1){$ i $} \rput[r](-\xxe, 2){$ i2$} \rput[t](-2,-\xxf){$-2$} \rput[t](-1,-\xxf){$-1$} \rput[t]( 1,-\xxf){$ 1$} \rput[t]( 2,-\xxf){$ 2$} \psline[linewidth=2.0pt]{cc-*}(0,0)(2,1) \psarc[linewidth=0.5pt]{->}(0,0){\xxp}{0}{\xxq} \rput{\xxqq}(0,0){ \uput[r](\xxp,0){ \rput{*0}(0,0){$\phi$} } } \rput{\xxq}(0,0){ \uput{\xxlsep}[u](\xxrr,0){ \rput{*0}(0,0){$\rho$} } \uput{\xxlsep}[r](\xxrrr,0){ \rput{*0}(0,0){$z$} } } \rput[l](\xxs,0){$\Re(z)$} \rput[b](0,\xxs){$i\Im(z)$} \end{pspicture} } \ec \end{figure} The sum of a real number~$x$ and an imaginary number~$iy$ is the \emph{complex number} \[ z = x + iy. \] The \emph{conjugate}~$z^{*}$ of this complex number is defined to be% \footnote{ % diagn: footnote wants one last review. For some inscrutable reason, in the author's country at least, professional mathematicians seem universally to write~$\overline z$ instead of~$z^{*}$, whereas rising engineers take the mathematicians' classes at school and then, having passed the classes, promptly start writing~$z^{*}$ for the rest of their lives. The writer has his preference between the two notations and this book reflects it, but the curiously absolute character of the notational split is interesting as a social phenomenon. } \[ z^{*} = x - iy. \] The \emph{magnitude} (or \emph{modulus,} or \emph{absolute value}) $\left|z\right|$ of the complex number is defined to be the length~$\rho$ in Fig.~\ref{alggeo:225:fig}, which per the Pythagorean theorem (\S~\ref{alggeo:223}) is such that \bq{225:10} \index{magnitude} \index{modulus} \index{absolute value} \index{complex number!magnitude of} \left|z\right|^2 = x^2 + y^2. \eq The \emph{phase} $\arg z$ of the complex number is defined to be the angle~$\phi$ in the figure, which in terms of the trigonometric functions of \S~\ref{trig:226}% \footnote{ This is a forward reference. 
If the equation doesn't make sense to you yet for this reason, skip it for now. The important point is that $\arg z$ is the angle~$\phi$ in the figure. } is such that \bq{225:11} \index{phase} \index{complex number!phase of} \index{arg} \tan (\arg z) = \frac{y}{x}. \eq Specifically to extract the real and imaginary parts of a complex number, the notations \begin{equation} \label{225:12} \index{real part} \index{imaginary part} \index{complex number!real part of} \index{complex number!imaginary part of} \begin{split} \Re(z) &= x, \\ \Im(z) &= y, \end{split} \end{equation} are conventionally recognized (although often the symbols $\Re[\cdot]$ and $\Im[\cdot]$ are written~$\mr{Re}[\cdot]$ and~$\mr{Im}[\cdot]$, particularly when printed by hand). \subsection[Rectangular complex multiplication]{Multiplication and division of complex numbers in rectangular form} \label{alggeo:225.1} \index{multiplication} \index{division} \index{complex number!multiplication and division} Several elementary properties of complex numbers are readily seen if the fact that $i^2 = -1$ is kept in mind, including that \bqa z_1z_2 &=& (x_1x_2 - y_1y_2) + i(y_1x_2 + x_1y_2), \label{alggeo:225:10}\\ \frac{z_1}{z_2} &=& \frac{x_1+iy_1}{x_2+iy_2} = \left(\frac{x_2-iy_2}{x_2-iy_2}\right)\frac{x_1+iy_1}{x_2+iy_2} \xn\\ &=& \frac{(x_1x_2+y_1y_2)+i(y_1x_2-x_1y_2)}{x_2^2+y_2^2}. \label{alggeo:225:20} \eqa It is a curious fact that \bq{alggeo:225:23} \frac{1}{i} = -i. \eq It is a useful fact that \bq{alggeo:225:24} z^{*}z = x^2 + y^2 = \left| z \right|^2 \eq (the curious fact, eqn.~\ref{alggeo:225:23}, is useful, too). Sometimes convenient are the forms \bq{alggeo:225:25} \begin{split} \Re(z) &= \frac{z + z^{*}}{2}, \\ \Im(z) &= \frac{z - z^{*}}{i2}, \end{split} \eq trivially proved. \subsection{Complex conjugation} \label{alggeo:225.2} \index{conjugate} \index{conjugation} \index{complex conjugation} \index{complex number!conjugating} \index{induction} \index{proof!by induction} An important property of complex numbers descends subtly from the fact that \[ i^2 = -1 = (-i)^2. \] If one defined some number $j\equiv-i$, claiming that~$j$ not~$i$ were the true imaginary unit,% \footnote{\cite[\S~I:22-5]{Feynman}} then one would find that \[ (-j)^2 = -1 = j^2, \] and thus that all the basic properties of complex numbers in the~$j$ system held just as well as they did in the~$i$ system. The units~$i$ and~$j$ would differ indeed, but would perfectly mirror one another in every respect. That is the basic idea. To establish it symbolically needs a page or so of slightly abstract algebra as follows, the goal of which will be to show that $[f(z)]^{*} = f(z^{*})$ for some unspecified function $f(z)$ with specified properties. To begin with, if \[ z = x+iy, \] then \[ z^{*} = x-iy \] by definition. Proposing that $(z^{k-1})^{*}=(z^{*})^{k-1}$ (which may or may not be true but for the moment we assume it), we can write \bqb z^{k-1} &=& s_{k-1}+it_{k-1}, \\ (z^{*})^{k-1} &=& s_{k-1}-it_{k-1}, \eqb where $s_{k-1}$ and $t_{k-1}$ are symbols introduced to represent the real and imaginary parts of $z^{k-1}$. Multiplying the former equation by $z=x+iy$ and the latter by $z^{*}=x-iy$, we have that \bqb z^k &=& (xs_{k-1}-yt_{k-1})+i(ys_{k-1}+xt_{k-1}), \\ (z^{*})^k &=& (xs_{k-1}-yt_{k-1})-i(ys_{k-1}+xt_{k-1}). \eqb With the definitions $s_k \equiv xs_{k-1}-yt_{k-1}$ and $t_k \equiv ys_{k-1}+xt_{k-1}$, this is written more succinctly \bqb z^k &=& s_k+it_k, \\ (z^{*})^k &=& s_k-it_k. 
\eqb In other words, if $(z^{k-1})^{*}=(z^{*})^{k-1}$, then it necessarily follows that $(z^k)^{*}=(z^{*})^k$. Solving the definitions of $s_k$ and $t_k$ for $s_{k-1}$ and $t_{k-1}$ yields the reverse definitions $s_{k-1} = (xs_k+yt_k)/(x^2+y^2)$ and $t_{k-1} = (-ys_k+xt_k)/(x^2+y^2)$. Therefore, except when $z=x+iy$ happens to be null or infinite, the implication is reversible by reverse reasoning, so by mathematical induction% \footnote{ \emph{Mathematical induction} is an elegant old technique for the construction of mathematical proofs. Section~\ref{taylor:314} elaborates on the technique and offers a more extensive example. Beyond the present book, a very good introduction to mathematical induction is found in~\cite{Hamming}. } we have that \bq{alggeo:225:21} (z^k)^{*}=(z^{*})^k \eq for all integral~$k$. We have also from~$(\ref{alggeo:225:10})$ that \bq{alggeo:225:22} (z_1z_2)^{*} = z_1^{*}z_2^{*} \eq for any complex~$z_1$ and~$z_2$. Consequences of~(\ref{alggeo:225:21}) and~(\ref{alggeo:225:22}) include that if \bqa f(z) &\equiv& \sum_{k=-\infty}^{\infty} (a_k+ib_k) (z-z_o)^k, \label{alggeo:225:laurent} \\ f^{*}(z) &\equiv& \sum_{k=-\infty}^{\infty} (a_k-ib_k) (z-z_o^{*})^k, \label{alggeo:225:laurent2} \eqa where~$a_k$ and~$b_k$ are real and imaginary parts of the coefficients peculiar to the function $f(\cdot)$, then \bq{alggeo:225:conj} \left[f(z)\right]^{*} = f^{*}(z^{*}). \eq In the common case where all $b_k=0$ and $z_o=x_o$ is a real number, then $f(\cdot)$ and $f^{*}(\cdot)$ are the same function, so~(\ref{alggeo:225:conj}) reduces to the desired form \bq{alggeo:225:conj2} \left[f(z)\right]^{*} = f(z^{*}), \eq which says that the effect of conjugating the function's input is merely to conjugate its output. Equation~(\ref{alggeo:225:conj2}) expresses a significant, general rule of complex numbers and complex variables which is better explained in words than in mathematical symbols. The rule is this: for most equations and systems of equations used to model physical systems, \emph{one can produce an equally valid alternate model simply by simultaneously conjugating all the complex quantities present.}% \footnote{\cite{Hamming}\cite{Spiegel}} \subsection{Power series and analytic functions (preview)} \label{alggeo:225.3} \index{power series} \index{function!analytic} \index{analytic function} Equation~(\ref{alggeo:225:laurent}) is a general power series% \footnote{\cite[\S~10.8]{Hildebrand}} in $z-z_o$. Such power series have broad application.% \footnote{ That is a pretty impressive-sounding statement: ``Such power series have broad application.'' However, molecules, air and words also have ``broad application''; merely stating the fact does not tell us much. In fact the general power series is a sort of one-size-fits-all mathematical latex glove, which can be stretched to fit around almost any function. The interesting part is not so much in the general form~(\ref{alggeo:225:laurent}) of the series as it is in the specific choice of~$a_k$ and~$b_k$, which this section does not discuss. Observe that the Taylor series (which this section also does not discuss; see \S~\ref{taylor:310}) is a power series with $a_k=b_k=0$ for $k<0$. } It happens in practice that most functions of interest in modeling physical phenomena can conveniently be constructed as power series (or sums of power series)% \footnote{ The careful reader might observe that this statement neglects \emph{Gibbs' phenomenon,} but that curious matter will be dealt with in \S~\ref{fours:170}. 
} with suitable choices of~$a_k$, $b_k$ and~$z_o$. The property~(\ref{alggeo:225:conj}) applies to all such functions, with~(\ref{alggeo:225:conj2}) also applying to those for which $b_k=0$ and $z_o=x_o$. The property the two equations represent is called the \emph{conjugation property.} Basically, it says that if one replaces all the~$i$ in some mathematical model with~$-i$, then the resulting conjugate model is equally as valid as the original.% \footnote{ To illustrate, from the fact that $(1+i2)(2+i3) + (1-i) = -3+i6$ the conjugation property infers immediately that $(1-i2)(2-i3) + (1+i) = -3-i6$. Observe however that no such property holds for the real parts: $(-1+i2)(-2+i3) + (-1-i) \neq 3+i6$. } Such functions, whether $b_k=0$ and $z_o=x_o$ or not, are \emph{analytic functions} (\S~\ref{taylor:320}). In the formal mathematical definition, a function is analytic which is infinitely differentiable (Ch.~\ref{drvtv}) in the immediate domain neighborhood of interest. However, for applications a fair working definition of the analytic function might be ``a function expressible as a power series.'' Chapter~\ref{taylor} elaborates. All power series are infinitely differentiable except at their poles. \index{function!nonanalytic} \index{nonanalytic function} There nevertheless exist one common group of functions which cannot be constructed as power series. These all have to do with the parts of complex numbers and have been introduced in this very section: the magnitude $\left|\cdot\right|$; the phase $\arg(\cdot)$; the conjugate $(\cdot)^{*}$; and the real and imaginary parts $\Re(\cdot)$ and $\Im(\cdot)$. These functions are not analytic and do not in general obey the conjugation property. Also not analytic are the Heaviside unit step $u(t)$ and the Dirac delta $\delta(t)$ (\S~\ref{integ:670}), used to model discontinuities explicitly. We shall have more to say about analytic functions in Ch.~\ref{taylor}. We shall have more to say about complex numbers in \S\S~\ref{trig:280}, \ref{drvtv:230.35}, and~\ref{drvtv:240}, and much more yet in Ch.~\ref{cexp}. derivations-0.53.20120414.orig/tex/prob.tex0000644000000000000000000022606211742566274016633 0ustar rootroot\chapter{Probability} \label{prob} \index{probability} [This chapter is still quite a rough draft.] \index{statistics} Of all mathematical fields of study, none may be so counterintuitive yet, paradoxically, so widely applied as that of probability, whether as \emph{probability} in the technical term's conventional, restricted meaning or as probability in its expanded or inverted guise as \emph{statistics.}% \footnote{ The nomenclature is slightly unfortunate. Were statistics called ``inferred probability'' or ``probabilistic estimation'' the name would suggest something like the right taxonomy. Actually, the nomenclature is fine once you know what it means, but on first encounter it provokes otherwise unnecessary footnotes like this one. Statistics (singular noun) the expanded mathematical discipline---as opposed to the statistics (plural noun) mean and standard deviation of \S~\ref{prob:070}---as such lies mostly beyond this book's scope, but the chapter will have at least a little to say about it in \S~\ref{prob:200}. } Man's mind, untrained, seems somehow to rebel against the concept. Sustained reflection on the concept however gradually reveals to the mind a fascinating mathematical landscape. \index{thumb} \index{uncertainty} As calculus is the mathematics of change, so \emph{probability} is the mathematics of uncertainty. 
If I tell you that my thumb is three inches long, I likely do not mean that it is exactly three inches. I mean that it is about three inches. Quantitatively, I might report the length as~$3.0 \pm 0.1$ inches, thus indicating not only the length but the degree of uncertainty in the length. Probability in its guise as \emph{statistics} is the mathematics which produces, analyzes and interprets such quantities. \index{American male} \index{height} More obviously statistical is a report that the average, say,~25-year-old American male is~$70 \pm 3$ inches tall, inferred from actual measurements of some number $N>1$ of~25-year-old American males. Deep mathematics underlie such a report, for the report implies among other things that a little over two-thirds---$(1/\sqrt{2\pi})\int_{-1}^{1}\exp(-\tau^2/2)\,d\tau \approx \mbox{0x0.AEC5}$, to be precise---of a typical, randomly chosen sample of~25-year-old American males ought to be found to have heights between~67 and~73 inches. \index{game of chance} \index{card} \index{deck of cards} Probability is also met in games of chance and in models of systems which---from the model's point of view---logically resemble games of chance, and in this setting probability is not statistics. The reason it is not is that its mathematics in this case is based not on observation but on a teleological assumption of some kind, often an assumption of symmetry such as that no face of a die or card from a deck ought to turn up more often than another. Entertaining so plausible an assumption, if you should draw three cards at random from a standard~52-card deck (let us use decimal notation rather than hexadecimal in this paragraph, since neither you nor I have ever heard of a~0x34-card deck), the deck comprising four cards each of thirteen ranks, then there would be some finite probability---which is $(3/51)(2/50) = 1/425$---that the three cards you draw would share the same rank (why?). If I should however shuffle the deck, draw three cards off the top, and look at the three cards without showing them to you, all before inviting you to draw three, then the probability that your three would share the same rank were again $1/425$ (why?). On the other hand, if before you drew I let you peek at my three hidden cards, and you saw that they were ace, queen and ten, this knowledge alone must slightly lower your estimation of the probability that your three would subsequently share the same rank as one another to $(40/49)(3/48)(2/47) + (9/49)(2/48)(1/47) \approx 1/428$ (again, why?). \index{trial} That the probability should be $1/425$ suggests that one would draw three of the same rank once in~425 tries. However, were I to shuffle~425 decks and you to draw three cards from each, then you \emph{might} draw three of the same rank from two, three or four decks, or from none at all, though very unlikely from twenty decks---so what does a probability of $1/425$ really mean? The answer is that it means something like this: were I to shuffle~425 \emph{million} decks then you would draw three of the same rank from very nearly~1.0 million decks---almost certainly not from as many as~1.1 million nor as few as~0.9 million. It means that the ratio of the number of three-of-the-same-rank events to the number of trials must converge exactly upon $1/425$ as the number of trials tends toward infinity. See also \S~\ref{drvtv:220}. Other than by this brief introduction, the book you are reading is not well placed to offer a gentle tutorial in probabilistic thought.% \footnote{ The late R.W. 
Hamming's fine book~\cite{Hamming} ably fills such a role. } What it does offer, in the form of the present chapter, is the discovery and derivation of the essential mathematical functions of probability theory (including in \S~\ref{prob:100} the derivation of one critical result undergraduate statistics textbooks usually state but, understandably, omit to prove), plus a brief investigation of these functions' principal properties and typical use. % ---------------------------------------------------------------------- \section{Definitions and basic concepts} \label{prob:050} \index{probability!definitions pertaining to} \index{probability} \index{probability density function} \index{density function, probability} \index{PDF (probability density function)} \index{distribution} \index{cumulative distribution function} \index{distribution function, cumulative} \index{CDF (cumulative distribution function)} \index{quantile} \index{random variable} \index{variable!random} A \emph{probability} is the chance that a trial of some kind will result in some specified event of interest. Conceptually, \[ P_{\mr{event}} \equiv \lim_{N\ra\infty} \frac{N_{\mr{event}}}{N}, \] where~$N$ and~$N_{\mr{event}}$ are the numbers respectively of trials and events. A \emph{probability density function} (PDF) or \emph{distribution} is a function $f(x)$ defined such that \bq{prob:050:10} \begin{split} P_{ba} &= \int_{a}^{b} f(x) \,dx, \\ 1 &= \int_{-\infty}^{\infty} f(x) \,dx, \\ 0 &\le f(x), \ \ \Im[f(x)] = 0, \end{split} \eq where the event of interest is that the \emph{random variable}~$x$ fall% \footnote{ This sentence and the rest of the section condense somewhat lengthy tracts of an introductory collegiate statistics text like~\cite{Walpole/Myers}\cite{Alder/Roessler}\cite{Lindgren}\cite{Rosenkrantz}, among others. If the sentence and section make little sense to you then so likely will the rest of the chapter, but any statistics text you might find conveniently at hand should fill the gap---which is less a mathematical gap than a conceptual one. Or, if defiant, you can stay here and work through the concepts on your own. } in the interval% \footnote{ We might as well have expressed the interval $a < x < b$ as $a \le x \le b$ or even as $a \le x < b$, except that such notational niceties would distract from the point the notation means to convey. The notation in this case is not really interested in the bounding points themselves. If \emph{we} are interested in the bounding points, as for example we would be if $f(x) = \delta(x)$ and $a=0$, then we can always write in the style of $P_{(0^{-})b}$, $P_{(0^{+})b}$, $P_{(a-\ep)(b+\ep)}$, $P_{(a+\ep)(b-\ep)}$ or the like. We can even be most explicit and write in the style of $P\{a \le x \le b\}$, often met in the literature. } $a < x < b$ and~$P_{ba}$ is the probability of this event. The corresponding \emph{cumulative distribution function} (CDF) is \bq{prob:CDF} F(x) \equiv \int_{-\infty}^{x} f(\tau) \,d\tau, \eq where \bq{prob:050:020} \begin{split} 0 &= F(-\infty), \\ 1 &= F(\infty), \\ P_{ba} &= F(b) - F(a). \end{split} \eq The \emph{quantile} $F^{-1}(\cdot)$ inverts the CDF $F(x)$ such that \bq{prob:quantile} F^{-1}[F(x)] = x, \eq generally calculable by a Newton-Raphson iteration~(\ref{drvtv:NR}) if by no other means.
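To make the definitions concrete, one can try them on a simple example (the particular density chosen here serves illustration only): the distribution
\[
f(x) = \begin{cases} \exp(-x), & 0 \le x; \\ 0, & x < 0; \end{cases}
\]
for which, per~(\ref{prob:CDF}), $F(x) = 1 - \exp(-x)$ over $0 \le x$ and $F(x) = 0$ elsewhere. Solving $u = F(x)$ for~$x$ gives the quantile in closed form, $F^{-1}(u) = -\ln(1-u)$ (writing~$\ln$ for the natural logarithm), so no Newton-Raphson iteration happens to be needed in so simple a case; and one can check against~(\ref{prob:050:020}) that $F(-\infty)=0$, that $F(\infty)=1$ and that $P_{ba} = \exp(-a) - \exp(-b)$ when $0 \le a < b$.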
\index{probability!that both of two, independent events will occur} \index{probability density function!of a sum of random variables} \index{convolution} It is easy enough to see that the product \bq{prob:050:30} P = P_1P_2 \eq of two probabilities composes the single probability that not just one but both of two, independent events will occur. Harder to see, but just as important, is that the convolution \bq{prob:050:40} f(x) = f_1(x) \ast f_2(x) \eq of two probability density functions composes the single probability density function of the sum of two random variables $x=x_1+x_2$, where, per Table~\ref{fouri:110:tbl41}, \[ f_2(x) \ast f_1(x) = f_1(x) \ast f_2(x) \equiv \int_{-\infty}^{\infty} f_1\left(\frac{x}{2}-\tau\right) f_2\left(\frac{x}{2}+\tau\right) \,d\tau. \] That is, if you think about it in a certain way, the probability that $a < x_1 + x_2 < b$ cannot but be \bqb P_{ba} &=& \lim_{\ep \ra 0^{+}} \sum_{k=-\infty}^{\infty} \bigg\{ \bigg[ \int_{(k-1/2)\ep}^{(k+1/2)\ep} f_1(x) \,dx \bigg] \bigg[ \int_{a-k\ep}^{b-k\ep} f_2(x) \,dx \bigg] \bigg\} \\&=& \lim_{\ep \ra 0^{+}} \sum_{k=-\infty}^{\infty} \bigg\{ \bigg[ \ep f_1(k\ep) \bigg] \bigg[ \int_{a}^{b} f_2(x-k\ep) \,dx \bigg] \bigg\} \\&=& \int_{-\infty}^{\infty} f_1(\tau) \bigg[ \int_{a}^{b} f_2(x-\tau) \,dx \bigg] \,d\tau \\&=& \int_{a}^{b} \left[ \int_{-\infty}^{\infty} f_1(\tau) f_2(x-\tau) \,d\tau \right] \,dx \\&=& \int_{a}^{b} \left[ \int_{-\infty}^{\infty} f_1\left(\frac x2 + \tau\right) f_2\left(\frac x2 - \tau\right) \,d\tau \right] \,dx \\&=& \int_{a}^{b} \left[ \int_{-\infty}^{\infty} f_1\left(\frac x2 - \tau\right) f_2\left(\frac x2 + \tau\right) \,d\tau \right] \,dx, \eqb which in consideration of~(\ref{prob:050:10}) implies~(\ref{prob:050:40}). % ---------------------------------------------------------------------- \section{The statistics of a distribution} \label{prob:070} \index{statistic} \index{mean} \index{standard deviation} \index{$\mu$} \index{$\sigma$} \index{expected value} \index{$\langle\cdot\rangle$} A probability density function $f(x)$ describes a distribution whose \emph{mean}~$\mu$ and \emph{standard deviation}~$\sigma$ are defined such that \bq{prob:stat} \begin{split} \mu \equiv \langle x \rangle &= \int_{-\infty}^{\infty} f(x) x \,dx, \\ \sigma^2 \equiv \left\langle ( x - \langle x \rangle )^2 \right\rangle &= \int_{-\infty}^{\infty} f(x) (x-\mu)^2 \,dx; \end{split} \eq where~$\langle\cdot\rangle$ indicates the \emph{expected value} of the quantity enclosed, defined as the first line of~(\ref{prob:stat}) suggests. The mean~$\mu$ is just the distribution's average, about which a random variable should center. The standard deviation~$\sigma$ measures a random variable's typical excursion from the mean. The mean and standard deviation are \emph{statistics} of the distribution.% \footnote{ Other statistics than the mean and standard deviation are possible, but these two are the most important ones and are the two this book treats. } When the chapter's introduction proposed that the average~25-year-old American male were~$70 \pm 3$ inches tall, it was saying that his height could quantitatively be modeled as a random variable drawn from a distribution whose statistics are $\mu = 70$ inches and $\sigma = 3$ inches. % ---------------------------------------------------------------------- \section{The sum of random variables} \label{prob:074} \index{random variable!sum of} The statistics of the sum of two random variables $x=x_1+x_2$ are of interest. 
For the mean, substituting~(\ref{prob:050:40}) into the first line of~(\ref{prob:stat}), \bqb \mu &=& \int_{-\infty}^{\infty} [f_1(x) \ast f_2(x)] x \,dx \\&=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\left(\frac x2-\tau\right) f_2\left(\frac x2+\tau\right) \,d\tau\,x\,dx \\&=& \int_{-\infty}^{\infty} f_2(\tau) \int_{-\infty}^{\infty} f_1(x-\tau) x \,dx\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \int_{-\infty}^{\infty} f_1(x)(x+\tau) \,dx\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \left[ \int_{-\infty}^{\infty} f_1(x)x \,dx +\tau\int_{-\infty}^{\infty} f_1(x) \,dx \right]\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) [\mu_1 + \tau] \,d\tau \\&=& \mu_1\int_{-\infty}^{\infty} f_2(\tau) \,d\tau +\int_{-\infty}^{\infty} f_2(\tau) \tau \,d\tau. \eqb That is, \bq{prob:070:20} \mu = \mu_1 + \mu_2, \eq which is no surprise, but at least it is nice to know that our mathematics is working as it should. The standard deviation of the sum of two random variables is such that, substituting~(\ref{prob:050:40}) into the second line of~(\ref{prob:stat}), \bqb \sigma^2 &=& \int_{-\infty}^{\infty} [f_1(x) \ast f_2(x)] (x-\mu)^2 \,dx \\&=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\left(\frac x2-\tau\right) f_2\left(\frac x2+\tau\right) \,d\tau\,(x-\mu)^2\,dx. \eqb Applying~(\ref{prob:070:20}), \bqb \sigma^2 &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\left(\frac x2-\tau\right) f_2\left(\frac x2+\tau\right) \,d\tau\,(x-\mu_1-\mu_2)^2\,dx \\&=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\left(\frac{x+\mu_1}2-\tau\right) f_2\left(\frac{x+\mu_1}2+\tau\right) \,d\tau\,(x-\mu_2)^2\,dx \\&=& \int_{-\infty}^{\infty} f_2(\tau) \int_{-\infty}^{\infty} f_1(x+\mu_1-\tau) (x-\mu_2)^2 \,dx\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \int_{-\infty}^{\infty} f_1(x) [(x-\mu_1)+(\tau-\mu_2)]^2 \,dx\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \bigg\{ \int_{-\infty}^{\infty} f_1(x) (x-\mu_1)^2 \,dx \\&&\ \ \ \ \ \ \ \ \mbox{} +2(\tau-\mu_2)\int_{-\infty}^{\infty} f_1(x) (x-\mu_1) \,dx \\&&\ \ \ \ \ \ \ \ \mbox{} +(\tau-\mu_2)^2\int_{-\infty}^{\infty} f_1(x) \,dx \bigg\}\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \bigg\{ \int_{-\infty}^{\infty} f_1(x) (x-\mu_1)^2 \,dx \\&&\ \ \ \ \ \ \ \ \mbox{} +2(\tau-\mu_2)\int_{-\infty}^{\infty} f_1(x) x \,dx \\&&\ \ \ \ \ \ \ \ \mbox{} +(\tau-\mu_2)(\tau-\mu_2-2\mu_1)\int_{-\infty}^{\infty} f_1(x) \,dx \bigg\}\,d\tau. \eqb Applying~(\ref{prob:stat}) and~(\ref{prob:050:10}), \bqb \sigma^2 &=& \int_{-\infty}^{\infty} f_2(\tau) \left\{ \sigma_1^2 + 2(\tau-\mu_2)\mu_1 +(\tau-\mu_2)(\tau-\mu_2-2\mu_1) \right\}\,d\tau \\&=& \int_{-\infty}^{\infty} f_2(\tau) \left\{ \sigma_1^2 + (\tau-\mu_2)^2 \right\}\,d\tau \\&=& \sigma_1^2 \int_{-\infty}^{\infty} f_2(\tau) \,d\tau +\int_{-\infty}^{\infty} f_2(\tau) (\tau-\mu_2)^2 \,d\tau. \eqb Applying~(\ref{prob:stat}) and~(\ref{prob:050:10}) again, \bq{prob:070:30} \sigma^2 = \sigma_1^2 + \sigma_2^2. \eq If this is right---as indeed it is---then the act of adding random variables together not only adds the means of the variables' respective distributions according to~(\ref{prob:070:20}) but also, according to~(\ref{prob:070:30}), adds the squares of the standard deviations. 
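For instance (with figures chosen arbitrarily, only to illustrate), if two independent random variables come from distributions whose standard deviations are $\sigma_1 = 3$ and $\sigma_2 = 4$, then per~(\ref{prob:070:30}) the standard deviation of their sum is not $3+4=7$ but rather
\[
\sigma = \sqrt{\sigma_1^2 + \sigma_2^2} = \sqrt{\mbox{0x19}} = 5.
\]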
It follows directly that, if~$N$ independent instances $x_1, x_2, \ldots, x_N$ of a random variable are drawn from the same distribution $f_o(x_k)$, the distribution's statistics being~$\mu_o$ and~$\sigma_o$, then the statistics of their sum $x = \sum_{k=1}^N x_k = x_1 + x_2 + \cdots + x_N$ are \bq{prob:070:40} \begin{split} \mu &= N\mu_o, \\ \sigma &= \left(\sqrt N\right)\sigma_o. \end{split} \eq % ---------------------------------------------------------------------- \section{The transformation of a random variable} \label{prob:080} \index{random variable!transformation of} \index{transformation!of a random variable} \index{distribution!conversion between two} \index{inverse!of a function} \index{function!inverse of} \index{suaveness} If~$x_o$ is a random variable obeying the distribution $f_o(x_o)$ and $g(\cdot)$ is some invertible function whose inverse per~(\ref{alggeo:250:30}) is styled $g^{-1}(\cdot)$, then \[ x \equiv g(x_o) \] is itself a random variable obeying the distribution \bq{prob:080:10} f(x) = \left. \frac{f_o(x_o)}{\left|dg/dx_o\right|} \right|_{x_o = g^{-1}(x)}. \eq Another, suaver way to write the same thing is \bq{prob:080:11} f(x) \left|dx\right| = f_o(x_o) \left|dx_o\right|. \eq Either way, this is almost obvious if seen from just the right perspective, but can in any case be supported symbolically by \[ \int_{a}^{b} f_o(x_o) \,dx_o = \left| \int_{g(a)}^{g(b)} f_o(x_o) \frac{dx_o}{dx} \,dx \right| = \int_{g(a)}^{g(b)} f_o(x_o) \left|\frac{dx_o}{dg}\right| \,dx. \] \index{random variable!scaling of} \index{scaling} \index{probability density function!flattening of} \index{flattening} One of the most frequently useful transformations is the simple \[ x \equiv g(x_o) \equiv \alpha x_o, \ \ \mbox{$\Im(\alpha) = 0$, $\Re(\alpha) > 0$.} \] For this, evidently $dg/dx_o = \alpha$, so according to~(\ref{prob:080:10}) or~(\ref{prob:080:11}) \bq{prob:080:49} f(x) = \frac{1}{\left|\alpha\right|} f_o\left(\frac{x}{\alpha}\right). \eq If $\mu_o=0$ and $\sigma_o=1$, then $\mu=0$ and, in train of~(\ref{prob:stat}), \[ \sigma^2 = \int_{-\infty}^{\infty} f(x) x^2 \,dx = \int_{-\infty}^{\infty} f_o(x_o) (\alpha x_o)^2 \,dx_o = \alpha^2; \] whereby $\sigma=\alpha$ and we can rewrite the transformed PDF as \bq{prob:080:50} f(x) = \frac{1}{\sigma} f_o\left(\frac{x}{\sigma}\right)\ % \mbox{and $\mu = 0$, if $\mu_o = 0$ and $\sigma_o = 1$.} \eq Assuming null mean,~(\ref{prob:080:50}) states that the act of scaling a random variable flattens out the variable's distribution and scales its standard deviation, all by the same factor---which, naturally, is what one would expect it to do. % ---------------------------------------------------------------------- \section{The normal distribution} \label{prob:100} \index{normal distribution} \index{distribution!normal} \index{$\Omega$ as the Gaussian pulse} Combining the ideas of \S\S~\ref{prob:074} and~\ref{prob:080} leads us now to ask whether a distribution does not exist for which, when independent random variables drawn from it are added together, \emph{the sum obeys the same distribution,} only the standard deviations differing. 
More precisely, we should like to identify a distribution \[ \mbox{$f_o(x_o)$: $\mu_o = 0$, $\sigma_o = 1$;} \] for which, if~$x_1$ and~$x_2$ are random variables drawn respectively from the distributions \[ \begin{split} f_1(x_1) &= \frac{1}{\sigma_1} f_o\left(\frac{x_1}{\sigma_1}\right), \\ f_2(x_2) &= \frac{1}{\sigma_2} f_o\left(\frac{x_2}{\sigma_2}\right), \end{split} \] as~(\ref{prob:080:50}) suggests, then \[ x = x_1 + x_2 \] by construction is a random variable drawn from the distribution \[ f(x) = \frac{1}{\sigma} f_o\left(\frac{x}{\sigma}\right), \] where per~(\ref{prob:070:30}), \[ \sigma^2 = \sigma_1^2 + \sigma_2^2. \] There are several distributions one might try, but eventually the Gaussian pulse $\Omega(x_o)$ of \S\S~\ref{fours:095} and~\ref{fouri:130}, \bq{prob:normdist} \Omega(x) = \frac{\exp\left(-x^2/2\right)}{\sqrt{2\pi}}, \eq recommends itself. This works. The distribution $f_o(x_o) = \Omega(x_o)$ meets our criterion. To prove that the distribution $f_o(x_o) = \Omega(x_o)$ meets our criterion we shall first have to show that it is indeed a distribution according to~(\ref{prob:050:10}). Especially, we shall have to demonstrate that \[ \int_{-\infty}^{\infty} \Omega(x_o) \,dx_o = 1. \] Fortunately, as it happens, we have already demonstrated this fact as % bad break \linebreak (\ref{fouri:130:60}); so, since $\Omega(x_o)$ evidently meets the other demands of~(\ref{prob:050:10}), it apparently is a proper distribution. That $\mu_o = 0$ for $\Omega(x_o)$ is obvious by symmetry. That $\sigma_o = 1$ is shown by \bqb \int_{-\infty}^{\infty} \Omega(x_o) x_o^2 \,dx_o &=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{x_o^2}{2}\right) x_o^2 \,dx_o \\&=& -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x_o \,d\left[\exp\left(-\frac{x_o^2}{2}\right)\right] \\&=& -\left.\frac{x_o \exp\left(-x_o^2/2\right)}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} \\&&\ \ \ \ \mbox{} +\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left(-\frac{x_o^2}{2}\right) \,dx_o, \\&=& 0 + \int_{-\infty}^{\infty} \Omega(x_o) \,dx_o, \eqb from which again according to~(\ref{fouri:130:60}) \bq{prob:100:10} \int_{-\infty}^{\infty} \Omega(x_o) x_o^2 \,dx_o = 1 \eq as was to be shown. Now having justified the assertions that $\Omega(x_o)$ is a proper distribution and that its statistics are $\mu_o = 0$ and $\sigma_o = 1$, all that remains to be proved per~(\ref{prob:050:40}) is that \bq{prob:100:20} \begin{split} \left[\frac{1}{\sigma_1}\Omega\left(\frac{x_o}{\sigma_1}\right)\right] \ast \left[\frac{1}{\sigma_2}\Omega\left(\frac{x_o}{\sigma_2}\right)\right] &= \frac{1}{\sigma}\Omega\left(\frac{x_o}{\sigma}\right), \\ \sigma_1^2 + \sigma_2^2 &= \sigma^2, \end{split} \eq which is to prove that the sum of Gaussian random variables is itself Gaussian. We will prove it in the Fourier domain of Ch.~\ref{fouri} as follows. 
According to Tables~\ref{fouri:110:tbl20} and~\ref{fouri:120:tbl20}, and to~(\ref{prob:normdist}), \bqb \lefteqn{ \left[\frac{1}{\sigma_1}\Omega\left(\frac{x_o}{\sigma_1}\right)\right] \ast \left[\frac{1}{\sigma_2}\Omega\left(\frac{x_o}{\sigma_2}\right)\right] } &&\\&=& \mathcal{F}^{-1}\left\{ \left(\sqrt{2\pi}\right) \mathcal F \left[\frac{1}{\sigma_1}\Omega\left(\frac{x_o}{\sigma_1}\right)\right] \mathcal F \left[\frac{1}{\sigma_2}\Omega\left(\frac{x_o}{\sigma_2}\right)\right] \right\} \\&=& \mathcal{F}^{-1}\left\{ \left(\sqrt{2\pi}\right) \Omega(\sigma_1 x_o) \Omega(\sigma_2 x_o) \right\} \\&=& \mathcal{F}^{-1}\left\{ \frac{\exp\left[-\sigma_1^2 x_o^2/2\right] \exp\left[-\sigma_2^2 x_o^2/2\right]}{\sqrt{2\pi}} \right\} \\&=& \mathcal{F}^{-1}\left\{ \frac{\exp\left[-\left(\sigma_1^2+\sigma_2^2\right) x_o^2/2\right]} {\sqrt{2\pi}} \right\} \\&=& \mathcal{F}^{-1}\left\{ \Omega\left[\left(\sqrt{\sigma_1^2+\sigma_2^2}\right)x_o\right] \right\} \\&=& \frac{1}{\sqrt{\sigma_1^2+\sigma_2^2}} \Omega\left(\frac{x_o}{\sqrt{\sigma_1^2+\sigma_2^2}}\right), \eqb the last line of which is~(\ref{prob:100:20}) in other notation, thus completing the proof. \index{Gauss, Carl Friedrich (1777--1855)} \index{Gaussian pulse} \index{pulse, Gaussian} \index{normal distribution} \index{distribution!normal} \index{Gaussian distribution} \index{distribution!Gaussian} \index{bell curve} In the Fourier context of Ch.~\ref{fouri} one usually names $\Omega(\cdot)$ the \emph{Gaussian pulse,} as we have seen. The function $\Omega(\cdot)$ turns out to be even more prominent in probability theory than in Fourier theory, however, and in a probabilistic context it usually goes by the name of the \emph{normal distribution.} This is what we will call $\Omega(\cdot)$ through the rest of the present chapter. Alternate conventional names include those of the \emph{Gaussian distribution} and the \emph{bell curve} (the Greek capital~$\Omega$ vaguely, accidentally resembles a bell, as does the distribution's plot, and we will not be too proud to take advantage of the accident, so that is how you can remember it if you like). By whichever name, Fig.~\ref{prob:normdist-fig} plots the normal distribution $\Omega(\cdot)$ and its cumulative distribution function~(\ref{prob:CDF}). 
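Incidentally, though this chapter mostly handles the normal distribution in its normalized form $\Omega(x)$, whose statistics are $\mu_o = 0$ and $\sigma_o = 1$, nothing prevents one from shifting and scaling it. Setting $x \equiv g(x_o) \equiv \sigma x_o + \mu$ in~(\ref{prob:080:10}), for which $dg/dx_o = \sigma$ and $g^{-1}(x) = (x-\mu)/\sigma$, yields the normal distribution of arbitrary statistics~$\mu$ and~$\sigma$, \[ f(x) = \frac{1}{\sigma}\Omega\left(\frac{x-\mu}{\sigma}\right), \] a form \S~\ref{prob:200} will use.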
\begin{figure} \caption[The normal distribution and its CDF.]{% The normal distribution $\Omega(x) \equiv (1/\sqrt{2\pi})\exp(-x^2/2)$ and its cumulative distribution function $F_\Omega(x) = \int_{-\infty}^{x} \Omega(\tau) \,d\tau$.% } \label{prob:normdist-fig} \bc \settowidth\tla{\small$1$} \nc\xxxab{5.4} \nc\xxya{-0.3} \nc\xxyb{1.7} \nc\xxyc{2.3} \nc\fxa{-6.0} \nc\fxb{6.0} \nc\fya{-4.6} \nc\fyb{2.6} \nc\xxticlength{0.15} \nc\xxscale{1.7} \nc\xxfigoffset{3.8} \nc\xxaxes{% {% \psset{linewidth=0.5pt}% \psline(-\xxxab,0)(\xxxab,0)% \psline(0,\xxya)(0,\xxyb)% \uput[r](\xxxab,0){$x$}% \uput[u](0,\xxyb){$\Omega(x)$}% }% } \nc\xxaxess{% {% \psset{linewidth=0.5pt}% \psline(-\xxxab,0)(\xxxab,0)% \psline(0,\xxya)(0,\xxyc)% \psline[linestyle=dashed](-\xxxab,\xxscale)(\xxxab,\xxscale)% \uput[r](\xxxab,0){$x$}% \uput[u](0,\xxyc){$F_\Omega(x)$}% }% } \begin{pspicture}(\fxa,\fya)(\fxb,\fyb) %\psframe[linewidth=0.5pt,dimen=outer](\fxa,\fya)(\fxb,\fyb) %\localscalebox{1.0}{1.0} \small \rput(0,0){ \xxaxes \psplot[linewidth=2.0pt,plotpoints=300]{-5.10}{5.10}{ /twopi 6.28318530717959 def /e 2.71828182845905 def /scale 1.7 def e x scale div dup mul 2.0 div neg exp twopi sqrt div scale mul } \uput[ur](0,0.67820){$1/\sqrt{2\pi}$} \rput(\xxscale,0){\psline[linewidth=0.5pt](0,0)(0,0.41135)} \rput(-\xxscale,0){\psline[linewidth=0.5pt](0,0)(0,0.41135)} \uput[d](\xxscale,0){$1$} \uput[d](-\xxscale,0){\makebox[\tla][r]{$-1$}} } \rput(0,-\xxfigoffset){ \xxaxess \psplot[linewidth=2.0pt,plotpoints=300]{-5.10}{5.10}{ /twopi 6.28318530717959 def /e 2.71828182845905 def /scale 1.7 def x scale div dup dup mul neg exch dup 1 1 15 {2.0 mul 4 1 roll 2 index mul 3 index div dup 4 index 1 add div 3 2 roll add exch 4 3 roll pop} for pop exch pop twopi sqrt div 0.5 add scale mul } \uput[ur](0,\xxscale){$1$} \rput(\xxscale,0){\psline[linestyle=dashed,linewidth=0.5pt](0,0)(0,\xxscale)} \rput(-\xxscale,0){\psline[linestyle=dashed,linewidth=0.5pt](0,0)(0,\xxscale)} \uput[d](\xxscale,0){$1$} \uput[d](-\xxscale,0){\makebox[\tla][r]{$-1$}} } \end{pspicture} \ec \end{figure}% \index{normal distribution!cumulative distribution function of} \index{cumulative distribution function!of the normal distribution} \index{cumulative distribution function!numerical calculation of} Regarding the cumulative normal distribution function, one way to calculate it numerically is to integrate the normal distribution's Taylor series term by term. As it happens, \S~\ref{inttx:450} has worked a very similar integral as an example, so this section will not repeat the details, but the result is \bqa F_\Omega(x_o) = \int_{-\infty}^{x_o} \Omega(\tau) \,d\tau &=& \frac 12 + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-)^kx_o^{2k+1}}{(2k+1)2^k k!} \xn\\&=& \frac 12 + \frac{x_o}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{1}{2k+1} \prod_{j=1}^k \frac{-x_o^2}{2j}. \label{prob:100:40} \ \ \ \ % \eqa Unfortunately, this Taylor series---though always theoretically correct---is practical only for small and moderate $\left|x_o\right| \lesssim 1$. For $\left|x_o\right| \gg 1$, see \S~\ref{prob:750}. \index{distribution!default} The normal distribution tends to be the default distribution in applied mathematics. When one lacks a reason to do otherwise, one models a random quantity as a normally distributed random variable. See \S~\ref{prob:300} for the reason. 
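Though the book's interest lies in the mathematics rather than in the programming, the series~(\ref{prob:100:40}) is simple enough to set down in code. The following is a minimal sketch only, not part of any program the book otherwise relies upon, and its names are merely illustrative; it accumulates the series' product form term by term until the terms fall below a tolerance, which, as the narrative has said, serves for small and moderate~$\left|x_o\right|$.
\begin{verbatim}
#include <cmath>
#include <cstdio>

const double SQRT2PI = 2.5066282746310002;  // sqrt(2 pi)

// The normal distribution Omega(x) of eq. (prob:normdist).
double Omega( double x ) { return std::exp( -x*x/2.0 )/SQRT2PI; }

// The normal CDF by the Taylor series (prob:100:40):
//   F(x) = 1/2 + (x/sqrt(2 pi)) sum_k [1/(2k+1)] prod_{j=1..k} (-x^2/(2j)).
double F_Omega_taylor( double x )
{
    double product = 1.0;  // prod_{j=1..k} (-x^2/(2j)); empty product for k = 0
    double sum     = 0.0;
    for ( int k = 0; ; ++k ) {
        double term = product/( 2*k + 1 );
        sum += term;
        if ( std::fabs( term ) < 1e-16 ) break;
        product *= -x*x/( 2*( k + 1 ) );
    }
    return 0.5 + x*sum/SQRT2PI;
}

int main()
{
    std::printf( "%.6f\n", F_Omega_taylor( 1.0 ) );  // prints 0.841345
    return 0;
}
\end{verbatim}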
% ---------------------------------------------------------------------- \section{Inference of statistics} \label{prob:200} \index{inference of statistics} \index{statistic!inference of} \index{estimation of statistics} \index{sample} Suppose that several, concrete instances of a random variable---collectively called a \emph{sample}---were drawn from a distribution $f(x)$ and presented to you, but that you were not told the shape of $f(x)$. Could you infer the shape? The answer is that you could infer the shape with passable accuracy provided that the number~$N$ of samples were large. Typically however one will be prepared to make some assumption about the shape such as that \bq{prob:200:20} f(x) = \frac{1}{\sigma}\Omega\left(\frac{x-\mu}{\sigma}\right), \eq which is to assume that~$x$ were normally distributed with unknown statistics~$\mu$ and~$\sigma$. The problem then becomes to infer the statistics from the sample. \index{mean!inference of} In the absence of additional information, one can hardly suppose much about the mean other than that \bq{prob:200:mean} \mu \approx \frac{1}{N}\sum_k x_k. \eq One infers the mean to be the average of the instances one has observed. One might think to infer the standard deviation in much the same way except that to calculate the standard deviation directly according to~(\ref{prob:stat}) would implicate our imperfect estimate~(\ref{prob:200:mean}) of the mean. If we wish to estimate the standard deviation accurately from the sample then we shall have to proceed more carefully than that. \index{standard deviation!inference of} It will simplify the standard-deviational analysis to consider the shifted random variable \[ u_k = x_k - \mu_{\mr{true}} \] instead of~$x_k$ directly, where~$\mu_{\mr{true}}$ is not the estimated mean of~(\ref{prob:200:mean}) but the true, unknown mean of the hidden distribution $f(x)$. The distribution of~$u$ then is $f(u+\mu_{\mr{true}})$, a distribution which by construction has zero mean. (Naturally, we do not know---we shall never know---the actual value of~$\mu_{\mr{true}}$, but this does not prevent us from representing~$\mu_{\mr{true}}$ symbolically during analysis.) We shall presently find helpful the identities \[ \begin{split} \left\langle \sum_k u_k^2 \right\rangle &= N\sigma^2, \\ \left\langle {\sum_k}^2 u_k \right\rangle &= N\sigma^2, \end{split} \] the first of which is merely a statement of the leftward part of (\ref{prob:stat})'s second line with respect to the unknown distribution $f(u+\mu_{\mr{true}})$ whose mean $\langle u \rangle$ is null by construction, the second of which considers the sum $\sum_k u_k$ as a random variable whose mean again is null but whose standard deviation~$\sigma_\Sigma$ according to~(\ref{prob:070:30}) is such that $\sigma_\Sigma^2 = N\sigma_{\mr{true}}^2$. With the foregoing definition and identities in hand, let us construct from the available sample the quantity \bqb \left(\sigma'\right)^2 \equiv \frac 1 N \sum_k\left( x_k - \frac 1 N \sum_k x_k \right)^2, \eqb which would tend to approach~$\sigma_{\mr{true}}^2$ as~$N$ grew arbitrarily large but which, unlike~$\sigma_{\mr{true}}$, is a quantity we can actually compute for any $N > 1$.
By successive steps, \bqb \left(\sigma'\right)^2 &=& \frac 1 N \sum_k\left( [u_k+\mu_{\mr{true}}] - \frac 1 N \sum_k [u_k+\mu_{\mr{true}}] \right)^2 \\&=& \frac 1 N \sum_k\left( u_k - \frac 1 N \sum_k u_k \right)^2 \\&=& \frac 1 N \sum_k\left( u_k^2 - \frac 2 N u_k \sum_k u_k + \frac 1{N^2} {\sum_k}^2 u_k \right) \\&=& \frac 1 N \sum_k u_k^2 -\frac 2{N^2} {\sum_k}^2 u_k +\frac 1{N^2} {\sum_k}^2 u_k \\&=& \frac 1 N \sum_k u_k^2 -\frac 1{N^2} {\sum_k}^2 u_k, \eqb the expected value of which is \[ \left\langle \left(\sigma'\right)^2 \right\rangle = \frac 1 N \left\langle \sum_k u_k^2 \right\rangle -\frac 1{N^2} \left\langle {\sum_k}^2 u_k \right\rangle. \] Applying the identities of the last paragraph, \[ \left\langle \left(\sigma'\right)^2 \right\rangle = \sigma^2 - \frac{\sigma^2}{N} = \frac{N-1}{N} \sigma^2, \] from which \[ \sigma^2 = \frac{N}{N-1} \left\langle \left(\sigma'\right)^2 \right\rangle. \] Because the expected value $\langle(\sigma')^2\rangle$ is not a quantity whose value we know, we can only suppose that $(\sigma')^2 \approx \langle(\sigma')^2\rangle$, whereby \[ \sigma^2 \approx \frac{N}{N-1} \left(\sigma'\right)^2, \] and, substituting the definition of $(\sigma')^2$ into the last equation, \bq{prob:200:stdev} \sigma^2 \approx \frac 1{N-1} \sum_k\left( x_k - \frac 1 N \sum_k x_k \right)^2. \eq \index{statistic!sample} \index{sample statistic} The estimates~(\ref{prob:200:mean}) and~(\ref{prob:200:stdev}) are known as \emph{sample statistics.} They are the statistics one imputes to an unknown distribution based on the incomplete information of $N > 1$ samples. \index{independence} \index{dependence} \index{correlation} \index{correlation coefficient} \index{Pfufnik, Gorbag} This chapter generally assumes independent random variables when it speaks of probability. In statistical work however one must sometimes handle correlated quantities like the height and weight of a~25-year-old American male---for, obviously, if I point to some 25-year-old over there and say, ``That's Pfufnik. The average is~160 pounds, but he weighs~250!'' then your estimate of his probable height will change, because height and weight are not independent but correlated. The conventional statistical measure% \footnote{\cite[\S~9.9]{Walpole/Myers}\cite[eqns.~12-6 and~12-14]{Alder/Roessler}} of the correlation of a series $(x_k,y_k)$ of pairs of data, such as $([\mbox{height}]_k,[\mbox{weight}]_k)$ of the example, is the \emph{correlation coefficient} \bq{prob:070:60} r \equiv \frac{ \sum_k (x_k-\mu_x) (y_k-\mu_y) }{ \sqrt{\sum_k (x_k-\mu_x)^2 \sum_k (y_k-\mu_y)^2} }, \eq a unitless quantity whose value is~$\pm 1$, indicating perfect correlation, when $y_k = x_k$ or even when $y_k = a_1 x_k + a_0$; but whose value should be near zero when the paired data are unrelated. See Fig.~\ref{mtxinv:320:fig1} for another example of the kind of paired data in whose correlation one might be interested: in the figure, the correlation would be~$+1$ if the points all fell exactly on the line. (Beware that the conventional correlation coefficient of eqn.~\ref{prob:070:60} can overstate the relationship between paired data when~$N$ is small. Consider for instance that $r=\pm 1$ always when $N=2$. The coefficient as given nevertheless is conventional.) If further elaborated, the mathematics of statistics rapidly grow much more complicated.
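Computing the two sample statistics themselves however remains simple. A minimal sketch follows (the function and variable names are merely illustrative, not drawn from any of the book's own sources); it implements~(\ref{prob:200:mean}) and~(\ref{prob:200:stdev}) directly.
\begin{verbatim}
#include <cmath>
#include <vector>

// Sample statistics per (prob:200:mean) and (prob:200:stdev).
// Requires N > 1 instances.
void sample_statistics( const std::vector<double> &x,
                        double &mu, double &sigma )
{
    const std::size_t N = x.size();
    mu = 0.0;
    for ( std::size_t k = 0; k < N; ++k ) mu += x[k];
    mu /= N;
    double ss = 0.0;
    for ( std::size_t k = 0; k < N; ++k )
        ss += ( x[k] - mu )*( x[k] - mu );
    sigma = std::sqrt( ss/( N - 1 ) );
}
\end{verbatim}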
The book will not pursue the matter further but will mention that the kinds of questions that arise tend to involve the statistics of the statistics themselves, treating the statistics as random variables. Such questions confound two, separate uncertainties: the uncertainty inherent by definition~(\ref{prob:050:10}) in a random variable even were the variable's distribution precisely known; and the uncertain knowledge of the distribution.% \footnote{ The subtle mathematical implications of this far exceed the scope of the present book but are developed to one degree or another in numerous collegiate statistics texts of which~\cite{Walpole/Myers}\cite{Alder/Roessler}\cite{Lindgren}\cite{Rosenkrantz} are representative examples. } Fortunately, if $N \gg 1$, one can usually tear the two uncertainties from one another without undue violence to accuracy, pretending that one knew the unknown~$\mu$ and~$\sigma$ to be exactly the values~(\ref{prob:200:mean}) and~(\ref{prob:200:stdev}) respectively calculate, supposing that the distribution were the normal~(\ref{prob:200:20}), and modeling on this basis. % ---------------------------------------------------------------------- \section{The random walk and its consequences} \label{prob:300} \index{random walk} \index{walk, random} This section brings overview, insight and practical perspective. It also analyzes the simple but often encountered statistics of a series of all-or-nothing-type attempts. \subsection{The random walk} \label{prob:300.10} \index{random walk} \index{walk, random} \index{Sands, Matthew} \index{Feynman, Richard~P. (1918-1988)} \index{lecture} \index{coin} Matthew Sands gave a famous lecture~\cite[\S\S~I:6]{Feynman}, on probability, on behalf of Richard~P. Feynman at Caltech in the fall of~1961. The lecture is a classic and is recommended to every reader of this chapter who can conveniently lay hands on a printed copy---recommended among other reasons because it lends needed context to the rather abstruse mathematics this chapter has presented to the present point. One section of the lecture begins, ``There is [an] interesting problem in which the idea of probability is required. It is the problem of the `random walk.' In its simplest version, we imagine a `game' in which a `player' starts at the point [$D=0$] and at each `move' is required to take a step \emph{either} forward (toward [$+D$]) \emph{or} backward (toward [$-D$]). The choice is to be made \emph{randomly,} determined, for example, by the toss of a coin. How shall we describe the resulting motion?'' \index{straying} \index{likely straying} Sands goes on to observe that, though one cannot guess whether the `player' will have gone forward or backward after~$N$ steps---and, indeed, that in the absence of other information one must expect $\langle D_N \rangle = 0$, zero net progress---``[one has] the feeling that as~$N$ increases, [the `player'] is likely to have strayed farther from the starting point.'' Sands is right, but if $\langle D_N \rangle$ is not a suitable measure of this ``likely straying,'' so to speak, then what would be? The measure $\langle\left| D_N \right|\rangle$ might recommend itself, but this being nonanalytic (\S\S~\ref{alggeo:225.3} and~\ref{taylor:320}) proves inconvenient in practice (you can try it if you like). The success of the least-squares technique of \S~\ref{mtxinv:320} however encourages us to try the measure $\langle D_N^2 \rangle$. 
The squared distance~$D_N^2$ is nonnegative in every instance and also is analytic, so its expected value $\langle D_N^2 \rangle$ proves a most convenient measure of ``likely straying.'' It is moreover a measure universally accepted among scientists and engineers, and it is the measure this book will adopt. Sands notes that, if the symbol $D_{N-1}$ represents the `player's' position after $N-1$ steps, his position after~$N$ steps must be $D_N = D_{N-1} \pm 1$. The expected value $\langle D_N \rangle = 0$ is uninteresting as we said, but the expected value $\langle D_N^2 \rangle$ is interesting. And what is this expected value? Sands finds two possibilities: either the `player' steps forward on his $N$th step, in which case \[ \big\langle D_N^2 \big\rangle = \big\langle (D_{N-1} + 1)^2 \big\rangle = \big\langle D_{N-1}^2 \big\rangle + 2\big\langle D_{N-1} \big\rangle + 1; \] or he steps backward on his $N$th step, in which case \[ \big\langle D_N^2 \big\rangle = \big\langle (D_{N-1} - 1)^2 \big\rangle = \big\langle D_{N-1}^2 \big\rangle - 2\big\langle D_{N-1} \big\rangle + 1. \] Since forward and backward are equally likely, the actual expected value must be the average \[ \left\langle D_N^2 \right\rangle = \left\langle D_{N-1}^2 \right\rangle + 1 \] of the two possibilities. Evidently, the expected value increases by~$1$ with each step. Thus by induction, since $\langle D_0^2 \rangle = 0$, \[ \left\langle D_N^2 \right\rangle = N. \] Observe that the PDF of a single step~$x_k$ is $f_o(x_o) = [\delta(x_o+1) + \delta(x_o-1)]/2$, where $\delta(\cdot)$ is the Dirac delta of Fig.~\ref{integ:670:fig-d}; and that the corresponding statistics are $\mu_o = 0$ and $\sigma_o = 1$. The PDF of~$D_N$ is more complicated (though not especially hard to calculate in view of \S~\ref{drvtv:220}), but its statistics are evidently $\mu_N = 0$ and $\sigma_N = \sqrt N$, agreeing with~(\ref{prob:070:40}). \subsection{Consequences} \label{prob:300.50} \index{random walk!consequences of} \index{walk, random!consequences of} \index{real-estate agent} \index{house} \index{sales} An important variation of the random walk comes with the distribution \bq{prob:300:50} f_o(x_o) = (1-p_o)\delta(x_o) + p_o\delta(x_o-1), \eq which describes or governs an act whose probability of success is~$p_o$. This distribution's statistics according to~(\ref{prob:stat}) are such that \bq{prob:300:55} \begin{split} \mu_o &= p_o, \\ \sigma_o^2 &= (1-p_o)p_o. \end{split} \eq As an example of the use, consider a real-estate agent who expects to sell one house per~10 times he shows a prospective buyer a house: $p_o=1/10=0.10$. The agent's expected result from a single showing, according to~(\ref{prob:300:55}), is to sell $\mu_o \pm \sigma_o = 0.10 \pm 0.30$ of a house. The agent's expected result from $N = 400$ showings, according to~(\ref{prob:070:40}), is to sell $\mu \pm \sigma = N\mu_o \pm \left(\sqrt N\right)\sigma_o = 40.0 \pm 6.0$ houses. Such a conclusion, of course, is valid only to the extent to which the model is valid---which in a real-estate agent's case may be \emph{not very}---but that nevertheless is how the mathematics of it work. \index{normal distribution!convergence toward} As the number~$N$ of attempts grows large one finds that the distribution $f(x)$ of the number of successes begins more and more to take on the bell-shape of Fig.~\ref{prob:normdist-fig}'s normal distribution. 
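One can watch the bell shape emerge numerically. The following simulation is a sketch offered only by way of illustration (the run count, the seed and the names are arbitrary); it tallies the agent's successes over many runs of $N = 400$ showings and prints a crude histogram, which indeed humps up around~$40$ with a spread of roughly~$6$.
\begin{verbatim}
#include <cstdio>
#include <random>

int main()
{
    const int    N = 400, runs = 100000;
    const double p_o = 0.10;
    std::mt19937 gen( 1 );
    std::bernoulli_distribution sale( p_o );
    int histogram[N + 1] = { 0 };
    for ( int r = 0; r < runs; ++r ) {
        int sold = 0;
        for ( int k = 0; k < N; ++k ) if ( sale( gen ) ) ++sold;
        ++histogram[sold];
    }
    for ( int s = 20; s <= 60; ++s )  // the interesting range only
        std::printf( "%2d %6d\n", s, histogram[s] );
    return 0;
}
\end{verbatim}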
Indeed, this makes sense, for one would expect the aforementioned real-estate agent to have a relatively high probability of selling~39, 40 or~41 houses but a low probability to sell~10 or~70; thus one would expect $f(x)$ to take on something rather like the bell-shape. If for~400 showings the distribution is $f(x)$ then, according to~(\ref{prob:050:40}), for $800=400+400$ showings the distribution must be $f(x) \ast f(x)$. Moreover, since the only known PDF which, when convolved with itself, does not change shape is the normal distribution of \S~\ref{prob:100}, one infers that the normal distribution is the PDF toward which the real-estate agent's distribution---and indeed most other distributions of sums of random variables---must converge% \footnote{ Admittedly, the argument, which supposes that all (or at least most) aggregate PDFs must tend toward some common shape as~$N$ grows large, is somewhat specious, or at least unrigorous---though on the other hand it is hard to imagine any plausible conclusion other than the correct one the argument reaches---but one can construct an alternate though tedious argument toward the normal distribution on the following basis. Counting permutations per \S~\ref{drvtv:220}, derive an exact expression for the probability of~$k$ successes in~$N$ tries, which is $P_k = \cmbl{N}{k}p^k(1-p)^{N-k}$. Considering also the probabilities of $k-1$ and $k+1$ successes in~$N$ tries, approximate the logarithmic derivative of~$P_k$ per~(\ref{drvtv:240.40:10}) as $(\pl P_k/\pl k)/P_k \approx (P_{k+1}-P_{k-1})/2P_k$ or, better---remembering that suitable arithmetical approximations are permissible in such work---as $(\pl P_k/\pl k)/P_k \approx (P_{k+1/2}-P_{k-1/2})/P_k$. Change $x \la (k - p_oN)/\sqrt{(1-p_o)p_oN}$. Discover a continuous, analytic function, which is $f(x) = C\exp(-ax^2)$, that for large~$N$ has a similar logarithmic derivative in the distribution's peak region. To render the arithmetic tractable one might try first the specific case of $p=1/2$ and make various arithmetical approximations as one goes, but to fill in the tedious details is left as an exercise to the interested (penitent?) reader. The author confesses that he prefers the specious argument of the narrative. } as $N \ra \infty$. \index{distribution!default} For such reasons, applications tend to approximate sums of several random variables as though the sums were normally distributed; and, moreover, tend to impute normal distributions to random variables whose true distributions are unnoticed, uninteresting or unknown. In the theory and application of probability, the normal distribution is the master distribution, the distribution of last resort, often the only distribution tried. The banal suggestion, ``When unsure, go normal!'' usually prospers in probabilistic work. % ---------------------------------------------------------------------- \section{Other distributions} \label{prob:400} Many distributions other than the normal one of Fig.~\ref{prob:normdist-fig} are possible. This section will name a few of the most prominent. 
\subsection{The uniform distribution} \label{prob:400.10} \index{uniform distribution} \index{distribution!uniform} \index{computer!pseudorandom-number generator} \index{pseudorandom number} The \emph{uniform distribution} can be defined in any of several forms, but the conventional form is \bq{prob:400:10} f(x) = \Pi\left(x-\frac 12\right) = \begin{cases} 1 &\mbox{if $0 \le x < 1$,} \\ 0 &\mbox{otherwise.} \end{cases} \eq where $\Pi(\cdot)$ is the square pulse of Fig.~\ref{fours:095:fig1}. Besides sometimes being useful in its own right, this is also the distribution a computer's pseudorandom-number generator obeys. One can extract normally distributed (\S~\ref{prob:100}) or Rayleigh-distributed (\S~\ref{prob:400.30}) random variables from it by the Box-Muller transformation of \S~\ref{prob:410}. \subsection{The exponential distribution} \label{prob:400.20} \index{exponential distribution} \index{distribution!exponential} \index{bearing} \index{mechanical bearing} \index{retail establishment} \index{customer} \index{failure of a mechanical part} The \emph{exponential distribution} is% \footnote{ Unlike the section's other subsections, this one explicitly includes the mean~$\mu$ in the expression~(\ref{prob:400:20}) of its distribution. The inclusion of~$\mu$ here is admittedly inconsistent. The reader who prefers to do so can mentally set $\mu = 1$ and read the section in that light. However, in typical applications the entire point of choosing the exponential distribution may be to specify~$\mu$, or to infer it. The exponential distribution is inherently ``$\mu$-focused,'' so to speak. The author prefers to leave the~$\mu$ in the expression for this reason. } \bq{prob:400:20} f(x) = \frac{1}{\mu}\exp\left(-\frac{x}{\mu}\right), \ \ x \ge 0, \eq whose mean is \[ \frac 1 \mu \int_0^\infty \exp\left( -\frac x \mu \right) x\,dx = \left. - \exp\left( -\frac x \mu \right) ( x + \mu ) \right|_0^\infty = \mu \] as advertised and whose standard deviation is such that \bqb \sigma^2 &=& \frac 1 \mu \int_0^\infty \exp\left( -\frac x \mu \right) (x-\mu)^2\,dx \\&=& \left. - \exp\left( -\frac x \mu \right) ( x^2 + \mu^2 ) \right|_0^\infty, \eqb (the integration by the method of unknown coefficients of \S~\ref{inttx:240}), which implies that \bq{prob:400:25} \sigma = \mu. \eq The exponential's CDF~(\ref{prob:CDF}) and quantile~(\ref{prob:quantile}) are evidently \bq{prob:400:28} \begin{split} F(x) &= 1 - \exp\left( -\frac{x}{\mu} \right), \\ F^{-1}(u) &= -\mu\ln(1-u). \end{split} \eq Among other effects, the exponential distribution models the delay until some imminent event like a mechanical bearing's failure or the arrival of a retail establishment's next customer. \subsection{The Rayleigh distribution} \label{prob:400.30} \index{Rayleigh distribution} \index{distribution!Rayleigh} \index{Rayleigh, John Strutt, 3rd baron (1842--1919)} \index{missile} The \emph{Rayleigh distribution} is a generalization of the normal distribution for position in a plane.
Let each of the~$x$ and~$y$ coordinates be drawn independently from a normal distribution of zero mean and unit standard deviation, such that \bqb dP &\equiv& \left[ \Omega(x) \,dx \right] \left[ \Omega(y) \,dy \right] \\&=& \frac{1}{2\pi} \exp\left( -\frac{x^2+y^2}{2} \right) \,dx\,dy \\&=& \frac{1}{2\pi} \exp\left( -\frac{\rho^2}{2} \right) \rho\,d\rho\,d\phi, \eqb whence \bqb P_{ba} &\equiv& \int_{\phi=-\pi}^\pi \int_{\rho=a}^b dP \\&=& \frac{1}{2\pi} \int_{-\pi}^\pi \int_a^b \exp\left( -\frac{\rho^2}{2} \right) \rho\,d\rho\,d\phi \\&=& \int_a^b \exp\left( -\frac{\rho^2}{2} \right) \rho\,d\rho, \eqb which implies the distribution \bq{prob:400:30} f(\rho) = \rho \exp\left( -\frac{\rho^2}{2} \right), \ \ \rho \ge 0. \eq This is the Rayleigh distribution. That it is a proper distribution according to~(\ref{prob:050:10}) is proved by evaluating the integral \bq{prob:400:32} \int_0^{\infty} f(\rho) \,d\rho = 1 \eq using the method of \S~\ref{fouri:130}. Rayleigh's CDF~(\ref{prob:CDF}) and quantile~(\ref{prob:quantile}) are evidently \bq{prob:400:34} \begin{split} F(\rho) &= 1 - \exp\left( -\frac{\rho^2}{2} \right), \\ F^{-1}(u) &= \sqrt{-2\ln(1-u)}. \end{split} \eq The Rayleigh distribution models among others the distance~$\rho$ by which a missile might miss its target. \index{azimuth} Incidentally, there is nothing in the mathematics to favor any particular value of~$\phi$ over another,~$\phi$ being the azimuth toward which the missile misses, for the integrand $\exp(-\rho^2/2) \rho \,d\rho\,d\phi$ above includes no~$\phi$; so, unlike~$\rho$, $\phi$ by symmetry will be uniformly distributed. \subsection{The Maxwell distribution} \label{prob:400.40} \index{Maxwell distribution} \index{distribution!Maxwell} \index{Maxwell, James Clerk (1831--1879)} \index{particle} \index{air} The \emph{Maxwell distribution} extends the Rayleigh from two to three dimensions. Maxwell's derivation closely resembles Rayleigh's, with the difference that Maxwell uses all three of~$x$, $y$ and~$z$ and then transforms to spherical rather than cylindrical coordinates. The distribution which results, the Maxwell distribution, is \bq{prob:400:40} f(r) = \frac{2r^2}{\sqrt{2\pi}} \exp\left( -\frac{r^2}{2} \right), \ \ r \ge 0, \eq which models, among others, the speed at which an air molecule might travel.% \footnote{\cite[eqn.~I:40.7]{Feynman}} %\footnote{ % The section has omitted several reasonably well-known distributions. % As the section ends it will pass these over, except one, without % special notice. The one is the \emph{chi-square} distribution % \cite[Ch.~13]{Adler}. The reason the section has omitted the % chi-square is that the principal application of the chi-square lies in % ``hypothesis testing,'' an application of statistical inference this % book does not treat. %} \subsection{The log-normal distribution} \label{prob:400.50} \index{log-normal distribution} \index{distribution!log-normal} In the \emph{log-normal distribution,} it is not~$x$ but \bq{prob:400:51} x_o \equiv \frac{\ln x}{\alpha} \eq that is normally distributed, a fairly common case. Setting $x = g(x_o) = \exp \alpha x_o$ and $f_o(x_o) = \Omega(x_o)$ in~(\ref{prob:080:10}), one can express the log-normal distribution in the form% \footnote{\cite[Ch.~5]{Papoulis}} \bq{prob:400:50} f(x) = \frac{1}{\alpha x}\Omega\left(\frac{\ln x}{\alpha}\right). 
\eq % ---------------------------------------------------------------------- \section{The Box-Muller transformation} \label{prob:410} \index{Box-Muller transformation} \index{transformation, Box-Muller} \index{Box, G.E.P. (1919--)} \index{Muller, Mervin~E.} \index{distribution!conversion between two} \index{quantile!use of to convert between distributions} The quantiles~(\ref{prob:400:28}) and~(\ref{prob:400:34}) imply easy conversions from the uniform distribution to the exponential and Rayleigh. Unfortunately, we lack a quantile formula for the normal distribution. However, we can still convert uniform to normal by way of Rayleigh as follows. Section~\ref{prob:400.30} has shown how Rayleigh gives the distance~$\rho$ by which a missile misses a target when each of~$x$ and~$y$ are normally distributed and, interestingly, how the azimuth~$\phi$ is uniformly distributed under these conditions. Because we know the quantiles, to convert a pair of instances~$u$ and~$v$ of a uniformly distributed random variable to Rayleigh's distance and azimuth is thus straightforward:% \footnote{ One can eliminate a little trivial arithmetic by appropriate changes of variable in~(\ref{prob:410:20}) like $u' \la 1-u$, but to do so saves little computational time and makes the derivation harder to understand. Still, the interested reader might complete the improvement as an exercise. } \bq{prob:410:20} \begin{split} \rho &= \sqrt{-2\ln(1-u)}, \\ \phi &= (2\pi)\left(v - \frac 1 2\right). \end{split} \eq But for the reason just given, \bq{prob:410:25} \begin{split} x &= \rho\cos\phi, \\ y &= \rho\sin\phi, \end{split} \eq must then constitute two independent instances of a normally distributed random variable with $\mu = 0$ and $\sigma = 1$. Evidently, though we lack an easy way to convert a single uniform instance to a single normal instance, we can convert a \emph{pair} of uniform instances to a pair of normal instances. Equations~(\ref{prob:410:20}) and~(\ref{prob:410:25}) are the \emph{Box-Muller transformation.}% \footnote{\cite{EWW}} % ---------------------------------------------------------------------- \section[The normal CDF at large arguments]{% The normal cumulative distribution function at large arguments% } \label{prob:750} \index{normal distribution!cumulative distribution function of} \index{cumulative distribution function!of the normal distribution} \index{large-argument form} \index{rounding error} \index{error, rounding} The Taylor series~(\ref{prob:100:40}) in theory correctly calculates the normal CDF $F_\Omega(x)$, an entire function, for any argument~$x$. In practice however---consider the Taylor series \[ 1 - F_\Omega(6) \approx - \mbox{0x0.8000} + \mbox{0x2.64C6} - \mbox{0xE.5CA7} + \mbox{0x4D.8DEC} - \cdots \] Not promising, is it? Using a computer's standard, \texttt{double}-type floating-point arithmetic, this calculation fails, swamped by rounding error. \index{integration!by parts} \index{residual} \index{semiconvergent series} \index{asymptotic series} \index{series!semiconvergent} \index{series!asymptotic} One can always calculate in greater precision,% \footnote{\cite{libmpfi}} of course, asking the computer to carry extra bits; and, actually, this is not necessarily a bad approach. There remain however several reasons one might prefer a more efficient formula. \bi \item One might wish to evaluate the CDF at thousands or millions of points, not just one. At some scale, even with a computer, the calculation grows expensive. 
\item \index{embedded device} One might wish to evaluate the CDF on a low-power ``embedded device.'' \item \index{embedded control} \index{aircraft} One might need to evaluate the CDF under a severe time constraint measured in microseconds, as in aircraft control. \item \index{computer} \index{pencil} \index{slide rule} Hard though it might be for some to imagine, one might actually wish to evaluate the CDF with a pencil! Or with a slide rule. (Besides that one might not have a suitable electronic computer conveniently at hand, that electronic computers will never again be scarce is a proposition whose probability the author is not prepared to evaluate.) \item \index{efficiency} \index{prudence} \index{elegance} The mathematical method by which a more efficient formula is derived is most instructive.% \footnote{ Such methods prompt one to wonder how much useful mathematics our civilization should have forgone had Leonhard Euler (1707--1783), Carl Friedrich Gauss (1777--1855) and other hardy mathematical minds of the past computers to lean upon. } \item One might regard a prudent measure of elegance, even in applications, to be its own reward. \ei Here is the method.% \footnote{\cite[\S~2.2]{Lebedev}} Beginning from \bqb 1 - F_\Omega(x) &=& \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\left( -\frac{\tau^2}{2} \right) \,d\tau \\&=& \frac{1}{\sqrt{2\pi}} \left\{ -\int_{\tau=x}^\infty \frac{ d\left[ e^{-\tau^2/2} \right] }{\tau} \right\} \eqb and integrating by parts, \bqb 1 - F_\Omega(x) &=& \frac{1}{\sqrt{2\pi}} \left\{ \frac{e^{-x^2/2}}{x} - \int_{x}^\infty \frac{ e^{-\tau^2/2} \,d\tau }{\tau^2} \right\} \\&=& \frac{1}{\sqrt{2\pi}} \left\{ \frac{e^{-x^2/2}}{x} + \int_{\tau=x}^\infty \frac{ d\left[ e^{-\tau^2/2} \right] }{\tau^3} \right\}. \eqb Integrating by parts again, \bqb 1 - F_\Omega(x) &=& \frac{1}{\sqrt{2\pi}} \left\{ \frac{e^{-x^2/2}}{x} - \frac{e^{-x^2/2}}{x^3} + 3 \int_{x}^\infty \frac{ e^{-\tau^2/2} \,d\tau }{\tau^4} \right\} \\&=& \frac{1}{\sqrt{2\pi}} \left\{ \frac{e^{-x^2/2}}{x} - \frac{e^{-x^2/2}}{x^3} - 3 \int_{\tau=x}^\infty \frac{ d\left[ e^{-\tau^2/2} \right] }{\tau^5} \right\}. \eqb Integrating by parts repeatedly, \bqb 1 - F_\Omega(x) &=& \frac{1}{\sqrt{2\pi}} \Bigg\{ \frac{e^{-x^2/2}}{x} - \frac{e^{-x^2/2}}{x^3} + \frac{3e^{-x^2/2}}{x^5} - \cdots \\&&\ \ \ \ \ \ \ \ \ \ \ \ \mbox{}% + \frac{(-)^{n-1}(2n-3)!!e^{-x^2/2}}{x^{2n-1}} \\&&\ \ \ \ \ \ \ \ \ \ \ \ \mbox{}% + (-)^n(2n-1)!! \int_{x}^\infty \frac{ e^{-\tau^2/2} \,d\tau}{\tau^{2n}} \Bigg\}, \eqb in which the convenient notation \bq{prob:750:05} \index{\char 33 \char 33} \index{factorial!\char 33 \char 33 -style} m!! \equiv \settowidth\tla{$\prod_{j=1}^{(m+1)/2} (2j-1)$} \begin{cases} \prod_{j=1}^{(m+1)/2} (2j-1) = (m)(m-2)\cdots(5)(3)(1) &\mbox{for odd~$m$,} \\ \makebox[\tla][l]{$\prod_{j=1}^{m/2} (2j)$} = (m)(m-2)\cdots(6)(4)(2) &\mbox{for even~$m$,} \end{cases} \eq is introduced.% \cite[Exercise~2.2.15]{Andrews} The last expression for $1 - F_\Omega(x)$ is better written \bqa 1 - F_\Omega(x) &=& \frac{\Omega(x)}{x} [ S_n(x) + R_n(x) ], \label{prob:750:20}\\ S_n(x) &\equiv& \sum_{k=0}^{n-1} \left[ \prod_{j=1}^k \frac{2j-1}{-x^2} \right] = \sum_{k=0}^{n-1} \frac{(-)^k(2k-1)!!}{x^{2k}}, \xn\\ R_n(x) &\equiv& (-)^n(2n-1)!! x \int_{x}^\infty \frac{ e^{(x^2-\tau^2)/2} \,d\tau}{\tau^{2n}}. \xn \eqa The series $S_n(x)$ is an \emph{asymptotic series,} also called a \emph{semiconvergent series.}% \footnote{ % diagn: complete the citation here?
As professionals use them, the adjectives \emph{asymptotic} and \emph{semiconvergent} apparently can differ slightly in meaning~\cite{Andrews}. We'll not worry about that here. } So long as $x \gg 1$, the first several terms of the series will successively shrink in magnitude but, no matter how great the argument~$x$ might be, eventually the terms will insist on growing again, growing without limit. Unlike a Taylor series, $S_\infty(x)$ diverges for all~$x$. \index{truncation} \index{series!truncation of} \index{error bound} \index{term!finite number of} \index{partial sum} \index{sum!partial} \index{cumulative distribution function!estimation of} Fortunately, nothing requires us to let $n \ra \infty$, and we remain free to choose~$n$ strategically as we like---for instance to exclude from~$S_n$ the series' least term in magnitude and all the terms following. So excluding leaves us with the problem of evaluating the integral~$R_n$, but see: \bqb \left|R_n(x)\right| &\le& (2n-1)!! \left|x\right| \int_{x}^\infty \left| \frac{ e^{(x^2-\tau^2)/2} \,d\tau}{\tau^{2n}} \right| \\&\le& \frac{(2n-1)!!}{\left|x\right|^{2n}} \int_{x}^\infty \left| e^{(x^2-\tau^2)/2} \tau\,d\tau \right|, \eqb because $\left|x\right| \le \left|\tau\right|$, so $\left|x\right|^{2n+1} \le \left|\tau\right|^{2n+1}$. Changing $\xi^2 \la \tau^2 - x^2$, whereby $\xi \,d\xi = \tau \,d\tau$, \[ \left|R_n(x)\right| \le \frac{(2n-1)!!}{\left|x\right|^{2n}} \int_{0}^\infty \left| e^{-\xi^2/2} \xi\,d\xi \right|. \] Using~(\ref{prob:400:30}) and~(\ref{prob:400:32}), \bq{prob:750:25} \left|R_n(x)\right| \le \frac{(2n-1)!!}{\left|x\right|^{2n}}, \ \ \Im(x) = 0, \eq which in view of~(\ref{prob:750:20}) means that the magnitude $\left|R_n\right|$ of the error due to truncating the series after~$n$ terms does not exceed the magnitude of the first omitted term. Equation~(\ref{prob:750:20}) thus provides the efficient means we have sought to estimate the CDF accurately for large arguments. % ---------------------------------------------------------------------- \section{The normal quantile} \label{prob:775} \index{normal distribution!quantile of} \index{quantile!of the normal distribution} \index{Newton-Raphson iteration} Though no straightforward quantile~(\ref{prob:quantile}) formula for the normal distribution seems to be known, nothing prevents one from calculating the quantile via the Newton-Raphson iteration~(\ref{drvtv:NR})% \footnote{ When implementing numerical algorithms like these on the computer one should do it intelligently. For example, if $F_\Omega(x_k)$ and~$u$ are both likely to be close to~1, do not ask the computer to calculate and/or store these quantities. Rather, ask it to calculate and/or store $1-F_\Omega(x_k)$ and $1-u$. Then, when~(\ref{prob:775:10}) instructs you to calculate a quantity like $F_\Omega(x_k)-u$, let the computer instead calculate $[1-u]-[1-F_\Omega(x_k)]$, which is arithmetically no different but numerically, on the computer, much more precise. } \bq{prob:775:10} \begin{split} x_{k+1} &= x_k - \frac{F_\Omega(x_k)-u}{\Omega(x_k)}, \\ F_\Omega^{-1}(u) &= \lim_{k \ra \infty} x_k, \\ x_0 &= 0, \end{split} \eq where $F_\Omega(x)$ is as given by~(\ref{prob:100:40}) and/or~(\ref{prob:750:20}) and $\Omega(x)$, naturally, is as given by~(\ref{prob:normdist}). The shape of the normal CDF as seen in Fig.~\ref{prob:normdist-fig}---curving downward when traveling right from $x=0$, upward when traveling left---evidently guarantees convergence per Fig.~\ref{drvtv:270:fig1}.
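In code, the iteration is only a few lines. The sketch below is merely illustrative (it leans on routines for $\Omega(x)$ and $F_\Omega(x)$ such as the ones sketched at the end of \S~\ref{prob:100}, and it stops when the step shrinks below a tolerance rather than literally taking $k \ra \infty$); for~$u$ near~$1$ the footnote's advice about carrying $1-u$ and $1-F_\Omega$ would apply.
\begin{verbatim}
#include <cmath>

double Omega( double x );           // eq. (prob:normdist)
double F_Omega_taylor( double x );  // eq. (prob:100:40)

// The normal quantile by the Newton-Raphson iteration (prob:775:10).
double normal_quantile( double u )
{
    double x = 0.0;                 // x_0 = 0
    for ( int k = 0; k < 64; ++k ) {
        double step = ( F_Omega_taylor( x ) - u )/Omega( x );
        x -= step;
        if ( std::fabs( step ) < 1e-12 ) break;
    }
    return x;
}
\end{verbatim}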
\index{lazy convergence} \index{convergence!lazy} \index{bracket} \index{accuracy} \index{na\"ivet\'e} In the large-argument limit, \[ \begin{split} 1-u &\ll 1, \\ x &\gg 1; \end{split} \] so, according to~(\ref{prob:750:20}), \[ F_\Omega(x) \approx 1 - \frac{\Omega(x)}{x} \left( 1 - \frac{1}{x^2} + \cdots \right). \] Substituting this into~(\ref{prob:775:10}) yields, by successive steps, \bqb x_{k+1} &\approx& x_k - \frac{1}{\Omega(x_k)}\left[ 1 - u - \frac{\Omega(x_k)}{x_k} \left( 1 - \frac{1}{x_k^2} + \cdots \right) \right] \\&\approx& x_k - \frac{1-u}{\Omega(x_k)} + \frac{1}{x_k} - \frac{1}{x_k^3} + \cdots \\&\approx& x_k - \frac{\left(\sqrt{2\pi}\right)(1-u)}{1 - x_k^2/2 + \cdots} + \frac{1}{x_k} - \frac{1}{x_k^3} + \cdots \\&\approx& x_k - \bigg(\sqrt{2\pi}\bigg)\bigg(1-u\bigg)\bigg(1 + \frac{x_k^2}{2} + \cdots \bigg) + \frac{1}{x_k} - \frac{1}{x_k^3} + \cdots \\&\approx& x_k - \bigg(\sqrt{2\pi}\bigg)\bigg(1-u\bigg) + \frac{1}{x_k} + \cdots, \eqb suggesting somewhat lazy, but usually acceptable convergence in domains of typical interest (the convergence might be unacceptable if, for example, $x>\mbox{0x40}$, but the writer has never encountered an application of the normal distribution $\Omega(x)$ or its incidents at such large values of~$x$). If unacceptable, various stratagems might be tried to accelerate the Newton-Raphson, or---if you have no need to impress anyone with the pure elegance of your technique but only want the right answer reasonably fast---you might just search for the root in the na\"ive way, trying $F_\Omega(2^0)$, $F_\Omega(2^1)$, $F_\Omega(2^2)$ and so on until identifying a bracket $F_\Omega(2^{k-1}) < u \le F_\Omega(2^k)$; then dividing the bracket in half, then in half again, then again and again until satisfied with the accuracy thus achieved, or until the bracket were strait enough for you to set~$x_0$ to the bracket's lower (not upper) limit and to switch over to~(\ref{prob:775:10}) which performs well when it starts close enough to the root. In truth, though not always stylish, the normal quantile of a real argument is relatively quick, easy and accurate to calculate once you have~(\ref{prob:normdist}), (\ref{prob:100:40}) and~(\ref{prob:750:20}) in hand, even when the performance of~(\ref{prob:775:10}) might not quite suit. You only must remain a little flexible as to the choice of technique.% \footnote{See also \S~\ref{taylor:316.80}.} %% ---------------------------------------------------------------------- % %\section{Statistical mechanics: the ideal gas} %\label{prob:900} %\index{statistical mechanics} %\index{mechanics} %\index{ideal gas} %\index{gas} % %\index{air} %\index{probability!application of} %\index{application} %\index{physical application} %The mechanics of air offer a lovely application of probability theory. %As with most physical applications of mathematics, this application %begins with a simplified, idealized model. % %\index{collision} %\index{common velocity} %\index{differential path} %\index{path} %\index{differential velocity} %\index{velocity!common and differential} %\index{particle} %\index{rigidity} %\index{momentum} %\index{conservation of momentum} %\index{law} %Suppose that air were an \emph{ideal gas,} consisting of a very large %number of tiny, elastic, rigid, spherical, moving particles, the %particles together occupying a negligible fraction of the volume through %which they travel or within which they are confined, the volume thus %being mostly empty. 
Suppose two such air particles, masses~$m_1$ %and~$m_2$, velocities~$\vu u_1$ and~$\vu u_2$, on a collision course. %One can regard each velocity as the sum of two, separate velocities, %\bq{prob:900:220} % \begin{split} % \ve u_1 &= \ve u_{c} + \ve u_{1d}, \\ % \ve u_2 &= \ve u_{c} + \ve u_{2d}, \\ % \ve u_c &\equiv \frac{ m_1 \ve u_1 + m_2 \ve u_2 }{ m_1 + m_2 }, % \end{split} %\eq %where~$\ve u_c$ is the particles' shared \emph{common velocity} and %where~$\ve u_{1d}$ and~$\ve u_{2d}$ are the particles' respective %\emph{differential velocities.} That the particles are \emph{elastic} %implies by definition that, if~$\ve u_1'$ and~$\ve u_2'$ represent the %particles' respective velocities after the collision, %\bq{prob:900:225} % \begin{split} % u_d' &= u_d, \\ % \ve u_d &\equiv \ve u_2 - \ve u_1 = \ve u_{2d} - \ve u_{1d}, \\ % \ve u_d' &\equiv \ve u_2' - \ve u_1' = \ve u_{2d}' - \ve u_{1d}', % \end{split} %\eq %which is to assert that such particles diverge after a collision as fast %as they converge before it. The physical \emph{law of %conservation of momentum,} which the model accepts as a postulate, %demands that the collision not alter the quantity $m_1 \ve u_1 + m_2 \ve %u_2$. In our present notation we will style this demand as %\bq{prob:900:230} % \ve u_c' = \ve u_c. %\eq %Rearranging and combining the lines of~(\ref{prob:900:220}) and %using the symbol~$\ve u_d$ from~(\ref{prob:900:225}), %\bq{prob:900:235} % \begin{split} % \ve u_{1d} &= -\frac{ m_2 }{ m_1 + m_2 } \ve u_d, \\ % \ve u_{2d} &= +\frac{ m_1 }{ m_1 + m_2 } \ve u_d, % \end{split} %\eq %which says among other things that the paths the particles %\emph{differentially} travel run parallel to one another. For %completeness' sake, let us write here in the manner %of~(\ref{prob:900:220}) that %\bq{prob:900:221} % \begin{split} % \ve u_1' &= \ve u_{c} + \ve u_{1d}', \\ % \ve u_2' &= \ve u_{c} + \ve u_{2d}'; \\ % \end{split} %\eq %and in the manner of~(\ref{prob:900:235}) that %\bq{prob:900:236} % \begin{split} % \ve u_{1d}' &= -\frac{ m_2 }{ m_1 + m_2 } \ve u_d', \\ % \ve u_{2d}' &= +\frac{ m_1 }{ m_1 + m_2 } \ve u_d'; % \end{split} %\eq %since per~(\ref{prob:900:230}) the collision cannot alter the particles' %shared common velocity and since, otherwise, the same physical %considerations apply after the collision as before. Let us also note %that~(\ref{prob:900:225}), (\ref{prob:900:235}) and~(\ref{prob:900:236}) %together imply that %\bq{prob:900:238} % \begin{split} % u_{1d}' &= u_{1d}, \\ % u_{2d}' &= u_{2d}. % \end{split} %\eq % %\index{angle of collision} %\index{collision!angle of} %Imagine at the moment of collision a line connecting the two particles' %centers. Recall that, differentially, the particles travel along %parallel paths; which is to say that the particles share a single %differential direction of travel~$\vu u_d$. Let~$\alpha$ be the angle %between the differential direction~$\vu u_d$ and the line connecting the %centers, an angle we will call the \emph{angle of collision,} such that %$\alpha = 0$ would imply a direct, head-on collision and $\alpha = %2\pi/4$, a merely grazing collision. The particles' rigidity and %spherical shapes then imply the \emph{angle of deflection} %\bq{prob:900:240} % \begin{split} % \theta &= \frac{2\pi}{2} - 2\alpha, \\ % \cos\theta &\equiv \vu u_d \cdot \vu u_d'. % \end{split} %\eq %As it happens, under the foregoing conditions the PDF of~$\theta$ will %turn out to be %\bq{prob:900:245} % f_\theta(\theta) = \frac{\sin\theta}{2}, %\eq %a fact we will prove momentarily. 
What we wish to notice now is %that~(\ref{prob:900:245}) implies, remarkably, that \emph{collision %unbiasedly randomizes the differential direction of separation~$\vu %u_d'$}. To see the implication, imagine a large sphere of radius~$r$, %centered on the point of collision, moving along with the particles at %the common rate~$\ve u_c$ (we note incidentally, in advance of the next %paragraph, that $r \gg \rho_{\mr{max}}$, but we've not defined the %latter symbol yet). Which region of the sphere's surface is the one %particle or the other most likely to exit through? The answer is that, %inasmuch $dS = 2\pi r \sin\theta \,d\theta$ is a differential area of %the sphere's surface,~(\ref{prob:900:245}) implies that no region of the %surface is preferred. With respect to the differential direction of %travel, the collision of ideal particles apparently is a perfect %randomizer. This is true even though, according %to~(\ref{prob:900:225}), (\ref{prob:900:235}) and~(\ref{prob:900:236}), %such a collision does not alter the particles' differential speeds at %all. % %\index{inertial frame of reference} %\index{frame of reference, inertial} %The foregoing however relies on the assertion, which we will now %support, that~(\ref{prob:900:245}) were true. We shall find the %assertion easier to support if first we adopt the \emph{inertial frame %of reference} of a nonaccelerating observer who moves along with the %particles at the common rate~$\ve u_c$, an observer who regards himself %to be standing at rest---and, thus, from whose point of view $\ve u_c = %0$ and the differential velocities are the only velocities there are. %Adopting the observer's perspective,% %\footnote{ % The inertial frame of reference is a powerful engine for the % generation of physical insight % % diagn % (and an equally powerful engine for the generation of metaphysical % confusion, when the inertial frame is mistaken for a teleological % principle, but let that pass~\cite{Feser}), % because physical modeling usually postulates, and physical observation % generally supports, that \emph{no inertial frame of reference is % privileged}---which, loosely speaking, is to assert that every % nonaccelerating observer is equally justified in regarding himself to % stand at rest, defining the velocities of objects under observation in % this respect. The book you are reading only touches on the concept, % but you will find much more about it in a good introductory physics % book like \cite{Halliday/Resnick} or \cite{Feynman}. %} %let the symbol~$\rho_{\mr{max}}$ represent the sum of the radii of the %two particles---which is the greatest distance that can possibly %separate the particles' parallel paths if the particles are to %collide---and sketch a circle of radius~$\rho_{\mr{max}}$, perpendicular %to~$\vu u_d$, centered on particle~1 at the moment of collision. If the %particles are indeed to collide, particle~2's path must pass %(perpendicularly) through the circle, but there seems to be no %particular reason the path should prefer one region of %the circle over another to pass through. Considering that $dS = 2\pi \rho %\,d\rho$ is a differential area of the circle, we surmise therefore that %\[ % f_\rho(\rho) = \frac{2\rho}{\rho_{\mr{max}}^2}, % \ \ 0 \le \rho < \rho_{\mr{max}}. %\] %But $\rho = \rho_{\mr{max}}\sin\alpha$, so %\[ % f_\rho(\rho) = \frac{2\sin\alpha}{\rho_{\mr{max}}}. 
%\] %Multiplying by $d\rho = \rho_{\mr{max}}\cos\alpha \,d\alpha$ and %applying~(\ref{prob:080:11}), %\[ % f_\rho(\rho) \,d\rho = 2\sin\alpha\cos\alpha = \sin 2\alpha \,d\alpha, %\] %in which we have applied a trigonometric identity of %Table~\ref{trig:275:table}. According then to~(\ref{prob:080:11}), %\[ % f_\alpha(\alpha) = \sin 2\alpha, %\] %which since $\sin 2\alpha = \sin [(2\pi/2) - \theta] = \sin\theta$ %and $\left|d\alpha\right| = \left|d\theta\right|\!/2$ %implies~(\ref{prob:900:245}) that was to be proved. % %\index{conjecture} %Leaving the observer's inertial frame of reference and returning to our %original perspective, we should like to deduce from---or at least to %impute and test according to---the foregoing considerations the %distribution of speeds of the particles in an ideal gas. Deduction is %hard. Imputation is a matter of intelligent guessing, but after a %patient intellectual labor of several false tries we might at last %impute the not implausible distribution %\bq{prob:900:520} % f_{u_{\ell i}}(u_{\ell i}) = % \Omega\left( \frac{m_\ell^{1/2} u_{\ell i}}{\hat\sigma_u} \right), %\eq %not for the particles' speeds as such but for %the rectangular components~% %$u_{1x}$, %$u_{1y}$, %$u_{1z}$, %$u_{2x}$, %$u_{2y}$ and~% %$u_{2z}$ %of the particles' velocities, %in which $\ell=1,2,$ stands for the $\ell$th particle and, in the %Einstein notation of \S~\ref{vector:240}, $i=x,y,z,$ indicates one of %the three cardinal directions; and furthermore in which~$\hat\sigma_u$ is a %constant peculiar to the gas in question ($\hat\sigma_u$ is %related to the gas' temperature, but we'll not develop the relationship %here). The conjecture will turn out to be right. Really interested in %the particles' speeds, though, we observe according to %\S~\ref{prob:400.40} and its~(\ref{prob:400:40}) %that~(\ref{prob:900:520}) implies that the speeds~$u_1$ and~$u_2$ must %obey the Maxwell distribution %\bq{prob:900:530} % f_{u_\ell}(u_\ell) = \frac{2(m_\ell^{1/2} u_\ell/\hat\sigma_u)^2}{\sqrt{2\pi}} % \exp\left[ -\frac{ (m_\ell^{1/2} u_\ell/\hat\sigma_u)^2 }{2} \right], %\eq %the particles traveling in unbiasedly random directions. %Either~(\ref{prob:900:520}) or %% bad break %(\ref{prob:900:530}) %thus suffices to state the conjecture. We will %prefer~(\ref{prob:900:530}). % %The conjecture~(\ref{prob:900:530}) however remains but a conjecture. %To prove it---in the spirit of the demonstration \S~\ref{prob:100} has %earlier given of~(\ref{prob:100:20})---we will demonstrate that %collisions of particles do not alter the distribution. That is, we %will demonstrate that~(\ref{prob:900:530}) implies that %\bq{prob:900:540} % f_{u_\ell'}(\cdot) = % f_{u_\ell}(\cdot), %\eq %as follows. According to~(\ref{prob:900:221}), $\ve u_\ell' = \ve u_{c} + %\ve u_{\ell d}'$. However, we already learned as~(\ref{prob:900:238}) that %$u_{\ell d}' = u_{\ell d}$ and we have already seen that \emph{collision %unbiasedly randomizes the differential direction of separation;} so we %have now that %\[ % \ve u_\ell' = \ve u_{c} + \vu a u_{\ell d}, %\] %where~$\vu a$ represents the unbiased random direction. From this, for %$\ve u_1'$ by successive steps, %\bqb % \ve u_1' % &=& % \frac{ m_1 \ve u_1 + m_2 \ve u_2 }{ m_1 + m_2 } % - \vu a % \frac{ m_2 }{ m_1 + m_2 } % \left| \ve u_2 - \ve u_1 \right| % \\&=& % \frac{ % m_1 \ve u_1 + m_2\left( % \ve u_2 - \vu a \left| \ve u_2 - \ve u_1 \right| % \right) % }{ m_1 + m_2 } %\eqb %From this. 
for $u_{1z}'$ by successive steps, %\bqb % u_{1z}' % &=& % \frac{ % m_1 u_{1z} + m_2\left( % u_{2z} - a_z \left| \ve u_2 - \ve u_1 \right| % \right) % }{ m_1 + m_2 } %\eqb derivations-0.53.20120414.orig/btool/0000755000000000000000000000000011742566274015456 5ustar rootrootderivations-0.53.20120414.orig/btool/complete-pdf.cc0000644000000000000000000001073611742566274020353 0ustar rootroot // --------------------------------------------------------------------- // General declarations. // --------------------------------------------------------------------- #include #include #include #include #include #include #include "def.h" #include "Util/roman_numeral.h" #include "Util/TeX_atom.h" #include "Page_no/Page_number.h" #include "TOC/Table.h" #include "PDF/PDF.h" #include "PDF/updator.h" #define PROGNAME "complete-pdf" #define DESC_ONE_LINE \ "fix PDF page numbers and add a table of contents" #define ARGP_ARGS_DOC "PDF PS TOC {TITLE}" #define ARGP_DOC_TAIL \ "The " PROGNAME "(1) command demands as arguments three filenames\n" \ "plus an optional string:\n" \ "\n" \ " PDF - the name of the PDF file;\n" \ " PS - the name of the PostScript file from which gs(1),\n" \ " ps2pdf(1) or the like made the PDF;\n" \ " TOC - the name of the *.toc table-of-contents file LaTeX\n" \ " wrote while compiling the DVI to feed to gs(1);\n" \ " TITLE - optionally, a title to embed in the PDF in case\n" \ " none is embedded already.\n" \ "\n" \ "The command itself alters no file, but prints to stdout\n" \ "text one can subsequently append to the PDF file to complete it.\n" \ "Example: " PROGNAME " foo.pdf foo.ps bar.toc 'My Title' >pdf-addendum\n" std::string filename_pdf = ""; std::string filename_ps = ""; std::string filename_toc = ""; std::string title = ""; int toc_page_prefatory = 0; int toc_page_corporeal = 0; int bib_page = 0; int index_page = 0; // --------------------------------------------------------------------- // Definitions to parse the command line. // --------------------------------------------------------------------- // Prepare the elements argp will need to parse the command line. const char *argp_program_version = PROGNAME " " PROGVER; const char *argp_program_bug_address = "<" BUG_EMAIL ">"; const argp_option argp_option_array[] = { { "toc-page-pref", 'T', "NUM", 0, "the prefatory page number of the document's table of contents", 0 }, { "toc-page-main", 't', "NUM", 0, "the main-body page number of the document's table of contents", 0 }, { "bib-page", 'b', "NUM", 0, "the page number of the document's bibliography", 0 }, { "index-page", 'i', "NUM", 0, "the page number of the document's index", 0 }, { 0, 0, 0, 0, 0, 0 } }; static error_t argp_parse_options( const int key, char *const arg, argp_state *const state ) { static int n_arg = 0; switch (key) { case 'T': toc_page_prefatory = atoi(arg); break; case 't': toc_page_corporeal = atoi(arg); break; case 'b': bib_page = atoi(arg); break; case 'i': index_page = atoi(arg); break; case ARGP_KEY_ARG: if ( n_arg == 0 ) filename_pdf = arg; else if ( n_arg == 1 ) filename_ps = arg; else if ( n_arg == 2 ) filename_toc = arg; else if ( n_arg == 3 ) title = arg; else argp_error( state, "too many arguments" ); ++n_arg; break; case ARGP_KEY_END: if ( n_arg < 3 ) argp_error( state, "too few arguments" ); break; default: return ARGP_ERR_UNKNOWN; } return 0; } // Parse the command line. 
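// (A brief orientation, in case the argp wiring above is unfamiliar:
// parse_cmd_line() below merely hands the option table and the
// callback above to argp_parse(3). The callback converts each numeric
// option with atoi() and collects, in order, the three mandatory
// positional arguments PDF, PS and TOC plus the optional TITLE,
// raising an error if too few or too many are supplied.)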
void parse_cmd_line( const int argc, char **const argv ) { const argp argp1 = { argp_option_array, argp_parse_options, ARGP_ARGS_DOC, PROGNAME " -- " DESC_ONE_LINE "\v" ARGP_DOC_TAIL, 0, 0, 0 }; if ( argp_parse( &argp1, argc, argv, 0, 0, 0 ) ) error( EINVAL, 0, "cannot parse the command line" ); } // --------------------------------------------------------------------- // The main program. // --------------------------------------------------------------------- int main( const int argc, char **const argv ) { parse_cmd_line( argc, argv ); try { std::cout << PDF::updator( filename_pdf, filename_ps, filename_toc, title ); } catch ( Util::TeX_atom::Exc_unbalanced ) { error( EPERM, 0, "found unbalanced braces in the TOC file" ); } catch ( Util::Exc_roman_numeral ) { error( EPERM, 0, "failed to parse a Roman numeral" ); } catch ( PDF::PDF::Exc_IO ) { error( EIO, 0, "could not open, read from or write to the PDF" ); } catch ( PDF::PDF::Exc ) { error( EPERM, 0, "could not interpret the PDF" ); } return 0; } derivations-0.53.20120414.orig/btool/Makefile-optim0000644000000000000000000000010011742566274020233 0ustar rootrootoptim := -O2 werror := $(if $(BUILD_FOR_PACKAGING), , -Werror) derivations-0.53.20120414.orig/btool/README0000644000000000000000000000626211742566274016344 0ustar rootroot In this directory are build-tools---that is, programs used to build the book. Though build-tools are of no direct use to the book's reader and are not distributed with the book once the book is built, neither are they mere optional development helpers; they are integral parts of the book's source. One cannot build the book fully correctly from source without them. Build tools in general might provide any transformation needed during building, but these particular build-tools (or "tool" if there is only one at the moment) serve to add a few typical PDF features like an outline table of contents to the book in PDF. Such features are added automatically (and probably better) by PDFLaTeX, but PDFLaTeX does not coexist happily with Timothy Van Zandt's PSTricks and Seminar packages, which the author likes. (There exists a PDFTricks package, but it does not help much.) The source is organized into several subdirectories. This is a matter of preference. It is not done because the source is particularly large, but only because the author finds it convenient to organize the source in this way, and because there seems little real reason not to do so. [Too many C++ source files and headers in one directory, alphabetically listed by ls(1), make the author squint.] PERIPHERAL REMARKS The source admittedly represents the author's first attempt himself to split a single executable's code into several subdirectories. For the author it is an interesting experiment, and it is probably not done perfectly. The resultant Makefiles are uncomfortably cryptic though surprisingly short [the experiment has taught the author much about make(1)]. Automake probably provides an alternative, but the author would prefer to avoid Automake. Then there are various build systems like BJam, SCons, etc., which the author has not tried. Advice, references to well organized sources as examples, and so forth would be well received. The source has fewer comments than some sources the author has written in the past. Comments play a central role in Fortran 77 and C programming, but the author has learned from Bjarne Stroustrup regarding C++ comments that, when an idea can be expressed in the programming language itself, it usually should be. 
Many more ideas can be expressed cleanly in C++ than in the older languages. The trouble with unnecessary comments is that one naturally tends to write a comment when one first drafts the code fragment to which it immediately pertains, which means that one writes most comments at a time when the source is substantially incomplete. Revising earlier code fragments when one writes later code (an extensively standard programming activity), one often inadvertently leaves old comments incomplete, irrelevant, inaccurate, misleading or even positively false. The compiler cannot help. Experience suggests that it were better not to have written the unnecessary comments in the first place. (The practice of commenting source liberally, drilled by the first generation of computer scientists into their students, seems likely to go the way of source flowcharting, into the museum of early computer-science history; they're nice ideas, but praxis has found better ways to reach their goals.) derivations-0.53.20120414.orig/btool/Util/0000755000000000000000000000000011742566274016373 5ustar rootrootderivations-0.53.20120414.orig/btool/Util/roman_numeral.h0000644000000000000000000000115611742566274021406 0ustar rootroot #ifndef UTIL_ROMAN_NUMERAL_H #define UTIL_ROMAN_NUMERAL_H #include // --------------------------------------------------------------------- // To convert between roman numerals and ints. // --------------------------------------------------------------------- namespace Util { struct Exc_roman_numeral {}; enum Roman_case { UPPER_ROMAN_CASE = 0, LOWER_ROMAN_CASE }; int roman_to_int( char roman ); int roman_to_int( const char *roman ); int roman_to_int( const std::string &roman ); std::string int_to_roman( int n, Roman_case rcase = UPPER_ROMAN_CASE ); } #endif derivations-0.53.20120414.orig/btool/Util/README0000644000000000000000000000044011742566274017251 0ustar rootroot Here are some general and utility source files. (What is a utility? In a software context, the definition is imprecise. Basically, a utility is something---a source file, a program, etc.---that is small and relatively independent and fills an incidental, auxiliary or generic role.) derivations-0.53.20120414.orig/btool/Util/roman_numeral.cc0000644000000000000000000001007611742566274021545 0ustar rootroot #include "roman_numeral.h" #include // The algorithm here is simple. It observes that each of the Roman // letters MDCLXVI represents a unique positive integer. Where another // letter stands immediately to the right and the rightward letter // represents a larger positive integer, then the present letter's // integer is subtracted from the total; otherwise it is added. For // example, XIV or xiv represents +10-1+5 == 14. If fed an arbitrary // sequence of Roman digits, the algorithm returns a predictable though // not necessarily sensical integer; for instance, on IXIIIVL it // returns -1+10+1+1-1-5+50 == 55 (how a real Roman would have read such // a numeral, one can only guess). The algorithm does return the // expected integer for any proper Roman numeral, and returns zero for // the empty string. 
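//
// A minimal usage sketch (the particular numerals and integers here
// are only illustrative):
//
//   Util::roman_to_int( "mcmxcix" );                     // == 1999
//   Util::int_to_roman( 2012 );                          // == "MMXII"
//   Util::int_to_roman( 2012, Util::LOWER_ROMAN_CASE );  // == "mmxii"
//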
int Util::roman_to_int( const char roman ) { switch ( roman ) { case 'I': case 'i': return 1; break; case 'V': case 'v': return 5; break; case 'X': case 'x': return 10; break; case 'L': case 'l': return 50; break; case 'C': case 'c': return 100; break; case 'D': case 'd': return 500; break; case 'M': case 'm': return 1000; break; default: if ( isspace( roman ) ) return 0; else throw Exc_roman_numeral(); } return 0; // unreachable } int Util::roman_to_int( const char *roman ) { int total = 0; int prev = 0; for ( ; *roman; ++roman ) { const int curr = roman_to_int( *roman ); if ( curr ) { if ( prev < curr ) total -= prev << 1; total += prev = curr; } } // A slightly pedantic note on a small point of programming style: // Given the particular values assigned to the seven Roman-numeral // letters, the `total' can never actually come to less than zero. It // could do so however if, for example, the values 4 and 6 were for // some reason assigned respectively to U and W; then, UVW would // represent -4-5+6 == -3. This might surprise a caller who, quite // reasonably, expected a nonnegative return. Admittedly, no one // seems likely to assign such values to the unused Roman letters // (technically U and W, like J and arguably Y and Z, are not // authentic Roman letters at all, but we digress), but for // generality's sake the code prohibits a negative return anyway. The // point of programming style exposed herein is that code not follow // indirect reasoning through a subtle logical shortcut such as, in // this case, depending on `total' already to be // nonnegative---especially when the shortcut in fact cuts little from // the processing time. It seems neater to bound `total' explicitly, // even though the bound were never exercised. return total >= 0 ? total : 0; } int Util::roman_to_int( const std::string &roman ) { return roman_to_int( roman.c_str() ); } std::string Util::int_to_roman( int n, const Roman_case rcase ) { if ( n < 0 ) throw Exc_roman_numeral(); std::string roman; { struct { bool operator()( int *const n, const int r ) const { if ( *n >= r ) { *n -= r; return true; } return false; } } reduce; while ( n > 0 ) { if ( reduce( &n, 1000 ) ) roman += rcase ? "m" : "M"; else if ( reduce( &n, 900 ) ) roman += rcase ? "cm" : "CM"; else if ( reduce( &n, 500 ) ) roman += rcase ? "d" : "D"; else if ( reduce( &n, 400 ) ) roman += rcase ? "cd" : "CD"; else if ( reduce( &n, 100 ) ) roman += rcase ? "c" : "C"; else if ( reduce( &n, 90 ) ) roman += rcase ? "xc" : "XC"; else if ( reduce( &n, 50 ) ) roman += rcase ? "l" : "L"; else if ( reduce( &n, 40 ) ) roman += rcase ? "xl" : "XL"; else if ( reduce( &n, 10 ) ) roman += rcase ? "x" : "X"; else if ( reduce( &n, 9 ) ) roman += rcase ? "ix" : "IX"; else if ( reduce( &n, 5 ) ) roman += rcase ? "v" : "V"; else if ( reduce( &n, 4 ) ) roman += rcase ? "iv" : "IV"; else if ( reduce( &n, 1 ) ) roman += rcase ? "i" : "I"; else throw Exc_roman_numeral(); // unreachable } } return roman; } derivations-0.53.20120414.orig/btool/Util/TeX_atom.cc0000644000000000000000000001203011742566274020416 0ustar rootroot #include "TeX_atom.h" #include using std::string; typedef std::vector vector; typedef std::vector vector_atom; // Split a line of TeX source into tokens. // // This function tokenizes a single line of TeX source. Following TeX // lexical conventions, it makes most characters in the line individual // tokens, but joins a backslash followed by a single non-letter or by // any number of letters into a single token. 
It ignores leading and // trailing whitespace, but tokenizes internal whitespace, except that // it counts multiple spaces together as a single space and ignores // whitespace following a backslash-letters token. // // The lexical conventions come from no authoritative source, // unfortunately, but are merely inferred from experience in using // LaTeX. // void Util::tokenize_TeX( const string &line, vector *const tokens, const Translate_nobreakspace translate_nobreakspace ) { { struct { bool operator()( const string::const_iterator p, const string::const_iterator end ) { return p != end && *p != '\n'; } } in_string; string ::const_iterator p = line.begin(); const string::const_iterator end = line.end (); bool first_time = true; while ( in_string( p, end ) ) { const string::const_iterator q = p; if ( *p == '\\' ) { if ( in_string( ++p, end ) ) { if ( isalpha(*p) ) { do ++p; while ( in_string( p, end ) && isalpha(*p) ); tokens->push_back( string( q, p ) ); tokens->back() += ' '; while ( in_string( p, end ) && isspace(*p) ) ++p; if ( translate_nobreakspace == TRANSLATE_NOBREAKSPACE && tokens->back() == "\\nobreakspace " && in_string( p , end ) && * p == '{' && in_string( p+1, end ) && *(p+1) == '}' ) { p += 2; tokens->back() = ' '; } } else if ( isspace(*p) ) { do ++p; while ( in_string( p, end ) && isspace(*p) ); tokens->push_back( "\\ " ); } else tokens->push_back( string( q, ++p ) ); } } else if ( isspace(*p) ) { do ++p; while ( in_string( p, end ) && isspace(*p) ); // Cancel leading whitespace. if (!first_time) tokens->push_back( " " ); } else tokens->push_back( string( q, ++p ) ); first_time = false; } } // Cancel trailing whitespace. if ( tokens->end() != tokens->begin() && tokens->back () == " " ) tokens->pop_back (); } void Util::TeX_atom_nonterminal::init( vector ::const_iterator p, const vector::const_iterator end ) { int level = 0; vector::const_iterator q = end; for ( ; p != end; ++p ) { if ( *p == "{" ) { if ( !level ) q = p+1; ++level; } else if ( *p == "}" ) { --level; if ( level < 0 ) throw Exc_unbalanced(); if ( !level ) { push_back( new TeX_atom_nonterminal( q, p ) ); q = end; } } else if ( !level ) push_back( new TeX_atom_terminal(*p) ); } if ( level ) throw Exc_unbalanced(); } string Util::TeX_atom_nonterminal::term() const { string res = "{"; for ( const_iterator p = begin(); p != end(); ++p ) res += (*p)->term(); res += "}"; return res; } Util::TeX_atom_nonterminal::~TeX_atom_nonterminal() { for ( iterator p = begin(); p != end(); ++p ) delete *p; } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( const string &line ) { vector tokens; tokenize_TeX( line, &tokens, TRANSLATE_NOBREAKSPACE ); init( tokens.begin(), tokens.end() ); } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( const vector &tokens ) { init( tokens.begin(), tokens.end() ); } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( const vector::const_iterator begin, const vector::const_iterator end ) { init( begin, end ); } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( const vector_atom &atoms ) { for ( vector_atom::const_iterator p = atoms.begin(); p != atoms.end(); ++p ) push_back( (*p)->replicate() ); } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( vector_atom ::const_iterator p, const vector_atom::const_iterator end ) { for ( ; p != end; ++p ) push_back( (*p)->replicate() ); } Util::TeX_atom_nonterminal::TeX_atom_nonterminal( const TeX_atom_nonterminal &atom ) : TeX_atom(), vector_atom() { for ( const_iterator p = atom.begin(); p != atom.end(); ++p ) push_back( (*p)->replicate() ); } 
Util::TeX_atom_nonterminal &Util::TeX_atom_nonterminal::operator=( const TeX_atom_nonterminal &atom ) { if ( &atom != this ) { clear(); for ( const_iterator p = atom.begin(); p != atom.end(); ++p ) push_back( (*p)->replicate() ); } return *this; } Util::TeX_atom_terminal &Util::TeX_atom_terminal::operator=( const TeX_atom_terminal &atom ) { if ( &atom != this ) term1 = atom.term(); return *this; } std::ostream &Util::operator<<( std::ostream &os, const TeX_atom &atom ) { return os << atom.term(); } derivations-0.53.20120414.orig/btool/Util/pdf_stringize.cc0000644000000000000000000000276611742566274021564 0ustar rootroot #include "pdf_stringize.h" #include #include #include "def.h" using std::string; // Refer to Adobe's PDF Reference, ver. 1.7, Sect. 3.2.3 and Table 3.2. string Util::pdf_stringize( const char c ) { if ( c == '\n' ) return "\\n" ; else if ( c == '\r' ) return "\\r" ; else if ( c == '\t' ) return "\\t" ; else if ( c == '\b' ) return "\\b" ; else if ( c == '\f' ) return "\\f" ; else if ( c == '('/*)*/ ) return "\\(" ; else if ( c == /*(*/')' ) return "\\)" ; else if ( c == '\\' ) return "\\\\"; else if ( !( 0040 <= c && c < 0177 ) ) { int i = c & 0377; std::ostringstream s; s << std::noshowbase << std::oct << std::setfill('0'); s << '\\' << std::setw(3) << i; return s.str(); } return string( 1, c ); } string Util::pdf_stringize( const std::string &s ) { string pdf = "("; for ( string::const_iterator p = s.begin(); p != s.end() && ( Util::max_pdf_string_length < 0 || // The following test looks odd. A +1 should be added to // max_pdf_string_length to account for the opening delimiter; // a -1 should be subtracted to allow for the possibility that the // next character added will be escaped. The two offset, so // nothing is added or subtracted in net. int(pdf.length()) < Util::max_pdf_string_length ); ++p ) pdf += pdf_stringize(*p); return pdf += ")"; } string Util::pdf_stringize( const char *const s ) { return pdf_stringize( string(s) ); } derivations-0.53.20120414.orig/btool/Util/Makefile0000777000000000000000000000000011742566274023171 2../Makefile-subdirustar rootrootderivations-0.53.20120414.orig/btool/Util/pdf_stringize.h0000644000000000000000000000157111742566274021417 0ustar rootroot #ifndef UTIL_PDF_STRINGIZE_H #define UTIL_PDF_STRINGIZE_H #include // --------------------------------------------------------------------- // To convert a string to a PDF string object. // --------------------------------------------------------------------- // Note: The function pdf_stringize( char c ) returns only the // PDF-string representation of the character c; it does not enclose the // result in parentheses as the other pdf_stringize() functions do. For // the latter effect, one can call pdf_stringize( string( 1, c ) ). // [Arguably, pdf_stringize( char c ) might be given a separate name for // this reason, such as pdf_stringcharize( char c ), but at present no // separate name is given.] namespace Util { std::string pdf_stringize( char c ); std::string pdf_stringize( const std::string &s ); std::string pdf_stringize( const char *s ); } #endif derivations-0.53.20120414.orig/btool/Util/def.h0000644000000000000000000000123211742566274017300 0ustar rootroot #ifndef UTIL_DEF_H #define UTIL_DEF_H namespace Util { // Adobe's PDF Reference 1.7, sect. 3.4, reads, "However, to increase // compatibility with other applications that process PDF files, lines // that are not part of stream object data are limited to no more // than 255 characters, with one exception." 
Whether any modern PDF // readers actually enforce the limit, this programmer does not know; // but in the PDF source text the present software writes, PDF strings // probably pose the main overrun hazard. The next parameter limits // the these PDF strings' lengths. const int max_pdf_string_length = 0xc0; // -1 for no limit } #endif derivations-0.53.20120414.orig/btool/Util/TeX_atom.h0000644000000000000000000001040611742566274020265 0ustar rootroot #ifndef UTIL_TEX_ATOM_H #define UTIL_TEX_ATOM_H #include #include #include // --------------------------------------------------------------------- // To parse a line of TeX source. // --------------------------------------------------------------------- // The definitions here parse a line of TeX source to the limit of the // programmer's empirical understanding of the TeX grammar. // // The programmer first wrote these definitions to parse LaTeX *.toc // files. Foreseeing that the definitions might someday find use in // another setting, the programmer has tried code cleanly for // extensibility, but naturally has left many or most of the likely // extensions themselves uncoded. In other words, the extensions aren't // there, but there's a proper place to put them if they come. The // code's principal limitation as it stands that it handles only a // single line of TeX source at a time. This suffices to parse // LaTeX *.toc files. To bridge source lines is left until the actual // need should arise; the task if undertaken should not prove too hard. // (Were the task undertaken, the programmer suggests, or at least // speculates, that the neatest way to bridge might be to preprocess the // TeX source to combine lines before the present code ever saw the // source. The reason the programmer has not actually coded the bridge, // besides to save time---admittedly the biggest reason---is that he has // no use case in his way to test it on. With no use case and assuming // that the programmer does not go to the effort to search out or to // contrive a suitably realistic one, the code remains unused. Unused // code tends to linger years in an undetectedly buggy state. Better // not to have written such code in the first place.) // // namespace Util { enum Translate_nobreakspace { DO_NOT_TRANSLATE_NOBREAKSPACE = 0, TRANSLATE_NOBREAKSPACE }; void tokenize_TeX( const std::string &line, std::vector *tokens_ptr, Translate_nobreakspace translate_nobreakspace = DO_NOT_TRANSLATE_NOBREAKSPACE ); class TeX_atom; class TeX_atom_nonterminal; class TeX_atom_terminal; std::ostream &operator<<( std::ostream &os, const TeX_atom &atom ); } class Util::TeX_atom { public: struct Exc_unbalanced {}; virtual bool is_terminal() const = 0; virtual std::string term() const = 0; virtual ~TeX_atom() {} // The interface provides a replicate() method to let a user copy // a TeX_atom in ignorance of its exact type. (There may well exist // a neater way to let the user do this, but the way somehow does // not come to mind at the moment the code is written.) 
virtual TeX_atom *replicate() const = 0; }; class Util::TeX_atom_nonterminal : public TeX_atom, public std::vector { private: void init( std::vector::const_iterator begin, std::vector::const_iterator end ); public: bool is_terminal() const { return false; } std::string term() const; ~TeX_atom_nonterminal(); TeX_atom *replicate() const { return new TeX_atom_nonterminal(*this); } explicit TeX_atom_nonterminal( const std::string &line ); explicit TeX_atom_nonterminal( const std::vector &tokens ); explicit TeX_atom_nonterminal( std::vector::const_iterator begin, std::vector::const_iterator end ); explicit TeX_atom_nonterminal( const std::vector &atoms ); explicit TeX_atom_nonterminal( std::vector::const_iterator begin, std::vector::const_iterator end ); TeX_atom_nonterminal( const TeX_atom_nonterminal &atom ); TeX_atom_nonterminal &operator=( const TeX_atom_nonterminal &atom ); }; class Util::TeX_atom_terminal : public TeX_atom { private: std::string term1; public: bool is_terminal() const { return true; } std::string term() const { return term1; } TeX_atom *replicate() const { return new TeX_atom_terminal(*this); } explicit TeX_atom_terminal( const std::string &token ) : term1(token) {} TeX_atom_terminal( const TeX_atom_terminal &atom ) : TeX_atom(), term1(atom.term()) {} TeX_atom_terminal &operator=( const TeX_atom_terminal &atom ); }; #endif derivations-0.53.20120414.orig/btool/PDF/0000755000000000000000000000000011742566274016067 5ustar rootrootderivations-0.53.20120414.orig/btool/PDF/README0000644000000000000000000000006511742566274016750 0ustar rootroot Here are source files to interpret and modify PDF. derivations-0.53.20120414.orig/btool/PDF/Iref.h0000644000000000000000000000032511742566274017125 0ustar rootroot #ifndef PDF_IREF_H #define PDF_IREF_H namespace PDF { struct Iref; } struct PDF::Iref { int i; int gen; Iref( const int i0, const int gen0 ) : i(i0), gen(gen0) {} Iref() : i(0), gen(0) {} }; #endif derivations-0.53.20120414.orig/btool/PDF/updator.cc0000644000000000000000000000515211742566274020057 0ustar rootroot #include #include #include #include "../TOC/def.h" #include "updator.h" #include "update_catalog.h" using std::string; typedef std::vector vector; using std::ostringstream; using std::setw; string PDF::updator( PDF &pdf, const Page_no::PS_page_numbering &nog, const TOC::Table &toc, const string &title ) { const int wo = 10; // number of digits in an xref offset const int wg = 5; // number of digits in an xref generation // Study the PDF, the PS and the TOC. Make strings // representing PDF objects for the updated PDF catalog // and for the PDF outline. const string catalog = update_catalog ( pdf, nog ); const string info = add_title_to_info( pdf, title ); vector outline; for ( TOC::Item *item = toc.root(); item; item = TOC::next_in_seq(*item) ) outline.push_back( pdf_object_str( pdf, nog, *item, 0, TOC::OFFSET_RELATIVE ) ); int file_offset = file_length(pdf); // Make a string representing the PDF xref table update. string xref; { ostringstream oss; oss << std::setfill('0'); oss << "xref\n"; // The catalog object index is purposely printed // with minimal field width. oss << iref_catalog(pdf).i << " 1\n"; { // The trailing space in " n \n" is significant. // Refer to Adobe's PDF Reference 1.7, sect. 3.4.3. 
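        // (Each cross-reference entry is a fixed-width, twenty-byte
        // record: a ten-digit byte offset, a space, a five-digit
        // generation number, a space, the keyword `n' for an in-use
        // object, and the two-byte line ending; hence wo == 10 and
        // wg == 5 above. A typical entry thus reads, with an
        // illustrative offset, "0000012345 00000 n ".)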
oss << setw(wo) << file_offset << ' ' << setw(wg) << 0 << " n \n"; file_offset += catalog.length(); } oss << iref_info(pdf).i << " 1\n"; { oss << setw(wo) << file_offset << ' ' << setw(wg) << 0 << " n \n"; file_offset += info.length(); } oss << n_obj(pdf) << ' ' << outline.size() << '\n'; for ( vector::const_iterator p = outline.begin(); p != outline.end(); ++p ) { oss << setw(wo) << file_offset << ' ' << setw(wg) << 0 << " n \n"; file_offset += p->length(); } xref += oss.str(); } // Make a string representing the new PDF trailer. string trailer = update_trailer( pdf, n_obj(pdf) + outline.size(), file_offset ); string res; { res += catalog; res += info; for ( vector::const_iterator p = outline.begin(); p != outline.end(); ++p ) res += *p; res += xref; res += trailer; } return res; } string PDF::updator( const string &filename_pdf, const string &filename_ps, const string &filename_toc, const string &title ) { PDF pdf( filename_pdf ); return updator( pdf, Page_no::PS_page_numbering( filename_ps ), TOC::Table( filename_toc ), title ); } derivations-0.53.20120414.orig/btool/PDF/PDF.h0000644000000000000000000001025611742566274016655 0ustar rootroot #ifndef PDF_PDF_H #define PDF_PDF_H #include #include "Iref.h" #include "../Page_no/PS_page_numbering.h" // --------------------------------------------------------------------- // To interpret a PDF, through an interface narrower // and more conventional than LibPoppler's. // --------------------------------------------------------------------- // Caution: One should view code which calls PDF::PDF::get_PDFDoc_ptr() // with suspicion. The function is a hack necessary for some purposes // which, when used, unfortunately violates encapsulation. // Some notes on this module: // // 1. Various of the comments in this file, in the corresponding *.cc // and maybe elsewhere would seem to suggest that Libpoppler were // somehow deficient. On the contrary, this programmer appreciates the // availability of Libpoppler and is very glad that it frees him from // the need independently to implement its relevant functionality! Of // course Libpoppler could be improved significantly with commensurate // programming effort (by someone), as could most code including the // code you are now reading, but this is not the point. The only point // is that the present code is written in view of the admittedly archaic // interface Libpoppler happens to provide. The purpose of the comments // is not to critize Libpoppler (as if anyone seeking criticism of // Libpoppler would look for it here) but rather solely to explain why // the present code, which uses Libpoppler, does certain things in a way // less than obvious to a typical modern C++ programmer. // // 2. In many established modules, one tends to avoid adding new // symbols unnecessarily, but users of this module should expect new // symbols to be added at any time. This is because the module's // purpose is less to provide its users a thin, stable interface than to // remove one step from them Libpoppler's interface. // // namespace PDF { struct PDF_rep; class PDF; int file_length ( const PDF &pdf ); int offset_last_xref_table( const PDF &pdf ); Iref iref_catalog ( const PDF &pdf ); Iref iref_info ( const PDF &pdf ); int n_obj ( const PDF &pdf ); int offset ( const PDF &pdf, int i ); int n_page ( const PDF &pdf ); int i_page( const PDF &pdf, Iref iref, bool do_not_throw = false ); Iref iref_page ( const PDF &pdf, int i ); } class PDF::PDF { private: PDF_rep *rep; public: // The exception classes are imprecise. 
Basically, Exc_IO() is // thrown when the problem seems likely to have involved failure to // read a file, whereas Exc_PDF() is thrown when the PDF file is // found but seems to be corrupted. Given the way Libpoppler is // defined, it would take too much programming effort to be more // precise, but if one does not like the exception categories given // then one can always default to catching Exc(). struct Exc {}; struct Exc_IO : public Exc {}; struct Exc_PDF : public Exc {}; explicit PDF( const std::string &filename ); ~PDF(); friend int file_length ( const PDF &pdf ); friend int offset_last_xref_table( const PDF &pdf ); friend Iref iref_catalog ( const PDF &pdf ); friend Iref iref_info ( const PDF &pdf ); friend int n_obj ( const PDF &pdf ); friend int offset ( const PDF &pdf, int i ); friend int n_page ( const PDF &pdf ); friend int i_page( const PDF &pdf, Iref iref, bool do_not_throw ); friend Iref iref_page ( const PDF &pdf, int i ); // The get_PDFDoc_ptr() member function is a necessary hack. Its // intended user is code which calls Libpoppler to manipulate // Libpoppler's representation of a PDF object to dump the object to // a string. Libpoppler's peculiar presentation unfortunately // renders obvious alternatives impractical or unacceptable. To // discourage casual or accidental use, the function demands a magic // integer, the correct value of which you can find hardwired into // the PDF.cc code. PDF_rep *get_PDF_rep( int magic ); }; #endif derivations-0.53.20120414.orig/btool/PDF/PDF_rep.h0000644000000000000000000000221411742566274017516 0ustar rootroot #ifndef PDF_PDF_REP_H #define PDF_PDF_REP_H #include #include #include // fetch() #include // fetch(), is*(), get*() #include // getLength(), add(), get() #include // add(), is(), lookup() #include #include "Iref.h" // (Few users should #include this header directly. It is provided as a // separate header only for the benefit of callers of the discouraged // function PDF::PDF::get_PDF_rep().) namespace { struct PDF_rep; } struct PDF::PDF_rep { int file_length1; PDFDoc *pdfdoc ; XRef *xref ; Dict *trailer ; int n_obj1 ; // Following Libpoppler, `catalog' and `catalog2' give different // interfaces to the same PDF object. The pointer `catalog_obj' is // stored only so that PDF::~PDF() can properly deallocate the thing // it points to, which is probably not directly useful otherwise. 
Object *catalog_obj ; Dict *catalog ; Catalog *catalog2 ; Iref info_iref ; Object *info_obj ; Dict *info ; }; #endif derivations-0.53.20120414.orig/btool/PDF/PDF.cc0000644000000000000000000000764311742566274017021 0ustar rootroot #include "PDF.h" #include #include "PDF_rep.h" int PDF::file_length( const PDF &pdf ) { return pdf.rep->file_length1; } int PDF::offset_last_xref_table( const PDF &pdf ) { return pdf.rep->xref->getLastXRefPos(); } PDF::Iref PDF::iref_catalog( const PDF &pdf ) { XRef *const xref = pdf.rep->xref; return Iref( xref->getRootNum(), xref->getRootGen() ); } PDF::Iref PDF::iref_info( const PDF &pdf ) { return pdf.rep->info_iref; } int PDF::n_obj( const PDF &pdf ) { return pdf.rep->n_obj1; } int PDF::offset( const PDF &pdf, const int i ) { return pdf.rep->xref->getEntry(i)->offset; } int PDF::n_page( const PDF &pdf ) { return pdf.rep->catalog2->getNumPages(); } int PDF::i_page( const PDF &pdf, const Iref iref, const bool do_not_throw ) { const int i = pdf.rep->catalog2->findPage( iref.i, iref.gen ); if (!do_not_throw && !i) throw PDF::Exc_PDF(); return i; } PDF::Iref PDF::iref_page( const PDF &pdf, const int i ) { const Ref *const rp = pdf.rep->catalog2->getPageRef(i); if (!rp) throw PDF::Exc_PDF(); return Iref( rp->num, rp->gen ); } // The programmer does not feel sure that a little memory is not leaking // here. The amount of memory in question is small, and of course the // system reclaims leaked memory at execution's end, anyway, so the leak // if any is not serious; but even if not serious, leaking still is not // neat. The only documentation for Libpoppler appears to consist of // its development headers, which seem insufficiently informative in the // matter. For these reasons, where in doubt, rather than risking // improper deallocation, the code leaks. PDF::PDF::PDF( const std::string &filename_pdf ) : rep( new PDF_rep() ) { { struct stat s; if ( stat( filename_pdf.c_str(), &s ) ) throw Exc_IO(); rep->file_length1 = s.st_size; } { GooString gs( filename_pdf.c_str() ); rep->pdfdoc = new PDFDoc(&gs); if ( !rep->pdfdoc->isOk() ) throw Exc_IO(); } { rep->xref = rep->pdfdoc->getXRef(); if ( !rep->xref->isOk() ) throw Exc_PDF(); } { Object *const obj = rep->xref->getTrailerDict(); if ( !obj->isDict() ) throw Exc_PDF(); rep->trailer = obj->getDict(); } { Object obj; { char s[] = "Size"; rep->trailer->lookup( s, &obj ); } if ( !obj.isInt() ) throw Exc_PDF(); rep->n_obj1 = obj.getInt(); } { rep->catalog_obj = new Object(); rep->xref->getCatalog( rep->catalog_obj ); if ( !rep->catalog_obj->isDict() ) throw Exc_PDF(); rep->catalog = rep->catalog_obj->getDict(); } { rep->catalog2 = rep->pdfdoc->getCatalog(); if ( !rep->catalog2->isOk() ) throw Exc_PDF(); } { Object obj; { char s[] = "Info"; rep->trailer->lookupNF( s, &obj ); } if ( !obj.isRef() ) throw Exc_PDF(); const Ref ref = obj.getRef(); rep->info_iref = Iref( ref.num, ref.gen ); } { rep->info_obj = new Object(); rep->xref->fetch( rep->info_iref.i, rep->info_iref.gen, rep->info_obj ); if ( !rep->info_obj->isDict() ) throw Exc_PDF(); rep->info = rep->info_obj->getDict(); } } PDF::PDF::~PDF() { delete rep->catalog_obj; delete rep->info_obj; // For reasons this programmer does not understand, the Libpoppler PDFDoc // object does not seem to deallocate gracefully. It is allowed to leak for // this reason. 
//delete rep->pdfdoc; delete rep; } PDF::PDF_rep *PDF::PDF::get_PDF_rep( const int magic ) { // The function demands a magic integer precisely to discourage // callers from calling it, and conversely to prevent it from // returning disruptive information to unwitting callers. The integer // serves no other purpose. Its value is not elsewhere documented. // If you must call this function, then supply the integer. (The // integer's value has 1s in the zeroth and fifteenth bits, with six // more 1s scattered randomly across the fourteen places between. It // has no significance.) return magic == 0x9f05 ? rep : 0; } derivations-0.53.20120414.orig/btool/PDF/Makefile0000777000000000000000000000000011742566274022665 2../Makefile-subdirustar rootrootderivations-0.53.20120414.orig/btool/PDF/update_catalog.h0000644000000000000000000000116011742566274021212 0ustar rootroot #ifndef PDF_UPDATE_CATALOG_H #define PDF_UPDATE_CATALOG_H #include "../Page_no/PS_page_numbering.h" #include "PDF.h" // --------------------------------------------------------------------- // To add a PageLabel and a reference to Outlines to the PDF catalog. // --------------------------------------------------------------------- namespace PDF { std::string update_catalog( PDF &pdf, const Page_no::PS_page_numbering &nog ); std::string add_title_to_info( PDF &pdf, const std::string &title ); std::string update_trailer( PDF &pdf, int n_pdf_obj, int offset_xref ); } #endif derivations-0.53.20120414.orig/btool/PDF/update_catalog.cc0000644000000000000000000001371511742566274021361 0ustar rootroot #include "update_catalog.h" #include #include #include #include #include #include #include #include "../TOC/def.h" #include "PDF.h" #include "PDF_rep.h" using std::string; typedef std::set set; const int magic = 0x9f05; // deprecated // Beware that Libpoppler's Dict::add() method does not copy the key-string you // feed it. In an earlier Libpoppler, didn't it used to? Maybe not; I can't // clearly remember. In any case, letting the key-string go out of scope now // invalidates the Dict. ---THB, March 2010--- namespace { // This function copies key-value pairs from source to destination // Poppler Dicts, excepting pairs with the specified keys. // It expects the caller to have preinitialized the destination // as an empty Dict. int copy_but( Dict *const dest, Dict *const src, const set &keys ) { const int size = src->getLength(); for ( int i = 0; i < size; ++i ) { char *const key = src->getKey(i); if ( keys.count(key) ) continue; Object obj; src ->getValNF( i , &obj ); dest->add ( key, &obj ); } return size; } } string PDF::update_catalog( PDF &pdf, const Page_no::PS_page_numbering &nog ) { PDF_rep *const rep = pdf.get_PDF_rep(magic); // To understand this code, refer to the Libpoppler headers and // to Adobe's PDF Reference 1.7, sect. 8.3.1. 
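  // In rough outline (the object numbers and the prefatory page
  // count P below are merely illustrative), the replacement catalog
  // emitted by this function looks roughly like
  //
  //   NNNNNN 0 obj
  //   << ...the old catalog's entries, copied...
  //      /PageLabels << /Nums [ 0 << /S /r >> P << /S /D >> ] >>
  //      /Outlines MMMMMM 0 R
  //   >>
  //   endobj
  //
  // which labels the first P pages with lowercase roman numerals and
  // the remainder with decimal numbers, and which points /Outlines at
  // the first of the outline objects appended after the document's
  // existing objects.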
Object catalog_obj; catalog_obj.initDict(static_cast(0)); Dict *catalog = catalog_obj.getDict(); { set keys; { char s[] = "PageLabels"; keys.insert(s); } { char s[] = "Outlines" ; keys.insert(s); } copy_but( catalog, rep->catalog, keys ); } Object dict_Roman_obj; char s_Roman[] = "S"; { dict_Roman_obj.initDict(static_cast(0)); Dict *const dict_Roman = dict_Roman_obj.getDict(); Object name_Roman; { char s[] = "r"; name_Roman.initName(s); } dict_Roman->add( s_Roman, &name_Roman ); } Object dict_Arabic_obj; char s_Arabic[] = "S"; { dict_Arabic_obj.initDict(static_cast(0)); Dict *const dict_Arabic = dict_Arabic_obj.getDict(); Object name_Arabic; { char s[] = "D"; name_Arabic.initName(s); } dict_Arabic->add( s_Arabic, &name_Arabic ); } Object array_obj; { Object zero; zero.initInt( 0 ); Object n_page; n_page.initInt( nog.count_prefatory_page() ); array_obj.initArray(static_cast(0)); Array *const array = array_obj.getArray(); array->add( &zero ); array->add( &dict_Roman_obj ); array->add( &n_page ); array->add( &dict_Arabic_obj ); } Object dict_obj; char s_Nums[] = "Nums"; { dict_obj.initDict(static_cast(0)); Dict *const dict = dict_obj.getDict(); dict->add( s_Nums, &array_obj ); } Object ref_obj; ref_obj.initRef( n_obj(pdf), 0 ); char s_PageLabels[] = "PageLabels"; char s_Outlines [] = "Outlines" ; catalog->add( s_PageLabels, &dict_obj ); catalog->add( s_Outlines , &ref_obj ); string res; { Iref iref = iref_catalog(pdf); std::ostringstream oss; oss << std::setw(TOC::width_i_obj) << iref.i << " " << iref.gen << " obj\n"; res += oss.str(); } // Do print() to a string rather than to stdout or a file. { int fd[2]; pipe(fd); { FILE *q = fdopen( fd[1], "w" ); catalog_obj.print(q); fclose(q); } { FILE *q = fdopen( fd[0], "r" ); int c; while ( (c=fgetc(q)) != EOF ) res += c; fclose(q); } } res += "\nendobj\n"; return res; } string PDF::add_title_to_info( PDF &pdf, const string &title ) { PDF_rep *const rep = pdf.get_PDF_rep(magic); Object info_obj; info_obj.initDict(static_cast(0)); Dict *info = info_obj.getDict(); { set keys; copy_but( info, rep->info, keys ); } char s_Title[] = "Title"; { Object obj_old; info->lookup( s_Title, &obj_old ); if ( obj_old.isNull() ) { Object obj_new; { GooString gs( title.c_str() ); obj_new.initString( &gs ); } info->add( s_Title, &obj_new ); } } string res; { const Iref iref = iref_info(pdf); std::ostringstream oss; oss << std::setw(TOC::width_i_obj) << iref.i << " " << iref.gen << " obj\n"; res += oss.str(); } // Do print() to a string rather than to stdout or a file. { int fd[2]; pipe(fd); { FILE *q = fdopen( fd[1], "w" ); info_obj.print(q); fclose(q); } { FILE *q = fdopen( fd[0], "r" ); int c; while ( (c=fgetc(q)) != EOF ) res += c; fclose(q); } } res += "\nendobj\n"; return res; } string PDF::update_trailer( PDF &pdf, const int n_pdf_obj, const int offset_xref ) { PDF_rep *const rep = pdf.get_PDF_rep(magic); // To understand this code, refer to the Libpoppler headers and // to Adobe's PDF Reference 1.7, sect. 3.4.4. 
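  // Briefly: the new trailer copies the old trailer's entries except
  // /Size, /Prev and /ID, sets /Size to cover the newly appended
  // objects, and points /Prev at the file's last pre-existing xref
  // table, so that the returned text ends roughly as (the numbers
  // being illustrative)
  //
  //   trailer
  //   << ... /Size 1234 /Prev 567890 >>
  //   startxref
  //   987654
  //   %%EOF
  //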
Object new_trailer_obj; new_trailer_obj.initDict(static_cast(0)); Dict *new_trailer = new_trailer_obj.getDict(); { set keys; { char s[] = "Size"; keys.insert(s); } { char s[] = "Prev"; keys.insert(s); } { char s[] = "ID" ; keys.insert(s); } copy_but( new_trailer, rep->trailer, keys ); } char s_Size[] = "Size"; { Object obj; obj.initInt( n_pdf_obj ); new_trailer->add( s_Size, &obj ); } char s_Prev[] = "Prev"; { Object obj; obj.initInt( offset_last_xref_table(pdf) ); new_trailer->add( s_Prev, &obj ); } string res = "trailer\n"; // Do print() to a string rather than to stdout or a file. { int fd[2]; pipe(fd); { FILE *q = fdopen( fd[1], "w" ); new_trailer_obj.print(q); fclose(q); } { FILE *q = fdopen( fd[0], "r" ); int c; while ( (c=fgetc(q)) != EOF ) res += c; fclose(q); } } res += "\nstartxref\n"; { std::ostringstream oss; oss << offset_xref << '\n'; res += oss.str(); } res += "%%EOF\n"; return res; } derivations-0.53.20120414.orig/btool/PDF/updator.h0000644000000000000000000000146211742566274017721 0ustar rootroot #ifndef PDF_UPDATOR_H #define PDF_UPDATOR_H #include #include "../Page_no/PS_page_numbering.h" #include "../TOC/Table.h" #include "PDF.h" // --------------------------------------------------------------------- // To produce an update string suitable for appending to the PDF. // --------------------------------------------------------------------- // (Is "updator" a word? This programmer does not know, but it fits // nevertheless.) namespace PDF { std::string updator( PDF &pdf, const Page_no::PS_page_numbering &nog, const TOC::Table &toc, const std::string &title = std::string() ); std::string updator( const std::string &filename_pdf, const std::string &filename_ps, const std::string &filename_toc, const std::string &title = std::string() ); } #endif derivations-0.53.20120414.orig/btool/test.cc0000644000000000000000000000023111742566274016740 0ustar rootroot // You can add code to this file as desired // to create a temporary test driver. #include #include int main() { return 0; } derivations-0.53.20120414.orig/btool/Page_no/0000755000000000000000000000000011742566274017026 5ustar rootrootderivations-0.53.20120414.orig/btool/Page_no/README0000644000000000000000000000021411742566274017703 0ustar rootroot Here are source files related to the book's page numbering (i, ii, iii, iv, ... in the prefatory pages; 1, 2, 3, ... in the book's body). derivations-0.53.20120414.orig/btool/Page_no/PS_page_numbering.h0000644000000000000000000000122511742566274022563 0ustar rootroot #ifndef PAGE_NO_PS_PAGE_NUMBERING_H #define PAGE_NO_PS_PAGE_NUMBERING_H #include // --------------------------------------------------------------------- // To extract the page numbering from a PS file. 
// --------------------------------------------------------------------- namespace Page_no { class PS_page_numbering; } class Page_no::PS_page_numbering { private: int n_prefatory_page1; int n_corporeal_page1; public: explicit PS_page_numbering( const std::string &filename_ps ); int count_prefatory_page() const { return n_prefatory_page1; } int count_corporeal_page() const { return n_corporeal_page1; } }; #endif derivations-0.53.20120414.orig/btool/Page_no/Makefile0000777000000000000000000000000011742566274023624 2../Makefile-subdirustar rootrootderivations-0.53.20120414.orig/btool/Page_no/PS_page_numbering.cc0000644000000000000000000001174111742566274022725 0ustar rootroot #include "PS_page_numbering.h" #include #include #include #include #include #include namespace { // The caller must manually free() the buffer this function returns. char *match_dup( const char *const line, const regmatch_t *const pmatch ) { return strndup( line + pmatch->rm_so, pmatch->rm_eo - pmatch->rm_so ); } } Page_no::PS_page_numbering::PS_page_numbering( const std::string &filename_ps ) { using std::string; std::ifstream file_ps( filename_ps.c_str() ); if ( !file_ps ) error( EIO, 0, "cannot read the PS file" ); n_prefatory_page1 = 0; n_corporeal_page1 = 0; regex_t preg_Pages; const size_t nmatch_Pages = 2; const char regex_Pages[] = "^%%Pages:[[:space:]]+([[:digit:]]+)[[:space:]]*$"; regmatch_t pmatch_Pages[nmatch_Pages]; regex_t preg_PageOrder; const size_t nmatch_PageOrder = 2; const char regex_PageOrder[] = "^%%PageOrder:[[:space:]]+([^[:space:]]+)[[:space:]]*$"; regmatch_t pmatch_PageOrder[nmatch_PageOrder]; regex_t preg_Page; const size_t nmatch_Page = 3; const char regex_Page[] = "^%%Page:" "[[:space:]]+([[:digit:]]+)" "[[:space:]]+([[:digit:]]+)" "[[:space:]]*$"; regmatch_t pmatch_Page[nmatch_Page]; if ( regcomp( &preg_Pages , regex_Pages , REG_EXTENDED ) || regcomp( &preg_PageOrder, regex_PageOrder, REG_EXTENDED ) || regcomp( &preg_Page , regex_Page , REG_EXTENDED ) ) error( EPERM, 0, "internal malfunction (cannot compile the PS regexes)" ); int n_page_whole = 0; { bool have_found_Pages = false; bool have_found_PageOrder = false; string line; for (;;) { std::getline( file_ps, line ); if ( file_ps.eof() ) break; const char *const line1 = line.c_str(); if ( !regexec( &preg_Page, line1, nmatch_Page, pmatch_Page, 0 ) ) { if ( !( have_found_Pages && have_found_PageOrder ) ) error( EPERM, 0, "found \"%%%%Page:\" in the PS file " "before finding \"%%%%Pages:\" and \"%%%%PageOrder:\"" ); char *const i_page_part_str = match_dup( line1, &pmatch_Page[1] ); const int i_page_part = atoi( i_page_part_str ); free( i_page_part_str ); char *const i_page_whole_str = match_dup( line1, &pmatch_Page[2] ); const int i_page_whole = atoi( i_page_whole_str ); free( i_page_whole_str ); // At this point, i_page_part and i_page_whole hold a single // page's page numbers with respect, respectively, to a part of // the document and to the whole document. 
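        // For example, in a document having (say) twelve prefatory
        // pages, the line "%%Page: 3 3" denotes prefatory page iii,
        // the part and whole numbers being equal, whereas
        // "%%Page: 2 14" denotes corporeal page 2, the whole number
        // exceeding the part number by the twelve prefatory pages.
        // The checks below enforce exactly this pattern.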
{ const char err_msg[] = "confused by irregular page numbering in the PS file; " "expecting an orderly sequence of corporeal " "(main body) pages " "possibly preceded by an orderly sequence of prefatory pages, " "each sequence beginning from page 1"; if ( i_page_part == i_page_whole ) { if ( n_corporeal_page1 || i_page_part != n_prefatory_page1 + 1 ) error( EPERM, 0, err_msg ); n_prefatory_page1 = i_page_part; } else { if ( i_page_whole != i_page_part + n_prefatory_page1 || i_page_part != n_corporeal_page1 + 1 ) error( EPERM, 0, err_msg ); n_corporeal_page1 = i_page_part; } } } else if ( !regexec( &preg_Pages, line1, nmatch_Pages, pmatch_Pages, 0 ) ) { if ( have_found_Pages ) error( EPERM, 0, "found more than one \"%%%%Pages:\" line in the PS file" ); have_found_Pages = true; char *const n_page_whole_str = match_dup( line1, &pmatch_Pages[1] ); n_page_whole = atoi( n_page_whole_str ); free( n_page_whole_str ); } else if ( !regexec( &preg_PageOrder, line1, nmatch_PageOrder, pmatch_PageOrder, 0 ) ) { if ( have_found_PageOrder ) error( EPERM, 0, "found more than one \"%%%%PageOrder:\" line in the PS file" ); have_found_PageOrder = true; char *const pageOrder_str = match_dup( line1, &pmatch_PageOrder[1] ); if ( strcmp( pageOrder_str, "Ascend" ) ) error( EPERM, 0, "sorry, am (as yet) programmed to handle " "only PS \"%%%%PageOrder: Ascend\"" ); free( pageOrder_str ); } } } if ( !n_corporeal_page1 ) { n_corporeal_page1 = n_prefatory_page1; n_prefatory_page1 = 0; } if ( n_prefatory_page1 + n_corporeal_page1 != n_page_whole ) error( EPERM, 0, "the PS \"%%%%Pages:\" line gives a number " "different than the number of pages actually found" ); } derivations-0.53.20120414.orig/btool/Page_no/Page_number.h0000644000000000000000000000076511742566274021433 0ustar rootroot #ifndef PAGE_NO_PAGE_NUMBER_H #define PAGE_NO_PAGE_NUMBER_H // --------------------------------------------------------------------- // To represent a page number in the book. // --------------------------------------------------------------------- namespace Page_no { enum Book_part { PREFACE = 0, BODY }; struct Page_number; } struct Page_no::Page_number { Book_part part; int i; Page_number( const Book_part part0 = PREFACE, const int i0 = 0 ) : part(part0), i(i0) {} }; #endif derivations-0.53.20120414.orig/btool/romanize.cc0000644000000000000000000000042311742566274017610 0ustar rootroot #include #include #include "Util/roman_numeral.h" // Convert a numeral from Arabic to Roman. int main( int, char **argv ) { for ( ++argv ; *argv; ++argv ) std::cout << int_to_roman( atoi(*argv), Util::LOWER_ROMAN_CASE ) << '\n'; return 0; } derivations-0.53.20120414.orig/btool/Makefile0000644000000000000000000000315211742566274017117 0ustar rootroot prog := complete-pdf romanize srcdir := Util Page_no TOC PDF warn := -Wall -Wextra allobj := $(foreach dir, $(srcdir), \ $(patsubst %.cc, %.o, \ $(filter \ $(patsubst %.h, %.cc, $(wildcard $(dir)/*.h)), \ $(wildcard $(dir)/*.cc) \ ) \ ) \ ) include Makefile-optim warn += $(werror) clean := cleanless clean .PHONY: FORCE all alld allobj $(clean) all : $(prog) # The `alld:' target seems even to the Make program itself to make # nothing, but notice the ifneq below the target. The include directive # therein makes the several %.d implicitly. (The `alld:' maneuver is to # let "make clean [program]" work as expected. Observe however that the # makefile includes the several %.d before it processes any targets, # even the `clean:' target. 
If this is of concern for some # reason---maybe because one has been monkeying with timestamps---then # one must give "make clean" and "make [program]" as separate commands.) alld: ifneq ($(strip $(filter-out $(clean), $(MAKECMDGOALS))),) alld := $(patsubst %.cc, %.d, $(wildcard *.cc)) include $(alld) endif allobj: $(allobj) $(foreach dir, $(srcdir), $(dir)/%): FORCE; $(MAKE) -C $(@D) $(@F) %.d: %.cc; g++ -MM $< | sed -e 's/:/ $*.d:/' >$@ %.o:; g++ $(warn) $(optim) -c $< -o $*.o complete-pdf: complete-pdf.o $(allobj) g++ $(warn) $(optim) -lpoppler $^ -o $@ romanize: romanize.o Util/roman_numeral.o g++ $(warn) $(optim) $^ -o $@ a.out: test.o $(allobj) g++ $(warn) $(optim) -lpoppler $^ -o $@ cleanless: $(foreach dir, $(srcdir), $(MAKE) -C $(dir) clean ;) rm -fv *.d *.o *.gch a.out $(if $(alld), $(MAKE) alld) clean: cleanless; rm -fv $(prog) derivations-0.53.20120414.orig/btool/def.h0000644000000000000000000000060111742566274016362 0ustar rootroot #ifndef DEF_H #define DEF_H // To work around g++'s -I-/-iquote bug, one should // #include this file as "Util/../def.h" rather than // simply as "def.h". // It would be neater to define the following each // as "const string", but it is easier to use // when they are macros; so, macros they are. #define PROGVER "0.1.0" #define BUG_EMAIL "thb@derivations.org" #endif derivations-0.53.20120414.orig/btool/TOC/0000755000000000000000000000000011742566274016103 5ustar rootrootderivations-0.53.20120414.orig/btool/TOC/README0000644000000000000000000000010111742566274016753 0ustar rootroot Here are source files related to the book's table of contents. derivations-0.53.20120414.orig/btool/TOC/Makefile0000777000000000000000000000000011742566274022701 2../Makefile-subdirustar rootrootderivations-0.53.20120414.orig/btool/TOC/def.h0000644000000000000000000000075611742566274017022 0ustar rootroot #ifndef TOC_DEF_H #define TOC_DEF_H #include "Sect_level.h" namespace TOC { const int width_i_obj = 6; const Sect_level level_open_in_outline = CHAPTER; // If suppress_emph is true, then the PDF table of contents // will nonrecursively render any LaTeX "\emph{foo}" // in a section title as "foo". (It will not alter // constructs like "\mbox{\emph{foo}}" or even "{\emph{foo}}"; // it touches \emph only in the outermost scope.) 
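  // For instance, a *.toc title written as "The \emph{Gamma} function"
  // would be rendered simply as "The Gamma function".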
const bool suppress_emph = true; } #endif derivations-0.53.20120414.orig/btool/TOC/Sect_level.h0000644000000000000000000000033511742566274020342 0ustar rootroot #ifndef TOC_SECT_LEVEL_H #define TOC_SECT_LEVEL_H namespace TOC { enum Sect_level { ROOT = 0, PART, CHAPTER, SECTION, SUBSECTION, SUBSUBSECTION, PARAGRAPH, SUBPARAGRAPH }; } #endif derivations-0.53.20120414.orig/btool/TOC/Table.cc0000644000000000000000000002147711742566274017454 0ustar rootroot #include "Table.h" #include #include #include #include #include #include #include #include "def.h" #include "../Util/roman_numeral.h" #include "../Util/pdf_stringize.h" using std::string; using Util::TeX_atom; using Util::TeX_atom_nonterminal; using Util::TeX_atom_terminal; TOC::Item *TOC::next_in_seq( const Item &item, const bool descend ) { if ( descend && item.first ) { return item.first; } else if ( item.next ) { return item.next ; } else if ( item.parent ) { return next_in_seq( *item.parent, false ); } return 0; } int TOC::count_children_recursively( const Item &item ) { int n = 0; for ( Item *p = item.first; p; p = p->next ) n += count_children_recursively(*p) + 1; return n; } int TOC::count_items( const Table &table ) { return count_children_recursively( *table.root() ) + 1; } std::string TOC::pdf_object_str( const PDF::PDF &pdf, const Page_no::PS_page_numbering &nog, const Item &item, const int offset, const Offset_mode offset_mode ) { using std::setw; class I_obj { private: const PDF::PDF &pdf; const Page_no::PS_page_numbering &nog; const int offset; const Offset_mode offset_mode; public: I_obj( const PDF::PDF &pdf0, const Page_no::PS_page_numbering &nog0, const int offset0, const Offset_mode offset_mode0 ) : pdf(pdf0), nog(nog0), offset(offset0), offset_mode(offset_mode0) {} int operator()( const Item *const item ) { return item->i() + offset + ( offset_mode == OFFSET_RELATIVE ? n_obj(pdf) : 0 ); } } i_obj( pdf, nog, offset, offset_mode ); std::ostringstream oss; const int &w = TOC::width_i_obj; if ( item.i() ) { { const string sect = item.sect_no(); const string sect_and_title = ( sect.length() ? sect + " " : "" ) + item.title(); oss << setw(w) << i_obj( &item ) << " 0 obj\n" << "<<\n" << " /Title " << Util::pdf_stringize(sect_and_title) << "\n"; } oss << " /Parent " << setw(w) << i_obj( item.parent ) << " 0 R\n"; if ( item.prev ) oss << " /Prev " << setw(w) << i_obj( item.prev ) << " 0 R\n"; if ( item.next ) oss << " /Next " << setw(w) << i_obj( item.next ) << " 0 R\n"; if ( item.first && item.last ) { oss << " /First " << setw(w) << i_obj( item.first ) << " 0 R\n" << " /Last " << setw(w) << i_obj( item.last ) << " 0 R\n"; const int count = count_children_recursively( item ); oss << " /Count " << setw(w) << ( item.level() < level_open_in_outline ? +count : -count ) << "\n"; } { const Page_no::Page_number i_page = item.i_page(); const int i_page_whole = ( i_page.part == Page_no::BODY ? 
nog.count_prefatory_page() : 0 ) + i_page.i; PDF::Iref iref = PDF::iref_page( pdf, i_page_whole ); oss << " /Dest [ " << setw(w) << iref.i << ' ' << iref.gen << " R /XYZ null null null ]\n"; } oss << ">>\n" << "endobj\n"; } else { oss << setw(w) << i_obj( &item ) << " 0 obj\n" << "<<\n" << " /Type /Outlines\n"; if ( item.first && item.last ) oss << " /First " << setw(w) << i_obj( item.first ) << " 0 R\n" << " /Last " << setw(w) << i_obj( item.last ) << " 0 R\n" << " /Count " << setw(w) << count_children_recursively( item ) << "\n"; oss << ">>\n" << "endobj\n"; } return oss.str(); } void TOC::Item::clear_obj() { level1 = ROOT; sect_no1 = ""; title1 = ""; i_page1 = Page_number(); parent = prev = next = first = last = 0; } // As an example, here is a typical line from a LaTeX TOC file: // \contentsline {subsection}{\numberline {1.2.1}Axiom and definition}{2} void TOC::Item::init( const TeX_atom_nonterminal &atom ) { clear_obj(); TeX_atom_nonterminal ::const_iterator p = atom.begin(); const TeX_atom_nonterminal::const_iterator p_end = atom.end (); // Get the \contentsline command name. if ( !( p != p_end && (*p)->is_terminal() && (*p)->term() == "\\contentsline " ) ) throw Exc(); // Read the section's TOC level. if ( !( ++p != p_end && !(*p)->is_terminal() ) ) throw Exc(); { const string l = (*p)->term(); if ( l == "{part}" ) level1 = PART ; else if ( l == "{chapter}" ) level1 = CHAPTER ; else if ( l == "{section}" ) level1 = SECTION ; else if ( l == "{subsection}" ) level1 = SUBSECTION ; else if ( l == "{subsubsection}" ) level1 = SUBSUBSECTION; else if ( l == "{paragraph}" ) level1 = PARAGRAPH ; else if ( l == "{subparagraph}" ) level1 = SUBPARAGRAPH ; else throw Exc(); // The next line is not necessarily permanent. It tells the code // to ignore the book's parts, regarding only chapters, sections // and so forth. if ( level1 == PART ) throw Exc(); } // Read the section's number and title. if ( !( ++p != p_end ) ) throw Exc(); { // The collect() functor collects and stringizes // all remaining available tokens. struct { string operator()( TeX_atom_nonterminal ::const_iterator *const qp, const TeX_atom_nonterminal::const_iterator q_end ) { string res; for ( ; *qp != q_end; ++*qp ) { if ( suppress_emph && (**qp)->term() == "\\emph " ) { TeX_atom_nonterminal::const_iterator r = *qp + 1; if ( r != q_end ) { ++*qp; const string t = (*r)->term(); res += (*r)->is_terminal() ? t : t.substr( 1, t.length()-2 ); } } else res += (**qp)->term(); } return res; } } collect; TeX_atom_nonterminal *d = dynamic_cast(*p); if (d) { TeX_atom_nonterminal ::const_iterator q = d->begin(); const TeX_atom_nonterminal::const_iterator q_end = d->end (); if ( q != q_end && (*q)->is_terminal() && (*q)->term() == "\\numberline " ) { if ( !( ++q != q_end ) ) throw Exc(); { const string t = (*q)->term(); sect_no1 = (*q)->is_terminal() ? t : t.substr( 1, t.length()-2 ); } title1 += collect( &++q, q_end ); } // This "else" is for the Preface's TOC listing, // which has no section number. title1 += collect( &q, q_end ); } else title1 = (*p)->term(); // unlikely } // Read the page number. if ( !( ++p != p_end && !(*p)->is_terminal() ) ) throw Exc(); { const string t = (*p)->term(); const string s = t.substr( 1, t.length()-2 ); if ( s.end() != s.begin() && isdigit(s[0]) ) { i_page1.part = BODY; i_page1.i = atoi( s.c_str() ); } else { i_page1.part = PREFACE; i_page1.i = Util::roman_to_int( s.c_str() ); } } // Verify that nothing follows except maybe a TeX comment. 
if ( !( ++p == p_end || ( (*p)->is_terminal() && (*p)->term() == " " && ( ++p == p_end || ( (*p)->is_terminal() && (*p)->term() == "%" ) ) ) ) ) throw Exc(); } int TOC::Item::delete_children() { int n = 0; { Item *pnext = 0; for ( Item *p = first; p; p = pnext ) { n += p->delete_children(); pnext = p->next; delete p; ++n; } } first = last = 0; return n; } TOC::Table::Table( const string &filename_toc ) { root1 = new Item(); std::ifstream file_toc( filename_toc.c_str() ); if ( !file_toc ) error( EIO, 0, "cannot read the TOC file" ); // Build the tree. (Node-tree code is always dense, // but commenting it really does not help much. The clearest // explanation of such code is probably the code itself.) { Item *target = root1; int i = 1; for (;;) { string line; std::getline( file_toc, line ); if ( file_toc.eof() ) break; Item *item; try { item = new Item( line, i ); } catch ( Item::Exc ) { continue; } ++i; while ( target->level() >= item->level() ) { target = target->parent; if ( !target ) error( EINVAL, 0, "internal malfunction " "(generated a loose TOC::Item)" ); } item->parent = target; if ( target->first ) { item ->prev = target->last; target->last->next = item; target->last = item; } else target->first = target->last = item; target = item; } } } TOC::Table::Table() { root1 = new Item(); } TOC::Table::~Table() { root1->delete_children(); delete root1; } derivations-0.53.20120414.orig/btool/TOC/Table.h0000644000000000000000000000546311742566274017313 0ustar rootroot #ifndef TOC_TABLE_H #define TOC_TABLE_H #include #include "../Util/TeX_atom.h" #include "../Page_no/Page_number.h" #include "../Page_no/PS_page_numbering.h" #include "Sect_level.h" #include "../PDF/PDF.h" // --------------------------------------------------------------------- // The book's table of contents. // --------------------------------------------------------------------- namespace TOC { using Page_no::Book_part; using Page_no::PREFACE; using Page_no::BODY; using Page_no::Page_number; class Item; class Table; // The function next_in_sequence() lets one treat a tree of TOC_items // as a flat sequence. As the name suggests, for each TOC_item in the // sequence, next_in_sequence() returns a pointer to the next // TOC_item, or null if none is next. It flattens the sequence // depth-first. Beginning at the tree's root and following the // sequence therefrom, calling next_in_sequence() repeatedly, one // eventually encounters each TOC_item in the tree exactly once. Item *next_in_seq( const Item &item, bool descend = true ); int count_children_recursively( const Item &item ); int count_items( const Table &table ); enum Offset_mode { OFFSET_RELATIVE = 0, OFFSET_ABSOLUTE }; std::string pdf_object_str( const PDF::PDF &pdf, const Page_no::PS_page_numbering &nog, const Item &item, const int offset = 0, const Offset_mode offset_mode = OFFSET_RELATIVE ); } class TOC::Item { private: Sect_level level1; std::string sect_no1; std::string title1; Page_number i_page1; const int i1; void clear_obj(); void init( const Util::TeX_atom_nonterminal &atom ); public: struct Exc {}; // One could hide the pointers the next line defines, // but to what purpose? The standard C++ practice of hiding // information seems unhelpful in this particular case. 
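// (Descriptive note: as the Table constructor in Table.cc wires these
// links, parent points to the enclosing section's Item, prev and next
// chain siblings at the same level, and first and last bracket this Item's
// own children. A minimal traversal sketch, assuming a populated Table
// named table:
//     for ( Item *p = table.root(); p; p = next_in_seq( *p ) )
//         ; // visits each Item exactly once, depth-first
// )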
Item *parent, *prev, *next, *first, *last; Sect_level level () const { return level1 ; } std::string sect_no() const { return sect_no1; } std::string title () const { return title1 ; } Page_number i_page () const { return i_page1 ; } int i () const { return i1 ; } explicit Item( const Util::TeX_atom_nonterminal &atom, const int i0 = 0 ) : i1(i0) { init( atom ); } explicit Item( const std::string &line, const int i0 = 0 ) : i1(i0) { init( Util::TeX_atom_nonterminal( line ) ); } Item() : i1(0) { clear_obj(); } int delete_children(); }; class TOC::Table { private: Item *root1; public: Item *root() const { return root1; } explicit Table( const std::string &filename_toc ); Table(); ~Table(); }; // One could define copy constructors, copy assignors, etc., but for the // purpose at hand such do not seem to be needed. #endif derivations-0.53.20120414.orig/btool/fill-toc-ends0000755000000000000000000000130611742566274020044 0ustar rootroot#! /bin/bash -e # The script takes two filenames as arguments: the *.toc and *.llg files # LaTeX has generated, the latter on special instructions in the book's source. # From the *.llg, it reads the page numbers of the book's table of contents, # bibliography and index, writing to stdout an updated *.toc file with these # entries added. [[ $# == 2 ]] || false OLDTOC=$1 PAGE_NO=$2 PROGDIR=$( dirname $0 ) TEXDIR=$PROGDIR/../tex function page_no { sed -ne \ "s/^[[:space:]]*$1[[:space:]]\\+\\([^[:space:]]\\+\\)[[:space:]]*\$/\\1/p;T;q" \ $PAGE_NO } function toc_line { echo "\\contentsline {chapter}{$1}{$( page_no $1 )}" } toc_line Contents cat $OLDTOC toc_line Bibliography toc_line Index derivations-0.53.20120414.orig/btool/Makefile-subdir0000644000000000000000000000107311742566274020405 0ustar rootroot # This makefile is meant to be used only when accessed # through a symbolic link from an immediate subdirectory. warn := -Wall -Wextra include ../Makefile-optim warn += $(werror) clean := cleanless clean .PHONY: all alld $(clean) all : alld: ifneq ($(strip $(filter-out $(clean), $(MAKECMDGOALS))),) alld := $(patsubst %.cc, %.d, $(wildcard *.cc)) include $(alld) endif %.d: %.cc; g++ -MM $< | sed -e 's/:/ $*.d:/' >$@ %.o:; g++ $(warn) $(optim) -c $< -o $*.o cleanless: rm -fv *.d *.o *.gch a.out $(if $(alld), $(MAKE) alld) clean: cleanless; rm -fv $(prog) derivations-0.53.20120414.orig/doc/0000755000000000000000000000000011742576546015107 5ustar rootrootderivations-0.53.20120414.orig/doc/README0000644000000000000000000000006111742566274015761 0ustar rootroot This directory contains package documentation. derivations-0.53.20120414.orig/doc/changelog0000644000000000000000000021662011742576546016770 0ustar rootrootderivations (0.53.20120414) * Generally reorganized the book's development toward, presentation of, and plan for the mathematics of special functions, an effort which involved fairly extensive changes in, among others, + tex/integ.tex (The integral), + tex/fours.tex (The Fourier series), + tex/fouri.tex (The Fourier transform), + tex/prob.tex (Probability), and + tex/stub.tex; plus various minor changes in other parts of the book. Because of this reorganization, advanced the book's version number to 0.53. * Renamed the book's third part as "Transforms and special functions." * Rewrote the book's preface. * In tex/alggeo.tex (Classical algebra and geometry), renamed "Triangle area" as "The area of a triangle." 
* As a temporary workaround to aid the book's packaging on Debian (and maybe on other platforms), borrowed the following files from ftp://ftp.ctan.org/tex-archive/macros/latex/contrib/xkeyval/run/: keyval.tex; pst-xkey.sty; pst-xkey.tex; xkeyval.sty; xkeyval.tex; xkvltxp.sty; xkvtxhdr.tex; xkvview.sty. -- Thaddeus H. Black Sat, 14 Apr 2012 00:00:00 +0000 derivations (0.52.20100629) * Did not release this version, which represented a internal stage for convenience of development. * Added to the preface a paragraph suggesting the book's standing with respect to a collegiate curriculum. * Extended the very end of tex/intro.tex (Introduction), adding two paragraphs to warn the reader fairly that the book can sometimes make dry reading. * In tex/alggeo.tex (Classical algebra and geometry), "Functions," added a paragraph introducing function inverses and their notation. * In tex/cexp.tex (The complex exponential): + Shortened the beginning of the chapter's opening paragraph. + Divided "Complex trigonometrics" into subsections, adding the new subsection "Inverse complex trigonometrics." Included the new subsection's results in the existing table "Complex exponential properties," reformatting the table to accommodate. + Propagated Edward Feser's geographical orthography, correcting "Occam" to "Ockham," and (to the extent to which the author grasps it) propagated Feser's contra-Ockham credit to Aristotle. * In tex/integ.tex (The integral), explained better in "Multiple integrals" why formal tests for convergence are sometimes not recommended in applications. * In tex/taylor.tex (The Taylor series): + In "Expanding functions in Taylor series," substantially rewrote the long footnote on the Taylor series' sufficiency. (Ordinarily this changelog tends not mention changes to mere footnotes, but the peculiar footnote in question had been excluded from the main narrative for an unusual reason. In any event, the change seemed worth mentioning here.) + Added a non-Taylor-based division to "Odd and even functions." * Ended tex/inttx.tex (Integration techniques) with a forward reference to the new tex/fouri.tex (The Fourier transform), drawing the reader's attention to the cylindrical technique introduced in the latter chapter to integrate the Gaussian pulse. * In tex/gjrank.tex (Rank and the Gauss-Jordan), extended and generally clarified the subsection "The impossibility of identity-matrix promotion." * In tex/eigen.tex (The eigenvalue), "The eigenvalue itself," explained why a characteristic polynomial cannot lack full order. * In tex/fours.tex (The Fourier series): + Slightly improved the style of the chapter's introduction. + Substantially rewrote "Derivation of the Fourier-coefficient formula." * Continued, generally revised and completed the draft of the new chapter tex/fouri.tex (The Fourier transform). * Opened the new chapter tex/prob.tex (Probability). Completed a very rough draft of the chapter. * Extended tex/purec.tex (A sketch of pure complex theory) to explain why a Taylor series remains everywhere valid out the the distance of the nearest nonanalytic point. Also, removed from this appendix an unnecessary protest against unnecessity. 
* Consistently hyphenated well-adjectival and ill-adjectival constructs throughout the book, and consistently unhyphenated them where the constructs were well adjectival and ill adjectival, the stylistic difference (as the writer has belatedly come clearly to understand; doesn't the publisher have a competent editor on staff discreetly to teach the writer these things?!) being that the hyphenated compounds immediately modify their nouns without the mediation of a copular verb like "to be." * In tex/thb.sty, regulated the internal spacing in the four-dot ellipsis \mdots. * In tex/Makefile, caused the already implemented modification of LaTeX's 'book' class to columnize the lists of tables and figures like the contents. * In btool/fill-toc-ends and tex/main.tex: + Changed the "List of Tables" to the "List of tables," and likewise for the "List of Figures." + Caused the PDF table of contents (that is, the special table of contents which is not on any page of the book but which a PDF reader can display in a separate panel) to list the book's lists of tables and figures. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Tue, 29 Jun 2010 00:00:00 +0000 derivations (0.52.20100310) * Recognizing that recent versions of some of the software the computer uses to build the book's PostScript and PDF files have changed in operation, updated the code in btool/ as follows. + Since a recent version of Ghostscript seems to refuse to propagate the document's embedded title to the PDF, added to complete-pdf(1) a fourth command-line argument to specify a title to post-embed. Changed several *.h and *.cc files in the btool/ hierarchy to this end. + Observing that Libpoppler's Dict::add() does not copy the key-strings one feeds it (and seeming to recall that an earlier Libpoppler did copy them), rewrote the btool/ code to preserve the relevant key-strings until the Dict objects that use them go out of scope. + Moved the definition of the PDF::Iref struct from btool/PDF/PDF.h to its own, new header at btool/PDF/Iref.h. + Fixed complete-pdf(1)'s long-form options and help list. * Let tex/Makefile use the new complete-pdf(1). * Again refined the preface. * In tex/alggeo.tex (Classical algebra and geometry), "Multiplication and division of complex numbers in rectangular form," appended a sentence on real and imaginary parts by superposition. * In tex/fours.tex (The Fourier series), "The square and triangular pulses," mentioned the Gaussian pulse. * Began the new chapter tex/fouri.tex (The Fourier transform). * Deleted much of tex/greek.tex (The Greek alphabet), which was too prolix. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Wed, 10 Mar 2010 00:00:00 +0000 derivations (0.52.20090107) * Added hyphens throughout in "singular-value decomposition." * In a continuing gradual refinement, further tweaked the book's preface. (The preface is important because readers first opening the book will judge the book's style by it. This is why the preface continues to get attention, a little here and a little there.) * To comply with van der Vorst, reversed the definition of the residual throughout the book but particularly in tex/taylor.tex (The Taylor series) and tex/mtxinv.tex (Inversion and orthonormalization). * In tex/mtxinv.tex (Inversion and orthonormalization), "The Moore-Penrose pseudoinverse," corrected the inadvertently nonsensical symbolization of the squared residual norm. 
* Added the brief new appendix tex/purec.tex (A sketch of pure complex theory) (four pages only! it is an exceptionally brief sketch). Let other parts of the book refer to it where appropriate. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Wed, 07 Jan 2009 00:00:00 +0000 derivations (0.52.20081219) * Had tex/thb.sty use the package amssymb, which provides useful mathematical symbols including some, like \triangleq, \lesssim and \gtrsim, the book had previously tried to compose of existing elements. Substituted the newly available symbols generally into the book. * Added to tex/thb.sty the mathematical functions Sa(z), Si(z), Ci(z) and Ei(z), respectively by the new \sinarg, \sinint, \cosint and \expint macros. * In many formulas, improved the spacing about equal signs and other elements. Mostly, by substituting the {split} rather than the {eqnarray} environment, narrowed the space about equal signs where it seemed appropriate to do so. (When the author first started typesetting the book, he did not know about the {split} environment. Later he adopted it only fitfully. This change was not to eliminate {eqnarray}, which has its uses, but rather to go back and use {split} where it ought to have been used all along.) * Corrected several (but purposely not all) grammatically incorrect uses of the noun "example" to the adjectival "exemplary" and the genitival "example's." * Changed the types of several pairs of parentheses between (), [] and {}. * Renamed the book's Part III from "Advanced calculus" to "Transforms, special functions and other topics." * Feeling that the book's preface, tex/pref.tex, had much good material not woven tightly enough, reorganized the preface, separating (a) comments on the book's features (footnotes, etc.) and (b) acknowledgements into a distinct, unnumbered subsection. Acknowledged the author's parents there, boldly suggested that other authors cite the book, and otherwise modified the preface in a few minor respects. * In tex/alggeo.tex (Classical algebra and geometry), renamed the section "Notation for series sums and products" as "Integer and series notation." In the section, defined the $\in \mathbb Z$ notation. * In several spots throughout the book, explicitly stated by the $\in \mathbb Z$ notation that certain quantities were integers. * In tex/trig.tex (Trigonometry) and tex/inttx.tex (Integration techniques), extended the complex triangle sum inequality to the continuous case. * In tex/drvtv.tex (The derivative), revised the section "The binomial theorem" in several respects. * In tex/taylor.tex (The Taylor series): + In "Expanding functions in Taylor series," inserted into an existing footnote a new paragraph formalizing the footnote's justification of the Taylor series' sufficiency. + In "Taylor series for specific functions," appended a brief paragraph giving first-order approximate forms for the exponential and sine functions. + In "Geometric majorization," subscripted \rho_n, formerly styled merely as \rho. Further refined the subsection generally. + In "Calculation outside the fast convergence domain," observed explicitly that one had to have a fast means of calculating the function in question within at least one domain to apply the subsection's technique. * In tex/inttx.tex (Integration techniques): + In "Integration by closed contour," extended the complex triangle inequality to the continuous case. + In "The derivatives of a rational function," changed the index letters. 
* In several instances in the matrix chapters, wrote together on a single line a group of short equations each of which had previously had its own line. * In tex/vector.tex (Vector analysis): + Rewrote the latter part of the section "Orthogonal bases." + In "The parabola," changed $\alpha x^2$ to $\mu x^2$, since the chapter uses \alpha for an unrelated purpose. + Generally improved the style of some prose scattered across the section "Parabolic coordinates." * In tex/vcalc.tex (Vector calculus), renamed the subsection "The vector field as a sum of scalar fields" as "The vector field and its scalar components." * Completed tex/vcalc.tex (Vector calculus). * Added tex/fours.tex (The Fourier series) and drafted the whole chapter. * Opened a file for (and wrote a few words in) tex/fouri.tex (The Fourier and Laplace transforms). * Made the use of quotation marks more consistent in the bibliography. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Fri, 19 Dec 2008 00:00:00 +0000 derivations (0.52.20081003) * Divided the book into three parts, using the \part{} command in main.tex. Because of this division, advanced the version number to 0.52. * Also in tex/main.tex, anticipating the first edition's future release, added a (presently) commented-out "first printing" line for the bottom of the title page's reverse side. * Reconfigured the several Makefiles in the btool/ directory not to die on C++ compiler warnings when the BUILD_FOR_PACKAGING environment variable is set. * In btool/TOC/Table.cc, told the PDF table-of-contents code to ignore the book's new \part{} parts. (At present, the code mishandles the parts. Maybe this can be fixed at a future date but, in any case, to ignore the parts may be the right thing to do in any case.) * Added to tex/thb.sty the abbreviating macro \pl for \partial. * In tex/pref.tex (Preface): + Added some comments to show changes the author should make to the file before the book's actual, first-editional release. + Removed an unnecessary sentence from the end of the paragraph on the book's plan. + Extended and polished the remarks regarding the book's index. * In tex/trig.tex (Trigonometry): + In "Rotation," substituted \phi for \psi as the rotation angle, so as not to conflict with the Tait-Bryan \psi newly treated in tex/vector.tex (Vector analysis). Added a reference to the latter chapter. + In "Cylindrical and spherical coordinates," added several identities. + Rendered "The complex triangle inequalities" more precise. * In tex/drvtv.tex (The derivative), added the new subsection "A derivative product pattern." * In tex/integ.tex (The integral), changed several incorrect instances of various forms of the verb "to transit" to "to commute," including in one section title. * In tex/taylor.tex (The Taylor series), "Error bounds," and elsewhere in the book, let the symbols S_n and R_n consistently stand for the partial sum and remainder of a series truncated before rather than after the nth-order term. + In tex/mtxinv.tex (Inversion and orthonormalization), added the new section "The complex vector triangle inequalities." + Divided the chapter tex/vector.tex (Vector analysis) into a relatively short chapter with the same name and filename and the longer, new chapter tex/vcalc.tex (Vector calculus). Revised and completed tex/vector.tex. Revised and continued tex/vcalc.tex through the end of the subsection "Derivatives in spherical coordinates." 
+ Altered the short form of the title of the appendix tex/hex.tex to read "Hexadecimal notation, et al." + Reworked a tract of tex/hist.tex (Manuscript history) to remove, shorten or de-emphasize comparisons against some other authors like Weisstein. Rewove some of the surrounding text in the process. * Changed some instances of the words "to the extent that" to "to the extent to which." * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Fri, 03 Oct 2008 00:00:00 +0000 derivations (0.51.20080419) * In btool/Makefile and btool/Makefile-subdir, reversed $(filter) to $(filter-out) to enable "make clean [program]" as a single command to work as expected. Also told the `cleanless:' target to remake any %.d files it removes when (and only when) the makefile itself has $(include)d them, so that "make clean [program]" not mysteriously seem to have used invisible %.d files. * Revised and continued tex/vector.tex (Vector analysis). Added the new section "Integral forms." * In tex/stub.tex (Plan), reported the recent download rate of 4000 copies of the book per year. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Sat, 19 Apr 2008 00:00:00 +0000 derivations (0.51.20080324) * (Debian) Removed an unnecessary, malformed groff comment from the man page doc/derivations.7. * Added a specific `derivations.ps:' target to tex/Makefile, including there a sed(1) command to change the internally embedded PostScript document title, hence also indirectly the internally embedded PDF document title, to the author's name, rather than the cryptic default "derivations.dvi" it had been. * Replaced tex/use.tex and tex/def.tex with the new tex/thb.sty, a proper LaTeX style file. Told tex/main.tex to use it and tex/Makefile to rely on it. * In tex/trig.tex (Trigonometry) and tex/vector.tex (Vector analysis), disambiguated the amplitude of a vector (generally a complex scalar) from the magnitude of a vector (a positive, real scalar). * In tex/drvtv.tex (The derivative), "The Leibnitz notation," defined the partial derivative, thus correcting an oversight. * Revised and continued tex/vector.tex (Vector analysis). * In tex/stub.tex (Plan), appended to the plan the proposition of a chapter "Information theory." Changed the name of the proposed chapter "Kepler's laws" to "Differential geometry and Kepler's laws." * Deleted altogether from the source several disused passages already commented out. * Edited the rest of the book in further, minor ways. * Noted for historical interest that the book had 488 pages, with fourteen chapters and a fifteenth partly typeset, plus appendices, a preface and other such auxiliary parts. -- Thaddeus H. Black Mon, 24 Mar 2008 00:00:00 +0000 derivations (0.51.20080307) * Because the book had begun to progress beyond the matrix chapters, advanced the version number to 0.51. (To a Debian developer, version 0.9 of something is followed by version 0.10, but such numbering naturally can confuse people who unlike the author happen not to be Debian developers. And even in the Debian world, version 0.9 implies "almost complete," which this book is not. Since even to a Debian developer 51 is unambiguously greater than 5, there seems little reason not to conserve the number line between 0.50 and 0.90 by adding the extra digit. The only version number that has a real, semantical meaning, anyway, is 1.0; as for the rest, so long as each number is greater than the last, the exact choice of numbers does not matter much.) 
* In tex/Makefile, revised the $(gs) macro to create the empty file tex/cidfmap when needed, and revised the `cleanless' rule to remove the same file. Consequently, removed the empty file tex/cidfmap from the distributed source. * In tex/README, noted that index entries can run 9.0 points too wide without unduly ill effect. * Began to typeset the new chapter tex/vector.tex (Vector analysis), up through its section "Notation." * In tex/cexp.tex (The complex exponential,) "Complex trigonometrics," explained why the Pythagorean theorem for trigonometrics does not involve a complex conjugate as the Pythagorean theorem for vectors does. * In tex/taylor.tex (The Taylor series), slightly revised "Calculation outside the fast convergence domain." * In tex/matrix.tex (The matrix), "The Kronecker delta," added a reference to the new section "The Kronecker delta and the Levi-Civita epsilon" of tex/vector.tex (Vector analysis). * Reformed stub.tex (now titled ``Plan'') into a proper plan for the rest of the book. * Revised and extended tex/greek.tex (The Greek alphabet) generally. * In the appendix tex/hist.tex (Manuscript history), extended the remarks concerning Wikipedia. * Throughout the book, in the notation for the alternate, x- and y-oriented cylindrical and spherical bases, lifted the subscripted "x" and "y" up to the superscript position. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Fri, 07 Mar 2008 00:00:00 +0000 derivations (0.5.20071222) * Integrating patches from Debian downstream, edited the btool/ code in several spots, altering no semantics but eliminating warnings by newer G++ compilers, mostly regarding a C string literal's type. (Maybe the warnings show deficiencies in the libpoppler headers rather than in the btool/ code, but, anyway, the change eliminates the warnings.) * In btool/Page_no/PS_page_numbering.cc, added a missing initializer to int n_page_whole. * Generally reviewed, slightly revised and where necessary corrected tex/mtxinv.tex (Inversion and orthonormalization). Notably, in "The Gram-Schmidt kernel formula," corrected the algorithm's handling of indices. * Generally reviewed, slightly revised and where necessary corrected tex/eigen.tex (The eigenvalue). * In tex.eigen.tex (The eigenvalue), rewrote "The singular value decomposition." * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Sat, 22 Dec 2007 00:00:00 +0000 derivations (0.5.20071110) * In tex/eigen.tex (The eigenvalue): + In "The nondiagonalizable matrix," explained in a footnote why the Schur U_S has the same multiplicities of eigenvalues as does the A it comes from. (Maybe this belongs in the body of the text, but it does not flow well there as the body currently is written.) + In "The singular value decomposition," explained the step to $I_n\Sigma^{*}\Sigma I_n$ more thoroughly. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Sat, 10 Nov 2007 00:00:00 +0000 derivations (0.5.20071108) * In btool/fill-toc-ends, tex/main.tex and tex/Makefile, corrected the PDF table-of-contents regimen to create and use a new page-number file tex/main.pgn at build time, rather than trying to glean the needed page numbers unreliably from the *.log file. * Corrected "this is because" to "this is so because" throughout the manuscript. * Gave Richard Courant and David Hilbert their full names in the narrative and bibliography. 
* In tex/drvtv.tex (The derivative), "The Newton-Raphson iteration," slightly improved the form of the iteration formula, letting the specification apply to the formula's entire right side; and referred to the new Newton-Raphson section in tex/mtxinv.tex (Inversion and orthonormalization). * In tex/taylor.tex (The Taylor series): + Corrected the title of "The power series expansion of 1/(1-z)^{n+1}" to "The power-series expansion...." + Added the new "Extrema over a complex domain." + Generalized and rewrote "Calculation outside the fast convergence domain." * Throughout the matrix chapters, where displayed matrices were stacked, either split the stack with appropriate verbiage or combined the stack into a single LaTeX equation environment of the appropriate kind, to prevent LaTeX from splitting the stack at a page boundary. (This is not a wholly satisfactory solution. The stacks are tall. They leave some pages pretty empty. Keeping the stack together however seems preferable to beginning a page with a displayed equation.) * In tex/matrix.tex (The matrix): + In "The general scaling operator," introduced the "diag" notation. + In "The unit triangular matrix," explained and symbolized the Schur-form triangular matrices better. + Added "The Jacobian derivative." * In tex/gjrank.tex (Rank and the Gauss-Jordan), at the close of the section "The Gauss-Jordan" decomposition, warned the reader to expect also the Gram-Schmidt and Schur decompositions in the following chapters. * Extended, generally reorganized and significantly rewrote the two chapters tex/mtxinv.tex (Inversion and orthonormalization) and tex/eigen.tex (The eigenvalue). Even moved some material from one chapter to the other. Retitled the former chapter, whose old name was "Matrix inversion." * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Thu, 08 Nov 2007 00:00:00 +0000 derivations (0.5.20071015) * Prepared this revision as an editing checkpoint for internal use only, not for release. Did not release it. (The reason to do such a thing is that the author habitually uses diff(1) to check the changes each new revision brings against the revision before it. This is hard to do when the chapter/file structure of the source changes from one revision to the next. This revision more or less isolates changes that disrupt the author's use of diff(1).) * Cancelled the planned chapter tex/mtxalg.tex (Matrix algebra), moving its section "The pseudoinverse" to the end of the newly constituted tex/mtxinv.tex (Matrix inversion). * Split the existing chapter tex/eigen.tex (Inversion and eigenvalue) in two: tex/mtxinv.tex (Matrix inversion); tex/eigen.tex (The eigenvalue). Planned to add to these chapters some of the material earlier planned for the now canceled "Matrix algebra" chapter. * Conventionalized the book's terminology to read "unit triangular matrix" where it had read "triangular matrix." Added the standard definition of the "triangular matrix," and added definitions of the "strictly triangular matrix" and the "diagonal matrix" for good measure. * Conventionalized the book's terminology to read "Gauss-Jordan (LU) decomposition," "orthonormalizing (QR) decomposition," etc., rather than "factorization" in each case. * Abandoned the archaic notation |A| for the determinant, converting generally to "det A". * Noted here for general interest is that the fact that a particular revision of the book is not released does not make the changelog for that revision irrelevant to readers who follow the changelog file. 
On the contrary, the changes in an unreleased revision normally carry through to future released revisions. Their changelogs are relevant for this reason. -- Thaddeus H. Black Mon, 15 Oct 2007 00:00:00 +0000 derivations (0.5.20071011) * After changing the several items listed below, found the overall result unsatisfactory. Closed the revision as it stood but did not release it. Thought for the next revision to cancel the old plan for the fourth chapter of the matrix sequence, to have been titled "Matrix algebra"; and instead to split the third chapter of the sequence, tex/eigen.tex (Inversion and eigenvalue), in two, distributing some of the formerly planned "Matrix algebra" material between the two. * In tex/Makefile: + Added a long comment to guide a hypothetical future maintainer of the book. + As an alternative to shipping a modified book.cls with the source, caused the book's build procedure to copy and modify LaTeX's standard book.cls locally. (The resulting modified class file is named derivations-book.cls, but this class file is generated only at build time. It cannot be found among the source files, because almost all its content is already present in the LaTeX installation required to build the book, anyway. It can however be generated from the top source directory by "make tex/derivations-book.cls" whenever it is wanted.) * In tex/main.tex, changed(!) to \documentclass{derivations-book}. * In tex/matrix.tex (The matrix): + Gave C^{-1} = 2I - C identities to some of the addition operators for which they hold. + Introduced the new table "Properties of the parallel triangular matrix." + Added shift-and-truncate identities to "The shift operator." * In tex/gjrank.tex (Rank and the Gauss-Jordan): + Added "Properties of the factors." + Added "Factoring an extended operator." + Added "Under- and overdetermined systems (introduction)." * Revised tex/eigen.tex (Inversion and eigenvalue) generally. * Started the first few words of a new chapter tex/mtxalg.tex (Matrix algebra), but see the first changelog item in this entry above. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Thu, 11 Oct 2007 00:00:00 +0000 derivations (0.5.20070926) * Moved the list of figures to after the list of tables. * Reread the rest of the manuscript, improving, correcting and supplying as in the last changelog. * Introduced the shorthand matrix notations A^{-T} and A^{-*}. * In tex/pref.tex (Preface), acknowledged G.S. Brown and noted that the book does not number theorems. * In tex/matrix.tex (The matrix): + In the introduction to the section "Dimensionality and matrix forms," acknowledged that the section's formalisms are nonstandard. + In "The rank-r identity matrix," promoted a footnote on I_3's infinite dimensionality to an in-line parenthetical note. + In "The scaling quasielementary or general scaling operator," gave a summational decomposition of D. * In tex/gjrank.tex (Rank and the Gauss-Jordan): + Eliminated several confusing, unnecessary transpositions. + Gave definitions of full row and column rank in "The full-rank matrix." + Added the new subsection "Full column rank and the Gauss-Jordan factors K and S." * In tex/eigen.tex (Inversion and eigenvalue): + Repaired and rewrote "Inverting the square matrix." + Integrated some motivational narrative into "The exactly determined linear system." + In "The eigenvector," defined the term "eigensolution." + Appended the new "Matrix condition." 
* Generally in the matrix chapters, changed C_> and C_< to G_> and G_<; and changed the generic notation for the full-rank factorization from XY to BC. * In tex/def.tex, added the new macro \mdots. * Scanned the LaTeX source for obsolete commented-out markup. Deleted such. -- Thaddeus H. Black Wed, 26 Sep 2007 00:00:00 +0000 derivations (0.5.20070921) * Added an explicit "all:" target to the root Makefile. * Reread the manuscript through the end of tex/cubic.tex (Cubics and quartics) and reread the appendices, improving the manuscript's style in numerous small instances, correcting a few minor errors, and supplying some relatively insubstantial omissions along the way. * Unsure of the number, stopped trying to count "the [four] most famous results in all of mathematics." * In tex/alggeo.tex (Classical algebra and geometry): + In "Properties of the logarithm," added two properties to the table explicitly expressing the inverse relationship between exponentiation and the logarithmic operation. Explained the derivations of two of the properties already in the table more thoroughly. + In "Multiplication and division of complex numbers in rectangular form," added an equation about (z^*)(z). + In "Complex conjugation," linked the reverse induction logic more explicitly, and put the observation that the product of conjugates is the conjugate of the product in symbolic form. * In tex/trig.tex (Trigonometry), partly rewrote the start of the "Definitions" section. * In tex/drvtv.tex (The derivative), "L'Hospital's rule," pointed out that one can apply the rule recursively, and otherwise generally explained things better. * In tex/cexp.tex (The complex exponential): + Removed the unnecessary "+" signs from the several "lim_{z->0^+}". + In "The natural logarithm," wrote briefly about converting from other bases. + In "The actuality of complex quantities," extended the remarks on Occam's razor." * In tex/noth.tex (Primes, roots and averages), "Compositional uniqueness," covered prime C/r. * In tex/taylor.tex (The Taylor series): + In "Shifting a power series' expansion point" and "Analytic continuation," treated more carefully the fact that the new expansion point of a directly shifted series must lie within the original series' convergence domain. + In "Error bounds, examples," subscripted R. * In tex/inttx.tex (Integration techniques): + Rewrote the latter half of the chapter's introduction. + Renamed "Integrating rational functions" to "Integrating a rational function." + Rewrote the subsection "Multiple poles (the conventional technique)," splitting it in three: "The derivatives of a rational function"; "Repeated poles (the conventional technique)"; and "The existence and uniqueness of solutions." Gave a different derivation, since the old one was wrong. + Generally renamed the "multiple pole" to the "repeated pole," including in (sub)section titles. * Having partly neglected to do so for the last release, properly spell-checked this release. -- Thaddeus H. Black Fri, 21 Sep 2007 00:00:00 +0000 derivations (0.5.20070912) * Added the new build tool btool/complete-pdf, along with its C++ source, which tool gives proper page numbering and a PDF table of contents to the book's PDF. Updated the Makefile accordingly. * Corrected some entries in the bibliography. * In tex/main.tex, issued \cleardoublepage commands before and after \makeindex, to cause the index to begin on an odd page and end on an even, as do all the book's other chapter-level elements. -- Thaddeus H. 
Black Wed, 12 Sep 2007 00:00:00 +0000 derivations (0.5.20070703) * After considerable study and experimentation, found the proposed transition to PGF and PDFLaTeX too hard for now. (PGF/TikZ seems to break at least LaTeX's $\:$ operator, but the real reason not to change over is that the manuscript is now too long; the changeover would take too much effort. It seems preferable eventually to hack correct PDF page numbers and a PDF table of contents within the framework of the existing document source. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Tue, 03 Jul 2007 00:00:00 +0000 derivations (0.5.20070524) * Did not actually release this version, meant only to mark internal development. However, recorded changes here anyway because the changes must accumulate toward the next version actually released. * Updated the author's e-mail address. * In tex/def.tex, defaulted to the newer, correct \scalebox usage rather than to the older, incorrect usage. (Left the older usage in the file, only commented out, for those who still need it.) * Edited the rest of the book in further, minor ways. * Wanted to migrate toward building the book by PDFLaTeX, but observed that this was easier said than done. Noted that the book's many PSTricks diagrams regrettably seem not to carry gracefully over to PDFLaTeX, leaving the author with a dilemma: to migrate the diagrams to a compatible format like PGF, or to forego PDFLaTeX? Thought that going to PDFLaTeX was probably a necessary step. Reached no decision, but leaned toward the PGF solution. Feared that translating the diagram sources by hand would take many hours, though. (Noted that there exists a PDFTricks package, but that the package seems inadequate to the purpose.) -- Thaddeus H. Black Thu, 24 May 2007 00:00:00 +0000 derivations (0.5.20070327) * In tex/alggeo.tex (Classical algebra and geometry): - Added the new subsection "Dividing power series by matching coefficients." - In the subsection "Properties of the logarithm" and also in its table, added the property that w^p == a^(p*log_a(w)). * In tex/drvtv.tex (The derivative), in the "l'Hospital's rule" section, corrected several instances of "singularity" to "pole." Slightly revised the surrounding text accordingly. * In tex/cexp.tex (The complex exponential), added more information on the hyperbolic functions. * In tex/taylor.tex (The Taylor series): - Added Taylor series for sinh and cosh to the Taylor-series table. - Added the new section "Trigonometric poles." * In tex/inttx.tex (Integration techniques), added the new section "Integrating products of exponentials, powers and logarithms," including a short table of relevant antiderivatives. * Realizing that the book's existing citation style burdened the text, lightened the style by shifting most of the citations to footnotes. * In the book's index, added the birth and death dates of most of the mathematicians named there, mostly per Wikipedia. * Improved the typesetting of the magnitude sign |z|. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Tue, 27 Mar 2007 00:00:00 +0000 derivations (0.5.20070322) * Linked the first three of the four matrix chapters officially into the book for the first time: - tex/matrix.tex (The matrix); - tex/gjrank.tex (Rank and the Gauss-Jordan); - tex/eigen.tex (Inversion and eigenvalue). Observed that these chapters did not just suddenly appear, but had been in various drafts for years. Did not yet link the fourth matrix chapter tex/mtxalg.tex (Matrix algebra). 
* In tex/drvtv.tex (The derivative), expanded a footnote in the "Summational and integrodifferential transitivity" subsection into a more carefully turned warning against conditional convergence, in the main narrative. * In tex/def.tex, added the new typesetting macro \mfd to render a matrix's determinant. * Replaced several instances of the word "while" by the less ambiguous word "whereas." * Edited the rest of the book in further, minor ways. * Noted for historical interest that the book had 372 pages, with thirteen chapters plus appendices, a preface and other such auxiliary parts. -- Thaddeus H. Black Thu, 22 Mar 2007 00:00:00 +0000 derivations (0.5.20070307) * Did not actually release this version, meant only to mark internal development. However, recorded changes here anyway because the changes must accumulate toward the next version actually released. * In tex/intro.tex (Introduction), generally revised the "Complex numbers and complex variables" section. * In tex/alggeo.tex (Classical algebra and geometry), extended a footnote to treat the various connotations of the terms "polynomial," "power series" and "Laurent series" in professional and applied usage. * In tex/drvtv.tex (The derivative), in the section "The derivative product rule," added the two-element quotient form of the rule. * In tex/taylor.tex (The Taylor series): - In the "Expanding functions in Taylor series" section, defined the term "Maclaurin series." - Rewrote and extended the latter part of the "Analytic continuation" section. - Added the new "Entire and meromorphic functions" section. - Corrected a gruesome error in the subsection "Enclosing a multiple pole," where the book had previously, quite falsely, found every complex contour integral about a multiple pole to come to zero. - Added the new "Taylor series in 1/z" section. * In tex/inttx.tex (Integration techniques): - In the "Partial-fraction expansion" subsection, corrected the limits of summation on the numerator polynomial in the unexpanded form. - Added the new "Multiple poles (the conventional technique)" subsection. * In tex/bib.bib (bibliography), corrected the date of Dr. Kohler's lecture. * Observing that the English verb "to prove" declines in a strange way, changed several (but not all) instances of "proven" to the semantically equivalent alternative participle "proved," depending on the rhythm of the sentence in which the word appears. * Edited the rest of the book in further, minor ways. * Further worked on the four matrix chapters, but did not yet permanently link them into the book. Brought tex/gjrank.tex (Rank and the Gauss-Jordan) to a first fully coherent draft. -- Thaddeus H. Black Wed, 07 Mar 2007 00:00:00 +0000 derivations (0.5.20070219) * Added the author's year of birth and his estimate of the book's U.S. Library of Congress class to the back of the title page. * In tex/cubic.tex (Cubics and quartics), in the narrative of the "Guessing the roots" section, corrected a_0q to a_0q^n. * Edited the rest of the book in further, minor ways. * Finally worked tex/matrix.tex (The Matrix) into a usable form. Edited it down to 38 pages, which are still a lot but are a marked improvement against earlier drafts. Cautiously refrained nevertheless, for the moment, from linking the chapter into the book proper, on the ground that the chapter alone is of limited interest without the (presently) three additional matrix chapters which follow. 
Hoped that, with this foundational matrix chapter finally in good order, the other three matrix chapters (already partly written) would prove easier to complete than this one has done. (Will they? We shall see. On the other hand, the author has had limited time available to work on the book in recent months in any case, which is the main reason the Matrix chapter has taken so long to write.) -- Thaddeus H. Black Mon, 19 Feb 2007 00:00:00 +0000 derivations (0.5.20070217) * Rendered the citation style more consistent throughout the book. (Remained somewhat dissatisfied with the style, however. It's too bulky.) * Generally changed the default product dummy variable from k to j. Retained k for summations. Also in the (not yet included) matrix chapters, retained k even for products, inasmuch as j serves a conflicting role there. * Promoted Cauchy's integral formula to the rank of one of the, now four, most famous results in all of mathematics. * In tex/alggeo.tex (Classical algebra and geometry), in the table "Power properties and definitions," corrected the condition z >= 0 to n >= 0. * In tex/drvtv.tex (The derivative), in the "Derivative product rule" subsection, extended the specialized form to handle logarithms. * In tex/inttx.tex (Integration techniques), in the "Integration by closed contour" section, added a second example which, unlike the first, does not extend the contour. * Edited the rest of the book in further, minor ways. * Further worked on the four matrix chapters, but did not yet link them into the book. -- Thaddeus H. Black Sat, 17 Feb 2007 00:00:00 +0000 derivations (0.5.20070213) * In tex/taylor.tex (The Taylor series), numbered the formerly unnumbered first equation of the section "The Laurent series." * Did not otherwise edit the book, but noted here that the last changelog forgot to mention that additional, minor edits were made. -- Thaddeus H. Black Tue, 13 Feb 2007 00:00:00 +0000 derivations (0.5.20070212) * (Susan M. Doyle) Corrected several instances of incorrect punctuation throughout the book, particularly with respect to the use of ellipses. * Further refined the book's preface. * (Susan M. Doyle) In tex/intro.tex (Introduction), edited the header of the section "Rigor" to make the language there less strained. * In tex/cexp.tex (The complex exponential), extended the section "Fast and slow functions" to lend the reader a better subjective sense of the manner in which logarithms and exponentials diverge, and to observe the fundamental zeroth-order resemblance between logarithms and constants. * In tex/integ.tex (The integral), renamed the section "Checking integrations" to "Checking an integration." * In tex/taylor.tex (The Taylor series): - In the subsection "Convergence," extended a footnote to excuse the book from the (for it) unnecessary rigor of the Weierstrass M test. - Added the new section "The Laurent series." * In tex/inttx.tex (Integration techniques), added the new section "Frullani's integral." * In tex/cubic.tex (Cubics and quartics), corrected the quartic table to handle the degenerate case K = 0. -- Thaddeus H. Black Mon, 12 Feb 2007 00:00:00 +0000 derivations (0.5.20070206) * Moved two philosophical paragraphs on convention from tex/pref.tex (Preface) to tex/hist.tex (Manuscript History). Generally revised the preface again. * In tex/taylor.tex (The Taylor series): - Revised the chapter generally, rendering it significantly easier for the reader to follow. - Corrected the Taylor series of the natural logarithm. 
- Added the new subsection "Enclosing a multiple pole". - Renamed the "Bounds" section to "Error bounds," and rewrote it. - Added the new section "Odd and even functions." * Further worked on the four matrix chapters, but did not yet link them into the book. * Slightly revised the outline of future work in tex/stub.tex (To be written). * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Tue, 06 Feb 2007 00:00:00 +0000 derivations (0.5.20061214) * Introduced the new file tex/use.tex, to include \usepackage{} commands formerly in tex/main.tex. * Expanded the club of "the most famous results in all of mathematics" from two to three, adding the fundamental theorem of calculus to the Pythagorean theorem and Euler's formula. Revised the text accordingly. * Renamed the "principal angles" to the "hour angles." * Rewrote much of the book's preface, in tex/pref.tex. * In tex/drvtv.tex (The derivative), improved and expanded the section on the Newton-Raphson iteration. * Added the short but complete new chapter tex/cubic.tex (Cubics and quartics), between tex/inttx.tex (Integration techniques) and tex/matrix.tex (The matrix). * In tex/alggeo.tex (Classical algebra and geometry): - Ended the "Quadratics" section with a reference to the new chapter tex/cubic.tex (Cubics and quartics). - In "Dividing power series," changed a few of the symbols slightly and added an early note excusing some readers from reading the rest of the subsection. - In "Indeterminate constants, independent variables and dependent variables," added a parenthesized note on dummy variables. - Generally revised, clarified and improved the "Complex conjugation" subsection. * In tex/trig.tex (Trigonometry), rewrote "Scalars, vectors and vector notation." * In tex/drvtv.tex (The derivative), in the "Powers of numbers near unity" subsection, disambiguated two separate symbols formerly both represented as \epsilon. * In tex/cexp.tex (The Complex Exponential): - Footnoted an alternate derivation of Euler's formula. - In the "Euler's formula" section, restored a missing `i' to several exponents. * In tex/taylor.tex (The Taylor Series): - In "Integrating along the contour," corrected several typos in the various coefficients and integration limits, and added a footnote clarifying a point of rigor involving convergence domains. - In "Taylor series for specific functions," extended the convergence domains for exp, sin and cos. - Added the new section "Bounds." * Split the three draft matrix chapters again, now into four chapters: - tex/matrix.tex (The matrix); - tex/gjrank.tex (Rank and the Gauss-Jordan); - tex/eigen.tex (Inversion and eigenvalue); - tex/mtxalg.tex (Matrix algebra). Observed that these chapters were certainly not yet ready for release, and in fact that they seemed likely to need to be reorganized and/or renamed again before release. * Excluded the four matrix chapters from the book (though included the four source files with the book's source). In their place, added a stub chapter tex/stub.tex. * In tex/hist.tex (Manuscript history), expanded and partly rewrote the paragraph about encyclopedic works. * Edited the rest of the book in further, minor ways. -- Thaddeus H. Black Thu, 14 Dec 2006 00:00:00 +0000 derivations (0.5.20061014) * Did not actually release this version. (The author's spare time to write the book, always limited, has been scanter than usual lately; but still the work creeps forward some evenings, half an hour here, an hour there. 
The author did feel pretty good about the last version released, as far as the book had then gone, so is not now anxious to imbalance the work by rushing additional chapters to it. Under such conditions---when the author's writing time is fragmented---unreleased working versions, documented here, help keep the author's thoughts in order.) * Further generally revised the as yet immature tex/matrix.tex (The matrix). * Added a paragraph on footnotes to and otherwise slightly revised tex/pref.tex (Preface). Also added, but for the time being commented out, a future draft alternative to the preface's "work in progress" paragraph. * Separated the last part of the section "L'Hopital's rule" of tex/drvtv.tex (The derivative). Moved it to tex/cexp.tex (The complex exponential) as the new section "Fast and slow functions." * In the "Serial and parallel addition" subsection of tex/noth.tex (Primes, roots and averages), defined the word "iff." (If memory serves, the word had once been defined earlier in the book, but redaction since then evidently has moved or removed the earlier definition.) * Noted the irony: sometimes, at least in some ways, the smaller the change a version represents, the longer the changelog. For sweeping changes, one tends merely to log, "sweeping changes." Odd, yes, but sometimes true nonetheless. * Renamed tex/invdet.tex (Matrix inversion and the determinant) to tex/eigen.tex (Inversion and eigenvalue). Began typesetting the chapter, taking it up through the first page or two of the eigenvalue treatment, but then lost compositional coherence---lost the thread of the narrative, as it were. Observed that the eigenvalue treatment did not flow sufficiently cleanly from what had gone before, indicating a need for some changes in what had gone before. Halted for a time to mull the matter, logging this internal version to document the stopping point. -- Thaddeus H. Black Sat, 14 Oct 2006 00:00:00 +0000 derivations (0.5.20060926) * Did not actually release this version. * Linked into the book the three new chapters - tex/matrix.tex (The matrix), - tex/invdet.tex (Matrix inversion and the determinant), and - tex/mtxalg.tex (Matrix algebra). Of the three, tentatively finished the first, but did not truly start the other two. * Because the book had begun to progress beyond its basic first nine chapters, advanced the version number to 0.5 (arguably the number could have been 1.1 rather than 0.5, but as decided in the derivations 0.4.20060804 changelog below, let us not rush that). -- Thaddeus H. Black Tue, 26 Sep 2006 00:00:00 +0000 derivations (0.4.20060920) * (Andreas Jochens and Julian Gilbey) Replaced all instances of the LaTeX \scalebox macro with the new \localscalebox, defined in tex/def.tex, thus allowing the user building from source to select teTeX-2 or teTeX-3 \scalebox syntax as appropriate. Thanked Andreas Jochens for identifying the bug and Julian Gilbey for fixing it. * Identified and removed from the book source many commented-out passages of text never likely to be used again. * In tex/alggeo.tex (Classical algebra and geometry): - Explicitly pointed out that 0! == 1. - Better motivated the long division procedure. Appended thereto an explanation of why remainders have order one less than divisors. - Extended the title of the subsection "Common power-series quotients" with "... and the geometric series." - Added the new subsection "Variations on the geometric series." - Appended to the discussion of roots and poles a brief parenthetical note on branch points. 
- Slightly expanded the narrative in the section "Complex numbers (introduction)." * In tex/trig.tex (Trigonometry), changed the angle symbol beta to psi in the "Rotation" section, to avoid confusion with the beta of the following section. * In tex/cexp.tex (The complex exponential): - Rewrote the discussion of the bound on the constant e. Added plots of the natural exponential and the natural logarithm. - Replaced the analogy of the mirrored portrait with an analogy of reflected handwriting. - Corrected the several "d/dz" to "d/dw" in the table "Derivatives of the inverse trigonometrics." * In tex/noth.tex (Primes, roots and averages): - Replaced the entire convoluted contents of the "Polynomial roots" subsection with a simpler, much briefer argument to the same effect. - Slightly shortened and clarified the paragraph on finding roots and the Newton-Raphson. * In tex/integ.tex (The integral), - Reorganized and partly rewrote the chapter's introduction and first section. Added a figure on the trapezoid rule. - Added the new section "Checking integrations." - Slightly revised the section "Remarks (and exercises)." * In tex/taylor.tex (The Taylor series): - Slightly reworded some parts of the section "Shifting a power series' expansion point." - In the section "The multidimensional Taylor series," explicitly displayed a few more instances of the k-vector. * In tex/inttx.tex (Integration techniques): - Clarified the integration checks. - Expanded the "Integration by parts" section. - Added a little more explanation to the end of the "Method of unknown coefficients" section. - In the section "Integration by closed contour," explained why I_1 + I_3 != 0. Expanded and clarified the section. - In the section "Integration by partial-fraction expansion," replaced a confusing limit with a more precise ratio of equations. * Reduced the type size in the book's index. * Edited the rest of the book in further, minor ways. * Split the two unlinked rough-draft chapters reluctantly into three, with the new, middle one being tex/invdet.tex, tentatively entitled "Matrix inversion and the determinant." (The chapter tex/matrix.tex, "The matrix," had grown too long comfortably to accommodate further material.) * Below in this changelog, corrected the incorrect former version number (0.1.19891206) to the correct number (0.1.19890714). -- Thaddeus H. Black Wed, 20 Sep 2006 00:00:00 +0000 derivations (0.4.20060804) * Advanced the version number to 0.4, observing that the typeset manuscript had grown into a concise but entire book. (The previous revision might have been 0.4, too; but the author did not want to rush the revolution. However, the time has come. Many more chapters remain to be added to the book, but the book's basic content is all there now. Arguably, this revision could even be numbered 1.0, but, well, let us not rush that. The number 1.0 can afford to wait many years if we want. The 0.4 will do for the time being.) * Started a new practice, appropriate now to the manuscript's rising overall maturity level, of not linking new, immature material into the actual book. Noted therefore that the tex/ directory can now contain source files which the actual book does not yet include. Advised developers and others reading the source to review tex/main.tex to see which material the book does officially include. * Began the new chapters tex/matrix.tex and tex/mtxalg.tex---"The Matrix" and "Matrix Algebra," respectively---but in LaTeX source form only. Did not yet link them into the actual book. 
  * Throughout the LaTeX source, to improve the quality of the typesetting, added TeX tilde ~ ties as practical and appropriate to attach in-line variable names and short factors to the preceding words, thus preventing most lines of narrative text from beginning with a variable name.
  * Rationalized and explained the <- change-of-variable / assignment notation. Corrected several instances where the notation had improperly been applied in reverse.
  * In tex/pref.tex (Preface), further slightly refined the rhetoric.
  * In tex/intro.tex (Introduction):
    - Added a pertinent 1924 quote of R. Courant and D. Hilbert.
    - Deleted unnecessary verbiage from the "Complex numbers and complex variables" section.
  * In tex/alggeo.tex (Classical algebra and geometry):
    - Added the new subsection "Negative numbers."
    - Added the new subsection "The change of variable."
    - Added the new section "The arithmetic series."
  * Edited most of the chapters and other parts of the book in further, minor ways.
  * Warmly welcomed Karl Sarnow and Xplora Knoppix [http://www.xplora.org/downloads/Knoppix] aboard as a distributor of the book.

 -- Thaddeus H. Black  Fri, 04 Aug 2006 00:00:00 +0000

derivations (0.3.20060607)

  * Reviewed the book line by line, finding and correcting many deficiencies.
  * Added the new file tex/sphere.tex, providing not a chapter but rather a detailed figure of the sphere and its parts, to be included in the book where needed.
  * In tex/pref.tex (Preface), reworded one sentence slightly.
  * In tex/intro.tex (Introduction), added the new section "Complex numbers and complex variables."
  * In tex/alggeo.tex (Classical algebra and geometry):
    - Added the new subsection "Negative numbers."
    - Improved the explanation of long division.
    - Distinguished poles from other singularities.
  * In tex/trig.tex (Trigonometry):
    - Added a plot of the sine function.
    - Condensed and clarified the formerly ponderous calculation of the trigonometrics of the principal angles.
    - Added the new section "Cylindrical and spherical coordinates."
  * In tex/drvtv.tex (The derivative):
    - Improved the introduction to combinatorics.
    - Added the new subsection "Complex powers of numbers near unity."
    - Contrasted the unbalanced against the balanced definition of the derivative.
    - Added the new subsection "The derivative of a function of a complex variable."
    - Expanded and slightly renamed the subsection "The derivative of z^a."
    - Added the new subsection "The logarithmic derivative."
  * In tex/cexp.tex (The complex exponential):
    - Sharpened the development of Euler's formula.
    - Refined the discussion of Occam's razor and the modeling of wave phenomena by complex functions.
  * In tex/noth.tex (Primes, roots and averages), oriented the reader subjectively to parallel addition.
  * In tex/integ.tex (The integral):
    - Rendered more precise the presentation of the basic integral concept.
    - Correcting an oversight, added a balanced definition of the integral to match the balanced definition of the derivative of tex/drvtv.tex.
    - Clarified the antiderivative. Presented the fundamental theorem of calculus more explicitly.
    - Added the new section "Areas and volumes."
    - Correcting an oversight, properly introduced the contour integration symbol.
    - Added the new section "Discontinuities," presenting the Heaviside unit step and the Dirac delta.
  * Substantially rewrote tex/taylor.tex (The Taylor series).
  * In tex/inttx.tex (Integration techniques):
    - Rewrote the chapter's introduction.
    - Highlighted the usefulness of ln(z), whose derivative (d/dz) ln(z) = 1/z makes it an antiderivative of 1/z, even where z is not a real, positive number.
    - Motivated better the method of unknown coefficients. Added a second example of the method, this time applied to a differential equation.
    - Cautioned the reader against enclosing branch points in Cauchy contours.
    - Fixed false counter indexing in the method of unknown coefficients.
    - Restored a missing coefficient in the pole-separation formula.
    - Added the new section "Integration by Taylor series."
  * In tex/hex.tex (Appendix: Hex and other notational matters), fixed some typos and slightly improved the prose in a few spots.
  * In tex/greek.tex (Appendix: The Greek alphabet), identified loop-counting variables more precisely.
  * In tex/main.tex, in the copyright and licensing statement, changed the word "may" to the more standard "can."
  * Edited the book in many lesser ways not itemized here.
  * Observed here that the book's first nine chapters, through tex/inttx.tex (Integration techniques), were now substantially complete; and that---although many, many interesting mathematical topics remained for the book to treat in coming years---the manuscript as it stood finally formed a concise but entire book. Noted for historical interest that the book had 212 pages, with nine chapters plus appendices, a preface and other such auxiliary parts.

 -- Thaddeus H. Black  Wed, 07 Jun 2006 00:00:00 +0000

derivations (0.3.20060508)

  * In tex/alggeo.tex (Classical algebra and geometry):
    - (Juan Pedro Vallejo) Swapped p and q in the "Power properties and definitions" table.
    - Added a table summarizing general properties of the logarithm.
  * In tex/drvtv.tex (The derivative):
    - Slightly clarified the proof of l'Hopital's rule for infinite z_o.
  * In tex/trig.tex (Trigonometry):
    - Slightly refined the "Simple properties of the trigonometric functions" table. Added a new table "Further properties of the trigonometric functions."
  * In tex/cexp.tex (The complex exponential):
    - Rewrote the introduction.
    - Somewhat clarified the subjective motivation for Euler's formula.
    - Converted generally to the symbol 'w' as complement of 'z'.
    - Added a formula for the natural logarithm of a complex number.
    - Gave the hyperbolic form of the Pythagorean theorem for trigonometrics.
    - Added a table summarizing properties of the complex exponential.
    - Wrote a new section on the derivatives of the various trigonometric and inverse trigonometric functions.
  * In tex/noth.tex (Primes, roots and averages):
    - Changed the chapter's title again, this time from "Primes and roots".
    - Added a subsection on rational and irrational numbers.
    - Revised the discussion of the fundamental theorem of algebra.
    - Added a new section on parallel addition and averages.
  * In tex/integ.tex (The integral), corrected the notation in the antiderivative definition.
  * In tex/taylor.tex (The Taylor series):
    - Added a stub for a future new section, "Taylor series for specific functions."
    - Integrated the concept of the nonanalytic point generally into the chapter's latter parts.
    - Extended the section "Cauchy's integral formula," and split it into subsections.
  * In tex/inttx.tex (Integration techniques):
    - Added an explicit section on integration by antiderivative. Generally corrected notation and explanations involving antiderivatives.
    - Removed a fallacious example from the "Integration by substitution" section.
    - Added new sections on integration by closed contour and by partial-fraction expansion.
  * In tex/hist.tex (Appendix: Manuscript history):
    - Changed the appendix's title from "Document history".
    - Refined and improved the text generally.
    - Added a footnote on linearity.

 -- Thaddeus H. Black  Mon, 08 May 2006 00:00:00 +0000

derivations (0.3.20060502)

  * Removed a spurious non-source file from the source.
  * Added a period to a figure caption in tex/drvtv.tex (The derivative).

 -- Thaddeus H. Black  Tue, 02 May 2006 00:00:00 +0000

derivations (0.3.20060501)

  * In tex/Makefile, replaced the old LaTeX build commands with a call to the Rubber LaTeX build system.
  * Added the new 'check' target to tex/Makefile.
  * Further polished the book's preface in tex/pref.tex. Added observations on the book's relationship to open source culture.
  * In tex/intro.tex (Introduction), moved distracting remarks on hexadecimal from the beginning to the end of the chapter.
  * In tex/alggeo.tex (Classical algebra and geometry):
    - Extended the discussion of multiplicative associativity.
    - Renamed the section "One-dimensional quadratics" to the less pretentious "Quadratics."
    - Added a footnote explaining the term "order", the adjective "quadratic", and related words.
    - Corrected and rewrote the faulty treatment of the division of power series. Added a pair of tables summarizing the long division procedure.
  * In tex/drvtv.tex (The derivative):
    - Corrected the misnaming of G.W. Leibnitz.
    - Further clarified the difficult discussion of the Leibnitz notation.
    - Added application examples to the treatment of l'Hopital's rule.
  * Inserted the new chapter tex/noth.tex (Primes and roots).
  * In tex/taylor.tex (The Taylor series):
    - In the integrand of the "Cauchy's integral formula" section, put "n-1" in place of "n".
    - Generalized the Cauchy integration where relevant to nonintegral exponents.
  * Partially filled the stub tex/inttx.tex (Integration techniques) with an actual chapter.
  * Edited most of the chapters in other, minor ways.
  * Added to the bibliography, and corrected a few typos therein.
  * Added an index to the book.
  * Closed some dangling references, replacing "[section not yet written]" with the appropriate section number.
  * Noted that future routine changes of the kinds of the last three bullets (bibliography updates, index updates and reference closures) will not normally be chronicled here.

 -- Thaddeus H. Black  Mon, 01 May 2006 00:00:00 +0000

derivations (0.3.20060425)

  * Removed the 'helper' directory and its contents. (Shifted them to the Debian package only, as they are not relevant to non-Debian users.)
  * In tex/main.tex:
    - Added the new \verdate macro.
    - Because TeX apparently does not respect local scoping of lengths, reserved four TeX length variables \tla, \tlb, \tlc and \tld for general use and reuse throughout the book.
    - Rescripted the copyright page correctly to use \vspace{\stretch{1}} to force the text to the bottom of the page.
  * Further polished the book's preface in tex/pref.tex. Added to it a brief observation about the book's bibliography.
  * In tex/alggeo.tex (Classical algebra and geometry):
    - Added distributivity to the basic arithmetic properties table.
    - Added a section on multiplying and dividing power series.
    - Removed one footnote on rigor.
  * In tex/trig.tex (Trigonometry), added a new section on the complex triangle inequalities.
  * In tex/drvtv.tex (The derivative):
    - Added to the list of combinatorics properties several properties relating Pascal's-triangle neighbors of various sorts.
    - Added a new section on extrema and higher derivatives.
  * In tex/cexp.tex, renamed the chapter from "Complex exponentials" to "The complex exponential".
  * In tex/integ.tex (The integral), extended the brief stub text to a full new chapter.
  * Added the new chapter tex/taylor.tex (The Taylor series).
  * Opened the stub tex/inttx.tex (Integration techniques) for a future chapter.
  * In several chapters, changed incorrect references to "Laurent series" to correct references to "power series".
  * Edited most of the chapters in other, minor ways.
  * Corrected and extended some old bibliography entries; and added new entries.
  * In tex/def.tex, added some new hyphenations.
  * Corrected a misspelling in tex/README.
  * In tex/Makefile, freed the main document build from dependency on local template files.

 -- Thaddeus H. Black  Tue, 25 Apr 2006 00:00:00 +0000

derivations (0.3.20060410)

  * Repackaged the software in non-Debian native form, incrementing the minor version number for this reason.
  * Retained all the old changelog entries here.
  * Noted here for clarity that several of the items in the (0.2.20051112) changelog entry below do refer to Debian-only packaging matters. Undertook to keep such Debian-only packaging matters from this changelog in the future.
  * As far as the book had yet gone (that is, through cexp.tex plus some appendices), generally refined and clarified the material already in the book. Especially, clarified the book's development of complex numbers and the complex exponential by moving the introduction of de Moivre's theorem up to a new section at the end of trig.tex, thus logically decoupling de Moivre's theorem from Euler's formula as such.
  * Added sections on l'Hopital's rule and the Newton-Raphson iteration to drvtv.tex.
  * Began the new integ.tex, a chapter treating the integral.
  * Added \usepackage{amsmath} to main.tex.
  * Changed \usepackage{pstricks} to \usepackage{pst-plot}.
  * Noted here for future historical interest that the book is 90 pages long, not counting prefatory matter. (Not much, but it's a start.)

 -- Thaddeus H. Black  Mon, 10 Apr 2006 00:00:00 +0000

derivations (0.2.20051112)

  * Adhered to Policy 3.6.2.1.
  * Updated the FSF's address in debian/copyright.
  * Corrected the book's text in minor ways.
  * Fixed tex/Makefile to use make(1)'s $(filter-out ()) function correctly.
  * Modified, debugged and extended the helper scripts significantly.
  * Added helper/RelSteps.
  * Uploaded the package to Debian's experimental archive for the first time (closes: #338520).

 -- Thaddeus H. Black  Sat, 12 Nov 2005 00:00:00 +0000

derivations (0.2.20051110)

  * Typeset part of the book in LaTeX and packaged it for Debian.
  * Noted that much of the version 0.1 hand-printed manuscript remains yet to be edited and typeset for this version 0.2.

 -- Thaddeus H. Black  Thu, 10 Nov 2005 00:00:00 +0000

derivations (0.1.20020419)

  * Observed that this changelog entry and the entries below for earlier dates were made 4 Nov 2005, not so much as an accurate record of changes, but rather as an extremely abbreviated approximate sketch of the package's long history. (The precise dates given are chosen more or less randomly, but are probably correct to within two or three months.)
  * Added derivations including that of the method for calculating electrical resistance to earth ground.

 -- Thaddeus H. Black  Fri, 19 Apr 2002 00:00:00 +0000

derivations (0.1.19960131)

  * Added Fourier transforms and their properties.

 -- Thaddeus H. Black  Wed, 31 Jan 1996 00:00:00 +0000
derivations (0.1.19890714)

  * Misplaced and lost some of the original manuscript (but the ideas remain more or less intact in the author's thought, so the loss is tolerable).

 -- Thaddeus H. Black  Fri, 14 Jul 1989 00:00:00 +0000

derivations (0.1.19881206)

  * Added the Gaussian distribution and many others.

 -- Thaddeus H. Black  Tue, 06 Dec 1988 00:00:00 +0000

derivations (0.1.19831014)

  * Began the hand-printed manuscript with its first derivations, including the Pythagorean theorem.

 -- Thaddeus H. Black  Fri, 14 Oct 1983 00:00:00 +0000

derivations-0.53.20120414.orig/doc/derivations.7

.TH "DERIVATIONS" 7 "10 March 2010" \
"Thaddeus H. Black" "Derivations of Applied Mathematics"
.SH "NAME"
derivations \- book: Derivations of Applied Mathematics
.\"
.\" --------------------------------------------------------------------
.\"
.SH "DESCRIPTION"
Understandably, program sources rarely derive the mathematical formulas
they use.  Not wishing to take the formulas on faith, a user might
nevertheless reasonably wish to see such formulas somewhere derived.
.PP
.I Derivations of Applied Mathematics
is a book which documents and derives many of the mathematical formulas
and methods implemented in free software or used in science and
engineering generally.  It documents and derives the Taylor series (used
to calculate trigonometrics), the Newton-Raphson method (used to
calculate square roots), the Pythagorean theorem (used to calculate
distances) and many others.
.\"
.\" --------------------------------------------------------------------
.\"
.SH "READING THE BOOK"
Among other ways, you can read the book on your computer screen by
opening the file
.I /usr/share/doc/derivations/derivations.ps.gz
with the
.BR gv (1)
program under
.BR X (7).
To print the book on a standard postscript printer, just
.BR zcat (1)
then
.BR lpr (1)
the same file.
.\"
.\" --------------------------------------------------------------------
.\"
.SH "FILES"
.PD 0
.TP
.I /usr/share/doc/derivations/derivations.ps.gz
the book in postscript format
.TP
.I /usr/share/doc/derivations/derivations.pdf.gz
the book in PDF
.PD
.\"
.\" --------------------------------------------------------------------
.\"
.SH "BUGS"
The book is a work in progress.
.\"
.\" --------------------------------------------------------------------
.\"
.SH "AUTHOR"
The book and this manpage are written by Thaddeus H. Black, who also
maintains the Debian package
.I derivations
in which they are distributed.  Users who need to contact the author in
his role as Debian package maintainer can reach him at .  However, most
e-mail will naturally be about the book itself: this should be sent
to .
.\"
.\" --------------------------------------------------------------------
.\"
.SH "COPYLEFT"
Copyright (C) 1983\-2010 Thaddeus H. Black.
.PP
The book, this manpage and the entire
.I derivations
distribution are free software.  You can redistribute them and/or modify
them under the terms of the GNU General Public License, version 2.
.\" .\" -------------------------------------------------------------------- .\" .SH "SEE ALSO" .BR gv (1) .RI [ gv ], .BR zcat (1) .RI [ gzip ], .BR psselect (1) .RI [ psutils ], .BR lpr (1) .RI [ lpr ], .BR octave (1) .RI [ octave ] derivations-0.53.20120414.orig/Makefile0000644000000000000000000000026311742566274016000 0ustar rootrootsrcdir := btool tex .PHONY: FORCE all clean all: all clean: ; $(foreach dir, $(srcdir), $(MAKE) -C $(dir) $@ ;) $(foreach dir, $(srcdir), $(dir)/%): FORCE; $(MAKE) -C $(@D) $(@F)