Instructions for writing mathematical formulas

H. Shibata Created
\[ \newcommand{\sN}{\mathbb{N}} \newcommand{\sR}{\mathbb{R}} \newcommand{\sZ}{\mathbb{Z}} \newcommand{\od}{\mathrm{d}} \newcommand{\ax}{\mathrm{x}} \newcommand{\ay}{\mathrm{y}} \newcommand{\az}{\mathrm{z}} \newcommand{\GCD}{\mathrm{GCD}} \newcommand{\br}[1]{\left ( #1 \right)} \newcommand{\sbr}[1]{\left [ #1 \right]} \newcommand{\cbr}[1]{\left \{ #1 \right \}} \]

This document explains how to write basic mathematical formulas, for those working in applications of artificial intelligence.

When I review papers from the computer science community, especially from the application domain of Artificial Intelligence, I receive many papers with terrible mathematical notation. These papers have great English but poor mathematics (almost slang, or just pasted Python code). I think teaching how to write a formula is the duty of supervisors who grant their students a degree. However, many of them do not teach it at all. Here is a list of errors and corrections. If you know me and take on any review for an academic society, please consider following these instructions. Do not write a formula by some special ability or sense gifted from a god. Write it with formal rules. It is a formula!

Italic and Roman for variables

\(max\) should be written as \(\max\). \(sin, cos\) should be \(\sin, \cos\) too. In principle, any character in italic font denotes a variable, and multiplication is implied between any two adjacent characters. So if you write \(x=sin (y)\), it means \(x = s\times i\times n(y)\), where \(s, i, y\) are taken to be variables and \(n\) a function that takes one argument.

To distinguish products of variables from a function name, well-defined functions such as \(\sin, \cos, \log, \max\), etc., must be written in Roman (upright) typeface. I recommend writing \(Softmax, ReLU\) as \(\mathrm{Softmax}, \mathrm{ReLU}\) respectively.
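The rule can be shown at the source level. The following is a minimal sketch; the symbols \(m, p, h, z, a, b\) are placeholders chosen for illustration, and `\operatorname` (from amsmath, also available in MathJax) is an alternative not mentioned in the text.

```latex
% Wrong: italic letters read as a product of the variables s, i, n
\( x = sin(y) \)

% Right: predefined upright operator names
\( x = \sin(y), \quad m = \max(a, b) \)

% Names without a built-in command: use \mathrm
\( p = \mathrm{Softmax}(z), \quad h = \mathrm{ReLU}(z) \)

% Alternative (an assumption, not from the text): \operatorname
\( p = \operatorname{Softmax}(z) \)
```

`\operatorname` also sets operator spacing around the name, which `\mathrm` alone does not; either way the name is upright.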

At the same time, a variable with multiple italic characters is not allowed. A variable written as \(iteration\) is so, so bad. Use something like \(n_\mathrm{iteration}\), meaning the number of iterations. Note that the subscript is in Roman typeface. I prefer to write it just as \(n\). If you need an identifier as long as \(\mathrm{iteration}\), I recommend refining the model or system you propose, because it is probably unnecessarily redundant.

This is common sense on this planet, taught in junior high school.

Decorating

In Python, you can use only 26 letters (for English speakers; Japanese speakers can use about 2000 characters), so you need identifiers with multiple characters. But in mathematics, you can use many decorations that let us distinguish variables without lengthening their names. For example, there are
\[ a, a', a'', \bar{a}, \tilde{a}, \dot{a}, \ddot{a}, \mathrm{a}, a^* \]
In order, \(a\) is used to denote a variable such as a number, \(a', a''\) are auxiliaries of \(a\), \(\bar{a}\) is the average of \(a\), \(\tilde{a} = a - \bar{a}\) is the fluctuation of \(a\), \(\dot{a}\) is the rate of change of \(a\), and so on.

You can also use upper case, Greek, and Fraktur letters, as follows:
\[ A, \alpha, \mathfrak{a} \]

If you still run short of variables even with the above, use an index:
\[ a_\mathrm{in}, a_\mathrm{out}, a_\mathrm{solid} \]
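As a small worked example of decorations in use, here is a sketch of the mean and fluctuation of \(n\) samples. The symbols \(n\) and \(i\) and the second relation are hypothetical, chosen only to show the notation.

```latex
% Mean and fluctuation of samples a_1, ..., a_n
\[ \bar{a} = \frac{1}{n} \sum_{i \in \{1, 2, \ldots, n\}} a_i, \qquad
   \tilde{a}_i = a_i - \bar{a} \]

% Roman subscripts act as labels (hypothetical relation, for notation only)
\[ a_\mathrm{out} = a_\mathrm{in} + a' \]
```

Four distinct quantities are named here with the single letter \(a\), and none of them needs a multi-character identifier.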

Set and series (family)

A countable set is written with \(\cbr{\ }\), not with \(\br{\ }\) or \(\sbr{\ }\). So it is defined as \(A = \cbr{1, 2, \ldots, n}\). A series or tuple is written with \(\br{\ }\), for example, \(a = \br{1, 2, 3}\). Picking an element from a set is stated as \(e \in A\); picking one from a series is stated as \(a_i\) with an index (in this case \(i\in\cbr{1,2,3}\)). If you want to write a summation over them, here is an example of each:
\[ S = \sum_{e\in A} e, \qquad S = \sum_{i\in\cbr{1, 2, 3}} a_i \]
That is, no index is designated for a set, but an index is designated for a series.
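The difference between the two summations becomes visible once a value repeats. The following worked example uses a hypothetical series \(c\) and set \(C\), which are not from the text.

```latex
% Hypothetical series with a repeated value, and the set of its values
\[ c = (2, 2, 3), \qquad C = \{2, 3\} \]

% Sum over the series: indexed, every occurrence counts
\[ \sum_{i \in \{1, 2, 3\}} c_i = 2 + 2 + 3 = 7 \]

% Sum over the set: no index, each element counts once
\[ \sum_{e \in C} e = 2 + 3 = 5 \]
```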

Define a countable set without listing its contents. For example, the statement "Let \(A\) be a set of books" is OK. The statement "Let \(A=\cbr{a_1, a_2, \ldots, a_n}\) be a set of books where \(n\) is the number of books" is not good because it is redundant.

Here are common notations.

An element occurs exactly once in a set, but it may occur multiple times in a series. For example, define \(\delta\) as
\[ a = b \Rightarrow \delta(a, b) = 1, \qquad a \neq b \Rightarrow \delta(a, b) = 0. \]
In the following equations, \(S_1\) takes \(0\) or \(1\), whereas \(S_2\) can take a value greater than \(1\):
\[ S_1 = \sum_{e\in A}\delta(e, b), \qquad S_2 = \sum_{i\in L}\delta(a_i, b), \]
where \(a \equiv (a_i; i\in L)\) is a series indexed by a set \(L\), and \(A\) is a countable set.
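The same distinction can be checked numerically. This is a minimal sketch: the names `delta`, `A`, `a`, `L`, `b`, `S1`, `S2` mirror the symbols in the text, while the concrete values are made up for illustration.

```python
def delta(p, q):
    """Indicator from the text: 1 if p == q, else 0."""
    return 1 if p == q else 0


b = 2

# A set holds each element once, so the sum over e in A is 0 or 1.
A = {1, 2, 3}
S1 = sum(delta(e, b) for e in A)

# A series (a_i; i in L) is indexed, so the same value may repeat.
L = {1, 2, 3, 4}                   # index set (illustrative)
a = {1: 2, 2: 5, 3: 2, 4: 2}       # a_i for each i in L (illustrative)
S2 = sum(delta(a[i], b) for i in L)

print(S1, S2)
```

Here the series is stored as a mapping from index to value, which keeps the index set \(L\) explicit instead of relying on list positions.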

A so-called "dataset" may not be a set in the sense of set theory.

Axis and label

Have you seen a statement such as "axis \(x\)", or \(f_x\)? This is not a good convention. Because these are names of an axis, they should be written "axis \(\mathrm{x}\)" and \(f_\mathrm{x}\). \(x\) is frequently used to denote a coordinate value on axis \(\mathrm{x}\), but an axis and a value on it are different classes of concept. Mixing them up becomes a problem in differentiation and integration.

Frequently, we want the value of a partial derivative at two points, here \(x_1, x_2\):
\[ \Delta = \frac{\partial f}{\partial x}(x_2) - \frac{\partial f}{\partial x}(x_1) \]
Formally, it can be written as
\[ \Delta = \frac{\partial f}{\partial \ax}(x_2) - \frac{\partial f}{\partial \ax}(x_1) \]
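A short worked example may make the point concrete. It assumes a hypothetical function \(f(x) = x^2\) and the values \(x_1 = 1, x_2 = 3\), none of which come from the text.

```latex
% Derivative along axis x, as a function of the coordinate value x
\[ \frac{\partial f}{\partial \mathrm{x}}(x) = 2x \]

% The axis name stays fixed; only the evaluation point changes
\[ \Delta = \frac{\partial f}{\partial \mathrm{x}}(x_2)
          - \frac{\partial f}{\partial \mathrm{x}}(x_1)
          = 2 \cdot 3 - 2 \cdot 1 = 4 \]
```

With the upright axis name, it is clear that \(x_1\) and \(x_2\) are arguments of the derivative and not new differentiation variables.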

Here is a common integral for a cumulative function:
\[ g(x) = \int_{0}^x f(t)\od t, \]
which is simply
\[ g(x) = \int_{0}^x f\od \mathrm{x}. \]

\(\sR^\cbr{\mathrm{x, y, z}}\) is a 3-D vector space with axes \(\mathrm{x, y, z}\). If \(a \in \sR^\cbr{\mathrm{x, y, z}}\), then \(a_\ax, a_\ay, a_\az\) denote the coordinate values on \(\ax, \ay, \az\) respectively. If we define a function \(f\) as \(f: \sR^\cbr{\mathrm{x, y, z}}\to \sR\), then
\[ \frac{\partial f}{\partial \ax}, \frac{\partial f}{\partial \ay}, \frac{\partial f}{\partial \az} \]
can be defined. The differential of \(f\) is then
\[ \od f = \sum_{x\in\cbr{\ax, \ay, \az}}\frac{\partial f}{\partial x} \od x \]
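As a worked instance of these definitions, take a hypothetical function \(f(a) = a_\mathrm{x}^2 + a_\mathrm{y} a_\mathrm{z}\), chosen only for illustration.

```latex
% Partial derivatives along each named axis
\[ \frac{\partial f}{\partial \mathrm{x}} = 2 a_\mathrm{x}, \qquad
   \frac{\partial f}{\partial \mathrm{y}} = a_\mathrm{z}, \qquad
   \frac{\partial f}{\partial \mathrm{z}} = a_\mathrm{y} \]

% Differential: one term per axis in the index set {x, y, z}
\[ \mathrm{d} f = 2 a_\mathrm{x} \, \mathrm{d}\mathrm{x}
                + a_\mathrm{z} \, \mathrm{d}\mathrm{y}
                + a_\mathrm{y} \, \mathrm{d}\mathrm{z} \]
```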

If we want a finite difference, it can be defined using indexes for axes as
\[ \frac{f(r + \Delta r) - f(r)}{h}, \qquad \Delta r_\ax = h, \qquad \forall i\neq \ax\br{\Delta r_i = 0}. \]
This definition is rigorous because it works for an index set of any size, in contrast to the elementary form (which is not bad for a lecture):
\[ \frac{f(r_\ax + h, r_\ay, r_\az,\ldots) - f(r_\ax, r_\ay, r_\az,\ldots)}{h} \]
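The index-based definition translates directly into code that does not depend on the number of axes. The following is a sketch under stated assumptions: the function name `finite_difference`, the dictionary representation of \(r\), the step size, and the test function `f` are all hypothetical choices, not from the text.

```python
def finite_difference(f, r, axis, h=1e-6):
    """Forward difference of f at point r along one named axis.

    r maps axis names to coordinate values, so the same code
    works for an index set of any size.
    """
    # Delta r: h on the chosen axis, 0 on every other axis.
    shifted = {i: (v + h if i == axis else v) for i, v in r.items()}
    return (f(shifted) - f(r)) / h


# Hypothetical test function: f(r) = r_x^2 + r_y * r_z
def f(r):
    return r["x"] ** 2 + r["y"] * r["z"]


r = {"x": 1.0, "y": 2.0, "z": 3.0}
d_x = finite_difference(f, r, "x")   # approximates 2 * r_x
d_y = finite_difference(f, r, "y")   # approximates r_z
print(d_x, d_y)
```

Adding a fourth axis only requires another key in `r`; the difference formula itself is unchanged, which is the advantage of the indexed definition.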