of the in-progress ebook on Linear Algebra, “A bird’s eye view of linear algebra”. This ebook will put a particular emphasis on AI applications and how they leverage linear algebra.
Linear algebra is a fundamental discipline underlying almost anything one can do with math. From physics to machine learning to probability theory (e.g., Markov chains), you name it. No matter what you’re doing, linear algebra is always lurking under the covers, ready to spring at you as soon as things go multi-dimensional. In my experience (and I’ve heard this from others), this was the source of a big shock between high school and university. In high school (India), I was exposed to some very basic linear algebra (mainly determinants and matrix multiplication). Then in university-level engineering education, every subject suddenly seemed to assume proficiency in concepts like eigenvalues, Jacobians, and so on, as if you were supposed to be born with the knowledge.
This chapter is meant to provide a high-level overview of the concepts in this discipline that are important to know, along with their obvious applications.
The AI revolution
Almost any information can be embedded in a vector space: images, video, language, speech, biometric information and whatever else you can imagine. And all the applications of machine learning and artificial intelligence (like the recent chat-bots, text-to-image models, and so on) work on top of these vector embeddings. Since linear algebra is the science of dealing with high dimensional vector spaces, it’s an indispensable building block.

A lot of the techniques involve taking some input vectors from one space and mapping them to other vectors in some other space.
But why the focus on “linear” when most interesting functions are non-linear? It’s because the problem of making our models high dimensional and that of making them non-linear (general enough to capture all kinds of complex relationships) turn out to be orthogonal to each other. Many neural network architectures work by using linear layers with simple one dimensional non-linearities in between them. And there’s a theorem (the universal approximation theorem) that says this kind of architecture can approximate any continuous function on a bounded domain as closely as we like, given enough units.
Since the way we manipulate high dimensional vectors is primarily matrix multiplication, it isn’t a stretch to say it’s the bedrock of the modern AI revolution.
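To make the “linear layers with simple non-linearities in between” idea concrete, here is a minimal sketch in plain Python. The function names and the hand-picked weights `W1` and `W2` are assumptions made for illustration (a real network would learn its weights). With these weights, two linear layers and an element-wise ReLU compute the absolute value, which no single linear map can do.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """A one dimensional non-linearity, applied to each entry separately."""
    return [max(0.0, x) for x in vector]


def tiny_network(x, w1, w2):
    """Linear layer, then element-wise non-linearity, then linear layer."""
    hidden = relu(matvec(w1, x))
    return matvec(w2, hidden)


# Hypothetical weights, chosen by hand so that the network computes |x|.
W1 = [[1.0], [-1.0]]  # linear map from R^1 to R^2
W2 = [[1.0, 1.0]]     # linear map from R^2 to R^1

print(tiny_network([3.0], W1, W2))   # [3.0]
print(tiny_network([-2.0], W1, W2))  # [2.0]
```

All of the high dimensional work happens in `matvec`; the non-linearity only ever looks at one number at a time.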
I) Vector spaces
As mentioned in the previous section, linear algebra inevitably crops up when things go multi-dimensional. We start off with a scalar, which is just a number of some sort. For this article, we’ll be considering real and complex numbers for these scalars. In general, a scalar can be any object where the basic operations of addition, subtraction, multiplication and division are defined (abstracted as a “field”). Now, we want a framework to describe collections of such numbers (add dimensions). These collections are called “vector spaces”. We’ll be considering the cases where the elements of the vector space are either real or complex numbers (the former being a special case of the latter). The resulting vector spaces are called “real vector spaces” and “complex vector spaces” respectively.
The ideas in linear algebra are applicable to these “vector spaces”. The most common example is your floor, table or the computer screen you’re reading this on. These are all two-dimensional vector spaces since every point on your table can be specified by two numbers (the x and y coordinates as shown below). This space is denoted by R² since two real numbers specify it.
We can generalize R² in different ways. First, we can add dimensions. The space we live in is 3 dimensional (R³). Or, we can curve it. The surface of a sphere like the Earth for example (denoted S²), is still two dimensional, but unlike R² (which is flat), it’s curved. So far, these spaces have all basically been arrays of numbers. But the idea of a vector space is more general. It’s a collection of objects where the following ideas should be well defined:
- Addition of any two of the objects.
- Multiplication of the objects by a scalar (a real number).
Not only that, but the objects should be “closed” under these operations. This means that if you apply these two operations to the objects of the vector space, you should get objects of the same type (you shouldn’t leave the vector space). For example, the set of integers isn’t a vector space because multiplication by a scalar (real number) can give us something that isn’t an integer (3*2.5 = 7.5 which isn’t an integer).
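The two operations and the closure requirement can be sketched in a few lines of Python. The helper names `add` and `scale` are assumptions made for illustration. The sketch shows that adding or scaling vectors in R² gives another vector in R², while scaling the integer 3 by the real number 2.5 leaves the integers.

```python
def add(u, v):
    """Add two vectors entry by entry."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every entry of a vector by the scalar c."""
    return [c * a for a in v]


u = [1.0, 2.0]
v = [3.0, -1.0]

sum_uv = add(u, v)      # still a pair of real numbers, so still in R^2
scaled = scale(2.5, u)  # likewise still in R^2
print(sum_uv)  # [4.0, 1.0]
print(scaled)  # [2.5, 5.0]

# The integers are not closed under multiplication by a real scalar.
product = 3 * 2.5
stays_integer = float(product).is_integer()
print(product, stays_integer)  # 7.5 False
```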
One of the ways to express the objects of a vector space is with vectors. Vectors require an arbitrary “basis”. An example of a basis is the compass system with directions: North, South, East and West. Any direction (like “SouthWest”) can be expressed in terms of these. These are “direction vectors” but we can also have “position vectors” where we need an origin and a coordinate system intersecting at that origin. The latitude and longitude system for referencing every place on the surface of the Earth is an example. The latitude and longitude pair are one way to identify your house. But there are infinitely many other ways. Another culture might draw the latitude and longitude lines at a slightly different angle to the standard. And so, they’ll come up with different numbers for your house. But that doesn’t change the physical location of the house itself. The house exists as an object in the vector space and these different ways to express that location are called “bases”. Choosing one basis allows you to assign a pair of numbers to the house and choosing another one allows you to assign a different set of numbers which are equally valid.
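Here is a small sketch of the “same house, different numbers” idea in a flat two dimensional space. The function `coordinates_in_basis`, the point `house` and the second basis are all assumptions chosen for illustration. The function solves for the two coefficients that express a point as a combination of two basis vectors, using Cramer’s rule.

```python
def coordinates_in_basis(point, b1, b2):
    """Find (c1, c2) such that point = c1*b1 + c2*b2, via Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are parallel, so they do not form a basis")
    c1 = (point[0] * b2[1] - b2[0] * point[1]) / det
    c2 = (b1[0] * point[1] - point[0] * b1[1]) / det
    return (c1, c2)


house = (3.0, 4.0)  # hypothetical location, written in the standard basis

# Standard basis: the usual x and y directions.
standard = coordinates_in_basis(house, (1.0, 0.0), (0.0, 1.0))

# Another, equally valid basis whose grid lines are tilted by 45 degrees.
tilted = coordinates_in_basis(house, (1.0, 1.0), (-1.0, 1.0))

print(standard)  # (3.0, 4.0)
print(tilted)    # (3.5, 0.5)
```

Both pairs of numbers describe the same point: 3.5 times (1, 1) plus 0.5 times (-1, 1) is again (3, 4).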

Vector spaces can also be infinite dimensional. For instance, in miniature 12 of [2], the entire set of real numbers is thought of as an infinite dimensional vector space.
II) Linear maps
Now that we know what a vector space is, let’s take it to the next level and talk about two vector spaces. Since vector spaces are simply collections of objects, we can think of a mapping that takes an object from one of the spaces and maps it to an object from the other. An example of this is recent AI programs like Midjourney where you enter a text prompt and they return an image matching it. The text you enter is first converted to a vector. Then, that vector is converted to another vector in the image space via such a “mapping”.
Let V and W be vector spaces (either both real or both complex vector spaces). A function f: V -> W is said to be a ‘linear map’ if for any two vectors u, v ∈ V and any scalar c (a real number or complex number depending on whether we’re working with real or complex vector spaces) the following two conditions are satisfied:
$$f(u+v) = f(u) + f(v) \tag{1}$$
$$f(c.v) = c.f(v) \tag{2}$$
Combining the above two properties, we get the following result about a linear combination of n vectors.
$$f(c_1.u_1 + c_2.u_2 + \dots + c_n.u_n) = c_1.f(u_1) + c_2.f(u_2) + \dots + c_n.f(u_n)$$
And now we can see where the name “linear map” comes from. If we pass to the linear map, f, a linear combination of n vectors (LHS of the equation above), this is equivalent to applying the same linear combination to the functions (f) of the individual vectors. We can apply the linear map first and then the linear combination, or the linear combination first and then the linear map. The two are equivalent.
In high school, we learn about linear equations. In two dimensional space, such an equation is represented by f(x)=m.x+c. Here, m and c are the parameters of the equation. Note that this function isn’t a linear map when c is non-zero. It fails equation (1), since f(u+v)=m.(u+v)+c while f(u)+f(v)=m.(u+v)+2c, and it fails equation (2), since f(k.x)=m.k.x+c while k.f(x)=m.k.x+k.c. If we set f(x)=m.x instead, then this is a linear map since it satisfies both equations.
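A quick numeric spot check of conditions (1) and (2) for the two functions above can be sketched as follows. The parameter values `m = 2.0` and `c = 3.0` and the test inputs are arbitrary assumptions. Passing at a handful of points doesn’t prove linearity, but a single failure is enough to disprove it.

```python
def satisfies_additivity(f, u, v):
    """Check condition (1), f(u+v) = f(u) + f(v), at one pair of inputs."""
    return f(u + v) == f(u) + f(v)


def satisfies_homogeneity(f, k, v):
    """Check condition (2), f(k.v) = k.f(v), at one scalar and one input."""
    return f(k * v) == k * f(v)


m, c = 2.0, 3.0  # arbitrary slope and intercept


def line_through_origin(x):
    return m * x


def shifted_line(x):
    return m * x + c


u, v, k = 1.0, 4.0, 5.0
origin_checks = (satisfies_additivity(line_through_origin, u, v),
                 satisfies_homogeneity(line_through_origin, k, v))
shifted_checks = (satisfies_additivity(shifted_line, u, v),
                  satisfies_homogeneity(shifted_line, k, v))

print(origin_checks)   # (True, True)
print(shifted_checks)  # (False, False)
```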

III) Matrices
In section I, we introduced the concept of a basis for a vector space. Given a basis for the first vector space (V) and one for the second (W), every linear map between them can be expressed as a matrix (for details, see [1]). A matrix is just a collection of vectors. These vectors can be arranged in columns, giving us a 2-d grid of numbers as shown below.

Matrices are the objects people first think of in the context of linear algebra. And for good reason. Most of the time spent practicing linear algebra is spent dealing with matrices. But it is important to remember that there are (in general) an infinite number of matrices that can represent a linear map, depending on the bases we choose. The linear map is hence a more general concept than the matrix one happens to be using to represent it.
How do matrices help us perform the linear map they represent (from one vector to the other)? By the matrix getting multiplied with the first vector. The result is the second vector and the mapping is complete (from first to second).
In detail, we take the dot product (sum product) of the first vector, v_1, with the first row of the matrix and this yields the first entry of the resulting vector, v_2. Then we take the dot product of v_1 with the second row of the matrix to get the second entry of v_2, and so on. This process is demonstrated below for a matrix with 2 rows and 3 columns. The first vector, v_1, is three dimensional and the second vector, v_2, is two dimensional.

Note that the underlying linear map behind a matrix with this dimensionality (2x3) will always take a three dimensional vector, v_1, and map it to a vector, v_2, in a two dimensional space.

In general, an (nxm) matrix will map an m dimensional vector to an n dimensional one.
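The row-by-row dot product procedure can be sketched in plain Python as below. The function name `matvec` and the entries of the (2x3) matrix `A` are assumptions chosen for illustration. The sketch maps a three dimensional vector to a two dimensional one and rejects vectors of the wrong length.

```python
def matvec(matrix, vector):
    """Multiply an (n x m) matrix, given as a list of n rows, by an m-vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every row must have as many entries as the vector")
    # Each output entry is the dot product of one row with the input vector.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[1, 2, 3],
     [4, 5, 6]]       # 2 rows, 3 columns: maps R^3 to R^2
v_1 = [1, 0, -1]      # three dimensional input

v_2 = matvec(A, v_1)  # two dimensional output
print(v_2)  # [-2, -2]
```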
III-A) Properties of matrices
Let’s cover some properties of matrices that’ll allow us to identify properties of the linear maps they represent.
Rank
An important property of matrices and their corresponding linear maps is the rank. We can talk about this in terms of a collection of vectors, since that’s all a matrix is. Say we have a vector, v1=[1,0,0]. The first element of the vector is the coordinate along the x-axis, the second is that along the y-axis and the third one that along the z-axis. The unit vectors along these three axes form a basis (there are many) of the three dimensional space, R³, meaning that any vector in this space can be expressed as a linear combination of those three vectors.

We can multiply this vector by a scalar, s. This gives us s.[1,0,0] = [s,0,0]. As we vary the value of s, we can get any point along the x-axis. But that’s about it. Say we add another vector to our collection, v2=[3.5,0,0]. Now, what are the vectors we can make with linear combinations of those two vectors? We get to multiply the first one with any scalar, s_1, and the second with any scalar, s_2. This gives us:
$$s_1.[1,0,0] + s_2.[3.5,0,0] = [s_1+3.5 s_2, 0, 0] = [s', 0, 0]$$
Here, s’ is just another scalar. So, we can still reach points only on the x-axis, even with linear combinations of both these vectors. The second vector didn’t “expand our reach” at all. The set of points we can reach with linear combinations of the two is exactly the same as the set we can reach with the first. So even though we have two vectors, the rank of this collection of vectors is 1 since the space they span is one dimensional. If on the other hand, the second vector were v2=[0,1,0], then you could get any point on the x-y plane with these two vectors. So, the space spanned would be two dimensional and the rank of this collection would be 2. If the second vector were v2=[2.1,1.5,0.8], we could still span a two dimensional space with v1 and v2 (though that space would be different from the x-y plane now, it would be some other 2-d plane). And the two vectors would still have a rank of 2. If the rank of a collection of vectors is the same as the number of vectors (meaning they can together span a space of dimensionality as high as the number of vectors), then they are called “linearly independent”.
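The three collections discussed above can be checked numerically. The sketch below computes the rank of a collection of vectors by Gaussian elimination, counting how many independent directions survive. The function name `rank` and the tolerance `tol` are assumptions made for illustration; the tolerance is a pragmatic cutoff for floating point round-off, not part of the mathematical definition.

```python
def rank(vectors, tol=1e-9):
    """Rank of a collection of equal-length vectors via Gaussian elimination."""
    rows = [[float(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of independent directions found so far
    for col in range(n_cols):
        # Among the rows not yet used, pick the largest entry in this column.
        pivot = max(range(found, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue  # this column adds no new direction
        rows[found], rows[pivot] = rows[pivot], rows[found]
        # Subtract multiples of the pivot row to clear the column below it.
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


v1 = [1, 0, 0]
print(rank([v1, [3.5, 0, 0]]))      # 1: both vectors lie on the x-axis
print(rank([v1, [0, 1, 0]]))        # 2: together they span the x-y plane
print(rank([v1, [2.1, 1.5, 0.8]]))  # 2: they span some other 2-d plane
```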
If the vectors that make up the matrix can span an m dimensional space, then the rank of the matrix is m. But a matrix can be thought of as a collection of vectors in two ways. Since it’s a simple two dimensional grid of numbers, we can either consider all the columns as the collection of vectors or consider all the rows as the collection, as shown below. Here, we have a (3x4) matrix (three rows and four columns). It can be thought of either as a collection of 4 column vectors (each three dimensional) or 3 row vectors (each four dimensional).

Full row rank means all the row vectors are linearly independent. Full column rank means all the column vectors are linearly independent.
When the matrix is a square matrix, it turns out that the row rank and column rank will always be the same. This isn’t obvious at all and a proof is given in the Math StackExchange post, [3]. (The equality in fact holds for a matrix of any shape, not only square ones.) This means that for a square matrix, we can talk just in terms of the rank and don’t have to bother specifying “row rank” or “column rank”.
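As a sanity check (not a proof), the sketch below computes the rank of a matrix once from its rows and once from its columns, by taking the transpose. The function names and the entries of the (3x4) matrix `A` are assumptions chosen for illustration, and the matrix is assumed to be non-empty. Exact fractions are used so that no round-off tolerance is needed. The example is deliberately not square, since row rank and column rank agree for a matrix of any shape.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a non-empty matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        # First unused row with a non-zero entry in this column, if any.
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][col] / rows[found][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
    return found


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


A = [[1, 2, 3, 4],
     [2, 4, 6, 8],   # twice the first row, so it adds nothing to the rank
     [0, 1, 1, 0]]

row_rank = rank(A)                # treat the 3 rows as the collection
column_rank = rank(transpose(A))  # treat the 4 columns as the collection
print(row_rank, column_rank)  # 2 2
```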
The linear transformation corresponding to a (3 x 3) matrix that has a rank of two will map everything in the 3-d space to a lower, 2-d space, much like the (2 x 3) matrix we encountered in the last section.

Notions closely related to the rank of square matrices are the determinant and invertibility.
Determinants
The determinant of a square matrix is its “measure” in a sense. Let me explain by going back to thinking of a matrix as a collection of vectors. Let’s start with just one vector. The way to “measure” it is obvious: its length. And since we’re dealing only with square matrices, the only way to have one vector is to have it be one dimensional, which is basically just a scalar. Things get interesting when we go from one dimension to two. Now, we’re in two dimensional space. So, the notion of “measure” is no longer length, but has graduated to area. And with two vectors in that two dimensional space, it’s the area of the parallelogram they form (the determinant also carries a sign, which depends on the order of the vectors). If the two vectors are parallel to each other (e.g., both lie on the x-axis), in other words, if they are not linearly independent, then the area of the parallelogram between them will become zero. The determinant of the matrix formed by them will be zero and the rank of that matrix will be less than 2 (it will not have full rank).
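For the two dimensional case, the determinant has a short closed form, sketched below. The function name `det2` and the example vectors are assumptions chosen for illustration. The first pair of vectors forms a 3 by 2 rectangle, the second pair is parallel, and the third shows the sign flipping when the two vectors are swapped.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix whose rows are the vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]


rectangle = det2([3, 0], [0, 2])    # sides of length 3 and 2
parallel = det2([1, 0], [3.5, 0])   # both vectors lie on the x-axis
swapped = det2([0, 2], [3, 0])      # same two vectors, opposite order

print(rectangle)     # 6
print(parallel)      # 0.0
print(swapped)       # -6
print(abs(swapped))  # 6
```

The absolute value is the area of the parallelogram; the sign only records the order in which the two vectors were listed.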

Taking it one dimension higher, we get 3 dimensional space. And to construct a square matrix (3x3), we now need three vectors. And since the notion of “measure” in three dimensional space is volume, the determinant of a (3x3) matrix becomes the volume contained between the vectors that make it up.

And this can be extended to a space of any dimensionality.
Notice that we spoke about the area or the volume contained between the vectors. We didn’t specify if these were the vectors composing the rows of the square matrix or the ones composing its columns. And the somewhat surprising thing is that we don’t need to specify this because it doesn’t matter either way. Whether we take the vectors forming the rows and measure the volume between them or the vectors forming the columns, we get the same answer. This is proven in the Math StackExchange post [4].
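The rows-versus-columns claim can be spot-checked on an example (this illustrates the result, it doesn’t prove it). The sketch below computes the determinant by cofactor expansion along the first row, a simple method that is only practical for small matrices, and compares a matrix with its transpose. The function names and the entries of `A` are assumptions chosen for illustration.

```python
def det(matrix):
    """Determinant of a square matrix by cofactor expansion along row 0."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * det(minor)
    return total


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]

from_rows = det(A)                # volume spanned by the row vectors
from_columns = det(transpose(A))  # volume spanned by the column vectors
print(from_rows, from_columns)  # 25 25
```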
There are several other properties of linear maps and their corresponding matrices which are invaluable in understanding them and extracting value out of them. We’ll be delving into invertibility, eigenvalues, diagonalizability and the different transformations one can do in the coming articles (check back here for links).
References
[1] Linear map: https://en.wikipedia.org/wiki/Linear_map
[2] Matousek’s miniatures: https://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
[3] Math StackExchange post proving row rank and column rank are the same: https://math.stackexchange.com/questions/332908/looking-for-an-intuitive-explanation-why-the-row-rank-is-equal-to-the-column-ran
[4] Math StackExchange post proving the determinants of a matrix and its transpose are the same: https://math.stackexchange.com/a/636198/155881