Week 1 - Machine Learning - Andrew Ng - Coursera

1 week 1

 

1.1 intro

 

1.1.1 what is ML?

  1. definition: the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

Tom Mitchell (1998), well-posed learning problem: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

E: experience
T: task
P: performance

classifications:
  1. Supervised learning
  2. Unsupervised learning

1.1.2 supervised learning

categories:

  1. regression problem - continuous output
  2. classification problem - discrete output
    • example: email spam/not spam

In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

examples of regression

  • housing price prediction

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

1.1.3 unsupervised learning

example: google news, clustering

applications:

  1. organize computing clusters
  2. social network
  3. market segmentation
  4. astronomical data analysis

example 2: cocktail party

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e., identifying individual voices and music from a mesh of sounds).

1.1.4 test 1

I answered the following question incorrectly:

desc: Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.

1.2 Linear Regression with One Variable

 

1.2.1 model representation

m: number of training examples

univariate = one variable. x(i) denotes the i-th input variable; y(i) denotes the i-th output variable.

h: hypothesis, h(x) = a + bx
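As a minimal Octave sketch of evaluating this hypothesis (the theta values and inputs are made up for illustration):

% Univariate hypothesis: h(x) = theta0 + theta1*x
theta = [1; 0.5];            % [theta0; theta1], illustrative values
x = 4;                       % a single input
h = theta(1) + theta(2)*x    % h = 3

% Vectorized over m examples: prepend a column of ones to the inputs
X = [ones(3,1), [1; 2; 3]];  % m = 3 examples
predictions = X * theta      % one prediction per example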

house price prediction: regression vs classification, depending on the type of the target variable:

target variable type    problem type
continuous              regression
discrete values         classification

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

Linear regression predicts a real-valued output based on an input value. We discuss the application of linear regression to housing price prediction, present the notion of a cost function, and introduce the gradient descent method for learning.

1.2.2 cost function - J(θ)

octave code: J = sum((X*theta - y).^2) / (2*m)

an example is the least-squares error (least squares estimation).

m: number of training examples
J(θ): cost function

goal: minimize the cost function. We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's and the actual output y's.

J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2 = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right)^2

To break it apart, it is \frac{1}{2}\bar{x}, where \bar{x} is the mean of the squares of h_\theta(x_i) - y_i, i.e. of the differences between the predicted values and the actual values.

This function is otherwise called the "squared error function" or "mean squared error". The mean is halved \left(\frac{1}{2}\right) as a convenience for the computation of gradient descent, since the derivative of the squared term cancels the \frac{1}{2}.
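A small, self-contained Octave check of this formula (the data values are made up for illustration):

% Toy data: m = 3 examples, hypothesis h(x) = theta0 + theta1*x
X = [1 1; 1 2; 1 3];      % first column of ones multiplies theta0
y = [1; 2; 3];
m = length(y);

theta = [0; 1];           % a perfect fit on this data
J = sum((X*theta - y).^2) / (2*m)   % J = 0

theta = [0; 0.5];         % a worse hypothesis
J = sum((X*theta - y).^2) / (2*m)   % J is about 0.583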

contour plot: a graph that contains many contour lines

feature:
  • a contour line of a two-variable function has a constant value at all points on the same line (see the sketch below).
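As a sketch of how such a plot can be produced in Octave (the grid ranges and toy data are illustrative choices):

% Contour plot of the cost J(theta0, theta1) on toy data
X = [1 1; 1 2; 1 3];  y = [1; 2; 3];  m = length(y);

theta0_vals = linspace(-2, 2, 100);
theta1_vals = linspace(-1, 3, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));

for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    t = [theta0_vals(i); theta1_vals(j)];
    J_vals(i, j) = sum((X*t - y).^2) / (2*m);
  end
end

% Each contour line connects (theta0, theta1) pairs with equal cost
contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 2, 20))
xlabel('\theta_0'); ylabel('\theta_1');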

1.2.3 gradient descent

goal: minimize the cost function J(θ0, θ1)

an algorithm for automatically finding the values of θ0 and θ1 that minimize the cost function

We put θ0 on the x axis and θ1 on the y axis, with the cost function on the vertical z axis.

https://www.hackerearth.com/blog/developers/gradient-descent-algorithm-linear-regression/

1.2.4 gradient descent for linear regression
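For linear regression, each step of gradient descent updates both parameters simultaneously, where α is the learning rate:

θ0 := θ0 - α (1/m) \sum_{i=1}^{m} (h_θ(x_i) - y_i)
θ1 := θ1 - α (1/m) \sum_{i=1}^{m} (h_θ(x_i) - y_i) · x_i

A minimal vectorized Octave sketch of this loop (α, the iteration count, and the toy data are illustrative, not tuned values):

% Batch gradient descent for univariate linear regression
X = [1 1; 1 2; 1 3];  y = [1; 2; 3];  m = length(y);
theta = [0; 0];       % initial guess
alpha = 0.1;          % learning rate
num_iters = 1000;

for iter = 1:num_iters
  % Simultaneous update: compute the full gradient before changing theta
  grad = (X' * (X*theta - y)) / m;
  theta = theta - alpha * grad;
end

theta   % approaches [0; 1], the exact fit for this toy data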

1.3 quiz 1

1.4 linear algebra review

This optional module provides a refresher on linear algebra concepts. Basic understanding of linear algebra is necessary for the rest of the course, especially as we begin to cover models with multiple variables.

1.4.1 matrices and vectors

  • A_ij refers to the element in the ith row and jth column of matrix A.
  • A vector with 'n' rows is referred to as an 'n'-dimensional vector.
  • v_i refers to the element in the ith row of the vector.
  • In general, all our vectors and matrices will be 1-indexed. Note that for some programming languages, the arrays are 0-indexed.
  • Matrices are usually denoted by uppercase names while vectors are lowercase.
  • "Scalar" means that an object is a single value, not a vector or matrix.
  • R refers to the set of scalar real numbers.
  • R^n refers to the set of n-dimensional vectors of real numbers.

% The ; denotes we are going back to a new row.
A = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12]

A =

    1    2    3
    4    5    6
    7    8    9
   10   11   12

% Get the dimension of the matrix A, where m = rows and n = columns
[m,n] = size(A)

m = 4
n = 3

% You could also store it this way
dim_A = size(A)

dim_A =

   4   3

% Let's index into the 2nd row, 3rd column of matrix A
A_23 = A(2,3)

A_23 = 6

% Initialize a vector
v = [1; 2; 3]

v =

   1
   2
   3

% Get the dimension of the vector v
dim_v = size(v)

dim_v =

   3   1

1.4.2 addition and scalar multiplication

+ addition
- subtraction

note: To add or subtract two matrices, their dimensions must be the same.

[a b; c d] + [w x; y z] = [a+w b+x; c+y d+z]

[a b; c d] - [w x; y z] = [a-w b-x; c-y d-z]

% Initialize matrices A and B
A = [1, 2, 4; 5, 3, 2]
B = [1, 3, 4; 1, 1, 1]

A =
   1   2   4
   5   3   2

% Initialize constant s
s = 2

% See how element-wise addition works
addAB = A + B

% See how element-wise subtraction works
subAB = A - B

% See how scalar multiplication works
multAs = A * s

% Divide A by s
divAs = A / s

% What happens if we have a Matrix + scalar?
addAs = A + s

addAs =

   3   4   6
   7   5   4

1.4.3 matrix vector multiplication

we map the column of the vector onto each row of the matrix, multiplying each element and summing the result. [a b; c d; e f]∗[x; y]=[a∗x+b∗y; c∗x+d∗y; e∗x+f∗y] The result is a vector. The number of columns of the matrix must equal the number of rows of the vector.

An m x n matrix multiplied by an n x 1 vector results in an m x 1 vector.

exercise

% Initialize matrix A 
A = [1, 2, 3; 4, 5, 6; 7, 8, 9]

% Initialize vector v 
v = [1; 1; 1] 

% Multiply A * v
Av = A * v

example of an application: house price prediction, where one matrix-vector product yields a prediction for every house at once.
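A sketch of that application in Octave (the house sizes and the parameters of h(x) = -40 + 0.25x are illustrative values, not fitted ones):

% Predict prices for 4 houses with one matrix-vector product
sizes = [2104; 1416; 1534; 852];   % house sizes
X = [ones(4,1), sizes];            % prepend ones for the intercept term
theta = [-40; 0.25];               % hypothesis parameters

predictions = X * theta            % one predicted price per house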

1.4.4 matrix matrix multiplication

% Initialize a 3 by 2 matrix
A = [1, 2; 3, 4; 5, 6]

% Initialize a 2 by 1 matrix
B = [1; 2]

% We expect a resulting matrix of (3 by 2)*(2 by 1) = (3 by 1)
multAB = A*B

% Make sure you understand why we got that result

example of an application: house price prediction with several competing hypotheses scored in one product.
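A sketch of that idea in Octave (the three hypotheses' parameters are invented for illustration): put one parameter vector per column, and a single matrix-matrix product scores every hypothesis on every house.

% Score 3 competing hypotheses on 4 houses at once
sizes = [2104; 1416; 1534; 852];
X = [ones(4,1), sizes];

% Each column is one hypothesis [theta0; theta1]
Theta = [-40   200  -150;
         0.25  0.1   0.4];

predictions = X * Theta   % 4x3: row = house, column = hypothesis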

1.4.5 matrix multiplication properties

  1. Matrices are not commutative: A∗B ≠ B∗A
  2. Matrices are associative: (A∗B)∗C = A∗(B∗C)
  3. identity matrix: multiplying any matrix by the identity leaves it unchanged

exercise

% Initialize matrices A and B
A = [1, 2; 4, 5]
B = [1, 1; 0, 2]

% Initialize a 2 by 2 identity matrix
I = eye(2)

% The above notation is the same as I = [1, 0; 0, 1]

% What happens when we multiply I*A?
IA = I*A

% How about A*I?
AI = A*I

% Compute A*B
AB = A*B

% Is it equal to B*A?
BA = B*A

% Note that IA = AI but AB != BA

1.4.6 inverse and transpose

The inverse of a matrix A is denoted A^-1. Multiplying a matrix by its inverse results in the identity matrix. A non-square matrix does not have an inverse. Matrices that don't have an inverse are called singular or degenerate. We can compute inverses in Octave with the pinv(A) function (the pseudo-inverse, which also works for non-square matrices) and in MATLAB with the inv(A) function:

A = [1 2 3; 4 5 6];
pinv(A)         % octave; works even though A is not square
transpose(A)    % same as A'

The transposition of a matrix is like rotating the matrix 90° in the clockwise direction and then reversing it. We can compute the transposition of a matrix in MATLAB with the transpose(A) function or with A'.
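A quick Octave check of the inverse property on a square, invertible matrix (the matrix itself is an arbitrary example):

% For a square, invertible matrix, A * inv(A) gives the identity
A = [1, 2; 3, 4];
A_inv = inv(A)    % [-2 1; 1.5 -0.5]
A * A_inv         % identity matrix, up to floating-point rounding

% The transpose swaps rows and columns
A'                % [1 3; 2 4]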

