Machine Learning Coursera [week 1-3]

Previously, the browser kept buffering when loading the course videos; the cause was an incorrect hosts configuration. The fix is as follows:

1. Go to C:\Windows\System32\drivers\etc, find the hosts file, and open it as plain text.


Add the following entries to the hosts file:

52.84.246.90 d3c33hcgiwev3.cloudfront.net
52.84.246.252 d3c33hcgiwev3.cloudfront.net
52.84.246.144 d3c33hcgiwev3.cloudfront.net
52.84.246.72 d3c33hcgiwev3.cloudfront.net
52.84.246.106 d3c33hcgiwev3.cloudfront.net
52.84.246.135 d3c33hcgiwev3.cloudfront.net
52.84.246.114 d3c33hcgiwev3.cloudfront.net
52.84.246.227 d3c33hcgiwev3.cloudfront.net

2. Flush the DNS cache: run ipconfig /flushdns from a command prompt.

1.1 Introduction

structure and usage of machine learning


the definition of ML

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

supervised learning

In supervised learning, we are given a data set and already know what the correct output should look like, with the idea that there is a relationship between the input and the output.

There are two types of supervised learning: regression and classification. One way to tell them apart is whether the output is continuous (regression) or discrete (classification).

unsupervised learning

In unsupervised learning there are no labels, and we hope the computer can help us discover structure and group (label) the data sets.

1.2 Model and cost function

1.2.1 Model representation:


training examples (x^(i), y^(i)), i = 1, 2, 3, ..., m, where m is the size of the training set;

h(x) is a 'good' predictor for the target housing price y, and h here is called the hypothesis;

if the target we are trying to predict is continuous, such as the housing price, we call the learning problem a regression problem.
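As a minimal sketch (in Python/NumPy rather than the course's Octave, and with made-up numbers), the univariate hypothesis is just a straight line:

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# Hypothetical parameters: base price 50 (k$) plus 0.1 (k$) per square foot.
prices = h(50.0, 0.1, [1000, 2000, 3000])
print(prices)  # [150. 250. 350.]
```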

1.2.2 Some figures of linear regression


cost function 

Choose a suitable hθ(x) so that its error against y is as small as possible.

Define the cost function (squared error):

J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2
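A sketch of this cost function in Python/NumPy, checked on the toy data set (1,1), (2,2), (3,3), for which θ0 = 0, θ1 = 1 is a perfect fit and J = 0:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = len(y)
    errors = theta0 + theta1 * x - y
    return errors @ errors / (2 * m)

x, y = [1, 2, 3], [1, 2, 3]
print(compute_cost(0, 1, x, y))   # 0.0 -- perfect fit
print(compute_cost(0, 0, x, y))   # (1 + 4 + 9) / 6, roughly 2.333
```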

1.2.3 Cost function - intuition I

When θ0 = 0 and θ1 = 1, the hypothesis and the cost function (as a function of the parameter θ1) are shown in the course figure.


The hypothesis function and the cost function are in correspondence: each value of the parameter gives one hypothesis hθ(x) and one corresponding value of the cost function J.


1.2.4 Intuition II

Now we no longer fix θ0; both parameters θ0 and θ1 vary together.


The height of the surface above the (θ0, θ1) plane is the value of J(θ0, θ1); the course shows this as a 3-D surface plot.


The left graph of the course figure is called a contour plot (contour figure): each ellipse collects points with the same value of J(θ0, θ1), and we want to get as close to the minimum as possible.


1.2.5 Finding the hypothesis that minimizes the cost function J

The goal is to find the parameter values that minimize the cost function, which for linear regression is quadratic (second-order) in the parameters; on the contour plot, the point at the center of the innermost ellipse is what we need, and it corresponds to the two values θ0 and θ1.


1.3 Parameter learning

1.3.1 Introduction to gradient descent


The idea of gradient descent is like walking down a hill: starting from some hypothesis (θ0 and θ1), we repeatedly take steps downhill on the surface of the cost function J(θ0, θ1) graphed in the course figure.


The black line tangent to the cost-function curve is obtained using the derivative.

α is a parameter called the learning rate. A small α results in a small step and a larger α in a larger step. The direction is given by the partial derivative of J(θ0, θ1).


1.3.2 Outline of the gradient descent algorithm

θ0 and θ1 must be updated simultaneously: compute temp0 and temp1 from the current values first, then assign both. If θ0 is overwritten before temp1 is computed (as in the line listed for θ0), the new θ0 is incorrectly substituted into the equation for temp1.
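The simultaneous update can be sketched as follows (dJ_dtheta0 and dJ_dtheta1 are hypothetical placeholders for the two partial derivatives):

```python
def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One gradient-descent step; both partials are evaluated at the
    CURRENT (theta0, theta1) before either parameter is overwritten."""
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1  # simultaneous update

# Toy cost J = theta0^2 + theta1^2, whose partials are 2*theta0 and 2*theta1.
t0, t1 = gradient_step(1.0, 2.0, 0.1,
                       lambda a, b: 2 * a,
                       lambda a, b: 2 * b)
print(t0, t1)  # 0.8 1.6
```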


1.3.3 Gradient descent intuition


If α is too small, gradient descent can be slow; if α is too large, gradient descent can overshoot the minimum, and it may fail to converge or even diverge.


Gradient descent can converge to a local minimum even with a fixed learning rate α, because as we approach a minimum the gradient shrinks, so gradient descent automatically takes smaller steps.


The sign of the gradient determines how θ changes: when the gradient is positive, gradient descent decreases θ, and when the gradient is negative, it increases θ.


gradient descent for linear regression

The partial derivatives with respect to θ0 and θ1 are:

∂J/∂θ0 = (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))

∂J/∂θ1 = (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i)

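Putting the two partial derivatives into a loop gives batch gradient descent for univariate linear regression. A sketch in Python/NumPy, on a made-up toy data set lying exactly on y = 2x + 1:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m = len(y)
    theta0 = theta1 = 0.0
    for _ in range(iterations):
        errors = theta0 + theta1 * x - y          # h(x_i) - y_i for every example
        grad0 = errors.sum() / m                  # dJ/dtheta0
        grad1 = (errors * x).sum() / m            # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
theta0, theta1 = gradient_descent(x, y)
print(round(theta0, 3), round(theta1, 3))  # close to 1.0 and 2.0
```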

The cost function of linear regression is a convex function (bowl shape), so it has no local optima other than the single global minimum.


Batch gradient descent: every step makes full use of all the training examples.

Gradient descent can be susceptible to local minima in general; for linear regression, however, the cost function is convex, so gradient descent (with a suitable learning rate) always converges to the global minimum.


review

A vector is a matrix with a single column, i.e. an n×1 matrix.


R refers to the set of scalar real numbers.

R^n refers to the set of n-dimensional vectors of real numbers.

1.3.4 Addition and scalar multiplication

The material here is the same as in basic linear algebra, so it may not be necessary to study it again.

1.3.5 Matrix-vector multiplication

The material here is the same as in basic linear algebra, so it may not be necessary to study it again.

1.3.6 Matrix multiplication properties

identity matrix


1.3.7 Review: inverse and transpose of a matrix

Through computing one can check that, for an invertible square matrix A, A multiplied by its inverse equals the identity in either order (A·A⁻¹ = A⁻¹·A = I); note, however, that matrix multiplication is not commutative in general (AB ≠ BA).
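A quick NumPy check of both facts, using an arbitrary invertible 2×2 matrix:

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])   # det = 10, so A is invertible
A_inv = np.linalg.inv(A)

# For an invertible square matrix, A @ A_inv and A_inv @ A both give I.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True

# But matrix multiplication is not commutative in general.
B = np.array([[0.0, 1.0], [0.0, 0.0]])
print(np.allclose(A @ B, B @ A))          # False
```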


Week 02: Linear regression

2.1 Multiple features (variables)

2.1.1 Multiple linear regression

Notation: xj^(i) = the value of feature j in the ith training example.


For example, x3^(2): x^(2) is the feature vector of the 2nd training example (row 2 of the table), and x3 is its third entry.

Extending the hypothesis to n features gives the multivariable form of the hypothesis function:

hθ(x) = θ0·x0 + θ1·x1 + θ2·x2 + … + θn·xn = θᵀx (with x0 = 1)

To make sense of the n-feature hypothesis hθ(x), the course walks through an example interpreting each term.

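The vectorized hypothesis hθ(x) = θᵀx can be sketched as a single inner product (all parameter and feature values below are made up for illustration):

```python
import numpy as np

# Hypothetical parameters: base price 80, plus per-feature contributions.
theta = np.array([80.0, 0.1, 4.0, 3.0])   # theta0, theta1, theta2, theta3

# One training example: x0 = 1 (intercept), then three feature values
# (size, bedrooms, floors -- made up).
x = np.array([1.0, 1416.0, 3.0, 2.0])

prediction = theta @ x                    # theta' * x in Octave notation
print(prediction)  # 80 + 0.1*1416 + 4*3 + 3*2 = 239.6
```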

2.1.2 Gradient descent for multiple variables

The update rule generalizes to θj := θj − α · (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · xj^(i), updated simultaneously for j = 0, 1, …, n.

2.1.3 Gradient descent in practice I - feature scaling

mean normalization


An appropriate mean normalization can make gradient descent converge more quickly.

Use xi := (xi − μi) / si

where μi is the average of all the values for feature i and si is the range of values (max − min); alternatively, si can be the standard deviation.

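A sketch of mean normalization in Python/NumPy, with hypothetical housing data whose two features live on very different scales:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature (column) to x := (x - mu) / s, with s = max - min."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)   # the range; the std would also work
    return (X - mu) / s, mu, s

# Hypothetical data: house sizes (feet^2) and bedroom counts.
X = [[2104, 3], [1600, 3], [2400, 4], [1416, 2]]
X_norm, mu, s = mean_normalize(X)
print(X_norm.min(axis=0), X_norm.max(axis=0))  # every column now lies within [-1, 1]
```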

2.1.4 Gradient descent in practice II - learning rate

Adjusting the learning rate is a kind of debugging: try values roughly 3× apart (e.g. 0.001, 0.003, 0.01, 0.03, …) and pick the one that makes J(θ) decrease fastest.


If J(θ) ever increases, then you probably need to decrease α.

When the J(θ) curve fluctuates (oscillates), a smaller learning rate is needed.


To summarize:

If α is too small: slow convergence.

If α is too large: may not decrease on every iteration and thus may not converge.
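The summary can be sketched on a toy 1-D cost J(θ) = θ² (gradient 2θ): a tiny α converges slowly, a moderate α converges quickly, and a too-large α makes |θ| grow, i.e. diverge:

```python
def run(alpha, steps=50, theta=1.0):
    """Gradient descent on J(theta) = theta^2, whose gradient is 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return abs(theta)

print(run(0.01))  # slow: still far from 0 after 50 steps
print(run(0.4))   # converges quickly toward 0
print(run(1.1))   # overshoots every step: |theta| grows -- diverges
```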

2.1.5 Features and polynomial regression


Polynomial regression changes the form of the original hypothesis, e.g. fitting the price with quadratic or cubic terms of the size feature; note that if the size ranges from 1 to 1000 (feet), feature scaling becomes important, because size^2 then ranges from 1 to 10^6.


In this way the hypothesis uses derived features x1 and x2 built from the original size feature (for example x1 = size and x2 = size², or √size).

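A sketch of building polynomial features x1 = size, x2 = size², x3 = size³ from a single hypothetical size feature, showing why scaling matters:

```python
import numpy as np

size = np.array([1.0, 10.0, 100.0, 1000.0])   # hypothetical sizes (feet)

# New features for a cubic hypothesis h = t0 + t1*x1 + t2*x2 + t3*x3.
x1, x2, x3 = size, size ** 2, size ** 3
print(x1.max(), x2.max(), x3.max())  # 1000.0 1000000.0 1000000000.0 -- wildly different ranges
```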

2.1.6 Normal equation

The course gives a worked example of the normal equation on a small housing data set.


In programming:

x' means the transpose of x;

pinv(X) computes the (pseudo-)inverse of a matrix.


normal equation formula

θ = (XᵀX)⁻¹ · Xᵀ · y (in Octave: theta = pinv(X' * X) * X' * y)
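The normal equation in Python/NumPy (np.linalg.pinv playing the role of Octave's pinv); on noise-free toy data it recovers the parameters exactly:

```python
import numpy as np

# Toy design matrix: first column of ones (x0 = 1), one feature column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])          # exactly y = 1 + 2*x

theta = np.linalg.pinv(X.T @ X) @ X.T @ y   # theta = pinv(X'X) X' y
print(theta)  # approximately [1. 2.]
```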

The comparison of gradient descent and the normal equation:

- gradient descent: needs to choose α and takes many iterations, but works well even when n is large;
- normal equation: no need to choose α and no iterations, but computing (XᵀX)⁻¹ costs about O(n³), so it is slow when n is very large.

2.1.7 Normal equation noninvertibility

Common causes of XᵀX being non-invertible: redundant (linearly dependent) features, or too many features (m ≤ n); the remedies are deleting some features or using regularization. Octave's pinv still gives a usable θ in this case.

feature scaling: 特征缩放

normalized features:标准化特征

2.1.8 Practice of Octave

Some basic operations in Octave; they resemble the corresponding operations in MATLAB or Python/NumPy.

2.1.9 Vectorization


hθ(x) is the hypothesis function in the parameters θ0 … θn; Octave can vectorize it. Unvectorized, the prediction accumulates theta(j) * x(j) over all j in a loop; vectorized, it is a single product: prediction = theta' * x.

The course also shows the equivalent C++ implementation (an explicit for loop versus a vectorized library call).
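The loop and the vectorized form can be sketched side by side in Python/NumPy (mirroring the Octave/C++ versions in the course):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 4.0, 5.0])   # x[0] = 1 for the intercept

# Unvectorized: accumulate theta_j * x_j in a loop.
prediction_loop = 0.0
for j in range(len(theta)):
    prediction_loop += theta[j] * x[j]

# Vectorized: a single inner product (theta' * x in Octave).
prediction_vec = theta @ x

print(prediction_loop, prediction_vec)  # both 24.0
```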

Download Octave for the programming exercises; the GNU Octave docs are here.

2.2 Homework week 02

Part 1: linear regression with one variable

2.2.1 Input the data

Create a scatter plot of the data with plotData, using column 1 as x and column 2 as y:

clear ; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);

View Code