Previously the browser kept buffering when streaming the course videos; the cause was an incorrect hosts configuration. The fix is as follows:
1. Go to C:\Windows\System32\drivers\etc, find the hosts file, open it as text,
and add the following entries to the hosts file:
52.84.246.90 d3c33hcgiwev3.cloudfront.net
52.84.246.252 d3c33hcgiwev3.cloudfront.net
52.84.246.144 d3c33hcgiwev3.cloudfront.net
52.84.246.72 d3c33hcgiwev3.cloudfront.net
52.84.246.106 d3c33hcgiwev3.cloudfront.net
52.84.246.135 d3c33hcgiwev3.cloudfront.net
52.84.246.114 d3c33hcgiwev3.cloudfront.net
52.84.246.227 d3c33hcgiwev3.cloudfront.net
2. Flush the DNS cache: ipconfig /flushdns
1.1 Introduction
structure and usage of machine learning
the definition of ML
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
supervised learning
In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
There are two types of supervised learning: regression and classification. One sign for telling them apart is whether the output being predicted is continuous (regression) or discrete (classification).
unsupervised learning
In unsupervised learning there are no labels, and we hope the computer can help us find structure in (effectively label) the datasets on its own.
1.2 Model and cost function
1.2.1 Model representation:
Training examples (x^(i), y^(i)), i = 1, 2, 3, ..., m, where m is the size of the training set;
h(x) is a 'good' predictor of the housing price y; this h(x) is called the hypothesis;
If the value we are trying to predict is continuous, such as the housing price, we call the learning problem a regression problem.
1.2.2 Some figures of linear regression
cost function
Choose a suitable hθ(x) that makes the error against y as small as possible;
to measure this, make a cost function, shown below.
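For reference, these are the hypothesis and the squared-error cost function from the course, written in LaTeX:
h_\theta(x) = \theta_0 + \theta_1 x
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
The goal is to minimize J over (θ0, θ1); the factor 1/2 is a convenience that cancels when the square is differentiated.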
1.2.3 Cost function - intuition I
With θ0 = 0, the hypothesis and the corresponding cost are plotted below; at θ1 = 1 the fit is perfect and the cost is 0.
The figures show the relationship between the hypothesis function and the cost function: each choice of hypothesis (each value of θ1) corresponds to one value of the cost function.
1.2.4 Intuition II
Now we consider particular values of both θ0 and θ1 (θ0 is no longer fixed at 0).
The height of the surface above the (θ0, θ1) plane is the value of J(θ0, θ1); see the description in the picture below.
The left graph below is called a contour plot (contour figure): points on the same contour have equal cost, and we want to get as close as possible to the minimum.
1.2.5 Minimizing the cost function J
The goal is to find parameter values that minimize the cost function, which is a quadratic (second-order) function; on the contour plot, the point inside the innermost circle is what we need, and it corresponds to one pair of values θ0 and θ1.
1.3 Parameter learning
1.3.1 Introduction of gradient descent
The idea of gradient descent is like a walker going down a hill: it starts from the hypothesis function (parameters θ0 and θ1), and the cost function J built on that hypothesis is graphed below.
The black line tangent to the cost function is obtained from the derivative.
α is a parameter called the learning rate. A small α results in a small step, and a larger α in a larger step; the direction of the step is given by the partial derivative of J(θ0, θ1).
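The update rule itself, as stated in the course (repeat until convergence, for j = 0 and j = 1):
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)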
1.3.2 Outline of the gradient descent algorithm
θ0 and θ1 must be updated simultaneously: compute temp0 and temp1 from the current values first, and only then assign both. If θ0 is overwritten right after its own line, the later computation of temp1 uses the new θ0 and is incorrect; see the sketch below.
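A minimal Octave sketch of one correct iteration; dJ_dtheta0 and dJ_dtheta1 are hypothetical placeholder names for the two partial derivatives of J evaluated at the current parameters:
% One iteration of gradient descent with a SIMULTANEOUS update.
% dJ_dtheta0, dJ_dtheta1: partial derivatives of J at the current
% (theta0, theta1) -- placeholder names, not course-provided code.
temp0 = theta0 - alpha * dJ_dtheta0;  % computed from the old parameters
temp1 = theta1 - alpha * dJ_dtheta1;  % also from the old parameters
theta0 = temp0;  % only now overwrite both
theta1 = temp1;
% Incorrect variant: assigning theta0 = temp0 before computing temp1
% would make temp1 depend on the already-updated theta0.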
1.3.3 Gradient descent intuition
If α is too small, gradient descent can be slow; if α is too large, gradient descent can overshoot the minimum, and may fail to converge or even diverge.
Gradient descent can converge to a local minimum even with the learning rate α held fixed:
as it approaches the minimum the derivative shrinks, so gradient descent automatically takes smaller steps and the result converges.
The sign of the derivative determines how θ changes: since θ := θ − α · dJ/dθ, θ decreases when the derivative is positive and increases when it is negative.
gradient descent for linear regression
partial derivatives for θ0 and θ1:
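Substituting the linear hypothesis into the update rule gives the course's concrete updates:
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}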
The cost function of linear regression is a convex function (bowl shape), with a single global minimum.
Batch gradient descent: each step makes use of all the training examples.
In general, gradient descent can be susceptible to local minima; but because J is convex here, gradient descent for linear regression always converges to the global minimum (assuming α is not too large).
review: linear algebra
A vector is a matrix with a single column, i.e., an n×1 matrix.
R refers to the set of scalar real numbers;
R^n refers to the set of n-dimensional vectors of real numbers.
1.3.4 Addition and scalar multiplication
This is standard linear algebra; if you already know it, there may be no need to relearn it.
1.3.5 Matrix-vector multiplication
This is also standard linear algebra; see the short Octave example below.
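A short Octave illustration of addition, scalar multiplication, and matrix-vector multiplication (the values are my own example):
A = [1 2; 3 4; 5 6];   % 3x2 matrix
B = [1 1; 1 1; 1 1];   % same size, so element-wise addition is defined
v = [1; 2];            % 2x1 column vector
A + B                  % element-wise sum
3 * A                  % scalar multiplication scales every entry
A * v                  % (3x2)*(2x1) gives a 3x1 vector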
1.3.6 Matrix multiplication properties
Matrix multiplication is not commutative (in general A·B ≠ B·A) but it is associative; the identity matrix I satisfies A·I = I·A = A.
1.3.7 Inverse and transpose of a matrix
For an invertible square matrix A, multiplying by its inverse gives the identity from either side: A · A⁻¹ = A⁻¹ · A = I.
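A quick Octave check of this fact (the example matrix is my own):
A = [1 2; 3 4];     % invertible 2x2 matrix
A_inv = pinv(A);    % pseudo-inverse; inv(A) also works here
A * A_inv           % the 2x2 identity, up to rounding error
A_inv * A           % the same identity
A'                  % the transpose of A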
Week 02: Linear regression
2.1 Multiple features (variables)
2.1.1 Multiple linear regression
Notation: x_j^(i) = the value of feature j in the i-th training example.
For example, in x_3^(2), the superscript (2) means the second training example (row 2 of the table) and the subscript 3 means its third feature; in the course's table this value is 2.
Extending the hypothesis to n features gives the multivariable form of the hypothesis function.
To make sense of hθ(x) with n features, the course gives an example; the form is shown below.
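With the usual convention x_0 = 1, the multivariable hypothesis from the course is:
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^{T} x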
2.1.2 Gradient descent for multiple variables
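For reference, the course's update rule for multiple variables (repeat, updating every θ_j for j = 0, ..., n simultaneously, with x_0^{(i)} = 1):
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}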
2.1.3 Gradient descent in practice I - feature scaling
mean normalization
Scaling features into a similar range (mean normalization) makes gradient descent converge more quickly.
Use x_i := (x_i − μ_i) / s_i,
where μ_i is the average of all the values for feature i, and s_i is the range of values (max − min), or alternatively the standard deviation.
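A minimal Octave sketch of mean normalization over a design matrix X (one row per example, one column per feature); here s_i is taken to be the standard deviation:
function [X_norm, mu, sigma] = featureNormalize(X)
  % Shift each feature to zero mean and unit standard deviation.
  mu = mean(X);                % 1 x n row of column means
  sigma = std(X);              % 1 x n row of column standard deviations
  X_norm = (X - mu) ./ sigma;  % broadcasting applies this column by column
end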
2.1.4 Gradient descent in practice II - learning rate
Adjusting the learning rate is a kind of debugging: plot J(θ) against the number of iterations, and try values of α spaced by roughly a factor of 3 (e.g., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1) until convergence looks good.
If J(θ) ever increases, then you probably need to decrease α.
If the curve fluctuates up and down, a smaller learning rate is needed.
To summarize:
If α is too small: slow convergence.
If α is too large: may not decrease on every iteration and thus may not converge.
2.1.5 Features and polynomial regression
We can create new features and change the form of the hypothesis: for instance, if house size ranges from 1 to 1000 (feet), polynomial regression changes the original linear function into one with higher-order terms, and feature scaling then matters because those terms have much larger ranges.
In this way we use two features to compute the result, where x1 and x2 are as follows.
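For example, a quadratic model in the size feature is still linear in the new features:
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2, \quad x_1 = (\text{size}), \; x_2 = (\text{size})^2
Since (size)^2 has a much larger range than size, this is exactly where feature scaling helps.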
2.1.6 Normal equation
An example of the normal equation:
in programming,
X' means the transpose of X;
pinv(X) computes the (pseudo-)inverse of X.
The normal equation formula:
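As given in the course, with design matrix X and target vector y:
\theta = (X^{T} X)^{-1} X^{T} y
and in Octave:
theta = pinv(X' * X) * X' * y;   % pinv also copes with a noninvertible X'X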
Comparison of gradient descent and the normal equation: gradient descent needs a learning rate α and many iterations, but works well even when the number of features n is large; the normal equation needs no α and no iterations, but computing (XᵀX)⁻¹ costs about O(n³), so it becomes slow when n is very large.
2.1.7 Normal equation noninvertibility
XᵀX can be noninvertible (singular/degenerate) when features are redundant (linearly dependent, e.g., the same size in both feet and meters) or when there are too many features (m ≤ n); the remedies are deleting redundant features or using regularization, and Octave's pinv still returns a workable θ in this case.
Vocabulary: feature scaling; normalized features.
2.1.8 Practice of Octave
Some basic operations in Octave; they resemble the corresponding operations in MATLAB or Python/NumPy.
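A few of those basic operations in Octave (the values are my own examples):
A = [1 2; 3 4; 5 6];  % 3x2 matrix
size(A)               % prints 3 2
A(2, :)               % second row of A
v = 1:0.5:3;          % row vector 1, 1.5, 2, 2.5, 3
ones(2, 3)            % 2x3 matrix of ones
rand(2, 2)            % 2x2 matrix of uniform random numbers
disp(A(1, 2))         % prints 2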
2.1.9 Vectorization
hθ(x) is the hypothesis function in θ0 and θ1; we can use Octave to vectorize it. The unvectorized form accumulates theta(j) * x(j) over j, while the vectorized form is simply prediction = theta' * x.
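The two versions side by side in Octave, assuming n features and (n+1)-dimensional column vectors theta and x (with the x_0 = 1 entry included):
% Unvectorized: accumulate theta(j) * x(j) term by term
prediction = 0.0;
for j = 1:n+1
  prediction = prediction + theta(j) * x(j);
end
% Vectorized: a single inner product
prediction = theta' * x;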
The course also shows a corresponding C++ version of this computation.
Download Octave to follow along; the GNU Octave docs are here.
2.2 Homework week 02
Part 1: Linear regression with one variable
2.2.1 Input the data
Create a scatter plot of the data: plotData plots column 1 (the city population) against column 2 (the profit).
clear ; close all; clc
%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples
% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);
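The script calls two functions the exercise asks you to complete. A possible sketch of each (the axis labels follow ex1's data description, where column 1 is city population in 10,000s and column 2 is profit in $10,000s):
function A = warmUpExercise()
  % Return the 5x5 identity matrix.
  A = eye(5);
end

function plotData(x, y)
  % Scatter plot of the training data as red crosses.
  figure;
  plot(x, y, 'rx', 'MarkerSize', 10);
  ylabel('Profit in $10,000s');
  xlabel('Population of City in 10,000s');
end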