COUSERA Machine Learning Week02

13 min read, Nov 12, 2020

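As a software engineer taking Coursera's Machine Learning course, I'm sharing what I learned in week 2 as a set of notes. Starting this week, we covered the basics of linear regression and used Octave to actually run the calculations. There is also a fair amount of math this time. I was moved to see high school and university math turn out to be useful like this.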

There is also an article on week 1, so please give it a read if you're interested.

COUSERA Machine Learning Week01

A software engineer taking Coursera…

atsss.medium.com

Multivariate Linear Regression

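Up to last week, for simplicity, we dealt with simple regression, which has only one variable. From this week on, we handle multivariate linear regression, which has multiple features (variables).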

Multiple Features

As mentioned, from now on we deal with linear regression that has multiple features. This section in particular explains the notation.

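The hypothesis of multivariate linear regression, $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$, can be rewritten with matrices by defining $x_0 = 1$: $h_\theta(x) = \theta^T x$, where $\theta = [\theta_0, \theta_1, \ldots, \theta_n]^T$ and $x = [x_0, x_1, \ldots, x_n]^T$.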

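The superscript $T$ in $\theta^T x$ denotes the matrix transpose. Put simply, transposing builds a new matrix by turning the rows into columns. I'll leave the detailed explanation to the article below.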

転置行列の基本的な４つの性質と証明 (Four basic properties of the transpose and their proofs)

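Starting from "what is a transpose?", it covers the basic properties and their proofs: that transposition leaves the trace and determinant unchanged, and how transposition interacts with matrix products and inverses.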

mathtrain.jp

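Writing the hypothesis with matrices like this makes it easier to read. Furthermore, in programming languages with built-in matrix operations, the computation also becomes more efficient.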

Finally, here is the summary of this section.

Linear regression with multiple variables is also known as “multivariate linear regression”.
We now introduce notation for equations where we can have any number of input variables.

The multivariable form of the hypothesis function accommodating these multiple features is as follows: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$

In order to develop intuition about this function, we can think about θ0 as the basic price of a house, θ1 as the price per square meter, θ2 as the price per floor, etc. x1 will be the number of square meters in the house, x2 the number of floors, etc.
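Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as: $h_\theta(x) = [\theta_0 \; \theta_1 \; \cdots \; \theta_n]\,[x_0 \; x_1 \; \cdots \; x_n]^T = \theta^T x$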

This is a vectorization of our hypothesis function for one training example; see the lessons on vectorization to learn more.
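The vectorized hypothesis $h_\theta(x) = \theta^T x$ is just a dot product once $x_0 = 1$ is prepended. Here is a minimal sketch in plain Python, with no matrix library assumed. The course itself uses Octave, and the function names `hypothesis` and `predict_all` are hypothetical, for illustration only.

```python
# Hypothetical helper names: a sketch of h_theta(x) = theta^T x, not course code.

def hypothesis(theta, features):
    """Return theta^T x for one example, where x = [x0, x1, ..., xn] with x0 = 1."""
    x = [1.0] + list(features)  # prepend x0 = 1 so theta0 acts as the intercept
    if len(x) != len(theta):
        raise ValueError("theta must have exactly n + 1 entries")
    # theta^T x is the dot product of the two vectors
    return sum(t * xi for t, xi in zip(theta, x))


def predict_all(theta, rows):
    """Apply the hypothesis to every training example (every row of X)."""
    return [hypothesis(theta, row) for row in rows]
```

The example below uses the house-price reading from the summary: θ0 is the base price, θ1 the price per square meter, and θ2 the price per floor. With made-up values θ = [80, 0.5, 10], a 100 m², 2-floor house gives `hypothesis([80.0, 0.5, 10.0], [100.0, 2.0])`, that is, 80 + 0.5 × 100 + 10 × 2 = 150.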

Gradient Descent For Multiple Variables

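This section explains gradient descent for multivariate linear regression. Until last week, we dealt with linear regression that has only the variables $x_0$ and $x_1$ ($n = 1$). Because $x_0$ is always 1, it can be omitted from the notation. This gives the familiar update rules, repeated until convergence with $\theta_0$ and $\theta_1$ updated simultaneously, where $\alpha$ is the learning rate and $m$ is the number of training examples: $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)$ and $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_1^{(i)}$.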

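In the multivariable case ($n > 1$), taking the partial derivatives in the same way gives one update rule per parameter: $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$, $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_1^{(i)}$, $\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_2^{(i)}$, and so on.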

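Written generally, this becomes $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$ for $j = 0, 1, \ldots, n$, with all $\theta_j$ updated simultaneously.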

So gradient descent for multivariate linear regression can be expressed with this single update rule, applied to every $\theta_j$.
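The general update rule translates almost directly into code. Below is a minimal sketch of batch gradient descent in plain Python. It is not the course's Octave assignment code. It assumes `X` is a list of rows that already start with $x_0 = 1$. The names `compute_cost` and `gradient_descent` are hypothetical.

```python
# A minimal sketch of batch gradient descent for multivariate linear regression.
# Assumption: X is a list of rows that already start with x0 = 1.

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((h_theta(x^(i)) - y^(i))^2)."""
    m = len(y)
    errors = [sum(t * xj for t, xj in zip(theta, row)) - yi for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / (2 * m)


def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeat theta_j := theta_j - alpha/m * sum((h - y) * x_j) for all j at once."""
    m = len(y)
    theta = list(theta)
    for _ in range(num_iters):
        # errors h_theta(x^(i)) - y^(i), all computed with the *current* theta
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
                  for row, yi in zip(X, y)]
        # simultaneous update: every new theta_j is built from the same errors
        theta = [tj - alpha / m * sum(e * row[j] for e, row in zip(errors, X))
                 for j, tj in enumerate(theta)]
    return theta
```

Consider a tiny made-up dataset generated from $y = 1 + 2x_1 + 3x_2$, for example `X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]` and `y = [1, 3, 4, 6, 8]`. Running `gradient_descent(X, y, [0, 0, 0], 0.1, 5000)` should end up very close to `[1, 2, 3]`. Building the whole new `theta` list from one shared `errors` list is what makes the update simultaneous.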

Feature Scaling

This section explains feature scaling, a technique used with gradient descent. I'll leave the details to the article below and focus on why feature scaling is needed.

Feature Scalingはなぜ必要？ (Why is feature scaling necessary?) - Qiita

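It means changing the range (scale) of values a feature can take. Features in a dataset often have different scales. For example, weight and height, or house price and number of rooms, differ in their units and in their ranges of values.…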

qiita.com

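The purpose of feature scaling is to make gradient descent converge as quickly as possible. Suppose the input features in the training data have wildly different ranges: one variable $x_i$ varies in $0 < x_i < 5$, while another variable $x_j$ varies in $0 < x_j < 2000$. When the ranges differ this much, the graph (the contours) of the cost function becomes very skewed, and gradient descent needs more steps to reach the minimum. To avoid this, we adjust each feature's range so that it falls roughly within $-1 < x < 1$. Note, though, that scaling to an extremely small range such as $-0.00001 < x < 0.00001$ is also bad. This corresponds to "normalization" in the article above.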

Finally, here is the summary of this section.

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:
$-1 \le x_{(i)} \le 1$
or
$-0.5 \le x_{(i)} \le 0.5$
These aren’t exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.
Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero. To implement both of these techniques, adjust your input values as shown in this formula:
$x_i := \dfrac{x_i - \mu_i}{s_i}$
Where $\mu_i$ is the average of all the values for feature (i) and $s_i$ is the range of values (max − min), or $s_i$ is the standard deviation.
Note that dividing by the range, or dividing by the standard deviation, give different results. The quizzes in this course use range; the programming exercises use standard deviation.
For example, if $x_i$ represents housing prices with a range of 100 to 2000 and a mean value of 1000, then $x_i := \dfrac{\text{price} - 1000}{1900}$.
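Mean normalization, $x_i := (x_i - \mu_i)/s_i$, is short enough to sketch with Python's standard `statistics` module. The function name `normalize_feature` and its `use_std` flag are hypothetical. By default it divides by the range, as in the quizzes. With `use_std=True` it divides by the sample standard deviation (`statistics.stdev`, with N − 1 in the denominator). As far as I know, that is also the default of Octave's `std`, but treat that as an assumption.

```python
from statistics import mean, stdev


def normalize_feature(values, use_std=False):
    """Mean normalization x := (x - mu) / s for one feature column.

    s is the range (max - min) by default; with use_std=True it is the
    sample standard deviation. Returns the scaled values plus mu and s,
    so that new inputs can later be scaled the same way.
    """
    mu = mean(values)
    s = stdev(values) if use_std else max(values) - min(values)
    if s == 0:
        raise ValueError("feature is constant, so it cannot be scaled")
    return [(v - mu) / s for v in values], mu, s
```

With made-up prices `[100, 900, 2000]` (range 1900, mean 1000, matching the example above), the price 2000 becomes 1000/1900 ≈ 0.53. Returning `mu` and `s` matters because a new house must be scaled with the same values before predicting. The constant column $x_0 = 1$ is not scaled, and its range of 0 is why the function refuses constant features.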

Normal Equation

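This section explains the normal equation, a way of solving linear regression that is separate from gradient descent. The normal equation gives the desired θ directly, without having to worry about the learning rate or about whether the iteration converges or diverges: $\theta = (X^T X)^{-1} X^T y$. Here $X$ is the $m \times (n+1)$ matrix whose rows are the training examples (with $x_0 = 1$), and $y$ is the vector of the $m$ target values.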

The lecture did not cover how the normal equation is derived. I was curious, though, so I looked it up and found the article below very easy to understand.

線形回帰の Normal Equation（正規方程式）について (About the Normal Equation in linear regression) - Qiita

About the Normal Equation that came up in Linear Regression with Multiple Variables in a certain online machine learning course. Andrew Ng…

qiita.com
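Since the lecture skipped the derivation, here is a sketch of one standard route. It writes the squared-error cost $J(\theta)$ from the course in matrix form and sets its gradient to zero. This is a summary of the usual argument, not necessarily the exact path the article above takes.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^T (X\theta - y) \\
          &= \frac{1}{2m}\left(\theta^T X^T X \theta - 2\,\theta^T X^T y + y^T y\right) \\
\nabla_\theta J(\theta) &= \frac{1}{m}\left(X^T X \theta - X^T y\right) = 0 \\
\Longrightarrow\quad X^T X \theta &= X^T y \\
\Longrightarrow\quad \theta &= (X^T X)^{-1} X^T y \qquad (\text{if } X^T X \text{ is invertible})
\end{aligned}
```

The middle step uses $\theta^T X^T y = y^T X \theta$ (both are scalars) and $\nabla_\theta(\theta^T A \theta) = 2A\theta$ for the symmetric matrix $A = X^T X$.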

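That said, even without knowing the derivation, I think it's fine to just memorize it as a formula. As for choosing between gradient descent and the normal equation, I think the normal equation is generally fine. With the normal equation, there is no need to worry about the bottlenecks of gradient descent, such as choosing the learning rate and monitoring convergence or divergence. The downside of the normal equation is that its computational cost grows with the cube of the number of features (variables), because the $(n+1) \times (n+1)$ matrix $X^T X$ has to be inverted. Apparently, though, today's computers can handle it without problems until the number of features reaches about 10,000. Gradient descent is said to work better in some cases, but in most cases I think the normal equation is fine.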
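For comparison with the gradient descent version, here is a minimal plain-Python sketch of the normal equation. It assumes no NumPy and that `X` is a list of rows starting with $x_0 = 1$. It does not form $(X^T X)^{-1}$ explicitly. Instead, it solves $X^T X \theta = X^T y$ by Gauss-Jordan elimination, which gives the same θ whenever $X^T X$ is invertible and is the usual numerically safer choice. All function names are hypothetical.

```python
# A minimal sketch of the normal equation in plain Python (no NumPy assumed).
# It solves (X^T X) theta = X^T y instead of forming the inverse explicitly;
# both give the same theta whenever X^T X is invertible.

def transpose(A):
    """Turn rows into columns."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product A @ B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve_linear(A, b, tol=1e-12):
    """Solve A z = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(A)
    # augmented matrix [A | b]
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # largest pivot candidate
        if abs(M[p][c]) < tol:
            raise ValueError("X^T X appears to be singular (not invertible)")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[-1] for row in M]


def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, with X given as rows that start with x0 = 1."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * yi for a, yi in zip(row, y)) for row in Xt]
    return solve_linear(XtX, Xty)
```

On made-up data generated from $y = 1 + 2x_1 + 3x_2$, `normal_equation(X, y)` returns approximately `[1, 2, 3]` in one shot. It needs no learning rate, no iteration count, and no feature scaling.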

Finally, here is the summary of this section.

Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In the "Normal Equation" method, we will minimize J by explicitly taking its derivatives with respect to the θj's, and setting them to zero. This allows us to find the optimum theta without iteration. The normal equation formula is: $\theta = (X^T X)^{-1} X^T y$

There is no need to do feature scaling with the normal equation.
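The following is a comparison of gradient descent and the normal equation:
Gradient descent: need to choose α; needs many iterations; $\mathcal{O}(kn^2)$; works well when n is large.
Normal equation: no need to choose α; no need to iterate; $\mathcal{O}(n^3)$, since the inverse of $X^T X$ must be calculated; slow if n is very large.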

With the normal equation, computing the inversion has complexity $\mathcal{O}(n^3)$. So if we have a very large number of features, the normal equation will be slow. In practice, when n exceeds 10,000 it might be a good time to go from a normal solution to an iterative process.

Normal Equation Noninvertibility

Occasionally, the matrix $X^T X$ in the normal equation from the previous section has no inverse. When that happens, one of the following two situations is the likely cause:

One feature is proportional to another feature
There are too many features (n) for the size of the dataset (m)

So if $X^T X$ is not invertible, first check whether any features are proportional to each other, then remove the redundant features and compute the normal equation again.
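The check-then-remove procedure can be sketched in plain Python. It assumes `X` is a list of rows whose first column is $x_0 = 1$. The sketch computes the numerical rank of $X^T X$, which is below $n + 1$ exactly when the matrix is singular, and then looks for pairs of columns that are scalar multiples of each other. The rank test catches both causes above, because $\operatorname{rank}(X^T X) \le \min(m, n+1)$. The function names and the relative tolerance `rel_tol` are hypothetical choices.

```python
# A minimal sketch for diagnosing a non-invertible X^T X (plain Python).
# Assumption: X is a list of rows whose first column is x0 = 1.

def matrix_rank(A, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    scale = max(abs(v) for row in M for v in row)
    tol = rel_tol * scale  # pivots this small relative to A are treated as zero
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        p = max(range(rank, rows), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) <= tol:
            continue  # no usable pivot: this column depends on earlier ones
        M[rank], M[p] = M[p], M[rank]
        for r in range(rank + 1, rows):
            f = M[r][c] / M[rank][c]
            M[r] = [vr - f * vk for vr, vk in zip(M[r], M[rank])]
        rank += 1
    return rank


def gram(X):
    """X^T X for X given as a list of rows."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]


def xtx_is_invertible(X):
    """The (n+1) x (n+1) matrix X^T X is invertible iff its rank is n + 1."""
    return matrix_rank(gram(X)) == len(X[0])


def proportional_pairs(X, rel_tol=1e-9):
    """Return (j, k) column pairs that are scalar multiples of each other."""
    cols = list(zip(*X))
    pairs = []
    for j in range(len(cols)):
        for k in range(j + 1, len(cols)):
            two = [[a, b] for a, b in zip(cols[j], cols[k])]
            if matrix_rank(two, rel_tol) < 2:
                pairs.append((j, k))
    return pairs
```

Consider made-up data where column 2 is exactly twice column 1, for example the same size recorded in two units. In that case `xtx_is_invertible` returns `False` and `proportional_pairs` reports `(1, 2)`. After dropping column 2, the check passes and the normal equation can be solved again.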

Finally, here is my code for this week's assignment. I didn't write much code at all, but understanding it took me quite a while.

atsss/cousera_ml

github.com

There is also an article on week 3, so please give it a read if you're interested.

COUSERA Machine Learning Week03

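As a software engineer taking Coursera's Machine Learning course, I'm sharing what I learned in week 3 as a set of notes. This week we start solving classification problems. Understanding classification seems to open up all sorts of applications, which got me excited.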

atsss.medium.com

Written by Ats
