Andrew Ng Coursera Machine Learning Notes (PDF)

Lecture Notes of Andrew Ng's Machine Learning Course.

CS229 Lecture Notes, Andrew Ng (updates by Tengyu Ma). Supervised learning: let's start by talking about a few examples of supervised learning problems.

In this case, we label a benign tumor as 0 and a malignant tumor as 1, and build the model with supervised learning.

The Deep Learning Specialization was created and is taught by Dr. Andrew Ng, a global leader in AI and co-founder of Coursera. There are pre-trained models in Keras.

From Machine Learning Yearning (draft) by Andrew Ng: • Try a smaller neural network.

Visualizing high-dimensional data: e.g. using TSNE from sklearn.manifold.

Validation and data leaks: Public and private test sets may have different distributions. The split should be done on time where relevant: otherwise train/test may contain some future data that we are trying to predict. Information in IDs: an ID may be a hash of the target value. The validation split should always try to mimic the train-test split made by the data provider!

Solution to case 1 (features on very different scales, e.g. 10 people vs. 1000000 dollars): to reduce the impact of features with large variance, standardize the features, e.g. with StandardScaler in sklearn.preprocessing.

Solution to case 2 (e.g. the stock prices of different companies over the years): this is not widely known; one needs to normalize the samples.
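The case 1 fix mentioned above (StandardScaler from sklearn.preprocessing before clustering) can be sketched as follows. This is only an illustration under stated assumptions: the synthetic data, the two-group setup and the helper name `agreement` are made up here and are not from the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: the two groups differ only in the small-scale feature (column 0).
# Column 1 is a large-scale feature (think of an amount in dollars) that is pure noise.
small = np.concatenate([rng.normal(0.0, 0.1, 100), rng.normal(1.0, 0.1, 100)])
large = rng.normal(1_000_000, 100_000, 200)
X = np.column_stack([small, large])
true_groups = np.repeat([0, 1], 100)


def agreement(labels, truth):
    """Share of points grouped correctly; cluster ids are arbitrary, so take the better matching."""
    acc = np.mean(labels == truth)
    return max(acc, 1 - acc)


# KMeans on raw features: distances are dominated by the large-variance column.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# KMeans after standardizing each feature (zero mean, unit standard deviation).
scaled_model = make_pipeline(
    StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0)
)
scaled_labels = scaled_model.fit_predict(X)

raw_score = agreement(raw_labels, true_groups)
scaled_score = agreement(scaled_labels, true_groups)
print("raw:", raw_score, "standardized:", scaled_score)
```

On data like this the raw clustering mostly follows the noisy large-scale column, while the standardized pipeline can recover the two groups.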
Course topics: Regularization and regularized linear/logistic regression, gradient descent; Learning features: gate logic realization; Evaluate a hypothesis: training / testing data splitting; Model selection: choose the right degree of polynomial; Bias-variance trade-off: choose the regularization parameter; Machine learning system design: recommendations and examples; Error metrics for skewed classes: precision-recall trade-off, F score; Kernels and similarity function, Mercer's Theorem; Linear kernel, Gaussian kernel, polynomial kernel, chi-square kernel, etc.; SVM parameters and multi-class classification; Dimensionality Reduction: data compression and visualization; Principal Component Analysis: formulation, algorithm, reconstruction from compressed representation, advice; Anomaly detection vs. supervised learning; Choosing features: non-Gaussian feature transform, error analysis; Recommender systems: content-based recommendations; Recommender systems: collaborative filtering.

Exploratory data analysis: Get domain knowledge: it helps to understand the problem more deeply. Check if the data is intuitive: check whether it agrees with domain knowledge. Understand how the data is generated: this is crucial for setting up a proper validation. Corrplot + clustering (rearrange columns and rows in the correlation matrix to find feature groups). Check for duplicated columns / rows in both the training and test set. Check for meaningless columns in both the training and test set. Check for columns in the test set that are not covered by the training set.

Numeric features: Tree-based methods do not depend on scaling. Non-tree-based models depend hugely on scaling. Rank -> sets the spaces between sorted values to be equal. Consider outliers and missing values (discussed below).

Categorical and ordinal features: One-hot encoding: often used for non-tree-based models. Label encoding: maps categories to numbers without extra numerical handling. Frequency encoding: maps categories to the frequencies with which they appear in the data. Label and frequency encoding are often used for tree-based models. Interaction between categorical features: two individual categorical features A and B can be treated as a 2D feature, which can help linear models and KNN. Ordinal feature: a categorical feature whose values are sorted in some meaningful order.

Date/time and coordinates: Date and time can generate features like periodicity, time since, and difference between dates. Coordinates can be used to generate new features like distances and radius; one may consider rotated coordinates or other reference frames.

Missing values: Missing values are usually labeled: NA, None, N/A. Missing values can be hidden: -1 or singularities (-1, -999, etc.). A histogram can be helpful to find missing values. Create a new feature column isnull to indicate whether a value is missing or not; this can help tree-based models and NN, but adds extra data. Filled values usually need post-processing for models that depend on scaling, e.g. KNN, non-tree numerical models, NN.

Case 2: When there is large variance between samples while only the trends are of interest, e.g. the stock prices of different companies over the years.

In this section I'll summarize a few important points when applying machine learning in real coding procedures, such as the importance of standardizing features in some situations, as well as normalizing samples in some other situations.

A data leak can even make the competition meaningless; one has to treat it in the right way, depending on what one wants to achieve.

As I worked through the course I grew more and more fond of the Coursera platform: from lectures to notes to quizzes to assignments, it helps you consolidate each topic at both the conceptual and the implementation level. The lecture videos buffer quickly, but I found Ng's speaking pace too slow, so I usually watched at 1.5x speed. At first I used Chinese subtitles.

After the learning process, we get a good model. I've enjoyed every little bit of the course; I hope you enjoy my notes too.

DeepLearning.ai Courses Notes: This repository contains my personal notes and summaries on the DeepLearning.ai specialization courses.

Notes from Coursera Deep Learning courses by Andrew Ng (shared by Abhishek Sharma on the Kaggle forum): beautifully drawn notes on the deep learning specialization on Coursera, by Tess Ferrandez.

The following notes represent a complete, stand-alone interpretation of Stanford's machine learning course presented by Professor Andrew Ng and originally posted on the ml-class.org website during the fall 2011 semester.
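The frequency encoding and missing-value handling described above can be sketched with pandas. The toy data frame and its column names (`city`, `age`) are hypothetical and only serve the illustration; the fill value -999 follows the "number outside the feature range" idea from the notes and is mainly suitable for tree-based models.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names are made up for illustration.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A", "B"],
    "age": [23.0, np.nan, 31.0, 45.0, np.nan, 52.0],
})

# Frequency encoding: replace each category by its share of rows in the data.
freq = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(freq)

# Missing-value indicator, created before any filling so the information is kept.
df["age_isnull"] = df["age"].isnull().astype(int)

# Fill with a value outside the feature range.
df["age_filled"] = df["age"].fillna(-999)

print(df)
```

Creating the indicator before filling matches the advice to avoid filling NaNs before feature generation.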
Notes on Coursera's Machine Learning course, instructed by Andrew Ng, Adjunct Professor at Stanford University. More summaries will be added as the learning goes.

[...] the book "Introduction to Machine Learning" by Ethem Alpaydın (MIT Press, 3rd ed., 2014), with some additions.

Machine learning study guides tailored to CS 229 by Afshine Amidi and Shervine Amidi.

One has to realize that there are two situations that could lead to poor performance of a clustering method (e.g. KMeans) if not taken care of. The reason the two cases matter is rooted in "larger variance makes bigger influence".

Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.

The inertia is available as model.inertia_ after fitting the model using sklearn.cluster.KMeans.

The underlying principle of PCA is that it rotates and shifts the feature space to find the principal axes that explain the maximal variance in the data.

Categorical and ordinal features.

EDA helps to get comfortable with the data and to build intuition about it.

Train and test data are from different distributions --> leaderboard probing, e.g. from the mean for the training and test datasets, shift predictions by the mean difference.

• Try adding regularization (such as L2 regularization).

Source: Coursera "How to Win a Data Science Competition: Learn from Top Kagglers".
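The inertia of a fitted sklearn.cluster.KMeans model, mentioned above, is what the "elbow" approach for choosing the number of clusters looks at. A minimal sketch, assuming synthetic data with three well-separated groups (the data and the range of k are illustrative choices, not from the notes):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)

ks = list(range(1, 7))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: sum of squared distances of samples to their closest cluster center.
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))

# How much inertia each additional cluster removes; the "elbow" is where this flattens out.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
```

Plotting `inertias` against `ks` gives the usual elbow plot; here the numbers are printed so the sketch does not depend on a plotting library.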
Machine Learning by Andrew Ng on Coursera: the Machine Learning course has consistently been touted as one of the best machine learning courses for beginners. Machine learning is the science of getting computers to act without being explicitly programmed.

After a first attempt at Machine Learning taught by Andrew Ng, I felt the necessity and passion to advance in this field. I have decided to pursue higher-level courses.

Collection of my hand-written notes, lecture PDFs, and tips for applying ML in problem solving. Hand-written notes and lecture PDFs from the Machine Learning course by Andrew Ng on Coursera. Resources are mostly from online course platforms like DataCamp, Coursera and Udacity.

All original lecture content and slide copyrights belong to Andrew Ng; the lecture notes and summaries are based on the lecture contents and are free to use and distribute according to the GPL. These notes may be used for educational, non-commercial purposes. If you are taking the course you [...]

Octave Tutorial, Andrew Ng (video tutorial from the "Machine Learning" class). Transcript written by José Soares Augusto, May 2012 (V1.0c). 1 Basic Operations: In this video I'm going to teach you a programming language, Octave, which will [...]

Models that depend on scaling: KNN, non-tree numerical models, NN. Post-processing aim: boost the importance of more related features while decreasing that of less related features.

Each feature is normalized by its standard deviation after subtracting its mean, e.g. with StandardScaler in sklearn.preprocessing, so that a feature with values as large as 1000000 dollars does not dominate. Samples can be normalized with Normalizer in sklearn.preprocessing. The above solution could lead to a huge improvement for clustering.

Due to this feature of PCA, similarly to clustering, one has to take care of the variance in the feature space.
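The sample normalization mentioned above (Normalizer in sklearn.preprocessing) rescales each row, not each column. A small sketch with made-up numbers, assuming rows are samples such as price histories whose trend matters more than their magnitude:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical price histories: rows are samples (companies), columns are time steps.
# Both rows move the same way in relative terms but differ 100x in magnitude.
prices = np.array([
    [1.0, 2.0, 2.0],
    [100.0, 200.0, 200.0],
])

# Each row is divided by its own l2 norm, independently of the other rows.
normalized = Normalizer(norm="l2").fit_transform(prices)
row_norms = np.linalg.norm(normalized, axis=1)

print(normalized)
print(row_norms)
```

After normalization the two rows coincide, so a distance-based method such as KMeans would treat them as having the same trend.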
PCA is one option, while another option, t-SNE (t-distributed stochastic neighbor embedding), can map high-dimensional data to 2D space while approximately preserving the nearness of the data.

Overfitting: reduce the feature space; use regularization.

Usually, avoid filling NaNs before feature generation.

The course's rating of 4.9 from 150,000 ratings by its 3.7 million enrollees speaks to its trustworthiness.

As is well known, clustering (e.g. KMeans) is an unsupervised learning method based on minimizing the total inertia of the clusters, given the number of clusters.

Leave-One-Out (LOOCV): for small data sets.

Possible problems: the leaderboard score is consistently higher / lower than the validation score; the leaderboard score is not correlated with the validation score at all. We may already have quite different scores in K-fold CV.

Sections: Practical Tips in Applying Machine Learning Algorithms; Feature pre-processing and feature generation; Improve performance of clustering (unsupervised learning); Decide the number of clusters (unsupervised learning); PCA: decide the number of principal components; Visualizing high-dimensional data using t-SNE.

Related articles: Discover feature engineering, how to engineer features and how to get good at it; Quora: What are some best practices in feature engineering; Image classification with a pre-trained deep neural network; Introduction to Word Embedding Models with Word2Vec.
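The t-SNE option described above can be tried with TSNE from sklearn.manifold. A minimal sketch, assuming the small digits dataset bundled with scikit-learn as stand-in data (the subset size, perplexity and random seed are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional digit images; only a subset is used to keep the run short.
digits = load_digits()
X = digits.data[:300]
y = digits.target[:300]

# Map the 64-D points to 2-D while approximately preserving neighborhoods.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)
```

The two columns of `embedding` can be used as x and y in a scatter plot, colored by `y`, to look for groups. Note that TSNE has no separate transform method for new data, so it is a visualization tool rather than a reusable feature transformer.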
Different splitting strategies can differ significantly. General split methods include: Holdout: if the data is homogeneous (this can be found out from the scores of different folds in K-fold CV), to save on computation power.

Stratification: preserves the same target distribution over different folds; it is extremely useful / important in some situations. Also note that overfitting on the training set doesn't necessarily mean overfitting on the test set.

Possible problems may be encountered during the submission stage. The reason is rooted in not having mimicked the train-test split of the data made by the provider.

Case 1: When many features are not on the same scale, e.g. 10 people vs. 1000000 dollars.

• Change the neural network architecture (activation function, number of hidden units, etc.)

The underlying idea of text feature extraction is: Text --> Vectors, e.g. Term Frequency (TF), Inverse Document Frequency (iDF), TFiDF. Using pre-trained models is better than training the model when the sample size is small.

[Machine Learning] [Andrew Ng] Quiz 1 (Week 9). Cool__Xu: Option B of question 4 is incorrect. Anomaly detection only models the negative examples, whereas an SVM learns to discriminate between positive and negative examples, so the SVM will perform better …

Week1: Linear regression with one variable: machine learning definition; supervised / unsupervised learning; linear regression with one [...]

Originally written as a way for me personally to help solidify and document the concepts, [...]

In the past decade, machine learning has given us self-driving cars, practical speech [...]

The materials of these notes are provided from the five-class sequence [...]
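The stratification idea above (keeping the same target distribution in every fold) can be sketched with StratifiedKFold from sklearn.model_selection. The imbalanced toy target below is hypothetical:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced target: 90 negatives and 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_positive_rates = []
for train_idx, valid_idx in skf.split(X, y):
    # Share of positives in this validation fold.
    fold_positive_rates.append(float(y[valid_idx].mean()))

print(fold_positive_rates)
```

Every validation fold keeps the overall 10% positive rate, which a plain random split of such a small, imbalanced set would not guarantee.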
Optimization algorithms: conjugate gradient, BFGS, L-BFGS. Multi-class classification: One-vs-All classification.

Week1: Linear regression with one variable; Week2: Linear regression with multiple variables; Week3: Logistic Regression and Regularization; Week8: Unsupervised Learning: Clustering, Dimensionality Reduction and PCA; Week9: Unsupervised Learning: Anomaly Detection and Recommender Systems; Week10: Large Scale Machine Learning: Stochastic Gradient Descent and Online Learning.

2017.12.15 - 2018.05.05

I am currently taking the Machine Learning Coursera course by Andrew Ng and I'm loving it! I've started compiling my notes in handwritten and illustrated form and wanted to share them here.

DeepLearning.ai contains five courses which can be taken on Coursera. In addition to the lectures and programming assignments, you will also watch exclusive interviews with many Deep Learning leaders.

By doing this, one actually discovers the "intrinsic dimension of the data". Also note that PCA does not do feature selection the way Lasso or tree models do.

Note that this works on samples, compared to case 1, which works on features.

KMeans: look for the "elbow" on the inertia vs. n_clusters plot, e.g. using model.inertia_ after fitting the model with sklearn.cluster.KMeans. Hierarchical clustering: plot the dendrogram and decide on the maximum distance; use linkage, dendrogram and fcluster from scipy.cluster.hierarchy.

High-dimensional data is usually hard to visualize, especially for unsupervised learning.

Ways to deal with missing values: fill with numbers outside the feature range, e.g. -1, -999, etc.

Specifically, data leaks are mistakes by which the provider included important unexpected information about the final target.
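The hierarchical clustering route mentioned above uses linkage, dendrogram and fcluster, which live in the scipy.cluster.hierarchy module. A minimal sketch on made-up data; the linkage method and the cut height of 5 are illustrative assumptions that one would normally choose by reading the dendrogram:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Two hypothetical, well-separated groups of 2-D points.
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(10.0, 0.5, size=(20, 2)),
])

# Build the merge tree bottom-up.
Z = linkage(X, method="complete")

# no_plot=True returns the tree layout as a dict without needing matplotlib.
tree = dendrogram(Z, no_plot=True)

# Cut the tree at a maximum distance of 5 to obtain flat cluster labels.
labels = fcluster(Z, t=5, criterion="distance")

print(np.unique(labels))
```

Dropping `no_plot=True` (with matplotlib installed) draws the dendrogram, from which the maximum distance for the cut can be read off.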
A few facts to know about missing values: be very careful when dealing with missing values; mishandling them can ruin the feature! Preprocessing and post-processing can be helpful.

Suppose we have a dataset giving the living areas and prices of 47 houses from [...]

Logistic regression: hypothesis representation, decision boundary, cost function, gradient descent.

There are always cases where one performs clustering on samples without knowing how many groups to cluster into, due to the nature of unsupervised learning. While there are two main methods by which clustering can be performed, there are different ways to decide on the number of resulting clusters.

When performing PCA for dimensionality reduction, one of the key steps is to decide on the number of principal components. The way to decide how many principal components to keep is to make the bar plot of "explained variance" vs. "PCA feature", and choose the features that explain a large portion of the variance.

I have been taking the much-talked-about Coursera Machine Learning course since the start of the year, and I just finished all the assignments. It was advertised as taking about 11 weeks in total, roughly 3 months, but by watching videos bit by bit late on weekday nights and, when I could use a solid block of time on weekends, [...]

On Coursera you can get a certificate of completion if you pay, but if you do not want one you can do almost everything for free. The certificate is official, so you can list it on your résumé. As an example, let's take everyone's favorite, Andrew Ng's Machine Learning class.
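The "explained variance" approach for choosing the number of principal components, described above, can be sketched with PCA from sklearn.decomposition. The synthetic data (5 features driven by 2 latent factors) and the 95% threshold are assumptions made for the illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 5 observed features generated from 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = np.array([
    [1.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, -1.0],
])
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_  # one value per principal component
cumulative = np.cumsum(ratios)

# Smallest number of components that explains at least 95% of the variance
# (the 95% threshold is an arbitrary illustrative choice).
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

print(ratios.round(3))
print(n_components)
```

A bar plot of `ratios` against the component index is the plot the notes refer to; the components after the sharp drop contribute little variance, which hints at the intrinsic dimension of the data.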
