- In this article, we discuss the concepts behind Logistic Regression and see how it can help us solve classification problems.
- Logistic Regression is a classification algorithm used to assign objects to one of a discrete set of values (like 0, 1, 2, …). Typical examples are classifying email as work, home, or spam, or deciding whether an online transaction is safe or fraudulent, or whether a tumor is benign or malignant. The algorithm uses the logistic sigmoid function to give a probabilistic assessment. Example: this tumor is 80% likely to be benign, this transaction is 90% likely to be fraud, …
2. Problem statement
- The bank you work for has a preferential loan program for apartment buyers. Recently, many attractive apartments have come onto the market, so the number of applicants for the program has increased significantly. Normally you review 10-20 applications a day to decide whether each one is approved, but recently you have been receiving 1000-2000 applications per day. You cannot process them all by hand, so you need a solution that predicts whether a new application should be approved. After analyzing past data, you notice that two factors determine whether an application is accepted: salary and work experience. Below is an example graph
- Intuitively, we can think of drawing a line that separates the red points from the blue points, and then making the decision for a new point based on which side of the line it falls on. The example looks like this:
- For example, take the green line as the dividing line. An application with a salary of 6 million and 1 year of experience is then predicted to be rejected.
- However, because the bank is going through a difficult period, it is restricting lending and only approves applications whose approval probability is above 80%. The task is now not just to decide whether to lend or not, but to find the probability that an application should be approved.
3. The sigmoid function
- We now have to find the approval probability of an application, which of course lies in the interval [0, 1]. A function whose values always stay within that interval, and that is continuous and easy to work with, is the sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$.
- The function is continuous and always returns a value in the interval (0, 1)
- It is differentiable at every point, so gradient descent can be used
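The properties above can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the two-branch form is one common way to avoid overflow in `math.exp` for large negative inputs, not something the article prescribes.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x),
    # so exp() is only ever called with a non-positive argument.
    e = math.exp(x)
    return e / (1.0 + e)

# The output grows smoothly from near 0 to near 1 and equals 0.5 at x = 0.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
```

Because the output is symmetric around 0.5, `sigmoid(x) + sigmoid(-x)` equals 1 for any `x`.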
4. Set up the problem
Basically, we will have the following steps for a machine learning problem:
- Set up the model
- Define the loss function
- Find the parameters by optimizing the loss function
- Predict new data using the newly found parameters
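4.1 Set up the model
- For the i-th record in the data, let $x_1^{(i)}$ be the salary and $x_2^{(i)}$ be the work experience of the i-th application.
- $p(x^{(i)} = 1) = \hat{y}_i$ is the probability, predicted by the model, that the i-th application is approved.
- $p(x^{(i)} = 0) = 1 - \hat{y}_i$ is the probability, predicted by the model, that the i-th application is rejected.
- We therefore have $p(x^{(i)} = 1) + p(x^{(i)} = 0) = 1$
- The sigmoid function is: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Just as the prediction function in Linear Regression is $\hat{y}_i = w_0 + w_1 x_i$, in Logistic Regression we have the following prediction function: $\hat{y}_i = \sigma(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)}) = \frac{1}{1 + e^{-(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)})}}$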
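The prediction function can be sketched in code as follows. The weight values below are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not fitted to any data, and the function name `predict_proba` is an assumption for this example.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(salary: float, experience: float,
                  w0: float, w1: float, w2: float) -> float:
    """Probability that the application is approved:
    y_hat = sigmoid(w0 + w1 * salary + w2 * experience)."""
    return sigmoid(w0 + w1 * salary + w2 * experience)

# Hypothetical weights, for illustration only.
w0, w1, w2 = -8.0, 0.5, 1.0

# Salary 10 (million), 2 years of experience: z = -8 + 5 + 2 = -1.
p = predict_proba(10.0, 2.0, w0, w1, w2)
print(f"approval probability: {p:.4f}")
print(f"rejection probability: {1.0 - p:.4f}")
```

The two printed probabilities always sum to 1, matching $p(x^{(i)} = 1) + p(x^{(i)} = 0) = 1$.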
4.2 Loss function
- Now we need a function that measures how good the model's predictions are.
- We make the following observations:
  + If the i-th application is approved, i.e. the true label $y_i = 1$, we want the predicted probability $\hat{y}_i$ to be as close to 1 as possible. In other words, the model should predict as high a probability as possible for the i-th application.
  + If the i-th application is rejected, i.e. $y_i = 0$, we want $\hat{y}_i$ to be as close to 0 as possible. In other words, the model should predict as low a probability as possible for the i-th application.
- For each point $(x^{(i)}, y_i)$, we define the loss function $L = -(y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i))$ (in machine learning and deep learning, log is understood to be the natural logarithm ln).
- Let's evaluate the function L. If $y_i = 1$, then $L = -\log(\hat{y}_i)$. This is the graph of the loss function in the case $y_i = 1$
- Comments:
  + L decreases as $\hat{y}_i$ goes from 0 to 1.
  + When the model predicts $\hat{y}_i$ close to 1, i.e. the predicted value is close to the true value, L is small, approximately 0.
  + When the model predicts $\hat{y}_i$ close to 0, i.e. the predicted value is the opposite of the true value, L is very large.
- Conversely, if $y_i = 0$, then $L = -\log(1 - \hat{y}_i)$, and we have the following graph
- Comments:
  + L increases as $\hat{y}_i$ goes from 0 to 1.
  + When the model predicts $\hat{y}_i$ close to 0, i.e. the predicted value is close to the true value, L is small, approximately 0.
  + When the model predicts $\hat{y}_i$ close to 1, i.e. the predicted value is the opposite of the true value, L is very large.
- => L is small when the model's prediction is close to the true value and very large when the model predicts incorrectly. In other words, the smaller L is, the closer the model's prediction is to the true value. => The problem becomes finding the parameters that minimize L.
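- The loss function over the whole data set of $N$ points is: $J = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right)$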
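The behavior of the loss described in this section can be checked numerically. This is a minimal sketch; the function names `point_loss` and `mean_loss` and the small clipping constant `eps` (used only to avoid `log(0)`) are assumptions made for this example.

```python
import math

def point_loss(y_true: int, y_pred: float, eps: float = 1e-12) -> float:
    """Loss for one data point: -(y*log(y_hat) + (1-y)*log(1-y_hat))."""
    # Clip the prediction away from 0 and 1 so log() is always defined.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))

def mean_loss(y_true: list, y_pred: list) -> float:
    """Average of the per-point losses over the whole data set."""
    return sum(point_loss(y, p) for y, p in zip(y_true, y_pred)) / len(y_true)

# With true label 1, the loss shrinks as the prediction approaches 1.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"y=1, y_hat={p:.2f} -> L={point_loss(1, p):.4f}")

# With true label 0, the loss grows as the prediction approaches 1.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"y=0, y_hat={p:.2f} -> L={point_loss(0, p):.4f}")
```

A prediction of 0.5 for every point gives an average loss of ln 2, which is a useful baseline when judging a trained model.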
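4.3 Computing derivatives of composite functions with the chain rule
- What is the chain rule? If z = f(y) and y = g(x), i.e. z = f(g(x)), then $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$
- Let's apply it to the derivative of the sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$: $\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \left(1 - \frac{1}{1 + e^{-x}}\right) = \sigma(x)(1 - \sigma(x))$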
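One way to gain confidence in a derivative obtained with the chain rule is to compare it against a finite-difference approximation. The sketch below does this for the sigmoid; the helper `numerical_grad` and the step size `h` are assumptions for this example.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Analytic derivative from the chain rule: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_grad(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 3.0):
    analytic = sigmoid_grad(x)
    numeric = numerical_grad(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

The two columns should agree to many decimal places; a large gap would point to a mistake in the derivation.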
4.4 Apply gradient descent
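- For each point $(x^{(i)}, y_i)$, the loss function is $L = -(y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i))$, where $\hat{y}_i = \sigma(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)})$ is the value predicted by the model and $y_i$ is the true value of the data.
- Applying the chain rule we have: $\frac{dL}{dw_0} = \frac{dL}{d\hat{y}_i} \cdot \frac{d\hat{y}_i}{dw_0}$, with $\frac{dL}{d\hat{y}_i} = -\left(\frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i}\right)$
- Since $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, we see: $\frac{d\hat{y}_i}{dw_0} = \hat{y}_i (1 - \hat{y}_i)$, $\frac{d\hat{y}_i}{dw_1} = x_1^{(i)} \hat{y}_i (1 - \hat{y}_i)$, $\frac{d\hat{y}_i}{dw_2} = x_2^{(i)} \hat{y}_i (1 - \hat{y}_i)$. Therefore $\frac{dL}{dw_0} = -\left(\frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i}\right) \hat{y}_i (1 - \hat{y}_i) = \hat{y}_i - y_i$
- Similarly: $\frac{dL}{dw_1} = x_1^{(i)} (\hat{y}_i - y_i)$ and $\frac{dL}{dw_2} = x_2^{(i)} (\hat{y}_i - y_i)$
- This is for 1 data point. Over all $N$ data points: $\frac{dJ}{dw_0} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)$, $\frac{dJ}{dw_1} = \frac{1}{N} \sum_{i=1}^{N} x_1^{(i)} (\hat{y}_i - y_i)$, $\frac{dJ}{dw_2} = \frac{1}{N} \sum_{i=1}^{N} x_2^{(i)} (\hat{y}_i - y_i)$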
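The gradient descent procedure of this section can be sketched in plain Python as below. The toy data set is made up for illustration, and the learning rate, step count, and function names are assumptions chosen for this small example rather than recommended settings.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x1, x2):
    """y_hat = sigmoid(w0 + w1*x1 + w2*x2)."""
    return sigmoid(w[0] + w[1] * x1 + w[2] * x2)

def mean_loss(w, xs, ys, eps=1e-12):
    """J: average cross-entropy loss over all data points."""
    total = 0.0
    for (x1, x2), y in zip(xs, ys):
        p = min(max(predict(w, x1, x2), eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def gradients(w, xs, ys):
    """Return [dJ/dw0, dJ/dw1, dJ/dw2], each averaged over all data points."""
    g = [0.0, 0.0, 0.0]
    for (x1, x2), y in zip(xs, ys):
        err = predict(w, x1, x2) - y  # y_hat_i - y_i
        g[0] += err
        g[1] += err * x1
        g[2] += err * x2
    return [gi / len(xs) for gi in g]

def fit(xs, ys, learning_rate=0.01, steps=5000):
    """Plain batch gradient descent starting from w = [0, 0, 0]."""
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        g = gradients(w, xs, ys)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

# Made-up toy data: (salary, years of experience); label 1 = lend, 0 = do not lend.
xs = [(10.0, 1.0), (5.0, 2.0), (6.0, 1.0), (7.0, 0.5),
      (8.0, 3.0), (9.0, 0.2), (4.0, 0.5), (9.0, 2.5)]
ys = [1, 0, 0, 0, 1, 0, 0, 1]

w = fit(xs, ys)
print("weights:", w)
print("loss before training:", mean_loss([0.0, 0.0, 0.0], xs, ys))
print("loss after training: ", mean_loss(w, xs, ys))
```

The loss before training is ln 2, because every prediction starts at 0.5; the loss after training should be lower. In practice the features are often rescaled first so that a larger learning rate can be used safely.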
4.5 Representation by matrix
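One common way to write the formulas of section 4.4 in matrix form is sketched below. The names $X$ (one row per application, with a leading 1 for the bias $w_0$), $w$, $y$ and $\hat{y}$ are notational assumptions for this sketch, and $\sigma$ is applied element-wise.

```latex
X = \begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} \\
\vdots & \vdots & \vdots \\
1 & x_1^{(N)} & x_2^{(N)}
\end{bmatrix},
\quad
w = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix},
\quad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}

\hat{y} = \sigma(Xw),
\qquad
\frac{dJ}{dw} = \frac{1}{N} X^{T} (\hat{y} - y)
```

Row by row, $\frac{1}{N} X^{T} (\hat{y} - y)$ reproduces the three sums $\frac{dJ}{dw_0}$, $\frac{dJ}{dw_1}$ and $\frac{dJ}{dw_2}$, so a gradient descent step becomes $w \leftarrow w - \alpha \frac{dJ}{dw}$ for a learning rate $\alpha$.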
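- After performing gradient descent we obtain $w_0$, $w_1$, $w_2$. For each new application with salary $x_1$ and work experience $x_2$, we calculate the probability that it should be approved, $\hat{y} = \sigma(w_0 + w_1 x_1 + w_2 x_2)$, and then compare it with the firm's lending threshold $t$ (usually 0.5, or higher, such as 0.8). If $\hat{y} \ge t$ then lend, otherwise do not lend.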
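The decision step can be sketched as follows. The weights are hypothetical values chosen for illustration, not the result of training, and the function name `approve` is an assumption for this example.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def approve(salary: float, experience: float, w, t: float = 0.5):
    """Return (decision, probability): lend if the predicted probability is at least t."""
    p = sigmoid(w[0] + w[1] * salary + w[2] * experience)
    return p >= t, p

# Hypothetical weights (w0, w1, w2), for illustration only.
w = (-8.0, 0.5, 1.0)

# Salary 14 (million), 2 years of experience: z = -8 + 7 + 2 = 1.
for t in (0.5, 0.8):
    decision, p = approve(14.0, 2.0, w, t)
    print(f"threshold {t}: probability {p:.4f} -> {'lend' if decision else 'do not lend'}")
```

The same application is approved at threshold 0.5 but rejected at 0.8, which shows how a stricter threshold changes the outcome without retraining the model.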
4.6 Building dividing lines
- Consider the line y = ax + b and let f = y - (ax + b). The line divides the plane into 2 parts: one where f > 0 and one where f < 0. For points on the line itself, f = 0.
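- Suppose the threshold is 0.5, so that if $\hat{y}_i \ge 0.5$ we lend and otherwise we do not. We have $\hat{y}_i \ge 0.5 \Leftrightarrow \frac{1}{1 + e^{-(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)})}} \ge 0.5 \Leftrightarrow e^{-(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)})} \le 1 \Leftrightarrow w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} \ge 0$
- Similarly, $\hat{y}_i < 0.5 \Leftrightarrow w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} < 0$ => the straight line $w_0 + w_1 x_1 + w_2 x_2 = 0$ is the line separating the approved points from the rejected points.
- In the general case with threshold $t$ ($0 < t < 1$): $\hat{y}_i > t \Leftrightarrow w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} > -\ln\left(\frac{1}{t} - 1\right)$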
- We see that with t = 0.8 the dividing line lies closer to the red points than with t = 0.5, and 2 red points that were previously accepted are now rejected.
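The effect of the threshold on the dividing line can be sketched numerically. The weights are hypothetical, and the helper names `logit` and `boundary_experience` are assumptions for this example; the code assumes $w_2 \ne 0$ so the line can be solved for the experience coordinate.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logit(t: float) -> float:
    """Inverse of the sigmoid: ln(t / (1 - t)) = -ln(1/t - 1), for 0 < t < 1."""
    return math.log(t / (1.0 - t))

def boundary_experience(salary: float, w, t: float = 0.5) -> float:
    """Experience x2 lying on the line w0 + w1*x1 + w2*x2 = logit(t) for a given salary x1.

    Assumes w2 != 0.
    """
    w0, w1, w2 = w
    return (logit(t) - w0 - w1 * salary) / w2

# Hypothetical weights (w0, w1, w2), for illustration only.
w = (-8.0, 0.5, 1.0)

for t in (0.5, 0.8):
    x2 = boundary_experience(10.0, w, t)
    p = sigmoid(w[0] + w[1] * 10.0 + w[2] * x2)
    print(f"t={t}: salary 10 needs experience >= {x2:.3f} (probability there: {p:.3f})")
```

With these weights, raising the threshold from 0.5 to 0.8 shifts the line so that more experience is required at the same salary, which is why points accepted at t = 0.5 can fall on the rejected side at t = 0.8.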
5. Applications
- Predict whether an email is spam or not
- Predict whether a bank transaction is fraudulent or not
- Predict whether a tumor is benign or malignant
- Predict whether the loan will be repaid
- Predict whether your investment in a start-up will pay off.