AI Black Belt - Yellow

# AI Black Belt - Yellow

Day 3/4: Practice regression algorithms for sentiment analysis

---

## Outline

[Day 3] Practice regression algorithms for sentiment analysis
- Learn to evaluate properly the performance of your models.
- Practice regression algorithms with Scikit-Learn.
- Implement a full ML pipeline: from raw to text to sentiment analysis.

---

## Wrap-up of the day 2 challenge

Jump to `day2-03-challenge.ipynb`.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/AI-BlackBelt/yellow/master)

---

# Model evaluation

---

# Classification

.center.width-70[![](figures/day2/classification.jpg)]

.footnote[Credits: vas3k, [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/), 2018.]

---

## Accuracy

The accuracy is the proportion of correct predictions.

```python
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, clf.predict(X_test)))
>>> 0.84
```

---

## Precision, recall and F-measure

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} \\\\
\text{Recall} &= \frac{TP}{TP + FN} \\\\
F &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$

.center.width-90[![](figures/day2/pr.png)]

---

```python
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import fbeta_score

print("Precision =", precision_score(y_test, clf.predict(X_test)))
print("Recall =", recall_score(y_test, clf.predict(X_test)))
print("F =", fbeta_score(y_test, clf.predict(X_test), beta=1))

>>> Precision = 0.8118811881188119
>>> Recall = 0.8631578947368421
>>> F = 0.8367346938775511
```

---

## ROC AUC

Area under the curve of the false positive rate (FPR) against the true positive rate (TPR) as the decision threshold of the classifier is varied.

.grid[
.kol-1-2[
.center.width-100[![](figures/day2/roc-auc.png)]
]
.kol-1-2[
```python
from sklearn.metrics import roc_auc_score
print("ROC AUC =", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
>>> ROC AUC = 0.9297744360902256
```
]
]

---

# Training and test data

.center.width-70[![](figures/day2/train-test-split.png)]

Why splitting?
- The error estimated on training data is typically optimistic. It **does not** accurately represent the performance the model would have on new data.
- The actual performance should be estimated on (independent) test data.

.footnote[Credits: Andreas Mueller, [Introduction to Machine Learning with Scikit-Learn](https://github.com/amueller/ml-workshop-1-of-4/), 2019.]

???

Make the analogy with the classroom.

E.g., estimate region from how long it took to arrive.

---

Jump to `day3-01-evaluation.ipynb`.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/AI-BlackBelt/yellow/master)

.footnote[Credits: vas3k, [Machine Learning for Everyone](https://vas3k.com/blog/machine_learning/), 2018.]

---

# Regression

---

.center.width-70[![](figures/day2/regression.jpg)]

---

# Algorithms

Supervised learning algorithms can often be used both for classification and regression. This includes:
- decision trees
- K-nearest neighbors
- linear models
- neural networks.

---

.center.width-60[![](figures/day3/map.jpg)]

---

# Metrics

## Mean squared error

Measures the average squared difference between the estimated values and what is estimated.

```python
from sklearn.metrics import mean_squared_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_squared_error(y_true, y_pred)
0.375
```

---

## R2 score

The coefficient of determination represents the proportion of total variance explained by the model / total variance.
- The best score is 1.0 and it can be negative.
- A constant model would get a score of 0.0.

```python
>>> from sklearn.metrics import r2_score
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> r2_score(y_true, y_pred)
0.948...
```

---

Jump to
- `day3-02-regression.ipynb`
- `day3-03-boston.ipynb`
- `day3-04-amazon.ipynb`

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/AI-BlackBelt/yellow/master)