DETAILED SOLUTION
Take the time to compare this with your own answers. Check each step!
MODEL RECAP
$$z = \textcolor{#3498db}{0.15} \cdot x_1 + \textcolor{#e67e22}{0.25} \cdot x_2 + \textcolor{#9B7AC4}{(-3.0)}$$
$$P(\text{spam}) = \frac{1}{1 + e^{-z}}$$
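The two formulas above can be sketched in a few lines of Python. The function and constant names (`score`, `sigmoid`, `p_spam`, `A1`, `A2`, `B`) are illustrative choices, not part of the exercise; the coefficients are the ones stated in the recap.

```python
import math

# Coefficients from the model recap (names are illustrative).
A1 = 0.15   # weight of x1 (capitalized words)
A2 = 0.25   # weight of x2 (links)
B = -3.0    # bias


def score(x1, x2):
    """Linear score z = a1*x1 + a2*x2 + b."""
    return A1 * x1 + A2 * x2 + B


def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_spam(x1, x2):
    """Probability of spam for an email with features (x1, x2)."""
    return sigmoid(score(x1, x2))
```

Keeping `score` and `sigmoid` separate mirrors the two-step structure of the solution: first compute z, then convert it to a probability.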
PART 1: Computing the z scores
1.1) Email A ($x_1 = 10$, $x_2 = 8$):
$$z_A = \textcolor{#3498db}{0.15 \times 10} + \textcolor{#e67e22}{0.25 \times 8} + \textcolor{#9B7AC4}{(-3.0)}$$
$$z_A = \textcolor{#3498db}{1.5} + \textcolor{#e67e22}{2.0} \textcolor{#9B7AC4}{- 3.0} = \textcolor{#27ae60}{\mathbf{0.5}}$$
1.2) Email B ($x_1 = 5$, $x_2 = 4$):
$$z_B = \textcolor{#3498db}{0.15 \times 5} + \textcolor{#e67e22}{0.25 \times 4} + \textcolor{#9B7AC4}{(-3.0)}$$
$$z_B = \textcolor{#3498db}{0.75} + \textcolor{#e67e22}{1.0} \textcolor{#9B7AC4}{- 3.0} = \textcolor{#27ae60}{\mathbf{-1.25}}$$
1.3) Email C ($x_1 = 20$, $x_2 = 12$):
$$z_C = \textcolor{#3498db}{0.15 \times 20} + \textcolor{#e67e22}{0.25 \times 12} + \textcolor{#9B7AC4}{(-3.0)}$$
$$z_C = \textcolor{#3498db}{3.0} + \textcolor{#e67e22}{3.0} \textcolor{#9B7AC4}{- 3.0} = \textcolor{#27ae60}{\mathbf{3.0}}$$
1.4) Email D ($x_1 = 2$, $x_2 = 2$):
$$z_D = \textcolor{#3498db}{0.15 \times 2} + \textcolor{#e67e22}{0.25 \times 2} + \textcolor{#9B7AC4}{(-3.0)}$$
$$z_D = \textcolor{#3498db}{0.3} + \textcolor{#e67e22}{0.5} \textcolor{#9B7AC4}{- 3.0} = \textcolor{#27ae60}{\mathbf{-2.2}}$$
$\boxed{z_A = 0.5 \quad z_B = -1.25 \quad z_C = 3.0 \quad z_D = -2.2}$
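As a quick check of the four scores, the same arithmetic can be run in a loop. This is a minimal sketch; the dictionary names `emails` and `z` are assumptions made for illustration, and the feature values are those given for Emails A to D.

```python
# Feature values (x1, x2) for each email, as given in the exercise.
emails = {"A": (10, 8), "B": (5, 4), "C": (20, 12), "D": (2, 2)}

# z = 0.15*x1 + 0.25*x2 - 3.0 for every email.
z = {name: 0.15 * x1 + 0.25 * x2 - 3.0 for name, (x1, x2) in emails.items()}

for name, value in z.items():
    print(f"z_{name} = {value:.2f}")
```

Floating-point arithmetic may differ from the hand calculation in the last decimal places, which is why the values are printed rounded to two decimals.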
PART 2: Converting scores to probabilities
2.1) Email A ($z_A = 0.5$):
$$P_A = \frac{1}{1 + e^{-0.5}} = \frac{1}{1 + 0.607} = \frac{1}{1.607} = \textcolor{#F7E64D}{\mathbf{0.622}}$$
2.2) Email B ($z_B = -1.25$):
$$P_B = \frac{1}{1 + e^{1.25}} = \frac{1}{1 + 3.49} = \frac{1}{4.49} = \textcolor{#F7E64D}{\mathbf{0.223}}$$
2.3) Email C ($z_C = 3.0$):
$$P_C = \frac{1}{1 + e^{-3}} = \frac{1}{1 + 0.0498} = \frac{1}{1.0498} = \textcolor{#F7E64D}{\mathbf{0.953}}$$
2.4) Email D ($z_D = -2.2$):
$$P_D = \frac{1}{1 + e^{2.2}} = \frac{1}{1 + 9.03} = \frac{1}{10.03} = \textcolor{#F7E64D}{\mathbf{0.100}}$$
$\boxed{P_A = 62.2\% \quad P_B = 22.3\% \quad P_C = 95.3\% \quad P_D = 10.0\%}$
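The conversions above can be verified by applying the sigmoid to each score. The sketch below assumes the four z values from Part 1; the names `scores` and `probs` are illustrative.

```python
import math

# Scores from Part 1.
scores = {"A": 0.5, "B": -1.25, "C": 3.0, "D": -2.2}

# P(spam) = 1 / (1 + e^(-z)) for each email.
probs = {name: 1.0 / (1.0 + math.exp(-z)) for name, z in scores.items()}

for name, p in probs.items():
    print(f"P_{name} = {p:.3f}")
```

Working with unrounded exponentials avoids small discrepancies that appear when intermediate values such as $e^{-3}$ are rounded too early.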
PART 3: Classification decisions
3.1) Decisions (threshold = 0.5):
- Email A: $P_A = 62.2\% \geq 50\%$ → $\textcolor{#e74c3c}{\text{SPAM}}$
- Email B: $P_B = 22.3\% < 50\%$ → $\textcolor{#27ae60}{\text{HAM}}$
- Email C: $P_C = 95.3\% \geq 50\%$ → $\textcolor{#e74c3c}{\text{SPAM}}$
- Email D: $P_D = 10.0\% < 50\%$ → $\textcolor{#27ae60}{\text{HAM}}$
3.2) Emails classified as SPAM: A and C
3.3) Emails classified as HAM: B and D
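The thresholding rule used above (SPAM when the probability is at least 0.5, HAM otherwise) can be sketched as follows. The names `THRESHOLD`, `probs`, and `labels` are assumptions for illustration; the probabilities are the rounded values from Part 2.

```python
THRESHOLD = 0.5  # decision threshold stated in the exercise

# Rounded probabilities from Part 2.
probs = {"A": 0.622, "B": 0.223, "C": 0.953, "D": 0.100}

# An email is labeled SPAM when its probability reaches the threshold.
labels = {name: "SPAM" if p >= THRESHOLD else "HAM" for name, p in probs.items()}

spam = [name for name, label in labels.items() if label == "SPAM"]
ham = [name for name, label in labels.items() if label == "HAM"]
print("SPAM:", spam)
print("HAM:", ham)
```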
PART 4: Performance metrics
- Email A: predicted SPAM, actual SPAM → $\textcolor{#27ae60}{TP}$ (True Positive)
- Email B: predicted HAM, actual HAM → $\textcolor{#27ae60}{TN}$ (True Negative)
- Email C: predicted SPAM, actual SPAM → $\textcolor{#27ae60}{TP}$ (True Positive)
- Email D: predicted HAM, actual HAM → $\textcolor{#27ae60}{TN}$ (True Negative)
$\boxed{TP = 2 \quad TN = 2 \quad FP = 0 \quad FN = 0}$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{2 + 2}{2 + 2 + 0 + 0} = \frac{4}{4}$$
$\boxed{\text{Accuracy} = \textcolor{#27ae60}{100\%}}$
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{2}{2 + 0} = \frac{2}{2}$$
$\boxed{\text{Precision} = \textcolor{#27ae60}{100\%}}$
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{2}{2 + 0} = \frac{2}{2}$$
$\boxed{\text{Recall} = \textcolor{#27ae60}{100\%}}$
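The counting and the three formulas above can be reproduced with a short script. This is a sketch under the labels given in this part; the variable names and the zero-denominator guards are illustrative additions, not part of the exercise.

```python
# Predicted and actual labels for the four emails (from Parts 3 and 4).
predicted = {"A": "SPAM", "B": "HAM", "C": "SPAM", "D": "HAM"}
actual = {"A": "SPAM", "B": "HAM", "C": "SPAM", "D": "HAM"}

# Confusion-matrix counts, treating SPAM as the positive class.
tp = sum(predicted[k] == "SPAM" and actual[k] == "SPAM" for k in predicted)
tn = sum(predicted[k] == "HAM" and actual[k] == "HAM" for k in predicted)
fp = sum(predicted[k] == "SPAM" and actual[k] == "HAM" for k in predicted)
fn = sum(predicted[k] == "HAM" and actual[k] == "SPAM" for k in predicted)

accuracy = (tp + tn) / (tp + tn + fp + fn)
# Guard against division by zero when no positives are predicted or present.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.0%} precision={precision:.0%} recall={recall:.0%}")
```

With only four emails and no errors, all three metrics equal 100%; the guards matter only for datasets where a denominator could be zero.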
PART 5: Interpretation
- $a_1 = 0.15$ (capitalized words)
- $a_2 = 0.25$ (links)
$\textcolor{#e67e22}{a_2}$ has the greater impact because $0.25 > 0.15$: each additional link raises the score z more than each additional capitalized word.
An email dominated by links is more likely to be spam than one dominated by capitalized words, because $a_2 = 0.25$ carries more weight. For example, 0 capitalized words and 12 links gives:
$$z = 0 + 0.25 \times 12 - 3.0 = 0$$
$$P = 50\%$$
whereas 12 capitalized words and 0 links gives:
$$z = 0.15 \times 12 + 0 - 3.0 = -1.2$$
$$P \approx 23\%$$
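The comparison above (12 links with no capitalized words, versus 12 capitalized words with no links) can be checked numerically. The helper `p_spam` and its default arguments are illustrative; the defaults are the coefficients from the model recap.

```python
import math


def p_spam(x1, x2, a1=0.15, a2=0.25, b=-3.0):
    """Spam probability for x1 capitalized words and x2 links."""
    z = a1 * x1 + a2 * x2 + b
    return 1.0 / (1.0 + math.exp(-z))


p_links = p_spam(x1=0, x2=12)   # 12 links, no capitalized words
p_caps = p_spam(x1=12, x2=0)    # 12 capitalized words, no links

print(f"links only: {p_links:.3f}")
print(f"caps only:  {p_caps:.3f}")
```

The same count of 12 yields a noticeably higher probability when it comes from links, which is the effect of $a_2 > a_1$.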
Setting $P = 0.5$ and solving for z:
$$P = 0.5 = \frac{1}{1 + e^{-z}}$$
$$1 + e^{-z} = 2$$
$$e^{-z} = 1$$
$$-z = \ln 1 = 0$$
$\boxed{z = 0}$
When $z = 0$, we get exactly $P = 50\%$ (the point of indecision).
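The condition $z = 0$ can also be written in terms of the features. Substituting the coefficients from the model recap gives the boundary between SPAM and HAM as a line in the $(x_1, x_2)$ plane. This is a short derivation from those coefficients, not an additional result from the exercise:
$$0.15 \cdot x_1 + 0.25 \cdot x_2 - 3.0 = 0 \quad \Longleftrightarrow \quad x_2 = 12 - 0.6 \cdot x_1$$
Emails above this line ($x_2 > 12 - 0.6 \cdot x_1$) get $P > 50\%$. For instance, Email A has $x_1 = 10$, so the boundary sits at $x_2 = 6$, and its 8 links place it on the SPAM side.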
RESULTS SUMMARY
| Email | z | P(spam) | Decision | Actual | Result |
|---|---|---|---|---|---|
| A | +0.5 | 62.2% | SPAM | SPAM | Correct (TP) |
| B | -1.25 | 22.3% | HAM | HAM | Correct (TN) |
| C | +3.0 | 95.3% | SPAM | SPAM | Correct (TP) |
| D | -2.2 | 10.0% | HAM | HAM | Correct (TN) |
- $\textcolor{#3498db}{Blue}$: contribution of capitalized words ($a_1 \cdot x_1$)
- $\textcolor{#e67e22}{Orange}$: contribution of links ($a_2 \cdot x_2$)
- $\textcolor{#9B7AC4}{Purple}$: model bias ($b = -3.0$)
- $\textcolor{#27ae60}{Green}$: z score and correct results
- $\textcolor{#F7E64D}{Yellow}$: computed probabilities
- $\textcolor{#e74c3c}{Red}$: SPAM classification