Commit b4b5090

docs: 📝 add flowcharts
1 parent 6e324c8 commit b4b5090

4 files changed

Lines changed: 5 additions & 0 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -66,6 +66,8 @@ The Boruta algorithm tries to capture all the important features you might have
At every iteration, the algorithm compares the Z-scores of the original features with those of their shuffled copies (the shadow features) to see whether the originals performed better. If one does, that iteration counts in favor of marking the feature as important. In essence, the algorithm validates the importance of each feature against randomly shuffled copies, which increases robustness. The decision is made by testing the number of times a feature outperformed the shadow features against a binomial distribution. Since the whole process is done on the same train-test split, the variance of the variable importance comes only from re-fitting the model at each iteration.
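The comparison described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the code from this repository: the names `abs_corr`, `binom_sf` and `boruta_like` are hypothetical, the importance score is a plain absolute Pearson correlation standing in for the Z-scored model importance, and the decision uses a one-sided binomial test with no multiple-testing correction.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (assumed stand-in for a model's importance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_like(columns, y, n_iter=20, alpha=0.05, seed=0):
    """Count how often each feature beats the best shadow, then test the count."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: a freshly shuffled copy of every original column.
        shadow_scores = []
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, y))
        threshold = max(shadow_scores)
        # A feature scores a hit when it beats the best shadow this round.
        for name, values in columns.items():
            if abs_corr(values, y) > threshold:
                hits[name] += 1
    # One-sided binomial test: are the hits more frequent than a coin flip?
    accepted = {name: binom_sf(h, n_iter) < alpha for name, h in hits.items()}
    return accepted, hits
```

Because the stand-in score is deterministic, only the shadow scores vary between iterations here. With a re-fitted model, as in the text, the importance of the original features would vary as well.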

+<img src="./docs/boruta.png" alt="drawing" width="600"/>
+
## BoostARoota

BoostARoota follows closely the Boruta method but modifies a few things:
@@ -82,6 +84,8 @@ BoostARoota follows closely the Boruta method but modifies a few things:
In spirit, this is the same heuristic as Boruta but using boosting (Boruta originally supported only random forests). The importance is validated by comparing it to the maximum of the median variable importance of the shadow predictors (in Boruta, a statistical test is performed using the Z-score). Since the whole process is done on the same train-test split, the variance of the variable importance comes only from re-fitting the model at each iteration.
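The threshold rule described above can be sketched as follows. This is one possible reading of "maximum of the median variable importance of the shadow predictors", not the code from this repository: the names `abs_corr` and `shadow_threshold_select` are hypothetical, and an absolute Pearson correlation stands in for the importance that a boosting model would report.

```python
import math
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation (assumed stand-in for boosting importance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def shadow_threshold_select(columns, y, n_iter=10, seed=0):
    """Keep features whose median score exceeds the max median shadow score."""
    rng = random.Random(seed)
    real_scores = {name: [] for name in columns}
    shadow_scores = {name: [] for name in columns}
    for _ in range(n_iter):
        for name, values in columns.items():
            # Each round scores the original column and a reshuffled shadow copy.
            shadow = list(values)
            rng.shuffle(shadow)
            real_scores[name].append(abs_corr(values, y))
            shadow_scores[name].append(abs_corr(shadow, y))
    # Median over iterations for each shadow, then the maximum over shadows.
    threshold = max(statistics.median(s) for s in shadow_scores.values())
    kept = [name for name, s in real_scores.items()
            if statistics.median(s) > threshold]
    return kept, threshold
```

Unlike the Boruta-style binomial test, this rule needs no hit counting: a single cutoff derived from the shadows decides which features are kept.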

+<img src="./docs/boostaroota.png" alt="drawing" width="600"/>
+
## Modifications to Boruta and BoostARoota

I forked both Boruta and BoostARoota and made the following changes (under PR):
@@ -111,6 +115,7 @@ In the spirit, the same heuristic than Boruta but using Boosting (originally Bor
- Not based on a given percentage of cols needed to be deleted
- Plot method for var. imp

+<img src="./docs/grootcv.png" alt="drawing" width="400"/>

## References

docs/boostaroota.png

118 KB

docs/boruta.png

76.1 KB

docs/grootcv.png

55.6 KB
