Understanding the Details of the Model

navneethc · 17 February 2022 19:09

Quoting from the M1.02 exercise notebook:

All scikit-learn models can be created without arguments, which means that you don’t need to understand the details of the model to use it in scikit-learn.

(emphasis mine)
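To make the quoted claim concrete, here is a minimal sketch. It assumes scikit-learn is installed. The estimator (KNeighborsClassifier) and the iris dataset are illustrative choices and are not necessarily the ones used in the notebook. The model is created with no arguments, so every parameter takes its documented default. Those defaults still exist; they are just implicit.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset; the notebook itself may use different data.
X, y = load_iris(return_X_y=True)

# No constructor arguments: every hyperparameter takes its default value.
model = KNeighborsClassifier()
model.fit(X, y)

# The defaults are still there, they are just not written out.
print(model.get_params()["n_neighbors"])  # 5
print(model.predict(X[:3]))
```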

Should the highlighted part perhaps be reworded? I’m not exactly sure what its intention is, so I can’t offer a suggestion. However, I feel that someone new to the subject might take this to mean that ML models can be applied without understanding what’s happening underneath. I also remember a Twitter thread from a couple of years ago where someone pointed out that the default argument in one of the classes was not what people often assumed it to be. (I’m not saying it was a mistake on the part of the developers, of course, but rather that users actually need to pay attention to the parameters and their defaults, if any.)

Edit: My apologies for not noticing the sub-forum related to this specific exercise. Mods, please feel free to move this thread to the appropriate section, if required.

suvayu · 17 February 2022 19:16

Perhaps you mean this thread?

glemaitre58 · 17 February 2022 21:05

Looking at all the initialization parameters can be distracting when you use a model for the first time. The intention is therefore to keep students/users from diving into the details of the documentation in the first exercise of the MOOC.

We strongly emphasize the importance of hyperparameters in the next section, “Selecting the best model”.

lesteve · 23 February 2022 14:42

I feel that someone new to the subject might mistake this to mean ML models can be applied without understanding what’s happening underneath.

I see what you mean. I think we could indeed reword this as something like:

All scikit-learn models can be created without arguments. This is convenient because it means you don’t have to know the full details of a model before you start using it. The default parameter values have been chosen with care, but that does not mean they will be the best for your data; we’ll see how to choose model parameters in Module 3 on hyperparameter tuning.
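A minimal sketch of the idea behind this rewording follows. It assumes scikit-learn is installed and again uses KNeighborsClassifier on iris purely for illustration. `get_params()` lists the defaults that a no-argument model silently relies on. GridSearchCV, a tool of the kind covered in the hyperparameter tuning module, checks with cross-validation whether another value suits the data better. The candidate grid is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Inspect every default that a no-argument model relies on.
defaults = KNeighborsClassifier().get_params()
print(defaults)

# Compare the default n_neighbors=5 with a few arbitrary alternatives
# using 5-fold cross-validation.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 5, 15, 30]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The default is a reasonable starting point. The search only tells you whether it happens to be a good one for this particular dataset.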

If that’s the Twitter post that @navneethc had in mind as well, that’s a long story. To simplify, it’s a trade-off between academic purity and keeping users from shooting themselves in the foot by default. I would say that scikit-learn strongly favors the latter option, but as with every trade-off, there is no perfect solution that will make everyone happy …

navneethc · 23 February 2022 16:25

Indeed, that would be better. Thank you for considering the suggestion.

Yes, that’s the incident I was referring to. To reiterate, I brought it up to emphasise the importance of the user being aware of the default values rather than to judge the choice of the defaults itself.
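Here is one example of why defaults deserve attention. It is not necessarily the case discussed in the Twitter thread. scikit-learn’s LogisticRegression penalizes its coefficients by default, with the strength set by `C=1.0` (a smaller `C` means stronger regularization). The minimal sketch below assumes scikit-learn and NumPy are installed and uses the breast cancer dataset for illustration. It compares the coefficient norm under the default `C` with a much weaker regularization. The `C=1e4` value and `max_iter=5000` are arbitrary choices for this demo.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Default regularization: C is implicitly 1.0.
default_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
default_clf.fit(X, y)
print(default_clf[-1].get_params()["C"])  # 1.0

# A much larger C means much weaker regularization (arbitrary demo value).
weak_clf = make_pipeline(StandardScaler(), LogisticRegression(C=1e4, max_iter=5000))
weak_clf.fit(X, y)

# Weaker regularization lets the coefficients grow larger.
default_norm = np.linalg.norm(default_clf[-1].coef_)
weak_norm = np.linalg.norm(weak_clf[-1].coef_)
print(default_norm, weak_norm)
```

Neither setting is wrong. The point is that the fitted model depends on a value you never wrote down.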

lesteve · 23 February 2022 17:01

“Incident” is a bit strong; I would call it an argument, which is the kind of thing that tends to happen reasonably often on Twitter.

lesteve · 23 February 2022 17:16

@navneethc, by the way, are you happy with my proposed rewording in Understanding the Details of the Model - #4 by lesteve?

navneethc · 23 February 2022 17:31

Yes, I am. In fact, I marked your reply as a “solution”. Thanks again.