Different solution M3.01?

PiaBrinkmann · 4 June 2021 08:53

Dear all,

when I solved task M3.01, I got a different result than the one in the notebook. Maybe I made a mistake, or maybe it is because I used the default setting for the cv parameter? I would like to understand this and would be happy for some input.

Here is my code:

# Write your code here.
from sklearn.model_selection import cross_validate

learning_rate = [1e-2, 1e-1, 1, 10]
max_leaf_nodes = [3, 10, 30]

for A in learning_rate:
    for B in max_leaf_nodes:
        model.set_params(classifier__learning_rate=A, classifier__max_leaf_nodes=B)
        cv_results = cross_validate(model, data, target)
        scores = cv_results["test_score"]
        print(f"Accuracy score via cross-validation with A={A} and B={B}:\n"
              f"{scores.mean():.3f} +/- {scores.std():.3f}")

Best wishes,
Pia

glemaitre58 · 4 June 2021 09:16

This is indeed the reason. Your solution and the provided solution do not evaluate the model on the same splits of the data. The provided solution uses cv=2 to speed up the computation given the resources available on the server, but the default cv=5 is a better choice if computational resources allow.
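
To see why the cv value alone changes the numbers, here is a minimal sketch in plain Python. It is not the notebook's code: the helper fold_sizes and the dataset size of 1000 are made up for illustration, and the sizing rule is that of a plain, unshuffled k-fold split.

```python
def fold_sizes(n_samples, n_splits):
    """Return (train_size, test_size) for each fold of a plain k-fold split."""
    # The first n_samples % n_splits folds get one extra test sample.
    base, extra = divmod(n_samples, n_splits)
    sizes = []
    for i in range(n_splits):
        test_size = base + (1 if i < extra else 0)
        sizes.append((n_samples - test_size, test_size))
    return sizes


n_samples = 1000  # hypothetical dataset size
for cv in (2, 5):
    print(f"cv={cv}: {fold_sizes(n_samples, cv)}")
```

With cv=2 each model is trained on half of the data, while with cv=5 it is trained on 80% of it, so the two settings score differently trained models on different test folds.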

Something that is missing in your code (maybe intentionally) is a mechanism to select the parameters that give the best score over the iterations. In the provided solution, this is the purpose of the condition if mean_score > best_score:.

echidne · 10 June 2021 13:01

Another potential reason why PiaBrinkmann obtained different results is that in their code the cross-validation is done on the full dataset, whereas the solution runs the cross-validation on the data_train and target_train subsets.
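
A rough illustration of that difference, in plain Python rather than with scikit-learn: the dataset size, the 25% hold-out ratio, the seed and the helpers kfold and rows_scored are all assumptions made for this sketch, not taken from the notebook.

```python
import random

n_samples = 1000                      # hypothetical dataset size
all_rows = list(range(n_samples))

# Hold out 25% of the rows as a test set (hypothetical ratio and seed).
shuffled = all_rows[:]
random.Random(0).shuffle(shuffled)
n_test = n_samples // 4
test_rows, train_rows = shuffled[:n_test], shuffled[n_test:]


def kfold(rows, n_splits):
    """Yield (train, validation) row lists for a simple k-fold split."""
    for k in range(n_splits):
        validation = rows[k::n_splits]
        held = set(validation)
        train = [r for r in rows if r not in held]
        yield train, validation


def rows_scored(rows, n_splits):
    """Collect every row that is used for scoring in at least one fold."""
    scored = set()
    for _, validation in kfold(rows, n_splits):
        scored.update(validation)
    return scored


scored_full = rows_scored(all_rows, n_splits=2)
scored_train_only = rows_scored(train_rows, n_splits=2)

print(len(scored_full))                         # rows scored on the full dataset
print(len(scored_train_only))                   # rows scored on the training subset
print(scored_train_only.isdisjoint(test_rows))  # held-out rows are never scored
```

Cross-validating on the training subset uses fewer rows and never touches the held-out test rows, so the scores cannot be expected to match those computed on the full dataset.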

glemaitre58 · 10 June 2021 13:16

Good point, I only looked quickly at the solution and I missed it.

Alvin19 · 13 June 2021 06:37

Hi, I am just curious how best_score is determined in line 4 of the code below.
mean_score is computed from scores.mean() (line 2), while best_score is set equal to mean_score (line 5).

If best_score is equal to mean_score, then when will mean_score be greater than best_score in line 4?

scores = cross_val_score(model, data_train, target_train, cv=2)
mean_score = scores.mean()
print(f'scores: {mean_score:.3f}')
if mean_score > best_score:
best_score = mean_score
best_params = {'learning_rate': lr, 'max_leaf_nodes': mln}
print(f'Found new best model with score {best_score:.3f}!')

echidne · 13 June 2021 10:03

Hi Alvin,
first, here is the code with its indentation, which makes it easier to understand:

scores = cross_val_score(model, data_train, target_train, cv=2)
mean_score = scores.mean()
print(f'scores: {mean_score:.3f}')
if mean_score > best_score:
    best_score = mean_score
    best_params = {'learning_rate': lr, 'max_leaf_nodes': mln}
    print(f'Found new best model with score {best_score:.3f}!')

Before this code runs, best_score has been initialised to 0 and best_params to an empty dict.
In the two nested for loops, learning_rate and max_leaf_nodes are picked from lists of values, and for each combination the model is evaluated by cross_val_score; the scores are stored as an array of 2 elements (cv=2).
Then the mean of the 2 scores is stored in the variable mean_score, and mean_score is compared to best_score. If mean_score is greater than best_score, then best_score takes the value of mean_score and the parameters are stored in the dict best_params.
At each new iteration of the loop, new scores are computed via cross_val_score(model, data_train, target_train, cv=2), so mean_score gets a new value that is compared to the current value of best_score.
So at the first iteration you will automatically have a new best model, since the first model tested will produce a mean_score > 0 and best_score will take the value of the mean_score of the first model tested. At the second iteration you will have a new best_score only if the mean_score of the second model is greater than the mean_score of the first model. And so on…
In Python the value of a variable is not static and can be changed dynamically.
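
To watch best_score change over the iterations, here is a small sketch of the same pattern. It is only an illustration: the table fake_mean_scores holds made-up numbers that stand in for cross_val_score(...).mean(), and the list history is added just to record each update.

```python
# Made-up mean accuracies, standing in for cross_val_score(...).mean().
fake_mean_scores = {
    (0.01, 3): 0.80, (0.01, 10): 0.82, (0.01, 30): 0.83,
    (0.1, 3): 0.85, (0.1, 10): 0.87, (0.1, 30): 0.88,
    (1, 3): 0.86, (1, 10): 0.84, (1, 30): 0.81,
    (10, 3): 0.30, (10, 10): 0.28, (10, 30): 0.25,
}

best_score = 0      # initial value, lower than any of the scores above
best_params = {}    # filled in as soon as a model beats best_score
history = []        # records every update of best_score

for lr in [0.01, 0.1, 1, 10]:
    for mln in [3, 10, 30]:
        mean_score = fake_mean_scores[(lr, mln)]
        # Only a strictly better score replaces the current best.
        if mean_score > best_score:
            best_score = mean_score
            best_params = {"learning_rate": lr, "max_leaf_nodes": mln}
            history.append((lr, mln, best_score))

print(f"Best score: {best_score:.3f} with {best_params}")
print(f"best_score was updated {len(history)} times")
```

The first combination always triggers an update because its score is above the initial 0; later combinations only do so when they beat the best value seen so far.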

I hope this is clearer for you.

Alvin19 · 13 June 2021 14:01

Thank you @echidne for your reply.
I now understand the whole code thanks to your explanation.

I would like to get more practice writing functions and more complex for loops. Do you have any suggestions on how to learn and practise?

Once again, thank you.

echidne · 13 June 2021 15:31

Hi Alvin,
I learned Python through a very good MOOC on FUN, but it is in French.
If you want one in English, you can find several on edx.org, for example, but I can't say anything about them since I have not tried them.
The course presentation has several links, but the one for Python leads to a very basic introduction to Python.
If you have other questions about Python, do not hesitate to ask. I'll try to help you.

Alvin19 · 13 June 2021 16:47

@echidne
Alright, thank you.

Marc_In_Singapore · 16 June 2021 09:11

@echidne I am actually registered in that course, but I am not sure I will find the time and energy at night to squeeze in 9 weeks at 9 hours / week…

I am a bit confused, though, about how the course is run.

It says the course is archived. What does that mean?

Is there support like the one we have during this course?

How is the forum organized if a bunch of folks all take it at different times and there is no “synching” between them? For example, I took some FUN MOOCs on stats and R, and quantum physics, and it was really nice to have a sense of everyone moving at the same pace with weekly topic openings. Same concerns, questions, etc.

echidne · 16 June 2021 12:17

@Marc_In_Singapore I suppose you are asking me about the MOOC for learning Python on FUN, made by 2 members of INRIA in Nice (Arnaud Legout and Thierry Parmentelat)?

I'm not a member of the teaching team, but I'm a former student and an active member of that MOOC's forum.
That course is not archived. When a course on FUN is archived, it means you can still access the course but the forum is dead and you can no longer interact with teachers or other students.
I suppose you are confusing it with another very good MOOC for Python beginners on FUN, made by members of the universities of Brussels and La Réunion: Apprendre à coder avec Python, which has stopped for the summer holidays and will be back in September (I have done that MOOC too).
To answer your concern about support: yes, there is active support on that MOOC's forum.
The MOOC from INRIA in Nice is designed to let you learn at your own rhythm. Each part of the course is divided into sections for beginner, intermediate and advanced users. When you ask a question on the forum, you can indicate which part of the MOOC you are having difficulties with, as here.
By the way, I also did the MOOC on R, like you, but not everyone was moving at the same pace. I, for example, started the course when all the sections had already been opened.