shuffling the toxicity dataset?

### Describe the issue linked to the documentation

The dataset in [this example](https://skrub-data.org/stable/auto_examples/02_text_with_string_encoders.html#sphx-glr-auto-examples-02-text-with-string-encoders-py) is sorted by label: the first 500 tweets are toxic and the rest are non-toxic.

The example does not explicitly shuffle it, but it still works because `cross_validate` uses a stratified k-fold (`StratifiedKFold`) by default for classification, so each fold keeps both classes in roughly the original proportions. However, that is not immediately obvious, and I think there has been some recent discussion in scikit-learn about whether it would be better not to stratify by default. So I think it would be good to actually shuffle the dataset.
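To illustrate why the ordering matters once stratification is out of the picture, here is a minimal sketch in plain Python. The hand-rolled `contiguous_test_folds` helper is hypothetical and only stands in for an unshuffled, unstratified k-fold split; the total size (1000) and the number of folds (5) are assumptions chosen for the illustration, not values taken from the example.

```python
# Toy labels sorted by class, as described above. The total size (1000) and
# the number of folds (5) are illustrative assumptions.
labels = [1] * 500 + [0] * 500
n_splits = 5


def contiguous_test_folds(n_samples, n_splits):
    """Yield (start, stop) index ranges of unshuffled, unstratified test folds."""
    fold_size = n_samples // n_splits
    for i in range(n_splits):
        start = i * fold_size
        # The last fold absorbs any remainder.
        stop = n_samples if i == n_splits - 1 else start + fold_size
        yield start, stop


# Fraction of toxic samples (label 1) in each test fold.
toxic_fractions = [
    sum(labels[start:stop]) / (stop - start)
    for start, stop in contiguous_test_folds(len(labels), n_splits)
]
print(toxic_fractions)
```

With these toy sizes, four of the five test folds contain a single class, which would make per-fold scores hard to interpret.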

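I'm not sure where the best place to do it would be: in the hosted zip file, in the fetcher, in the example, or in the cross-validation itself. `cross_validate` has no `shuffle` parameter of its own, so the last option would mean passing a shuffling splitter such as `StratifiedKFold(shuffle=True)` as `cv`.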
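For the cross-validation option, a minimal sketch could look like the following, assuming the shuffling is delegated to a splitter passed through the `cv` argument of `cross_validate`. The data here are synthetic stand-ins (random features, labels sorted by class), and `LogisticRegression` is only a placeholder estimator, not the pipeline used in the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

rng = np.random.default_rng(0)

# Synthetic stand-in for the example's data: labels sorted by class.
y = np.array([1] * 500 + [0] * 500)
# Three random features, shifted by the label so the classes are separable.
X = rng.normal(size=(y.size, 3)) + y[:, None]

# The splitter does the shuffling; random_state makes it reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(LogisticRegression(), X, y, cv=cv)
test_scores = results["test_score"]
print(test_scores.mean())
```

Swapping in `KFold(shuffle=True)` would also work on sorted data, at the cost of losing the guarantee that every fold has the same class balance.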

WDYT @Vincent-Maladiere 

### Suggest a potential alternative/fix

_No response_

shuffling the toxicity dataset? #1234
