Localization of scikit-image website content. #7296

Open
@steppi

Description

Hi,

I work for Quansight Labs, helping set up infrastructure for translating content from the websites of core Scientific Python packages as part of the CZI Scientific Python Community & Communications Infrastructure grant. @jarrodmillman and @stefanv were the first and second authors on the grant proposal, but I'll give an overview for the sake of everyone else reading this.

The goal is to translate the brochure websites of at least 8 of the Scientific Python core projects into at least 3 commonly used languages. The list of them can be found here. By "brochure website", I mean the project website that gives a general overview of the package, as distinct from technical documentation like API references, examples, and tutorials. For scikit-image this is https://scikit-image.org/.

So far, translations have been completed and published for https://numpy.org. I've recently reached out to Pandas (pandas-dev/pandas#56301 (comment)) and scikit-learn (scikit-learn/scikit-learn#28105), and plan to reach out to maintainers of the remaining core projects over the next week. There's a lot of work involved in setting up translation infrastructure, finding and coordinating with qualified translators, and approving and publishing translated content. The hope is that a cross-functional team of Quansight employees, volunteer translators, and reviewers can take on much of that burden, minimizing the effort needed from core project maintainers themselves.

For translation management, we've been using Crowdin Enterprise. Crowdin has generously offered a free supported enterprise organization we can use for managing translations across the different projects. So far the support has been excellent. Crowdin can be synced with a GitHub repo containing content: segmented strings of content are uploaded to Crowdin for translation, and translations are sent back to the repo as commits to a running PR. For numpy.org, Crowdin was synced directly to the repo hosting the website content, https://github.com/numpy/numpy.org. Based on things that have come up in the discussions with Pandas and scikit-learn maintainers, it seems it would be better to have a separate repo for managing translated content.

I'm just interested in getting the ball rolling here, and will give more info as things develop over the coming weeks. Here's a summary of the steps I think would be involved:

  1. Set up a repository for managing content that should be translated, with an automated process to pull in the latest content whenever changes are made. There may be multiple repos that content needs to be taken from. (For scikit-image much of it is in the docs folder of the primary repo, but I think at least the index is in https://github.com/scikit-image/skimage-web.)

  2. Set up Crowdin integration with this repository. Markdown files can be segmented automatically; for Sphinx .rst files, GNU gettext can be used to generate .po files, as described at https://www.sphinx-doc.org/en/master/usage/advanced/intl.html.

  3. Find and vet interested and qualified translators. Colleagues from Quansight and/or I will help take care of this, and there will hopefully be large overlap between the translators for different projects.

  4. Publish translations on the core project website, with a dropdown selector to choose between languages. How this is done will depend on the static site generator used. For sites using the Scientific Python Hugo theme (thanks @jarrodmillman and @stefanv), like numpy.org, setting this up is almost automatic. I've found that scikit-image is using the PyData Sphinx Theme. There, I think the version selector could be used, or code could be copied from it to make a separate language selector.
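
As a rough illustration of step 2 for the Sphinx-based parts of a site, the gettext workflow from the Sphinx internationalization guide linked above might be configured along these lines. This is only a sketch: the `locale/` directory name and the choice of `gettext_compact = False` are assumptions for illustration, not scikit-image's current configuration.

```python
# conf.py (sketch): settings relevant to Sphinx's gettext-based translation.
# All values here are illustrative assumptions, not scikit-image's actual config.

# Directories, relative to conf.py, where Sphinx looks for translated message
# catalogs, laid out as <locale_dir>/<language>/LC_MESSAGES/<domain>.mo
locale_dirs = ["locale/"]

# Generate one message catalog per source document instead of grouping
# documents in the same subdirectory into a single catalog.
gettext_compact = False

# Source language of the content.
language = "en"
```

With settings like these, running `sphinx-build -b gettext <sourcedir> <outputdir>` extracts the translatable strings into catalog templates, which are the kind of files that could then be synced to Crowdin.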

Please let me know if you have any questions, especially those of you who know much more than I do about much of this and would probably like to hear more specifics.
