Can compound characters be used in a domain name?

28 April 2020 - Stéphane Bortzmeyer

In the Charlie Hebdo edition No. 1446 of 8 April 2020, the article by Laure Daussy on the increase in violence against women during lockdown due to the COVID-19 epidemic refers to the website https://arretonslesviolences.gouv.fr/ (stop violence). Unfortunately, the site name as printed contains an error (arrêtonslesviolences.gouv.fr instead of arretonslesviolences.gouv.fr). This ‘error’ (correct spelling but not the correct domain name) is, alas, frequent. arrêtonslesviolences.gouv.fr does not exist, and visitors who try the address given in the article will run into an error message.

This is not the first time that this problem has been observed. During the debate on the future ‘Digital Republic’ law in 2015, the newspaper Le Parisien made the same error, citing the address of the debate site with a compound character not used in the correct address. More troublesome still, the domain name with the compound character had been registered by a third party, and a website criticising the government had been set up behind this name. (This scenario could not arise for arretonslesviolences.gouv.fr, as gouv.fr is a sub-domain with restricted registration.) It is therefore worth returning to the issue of domain names and compound characters, and giving some advice both to those registering domain names to enhance their online presence and to those who cite and publish these names.

First, we need to set out the current state of technology: contrary to what we still read all too often, it is perfectly possible to use compound characters, like é, ÿ or ç, in a domain name. (Compound characters are sometimes called ‘diacritical characters’ or ‘Unicode characters’.) The technical standard on this subject was published over thirteen years ago, which is an eternity in computer science. (Names that include compound characters are called IDNs, for Internationalized Domain Names.) The top-level domain .fr has authorised these IDNs for seven years. Afnic itself has, for that matter, been publishing the website https://réussir-en.fr/ for years. It is also possible to have domain names written in Arabic, Chinese, Armenian and other scripts.

A private individual looking to create an online presence must therefore ask the question: should the domain name include compound characters or not? Or both? The choice is complex. Let's say your project goes by the name ‘café bien serré’ (strong coffee). Should the project leaders register the name café-bien-serré.fr, or cafe-bien-serre.fr? The first has the advantage of being in correct French. In addition to aesthetic satisfaction, this avoids jokes when the same word, without the compound characters, has another meaning (I'll leave it to you to find examples).

But on the other hand, you have to take users into account. The misconception “you can't put compound characters in a domain name” is still common, and some users might be surprised by the name and ‘correct’ it by removing the accents. In addition, even if the technical standard is old, IT is a mixture of constant innovation and extreme inertia. Some software still exists that does not work properly with domain names containing compound characters. This is all the more true since the dominant country on the Internet does not use compound characters in its language, and the developers in that country are therefore not necessarily aware of this issue. We will sometimes see failures to connect to the website using these names, or the name displayed in what the Japanese call mojibake, incomprehensible characters. If, with the name réussir-en.fr, you sometimes see the appearance of mojibake like, for example, xn--russir-en-b4a.fr, then you are using software with these old errors. And it's not just on the Web: email, for example, doesn't always handle addresses that use these names. Again, a technical standard exists, but that doesn't mean it is deployed everywhere.
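The xn-- form is not random garbage: it is the ASCII encoding under which the name actually travels in the DNS, and it can be turned back into the readable name. Here is a minimal sketch in Python, assuming the standard library's built-in idna codec (which implements the original 2003 version of the standard); the helper name to_readable is ours, purely for illustration.

```python
def to_readable(ascii_name: str) -> str:
    """Turn the ASCII ('xn--') form of a domain name back into its readable form."""
    # The built-in "idna" codec decodes each dot-separated label;
    # labels without the "xn--" prefix are returned unchanged.
    return ascii_name.encode("ascii").decode("idna")


print(to_readable("xn--russir-en-b4a.fr"))  # réussir-en.fr
```

Software that displays the xn-- form has simply skipped this last conversion step.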

Faced with these problems, a possible solution is to buy both names, with and without compound characters. One of the advantages of this solution is that it prevents a third party from registering them to create confusion. But, in addition to the question of budget, you still need to decide which one to give priority to, which one to publicise, and which one to redirect to the other. In short, there is unfortunately no perfect solution for now: everyone with an online presence must make a choice. It would be good if that choice could at least be an informed one, and if future webmasters who have doubts about a name realised that, no, domain names are not restricted to the characters used in English.

I have sometimes heard the suggestion that the name be chosen with this issue in mind, thereby avoiding compound characters completely. But you can imagine the intellectual contortions it would take to make all project names fit this Procrustean bed. Although arretonslesviolences.gouv.fr could perhaps be replaced by mettonsfinauxviolences.gouv.fr or halteauxviolences.gouv.fr, it would be more difficult in the case of ‘café bien serré’... Especially since, when we begin to consider an online presence, the name of the project or organisation may have already been around for a long time.

As there is no obvious choice for the holder of the future domain name, some will choose to register an IDN like café-bien-serré.fr, others a traditional name without accents like cafe-bien-serre.fr. This entails a responsibility for people who cite domain names, for example in an article. Care must be taken to write the domain name exactly as it was registered. If it is cafe-bien-serre.fr, do not ‘correct’ it into correct French, as it may no longer work. If it is café-bien-serré.fr, do not ‘correct’ it on the assumption that domain names can't have accents, as this is not true.
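This can be checked mechanically before publication. The following sketch, in Python and assuming the standard library's built-in idna codec, compares a cited name with the registered one; the function names ascii_form and same_domain, like the example names, are only illustrative. Differences in letter case are harmless, but removing the accents yields a different domain name.

```python
def ascii_form(name: str) -> str:
    """Return the lower-cased ASCII form of a domain name."""
    return name.encode("idna").decode("ascii").lower()


def same_domain(cited: str, registered: str) -> bool:
    """True if both spellings designate the same domain name."""
    return ascii_form(cited) == ascii_form(registered)


registered = "café-bien-serré.fr"
print(same_domain("Café-Bien-Serré.FR", registered))  # True
print(same_domain("cafe-bien-serre.fr", registered))  # False
```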

Background note: origin of IDNs

Strictly speaking, it has always been possible for domain names to contain compound characters. But in practice, this was not feasible for various reasons, some technical (the lack of a standard encoding, along with case-insensitivity rules) and others policy-related (registration rules). After several attempts, and a lot of controversy (the question of languages and scripts is still very sensitive), it was not until March 2003 that a technical standard was published, in the form of the IETF document ‘RFC 3490’ (the IETF, Internet Engineering Task Force, is the standard-setting body). It made IDNs (Internationalized Domain Names) possible and allowed them to work with existing software, without the need to change the entire Internet infrastructure.
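To make the mechanism concrete: under this standard, each label containing non-ASCII characters is converted to plain ASCII with an algorithm called Punycode and given the prefix xn--, so that resolvers and servers only ever see letters, digits and hyphens. Here is a rough sketch in Python, assuming the standard library's punycode and idna codecs; the function names are ours, and the sketch skips the normalisation step that the real standard applies first, so it is an illustration, not a full implementation.

```python
ACE_PREFIX = "xn--"  # prefix marking an encoded label


def encode_label(label: str) -> str:
    """Encode one label; pure-ASCII labels are left untouched."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def encode_name(name: str) -> str:
    """Encode every dot-separated label of a (lower-case) domain name."""
    return ".".join(encode_label(label) for label in name.split("."))


print(encode_name("réussir-en.fr"))  # xn--russir-en-b4a.fr
# The built-in codec, which also normalises the name, agrees:
print("réussir-en.fr".encode("idna").decode("ascii"))  # xn--russir-en-b4a.fr
```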

But while having a technical standard is one thing, deploying it is another. Some software had to be adapted (a task which is still ongoing to an extent), and the various domain name registries had to adapt their registration policies. Top-level domains with compound characters have thus been authorised since May 2010, and it has been possible to register IDNs under .fr since May 2012.

Note that the technical standard was revised fairly thoroughly in August 2010, by ‘RFC 5890’. So we now use IDN ‘version 2’.
