On November 17 and 18 two study days on dictionary creation will take place in Lyon. The first day will allow discussion about methods and practices with a session dedicated to the Wiktionary project. The second day will be participative with group training in dictionary writing for the Wiktionary project, focused on the ten words of the Francophonie. And it’s co-organized by Lyokoï and Noé, two editors of Actualités!
- In August, we were talking about a Quebec-based association for the promotion of the Shawiya language. An interview in the newspaper L’initiative allows you to learn more about their activities, which include a workshop on contribution to Wiktionary!
- In an article in the newspaper Le Soir, Michel Francard takes interest in the term tote-bag and notes that “Wiktionary, well known for its lexical scouting, mentions one [attestation] as early as June 2013”.
- In an article in English, the Kansas City Star writes about whitesplaining by referring to the entry in the English Wiktionary, whitesplain. This word has not yet been added to the French Wiktionary.
- The Swiss newspaper 24 heures reports on a court case in which Wiktionary was called upon because it was the only dictionary to offer a definition for a problematic term. At the time of the first judgment the court had to determine whether the term used by the accused was a racist insult and relied solely on the definition given in Wiktionary. On appeal, the Federal Court pointed out that “Wiktionary has no official status and its definitions are open to modifications”. An entirely correct analysis and especially true in this case, since the page in question doesn't yet have any usage examples that could help identify the context in which the term is used. The case was transferred to the Cantonal Court of Vaud.
- A columnist of the website Jeuxvideo.com does not hesitate to rely on the “knowledge of Wiktionary” to define video game in the introduction to a controversial article.
- In a long paper published on the website Les Numériques, Jérôme Cartegini presents and compares the encyclopedia Universalis 2018, Le Grand Robert online and Wikipedia. Too bad Wiktionary was not included in the comparison, as it contains more entries than the Grand Robert and also has more usage examples.
- The designation blanco, tipex or blanc (correcteur) is regionalized according to français de nos régions. A lot more than you'd expect. And according to the author, Wiktionary is the only dictionary to have all the variants of blanc correcteur (or not...).
- An article on the lemonde.fr blog reports on the role of the internet in the evolution of conlang communities. A story that quickly forgets the previous centuries, whose creations were described in a book presented in the Actualités of January 2016.
- From mid-September to mid-October (from 09/20/2017 to 10/20/2017)
- French entries increased by 1,508 and quotations increased by 1,080. There are now 357,235 lemmas, 527,416 definitions and 331,833 quotations or examples.
- The three other languages which progressed the most are Northern Sami (+ 6,554 entries), Italian (+ 1,209 entries) and Esperanto (+ 280 entries).
- Four languages were added to the project (given here with their French names): lhomi (+1), merei (+1), limilngan (+1) and lorrain (+1).
- In October 11,272 entries were created for 76 languages!
- New lexicons
- Creation of Catégorie:Lexique en français de l’e-commerce.
- Words of the month
Statistics provided by Wikiscan:
A vote to split existing thesauri with ambiguous titles led to the creation of: cirque (naturel) and cirque (spectacle); langue (anatomie) and langue (linguistique); paresseux (animal) and paresseux (personne); assimilation culturelle and assimilation (biologie); racine (végétale), racine (odontologie), racine (linguistique), racine (informatique), racine (géologie) and racine (figuré et sociologique).
As of October 31 2017, the French Wiktionary contains 317 thesauri in French and a total of 452 thesauri in 54 languages!
23 new thesauri this month, five of which in French: punition [punishment], peine de mort [death penalty], prison [jail] (first thesaurus creation by Classiccardinal!), armure [armour] and tissage [weaving].
- There are 32,855 illustrations (images and videos) in the French Wiktionary entries and 258 have been added since last month.
Identifying a root
In natural language processing, several operations can be used to produce tools for a language. Richard Khoury and Francesca Sapsford set out to build a Latin stemmer from the English Wiktionary, which they describe in their article “Latin word stemming using Wiktionary” (in Digital Scholarship in the Humanities, volume 31, number 2, June 2016, pages 368–373). Their approach uses the database and the links between pages that are specified in very precise declension templates in order to link roots to endings for verbs and to suffixes for nouns. Starting from a database dump of May 2015, they applied three cleaning steps and obtained 655,434 word forms for 32,860 roots.
The best tool before their experiments, the Schinke Stemmer, works on a different principle and is based on a set of rules that automatically stem by creating hypothetical roots, not always producing valid words but nevertheless reducing their overall number in a text, making it easier to search it with a search engine.
By comparing both tools, they observe that the one based on Wiktionary misses words that it does not know, but nevertheless reduces the vocabulary of a text much more effectively. Additionally it allows you to access a dictionary of definitions directly afterwards, something not possible with the previous tool. They even plan to improve their use of the Wiktionary database to integrate the part of speech categories of entries in order to produce an additional tool for morphosyntactic labelling of a corpus.
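The contrast between the two principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny paradigm table, the suffix list and the function names are hypothetical, and the rule-based function is a crude suffix stripper, not the actual Schinke algorithm or the authors' implementation.

```python
# Toy declension data (hypothetical, not taken from the paper's dataset):
# each root is listed with some of the endings it can take.
PARADIGMS = {
    "ros": ["a", "ae", "am", "arum", "is", "as"],               # rosa
    "domin": ["us", "i", "o", "um", "e", "orum", "is", "os"],   # dominus
}

# Illustrative suffix list for the rule-based approach, longest first
# so that "arum" is tried before "um".
SUFFIXES = sorted(
    ["arum", "orum", "ibus", "ae", "am", "as", "is", "us", "um", "os",
     "a", "i", "o", "e"],
    key=len,
    reverse=True,
)


def build_form_index(paradigms):
    """Expand every root + ending pair into a form -> root lookup table.

    If two roots produced the same form, the later one would overwrite
    the earlier one; a real tool would have to handle that ambiguity.
    """
    index = {}
    for root, endings in paradigms.items():
        for ending in endings:
            index[root + ending] = root
    return index


def lookup_stem(word, index):
    """Dictionary-based stemming: return the known root, or None if unknown."""
    return index.get(word.lower())


def strip_stem(word, min_root=2):
    """Rule-based stemming: cut the longest matching suffix.

    The result is a hypothetical root and is not guaranteed to be a valid word.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)]
    return word


index = build_form_index(PARADIGMS)
for form in ["rosarum", "dominos", "puellae"]:
    print(form, lookup_stem(form, index), strip_stem(form))
```

The lookup table only answers for forms it has seen in its paradigms, which mirrors the limitation noted above, while the suffix stripper always returns something, whether or not it is a real root.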
These uses show that the Wiktionary projects contain data that are not only usable as a dictionary, but also allow, through their regular structures, reuse by machines to create new tools — a review by Noé.
About patrol and patrollers
Some remarks about the role of patrollers:
Patrollers are editors who spend some of their time reading contributions made on Wiktionary.
They have a tool which tells them about changes still in need of patrolling. Only anonymous contributions or edits made by users lacking the "auto-patrol" flag have to be checked.
After proofreading they can mark a contribution as patrolled.
Being patrolled means free of vandalism in a broad sense, which implies:
- deletion of clearly defamatory material
- deletion of material containing personal information
- deletion of information irrelevant to the page title
- deletion of copyrighted information
- restoring of correct information after deletion or corruption
These are the basic actions of the patroller. In this context, patrollers who are not administrators may need to ask an administrator to hide contributions containing defamation, personal information or copyright violations.
Then the patroller can, if they wish, go further and work on presentation, with additional actions such as:
- to correct a page to conform to the expected structure of a Wiktionary page
- to correct typography
- to correct spelling
- to correct or add templates
- to correct or add categories
- to check the sources and references that are used.
Last but not least, and by far the most interesting, the patroller can investigate the substance, ensuring the accuracy of a contribution, or even providing additional information or corrections.
It must be said that this part is by far the longest and also the hardest.
Thus, it is possible:
- to add missing inflections
- to add quotes
- to add pronunciations, anagrams, etc.
- to verify the accuracy of translations.
Concerning this last point, it is necessary to have a certain level of linguistic skill, extensive reference material on a large number of languages and knowledge of the grammar of several languages, which is not the case for everyone.
Translation errors are indeed numerous, although made in good faith, often because metonymy does not work the same way in all languages. This means that it is sometimes a serious mistake to copy a translation found elsewhere (dictionary, Wikipedia, etc.).
For example, many languages distinguish by different names the action from its result, the content from its container, the building from the institution, etc., where the French language does not necessarily do so. Thus, in Finnish: loading (action): kuormaus / loading (what is loaded): kuormitus; the town hall (the building): kaupungintalo / the town hall (administration): pormestarin
And of course, we find the same problem in the opposite direction Finnish/French.
It is, however, quite rare to encounter real mistranslations. I remember one, several years ago, on the English Wiktionary, which amused me. I was intrigued to find several pages on the net giving the word anaullaut in Inuktitut, although I knew that this word meant stick. After some research, I found the origin: a contributor had found anaullaut : bat in an Inuktitut/English dictionary and created this entry on the English Wiktionary with Category:Animals; it was then reused and translated into French by other websites.
Alas for him, it was the English word bat in its sense of batte (as in baseball), not the animal.
If you have also noticed some crazy or funny contributions, do not hesitate to report them here for a future issue. — a chronicle by Unsui
Dictionary of the month
What happens when Wiktionary becomes a reference against its own will? When discussing the sources of our project it becomes clear that they aren't at all structured like on Wikipedia. We don't share the same attitude towards original research and could even count as a source ourselves. Well, actually we're doing this already. And I can prove it with the little dictionary of the month. A pocket reference which gives an overview of “French vocabulary borrowed from Gaulish, Breton and the Celtic languages”. Yann Lukas shows us some familiar words and some with an unexpected Celtic origin. He suggests Celtic roots for some slang words where standard dictionaries are lost: à dache, loufer, morfal and many more.
But on page 62 we find a funny turn of phrase: Tamis: although disputed, the Gaulish etymology of tamis is tempting. In his Dictionnaire des étymologies obscures (Payot, 1982), Pierre Guiraud opts for a Latin origin stamen, also the root of étamine. Wiktionary prefers the Low-Franconian tamisa (source of Old Dutch teems). [...] So we are cited in a recent etymological analysis. And our hypothesis for tamis isn't very solid. Actually it was added by an IP without giving sources, and other users have added more on top. Still, it shouldn't be discarded entirely since an etymologist has attested a certain value.
Apart from this small appearance, which might bring us fame (or not), or at least acknowledgment, this short dictionary of Celtic words is filled with anecdotes about Celtic languages that give us a better understanding of their place in our world today. We are also led to wonder about a tortured Breton, saddled with words that don't suit it: menhir (the Bretons say peulvan), dolmen (they say lichaven), kermesse (from the Flemish kerkmisse) or even triskèle (from Greek and written as triskell to make it look more Celtic). — a chronicle by Lyokoï
This section gives you a monthly selection of videos related to linguistics or the French language. Don't hesitate to add more videos you find!
- Le Monde: the website of the newspaper Le Monde has published a 4-minute video on inclusive writing (écriture inclusive).
- Benoît Sagot: Extracting an Etymological Database from Wiktionary is a talk presented at the eLex conference (in Leiden, Netherlands) by the French lexicographer Benoît Sagot. He extracted a simple etymological tree from many entries of the English Wiktionary: EtymDB.
- Doct’Auvergne: In the show “le dicovergne” we are told about sérendipité.
- Linguisticae: First a video tossing around etymological hypotheses regarding fake Anglicisms and then another one about inclusive writing.
Curiosity: The phatic function
These are words and expressions like "you see" or "do you follow?" but also words used at the beginning of a telephone conversation such as allô?.
Marina Yaguello extends the analysis to all discourse whose only goal is to keep the conversation going without sharing anything at all.
Because a dictionary concentrates on the level of words and sentences, it struggles to describe these usages. On the one hand, the terms used vary a great deal, and finding written attestations isn't always straightforward.
On the other hand, it is difficult to explain the function of these terms well.
Sometimes they are entire sentences, including a verb emptied of its meaning, that serve a purely communicative purpose.
— a chronicle by Noé