The Linux Rain Linux General/Gaming News, Reviews and Tutorials

How to build and edit LibreOffice dictionaries

By Bob Mesibov, published 13/04/2017 in Tutorials


When writing or editing scientific-type text in LibreOffice Writer, I rely a lot on its spellchecker. Unfortunately, the scientific terms I'd like to check aren't in the default dictionaries behind LibreOffice's spellchecking routine.

There are two ways out of this dilemma. The first is to deal with the terms one at a time. For example, if I type florgiedorfle in a Writer document, the spellchecker puts a wriggly red line under the word.

Pressing F7 brings up a dialog that allows me to add the word to a LibreOffice dictionary, and once I add the word, the spellchecker stops being alarmed.

Build it yourself

Adding words to a dictionary one at a time is slo-o-o-ow. A faster way is to create a custom dictionary using your own special list of words.

LibreOffice (I have version 4.3) does its spellchecking using several different dictionaries. The main "external" one is listed under Options/Language Settings/Writing Aids/Available modules and is called the Hunspell SpellChecker. Its dictionary files usually live in /usr/share/hunspell, so they're available system-wide. In my case they are the word-building materials for Australian and US English: en_AU.aff, en_AU.dic, en_US.aff, en_US.dic.

The more accessible dictionary is "internal" to LibreOffice for individual users and is called standard.dic. On my system it's found in the folder /home/bob/.config/libreoffice/4/user/wordbook. It's a plain text file and the start of it looks like this:

OOoUserDict1
lang: <none>
type: positive
---

Following those four lines are the words you've added one by one within LibreOffice Writer. These are sorted C-style, with capitalised words alphabetised first, then uncapitalised words.

LibreOffice allows you to add your own custom dictionaries to that wordbook folder. Just give each custom dictionary its own name, start it with the lines shown above, and sort it C-style.

For example, I had a huge list of correctly spelled scientific names in a plain-text file called names. I sorted them C-style on the command line and saved the sorted result as names.dic.
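A minimal sketch of that sort step, assuming GNU or POSIX sort and run in an empty scratch directory. The five words are invented stand-ins for a real names file. Setting LC_ALL=C for the one command is what gives the "C-style" byte-order sort, with capitalised words first.

```shell
# Hypothetical stand-in for the "names" file (invented words).
printf '%s\n' zebra Apple apple Zulu banana > names

# LC_ALL=C forces byte-order sorting: A-Z sorts before a-z.
LC_ALL=C sort names > names.dic

# Show the sorted result.
cat names.dic
```

If the word list might contain duplicates, sort's "-u" option would drop them in the same pass.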

I then added the four LibreOffice header lines to names.dic and moved that file to /home/bob/.config/libreoffice/4/user/wordbook. Those scientific names don't panic the spellchecker now, and I can add new names one by one to my list using the 'Add to Dictionary' feature, because LibreOffice now lists names.dic as a local dictionary.
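One way to add the header without opening an editor is sketched below. It assumes an already-sorted names.dic in the current scratch directory (recreated here with three invented words) and uses a hypothetical temporary file, names.tmp. The final move to the wordbook folder is left as a comment because that path is specific to my system.

```shell
# Hypothetical sorted word list standing in for names.dic.
printf '%s\n' Apple Zulu apple > names.dic

# Write the four header lines, then the words, to a temporary file.
{
  printf '%s\n' 'OOoUserDict1' 'lang: <none>' 'type: positive' '---'
  cat names.dic
} > names.tmp

# Replace the headerless file with the complete dictionary.
mv names.tmp names.dic

# The finished file would then be moved into the wordbook folder, e.g.
# mv names.dic /home/bob/.config/libreoffice/4/user/wordbook/
head -n 5 names.dic
```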

Editing a dictionary

It's easy enough to add a word to a LibreOffice dictionary, but how do you delete or modify a word?

LibreOffice has instructions for simple deletions. Go to Options/Language Settings/Writing Aids/User-defined dictionaries, choose the dictionary you want to edit and click the Edit button. In the editing dialog that pops up, choose the word you want to remove and click Delete.

Well, that's one word at a time, again, and you can only delete a word, not modify it. But since the custom ("user-defined") dictionaries are just plain text files, they're easily edited in any text editor or on the command line.

Modify a word
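A minimal sketch of a one-word fix with GNU sed, run in a scratch directory on a made-up copy of standard.dic. The misspelling florgiedorfel and its correction are invented for illustration. The ^ and $ anchors are my assumption about the safest pattern: they make sed match the whole entry rather than part of a longer word.

```shell
# Hypothetical stand-in for standard.dic with one misspelled entry.
printf '%s\n' 'OOoUserDict1' 'lang: <none>' 'type: positive' '---' \
  'florgiedorfel' 'zygote' > standard.dic

# Replace the misspelled entry in place (GNU sed syntax for -i).
sed -i 's/^florgiedorfel$/florgiedorfle/' standard.dic

# Show the edited dictionary.
cat standard.dic
```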

Note the use here of sed's in-place option, "-i". If you don't want to risk mucking up that dictionary (and you haven't backed it up somewhere), use the option "-i.old". This edits standard.dic but creates a backup copy of the original, standard.dic.old, in the same folder. The backup won't appear in Options/Language Settings/Writing Aids/User-defined dictionaries because of its strange ".dic.old" suffix, but it's there in /home/[username]/.config/libreoffice/4/user/wordbook if you need it.
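The backup variant can be sketched the same way, again on a made-up standard.dic in a scratch directory with an invented misspelling. Comparing the two files afterwards with diff is my own addition, as a quick check of exactly what sed changed.

```shell
# Hypothetical stand-in for standard.dic with one misspelled entry.
printf '%s\n' 'OOoUserDict1' 'lang: <none>' 'type: positive' '---' \
  'florgiedorfel' 'zygote' > standard.dic

# Edit in place, keeping the original as standard.dic.old.
sed -i.old 's/^florgiedorfel$/florgiedorfle/' standard.dic

# Compare backup and edited file. diff exits non-zero when the
# files differ, so "|| true" keeps a strict script from stopping.
diff standard.dic.old standard.dic || true
```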

Modify a number of words

The file example.dic contains five misspelled words. The file subs lists each misspelled word with its correctly spelled replacement, one pair per line, with a tab separating the two words.

There's more than one way to do multiple replacements on the command line, but regular Linux Rain readers will know that I'm a big fan of AWK. In the command shown below, AWK first builds an array ("a") from subs where the array index is the misspelled word and the value is the replacement. It then works through example.dic line by line, and if it finds a misspelled word from subs it replaces that word with the correctly spelled version from the array:

awk -F"\t" 'FNR==NR {a[$1]=$2; next} {if ($1 in a) {$1=a[$1]} print}' subs example.dic > temp
mv temp example.dic

The result is saved in a temp file which is renamed example.dic with the mv command.
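To try the command safely, here is a cut-down run in a scratch directory. The dictionary entries and the two misspellings are invented stand-ins for the real example.dic and subs files. Header lines pass through untouched because they never appear as an index in the array.

```shell
# Hypothetical example.dic: header plus two misspelled entries.
printf '%s\n' 'OOoUserDict1' 'lang: <none>' 'type: positive' '---' \
  'Tasmainia' 'millipeed' 'taxonomy' > example.dic

# Hypothetical subs: misspelling, a tab, then the correction.
printf '%s\t%s\n' 'Tasmainia' 'Tasmania' 'millipeed' 'millipede' > subs

# First file (FNR==NR): build the lookup array from subs.
# Second file: swap any line found in the array, then print it.
awk -F"\t" 'FNR==NR {a[$1]=$2; next} {if ($1 in a) {$1=a[$1]} print}' \
  subs example.dic > temp
mv temp example.dic

# Show the corrected dictionary.
cat example.dic
```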

Remove a number of words

The easiest way to bulk-delete is to first make up a list of the words to be deleted and save the list as a file, then use grep with its "-f" (read the search patterns from a file) and "-v" (invert the match) options.
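A sketch of that bulk deletion, run in a scratch directory with an invented dictionary and a hypothetical list file called deletions. The "-F" (fixed strings) and "-x" (whole-line match) options are my additions to the two options named above. Without "-x", deleting apple would also delete applet.

```shell
# Hypothetical example.dic: header plus four entries.
printf '%s\n' 'OOoUserDict1' 'lang: <none>' 'type: positive' '---' \
  'Apple' 'apple' 'applet' 'banana' > example.dic

# Hypothetical list of words to delete, one per line.
printf '%s\n' 'apple' 'banana' > deletions

# Keep only the lines that do NOT exactly match a listed word.
grep -vxFf deletions example.dic > temp
mv temp example.dic

# Show what is left.
cat example.dic
```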

Please note...

The LibreOffice set-up shown here is the one I have on my Debian system: version 4.3 and custom dictionaries in /home/bob/.config/libreoffice/4/user/wordbook. With other LibreOffice versions and other systems the custom dictionary location may be different.


About the Author

Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.
