Paris 2011

Session 4 - Terminology and Lexicology / Terminologie et Lexicologie

Gertrūda Naktinienė, Linas Valiukas and Jolanta E. Zabarskaitė

Contemporary dictionary application: tips and tricks of getting it right


Full text/Texte intégral

1Even in the age of automatic machine translation, the monolingual and the bilingual dictionary application remain among the main tools of the day-to-day human translator. The dictionary application is also a useful tool for the amateur translator who just wants to find out how to say “pancakes and coffee, please” in French. Additionally, monolingual and bilingual dictionary applications for various languages are in considerable demand in the general user market.

2However, the technological environment for dictionary applications (and desktop applications in general) has changed considerably in recent years. Not long ago one could get away with a Windows-only dictionary application sold on a CD; nowadays linguists, software developers and managers have to consider the diverse range of devices in their users’ hands. As there is no longer a single winner in the operating system or mobile device market, publishers of dictionary applications must “hit” them all: Windows, Mac OS X and Linux-based computers, iPhone mobile phones, iPod touch multimedia players, iPad tablets, Android mobile phones, Android tablets, et cetera.

3We present the most pragmatic solution to the problem described above that we were able to come up with. Namely, we describe in detail:

  • Which software development approaches are best for creating a multi-platform dictionary application?

  • Why are you probably better off choosing free and open-source software components instead of proprietary ones?

  • How do you deliver software to your users nowadays in the manner that is best for both the users and you, as the publisher of the software?

  • What is an iterative approach and how do you apply it to dictionary applications? In other words, why is it counterproductive and even harmful to try to “get it right the first time”?

  • You release the first version of your dictionary application. What next? Should you call it a “job done” already?

4To make it easier for the listeners (and for us too!), we tell the story via the example of the “Dictionary of the Lithuanian Language” (Lithuanian: Lietuvių kalbos žodynas, or LKŽ), an application that we released just months ago. The “Dictionary of the Lithuanian Language” consists of ~339,000 indexed words and ~237,000 articles. We believe that this dictionary is comprehensive for academic purposes, medium for a wider public, and concise for schools. The “LKŽ” dictionary is a work in progress at the Institute of the Lithuanian Language.

5The intended audience for this white paper is strictly non-technical staff at organizations that are planning to release a multilingual or bilingual dictionary application. The primary objective of the paper is to outline the main points of modern dictionary application development from the managerial perspective.

Software development approaches

Iterative, nonlinear development

6The classical “waterfall” software development model (in which the planning, development, testing and deployment stages are executed in sequential order) is unfit for the creation of dictionary applications, just as it is no longer suitable for most real-world, user-oriented software products.

7Instead of the sequential “waterfall” model, we argue that one should choose the so-called iterative (incremental) model, in which there are no fixed “planning” / “development” / “testing” stages and it is possible to manage and implement changes at almost any point in the development lifecycle.

8We would like to emphasize the following three ideas derived from the iterative (incremental) model:

  • 1. Do not spend time making a final (fixed) list of features at the beginning of the software project. The list of features can (and will) change as the technical environment changes during the project and as new ideas of what could be done appear.

  • 2. The developer (or team of developers) implementing your software project should keep in mind that the project requirements might (and most probably will) change over time, and should plan and do their work accordingly.

  • 3. You, as the project manager, should be able to get your hands on the very early builds of the software product (as opposed to seeing a working prototype only after the development phase of the project is finished). Later, you should be able to obtain and evaluate weekly (or monthly) intermediate versions of the product. You should never let the software developers “hide” the product from you with the excuse that “it is not finished yet”.


9The consumer computer market is no longer limited to “Windows”, the operating system created and distributed by “Microsoft”. In recent years, “Mac OS X” (created and distributed by “Apple”) and “Linux” (created and distributed by a community of volunteer contributors) have taken up ~7 % and ~1 % of the overall market share respectively (~8 % together)1.

10Eight percent might not seem like much, but it comprises a significant portion of users with specific spending abilities, skills and needs. For example, “Mac” computer users (~7 % of the overall desktop PC market) generally have more money to spend2 and are more willing to spend it on software or on other online purchases3.

11“Linux” users (~1 % of the overall desktop PC market) make up an even smaller share of desktop computer users, but they are an important part of that market. For example, some non-profit organizations and governments (ranging from the French Parliament to the White House) have decided to move their desktop computer infrastructures to “Linux” in order to save money and time.

12Thus, we argue that an individual or an organization who is planning to create a new dictionary application should not be limited to the “Windows” OS only, and instead should go about creating “Mac OS X” and “Linux” desktop applications too. This particular decision to “cover all bases” is not costly (in terms of both money and time), and such a desktop application can be created using readily available, freely accessible tools.


13Time-boxing is a popular and well-known principle in software development management, but we feel that it is important to mention it in this paper, as it is easy to forget the idea of having strict and explicit deadlines in a project with a strong potential for excessive scrupulousness.

14Various popular commercial software development paradigms, such as “Scrum”4, “Agile”5 or “extreme programming”6, advocate the idea that the time in which a software project is completed is as important as, say, the set of features that the project contains. Thus, if the project is about to miss the deadline, the deadline is not extended; instead, the scope of the project is reduced (some of the second-class features and requirements are postponed until the next version of the software).

15Time-boxing not only allows the software project to be completed on time, but also helps to curb the often unneeded perfectionist tendencies of those involved in the project. That way the users of the software, the single most important part of the project, are able to make use of it sooner. Also, when deadlines are met, the users are better able to influence the project itself.
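The scope-reduction rule behind time-boxing can be illustrated with a minimal sketch in Python. It is only an illustration under stated assumptions: the feature names, priorities and day estimates are hypothetical, and real planning tools are more elaborate. The deadline (the time box) stays fixed; features are taken in order of importance, and whatever does not fit is postponed.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    priority: int       # 1 = must-have; larger numbers are less important
    estimate_days: int  # rough effort estimate (hypothetical values below)


def plan_release(features, time_box_days):
    """Fill a fixed time box with the most important features first.

    Returns (included, postponed). The time box is never extended;
    a feature that does not fit is postponed to the next version.
    """
    included, postponed = [], []
    remaining = time_box_days
    for feature in sorted(features, key=lambda f: f.priority):
        if feature.estimate_days <= remaining:
            included.append(feature)
            remaining -= feature.estimate_days
        else:
            postponed.append(feature)
    return included, postponed


# Hypothetical backlog for a dictionary application.
backlog = [
    Feature("headword search", 1, 10),
    Feature("full-text search", 2, 15),
    Feature("custom colour themes", 3, 10),
    Feature("entry bookmarks", 2, 5),
]

included, postponed = plan_release(backlog, time_box_days=30)
print("in this release:", [f.name for f in included])
print("postponed:", [f.name for f in postponed])
```

With a 30-day time box the three higher-priority features fit and the colour themes are postponed; shortening the time box changes the scope, never the deadline.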

Revision control and issue tracking

16In organizations whose primary concern is not software development (e.g. language institutes such as ours) it is easy to miss such technological advancements as distributed revision control systems and issue trackers. However, those technologies are highly useful in various stages of dictionary application development, and we would strongly encourage you to make the effort to learn and use them, as the benefits they provide outweigh the (sometimes steep) learning curve.

17(Distributed) revision control is a technology primarily used in computer programming to keep track of the code changes made by several developers so that each and every one of them can then 1) have the same (the latest) copy of the code, and 2) see what has been changed by other developers (and why). In a nutshell, there is a single computer that hosts the latest version of the product’s code and keeps track of all the changes that have been made to that code. Revision control is easily adaptable to other less technical tasks such as manual dictionary database editing, collaborative preparation of various types of documents (e.g. the “Help” section of the desktop application), and others.
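To give a feeling for what “seeing what has been changed” means in practice, here is a small sketch in Python that compares two revisions of a text using the standard `difflib` module. It is not how Git or Mercurial are implemented; it only mimics the kind of report such systems show. The “Help” text and the `help.txt` label are made up for the example.

```python
import difflib


def describe_change(old_text, new_text, label="document"):
    """Return the lines of a unified diff between two revisions of a text."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{label} (previous revision)",
        tofile=f"{label} (latest revision)",
        lineterm="",
    )
    return list(diff)


# Two hypothetical revisions of a "Help" section.
previous = "Type a word in the search box.\nPress Enter."
latest = "Type a word in the search box.\nPress Enter to search."

for line in describe_change(previous, latest, label="help.txt"):
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added; a revision control system stores such changes together with the author, the date and a short explanation of why the change was made.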

18An issue tracker is another technology that is actively used in modern software development. Somewhat similar to revision control, an issue tracker is a centralized list of all “issues” (bugs, enhancements, proposals or tasks of the software project). Keeping (or having your team of software developers keep) an up-to-date, descriptive but terse list of issues will allow you to easily find out the current status of the project.

19Both revision control and an issue tracker are essentials of a modern software project of any size that is heading for success, and we believe that the use of these technologies should be enforced in the software development tasks in your organization.

20Examples of reliable, modern, open-source distributed revision control systems include Git7, Mercurial8 and Bazaar9. Popular issue tracking tools include Bugzilla10 and Redmine11. There are many others.
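As a rough illustration of why such a list answers the question “where does the project stand?”, the sketch below models an issue list in Python and counts the open issues of each kind. The issue titles and the `kind` and `status` values are hypothetical; real trackers store many more fields.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    number: int
    title: str
    kind: str    # "bug", "enhancement", "proposal" or "task"
    status: str  # "open" or "closed"


def status_report(issues):
    """Count the open issues of each kind."""
    return Counter(issue.kind for issue in issues if issue.status == "open")


# A hypothetical issue list for a dictionary application.
issues = [
    Issue(1, "Search ignores accented letters", "bug", "open"),
    Issue(2, "Add a list of recent lookups", "enhancement", "open"),
    Issue(3, "Installer fails on a path with spaces", "bug", "closed"),
    Issue(4, "Write the Help section", "task", "open"),
]

report = status_report(issues)
for kind, count in sorted(report.items()):
    print(f"{kind}: {count} open")
```

A manager who reads such a summary every week can see whether open bugs are piling up without having to read the code.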

Why use open-source components?

21For our needs, we chose the “Qt Framework”12 among several candidate multi-platform software development frameworks. There are other cross-platform application frameworks that could be used for creating multi-platform desktop applications, e.g. “FLTK”13, “wxWidgets”14, “Adobe Flash” (“Adobe AIR”)15 or “Java Swing”16. However, “Qt” best suited the needs, requirements and future projections of our desktop dictionary application. The reasoning is based on (but not limited to) the following arguments:

  • “Qt” is an open-source library (as opposed to “Adobe Flash”). Thus, one can always find out what happens in the library and why; also, in case “Qt” lacks commercial-level support in the future, the development of the library can be taken over by a group of volunteers or some other commercial organization.

  • “Qt” is a mature library. The history of “Qt” development dates back to 1992; in this time, the project has probably learned the various “dos” and “don’ts” of the desktop application industry; also, the feature set that the developers were able to implement in 20+ years of active development is very rich and stable.

  • At the time of writing, “Qt” is a commercially backed library. Alongside volunteer open-source developers around the world, the library is being created, supported, maintained and distributed by “Digia Plc”, a Finnish company which in 2011 acquired the commercial licensing business for “Qt” from Nokia.

22The main argument for choosing “Qt” was the fact that the library is open-source. The rest of the section explains our original motivation for choosing an open-source library over a proprietary one for developing a dictionary application:

Cheaper (in terms of both money and labour)

23A lot of software developers are accustomed to the popular software development kits such as “Qt” (or other, similar ones) so it is easier and faster for them to reuse their previously acquired skills for the uses of the dictionary application.

24Also, while commercial support options exist for “Qt” (sold and supported by “Digia”), the framework and the accompanying documentation are free of charge.

Weaker dependencies on untrusted third parties

25As we mentioned before, “Qt” is a commercially-backed open-source library, and that means that the library itself is being created by both the worldwide network of volunteers and the “Digia” company in Finland.

26For the dictionary application developer (or any desktop application developer, for that matter) this means that there are fewer points of possible failure. A proprietary (not open-source) vendor may go out of business at any time, introduce new charges or increase the existing ones for using the library, fix bugs and add features at an unsatisfactory pace, or single-handedly introduce other roadblocks for the project.

27A popular, open-source software development framework such as “Qt” does not suffer from those issues: the dependency of your project (the framework itself) is being used and maintained by many other developers with their own needs (which sometimes even match your specific requirements).


28Other, newer platforms may be introduced and become popular in the near future; for example, nowadays the industry does its best to push ARM-based portable computers with various operating systems (“Android”, “iOS”, “Windows 8”, to name a few). Having your dictionary application (or any other desktop application for that matter) based on a software development kit which is intended to be multi-platform, you have a better chance to make your product compatible with whatever is an upcoming trend in the consumer computing market.

Software delivery

29We believe that delivering software on Compact Discs (CDs) is highly outdated. The CD has not proved reliable in household environments, as it is prone to scratches. Also, newer laptops often do not even have a CD-ROM drive that could read the CD with the dictionary application.

30We chose the following two methods to deliver our dictionary application to the users:

  • 1. USB flash drives. While a USB flash drive adds some cost to the final price of the product (when compared to a CD), it lasts longer, and almost all desktop PCs (both new and old ones) have a USB port. Delivering the software product on a USB flash drive is useful for those clients who like to have a “material thing” (a physical installation medium) with their purchase, which software delivered over the internet does not provide.

  • 2. Delivery over the internet. Nowadays, not all users can (or want to) go and buy a physical installation medium with the dictionary application. In the case of our dictionary “LKŽ”, the product is in high demand among the Lithuanian diaspora in various countries (US, UK, Australia, to name a few), and it would be both easier and cheaper if dictionary users abroad could avoid ordering physical media and instead buy and download the dictionary directly, in a matter of minutes. Thus, we argue that a modern dictionary application must have a strategy for how it can be purchased and delivered to the end user over the internet.


31Releasing the first version of your dictionary application does not mean that the work of your organization is finished. Make sure to plan the support of your application beforehand. We would like to emphasize the following points:

  • Have a strategy for fixing bugs and adding new features. Software bugs (imprecisions in the implementation of the software project) are a natural part of the software lifecycle, and thus you should be able to fix them in a timely manner.

  • Make user-generated feedback welcome and appreciated. Software products are generally created with users in mind, and because of that the users should be able to get indirectly involved in your software project by reporting bugs and asking for additional functionality. Make it easy to report imperfections with your software online (e.g. on your website).

  • Have a strict (time-boxed) release plan. It might be useful to continue applying the time-boxing principle even after the first version of your software product is released. For example, set a fixed deadline for the next version of your software; try to fix as many bugs as possible before the release of that version, and do your best not to miss the deadline. “Rinse and repeat” for further versions of the software.

Background information

Centre of Lexicography

32The Centre of Lexicography (henceforth LC) of the Institute of the Lithuanian Language specializes in theoretical aspects of lexicology and lexicography, the lexicon and semantics of Lithuanian, monolingual and multilingual lexicography, electronic lexicography, and bilingual dictionaries of lesser-used languages. Its major areas of activity are the preparation of the Dictionary of the Lithuanian Language (in 20 volumes) and of its computerized version, and the accumulation of a computerized database of the Lithuanian language.

33LC is a member of EURALEX, and one of the authors of this article, Assoc. Prof. Dr Jolanta Zabarskaitė, is a member of the international EFNILEX group. The objective of the EFNILEX project is the development of a modern, cost-effective method for the production of bi- and multilingual dictionaries, making as much use as possible of modern language technology (for further details see

34The LC has started writing an academic Belarusian-Lithuanian dictionary. In the field of historical lexicography the Centre collaborates with Charles University in Prague. The Institute of the Lithuanian Language is a member of META-NET, a Network of Excellence forging the Multilingual Europe Technology Alliance (Zabarskaitė 2009).

Dictionary of the Lithuanian language (LKŽ)

35The final, 20th volume of the Dictionary of the Lithuanian language (Lithuanian: Lietuvių kalbos žodynas, henceforth LKŽ) was released in 2002, completing the biggest work in Lithuanian linguistics of the 20th century, on which several generations of linguists had worked. LKŽ was compiled using a paper card index of 4.5 million words that dates back to 1902. The initiator was Kazimieras Būga, professor at the universities of Saint Petersburg, Perm and Tomsk (Russia), and afterwards Kaunas (Lithuania).

36The history of LKŽ coincides with the complicated history of the State of Lithuania. The writing of LKŽ started in 1930. Volume I of the dictionary was released in 1941 and Volume II in 1947. The second volume came out at the beginning of the second Soviet occupation and was subjected to Soviet censorship. The Soviet authorities demanded that dictionary entries be illustrated first with examples from Soviet literature, the works of Lenin and documents of the Communist Party. Only after these could words be illustrated by sentences from living dialects, old writings and folklore. With the coming of ‘Perestroika’, the ideological illustrative examples were discarded. The dictionary was completed after Lithuania regained its independence, and its completion was celebrated all over Lithuania. Its significance to research in Baltic linguistics, Indo-European linguistics, and Baltic mythology and culture is acknowledged throughout the world (for further details see Schmalstieg (1996); Kažukauskaitė (2002); Toporov (2004)).

37It is the biggest work in Lithuanian linguistics. It is a mixed-type dictionary that includes both the lexis of written texts and that of the living language (dialects). The lexis of written texts covers the period from 1547 to 2001, i.e. from the release of the first Lithuanian book to the completion of the dictionary. The lexis from living dialects includes words recorded from 1902 up to 2001. The dialect lexis is transposed into the standard language according to phonetic laws.

38Researchers and doctoral students all over the world use the academic edition of the Dictionary of the Lithuanian language on the website. The Dictionary serves as a source of national identity and of linguistic investigation into the development of written and spoken Lithuanian, and addresses the world community working in the humanities and society at large. The scope of the Dictionary is ca. 22 000 pages, 0.5 million lexicographic entries and 11 million words. It gives access to archaic, dialectal and contemporary layers of the lexicon of Lithuanian. Illustrations of word meanings have been taken from various sources covering the period from 1547 to 2001: research and religious texts, fiction, folklore and dialects recorded in the last century, starting from 1902. The Dictionary is based on a card index consisting of 4.5 million items collected from almost 1 000 lexicographic sources (both handwritten and printed); dialectal words have been recorded in more than 500 Lithuanian settlements (Šimėnaitė 2007, 2010).

39It presents the origin, history and distribution of each word together with its accentuation, grammatical forms and categories, and its peculiarities with respect to word formation, semantic structure, stylistic usage, etc. Lexicographic entries are illustrated with sentences quoted from religious, scientific, political, fictional and journalistic literature, as well as with dialect material and examples from folklore. Many proverbs, riddles, figurative phrases, sayings and examples of phraseology are presented. LKŽ reflects the Lithuanian worldview (Zabarskaitė 2010).

Issues of adapting LKŽ text to digital environment

40The first electronic version of LKŽ, with a headword search engine, was released in 2005. However, it was decided to present LKŽ to the public not only as a source of language history but also as an active instrument for learning about language and linguistics. In the electronic version, language facts from the dialects and old writings were amended based on the results of the latest linguistic research. Entries of mistakenly transcribed words (the so-called ‘absentee words’) were deleted and their examples were moved to the entries of the existing words (for example, the entry of the ‘absentee word’ džirbti “dial. to work” was deleted and its illustrative sentences from the Southern dialect (dzūkai) were moved to the entry of the word dirbti “standard l. to work”). Semantic differences, inconsistencies in the use of the main forms, and accentuation errors were corrected (Zabarskaitė, Naktinienė, Šepetytė-Petrokienė (2006); Zabarskaitė, Naktinienė (2010)).
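The clean-up of ‘absentee words’ described above can be pictured with a short Python sketch. It assumes, purely for illustration, that the dictionary is held as a mapping from headword to a list of example sentences; the actual LKŽ database structure is not described in this paper, and the example sentences below are placeholders.

```python
def merge_absentee_entry(entries, absentee, target):
    """Delete the entry `absentee` and move its examples to `target`.

    `entries` maps a headword to its list of illustrative sentences.
    """
    if absentee == target:
        raise ValueError("the absentee word and the target must differ")
    if absentee not in entries or target not in entries:
        raise KeyError("both headwords must exist in the dictionary")
    moved = entries.pop(absentee)  # the mistaken entry disappears
    entries[target].extend(moved)  # its examples are kept under the real word
    return entries


# Placeholder data: the sentences are not real dictionary citations.
entries = {
    "dirbti": ["standard-language example"],
    "džirbti": ["dialect example 1", "dialect example 2"],
}

merge_absentee_entry(entries, absentee="džirbti", target="dirbti")
print(sorted(entries))
print(entries["dirbti"])
```

The point of the sketch is that no illustrative material is lost: only the mistaken headword is removed.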

41For the convenience of the users there is a website of the Dictionary at (administrator: Gertrūda Naktinienė). The online Dictionary is constantly upgraded, taking into consideration, to the extent possible, the many requests from users (for further information on communication between lexicographers and users, refer to Zabarskaitė, Naktinienė (2007)). Responses from online users show relatively high public interest in this freely accessible scholarly dictionary. According to the statistics, the Dictionary is accessed approximately 1200 times a day, and about 10,000 entries a day are viewed.

42There are also plans to create tools enabling database search by abbreviations of locations and sources, by the last names of authors, by accentuation, by grammatical, stylistic and usage fields, and by other references. This database also has to serve as a tool for preparing amended and supplemented versions of the Dictionary. The first task is to add the data of the card index of the Supplements to the new version of the Dictionary. Implementation of this task will start once the software, the so-called ‘LKŽ lexicographer’s workplace’, is created.

43It is also planned to combine the text of LKŽ with its Main card index. Dictionary users would then be able to access data that is available only on the cards of the card index: the phonetic transcription of dialect sentences, the date of the record, the names of the informant and the recorder, the handwriting, the names of small villages, etc. The formation of the LKŽ database will facilitate dictionary compilation, lexicographical research, machine translation, etc.; it will also contribute to the preservation of the Lithuanian linguistic heritage under the conditions of global integration (Zabarskaitė, Naktinienė, Šepetytė (2005)). A thesaurus of English keywords will be created for the convenience of foreign scholars. The other great responsibility of the Institute of the Lithuanian Language is the placement of the Lithuanian databases on the Internet and their dissemination in the academic sphere of the European Union.
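What “search by fields” could look like is sketched below in Python. This is not the planned ‘LKŽ lexicographer’s workplace’; the record layout, the field names (`location`, `source`) and the abbreviations are assumptions made up for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Citation:
    headword: str
    location: str  # abbreviation of the place where the sentence was recorded
    source: str    # abbreviation of the lexicographic source
    text: str


def search(citations, **criteria):
    """Return the citations whose fields equal all of the given criteria."""
    known = {f.name for f in fields(Citation)}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    return [
        c for c in citations
        if all(getattr(c, name) == value for name, value in criteria.items())
    ]


# Placeholder records; the abbreviations and sentences are invented.
citations = [
    Citation("dirbti", "LOC-A", "SRC-1", "placeholder sentence 1"),
    Citation("dirbti", "LOC-B", "SRC-1", "placeholder sentence 2"),
    Citation("dirbti", "LOC-A", "SRC-2", "placeholder sentence 3"),
]

hits = search(citations, location="LOC-A", source="SRC-1")
print([c.text for c in hits])
```

Each additional field in the record (accentuation, grammatical or stylistic labels, author) would become another possible search criterion in the same way.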


Kažukauskaitė, Ona (2002). « Le grand dictionnaire d’une petite nation, une histoire de cent ans » In Cahiers lituaniens. Strasbourg, Nr. 3, pp. 29–33.

Lietuvių kalbos žodynas (t. I–XX, 1941–2002): elektroninis variantas / G. Naktinienė (vyr. redaktorė), J. Paulauskas, R. Petrokienė, V. Vitkauskas, J. Zabarskaitė. Programuotojai: E. Ožeraitis, V. Zinkevičius. – Vilnius: Lietuvių kalbos institutas, 2005. – Atnaujinta versija 2008. – [The Dictionary of the Lithuanian Language (Vol. 1-20, 1941–2002): electronic release [online], 2005. – Renewed version, 2008.]

Naktinienė, Gertrūda and Zabarskaitė, Jolanta (2004). « On the Linguistic Databases of the Institute of the Lithuanian Language » In The First Baltic Conference “Human Language Technologies: the Baltic Perspective”, Riga, Latvia, April 21-22, pp. 187–190.

Schmalstieg, William R. (1996). “Some Comments on new Volumes of the Lithuanian Academy Dictionary” In Lituanus, vol. 42, Nr. 1, pp. 18–23.

Šimėnaitė, Zita (2007) “Лексікаграфічныя крыніцы вялікага “Слоўніка літоўскай мовы”” In Слово и словарь: Vocabulum et vocabularium. Сборник научных трудов по лексикографии / ответственные редакторы: Л. В. Рычкова, В. Л. Воронович. Гродно, 23–24.

Šimėnaitė, Zita (2010) « Związki leksykografii polskiej i litewskiej » In Językowe i kulturowe dziedzictwo Wielkiego Księstwa Litewskiego. Bydgoszcz: Wydawnictwo Uniwersytetu Kazimierza Wielkiego, pp. 158–166.

Toporov, Vladimir N. (2004), « K vychodu v svet bol´šogo „Slovarja litovskogo jazyka“ » in Balto-slavjanskie issledovanija XVI, pp. 408–415. [Lithuanian translation: « Didžiajam „Lietuvių kalbos žodynui“ išėjus » in Mokslo Lietuva, pp. 16, 2 and 7.]

Zabarskaitė, Jolanta; Naktinienė, Gertrūda; Šepetytė, Ritutė (2005) “ Sovremennost’ i perspektyvy „Slovaria litovskogo jazyka" ” In Istoričeskij put’ litovskoj pis’mennosti. Sbornik materialov konferenciji. Vilnius: Institut litovskogo jazyka, pp. 340–355.

Zabarskaitė, Jolanta; Naktinienė, Gertrūda; Šepetytė-Petrokienė, Ritutė (2006) “ Elektroninio Lietuvių kalbos žodyno (t. I-XX) pirmasis leidimas ” In Prace Bałtystyczne 3: język, literatura, kultura. Warszawa, pp. 241-247.

Zabarskaitė, Jolanta (2007) “ Prameny a databáze Institutu pro litevský jazyk ” In Europeica – Slavica – Baltica. Publikace slovanské knihovny, 56. Praha: Národní knihovna České republiky, Slovanská knihovna, pp. 261-272.

Zabarskaitė, Jolanta; Naktinienė, Gertrūda (2007). “ Slouniki u Internece: dyjalog movoznajcy i nos’bita movy ” In Slovo i slovar’. Vocabulum et vocabularium. Sbornik naučnych trudov po leksikografii. Grodno, pp. 32–34.

Zabarskaitė, Jolanta (2009) “ The Institute of the Lithuanian Language and Centres of Baltic Studies ” In The Baltic languages and the Nordic countries: International conference, University of Oslo, June 19-20, pp. 135-147.

Zabarskaitė, Jolanta (2010) “ Politinis pasaulėvaizdis didžiajame Lietuvių kalbos žodyne (valstybė, įstatymas, valdžia) ” In Parlamento studijos, Nr. 9, pp. 126-143.

Zabarskaitė, Jolanta; Naktinienė, Gertrūda (2010) “ The dictionary of Lithuanian (LKŽ) and its Future in Databases and Electronic Version ” In Proceedings of XVI Euralex International Congress. Fryske Akademy – Afȗk, Ljouwert, pp. 780-787.

Zinkevičius, Vytautas (2004). “Creating the Electronic Version of the Dictionary of Lithuanian” In The First Baltic Conference. Human Language Technologies. The Baltic Perspective. Riga, Latvia, April 21-22, pp. 170-173.

Zinkevičius, Vytautas (2007) “The Digitization of the Dictionary of the Lithuanian Language” In The Third Baltic Conference on Human Language Technologies. Kaunas, October 4-5, pp. 349–355.





4 Coplien, James (2010). Lean Architecture for Agile Software Development. Chichester Hoboken, N.J : Wiley. p. 25. ISBN 978-0-470-68420-7.

5 Leffingwell, Dean (2011). Agile Software Requirements : Lean requirements practices for teams, programs, and the enterprise. Upper Saddle River, NJ : Addison-Wesley. pp. 17–19. ISBN 978-0-321-63584-6.

6 Beck, Kent (2000). Extreme programming eXplained : embrace change. Reading, MA : Addison-Wesley. pp. 85–96. ISBN 0-201-61641-6.











To cite this document/Pour citer ce document

Gertrūda Naktinienė, Linas Valiukas and Jolanta E. Zabarskaitė, «Contemporary dictionary application: tips and tricks of getting it right», Tralogy [Online], Tralogy II, Session 4 - Terminology and Lexicology / Terminologie et Lexicologie, updated on: 02/06/2014, URL:

About: Gertrūda Naktinienė

Institute of the Lithuanian Language, Vileišio str. 5, Vilnius, LT-10308, Lithuania

About: Linas Valiukas

Institute of the Lithuanian Language, Vileišio str. 5, Vilnius, LT-10308, Lithuania

About: Jolanta E. Zabarskaitė

Institute of the Lithuanian Language, Vileišio str. 5, Vilnius, LT-10308, Lithuania