Here you'll find some of the tools I developed that are available online.
Web-based tools for identifying difficult words in a text with a graded lexical database
As part of my Master’s and Ph.D. in NLP, I developed several web-based tools to automatically identify difficult words in a text with a graded lexical database.
My very first paper (Tack et al. 2016) introduced a simple, lexicon-based tool that automatically identified difficult words in a given text using a graded lexical database. This database had previously been extracted from a corpus of reading materials that experts had labeled on a scale of grade levels (e.g., K-12, CEFR, …). The tool required only a couple of parameters.
The tool used a two-step heuristic to identify difficult words in a text.
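To give a rough idea of what a lexicon-based lookup of this kind involves, here is a minimal Python sketch. It is not the tool's actual two-step heuristic, which is not reproduced here. The function name `find_difficult_words`, the `learner_level` parameter, the CEFR ordering in `LEVELS`, and the toy lexicon entries are all assumptions made for illustration.

```python
import re

# Assumed ordering of CEFR levels, easiest first (illustrative only).
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Toy graded lexicon with invented entries: word -> level assigned to it.
TOY_LEXICON = {
    "le": "A1",
    "chat": "A1",
    "dort": "A2",
    "paisiblement": "B2",
}


def find_difficult_words(text, lexicon, learner_level):
    """Return (word, level) pairs for words graded above learner_level.

    Words missing from the lexicon are reported with level None, so the
    caller can decide how to treat them.
    """
    threshold = LEVELS.index(learner_level)
    flagged = []
    # Naive tokenization on word characters; a real system would likely
    # tokenize, tag, and lemmatize the text first.
    for token in re.findall(r"\w+", text.lower()):
        level = lexicon.get(token)
        if level is None or LEVELS.index(level) > threshold:
            flagged.append((token, level))
    return flagged


result = find_difficult_words("Le chat dort paisiblement dehors", TOY_LEXICON, "A2")
print(result)
```

In this sketch, a word counts as difficult when its level in the lexicon is higher than the learner's level. Out-of-vocabulary words are returned separately marked with `None` rather than silently treated as easy or hard.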
Although I initially developed these web-based tools for French and Dutch (the languages I focused on in my studies), I designed the infrastructure so that it could be used for multiple languages. This multilingual infrastructure then made it easier to integrate my two tools for French and Dutch with the tools that were later developed for other languages in the CEFRLex project. All available tools can be found here: https://cental.uclouvain.be/cefrlex/analyse/.
I developed a first version of this tool in the summer of 2014. At the time, I was a research intern at the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI-CNRS, Université Paris-Sud, France), supervised by Anne-Laure Ligozat. During my internship, I worked on the automated analysis of lexical complexity for French elementary school children (with the MANULEX database) and for learners of French as a foreign language (with the FLELex database). I compared the tool’s output with manual lexical simplifications in Vikidia, a simplified Wikipedia for children. I published and presented the results at the International Conference on Language Resources and Evaluation (Tack et al. 2016). After my internship, I developed a web interface for the tool, as illustrated in the images below.
During my Ph.D. at UCLouvain and KU Leuven, I introduced a new graded lexical database for Dutch (Tack et al. 2017) and developed a new version of the tool. This new version was designed as an Angular Material Design UI connected to an object-oriented RESTful backend developed in the Django framework. Although the system was specifically developed for the first publication of NT2Lex (Tack et al. 2018), its database architecture was independent of the tabular database format. As a result, future resources developed for other languages could be easily added to the system.
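One way to picture a data model that does not depend on any single tabular format is sketched below in plain Python. This is only an illustration of the general idea, not the Django models used in the system. The `GradedEntry` class, the `load_entries` function, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedEntry:
    """Shared, language-neutral shape for one lexicon entry (hypothetical)."""
    language: str
    lemma: str
    pos: str
    level: str


def load_entries(tsv_text, language, columns):
    """Map one resource's tabular layout onto the shared GradedEntry shape.

    `columns` maps the shared field names to that resource's own
    (here invented) column headers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        GradedEntry(
            language=language,
            lemma=row[columns["lemma"]],
            pos=row[columns["pos"]],
            level=row[columns["level"]],
        )
        for row in reader
    ]


# Two invented resources with different column layouts.
french_tsv = "word\ttag\tfirst_level\nchat\tNOM\tA1\n"
dutch_tsv = "lemma\tpos\tlevel\nkat\tN\tA1\n"

entries = (
    load_entries(french_tsv, "fr", {"lemma": "word", "pos": "tag", "level": "first_level"})
    + load_entries(dutch_tsv, "nl", {"lemma": "lemma", "pos": "pos", "level": "level"})
)
print(entries)
```

With a mapping like this, adding a resource for another language only requires describing its columns. The code that consumes `GradedEntry` records stays unchanged.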