Qumin: Quantitative modelling of inflection¶
Qumin (QUantitative Modelling of INflection) is a collection of scripts for the computational modelling of the inflectional morphology of languages. It was developed by me (Sacha Beniamine) for my PhD, which was supervised by Olivier Bonami.
The documentation has moved to ReadTheDocs at: https://qumin.readthedocs.io/
For more detail, you can refer to my dissertation (in French).
Quick Start¶
Install¶
First, open a terminal and navigate to the folder where you want the Qumin code. Clone the repository from GitHub:
git clone https://github.com/XachaB/Qumin.git
Make sure all the Python dependencies are installed. They are listed in environment.yml. A simple solution is to use conda and create a new environment from the environment.yml file:
conda env create -f environment.yml
There is now a new conda environment named Qumin. It needs to be activated before using any Qumin script:
conda activate Qumin
Data¶
The scripts expect full paradigm data in phonemic transcription, as well as a feature key for the transcription.
To provide a data sample in the correct format, Qumin includes a subset of the French flexique lexicon, distributed under a Creative Commons Attribution-NonCommercial-ShareAlike license.
For Russian nouns, see the Inflected lexicon of Russian Nouns in IPA notation.
Scripts¶
Patterns¶
Alternation patterns serve as a basis for all the other scripts. The algorithm to find the patterns was presented in: Sacha Beniamine. “Un algorithme universel pour l’abstraction automatique d’alternances morphophonologiques.” 24e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Jun 2017, Orléans, France. 2 (2017).
Computing automatically aligned patterns for paradigm entropy or macroclass:
bin/$ python3 find_patterns.py <paradigm.csv> <segments.csv>
Computing automatically aligned patterns for lattices:
bin/$ python3 find_patterns.py -d -o <paradigm.csv> <segments.csv>
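To give a rough idea of what an alternation pattern between two paradigm cells captures, here is a minimal sketch. It is not the algorithm from the paper cited above and not what find_patterns.py does: it assumes a naive strategy that treats the longest shared prefix as constant material and whatever follows as the alternation. The function name `naive_alternation` and the toy forms are hypothetical.

```python
def naive_alternation(form_a, form_b):
    """Split two forms into (shared prefix, (remainder_a, remainder_b)).

    Forms are tuples of segments, so that a multi-character phoneme
    counts as a single unit. Assumption: only the prefix is shared,
    which is far cruder than the automatic alignment used by Qumin.
    """
    shared = 0
    for seg_a, seg_b in zip(form_a, form_b):
        if seg_a != seg_b:
            break
        shared += 1
    return form_a[:shared], (form_a[shared:], form_b[shared:])


# Toy example: two forms of one lexeme, already split into segments.
cell_1 = ("p", "a", "ʁ", "l")
cell_2 = ("p", "a", "ʁ", "l", "ɔ̃")

context, (out_1, out_2) = naive_alternation(cell_1, cell_2)
print("constant material:", "".join(context))
print("alternation:", repr("".join(out_1)), "vs", repr("".join(out_2)))
```

Two lexemes whose forms yield the same alternation for a given pair of cells behave alike for that pair, which is the kind of information the other scripts build on.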
Microclasses¶
To visualize the microclasses and their similarities, you can use the new script microclass_heatmap.py.
Computing a microclass heatmap:
bin/$ python3 microclass_heatmap.py <paradigm.csv> <output_path>
Computing a microclass heatmap, comparing with class labels:
bin/$ python3 microclass_heatmap.py -l <labels.csv> -- <paradigm.csv> <output_path>
The labels file is a csv file. The first column gives lexeme names, and the second column provides inflection class labels. This makes it possible to visually compare a manual classification with pattern-based similarity. This script relies heavily on seaborn’s clustermap function.
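A labels file with the two columns described above can be produced with Python's standard csv module. The sketch below is only illustrative: the function name `write_labels`, the lexemes, and the class labels are hypothetical, and it assumes the file has no header row, which is not stated here and should be checked against the documentation.

```python
import csv


def write_labels(path, classes):
    """Write a two-column csv: lexeme name, then inflection class label.

    `classes` maps each lexeme name to its label. Assumption: no header
    row is written; add one if the script turns out to expect it.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for lexeme, label in sorted(classes.items()):
            writer.writerow([lexeme, label])


# Hypothetical manual classification, for illustration only.
example_classes = {
    "parler": "class_1",
    "finir": "class_2",
    "aimer": "class_1",
}

if __name__ == "__main__":
    write_labels("labels.csv", example_classes)
```

The resulting file can then be passed to microclass_heatmap.py with the -l option shown above.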
Paradigm entropy¶
This script was used in:
- Bonami, Olivier, and S. Beniamine. “Joint predictiveness in inflectional paradigms.” Word Structure 9, no. 2 (2016): 156-182. Some improvements have been implemented since then.
Computing entropies from one cell
bin/$ python3 calc_paradigm_entropy.py -n 1 -- <patterns.csv> <paradigm.csv> <segments.csv>
Computing entropies from two cells (you can specify any number of predictors, e.g. -n 1 2 3 works too)
bin/$ python3 calc_paradigm_entropy.py -n 2 -- <patterns.csv> <paradigm.csv> <segments.csv>
Add a file with features to help prediction (for example gender – features will be added to the known information when predicting)
bin/$ python3 calc_paradigm_entropy.py -n 2 --features <features.csv> -- <patterns.csv> <paradigm.csv> <segments.csv>
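The quantity behind these commands is a conditional entropy. As a minimal sketch of that general notion, and not of Qumin's actual implementation, the code below computes H(Y | X) in bits from a list of observations. Reading X as what is known about the predictor cell(s) and Y as the pattern leading to the target cell is an assumption made for illustration, and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations.

    Uses H(Y | X) = - sum over (x, y) of p(x, y) * log2 p(y | x),
    with probabilities estimated from raw counts.
    """
    total = len(pairs)
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    entropy = 0.0
    for y_counts in by_x.values():
        n_x = sum(y_counts.values())
        for count in y_counts.values():
            entropy -= (count / total) * log2(count / n_x)
    return entropy


# Toy data: one pair per lexeme. Lexemes of predictor class "A" split
# evenly between two patterns; those of class "B" all share one pattern.
observations = [
    ("A", "pattern_1"), ("A", "pattern_1"),
    ("A", "pattern_2"), ("A", "pattern_2"),
    ("B", "pattern_3"), ("B", "pattern_3"),
]
print(conditional_entropy(observations))
```

In this toy data, knowing that a lexeme belongs to class "B" removes all uncertainty about its pattern, while class "A" leaves one bit; weighting by class frequency gives the overall value.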
Macroclass inference¶
Our work on automatic inference of macroclasses was published in Beniamine, Sacha, Olivier Bonami, and Benoît Sagot. “Inferring Inflection Classes with Description Length.” Journal of Language Modelling (2018).
Inferring macroclasses
bin/$ python3 find_macroclasses.py <patterns.csv> <segments.csv>
Lattices¶
This script was used in:
- Beniamine, Sacha. (in press) “One lexeme, many classes: inflection class systems as lattices”, In: One-to-Many Relations in Morphology, Syntax and Semantics, Ed. by Berthold Crysmann and Manfred Sailer. Berlin: Language Science Press.
Inferring a lattice of inflection classes, with html output
bin/$ python3 make_lattice.py --html <patterns.csv> <segments.csv>
Documentation index¶
- Quick start
- 1. The paradigms file
- 2. The segments file
- 3. How to use the scripts
- 4. API
- 4.1. clustering package
- 4.2. entropy package
- 4.3. lattice package
- 4.4. representations package
- 4.4.1. Submodules
- 4.4.2. representations.alignment module
- 4.4.3. representations.confusables module
- 4.4.4. representations.contexts module
- 4.4.5. representations.generalize module
- 4.4.6. representations.patterns module
- 4.4.7. representations.quantity module
- 4.4.8. representations.segments module
- 4.4.9. representations.utils module
- 4.4.10. Module contents
- 4.5. utils package