Prepare taxonomy workflow
The first task is to manually collate keywords into an .ini file, written following the rules of a Windows configuration file. Hence, a classification level is defined by a word in square brackets, and the words listed beneath it are the terms that indicate that classification level. Comments in the .ini file are indicated by starting a line with a semicolon. This format is relatively straightforward to use, as can be seen in the samples supplied with the framework.
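As a minimal sketch, a file in this format can be read with Python's standard-library configparser, assuming each term sits on its own line with no "=value" part. The level and term names below are hypothetical placeholders rather than the framework's supplied samples, and prepare_taxonomy may read its input differently.

```python
import configparser

# Hypothetical sample: levels in square brackets, one term per line,
# comments introduced by a semicolon.
SAMPLE_INI = """\
; Hypothetical sample taxonomy
[remember]
define
list
recall

[understand]
explain
summarise
"""


def load_taxonomy(text: str) -> dict[str, list[str]]:
    """Parse taxonomy .ini text into {level: [terms]}, preserving file order."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # terms are bare words, not key=value pairs
        strict=False,         # tolerate a term repeated within one level
    )
    parser.read_string(text)
    # Note: configparser lower-cases option names (the terms) by default,
    # but leaves section names (the levels) as written.
    return {level: list(parser[level]) for level in parser.sections()}


if __name__ == "__main__":
    print(load_taxonomy(SAMPLE_INI))
```

Running it prints {'remember': ['define', 'list', 'recall'], 'understand': ['explain', 'summarise']}. One caveat of this approach: configparser treats "=" and ":" as delimiters, so a term containing either character would be split.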
The second task is to process the .ini file with the prepare_taxonomy script. The script stems and de-duplicates the terms, producing two files:
.pkl
a pickled file for use by the framework scripts;
.txt
a text file giving a human-readable view of the stemmed output.
Pickle is a Python-specific storage format. It is an efficient way of storing the classifications for re-use in Python and, as all the scripts in the framework are written in Python, it is the format used here. The classifications could easily be made available to tools written in other languages, for example by dumping them out as a JSON file.
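The stem, de-duplicate and save steps might look roughly like the following sketch. It is not the prepare_taxonomy script itself: crude_stem is a hypothetical suffix-stripping stand-in for a real stemmer such as NLTK's PorterStemmer, and the .txt layout and the to_json export are assumptions made for illustration.

```python
import json
import pickle
from pathlib import Path

# Hypothetical, deliberately crude suffix-stripping stemmer. A real
# implementation would use a proper stemmer (e.g. NLTK's PorterStemmer).
_SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Lower-case a word and strip the first matching suffix, keeping >= 3 letters."""
    word = word.lower()
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_taxonomy(taxonomy: dict[str, list[str]]) -> dict[str, list[str]]:
    """Stem every term, then drop duplicate stems within each level (order kept)."""
    return {
        level: list(dict.fromkeys(crude_stem(term) for term in terms))
        for level, terms in taxonomy.items()
    }


def save_taxonomy(stemmed: dict[str, list[str]], base: Path) -> None:
    """Write <base>.pkl for the Python scripts and <base>.txt for people to read."""
    with open(base.with_suffix(".pkl"), "wb") as f:
        pickle.dump(stemmed, f)
    text = "\n".join(f"[{level}] {', '.join(terms)}" for level, terms in stemmed.items())
    base.with_suffix(".txt").write_text(text + "\n", encoding="utf-8")


def to_json(stemmed: dict[str, list[str]]) -> str:
    """Language-neutral alternative to the pickle, for tools outside Python."""
    return json.dumps(stemmed, indent=2)
```

For example, stem_taxonomy({"understand": ["Explain", "explains", "explained", "explaining"]}) returns {"understand": ["explain"]}: all four variants reduce to one stem, and dict.fromkeys removes the duplicates while keeping first-seen order. The pickle can be read back with pickle.load.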
The framework is supplied with a default taxonomy based on Bloom’s cognitive taxonomy. See: Krathwohl, D. R. (2002). A Revision of Bloom’s Taxonomy: An Overview. Theory Into Practice, 41(4), 212-218.
Two other sample .ini taxonomy files are supplied:
lda.ini
Learning Design Activities (LDA) describes the different levels of learning activity set out in teaching materials. The taxonomy and its seven levels are documented in Rienties, B., & Toetenel, L. (2016). The impact of learning design on student behaviour, satisfaction and performance: A cross-institutional comparison across 151 modules. Computers in Human Behavior, 60, 333-341.
solo.ini
Structure of the Observed Learning Outcome (SOLO) is based on the study of the outcomes of academic teaching. The taxonomy names and distinguishes five levels according to the cognitive processes required to attain them. See: Biggs, J. B., & Collis, K. F. (1982). Evaluating the Quality of Learning: The SOLO Taxonomy, Structure of the Observed Learning Outcome. Academic Press, London.
Other classification schemes can be added easily. For example, one classification we would like to explore in a follow-on project is the BCS’s SFIA+ (the British Computer Society’s IT Skills Framework).
Having established that we can apply a taxonomy to the forum posts, we wondered whether a tailored taxonomy could be developed to provide more meaningful results than the Bloom’s cognitive taxonomy originally used. To that end, a new script was written to identify significant words in the existing forums.
Suggest keywords workflow
The suggest_keywords script produces *_significant_words.html in the reports folder, one file for each forum. The script uses tf-idf (term frequency–inverse document frequency) to identify significant words: words that appear often in a particular forum (high term frequency) but in few of the other forums (low document frequency) are identified as significant. This approach gives some insight into each forum, but inevitably also produces a lot of noise. For example, tutor names such as Ursula appear often in their own forums but not at all in other forums, and so are highlighted as significant in those forums.
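To make the weighting concrete, here is a from-scratch sketch of one common tf-idf variant, treating each forum as a single document: tf is a word's count divided by the forum's total word count, and idf is ln(N/df), where N is the number of forums and df is the number of forums containing the word. The suggest_keywords script may use a library implementation or a different weighting; the forum texts below are made up.

```python
import math
from collections import Counter


def tfidf_by_forum(forums: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return {forum: {word: tf-idf score}}, treating each forum as one document."""
    counts = {name: Counter(words) for name, words in forums.items()}
    n_forums = len(counts)

    # Document frequency: in how many forums does each word appear?
    df: Counter = Counter()
    for forum_counts in counts.values():
        df.update(forum_counts.keys())

    scores = {}
    for name, forum_counts in counts.items():
        total = sum(forum_counts.values())
        scores[name] = {
            word: (count / total) * math.log(n_forums / df[word])
            for word, count in forum_counts.items()
        }
    return scores


def top_words(word_scores: dict[str, float], k: int = 3) -> list[str]:
    """Highest-scoring words first; ties broken alphabetically."""
    return sorted(word_scores, key=lambda w: (-word_scores[w], w))[:k]


if __name__ == "__main__":
    # Made-up forum posts, already lower-cased and tokenised.
    forums = {
        "forum_a": "ursula here ursula asks please read the unit".split(),
        "forum_b": "please read the unit and explain your answer".split(),
        "forum_c": "the unit asks you to explain the answer".split(),
    }
    scores = tfidf_by_forum(forums)
    print(top_words(scores["forum_a"]))
```

This prints ['ursula', 'here', 'asks']. "ursula" tops forum_a only because it appears in no other forum, while "the" and "unit", which occur in every forum, score exactly zero; this is the same kind of noise described above.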
To help understand each forum further and see through this noise, the script has been extended to produce *_word_counts.csv in the data folder for each forum, and a summary *_word_counts.csv for the original source folder. These files have two columns, listing each word and its Count.
The rows are sorted by Count, from highest to lowest, so the most common word comes first. Reviewing these tables is a crude but helpful way to see which words are used in the posts, and may suggest further keywords on which to base future taxonomies.
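A per-forum and summary word-count table of this kind can be sketched with collections.Counter and the csv module. The header names Word and Count, the lower-casing and the alphabetical tie-break are assumptions; the framework's own files may differ in these details.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def word_counts(words: Iterable[str]) -> list[tuple[str, int]]:
    """Count words and sort by Count, highest first (ties alphabetical)."""
    counts = Counter(word.lower() for word in words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def summary_counts(forums: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Combine every forum's words into one summary table."""
    return word_counts(word for words in forums.values() for word in words)


def to_csv(rows: list[tuple[str, int]]) -> str:
    """Render rows as two-column CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Word", "Count"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, to_csv(word_counts("The cat and the hat".split())) yields the lines Word,Count / the,2 / and,1 / cat,1 / hat,1. The returned text can then be written to the relevant *_word_counts.csv file.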
The development of a tailored taxonomy could form the basis of future work, but it is not something we have yet pursued.