In 2013, I began designing and co-developing Annotation Pro, a software tool for the annotation of linguistic and paralinguistic features. The programme offers a multilayer annotation interface, a spectrogram display, graphical feature representation for annotation using continuous rating scales, and perception test options. Annotation Pro is freely available for research and education purposes. The tool is continuously evolving: we are working on both its interface and its functionality on an ongoing basis. You can download the current version here: annotationpro.org/downloads. A number of plugins and external modules have been developed so far. One of them is ANNPRO, an automatic transcription and segmentation module available from the CLARIN-PL repository. Any feedback is very welcome!
Resources for the analysis of linguistic and paralinguistic features in speech
Since 2014, I have been involved in Borderland, a project aimed at the documentation and interdisciplinary analysis of interpersonal communication in the region of Słubice (Poland) and Frankfurt (Oder) (Germany), on the border of languages and cultures (http://borderland.amu.edu.pl/).
One of the corpora I co-developed earlier, in 2013, is the Paralingua corpus for the study of linguistic and paralinguistic features (cf. Klessa et al., 2013, published in the proceedings of CILC 2013). If you are interested in using the Paralingua corpus for your research, please contact me at email@example.com. The corpus is freely available for non-commercial research purposes once you have confirmed that you have read and accepted the user licence; it is sufficient to send this confirmation as an e-mail attachment. Please specify whether you are interested in the EMO or the DIAL subcorpus.
Text and Speech Corpora for Speech Technology
In the years 2006-2010, I worked on research projects aimed at creating very large text and speech corpora for automatic speech synthesis and recognition for Polish. The resulting corpora include the Jurisdic acoustic database (approximately 2000 voices delivering read and semi-spontaneous speech, currently deposited at the Speech and Language Data Repository, SLDR/ORTOLANG) and the Speechlabs ASR lexical database (over 3 million phonetically transcribed vocabulary items, accompanied by inflection information).
Endangered Languages – Corpora and Education Resources
Following my interest in various kinds of speech and language corpora, I have also cooperated with a team of colleagues working on endangered languages within two projects: Dziedzictwo językowe Rzeczypospolitej. Baza dokumentacji zagrożonych języków (Poland's Linguistic Heritage. Development of a Documentation Database for Endangered Languages; www.inne-jezyki.amu.edu.pl), and INNET (European Project for Endangered Languages Archive Network Management and Reinforcement). Among the outputs of the INNET project is the website languagesindanger.eu.
I wrote my doctoral dissertation on Polish segmental duration modelling for the purposes of speech synthesis (2006, Adam Mickiewicz University, Institute of Linguistics). The resulting duration model was implemented in the Polish version of the Bonn Open Synthesis System (BOSS). For details, see Publications, e.g., K. Klessa, M. Szymański, S. Breuer & G. Demenko (2007) Optimization of Polish Segmental Duration Prediction with CART, Sixth ISCA Workshop on Speech Synthesis (SSW6), Bonn; for more recent adjustments, see M. Szymański, K. Klessa, S. Breuer & G. Demenko (2011) Optimization of Unit Selection Speech Synthesis, Proceedings of ICPhS 2011, Hong Kong.
I am interested in various types of prosodic phenomena, the temporal organization of spoken language, and the functions of melody in speech. Among other projects, I participated in research on Polish intonation that produced one of the first digitally recorded Polish corpora of (semi-)spontaneous speech (Polish Intonation Database).