1. Designing text corpora that contain metadata with information about their authors
The defining feature of the Laboratory’s approach to text-based personality profiling is the use of specially designed text corpora that contain metadata about their authors (gender, age, native language, psychological testing results, etc.). Designing such corpora is a research problem in its own right, and it is both time-consuming and labor-intensive.
Most of the corpora we have collected over the years are available for viewing, and you are welcome to use them in your own research.
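As a rough illustration of what a corpus entry with author metadata might look like, here is a minimal Python sketch. The fields mirror the metadata named above (gender, age, native language, psychological testing results), but the class names, the `filter_by` helper, the scale name, and the sample values are all hypothetical and are not taken from the Laboratory’s actual corpus format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorMetadata:
    """Hypothetical author record; fields follow the metadata named in the text."""
    gender: Optional[str] = None
    age: Optional[int] = None
    native_language: Optional[str] = None
    # Psychological testing results, keyed by an illustrative scale name.
    test_scores: Dict[str, float] = field(default_factory=dict)


@dataclass
class CorpusText:
    """One text of the corpus together with the metadata of its author."""
    text_id: str
    text: str
    author: AuthorMetadata


def filter_by(corpus: List[CorpusText], **criteria) -> List[CorpusText]:
    """Return the texts whose author metadata matches every given criterion."""
    return [
        doc for doc in corpus
        if all(getattr(doc.author, key) == value for key, value in criteria.items())
    ]


# Invented sample data, for illustration only.
corpus = [
    CorpusText("t1", "Первый текст.",
               AuthorMetadata("female", 19, "Russian", {"extraversion": 0.7})),
    CorpusText("t2", "Второй текст.",
               AuthorMetadata("male", 22, "Russian")),
]

female_texts = filter_by(corpus, gender="female")
```

Keeping the metadata in a separate record from the text makes it straightforward to select subcorpora by any combination of author characteristics before running a linguistic analysis.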
Литвинова Т.А. Электронный корпус студенческих эссе на русском языке и его возможности для современных гуманитарных исследований [An electronic corpus of student essays in Russian and its potential for modern research in the humanities] / О.В. Загоровская, Т.А. Литвинова, О.А. Литвинова // Мир науки, культуры и образования. – 2012. – № 3 (34). – С. 387-389.
Литвинова О.А., Диброва Е.В., Литвинова Т.А., Рыжкова Е.С. Корпусные исследования письменной речи в решении задач судебного автороведения [Corpus studies of written speech in solving problems of forensic authorship analysis] // Филологические науки. Вопросы теории и практики. 2015, Т. 8, № 1, 107 – 113.
Tatiana Litvinova, Olga Litvinova, Olga Zagorovskaya, Pavel Seredin, Aleksandr Sboev, Olga Romanchenko. “Ruspersonality”: a Russian Corpus for Authorship Profiling and Deception Detection, Intelligence, Social Media and Web (ISMW FRUCT), 2016 International FRUCT Conference, IEEE, 2016, 1 – 7.
2. Identifying linguistic features of texts by different age and social groups, etc.
This is one of the major areas of work at our Laboratory. The search for correlations between personality traits and linguistic parameters has long attracted interest worldwide and has been pursued by specialists both in Russia and abroad. In Russia, psychologists are at the forefront of the field; linguistic studies of the problem are few, and most of them are sketchy and eclectic. Nevertheless, the issue has practical as well as theoretical value, in particular for forensic linguistics, where methods of authorship analysis are being developed to infer an author’s characteristics (gender, age, personality traits, etc.) from a linguistic analysis of their texts. It is essential that the analysis focus on formal, quantifiable, content-independent text parameters that are beyond the author’s conscious control.
The studies are supported by the grants of the Russian Foundation for Basic Research (RFBR) 13-06-00016 “Modelling Personality of a Written Text Author” and 15-36-50104 “Formal and Grammatical Parameters of Written Texts and Individual Personality Traits of their Authors: a Corpus Study”.
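To make the notion of formal, quantifiable, content-independent parameters more concrete, here is a minimal Python sketch that computes a few such measures from raw text. The particular measures (mean word length, mean sentence length, type-token ratio, punctuation marks per word) and the function name `text_parameters` are illustrative assumptions; they are not necessarily the parameters used in the Laboratory’s studies.

```python
import re


def text_parameters(text: str) -> dict:
    """Compute a few simple topic-independent measures of a text (illustrative set)."""
    # \w+ matches runs of letters and digits, including Cyrillic, in Python 3.
    words = re.findall(r"\w+", text.lower())
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Count a small set of sentence-internal punctuation marks.
    punctuation = re.findall(r"[,;:\-()\"]", text)

    n_words = len(words)
    if n_words == 0:
        return {
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
            "punctuation_per_word": 0.0,
        }

    return {
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_sentence_length": n_words / len(sentences),
        "type_token_ratio": len(set(words)) / n_words,
        "punctuation_per_word": len(punctuation) / n_words,
    }


# Invented sample sentence, for illustration only.
params = text_parameters("Я пришёл домой. Я устал, но доволен!")
```

Measures of this kind depend on how a text is written rather than on what it is about, which is why they are harder for an author to control deliberately than vocabulary choice tied to a topic.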
Литвинова Т.А. Частоты встречаемости последовательностей частей речи в тексте и психофизиологические характеристики его автора: корпусное исследование [Frequencies of part-of-speech sequences in a text and the psychophysiological characteristics of its author: a corpus study] / Т.А. Литвинова, О.А. Литвинова, П.В. Середин // Вестник Иркутского государственного лингвистического университета. – 2014. – № 2. – С. 8-12.
Литвинова Т.А. Возможности компьютерной лингвистики для решения задач диагностирования личности по тексту (на материале корпуса текстов Personality) [The potential of computational linguistics for text-based personality diagnostics (based on the Personality text corpus)] // Вестник Воронежского гос. ун-та. Серия: Лингвистика и межкультурная коммуникация. 2015. № 3. С. 37-41.
Литвинова Т.А. Идентификация и диагностирование личности автора письменного текста [Identifying and diagnosing the personality of the author of a written text] / Т. А. Литвинова, О.А. Литвинова. – Воронеж: Изд-во Воронеж. гос. пед. ун-та, 2015. – 332 с.
Литвинова Т.А. Индивидуально-личностные характеристики автора и количественные параметры его текста: корпусное исследование [Individual personality characteristics of an author and quantitative parameters of the author’s text: a corpus study] / Т.А. Литвинова, Е.В. Диброва, П.В. Середин, О.А. Литвинова // Вопросы психолингвистики. 2015. № 4. С. 98-108.
Litvinova T.A., Seredin P.V., Litvinova O.A. Using Part-of-Speech Sequences Frequencies in a Text to Predict Author Personality: a Corpus Study. Journal of Language and Literature 2015; 6(1), 214-217.
Litvinova T., Litvinova O. Authorship Profiling in Russian-Language Texts // Mayaffre D. et al. (eds.) Proceedings of JADT 2016: 13ème Journées internationales d’Analyse statistique des Données Textuelles. Nice, France. CNRS. Université Nice Sophia Antipolis. Vol. 2. pp. 793-798.
3. Lexicographic Profiling of Social Groups and Development of Sociolinguistic Dictionary Databases
These studies aim at the lexicographic parameterization of collective linguistic profiles of native Russian speakers from diverse social backgrounds (differing in gender, age, and profession). The goals are to identify the cultural and linguistic parameters of authors and to design learners’ stylistic dictionaries of Russian that take into account the social and stylistic characteristics of the language. Russian linguists have not yet tackled these issues, but their theoretical and practical significance for teaching Russian as a first and second language cannot be denied. The same is true of designing a “cultural and linguistic” corpus of Russian texts with metadata on the social characteristics of their authors, which contributes to text-based personality profiling.
The studies are supported by the grants of the Russian Humanitarian Science Foundation (RHSF) № 13-14-36001 “Linguistic Profiling of Voronezh Students (Using the Material of a Digital Text Corpus “Russia and World Through The Eyes of Voronezh Students”)” and № 16-04-18010 “Norms of Russian Literary Word Use and Its Stylistic Variants in the Linguistic Mind of Modern Russian Youngsters: a Field Study”.
Литвинова Т.А., Рыжкова Е.С., Шевченко И.С., Лантюхова Н.Н. Профилирование автора текста как одно из стратегических направлений исследований [Text author profiling as one of the strategic research directions], Вестник Воронежского института ГПС МЧС России, Воронежский институт ГПС МЧС России, Воронеж, 2013, 1, 38 – 41.
4. Developing methods of identifying gender and age of text authors including when properties of written speech are intentionally falsified
The project has both theoretical and practical applications. The rapid development of Internet communication has unfortunately been accompanied by a rise in cybercrime, as offenders take advantage of the virtual world to commit crimes. Pedophiles, illegal terrorist organizations, and others use the Internet to find and groom new victims. Cybercriminals naturally seek to conceal their identities, so they tend to give fake personal data (gender, age, etc.) in their profiles and subsequent correspondence, and the only way to recover this information is to analyze their texts.
The specific objective of the project is to develop methods for identifying the gender and age of participants in Internet communication from quantitative parameters of their texts. The methods are intended to identify demographic data with a high degree of accuracy even when some parameters are falsified to imitate someone of another gender and/or age group, and to detect such falsified parameters in written texts.
The study is supported by the grant of the Russian Foundation for Basic Research (RFBR) № 16-18-10050 “Identification of Gender and Age of Participants of Internet Communication Based on Quantitative Parameters of Their Texts”.
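Литвинова Т.А. Судебная автороведческая экспертиза текста с целью установления пола его автора: проблемы и перспективы [Forensic authorship examination of a text to establish the gender of its author: problems and prospects] // Современное право. 2016. № 7. С. 100-104.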
Litvinova T., Seredin P., Litvinova O., Zagorovskaya O., Sboev A., Gudovskih D., Moloshnikov I., Rybka R. Predicting the gender of an author of a Russian text using regression and classification techniques // Baixeries J., Ignatov D. I., Ilvovsky D., Panchenko A. (eds.). Proceedings of the Third Workshop on Concept Discovery in Unstructured Data. Moscow, Russia, July 18, 2016. pp. 44-53.
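Литвинова Т.А., Загоровская О.В., Середин П.В. Диагностирование пола автора письменного текста на основе количественных параметров: когнитивный подход [Diagnosing the gender of the author of a written text from quantitative parameters: a cognitive approach] // Вопросы когнитивной лингвистики. 2016. № 4. С. 51-59.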
Sboev A.G., Litvinova T.A., Gudovskikh D.V., Rybka R.B., Moloshnikov I.A. Machine Learning Models of Text Categorization by Author Gender Using Topic-Independent Features. Procedia Computer Science, 2016, vol. 101C, pp. 134-141. DOI: 10.1016/j.procs.2016.11.017.
Sboev A.G., Vlasov D.S., Serenko A.V., Moloshnikov I.A., Litvinova T.A. On the applicability of spiking neural network models to solve the task of recognizing gender hidden in texts. Procedia Computer Science, 2016, vol. 101C, pp. 187–196.
5. Developing methods of identifying individual psychological characteristics to cater for the demands of HR services
We have conducted a study to identify linguistic markers of conflict behavior tendencies by analyzing a specially designed text corpus with methods of mathematical statistics and automatic language processing.
6. Determining linguistic features of individuals suffering from depression, dementia, schizophrenia, bipolar disorder
There is an ongoing study of linguistic features of written texts by individuals suffering from schizophrenia.
7. Developing methods to identify autoaggressive as well as suicidal tendencies
Another research problem that deserves far more attention is the identification of suicidal tendencies through speech analysis. The issue has both theoretical and practical value. Over 800,000 people die by suicide annually, and only 30% of them were reported to have voiced their intentions. There is thus an urgent need for methods that identify suicidal individuals and help prevent suicides. Suicide notes are the texts most commonly studied for linguistic features. Although studying them is essential, they are rather short and do not provide enough data for a thorough investigation. Significant as these studies are, predicting linguistic indicators of suicidal behavior requires examining a whole range of linguistic parameters in texts written at different points in the lives of suicidal individuals, i.e. tracing the changes that take place at different levels of a text as a cognitive product as these tendencies progress. Finally, a comparative analysis is conducted using texts from a control group that is matched as closely as possible in education level, social status, etc., but is made up of individuals who did not commit suicide.
The studies are performed using the texts of “suicidal diaries”, i.e. online texts by those who committed suicide.
The study is supported by the grant of the President of the Russian Federation for young scientists, PhDs and Doctors of Science, project № МК-4633.2016.6.
Litvinova T., Zagorovskaya O., Litvinova O., Seredin P. Profiling a Set of Personality Traits of a Text’s Author: A Corpus-Based Approach // A. Ronzhin et al. (Eds.): SPECOM 2016, Lecture Notes in Computer Science, Springer International Publishing, Vol. 9811, pp. 555–562, 2016.
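Литвинова Т.А., Загоровская О.В., Литвинова О.А. Выявление склонности личности к суицидальному поведению на основе количественного анализа ее речевой продукции [Identifying a person’s tendency toward suicidal behavior through quantitative analysis of their speech production] // Studia Humanitatis. 2016.
Литвинова Т.А. Корпусные исследования речи лиц, совершивших суицид [Corpus studies of the speech of individuals who committed suicide] // Russian Linguistic Bulletin. 2016. № 3 (7). С. 133-136.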
8. Identifying linguistic features of individuals with different profiles of functional sensorimotor asymmetry
This work belongs to cutting-edge neurolinguistic research on the norm, i.e. on healthy individuals. The project rests on the premise that the typological characteristics of a linguistic personality are associated with typological features of the brain and characteristics of the nervous system. It has already been shown that verbal and non-verbal neuropsychological characteristics of individuals from the norm population are connected, since the respective functions share common or similar components, yet there has been no comprehensive research into the issue. The aim of the project is to identify typological characteristics of individuals with different profiles of lateral organization. The profile of lateral organization is an important characteristic that reflects individual differences in how the paired hemispheres of the human brain work; this is indicated by its use, in the neuropsychology of individual differences, as the foundation for a typology of individual differences in the mental development of healthy individuals.
The relationship between quantitative text parameters and the profile of lateral organization of their authors is studied using a specially designed text corpus, RusNeuroPsych, whose contributors also completed a series of neuropsychological tests.
The study is supported by the grant of the Russian Foundation for Basic Research (RFBR) № 16-36-00036.
Litvinova T., Ryzhkova E., Litvinova O. Features of Written Texts of People with Different Profiles of the Lateral Brain Organization of Functions (on the Basis of RusNeuroPsych Corpus) // Botinis A. (ed.). Proceedings of the 7th Tutorial and Research Workshop on Experimental Linguistics (Exling 2016) (1-2 July 2016, Saint Petersburg, Russia). Saint Petersburg State University, International Speech Communication Association, University of Athens, 2016. pp. 103-106.
9. Lie Detection in Texts
Lying, which we define as presenting different forms of false information in order to mislead, has been around for as long as there has been human life. However, people’s ability to tell the truth from lies is limited: according to expert evaluations, their capacity to detect deception is only slightly above chance. There have therefore been attempts to design special technical tools for more efficient lie detection (e.g., the lie detector, or polygraph). However, it has been shown that these tools, which rely on the analysis of non-verbal and paraverbal behavior, mostly reveal emotional states and are thus more often than not faulty (e.g., when someone is too anxious during a polygraph examination, their answers will be recognized as false). Modern research indicates that speech analysis is better at detecting deception. The issue is being actively investigated both in Russia and abroad for oral and written speech. In Russia, however, verbal markers of deception are mainly studied by psychologists, and the problem has not yet been taken up as a linguistic one: there is no comprehensive list of verbal markers of deception, nor are there any consistent experimental studies of the problem.
The results of the current project (data on the linguistic parameters of “deceptive” texts) are likely to contribute to ongoing studies of the general linguistic aspects of deception. Beyond general linguistics, which addresses the problems of language and personality as well as language and mind, they can be applied to practical tasks facing society today that involve lie detection in written communication, including on the Internet.
The study is supported by the grant of the Russian Humanitarian Science Foundation (RHSF) № 15-34-01221 “Lie Detection in Written Texts: a Corpus Study”.
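Литвинова О.А., Литвинова Т.А., Середин П.В. Классификация текстов по признаку «ложный / правдивый» с использованием методов автоматической обработки текстов [Classifying texts as deceptive or truthful using automatic text processing methods] // Научный диалог, Центр научных и образовательных проектов, Екатеринбург, 2016, 10, 58, 70 – 83.
Литвинова О.А., Литвинова Т.А. Исследование лингвистических характеристик текстов, содержащих намеренно искаженную информацию, с помощью программы Linguistic Inquiry and Word Count [A study of the linguistic characteristics of texts containing intentionally distorted information using the Linguistic Inquiry and Word Count program], Вестник МГОУ. Лингвистика, МГОУ, Москва, 2015, 3, 71 – 77.
Литвинова Т.А. Установление в письменном тексте признаков намеренного искажения информации как одна из задач судебной лингвистики [Detecting signs of intentional distortion of information in a written text as a task of forensic linguistics] // Современное право. – Москва, 2016, 8, 115 – 119.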
Olga Litvinova, Tatiana Litvinova, Pavel Seredin, John Lyell. Deception Detection in Russian Texts, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017).
10. Modeling the idiolect of a native speaker of contemporary Russian for the purpose of identifying the author of a text
Modern science is experiencing a surge of interest in the idiolect, understood as the realization of the properties of a language system in the speech of an individual. Despite this increased interest, many issues in the study of individual language style (idiolect) remain unresolved. A number of important problems have not yet received systematic consideration: the relationship between the mechanisms of variability and stability of style, the comparative nature of variation in groups of features belonging to different linguistic levels and aspects, the limits of dynamic variation of the parameters and the scale of their possible fluctuations, and the interaction of linguistic characteristics.
In the course of the project, an experimental model for identifying the author of a Russian-language text will be built on the basis of stable linguistic features of idiostyle that have discriminative power. The model will take into account the needs of modern expert practice in the context of the emergence and intensification of new cyber threats (analysis of short texts; mismatch between the available samples and the disputed text in style, genre, mode, etc.).
The study is supported by the grant of the Russian Science Foundation № 18-78-10081.