1. Designing text corpora that contain metadata with information about their authors

The defining feature of the Laboratory’s approach to text-based personality profiling is the use of specially designed text corpora whose metadata describe the authors of the texts (gender, age, native language, psychological testing results, etc.). Designing such corpora is a research problem in its own right and requires considerable time and labor.
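As an illustration of what a text paired with author metadata might look like, here is a minimal Python sketch. The class names, field names, and the `select_texts` helper are hypothetical and chosen only for this example; they are not the actual schema of the Laboratory’s corpora.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorMetadata:
    """Hypothetical author record; any field may be unknown (None or empty)."""
    gender: Optional[str] = None
    age: Optional[int] = None
    native_language: Optional[str] = None
    # Psychological test results, keyed by an illustrative scale name.
    test_scores: Dict[str, float] = field(default_factory=dict)


@dataclass
class CorpusText:
    """One corpus entry: the text itself plus metadata about its author."""
    text_id: str
    text: str
    author: AuthorMetadata


def select_texts(corpus: List[CorpusText],
                 gender: Optional[str] = None,
                 min_age: Optional[int] = None,
                 max_age: Optional[int] = None) -> List[CorpusText]:
    """Return the entries whose author metadata match every given filter.

    Entries with an unknown age are excluded whenever an age filter is set.
    """
    selected = []
    for item in corpus:
        meta = item.author
        if gender is not None and meta.gender != gender:
            continue
        if min_age is not None and (meta.age is None or meta.age < min_age):
            continue
        if max_age is not None and (meta.age is None or meta.age > max_age):
            continue
        selected.append(item)
    return selected
```

Keeping the metadata in a separate record makes it straightforward to build subcorpora, for example all texts by authors in a given age range, before any linguistic analysis is run.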

You can view most of the corpora we have collected over the years, and you are welcome to use them in your own research.

2. Identifying linguistic features of texts written by different age groups, social groups, etc.

This is one of the major research areas of our Laboratory. The search for correlations between personality traits and linguistic parameters has long attracted interest worldwide and has been pursued by specialists both in Russia and abroad. In Russia, psychologists are at the forefront of the field; linguistic studies of the problem are admittedly few, and most of them are sketchy and eclectic. Nevertheless, the issue has practical as well as theoretical value, in particular for forensic linguistics, where methods of authorship analysis are developed to identify an author’s characteristics (gender, age, personality traits, etc.) from a linguistic analysis of their texts. It is essential that the analysis focus on formal, quantifiable, content-independent text parameters that are beyond the author’s conscious control.
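To make the idea of formal, quantifiable, content-independent parameters more concrete, the following Python sketch computes a few simple ones and a Pearson correlation between a parameter and a trait score. The particular parameters, the function names, and the tokenization rules are assumptions made for this illustration only; they are not the Laboratory’s actual feature set or procedure.

```python
import math
import re


def text_features(text):
    """Compute a few simple parameters that do not depend on the topic.

    Words are runs of letters/digits; sentences are split on ., ! and ?.
    Both rules are deliberate simplifications.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
        "commas_per_100_words": 100 * text.count(",") / len(words),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, one would compute `text_features` for every text in a corpus and pass the values of a single parameter, together with the authors’ scores on one trait, to `pearson`. A real study would also need significance testing and correction for multiple comparisons, which this sketch omits.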

The studies are supported by Russian Foundation for Basic Research (RFBR) grants № 13-06-00016 “Modelling Personality of a Written Text Author” and № 15-36-50104 “Formal and Grammatical Parameters of Written Texts and Individual Personality Traits of their Authors: a Corpus Study”.

3. Lexicographic Profiling of Social Groups and Development of Sociolinguistic Dictionary Databases

These studies aim at the lexicographic parameterization of the collective linguistic profiles of native Russian speakers from diverse social groups (defined by gender, age, and profession). The goal is to identify cultural and linguistic parameters of authors and to design learners’ stylistic dictionaries of Russian that take into account the social and stylistic characteristics of the language. These issues have not yet been tackled by Russian linguists, but their theoretical and practical significance for teaching Russian as a first and second language cannot be denied. The same holds for designing a “cultural and linguistic” corpus of Russian texts with metadata on the social characteristics of their authors, which contributes to text-based personality profiling.

The studies are supported by Russian Humanitarian Science Foundation (RHSF) grants № 13-14-36001 “Linguistic Profiling of Voronezh Students (Using the Material of a Digital Text Corpus “Russia and World Through The Eyes of Voronezh Students”)” and № 16-04-18010 “Norms of Russian Literary Word Use and Its Stylistic Variants in the Linguistic Mind of Modern Russian Youngsters: a Field Study”.

4. Developing methods for identifying the gender and age of text authors, including cases where properties of written speech are intentionally falsified

The project has both theoretical and practical applications. The rapid development of Internet communication has unfortunately been accompanied by a growth in cybercrime: offenders take advantage of the virtual world to commit crimes. Pedophiles, terrorist organizations, and others use the Internet as a tool for finding and grooming new victims. Cybercriminals naturally try to keep their personal data private and inaccessible so that they cannot be identified. They therefore tend to use fake personal data (gender, age, etc.) in their profiles and subsequent correspondence, and the only way to obtain this information is to analyze their texts.

The specific objective of the project is to develop methods for identifying the gender and age of participants in Internet communication from quantitative parameters of their texts. Such methods are intended to identify demographic data with a high degree of accuracy even when some parameters are falsified to imitate a person of another gender and/or age group, and to make it possible to propose methods for detecting falsified parameters in written texts.

The study is supported by the grant of the Russian Foundation for Basic Research (RFBR) № 16-18-10050 “Identification of Gender and Age of Participants of Internet Communication Based on Quantitative Parameters of Their Texts”.

5. Developing methods for identifying individual psychological characteristics to meet the needs of HR services

We have conducted a study to identify linguistic markers of tendencies toward conflict behavior by analyzing a specially designed text corpus with methods of mathematical statistics and automatic language processing.

6. Determining linguistic features of texts by individuals suffering from depression, dementia, schizophrenia, or bipolar disorder

A study of the linguistic features of written texts by individuals suffering from schizophrenia is under way.

7. Developing methods to identify autoaggressive as well as suicidal tendencies

Another research problem that deserves far more attention is the identification of suicidal tendencies through speech analysis. The issue has both theoretical and practical value. Over 800,000 people die by suicide annually, and only 30% of them were reported to have voiced their intentions. There is thus an urgent need for methods that identify suicidal individuals and help prevent suicides. Suicide notes are the texts whose linguistic features are studied most often. Essential as such studies are, suicide notes are rather short and do not provide enough data for their linguistic features to be investigated thoroughly. To predict linguistic indicators of suicidal behavior, i.e. the changes that occur at different levels of a text as a cognitive product as these tendencies progress, a whole range of linguistic parameters has to be examined in texts written at different points in the lives of suicidal individuals. Finally, a comparative analysis is conducted against samples from a control group that is matched as closely as possible in education level, social status, etc., but consists of people who did not commit suicide.

The studies use the texts of “suicidal diaries”, i.e. online texts written by people who committed suicide.

The study is supported by the grant of the President of the Russian Federation for young scientists, PhDs and Doctors of Science, project № МК-4633.2016.6.

8. Identifying linguistic features of individuals with different profiles of functional sensorimotor asymmetry

This project is part of cutting-edge neurolinguistic research on the norm, i.e. on healthy individuals. It rests on the premise that typological characteristics of a linguistic personality are associated with typological features of the brain and characteristics of the nervous system. Although verbal and non-verbal neuropsychological characteristics of individuals from the norm population have been shown to be connected, since the respective functions share common or similar components, the issue has not been researched comprehensively. The aim of the project is to identify typological characteristics of individuals with different profiles of lateral organization. The profile of lateral organization is an important characteristic that reflects individual differences in the paired work of the hemispheres of the human brain, as indicated by its use in the neuropsychology of individual differences as the foundation for a typology of individual differences in the mental development of healthy individuals.

The study of quantitative text parameters and the lateral organization profiles of their authors uses a specially designed text corpus named RusNeuroPsych, whose contributors were also instructed to do a series of neuropsychological tests.

The study is supported by the grant of the Russian Foundation for Basic Research (RFBR) № 16-36-00036.

9. Lie Detection in Texts

Lying, which we define as presenting false information in various forms in order to mislead, has been around for as long as there has been human life. However, people’s ability to tell the truth from lies is limited: according to expert evaluations, human accuracy in detecting deception is only slightly above chance. There have therefore been attempts to design special technical tools for more efficient lie detection (e.g. the lie detector, or polygraph). However, it has been shown that these tools, which rely on the analysis of non-verbal and paraverbal behavior, mostly reveal emotional states and are thus often faulty (for example, when someone is too anxious during a polygraph examination, their answers will be recognized as false). Modern research indicates that speech analysis is better at detecting deception. The issue is being studied extensively both in Russia and abroad for oral and written speech. In Russia, however, verbal markers of deception are studied mainly by psychologists. The problem has not yet emerged as a linguistic one for Russian scientists: there is no comprehensive list of verbal markers of deception, nor have there been any consistent experimental studies of the problem.

The results of the ongoing project (data on the linguistic parameters of “deceptive” texts) are likely to contribute to current studies of the general linguistic aspects of deception. Beyond general linguistics, which addresses the problems of language and personality and of language and mind, the results are expected to be used in practical tasks facing society today that involve lie detection in written communication, including communication on the Internet.

The study is supported by the grant of the Russian Humanitarian Science Foundation (RHSF) № 15-34-01221 “Lie Detection in Written Texts: a Corpus Study”.

