Gender Imitation Corpus

The Gender Imitation Corpus is the first Russian-language corpus for studies of stylistic deception. Each respondent (n=142) was instructed to write three texts on the same topic (from a list). An example task: “Last summer you bought a package tour from a travel agency, but you were not at all pleased with your experience with that company and the trip was not worth the price. You are about to ask for a refund. Write three texts describing your negative experience providing a detailed account of it. Give a warning that you are intending to sue the company”. The first text (type “a”) is to be written in the author’s usual manner, without any deception. The second (type “б”) should be written as if by someone of the opposite gender (“imitation”). The third (type “в”) should be written as if by another individual of the same gender, so that the author’s own writing style cannot be recognized (“obfuscation”). Most of the texts are 80-150 words long. Some respondents did not write all three texts.
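Because some respondents did not write all three texts, analyses that compare a respondent's usual, imitation, and obfuscation texts may need to select complete triples first. The following is a minimal Python sketch of that step. The record layout, the field names (`respondent_id`, `text_type`, `text`) and the toy records are assumptions made for illustration only; they do not describe the actual file format of the corpus.

```python
from collections import defaultdict
from dataclasses import dataclass

# Text-type labels as written in the description above; check how they are
# actually encoded in the corpus files before relying on them.
TEXT_TYPES = {"a": "usual style", "б": "imitation", "в": "obfuscation"}


@dataclass
class CorpusText:
    """Hypothetical record for one text; not the corpus's real schema."""
    respondent_id: int
    text_type: str
    text: str


def group_by_respondent(records):
    """Map respondent_id -> {text_type: text}."""
    grouped = defaultdict(dict)
    for record in records:
        grouped[record.respondent_id][record.text_type] = record.text
    return dict(grouped)


def complete_respondents(grouped):
    """Return sorted ids of respondents who wrote all three text types."""
    return sorted(
        rid for rid, texts in grouped.items()
        if all(label in texts for label in TEXT_TYPES)
    )


# Toy records (invented placeholders, not taken from the corpus).
records = [
    CorpusText(1, "a", "text in the usual style"),
    CorpusText(1, "б", "text imitating the opposite gender"),
    CorpusText(1, "в", "text with obfuscated style"),
    CorpusText(2, "a", "text in the usual style"),
    CorpusText(2, "б", "text imitating the opposite gender"),
]

complete = complete_respondents(group_by_respondent(records))
```

In this toy example only respondent 1 has all three types, so respondent 2 would be excluded from a triple-wise comparison but could still be used for pairwise (usual vs. imitation) analyses.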

All of the respondents are students at Russian universities. Besides the texts, the corpus includes metadata on the authors’ characteristics: gender, year of birth, native language, handedness, and psychological gender (femininity/masculinity measures according to the Bem Sex Role Inventory). To the best of our knowledge, this is the first corpus of its kind worldwide.

The corpus is introduced in the following paper: T. A. Litvinova, O. V. Zagorovskaya, O. A. Litvinova. Russian text corpora for deception detection studies // International Journal of Open Information Technologies. 2017. Vol. 5, No. 11. P. 58-63.

Any scientific publication derived from the use of this corpus should explicitly cite this paper.

The corpus was collected with support from a Russian Science Foundation grant, project No. 16-18-10050 “Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts”.