Dataset Description

Dataset Summary


The TURkish Emotional Speech database (TURES) contains 5100 utterances extracted from 55 Turkish movies. Each utterance in the database is labeled with an emotion category (happy, surprised, sad, angry, fear, neutral, or other) and with ratings in a 3-dimensional emotional space (valence, activation, and dominance).

  • The 5100 utterances were obtained from 582 speakers (188 female, 394 male). The average utterance length is 2.34 seconds.

  • The emotion in each utterance was evaluated in a listening test by 27 annotators (university students), each working independently of the others. Annotators listened to the full set of recordings in random order and assigned both a categorical and a dimensional emotion label to each utterance, using only the audio information.

For a more thorough explanation of the data collection process and the contents of the dataset, see [1].


File List

The following files are available (each explained in more detail below):

File name | Part | Contents
--- | --- | ---
Ratings | Online self-assessment and emotional class | All individual ratings from the online self-assessment and emotional class
Features from openSMILE | Single file for all utterances | MFCC, pitch, LSP, etc.; 6552 features in total (emo_large set)
f0 | One file per utterance | Pitch (F0) from the ESPS get_f0 function [2]
mfcc | One file per utterance | Mel-frequency cepstral coefficients (MFCC) from the HTK Speech Recognition Toolkit [3]
formants | One file per utterance | F1, F2, and F3 formants extracted with Praat [4]

File Details

  • Ratings

    The emotion in each utterance was evaluated in a listening test by 27 annotators (university students), each working independently of the others.

    • Categorical Annotation

      Utterances were labelled with one of seven emotional states: happy, surprised, sad, angry, fear, neutral, and other. For each utterance, the final emotion label is the majority label among the 27 annotators.

    • Annotation in 3D Space

      For the emotion labelling in 3D space, Self-Assessment Manikins (SAMs) [5] were used to rate the emotional content of each audio clip on a five-point scale (one to five) for valence, activation, and dominance. Valence represents the negative-to-positive axis, activation the calm-to-excited axis, and dominance the weak-to-strong axis of the three-dimensional emotion space. For each utterance in the database, annotators were asked to select one of the iconic images from the corresponding SAM row for each of the three dimensions. A minimal sketch of how the individual ratings could be aggregated into final labels is shown below.
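
    The following sketch illustrates one way the 27 individual ratings could be aggregated into per-utterance labels: majority vote for the categorical label and the per-dimension mean for valence, activation, and dominance. The file name "ratings.csv" and the column names are placeholders, not the actual layout of the Ratings file.

    ```python
    from collections import Counter
    import pandas as pd

    # Hypothetical layout: one row per (utterance, annotator) pair with columns
    # "utterance", "emotion", "valence", "activation", "dominance".
    ratings = pd.read_csv("ratings.csv")

    def majority_label(labels):
        # Most frequent categorical label among the annotators for one utterance.
        return Counter(labels).most_common(1)[0][0]

    categorical = ratings.groupby("utterance")["emotion"].agg(majority_label)
    dimensional = ratings.groupby("utterance")[["valence", "activation", "dominance"]].mean()
    print(categorical.head())
    print(dimensional.head())
    ```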

  • Features from openSMILE

    This file contains a large feature set of 6552 features (the openSMILE emo_large set): first-level functionals of low-level descriptors (LLDs) such as MFCC, pitch, and LSP ((56 LLDs + 56 deltas) × 39 functionals). The file is in ARFF format, and each instance is labeled with numeric valence, activation, and dominance values.
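
    A minimal sketch for loading the ARFF feature file into a DataFrame; the file name "tures_emo_large.arff" is a placeholder for the actual file shipped with the dataset.

    ```python
    import pandas as pd
    from scipy.io import arff

    # Load the openSMILE features; returns a NumPy structured array plus metadata.
    data, meta = arff.loadarff("tures_emo_large.arff")  # hypothetical file name
    df = pd.DataFrame(data)

    # The acoustic features occupy the leading columns; the numeric valence,
    # activation, and dominance labels are assumed to be the trailing columns.
    print(df.shape)
    print(df.columns[-3:])
    ```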

  • F0

    Each file contains the pitch (F0) values calculated for one utterance; the files are named after the utterances. Pitch (F0) was computed with the ESPS get_f0 function [2].
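
    A minimal sketch for reading one per-utterance F0 file, assuming a plain-text layout with one value per analysis frame (check the actual files; ESPS get_f0 can also emit additional columns such as voicing probability). "utt_0001.f0" is a placeholder name.

    ```python
    import numpy as np

    f0 = np.loadtxt("utt_0001.f0")   # hypothetical file name
    if f0.ndim > 1:                  # keep only the first column if extra columns exist
        f0 = f0[:, 0]
    voiced = f0[f0 > 0]              # frames with F0 == 0 are typically unvoiced
    print(f"frames: {len(f0)}, voiced: {len(voiced)}, mean F0: {voiced.mean():.1f} Hz")
    ```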

  • MFCC

    Each file contains the MFCC values calculated for one utterance with the HTK Speech Recognition Toolkit [3]; the files are named after the utterances.
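
    A minimal sketch for reading an HTK binary parameter file (the standard output format of HTK's HCopy tool). This assumes the MFCC files are stored in that format; "utt_0001.mfc" is a placeholder name.

    ```python
    import struct
    import numpy as np

    def read_htk(path):
        """Read an HTK binary parameter file into a (frames, coeffs) array."""
        with open(path, "rb") as f:
            # 12-byte big-endian header: nSamples, sampPeriod (100 ns units),
            # sampSize (bytes per frame), parmKind.
            n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", f.read(12))
            data = np.frombuffer(f.read(n_samples * samp_size), dtype=">f4")
        return data.reshape(n_samples, samp_size // 4), samp_period * 1e-7

    mfcc, frame_shift = read_htk("utt_0001.mfc")  # hypothetical file name
    print(mfcc.shape, frame_shift)
    ```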

  • Formants

    Each file contains the formant frequency (F1, F2, and F3) values calculated for one utterance with Praat [4]; the files are named after the utterances.
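
    For reference, comparable F1-F3 tracks can be reproduced with Praat's Burg method via the parselmouth Python package; this is only an illustration under default Praat settings, not a reader for the distributed formant files. "utt_0001.wav" is a placeholder name.

    ```python
    import numpy as np
    import parselmouth  # pip install praat-parselmouth

    snd = parselmouth.Sound("utt_0001.wav")     # hypothetical file name
    formant = snd.to_formant_burg()             # Burg method, default Praat settings

    times = np.arange(0.0, snd.duration, 0.01)  # sample every 10 ms
    tracks = np.array([[formant.get_value_at_time(n, t) for n in (1, 2, 3)]
                       for t in times])
    print(tracks.shape)                         # (num_frames, 3): F1, F2, F3 in Hz
    ```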

References

  1. Oflazoglu, C. and Yildirim, S. (2011). Turkish emotional speech database. In Proc. IEEE 19th Conference on Signal Processing and Communications Applications (SIU), pp. 1153–1156.
  2. Talkin, D. (1995). A Robust Algorithm for Pitch Tracking (RAPT). In Kleijn, W. B. and Paliwal, K. K. (Eds.), Speech Coding and Synthesis. New York: Elsevier.
  3. Young, S. et al. (2006). The HTK Book, Version 3.4.
  4. Boersma, P. and Weenink, D. (2013). Praat: doing phonetics by computer [Computer program]. Version 5.3.56, retrieved 15 September 2013 from http://www.praat.org/
  5. Bradley, M. and Lang, P. J. (1994). Measuring emotion: the Self-Assessment Manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25:49–59.