Affectiva is an MIT Media Lab spin-off focused on understanding human emotion. Our vision is that technology needs the ability to sense, adapt, and respond not just to commands but also to non-verbal signals. We are building artificial emotional intelligence (Emotion AI).

As you can imagine, such an ambitious vision takes a great team with a strong desire to explore and innovate. We are growing our team to improve and expand our core technologies and to help solve many unique and interesting problems around sensing, understanding, and adapting to human emotion.

Our first technology measures human emotion through sensing and analyzing facial expressions. This technology is already being used commercially in a number of different verticals and use cases, and has been released to the public in the form of SDKs so that developers around the world can begin to use it to create a new breed of apps, websites and experiences.


This position is on the Science team, the team tasked with creating and refining Affectiva’s technology. We are a group of individuals with backgrounds in machine learning, computer vision and affective computing.

We’re looking for researchers to extend our emotion sensing technology beyond the face and to analyze the human voice. Our goal is to build out our technology to perform emotion sensing unimodally from speech, as well as multi-modally from speech and facial expressions when both channels are present.

The selected candidate will be expected to focus on several research areas: multi-modal emotion sensing from speech and face; speech enhancement and source separation, with the goal of improving emotion recognition in noisy acoustic environments; and semi-supervised and unsupervised techniques for automatic audio-visual data collection and annotation. The candidate will work closely with other members of the Science team to innovate and develop these exciting areas of research.

Great candidates contribute ideas, want to help shape the future of this space, and can execute those ideas effectively and efficiently. This position reports to the Lead Speech Scientist.


  • Explore feature-level fusion methodologies and implement a subset of viable feature-level fusion classification approaches for emotional state estimation from audio-visual data.
  • Develop data annotation experiments related to:
    • Bootstrapping labels from the video channel to the audio channel and vice versa
    • Autonomous learning paired with collaborative-learning-based approaches
    • Other weakly supervised or unsupervised approaches
  • Design, implement and evaluate crowdsourcing tasks for collecting datasets of affective interactions
  • Evaluate technical feasibility of research experiments and clearly communicate your implementations, experiments, and conclusions.
  • Work with engineers and labelers to design scalable annotation tools.
  • Patent findings and publish them at speech, machine learning, and affective computing conferences.
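
As an illustration of the first responsibility above, feature-level (early) fusion simply combines per-frame audio and video feature vectors into a single representation before classification. The feature choices below (MFCCs for audio, facial action unit scores for video) and their dimensions are hypothetical, chosen only to make the sketch concrete:

```python
# Minimal sketch of feature-level (early) fusion for audio-visual emotion
# classification. Feature types and dimensions are illustrative assumptions.
import numpy as np

def fuse_features(audio_feats, video_feats):
    """Concatenate time-aligned audio and video feature vectors (early fusion)."""
    return np.concatenate([audio_feats, video_feats], axis=-1)

# Hypothetical per-frame features: 13 MFCCs (audio) and 34 facial
# action unit scores (video), time-aligned over 100 frames.
audio = np.random.rand(100, 13)
video = np.random.rand(100, 34)

# The fused (100, 47) matrix would then feed a downstream classifier
# (e.g., an RNN/LSTM over the frame sequence).
fused = fuse_features(audio, video)
```

In practice, early fusion like this is contrasted with decision-level (late) fusion, where separate audio and video classifiers are trained and their predictions combined.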


  • Graduate degree (MS or PhD) in Electrical Engineering, Computer Science, or Mathematics with a specialization in speech processing or machine learning.
  • At least 2 years of experience using deep learning techniques (CNN, RNN, LSTM) on speech processing tasks (e.g., speech recognition, classification, diarization)
  • Experience working with deep learning frameworks (e.g., TensorFlow, Theano, Caffe), including implementing custom layers
  • Passion for innovation and for pushing the state of the art in research
  • Strong publication record in journals/proceedings such as ICASSP, NIPS, PAMI, InterSpeech.
  • Familiarity with programming languages such as C/C++ and Python
  • Good presentation and communication skills

Apply now