Affectiva is an MIT Media Lab spin-off focused on understanding human emotion. Our vision is that technology needs the ability to sense, adapt and respond to not just commands but also non-verbal signals. We are building artificial emotional intelligence (Emotion AI).

As you can imagine, such an ambitious vision takes a great team with a strong desire to explore and innovate. We are growing our team to improve and expand our core technologies and help solve many unique and interesting problems focused around sensing, understanding and adapting to human emotion.

Our first technology measures human emotion through sensing and analyzing facial expressions. This technology is already being used commercially in a number of different verticals and use cases, and has been released to the public in the form of SDKs so that developers around the world can begin to use it to create a new breed of apps, websites and experiences. Since 2017, we have extended our emotion sensing technology beyond the face to leverage human speech. Our goal is to build out our technology to perform emotion sensing multi-modally from speech and facial expressions when both channels are present, and unimodally when only one of the channels is available.


This position is on the Science team, the team tasked with creating and improving our emotion recognition technology. We’re a team of researchers with backgrounds in computer vision, speech processing, machine learning and affective computing. The Science team does everything from initial prototyping of state-of-the-art algorithms to creating production models that can be included in our cloud and mobile products.

We are looking for a deep learning researcher to join our Science team. The ideal candidate will have experience in solving either computer vision problems (e.g., object detection, localization and classification) or speech processing problems (e.g., speech classification, speech denoising, speaker diarization, source separation). Experience working on multi-modal classification problems, unsupervised learning, or semi-supervised learning is a plus, as is any experience working with the human face or voice (recognition, emotion estimation, or multi-modal recognition).

We have a wide variety of interesting research areas we would like to pursue, where the solutions will require innovating on the current state of the art. Great candidates will be those who want to shape the future of this space, can execute ideas effectively and efficiently, and are passionate about emotion research.


  • Run a multitude of deep learning experiments
    • Prototype new ideas
    • Explore a variety of approaches
    • Refine promising ideas into product-ready models
  • Explore new methods to leverage Affectiva’s large dataset of spontaneous real-world audio-visual data
  • Patent and publish findings in computer vision, speech processing, and affective computing conferences


  • At least 2 years of experience using deep learning techniques (CNN, RNN/LSTM) on multi-modal (vision, speech) tasks (audio and video classification, action recognition)
  • Experience working with deep learning frameworks (e.g. Keras, TensorFlow, Theano, Caffe) including implementing custom layers
  • Passion for innovation and for pushing state-of-the-art research
  • Strong Python programming skills
  • Demonstrated experience (publications, projects) solving machine learning problems
  • Master's or PhD in the field of computer vision or speech processing
  • Experience working in one of the following fields is highly desirable:
    • Facial analysis – face detection, face recognition, expression and emotion classification, face landmark tracking
    • Speech analysis – source separation, speaker diarization or identification, emotion classification
  • Good presentation and communication skills
