Mitigating Director Gender Bias in Movie Recommender Systems


Michael Garcia-Perez

Christine Deng

MENTORS: Emily Ramond, Parker Addison, Greg Thein

GitHub Repo | Report


Studies in sociology and media studies have revealed a gender gap in the film industry, with an underrepresentation of female directors in film production.

The implications of this disparity for recommendation systems are not widely researched.

Many content distribution platforms (like Netflix) utilize recommendation models for personalized user content.

Widely adopted bias mitigation tools (e.g., IBM’s AI Fairness 360) are optimized for regression and classification tasks, not recommendation tasks.

Our aim is to develop a fair movie recommender system that minimizes biases associated with the director’s gender.


The data for user ratings is sourced from MovieLens, an online platform that provides personalized movie recommendations based on users’ viewing preferences and rating history.

The data on the director’s gender comes from Northwestern University’s Amaral Lab, which looks at the gender breakdown of the crew of U.S. films released between 1894 and 2011.

IMDb’s dataset on titles and identifiers is utilized to combine the two datasets mentioned above.

Full procedures on data cleaning and merging can be found in the report linked on the page.

To prepare our data for model development, we binarize the director gender column.
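As a minimal sketch of this binarization step (the column names and gender labels here are illustrative, not the project's actual schema), each film is mapped to 1 if it is directed entirely by men and 0 otherwise:

```python
import pandas as pd

# Hypothetical merged dataset; column names are illustrative only.
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "director_genders": [["male"], ["female"], ["male", "female"]],
})

# Binarize: 1 if the film is directed entirely by men, 0 otherwise.
movies["male_directed"] = movies["director_genders"].apply(
    lambda genders: int(all(g == "male" for g in genders))
)

print(movies[["title", "male_directed"]])
```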

Distribution of Rating Scores

We further separate this by the director’s gender to compare the proportion of ratings between male versus female directors.

Proportion of Ratings by Director Gender

Note the strong class imbalance in the dataset: most movies are directed entirely by men.

Using AIF360 to assess bias, we measure the Disparate Impact and Statistical Parity Difference of our dataset.

While the Statistical Parity Difference does not notably indicate bias, the Disparate Impact shows a strong indication of bias based on director gender.
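AIF360 exposes these two measures through its BinaryLabelDatasetMetric class; the definitions themselves can be sketched by hand on toy data (the group and outcome values below are illustrative, not the project's dataset):

```python
import numpy as np

# Toy data: group (1 = male-directed, privileged; 0 = otherwise) and
# favorable outcome (1 = rating above threshold). Values illustrative.
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
favorable = np.array([1, 1, 1, 0, 1, 0, 0, 0])

p_priv = favorable[group == 1].mean()    # P(favorable | privileged)
p_unpriv = favorable[group == 0].mean()  # P(favorable | unprivileged)

disparate_impact = p_unpriv / p_priv         # ideal value: 1.0
statistical_parity_diff = p_unpriv - p_priv  # ideal value: 0.0

print(disparate_impact, statistical_parity_diff)
```

A Disparate Impact well below 1 (here 1/3) signals that the unprivileged group receives the favorable outcome far less often than the privileged group.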

Model Development & Evaluation

AIF360’s bias mitigation techniques are optimized for classification and regression tasks, so we first examine bias mitigation in our dataset on a classifier model.

Random Forest (Classification Algorithm):

Bias Metrics Before Mitigation:

Accuracy Before Mitigation: 0.787
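A classifier of this kind can be sketched with scikit-learn as below; the features and labels are synthetic stand-ins for the project's actual data, and the feature names in the comments are purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic setup: predict whether a movie receives a high rating from a
# few numeric features (e.g. year, popularity, binarized director gender).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)  # synthetic "high rating"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(round(accuracy, 3))
```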

Recommender Systems

Recommender System using Jaccard Similarity

We investigate whether recommender systems using different similarity measures are able to provide more diverse recommendations. We look at Cosine Similarity, a popular measure for capturing nuanced relationships between items and users, and Pearson Correlation, which captures linear relationships between variables.

Recommender System using Cosine Similarity

Recommender System using Pearson Correlation
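The three similarity measures above can be sketched on a pair of toy item rating vectors (users as dimensions, 0 meaning unrated; the values are illustrative):

```python
import numpy as np

# Toy rating vectors for two items across four users.
a = np.array([5.0, 3.0, 0.0, 4.0])
b = np.array([4.0, 0.0, 0.0, 5.0])

# Jaccard similarity on the sets of users who rated each item.
rated_a, rated_b = a > 0, b > 0
jaccard = (rated_a & rated_b).sum() / (rated_a | rated_b).sum()

# Cosine similarity on the raw rating vectors.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pearson correlation: cosine similarity of the mean-centered vectors.
pearson = np.corrcoef(a, b)[0, 1]

print(round(jaccard, 3), round(cosine, 3), round(pearson, 3))
```

Jaccard ignores rating magnitudes entirely, cosine uses them directly, and Pearson removes each vector's mean first, which is why the three measures can rank the same item pairs differently.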

It is inconclusive which similarity function yields the least biased recommendations, since it varies per user.

Beyond similarity metrics, recommender systems can use Singular Value Decomposition (SVD), a Matrix Factorization-based algorithm that captures latent factors underlying user preferences.

Recommender System using Singular Value Decomposition
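A minimal sketch of the SVD approach on a toy user-item matrix (values illustrative): the matrix is factorized, truncated to k latent factors, and the reconstruction supplies predicted scores for unrated items.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values illustrative.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
])

# Truncated SVD: keep k latent factors and reconstruct the matrix.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat[u, i] is the predicted score for user u on (possibly unrated) item i.
print(np.round(R_hat, 2))
```

In practice, recommender libraries learn the latent factors by minimizing error only over observed ratings rather than treating zeros as true scores, but the low-rank reconstruction idea is the same.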

Bias Mitigation

We employ Reweighing, a pre-processing bias mitigation technique that adjusts the instance weights across the different groups and labels to mitigate bias.
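The weight Reweighing assigns to each (group, label) cell is the ratio of its expected count under independence to its observed count; a manual sketch of this computation (AIF360's Reweighing class does the equivalent, and the toy group/label values here are illustrative):

```python
import numpy as np

# Toy instances: protected group (1 = male-directed) and label
# (1 = favorable outcome). Values illustrative.
group = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
label = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])

n = len(group)
weights = np.empty(n)
for g in (0, 1):
    for l in (0, 1):
        mask = (group == g) & (label == l)
        # Expected count under independence / observed count.
        weights[mask] = ((group == g).sum() * (label == l).sum()) / (n * mask.sum())

# After reweighing, the weighted favorable rate is equal across groups.
def weighted_rate(g):
    return (weights * label)[group == g].sum() / weights[group == g].sum()

print(round(weighted_rate(1), 3), round(weighted_rate(0), 3))
```

With these weights, the weighted Disparate Impact of the training data becomes exactly 1, which is why the technique works well for weight-aware learners such as the Random Forest Classifier.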

Random Forest Pre-Processing Bias Mitigation

The similarity metrics used in our recommender models above (Jaccard, Cosine, and Pearson correlation) rely on the dataset to calculate similarities between movies.

We develop the models again using the same similarity metrics (Jaccard, Cosine, and Pearson Correlation) on the transformed dataset.

Like the other similarity metrics above, SVD relies on user-item interaction data to make recommendations.


Reweighing as a pre-processing bias mitigation technique was successful for the Random Forest Classifier, but remains limited in its application to recommender systems.

Despite applying the reweighted dataset to the recommender models utilizing Jaccard and Cosine Similarity and Pearson Correlation, there was no difference in the diversity of movie recommendations.

However, reweighing the recommender system using SVD was able to yield a Disparate Impact closer to 1, effectively mitigating some bias in the predictions of rating scores.


The efficacy of AIF360’s Reweighing technique on bias mitigation in recommender systems is subject to ongoing discourse.

Because female film directors are underrepresented in the dataset, reweighing compensates by adjusting the weight of instances across the different groups.

Additionally, examining the director’s gender alone does not capture the full scope of bias in movie recommender systems.

Lastly, while the bias mitigation techniques in this project were able to achieve a fair model in terms of statistical fairness metrics, these techniques do not fully address the underlying systemic biases present in society.


  1. Karniouchina, E. V., Carson, S. J., Theokary, C., Rice, L., & Reilly, S. (2023). Women and minority film directors in Hollywood: Performance implications of product development and distribution biases. Journal of Marketing Research, 60(1), 25-51. [DOI: 10.1177/00222437221100217] 

  2. Smith, S. L., Choueiti, M., & Pieper, K. (2018). Inclusion in the Director’s Chair: Examining 1,100 Popular Films.