Question 74

The most productive actors by category

Instructions:

  • An actor’s productivity is defined as the number of movies he/she has played.
  • Write a query to return the category_id, actor_id and number of moviesby the most productive actor in that category.
  • For example: John Doe filmed the most action movies, your query will return John as the result for action movie category.
  • Do this for every movie category.

Table 1: actor

  col_name   | col_type
-------------+--------------------------
 actor_id    | integer
 first_name  | text
 last_name   | text

Table 2: film_actor

Films and their casts

  col_name   | col_type
-------------+--------------------------
 actor_id    | smallint
 film_id     | smallint

Table 3: film_category

A film can only belong to one category

  col_name   | col_type
-------------+--------------------------
 film_id     | smallint
 category_id | smallint

Sample results

 category_id | actor_id | num_movies
-------------+----------+------------
           1 |       50 |          6
           2 |      150 |          6
           3 |       17 |          7
           4 |       86 |          6
           5 |      196 |          6
           6 |       48 |          6
           7 |        7 |          7

Solution

postgres
WITH actor_movies AS (
  SELECT 
    FC.category_id,
    FA.actor_id, 
    COUNT(DISTINCT F.film_id) num_movies
  FROM film_actor FA
  INNER JOIN film F
  ON F.film_id = FA.film_id
  INNER JOIN film_category FC
  ON FC.film_id = F.film_id
  GROUP BY FC.category_id, FA.actor_id
)
SELECT category_id, actor_id, num_movies
FROM (
	SELECT 
		category_id, 
		actor_id, 
		num_movies,
		ROW_NUMBER()OVER(PARTITION BY category_id ORDER BY num_movies DESC) AS productivity_idx
	FROM actor_movies
) X
WHERE productivity_idx = 1;

Explanation

This query retrieves the most productive actors for each film category based on the number of movies they have appeared in. It does this by first creating a CTE (Common Table Expression) called actor_movies that combines data from three tables: film_actor, film, and film_category. It selects the category ID, actor ID, and the count of distinct film IDs for each actor and category combination, and groups the results by category and actor.

Then, the main query selects the category ID, actor ID, and number of movies from the actor_movies CTE, and adds a calculated column called productivity_idx. This column is generated using the ROW_NUMBER() function, which assigns a sequential number to each row within a category based on the number of movies the actor has appeared in, in descending order. So, the most productive actor for each category will have a productivity_idx of 1.

Finally, the outer query filters the results to only show the rows where productivity_idx equals 1, i.e., the most productive actor for each category.

Copied

Last Submission

postgres

No submission yet for this engine. Run and submit your query to save it here.

Copied
Expected results

Submit a query to compare against expected output.