Really appreciate that Leon! Cheers
Luming,
Nov. 26, 2021, 11:59 p.m.
Hi All - a little late to this thread, but I was struggling with this question my self. I wanted to follow up on the business logic above of ensuring apples to apples comparisions by making sure there was enough information on trip experience. After some investigation, it turns out that the number of trips per restaurant, as defined by the trip table, is always greater than the number of trip ratings a restaurant received, as defined by the trip_rating table. This produces some interesting edge cases at the threshold of 10. Using the following query:
SELECT city,
restaurant_id,
SUM(IF(thumb_up IS TRUE, 1, 0)) / COUNT(DISTINCT t.trip_id) AS avg_rating_t,
SUM(IF(thumb_up IS TRUE, 1, 0)) / COUNT(DISTINCT tr.trip_id) AS avg_rating_tr,
COUNT(t.trip_id) AS num_trips_t,
COUNT(tr.trip_id) AS num_trips_tr
FROM trip AS t
LEFT JOIN trip_rating AS tr
ON t.trip_id = tr.trip_id
GROUP BY city, restaurant_id
HAVING num_trips_t >= 10
AND num_trips_tr < 10;
I generated the table:
| city |
restaurant_id |
avg_rating_t |
avg_rating_tr |
num_trips_t |
num_trips_tr |
| Berlin |
100134 |
0.7000 |
0.7778 |
10 |
9 |
| Chicago |
100096 |
0.6000 |
0.8571 |
10 |
7 |
| Chicago |
100097 |
0.6000 |
0.7500 |
10 |
8 |
| Delhi |
100099 |
0.5000 |
0.6667 |
12 |
9 |
| Los Angeles |
100135 |
0.7000 |
0.7778 |
10 |
9 |
| Portland |
100090 |
0.6667 |
0.8889 |
12 |
9 |
| Portland |
100115 |
0.6364 |
0.8750 |
11 |
8 |
| Seattle |
100079 |
0.8182 |
1.0000 |
11 |
9 |
| Tokyo |
100141 |
0.8000 |
0.8889 |
10 |
9 |
What is interesting to me is that the solution effectively uses num_trips_t to determine if the restaurant is included in the results, but it uses num_trips_tr to calculate the average rating. This ends up affecting the results such that restaurant 100079 in Seattle is ranked in the top three for the solution, but if you were to use the same num_trips (either t or tr) as both the threshold to be included AND as the denominator in calculating the average rating you get the results:
- num_trips_t produces an avg_rating of 0.8182, which ranks 4th (out of our top 3 spots)
OR
- num_trips_tr excludes the restaurant from the rankings all together because it is less than 10
Further, it would be possible with the current solution to have a restaurant with 10 or more trips as per the trip table but only one rating as per the trip_rating table with a 'thumbs-up', which would rank it as number one in that city, but with only one review.
So my question is, do we consider this a fair apples to apples comparison? And I appreciate this is far more of a business logic question than a pure SQL question, but thought it was worth asking since I found this interesting!
Thanks!
Will
Will,
Jan. 24, 2023, 11:31 a.m.