
Course Title: Agec211: Statistical methods
Instructor: Christopher Llones
Assignment: Netflix Dataset Analysis in R
Due Date: 9 October 2025
Objective
This assignment assesses your ability to apply R programming skills, specifically the dplyr package and the pipe operator (%>%), to explore and analyze a real-world dataset. You will work with the Netflix Movies & TV Shows dataset and answer each question with code.
Instructions
Use R and the dplyr package to answer each question.
Submit your R script file (.R) with your code and outputs.
Use the pipe operator (%>%) for all data manipulations.
You may use additional packages like tidyr or stringr if needed.
Ensure your code is clean, commented, and reproducible.
Access the dataset and R script template from the agec211-assignment1 folder.
Submit your completed R script file (.R) by the due date and upload using this link: Submission Link.
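As a rough illustration of the expected script style (commented, reproducible, piped), the sketch below chains a few dplyr verbs on a small made-up data frame. The data frame `toy`, its values, and the commented-out file name are hypothetical and are not part of the assignment materials; use the dataset and template from the assignment folder for your actual answers.

```r
# Load packages at the top of the script so it runs from a clean session.
library(dplyr)

# Hypothetical: the real file name is in the assignment folder.
# netflix <- read.csv("netflix_titles.csv")

# Made-up toy data, NOT the Netflix dataset.
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  type = c("Movie", "TV Show", "Movie", "Movie"),
  release_year = c(2019, 2020, 2020, 2021)
)

# Each pipe step hands its result to the next verb.
movies_2020_onward <- toy %>%
  filter(type == "Movie", release_year >= 2020) %>%  # keep matching rows
  arrange(desc(release_year)) %>%                    # newest first
  select(title, release_year)                        # keep two columns

print(movies_2020_onward)
```

Reading the chain top to bottom should describe the analysis in plain steps, which is the main reason for using the pipe throughout.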
Questions
Part 1: Data exploration
How many rows and columns are in the dataset?
List all unique types of content (e.g., Movie, TV Show).
How many titles were released in 2020?
Part 2: Filtering and summarising
Filter the dataset to show only TV Shows released in India. How many are there?
Find the top 5 most common ratings.
Which year had the most titles added to Netflix?
Part 3: Grouping and aggregation
Group the data by type and count how many entries each type has.
Group the data by release_year and summarize the number of titles released per year.
Which country has produced the most content on Netflix?
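The grouping questions above rely on two closely related patterns: `count()` and `group_by()` followed by `summarise()`. The sketch below shows both on a made-up data frame; `toy` and its values are hypothetical, and only the column names `type` and `release_year` come from the questions.

```r
library(dplyr)

# Made-up toy data standing in for the real dataset.
toy <- data.frame(
  type = c("Movie", "TV Show", "Movie", "Movie", "TV Show"),
  release_year = c(2019, 2020, 2020, 2020, 2021)
)

# count() is shorthand for group_by() + summarise(n = n()).
by_type <- toy %>%
  count(type, sort = TRUE)

# The explicit form is useful when you need more than one summary.
by_year <- toy %>%
  group_by(release_year) %>%
  summarise(n_titles = n(), .groups = "drop") %>%
  arrange(desc(n_titles))

print(by_type)
print(by_year)
```

Sorting the counts in descending order puts the largest group in the first row, which is one way to approach "which ... has the most" questions.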
Advanced Filtering
Filter the dataset to show all Movies with a duration longer than 100 minutes.
Find all titles directed by ‘Steven Spielberg’.
List all titles with the genre containing ‘Documentary’.
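These filters involve text columns, so a string helper is usually needed. The sketch below is one possible approach on made-up data, assuming the duration is stored as text such as "120 min" and that genres are stored as a comma-separated string. The data frame `toy` and the column names `duration` and `genre` are hypothetical; check the actual column names and formats in the dataset before adapting it.

```r
library(dplyr)
library(stringr)

# Made-up toy data; column names and formats are assumptions.
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  type = c("Movie", "Movie", "TV Show", "Movie"),
  duration = c("95 min", "120 min", "2 Seasons", "101 min"),
  genre = c("Comedies", "Documentaries, Sports", NA, "Dramas")
)

# Pull the leading number out of the text, then filter on it.
long_movies <- toy %>%
  filter(type == "Movie") %>%
  mutate(minutes = as.numeric(str_extract(duration, "\\d+"))) %>%
  filter(minutes > 100)

# str_detect() returns NA for missing values, and filter() drops those rows.
docs <- toy %>%
  filter(str_detect(genre, "Documentar"))

print(long_movies)
print(docs)
```

Matching on the stem "Documentar" catches both "Documentary" and "Documentaries"; whether that is the right pattern depends on how genres are spelled in the real data.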
Bonus Challenge
Create a new column that extracts the number of seasons for TV Shows. Then, find the average number of seasons.
Which actor appears most frequently across all titles?
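When several names are packed into a single cell, one common approach is to split the cell into one row per name before counting. The sketch below uses `tidyr::separate_rows()` on made-up data; the data frame `toy`, the column name `cast`, and the comma separator are assumptions, so verify them against the actual dataset.

```r
library(dplyr)
library(tidyr)

# Made-up toy data; the column name and separator are assumptions.
toy <- data.frame(
  title = c("A", "B", "C"),
  cast = c("Ann Lee, Bo Cruz", "Bo Cruz", NA)
)

cast_counts <- toy %>%
  filter(!is.na(cast)) %>%                    # drop titles with no cast listed
  separate_rows(cast, sep = ",\\s*") %>%      # one row per name
  count(cast, sort = TRUE)                    # most frequent name first

print(cast_counts)
```

The regular expression `",\\s*"` splits on a comma and swallows any following spaces, so names do not end up with leading whitespace that would split one person into two groups.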
Grading rubric
| Criteria | Excellent (5 pts) | Good (4 pts) | Fair (2-3 pts) | Needs improvement (0-1 pt) |
|---|---|---|---|---|
| Code accuracy | All answers are correct and match expected outputs. | Most answers are correct with minor errors. | Several answers are incorrect or incomplete. | Many answers are missing or incorrect. |
| Use of dplyr Functions | Consistently uses appropriate dplyr verbs (filter, mutate, summarise, etc.). | Uses dplyr functions correctly in most cases. | Uses some dplyr functions but inconsistently or incorrectly. | Rarely uses dplyr or misuses functions. |
| Pipe Operator Usage (%>%) | Pipe operator is used fluently and correctly throughout. | Mostly correct usage with occasional syntax issues. | Used sporadically or with frequent errors. | Not used or used incorrectly. |
| Data Manipulation & Filtering | Demonstrates strong understanding of filtering, grouping, and summarizing. | Shows good grasp with minor gaps. | Basic filtering and grouping attempted but lacks depth. | Little to no meaningful data manipulation. |
| Insight & Interpretation | Provides thoughtful insights or observations where applicable. | Some interpretation is present. | Minimal interpretation or unclear reasoning. | No interpretation or irrelevant commentary. |
| Bonus Challenge (Q13–Q14) | Completed with correct logic and creative approach. | Attempted with mostly correct logic. | Attempted but contains errors or lacks clarity. | Not attempted or incorrect. |
| Reproducibility | Code runs without errors and produces expected results. | Minor issues but generally reproducible. | Some errors prevent full reproducibility. | Code fails to run or produces major errors. |