1: Write Python code to extract the data from YouTube. The extraction should run in a Hadoop environment.
2: Load the extracted file from the local filesystem into HDFS, then create the Hive and Impala tables.
3: Load the data into the Hive and Impala tables.
4: Segmentation: load the data from Hive or Impala using PySpark.
5: Create a DataFrame from the loaded data.
6: Build analyses such as:
A: Number of users who watch videos related to depositing money in banks.
B: Number of users who transfer money within the same bank and to external bank accounts.
C: Segmentation: find the location, age, and number of comments, likes, and feedback of the users who watch the videos.
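
For steps 2 and 3, a minimal sketch of the ingestion side might look like the following. It only builds the shell commands and SQL statements rather than running them, so it can be reviewed without a cluster. The file path, HDFS directory, table name, and column list are all hypothetical placeholders, since the schema is not defined above. It assumes Hive and Impala share one metastore, so a table created in Hive becomes visible to Impala after a metadata refresh.

```python
import shlex

# Hypothetical placeholders: none of these names come from the requirements.
LOCAL_FILE = "/tmp/youtube_videos.csv"
HDFS_DIR = "/user/hive/staging/youtube"
TABLE = "youtube_videos"


def hdfs_put_commands(local_path, hdfs_dir):
    """Return the shell commands that copy a local file into HDFS."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
    ]


def hive_statements(table, hdfs_dir):
    """Return HiveQL that creates a CSV-backed table and loads the staged data."""
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  video_id STRING,\n"       # assumed column
        "  title STRING,\n"          # assumed column
        "  user_id STRING,\n"        # assumed column
        "  location STRING,\n"       # assumed column
        "  age INT,\n"               # assumed column
        "  comment_count BIGINT,\n"  # assumed column
        "  like_count BIGINT\n"      # assumed column
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE"
    )
    # LOAD DATA INPATH moves the staged files into the table's location.
    load = f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {table}"
    return [create, load]


def impala_statements(table):
    """Return the Impala statement that picks up a table created through Hive."""
    return [f"INVALIDATE METADATA {table}"]


if __name__ == "__main__":
    for cmd in hdfs_put_commands(LOCAL_FILE, HDFS_DIR):
        print(shlex.join(cmd))
    for stmt in hive_statements(TABLE, HDFS_DIR) + impala_statements(TABLE):
        print(stmt + ";")
```

On a real cluster the commands could be passed to `subprocess.run(cmd, check=True)`, and the statements submitted through whichever Hive and Impala clients are available there.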
It would be best to set up a Cloudera environment, so that everything can be done there in Python.
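
In such an environment, steps 4 to 6 could be sketched in PySpark roughly as follows. This is an illustration under stated assumptions, not a finished implementation: the table name, the column names (`title`, `user_id`, `location`, `age`, `comment_count`, `like_count`), and the keyword list are hypothetical. It covers analyses A and C only, because B depends on bank-transfer data whose source is not described above.

```python
# Hypothetical table name and keywords: the requirements do not define them.
TABLE = "youtube_videos"
DEPOSIT_KEYWORDS = ["deposit", "savings account"]


def keyword_filter(column, keywords):
    """Build a SQL predicate that matches any keyword, case-insensitively.

    Assumes the keywords contain no quote characters.
    """
    parts = [f"lower({column}) LIKE '%{kw.lower()}%'" for kw in keywords]
    return "(" + " OR ".join(parts) + ")"


def run_analysis(table=TABLE, keywords=DEPOSIT_KEYWORDS):
    """Load the Hive table into a DataFrame and compute analyses A and C."""
    # Imported here so keyword_filter can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("youtube-segmentation")
        .enableHiveSupport()  # lets Spark read tables from the Hive metastore
        .getOrCreate()
    )

    # Steps 4 and 5: read the Hive table as a DataFrame.
    df = spark.table(table)

    # Analysis A: distinct users who watched videos matching the keywords.
    deposit_views = df.filter(F.expr(keyword_filter("title", keywords)))
    user_count = deposit_views.select("user_id").distinct().count()

    # Analysis C: segment those viewers by location and age.
    segments = (
        deposit_views
        .groupBy("location", "age")
        .agg(
            F.countDistinct("user_id").alias("users"),
            F.sum("comment_count").alias("comments"),
            F.sum("like_count").alias("likes"),
        )
    )
    return user_count, segments
```

Calling `run_analysis()` on the cluster would return the user count and a segment DataFrame, which can be inspected with `segments.show()`.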