Frequency distribution with Python

Summary: This is the blog post for the second peer-graded assignment of the Coursera course “Data Management and Visualization”. It demonstrates frequency distributions for three variables using Python and the “AddHealth” dataset (N=6504).

The output supports the following findings:

  • None of the three variables had missing data. I verified this by passing dropna=False when counting values and checking that the counts for each variable sum to N
  • Most respondents agree or strongly agree that they are well coordinated. Only 10 respondents refused to answer and 31 answered “don’t know”
  • Most respondents agree or strongly agree that they have a lot of energy. Only 10 respondents refused to answer, 12 answered “don’t know” and 1 answered “not applicable”
  • I also checked whether males and females differ in strongly agreeing that they are well coordinated. They do: 1252 men strongly agree, compared with 934 women
  • The dataset differs from the overall population (where men slightly outnumber women) in that it contains slightly more women. This could be because more women attend college.
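
The checks behind these findings can be sketched roughly as follows. This is a minimal illustration and not the assignment code itself: the toy DataFrame, the column names `sex` and `coordinated`, and the helper `frequency_table` are all hypothetical stand-ins for the AddHealth variables.

```python
import pandas as pd

# Toy data standing in for the AddHealth file.
# Column names and values are hypothetical, chosen only for illustration.
data = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "coordinated": ["strongly agree", "agree", "strongly agree",
                    "strongly agree", None, "refused"],
})


def frequency_table(series):
    """Return counts and percentages per category, keeping missing values."""
    # dropna=False keeps missing values as their own category
    counts = series.value_counts(sort=False, dropna=False)
    percent = counts / counts.sum() * 100
    return pd.DataFrame({"count": counts, "percent": percent})


table = frequency_table(data["coordinated"])
print(table)

# With missing values included, the counts must add up to the number of rows
print(table["count"].sum() == len(data))

# Number of missing answers (1 in this toy data, 0 in the real variables)
missing = data["coordinated"].isna().sum()
print(missing)

# Cross-tabulate answers by sex to compare men and women
by_sex = pd.crosstab(data["sex"], data["coordinated"])
print(by_sex)
```

Keeping the missing values in the count is what makes the "sums to N" check meaningful: if they were dropped, the total would silently fall below N.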

Code

Output

Codebook refinements