Data management with Python

Research questions:

  1. Does having a lot of energy help you get started doing things?
  2. Does the amount of weekly video or computer game playing affect having a lot of energy?

Summary: The selected variables contained values to be ignored, such as "Refused", "Don't know" and "Not applicable", which are coded with different numbers in each variable (6, 8, 9, 996, 998). These answers were recoded as NaN. The numeric answer codes of the variables were also recoded to include a textual description, to make the output more readable. For video and computer game playing hours per week (H1DA10), the data was split into two groups at the 50% and 100% percentiles.
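The recoding steps described in the summary can be sketched roughly as follows. This is a minimal illustration on made-up data, not the actual assignment code: only the variable name H1DA10 and the codes 6, 8, 996 and 998 come from the text, while the column ENERGY, the label texts and the toy values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the survey file (values are invented).
# "ENERGY" is a hypothetical column name; only H1DA10 appears in the text.
data = pd.DataFrame({
    "H1DA10": [0, 1, 0, 3, 996, 998, 10, 1],  # hours per week; 996/998 = answers to ignore
    "ENERGY": [1, 2, 6, 3, 8, 4, 0, 2],       # answer codes; 6/8 = answers to ignore
})

# Code out the "Refused" / "Don't know" style answers as NaN, variable by variable
data["H1DA10"] = data["H1DA10"].replace([996, 998], np.nan)
data["ENERGY"] = data["ENERGY"].replace([6, 8], np.nan)

# Add a textual description next to each numeric code (label texts are illustrative)
energy_labels = {
    0: "0 never",
    1: "1 rarely",
    2: "2 sometimes",
    3: "3 often",
    4: "4 every day",
}
data["ENERGY_LABEL"] = data["ENERGY"].map(energy_labels)

# Count how many answers were coded out in total
missing_total = int(data[["H1DA10", "ENERGY"]].isna().sum().sum())
print(data)
print("Ignored answers:", missing_total)
```

Keeping the numeric column and adding the labels as a separate column is just one possible choice; it leaves the original codes available for calculations while the labelled column is used for printing.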

Results: Across all three variables, a total of 52 answers out of N=6504 were ignored. An interesting finding was that most of the respondents did not play video or computer games at all, or played just one hour per week (50% percentile). It took me some time to find out that it is not possible to use, for example, 25% percentiles with this data: four equal-sized groups could not be formed, because most of the answers fell into the first two alternatives. Trying to form four groups ended with an unclear error message stating:

ValueError: Bin edges must be unique: array([ 0., 0., 1., 3., 99.])
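
The error can be reproduced with a small example. The sketch below assumes the groups were formed with pandas qcut, which is the function that produces this message. The toy values are invented so that the quartile edges come out as 0, 0, 1, 3 and 99, the same as in the message above, and the group labels are illustrative. The duplicates="drop" variant at the end is an alternative that pandas offers, not something used in this assignment.

```python
import pandas as pd

# Invented hours-per-week values; most answers are 0 or 1, as in the real data
hours = pd.Series([0, 0, 0, 1, 1, 3, 3, 99], name="H1DA10")

# Four groups: the 0% and 25% edges are both 0, so the bins are not unique
try:
    pd.qcut(hours, 4)
    quartiles_failed = False
except ValueError as err:
    quartiles_failed = True
    print("qcut with 4 groups failed:", err)

# Two groups (50% and 100% percentiles) work, because the edges 0, 1, 99 are unique
two_groups = pd.qcut(hours, 2, labels=["0-1 hours", "2+ hours"])
print(two_groups.value_counts(sort=False))

# Alternative: let pandas drop the duplicate edge, which leaves three unequal groups
three_groups = pd.qcut(hours, 4, duplicates="drop")
print(three_groups.value_counts(sort=False))
```

With duplicates="drop" the groups are no longer equal in size, so splitting at the 50% percentile remains the simpler choice when two groups are enough.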