Data cleaning in SPSS

  • Published: 25 Jul 2014
  • How to find and correct obvious errors using SPSS. More information is available at: science-network.tv/clean-data-...
  • Science

Comments • 11

  • @aungyephone6366 5 years ago +4

    Thank you for your clear explanation of finding missing data.

  • @mariadominiquecampos28 7 years ago +4

    Yes! Thank you so much for the information! I did my homework. Regards from Nicaragua.

  • @nwogoekeji4230 3 years ago

    Thank you for the tutorial; it was meaningful and helpful.

  • @muwongejosephjunior6131 6 years ago +1

    Thank you!

  • @louisaschmidt1020 5 years ago +1

    Thank you. But I have a problem. Somehow, frequencies are not shown in the output for one of my variables. Also, when I am in the frequency analysis window, this variable is in brackets. Do you know what the problem could be?

  • @daigohayden89 5 years ago

    Which version of SPSS do you use?

    • @science-network 4 years ago +1

      It was probably version 22 or 23 when this video was recorded. However, it should work the same way in more recent versions of SPSS.

  • @polished_kiwi227 8 years ago

    But what if there's no way to fact-check and find what the value "should" be? I have coded impossible values as missing, but I'm not sure where to go from here as far as making my data set clean...

    • @science-network 8 years ago +1

      Normally you plan a project so that one column (variable) holds a unique ID for each participant or observation. It is a good idea to keep a separate file that links each unique ID to the original raw data. If individuals (patients, for example) are involved, you also need to consider data integrity and protection, so that link might be stored in a separate encrypted file mapping each unique ID to where the raw data can be found. The link could be, for example, a patient identification number in a hospital's electronic chart. You are in some trouble if you have no link to the raw data, because then suspicious values cannot be checked. Also, never throw away hard copies of questionnaires or any other research-related material until several years after the last publication.
      So what should you do if you cannot check against the raw data? Some values are obviously wrong, and I support changing them to missing data. Quite often, however, you find values that are unlikely but not impossible. Are they wrong or correct? There is no way to tell without a link to the actual data. My suggestion is to eliminate "impossible" observations by coding them as missing, and to leave unlikely but possible observations as they are unless you have a very good reason to believe they are errors.
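
The rule in this reply (recode impossible values to missing, keep unlikely but possible values) can be sketched roughly as follows. This is an illustration in Python, not SPSS syntax and not the method shown in the video; the `age` example, the cut-off values, and the function name `clean_value` are all hypothetical assumptions.

```python
# Hypothetical limits for an "age" variable; choose limits that fit your own data.
IMPOSSIBLE_BELOW = 0     # ages below this cannot occur
IMPOSSIBLE_ABOVE = 120   # ages above this are treated as impossible
UNLIKELY_ABOVE = 100     # possible, but worth a second look


def clean_value(value):
    """Return (cleaned_value, note) for one observation.

    Impossible values become None (missing); unlikely but possible
    values are kept unchanged and only flagged for review.
    """
    if value is None:
        return None, "missing"
    if value < IMPOSSIBLE_BELOW or value > IMPOSSIBLE_ABOVE:
        return None, "impossible: recoded to missing"
    if value > UNLIKELY_ABOVE:
        return value, "unlikely: kept, check source if available"
    return value, "ok"


# Made-up example data, including one missing entry.
ages = [34, -2, 105, 999, None, 61]
results = [clean_value(a) for a in ages]
cleaned = [value for value, _ in results]

for original, (value, note) in zip(ages, results):
    print(original, "->", value, "|", note)
```

Keeping the note next to each value leaves a record of why something was changed, which helps if the raw data become available later.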

  • @statistikochspss-hjalpen8335 3 years ago

    Very ineffective. You could use DO IF and RECODE statements instead.

    • @science-network 3 years ago

      That is true if you already know what you want to change. This video is more about exploring what kinds of errors the dataset contains and, if possible, identifying the exact record and going back to the original source to see what the correct input should be.
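
The exploratory approach described in this reply (look at which values occur, then trace odd ones back to a specific record) might look something like the sketch below. It is written in Python for illustration only and is not SPSS syntax; the records, the `sex` coding of 1 and 2, and the name `VALID_CODES` are made-up assumptions.

```python
from collections import Counter

# Hypothetical records: each has a unique id and a coded "sex" variable (1 or 2).
records = [
    {"id": 101, "sex": 1},
    {"id": 102, "sex": 2},
    {"id": 103, "sex": 22},    # looks like a typing error
    {"id": 104, "sex": 1},
    {"id": 105, "sex": None},  # missing
]
VALID_CODES = {1, 2}

# Step 1: frequency table, to see which values occur at all.
frequencies = Counter(r["sex"] for r in records)
for value, count in sorted(frequencies.items(), key=lambda item: str(item[0])):
    print(value, count)

# Step 2: list the ids of records with a value outside the valid codes,
# so each one can be checked against the original source.
suspect_ids = [r["id"] for r in records
               if r["sex"] is not None and r["sex"] not in VALID_CODES]
missing_ids = [r["id"] for r in records if r["sex"] is None]

print("check against source:", suspect_ids)
print("missing:", missing_ids)
```

Only after the suspect records have been looked up does it make sense to write a fixed recode rule for them.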