
Data Best Practices for the New Year

Happy New Year, OUE! As we stride enthusiastically (or reluctantly) into 2025, it’s a great time to review best practices for working with the various types of data we encounter in our work. We rely on many kinds of data to do that work well, and a healthy data environment is crucial to supporting and informing what we do in OUE and beyond. Several best practices, along with resources to support them, are highlighted below. As always, don’t hesitate to reach out to the OUE Research team (molly.weeks@duke.edu) with questions or for a consultation. We’re here to help!

  1. Know the sensitivity of your data according to Duke’s Data Classification Standard

    As academic professionals, many of us regularly work with student records data that are protected by the Family Educational Rights and Privacy Act, or FERPA. Helpful information and a tutorial about understanding FERPA are available here. FERPA-protected data are considered “sensitive” under the Duke Data Classification Standard, but other data we work with may be considered “restricted,” or, in rare cases, “public.”
     
  2. Store and transfer data in Duke-approved settings according to their sensitivity

    Once you know the classification of the data you’re working with, be sure to store and transfer those data using approved Duke services. Controlled-access cloud services such as Duke Box and Duke’s instance of Microsoft OneDrive are approved for the storage and transfer of sensitive data, and they provide a secure way to store and share data of any classification. Your computer desktop is not generally a secure place to store data, and e-mail is not a secure mechanism for sharing data.
     
  3. Wherever possible, work with and store deidentified data, and report aggregated, as opposed to individual-level, data

    In many cases, access to personally identifiable information is integral to our work. In other cases, we don’t need or want to know which data come from which individuals. A great principle for protecting privacy and confidentiality is to work with data that contain as little identifiable information as possible. When it’s not necessary to know the identity of individuals, we recommend removing unique identifiers (such as name, e-mail address, NetID, Duke Unique ID, or EmplID), or not collecting unique identifiers in the first place.

    Beyond direct identifiers, it can be helpful to consider whether other pieces of information, alone or in combination, can be used to deduce the identity of an individual (often referred to in research settings as deductive disclosure). Considering deductive disclosure risk is especially relevant when reporting deidentified individual-level or aggregated data to broader audiences (for information about reducing risk of deductive disclosure in a research setting, see https://www.icpsr.umich.edu/web/pages/DSDR/disclosure.html).
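
For those who handle data in code, the two ideas above (dropping direct identifiers and reporting aggregates) can be sketched in a few lines of Python. This is a minimal illustration only: the column names are hypothetical, and the minimum cell size of 5 is an assumed threshold for the example, not a Duke policy. Masking small counts is one common way to reduce deductive disclosure risk; check with the OUE Research team about what is appropriate for your data.

```python
from collections import Counter

# Hypothetical column names; adjust to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "netid", "duke_unique_id", "emplid"}

# Assumed threshold for illustration only, not an official rule.
MIN_CELL_SIZE = 5


def deidentify(records):
    """Return copies of the records with direct-identifier fields removed."""
    return [
        {key: value for key, value in record.items()
         if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def aggregate_counts(records, group_field, min_cell_size=MIN_CELL_SIZE):
    """Count records per group, masking counts below min_cell_size."""
    counts = Counter(record[group_field] for record in records)
    return {
        group: (count if count >= min_cell_size else f"<{min_cell_size}")
        for group, count in counts.items()
    }


if __name__ == "__main__":
    # Made-up rows for demonstration; not real student data.
    rows = [{"name": "Student A", "netid": "aa1", "major": "Biology"}] * 6
    rows += [{"name": "Student B", "netid": "bb2", "major": "Physics"}]
    print(aggregate_counts(deidentify(rows), "major"))
```

Removing the listed columns does not by itself guarantee anonymity, since combinations of the remaining fields can still point to an individual, which is why the small-count masking step is included.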
     
  4. Delete temporary files after downloading data from the web (e.g., Qualtrics)

    When you download data from Duke services on the web, such as Qualtrics, a copy of the data file is saved locally on your machine in a temporary, downloads, or similar folder, which is not a secure location. A best practice is to delete all temporary copies of data files from your machine and then empty the Recycle Bin or Trash so that no local copies of the dataset remain.
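
If you download exports often, a small script can help you find leftover copies. The sketch below is one possible approach, not an official tool: the folder location and file-name pattern are hypothetical and will differ on your machine. By default it only lists matching files so you can review them before anything is deleted.

```python
from pathlib import Path


def remove_downloaded_exports(folder, pattern, dry_run=True):
    """Find files in `folder` whose names match `pattern`.

    With dry_run=True (the default) nothing is deleted and the matches
    are only returned for review. With dry_run=False they are deleted.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    matches = sorted(p for p in folder.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            # unlink() removes the file directly; it is not sent to the trash.
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical location and file-name pattern; adjust both as needed.
    downloads = Path.home() / "Downloads"
    for path in remove_downloaded_exports(downloads, "My_Survey_*.csv"):
        print("Would delete:", path)
```

Files you have already moved to the Recycle Bin or Trash are not touched by this script, so emptying it is still a separate step.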