Clean Messy Data with OpenRefine

Getting Started

Importing your Data

OpenRefine can import data in CSV, TSV, and other delimiter-separated (*SV) formats, as well as Excel (.xls, .xlsx), JSON, XML, RDF as XML, and Google Sheets. Support for other formats can be added with OpenRefine extensions.

To upload your data: 

  • Use the option in the center pane to locate your data.
  • Click Next.

OpenRefine will display a preview of your data, along with import options in the lower section of the screen. To begin using OpenRefine, click the Create Project button at the top right of the screen.

OpenRefine will automatically save the changes you make to your data, as well as your operational history. To resume working on a project, select Open Project to access a list of your active projects.

Navigating OpenRefine

View Options

  • OpenRefine displays a limited number of rows at a time; use the options above the grid to show up to 50 rows per page. To check that your data imported correctly, use the navigation options at the far right to jump to the last set of rows.
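
The paging behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenRefine's own code; the function names and the 123-row sample are assumptions made for the example.

```python
def page_of_rows(rows, page_size, page_index):
    """Return the slice of rows shown on a given zero-based page."""
    start = page_index * page_size
    return rows[start:start + page_size]


def last_page_index(row_count, page_size):
    """Zero-based index of the final page (0 when there are no rows)."""
    if row_count == 0:
        return 0
    return (row_count - 1) // page_size


# Hypothetical dataset of 123 rows, viewed 50 rows at a time.
rows = [{"id": i} for i in range(1, 124)]
first = page_of_rows(rows, 50, 0)
last = page_of_rows(rows, 50, last_page_index(len(rows), 50))

# Comparing the last id on the final page with the source file's last record
# is a quick way to confirm that the whole file was imported.
print(len(first), first[0]["id"])
print(len(last), last[-1]["id"])
```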

Transformation/Edit Options

  • OpenRefine can be described as "column-centric." Access transformation options using the drop-down menu associated with the column you wish to modify.
  • Use the star and flag options in the first two columns to mark specific rows, so that transformations can be applied to those rows only.
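
As a rough illustration of what "column-centric" means, the sketch below applies one function to one column, optionally limited to starred rows. It is a minimal model under assumed names (`transform_column`, the `starred` key), not a description of how OpenRefine implements transformations.

```python
def transform_column(rows, column, func, only_starred=False):
    """Apply func to one column, optionally only on starred rows."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        if not only_starred or row.get("starred", False):
            new_row[column] = func(row[column])
        result.append(new_row)
    return result


# Hypothetical sample: only the first row is starred.
rows = [
    {"starred": True, "name": "  alice "},
    {"starred": False, "name": "  bob "},
]

# Trim whitespace in the "name" column, but only for starred rows.
cleaned = transform_column(rows, "name", str.strip, only_starred=True)
print(cleaned)
```

The unstarred row keeps its original value, which mirrors the idea of restricting a column transformation to the rows you have marked.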

Single Cell Edits

  • To edit a single cell, hover over the cell to display an edit button in the top right of the cell.
    • Note: single-cell edits are not included in the operational history that can be extracted and reused.

Undo/Redo

Any change you make to your data in OpenRefine can be undone. Each change is documented and numbered, starting from when you first create your project.

Use the Undo/Redo option to reverse or restore any single action, or a group of actions, that you have performed on the data.
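
One way to picture a numbered, reversible history is as a list of steps with a pointer to the current one. The sketch below is a simplified model that stores a full snapshot per step; the class and method names are hypothetical, and OpenRefine's internal storage is not assumed to work this way.

```python
class History:
    """Numbered list of snapshots; step 0 is the project as created."""

    def __init__(self, initial_rows):
        self.snapshots = [list(initial_rows)]
        self.descriptions = ["Create project"]
        self.position = 0

    def current(self):
        return self.snapshots[self.position]

    def apply(self, description, func):
        # In this sketch, a new change discards any undone steps
        # that sit after the current position.
        del self.snapshots[self.position + 1:]
        del self.descriptions[self.position + 1:]
        self.snapshots.append(func(self.current()))
        self.descriptions.append(description)
        self.position += 1

    def jump_to(self, step):
        """Undo or redo by moving to any numbered step."""
        if not 0 <= step < len(self.snapshots):
            raise IndexError("no such step")
        self.position = step


history = History(["  a ", "B "])
history.apply("Trim whitespace", lambda rows: [r.strip() for r in rows])
history.apply("Lowercase", lambda rows: [r.lower() for r in rows])

history.jump_to(1)            # undo the last step
after_undo = history.current()
history.jump_to(2)            # redo it
after_redo = history.current()
print(after_undo, after_redo)
```

Jumping to an earlier step undoes every action after it in one move, which corresponds to undoing a group of actions.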

Your operational history is saved along with your data as part of your project. Quitting OpenRefine and reopening your project does not disturb the operational history. 

Operational history can also be extracted and applied to additional datasets. 
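
Reusing an extracted history amounts to replaying a recorded list of steps against a different dataset. OpenRefine exports its history as JSON; the sketch below uses a deliberately simplified, hypothetical step format (`op` and `column` keys with made-up operation names), not OpenRefine's actual schema, to show the idea.

```python
import json

# Hypothetical, simplified operation names; not OpenRefine's real schema.
OPERATIONS = {
    "trim": str.strip,
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def apply_history(rows, history_json):
    """Replay each recorded step, in order, on a copy of the rows."""
    steps = json.loads(history_json)
    result = [dict(row) for row in rows]
    for step in steps:
        func = OPERATIONS[step["op"]]
        column = step["column"]
        for row in result:
            row[column] = func(row[column])
    return result


# A history "extracted" from one project...
history_json = (
    '[{"op": "trim", "column": "city"},'
    ' {"op": "lowercase", "column": "city"}]'
)

# ...replayed on a second, hypothetical dataset with the same column name.
other_dataset = [{"city": " Boston "}, {"city": "DENVER"}]
replayed = apply_history(other_dataset, history_json)
print(replayed)
```

Because the steps refer to columns by name, replaying them only works when the second dataset uses the same column names.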

Note: edits that modify the contents of a single, specific cell are not included in the operational history that can be extracted and reused.

Workshop Materials

This guide: 

Practice data: