Skip to Main Content

Today's hours:

See all library hours »

  • Ask a Librarian
  • FAQ

Clean Messy Data with OpenRefine

Transformations

Use the column drop down menus to access transformations for:

  • removing proceeding and trailing white spaces; collapsing extra spaces
  • standardizing letter case
  • removing html
  • changing cell type
  • ...and more

Use the star and flag options found in the first two columns to perform actions on specific rows/records.

Facets & Clustering

Facets

Facet options group together identical cells across rows for a particular column, and indicates the number of rows within each group. Facets can be used to:

  • identify and resolve inconsistencies
  • conduct batch edits similar to find and replace
  • gather count data

Access Facet options by clicking on the down arrow next to each column title. 

Cluster

Cluster options attempt to group together different cell values that may be alternative representations of the same thing, for example, English, english, EN, eng, ENG

Column Transformations

Reorder and/or Remove Columns

  • Using the All dropdown menu (above the star column), select Edit columns--> Re-order/remove columns. Drag and drop columns to reorder; move columns to right pane to remove them from the grid.  

Splitting Cells into Columns 

  • Cells can be split into additional columns using a separator (often a comma, semi-colon, dash, space, or other character) or field length.
  • Using the dropdown menu for the column you wish split, select Edit column-->Split into several columns. 

Switching Rows & Columns

  • OpenRefine also can support splitting rows into columns, and splitting cells into columns or rows using the Transpose function.

GREL Practice

Practice script: DOAJ article sample data

Standardizing author name order

  1. Split variables in the author name column into several columns
  2. Filter for names in last name, first name order
  3. Using Edit Cells --> Transform 
  4. create an array, reverse the order, and remove the array
    1. value.split(",").reverse().join(" ")
  5. Merge the author columns into a single column using the Join Column function
  6. Use the All --> Edit Columns function to apply the script to the remaining Author columns

Using GREL

GREL (General Refine Expression Language) allows you to build custom facets, filters and transformations beyond what is offered through OpenRefine menu options. 

GREL supports two types of syntax:

value.function(options)

function(value, options)

Common GREL functions:

  • Create an array: value.split()
    • many GREL functions require data to be split
  • Remove duplicates: value.uniques()
  • Reverse: value.reverse()

In the transformation box, click help to see more functions.

 

Fetching Data

OpenRefine can retrieve and parse data from external sources, to add new data, or reconcile your data against another source. Example sources:

  • Wikidata
  • CrossRef
  • Virtual International Authority File
  • Library of Congress Authorities
  • Google Maps Geocoder

For instructions, visit