janitor 2.1.0.9000 (unreleased, under development) Unreleased

Breaking changes

  • Microsoft Excel incorrectly treats 1900 as a leap year and so has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later than in previous versions (e.g., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). (thanks @billdenney for fixing)
  • A minor breaking change: the time zone is now always set for excel_numeric_to_date() and convert_date(). The default time zone is Sys.timezone(); previously it was an empty string ("").
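
The one-day shift for early dates can be illustrated in base R. The sketch below is not janitor's implementation: the helper name excel_serial_to_date_sketch() is hypothetical, and it assumes the Excel 1900 date system, in which serial number 1 is 1 January 1900 and serial number 60 is the nonexistent 29 February 1900.

```r
# Hypothetical helper, for illustration only; this is not janitor's code.
# Assumes the Excel 1900 date system: serial 1 is 1 Jan 1900 and
# serial 60 is the nonexistent 29 Feb 1900.
excel_serial_to_date_sketch <- function(serial) {
  serial <- floor(serial)                         # drop any time-of-day fraction
  out <- as.Date(serial, origin = "1899-12-30")   # already correct for serials >= 61
  early <- !is.na(serial) & serial < 60
  out[early] <- out[early] + 1                    # compensate for the phantom leap day
  out[!is.na(serial) & serial == 60] <- NA        # 29 Feb 1900 never existed
  out
}

# Serials on either side of the phantom leap day
excel_serial_to_date_sketch(c(1, 59, 60, 61))
```

Serials from 61 onward are unaffected, which is why only dates before 1 March 1900 change relative to earlier versions.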

Minor features

  • excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time. (fixed #420, thanks @Geomorph2 for reporting and @billdenney for fixing)

Bug fixes

  • When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. (fixed #438, thanks @daaronr for reporting and @mattroumaya for fixing)

janitor 2.1.0 (2021-01-05) 2021-01-05

New features

  • The adorn_totals() function now accepts the special argument fill = NA, which will insert a class-appropriate NA value into each column that isn’t being totaled. This preserves the class of each column; previously they were all converted to character. (thanks @hamstr147 for implementing in #404 and @ymer for reporting in #298).

  • adorn_totals() now takes the value of "both" for the where argument. That is, adorn_totals("both") is a shorter version of adorn_totals(c("col", "row")). (#362, thanks to @svgsstats for implementing and @sfd99 for suggesting).

  • adorn_totals() now optionally accepts separate name values for a totals row and a totals column. The default remains that a single name, "Total", is applied to both. But now if a vector of two strings is passed to the name parameter, the first one will be used as the row heading (in column 1) and the second will be used as the column heading. (Thanks @francisbarton for suggesting in #359 and implementing in #413.)
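
A brief sketch of the where = "both" shortcut and the two-element name vector described above. It assumes janitor 2.1.0 or later is installed; the mtcars data set and the label strings are chosen only for illustration.

```r
library(janitor)

# Two-way tabyl of cylinders (rows) by gears (columns)
tab <- tabyl(mtcars, cyl, gear)

# "both" is shorthand for c("row", "col")
both <- adorn_totals(tab, where = "both")

# Separate labels: the first names the totals row (shown in column 1),
# the second names the totals column
named <- adorn_totals(
  tab,
  where = c("row", "col"),
  name = c("All cylinders", "All gears")
)

both
named
```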

Bug fixes

  • Fixed rounding issue in round_half_up() function (#396, thanks to @JJSteph)

  • Warnings for incomplete argument names are fixed (fix #367, thanks to @pabecerra for reporting and @billdenney for fixing)

  • 3-way tabyls with factors have columns and rows sorted in the correct order, by factor level (#379).

  • Transliteration from extended ASCII (character codes >127) to printable ASCII (character codes <=127) is now better supported (#389, thanks to @dcorynia for reporting and @billdenney for fixing)

  • clean_names() called on a grouped tibble now also changes the names of the grouping variable(s), in addition to the column names (#260, thanks @CerebralMastication for reporting and the tidyverse team for fixing).

  • Omitting a numeric column of a tibble when using the ... select in adorn_totals() now succeeds (#388)

  • A call to make a 3-way tabyl() now succeeds when the first variable is of class ordered (#386)

  • If a totals row and/or column is present on a tabyl as a result of adorn_totals(), the functions chisq.test() and fisher.test() drop the totals and print a warning before proceeding with the calculations (#385).

janitor 2.0.1 (2020-04-12) 2020-04-12

Bug fixes and Breaking changes

Transliteration of characters within make_clean_names() now operates across operating systems, independent of differences in stringi installations (Fix #365, thanks to @eamoncaddigan for reporting and @billdenney for fixing).

This bug fix is a breaking change from how make_clean_names() worked in janitor versions 1.2.1.9000 and 2.0.0: the transliterations are now more general and follow best practices for transliterating to ASCII.

janitor 2.0.0 (2020-04-07) 2020-04-08

Breaking changes

  • clean_names() and make_clean_names() are now more locale-independent and translation to ASCII is simpler (in many cases, Unicode is removed, e.g., the Greek character “delta” becomes a “d”). You may also now control how substitutions occur and add your own substitutions (like “%” becoming “percent”). As a result of these changes, the clean names generated by these functions may differ from those produced by prior versions of janitor. (Fix #331, thanks to @billdenney)

As part of the improvements to make_clean_names() and clean_names(), the ... argument was added, allowing the user to pass additional information to the underlying transformation function from the snakecase package, to_any_case(). This allows for greater user control of clean_names() / make_clean_names() and for new functionality like specifying case = "title" for transforming variable names back to title case for making plots.

  • The adorn_* family of functions now allows control of columns to be adorned using the ... argument. This often-requested feature results in a small breakage as the now-redundant argument skip_first_col in adorn_percentages() was removed.

  • Obsolete functions were deprecated: crosstab, adorn_crosstab, use_first_valid_of, convert_to_NA, remove_empty_cols, remove_empty_rows, add_totals_col, add_totals_row.

Major features

Minor features

  • A quiet argument was added to remove_empty() and remove_constant(), providing more information when quiet = FALSE (#70, thanks to @jbkunst for suggesting and @billdenney for implementing).

  • row_to_names() works on matrix input (#320, thanks to @billdenney for suggesting and implementing).

  • clean_names() can now be called on tbl_graph objects from the tidygraph package. (#252, thanks to @gvdr for bringing up the issue and thanks to @Tazinho for proposing a solution).

Bug fixes

  • adorn_ns() doesn’t append anything to character columns when called on a data.frame resulting from a call to adorn_percentages(). (#195).

  • The name argument to adorn_totals() is correctly applied to 3-way tabyls (#306) (thanks to @jzadra for reporting).

  • adorn_rounding() now works when called on a 3-way tabyl.

  • remove_constant() works correctly with tibbles (in addition to already working on data.frames and matrices) (thanks to @billdenney for implementing).

  • get_dupes() works when called on a grouped tibble (#329) (thanks to @jzadra for fixing).

  • When the second variable in a tabyl (the column variable) contains the empty string "", it is converted to "emptystring_" before being spread to the tabyl’s column names. Previously it became the default variable name V1. (#203).

  • Behind-the-scenes code changes to maintain compatibility with breaking changes to dplyr 1.0.0, tibble 3.0.0, and R 4.0.0.

janitor 1.2.1 (2020-01-22) 2020-01-22

Adjusted a single test to account for a different error message produced by the tidyselect package. No changes to package functionality.

janitor 1.2.0 (2019-04-20) 2019-04-21

Major features

  • The new function make_clean_names() takes a character vector and returns the cleaned text, with the same functionality as the existing clean_names(), which runs on a data.frame, manipulating its names. (#197, thanks @tazinho and everyone who contributed to the discussion).

This function can be supplied as a value for the .name_repair argument of as_tibble() in the tibble package. For example: as_tibble(iris, .name_repair = make_clean_names).

  • The new function compare_df_cols() compares the names and classes of columns in a set of supplied data.frames or tibbles, reporting on the specific columns that are or are not similar. This is for the common use case where a set of data files should all have the same specifications but, in practice, may not. A companion function compare_df_cols_same() gives a TRUE/FALSE result indicating if the columns are the same (and therefore bindable, though FALSE is not definitive that binding will fail).

    • Its helper function describe_class() is exported for developers who wish to extend it so that the compare_df_ functions treat their custom classes appropriately.

This feature (#50) took almost 3 years from conception to implementation. Major thanks to @billdenney for making it happen!

  • A new function round_to_fraction() allows rounding to a fraction with specified denominator, e.g., to the nearest 1/7 (#235, thanks to @billdenney for suggesting & implementing).

  • The new functions janitor::chisq.test() and janitor::fisher.test() enable running these statistical tests from the base stats package on two-way tabyl objects. While the package loading message says the base functions are masked, the base tests still run on table objects (#255, thanks @juba for implementing).

  • remove_empty() now has a companion function remove_constant() which removes columns containing only a single unique value, optionally ignoring NA (#222, thanks to @billdenney for suggesting & implementing).
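
The idea behind rounding to a fraction, as in round_to_fraction() above, can be sketched in base R. The helper name below is hypothetical and is not janitor's implementation, which may handle additional arguments and edge cases.

```r
# Hypothetical base R sketch of rounding to the nearest 1/denominator;
# not the janitor implementation.
round_to_fraction_sketch <- function(x, denominator) {
  # Scale so that each 1/denominator step becomes a whole number,
  # round to the nearest whole number, then scale back
  round(x * denominator) / denominator
}

# 0.3 lies between 2/7 and 3/7 and is closer to 2/7
round_to_fraction_sketch(0.3, denominator = 7)
```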

Minor features

  • excel_numeric_to_date() now returns a POSIXct object and includes a time zone. (#225, thanks to @billdenney for the feature.)

  • clean_names() can now be called on a simple features object from the sf package. (#247, thanks to @JosiahParry for suggesting & implementing.)

  • adorn_totals() gains an argument "name" that allows the user to specify a value other than “Total” to appear as the name of the added row and/or column (#263). Thanks to @StephieLaPugh for suggesting and @daniel-barnett for implementing.

  • remove_empty() and remove_constant() now work with matrices (returning a matrix). (#215) Thanks to @jsta for reporting and @billdenney for patching.

  • If the third variable in a three-way tabyl is a factor, the resulting list is sorted in order of its levels (#250). Empty factor levels in the 3rd variable are still omitted regardless of the value of show_missing_levels.

Bug fixes

janitor 1.1.1 (2018-07-30) 2018-07-31

Release summary

Patches a bug introduced in version 1.1.0 where excel_numeric_to_date() would fail if given an input vector containing an NA value.

Bug fixes

  • excel_numeric_to_date() again handles NA correctly; in version 1.1.0 the function would error if any values of the input vector were NA. (#220). Thanks @emilelatour for reporting and @billdenney for patching.

janitor 1.1.0 (2018-07-17) 2018-07-18

Release summary

This release was requested by CRAN to address some minor package dependency issues. It also contains several updates and additions described below.

Major features

The new function row_to_names() handles the case where a dirty data file is read in with its names stored as a row of the data.frame, rather than in the names. This function sets the names of the data.frame to this row and optionally cleans up the rows above and including where the names were stored. Thanks to @billdenney for writing this feature.

Minor features

excel_numeric_to_date() can now convert fractions of a day to time, e.g., excel_numeric_to_date(43001.01, include_time = TRUE) returns the POSIXlt value "2017-09-23 00:14:24". Thanks to @billdenney.

Breaking changes

As part of excel_numeric_to_date() now handling times, if a Date-only result is requested (the default behavior of include_time = FALSE), any fractional part of the date is now removed. The printed date itself is identical, but the internal representation of this object now contains only the integer part of the date. For example, under both the old and new versions of this function the call excel_numeric_to_date(42001.1) returns the Date object "2014-12-28", but calling as.numeric() on this Date result would previously return 16432.1, while now it returns 16432.

This is an improved behavior, as now excel_numeric_to_date(42001.1, include_time = FALSE) == as.Date("2014-12-28") returns TRUE, while previously the two would appear to be equivalent from the printed value but this comparison would return FALSE.
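
The difference between the two internal representations can be reproduced in base R without janitor. This is only an illustration: it assumes the commonly used Excel origin of 1899-12-30, and the variable names are made up for the example.

```r
# Base R only: mimics the old and new internal representations.
old_style <- as.Date(42001.1, origin = "1899-12-30")          # fraction retained
new_style <- as.Date(floor(42001.1), origin = "1899-12-30")   # fraction dropped

# Both print as the same calendar date
format(old_style)
format(new_style)

# Only the integer-valued Date compares equal to a date built from text
old_style == as.Date("2014-12-28")   # FALSE
new_style == as.Date("2014-12-28")   # TRUE
as.numeric(new_style)                # 16432
```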

janitor 1.0.0 (2018-03-17) 2018-03-22

Release summary

A stable version 1.0.0, with a new tabyl API and with breaking changes to the output of clean_names().

This builds on the original functionality of janitor, with similar-but-improved tools and significantly-changed implementation.

Breaking changes

A fully-overhauled tabyl

tabyl() is now a single function that can count combinations of one, two, or three variables, à la base R’s table(). The resulting tabyl data.frames can be manipulated and formatted using a family of adorn_ functions. See the tabyls vignette for more.

The now-redundant legacy functions crosstab() and adorn_crosstab() have been deprecated, but remain in the package for now. Existing code that relies on the version of tabyl present in janitor versions <= 0.3.1 will break if the sort argument was used, as that argument no longer exists in tabyl (use dplyr::arrange() instead).

Improvements to clean_names

clean_names() now detects and preserves camelCase inputs, allows multiple options for case outputs of the cleaned names, and preserves whether there’s space between letters and numbers. It also transliterates accented letters and turns # into "number".

These changes may cause old code to break. E.g., a raw column name variableName would now be converted to variable_name (or variableName, VariableName, etc. depending on your preference), where previously it would have been converted to variablename.

To minimize this inconvenience, there’s a quick fix for compatibility: you can find-and-replace to insert the argument case = "old_janitor", which preserves the behavior of clean_names() as of janitor version 0.3.1, so you do not have to change your scripts any further.

No further changes are planned to clean_names() and its results should be stable from version 1.0.0 onward.

Major features

  • clean_names() transliterates accented letters, e.g., çãüœ becomes cauoe (#120). Thanks to @fernandovmacedo.

  • clean_names() offers multiple options for variable name styling. In addition to snake_case output you can select smallCamelCase, BigCamelCase, ALL_CAPS and others. (#131).

    • Thanks to @tazinho, who wrote the snakecase package that janitor depends on to do this, as well as the patch to incorporate it into clean_names(). And thanks to @maelle for proposing this feature.
  • Launched the janitor documentation website: http://sfirke.github.io/janitor. Thanks to the pkgdown package.

  • Deprecated the functions remove_empty_rows() and remove_empty_cols(), which are replaced by the single function remove_empty(). (#100)

    • To encourage transparency, remove_empty() prints a message if no value is supplied for the which argument; to suppress this, supply a value to which, even if it’s the default c("rows", "cols").
  • The new adorn_title() function adds the name of the 2nd tabyl variable (i.e., the name of the column variable). This un-tidies the data.frame but makes the result clearer to readers (#77)

Minor features

Bug fixes


janitor 0.3.1 (2018-01-04) 2018-01-04

Release summary

This is a bug-fix release with no new functionality or changes. It fixes a bug where adorn_crosstab() failed if the tibble package was version > 1.4.

Major changes to janitor are currently in development on GitHub and will be released soon. This is not that next big release.


janitor 0.3.0 (2017-05-06) 2017-05-06

Release summary

The primary purpose of this release is to maintain accuracy given breaking changes to the dplyr package, upon which janitor is built, in dplyr version >0.6.0. This update also contains a number of minor improvements.

Critical: if you update the package dplyr to version >0.6.0, you must update janitor to version 0.3.0 to ensure accurate results from janitor’s tabyl() function. This is due to a change in the behavior of dplyr’s _join functions (discussed in #111).

janitor 0.3.0 is compatible with this new version of dplyr as well as old versions of dplyr back to 0.5.0. That is, updating janitor to 0.3.0 does not necessitate an update to dplyr >0.6.0.

Breaking changes

  • The functions add_totals_row and add_totals_col were combined into a single function, adorn_totals(). (#57). The add_totals_ functions are now deprecated and should not be used.
  • The first argument of adorn_crosstab() is now “dat” instead of “crosstab” (indicating that the function can be called on any data.frame, not just a result of crosstab())

Major features

  • Exported the %>% pipe from magrittr (#107).

  • Deprecated the following functions:

    • use_first_valid_of(): use dplyr::coalesce() instead
    • convert_to_NA(): use dplyr::na_if() instead
    • add_totals_row() and add_totals_col(): replaced by the single function adorn_totals()

Minor features

  • adorn_totals() and ns_to_percents() can now be called on data.frames that have non-numeric columns beyond the first one (those columns will be ignored) (#57)
  • adorn_totals("col") retains factor class in 1st column if 1st column in the input data.frame was a factor

Bug fixes


janitor 0.2.1 (2016-10-30) 2016-10-31

Bug fixes


janitor 0.2.0 (2016-10-03) 2016-10-03

Features

Major

Submitted to CRAN!

Minor

  • The count in tabyl() for factor levels that aren’t present is now 0 instead of NA (#48)

Bug fixes

  • Can call tabyl() on the result of a tabyl(), e.g., mtcars %>% tabyl(mpg) %>% tabyl(n) (#54)
  • get_dupes() now works on variables with spaces in column names (#62)

Package management

  • Reached 100% unit test code coverage

janitor 0.1.2 Unreleased

Features

Major

  • Added a function adorn_crosstab() that formats the results of a crosstab() for pretty printing. Shows % and N in the same cell, with the % symbol, user-specified rounding (method and number of digits), and the option to include a totals row and/or column. E.g., mtcars %>% crosstab(cyl, gear) %>% adorn_crosstab().
  • crosstab() can be called in a %>% pipeline, e.g., mtcars %>% crosstab(cyl, gear). Thanks to @chrishaid (#34)
  • tabyl() can also be called in a %>% pipeline, e.g., mtcars %>% tabyl(cyl) (#35)
  • Added use_first_valid_of() function (#32)
  • Added minor functions for manipulating numeric data.frames for presentation: ns_to_percents(), add_totals_row(), and add_totals_col().

Minor

  • crosstab() returns 0 instead of NA when there are no instances of a variable combination.
  • A call like tabyl(df$vecname) retains the more-descriptive $ symbol in the column name of the result - if you want a legal R name in the result, call it as df %>% tabyl(vecname)
  • Single and double quotation marks are handled by clean_names()

Package management

  • Added codecov to measure test coverage
  • Added unit test coverage
  • Added Travis-CI for continuous integration

janitor 0.1 (2016-04-17) Unreleased

  • Initial draft of skeleton package on GitHub