CRAN release: 2023-02-02
These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.
A ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. If code previously used remove_row as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than remove_row, unexpected results may occur.
Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year).
excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later compared to previous versions (i.e., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). (#423, thanks @billdenney for fixing)
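The correction described above can be sketched in base R. This is a minimal illustration of the adjustment, not janitor's actual implementation; the helper name excel_serial_to_date_sketch is hypothetical, and it assumes whole-number serials in Excel's 1900 date system.

```r
# Hypothetical helper: convert whole-number Excel serials (1900 date system)
# to Dates while compensating for Excel's phantom 29 Feb 1900 (serial 60).
excel_serial_to_date_sketch <- function(serial) {
  # The conventional origin is only correct for serials after the phantom day.
  out <- as.Date(serial, origin = "1899-12-30")
  # Serials before the phantom day are shifted one day later.
  before_phantom <- !is.na(serial) & serial < 60
  out[before_phantom] <- out[before_phantom] + 1
  # The phantom day itself has no real calendar date.
  out[!is.na(serial) & serial == 60] <- NA
  out
}

excel_serial_to_date_sketch(c(1, 59, 60, 61))
# serial 1 maps to 1900-01-01, 59 to 1900-02-28, 60 to NA, 61 to 1900-03-01
```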
A minor breaking change is that the time zone is now always set for convert_date(). The default time zone is Sys.timezone(); previously it was an empty string (""). (#422, thanks @billdenney for fixing)
There are several minor breaking changes resulting from enhancements to adorn_ns():
- The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
- adorn_ns() now defaults to displaying numbers of >3 digits with big.mark = ",", as part of the default value of the new format_func argument.
- adorn_ns() no longer prints leading whitespace when position = "front". This is not a visible change in the printed result, and it would be rare for it to affect any code.
When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with “Total” or other user-specified totals name added to its factor levels (#494).
row_to_names() now has a new helper function, find_header(), to help find the row that contains the names. It can be used by passing row_number = "find_header". See the documentation of find_header() for more examples. (fix #429)
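A minimal sketch of the header search, assuming janitor >= 2.2.0 is installed and that find_header() falls back to the first row with no missing values when given no further arguments. The data frame below is invented for illustration.

```r
library(janitor)

# Invented example: the real header sits in row 2, below a partly empty title row.
raw <- data.frame(
  X1 = c(NA, "id", "1", "2"),
  X2 = c("Report title", "score", "10", "20"),
  stringsAsFactors = FALSE
)

# Ask row_to_names() to locate the header row itself instead of hard-coding it.
cleaned <- row_to_names(raw, row_number = "find_header")
names(cleaned)
```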
remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of the data are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
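As a rough sketch of the cutoff argument, assuming janitor >= 2.2.0 is installed and that a cutoff below 1 must be combined with a single value of which. The data frame and the 0.5 threshold are illustrative only; check ?remove_empty for the exact boundary behavior.

```r
library(janitor)

# Invented example with one complete column, one mostly missing, one fully missing.
dat <- data.frame(
  full = 1:4,
  mostly_missing = c(NA, NA, NA, 1),
  all_missing = NA
)

# Drop columns that are mostly or entirely missing, not just fully empty ones.
trimmed <- remove_empty(dat, which = "cols", cutoff = 0.5)
names(trimmed)
```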
adorn_ns() contains a new format_func argument so that the user can format the Ns to their liking, e.g., changing the big.mark.
make_clean_names() (and therefore clean_names()) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a replace argument value. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "µg" to "mg" when it would more typically be converted to "ug" for use as a unit. A new, unexported constant (janitor:::mu_to_u) was added to help with mu to “u” replacements.
excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).
When a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()), then has adorn_percentages() called on it, followed by adorn_ns(), the Ns will be sorted correctly to match the tabyl they’re being adorned on. (fix #407)
adorn_pct_formatting() uses the locale-dependent value of decimal.mark as a decimal separator, e.g., in locales where the decimal mark is , it will print percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). (#451)
adorn_totals(where = "row") now preserves factor class and levels of the first column of the input data.frame (#494).
Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., class = "the_class_to_suppress"). To find the class of a warning you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
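The mechanism can be sketched in base R alone (R >= 3.6.0), without relying on any particular janitor warning class. The function noisy_clean() and the class name "demo_unit_warning" are made up for illustration. Note that base R spells the argument classes.

```r
# A function that raises a warning carrying a custom class.
# "demo_unit_warning" is a made-up class name for illustration.
noisy_clean <- function(x) {
  warning(warningCondition("possible unit symbol in names", class = "demo_unit_warning"))
  toupper(x)
}

# One way to discover a warning's classes without reading the source:
# catch it and inspect the condition object.
warning_classes <- tryCatch(noisy_clean("mg"), warning = function(w) class(w))

# Only warnings inheriting from the named class are muffled; others still surface.
result <- suppressWarnings(noisy_clean("mg"), classes = "demo_unit_warning")
```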
When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a
tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)
CRAN release: 2021-01-05
The adorn_totals() function now accepts the special argument fill = NA, which will insert a class-appropriate NA value into each column that isn’t being totaled. This preserves the class of each column; previously they were all converted to character. (thanks @hamstr147 for implementing in #404 and @ymer for reporting in #298).
adorn_totals() now takes the value of "both" for the where argument. That is, adorn_totals("both") is a shorter version of adorn_totals(c("col", "row")). (#362, thanks to @svgsstats for implementing and @sfd99 for suggesting).
adorn_totals() now optionally accepts separate name values for a totals row and a totals column. The default remains that a single name, "Total", is applied to both. But now if a vector of two strings is passed to the name parameter, the first one will be used as the row heading (in column 1) and the second will be used as the column heading. (Thanks @francisbarton for suggesting in #359 and implementing in #413.)
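A short sketch of separate row and column names, assuming janitor >= 2.1.0 is installed. The two labels are arbitrary placeholders.

```r
library(janitor)

# Two-way tabyl from a built-in dataset.
counts <- tabyl(mtcars, cyl, gear)

# First name labels the totals row (shown in column 1),
# second name labels the totals column.
with_totals <- adorn_totals(
  counts,
  where = "both",
  name = c("All cylinders", "All gears")
)
with_totals
```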
Fixed rounding issue in round_half_up() function (#396, thanks to @JJSteph)
Warnings for incomplete argument names are fixed (fix #367, thanks to @pabecerra for reporting and @billdenney for fixing)
3-way tabyls with factors have columns and rows sorted in the correct order, by factor level (#379).
Transliteration from extended ASCII (character codes >127) to printable ASCII (character codes <=127) is now better supported (#389, thanks to @dcorynia for reporting and @billdenney for fixing)
clean_names called on a grouped tibble now also changes the names of the grouping variable(s), in addition to the column names (#260, thanks @CerebralMastication for reporting and the tidyverse team for fixing).
If a totals row and/or column is present on a tabyl as a result of adorn_totals(), the functions chisq.test() and fisher.test() drop the totals and print a warning before proceeding with the calculations (#385).
CRAN release: 2020-04-12
Transliteration of characters within
make_clean_names() now operates across operating systems, independent of differences in
stringi installations (Fix #365, thanks to @eamoncaddigan for reporting and @billdenney for fixing).
This bug patch represents a breaking change with the way that
make_clean_names() worked in janitor versions 188.8.131.5200 and 2.0.0 as the transliterations are now more generalized and follow a more best-practice approach to transliterating to ASCII.
CRAN release: 2020-04-08
clean_names() and make_clean_names() are now more locale-independent and translation to ASCII is simpler (in many cases, Unicode is removed, e.g., the Greek character “delta” becomes a “d”). You may also now control how substitutions occur and add your own substitutions (like “%” becoming “percent”). As a result of these changes, the clean names generated by these functions may differ from what was produced in prior versions of janitor. (Fix #331, thanks to @billdenney)
As part of the improvements to clean_names(), a ... argument was added, allowing the user to pass additional information to the underlying transformation function from the snakecase package, to_any_case(). This allows for greater user control of make_clean_names() and for new functionality like specifying case = "title" for transforming variable names back to title case for making plots.
The adorn_* family of functions now allows control of columns to be adorned using the ... argument. This often-requested feature results in a small breakage as the now-redundant argument
Obsolete functions were deprecated:
The new functions convert_to_date() and convert_to_datetime() generalize the work done by excel_numeric_to_date(), allowing conversion to dates or datetimes from many forms of input: numeric values, characters that look like numbers, characters that look like dates or datetimes, Dates, and date-times (POSIXt) (#310, thanks to @billdenney for implementing). For instance, this succeeds:
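A minimal sketch of such a mixed-input call, assuming janitor >= 2.0.0 (with its lubridate dependency) is installed. The input values are illustrative.

```r
library(janitor)

# One element looks like a date string, the other like an Excel serial number.
mixed_input <- c("2020-02-29", "40000.1")

# Both forms are converted to Date in a single call.
dates <- convert_to_date(mixed_input)
dates
```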
The variables considered by the function get_dupes() can be specified using the select helper functions from tidyselect. This includes -column_name to omit a variable as well as the matching functions. See ?tidyselect::select_helpers for more (#326, thanks to @jzadra for suggesting and implementing).
adorn_rounding()now works when called on a 3-way tabyl.
remove_constant()works correctly with tibbles (in addition to already working on data.frames and matrices) (thanks to @billdenney for implementing).
When the second variable in a tabyl (the column variable) contains the empty string "", it is converted to "emptystring_" before being spread to the tabyl’s column names. Previously it became the default variable name
Behind-the-scenes code changes to maintain compatibility with breaking changes to dplyr 1.0.0, tibble 3.0.0, and R 4.0.0.
CRAN release: 2020-01-22
Adjusted a single test to account for a different error message produced by the
tidyselect package. No changes to package functionality.
CRAN release: 2019-04-21
- The new function make_clean_names() takes a character vector and returns the cleaned text, with the same functionality as the existing clean_names(), which runs on a data.frame, manipulating its names. (#197, thanks @tazinho and everyone who contributed to the discussion).
This function can be supplied as a value for the
.name_repair argument of
as_tibble() in the
tibble package. For example:
as_tibble(iris, .name_repair = make_clean_names).
The new function compare_df_cols() compares the names and classes of columns in a set of supplied data.frames or tibbles, reporting on the specific columns that are or are not similar. This is for the common use case where a set of data files should all have the same specifications but, in practice, may not. A companion function, compare_df_cols_same(), returns a TRUE/FALSE result indicating if the columns are the same (and therefore bindable, though FALSE is not definitive that binding will fail).
- Its helper function describe_class() is exported for developers who wish to extend it so that the compare_df_ functions treat their custom classes appropriately.
This feature (#50) took almost 3 years from conception to implementation. Major thanks to @billdenney for making it happen!
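A brief sketch of the column comparison, assuming janitor >= 1.2.0 is installed. The two data frames are invented so that one column differs in class.

```r
library(janitor)

# Invented inputs: "id" is integer in one frame and character in the other.
batch_a <- data.frame(id = 1:2, score = c(1.5, 2.5))
batch_b <- data.frame(id = c("1", "2"), score = c(3, 4), stringsAsFactors = FALSE)

# Full report: one row per column name, one class column per input.
compare_df_cols(batch_a, batch_b)

# Quick yes/no check on whether the column classes line up.
same <- compare_df_cols_same(batch_a, batch_b)
```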
Added janitor::chisq.test() and janitor::fisher.test() to enable running these statistical tests from the base stats package on two-way tabyl objects. While the package loading message says the base functions are masked, the base tests still run on table objects (#255, thanks @juba for implementing).
remove_empty() now has a companion function remove_constant() which removes columns containing only a single unique value, optionally ignoring NA (#222, thanks to @billdenney for suggesting & implementing).
adorn_totals() gains an argument "name" that allows the user to specify a value other than “Total” to appear as the name of the added row and/or column (#263). Thanks to @StephieLaPugh for suggesting and @daniel-barnett for implementing.
If the third variable in a three-way tabyl is a factor, the resulting list is sorted in order of its levels (#250). Empty factor levels in the 3rd variable are still omitted regardless of the value of
CRAN release: 2018-07-31
Patches a bug introduced in version 1.1.0 where
excel_numeric_to_date() would fail if given an input vector containing an
CRAN release: 2018-07-18
This release was requested by CRAN to address some minor package dependency issues. It also contains several updates and additions described below.
The new function
row_to_names() handles the case where a dirty data file is read in with its names stored as a row of the data.frame, rather than in the names. This function sets the names of the data.frame to this row and optionally cleans up the rows above and including where the names were stored. Thanks to @billdenney for writing this feature.
excel_numeric_to_date() can now convert fractions of a day to time, e.g.,
excel_numeric_to_date(43001.01, include_time = TRUE) returns the POSIXlt value
"2017-09-23 00:14:24". Thanks to @billdenney.
As part of
excel_numeric_to_date() now handling times, if a Date-only result is requested (the default behavior of
include_time = FALSE), any fractional part of the date is now removed. The printed date itself is identical, but the internal representation of this object now contains only the integer part of the date. For example, while under both the old and new versions of this function the call
excel_numeric_to_date_old(42001.1) would return the Date object 2014-12-28, as.numeric on this Date result would previously return 16432.1, while now it returns 16432. This is an improved behavior, as now
excel_numeric_to_date(42001.1, include_time = FALSE) == as.Date("2014-12-28") returns TRUE, while previously it would appear to be equivalent from the printed value but this comparison would return FALSE.
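The difference between the printed value and the stored value can be reproduced in base R. This sketch only mimics the before-and-after representations with as.Date(); it is not janitor's code.

```r
serial <- 42001.1

# Keeping the fraction: the Date prints as a whole day but stores 16432.1.
with_fraction <- as.Date(serial, origin = "1899-12-30")

# Dropping the fraction first: the Date stores the integer 16432.
whole_days <- as.Date(floor(serial), origin = "1899-12-30")

format(with_fraction) == format(whole_days)  # TRUE: both print the same calendar date
with_fraction == as.Date("2014-12-28")       # FALSE: the hidden fraction breaks equality
whole_days == as.Date("2014-12-28")          # TRUE
```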
CRAN release: 2018-03-22
A stable version 1.0.0, with a new tabyl API and with breaking changes to the output of clean_names().
This builds on the original functionality of janitor, with similar-but-improved tools and significantly-changed implementation.
tabyl() is now a single function that can count combinations of one, two, or three variables, à la base R’s
table(). The resulting
tabyl data.frames can be manipulated and formatted using a family of
adorn_ functions. See the tabyls vignette for more.
The now-redundant legacy functions crosstab() and adorn_crosstab() have been deprecated, but remain in the package for now. Existing code that relies on the version of tabyl present in janitor versions <= 0.3.1 will break if the sort argument was used, as that argument no longer exists in tabyl().
clean_names() now detects and preserves camelCase inputs, allows multiple options for case outputs of the cleaned names, and preserves whether there’s space between letters and numbers. It also transliterates accented letters and turns
These changes may cause old code to break. E.g., a raw column name variableName would now be converted to variable_name (or variableName, VariableName, etc. depending on your preference), where previously it would have been converted to variablename.
To minimize this inconvenience, there’s a quick fix for compatibility: you can find-and-replace to insert the argument
case = "old_janitor", preserving the old behavior of
clean_names() as of janitor version 0.3.1 (and thus not have to redo your scripts beyond that.)
No further changes are planned to
clean_names() and its results should be stable from version 1.0.0 onward.
- To encourage transparency, remove_empty() prints a message if no value is supplied for the which argument; to suppress this, supply a value to which, even if it’s the default
- The utility function round_half_up() is now exported for public use. It’s an exact implementation of https://stackoverflow.com/questions/12688717/round-up-from-5-in-r/12688836#12688836/, written by @mrdwab.
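To show what rounding half up means in practice, here is a base R sketch of the technique. The name round_half_up_sketch is hypothetical and the code is an approximation for illustration, not janitor's source.

```r
# Hypothetical re-implementation of "round half away from zero".
round_half_up_sketch <- function(x, digits = 0) {
  scaled <- abs(x) * 10^digits
  # The small epsilon guards against floating-point values that sit
  # just below a half because of representation error.
  scaled <- trunc(scaled + 0.5 + sqrt(.Machine$double.eps))
  sign(x) * scaled / 10^digits
}

round(c(0.5, 1.5, 2.5))                 # base R rounds halves to even: 0 2 2
round_half_up_sketch(c(0.5, 1.5, 2.5))  # halves round away from zero: 1 2 3
```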
tabyl objects now print with row numbers suppressed
clean_names() now retains the character # as "number" in the resulting names
CRAN release: 2018-01-04
This is a bug-fix release with no new functionality or changes. It fixes a bug where
adorn_crosstab() failed if the
tibble package was version > 1.4.
Major changes to janitor are currently in development on GitHub and will be released soon. This is not that next big release.
CRAN release: 2017-05-06
The primary purpose of this release is to maintain accuracy given breaking changes to the dplyr package, upon which janitor is built, in dplyr version >0.6.0. This update also contains a number of minor improvements.
Critical: if you update the package
dplyr to version >0.6.0, you must update janitor to version 0.3.0 to ensure accurate results from janitor’s
tabyl() function. This is due to a change in the behavior of dplyr’s
_join functions (discussed in #111).
janitor 0.3.0 is compatible with this new version of dplyr as well as old versions of dplyr back to 0.5.0. That is, updating janitor to 0.3.0 does not necessitate an update to dplyr >0.6.0.
- The functions add_totals_row and add_totals_col were combined into a single function, adorn_totals(). (#57). The add_totals_ functions are now deprecated and should not be used.
- The first argument of adorn_crosstab() is now “dat” instead of “crosstab” (indicating that the function can be called on any data.frame, not just a result of crosstab())
Deprecated the following functions:
- use_first_valid_of() - use dplyr::coalesce() instead
- convert_to_NA() - use dplyr::na_if() instead
- add_totals_col() - replaced by the single function adorn_totals()
clean_names() now handles leading spaces (#85)
ns_to_percents() work on a 2-column data.frame (#89)
adorn_totals() now works on a grouped tibble (#97)
- Long variable names with spaces no longer break
NA_ column in the result of a crosstab() will appear at the last column position (#109)
CRAN release: 2016-10-31
crosstab() now appear in the package manual (#65)
- Fixed minor bug per CRAN request - crosstab() failed to retain ill-formatted variable names only when using R 3.2.5 for Windows (#76)
add_totals_row() works on two-column data.frame (#69)
use_first_valid_of() returns POSIXct-class result when given POSIXct inputs
CRAN release: 2016-10-03
- Added a function adorn_crosstab() that formats the results of a crosstab() for pretty printing. Shows % and N in the same cell, with the % symbol, user-specified rounding (method and number of digits), and the option to include a totals row and/or column. E.g., mtcars %>% crosstab(cyl, gear) %>% adorn_crosstab().
crosstab() can be called in a %>% pipeline, e.g., mtcars %>% crosstab(cyl, gear). Thanks to @chrishaid (#34)
tabyl() can also be called in a %>% pipeline, e.g., mtcars %>% tabyl(cyl) (#35)
- Added minor functions for manipulating numeric data.frames for presentation:
crosstab() returns 0 instead of NA when there are no instances of a variable combination.
- A call like tabyl(df$vecname) retains the more-descriptive $ symbol in the column name of the result - if you want a legal R name in the result, call it as df %>% tabyl(vecname)
- Single and double quotation marks are handled by