qdapRegex library for R

I just found the qdapRegex package for R, part of the larger qdap family of packages that Jason Gray and Tyler Rinker have put together to support text munging and processing for discourse analysis and the like. There’s a lot in there, spread across four packages: the Regex set, some tools, dictionaries, and qdap proper for qualitative analysis (pre)processing.

The Regex package alone seems worth pulling down and keeping local. It has some nice convenience functions for removing phone numbers, names, tags, zips, etc. from text in R.
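To give a flavour of those convenience functions, here is a minimal sketch. The sample sentence is invented for illustration, and I am assuming the `rm_phone`, `rm_zip` and `rm_tag` functions with their `extract` argument as I understand them from the package documentation (`rm_tag` targets Twitter-style @ tags). The block skips itself if qdapRegex is not installed.

```r
# Sketch only: the sample text is made up, and the block is skipped
# when qdapRegex is not installed.
sample_text <- "Call @ann at 555-123-4567 or write to Buffalo, NY 14222."

if (requireNamespace("qdapRegex", quietly = TRUE)) {
  library(qdapRegex)

  # Strip phone numbers, zip codes and @-style person tags in turn
  cleaned <- rm_tag(rm_zip(rm_phone(sample_text)))
  print(cleaned)

  # extract = TRUE returns the matches instead of removing them
  print(rm_phone(sample_text, extract = TRUE))
  print(rm_zip(sample_text, extract = TRUE))
}
```

Each function takes a character vector, so the same calls should work on a whole column of text rather than a single string.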

The authors have built in a dependency on knitr for documentation, probably to promote a good publishing solution and to reduce the maintenance overhead. I just pulled down qdap 2.2.2 from CRAN and compiled it from source, which went happily until the final push:


trying URL 'https://cran.rstudio.com/src/contrib/qdap_2.2.2.tar.gz'
Content type 'application/x-gzip' length 2473841 bytes (2.4 MB)
-==================================================
downloaded 2.4 MB

* installing *source* package ‘tm’ ...
** package ‘tm’ successfully unpacked and MD5 sums checked
** libs
clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include    -fPIC  -Wall -mtune=core2 -g -O2  -c copy.c -o copy.o
clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o tm.so copy.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Library/Frameworks/R.framework/Versions/3.2/Resources/library/tm/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (tm)
* installing *source* package ‘qdap’ ...
** package ‘qdap’ successfully unpacked and MD5 sums checked
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
No Java runtime present, requesting install.
Warning in install.packages :
  installation of package ‘qdap’ had non-zero exit status

and this then opened a warning on my desktop saying that I needed to install Java 1.6, which is already EOL. I haven’t yet found what in the installer is calling for this, but I will try to get back to it. This needs to be fixed.
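One way to chase this down would be to walk the dependency tree from R. The sketch below is only a guess at the approach: `find_dependency_paths` is a helper name I made up, and I am assuming the Java requirement arrives through the rJava package, which I have not confirmed for this install. It uses `tools::package_dependencies` to report which direct dependencies of a package pull in a given target.

```r
# Which direct dependencies of `pkg` pull in `target`?
# `db` is a package database such as the one from available.packages().
# find_dependency_paths is a hypothetical helper, not part of any package.
find_dependency_paths <- function(pkg, target, db) {
  strong <- c("Depends", "Imports", "LinkingTo")
  direct <- tools::package_dependencies(pkg, db = db, which = strong)[[pkg]]
  pulls_target <- vapply(direct, function(p) {
    if (identical(p, target)) return(TRUE)
    deps <- tools::package_dependencies(p, db = db, which = strong,
                                        recursive = TRUE)[[p]]
    target %in% deps
  }, logical(1))
  direct[pulls_target]
}

# Needs network access to CRAN, so only run it in an interactive session.
# "rJava" is an assumption about where the Java requirement comes from.
if (interactive()) {
  cran_db <- available.packages(repos = "https://cran.r-project.org")
  print(find_dependency_paths("qdap", "rJava", cran_db))
}
```

If the result is empty, the Java call is coming from somewhere other than rJava and the search would need a different target.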

Besides that, there is much to praise in these packages and I look forward to exploring them further. For those who haven’t pulled it down, I post this as just another nice little convenience:

> library(qdapRegex)
> cheat()
   NAME                 REGEX     WHAT IT DOES
1  Lookahead            (?=foo)   What follows is `foo`
2  Lookbehind           (?<=foo)  What precedes is `foo`
3  Negative Lookahead   (?!foo)   What follows is not `foo`
4  Negative Lookbehind  (?<!foo)  What precedes is not `foo`
...
19 >= 0 (Greedy)        x*        Match 0 or more times greedy
20 >= 0 (Lazy)          x*?       Match 0 or more times lazy
21 >= 1 (Greedy)        x+        Match 1 or more times greedy
22 >= 1 (Lazy)          x+?       Match 1 or more times lazy
23 Exactly N            x{4}      Match N times
24 Min-Max              x{4,8}    Match min-max times
25 > N                  x{9,}     Match N or more times
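A few of these entries are easy to try in base R without any package. The sketch below uses an invented test string and a small helper, `first_match`, that I wrote for the purpose. Note that the lookarounds need `perl = TRUE`.

```r
x <- "cost: $42 for <b>bold</b> and <i>italic</i> text"

# Return the first match of a PCRE pattern (perl = TRUE enables lookarounds)
first_match <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

lookahead  <- first_match("\\w+(?=:)", x)     # word followed by a colon: "cost"
lookbehind <- first_match("(?<=\\$)\\d+", x)  # digits preceded by "$": "42"
neg_ahead  <- first_match("<(?!b>)\\w+>", x)  # first simple tag that is not <b>: "<i>"
greedy     <- first_match("<.+>", x)          # runs to the last ">" in the string
lazy       <- first_match("<.+?>", x)         # stops at the first ">": "<b>"
exactly    <- first_match("\\d{2}", x)        # exactly two digits: "42"

print(c(lookahead, lookbehind, neg_ahead, lazy, exactly))
print(greedy)
```

The greedy versus lazy pair is the one that bites most often: the greedy pattern swallows everything from the first `<` to the final `>`, while the lazy one returns just the opening tag.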

Shawn Mehan
This entry was posted in Wisdom's Quintessence Blog.