Serialize R objects for transport across the wire

I’ve been thinking lately about serializing R objects and transporting them. It seems to me that there is still some clunkiness in having modular classes share objects, and that the predominant paradigm is still to store them at rest. But there are options, including save() and saveRDS(), which will happily serialize your object to a file or to a connection. saveRDS() seems to be the better method: readRDS() returns the object and lets you bind it to any name, whereas load() restores objects under their original names and can silently overwrite existing ones. The exception is when you need to write multiple objects in one call, in which case you must fall back to save(). It is simple to use:


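    # Build a small data frame and serialize it to an .rds file
    > anObject <- rnorm(100, mean=0, sd=1)
    > anObject <- data.frame(x=seq_along(anObject), y=anObject)
    > saveRDS(object = anObject, file = "/local/path/to/use/objectAtRest.rds")

    # Read it back, binding it to whatever name you like
    > sameObject <- readRDS("/local/path/to/use/objectAtRest.rds")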
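To make the save() versus saveRDS() trade-off concrete, here is a minimal base-R sketch using temporary files instead of a real path. It also shows serialize() and unserialize(), base functions the post doesn't otherwise mention: with `connection = NULL`, serialize() returns a raw vector that could be sent as a socket payload or HTTP body. The object names (`anotherObject`, `loadedInto`, `payload`) are purely illustrative.

```r
# Base R only: no packages needed.
anObject <- data.frame(x = 1:5, y = c(0.5, -1.2, 0.3, 2.1, -0.7))
anotherObject <- list(label = "run-1", weights = c(0.2, 0.8))  # hypothetical second object

# saveRDS() writes exactly one object; readRDS() returns it, so the
# caller chooses the name it is bound to when reading it back.
rdsPath <- tempfile(fileext = ".rds")
saveRDS(anObject, file = rdsPath)
restored <- readRDS(rdsPath)

# save() can write several objects in one call; load() restores them
# under their original names into the target environment and returns
# those names as a character vector.
rdataPath <- tempfile(fileext = ".RData")
save(anObject, anotherObject, file = rdataPath)
loadedInto <- new.env()
loadedNames <- load(rdataPath, envir = loadedInto)
print(loadedNames)

# serialize() with connection = NULL returns an R-specific binary raw
# vector, suitable for sending over the wire; unserialize() rebuilds it.
payload <- serialize(anObject, connection = NULL)
rebuilt <- unserialize(payload)
stopifnot(identical(rebuilt, anObject))
```

The raw vector from serialize() is compact and exact, but it is only readable by R, which is the motivation for the text formats discussed next.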

In some cases, text is a better format for storage, and even for transport: you don't have to fight environment issues to share an object, and it moves cleanly across the wire. Base R has dput(), which writes a text representation of an object to the console, a file or a connection (dget() reads it back), but it seems very clunky and temperamental. Even the R documentation says it is not a good way to transfer objects between R sessions. Anyone outside the R bubble would immediately think of XML or, more lightweight, JSON, and there are at least three packages that read and write JSON in R.
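A minimal sketch of the dput() round trip, assuming a small illustrative data frame: capture the text, then parse and evaluate it to recreate the object. The second half is one illustration (not a full catalogue) of why dput() can be temperamental: an environment has no text form, so dput() emits a placeholder that cannot be parsed back.

```r
# dput() writes R source text that, when evaluated, recreates the object.
anObject <- data.frame(x = 1:3, y = c(0.5, -1.2, 0.3))

# Capture dput()'s output as a single string instead of printing it.
asText <- paste(capture.output(dput(anObject)), collapse = "\n")
cat(asText, "\n")

# Parsing and evaluating the text recreates the object.
roundTripped <- eval(parse(text = asText))
stopifnot(identical(roundTripped, anObject))

# Environments cannot be deparsed: dput() prints a placeholder such as
# "<environment>", which is not valid R code, so parsing it fails.
envText <- capture.output(dput(new.env()))
parsed <- tryCatch(parse(text = envText), error = function(e) NULL)
stopifnot(is.null(parsed))
```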

jsonlite is a package that started out as a fork of another R JSON package and was later rewritten. It has several useful methods:

  • flatten – converts a nested data frame (one with data-frame columns) into a flat, two-dimensional data frame
  • prettify, minify – add or remove indentation and whitespace in a JSON string
  • serializeJSON, unserializeJSON – robust, and consequently more verbose, serialization of R objects to and from JSON; encodes all data and attributes
  • stream_in, stream_out – line-by-line processing of JSON over a connection, common with large datasets exported from JSON databases
  • toJSON, fromJSON – convert between R objects and JSON using jsonlite's type-mapping conventions
  • unbox – utility that marks a length-one atomic vector, or a one-row data frame, as a singleton so it is written as a JSON scalar or object rather than an array; useful for restrictive, predetermined JSON structures
  • validate – tests whether a string contains valid JSON
  • rbind.pages – combines a list of data frames into a single data frame, intended to help with paged JSON coming in over the wire (renamed rbind_pages in later jsonlite releases)

So, if you need textual representations of objects, I would use toJSON() over dput().
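Several of the utilities listed above can be exercised together in a short sketch. It assumes jsonlite is installed; the data frame and variable names are illustrative, and rbind_pages() is used as the current name for rbind.pages().

```r
library(jsonlite)

# A small nested data frame: the "loc" column is itself a data frame.
df <- data.frame(id = 1:2)
df$loc <- data.frame(lat = c(51.5, 40.7), lon = c(-0.1, -74.0))

# flatten(): nested columns become top-level columns "loc.lat", "loc.lon".
flat <- flatten(df)
print(names(flat))

# toJSON() emits compact JSON by default; prettify()/minify() toggle whitespace.
json <- toJSON(flat)
prettyJson <- prettify(json)
compactJson <- minify(prettyJson)

# validate() returns TRUE, or FALSE with an "err" attribute describing the problem.
stopifnot(validate(compactJson))
stopifnot(!validate("{not json}"))

# unbox(): a length-one vector is written as a scalar, not a 1-element array.
toJSON(list(status = "ok"))          # {"status":["ok"]}
toJSON(list(status = unbox("ok")))   # {"status":"ok"}

# stream_out()/stream_in(): one JSON record per line over a connection.
tmp <- tempfile(fileext = ".ndjson")
stream_out(flat, file(tmp), verbose = FALSE)
streamed <- stream_in(file(tmp), verbose = FALSE)

# rbind_pages(): stack a list of data frames, e.g. pages from a paged API.
combined <- rbind_pages(list(flat[1, ], flat[2, ]))
```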


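    # Load jsonlite, which provides fromJSON() and toJSON()
    > library(jsonlite)

    # Get JSON over the wire and convert it to a local data frame
    > aDataFrame <- fromJSON("https://api.github.com/users/hadley/orgs")


    # Serialize anObject (from the saveRDS() example) to indented JSON text
    > anObjectInJSON <- toJSON(anObject, pretty=TRUE)

    # ...and parse it back into a data frame
    > anObjectAgain <- fromJSON(anObjectInJSON)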
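The choice between toJSON() and serializeJSON() matters when R-specific details must survive the trip. The sketch below, assuming an illustrative data frame with a factor column and a made-up "source" attribute, shows that a toJSON()/fromJSON() round trip turns the factor into a character column and drops the custom attribute, while serializeJSON()/unserializeJSON() brings both back.

```r
library(jsonlite)

# A data frame with a factor column and a hypothetical custom attribute.
anObject <- data.frame(x = 1:3, y = factor(c("a", "b", "a")))
attr(anObject, "source") <- "example"

# toJSON()/fromJSON(): compact, interoperable JSON; factors are written
# as strings and custom attributes are not encoded.
viaToJSON <- fromJSON(toJSON(anObject))
str(viaToJSON)

# serializeJSON()/unserializeJSON(): verbose, R-specific JSON that encodes
# types and attributes, so the factor and the attribute come back.
viaSerialize <- unserializeJSON(serializeJSON(anObject))
stopifnot(is.factor(viaSerialize$y))
stopifnot(identical(attr(viaSerialize, "source"), "example"))
```

The trade-off mirrors the list above: toJSON() is what other systems can read, while serializeJSON() is closer in spirit to saveRDS() but in text form.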
