So, there is a movement in the data science community to kill the pie chart, on the grounds that all pie charts are bad. Much of the criticism is valid. Pie charts can be replaced by better representations, usually starting with bar charts, which convey more accurate and easier-to-interpret comparisons between categories. It’s harder to get away with willfully distorting perspective in a bar chart than it is in a pie chart – always ask to see the percentage values in someone else’s dense pie chart. Where a pie chart does come into play is a binary comparison, and that can be powerful. With more than three categories, use a bar chart.
I had to profile some functions, so I needed to delve into the basics of R’s profiling options.
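As a starting point, here is a minimal sketch of the two base R options I would reach for first: `system.time()` for a quick wall-clock check and `Rprof()` with `summaryRprof()` for a sampling profile. The function `slow_mean` is made up for illustration and stands in for whatever actually needs profiling.

```r
# Hypothetical function to profile; it stands in for the real code under test.
slow_mean <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total / n
}

# Quick wall-clock timing of a single call.
timing <- system.time(result <- slow_mean(1e6))
print(timing[["elapsed"]])

# Sampling profiler: write samples to a temp file, then summarise them.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (k in 1:20) slow_mean(1e6)   # repeat so the sampler collects enough samples
Rprof(NULL)                      # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.self))        # time spent in each function itself
```

`system.time()` answers "how long does it take", while the `by.self` table from `summaryRprof()` points at where the time goes. Very fast code may need more repetitions before the sampler records anything.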
I just found the qdapRegex package for R, part of the larger qdap packages that Jason Gray and Tyler Rinker have put together for supporting text munging/processing for discourse analysis, etc. There’s a lot in there, with four libraries, including the Regex set, some tools, dictionaries and a qdap proper for the qualitative analysis (pre)-processing.
Here is a list of (mostly) publicly available datasets in easily digested formats. There is a great amount of variation here, from finance to the Enron email collection to governmental and health care data. Very useful. In addition, here are some others not on the list:
Looking to create a small multiples plot in ggplot with a wordy y axis title. Here is the code:
ggplot(df, aes(x=x, y=y) ) +  # df is a placeholder data frame; geoms and facets omitted
labs(x="XXX", y="Many, many words about YYY")
If you want to see some of the wonderful things that one can adjust with the axes, go read the cookbook’s section on formats for axes. But I had the y-axis title overlapping the scale ticks on the y-axis, and there was nothing I found in the cookbook to deal with this.
So, to adjust the position use a theme method:
theme(axis.title.y = element_text(vjust=0.5))
Remember that this is adjusting from the perspective of the axis, so for the y-axis I want to shift it vertically.
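For anyone who wants to try this end to end, here is a minimal sketch of the whole plot. The built-in `mtcars` data, the `cyl` facets and the axis labels are my stand-ins, not the data from the original plot. It also shows an alternative that sets an explicit margin on the title, which, as far as I know, is the more usual way to add space in more recent ggplot2 releases; treat the margin value as an assumption to tune.

```r
library(ggplot2)

# mtcars stands in for the real data; facet_wrap() gives the small multiples.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  labs(x = "Weight", y = "Many, many words about miles per gallon")

# The adjustment described above.
p_vjust <- p + theme(axis.title.y = element_text(vjust = 0.5))

# Alternative: push the title away from the tick labels with a right margin
# (15 points is an arbitrary illustrative value).
p_margin <- p + theme(axis.title.y = element_text(margin = margin(r = 15)))

print(p_margin)
```

How `vjust` behaves on the rotated y-axis title has varied between ggplot2 versions, so if it appears to do nothing, the margin version is worth trying.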
The lubridate package in R is excellent. It is intuitive for working with all kinds of date methods and very comprehensive. All praise to Hadley for continuing to maintain this excellent addition to the community.
In recent work, I have noticed that methods will correctly handle NA values in the input object, but I don’t see a way to turn off the warnings when irrelevant. A long time ago, NA was breaking methods, but Hadley fixed that with b8e90c.
And it works.
> test <- c("1/1/15", "2/2/15", "3/3/15", NA, "5/5/15")
> test
[1] "1/1/15" "2/2/15" "3/3/15" NA "5/5/15"
[1] FALSE FALSE FALSE TRUE FALSE
[1] "2015-01-01 UTC" "2015-02-02 UTC" "2015-03-03 UTC" NA "2015-05-05 UTC"
In addition, in the wild, I am using the method on a vector and getting warnings that I would like to ignore. I haven’t seen a param to pass to ignore warnings, so it would be a nice addition.
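Until something like that is confirmed for your version, one workaround is base R’s `suppressWarnings()` wrapped around the parsing call. This is a minimal sketch with made-up input. Using `mdy()` is my assumption about the parser, since the sample dates above would parse the same way as day/month/year.

```r
library(lubridate)

# Made-up input: one value cannot be parsed and one is already NA.
raw_dates <- c("1/1/15", "2/2/15", "not a date", NA, "5/5/15")

# Silence the parse warning for this one call only.
parsed <- suppressWarnings(mdy(raw_dates))

# Check afterwards which non-missing inputs failed, so nothing is hidden silently.
failed <- raw_dates[is.na(parsed) & !is.na(raw_dates)]

print(parsed)
print(failed)
```

Recent lubridate releases also document a `quiet = TRUE` argument on the parsing functions, which may cover the same need. I would check the help page for the installed version before relying on it.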
So, the analysis is rolling in about what won it for Obama, and a great deal is credited to big data and analytical models. The data came from public sector and commercial databases, combined into an Obama campaign data warehouse. Then there were real data scientists who knew how to build models and act upon them. Romney’s people were also doing these sorts of things but, importantly for me, chose to outsource much of the effort and thus were not able to own and exploit as much of the results and models as Obama’s team. This is the salient lesson: in a world of increasing data-centricity, the successful organizations will not view data, and the applications which sweat it, as a commodity. Rather, they will need to see them as a core strategic requirement that, if anything, they will need to grow.
Tom Davenport knocked together an interesting [summary of CC tools](http://blogs.hbr.org/cs/2012/08/a_few_weeks_ago_i.html) that has some cursory analysis and applications. Worth a quick gander. Some of the comments are interesting too. It is amusing to see that the common angst of data warehousing is still reaching new audiences as this drive towards larger adoption of data-centricity continues, namely *we need common definitions*. Ahh, ontology. A book that is referenced and praised in the comments, and that I haven’t read yet, is [Customer Worthy](http://www.amazon.com/Customer-Worthy-everyone-organization-Think/dp/0981986919/ref=sr_1_1?ie=UTF8qid=1345216632sr=8-1keywords=Customer+Worthy%2C+Why+and+How+Everyone+Must+Think+Like+a+Customer). Need to read this, I think.
[SoLAR](http://www.solaresearch.org/), the Society for Learning Analytics Research, has some good [resources](http://www.solaresearch.org/resources/) and is working towards certifications.
There was some interesting stuff happening at [LAK 2012](http://www.solaresearch.org/events/lak/2012videos/) in Vancouver. I didn’t go, but want to go over some stuff here and capture it for later. Much of this will be pushed forward in Denver at [Educause 2012](http://www.educause.edu/annual-conference), which I should be at. This particular talk looked at how to build organizational capacity for LA inside an HE institution. Donald M. Norris, Linda Baer
Panel Proposal: Building Organizational Capacity for Analytics.