Tag Archives: analytics

Most pie charts are bad, so here is a good one.

There is a movement in the data science community to kill the pie chart, on the grounds that all pie charts are bad. Much of the criticism is valid. Pie charts can be replaced by better representations, usually starting with bar charts, that convey more accurate and easier-to-interpret comparisons between categories. It’s harder to get away with willfully distorting perspective in a bar chart than in a pie chart, so always ask to see the percentage values in someone else’s dense pie chart. Where the pie chart does come into play is a binary comparison, which can be powerful. With more than three categories, use a bar chart.

But with all of that said, I came across a very illuminating pie chart today that I wanted to preserve.

[Image: pacman pie chart]

There is also this actual pie.

[Image: pie eaten pie chart]

qdapRegex library for R

I just found the qdapRegex package for R, part of the larger set of qdap packages that Jason Gray and Tyler Rinker have put together to support text munging and processing for discourse analysis, etc. There’s a lot in there across four libraries: the Regex set, some tools, dictionaries, and qdap proper for the qualitative analysis (pre-)processing.


Moving label titles in ggplot

I was looking to create a small-multiples plot in ggplot with a wordy y-axis title. Here is the code:
library(ggplot2)

ggplot(myData, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ a_third_variable) +
  labs(x = "XXX", y = "Many, many words about YYY")

If you want to see some of the wonderful things that one can adjust with the axes, go read formats for axes. But I had the y-axis title overlapping the tick labels on the y-axis, and I found nothing in the cookbook to deal with this.

So, to adjust the position use a theme method:
theme(axis.title.y = element_text(vjust=0.5))

Remember that this adjusts from the perspective of the axis title itself: the y-axis title is rotated, so it is the vertical justification (vjust), not hjust, that shifts it toward or away from the axis.
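How far vjust moves the title seems to vary between ggplot2 versions, so here is a minimal sketch of another option: padding the title with the `margin` argument of `element_text()`, assuming a ggplot2 version that provides `margin()`. The data frame below is made up purely for illustration, and the 15-point value is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical data standing in for myData
myData <- data.frame(
  x = rep(1:10, times = 3),
  y = rep(1:10, times = 3) * rep(c(1, 2, 3), each = 10),
  a_third_variable = rep(c("a", "b", "c"), each = 10)
)

p <- ggplot(myData, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ a_third_variable) +
  labs(x = "XXX", y = "Many, many words about YYY") +
  # Add 15 points of space to the right of the y-axis title,
  # which pushes it away from the tick labels
  theme(axis.title.y = element_text(margin = margin(r = 15)))

print(p)
```

Increase or decrease the `r` value until the title clears the tick labels.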

Need to be able to silence warnings on lubridate functions

The lubridate package in R is excellent. It is intuitive for working with all kinds of date methods and very comprehensive. All praise to Hadley for continuing to maintain this excellent addition to the community.

In recent work, I have noticed that the methods correctly handle NA values in the input object, but I don’t see a way to turn off the warnings when they are irrelevant. A long time ago, NA was breaking methods, but Hadley fixed that with b8e90c.

And it works.


> test <- c("1/1/15", "2/2/15", "3/3/15", NA, "5/5/15")
> test
[1] "1/1/15" "2/2/15" "3/3/15" NA "5/5/15"
> is.na(test)
[1] FALSE FALSE FALSE TRUE FALSE
> dmy(test)
[1] "2015-01-01 UTC" "2015-02-02 UTC" "2015-03-03 UTC" NA "2015-05-05 UTC"

In addition, in the wild, I am using the method on a vector and getting warnings that I would like to ignore. I haven’t seen a parameter to pass to ignore warnings, so it would be a nice addition.
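In the meantime, here is a rough sketch of two possible workarounds. The first wraps the call in base R's `suppressWarnings()`, which works with any function. The second assumes a lubridate version whose parsing functions accept a `quiet` argument, so check `?dmy` in your installed version. The input vector is made up for illustration.

```r
library(lubridate)

# Illustrative input: one unparseable string and one NA
test <- c("1/1/15", "2/2/15", "not a date", NA, "5/5/15")

# Option 1 (base R): discard any warnings raised while parsing
parsed <- suppressWarnings(dmy(test))

# Option 2: lubridate's own switch, assuming the installed
# version of dmy() accepts a `quiet` argument
parsed_quiet <- dmy(test, quiet = TRUE)

print(parsed)
print(parsed_quiet)
```

Note that `suppressWarnings()` hides every warning raised inside the call, not just parse failures, so it is best kept tightly scoped around the parsing step.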

Own your data and the capability to sweat it

So, the analysis is rolling in about what won it for Obama, and a great deal of the credit goes to big data and analytical models. The data came from public sector and commercial databases, combined into an Obama campaign data warehouse. Then there were real data scientists who knew how to build models and act upon them. Romney’s people were also doing these sorts of things but, importantly for me, chose to outsource much of the effort and so were not able to own and exploit as much of the results and models as Obama. This is the salient lesson: in a world of increasing data-centricity, successful organizations will not view data, and the applications which sweat it, as a commodity. Rather, they will need to see them as a core strategic requirement that, if anything, they will need to grow.

Customer Intelligence tools and options

Tom Davenport knocked together an interesting [summary of CC tools](http://blogs.hbr.org/cs/2012/08/a_few_weeks_ago_i.html) that has some cursory analysis and applications. Worth a quick gander. Some of the comments are interesting too. It is amusing to see that the common angst of data warehousing is still reaching new audiences as this drive towards larger adoption of data-centricity continues, namely *we need common definitions*. Ahh, ontology. A book referenced and praised in the comments that I haven’t read yet is [Customer Worthy](http://www.amazon.com/Customer-Worthy-everyone-organization-Think/dp/0981986919/ref=sr_1_1?ie=UTF8&qid=1345216632&sr=8-1&keywords=Customer+Worthy%2C+Why+and+How+Everyone+Must+Think+Like+a+Customer). Need to read this, I think.

LAK 2012

There was some interesting stuff happening at [LAK 2012](http://www.solaresearch.org/events/lak/2012videos/) in Vancouver. I didn’t go, but I want to go over some of it here and capture it for later. Much of this will be pushed forwards in Denver at [Educause 2012](http://www.educause.edu/annual-conference), which I should be at. This particular talk looked at how to build organizational capacity for learning analytics (LA) inside a higher education (HE) institution.

Donald M. Norris, Linda Baer. Panel Proposal: Building Organizational Capacity for Analytics.