Plot covariance estimates in a GaussianMixture cluster


Moving covariance matrix functions from GMM to GaussianMixture in sklearn

In sklearn 0.18 the GMM model is deprecated, and it looks set to be removed in 0.20. The replacement is GaussianMixture, available from sklearn.mixture. I have some code that currently uses GMM and needed to port it to GaussianMixture. One of the major convenience features of GMM is its accessors for means, covariances, etc. The covariance matrix itself is critical for many of the clustering applications that would motivate the original use of GMM. This post shows how to port that covariance accessor to the promoted GaussianMixture implementation.

Covariance Matrix

Remember that a covariance matrix holds the covariances between all pairs of elements of two random vectors. For two random variables X and Y, the covariance is:

Cov[X,Y]=E[(X − E[X])(Y − E[Y])]=E[XY] − E[X]E[Y]

An alternative form, perhaps more expressive for the matrices we are dealing with here, uses Σ for the covariance matrix and μ for the mean of a random vector X:

Σ = E[(X − μ)(X − μ)^T] = E[XX^T] − μμ^T
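
As a quick numerical check of the identity above, the sketch below computes the covariance matrix of a tiny, made-up two-feature sample both ways, using only the standard library. The data and the helper names (outer, mean_matrix, cov_centered, cov_moments) are illustrative assumptions, and the expectation is taken as a plain average over the sample (the population form, dividing by N rather than N − 1).

```python
# Made-up sample: each row is one observation of a 2-dimensional random vector X.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 4.0)]
n = len(samples)
dim = len(samples[0])

# mu = E[X], estimated as the per-feature sample mean.
mu = [sum(row[j] for row in samples) / n for j in range(dim)]


def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def mean_matrix(matrices):
    """Element-wise average of a list of equally sized square matrices."""
    count = len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(dim)]
            for i in range(dim)]


# First form: E[(X - mu)(X - mu)^T]
centered = [[x - m for x, m in zip(row, mu)] for row in samples]
cov_centered = mean_matrix([outer(c, c) for c in centered])

# Second form: E[X X^T] - mu mu^T
second_moment = mean_matrix([outer(row, row) for row in samples])
mu_outer = outer(mu, mu)
cov_moments = [[second_moment[i][j] - mu_outer[i][j] for j in range(dim)]
               for i in range(dim)]

print(cov_centered)
print(cov_moments)
```

Both prints should show the same matrix, up to floating-point rounding.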

Python Intelligent Algorithms

Working with a GMM clustering over the iris dataset, there is a dependency on the make_ellipses method published by Ron Weiss ronweiss@gmail.com.
The most non-trivial move required to shift from GMM to GaussianMixture involves:

v, w = np.linalg.eigh(gmm._get_covars()[n][row_idx[:, None], col_idx])

which needs to shift to

v, w = np.linalg.eigh(gmm.covariances_[n][row_idx[:, None], col_idx])

in order to work with the newer model package.
The full listing of the function as I have ported it is then:

import numpy as np
import matplotlib as mpl
import matplotlib.patches  # makes mpl.patches available


def make_ellipses(gmm, ax, x, y):
    """
    Extracts a covariance matrix in 2D from a higher dimensional feature space.
    Calculates an ellipse along maximal variance in a GMM object in 2D,
    both direction and the respective magnitude, i.e., the eigenvector and eigenvalue
    of the covariance matrix. It writes the resulting ellipse onto an existing pyplot plot.
    :param gmm: fitted sklearn GaussianMixture object (this indexing assumes
                covariance_type='full' and three components, one per colour)
    :param ax: plot axis - the 2D subset of the full feature space
    :param x: the first dimension of the 2D plot axis
    :param y: the second dimension of the 2D plot axis
    :return: None
    """
    for n, color in enumerate('rgb'):
        row_idx = np.array([x, y])
        col_idx = np.array([x, y])
        # GMM had a _get_covars() method; GaussianMixture exposes covariances_ instead
        v, w = np.linalg.eigh(gmm.covariances_[n][row_idx[:, None], col_idx])
        u = w[0] / np.linalg.norm(w[0])
        angle = np.arctan2(u[1], u[0])
        angle = 180 * angle / np.pi  # convert rads to degrees
        v *= 9
        # angle is passed by keyword; recent matplotlib releases no longer accept it positionally
        ell = mpl.patches.Ellipse(gmm.means_[n, [x, y]], v[0], v[1], angle=180 + angle, color=color)
        ell.set_clip_box(ax.bbox)
        ell.set_alpha(0.5)
        ax.add_artist(ell)

For an excellent discussion of using ellipses to plot the covariances of your GaussianMixture, see GMM covariances.
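
To see what the eigen-decomposition in make_ellipses is extracting, the 2×2 case can be worked in closed form. The sketch below is not part of the ported function: ellipse_from_cov is a hypothetical helper that takes a symmetric 2×2 covariance matrix [[a, b], [b, c]] and returns its two eigenvalues (the variances along the ellipse axes) and the orientation of the major axis in degrees.

```python
import math


def ellipse_from_cov(a, b, c):
    """Eigenvalues and major-axis angle of the symmetric matrix [[a, b], [b, c]].

    Hypothetical helper for illustration; it uses the closed-form solution
    for a symmetric 2x2 matrix rather than a general eigen-solver.
    """
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    major, minor = mean + radius, mean - radius
    # Direction of the eigenvector that belongs to the larger eigenvalue.
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * b, a - c))
    return major, minor, angle_deg


# Positively correlated features with equal variance: the ellipse is tilted
# halfway between the two axes.
major, minor, angle = ellipse_from_cov(2.0, 1.0, 2.0)
print(major, minor, angle)
```

The larger eigenvalue sets the length of the ellipse along its major axis, and the angle is what ends up rotating the patch on the plot.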


Make a Rose chart in R using ggplot

I recently got a request to make a rose plot, sometimes called a circumplex or doughnut chart. There are two cases for this kind of plot. The first is where you are using data that naturally sits in the circumpolar coordinate system; circular or polar data fit naturally in such a chart. The second is where you want to take naturally cartesian data and transform it into the circumpolar coordinate system, often simply for visual effect. Either way, here I will describe how to do this in R (version 3.3.1, “Bug in Your Hair”) and ggplot2 (any 2.x version should work fine).

Naturally circumpolar data

An example of a natural dataset for such a graph can be seen in this periodic data represented in the rose chart.

Polar Data Plot

Naturally cartesian data

However, most people aren’t dealing with this natural coordinate system. Rather, they are in a traditional cartesian coordinate system – if you don’t know, then with a high degree of probability you should assume that you’re in a basically cartesian space.
But we can still achieve the rose chart for this data. Let’s walk through it with some sample data.

library(ggplot2)
library(plyr)

# generate some random data
set.seed(42)
events <- ceiling(10*runif(10)) 
sales <- 1000*runif(10)

# make a dataframe
df <- data.frame(market=1:10, events = events, sales = sales)

Now we have created some markets, each of which has a number of events (1 to 10) and some sales returns (0 to 1000).
My dataframe ended up looking like:

Market Events Sales
1 10 457.7418
2 10 719.1123
3 3 934.6722
4 9 255.4288
5 7 462.2928
6 6 940.0145
7 8 978.2264
8 2 117.4874
9 7 474.9971
10 8 560.3327

We can easily create a bar chart that shows this data:

# make the initial bar chart
p1 <- ggplot(df) +
    aes(x=factor(market), y=sales, fill=factor(market)) +
    geom_bar(width=1, stat="identity")

Calling p1 will give you your version of this plot:

Bar Chart

You could easily make a similar chart for Events by Market.

Translate to a circumpolar coordinate system

To make the data that we have into a rose plot we are going to wrap that bar chart onto itself.

# now simply want to cast the cartesian coord bar chart onto a circumpolar coord system.
p2 <- p1 + scale_y_continuous(breaks = 0:10) +
    coord_polar() + 
    labs(x = "", y = "") +
    scale_fill_discrete(guide = guide_legend(title='Market')) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank())

Here we have taken the already existing bar chart, p1, and given it a continuous y scale that corresponds to 10 divisions – 1000 would not add any clarity to the resulting plot. We are simply trying to give a sense of scale in the y-axis.
We then push onto the polar coordinate system with coord_polar(). That’s it. The remaining calls help to clean up our presentation: we remove the x and y axis labels and add a legend for the Market factor color map. Finally, using calls to theme, we remove all of the axis text and ticks to simplify the presentation.
Here is what we end up with:

Rose chart

That’s fine, but we lose all sense of perspective in the actual market values and comparison between markets could perhaps be made simpler. Let’s try to add some perspective. Moving back to our original bar chart, let’s add some grids that give a better sense of scale along the y-axis.

# to achieve a grid that is visible, we will add 
# a variable to the dataframe that we can plot as a separate plot
# This means that we use plyr.ddply to subset the original data,
# grouped by the market column, and add a new "border" column
# that we can then stack in a separate geom_bar

df2 <- ddply(df, .(market), transform, border = rep(1, events))

p1 <- ggplot(df) +
    aes(x=factor(market)) +
    geom_bar(aes(y=events, fill=factor(market)),
             width=1, 
             stat="identity") +
    geom_bar(data=df2,
             aes(y = border, width = 1), 
             position = "stack", 
             stat = "identity", 
             fill = NA, 
             colour = "black")

Firstly, we computed a second dataframe using ddply from plyr. This took every market row and added a border column that has a 1 for every event in that market. Have a View() of the dataframe and you will see many more rows than in df – I have 70 in mine. Each market now has as many rows as there were events in that market.
We then built the same sort of bar chart as before, but note that we have flipped to events for the y-axis. I have reversed what we did before so you can try it out for sales on your own.
Crucially, we added a second bar chart to the plot object, which uses the df2 data. It is building that bar chart with the border column data and stacking the results with no fill and black outlines. Your resulting bar chart looks like:

Bar Chart with Grids
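
If the ddply step feels opaque, the same expansion can be sketched in plain Python (used here only to illustrate the reshaping; it is not part of the R workflow). Each market contributes one row per event, every row carrying border = 1, so stacking those rows draws one outlined cell per event. The event counts are the values from my dataframe above.

```python
# Event counts per market, copied from the dataframe shown earlier.
events = [10, 10, 3, 9, 7, 6, 8, 2, 7, 8]

# One row per event: the equivalent of border = rep(1, events) within each market.
df2 = [
    {"market": market, "events": count, "border": 1}
    for market, count in enumerate(events, start=1)
    for _ in range(count)
]

# Total rows equals the total number of events across all markets.
print(len(df2))

# Rows per market match that market's event count.
rows_per_market = {m: sum(1 for row in df2 if row["market"] == m)
                   for m in range(1, len(events) + 1)}
print(rows_per_market)
```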

Cast our gridded bar chart to polar coords

To get a rose chart from this new bar chart is no different to what we did before. All the differences are wrapped up in the generation of p1, so we have kept our code fairly DRY.
Rerunning the generation of p2 with the new p1:

p2 <- p1 + scale_y_continuous(breaks = 0:10) +
    coord_polar() + 
    labs(x = "", y = "") +
    scale_fill_discrete(guide = guide_legend(title='Market')) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank())

yields:
Rose chart with grids

Iterables vs. Generators in Python

I’ve had some people asking me lately what the difference between a Python iterable and a Python generator is, and when to use each. This is a short write-up to show some of the differences and the benefits of each.

Iterables

Simply put, an iterable in Python is any object that you can loop over with the in keyword of a for loop, e.g.:

    A = [1,2,3,4,5]
    for a in A:
        print(a)

will output:

1
2
3
4
5

An iterable is an object that holds other items, primitives or more complex objects. Lists, tuples, sets, and dicts are all examples of iterables. So are things like strings and files. We use them all the time, quite properly, in our code.
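
Under the hood, a for loop asks the iterable for an iterator and then pulls items from it one at a time. A minimal sketch of that protocol, using only the built-ins iter() and next():

```python
A = [1, 2, 3, 4, 5]

iterator = iter(A)        # what a for loop requests behind the scenes
first = next(iterator)    # first item of A
second = next(iterator)   # second item of A

# The iterator remembers its position, so only the rest is left.
remaining = list(iterator)

# Once the iterator is exhausted, next() raises StopIteration,
# which is the signal a for loop uses to stop.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# The list itself is untouched and can be iterated again from the start.
again = list(A)
print(first, second, remaining, exhausted, again)
```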

And when we use a comprehension, (hint, hint!) we are also creating an iterable, so

    B = [x for x in range(1,10000)]

creates a list of 9,999 values, from 1 to 9,999.

So that’s great. But perhaps we should be concerned about things like efficiency, speed, and memory when we are building larger applications. Let’s examine a couple of data points. Here I’m using sys.getsizeof to get a fairly decent idea of memory usage (it reports the size of the container itself, not of the items it refers to).

sys.getsizeof(A)
920
sys.getsizeof(B)
87632

Not surprisingly, B > A. Reasonable, given that B has 9,999 items while A has 5. But think about your application and how long A or B are going to hang around. Are you using them once or many times? Are you losing available memory for a one-time operation, or do you need to access that iterable again and again? If you only need it once, it is costly to make the iterable and then have it hang around.

Generators

Welcome the generator, which does not store the values but rather the method used to compute them. Let’s create a generator that gives the same result as B above.

    C = (x for x in range(1,10000))

The difference in the definition is the use of () rather than the list comprehension []. What did we get? Let’s look at the differences.

type(A)
<class 'list'>
type(B)
<class 'list'>
type(C)
<class 'generator'>

But try using C in your print loop and you will get the same result! Size differences?

sys.getsizeof(A)
920
sys.getsizeof(B)
87632
sys.getsizeof(C)
80

Now that’s interesting! Not only is C 0.09% the size of B, while giving the same result when used for computation, but C is 8.7% the size of A, which holds only 5 values compared to C’s 9,999 (about 0.05% as many)!
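
The exact byte counts depend on the Python version and platform, so rather than trusting my numbers you can reproduce the comparison yourself. A small sketch (the variable names size_b and size_c are just for illustration):

```python
import sys

B = [x for x in range(1, 10000)]   # materialises every value
C = (x for x in range(1, 10000))   # stores only the recipe

size_b = sys.getsizeof(B)
size_c = sys.getsizeof(C)

print(f"list: {size_b} bytes, generator: {size_c} bytes")
print(f"generator is {100 * size_c / size_b:.2f}% the size of the list")
```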

I got the output I wanted from C with a similar iteration, namely:

    for c in C:
        print(c)

and, sure enough, I have 9,999 values in the output. But when I ran it a second time, I got no output! That’s because my generator was consumed when I iterated over it; a generator can only be used once. There are many times when I only need to make the iterable and run over it once, so that’s fine, and I gain considerable efficiencies.
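
The single-use behaviour is easy to demonstrate. In this sketch the second pass over the generator finds nothing left, while recreating the generator gives the values back:

```python
C = (x for x in range(1, 10000))

first_pass = sum(1 for _ in C)    # consumes every value
second_pass = sum(1 for _ in C)   # nothing left to consume

print(first_pass, second_pass)

# To iterate again, build a fresh generator (or keep a list if you need reuse).
C = (x for x in range(1, 10000))
third_pass = sum(1 for _ in C)
print(third_pass)
```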

A more complex example

And these are merely simple iterables. Imagine what happens with more complex structures, e.g.

    letters = ['a', 'b', 'c', 'd', 'e']
    colors = ['red', 'yellow', 'blue', 'green']
    squares = [x*x for x in range(1,10)]

    newlist = [(l, c, s) for l in letters for c in colors for s in squares]
    newgen = ((l, c, s) for l in letters for c in colors for s in squares)

Both newlist and newgen produce objects that yield the same 180 tuples, e.g.:

('a', 'red', 1)
('a', 'red', 4)
('a', 'red', 9)
('a', 'red', 16)
('a', 'red', 25)
('a', 'red', 36)
('a', 'red', 49)
('a', 'red', 64)
('a', 'red', 81)
('a', 'yellow', 1)
('a', 'yellow', 4)
('a', 'yellow', 9)
('a', 'yellow', 16)
('a', 'yellow', 25)
('a', 'yellow', 36)
('a', 'yellow', 49)
('a', 'yellow', 64)
('a', 'yellow', 81)
('a', 'blue', 1)
('a', 'blue', 4)
('a', 'blue', 9)
('a', 'blue', 16)
('a', 'blue', 25)
('a', 'blue', 36)
('a', 'blue', 49)
('a', 'blue', 64)
('a', 'blue', 81)
('a', 'green', 1)
('a', 'green', 4)
('a', 'green', 9)
('a', 'green', 16)
('a', 'green', 25)
('a', 'green', 36)
('a', 'green', 49)
('a', 'green', 64)
('a', 'green', 81)
('b', 'red', 1)
('b', 'red', 4)
('b', 'red', 9)
('b', 'red', 16)
('b', 'red', 25)
('b', 'red', 36)
('b', 'red', 49)
('b', 'red', 64)
('b', 'red', 81)
('b', 'yellow', 1)
('b', 'yellow', 4)
('b', 'yellow', 9)
('b', 'yellow', 16)
('b', 'yellow', 25)
('b', 'yellow', 36)
('b', 'yellow', 49)
('b', 'yellow', 64)
('b', 'yellow', 81)
('b', 'blue', 1)
('b', 'blue', 4)
('b', 'blue', 9)
('b', 'blue', 16)
('b', 'blue', 25)
('b', 'blue', 36)
('b', 'blue', 49)
('b', 'blue', 64)
('b', 'blue', 81)
('b', 'green', 1)
('b', 'green', 4)
('b', 'green', 9)
('b', 'green', 16)
('b', 'green', 25)
('b', 'green', 36)
('b', 'green', 49)
('b', 'green', 64)
('b', 'green', 81)
('c', 'red', 1)
('c', 'red', 4)
('c', 'red', 9)
('c', 'red', 16)
('c', 'red', 25)
('c', 'red', 36)
('c', 'red', 49)
('c', 'red', 64)
('c', 'red', 81)
('c', 'yellow', 1)
('c', 'yellow', 4)
('c', 'yellow', 9)
('c', 'yellow', 16)
('c', 'yellow', 25)
('c', 'yellow', 36)
('c', 'yellow', 49)
('c', 'yellow', 64)
('c', 'yellow', 81)
('c', 'blue', 1)
('c', 'blue', 4)
('c', 'blue', 9)
('c', 'blue', 16)
('c', 'blue', 25)
('c', 'blue', 36)
('c', 'blue', 49)
('c', 'blue', 64)
('c', 'blue', 81)
('c', 'green', 1)
('c', 'green', 4)
('c', 'green', 9)
('c', 'green', 16)
('c', 'green', 25)
('c', 'green', 36)
('c', 'green', 49)
('c', 'green', 64)
('c', 'green', 81)
('d', 'red', 1)
('d', 'red', 4)
('d', 'red', 9)
('d', 'red', 16)
('d', 'red', 25)
('d', 'red', 36)
('d', 'red', 49)
('d', 'red', 64)
('d', 'red', 81)
('d', 'yellow', 1)
('d', 'yellow', 4)
('d', 'yellow', 9)
('d', 'yellow', 16)
('d', 'yellow', 25)
('d', 'yellow', 36)
('d', 'yellow', 49)
('d', 'yellow', 64)
('d', 'yellow', 81)
('d', 'blue', 1)
('d', 'blue', 4)
('d', 'blue', 9)
('d', 'blue', 16)
('d', 'blue', 25)
('d', 'blue', 36)
('d', 'blue', 49)
('d', 'blue', 64)
('d', 'blue', 81)
('d', 'green', 1)
('d', 'green', 4)
('d', 'green', 9)
('d', 'green', 16)
('d', 'green', 25)
('d', 'green', 36)
('d', 'green', 49)
('d', 'green', 64)
('d', 'green', 81)
('e', 'red', 1)
('e', 'red', 4)
('e', 'red', 9)
('e', 'red', 16)
('e', 'red', 25)
('e', 'red', 36)
('e', 'red', 49)
('e', 'red', 64)
('e', 'red', 81)
('e', 'yellow', 1)
('e', 'yellow', 4)
('e', 'yellow', 9)
('e', 'yellow', 16)
('e', 'yellow', 25)
('e', 'yellow', 36)
('e', 'yellow', 49)
('e', 'yellow', 64)
('e', 'yellow', 81)
('e', 'blue', 1)
('e', 'blue', 4)
('e', 'blue', 9)
('e', 'blue', 16)
('e', 'blue', 25)
('e', 'blue', 36)
('e', 'blue', 49)
('e', 'blue', 64)
('e', 'blue', 81)
('e', 'green', 1)
('e', 'green', 4)
('e', 'green', 9)
('e', 'green', 16)
('e', 'green', 25)
('e', 'green', 36)
('e', 'green', 49)
('e', 'green', 64)
('e', 'green', 81)

Yet look at the type and size differences:

<class 'list'> 1680
<class 'generator'> 80

newlist is 21 times larger than newgen. So, as you write iterable objects, especially iterables that are being used for another computation, consider generators!
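
Two small follow-ups, offered as sketches rather than as part of the original comparison. First, the standard library's itertools.product builds the same lazy stream of tuples as newgen without the nested for clauses. Second, a generator expression can be handed straight to a consuming function such as sum(), so no intermediate list is ever built. The names lazy_combos, same_content and total_of_squares are illustrative.

```python
from itertools import product

letters = ['a', 'b', 'c', 'd', 'e']
colors = ['red', 'yellow', 'blue', 'green']
squares = [x * x for x in range(1, 10)]

newlist = [(l, c, s) for l in letters for c in colors for s in squares]

# Lazy equivalent of newgen: same tuples, same order, produced on demand.
lazy_combos = product(letters, colors, squares)
same_content = list(lazy_combos) == newlist

# Feeding a generator expression directly into another computation.
total_of_squares = sum(s for (_, _, s) in product(letters, colors, squares))

print(same_content, total_of_squares)
```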

Public data sources

Many people ask me for data as they want to experiment with understanding different kinds of data analysis and visualization. Here are some sources that would support different kinds of analysis.

Install Tor using apt-get on Ubuntu trusty 14.x





I went to turn on Tor on an Ubuntu 64-bit “trusty” 14.x guest and ran into problems. There appears to be a bug that critically affects the traditional methods using gpg, so here is some information on how to avoid this.


Tor is free software and an open network that helps you defend against traffic analysis, a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security. There is a very neat Python
controller library for use with Tor called stem.

The problem

To install Tor on Ubuntu, the following instructions can be found:

$ sudo nano /etc/apt/sources.list.d/tor_repo.list

and then add the following lines:

deb http://deb.torproject.org/torproject.org trusty main
deb-src http://deb.torproject.org/torproject.org trusty main

The critical problem then occurs as you try to add the appropriate keyring and key used to sign the packages:


$ gpg --keyserver keys.gnupg.net --recv 886DDD89

$ gpg --export A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 | sudo apt-key add -

There appears to be a bug with gpg and guests involving not correctly resolving DNS for the gpg commands. The symptom that I was getting was:

$ gpg --export A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 | sudo apt-key add -
usage: gpg [options] [filename]
gpg: can't open `–': No such file or directory

The failure in the pipe is due to gpg not actually exporting anything. You can test this with:


$ gpg --export A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89
gpg: WARNING: nothing exported

Solution

So, take another approach. Use different packages to avoid any of this problem.


$ sudo apt-get update

$ sudo apt-get install deb.torproject.org-keyring

$ sudo apt-get install tor

which will get you something like:

The following NEW packages will be installed:
  deb.torproject.org-keyring
0 upgraded, 1 newly installed, 0 to remove and 63 not upgraded.
Need to get 5,268 B of archives.
After this operation, 7,168 B of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
  deb.torproject.org-keyring
Install these packages without verification? [y/N] Y
Get:1 http://deb.torproject.org/torproject.org/ trusty/main deb.torproject.org-keyring all 2014.08.31+b1 [5,268 B]
Fetched 5,268 B in 0s (22.3 kB/s)                     
Selecting previously unselected package deb.torproject.org-keyring.
(Reading database ... 128038 files and directories currently installed.)
Preparing to unpack .../deb.torproject.org-keyring_2014.08.31+b1_all.deb ...
Unpacking deb.torproject.org-keyring (2014.08.31+b1) ...
Setting up deb.torproject.org-keyring (2014.08.31+b1) ...
OK

and

The following extra packages will be installed:
  libseccomp2 tor-geoipdb torsocks
Suggested packages:
  mixmaster torbrowser-launcher socat tor-arm apparmor-utils obfsproxy
  obfs4proxy
The following NEW packages will be installed:
  libseccomp2 tor tor-geoipdb torsocks
0 upgraded, 4 newly installed, 0 to remove and 63 not upgraded.
Need to get 1,707 kB of archives.
After this operation, 8,053 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
WARNING: The following packages cannot be authenticated!
  tor tor-geoipdb
Install these packages without verification? [y/N] Y
Get:1 http://archive.ubuntu.com/ubuntu/ trusty/main libseccomp2 amd64 2.1.0+dfsg-1 [34.8 kB]
Get:2 http://deb.torproject.org/torproject.org/ trusty/main tor amd64 0.2.7.6-1~trusty+1 [1,024 kB]
Get:3 http://archive.ubuntu.com/ubuntu/ trusty/universe torsocks amd64 1.3-3 [73.0 kB]
Get:4 http://deb.torproject.org/torproject.org/ trusty/main tor-geoipdb all 0.2.7.6-1~trusty+1 [575 kB]
Fetched 1,707 kB in 2s (749 kB/s)
Selecting previously unselected package libseccomp2:amd64.
(Reading database ... 128043 files and directories currently installed.)
Preparing to unpack .../libseccomp2_2.1.0+dfsg-1_amd64.deb ...
Unpacking libseccomp2:amd64 (2.1.0+dfsg-1) ...
Selecting previously unselected package tor.
Preparing to unpack .../tor_0.2.7.6-1~trusty+1_amd64.deb ...
Unpacking tor (0.2.7.6-1~trusty+1) ...
Selecting previously unselected package torsocks.
Preparing to unpack .../torsocks_1.3-3_amd64.deb ...
Unpacking torsocks (1.3-3) ...
Selecting previously unselected package tor-geoipdb.
Preparing to unpack .../tor-geoipdb_0.2.7.6-1~trusty+1_all.deb ...
Unpacking tor-geoipdb (0.2.7.6-1~trusty+1) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up libseccomp2:amd64 (2.1.0+dfsg-1) ...
Setting up tor (0.2.7.6-1~trusty+1) ...
Something or somebody made /var/lib/tor disappear.
Creating one for you again.
Something or somebody made /var/log/tor disappear.
Creating one for you again.
 * Starting tor daemon...                                                                                                                  [ OK ] 
Setting up torsocks (1.3-3) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up tor-geoipdb (0.2.7.6-1~trusty+1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...

Now that you have Tor installed, just start it.


$ sudo /etc/init.d/tor start

Starting tor daemon...

It also appears that there is some confusion as to the default port the service runs on. I found it to be running on 9050, not 9150 as some articles report (9150 is the default used by the Tor Browser bundle, while the standalone daemon defaults to 9050). I haven’t yet found an easy way to determine this, but trial and error proved it to be true.
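
One way to replace the trial and error is to probe the candidate ports directly. The sketch below is an assumption-laden helper, not an official Tor tool: open_ports is a hypothetical function that simply tries a TCP connection to each port on localhost. It only tells you that something is listening there, not that the listener is Tor.

```python
import socket


def open_ports(ports, host="127.0.0.1", timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`.

    Hypothetical helper: a successful connect only shows that some service
    is listening, not that it is Tor.
    """
    listening = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                listening.append(port)
        except OSError:
            pass  # connection refused or timed out: nothing reachable here
    return listening


if __name__ == "__main__":
    # 9050 and 9150 are the two ports mentioned above.
    print(open_ports([9050, 9150]))
```

On the guest described here, the expectation would be that only the port the daemon is bound to comes back in the list.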


HTML5 Mathematical Entities

Mathematical HTML entities

The following is a reference list of the characters in the Unicode Mathematical Operators block (U+2200 to U+22FF) for use in HTML5. Where a character has an HTML 4 named entity, that name is given (HTML5 defines further names that are not listed here); every character can also be written with its decimal or hexadecimal reference.

Dec Hex Entity Name
8704 2200 &forall; FOR ALL
8705 2201   COMPLEMENT
8706 2202 &part; PARTIAL DIFFERENTIAL
8707 2203 &exist; THERE EXISTS
8708 2204   THERE DOES NOT EXIST
8709 2205 &empty; EMPTY SET
8710 2206   INCREMENT
8711 2207 &nabla; NABLA
8712 2208 &isin; ELEMENT OF
8713 2209 &notin; NOT AN ELEMENT OF
8714 220A   SMALL ELEMENT OF
8715 220B &ni; CONTAINS AS MEMBER
8716 220C   DOES NOT CONTAIN AS MEMBER
8717 220D   SMALL CONTAINS AS MEMBER
8718 220E   END OF PROOF
8719 220F &prod; N-ARY PRODUCT
8720 2210   N-ARY COPRODUCT
8721 2211 &sum; N-ARY SUMMATION
8722 2212 &minus; MINUS SIGN
8723 2213   MINUS-OR-PLUS SIGN
8724 2214   DOT PLUS
8725 2215   DIVISION SLASH
8726 2216   SET MINUS
8727 2217 &lowast; ASTERISK OPERATOR
8728 2218   RING OPERATOR
8729 2219   BULLET OPERATOR
8730 221A &radic; SQUARE ROOT
8731 221B   CUBE ROOT
8732 221C   FOURTH ROOT
8733 221D &prop; PROPORTIONAL TO
8734 221E &infin; INFINITY
8735 221F   RIGHT ANGLE
8736 2220 &ang; ANGLE
8737 2221   MEASURED ANGLE
8738 2222   SPHERICAL ANGLE
8739 2223   DIVIDES
8740 2224   DOES NOT DIVIDE
8741 2225   PARALLEL TO
8742 2226   NOT PARALLEL TO
8743 2227 &and; LOGICAL AND
8744 2228 &or; LOGICAL OR
8745 2229 &cap; INTERSECTION
8746 222A &cup; UNION
8747 222B &int; INTEGRAL
8748 222C   DOUBLE INTEGRAL
8749 222D   TRIPLE INTEGRAL
8750 222E   CONTOUR INTEGRAL
8751 222F   SURFACE INTEGRAL
8752 2230   VOLUME INTEGRAL
8753 2231   CLOCKWISE INTEGRAL
8754 2232   CLOCKWISE CONTOUR INTEGRAL
8755 2233   ANTICLOCKWISE CONTOUR INTEGRAL
8756 2234 &there4; THEREFORE
8757 2235   BECAUSE
8758 2236   RATIO
8759 2237   PROPORTION
8760 2238   DOT MINUS
8761 2239   EXCESS
8762 223A   GEOMETRIC PROPORTION
8763 223B   HOMOTHETIC
8764 223C &sim; TILDE OPERATOR
8765 223D   REVERSED TILDE
8766 223E   INVERTED LAZY S
8767 223F   SINE WAVE
8768 2240   WREATH PRODUCT
8769 2241   NOT TILDE
8770 2242   MINUS TILDE
8771 2243   ASYMPTOTICALLY EQUAL TO
8772 2244   NOT ASYMPTOTICALLY EQUAL TO
8773 2245 &cong; APPROXIMATELY EQUAL TO
8774 2246   APPROXIMATELY BUT NOT ACTUALLY EQUAL TO
8775 2247   NEITHER APPROXIMATELY NOR ACTUALLY EQUAL TO
8776 2248 &asymp; ALMOST EQUAL TO
8777 2249   NOT ALMOST EQUAL TO
8778 224A   ALMOST EQUAL OR EQUAL TO
8779 224B   TRIPLE TILDE
8780 224C   ALL EQUAL TO
8781 224D   EQUIVALENT TO
8782 224E   GEOMETRICALLY EQUIVALENT TO
8783 224F   DIFFERENCE BETWEEN
8784 2250   APPROACHES THE LIMIT
8785 2251   GEOMETRICALLY EQUAL TO
8786 2252   APPROXIMATELY EQUAL TO OR THE IMAGE OF
8787 2253   IMAGE OF OR APPROXIMATELY EQUAL TO
8788 2254   COLON EQUALS
8789 2255   EQUALS COLON
8790 2256   RING IN EQUAL TO
8791 2257   RING EQUAL TO
8792 2258   CORRESPONDS TO
8793 2259   ESTIMATES
8794 225A   EQUIANGULAR TO
8795 225B   STAR EQUALS
8796 225C   DELTA EQUAL TO
8797 225D   EQUAL TO BY DEFINITION
8798 225E   MEASURED BY
8799 225F   QUESTIONED EQUAL TO
8800 2260 &ne; NOT EQUAL TO
8801 2261 &equiv; IDENTICAL TO
8802 2262   NOT IDENTICAL TO
8803 2263   STRICTLY EQUIVALENT TO
8804 2264 &le; LESS-THAN OR EQUAL TO
8805 2265 &ge; GREATER-THAN OR EQUAL TO
8806 2266   LESS-THAN OVER EQUAL TO
8807 2267   GREATER-THAN OVER EQUAL TO
8808 2268   LESS-THAN BUT NOT EQUAL TO
8809 2269   GREATER-THAN BUT NOT EQUAL TO
8810 226A   MUCH LESS-THAN
8811 226B   MUCH GREATER-THAN
8812 226C   BETWEEN
8813 226D   NOT EQUIVALENT TO
8814 226E   NOT LESS-THAN
8815 226F   NOT GREATER-THAN
8816 2270   NEITHER LESS-THAN NOR EQUAL TO
8817 2271   NEITHER GREATER-THAN NOR EQUAL TO
8818 2272   LESS-THAN OR EQUIVALENT TO
8819 2273   GREATER-THAN OR EQUIVALENT TO
8820 2274   NEITHER LESS-THAN NOR EQUIVALENT TO
8821 2275   NEITHER GREATER-THAN NOR EQUIVALENT TO
8822 2276   LESS-THAN OR GREATER-THAN
8823 2277   GREATER-THAN OR LESS-THAN
8824 2278   NEITHER LESS-THAN NOR GREATER-THAN
8825 2279   NEITHER GREATER-THAN NOR LESS-THAN
8826 227A   PRECEDES
8827 227B   SUCCEEDS
8828 227C   PRECEDES OR EQUAL TO
8829 227D   SUCCEEDS OR EQUAL TO
8830 227E   PRECEDES OR EQUIVALENT TO
8831 227F   SUCCEEDS OR EQUIVALENT TO
8832 2280   DOES NOT PRECEDE
8833 2281   DOES NOT SUCCEED
8834 2282 &sub; SUBSET OF
8835 2283 &sup; SUPERSET OF
8836 2284 &nsub; NOT A SUBSET OF
8837 2285   NOT A SUPERSET OF
8838 2286 &sube; SUBSET OF OR EQUAL TO
8839 2287 &supe; SUPERSET OF OR EQUAL TO
8840 2288   NEITHER A SUBSET OF NOR EQUAL TO
8841 2289   NEITHER A SUPERSET OF NOR EQUAL TO
8842 228A   SUBSET OF WITH NOT EQUAL TO
8843 228B   SUPERSET OF WITH NOT EQUAL TO
8844 228C   MULTISET
8845 228D   MULTISET MULTIPLICATION
8846 228E   MULTISET UNION
8847 228F   SQUARE IMAGE OF
8848 2290   SQUARE ORIGINAL OF
8849 2291   SQUARE IMAGE OF OR EQUAL TO
8850 2292   SQUARE ORIGINAL OF OR EQUAL TO
8851 2293   SQUARE CAP
8852 2294   SQUARE CUP
8853 2295 &oplus; CIRCLED PLUS
8854 2296   CIRCLED MINUS
8855 2297 &otimes; CIRCLED TIMES
8856 2298   CIRCLED DIVISION SLASH
8857 2299   CIRCLED DOT OPERATOR
8858 229A   CIRCLED RING OPERATOR
8859 229B   CIRCLED ASTERISK OPERATOR
8860 229C   CIRCLED EQUALS
8861 229D   CIRCLED DASH
8862 229E   SQUARED PLUS
8863 229F   SQUARED MINUS
8864 22A0   SQUARED TIMES
8865 22A1   SQUARED DOT OPERATOR
8866 22A2   RIGHT TACK
8867 22A3   LEFT TACK
8868 22A4   DOWN TACK
8869 22A5 &perp; UP TACK
8870 22A6   ASSERTION
8871 22A7   MODELS
8872 22A8   TRUE
8873 22A9   FORCES
8874 22AA   TRIPLE VERTICAL BAR RIGHT TURNSTILE
8875 22AB   DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE
8876 22AC   DOES NOT PROVE
8877 22AD   NOT TRUE
8878 22AE   DOES NOT FORCE
8879 22AF   NEGATED DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE
8880 22B0   PRECEDES UNDER RELATION
8881 22B1   SUCCEEDS UNDER RELATION
8882 22B2   NORMAL SUBGROUP OF
8883 22B3   CONTAINS AS NORMAL SUBGROUP
8884 22B4   NORMAL SUBGROUP OF OR EQUAL TO
8885 22B5   CONTAINS AS NORMAL SUBGROUP OR EQUAL TO
8886 22B6   ORIGINAL OF
8887 22B7   IMAGE OF
8888 22B8   MULTIMAP
8889 22B9   HERMITIAN CONJUGATE MATRIX
8890 22BA   INTERCALATE
8891 22BB   XOR
8892 22BC   NAND
8893 22BD   NOR
8894 22BE   RIGHT ANGLE WITH ARC
8895 22BF   RIGHT TRIANGLE
8896 22C0   N-ARY LOGICAL AND
8897 22C1   N-ARY LOGICAL OR
8898 22C2   N-ARY INTERSECTION
8899 22C3   N-ARY UNION
8900 22C4   DIAMOND OPERATOR
8901 22C5 &sdot; DOT OPERATOR
8902 22C6   STAR OPERATOR
8903 22C7   DIVISION TIMES
8904 22C8   BOWTIE
8905 22C9   LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
8906 22CA   RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT
8907 22CB   LEFT SEMIDIRECT PRODUCT
8908 22CC   RIGHT SEMIDIRECT PRODUCT
8909 22CD   REVERSED TILDE EQUALS
8910 22CE   CURLY LOGICAL OR
8911 22CF   CURLY LOGICAL AND
8912 22D0   DOUBLE SUBSET
8913 22D1   DOUBLE SUPERSET
8914 22D2   DOUBLE INTERSECTION
8915 22D3   DOUBLE UNION
8916 22D4   PITCHFORK
8917 22D5   EQUAL AND PARALLEL TO
8918 22D6   LESS-THAN WITH DOT
8919 22D7   GREATER-THAN WITH DOT
8920 22D8   VERY MUCH LESS-THAN
8921 22D9   VERY MUCH GREATER-THAN
8922 22DA   LESS-THAN EQUAL TO OR GREATER-THAN
8923 22DB   GREATER-THAN EQUAL TO OR LESS-THAN
8924 22DC   EQUAL TO OR LESS-THAN
8925 22DD   EQUAL TO OR GREATER-THAN
8926 22DE   EQUAL TO OR PRECEDES
8927 22DF   EQUAL TO OR SUCCEEDS
8928 22E0   DOES NOT PRECEDE OR EQUAL
8929 22E1   DOES NOT SUCCEED OR EQUAL
8930 22E2   NOT SQUARE IMAGE OF OR EQUAL TO
8931 22E3   NOT SQUARE ORIGINAL OF OR EQUAL TO
8932 22E4   SQUARE IMAGE OF OR NOT EQUAL TO
8933 22E5   SQUARE ORIGINAL OF OR NOT EQUAL TO
8934 22E6   LESS-THAN BUT NOT EQUIVALENT TO
8935 22E7   GREATER-THAN BUT NOT EQUIVALENT TO
8936 22E8   PRECEDES BUT NOT EQUIVALENT TO
8937 22E9   SUCCEEDS BUT NOT EQUIVALENT TO
8938 22EA   NOT NORMAL SUBGROUP OF
8939 22EB   DOES NOT CONTAIN AS NORMAL SUBGROUP
8940 22EC   NOT NORMAL SUBGROUP OF OR EQUAL TO
8941 22ED   DOES NOT CONTAIN AS NORMAL SUBGROUP OR EQUAL
8942 22EE   VERTICAL ELLIPSIS
8943 22EF   MIDLINE HORIZONTAL ELLIPSIS
8944 22F0   UP RIGHT DIAGONAL ELLIPSIS
8945 22F1   DOWN RIGHT DIAGONAL ELLIPSIS
8946 22F2   ELEMENT OF WITH LONG HORIZONTAL STROKE
8947 22F3   ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
8948 22F4   SMALL ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
8949 22F5   ELEMENT OF WITH DOT ABOVE
8950 22F6   ELEMENT OF WITH OVERBAR
8951 22F7   SMALL ELEMENT OF WITH OVERBAR
8952 22F8   ELEMENT OF WITH UNDERBAR
8953 22F9   ELEMENT OF WITH TWO HORIZONTAL STROKES
8954 22FA   CONTAINS WITH LONG HORIZONTAL STROKE
8955 22FB   CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
8956 22FC   SMALL CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
8957 22FD   CONTAINS WITH OVERBAR
8958 22FE   SMALL CONTAINS WITH OVERBAR
8959 22FF   Z NOTATION BAG MEMBERSHIP
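
The three ways of writing a character in the table (named entity, decimal reference, hexadecimal reference) are interchangeable. A short Python sketch using the standard library's html module shows them decoding to the same character, and how the decimal and hexadecimal columns relate:

```python
import html
from html.entities import name2codepoint

# Named, decimal, and hexadecimal references for FOR ALL (U+2200).
named = html.unescape("&forall;")
decimal = html.unescape("&#8704;")
hexadecimal = html.unescape("&#x2200;")

print(named == decimal == hexadecimal)

# The Dec and Hex columns are the same code point written in two bases.
print(int("2200", 16))            # 8704
print(name2codepoint["forall"])   # code point registered for the entity name

# Characters without a listed entity name can still be written numerically,
# e.g. CUBE ROOT (U+221B).
cube_root = html.unescape("&#8731;")
print(cube_root == "\u221b")
```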

Recent OS X security vulnerabilities

Serving up some issues for OS X that were catalogued on Dark Reading. There are some serious concerns. I had missed the Keychain vulnerability, which is very worrisome: I know a good many people who use password safes with a standard copy-and-paste UX.

The past several months have been full of bad news for Mac and iOS. Here’s a quick rundown of the highlights:

  • Keychain vulnerability: Reported to Apple a year ago, revealed to the public in June, and still not fixed, researchers discovered a vulnerability in Keychain on Mac OS X. Attackers could poison Keychain and steal the data it stores, which included passwords and tokens for a variety of applications, including iCloud and Facebook.
  • Gatekeeper vulnerabilities: At the Black Hat Las Vegas conference in August, Synack director of research Patrick Wardle detailed proof-of-concept exploits that circumvent Gatekeeper, Apple’s mechanism for preventing unsigned code from running on Mac. At the Virus Bulletin Prague conference in October, Wardle showed that Apple did not repair the problem with OS X El Capitan, released Sep. 30, and told Forbes that “Gatekeeper is no obstacle at all.” A researcher snuck unsigned malicious code past Gatekeeper by wrapping it into a signed installer package. Gatekeeper only checks the installer package, not what’s in it, so it’s vulnerable to what is essentially a basic piggybacking attack that any good lesson in social engineering cautions against.
  • DYLD_PRINT_TO_FILE vulnerability: Discovered in July, patched in mid-August, this was a bug in an environment variable in Mac OS X Yosemite that enabled root access.
  • Tpwn vulnerability: Publicly disclosed in mid-August before it was patched, Tpwn was a memory corruption bug in the kernel of OS X Mavericks through Yosemite, that would allow local privilege escalation and grant attackers root access.
  • KeyRaider: In late August, the KeyRaider iOS malware stole 225,000 legitimate Apple accounts and slammed devices with ransomware, data theft, and phony purchases. The malware was secretly wrapped into unauthorized iOS apps, downloaded from a China-based third-party website, and thus it only affected jailbroken iOS devices.
  • AirDrop vulnerability: Disclosed in mid-September, a vulnerability in both Mac and iOS (patched in the new iOS 9) lets attackers bomb any iOS and Mac device within Bluetooth range with malware, via the AirDrop file-sharing feature.
  • XcodeGhost: In late September, attackers showed they could hit non-jailbroken iOS devices too. XcodeGhost is a Trojanized version of Apple’s application development software, Xcode. Attackers uploaded it to Chinese cloud storage service Baidu Yunpan, a regional, third-party alternative to the Apple Store where download times are shorter for iOS and Mac developers in China. Innocent app developers then used XcodeGhost to write apps and upload them to the official App Store, never knowing that those apps were malicious. Originally, it was thought that only about 40 apps were infected with XcodeGhost, but that number was later increased to 4,000, including WeChat, ride-hailing app Didi Kuaidi, and music sharing app NetEase Music.
  • YiSpecter: In early October, researchers at Palo Alto Networks discovered about 100 apps in the iTunes App Store abusing Apple’s private APIs (used only by Apple itself and not available to app developers) in order to circumvent the Store’s security tools. YiSpecter can download, install, and open applications, replace on-board apps with unwanted downloads, and force apps to show advertisements.
  • Yanked apps: Last week, Apple pulled some ad-blocking apps from its App Store after discovering that some of those apps installed root certificates that expose all traffic, including encrypted traffic, from the device to the application. Apple is allowing the app developers to resubmit to the store after they make alterations.

Most pie charts are bad, so here is a good one.

So, there is a movement in the data science community to kill the pie chart because all pie charts are bad. And much of the criticism is valid. Pie charts can be replaced by better representations, usually starting with bar charts, that convey more accurate and easier-to-interpret comparisons between categories. It’s harder to get away with willfully distorting perspective in a bar chart than it is in a pie chart – always ask to see the percentage values in someone else’s dense pie chart. Where a pie chart does come into play is a binary comparison; that can be powerful. With more than three categories, use a bar chart.

But with all of that said, I came across a very illuminating pie chart today that I wanted to preserve. pacman pie chart

There is also this actual pie. pie eaten pie chart