Git cheatsheet for the CLI

Here is a cheatsheet of everyday git commands.


Create

  • Clone an existing repo
    $ git clone ssh://user@my.com/repo.git
  • Create a new local repo
    $ git init

Local Changes

  • Changed files in your working directory
    $ git status
  • Changes to tracked files
    $ git diff
  • Add all current changes to the next commit
    $ git add .
  • Add some changes in <file> to the next commit
    $ git add -p <file>
  • Commit all local changes in tracked files
    $ git commit -a
  • Commit previously staged changes
    $ git commit
  • Change the last commit
    $ git commit --amend

Commit History

  • Show all commits, starting with newest
    $ git log
  • Show changes over time for a specific file
    $ git log -p <file>
  • Who changed what and when in <file>
    $ git blame <file>

Branches & Tags

  • List all existing branches
    $ git branch -av
  • Switch HEAD branch
    $ git checkout <branch>
  • Create a new branch based on your current HEAD
    $ git branch <new-branch>
  • Create a new tracking branch based on a remote branch
    $ git checkout --track <remote/branch>
  • Delete a local branch
    $ git branch -d <branch>
  • Mark the current commit with a tag
    $ git tag <tag-name>

Update & Publish

  • List all currently configured remotes
    $ git remote -v
  • Show information about a remote
    $ git remote show <remote>
  • Add a new remote repository, named <shortname>
    $ git remote add <shortname> <url>
  • Download all changes from <remote>, but don't integrate into HEAD
    $ git fetch <remote>
  • Download changes and directly merge/integrate into HEAD
    $ git pull <remote> <branch>
  • Publish local changes on a remote
    $ git push <remote> <branch>
  • Delete a branch on the remote
    $ git branch -dr <remote/branch>
  • Publish your tags
    $ git push --tags

Merge & Rebase

  • Merge <branch> into your current HEAD
    $ git merge <branch>
  • Rebase your current HEAD onto <branch>
    $ git rebase <branch>
  • Abort a rebase
    $ git rebase --abort
  • Continue a rebase after resolving conflicts
    $ git rebase --continue
  • Use your configured merge tool to solve conflicts
    $ git mergetool
  • Use your editor to manually solve conflicts and (after resolving) mark the file as resolved
    $ git add <resolved-file>
    $ git rm <resolved-file>

Undo

  • Discard all local changes in your working directory
    $ git reset --hard HEAD
  • Discard local changes in a specific file
    $ git checkout HEAD <file>
  • Revert a commit (by producing a new commit with contrary changes)
    $ git revert <commit>
  • Reset your HEAD pointer to a previous commit and discard all changes since
    $ git reset --hard <commit>
  • ...and preserve all changes as unstaged changes
    $ git reset <commit>
  • ...and preserve uncommitted local changes
    $ git reset --keep <commit>


Purrr package for R is good for performance

Hadley’s project purrr

So, if you haven’t seen it, there’s some goodness over at GitHub, where Hadley Wickham has been
working to fill in some more of the holes in R, should one want a more functional
set of programming constructs to work with.

But, in true Hadley style, in addition to all of the functional programming syntactical goodness, the code is fast as well.


To install the package, which is not on CRAN as of this post, one need only run:


# install.packages("devtools")
devtools::install_github("hadley/purrr")

Here is an example using purrr. It splits a data frame into pieces, fits a model to each piece, summarises, and extracts the R^2.

library(purrr)
 
mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

Here is another, more complicated example. It generates 100 random test-training splits, fits a model to each training split then evaluates based on the test split:

library(dplyr)
random_group <- function(n, probs) {
  probs <- probs / sum(probs)
  g <- findInterval(seq(0, 1, length = n), c(0, cumsum(probs)),
    rightmost.closed = TRUE)
  names(probs)[sample(g)]
}
partition <- function(df, n, probs) {
  replicate(n, split(df, random_group(nrow(df), probs)), simplify = FALSE) %>%
    zip() %>%
    as_data_frame()
}
 
msd <- function(x, y) sqrt(mean((x - y) ^ 2))
 
# Generate 100 random test-training splits
boot <- partition(mtcars, 100, c(training = 0.8, test = 0.2))
boot
 
boot <- boot %>% mutate(
  # Fit a model to each training split
  models = map(training, ~ lm(mpg ~ wt, data = .)),
  # Make predictions on test data
  preds = map2(models, test, predict),
  diffs = map2(preds, test %>% map("mpg"), msd)
)
 
# Evaluate root-mean-squared difference between predicted and actual
mean(unlist(boot$diffs))

As Hadley writes about the philosophy for purrr, the goal is not to try and simulate Haskell in R: purrr does not implement currying or destructuring binds or pattern matching. The goal is to give you similar expressiveness to an FP language, while allowing you to write code that looks and works like R.

  • Instead of point free style, use the pipe, %>%, to write code that can be read from left to right.

  • Instead of currying, we use … to pass in extra arguments.

  • Anonymous functions are verbose in R, so we provide two convenient shorthands (demonstrated in the sketch after this list). For predicate functions, ~ .x + 1 is equivalent to function(.x) .x + 1. For chains of transformation functions, . %>% f() %>% g() is equivalent to function(.) . %>% f() %>% g().

  • R is weakly typed, so we can implement general zip(), rather than having to specialise on the number of arguments. (That said I still provide map2() and map3() since it’s useful to clearly separate which arguments are vectorised over).

  • R has named arguments, so instead of providing different functions for minor variations (e.g. detect() and detectLast()) I use a named argument, .first. Type-stable functions are easy to reason about so additional arguments will never change the type of the output.
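
To make the shorthands concrete, here is a minimal sketch; the map_dbl() calls are purrr's documented API, and the data are made up:

library(purrr)

# Formula shorthand: ~ .x + 1 builds function(.x) .x + 1
map_dbl(c(1, 2, 3), ~ .x + 1)   # 2 3 4

# Extra arguments pass through ... rather than via currying
map_dbl(list(c(1, 2, NA), c(3, 4)), mean, na.rm = TRUE)   # 1.5 3.5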

Timings

OK, so how about some measurements of performance? Let us create a 500 x 10,000 matrix of samples, along with a grouping vector f; the goal is one row of column means for each of the five levels in f.

# Some data
nvars <- 10000
nsamples <- 500
sample_groups <- 5
MAT <- replicate(nvars, runif(n=nsamples))
 
# And a grouping vector:
 
f <- rep_len(1:sample_groups, nsamples)
f <- LETTERS[f]

In pursuit of this, the task is to calculate the mean of each group for every column. First, base R's
aggregate(), a higher-order function, leveraging a couple of helpers.

# Settings
aggr_FUN  <- mean
combi_FUN <- function(x, y) "/"(x, y)

# helper function (only aggr_FUN is actually used in the timings below)
pasteC <- function(x, y) paste(x, y, sep = " - ")

# aggregate
system.time({
  temp2 <- aggregate(. ~ class, data = cbind.data.frame(class = f, MAT), aggr_FUN)
})

which yields

user system elapsed
13.457 1.187 14.766

Here’s an approach with reshape

# reshape2
library(reshape2)
system.time({
  temp3 <- recast(data.frame(class = f, MAT), class ~ variable,
                  id.var = "class", aggr_FUN)
})

which has

user system elapsed
1.945 0.454 2.525

Nearly 6x faster. Finally, here is a purrr approach. First, look at the elegance of the representation; then look at the timings.

# purrr
library(purrr)
system.time({
    tmp <- data.frame(class = f, MAT) %>%
        slice_rows("class") %>%
        by_slice(map, aggr_FUN)
})

user system elapsed
0.512 0.043 0.569

Another 4x speedup, or roughly 26x faster than the original approach with aggregate. Impressive. The purrr work deserves to be
looked at and picked up by R devs, as it is both elegant and performant.

All of this produces:

tmp[,1:10]
Source: local data frame [5 x 10]
 
  class        X1        X2        X3        X4        X5        X6        X7        X8        X9
1     A 0.5194124 0.5066943 0.5326734 0.5042122 0.4190162 0.4882796 0.4947138 0.4701085 0.4982535
2     B 0.5267829 0.4545410 0.4883640 0.4894278 0.4672661 0.4477106 0.4832262 0.4583598 0.4767773
3     C 0.4703151 0.4994032 0.4842406 0.4960585 0.5276044 0.4817216 0.4853307 0.5331066 0.4881527
4     D 0.5139762 0.5318747 0.5071466 0.4657025 0.4972884 0.4815889 0.5049296 0.4685044 0.5535197
5     E 0.5439962 0.4479991 0.4640088 0.4946168 0.4716724 0.5370196 0.5011706 0.5219855 0.5160875

Yahoo Financial data API

Access financial data from the web API at Yahoo

Yahoo used to run a very rich API to financial data, but, alas, it serves no more on most of the URLs. There is still a service, but it
is a pale shadow of the former. The former URLs took up to 84 parameters! (See the reference for the API at the bottom.) Now
you can query


http://real-chart.finance.yahoo.com/table.csv?s=AAPL

and you will get a return of Apple’s data, going back for decades:


Date,Open,High,Low,Close,Volume,Adj Close

It’s all comma delimited, and there doesn’t appear to be any secret XML switch. This is all dumbed down from what used to be there.
You can get the same result from another deprecated service they had at
http://ichart.finance.yahoo.com/table.csv?s=GOOG, again replacing
the s param with the symbol you wish to look up. Other params either make no difference to the request or
result in a 404. One doesn’t seem to be able to pull multiple symbols in a single query.
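
If you want that straight into R, a read.csv() one-liner suffices; a sketch, assuming the endpoint above still responds:

# Pull Apple's full daily history into a data frame (sketch; assumes the
# real-chart endpoint is still serving)
aapl <- read.csv("http://real-chart.finance.yahoo.com/table.csv?s=AAPL",
                 stringsAsFactors = FALSE)
head(aapl)   # Date, Open, High, Low, Close, Volume, Adj.Close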

There are other services that still seem to be running:

Download your CSV

http://download.finance.yahoo.com/d/quotes.csv?s=AAPL+GOOG&f=snl1c1p2&e=.csv

which will give you a downloaded CSV with quotes in the form of:


"AAPL","Apple Inc.",112.72,-0.20,"-0.18%"
"GOOG","Google Inc.",628.59,-9.02,"-1.41%"

This one accepts multiple symbols. It also seems to accept many of the old parameters referenced below.
Perhaps this should be reworked into a little API, as it’s pretty hairy just now. I might do that in R to see how useful
it could be for time series; a sketch of such a wrapper follows the parameter notes below. The extension makes no difference to the format nor to the actual file sent. So,

  • s = one or more symbols, joined with +
  • f = a string of the field codes listed under Parameter API below (e.g. snl1c1p2)
  • e = the extension; makes no difference
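
Here is that little R wrapper as a minimal sketch; yahoo_quotes() and its column names are hypothetical, and it assumes the download endpoint above still responds:

# Hypothetical wrapper around the quotes.csv endpoint; col.names map the
# default field string s, n, l1, c1, p2 (symbol, name, last trade,
# change, change in percent)
yahoo_quotes <- function(symbols, fields = "snl1c1p2") {
  url <- paste0("http://download.finance.yahoo.com/d/quotes.csv?s=",
                paste(symbols, collapse = "+"), "&f=", fields)
  read.csv(url, header = FALSE,
           col.names = c("symbol", "name", "last", "change", "pct_change"),
           stringsAsFactors = FALSE)
}

# yahoo_quotes(c("AAPL", "GOOG"))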

Want a chart?

You can request a chart for a symbol at http://chart.finance.yahoo.com/z?s=GOOG
and you will get back an image of the Google stock chart.

Want a snapshot, no historical, in XML?

You can still reach through to the backend via this quite amazing piece of internet fossil evidence:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20%28%22AAPL,GOOG%22%29&env=store://datatables.org/alltableswithkeys
which will yield a mostly empty XML block. But, hey, it’s there.

Need a symbol? Go here.

Parameter API

  • a Ask
  • a2 Average Daily Volume
  • a5 Ask Size
  • b Bid
  • b2 Ask (Real-time)
  • b3 Bid (Real-time)
  • b4 Book Value
  • b6 Bid Size
  • c Change & Percent Change
  • c1 Change
  • c3 Commission
  • c6 Change (Real-time)
  • c8 After Hours Change (Real-time)
  • d Dividend/Share
  • d1 Last Trade Date
  • d2 Trade Date
  • e Earnings/Share
  • e1 Error Indication (returned for symbol changed / invalid)
  • e7 EPS Estimate Current Year
  • e8 EPS Estimate Next Year
  • e9 EPS Estimate Next Quarter
  • f6 Float Shares
  • g Day’s Low
  • h Day’s High
  • j 52-week Low
  • k 52-week High
  • g1 Holdings Gain Percent
  • g3 Annualized Gain
  • g4 Holdings Gain
  • g5 Holdings Gain Percent (Real-time)
  • g6 Holdings Gain (Real-time)
  • i More Info
  • i5 Order Book (Real-time)
  • j1 Market Capitalization
  • j3 Market Cap (Real-time)
  • j4 EBITDA
  • j5 Change From 52-week Low
  • j6 Percent Change From 52-week Low
  • k1 Last Trade (Real-time) With Time
  • k2 Change Percent (Real-time)
  • k3 Last Trade Size
  • k4 Change From 52-week High
  • k5 Percent Change From 52-week High
  • l Last Trade (With Time)
  • l1 Last Trade (Price Only)
  • l2 High Limit
  • l3 Low Limit
  • m Day’s Range
  • m2 Day’s Range (Real-time)
  • m3 50-day Moving Average
  • m4 200-day Moving Average
  • m5 Change From 200-day Moving Average
  • m6 Percent Change From 200-day Moving Average
  • m7 Change From 50-day Moving Average
  • m8 Percent Change From 50-day Moving Average
  • n Name
  • n4 Notes
  • o Open
  • p Previous Close
  • p1 Price Paid
  • p2 Change in Percent
  • p5 Price/Sales
  • p6 Price/Book
  • q Ex-Dividend Date
  • r P/E Ratio
  • r1 Dividend Pay Date
  • r2 P/E Ratio (Real-time)
  • r5 PEG Ratio
  • r6 Price/EPS Estimate Current Year
  • r7 Price/EPS Estimate Next Year
  • s Symbol
  • s1 Shares Owned
  • s7 Short Ratio
  • t1 Last Trade Time
  • t6 Trade Links
  • t7 Ticker Trend
  • t8 1 yr Target Price
  • v Volume
  • v1 Holdings Value
  • v7 Holdings Value (Real-time)
  • w 52-week Range
  • w1 Day’s Value Change
  • w4 Day’s Value Change (Real-time)
  • x Stock Exchange
  • y Dividend Yield

R Data Structures

R Data Structures overview by Hadley Wickham

If you are working with any programming language, there is nothing more important to understand fundamentally than
the language’s underlying data structures. Wickham on R Data Structures is an
excellent overview for R programmers.


There are five fundamental data structures in R (a construction sketch follows the list).

  • Homogeneous
    1. 1D – Atomic vector
    2. 2D – Matrix
    3. nD – Array
  • Heterogeneous
    1. List
    2. Data frame
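
A minimal sketch constructing each of the five; str() is the quickest way to inspect any of them:

# Homogeneous
v <- c(1, 2, 3)                     # 1D: atomic vector
m <- matrix(1:6, nrow = 2)          # 2D: matrix
a <- array(1:24, dim = c(2, 3, 4))  # nD: array

# Heterogeneous
l  <- list(1, "a", TRUE)                         # list: mixed types
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame: list of equal-length vectors

str(df)  # compact display of any structure's internals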

Hadley goes through the five to show how they compare, contrast, and, most importantly, how they are interrelated. Important stuff.
He also goes through a small set of exercises to test comprehension. I think that some of these could be used
as bones for interview questions.

Taken from his book, Advanced R, which
is well worth the price and should be read by serious R folks.

Sublime for blogging in WP

Reducing Friction in WordPress blogging

So, I just found a nice post on zero-friction blogging using the all-good Sublime text editor.
Which is a real treat, because I used to use TextMate to post directly into WP, but that went away
a couple of years ago with the demise of TextMate and the weirdness of WP. But, take heart!
Sublime continues to amaze me as an even better editor than TextMate, with so
much Emacs power in it. And now the plugin community has really ponied up and made
some nice plugins that can greatly reduce friction in your workflow for making posts into WP.


Use Markdown again!

Markdown is back in my workflow. For the last few months I’ve been using the WP admin
blog editor. Even with the dreaded WYSIWYG turned off, it still pretty much sucks.
Markdown is so much easier.
I just couldn’t find a replacement for the markdown plugin that was working fine in
my old version of TextMate.

So, assuming that you have already installed Sublime, you
need to install Package Control if you haven’t already.
Also, you might want to learn about some of the awesome sauce you get in Sublime. Once you have those done, open
the Package Installer in Sublime ({super}+{shift}+{p}) and install the following: MarkdownEditing, OmniMarkupPreviewer, and Markdown to Clipboard (all used below).

Create your MD in Sublime

Go into Sublime with those plugins. Create a new MD file. Sublime is awesome in that it is
contextually aware of file type and will load a whole slew of macros and functions relevant to
that context. The .md extension tells it that it’s Markdown, and you’ll get all of that sugar to drive your post.

Once you have it to where you think you’re good, use OmniMarkupPreviewer to see how
it’s going to look as HTML, which is what you’re going to output in order to make
it nice for WP to consume. You can do that with {alt}+{super}+{O}, which will cleverly force the
HTML to be rendered by your default browser. When you’re happy with that render, use
Markdown to Clipboard to make a clean HTML export via {right-click} and paste that
into your new post in WP.

This isn’t as straightforward as just uploading it, but it still gives you Markdown and all of the goodness
of the Sublime editor, plus the nice UX of the Markdown editing theme. We might yet
be able to get it to push into WP; I’ll be looking into that. I saw that there were some
older (circa 2014) plugins for WP that might be alright, or at least updateable.

Code syntax tool for R into HTML

So this is useful for R people. Need to put some code up in HTML, want syntax highlighting, and don’t want to fight code and pre tags all day in WordPress? Paste your block into Pretty R:

data(tips, package="reshape2")
 
tipsAnova <- aov(tip~day-1, data=tips)
tipsLM <- lm(tip~day-1, data=tips)
summary(tipsAnova)
summary(tipsLM)

This lets you maintain syntactically highlighted, well-formed R source code in your HTML pages, easily. You have already written the source in R; cut and paste it into the form, and it returns a clean set of styled HTML for pasting into any web page you need.

Superb writeup of sql select approaches in R

This is an excellent writeup, with consideration for performance as well as expressiveness. And it has been touched recently with additions for Hadley Wickham’s awesome new dplyr package.

Here is a synopsis of the dplyr methods (with a toy example after the list):

  • inner_join(x, y, by = NULL, copy = FALSE, …): return all rows from x where there are matching values in y, and all columns from x and y
  • left_join(x, y, by = NULL, copy = FALSE, …): return all rows from x, and all columns from x and y
  • semi_join(x, y, by = NULL, copy = FALSE, …): return all rows from x where there are matching values in y, keeping just columns from x
  • anti_join(x, y, by = NULL, copy = FALSE, …): return all rows from x where there are no matching values in y, keeping just columns from x
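
A minimal sketch of the four joins on a pair of toy data frames; the band/instr data are made up, while the join calls are dplyr's API as documented:

library(dplyr)

band  <- data.frame(name  = c("Mick", "John", "Paul"),
                    band  = c("Stones", "Beatles", "Beatles"))
instr <- data.frame(name  = c("John", "Paul", "Keith"),
                    plays = c("guitar", "bass", "guitar"))

inner_join(band, instr, by = "name")  # John, Paul: columns from both
left_join(band, instr, by = "name")   # all of band; plays is NA for Mick
semi_join(band, instr, by = "name")   # John, Paul: band's columns only
anti_join(band, instr, by = "name")   # Mick: band rows with no match in instr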

async or threaded file downloads in python 3.x

Capturing here some nice examples of using asyncio and threads to manage multiple file downloads in Python 3.x. Go here for more in-depth discussion.

You could use a thread pool to download files in parallel:


#!/usr/bin/env python3
from multiprocessing.dummy import Pool  # use threads for I/O bound tasks
from urllib.request import urlretrieve

urls = [...]
result = Pool(4).map(urlretrieve, urls)  # download 4 files at a time

You could also download several files at once in a single thread using asyncio. The url2filename() helper below is a simplified stand-in for the one linked in the original discussion:


#!/usr/bin/env python3
import asyncio
import logging
import os
from contextlib import closing
from urllib.parse import urlsplit, unquote

import aiohttp  # $ pip install aiohttp

def url2filename(url):
    """Simplified stand-in for the url2filename() helper linked in the
    original discussion: basename of the URL path."""
    return os.path.basename(unquote(urlsplit(url).path)) or 'index.html'

@asyncio.coroutine
def download(url, session, semaphore, chunk_size=1<<15):
    with (yield from semaphore):  # limit number of concurrent downloads
        filename = url2filename(url)
        logging.info('downloading %s', filename)
        response = yield from session.get(url)
        with closing(response), open(filename, 'wb') as file:
            while True:  # save file chunk by chunk
                chunk = yield from response.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)
        logging.info('done %s', filename)
    return filename, (response.status, tuple(response.headers.items()))

urls = [...]
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
with closing(asyncio.get_event_loop()) as loop, \
     closing(aiohttp.ClientSession()) as session:
    semaphore = asyncio.Semaphore(4)
    download_tasks = (download(url, session, semaphore) for url in urls)
    result = loop.run_until_complete(asyncio.gather(*download_tasks))

qdapRegex library for R

I just found the qdapRegex package for R, part of the larger qdap suite that Jason Gray and Tyler Rinker have put together to support text munging/processing for discourse analysis, etc. There’s a lot in there, across four libraries: the regex set, some tools, dictionaries, and qdap proper for qualitative analysis (pre-)processing.
