What is R?

Copied from http://www.r-project.org/about.html:

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Getting help with R

R is easy to begin to use but somewhat more difficult to master. As with most R-like programs (e.g., MATLAB, Python, even Mathematica and Maple to a certain extent), a common problem is "I know what I want to do, and I know there is a way to do it in R, but I can't remember (or never knew) how to do it."

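If you at least remember the name of the function you need, type help(functionname); for example, help(read.table) displays the documentation for the read.table function used later in this document.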
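
As a small sketch of looking up a function by name, the lines below query read.table in three ways. The choice of read.table is only an illustration, and args is an extra that the text above does not mention; it prints just the argument list.

```r
help(read.table)   # open the full documentation page for read.table
?read.table        # the question mark is shorthand for help()
args(read.table)   # print only the argument list and default values
```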

[Note that the text of this document is interspersed with R commands that may be copied and pasted directly into R.]

It is also good to know that most documentation includes a "see also" section, so if you can think of a function that is similar to the one you want, sometimes "see also" can be helpful. If you don't know the name of the function, here are two alternatives:

   help.search("network") # Search for anything on the topic of "networks"
   help.start() # Start the interactive help browser
Finally, there are many R introductions on the web, even at the R home page under "documentation". Just try googling "R introduction" sometime.

Network and statnet (and other packages)

One of the most important features of the R language is its extensibility. Numerous researchers have created R packages and posted them publicly, mostly on the Comprehensive R Archive Network (CRAN), accessible from the R website.

Among these packages are the "network" package and the "statnet" package. The former is available on CRAN; the latter is currently available at http://csde.washington.edu/statnet/.

In order to install a package that is found on CRAN, a user may simply use the install.packages function:

   install.packages("network") # May need modifying, depending on your file permissions
For statnet, there are instructions at http://csde.washington.edu/statnet/ on how to install.

Once the package is installed, its functionality may be easily accessed using the library function, as in library(network).
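
A slightly more defensive way to load a package is sketched below. It assumes nothing beyond base R: requireNamespace checks whether "network" is installed before library attaches it. The variable name network_loaded is made up for this illustration.

```r
# Attach "network" only if it is installed; otherwise say how to get it.
if (requireNamespace("network", quietly = TRUE)) {
  library(network)
  network_loaded <- TRUE
} else {
  message('Package "network" not found; try install.packages("network").')
  network_loaded <- FALSE
}
```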


A network dataset

We'll look at a small subset of a 10,000 node, dynamically evolving network in which edges signify (hetero)sexual relationships between the nodes involved. Because female-female and male-male contacts are disallowed in this simplistic simulation, the network is bipartite (i.e., the nodes may be partitioned into two subsets within which no edges occur). Contacts are undirected. At the beginning of this simulation, 5 randomly chosen "seeds" -- 4 white males and 1 black male -- are infected with a hypothetical STD which has 100% transmissibility upon contact. There are only two race groups here, and the proportions of the sexes and the races among the 10,000 network nodes are chosen to match those from a certain dataset. At the end of the particular 3600-day simulation we are considering here, 256 nodes were infected. We will look at these 256 nodes along with all contacts among them.

Each node has several associated nodal attributes. Let's read these into R using the read.table command. N.B.: read.table has many user-controllable options, as do many R functions.

   help(read.table) # Take a look at the options.  Note the default values.
   nodeinfo <- read.table("http://www.stat.psu.edu/~dhunter/Rnetworks/nodal.attr.txt", header=TRUE)
The = sign may be used in place of the <- operator for the purpose of assignment, but there are some situations where the former does not work for assignment, whereas the latter always does.
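
One situation where the two operators differ is inside a function call, where = names an argument rather than assigning. The sketch below uses made-up variable names (a, b, y) and the base function mean.

```r
a <- 5
b = 5                           # at top level, = assigns just like <-
identical(a, b)

mean(x = c(1, 2, 3))            # here "x" only names mean's argument
exists("x", inherits = FALSE)   # FALSE when run in a fresh session

mean(y <- c(1, 2, 3))           # <- assigns y, then passes it to mean
y
```

After the last call, y exists in the workspace, whereas the earlier call with = created no variable at all.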

We can use the nodeinfo object we just created to learn about these actors -- but more on that later. First, let's take a look at the list of edges in the network:

   myedges <- read.table("http://www.stat.psu.edu/~dhunter/Rnetworks/edgelist.txt")
The myedges object is a 2-column data frame (a matrix-like table) in which each row gives a female and male node ID, signifying an edge between the two. We can check its dimensions with dim(myedges) and view its first 10 rows:
   myedges[1:10,] # Note use of : for sequences and [] for arrays 
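The same indexing ideas can be tried without downloading anything. The sketch below uses a small made-up edge list, toy_edges, as a stand-in for myedges; the IDs and column names are hypothetical.

```r
# Hypothetical stand-in for myedges: 4 edges between female IDs 1-2 and male IDs 3-5
toy_edges <- data.frame(V1 = c(1, 1, 2, 2), V2 = c(3, 4, 4, 5))

dim(toy_edges)        # number of rows, then number of columns
toy_edges[1:2, ]      # first two rows, all columns
toy_edges[, 2]        # second column only
head(toy_edges, 3)    # another way to view the first few rows
```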
Now let's explore a bit about the nodeinfo object as a way of introducing a few useful R functions. The available columns can be listed with names(nodeinfo).
We can, say, determine the number of individuals of each race in this network:
   table(nodeinfo$race)  # Count the nodes in each race group
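A column of a data frame can be extracted in several equivalent ways. The sketch below shows three of them on a made-up data frame, toy_nodes, whose numeric codes for sex and race are hypothetical and are not taken from the real nodeinfo file.

```r
# Hypothetical stand-in for nodeinfo
toy_nodes <- data.frame(sex = c(1, 1, 2, 2, 2), race = c(1, 2, 1, 1, 2))
names(toy_nodes)                            # list the column names

by_dollar  <- table(toy_nodes$race)         # extract by name with $
by_name    <- table(toy_nodes[, "race"])    # extract by name with [ , ]
by_bracket <- table(toy_nodes[["race"]])    # extract by name with [[ ]]
by_dollar
```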
Or we can check a race-by-sex table:
   table(nodeinfo$sex, nodeinfo$race) # hard to read, so add labels:
   table(sex=nodeinfo$sex, race=nodeinfo$race)
There are terrific graphics capabilities in R. (Admittedly, though, it takes a long time to learn all the intricacies well enough to manipulate them all exactly as you wish.) We can make a histogram of the time of infection, or compare times of infection by race using side-by-side boxplots:
   hist(nodeinfo$ti,nclass=20) # Note:  "ti" is sufficient to identify the correct column
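The side-by-side boxplots mentioned above could be drawn with the formula interface to boxplot. This is only a sketch on made-up data: the column name time_infected and all of the values are hypothetical, chosen so that "ti" is a unique prefix of one column name.

```r
# Hypothetical data: infection times (in days) for two race groups
toy_nodes <- data.frame(
  race = c(1, 1, 1, 2, 2, 2),
  time_infected = c(120, 900, 2400, 300, 1500, 3100)
)

hist(toy_nodes$time_infected, breaks = 5)

# One box per race group, drawn side by side
bp <- boxplot(time_infected ~ race, data = toy_nodes,
              xlab = "race", ylab = "time of infection")

# $ accepts any unique prefix of a column name
identical(toy_nodes$ti, toy_nodes$time_infected)
```

Partial matching with $ is convenient at the prompt, but spelling out the full column name is safer in scripts, since adding another column that starts with the same letters makes the prefix ambiguous.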

Creating a network object

Remember the library(network) command? We now have a set of tools with which we can operate on network objects (thanks to Carter Butts at UC-Irvine, the principal author of "network"). Let's start by creating such an object:
   diseasenw <- network(myedges)
Among other things, we might wish to plot this network by calling plot(diseasenw).
Note the arrows, indicating this is a directed network. Directed is the default setting of the "network" function, but it's easy to override this. Also, let's include more information about nodal (vertex) attributes and specify that this is a bipartite network. For the latter, we provide the dividing line between nodes of one type and nodes of the other, or in this case the largest female ID. It is necessary when specifying a bipartite network that the nodes be listed with females first, then males (or in general, all individuals in one group are listed before all individuals in the other).
   diseasenw <- network(myedges, directed=FALSE, bipartite=132, 
                vertex.attr = nodeinfo)
Now plot again with plot(diseasenw).
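The difference between the directed default and the undirected bipartite override can be checked on a tiny made-up network. The sketch below assumes the "network" package is installed and skips itself otherwise; the edge list and the choice of bipartite=2 (two "female" nodes listed first) are hypothetical.

```r
if (requireNamespace("network", quietly = TRUE)) {
  library(network)
  # Hypothetical edge list: females are IDs 1-2, males are IDs 3-4
  toy_edges <- matrix(c(1, 3,
                        1, 4,
                        2, 4), ncol = 2, byrow = TRUE)

  # Default settings give a directed network
  nw_directed <- network(toy_edges, matrix.type = "edgelist")

  # Override: undirected, with the first 2 nodes forming one group
  nw_bipartite <- network(toy_edges, matrix.type = "edgelist",
                          directed = FALSE, bipartite = 2)

  is.directed(nw_directed)
  is.directed(nw_bipartite)
  is.bipartite(nw_bipartite)
  plot(nw_bipartite)   # no arrows are drawn for an undirected network
}
```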
There are many possible options we might wish to modify in our plot. The usual course of action to learn how to do this is to check the documentation for the plot function by typing help(plot). However, this is where things get a bit tricky!

Every R object has a particular "class". When one applies a generic function like plot to an object of a particular class (say, the "network" class), R checks to see whether it has a special function that it should use in order to operate on members of that class. Such special functions are named functionname.classname. Thus, to obtain help on plotting networks, check the class of the object and then ask for help on the matching function, help(plot.network):

   class(diseasenw) # Aha!  There is a special class called "network"
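The functionname.classname convention can be seen in miniature with base R alone. In the sketch below, the generic describe and the class "toy" are invented for illustration; they are not part of R or of the network package.

```r
# A hypothetical generic with a default method and a method for class "toy"
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "no special method; using the default"
describe.toy <- function(x, ...) "using the method written for class 'toy'"

obj <- structure(list(), class = "toy")
class(obj)
describe(obj)     # dispatches to describe.toy
describe(1:3)     # falls back to describe.default

methods(plot)     # lists the plot methods currently available
```

Once the network package is loaded, plot.network should appear in the list printed by methods(plot), which is one way to discover which help page to ask for.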
Now we might be interested in, say, allowing color to denote race instead of sex and allowing shape to denote sex:
   plot(diseasenw, vertex.col=3-nodeinfo$race, 
        main="Circles are female; triangles are male")
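One way shape might be tied to sex is through the vertex.sides argument of plot.network, which sets the number of polygon sides for each vertex. The sketch below is an assumption about how this could be done, not a reconstruction of the original command; the toy network and the numeric codes for sex and race are hypothetical, and the plot is skipped if the "network" package is not installed.

```r
sex  <- c(1, 1, 2, 2)               # hypothetical coding: 1 = female, 2 = male
race <- c(1, 2, 1, 2)               # hypothetical coding of two race groups
sides <- ifelse(sex == 1, 50, 3)    # 50 sides looks like a circle; 3 is a triangle

if (requireNamespace("network", quietly = TRUE)) {
  library(network)
  toy_edges <- matrix(c(1, 3, 1, 4, 2, 4), ncol = 2, byrow = TRUE)
  nw <- network(toy_edges, matrix.type = "edgelist",
                directed = FALSE, bipartite = 2)
  plot(nw, vertex.col = 3 - race, vertex.sides = sides,
       main = "Circles are female; triangles are male")
}
```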

This simple "tutorial" is not meant to be a comprehensive introduction to R or even the network package in R. Yet I hope the glimpses provided above will give you the interest and the ability to look deeper.

I'll conclude with a quick word on...

Other packages built on network

In addition to the "statnet" package already mentioned (which allows users to fit exponential-family random graph models, or ERGMs), several other packages rely on the "network" package. Some of these are available on CRAN; still others are available at Carter's web site, http://erzuli.ss.uci.edu/R.stuff/, or at the statnet site, http://csde.washington.edu/statnet/.