R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
If you at least remember the name of the function you need, type help(functionname), as in

help(help)

[Note that the text of this document is interspersed with R commands that may be copied and pasted directly into R.]
It is also good to know that most documentation includes a "see also" section, so if you can think of a function that is similar to the one you want, sometimes "see also" can be helpful. If you don't know the name of the function, here are two alternatives:
help.search("network") # Search for anything on the topic of "networks"
help.start() # Start the interactive help browser

Finally, there are many R introductions on the web, including at the R home page under "documentation". Just try googling "R introduction" sometime.
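When only part of a function name comes to mind, base R's apropos function can list matching names among the attached packages. A minimal sketch (the search pattern is only an illustration):

```r
# apropos() returns the names of visible objects that match a regular expression.
matches <- apropos("^read\\.")
print(matches)

# The result is a plain character vector, so it can be checked like any other.
has_read_table <- "read.table" %in% matches
print(has_read_table)
```

Any name found this way can then be passed to help() as described above.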
Among these packages are the "network" package and the "statnet" package. The former is available on CRAN; the latter is currently available at http://csde.washington.edu/statnet/.
In order to install a package that is found on CRAN, a user may simply use the install.packages function:
install.packages("network") # May need modifying, depending on your file permissions

For statnet, installation instructions are available at http://csde.washington.edu/statnet/.
Once the package is installed, its functionality may be easily accessed using the library function:
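For the network package, the library call would be library(network). A guarded sketch that also covers the case where the package is not yet installed (the requireNamespace check is an addition for illustration, not part of the original instructions):

```r
# requireNamespace() checks for an installed package without attaching it.
if (requireNamespace("network", quietly = TRUE)) {
  library(network)   # attach the package so its functions can be called directly
  network_loaded <- "package:network" %in% search()
} else {
  message("The network package is not installed; install it with install.packages first.")
  network_loaded <- FALSE
}
print(network_loaded)
```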
Each node has several associated nodal attributes. Let's read these into R using the read.table command. N.B.: read.table has a lot of user-controllable options, as do many R functions.

help(read.table) # Take a look at the options. Note the default values.
nodeinfo <- read.table("http://www.stat.psu.edu/~dhunter/Rnetworks/nodal.attr.txt", header=TRUE)

The = sign may be used in place of the <- operator for assignment, but there are some situations where = does not work for assignment, whereas <- always does.
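One situation where = does not assign is inside a function call, where it names an argument instead. A small sketch, wrapped in a hypothetical helper called demo_assignment so that no variables leak into your workspace:

```r
# demo_assignment is a made-up name used only for this illustration.
demo_assignment <- function() {
  # Here x is an argument name of median(); no variable x is created.
  m1 <- median(x = 1:10)
  x_created <- exists("x", envir = environment(), inherits = FALSE)

  # Here y is assigned in this function's frame, then passed to median().
  m2 <- median(y <- 1:10)
  y_created <- exists("y", envir = environment(), inherits = FALSE)

  c(x_created = x_created, y_created = y_created)
}

result <- demo_assignment()
print(result)
```

At the top level, outside any function call, both operators assign in the same way.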
We can use the nodeinfo object we just created to learn about these actors -- but more on that later. First, let's take a look at the list of edges in the network:
myedges <- read.table("http://www.stat.psu.edu/~dhunter/Rnetworks/edgelist.txt")

The myedges object is a data frame with two columns (read.table always returns a data frame) in which each row gives a female and male node ID, signifying an edge between the two. We can look at its dimensions and view its first 10 rows:

dim(myedges)
myedges[1:10,] # Note use of : for sequences and [ ] for arrays

Now let's explore a bit about the nodeinfo object as a way of introducing a few useful R functions. Here are the columns available:
names(nodeinfo)

We can, say, determine the number of individuals of each race in this network:

table(nodeinfo$race) # These 3 commands are all equivalent
table(nodeinfo[,2])
table(nodeinfo[,"race"])

Or we can check a race-by-sex table:

table(nodeinfo$sex, nodeinfo$race) # hard to read, so add labels:
table(sex=nodeinfo$sex, race=nodeinfo$race)

There are terrific graphics capabilities in R. (Admittedly, though, it takes a long time to learn all the intricacies well enough to manipulate them all exactly as you wish.) We can make a histogram of the time of infection, or compare times of infection by race using side-by-side boxplots:
hist(nodeinfo$ti,nclass=20) # Note: "ti" is sufficient to identify the correct column
boxplot(nodeinfo$time~nodeinfo$race)
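The same commands can be tried without downloading anything. The sketch below uses a made-up data frame called toy as a stand-in for nodeinfo; its column names mirror the ones used above, but the values and codings are invented for illustration.

```r
# Hypothetical stand-in for nodeinfo; all values are made up.
toy <- data.frame(
  id   = 1:6,
  race = c(1, 2, 1, 2, 2, 1),
  sex  = c(0, 0, 0, 1, 1, 1),
  time = c(3.2, 5.1, 4.4, 6.0, 2.7, 5.5)
)

# Three equivalent ways to pick out the race column (column 2 here).
by_dollar <- table(toy$race)
by_index  <- table(toy[, 2])
by_name   <- table(toy[, "race"])

# Naming the arguments labels the two dimensions of the cross-tabulation.
cross <- table(sex = toy$sex, race = toy$race)
print(cross)

# With $, an unambiguous prefix of a column name is enough: "ti" finds "time".
partial <- toy$ti

# The formula time ~ race groups time by race; plot = FALSE returns the
# summary statistics without drawing anything.
bp <- boxplot(time ~ race, data = toy, plot = FALSE)
print(bp$names)
```

Partial matching is convenient at the prompt, but spelling out the full column name is safer in scripts, since a prefix can become ambiguous when columns are added.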
diseasenw <- network(myedges)

Among other things, we might wish to plot this network:

plot(diseasenw)

Note the arrows, indicating this is a directed network. Directed is the default setting of the "network" function, but it's easy to override this. Also, let's include more information about nodal (vertex) attributes and specify that this is a bipartite network. For the latter, we provide the dividing line between nodes of one type and nodes of the other, or in this case the largest female ID. It is necessary when specifying a bipartite network that the nodes be listed with females first, then males (or in general, all individuals in one group are listed before all individuals in the other).
diseasenw <- network(myedges, directed=FALSE, bipartite=132, vertex.attr = nodeinfo)

Now plot again:

plot(diseasenw)

There are many possible options we might wish to modify in our plot. The usual way to learn how to do this is to check the documentation for the plot function by typing help(plot). However, this is where things get a bit tricky!
Every R object has a particular "class". When one applies a generic function like plot to an object of a particular class (say, the "network" class), R checks to see whether it has a special function that it should use in order to operate on members of that class. Such special functions are named functionname.classname. Thus, to obtain help on plotting networks, here's what to do:
class(diseasenw) # Aha! There is a special class called "network"
help(plot.network)

Now we might be interested in, say, allowing color to denote race instead of sex and allowing shape to denote sex:
plot(diseasenw, vertex.col=3-nodeinfo$race, vertex.sides=2+nodeinfo$sex, main="Circles are female; triangles are male")
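The functionname.classname convention described above can be seen in miniature with a made-up generic. In this sketch, describe and the class "toynet" are hypothetical names invented for illustration; they are not part of the network package.

```r
# A generic function hands the work to UseMethod(), which picks a method
# according to the class of the first argument.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) "a generic R object"

# Method for objects of class "toynet", named functionname.classname.
describe.toynet <- function(x, ...) paste("a toy network with", x$n, "nodes")

obj <- structure(list(n = 5), class = "toynet")

toynet_description  <- describe(obj)   # dispatches to describe.toynet
default_description <- describe(1:3)   # falls back to describe.default
print(toynet_description)
print(default_description)
```

In the same way, calling plot on an object of class "network" runs plot.network, which is why the options are documented under help(plot.network) rather than help(plot).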
This simple "tutorial" is not meant to be a comprehensive introduction to R or even the network package in R. Yet I hope the glimpses provided above will give you the interest and the ability to look deeper.
I'll conclude with a quick word on...