1. Background:
The ggplot2 package is extremely flexible. The “gg” stands for the Grammar of Graphics.
In the book, The Grammar of Graphics, Wilkinson showed how you could describe plots not as discrete types like bar plot or pie chart, but using a “grammar” that would work not only for plots we commonly use but for almost any conceivable graphic. From this perspective a pie chart is just a bar chart with a circular (polar) coordinate system replacing the rectangular Cartesian coordinate system.
The ggplot2 package is a simplified implementation of the grammar of graphics, written by Hadley Wickham for R. Wickham’s book, ggplot2: Elegant Graphics for Data Analysis, provides a detailed presentation of the ggplot2 package.
2. Functions and Examples
The ggplot2 package offers two main functions: quickplot() and ggplot().
1. The quickplot() function, also known as qplot(). It is particularly easy to use for simple plots. Below are examples of the default plots that qplot() makes. The command that created each plot is shown in the title of each graph.
2. The ggplot() function.
What are the fundamental parts of every data graph? They are:
 Aesthetics – these are the roles that the variables play in each graph. A variable may control where points appear, the color or shape of a point, the height of a bar and so on.
 Geoms – these are the geometric objects. Do you need bars, points, lines?
 Statistics – these are the functions like linear regression you might need to draw a line.
 Scales – these are legends that show things like circles represent females while triangles represent males.
 Facets – these are the groups in your data. Faceting by gender would cause the graph to repeat for the two genders.
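As a rough sketch of how these five parts map onto code, the example below builds one plot that uses each of them. The data frame is simulated as a stand-in for mydata100 (the real data set is not reproduced here), and the level names and shape codes are assumptions made only for illustration.

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE)),
  gender   = factor(sample(c("Female", "Male"), 100, replace = TRUE)),
  pretest  = round(rnorm(100, mean = 75, sd = 5))
)
mydata100$posttest <- round(mydata100$pretest + rnorm(100, mean = 5, sd = 4))

p <- ggplot(mydata100,
            aes(x = pretest, y = posttest, shape = gender)) +  # aesthetics
  geom_point() +                                                # geom
  geom_smooth(method = "lm", formula = y ~ x) +                 # statistic
  scale_shape_manual(name = "Gender",
                     values = c(Female = 1, Male = 2)) +        # scale
  facet_grid(workshop ~ .)                                      # facets
print(p)
```

Each "+" adds one more part to the plot, which is why the same five ideas keep reappearing in the examples that follow.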
Let us start our use of the ggplot() function with a single stacked bar plot. On the x-axis there really is no variable, so I plugged in a call to the factor() function that creates an empty one on the fly. I then fill the single bar in using the fill argument. There is only one type of geometric object on the plot, which I add with geom_bar().
> ggplot(mydata100, aes(x = factor(""), fill = workshop)) +
+   geom_bar()
The x-axis comes out labeled as “factor("")” but we can overwrite that with a title for the x-axis. What is particularly interesting is that this can become a pie chart simply by changing its coordinate system to polar. The final line of code changes the label on the discrete x-axis to blank with "".
> ggplot(mydata100,
+   aes(x = factor(""), fill = workshop)) +
+   geom_bar() +
+   coord_polar(theta = "y") +
+   scale_x_discrete("")
Bar Plots
Note the unusual use of the plus sign “+” to add the effect of geom_bar() to ggplot(). Only one variable plays an “aesthetic” role: workshop. The aes() function sets that role. So here is one way to write the code:
> ggplot(mydata100) +
+   geom_bar(aes(workshop))
A very useful feature of the ggplot() function is that it can pass aesthetic roles to all the functions that are “added” to it. As graphs become more complex, it can be a big timesaver to set as many aesthetic roles as possible in the ggplot() function call and let it pass them through to the other functions that we add on to build the plot. For example, we can create exactly the same bar plot with this code:
> ggplot(mydata100, aes(workshop)) +
+   geom_bar()
> ggplot(mydata100, aes(workshop)) +
+   geom_bar() + coord_flip()
If you want to fill the bars with color, you can do that using the “fill” argument.
> ggplot(mydata100, aes(workshop, fill = workshop)) +
+   geom_bar()
Below I use fill to color the bars by workshop and set the “position” to stack.
> ggplot(mydata100, aes(gender, fill = workshop)) +
+   geom_bar(position = "stack")
In the plot above, the height of the bars represents the total number of males and females. This is fine if you want to compare counts, but if you want to compare proportions of each gender that took each class, you would have to make the bars equal heights. You can do that by simply changing the position to “fill”.
> ggplot(mydata100, aes(gender, fill = workshop)) +
+   geom_bar(position = "fill")
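To see what the equal-height bars represent, it can help to compute the proportions directly. The sketch below does that with base R on a simulated stand-in for mydata100 (the data values and level names are assumptions, not the data used for the plots in this post).

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE)),
  gender   = factor(sample(c("Female", "Male"), 100, replace = TRUE))
)

# Counts per gender and workshop: what position = "stack" displays.
counts <- table(mydata100$gender, mydata100$workshop)

# Row proportions: within each gender the shares sum to 1,
# which is what position = "fill" displays.
props <- prop.table(counts, margin = 1)
print(round(props, 2))

p <- ggplot(mydata100, aes(gender, fill = workshop)) +
  geom_bar(position = "fill")
print(p)
```

Each row of props corresponds to one bar, and each cell to the height of one colored segment within it.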
Here is the same plot changing only the bar position to be “dodge”.
> ggplot(mydata100, aes(gender, fill = workshop)) +
+   geom_bar(position = "dodge")
You can change any of the above colored graphs to shades of grey by simply adding the scale_fill_grey() function. Here is the plot immediately above repeated in greyscale.
> ggplot(mydata100, aes(gender, fill = workshop)) +
+   geom_bar(position = "dodge") +
+   scale_fill_grey(start = 0, end = 1)
You can get the same information that is in the above plot by making small separate plots for one of the groups. You can accomplish that with the facet_grid() function. It accepts a formula in the form “rows ~ columns”, so using “gender ~ .” asks for two rows for the genders (three if we had not removed missing values) and no columns.
> ggplot(mydata100, aes(workshop)) +
+   geom_bar() + facet_grid(gender ~ .)
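Here is a small sketch of how the two sides of the facet_grid() formula behave, again on a simulated stand-in for mydata100 (the data and level names are assumptions for illustration). The dot means "do not split in this direction".

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE)),
  gender   = factor(sample(c("Female", "Male"), 100, replace = TRUE))
)

base <- ggplot(mydata100, aes(workshop)) + geom_bar()

p_rows <- base + facet_grid(gender ~ .)  # one row per gender, no column split
p_cols <- base + facet_grid(. ~ gender)  # one column per gender, no row split

print(p_rows)
print(p_cols)
```

Stacking the panels in rows makes it easier to compare positions along the x-axis, while placing them side by side makes it easier to compare bar heights.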
Unsummarized Data
> myTemp <- data.frame(
+   myGroup = factor(c("Before", "After")),
+   myMeasure = c(40, 60)
+ )
> ggplot(data = myTemp, aes(myGroup, myMeasure)) +
+   geom_bar(stat = "identity")
Dot Charts
Dot charts are similar to bar charts, but since they are plotting points on both an x- and a y-axis, they require a special variable called “..count..”. It calculates the counts and lets you plot them on the y-axis. The points use the “bin” statistic. Since dot charts are usually shown “sideways” I am adding the coord_flip() function.
> ggplot(mydata100,
+   aes(workshop, ..count..)) +
+   geom_point(stat = "bin", size = 3) + coord_flip() +
+   facet_grid(gender ~ .)
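In recent ggplot2 releases the "bin" statistic expects a continuous variable, and the ..count.. notation has been superseded by after_stat(count). The sketch below is one way the same dot chart might be written today, using the "count" statistic; it assumes ggplot2 3.3.0 or later and uses a simulated stand-in for mydata100 (data and level names are assumptions).

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE)),
  gender   = factor(sample(c("Female", "Male"), 100, replace = TRUE))
)

# stat = "count" tallies the rows in each workshop;
# after_stat(count) places that tally on the y-axis.
p <- ggplot(mydata100, aes(workshop, after_stat(count))) +
  geom_point(stat = "count", size = 3) +
  coord_flip() +
  facet_grid(gender ~ .)
print(p)

# The same counts computed directly, for comparison with the plot.
print(table(mydata100$gender, mydata100$workshop))
```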
Adding Titles and Labels
To add a title, use the opts() function and its title argument. Adding titles to axes is trickier. You use four different functions depending on the axis and whether or not it is discrete:
 scale_x_discrete
 scale_y_discrete
 scale_x_continuous
 scale_y_continuous
For a bar plot, the x-axis is discrete so I will use scale_x_discrete to assign a label to it. The character sequence “\n” tells R to go to a new line; this works in all R packages.
> ggplot(mydata100, aes(workshop, ..count..)) +
+   geom_bar() +
+   opts(title = "Workshop Attendance") +
+   scale_x_discrete("Statistics Package \nWorkshops")
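The opts() function has since been removed from ggplot2. In current releases, one way to get the same title and axis label is the labs() function, which works the same way whether the axis is discrete or continuous. The sketch below uses a simulated stand-in for mydata100 (data and level names are assumptions).

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE))
)

p <- ggplot(mydata100, aes(workshop)) +
  geom_bar() +
  labs(title = "Workshop Attendance",
       x = "Statistics Package\nWorkshops")  # "\n" starts a new line in the label
print(p)
```

The scale_x_discrete() approach from the original example still works for the axis label; labs() is simply a shorter route when all you need is text.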
Histograms
The geom_histogram function is all you need. I have set the color of the bar edges to white. Without that, the bars all run together in the same shade of grey.
> ggplot(mydata100, aes(posttest)) +
+   geom_histogram(color = "white")
> ggplot(mydata100, aes(posttest)) +
+   geom_histogram(binwidth = 0.5)
If you prefer a density plot, that is easy too.
> ggplot(mydata100, aes(posttest)) +
+   geom_density()
> ggplot(data = mydata100) +
+   geom_histogram(aes(posttest, ..density..)) +
+   geom_density(aes(posttest, ..density..)) +
+   geom_rug(aes(posttest))
> ggplot(mydata100, aes(posttest)) +
+   geom_histogram(color = "white") +
+   facet_grid(gender ~ .)
Normal QQ Plots
Normal QQ plots are done in ggplot with the stat_qq() function and the sample aesthetic.
> ggplot(mydata100, aes(sample = posttest)) +
+   stat_qq()
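A reference line often makes a QQ plot easier to judge. As a hedged sketch, recent ggplot2 releases provide stat_qq_line() for this; the example assumes such a release is installed and uses simulated scores in place of mydata100 (the values are assumptions).

```r
library(ggplot2)

# Simulated stand-in for the posttest scores; values are assumptions.
set.seed(1)
mydata100 <- data.frame(posttest = round(rnorm(100, mean = 80, sd = 6)))

p <- ggplot(mydata100, aes(sample = posttest)) +
  stat_qq() +       # sample quantiles against normal quantiles
  stat_qq_line()    # reference line through the first and third quartiles
print(p)
```

Points that stay close to the line suggest the scores are roughly normal; systematic curvature at the ends suggests skew or heavy tails.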
Strip Plots
With fairly small data sets, you can do strip plots using the point geom.
> ggplot(mydata100, aes(workshop, posttest)) +
+   geom_point()
> ggplot(mydata100, aes(workshop, posttest)) +
+   geom_jitter()
Scatter and Line Plots
Various types of scatter and line plots can be done using different geoms as shown below. You can, of course, add multiple geoms to a plot. For example, you might want both points and lines, in which case you would simply add both geoms.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_point()
When you add a line geom, ggplot sorts the data along the x-axis automatically. If you had time-series data that were not sorted by date, it would sort them for you.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_line()
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_path()
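The difference between the line and path geoms is easiest to see on a tiny data set whose rows are deliberately out of order. In this sketch the data frame d is made up for illustration: geom_line() connects the points in order of x, while geom_path() connects them in the order the rows appear.

```r
library(ggplot2)

# Three made-up points, stored out of x order on purpose.
d <- data.frame(x = c(3, 1, 2), y = c(9, 1, 4))

p_line <- ggplot(d, aes(x, y)) + geom_point() + geom_line()  # joins x = 1, 2, 3
p_path <- ggplot(d, aes(x, y)) + geom_point() + geom_path()  # joins rows 1, 2, 3

print(p_line)
print(p_path)

# The ordering geom_line() uses, reproduced with base R.
d_sorted <- d[order(d$x), ]
print(d_sorted)
```

If the row order carries meaning, such as a trajectory over time, geom_path() is the one that preserves it.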
Scatter Plots for Large Data Sets
Large data sets provide a challenge since so many points are obscured by other points. First let us create a data set with 5,000 points.
> pretest2 <- round(rnorm(n = 5000, mean = 80, sd = 5))
> posttest2 <- round(pretest2 + rnorm(n = 5000, mean = 3, sd = 3))
> pretest2[pretest2 > 100] <- 100
> posttest2[posttest2 > 100] <- 100
> temp <- data.frame(pretest2, posttest2)
Now I will plot the data using small-sized points, jittering their positions and coloring them with some transparency (called “alpha” in computer-speak).
> ggplot(temp, aes(pretest2, posttest2)) +
+   geom_jitter(size = 2,
+     position = position_jitter(width = 2, height = 2),
+     colour = alpha("black", 0.15))
> ggplot(temp, aes(x = pretest2, y = posttest2)) +
+   geom_point(size = 1) + geom_density2d()
> ggplot(temp, aes(pretest2, posttest2)) +
+   geom_hex(bins = 30)
Scatter Plots with Fit Lines
The ggplot() function makes it particularly easy to add fit lines to scatter plots. Simply adding the geom_smooth() function does the trick.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_point() + geom_smooth()
Adding a linear regression fit requires only the addition of the “method = lm” argument.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_point() + geom_smooth(method = lm)
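To check what that fit line represents, the sketch below fits the same regression with lm() and overlays its intercept and slope as a dashed line, which should coincide with the line drawn by geom_smooth(). The data are a simulated stand-in for mydata100 (values are assumptions), and the confidence band is switched off so the two lines are easy to compare.

```r
library(ggplot2)

# Simulated stand-in for mydata100; values are assumptions.
set.seed(1)
mydata100 <- data.frame(pretest = round(rnorm(100, mean = 75, sd = 5)))
mydata100$posttest <- round(mydata100$pretest + rnorm(100, mean = 5, sd = 4))

# The regression that method = "lm" fits behind the scenes.
fit <- lm(posttest ~ pretest, data = mydata100)
print(coef(fit))

p <- ggplot(mydata100, aes(pretest, posttest)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope     = coef(fit)[["pretest"]],
              linetype  = "dashed", colour = "red")
print(p)
```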
> ggplot(mydata100,
+   aes(pretest, posttest, label = as.character(gender))) +
+   geom_text(size = 3)
To use point shapes to represent the value of a third variable, simply set the shape aesthetic.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_point(aes(shape = gender))
Scatter Plots with Linear Fits by Group
One way to use a different fit for each group is to do them on the same plot. This involves setting aesthetics for both linetype and point shape. I tend to think of lines being added to the scattered points, but in this case I placed the geom_point() call last so that the shading from the gray confidence intervals would not shade the points themselves.
> ggplot(mydata100, aes(pretest, posttest)) +
+   geom_smooth(aes(linetype = gender), method = "lm") +
+   geom_point(aes(shape = gender))
Another way to display linear fits per group is to facet the plot.
> ggplot(mydata100,
+   aes(pretest, posttest)) +
+   geom_smooth(method = "lm") +
+   geom_point() +
+   facet_grid(gender ~ .)
Box Plots
The ggplot2 package offers considerable control over how you can do box plots. Here I plot the raw points and then the boxes on top of them. This hides the points that are actually in the middle 50% of the data. They are usually dense and of less interest than the points that are further out. If you have a lot of data you might consider using geom_jitter() to spread the points around, preventing overplotting.
> ggplot(mydata100,
+   aes(workshop, posttest)) +
+   geom_point() + geom_boxplot()
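Here is a sketch of the geom_jitter() variant mentioned above, using a simulated stand-in for mydata100 (data and level names are assumptions). The jitter is horizontal only, so the test scores themselves are not altered, and the box plot's own outlier symbols are suppressed because the jittered points already show every observation.

```r
library(ggplot2)

# Simulated stand-in for mydata100; values and level names are assumptions.
set.seed(1)
mydata100 <- data.frame(
  workshop = factor(sample(c("R", "SAS", "SPSS", "Stata"), 100, replace = TRUE)),
  posttest = round(rnorm(100, mean = 80, sd = 6))
)

p <- ggplot(mydata100, aes(workshop, posttest)) +
  geom_jitter(width = 0.2, height = 0) +   # spread points sideways only
  geom_boxplot(outlier.shape = NA)         # boxes on top; no duplicate outliers
print(p)
```

The width of 0.2 is an arbitrary choice; smaller values keep the points closer to their group's center.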