A Brief Introduction to Seaborn

A scatterplot matrix with density plots, displaying visualizations of the Iris dataset, created using the Seaborn library. The word ‘seaborn’ is prominently overlaid in large black text across the center of the image

There are a few Python libraries that enable researchers to create exceptional visualizations. Among these, I’d like to introduce Seaborn, a versatile and user-friendly library designed for crafting informative and visually appealing statistical plots. Built on top of Matplotlib, Seaborn streamlines the creation of advanced data visualizations and improves the default look of standard graphs. Its integration with pandas makes it a good fit for data analysis and exploratory data visualization. According to Seaborn’s official introduction, its plotting functions operate on data frames and arrays containing entire datasets, internally performing the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets users focus on the meaning of the different elements in their plots, rather than on the details of how to draw them.

Below, I will present four examples of what Seaborn can do. 

  • # Let's get the test data ready
  • import pandas as pd
  • input_data = pd.read_csv("Recoded_CSS_data.csv")
  • # Standardized college GPA, rounded to two decimals
  • data = input_data["collgpa_standardized"].round(2)

Violin Plot

A violin plot is a powerful data visualization tool that combines the features of a box plot and a kernel density plot. It provides a comprehensive view of the distribution of a dataset by displaying both the summary statistics (e.g., median, quartiles) and the probability density of the data at different values. Compared to a box plot, a violin plot reveals the full distribution shape, making it particularly useful for comparing multiple groups or datasets. It is commonly used in exploratory data analysis to detect patterns, differences, or anomalies in the data.

Here’s an example of generating a violin plot with Seaborn:

  • import seaborn as sns
  • import matplotlib.pyplot as plt
  • # Generate the violin plot: one violin per RACE category
  • sns.violinplot(x=input_data["RACE"], y=input_data["collgpa_standardized"].round(2), palette="pastel")
  • plt.show()
A violin plot displaying the distribution of standardized college GPA scores across different racial categories, labeled as 0, 1, 2, 3, and 4 on the x-axis. Each violin is color-coded and contains a box plot inside, representing the median, interquartile range, and distribution density of the data.
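
To make the link between a violin plot and box-plot statistics concrete, here is a minimal sketch on synthetic data. The column names `group` and `score` and the values themselves are made up for illustration and are not taken from the dataset used above. It computes the per-group quartiles with pandas and asks Seaborn to draw them inside each violin with `inner="quart"`.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (hypothetical; not the dataset used above)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 100),
    "score": np.concatenate([
        rng.normal(0.0, 1.0, 100),
        rng.normal(0.5, 0.5, 100),
        rng.normal(-0.5, 1.5, 100),
    ]),
})

# The summary statistics a box plot would show, computed per group
quartiles = demo.groupby("group")["score"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)

# inner="quart" draws the quartiles as lines inside each violin
ax = sns.violinplot(data=demo, x="group", y="score", inner="quart")
plt.show()
```

Comparing the printed table with the lines inside each violin shows that the violin carries the same summary as a box plot, with the density outline added on top.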

KDE Plot

A Kernel Density Estimation (KDE) Plot is a statistical tool used to estimate the probability density function of a continuous random variable. It provides a smoothed representation of the data’s distribution, making it easier to visualize underlying patterns and trends compared to a histogram. KDE plots work by placing a kernel (a smooth, symmetric function, typically Gaussian) over each data point and summing these kernels to create a continuous density curve. They are particularly useful for analyzing the shape of data distributions.
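
The kernel-summing idea can be sketched by hand with NumPy. This is an illustration of the general technique under stated assumptions, not a copy of Seaborn's internals: the function name `gaussian_kde_manual`, the sample values, and the bandwidth of 0.5 are all made up for this example.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth):
    """Place one Gaussian bump on each sample and average them on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample,
    # shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the kernels, then normalize so the curve integrates to 1
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

samples = np.array([-1.2, -0.4, 0.0, 0.3, 1.1])  # made-up values
grid = np.linspace(-5, 5, 1001)
density = gaussian_kde_manual(samples, grid, bandwidth=0.5)

# Sanity check: the area under a density curve should be close to 1
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

The bandwidth controls how wide each bump is, and therefore how smooth the final curve looks. Here it is picked by hand; `sns.kdeplot` chooses one automatically and lets you scale it with the `bw_adjust` parameter.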

Here’s an example of generating a KDE plot with Seaborn:

  • import seaborn as sns
  • import matplotlib.pyplot as plt
  • # Plot the estimated density of the standardized college GPA
  • sns.kdeplot(input_data["collgpa_standardized"].round(2), color="orange")
  • plt.show()
A density plot displaying the distribution of standardized college GPA scores. The x-axis represents the standardized college GPA values, ranging from approximately -4.5 to 1.5, while the y-axis represents density. The plot has multiple peaks, indicating variations in the distribution, and is drawn in an orange line on a white background

Scatter Plot

A Scatter Plot is a type of data visualization used to demonstrate the relationship between two variables. Each point in the plot represents an observation, with its position determined by the values of the two variables: one on the x-axis and the other on the y-axis. Scatter plots are widely used to identify correlations, patterns, clusters, or outliers in data, and they can also incorporate observations from higher dimensions using additional attributes like color, size, or shape.

Here’s an example of generating a scatter plot with Seaborn, using regplot to overlay a fitted regression line:

  • import seaborn as sns
  • import pandas as pd
  • import matplotlib.pyplot as plt
  • # Load the example data
  • df = sns.load_dataset("iris", data_home="seaborn-data", cache=False)
  • # Draw the scatter plot with a fitted regression line
  • sns.regplot(x=df["sepal_length"], y=df["sepal_width"], color="red")
  • plt.show()
A scatterplot showing the relationship between sepal length and sepal width, with red circular markers representing individual data points. A red regression line with a shaded confidence interval is overlaid, indicating a slight negative correlation between the two variables. The x-axis is labeled ‘sepal_length,’ ranging from approximately 4.5 to 8.0, and the y-axis is labeled ‘sepal_width,’ ranging from approximately 2.0 to 4.5
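
The scatter plot description mentions encoding extra dimensions through color, size, or shape. A minimal sketch of that idea with `sns.scatterplot` follows. To keep it runnable without a download, it uses a small hand-typed table with iris-style column names; the table is for illustration only.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# A few hand-typed rows with iris-style column names (illustrative only)
demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

ax = sns.scatterplot(
    data=demo,
    x="sepal_length",
    y="sepal_width",
    hue="species",        # color encodes a categorical variable
    style="species",      # marker shape repeats that encoding
    size="petal_length",  # marker size encodes a third numeric variable
)
plt.show()
```

Because the column names match, the same call should also work on the `df` returned by `sns.load_dataset` in the example above.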

Heat Plot

A Heat Plot, or a Heatmap, is a data visualization technique that uses color to represent values in a matrix or multi-dimensional data set. Each cell in the heatmap corresponds to a value, and the intensity or hue of the color indicates the magnitude of that value. Heatmaps are particularly useful for identifying patterns, correlations, or anomalies in data, making them popular for visualizing relationships in tables, time series data, or correlations between variables.

Here’s an example of generating a heatmap with Seaborn:

  • import numpy as np
  • import pandas as pd
  • import seaborn as sns
  • import matplotlib.pyplot as plt
  • # Build the example data: a 7x7 table of random values in [0, 1)
  • df = pd.DataFrame(np.random.random((7, 7)), columns=["a", "b", "c", "d", "e", "f", "g"])
  • # Generate the default heatmap
  • p1 = sns.heatmap(df)
  • plt.show()
A heatmap with a 7x7 grid displaying numerical values using a color gradient. The x-axis is labeled with letters from ‘a’ to ‘g’, and the y-axis is labeled with numbers from 0 to 6. The color scale ranges from dark purple (low values) to light beige (high values), with an accompanying color bar on the right indicating the value scale from approximately 0.0 to 1.0.
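
Since heatmaps are often used to show correlations between variables, here is a short sketch of that use case. The data is synthetic and the column names (`x`, `x_plus_noise`, `independent`) are made up for illustration. The correlation matrix comes from pandas, and `annot=True` prints each coefficient inside its cell.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: one column built from another, plus an unrelated one
rng = np.random.default_rng(42)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "x_plus_noise": x + rng.normal(scale=0.5, size=200),  # correlated with x
    "independent": rng.normal(size=200),
})

# Pairwise Pearson correlations between the columns
corr = demo.corr()

# Fix the color scale to [-1, 1] so that 0 sits at the middle of the colormap
ax = sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
plt.show()
```

Pinning `vmin` and `vmax` to the full correlation range keeps colors comparable across different datasets, which the default data-driven scale would not guarantee.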

About the Author:

Tianji Jiang (he/him) is currently pursuing a Ph.D. in Information Studies at the School of Education and Information Studies, University of California, Los Angeles (UCLA). His work centers on advocating for the openness and sharing of research data, with a specific focus on how libraries can contribute to the research data life cycle and maximize the benefits of open data. He has worked as a Research and Instructional Technology Consultant (RITC) since 2022.

Banner image source; all other images by author