SLIDE 1 Who? Networks of social entities
Max Kemman
University of Luxembourg December 13, 2016
Doing Digital History: Introduction to Tools and Technology
SLIDE 2 Today
- Final assignment
- Preparing the data with Palladio
- (Cleaning the date column with Google Spreadsheets)
- Visualising with Palladio
- Next time
SLIDE 3
Final assignment
Some additional info about the final assignment:
- The computers in the TIC-Lab are powerful enough to work with all mails in Google Spreadsheets (you may also use Excel if you prefer, but then it is more difficult for me to help when you're stuck)
- Create a selection and argue why you chose this selection
- Deadline: 20 January 2017, 23:59
- You receive grades on Friday 27 January 2017
SLIDE 4 Final assignment data
All data is in Moodle in folder Final Assignment:
- allmails-metadata.csv & allmails-metadata.ods
- allmails-ner.csv & allmails-metadata.ods (including mentioned people, organisations, locations)
- allmails-geocoded.csv (about 108k locations)
- Folder with text files per 1k
SLIDE 5
Preparing the data with Palladio
- To visualize the coded data, we will use Palladio: http://hdlab.stanford.edu/palladio/
- First we need to prepare the data for Palladio
SLIDE 6
Loading the data
- Click Start
- We will use the 1000mails-cleandate.csv file (from Moodle, in the Who folder)
- Drag the CSV file onto the text input field
- Click Load
SLIDE 7
Preparing the data
- You will get a list of the columns from the spreadsheet
- You can already give your project and your data table a title
- Do not close or refresh this tab, or you will have to start over!
- Let's look at several columns
SLIDE 8
From
- Sort the values by Frequency
- Check the data type
- Click Close
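Sorting by frequency can also be reproduced outside Palladio. The following is a minimal sketch in Python, assuming the CSV has a From column as in the slides; the sample rows and addresses are made up for illustration, and in practice you would open 1000mails-cleandate.csv instead.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; replace with open("1000mails-cleandate.csv", newline="")
SAMPLE = """From,To
alice@example.org,bob@example.org
alice@example.org,carol@example.org
bob@example.org,alice@example.org
"""


def sender_frequencies(csv_file):
    """Count how often each value occurs in the From column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["From"] for row in reader)


freq = sender_frequencies(io.StringIO(SAMPLE))

# most_common() lists the values from most to least frequent
for sender, count in freq.most_common():
    print(sender, count)
```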
SLIDE 9
Date
- To set the data type to date we need the format: YYYY-MM-DD
- In our original CSV the format also included the time of day, but here the data is already in the right format, so it is recognised automatically
- See the next section for how to clean the date
- Click Close
SLIDE 10
People
- This contains the named entities per email
- To separate multiple people in an email, enter the delimiter | in the Multiple values box
- Click Close
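What the Multiple values setting does to a cell can be illustrated with a short Python sketch. The function name and the example names are hypothetical; only the | delimiter comes from the slides.

```python
def split_people(cell, delimiter="|"):
    """Split one People cell into individual names, dropping empty entries."""
    return [name.strip() for name in cell.split(delimiter) if name.strip()]


# One cell holding three people, separated by the | delimiter
names = split_people("Jane Doe|John Smith| Jane Roe")
print(names)
```

An empty cell yields an empty list, so emails that mention nobody do not produce a blank name.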
SLIDE 11
SLIDE 12
Cleaning the date column with Google Spreadsheets
- Here we use Google Spreadsheets, but this is also possible in Excel & LibreOffice
- You can skip this for now, but it is important for the final assignment
SLIDE 13
Cleaning the Date field
Select the Date column, and go to Format > Number > More Formats > More date and time formats
SLIDE 14 Cleaning the Date field
Select the appropriate format, then click Apply
SLIDE 15
Cleaning the Date field
The Date column will now have the appropriate format
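The same cleaning step could also be scripted. This is a minimal sketch in Python, assuming the original values look like "2016-12-13 14:05:00"; the slides do not state the original format, so ASSUMED_FORMAT is a guess that you would adjust to your data.

```python
from datetime import datetime

# Assumption: the original Date values contain date and time in this layout
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_palladio_date(value, source_format=ASSUMED_FORMAT):
    """Drop the time of day and return the date as YYYY-MM-DD."""
    parsed = datetime.strptime(value.strip(), source_format)
    return parsed.strftime("%Y-%m-%d")


cleaned = to_palladio_date("2016-12-13 14:05:00")
print(cleaned)
```

If a value does not match the assumed format, strptime raises a ValueError, which makes badly formatted rows easy to spot.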
SLIDE 16
Exporting the CSV
Click File > Download as > Comma-separated values (.csv, current sheet)
SLIDE 17
Visualising with Palladio
- Now let's look at the network by selecting Graph in the top bar
- As a source, choose From and close the popup
- As a target, choose To and close the popup
- Wait and watch the result!
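To see what data such a graph is drawn from, here is a minimal sketch in Python that turns the From and To columns into a list of links. It is an illustration of the idea, not Palladio's own implementation; the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; replace with open("1000mails-cleandate.csv", newline="")
SAMPLE = """From,To
alice@example.org,bob@example.org
alice@example.org,bob@example.org
bob@example.org,carol@example.org
"""


def build_edges(csv_file):
    """Count how many emails connect each (source, target) pair."""
    reader = csv.DictReader(csv_file)
    return Counter((row["From"], row["To"]) for row in reader)


edges = build_edges(io.StringIO(SAMPLE))

# Every address that appears as source or target becomes a node
nodes = {address for pair in edges for address in pair}

for (source, target), weight in edges.items():
    print(source, "->", target, weight)
```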
SLIDE 18
SLIDE 19
Palladio Graph Settings
- Try the two Highlighting check-boxes
- Try Size nodes
- What can we learn from this graph?
SLIDE 20
SLIDE 21 Facet
- To filter for certain attributes, select Facet in the lower-left corner
- As a Dimension select From and close the popup
- Now you can filter to show only the emails from one person
- Alternatively, you could filter for emails mentioning a specific person, location, or organisation
- To refine even further, we can add more facets by selecting the Dimension and selecting more options
- To remove a facet, click the red trashcan in the lower right corner
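A facet is in essence a filter on one column. The sketch below shows the idea in Python, assuming rows are dictionaries keyed by column name; the function name facet and the sample rows are hypothetical.

```python
def facet(rows, column, value, delimiter=None):
    """Keep rows whose column equals value.

    When a delimiter is given, the cell is treated as a list of values
    and the row is kept if value is one of them.
    """
    kept = []
    for row in rows:
        cell = row[column]
        values = cell.split(delimiter) if delimiter else [cell]
        if value in values:
            kept.append(row)
    return kept


# Hypothetical rows for illustration
rows = [
    {"From": "alice@example.org", "People": "Jane Doe|John Smith"},
    {"From": "bob@example.org", "People": "John Smith"},
]

from_alice = facet(rows, "From", "alice@example.org")
mentions_jane = facet(rows, "People", "Jane Doe", delimiter="|")
print(len(from_alice), len(mentions_jane))
```

Applying facet to the result of an earlier facet call corresponds to refining with more than one facet.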
SLIDE 22
Facet selection from From column
SLIDE 23
Facet selection from People column
SLIDE 24
Timeline
- We can also create a timeline of the emails by clicking Timeline
- Drag the mouse in the timeline to create a bar that acts as a filter
- Drag the bar to move it around so you can see how the network develops: you could compare months or years
- To remove the timeline filter, click the red trashcan in the lower right corner
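The timeline bar works like a date window. As a minimal sketch in Python, assuming the Date column is already cleaned to YYYY-MM-DD, the function below keeps only the emails inside a window; the function name and sample dates are made up for illustration.

```python
from datetime import date


def in_window(rows, start, end):
    """Keep rows whose Date (YYYY-MM-DD) falls between start and end, inclusive."""
    return [
        row for row in rows
        if start <= date.fromisoformat(row["Date"]) <= end
    ]


# Hypothetical rows for illustration
rows = [
    {"From": "alice@example.org", "Date": "2001-01-15"},
    {"From": "bob@example.org", "Date": "2001-02-20"},
    {"From": "carol@example.org", "Date": "2002-03-01"},
]

year_2001 = in_window(rows, date(2001, 1, 1), date(2001, 12, 31))
print(len(year_2001))
```

Moving the bar corresponds to calling in_window again with a shifted start and end, which lets you compare months or years.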
SLIDE 25
Timeline
SLIDE 26
Filtering one part of the timeline
SLIDE 27
Filtering another part of the timeline
SLIDE 28
Why filtering?
- The network can become quite large when you have more emails, or when you select one of the people, locations, or organisations columns in the graph
- Filtering helps to keep the spaghetti/graph readable
- See the next slide for an example of a spaghetti ball (trying to do this might make your computer quite slow)
SLIDE 29
SLIDE 30 Sharing
To export a graph, click the Download button in the settings (the lower one). This will export an SVG file that you can embed in your HTML report with:
<img src="Palladio Graph.svg" alt="graph">
SLIDE 31
To export the entire workspace, click the upper Download button. This will export a JSON file that you can load next time (see next slide)
SLIDE 32
If you previously exported your workspace, you can load it by selecting "Load an existing project" and choosing the JSON file. This is also useful for sharing with project partners
SLIDE 33
For next time
20 December
Wrap-up