Mark Turner, a longtime JASNA member, has combined his interest in Jane Austen with his day job in text analytics to produce data visualizations of "Pride and Prejudice".
Mark processed the text of "Pride and Prejudice" to find out how often words follow each other. The software he uses is called nPath, an algorithm for finding sequences in large amounts of data; it was created by Teradata Corporation, where Mark worked until recently. The user specifies a pattern to look for, for example "every sequence of two words following the word 'Elizabeth'", and the nPath algorithm then finds all the matches for this pattern throughout the novel, looking through all 119,500 words in just a few seconds.
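For readers curious about what such a pattern search involves, here is a minimal sketch in plain Python. It is not nPath and does not use any Teradata software; the function names and the sample sentences are made up for illustration (the sentences are not quotations from the novel). It simply counts every run of two words that immediately follows a chosen word.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def sequences_after(text, anchor, length=2):
    """Count every run of `length` words that immediately follows `anchor`.

    Hypothetical helper for illustration; not part of nPath.
    """
    words = tokenize(text)
    anchor = anchor.lower()
    counts = Counter()
    for i, word in enumerate(words):
        following = words[i + 1 : i + 1 + length]
        # Skip matches too close to the end of the text to have a full run.
        if word == anchor and len(following) == length:
            counts[tuple(following)] += 1
    return counts

# Made-up sample text, standing in for the full novel.
sample = (
    "Elizabeth could not help smiling. Elizabeth could not but be pleased. "
    "Elizabeth was surprised."
)
counts = sequences_after(sample, "Elizabeth")
for sequence, n in counts.most_common():
    print(" ".join(sequence), n)
```

Each count is the kind of number a visualization could draw as line thickness. Note that this simple tokenizer treats "Elizabeth's" as a different word from "Elizabeth", and it ignores sentence boundaries; a real analysis would need to decide how to handle both.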
Mark then created a visualization called a Sankey graph, which shows the sequences in graphic form. The thickness of the curved lines shows how often the word on the left is followed by the word on the right. The Sankey graph was named for a British officer, Captain Matthew Henry Phineas Riall Sankey, but an early example by Charles Joseph Minard, a Frenchman, is the famous diagram of Napoleon's forces shrinking during his Russian campaign in 1812. So the modern computer visualizations here have a Regency connection!
Those interested in Minard's diagram of Napoleon's campaign can see it at https://www.edwardtufte.com/tufte/posters.