A concordance is a list of the words that appear in a text document, along with the numbers of the lines on which each word appears. It is just like the index of a book, except that it lists line numbers instead of page numbers. Concordances are useful for analyzing documents to find word frequencies and associations that are not evident from reading the document directly. This program builds a concordance for a text file. The run here uses a passage from Shakespeare’s play Julius Caesar. The resulting concordance is shown in the Result section below.
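To make the idea concrete before reading the full program, here is a minimal in-memory sketch that builds a concordance for a two-line string instead of a file. The class name `MiniConcordance` and the helper `build` are hypothetical names chosen for this illustration; the tokenizing and listing logic mirrors the program in this post.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class MiniConcordance {
    // Hypothetical helper: maps each upper-cased word to a comma-separated
    // list of the (1-based) line numbers on which it occurs.
    static Map<String, String> build(String text) {
        Map<String, String> map = new HashMap<>();
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            int lineNumber = i + 1;
            StringTokenizer parser = new StringTokenizer(lines[i], ",.;:()-!?' ");
            while (parser.hasMoreTokens()) {
                String word = parser.nextToken().toUpperCase();
                String listing = map.get(word);
                // First occurrence starts the listing; later ones are appended.
                map.put(word, listing == null ? "" + lineNumber : listing + ", " + lineNumber);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "For Brutus is an honourable man;\n"
                    + "So are they all, all honourable men.";
        Map<String, String> concordance = build(text);
        System.out.println("HONOURABLE=" + concordance.get("HONOURABLE")); // HONOURABLE=1, 2
        System.out.println("BRUTUS=" + concordance.get("BRUTUS"));         // BRUTUS=1
    }
}
```

The word "honourable" occurs on both lines, so its listing is `1, 2`, while "Brutus" occurs only on the first.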
package com.albertshao.ds.map;

//  Data Structures with Java, Second Edition
//  by John R. Hubbard
//  Copyright 2007 by McGraw-Hill

import java.io.*;
import java.util.*;

public class Concordance {
    // Maps each upper-cased word to a comma-separated list of line numbers.
    private Map<String,String> map = new HashMap<String,String>();

    // Reads the given file line by line and records every word occurrence.
    public Concordance(String file) {
        int lineNumber = 0;
        try {
            Scanner input = new Scanner(new File(file));
            while (input.hasNextLine()) {
                String line = input.nextLine();
                ++lineNumber;
                // Split on punctuation and spaces (the apostrophe is a delimiter too).
                StringTokenizer parser = new StringTokenizer(line,",.;:()-!?' ");
                while (parser.hasMoreTokens()) {
                    String word = parser.nextToken().toUpperCase();
                    String listing = map.get(word);
                    if (listing == null) {
                        listing = "" + lineNumber;
                    } else {
                        listing += ", " + lineNumber;
                    }
                    map.put(word,listing);
                }
            }
            input.close();
        } catch(IOException e) {
            System.out.println(e);
        }
    }

    // Writes one WORD=listing entry per line to the given file.
    public void write(String file) {
        try {
            PrintWriter output = new PrintWriter(file);
            for (Map.Entry<String,String> entry : map.entrySet()) {
                output.println(entry);
            }
            output.close();
        } catch(IOException e) {
            System.out.println(e);
        }
    }
}
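The delimiter string passed to `StringTokenizer` includes the apostrophe, which is why the Result section contains the one-letter entries `D=8` and `S=12`: "answer'd" and "Caesar's" are each split in two. The following sketch isolates that tokenizing step so it can be tried on single lines. The class `TokenizeDemo` and the helper `tokens` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Same delimiter set as in the Concordance class.
    static final String DELIMITERS = ",.;:()-!?' ";

    // Hypothetical helper: returns the upper-cased tokens of one line.
    static List<String> tokens(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer parser = new StringTokenizer(line, DELIMITERS);
        while (parser.hasMoreTokens()) {
            words.add(parser.nextToken().toUpperCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // The apostrophe splits "Caesar's" into CAESAR and S.
        System.out.println(tokens("Come I to speak in Caesar's funeral."));
        // prints [COME, I, TO, SPEAK, IN, CAESAR, S, FUNERAL]

        // The trailing "--" consists only of delimiters, so it yields no token.
        System.out.println(tokens("Here, under leave of Brutus and the rest, --"));
        // prints [HERE, UNDER, LEAVE, OF, BRUTUS, AND, THE, REST]
    }
}
```

Whether `S` and `D` should count as words depends on the analysis; dropping the apostrophe from the delimiter string would keep "CAESAR'S" and "ANSWER'D" whole instead.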
package com.albertshao.ds.map;

//  Data Structures with Java, Second Edition
//  by John R. Hubbard
//  Copyright 2007 by McGraw-Hill

public class TestConcordance {
    // Directory holding the input and output files; adjust to your own machine.
    public static final String PATH = "D:\\machao\\DataStructure\\src\\com\\albertshao\\ds\\map\\";
    public static final String IN_FILE = "Shakespeare.txt";
    public static final String OUT_FILE = "Shakespeare.out";

    public static void main(String[] args) {
        Concordance c = new Concordance(PATH+IN_FILE);
        c.write(PATH+OUT_FILE);
    }
}
【Result】
The content of Shakespeare.txt:

Friends, Romans, countrymen, lend me your ears!
I come to bury Caesar, not to praise him.
The evil that men do lives after them,
The good is oft interred with their bones;
So let it be with Caesar. The noble Brutus
Hath told you Caesar was ambitious;
If it were so, it was a grievous fault;
And grievously hath Caesar answer'd it.
Here, under leave of Brutus and the rest, --
For Brutus is an honourable man;
So are they all, all honourable men.
Come I to speak in Caesar's funeral.
He was my friend, faithful and just to me.
But Brutus says he was ambitious;
And Brutus is an honourable man.
He hath brought many captives home to Rome.
Whose ransoms did the general coffers fill:
Did this in Caesar seem ambitious?
When that the poor have cried, Caesar hath wept;
Ambition should be made of sterner stuff.
Yet Brutus says he was ambitious;
And Brutus is an honourable man.
You all did see that on the Lupercal
I thrice presented him with a kingly crown,
Which he did thrice refuse: was this ambition?
Yet Brutus says he was ambitious;
And, sure, is an honourable man.
I speak not to disprove what Brutus spoke,
But here I am to speak what I do know.
You all did love him once, not without cause.
What cause withholds you, then, to mourn for him?
O judgement! thou art fled to brutish beasts,
And men have lost their reason!

The result, that is, the content of Shakespeare.out:
GRIEVOUS=7
WHAT=28, 29, 31
KINGLY=24
REST=9
JUDGEMENT=32
SURE=27
CAUSE=30, 31
REFUSE=25
ME=1, 13
DO=3, 29
THEIR=4, 33
FUNERAL=12
NOT=2, 28, 30
YET=21, 26
CAESAR=2, 5, 6, 8, 12, 18, 19
LEAVE=9
THAT=3, 19, 23
COFFERS=17
HIM=2, 24, 30, 31
ARE=11
MADE=20
MY=13
CROWN=24
MOURN=31
FRIEND=13
THIS=18, 25
CAPTIVES=16
OFT=4
PRAISE=2
ROMANS=1
YOU=6, 23, 30, 31
HERE=9, 29
BURY=2
GRIEVOUSLY=8
WITHHOLDS=31
D=8
BEASTS=32
A=7, 24
O=32
LEND=1
WITHOUT=30
I=2, 12, 24, 28, 29, 29
SAYS=14, 21, 26
ANSWER=8
ON=23
CRIED=19
BUT=14, 29
STUFF=20
WEPT=19
ART=32
YOUR=1
S=12
OF=9, 20
AMBITIOUS=6, 14, 18, 21, 26
MANY=16
FLED=32
GENERAL=17
HE=13, 14, 16, 21, 25, 26
INTERRED=4
MEN=3, 11, 33
EVIL=3
FRIENDS=1
POOR=19
NOBLE=5
KNOW=29
WHOSE=17
LUPERCAL=23
BRUTISH=32
FAULT=7
THE=3, 4, 5, 9, 17, 19, 23
WERE=7
FOR=10, 31
THEY=11
THRICE=24, 25
AND=8, 9, 13, 15, 22, 27, 33
IF=7
UNDER=9
THEM=3
THEN=31
SEE=23
IN=12, 18
FILL=17
IS=4, 10, 15, 22, 27
ROME=16
IT=5, 7, 7, 8
WAS=6, 7, 13, 14, 21, 25, 26
ALL=11, 11, 23, 30
HAVE=19, 33
TOLD=6
LOST=33
ONCE=30
FAITHFUL=13
BRUTUS=5, 9, 10, 14, 15, 21, 22, 26, 28
AM=29
WITH=4, 5, 24
AN=10, 15, 22, 27
WHICH=25
HONOURABLE=10, 11, 15, 22, 27
TO=2, 2, 12, 13, 16, 28, 29, 31, 32
SPOKE=28
SHOULD=20
LIVES=3
BONES=4
BE=5, 20
AFTER=3
COUNTRYMEN=1
SPEAK=12, 28, 29
DID=17, 18, 23, 25, 30
AMBITION=20, 25
COME=2, 12
SEEM=18
REASON=33
BROUGHT=16
LOVE=30
STERNER=20
MAN=10, 15, 22, 27
WHEN=19
DISPROVE=28
PRESENTED=24
SO=5, 7, 11
HATH=6, 8, 16, 19
JUST=13
HOME=16
THOU=32
EARS=1
GOOD=4
LET=5
RANSOMS=17
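Two things stand out in this listing. The entries are not in alphabetical order, because a `HashMap` makes no guarantee about iteration order, and a word that occurs twice on one line has that line number repeated (for example `ALL=11, 11`). If a sorted listing without repeats is preferred, one possible variation, which is not part of the original program, is to store the line numbers in a `TreeSet` inside a `TreeMap`. The class `SortedConcordance` and its methods `addLine` and `entries` are hypothetical names for this sketch, and lines are fed in directly rather than read from a file to keep it self-contained.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.TreeSet;

class SortedConcordance {
    // TreeMap keeps the words in alphabetical order;
    // TreeSet keeps each word's line numbers ascending and unique.
    private final SortedMap<String, SortedSet<Integer>> map = new TreeMap<>();

    // Hypothetical helper: records every word of one line under its line number.
    void addLine(int lineNumber, String line) {
        StringTokenizer parser = new StringTokenizer(line, ",.;:()-!?' ");
        while (parser.hasMoreTokens()) {
            String word = parser.nextToken().toUpperCase();
            map.computeIfAbsent(word, k -> new TreeSet<>()).add(lineNumber);
        }
    }

    SortedMap<String, SortedSet<Integer>> entries() {
        return map;
    }

    public static void main(String[] args) {
        SortedConcordance c = new SortedConcordance();
        c.addLine(10, "For Brutus is an honourable man;");
        c.addLine(11, "So are they all, all honourable men.");
        // Entries print as WORD=[n, n, ...], starting with ALL=[11].
        for (Map.Entry<String, SortedSet<Integer>> entry : c.entries().entrySet()) {
            System.out.println(entry);
        }
    }
}
```

Note that the printed format differs slightly from the original: the line numbers appear in square brackets because each value is a set rather than a string. The trade-off is that repeats on a single line are no longer visible, so this variant suits an index better than a raw frequency count.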