Counting large nubmers of strings. And sorting by 2 factors

mateo_sof 2012-02-26 11:13:10 UTC #1

Hello, I have problem with counting repeated words from a input.
Im trying to simulate that process by using a file with 100 000 lines, each line looks like :
1. string StringsIWantCount
...
n. string StringsIWantCount

Im using a HashMap as a data structure, I chose it because of O(1) search and put.
Reading from a file or spoj input is done by buffered scanner class.

while reading from a file I check whether the string is in a hashmap or not, If it is I retrieve it value and increase by 1, otherwise Im putting that word into my data structure.

        HashMap<String,Integer> Punkty = new HashMap<String, Integer>() ;
        while (fs.nextString().length()>0)//odczytanie numeru
        {
            fs.nextString();//odczytanie nazwiska
            String s = fs.nextString().toUpperCase();//odczytanie
            if(Punkty.containsKey(s))
                Punkty.put(s,Punkty.get(s)+1);
            else
                Punkty.put(s,1);
        }

After reading whole file/spoj input I have to sort my hashmap by value and when values are the same, by key["a" to "z"]

 List<Map.Entry<String, Integer>> list = new Vector<Map.Entry<String, Integer>>(Punkty.entrySet());
        java.util.Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> e, Map.Entry<String, Integer> e1) {
                int t=0;
                if(e.getValue().equals(e1.getValue()))
                    t = e.getKey().compareTo(e1.getKey());
                else
                   t = (e.getValue() < e1.getValue() ? 1 : -1);
                return t;
            }});

I chopped my program into a few parts, and measured execution time:

//
Number of names: 54
Read and write to HashMap time: 130 ms
Sorting time: 2 ms
output time: 3 ms
Execution time: 135 ms

//
Number of names: 1404
Read and write to HashMap time: 130 ms
Sorting time: 25 ms
output time: 5 ms
Execution time: 160 ms

//
Number of names: 34032
Read and write to HashMap time: 147 ms
Sorting time: 290 ms
output time: 3 ms
Execution time: 440 ms

//
Number of names: 94806
Read and write to HashMap time: 154 ms
Sorting time: 433 ms
output time: 3 ms
Execution time: 590 ms