Monday, August 24, 2015

How Well Do You Know Your Collection API?

I recently spent some time going over the Java collection API and even though, it can be said it is one of the most used API in everyday programming with Java, I still had some "Oh I totally forgot about that" moments to "I did not even know that was so" realizations.

I will note down some of the interesting observations in this post, being as succinct as possible and backing with code examples when needed. It is intended to be a place I can always consult to get an overview of the API.


General Overview

  • The collection API can be seen as:
    - a (pun not intended) collection of  interfaces,
    - implementations of these interfaces and
    - algorithms and operations that can be applied to the elements of a collection.
  • The Collection framework comes with the Collections class, which is a utility class that can be used to create different kinds of special purpose collections. e.g. Collections that are wrapped with special behaviour,  for example Collections.unmodifiableMap method which can be used to create a map that is final or Collections.synchronizedList method which creates thread safe lists.
  • The Collections class also contains methods that can be used to apply general purpose operations like shuffle, sort, search etc.
  • Most of the general purpose operations exposed in the Collections utility class only work with lists. for example, only lists can be sorted using Collections.sort method.
  • The collection API can be divided into two distinct hierarchies. One hierarchy contains only scalar value elements, the other contains a key-value collection of elements. 
The Collection Interfaces.

Some Direct Implementations.

Set SortedSet NavigableSet List Queue Deque Map SortedMap NaviableMap
HashSet
ArrayList ArrayDeque  HashMap
LinkedHashSet  LinkedList  LinkedList LinkedHashMap 

TreeSet
 TreeMap

Scalar Interfaces.

Collection Interface
  • The Collection Interface is the root interface in the hierarchy that contains scalar values.
  • The Java Standard Library does not provide any direct implementation of the Collection Interface.
  • All other interfaces in the Collection framework that contains scalar values inherits from the Collection Interfaces.
  • Collection interface provides 3 ways in which collections can be transversed. Using the Iterator, for-each construct and Aggregate operations. These three methods applies to all the interfaces derived from the Collection interface.
  • Using the Iterator to transverse a collection is the only safe way if the collection will be modified during the iteration.
  • Collection interface defines a toArray method which can be used to create an array out of the contents of the collection.
  • The toArray method can be "parameterized" i.e. specify the type of objects the array formed should contain. This is not done using the normal syntax in generics (diamond syntax) but by passing a new instance of the array type wanted. for example:
  • // if list is a collection of integers
    
    // creates an array of objects
    Object[] arrayofobjects = list.toArray();
    
    // creates an array of Integers
    Integers[] arrayofintegers = list.toArray(new Integer[0]);
  • All the methods that can modify the contents of a collection (eg, add, remove, addAll etc) return a boolean to indicate if the modification operation was successful or not; except the clear method which has a void return type.

First Level Interfaces.

What follows are the interfaces that extends directly the Collection Interface.

Set Interface
  • The Set interface has only methods inherited from Collection interface. Other subinterfaces add to the methods found in the Collection interface, but Set interface does not.
  • Set does not allow duplicate elements.
  • The Java Standard Library comes with 2 direct general purpose implementations of the Set interface: HashSet, and LinkedHashSet. (TreeSet not included, as TreeSet directly implements NavigableSet).
  • The Java Standard Library comes with 2 special purpose implementations of the Set interface: EnumSet and CopyOnWriteArraySet 
  • HashSet provides the best performance in most use cases, but it does not guarantee that its contents would be in any particular order.
  • TreeSet on the other hand, guarantees ordering based on the natural ordering of its contents. For example a TreeSet of String will have it's contents ordered Alphabetically.
  • LinkedHashSet keeps it contents ordered based on the order of insertion.
  • EnumSet is the ideal implementation that should be used whenever the elements contained in the set will be enums.
  • CopyOnWriteArraySet as the name conveys is a special implementation that creates a copy of the set on every action that mutates the set. It also copies the contents of the sets when using the Iterator to transverse it.
List Interface
  • The List keeps an order, based on insertion, of its elements
  • The List interface defines two methods, set, and remove which can be used to set and remove an element in the list. The cool thing about these methods is that they return the element that is being overwritten (in the case of set) and the element that is being removed, (in the case of remove).
  • Be careful when using the remove method with a list of integers. List.remove has two variants, one where you pass in the entry you want to remove: List.remove(Object toRemove), and one where you pass in the index to remove from List.remove(int pos). Passing in an integer, with the intention of removing that integer won’t work; the integer would be treated as an index position to remove from. eg:
  • // If I have a list of string I can remove using the value
            List<String> listOfStrings = new ArrayList();
            listOfStrings.add("1"); //index 0
            listOfStrings.add("2"); // index 1
            listOfStrings.add("3"); // index 2
            listOfStrings.add("4"); // index 3
            listOfStrings.add("5"); // index 4
            listOfStrings.add("6"); // index 5
            listOfStrings.add("7"); // index 6
    
            // removes 5
            listOfStrings.remove("5");
            System.out.println(listOfStrings);
            // [1, 2, 3, 4, 6, 7]
    
    
            // If I have a list of Integers, I can't remove 
            // based on the value
            List<Integer> listOfIntegers = new ArrayList();
            listOfIntegers.add(1); //index 0
            listOfIntegers.add(2); // index 1
            listOfIntegers.add(3); // index 2
            listOfIntegers.add(4); // index 3
            listOfIntegers.add(5); // index 4
            listOfIntegers.add(6); // index 5
            listOfIntegers.add(7); // index 6
    
            // won't remove 5, but will remove 6 as 6 is in index 5
            listOfIntegers.remove(5);
            System.out.println(listOfIntegers);
            // [1, 2, 3, 4, 5, 7]
  • Normally the addAll method adds another collection to the end of the list. It also has another variant which allows specifying where in the list the other collection should be added. Comes in handy when collection needs to be inserted in the middle of another list.
  • The List interface provides a version of Iterator referred to as listIterator that allows iterating through a list in both directions. The normal Iterator only allows for forward Iteration.
  • It is possible to have a Range-View access to a List, where we create a list that is a sublist of the original one and perform operations on the sublist that propagates back to the parent list. This is done via the subList method.
  • When using subList(int fromIndex, int toIndex)to return a sublist, the element at fromIndex is included in the returned list while the element at toIndex is excluded, making the returned list, half opened.
  • Note that the sub list created from using the subList method becomes usable (throws an exception) if the parent list is modified. eg:
  • List<String> parentList = new ArrayList<String>();
            parentList.add("one");
            parentList.add("two");
            parentList.add("three");
            parentList.add("four");
            // create a sublist
            List<String> subList = parentList.subList(1, 3);
            System.out.println(parentList);
            // [one, two, three, four]
    
            System.out.println(subList);
            // [two, three]
            // modify the parentlist
            parentList.add("five");
            // trying to access the subList
           // will throw ConcurrentModificationException
           System.out.println(subList);
    
  • The Java Standard Library comes with 2 direct general purpose implementations of the List interface: ArrayList and LinkedList
  • ArrayList will do most of the time. Especially for operations that involves access content of the list via its position. For operation that involves iterating and removing elements, LinkedList might perform better
  • The Java Standard Library comes with 2 special purpose implementations of the List interface: CopyOnWriteArrayList and Vector
  • CopyOnWriteArrayList as the name conveys is a special implementation that creates a copy of the list on every action that mutates the set. It also copies the contents of the list when using the Iterator to transverse it.
Queue Interface
  • Queue models having the contents of a collection in a form that is exactly as signified by the literal translation of the word Queue. Which means, in most cases, the first element that gets added to the Queue will be the first element to be removed from the Queue. i.e. FIFO
  • We have the PriorityQueue which does not follow the FIFO rule and allows the removal of elements in the Queue based on their ordering. The ordering can be the default natural ordering or can be specified using a comparator. 
  • It is possible for a Queue implementation to restrict the number of elements that it holds; such queues are known as bounded queues. The general purpose queues are not bounded. Some of the Queue implementation meant for concurrent use are.
  • LinkedList implements also the Queue interface (together with the List interface)
  • All methods that add elements or remove elements from a Queue comes in two variants: A variant that throws an exception when the operation fails, the other returns a value (either boolean or null) to indicate if the operation was successful. 
  • Type of Operation Throws exception Returns Special value
    Insert
    (Adds an element to the Queue)
    add(e) offer(e)
    Remove
    (Removes an element from the Queue)
    remove() poll()
    Examine
    (Retrieve an element from the Queue without removing it)
    element() peek()

Second Level Interfaces.

What follows are interfaces that extends from the first level interfaces.

Deque Interface
  • Deque extends the Queue interface which is a first level interface.
  • A Deque is a double-ended-queue.
  • The ArrayDeque and LinkedList classes implement the Deque interface.
  • The LinkedBlockingDeque class is the implementation of the Deque interface for concurrent use.
  • Just as Queue interface, the Deque has two variants to its methods which mutates the Deque. One throws an exception, the other returns special value to indicate the status of the operation.
  • Type of OperationOperates on the head of the dequeOperates on the tail of the deque
    Insert
    (Adds an element to the Queue)
    addFirst(e) //throws ex
    offerFirst(e) //return value
    addLast(e) //throws ex
    offerLast(e) //return value
    Remove
    (Removes an element from the Queue)
    removeFirst() // throws ex
    pollFirst() // return value
    removeLast() // throws ex
    pollLast() // return value
    Examine
    (Retrieve an element from the Queue without removing it)
    getFirst()  // throws ex
    peekFirst() // return value
    getFirst()  // throws ex
    peekFirst() // return value

SortedSet Interface
  • SortedSet extends the Set interface which is a first level interface.
  • SortedSet interface demands that elements are maintained in a sorted order. The sorted order could be the natural ordering of the elements or based on a Comparator that is passed to the SortedSet at creation.
  • There is no direct general purpose implementation of the SortedSet interface. (same with SortedMap).
  • The SortedSet allows the following special operations: 
    • Subset view operations: Allows accessing a sub-set of the elements
    • Endpoints operations: Allows easy access to first and last elements
  • Unlike the sub list operations, the SortedSet's subset view remains valid even if the parent set is modified. Changes to the subset view write back to the parent sorted set and vice versa.
  • SortedSet subset view is available via the subSet methods.
  • Unlike with lists where index is passed in the subList method, in SortedSet, objects are passed in, with the first one representing where the subset should start from, and second where it should end.
  • The set returned by the subSet is also half opened. That is, it includes the first element but not the second element.
  • If it is a SortedSet of Strings, to make the subSet returned a close interval, that is one where both the start element and ending element are included, then append "\0" to the last element:
  • If it is a SortedSet of Strings, to make the subSet returned be an open interval, that is, a set not containing both the first and ending element then append "\0" to the first element
    SortedSet sortedSet = new TreeSet<>();
            sortedSet.add("A");
            sortedSet.add("B");
            sortedSet.add("C");
            sortedSet.add("D");
            sortedSet.add("E");
            sortedSet.add("D");
            
            // prints [B, C, D]
            System.out.println(sortedSet.subSet("B", "E"));
            // prints [B, C, D, E]
            System.out.println(sortedSet.subSet("B", "E\0"));
            // prints [C,D]
           System.out.println(sortedSet.subSet("B\0", "E"));
    
  • SortedSet also provides the headSet("element") method which can be used to return a subset from the beginning of the set to, but not including the element specified.
  • SortedSet also provides the tailSet("element") method which can be used to return a subset from beginning with the element specified till the end of the set.
  • It is worth noting that TreeSet class does not implements the SortedSet interface directly. It implements the NavigableSet interface which extends the SortedSet interface.

Third Level Interfaces.

What follows are the third level interfaces that extends directly from the second level interfaces.

NavigableSet Interface
  • NavigableSet extends the SortedSet interface.
  • The Java Standard Library comes with 2 general purpose implementations of the NavigableSet interface: TreeSet, and KetSet (used in TreeMap).
  • The NavigableSet interface provides methods that makes it easier to navigate within the set.
  • The Java Standard Library also comes with couple of other implementation of the NavigableSet in the concurrent package and thus suitable for use in concurrent computations.

Key Value Interfaces.

Next up is the other half of the Collection framework: the hierarchy of interfaces that holds key-value elements.

Map Interface
  • The Java Standard Library comes with 3 general purpose implementations of the Map interface: HashMap, TreeMap and LinkedHashMap.
  • The Java Standard Library comes with 3 special purpose implementations of the Map interface: EnumMapWeakHashMap and IdentityHashMap.
  • The Map interface has Collection view methods which allow a Map to be viewed as a Collection. The three types of method views are: 
  • The collection view methods are the only way to iterate through a Map.
SortedMap Interface

  • SortedMap is extends the Map interface. It maintains its entries in order, based on the keys' natural ordering, or according to a Comparator provided at the time of the SortedMap creation.
  • There is no direct general purpose implementation of the SortedMap interface. (same with SortedSet).
  • The SortedMap shares most of the behaviour of the SortedSet interface, only applied to a map.

NavigableMap Interface
  • NavigableMap extends the SortedMap interface.
  • The Java Standard Library comes with a general purpose implementations of the NavigableMap interface: TreeMap.
  • The NavigableMap interface provides methods that makes it easier to navigate within the map.
  • The Java Standard Library also comes with couple of other implementation of the NavigableMap in the concurrent package and thus suitable for use in concurrent computations.
Common Use cases
  • Use EnumSet, when you need to keep enum types in a set.
  • Use EnumMap when you have a need to keep a map with the keys being enum types.
  • CopyOnWriteArraySet is most appropriate when you have sets that are rarely modified but frequently iterated and could be accessed concurrently.
  • For most everyday use, use HashSet.
  • If you need a set that is ordered, use TreeSet.
  • If you need to keep an ordered set and still want better performance, go with LinkedHashSet. LinkedHashSet offers keeping ordered elements. It is not as fast HashSet, but performs better than TreeSet.
  • For almost every day use, ArrayList will suffice.
  • For most use cases, HashMap will suffice.
  • Use TreeMap if you need a map whose keys are ordered.
  • When working in concurrent environment, use implementations in the java.util.concurrent package.

Miscellaneous
  • CopyOnWriteArraySet's iterators do not support the mutative remove operation.
  • In other to be able to use the Collections.sort method, the elements of the list should implement the Comparable Interface, if not ClassCastException will be thrown.
  • Use Collections.sort(list, Comparator) to sort a list using the sorting mechanism provided in the Comparator. This can be useful when you want to override the default comparable behaviour. It is also useful when you need to sort elements that do not implement the Comparable interface. For example if we want to sort a String not alphabetically, but based on the length of the String.
  • Use Collections.addAll to form union between two collections, Collections.retainAll for getting the intersection.
  • You can convert one type of a Collection to the other by passing the old collection as a constructor argument of the new Collection. For example turn a List into a Set:

    List<String> list = new ArrayList<>();
    list.add("Red");
    list.add("Green");
    list.add("Blue");
    // turn into a Set
    Set<String> set = new HashSet<>(list);
  • Do not use HashMap in concurrent environment. For more information check the post Digging Deeper Into Java's HashMap

No comments: