Thursday, 24 September 2015

Brief Overview of Java Collection Framework

Thanks to an excellent Java Concept of the Day, this is a brief description of main interfaces and classes of Java Collection Framework. Hopefully it would be handy for future references.


What is Collection Framework In Java?


Collection Framework in java is a centralized and unified theme to store and manipulate the group of objects. Java Collection Framework provides some pre-defined classes and interfaces to handle the group of objects. Using collection framework, you can store the objects as a list, set, queue or map and perform operations like adding an object or removing an object or sorting the objects without much hard work.


The entire collection framework is divided into four interfaces:

  1. List  —> It handles sequential list of objects. ArrayList, Vector and LinkedList classes implement this interface.
  2. Queue  —> It handles special list of objects in which elements are removed only from the head. LinkedList and PriorityQueue classes implement this interface.
  3. Set  —> It handles list of objects which must contain unique element. This interface is implemented by HashSet and LinkedHashSet classes and extended by SortedSet interface which in turn, is implemented by TreeSet.
  4. Map  —> This is the one interface in Collection Framework which is not inherited from Collection interface. It handles group of objects as Key/Value pairs. It is implemented by HashMap and HashTable classes and extended by SortedMap interface which in turn is implemented by TreeMap.
  5. Three of above interfaces (List, Queue and Set) inherit from Collection interface. Although, Map is included in collection framework it does not inherit from Collection interface.


Collection Interface:


Collection interface is the root level interface in the collection framework. List, Queue and Set are all sub interfaces of Collection interface. JDK does not provide any direct implementations of this interface. But, JDK provides direct implementations of it’s sub interfaces.

Collection interface extends Iterable interface which is a member of java.lang package. Iterable interface has only one method called iterator(). It returns an Iterator object, using that object you can iterate over the elements of Collection.


List Interface:


List Interface represents an ordered or sequential collection of objects. This interface has some methods which can be used to store and manipulate the ordered collection of objects. The classes which implement the List interface are called as Lists. ArrayList, Vector and LinkedList are some examples of lists. You have the control over where to insert an element and from where to remove an element in the list.

Here are some properties of lists:
  1. Elements of the lists are ordered using Zero based index.
  2. You can access the elements of lists using an integer index.
  3. Elements can be inserted at a specific position using integer index. Any pre-existing elements at or beyond that position are shifted right.
  4. Elements can be removed from a specific position. The elements beyond that position are shifted left.
  5. A list may contain duplicate elements.
  6. A list may contain multiple null elements.

List interface extends Collection interface. So, all 15 methods of Collection interface are inherited to List interface. Along with these methods, another 9 methods are included in the List interface to support the properties of lists.


Queue Interface:


The Queue Interface extends Collection interface. It defines queue data structure which is normally First-In-First-Out. Queue is a data structure in which elements are added from one end and elements are deleted from another end. But, exception being the Priority Queue in which elements are removed from one end, but elements are added according to the order defined by the supplied comparator.

Queue is a data structure where elements are added from one end called tail of the queue and elements are removed from another end called head of the queue. Queue is also first-in-first-out type of data structure (except priority queue). That means an element which is inserted first will be the first element to be removed from the queue. You can’t add or get or set elements at an arbitrary position in the queues.

Properties of queues:

  1. Null elements are not allowed in the queue. If you try to insert null object into the queue, it throws NullPointerException.
  2. Queue can have duplicate elements.
  3. Unlike a normal list, queue is not random access. i.e you can’t set or insert or get elements at an arbitrary positions.
  4. In most of cases, elements are inserted at one end called tail of the queue and elements are removed or retrieved from another end called head of the queue.
  5. In the Queue Interface, there are two methods to obtain and remove the elements from the head of the queue. They are poll() and remove(). The difference between them is, poll() returns null if the queue is empty and remove() throws an exception if the queue is empty.
  6. There are two methods in the Queue interface to obtain the elements but don’t remove. They are peek() and element(). peek() returns null if the queue is empty and element() throws an exception if the queue is empty.



Deque Interface:


The Deque Interface is the short name for “Double Ended Queue“. As the name suggest, Deque is a linear collection of objects which supports insertion and removal of elements from both the ends. The Deque interface defines the methods needed to insert, retrieve and remove the elements from both ends.

The Deque interface is introduced in Java SE 6. It extends Queue interface.

The main advantage of Deque is that you can use it as both Queue (FIFO) as well as Stack (LIFO). The Deque interface has all those methods required for FIFO and LIFO operations. Some of those methods throw an exception if operation is not possible and some methods return a special value (null or false) if operation fails. 

How Deque – Double Ended Queue Works?

As already said, Deque is nothing but the double ended queue. That means, you can insert, retrieve and remove the elements from both the ends.

Deque As Queue:

As Deque interface extends Queue interface, it inherits all methods of Queue interface. So, you can use all those inherited methods to perform Queue operations. Along with them, methods defined in the Deque interface can also be used for Queue operations.

Deque As Stack:

Deque interface has two more methods – pop() and push(). These two methods make Deque to function as a stack (Last-In-First-Out). Along with these two methods, you can also use addFirst(), peekFirst() and removeFirst() for stack operations.

Properties of dequeues:

  1. Unlike Queue, Deque can have null elements. But, it is recommended not to insert null elements as many methods return null to indicate Deque is empty.
  2. Deque can have duplicate elements.
  3. You can’t set or get or insert the elements at an arbitrary position of Deque. i.e Random access is not possible with the Deque.
  4. You can use removeFirstOccurrenec(E e), removeLastOccurrence(E e) and remove(E e) methods to delete the elements from the Deque.



Set Interface:


The Set interface defines a set. The set is a linear collection of objects with no duplicates. Duplicate elements are not allowed in a set. The Set interface extends Collection interface. Set interface does not have it’s own methods. All it’s methods are inherited from Collection interface. The only change that has been made to Set interface is that add() method will return false if you try to insert an element which is already present in the set.

Properties of sets:
  1. Set contains only unique elements. It does not allow duplicates.
  2. Set can contain only one null element.
  3. Random access of elements is not possible.
  4. Order of elements in a set is implementation dependent. HashSet elements are ordered on hash code of elements. TreeSet elements are ordered according to supplied Comparator (If no Comparator is supplied, elements will be placed in ascending order) and LinkedHashSet maintains insertion order.
  5. Set interface contains only methods inherited from Collection interface. It does not have it’s own methods. But, applies restriction on methods so that duplicate elements are always avoided.
  6. One more good thing about Set interface is that the stronger contract between equals() and hashCode() methods. According to this contract, you can compare two set instances of different implementation types (HashSet, TreeSet and LinkedHashSet).
  7. Two set instances, irrespective of their implementation types, are said to be equal if they contain same elements.



SortedSet Interface:


The SortedSet interface extends Set interface. SortedSet is a set in which elements are placed according to supplied comparator. This Comparator is supplied while creating a SortedSet. If you don’t supply comparator, elements will be placed in ascending order.

Properties of sorted sets:
  1. SortedSet can not have null elements. If you try to insert null element, it gives NullPointerException at run time.
  2. As SortedSet is a set, duplicate elements are not allowed.
  3. SortedSet elements are sorted according to supplied Comparator. If you don’t mention any Comparator while creating a SortedSet, elements will be placed in ascending order.
  4. Inserted elements must be of Comparable type and they must be mutually Comparable.
  5. You can retrieve first element and last elements of the SortedSet. You can’t access SortedSet elements randomly. i.e Random access is denied.
  6. SortedSets returned by headSet(), tailSet() and subSet() methods are just views of the original set. So, changes in the returned set are reflected in the original set and vice versa.



NavigableSet Interface:


The NavigableSet is a SortedSet with navigation facilities. The NavigableSet interface provides many methods through them you can easily find closest matches of any given element. It has the methods to find out less than, less than or equal to, greater than and greater than or equal of any element in a SortedSet.

Properties of navigable sets:
  1. NavaigableSet can’t have null elements.
  2. NavigableSet doesn’t support duplicate elements.
  3. NavigableSet can be traversed and accessed in either ascending or descending order.
  4. Methods subSet(), headSet() and tailSet() differ from SortedSet interface in taking additional arguments describing whether upper bound and lower bound are inclusive or exclusive.



ArrayList Class:


ArrayList class, in simple terms, can be defined as re-sizable array. ArrayList is same like normal array but it can grow and shrink dynamically to hold any number of elements. ArrayList is a sequential collection of objects which increases or decreases in size as we add or delete the elements.

In ArrayList, elements are positioned according to Zero-based index. That means, elements are inserted from index 0. Default initial capacity of an ArrayList is 10. This capacity increases automatically as we add more elements to arraylist. You can also specify initial capacity of an ArrayList while creating it.

ArrayList class implements List interface and extends AbstractList. It also implements 3 marker interfaces – RandomAccess, Cloneable and Serializable.

Some properties of array lists:

  1. Size of the ArrayList is not fixed. It can increase and decrease dynamically as we add or delete the elements.
  2.  ArrayList can have any number of null elements.
  3. ArrayList can have duplicate elements.
  4. As ArrayList implements RandomAccess, you can get, set, insert and remove elements of the ArrayList from  any arbitrary position.
  5. When you insert an element in the middle of the ArrayList, the elements at the right side of that position are shifted one position right and when you delete an element, they will be shifted one position left. This feature of the ArrayList causes some performance issues as shifting of elements is time consuming if ArrayList has lots of elements.
  6. Elements are placed according to Zero-based index. That means, first element will be placed at index 0 and last element at index n-1, where ‘n’ is the size of the ArrayList.
  7. ArrayList is not synchronized. That means, multiple threads can use same ArrayList simultaneously.
  8. If you know the element, you can retrieve the position of that element.



Difference Between Iterator And ListIterator:


Iterator and ListIterator are two interfaces in Java collection framework which are used to traverse the collections. Although ListIterator extends Iterator, there are some differences in the way they traverse the collections.

1. Using Iterator, you can traverse List, Set and Queue type of objects. But using ListIterator, you can traverse only List objects. In Set and Queue types, there is no method to get the ListIterator object. But, In List types, there is a method called listIterator() which returns ListIterator object.
2. Using Iterator, we can traverse the elements only in forward direction. But, using ListIterator you can traverse the elements in both the directions – forward and backward. ListIterator has those methods to support the traversing of elements in both the directions.


Vector Class:


The Vector Class is also dynamically grow-able and shrink-able collection of objects like an ArrayList class. But, the main difference between ArrayList and Vector is that Vector class is synchronized. That means, only one thread can enter into vector object at any moment of time.

Vector class is preferred over ArrayList class when you are developing a multi threaded application. But, precautions need to be taken because vector may reduce the performance of your application as it is thread safety and only one thread is allowed to have object lock at any moment of time and remaining threads have to wait until a thread releases the object lock which is held by it. So, it is always recommended that if you don’t need thread safety environment, it is better to use ArrayList class than the Vector class.

Vector class has same features as ArrayList. Vector class also extends AbstractList class and implements List interface. It also implements 3 marker interfaces – RandomAccess, Cloneable and Serializable.

Some properties of vectors:

  1. The main feature of Vector class is that it is thread safety. All methods of Vector class are synchronized so that only one thread can execute them at any given time. This feature of Vector class is useful when you need thread safety code.
  2. Thread safety property of Vector class effects the performance of an application as it makes threads to wait for object lock.
  3. Capacity Increment: Capacity increment is an amount by which the capacity of the vector is automatically incremented whenever size of the vector exceeds it’s capacity. You can pass this capacity increment while creating a vector. If you don’t pass, capacity increment will be treated as zero and capacity of the vector will be doubled whenever size exceeds capacity.
  4. Unlike an ArrayList, you can set the size of the Vector manually. If the new size is greater than the current size, the new slots will be filled with null elements. If the new size is smaller than current size, then the extra elements will be discarded.
  5. You can traverse the vector using Enumeration object. Vector class has a method called elements() which returns an Enumeration object consisting of all elements of Vector.
  6. Vector class has separate methods to retrieve first and last element of vector object. You will not find these methods in ArrayList class. firstElement() retrieves first element and lastElement()method retrieves last element of the vector.



LinkedList Class:


In general terms, LinkedList class is a data structure where each element consist of three things. First one is the reference to previous element, second one is the actual value of the element and last one is the reference to next element.

The LinkedList class in Java is an implementation of doubly linked list which can be used both as a List as well as Queue. The LinkedList in java can have any type of elements including null and duplicates. Elements can be inserted and can be removed from both the ends and can be retrieved from any arbitrary position.

The LinkedList class extends AbstractSequentialList and implements List and Deque interfaces. It also implements 2 marker interfaces – Cloneable and Serializable.

Properties of linked lists:

  1. Elements in the LinkedList are called as Nodes. Where each node consist of three parts – Reference To Previous Element, Value Of The Element and Reference To Next Element. Below diagram shows how LinkedList looks like.
  2. Reference To Previous Element of first node and Reference To Next Element of last node are null as there will be no elements before the first node and after the last node.
  3. You can insert the elements at both the ends and also in the middle of the LinkedList. Below is the list of methods for insertion operations.
  4. You can remove the elements from the head, from the tail and also from the middle of the LinkedList.
  5. You can retrieve the elements form the head, from the middle and from the tail of the LinkedList.
  6. Insertion and removal operations in LinkedList are faster than the ArrayList. Because in LinkedList, there is no need to shift the elements after each insertion and removal. only references of next and previous elements need to be changed.
  7. Retrieval of the elements is very slow in LinkedList as compared to ArrayList. Becaues in LinkedList, you have to traverse from beginning or end (whichever is closer to the element) to reach the element.
  8. The LinkedList can be used as stack. It has the methods pop() and push() which make it to function as Stack.
  9. The LinkedList can also be used as ArrayList, Queue, SIngle linked list and doubly linked list.
  10. LinkedList can have multiple null elements.
  11. LinkedList can have duplicate elements.
  12. LinkedList class in Java is not of type Random Access. i.e the elements can not be accessed randomly. To access the given element, you have to traverse the LinkedList from beginning or end (whichever is closer to the element) to reach the given element.



PriorityQueue Class:


The PriorityQueue is a queue in which elements are ordered according to specified Comparator. You have to specify this Comparator while creating a PriorityQueue itsel. If no Comparator is specified, elements will be placed in their natural order. The PriorityQueue is a special type of queue because it is not a First-In-First-Out (FIFO) as in the normal queues. But, elements are placed according to supplied Comparator.

The PriorityQueue does not allow null elements. Elements in the PriorityQueue must be of Comparable type, If you insert the elements which are not Comparable, you will get ClassCastException at run time.

PriorityQueue class extends AbstractQueue class which in turn implements Queue interface. PriorityQueue also implements one marker interface – java.io.Serializable interface.

Properties of priority queues:

  1. Elements in the PriorityQueue are ordered according to supplied Comparator. If Comparator is not supplied, elements will be placed in their natural order.
  2. The PriorityQueue is unbounded. That means the capacity of the PriorityQueue increases automatically if the size exceeds capacity. But, how it grows is not specified.
  3. The PriorityQueue can have duplicate elements but can not have null elements.
  4. All elements of the PriorityQueue must be of Comparable type. Otherwise ClassCastException will be thrown at run time.
  5. The head element of the PriorityQueue is always the least element and tail element is always the largest element according to specified Comparator.
  6. The default initial capacity of PriorityQueue is 11.
  7. You can retrieve the Comparator used to order the elements of the PriorityQueue using comparator() method.
  8. PriorityQueue is not a thread safe.



ArrayDeque Class:


The ArrayDeque class in Java is introduced from JDK 1.6. It is an implementation of Deque Interface which allows insertion of elements at both the ends. It does not have any restrictions on capacity. It expands automatically as we add more elements. The ArrayDeque class extends AbstractCollection class and implements Deque interface. It also implements Cloneable and Serializable marker interfaces.

Properties of array dequeues:

  1. ArrayDeque is a resizable-array implementation of Deque interface like ArrayList class which is a resizable-array implementation of List interface. But, ArrayDeque is not a List.
  2. ArrayDeque does not have any capacity limit. It will grow automatically as we add elements.
  3. Default initial capacity of ArrayDeque is 16. It will increase at a power of 2 (24, 25, 26 and so on) when size exceeds capacity.
  4. ArrayDeque can be used as a stack (LIFO) as well as a queue (FIFO). ArrayDeque is faster than the Stack class when used as a stack and faster than the LinkedList class when used as a queue.
  5. Performance of ArrayDeque is sometimes considered as the best among the collection framework. It gives performance of O(1) for insertion, removal and retrieval operations. ArrayDeque class is recommended instead of Stack class (when you want stack data structure) and instead of LinkedList class (when you want queue data structure).
  6. You can’t perform indexed operations on ArrayDeque. ArrayDeque doesn’t have the methods to support those operations.
  7. ArrayDeque is not a thread safe.



HashSet Class:


The HashSet class in Java is an implementation of Set interface. HashSet is a collection of objects which contains only unique elements. Duplicates are not allowed in HashSet. HashSet gives constant time performance for insertion, removal and retrieval operations. It allows only one null element.

The HashSet internally uses HashMap to store the objects. The elements you insert in HashSet will be stored as keys of that HashMap object and their values will be a constant called PRESENT. This constant is defined as private static final Object PRESENT = new Object() in the source code of HashSet class.

HashSet class extends AbstractSet class and implements Set interface. It also implements Cloneable and Serializable marker interfaces.

Properties of hash sets:

  1. HashSet class uses HashMap internally to store the objects. The keys of that HashMap object will be the elements of HashSet and their values will be a constant.
  2. HashSet does not allow duplicate elements. If you try to insert a duplicate element, older element will be overwritten.
  3. HashSet can have maximum one null element.
  4. HashSet doesn’t maintain any order. The order of the elements will be largely unpredictable. And it also doesn’t guarantee that order will remain constant over time.
  5. HashSet offers constant time performance for insertion, removal and retrieval operations.
  6. HashSet class is not synchronized. If you want synchronized HashSet, use Collections.synchronizedSet() method.



LinkedHashSet Class:


The LinkedHashSet in java is an ordered version of HashSet which internally maintains one doubly linked list running through it’s elements. This doubly linked list is responsible for maintaining the insertion order of the elements. Unlike HashSet which maintains no order, LinkedHashSet maintains insertion order of elements. i.e elements are placed in the order they are inserted. LinkedHashSet is recommended over HashSet if you want a unique collection of objects in an insertion order.

The LinkedHashSet class extends HashSet class and implements Set interface. It also implements Cloneable and Serializable marker interfaces.

Properties of linked hash sets:

  1. LinkedHashSet internally uses LinkedHashMap to store it’s elements just like HashSet which internally uses HashMap to store it’s elements.
  2. LinkedHashSet maintains insertion order. This is the main difference between LinkedHashSet and HashSet.
  3. LinkedhashSet also gives constant time performance for insertion, removal and retrieval operations. The performance of LinkedHashSet is slightly less than the Hashset as it has to maintain doubly linked list internally to order it’s elements.
  4. Iterator returned by LinkedHashSet is fail-fast. i.e if the LinkedHashSet is modified at any time after the Iterator is created, it throws ConcurrentModificationException.
  5. LinkedHashSet doesn’t allow duplicate elements and allows only one null element.
  6. LinkedHashSet is not synchronized. To get the synchronized LinkedHashSet, use Collections.synchronizedSet() method.



TreeSet Class:


The TreeSet is another popular implementation of Set interface. We have seen other two implementations of Set interface – HashSet and LinkedHashSet. HashSet doesn’t maintain any order where as LinkedHashSet maintains insertion order. The main difference between these two implementations and Treeset is, elements in TreeSet are sorted according to supplied Comparator. You need to supply this Comparator while creating a TreeSet itself. If you don’t pass any Comparator while creating a TreeSet, elements will be placed in their natural ascending order.

The TreeSet class in java is a direct implementation of NavigableSet interface which in turn extends SortedSet interface (which in turn extends Set interface). Below is the hierarchy diagram of TreeSet class.

Properties of tree sets:

  1. The elements in TreeSet are sorted according to specified Comparator. If no Comparator is specified, elements will be placed according to their natural ascending order.
  2. Elements inserted in the TreeSet must be of Comparable type and elements must be mutually comparable. If the elements are not mutually comparable, you will get ClassCastException at run time.
  3. TreeSet does not allow even a single null element.
  4. TreeSet is not synchronized. To get a synchronized TreeSet, use Collections.synchronizedSortedSet() method.
  5. TreeSet gives performance of order log(n) for insertion, removal and retrieval operations.
  6. Iterator returned by TreeSet is of fail-fast nature. That means, If TreeSet is modified after the creation of Iterator object, you will get ConcurrentModificationException.
  7. TreeSet internally uses TreeMap to store it’s elements just like HashSet and LinkedHashSet which use HashMap and LinkedHashMap respectively to store their elements.



Map Interface:


The Map interface in java is one of the four top level interfaces of Java Collection Framework along with List, Set and Queue interfaces. But, unlike others, it doesn’t inherit from Collection interface. Instead it starts it’s own interface hierarchy for maintaining the key-value associations. Map is an object of key-value pairs where each key is associated with a value. This interface is the replacement for ‘Dictionary‘ class which is an abstract class introduced in JDK 1.0.

HashMap, LinkedHashMap and TreeMap are three popular implementations of Map interface. 

Properties of maps:

  1. Map interface is a part of Java Collection Framework, but it doesn’t inherit Collection Interface.
  2. Map interface stores the data as a key-value pairs where each key is associated with a value.
  3. A map can not have duplicate keys but can have duplicate values.
  4. Each key at most must be associated with one value.
  5. Each key-value pairs of the map are stored as Map.Entry objects. Map.Entry is an inner interface of Map interface.
  6. The common implementations of Map interface are HashMap, LinkedHashMap and TreeMap.
  7. Order of elements in map is implementation dependent. HashMap doesn’t maintain any order of elements. LinkedHashMap maintains insertion order of elements. Where as TreeMap places the elements according to supplied Comparator.
  8. The Map interface provides three methods, which allows map’s contents to be viewed as a set of keys (keySet() method), collection of values (values() method), or set of key-value mappings (entrySet() method).



What is the difference between Collection and Collections in java?


This is one of the most confusing java interview question asked many a times to java freshers. Most of time, this question has been asked to java freshers to check their basic knowledge about the Java Collection Framework. 

This question seems confusing because both “Collection” and “Collections” look similar. Both are part of java collection framework, but both serve different purpose. Collection is a top level interface of java collection framework where as Collections is an utility class. In this article, we will discuss the differences between Collection and Collections in java.

Collection Interface:

Collection is a root level interface of the Java Collection Framework. Most of the classes in Java Collection Framework inherit from this interface. List, Set and Queue are main sub interfaces of this interface. JDK doesn’t provide any direct implementations of this interface. But, JDK provides direct implementations of it’s sub interfaces. ArrayList, Vector, HashSet, LinkedHashSet, PriorityQueue are some indirect implementations of Collection interface. Map interface, which is also a part of java collection framework, doesn’t inherit from Collection interface. Collection interface is a member of java.util package.

Collections Class:

Collections is an utility class in java.util package. It consists of only static methods which are used to operate on objects of type Collection. For example, it has the method to find the maximum element in a collection, it has the method to sort the collection, it has the method to search for a particular element in a collection. Below is the list of some important methods of Collections class.

Tuesday, 15 September 2015

Amazon Pop-up Loft in London

Amazon threw a AWS Pop-up Loft London Opening  Party that took place on September 10th, 2015 at 1 Fore Street, London, EC2Y 5EJ:



Werner Vogels, CTO of Amazon.com, talked about how AWS is helping startups in the UK to grow at a faster rate. There were other talks done by startup owners, accelerators, incubators, and some AWS technical experts. There was nice food and drinks, as well as free AWS giveaways.

Presentations were quite educative and later they were published online:


Saturday, 5 September 2015

Commodity Price Engine - Deployment Diagram

These are possible components for Commodity Price Engine that could be built on top of AWS or Windows Azure and be based on available open-source products and some modules developed in-house.



Tuesday, 1 September 2015

Brief Q&As About Solution Architecture

1. What would be the key artefacts that need to be delivered when defining a solution architecture?

Key artefacts present different views on solution design where each view is targeting specific user groups. For example:

  1. Business architecture artefacts address business users, planners and business stakeholders.
  2. Data architecture artefacts address database and system administrators.
  3. Application architecture artefacts provide guidelines and instructions for software development and test teams.
  4. Technology architecture artefacts are essential for infrastructure administrators, development team and system managers.
  5. Security architecture artefacts address the development and test teams, system administrators, security auditors, business users and managers.
Solution architecture artefacts also depend on methodology used by an organisation that the solution is designed for. Popular methodologies include but not limited to TOGAF, Zachman, RUP, CMMI, FEAF.  Examples of key artefacts could be:

  1. TOGAF:
    1. Catalogues (role catalogue, product catalogue, data entity catalogue, etc)
    2. Matrices (business interaction matrix, actor/role matrix, system data matrix, etc)
    3. Core and extension diagrams (functional decomposition diagram, use case diagram, process flow diagram, event diagram, product lifecycle diagram, data security diagram, etc)
  2.  Zachman:
    1. Practically sufficient subset of Lists, Models, Diagrams, Specifications and Details documents that are based on 30 views of Zachman framework matrix – Why/How/What/Who/Where/When by Contextual/Conceptual/Logical/Physical/Detailed.
Sometimes a final Architecture Document could combine major views and present the solution to most user groups. From my experience I found that in some cases that require relatively urgent delivery, a properly built Business Requirements Document could become a part of the final Architecture Document. I published a short manual for such BRD on my blog at http://bananaqualitytester.blogspot.co.uk/2013/01/guidelines-for-business-requirements.html.


2. What would be the role of Solution Architect during the project lifecycle?

Solution Architect’s activity during the project lifecycle should be organically coupled with business stakeholders’ decision making, development team/s delivery progress and the end users expectations to ensure:

  • Target architecture would accommodate any possible evolution of the business stakeholders’ vision of the future system.
  • Development teams would follow the architectural guidelines without sacrificing quality, robustness and effectiveness of the future system.
  • The end users have an early access to the system-in-development, play with it, test it and provide feedback to architecture and development teams.
It always a good idea to keep architecture artefacts such as definitions, models, diagrams, specifications, etc updated during the project to make sure they could be re-used in the future by the business users, help desk and system support staff.


3. How would a Solution Architecture work in the context of Waterfall and Agile methodologies?

Once all business requirements are clarified and understood, a typical search for a solution may include such activities as reviewing competitive technologies, evaluating open and commercial off-the-shelf products, building and testing prototypes, etc. In some organisations it’s called Research and Development (R&D) stage. The output of R&D stage is a solution design that is (or at least very close to) target architecture that would be eventually defined in such documents as Architecture Document mentioned above.

Waterfall methodology is acceptable in cases when business decision about future system is final and a chance for any changes is very low. In this case R&D stage is paramount for the whole project as the solution architecture artefacts would be translated into project management timelines, delivery milestones, QA test case scenarios and system releases. Given the importance of R&D, it would make sense to spend more time on avoiding any ambiguity during reviewing business, functional and non-functional requirements, looking for possible issues with off-the-shelf products and stress-testing prototypes to ensure a delivery of a flawless solution.

Agile methodology doesn’t require absolutely complete set of requirements, giving it a chance to evolve during the system development. From high-level perspective, an agile project could be represented as follows:



The straight lines coming from the centre (epicentre) are individual use cases, circle lines are delivery milestones and the red spiral line is actual system development that ‘covers’ use cases (and milestones) more and more with each iteration until all of them are done (or delivered). In this case the core system functionality (area around the epi-centre) is paramount and requires clear prioritisation by the solution architect. Because of project agile nature, it is expected that solution architecture artefacts would evolve during the project and therefore should be continuously updated by the architect and those changes propagated through all teams involved into the product delivery.


4. What process steps would be expected between the capture of requirements and start of the coding?

These steps may depend on architecture methodology used in the organisation (TOGAF, RUP, Zachman, etc) as well as the type of given project (waterfall or agile). They may include:

  • Business, functional and non-functional requirements review and clarification.
  • Business requirements update (see Guideline for Business Requirements mentioned above) and their verification with business users. These steps would help to build with business users unambiguous project vocabulary, learn more about their expectations, identify key people in their team and establish working relationships with them.
  • R&D stage – please, see (3) for more details.
  • Initial solution design.
  • Verification of acceptability of proposed solution for existing or target infrastructure.
  • Cross-check of proposed solution design with budget requirements.
  • Validation of proposed solution design against system security constraints.
  • Validation of proposed solution design against internal and/or external rules and legislation.
  • Approval of solution design with key stakeholders.
  • Final (or semi-final in case of agile) solution design.
  • Creation of solution architecture artefacts - please, see (1) for more details.
  • Definition of development environment. This may include continuous integration and automated testing, knowledge management system, issue tracking system, network topology (in case of delivering a Cloud-based solution), etc – this is a range of approaches that would allow to keep development process transparent to the architecture team as well as to other interested parties involved in the project.
  •  Requirements for system test coverage.

Friday, 28 August 2015

Machine Learning with Windows Azure

This is a brief overview of Machine Learning usage on Azure cloud.

Login to your Azure account, select Machine Learning and follow these

Steps:

  1. Create new Experiment
  2. Drag-n-drop datasets and modules and modify their properties
  3. Create a workflow. There are available pre-configured ML modules with R and Python scripts that:
    1. Following best practices may help to solve a problem for specific domain.
    2. Have pre-built data schemas.
    3. Contain domain specific data processing and feature engineering.
    4. Include training algorithms and calculation metrics.

Workflow Concerns:

  • A Single Experiment:
    • could be too complex
    • difficult to navigate
    • easy to mess up
    • iterations have to go through entire graph
  • Multi-Steps Workflow:
    • each step in a separate Experiment
    • uses saved datasets / readers / writers
    • iterations come through individual steps separately

Machine Learning Templates from Azure Machine Learning Gallery

  • Text classification (text tagging, text categorisation, etc). For example, assign piece of text to one or more pre-defined set of classes or categories:
    • categorise articles
    • organise web pages into hierarchical categories
    • filter email spam
    • search query user intend prediction
    • support ticket team routing
    • sentiment analysis
    • feedback analysis
  • Fraud detection
  • Retail forecasting
  • Predictive maintenance

Text Processing:

  1. Language detection
  2. Normalisation:
    1. Convert words into normalised ones:
      1. Down case: The -> the
      2. Lemmatisation: plays -> play
      3. Stemming: mice -> mouse
  3. Stop-words removal: the, a, to, with, etc
  4. Special character removal: for example, ignore all Non-Alpha-Numeric
  5. Ignore numbers, emails, URLs, etc

Feature Extraction:

  • 'Bags-of-words' - the document is treated as a set of words regardless order and grammar
    • Split text into Bi-grams, tri-grams, n-grams
    • Score how it correlates with your topic (e.g. sport, business, etc)
  • Move from symbols to numbers:
    • term occurrence (number of times words or n-grams appear in document)
    • term frequency
    • Inverse Document Frequency (IDF)
      • the inverted rate of documents that contain words or n-grams against the whole training data set of documents
      • downgrades the importance of useless words like 'which', 'the', etc
    • TF-IDF
      • frequent words that appear only in a small number of documents achieve high value
  • Dimensionality Reduction (vocabulary used in documents could be too large) - 'curse of dimensionality'. To do:
    • filter-based filter reduction
    • wrapper-based filter reduction
    • feature hashing
    • topic modelling (LDA)

Model Training:

  • Binary-class Learners:
    • 2-class Logistic Regression
    • 2-class Support Vector Machine
    • 2-class Boosted Decision Tree
  • Multi-class Learners:
    • One-vs-All Muliclass
    • Multi-class Logistic Regression
    • Multi-class Decision Forest

Model Evaluation:

  • Split data into Train, Development and Test subsets
  • Compare and visualise the results of comparison between two trained models, e.g. N-grams and Uni-grams

Finally

Deploy your model on Azure as a web service. 

Easy!
:)


Wednesday, 26 August 2015

Content Enrichment

Content enrichment is about manipulating crawled content before it is added to the search index. For example, add a sentiment analysis score to indexed social activity.

Some components that support content enrichment are:

  • Version control
  • Technical metadata (formats, format versions, validation rules, etc)
  • Provenance data (processing history)

Questions:


  • How standardised is the enrichment information?
  • How volatile is enriched information?
  • When is the content enhanced (by author, during submission, during editorial, etc)?
  • Where does enhanced information live (embedded, externally)?

Key challenges:


  • What is the master source/copy of the information?
  • Is the information normalised or de-normalised (repeating parent metadata across child elements)?
  • How to synchronised across multiple systems?




Friday, 10 July 2015

NetKernel Bootcamp London - 17th August 2015

Speaking of the devil, 1060 Research has just announced about their bootcamp in London!

The event takes place on Monday 17th August and will be a full day introduction to Resource Oriented Computing (ROC) using NetKernel: ROC is what happens when you take micro-services to the limit and start to think in terms of resources as the first class concern of a software system.

The day will be hosted by 1060 Research at TechUK:


Thursday, 2 July 2015

NetKernel Takes Micro-Services to the Ultimate Level

While talking about such relatively new boys on the market as Vert.x, Akka, Chronicle, Kafka, Ready! API, RxJava, etc which certainly are great components for solutions that respond to current demand for micro-services, the mainstream seems to be completely missing such nice, mature and easy to use product as NetKernel. The latter one is not competing with newcomers and together they can comprise quite elegant solutions that any architect would be eventually proud of.

NOTE: This is not a promotion for NetKernel. I don't work for them. This is just an attempt to be fair to those that somehow happened to be on a side of the road.

Functional Requirements for Commodity Price Engine

Introduction

Commodity Price Engine is a derivatives sales tool and potentially a trading application designed specifically for the commodities market covering energy, base metals and agricultural products. It provides server based pricing and sensitivities for structures consisting of forwards and options that incorporate volatility skew and is designed to be delivered via the web and as native mobile applica- tions. The implemented functionalities in the prototype are detailed below, along with market data requirements and planned extensions.

Supported Underlying Assets

Commodity Price Engine supports any asset with forward curves and implied volatility surfaces. This includes exchange traded products with sufficient liquidity and products for which the user is able to supply the forward curves and volatility surfaces. A planned extension for Commodity Price Engine would build required curves and surfaces to accommodate structures on illiquid underlying assets.

Supported Derivatives and Valuation

Pricing and sensitivities are available for forwards, bullet and Asian options, and structures consisting of any combination of forwards and options. The valuation model takes into account volatility skew and has been benchmarked against commercial software used in investment banks. Price and sensitivities can be converted to any currency and standard metric units.

4. Sales and Trading Features

Commodity Price Engine would allow addition of sales and trading margins, shifting of forward curves and volatility surfaces for what-if analysis, solving for break even strikes for structures, generation of term sheets, and graphing of forward curves and payoff diagrams. It also would accommodate back-dated pricing for available historical data.

5. Planned Extensions and Enhancements

Additional features that are planned for Commodity Price Engine include:
- Construction of illiquid forward curves and implied volatility surfaces.
- Calculation of credit value adjustment (CVA).
- Computation of value-at-risk (VaR).

6. Market Data Requirements

Commodity Price Engine assumes availability of the following market data:
- Yield curves for required currencies (it would be possible to bootstrap yield curves from cash, futures, OIS, swap, and single currency basis swap quotes).
- Forward curve and implied volatility surface (it is possible to build volatility surfaces from market quoted option prices) for required underlying assets.
- FX forward curve and volatility surface for required currency pairs.
- Implied survival probabilities for relevant entities if CVA calculation is required (it would be possible to compute the survival probabilities from yield curves and credit default swap (CDS) spread quotes).
- Historical data for above if VaR calculation is required.
Market data can be obtained from commercial data vendors such as Bloomberg or Reuters (commodities data from such market data sources as ze.com will need to be supplemented by interest rate, FX, and credit data).

7. Technology Architecture

Commodity Price Engine architecture consists of server and client side components. The server side manages market data and could be loosely coupled with a grid of quantitative pricing libraries. The client side is the Graphical User Interface (GUI) that communicates with the server via secure protocol and could be accessed from desktops or a variety of mobile devices. Pricing libraries could be placed on the client side if required.

8. Conclusion

Commodity Price Engine would be a sales and trading application designed for participants in the commodities market who traditionally relied on investment banks for pricing support due to limited access to suitable tools. It would have the capacity to become a full-scale trading platform if supplemented with modules for connecting to trade booking and counterparty portfolio management systems.

Tuesday, 19 May 2015

Microsoft Workshop: Developing for Internet of Things, London

The workshop took place in Microsoft office at 100 Victoria Street. The crowd was pretty big. First we came through a couple of presentations and then did three labs. Overall, the workshop was very interesting, it gave a good overview of what IoT consists of, where we are with it at the moment and how it would possibly evolve in near future. See below some take aways that I think could be helpful to review later on.

Our presenters were:

  • Paul Foster, DX Microsoft UK, and
  • Robert Hogg, MVP, Microsoft Integration, MD Black Marble

Some notes:
  1. There are open source IoT frameworks (for example, check out AllJoyn)
  2. IoT provides Data-Driven Insights (Telemetry):
    1. More efficient use of resources (cost reduction, environmental impact)
    2. More targeted products and services (social impact, increased revenue)
  3. While working with connected devices, it's very hard to predict in advance what data will be useful. The important data may not be what was expected in the beginning. Therefore:
    1. It's tempting but likely inefficient to try for business transformation in the first step.
    2. Need to think about not only device telemetry but also diagnostic telemetry.
  4. Privacy and security have to be addressed at very early stages.
  5. Although the ability to control devices remotely could be quite helpful, in the beginning designers may need to get used to work with devices that provide one-way communication only.
  6. Microsoft goal to support in Azure ANY device!
  7. https://www.wirelessthings.net
  8. Hortonworks Sandbox is a free installation of Hadoop that comes with sample data and tutorials. It could be installed on a personal computer - it's a great tool to start playing with real Hadoop.
  9. Lots of interest in R programming. R is used in practically all universities across UK and investment banking. Many R scripts come for free from academia.
  10. Practical Data Science and support for it is quite popular within nowadays business activity.
  11. Microsoft provides free consulting advises for IoT initiatives.

Some slides:

1. It is expected that interest in IoT will get into initial peak then it may cool off with gradual and steady grows of popularity afterwards:


2. Different level of IoT evolution:


3. ToDo roadmap:


4. Variety of IoT devices:


5. IoT challenges:


6. Pattern to start with:

7. This is what Microsoft offers on Windows Azure for IoT:

8. Some IoT problems that could be solved with Windows Azure:

9.
10.
11. This Event Hub is already available in Windows Azure. In fact we used it in our first lab.

12. Stream Analytics is also already available in Windows Azure. We used it in out second lab.

13. Stream Analytics front-end in Windows Azure looks almost as simple as this diagram:

14. I'm not sure if it's really a 'pattern' but it's good to keep in mind that volume of incoming messages in IoT could be really huge:

15. Possible IoT participants:

16. This slide represents a great desire to keep IoThings under a tight control. We'll see if it would become a reality or stay just a dream:

17. This is what Event Hub on Windows Azure is capable of:

18. When I see such slides I think more and more about Lua, Barracuda Embedded Server and Express Logic:

19. Network security means encryption. I'm not quite sure why does a message from, say, a temperature sensor that has only two fields - IP address and temperature value - have to be encrypted? Keep in mind that millions of such messages would need to be decrypted at the Event Hub on arrival...

20. More about security:

21. It's good to know that there is the IoT Suite. We didn't play with it, so I don't really know how it looks like:

22. More concerns about IoT:

23. I guess that if you would follow one of the last two links, you might find this presentation in an original file:

24. These are three labs that I did on that day. First two required configuration on Windows Azure. In last one I used a Raspberry platform as a sensor that sends messages to the Event Hub configured in the first lab. I should admit that it was quite interesting to do this. Event Hub with Stream Analytics looked very similar to CEP (Complex Event Processing) that I worked with before.

25. Azure community is steadily growing. I have already booked a place for IoT & Data Hackathon in Reading and hope to put some info about it on the web as well:

26. It seems that topics on this slide and many more could be learnt on Microsoft workshops in London for free:

27. More events:

28. More links:

29. And more links:




Online Encyclopedia of Statistical Science (Free)

Please, click on the chart below to go to the source: