Wednesday, 7 December 2016

Graph Database OrientDB

Neo4j's book 'Graph Databases' provides an easy introduction to the topic (search online for its PDF version). Try to concentrate on the schema design principles supported by practical examples, and skip learning their query language, Cypher, unless you want to use Neo4j for your projects.

On Dec 7th, 2016 there was a good webinar about migrating data from Neo4j to OrientDB. It was surprising to see what OrientDB has under the hood: nice SQL, indexing, clustering, sharding, a security model and visual representation of the actual graphs stored in the database. OrientDB can also be used as a schema-less or schema-based graph-oriented database, as a document-oriented database, or with both models (graph and document) simultaneously. Unfortunately it doesn't run on Java ME yet (it's on the roadmap though). A shorter, earlier version of this webinar is available on YouTube. Below is a copy of the questions and answers that came up during the presentation:

Q: Is OrientDB enterprise edition available under an open source license as well as commercial?
A: OrientDB Enterprise Edition is commercial only, but the Community Edition is licensed as open source under Apache 2. You can try Enterprise Edition for 45 days before deciding.

Q: Like a viral license (AGPL) that would allow us to use the enterprise edition if we open source the code we use with Orient?
A: OrientDB Community Edition is licensed as Apache2, so it's not viral. You can use it for any purpose, even embedding it at no cost.

Q: Which version of OrientDB are you using?
A: OrientDB v2.2.13

Q: Do you offer a startup program? So that small companies can use enterprise edition at no/low cost?
A: Absolutely. We provide a 50% discount for startups. No hidden costs, it's all on our website.

Q: When using the object model, are there constraints to consider? Can I mix and match models?
A: The object model (JPA-like) doesn't work very well with the graph one, so we suggest picking one of them. The Graph model is more powerful and better supported.

Q: Does the choice of schema mode influence the performance of standard or index-based lookups?
A: Yes, using the schema makes the database much smaller (property names are not saved in the record) and therefore a smaller database is faster.
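A toy illustration of why a schema shrinks records: without a schema every record repeats its property names, while with a schema the names are stored once in the class and each record keeps only values in a fixed order. This is not OrientDB's real binary serialization; compact JSON stands in for a schema-less record, `struct` for a schema-full one, and the `Person` fields are hypothetical.

```python
import json
import struct

# Hypothetical class definition: with a schema, field names live here, not in each record.
PERSON_SCHEMA = ("name", "age", "city")


def encode_schemaless(record: dict) -> bytes:
    # Each record carries its own property names (compact JSON, for illustration only).
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def encode_with_schema(record: dict) -> bytes:
    # Values only, in schema order: length-prefixed UTF-8 strings and a 32-bit int.
    name = record["name"].encode("utf-8")
    city = record["city"].encode("utf-8")
    return (struct.pack("<H", len(name)) + name
            + struct.pack("<i", record["age"])
            + struct.pack("<H", len(city)) + city)


def decode_with_schema(data: bytes) -> dict:
    # Field names are not in the bytes; they are restored from PERSON_SCHEMA.
    (name_len,) = struct.unpack_from("<H", data, 0)
    offset = 2
    name = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (age,) = struct.unpack_from("<i", data, offset)
    offset += 4
    (city_len,) = struct.unpack_from("<H", data, offset)
    offset += 2
    city = data[offset:offset + city_len].decode("utf-8")
    return dict(zip(PERSON_SCHEMA, (name, age, city)))


record = {"name": "Alice", "age": 30, "city": "London"}
print(len(encode_schemaless(record)), len(encode_with_schema(record)))  # prints: 41 19
```

The saving is paid per record, so on a large class it adds up: the names are stored once per class instead of once per record.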

Q: It's inaccurate to say that Neo4j does not support inheritance or polymorphic queries; it does, with labels.
A: Neo4j labels are not polymorphic and there is no such concept in Neo4j. For more information look at http://stackoverflow.com/questions/24873067/how-to-work-with-type-hierarchies-in-neo4j, especially the last answer/comment.

Q: How would you compare Orient's extended SQL to Gremlin?
A: SQL and Gremlin are quite different in many senses: SQL is declarative, while Gremlin is more oriented to step-by-step traversal/filtering. Please consider that both OrientDB and Neo4j support Gremlin, which is a standard, so you can easily migrate from one to the other.

Q: How do you do variable depth queries in OrientDB? e.g. min depth 1 to max depth 5
A: You can use a mix of TRAVERSE and SELECT, like "SELECT FROM (TRAVERSE ... WHILE $depth < 10) WHERE $depth > 3". Or you can use the MATCH syntax, where you have distinct WHILE and WHERE conditions, one for the traversal and the other for filtering.
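In plain Python terms, the TRAVERSE/SELECT combination splits the work into two bounds: the WHILE condition limits how deep the traversal expands, and the outer WHERE filters out vertices that are too shallow. The sketch below applies that idea to a hypothetical in-memory adjacency dict. It uses breadth-first order so each vertex gets its shortest depth; OrientDB's own traversal strategy and `$depth` bookkeeping may differ.

```python
from collections import deque


def traverse_between(graph: dict, start: str, min_depth: int, max_depth: int) -> dict:
    """Return {vertex: depth} for vertices reachable from start within [min_depth, max_depth]."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        # Traversal bound (the WHILE part): never expand past max_depth.
        if depths[vertex] >= max_depth:
            continue
        for neighbour in graph.get(vertex, ()):
            if neighbour not in depths:  # breadth-first, so the first depth seen is the shortest
                depths[neighbour] = depths[vertex] + 1
                queue.append(neighbour)
    # Filtering bound (the WHERE part): drop vertices shallower than min_depth.
    return {v: d for v, d in depths.items() if min_depth <= d <= max_depth}


# Hypothetical chain a -> b -> ... -> g plus a shortcut a -> d.
graph = {"a": ["b", "d"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
print(traverse_between(graph, "a", 1, 5))
```

With min depth 2 and max depth 3 the traversal stops expanding at `f`, so `g` is never even visited; the WHERE-style filter then drops `b` and `d`.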

Q: Can you query across clusters? For example if the graph is fully connected?
A: Yes, OrientDB will manage the query for you. Of course, the query performance depends on how many hops you make between clusters.

Q: So a cluster is a cut of the graph essentially?
A: Yes, exactly
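If each cluster is a cut of the graph, the cost mentioned in the answer about hops between clusters can be pictured by counting how many steps of a traversal path cross from one cluster into another. The partition and paths below are hypothetical, and the count is only a proxy: it says nothing about how OrientDB actually plans or executes the query.

```python
def cross_cluster_hops(path: list, cluster_of: dict) -> int:
    """Count consecutive vertex pairs on a path that sit in different clusters."""
    return sum(1 for a, b in zip(path, path[1:]) if cluster_of[a] != cluster_of[b])


# Hypothetical partition of six vertices into two clusters (two "cuts" of one graph).
cluster_of = {"v1": "europe", "v2": "europe", "v3": "asia",
              "v4": "asia", "v5": "europe", "v6": "europe"}

print(cross_cluster_hops(["v1", "v2", "v3", "v4", "v5"], cluster_of))  # prints: 2
print(cross_cluster_hops(["v1", "v2", "v5", "v6"], cluster_of))        # prints: 0
```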

Q: If you can write to multiple masters, how does Orient handle consistency / transactions?
A: OrientDB supports distributed transactions that assure the consistency of the database by using a 2-phase locking protocol across the servers.
A: Consistency is based on MVCC and quorum-based consensus.
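The quorum part of the answer about MVCC and quorum-based consensus can be sketched as a counting rule: a write is treated as committed only after enough servers acknowledge it. Majority quorum, the server names and the functions are assumptions for illustration; OrientDB's actual distributed protocol (MVCC versions, locking) is not modelled here.

```python
from typing import Optional, Set


def majority_quorum(server_count: int) -> int:
    # Smallest number of servers that is strictly more than half of the cluster.
    return server_count // 2 + 1


def is_committed(acks: Set[str], servers: Set[str], write_quorum: Optional[int] = None) -> bool:
    """A write counts as committed once at least write_quorum known servers have acknowledged it."""
    quorum = write_quorum if write_quorum is not None else majority_quorum(len(servers))
    return len(acks & servers) >= quorum  # acks from unknown servers are ignored


servers = {"node1", "node2", "node3"}
print(is_committed({"node1", "node2"}, servers))  # prints: True
print(is_committed({"node1"}, servers))           # prints: False
```

A majority is the usual choice because any two majorities of the same server set share at least one server, so at least one server sees both of two conflicting writes.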

Q: What requirements does OrientDB have for Java runtime? Will it run on Java ME?
A: In terms of runtime, it only needs Java SE (Java ME is not supported for now).

Q: What version of Neo4j are you comparing OrientDB to? And what version of OrientDB are you talking about?
A: We compared the latest GA versions of both products: Neo4j 3.0.6 and OrientDB 2.2.13.

Q: Does Orient work with LDAP / Active Directory?
A: OrientDB supports Kerberos and you can import LDAP users

Q: Neo4j seems to be a much larger company (in fact I think they just raised a bunch of money). Why do you think it is possible for orient to have so many more features than Neo4j while being a much smaller company?
A: Receiving funding is no guarantee that the company will be there tomorrow. Look at what happened to RethinkDB and other companies that received funding but weren't focused on building a sustainable business. The OrientDB company has been profitable for 3 years and its investors are its clients. We believe this is the only healthy business :-)

Q: Does the key/value model (like Redis) stay in memory? We use Redis for its speed (memory residence). How will speed be affected if we drop Redis and move to Orient for key/value?
A: As "just" a Key/Value DBMS, Redis is faster, so if you only need a K/V store we suggest using Redis. But if your domain is more complex and requires documents, graphs, etc., then the multi-model approach is the best in terms of overall performance and complexity.

Q: What is the theoretical/practical limit to the number of classes in ODB?
A: In the current release you can have up to 32,000 data files, so if you have one per class you can have 32,000 classes.

Q: What are the most common use cases for your customers in production now? How do these differ from the use-cases of customers using Neo4j?
A: While Neo4j is "only" a Graph Database, OrientDB can be used in a wider range of use cases, especially when an operational database is required. By operational I mean a primary database, while Neo4j in 99% of cases is used as a secondary database, mostly for analytics with data loaded from an RDBMS (the primary one).

Q: Does the graph have size limits? Please give an example of a BIG graph already deployed in OrientDB.
A: These are the limitations: http://orientdb.com/docs/2.2/Limits.html. You can create up to 302,231,454,903 trillion vertices and edges, which should be enough :-) The biggest installation is for an energy company with 100+ servers.
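The headline number is consistent with a simple product: 2^63 records per cluster times 2^15 clusters gives exactly 2^78, i.e. roughly 302,231,454,903 trillion. That decomposition is an assumption (it matches 64-bit record positions and roughly 32,000 clusters, in line with the answer about data files above), not a derivation quoted from the limits page.

```python
# Assumed limits (illustrative): 2**63 records per cluster, 2**15 cluster IDs per database.
RECORDS_PER_CLUSTER = 2**63
CLUSTERS_PER_DATABASE = 2**15

total = RECORDS_PER_CLUSTER * CLUSTERS_PER_DATABASE  # == 2**78
trillions = total // 10**12                          # 1 trillion = 10**12 (short scale)
print(f"{total:,} records ~ {trillions:,} trillion")
# prints: 302,231,454,903,657,293,676,544 records ~ 302,231,454,903 trillion
```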

Sunday, 20 November 2016

Alibaba Cloud

Alibaba's statement at IP EXPO London was plain and clear: tell us what you need and we'll do it for you; let's work together for our mutual benefit.

It didn't take long: Alibaba Cloud came to Europe. Although Alibaba has a quite efficient payment system that they technically polished in China while processing a huge number of transactions on a daily basis, in Europe they started by offering just traditional cloud infrastructure. I can't wait to see what other offers will follow.

How does Alibaba's presence look from a consumer's perspective? Here is a quick example: during Alibaba's 11.11 event I bought two 5000LM X800 'tactical' torches (they are great for cycling!) for half the price of one. I got them from an eBay seller who most likely proxied an Alibaba transaction on that day.

If the supply chain, online catalogues, shopping carts, checkouts and delivery of purchased items are all done under the same umbrella (Alibaba), what would happen to current distributors of Chinese goods in Europe and to the programmers who support their online transactions? How would Amazon and eBay compete with Alibaba?

What kind of services (MaaS, SaaS, PaaS, etc) and APIs would Alibaba provide for its European infrastructure? I hope we'll find it out pretty soon.


Saturday, 15 October 2016

Kotlin Night in London on October 12th, 2016

Kotlin is a programming language created by JetBrains that simplifies and even beautifies programming for the Java Virtual Machine. Although it differs from Java syntactically, it provides full access to existing Java libraries.

In practice, it takes care of a lot of the boilerplate code (getters, setters, equals()/hashCode(), etc.) that simply became too time-consuming to write again and again and again... As a result, the code becomes more readable even for those Java programmers who are not yet familiar with Kotlin. Look for more details on JetBrains' website.

Kotlin Night in London, Real World Kotlin, was quite interesting, informative and even surprising. The venue was almost full, and some presentations were given by practitioners who already use Kotlin for production code in their day-to-day work.

Takeaway points are:

1. Kotlin combines object-oriented and functional programming styles in a way that puts coding efficiency first. In other words, it was built by practitioners for practitioners.
2. Kotlin is free, it really costs nothing to start using it.
3. It does not generate any new kind of binaries: Kotlin code is compiled directly into standard JVM bytecode, the same kind of .class files that javac produces.
4. Kotlin code looks way cleaner than Java; it's easy to read and understand what it does.
5. JetBrains seems to be a pretty decent and reliable vendor to be behind things like that.
6. If a Java programmer is familiar with some functional programming, it will be very easy for him/her to pick up Kotlin in no time at all.
7. If I start a new Java project, I'll definitely give it a go.

Recordings of the presentations from that night can be found here.

Saturday, 8 October 2016

IP EXPO EUROPE @ ExCel London on 5-6 October, 2016

This was quite an interesting exhibition, saturated with well-organised presentations. There were lots of big and small companies offering stand-alone and cloud-based products and services in the Cyber Security, Networks and Infrastructure, Data Analytics, DevOps and Open Source sectors.
Multiple presentations were going on simultaneously in different venues. I had scheduled my own list beforehand and was jumping from place to place trying to catch as much as I could.
Unfortunately, most presentations were given by business/sales representatives and were not very useful from a technical perspective. I really liked the topics discussed by Splunk, ASI Data Science, AppCheck NG, Alibaba, First Base Technologies and Libelium. Below are some notes taken there.

Splunk (Reinventing IT Operations):
They take monitoring of applications and infrastructure to a new level: their monitoring solution can be plugged into existing systems literally within a day. The list of clients where they have already installed their solution was quite impressive. It would be a good idea to compare their functionality (and overall prices) with Zenoss, which I have used quite often in the past.

Clarive (DevOps Platform for the Evolving Enterprise):
The idea: different teams (dev, business, testers, etc.) use different tools (Jira, PM, etc.), orchestrating those tools is complex, and synchronising the data they use is cumbersome. Continuous integration is difficult. Clarive offers Lean Application Delivery.
I liked their idea as it ticks all the major boxes: application lifecycle, release management and change request handling. Possibly these components, combined under one umbrella, may become the 'holy grail' knowledge management system that I have long been looking for.

Rubrik (Recover, Manage and Secure Data in the Enterprise Cloud):
Although a typical enterprise favours delegating extensive data processing (e.g. concurrent parallel calculations) elsewhere, it is quite cautious (and often legally constrained) about keeping its data on the Cloud. Because of Security... Rubrik tries to convince the public that they can keep the data on the Cloud and handle its security nicely.

Citrix (When Big Data meets Small Things – Secure Event Delivery for the Internet of Things):
This presentation was a clear message that Citrix is already in IoT and, with regard to security, takes IoT services on the Cloud very seriously.

Puppet (Continuous Delivery: DevOps Holy Grail):
A well-known company with a great product. The speaker was good but his presentation was kind of useless: it was not exciting from a business perspective and too shallow for techies, just a pretty general talk about continuous delivery and how Puppet could help with it.

ASI Data Science (Practical machine learning for business applications):
These guys were amazing! A relatively small London-based company somehow accomplished 120 data science projects in a relatively short time! Well, some of those projects were probably small, but the overall number is still astounding. They have developed their own framework that looks quite practical. I reckon that the 'Data Science Project Cheat Sheet' is the most valuable screenshot I took at IP EXPO. I asked them for their presentation in electronic form and will upload it here as soon as (and if) they send it to me.
(TODO: publish ASI Data Science presentation slides)

AppCheck NG (Web Application Security: Challenges Old and New):
It was a good, educational presentation. I didn't take any photos as they promised to make the slides available on their website. I haven't found them there yet and have sent them a message. I'll upload their presentation here as soon as I get it.
(TODO: publish AppCheck NG presentation slides)

Avaya (Internet of Things – Forget the hype this is the reality...):
The presentation was given by Jean Turgeon in the biggest venue available. Hardly any chairs were left empty. The talk wasn't very technical though. I guess the main message was that Avaya understands the popularity of IoT and is investing aggressively in this market sector.

Alibaba (Alibaba Cloud, More than just cloud):
Although the venue was small and many chairs stayed empty, the presentation was a Big Surprise! In short, Alibaba is The Real Deal on the Cloud and it's coming to Europe. I took a few shots but the presenter, Mr. Yeming Wang, accidentally (I hope that was an accident) removed them from my iPhone Notes when he was typing in his email address (apparently, he had no business cards left). He promised to provide me with his presentation in electronic format, which I will happily publish here as soon as I receive it from him.
(TODO: publish Alibaba presentation slides)

First Base Technologies (Major Real World Red Team Exercise - The story you are about to hear is true...):
That was a very educational story about hacking into some sort of military or police database where important artefacts were stored in various electronic formats. The story was full of tiny and very interesting details. Apparently, technical hacking is greatly enhanced by social hacking. Below is just part of that story:
They found all the remote branches from which the target database was accessible. They identified the least secured branch and looked for the names of its employees. Once a few names were known (some employees would go out for lunch with their security badges attached to their clothes in plain view), they collected personal information from social networks (LinkedIn, Facebook, Instagram, etc.).
Now imagine that a guy has just returned from his holidays in Spain. He receives an email with the logo of the hotel his family stayed in and an offer of a relatively good discount for the next trip. All he has to do is open an attachment (then kind of print it, sign it and send it back). Once the attachment is opened, voila, the intruders have remote access to his machine! It's phishing, isn't it? Everybody knows about it, right? Do you know how often files like this get opened? In a thrilling 50% of cases...

Libelium (IoT Interoperability: any sensor, any protocol and any cloud):
This relatively small Spanish company happens to have acquired a lot of knowledge in the IoT space, particularly in sensor design, implementation, installation and maintenance. Below is a short summary of what was said there; some points could be invaluable for newcomers:

  • At the moment there are no unified standards for connection between IoT sensors and back-end systems.
  • New technologies keep appearing every year.
  • Everyone wants to get into IoT.
  • There is a lack of clearly defined roles in implementation of IoT solutions.
  • Libelium works with sensors and communication only and has nothing to do with clouds, integration or analytics software; they try to stay as close as possible to the customers who use sensors.
  • Solutions for Industrial IoT (IIoT) are very difficult to replicate.

Lessons learnt:
    1. Sensors are the tracks the business runs on (customers ask for higher-quality sensors), meaning that they are the base of the business.
    2. Interoperability is the key - this means that any sensor on any cloud should be connectable using any communication protocol.
    3. IoT players need to adapt quickly to new technologies; tight coupling with selected technologies is deadly.
    4. Installation and maintenance do matter.
    5. Don't go for quantity but for quality and accuracy.
    6. Be easy to evolve, e.g. an iPhone app-sensor that helps prototype a new system.



Thursday, 16 June 2016

Microsoft DevOps Tech Day

It was a good, educational workshop in a new Microsoft office steps away from Paddington train station. Several topics were presented by people from Microsoft as well as Microsoft partner RedGate. Below are some notes taken there.

Microsoft:

Continuous Integration with TFS


  • TFS on the cloud is $8/month if there is no subscription. Any of the most popular IDEs can be connected to it.
  • As part of the build, an application can be deployed to the cloud. The Azure subscription can be identified via a credentials file taken from Azure (management certificate).
  • Builds can be done against the master branch or any other selected branches.
  • Git doesn't support gated check-in (check-ins that are accepted only if the submitted changes merge and build successfully) but TFS does.
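Stripped of tooling, a gated check-in is a guard in front of the branch: the submitted change is merged and built on the side, and it reaches the shared branch only if both steps succeed. A toy sketch, with hypothetical `toy_merge`/`toy_build` functions standing in for version control and the build server (this is not the TFS API):

```python
from typing import Callable, List


def gated_checkin(branch: List[str], change: str,
                  merge: Callable[[List[str], str], List[str]],
                  build: Callable[[List[str]], bool]) -> bool:
    """Publish change to branch only if it merges cleanly and the merged result builds."""
    try:
        candidate = merge(branch, change)  # merge into a copy, not into the shared branch
    except ValueError:
        return False                       # merge conflict: check-in rejected
    if not build(candidate):
        return False                       # broken build: branch left untouched
    branch[:] = candidate                  # gate passed: the change becomes visible
    return True


# Hypothetical stand-ins for the version control and build steps.
def toy_merge(branch: List[str], change: str) -> List[str]:
    if change.startswith("conflict"):
        raise ValueError("merge conflict")
    return branch + [change]


def toy_build(files: List[str]) -> bool:
    return not any(f.startswith("broken") for f in files)


main = ["init"]
print(gated_checkin(main, "feature-a", toy_merge, toy_build), main)  # prints: True ['init', 'feature-a']
print(gated_checkin(main, "broken-b", toy_merge, toy_build), main)   # prints: False ['init', 'feature-a']
```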


Infrastructure 


  • Infrastructure deployment could be templated, versioned and then automated.
  • This could be done on Azure public or private Azure-inspired clouds.
  • To instantiate a template, go to Visual Studio > Azure Resource Group > deploymentTemplate.json (sections: parameters/variables/resources/outputs) and use the wizards in Visual Studio. Create the Resource Group in a .ps1 file and use PowerShell to deploy it to Azure.
  • Go to Azure, click on Resource Group and see what has been deployed.
  • It's possible to modify some features and then export it as a template to JSON file.
  • Templates can also be deployed via the browser-based Azure GUI. Search there for 'Deploy Template'.
  • This is Infrastructure as Code (IaC): it can be versioned and auto-deployed on an as-needed basis using a continuous integration workflow.
  • Check azure-quickstart-templates on GitHub. The main JSON file may come with an additional one that contains parameters. Those parameters can be displayed in the Azure GUI as well.
  • Check Azure QuickStart Templates on Microsoft website.
  • A single template could be split into (or consist of) multiple JSON files.
  • PowerShell DSC can be used for more customised deployments. As a matter of fact, it's very easy to create a customised deployment using PowerShell.
  • Check the PowerShell Gallery for more resources.
  • Azure has a library of predefined images to be used for such deployments (e.g. SQL Server, Windows Server, etc.) to get started with.
  • Azure Automation DSC can be used with Chef, Puppet and Ansible.
  • See slide for Local Configuration Manager.
  • Azure Visualizer could be used for inspecting instantiated infrastructure.
  • Configuration can be tested with PowerShell Pester tests (Pester is a community project).
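For orientation, here is roughly what the four sections of deploymentTemplate.json look like in an Azure Resource Manager template, next to the top-level `$schema` and `contentVersion` keys, generated from Python so it can be dumped to a file and versioned. It is a minimal skeleton under stated assumptions: the `storagePrefix` parameter and `storageName` variable are hypothetical, and the resources list is left empty.

```python
import json


def arm_template_skeleton() -> dict:
    """Minimal Azure Resource Manager template with the four top-level sections."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Hypothetical parameter; values can also come from a separate parameters file.
            "storagePrefix": {"type": "string", "defaultValue": "demo"}
        },
        "variables": {
            # Template expressions are plain strings evaluated by Azure at deployment time.
            "storageName": "[concat(parameters('storagePrefix'), uniqueString(resourceGroup().id))]"
        },
        "resources": [],  # a real template lists the resources to deploy here
        "outputs": {},
    }


print(json.dumps(arm_template_skeleton(), indent=2))
```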


Continuous Delivery


  • Team Services (web GUI) has management for Releases - it's a new feature.
  • Release management can take artifacts from Team Services, Jenkins, TeamCity, etc.
  • An application release could be deployed on different environments.
  • There could be particular approvers assigned to particular release deployments.
  • For unsuccessful releases new Bugs could be created using the same GUI (similar to Jira).
  • There is no rollback task. Instead you can use the previous release for redeployment. In this case the release could be done in manual mode.
  • A successful deployment on DEV environment could expect an approval for subsequent deployment on QA environment. Approval could be done using the same GUI by a person previously authorised for that.
  • Log output is visible in the GUI in real time.
  • See Xamarin Test Cloud and HockeyApp for beta distribution.
  • Release Management (from above) can use a task for testing an application on Xamarin Test Cloud.
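The DEV-then-QA flow above can be modelled as an ordered list of environments: a deployment to an environment requires a successful deployment to the one before it and, where approvers are configured, sign-off from one of them. The environment names, approver names and `ReleasePipeline` class are hypothetical; this is not the Team Services release definition format.

```python
from typing import Dict, List, Optional


class ReleasePipeline:
    def __init__(self, environments: List[str], approvers: Dict[str, List[str]]):
        self.environments = environments  # deployment order, e.g. ["DEV", "QA"]
        self.approvers = approvers        # environment -> people authorised to approve
        self.deployed: Dict[str, bool] = {}

    def deploy(self, env: str, succeeded: bool, approved_by: Optional[str] = None) -> bool:
        index = self.environments.index(env)
        if index > 0 and not self.deployed.get(self.environments[index - 1]):
            return False  # previous environment has no successful deployment yet
        if env in self.approvers and approved_by not in self.approvers[env]:
            return False  # approval missing or given by someone not authorised
        self.deployed[env] = succeeded  # record the outcome of this deployment
        return succeeded


pipeline = ReleasePipeline(["DEV", "QA"], {"QA": ["alice"]})
print(pipeline.deploy("QA", True, "alice"))  # -> False: DEV has not been deployed yet
print(pipeline.deploy("DEV", True))          # -> True
print(pipeline.deploy("QA", True, "bob"))    # -> False: bob is not an authorised approver
print(pipeline.deploy("QA", True, "alice"))  # -> True
```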


RedGate: 

DevOps for databases


  • You can create a database project in Visual Studio.
  • How to deal with database changes: schema updates, drift, etc.


Infrastructure and Application Monitoring


  • Infrastructure Insights:
    • Create a new dashboard in Azure.
    • Pin to the dashboard required tiles.
    • Settings for each tile could be configured.
    • Once dashboard is ready it could be shared with other people.
  • Application Insights:
    • Telemetry sources: traces, events, etc.
  • Log Analytics (OMS):
    • (Microsoft Operations Management Suite)
    • Another customizable dashboard.
    • Both for Windows and Linux.
    • It's solution based. See Solution Gallery.
    • Try OMS at www.mms.microsoft.com
    • Feedback and ideas: windowsserver.uservoice.com


Online Encyclopedia of Statistical Science (Free)

Please click on the chart below to go to the source: