It’s been an interesting week, culminating in a request from a colleague, Dr. John Levy. John asked me if I would substitute teach one of his lectures for the Fromm Institute, which is part of the University of San Francisco. Fromm was established to provide ongoing educational opportunities for retired adults over the age of 50. The lecture John has asked me to give is on Cloud Computing and Big Data, and is part of an 8-lecture series titled Digital World – Invisible Computers. As I’ve started preparing for this lecture, a stream of thoughts has unfolded around how I can present the current and future state of Cloud Computing and Big Data, along with a lead-in on the path that has gotten us here. I even thought about using a Gartner “hype cycle” model to demonstrate where we are in the life cycle. Shouldn’t be too complex. After all, we’ve crossed the “hump” – right?
John’s request wasn’t the only thing that happened last week that motivated me to sit down and pen this post. I had the opportunity (treat) to visit with Simon Crosby, the founder and CTO of Bromium. Simon was one of the founders of XenSource (now Citrix XenServer) and has a rich and intriguing background in technology and academia, holding a PhD in Computer Science from the University of Cambridge. If you’ve ever heard Simon speak on technology and business topics, you’ll agree that he brings a unique and amazingly broad perspective to the table.

In our session, a lot of the discussion dealt with standards, standards bodies, consortiums, and the complexity and maturity of this thing we call the “cloud.” The one thing that stood out from our discussion is that this stuff is not getting any simpler, and if you look at the underpinnings of cloud computing, we really haven’t progressed in terms of our ability to elegantly integrate the key elements that make up a cloud ecosystem capable of supporting enterprise applications. Simon used the term “galactic glue” to describe how some companies are touting their ability to make the connections between IaaS, PaaS, and SaaS. But as we all know, there are lots of different glues for different applications… Another key point Simon made is that while there is a lot of hype around “DevOps,” in the real world Dev = Ops. In most cases there is no real distinction between the two, and what is running in Ops is really still in Dev – if you get my meaning.
…are we at a point where the systems architectures required for cloud-based, highly distributed, loosely-coupled services-based applications are so complex that the only way to ensure they work is to see how they fail?
This led to another connection point later in the week, when I reviewed the presentation Adrian Cockcroft gave at SVForum on 3/27. Adrian is another one of those people who just oozes knowledge when it comes to anything related to technology, and especially cloud computing. His presentation was titled Cloud Architecture at Netflix – How Netflix Built a Scalable Java Oriented PaaS Running on AWS. Two things Adrian and Netflix do extremely well are architecting systems for high reliability and sharing how they do it. I suggest you grab a copy of Adrian’s presentation – well worth the time to digest it. The one thing that jumped out at me is that this stuff is not simple. The 65 pages of the presentation are chock full of architecture and design and technical stuff like “circuit breakers” and “scale up linearity” and “token awareness” and “consistency levels” and “key stores” and… you get the picture. This stuff is no different from the stuff we’ve dealt with for years; it’s just now applied to a different service delivery model – and that makes it even more challenging, IMHO.
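To make the “circuit breaker” idea a bit more concrete, here’s a minimal sketch in Python. This is my own illustration of the pattern, not Netflix’s implementation (theirs lives in their Java stack): stop calling a dependency that keeps failing, fail fast while the breaker is “open,” and let a single trial call through after a cool-down.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: trips "open" after repeated failures so callers
    fail fast instead of piling up on a sick dependency, then allows a
    trial call after a cool-down period ("half-open")."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures    # failures before tripping open
        self.reset_timeout = reset_timeout  # seconds to wait before retrying
        self.failure_count = 0
        self.opened_at = None               # time the breaker last tripped

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down has elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            raise
        else:
            self.failure_count = 0            # success resets the breaker
            self.opened_at = None
            return result
```

Wrap calls to a remote service in breaker.call(...) and the application can degrade gracefully – serve a cached or default response – instead of hanging on a dependency that is already in trouble.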
What linked the meeting with Simon to Adrian’s presentation was a discussion around Chaos Monkey. For those of you not familiar with Chaos Monkey, it’s a tool Netflix wrote to randomly inject failures into an environment to see how it reacts and, hopefully, recovers. The premise of Chaos Monkey (per Adrian’s presentation) is that “Computers (Datacenter or AWS) randomly die – Fact of life, but too infrequent to test resiliency.” A great idea: subject all applications to chaos testing to ensure that those “circuit breakers” and “fail fast” techniques we’ve employed actually work. But this got me thinking – are we at a point where the systems architectures required for cloud-based, highly distributed, loosely-coupled, services-based applications are so complex that the only way to ensure they work is to see how they fail? Maybe I’m overreacting a little, but I look back to the major AWS outage that occurred in early 2011 – Netflix was the only customer that seemed to weather the storm. Can everybody really afford, and enforce, the discipline required to design and implement “Chaos Monkey-tolerant” systems? Note: as we all know (and love) about this industry, there’s always a bandwagon to be jumped on…
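For flavor, here’s a toy version of the Chaos Monkey idea – an illustrative sketch, not Netflix’s actual tool. It uses Python and boto3, and it assumes instances opt in to chaos testing via a made-up “chaos=candidate” tag: pick one running instance at random, terminate it, and then watch whether the system recovers on its own.

```python
import random

import boto3
from botocore.exceptions import ClientError

def terminate_random_instance(tag_key="chaos", tag_value="candidate", dry_run=True):
    """Pick one opted-in, running EC2 instance at random and terminate it."""
    ec2 = boto3.client("ec2")
    # Only instances tagged as chaos candidates are fair game.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [
        inst["InstanceId"]
        for res in reservations
        for inst in res["Instances"]
    ]
    if not instance_ids:
        print("No chaos candidates found; nothing to kill.")
        return None
    victim = random.choice(instance_ids)
    print(f"Terminating {victim} – now let's see if anyone notices.")
    try:
        ec2.terminate_instances(InstanceIds=[victim], DryRun=dry_run)
    except ClientError as err:
        # With DryRun=True, AWS reports "would have succeeded" as an error code.
        if err.response["Error"]["Code"] != "DryRunOperation":
            raise
    return victim
```

The hard part, of course, isn’t the monkey – it’s designing every service so that losing a random instance at 2 p.m. on a Tuesday is a non-event.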
The third thing that happened last week was that I was accepted into a beta for a new cloud service, provided by a well-known but not-to-be-named hardware/software and IT service provider. I’ve been following their efforts for quite a while and was eager to see how their service would set them apart from the rest of the cloud solutions. With the resources they have to pull this off, I was expecting great things. Unfortunately, I was very disappointed. What they’ve offered is at best a very rudimentary and limited IaaS capability that requires a lot of system-admin-level effort to utilize. Granted, I understand that you have to do the technical work to make all this happen, but for most companies (especially SMBs) that want to take advantage of the economic benefits of cloud computing, it still requires too much technical expertise to pull it off. Where is that “galactic glue” that Simon mentioned?
The fourth thing (trust me, there is an end to this…) that prompted me to write this was a meeting with a potential customer of one of the companies with which I am consulting. In our discussions they outlined where they were on their journey to private and, eventually, hybrid cloud nirvana. What surprised me was how early they were in that journey – still talking about server virtualization and consolidation. I was somewhat taken aback that they weren’t any further along. This is a pretty well-known company with some really impressive system and data (billions of “objects”) challenges. Maybe I should attribute their current situation to hyper-growth and a lack of time to focus on longer-term architecture and strategy. But it tells me that while we are starting to get comfortable with the notion of enterprise-ready clouds, we probably aren’t as far along as we think.
Fifth (getting closer…) was Citrix’s announcement that they are joining the Apache Software Foundation and contributing CloudStack to the open source community – in effect promoting CloudStack over their participation in, and backing of, OpenStack. As with anything, there are hundreds of different opinions on why Citrix is making this move. I’ve been keeping tabs on the seemingly limitless posturing and “gorilla dust” battles that have been going on between the various consortiums, standards bodies, open source foundations, and template creation bodies trying to bring some degree of order to the technology and cloud playing field. But just as a panelist at a recent Carnegie Mellon-sponsored event on “Big Data in the Smart Grid” shared, there are “50 science projects going on in each of the 50 states.” Standards have always lagged the innovation and adoption of technology, but I question whether the constant reshuffling of deck chairs by these entities is really getting us any closer to a truly transparent and transportable cloud model.
And finally… I read Ray DePena’s latest “Cloud 300” list of who’s gaining mindshare in cloud computing. I have to give Ray credit – he has “staying power.” I maintain a pretty extensive “watch list” for cloud computing, but nothing on the scale of Ray’s. Nice work, Ray. But this once again brought me back to my premise for this post – with time, isn’t this stuff supposed to get simpler? As Ray points out, his list actually contains more than 300 entries, and these are just the ones he believes warrant some degree of mindshare. The landscape is cluttered with hundreds more that barely see the light of day.
And in summary… what does this have to do with teaching a bunch of senior citizens about cloud computing and big data? Nothing, really. None of this will be mentioned during my lecture. But it will be hard to stand up there and tell them about the wonders of cloud computing without thinking about all the things that still need to be put in place to make it a truly effective service delivery model. Then again, this is nothing new – we’ve been dealing with these issues for years… maybe I’m just too impatient and need to realize that we’re still at the dawning of cloud computing in terms of “technology years…”