Monday, March 24, 2008

Java is losing the battle for the modern Web. Can the JVM save the vendors?

A few years ago I worked on a very big Enterprise IBM Websphere project. We had some brilliant engineers in the project both in the development and architecture groups. I remember having had several discussions with some of the brightest people on the team regarding PHP and dynamic languages and generally they were looked upon as toy languages without a bright future. Lack of strict typing, scripting performance, and other reasons were given for why Java would persevere as the language of choice.


This was the typical reaction dynamic languages would get from the Java community. There were many believable reasons for why these languages, especially the ones gaining fame on top of the LAMP stack, would not last. However, one thing which the Java community ignored for many years was the radical shift to the Web, not only for media and e-commerce Web sites but for a large majority of business applications including CRM, ERP, reporting, document management, etc… As a result Java EE (then called J2EE) was not built with the Web in mind but rather focused on enterprise integration, transaction management and other back-end processing. While Java EE has long supported Web development with servlets and JSP the companies driving the standards ignored the RESTful nature of the Web and rather continued to drive a general purpose platform.


In parallel, the LAMP-like architecture built on top of the C language’s eco-system of libraries and tools started becoming the most popular platform for developing Web applications. This trend grew in the second half of the 90s and with a recession following the burst of the .com bubble it greatly accelerated due to the lower TCO that the LAMP solutions had to offer. While there are a variety of dynamic languages which make up the LAMP development and deployment paradigm, the most ubiquitous language has been PHP. As a result of PHP being domain specific to the Web it has been shaped in a way which makes it fit the Web paradigm like a glove. By focusing on solving the common Web patterns quickly and easily it holds the biggest market share on the Web. In two separate surveys of one of the most popular Ajax Web sites, the Ajaxian.com, around 50% of Rich Internet Applications developers are using PHP. The trend has also been significantly accelerated as a result of the many popular PHP packages including Wordpress, Drupal, mediaWiki, osCommerce, SugarCRM, and more…


When it became apparent to the large Java vendors that the Web paradigm was being built and innovated without Java they started backing a variety of both standards and non-standards driven Java Web application frameworks which promised to adapt Java to the Web. Such frameworks included Java Server Faces, Struts, Spring MVC and others. Some of these frameworks have been more successful than others but in general none of them managed to resolve one of Java’s main pain points on the Web. The strict typing and overly complex architecture of Java applications meant longer development times and a need for more skilled engineers in order to push Java applications into the market, i.e. Java’s TCO on the Web was unsatisfactory.


In the meanwhile the large Java vendors were trying to hold the stick at both ends. On one hand trying to be part of the Web paradigm shift and on the other hand protecting their multi-billion dollar businesses built on the Java language. Even the pervasiveness of dynamic languages in the Web space didn’t change the vendor’s behavior significantly. The big change came when Microsoft aggressively pursued a multi-language runtime environment for the .NET platform. Not only did they support C# and VB on their virtual machine but they worked with their developer community to add a large amount of languages including Cobol, Eiffel, Ruby, Python, and others. As dynamic languages continued to grow to the point where industry analysts started defining categories (e.g. Forrester Research on dynamic languages) Microsoft continued to leverage their common runtime which was designed from the get go to support multiple languages.


As mentioned earlier the de-facto standard implementations of the successful dynamic languages including PHP, Perl, Python and Ruby are all written in C and leverage the breadth and depth of the eco-system of C libraries. As community driven projects these languages do not have a specification nor is their development hindered by corporate bureaucracy. On the contrary, these languages are being developed by their users who have only one end goal – get the job done, quickly… As a result the languages are constantly evolving often adding significant enhancements in minor releases. With the rapid changes in how modern Web applications are being built and deployed this agile nature is a must-have to keep up with the latest trends.

In addition, the LAMP deployment paradigm has significant advantages. By featuring a multi-process architecture, faults in the Web Server and dynamic language software will typically not lead to sites going down. While one process may crash all other processes serving Web requests will continue running. This is in contrast to multi-threaded environments like the JVM (Java Virtual Machine, Java’s execution environment) where software faults including crashes and deadlocks will typically lead to system down situations. In addition, the ability to recycle processes after a set time will prevent memory leaks and memory fragmentation, two common software memory problems, from degrading the system efficiency over time. Another key advantage LAMP developers enjoy is the easy deployment paradigm. Software updates can easily and incrementally be pushed out to LAMP servers without requiring prolonged build and packaging processes. While this may lead to unorthodox and sometimes too lax of a process, when done correctly it makes the lives of the developers and the operations personnel much easier.


While LAMP’s growth was fueled by many of these development and deployment advantages, the Java vendors were stuck with the JVM which was very closely aligned to the Java language and had little support for targeting multiple languages. Instead of shifting towards a loosely coupled model of LAMP technology and Java technology in order to deliver the best of both worlds to their customers, most hesitated to lose control over the customer’s workload and entered an arms race to deliver dynamic languages on top of the JVM. With Microsoft on one side and the Java competitors on the other, each vendor set out to develop their own dynamic languages strategy.


Today Sun is investing in JRuby (Ruby) and Jython (Python) support for its Java EE solution; the IBM Websphere group has realized the ineffectiveness of the Java EE platform for running modern Web workloads and has invested heavily in Project Zero which aims to make big blue a Web 2.0 player and initially delivers support for Groovy and PHP; BEA has also had some incubation projects going but with the upcoming sale to Oracle it is unclear whether any of those efforts will materialize. Project Zero’s Chief Architect is one of the first IBMers to admit in public that Java today can be considered as a system language and is not desirable for building RESTful Web applications which is Project Zero’s goal (slide #4 of the presentation- see slide #11 to see how a simple “Hello, World” in Java compares to dynamic languages like Groovy and PHP). It has taken over 10 years for the Java stronghold to admit Java’s poor ROI on the Web and with the current recession it is likely that many Java customers are going to be making more informed investments. As a result there will be considerable rise in uptake of dynamic languages. Similar to the mainframe Java is heavily entrenched in enterprise IT and business-critical applications and is therefore not going away. That said for fueling modern Web applications the Java language will likely see a steep decline in market share.


The question to be asked is whether the non-Microsoft Web market will buy into the JVM implementations of dynamic languages or whether they will move to the LAMP stack which hosts the de-facto standards for the most popular languages.  While I believe there will be customers who are attracted to the JVM implementations especially the ones who are heavily influenced by their relationships with the Java companies, the majority of the market is going to prefer to go down the route of the LAMP stack. Reasons include:


-    The popular dynamic languages are all backed by very vibrant developer communities and are constantly evolving and adapting. The JVM ports of these languages will always lag behind the community driven de-facto standards implementation and therefore compatibility will be an issue. This is very similar to the problems the Mono community has in keeping up with .NET and this is even after help they get from Microsoft.


-    The JVM was not originally designed to host dynamic languages. For the foreseeable future the vendors will have significant challenges in keeping up with real-world use-cases. While they may show good performance in synthetic benchmarks such as for loops where JVMs are often superior in real world scenarios they will likely be impaired due to the dynamic nature of these languages which include closures, indirect method calls and a significant amount of type juggling.  See an example of how JRuby compares to Ruby’s current C implementation.  Also, we have to consider whether it’s truly in the hardware vendor’s interest to pursue the most optimized runtimes. With open-source community driven technologies the answer is clear.


-    The scalability requirements of the modern Web will require an increasing amount of processing density on the Web tier. C-based architectures are much more likely to be able to deliver the highest possible density by most efficiently interfacing with the operating system primitives and by delivering efficient, small foot print architectures. Such examples include high-performance Web Servers such as lighttpd, Zeus, IIS 7; high-performance caching systems such as memcached which is used by some of the largest Web sites including Facebook; and other performance critical subsystems such as memory management.


-    Multi-core systems work very well with the LAMP stack’s multi-process paradigm. With the chip industry now focusing primarily on multi-core as opposed to hyper-threading technology, the benefits of multi-threaded environments such as the JVM are not substantially realized on today’s hardware. Instead the multi-process paradigm delivers more stability and reliability.


-    Due to its simplicity, the LAMP stack delivers a very low barrier to entry for developers while still delivering the scalability of large scale production systems such as Yahoo! and Facebook.


In conclusion, it is becoming clear that dynamic languages are going to increasingly become the standard for Web development. Microsoft and the Java vendors have all recognized this trend and are now aggressively pursuing solutions on top of their software stacks. However, as the core dynamic language communities thrive outside of the .NET CLR and Java JVM software stacks the vendors will be in a challenging position if they solely depend on the uphill battle of cloning the successful dynamic languages onto their software stacks. Some vendors are aware of this challenge and have built hybrid strategies which also aim to deliver the de-facto standard dynamic languages to their customers even if they don’t have full synergy with their solution stack, this includes Microsoft’s investment in making PHP work with their solution stack and Sun’s initial attempts to deliver native Ruby and PHP implementations to their customer base. I believe that while the JVM approach to dynamic languages may appeal to some Java customers it will never be able to catch up with the broader open-source movement around native dynamic languages implementations. The JVM dynamic languages implementations will not be enough for the Java vendors to catch up and they will need to embrace the native de-facto standard community driven dynamic languages.