Today we announced a partnership with Microsoft. The goal of this technical collaboration is to make PHP on Windows a first-class citizen. It has always been our goal to make PHP run well on any platform, and this initiative reinforces that. Both companies have also observed huge interest, both from the community and from our respective customer bases, in good PHP support on Windows, so collaborating to make this happen is a no-brainer for both.
This is actually not the first time PHP and Windows have crossed paths. Five years ago, Microsoft invited the PHP development team to come to Redmond and spend a week in their labs optimizing PHP on Windows. Zeev, Shane Caraveo, and I represented the PHP team. We made quite a few significant improvements, but since then there hasn't been much focus on making PHP run well on Windows. Also, five years ago the team that invited us to Redmond didn't have much of a corporate mandate, so that effort was never really followed through on.
Fortunately, things have changed. There are now people at Redmond who understand that supporting interoperability between open-source software and the Windows platform will benefit their customer base. 4.5M PHP developers are hard to ignore, especially when so many of them actually do their development on Windows.
For Zend it has always been important to make PHP as widely applicable as possible. Just this year we announced support for the System i platform (AS/400). Given the wide reach of Windows and significant customer demand for Windows support, it makes a lot of sense to work with Microsoft on better integration between the two solutions.
One of the things I'm really excited about is that Microsoft was willing to change their internal plans in order to better support PHP. Although they had received FastCGI requests from their user base, implementing it never actually happened, and it wasn't in the plans for IIS 7 either. However, recognizing that good PHP support really required a first-class FastCGI implementation, they managed to be exceptionally agile for a large company and changed the IIS 7 plans to include FastCGI support. Today they are releasing a preview of this FastCGI implementation, which performs very well in our lab tests, and they continue to improve it. This preview release will support IIS 7 (Windows Vista and Windows Server "Longhorn") as well as Windows Server 2003 and Windows XP. Not only is this good news for the many dynamic languages out there that support FastCGI, but it's also a good sign that they have embraced an open de-facto standard rather than creating their own "FastCGI" standard.
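For the curious, here's roughly what wiring PHP up to the component looks like on IIS 6, where the extension is driven by an fcgiext.ini file. I should stress that the paths and pool limits below are illustrative assumptions on my part, not values taken from the preview release:

```ini
; fcgiext.ini -- illustrative sketch; paths and limits are assumed values
[Types]
; map the .php extension to the [PHP] process-pool section below
php=PHP

[PHP]
; path to the local PHP CGI binary (assumed install location)
ExePath=C:\PHP\php-cgi.exe
; cap on pooled PHP worker processes (illustrative value)
MaxInstances=32
; recycle each process after this many requests (illustrative value)
InstanceMaxRequests=500
```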
The second piece in making PHP work well on Windows is PHP itself. After spending a lot of time on benchmarks, it was clear that PHP performance on Windows was way behind other platforms. The trivial improvement is of course a non-thread-safe build, which gives about a 40% performance boost, since FastCGI's multi-process model makes the thread-safety overhead unnecessary. But there were several other issues which significantly impaired PHP performance on Windows. To date we have already made over a dozen improvements which make PHP run much faster on Windows, whether by using better APIs, better algorithms, or better utilizing the available tools. We have released a technology preview version of Zend Core which includes these improvements, and people are welcome to play around with it. In the coming weeks we will share these patches with the PHP community and work towards including these improvements in PHP. For more information on the IIS FastCGI component, visit the IIS Web site.
In addition, we will do continuous engineering to make sure that PHP keeps working well on Windows, that application compatibility issues are addressed, and that we continue to serve the community and businesses who are interested in good Windows support. Getting PHP to perform well and be stable on the Windows platform was the highest priority for this collaboration; it's the basis on which any further collaboration will build. Over the coming months we will continue to investigate other ways to strengthen PHP's interoperability with the Windows platform and try to resolve other issues that might arise. We've already received various requests for better Active Directory integration, SQL Server support, a URL rewriting solution, and more. Beyond the obvious need to make PHP work well on Windows, there are many other improvements which PHP developers could take advantage of.
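On the Active Directory front, it's worth noting that PHP can already talk to AD today through its LDAP extension; the requests we're getting are for integration richer than this. As a minimal sketch (the domain controller, DNs, and credentials here are hypothetical placeholders):

```php
<?php
// Minimal sketch: bind to Active Directory via PHP's LDAP extension.
// Host, DNs, and credentials below are hypothetical placeholders.
$conn = ldap_connect('ldap://dc1.example.local');
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3); // AD requires LDAPv3
ldap_set_option($conn, LDAP_OPT_REFERRALS, 0);        // don't chase AD referrals

if (ldap_bind($conn, 'jdoe@example.local', 'secret')) {
    // look up the user's display name in the directory
    $result  = ldap_search($conn, 'DC=example,DC=local',
                           '(sAMAccountName=jdoe)', array('displayname'));
    $entries = ldap_get_entries($conn, $result);
    echo $entries[0]['displayname'][0], "\n";
}
ldap_unbind($conn);
?>
```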
I very much welcome the Microsoft initiative to invest in getting PHP to run well on their platform. For Zend, working with Microsoft is in line with our collaborative strategy to further PHP usage and allow it to play well in heterogeneous environments. Whether it's Linux, AIX, Windows, or i5/OS; DB2, MySQL, or Oracle; Apache, IIS, or lighttpd; we are committed to helping put PHP everywhere and anywhere.
Any questions, comments, feedback - feel free to drop me an email at andi at zend dot you know what.
Tuesday, October 31, 2006
Wednesday, October 25, 2006
PHP and Database Connection Pooling
During my recent vacation I worked on this blog entry, which talks about connection pooling in PHP and some interesting results IBM published in a recent Zend Developer Zone article. I just saw Christopher Jones' blog entry on a new feature coming in Oracle 11g, so I've slightly adjusted it to include the news.
As is apparent from the likes of Yahoo!, Facebook and other large Web 2.0 companies who have billions of page views per month, PHP can indeed be deployed in a way which scales extremely well. Naturally with any technology that has to scale to huge volumes of traffic using large clusters of servers, there are always scalability issues that need resolving. Common issues that IT personnel deal with are monitoring and management of their clusters, application deployment, file system scalability, network topology, high availability and database scalability.
In my experience, databases have had a long history of being the typical bottleneck in PHP applications. There are many reasons for that including:
A) Dynamic Web applications tend to be heavily database driven and use the database as a searchable content store (CMS) and for transactional operations (e-commerce).
B) PHP developers prefer writing PHP code to SQL, which often leads to sub-optimal queries and to the majority of the processing logic moving into PHP.
C) Web sites often serve millions of users, thus requiring PHP to deal with very heavy traffic. This in turn puts huge loads on the databases.
D) There is a lack of good, cheap tools for finding bottlenecks in database queries (such tools have always been very expensive; it will be interesting to see how MySQL's new service, which also promises performance monitoring, will change this).
E) PHP works best in multi-process environments such as Apache. There are many benefits to such a model, especially stability, but it also typically requires each process to manage its own connection to the database.
What I'd like to talk about is the last point. Most production PHP deployments use persistent database connections in order to save the overhead of opening a new connection on each request. This overhead includes not only the TCP/IP connection and the handshake with the database, but also server-side resource allocation such as processes, threads, and the connection context, which can take considerable time.
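In PHP code this is simply a matter of picking the persistent variant of the connect call, or, with PDO, setting a driver attribute. A minimal sketch against MySQL, with placeholder host and credentials:

```php
<?php
// Persistent connection with the classic mysql extension: the connection
// is kept open by the Apache worker process and reused on subsequent
// requests instead of being re-established each time.
$link = mysql_pconnect('dbhost', 'user', 'secret');

// The PDO equivalent: pass ATTR_PERSISTENT at construction time.
// Host, database, and credentials are placeholder values.
$pdo = new PDO('mysql:host=dbhost;dbname=app', 'user', 'secret',
               array(PDO::ATTR_PERSISTENT => true));
?>
```

The catch, of course, is that each Apache worker process ends up holding its own persistent connection, which is exactly where the numbers below come from.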
Using persistent database connections with the multi-process model creates a huge problem for large PHP clusters. Let's assume that the average PHP server is configured to spawn 200 Apache worker processes. For companies with clusters of dozens or hundreds of PHP servers, this can put a huge strain on database servers, which have to deal with thousands of concurrent connections. For example, a 100-server cluster will typically require 20,000 concurrent connections. The performance hit on the database server can be devastating, due among other things to large memory consumption and crippling context-switch overhead from such large numbers of threads and/or processes.
Such problems are not foreign to IBM's DB2 team. They have faced such multi-process architectures in the past and implemented a feature called the connection concentrator, which efficiently handles large numbers of connections with short-lived transactions, the pattern very typical of PHP applications. Just recently IBM ran some PHP-specific benchmarks of this feature and wrote an article describing the results. These results really impressed me: they managed to easily scale to 10,000 concurrent DB connections (roughly the connection load of a 50-server PHP cluster) on a standard two-socket, dual-core Linux machine with 4 GB of memory. And not only was this on what is considered a lightweight machine for a database, there was memory to spare.
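The nice part is that nothing changes on the PHP side: the concentrator is transparent to the client, which just opens persistent connections through the ibm_db2 extension while the server multiplexes them onto a much smaller set of agents (as I understand it, the concentrator is enabled by configuring the database manager to allow more connections than coordinating agents). A minimal sketch with placeholder connection details:

```php
<?php
// Persistent DB2 connection via the ibm_db2 extension; the connection
// concentrator on the server multiplexes many such client connections
// onto a small pool of agents between transactions.
// Database name and credentials are placeholder values.
$conn = db2_pconnect('SAMPLE', 'db2user', 'secret');

if ($conn) {
    $stmt = db2_exec($conn, 'SELECT COUNT(*) FROM orders');
    $row  = db2_fetch_array($stmt);
    echo "orders: {$row[0]}\n";
}
?>
```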
Although the stability of the multi-process model has always been one of Apache's main selling points, it has often been criticized for not allowing application-server-like sharing of database connections between the various worker processes. This affects not only PHP but also other languages such as Python and Perl. Application server advocates have tended to point this out time and again, when in fact such connection sharing is a workaround for database servers not scaling in this kind of scenario. Why should we solve this problem by abandoning the rock-solid multi-process model rather than having the databases handle this common and increasingly prevalent runtime architecture? The traditional J2EE application server model is dying in favor of a more loosely coupled, robust, and stable set of technologies that all interoperate and scale well together. Whether it's LAMP, OPAL, WIMP, or other combinations of technologies, the world is moving away from monolithic systems to heterogeneous, loosely coupled systems. In such environments it becomes increasingly important for each part of the system to scale in various configurations and system architectures.
IBM has shown with this feature that they are successfully playing the loosely coupled game. Now Oracle is following suit with a feature in 11g that is also supposed to solve this problem and allow them to scale better in heavy-duty PHP environments. It will be interesting to see how their performance numbers stack up against IBM's when the feature is released.
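If it works the way Christopher Jones describes, using it from PHP should again be nearly transparent. The sketch below assumes the pooled connect-string syntax previewed for the oci8 driver, so treat the ':POOLED' suffix and all connection details as my assumptions until 11g actually ships:

```php
<?php
// Hypothetical sketch of Oracle 11g pooled connections from PHP.
// The ':POOLED' suffix would route the connection through the
// server-side pool; host, service name, and credentials are placeholders.
$conn = oci_pconnect('scott', 'tiger', '//dbhost/ORCL:POOLED');

if ($conn) {
    $stmt = oci_parse($conn, 'SELECT sysdate FROM dual');
    oci_execute($stmt);
    $row = oci_fetch_array($stmt, OCI_NUM);
    echo $row[0], "\n";
}
?>
```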