I've been working with the Basecamp API to plug in our IRC bot that we use for time tracking, and I was astounded to learn that escaping single and/or double quotes for XPath queries in PHP does not have a well-documented, best-practice solution.
In fact, it seems this isn't peculiar to PHP. I took a look around and found this excellent article by "Kushal" (s/he doesn't give a full name on the blog):
http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes
I've produced a PHP solution for the general escaping/concatenation problem:
https://gist.github.com/1155973
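The gist isn't reproduced here, but the general idea can be sketched as follows. This is a minimal sketch, not the gist's code, and the function name xpathLiteral() is hypothetical. XPath 1.0 string literals have no escape character, so a value containing only one kind of quote can be wrapped in the other kind, while a value containing both has to be stitched together with concat().

```php
<?php
// Sketch: turn any PHP string into a valid XPath 1.0 string literal.
// XPath 1.0 has no escape character for quotes, so values containing both
// ' and " are rebuilt with concat(). The function name is hypothetical.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";   // no apostrophes: wrap in single quotes
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';   // no double quotes: wrap in double quotes
    }
    // Both kinds present: split on the apostrophes and glue the pieces back
    // together with a double-quoted "'" between each pair.
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}
```

You'd then build a query with something like `$xpath->query('//person[@name=' . xpathLiteral($name) . ']')`, where `$xpath` is a DOMXPath instance and the `//person[@name=...]` path is only an example.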
Anyone got a better/more elegant solution? I'm always a big fan of code golf :)
Filed under: php | View Comments
Parsing the output of PHP's print_r function (or how not to create a log file format in PHP)
11 Jan 2011
I recently deployed a job on which the timeline was so tight that my ability to type quickly was what made the difference between delivering on time or not.
Everything was rushed, the budget was tight, it was one of those real seat of the pants deals and there was far too little testing done.
Just before I cut the site live (ie. minutes before I had to put this into production and have hundreds of people using it) I thought "You know, I bet something will go wrong somewhere here and I'll need complete logs of all the SERVER, POST and GET requests to piece it back together".
Boy am I glad I had that thought! Because it did. Nothing catastrophic, mind you, but there was a tricky integration with an external system that, as it turned out, periodically died without any errors or exceptions and just started returning bogus values.
The only problem is that, in the 5 minutes before the site was supposed to go live, I didn't really have much time to thoughtfully prepare a logging system to record all this stuff and, in my haste, I settled for:
When I came to actually process this information into something usable I realised what a dumb format for a log file this was and began kicking myself.
I had a look around for a snippet to parse the output of PHP's print_r but couldn't find anything, so I thought I'd post my solution here just in case anyone else ever makes the same mistake - might save them a couple of hours (I've added comments in the code below):
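The original snippet isn't reproduced here. As a rough sketch of one way to do it (the function names are hypothetical), assuming the log holds plain, possibly nested arrays dumped with print_r() and single-line scalar values: walk the text line by line, recurse whenever a value of Array is followed by an opening bracket line, and skip anything between dumps such as timestamps or separators.

```php
<?php
// Sketch: rebuild PHP arrays from print_r() text. Assumes the dumps are plain
// (possibly nested) arrays with single-line scalar values; objects and values
// containing newlines are not handled, and leading/trailing spaces in values
// are lost. Function names are hypothetical.

// Parse one array body; $i points at the "(" line and is advanced past ")".
function parsePrintRArray(array $lines, int &$i): array
{
    if (trim($lines[$i] ?? '') !== '(') {
        throw new InvalidArgumentException("expected '(' on line " . ($i + 1));
    }
    $i++;
    $result = [];
    while ($i < count($lines)) {
        $line = trim($lines[$i]);
        $i++;
        if ($line === ')') {
            return $result;              // end of this (sub)array
        }
        if ($line === '') {
            continue;                    // print_r adds a blank line after nested arrays
        }
        if (!preg_match('/^\[(.*?)\] =>(?: (.*))?$/', $line, $m)) {
            throw new InvalidArgumentException("unexpected line $i: $line");
        }
        $key = $m[1];
        $value = $m[2] ?? '';            // "[key] =>" with nothing after it is ''
        // "Array" followed by "(" starts a nested array; otherwise it's a string.
        if ($value === 'Array' && trim($lines[$i] ?? '') === '(') {
            $result[$key] = parsePrintRArray($lines, $i);
        } else {
            $result[$key] = $value;
        }
    }
    throw new InvalidArgumentException('unterminated array');
}

// Return every top-level print_r() array found in a log, in order.
function parsePrintRAll(string $text): array
{
    $lines = preg_split('/\R/', $text);
    $dumps = [];
    $i = 0;
    while ($i < count($lines)) {
        if (trim($lines[$i]) === 'Array' && trim($lines[$i + 1] ?? '') === '(') {
            $i++;                        // step onto the "(" line
            $dumps[] = parsePrintRArray($lines, $i);
        } else {
            $i++;                        // skip timestamps, separators, etc.
        }
    }
    return $dumps;
}
```

Every leaf comes back as a string, because print_r() prints 1 and "1" (and true) identically, which is one more reason serialize() or json_encode() makes a better log format than print_r().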
Filed under: php | View Comments
This article assumes you have already experimented with using the vim debugger as described in this fine article. Big respect to Seung Woo Shin for writing the debugger script itself. If you've already experimented with it and think you want to use it more, then read on...
Using a debugger wins hands down over the good old echo/die approach, but if you want to use the aforementioned debugger (which you do because you want to use vim, right? right!) then you may need to get used to a few little quirks and maybe even tweak it a little bit so it feels a bit... righter.
So with that in mind, here are a few little things that I've done and realised that have made it start to feel that little bit righter to me...
How to debug a script which is invoked via the PHP CLI
To debug a page being served up via apache you point your browser to:
http://example.com/index.php?XDEBUG_SESSION_START=1
in order to set a cookie in your browser; the presence of this cookie then indicates that you're keen to debug some stuff.
If the code you want to debug is being run via the PHP CLI then the above won't work, so instead you need to set an environment variable before you invoke the PHP CLI to run your script. This works for tcsh (in bash the equivalent is export XDEBUG_CONFIG="vim=1"):
setenv XDEBUG_CONFIG "vim=1"
Then run your script via the CLI (eg, php -f my_script.php) and as long as you have Vim open and have pressed F5 (like you normally do when you want to debug in vim) then you should be able to debug your script.
What's going on with regards to the files that get opened when a debugging session starts up?
Well, whether you like it or not, the debugger always opens the "first" file in the call stack, so to speak. So if you are debugging a handler that gets invoked from, say, http://workingsoftware.com.au/myapp/index.php?h=FindProduct then it always opens index.php. This is quite annoying cos most of the time you don't really want to debug index.php... in the above example you probably want to debug find_product.class.php, which is the file where the application code actually resides (whereas index.php is part of a framework, and you're usually interested in debugging the application, not the framework). (ed: this refers to usage of the RocketSled PHP framework, but it will be similar for any framework where one script dispatches control to a Controller or Handler class or script)
There's no real way around this that I can see. So this is what I do if say I want to debug a file called find_product.class.php which is a Handler class (or Controller in MVC terminology) invoked by hitting the URL http://workingsoftware.com.au/myapp/index.php?h=FindProduct:
- cd myapp/packages/myapp/handlers/
- open find_product.class.php at the command line by doing: vim find_product.class.php
- hit F5
- quickly (within the 5 second limit....see below - I have made this limit longer) go to the URL in your browser (ie, http://workingsoftware.com.au/myapp/index.php?h=FindProduct)
- the debugger will open up index.php and in the process close your find_product.class.php, leaving you with index.php and the other debugger-specific windows. Bit annoying. Since you don't want to debug index.php, close this file by doing :q
- now you're left with just the debugger specific windows (actually viewports) in vim. Now open up the file you want to actually debug, find_product.class.php, by doing :sp find_product.class.php
- Now go ahead and do your debugging. Once the debugger session finishes you'll be left with the find_product.class.php file open in vim.
The good news is that if you set any breakpoints in your find_product.class.php file while you were debugging, as long as you don't close vim itself (doesn't matter if you close the find_product.class.php file and reopen it, as long as vim stays running) then you can hit F5 again to start another debugging session and repeat the above steps and once you've gotten back past the final step you will see that the breakpoints you set in the last debugging session have been remembered in the new one.
What are all these bloody windows?
I have found that using the debugger was made quite hokey by the fact that the screen space got crowded out with a lot of superfluous stuff. I can't see, at this stage, why you really need the TRACE_WINDOW or HELP_WINDOW to be there in every debugging session. The WATCH_WINDOW and the STACK_WINDOW are the only really useful ones.
So I have edited the debugger.py file to do away with these windows. You can download my edited version here
http://www.workingsoftware.com.au/downloads/debugger.py
NB: All the original code has been more or less left in there, I've just commented the various bits out. You can run a diff on the original to see exactly what the changes are.
Now when you start up a debugging session you'll see the file(s) you're debugging, the WATCH_WINDOW and the STACK_WINDOW all stacked on top of each other like you'd get if you did :sp normally.
The fine peeps who developed this debugger have mapped the F1 key so that it resizes the debugger windows. This is cool but I found you had to press F1 three times to get the window to resize the way I wanted it. So I edited debugger.py again to make it just toggle between the equivalent of Ctrl+W= and Ctrl+W_ . So when you're debugging if you are in the find_product.class.php file's window and you press F1 it will blow that window up to take up the majority of the screen. Press it again and it will go back to an even split between find_product.class.php, WATCH_WINDOW and STACK_WINDOW. Go into the WATCH_WINDOW window and press F1 and it will make the WATCH_WINDOW blow up, hit again and back to even split. Same goes for STACK_WINDOW obviously.
Controlling the debugger - keyboard commands
I realise this is probably stating the obvious for a lot of people and that it is also covered in the debugger script documentation but I'll include it anyway for completeness... Also if you're using the aforementioned edited version of debugger.py you will no longer have the HELP_WINDOW in your debugging session so you'll have to remember the commands with your brain... this might help:
- F1 - toggle resize
- F2 - step into
- F3 - step over
- F4 - step out
- F5 - run (to the next breakpoint or the end of the script)
- F6 - end the debugging session
- F11 - dump context to the WATCH_WINDOW. This will dump the values of all the variables in scope into the WATCH_WINDOW
- F12 - if you have the cursor positioned over a variable in the file that you are debugging, then hitting F12 will dump the value of that variable out into the watch window. NB: I have listed 2 caveats associated with F12 below
- ,e - if you do ,e and then type in the name of some variable, you will see that variable's value. If it's an array you'll see the array contents, if it's an object you'll see the object's contents, and if it's empty or out of scope you'll see (null).
- :B - sets a breakpoint
- :Bp - unsets a breakpoint
F12 caveats
- If you hit F12 and the cursor is positioned on an empty line then it will cause vim to go into INSERT mode. Quite annoying but good to know.
- If you hit F12 and the cursor is in the WATCH_WINDOW or the STACK_WINDOW then it will print "no commands none" and end your debugging session. Again annoying but good to know.
A couple of handy commands for managing breakpoints
So you can use :B and :Bp to toggle a breakpoint on/off. But some other useful stuff that's not immediately obvious is:
- :sign place - lists all breakpoints, showing which line each one is on
- :sign unplace * - clears all breakpoints
- ??? - go to the next breakpoint (haven't figured this one out yet... anyone?)
Five seconds doesn't seem like long enough..
The debugger.py script is hard coded to wait 5 seconds for a connection after you've pressed F5. This didn't seem like long enough to me so I changed it to wait 15 seconds.
This change is in the edited debugger.py which you can download here:
http://www.workingsoftware.com.au/downloads/debugger.py
Just search for 'serv.settimeout(15);' and change 15 to however many seconds you think is reasonable if you want to change it.
So they're a few little things I found handy. If you find more let us know ;-)
Comment Archive
This post originally appeared on an older version of the Working Software website which implemented its own comment mechanism. This new version of our website/blog uses Disqus for comments (see below); however, I wanted to preserve the comments made on the previous post here:
hi just to add another usefull thing :Bp would set up a breakpoint with an expression according to context you might wanna check, http://slackdna.blogspot.com/search/label/vim-dbgpclient
Zeft (2009-03-11)
Filed under: php | View Comments
Cross Browser Compatible Image Submit Buttons
28 Nov 2007
Usually, you can keep the design of your interface pretty well separated from the logic behind it. One example where this falls down, however, is in the case where the interface uses image inputs.
The image input type allows the use of an image for a button. They're not really all that popular due to accessibility issues, but this is not an ideal world and sometimes you have to work to a design brief that includes image buttons.
So the catch with these is that browsers send the name attribute of the button to the server differently. In Firefox, for example, you get three $_POST variables in PHP:
- button_name
- button_name_x
- button_name_y
The x and y values are the coordinates of the point where the user clicked within the image, and are (were) used in server-side image maps. However, Internet Explorer will send only the latter two values:
- button_name_x
- button_name_y
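So the safest way to check if a particular button was clicked in your PHP script (or other server-side scripting language) is to look for the parameter with _x or _y appended. (The browser actually sends button_name.x and button_name.y; PHP converts the dots in incoming variable names to underscores, whereas in .NET you'll see the .x or .y form.)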
The use of both _x and _y here is redundant but I thought I'd just include it for the sake of completeness.
The problem I have with this is that it ties the way the form is being processed to the way the form appears. So my workaround for this is to always use a function to check for the presence of button parameters:
That way you can check which button the user clicked and it won't matter if a later design change adds in image buttons.
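A minimal sketch of such a function (the name buttonPressed() is hypothetical, not the original helper). It accepts the plain name or either coordinate, so the same check works for ordinary submit buttons and image buttons in any browser.

```php
<?php
// Sketch: report whether a submit button was pressed, whatever kind it is.
// Plain submit buttons send "name"; image buttons send "name_x"/"name_y"
// (PHP's rewrite of name.x/name.y), and some browsers also send "name".
// The function name is hypothetical.
function buttonPressed(array $request, string $name): bool
{
    return isset($request[$name])
        || isset($request[$name . '_x'])
        || isset($request[$name . '_y']);
}
```

In a form handler you'd call it as `if (buttonPressed($_POST, 'save')) { ... }`, and swapping the Save button for an image later doesn't touch that line.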
Filed under: php | View Comments
I recently had to take a string with some delimited values as placeholders and replace them with variables from an array. This is useful for things like sending out email newsletters or SMS messages with people's names or other unique information in each message.
An example of a string with placeholders in it, using the string '%%' as a delimiter, is:
Hello %%NAME%%, your %%CAR%% is ready to pickup from %%STORE%%
Now take an array with some key/value pairs:
The following code will perform the replacement based on the filter:
The interesting parts of the above code are, in the pattern:
the "?" makes this a "non-greedy" match, meaning that it won't match everything in between the first and last occurrence of $delim. Using "?" in this way allows you to make individual matches non-greedy. You can also use the /U modifier at the end of the expression to make all matches non-greedy:
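To see the difference the "?" makes, here's a small sketch running preg_match_all() over a string with two placeholders (the sample string is made up). The greedy pattern swallows everything from the first %% to the last one, while the non-greedy pattern and the /U modifier both pick out each placeholder on its own.

```php
<?php
// Sketch: greedy vs non-greedy matching of %%...%% placeholders.
$text = 'Hello %%NAME%%, your %%CAR%% is ready';

// Greedy: one match running from the first %% to the last,
// so $greedy[1] is ['NAME%%, your %%CAR'].
preg_match_all('/%%(.+)%%/', $text, $greedy);

// "?" makes this one quantifier non-greedy: $lazy[1] is ['NAME', 'CAR'].
preg_match_all('/%%(.+?)%%/', $text, $lazy);

// /U makes every quantifier non-greedy: $ungreedy[1] is ['NAME', 'CAR'].
preg_match_all('/%%(.+)%%/U', $text, $ungreedy);
```

Note that /U also flips the meaning of "?", so a pattern using /U and `.+?` becomes greedy again.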
The next interesting bit is the fact that the second argument to preg_replace_callback can be an array with an object and a method. This allows you to use preg_replace_callback in your OO based PHP application without breaking out into procedural code (cos where would you put it!?).
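A sketch of that [object, method] callback style follows. It's illustrative rather than the original code, and the class and method names are hypothetical. Unknown placeholders are left as they are rather than blanked, which is a design choice you may want to change.

```php
<?php
// Sketch: fill %%KEY%% placeholders from an array using
// preg_replace_callback() with an [object, method] callback.
// Class and method names are hypothetical.
class PlaceholderFiller
{
    private $values;
    private $delim;

    public function __construct(array $values, string $delim = '%%')
    {
        $this->values = $values;
        $this->delim = $delim;
    }

    public function fill(string $template): string
    {
        $d = preg_quote($this->delim, '/');
        // "(.+?)" is non-greedy, so each placeholder is matched separately.
        return preg_replace_callback('/' . $d . '(.+?)' . $d . '/', [$this, 'replaceOne'], $template);
    }

    // Called by preg_replace_callback() once per placeholder found.
    public function replaceOne(array $matches): string
    {
        // $matches[0] is the whole placeholder, $matches[1] the key inside it.
        return (string) ($this->values[$matches[1]] ?? $matches[0]);
    }
}
```

For example, `(new PlaceholderFiller(['NAME' => 'Iain']))->fill('Hello %%NAME%%')` returns `Hello Iain`.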
Filed under: php | View Comments
I'm working on a job for a client where legacy database data are being used to generate an XML document for processing with an XSLT stylesheet.
The data in the database contain HTML entities, so when I loaded the XML into my DOMDocument, I got the following warnings:
Warning: DOMDocument::loadXML() [function.DOMDocument-loadXML]: Entity 'middot' not defined in Entity, line: 963 in /usr/local/www/data-dist/sheds/includes/SDEHSFunctions.php on line 414
Instead of passing '&middot;' in the XML string to DOMDocument::loadXML(), I needed to either declare all the entities in the XML doctype (bothersome) or convert these named entities into numeric ones (eg. '&middot;' becomes '&#183;').
I took a look around and found this handy function:
http://php.net/get_html_translation_table
I did a print_r on the translation table returned and found that it returns an array where the key is the actual character represented and the element is the textual HTML entity. So here's a quick function to get the character coded equivalent:
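Here's a sketch of such a function rather than the original (the names are hypothetical). It assumes UTF-8 data and PHP 5.4 or later, builds a map from each named entity in the HTML 4.01 table to its numeric form, and leaves alone the entities XML already defines (&amp;, &lt;, &gt;, &quot;). The small UTF-8 decoder is only there so the sketch doesn't depend on mbstring; with mbstring available, mb_ord() (PHP 7.2+) does the same job.

```php
<?php
// Sketch: rewrite named HTML entities (e.g. &middot;) as numeric character
// references (&#183;) so DOMDocument::loadXML() accepts them without a DTD.
// Assumes UTF-8 and the HTML 4.01 entity table; names are hypothetical.

// Decode a single UTF-8 character to its Unicode code point.
function utf8CodePoint(string $char): int
{
    $b = array_values(unpack('C*', $char));
    switch (count($b)) {
        case 1: return $b[0];
        case 2: return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3: return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
            | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

function namedEntitiesToNumeric(string $text): string
{
    static $map = null;
    if ($map === null) {
        $map = [];
        // Keys are characters, values named entities, e.g. "·" => "&middot;".
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        foreach ($table as $char => $entity) {
            // Skip XML's own entities and anything that is already numeric.
            if (in_array($entity, ['&amp;', '&lt;', '&gt;', '&quot;'], true)
                || strpos($entity, '&#') === 0) {
                continue;
            }
            $map[$entity] = '&#' . utf8CodePoint((string) $char) . ';';
        }
    }
    return strtr($text, $map);
}
```

Running the XML string through namedEntitiesToNumeric() before calling loadXML() avoids the "Entity 'middot' not defined" warning.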
Filed under: php | View Comments
Here is my little recipe for getting PHP4 and PHP5 to run concurrently on FreeBSD using Apache 2.2 and mod_proxy.
First, check this out:
http://wiki.coggeshall.org/37.html (ed: this link is currently broken - I've emailed John to see if he has an updated link)
That's the basic recipe, however there are a couple of problems with it. The first is what appears to be a syntax error (perhaps an incompatibility between the 1.3 and 2.2 versions of Apache) that I fix later; the second is that this ProxyPass only does half the job! I found that, using this method, if I went to the PHP4 host it displayed the correct index page, however if I went to something like http://php4.myhost.com/subdir it changed the URL in the browser to http://127.0.0.1:8081/subdir! This wouldn't do at all. Thanks to megapaz in the apache chatroom on freenode for pointing me to this next article (and thank you to niq also for writing it!):
http://www.apachetutor.org/admin/reverseproxies
So I also need a reverse proxy. That article is really long and deals with the issue in great detail, more detail than I required. I do not need to deal with multimedia content, javascript links or problems with links in general because I am only using PHP4 to run one very specific application.
There is one specific upshot of this: the discussion below deals with compiling the proxy modules from the FreeBSD ports system, however when I tested it out, this did not include all the modules mentioned in the above article, so in order to get all those modules compiled in, you will need to look into which 'KNOBS' you can turn on (see below) or manually change the Makefile.
Compiling with mod_proxy from FreeBSD ports
When you install from a port in FreeBSD you don't actually type ./configure; instead you have these things called 'knobs' that allow you to set/remove common options. In order to configure with mod_proxy when installing Apache 2.2 from the ports, I did this:
I then included the following lines at the top of httpd.conf (where I have added line breaks for readability, they are marked with "\"):
Installing another Apache instance
For this I just downloaded the latest version of Apache 2.2 from the apache.org website and cut and pasted the instructions in John Coggeshall's tute above to install it without the ports.
PHP Installation
Now that I had Apache 2.2 installed from the ports, I went through my usual PHP5 installation for which I don't actually use the ports. So I just get the latest from php.net and have a script called runconf.sh which has my configure line in it that I've been using forever:
Once that was all up and running, I needed to install PHP4. So I downloaded the latest version of PHP4 from php.net, and used the following line in configure:
So when I installed, my PHP4 installation was located in my second Apache installation folder. I then copied php.ini-recommended into:
And downloaded the latest APC (Alternative PHP Cache). I wanted to install it for the PHP4 installation, so I followed the instructions in the INSTALL file, but used the phpize and php-config included in the php4 folder I had installed under my apache_php4 installation:
If you haven't used phpize on this machine before it will complain it can't find autoconf etc. FreeBSD comes with autotools 253 and 259. I've found that 259 works for phpize, so I do:
Configuring and running Apache
I won't bother going through Apache configuration here, but there is a mistake on the Coggeshall wiki (well, for v2.2 it doesn't work, perhaps it worked for 1.3 or something). I put a file called php4.conf in /usr/local/etc/apache22/Includes/ with the following in it:
And use the 'Listen' directive explained in the Coggeshall wiki in the instance of Apache that is running PHP4.
I then run my Apache PHP5 instance as normal, and use the good old apachectl for the PHP4 instance:
Bug 37770
After running for a few hours, I noticed that the Proxy would mysteriously stop working, and would return a 502 error page with the error:
proxy: error reading status line from remote server 127.0.0.1
This was a real doozy, as it was a very intermittent bug and didn't seem to be related to load or anything in the code that was being executed. I tried running httperf on it to load test and could not reproduce the error reliably. After some reading, I found this blog post mentioning a similar error using mod_proxy to run Mongrel for Ruby on Rails hosting:
http://blog.pomozov.info/posts/apache-mongrel-and-proxy-error.html (ed: This link is broken. I've attempted to contact Anatol Pomazov to see if there is an archive of the content somewhere)
I also found this mentioned here:
http://httpd.apache.org/docs/2.0/mod/mod_proxy.html#envsettings
So from this I concluded that my VirtualHost include should now look like this:
I then restarted both the PHP5 and PHP4 Apache instances:
and I am yet to see the error recur having run the system for nearly 48 hours since doing it. If the error does recur I'll update my post but it appears to be fixed.
That's it! Now I can point my browser to http://php4.myhost.com/ and it serves up pages using PHP4, but maintains the correct domain in the browser's URL bar. Because I've used a VirtualHost directive to pass to the PHP4 instance, I can easily make any domain I'd like get served up with PHP4.
Filed under: php | View Comments
UPDATE Wednesday December 29 14:27 GMT+11: I've added another post that follows this up with a fix for the blocking behaviour of PHP sockets: Interprocess Communication in PHP with Non-Blocking Sockets which is intended to complement but not replace this original post.
I recently wrote a little application that dumps a file across a forwarded port. It was tricky, but very convenient because you can do things like report status of the upload process (I could also have done this by parsing the output of something like scp, but I kind of liked having direct access to the copy process/stats).
When I first wrote it, I didn't know what I was doing and had never written socket code before. It was a big procedural mess. Naturally I was keen to separate out my socket class into its own package, but this presented a problem: the controlling process needed to check the status, but how could I decouple the process that instantiated the socket class from the socket code itself? I didn't want to hard code the status reporting into the socket code and I wasn't too hot on the idea of passing in some kind of status reporting 'callback object' to receive messages during the process.
I figured what I really needed was a new thread. I'm not sure if that's exactly what a fork is, but it's close enough and serves my purposes. You can fork in PHP using the pcntl_fork method:
When I first started out, I thought that this would be really easy. Think again fool! It did turn out to be quite easy, but just not obvious. For starters, I read this article on PHP forking (ed: the original article was located at http://www.van-steenbeek.net/?q=php_pcntl_fork which has been taken offline, although the author was good enough to organise an archive) and the man page at php.net.
As you can see from the PHP manual, in order to use the features in this article you will need to compile PHP with the following options:
Compile arguments for PHP to enable pcntl functions
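With pcntl enabled, the basic fork-and-wait pattern looks roughly like this. It's a minimal sketch, not code from the articles above; forkAndWait() and its exit code are made up for illustration. pcntl_fork() returns 0 in the child, the child's process id in the parent, and -1 on failure.

```php
<?php
// Minimal sketch (PHP CLI with pcntl enabled): fork, let the child do its
// work, and have the parent wait for it and read its exit status.
// The function name and exit code are hypothetical.
function forkAndWait(int $childExitCode): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        // Child process: pcntl_fork() returned 0 here.
        // ... the slow work would go here ...
        exit($childExitCode);
    }
    // Parent process: $pid is the child's process id.
    pcntl_waitpid($pid, $status);   // block until the child exits
    return pcntl_wexitstatus($status);
}
```

Waiting with pcntl_waitpid() also stops the finished child from hanging around as a zombie process.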
OO and Me
I'm a nut on object orientation. To me it makes loads of sense so I always write stuff with objects. The rest of this article uses objects but the issues I'm discussing apply equally well in a procedural arena.
The First Barrier - Copy On Write
When you fork a process, the script splits in two (like Lorraine in that episode of Astro Boy when she gets into the robot fighting league). As "Jeff" on the pcntl_fork man page describes (in the user contributed notes), this operation is not expensive because it uses a "copy-on-write" model. This means that, until each process writes to a variable, that variable is shared between the two processes. Great! Inexpensive.
But this causes some problems. Consider the following piece of code (which was my first attempt):
The ForkMaster class PHP source code
If you run that code with PHP on the command line, it will swamp your screen with 0's. Oh if only it were that easy!!
See, the problem is that as soon as the child process writes to the $this->up_to variable, it gets its own copy. So when the parent process checks if $this->up_to < 1000, it's always checking its own copy, which is always 0 and never changes.
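The copy-on-write behaviour can be shown in isolation with a sketch like this (not the ForkMaster code; the class and function names are hypothetical). The child sets the counter to 1000, yet even after waiting for the child to finish, the parent still sees 0.

```php
<?php
// Sketch (requires ext/pcntl): after pcntl_fork() each process works on its
// own copy of every variable, so the parent never sees the child's write.
// Class and function names are hypothetical.
class Progress
{
    public $upTo = 0;
}

function parentSeesChildWrite(): int
{
    $progress = new Progress();
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        $progress->upTo = 1000;     // changes the child's copy only
        exit(0);
    }
    pcntl_waitpid($pid, $status);   // the child has certainly written by now
    return $progress->upTo;         // still 0 here in the parent
}
```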
Interprocess Communication using Sockets
So it turns out there is a really convenient way to communicate between processes using the PHP socket_create_pair() function!
You can read about it here:
http://php.net/socket_create_pair
There's even an example on there doing exactly what I was after! So after reading that I was sold. The only thing left to do was to put it into a nice, reusable object pattern. The code I came up with is below, and I'll explain each part of it for you in detail:
Threader class full source code
What??
Okay let's take a look at the various parts of that code.
Start of the Threader class for explanation
So this is pretty standard. I'm defining a class called Threader. Interesting to note that, for reasons you'll see later on, I've chosen to implement this class using the Singleton pattern, so I've got:
The static instance member declaration
The other thing to notice is that the constructor is private. That's because in a Singleton pattern you never instantiate a class directly. You do it with a factory method like this:
This means that when you call:
you will always get the same instance of the Threader object that is instantiated the first time you call it. Of course this does not hold true in the case of forked processes (I had hoped that it would but I wasn't so lucky). Now we get down to the nitty gritty.
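The Singleton shape described above can be sketched like this. Only current() and pid() come from the article; the rest is illustrative rather than the original class.

```php
<?php
// Sketch of the Singleton shape: a private constructor plus a static
// current() factory that always hands back the same instance.
// Only current() and pid() come from the article; the rest is illustrative.
class Threader
{
    private static $instance = null;
    private $pid = null;   // the real class stores pcntl_fork()'s return value here

    private function __construct()
    {
    }

    public static function current(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function pid()
    {
        return $this->pid;
    }
}
```

Calling `new Threader()` from outside the class is a fatal error because the constructor is private, so Threader::current() is the only way in. After a fork, each process holds its own copy of that one instance.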
The countToOneMillion() method is the one that we want to fork because it's going to take so long and we want to have status updates. So the first thing to do is create the sockets we'll use for interprocess communication:
Threader socket_create_pair() call
This is straight off the socket_create_pair man page. The only tricky bit here is that we are taking the second of the two sockets we created and storing it in a member variable. Now comes the fork we've all been waiting for:
I won't go too much into explaining the fork code because the PHP manual pages as well as the article I posted above do a good job of that. I'll just draw your attention to two things: firstly, when I fork, I assign the $pid returned to a member variable of the Threader instance. Luckily for me, this adds it to both the Threader instance in the child process and the parent instance.
This means that where the code checks if(!$this->pid) that code is only executed in the child process. This is where this example differs significantly from other examples in the manual and tutorial above. Because we set the $pid as a member variable, we are now free to go about our merry way. As the child process counts up to one million, it reports its status by writing to the first of the pair of sockets we created.
Curious Interlude
Although you can't write to variables owned by other processes, all the resources are shared. This means that file descriptors, sockets and database connections are shared between the two processes. This is why we can share sockets between the two processes. There is an interesting article explaining this here:
http://www.hudzilla.org/phpbook/read.php/16_1_4
Back Into It ...
So now we have a child process which is going to happily sit there and count to one million, and keep us up to date by writing status messages (in a pre-determined format) to a socket. So we just have to listen to the other end of the socket! Here is the code that will do that for us:
The first thing it does is check that $this->pid evaluates to true with if($this->pid). As you'll recall, because we assigned the $pid returned from pcntl_fork to a member variable of the Threader instance, we can now use that to check if we are in the parent or child process at any time. As it states in the PHP manual, a $pid > 0 means we are in the parent process. Convenient!!
We then call the socket_read function to get the data that was written using the socket_write function from our other process. One not so convenient part of this is that you have to provide a length for the messages, which I've included as a member constant of the Threader class. What this means is that, when the child process writes to the socket, we will only get one message at a time, and we won't miss any messages.
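Here's a sketch of that fixed-length messaging over a socket_create_pair() pair (not the Threader code). MESSAGE_LENGTH plays the role of the article's member constant, but its value of 32 and the helper names are hypothetical. One detail worth handling: a stream socket can hand back fewer bytes than you asked for, so the reader loops until a whole message has arrived.

```php
<?php
// Sketch (requires ext/sockets): fixed-length messages over a socket pair.
// MESSAGE_LENGTH stands in for the article's member constant; its value and
// the helper names are hypothetical.
const MESSAGE_LENGTH = 32;

function createSocketPair(): array
{
    $pair = [];
    if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
        throw new RuntimeException(socket_strerror(socket_last_error()));
    }
    return $pair; // two connected ends: write to one, read from the other
}

// Pad every message to exactly MESSAGE_LENGTH bytes (trailing spaces are
// stripped again on read, so messages shouldn't rely on them).
function sendMessage($socket, string $message): void
{
    if (strlen($message) > MESSAGE_LENGTH) {
        throw new InvalidArgumentException('message longer than MESSAGE_LENGTH');
    }
    $frame = str_pad($message, MESSAGE_LENGTH);
    for ($sent = 0; $sent < MESSAGE_LENGTH; $sent += $n) {
        $n = socket_write($socket, substr($frame, $sent));
        if ($n === false) {
            throw new RuntimeException(socket_strerror(socket_last_error($socket)));
        }
    }
}

// Keep reading until one whole message has arrived; null once the writer
// has closed its end (socket_read() then returns '').
function readMessage($socket): ?string
{
    $buffer = '';
    while (strlen($buffer) < MESSAGE_LENGTH) {
        $chunk = socket_read($socket, MESSAGE_LENGTH - strlen($buffer));
        if ($chunk === false || $chunk === '') {
            return null;
        }
        $buffer .= $chunk;
    }
    return rtrim($buffer, ' ');
}
```

Messages the reader hasn't got to yet simply queue up in the socket, which is why none go missing between reads.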
I then just use a few simple if/else statements to check what the message was. If it was an "UP TO" message, which means that it was updating the status of the progress in counting up to one million, then I set the $up_to member variable on the object. Now ...
The Code that Calls It
This is what I've been building up to the whole time, and chiefly why I just wasted a day doing this when I could have just as easily passed a callback object. Because I've used a Singleton pattern and because the pid is stored in that Singleton, the code inside the Threader can be completely decoupled from the code that calls it. So:
First thing to note here is that I'm accessing the Threader class using its current() method rather than instantiating one using the 'new' keyword. As I mentioned above, this is because the Threader class implements a Singleton pattern. So when we check Threader::current()->pid(), we ensure that the code will only execute in the parent process. The child process is busy ticking away the whole time since we called countToOneMillion.
The real magic happens when we call:
If you'll remember, that's where we read the status messages from the socket that are being written by the child process. This method also updates the $up_to variable in the Threader instance. We can then echo out what number we are up to.
Now something screwy happened here. I didn't want to echo out a million numbers, so I put a sleep(30) call in so that the status would only update once every 30 seconds. However, it just hung. Never went anywhere. If anyone knows why I'd love to hear it. So instead I just put in some code that only echoes the number it's up to if it's more than 1000 greater than the previous update.
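Putting the pieces together, here's an end-to-end sketch of the same idea. It differs from the article's code: it swaps the Singleton for a generator, so the caller simply iterates over progress values. The message format ("UP TO:n", "DONE"), the frame length and all the names are hypothetical. Because it's a generator, nothing forks until the caller starts iterating.

```php
<?php
// End-to-end sketch (PHP CLI with ext/pcntl and ext/sockets): the child does
// the slow counting and reports over a socket pair; the caller just iterates.
// Swaps the article's Singleton for a generator; names are hypothetical.
const FRAME_LENGTH = 32;

function readFrame($socket): ?string
{
    $buffer = '';
    while (strlen($buffer) < FRAME_LENGTH) {
        $chunk = socket_read($socket, FRAME_LENGTH - strlen($buffer));
        if ($chunk === false || $chunk === '') {
            return null; // child closed its end (or an error occurred)
        }
        $buffer .= $chunk;
    }
    return rtrim($buffer);
}

function countInBackground(int $target, int $every): Generator
{
    $pair = [];
    if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
        throw new RuntimeException('socket_create_pair() failed');
    }
    [$childEnd, $parentEnd] = $pair;

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        // Child: count, report every $every steps, then say it's done.
        // socket_write() blocks while the socket buffer is full, so a slow
        // reader throttles the child.
        socket_close($parentEnd);
        for ($i = 1; $i <= $target; $i++) {
            if ($i % $every === 0) {
                socket_write($childEnd, str_pad("UP TO:$i", FRAME_LENGTH));
            }
        }
        socket_write($childEnd, str_pad('DONE', FRAME_LENGTH));
        socket_close($childEnd);
        exit(0);
    }

    // Parent: yield each progress value as it arrives.
    socket_close($childEnd);
    while (($message = readFrame($parentEnd)) !== null && $message !== 'DONE') {
        yield (int) substr($message, strlen('UP TO:'));
    }
    socket_close($parentEnd);
    pcntl_waitpid($pid, $status); // reap the child so it doesn't linger as a zombie
}
```

Calling it looks like `foreach (countInBackground(1000000, 1000) as $upTo) { echo "Up to $upTo\n"; }`, and the loop body is where you'd send the email or SMS update instead.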
Conclusion
The actual result above is identical to what could be achieved by simply echo'ing out the values in the countToOneMillion method. However the significant thing is that you can have a class which performs some time consuming task, then in the meantime you can do something else like send someone an email or SMS updating the progress.
The other significant thing is that the class performing the time consuming function doesn't need to know anything about the environment in which it is being called. In terms of creating a reusable class that you can drop into any project this is highly desirable.
Happy forking!
Comments Archive
This post originally appeared on an older version of the Working Software website which implemented its own comment mechanism. This new version of our website/blog uses Disqus for comments (see below); however, I wanted to preserve the comments made on the previous post here:
For the 30 sec issue: I would bet that PHP simple timed out. See here: http://www.php.net/set_time_limit
Peter Nagy (2008-06-01)
OOPS!! Don't know how that got in there :) fixed now. thanks for stopping by!
Iain Dooley (2007-06-11)
Hi. Thanks for the article with the link for my site. I'm sorry to see you've got the link wrong, though :-) Your link is: http://www.van-steenbeek.net/%EE%9B%8Fq=php_pcntl_fork And it should read: http://www.van-steenbeek.net/?q=php_pcntl_fork Again, thanks. Keep up the good work!
FST777 (2007-06-11)
Wow, good job, that's some really interesting code, thanks for publishing.
dave (2007-06-08)
good to see you got it working, and even better that you wrote about it :-D - ecoleman
Eric Coleman (2007-06-08)
Filed under: php | View Comments
Building software in the real world - the Working Software blog
We write about our experiences, ideas and interests in business, software and the business of software. We also sometimes write about our own products (in order to promote them).