Development

Stop child-processes from keeping lock (LockHandler)


This kept me busy for a while. I have a queueing system that runs constantly on our servers – built in PHP because, why not 🙂

Basically I have the controller, run as: php app/console aqueue:controller --bootstrap – this merely starts up and, one by one, runs the child-processes and disowns them, like so:
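Something along these lines – the aqueue:worker command name and the $queues list are just my illustration, the important bit is the output redirection and the trailing & which background and detach each child:

    // inside aqueue:controller – spawn each worker and return immediately
    foreach ($queues as $queueName) {
        $cmd = sprintf(
            'php app/console aqueue:worker %s > /dev/null 2>&1 &',
            escapeshellarg($queueName)
        );
        exec($cmd); // the shell backgrounds the child, so exec() doesn't block
    }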

However, the problem is – in order to make sure somebody doesn’t run the bootstrap script more than once at the same time, I grab a lock at the top of the console command before continuing – and if I can’t get the lock, it exits immediately. It’s a safety measure.

I get the lock like this:
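With Symfony’s LockHandler component, roughly like this (the lock name is illustrative; I keep the handler on the command object so I can release it later):

    use Symfony\Component\Filesystem\LockHandler;

    // in the aqueue:controller command's execute() method
    $this->lock = new LockHandler('aqueue-controller.lock');
    if (!$this->lock->lock()) {
        $output->writeln('Bootstrapper already running - exiting.');

        return 1;
    }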

The problem is, once you start the child-processes, each of them ALSO holds the lock that aqueue:controller created above. Thus, when the bootstrapping controller is done but the child processes are still running (as they should be), you can’t start the bootstrapper again because the lockfile is still held by those processes.

To fix this: Simple, release the lock.

So to do this easily and elegantly, you merely create a destructor that releases the lock in the bootstrapping command, like so:
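A minimal sketch, assuming the LockHandler instance from above is kept on the command as $this->lock:

    // in the aqueue:controller command class
    public function __destruct()
    {
        if ($this->lock) {
            // explicitly release the lockfile when the bootstrapper
            // shuts down, so the children don't keep holding it
            $this->lock->release();
        }
    }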

Once this is done, the lockfile inheritance problem goes away 🙂

Exiting script mid-way – gracefully (with a die-file)


Why I never thought of this before, I don’t know. But this is saving my bacon right now:

The scene:

I have a script that does 10 000 things per loop against my DB. Unfortunately, because of the architecture I’m interacting with, I can’t wrap the entire thing in a transaction. If I could, then killing the script mid-way would have no negative effect on the DB, because it would just roll back the whole thing.

However, killing the script in the middle while it’s non-transactionalised introduces a whole range of negative side-effects – like, for instance, inflating the summary data I’m building.

Anyways, forget the above, it doesn’t matter.

All I want to do is to “tell my script” that I’d like it to quit after the current iteration of the loop.

My solution: Give it a “die file”.

Say for instance my script’s name is “maintenance20150216.php”

I put the following code inside it:
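Roughly this – the work inside the loop is obviously just a placeholder:

    // somewhere at the top of maintenance20150216.php
    $dieFile = __FILE__ . '.die'; // i.e. maintenance20150216.php.die

    foreach ($workItems as $item) {
        processItem($item); // placeholder for the 10 000 things per iteration

        // quit gracefully if the die-file has appeared
        if (file_exists($dieFile)) {
            unlink($dieFile); // clean up so the next run isn't killed instantly
            echo "Die-file found, exiting gracefully.\n";
            exit(0);
        }
    }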

Cool thing about this? All I’ve got to do while the script is running is this:
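(from any shell on the same box, using the script name from above)

    touch maintenance20150216.php.die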

This in turn creates the “die file”, which tells the script to quit after the next iteration of the loop.

This way I don’t have to kill or ctrl-c the script and risk it quitting at the wrong place inside the loop. My script can run for its calculated days of processing, and if I need to make a change – or at the very least have it back off for a bit to give the DB a breather – I can do so easily and safely now.

Culling your module/actions (Symfony1)


This post is centered around Symfony1 – but it translates to other frameworks quite easily if you have filter chains. The reason we needed to do this was to figure out which of our old legacy symfony1 module/actions are still in use, how heavy the usage is on each, and which ones are not being used anymore – so we can delete them, yay fun 🙂

Just some background: We’ve run Symfony 1.4 for a long time, and most of our legacy systems were written in that. Then, when Symfony2 was launched, we started to write new code in there and slowly began to move things from the old legacy system into the new system – especially things we wanted to make changes to. We’re at a point now where we would like to get rid of all the front-end module/actions in our old system.

Here’s the approach that seems to be working rather well…

Figuring out what module/actions are still in use:

A filter does the job nicely. We wrote a small filter which merely logs the time and the module/action that was called to a file. (lib/myDeprecationFilter.class.php)
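A sketch of such a filter – the Monolog wiring (log path, channel name, and the LogstashFormatter that produces the ctxt_ fields you’ll see below) is how I’d reconstruct it, so treat the specifics as illustrative:

    <?php
    // lib/myDeprecationFilter.class.php
    class myDeprecationFilter extends sfFilter
    {
        public function execute($filterChain)
        {
            $handler = new \Monolog\Handler\StreamHandler(
                sfConfig::get('sf_log_dir').'/deprecation.log'
            );
            // the LogstashFormatter is what prefixes context keys with "ctxt_"
            $handler->setFormatter(new \Monolog\Formatter\LogstashFormatter('deprecation'));

            $logger = new \Monolog\Logger('deprecation');
            $logger->pushHandler($handler);

            // log which module/action is being executed
            $context = $this->getContext();
            $logger->info('module/action called', array(
                'module' => $context->getModuleName(),
                'action' => $context->getActionName(),
            ));

            $filterChain->execute();
        }
    }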

(Please ensure that you install Monolog using composer – but you really can log it any other way you want as long as your parser then knows how to read that)

You then have to put this in your filters.yml in your app folder (somewhere similar to apps/yourAppName/config/filters.yml)
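Something like this – remember that symfony 1.4 requires you to list the default filters around your own:

    rendering: ~
    security:  ~

    myDeprecationFilter:
      class: myDeprecationFilter

    cache:     ~
    execution: ~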

(the order in which you put this really doesn’t matter)

The log file:

When you make this live (remember to clear the cache), you’ll notice that the log file is created and starts to get entries written to it. Here’s an extract of how it looks:
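A (trimmed) line looks something like this – module and action names here are made up:

    {"@timestamp":"2015-02-16T09:41:22+02:00","@message":"module/action called","@fields":{"channel":"deprecation","level":200,"ctxt_module":"billing","ctxt_action":"viewInvoice"}}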

Monolog adds a nice bunch of information bits to the logfile, which we use for other things – but the two that you’re concerned about are the action and the module being logged, called “ctxt_action” and “ctxt_module” respectively.

It may be prudent to logrotate this file, as it will grow fast (very fast for our site) and we don’t want it to be the cause of a filled-up disk ;). Merely add something like this in your logrotate.d:
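Something along these lines (the log path is illustrative):

    /path/to/project/log/deprecation.log {
        daily
        rotate 30
        compress
        delaycompress
        missingok
        notifempty
    }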

(note the rotate 30 – we thus only keep 30 days worth of logs)

Reading / Parsing the log

This is not nearly as elegant as it can be, but who cares – it’s only for us internally.
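The parser, more or less (the path and the regexes are my reconstruction):

    <?php
    // crude internal tool: count module/action hits across all rotated logs
    $counts = array();

    foreach (glob('/path/to/project/log/deprecation.log*') as $file) {
        // gzopen() happily reads plain (uncompressed) files too
        $fh = gzopen($file, 'r');
        while (($line = gzgets($fh)) !== false) {
            if (preg_match('/"ctxt_module":"([^"]+)"/', $line, $mod)
                && preg_match('/"ctxt_action":"([^"]+)"/', $line, $act)
            ) {
                @$counts[$mod[1]][$act[1]]++; // yes, @ ... internal tool, remember?
            }
        }
        gzclose($fh);
    }

    print_r($counts);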

As you can see above (please excuse the hardcoding – no need to put the CLI handler in here), we run through all the files (including the compressed .gz ones that logrotate creates) and read every single line (nice and inefficient, right? 😉 ). We then do counts into an array (using @, lol… I mentioned this is an internal tool, right?), and then output the contents of the array.

It looks something similar to the below extract:
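(module/action names and counts here are made up)

    Array
    (
        [billing] => Array
            (
                [viewInvoice] => 10482
                [downloadPdf] => 311
            )

        [profile] => Array
            (
                [edit] => 97
            )
    )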

From the above, you can see that we have a multi-dimensional array with the module name, and inside it the actions that were called.

The above then shows you which modules (for a start) have been called in the last 30 days (remember our logrotate?)

For our systems, it’s safe to assume that if somebody has not looked at a module/action in 30 days, we have created and linked the new one in our new Symfony2 install and clients/customers can no longer get to the old one (except of course if they have it bookmarked).

From here it’s easy to extract the module names only (with a simple regex) and put that into a file.

Then you list the modules in your apps/appName/modules folder, and put that in another file.

And then simply run a diff between the two files.
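In rough shell terms (the file names, app name and sed pattern are illustrative):

    # module names sit at the first indent level of the print_r output
    php parse_deprecation_logs.php | sed -n 's/^    \[\([a-zA-Z0-9_]*\)\] => Array/\1/p' | sort -u > used_modules.txt
    ls apps/frontend/modules | sort > all_modules.txt
    diff all_modules.txt used_modules.txt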

The ones that are in the directory listing but not in the called list – those are the ones you want to delete / cull off your server and get rid of in your code 🙂

Luckily for us, we don’t really need to dig into the individual actions: we rewrite an entire module into the new code and only then remove it from the old code, so we don’t have to overcomplicate things.

Have fun and let me know if you have something to add to this 🙂

Know your IDE (Data Sources in PhpStorm 7+)


Sometimes it’s the small things that make a huge difference to your day. It’s those tiny things that make a developer grin – smile, even! 🙂

Like, for instance, setting up your Data Source so that SQL statements not only look cool, but actually give you some added productivity.
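To be concrete, I mean ordinary inline SQL like this (table and column names purely illustrative):

    $statement = $pdo->prepare("SELECT id, name, project_id FROM task WHERE project_id = ?");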

Having a look at PhpStorm when putting a simple SQL statement in, the string gets a rather ugly colour, depicting that you don’t have a data source set up. On top of that, it actually displays a little light bulb when you click on it, indicating that something is wrong – but that the IDE knows how to fix it if you just gave it a little bit of info.

[Screenshot: the SQL string before a data source is configured – note the colour, and the light bulb]

If you look at the screen grab above, you’ll note the colour of the string containing the SQL, as well as the light bulb, which I clicked on.

Clicking on the “Configure Data Source” gives us the ability to set up the data source, and for good measure which database to connect to as a default:

PS: You may need to have PhpStorm install the JDBC driver first; check at the bottom of the pop-up window for details on that.

[Screenshot: the Data Sources configuration dialog, with the default database selected]

It’s always a good idea to test the connection first, but if all goes well and you save and apply… your SQL statement inside your file suddenly changes colour and you have some extra functionality there:

[Screenshot: the same SQL string after the data source is configured]

Note that it highlights the column name in bold and a slightly different colour. This indicates that the column actually exists.

Also, when working with complex databases, it’s not always possible (or even necessary) for developers to memorise all the fields – and their spelling. Now, because you set up your data source, you get code-assist!!! Awesome, right!

[Screenshot: code-assist on column names, showing the primary-key icon, data types and foreign-key markers]

As you can see, not only do you get code-assist on the column names – it also shows you which one is the primary key with the little key-icon, tells you the data type of each column, and indicates when a column references another table (in the case of the project_id field, with the little highlighted part of the icon).

Pretty awesome, hey!

So go ahead, learn your IDE and set this kind of thing up.

You won’t be disappointed!


Analytics – Not just for Corporates


We sometimes forget that analytics isn’t only for use in corporates / company websites, but also for personal use!

In our office, we were talking about Google Analytics (http://www.google.com/analytics) and the tracking of sites. I thought it worth mentioning that a lot of people have side-line or personal projects to make some extra cash these days. What we (yes, even us Software Engineers) don’t always realise is that we can do a LOT more with our site if we run some analytics on it.

Google Analytics is a great tool. Best of all, it’s free!

My colleague showed me how he tracks his wife’s site with Google Analytics. I was amazed to see how quickly the traffic picked up with the AdWords special! Within days he had loads more people browsing the site!

We started to talk about Google’s Webmaster Tools, which – as one of the guys put it – makes sure you didn’t do something stupid (http://www.google.com/webmasters)

Another great way to drive the rating of the site up, is by having other sites link to it. In the interest of that, I’ve linked to it below 🙂 Hey, don’t judge, I’m helping out! 🙂

The product his wife is selling, is a pillow for pregnant ladies, and a lot of the guys in the office got their ladies the product already, and they’re raving about it! Seriously, awesome product!! Go check it out here. (http://4mybaby.co.za/)

I think the following quote from the site sums up why it’s the best thing since sliced bread: “Cushi has be design to ensure a good night’s rest and support for your baby bump, hip and back during pregnancy.”

Warn about Security Holes, but don’t be an asshole

The view of the world from the perspective of the Developer having the Security Problem:

We’ve all read about security, password encryption vs password hashing, why encryption is better than clear text, why hashing is better than encryption, why bcrypt is better than, say, sha1, and so on. But the fact of the matter is that most projects at the outset de-prioritise these security concerns for other, “more important” things – like, for instance, getting the job done and launching the project.

“Sure, we’ll look at security after launch” is the line most commonly used among developers, and even though this is the wrong thing to do – and we all agree it’s the wrong thing to do – a lot of us, if not most, do exactly this.

The problem with this, however, is that we never “get around to it” when it comes to fixing these things. The reasons for this are three-fold. Either we’re lazy – let’s not be that when it comes to security. The other two reasons are more understandable, though. We’re perhaps just too darn busy; for the busy-ness factor we, as professionals in the industry, have to actively “make the time” for this important behind-the-scenes work – especially when it comes to security – but that’s a whole different discussion.

The other, more common reason is “because it’s really difficult and we’ll likely have to change our client base’s behaviour”. This is tricky: a lot of the time the company you work for has a management team that’s really great at marketing and business, but doesn’t care that much about security (or perhaps doesn’t know that much about it). One can convince them to take the plunge by pointing out the marketing or PR nightmares if you get hacked, but it’s still difficult. Also, most of our client-base are really not fazed about security; in fact, they don’t even know what it is until one of two things happens: 1) you change something and they notice, or 2) you get hacked and they notice because it’s in the local newspaper (or news site). Both, unfortunately, are bad, because both create phone-calls and support tickets – which you generally have enough of, and you’d rather not create more admin by changing something.

Anyways, so let’s all agree that it’s the right thing to do to make the required security change and to change the behaviour of the clients. The sooner the better, but now that you’re 6 years into a project, it’s going to take some time. It’s not only going to take time; it’s going to be difficult to juggle security and ease-of-use to make sure the clients don’t complain too much. My advice: Plan! Plan how you’re going to attack the problem, and make sure you talk to your internal security specialists (or external consultants, or both) to make sure you’re doing the right thing. Then lay out your time-line and try to stick to it as closely as possible.

All said and done, as a Developer it’s ultimately your responsibility to protect your clients’ data – including, very importantly, their passwords.

The view of the world from the perspective of the “Guy who notices the security hole”:

Now, to get to what the subject of this post is about. Let’s switch our role from the developer who’s charged with fixing the security problem to some random client (or prospective client) who’s using the system/site/service or about to sign up for it.

From time to time, we as internet users come across security holes in systems. The world is full of security problems, as discussed above. But behind almost every security problem that we notice, there’s somebody who really cares about fixing it, but who is faced with the difficult task of doing so (accompanied by all the human and technical problems surrounding it). Furthermore, within this group there’s a sub-group of developers who are in the planning phase or the active development phase of fixing the problem. This is where my story resides.

With the modern web of social media everywhere, we tend to talk a lot… a LOT… online. We generally tend to talk more about the things we’re passionate about, and especially the things we know more about than others (well, we hope so anyway). Some of us are developers, and some of us are even security specialists, even working for security consultancy firms. Thus we’ll tend to talk about security problems. However, we need to be careful about when we talk about them, how we approach the discussion, and how public the discussion is.

Let me use an example of what happened a while ago:

Our team had been facing a security problem (not quite as big as the one described above, but still complex and tough to fix) for quite some time. We’d spent a great deal of time and effort planning how we’d fix the problem. Development was about to start; in fact, on the morning of the incident described below, we had a water-cooler chat where it was agreed that development on the problem was starting and that we’d finish the initial phase in about 2 weeks (hey, a nice agile scrum sprint!)

Then the “unthinkable” (read “annoying thing”) happened:

Now, we’ve had security concerns raised by clients and prospective clients numerous times. Most companies in the IT industry do. Every few weeks we get the odd ticket where a client is concerned about a security problem on our system. Usually it’s a small security hole that we plug and then make sure exists nowhere else on our system. But every few months we get the more serious concern, for which we thank the client and assure them that we’re looking into it (which we do/did). This time, though, the concern broke in a very public way – on Twitter. Yes, you can just imagine the storm this kicked up! Think about it… on Twitter, for everybody to see, a concern was raised about the security of our system.

Now, this is not the first time it’s happened in this fashion, but what’s annoying is that the individual who tweeted it should’ve known better – he/she is a “professional security consultant”.

The wrong assumptions were made on Twitter, which made the problem look even worse to the public and untrained eye (no, we didn’t store passwords in clear text, but this was said on the internet, which naturally makes it true *sarcasm*). Now, as everybody knows, the people running social media at an organisation are extremely good at what they do; that said, they’re not security specialists. Thus, starting a discussion/fight on Twitter (on Twitter!!) with a company about security means you’re fighting/discussing a matter with somebody who is ill-equipped to answer your questions properly. The team running the social PR did exactly what they were supposed to do: they tried to mitigate the public problem by assuring the tweeter that we’re serious about security, and that they would take it up with the relevant department. Still the tweeter attacked the problem (publicly). The tweeter proceeded to paste a link to an article about it. Again the social team assured him that it was being looked at. Yet the tweeter again explained that we were being stupid (yes, I agree we were stupid) and that this is “security 101” (ok) – still publicly… on Twitter… with a company with thousands of followers. Then the social team took a different approach, to which the tweeter baited them by asking something along the lines of “So, I can assume that you’ll sort the problem out?”.

Let me categorically state: We were in the wrong here. When it comes to the question of security, we did the wrong thing. But that’s not the issue I have here.

What scares me is that we have a so-called professional, a person who knows what the effects of talking publicly about security are, who broadcasts the fact that this security hazard exists (and even makes it seem a bigger hole than it actually is). Let me put it another way: this person basically SMS’d (texted) tens of thousands of people about the security vulnerability. Think I’m being too harsh? Read the definition of Twitter: “Twitter is an online social networking and microblogging service that enables users to send and read “tweets”, which are text messages limited to 140 characters. Registered users can read and post tweets but unregistered users can only read them. Users access Twitter through the website interface, SMS, or mobile device app.” [read more here]

Luckily, nothing bad happened; however, the carefully planned timeline and actions that we were going to take over a 2-week period went out the door. Thousands of rand worth of time and resources spent planning, and the first part of the development, was flushed down the drain. All because there was a public announcement of the security hole.

Upon inspecting the user, I noticed that she/he did not contact us about this in a private manner whatsoever. There was never a phone-call from the user. There was never a support ticket created about the problem. Nothing at all. The first time this user interacted with us was when the tweet was made.

This, sir/madam, is called being an asshole.

Conclusion

So, the moral of the story, to all my colleagues and friends and readers of this blog: if you spot a security vulnerability… by all means, notify and warn the company about the problem. Because face it, we’re all in this together, and we’d like our industry to get better at what we do, and we’d like to see security holes go away. Heck, even make a fuss about it and raise a few hairs by throwing some choice words around over the phone until you speak to somebody who gives a damn about security. But for the love of world peace… don’t… be… an arse and post it on twitter/facebook/google+/linkedin/whatever social media you procrastinate on.

If you’re so concerned about the company’s security and they don’t do anything about it, then take your business elsewhere. But really, think before you post; don’t just have verbal (or keyboard) diarrhoea.

Side-note: The security problem was “plugged” with a bad hack, and then fixed properly afterward – but only a fraction of the ease-of-use features that we were planning to build into it were implemented, thus hurting innocent clients, because… well… once again, our window had closed – and now that the security hole was plugged, we no longer had a burning need to work on the cosmetics of this project; there were other, more important things to work on.

So, the tweeter in question wanted to help the industry (well, I hope so) by publicly posting a security vulnerability to get the company’s attention, but actually ended up just hurting the industry.

Anyways…

… happy surfing and posting… and remember…

… don’t be an asshole 🙂

Visualising the Evolution of our Source Code


We have well over 100k lines of code in the core system right now, and that’s not even counting all the supporting templates, images etc. that are all over the place.

We run a hybrid system between Symfony 1.3 and Symfony 2.1; parts of our system are in the old Symfony and parts in the new Symfony2.

People always like cool pictures (especially the kind that move), so I set off on a path to take our SVN repository – everything in it – and visualise it.

Luckily for me, there’s already a tool to do this, called Gource. I have Gource 0.38, but it’s already at version 0.40 with some nifty extra features.

Here’s the youtube video of the code as it progressed from 2007 up to August 2013. Can’t believe it’s been 6 years worth of development! http://www.youtube.com/watch?v=DojMulQ8CcI (Best viewed at HD-720p)

In order to do this, you go into your base-SVN folder and do the following commands:
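The commands were roughly these – the resolution and encoder settings are just what I’d pick, so tune to taste:

    # 1. dump the full SVN history as XML (slow on a big repo)
    svn log --verbose --xml > activity.xml

    # 2. render the log with gource and pipe the frames straight into ffmpeg
    gource activity.xml -1280x720 --stop-at-end \
        --output-framerate 30 --output-ppm-stream - \
        | ffmpeg -y -r 30 -f image2pipe -vcodec ppm -i - \
            -vcodec libx264 -preset slow -crf 22 code_evolution.mp4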

As you can see, it’s rather straight forward.

  1. Firstly you get the SVN log – this takes a while especially on a large project like this.
  2. Secondly you run the log (in XML format) through gource to create the video – you pipe the video to ffmpeg to make the mp4
  3. Lastly if you’d like, open it in iMovie and add sound to it, like I did.

Important to note: you shouldn’t fiddle with the video window while gource is running, because everything you do to the app while it runs is saved to the mp4. Just… be patient 🙂

You can find gource here: https://code.google.com/p/gource/

I just installed it with brew (in OSX), instructions can be found here: https://code.google.com/p/gource/wiki/MacSupport

Well, it was fun 🙂 And it looks pretty, so I guess it was worth the hour it took me 🙂

Analytics – Why it’s important for Internal Tools


Introduction: Analytics on internal systems, and why it’s important!

The modern web is all about numbers, statistics, getting feedback fast and adapting to that feedback – making the experience our users have when visiting our sites better, and having them come back more often, which translates into sales / conversions. We advocate to our clients the crucial need to have a good analytics toolkit installed on their system. Something that measures page views, events, technology, referrals, goal funnels, and many more metrics. Each one of these metrics is a necessary and very useful part of the “big picture” that we build of our site, how people are using it, and what to do (and not to do) in order to guarantee success.

Here’s the key question, though: Why do so many companies forget that their internal systems also have users?

By users, I mean mere mortal human beings who have opinions about the software, get into bad (and good) habits, prefer certain technologies and use our systems in their own way, as opposed to the way we thought or intended them to use it.

At Afrihost we have internal systems just like any other organisation. We are in the lucky position that our internal systems are all web-based (well, most of them anyways). Having web-based internal systems means quite frankly that putting something like Google Analytics, Piwik, Clicky, Open Web Analytics – to name but a few – on your system is very easy and an absolute must.

Why?

To answer the “why” is very difficult, and it’s different for every implementation. It’s different because each implementation of analytics tracking tries to answer a specific question. Once the question is answered, something is done about the answer (we either change something or we pat ourselves on the back for a job well done – the former is more often the case). Then we move on to the next question. Often, however, we don’t know the question until we see a problem staring us in the face, and we can only see the problem if we track it somehow. Let me explain a few of these “whys” or “problems” or “answers” by way of example:

Know what technology people use without asking!

Initially, when we started our “intranet” or “internal system”, we went through the same pains most Web Software Development groups go through: Why isn’t this working in IE? Why does it look this way in FireFox but all broken in Chrome? Why do we even bother with Safari? Wouldn’t it be cool if we could just stick with one platform? We then realised that we actually had to decide on a single web platform: it was going to be less costly to have everybody standardise on one platform (at the time FireFox) than to spend hundreds of hours in software development planning and developing for multiple browsers. (Remember, this was the time of IE 6 and 7, which were crap to say the least – and that’s actually giving them a compliment.)

Lucky for us, we had put Google Analytics tracking into our internal system a few weeks before that realisation – not because we thought we needed Google Analytics, but purely because one of the developers thought it was really cool to see all the pretty graphs and stuff (true story!).

We went into Google Analytics and saw to our surprise that the majority of users actually did use Firefox… and the ones who used IE were on the newer IE7, which was less of a tyrant than IE6. This made the decision to standardise everybody on Firefox much easier, because we could take the stats to our bosses and say: “See, most of the company are on FF anyway, why don’t we just standardise on that?”. There was the usual pushback from the IE-die-hard kinds, but each time we brought out the stats and the pretty pie charts (updated of course after each convert) they buckled under the pressure of pure statistical awesomeness and went with the group to Firefox.

This saved us hundreds if not thousands of hours and a lot of money in compatibility development. Sure, you can’t do this as aggressively and suddenly with external systems, but we’re not talking about external systems here – we’re talking about internal systems that only internal staff see and use. Nobody’s going to say “I’m going to stop buying your stuff because you don’t support IE version 0.9”.

Another example is when we needed more screen real-estate. We were at the time, believe it or not, designing and working hard to push everything into 800×600. The perception among the team was that 800 pixels wide was the standard we should go for… here & there we were “brave” and went with 1024 wide – but we believed 800 wide was set in stone because of a simple human-nature fact: an adorable and widely loved older lady, who worked in one of our sections and wasn’t extremely tech savvy but commanded the respect and love of all developers, used 800×600. Her eyes weren’t so great, so she went with 800×600 because she could see the stuff on the screen better. She was also very, very vocal about anything that had too small a font or needed scrolling – so human nature guided us to think, in error, that a lot of people in the company used 800×600 and that this was a really important line not to cross (because nobody wanted to make her unhappy).

We went into GA and pulled the stats on how many people used 800×600. Out of a company of about 80 staff members using the software, there was exactly 1 machine on 800×600. About 20% were using 1024×768… and the rest were all 1280 wide or greater. Boy, were we wrong!!

We stopped spending so much time trying to re-work screen layouts to fit everything into 800×600, and rather just spent the money, bought the loveable dame a really, really big screen that made the font look bigger even at 1280 wide, and decided to standardise on 1280 wide as a minimum resolution. It saved a lot of money!

Know how people use your software without asking!

Soon the system grew and the company grew in staff (and the staff turnover was faster than usual). We got to the point where you can no longer have a good understanding of everybody’s behaviour in the system. Heck, we didn’t have the time to go around talking to the staff / users on a regular enough basis to understand their usage patterns.

We then, by now realising that Google Analytics had answered questions before, went to the GA system and started to look at the top page views. Our system is very ajax-driven, and we simulated page-views by hooking jQuery’s ajax system to also do a call to GA to “simulate a page view”, with whatever criteria we decided on to identify different modules in our system (a rough sketch of that hook is below). A side-effect of a rapidly growing internal system is the lack of cohesion. This is unfortunate, but I’ve come to accept it over the years of developing these systems. A problem arises, however, when you have two systems doing the same thing, both using different sets of code – one being easy to maintain and the other being hard to maintain.
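The ajax hook mentioned above was along these lines – this sketch assumes the classic ga.js _gaq API of that era, and the virtual URL scheme is purely illustrative:

    // fire a virtual pageview at GA for every jQuery ajax request
    $(document).ajaxSend(function (event, xhr, settings) {
        _gaq.push(['_trackPageview', '/ajax' + settings.url]);
    });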

There were two ways of passing a credit note in our system. One was hooked into all sorts of weird places in the code and was very complicated to maintain; the other was simple and elegant and, needless to say, extremely easy to maintain. To our surprise, the easy-to-maintain, simplistic code was the more often used system for doing the same job. That led us to extract the bits we needed from the complex module, keep them simplistic in design, remove the duplicated functionality, deprecate the visual element of the one, and notify people that from now on there is only one way to do that task – which only affected 2 people, who didn’t realise the other way existed and were quite alright with changing their ways.

Again… lots of hours saved in maintaining a system that wasn’t being used!

Know how long certain systems run without asking / waiting for somebody to complain!

By now we were looking at Google Analytics regularly – at least once every 2 or 3 weeks – definitely once a month! Then Google launched something that helped us tremendously – “Page Load Time”.

We were constantly seeing high load on either our database server or the web server running the system, but with a system that does on average 400+ queries a second (sure, there are peaks during billing runs, so on a normal day let’s call it 50 or so per second), it was extremely difficult to figure out what was going on. We looked at (and fixed) a few slow queries, but the majority of the remaining slow queries ran for, say, 2 seconds – that’s nothing, right!?!

When Page Load Time had been live for about a week, the reason for our problem was as clear as day! We had a view that showed you a client’s financial record with us. For the most part there are only about 20 or so lines to show, as we show only the last 3 months, and if you want you can change the dates and view the rest. However, here & there you’d find a client who has not just one or two products with us, but hundreds. For these clients there were hundreds of financial transactions created just by the monthly billing scripts, not even talking about the ad-hoc purchases they made via our online gateways. Page Load Time showed clearly that on average the load time of that view was less than 5 seconds, but every now & then it spiked to hundreds of seconds. I remember one entry recorded 1200+ seconds (yes, that’s 20 minutes… and yes, our apache allowed that… and yes, that’s stupid… but… anyways).

This realisation that this view was causing problems didn’t show us exactly what the problem was, but instead showed us where to look for the problem. Upon further inspection it appeared that this view was doing an SQL query for every line that it showed… remember those queries that I said only ran for 2 seconds? Multiply that by hundreds for the same view.

We fixed the problem in a few days and our database administrators are still to this day buying us beers for thanks (well, not really but I can dream!)

The list of problems we picked up using this method is countless; I could spend the whole day showing how things were before we had a look at Page Load Time and how they’ve changed since.

Find weird errors you can’t replicate without sitting with the user for infinity!

This is a tough one, because most errors that you can’t replicate are, well, just that… extremely tough to duplicate / find in the first place. But if you use a tool – any tool – and find at least one of these without having to sit with the user, watching them work for infinity and hoping you have the right metrics when the error occurs, then it’s time/money well spent.

Case in point: we had a problem where sometimes our system would return a blank page. It wasn’t the same ajax action every time, it wasn’t at the same time of day, and there seemed to be nothing in common from what we could see by just looking at the odd email we got about it every week or week and a half. This is where human nature started to play a role. It turns out this error happened a lot more often than we thought, but because the humans using the system had got so used to it and were so busy working with clients, instead of typing us an email explaining exactly what they did, they rather just clicked again and then it worked (most of the time… sometimes it took two clicks or three, but never more than, say, 5).

We put in another callback to check when we get an empty result from an ajax call, and each time that happened we sent a distinct simulated pageview to Google Analytics, something like “weird_empty_result_occurred.php”, with a bit of extra information we thought we might need at the end of the simulated page view.
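Something like this (same classic-ga.js assumptions as the earlier ajax hook; the extra information is whatever you need):

    // flag empty ajax responses as a distinct virtual pageview in GA
    $(document).ajaxComplete(function (event, xhr, settings) {
        if (xhr.responseText === '') {
            _gaq.push(['_trackPageview',
                '/weird_empty_result_occurred.php?url=' + encodeURIComponent(settings.url)]);
        }
    });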

It took mere days before we saw a pattern. The errors occurred on the hour, every hour, from around 08:00 in the morning up to around 16:00 in the afternoon – and of course on and off during the night. This pointed us to our background scripts, and we realised there’s one script that runs each hour which touches the accounting / financial tables in the database. It dawned on us, looking at the Google Analytics data, that this occurred only when trying to insert / update something on the financial side of the system. The penny dropped: our script was locking up the tables. The fix was rather easy, and it was a problem we would otherwise have been hunting for weeks on end, spending hundreds or thousands of hours. We didn’t. We saved time. We saved money.

Identifying reasons for call centre floods without asking!

Any company, especially public-facing, service-oriented companies (like ISPs), gets days where all hell breaks loose and nobody knows exactly why. A nifty side-effect of having internal analytics (especially long-term telemetry) is that you can spot which sections of the system your support staff are suddenly using more of. Take for example a mess-up with sending out the wrong invoices, where people think they’re being over-billed.

When you have sent out the wrong invoices via email, you can bet your bottom dollar the financial section of the system is going to get hit hard – both in your public-facing online client zone and in your internal client management system. Imagine the scenario where everything was calm up to 10:00 in the morning, then suddenly the phones start to ring off the hook, with 10 people waiting in the queue for every support agent you have working in your company. Nobody has the time to stop, stand up & figure out what the heck is going on.

By looking at the real-time analytics, one sees that there is normal usage of the network testing systems, normal usage of the new-client fraud check systems, normal usage of just about everything – except for the financial system, and specifically the part looking at invoices. It doesn’t take a genius to figure out that “something is wrong, directly or indirectly, with invoices and people’s money”. You then realise that the invoice email script kicks off at 10:00 each morning, put two and two together, and after checking one or two invoices realise that you multiplied every cent by, say, 1000 and not 100 – oops.

Quick, send out an apology email, put something on the hold-system for the phones, send text messages out, anything to calm the mob!

A variation of this happened to us (luckily not something stupid like the wrong multiplier, but something equally small yet massive in impact). Twitter and Facebook were on fire; we were being attacked on all fronts. When those first few hundred corrective emails and text messages hit, our clients (the mob) started to relay the same message on Twitter and Facebook and other mediums, meaning some people heard / read about the problem – and that we were fixing it – even before they received their text message. What would’ve been an absolute PR nightmare turned into something positive. Messages of thanks for the communications started to stream in. A bunch of “wow, you guys really responded fast to the problem, ‘you da best'” flew around. And we even had a few clients stand up for us and get really upset with the usual suspects who just hated us for the sake of hating (we all know those clients).

There’s no way of counting how much money having the analytics saved us that day… even if it only saved us a few minutes in figuring out what exactly was going on… but you can count on it that it was a lot of money! (Especially in future sales / lost sales / cancellations.)

Identifying misuse of your precious system without prowling around with hawk eyes!

We once had a heavy report that was made available only to top management (because it was so heavy on the DB). We discussed making some updates to the database design – creating some redundancy in the information we save – in order to make the report run faster. But because this report was only supposed to be run once or twice a week, and the amount of data wasn’t that much, we put off the (costly) changes. Sound familiar? Yeah, we all do these things. A year and a half later, a campaign was run to get more sales in a certain area of the business – an area where this report is crucial for showing whether the campaign is successful or not.

A junior developer did the small project – it took him two days – and it was launched. Then, suddenly, servers caught fire and satellites fell out of orbit onto our servers. (Sure, it wasn’t that bad, but that’s how it felt while we didn’t know WTF was going on.) There were multiple projects running at the same time, rollouts on a daily basis, and we didn’t know which section of code caused the problems. Turns out, none of the new code caused the problem:

We soon realised (again, you predicted it) by looking at Google Analytics that the report I mentioned above – the one that runs large SQL queries and does a whole bunch of complex calculations – was running often. In fact, the report was running every 60 seconds!!! It turns out a bunch of people with a vested interest in the sales campaign had been given access to the report a few months back, and that day they were refreshing it every now & then – “not that often, every 10 or so minutes” was one of their responses. Problem is, there was a whole team of people doing the same.

A quick “stop it!” was all that was needed to get rid of the high load on the server. Sure, we made some tweaks, put in some restrictions on the frequency the report can be run, cached the results, and some more cool stuff. Fact is, once again, having analytics on the internal system saved us from buying a bigger server or whatever lengths we would’ve gone to, because users who didn’t know any better (and that’s fine!) were abusing / misusing a system that wasn’t intended to be hit so hard.

Security, no really… analytics = information, information = power i.t.o. security.

Lastly, a fun one. We’re based in South Africa, and as with all internal systems, one needs to make sure that security is rock-solid around the system. We were locking the system down to a VPN (and before that, to the IP address of the office). When we installed Google Analytics for the first time, we played around with it and set up triggers to email us about things like geography, page views, and what not.

Years later (about 4 years later, actually), we were tight on support staff. We decided to outsource overflow work to a reputable company in India. This I was unaware of at the time, as we were busy with other projects and I was not briefed.

A VPN was set up for them, and they started to provide basic support while they were trained on our products and so on.

Next thing you know, I get the fright of my life… “oh no! we’re compromised! our internal system is being accessed from India!!”. I frantically locked down the system – clearly overreacting – and minutes later, after the outsourcing company complained that they couldn’t get into the system, I was briefed about the situation.

It was funny, and I actually caused more hassle than good – but what if this had been a real attack on our system? If it had been real, I would’ve saved the day… no… wait… having analytics would’ve saved the day.

Conclusion

Use analytics. It’s great. True story!

Seriously, there are so many examples – things that we take for granted today – that show how having some kind of analytics system helps you make your internal systems better, helps you protect them, and generally just helps you figure out how people are using (and misusing) them.

You might have noticed that I put “without asking” at the end of most of the headings. Yes, there’s no reason why you shouldn’t be able to figure out most of these things in a company of 20 people. Heck, even in a company of 100 people it’s conceivably possible without these analytics. But soon you’ll reach a point where either geography becomes a problem, or the size of the organisation does. When that day comes, you’ll thank your younger (and more intelligent) self for installing those 3 lines of javascript to put analytics in your system.

Make a file or folder invisible in Mac OS X Finder


I wanted to make the files that my new WD Live creates for its library invisible on my Mac. It creates two files for each movie: a movie.metathumb file and a movie.xml, where “movie” is the name of the movie, say “Alice In Wonderland.avi”.

When you install XCode (Apple Mac Developer tools) you get a little script called “SetFile”, which lives in: /usr/bin/SetFile

You can run the following commands to make these files invisible to your Finder:
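(using the example movie name from above)

    /usr/bin/SetFile -a V "Alice In Wonderland.avi.metathumb"
    /usr/bin/SetFile -a V "Alice In Wonderland.avi.xml"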

PS: To make them visible again – you’ll notice it’s a capital “V” above – just run the same with a lower case “v”:
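    /usr/bin/SetFile -a v "Alice In Wonderland.avi.metathumb"
    /usr/bin/SetFile -a v "Alice In Wonderland.avi.xml"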

It worked wonderfully!

Now all I have to do is create a shell script that runs each time I mount the drive when I’m copying movies onto it 🙂

Symfony2

Symfony2 Auto ACL / permissions bash script


Recently I had to install Symfony2 from scratch. Being a company that works on one install for years, we don’t get acquainted with the install phase that often.

Every time I do a personal install just to screw around I have to go read how to do the ACL / Permissions for the cache and logs folders in /app/*

I decided a quick shell script to keep in my toolbox would be a good use of my time this evening. Here it is:
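Here’s a minimal version of what I keep around – it’s essentially the setfacl recipe from the docs, including their web-server-user detection one-liner (it assumes a Linux box with ACL support, run from the project root):

    #!/usr/bin/env bash
    # set_permissions.sh - cache/logs ACLs per the Symfony2 install docs

    # detect the user the web server runs as (snippet from the Symfony2 docs)
    HTTPDUSER=`ps axo user,comm | grep -E '[a]pache|[h]ttpd|[_]www|[w]ww-data|[n]ginx' | grep -v root | head -1 | cut -d\  -f1`

    # start with empty cache/logs, then grant both users access going forward
    rm -rf app/cache/* app/logs/*
    sudo setfacl -R -m u:"$HTTPDUSER":rwX -m u:`whoami`:rwX app/cache app/logs
    sudo setfacl -dR -m u:"$HTTPDUSER":rwX -m u:`whoami`:rwX app/cache app/logs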

This quick script just follows the steps described in the Symfony2 Installation Documentation.

  1. Copy this code into a file called “set_permissions.sh” in your project root.
  2. Add executable permission to the script, by running chmod +x set_permissions.sh
  3. Run the script:
    ./set_permissions.sh
  4. If you’d like, delete the script afterward, alternatively you can keep it for when you do an update to your code / mess up your permissions (like I do from time to time 🙂 )

Have fun, hope it’s useful.

Grateful to Matthias Noback for his post on How to Install Sismo which is what gave me the idea of creating the shell script.
