Tuesday, October 18, 2011

JSPC Maven Plugin Package Name

Specifying package name when using the JSPC Maven plugin to pre-compile jsps
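The body of this post didn't survive in this copy, but for reference the configuration looks roughly like the snippet below. The plugin coordinates and the packageName parameter are from memory and should be verified against the plugin docs; the package value is a made-up example.

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>jspc-maven-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- package for the generated servlet classes (example value) -->
    <packageName>com.example.jsp</packageName>
  </configuration>
</plugin>
```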


Tuesday, October 04, 2011

Amazon AWS Service Status

AWS currently does not provide an API to query service status. They only provide a health dashboard webpage. Although the page has RSS feed capability, I couldn't detect any specific pattern to watch for. The simple solution below works for now. We have created Nagios monitoring out of these one-liners.

status0 = OK
status2 = Performance Issues
status3 = ERROR

The service and region names below are taken literally, as-is, from the web page.

wget -qO- "http://status.aws.amazon.com/" | grep -B 1 "Amazon Elastic Compute Cloud (N. California)" | head -n 1 |
perl -n -e 'm/images\/([\s\S]+?)\.gif/ and print "$1\n"'


wget -qO- "http://status.aws.amazon.com/" | grep -B 1 "Amazon Elastic Compute Cloud (N. Virginia)" | head -n 1 |
perl -n -e 'm/images\/([\s\S]+?)\.gif/ and print "$1\n"'
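For the Nagios side, the one-liners can be wrapped in a small script along these lines. This is a sketch, not our production check: the function names are made up, and the mapping to Nagios exit codes (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) is the standard plugin convention applied to the status codes listed above.

```shell
#!/bin/sh
# Sketch of a Nagios check built from the one-liners above.

# Scrape the status icon name (status0/status2/status3) for one
# service/region string, exactly as it appears on the dashboard page.
fetch_status() {
  wget -qO- "http://status.aws.amazon.com/" \
    | grep -B 1 "$1" | head -n 1 \
    | perl -n -e 'm/images\/([\s\S]+?)\.gif/ and print "$1\n"'
}

# Translate an icon name into "<nagios exit code> <label>".
status_to_nagios() {
  case "$1" in
    status0) echo "0 OK" ;;
    status2) echo "1 WARNING" ;;   # performance issues
    status3) echo "2 CRITICAL" ;;  # error
    *)       echo "3 UNKNOWN" ;;   # anything we don't recognize
  esac
}

# Example invocation (commented out so the functions can be sourced):
# result=$(status_to_nagios "$(fetch_status "Amazon Elastic Compute Cloud (N. Virginia)")")
# echo "EC2 N. Virginia: ${result#* }"
# exit ${result%% *}
```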


Saturday, August 13, 2011

Rooted my Motorola Atrix

System version: 4.5.91
Android Version: 2.3.4

Instructions found below:


If you get a "device not found" error when running "adb shell", make sure you have the sync drivers installed. When you connect the Atrix to your computer the first time, it installs a bunch of drivers and also offers an option to install "sync drivers". Make sure to install those.

Thursday, July 07, 2011

Repository for JBoss RestEasy Releases

We have been using JBoss RESTEasy for developing RESTful services. The framework is new, and we have had to change repos many times across version releases; often the repos fail with permission-denied or dependency-resolution errors. A more stable repo for RESTEasy is maintained by the Atlassian folks. We use the repo below, and it has been working consistently and picks up new releases fairly quickly.
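The repo URL itself didn't survive in this copy, but declaring an extra repository in the POM looks like this (the id is arbitrary; substitute the actual Atlassian public repository URL):

```xml
<repositories>
  <repository>
    <id>atlassian-public</id>
    <!-- placeholder: fill in the actual Atlassian public repository URL -->
    <url>https://{atlassian repo url here}</url>
  </repository>
</repositories>
```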


Wednesday, June 01, 2011

Check Logged Person In Campfire Room

We use Campfire as the sole medium for team communication, so it's essential that every team member is logged in and accessible. All team members use Macs, and we use the Fluid app to run Campfire as an app. The app sometimes logs out automatically when not in use, and we needed a way to get alerted when a member was not logged into Campfire.

Campfire has a pretty nifty API that allows you to query many room attributes. The script below can be used to detect whether a person is not logged in. Each of our team members has the script cronned on their machine to run every 10 minutes or so, so they get alerted when they have been logged out. Below are the attributes you need to substitute:

{authcode here} - To get your auth code in Campfire, go to "Edit my Campfire account" or "My Info" at the top of the screen in Campfire; there you will see your auth code. This is a secret code and should not be shared with others, so be watchful and ensure the script below has read/execute permissions only for the member's user on the box.

{room id} - To get the room ID, simply log in to Campfire and click on the room. The browser URL should show the room ID.

{member name} - Should be the same as the name specified when creating the Campfire account.

curl -s -u {authcode here}:X "https://{your specific domain}.campfirenow.com/room/{room id}.xml" > /tmp/campfire-room.xml
present=`grep -c "{member name}" /tmp/campfire-room.xml`
if [ $present -eq 0 ]; then
  /usr/local/bin/growlnotify --name "Not Logged In Campfire" -s --message "You are not logged into campfire.
This might reflect in your performance review next year so you better login now :)"
fi
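To run the check every 10 minutes, a crontab entry along these lines works (the script path is a made-up example):

```shell
# run the Campfire login check every 10 minutes
*/10 * * * * /path/to/campfire-check.sh
```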

Sunday, May 08, 2011

Transitioning Traditional Data Support Folks To New Breed ETL Systems

Like many other folks out there who are discovering the power of data compute grids, we recently transitioned part of our traditional database-based ETL system to a Hadoop-based processing system. Being in the digital advertising field, we get a whole lot of impression data, both from our tracking systems and from external sources. Our existing ETL system mainly comprises four components: cleansing, standardization, dimensioning, and aggregation.

The cleansing and standardization components, which involve a whole lot of text parsing and mapping, take the major brunt as they deal with the raw volume of incoming data. As part of the transition we moved these two components to the new system. We have a dedicated product support team that handles most of the daily user data queries: issues like missing/incorrect data, re-running jobs due to changed dimensions or external data outages, configuring and testing new data fields/sources, generating ad-hoc reports, configuring new clients, etc. These folks have thorough domain knowledge and are well versed with this data. They also fully understand the current ETL data flow and the various business rules that get applied as part of the data processing. Technology-wise, they are comfortable with databases, SQL, basic scripting, and Excel, and are usually enthusiastic about learning new technologies as need be.

To be able to handle the above-mentioned issues, they essentially need a way to slice and dice raw/stage-level data, and with the data residing in HDFS this becomes an issue. We have been brainstorming ways to expose the new system to the support folks; below are some options.

Apache Hive: Facebook was the first company to encounter this problem, when it transitioned its analyst folks from an RDBMS-based warehouse to a more scalable Hadoop system. They developed Hive, which is essentially a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. HiveQL is very similar to SQL, although it does not support the full SQL-92 specification. From my reading, it most closely resembles MySQL's SQL dialect, which makes sense because Facebook is a MySQL shop and the similarity would make the transition easy for its folks. This option stands at the top of our list given its similarity to SQL.
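To give a flavor of HiveQL, a simple summarization over impression data might look like the query below. The table and column names are made up purely for illustration:

```sql
-- hypothetical impressions table; names are illustrative only
SELECT campaign_id, COUNT(*) AS impressions
FROM raw_impressions
WHERE dt = '2011-05-01'
GROUP BY campaign_id;
```

To a support person who already knows SQL, this reads almost exactly like the equivalent warehouse query, which is the main attraction.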

Apache Pig: Apache Pig is another higher-level abstraction over map/reduce. Pig uses the Pig Latin language to express data flows. Although this is a powerful tool, it would require the support folks to learn an entirely new language.
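To give a flavor of Pig Latin, a data flow that counts impressions per campaign might read as follows (file, field, and relation names are made up for illustration):

```pig
-- hypothetical per-campaign impression count expressed as a Pig data flow
raw = LOAD 'raw_impressions' USING PigStorage('\t')
      AS (campaign_id:chararray, dt:chararray);
day = FILTER raw BY dt == '2011-05-01';
grp = GROUP day BY campaign_id;
cnt = FOREACH grp GENERATE group AS campaign_id, COUNT(day) AS impressions;
DUMP cnt;
```

The step-by-step, named-relation style is the "entirely new language" hurdle: it is closer to writing a script than to writing a query.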

Commercial tools like Datameer and IBM BigSheets: It is a well-known fact that Microsoft Excel is the most versatile analytic tool. Analysts love its ease of use and the tools it provides to slice, dice, and graph datasets. Imagine an Excel-like tool with the power of Hadoop; that is essentially what these commercial tools are. We recently received a great demo from the Datameer folks and were impressed by its ease of use and especially its pluggable architecture: an easy and familiar spreadsheet-like interface for business users with a complete set of data integration, transformation/analytic, and visualization tools. It also has a neat scheduler for cron-based job scheduling. This option is also a strong contender given its ease of use and spreadsheet-like usability and feel.

We haven't decided on an option yet. The next couple of weeks will involve working closely with the support folks to evaluate these options and ensure a smooth transition.

Friday, March 18, 2011

Maven3 error "Could not find artifact"

We recently switched our Artifactory repo to a new server, and things were fine, with builds running smoothly. Then one of the team devs deployed a new artifact version to the repo, and that caused the build to fail. We kept getting the error below on our continuous integration server, while the build worked fine on all dev boxes:

"Failed to execute goal on project... Could not resolve dependencies for project....SNAPSHOT: The following artifacts could not be resolved: {new artifact}. Failure to find ... in http://localhost/artifactory/repo was cached in the local repository, resolution will not be reattempted until the update interval of artifactory has elapsed or updates are forced"

We debugged this issue for a while and made sure the artifact was deployed correctly to the repo and all that, but the main clue was that the artifact location in the error message was wrong: it was looking in localhost. We checked the POM and ensured the repository links specified were correct. After quite a bit of head banging we leaned toward an issue with the local repo cache and decided to purge the local repository. We deleted the artifact's directory under .m2/repository, and that resolved the above error but gave the error below:

"Failed to execute goal on project... Could not resolve dependencies for project....SNAPSHOT: The following artifacts could not be resolved: {new artifact} Could not find artifact in artifactory (http://localhost/artifactory/repo) "

We were still trying to find out why it was not pointing to the correct repo location. The fact that it was working fine everywhere else made us believe there was an issue specific to the continuous integration box, so we decided to override the local repo location so it would download a fresh copy of all dependencies, and that's when we saw a default mirror repo location specified in the local m2 settings.xml file. The file had the properties below set:


That was causing the issue. We modified it to point to the new repo location and, boom, the error was gone.
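For illustration (our actual settings aren't reproduced here), a mirror override in ~/.m2/settings.xml looks roughly like this; a stale entry of this shape pointing at localhost would silently redirect every repository lookup:

```xml
<settings>
  <mirrors>
    <mirror>
      <id>artifactory</id>
      <!-- mirrorOf "*" intercepts all repositories declared in the POM -->
      <mirrorOf>*</mirrorOf>
      <url>http://{new repo host}/artifactory/repo</url>
    </mirror>
  </mirrors>
</settings>
```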

Thursday, March 10, 2011

Mac OSX Disable gconsync

I recently enabled syncing with Google Contacts in iTunes while syncing my iPhone. Ever since then I have been getting a nagging pop-up where a program called "gconsync" keeps asking me for a password to access the keychain. On my iPhone I have now set up my Google account as an Exchange account, which automatically syncs my contacts with Google, so I no longer need Google account syncing with iTunes. But even after disabling that, I still kept getting the gconsync keychain pop-up. Here's the way to disable it:

Open the Address Book app on the Mac: /Applications/Address Book.app
Go to Preferences -> Accounts -> uncheck the "Synchronize with Google" option

That should get rid of the popup.

Friday, March 04, 2011

JIRA Current Iteration Filter

The current version of JIRA we use does not provide an option to query by current iteration. Although JQL allows fairly fine-grained searching, it lacks a current-iteration option. One way they could add it is by comparing the iteration start and end dates to the current date. I set up various filters for the current iteration that search for issues without story points, subtasks with missing hours, etc. It was a pain to modify the filters every new iteration to specify the iteration number.

We installed a free JQL query extension that provides a versionList function.

Once installed, you can query the current iteration using the JQL below:

project = COLL and fixVersion in unreleasedVersions() and fixVersion in VersionList("Iteration ??")

To make the above work, I had to rename all future iterations to something like "Pre Iteration ...". That wasn't a big deal, as we only wanted to make sure all historic and current iterations were named correctly.

To query all issues that don't have a story point, one can use the query below and subscribe to a daily email to get alerts:

project = COLL and fixVersion in unreleasedVersions() and fixVersion in VersionList("Iteration ??") and type not in (Bug,Sub-task, "Support Ticket") and storyPoint = empty and (resolution in (Fixed,Completed) or status = Open)