Welcome to Cykod. We are a fully-integrated, self-funded web-development startup located in Boston, MA.

Cykod Web Development and Consulting Blog

Recovering a Website from Thin Air

Or, Undeleting Mysql MyISAM

Note: this post is from 2012 but I never got around to cleaning up and posting it.

Around 7:45 on the morning of a Thursday in 2012, I got the following piece of good news in my email:

URGENT: Website.com deleted their entire blog

Website.com (name changed) is a medium-sized niche news blog that averages somewhere in the range of 100k uniques a month and has around 25,000 posts. I had helped build some specific functionality for the site on top of a CMS I had built as a consultant in 2010. It's hosted on its own Linode instance. What had happened was that a user had been given a few too many permissions, had meandered into an admin screen she shouldn't have while trying to delete a post, had bravely clicked onwards, ignoring multiple warnings of doom and gloom, and had gone ahead and deleted the entire blog along with all 25,000 of its posts.

“Aw, Crap,” I thought, knowing the consultancy that had built it (my former client) had no one on staff that could pull the lost information out of backups and merge it in.

Logging on with SSH, I went to the config directory to find out where the machine was keeping its backups. I got a little scared when I didn't see the requisite backup.yml file, but figured I'd just have to restore the Linode to the state of the last full-machine backup, losing the past day's worth of forum activity and users' comments. My ex-client's client would be a little pissed, but as they weren't without blame in this either, it could all probably be smoothed over. Logging on to the Linode control panel, I noticed something that sent a palpable chill down my spine: a little link asking me if I wanted to enable backups on the instance.

There were no active backups.

The last site backup was from the middle of 2011, before it had been moved to its current server.

Fuck.

Nearly a year of the blog, 2,000 posts and thousands of man-hours of people’s lives, for all intents and purposes, were gone.

I relayed the news, explaining that there were no recent backups.

I was asked what could be done. “Well…” My first impulse was to state the obvious: “Nothing.”

The website and the data were gone. This wasn't exactly true though. Search engines and the archive.org crawler cache the full content of pages all the time. We didn't need any images, just the blog data. Provided we took down the website immediately, any cached copies would stick around long enough for us to grab them. The only problem was that we weren't talking about a 10-page brochure site. We were talking about thousands of posts in various categories and sections strewn throughout the site.

But suddenly, all didn't seem lost. My mood improved as I realized that there were additional fingerprints of the site's existence strewn about the server: logs, data files, caches, etc. Anything that might relate back to the blog could be assembled, one piece at a time, back into a cohesive set of data for the blog.

Getting permission to shut down Apache to prevent any search engine's caches from being updated, I scoured the server for anything that might be reconstituted back into blog data. At minimum, the rotating Rails logs for the previous 14 days were around, and any posts from that window could be recreated from the POST data in the log files, which was better than nothing.
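As a rough illustration of that idea (the "blog_post" parameter key and field names here are hypothetical stand-ins for whatever the app really used), a log-scraping pass might look something like this:

# Hypothetical sketch: Rails logs of that era record a "Parameters: {...}"
# line per request, printed with Hash#inspect. Eval'ing log data back into
# a Hash is crude, but fine for a one-off recovery of data you trust.
recovered = []
Dir.glob("log/production.log*").each do |path|
  File.foreach(path) do |line|
    next unless line =~ /Parameters: (\{.*\})\s*$/
    params = (eval($1) rescue next)        # inspect output evals back to a Hash
    post = params["blog_post"] or next     # key name is a guess
    recovered << { "title" => post["title"], "body" => post["body"] }
  end
end
puts "Pulled #{recovered.size} post submissions out of the logs"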

As I was working my way through the server I figured I might as well take a look at the MySQL data files and see what there was to see. I didn't know anything about how MySQL works internally, so I didn't have any idea what I would do with them in any case. What I saw stopped me in my tracks:

-rw-r----- 1 root root 13392 2012-03-22 12:14 blog_blogs.frm
-rw-r----- 1 root root 8714 2012-03-22 12:14 blog_categories.frm
-rw-r----- 1 root root 13312 2012-03-22 12:14 blog_post_revisions.frm
-rw-r----- 1 root root 232787772 2012-03-22 12:14 blog_post_revisions.MYD
-rw-r----- 1 root root 1608704 2012-03-22 12:14 blog_post_revisions.MYI
-rw-r----- 1 root root 8656 2012-03-22 12:14 blog_posts_categories.frm
-rw-r----- 1 root root 9008 2012-03-22 12:14 blog_posts.frm
-rw-r----- 1 root root 3385064 2012-03-22 12:14 blog_posts.MYD
-rw-r----- 1 root root 648192 2012-03-22 12:14 blog_posts.MYI
-rw-r----- 1 root root 8652 2012-03-22 12:14 blog_targets.frm

blog_post_revisions was the table where the damage had been done: it had been cleaned out completely, every record in it deleted off the face of the earth. If that was the case, why was it still nearly two hundred and fifty megabytes big? I did a quick:

$ more blog_post_revisions.MYD

Between the occasional angry beeps of ASCII BELL characters and various non-ASCII characters mucking with my terminal, there was a lot of stuff that looked exactly like the contents of the deleted blog posts. 232 megabytes of stuff, to be exact. blog_post_revisions is the table where Webiva stores a copy of the content every time the user presses 'Save' on a blog post. The most important data for the blog was still there, hanging out in the deleted-row netherland.

I stopped MySQL to prevent any additional data loss and copied all the data down to my local machine. Opening the file up in a hex editor confirmed that the majority of the data was there.

Running a myisamchk on the file gave me good news:

$ myisamchk -e blog_post_revisions.MYI
Checking MyISAM file: blog_post_revisions.MYI
Data records: 0 Deleted blocks: 70457
- check file-size
- check record delete-chain
- check key delete-chain
- check index reference
- check data record references index: 1
- check data record references index: 2
- check records and index references

70457 deleted blocks sounded promising: if the checker knew about the blocks, then there had to be a substantial amount of data still here. Now the questions became: how does MySQL actually store its data in binary format (for a MyISAM table in this specific case), and can MyISAM rows be undeleted?

A quick consultation of the Google was inconclusive: some people claimed recovery was possible but that there was no automated way to undelete data. This turned out to be true: because of the way it's stored, there was no automated way to undelete MyISAM data when the row format is Dynamic (as blog posts, which were rife with MEDIUMTEXT fields, most certainly were). If the table had been InnoDB, a company called Percona had released an open-source data recovery tool: https://launchpad.net/percona-data-recovery-tool-for-innodb. If the table had been a fixed-row MyISAM table, life would also have been easier. Unfortunately the table was MyISAM and Dynamic, and as such I wasn't able to find any tool that might help in recovery.

I did, however, come across this jewel:

http://dev.mysql.com/doc/internals/en/myisam.html

While this document may be outdated and has at least one inaccuracy that stymied me for a bit (specifically, DATETIMEs aren't stored as two 32-bit integers, but rather one 64-bit integer in my version of MySQL), it proved to be the keystone of the whole operation. It describes the data format of MySQL's MyISAM files from beginning to end and explained exactly why a universal undelete was not possible.
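To make that DATETIME detail concrete: in pre-5.6 MySQL the value is the packed decimal number YYYYMMDDHHMMSS stored as one 64-bit integer. A sketch of decoding it, assuming little-endian byte order (which is what I'd expect from an x86 build, but verify against a known row):

# Decode a pre-5.6 DATETIME: the packed number YYYYMMDDHHMMSS stored
# as a single 64-bit integer. The byte order here is an assumption.
def decode_datetime(bytes8)
  v = bytes8.unpack("Q<").first          # unsigned 64-bit, little-endian
  sec  = v % 100; v /= 100
  min  = v % 100; v /= 100
  hour = v % 100; v /= 100
  day  = v % 100; v /= 100
  mon  = v % 100; v /= 100
  Time.utc(v, mon, day, hour, min, sec)
end

raw = [20120322121400].pack("Q<")        # what 2012-03-22 12:14:00 looks like on disk
decode_datetime(raw)                     # => 2012-03-22 12:14:00 UTC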

A MyISAM table consists of three separate files on the file system: a .MYD (MySQL Data), a .MYI (MySQL Index), and a .frm (Format) file. The format file describes the columns in the table. For this case I only cared about the .MYD file, as the structure of the table was still intact. The section in this page on the layout of the record storage frame explained exactly why pulling the data out of the deleted rows was not going to be straightforward:

http://dev.mysql.com/doc/internals/en/myisam-dynamic-data-file-layout.html

Most of the blog posts would be well under 64k, which means they would fit in a "small" record with a header length of only 3 bytes. Deleted blocks, on the other hand, overwrite the beginning of each record block with a 20-byte header. This means that at least 17 bytes of data are missing from the start of each deleted record. This wouldn't be a huge deal except for the fact that those 17 bytes include the bitmap that defines which fields are NULL or empty and which have content. If you have a table with a lot of fields that might or might not have data in them, you're going to wish you knew which fields were represented in the data file and which weren't.
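My reading of the internals doc is that the deleted-block header breaks down as 1 byte of block type (0 for deleted), a 3-byte block length, and then two 8-byte file positions chaining to the next and previous deleted blocks, with everything past byte 3 having stomped on live record data. A sketch of reading one (the byte order and the assumption that the stored length covers the whole block are mine to verify, not gospel):

# Sketch of the 20-byte deleted-block header: type (1 byte), length
# (3 bytes, high byte first per my reading of the doc), then next/prev
# deleted-block file positions (8 bytes each) overwriting record data.
def read_deleted_block(f)
  header = f.read(20)
  type   = header.getbyte(0)                          # 0 == deleted
  length = (header.getbyte(1) << 16) |
           (header.getbyte(2) << 8)  | header.getbyte(3)
  { type: type, length: length,
    remains: f.read(length - 20) }  # whatever survived, if length covers the block
end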

Since the blog_post_revisions table is mostly a write-only table, there was a significant bonus: most of the records were originally type 1/01, which means they were full small records. Tables with a lot of deletions followed by writes would have been much more painful to handle, as they would have been filled with partial records continued in other parts of the file (see types 5/05, 7/07 and 11/0B in the link above) that would have been unparsable, because the details pointing to the next record would have been overwritten.

The table in question, blog_post_revisions looked like this:

1 id int(11)
2 blog_post_id int(11)
3 title varchar(255)
4 domain_file_id int(11)
5 status varchar(255)
6 keywords varchar(255)
7 preview mediumtext
8 preview_html mediumtext
9 body mediumtext
10 body_html mediumtext
11 author varchar(255)
12 created_at datetime
13 updated_at datetime
14 end_user_id int(11)
15 media_file_id int(11)
16 embedded_media text
17 preview_title varchar(255)

As you can see, the table has a number of fixed-length fields (an int(11), for example, always takes up 4 bytes) along with variable-length fields: varchar(255), mediumtext, and text. Variable-length fields are prefixed with a length field whose size depends on the size of the field. varchar(255) fields have, as one would expect, 1 byte of length data. mediumtext has 3. text has 2. Except, as mentioned above, when the fields are empty. Then they just aren't there (information that would be recorded in the bitmap, but was unfortunately overwritten by the too-large deleted-row header). As you can imagine, this makes life more interesting when you're trying to blunder your way through the file and extract useful data.
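In sketch form, reading those variable-length fields looks something like this (the multi-byte byte order is an assumption I'd confirm against a known record before trusting it):

# Read the variable-length fields: 1 length byte for varchar(255),
# 3 for mediumtext (2 for text, same idea). High byte first here --
# confirm against real data before believing it.
def read_varchar255(f)
  f.read(f.getbyte)
end

def read_mediumtext(f)
  b = f.read(3).bytes
  f.read((b[0] << 16) | (b[1] << 8) | b[2])
end
# The catch: a NULL or empty field has no prefix and no bytes at all,
# and the bitmap that said so is gone -- so these reads only line up
# when you already know which fields are present.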

Looking through the data with my trusty hex editor, I realized that this was going to be a challenge and was going to rely significantly on the actual data stored in the table: which fields were normally filled in and which weren't. The first issue was to orient myself on an individual record. Achieving this turned out to be a fortunate coincidence. The only records we cared about restoring were ones that had a status of 'active'. This string, combined with its length prefix of 6 (resulting in the hex string "06 61 63 74 69 76 65"), was specific enough (having only about a 1 in 2^56 chance of appearing randomly) to be an anchoring point for the recovery.

My plan of attack involved finding that specific hex string in the file, then going backwards 4 bytes to save the domain_file_id (the image used in the blog post) if present, then going backwards an unknown number of bytes to save the title of the post. Doing those two things was a bit tricky, as I didn't know a priori whether the post had an image, and the first two characters of the title, along with the byte that held the length of the title itself, were chopped off by the extended deleted header. The only solution was to look at the four bytes preceding the status and, if any of them were non-alphanumeric ASCII characters, assume we had a file id (the title, after all, would be all valid ASCII alphanumeric characters). From there the path would continue backwards until the first non-alphanumeric character was reached. After which I chopped two characters off the front of the title for good measure to prevent any coincidental letters from getting into the title (as the title was already incomplete, it wasn't a big deal if we lost a couple more characters).
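A stripped-down sketch of that backwards walk (the printable-ASCII test here loosens the alphanumeric check described above, and edge cases are ignored; treat it as an illustration, not the production script):

# Hunt for each 0x06 + "active" anchor, then walk backwards past an
# optional 4-byte domain_file_id to recover what's left of the title.
ANCHOR = "\x06active".b

data = File.binread("blog_post_revisions.MYD")
pos = 0
while (hit = data.index(ANCHOR, pos))
  cursor = hit
  # Non-printable bytes just before the status? Treat them as a packed
  # int domain_file_id and step over it.
  id_bytes = data[cursor - 4, 4].bytes
  cursor -= 4 if id_bytes.any? { |b| b < 0x20 || b > 0x7e }
  # Keep stepping back while we still see printable text: the title.
  title_end = cursor
  cursor -= 1 while cursor > 0 && data.getbyte(cursor - 1).between?(0x20, 0x7e)
  title = data[cursor...title_end][2..-1]   # trim 2 chars: the header ate the start
  puts title
  pos = hit + ANCHOR.size
end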

Going forward from the status turned out to be a bit easier, as the preview, preview_html, body and body_html of each post would always begin with an HTML tag starting with "<" (an artifact of the WYSIWYG editor Webiva uses and the glorious consistency of the bloggers writing the posts). This meant that unless one of the fields happened to be exactly 0x3C characters long, making its length prefix masquerade as a "<", the rest of the data was pretty manageable.

This technique was extended to another table containing a search index with the URL of the post along with a slightly less incomplete version of the title. Merging those two records, then running the title through the Bing API, pulling down the cached version of the first result that matched the site's URL, and using the Ruby Nokogiri library to pull out various pieces, filled in some more of the data.
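The scraping half of that, in sketch form. The bing_cached_page helper is hypothetical (the 2012-era Bing API this leaned on is long gone), and the CSS selectors are stand-ins for whatever the site's real markup used; the Nokogiri part is the true shape of it:

require "nokogiri"

# bing_cached_page(title, url) is a hypothetical helper returning the
# HTML of a cached copy of the post, or nil when no cached copy exists.
def recover_from_cache(title, url)
  html = bing_cached_page(title, url) or return nil
  doc  = Nokogiri::HTML(html)
  { title: doc.at_css("h1.post-title")&.text,      # selectors are site-specific guesses
    body:  doc.at_css("div.post-body")&.inner_html }
end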

In the end there wasn't a silver bullet. Recovering all the data required a combination of parsing the blog_post_revisions.MYD file, parsing the search index content_node_values.MYD file, parsing the production.log files for any missing URLs, and finally hitting the Bing API for cached versions of pages when all of that failed to pull the user data. I lost that Thursday, Friday and then the entire weekend that followed mixing and matching data where I could to piece together the content for the site. But in the end, what I originally thought was going to be impossible turned out to be entirely doable with a little bit of metaphorical elbow grease.

This post will end the same way as the many other recountings of harrowing tales of partially-avoided data loss tend to end: reminding you not to suffer the same fate as I did, and to double-check your backups right now. Even if it's not a machine you're employed to maintain, if you are going to be the one recovering the data, go ahead, log on and double-check your backups. You'll thank me later.

For those of you who found this post and for whom it's too late, know that with a little bit of luck you probably have more of the data for the lost site than you think; you'll just need to look at all your options for pulling the site out of the cloud and the MySQL-binary-format badlands.

When it was all said and done, we were able to get all but a few of the 2,000 lost posts back, and I lost out on a gorgeous weekend here in Boston (ok, a gorgeous Saturday; Sunday was cold and rainy) that would have been much better spent in ways other than traipsing around the specifics of MySQL's binary format and violating the Bing API's terms of use.

P.S. The Bing API actually Rocks. Microsoft has done what Google has specifically outlawed users from doing: hitting their API to generate custom search results for users on the server side. Who would have guessed 5 years ago that Google would now habitually play the role of restrictive, privacy violating bad guys, while Microsoft would come to the rescue as API saviors?

Posted Wednesday, Nov 13 2013 05:58 PM by Pascal Rettig

HTML5 Gaming: the sound of Inevitability

[X-post from HTML5GameDevelopment.org]

There's a scene in the original Matrix near the end where one of the Agents is holding Neo in front of a train and says: "Do you hear that sound Mr. Anderson? That is the sound of inevitability."

With the cacophony of discussion over whether HTML5 is ready or not as a gaming platform, that phrase always pops into my mind. (Let's ignore, for the sake of the bad metaphor, that the Agent was wrong.)

2012 will most likely be a gap year in HTML5 gaming. It's not there yet, primarily because on mobile, HTML5's sweet spot, the platform is still underpowered for games. But I'd guess that at some point late in the year, when hardware-accelerated canvas has reached a crucial level of market penetration, there's going to be a game that captivates the masses that could only have been written in HTML5. Much like MineCraft gave indie developers everywhere a shot in the arm, this game will do the same for HTML5.

But that is actually beside the point. It doesn't matter if there is never a game that does that. The die has been cast and the tea leaves have been read. The reason for the inevitable success of HTML5 gaming over competing platforms is access. Everyone in the world with an electronic device has access to an HTML5 game development environment (the browser) with a full debugger (Dev Tools, Firebug) already installed on their desktop. Throw in a text editor and you have a world-class IDE.

A discussion I had with some colleagues at a meetup recently drove this point home. Every one of them had gotten into programming by developing games. Out of 5 people in the conversation, every single one had the same story. 

The next generation of programmers (and game programmers) is already being weaned on the most accessible programming environment at their disposal: JavaScript. Online courses like Codecademy teach JavaScript because there's an interpreter already built into every browser. Programmers in training aren't going to buy a $500 program (Flash) or download gigabytes of IDEs (Visual Studio, XCode) to learn a development environment that's more difficult with less instant gratification unless they have to.

As soon as HTML5 gaming as a platform is "Good Enough" for whatever it is you want to build (which it is dangerously close to being), the battle is going to be over. Look at the success of JavaScript versus all the alternatives that have come and gone. Simplicity and availability win the day.

When that next generation starts building indie games and starts entering the workforce, they are going to come pre-packaged as HTML5 Game Developers and have their say on the next generation of games being developed.

It may be in a year or so or it may be a few, but it's a train that's a coming.

 

Posted Friday, Jan 27 2012 01:40 PM by Pascal Rettig

How to speak Internet.

Alternative title: Internet speak for the over 50.

TL;DR - drop the "The" when talking about Internet sites, and know what memes are.

Any work group or subculture has its own language that is unique to the group. Architects, computer scientists, construction workers, and pretty much any other profession all have unique sayings and phrases that sound strange to people outside the group. People who spend a lot of time on the Internet are no different. The only difference is that the Internet, in its ubiquity, tends to spread its tentacle-like vocabulary to the rest of the world. You can't escape Twitter, Facebook or Google, even if you're just a regular person in another field trying to get through the day.

So, despite your best efforts you'll occasionally be called on to speak about Internet related things, whether it be by your colleagues or your children. If this is the case and you're not an Internet illuminati, it's important to keep a level head and remember a few basic rules.

1. Drop the "The"

The first mistake people make is to treat Internet websites as if they were one of a franchise of locations by putting definite articles in front of them. Don't do this. There is only one Twitter; you don't want to preface it with "The" as if you were talking about a specific one.

Compare the following:

I'm going to Starbucks
I'm going to the Starbucks.

The latter only makes sense if you are referring to a particular one of many (in this case there are many Starbucks locations) and the person you are talking to should know which one you mean. The second one is basically an abbreviation for something like:

I'm going to the Starbucks around the corner.

Putting "The" in front of something like "Twitter" or "Google" sounds wrong. It's like you're saying:

I'm going to the Twitter over there on the weird Internet thingy.

It's pretty much a clear giveaway that you're not comfortable with the ways of the Internet.

Let's try a few examples out:

Yes: Do you have a Tumblr blog?
No: Do you have a blog on the Tumblr?

Yes: I'm listening to Spotify
No: I'm listening to the Spotify

Yes: Are you on Twitter?
No: Are you on the Twitter?

Yes: I'm on Facebook
No: I'm on the Facebook

Facebook is an especially challenging example because the site actually was "TheFacebook.com" when it launched, so referring to it as The Facebook back in 2004 was perfectly acceptable, but that ship has long sailed. It's Facebook. Not "The Facebook."

Overcome the urge to add in "The" and you'll be happy.

Note: the same applies to using demonstratives like "That Twitter" or "This Facebook Thing." Don't do it. You'll sound much more current and will draw far less attention to yourself if you just go with the name of the website or service and leave the rest off.

The Internet is an exception to this rule. It should almost always be prefaced with "The" when used as a noun. 

Wrong: I'm going to surf Internet
Right: I'm going to surf the Internet

If you are feeling really confident in your newfound ability to speak Internet, you can opt to occasionally go for a rather advanced combo: intentionally add the "The" back in and pluralize your noun. Here's an example:

I haven't seen you on the Twitters lately.

Note: this is an area you'll want to tread carefully on, as it must be clear you're making a joke and being intentionally ironic or people will miss the point.

2. Know your verbs

The second major point is to know your verbs. Just like you wouldn't say "I'm going to Car myself to Starbucks," it's important to know when a verb has entered common usage.

Most of the time that verb will match the name of the company, but sometimes a different verb is used, and sometimes a company has not reached the level of ubiquity where using its name as a verb sounds right. It's a fine line and you'll want to tread carefully. There is also a specificity issue: when the verb is specific enough you can drop the company name completely; when it's not, you'll want to keep it.

Wrong: Did you search Google for it?
Right: Did you Google it? (To Google now means "to search")
Wrong: Did you post something?
Right: Did you post something on Facebook? (not specific enough, unless FB is already in discussion)
Wrong: Do you Tweet on Twitter?
Right: Do you Tweet? (Twitter is the only place you can Tweet)

Sometimes it's less clear. For example Foursquare is wildly popular among a subset. Which of the following to use depends on context:

Did you check-in?
Did you check-in on Foursquare? 

Here's a general table of websites and actions relating to them:

Website | Action | Means | Company name? | Example
Facebook | Post | Post content | Yes | You should post that to Facebook
Facebook | Like | Like something from another website | No | Did you Like the band's page?
Facebook | Friend | Add someone as a friend | No | Did you Friend Joanne?
Facebook | Unfriend | Remove someone as a friend | No | I unfriended Steven
Google | Google | To search for something | No | I don't know, Google it
Google | Plus One | Click the +1 on something (similar to Like) | No | Make sure you Plus One our website
Google | Share | Share something on Google+ | Yes | Did you share that on Google+?
Google | Add to a Circle | Add someone to your Google+ account so you see their posts | No | You should add them to your Circles
Twitter | Tweet | Post a Tweet | No | Did you Tweet about it?
Twitter | Follow | Click Follow & add them to your stream | No* | Make sure you follow them
Twitter | Unfollow | Remove someone from your stream | No* | They Tweeted too much, I unfollowed them
Foursquare | Check-in | Mark your current location | No* | Did you Check-in to the concert?
Tumblr | Post | Write a blog post | Yes | You should post that to your Tumblr
Wikipedia | Wikipedia | To look something up on Wikipedia | No | Let's Wikipedia it
IMDB | IMDB | To look something up on IMDB.com | No | I can't remember the actor's name - IMDB him
Blog | Write a Post | To write a blog post | N/A | I wrote a blog post today

 

Memes

The last frontier of Internet-speak citizenship is the understanding of Memes.

Memes are the shared jokes of the Internet that pop up on a regular basis and are usually beaten into the ground just as quickly through overuse. Some memes stick around and stand the test of time (like Chuck Norris) but most come and go rather quickly. If you are really interested, check out http://knowyourmeme.com. In general there are far too many to keep track of unless you spend your entire life on 4chan, so here are a few tricks to avoid looking dumb around memes.

The first trick is to make sure you are pronouncing the word correctly. Take a look at this video if you need help.

The next is to recognize when memes are being discussed. The general giveaway is when Internet hipster folks are laughing at stuff that isn't funny or doesn't make sense. Usually they are laughing because of shared knowledge of some meme. Don't laugh along if you haven't heard of the meme, as you may get lost and called out in the ensuing conversation; just nonchalantly say something like "Oh, is that a meme?" or, if you're sure it's a meme, "Oh, I haven't heard of that one." Since the Internet moves quickly you won't be expected to know all the current memes, and someone will jump at the opportunity to explain, in most likely overly tedious detail, where the meme popped up from and why you should think it's funny.

Did you know? The TL;DR at the top of this post is a Meme that got started by lazy internet folks who are upset with having to read more than a few sentences. It's short for "Too Long; didn't read." It's now sometimes used to provide a summary for those same folks who don't have attention spans that last more than a few paragraphs.

There you have it - all you need to not flop around like a fish out of water when things on the Internet are being discussed.

Posted Wednesday, Jan 25 2012 10:05 AM by Pascal Rettig

Class Slides from Web Design 1 - Fall 2011

Martha and I taught Web Design 1 this past fall semester at MassArt and used CoderDeck for creating interactive, runnable slides. We had the students all use GitHub (specifically GitHub for Mac) to push their code to the Web using GitHub Pages, which despite the occasional crash worked out pretty well.

We're releasing the class slides under the Creative Commons BY-NC-SA license.

Use the arrow keys to go between slides. There's a master list of links to each class, reproduced below:

 

We're both teaching again this spring; any feedback or suggestions are very welcome.

Posted Monday, Jan 02 2012 11:31 AM by Pascal Rettig

Discover your non-testing Goat w/ Git blame and RCov

I wrote a quick n' dirty script for fun last night to see how we were doing on test coverage and who was responsible for the most untested code in the git repo. The idea behind the script is to take the output of RCov and line it up with the output of `git blame`, then track who's responsible for each uncovered line. It's ugly code, but it was fun to pull together a couple of different pieces. Here's a sample modified coverage/ file (with the commit and person responsible for the code added on the left):

Gist below.
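In sketch form (this is an illustration, not the original gist), the idea is: assume you've already flattened RCov's report into path:line pairs for each uncovered line (how you get those depends on your RCov output format), then blame each one and tally by author:

# Tally untested lines per author by blaming each uncovered line.
blame_counts = Hash.new(0)

File.foreach("uncovered_lines.txt") do |entry|
  path, lineno = entry.strip.split(":")
  # -L limits blame to that one line; --line-porcelain is easy to parse
  blame  = `git blame -L #{lineno},#{lineno} --line-porcelain #{path}`
  author = blame[/^author (.+)$/, 1]
  blame_counts[author] += 1 if author
end

blame_counts.sort_by { |_, n| -n }.each do |author, n|
  puts "#{author}: #{n} untested lines"
end

The goat is whoever tops the list.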

I unfortunately discovered I was the goat.




Posted Tuesday, Nov 29 2011 09:48 AM by Pascal Rettig | Consulting, Development

Gumptionology for developers

Having just finished "Zen and the Art of Motorcycle Maintenance," I find that the word "Gumption" has been rattling around my brain for the past week, sneaking its way into my mindset as I go about my daily tasks.

"Zen and the Art of Motorcycle Maintenance," (ZAMM)  subtitled "An enquiry into values" was recommended to me as a great "Programmer book" - not a "Programming book", but rather one of the those more intangibly beneficial books that effect pause and reflection in the normally full-speed-ahead world of software development. A book that couldn't be further from the world of development yet somehow come across referenced in a bibliography or quoted in a presentation every couple of months. 

So I took the bait and jumped in, and really had no idea what hit me. I found myself sneaking in reading to get through the book - before work in the mornings - waiting in line - with an enthusiasm normally saved for Neal Stephenson and Douglas Adams.

The central focus of the book is the concept of "Quality" as a unifying principle for the world. Heady stuff. I'll leave the discussion of "Quality" in software development to more intelligent people than I, but there was one distinctly concrete idea that Pirsig dropped in near the end of ZAMM: the concept of Gumption and Gumption traps.

Of Gumption, Pirsig says:

 "I like it because it describes exactly what happens to someone who connects with Quality. He gets filled with gumption. The greeks called it enthousiasmos, the root of "enthusiasm," which means literally "filled with Theos," or God, or Quality. See how that fits? ( Loc. 4928 )

The programmer with an ample supply of gumption is easy to spot - he's the one for whom no problem is a boring problem, because we as developers can create our own gumption by solving boring problems in novel ways. Yes, tying together a Rube Goldberg machine of Unix pipes is probably not the best way to solve a 10-minute data entry problem, but if it takes 9 minutes and lets me learn more about AWK, it should get counted in the win column. (You can take this way too far, but good devs skirt the line between making boring problems interesting and making problems out of non-problems.)

But Gumption is easy to lose. When the job becomes rote and there's not a creative spark left in the daily grind, work suffers. Life suffers. And developers will either die a little inside or spend that creative spark on side projects and leave nothing for the 9-5.

So in the same way Pirsig described the Gumption traps that hinder motorcycle maintenance, here are the programmer Gumption traps I've come across (imagine a PHB saying the following to get in the spirit):

"This is the way we do it" - There's nothing quite as dehabilitating as being told, when proposing a novel solution to a problem that "this is the way we do it, this is the way we've always done it, this is the way you're going to do it" It literally can deflate the life out of a developer in two seconds flat. That doesn't mean every programming-language or framework-related whim should be catered to, but it does mean that people making the technical decisions in a company should have the technical knowledge to make them and be able to make them on solid technical grounds. Nothing kills creativity quite like being told that whatever flawed methodology that last guy in your position came up with during a caffeine induced epiphany is one of the 10 commandments of company methodology. Not allowing any discussion on the methodologies a company uses is a sign that the company isn't doing things for the most technically sound reasons.

"Implement this UML" - As I've written about before Implementation matters - it's not just a matter of taking some designs a software architect wrote up and just writing the simple code that makes it actually run. Real software doesn't work like that, and code implemented to fullfill the requirements of waterfall design usually jump through a number of hoops to account for design flaws that could have been weeded out with a few feedback loop driven iterations.  

 "This is the best hardware we can afford" - Programmers are expensive, and giving them sub-standard hardware (or less than 2 monitors) makes absolutely no sense from both a financial and a gumption perspective. If your developers are sitting around waiting for tests to run or libraries to compile, they aren't working. There's a certain length of time that sites as the cut-off time for the "flow" that transcendent state where the developer and machine becomes one. If you're machine response time is less than that magic number, you'll be able to keep razor focused on that task at hand. If it's above that line, your hand will subconsciously hit Ctrl-t, bring up a new tab and type in slashdot or reddit.com.

Keeping your Gumption, being allowed to bring something of Quality into the world, is one of the greatest feelings there is. If you're working for yourself or managing developers, make the world a little bit of a better place by paying attention to the Gumption traps that you're laying for yourself and your developers.

Posted Saturday, Oct 15 2011 03:14 PM by Pascal Rettig

Using Node.js and your phone to control a Browser game

This past week I undertook a pretty cool project as the intern here at Cykod. We were wondering how easily a smart phone (specifically its gyroscopes and accelerometers) could be used as a controller for a multi-player game on a larger screen. With a bit of Node.js and HTML5 magic, it turned out to be pretty simple.

Concept

We want to use a desktop (laptop, iPad, etc. - something with a bigger screen that multiple players can easily look at) connection to act as the common game space. Once that screen is initialized, each player connects to a specific URL in their phone browser that links them to that game instance. We'll follow this basic outline:

  • Register new connections to the server and decide if it is a room or mobile user:
  • Create a new room,
  • Or add the connection to an existing room
  • Constantly poll the mobile device for orientation data
  • Use said data to update the HTML5 Canvas game
  • Handle dropped connections appropriately

Result

The proof-of-concept full game is up at http://bit.ly/G4LSpaceWords  

 

The Technology

Node.js
Node.js is what makes this project possible. Built on Google's V8 Javascript Engine, Node.js is a server environment written in -wait for it- JavaScript. I started this project with zero knowledge of writing a server or what it would take, and Node made it super easy. Unfortunately, because the Node.js project is growing so quickly, up-to-date documentation with current-version examples can be lacking.

Socket.io
A Node.js module, Socket.io adds multiple levels of socket support, accommodating nearly every browser. This allows Node.js to quickly communicate with the browser similar to the way AJAX would, but with less overhead. Socket.io originally didn't support the updated WebSocket spec deployed in the latest Chrome and Firefox, falling back to XHR long polling; that spec is now supported in the newest version of Socket.io, and controller performance is noticeably better for it.

Mobile Phone Orientation
Nearly all smart phones on the market have some sort of accelerometer or gyroscope in them. The phone parses this information and makes it accessible in the browser. The HTML5 DeviceOrientation and DeviceMotion events allow us to take advantage of this. You can read more about it at HTML5 Rocks. (Fun fact: the native Android browser does not support access to this data. A third-party browser like Firefox is needed.)

Building the server

The actions performed by the mobile phone and desktop will be completely independent of each other, communicating only through the server (app.js). The best way to do this is through two different html files. We could embed all the code into one large file, but that would overly complicate our simple proof of concept. Create index.html (for the desktop), and mobile.html (for the phone).

Room Structure
Each game instance needs a desktop screen and at least one mobile device to communicate. We don't want any outside interference, so we'll create a new game instance for each desktop screen. We'll refer to a game instance as a room. Each room contains a desktop and an unknown number of mobile connections. For clarity's sake (and to help distinguish between different rooms), we'll also include a room id. Room Structure:

App.js

//An array to store the existing rooms
var rooms = [];
function room(roomSocket, roomId){
  this.roomSocket = roomSocket;  //Stores the socket for the desktop connection
  this.roomId = roomId;          //The room id/name. A unique string that links desktop to mobile
  this.mobileSockets = [];       //A list of all the mobile connections
};

Definition: A socket is how we refer to a single connection. So for every device connected to the server, a socket is created. Node.js and Socket.io streamline this process, automatically creating and destroying sockets as needed, which greatly simplifies our room management.
 

New Connections
When a new socket is created (anytime a user visits index.html or mobile.html), we want to notify app.js so it can either:

A) Create a new room (if index.html is sending the data),

index.html

//This sends the signal 'new room' to the server, along with data containing the room id.
//For live deployment this should be a random string - see the full documentation for an example.
socket.emit('new room', { room: "lolcats" });

 

app.js

//The receiving end of 'new room'
socket.on("new room", function(data){
  //Pushes a new room instance, storing the desktop connection as the roomSocket and data.room ("lolcats") as the roomId
  rooms.push(new room(socket, data.room));
});

Awesome! So we have a new room object being created and stored each time a new desktop connection is established. For efficiency's sake, we want to monitor any lost connections and delete those rooms, but we'll get into that shortly.

 

B) Or add a user to a room instance (if mobile.html is sending the data)

mobile.html

socket.emit('connect mobile', { room: getUrlVars()["id"]}, function(data){
  if(data.registered === true){
    registered = true;
  }else{
    $('#error').append(data.error);
  }
});

Similar to a new desktop connection, mobile.html sends an emit('connect mobile') to app.js. We also pass along the id parameter from the URL (mobile.html?id=RoomName) to specify which room this mobile user should belong to. Lastly, a callback function informs the mobile user that they have connected successfully, and can now transmit data.

app.js

socket.on("connect mobile", function(data, fn){
  var desktopRoom = null;

  //Cycle through all the rooms and find the room with the same roomId as our mobile device
  for(var i = 0; i < rooms.length; i++){
    if(rooms[i].roomId == data.room){
      desktopRoom = i;
    }
  }

  if(desktopRoom !== null){
    rooms[desktopRoom].mobileSockets.push(socket);

    //Store the position of our room that this mobile device belongs to
    socket.set('roomi', desktopRoom, function(){})

    //Return the callback as true
    fn({registered: true});

    //Access the room that this socket belongs to, and emit directly to index.html to 'add user' with the socketId as a unique identifier.
    rooms[socket.store.data.roomi].roomSocket.emit('add user', socket.id, data);
  }else{
    //Callback returns false with an error
    fn({registered: false, error: "No live desktop connection found"});
  }
});

The server searches through the array of rooms to locate the correct one. Once we've identified the room by its name, its index is saved in desktopRoom. After double-checking against a null value to ensure we have located a room, the mobile socket is pushed into that room's mobileSockets array. The socket.set method is then used to store data directly in the socket: we save the room's position in the array. With this value, we can easily access the appropriate room without having to search the array each time. The callback is returned as true if successful, or false with an error message.

Lost connections
But what happens when we lose a connection? Socket.io has a built in 'disconnect' function that is called when a socket disconnects. We start by testing for the existence of socket.store.data.roomi. Because we only set that value for the mobile connections, we know instantly the type of connection.

app.js

socket.on("disconnect", function(){
  var destroyThis = null;
  i
f(typeof socket.store.data.roomi == 'undefined'){

    //The lost socket is a room

    //Search through all the rooms and remove the socket which matches our disconnected id
    for(var i in rooms){
      if(rooms[i].roomSocket.id == socket.id){
        destroyThis = rooms[i];
      }
    }

    if(destroyThis !== null){ rooms.splice(destroyThis, 1);}

  }else{
    //The lost socket is a mobile connection

    //Sort through the mobile sockets for that particular room, and remove accordingly
    var roomId = socket.store.data.roomi;
    for(var i in rooms[roomId].mobileSockets){
      if(rooms[roomId].mobileSockets[i] == socket){
        destroyThis = i;
      }
    }

    if(destroyThis !== null){rooms[roomId].mobileSockets.splice(destroyThis, 1);}

    //alert the room that this user was a member of
    rooms[roomId].roomSocket.emit('remove user', socket.id);
  }
});

We now have a fully functioning Node.js server that handles all connections and disconnections, and stores our data in easy-to-parse room structures.

Updating the Tilt Data

With the connection established, we can easily send tilt data from the mobile to desktop side. This is covered in much better detail over at HTML5 Rocks, so we'll skip to the Node stuff.

mobile.html

function deviceOrientationHandler(tiltLR, tiltFB, dir, motionUD) {
  if(registered){
    socket.emit('update movement', { tilt_LR: Math.round(tiltLR), tilt_FB: Math.round(tiltFB)});
  }
}

//Wire the handler up to the browser's orientation events (gamma = left/right tilt, beta = front/back)
window.addEventListener('deviceorientation', function(e) {
  deviceOrientationHandler(e.gamma, e.beta, e.alpha, null);
}, false);

This function will run every time the phone gets new tilt data. We're really only interested in tiltLR (left-right) and tiltFB (front-back). Read through the HTML5 Rocks article for more info. app.js receives this data and immediately forwards it along to the desktop corresponding to our mobile device.

app.js

//Update the position
socket.on("update movement", function(data){
  if(typeof socket.store.data.roomi !== 'undefined'){
    if(typeof rooms[socket.store.data.roomi] !== 'undefined'){
      rooms[socket.store.data.roomi].roomSocket.emit('update position', socket.id, data);
    }
  }
});

Because mobile.html transmits data only after a connection is established, it does no error checking to make sure its index.html counterpart still exists. Our 'update movement' handler performs this check to ensure the desktop connection still exists, then emits the data directly to the correct room in the 'update position' signal.

index.html

socket.on('update position', function(socketId, data){
  //Move the player entity tied to socketId using the fresh tilt data
});

The server will then signal index.html and trigger the socket.on('update position') callback. This passes all our tilt data to the desktop client, leaving the world wide open for awesome canvas implementations.

I won't be going into the canvas game aspects, but the example code and a basic version of canvas game play are available on Github. You can also see our language-learning game implementation, Super Space Words.

Posted Wednesday, Aug 24 2011 09:15 PM by James Burke | Development, HTML5 | Node.js, Socket.io

How To: Social Plugins

 

[ This is a Guest Post from Cykod Intern James Burke ]


This is a follow-up post to the talk I gave recently on Social Plugins and Widgets at the Boston Front End Developers Meetup.

Social Plugins are still a fairly new concept. While widgets that track visitors or display RSS feeds have been done to death, Facebook breathed new life into social sharing with their Facebook Platform and Social Graph. Since its debut, Facebook has been storing user information and building a huge network of data. With the debut of Social Plugins, any site can take advantage of the social giant Facebook has become.

 

Facebook’s Open Graph Protocol:

Facebook's Social Graph is accessible through their aptly named Graph API. Nearly all public Facebook data is available through this graph: people, pages, events, photo galleries, etc. The ability to see this information is a very cool and powerful way to do analytics research.

We're going to use this platform to socialize our websites. In order to properly integrate our web pages, we need to turn our content into a form that Facebook can recognize. We declare this information as metadata in the page header. There are a bunch of properties to use, but Facebook requires these four specifically:

  • og:title - What your content is about.
  • og:type - What object type your content is.
  • og:url - The URL for this object, usually dynamically generated.
  • og:image - An image that Facebook will utilize in their graph and other social actions.

 

So in my example for Cykod.com, the Open Graph tags would look like this:

<meta property="og:title" content="Cykod"/>
<meta property="og:type" content="website"/>
<meta property="og:url" content="http://cykod.com/"/>
<meta property="og:image" content="http://cykod.com/system/storage/3/11/logo.png"/>
<meta property="og:description" content="A forward thinking, Rails-based web start-up looking to change the way you interact and play with the web."/>

You can take a shortcut for your Open Graph tags by skipping to step two on the Like plugin page.

 

Like/Share Button:

This is a really simple one to implement. Just follow along on the Facebook documentation.

There’s a form to help you out. Enter in the content you like, and compare to the live preview.

Like/Share button options:

  • URL to Like - Leaving this blank will default it to the current page (for XFBML only, but we’ll get more into that in a second).
  • Send Button - Once again, XFBML only. It allows the user to specifically target who they want this page to be 'sent' to. Clever name, right?
  • Layout Style - Each is a little different. Play with different settings and the live preview to decide what fits best.
  • Width - Based on your site layout.
  • Show Faces - The option to display small thumbnail images of the user’s friends when they see the like button.
  • Verb to display - “Like” or “Recommend”
  • Color Scheme and Font - Again, based on your layout.

XFBML or IFrame?

 eXtended Facebook Markup Language is Facebook’s very own specialized XML markup to quickly add these plugins.

IFrames are HTML tags that allow other pages to be embedded as a frame within another page. These are quick and easy to implement, but carry accessibility and SEO concerns.

In my opinion, XFBML is a cleaner approach with more options. It gives you —the developer— more control over the code.

 

Comments:

The comments plugin is pretty cool. In just a few minutes, your entire site can be up and running with a fully-fledged comment and moderation system without worrying about login/registering users.

Similar to the Like button we just covered, Facebook offers a nice form that generates the code for us.

After entering your site URL, your code will look like this:

<div id="fb-root"></div>
<script src="http://connect.facebook.net/en_US/all.js#xfbml=1"></script>
<fb:comments href="example.com" num_posts="2" width="500"></fb:comments>

  • We specify a div with the id "fb-root". This is necessary for running the JavaScript SDK (i.e. any XFBML we use). If you're using more than one plugin, it would be wise to consolidate the location of this tag.
  • The necessary JavaScript is pulled directly from Facebook.
  • A custom XML based tag is where our comment box will be inserted. Check out the additional parameters in the documentation for more customization.

Special Note: If you want each page on your site to have a different comment box (versus a site-wide comment thread), then you'll want to replace the URL found in the href attribute. Using your favorite server-side language, print out the URL of the current page. For example, in CodeIgniter (a PHP framework), I have a global function current_url() that will return the -wait for it- current URL. So <fb:comments href="<?php echo current_url(); ?>" ... creates a different comment thread for each page. Try Googling "{language of choice} print current url" if you need more help with the back end.
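Since this blog itself runs on Rails, here's the same trick in ERB for comparison (request.url returns the full URL of the current request):

<%# Rails flavor: one comment thread per page %>
<fb:comments href="<%= request.url %>" num_posts="2" width="500"></fb:comments>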

   You should see the comment box being displayed. Make sure you’re logged into Facebook and enjoy your quick commenting system!

 

Posting a message on the user’s wall:

The JavaScript SDK offers us a bunch of additional tools to play with user data. First though, we need to take a short detour and set up a Facebook application page (so Facebook knows who is asking to publish content).

  • Visit the create application form.
  • Fill out the name of your app appropriately (I usually just use whatever the title of my site is), and agree to Facebook’s terms.
  • Skip ahead by clicking save.
  • Copy the App ID for your application. Hold onto this.

We’re going to focus on the first example from the Social Channels page. We want to prompt a logged in user to share a link on their wall. Facebook gives us the following base code.

<div id="fb-root"></div>
<script src="http://connect.facebook.net/en_US/all.js"></script>
<script>
FB.init({ appId:'YOUR_APP_ID', cookie:true, status:true, xfbml:true });
FB.ui({ method: 'feed', message: 'Facebook for Websites is super-cool'});
</script>

The first two lines should seem pretty familiar. We have the #fb-root again, as well as the JavaScript SDK.

Additionally, the FB.init and FB.ui methods are called right away:

  • FB.init is Facebook’s initialize method. This connects to our app and sets a few other basic parameters.
  • FB.ui is another SDK method. There’s a lot more to it, but we’re calling the feed method to display a popup to post on the users wall.

FB.ui will execute as soon as the page is done loading, which is not the best scenario. We would really like it to occur only after clicking a link. Let's wrap FB.ui in a function we can then easily call.

<script>
function postToFacebookWall(content){
FB.ui({ method: 'feed', message: content});
}
</script>

<a href="javascript:postToFacebookWall('This is our content!');">Click here to share this page on your Facebook wall</a>

When a user clicks on our link, we call the FB.ui to prompt a post. We pass the content parameter to populate the message value within FB.ui. This should give a bit more control when creating new links.

 

Other Sharing:

Besides Facebook plugins, there’s a whole world of different sharing plugins. Thankfully, they’re all very similar in code and usage. Nearly every instance you come across will have a form to generate customized code for you.

  • Twitter - Tweet Button
    • Select your shape, the default tweet text, what URL to focus on, and you’re done! Copy, paste, and tweet!
  • Twitter - Follow Button
    • Enter your user name. Copy, paste, follow!
  • Google Plus - Plus 1
    • The newest kid on the block, the asynchronous code looks worse than it is.
  • LinkedIn - Share Button
    • You may want to keep this one away from your cute kitten Tumblr.
  • Delicious - Save Buttons
    • Although it may not be around for much longer, Delicious was the original social bookmarking website.
  • Digg - Digg Button
    • One of the earliest implementations of the social sharing.
  • Reddit - Reddit Buttons
    • Arguably where all the former Digg users now live.

Group Sharing Widgets

Easily embed dozens of social widgets at once. These take the guesswork out of your code and make it very easy to share. There are a bunch more, but these are generally the top three options.

 

Building social functions is a tricky task, but these social plugins help ease the hurt. Optimized by some of the best engineers in the industry, these code snippets will help your site grow and make sharing content easy.

Make sure you dig through the Facebook documentation and let us know how you integrate social into your sites!

 

 


 

James Burke is the summer intern at Cykod. He is a front end designer and developer from Pennsylvania, going into his senior year in Computer Science.

Posted Friday, Jul 29 2011 12:41 PM by James | Development | Facebook, Social

Boston Front End Web Developers Meetup, Round #1

 

The first Boston Front End Web Developers Meetup went off this past Wednesday (May 25th) according to plan, with around 30 people showing up for the inaugural meeting. I gave a couple of talks: the first was a "State of Front End Development 2011," covering the current state of the craft and the aims of the group. The second covered the basics of a semantic HTML5 chop, with some extra details on metadata and the Semantic Web (I believe only a couple people nodded off during that one). Both presentations are embedded at the end of this post.

I asked people to fill out a survey with a few questions, the most important of which was which topics might be of interest for upcoming presentations. Here are the responses:

 

Actual Data:

Topic | Responses | %
Writing Semantic HTML, HTML5 101 | 20 | 69%
HTML5 & Progressive Enhancement, Shiv & Fallback Options | 21 | 72%
Using Grids, Fixed and Fluid | 17 | 59%
SaSS, Haml, Less | 11 | 38%
Coding for the Mobile Web - Media queries, jQuery Touch, Mobile CSS support, etc. | 21 | 72%
Javascript, Basic (language overview) | 15 | 52%
Javascript, Advanced (closures, namespacing, ...) | 14 | 48%
jQuery, Basic (jQuery 101 - Selectors, Animation, Ajax) | 17 | 59%
jQuery, Advanced (Promises, Templating, Data elements, new features) | 18 | 62%
Client side performance optimization (CDNs, minification, async loading, spriting, ...) | 17 | 59%
A/B Testing | 14 | 48%
Analytics Options | 12 | 41%
Facebook integration, OAuth, OpenGraph, Social Widgets | 18 | 62%
Semantic Web, RDFa, Microdata | 11 | 38%


What I thought was interesting was, one, the number of topics that drew significant interest (no topic got under 38%), and two, the level of interest in learning more about mobile. It's pretty clear that there's a lot of interest in everything HTML5-related, and additionally both a serious need for mobile development and a lack of confidence in the best practices.

If you're in the Boston area, join us for the next Boston Front End Web Developers meetup, and if you're interested in presenting on a topic shoot me an email at pascal at this domain. 




Posted Saturday, May 28 2011 06:03 PM by Pascal Rettig | Development, HTML5, Presentation

When's the best time to email a busy person?

   

Mid-morning.

 

Here's why: many people (myself included) don't get through their emails each day. When the number of emails someone receives each day exceeds their ability to respond, those emails start to back up and can go unanswered for days. Most mail clients show the newest emails first, so new emails have a higher visual priority than old ones and tend to get handled sooner as the recipient races to keep up with the inflow of contact requests.

If you get your email into someone's inbox after they have handled all the overnight notifications, marketing and spam messages, it'll sit close to the top of their inbox for the longest stretch of time and has the best chance of actually getting handled before falling into bottom-of-the-inbox hell. Given the variation in when people arrive at the office, mornings also tend to be less "busy" (with calls, meetings and the like), so I'd guess people tend to spend more of their time responding to emails in the AM than the PM.

When's the worst time to email someone? Anytime in the late afternoon into the evening. Those emails will get stacked up with all the overnight emails and may be ignored. Even worse, if they are viewed on a smartphone or the like but can't be responded to immediately because the reader is not at the office, the email client will still mark them as "read" and they will effectively disappear into the clutter of the inbox.

(Note, this is all from my own personal experience as both the sender and recipient, so I'm sure there's lots of other opinions on the matter)

Posted Tuesday, May 24 2011 09:47 AM by Pascal Rettig | Consulting