My first attempt at open source: PageRecommender

For personal/private [non-work-related] projects, I tend to shy away from creating or working on open source projects. Open source work usually resembles what I do during the day: working and dealing with others. That’s no fun when you just want to build something you need. However, I am trying something new: I’m releasing one of my personal projects into the world of open source. It is the component this website uses to make project recommendations. [Example: see the bottom of the Financial Strategy Simulator page] A project page with more technical information about it can be found under /a/Projects.

To get the source, clone it [using git] from: https://github.com/monksy/PageRecommender

What is this project about? It analyzes Apache request logs, attempts to piece together visitor sessions, and then creates an Amazon/Newegg-like statistical recommendation. The output is XML representing a parent/child relationship between a page and the next page visited, written to standard out. The component is designed to be used as a quiet utility.
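To make the idea concrete, here is a minimal sketch of the counting step, assuming the log has already been reduced to (client IP, timestamp, page) hits. It is not the PageRecommender source: the 30-minute session timeout, the class and method names, and the `<recommendations>`/`<page>`/`<next>` element names are all hypothetical, and it builds the XML by hand instead of using XStream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only, not the PageRecommender source. Timeout and XML names are hypothetical.
final class TransitionCounter {

    /** One page request: client IP, time (epoch seconds), and page path. */
    static final class Hit {
        final String clientIp;
        final long epochSeconds;
        final String page;

        Hit(String clientIp, long epochSeconds, String page) {
            this.clientIp = clientIp;
            this.epochSeconds = epochSeconds;
            this.page = page;
        }
    }

    // Assumption: a gap of more than 30 minutes from the same IP starts a new session.
    static final long SESSION_TIMEOUT_SECONDS = 30 * 60;

    /** Counts, per page, how often each other page was the next one visited in a session. */
    static Map<String, Map<String, Integer>> countTransitions(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Put each client's requests together, in time order.
        sorted.sort(Comparator.comparing((Hit h) -> h.clientIp)
                .thenComparingLong(h -> h.epochSeconds));

        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        Hit previous = null;
        for (Hit hit : sorted) {
            boolean sameSession = previous != null
                    && previous.clientIp.equals(hit.clientIp)
                    && hit.epochSeconds - previous.epochSeconds <= SESSION_TIMEOUT_SECONDS;
            if (sameSession && !previous.page.equals(hit.page)) { // ignore reloads
                counts.computeIfAbsent(previous.page, k -> new TreeMap<>())
                      .merge(hit.page, 1, Integer::sum);
            }
            previous = hit;
        }
        return counts;
    }

    /** Writes parent/child XML; page paths are assumed not to need XML escaping. */
    static String toXml(Map<String, Map<String, Integer>> counts) {
        StringBuilder xml = new StringBuilder("<recommendations>\n");
        counts.forEach((parent, children) -> {
            xml.append("  <page name=\"").append(parent).append("\">\n");
            children.forEach((child, n) -> xml.append("    <next name=\"").append(child)
                    .append("\" count=\"").append(n).append("\"/>\n"));
            xml.append("  </page>\n");
        });
        return xml.append("</recommendations>\n").toString();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("10.0.0.1", 0, "/p/FinancialStrategySimulator"),
                new Hit("10.0.0.1", 60, "/p/PageRecommender"),
                new Hit("10.0.0.2", 0, "/p/FinancialStrategySimulator"),
                new Hit("10.0.0.2", 7200, "/p/PageRecommender")); // 2 hours later: new session
        System.out.print(toXml(countTransitions(hits))); // quiet utility: XML on standard out only
    }
}
```

Sorting by client and time means a session is simply a run of consecutive hits from one IP with no gap longer than the timeout; each adjacent pair of pages in that run counts as one parent/child vote.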

What this project isn’t: This project isn’t a completely generic solution that’ll fit your site. It is designed within the context of my current website and the format of the standard Apache log files. Want it to look for pages that don’t fall under the /p/{Name} syntax? Changed your Apache log file format? Then this project won’t work for you without modification. Also, this code comes with no warranty. It’s open source: free as in speech, but not as in beer.
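For illustration, this is roughly what “the /p/{Name} syntax” and “the format of the standard Apache log files” mean in code. It is a sketch, not the project’s parser: it assumes Apache’s combined LogFormat (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`), keeps only successful (2xx) GET requests as an assumption, and the class and field names are hypothetical. Changing the log format or URL scheme means changing the two patterns.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch assuming Apache's "combined" LogFormat and project pages under /p/{Name}.
final class ApacheLogLine {
    // %h %l %u [%t] "%r" %>s: client, identity, user, [time], "METHOD PATH PROTOCOL", status
    private static final Pattern COMBINED = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) ");
    // Only /p/{Name} pages count; anything after the name (query string, subpath) is ignored.
    private static final Pattern PROJECT_PAGE = Pattern.compile("^/p/([^/?#\\s]+)");
    // Apache's default %t layout, e.g. 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter APACHE_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    final String clientIp;
    final long epochSeconds;
    final String pageName;

    private ApacheLogLine(String clientIp, long epochSeconds, String pageName) {
        this.clientIp = clientIp;
        this.epochSeconds = epochSeconds;
        this.pageName = pageName;
    }

    /** Returns the hit, or empty if the line is not a successful GET of a /p/{Name} page. */
    static Optional<ApacheLogLine> parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.find() || !m.group(3).equals("GET") || m.group(5).charAt(0) != '2') {
            return Optional.empty(); // unparseable line, non-GET, or non-2xx status (assumption)
        }
        Matcher page = PROJECT_PAGE.matcher(m.group(4));
        if (!page.find()) {
            return Optional.empty(); // not a project page, e.g. /a/Projects or /favicon.ico
        }
        long epoch = OffsetDateTime.parse(m.group(2), APACHE_TIME).toEpochSecond();
        return Optional.of(new ApacheLogLine(m.group(1), epoch, page.group(1)));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /p/PageRecommender HTTP/1.1\" 200 2326 \"-\" \"Mozilla/5.0\"";
        parse(line).ifPresent(hit ->
                System.out.println(hit.clientIp + " " + hit.epochSeconds + " " + hit.pageName));
    }
}
```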

What could be improved: I realize that this project isn’t perfect. I could have designed it to be slightly easier to read, and it could be documented much, much better. However, this is an internal project. Want to improve it? GitHub should allow you to fork it and send me your changes. [I’m rather new to GitHub, so don’t hold me to that statement] The XStream dependency could be removed, but for right now it works.

What do I recommend?

Grab the source! Add more tests, and send me input on how you think it could be improved.

Review: “Test-Driven Development” by Kent Beck [Co-creator of JUnit]

It is always interesting to read a book about a technical topic by its creator. Creators tend to identify the motivating factors and history behind a product better than other authors can. Kent Beck co-created JUnit (with Erich Gamma) and is the author of this book. The book is designed for people who are new to unit testing and Test-Driven Development; in the current software development industry, that describes very few people.

The book is divided into three sections: a walk-through with JUnit (by example), an xUnit example, and a “best practices guide.” The JUnit section, which makes up the majority of the book, focuses on developing a currency class. While there were some interesting design decisions, this part irritated me the most. It deliberately introduced blatant errors just to create a test that would fail. I would have preferred that these errors be described rather than presented to the reader. The currency example was a good one; however, a second example would have been better. The xUnit section focused on building a testing framework in Python. I am not a Python developer, though I have written a very small amount of Python code, and I was not a fan of this section. However, I was pleased with the best practices section. My only issue with it is that, as far as I recall, it did not explain why its recommendations are better than the practices they replace.

This is a great resource for developers who have never heard of TDD. It may even be a great book for students who have just completed half of a Java class. However, it is not such a great book for people who have already worked with Test-Driven Development. For them, Manning’s “In Action” series may be more suitable, although its JUnit book looks a bit dated. That book does cover integration with frameworks that are harder to test against [J2EE, XML, Servlets, EJB, DBs…].

In summary, I was probably looking for another book similar to “Pragmatic Unit Testing in Java with JUnit” (Hunt and Thomas). Its approach focused more on finding trouble spots and improving existing code.

Given that this book was written in 2002, the following is not exactly fair criticism, but it should be raised with anyone claiming that TDD is beneficial. I would have liked to see more support [research/studies] for why Test-Driven Development is beneficial. Every TDD-related article/publication/book that I have read attempts to convince the reader that it’s necessary with marketing-speak. The claim is that it is good for “high stress environments,” or that it reduces defects. However, I have never seen evidence supporting these claims. I have personally found TDD to be a significant improvement in development, but I have never been able to quantify how much of an improvement it has made. How much time does TDD save? Has it made today’s software more reliable than before? Has it affected the developer job market [increased/decreased]? Has it made the users of products built with TDD happier? Most importantly, has test-driven development reduced the stress of software development?

Well, That Was Silly Of Me: Issues with Sed

Refreshing my memory on sed, I ran into two issues tonight. First, the -n option turns off sed’s automatic printing of every line, so only the lines you explicitly print (with the p command) are shown. Second, the order of the delete and print commands matters. It turns out that it matters a lot.

Let’s say you have a file named content. It contains:

Gooogle
GooooogleBot
Gooooogle Pictures
Google Plus
Reddit
Yahoo

Let’s assume that you want to show only the lines that contain “Gooogle” [and its similar brothers] with sed. You would write something like this:

 
sed -e '/Goo[o]\+gle/p' content

Right? Nope. It’ll show every line, and the matching lines will show up twice: once from the p command and once from sed’s automatic printing. To fix this, add the -n option before -e.

That’s great… But that returns: Gooogle, GooooogleBot, and Gooooogle Pictures. In this example, we don’t like GooooogleBot, so let’s remove it. You might now write something like:

 
sed -n -e '/Goo[o]\+gle/p' -e '/Goo[o]\+gleBot/d' content

It seems like a logical extension, right? The second expression should pass over the lines the first one printed and filter them. Nope, it doesn’t. It displays the same results as before the second expression was added. What’s going on? It’s not a bad expression, and it’s not a bad command. It’s due to the placement of the prints and deletes. sed runs every command in the script, in order, on each input line. With -n, the p command prints a matching line immediately, so by the time d removes GooooogleBot from the pattern space, it has already been printed. When d comes first, it deletes the line and starts the next cycle, so the p command never sees it. The fix is to rearrange everything so that the deletes come first and the prints follow. [The position of -n on the command line doesn’t matter; it applies to the whole script.]

So the correct form is:

sed -e '/Goo[o]\+gleBot/d' -n -e '/Goo[o]\+gle/p' content

Bizarre? Yes, very much so, but it works.

Regular Expressions Tester

This is one elusive tool that has been bugging me for years. There are quite a lot of regular expression tools out there, and not all of them test the expression environment that you need. [They rarely even mention which environment they are testing for.] Through the magic of Google, I’ve found a tool to test regular expressions in Java: RegExPlanet. It lets you test regular expressions for Java, Ruby, PHP, .NET, and Python.

For right now, all I can attest to is its ability to test regular expressions in Java. It goes beyond a simple regular expression match.

For Java regular expressions, it’ll show:

  • Group Counts
  • Individual Group Matches 
  • The Java String [to copy and paste directly into your code] (However, at the time of writing it does not allow you to enter the RegEx in the Java String format)
  • Replace First [if there is a replacement string]
  • Replace All [if there is a replacement string]
  • Simple Validation

From the looks of it, it also has a feature to save your previous expressions. That’s really cool! It’s practically the Fiddle [see jsFiddle] of regular expressions. It hints at the ability to save them to shortcodes (to share), but I have not tried that feature. Additionally, from the looks of it, this is a new site. I would encourage developers to use it.
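For comparison, most of the output listed above can be reproduced with `java.util.regex` directly. This sketch (class and method names are hypothetical) prints the Java string literal, the group count, a whole-input `matches()` check, every group of every match, and the `replaceFirst`/`replaceAll` results. Simple validation comes for free: `Pattern.compile` throws `PatternSyntaxException` on a bad expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reproduces, with java.util.regex, the kinds of output listed above. Names are hypothetical.
final class RegexReport {

    /** Escapes a regex so it can be pasted into Java source as a string literal. */
    static String asJavaLiteral(String regex) {
        // Backslashes first, so the backslashes added for quotes are not doubled again.
        return "\"" + regex.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Pattern.compile throws PatternSyntaxException for an invalid regex ("simple validation"). */
    static String report(String regex, String input, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        out.append("Java string: ").append(asJavaLiteral(regex)).append('\n');
        out.append("Group count: ").append(matcher.groupCount()).append('\n');
        out.append("Whole input matches: ").append(matcher.matches()).append('\n');
        matcher.reset(); // matches() moved the matcher; start over for find()
        while (matcher.find()) {
            for (int g = 0; g <= matcher.groupCount(); g++) {
                out.append("  group ").append(g).append(": ").append(matcher.group(g)).append('\n');
            }
        }
        out.append("Replace first: ").append(pattern.matcher(input).replaceFirst(replacement)).append('\n');
        out.append("Replace all: ").append(pattern.matcher(input).replaceAll(replacement)).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report("(Goo+)gle", "Gooogle and Google", "$1GLE"));
    }
}
```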

A Tool That Would Be Rather Nice To Have

There is one tool for which I haven’t found either an open source project or a commercial product, and I think it would be quite useful in creating dependable software: a tool for evaluating unit tests. It would be something like Lint or a static analysis tool, but for the tests themselves.

Some of the features that I imagine would make this tool useful for evaluating whether tests handle particular cases:

  • Seek out infinite loop cases
  • Check bounds on parameters
  • Duplicate test cases [are there redundant test cases]
  • If exceptional cases are tested
  • Evaluate for bad parameters
  • Evaluate potential recursion issues.

This would differ from a code coverage tool, because a coverage tool is only effective at showing what isn’t covered by tests. The tool I’m suggesting would be responsible for making sure that the tests are effective: that they handle potential pitfalls in input and in common programmatic areas.
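As a sketch of how one of these features might work, here is a hypothetical “check bounds on parameters” rule: given a parameter’s documented inclusive range and the inputs the tests actually pass, it lists the classic boundary values (one below, at, and one beyond each end) that no test exercises. Everything here is an assumption: the names, the choice of boundary values, and the idea that the tested inputs have already been collected; a real tool would have to extract them from the test code itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical test-lint rule: are the edges of a parameter's documented range exercised?
final class BoundaryLint {

    /** Boundary-value candidates for an inclusive range [min, max]. */
    static Set<Integer> boundaryValues(int min, int max) {
        Set<Integer> values = new LinkedHashSet<>(); // keeps order, drops duplicates for tiny ranges
        // Computed as long so min - 1 and max + 1 cannot overflow; out-of-int candidates are dropped.
        long[] candidates = {(long) min - 1, min, (long) min + 1, (long) max - 1, max, (long) max + 1};
        for (long v : candidates) {
            if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
                values.add((int) v);
            }
        }
        return values;
    }

    /** Boundary values that none of the tested inputs cover, in candidate order. */
    static List<Integer> untestedBoundaries(int min, int max, Collection<Integer> testedInputs) {
        List<Integer> missing = new ArrayList<>();
        for (int v : boundaryValues(min, max)) {
            if (!testedInputs.contains(v)) {
                missing.add(v);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical: a month parameter documented as 1..12, tests that only pass 1, 6 and 12.
        System.out.println("Untested boundaries: " + untestedBoundaries(1, 12, List.of(1, 6, 12)));
    }
}
```

For that month example, the rule would report 0, 2, 11, and 13 as untested, which is the kind of gap a coverage tool would not flag if every line already runs.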

My question to the reader: Does this already exist? Are there any new open source projects starting up that are designed to tackle this problem?

If not, how hard would it be to adapt existing analysis tools or testing frameworks, and where would you start the development?

Computer Science Classes I Would Like To See Offered

My friend Warren recently expressed a very “scorched earth” opinion of the standard Computer Science curriculum. He suggested that, given the availability of free CS courses (OpenCourseWare and the like) and experts at your fingertips (blogs and Stack Overflow), where’s the value of a formal CS degree? That led me to think: what are some classes that would be very interesting to take? [Even as an alumnus of two CS programs, I would be interested]

  • Software Development Tools: A lot of classes mention tools, but not how to actually use and extend them. [Build systems, version control systems, formatters, emulators/virtualization]
  • Debugging: The goal of this class is to teach basic skills and then go through labs of finding/fixing bugs. Later, students would write unit tests that reproduce the bugs and verify the fixes.
  • Testing software: This would be a hands-on class to teach students how to write unit tests, implement mocks, test black box software, and write up reports on testing procedures. It would also involve testing embedded hardware and software, black box systems, functional languages, network services, concurrent software, and even components that lack a stable test environment.
  • Computer Vision: It’s interesting and visual, and often you don’t get to this class until graduate school.
  • How to design an API/Surveying APIs: Let’s get rid of bad APIs. This would also be responsible for demonstrating the differences [good and bad] of available APIs.
  • Open Source: Not a history of open source, but a class in which students take an open source project and extend it. The goal is to get students involved with one project and have them demonstrate an improvement to it.
  • Marketing: How to market your work, or someone else’s software. This isn’t designed to replace developers with marketers, but it just helps the developer understand how their work is sold.
  • Automation: How to automate manual tasks with software. This could be with build scripts, batch scripts, testing automation software, or even simple system scheduling.
  • From hardware to software: This class would have two goals. The first is to create a small embedded device and then go all the way up the chain to a working software client, which would involve writing a device driver, interfacing with hardware ports, and using the driver. The second is to create a simple processor, either as a circuit or a manual build. http://blog.makezine.com/2009/10/03/building-a-cpu-from-scratch/
  • How Computer Science relates to [Field x, y or z]: This is more of an open ended suggestion. For example: Offer a class on Bioinformatics [It’s a class that combines biology, and computer science], combine a class on art and computer science (that’s more of visualizations though), history and CS [examine potential DSLs involved with history research], or even math and CS [shows the tools, and libraries that one can use in their application].
  • How to crack software/re-engineer a binary: [Probably with the permission of the publisher] Crack a copy protection system. Many students know about software cracks, but very few know how they work or how to create them. The goal of this class would be to familiarize students with how software is compiled and how it can be reworked after compilation. It would also demonstrate how to protect their own applications.
  • Alternative Language Survey: Yes, this is technically a standard class. But I’d like to see one on functional languages, Groovy, BF, or even creating your own domain specific languages be taught.
  • Community Service: This isn’t as altruistic as the title may imply. It’s a class in which students create a software component [as a large group] for a member of the local community. This could be a small, interesting game [for example, I worked with a group on one for a class], or working with a small company or individual to improve or software-ize their product. Elon’s CS department did this a while back with the game Deflection. The class could even take an existing board game [with the original creator] and make a software version. The goal of this class is to research market need, create something usable, and get fellow students involved in using it. This would help show the rest of the students some of the cool things that CS can produce.
  • Author selected: Get an author of a quality software-related book to teach a class. Have Jon Skeet teach a C# class, Paul Graham teach a class on Lisp, and Brian Goetz teach a class on concurrency.
  • Hands on Software Optimization class: Take an open source system, and optimize it. This class would teach formal procedures on how to optimize an existing application to perform as quickly as possible, monitor, and document the improvements.

Last of all, these classes should be fun and engaging. If you’re not actively involving the student, don’t even bother trying these suggestions.

I’m a student, what [language, framework, API, concept] should I learn?

Part one of a continuing series.

There are two ways to learn something. The first way is to learn from another’s advice [the easy way], and then there is the hard way [through experience and making mistakes]. When starting out in a Computer Science degree, learning the material and concepts will require a lot of effort. [the hard way] It’s not a wrong way to go about it. The hard route helps to reinforce the material. However, not everything in your journey to become a software developer has to be difficult.

One of the most frustrating things for a student in a Computer Science program is producing a deliverable to submit. Many students, after the introductory course, quickly find out that a few quick hacks won’t earn the highest grade available. Many professors require documentation, specific file layouts, compilable code, funky submission instructions, build instructions, shout-outs in the comments (1), or even coding conventions. These extra/non-technical requirements add complexity and make assignments more difficult. However, they do have merit and “build” the student’s technical character. In academia, very few courses in the US teach the tools of the trade; this is something that students are left to learn on their own.

So what does this have to do with answering the question: What should I [as a current Computer Science student] learn? Build systems. Learning how to consume Facebook’s web services or write the next Twitter client is great; however, learning how to use a build system will make your professor’s life, and yours, much easier. A build system is an external tool [usually separate from an IDE] that defines the instructions for building an application. It also makes it easier to decide whether your work is ready to submit: did the build pass and produce a deliverable? If you have all of the rules and unit tests configured to run, the answer is easy… Yes! Once you are confident that the assignment was completed to satisfaction, you can go party, err, study ‘social algorithms.’

So let’s imagine that you have the following requirements from a professor:

  • Build a chat system [One client, and one server]
  • Include unit tests for every method used
  • Include Java Doc
  • Email to: professor@university.com
  • FTP the resulting zip file to: ftp server.
  • Create a readme at the base, and include your name and the word “Screaming Monkeys” within the readme file.
  • Include instructions on how to build your software
  • Use the following file format:
    • /
    • Readme
    • BuildDoc
    • Doc/
    • Source/Client/*….
    • Source/Server/*…
    • Binary/*…

Building a working chat client and server may be difficult for someone who has just learned about sockets and lacks experience, and the non-technical requirements could become overwhelming. However, the assignment never required the non-technical instructions to be performed manually. I have NEVER heard of a professor who takes points off for using these tools. If anything, using a build tool will improve your grade. It allows you to plan for these requirements ahead of time and let software handle the rest.

“But hey! That looks like a lot of work to do a few simple steps!” You’re right; all of those steps can be performed manually, and probably quicker. However, if you screw up part of your final deliverable, you’ll have to perform all of those menial tasks all over again. Ant can perform all of them automatically when you type “ant.” You can configure Ant to build the software, run all of the unit tests [and stop the build if one fails], generate the Javadoc, run utilities [to confirm that your name and the “special word” are there], perform all of the file manipulation and copy the deliverable to the right locations, and even submit the result [FTP, email, source control]. If there is an Ant task for the action, you can use Ant to do it. Another benefit is that it can run the source code through a “coding convention” checker. Have a professor who requires you to use the K&R coding convention? No problem: just add a style check or formatter to the build. Problem solved.
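The “run utilities” step can be as small as a program that the build runs before packaging. This hypothetical checker (Java 11+) verifies the file layout and the Readme requirements from the example assignment and exits with a non-zero status when something is missing; running it from Ant’s `<java>` task with `fork="true"` and `failonerror="true"` would then stop the build. The class name and messages are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-submission check for the example assignment. Requires Java 11+.
final class SubmissionCheck {

    // Layout from the assignment, relative to the submission root.
    static final List<String> REQUIRED_FILES = List.of("Readme", "BuildDoc");
    static final List<String> REQUIRED_DIRS = List.of("Doc", "Source/Client", "Source/Server", "Binary");
    static final String SPECIAL_WORDS = "Screaming Monkeys";

    /** Returns every problem found; an empty list means the submission looks complete. */
    static List<String> check(Path root, String studentName) throws IOException {
        List<String> problems = new ArrayList<>();
        for (String file : REQUIRED_FILES) {
            if (!Files.isRegularFile(root.resolve(file))) {
                problems.add("missing file: " + file);
            }
        }
        for (String dir : REQUIRED_DIRS) {
            if (!Files.isDirectory(root.resolve(dir))) {
                problems.add("missing directory: " + dir);
            }
        }
        Path readme = root.resolve("Readme");
        if (Files.isRegularFile(readme)) {
            String text = Files.readString(readme);
            if (!text.contains(studentName)) {
                problems.add("Readme does not contain the name: " + studentName);
            }
            if (!text.contains(SPECIAL_WORDS)) {
                problems.add("Readme does not contain \"" + SPECIAL_WORDS + "\"");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SubmissionCheck <submission-root> \"<Student Name>\"");
            System.exit(2);
        }
        List<String> problems = check(Paths.get(args[0]), args[1]);
        problems.forEach(System.err::println);
        // A non-zero exit code is what lets the build tool stop the build.
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```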

In summary, assignments in computer science are made difficult by the sheer amount of demands. Reduce the potential for mistakes by automating the tedious tasks. Learn a build system early in your education. It will help ensure consistency in your code, take care of annoying non-technical requirements, and make your work a lot simpler.

Need another reason? Do it to get a job after you get out of school. Very few students have used a build system before entering the work world; most have to learn at least one as soon as they start their first job. The software industry thrives on build systems: if they aren’t working, developers are unable to progress. Hey, worst-case scenario: let’s assume that you’re not a very good CS student and you barely graduate. If you know a build system extremely well, you can still pursue a career as a “Build Systems engineer.” Those guys can still make quite a lot of money. [Indeed claims that a build engineer starts at $50k, and it lists a sizable number of jobs that earn $130k+ a year]

Some of the build systems that are currently in use:

  • Ant [Good for nearly every language out there]
  • MSBuild [Good for Microsoft-supported languages (VC++, VB, anything .NET)]
  • Maven
  • Gradle
  • Make [Good for C++, and bash]
  • Apache Ivy [strictly a dependency manager, usually paired with Ant]

(1)    In my prior work as a teaching assistant, I am guilty of supporting the shout-outs rule. The requirement was for the student to include his or her name at the top of each file. This was mentioned in the grading rubric (included in the assignment writeup/specs). It’s a simple request to complete, and it makes the grader’s/professor’s life a lot easier. It’s akin to asking elementary school children to write their name on top of the assignment; a lot of new students don’t add their name to their code.

Lost the Passion for Software Development?

This is a list of potential tips to help a fellow Software Engineer recover passion in his or her work:

  1. Learn A New Language: Stuck with working solely in Java and C++ languages? Learn Erlang, or Lisp. Try out Scheme, Perl, Bash, Haskell, or D.
  2. Learn a New Framework: Some of the frameworks to learn: xUnit, Testing frameworks [other than JUnit], Spring.net, Apache Tapestry, XMPP, Esper (Complex Event Processing), OpenGL, JasperReports, OpenCV, Win32, OSGi, or even Hessian binary web services.
  3. Revive old projects by refactoring the code to use a new framework. Review code in open source projects.
  4. Attempt to become an expert in an open source project. You might become an authority on the subject, and write a book.
  5. Meet new developers. To do this, branch outside of your company and go to user groups related to the language. JUG for Java groups, LUG for Linux user groups, NUG for .NET user groups etc.
  6. Read and listen to lectures on InfoQ.
  7. Take a “Thirty Day Challenge”: come up with a complex application and write it in a completely unfamiliar framework or language. For something rather unique, write a web application completely in Prolog or Erlang. I haven’t heard of anyone doing that yet, but it would be rather interesting.
  8. Annoyed with an application, language or framework? Fix it. If it’s open source, write a fix for it and publicly submit a patch. If it’s a closed source application, rewrite the main interface [if it connects to a backend service], or rewrite the application completely.
  9. Have a business idea? The book “The Lean Startup” suggests that startup-interested developers should create a stripped down demo, to gauge market interest.
  10. Add new “favorite tags” to your StackOverflow account. Look for questions that contain lots of votes. Look for questions that have not been answered in a long time, research a solution and answer the question.
  11. Find ways to make your current job/task easier. If you are spending a lot of time writing the same type of unit tests over and over again, find a way to automate this, or write your own domain specific framework. Even better, write a Domain Specific Language for the current development project.
  12. Use coverage tools to find new places to test code that is currently untested.
  13. Read research papers. This is typically a very dull task, but there are some quality papers available. It takes some effort to find them, but the well-written ones are worth the effort.
  14. Hack: [I’m not responsible for unethical/illegal actions; I would suggest doing these things only for personal interest] Set up a VM to learn how to exploit system services, learn how to perform a privilege escalation, learn how to write a buffer overflow and run shellcode, learn how an intrusion detection system works and attempt to exploit it, and attempt to crack legally obtained software. [I advise that the reader do these things ethically (don’t share a software crack). The reader is responsible for his or her own actions.]
  15. “Hack Hardware”: Write an application that interfaces with an Arduino device. Root an Android or Apple phone. You could also go the route that Linus Torvalds went: take data from existing devices and write an application that interprets it. His project, Subsurface, is designed to take the data from a dive computer and transform it into something the user can add notes to and visually interpret. Write software for an embedded processor (FPGA, BASIC/Java Stamp).
  16. Learn how to make your own operating system. Take a Gentoo distribution, and reconfigure everything. Try out a real time Linux distribution.
  17. Look for project suggestions on Stackoverflow. There are quite a few questions from students asking about project ideas.  One popular suggestion is to write a plugin for a popular game. From my understanding, one can write a plugin for Civilization 4 by using Python.
  18. Find ways to make your job more fun, or easier. Identify tasks that are uninteresting or tedious and find a way to automate them. Learn how to configure and/or extend a build tool. Learn Ant, Maven, or Gradle.
  19. Take a vacation: Over-working one’s self is not an achievement, no one is impressed.

Lastly (and this isn’t really a tip), to connect with others who have similar interests, write about your attempts, successes, and failures in a blog. If you decide that your current job is not fulfilling, these are things you can mention on your resume, and future employers can see them on your blog.

You’re a developer, as a developer you have the unique ability to create things. There is very little holding you back from developing something you want.

Have I missed some important tips? Have these tips helped you? Are there any languages or frameworks that one should study? If so, leave them in the comments box.

Technical Writing for Programmers

Forewarning: I’m not an expert on this subject, however I do play one on television.

Software engineering is mostly a solitary and technical occupation. Most of a developer’s communication, outside of the code itself, goes to people with similar domain, contextual, and technical knowledge. Very rarely is information communicated to someone outside of the domain or context, let alone to someone without technical knowledge. This guide is a collection of tips, and reasons to document, that I’ve discovered over the years.

  1. Acknowledge your audience. For frameworks, nothing is more frustrating to the reader than a quickstart guide written for existing developers of that framework. Those developers do not want a quickstart guide; they want the Javadoc or .NET documentation. I’m looking at you, Spring. If your audience is technical but new to your framework, please identify the concepts that the framework exercises, preferably in an appendix that is referenced from the introduction of the book/document.
  2. Proofread the document. Even better: have someone, or several people, from the target audience proofread it for accuracy and readability. It is surprising to discover how difficult it is to write documents that deal with troubleshooting, installation, configuration, or “common tasks.” In my experience, documenting just the modification of an existing code base to add a new report took 4-5 pages of bullet-point instructions. Once a guide is published, errors are embarrassing, difficult to correct, and can be very costly.
  3. Screenshots: Screenshots make describing tasks involving a GUI clearer.
  4. Pictures [more than screenshots]: Words are useful; however, a simple diagram is often the clearest way to explain complex architectures or tasks. Diagrams also help visual learners.
  5. Versioning the document: This helps the reader identify how up to date the document is. I hesitate to advise this, because some documents then begin to include the full change list. [Who really cares about this? All I want to know, as a reader, is whether I have the most current version.]
  6. If you are documenting source code and haven’t written the documentation by the time you get to the unit tests, write it while the tests are being written. If the documentation has already been written, write the unit tests based on it, even if you have access to the source code. As a tester, you should depend on the resources given to you, not on the actual structure of the code. From my experience, this level of pedantry can piss off the original developer, but in the long run it may prevent major errors in future reuse of the code. Make the unit test break if the documentation isn’t followed!!
  7. Provide documentation! Your developers/designers/users are very intelligent; however, being intelligent does not make one a mind reader. Don’t expect another person to understand your [or your team’s] motivation, history, design decisions, or the politics surrounding the project.
  8. Allow for feedback. If there is a mistake in the documentation, give the reader a way to contact the author to confirm it. If you don’t, expect people to be pissed off about your API, idea, project, source code, or product.
  9. Document bugs or deprecation: If something doesn’t work or is going to be removed, warn the developer. It’s better to know beforehand.
  10. Support: Support is more than a scripted technical support call. If you have better documentation low level support calls can be eliminated or substantially lessened. Additionally, supporting the documentation/product can give insight to the functionality used in the product and/or bugs that were previously not found.
  11. Job Security: It may sound contradictory (most people I’ve talked to claim that obfuscation keeps a person employed for a while, at the expense of others); however, it’s hard to claim that a person isn’t doing the job they were tasked with, or to deny them a promotion, if their procedures are sufficiently documented.
  12. Documentation can help you identify architectural and interaction [GUI and API] issues. Writing things down brings a previously unrealized problem into your consciousness. Once you’ve identified the issue, you can talk about it and come up with a solution.
  13. For APIs: Provide examples for every method. Write a paragraph on: why someone should use that method, why one shouldn’t use it, the performance and memory costs, and the state change to the operation of the software or environment.
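To illustrate tips 6 and 13 together, here is a hypothetical API method whose Javadoc covers that checklist (an example, why to use it, why not, cost, and state change), followed by a check written only from the Javadoc, so it breaks if the implementation stops matching the documentation. The `BoundedLog` class is made up for this example; in a real project the check would be a JUnit test rather than a `main` method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical API used to illustrate the documentation checklist. */
final class BoundedLog {
    private final int capacity;
    private final Deque<String> entries = new ArrayDeque<>();

    /** @throws IllegalArgumentException if {@code capacity} is less than 1 */
    BoundedLog(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /**
     * Appends a message, discarding the oldest message once {@code capacity} is reached.
     *
     * <pre>{@code
     * BoundedLog log = new BoundedLog(2);
     * log.append("a");
     * log.append("b");
     * log.append("c");
     * log.snapshot(); // ["b", "c"]
     * }</pre>
     *
     * <p><b>Why use it:</b> keeps recent history without unbounded memory growth.
     * <p><b>Why not:</b> older messages are silently lost; use a file or database if you need them all.
     * <p><b>Cost:</b> O(1) time per call; holds at most {@code capacity} references.
     * <p><b>State change:</b> only this log's contents change; nothing else is touched.
     *
     * @throws NullPointerException if {@code message} is null
     */
    void append(String message) {
        if (message == null) {
            throw new NullPointerException("message");
        }
        if (entries.size() == capacity) {
            entries.removeFirst(); // drop the oldest entry
        }
        entries.addLast(message);
    }

    /** Returns an unmodifiable copy of the retained messages, oldest first. */
    List<String> snapshot() {
        return List.copyOf(entries);
    }

    // Written from the Javadoc above, not from the implementation: it fails if the two disagree.
    public static void main(String[] args) {
        BoundedLog log = new BoundedLog(2);
        log.append("a");
        log.append("b");
        log.append("c");
        if (!log.snapshot().equals(List.of("b", "c"))) {
            throw new AssertionError("documented example no longer holds: " + log.snapshot());
        }
        System.out.println("documented behavior verified");
    }
}
```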

Have tips for technical writing, documentation? Leave them in the comments! [See I even provided feedback for this post :P]

Suggestions for Authors of Technical Books

If you are writing about something that has to be installed on the reader’s machine, there is one thing that has become annoying: please do not provide distribution-specific instructions for installing the software/component/framework. I would prefer that these be listed on the book’s website, or linked to the platform’s/distribution’s website. In place of that section, I would prefer a brief history of the component that the book requires: for example, whether the framework’s API was forked from another project, which problems motivated the API, and/or who uses it.

Having a website for the installation and configuration would allow it to be updated when the software gets changed. For example, if there was a major break in the API usage, the website could be updated and the user could be warned against using future versions.