BLOG

Archive for the ‘ruby’ Category

Daily Review #34

with 2 comments

I’m still recovering from my largest freelancing client shutting down operations. I have a few other clients in various stages of starting up and maybe one more trickling its way in. I’m still chasing down some other leads, but I have some immediate availability if you have some ruby or rails work to get done.

Amidst it all, though, I’m working on a couple side projects. 2010 is going to be the year of the product for me, when I switch from living mainly on software consulting money to product based revenue.

Hopefully Project M will sneak out amidst tryptophan dreams of the upcoming holiday. Project G will come a little later.

I also switched mailing services from MadMimi to AWeber. I left for two primary reasons, and there is only one thing I’ll miss. First, AWeber lets me A/B test any message that goes to a list of more than 100 users. If I were a marketing genius I wouldn’t need to A/B test (okay, I probably still would), but I couldn’t do it with MadMimi. Second, AWeber lets me set up an autoresponder (or drip or trickle campaign) in 2 ways:

  • Days since last contact. – This version is useful for traditional autoresponders: a new contact is added to my system and gets a welcome letter, then a follow-up 2 days later, another message 3 days after that, and so on.
  • Days until a specific date. – This version lets you run a “countdown” campaign. So for my Get Clients Now!™ Coaching Seminar that starts on January 11, 2010, I can start sending emails 6 weeks out, 4 weeks out, 2 weeks out, 1 week out, the day before, the day of, etc.

The only feature I’m going to miss from MadMimi is multiuser accounts. I was able to set up an account for my VA, and she had access to my subscribers and promotions without having access to billing information or my password.

Interesting tidbits from around the web

  • 31 Days to Start Freelancing – I haven’t made my way through all the individual posts, but it looks like a good list. On the other hand, if you’re freelancing on rails you should spend $9 and pick up Mike Gunderloy’s book Rails Freelancing Handbook.
  • Easier Timezones – Project M has some concerns with timezones, not much, but I haven’t used them before and this seems to be the default reference.
  • 10 Things you should never stop asking yourself – This should probably go on a wall somewhere, or your desktop. Add some more, like “What can I take away to make my product better?”. Then set up a calendar reminder so you look at it every so often. Helps to be reminded.
  • Kanban kick-start example – Does a nice job of explaining what kinds of things go on a kanban.
  • 49 Tools for Living the Location Independent Lifestyle – Interesting list. My wife and daughters are tied to school 10 months out of the year, but I’d love to start taking extended working vacations during breaks.
  • Drobo++ – More hardware to covet. I’ve wanted a drobo since I first saw it, but I have a hard time paying that kind of money when I could get this Home Server for a fraction of it. It’s not Apple, but Home Server is one of the things I think Microsoft got right. Sure, you can cobble together some pieces and make something similar on Ubuntu. Or spend 3x as much money on a Mac Mini Server. But when push comes to shove I’d rather have the Acer and a couple of 24″ Samsung SyncMasters. Rob pointed me to it. Maybe Santa will bring it for me.
  • Designer’s Guide to Passive Income – What’s not to like? Make money while you sleep.


Introducing BDDCasts and Brandizzle

with one comment

Istvan Hoka and I are proud to announce BDDCasts, and the Brandizzle series of screencasts that provide a “fly on the wall” perspective to how we work. We’ll create Brandizzle from “rails brandizzle” through to “cap deploy” and every step in between over a number of sessions. We practice virtual pair programming, BDD with cucumber and rspec, and we both use mac and textmate to hack code.

You can check out Brandizzle here and get the code on github.

You can check out all our episodes on bddcasts.com

To get a bit more feel for what we’re doing, check out this video:


Written by jeff

September 2nd, 2009 at 1:59 pm

URLAgg released as open source

with one comment

I’ve seen a lot of questions about how people are using cucumber for BDD and which files they’re spec-ing with rspec, usually with a request for real-life examples that are on Github.

There’s nothing magical about the URLAgg code, nothing proprietary, it was just scratching an itch for me. So I spent some time this morning ripping out some private configuration information, fighting deployments, and finally have URLAgg up on Github here: http://github.com/jschoolcraft/urlagg/tree/master

There will be more to this story later, but for the time being I’ve spent enough time today trying to release this code.

If you have questions, comments, suggestions, fork away or get in touch.


Written by jeff

August 4th, 2009 at 11:06 am

@apressdailydeal updates

without comments

I’ve tweaked how my twitter bot @apressdailydeal tweets. One of the biggest sources of confusion was my linking directly to the ebook details page on Apress’ site. Today’s book, for example, is Storage Networks, but the details page doesn’t show the $10 daily deal price. It’s shown in the cart and on the Daily Deal page itself.

The old format went something like this:

[Date] Title (link to details page for book)

So yesterday’s deal was:

[2009-06-24]: Excel 2007 PivotTables Recipes: A Problem-Solution Approach (http://rubyurl.com/HtMV)

The new format:

Title, Publish date, Author, (DailyDeal link)

So today’s deal is:

Storage Networks Pub. Jun 2004 by Daniel J. Worden (http://su.pr/2R677z)

Twitter takes care of the date, I add author and publish date so you can decide if it’s worth buying a book from 2004 (sometimes they are, other times not) and I avoid confusion by linking you back to the page that says $10.
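The new format could be assembled along these lines. This is a minimal sketch rather than the bot’s actual code: format_deal and TWEET_LIMIT are hypothetical names, and trimming the title (rather than the author or link) to stay within 140 characters is an assumption.

```ruby
TWEET_LIMIT = 140 # Twitter's message limit at the time

# Build "Title Pub. <date> by <author> (<link>)", trimming only the title
# when the whole message would run past the limit.
def format_deal(title, published, author, url)
  tail = " Pub. #{published} by #{author} (#{url})"
  room = [TWEET_LIMIT - tail.length, 0].max
  title[0, room] + tail
end

puts format_deal('Storage Networks', 'Jun 2004', 'Daniel J. Worden', 'http://su.pr/2R677z')
```

With today’s book this prints the same text as the example tweet above.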

Disclaimer: I’m not associated with Apress in any way. I was just annoyed/surprised that they didn’t have an RSS feed or something for a “daily deal”. I wouldn’t remember to come back every day. Yes, Apress has some affiliate type program but you can’t earn anything off the daily deal. I think the only thing this has really done, for me, is cause me to spend more money because I’m actually seeing the deals every day. (Though if they wanted to send me a Kindle DX to read all these books on, I wouldn’t send it back :) )


Written by jeff

June 25th, 2009 at 4:18 am

Posted in books,ebook,ruby


Why my impatience drove me to reinvent the wheel and how I fixed it

without comments

URLAgg was my pet project, an itch scratcher. It pulls the delicious popular json feeds and makes sure I only see new links and that they’re available to me, across all or one of my tags, via summary or itemized feeds. It’s supposed to save me time by not living on delicious, manually moving around popular feeds and trying to pick out new links.

The first few days I’d add a tag, and at the top of the hour after the fetch had run I’d see new links for that tag, hot and popular. Then all I’d see was hot and popular.

The delicious algorithms for hot and popular are proprietary (read: smoke and mirrors). I thought I was missing something, so I added a bunch of changes to pull the tag feed directly. Then I needed to make another call for each link to see how many times it was bookmarked. The counts ranged anywhere from single digits to almost 65,000, so I added a threshold (first 50, then 500) and made sure it was used when showing links on the web. Then I had to deal with freshness: I didn’t want to work on updating the 50,000 links I’d pulled down if they were old.

All told I’ve spent a week on that, or a week and a half when you count the time I was pairing as well. Not 60 hours, but time when I wasn’t doing anything else. Finally, while trying to tweak it one more time this morning because it looked like a lot of dupes were coming back after being marked as read, it hit me: I’m trying to implement my own delicious popular algorithm.

Ugh. So I started disabling that code and updating specs to go back to where it was before. I could have reverted, but in the meantime I had added an admin dashboard to help debug my filtering, among other things.

I’m going to give it a week or so and not touch it. See if things level out, if there really aren’t that many links becoming popular for individual tags.

I have enough to keep me busy. Beta invitations, coupons and making sure express checkout always runs and passes in my cucumber tests (for a different project).

The lesson for me, I hope, is “let the system return to equilibrium before trying to make adjustments”. Only make adjustments based on customer needs, and I’m not always the best customer.


Written by jeff

May 27th, 2009 at 8:37 pm

Posted in Cucumber,URLAgg,rails,ruby


MySQL charsets, rails and collation mismatch

without comments

URLAgg parses json feeds once an hour in a rake task that is kicked off by cron.hourly. I’ve used backgroundrb for other projects (hushchamber most recently) but went with the simplest thing that could possibly work. I’ll probably have to move to something else because I’d like to parse feeds near real time when a user tracks a completely new tag to URLAgg, but that’s another story.

I get emails from cron letting me know how things went, and started noticing a few had bombed out with an error message like this:

rake aborted!
Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '=':
SELECT * FROM `source_tags` WHERE (`source_tags`.`name` = 'フリーソフト') LIMIT 1

(See full trace by running task with --trace)
(in /var/www/apps/urlagg/releases/20090510100125)

I had UTF8 specified as my encoding in database.yml and thought that would have been good enough. Obviously it wasn’t.

You see, MySQL can have different charsets for the database, tables and even columns within the tables as well as the connection.

To find out what charsets you have defined you can do this:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

Mine was a mix of latin and utf8. I wanted UTF8 across the board so I wrote up a ruby shell script to take care of the problem for me:
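The script itself was embedded as a gist. What follows is a minimal sketch of the same idea, not the original: it only builds the ALTER statements for a database and a list of tables, so they can be reviewed before being fed to the mysql client. The helper name utf8_conversion_sql, the database name, and the links table are hypothetical; source_tags is the table from the error above.

```ruby
# Build the statements that switch a database and its tables to one charset.
# Nothing here talks to MySQL; the caller decides how to run the output.
def utf8_conversion_sql(database, tables, charset = 'utf8', collation = 'utf8_general_ci')
  # Changes the default used for tables created from now on.
  statements = ["ALTER DATABASE `#{database}` CHARACTER SET #{charset} COLLATE #{collation};"]
  tables.each do |table|
    # CONVERT TO rewrites the existing text columns as well as the table default.
    statements << "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{charset} COLLATE #{collation};"
  end
  statements
end

# Hypothetical database and table names, for illustration only.
puts utf8_conversion_sql('urlagg_production', %w[source_tags links])
```

The printed statements can be piped into the mysql client once a backup is in place.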


I’m still looking for a way to set my charset to UTF8 in one of my config files (/etc/my.cnf or something) so that it’s defaulted everywhere to UTF8. If you’re aware of how to do that let me know. In the meantime, hopefully this saves you some headaches.

[edit] The formatting of the <pre> and gist code blocks isn't very nice. I'd love to be able to edit formatted code snippets through Blogo and not mess with this, but atm I can't. Hopefully soon.


Written by jeff

May 15th, 2009 at 3:10 am

Posted in Mysql,rails,ruby


hushchamber.com

with one comment

It all started with a simple post from Jason Haley while I was on vacation with my family at Emerald Isle.  The post was about creating a link blog aggregator to show the day’s top picks and diamonds in the rough without having to sift through everyone’s mention of X.  Like I said, I was on vacation and trying really hard to stay disconnected.  I managed to slip in a few minutes here and there while my youngest daughter took her naps throughout the day, and played a bit with Hpricot trying to parse some feeds.

I put things on hold, finished vacation and got caught up with a couple other things.  Finally I came back and decided to crank out a 1.0 version (no betas here).  The result is hushchamber.com

It’s another ruby on rails project for me with some interesting challenges.  I’m using backgroundrb to do all my asynchronous and background processing (mainly fetching new feeds from RSS and parsing those feeds for new links).  I’m using god to monitor backgroundrb and restart it if one of the workers goes down.  This normally happens for me when open-uri times out (I think; I can’t catch the error in a rescue for some reason).

So what were the challenges?

Parsing was a big one.  Not that it’s difficult with Hpricot, but some blog clients produce better markup than others.  I’m trying to get only link mentions and not links to other stuff (like the link blogger’s blog, or job search sites, or events that are repeated for up to two weeks).  In some instances I had to strip out some content before I could successfully parse it.

Links, surprisingly, were another one.  How the linkblogger linked to content (sometimes through a feedburner feed, sometimes directly to the resource on the web, sometimes through a URL shortening service) produced different URLs.  The challenge here was getting all 3 links to the same resource to behave like the same resource.  I ended up having to “resolve” the URL to its original web form and use that as the unique identifier for the link.
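Resolving a link can be sketched as following redirects until they stop. This is an illustration under assumptions, not the hushchamber code: resolve_url and HEAD_FETCH are hypothetical names, the hop limit of 5 is arbitrary, and the example hosts are made up. The lookup is passed in as a callable so the example below runs against a stubbed redirect table instead of the network.

```ruby
require 'net/http'
require 'uri'

# Default lookup: issue a HEAD request and return the redirect target,
# or nil when the response is not a redirect.
HEAD_FETCH = lambda do |url|
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  response = http.head(uri.request_uri)
  response.is_a?(Net::HTTPRedirection) ? response['location'] : nil
end

# Follow redirects from url, giving up after `limit` hops, and return the
# last URL reached. That final URL is what gets used as the unique key.
def resolve_url(url, limit = 5, fetch = HEAD_FETCH)
  limit.times do
    target = fetch.call(url)
    return url if target.nil?
    url = URI.join(url, target).to_s # Location headers may be relative
  end
  url
end

# Stubbed redirect table standing in for a shortener and a feed proxy.
hops = {
  'http://short.example/a' => 'http://feeds.example/b',
  'http://feeds.example/b' => 'http://blog.example/post'
}
puts resolve_url('http://short.example/a', 5, lambda { |u| hops[u] })
```

All three forms of the link then collapse to the same identifier, whichever one the linkblogger happened to use.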

Titles were another issue, titles for links that is.  I started using the inner html of the anchor tag as the text/title for the link itself.  So in the following example I’d grab “Jeff’s Blog” and set that as the text to use for the link.

  <a href="http://thequeue.net/blog">Jeff's Blog</a>

The problem is, some titles have <font> and <strong> tags in them, and I really didn’t want that.  Other times it’s just the link blogger’s description of the resource, which may or may not make sense out of the context of the blog entry itself.

I played with the idea of going out to the resource directly and using hpricot again to grab the <title> element on the page.  I actually did that, but 90% of the sites I tried it on have the real title plus some other information about the site as well.  It’s really bad when it’s a Code Project link: the title++ portion is something like an extra dozen words.  I ended up going back to just using the linkblogger’s inner html, and the first one in wins.
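Getting rid of the stray markup can be as simple as stripping anything that looks like a tag. A rough sketch, assuming the inner html is already in hand as a string; plain_title is a hypothetical helper, and a regex like this is a shortcut rather than a real HTML parser.

```ruby
# Drop anything that looks like a tag, then collapse leftover whitespace.
def plain_title(inner_html)
  inner_html.gsub(/<[^>]*>/, '').gsub(/\s+/, ' ').strip
end

puts plain_title(%q{<strong>Jeff's <font color="red">Blog</font></strong>})
# prints: Jeff's Blog
```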

Top links and diamonds in the rough

Surprisingly there weren’t as many links mentioned from a bunch of linkbloggers as I thought there would be.  Even on jQuery Monday there was a lot of talk about jQuery and Visual Studio and ASP.NET MVC but they were all to different posts.

image

Which brings up another interesting suggestion, or thought, to display links grouped by tag.  I’ve played around with going out to del.icio.us (this is still faster than their new URL for me to type) and grabbing the top tag for a given link and grouping by that tag.  We’re left at the mercy of the initial taggers of a resource but most of the time they’re close.  Though it doesn’t help for a completely unrelated bunch of links.

So, check it out: http://hushchamber.com

There’s also an RSS Feed for it that you can subscribe to and get the last 30 days or so worth of links; if old posts come in you do get updates to RSS.  Let me know what you think by using the feedback button:

image

Written by jeff

October 15th, 2008 at 2:19 pm

Posted in blogging,rails,ruby

APRESS Daily Deal Twitterbot

with 2 comments

In a previous post I described how I’m keeping up with APRESS’ eBook Deal of the Day.  At the time I was using a Firefox extension, Update Scanner to watch for changes to the web page.  It worked well enough, but it still required me to physically go out to the page and I’ve noticed a couple quirks.  Since then I got excited about a tangent, zenhabits Twitterbot Challenge.  I never actually participated in the challenge as the Twitterbot thread took me in a different direction.

While exploring some of the different uses people have been using twitter for I saw a w00t twitterbot that tweets w00t items, handy in a w00t off.  That wasn’t far off from what I wanted from APRESS (I would have been happy with RSS, but I’m using twitter regularly enough that it’s a decent solution).  All I need to do is go out to the daily deal page, scrape it for the book name and link, then tweet it.  How hard can it be?

Not that hard, really, about fifteen lines of ruby code and a few gems:

  1: #!/usr/bin/ruby
  2: 
  3: require 'rubygems'
  4: require 'hpricot'
  5: require 'shorturl'
  6: require 'twitter'
  7: require 'open-uri'
  8: 
  9: doc = Hpricot(open('http://www.apress.com/info/dailydeal')) 
 10: deal = doc.search("//div.bookdetails")
 11: book = (deal/"h3/a").first
 12: description = (deal/"p")
 13: details = (description/"div")
 14: (description/"div").remove
 15: 
 16: root_url = 'http://www.apress.com'
 17: book_url = ShortURL.shorten(root_url + book.attributes['href'])
 18: book_title = book.inner_html
 19: 
 20: tweet_start = "[#{Date.today}]: "
 21: tweet_end = " (#{book_url})"
 22: book_title_shortened = book_title[0, 140 - tweet_start.length - tweet_end.length]
 23: tweet = tweet_start + book_title_shortened + tweet_end
 24: 
 25: twitter ||= Twitter::Base.new("ApressDailyDeal", "my_super_secret_password")
 26: twitter.post tweet

So, let’s break it down.

Line 1 is there because I’m going to set this to run as a cron job later and I’d rather do ./dailydeal.rb than ruby dailydeal.rb.

Lines 3-7 bring in some gems (similar to perl’s cpan packages).  open-uri to get the page, Hpricot to scrape it, short-url to make the link to the book shorter and twitter to tweet.

Lines 9-14 could probably be tightened up, but I rather like the “clarity over cleverness” meme.  Surprisingly, APRESS’ page was rather easy to scrape, which means it was well marked up, using classes appropriately for semantic meaning (div class="bookdetails" actually held… book details).

I stitch the URL together in 16-18 and call out to short-url’s shorten method.  It supports a bunch of services, rubyurl is the default and with tinyurl not working for me, I went with it.  I actually forked the project on github to add support for is.gd (every character counts with twitter) but I’m waiting for a gem update before I switch to that.

In lines 20-23 I assemble the tweet.  I broke it into a few parts because I need to make sure the message length doesn’t exceed 140 characters.  The date and URL to the book are more important than the entire title of the book, so I truncate that if necessary.

The last lines are just sending the tweet.

On my linux box I used crontab -e to enter the following job:

  30 08 * * * /home/jeff/development/apress_twitterbot/dailydeal.rb

So at 8:30 every morning, give or take a bit of drift either way, I’ll tweet off today’s new Daily Deal:

image

The original page:

image

 

So, if you’re interested in getting APRESS’ Daily Deal tweeted to you every morning (EST) follow my bot: http://twitter.com/ApressDailyDeal

eBooks are becoming a lot more interesting with the introduction of the Plastic Logic reading device (http://www.plasticlogic.com/)

Written by jeff

September 15th, 2008 at 2:15 pm

Posted in ruby

Euler 14, F#, C#, Ruby and a Geriatric Gerbil

without comments

I came across this post while clearing out my Google Reader items this morning. While I’m not particularly interested in F# I am interested in math problems so I skimmed the post. I have nothing against F#, I just don’t have time to learn yet another thing.

I’ll describe Euler 14 in a minute, but I need to get some attribution out of the way.  As I said, this post was my introduction to the problem.  It turned out to be a response to this post, asking for a faster F# solution.  So here is problem 14 of Project Euler:

The following iterative sequence is defined for the set of positive integers:

n → n/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.
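As a quick illustration of the rule (not part of the problem statement), a minimal Ruby sketch that builds the chain for a given start. collatz_chain is a hypothetical helper name, and it assumes the chain really does reach 1.

```ruby
# Apply the rule until the chain reaches 1 and return every term.
def collatz_chain(start)
  chain = [start]
  while start > 1
    start = start.even? ? start / 2 : 3 * start + 1
    chain << start
  end
  chain
end

chain = collatz_chain(13)
puts chain.join(' -> ') # 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
puts chain.length       # 10
```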

I thought I’d do it in ruby, mainly to pass the time and give me an excuse to use ruby. I first started out just printing the first hundred Euler chains.  Then I thought I’d see if I could solve the problem, the longest chain generated from a number less than 1,000,000. My first, brute force implementation looked something like this:

require 'benchmark'

def even?(num)
  num % 2 == 0
end

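# Counts the steps needed to reach 1, which is one less than the number of
# terms the problem statement counts (13 gives 9 here, not 10). The offset is
# the same for every start, so the winning start is unaffected.
def euler_chain_size(start)
  length = 0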
  while start > 1
    if even?(start)
      start = start/2
    else
      start = 3*start+1
    end
    length += 1
  end
  length
end

max_chain_size = 0
max_chain_start = 1

Benchmark.bm do |x|
  x.report do
    1.upto(999_999) do |i| # "under one million" excludes 1,000,000 itself
      chain_size = euler_chain_size(i)
      if chain_size > max_chain_size
        max_chain_size = chain_size
        max_chain_start = i
      end
    end
  end

  puts "(#{max_chain_start}, #{max_chain_size})"
end

I had saved that off to euler14.rb and typed this in at my console:

time ruby euler14.rb

For my windows readers unfamiliar with time, it “runs programs and summarizes system resources”. Basically it’s a cheap way to see how long it took to run. The benchmark was added so I can get timings in windows (or anywhere).

For me that came back at:

real 7m34.232s

Ugh. The geriatric gerbil generating juice for my X41T must have died halfway through and his twin somehow resuscitated him (it’s a core2 duo, of course). To make matters worse, I’m doing this all on Ubuntu in a VM on Windows Vista SP1.

I was playing a bit and mentioned the problem to a buddy of mine. He’s a python dev and he whipped up a nearly identical brute force method in python. His took 1m17s. He does have some crazy fast laptop, but still, that’s roughly 6x faster.

I don’t know why, but I thought I’d move even? into an inline ternary. That version didn’t change much, so here is the new euler_chain_size method:

require 'benchmark'

def euler_chain_size(start)
  length = 0
  while start > 1
    start = start%2 == 0 ? start/2 : 3*start+1
    length += 1
  end
  length
end

That run, for me, was nearly twice as fast.

C:\development\InstantRails2\dev>ruby euler14.rb
      user     system      total        real
207.898000   0.010000 207.908000 (252.110000)

I’m still new to ruby, but I remember reading about methods and messages in ruby and think this may be the overhead of ruby message sends: even?(num) gets called once per step of every chain, which is far more than 1,000,000 calls. Maybe not. If you know, would you let me know?

Not bad, but I’m still not even close to my buddy’s time. So I applied a quick optimization and now I’m down to:

C:\development\InstantRails2\dev>ruby euler14a.rb

user system total real
17.585 0.030000 17.615000 ( 21.230000)

How am I going from 208 seconds to 18? I’m not recalculating the lengths of chains we encountered before (or already calculated).

require 'benchmark' 

def euler_chain_size(start)
  length = 0
  while start > 1
    start = start%2 == 0 ? start/2 : 3*start+1
    length += 1
    if @chains.has_key?(start)
      length += @chains[start]
      break
    end
  end
  length
end 

@chains = {}
max_chain_size = 0
max_chain_start = 1 

Benchmark.bm do |x|
  x.report do
    1.upto(999_999) do |i| # "under one million" excludes 1,000,000 itself
      chain_size = euler_chain_size(i)
      @chains[i] = chain_size
      if chain_size > max_chain_size
        max_chain_size = chain_size
        max_chain_start = i
      end
    end
  end
  puts "(#{max_chain_start}, #{max_chain_size})"
end

I store the length of chains in a hash, keyed off of their start value. When I get to the start value of a chain that I already know (is in the hash) I just add that length to my current length and break out of my while loop.

I’m sure I could optimize this more and it’s probably horribly ugly ruby in a c# mind set, so I’d love input on how to refactor my code.
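One direction a refactoring could take, offered as a sketch rather than a definitive answer: let a Hash with a default block do the caching, so the lookup and the calculation live in one place. CHAIN_LENGTHS and longest_chain_start are hypothetical names. Unlike the code above, this version counts terms (so 13 gives 10), and the example runs on a small limit so the real answer stays unspoiled.

```ruby
# Cache of chain lengths (number of terms). A missing key is computed from the
# next term in its chain, which is itself looked up (and cached) recursively.
CHAIN_LENGTHS = Hash.new do |cache, n|
  following = n.even? ? n / 2 : 3 * n + 1
  cache[n] = 1 + cache[following]
end
CHAIN_LENGTHS[1] = 1 # the chain for 1 is just 1 itself

# Start value below `limit` with the longest chain.
def longest_chain_start(limit)
  (1...limit).max_by { |n| CHAIN_LENGTHS[n] }
end

puts CHAIN_LENGTHS[13]       # 10
puts longest_chain_start(10) # 9
```

The trade-off is recursion depth: each uncached term adds a nested lookup, which is fine for chains of a few hundred terms but is worth knowing about.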

The answer? You should really work it out on your own.  If you’re really curious post a comment or send me an email.

Written by jeff

April 4th, 2008 at 12:33 am