mp3Salad BLOG








Purchase Albums when you find music


Posted At : June 26, 2007 3:50 PM | Posted By : Justin

Whether you are a frequent visitor to the site or someone new, you will have noticed that there are links beneath the music player to purchase albums related to your search. These albums link over to Amazon.com, where you can purchase them directly.


Improved MP3 Link Parser : Updated


Posted At : February 15, 2007 6:33 PM | Posted By : Justin

Yesterday I updated the link parser that the results detail page uses to extract links to mp3s. It can now parse most of the pages that the previous version could not.


As I mentioned in my previous post, the old link parser used regular expressions to extract links from a page. This method of parsing was very fast, but not accurate enough. So I decided to switch over to a DOM-based parsing strategy: I convert the target page into a DOM structure and then pick out the links that I want using XQuery. The end result is more accurate, giving the end user more mp3 links, at the cost of a slightly slower runtime.


One good thing about having a second strategy to rely on is that I still have the first in case the second doesn't work. The link parser will fall back to the regular expression parsing method if for some reason it can't convert the target page into a DOM.
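For the curious, the DOM-first, regex-second idea can be sketched roughly like this. This is only a minimal illustration in Python, not the site's actual parser: the function names and the regular expression are made up for the example, and the standard library's strict XML parser stands in for whatever DOM conversion the real parser uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: any quoted href value ending in .mp3 (case-insensitive).
MP3_HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+\.mp3)["\']', re.IGNORECASE)


def links_via_dom(page):
    """Parse the page into a tree and collect hrefs of <a> elements ending in .mp3."""
    root = ET.fromstring(page)  # raises ET.ParseError if the markup is not well-formed
    hrefs = (a.get("href", "") for a in root.iter("a"))
    return [h for h in hrefs if h.lower().endswith(".mp3")]


def links_via_regex(page):
    """Fast but less precise: scan the raw text for href="....mp3"."""
    return MP3_HREF_RE.findall(page)


def extract_mp3_links(page):
    """Try the DOM strategy first; fall back to the regex if the page won't parse."""
    try:
        return links_via_dom(page)
    except ET.ParseError:
        return links_via_regex(page)
```

A page with an unclosed tag makes the strict parser give up, so `extract_mp3_links` quietly uses the regular expression instead. A more forgiving HTML parser would reach the fallback far less often.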


Improved MP3 Link Parser


Posted At : February 8, 2007 10:12 PM | Posted By : Justin

Improvements to the MP3 Link Parser are on their way. The other day I noticed that a few sites were incorrectly being labeled as onions because my link parser wasn't intelligently identifying links to mp3s.


I was using a regular expression to find and extract links, which worked well for 'index of' sites, but not so well on non-directory-listing pages. Because of that, I'm testing an approach that parses the page content returned by the crawler into an XML object, which I can then easily query using XPath to pick off links.
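To show the kind of difference I mean, here is a rough Python sketch. The regular expression, the sample pages, and the function names are all invented for illustration and are not the ones the site uses. The pattern is tuned to the plain anchors a directory listing produces, while the XPath query only cares that an anchor has an href attribute.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive pattern, tuned to directory listings: <a href="file.mp3">
NAIVE_RE = re.compile(r'<a href="([^"]+\.mp3)">', re.IGNORECASE)

# Made-up 'index of' style page: bare anchors with double-quoted hrefs.
INDEX_OF_PAGE = (
    '<html><body><h1>Index of /music</h1>'
    '<a href="one.mp3">one.mp3</a> <a href="two.mp3">two.mp3</a>'
    '</body></html>'
)

# Made-up non-directory page: extra attributes and single quotes on the anchor.
BLOG_PAGE = (
    "<html><body><p>New track: "
    "<a class='dl' href='/files/three.mp3' title='Download'>listen</a>"
    "</p></body></html>"
)


def regex_links(page):
    """Extract mp3 links by matching the raw markup against the naive pattern."""
    return NAIVE_RE.findall(page)


def xpath_links(page):
    """Parse the page as XML and query for every <a> that has an href attribute."""
    root = ET.fromstring(page)
    anchors = root.findall(".//a[@href]")
    return [a.get("href") for a in anchors if a.get("href").lower().endswith(".mp3")]
```

Both functions find the two links on the listing page, but only `xpath_links` finds the link on the second page, because attribute order and quoting style no longer matter once the markup is parsed. The catch is that the page has to parse as XML in the first place.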

