Version 0.1 of MD_Extract pushed to GitHub

Just pushed version 0.1 of MD_Extract to GitHub. So what’s new? I improved the text extraction function to do simple things like adding a line break after a br element and not picking up text from within a comment. The reason it’s 0.1 and not 0.99 is simply that it has not been tested, and unless I get some feedback I won’t know if I’m missing some weird markup people might write or edge cases I might not be considering. Anyhow, I will be moving on to writing a validation framework on top of this, but first I will be taking some time to look at the tidy source code.
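To make the two behaviours mentioned above concrete, here is a minimal sketch in Python using only the standard library. It is not the MD_Extract code; the names `TextExtractor` and `extract_text` are made up for illustration. It just shows one way to emit a line break for each br element and to leave comment text out of the result.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, treating <br> as a line break."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/>, because the default
        # handle_startendtag delegates to handle_starttag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Only real text nodes arrive here.
        self.parts.append(data)

    def handle_comment(self, data):
        # Comment contents are deliberately ignored.
        pass


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(extract_text("Hello<br>world<!-- hidden -->!"))
```

A real extractor would also have to decide what to do with script and style content, block-level elements, and whitespace, which is exactly the kind of edge case that needs feedback.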

Fixed hcard_extract :-)

I just fixed the original application I wrote for extracting hCards from web pages: http://www.metonymie.com/hCard_extract/app.html. I checked the source code: some conf files were out of place after we changed servers, and other than that there were some minor changes to avoid warnings under the new PHP settings. Anyhow, it’s live like it originally was, with the original code, which is available for download. I’m kind of happy to have this live again.

Version 10^-1.8 of MD_Extract

Just pushed version 10^-1.8 of MD_Extract to GitHub. Added a construct-by-URL method.

One clarification: what I meant in the other post by economic efficiency was simply that I have better things to do with my life, i.e. time is a scarcer resource than CPU cycles. And since I’m doing this just to showcase microdata as a technology (I don’t have an actual need for microdata right now), the approach I’m taking is good enough.
