A simple library for doing Form Validations with JQuery

I’ve just uploaded JQuery-Valid to GitHub. It is a simple library for doing form validations with jQuery. I developed this a couple of years ago, but I think it’s still useful when dealing with browsers that do not fully support the HTML5 spec.

An example of the library’s usage can be found here.

Version 0.1 of MD_Extract pushed to github

Just pushed version 0.1 of MD_Extract to github. So what’s new? I improved the text extraction function to do simple things like adding a line break after a br element or not picking up text from within a comment, etc. The reason it’s 0.1 and not 0.99 is simply that it has not been tested, and unless I get some feedback I won’t know if I’m missing some weird markup people might write or edge cases I might not be considering. Anyhow, I will be moving on to writing a validation framework on top of this, but first I will be taking some time to look at the tidy source code.

Fixed hcard_extract :-)

I just fixed the original application I wrote for extracting hcards from webpages: http://www.metonymie.com/hCard_extract/app.html. I just checked the source code: some conf files were out of place after we changed servers, and other than that there were some minor changes needed to avoid warnings under the new PHP settings. Anyhow, it’s live like it originally was, with the original code that is available for download. I’m kind of happy with having this live again.

Version 10^-1.8 of MD_Extract

Just pushed to github version 10^-1.8 of MD_Extract. Added a construct by URL method.

One clarification: what I meant in the other post by economic efficiency was simply that I have better things to do with my life, i.e. time is a scarcer resource than CPU cycles. And since I’m doing this just to showcase microdata as a technology (I don’t have an actual need for microdata right now), the approach I’m taking is good enough.

My take on Microdata versus Microformats

To provide some background, I’m going to start with a little story of how I originally got involved with the microformats community. A couple of years ago I was working at thethingsiwant.com. I had implemented a little hack: whenever someone added an item from Amazon, I would analyze the URL, extract the ASIN, send a Web service request to their server and get back the price, image and referral link of the product. Users of the site really liked having an image for their items and they started asking to have this for other sites. Feeling encouraged, I ran a query on the database, picked the 100 most popular websites and wrote an extendable crawler that picked basic product data from their pages (yes, I know, I’m insane) [*]. Anyhow, what happened next was that some of our users had e-commerce sites of their own and they started asking “Why doesn’t TTIWBot pick data from my site?”, so I would generally take their request and add a customization of the bot for their site. The thing is, this was very time consuming, and every time a website changed its layout I had to change the customization of the bot. It quickly became clear that some sort of common format would have been better. Now, I could have gone the easier route and asked them to implement an XML Web Service, but some were really small sites, you could tell they were done on a budget, and the truth is it would have been cruel to ask them to do something as complicated and as expensive as a Web Service was at the time. Enter Microformats.

I read somewhere (I don’t remember exactly where) that there was a new technology being developed called microformats, originally backed by Technorati. I looked around the documentation, understood the basic concepts and decided to contribute. To be honest, I was a little scared to be posting on their list in the first place [*], since most people there seemed to have advanced degrees from prestigious universities, to be working at blooming Silicon Valley startups, or were people whose work I had read (like Mark Pilgrim). At the beginning, I didn’t feel that I was taken very seriously, and I felt that was reasonable, since the only affiliation I had at the time was with TTIW, a small, independent website with no backing whatsoever and whose markup wasn’t exactly perfect. So, to prove that I was serious, I wrote a little web app that you can find here, though due to changes in PHP it currently doesn’t work anymore. To get an idea of how it used to work you can see the review by Michael Coté, here. I also contributed the examples in the wild section of hListing. Of course, my enthusiasm took over and I wrote all kinds of extra stuff that I just thought of, like what you can find here, but that’s because I liked the technology and I started thinking about uses for it.

Anyhow, my point here is that I had an actual need. As the discussions continued (which I didn’t follow completely since I was really busy with other stuff), it became clearer that there was a gap between the formats being developed and what I actually needed. Specifically, I remember not being very happy with the format for price, since a value without a currency attached makes the information worthless and TTIW had users from a variety of countries. I also thought that there were some design issues, and as an end user I thought that having a validator would make life easier for everybody involved. I found a way to write one (using the structure provided by tidy) and tried contacting the people responsible for microformats to tell them about this and maybe warn them about some possible problems, but I got no reply. So I decided to go ahead and release it (xmfp), wrote a short piece on some of the issues I thought the technology had, and sort of withdrew from the list.

A lot of things happened to me and the world after that, and I sort of withdrew from web development and disconnected for a while. Then the Microdata spec came out, and since I like the technology I wrote a quick implementation, which I’m still working on. So let’s move on to why I believe Microdata is a better spec.

A quick summary of why I believe Microdata to be a better format than Microformats

It doesn’t break RDF

It is entirely possible, without much effort, to encode RDF in microdata, and an example is provided here. A lot of work has been done on RDF and on linked data technologies in general, and there are plenty of things you can do with linked data (with or without RDF) that you cannot do with microdata or that are outside its scope. A good summary of this was provided by Kingsley Idehen in this comment on a post by Georgi Kobilarov.

Clear syntax rules

The syntax is extremely simple, and once you have access to a tree representation of the HTML document, like the one provided by Tidy or the DOM, it’s really simple to extract the data; the WHATWG even provides an algorithm for doing so. Furthermore, if you are working in a scripting language, you could base your implementation on the one I did (it’s just 5 very recursive functions).

Another good thing is that even though it’s very simple, it is extremely powerful and you could encode a wide variety of complex data types with it.
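To give a feel for how little is involved once a tree is available, here is a rough JavaScript sketch. It is not the MD_Extract code and not the full WHATWG algorithm: the node shape (`attrs`, `text`, `children`) and the function names are assumptions made up for the illustration, itemref is ignored, and every property value is taken from the node’s text.

```javascript
// Simplified sketch: nodes are plain objects, not real DOM or Tidy nodes.
// Assumed (hypothetical) shape: { attrs: {...}, text: "...", children: [...] }.

// Collect the properties of one item, walking down until a nested itemscope.
function collectProperties(node, props) {
  for (const child of node.children || []) {
    const attrs = child.attrs || {};
    if ("itemprop" in attrs) {
      // A property that is itself an item becomes a nested item;
      // otherwise this sketch just uses the node's text.
      const value = "itemscope" in attrs ? extractItem(child) : (child.text || "");
      (props[attrs.itemprop] = props[attrs.itemprop] || []).push(value);
    }
    // Keep walking, but never cross into another item's scope.
    if (!("itemscope" in attrs)) {
      collectProperties(child, props);
    }
  }
  return props;
}

// Build the item rooted at a node that carries itemscope.
function extractItem(node) {
  const item = { properties: collectProperties(node, {}) };
  if (node.attrs && node.attrs.itemtype) {
    item.type = node.attrs.itemtype;
  }
  return item;
}

// Find every top-level item (itemscope without itemprop) in the tree.
function extractItems(node, out = []) {
  const attrs = node.attrs || {};
  if ("itemscope" in attrs && !("itemprop" in attrs)) {
    out.push(extractItem(node));
  }
  for (const child of node.children || []) {
    extractItems(child, out);
  }
  return out;
}
```

Calling `extractItems(root)` on such a tree returns an array of items, each with a `properties` map from property name to a list of values.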

It doesn’t require post-processing of the picked-up values, which I believe was a terrible mistake in the design of Microformats. Why do I believe this? Because it makes a generalized extractor or parser impossible to build, since you would have to add by hand the post-processing rules of every new format you want the extractor to support.

There are no vocabulary validation rules so far in the Microdata spec, so this is still open, but some vocabularies developed by the microformats community had the problem of properties with different value types. For example, org in hCard might be either a string, "org":"example", or a structure with 2 different values, "org":{"organization-name":"example-name", "organization-unit":"example-unit"}. This is unnecessarily complex to work with when handling the data in general applications.
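To make the cost concrete, this is the kind of branching a consumer ends up writing for every property that can take two shapes. It is only a sketch: `normalizeOrg` is a hypothetical helper, not part of any microformats library, and using `null` for a missing part is an arbitrary choice.

```javascript
// Hypothetical helper: bring both shapes of "org" to one structure.
function normalizeOrg(org) {
  if (typeof org === "string") {
    // Plain string form: treat it as the organization name.
    return { "organization-name": org, "organization-unit": null };
  }
  // Structured form: copy the two known parts, defaulting to null.
  return {
    "organization-name": org["organization-name"] || null,
    "organization-unit": org["organization-unit"] || null
  };
}
```

A generic application needs one such function per ambiguous property, which is exactly the per-vocabulary special casing that a single value type would avoid.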

Not limited to a closed set of vocabularies

The microdata spec does not force the use of any particular vocabulary. In fact, the choice of vocabulary is completely up to the implementor. This means that if a user has a particular need (like the one I had) and there is no vocabulary that fits that need, he can create his own.

It’s a W3C Spec

There is not much more to say about it ( and I mean this in a good way ). This also means that it will probably be implemented by most browsers.

(*) None of this is working anymore, since most websites have changed layouts, Amazon requires a timestamp, and TTIW’s server is so old the clock keeps getting out of sync. I also did a lot more like this (like integrating the crawler with the datafeeds of different referral services, identifying a product across different websites by ISBN or UPC, etc.), but none of it is relevant to what this post is about. In the end I wrote another algorithm in JavaScript that picks the image based on the areas of HTML elements on the page and also does some basic price picking on the text of the page, and that’s what’s currently live, but TTIW is not actively being maintained.

(*) I even made a couple of faux pas. For example, in a discussion of semantic URLs I was trying to find an example, went to TTIW’s Tags page and clicked on the tag “gothic lolita” (which was a popular tag at the site since a lot of the users were teen girls, and this youth subculture was trendy at the time). Then I realized that it didn’t look very serious and changed it to star wars; unfortunately I forgot to change the link. It’s just that I was petrified of posting on the list.

Version 10^-1.9 of MD_Extract pushed to github

Completely changed the way the string representing the HTML is preprocessed before being fed to tidy: I’ve changed both the function and the approach. The function is not really very elegant but it fixes a bunch of bugs. It’s mostly character iteration and lots and lots of flags (old school style!!). But after doing some quick browsing of the HTML parsing algorithm provided by the WHATWG, it got me thinking whether I shouldn’t just write my own parser (though it looks sort of hard and especially time consuming). I’ve also been looking at the source code of tidy. Though it’s quite big, the other option would be to try to contribute to it and help update it to HTML 5, but it would take some time for me to get to know the code base, the project seems to have been abandoned, and it might be too big for just one person to work on. Anyhow, I’m not promising anything so far.

I do understand that the current approach the library takes on this (preprocessing and then sending to tidy) is not the most efficient one. However, there is another take on efficiency, and that’s economic efficiency: except for really heavy duty Microdata consuming the library does fulfill its purpose, and the truth is Microdata is a new spec that still has to be widely adopted, so that’s not a real concern right now. So the question is whether it makes sense to spend the next 3 months writing a parser from scratch when the one I have does fit my needs (and probably those of 99.999% of the PHP developers that may use the library). So far I don’t see the point. But then again my geeky side keeps bugging me to do it right.
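For readers who have not seen the "character iteration and flags" style, here is a tiny JavaScript illustration of it. It is not the preprocessing function from MD_Extract (which is PHP and handles far more cases); `stripComments` is a made-up name, and the only thing it does is drop HTML comments using a single flag.

```javascript
// Illustrative only: walk the string once, tracking state in a flag.
function stripComments(html) {
  let out = "";
  let inComment = false; // flag: currently between <!-- and -->
  let i = 0;
  while (i < html.length) {
    if (!inComment && html.startsWith("<!--", i)) {
      inComment = true;  // entering a comment, skip the opener
      i += 4;
    } else if (inComment && html.startsWith("-->", i)) {
      inComment = false; // leaving the comment, skip the closer
      i += 3;
    } else {
      // Copy the character only when we are outside a comment.
      if (!inComment) {
        out += html[i];
      }
      i += 1;
    }
  }
  return out;
}
```

A real preprocessor needs more flags (inside a tag, inside a quoted attribute, inside a script element, and so on), which is where the "lots and lots of flags" come from.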

Well, anyhow, if you find any bugs (and I’m sure there might be many, simply because there are very few microdata examples and I might be missing strange markup some user might come up with), please report them!! Other than that, I will next write a post on why I believe microdata to be better than microformats, and I will probably also write a personal post that I sort of owe myself.

My first attempt at a Microdata Extractor.

I’ve just pushed to github, version 10^-2 of MD_Extract . It’s my first attempt at a Microdata consumer.

I based the extraction algorithm on the one published by the WHATWG, though the implementation has some variations, mainly for clarity of code and also due to the particulars of it being done in PHP. I took Tab’s suggestion and it does a first pass through the HTML tree to collect references to elements with IDs, which makes the code so much clearer and nicer than what I was originally planning to do. In fact I think the algorithm is beautiful (and it’s O(n), where n is the number of nodes in the HTML tree).
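The first pass can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the PHP code in MD_Extract: nodes are plain objects with `attrs` and `children`, the function names are invented, and the index is assumed to exist so that itemref values (space-separated ids in the microdata spec) can be resolved with a lookup instead of a search.

```javascript
// First pass: visit every node once and remember elements that have an id.
function indexById(node, index = new Map()) {
  const id = node.attrs && node.attrs.id;
  // Keep the first element seen for a given id, in document order.
  if (id && !index.has(id)) {
    index.set(id, node);
  }
  for (const child of node.children || []) {
    indexById(child, index);
  }
  return index;
}

// Later: resolve an itemref attribute against the index.
function resolveItemref(itemNode, index) {
  const raw = (itemNode.attrs && itemNode.attrs.itemref) || "";
  const ids = raw.split(/\s+/).filter(Boolean);
  // Ids that match no element are silently dropped in this sketch.
  return ids.map((id) => index.get(id)).filter(Boolean);
}
```

Each node is visited once while building the index, and each reference is then a constant-time map lookup, which is consistent with the O(n) behaviour mentioned above.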

I have versioned it at V. 10^-2 because I have not found that many examples to test it with; there are also some anticipated problems with character encodings that do not extend ASCII, and a couple of little things I’d like to add. But as far as I know, regarding microdata syntax it’s 100% compliant with the latest spec.

Version 0.7.2 of Extensible Microformat Parser Released

Maintenance release, fixed the issues with the changes in the PHP language. I’m sorry to people that reported the issue, due to some google code settings I was not receiving emails. Anyhow, other than maintenance and bug fixing I won’t be maintaining the code anymore since I find microdata to be a way better spec than microformats.

Version 0.7 of Extensible Microformat Parser Released

I’ve just officially uploaded to google code the new version of XMFP for download. This release adds transformation of the parsed microformat content into JSON, a wider array of Microformats support and fixes a bunch of bugs and some design issues from the older version. It is basically the downloadable version of the changes that I’ve added to the SVN version since the last downloadable version.

I’ve also changed the License to an MIT License.

DOM Manipulation of SVG embedded inside XHTML

It’s been a while now since the W3C proposed the XHTML + SVG + MathML document type declaration, and a certain level of support has been available for some time. I’ve been testing XHTML + MathML lately (see http://www.metonymie.com/apuntes/2008/05/14/probabilidad-formulas-basicas.html). Now Firefox 3 is out, and supposedly it has support not only for embedded SVG but also for DOM manipulation of the SVG nodes (Correction: Jeff Schiller has pointed out that SVG support has been available in Firefox since version 1.5). So this is a series of tests of the basics of working with SVG from JavaScript.

Getting the content to be recognized by the browser.

While testing this on my machine, I found out that for the SVG to be accepted inside the HTML I had to name the file with an .xhtml extension, since the same document with a .html extension just wasn’t recognized by Firefox (IE 7 didn’t recognize the embedded SVG one way or the other). An example: SVG embedded inline inside XHTML.

So that was fine for local tests, but what about a dynamically generated document?

Well, first we need to define the type declaration and the appropriate namespaces:

	<?xml version="1.0"?>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
	"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
	<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xml:lang="en">

Note: Declaring the namespace on the document level saves us from having to declare it in each and every instance of an SVG element. So out of laziness, I prefer to do it on the top of the document.

But this is not enough if we want the document to be recognized as XHTML: we need to either use the .xhtml extension or send a Content-Type header of text/xml when serving the document. In PHP we can do this by calling header('Content-Type: text/xml'); before any output is sent.

An example: SVG embedded inline inside XHTML in a file with a non .xhtml extension.

You can also set the content type to a more specific definition, for example (inside the head of the document):

	<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8"/>

Using SVG inside XHTML

Now we’ll get into a little (just a little) more complex svg example that we’ll use for the rest of the post: Dom Manipulations of SVG Inside XHTML Examples

Defining the SVG elements

First we’ll create some basic SVG elements inside our document: basically a rectangle and a circle inside a div.

	<div id="svg-content">
		<svg:svg height="300" width="700" id="svg-container">
			<svg:rect fill="#ff5500" height="50" width="200" y="100" x="300" stroke="#000000" stroke-width="2px"/>
			<svg:circle cx="150px" cy="100px" r="50px" fill="#ff0000" stroke="#000000" stroke-width="5px" id="circle"/>
		</svg:svg>
	</div>


Accessing the SVG Objects Properties with Javascript

Getting the value of an SVG Object attribute

It is as simple as retrieving the element by ID and getting the attribute as we would for any other DOM object. In the example page, this is the code on the first button:

	alert( document.getElementById('circle').getAttribute('fill') );

Changing the value of an SVG Object Attribute

Again this is done as we would normally do with any other DOM Object.

	document.getElementById('circle').setAttribute('fill', '#ffdd22');

Creating and adding a new SVG Object

This is a little more complex since we have to create a document node with the SVG Namespace defined.

	function add_new_rectangle () {
		var atts = {"stroke-width":"1", "stroke":"blue", "fill":"yellow", "height":"20", "width":"40", "y":"100", "x":"220"};
		//Defining the SVG Namespace
		var svgNS = "http://www.w3.org/2000/svg";
		//Creating an element in the SVG Namespace
		var node = document.createElementNS(svgNS, "rect");
		//Setting attributes (they live in the null namespace)
		node.setAttributeNS(null, "id", "new-rect");
		for (var name in atts) {
			node.setAttributeNS(null, name, atts[name]);
		}
		var cont = document.getElementById("svg-container");
		//Appending the new node
		cont.appendChild(node);
	}

Using innerHTML with SVG

Now, a very simple way to deal with elements in HTML through the DOM is to use the innerHTML property. Luckily we can do the same with divs containing SVG. Code for the fourth button in the example:


And of course we can replace it with a new SVG element declaration. This is the function that the fifth button calls:

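	function change_innerHTML_to_a_new_SVG() {
		//Opening tag assumed to mirror the svg container defined earlier
		var svg_str = '<svg:svg height="300" width="700" id="svg-container">';
		for (var x = 1; x < 10; x++) {
			svg_str += '<svg:circle cx="' + (x*30) + 'px" cy="100" r="' + (x*5) + 'px" fill="#ff' + (x*10) +  '00" stroke="#000000" stroke-width="3px"/>';
		}
		svg_str += '</svg:svg>';
		var cont = document.getElementById("svg-content");
		//Replacing the whole content of the div with the new markup
		cont.innerHTML = svg_str;
	}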
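
One detail worth noting in the loop above: the fill color is only valid because x*10 always has exactly two digits for x from 1 to 9. A hedged variant that builds the same kind of markup as a pure function, with the green component padded to two hex digits, could look like this. The names `hex2` and `buildCircles` are invented for the sketch, and the opening svg:svg tag is an assumption copied from the container defined earlier in the post.

```javascript
// Two-digit hexadecimal text for a 0-255 color component.
function hex2(n) {
  return n.toString(16).padStart(2, "0");
}

// Builds the markup only; assigning it to innerHTML is left to the caller.
function buildCircles(count) {
  // Assumed opening tag, mirroring the container used earlier.
  let svg = '<svg:svg height="300" width="700" id="svg-container">';
  for (let x = 1; x <= count; x++) {
    svg += '<svg:circle cx="' + (x * 30) + 'px" cy="100" r="' + (x * 5) +
      'px" fill="#ff' + hex2(x * 10) + '00" stroke="#000000" stroke-width="3px"/>';
  }
  return svg + '</svg:svg>';
}
```

In the page it would be used as `document.getElementById("svg-content").innerHTML = buildCircles(9);`. Note that the colors differ slightly from the original, since the green component is now converted to hexadecimal instead of being pasted in as decimal digits.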
