Web Technologies (or What's wrong with Facebook?)

There are a lot of web technologies out there which are gaining momentum, and which are utterly scalable and relatively secure. Together they are shaping the future of the web as we know it.

In the beginning

A long time ago, we all had home pages. Your site would be http://yourisp.com/~username/ and you would upload static HTML files to the 'public_html' directory of your home directory and they would magically appear on your web site. You would tell your friends about it, their sites would hopefully link to yours, and you would link back so you could show who your friends were. The essence of the web. People linking to other people. Eventually, AltaVista or Yahoo got wind of your web page and you could ogle your own search result. Times were good.

Along came web journals which eventually became known as the ubiquitous "blog". Structured sites weren't for individuals; they simply wanted to post new items and have them ordered by date. They wanted to file them under different labels, what we now know as tags. The blog was typically a set of CGI scripts you uploaded to your ISP, or even a LiveJournal or other blogging server. You lost control of the pages, since you weren't hand coding them. You may also have lost some of your friends' links too. A blog is more than a home page, but a blog usually doesn't indicate who your friends are, since a blog is more about posting text.

Whatever form the blog took, it had many of the same characteristics: posts were ordered chronologically, they were often tagged with one or more labels, and they usually only had a title, a body and a single author (you). Thus different formats arose (different types of RSS) for syndicating the content. There was a rush for blog providers and vendors to provide RSS feeds of their users' postings.

Also, blogs were more and more becoming a way of expressing one's own opinions on mundane things. Wouldn't it be nice if your friends could comment on the things you are doing? Hey, of course. Each post would get its own little discussion forum. Comments were even simpler than the postings they commented on: a body was all that was needed; both the title and author were typically optional. Anonymity is still important. And of course some geeks made RSS feeds of the comments to their postings. Sheesh!

Now of course, if I have a blog and you have a blog, why should I have to write on your blog if I want to comment on your stuff? What if someone wants to comment on my comment? Well, sure enough someone found out that they could simply post it on their own blog and tell the other blog about it. This is the whole trackback / pingback issue. I would post a follow-up to my blog citing your posting on your blog. And the protocols in place would indicate to your blog that there's a comment over here for you. Both blogs would carry my comment, and everything is cool. And we have a lot of links going between our blogs so (now omnipresent) Google knows we like each other.

Bookmarks

With all of this content floating around the need to bookmark things became more and more important. In a sense, you were your bookmarks. You could of course put your bookmarks file on the web but that wasn't quite so cool (but a lot of "Joe's Bookmarks" pages do exist). If you knew your way around programming languages you probably had your own little CGI script that took care of your bookmarks for you, but eventually you would find out that del.icio.us had pioneered this quite a while ago. The cool thing about del.icio.us was that it was public. Everyone could see your "bookmarks" and the definition of "you" was now available for all to see. What you were reading, which sites you visited, and how you commented on those bookmarks, and which labels (tags) you gave your bookmarks. Since del.icio.us also exported public RSS feeds of your bookmarks you could import them anywhere else you would read RSS feeds, such as right there on your own blog.

Enter the mobile camera phone age in the early 2000s. The whole concept of blogging images was utterly unheard of until well into the 2000s. Now every kid on the block has their own photo blog site. Flickr was one of the more successful, and you could bulk upload photos of the day when you were asleep at night. The photos at flickr were also available using RSS feeds, and could therefore also be added to your blog page; flickr even had a nice flash thingy showing tiny preview images of the photos. They even went as far as using the (now) standardized blogging APIs available to post certain photos directly to your blog. Life was so, so cool.

Social networks

Then came Facebook and ruined it all. The web was a nice place before Facebook. LinkedIn was partly responsible, for showing that it was possible to get that many people to sign up and define themselves on-line, and lure their friends into their little network. Sure, Facebook was cool, but it was just not the direction the Web was headed. Neither was LinkedIn. They simply provided a sexy interface on top of functionality which already existed in the web. They of course also required that your information be hosted by them as well; LinkedIn would keep your profile, your friends list and so on. Why couldn't you host it yourself? They wanted to tie the users to their site so they could make money off targeted advertising. If they made it so that you could host it yourself, they wouldn't make any money, since anyone else could set up a similar (better) service and steal their users. Their bandwidth usage could drop from gigabits one day to kilobits the next unless they held a firm grip on their "unique visitor" base.

Every corporation does it. ICQ, AIM, Microsoft with Passport / Live logins, Flickr, Google with its ever expanding array of services, LiveJournal, del.icio.us, and of course Facebook. The list goes on. They all want to keep you as their user so they can keep you as their faithful advertisement target. Ugh!

Other ways to solve this?

Technologies exist to make all of these services work in a decentralized way, meaning that no single corporation owns the users or the content. It also means that anyone can be friends with anyone else, no matter where they have their "online identity".

One of the first ways of doing this was the semantic web way. It defines a language of nouns / verbs that define how humans "know" each other. It is based on the Resource Description Framework (RDF) and is called Friend of a Friend (FOAF). It is many years old already but really hasn't caught on. This is probably because of its decentralized nature and that there are no really good products out there that utilize the technology well.

RDF basically puts you back in the driver's seat. You create a single web page on your site or blog which follows a strict XML syntax. That file says that "you are called Joe and that you're interested in these things, and that you know Sue and you've met John, and that you've gone fishing with Harry". Instead of just listing your friends' names you of course link to their blog or, even better, their RDF page (commonly known as foaf.rdf). You would also point to your blog or to the RSS feed of your blog so that anyone (based on your 'foaf.rdf') could easily see a list of postings, or your bookmarks or even your photo stream.
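To make that a bit more concrete, here is a rough Python sketch of what generating such a file could look like. The namespaces and the foaf:Person, foaf:name, foaf:weblog, foaf:knows and rdfs:seeAlso terms are real FOAF / RDF vocabulary; the function name, the people and the example.org URLs are made up for illustration, and a real 'foaf.rdf' would normally carry a lot more.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("foaf", FOAF)


def q(namespace, tag):
    """Build the '{namespace}tag' form that ElementTree expects."""
    return "{%s}%s" % (namespace, tag)


def build_foaf(name, weblog, friends):
    """Return a minimal FOAF document as an RDF/XML string.

    friends is a list of (name, url_of_their_foaf_file) pairs.
    (Hypothetical helper, not part of any FOAF tool.)
    """
    root = ET.Element(q(RDF, "RDF"))
    me = ET.SubElement(root, q(FOAF, "Person"))
    ET.SubElement(me, q(FOAF, "name")).text = name
    ET.SubElement(me, q(FOAF, "weblog"), {q(RDF, "resource"): weblog})
    for friend_name, friend_foaf in friends:
        knows = ET.SubElement(me, q(FOAF, "knows"))
        friend = ET.SubElement(knows, q(FOAF, "Person"))
        ET.SubElement(friend, q(FOAF, "name")).text = friend_name
        # Point at the friend's own FOAF file rather than copying their data.
        ET.SubElement(friend, q(RDFS, "seeAlso"), {q(RDF, "resource"): friend_foaf})
    return ET.tostring(root, encoding="unicode")


foaf_document = build_foaf(
    "Joe",
    "http://joe.example.org/blog/",
    [("Sue", "http://sue.example.org/foaf.rdf")],
)
```

The resulting string is what you would save as 'foaf.rdf' on your own site. Note that Sue's details are not copied in; the file only links to wherever Sue keeps her own.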

Note how things are decentralised in that your blog could be hosted by livejournal, and your photo stream could be static files uploaded to a web site, and your bookmarks could be most anything. This is because the technology embraces the core technologies of the web, and does not try to work against them. If you decide to move your photos to google or some other photo provider, you simply move your content and change your foaf.rdf. Anyone who wants to see your photos can keep doing what they've always done, namely go to your 'foaf.rdf' and follow the 'photo stream' link.
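The indirection is the whole point, so here is a small Python sketch of the consumer side. I don't know of a FOAF property meant specifically for a photo stream, so foaf:weblog is used as a stand-in; the function name, the template and the example.org / example.net URLs are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# A tiny stand-in for somebody's foaf.rdf; {stream} is filled in below.
FOAF_TEMPLATE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Joe</foaf:name>
    <foaf:weblog rdf:resource="{stream}"/>
  </foaf:Person>
</rdf:RDF>"""


def linked_resources(foaf_xml, prop):
    """Return the rdf:resource URLs of one FOAF property on the first Person."""
    root = ET.fromstring(foaf_xml)
    person = next(root.iter("{%s}Person" % FOAF), None)
    if person is None:
        return []
    urls = []
    for element in person.findall("{%s}%s" % (FOAF, prop)):
        url = element.get("{%s}resource" % RDF)
        if url:
            urls.append(url)
    return urls


# Before the move: the stream lives on Joe's own site.
before = FOAF_TEMPLATE.format(stream="http://joe.example.org/photos/")
# After the move: only the link inside foaf.rdf has changed.
after = FOAF_TEMPLATE.format(stream="http://photos.example.net/joe/")

old_location = linked_resources(before, "weblog")
new_location = linked_resources(after, "weblog")
```

The consumer calls the same function both times and never needs to know that anything moved.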

Who are "you"?

The FOAF file becomes your identity. It becomes the starting point of anything related to you. From your foaf you can link to anything. And I mean absolutely anything. I've mentioned your name and the names of (and links to) your friends, but you can give out your GPS coordinates if you want to (Geo positioning, WGS84), or list your entire career (DOAC) or even mark up your project and say that the project's subversion repository is here and that you can browse the repository there (DOAP). All of this is machine readable and has real meaning. It isn't just a silly old human readable web page with lots of '<a href..>' tags. Any service could infer where on the globe you work.

There are a few obstacles for the semantic web to take off. First off is the fragmentation and overlap of the many different vocabularies. http://en.wikipedia.org/wiki/Semantic_Web lists a few of the vocabularies and projects, but a few of them are more popular and useful than others. You can pick and choose and add almost any semantic meaning to your document, such as your favorite pizza takeaway or genealogical meta data or your CD or MP3 collection. When the expressiveness is so diverse, it is also hard if not impossible to write an editor that supports it all, unless you make a generic boring RDF editor. Perhaps that is the answer... Otherwise, a tool will have to generate the RDF for you, or a plug-in to your favorite blogging or content management system would allow you to add stuff to a profile which it exposes as RDF.

Another thing is that you can blatantly lie about a relationship. What use is it if someone says they are you, with your date of birth, e-mail and link to your blog and so on? This also means that you need to trust the individual who claims to be who he is. I might say that I'm heir to the throne of England, but it would just stand there. Lies can also be made about ownership of sites. Say I run a blog, I link to my own blog, but I could also say that "I own this blog too" and link to it. There are ways to verify such statements.

Establishing trust

MicroID is one such way. MicroID is a deceptively simple method of verifying that one resource "belongs" to another resource, typically that a web page (or part of a web page) belongs to a person (identified by her e-mail address). It does so by creating a one-way hash of the e-mail address of the owner and the URL of the resource, so that the e-mail address is impossible to sniff (to protect from spammers). The resulting hash is then embedded in the actual resource. This means that anyone who can retrieve the document can also check whether or not it belongs to a specific e-mail address (e.g. as asserted by some RDF) by rehashing and checking if the document carries the same hash. If so, the page and the RDF agree that the blog belongs to that address.
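Here is a minimal Python sketch of the hashing and the re-check. It assumes SHA-1 and the "hash of the two hashes" construction as I remember it from the MicroID draft; the text above only promises a one-way hash of the e-mail address and the URL, so treat the exact recipe, the function names and the example addresses as assumptions.

```python
import hashlib


def sha1_hex(text):
    """SHA-1 of a string, as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def microid(email, url):
    """Combine an e-mail address and a URL into a single one-way hash.

    Assumed construction: sha1(sha1("mailto:" + email) + sha1(url)).
    """
    return sha1_hex(sha1_hex("mailto:" + email) + sha1_hex(url))


def page_belongs_to(email, url, embedded_hash):
    """Rehash the claimed e-mail address and compare with what the page carries."""
    return microid(email, url) == embedded_hash.strip().lower()


# The page owner computes the hash once and embeds it in the page.
embedded = microid("joe@example.org", "http://joe.example.org/")

# Anyone who is told the e-mail address (e.g. by some RDF) can check it.
matches = page_belongs_to("joe@example.org", "http://joe.example.org/", embedded)
mismatch = page_belongs_to("mallory@example.org", "http://joe.example.org/", embedded)
```

The e-mail address never appears in the page itself; a checker has to already know (or guess) the address to confirm it.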

This technique can be used in many more ways. If you post a comment to a blog, and enter your e-mail address, the commenting system could very easily perform the same hash (since it knows the URL and the e-mail address at the time of authoring) and simply embed the hash in a div tag encompassing the comment. Anyone can then assert that you (or someone impersonating you) actually wrote that comment.
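A commenting system could do something like the following Python sketch. The class="microid-..." convention on the div is how I recall MicroID being embedded in page fragments, and the hash recipe (SHA-1 of the two SHA-1 hashes) is likewise an assumption; the function names and URLs are made up.

```python
import hashlib
from html import escape


def microid(email, url):
    """Assumed MicroID recipe: sha1(sha1("mailto:" + email) + sha1(url))."""
    def sha1_hex(text):
        return hashlib.sha1(text.encode("utf-8")).hexdigest()
    return sha1_hex(sha1_hex("mailto:" + email) + sha1_hex(url))


def render_comment(email, comment_url, body):
    """Wrap a comment in a div that carries the commenter's MicroID.

    The e-mail address is used for hashing only and is never written out.
    """
    token = microid(email, comment_url)
    return '<div class="microid-%s">%s</div>' % (token, escape(body))


html_fragment = render_comment(
    "joe@example.org",
    "http://blog.example.com/post/1#comment-7",
    "Nice post <3",
)
```

The blog already has both inputs at the moment the comment is submitted, so this costs the commenter nothing extra.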

Impersonating someone is a whole different topic. What is "you"? Is it your blog? Your e-mail address? What is the canonical "yourself" that you wish the whole world to know? Quite often it is a web page, possibly just a web site. In future it could be an OpenID. OpenID is a decentralised identification system. You ask any old OpenID provider to give you an identifier, and in return (typically after validating that you own an e-mail address) you get a web page which is guaranteed to be yours. It contains information which a web application can use to construct an HTTP 302 redirect response. The browser will be redirected to the end user's OpenID provider and the user will be given a familiar login. When the user says "It's OK to authenticate me for this site" the OpenID provider will redirect the browser back to the site, with authentication tokens that verify that the user is indeed logged in. The site never saw my password. Sweet.
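The redirect step can be sketched in a few lines of Python. The openid.mode, openid.identity, openid.return_to and openid.trust_root parameters are the ones I know from OpenID 1.1; finding the provider URL on the user's page, association handles and checking the provider's answer are all left out, and every URL and the function name here are invented.

```python
from urllib.parse import urlencode


def openid_redirect(server_url, identity, return_to, trust_root):
    """Build the 302 response that sends the browser to the OpenID provider.

    Returns (status_code, headers). A sketch of the redirect only; a real
    consumer must also verify the response it gets back from the provider.
    """
    params = {
        "openid.mode": "checkid_setup",   # let the provider interact with the user
        "openid.identity": identity,      # the identifier the user claims to own
        "openid.return_to": return_to,    # where the provider sends the browser back
        "openid.trust_root": trust_root,  # the site the user is asked to approve
    }
    separator = "&" if "?" in server_url else "?"
    location = server_url + separator + urlencode(params)
    return 302, {"Location": location}


status, headers = openid_redirect(
    "https://openid.example.net/server",
    "http://joe.example.org/",
    "http://blog.example.com/openid/return",
    "http://blog.example.com/",
)
```

Nothing in the request contains a password; the only place the user types one is at the provider.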

OpenID proves to a web site that the end user owns a specific OpenID account, but not an e-mail address. To do that, the OpenID page would have to add a MicroID indicating that a specific e-mail address owns that OpenID account. If the OpenID account also has some RDF links included then the web site can traverse those links and extract the information it needs. If the user chooses to divulge her e-mail address it can easily be checked for correctness by checking the logged in OpenID's MicroID.

Wrapping it up

Ok this might be going a bit fast but bear with me. It's all very simple, since it all revolves around embedding <link> and <meta> tags in the web pages you create. These prove that you authored the documents, or at least that someone impersonating you did. Now to stop the impersonating!

Stopping someone from impersonating you is accomplished by having a single site which lists the resources you claim to have written. Since anyone can post under your name in a blog somewhere, and give your e-mail address or your web site as "proof", it is important that you stake your claims on content around the 'net. This is done via a reverse link which basically says the same thing: that you authored this specific piece of content.

Claiming a resource is as simple as providing a link from a resource which can be proven to belong to you. Simply link to the resource of which you claim (and can prove) authorship. That's all.

Who are you again?

I am my home page (mogsie.com). On that page I have a MicroID telling the world that they can verify my e-mail address as the owner of this site if I divulge it. I also have my OpenID site listed there, so mogsie.com is my user name. There's a link to my RDF too. It describes my on-line presence such as my blogs, my delicious tags, my on-line profiles such as MSN ID or ICQ ID, my whereabouts and my friends. Authorship is proven in that I link to it from my proven home page; in a sense I have "claimed" these resources as my own.
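For the curious, the head of such a home page could be generated with something like this Python sketch. The tag conventions are the ones I believe were in common use (a "microid" meta tag, OpenID 1.x "openid.server" discovery, and the rel="meta" FOAF autodiscovery link); the function name, the placeholder hash and all example URLs are invented and are not taken from my actual page.

```python
from html import escape


def identity_head_tags(microid_hash, openid_server, foaf_url):
    """Return the <meta> / <link> tags that tie a home page to an identity."""
    return [
        # Lets others verify an e-mail address against this page.
        '<meta name="microid" content="%s">' % escape(microid_hash),
        # Tells OpenID consumers which provider handles logins for this page.
        '<link rel="openid.server" href="%s">' % escape(openid_server),
        # Points machines at the FOAF file describing the owner.
        '<link rel="meta" type="application/rdf+xml" title="FOAF" href="%s">'
        % escape(foaf_url),
    ]


tags = identity_head_tags(
    "0123456789abcdef0123456789abcdef01234567",  # placeholder, not a real hash
    "https://openid.example.net/server",
    "http://joe.example.org/foaf.rdf?v=1&lang=en",
)
head_html = "\n".join(tags)
```

Three lines in the head of one page are enough to connect the e-mail address, the login and the machine readable profile.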

Now FOAF and RDF haven't caught on, since the enormous error prone XML file that is 'foaf.rdf' is clunky at best. There are systems that can do this part for you (albeit not using RDF or FOAF).

ClaimID.org is one such claiming system. ClaimID is a lot like a bookmark site, except that you only bookmark things that you claim to own. This "claim" list will provide others with the second half of the irrefutable proof that you actually have authored the content, rather than leaving them to trust the content's own claim. So (1) the content claims it was authored by you and (2) you have staked your claim by linking to it. Combine ClaimID with MicroID and anyone who wants can check that you authored that ClaimID page (by verifying your e-mail address). So we are nearing a complete web of trust.
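The two halves can be checked mechanically. Below is a Python sketch, assuming the content carries a MicroID (computed with the SHA-1 of two SHA-1 hashes recipe, which is my recollection of the draft) and that the claim page is plain HTML with ordinary links. It works on HTML strings you have already fetched; the function and class names are hypothetical and this is not how ClaimID itself is implemented.

```python
import hashlib
from html.parser import HTMLParser


def microid(email, url):
    """Assumed MicroID recipe: sha1(sha1("mailto:" + email) + sha1(url))."""
    def sha1_hex(text):
        return hashlib.sha1(text.encode("utf-8")).hexdigest()
    return sha1_hex(sha1_hex("mailto:" + email) + sha1_hex(url))


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def content_names_author(content_html, content_url, email):
    """Half (1): the content carries a MicroID matching the author's e-mail."""
    return microid(email, content_url) in content_html


def author_claims_content(claim_page_html, content_url):
    """Half (2): the author's claim page links back to the content."""
    collector = LinkCollector()
    collector.feed(claim_page_html)
    return content_url in collector.hrefs


def authorship_confirmed(content_html, content_url, claim_page_html, email):
    """Both directions must agree before the claim is believed."""
    return (content_names_author(content_html, content_url, email)
            and author_claims_content(claim_page_html, content_url))
```

An impersonator can fake half (1) by typing in your e-mail address, but cannot add a link to your claim page, so the combined check fails for them.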

Future directions

There is potential for a site such as wink.com to use the semantic web (i.e. foaf) to automatically populate a wink page based on the web of trust gained through MicroID and OpenID and some RDF or ClaimID site. As far as I can tell it doesn't do this yet, probably since only 0.002 percent of the blogs out there have RDF associated with them. But if plug-ins were developed and template sets included RDF tagged meta information by default then the world would be much more interlinked.

Conclusion

Contrast this to adding everything about you into yet another social network (e.g. Facebook, friendster, or whatever) just to meet the needs of all of your different groups of friends... NNNGH!

But will it ever catch on? Nobody owns me, since I am my own publisher. I assert my own identity, and my links, friends and so on. But since there is nobody here to make money off me other than myself (I am not a big corporation) it is rather unlikely that it will ever happen...
