News
TinEye from IDEE
Here's yet another image recognition project. In this case the claim is for a fairly narrow set of recognition capabilities. Nevertheless it shows how quickly this space is going to grow. May you live in exciting times!
Recent MIT AI Lab Research on Image Recognition
http://web.mit.edu/newsoffice/2008/csail-tt0521.html
Here’s an interesting bit of research out of MIT. The article points out the huge challenge today in identifying features of images and talks about efficient representation of exactly what key features are required to hold onto a fingerprint of an image.
This work is from Antonio Torralba, assistant professor at the MIT Computer Science and Artificial Intelligence Laboratory, and his colleagues.
ImageWiki submission for Ignite Portland
Here's the submission entry http://ignite-proposals.pragmaticraft.com/proposals/43 . We tried to spin this in a more serious light perhaps than we should have. It ultimately wasn't accepted.
Speaker - Anselm Hook
Affiliation - http://blog.makerlab.org http://meedan.net
URL - http://hook.org/
Biography of Speaker :
Anselm Hook is a Hacker, Dad, Backpacker, Entrepreneur, and former Games Developer. He was born in Paris, France in 1967 but grew up mostly in Alberta and Saskatchewan. His father followed the Oil Boom west and started The Computer Shop of Calgary in 1976. Anselm has traveled, spelunked and hiked in places including Canada, the US, the UK, Iceland, Europe, Mexico, South America, New Zealand, Hawaii, Japan and India. Today Anselm is the CTO of Meedan, a real-time English/Arabic translation project funded by IBM and MacArthur.
Anselm is passionate about Social Cartography. He is focused on creating tools to help people understand and appreciate the world around them. He helped launch Ning and led the engineering of Platial. Currently he is the co-chair of WhereCamp, has volunteered a tiny bit on Calagator, and is a co-founder of the MakerLab, a project incubator based in Portland, Oregon.
Within the Makerlab, the current project he is working on is the ImageWiki. With an ImageWiki a person can point their phone at an image of anything and have it find similar images and comments that other people have submitted on that image. One can take a picture of a favorite bar, or of a beer label or of a music album and see what people think about that thing, or even see if friends nearby have commented on the same thing.
Description :
Our civilization is on the verge of a technological breakthrough that will allow for image based search. Soon you'll be able to take a picture with your camera phone, send it to a computer and then have it tell you more about that thing you photographed. My talk stems from reflections and concerns about the implications of this critical juncture of technology.
We live in an image saturated world. Images straddle barriers of language and culture. Corporations work hard to create a brand association between certain images and their products. Photographers and artists often go out of their way to protect an ownership over their images. Certain images have a strong cultural value for one community and a totally different value for another community. Some images are taboo and other images are enshrined. Computers are going to start to allow our thoughts about images to begin to leak across normally disconnected boundaries.
Search providers great and small are gearing up rapidly to service this emerging potential for visual search as a new market. But who gets to dictate what information is returned on a visual search of a Coke can? How about a visual search of a child's drawing of a Coke can? How do we crowd-source trustworthy statements about the things we see? What are the Patent issues of various image recognition approaches such as SIFT? What are the implications for Trademark owners? How about for stock photographers and flickr enthusiasts? Will there need to be a Creative Commons license to cover this? If we think of images as a new kind of hyperlink then does visual search become a valid "resolving opportunity" like DNS? Copyright has not even begun to touch this space.
Some of these questions are rhetorical but we believe that these issues and more are pushing us towards a need to define an Image Commons.
In 5 brief minutes I hope to present on these issues and also introduce the ImageWiki project. This will not be a pitch insofar as that project is open source, but rather a call to arms encouraging everyone to see and claim this space. My hope out of this work is praxis; to have a clear statement made not just in words but in code.
What is ImageWiki - a review
The ImageWiki project came out of a year long conversation between all of us over at http://makerlab.org. Paige, who comes from an arts background, kept asking for a way to visually search for similar images. It took us all a while to figure out what a good idea it was - and once we saw the implications we sat down right away and knocked out a site.
The implications,
In the near future a variety of companies are going to release tools that let you use your cellphone camera to find out more about something. You will be able to point your camera at a restaurant menu and find comments and reviews about the food. You will be able to find out more about a book, a poster, a music album or anything else that you point your camera at. You'll be able to leave virtual notes attached to graffiti that only your friends can see.
What is going to make this all possible are recent advances in computer vision and in available computer processing power. The image recognition tools are getting good enough that a computer can now tell you if an image that you are looking at is similar to an image that it has already seen.
Words and Letters will no longer be the only input medium,
This may create a huge shift in how we can engage with computers and with each other.
We've been largely interacting with our machines by using words and text. Modern computers have keyboards or touch pads that are designed to help act as a bridge between us and the knowledge we are seeking. And most of us are pretty good at using a keyboard to communicate with a computer.
In the last few years however there has been some frustration with having to type things in. This is especially true in the mobile space. It is hard to type in a long URL for a website when you are using a keypad or even an iPhone with a little keyboard.
QR-Codes and Barcodes may be obsolete,
Projects like IMatrix and others have sprung up to try and address this. In the IMatrix approach they encourage the use of carefully defined images called "QR-Codes" that a computer can easily recognize. You can point your camera at one of these images and it can retrieve some actual content. This is useful because as a QR-Code creator you can stick a QR-Code onto something that you want people to be able to find out more about. But it also has drawbacks in that QR-Codes are basically ugly and have to be "pre defined" by a creator. Searching on pure images meets both aesthetic demands and functional demands and in a way is more in line with how humans interact with each other - by this metric it may very well be that barcodes and QR-Codes will disappear from human civilization.
Social Image Anchors,
What is possible now is that a pure image by itself can be used to retrieve a URL or some content, and this opens up the door to a whole new spectrum of possibilities.
One of the possibilities is the idea of social image search - where anybody can decorate any image with their own comments and opinions - this creates a socially framed context around an image and the image itself becomes a kind of anchor for conversations. Such a capability is hugely useful for people who want to be able to leave notes for friends. Effectively images become just like websites - and the same energy and passion that people bring to conversations on the web can now be associated with individual images.
Who owns the images we use?
Who owns the image rights to associate a given image with a given piece of information? Is this going to be one company - a new version of Network Solutions?
In many ways images are like music. They are a subjective experience, beauty or meaning is often in the eye of the beholder. In some ways the person who consumes the image adds at least half of the value; it is their choice to value the image that makes it valuable. At the same time the creator of the image deserves credit for understanding how to make something that people care about. In traditional society the creator owns the image - the rights to the image - its reproduction - and soon the associated discoverable metadata - will probably belong to the creator. This however under-represents the social value of images. Images need to be freely redistributable, and need to have objective comments associated with them - not simply the brand that the creator chooses to define. Clearly businesses work hard to create brand associations between certain images and certain ideas; this will have to start to change as people start to use images as a way to share meaning.
Semantically relating content,
Beyond this there are lots of exciting possibilities. Photosynth is a good example of where the semantic relationships between images help add value to images. And in the future, as these tools become better at recognizing human faces, there are all kinds of interesting social repercussions that will happen.
This project's role,
On the ImageWiki Project we are raising the issues pointed out above and we are going to work very hard to define an "image commons" that is not owned by any one interest but that is truly open.
Part of the way we think that has to happen is that we will try to encourage "image handling brokers". We'll provide our own image matching service, but will try to foster more of a brokerage or ecosystem where an image search request is farmed out to any listeners that are interested, and those listeners can all take a shot at trying to advise on what the best result is.
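To make the brokerage idea a bit more concrete, here is a minimal sketch in Python of how a search request might be farmed out to interested listeners. This is only an illustration, not the ImageWiki code: the names `ImageBroker`, `Suggestion`, `subscribe` and `search` are made up for this example, the example.org URLs are placeholders, and ranking purely by each listener's self-reported confidence is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Suggestion:
    source: str        # which listener produced this answer
    url: str           # content the listener associates with the image
    confidence: float  # the listener's own score, assumed to be in [0, 1]


# A listener takes raw image bytes and returns a suggestion, or None to pass.
Listener = Callable[[bytes], Optional[Suggestion]]


class ImageBroker:
    """Hypothetical broker: forwards each request to every subscribed listener."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def search(self, image: bytes) -> List[Suggestion]:
        results = []
        for listener in self._listeners:
            try:
                suggestion = listener(image)
            except Exception:
                continue  # one failing listener should not sink the whole request
            if suggestion is not None:
                results.append(suggestion)
        # Highest self-reported confidence first (a naive ranking, easily gamed).
        return sorted(results, key=lambda s: s.confidence, reverse=True)


# Toy listeners standing in for independent matching services.
def own_matcher(image: bytes) -> Optional[Suggestion]:
    return Suggestion("imagewiki", "http://example.org/wiki/beer-label", 0.8)


def art_listener(image: bytes) -> Optional[Suggestion]:
    return None  # not interested in this image


def review_listener(image: bytes) -> Optional[Suggestion]:
    return Suggestion("reviews", "http://example.org/reviews/42", 0.6)


broker = ImageBroker()
for listener in (review_listener, art_listener, own_matcher):
    broker.subscribe(listener)

ranked = broker.search(b"fake image bytes")
print([s.source for s in ranked])
```

Trusting each listener's own confidence is exactly the kind of thing a real commons could not do naively; who ranks the answers is one of the open questions raised above.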
Join us!
We want to talk about images, to explore the medium, to see where it takes us. We hope you'll join us for the ride!
Pecha Kucha Presentation Tuesday November 18th 2008
Paige and I presented on the ImageWiki idea at Pecha Kucha last night,
http://www.slideshare.net/anselm/pechakucha-presentation-750335
It was terrific fun, and inspiring. My own practice is one of "praxis". I like to spend half my time implementing and half my time talking. However for ImageWiki I haven't been able to find any forums to speak about it in - and the feeling was that there was no interest in the topic in general. This is why we put it down for about 90 days or so.
Over the last eight months we've successfully failed to find an audience for it in the Knight Grants, in Ignite, in the Conflux event, and in several other events. Remarkably, for what is such an amazing idea - we've just been completely unable to get anybody else interested in it. It was baffling to me.
But, with the Pecha Kucha talk, we did feel inspired again to push it forward another notch, and we put in quite a bit of effort to rev the code base and get a good presentation together. We iterated several times on the presentation, videotaping it and getting feedback from peers, and so hopefully the actual presentation we did at Pecha Kucha turned out well. I was pretty happy regardless - just to have a chance to speak about the idea and its implications.
Now we have the server working well again; it was having some problems with the migration to merb and the subsequent update of merb to 1.0 and the upgrade of the hardware to use 'phusion passenger' and a few other things. The iPhone client is still out of date but we will try and get a version running for Android. Also thematically it feels like the most interesting local use is to capture local art around Portland. Capturing Amazon books is probably also another good idea.
See our new blog at http://blog.makerlab.org as well!
ImageWiki migrated to Merb
We went ahead and moved the entire thing to merb - this allows much faster performance because it can handle more than one request at a time (unlike Rails). This radical a shift should be infrequent. I wanted to get the code base ready for wider use. Next is to update the iPhone client and perhaps get it into the iPhone store.
Also we switched away from SVN to git - see:
http://github.com/makerlab/imagewiki/tree/master
In general I feel we will no longer use SVN on MakerLab at all but will always try to use GitHub - it is external, better maintained and allows branches.
ImageWiki Foss4G
The hope right now is to try and articulate this vision as widely as possible and to get feedback on it. In that light we're going to pop in on the major events and hopefully get into one or more of them. At least we'll have a good excuse to travel! South Africa also sounds like fun.
See http://conference.osgeo.org/foss4g/2008 about FOSS4G. Basically it is a group of open source social cartographers and hackers coming out of the same community that has been pursuing this for the last ten years but is now really picking up steam.
Here's what we submitted for FOSS4G:
About the ImageWiki Project,
The ImageWiki ( http://imagewiki.org ) is an attempt to build an "image commons" around the emerging possibility of visual search. It needs help, input and support.
Many commercial organizations are rushing to build visual search engines and each of them has specific markets in mind. We are also doing something similar except that we are doing it as an open source public utility.
Using the SIFT algorithm we are able to cluster images together based on similarity. We can assert that a series of images, such as a series of pictures of a specific beer bottle label, taken from different vantage points, or partially obscured, or upside down or blurry, may in fact be referring to the same beer bottle label.
Effectively we create a strong relationship between images and we can then leak meta-data between those images. If a given image has a description, a link, tags or a location associated with itself, then we can make a leap that other images may also have similar associations.
This has strong locative aspects. If you take a picture of a store front with your camera phone, and you post it to a shared server - then somebody else who takes a picture of the same store front may be able to discover that you were nearby and when. So for example if the image commons were to be populated with Google Street View - you would have a way of using photos of buildings themselves as a kind of GPS.
There are however a variety of other interesting and unsolved implications. It is clear that an Image DNS will emerge and there is a question of who will own that Image DNS. Wikipedia faced the same challenges in attempting to define an objective database of human knowledge. There are specific participants such as corporations who invest a lot of money in defining a certain brand identity - will they protest when visual searches do not return their website first? OpenStreetMap also faced similar kinds of problems in that the ownership and provenance of their data had to meet the highest standards of integrity or else put their entire database and their entire effort at risk. Creative Commons Licensing and Copyright Law do not seem to cover the case of what metadata should be returned on a visual search query. This appears to be new ground.
We will both present our open source effort to define a shared, durable, open and public Image Commons and comment on some of the unexpected and rather surprising social, legal and political implications that will come out of this new capability that humans will soon have.
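As a footnote to the submission, here is a rough Python sketch of the two steps it describes: deciding that two images probably show the same thing, and then leaking metadata from one to the other. It is not the ImageWiki implementation. It assumes feature descriptors have already been extracted (real SIFT descriptors are 128-dimensional; the 3-dimensional toy vectors here are invented), it uses the nearest-neighbour ratio test commonly paired with SIFT, and the function names, the 0.75 ratio and the minimum match count are all illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Set

Descriptor = Sequence[float]


def distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_good_matches(query: List[Descriptor], candidate: List[Descriptor],
                       ratio: float = 0.75) -> int:
    """Ratio test: a query descriptor counts as matched only if its nearest
    candidate descriptor is clearly closer than the second nearest."""
    good = 0
    for d in query:
        dists = sorted(distance(d, c) for c in candidate)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def same_subject(query: List[Descriptor], candidate: List[Descriptor],
                 min_matches: int = 2) -> bool:
    # min_matches is an assumed threshold; a real system would tune it.
    return count_good_matches(query, candidate) >= min_matches


def inherit_metadata(query: List[Descriptor], library: List[dict]) -> Dict[str, Set[str]]:
    """Collect tags and links from every library image that seems to match."""
    suggested: Dict[str, Set[str]] = {"tags": set(), "links": set()}
    for entry in library:
        if same_subject(query, entry["descriptors"]):
            suggested["tags"] |= set(entry["tags"])
            suggested["links"] |= set(entry["links"])
    return suggested


# Toy data: invented descriptors for two known images and one new photo.
beer_label = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
poster = [[5, 5, 5], [5, 5, 6], [6, 5, 5]]
query = [[0.1, 0, 1], [1, 0.1, 0], [0, 1, 0.1]]  # a slightly different view of the label

library = [
    {"descriptors": beer_label, "tags": ["beer", "label"],
     "links": ["http://example.org/beer"]},
    {"descriptors": poster, "tags": ["poster"],
     "links": ["http://example.org/poster"]},
]

suggested = inherit_metadata(query, library)
print(sorted(suggested["tags"]), sorted(suggested["links"]))
```

The same trick would carry a location across: if a matched library image has coordinates attached, the new photo can tentatively borrow them, which is the "photos as a kind of GPS" idea from the submission.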
