
MIT helps develop new image-recognition software

By David Chandler, MIT News Office

It takes surprisingly few pixels of information to be able to identify the subject of an image, a team led by an MIT researcher has found. The discovery could lead to great advances in the automated identification of online images and, ultimately, provide a basis for computers to see like humans do.

Antonio Torralba, assistant professor in MIT’s Computer Science and Artificial Intelligence Laboratory, and colleagues have been trying to find the smallest amount of information–that is, the shortest numerical representation–that can be derived from an image and still provide a useful indication of its content.

Deriving such a short representation would be an important step toward making it possible to catalog the billions of images on the Internet automatically. At present, the only ways to search for images are based on text captions that people have entered by hand for each picture, and many images lack such information. Automatic identification would also provide a way to index pictures people download from digital cameras onto their computers, without having to go through and caption each one by hand. And ultimately it could lead to true machine vision, which could someday allow robots to make sense of the data coming from their cameras and figure out where they are.

“We’re trying to find very short codes for images,” says Torralba, “so that if two images have a similar sequence [of numbers], they are probably similar–composed of roughly the same object, in roughly the same configuration.” If one image has been identified with a caption or title, then other images that match its numerical code would likely show the same object (such as a car, tree, or person) and so the name associated with one picture can be transferred to the others.
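
The article does not say how two codes are compared. One common way to measure how similar two binary codes are is the Hamming distance, the number of bit positions in which they differ. Below is a minimal shell sketch assuming that measure; the function name and the 8-bit example codes are made up for illustration (the codes described here are far longer, and shell integers could not hold them).

```shell
#!/bin/sh
# Hamming distance between two small non-negative integer codes.
# Assumption: Hamming distance is used only as an illustration; the
# article does not state which similarity measure the researchers use.
hamming_distance() {
    x=$(( $1 ^ $2 ))          # bits set where the two codes differ
    count=0
    while [ "$x" -ne 0 ]; do
        count=$(( count + (x & 1) ))
        x=$(( x >> 1 ))
    done
    echo "$count"
}

# 178 = 10110010, 182 = 10110110: the codes differ in one bit (similar).
hamming_distance 178 182
# 178 = 10110010, 77 = 01001101: the codes differ in every bit (dissimilar).
hamming_distance 178 77
```

A small distance would suggest the two images are probably similar, so a caption attached to one could be tried on the other.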

“With very large amounts of images, even relatively simple algorithms are able to perform fairly well” in identifying images this way, says Torralba. He will be presenting his latest findings this June in Alaska at a conference on Computer Vision and Pattern Recognition. The work was done in collaboration with Rob Fergus at the Courant Institute at New York University and Yair Weiss of Hebrew University in Jerusalem.

To find out how little image information is needed for people to recognize the subject of a picture, Torralba and his co-authors tried reducing images to lower and lower resolution, and seeing how many images at each level people could identify.

“We are able to recognize what is in images, even if the resolution is very low, because we know so much about images,” he says. “The amount of information you need to identify most images is about 32 by 32.” By contrast, even the small “thumbnail” images shown in a Google search are typically 100 by 100.

Even an inexpensive current digital camera produces images consisting of several megapixels of data–and each pixel typically consists of 24 bits (zero or one) of data. But Torralba and his collaborators devised a mathematical system that can reduce the data from each picture even further, and it turns out that many images are recognizable even when coded into a numerical representation containing as little as 256 to 1024 bits of data.

Using such small amounts of data per image makes it possible to search for similar pictures through millions of images in a database, using an ordinary PC, in less than a second, Torralba says. And unlike other methods that require first breaking down an image into sections containing different objects, this method uses the entire image, making it simple to apply to large datasets without human intervention.

For example, using the coding system they developed, Torralba and his colleagues were able to represent a set of 12.9 million images from the Internet with just 600 megabytes of data–small enough to fit in the RAM of most current PCs, and to be stored on a memory stick. The image database and software to enable searches of the database are being made publicly available on the web.
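
As a rough sanity check on those figures, the average code length per image can be worked out with shell arithmetic. This is only a sketch: it assumes "megabyte" means 10^6 bytes (the article does not say) and that the 600 megabytes hold nothing but the codes.

```shell
#!/bin/sh
# Average number of bits available per image.
# Assumption: 1 megabyte = 1,000,000 bytes; the function name is illustrative.
bits_per_image() {
    total_megabytes=$1
    image_count=$2
    echo $(( total_megabytes * 1000000 * 8 / image_count ))
}

# 600 MB spread over 12.9 million images
bits_per_image 600 12900000
```

This prints 372, which sits inside the 256 to 1024 bits per image quoted earlier in the article.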

Of course, a system using drastically reduced amounts of information can’t come close to perfect identification. At present, the matching works for the most common kinds of images. “Not all images are created equal,” he says. The more complex or unusual an image is, the less likely it is to be correctly matched. But for the most common objects in pictures–people, cars, flowers, buildings–the results are quite impressive.

The work is part of research being carried out by hundreds of teams around the world, aimed at analyzing the content of visual information. Torralba has also collaborated on related work with other MIT researchers including William Freeman, a professor in the Department of Electrical Engineering and Computer Science; Aude Oliva, professor in the Department of Brain and Cognitive Sciences; and graduate students Bryan Russell and Ce Liu, in CSAIL. Torralba’s work is supported in part by a grant from the National Science Foundation.

Torralba stresses that the research is still preliminary and that there will always be problems with identifying the more-unusual subjects. It’s similar to the way we recognize language, Torralba says. “There are many words you hear very often, but no matter how long you have been living, there will always be one that you haven’t heard before. You always need to be able to understand [something new] from one example.”

A version of this article appeared in MIT Tech Talk on May 21, 2008.

Ranking accessed files from an Apache log, written in Perl

As shown below, the script prints the 100 most-requested pages on your website; that number is easily changed.

TOP 100 MOST-REQUESTED FILES:
-----------------------------
  10485            /
  7175             /index.xml
  2075             /fredpalma.com

(97 more lines of output deleted here)

Without any further ado, here is the Perl source code for my Apache log file analyzer.

#!/usr/bin/perl
#
#----------------------------------------------------------------------------#
#   PROGRAM: most-hits.pl
#
#   PURPOSE: This program serves one purpose.  It reads an Apache
#   access_log file in standard ECLF format,
#   and prints out the number of hits that have been
#   recorded for each file and/or directory.
#
#   The user can control the number of files that are
#   displayed in the output by changing NUM_RECS_TO_PRINT.
#
#   This tool lets you analyze your web site so you can
#   understand what content your viewers are interested in.
#
#   USAGE:
#
#   most-hits.pl access_log > results
#   perl most-hits.pl access_log > results
#
#----------------------------------------------------------------------------#

#----------------------------------------------------------------------------#
# COPYRIGHT:                                                                 #
#                                                                            #
# This sample program is provided free of charge under the terms of the      #
# GNU GPL.                                                                   #
#----------------------------------------------------------------------------#

use File::Basename;

#------------------------------------------------------------------------------#
#  Global variables that control the program action and output.                #
#------------------------------------------------------------------------------#

$NUM_RECS_TO_PRINT = 100;   # num of output recs to print per section

#---------------------------------------------------------------------#
#  Change this array to include index filenames used on your system.  #
#---------------------------------------------------------------------#

@indexFilenames = ('index.htm', 'index.html', 'index.shtml');

#----------------------------------------------------------------------#
# don't change anything below here unless you're comfortable with Perl #
#----------------------------------------------------------------------#

sub usage {
   print STDERR "\n\tUsage:  most-hits.pl access_log_file > output_file\n";
}

#----------------------------------------------------------#
#  These are two helper routines for the 'sort' function.  #
#----------------------------------------------------------#

sub fileNumericAscending {
   $numFileRequests{$a} <=> $numFileRequests{$b};
}

sub fileNumericDescending {
   $numFileRequests{$b} <=> $numFileRequests{$a};
}

sub trim($)
{
   my $string = shift;
   $string =~ s/^\s+//;
   $string =~ s/\s+$//;
   return $string;
}

#----------------------------<<   main   >>-----------------------------#

   #--------------------------------------------------------------------#
   #  Start by making sure the user is invoking this program properly.  #
   #--------------------------------------------------------------------#

   $numArgs = $#ARGV + 1;

   if ($numArgs != 1) {
      &usage;
      exit 1;
   }

   $logFile = $ARGV[0];

   open (LOGFILE, '<', $logFile) || die "  Error opening log file $logFile: $!\n";

   #------------------------------------------------------------------#
   #  Start reading and processing the access_log file in this loop.  #
   #------------------------------------------------------------------#

   printf "<pre>\n";
   while(<LOGFILE>)
   {

      chomp;

      #----------------------------------------------#
      #  condense one or more whitespace character   #
      #  to one single space                         #
      #----------------------------------------------#

      s/\s+/ /go;

      #----------------------------------------------------------#
      #  the next line breaks each line of the access_log into   #
      #  nine variables                                          #
      #----------------------------------------------------------#

      ($clientAddress,    $rfc1413,      $username,
      $localTime,         $httpRequest,  $statusCode,
      $bytesSentToClient, $referer,      $clientSoftware) =
      /^(\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/o;

      #--------------------------------------------------------------------#
      # take care of problem where the $httpRequest may simply be a hyphen #
      #--------------------------------------------------------------------#

      next if (!defined($httpRequest) || $httpRequest eq '-');   # also skips lines the pattern above did not match

      #-----------------------------------------#
      #  Determine the value of $fileRequested  #
      #-----------------------------------------#

      ($getPost, $fileRequested, $junk) = split(' ', $httpRequest, 3);

      #--------------------------------------------------------#
      # ignore hits to the following file types.
      # this section of code needs to be fixed so the user can
      # declare extensions to ignore at the top of the program
      #--------------------------------------------------------#

      next if ($fileRequested =~ /\.(gif|jpg|css|png|java)$/i);
      next if ($fileRequested =~ /(favicon\.ico|robots\.txt)$/i);

      #-----------------------------------------------------------------#
      #  if the base filename is something like index.htm, index.html,  #
      #  or index.shtml, interpret this to be the same as the path by   #
      #  itself.  This way, '/java/' is the same as '/java/index.html'. #
      #-----------------------------------------------------------------#

      chomp($fileRequested);
      $fileRequested = trim($fileRequested);
      next if ($fileRequested eq '');   # no path to count, skip this log line

      foreach $indexFile (@indexFilenames) {
        # compare against the whole base filename, so that a file such as
        # 'reindex.html' is not mistaken for an index file
        if (basename($fileRequested) =~ /^\Q$indexFile\E$/i) {
           $fileRequested = dirname($fileRequested);
           last;
        }
      }

      #----------------------------------------------------------------#
      #  If the last character in $fileRequested is a '/', remove it.  #
      #  This makes /perl/ equal to /perl.                             #
      #----------------------------------------------------------------#

      if (length($fileRequested) > 1)
      {
        if (substr($fileRequested,length($fileRequested)-1,1) eq '/')
        {
          chop($fileRequested);
        }
      }

      #-----------------------------------------------------#
      #  here's where we count the number of hits per file  #
      #-----------------------------------------------------#

      $numFileRequests{$fileRequested}++;

   }

   close (LOGFILE);

   #--------------------------------------#
   #  Output the number of hits per file  #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT MOST-REQUESTED FILES:\n";
   print "-----------------------------\n\n";
   $count=0;
   foreach $key (sort fileNumericDescending (keys(%numFileRequests))) {
      last if ($count >= $NUM_RECS_TO_PRINT);
      print "$numFileRequests{$key} \t\t $key\n";
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";

   # the end

If you’re looking for a much more powerful Apache log file analysis program, awstats looks like a good tool to use. I haven’t tried it yet, but it looks like it may also be written in Perl, and seems to provide all the pretty graphs most people like to see.
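
If you only need a quick cross-check of the script's counts, a rough ranking can also be produced with standard shell tools. This is a sketch, not a replacement: the function name top_requests is made up, and unlike the Perl script it does not fold index files into their directory or skip image, CSS and robots.txt requests.

```shell
#!/bin/sh
# Print "count<TAB>path" for the most-requested paths in an Apache access log.
# Assumption: the log uses the quoted request-line format the Perl script expects.
top_requests() {
    log_file=$1
    limit=${2:-100}
    # Splitting on double quotes makes field 2 the request line,
    # e.g. GET /index.xml HTTP/1.1; its second word is the path.
    awk -F'"' '{ split($2, req, " "); if (req[2] != "") print req[2] }' "$log_file" \
        | sort | uniq -c | sort -k1,1nr -k2 | head -n "$limit" \
        | awk '{ printf "%d\t%s\n", $1, $2 }'
}

# Example call, only run when a file named access_log is present.
if [ -r access_log ]; then
    top_requests access_log 100
fi
```

The counts will usually be somewhat higher than the Perl script's for the same paths where it merges index files, and extra entries will appear for the file types it ignores.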

Thanks to devdaily.com and dicas-l.com.br.

Installing PostgreSQL 8.3 on Ubuntu 8.04

These are my notes on installing the PostgreSQL 8.3 database server on Ubuntu Server 8.04, set up to allow access from other IP addresses.
These steps should be applicable to other Debian-based distros.

At the command line, enter the following commands (or search for the listed packages using Synaptic). They install the database server and client, some extra utility scripts and, optionally, the pgAdmin GUI application for working with the database.

$ sudo apt-get install postgresql postgresql-client postgresql-contrib

If you want pgAdmin:

$ sudo apt-get install pgadmin3

To reset the password for the ‘postgres’ admin account on the server, run the following, replacing Password with the password you want to use for your administrator account:

$ sudo -u postgres psql template1
template1=# ALTER USER postgres WITH PASSWORD 'Password';
template1=# \q

Do the same for the Unix user ‘postgres’. Enter the same password that you used previously:

$ sudo passwd -d postgres
$ sudo su postgres -c passwd

You are now able to use the command line and pgAdmin to work with the database server.

To set up the PostgreSQL admin pack, which enables better logging and monitoring within pgAdmin, run the following at the command line:

$ sudo su postgres -c psql < /usr/share/postgresql/8.3/contrib/adminpack.sql

To open up the server for remote access, edit the postgresql.conf file (with vi, nano or gedit):

$ sudo vi /etc/postgresql/8.3/main/postgresql.conf

In the Connection Settings section:
# - Connection Settings -

listen_addresses = '*'

and for security:

password_encryption = on

Save the file and close the editor.
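
Before restarting the server, it can be worth confirming that both settings are active rather than still commented out. A small sketch using grep; the function name show_listen_settings is made up, and the path is the one used in these notes.

```shell
#!/bin/sh
# Print only the active (uncommented) lines for the two settings changed above.
show_listen_settings() {
    conf_file=$1
    grep -E '^[[:space:]]*(listen_addresses|password_encryption)[[:space:]]*=' "$conf_file"
}

# Only run the check when the configuration file is readable on this machine.
if [ -r /etc/postgresql/8.3/main/postgresql.conf ]; then
    show_listen_settings /etc/postgresql/8.3/main/postgresql.conf
fi
```

If a setting does not show up in the output, its line is probably still prefixed with a #.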

To define who can access the server, edit the pg_hba.conf file and, on the last line, add the IP address and subnet mask of the machine (or network) that should be allowed to access your server.

$ sudo vi /etc/postgresql/8.3/main/pg_hba.conf

# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database super user can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (autovacuum, daily cronjob, replication, and similar tasks).
#
# Database administrative login by UNIX sockets
local   all         postgres                          ident sameuser
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               md5
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
host    all         all         ::1/128               md5

# Connections for all PCs on the subnet
#
# TYPE  DATABASE    USER        IP-ADDRESS            IP-MASK        METHOD
host    all         all         [ip address]          [subnet mask]  md5
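
To make the placeholder line above concrete, here is a small sketch that prints a filled-in entry. The function name hba_entry is made up, and 192.168.1.0 with mask 255.255.255.0 is only an example network; substitute your own values.

```shell
#!/bin/sh
# Print a pg_hba.conf "host" entry in the IP-address / IP-mask form.
# Assumption: access is granted to all databases and users with md5
# authentication, as in the placeholder line above.
hba_entry() {
    ip_address=$1
    subnet_mask=$2
    printf 'host    all         all         %s  %s  md5\n' "$ip_address" "$subnet_mask"
}

# Example: allow every machine on a hypothetical 192.168.1.x network.
hba_entry 192.168.1.0 255.255.255.0
```

The printed line can then be pasted as the last line of pg_hba.conf.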

Restart the server:

$ sudo /etc/init.d/postgresql-8.3 restart

How to add a user to the sudoers list (Ubuntu)

As root:

# visudo

Put this line at the end (replace username with your username):
username ALL=(ALL) ALL

Ex:
# User privilege specification
root    ALL=(ALL) ALL
fred    ALL=(ALL) ALL

Delicious Add-on for Firefox 3

Like any beta release this Firefox Add-on is meant to provide you with a preview of upcoming features. Some of the features and interface choices, like Jump To Tag, are experimental and may change before we officially launch and we’re eager to hear your feedback on the changes, especially if you think we can do certain things better. If you have issues or comments, please let us know in the delicious-firefox-extension Yahoo! Group.

Download it here. Also, since this is prerelease software, please do not submit support requests via the normal channel; we have a Yahoo! Group for discussion of the Firefox 3 Add-on.

If you’re not already familiar with the extension, take a look at our Quick Tour which explains the basics. It hasn’t been updated to reflect all the new features but is a great way to get started.

Thanks for helping us test. We’ve built and upgraded these Add-ons based on lots of user feedback and we hope you enjoy trying them out.

Nick Nguyen
Senior Product Manager, del.icio.us

http://del.icio.us/extensions/firefox/delicious.xpi

Powering up IT for professional learning

The study and implementation of technology-enhanced professional learning has been fragmented. Now European researchers have linked such ‘islands’ of knowledge into a thriving, collaborative community.

The EU-funded PROLEARN project was set up four years ago to bridge the gap between research and education at universities and similar organisations on the one hand, and the training and continuing education provided for and within companies on the other.

The connections created give members of the network the ability to develop a whole new breed of educational tools and technologies that could benefit learners in their professional spheres and workplaces.

More and more companies are realising they must invest in their employees’ continuing education to stay competitive. PROLEARN brings together the key research groups, other organisations and industrial partners, helping to create a ‘network of excellence’ in professional learning and training, says Dr Eelco Herder, the project’s manager.

The project leverages the significant resources universities and research centres all over Europe have devoted to improving technology-enhanced learning (TEL).

“Before PROLEARN, universities all over Europe often had exciting projects,” he says. “But they tried to find stand-alone solutions. Little isolated islands of knowledge had their own standards, their own communications systems, their own proprietary TEL products and services.”

Flashmeeting software. Photo: ©  Prolearn project.

The power of shared knowledge

The basic aim of the project was to share knowledge and develop it via joint research projects.

“Because academic institutions are where TEL is being researched, they become the first adopters of new technologies, but there are also implications for the corporate world,” he says.

To ensure the wider adoption of TEL, systems from different institutions need to share data and ‘talk’ with one another. To promote system compatibility, the researchers employ an educationally focused Simple Query Interface (SQI), a very basic programming interface with a small set of commands.

The SQI takes care of sending and responding to user queries, making it relatively easy to implement across the different systems used by universities around Europe.

“Despite some excellent results, this is still a work in progress because it is a political as well as a technological challenge,” Herder admits. “Quite apart from the technology aspect, people have to be persuaded adoption is in their best interests. The more people use it, the more effective it will be.”

To further the task the project’s researchers have also set up a new European Association for Technology Enhanced Learning (EATEL), and established annual conferences and events that will endure in their own right.

PROLEARN is also providing support to companies by establishing a Virtual Competence Centre. The networking service is designed to help improve the effectiveness of competence centres run by individual companies or by trade and industry associations.

Meanwhile the PROLEARN Academy, another network, is designed to transfer research results into education and training programmes, international conferences, and scientific journals.

PROLEARN is also testing software tools in common usage throughout the TEL community. These have applications such as videoconferencing, web browsing and portfolio management. One of the most popular of these has been FlashMeeting, created by the UK’s Open University and currently offered to the community by EATEL.

FlashMeeting is a simple solution for videoconferencing via a web browser. The program has been adopted by academics around Europe and overseas due both to its ease of use and its features.

PROLEARN’s researchers have also helped develop other popular software tools, including the Conzilla dedicated browser and the Confolio portfolio management system. The programs are available free to EATEL’s members.

Academics struggle with commercial world

So far the research partners have not made a serious attempt to commercialise the programs, although they are exploring ways of developing sustainable business models.

“Academics are not primarily interested in creating commercial business models, but if we offer companies the chance to become paying members of our community they will be able to offer their ideas and use our infrastructure,” Herder says.  “That will help pay for our ongoing staff and other costs for maintaining and further developing our tools and platforms.”

Private sector organisations and policy makers were able to exchange ideas with the TEL community at the first ECTEL conference in 2006.

Last year the second edition of the conference attracted about 200 delegates from 30 countries. Organisers expect a similar turnout this year on 17-19 September in Maastricht, the Netherlands.

The PROLEARN Summer School, which brings together graduate students and leading academics in TEL, has also been a great success for several years, according to Herder. The school will also continue and this year it will take place in Ohrid, Macedonia.

Meanwhile PROLEARN’s research partners have developed some forecasts on what education will look like by 2020. Collaboration will continue to be the key, the partners believe.

“The learner will not just be a consumer of knowledge, but somebody who is also able to create knowledge using external forces and feed it back into the community,” Herder says. “Learning becomes a two-way street and intelligence becomes collective by being part of a community.”

He adds: “We hope PROLEARN has created a legacy for TEL by leaving a roadmap for research and an infrastructure which will make sure the community does not split up and go back to being a series of isolated islands.”

PROLEARN received funding from the EU’s Sixth Framework Programme for research as a ‘Network of Excellence’ programme dealing with technology enhanced professional learning.

Source: Cordis


Braille converter bridges the information gap

A free, e-mail-based service that translates text into Braille and audio recordings is helping to bridge the information gap for blind and visually impaired people, giving them quick and easy access to books, news articles and web pages.

Developed by European researchers, the RoboBraille service offers a unique solution to the problem of converting text into Braille and audio without the need for users to operate complicated software.

“We started working in this field 20 years ago, developing software to translate text into Braille, but we discovered that users found the programs difficult to use – we therefore searched for a simpler solution,” explains project coordinator Lars Ballieu Christensen, who also works for Synscenter Refsnaes, a Danish centre for visually impaired children.

The result of the EU-funded project was RoboBraille, a service that requires no more skill with a computer than the ability to send an e-mail.

Users simply attach a text they want to translate in one of several recognised formats, from plain text and Word documents to HTML and XML. They then e-mail the text to the service’s server. Software agents then automatically begin the process of translating the text into Braille or converting it into an audio recording through a text-to-speech engine.

“The type of output and the language depends on the e-mail address the user sends the text to,” Christensen says. “A document sent to britspeech@robobraille.org would be converted into spoken British English while a text sent to textoparabraille@robobraille.org would be translated from Portuguese into six-dot Braille.”

The user then receives the translation back by e-mail. It can be printed on a Braille printer or read on a tactile display, a device connected to the computer with a series of pins that are raised or lowered to represent Braille characters.

RoboBraille can currently translate text written in English, Danish, Italian, Greek and Portuguese into Braille and speech. The service can also handle text-to-speech conversions in French and Lithuanian.

Christensen notes that the RoboBraille partners are constantly working on adding new languages to the service and plan to start providing Braille and audio translations for Russian, Spanish, German and Arabic. They are also working on making the service compatible with PDF documents and text scanned from images.

Up to 14,000 translations a day

At present, the service translates an average of 500 documents a day, although it could handle as many as 14,000. RoboBraille can return a simple text in Braille in under a minute while taking as long as 10 hours to provide an audio recording of a book.

As of January, the RoboBraille system had carried out 250,000 translations since it first went online.

The team have won widespread recognition for their work, receiving the 2007 Social Contribution Award from the British Computer Society in December while in April they were awarded the 2008 award for technological innovation from Milan-based Well-Tech.

“We initially started offering the service only in Denmark but to make it viable commercially we needed to broaden our horizons. Hence the eTen project which allowed us to involve other organisations across Europe in developing and expanding the service, not only geographically but also in terms of users,” Christensen says.

In addition to the blind and visually impaired, the service can also help dyslexics, people with reading difficulties and the illiterate. The project partners plan to continue to offer the service for free to such users and other individuals, while in parallel developing commercial services for companies and public institutions.

“Pharmaceutical companies in Europe will soon be required to ensure all medicine packaging is labelled in Braille and we are currently working with three big firms to provide that service,” Christensen explains. “Banks and insurance companies are also interested in using it to provide statements in Braille, as is the Danish tax office. In Italy there is interest in using it in the tourism sector.”

The RoboBraille team, which recently received a €1.1 million grant over four years from the Danish government, expect the service to be profitable within four or five years.

And although they are not actively seeking investors, they are interested in partnerships with organisations interested in collaborating on specific social projects.

RoboBraille was funded under the EU’s eTEN programme for market validation and implementation.

Source: Cordis

How to know what version of Linux (Ubuntu) you are running

Try:
# lsb_release -d -s -c

Ubuntu 8.04
hardy

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04"

# cat /etc/issue
Ubuntu 8.04 \n \l
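
If you need the version inside a script, the /etc/lsb-release file shown above can be read directly, since it is just a list of KEY=value lines. A minimal sketch; the function name describe_release is made up, and sourcing the file assumes it is a trusted system file.

```shell
#!/bin/sh
# Print "ID RELEASE (CODENAME)" from an lsb-release style file.
describe_release() {
    release_file=${1:-/etc/lsb-release}
    [ -r "$release_file" ] || return 1
    # A name without a slash would be searched for in PATH by ".", so anchor it.
    case $release_file in
        */*) ;;
        *) release_file=./$release_file ;;
    esac
    (
        # Source the file in a subshell so its variables do not leak out.
        . "$release_file"
        printf '%s %s (%s)\n' "$DISTRIB_ID" "$DISTRIB_RELEASE" "$DISTRIB_CODENAME"
    )
}

# Only run when the file exists on this system.
if [ -r /etc/lsb-release ]; then
    describe_release /etc/lsb-release
fi
```

With the file contents listed above, this would print Ubuntu 8.04 (hardy).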