Berkeley Lab Team Wins Special ACM Gordon Bell Prize for Algorithm Innovation

BERKELEY, CA — A team of scientists from the U.S. Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) has won a prestigious Gordon Bell Prize, sponsored by the Association for Computing Machinery (ACM), for special achievement in high performance computing for their research into the energy harnessing potential of nanostructures. Their method, which was used to predict the efficiency of a new solar cell material, achieved impressive performance and scalability.

The ACM Gordon Bell Prize annually recognizes the best performance of scientific applications on supercomputers. This year’s prize, presented in a special category for algorithm innovation, was announced Thursday, Nov. 20, at the awards session of the SC08 conference in Austin.

A test run of LS3DF, which took one hour on 17,000 processors of the Franklin supercomputer at NERSC, performed electronic structure calculations for a 3500-atom ZnTeO alloy. Isosurface plots (yellow) show the electron wavefunction squares for the bottom of the conduction band (left) and the top of the oxygen-induced band (right). The small grey dots are Zn atoms, the blue dots are Te atoms, and the red dots are oxygen atoms. (Image courtesy of Lin-Wang Wang)

The Berkeley Lab researchers used three of the most advanced scientific computing facilities of the Department of Energy (DOE) Office of Science for this award-winning work: the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab, the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory and the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory. Their study was titled “Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations.”

Nanostructures, tiny materials 100,000 times finer than a human hair, may hold the key to energy independence. Scientists believe that a fundamental understanding of nanostructure behaviors and properties could provide solutions for curbing our dependence on petroleum, coal and other fossil fuels.

To better understand and demonstrate the potential of nanostructures, the Berkeley Lab researchers simulated their behavior through development of the Linearly Scaling Three Dimensional Fragment (LS3DF) method. These computer algorithms use a novel “divide-and-conquer” technique to efficiently gain insights into how nanostructures function in systems with 10,000 or more atoms.

The LS3DF team consisted of Berkeley Lab’s Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier and David Bailey, an aggregate of materials scientists, mathematicians and computer scientists contributing their own special expertise to solve this problem.
Lin-Wang Wang, of Berkeley Lab’s Computational Research Division, led the development of the LS3DF algorithms, which used a novel “divide-and-conquer” technique to efficiently compute how nanostructures function in systems with 10,000 or more atoms.

The LS3DF application ultimately achieved a speed of 442 teraflop/s (442 trillion calculations per second) on a Cray XT5 system with 147,146 cores at the NCCS. The Berkeley Lab researchers were also able to run the code on the IBM BlueGene/P system at Argonne, reaching 224 teraflop/s on 163,840 cores, or 40.5 percent of the system’s peak performance capability.
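
As a rough check on the figures quoted above, the sustained rate per core and the peak implied by the 40.5 percent figure can be worked out directly. The sketch below uses only numbers from the text (including the 135 Tflop/s Franklin run on 36,864 cores); the function names are illustrative, and the implied peak is a back-of-envelope value, not an official system specification.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
TERA = 1e12
GIGA = 1e9

def per_core_gflops(total_tflops: float, cores: int) -> float:
    """Sustained Gflop/s per core for a run."""
    return total_tflops * TERA / cores / GIGA

def implied_peak_tflops(sustained_tflops: float, fraction_of_peak: float) -> float:
    """Peak rate implied by a sustained rate and its share of peak."""
    return sustained_tflops / fraction_of_peak

# (sustained Tflop/s, cores) as reported for each system
runs = {
    "Cray XT5 (NCCS)": (442.0, 147_146),
    "IBM BlueGene/P (Argonne)": (224.0, 163_840),
    "Cray XT4 Franklin (NERSC)": (135.0, 36_864),
}

for name, (tflops, cores) in runs.items():
    print(f"{name}: {per_core_gflops(tflops, cores):.2f} Gflop/s per core")

print(f"Implied BlueGene/P peak: {implied_peak_tflops(224.0, 0.405):.0f} Tflop/s")
```

The per-core numbers are not directly comparable across machines, since the processors differ; they only restate the quoted totals at a smaller scale.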

The team first ran the LS3DF application on 36,864 cores of the Cray XT4 (Franklin) at NERSC, achieving 135 Tflop/s. These initial results at NERSC provided the key scientific insights from the application.

“By incorporating the correct chemical formulas into efficient computer programs, scientists can learn a lot about the structures and properties of molecules and solids,” said Lin-Wang Wang, a computational material scientist who led the Berkeley Lab team. “I like to think of computers as chemistry’s third pillar. In most cases, computer simulations complement information obtained by chemical experiments, but in some cases they can also predict unobserved phenomena.”

A science run using LS3DF, which took one hour on 17,280 cores of the NERSC Franklin system, computed the electronic structure of a 3,500-atom ZnTeO alloy. This run verified that the code could be used to compute properties of the ZnTeO alloy that previously had been experimentally observed. The simulation led to a prediction for the efficiency of this alloy as a new solar cell material.

LS3DF offers a more efficient way of calculating energy potential because it is based on the observation that the total energy of a large nanostructure system can be broken down into small pieces, and each piece can be calculated separately. More traditional methods calculate the entire structure as a whole system and are much more time consuming and resource intensive. Because LS3DF scales almost perfectly with the number of compute cores, it is the first electronic structure code that runs efficiently on computer systems with tens to hundreds of thousands of cores.

“We are excited by the results we are seeing,” said LS3DF team member Meza, who heads Berkeley Lab’s High Performance Computing Research. “The efficiency of LS3DF on these large computer systems is impressive, but the real story is the power of algorithms. Using a linear scaling algorithm, we can now study systems that would otherwise take over 1,000 times longer on even the biggest machines today. Instead of hours, we would be talking about months of computer time for a single study.”

Getting codes to run with such high efficiencies on massively parallel machines is not a trivial task. Bailey, Shan and Strohmaier of the DOE Office of Science’s Scientific Discovery through Advanced Computing (SciDAC) Performance Engineering Research Institute (PERI) worked hand-in-hand with Wang and his colleagues to analyze the performance of LS3DF and to identify potential performance improvements. Responding to this analysis, Berkeley Lab researchers assisted with a major revision of the code, which led to the prize-winning submission.

“The computational power we have is staggering and it is important to make sure that each research project can effectively harness the power of Argonne’s Intrepid and optimize their calculations,” said Katherine Riley, the ALCF computational scientist who worked with the Berkeley Lab team. “Not only can we drastically reduce the time it takes to generate results, we can help scientists ask different questions and develop new insights in order to accelerate breakthroughs.”

Once the LS3DF code had been optimized, it was a matter of days before it was running at each of the DOE supercomputing facilities. Oak Ridge National Laboratory invited Wang and other Gordon Bell finalists to carry out runs on ORNL’s leadership Cray supercomputer, Jaguar. In Wang’s case, the winning simulation was achieved after only two runs over a two-day period, demonstrating the ease of porting – and running – high-performance applications on the Cray XT architecture. The project had previously been awarded time on Jaguar under DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program.

“We still don’t quite understand how the electron moves around in a nanostructure, and how such properties depend on the size, geometry, composition and surface passivations,” said Wang. “Understanding this dependence will allow us to design nanostructures for desired applications. Using our improved LS3DF method will help us to understand and predict these properties.”

The ALCF, NCCS and NERSC are funded by the Office of Advanced Scientific Computing Research in the DOE Office of Science, providing some of the world’s most powerful computing resources and support to thousands of researchers around the country.

Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California. Visit our website at www.lbl.gov.

Additional Information

For more information about NERSC, visit: www.nersc.gov

For more information about the ALCF, visit: www.alcf.anl.gov

For more information about the NCCS, visit: www.nccs.gov

From Berkeley Lab

Researchers hack spam network for study

Written by Iain Thomson in San Francisco, vnunet.com

Researchers from the University of California, Berkeley and UC San Diego have published a report detailing how they hacked into a criminal network to collect data on the economics of spam.

The team managed to get into the Storm botnet and configured the command and control infrastructure so that results were sent back to them for analysis. The team followed three spam campaigns involving 469 million pieces of spam.

“Spam-based marketing is a curious beast. We all receive the advertisements but few of us have encountered a person who admits to following through on this offer and making a purchase,” said the Spamalytics report (PDF).

“And yet the relentlessness with which such spam continually clogs inboxes, despite years of energetic deployment of anti-spam technology, provides undeniable testament that spammers find their campaigns profitable. Someone is clearly buying. But how many, how often and how much?”

The researchers found that a campaign for pharmaceuticals achieved a 0.00001 per cent conversion rate from spam to sale, and that all but one of the sales were for ‘male enhancement’ products.

Nevertheless, the low cost of sending out vast amounts of email, which the researchers estimate at £51 per million, means that the spammers could earn £1.75m a year from spam, although how much of that is profit is unknown.

The research also revealed some interesting data on the effectiveness of anti-spam filters, which typically cut out about a quarter of all spam. They are a serious concern to spammers, but not deployed widely enough to cut traffic significantly.

The effectiveness of blacklisting was also called into question, since lists had to be updated every half hour and were frequently ineffective.

How Will We Interact with the Web of Data?

From IEEE Internet Computing
by Tom Heath, Talis Information

The Semantic Web is a global information space of linked data, designed for machine consumption rather than human use. Right? Well, yes and no. It’s true to say that machine-readable data, given explicit semantics and published online, coupled with the ability to link data in distributed data sets, are the Semantic Web’s key selling points. Together, these features allow aggregation and integration of heterogeneous data on an unprecedented scale, and machines will do the grunt work for us.

However, without a human somewhere in this process to reap the rewards of these new capabilities, the endeavour is meaningless. Far from removing humans from the equation, a Web of machine-readable data (the Semantic Web; we also call it the “Web of data”) creates significant challenges and opportunities for human-computer interaction.

To date, the Semantic Web community has mostly been busy developing the technical infrastructure to make the Web of data feasible in principle and publishing linked data sets to make it a reality. If we’re to fully exploit the challenges and opportunities of a Web of data, we need to move beyond the initial phase and work to understand how this changes the Web’s user interaction paradigm.

In this column, I’ll discuss some ways in which our interaction with the Web of data might differ from how we interact with the established Web of documents, and what this might mean for both users and producers of Web content.

Semantic Web: from vision to reality

In 1999, Jakob Nielsen wrote about a looming crisis: the Web was growing at a phenomenal rate and, without closer attention to user interface principles, he predicted it would become an unusable mass of documents [1]. Almost 10 years have passed since then, and the Web is undergoing another seismic shift. The result is the emergence of the Web of data, or Semantic Web, envisioned for more than a decade and the product of many years’ work on the underlying technology. Although we might refer to them as distinct concepts, the Web of data isn’t a separate entity distinct from the existing Web of hypertext documents, but more akin to another layer of cloth interwoven with the Web as we know it.

Now, though, the headline statistics for the Web’s growth aren’t quoted in terms of Web pages or Web sites. Instead, people talk about numbers of triples published in the Web of data using the Resource Description Framework (RDF), and the number of links these triples create between distributed data sets.

RDF is a W3C specification for making statements about things in machine-readable form. These statements each consist of a subject, predicate, and object, hence the name triples. In most cases, the subject of a triple is a uniform resource identifier (URI) that can identify anything the data publisher chooses, be that a person, a place, a document in the Web, an abstract concept — in short, anything. Predicates specify the nature of the relationship between the subject and object, are drawn from vocabularies published in the Web, and are identified by URIs. An RDF triple’s object is usually a string literal or another URI. When the object is a URI from a different namespace — that is, it identifies something in an external data set — the RDF triple creates a link between those data sets, replacing isolated data islands with a giant, distributed data set built on top of the Web architecture — a true Web of data.
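
The subject, predicate and object structure described above can be sketched in a few lines of Python. This is only an illustration and not an RDF library: the `Triple` and `Literal` types and the `example.org` and `example.net` URIs are hypothetical, and comparing host names is a crude stand-in for comparing namespaces. The two predicates are taken from the FOAF vocabulary.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

class Literal(str):
    """Marks an object value as a string literal rather than a URI."""

class Triple(NamedTuple):
    subject: str    # URI identifying the thing being described
    predicate: str  # URI drawn from a published vocabulary
    obj: str        # another URI, or a Literal

def host(uri: str) -> str:
    """Crude stand-in for a namespace: the host part of the URI."""
    return urlsplit(uri).netloc

def is_cross_dataset_link(t: Triple) -> bool:
    """True when the object is a URI hosted somewhere other than the subject."""
    return not isinstance(t.obj, Literal) and host(t.obj) != host(t.subject)

FOAF = "http://xmlns.com/foaf/0.1/"
me = "http://example.org/people/alice#me"  # hypothetical URI for a person

triples = [
    # Object is a literal: a plain statement about the subject.
    Triple(me, FOAF + "name", Literal("Alice Example")),
    # Object is a URI in another data set: this triple is a link.
    Triple(me, FOAF + "based_near", "http://example.net/places/london"),
]

links = [t for t in triples if is_cross_dataset_link(t)]
print(len(links), "of", len(triples), "triples link to another data set")
```

Only the second triple counts as a link here, because its object identifies something published under a different host.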

When members of the grassroots Linking Open Data project last tried to calculate the current size of the Web of data, their conservative estimates suggested that data sets in the Web contained more than 2 billion RDF triples, 3 million of which were links across data sets [2]. The rate of growth in this Web is so great that any future estimates are likely to be out of date as soon as they’re published.

One other RDF feature worth noting — it enables easy integration of triples contained in any number of documents distributed across the Web. Source documents can be merged painlessly, without the graph that results from this merge needing to conform to a particular schema. One consequence of this is a major reduction in the headaches associated with integrating heterogeneous data.
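
A minimal sketch of why this kind of merge is painless, assuming each document is reduced to a set of (subject, predicate, object) tuples: merging is a set union, and no schema is consulted. The `example.org` namespace and the property names are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two "documents", each just a set of (subject, predicate, object) tuples.
doc_a = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "worksOn", EX + "project1"),
}
doc_b = {
    (EX + "alice", EX + "name", "Alice"),  # same statement as in doc_a
    (EX + "project1", EX + "title", "Linked Data Demo"),
}

def merge(*graphs):
    """Merge graphs by set union; duplicate statements collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}

merged = merge(doc_a, doc_b)
print(len(merged), "distinct triples after merging")
```

One caveat the sketch leaves out: real RDF graphs can contain blank nodes, which have to be kept distinct when graphs are merged, so a plain union is only safe when every node is named by a URI or is a literal.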

Throw out your homepage!

In the Web of documents, individuals and organizations often take great care to develop visually attractive homepages that send out just the right message to their target audience. If RDF enables data from multiple sources to be easily integrated to form a coherent view of a particular thing, what does this mean for how we publish data on the Web? It means that the Web page as we know it is dead.

Developers of Web 2.0 mashups have been demonstrating this for some time, integrating data from a handful of different sources to present a novel view that none of the source data sets alone can provide. The Web of data is the logical extension, letting developers create links between data sources that are themselves exposed on the Web for others to reuse to build large-scale, ad hoc mashups, while simultaneously reducing the challenges of integrating heterogeneous data.

Documents will always be useful data containers, but in many cases, I predict they will become nothing more than that. In the Semantic Web, you can’t assume you have control over how the information you publish will be presented — it’s just data. Thinking at the visual design level, RDF represents an extension of the long-established principle of separating content from presentation. For some content creators, this might ring alarm bells — how will brand be maintained if you have less control over presentation? For others, it will represent an opportunity to free themselves from visual design concerns, concentrate first on publishing relevant, high-quality data, and let others build the views they want rather than those that someone else assumes they need.

At the data level, publishers can have some influence over which external sources their data links to, primarily by creating these links themselves and publishing them for others to consume. However, in the Web of data, no one can control with any degree of certainty the sources with which their data is integrated — enabling serendipitous reuse is exactly the point! As already discussed, data published in the Web in a reusable form enables new views that have value beyond the sum of the parts and that the original creators might not have anticipated in advance.

It’s for these reasons that I suggest we throw away our homepages. Researchers well know the challenge of connecting all the pieces of their professional activities into a coherent whole: the projects, the papers, the committee and editorial board memberships, the blog entries and photo albums, all scattered across isolated islands on the Web, maybe replicated on their personal Web site or connected by strands of hypertext; or maybe not, owing to the effort involved.

A homepage for the Web of data takes a different shape. At its most basic level, it could simply be a collection of RDF triples that tie together data we want to express that’s distributed across numerous locations. The machines’ job is then to assemble this data into a coherent view, ready for human consumption.

To put my money where my mouth is, next time I get business cards printed, I won’t be including my homepage address. Instead, I’ll put my URI on the card, safe in the knowledge that a human being with a browser, Semantic or otherwise, can look up that URI and find some of what the Web has to say about me.

What should a Semantic Web browser look like?

Extending these ideas, we can see that the document in which a particular RDF graph is published becomes primarily an indicator of provenance, rather than representing the definitive packaging of a certain slice of data or content.

Of far greater relevance than the documents themselves are the things described in those documents — the people, places, and concepts. So far, I’ve talked about a Web of data, but I’m really using that term as shorthand for “Web of data about things” — any things. We might not be able to retrieve a car over HTTP, but we can identify it with an HTTP URI and use the Web to retrieve its description in RDF.

It’s at the level of “things” that browsers for the Web of data should operate. Providing simple browsers for RDF triples, and the documents in which they’re published, is one option for enabling people to interact with this information space. We saw this trend with some of the earliest Semantic Web browsers, but it rather misses the point. The one-page-at-a-time style of browsing, which we know well from the Web of documents, would make nothing of the potential we now have for integrated views of data assembled from numerous locations.

So, Semantic Web browsers must not simply echo the underlying representation of the data [3]. Instead, they must treat “things,” in the broadest sense, as first-class citizens of the interface. A particular thing of interest should take center stage, with the browser assembling relevant information seamlessly behind the scenes.

We’re seeing shades of this trend in Semantic Web browsers such as The Tabulator [4] and DBpedia Mobile [5], in which the thing of interest is of greater importance, and specific documents simply supply fragments of data that together make up a broader picture. Despite these moves in the right direction, we still have some way to go.

Conventional browsers have largely failed to deliver on the original vision of the Web as a read/write medium. Although this vision is slowly being realized at a general level through, for example, blogs, wikis, and specialized annotation interfaces such as Flickr, there remains a significant degree of indirection when it comes to editing Web documents. In some cases, this process still involves starting an editor for HTML documents, making appropriate changes, and then starting another application (such as an FTP client) to publish the updated document.

Browsers for the Semantic Web, which I suggest we call “thing browsers,” have an opportunity to enable a far greater degree of direct manipulation in their interfaces. Different types of objects afford different types of actions, and knowing the type of object on which the user is focused should let browsers provide a menu of actions specialized for this object type, and perhaps even adapt these according to the context.

For example, if the user is currently browsing a person, the browser could let the user send a message to that person, share an object with them, or arrange a meeting, without any of these functions having been explicitly listed as actions that can be invoked on these individuals. Instead, the Semantic Web at large can provide the necessary knowledge and services on which to offer such functionality, such as statements describing “arrange meeting with” as a valid action for a thing of type “person,” or definitions of what constitutes a meeting, or venue suggestions that are tailored to the relationship between the two parties and the time of day.
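
One way to picture this is a lookup from the type of the thing in focus to the actions it affords. In the sketch below, two hard-coded dictionaries stand in for knowledge that, in the column's scenario, would be gathered from the Semantic Web at large; the type names, the `Researcher` subtype and the action labels are all hypothetical.

```python
# Hypothetical registry: which actions are valid for which type of thing.
ACTIONS_BY_TYPE = {
    "Person": ["send message", "share object", "arrange meeting"],
    "Place": ["show on map"],
}

# Hypothetical type hierarchy: a Researcher is treated as a kind of Person.
PARENT_TYPE = {"Researcher": "Person"}

def actions_for(thing_type: str) -> list[str]:
    """Gather actions for a type and all of its ancestor types."""
    actions = []
    current = thing_type
    while current is not None:
        actions.extend(ACTIONS_BY_TYPE.get(current, []))
        current = PARENT_TYPE.get(current)  # None once the chain ends
    return actions

print(actions_for("Researcher"))
print(actions_for("Car"))
```

A type with no registered actions simply yields an empty menu, which is one plausible fallback; adapting the menu to context, as the column suggests, would need more than a static table.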

Clearly, a Web of data can’t offer direct manipulation of real-world things, such as cars and dogs, which are not, and never will be, online. However, in a Web where we can explicitly reference anything, not just documents, there’s great potential to reduce the degree of indirection in Web interfaces. We no longer have to refer to Web pages about things but can refer to the things themselves.

In case there was any doubt, this is no overnight endeavor but a trend that will take years to be realized and could take many different forms. Giving a keynote talk at the 2007 World Wide Web conference, Bill Buxton of Microsoft Research made the claim that “The diversity of ‘Web browsers’ tomorrow will match the diversity of ‘ink browsers’ (a.k.a. paper) today — in terms of diversity of form, function, location, and importance.” I don’t get the impression that Buxton was thinking about the Web of data when making this statement, but the claim stands up nonetheless — a true Web of things will require similar diversity in the interfaces through which we exploit it. The browser is just one approach.

A back button for the Semantic Web?

Accepting the shift from document to thing, and from predefined views to those assembled dynamically, won’t just require completely new interfaces but also several changes to the interaction widgets in interfaces with which we’re already familiar. If browsing becomes not just about moving from one document to another by following links, but about integrated views of data assembled from various sources, then the notion of the “back” button takes on a slightly different meaning in the interface. Rather than moving between documents, the back button in a Semantic Web browser should move the user to previously viewed things. More significant, a form of “undo” button, as you might find in a word processor, could be of critical importance in an environment in which vast amounts of data can be assembled at minimal cost, but not all of it will be pertinent to the job in hand.

The range of potential sources from which data will be available about a certain thing will be immense. Imagine entering a URI for “London” into your Semantic Web browser’s address bar. All the data available on the Web about London can’t feasibly be presented in one interface; users will need to decide which sources to add in depending on their current task or context, or will need the browser to make this decision intelligently for them, with the ability to undo the addition of any particular sources. This functionality becomes even more critical if automated reasoning is carried out on Semantic Web data, creating knowledge that wasn’t previously explicit in any of the individual data sources.
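
The two controls discussed here, "back" between things and "undo" for added sources, can be sketched as two separate stacks. This is an illustrative data structure, not a description of any existing browser; the class name, method names and URIs are hypothetical.

```python
class ThingView:
    """Tracks visited things and the sources merged into the current view."""

    def __init__(self):
        self.history = []  # URIs of things viewed, most recent last
        self.sources = []  # sources added to the current view, in order

    def visit(self, thing_uri):
        """Focus on a new thing; its view starts with no sources."""
        self.history.append(thing_uri)
        self.sources = []

    def back(self):
        """Return to the previously viewed thing, if there is one."""
        if len(self.history) > 1:
            self.history.pop()
            self.sources = []
        return self.history[-1] if self.history else None

    def add_source(self, source_uri):
        """Merge one more data source into the current view."""
        self.sources.append(source_uri)

    def undo_source(self):
        """Remove the most recently added source from the view."""
        return self.sources.pop() if self.sources else None


view = ThingView()
view.visit("http://example.org/things/london")
view.add_source("http://example.org/data/transport")
view.add_source("http://example.net/data/weather")
removed = view.undo_source()  # drops the weather source only
```

Keeping the two stacks apart reflects the distinction drawn above: going back changes which thing is in focus, while undo changes only which data is shown about it.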

How to manage the assembly of these data sources becomes a critical issue. When several colleagues and I evaluated the deployment of various Semantic Web technologies to delegates at the 2006 European Semantic Web Conference, one of the key themes to emerge was “coherence” [6]. Delegates had been presented with various Semantic Web applications for use at the conference; they expected data to be integrated across these and presented as a coherent whole. For various reasons described elsewhere [6], this wasn’t possible, leading to a suboptimal user experience and confusion for delegates.

Key to developing Web of data browsers will be look-up services such as Sindice [7], which provide a means to find other RDF documents on the Semantic Web that mention a particular thing. This kind of service might help ensure that the user experience is coherent, that is, that it includes all data the user expects it to. However, ensuring that a particular view of data is useful is another issue.

Any system aiming to integrate heterogeneous data on an ad hoc basis and present this to users will need to adopt sophisticated models of relevance, quality, and trust that are sensitive to the user’s current task and its context. How that might be achieved is a question for another day.

References

  1. J. Nielsen, “User Interface Directions for the Web,” Comm. ACM, vol. 42, no. 1, 1999, pp. 65–72.
  2. C. Bizer et al., “Linked Data on the Web (LDOW 08),” Proc. 17th Int’l World Wide Web Conference (WWW 08), ACM Press, 2008, pp. 1265–1266.
  3. D. Karger and M.C. Schraefel, “The Pathetic Fallacy of RDF,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
  4. T. Berners-Lee et al., “Tabulator: Exploring and Analyzing Linked Data on the Semantic Web,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
  5. C. Becker and C. Bizer, “DBpedia Mobile: A Location-Enabled Linked Data Browser,” Proc. Workshop on Linked Data on the Web (LDOW 08), Central Europe (CEUR) Workshop Proc., 2008.
  6. T. Heath, J. Domingue, and P. Shabajee, “User Interaction and Uptake Challenges to Successfully Deploying Semantic Web Technologies,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
  7. G. Tummarello, E. Oren, and R. Delbru, “Sindice.com: Weaving the Open Linked Data,” Proc. 6th Int’l Semantic Web Conf. and 2nd Asian Semantic Web Conf. (ISWC+ASWC 07), Springer, 2007, pp. 552–565.

Tom Heath is a researcher in the Platform Division at Talis. His research interests include recommendation, trust, and social networks in a linked data/Semantic Web context. Heath is a PhD candidate in computer science at The Open University’s Knowledge Media Institute and has a first degree in psychology. He is a member of the IEEE and the ACM. Contact him at tom.heath@talis.com.

A Better Network for Outer Space: Why Vint Cerf wants to put Internet-style networking in space.

By Brittany Sauser from Technologyreview.com

Having designed the networking protocols that launched the Internet, Vint Cerf now wants to put the same kind of robust communications network in outer space. Currently, astronauts and robotic spacecraft communicate with Earth using point-to-point radio links and communications schemes that are tailored to nearly every new mission. This inhibits interoperability and the repurposing of communications equipment, and as the number and complexity of missions increase, it will only become more problematic.

Cerf, who is Google’s vice president and chief Internet evangelist, is working with a team at NASA’s Jet Propulsion Laboratory (JPL), where he is also a visiting scientist, and at the MITRE Corporation, based in Washington, DC, to design and implement a revolutionary new scheme for space communication. The project, dubbed the Interplanetary Internet, will be tested aboard the International Space Station (ISS) in 2009, and Cerf hopes that by 2010, new space missions will be designed to use the protocols.

Ultimately, the network could interconnect manned and robotic spacecraft, forming the backbone of a communications system that reaches across the solar system.

Technology Review’s Brittany Sauser caught up with Cerf to discuss the details of the project.

Technology Review: What’s the purpose of the Interplanetary Internet?

Vint Cerf: The project started 10 years ago as an attempt to figure out what kind of technical networking standards would be useful to support interplanetary communication. Bear in mind, we have been flying robotic equipment to the inner and outer planets, asteroids, comets, and such since the 1960s. We have been able to communicate with those robotic devices and with manned missions using point-to-point radio communications. In fact, for many of these missions, we used a dedicated communications system called the Deep Space Network (DSN), built by JPL in 1964.

But one problem with space communication has been the limited use of standards. When we launch a spacecraft with a unique set of sensors onboard, we often end up writing special communication and application software that is adapted to that spacecraft’s sensor systems and manipulators. In the Internet world, we use standards called the TCP/IP protocol suite (packet switching and store-and-forward methods) to allow a lot of different devices, billions of things, to interact compatibly with each other. The team set out to develop a suite of protocols that would allow us to have the kind of network flexibility in space that we have on Earth. The Interplanetary Internet project is primarily about developing a set of communication standards and technical specifications to support rich networking in space environments.

TR: What are the challenges of building such a network in space?

VC: We started by working on a set of protocols that could deal with two very important properties of space communication. The first is delay. The distances between the planets are very large. For example, when Earth and Mars are closest together, it still takes 3.5 minutes for a radio signal moving at the speed of light to propagate. If I were on Mars and you were on Earth, it would take seven minutes at best before you heard a response. When Earth and Mars are farthest apart, the round trip takes 40 minutes! The reason we can talk back and forth on Earth so easily is that propagation times are very short by comparison.
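
The delay figures in this answer follow from dividing distance by the speed of light. The sketch below works backwards from the quoted times to the distances they imply; the function names are illustrative, and the resulting distances are derived from the interview's round numbers rather than taken from orbital data.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

def one_way_delay_minutes(distance_km: float) -> float:
    """Minutes for a radio signal to cover the given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0

def distance_for_delay_km(one_way_minutes: float) -> float:
    """Distance implied by a given one-way signal delay."""
    return one_way_minutes * 60.0 * SPEED_OF_LIGHT_KM_S

# Distances implied by the figures quoted in the interview.
closest_km = distance_for_delay_km(3.5)      # 3.5 minutes one way
farthest_km = distance_for_delay_km(40 / 2)  # 40-minute round trip

print(f"Closest approach implied: {closest_km / 1e6:.0f} million km")
print(f"Farthest separation implied: {farthest_km / 1e6:.0f} million km")
print(f"Round trip at closest approach: {2 * one_way_delay_minutes(closest_km):.1f} min")
```

Any reply also needs time to be composed and sent, so the round-trip values are lower bounds on how long a question and answer would take.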

The other problem is that the planets and their satellites are in motion, and most are rotating. The rotation of the planets means that if you are talking to something that is on the surface of the planet, it may rotate out of the line of sight so you cannot talk to it anymore, until the device on the surface rotates into view again. The same could be said for some orbiting satellites. You have to develop protocols that will deal with the fact that you cannot always communicate with the other party: the communication is both delayed and potentially disrupted. So that is what we designed: a delay- and disruption-tolerant networking system [DTN]. It will allow us to maintain communications more effectively, getting much more data because we don’t have to be in direct line of sight with the ultimate recipient in order to transfer data. The new protocols will be proposed to serve as a potential international standard for space networking.

TR: How does this new protocol, the delay- and disruption-tolerant networking system, work?

VC: We are using store-and-forward methods [routing information through hosts that hold on to it until a communications link can be established] similar to the TCP/IP design in order to service space-communication requirements. But our new bundle protocol is based on DTN principles. We have to cope with the fact that there is a really high potential for delay and disruption in the system. For example, Pluto is a long ways away, on the order of three to five billion miles and about 12 hours round-trip time. Using the DTN bundle protocol allows us to design more-complex mission configurations involving many devices on the surface of planets and in orbit around them. At Mars, for example, there are four orbiters and three landed and operational spacecraft. We expect to be able to use the standard TCP/IP protocols on the surface of planets and inside spacecraft, but we will use the DTN protocols for interplanetary distance communications.
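
The store-and-forward idea mentioned in this answer can be illustrated with a toy node that holds data until a link to the next hop becomes available. This is a sketch of the general technique only: it does not implement the DTN bundle protocol, and the class, method and node names are hypothetical.

```python
from collections import deque

class StoreAndForwardNode:
    """Toy node that keeps bundles until its outgoing link is usable."""

    def __init__(self, name):
        self.name = name
        self.stored = deque()  # bundles waiting for a usable link
        self.delivered = []    # payloads addressed to this node
        self.link_up = False
        self.next_hop = None

    def receive(self, bundle):
        """Accept a (destination, payload) bundle; deliver or store it."""
        destination, payload = bundle
        if destination == self.name:
            self.delivered.append(payload)
        else:
            self.stored.append(bundle)
            self.flush()

    def set_link(self, next_hop, up):
        """Record the link state, e.g. as a relay rotates into view."""
        self.next_hop = next_hop
        self.link_up = up
        self.flush()

    def flush(self):
        """Forward stored bundles, oldest first, while the link is up."""
        while self.link_up and self.next_hop is not None and self.stored:
            self.next_hop.receive(self.stored.popleft())


orbiter = StoreAndForwardNode("orbiter")
earth = StoreAndForwardNode("earth")

orbiter.receive(("earth", "image-001"))  # no link yet, so the bundle is held
held_while_down = len(orbiter.stored)
orbiter.set_link(earth, up=True)         # link appears, bundle is forwarded
```

The point of the design is that the sender never needs an end-to-end path at any single moment; each hop only needs a link to the next one at some point.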

TR: Is this going to require putting new infrastructure in space?

VC: The answer is yes and no. For example, the Deep Impact spacecraft [now called EPOXI] is already in orbit around the sun. It was used to launch a probe into a comet to examine its interior. EPOXI is being temporarily repurposed to test the new DTN protocols. The spacecraft has processing, memory, radio equipment, and solar panels for power so we don’t have to put new hardware up. We just have to upload new software. We are lucky to not have to field any new equipment yet, but the DTN protocols eventually have to show up in a fairly significant number of devices in the system to create the kind of network that can serve space-communication needs. Some specialized spacecraft could become store-and-forward routers. Each time a new mission is launched, using the standard bundle protocol, previous mission assets that are still in operation could be used to support the communication requirements of the new mission. In this way, we hope to accrete a kind of interplanetary backbone network.

TR: How are you handling security issues?

VC: There are security concerns, and we have been very careful to build defenses into the basic design. Each bundle-aware node will verify the identity of any other nodes that it is communicating with, and it will refuse to forward data from any nodes that it does not recognize. We will be using strong authentication methods, cryptographic communication methods, to ensure that the parties that are using the resources are authorized to do so.
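One way to picture a node that refuses to forward data from unrecognized senders is a message-authentication check. The sketch below assumes pre-shared keys and HMAC-SHA256 purely for illustration; the interview does not say which cryptographic methods the DTN design uses, and the class and node names are hypothetical.

```python
import hashlib
import hmac

def sign(key: bytes, payload: bytes) -> bytes:
    """Compute an authentication tag for payload under a shared key."""
    return hmac.new(key, payload, hashlib.sha256).digest()

class ForwardingNode:
    def __init__(self, known_keys):
        # Maps a sender name to the key shared with that sender (an assumption).
        self.known_keys = dict(known_keys)
        self.forwarded = []

    def accept(self, sender: str, payload: bytes, tag: bytes) -> bool:
        key = self.known_keys.get(sender)
        if key is None:
            return False  # unknown node: refuse to forward
        if not hmac.compare_digest(sign(key, payload), tag):
            return False  # tag does not match: sender is not who it claims to be
        self.forwarded.append((sender, payload))
        return True

relay = ForwardingNode({"rover-1": b"shared-secret-for-rover-1"})
good = relay.accept("rover-1", b"telemetry", sign(b"shared-secret-for-rover-1", b"telemetry"))
forged = relay.accept("rover-1", b"telemetry", sign(b"wrong-key", b"telemetry"))
unknown = relay.accept("stranger", b"telemetry", sign(b"anything", b"telemetry"))
```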

TR: What is the biggest advantage of building new protocols for space communication?

VC: The important part here is that we have standardized protocols that will allow internetworking of various spacecraft launched by all the spacefaring nations. Over time, as new missions are launched, you start to build up a backbone capability. Every time you put up a new mission, you basically are putting up another potential node in the network. Our hope in the near term is to start putting DTN/bundle-protocol applications up on the Internet terrestrially, and also put them up on the International Space Station for testing. Eventually, we hope to have this capability running all the time, and then, when new deep-space missions are launched using these standard protocols, they will become part of the interplanetary communications system.

Good Code, Bad Computations: a Computer Security Gray Area

From UCSD

If you want to make sure your computer or server is not tricked into undertaking malicious or undesirable behavior, it’s not enough to keep bad code out of the system.

Two graduate students from UC San Diego’s computer science department—Erik Buchanan and Ryan Roemer—have just published work showing that the process of building bad programs from good code using “return-oriented programming” can be automated and that this vulnerability applies to RISC computer architectures and not just the x86 architecture (which includes the vast majority of personal computers).

Last year, UC San Diego computer science professor Hovav Shacham formally described how return-oriented programming could be used to force computers with the x86 architecture to behave maliciously without introducing any bad code into the system. However, the attack required painstaking construction by hand and appeared to rely on a unique quirk of the x86 design.


This new automation and generalization work from graduate students and professors from UC San Diego’s Jacobs School of Engineering will be presented on October 28 at ACM’s Conference on Computer and Communications Security (CCS) 2008, one of the premier academic computer security conferences.

“Most computer security defenses are based on the notion that preventing the introduction of malicious code is sufficient to protect a computer. This assumption is at the core of trusted computing, anti-virus software, and various defenses like Intel and AMD’s no execute protections. There is a subtle fallacy in the logic, however: simply keeping out bad code is not sufficient to keep out bad computation,” said UC San Diego computer science professor Stefan Savage, an author on the CCS 2008 paper.

Return-oriented Programming

Return-oriented programming exploits start out like more familiar attacks on computers. The attacker takes advantage of a programming error in the target system to overwrite the runtime stack and divert program execution away from the path intended by the system’s designers. But instead of injecting outside code—the approach used in traditional malicious exploits—return-oriented programming enables attackers to create any kind of nasty computation or program by using just the existing code.

“You can create any kind of malicious program you can imagine—Turing complete functionality,” said Shacham.

For example, a user’s Web browser could be subverted to record passwords typed by the user or to send spam e-mail to all address book contacts, using only the code that makes up the browser itself.

“There is value in showing just how big of a potential problem return-oriented programming may turn out to be,” said computer science graduate student Erik Buchanan.

The term “return-oriented programming” describes the fact that the “good” instructions that can be strung together in order to build malicious programs need to end with a return command. The graduate students showed that the process of building these malicious programs from good code can be largely automated by grouping sets of instructions into “gadgets” and then abstracting much of the tedious work behind a programming language and compiler.

Imagine taking a 700 page book, picking and choosing words and phrases in no particular order and then assembling a 50 page story that has nothing to do with the original book. Return-oriented programming allows you to do something similar. Here the 700 page book is the code that makes up the system being attacked—for example, the standard C-language library libc—and the story is the malicious program the attacker wishes to have executed.
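The mechanism can be modeled in a few lines. The following is a toy simulation, not machine code and not the researchers' compiler: the gadget functions, their addresses, and the register names are all hypothetical. Each gadget stands for a short run of existing instructions ending in a return, and the "return" is modeled by popping the next address off a stack the attacker has overwritten.

```python
# "Existing code": short instruction sequences, each ending in a return.
def pop_a(state):
    state["a"] = state["stack"].pop()   # load a value from the stack

def pop_b(state):
    state["b"] = state["stack"].pop()

def add_ab(state):
    state["a"] += state["b"]

def store_a(state):
    state["out"].append(state["a"])

# Hypothetical addresses at which those sequences live in the target program.
GADGETS = {0x1000: pop_a, 0x2000: pop_b, 0x3000: add_ab, 0x4000: store_a}

def run(stack_words):
    """Execute a chain: every return pops the next gadget address."""
    state = {"a": 0, "b": 0, "out": [], "stack": list(reversed(stack_words))}
    while state["stack"]:
        address = state["stack"].pop()
        GADGETS[address](state)
    return state["out"]

# The attacker supplies only data (addresses and constants), never code.
chain = [0x1000, 2, 0x2000, 40, 0x3000, 0x4000]
result = run(chain)
print(result)  # [42]
```

No new function was introduced, yet the chain computes something the original program never did. In this model, the order of the words on the stack plays the role of the attacker's program.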

“We found that return-oriented programming poses a much more general vulnerability than people initially thought,” said computer science graduate student Ryan Roemer. He and Buchanan chose to study return-oriented programming for a class project after they heard Shacham outline a series of open questions in a guest lecture he gave in Savage’s computer security course last winter.

Living with Return-Oriented Programming

“The threat posed by return-oriented programming, across all architectures and systems, has negative implications for an entire class of security mechanisms: those that seek to prevent malicious computation by preventing the execution of malicious code,” the authors write in their CCS 2008 paper.

For instance, Intel and AMD have implemented security functionality into their chips (NX/XD) that prevents code from being executed from certain memory regions. Operating systems in turn use these features to prevent input data from being executed as code (e.g., Microsoft’s Data Execution Prevention feature introduced in Windows XP SP2). The new research from UC San Diego, however, highlights an entire class of exploits that would not be stopped by these security measures since no malicious code is actually executed. Instead, the stack is “hijacked” and forced to run good code in bad ways.
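A rough model of why a no-execute check does not stop such an exploit: the check asks only where an instruction is fetched from, not why control arrived there. The region names and addresses below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    start: int
    end: int            # exclusive
    executable: bool

# Hypothetical memory map: library code is executable, the stack is not.
MEMORY = [
    Region("libc code", 0x1000, 0x2000, executable=True),
    Region("stack", 0x8000, 0x9000, executable=False),
]

def may_execute(address, regions=MEMORY):
    """Model of an NX/XD-style check on an instruction fetch."""
    for region in regions:
        if region.start <= address < region.end:
            return region.executable
    return False  # unmapped addresses cannot be executed either

injected_code_ok = may_execute(0x8100)   # code placed on the stack is blocked
existing_code_ok = may_execute(0x1234)   # a jump into library code is allowed
```

A return-oriented chain only ever fetches instructions from regions like the first one, so every fetch passes the check.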

“We have demonstrated that return-oriented exploits are practical to write, as the complexity of gadget combination is abstracted behind a programming language and compiler. Finally, we argue that this approach provides a simple bypass for the vast majority of exploitation mitigations in use today,” the computer scientists write.

The authors outline a series of approaches to combat return-oriented programming. Eliminating vulnerabilities permitting control flow manipulation remains a high priority—as it has for 20 years. Other possibilities: hardware and software support for further constraining control flow and addressing the power of the return-oriented approach itself.

“Finally, if the approaches fail, we may be forced to abandon the convenient model that code is statically either good or bad, and instead focus on dynamically distinguishing whether a particular execution stream exhibits good or bad behavior,” the authors write.

“When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC,” by Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage, Department of Computer Science & Engineering, University of California, San Diego’s Jacobs School of Engineering.

This work was made possible by the National Science Foundation (NSF).

Opening the Cloud: Open-source cloud-computing tools could give companies greater flexibility.

By Erica Naone from Technology Review

Cloud-computing platforms such as Amazon’s Elastic Compute Cloud (EC2), Microsoft’s Azure Services Platform, and Google App Engine have given many businesses flexible access to computing resources, ushering in an era in which, among other things, startups can operate with much lower infrastructure costs. Instead of having to buy or rent hardware, users can pay for only the processing power that they actually use and are free to use more or less as their needs change.

However, relying on cloud computing comes with drawbacks, including privacy, security, and reliability concerns. So there is now growing interest in open-source cloud-computing tools, for which the source code is freely available. These tools could let companies build and customize their own computing clouds to work alongside more powerful commercial solutions.

One open-source software-infrastructure project, called Eucalyptus, imitates the experience of using EC2 but lets users run programs on their own resources and provides a detailed view of what would otherwise be the black box of cloud-computing services.

Another open-source cloud-computing project is the University of Chicago’s Globus Nimbus, which is widely recognized as having pioneered the field. And a European cloud-computing initiative coordinated by IBM, called RESERVOIR, features several open-source components, including OpenNebula, a tool for managing the virtual machines within a cloud. Even some companies, such as Enomaly and 10gen, are developing open-source cloud-computing tools.

Rich Wolski, a professor in the computer-science department at the University of California, Santa Barbara, who directs the Eucalyptus project, says that his focus is on developing a platform that is easy to use, maintain, and modify. “We actually started from first principles to build something that looks like a cloud,” he says. “As a result, we believe that our thing is more malleable. We can modify it, we can see inside it, we can install it and maintain it in a cloud environment in a more natural way.”

Reuven Cohen, founder and chief technologist of Enomaly, explains that an open-source cloud provides useful flexibility for academics and large companies. For example, he says, a company might want to run most of its computing in a commercial cloud such as that provided by Amazon but use the same software to process sensitive data on its own machines, for added security. Alternatively, a user might want to run software on his or her own resources most of the time, but have the option to expand to a commercial service in times of high demand. In both cases, an open-source cloud-computing interface can offer that flexibility, serving as a complement to the commercial service rather than a replacement.
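The two scenarios Cohen describes amount to a placement policy. A minimal sketch, assuming a hypothetical `place_job` function and a simple count of free private slots (neither comes from Enomaly or Eucalyptus):

```python
def place_job(sensitive: bool, private_free_slots: int) -> str:
    """Decide where a job runs under a simple hybrid-cloud policy."""
    if private_free_slots > 0:
        return "private"      # prefer the company's own machines
    if sensitive:
        return "wait"         # sensitive data never leaves the private cloud
    return "commercial"       # burst to the commercial cloud under high demand

decisions = [
    place_job(sensitive=True, private_free_slots=3),
    place_job(sensitive=False, private_free_slots=0),
    place_job(sensitive=True, private_free_slots=0),
]
```

Such a policy is only practical when both clouds expose the same interface, which is the point of an EC2-compatible open-source tool.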

Indeed, Wolski says that Eucalyptus isn’t meant to be an EC2 killer (for one thing, it’s not designed to scale to the same size). However, he believes that the project can make a productive contribution by offering a simple way to customize programs for use in the cloud. Wolski says that it’s easier to assess a program’s performance when it’s possible to see how it operates both at the interface and from within a cloud.

Wolski says that Eucalyptus will also imitate Amazon’s popular Simple Storage Service, which allows users to access storage space on demand, as well as its Elastic IP addresses, which keep the address of Web resources the same, even if the physical location changes.

Ignacio Llorente, a professor in the distributed systems architecture group at the Universidad Complutense de Madrid, in Spain, who works on OpenNebula, says that Eucalyptus’s main advantage is that it uses the popular EC2 interface. However, he adds that “the open-source interface is only one part of the solution. Their back-end [the system's internal management of physical resources and virtual machines] is too basic. A complete cloud solution requires other components.” Llorente says that Eucalyptus is just one example of a growing ecosystem of open-source cloud-computing components.

Wolski expects many of Eucalyptus’s users to be academics interested in studying cloud-computing infrastructure. Although he doubts that such a platform would be used as a distributed system for ordinary computer users, he doesn’t discount the possibility. “You can argue it both ways,” he notes. But Wolski says that he thinks some open-source cloud-computing tool will become important in the future. “If it’s not Eucalyptus, I suspect [it will be] something else,” he says. “There will be an open-source thing that everyone gets excited about and runs in their environment.”

WPA Wi-Fi Encryption Is Cracked

By Robert McMillan, IDG News Service, from PCWorld

Security researchers say they’ve developed a way to partially crack the Wi-Fi Protected Access (WPA) encryption standard used to protect data on many wireless networks.

The attack, described as the first practical attack on WPA, will be discussed at the PacSec conference in Tokyo next week. There, researcher Erik Tews will show how he was able to crack WPA encryption, in order to read data being sent from a router to a laptop computer. The attack could also be used to send bogus information to a client connected to the router.

To do this, Tews and his co-researcher Martin Beck found a way to break the Temporal Key Integrity Protocol (TKIP) key, used by WPA, in a relatively short amount of time: 12 to 15 minutes, according to Dragos Ruiu, the PacSec conference’s organizer.

They have not, however, managed to crack the encryption keys used to secure data that goes from the PC to the router in this particular attack.

Security experts had known that TKIP could be cracked using what’s known as a dictionary attack. Using massive computational resources, the attacker essentially cracks the encryption by making an extremely large number of educated guesses as to what key is being used to secure the wireless data.
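A dictionary attack of this kind can be sketched as follows, assuming WPA in pre-shared-key (passphrase) mode, where the master key is derived from the passphrase and the network name with PBKDF2-HMAC-SHA1 (4096 iterations, 256-bit output). The sketch simplifies by comparing derived keys directly; a real attack would instead check each guess against a captured handshake. The SSID, passphrase, and word list are made up.

```python
import hashlib
import hmac

def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit pairwise master key used in WPA pre-shared-key mode."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

def dictionary_attack(ssid, target_pmk, wordlist):
    """Try each candidate passphrase; return the one that matches, or None."""
    for candidate in wordlist:
        if hmac.compare_digest(derive_pmk(candidate, ssid), target_pmk):
            return candidate
    return None

# Hypothetical network and guesses, for illustration only.
ssid = "ExampleNet"
target = derive_pmk("correct horse", ssid)
guesses = ["password", "letmein", "correct horse", "qwerty123"]
found = dictionary_attack(ssid, target, guesses)
```

Each guess costs 4096 hash iterations, which is why the approach needs massive computational resources against anything but a weak passphrase.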

The work of Tews and Beck does not involve a dictionary attack, however.

To pull off their attack, the researchers first discovered a way to trick a WPA router into sending them large amounts of data. This makes cracking the key easier, but the technique is also combined with a “mathematical breakthrough” that lets them crack WPA much more quickly than any previous attempt, Ruiu said.

Tews is planning to publish the cryptographic work in an academic journal in the coming months, Ruiu said. Some of the code used in the attack was quietly added to Beck’s Aircrack-ng Wi-Fi encryption hacking tool two weeks ago, he added.

WPA is widely used on today’s Wi-Fi networks and is considered a better alternative to the original WEP (Wired Equivalent Privacy) standard, which was developed in the late 1990s. Soon after the development of WEP, however, hackers found a way to break its encryption and it is now considered insecure by most security professionals. Store chain T.J. Maxx was in the process of upgrading from WEP to WPA encryption when it experienced one of the most widely publicized data breaches in U.S. history, in which tens of millions of credit card numbers were stolen over a two-year period.

A new wireless standard known as WPA2 is considered safe from the attack developed by Tews and Beck, but many WPA2 routers also support WPA.

“Everybody has been saying, ‘Go to WPA because WEP is broken,’” Ruiu said. “This is a break in WPA.”

If WPA is significantly compromised, it would be a big blow for enterprise customers who have been increasingly adopting it, said Sri Sundaralingam, vice president of product management with wireless network security vendor AirTight Networks. Although customers can adopt Wi-Fi technology such as WPA2 or virtual private network software that will protect them from this attack, there are still many devices that connect to the network using WPA, or even the thoroughly cracked WEP standard, he said.

Ruiu expects a lot more WPA research to follow this work. “It’s just the starting point,” he said. “Erik and Martin have just opened the box on a whole new hacker playground.”

Nanotechnology Publications and Patents: A Review of Social Science Studies and Search Strategies

From Can Huang, Ad Notten, and Nico Rasters – UNU-MERIT and Maastricht University

The paper provides a comprehensive review of more than 120 social science studies in nanoscience and technology, all of which analyze publication and patent data. We conduct a comparative analysis of bibliometric search strategies that these studies use to harvest publication and patent data related to nanoscience and technology. We implement these strategies on the 2006 publication data and find that Mogoutov and Kahane (2007) [Mogoutov, A. and B. Kahane, 2007. Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Research Policy, 36: 893–903.], with their evolutionary lexical query search strategy, extract the highest number of records from the Web of Science. The strategies of Glanzel et al. (2003) [Glanzel, W., et al., 2003. Nanotechnology: Analysis of an Emerging Domain of Scientific and Technological Endeavour. Steunpunt O&O Statistieken, Report. Leuven: K.U. Leuven.], Noyons et al. (2003) [Noyons, E.C.M., et al., 2003. Mapping excellence in science and technology across Europe Nanoscience and nanotechnology. Draft report of project EC-PPN CT-2002-2001 to the European Commission.], Porter et al. (2008) [Porter, A.L. et al., 2008. Refining search terms for nanotechnology. Journal of Nanoparticle Research, 10(5):715-728.] and Mogoutov and Kahane (2007) produce very similar ranking tables of the top ten nanotechnology subject areas and the top ten most prolific countries and institutions.