CDI Pre-Proposal

From HoSE 0.1


Introduction

The last 15 years have seen substantial changes in how research gets published. These changes began when databases such as INSPEC went online to allow users to easily search for articles. Presently, the new content of every journal in science and engineering is available online, with some journals only being published online, e.g. PLoS.

While the number of online publications has exploded, the format and basic content of a ‘paper’ has changed very little. Papers are still expected to have sections that introduce the work, typically giving a brief historical account of relevant prior work and outlining the basic idea or hypothesis; detail theoretical assumptions and/or experimental techniques; present raw results from the models and/or experiments; discuss how the results support the hypothesis or improve on prior results; and provide a summary and/or concluding remarks about the contribution made by the work. The practice of publishing only self-contained, finalized, and static work was dictated not only by convention but also by the modes of distribution that have existed for years. While research is rarely distributed to readers on paper anymore, the method by which researchers disseminate their work has basically stayed the same.

What was a near-optimal dissemination system five years ago no longer appears to be so. While part of this shift is due to recent advances in technology, the primary change has come in the form of a social experiment: Wikipedia. With very few rules and little moderation, Wikipedia has demonstrated that (semi-)open collaborative environments can indeed converge into something stable and useful, and not diverge to chaos as most anticipated. While there have certainly been growing pains, the system has matured and stabilized into one on which many students and researchers now rely. To bring scientific publishing into the 21st century, the PIs propose to form a virtual organization to promote a concept they call Evolutionary Publishing (EvoPub) and the software, called Hub of Science Engineering (HoSE), needed to enable it. This project will leverage distributed expertise in science and engineering in order to support and improve research.

A Social Experiment Gone Right: Wikipedia

Wikipedia centers on a web application called a wiki. (Those reviewers unfamiliar with Wikipedia can visit the site.) There are many different software packages that implement the wiki concept. Wikipedia uses MediaWiki (MW), which is open source and readily customized. The basic idea is that content on any page can be edited directly in a web browser by any authorized user. Simple markup languages help with formatting and embedding multimedia content. The content is stored in a server-side database and web pages are formatted dynamically from the most recent content in the database. An essential component of wikis is a version control system; all versions of a page are archived and each edit is attributed to a user. ‘Watchers’ of a page are notified automatically via email whenever that page is modified. Thus, the system is quite easy to police, with vandalism quickly removed and vandals blocked from making further edits.

There is a good analogy to be made by comparing encyclopedias and traditional scientific publishing with Wikipedia and EvoPub. The benefits that Wikipedia has over, say, the Encyclopedia Britannica are indicative of the benefits EvoPub will have over the present system. One of the great advantages is the timeliness of the information. Information on existing pages can be changed continually as events unfold or discoveries are made. In addition, new topics can be added or branched from more general ones at any time. Another important benefit is that no one person is in charge of writing a comprehensive article about, say, radar. Pieces of information can be added by anyone at any time. People can fill in small bits of information, and the expertise of many people with knowledge of specific areas of a topic can be combined into a comprehensive article. Thus, collaboration is inherently promoted and communities naturally form around topics. There are still more benefits to the system we propose. For example, there are no (page) limits on the amount of information that can be included. If someone wants to develop a page that defines the $operator in every conceivable coordinate system, s/he is free to do so. Likewise, there is no limit on what is considered important. The community decides what is important by the pages they use. Typographical, grammatical, and factual mistakes can be quickly corrected, with many users providing editorial skills.

Wikipedia is not without its problems, and there have certainly been some growing pains. One of the biggest criticisms is that there is no one directly responsible for the accuracy of information. The hope is that the community can police itself. This has largely been the case due to a newly implemented register-to-edit requirement and there being no real benefit to intentionally posting misinformation. A policy of not permitting opinions, an effort to cite references, and behind-the-scenes discussion pages have contributed to keeping the content accurate. There is still the possibility that at any given instant someone could obtain and then use inaccurate information from Wikipedia. A new initiative to have edits to popular pages validated by the community before they are displayed will help quell occasional vandalism, opinionated statements, and misinformation.

The success of Wikipedia has given the PIs confidence that EvoPub will work and provide a more timely, efficient, and collaboration-producing environment. The following sections outline necessary changes in both the social structure and the underlying technology to make the system attractive for academic researchers.

Evolutionary Publishing (EvoPub): A Revolution in Research Dissemination

There are many compelling reasons to change the current system for scientific publishing. Timeliness is one important improvement. Ideas and results could be published as they are generated. The peer review system, which appears to be collapsing under the weight of the number of papers submitted to journals, could be revamped to reward those that do thorough and timely reviews. Collaboration would be encouraged through research discussions. Researchers could be rewarded based on how their work contributes to the field. Everything becomes dynamic (evolutionary) and can be updated as new results are discovered and/or errors found.

While the idea for EvoPub is based on Wikipedia, a transition away from traditional publishing will require a system tailored for the purpose. EvoPub will consist of hierarchical layers of detail. The plan is to develop the middle layers first. These layers will consist of living review articles. Like traditional review articles, these pages will summarize the state of the art in a given topic. Because the APS copyright transfer agreement gives authors permission, the PIs will first ask authors of articles in Reviews of Modern Physics to let us include their work in EvoPub. The PIs will also solicit respected members of the community to contribute original review articles. These articles will be converted to wiki pages that can then evolve as new discoveries are made or omissions in the original article are filled in. They will also contain hyperlinks to the papers cited and to associated discussion pages where changes to the main article are suggested and debated. By putting something useful online to begin with, a positive feedback mechanism will be initiated in which more users generate more contributions, which attract more users.

Unlike the freely editable pages on Wikipedia, living reviews will be curated by a committee. The curators would decide, based on suggestions and discussion, if and when a living review should be updated and who should make the update. For example, if there is a review article on the mechanical properties of carbon nanotubes and group X is finally able to determine experimentally how and when they transition into nanoribbons, either a member of group X or someone familiar with the work could suggest an update to the review article. If the curators decide the research is sound, they could request that someone submit an update to the review. Upon approval of the curators the updated review would be published. Thus, the curators would have a similar role to an editorial board. Quality would be assured.

Such a system would also supply something that Wikipedia lacks and academia requires: a peer review and reward system. A sidebar on each review page would list each contributor and the number of contributions s/he has made to that page’s content. Being added to this list would acknowledge a person's contribution to the topic. Because all revisions will be archived, readers could readily view individual contributions.
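As a rough illustration of the contributor list, the sketch below tallies edits per user from an archived revision log. The log layout, the usernames, and the function name `contributor_sidebar` are assumptions made for this example; they are not taken from MediaWiki or from the proposed system.

```python
from collections import Counter

# Hypothetical revision log for one review page: (revision_id, username).
# The tuple layout and the names are illustrative assumptions.
revisions = [
    (1, "alice"),
    (2, "bob"),
    (3, "alice"),
    (4, "carol"),
    (5, "alice"),
]

def contributor_sidebar(revisions):
    """Return (username, edit_count) pairs, most active contributor first."""
    counts = Counter(user for _, user in revisions)
    # Sort by descending count, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for user, n in contributor_sidebar(revisions):
    print(f"{user}: {n}")
```

Counting raw edits is the simplest possible measure; a real sidebar might instead weight contributions by size or by curator approval.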

The PIs also envision layers of detail above and below the living review layer. Both of these layers will evolve as the user base does and become major components of EvoPub. The top level will be tailored for those outside of the research community, such as K-12 educators and students and undergraduate students. These pages will consolidate the knowledge contained in the living reviews and present it in a basic way. These will be similar to encyclopedia articles and will also have a committee of curators that will consider suggestions for changes and additions. As the EvoPub organization grows, members will be solicited to write these articles, which will be edited for readability and content by professional editors.

Below the living review layer will be a system, called ‘Open Source Publishing’ (OSP), for disseminating research including finished works, basic ideas for new research, new results, and computer code. Each submission may be linked to by any number of living reviews but will have a home under just one. The challenge here will be to make the environment compelling enough for researchers to want to publish their work in the system. An essential part of this will be a peer review system to allow academic administrators to assess the quality of the work. Many blogging sites use built-in rating systems to rate comments and reward those who make significant contributions. A similar system will be implemented in OSP. Thus, not only will a member's new work and ideas be evaluated by the community, so will their comments and reviews of others' work, whether made anonymously or attributed. Unlike in our current system, good reviewers will be rewarded. The rating system could be as sophisticated as desired with, say, an algorithm that accounts for the reputation of someone giving a rating. Additionally, OSP will allow completed works to go through an open review process in which authors may request that the editors of the home living review start an official review of the work. Whether the work is accepted will be based on both the solicited and unsolicited reviews. If accepted, the work will be marked as such and official notification sent to the authors. In this way, the community itself will decide who and what is important.
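One simple way a rating algorithm could account for the reputation of the rater is a reputation-weighted mean. The sketch below is only an illustration under that assumption; the function name `weighted_rating`, the score scale, and the reputation values are hypothetical, and the proposal does not commit to this formula.

```python
def weighted_rating(ratings):
    """Reputation-weighted mean of (score, rater_reputation) pairs.

    Returns None when there are no ratings or all weights are zero.
    """
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        return None
    return sum(score * rep for score, rep in ratings) / total_weight

# Two hypothetical ratings: a 5 from a rater with reputation 3.0
# and a 2 from a rater with reputation 1.0.
ratings = [(5, 3.0), (2, 1.0)]
print(weighted_rating(ratings))  # (5*3 + 2*1) / 4 = 4.25
```

With equal reputations this reduces to the ordinary average, so the weighting only matters once raters have earned different standing in the community.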

By allowing people to post ideas and preliminary results in OSP, research can reach the community much more quickly than it currently does. Researchers will be able to float speculative ideas to see if others have attempted similar things or if there is interest in collaboration. This would also give people with far more ideas than time to investigate them a place to offer others in the community an opportunity to pursue those ideas. While there might be initial trepidation about posting one's ideas just to have others capitalize on them, each post will be time stamped so the idea's originator can be identified. Researchers would then be able to get their ideas out and receive credit for them instead of having them buried for years in half-finished papers. This will promote collaboration, which is one of the main goals of OSP.

Professional editing, one of the benefits to publishing in traditional journals, will be handled by a system to be developed by PI Selber. He will establish internships for English students focusing on publishing. Each such student must presently take a three-credit internship to graduate and this project integrates well with the existing program. During the grant period, Selber will oversee these interns, who will work on a system to organize whatever content the OSP contains. Ideas for efficient ways to categorize, browse, and search the content will be prototyped and tested. When the system is feature complete, English interns will begin work on documentation for OSP. Finally, interns will begin to help with copyediting content. This will allow English students to gain experience and provide a free service to the journal.

EvoPub and OSP will use existing open source software. However, the PIs realize that significant improvements and customizations need to be made to software packages designed primarily for social networks, not scientific research publishing and collaboration. Fortunately, the PIs have extensive experience in customizing open source web applications including MW for use by scientists and engineers. The following will briefly describe the enabling software, called Hub of Science Engineering (HoSE), that will be developed. Note that while the software will be developed for and motivated by EvoPub and OSP, it will be bundled in an easily installable package for use by all communities from small research groups to large virtual organizations.

Hub of Science Engineering (HoSE): Flexible Software to Revolutionize Scientific Publishing

HoSE will consist of three components focusing on common aspects of scientific research: writing, coding, and discussion. Each of the components and then their integration will be described briefly.

Collaborative Writing

A customized MediaWiki will power the collaborative writing environment. It is open source and one of the most actively developed and used wikis. The PI has extensive experience using and modifying MW, and has used it to write ten collaborative proposals and papers over the past four years. This way of writing has been extremely productive and many others have shown interest in the idea. The wiki used to write this proposal is open to all and can be found at http://dssl.mne.psu.edu/nsfcdi. Figure  used to test the system. Key features are enlarged and explained in the caption.

The features shown in Fig.  and add-ons for making tables, adding figures, citing references, and cross-referencing are planned.

Continued development is planned to improve or include:

Conversion to HTML. Only a small percentage of LaWML is currently converted to HTML. Fortunately, this small percentage encompasses the most commonly used formatting commands.

Conversion of LaTeX to LaWML. The PI has converted raw LaTeX from different sources for the demonstration site and has developed a collection of robust regular expressions to do this conversion. These will be extended to convert a larger set of LaTeX commands and macros.

Integration of BibTeX. Users will want to generate properly formatted bibliographies from the content of a document. This facility is not currently built into MW, although extensions add rudimentary support. Those familiar with BibTeX or EndNote will find these lacking. Users will be able to maintain a bibliography database in OSP. LaWML and WikiPub currently include facilities to add references from a user's database and generate bibliographies in both HTML and PDF. The ability to cite directly using DOIs and ISBNs is planned.

Cross-referencing. Cross-referencing can be accomplished in MW through existing extensions. However, the PI is implementing a more robust system that works more like LaTeX's label and reference tags.

Automatic numbering. There is no automatic way to number figures, tables, and equations in MW. The PI is developing an automatic numbering mechanism. A prototype system is in place but needs to be refined and thoroughly tested.
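To make the numbering and cross-referencing items concrete, here is a minimal two-pass sketch. The `{{label:KIND:NAME}}` and `{{ref:KIND:NAME}}` markers are invented for this example and are not LaWML or MediaWiki syntax; the function name `number_and_link` is likewise hypothetical. The first pass numbers labelled items per kind in document order, and the second pass resolves references, including forward references.

```python
import re

# Hypothetical markup, invented for illustration only:
#   {{label:KIND:NAME}} marks a numbered item, {{ref:KIND:NAME}} refers to it.
LABEL = re.compile(r"\{\{label:(\w+):(\w+)\}\}")
REF = re.compile(r"\{\{ref:(\w+):(\w+)\}\}")

def number_and_link(text):
    """Number labelled items per kind and resolve references to them."""
    counters = {}  # kind -> last number issued for that kind
    numbers = {}   # (kind, name) -> assigned number

    def assign(match):
        kind, name = match.groups()
        counters[kind] = counters.get(kind, 0) + 1
        numbers[(kind, name)] = counters[kind]
        return f"{kind.capitalize()} {counters[kind]}"

    # First pass: assign numbers in document order.
    numbered = LABEL.sub(assign, text)

    def resolve(match):
        key = match.groups()
        if key not in numbers:
            return "??"  # unresolved reference
        return f"{key[0].capitalize()} {numbers[key]}"

    # Second pass: replace references, including forward references.
    return REF.sub(resolve, numbered)

sample = "See {{ref:fig:setup}}. {{label:fig:setup}}: apparatus. {{label:eq:energy}}: E."
print(number_and_link(sample))
```

Separating the two passes is what lets a reference appear before the item it points to, which is the behavior authors used to LaTeX would expect.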

Collaborative Coding

Essential to our vision is an ability to publish code. There are many web-based collaborative coding (CC) systems available (e.g. sourceforge.net). These systems are mature and user friendly, although adoption outside of computer and information sciences has been slow. However, there is great benefit to being able to maintain and share source code, however small, in a central repository that will transcend personnel changes in an organization, be it a small local research group or a large international society. A web-based code versioning system such as ViewVC or even MW will be included in HoSE and used for OSP.

To entice users to publish code in OSP, they will be able to easily wrap code in web-based GUIs for use by others. Because most scientific code does not need a sophisticated GUI, requiring usually a small number of input parameters and producing data that one would most likely plot, wrapping them in web interfaces is not difficult for those familiar with web server interfacing. Thus, the innovation in the CC module will be a simple point-and-click interface for developing web interfaces. The Web Interface Builder (WIB) will itself be a web-based application. The platform for creating the interface will then be the same as for using it, and CPU and operating system independent.

Unlike other interface builders (e.g. Rappture), WIB will generate true web applications that use the browser as an interface, not just as a Java runtime environment. Because the interface will be simple HTML with a Javascript support library, it will be extremely portable and easy to modify. Various Web 2.0 technologies (Ajax, etc.) will be used to make both the generated interface and WIB itself more responsive.
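The kind of output such a builder might produce can be sketched in a few lines: given a list of input parameters for a scientific code, emit a plain HTML form. This is not the WIB design; the function name `build_form`, the `/run/beam` action URL, and the parameter names are assumptions for illustration, and the sketch is written in Python rather than the JavaScript the text anticipates.

```python
from html import escape

def build_form(action, params):
    """Render a plain HTML form with one text input per (name, default) pair."""
    lines = [f'<form method="post" action="{escape(action, quote=True)}">']
    for name, default in params:
        # Escape names and values so they are safe inside HTML attributes.
        n = escape(name, quote=True)
        v = escape(str(default), quote=True)
        lines.append(f'  <label>{n} <input name="{n}" value="{v}"></label>')
    lines.append('  <button type="submit">Run</button>')
    lines.append("</form>")
    return "\n".join(lines)

# Hypothetical beam-deflection code taking two numeric inputs.
print(build_form("/run/beam", [("length", 1.0), ("load", 250)]))
```

Because the result is ordinary HTML, it can be edited by hand after generation, which matches the portability argument made above.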

Collaborative Discussion

Having used MW for many years, the PIs find that the biggest weakness of the software is its environment for discussion. MW discussion pages are identical to the content pages and require users to format them appropriately. Unfortunately, this freedom and lack of structure usually means discussions are difficult to follow. Determining who made a comment and when can be very difficult. Because discussions will be an essential part of EvoPub and OSP, a much better mechanism will be integrated.

Unlike wikis, blogs were developed from the beginning to promote and organize discussion. Thus, instead of using wiki pages for discussion, each content page will be linked to a discussion page maintained as a blog entry. An RSS feed of such a blog entry will also allow users to stay current with the collaborative discussion. The PIs have found that the blogging software Drupal is a good open-source solution; PIs Suo and Li have used it for their successful site iMechanica. A major effort will be to tightly integrate Drupal and MW so that there is a seamless user experience.

List of Participants

The five PIs on the proposal will work together to develop the prototype systems discussed. They each have unique skills and experiences that will be used here.

Eric Mockensturm, the project lead and coordinator, is an associate professor in the Department of Mechanical and Nuclear Engineering at the Pennsylvania State University. He has a background in theoretical and applied mechanics, and extensive experience developing web applications using JavaScript, PHP, and MySQL (e.g. http://smallfeats.com). He will be the primary software developer on the project and coordinate the efforts of the co-PIs. He has been using the ‘cyberinfrastructure’ to collaborate since, as a 12-year-old in 1982, he obtained an account on M-Net, “America's first public access UNIX system.” Collaborating then was not a great deal different from collaborating now. One's ‘finger’ profile (usually filled with ASCII art) has evolved into MySpace pages, ‘talk’ has evolved into instant messaging, ‘mail’ has evolved into email, and message boards have evolved into blogs.

Stuart Selber is an Associate Professor of English and Science, Technology, and Society; an Affiliate Associate Professor of Information Sciences and Technology; and Director of Composition at the Pennsylvania State University. His research interests include rhetorics of technology, social dimensions of human-computer interaction, politics of academic computing, technical communication, and computers and composition. He will coordinate editorial aspects and organize an internship program from which we can draw editorial interns, who will also help organize and document the system.

Vincent Crespi is a professor in the Physics Department and the Department of Materials Science and Engineering at the Pennsylvania State University. He has a background in theoretical condensed matter physics and is the developer of many web sites including Number2.com, a site offering free standardized test preparation courses, and the Penn State Physics Department's web site. An experienced publisher and web site developer, he will, with his students, help test early versions of the software, provide feedback, and assist with development. He will also help promote the site in the physics community.

Teng Li is an assistant professor in the Department of Mechanical Engineering at the University of Maryland. He has a background in solid mechanics. He and Suo are the two founders and architects of iMechanica.org. He is also the founder and editor of www.macroelectronics.org, a Web 2.0-enabled information platform for the emerging technology of flexible electronics. He will help develop a Drupal communication channel in iMechanica.org to serve as the collaborative discussion module for EvoPub and OSP.

Zhigang Suo is the Allen E. and Marilyn M. Puckett Professor of Mechanics and Materials in the School of Engineering and Applied Sciences at Harvard University. He has a background in solid mechanics and is a member of the Executive Committee of the ASME Applied Mechanics Division; he is the main force behind iMechanica.org's success. Suo will solicit the online mechanics community through iMechanica to use and support EvoPub.

Summary

Proposed here are concepts we call Evolutionary Publishing (EvoPub) and Open Source Publishing (OSP), and the continued development of a system to enable them, called Hub of Science and Engineering (HoSE). EvoPub will initially consist of curated living review articles based on both newly written and existing ones. By basing EvoPub on the system powering Wikipedia, these articles will continually evolve so that they never become dated and always present the current state of knowledge on a topic. OSP will consist of a more detailed layer below the living reviews. Here users will be able to post, discuss, and edit complex technical documents using open-source web-based software. A layer of less detail above the living reviews will be developed for K-12 educators and students.

The software to enable EvoPub, HoSE, will consist of three components. A collaborative writing environment will consist of a highly customized version of MediaWiki. It will automatically typeset the document into a more readable format; number equations, figures, tables, and references; and generate cross-reference links for easy navigation. A collaborative coding environment will allow researchers to maintain and share source code with the community. A web interface builder will provide a simple mechanism for wrapping code in web-based interfaces. A collaborative discussion environment will be based on a highly customized version of the blogging software, Drupal. It will be tightly integrated into the collaborative writing and coding environments. The entire package will be bundled for use by any organization. Due to the scalability of the software used, these organizations could be anything from a small research group looking to more effectively collaborate to a large international society. The PIs will benchmark the software in a fast growing international online community of mechanics. A demonstration site containing this proposal, sample papers written by the PIs, and notes from a class on metamaterials generously donated by Graeme Milton and Biswajit Banerjee is online at http://dssl.mne.psu.edu/nsfcdi. Anyone can edit the content by clicking on the edit tab at the top of the page.

Intellectual merit. The intellectual merit of this proposal is twofold. First, we will demonstrate a platform that will make published knowledge constantly evolve and new ideas rapidly spread. Second, we will demonstrate a process to evolve knowledge by massive collaboration and computational thinking.

Broad impacts. The need to leverage new web tools to evolve knowledge exists in all disciplines. The platform developed and experience gained should be replicable to any discipline. The OSP enables students and young researchers to participate in the paper review process, a potentially valuable experience exclusively available to experienced researchers in the current publishing system.
