Thursday, May 2, 2013

Eclipse plugin - Workspace Local Code Snippet Suggestions

Problem background

Today almost every piece of software is built on existing frameworks/APIs. Most framework libraries provide a large number of classes and methods. Furthermore, libraries provided by different companies and organizations follow different styles. Hence it has always been a burden for developers to learn new APIs for each new project they work on. In practice, the best documentation for a framework is often the existing code itself. When working on new projects, developers sometimes get stuck on a particular piece of code. Even when they are sure that other developers have probably faced (and solved) the same problem before, it is really hard to traverse the entire source code to find the relevant pieces of code.
Full-text search over source code doesn't really help in these situations, because the goal is to find code snippets that share a particular set of structural characteristics. For example, a developer may need to find a code snippet in which an object (or several objects) of a particular data type is used and a particular method (or several methods) is called on that object.

A brief description about the Solution

An IDE extension that helps developers easily find code snippets used in a context similar to the one they are currently working in can speed up learning a new framework. Since full-text search is not enough, the search engine must capture the structural information of the source code.
Furthermore, it has to be fast enough to give the developer a comfortable experience. There should also be a good scoring system for the results, so that the most relevant snippets appear at the top of the list.

A brief description about available Code Search tools for Eclipse IDE

As discussed in (Böhm, 2012), there are a few tools available for the Eclipse IDE that provide search over source code based on structural information. These are Strathcona, Java Tools Language, Java Development Tools (JDT), JQuery (not the JavaScript library) and CodeGenie. With all of them except Strathcona, the developer has to query manually for data types, method signatures, etc. Some of them also have disadvantages, such as an index that is not updated frequently, or an index that is hosted online and has to be updated manually.
Considering the information captured and the search options offered, JDT provides excellent tools for searching Java source code. Its search index is stored on the client side and is updated whenever a file is saved. The list below shows the information captured by the JDT search index for each kind of element. Source: (Böhm, 2012)

Type: Name, Modifiers, Implemented Types, Declaring Type, Type Arguments, Extended Types, Annotations, Fields written, Fields read, Used Types, Used Methods

Method: Name, Modifiers, Declaring Type, Return Type, Parameter Types, Checked Exception Types, Annotations, Fields written, Fields read, Used Types, Used Methods

Field: Name, Modifiers, Declaring Type, Type

Advantages of using the proposed solution over available tools

The proposed solution will use an index stored on the client side, just like JDT, so results will be returned quickly. Compared to JDT's index, the proposed search engine will capture much more structural information about the source code, such as try/catch blocks and variable usage. (Please refer to the topic “Prototype search engine Implementation” for the full list of information captured.)
The proposed solution will detect the developer's current context automatically and provide a list of similar code snippets in a condensed form. The search query is therefore created automatically, rather than requiring the developer to write and execute queries manually. This is similar to the approach used in Strathcona, but the plugin captures far more structural information than Strathcona does.

Project Description

The objective of the project is to implement a plugin for the Eclipse IDE that suggests existing source code snippets to the developer, depending on the context the developer is working in.

Scope

Since the way structural information is captured differs from language to language, this plugin will only target projects built with the Java language. Furthermore, it will only provide example code snippets from the existing source code available in the local workspace. Hence it will be most suitable for projects that already have a substantial amount of existing source code.

Features

Users will be able to enable the plugin for a particular project via the IDE preferences window, so that source code is indexed only on demand and unnecessary performance overhead is avoided. The preferences window will also provide additional options for the plugin.
The users will be able to get a list of suggested code snippets in a condensed snippet form so that they can quickly decide which snippets are worth a closer look. By clicking on a result, users will be able to open the source code file which contains the relevant code snippet, in a new editor view.

Prototype search engine Implementation

Please refer to this git repository of the prototype implementation. A prototype of a code search engine has already been developed as a Master's thesis by Tobias Böhm. It is implemented using Apache Lucene. His project’s main objective was to allow developers to execute search queries by using a query language called Lucene QL which is built upon Lucene Query Syntax.
The indexing mechanism implemented in the prototype is outstanding. The prototype captures a large amount of structural information under five main entities: data type, method, field, try/catch block and variable usage. The list below is the full set of structural information captured by the prototype's indexers. Source: (Böhm, 2012)

Type: Type, Handle, FriendlyName, AllDeclaredMethodNames, DeclaredMethodNames, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, Annotations, FullyQualifiedName, ImplementedTypes, ExtendedTypes, UsedTypes, InstanceofTypes, AllImplementedTypes, AllExtendedTypes, DeclaredFieldTypes, UsedMethods, OverriddenMethods, DeclaredMethods, ResourcePath, Timestamp, Modifiers, ProjectName

Method: Type, Handle, FriendlyName, ReturnVariableExpressions, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, FieldsWritten, ParameterTypesStructural, Annotations, FullyQualifiedName, UsedTypes, ParameterTypes, ReturnType, InstanceofTypes, DeclaredFieldTypes, DeclaringType, CheckedExceptions, UsedMethods, ResourcePath, ParameterCount, Timestamp, Modifiers, ProjectName

Field: Type, Handle, FriendlyName, FullText, FullyQualifiedName, UsedTypes, FieldType, DeclaringType, ResourcePath, Timestamp, Modifiers, ProjectName

Try Catch: Type, DeclaredFieldNames, AllDeclaredFieldNames, FullText, FieldsRead, FieldsWritten, UsedFieldsInFinally, UsedFieldsInTry, FullyQualifiedName, UsedTypes, UsedTypesInTry, UsedTypesInFinally, InstanceofTypes, CaughtType, DeclaredFieldTypes, DeclaringType, UsedMethods, UsedMethodsInTry, UsedMethodsInFinally, ResourcePath, Timestamp, ProjectName

Variable usage: Type, Handle, VariableName, VariableType, DeclaringMethod, UsedAsParameterInMethods, UsedAsTargetForMethods, VariableDefinition

There is also an extdoc provider named Local Code Samples, implemented on top of this search engine prototype. It can currently suggest code snippets based on variable types and called methods.
For example, when a developer selects a variable name or a method call in the editor, the extdoc provider view displays a list of code snippets that use a variable of a similar type or a similar method call. This works provided that indexing of the code base has completed.
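The following Java sketch shows how such a selection could be turned into a query string in Lucene's classic query syntax. It assumes the selected variable's type and the methods called on it are already known, and it borrows the UsedTypes and UsedMethods field names from the prototype's index. How the prototype actually encodes values in those fields is an assumption, as is the hypothetical SelectionQuerySketch class.

```java
import java.util.List;

// Hypothetical helper: builds a Lucene classic-syntax query for snippets that
// use a given type and, preferably, call the given methods.
final class SelectionQuerySketch {

    // Quote a term as a phrase, escaping backslashes and double quotes.
    static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String build(String variableType, List<String> calledMethods) {
        // '+' makes the type clause required; method clauses stay optional (SHOULD).
        StringBuilder query = new StringBuilder("+UsedTypes:").append(quote(variableType));
        for (String method : calledMethods) {
            query.append(" UsedMethods:").append(quote(method));
        }
        return query.toString();
    }
}
```

The type clause is required (+) and the method clauses are left optional. A snippet therefore still matches when it calls only some of the methods, and it generally scores higher the more of them it calls.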

Implementation of proposed solution

The prototype uses Apache Lucene to implement the search engine. Because Lucene stores its data in an inverted index, inserting documents may take some time, but queries are fast. The list of condensed code snippets must be shown within a short time, so Lucene is a good fit for the local code search plugin.
The search engine mentioned above does a great job with indexing and contains a good amount of code worth keeping. It therefore makes sense to improve it and use it for the local code search plugin rather than building a search engine from scratch.
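Lucene's internals are far more elaborate, but the trade-off described above can be shown with a toy inverted index in plain Java. This is not Lucene's API; the class and its "field:term" key scheme are illustrative only. Indexing does extra work for every term, so that a query only has to look up and intersect posting lists.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps "field:term" keys to the ids of snippets that contain them.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing: one posting-list update per (field, term) pair of the snippet.
    void add(int snippetId, String field, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(snippetId);
        }
    }

    // Querying: one hash lookup per term plus an intersection of posting lists.
    // No snippet source is scanned at query time.
    Set<Integer> searchAll(String field, List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(field + ":" + term, Set.of());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Set.of() : result;
    }
}
```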

Targeted Deliverables

As mentioned by Marcel (refer to this comment), the searcher and the indexer do not work together very well: sometimes accessing the search index fails. This needs to be investigated in depth and fixed, since it is the heart of everything, and it will be a main task of the project.
The existing extdoc provider already triggers a search and displays results when a field name or method call is selected. It can be extended to trigger a search when many more structural elements are selected in the editor. For example, a search could be triggered by selecting:
1. Data Types (upon selecting Extended, Implemented or Declared Types)
2. Return Types (find methods that return similar types)
3. Overridden Methods
4. Annotations
5. Checked Exceptions
The condensed representation of code snippet results needs improvement. Currently it only displays a simple code summary of each snippet.

This can be improved to provide a more expressive summary, so that developers can quickly decide which snippets are worth looking at. Below is a list of things that could be added to the summary view to make it more expressive.
1. Containing Package
2. The Class name/ Method Name
3. Whether it is an abstract class or an interface
4. Whether it is an abstract or overridden method, whether overloads are available, etc.
5. Possibly two or three surrounding lines of code, displayed in muted text

Further possible improvements to prototype

Please refer to the bugzilla issue for more information.
As discussed by Tobias, there are several improvements that can be made to the indexing.
    • Structural information such as class casts and thrown exceptions is not captured by the current indexers.
    • Variable usage indexing can also be improved. It currently only captures variables that are declared inside the same block in which they are used.
For example, if a variable is declared as a class variable and used inside a method, that usage isn't recorded when the method is indexed.
    • The current implementation doesn't provide a custom ranking mechanism for search results.
    Scoring is left entirely to Lucene's default scoring, which is based on the Vector Space Model (VSM). It is possible to influence Lucene's default scoring by implementing our own query classes. This would help ensure that the most relevant code snippets appear at the top of the results.
These improvements need further discussion, especially the scoring system.
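For reference, the textbook vector space model can be sketched as follows. Each snippet and the query become TF-IDF weighted term vectors, and snippets are ranked by cosine similarity. This is a simplified illustration, not Lucene's exact scoring formula, which adds further normalization and boost factors. The class name and the idf variant ln(N/df) are choices made here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VectorSpaceSketch {

    // Raw term frequencies for one document (or query).
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // idf(t) = ln(N / df(t)); a term found in every document gets weight 0.
    static Map<String, Double> inverseDocumentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String t : termFrequencies(doc).keySet()) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, n) -> idf.put(t, Math.log((double) corpus.size() / n)));
        return idf;
    }

    // TF-IDF weight vector; terms unknown to the corpus get weight 0.
    static Map<String, Double> weights(List<String> terms, Map<String, Double> idf) {
        Map<String, Double> w = new HashMap<>();
        termFrequencies(terms).forEach((t, f) -> w.put(t, f * idf.getOrDefault(t, 0.0)));
        return w;
    }

    // Cosine of the angle between two weight vectors (0 if either is all zeros).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }
}
```

A custom ranking mechanism would replace or re-weight this score. For example, it could weight a UsedMethods match more heavily than a UsedTypes match.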

Involved Technologies

  • Eclipse Plugin Development Environment (PDE)
  • Eclipse Java Development Tools (JDT)
    • Eclipse Java model and Java Abstract Syntax Tree (AST)
  • Apache Lucene

References

Böhm, T., 2012. Searching Repositories in Eclipse. Finding Java Source Code using Expressive Code Search Query Languages
Bajracharya, S. et al., 2006. Sourcerer: A Search Engine for Open Source Code Supporting Structure-Based Search. ACM.
Bruch, M., Monperrus, M. & Mezini, M., 2009. Learning from examples to improve code completion systems. ACM.
Bruch, M., Schäfer, T. & Mezini, M., 2008. On Evaluating Recommender Systems for API Usages. ACM.
Clayberg, E. & Rubel, D., 2008. Eclipse Plugins. 3rd ed. s.l.:Addison-Wesley Professional.
Holmes, R., 2002. Using Structural Context to Recommend Source Code Examples. IEEE.
Lazzarini Lemos, O. A., Bajracharya, S. & Ossher, J., 2007. CodeGenie: a tool for test-driven source code search. ACM.
Panchenko, O., 2009. Hybrid Storage for Enabling Fully-Featured Text Search and Fine-Grained Structural Search over Source Code. ACM.
Paul, S. & Prakash, A., 1994. A Framework for Source Code Search Using Program Patterns. IEEE.
Zhong, H. et al., 2009. MAPO: Mining and Recommending API Usage Patterns. ACM.

Friday, July 27, 2012

GSOC - After the mid term evaluations...

The GSoC mid-term evaluations ended a few days ago, and I am happy with what I have accomplished on the project so far. First of all, here is the link to the code of the newly implemented "Media" package for the Joomla Platform, available on my GitHub.

"Media" seemed a reasonable name for the package, since it handles assets and leaves room for future improvements. The name was also suggested by many community members in their feedback, so that's how it was chosen.

Under the media subpackage you will find the JCompressor and JCombiner abstract classes. The compressor and combiner folders contain the classes that extend them. These classes are structured in a similar way to the database classes.

The JCombiner and JCompressor classes contain a static getInstance($options) method. In $options the developer can pass a value for the 'type' key. If $options['type'] == 'js', it returns the relevant object for JavaScript.

Those two classes also contain static getCompressors() and getCombiners() methods, which return an array of the available compressors and combiners. Below is an example of using a compressor.

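        // Get a CSS compressor from the factory
        $compressor = JMediaCompressor::getInstance(array('type' => 'css'));

        // $options holds the compressor options described below;
        // they can also be passed in the getInstance() call above
        $compressor->setOptions($options);

        // Load the uncompressed stylesheet into the compressor
        $uncompressed = JPATH_SITE . '/tests/media/compress/css/layout.css';
        $compressor->setUncompressed(JFile::read($uncompressed));

        // Run the compression and fetch the result
        $compressor->compress();
        $compressed = $compressor->getCompressed();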
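The type-keyed factory behind getInstance($options) is a common pattern. Here is a small Java analogue for illustration only; it is not the Joomla API. The Compressor interface, the registry and the stub classes are hypothetical, and the stubs merely trim their input in place of real minifiers. Only the idea of choosing an implementation from a 'type' option, plus a getCompressors() counterpart that lists the registered types, mirrors the package.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

interface Compressor {
    String compress(String source);
}

// Placeholder implementations standing in for real CSS/JS minifiers.
final class CssCompressorStub implements Compressor {
    @Override
    public String compress(String source) { return source.trim(); }
}

final class JsCompressorStub implements Compressor {
    @Override
    public String compress(String source) { return source.trim(); }
}

final class CompressorFactorySketch {
    // Available compressors, keyed by the value of the 'type' option.
    private static final Map<String, Supplier<Compressor>> REGISTRY =
            Map.of("css", CssCompressorStub::new, "js", JsCompressorStub::new);

    // Analogue of getInstance($options): pick an implementation from options["type"].
    static Compressor getInstance(Map<String, String> options) {
        String type = options.getOrDefault("type", "");
        Supplier<Compressor> factory = REGISTRY.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported type: " + type);
        }
        return factory.get();
    }

    // Analogue of getCompressors(): the types that have a registered compressor.
    static Set<String> getCompressors() {
        return new TreeSet<>(REGISTRY.keySet());
    }
}
```

Supporting a new file type then only means registering one more entry, which matches the goal of allowing future compressors/combiners for other file types.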

The combiner classes have a method called setSources($files), which sets the source files to combine. Calling combine() and then getCombined() does the job.

The JCompressor and JCombiner classes also have static methods called compressFile(), compressString() and combineFiles() as alternatives to the approach above.

There is also a static isSupported() method, which checks whether a file is compatible.

These classes are structured so that compressors/combiners for other file types can be added in the future.

Options available for compressors.

TYPE, REMOVE_COMMENTS, MIN_COLOR_CODES, LIMIT_LINE_LENGTH, etc.

REMOVE_COMMENTS sets whether comments are removed. Default is true.
MIN_COLOR_CODES sets whether HTML color codes in a CSS file are shortened.
LIMIT_LINE_LENGTH sets whether the compressed string is broken into several lines.

The compressFile() method has two extra options, PREFIX and OVERWRITE. PREFIX defines the prefix for the compressed file name.
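The text does not say where LIMIT_LINE_LENGTH breaks lines. One safe strategy, assumed here, is to break only after a closing brace once the current line has passed the limit, so a break never falls inside a declaration. Below is a Java sketch of that strategy with a hypothetical class name and parameter.

```java
final class LineLengthSketch {
    // Insert '\n' after '}' once the current line is longer than maxLength.
    // Breaking only at '}' keeps declarations intact, so one long rule may still exceed the limit.
    static String limitLineLength(String css, int maxLength) {
        StringBuilder out = new StringBuilder();
        int lineStart = 0;
        for (int i = 0; i < css.length(); i++) {
            char c = css.charAt(i);
            out.append(c);
            boolean last = i == css.length() - 1;  // no trailing newline at the very end
            if (c == '}' && !last && out.length() - lineStart > maxLength) {
                out.append('\n');
                lineStart = out.length();
            }
        }
        return out.toString();
    }
}
```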

Options available for combiners.

TYPE, COMPRESS, COMPRESS_OPTIONS and FILE_COMMENTS.

COMPRESS determines whether each source file is compressed before combining. COMPRESS_OPTIONS holds the options for the compressor, if compression is enabled.
FILE_COMMENTS sets whether comments marking the start and end of each source file are added to the combined file.
The static combineFiles() method also has two extra options, PREFIX and OVERWRITE, which work the same way as for the compressor.
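Below is a minimal Java sketch of the combining step with the FILE_COMMENTS and COMPRESS ideas above. Each source is optionally wrapped in start/end marker comments, and an optional per-file compressor runs before concatenation. The marker wording, the class name and the method signature are assumptions, not the package's actual output.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

final class CombinerSketch {
    // sources: file name -> contents, iterated in the order they should appear.
    // compressor: applied to each file before combining (identity = COMPRESS off).
    static String combine(Map<String, String> sources, boolean fileComments,
                          UnaryOperator<String> compressor) {
        StringBuilder combined = new StringBuilder();
        for (Map.Entry<String, String> source : sources.entrySet()) {
            if (fileComments) {
                combined.append("/* start: ").append(source.getKey()).append(" */\n");
            }
            // The newline keeps a trailing // comment from swallowing the next file's first line.
            combined.append(compressor.apply(source.getValue())).append('\n');
            if (fileComments) {
                combined.append("/* end: ").append(source.getKey()).append(" */\n");
            }
        }
        return combined.toString();
    }
}
```

/* ... */ comments are valid in both CSS and JavaScript, so the same markers work for either file type. Callers should pass an ordered map such as LinkedHashMap, because combining files in a different order can change the result.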

Unit tests for the API

Unit tests for the compressors and combiners are not fully implemented yet. I need to complete them soon, because the feature freeze for Joomla 3.0 is due in a few days and I want this API to be included in Joomla 3.0. Unit tests for the media package can be found here; please note that they are not complete yet.

Invoking Globally

For invoking the API globally, the JDocumentRendererHead class seemed the best place to hook in, since it renders the head section of the output HTML documents.
CSS and JavaScript files/declarations registered with native methods such as addStyleSheet() and addScript() seemed easy to intercept from there.

We can have a global configuration option to determine whether assets may be compressed or combined. This is very important for CMS integration.
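One detail when combining the registered assets globally is order. Files are applied in the order they are registered, so merging local files across an external URL could change the result. The Java sketch below is an illustration, not Joomla code. It groups only consecutive local assets into one combinable batch, while every external URL stays on its own. The global flag parameter, the class name and the rule that absolute URLs count as external are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class HeadAssetGroupingSketch {
    // Assumption: absolute or protocol-relative URLs are external and cannot be combined locally.
    static boolean isLocal(String url) {
        return !(url.startsWith("http://") || url.startsWith("https://") || url.startsWith("//"));
    }

    // Split registered assets into batches that can be combined without changing
    // the order in which files are applied. With combining disabled, every
    // asset stays in its own batch.
    static List<List<String>> batches(List<String> registered, boolean combineEnabled) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String url : registered) {
            if (combineEnabled && isLocal(url)) {
                current.add(url);
            } else {
                if (!current.isEmpty()) {
                    result.add(current);
                    current = new ArrayList<>();
                }
                result.add(List.of(url));
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```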

 

Monday, April 2, 2012

Google Summer of Code 2012 with Joomla


A Joomla Platform Library to compile and compress Javascript and CSS files

Today, many web applications have a large number of JavaScript and CSS files. As web applications grow in number and offer richer interactions, developers increasingly rely on CSS and JavaScript. These files determine how the page is displayed in the client browser, so rendering is delayed until the CSS and JavaScript files have been downloaded. The more JavaScript and CSS files a page has, the longer it takes to render.
Why does it take so long? The server generates a page in milliseconds, so that shouldn't be the problem. The problem is a combination of two things. Firstly, JavaScript and CSS files contain lots of comment blocks, unnecessary spaces and new lines, and long variable names. These are meant for developers rather than end users, and they increase the file size and hence the download time. Secondly, each separate JavaScript/CSS file on a page costs one extra round trip to the server and back. More files therefore mean more server requests and a longer page load time. To build web applications that are nice and snappy to use, developers need to reduce the size and number of JavaScript/CSS files and make sure end users get the best possible experience.
The goal of this project is to create a library for the Joomla Platform that facilitates compressing and combining JavaScript and CSS files before they are sent to client browsers. Developers can keep their well-structured, commented and easy-to-read files for themselves, while compressed files optimized for fast downloading are served to end users. Moreover, they will be able to send a single combined JavaScript/CSS file to render the page instead of a number of separate files. This reduces page download time and saves valuable bandwidth, giving users faster page loads and webmasters reduced bandwidth usage.

Project Scope
The scope of the project is to implement a library for the Joomla Platform with the following functionality. It will contain classes that compress JavaScript/CSS files using an algorithm that preserves the behavior of the code. It will also contain classes that combine several JavaScript/CSS files into a single compressed file, reducing the bandwidth usage of the web application. Each class will provide several functions to achieve these goals.

Research
There are several tools available for compressing JavaScript and CSS files, such as JShrink, JsMinifier, JsMin, YUI Compressor, mrclay-minify, UglifyJS and CSSTidy. They are implemented differently and on various platforms, such as PHP and JavaScript. Some of them have drawbacks; for example, strings and keywords are not guaranteed to remain unchanged after compression, and some may crash when processing a large amount of data. Below I have summarized the best steps these tools use to compress a file. Most of the tools mentioned above have not been updated in several years, so there may be better ways to do it. I have implemented a library for the Joomla Platform that currently uses these steps to compress files. I am looking forward to feedback from mentors and community members on better ways of doing it.

Compressing JavaScript files
In summary, we can compress a JavaScript file by removing all comments and unnecessary white space and by shortening all local variable names to a single character. This can be done in several steps.
  1. Backup Strings – Remove all strings from the JavaScript and store to reinsert later.
  2. Remove Comments – Remove all comment blocks from the script.
  3. Rename local variables – Replace all local variable names with a single character.
  4. Preserve Keywords – Before removing whitespaces, backup Keywords in script that requires whitespaces and replace them with identifiers. (Eg: var, return, function… need to be followed by a space)
  5. Remove unnecessary white space – Remove unnecessary spaces including tabs and new lines.
  6. Restore keywords – Replace the identifiers introduced in the 4th step with the keywords they stand for.
  7. Restore Strings – Restore all strings which were backed up in 1st step.
Source : http://www.dynamic-tools.net/toolbox/javascript_compressor/
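The Java sketch below covers steps 1, 2, 5 and 7 under simplifying assumptions. Strings and comments are handled in a single scan, so a quote inside a comment, or // inside a string, is not misread. Steps 4 and 6 back up keywords and restore them afterwards. Instead, this sketch keeps a space only where two identifier characters (or "+ +" / "- -") would otherwise merge, which has the same effect for var, return, function and similar keywords. Step 3 (renaming) is omitted because it needs real scope analysis. The sketch does not recognize regular-expression or template literals, and because newlines are removed it assumes statements end with semicolons. The class name and the U+0001 placeholder delimiter are choices made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsMinifierSketch {
    private static final char MARK = '\u0001';  // placeholder delimiter, assumed absent from input
    private static final Pattern PLACEHOLDER = Pattern.compile("\\x01(\\d+)\\x01");

    static String minify(String js) {
        List<String> strings = new ArrayList<>();
        String squeezed = removeWhitespace(backupStringsAndDropComments(js, strings));
        return restoreStrings(squeezed, strings);
    }

    // Steps 1 and 2: back up '...' and "..." strings as placeholders, drop comments.
    static String backupStringsAndDropComments(String js, List<String> strings) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < js.length()) {
            char c = js.charAt(i);
            if (c == '"' || c == '\'') {
                int end = i + 1;
                while (end < js.length() && js.charAt(end) != c) {
                    end += js.charAt(end) == '\\' ? 2 : 1;  // skip escaped characters
                }
                end = Math.min(end + 1, js.length());
                strings.add(js.substring(i, end));
                out.append(MARK).append(strings.size() - 1).append(MARK);
                i = end;
            } else if (js.startsWith("//", i)) {
                int nl = js.indexOf('\n', i);
                i = nl < 0 ? js.length() : nl;  // keep the newline itself
            } else if (js.startsWith("/*", i)) {
                int close = js.indexOf("*/", i + 2);
                i = close < 0 ? js.length() : close + 2;
                out.append(' ');  // a block comment still separates tokens
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '$';
    }

    // Step 5: drop whitespace unless removing it would merge two tokens.
    static String removeWhitespace(String js) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < js.length()) {
            if (!Character.isWhitespace(js.charAt(i))) {
                out.append(js.charAt(i++));
                continue;
            }
            int next = i;
            while (next < js.length() && Character.isWhitespace(js.charAt(next))) next++;
            if (out.length() > 0 && next < js.length()) {
                char before = out.charAt(out.length() - 1);
                char after = js.charAt(next);
                if ((isWordChar(before) && isWordChar(after))
                        || (before == '+' && after == '+') || (before == '-' && after == '-')) {
                    out.append(' ');
                }
            }
            i = next;
        }
        return out.toString();
    }

    // Step 7: put the backed-up strings back in place of their placeholders.
    static String restoreStrings(String js, List<String> strings) {
        Matcher m = PLACEHOLDER.matcher(js);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String original = strings.get(Integer.parseInt(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(original));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```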

Compressing CSS files
Compressing a CSS file can be done in several steps.
  1. Remove Comments.
  2. Remove unnecessary new lines and semicolons.
  3. Remove the leading zero of floats less than 1 (Eg: 0.56 -> .56)
  4. Remove units of measure from zero values. (Eg: margin:0px -> margin:0)
  5. Replace "none" with 0. (Eg: border:none -> border:0)
  6. Remove unnecessary white space – Including tabs and spaces surrounding {}:; characters.
  7. Shorten HTML color codes where possible (Eg: #FF0088 -> #F08).
Source : http://developer.yahoo.com/yui/compressor/css.html
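The CSS steps above can be approximated with a handful of regular expressions, as in the Java sketch below. It is a rough illustration under several assumptions, and the class name is hypothetical. It ignores quoted strings and url() contents, so text inside them may be altered. It keeps whitespace before ':' because "p :link" and "p:link" mean different things in selectors. It shortens colors only when they directly follow a colon, so ID selectors such as #aabbcc are left alone. It does not strip % from zero values, because 0% is meaningful in @keyframes selectors.

```java
final class CssMinifierSketch {
    static String minify(String css) {
        // 1. Remove /* ... */ comments.
        String out = css.replaceAll("(?s)/\\*.*?\\*/", "");
        // 2 and 6. Collapse white space and new lines, then drop spaces around { } ; , and after ':'.
        out = out.replaceAll("\\s+", " ");
        out = out.replaceAll(" ?([{};,]) ?", "$1");
        out = out.replace(": ", ":");
        // 2. Drop the semicolon before a closing brace.
        out = out.replace(";}", "}");
        // 3. 0.56 -> .56 when the zero starts a value.
        out = out.replaceAll("(?<=[: ,(-])0\\.(\\d)", ".$1");
        // 4. 0px -> 0 for common length units.
        out = out.replaceAll("(?<=[: ,(])0(?:px|em|ex|pt|pc|in|cm|mm)(?![\\w%])", "0");
        // 5. border:none -> border:0 (border and its per-side forms only).
        out = out.replaceAll("(?<![\\w-])(border(?:-top|-right|-bottom|-left)?):none(?=[;}])", "$1:0");
        // 7. #FF0088 -> #F08 when each channel is a repeated digit.
        out = out.replaceAll(
                "(?<=:)#([0-9a-fA-F])\\1([0-9a-fA-F])\\2([0-9a-fA-F])\\3(?![0-9a-fA-F])", "#$1$2$3");
        return out.trim();
    }
}
```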

CSSTidy also has an extra compression step, shown below.

a{margin-top:10px; margin-bottom:10px; margin-left:10px; margin-right:10px;}

CSSTidy converts that to a{margin:10px;}

source : http://www.if-not-true-then-false.com/2009/css-compression-with-own-php-class-vs-csstidy/
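The shorthand merge above can be sketched in Java as follows. This is a hypothetical helper, not CSSTidy's code. It takes one rule's declarations in source order and, when all four sides are present, replaces them with a single shorthand. It then drops the values that the CSS top/right/bottom/left shorthand order implies. It assumes none of the values carries !important, since merging would change their meaning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MarginShorthandSketch {
    private static final String[] SIDES = {"top", "right", "bottom", "left"};

    // Replace property-top/right/bottom/left with one shorthand declaration.
    // Returns the input unchanged unless all four longhands are present.
    static Map<String, String> merge(Map<String, String> declarations, String property) {
        if (declarations.containsKey(property)) {
            return declarations;  // an existing shorthand would make the merge order-sensitive
        }
        String[] values = new String[SIDES.length];
        for (int i = 0; i < SIDES.length; i++) {
            values[i] = declarations.get(property + "-" + SIDES[i]);
            if (values[i] == null) {
                return declarations;
            }
        }
        String combinedValue = shorthand(values[0], values[1], values[2], values[3]);
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> decl : declarations.entrySet()) {
            if (isSide(decl.getKey(), property)) {
                merged.putIfAbsent(property, combinedValue);  // takes the first longhand's position
            } else {
                merged.put(decl.getKey(), decl.getValue());
            }
        }
        return merged;
    }

    private static boolean isSide(String key, String property) {
        for (String side : SIDES) {
            if (key.equals(property + "-" + side)) return true;
        }
        return false;
    }

    // Shorthand order is top, right, bottom, left; omit values the shorthand implies.
    static String shorthand(String top, String right, String bottom, String left) {
        if (top.equals(right) && top.equals(bottom) && top.equals(left)) return top;
        if (top.equals(bottom) && right.equals(left)) return top + " " + right;
        if (right.equals(left)) return top + " " + right + " " + bottom;
        return top + " " + right + " " + bottom + " " + left;
    }
}
```

Passing "padding" instead of "margin" applies the same merge to padding declarations.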

Combining Files
Combining several JavaScript/CSS files into a single .js or .css file reduces the number of files, and therefore requests, needed to load a page. The files can be compressed before they are combined, so the result is well optimized for fast download. (Compressing after combining can cause problems, because the larger combined file may exceed PHP's memory limit.)
source : http://rakaz.nl/2006/12/make-your-pages-load-faster-by-combining-and-compressing-javascript-and-css-files.html

Implementation
I have already implemented a prototype library with several classes and a number of working functions. These classes are declared so that they work with the class autoloading feature of the Joomla Platform.
[Figure: class diagram of the prototype library]

Class: Functions*
JCompress: compressFile(), writeFile(), compressCss(), compressJs(), isSupported()
JCompressMinifier (Interface): getInstance(), setUncompressed(), setOptions(), compress(), getCompressed(), getRatio()
JCompressCssMinifier: getInstance(), setUncompressed(), setOptions(), compress(), getCompressed(), printCompressed(), getRatio()
JCompressJavascriptMinifier: getInstance(), setUncompressed(), setOptions(), compress(), preserveStrings(), preserveKeywords(), restoreStrings(), restoreKeywords(), getCompressed(), getRatio()
JCompressCombine: (none listed)
JCompressCombiner (Interface): (none listed)
JCompressCssCombiner: (none listed)
JCompressJavascriptCombiner: (none listed)
JCompressException: (none listed)

*Function parameters are not presented here.

I have implemented the static function compressFile() in the JCompress class. It takes three arguments: the source file path, the destination file path and options. Below is an example of using JCompress::compressFile().
$source = JPATH_SITE . '/css/layout.css';

$destination = JPATH_SITE . '/css/layout_compressed.css';

// Overwrite the destination file if it already exists
$options = array('overwrite' => true);

JCompress::compressFile($source, $destination, $options);

This function creates the relevant JCompressMinifier object depending on the file type, compresses the code and writes it to the destination file. I have implemented some of the steps listed under Compressing CSS files in the JCompressCssMinifier class, and the comment-removal step in the JCompressJavascriptMinifier class. I tested JCompress::compressFile() with the layout.css file of the beez_20 template for Joomla CMS 2.5. The file size was reduced by 10 KB (30 KB -> 20 KB), and the compressed CSS file still produces the same output. With the support of a mentor, I will be able to develop a more advanced algorithm that achieves a higher compression ratio.
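For scale, the reported sizes (rounded to whole kilobytes) correspond to the following ratio and reduction:

```latex
\[
\frac{20\,\text{KB}}{30\,\text{KB}} \approx 0.67,
\qquad
\text{reduction} = \frac{30 - 20}{30} \approx 33\%
\]
```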

I haven't yet implemented the functions of the combiner classes. I have researched how to combine JavaScript/CSS files effectively, and I am looking forward to implementing working functions as soon as possible. I will push my work on the combiner classes to my GitHub repository of the Joomla Platform, which can be found at the link mentioned below.

Several other working functions are already implemented in some of the classes above. The implementation of the library is available here, in my Joomla Platform repository on GitHub. This is just a prototype library and can be changed based on the mentors' feedback.

Basic Approach of using this library

(Based on feedback from community members on my post to the platform dev mailing list and on reviews of this proposal, I understand that this approach has several drawbacks. Based on their ideas, I have suggested a better approach under the topic "Better improved Approach of using Library".)

This section describes how the compression library can be used. Developers can compress a JS/CSS file using the JCompress::compressFile() method and pass the URL of the compressed file to the addScript() or addStyleSheet() methods of the JDocument object. Below is an example of that.

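class TestApp extends JApplicationWeb
{
    protected function doExecute()
    {
        // ...
        $original   = JPATH_SITE . '/css/main.css';
        $compressed = JPATH_SITE . '/css/main_compressed.css';

        // Write a compressed copy of the stylesheet, replacing any older copy
        JCompress::compressFile($original, $compressed, array('overwrite' => true));

        // addStyleSheet() expects a URL, not a filesystem path
        $this->document->addStyleSheet(JUri::root() . 'css/main_compressed.css');
        // ...
    }
}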

Or they can use JCompress::compressCss() / JCompress::compressJs() with addStyleDeclaration() / addScriptDeclaration() methods of JDocument object if they use small stylesheets/javascripts and prefer not to write files. Below is an example of that.

...
$this->document->addStyleDeclaration(JCompress::compressCss($uncompressed_code));
....

Better improved Approach of using Library

Based on feedback from community members on my post to the platform dev mailing list and on reviews of this proposal, I understand that the previously mentioned approach has several drawbacks. Compressing/combining JavaScript/CSS automatically would be beneficial in many ways. So rather than the approach above, I suggest the following approach for using this library.

1. Automatically compress/combine assets added to JDocument via native methods such as addScript() and addStyleSheet(), by invoking a global function. A setting in the global configuration could enable or disable this functionality at the developer's choice. One way to achieve this is an override mechanism: assets are compressed/combined and registered in the JDocument object when they are registered via the native methods. For example, we could extend addScript() with an additional parameter to enable/disable compression, like addScript(.., $compress = true). Alternatively, we could collect all registered assets from fields such as _scripts and _styleSheets of the JDocument object during execution. We could then compress/combine them before they are sent to the client browser, perhaps by overriding the JDocumentRendererHead::render() method. I am researching this, and with the support of the mentors I will be able to figure out the best way of doing it.

2. Sometimes assets are loaded the traditional HTML way (using <link>/<script> tags) rather than with framework methods. An example can be found here, in the index.php file of the beez_20 template for Joomla CMS. Automatically compressing/combining these assets requires more research, which I am looking forward to. One option is to parse the HTML output when rendering a page, with a global configuration setting to enable/disable it. An alternative is to keep the basic approach described earlier available as well. A developer who avoids the native methods and uses plain HTML would then need to apply this library manually, as in the basic approach. But that requires re-coding files/extensions that are already implemented, which is not desirable.

3. There are some special cases where scripts are loaded at the end of the document. These need to be considered as well.

In summary, this approach is better suited to using the library. Please note that these approaches may change based on the mentors' ideas. I am going to research how to implement this approach. Before the application deadline, however, I have not had enough time to work out exactly how it can be done. I will push my work to my GitHub repository of the Joomla Platform as I progress.

Caching files

Caching combined/compressed files will improve the performance of the web application. Since different pages use different assets, this will be a little difficult to implement. I am looking forward to the mentors' ideas on this.
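One common way to cope with per-page asset sets is to derive the cached file's name from the set itself. The sketch below hashes each source path together with its last-modified time. Each distinct combination of assets then gets its own cache entry, and editing any source automatically produces a new name, so stale files are never served. Below is a Java sketch under those assumptions; the naming scheme and class are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class CacheKeySketch {
    // paths.get(i) was last modified at lastModified.get(i) (e.g. epoch millis).
    static String cacheFileName(List<String> paths, List<Long> lastModified, String extension) {
        if (paths.size() != lastModified.size()) {
            throw new IllegalArgumentException("paths and timestamps must line up");
        }
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < paths.size(); i++) {
                // Order matters: the same files combined in another order get another key.
                String entry = paths.get(i) + "@" + lastModified.get(i) + "\n";
                sha.update(entry.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 16) + "." + extension;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```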

Integrating with CMS

We can integrate CMS back end configurations with the library to decide which files to compress/combine, etc. And also we can use a global configuration to enable or disable compressing/combining by default.

Benefits over Extensions

Being a platform library, this will provide compressing/combining as core platform functionality rather than through an extension. This will benefit both CMS and Platform developers when building their own web applications, and it will be available by default.

Problems to face
PHP memory limitations will cause errors when processing large files. As a solution, we can process a part of the file at a time.
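Processing a part of the file at a time is easiest when the transformation only needs a little state. As an illustration (in Java rather than PHP, and not taken from the library), here is a streaming CSS comment remover. It reads from a Reader and writes to a Writer one character at a time, so memory use does not grow with file size. It tracks whether it is inside a comment or a quoted string, so a comment that spans two chunks of input is still removed. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

final class StreamingCommentRemover {
    private enum State { NORMAL, SLASH, COMMENT, COMMENT_STAR, STRING, STRING_ESCAPE }

    private State state = State.NORMAL;
    private int quote;            // quote character of the current string
    private final Writer out;

    private StreamingCommentRemover(Writer out) {
        this.out = out;
    }

    // Copy CSS from in to out, dropping /* ... */ comments outside strings.
    static void strip(Reader in, Writer out) throws IOException {
        StreamingCommentRemover remover = new StreamingCommentRemover(out);
        int c;
        while ((c = in.read()) != -1) {
            remover.accept(c);
        }
        if (remover.state == State.SLASH) {
            out.write('/');       // a trailing lone slash was ordinary text
        }
        out.flush();
    }

    private void accept(int c) throws IOException {
        switch (state) {
            case SLASH:
                if (c == '*') { state = State.COMMENT; return; }
                out.write('/');   // the held-back slash did not start a comment
                state = State.NORMAL;
                normal(c);
                return;
            case NORMAL:
                normal(c);
                return;
            case COMMENT:
                if (c == '*') state = State.COMMENT_STAR;
                return;
            case COMMENT_STAR:
                if (c == '/') state = State.NORMAL;
                else if (c != '*') state = State.COMMENT;
                return;
            case STRING:
                out.write(c);
                if (c == '\\') state = State.STRING_ESCAPE;
                else if (c == quote) state = State.NORMAL;
                return;
            case STRING_ESCAPE:
                out.write(c);     // escaped character, never ends the string
                state = State.STRING;
                return;
        }
    }

    private void normal(int c) throws IOException {
        if (c == '/') { state = State.SLASH; return; }  // hold back: may start a comment
        out.write(c);
        if (c == '"' || c == '\'') { quote = c; state = State.STRING; }
    }
}
```

For real files the Reader and Writer should be buffered (for example BufferedReader/BufferedWriter), since reading one character at a time from an unbuffered file stream is slow.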