Hapax Semantic Clustering

To show the generic nature of the approach, we apply it at different levels of abstraction to case studies written in different languages.

1. In the first case study we analyze the core and the plugins of a large framework, the Moose re-engineering environment. This experiment focuses on the relation between architecture and semantics. It reveals, among other findings, four cases of duplicated code and a core functionality misplaced in one of the plugins.

2. The second case study is the class MSEModel, which is one of the largest classes in Moose. This experiment applies our approach at a different level of abstraction to focus on finer-grained findings. It visualizes the relationships among the methods of a large class, and reveals that the class should be split, as it serves at least two different purposes.

3. The third case study, the JEdit open-source Java editor, focuses on the relationships among classes and shows the strength of our approach in identifying and labeling semantic concepts.

The following table summarizes the problem size of each case study. It lists the number of documents and terms in the vector space model, and the rank to which the vector space has been reduced with LSI. Moose and JEdit use classes as input documents, and MSEModel uses methods.
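As a rough illustration of where the document and term counts come from, the sketch below builds a small term-document matrix from identifiers. It is only a sketch under stated assumptions: the two toy documents, the identifier-splitting rule, and the function names `split_identifier` and `build_term_document_matrix` are hypothetical and not taken from the case studies. The rank reduction with LSI is not shown.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms.

    Assumption: this splitting rule is illustrative only.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_term_document_matrix(documents):
    """Build a term-by-document count matrix.

    documents maps a document name (a class or a method) to the list of
    identifiers found in it. Returns (terms, document names, matrix), where
    matrix[i][j] is how often terms[i] occurs in document j.
    """
    counts = {
        doc: Counter(t for ident in idents for t in split_identifier(ident))
        for doc, idents in documents.items()
    }
    terms = sorted(set().union(*counts.values()))
    doc_names = list(documents)
    matrix = [[counts[doc][term] for doc in doc_names] for term in terms]
    return terms, doc_names, matrix


# Hypothetical toy input: two classes used as documents.
documents = {
    "ModelLoader": ["loadModel", "parseFile"],
    "ModelWriter": ["writeModel", "openFile"],
}

terms, doc_names, matrix = build_term_document_matrix(documents)
print("documents:", len(doc_names))
print("terms:", len(terms))
for term, row in zip(terms, matrix):
    print(term, row)
```

The same function works at either level of abstraction: pass classes as the keys to mirror the Moose and JEdit setup, or pass the methods of a single class to mirror the MSEModel setup.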
