TY - GEN
T1 - Summarizing developer work history using time series segmentation
AU - Siy, Harvey
AU - Chundi, Parvathi
AU - Subramaniam, Mahadevan
PY - 2008
Y1 - 2008
N2 - Temporal segmentation partitions time series data with the intent of producing more homogeneous segments. It is a technique used to preprocess data so that subsequent time series analysis on individual segments can detect trends that may not be evident when performing time series analysis on the entire dataset. This technique allows data miners to partition a large dataset without making any assumption of periodicity or any other a priori knowledge of the dataset's features. We investigate the insights that can be gained from the application of time series segmentation to software version repositories. Software version repositories from large projects contain on the order of hundreds of thousands of timestamped entries or more. It is a continuing challenge to aggregate such data so that noise is reduced and important characteristics are brought out. In this paper, we present a way to summarize developer work history in terms of the files they have modified over time by segmenting the CVS change data of individual Eclipse developers. We show that the files they modify tend to change significantly over time though most of them tend to work within the same directories.
AB - Temporal segmentation partitions time series data with the intent of producing more homogeneous segments. It is a technique used to preprocess data so that subsequent time series analysis on individual segments can detect trends that may not be evident when performing time series analysis on the entire dataset. This technique allows data miners to partition a large dataset without making any assumption of periodicity or any other a priori knowledge of the dataset's features. We investigate the insights that can be gained from the application of time series segmentation to software version repositories. Software version repositories from large projects contain on the order of hundreds of thousands of timestamped entries or more. It is a continuing challenge to aggregate such data so that noise is reduced and important characteristics are brought out. In this paper, we present a way to summarize developer work history in terms of the files they have modified over time by segmenting the CVS change data of individual Eclipse developers. We show that the files they modify tend to change significantly over time though most of them tend to work within the same directories.
KW - Mining software repositories
KW - Open source
KW - Temporal segmentation
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=57049127008&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57049127008&partnerID=8YFLogxK
U2 - 10.1145/1370750.1370784
DO - 10.1145/1370750.1370784
M3 - Conference contribution
AN - SCOPUS:57049127008
SN - 9781605580241
T3 - Proceedings - International Conference on Software Engineering
SP - 137
EP - 140
BT - 30th International Conference on Software Engineering, ICSE 2008 - 2008 International Working Conference on Mining Software Repositories, MSR'08
PB - IEEE Computer Society
T2 - 30th International Conference on Software Engineering, ICSE 2008 - 2008 International Working Conference on Mining Software Repositories, MSR'08
Y2 - 10 May 2008 through 11 May 2008
ER -