changes for jar

This commit is contained in:
fixminer
2020-04-07 18:06:53 +02:00
parent c5463f91f8
commit 40a8b0c290
57 changed files with 1180 additions and 301 deletions
## II. Environment setup
* OS: macOS Mojave (10.14.3)
* JDK8 (**important!**)
* Download and configure Anaconda
* Create a Python environment using the [environment file](environment.yml):
```powershell
conda env create -f environment.yml
```
* After creating the environment, activate it. It contains the necessary dependencies for Redis and Python.
```powershell
source activate fixminerEnv
```
* Update the config.yml file with the corresponding paths on your machine. An example config.yml file can be found under:
```powershell
fixminer_source/src/main/resources/config.yml
```
<!---
[fixminer.sh](python/fixminer.sh)
In order to launch FixMiner, execute [fixminer.sh](python/fixminer.sh)
bash fixminer.sh /Users/..../enhancedASTDiff/python/ stats
--->
## III. Replication Data
[singleBR.pickle](python/data/singleBR.pickle)
This pickle contains the list of bug reports (i.e. `bid`) with their corresponding fixes (i.e. `commit`) for each project in the dataset (i.e. `project`).
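The project/bid/commit layout described above can be pictured as a nested mapping; the sketch below is an illustrative stand-in (the ids and commit hashes are made up, not taken from the dataset):

```python
# Illustrative stand-in for the singleBR layout: project -> bug id -> fixing commit
single_br = {
    'LANG': {'LANG-1304': 'a1b2c3d'},
    'MATH': {'MATH-999': 'd4e5f6a'},
}

# Flatten to a bug id -> fixing commit lookup across all projects
fixes = {bid: commit
         for bids in single_br.values()
         for bid, commit in bids.items()}
```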
[bugReports.7z.00X](python/data/bugReports.7z.001)
This is the dump of the bug report archives extracted from each commit. These bug reports are not necessarily labelled as BUG, CLOSED; this archive contains the initial bug reports collected before the fixes were identified.
[gumInput.7z.001](python/data/gumInput.7z.001)
This archive contains all the patches in our dataset, formatted in a way that can be processed by GumTree (i.e. DiffEntries, prevFiles, revFiles).
[ALLbugReportsComplete.pickle](python/data/ALLbugReportsComplete.pickle)
The pickle object that represents the bug reports under the following columns: 'bugReport', 'summary', 'description', 'created', 'updated', 'resolved', 'reporterDN', 'reporterEmail', 'hasAttachment', 'attachmentTime', 'hasPR', 'commentsCount'.
#### Data Viewer
The data provided with the replication package is listed in the directory [python/data](python/data).
The data is stored in different formats (e.g. pickle, db, csv).
To see the content of a .pickle file, the following script can be used:
```python
import gzip
import pickle as p

def load_zipped_pickle(filename):
    # Load an object from a gzip-compressed pickle file
    with gzip.open(filename, 'rb') as f:
        loaded_object = p.load(f)
    return loaded_object
```
Usage:
```python
result = load_zipped_pickle('code/LANGbugReportsComplete.pickle')
# result is a pandas object which can be exported to several formats.
# Details on how to export are listed in the official library documentation:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
```
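For completeness, the gzip round trip can be sketched with a hypothetical `save_zipped_pickle` counterpart (the record below is a stand-in; the shipped pickles hold pandas objects):

```python
import gzip
import os
import pickle as p
import tempfile

def save_zipped_pickle(obj, filename):
    # Hypothetical counterpart of load_zipped_pickle: gzip-compress the pickle
    with gzip.open(filename, 'wb') as f:
        p.dump(obj, f)

def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        return p.load(f)

# Round trip with a stand-in record
record = {'bugReport': 'LANG-123', 'summary': 'example summary'}
path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
save_zipped_pickle(record, path)
assert load_zipped_pickle(path) == record
```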
## IV. Step-by-Step execution
#### Before running
* Update the [config file](src/main/resources/config.yml) with the corresponding user paths.
* Activate the conda environment from the shell:
```powershell
source activate fixminerEnv
```
#### Running Options
In order to launch *FixMiner*, execute [fixminer.sh](python/fixminer.sh):
```powershell
bash fixminer.sh [JOB] [CONFIG_FILE]
```
e.g.
```powershell
bash fixminer.sh dataset4c /Users/projects/release/fixminer_source/src/main/resources/config.yml
```
*FixMiner* needs a job to be specified.
#### Job Types
1. __dataset4j__ / __dataset4c__: Create a Java/C mining dataset from the projects listed in [subjects.csv](python/data/subjects.csv), or in [datasets.csv](python/data/datasets.csv) for C. For this demo, the job merges the following steps; individual steps can be run by commenting out the corresponding option in [main.py](python/main.py):
    * `clone`: Clone the target project repository.
    * `collect`: Collect all commits from the repository.
    * `fix`: Collect the commits linked to a bug report.
    * `bugPoints`: Identify the snapshot of the repository before the bug fixing commit was introduced.
    * `brDownload`: Download the bug reports recovered from the commit log.
    * `brParser`: Parse the bug reports and keep those labelled with type BUG and status RESOLVED or CLOSED.
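As an illustration of the `fix` step above, linking commits to bug reports can be sketched as matching commit messages against a Jira-style bug id pattern; the pattern and the sample commits here are assumptions, not the actual implementation:

```python
import re

# Assumed Jira-style bug id pattern, e.g. "LANG-1304"
BUG_ID = re.compile(r'\b[A-Z]+-\d+\b')

def link_fix_commits(commits):
    """Return (sha, bug_id) pairs for commits whose message cites a bug report."""
    linked = []
    for sha, message in commits:
        match = BUG_ID.search(message)
        if match:
            linked.append((sha, match.group()))
    return linked

commits = [('a1b2c3d', 'LANG-1304: fix NPE in StringUtils'),
           ('d4e5f6a', 'refactor build scripts')]
print(link_fix_commits(commits))  # [('a1b2c3d', 'LANG-1304')]
```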
2. __richEditScript__: Calls the jar file produced by the Maven package build to compute Rich Edit Scripts. This step can be invoked natively from Java or via the [Launcher](src/main/java/edu/lu/uni/serval/richedit/Launcher.java) with the appropriate arguments:
```powershell
java -jar FixPatternMiner-1.0.0-jar-with-dependencies.jar /Users/projects/release/fixminer_source/src/main/resources/config.yml RICHEDITSCRIPT
```
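The jar invocation above can also be scripted, e.g. from Python; a minimal sketch of assembling the command (the paths below are placeholders for your local setup):

```python
def build_launcher_cmd(jar_path, config_path, job):
    # Assemble the `java -jar <jar> <config> <JOB>` command line shown above;
    # jar_path and config_path are placeholders for your local paths.
    return ['java', '-jar', jar_path, config_path, job]

cmd = build_launcher_cmd('FixPatternMiner-1.0.0-jar-with-dependencies.jar',
                         'src/main/resources/config.yml', 'RICHEDITSCRIPT')
# Run with: subprocess.run(cmd, check=True)
```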
3. __shapeSI__: Search index creation for shapes. The output of this step is written to __pairs__, which will be generated under __datapath__ in the [config file](src/main/resources/config.yml).
4. __compareShapes__: Calls the jar file produced by the Maven package build to compare the shape trees. This step can be invoked natively from Java or via the [Launcher](src/main/java/edu/lu/uni/serval/richedit/Launcher.java) with the appropriate arguments:
```powershell
java -jar FixPatternMiner-1.0.0-jar-with-dependencies.jar /Users/projects/release/fixminer_source/src/main/resources/config.yml COMPARETREES
```
5. __cluster__: Forms clusters of identical shape trees. The output of this step is written to [shapes](python/data/shapes).
6. __actionSI__: Search index creation for actions. The output of this step is written to [pairs](python/data/pairsAction).
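The clustering of identical shape trees (job 5) can be sketched as grouping serialized trees by exact equality; this is a simplified stand-in for the actual step, and the tree strings below are invented:

```python
from collections import defaultdict

def cluster_identical(trees):
    # trees: mapping of patch id -> serialized shape-tree string.
    # Patches with identical serialized trees fall into the same cluster.
    groups = defaultdict(list)
    for patch_id, tree in trees.items():
        groups[tree].append(patch_id)
    # Keep only non-trivial clusters (two or more identical trees)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)

trees = {'p1': 'IF(COND,BLOCK)', 'p2': 'IF(COND,BLOCK)', 'p3': 'RETURN(EXPR)'}
print(cluster_identical(trees))  # [['p1', 'p2']]
```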