User Guide

Installation:

Installation instructions are now on a new page.

Concepts:

The Laws of Refactoring
Steps in a Refactoring Operation
Refactoring Success & Failure
Code Coverage Policy

Commands:

help
show-config
records
show-editable
find-replace
preg-replace
rename-file
rename-class
rename-method
scripting
no-tests option

The Laws of Refactoring

Automated refactoring tools require a great deal of trust from the end user. I feel like I'm asking to borrow your shiny new car for the weekend. If I don't hand it back intact there's going to be trouble.

Perhaps there are three main Laws for a program like Rephactor:

1. All code changes should be clearly announced.

In other words, no surprises. A user must easily be able to identify sections of code changed by the refactoring.

2. All code changes should be reversible.

Code is complex stuff and it's difficult to cover every possible edge case. If things go wrong users must be able to revert the changes made by the refactoring and these must be isolated in a transaction so that other work is not lost.

3. Detect any loss of functionality.

Rephactor relies upon the project's automated test suite to verify changes made by a refactoring op. Tests are run both for the primary target and its client apps. With no test suite(s), it is not possible to verify if changed code still works correctly.

Rephactor Overview

A refactoring is applied both to a primary target and to any other codebases on the local machine which may be clients of the primary target. For example, you may have a core library which is used in several other projects. When the library is edited, its clients may also need to be updated, depending on the nature of the change.

For each working copy touched by a refactoring operation, Rephactor will try to:

sync (svn update & svn commit)
run all tests
refactor
run all tests again

Just follow the prompts.

The initial "sync" (an svn update & svn commit) is necessary to isolate a refactoring transaction. Internally, an svn revert is used to roll back and hence, without the new revision, uncommitted changes made prior to the refactoring op would be lost when a refactoring is rolled back.

When the refactoring is complete, you'll have an option to save or revert the changes. You'll get an alert if the refactoring broke something and you can view diffs to see what changes were made.

Note that, when you choose the revert option after a failed refactoring, all working copies are rolled back including those where the refactoring might have succeeded. If a file in a Foo library was successfully renamed but then the rename failed in app Bar which uses the Foo library, Bar is now broken. Both Foo and Bar must be reverted.

You can also choose to save a failed refactoring (all changes in all working copies will be kept). If it's nearly there it might not be too difficult to finish it off by hand.

You can set policies for files which should never be edited and dirs which should never be followed by Rephactor's editing tools (see rephactor records). For example, with a subversion working copy, you definitely want /\.svn/ in the never-follow list.. ;)

Be careful if you have files open in an editor while you work with Rephactor. Some editors will not automatically refresh when files are altered, and changes made by Rephactor obviously won't be visible until you close and re-open. Worse, they could be overwritten when you click on save.

Refactoring Success & Failure

Success or failure is determined by running all tests before the refactoring, running them again after, and then comparing the results. Running all tests twice for each working copy will obviously take some time.

You may be wondering why not just run tests once, after refactoring, and check for zero failures? Often there will be one or two failing tests in a project which persist for a little while until the issue is dealt with. Meantime you might want to continue working — and refactoring — in other areas.

Results are compared according to a refactoring success policy. The out-of-the-box default, IdenticalTestReporterOutput, simply does a diff -qr comparison of the pre/post results. This is very sensitive. For example, script timer output, if present, might be 35s on one run and then 36s on the next. This would be interpreted as a failed refactoring even if there were no other differences and all tests are passing. However, the success policy is injectable and hence it's fairly easy to define other behaviours if you wish*.

It could be that a refactoring op is the solution to fix some failing tests. Note that the default success policy will always interpret this as a failure since the pre/post test results are not identical. Clearly I need to have another look at this. The project is still at an early stage and the general development style is to explore the broad scope of the problem using simplified components and then flesh out the details later.

If you really want to, tests can be turned off with the "--no-tests" option.

*See rephactor/lib/success.php. Policies implement RefactoringSuccessPolicy and are registered in rephactor/dic-conf/all-refactorings.php.

Code Coverage Policy

Fatal errors or skipped tests leave large chunks of code untested. If a refactoring op were allowed to proceed it could break something in one of these testing blind spots but Rephactor (which relies on test results to verify refactoring success) wouldn't know anything about it. This breaks the third law, above: "detect any loss of functionality". Hence a code coverage policy tries to check if any tests failed to run. No edits will be made if the policy is not satisfied.

Note that I'm not interested in the absolute code coverage here ie the 95% or whatever statistic you would get from a fully-executed all-tests script: I only want to know if some of those 95% tests failed to run. Rephactor won't force you to write a minimum number of tests.

The current default, StdCodeCoveragePolicy, simply checks for fatal / parse errors. These can obviously knock a large hole in the code coverage.

I haven't quite figured out what to do with skips yet. Suppose a required service such as mysql isn't running, and this triggers a "no ping" skip condition. The full test suite can't be run, the refactoring can't be verified, and hence Rephactor should abort. On the other hand you might have some windows-only tests set to skip when running in a nix environment (and vice versa). In this case you'd want to ignore the skip messages and continue with the refactoring. All the tests which should be run on the current OS are being run; nothing was missed out.

The code coverage policy is injectable to allow users to tweak the behaviour. Policies should implement the CodeCoveragePolicy interface (see rephactor/lib/success.php). Implementations are registered in rephactor/dic-conf/all-refactorings.php. A script object (the all-tests script) is passed in to a policy instance at instantiation. Methods: getStdout() and getStderr() provide the script output which you can then check according to your needs.

Note that, in working copy records, you have an option to specify services which must be present for an all-tests run (for example a database server). If a required service isn't running Rephactor will abort. This is an additional check separate to the code coverage policy. However, it's likely that this will be removed at some point. The test framework's skip mechanism should probably be used instead, alongside a skip-aware code coverage policy.

Rephactor Commands

help

$ rephactor help

Unrecognised commands will also produce the help screen.

show-config

Display Rephactor configuration settings (defined in rephactor/config.php).

$ rephactor show-config

records

Once Rephactor is installed, you need to tell it about your projects. A record needs to be created for each working copy which you might want to refactor. Enter this command and follow the prompts:

$ rephactor records

If you plan to run Rephactor scripts you'll also need to create records for remote libraries. If that doesn't mean anything to you, just ignore it for now.

show-editable

When you create a record for a working copy, you can specify patterns for dirs which should never be followed and files which should never be edited. This creates a policy which limits the scope of all refactoring operations. You can preview editable files with the following command:

$ rephactor foo show-editable

You must add /\.svn/ for a subversion working copy otherwise a refactoring op will also edit the .svn files. That's bad.

Later, I'll look at auto-detecting the version control system.

find-replace

A simple, case-sensitive find-replace.

$ rephactor foo find-replace "replace this" "with this"

This is based on the str_replace() function (ie case-sensitive). Be sure you really do want to make the specified change. Anything, anywhere containing the $from string will be changed regardless of context.

Remember that the shell will remove double or single quote delimiters on a string arg. For example, [ "a phrase" ] will turn up in argv as: [ a phrase ]. If you want to find-replace a double-quoted string including the quotes enter: [ '"a phrase"' ] and vice versa for single quotes.
"$" should be escaped unless they are inside a single-quoted string eg:

# rephactor myapp find-replace \$foo \$bar
# rephactor myapp find-replace "\$foo" "\$bar"
# rephactor myapp find-replace '$foo' '$bar'

preg-replace

Pretty straightforward. You need delimiters for the pattern arg. Escape special regex chars as normal.

$ rephactor foo preg-replace /perl-regex/i "replacement string"

Behind the scenes a preg_replace() is applied to each and every working copy file. Be sure you really do want to make the specified change. Anything, anywhere containing the $from pattern will be changed regardless of context.

rename-file

$ rephactor foo rename-file a/b/from-this.php c/to-that.php

The from/to paths should either be (a) relative to your current working directory or (b) absolute paths.

Identifying paths in php code is not always easy. Say a variable is passed to fopen(): Rephactor would have to read back through whichever scopes the var is passed to find where it's declared.

There are all kinds of issues really but, for now, I'm ignoring them all pending development of some code analysis tools. Currently Rephactor will only deal with the following types of paths:

Includes, requires, and _onces where the path arg is a quoted string and is found in the include_path eg:

<?php require_once('simpletest/mock_objects.php')
dirname(__FILE__) type paths:
<?php dirname(__FILE__) . '/style/paths'
Path token concatenated with string:

<?php FOO_LIB . 'path/to/file' $this->registry->get('foo_lib') . 'path/to/file'
A path token can be a constant or any user-defined php expression. Tokens must always evaluate to the same path, and this should be an absolute path.

Currently rename-file does not support autoload-friendly Class_Names_Which_Map_To_A_File_System_Path. The file will be renamed but the class name will not be. I'm not keen to add this feature since I would feel like I'm supporting a bad practice.

rename-class

$ rephactor foo rename-class OldName NewName

rename a class, abstract class, or interface
rename all references to the class name in php code (inc hints, implements, extends, etc)
rename a namesake php file, if it exists ie OldName.php --> NewName.php
also tries to update SimpleTest tests containing mock objects: "MockOldName" will be changed to "MockNewName", and "PartialOldName" will be changed to "PartialNewName"

The current implementation of rename-class is rather crude. A variety of regex patterns are used to attempt to identify class contexts but this won't work perfectly. Quoted string parameters passed to any method which match the OldName will be identified as class names and replaced. Since class names are often lexically unusual the risk of false positives is significantly reduced but it does exist. For example, suppose the refactoring is rename-class Foo to Bar. All the Foos below will be replaced with Bars:


<?php    

        class_exists('Foo');          // OK

        fopen('Foo', 'a');            // false positive

To do more would require writing some pretty hefty introspection tools. This is something I hope to look at later — indeed it will be essential for a rename-method.

Note that class names in documentation files will not be replaced by the rename-class refactoring. This will only edit php code. You could consider falling back to a find-replace to deal with the docs but only if class names are lexically unique and only if you don't have a namesake file. A find-replace will alter include paths in php code but not the namesake file name and this will break the app. Only rename-class will update a namesake file.

The fixture files rename-class-hotlinked.php and rename-class-target.php should explain what behaviour is currently being asserted. If tests are passing, all "%1$s" are being successfully replaced.

The file rename-class-false-positives.php illustrates the false positives problem mentioned above. The "Foo" strings should not be replaced by a "rename-class Foo Bar" refactoring but at the time of writing this isn't part of the test suite and they are. You have been warned.

rename-method

$ rephactor foo rename-method ClassOrInterface::oldMethodName newMethodName

Suppose you have a Policy interface with a pass() method. You want to change this to isValid(). Perhaps there are dozens of pass() methods throughout the codebase, and a range of associated meanings in addition to the sense of "pass a test" (ie satisfy the policy). You can pass the parcel (transfer). You can make a pass (flirt). You can pass on something (decline) and you can head someone off at the pass (a bottleneck). How do we pick out the classes we want to change?

This brings us to the concept of a class "family". The simplest example of a family would be a single inheritance hierarchy. If you rename a method in one class you must obviously do the same to any other class in the hierarchy which implements the renamable method. Methods always mean the same thing within a hierarchy.

However, a family isn't just an inheritance hierarchy. Different hierarchies can share common interfaces and concrete implementations of the interface methods will all mean the same thing in the different hierarchies.

Thus, when Rephactor applies a rename-method, it will attempt to identify class families by looking for linking interfaces containing the renamable method. It's up to you to add these where appropriate.

Note the command syntax: as well as the old and new method names, you need to specify the name of a class (or interface) in the family. Any one will do

Currently, rename-method is only half-implemented. Method definitions are updated but not (yet) method calls. (Project-spanning-families..?)

The looser: "rename every method foo() wherever it's found" will also be a useful refactoring. That will be added later.

scripting

Rephactor can automatically update client apps at the same time as the primary target if they're all on the local machine. With script files, library authors can distribute updates to end users.

Script files are pretty straightforward. One command per line. You can put several refactorings in one script:

------------------------
find-replace old new
rename-file from/this to/this
------------------------

End users can run the script:

$ rephactor projectname from-script path/to/script/file

They will obviously need to have Rephactor installed and have all the relevant working copy records defined (including a record for the service app..??).

Note that you can add a record for a remote codebase. This is useful if you have several local projects which use the same (remote) library. When the local projects are entered as dependents in the remote codebase's record, they can all be updated with one command:

$ rephactor remote-codebase from-script path/to/script/file

Same as normal except the primary target (the remote codebase) is skipped. The script moves straight on to the next step: editing the hotlinked dependencies.

Rephactor saves a description of each refactoring in the subversion log. Later, I'll add something which can automatically generate script files from the log.

--no-tests

The "--no-tests" option turns off testing and hence you won't get any warning if something has been broken by the refactoring.

Of course you'll still have an option to visually inspect diffs of changed files, and to roll back if something doesn't look right.