Monday, 13 January 2014

Several approaches to processing events (part 1 - synchronous processing)

Let's consider the following problem that we have to solve:

We have a file with numbers (each line has one number).
Our goal is to process every number and count its prime factors.
We need to write the result along with the processed number in a separate file.
In case of any exception during processing we need to write the exception to a file together with the number that was processed by the time it occurred.
Apart from that we also need to write the summary of the time that we spent on processing.

The class that counts prime factors looks as following:

public class PrimeFactorCounter {
public long primeFactors(long number) {
long n = number > 0 ? number : -number;
long counter = 0;
for (int i = 2; i <= n / i; i++) {
while (n % i == 0) {
counter++;
n /= i;
}
}
if (n > 1) {
counter++;
}
return counter;
}
}
The simplest but not the optimal way to solve this would be to process each line synchronously. It could look like that:

public class Bootstrap {
private final Logger logger = LoggerFactory.getLogger(this.getClass());
static final String PRIME_FACTOR_FILE = "primeFactorCounter-synchronous.txt";
static final String EXCEPTIONS_FILE = "exceptions-synchronous.txt";
static final String SUMMARY_FILE = "summary-synchronous.txt";
private final PrimeFactorCounter primeFactorCounter = new PrimeFactorCounter();
private final Reader reader = new Reader();
private final Writer writer = new Writer();
public void start() throws IOException {
long startTime = System.currentTimeMillis();
Iterable<String> numbers = reader.getNumbers("numbers.txt");
StringBuilder valuesWithPrimeFactorCounter = new StringBuilder();
StringBuilder exceptions = new StringBuilder();
for (String number : numbers) {
try {
long value = Long.valueOf(number);
if (value > 0) {
long factor = primeFactorCounter.primeFactors(value);
valuesWithPrimeFactorCounter.append(value).append(",").append(factor).append("\n");
}
} catch (Exception e) {
exceptions.append(number).append(", ").append(e).append("\n");
}
}
writeResults(startTime, valuesWithPrimeFactorCounter, exceptions);
}
private void writeResults(long startTime, StringBuilder valuesWithPrimeFactorCounter, StringBuilder exceptions) throws IOException {
writer.write(PRIME_FACTOR_FILE, valuesWithPrimeFactorCounter.toString());
writer.write(EXCEPTIONS_FILE, exceptions.toString());
String summary = "synchronous processing, time taken " + (System.currentTimeMillis() - startTime) + " ms";
logger.debug(summary);
writer.write(SUMMARY_FILE, summary);
}
}
We're reading lines from numbers.txt file and then we're processing each line in a for loop. At the end we're writing everything to three files.

The Reader and Writer classes are quite simple:

public class Reader {
public Iterable<String> getNumbers(String fileName) throws IOException {
String fileContent = FileUtils.readFileToString(new File(fileName));
return Splitter.on("\n").omitEmptyStrings().split(fileContent);
}
}
public class Writer {
public void write(String fileName, String content) throws IOException {
FileUtils.writeStringToFile(new File(fileName), content);
}
}
They're using Guava and Apache Commons dependencies that have following declaration:

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.4</version>
</dependency>
view raw dep.xml hosted with ❤ by GitHub
We can check the results in a following test:

public class SynchronousProcessingSystemTest {
@Before
public void setUp() {
new File(Bootstrap.PRIME_FACTOR_FILE).delete();
new File(Bootstrap.EXCEPTIONS_FILE).delete();
new File(Bootstrap.SUMMARY_FILE).delete();
}
@Test
public void shouldProcessFile() throws Exception {
// when
new Bootstrap().start();
// then
assertResultFile();
assertExceptionsFile();
assertSummaryFile();
}
private void assertResultFile() throws IOException {
List<String> lines = FileUtils.readLines(new File(Bootstrap.PRIME_FACTOR_FILE));
assertThat(lines.size(), is(9900));
assertThat(lines.contains("5556634922133,5"), is(true));
}
private void assertExceptionsFile() throws IOException {
File exceptionsFile = new File(Bootstrap.EXCEPTIONS_FILE);
assertThat(exceptionsFile.exists(), is(true));
String exceptionsFileContent = FileUtils.readFileToString(exceptionsFile);
assertThat(exceptionsFileContent, containsString("badData1"));
assertThat(exceptionsFileContent, containsString("badData2"));
}
private void assertSummaryFile() {
assertThat(new File(Bootstrap.SUMMARY_FILE).exists(), is(true));
}
}
On a machine with Core2 Duo 2.53GHz it takes ~73 seconds to process 10 000 numbers.

The whole project can be found at github.

In next posts we'll take a look at other approaches to solve this problem.