The need to purge a Kafka topic (without deleting the topic itself) comes up pretty frequently, usually in test environments.
It can be done by temporarily changing retention.ms on the topic with bash commands (described here) or with a UI like Confluent Control Center (if you're using Confluent).
In this post, we'll check how to do it programmatically using AdminClient. (The sample application is written in Kotlin with Spring Boot and Spring Kafka.)
Let's start with a look at the dependencies in build.gradle.kts (project skeleton was generated by Spring Initializr):
We need to define a bean with AdminClient:
For the sake of simplicity, in this example we're only setting the Kafka broker address property (kafka.bootstrapAddress), taken from the application.properties file.
Once we have AdminClient configured, we can use it to purge the topic (notice that AdminClient is injected):
We need to describe the topic we want to purge to obtain the number of partitions.
Once we have that, we create a map: for each partition (as the key) we declare that we want to delete all records (RecordsToDelete.beforeOffset(-1)).
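The purge itself can be sketched as follows (shown here in Java rather than the post's Kotlin, and the class name is an assumption; it requires a running broker and the kafka-clients dependency):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.TopicPartitionInfo;

class TopicPurger {

    private final AdminClient adminClient;

    TopicPurger(AdminClient adminClient) {
        this.adminClient = adminClient;
    }

    void purge(String topicName) throws Exception {
        // Describe the topic to learn how many partitions it has.
        TopicDescription description = adminClient
                .describeTopics(Collections.singletonList(topicName))
                .all().get().get(topicName);

        // For every partition, ask to delete all records up to the end offset.
        Map<TopicPartition, RecordsToDelete> toDelete = new HashMap<>();
        for (TopicPartitionInfo partition : description.partitions()) {
            toDelete.put(new TopicPartition(topicName, partition.partition()),
                    RecordsToDelete.beforeOffset(-1)); // -1 = everything
        }
        adminClient.deleteRecords(toDelete).all().get();
    }
}
```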
Let's check how we could write integration test for such functionality:
There's a lot going on here, let's break it into smaller chunks.
We're starting an embedded Kafka broker on port 9092 with the @EmbeddedKafka(ports = [9092]) annotation. (The port corresponds to the property in application.properties.)
At the beginning of the test we add a topic with 2 partitions (line 12).
Then we send one message to each partition (lines 16, 17).
After that, we check that in each partition the earliest offset is 0 (line 21) and the latest offset is 1 (line 22). That's it for the test case setup.
Later we invoke the actual purging and check the offsets in both partitions once again: both the earliest and the latest offset should now be 1, which proves that all the records were removed.
The whole project can be found on GitHub.
Saturday, 12 February 2022
Sunday, 1 April 2018
Handling nulls with Optional
Let's imagine the following situation:
We have some 3rd party code (or legacy code that you can't change at the moment) that fetches objects representing people based on some parameter. It could look as follows:
Our task is to retrieve the city in which a specific person lives for further processing. We don't have any guarantee of data consistency, so we need to handle it on our own.
The task seems simple; how can we achieve it?
One solution could look something like this: As you can see, it's quite ugly. Multiple nested if statements don't look nice, and what's more, returning null is not a good practice.
What can we do to get rid of the ifs and indicate that city might not be present?
We can use the Optional type: the map methods take care of handling null, and we return Optional to indicate that the value might not be present.
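The chain itself is short; a minimal sketch (the Person/Address model and getter names are assumed, as the original listing isn't shown here):

```java
import java.util.Optional;

class Address {
    private final String city;
    Address(String city) { this.city = city; }
    String getCity() { return city; }
}

class Person {
    private final Address address;
    Person(Address address) { this.address = address; }
    Address getAddress() { return address; }
}

class CityResolver {
    // Each map() step is skipped when the previous step yielded null,
    // so no explicit if checks are needed.
    static Optional<String> cityOf(Person person) {
        return Optional.ofNullable(person)
                .map(Person::getAddress)
                .map(Address::getCity);
    }
}
```

Any null along the way (person, address, or city) simply collapses to an empty Optional.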
Instead of returning Optional, we could also consider the Null Object pattern.
The whole project (with the tests checking the solution) can be found on GitHub.
Saturday, 6 August 2016
Thread safe lazy initialization using proxy
Sometimes we need to defer the object creation until it is needed.
The reasons may be various like creation is time/resource consuming (database connection etc.) and we want to speed up application startup.
This is a use case for the lazy initialization pattern.
We have the following service interface and implementation, which returns the current time:
Let's imagine that the initialization is very costly.
Simple lazy initialization could look as follows:
We create the DefaultService only once, when the getTime method is first invoked.
This solution is fine in a single-threaded environment (we will cover the multi-threaded case later).
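A sketch of such an implementation (class names follow the post; the "costly" constructor is simulated with a println):

```java
interface Service {
    long getTime();
}

class DefaultService implements Service {
    DefaultService() {
        // Stands in for a time/resource consuming initialization.
        System.out.println("Expensive initialization...");
    }
    public long getTime() { return System.currentTimeMillis(); }
}

// Not thread safe: two threads may race in getTime() and create two instances.
class LazyService implements Service {
    private Service delegate;

    public long getTime() {
        if (delegate == null) {          // initialize on first use only
            delegate = new DefaultService();
        }
        return delegate.getTime();
    }
}
```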
The problem arises when the service has not just a few methods but a lot of them (which may happen when you're using a legacy or poorly designed library).
Such an implementation would look awful in that situation - you would have to implement every method and try to initialize the object in all of them.
In that case we can create a proxy object that performs lazy initialization on each method invocation.
To create such a proxy, we need to implement the java.lang.reflect.InvocationHandler interface.
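A plain-JDK version can be sketched as follows (the generic factory shape is my own; the post wraps a concrete service):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

interface Service {
    long getTime();
}

class LazyProxy {
    // Returns a proxy that defers creating the real object until
    // any method of the interface is first invoked.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type, Supplier<T> factory) {
        InvocationHandler handler = new InvocationHandler() {
            private T delegate; // created lazily, on first call

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (delegate == null) {
                    delegate = factory.get();
                }
                // Every interface method goes through here, so we only
                // write the lazy check once, regardless of method count.
                return method.invoke(delegate, args);
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[]{type}, handler);
    }
}
```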
There are some pitfalls that we need to overcome (more info here), which is why it's easier to use Guava's AbstractInvocationHandler:
The only thing left is thread safety. We can implement it on our own, but the safest way is to use ConcurrentInitializer from commons-lang:
I chose LazyInitializer, which uses the double-checked locking idiom, but there are other implementations that might be more useful depending on the context.
Let's run all the implementations and see how they work (notice when the info about initialization is printed to the console):
The whole project can be found on GitHub.
Monday, 6 July 2015
Dynamic thread pool
Sometimes we face a problem that we need to solve using a dynamic thread pool.
By 'dynamic' I mean that the pool shrinks and grows automatically but has an upper bound (so it can't grow indefinitely).
The API of the pool would be quite simple:
We should be able to submit a task to be executed asynchronously and to monitor the number of threads in the pool.
Having control over the maximum number of threads and their longevity would be another feature.
Let's try to document functional requirements in the form of unit tests.
That would be our test setup:
Initial number of threads should be zero:
The execution of the tasks should be done concurrently:
Number of threads should grow automatically:
The size of the thread pool should shrink when there are no active tasks and the keep-alive time has been exceeded:
If the number of submitted tasks is greater than the maximum number of threads, we should block:
The actual implementation is quite simple; we need to use ThreadPoolExecutor:
There are two important things here.
First, we are required to create our own RejectedExecutionHandler to support blocking when the maximum number of threads is exceeded.
Secondly, we need to use a blocking queue of size zero (I chose SynchronousQueue): the queue needs to fill up before the pool can create more than the initial number of threads (in our case zero).
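Putting the two observations together, the pool might be sketched like this (class and method names are assumptions):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DynamicThreadPool {

    private final ThreadPoolExecutor executor;

    DynamicThreadPool(int maxThreads, long keepAliveSeconds) {
        // Core size 0 + SynchronousQueue: a new thread is created for each
        // task (up to maxThreads), and idle threads die after the keep-alive.
        executor = new ThreadPoolExecutor(
                0, maxThreads,
                keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                // When all maxThreads are busy, block the submitter instead
                // of rejecting: re-offer the task until a worker frees up.
                (task, pool) -> {
                    try {
                        pool.getQueue().put(task);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
    }

    void submit(Runnable task) { executor.execute(task); }

    int threadCount() { return executor.getPoolSize(); }
}
```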
The whole project can be found on GitHub.
Sunday, 16 March 2014
Several approaches to processing events (part 2 - asynchronous processing with blocking queues and thread pool)
In the previous post we were solving problem with computing prime factors for numbers read from a large file.
We were taking a synchronous approach, which was quite easy to implement but very slow.
This time we'll try to go the asynchronous way using blocking queues and thread pool.
First of all, let's take a look at our Reader class:
The reader reads each line and puts it into the blocking queue. Once it's finished, it informs the latch.
The LineProcessor class is more complicated:
Let's take a closer look at the collaborators.
inputQueue is the queue that Reader is writing to.
successQueue and exceptionsQueue are the queues that will be populated based on the line processing result.
inputLatch is the latch modified by the Reader.
outputLatch will be informed when there are no more lines to be processed.
The LineProcessor checks whether the Reader has already finished by checking inputLatch and inputQueue.
If it hasn't, it takes a line from the inputQueue and populates the appropriate queue with the result.
If it has finished, it informs the outputLatch and terminates processing.
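The loop described above might be sketched like this (collaborator names follow the post; the processing function and queue element types are assumptions):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class LineProcessor implements Runnable {

    private final BlockingQueue<String> inputQueue;
    private final BlockingQueue<String> successQueue;
    private final BlockingQueue<String> exceptionsQueue;
    private final CountDownLatch inputLatch;   // counted down by the Reader when done
    private final CountDownLatch outputLatch;  // counted down here when done
    private final Function<String, String> processing;

    LineProcessor(BlockingQueue<String> inputQueue,
                  BlockingQueue<String> successQueue,
                  BlockingQueue<String> exceptionsQueue,
                  CountDownLatch inputLatch,
                  CountDownLatch outputLatch,
                  Function<String, String> processing) {
        this.inputQueue = inputQueue;
        this.successQueue = successQueue;
        this.exceptionsQueue = exceptionsQueue;
        this.inputLatch = inputLatch;
        this.outputLatch = outputLatch;
        this.processing = processing;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Reader finished and nothing left to take: we're done.
                if (inputLatch.getCount() == 0 && inputQueue.isEmpty()) {
                    outputLatch.countDown();
                    return;
                }
                String line = inputQueue.poll(100, TimeUnit.MILLISECONDS);
                if (line == null) {
                    continue; // re-check the termination condition
                }
                try {
                    successQueue.put(processing.apply(line));
                } catch (RuntimeException e) {
                    exceptionsQueue.put(line + ": " + e.getMessage());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```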
The Writer class is quite simple:
It takes messages from the queue and writes them to a file.
It terminates if the queue is empty and the latch has signalled that there are no more messages.
The only thing left is the Bootstrap class that binds it all together:
The latches and queues are initialized and then passed to the processing objects via constructor.
Line processors are wrapped in a thread pool.
Reader and two writers are started as threads. We then wait until all writers are finished and the summary is printed.
We can check the behavior with the following test:
On a machine with Core2 Duo 2.53GHz it takes ~43 seconds to process 10 000 numbers.
The whole project can be found on GitHub.
Wednesday, 26 February 2014
Scheduling tasks with Spring (updated)
Spring provides an easy way to schedule tasks.
Let's say that we would like to have information about the current time printed to the console periodically.
The class that prints the time may look like this:
We need to define a class that will encapsulate the printing task:
The @Scheduled annotation is the key here: the method reportCurrentTime is annotated by it, therefore it will be invoked every 5 seconds.
You can also specify a cron expression, and you can use the fixedRateString parameter if you want to read the rate from a properties file.
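The post uses Spring's scheduler; purely for intuition, @Scheduled(fixedRate = 5000) behaves like a fixed-rate task on the JDK's ScheduledExecutorService (names here are assumptions, not the post's code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TimePrinter {
    // Same effect as @Scheduled(fixedRate = 5000) on reportCurrentTime():
    // run the task immediately and then at the given fixed rate.
    static ScheduledExecutorService start(Runnable reportCurrentTime, long rateMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(reportCurrentTime, 0, rateMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```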
Please note setting of the thread name - it will be needed for the test.
Adding production code only for tests is generally not a good practice, but in this case it can also be used for monitoring purposes.
The Spring configuration looks as follows:
To run it we need to create an invoker class:
Unfortunately there is no trivial way to test it automatically. We can do it the following way.
Let's create a test class:
When we run the test, the Spring context will start and the task will be invoked by the scheduler.
We check if the thread with our name exists.
We keep checking for some time to avoid a race condition: the verification method may run before the thread starts.
The whole project, along with its dependencies, can be found on GitHub.
Monday, 13 January 2014
Several approaches to processing events (part 1 - synchronous processing)
Let's consider the following problem that we have to solve:
We have a file with numbers (each line has one number).
Our goal is to process every number and count its prime factors.
We need to write the result along with the processed number in a separate file.
In case of any exception during processing we need to write the exception to a file together with the number that was processed by the time it occurred.
Apart from that we also need to write the summary of the time that we spent on processing.
The class that counts prime factors looks as follows:
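The listing isn't reproduced here; a minimal trial-division sketch of such a class (names assumed) might be:

```java
class PrimeFactorCounter {
    // Counts prime factors with multiplicity, by trial division.
    // E.g. 12 = 2 * 2 * 3 has three prime factors.
    static int primeFactors(long number) {
        int count = 0;
        long n = number;
        for (long factor = 2; factor * factor <= n; factor++) {
            while (n % factor == 0) {
                count++;
                n /= factor;
            }
        }
        if (n > 1) {
            count++; // what remains is itself prime
        }
        return count;
    }
}
```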
The simplest, but not optimal, way to solve this would be to process each line synchronously. It could look like this:
We're reading lines from numbers.txt file and then we're processing each line in a for loop. At the end we're writing everything to three files.
The Reader and Writer classes are quite simple:
They use Guava and Apache Commons dependencies, declared as follows:
We can check the results with the following test:
On a machine with Core2 Duo 2.53GHz it takes ~73 seconds to process 10 000 numbers.
The whole project can be found on GitHub.
In the next posts we'll take a look at other approaches to solving this problem.
Sunday, 27 October 2013
Running Hadoop locally without installation
If we want to take Hadoop for a test-drive without installing the whole distribution, we can do it quite easily.
First of all, let's create a maven project with the following dependencies:
There is a known issue with running newer versions on Windows, so an older one is chosen.
Cygwin is also required to be installed when running on Windows.
We will create a job to count the words in files (it's a well-known example taken from the official tutorial).
Our mapper would look like:
The mapper splits the lines from the file into words and passes each word on as the key with a value of one.
Here comes the reducer:
The reducer receives all values for a given key and counts them.
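Stripped of the Hadoop types, the map and reduce steps boil down to grouping words and summing ones; a plain-Java sketch of the same logic (not the Hadoop code itself):

```java
import java.util.HashMap;
import java.util.Map;

class WordCount {
    // The mapper emits (word, 1) for every word; the shuffle groups the
    // ones by word; the reducer sums them. In plain Java this collapses to:
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // reduce step: sum the 1s
            }
        }
        return counts;
    }
}
```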
All that is left is a main class that will run the job:
We are setting the job's mapper, reducer, and classes for the key and value.
Input and output paths are set as well.
You can run it directly and check the output file with the result.
The whole project can be found on GitHub.
Saturday, 21 September 2013
Transaction management with Spring Data JPA
Spring provides an easy way to manage transactions.
Let's see how to make our methods transactional without using any XML configuration.
We will start with Spring Java config for our application:
We have an entity representing a bank account:
We also have a repository from Spring Data JPA for Account objects:
The TransferService allows transferring money from one account to another:
Just for the sake of the example, we add money to one account and, before subtracting from the second one, check whether it has enough funds.
If the method wasn't transactional, we would introduce a major bug.
However, @Transactional annotation makes it eligible for rollback if the exception is thrown.
It's also important to note that many methods from the repository are transactional with default propagation, so the transaction from our service will be reused.
Let's make sure that it works by writing an integration test:
If we remove @Transactional annotation the test will fail.
Be aware that managing transactions with Spring has some traps, which are described in this article.
The whole project can be found on GitHub.
Thursday, 29 August 2013
Changing application behavior at runtime with JMX
Sometimes we need to be able to change the behavior of our application without a restart.
JMX, apart from its monitoring capabilities, is a perfect solution for this.
Spring provides great JMX support that will ease our task.
Let's start with a simple service whose behavior we will change at runtime.
DiscountService calculates a discount based on globalDiscount; its value is hardcoded for simplicity, while in a more realistic example it would probably be read from a configuration file or database.
First of all, in order to expose methods that manage globalDiscount, we need to add the @ManagedResource annotation to our class and annotate the methods with @ManagedOperation.
We could also use @ManagedAttribute if we wanted to treat these methods as a simple getter and setter for globalDiscount.
Class with needed methods and annotations would look like:
In the Spring configuration we just need to define the bean for DiscountService and enable exporting MBeans with the MBean server.
We can run the application with:
Now we're ready to manage our service with jconsole:
Our service is exposed locally, but if we want to be able to connect to it remotely, we need to add the following beans to the Spring configuration:
Service is now exposed via RMI.
We can invoke the exposed methods programmatically, which allows us to write some scripts and manage services without using jconsole.
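Underneath, such scripts talk to an MBeanServer; a minimal in-process sketch using plain JDK JMX (the DiscountService shape and names here are assumptions - the post's class uses Spring annotations):

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class JmxDemo {

    // The management interface: what jconsole (or a script) can see and call.
    public interface DiscountServiceMBean {
        int getGlobalDiscount();
        void setGlobalDiscount(int globalDiscount);
    }

    public static class DiscountService implements DiscountServiceMBean {
        private volatile int globalDiscount = 5; // hardcoded for simplicity

        public int getGlobalDiscount() { return globalDiscount; }
        public void setGlobalDiscount(int value) { this.globalDiscount = value; }
    }

    static DiscountService register(String name) throws Exception {
        DiscountService service = new DiscountService();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // StandardMBean spares us the <Class>MBean naming convention checks.
        server.registerMBean(new StandardMBean(service, DiscountServiceMBean.class),
                new ObjectName(name));
        return service;
    }

    // Programmatic equivalent of editing the attribute in jconsole.
    static void setDiscountViaJmx(String name, int value) throws Exception {
        ManagementFactory.getPlatformMBeanServer().setAttribute(
                new ObjectName(name), new Attribute("GlobalDiscount", value));
    }
}
```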
Let's write an integration test to check that it works correctly.
We will need a Spring config for the test with the RMI client.
The test will increment the value of globalDiscount.
Take a closer look at the invocation of the exposed methods; it is very cumbersome, especially if the method has parameters.
The whole project can be found on GitHub.
Sunday, 21 July 2013
Caching with Spring Cache
From time to time we need to use a cache in our application.
Instead of writing it on our own and reinventing the wheel, we can use Spring Cache.
Its biggest advantage is unobtrusiveness: besides a few annotations, we keep our code intact.
Let's see how to do it.
First, let's have a look at a fake service that is the reason for using a cache (in real life it might be a database call, a web service, etc.).
We will cache the invocations of isVipClient(...).
Let's start with adding dependencies to our project:
We'd like to use XML-free Spring config, so we will define our beans in Java:
We have our SlowService defined as a bean. We also have the cacheManager along with a cache name and its implementation; in our case it's based on ConcurrentHashMap. We can't forget about enabling caching via the annotation.
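The idea behind the ConcurrentHashMap-backed cache can be sketched without Spring (SlowService's method shape is assumed):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SlowService {
    final AtomicInteger invocations = new AtomicInteger();

    boolean isVipClient(String clientId) {
        invocations.incrementAndGet(); // stands in for a slow DB / web-service call
        return clientId.startsWith("vip");
    }
}

class CachingSlowService {
    private final SlowService delegate;
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    CachingSlowService(SlowService delegate) { this.delegate = delegate; }

    // This is roughly what @Cacheable does behind the scenes:
    // compute once per key, then serve repeated calls from the map.
    boolean isVipClient(String clientId) {
        return cache.computeIfAbsent(clientId, delegate::isVipClient);
    }
}
```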
The only thing to do is to add annotation @Cacheable with a cache name to our method:
How to test our cache?
One way to do it is to check whether invoking our method twice with the same parameter actually executes it only once.
We will use springockito to inject a spy into our class and verify its behaviour.
We need to start with a spring test context:
Basically, the context will read the configuration from the Java class and replace the slowService bean with a spy. It would be nicer to do it with an annotation, but at the time of this writing @WrapWithSpy doesn't work (https://bitbucket.org/kubek2k/springockito/issue/33).
Here's our test:
This was a very basic example; Spring Cache offers more advanced features like cache eviction and updates.
The whole project can be found on GitHub.
Saturday, 29 June 2013
Handling command line arguments with args4j
From time to time, we need to write a tool that is using command line arguments as input.
Having an interface similar to unix command line tools is not a trivial task but with args4j it becomes quite easy.
We're going to write a fake tool called FileCompressor, whose invocation would look like:
where -i is the input file name, -o the output file name, and -p the priority of the process.
Let's start with defining the dependencies of our project:
We'll need args4j, the rest is for testing.
Our fake FileCompressor requires a configuration that will be populated by args4j.
The configuration object has fields that will be mapped to command line arguments with args4j annotations.
And here is the class that will be invoked:
First of all, the arguments are parsed and the configuration object is populated with them.
If there is a parsing problem, the exception is printed along with the usage (args4j can automatically print our application's usage based on the annotations used).
Testing our application requires some effort.
To begin with, we would like to completely mock the FileCompressor. There is no setter for it, so we'll use Spring's ReflectionTestUtils together with Mockito.
We've also prepared some test data.
Let's check whether the configuration object is populated properly and whether the collaborator was invoked:
We would also like to know if the usage was printed when invalid arguments were passed.
To do that we need to use a spy, as we want the original behavior of populating arguments, but we want to verify that the printUsage method was invoked.
The last thing to check is if the exception was printed to the output stream when invalid arguments were passed.
Similarly we need to use a spy.
The whole project can be found on GitHub.
Having an interface similar to unix command line tools is not a trivial task but with args4j it becomes quite easy.
We're going to write a fake tool called FileCompressor which process of invocation would look like:
where -i is input file name, -o output file name and -p the priority of the process.
Let's start with defining the dependencies of our project:
We'll need args4j, the rest is for testing.
Our fake FileCompressor requires a configuration that will be populated by args4j.
The configuration object has fields that will be mapped to command line arguments with args4j annotations.
And here is the class that will be invoked:
First of all, the arguments are parsed and configuration object is populated with them.
If there is a parsing problem, the exception is printed along with the usage (args4j can automatically print the usage of our application based on used annotations).
Testing our application requires some effort.
To begin with, we would like to completely mock the FileCompressor. There is no setter for it, so we'll use Spring's ReflectionTestUtils together with Mockito.
We've also prepared some test data.
Let's check if the configuration object is populated properly and if the collaborator was invoked:
We would also like to know if the usage was printed when invalid arguments were passed.
To do that we need to use a spy, as we want to keep the original behaviour of populating the arguments, but we also want to verify that the printUsage method was invoked.
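A sketch of such a spy-based test (the App class and its method signatures are assumptions; only the spy technique itself is the point):

```java
import static org.mockito.Mockito.any;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

import org.junit.Test;
import org.kohsuke.args4j.CmdLineParser;

public class AppTest {

    @Test
    public void shouldPrintUsageWhenArgumentsAreInvalid() {
        // a spy keeps the real argument-populating behaviour...
        App app = spy(new App());

        app.run(new String[]{"--no-such-option"});

        // ...while still letting us verify the interaction
        verify(app).printUsage(any(CmdLineParser.class));
    }
}
```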
The last thing to check is if the exception was printed to the output stream when invalid arguments were passed.
Similarly we need to use a spy.
The whole project can be found at github.
Sunday, 19 May 2013
Testing JavaScript with Jasmine
When developing a web application, sooner or later we need to deal with JavaScript. To keep high code quality we must write unit tests.
Jasmine is a nice testing framework that allows us to write tests in a BDD manner.
Let's see how to use it in our project.
First of all, we need to incorporate it into our build. There is a Maven plugin that will let us execute JavaScript tests within the test build phase. We will add it to our pom.xml:
Apart from executing tests within build, we can run them during development phase.
The Maven goal jasmine:bdd starts a Jetty server, and at http://localhost:8234 we can see the results of executing the test fixtures.
Reloading the page will execute the latest version of our code.
Let's see it in action. We have a simple JavaScript function residing in src/main/javascript/simple.js that increments the number passed as an argument.
The test (in Jasmine called spec) is located in src/test/javascript/simple_spec.js:
A test begins with a call to the global Jasmine function describe.
Then the specific test cases are defined by calling function it.
We're testing the sunny day scenario and border case that throws the exception.
At the beginning as a sanity check we can test if the function is defined.
Jasmine has a lot of advanced features, we can even use mocks, expectations and argument matchers.
Nothing stops us from writing tests first and going through red-green-refactor cycle.
Let's write a function that will count the number of specific elements on our page.
The behaviour of our component is defined by the following spec:
Before testing the case with one element, we're adding it to the document. In a more complex case we could load the HTML content.
To implement it, we'll use jQuery (we need to add it to src/main/javascript):
After executing mvn test we'll see a nice output:
The whole project can be found at github.
Thursday, 25 April 2013
Testing Spring Integration
Adding Spring Integration to our project (apart from many advantages) brings drawbacks as well. One of them is more difficult testing.
Let's try to test the flow from MessageRouter to Persister via JSONHandler (please refer to the previous post).
First of all, it won't be a unit test but an integration test, as we need to start up the Spring context and test the flow between components.
Let's prepare spring xml config:
We're importing the application config and overriding beans with mockito mocks.
The test would look like:
Spring injects all the required components. We're mocking the behaviour, sending a message to the channel and then verifying. The mocks need to be reset as well if we want to have more test methods.
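A sketch of what such a test could look like (the channel and method names - jsonChannel, handle, persist - and the config file name are assumptions; the mocks are the ones defined in the test XML config):

```java
import static org.mockito.Mockito.reset;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.After;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.integration.MessageChannel;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:test-context.xml")
public class MessageFlowIntegrationTest {

    @Autowired private MessageChannel jsonChannel; // entry point of the tested flow
    @Autowired private JSONHandler jsonHandler;    // Mockito mock defined in test-context.xml
    @Autowired private Persister persister;        // Mockito mock defined in test-context.xml

    @Test
    public void shouldPassHandledMessageToPersister() {
        when(jsonHandler.handle("raw message")).thenReturn("handled message");

        jsonChannel.send(MessageBuilder.withPayload("raw message").build());

        verify(persister).persist("handled message");
    }

    @After
    public void resetMocks() {
        // required if the context is reused by further test methods
        reset(jsonHandler, persister);
    }
}
```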
By looking at the source code we don't see which object is mocked. To solve this problem (and simplify the code as well) we can use springockito.
Then in our config file we only need to import the application config without overriding any beans.
The overriding part is done in the test by using annotations:
To use springockito we need to change the context loader to SpringockitoContextLoader.
The @ReplaceWithMock annotation is self-descriptive.
The context is dirtied after execution of each test method which is basically equivalent to resetting the mocks.
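The skeleton of the springockito version could look roughly like this (the context file name is an assumption):

```java
import org.junit.runner.RunWith;
import org.kubek2k.springockito.annotations.ReplaceWithMock;
import org.kubek2k.springockito.annotations.SpringockitoContextLoader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = SpringockitoContextLoader.class,
        locations = "classpath:applicationContext.xml")
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
public class MessageFlowSpringockitoTest {

    @ReplaceWithMock // swapped for a Mockito mock directly in the context
    @Autowired
    private Persister persister;

    // test methods as before, without manual mock resets
}
```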
The whole project can be found at github.
Wednesday, 20 March 2013
Spring Integration vs plain Java code
Adding frameworks to our system may solve some problems and cause too much complexity at the same time.
Let's imagine a simple flow:
We have a MessageRouter object that routes a received message to either XMLHandler or JSONHandler. The processed message from both handlers is then passed to Persister, which stores the message.
Modelling it is not that difficult. Here's our MessageRouter class:
Handlers:
And finally the persister:
We can see it in action by running the following class:
Everything seems fine here, but we have a problem - tight coupling.
Message router knows about handlers and handlers know about persister.
We would gain loose coupling and high cohesion if they weren't aware of each other, which would automatically make them concentrate on one task only.
Spring Integration can help us achieve this.
It is a separate module of Spring, enabling lightweight messaging within application and supporting Enterprise Integration Patterns.
All the arrows from the above picture will be represented as channels.
The XML Spring context configuration for the channels looks pretty straightforward:
Actually it's not even required as Spring can create them by default.
Apart from the channels, our MessageRouter's only role is to return the name of the channel to which the message will be passed.
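The routing logic can be as small as this sketch (the channel names and the JSON/XML detection are illustrative assumptions, not the post's actual code):

```java
// With Spring Integration, this method would be referenced from a
// <router> element; its return value is the name of the target channel.
public class MessageRouter {

    public String route(String payload) {
        String trimmed = payload.trim();
        // naive format detection: JSON starts with '{', everything else goes to XML
        return trimmed.startsWith("{") ? "jsonChannel" : "xmlChannel";
    }
}
```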
Also the handlers and the persister need to become Service Activators (their methods will be invoked when a message goes through the channel).
The config for those:
To see it in action we can run the following class:
Notice that the code is concise and simple.
The components do not depend on each other.
What is more, if we wanted to modify the flow and move the Persister component in front of the MessageRouter, we would need to change the XML config to:
Changing the flow in the first version that is using plain Java code would require much more modifications.
Nevertheless, we increased the complexity of our application. Now we depend on a framework and we need to maintain additional configuration in an XML file.
Another big disadvantage is testing. We could easily unit test the previous code using mocks. Now we need to test the Java code and Spring Integration configuration as well, which is not that simple.
I'll show how to do it in the next post.
Friday, 15 February 2013
MarkLogic Java client
MarkLogic is a NoSQL document database that handles XML efficiently.
Let's take a look at how to set up a MarkLogic database instance on a local machine and write a simple application that performs CRUD and search operations on XML documents.
First of all, we will need to download MarkLogic server (account is required).
Installation and starting procedures are described here - they're pretty straightforward.
Once the server is started, we need to create a new database with a REST API instance and a user with write access - follow this link.
REST is used by the Java client as a communication protocol but we can also use it manually in our browser.
Once the database is created, we can start writing the client code.
Let's start with setting up required dependencies:
To use the MarkLogic Java client we need to specify the MarkLogic Maven repository. We'll also use the xml-matchers library to compare the created XML documents.
Here's an example of an XML document representing a person:
Let's define an interface that will allow some simple CRUD and searching operations:
The sample implementation could look like:
In order to perform CRUD operations we need a DocumentManager object (in our case XMLDocumentManager, as we're handling XML). It is a thread-safe object (it can be shared across multiple threads) and its usage is quite intuitive. Each operation needs a specific handle object - as our interface declares String, we'll use StringHandle, which is populated with the result by the manager.
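A fragment illustrating the CRUD calls (the surrounding class, the client parameter and the document URI are assumptions for the sake of the sketch):

```java
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.io.StringHandle;

public class CrudSketch {

    // client is created with DatabaseClientFactory beforehand
    void crudOperations(DatabaseClient client) {
        XMLDocumentManager docManager = client.newXMLDocumentManager();
        String uri = "/persons/1.xml";

        // create / update
        docManager.write(uri, new StringHandle("<person><name>John</name></person>"));

        // read - the handle is populated with the document content by the manager
        String content = docManager.read(uri, new StringHandle()).get();

        // delete
        docManager.delete(uri);
    }
}
```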
To perform query operations, a QueryManager is required. There are many types of queries; we'll use searching by element value.
It's a little more complicated than the simple CRUD operations - a SearchHandle object is initially populated by running the query on the query manager.
Then we iterate over each of the SearchHandle's results, represented by MatchDocumentSummary objects, and retrieve the URI, which is given to the DocumentManager that reads the full document.
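The search part could be sketched like this (the element name, value and surrounding method are examples, not the post's actual code):

```java
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.MatchDocumentSummary;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.query.StructuredQueryDefinition;

public class SearchSketch {

    void findByElementValue(DatabaseClient client) {
        QueryManager queryManager = client.newQueryManager();
        XMLDocumentManager docManager = client.newXMLDocumentManager();

        // match documents whose <name> element has the value 'John'
        StructuredQueryBuilder builder = queryManager.newStructuredQueryBuilder();
        StructuredQueryDefinition query = builder.value(builder.element("name"), "John");

        // the handle is populated with match summaries, not full documents
        SearchHandle results = queryManager.search(query, new SearchHandle());
        for (MatchDocumentSummary summary : results.getMatchResults()) {
            // fetch the full document by the URI taken from the summary
            String document = docManager.read(summary.getUri(), new StringHandle()).get();
        }
    }
}
```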
Please note that the number of returned documents has been limited to 10.
The integration test (it requires running MarkLogic server):
In the setUp() method, DatabaseClientFactory creates a DatabaseClient based on the given credentials (they need to be the same as those used when setting up the database).
Once we have the client we can create managers needed by the implementation.
One thing to note: when the manager cannot find a document, it throws ResourceNotFoundException.
The whole project can be found at github.
Wednesday, 6 February 2013
Spring Data JPA sample project
In the previous post I showed how to set up a sample project with JPA and Hibernate.
Even though it wasn't difficult, there was one main disadvantage - we need to do a lot of coding around our DAO objects even if we want only simple operations.
Spring Data JPA helps us reduce data access coding.
Let's start with defining dependencies:
Compared to the previous project there are more dependencies because of Spring. spring-test is needed to allow our test to use the Spring context. And this time we're going to use HSQLDB.
persistence.xml is much smaller because the persistence configuration will be defined in the Spring context.
Please note that Hibernate is still the persistence provider. The Person class is exactly the same as before.
The context is defined as:
We need to specify dataSource, entityManagerFactory and transactionManager.
The pl.mjedynak package will be scanned by Spring to do the autowiring.
Spring Data JPA introduces the concept of a repository, which is a higher level of abstraction than a DAO.
In our context we define a package with the repositories.
The only thing we need to do to manage our Person class is to create an interface that extends CrudRepository - it gives us basic operations like save, find, delete, etc.
We can of course expand it with more sophisticated methods if we want.
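A sketch of such an interface (the Long id type and the derived-query method are assumptions):

```java
import java.util.List;

import org.springframework.data.repository.CrudRepository;

// save, findAll, delete etc. are inherited from CrudRepository -
// Spring Data generates the implementation at runtime
public interface PersonRepository extends CrudRepository<Person, Long> {

    // an example of expanding it: the query is derived from the method name
    List<Person> findByName(String name);
}
```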
The integration test is almost the same as before, except that it needs to use the Spring context.
The whole project can be found at: https://github.com/mjedynak/spring-data-jpa-example
Friday, 25 January 2013
Hibernate JPA sample project
Here's a simple example of a project that uses JPA with Hibernate as default implementation.
We're going to use the embedded Derby database.
Let's start with the required dependencies:
We will also need the persistence.xml placed in META-INF directory on our classpath :
We're setting Hibernate as our persistence provider and Derby as JDBC driver.
Entity class Person is also specified as belonging to this persistent unit.
It's defined as:
In order to manage Person we need some kind of DAO. Let's define an interface PersonDao:
For simplicity it has only methods for adding person and finding all persons.
The sample implementation could look like:
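An implementation in that spirit could look like the sketch below (the persistence unit name and the DAO method names are assumptions):

```java
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class JpaPersonDao implements PersonDao {

    // "personPU" must match the persistence-unit name from persistence.xml
    private final EntityManagerFactory factory =
            Persistence.createEntityManagerFactory("personPU");

    @Override
    public void add(Person person) {
        EntityManager entityManager = factory.createEntityManager();
        try {
            entityManager.getTransaction().begin();
            entityManager.persist(person);
            entityManager.getTransaction().commit();
        } finally {
            entityManager.close();
        }
    }

    @Override
    public List<Person> findAll() {
        EntityManager entityManager = factory.createEntityManager();
        try {
            return entityManager.createQuery("select p from Person p", Person.class)
                    .getResultList();
        } finally {
            entityManager.close();
        }
    }
}
```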
And the most important - integration test that checks if everything is glued together correctly:
The whole project can be found at: https://github.com/mjedynak/hibernate-jpa-example
To sum up:
it's quite easy to create a JPA-with-Hibernate project; however, every entity class needs its own DAO (unless we use a generic one).
In the next post I'll take a look at the Spring Data JPA project that solves that impediment.
Sunday, 18 November 2012
Mutation testing with PIT
How can we know that our unit tests are good?
Code coverage can't tell us that we have proper unit tests - it's just a side effect of using TDD.
It only says what code has been executed by the unit tests - we can write a test which just executes a method, and without any assertions we'll have 100% code coverage.
Let's take a look at the following class:
It's a simple class that tells us if a given number is higher than or equal to zero.
Let's take a look at the unit test:
When we execute Cobertura on this test, we will see 100% coverage, both line and branch, even though the test forgets about the case where the number is 0.
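A minimal reconstruction of the idea (the original class and test aren't shown in the text, so the names here are illustrative):

```java
public class NumberChecker {

    // reports whether the given number is higher than or equal to zero
    public boolean isNonNegative(int number) {
        return number >= 0;
    }

    // A test of this shape reaches 100% line and branch coverage, yet never
    // checks the boundary value 0 - PIT's 'changed conditional boundary'
    // mutant (>= mutated to >) survives it.
    static void incompleteTest() {
        NumberChecker checker = new NumberChecker();
        assert checker.isNonNegative(1);
        assert !checker.isNonNegative(-1);
        // missing: assert checker.isNonNegative(0);
    }
}
```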
Mutation testing is an idea that tries to measure the quality of unit tests. Tests are run against a modified version of the source code (modified by applying various mutators - hence the name). If the application code is changed, it should produce a different result, so the unit test should fail. If it doesn't fail, it may indicate a problem in the test suite.
PIT (http://pitest.org/) is a bytecode mutation testing system that automatically applies mutators to the application code.
If we run it against the above-mentioned class, it will produce a nice HTML report saying that a 'changed conditional boundary' mutant has survived - which means that we didn't test all the boundary conditions.
We can integrate the execution of PIT into our build (there are plugins for Maven, Ant and Gradle) or we can execute it within our IDE (there are plugins for Eclipse and IntelliJ - the latter I've created recently).
The project is actively developed, so we can expect more features to come.
Wednesday, 3 October 2012
JUnit - hamcrest incompatibility
When we're using the latest versions of JUnit (4.10) and hamcrest (1.3) together, we might come across an unexpected behaviour.
Let's take a look at a simple test:
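A test of this shape triggers the problem (the list content is just an example):

```java
import static java.util.Arrays.asList;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.hasSize;

import org.junit.Test;

public class HamcrestCompatibilityTest {

    @Test
    public void shouldReportSizeMismatch() {
        // the assertion deliberately fails, which forces hamcrest to call
        // Matcher.describeMismatch - the method missing from hamcrest 1.1
        assertThat(asList("one"), hasSize(2));
    }
}
```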
We might think that this test will fail because of the assertion, but the result is:
java.lang.NoSuchMethodError: org.hamcrest.Matcher.describeMismatch(Ljava/lang/Object;Lorg/hamcrest/Description;)V
That's because the default JUnit has a dependency on hamcrest 1.1, and the org.junit:junit artifact also contains some hamcrest classes inside its jar. The hasSize matcher is not present in 1.1.
To avoid this problem we need to declare following dependencies in our project:
We need to use the JUnit distribution that doesn't contain any hamcrest internals, and we need to exclude the transitive hamcrest dependency as well. What's interesting, we will not bump into this behaviour unless our assertion actually fails.