How to log Hibernate SQL statements including binding parameter values with Logback / SLF4J

When you work with Hibernate as a Java ORM framework, SQL becomes something of a black box. That is fine in most cases and is exactly the purpose of such frameworks, but the time comes when you want to know the truth: the raw SQL statements.
I realized that it is surprisingly hard to get at the SQL statements, especially if you want to see both the statement and its binding parameters. For those who don’t know, the binding parameters are the actual values for the question marks in the queries.

The output you will get from this short tutorial is something like this:

2013-08-30 18:01:15,083 | update stepprovider set created_at=?, lastupdated_at=?, version=?, bundlelocation=?, category_id=?, customer_id=?, description=?, icon_file_id=?, name=?, shareStatus=?, spversion=?, status=?, title=?, type=?, num_used=? where id=? 
2013-08-30 18:01:15,084 | binding parameter [1] as [TIMESTAMP] - 2012-07-11 09:57:32.0 
2013-08-30 18:01:15,085 | binding parameter [2] as [TIMESTAMP] - Fri Aug 30 18:01:15 CEST 2013 
2013-08-30 18:01:15,086 | binding parameter [3] as [INTEGER] -  
2013-08-30 18:01:15,086 | binding parameter [4] as [VARCHAR] - com.mypackage.foo 
2013-08-30 18:01:15,087 | binding parameter [5] as [VARCHAR] -  
2013-08-30 18:01:15,087 | binding parameter [6] as [VARCHAR] -  
2013-08-30 18:01:15,087 | binding parameter [7] as [VARCHAR] - TODO 
2013-08-30 18:01:15,087 | binding parameter [8] as [VARCHAR] -  
2013-08-30 18:01:15,088 | binding parameter [9] as [VARCHAR] - MatchingStep@com.mypackage.foo
2013-08-30 18:01:15,088 | binding parameter [10] as [VARCHAR] - PRIVATE 
2013-08-30 18:01:15,088 | binding parameter [11] as [VARCHAR] - 1.0 
2013-08-30 18:01:15,088 | binding parameter [12] as [VARCHAR] - 32 
2013-08-30 18:01:15,088 | binding parameter [13] as [VARCHAR] - MatchingStep 
2013-08-30 18:01:15,089 | binding parameter [14] as [VARCHAR] -  
2013-08-30 18:01:15,089 | binding parameter [15] as [INTEGER] - 0 
2013-08-30 18:01:15,089 | binding parameter [16] as [VARCHAR] - 053c2e65-5d51-4c09-85f3-2281a1024f64

If you are using Logback as your SLF4J implementation the following logback-test.xml or logback-main.xml snippet is needed to get a nice output like this:

<appender name="SQLROLLINGFILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <File>/tmp/sql.log</File>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <FileNamePattern>logFile.%d{yyyy-MM-dd}.log</FileNamePattern>
  </rollingPolicy>
  <encoder>
    <Pattern>%-4date | %msg %n</Pattern>
  </encoder>
</appender>
<logger name="org.hibernate.SQL" additivity="false">
  <level value="DEBUG" />
  <appender-ref ref="SQLROLLINGFILE" />
</logger>
<logger name="org.hibernate.type" additivity="false">
  <level value="TRACE" />
  <appender-ref ref="SQLROLLINGFILE" />
</logger>

This will write the log to /tmp/sql.log (on Windows, replace ‘/tmp/’ with e.g. ‘c:/somefolder/’).
Warning: This file can grow very large within seconds, depending on how much SQL your application runs in the background. If you have lots of background jobs or a large number of queries, it can easily reach a few gigabytes within minutes. So watch your disk space ;)

Posted in Software-Development | Comments disabled

What do people write about in customer product reviews?

Have you ever wanted to buy something on Amazon but could not decide between product A and product B? Both products have a few hundred reviews, and you would like a summary of what all the customer reviews are saying.

Now there is a new web app for exactly that type of problem: IReadLess.com.
It reads customer reviews, extracts the product features people are talking about, and detects whether they write about them positively or negatively.

Under the hood, IReadLess performs what is called Sentiment Analysis on customer reviews. The term describes various techniques for recognizing opinions and emotions in text. Normally this is used to classify a piece of text as either positive or negative. IReadLess takes this one step further: it also tries to extract the product features people are writing about and to recognize whether each mention is positive or negative.

So what can I use it for?

I think you can see it as a kind of shopping advisor. It gives you an additional basis for deciding for or against a product: it crawls through all the customer reviews, then extracts and summarizes what is written there. That saves you from doing it yourself, and it gives you a high-level summary of a product that you do not get from the 5-star ratings alone.

How does it work?

It is pretty simple. Drag the bookmarklet from their page into your browser’s bookmark bar, view a product on Amazon, and click the bookmarklet, which takes you to the IReadLess result page for that product. If the product has customer reviews, you will see two top-5 lists: the best-rated and the worst-rated product features.
It also displays a circle chart showing the total number of positive and negative ratings. This is useful for seeing at a glance whether the majority of mentions is positive or negative.

You can then click on each feature (like the sound of a pair of headphones) and see the parts of the reviews in which it was mentioned. In each sentence you see the feature highlighted along with the words describing it as positive or negative. In some cases it is not 100% correct, especially with things like sarcasm, which is a well-known and very difficult problem in Sentiment Analysis and in natural language processing (NLP) in general, as this article describes.

Conclusion

If you buy on Amazon a lot, this new tool may be interesting for you.

Disclaimer: I am one of the people behind IReadLess, but wanted to write here from a more personal perspective. Hope you like it anyway. Your feedback is always appreciated :)

Posted in Software-Development | Comments disabled

Tualoop: summer can come

In summer you play outside, as we were always told as kids. Ever since, young and old have flocked to the city parks to play badminton, frisbee, diabolo, Hockern, or whatever else.

But now I have discovered and bought a great new toy: the Tualoop, a new toy for summer days in the park.

The Tualoop is a throwing and catching game for at least two players. You throw a ring through the air using two wooden sticks, and the other player catches it with two sticks of the same kind. It sounds simple, and it is. Still, it takes a little practice at the start until the ring really flies nicely through the air. Catching works well right from the beginning, because it does not matter whether you catch with one stick or with both. The same goes for throwing: there are no limits to creativity, as long as the thing flies and gets caught. That is when it quickly becomes great fun. You try to link catching and throwing again in one fluid movement, skillfully and above all good-looking, and now and then you send the ring crashing onto the laid coffee table in the garden ;) I am going to get another set, since you can also play with more people. That should be even more fun.

In any case, the TicToys guys behind the Tualoop seem to have created a lovely new leisure toy for summer 2013. Like the Ticayo before it, the whole thing is produced sustainably and regionally. Even the ring is not made of plastic but of wood fibers, resins, and glucose, and is sprayed with an organic paint. Playing with a clear conscience, I call that :)

Have fun tualooping.

 

Posted in General | Comments disabled

Flying Backwards – Soundcloud Song

I just felt like posting some of my songs on SoundCloud.com here.

Posted in General

CSV to XLS converter – How to quickly convert CSV files online without coding in your browser

Today I had to create a CSV (comma-separated values) file containing product offers and convert it into another spreadsheet file, which required slightly different columns and some modifications to the column contents.

I could have done all this in Excel or Open Office, or written myself a little script, e.g. in PHP or Java. But I found a cool web-based tool called Transformy which allowed me to do just that without any coding or hacking in Excel. In this article I want to give a short review of this still young tool.

Transformy works pretty simply. You upload your CSV or Excel file, it automatically detects your file delimiter and shows your data in a table. Then you just click here and there to modify your columns. You can remove, add, or rename columns and change their order as required. You can apply various functions such as Search&Replace, simple math calculations, or date format conversions to each column to modify its content. The Quick-Start Example is a nice way to get started even without a file of your own.

The UI is pretty clean, but it also has lots of useful advanced features tucked away behind the Settings button. It has pretty good support for all the low-level stuff like delimiters, text qualifiers, line endings (Unix, Windows), and character encoding. All of that can be configured for both the source and the target file.

In a perfect world all systems would use UTF-8, but as most of us know, file encodings can be a real pain, and converting properly between them is often necessary. In e-commerce, webshops, marketplaces, and price comparison sites exchange product data, prices, and stock data using comma-separated files, but each system usually requires a different encoding. This tool makes it pretty easy to convert between the most common encodings with a few clicks.

Another nice effect is that it is pretty tolerant even of invalid CSV files. A frequent problem when people create those files is double quotes and line breaks in descriptions. For CSV parsers to handle line breaks inside a field, you have to enclose the content in a text qualifier, usually double quotes ("your content"). Doing that allows you to have line breaks in description fields. The problem arises when your description also contains a double quote character, as in 22" monitor. To get around this, you have to double the quote character, like this: 22"" monitor, so that the parser can still parse your file. Transformy seemed to handle even those cases where the doubling was not done properly, which is often the reality.
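The quoting rule is small enough to sketch in a few lines. The following Java snippet only illustrates the doubling convention described above; it is not how Transformy works internally, and the class and method names are made up for this example.

```java
// Hypothetical helper illustrating CSV text qualifiers; not part of any library.
class CsvQuoting {

    /** Wraps a field in double quotes and doubles every embedded double quote. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins fields into one comma-separated record, quoting every field. */
    static String toRecord(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The description contains both a quote character and a line break.
        System.out.println(toRecord("1001", "22\" monitor", "Line one\nLine two"));
    }
}
```

Quoting every field is the simplest choice for a sketch; a writer could also quote only those fields that contain the delimiter, a quote, or a line break.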

Another nice feature is the built-in group by functionality. It allows you to group by one column and apply aggregate-functions to the other columns.

In the screenshot I grouped by the column pricecurrency and applied two aggregate functions, one to the identifier column and one to the name column. The identifier column has the Concat distinct values aggregate function applied, which concatenates all distinct grouped values using a specified delimiter. The name column just has the Count rows aggregate function applied, which shows the number of grouped rows. This grouping and aggregation is pretty handy when you want to quickly analyse e.g. a file with product data to see how many distinct categories the category column contains.
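For readers who would rather see the idea in code, here is a rough Java sketch of the two aggregates mentioned above. It assumes rows are simple string arrays with the columns identifier, name, pricecurrency; the sample data and all names are invented for illustration and say nothing about how Transformy implements it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of "group by" with two aggregate functions.
class GroupBySketch {

    /** Groups rows by the key column and counts the rows in each group. */
    static Map<String, Integer> countRows(List<String[]> rows, int keyIndex) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.merge(row[keyIndex], 1, Integer::sum);
        }
        return counts;
    }

    /** Groups rows by the key column and joins the distinct values of another column. */
    static Map<String, String> concatDistinct(List<String[]> rows, int keyIndex,
                                              int valueIndex, String delimiter) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            // LinkedHashSet drops duplicates but keeps first-seen order.
            groups.computeIfAbsent(row[keyIndex], k -> new LinkedHashSet<>()).add(row[valueIndex]);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : groups.entrySet()) {
            result.put(entry.getKey(), String.join(delimiter, entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample rows: identifier, name, pricecurrency
        List<String[]> rows = Arrays.asList(
                new String[] {"A1", "Lamp", "EUR"},
                new String[] {"A2", "Chair", "EUR"},
                new String[] {"A1", "Lamp XL", "EUR"},
                new String[] {"B7", "Desk", "USD"});
        System.out.println(concatDistinct(rows, 2, 0, "|"));
        System.out.println(countRows(rows, 2));
    }
}
```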

The sorting option lets you sort the complete file by a specified column in ascending or descending order.

Of course, most of this can be done with Excel or a database like MySQL. But this tool is web-based and lets you do it quickly online, without coding and without having to master Excel or a MySQL database.

 

Once you are done mass-editing, crunching, and converting and you are happy with the final result, you can simply download the target file using your specified delimiter, text qualifier, and encoding, or alternatively download it as a real Excel (.xls) file. If you are working on large files and just want to test the resulting file, you can also use the Download preview file option, which downloads a file containing only the rows you see in the preview and is therefore much smaller than the final file. I especially liked the Excel download option, because it gives me an easy way to convert a txt file to Excel (.xls) online. This is great because Excel always has problems with line breaks in columns of CSV files, but if I first open the file in Transformy and then convert it to Excel, I don’t have problems with line breaks anymore.

That’s it for my little review of Transformy. If you work with CSV files and spreadsheets and often need to manipulate and crunch the data, or just need to quickly convert between those formats or export your Magento product data to a different format, then it may be a tool for you too. A German description of the tool can be found here.

Posted in Software-Development | 33 Comments

Great resources on solving OSGi “Uses” constraint violations

I recently ran into the “famous” “uses” constraint violation when building my OSGi application with PDE Build. It was complaining about:

Package uses conflict: Import-Package: javax.mail; version="1.4.1"

The following resources helped me locate the root cause and understand the problem:

http://njbartlett.name/2011/02/09/uses-constraints.html
http://blog.springsource.com/2008/10/20/understanding-the-osgi-uses-directive/
http://blog.springsource.org/2008/11/22/diagnosing-osgi-uses-conflicts/

In my case the culprit was the following effect:

  • One bundle declared Require-Bundle: org.eclipse.osgi, which is also referred to as the “OSGi system” bundle. According to the first article, under Java 6 it automatically exports some core packages of the JDK / JRE, such as javax.activation, into the classpath.
  • The problem was that in our OSGi target platform we also had the jar com.springsource.javax.activation-1.1.1.jar, which also exports javax.activation.
  • The bundle which caused the problem seems to need something from javax.activation (most likely through the Import-Package of javax.mail), and OSGi was not able to resolve this dependency because javax.activation was exported from two sources (the JDK, because of Require-Bundle: org.eclipse.osgi, and the jar com.springsource.javax.activation-1.1.1.jar).

Solution:

  • Modify MANIFEST.MF
  • replace Require-Bundle: org.eclipse.osgi with Import-Package: org.osgi.framework

After that change the build was successful again.


 

Posted in Software-Development | 1 Comment

SVN: Shell script to get a modified files report of all sub-directories which are all a separate SVN folder

For our automated build we wanted a simple way to generate a report of all files which have changed between two SVN revisions.
‘svn log -v’ usually gives you such an output.

But in our scenario we have a folder /source which contains about 100 sub-directories. Because of the way SVN was organized in this project, each sub-folder had to be checked out separately, so we could not just run a simple ‘svn log’ on /source, because /source itself was not under version control.

We wanted to execute svn log -v for each sub-folder and write the results into a file. In addition, the output should only be written for sub-folders which really had changes between the two SVN revisions.

Here is our little script svnlogreport.sh which does the trick:

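#!/bin/bash
# Put this into a file named svnlogreport.sh and make it executable (chmod +x).
# Usage: svnlogreport.sh <folder> <start-revision> <end-revision>
output1=$(svn log -v -r "$2:$3" "$1")

# Each log entry header ends with "| N line(s)", so "line" only appears if there are changes.
if [[ "$output1" == *"line"* ]]
then
  echo "Changes in $1 for revision $2 to $3 (command: svn log -v -r $2:$3 $1)"
  echo "$output1"
fi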

This script also checks whether the output contains the word ‘line’, which indicates that there are changes in the given revision range. This has been tested with SVN Version 1.6.15 (r1038135).

Now go to the folder with all those sub-folders and execute the script:

cd /source
find . -maxdepth 1 -type d -exec svnlogreport.sh {} 6654 6694  \; > svnlog.txt

6654 is the start revision and 6694 is the target revision.

This command basically executes the script for each folder.

Hope that helps somebody.

Posted in Software-Development | Comments disabled

Using Hibernate EventListeners to encrypt / decrypt properties of an entity based on other properties

Update 2012/07/24:
It turns out my approach below is NOT GOOD! What I wanted to do is conditional encryption, which was also asked about in this Jasypt forum thread, where they say you should implement your own UserType.
http://forum.jasypt.org/Hibernate-and-Conditional-encryption-td5586032.html

The serious problem with my approach is described at the end. The onLoad listener calls a setter of the entity for decryption and thus marks it as dirty, which causes Hibernate to persist the changed state to the DB again, which is not what you want. It seems the EventListener approach comes too late in the chain and UserTypes are the way to go. Unfortunately that means lots of code which I would need to borrow from Jasypt just for a tiny extension. Let’s see if I can maybe contribute it back to Jasypt.

Here is the original entry:
We are currently introducing the Jasypt library in a project @Synesty in order to encrypt sensitive values in .properties files and, more importantly, sensitive user content in the database. Jasypt provides tools to make this easier. But (I think) we need to do something which goes beyond what is super-simple in Jasypt.

What we have:

We have an entity JobProperty(String key, String value, boolean isAlreadyEncrypted)

What we want to do when the entity is saved/updated:

prop.setValue(encryptor.encrypt(prop.getValue()));
prop.setIsAlreadyEncrypted(true);

What we want to do when the entity is loaded:

if (prop.isAlreadyEncrypted) {
    prop.setValue(encryptor.decrypt(prop.getValue()));
}

Basically what we want to do is encrypt a property based on one (or more) other properties, in this case depending on the property ‘isAlreadyEncrypted’, which is false for legacy data from before encryption was introduced. Those legacy entities are not encrypted yet, so the code needs to know that their values cannot be decrypted.
Maybe in the future we will add more, e.g. letting the user decide whether or not to encrypt, so there could be another property to check before encrypting. For readers who wonder why we are not using Jasypt’s Hibernate custom UserType approach: it does not support our scenario, where encryption depends on other properties of the entity. It works great if you always want to encrypt a property. But we need more flexibility, backwards compatibility for existing non-encrypted data, and support for our planned seamless key rotation via key profiles.
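To make the requirement concrete, here is a plain-Java sketch of the conditional logic, independent of Hibernate and Jasypt. Everything in it is an assumption made for illustration: the Encryptor interface, the simplified JobProperty class, and the Base64 stand-in, which is only an encoding and NOT encryption. Unlike the snippets above, it skips values that are already flagged and returns the plaintext from a separate method instead of writing it back into the entity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-ins: none of these types come from Jasypt or Hibernate.
class ConditionalEncryptionSketch {

    /** Minimal encryptor abstraction; a real project would delegate to Jasypt here. */
    interface Encryptor {
        String encrypt(String plain);
        String decrypt(String cipher);
    }

    /** Placeholder only: Base64 is an encoding, NOT encryption. */
    static final Encryptor DEMO = new Encryptor() {
        @Override
        public String encrypt(String plain) {
            return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String decrypt(String cipher) {
            return new String(Base64.getDecoder().decode(cipher), StandardCharsets.UTF_8);
        }
    };

    /** Simplified version of the JobProperty entity described above. */
    static class JobProperty {
        String key;
        String value;
        boolean alreadyEncrypted;

        JobProperty(String key, String value, boolean alreadyEncrypted) {
            this.key = key;
            this.value = value;
            this.alreadyEncrypted = alreadyEncrypted;
        }
    }

    /** Before saving: encrypt only values that are not flagged yet, then set the flag. */
    static void beforeSave(JobProperty p, Encryptor enc) {
        if (p.value != null && !p.alreadyEncrypted) {
            p.value = enc.encrypt(p.value);
            p.alreadyEncrypted = true;
        }
    }

    /** After loading: return the plaintext without modifying the stored value. */
    static String readValue(JobProperty p, Encryptor enc) {
        if (p.value == null || !p.alreadyEncrypted) {
            return p.value; // legacy row, stored unencrypted
        }
        return enc.decrypt(p.value);
    }

    public static void main(String[] args) {
        JobProperty legacy = new JobProperty("api.token", "secret", false);
        System.out.println(readValue(legacy, DEMO)); // legacy value is returned as-is
        beforeSave(legacy, DEMO);
        System.out.println(legacy.alreadyEncrypted);
        System.out.println(readValue(legacy, DEMO)); // decrypted copy, entity unchanged
    }
}
```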

Our approach: Hibernate Event Listeners (SaveOrUpdateEventListener, PostLoadEventListener)

We are using the Hibernate Event System so that we can hook into the different phases of Hibernate’s entity lifecycle (a full list of events can be found here or examples here). This approach can also be used for another common pattern, which is to maintain a “created or last-updated timestamp” field on all entities every time the entity gets updated.

We hook into the “save-update” event (to encrypt before persisting the entity) and the “post-load” event (to decrypt after loading the entity).

The problem

One problem we had was the onPostLoad event, which was persisting the data again to the database after decrypting the value. This is not what we wanted.
What we wanted was that the decrypted value is just like a transient value for displaying it in the UI. The underlying database value should still be encrypted.

The solution

Our current solution for this problem is the org.hibernate.Session.setReadOnly(entity, true) method.

/**
* Set an unmodified persistent object to read only mode, or a read only
* object to modifiable mode. In read only mode, no snapshot is maintained
* and the instance is never dirty checked.
*
* @see Query#setReadOnly(boolean)
*/
public void setReadOnly(Object entity, boolean readOnly);

So basically what we are doing in pseudo-code is:

if (prop.isAlreadyEncrypted) {
    // Mark the object as read-only for the current session; otherwise Hibernate will persist the decrypted value to the DB again.
    event.getSession().setReadOnly(prop, true);
    prop.setValue(encryptor.decrypt(prop.getValue()));
}

Show me a full example

In our case the listener looks like this:

 

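// Imports assume the Hibernate 3.x package layout (Hibernate 4+ moved these types to org.hibernate.event.spi and org.hibernate.event.internal).
import org.hibernate.event.PostLoadEvent;
import org.hibernate.event.PostLoadEventListener;
import org.hibernate.event.SaveOrUpdateEvent;
import org.hibernate.event.SaveOrUpdateEventListener;
import org.hibernate.event.def.DefaultSaveOrUpdateEventListener;

public class MyLoadInsertUpdateListener extends DefaultSaveOrUpdateEventListener implements PostLoadEventListener, SaveOrUpdateEventListener{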
 
	@Override
	public void onSaveOrUpdate(SaveOrUpdateEvent arg0) {
		if(arg0.getObject() instanceof Jobsproperties){
			Jobsproperties p = (Jobsproperties) arg0.getObject();
 
				EncryptionService encService = Activator.getServiceFactory().getService(EncryptionService.class);
				if(p.getJpvalue() != null){
					p.setJpvalue(encService.encrypt(p.getJpvalue()));
					p.setIsEncrypted(1);
				}
 
		}
 
		if(arg0.getObject() instanceof JobStepProperties){
			JobStepProperties p = (JobStepProperties) arg0.getObject();
 
				EncryptionService encService = Activator.getServiceFactory().getService(EncryptionService.class);
				if(p.getJspvalue() != null){
					p.setJspvalue(encService.encrypt(p.getJspvalue()));
					p.setIsEncrypted(1);
				}
 
		}
 
		super.onSaveOrUpdate(arg0);
	}
 
	@Override
	public void onPostLoad(PostLoadEvent arg0) {
 
		if(arg0.getEntity() instanceof Jobsproperties){
			Jobsproperties p = (Jobsproperties) arg0.getEntity();
			if(p.getIsEncrypted() == 1){
				if(p.getJpvalue() != null){
					EncryptionService encService = Activator.getServiceFactory().getService(EncryptionService.class);
 
					// workaround: mark the object as readonly, because otherwise hibernate will persist the decrypted value to db again.
					arg0.getSession().setReadOnly(p, true);
 
					p.setJpvalue(encService.decrypt(p.getJpvalue()));
 
				}
			}
		}
 
		if(arg0.getEntity() instanceof JobStepProperties){
			JobStepProperties p = (JobStepProperties) arg0.getEntity();
			if(p.getIsEncrypted() == 1){
				if(p.getJspvalue() != null){
					EncryptionService encService = Activator.getServiceFactory().getService(EncryptionService.class);
 
					// workaround: mark the object as readonly, because otherwise hibernate will persist the decrypted value to db again.
					arg0.getSession().setReadOnly(p, true);
					p.setJspvalue(encService.decrypt(p.getJspvalue()));
 
				}
			}
		}
 
	}
 
}

The used EncryptionService is just an internal interface implementation which provides an encrypt/decrypt method, but it is not important in this post.

As we are using Spring in combination with Hibernate it might be also interesting to see how our listener is registered, because there are also some pitfalls:

			.... our classes
 
				... hibernate properties
<map>
</map>

If you are wondering why we registered the same class 4 times as 4 beans: Hibernate’s documentation states:

Listeners registered declaratively cannot share instances. If the same class name is used in multiple elements, each reference will result in a separate instance of that class.

With this approach we have managed to meet our requirement so far, but we are still testing it. One problem we see is that the Session.setReadOnly() method has some code smell and is maybe a hack. I think it works fine in cases where we are just loading an entity for display purposes. But in cases where we load the entity, perform some update, and then persist it, we could run into problems because of the read-only mode. So far this concern has not materialized, although we clearly have load-update-persist patterns in the code.

Alternatives

What alternatives did we have and why did we go this way?

I think the alternative would have been to implement a custom Hibernate UserType as well. Our first idea was to extend Jasypt’s org.jasypt.hibernate3.type.AbstractEncryptedAsStringType, but unfortunately all the interesting methods are declared final, so the only option would be to copy & paste the whole class. That should be our last resort if there is no other way.

If anybody has suggestions for a better solution please comment or get in touch with me.

I hope this post is somehow helpful for you to either go down the same road or get inspiration for different approaches and the related problems around it.

Posted in Software-Development | Comments disabled

New Google Analytics Dashboard Layout: the opposite of a dashboard?

New Dashboard – less useful

Old Dashboard – more useful

This question in the Google Analytics help forum, titled “New Layout: the opposite of a dashboard?”, sums up my opinion about the new Google Analytics. While I generally like the new glossy design which Google is rolling out across all of their apps, I feel that the new Analytics dashboard is a huge step backwards, especially in terms of usability. It seems that in this specific case Google has not understood how people use analytics.

While researching I also came across this article where they say that this feature was ‘lost’ and they are ‘working on it’ (my favorite quote of every company’s support crew ;) ).

To summarize: I really hope that the old dashboard will make it into the new Analytics; otherwise I will use the old version as long as possible. If you feel the same, please tweet, vote, or take part in the discussions to make GA useful again.

 

 

Posted in General, Software-Development | Comments disabled

What are pros and cons for url structure of a SaaS multi tenant web-app regarding scalability / load-balancing?

I just posted the question above on Quora in the hope of getting some answers.

I would like to know the pros and cons of choosing a URL structure for a multi-tenant web application. I also think that this could impact the scalability of the site when traffic grows.
The application has two main dimensions:
- mini-apps
- customers, who can register per application
Load/traffic hotspots can occur in both: one customer could use a great many resources, and one mini-app could use far more resources than the others. So we need to be able to route traffic to different machines / parts of the cluster, depending on the traffic patterns of users and/or apps.

I thought about the following:

Approach 1:

Main domains for each app, for guest users / anonymous:
http://apps.nameofourwebapp.com/nameOfMiniApp1
http://apps.nameofourwebapp.com/nameOfMiniApp2

Once the customer is registered I thought about:
http://yourcompanyxyz.apps.nameofourwebapp.com/nameOfMiniApp1
http://yourcompanyxyz.apps.nameofourwebapp.com/nameOfMiniApp2

Approach 2:

The other way round I could also imagine:
http://nameOfMiniApp1.nameofourwebapp.com/
http://nameOfMiniApp1.nameofourwebapp.com/yourcompanyxyz

What are the advantages and disadvantages of both approaches especially with regards to scaling / load-balancing?

Does Approach 1 allow easier routing of traffic to different machines per customer (e.g. via Amazon Route53) than Approach 2?
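One way to compare the two schemes is to look at which dimension ends up in the hostname, since DNS-based routing only ever sees the hostname and never the path. The Java sketch below is just a thought experiment using the example URLs from this post; the class, its methods, and the Route type are hypothetical and not taken from any framework.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper that extracts tenant and mini-app from the two URL schemes.
class TenantRouting {
    static final String APPS_SUFFIX = ".apps.nameofourwebapp.com";
    static final String BASE_SUFFIX = ".nameofourwebapp.com";

    /** Parsed routing key; either part is null when the URL does not carry it. */
    static final class Route {
        final String tenant;
        final String app;

        Route(String tenant, String app) {
            this.tenant = tenant;
            this.app = app;
        }
    }

    /** Approach 1: tenant in the subdomain, mini-app in the first path segment. */
    static Route approach1(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        String tenant = host.endsWith(APPS_SUFFIX)
                ? host.substring(0, host.length() - APPS_SUFFIX.length())
                : null; // guest URL without a customer subdomain
        return new Route(tenant, firstPathSegment(uri));
    }

    /** Approach 2: mini-app in the subdomain, tenant in the first path segment. */
    static Route approach2(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        if (!host.endsWith(BASE_SUFFIX)) {
            throw new IllegalArgumentException("Unexpected host: " + host);
        }
        String app = host.substring(0, host.length() - BASE_SUFFIX.length());
        return new Route(firstPathSegment(uri), app);
    }

    /** Returns the first path segment, or null for an empty path or "/". */
    static String firstPathSegment(URI uri) {
        String path = uri.getPath();
        if (path == null || path.length() <= 1) {
            return null;
        }
        String[] parts = path.substring(1).split("/");
        return parts.length == 0 || parts[0].isEmpty() ? null : parts[0];
    }

    public static void main(String[] args) {
        Route r1 = approach1("http://yourcompanyxyz.apps.nameofourwebapp.com/nameOfMiniApp1");
        Route r2 = approach2("http://nameOfMiniApp1.nameofourwebapp.com/yourcompanyxyz");
        System.out.println(r1.tenant + " / " + r1.app);
        System.out.println(r2.tenant + " / " + r2.app);
    }
}
```

In this sketch, Approach 1 puts the customer into the hostname, so a per-customer DNS record could point a heavy customer at separate machines, while splitting by mini-app would need a proxy that inspects the path. Approach 2 is the mirror image.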

If you have any thoughts on this question feel free to add an answer on this Quora question. Thanks.

Posted in Software-Development | Comments disabled