Data Management


Data management comprises all the disciplines related to managing data as a valuable resource.
Data Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.

During my MS studies, I followed two interesting lectures related to Data Management.

  • Introduction to Data Mining (Summary)
  • Introduction to Information Retrieval (Summary)

Data Mining is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. In the context of Data Mining, Data Warehouses (DW) play an important role: they generalize and consolidate data in multidimensional space. Constructing a DW is an important pre-processing step for data mining; it involves data cleaning, data integration and data transformation.
I have summarized all the notes I took during the Introduction to Data Mining lecture, as well as some of my solutions to the exercises, in the following document: Summary of the Data Mining Lecture.

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Information Retrieval is a field concerned with the structure, analysis, organisation, storage, searching and retrieval of information.
Here as well, I have summarized the notes I took during the lecture in the following document: Summary of the Information Retrieval Lecture.

 
 
 
 

Hard software engineering interview questions


For reasons that I'd rather keep private, I got interested in the kind of questions Google, Microsoft, Amazon and other tech companies ask candidates during the recruitment process. Most of these questions are oriented towards algorithmics or mathematics. Others are logic questions or puzzles that the candidate is expected to solve in a dozen minutes or so in front of the interviewer.

I found various sites online providing lists of typical interview questions. Other sites discuss topics like "the ten toughest questions asked by Google", or by Microsoft, etc.
Then I wondered how many of them I could answer on my own without help. The truth is that while I can answer most of these questions by myself, I still needed help for almost half of them.

Anyway, I have collected my answers to a hundred of these questions below.

[Read More]

 
 
 
 

AirXCell - online programmable spreadsheet and R GUI


I want to share an interesting project that has appeared on the Web recently: the AirXCell project.
(As some of you already know, I am somewhat involved in this project :-)

AirXCell is an online R application framework currently supporting a programmable spreadsheet and an R development environment.

AirXCell is based on R - The GNU R Project for Statistical Computing. The current version is still somewhat limited, yet fully functional.

Quoting the AirXCell user documentation:

AirXCell intents to revolution the world of spreadsheet applications and computational software by providing a product that:

  • implements a web application on the most cutting edge of technology that outpaces the current classical spreadsheet applications in terms of user experience, potential and features,
  • merges the world of spreadsheet application (e.g. Microsoft Excel, GNUmeric, etc.) and the world of computational software (e.g. Mathematica, Mathlab, etc.) and
  • revolutions the usual approach in spreadsheet applications.
[Read More]
 
 
 
 

Introduction to mathematical optimization


During my MSc studies, I followed an extended set of very interesting lectures on Mathematical Optimization, built on basic mathematical concepts and simple algorithms such as Newton's method (and Newton-based methods) or the simplex algorithm (and simplex-based methods such as branch-and-bound, branch-and-cut, etc.).

Quoting Wikipedia:
"In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations comprises a large area of applied mathematics.
More generally, optimization includes finding best available values of some objective function given a defined domain, including a variety of different types of objective functions and different types of domains."
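
To make the "simplest case" in the quote concrete, here is a minimal sketch of a one-dimensional Newton iteration for minimization, one of the methods mentioned above. It is only an illustration and is not taken from the lecture summary: the class name `NewtonMinimizer`, the tolerance and the iteration limit are assumptions chosen for this example. The iteration is x_{k+1} = x_k - f'(x_k) / f''(x_k), and it only finds a stationary point, which is a minimum when f'' is positive there.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, for illustration only.
final class NewtonMinimizer {

    /**
     * Looks for a stationary point of f, given its first derivative d1 and
     * second derivative d2, starting from x0.
     */
    static double minimize(DoubleUnaryOperator d1, DoubleUnaryOperator d2,
                           double x0, double tolerance, int maxIterations) {
        double x = x0;
        for (int i = 0; i < maxIterations; i++) {
            double curvature = d2.applyAsDouble(x);
            if (curvature == 0.0) {
                break; // Newton step undefined: give up and return the current point
            }
            double step = d1.applyAsDouble(x) / curvature;
            x -= step;
            if (Math.abs(step) < tolerance) {
                break; // the last step was negligible: consider it converged
            }
        }
        return x;
    }
}
```

For instance, with f(x) = e^x - 2x, the derivatives are f'(x) = e^x - 2 and f''(x) = e^x, and calling `NewtonMinimizer.minimize(x -> Math.exp(x) - 2, Math::exp, 0.0, 1e-12, 50)` converges towards ln 2, the minimizer of f.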

These days I have a (very) little more free time than usual, so I've compiled a summary of these lectures from my various notes and individual chapter summaries. I decided to put this document online, as it might help future MSc students following any lecture related to Mathematical Optimization by providing them with an introduction to the field.

The summary is available here: resume_optim.pdf.

 
 
 
 

About linkedin, software architects and a little disappointment


I am really amazed and astonished by a few updates I've been seeing on linkedin recently.

Over the last ten years I've worked with incredibly gifted people. You know, the kind of guys you talk with while wondering whether you will ever be as good, clever and sharp as they are. I really think being that good is nothing to be ashamed of, so let's assume I can name these guys. The very first one I remember is Thomas Beck (Geneva, Switzerland). I worked for two years under his supervision (he was the software architect on our project) and I learned more about the job by discussing with him than I ever did by reading any software architecture or design book (agile, DDD, whatever). Happily, I have learned a lot more since I left him, yet I'm quite sure he has learned even more, so I believe I'm still far from reaching his level of mastery of the software architecture business.
Other people I would also mention here are Sebastien Ursini, Sebastien Marc and Thomas Caprez (Geneva and Lausanne, Switzerland). I haven't seen some of these folks for several years, yet I can still remember pretty clearly what they taught me, and not a single day goes by where I don't benefit from these teachings in my job.

On the other hand, just like everybody, I have had far more occasions to work with terrible software engineers. I have mainly encountered two categories.

The first one is the kind of people who went to great engineering schools or universities and assume that the time they invested in their studies is more than enough and exempts them from making any additional effort to keep learning after graduation. These people are fools who believe they're great only because of some piece of paper certifying that they were once able to learn something. I hope all my very good French colleagues won't hate me for this, but I have to say that French engineers in particular are subject to this bad tendency.
Unfortunately, life doesn't give anything away to anyone, and sooner or later most of them are taught the hard way how wrong they are; they then start kicking their own butts to actually learn the job and make some progress.

The second category is way more dangerous. This is the kind of people who sell themselves as software architects without any real software development experience. These folks read lots of books, follow lots of software architecture blogs and assume that this exempts them from building their own experience before claiming to be software architects. I'm not saying reading is not good, but I am pretty sure that it is in no way comparable to experience. Unfortunately, due to poor recruitment processes on one side, and the lack of good software engineers on the market on the other side, these guys manage to find a software architect job and end up making software architecture decisions.

I am involved in the recruitment process in my current company (just as I was in my former companies). I take care of the technical assessment. I am usually a nice guy (well, I think so) and yet I show no mercy to candidates. I am well aware that a mistake I make in this process might lead me to work with bad engineers a few months later, and this is a risk I'm not willing to take at all.
I am the guy killing those people. When I see someone come in front of me with a resume claiming several years of experience in software architecture who is not able to correctly answer the very first questions I ask him, it usually puts me in such a bad mood that I still keep the guy for the two hours that were planned and bury him 7 feet under ground. Hopefully the guy will work on a slightly more humble resume before applying for another position (in another company, needless to say).
Just a word on "answering correctly": there is usually more than one good answer to a design problem or an architectural question, and I don't expect a single one. But I do expect the candidate at least to build a proper conceptual model of the issue I'm presenting and to be able to outline a few solutions.

Now why am I putting all this online?

[Read More]

 
 
 
 

niceideas-commons 1.1-beta-0.1


Following the initial release of the niceideas-commons package (niceideas-commons 1.0-alpha-0.7), niceideas-commons 1.1-beta-0.1 is released today.

The major changes are:

  • Basic relation mapping support added to the DAO framework
  • More helper and utilities related to resource finding and loading
  • More utilities of various kinds
  • Various bug fixes

[Read More]

 
 
 
 

Java - Create enum instances dynamically


I remember that the introduction of the brand new enum type in Java 5 (1.5) was a very exciting announcement. However, when I finally switched from 1.4 to 1.5 and actually tried Java's flavour of enum types, I was a bit disappointed.

Before that, I had been using Josh Bloch's "Typesafe enum" pattern (Effective Java) for quite a long time, and I didn't really see what was so much better about the new native Java enum construct. OK, fine, there was the ability to use enum instances in switch - case statements, which seemed nice, but what else?

Besides, what I used to find great about the "typesafe enum" pattern is that it could be tricked and changed the way I wanted, for instance to dynamically (at runtime) add enum instances to a specific typesafe enum class. I found it very disappointing not to be able to do the very same thing easily with the native Java enum construct.

And now you might wonder "Why the hell could one ever need to dynamically add enum values ?!?". You do, right ? Well, let's imagine this scenario:

You have a specific column in a DB table which contains various codes as values. There are more than a hundred different codes actually in use in this column. Related to this, you have business logic which performs different operations on the rows coming from this table; the kind of operation applied to a row depends on the value of this code. So chances are you end up with a lot of if - else if statements checking the actual value of the code.
I myself am allergic to string comparisons in conditions, so I want to be able to map the values from this column to an enum type in Java. This way I can compare enum values instead of strings in my conditions and reduce my dependency on the format of the string value.

Now, when there are more than a hundred different possible codes in the DB, I really don't intend to define them all manually in my enum type. I want to define only the few I am actually using in the Java code and let the system add the other ones dynamically, at runtime, when it (the ORM system or whatever I am using for reading the DB rows) encounters a new value from the DB.

Hence my need for dynamically added enum values.
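
To illustrate the idea, here is a minimal sketch of how the "typesafe enum" pattern mentioned above can accept new instances at runtime. It is not the solution for native Java enums described in the full post; the class name `OperationCode` and the codes "CRT" and "DEL" are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical typesafe-enum-style class: a few constants are declared in code,
// all other codes are registered on demand when they are first encountered.
final class OperationCode {

    // Must be declared before the constants below (static initialization order).
    private static final Map<String, OperationCode> REGISTRY = new LinkedHashMap<>();

    // The few codes the Java code actually cares about (made-up values).
    public static final OperationCode CREATE = register("CRT");
    public static final OperationCode DELETE = register("DEL");

    private final String code;
    private final int ordinal;

    private OperationCode(String code, int ordinal) {
        this.code = code;
        this.ordinal = ordinal;
    }

    private static synchronized OperationCode register(String code) {
        OperationCode created = new OperationCode(code, REGISTRY.size());
        REGISTRY.put(code, created);
        return created;
    }

    /** Returns the instance for the given code, creating it if it is unknown so far. */
    public static synchronized OperationCode valueOf(String code) {
        OperationCode existing = REGISTRY.get(code);
        return existing != null ? existing : register(code);
    }

    public String code() { return code; }

    public int ordinal() { return ordinal; }

    @Override
    public String toString() { return code; }
}
```

Since there is exactly one instance per code, the business logic can compare with `==` (for example `row == OperationCode.CREATE`) instead of comparing strings. Unlike a native enum, such a class cannot be used in a switch statement.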

So recently I faced this need once again and took a few hours to build a little solution which makes it possible to dynamically add values to a Java enum type. The solution is the following:

[Read More]

 
 
 
 

Java rocks !


Lately at work I faced an interesting problem with string manipulation in Java. The requirement was the following:

We have a field on some screen where the user can type in a comment. The comment can have any length the user wants, absolutely any. Should he want to type in a comment of a million characters, he should be able to do so.

Now the right way to store this comment in a database is to use a CLOB, a BLOB, a LONGVARCHAR or whatever feature the database natively provides for that purpose. Unfortunately that's not the way it was designed. Due to legacy integration needs, all these advanced DB types are prohibited within our application. So the way we have to store the comment consists of using several rows, each with a single comment field of a maximum length of 500 characters. That means the long comment has to be split into several sub-strings of 500 characters, each of them stored in a separate row in the DB table. The table has a counter as part of the primary key, which is incremented for each new row belonging to the same comment. This way we can easily find every row that is part of the same comment.

Another problem is that under DB2 a field defined as VARCHAR(500) can contain at most 500 bytes, not 500 characters, and the strings are encoded in UTF-8 in the database. That means we might not be able to store 500 characters if the string contains one or more 2-byte UTF-8 characters. Working in a French environment, this happens a lot.
So we had to write a little algorithm taking care of splitting the string into sub-strings of at most 500 bytes.

The very first version of our algorithm was quite naive: we converted the string to a byte array using the UTF-8 encoding and split the byte array instead of the string. Then each of the 500-byte arrays was converted back to a string before being inserted into the database.
Happily, we figured out quite soon that this doesn't work, as it quite often ends up splitting the string right in the middle of a 2-byte character. When the byte arrays were converted back to strings, the split 2-byte character was corrupted and could not be recovered.

Before writing a smarter version of the algorithm, which would manually test the byte length of the character right at the position of the split, we took a step back and wondered: "Can it be that Java doesn't natively offer a simple way to do just that?"

And of course, Java does offer one.
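
As an illustration of what such a native approach can look like, here is a minimal sketch based on `java.nio.charset.CharsetEncoder`. It is not necessarily the solution described in the full post; the class name `Utf8Splitter` and its method are made up for this example. The encoder writes into a fixed-size byte buffer and stops before any character that no longer fits, so a chunk never ends in the middle of a multi-byte character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class Utf8Splitter {

    /** Splits text into sub-strings whose UTF-8 encoding is at most maxBytes long. */
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can take up to 4 bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        List<String> parts = new ArrayList<>();
        while (in.hasRemaining()) {
            int start = in.position();
            out.clear();
            encoder.reset();
            // Encodes as many whole characters as fit; on overflow the encoder
            // stops before the first character that does not fit any more.
            encoder.encode(in, out, true);
            if (in.position() == start) {
                throw new IllegalStateException("no progress at index " + start);
            }
            parts.add(text.substring(start, in.position()));
        }
        return parts;
    }
}
```

With the 500-byte limit from the text, the call would be `Utf8Splitter.split(comment, 500)`, each returned sub-string going into its own row. The encoded bytes themselves are discarded here; the buffer only serves to measure how many characters fit.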

[Read More]

 
 
 
 

CommunityBoard


CommunityBoard is a sample multi-module Maven / Glassfish / Eclipse Java EE project.

It implements a little forum / note publishing application. Its main purpose is to act as an introductory laboratory for Java EE programming. As such, its functionality is rather limited. Yet it covers the most fundamental aspects and issues of Java EE programming, in that it shows how to:

  • write entity beans with bi-directional relationships;
  • use these entity beans in EJBs (stateless session beans);
  • use other EJBs in EJBs;
  • use EJBs in a servlet or a JSP located in a WAR (i.e. no processing of the @EJB annotation);
  • build a multi-module Java EE Maven project with jars, wars, ears;
  • write JSPs with the JSTL (OK, I am not very proud of these JSPs, yet they do the job) and
  • deploy a multi-module ear within Glassfish and use a container-defined datasource.

[Read More]

 
 
 
 

Funny developer tale


A few years ago I worked on an architectural concept for a very specific piece of software my former company had to develop. The technical challenges were huge and the field was pretty complex. In addition, the timeframe was very short and we had to rush a lot to get it ready and prototyped in time.

In the end we screwed up ... totally. The concept was miles away from what was required and we pretty much had to start it all over. Months of work were just good enough to be thrown away with the trash.

Not used at all to such failures, I decided to take some time to understand what happened, what went wrong.

My investigations led to the following story, a pretty funny though quite common developer tale:

[Read More]

 
 
 
 

DWR : A paradigm shift in web development


Wow ... This is, and will most likely remain, the most pompous title for a post on this blog, which usually focuses on much more concrete stuff.

OK, let's get into this:

First, what is DWR?

DWR stands for Direct Web Remoting - Easy Ajax for Java.

DWR is a Java library that enables Java on the server and JavaScript in a browser to interact and call each other as simply as possible.

Quoting the official website :

"DWR is a RPC library which makes it easy to call Java functions from JavaScript and to call JavaScript functions from Java (a.k.a Reverse Ajax)."
Read this : http://directwebremoting.org/dwr/introduction/index.html.

[Read More]

 
 
 
 

Snake ! 0.2-alpha-0.1


Snake is a little C++/OpenGL project which shows a snake eating apples on a two dimensional board. It really is very much like the famous Nokia phone game except the snake finds its way on its own.

No nice textures, no sweet drawings yet; the world elements are mostly simple spheres. Basic OpenGL features such as fog, lighting and shadows are implemented though.
Oh, and the snake is quite stupid at the moment. I wrote the path finding algorithm in an hour or so and I really need to come up with something smarter.

[Read More]

 
 
 
 

niceideas-commons 1.0-alpha-0.6


The company I am currently working for is the fourth one where I am a Java developer / architect. I have been programming mostly in Java for quite a long time now, and I have to admit that I have my way of doing things and my little habits. I know a large number of Java libraries and almost always use the same sets for the same needs, with little variation.

But even with the very large set of Java libraries available out there, there are a few classes or utilities that I keep re-developing myself again and again each time I start a new job. It's not that most of these utilities aren't already available somewhere, but I dislike the existing implementations, or want something simpler, or whatever else. A few of these classes, though, are really unique by nature.

Well, lately I found myself tired of re-writing the same sets of classes again and again, so I wrote them once more ... once and for all. I turned them into an open-source, freely available project released under the GNU LGPL license, so that I can freely use them in my current company as well as for any future employer I might work for.
Having done this, I thought I could share them here.

[Read More]

 
 
 
 

hibernate's not-found="ignore" is buggy as hell


I'm working on a Java application which makes extensive use of Hibernate's relation mapping system. The latter offers several ways to define association mappings. We mostly use many-to-one relation declarations. The problem comes from the database. It's a pre-relational, pre-transactional, legacy database running on a prehistoric IBM zSeries host. The data in this database is very often dumb or corrupted. The lack of proper referential integrity support and the foolish design make us end up quite often following non-existent relations.

Happily, Hibernate provides an option which allows the application not to bother when a relation is missing, just as the legacy app does. This option is the not-found="ignore" parameter on the relation definition.

However, using this option amounts to opening the doors to oblivion very wide.

[Read More]

 
 
 
 

Nokia n900 vs. Apple iPhone 3GS vs. HTC (OOpps.. Google) Nexus One


I'm looking for a new phone. The 3 models I found appealing are the ones mentioned in the title of this post. I'm posting here the criteria I went through when looking at these phones and the reasons that make me choose one or the other.

[Read More]

 
 
 
 
 
© niceideas.ch