codingjourney.github.io/index2.html at master · codingjourney/codingjourney.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
<!DOCTYPE html>
<html lang="en">
<head>
                <title>meandering journey</title>
        <meta charset="utf-8" />
        <link href="./theme/css/main.css" type="text/css" rel="stylesheet" />
                <link href="http://meandering.journey.sk/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="meandering journey Full Atom Feed" />
                                                <link href="http://meandering.journey.sk/feeds/.atom.xml" type="application/atom+xml" rel="alternate" title="meandering journey Categories Atom Feed" />
                                        </head>

<body id="index" class="home">
  <div class="all">
  <div class="extra-info">

    <aside>
  <h3>About the blog</h3>
  A platform to practice my written communication skills.
  The range of topics tends to surprise even myself.
</aside>
<aside>
  <h3>About the author</h3>
  My name is Ján Hušták. I live near
  <a href="http://maps.google.com/maps?q=bratislava&z=6">Bratislava</a>.
  I've been developing software professionally since 1998.
  The Java platform has served me well but I don't dwell on it.
</aside>
<aside>
  <h3>Links</h3>
  <a href="http://www.journey.sk">Main site</a>,
  <a href="http://coding.journey.sk">projects page</a>,
  <a href="https://github.com/codingjourney">GitHub</a>.
  Sorry, no social networks. I do read mail sent to
  coding &#97;&#116; journey.sk.
</aside>
    <aside class="links">
  <h3>                        <a href="./index.html">&laquo;</a>
                Page 2 / 6
            <a href="./index3.html">&raquo;</a>
    </h3>
      <a href="#the-levenshtein-rules">The Levenshtein rules</a>
      <a href="#the-levenshtein-puzzle">The Levenshtein puzzle</a>
      <a href="#software-as-knowledge-repository">Software as knowledge repository</a>
      <a href="#reputation-as-a-measure-of-success">Reputation as a measure of success</a>
      <a href="#the-psychology-of-casual-street-cleaning">The psychology of casual street cleaning</a>
      <a href="#the-minimal-patch-fallacy">The minimal patch fallacy</a>
      <a href="#pimp-my-calendar-aftermath">PIMp my calendar: Aftermath</a>
      <a href="#pimp-my-calendar-part-5">PIMp my calendar, part 5</a>
    </aside>
  <aside id="tags">
  <h3>Tags</h3>
          <a href="./tag/motivation.html">motivation</a>
      -     <a href="./tag/htpc.html">HTPC</a>
      -     <a href="./tag/openbsd.html">OpenBSD</a>
      -     <a href="./tag/qt.html">Qt</a>
      -     <a href="./tag/upsheet.html">upsheet</a>
      -     <a href="./tag/python.html">Python</a>
      -     <a href="./tag/kde.html">KDE</a>
      -     <a href="./tag/cloud-computing.html">cloud computing</a>
      -     <a href="./tag/caldav.html">CalDAV</a>
      -     <a href="./tag/howto.html">howto</a>
      -     <a href="./tag/jetty.html">Jetty</a>
      -     <a href="./tag/craftsmanship.html">craftsmanship</a>
      -     <a href="./tag/meta.html">meta</a>
      -     <a href="./tag/music.html">music</a>
      -     <a href="./tag/it-misadventures.html">IT misadventures</a>
      -     <a href="./tag/algorithms.html">algorithms</a>
      -     <a href="./tag/android.html">Android</a>
      -     <a href="./tag/cups.html">CUPS</a>
      -     <a href="./tag/security.html">security</a>
      -     <a href="./tag/html5.html">HTML5</a>
    </aside>


  </div><!-- /.extra-info -->
  <div class="main-column">
  <header id="banner" class="body">
    <h1><a href=".">meandering<img src="./theme/images/logo.png"/>journey</a></h1>
  </header><!-- /#banner -->
        <section id="content">

<article class="hentry">
  <header>
    <h3>
      <a name="the-levenshtein-rules"></a>
      <a href="./the-levenshtein-rules.html" rel="bookmark" title="Permalink to The Levenshtein rules">The Levenshtein rules</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-07-28T06:30:00"> Sun 28 July 2013 </abbr> under
        <a href="./tag/algorithms.html">algorithms</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>I continued my exploration of the <a href="./the-levenshtein-puzzle.html">Levenshtein distance</a> by writing an implementation in the <a href="http://docs.jboss.org/drools/release/6.0.0.Beta5/drools-expert-docs/html/ch04.html#d0e4036">Drools Rule Language</a> a.k.a. DRL. It turned out to be very interesting because the properties of DRL forced me to formulate the problem in a completely different way.</p>
<p>This isn't the right place to delve into rule engines but I'll try to explain the basics. A DRL program is executed in a <em>working memory</em> containing <em>facts</em> and <em>rules</em>. A rule is a simple statement of the form</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">the</span> <span class="n">working</span> <span class="n">memory</span> <span class="n">is</span> <span class="n">in</span> <span class="n">such</span> <span class="n">and</span> <span class="n">such</span> <span class="n">state</span>
<span class="n">then</span>
    <span class="n">perform</span> <span class="n">these</span> <span class="n">actions</span>
    <span class="p">(</span><span class="n">potentially</span> <span class="n">affecting</span> <span class="n">the</span> <span class="n">working</span> <span class="n">memory</span><span class="p">)</span>
</pre></div>


<p>The <em>when</em> part contains a fact pattern. When a fact is inserted into the working memory all rules whose <em>when</em> parts match the new fact get their <em>then</em> parts executed. This may cause other facts to appear in the working memory, triggering a cascade of rule firing. Well-written rules adhere to two important principles:</p>
<ol>
<li>The <em>then</em> part should contain no conditional logic (as Drools creator Mark Proctor says, it should be <em>"if this then that"</em>, not <em>"if this then maybe that"</em>). All decision-making should be expressed in the <em>when</em> sections.</li>
<li>The rules should have the same semantics regardless of their execution order (e.g. when a new fact matches two rules it shouldn't matter which fires first).</li>
</ol>
<p>As you can see, a rule author gives up a lot of control over the program flow. The idea is to specify <em>what</em> should happen and let the rule engine figure out <em>how</em> to do it. The way it looks in practice is that you decompose your input into very small parts that are straightforward to reason about. From that you can formulate rules that let the engine construct the desired result.</p>
<p>My original Levenshtein distance algorithm used the concepts of identical and orthogonal sub-words. Those are not really suitable for a rule engine because their discovery is in itself quite complex. I replaced them with the idea of character locations. A character location is a simple object that says "there is an 'a' at offset 2 in the word 'banana'". Converting a word into character locations is trivial and I can then throw them into the working memory as new facts (the examples use pseudo-code rather than actual DRL syntax):</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">word</span> <span class="o">:</span> <span class="n">String</span>
<span class="n">then</span>
    <span class="k">for</span> <span class="n">offset</span> <span class="n">from</span> <span class="mi">1</span> <span class="n">to</span> <span class="n">word</span><span class="p">.</span><span class="n">length</span>
    <span class="n">insert</span> <span class="n">CharacterLocation</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">offset</span><span class="p">)</span>
</pre></div>


<p>The rule will be triggered for each of the words as they are inserted into the working memory. Armed with a bunch of CharacterLocations, I can identify character matches:</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">location1</span><span class="p">,</span> <span class="n">location2</span> <span class="o">:</span> <span class="n">CharacterLocation</span>
    <span class="n">location1</span><span class="p">.</span><span class="n">character</span> <span class="o">==</span> <span class="n">location2</span><span class="p">.</span><span class="n">character</span>
    <span class="n">location1</span><span class="p">.</span><span class="n">word</span> <span class="o">!=</span> <span class="n">location2</span><span class="p">.</span><span class="n">word</span>
<span class="n">then</span>
    <span class="n">insert</span> <span class="n">Match</span><span class="p">(</span><span class="n">location1</span><span class="p">,</span> <span class="n">location2</span><span class="p">)</span>
</pre></div>


<p>This rule, in turn, will be triggered for each suitable pair of CharacterLocations, generating all possible Matches:</p>
<p><img alt="invalid match set" src="./static/images/levenshtein2_all.svg" /></p>
<p>For the Levenshtein distance I need a combination of Matches that covers as much of the two words as possible. Not every combination makes sense:</p>
<p><img alt="invalid match set" src="./static/images/levenshtein2_invalid.svg" /></p>
<p>so I'm actually looking for <em>sequences</em> of <em>strictly ordered</em> Matches, such as this one:</p>
<p><img alt="valid match set" src="./static/images/levenshtein2_valid.svg" /></p>
<p>Generating valid sequences takes a bit of induction. I first create "seeds" - sequences containing just two Matches:</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">:</span> <span class="n">Match</span>
    <span class="n">x</span> <span class="o">&lt;</span> <span class="n">y</span>
    <span class="n">not</span> <span class="n">exists</span> <span class="n">Sequence</span> <span class="n">s</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span>
<span class="n">then</span>
    <span class="n">insert</span> <span class="n">Sequence</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</pre></div>


<p>I proceed to grow each Sequence from the middle, using the <code>visited</code> set to avoid creating the same one twice:</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">candidate</span> <span class="o">:</span> <span class="n">Match</span>
    <span class="n">s</span> <span class="o">:</span> <span class="n">Sequence</span>
    <span class="n">x</span> <span class="o">&lt;</span> <span class="n">candidate</span> <span class="o">&lt;</span> <span class="n">y</span>
    <span class="o">!</span><span class="n">s</span><span class="p">.</span><span class="n">visited</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">candidate</span><span class="p">)</span>
    <span class="o">!</span><span class="n">s</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">candidate</span><span class="p">)</span>
    <span class="n">s</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
    <span class="n">s</span><span class="p">.</span><span class="n">indexOf</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="o">==</span> <span class="n">s</span><span class="p">.</span><span class="n">indexOf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">then</span>
    <span class="n">insert</span> <span class="n">s</span><span class="p">.</span><span class="n">clone</span><span class="p">(</span><span class="n">visited</span> <span class="o">+=</span> <span class="n">candidate</span><span class="p">)</span>
    <span class="n">s</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">candidate</span> <span class="n">between</span> <span class="n">x</span> <span class="n">and</span> <span class="n">y</span><span class="p">)</span>
</pre></div>


<p>The distance corresponding to a Sequence is determined by the gaps it leaves open:</p>
<p><img alt="sequence with gaps" src="./static/images/levenshtein2_gaps.svg" /></p>
<p>so once all valid Sequences have been generated, I simply pick the best one:</p>
<div class="codehilite"><pre><span class="n">when</span>
    <span class="n">there</span> <span class="n">are</span> <span class="n">no</span> <span class="n">more</span> <span class="n">other</span> <span class="n">rules</span> <span class="n">to</span> <span class="n">run</span>
    <span class="n">s</span> <span class="o">:</span> <span class="n">set</span> <span class="n">of</span> <span class="n">all</span> <span class="n">Sequence</span> <span class="n">instances</span>
<span class="n">then</span>
    <span class="n">print</span> <span class="s">&quot;Found distance %s&quot;</span> <span class="o">%</span> <span class="n">min</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="n">distance</span><span class="p">))</span>
</pre></div>


<p>And that's it. From a complexity point of view, the algorithm is quite a pig. It explores the entire solution space and doesn't even use the best-known result for pruning. It isn't even easily parallelizable, with all the each-on-each semantics going on. It does, however, stick to the rule-based declarative approach so performance is the rule engine's problem ;-)</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="the-levenshtein-puzzle"></a>
      <a href="./the-levenshtein-puzzle.html" rel="bookmark" title="Permalink to The Levenshtein puzzle">The Levenshtein puzzle</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-06-09T04:30:00"> Sun 09 June 2013 </abbr> under
        <a href="./tag/algorithms.html">algorithms</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>I've read a few blog posts recently that mentioned the concept of Levenshtein distance. It's a measure of the difference between two strings <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">defined as</a> <em>"the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other"</em>. The definition is very straightforward but when I thought about calculating it I saw no immediately obvious way. Rather than looking it up, I decided to discover an algorithm on my own, with nothing but the definition to start from.</p>
<p>After a period of requisite head-scratching and ad-hoc attempts I identified two trivial corner cases: identical words (<em>d=0</em>) and words with no letters in common (<em>d=max(first.length, second.length)</em>, let's call them <em>orthogonal</em>). Then came the crucial realization: any pair of words can be chopped up into a sequence of identical and orthogonal sub-words:</p>
<p style="font-family: monospace">
d<span style="color: #e08020">ar</span>t<span style="color: red">e</span>d<br/>
st<span style="color: #e08020">ar</span><span style="color: red">e</span>
</p>

<p>The total distance is then the sum of the distances of the orthogonal parts. Note that an orthogonal pair may consist of one empty and one non-empty string as well, such as "t" vs. "" in the example above.</p>
<p>Trouble is, there may be more than one way to slice the words:</p>
<p style="font-family: monospace">
b<span style="color: #e08020">a</span>r<span style="color: red">b</span>a<span style="color: #6060e0">ra</span><br/>
<span style="color: #e08020">a</span><span style="color: red">b</span><span style="color: #6060e0">ra</span>cadabra
</p>

<p style="font-family: monospace">
<span style="color: #e08020">b</span><span style="color: red">a</span><span style="color: #6060e0">r</span>bara<br/>
a<span style="color: #e08020">b</span>r<span style="color: red">a</span>cadab<span style="color: #6060e0">r</span>a
</p>

<p style="font-family: monospace">
b<span style="color: #e08020">a</span><span style="color: red">r</span><span style="color: #6060e0">b</span>a<span style="color: #a06060">ra</span><br/>
<span style="color: #e08020">a</span>b<span style="color: red">r</span>acada<span style="color: #6060e0">b</span><span style="color: #a06060">ra</span>
</p>

<p>and so on. The distances corresponding to these splits are 10, 11 and 8, respectively. The actual minimal distance is 6; discovering the correct split (or splits) is left as an exercise for the reader. The way my algorithm goes about it is a straightforward exhaustive trawling of the solution space. In pseudo-Python:</p>
<div class="codehilite"><pre><span class="n">def</span> <span class="n">distance</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span><span class="o">:</span>
  <span class="n">result</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">left</span><span class="p">.</span><span class="n">length</span><span class="p">,</span> <span class="n">right</span><span class="p">.</span><span class="n">length</span><span class="p">)</span>
  <span class="k">for</span> <span class="n">each</span> <span class="n">index</span> <span class="n">x</span> <span class="n">in</span> <span class="n">left</span><span class="o">:</span>
     <span class="n">letter</span> <span class="o">=</span> <span class="n">left</span><span class="p">[</span><span class="n">x</span><span class="p">]</span>
     <span class="k">for</span> <span class="n">each</span> <span class="n">location</span> <span class="n">y</span> <span class="n">of</span> <span class="n">letter</span> <span class="n">in</span> <span class="n">right</span><span class="o">:</span>
       <span class="n">subLeft</span>  <span class="o">=</span> <span class="n">left</span><span class="p">[</span><span class="n">x</span><span class="o">:</span><span class="p">]</span>
       <span class="n">subRight</span> <span class="o">=</span> <span class="n">right</span><span class="p">[</span><span class="n">y</span><span class="o">:</span><span class="p">]</span>
       <span class="n">beforeMatch</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
       <span class="n">match</span> <span class="o">=</span> <span class="n">length</span> <span class="n">of</span> <span class="n">the</span> <span class="n">identical</span> <span class="n">prefix</span> <span class="n">of</span> <span class="n">subLeft</span> <span class="n">and</span> <span class="n">subRight</span>
       <span class="n">afterMatch</span> <span class="o">=</span> <span class="n">distance</span><span class="p">(</span><span class="n">subLeft</span><span class="p">[</span><span class="n">match</span><span class="o">:</span><span class="p">],</span> <span class="n">subRight</span><span class="p">[</span><span class="n">match</span><span class="o">:</span><span class="p">])</span>
       <span class="n">newDistance</span> <span class="o">=</span> <span class="n">beforeMatch</span> <span class="o">+</span> <span class="n">afterMatch</span>
       <span class="n">result</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">newDistance</span><span class="p">)</span>
  <span class="k">return</span> <span class="n">result</span>
</pre></div>


<p>As you can see, it's a recursive function not amenable to tail-call optimization so it's prone to overflowing the stack, among other things. There's a ton of potential for performance improvement. One thing I actually have in my implementation is that when beforeMatch &gt;= result I don't go into recursion as it can't possibly produce a lower newDistance. This mini-optimization is omitted from the pseudo-code for clarity.</p>
<p>Other than that, proper ordering seems to be the key. The algorithm is asymmetric in that it "starts from" the left word and tries to "reach" the right one. Should it always start from the shorter word or from the longer one? Or from the word with fewer unique letters or more unique letters? Should the letters with most locations in the starting word be tried first? Or the ones closest to its beginning?</p>
<p>A proper complexity analysis (or a robust set of test pairs with known Levenshtein distances) would answer those questions, increasing the likelihood of encountering good matches early, cutting off bigger branches of the search tree. Alas, I have no time for such work, no matter how much fun it would be and how much I'd learn from it. I've solved the puzzle itself and the optimization paths are all well trodden by more capable explorers, I'm sure. Also, given that my solution completely ignores the distribution of letters in the "target" word, there's bound to be a fundamentally better, more symmetric approach. I'm looking forward to reading up on it :-)</p>
<p>My original implementation is in JavaScript with a <a href="./static/snippets/levenshtein.html">web page</a> for easy testing of correctness. I later re-wrote it as command-line scripts in <a href="./static/snippets/levenshtein.js">node.js</a>, <a href="./static/snippets/levenshtein.py">Python</a> and <a href="./static/snippets/levenshtein.go">Go</a> in order to compare performance. Surprisingly, Go seems to be only about 33 % faster than both node.js and Python. Mind you, I don't know any of those languages intimately enough to be sure I didn't screw something up performance-wise so the comparison is very rough and not serious in any way. Tasting the different langages' flavors was great fun, though, and I'm itching for C and Drools implementations if I find the time. A functional variant in Scala or Clojure would also be nice but swapping between the imperative and functional mind-sets is <em>really</em> time-consuming.</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="software-as-knowledge-repository"></a>
      <a href="./software-as-knowledge-repository.html" rel="bookmark" title="Permalink to Software as knowledge repository">Software as knowledge repository</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-05-27T16:30:00"> Mon 27 May 2013 </abbr> under
        <a href="./tag/craftsmanship.html">craftsmanship</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p><em><strong>Summary</strong> Software is the most relevant representation of knowledge about a business. It is crucial that this knowledge be accessible to decision-makers. When managers resent refactoring as "not adding business value", they overlook the value of software as a knowledge repository.</em></p>
<p>Software is the most relevant representation of knowledge about a business. Although it only implements automated workflows, those form an ever-growing part of business activities. Source code is the only 100% accurate and up-to-date documentation of such workflows and often reveals a lot about manual processes as well.</p>
<p>More importantly, software is <em>executable</em> knowledge. It doesn't simply describe automated processes - software <em>becomes</em> the process once running in a production environment.</p>
<h4>The problem of access</h4>
<p>It follows that a decision-maker trying to truly master the processes in a business needs fast, flexible and efficient access to software-bound knowledge. What's more, she needs <em>read-write</em> access in order to augment, extend and deepen the knowledge. It's a non-trivial requirement:</p>
<ul>
<li>In software, business-specific knowledge is intertwined with lots of other concerns (infrastructure, ergonomics, security, governance, ...).</li>
<li>Source code addresses several audiences at once (end user, other programmers, compiler).</li>
<li>Software tends to be heterogeneous even in modestly-sized businesses (different authors, different programming languages and platforms, ...).</li>
<li>Software has to be precise enough for a computer to execute (i.e. extremely precise by human measure).</li>
</ul>
<p>The way software is usually managed presents additional difficulties. Business knowledge gets "baked in" at various stages of a complicated dance involving decision-makers, analysts, architects, programmers, testers - oh, and hopefully also end users. The transformation progresses via specification documents, UML diagrams, Scrum stories, bug reports or what have you. Given sufficient skill and will on all sides, there is reasonable certainty that the software does what decision-makers intend it to do. The process is neither flexible, fast nor efficient, however. Improving this situation surely means competitive advantage.</p>
<h4>Distilling knowledge</h4>
<p>Many approaches have been tried to bring stakeholders closer to software (and many more will be tried as software grows more important). They often start by isolating business-specific parts of a program (a.k.a. "business logic") from the rest. The idea is sensible enough yet fraught with pitfalls:</p>
<ul>
<li>The boundary between generic and business-specific code is often blurred.</li>
<li>Business-specific algorithms can exist at many levels of abstraction.</li>
<li>Local peculiarities creep into unforeseen places (masked as validation criteria etc.).</li>
<li>Isolating parameters is much easier than isolating an algorithm.</li>
</ul>
<p>The last point is especially relevant. Many businesses end up with an agreed set of easy-to-tweak configuration settings, complemented by a change-request process that supposedly balances flexibility with maintainability. Such a set-up makes it all too easy to make the wrong trade-offs under schedule pressure. Another possible outcome is programming by configuration, i.e. spiking a program with so many parameters as to make both the code and the configuration extremely complex (this is the shadowy domain of SAP consultants).</p>
<h4>Matters of presentation</h4>
<p>When it comes to giving decision-makers useful access, extracting algorithms as well as parameters is clearly essential but won't suffice unless the CEO is fluent in the language in which the code is written. Making business logic palatable to non-technical folks is therefore a major area of activity featuring its own array of methods and tools, with <a href="http://en.wikipedia.org/wiki/Business_process_management">Business Process Management</a> (BPM) apparently in vogue these days, complemented by <a href="http://www.brcommunity.com/">Business Rule Mangement</a> (BRM).</p>
<p>Such tools are definitely useful but they only go part of the way. Even with declarative approaches such as BRM, understanding software requires algorithmic thinking. It's a hurdle no programming language nor diagramming technique will overcome. Algorithmic thinking is a skill and can be learned, however. The recent debate regarding <a href="http://www.digitaltrends.com/computing/why-learning-to-code-is-not-just-a-horrible-trend/">whether everyone should learn to code</a> is, at its core, about letting people comprehend the world around them as software pervades it. The concern is especially relevant for business decision-makers.</p>
<p>It will take a generation shift (maybe several) for algorithmic thinking to become common among non-programmers. Mediation between decision-makers and software is therefore poised to remain a feature of the business world for the foreseeable future. As mentioned, the process tends to be rather chaotic in practice and varies immensely between businesses. What's universal is the ever-stronger imperative to make it fast, flexible and efficient.</p>
<h4>Perils of mediation</h4>
<p>This is where agile methodologies enter the picture. <a href="http://agilemanifesto.org/">The Agile Manifesto</a> reads like a sigh of frustration with the communication barriers between programmers and other stakeholders. Yet it tackles precisely the interface between decision-makers and the software they pay for - which, as is hopefully clear by now, is a really tough nut to crack.</p>
<p>Agile has endured some <a href="http://www.ambysoft.com/certification/scam.html">backlash</a> over the years but it represents great progress nevertheless. It clarifies that there are aspects to developing software that don't have technical solutions. That's always been obvious to managers but putting it in words programmers can understand was certainly commendable. Overall, agile practices do result in <a href="http://www.ibm.com/developerworks/rational/agile/agile-survey/">improvements</a> when applied properly.</p>
<p>The most common cause of failure in agile projects seems to be misunderstanding and mistrust of the <a href="http://agilemanifesto.org/principles.html">agile principles</a> on the part of key stakeholders. The principles are thus not applied consistently and the project ends up a half-hearted faux-agile mess.</p>
<p>Two agile values are particularly difficult yet vital from the perspective of software as knowledge repository:</p>
<ul>
<li><strong>Working software</strong> tends to be misunderstood by non-programmers - they don't appreciate how fragile working software is and how easy it is to introduce bugs. Hence the ignorance of test coverage and the general disregard for software quality in the presence of other priorities.</li>
<li><strong>Responding to change</strong> tends to be misinterpreted by programmers and non-programmers alike as giving in to schedule pressure. Requirement analysis becomes sloppy, necessary maintenance is skipped.</li>
</ul>
<p>Both pitfalls lead to low-quality code that makes business-specific knowledge catastrophically opaque. The problem can arise surprisingly quickly and the consequences are serious. Requirements are not met, changes take lots of time and introduce bizarre bugs, developers quitting cause undue disruption - a typical project in disarray.</p>
<h4>Refactoring to the rescue</h4>
<p>The agile practice of <a href="http://refactoring.com/">refactoring</a> aims to achieve high software quality without slowing down the pace of development, even increasing it. Briefly, refactoring means changing software in small incremental steps and confirming the changes through automated test runs (the test runs ensure the behavior of the software stays the same as its structure improves).</p>
<p>Refactoring is intended to be used early and often, preventing the build-up of hard-to-understand code. The proposition is counter-intuitive because it sounds like extra work. As a result, many teams never adopt the practice. It's a shame: refactoring does speed up coding by slashing the cognitive load imposed on the programmer. In other words, there are far fewer things to worry about when modifications are modest and unit tests are robust.</p>
<h4>Why no-one does it</h4>
<p>What happens in practice is that software quality is left to atrophy until a developer comes to a decision-maker and says "Hey, we <em>really</em> need to refactor this." Time is alotted but it's not enough so big chunks of the code are cleaned up haphazardly with no test coverage. A flurry of new bugs appears and refactoring gets another blemish on its reputation even though it was never brought to bear. Overcoming this anti-pattern requires adaptation on both sides:</p>
<ul>
<li>Software developers must get serious practical exposure to true refactoring as part of their training. The impact has to be experienced to be appreciated.</li>
<li>Decision-makers must admit to themselves how critical sotware is to their organization and stop treating it as a "cost center" or an obstacle to achieving business objectives.</li>
</ul>
<h4>Why you should do it</h4>
<p>A well-maintained body of code is valuable whether or not it's directly readable by non-technical decision-makers. Knowledge is at hand, answers are quick, changes are smooth. The company becomes nimble in a way it never could without automation. At a software startup, this notion forms the core of its competitive strategy - that's why working at a startup is so alluring to programmers. It also drives startups' disruptive potential. Decision-makers at conventional companies face the tough task of integrating the value of software into their cultures even if they don't compete with startups. <a href="http://www.businessspectator.com.au/article/2012/12/18/technology/app-assault-taxi-monopoly">Once they do</a> it may be too late.</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="reputation-as-a-measure-of-success"></a>
      <a href="./reputation-as-a-measure-of-success.html" rel="bookmark" title="Permalink to Reputation as a measure of success">Reputation as a measure of success</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-05-15T06:31:00"> Wed 15 May 2013 </abbr> under
        <a href="./tag/craftsmanship.html">craftsmanship</a>,        <a href="./tag/motivation.html">motivation</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>I prefer definitions of success that don't involve exclusivity. The problem with exclusivity is that it doesn't scale. When success is defined as being "the best in the world", for example, the number of successful people is limited by the number of categories in which one can be "the best in the world". Many companies thus present themselves as "the global leader" in whatever absurd bombastic-sounding niche they dream up for themselves. In addition, exclusivity is ethically questionable - in the words of <a href="http://www.youtube.com/watch?v=Xt_YhSxjshY">Scatman John</a>, <em>"how can someone win when winning means that someone loses?"</em>.</p>
<p>I believe everyone deserves the confidence and satisfaction that comes with success. Definitions that don't require excluding anyone are preferable from this point of view. For a business, "being profitable" may be one such definition of success. "Growing consistently" may be another one, although growth does become exclusive once a market matures.</p>
<p>The other extreme - feel-good notions of "success" that don't require any effort - is even more problematic. Slight positive bias in one's self image is said to help achieve goals but the goals have to be there in the first place. (Interesting aside: does presence of goals indicate absence of success? Is success a state or a process? Why do we want to succeed, anyway?)</p>
<p>I became aware of these issues quite early in my youth and decided my definition of career success would be <em>"achieving respect in a community of competent professionals"</em>. This was before the Internet. I can now say I've been achieving this success through most of my career if the "community" is defined as one's workplace and its circle of competent professionals.</p>
<p>That's no longer enough. For years, I have been standing on the sidelines of the great community that is the Internet. I would love to achieve a measure of respect there but it's quite scary. As the Red Woman says, <a href="http://www.youtube.com/watch?v=_YmVI84iYOQ">the Net is vast and full of strangers</a>, many of them jerks or worse. Even the sub-Internet of "competent software development professionals" is vast and full of strangers, many of them jerks or worse.</p>
<p>This brings up an awkward fact: when a community becomes large enough, respect of peers becomes exclusive. Respecting someone requires being aware of their existence, achievements and other attributes. Awareness is a limited resource. In my team at work there are so few of us we can comfortably judge each other's competence and award respect to everyone who deserves it. On the internet, however, I compete for the respect of my peers just as they compete for mine.</p>
<p>What to do about this? It's obvious that my youthful definition of success was flawed as it didn't correspond to my own ethics. I need to formulate another definition fully immune from exclusion. Perhaps something like "creating works of high quality useful to customers and delightful to users", as mundane as that sounds. (Of course, the previous definition did mention "works of high quality" between the lines: it spoke of "competent professionals" rather than "gullible fools".)</p>
<p>Having said that, my craving for "respect" doesn't feel like a symptom of vanity. I'd say it reflects a pretty basic human need for acceptance within the group I identify with. When such acceptance is a scarcity I can either give up on being accepted, pursue the acceptance to the exclusion of others or choose a different, smaller community to participate in. I don't feel like giving up but both of the other options involve, well, talking to strangers. Oh my...</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="the-psychology-of-casual-street-cleaning"></a>
      <a href="./the-psychology-of-casual-street-cleaning.html" rel="bookmark" title="Permalink to The psychology of casual street cleaning">The psychology of casual street cleaning</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-05-11T00:21:00"> Sat 11 May 2013 </abbr> under
        <a href="./tag/motivation.html">motivation</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p><em><strong>Summary</strong> Picking up other people's trash is an empowering gesture that turns you from a whiner into a fixer.</em></p>
<p>I've developed a peculiar habit. Sometimes when I'm approaching our apartment block I scan the ground for small pieces of trash. If I spot one not too far from my trajectory I pick it up and put it where it belongs - in a nearby dumpster. This behavior is much less common than you might think. It's also surprisingly rewarding.</p>
<p>Street litter has always been a common problem here in Bratislava and it used to upset me. That's the prevailing attitude in this city. People don't like dirty streets and will tell you so when asked. Few, however, do anything about it. There are several psychological barriers:</p>
<ul>
<li>It is widely acknowledged that someone else should do the cleaning. Some - one might call them "conservatives" - would tell you it's up to those who did the littering. Others - "socialists" perhaps - would maintain it's the city administration's responsibility. Both notions are rather naive. The underlying attitude is that it's simply not fair that we, upstanding citizens who never litter the streets, should clean up after those who do.</li>
<li>When most people think of street cleaning they imagine removing <em>all</em> the litter from a substantial area. That's obviously a lot of work which needs many people if it's to be finished reasonably soon. Coordination is required and before you know it there's a project to manage.</li>
<li>Another mental block stems from the sheer futility of the effort. When a street does get cleaned up by municipal workers or by "spring cleaning" volunteers it doesn't take long for new litter to bloom. And it doesn't take a lot of litter to make a street look messy.</li>
<li>There is the simple unpleasantness of trash itself. Picking up someone else's cigarette butt and carrying it to a trash can means overcoming a hint of revulsion. It's ironic: the very feeling that motivates street cleaning also makes it difficult.</li>
</ul>
<p>Coping with these inhibitions is all about awareness. Street litter presents no immediate threat nor opportunity so it's blocked out by the subconscious most of the time, along with many other details of the urban exterior. The blocking takes work, however. Garbage is visually loud - it mostly consists of discarded packaging designed to stand out on store shelves. Navigating dirty streets thus incurs a subliminal mental cost we're mostly unaware of. Once we recognize the full magnitude of the cost we become more willing to deal with the problem.</p>
<p>Another thing to realize is that removing even a single piece of trash reduces the mental cost in a tangible way. The street may be quite as dirty as before but the piece we picked up must have caught our attention which means it was somehow "more important" than the rest of the environment, amplifying the cost reduction. The immediacy of the reduction gives it even more impact (the reptilian brain is a sucker for immediate rewards).</p>
<p>From a more long-term perspective, the effort to remove one piece of trash is a one-time investment which pays off every time we visit the affected place. This is a delayed reward further compromised by new garbage appearing all the time, so it's not very significant. What's more important is that if we experience the immediate reward often enough a habit starts to form, lowering the mental cost of the act itself and making the reward even more attractive. A virtuous cycle forms.</p>
<p>All of this speculation may sound rather abstract but the psychological benefit I've experienced is real and substantial. When I notice a piece of litter these days it doesn't bother me anymore. I either pick it up and dispose of it properly or concede to myself that it's too far off my path. There's a sober honesty and clarity about it which does feel liberating. At the same time, I get regular experiences of doing noticeable good with modest effort.</p>
<p>In conclusion, I can recommend casual street cleaning as a worthwhile activity (given proper sanitary precautions, of course). Next time you find yourself angry at the unknown hooligans, why not try undoing their carelessness? You will help yourself more than anyone else.</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="the-minimal-patch-fallacy"></a>
      <a href="./the-minimal-patch-fallacy.html" rel="bookmark" title="Permalink to The minimal patch fallacy">The minimal patch fallacy</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-04-25T21:10:00"> Thu 25 April 2013 </abbr> under
        <a href="./tag/craftsmanship.html">craftsmanship</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>One of the goals of refactoring is to improve the readability of code. I have noticed, however, that I sometimes forsake a refactoring in order to preserve the readability of the resulting changeset. It tends to happen when I'm making a sensitive change in a complicated spot - the point is to make future "detective work" easier. I have evolved this approach after delving into Mercurial history way too many times while trying to understand opaque code.</p>
<p>At first blush, it seems there is a genuine trade-off at play. A refactoring - by definition - introduces changes that don't impact the observable behavior of a program. Mixing such modifications into a bug-fixing or feature-implementing changeset necessarily obscures its purpose. Clarity of the code is at odds with the clarity of its history.</p>
<p>Closer inspection reveals that the trade-off is pure nonsense. <em>Code</em> gets executed - not history. The present code must hence be readable on its own, in its present context. The very need to consult version control only arises when there are code smells all over the place. The urge to keep history clean is thus nothing more than a symptom of fear - fear of disrupting the fragile mess the code has become. <em>The same fear refactoring works to obliterate.</em></p>
<p>Having said that, isolating refactorings in their own changesets is certainly a good idea - it makes the history clearer with no negative impact on the code. It may not be always possible, though (the need for a clean-up often arises in the middle of other work).</p>
<p>I fell prey to the minimal patch fallacy while working on an allegedly agile project. Despite the efforts of all involved there came a point when deadlines defeated test coverage, technical debt payments were postponed and all that remained were stand-ups. I started leaning on code history not long thereafter. Goes to show that cargo-cult agility ruins not just the code but also programming habits...</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="pimp-my-calendar-aftermath"></a>
      <a href="./pimp-my-calendar-aftermath.html" rel="bookmark" title="Permalink to PIMp my calendar: Aftermath">PIMp my calendar: Aftermath</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-04-21T18:55:00"> Sun 21 April 2013 </abbr> under
        <a href="./tag/it-misadventures.html">IT misadventures</a>,        <a href="./tag/android.html">Android</a>,        <a href="./tag/caldav.html">CalDAV</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>I spent some of my free time over the past few weeks trying to get the Calendar app on my Android phone to run with a non-Google and non-MS-Exchange account (see <a href="./pimp-my-calendar-part-1.html">part 1</a>, <a href="./pimp-my-calendar-part-2.html">part 2</a>, <a href="./pimp-my-calendar-part-3.html">part 3</a>, <a href="./pimp-my-calendar-part-4.html">part 4</a> and <a href="./pimp-my-calendar-part-5.html">part 5</a>). As with my <a href="./the-battle-of-the-c5280-aftermath.html">printer woes</a>, it turned out to be an enlightening excercise. I ended up running the <a href="http://radicale.org/">Radicale</a> CALDav server on my home server and Marten Gajda's <a href="http://dmfs.org/caldav/">CalDAV adapter</a> on the little machine. I discovered fun facts about Android and its development tools along the way (the highlight was discovering that changing the minimum API level in Eclipse ADT fails to trigger a re-build).</p>
<p>As for the lessons learned, I guess there is a balance to be struck when troubleshooting a technical issue. One approach involves poking and prodding the system, observing responses and formulating hypotheses. Another option is to study information <em>about</em> the system - source code, documentation, online forums and other resources. Combining these modes optimally is rather delicate and I do tend to err on the side of experimentation. The knowledge it yields may be overly specific and not very reusable but the process fosters a problem-solving mind-set that comes in handy in other situations as well. Besides, documentation may be out of date and, <a href="./pimp-my-calendar-part-4.html">as we've seen</a>, even source code may not be quite what it seems. In fact, what hindered me in resolving the calendaring problem was that I failed to recognize the extent of the system I was working with - namely that it included the development environment.</p> </div><!-- /.entry-content -->
  <hr/>
</article>
<article class="hentry">
  <header>
    <h3>
      <a name="pimp-my-calendar-part-5"></a>
      <a href="./pimp-my-calendar-part-5.html" rel="bookmark" title="Permalink to PIMp my calendar, part 5">PIMp my calendar, part 5</a>
    </h3>
  </header>
  <footer class="post-info">
      Published on <abbr class="published" title="2013-04-21T18:55:00"> Sun 21 April 2013 </abbr> under
        <a href="./tag/it-misadventures.html">IT misadventures</a>,        <a href="./tag/android.html">Android</a>,        <a href="./tag/caldav.html">CalDAV</a>      </footer><!-- /.post-info -->
  <div class="entry-content"> <p>The self-hosted calendaring solution I had been <a href="./pimp-my-calendar-part-4.html">trying to set up</a> turned out not to work due to an incompatibility between my Android phone and the CalDAV client I tried to run on it. My options at this point consisted of</p>
<ol>
<li>upgrading the firmware in my phone to Ice Cream Sandwich or higher</li>
<li>buying a new phone</li>
<li>porting the CalDAV Adapter to the unofficial and unsupported APIs</li>
<li>choosing a different calendaring solution</li>
</ol>
<p>My handset is a <a href="http://www.google.com/url?q=http://en.wikipedia.org/wiki/GeeksPhone_One">Geeksphone One</a> - a sturdy little machine well supported by Cyanogen Mod yet simply too weak to handle anything beyond 2.3.x. It's condemned to stay in the <a href="http://en.wikipedia.org/wiki/File:Android-dist-by-dessert.png">Gingerbread zombie army</a> until I retire it. Buying a new phone is plausible in the mid- to long-term but I do want a slide-out QWERTY keyboard. Looks like I'll have to overcome my allergy to <a href="http://www.gsmarena.com/sony_ericsson_xperia_pro-3779.php">Sony</a>.</p>
<p>I looked into back-porting the <a href="https://github.com/gggard/AndroidCaldavSyncAdapater">CalDAV adapter</a> to the unofficial APIs available in Gingerbread. It seemed like a ton of work with dubious benefits - especially when I found out the work had already been done. You see, the adapter I'd been working with (written by Gérald Garcia) was not the only one I'd found - there is also <a href="http://dmfs.org/caldav/">another adapter</a> written by Marten Gajda which I hadn't considered since it isn't distributed as open-source. It does work under Gingerbread, however, which made a proper impression on me given everything I knew at this stage.</p>
<p>I ended up doing something I hadn't done in years - I purchased and installed a piece of closed-source software. One thing that convinced me was Marten's website which is simple and sticks to the point; the same can be said of the software. Unfortunately, even with a bona-fide polished product on the Android side it wasn't smooth sailing. I had a Radicale instance installed on my notebook for debugging and it talked to the adapter just fine - unlike the home server. Both were running Radicale 0.7 by this point so I compared their OS-specific patches (using <em>apt-get src</em> on my Debian notebook and the ports tree on the OpenBSD home server). One of the Debian patches added automatic creation of calendars which was lacking in the OpenBSD version (this functionality is mentioned in current Radicale documentation but that supposedly refers to 0.7.1; the experiment with <a href="http://www.mozilla.org/projects/calendar/lightning/">Mozilla Lightning</a> in <a href="./pimp-my-calendar-part-3.html">part 3</a> had worked because the Debian version was involved). Porting the patch and installing from ports was a matter of minutes. After that, <strong>everything worked like a charm</strong>.</p> </div><!-- /.entry-content -->
  <hr/>
</article>
                        <a href="./index.html">&laquo;</a>
                Page 2 / 6
            <a href="./index3.html">&raquo;</a>
    </section><!-- /#content -->
        <footer id="contentinfo" class="body">
                <address id="about" class="vcard body">
                Proudly powered by <a href="http://getpelican.com/">Pelican</a>,
                which takes great advantage of <a href="http://python.org">Python</a>.
                </address><!-- /#about -->
        </footer><!-- /#contentinfo -->
  </div><!-- /.main-column -->
  </div><!-- /.all -->
</body>
</html>