Dive deep into Kadane’s algorithm

I am writing this post as I could not find any reliable material that could clearly explain the intuition behind Kadane’s algorithm.

To start, here’s a (very) crisp explanation of Kadane’s algorithm from Wikipedia.

Kadane’s algorithm begins with a simple inductive question: if we know the maximum subarray sum ending at position $i$ (call this $B_i$ ), what is the maximum subarray sum ending at position $i+1$ (equivalently, what is $B_{i+1})$ ? The answer turns out to be relatively straightforward: either the maximum subarray sum ending at position $i+1$ includes the maximum subarray sum ending at position $i$ as a prefix, or it doesn’t (equivalently $B_{i+1}=max(A_{i+1}, A_{i+1} + B_i)$ , where $A_{i+1}$ is the element at index $i+1$

Unfortunately, when you see this explanation, it is hard to convince yourself why the algorithm works, but once I prove it to you, then it’ll seem trivial. Here is my attempt to make this explanation more easy to digest.

Continue reading “Dive deep into Kadane’s algorithm” →

Proof for the divisibility test of 3

This post is a bit mathematical, and talks about the divisibility test of 3 we all learnt in high school. As I’m not aiming for brevity but for completeness and ease of explanation let me remind you all of the theorem involved to test the divisibility of a number by 3 and then move onto its proof.

To test the divisibility of a number N by 3, recursively sum the digits that make up the number till you reach a manageable number whose divisibility of 3 can be inferred trivially.

For example if we’re testing the divisibility of 3423 by 3, we’ll add up 3+4+2+3 and check if that sum is divisible by 3. We know that 12 is divisible by 3, hence the number 3423 is divisible by 3 as well . If we did not know that 12 was divisible by 3 we would have summed up the digits of 12 and and try to see if that sum is divisible by 3 and so on.

Proof for theorem:

The main crux of the proof is representing the number as a specific set of additions and multiplications and your work is almost done. If we have a number $N$ with $n$ digits $a_{1}a_{2} \ldots a_{n}$ , then the number $N$ can also be represented as follows

$N = a_1 \cdot 10^{n-1} + a_2 \cdot 10^{n-2} + ... + a_n \cdot 1$

With out proof in mind let’s separate the sum of all the digits of $N$ i.e, $a_1 + a_2 + \ldots + a_n$ and see what we get.

$N = \left(\underbrace{99\ldots9}_{n\text{-times}} \cdot a_1 + \underbrace{99\ldots9}_{n-1\text{-times}} \cdot a_2 + \ldots 9 \cdot a_{n-1} \right) + \left( a_1 + a_2 + \ldots + a_n \right)$

Now if we can prove that the right glob we have here is divisible by 3 then our job is done (We’ll know that the if the sum of its digits is divisible by 3 then $N$ is divisible by 3).

Corollary #1:

A number with just 9 as its digits is divisble by 3

Proof for Corollary #1:

We know that any number which is divisible by 9 is divisible by 3 as well $\left(3 * 3 = 9\right)$ . So a number $M$ which has $m$ digits in it and all $m$ digits are 9 can be represented as

$\underbrace{99\ldots9}_{m\text{-times}} = 9 \cdot 10^{m-1} + 9 \cdot 10^{m-2} + ... + 9$

On further simplification we get

$\underbrace{99\ldots9}_{m\text{-times}} = 9 \left(10^{m-1} + 10^{m-2} + ... + 1\right)$

The above proves the corollary. With the corollary proven we can concur that the sum of the digits of a number should also be divisible by 3 if a number is divisible by 3 and vice versa.

Acknowledgement: I first saw this proof on the One Mathematical Cat website, but the instructor just proved it for a specific case of 3 digits. This was my effort in generalizing the proof for any arbitrary number of digits.

Deploy Meteor apps on Bluemix in 5 easy steps

All IBMers have free access to Bluemix, unfortunately Bluemix does support the Meteor buildpack by default (Node.js app). Here’s how you can work around that limitation in 5 simple steps

I’m assuming that you’ve logged into Bluemix via the CloudFoundry command line interface

Step #1: Go to the Meteor application directory on your local machine

cd path/to/meteor/app/dir

Step #2: Add .cfignore file to ignore downloaded packages associated with this app (This will be downloaded again while deploying Meteor, so no need to upload them).

echo ".meteor/local" >> ./.cfignore

Step #3: Create a MongoLab service (specifically go for MongoLab and not any other MongoDB provider). At the time of this writing Bluemix did not have a MongoLab instance in Europe (eu-gb) so you will probably be able to deploy your Meteor app only in North America (US-South).

cf create-service mongolab sandbox

Step #4: Bind the created Mongolab service to this application

cf bind-service

Step #5: Push the app with Ben Cox‘s Meteor buildpack for Bluemix.

cf push  -b https://github.com/ind1go/bluemix-buildpack-meteor.git

That’s it. Wait for the push to succeed, you should see your Meteor application up an running on Bluemix!

Dynamic Arrays: Facts and figures

The one problem we face with arrays is that its size is fixed. We can run out of space once the array has been filled up. This is pretty bad in production as we cannot really estimate the size of the array beforehand in many situations. Of course, we could allocate a huge array but we’d be wasting resources at the end of the day if they’re not populated. Sounds like a chicken and an egg problem.. How would we want to solve it?

The answer is dynamic array allocation. The concept is pretty simple. While instantiating a dynamic array, initially allocate a considerable size $N$ (for example, 10 as done in Java). Then once we run out of space we double its size from $N$ to $2 \cdot N$ . To delve a bit deeper into it, we first allocate an array of size $2 \cdot N$ , then copy the $N$ elements over to the newly allocated array. Then we give up the older array’s to the memory management system after making the array reference point to the new one. The figure below illustrates the state of the array before, during and after re-sizing the array.

Figure: Dynamic Array Re-sizing (click image to enlarge)

It can be noted that the same procedure holds good while we have too much space with us, once the array is only filled to $N/2$ , it gets re-sized to this number, instead of holding up space for all the $N$ elements.

Now let’s get into the interesting part, algorithmic complexity. For simplicity, let’s assume that the initial size of the array is $1$ .

The first question to be asked here is how many times do we have to double the array to reach a final capacity of $N$ . Before thinking in abstract terms, let us talk some numbers. Suppose if the capacity of my dynamic array is 16 right now, how many times have I doubled the capacity of my array? We’ve doubled 4 times (1 to 2, 2 to 4, 4 to 8, 8 to 16). Thus we can conlude that to reach the capacity of $N$ we must have doubled $log_2 N$ times.

The question to be asked next is how many times have we re-copied elements into a new array the original array is currently at capacity of $N$ and we’ve just re-sized to a capacity of $2 \cdot N$ ? Half of the elements of the original array has moved once, the quarter of the original array have moved twice, and so on. When we formulate this as a summation $S$ we get:

$S = \sum_{i=1}^{\log N}{i \cdot n / 2^i} = n \cdot \sum_{i=1}^{\log N}{i / 2^i} < N \cdot \sum_{i=1}^{\log \infty}{i / 2^i} = 2 \cdot N$

Notice that the we’re talking about the total number of movements (re-copies) of elements rather than how many times an element moves. This is useful because, when we look at $N$ insertions as a whole, we’ve just spent $2 \cdot N$ amount of work. This is really cool because, creating and managing a dynamic array framework is still guarenteed to be $O\left(N\right)$ . Such a guarantee/cost analysis is formally known as an amortized guarantee/analysis depending on the context.

The intuition behind computing combinations recursively (N choose K)

Note: I have rewritten this article to make it more straightforward, and also I have corrected a lot of grammatical errors.

With this article, I plan to explain the intuition behind the following “recursive” formula.

${N \choose k}={N-1 \choose k-1}+{N-1 \choose k}$
or
$C(N,k) = C(N-1,k-1) + C(N-1,k)$

Let’s start by building a mental model, imagine that you have two baskets with you. Let’s call the first as $B_1$ , has a capacity of $N$ and is full. The second, $B_2$ has a capacity $k$ (for clarity, $k \leq N$ ) and empty. For simplicity, let’s assume that all items are distinct and distinguishable.

The idea of choosing $k$ items from $N$ is nothing but counting the number of ways you can pick $k$ items from $B_1$ and put it in $B_2$ .

Another critical thing to note is that, while computing combinations, the order of how you choose those $k$ items is unimportant, the only thing that matters is what items you choose. Choosing $k$ items from $N$ is mathematically denoted as ${N \choose k}$ and I shall be using this notation from now on.

One way to approach a problem recursively is to see if it makes sense to decrease the problem size by one and delegate the problem to someone else. Let’s do just that, and pick an item out of $B_1$ . Now you can either put it in $B_2$ or keep it aside. Each choice leads to interesting consequences:

Case 1: The item was placed in $B_2$ , so now $B_1$ has $N-1$ items in it, and you can only put $k-1$ items in $B_2$ . So now you have to choose $k-1$ items from $N-1$ items or in other words ${N-1 \choose k-1}$ .

Case 2: The item was placed aside (not in $B_2$ ), consequently $B_1$ has $N-1$ items in it, but $B_2$ is still empty and you can still put $k$ items in it. This translates to you having to choose $k$ items from $N-1$ items or in other words ${N-1 \choose k}$ .

We can conclude that the number of ways of choosing $k$ items from $N$ is the sum of:

Number of ways of choosing $k-1$ items from $N-1$
Number of ways of choosing $k$ items from $N-1$ .

Thus we have derived the recursive equation, ${N \choose k}={N-1 \choose k-1}+{N-1 \choose k}$ .

Are we done? No, not yet. We still haven’t discussed another crucial thing when it comes to recursion, the base cases. Base cases are how the recursive formulae yield an outcome.

Q1) What is ${N \choose N}$ ?

A: We have only one way of choosing all the items from $B_1$ and putting them into $B_2$ . Hence, we can also conclude that ${1 \choose 1}$ is $1$ .

Q2) What is ${N \choose 1}$ ?

A: We have $N$ choices, hence $N$ ways of choosing one item from $N$ items.

Let us make how we stop our recursion a bit smarter by considering the following case:

Q3) What is ${N \choose N-k}$ ?

A: Choosing $k$ items out of $N$ is the same as not choosing $N-k$ items out of $N$ . Hence, you have ${N \choose k} = {N \choose N-k}$ . You can derive some interesting results out of this, here are a few

${N \choose 0} = {N \choose N} = 1$
${N \choose N-1} = {N \choose 1} = N$

Google App Engine: Memcaching for dummies example

Here’s a very simple code written in Python 2.7 and webapp2 on Google App Engine that does memcaching. It is possibly the easiest example I could think of, that I could implement memcaching on Google App Engine.

The main confusion which arises when one starts to look into memcaching is that Google fails to show that memecaching has to be done after instantiating a memcache client object.

The source code for this application is available here and is self explanatory.

Questions, comments, doubts are welcome.

Encoding Special Characters with URLLib Quote Function in Python

I was actually testing my final year project today and there was so much noise in the data, I was really frustrated by the end of it.

One of the most troublesome and difficult to figure out was urllib.quote(movie) function.

You should see movie titles people update, here are a few.

I ♥ Bollywood, Funniest movies ever ツ

That really seemed a challenge to be sent over for an API call. We thought of stripping them off in the sentence, but there are a few French and Italian movies which always have some or the other odd character in them. I used quote() function and was getting KeyError exception.

Finally I figured it out, you have to encode it into UTF-8 so that they can be sent across. So while calling a URL, if it has any special characters in it, better encode it and sent it accross.

Example: (Google App Engine)

import urllib
from google.appengine.api import urlfetch

data = u'♥+ツ'
url = 'http://www.google.com?search='

response = urlfetch.fetch(url + urllib.quote(data.encode(encoding = 'UTF-8')))

if response.status_code == 200:
    output = response.content
    self.response.out.write(output)

I wasted almost an hour on figuring out what to do. If you ever get a KeyError when you are using URLLib.Quote, then this is the solution.

Google App Engine: Geo location identification and reverse geo-coding example

Today, I developed an application that asks users for their location (be it PC or cell phones) and then it displays the latitude and longitude (Geo-Location identification), it even looks up the address (Reverse Geocoding) from Google Maps API. I also store HTTP headers so that this information can be used to identify the device used (I’m working on it right now!)

Okay! This might not involve a lot of App Engine Code, just some basic code to store user location (datastore) and Users API from Google.

These things which are stored can be viewed in an Administrator Area which shows the last 100 peoples’ information.

But this has a lot of jQuery and JavaScript code in it! I know jQuery sounds fancy but just visit W3CSchools jQuery tutorial and finish it (it’ll not take you more than 5-10 minutes), you’ll understand (almost) every line of code.

What I’ve used:

Google App Engine (webapp2, Python 2.7)
jQuery
Google Maps V3 API

The code can be found here. It’s well commented and I expect you to know a bit of GAE (templates as well), jQuery is used a lot so a bit of that as well. Lookup Google Maps documentation if you don’t understand the code written for Google Maps.

N-Gram generation in Python

UPDATE – 24/03/2018: I’m in the process of rewriting this article. For those of you who can understand a bit of non-trivial Python code you can take a look at my GitHub repository for a more elegant implementation.

OUTDATED information from here…

I’ve written a very small code snippet that actually generates n-grams. I’ve also added a small tweak that gives us the number of times a n-gram has appeared in the document.
The example I’ve considered is a Shakespeare’s play (All is Well that Ends Well). I’ll be generating the most common 3,4,5 or 6 word phrases that were used by Shakespeare in this particular play.
The first thing to do is cleaning up the document. Removing stuff like ACT1, SCENE 1, [To Derpina] etc. The next step is tokenising the document (splitting the document into tokens by stripping punctuations and white spaces).
Now we get into action:

#By now you should have a list of the words in the file
#There should not be unnecessary punctuation marks in the end
#of the words or any unnecessary white spaces as well.

#now word_list contains a list, generate a n-gram
#print word_list

#n for n-gram
#Change it to whatever the requirement is
n = 6

ngrams = dict()

#create an n-gram list
for i in range(len(word_list) - n + 1):

    gram = tuple(word_list[i:i+n])

    if gram in ngrams:
        ngrams[gram] += 1
    else:
        ngrams[gram] = 1

#now ngrams contains all the ngrams of the book
sorted_ngrams = sorted(ngrams.iteritems(), key = operator.itemgetter(1), reverse = True)

Okay! this is the only working part of this program that needs to be explained. I believe the the code is self-explanatory if you know a bit of Python.
The source code can be found in my repository .

Google App Engine: Facebook server side authentication

When you’re working with devices without JavaScript or in cases where the user has disabled JavaScript you’ll need to work out strategies to do your stuff on the server-side. A classic case is form validation using JavaScript, which fails when you don’t have a server side verification system and allows users to post junk onto your site.

I’ve just converted a small script shown in this example on Facebook’s site that does the same thing, but on PHP. This code closely emulates the one that leads to the link above.

Another case you might need this is because Facebook has scrapped its Python API support since OAuth 2.0 has been introduced.

There are some limitations to this application, one of which is as Facebook prevents POST headers, you can’t integrate this method into a canvas.
It builds upon the sessions example I’ve covered in the previous post.

The build up of the program is as follows:

There are 3 files:

state_variable.py : A class that generate state variable – a unique combination of 13/27 characters that Facebook generates when you make a request. the variable ‘code’ in the URL of a Facebook application.
session_module.py: This is a class that handles the sessions. It must be inherited by any class that uses sessions. Refer to the post (the link) which covers it for more information on it.
main.py: This is the main Python program that Google App Engine handles. The two step procedure to check whether the URL has the ‘code’ request variable in the request header if so continue with OAuth authentication, else redirect the user to the Facebook page on which he gives permissions/authenticates the application.

It really is a primitive code for sandboxing purposes, you’ll need to refine the code with exception handling and other stuff to actually deploy it.

Any bugs/suggestions are always welcome. I really don’t know if I’ll be working on this in any near future. I just wanted this application in public, so that if anybody needs it they can build upon this code.

The link to the code repository is here.

thirumal's blog

thoughts on systems programming, math and what not.