Practical Cloud Computing: August 2013

Saturday, August 31, 2013

Social Applications that Live in Clouds and Help People Link Their Work

Social Applications are not Social Networks, especially not those that are popular today. Social apps are simple applications that do their work socially -- normally by facilitating social collaboration. Note that there are no such apps today.

You can see my previous posts about social collaboration here and here. In fact, one app that welcomes social collaboration is still running in the wild ... wide open to the public ... with not a single client other than my own in the past few weeks. Let's put social collaboration itself aside.

Consider this app:

It lives in the cloud because it is hosted by, say, Google Drive (or Dropbox, or some other) and keeps its output in the same place in the cloud it came from. In order to make it work you need two things:

(1) A solution to the Stringex Problem, which I wrote about before, preferably as a working software client.

(2) People willing to participate and form the tree in the figure in a bottom-up fashion. Note that there are Miners and Mappers.

The problem is that (2) creates security problems. Specifically, people might be afraid to open up their data to the public. This fear is not completely ungrounded. The Dropbox API, for example, leaves your public spaces wide open, which means that anyone with the keys can overwrite, erase, add, or do anything else with your data. This is why most such apps limit their scope to the user's own Dropbox account. You basically write to your own account. On the other hand, Social Apps NEED to share their public spaces.

It can still be done, as long as you provide a separate PRIVATE space which no one else can touch and develop a SMART SYNC. The smartness of the sync should be judged the same way Wikipedia does it -- it should be much easier to revert malicious changes than to make them.
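One way to picture the "easier to revert than to make" property is an append-only history kept in the private space. This is only a rough sketch under that assumption: RevisionLog, commit and revert are hypothetical names made up for illustration, not part of the Dropbox API or of any existing sync client.

```python
from dataclasses import dataclass, field


@dataclass
class RevisionLog:
    """Append-only history of one shared file, kept in the private space (hypothetical)."""
    revisions: list = field(default_factory=list)

    def commit(self, author: str, content: str) -> int:
        # Every write coming from the public space becomes a new revision;
        # nothing is ever overwritten in place.
        self.revisions.append((author, content))
        return len(self.revisions) - 1

    def current(self) -> str:
        # The newest revision is the visible state of the file.
        return self.revisions[-1][1] if self.revisions else ""

    def revert(self, to_revision: int) -> int:
        # Reverting is a single call: re-commit an older revision as the newest,
        # so the bad change stays in the history but stops being current.
        _, content = self.revisions[to_revision]
        return self.commit("revert", content)


log = RevisionLog()
good = log.commit("alice", "route A-B stored")
log.commit("mallory", "")   # a malicious wipe of the shared data
log.revert(good)            # one step undoes it
print(log.current())
```

The point of the design is the asymmetry: the attacker has to produce a change, while the owner only has to name the last good revision.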

Open problem which I am currently trying to close...

Tuesday, August 27, 2013

Passwordless SSH Login in Fedora 18+

Been having this problem for a while. Most of my machines used to be either cloud stacks (like XCP) or FC16 machines (traffic capture) and I had no problem with them. For someone who runs automation across machines, the inability to log in over SSH without a password is a huge problem. Not because of SSH per se, but because RSYNC over SSH is probably the safest way to sync files across machines, and if each RSYNC keeps asking you for the password ... then you cannot get anything done.

Found several places where this was discussed. But the actual solution was much simpler.

> vi /etc/ssh/sshd_config

#MaxAuthTries 6
#MaxSessions 10

#RSAAuthentication yes
PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
#AuthorizedKeysFile .ssh/authorized_keys

#AuthorizedKeysCommand none

The key is authorized_keys2. It looks like FC18 forces SSH to read the *_keys file instead of *_keys2. Since I was using the customary *_keys2 file, I could not log in without the password. As soon as I commented out the AuthorizedKeysFile line, sshd went back to its default of checking both files, and I got my passwordless login.
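To check which key files a given config would make sshd read, a small parser along these lines may help. It is a simplified sketch, not a full sshd_config parser: the function name is my own, it ignores Match blocks, and the fallback list is the default described in the config comments above.

```python
def authorized_keys_files(sshd_config_text: str) -> list:
    """Return the key files named by the first active AuthorizedKeysFile line."""
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and commented-out directives.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        # Keywords are case-insensitive; the first occurrence is used.
        if parts[0].lower() == "authorizedkeysfile":
            return parts[1:]
    # No active directive: assume the default of checking both files.
    return [".ssh/authorized_keys", ".ssh/authorized_keys2"]


restricted = "PubkeyAuthentication yes\nAuthorizedKeysFile .ssh/authorized_keys\n"
commented = "PubkeyAuthentication yes\n#AuthorizedKeysFile .ssh/authorized_keys\n"

print(authorized_keys_files(restricted))
print(authorized_keys_files(commented))
```

With the directive active only authorized_keys is listed; with it commented out, authorized_keys2 is back in the list.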

Weird how things change sometimes.

Thursday, August 22, 2013

The Stringex Problem: Client-Side Indexing in Clouds

Hadoop and Lucene-style indexing all sound nice, until you stumble upon a new practical use case (read: problem) and need to build a webapp which needs indexing on the client side (read: browser) in realtime (read: continuously). That's when you run into the Stringex Problem. Hint: When you need to access the data you keep on the cloud, you need to mind the size of the hole through the membrane (read: API capability).

ArcGIS or GoogleMapsAPI?

The problem earlier expressed in this blog and solved in this webapp keeps nagging me.

Hypothetical: I am sitting in, say, a Toyota car and using its navigator to find how to get from A to B. I can enter anything I want as A or B -- addresses, telephone numbers, names of shops (which in Japan include region/city/etc. names). The navigator then calculates the route and shows it to me.

Now, say I need to build a robot that performs this unit action but does it many, many times. For 100 locations I would need ~5K route requests. As I explained earlier, I need it to build a graph. You can loosely call it a dataset. Since such datasets do not exist in nature -- they are neither maps nor traditional GIS formats (vector or otherwise) -- the datasets have to be created from scratch. Note that what ArcGIS calls a network is not really a graph -- it is more like a set of loosely-coupled pair-wise routes.
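The ~5K figure comes from counting every unordered pair of locations once. A minimal sketch, assuming the route from A to B is reused for B to A (route_pairs and the place names are just illustrative):

```python
from itertools import combinations


def route_pairs(locations):
    """Every unordered pair of locations, i.e. one route request per pair."""
    return list(combinations(locations, 2))


locations = [f"place-{i}" for i in range(100)]
pairs = route_pairs(locations)
print(len(pairs))  # 100 * 99 / 2 = 4950
```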

Now, consider this:

(1) Toyota Navigators, or actually their DVD ROMs, are in a proprietary format. Basically you cannot use them.
(2) ArcGIS can actually find addresses and you can use ArcScripts to find routes between places (see this example), but it is only possible in the paid ArcGIS Desktop version of the software. However, if you have the paid version, you can load it with various data and use it offline.
(3) GoogleMapsAPI does all that in one package. It provides the scripts (API) that find you the route between two locations. It is as strong as a car navigator in terms of being able to find places based on irregular string identifiers. When the API returns the route, it is split into laps (roads?) and each is provided with the coordinates of its intersections. Each road also has a name. The problem is the daily quota in the free access to the API.
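Turning one returned route into graph edges could look roughly like the sketch below. It assumes a response shaped like the JSON of the Google Directions web service, where a route holds legs, each leg holds steps, and each step carries start_location, end_location and distance. The function name and the sample values are invented for illustration.

```python
def route_to_edges(route: dict) -> list:
    """Flatten one route into (start, end, metres) edges, one per step."""
    edges = []
    for leg in route["legs"]:
        for step in leg["steps"]:
            start = (step["start_location"]["lat"], step["start_location"]["lng"])
            end = (step["end_location"]["lat"], step["end_location"]["lng"])
            edges.append((start, end, step["distance"]["value"]))
    return edges


# Made-up sample in the assumed shape; not real API output.
sample_route = {
    "legs": [{
        "steps": [
            {"start_location": {"lat": 35.0, "lng": 139.0},
             "end_location": {"lat": 35.001, "lng": 139.0},
             "distance": {"value": 120}},
            {"start_location": {"lat": 35.001, "lng": 139.0},
             "end_location": {"lat": 35.001, "lng": 139.002},
             "distance": {"value": 180}},
        ]
    }]
}

edges = route_to_edges(sample_route)
print(len(edges), sum(e[2] for e in edges))
```

Each step's end point is the next step's start point, so the shared coordinates act as the intersections (nodes) of the graph.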

The reason why this problem is nagging me is this. ArcGIS is a paid product -- it is actually not cheap. There is also ArcGIS Japan which probably has reliable maps of Japan. I can also overcome the Google Maps API quota problem by paying $9 a month to be able to issue 100K requests a day -- I can basically build a large dataset a day with this quota and I can stop paying when I am done (monthly contracts).

But. Not that I am cheap ... but I was really hoping that I could build an online community of collaborators who need the same kinds of datasets as I do and with whom I could work to possibly build a library of custom datasets. The pilot webapp that builds such a dataset has been kept wide open at this link for several days now, but the only collaborators I can see are my own clients.

Will wait/try some more ... just for the fun of it. If not, will have to resort to the paid version of Google Maps API.

#SteveJobsinity

#stevejobsinity is a type of technological trend in which a technology is stripped of advanced features in order to simplify the product and thus maximize its reception by the general public.

Hate Steve Jobs. Hate the late victim of a fracking incident enough to #hashtag his persisting existence.

Not for what he did -- or did not do, actually -- but for what he represents and what is now attracting the biggest following in the technical world.

Because of #stevejobsinity -- now this time I refer to the mode in which the technical world is operating these days -- good ideas are left unnoticed ... even before they were properly understood.

Let me give you an example. I was a happy owner of a 7" HTC Flyer for a while last year. I bought three more for 3 of my students at the time. All were happy. I was happy. Now let's test how deep the #stevejobsinity goes right now. I am the tester, you are the testee. Look at the picture below, follow this link with specs, take your time to study the information and come back beneath the picture to verify if you understood what idea was lost.

So? Did you get it? I only got it after about an hour of playing with the thing -- cause reading online did not help one bit. The key point is the stylus. But not its presence per se. You can find pens in lots of other devices now. I have four between me and my wife -- all Samsung Notes. It is not about the pen, it is about its technology.

HTC Flyer went and did something unconventional -- it separated the stream for the pen from the general input stream. You can still work it with your fingers and even those weird rubbery pens, but the pen stream is processed by the hardware separately. They actually had to tinker with Android (which is why you did not get OS updates on that thing) to put in a driver for the pen. But after it was done, the outcome was beautiful!

Now, you should probably guess that the pen would not work with just any application. Any new application you install (I use Papyrus for notes for example) would only work with your fingers and would not see the wonder-pen. But HTC covered all the basic uses -- Office, Notes application and a Foxit PDF Reader. The last application was used most often by me and my students. Annotation, naturally.

Now take this. When I try to use my Note for presentations, I end up using it only for slides and am forced to give up annotation. Simply because I cannot change pages (or do any fingering) when in annotation mode. It is the same no matter which PDF app I use -- Adobe Reader, ezPDF, RepliGO, etc. The same problem everywhere. Switching between annotation and viewing modes is clumsy and unfit for official presentations.

HTC Flyer could actually make presentation + annotation possible. The pen would automatically write on the PDF (freehand commenting) and fingering could switch pages, zoom, etc. I am not sure if you can picture this scenario ... but if you can, it should boggle your mind!
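In software terms, the idea is that the input source, not a mode switch, decides what happens. The toy sketch below shows that dispatch; the class names and event fields are invented for illustration and have nothing to do with the actual HTC driver or the Android input API.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    tool: str    # "pen" or "finger" (assumed labels)
    action: str  # e.g. "stroke", "swipe_left", "swipe_right"


class Presentation:
    def __init__(self):
        self.page = 1
        self.annotations = {}  # page number -> list of pen strokes

    def handle(self, event: InputEvent) -> None:
        if event.tool == "pen":
            # Pen input always annotates the current page.
            self.annotations.setdefault(self.page, []).append(event.action)
        elif event.action == "swipe_left":
            # Finger input always navigates.
            self.page += 1
        elif event.action == "swipe_right" and self.page > 1:
            self.page -= 1


talk = Presentation()
for ev in [InputEvent("pen", "stroke"),
           InputEvent("finger", "swipe_left"),
           InputEvent("pen", "stroke")]:
    talk.handle(ev)
print(talk.page, talk.annotations)
```

No annotation mode is ever entered or left: the two streams never compete for the same gesture.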

My current presentation machine is an Asus Taichi and my application is PDF Annotator. I still can't use annotation and paging modes together -- in fact, PDF Annotator does not recognize the difference between fingers and pens. So, I have to use an Air Mouse for paging. You can picture me with this air mouse in one hand and a pen in the other ... making a presentation. My own assessment of my equipped-ness is that I cannot possibly do better with present technology.

Sad that I could use HTC Flyer with equal success and more. Sad that I do not have the machine anymore. But even sadder is the fact that because of the #stevejobsinity of technological trends it is unlikely that HTC will invest further into this idea.

Tuesday, August 20, 2013

Google Maps as GIS, Road Graphs, and Social Collaboration

There is an active discussion on whether Google Maps is a GIS or not: http://highearthorbit.com/is-googlemaps-gis/

For me personally, GoogleMapsAPI wins this contest by a huge margin. The biggest win for Google's take on GIS is that it uses GPS coordinates -- specifically, you normally find the hash with { ih, kb, ...} keys ... I do not actually remember the keys but they are there and they actually represent the ... what was it? ... longitude and latitude of a coordinate. My knowledge of the meaning underlying the coordinates themselves is lame. I know it. I do not apologize for it. Simply because for me the physical meaning is not important, as long as I can plot them on a 2D or 3D visualization. In fact, I once plotted a worldwide AS-level network topology, randomly assigning X and Y, only to realize that the world was on its side. You could actually tell because the ISPs formed pretty good outlines of continents.

Now, all I need to know about GISes is the datatypes. By far the most popular datatype in traditional GIS is SHAPE (.shp). Sometimes these files are referred to as SHAPEFILES. You can read all about the format here, but I would recommend looking at the diagram from here:

Sorry for the Japanese, but I could not find an English diagram with the same level of visual clarity. The Wikipedia page definitely does not tell you that right away. The .SHP format is very simple:

(1) It is a vector format.
(2) It can be in 2D-mesh or 3D-mesh forms, where the 2D and 3D do not stand for coordinate dimensions (which is why Google search on the format will only confuse you), but rather precision of the coordinate itself (see figure above). Simply put, 2D-mesh gives you 10x10km dots and 3D-mesh gives you 1x1km dots.

I once officially requested a Tokyo road map from some official association, got their CD ... only to find out that the data was in 2D-mesh format. This meant that the center of Tokyo on the plot looked like a single dot. Literally!

That was the point at which I decided to just use Google Maps as my main GIS. YES, this improved the precision of my datasets, but NO, it did not solve all my problems -- the big problem comes further on.

I recently had to use Google Maps as a GIS to build road maps (graph structures) connecting certain places. The problem I stumbled upon is collaboration. Specifically, collaboration in building datasets -- not using them. The majority of GIS portals I know (including Google Maps Engine) are there for 2 purposes:

(1) to use a GIS dataset together
(2) to build a dataset based on the interface provided by the portal.

The simple purpose I am trying to pursue is left unfulfilled by both the above. Which is actually strange because quotas on requests exist for any existing API -- GoogleMapsAPI or ArcGIS, although it could be in slightly varied forms -- ArcGIS is paid from the start and has rather big quotas while GoogleMapsAPI has very restrictive quotas in its free form.

Why quotas? This is simple. Google as well as all other big players have to ration access to their Big Data. Imagine all free users having unlimited quotas to all Google APIs?! Specifically, a free GoogleMapsAPI client (per IP) has a daily quota of 2500 requests. This is key because ... say, if I have 300 locations and I need to build an undirected (A-B = B-A) graph among them, I need to make > 35K API requests. About a month for one client. About a day for 30 clients. About an hour if the crowd piles up and helps me with it ... on demand. The simple idea behind online collaboration on this topic is that I would borrow daily quotas from users who ... in their majority ... do not use much of their daily quotas anyway. Note that Google's Terms of Use do not prohibit such a use of its API.
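The arithmetic behind these estimates can be sketched as follows. It assumes the best case: each pair is requested exactly once, no request fails, and every client spends its full 2500-request quota on this job every day. Real runs with retries and partial quotas take longer. The function names are mine.

```python
import math


def pair_count(n_locations: int) -> int:
    """Number of unordered pairs, i.e. routes in an undirected graph."""
    return n_locations * (n_locations - 1) // 2


def days_needed(n_locations: int, clients: int, daily_quota: int = 2500) -> int:
    """Whole days needed when all clients pool their daily quotas."""
    return math.ceil(pair_count(n_locations) / (clients * daily_quota))


print(pair_count(300))       # 44850 requests
print(days_needed(300, 1))   # 18 days for a single client
print(days_needed(300, 30))  # 1 day for 30 clients
```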

For example:

http://t.co/3SmOedz2l0

is a simple serverless web application which uses each client's daily quota to suck routes from GoogleMapsAPI and store them in my Dropbox folder. All clients write to the same folder. Collisions are resolved by the web application. This is a very simple collaboration.
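One simple way to resolve such collisions is to give every location pair a single canonical file name and to skip pairs that are already stored. This is a sketch of the idea only, not the webapp's actual code: the dictionary stands in for the shared Dropbox folder, and pair_key and store_route are hypothetical names.

```python
def pair_key(a: str, b: str) -> str:
    """Same file name for A-B and B-A, since the graph is undirected."""
    first, second = sorted((a, b))
    return f"{first}__{second}.json"


def store_route(folder: dict, a: str, b: str, route: dict) -> bool:
    """Store the route unless another client already did; report whether we wrote."""
    key = pair_key(a, b)
    if key in folder:
        return False  # collision: keep the existing copy, save the quota
    folder[key] = route
    return True


shared_folder = {}  # stand-in for the shared folder
print(store_route(shared_folder, "Tokyo", "Osaka", {"steps": []}))  # True
print(store_route(shared_folder, "Osaka", "Tokyo", {"steps": []}))  # False
print(sorted(shared_folder))
```

A client can also check the key before issuing the API request at all, so a pair that someone else already fetched never costs quota twice.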

Note that its purpose is different from that of the default modes of ArcGIS or GoogleMaps (API or Engine) and cannot be created by either of them. I am surprised to find myself in the position that public collaboration on a (useful?) dataset is completely impossible just because it departs -- a little bit -- from the default use.

Let me go one step deeper into this rant. I tried posting a request on Twitter with useful hashtags like #googlemaps #googlemapsapi #datasets, etc. in three languages, only to find out later that my message did not show up in any of the hashtag streams. I do not know what criteria Twitter uses when it aggregates its hashtag streams, but it looks like posts saying "I really like Google Maps" are more worthy of inclusion in the stream than a meaningful post. In the end, my posts ended up viewed only by the few (3?) people I have as my direct followers. Not a very social person, obviously.

Now I am posting it here. Let's see if I can actually find a way to broadcast this message beyond the small social box in which I live.