Friday, December 27, 2013

How to Build a Robot in Browser

Robot, bot, browserbot, botser, botowser? I actually used a ReBot -- REcommendation Bot before. Many possible names, the same meaning -- you need a software robot running in your browser.

Why browser? Some of the reasons are:

REASON 1. The stuff you will feed to your robot is on the web -- something you view in your tabs or possibly something that your robot will open and read on its own (possible!, done that already!),
REASON 2. Final results of your robot's digestive tract (I avoid saying poop, obviously) are also web-based -- the closest example is storing your stuff in a cloud drive, where I personally use the Dropbox JS API which I wrote myself (comes with the code below).
REASON 3. You need your robot to possess maximum achievable compatibility, where web technologies are obviously the way to go. What did the Firefox guy recently say about Firefox OS release? -- "All other platforms are beautiful rose gardens surrounded by unreasonably high fences".
REASON 4. ... fill in with your personal reasons ... of which I have a couple but will not write them here to stay focused on the main topic.


So, what do you need to write a robot? Not much, it turns out. See the points below.

POINT 1. Use Chrome. See illustration below about how Chrome designs its extensions. Firefox and other browsers have their own designs, but I find Chrome the easiest to use. In fact, I failed to get an example Firefox extension to work in the first place. Does not indicate my stupidity... it indicates how clumsy the design is -- take my word for it. Besides, Chrome makes it easy to debug extensions. You can open consoles for each of the three components below -- float, inpage and background.





POINT 2. About extension components. Use them wisely, meaning "in accordance with their purpose". *.bg.js (background) will start running immediately when the extension is loaded. *.inpage.js will run on each page which matches your prefix. *.float.js will show the pretty (hopefully) GUI when the user clicks on the icon that shows up in the browser. Yes, you can create a pretty icon for your extension.

POINT 3. If you work with one URL prefix, it is easy -- just write the matching rule in your manifest.json. However, if your robot wants to digest many different pages (like mine does), then you need to write the matching rule as "matches":["http://*/*"] (exact line in manifest.json) and fork into individual processors from inside your *.inpage.js. Increases complexity, but totally worthwhile given the increased scope/coverage -- ultimately you are building a Swiss Army knife of web page parsing.
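
To make the "fork into individual processors" part concrete, here is a minimal sketch of what the top of an *.inpage.js could look like. The processor names and URL prefixes are made up for illustration (they are not from my extension), and "longest prefix wins" is just one reasonable rule, not the only one.

```javascript
// Hypothetical processors: prefixes and names are illustrative assumptions.
const processors = [
  { prefix: "http://example.com/", name: "generic-example" },
  { prefix: "http://example.com/articles/", name: "articles" },
  { prefix: "http://maps.example.org/", name: "maps" },
];

// Pick the processor whose prefix matches the page URL.
// The longest matching prefix wins, so specific rules beat general ones.
function pickProcessor(url) {
  let best = null;
  for (const p of processors) {
    if (url.startsWith(p.prefix) && (best === null || p.prefix.length > best.prefix.length)) {
      best = p;
    }
  }
  return best; // null means "this robot does not care about this page"
}

// Inside the content script one would then do something like:
//   const p = pickProcessor(window.location.href);
//   if (p) { /* run the parser registered under p.name */ }
```

With the catch-all matching rule every page loads this script, so returning null early for pages you do not care about keeps the overhead small.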

POINT 4. Do not be afraid to throw all the web technology you have at your problem. Specifically, note the following unique aspects of Chrome extensions. (1) You have DOM in all three components ... it would seem that there should be no DOM in *.bg.js but there is (!) one. DOM is important when you need to set timers to ... hm ... time/pace things such as Dropbox accesses, Google Map requests, etc. I use jQuery's Timer extension which needs a DOM to work. (2) You are free of the Same Origin policy inside your extensions (as long as the hosts are listed under permissions in manifest.json), so contact whatever service you want ... I use Dropbox for cloud-side storage, for example. (3) You can work with all the advanced features such as jQuery (load its JS from manifest.json), CANVAS/SVG, local storage, etc.
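
For the timing/pacing part, here is a rough sketch of a pacing queue built on plain setTimeout instead of the jQuery Timer extension I actually use. The name makePacer and the injectable schedule parameter are my own inventions for this example; the second parameter exists only so the thing can be tested without waiting for real timers.

```javascript
// Minimal pacing queue: runs queued tasks one at a time, intervalMs apart.
// `schedule` defaults to setTimeout and is injectable purely for testing.
function makePacer(intervalMs, schedule = setTimeout) {
  const queue = [];
  let running = false;

  function step() {
    const task = queue.shift();
    if (task === undefined) {
      running = false; // queue drained; the next add() restarts the chain
      return;
    }
    task();
    schedule(step, intervalMs); // wait before touching the next task
  }

  return {
    add(task) {
      queue.push(task);
      if (!running) {
        running = true;
        schedule(step, 0);
      }
    },
    pending: () => queue.length,
  };
}

// Usage idea in *.bg.js (the task body is up to you):
//   const pacer = makePacer(1000);
//   pacer.add(() => { /* one Dropbox or Google Maps request */ });
```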



That's pretty much it. You can see my working testcases at this GitHub repository. It is a ready-to-use Chrome extension which you can load in its current form -- just point Chrome to this folder and tell it to load the extension. The icon should pop up in the bar immediately. Obviously I have no background or inpage scripts, but the testcases will show you where I am shooting at.

I will not go into details about the serverless.js file which contains all my custom components including those of GUI nature. The extension has several working testcases. Some of them are not finished yet ... like Stringex and possibly CloudStorage, but the rest will work. I am particularly proud of NICECOVERring and SidePaneStack components which I will use as low-level GUI components. Those definitely work and you are free to play around with them.







Tuesday, December 17, 2013

StackEdit for Markdown (.md) in Google Drives ... kind of works


UPDATE 2014/01/22 later that day. -- it looks like it still works but there is now a checkbox in StackEdit settings that says "Markdown Extra/GitHub Flavored Markdown syntax". If it is ON, the HTML inside .md is ignored. If you UNCHECK it, it works as is described below.

UPDATE 2014/01/22 -- This kind-of-works has recently turned into does-not-work. StackEdit obviously stopped recognizing HTML mixed in with your .md text. I used to have pretty looking pages and now they all reverted to custom format. There could be a workaround -- possibly adding the STYLE tag into the templates in StackEdit setting. I will try that later.

====

Yep. Kind of works. Meaning that it does work entirely satisfactor... torrrily .. is that a word? ... but has some flaws when it comes to sharing.


Specifically, PROS:
--------------
- you can work with content which is viewable in browsers. Literally with HTML. You do not have to type in HTML because .md has some custom behavior based on some syntax you use in text (do a search on markdown), but if you do type in HTML it will interpret it correctly;
- it is easier for writing notes (meeting minutes, manuals, etc.) than its closest rival -- Word, including the one in Google Drive. I hate to type docs I create on the fly in Word. See below for an example *pretty doc*.
- Print-to-PDF is excellent! Keeps all the formats, colors, etc. These docs are pretty!


Now, CONS:
----------
- a bit sluggish because it is not native to Google Drive
- viewing mode is one button away but the default is a split screen (input versus WYSIWYG) -- not good when you share your doc with others
- sharing is painful. Even if your share-ees have Google accounts, they have to walk through the installation process to enable StackEdit on their accounts. The biggest pain is that people who do not have Google accounts cannot view your .md docs ... at ALL. I still get a couple of those among my friends. Ended up creating a PDF for the file and sharing that.


Now, this blog is on a specific topic related to .md files. Default .md behavior is not what you might expect. For example, H1 is too big with huge margins, there is no italic, no underlined text, etc. However, including a small STYLE section at the beginning of your .md file will take care of all that. Think of this as a FORMAT stub which you can copy across a subset of your documents.

Here is mine:

STYLE -- enclose in angle braces as is normally done in HTML
em { font-size:larger; font-weight: bold; font-style: normal; text-decoration: none; }
strong { font-size:larger; font-weight: bold; text-decoration: underline; }
h1 { margin: 20px 0px 5px 0px; font-size: 20px; font-weight: bold; background-color: #ccc; padding: 3px 0px; }
h2 { margin: 20px 0px 5px 0px; font-size: 20px; font-weight: bold; background-color: #f00; padding: 3px 0px; color: #fff; }
code { background-color:#ccc; }
hr { border: 1px solid #999; margin: 5px 0px 10px; }
/STYLE


It looks like this on the left side of your StackEdit:

And like this in WYSIWYG on the right side (one of my current docs):


The syntax the style of which I altered is:
1. `code`
2. *stress*
3. **more stress**
4. # normal header/section
5. ## red header / section

For the rest of default syntax see a Markdown howto.


Tuesday, October 1, 2013

Markdown as a Native Google Drive Doctype

I have been maintaining some presence in GitHub for awhile now. For those who know, GitHub -- among many other programming-related portals -- uses .md aka Markdown format for readme files. Markdown is really handy for structured text. In fact, I prefer it to a Word file.

Now, let's say that you want to work on a file within a community. For this you would normally share a file -- say, make it public in your Google Drive -- and let other people edit it on their side. Works just fine with all the traditional Google Drive formats. Not with .md files until now.

Actually, this is a lie. Apparently, Google Drive DOCTYPES can be extended. See this:



That StackEdit doctype in the list is what I found under "Connect More Apps". The list of special doctypes is very long, actually. I did not really look at all the others.

It is a bit weird on the first run -- that's when you have to be careful to allow popups and let StackEdit initiate the 3-way OAuth handshake to become an authorized app for your GoogleDrive, but after that it will work the same as any other application. Except it is closer to a WYSIWYG HTML editor because the concept of input is that of a markdown, by definition.

The problem of sharing your .md files with people who do not have Google accounts remains unsolved. It is possible with native doctypes but .md (actually, x-markdown mime) will require you to log in and install the application (3-way handshake needs an account).

So, summarizing

(1) StackEdit as native doctype in Google Drive -- OK
(2) .md files shared publicly for community edits (sharing) -- OK
(3) Sharing with people who do not have Google accounts -- FAILED


Hope that (3) becomes possible in the future.

Friday, September 13, 2013

NiceCover: A Serverless Webapp for Crowdsourcing Data Extraction and Knowledge Generation on Top of Scientific Portals

The title is a mouthful, I know. But you need that long to describe the idea.

There is a story behind it. I have first-hand experience in how major scientific portals (won't tell you which one exactly) upgrade their portals. Not good, my friends, not good ... are the processes and routines these people use.

Anyway, NiceCover is literally a nice cover for the scientific portals. The target is:

(1) Build a social collaboration layer on top of scientific portals.
(2) Write serverless webapps.
(3) Make sure you are within terms of use, but go wild otherwise.

These slides report on the first step.



There will be more soon.

Wednesday, September 4, 2013

StopWastingFoodAPI

Now, is it really that difficult to solve the problem of 2/3 of the product wasted at our supermarkets?

Saturday, August 31, 2013

Social Applications that Live in Clouds and Help People Link Their Work

Social Applications are not Social Networks, especially not those that are popular today. Social apps are simple applications which do work socially -- normally by facilitating social collaboration. Note that there are no such apps today.

You can see my previous posts about social collaboration here and here. In fact, one app that welcomes social collaboration is still running in the wild ... wide open to the public .. with not a single client other than my own in these few weeks. Let's put social collaboration itself aside.

Consider this app:



It lives in the cloud because it is hosted by, say, Google Drive (or Dropbox, or some other) and keeps its output in the same place in the cloud it came from. In order to make it work you need two things:

(1) Solution to the Stringex Problem on which I wrote before, preferably a working software client.

(2) People willing to participate to form the tree in the figure in the bottom-up fashion. Note that there are Miners and Mappers.


The problem is that (2) creates security problems. Specifically, people might be afraid to open up their data to the public. This fear is not completely ungrounded. Dropbox API, for example, leaves your public spaces wide open, which means that anyone with the keys can overwrite, erase, add, or do anything with your data. Which is why most such apps limit their scope to the Dropbox account of the user him/herself. You basically write to your own account. On the other hand, Social Apps NEED to share their public spaces.

Still can be done. As long as you provide a separate PRIVATE space which no one can touch and develop a SMART SYNC. The smartness of the sync should be judged the same as Wikipedia does it -- it should be much easier to revert malicious changes than to make them.

Open problem which I am currently trying to close...

Tuesday, August 27, 2013

Passwordless SSH Login in Fedora 18+

Been having this problem for a while. Most machines used to be either cloud stacks (like XCP) or FC16 machines (traffic capture) and I had no problem with them. For someone who runs automation across machines, inability to log in over SSH passwordlessly is a huge problem. Not because of SSH per se, but because RSYNC over SSH is probably the safest way to sync files across machines and if each RSYNC keeps asking you for the password, ... then you cannot get anything done.

Found several places where this was discussed. But the actual solution was much simpler.

> vi /etc/ssh/sshd_config

#MaxAuthTries 6
#MaxSessions 10

#RSAAuthentication yes
PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
#AuthorizedKeysFile .ssh/authorized_keys

#AuthorizedKeysCommand none



The key is the authorized_keys2. It looks like FC18 forces SSH to read the *_keys file instead of *_keys2. Since I was using the customary *_keys2 file, I could not log in without the password. As soon as I commented out the AuthorizedKeysFile override (as in the excerpt above), I got my passwordless login.

Weird how things change sometimes.

Thursday, August 22, 2013

The Stringex Problem: Client-Side Indexing in Clouds

Hadoop and Lucene-style indexing all sound nice, until you stumble upon a new practical usecase (read: problem) and need to build a webapp which needs indexing on the client side (read: browser) in realtime (read: continuously). That's when you run into the Stringex Problem. Hint: When you need to access the data you keep on the cloud, you need to mind the size of the hole through the membrane (read: API capability).

ArcGIS or GoogleMapsAPI?

The problem earlier expressed in this blog and solved in this webapp keeps nagging me.

Hypothetical: I am sitting in say, a Toyota car and use its navigator to find how to get from A to B. I can enter anything I want as A or B -- addresses, telephone numbers, names of shops (which in Japan include region/city/etc names). Navigator then calculates the route and shows it to me.

Now, say I need to build a robot that performs this unit action but does it many many times. For 100 locations I would need ~5K route requests. As I explained earlier, I need it to build a graph. You can loosely call it a dataset. Since such datasets do not exist in nature -- they are neither maps nor traditional GIS formats (vector or otherwise) -- the datasets have to be created from scratch. Note that what ArcGIS calls a network is not really a graph -- it is more like a set of loosely-coupled pair-wise routes.
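
The ~5K figure is just the number of unordered pairs among 100 locations, n(n-1)/2 = 4950, assuming the route A-B is treated the same as B-A. A small sketch of the counting and of enumerating the pairs the robot has to request (function names are mine, for illustration only):

```javascript
// Number of unordered pairs among n locations: n * (n - 1) / 2.
function countPairs(n) {
  return (n * (n - 1)) / 2;
}

// Enumerate each unordered pair exactly once (A-B, never B-A as well).
function listPairs(locations) {
  const pairs = [];
  for (let i = 0; i < locations.length; i++) {
    for (let j = i + 1; j < locations.length; j++) {
      pairs.push([locations[i], locations[j]]);
    }
  }
  return pairs;
}

// countPairs(100) gives 4950, i.e. the "~5K route requests" above.
```

If routes turn out to differ by direction (one-way streets, for example), the count doubles to n(n-1).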


Now, consider this:

(1) Toyota Navigators, or actually their DVD ROMs are in proprietary format. Basically you cannot use it.
(2) ArcGIS can actually find addresses and you can use ArcScripts to find routes between places (see this example), but it is only possible in a paid ArcGIS Desktop version of software. However, if you have the paid version, you can load it with various data and use it offline.
(3) GoogleMapsAPI does all that in one package. It provides the scripts (API) that find you the route between two locations. It is as strong as a car navigator in terms of being able to find places based on irregular string identifiers. When the API returns the route, it is split in laps (roads?) and each is provided with coordinates of intersections. Each road also has a name. The problem is the daily quota in the free access to the API.


The reason why this problem is nagging me is this. ArcGIS is a paid product -- it is actually not cheap. There is also ArcGIS Japan which probably has reliable maps of Japan. I can also overcome the Google Maps API quota problem by paying $9 a month to be able to issue 100K requests a day -- I can basically build a large dataset a day with this quota and I can stop paying when I am done (monthly contracts).

But. Not that I am cheap .... but I was really hoping that I could build an online community of collaborators who need the same kinds of datasets as I do and who I could work with to possibly build a library of custom datasets. The pilot webapp that builds such a dataset has been kept wide open at this link for several days now but the only collaborators I can see are my own clients.

Will wait/try some more ... just for the fun of it. If not, will have to resort to the paid version of Google Maps API.

#SteveJobsinity

#stevejobsinity is a type of technological trend when a technology is opted out of advanced features in order to simplify the product and thus maximize its reception by the general public.

Hate Steve Jobs. Hate the late victim of a fracking incident enough to #hashtag his persisting existence.

Not for what he did -- or did not do, actually, but for what he represents and what is now attracting the biggest followship in the technical world.

Because of #stevejobsinity -- now this time I refer to the mode in which the technical world is operating these days -- good ideas are left unnoticed ... even before they were properly understood.

Let me give you an example. I was a happy owner of a 7" HTC Flyer for a while last year. I bought three more for 3 of my students at the time. All were happy. I was happy. Now let's test how deep the #stevejobsinity goes right now. I am the tester, you are the testee. Look at the picture below, follow this link with specs, take your time to study the information and come back beneath the picture to verify if you understood what idea was lost.


So? Did you get it? I only got it after about an hour of playing with the thing -- cause reading online did not help one bit. The key point is the stylus. But not its presence per se. You can find pens in lots of other devices now. I have four between me and my wife -- all Samsung Notes. It is not about the pen, it is about its technology.


HTC Flyer went and did something unconventional -- it separated the stream for the pen from the general input stream. You can still work it with your fingers and even those weird rubbery pens, but the pen stream is processed by the hardware separately. They actually had to tinker with Android (which is why you did not get OS updates on that thing) to put in a driver for the pen. But after it was done, its outcome was beautiful!

Now, you should probably guess that the pen would not work with just any application. Any new application you install (I use Papyrus for notes for example) would only work with your fingers and would not see the wonder-pen. But HTC covered all the basic uses -- Office, Notes application and a Foxit PDF Reader. The last application was used most often by me and my students. Annotation, naturally.

Now take this. When I try to use my Note for presentations, I end up using it only for slides and am forced to give up annotation. Simply because I cannot change pages (or do any fingering) when in annotation mode. It is the same no matter which PDF app I use -- Adobe Reader, ezPDF, RepliGO, etc. The same problem everywhere. Switching between annotation and viewing modes is clumsy and unfit for official presentations.

HTC Flyer could actually make +annotation presentation possible. Pen would automatically write on the PDF (freehand commenting) and fingering could switch pages, zoom, etc. I am not sure if you can picture this scenario ... but if you can it should boggle your mind!
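
To picture the idea in web terms: the Flyer did this at the hardware/driver level, which I cannot show here, but the same separation can be sketched with the pointerType field of the W3C Pointer Events API ("pen", "touch", "mouse"). The function name and the action names below are made up for this example.

```javascript
// Decide what an input event should do based only on the device that produced it.
// pointerType values follow the Pointer Events spec: "pen", "touch", "mouse".
function routeInput(pointerType) {
  switch (pointerType) {
    case "pen":
      return "annotate"; // the pen always writes on the page
    case "touch":
      return "navigate"; // fingers always page and zoom
    default:
      return "navigate"; // mouse and anything unknown behave like fingers
  }
}

// In a browser this would hang off a pointer event, roughly:
//   element.addEventListener("pointerdown", (e) => {
//     const action = routeInput(e.pointerType);
//     /* start a stroke or start a page flip depending on action */
//   });
```

No mode switch is needed anywhere, which is the whole point.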

My current presentation machine is Asus Taichi and my application is PDF Annotator. I still can't use annotation and paging mode together -- in fact, PDF Annotator does not recognize the difference between fingers and pens. So, I have to use an Air Mouse for paging. You can picture me with this air mouse in one hand and a pen in the other... making a presentation. My own assessment of my equipped-ness is that I cannot possibly do better with present technology.

Sad that I could use HTC Flyer with equal success and more. Sad that I do not have the machine anymore. But even sadder is the fact that because of the #stevejobsinity of technological trends it is unlikely that HTC will invest further into this idea.

Tuesday, August 20, 2013

Google Maps as GIS, Road Graphs, and Social Collaboration

There is an active discussion on whether Google Maps is a GIS or not: http://highearthorbit.com/is-googlemaps-gis/

For me personally, GoogleMapsAPI wins this contest by a huge margin. The biggest win for Google's take at GIS is that it uses GPS coordinates -- specifically, you normally find the hash with { ih, kb, ...} keys ... I do not actually remember the keys but they are there and they actually represent the ... what was it? .... longitude and latitude of a coordinate. My knowledge of the meaning underlying the coordinates themselves is lame. I know it. I do not apologize for it. Simply because for me the physical meaning is not important, as long as I can plot them on a 2D or 3D visualization. In fact, I once plotted a worldwide AS-level network topology randomly assigning X and Y only to realize that the world was on its side. You could actually tell because the ISPs formed pretty good outlines of continents.

Now, all I need to know about GISes is the datatypes. By far the most popular datatype in traditional GIS is SHAPE (.shp). Sometimes these files are referred to as SHAPEFILES. You can read all about the format here, but I would recommend looking at the diagram from here:


Sorry for the Japanese, but I could not find an English diagram with the same level of visual clarity. The Wikipedia page definitely does not tell you that right away. The .SHP format is very simple:

(1) It is a vector format.
(2) It can be in 2D-mesh or 3D-mesh forms, where the 2D and 3D do not stand for coordinate dimensions (which is why Google search on the format will only confuse you), but rather precision of the coordinate itself (see figure above). Simply put, 2D-mesh gives you 10x10km dots and 3D-mesh gives you 1x1km dots.

I once officially requested a Tokyo road map from some official association, got their CD ... only to find out that the data was in 2D-mesh format. This meant that the center of Tokyo on the plot looked like a single dot. Literally!

That was the point at which I decided to just use Google Maps as my main GIS system. YES, this improved precision of my datasets, but NO, it did not solve all my problems -- the big problem further on.


I recently had to use Google Maps as GIS to build road maps (graph structures) connecting certain places. The problem I stumbled upon is collaboration. Specifically, collaboration in building datasets -- not using them. The majority of GIS portals I know (including Google Maps Engine) are there for 2 purposes:

(1) to use a GIS dataset together
(2) to build a dataset based on the interface provided by the portal.

The simple purpose I am trying to pursue is left unfulfilled by both the above. Which is actually strange because quotas on requests exist for any existing API -- GoogleMapsAPI or ArcGIS, although it could be in slightly varied forms -- ArcGIS is paid from the start and has rather big quotas while GoogleMapsAPI has very restrictive quotas in its free form.

Why quotas? This is simple. Google as well as all other big players have to ration access to their Big Data. Imagine all free users having unlimited quotas to all Google APIs?! Specifically, a free GoogleMapsAPI client (per IP) has a daily quota of 2500 requests. This is key because ... say, if I have 300 locations and I need to build an undirected (A-B = B-A) graph among them, I need to make > 35K API requests. About a month for one client. About a day for 30 clients. About an hour if the crowd piles up and helps me with it ... on demand. The simple idea behind online collaboration on this topic is that I would borrow daily quotas from users who ... in their majority ... do not use much of their daily quotas anyway. Note that Google's Terms of Use do not prohibit such a use of its API.
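
A back-of-the-envelope sketch of the quota arithmetic, assuming one request per unordered pair, every client getting the full 2500 requests a day, and no failed or repeated requests. The function name is made up for this example.

```javascript
const DAILY_QUOTA = 2500; // free per-client daily quota quoted above

// Whole days needed to fetch one route per unordered pair of locations
// when `clients` clients each contribute their full daily quota.
function daysNeeded(locations, clients, dailyQuota = DAILY_QUOTA) {
  const requests = (locations * (locations - 1)) / 2;
  return Math.ceil(requests / (clients * dailyQuota));
}

// daysNeeded(300, 1)  -> 18 (44850 requests at 2500 a day)
// daysNeeded(300, 30) -> 1
```

Under these idealized assumptions a single client needs 18 days and 30 clients finish within a day, which is the same ballpark as the rough estimates above; retries and clients that spend part of their quota elsewhere push the single-client number up.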

For example:

http://t.co/3SmOedz2l0

is a simple serverless web application which uses client's daily quota to suck routes from GoogleMapsAPI and store them in my Dropbox folder. All clients write to the same folder. Collisions are resolved by the web application. This is a very simple collaboration.
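
I will not go into how the webapp actually resolves collisions here. Purely as an illustration, and as an assumption rather than a description of my code, one simple scheme is to give every undirected pair a canonical file name, so that two clients fetching the same route end up at the same file, and to let each client skip pairs whose file already exists in the shared folder.

```javascript
// Canonical file name for an undirected pair: identical for A-B and B-A.
// The "__" separator and ".json" suffix are arbitrary choices for this sketch.
function pairKey(a, b) {
  return [a, b].sort().map(encodeURIComponent).join("__") + ".json";
}

// Given all pairs and the file names already present in the shared folder,
// return the pairs this client still needs to fetch.
function remainingPairs(pairs, existingFiles) {
  const done = new Set(existingFiles);
  return pairs.filter(([a, b]) => !done.has(pairKey(a, b)));
}
```

Two clients can still fetch the same pair at the same moment, but then they write the same route to the same file, so the worst case is a wasted request rather than a corrupted dataset.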

Note that its purpose is different from that of default modes of ArcGIS or GoogleMaps (API or Engine) and cannot be created by either of them. I am surprised to find myself in the position that public collaboration on a (useful?) dataset is completely impossible just because it departs -- a little bit -- from the default use.

Let me go one step deeper into this rant. I tried posting a request on Twitter with useful hashtags like #googlemaps #googlemapsapi #datasets, etc. in three languages, only to find out later that my message did not show up in any of the hashtag streams. I do not know what criteria are used by Twitter when it aggregates its hashtag streams, but it looks like posts saying "I really like Google Maps" are more worthy of inclusion into the stream than a meaningful post. In the end, my posts ended up viewed only by the few (3?) people I have as my direct followers. Not a very social person, obviously.

Now I am posting it here. Let's see if I can actually find a way to broadcast this message beyond the small social box in which I live.