Interesting Problems in Computer Vision

Mining Photo Archives
John Resig used TinEye’s MatchEngine to do computer vision on a photo archive. He has detailed his current work here: http://ejohn.org/research/computer-vision-photo-archives/. The challenge in this project will be to use John’s data to create various vision prototypes which can be compared with MatchEngine.
Data Set(s): Photo Archive Link.

Mitosis Detection in Breast Cancer Histological Images
This contest at ICPR 2012 outlines a good set of data with histological images for breast cancer. The challenge is to use computer vision to detect mitosis.
Data Set(s): Histological images link.

ImageNet Challenges
ImageNet is a really large set of images: http://image-net.org/. The subset used in its recognition challenge covers about 1000 categories. A number of computer vision challenges are possible on this general-purpose dataset.
Data Set(s): Image Net link.
Related Data Set: CIFAR is like ImageNet, but it is much smaller. It comes in a 10-category version (CIFAR-10) and a 100-category version (CIFAR-100).

Cancer Detection with Computer Vision
There are tons of medical images in the Cancer Imaging Archive, a collection of datasets: http://www.cancerimagingarchive.net. A number of challenge problems are possible on this archive.
Data Set(s): Cancer Image Archive link

Optimal Frequency Calculation

Different algorithms and data structures have been used to solve rate limiting. In this post I want to focus on the use of probabilistic methods, and the count-min sketch in particular.

Why

To save space or time. A probabilistic structure gives up exactness and returns an approximate result instead.

There is extensive research on solving the membership problem using data structures from hash tables to balanced trees and B-tree indices, and these form the backbone of systems from OSs, compilers, databases and beyond. Many of these data structures have been in widespread use for forty or more years.

Count-min consumes a stream of events and produces an approximate frequency for each event. It can be queried for the frequency of a given event, and it returns an estimate that never undercounts and that, with a certain probability, stays within a bounded error of the true frequency.
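To make "with certain probability" a bit more concrete, here is a small sketch of how the table size is usually chosen. It assumes the standard analysis of the count-min sketch: with width ceil(e / epsilon) and depth ceil(ln(1 / delta)), an estimate exceeds the true count by more than epsilon times the total number of events with probability at most delta. The function name sketch_dimensions and its parameters are mine, not from any particular library.

```python
import math


def sketch_dimensions(epsilon, delta):
    """Return (width, depth) for a count-min sketch.

    epsilon: allowed overestimate as a fraction of the total event count.
    delta:   probability that an estimate exceeds that allowance.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    width = math.ceil(math.e / epsilon)      # counters per row
    depth = math.ceil(math.log(1 / delta))   # number of rows / hash functions
    return width, depth


print(sketch_dimensions(0.01, 0.01))  # (272, 5)
```

So a 1% error with 99% confidence needs only 272 x 5 counters under these assumptions, no matter how many distinct events show up in the stream.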

Usage

Compressed sensing, Networking, Databases, NLP, Security (cryptography, finding primes), Computational Geometry (finding vertices), Machine Learning.

Implementation

A simple Python implementation from GitHub.

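def query(self, x):
    """
    Return an estimate of the number of times
    `x` has occurred. The returned value never
    underestimates the real value.
    """
    # Each table holds one counter for x; collisions can only inflate
    # a counter, so the smallest one is the best estimate.
    return min(table[i] for table, i in zip(self.tables,
                                            self._hash(x)))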
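The query method above relies on self.tables and self._hash, which the snippet does not show. Below is a minimal self-contained sketch of what the rest of such a class could look like. The class name, the add method and the salted BLAKE2b hashing are my assumptions for illustration, not the code from the GitHub project.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.tables = [[0] * width for _ in range(depth)]

    def _hash(self, x):
        # Yield one column index per row, using the row number as a salt
        # so that each row behaves like a different hash function.
        data = str(x).encode("utf-8")
        for row in range(self.depth):
            salt = row.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.width

    def add(self, x, count=1):
        # Increment the counter for x in every row.
        for table, i in zip(self.tables, self._hash(x)):
            table[i] += count

    def query(self, x):
        # The minimum across rows is the least collision-inflated counter.
        return min(table[i] for table, i in zip(self.tables, self._hash(x)))


sketch = CountMinSketch(width=272, depth=5)
for event in ["a", "b", "a", "c", "a"]:
    sketch.add(event)
print(sketch.query("a"))  # never less than the true count of 3
```

For rate limiting, the idea would be to call add once per request, keyed by something like a client id, and compare query against the allowed limit. Since the estimate can only be too high, the limiter may throttle a little early but never lets extra requests through.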

References

  • Count-Min C Implementation – https://sites.google.com/site/countminsketch/home
  • Count-Min Go Implementation – https://github.com/tylertreat/BoomFilters
  • Algorithms to live by – http://algorithmstoliveby.com/
  • C-Implementation – http://www.cs.rutgers.edu/~muthu/massdal-code-index.html

Writing Backlog

My posts have been very sporadic for a while. I wish I could fix that and get my thoughts, writings, learnings and tidbits out on a more regular cadence. I have a long list of items which I have been intending to post about, so in this note I’m going to actually try and enumerate all those things I should be writing about. This is my way of holding myself a bit accountable and putting a forcing function on!

  1. Shyamapana – the Festival of Forgiveness: Every year I take a week off to focus, rejuvenate and read something spiritual. This year I wanted to read about Reiki and learn more about music, especially meditative and Western classical. I need to write a post with my Reiki book review and potentially a Spotify list of meditative music.
  2. BlockChain in HealthCare: I have been spending a lot of time on developing a blockchain platform applicable in Healthcare IT. I wanted to summarize the learnings from Stanford MedX and the launch of IEEE & YouBases’ hackathon. Over time I’m planning to compile a weekly blockchain newsletter, more like 10 links by Friday. My post would focus on various blockchain initiatives, the talks at Stanford MedX and all the problems being solved in various hackathons, along with a sample of my first 10-link blockchain newsletter.
  3. La Victrola: Over the last six months I worked on a team which built a 25′ gramophone weighing 14 tonnes of steel! I did a lot of art, lifting and playa work for this project as it got installed at Burning Man 2016. My post would focus on this journey, pictures of all the projects I completed, the details behind them and perhaps some stories. I want to take a more storytelling approach for this one.
  4. Libar: I have made some sporadic posts about cocktail events, but I haven’t done much in terms of refining and bringing the whole experience together. In this post I would combine pictures of all the cocktails, menus of all the speakeasies I have been to and pinterests of projects which will be interesting to explore in the cocktail scene.
  5. A Vagabond Life – Digital Nomading: Over 18 months I travelled a lot and did some esoteric trips, including a 5000-mile motorcycle ride. In this post I would like to bring it all together. One post for the entire journey won’t be sufficient, but this post will highlight my memories.

So these are my writing projects. I’m hoping to finish them in the next couple of weeks. Let’s see if this forcing function helps.

What is Deep Machine Learning?

Deep Machine Learning, often referred to as deep learning by the media, is the mimicking of the human brain’s neocortex by an AI engine. This sub-branch of machine learning is very nascent, largely deriving from neural network research of the 1980s and some representational breakthroughs in 2006. Deep learning offers some solutions to problems such as reading handwriting or finding objects in images using machines.

Several software toolkits, such as OpenCV, MLlib, TensorFlow and Theano, provide a set of neural network representations and algorithms for Deep Machine Learning. Middleware like Keras makes it easier to move between toolkits.

Several industry problems are currently being tackled with deep learning. The top problem areas are image search and machine vision (automotive/aviation).

At this stage, we are very interested in image search and in applying automated learning to find data schemas for health care, finance and other domains, and we are exploring use cases to begin testing.

Reviewer(s)

  • Brian Hur

Festival Of Forgiveness & Week Of Reflection

The week-long Jain Paryushan ended with the Festival of Forgiveness today. As every year, I wish to ask you to let go of my mistakes.

Like last year, this year I decided to focus the week on some spiritual reading, and reflecting.

I picked up the book Siddhartha by Hermann Hesse. This is a very well written book on the journey of spirituality. Siddhartha, after being a Brahmin, a Shraman and Buddha’s disciple, starts on a journey of his own with this quote: “Neither Yoga-Veda shall teach me any more, nor Atharva-Veda, nor the ascetics, nor any kind of teachings. I want to learn from myself, want to be my student, want to get to know myself, the secret of Siddhartha!”

In addition to reading, I’m doing some writing. I intend to keep up a very short writeup of my beliefs and ethics, and test it over time. I got the inspiration for it from this Codex.

Well, on to the next spiritual year!

Le Moto Tour – Feedback Wanted!

So I’m planning to do a motorcycle tour after a long time! I just bought a BMW 1150 RS, and this machine is now all decked up for a tour! The current plan is to tour for 12-14 days, heading up from Seattle to the Canadian Rockies and Banff National Park, then Glacier National Park, and finally ending up in San Francisco after a visit to Yellowstone National Park. I should perhaps call this a 2000-mile national parks pilgrimage tour!

I’m super excited about it, but I’m still finalizing the route and the journey details. Currently I’m pegging about 8 days on the route, and I have about 5 days left to spread around. If you have any ideas, thoughts or suggestions, please send them my way!