TOC

Transfer learning

If you have a look at the architeture of modern machine learning and neural network, you would realize that you don’t usually build a model from scratch for a real world (or commercial) application. Enter transfer learning. Transfer learning is when you make use of models that were trained with typically huge budget of computation resources and human capital (mostly a team full of capable scientists in academic institutions and/or R&D departments).

Transfer learning has multiple levels of reusing the trained models. In some cases, when you have a general trained model (such as BERT: a model that were trained to analyze natural langauge), then you can freeze the base (low level neurons) and continue to train the top layers with the dataset on computer science so that its weights learn to speak the new specialized style of computer science. There are a lot of variances of BERT doing so. In some other cases, you simply add a classification or regression layer on top to reuse the full model for prediction and then build an application around it. In this post, we would make use of a model named MoveNet whose usage is to detect human poses in sports. We can build an application around it to score a lay person’s yoga poses and measure their improvement over time.

MoveNet

MoveNet is a machine learning model that were trained on people images in various settings in the COCO dataset, plus a set of people doing fitness activities on Youtube. It has several variance: Lightning for mobile and Thunder for webapp. Its main design is to get the center of heatmap, then regress to estimate 17 body keypoints (nose, eyes, ankles, knees, etc). Its main aim is to detect the posture of one person. Please refer to the tensorflow tutorial for the code of using MoveNet. In this post, we set the parameter model_name = "movenet_thunder"

chair pose downward sequence

As you can see that the model returns 17 key points (together with its confidence) in a vector. This would be used to calculate the score for each pose.

CDN & JS

CDN (content deliver network) is a place where people host their stuff so that others can access those resources over the internet without the need to install anything. MoveNet above was implemented into javascript and hosted on p5js.org. Javascript is the language of the web, there are so many free service in javascript making building something for the web very simple. We just need to put the cdn script tags for p5js to load it with the MoveNet js model into a simple index.html with script to load the camera stream: we have a detecting algorithm in a browser! (and this page is still considered static). glitch.com is a place where they provide free hosting for simple website (not just static one, they also provide free service for dynamic web - with small database), so we can put our static website there. Check out the embeded website:

Algorithm

We would implement the scoring part as follows, see the full code on github:

  • Take perfect pose pictures (pexels is a good place to find free quality images)
tree pose chair pose
  • Use MoveNet to output 17 key points of those poses
Xtree = [0.32474753, 0.31742835, 0.315193  , 0.33014518, 0.330263  ,
        0.38090655, 0.38585427, 0.27876797, 0.27838552, 0.1950332 ,
        0.19406383, 0.5454995 , 0.55252606, 0.62930524, 0.7234637 ,
        0.6545082 , 0.8363562 ]
Ytree = [0.52622736, 0.53796893, 0.514909  , 0.55743146, 0.50785697,
        0.57873464, 0.4932102 , 0.56562304, 0.49372613, 0.54242104,
        0.5113031 , 0.56869173, 0.5054548 , 0.6828927 , 0.53807205,
        0.5603934 , 0.5661066 ]

Xchair = [0.34023464, 0.3276967 , 0.32793763, 0.33311573, 0.33541682,
        0.39712426, 0.3941802 , 0.49390513, 0.49845377, 0.42812088,
        0.43213347, 0.5732867 , 0.5668779 , 0.7322788 , 0.5021988 ,
        0.8870048 , 0.6521647 ]

Ychair = [0.4781877 , 0.4728554 , 0.47384417, 0.4453547 , 0.44738495,
        0.4402206 , 0.44152382, 0.48746884, 0.49766824, 0.5320453 ,
        0.5396985 , 0.43115434, 0.43280736, 0.4503186 , 0.57631177,
        0.45137438, 0.5604785 ]
  • Make an interface for user to choose a desired pose

interface

  • Impose those key points of that pose on the camera stream

  • When user make their move trying to fit the perfect pose on the screen, calculate the score for their pose. Scoring is simple: use MoveNet to detect and output 17 key points of the user’s pose. Take the euclidean distance between user’s pose and the perfect pose. The smaller the distance, the better, so we measure a region on the screen for the user’s pose to be good and when user fits that region, they achieve reasonably high score and can be reassured that they are getting good result from their practice

RwHKDq

  • Save the progress of user over time (this would involve saving the data to a database)

  • Plot the progress for user