Sunday, December 2, 2018

Installing Conda TensorFlow-gpu and Keras on Ubuntu 18.04 tackling issues

I had wanted an Ubuntu laptop with a GPU configured for deep learning for some time. My previous Windows laptop, with 32 GB of RAM and an Nvidia GeForce GTX 860M GPU, recently breathed its last after close to four years of service. I then went looking for a more affordable GPU-enabled laptop and found one in a recent Black Friday deal for less than one fourth the price of the previous one, though it has only 16 GB of RAM and a GeForce GTX 1050 GPU. I initially installed Ubuntu 16.04 LTS on it as a dual boot alongside Windows 10 and eventually upgraded to Ubuntu 18.04.1 LTS.

After that I decided to install the Nvidia driver along with CUDA and cuDNN, following this article published on Medium. Below are the steps I took, along with the challenges I faced.

Step 1 : Install Miniconda
Download Miniconda from here, then run
bash Miniconda3-latest-Linux-x86_64.sh
However, tensorflow-gpu still doesn't work with Python 3.7, so the following fix is required after installation.
conda install python=3.6
Step 2 :  Install Java and gcc
sudo apt update
sudo apt install openjdk-8-jdk
sudo apt-get install gcc-4.8 g++-4.8
Step 3 :  Install drivers 

This is the step where most mistakes happen, and a mistake here can jeopardize the whole installation and even break your Ubuntu system. For example, the Medium blog asks you to run sudo ubuntu-drivers autoinstall, which installs the latest Nvidia driver (nvidia-415 in this case). However, tensorflow-gpu can handle only CUDA 9.2, whose installer tries to install nvidia-396. The resulting mix of nvidia-396 and nvidia-415 caused a broken pipe error on my Ubuntu. I got out of it by running sudo dpkg -i --force-overwrite-all path-to-the-nvidia-deb-file. A better way to avoid such version conflicts is to check which CUDA and cuDNN versions the current tensorflow-gpu supports, and which Nvidia driver that CUDA version requires. At present the stable tensorflow-gpu 1.12 requires CUDA 9.2 and cuDNN 7.2.1, which in turn require nvidia-396. So the following is the right sequence to follow.
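The compatibility chain described here can be written down as a small lookup table before anything is installed. This is only a sketch: REQUIREMENTS and required_stack are hypothetical names, and the single entry just records the versions quoted in this post, not an official compatibility matrix.

```python
# Versions quoted in this post; extend the table by hand for other releases.
REQUIREMENTS = {
    "1.12": {"cuda": "9.2", "cudnn": "7.2.1", "driver": "nvidia-396"},
}


def required_stack(tf_version):
    """Return the CUDA, cuDNN and driver versions recorded for a tensorflow-gpu release."""
    major_minor = ".".join(tf_version.split(".")[:2])
    try:
        return REQUIREMENTS[major_minor]
    except KeyError:
        raise ValueError("no entry recorded for tensorflow-gpu " + tf_version)


if __name__ == "__main__":
    print(required_stack("1.12.0"))
```

Looking the versions up first makes it harder to install a driver that the chosen CUDA release does not expect.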

Add graphics drivers to your source list
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt upgrade
Check available nvidia drivers
ubuntu-drivers devices
Result :
vendor   : NVIDIA Corporation
model    : GP107M [GeForce GTX 1050 Mobile]
driver   : nvidia-driver-390 - third-party free
driver   : nvidia-driver-396 - third-party free
driver   : nvidia-driver-415 - third-party free recommended
driver   : nvidia-396 - third-party non-free
driver   : nvidia-driver-410 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin
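If you prefer not to pick the driver by eye, the listing can be filtered with a few lines of Python. This is a minimal sketch that assumes the output format shown above; list_drivers and pick_driver are hypothetical helper names, and SAMPLE is simply the result pasted from this post.

```python
import re


def list_drivers(report):
    """Extract driver package names from `ubuntu-drivers devices` output."""
    drivers = []
    for line in report.splitlines():
        match = re.match(r"\s*driver\s*:\s*(\S+)", line)
        if match:
            drivers.append(match.group(1))
    return drivers


def pick_driver(report, series):
    """Return the packages whose name ends with the wanted driver series, e.g. '396'."""
    return [name for name in list_drivers(report) if name.endswith("-" + series)]


SAMPLE = """\
vendor   : NVIDIA Corporation
model    : GP107M [GeForce GTX 1050 Mobile]
driver   : nvidia-driver-390 - third-party free
driver   : nvidia-driver-396 - third-party free
driver   : nvidia-driver-415 - third-party free recommended
driver   : nvidia-396 - third-party non-free
driver   : nvidia-driver-410 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

if __name__ == "__main__":
    # prints ['nvidia-driver-396', 'nvidia-396']
    print(pick_driver(SAMPLE, "396"))
```

Note that the package marked recommended (415) is not the one that matches CUDA 9.2, which is why autoinstall picks the wrong driver here.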

Install the compatible driver (avoid running sudo ubuntu-drivers autoinstall) using the following command for CUDA 9.2 and tensorflow-gpu 1.12.

sudo apt install nvidia-396

Now reboot your system and, after the reboot, run

lsmod | grep nvidia
or
nvidia-smi
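Either command should confirm that the driver is loaded: lsmod lists the nvidia kernel modules, and nvidia-smi shows the GPU and the driver version.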
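The same check can be scripted, which is handy if you want to compare the installed driver against the one CUDA 9.2 expects. The sketch below assumes nvidia-smi is on your PATH and uses its --query-gpu and --format options; parse_driver_rows and query_driver are hypothetical helper names.

```python
import shutil
import subprocess


def parse_driver_rows(csv_text):
    """Split 'driver_version, name' CSV rows into (driver, gpu_name) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        driver, _, name = line.partition(",")
        rows.append((driver.strip(), name.strip()))
    return rows


def query_driver():
    """Ask nvidia-smi for the driver version; return [] if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_driver_rows(result.stdout)


if __name__ == "__main__":
    for driver, gpu in query_driver():
        print(gpu, "uses driver", driver)
```

An empty result means either that nvidia-smi is not installed or that the driver is not loaded, so it is worth rebooting and trying again before moving on.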


Step 4 : Install Cuda

Install CUDA®, a parallel computing platform and programming model developed by NVIDIA. CUDA is needed to run TensorFlow with GPU support. Download CUDA Toolkit 9.2 from here. Choose the following settings:



After installation, verify it with the following command
ls /usr/local/cuda-9.2/
and the result should be something like this


After that, add CUDA to your path using the following commands
export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
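To confirm that the two exports took effect in the current shell, the variables can be inspected from Python. This is a sketch under the assumption that CUDA lives in /usr/local/cuda-9.2 as above; has_entry and check_cuda_paths are hypothetical helper names.

```python
import os


def has_entry(path_value, directory):
    """True if `directory` is one of the os.pathsep-separated entries of `path_value`."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


def check_cuda_paths(environ, cuda_root="/usr/local/cuda-9.2"):
    """Report whether PATH and LD_LIBRARY_PATH contain the CUDA bin and lib64 folders."""
    return {
        "PATH": has_entry(environ.get("PATH", ""), cuda_root + "/bin"),
        "LD_LIBRARY_PATH": has_entry(environ.get("LD_LIBRARY_PATH", ""), cuda_root + "/lib64"),
    }


if __name__ == "__main__":
    print(check_cuda_paths(os.environ))
```

If either value comes back False, the export was probably run in a different terminal session.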
Step 5 : Install cuDNN

cuDNN is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK. cuDNN 7.2.1 can be downloaded here. To download it you need to sign up for or log in to your Nvidia account. After the download, use the following set of commands to install it.

Unpack the archive
tar -xzvf cudnn-9.2-linux-x64-v7.2.1.38.tgz
Move the unpacked contents to your CUDA directory
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
Give read access to all users
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
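Once the header is copied, you can confirm which cuDNN version ended up in the CUDA directory by reading the version macros from cudnn.h. The sketch assumes the cuDNN 7 header layout, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h itself; cudnn_version is a hypothetical helper name.

```python
import re


def cudnn_version(header_text):
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from the text of cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + key + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    header = "/usr/local/cuda/include/cudnn.h"  # destination of the copy step
    try:
        with open(header) as handle:
            print("cuDNN", cudnn_version(handle.read()))
    except OSError:
        print("cudnn.h not found at", header)
```

A result other than 7.2.1 would suggest that an older cuDNN was already present in the CUDA directory.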
Now add the following block to the end of ~/.bashrc (open it with gedit ~/.bashrc)
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Now run the following commands to make these changes take effect
source ~/.bashrc
sudo ldconfig
Check that the paths are set correctly:
$ echo $CUDA_HOME
The output should look like this
/usr/local/cuda
Step 6 : Install tensorflow-gpu and Keras

Create a virtual environment; I named it tf36 for TensorFlow and Python 3.6
conda create --name tf36 python=3.6
source activate tf36
Although there is a lot of discussion about whether to use conda or pip for tensorflow-gpu, I did it the conda way.
conda install -c anaconda tensorflow-gpu
Check TensorFlow installation with:
python
>>> import tensorflow as tf
>>> tf.Session(config=tf.ConfigProto(log_device_placement=True))
Install Keras with:
conda install -c conda-forge keras
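With TensorFlow and Keras in place, the GPU check can also be run as a script rather than typed into the interpreter. This is a minimal sketch that relies on device_lib.list_local_devices() from TensorFlow's Python client module; gpu_names is a hypothetical helper, and the script simply reports when TensorFlow is missing.

```python
def gpu_names(devices):
    """Keep the names of (name, device_type) pairs whose type is 'GPU'."""
    return [name for name, device_type in devices if device_type == "GPU"]


if __name__ == "__main__":
    try:
        # Module that lists the devices TensorFlow can see.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        pairs = [(d.name, d.device_type) for d in device_lib.list_local_devices()]
        print("GPUs visible to TensorFlow:", gpu_names(pairs) or "none")
```

Saving this as a file inside the tf36 environment gives a quick way to repeat the check after every package change.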
Last but not least, as shown in the Medium blog, we can install further libraries such as matplotlib using conda. However, that may cause conda to downgrade dependent libraries and in turn stop TensorFlow from using the GPU. So whenever you add a new library to the virtual environment and see that TensorFlow isn't using the GPU anymore, feel free to run the installation again using
conda install -c anaconda tensorflow-gpu
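One way to notice such a change early is to look at what conda reports for the environment after each install. The sketch below parses the output of conda list --json, assuming each entry carries name and version fields; installed_versions is a hypothetical helper, and the package names checked are an assumption about what matters in your environment.

```python
import json
import shutil
import subprocess


def installed_versions(conda_json, names=("tensorflow", "tensorflow-gpu")):
    """Map each package in `names` to its installed version, or None if absent."""
    packages = {entry["name"]: entry["version"] for entry in json.loads(conda_json)}
    return {name: packages.get(name) for name in names}


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda is not on PATH")
    else:
        # `conda list --json` prints the packages of the active environment as JSON.
        result = subprocess.run(
            ["conda", "list", "--json"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode == 0:
            print(installed_versions(result.stdout))
        else:
            print("conda list failed")
```

If tensorflow-gpu shows up as None after adding a library, that is the cue to rerun the install command above.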
Hope it goes well for you, and feel free to leave feedback about your experience.

Wednesday, October 21, 2015

Life as a Data Scientist!

Well, I keep getting emails and LinkedIn messages asking what I do in my daily life as a data scientist. Being a lazy writer, I avoid answering in detail, although I know there is actually a lot to tell.

Is it something like this? Not actually, but yes, at times!


Is it something like this? Well, quite likely! Just like the Avengers, you may end up becoming a superhero in many different fields, or just like an actor, you get to live the lives of different characters. You can lead the life of a hacker, a journalist, a scientist, a business analyst, a developer, a miner, a purchase manager, an artist and maybe even a celebrity.


Let me explain why. Any internet article or white paper will tell you that data science is 70-80% data manipulation and 20-30% machine learning. So it is obvious that you will be doing a lot of data munging and machine learning (model building). I will try to tell you what those articles don't say.

Data munging depends purely on where the data comes from. If your data source is external (such as a third-party website for which your company isn't ready to spend a dime), you have to know web scraping. Web scraping comes in different types. In some cases the website owner will be kind enough to make the data available through plain HTTP GET requests. Others may not be that generous, and those are the cases where you may have to use Python packages like suds (which may break for lack of maintenance and resurface as a different GitHub fork by a fan, but that's another story). The website might serve its data interactively, in which case packages like selenium will be your saving grace. Different web API and text extraction packages will also be useful at times, depending on the nature of the website. So you really have to be a hacker (white-hat, obviously) at heart for this!


Yes, you get to lead the life of a journo as well. Most analytics projects nowadays involve interviews with people who have done related work. So don't be astonished if you have to arrange such interviews with eminent professors or researchers and literally take notes during these hour-long interviews.


You will lead the life of a researcher quite often. This can happen in different situations: when you realize that a minor improvement on a recent research paper on noise removal from data could be useful in your project, when you come up with a novel NLP approach to text categorization, when you invent a feature selection mechanism that is useful only for your particular project, or when you create a brand new award-winning classifier/regressor like xgboost. So yes, the Scientist tag in Data Scientist is there for a reason. Although it varies from project to project, 5-15% of total time should be a fair estimate.


Unfortunately, a large chunk of your time on the job will be spent sitting with your clients and fellow business analysts, virtually playing the role of another business analyst to understand their requirements. They will often have the notion that a data scientist is nothing other than a superman fortune teller who shouldn't even need historical data to predict future results. Often you may end up trying really hard to convince the fellow business analyst that he is actually not a data scientist and that it's your job instead. A strong coffee might be of real help during those long meetings in this role play!


Well, you have to become a heck of a programmer to deal with different types of data at different granularities and put them into a 2D format on which machine learning algorithms (classification, regression or clustering) can actually work. If you aren't from a computer science background, you'll generally start with R for its ease of use, but gradually you will make friends with Python for different reasons. After a few months, with these languages in your armory and just when you start feeling safe and secure, one fine morning a colleague will tell you that you must learn Scala for some other work. To irritate you further, some of your friends will tell you about the advent of faster languages on the horizon, such as Julia, Go and F#. You may feel clueless at that time, but I'll leave you to deal with your haplessness yourself. Probably that is when you'll realize that, just like a pretty woman, a beautiful programming language also comes into your life with her own imperfections.


If that is not all, big data gives you the feeling of being a data miner with its vastness. You keep yourself, your boss and your IT guys happy as long as you stay with traditional big data mining tools such as Hive and Pig. However, to make matters worse, that colleague and some of your friends will keep telling you how awesome the latest Apache project is and how fast it will disrupt the existing one. You'll be happy when you learn Apache Spark for its speed, but months later it'll give you a headache when you learn that you have to unlearn it for another awesome technology. Eventually you'll forward this demand to your IT setup guys to make their lives equally challenging.


Yes, you get to play the role of a purchase manager as well, when you have to be present at regular demo sessions arranged by your company where salespeople from different big data/machine learning product companies try to impress you with the awesomeness of their tools. It's a different matter altogether when, at the end of the day, you realize while using that tool that you are doing more bug reporting than actually using it successfully.


Needless to say, the hidden artist inside you, who used to draw crappy paintings in school, or a rather worse-looking replica of his teenage girlfriend, will finally get to use his artistic sense in a real commercial setting. Your boss will keep pushing you to create state-of-the-art visualizations using Tableau, Trifacta, Oracle BDD or D3, only to cause utter confusion inside you: "What is expected of me? Am I a programmer, a data expert or an artist?!"



Considering all this, the best part of being a data scientist is that you often get the treatment of a celebrity when you log in to your LinkedIn profile. The sheer number of messages from recruiters trying to pull/place you in a different company will definitely give you the feeling of being a celebrity. On top of that, you might end up getting invitations from data science conferences to grace them with your presence as a speaker! So a typical celeb life nonetheless.

In the end, it would be unfair not to mention wannabe data scientists and their numerous questions, which will inspire you to write blogs like this one that should once and for all relieve you from answering questions requiring such detailed answers.