
Yusuf Tas

December 31, 2021 / General Programming

Convert Web Page to Text

It has been a while since I last published a post. Finally, it is time to come back to this blog and keep learning new stuff I can share. Let’s get back to business.

As the title suggests, we will take the URL of a web page and save that page as a text document. This is particularly useful when working on NLP problems where you need textual information about a topic. The web, Wikipedia for example, is the most abundant source of such information. But manually copying and pasting from the web is not efficient when you need to process a lot of pages. So here comes the solution: automate the web-to-text conversion with a little help from Python.

While looking for a way to do this, I came across BeautifulSoup. It is a great Python library for parsing HTML, and it even has a function called get_text, how lucky we are 😀 Here is a very short function that requests a web page and saves its text:

import urllib.request
from bs4 import BeautifulSoup


def Web2Text(url, outname):
    # Header for the HTTP request; some sites reject requests without a browser-like User-Agent
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
    headers = {'User-Agent': user_agent}

    # Request and read the HTML from the given URL
    request = urllib.request.Request(url, None, headers)
    with urllib.request.urlopen(request) as response:
        data = response.read()  # raw HTML source of the web page (bytes)

    # Strip the HTML markup and keep only the text; naming the parser avoids a bs4 warning
    raw = BeautifulSoup(data, 'html.parser').get_text()
    print(raw)

    # Save the extracted text as UTF-8
    with open(outname, 'w', encoding='utf-8') as outf:
        outf.write(raw)
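If you want to try the idea without installing bs4, a rough stand-in for get_text can be built on Python’s built-in html.parser.HTMLParser. This is a swapped-in technique, not how BeautifulSoup works internally; the names TextExtractor and html_to_text are hypothetical, and skipping the contents of <script> and <style> tags is a choice made for this sketch. It works on an HTML string, so the bytes read from urlopen would need decoding first (assuming the page is UTF-8).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a tag whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html, separator=""):
    """Return the text of an HTML string, joining text nodes with `separator` (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return separator.join(parser.parts)


if __name__ == "__main__":
    page = ("<html><head><style>p {color: red}</style></head>"
            "<body><h1>Title</h1><p>Hello <b>world</b></p>"
            "<script>var x = 1;</script></body></html>")
    print(html_to_text(page))  # prints "TitleHello world"
```

For a downloaded page, html_to_text(data.decode('utf-8'), separator='\n') would give one text node per line, which could then be written to a file just as Web2Text does.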
April 28, 2018 / Data Science

Cryptocurrency Trading Bot Using Deep Learning: Part-1 Data Gathering

Recently, cryptocurrency trading has been one of the most talked-about topics in technology. With its severe ups and downs, Bitcoin and cryptocurrency trading in general gets attention from millions of investors. Its unpredictable nature and volatility attracted my attention too. So I decided to develop an automated cryptocurrency trading bot using deep learning. The bot should be able to analyze current trends and price changes, and decide when and how much to buy or sell to make a profit.

There will be several posts, to break the bot into manageable parts. So this time, instead of a one-script project, it will be a big project with an object-oriented design and several files. It would be very difficult to share all the code in one post, so if you are interested and want to run the code, you can get it from the repository: https://github.com/yusuftas/deep_trader_bot

This first post covers data gathering. Since I am planning to use deep learning, the bot will need lots of data samples to make good predictions. It is possible to get historical data without all this code, but since I also plan to run real-time tests to see whether the bot can make a profit, all the code posted here will definitely be needed.

April 16, 2018 / Data Science

Beginner’s Guide to R

What is R? According to the official website, “R is a language and environment for statistical computing and graphics. R is an integrated suite of software facilities for data manipulation, calculation and graphical display”. R is a great tool to have in any data scientist’s skill set. It is more a statistical and graphical plotting tool than a programming language.

As I’m learning R myself, I will post what I learn along the way. It is a kind of lecture notes that might also be helpful to others. I will update this post from time to time with more tips and tricks in R. Let’s begin.

Installing R is pretty straightforward. You can find it on the official website after choosing a mirror: https://cran.r-project.org/mirrors.html. After installing and starting R, you will see a command console. Similar to Python’s interactive shell, you can run any command in the console, or write scripts and run them through the console. I would suggest using RStudio, which is an IDE for the R environment. It makes things easier and has a nice interface. It is free for non-commercial use.

April 10, 2018 / Computer Vision

Using Tensorflow Object Detection API with OpenCV

In this post, I will go over how to use the Tensorflow Object Detection API within OpenCV. To be honest, I hadn’t used OpenCV for quite some time, and after recently looking into it, I realized how awesome OpenCV has become. It now has a dedicated DNN (deep neural network) module, which can also load networks trained with Caffe and Tensorflow. I am just so happy to see that functionality in OpenCV. Just think about it: you can use your Caffe- or Tensorflow-trained networks within OpenCV.

Alright, enough blubbering, let’s get back to the topic. In this post, I will use OpenCV DNN’s functionality to load a trained Tensorflow network and use it to apply object detection to a webcam stream. In the end, we will have a window that shows the webcam stream, with each frame modified to display the detected objects inside rectangles. Before we begin, let’s start with the result:
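Once the network is loaded with cv2.dnn.readNetFromTensorflow and a frame has been passed through net.setInput(cv2.dnn.blobFromImage(...)) and net.forward(), the remaining work is turning the raw output into rectangles. Below is a minimal sketch of that step, assuming an SSD-style detector (the tags suggest MobileNet) whose output has the common shape (1, 1, N, 7), each row being [image_id, class_id, confidence, left, top, right, bottom] with coordinates normalized to [0, 1]. The helper name extract_boxes and the 0.5 threshold are hypothetical choices.

```python
import numpy as np


def extract_boxes(detections, frame_width, frame_height, conf_threshold=0.5):
    """Convert an SSD-style DNN output into pixel boxes (hypothetical helper).

    Assumes `detections` has shape (1, 1, N, 7), each row being
    [image_id, class_id, confidence, left, top, right, bottom],
    with the box coordinates normalized to [0, 1].
    """
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue  # drop weak detections
        class_id = int(det[1])
        # Scale the normalized coordinates back to the frame size in pixels
        left = int(det[3] * frame_width)
        top = int(det[4] * frame_height)
        right = int(det[5] * frame_width)
        bottom = int(det[6] * frame_height)
        boxes.append((class_id, confidence, left, top, right, bottom))
    return boxes
```

In the webcam loop, each returned box could then be drawn with cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2) before the frame is shown with cv2.imshow.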

April 10, 2018 / Deep Learning

Caffe Python Installation with Anaconda

Caffe is one of the famous deep learning frameworks. Its core is implemented in C++, which got my attention when I started my PhD. Besides C++, it also has wrappers/interfaces for Matlab, Python and the command line. The Matlab interface is called matcaffe and the Python interface is called pycaffe. In this post I will talk about my observations and experiences during the installation of pycaffe.

First of all, you might ask: why Caffe, or even pycaffe? Caffe is one of the first frameworks I learned, modified, extended, etc., so I wanted to go back to it this time. Actually, the main reason is that its C++ core and Python interface make it easy for me to use OpenCV with it, which will be the topic of my next post.

In this post I won’t go into too much detail about Caffe / pycaffe. This post is mainly about the installation and the problems you can face during it.

April 1, 2018 / Data Science

Fully Connected Regression using Tensorflow

After my last post on Learning the randomness, I realized I might need to post something simpler on tensorflow. So in this post I will go over a simple regression problem to show that we can teach machines at least something 🙂

Today’s problem is the square function, yes, the simple x^2 function:

y = x^2

Using tensorflow and manually generated data, we will try to learn the square function. First, let’s have a look at the data generation part:

Data generation
import numpy as np
import matplotlib.pyplot as plt

datacount = 1000

# datacount linearly spaced values between 0 and 1
trainx = np.linspace(0, 1, datacount)                 # alternative: trainx = np.random.rand(datacount)
trainy = trainx ** 2                                  # y = x^2 for every sample

# Shuffle x and y together for a good train-test distribution
p = np.random.permutation(datacount)
trainx = trainx[p]
trainy = trainy[p]

# Divide the data in half for training and testing
half = datacount // 2
train_X = trainx[:half]
train_Y = trainy[:half]
test_X  = trainx[half:]
test_Y  = trainy[half:]

# Plot the train and test data side by side
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

ax1.scatter(train_X, train_Y)
ax1.set_title('Training data')
ax2.scatter(test_X, test_Y, color='r')
ax2.set_title('Testing data')
plt.show()
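The post trains the network with TensorFlow. To show the same idea with no framework at all, here is a NumPy-only sketch of a fully connected regression: one tanh hidden layer trained with plain full-batch gradient descent on the mean squared error for y = x^2. It is a stand-in, not the post’s model; the hidden size, learning rate, epoch count and the name train_square_mlp are arbitrary assumptions.

```python
import numpy as np


def train_square_mlp(hidden=16, epochs=5000, lr=0.05, seed=0):
    """Fit y = x^2 on [0, 1] with a 1 -> hidden (tanh) -> 1 fully connected net.

    Uses full-batch gradient descent on the mean squared error; NumPy only.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, 1000).reshape(-1, 1)   # (1000, 1) inputs
    y = x ** 2                                   # (1000, 1) targets

    # Weights and biases of the two fully connected layers
    W1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1)) / np.sqrt(hidden)
    b2 = np.zeros(1)

    n = len(x)
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(x @ W1 + b1)                 # hidden activations (1000, hidden)
        pred = h @ W2 + b2                       # network output (1000, 1)

        # Backward pass: gradients of mean((pred - y)^2)
        grad_pred = 2.0 * (pred - y) / n
        gW2 = h.T @ grad_pred
        gb2 = grad_pred.sum(axis=0)
        grad_h = (grad_pred @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
        gW1 = x.T @ grad_h
        gb1 = grad_h.sum(axis=0)

        # Gradient descent step (in place, so predict() below sees the final weights)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(xs):
        xs = np.asarray(xs, dtype=float).reshape(-1, 1)
        return (np.tanh(xs @ W1 + b1) @ W2 + b2).ravel()

    mse = float(np.mean((predict(x) - y.ravel()) ** 2))
    return predict, mse
```

Calling predict, mse = train_square_mlp() returns a prediction function and the final training error; once the error is small, predict([0.5]) should come out close to 0.25.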

March 31, 2018 / Data Science

Learning the Randomness – Attempt 1

Have you ever wondered how randomness works in computers? Try the button to generate a random number:

Every time you click the button, you get a random number. Isn’t it amazing that a computer, a deterministic machine, can create these sequences of random numbers? To me, this is a very interesting topic. Let me tell you something: what you see isn’t actually random. These sequences are generated by a function called a pseudo-random number generator.

Recently I was thinking about pseudo-random number generators. Pseudo-random number generators (prng) are functions that generate a sequence of numbers which approximates randomness. They are called pseudo because the sequence is actually deterministic: if you know the starting point, you can compute every value in the sequence that follows. This initial value is called the seed of the prng, so the generated sequence depends completely on the seed. To make a prng behave like a true random generator, its seed is usually tied to a truly random event, for example the exact microsecond at which a user presses the Generate button. That moment is effectively random, since we can’t know when the user will press. So if we use the moment in microseconds as the seed, we get a sequence that is unpredictable in practice, because the seed itself comes from a random event (although the sequence is still fully determined once the seed is known).
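To make the seed idea concrete, here is a minimal sketch of a linear congruential generator, one of the simplest prng designs, using the commonly quoted Numerical Recipes constants. It is an illustration only: NumPy’s RandomState and Python’s random module use the Mersenne Twister, not an LCG, and the function names lcg and first_values are hypothetical.

```python
import random


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Minimal linear congruential generator: x_{n+1} = (a * x_n + c) mod m.

    Yields floats in [0, 1). Teaching sketch only, not the prng NumPy uses.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state / m


def first_values(seed, count=5):
    """Return the first `count` values the LCG produces for a given seed."""
    gen = lcg(seed)
    return [next(gen) for _ in range(count)]


# Same seed -> identical sequence; this is what makes it "pseudo" random
assert first_values(42) == first_values(42)
# Different seed -> a different sequence
assert first_values(42) != first_values(43)

# Python's own prng (Mersenne Twister) behaves the same way
assert random.Random(7).random() == random.Random(7).random()
```

Everything about the sequence is fixed by the seed, which is why tying the seed to an unpredictable event is what makes the output look random.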

Let’s first look at how random this generator is in reality. What if we randomly sample 1000 x values and 1000 y values and plot the (x, y) points:

Random 1000 points plot
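import time
import numpy as np
import matplotlib.pyplot as plt

# Pseudo-random generator seeded with the current time in microseconds (RandomState needs a 32-bit seed)
rng = np.random.RandomState(int(time.time() * 1e6) % (2**32))

# 1000 random x values and 1000 random y values, plotted as (x, y) points
plt.scatter(rng.rand(1000), rng.rand(1000))
plt.show()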

It looks pretty much random to me 🙂 My idea is to somehow learn the randomness. That plot basically shouts “I’m randooom!”, but I thought there must still be something I could try. I decided to look into the random seed. For a given seed, the generated random sequence is deterministic. Maybe I can generate some data by using different random seeds and recording the first value the generator produces for each of them. If we could use this data to figure out the value the prng will generate, we would pretty much solve the randomness.
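A minimal sketch of that data-generation idea, assuming NumPy’s legacy RandomState generator: for each seed we record the first value the generator produces, which gives (seed, value) pairs a model could then try to learn from. The function name seed_first_value_dataset is hypothetical.

```python
import numpy as np


def seed_first_value_dataset(num_seeds=1000):
    """Build (seed, first random value) pairs, one fresh RandomState per seed."""
    seeds = np.arange(num_seeds)
    # Re-seeding a new generator each time gives the first value of each seed's sequence
    first_values = np.array([np.random.RandomState(s).rand() for s in seeds])
    return seeds, first_values
```

The seeds would serve as inputs and the first values as targets; because each pair is deterministic, regenerating the dataset always yields the same numbers.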

March 31, 2018 / General Programming

Random Walking Robots in Matlab

Hi Everyone,

In this post, I want to share my experience of coding random walking robots in Matlab. This was a small exercise for one of the courses I was tutoring. The problem is to simulate random walking robots in a 2D field:

  1. There will be several robots in the field. The configuration of the map should be read from a file. The file will include, in order: the size of the field (w,h); the number of robots; and the position and color marker of each robot, one robot per line, e.g. 10,10,bo
  2. Each robot moves randomly in one of the 4 directions: up, down, left, right.
  3. Once a robot has picked a direction, if another robot is occupying that position, it should not move this turn.
  4. When a robot reaches any of the boundaries, it should stop.
  5. The animation should end when all of the robots have stopped.

Let’s start with a visualization of what we are trying to get:

Random walking robots simulation

To solve this problem we need several things: we need to read the configuration from a file, generate the map plot with the given settings, and then run the simulation with the given specifications. Now let’s start with reading the input file.
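The post solves this in Matlab. As an illustration of the rules above, here is a Python sketch of the configuration reader and the turn-based simulation, without the plotting. The file layout (first line w,h, second line the robot count, then one x,y,marker line per robot), treating x = 0, x = w, y = 0 or y = h as the boundary, and the names parse_config, on_boundary and simulate are all assumptions, not the author’s code.

```python
import random


def parse_config(text):
    """Parse the robot configuration (assumed layout, one item per line):
    line 1: field size "w,h"; line 2: number of robots; then "x,y,marker" per robot.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    w, h = (int(v) for v in lines[0].split(","))
    n = int(lines[1])
    robots = []
    for ln in lines[2:2 + n]:
        x, y, marker = ln.split(",")
        robots.append({"pos": (int(x), int(y)), "marker": marker.strip()})
    return (w, h), robots


def on_boundary(pos, size):
    """Assumed rule: a robot on the field's edge (or beyond it) has reached a boundary."""
    x, y = pos
    w, h = size
    return x <= 0 or y <= 0 or x >= w or y >= h


def simulate(size, robots, seed=None, max_steps=100000):
    """Random walk until every robot has stopped; return the positions after each turn."""
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # up, down, left, right
    positions = [r["pos"] for r in robots]
    stopped = [on_boundary(p, size) for p in positions]
    history = [list(positions)]
    steps = 0
    while not all(stopped) and steps < max_steps:
        for i, (x, y) in enumerate(positions):
            if stopped[i]:
                continue
            dx, dy = rng.choice(moves)
            target = (x + dx, y + dy)
            if target in positions:
                continue  # occupied (stopped robots still block): skip this turn
            positions[i] = target
            stopped[i] = on_boundary(target, size)
        history.append(list(positions))
        steps += 1
    return history
```

Each entry of the returned history is one frame of the animation, so a plotting loop could redraw the markers from it turn by turn.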

February 12, 2018 / Life

Hello world!

It has always started with a “Hello world!”.

Babies are born and they are told “Hello world!”. Everyone who starts learning coding tries to print “Hello world!” before even understanding any programming. And here I am, starting my first post in my new blog with a “Hello world!”.

There is a saying that “Ignorance is bliss”. But not all the time. Out of ignorance, I lost my old blog. To be honest, I was not expecting that domain hunters would pick up my website’s domain and try to sell it back to me for more than $1000, so I ignored them. Ignorance was bliss until I realized I had lost my domain. So here we are, with a new domain.


Reach me

  • Github
  • Twitter
  • Youtube
  • Mail


Recent Comments

  • juan on Cryptocurrency Trading Bot Using Deep Learning: Part-1 Data Gathering
  • Maajid Khan on Caffe Python Installation with Anaconda
  • Nick G on Caffe Python Installation with Anaconda
  • Morne Supra on Using Tensorflow Object Detection API with OpenCV
  • Leon Ardo on Cryptocurrency Trading Bot Using Deep Learning: Part-1 Data Gathering

Categories

  • Computer Vision (1)
  • Data Science (4)
  • Deep Learning (5)
  • General Programming (4)
  • Life (1)

Tags

algorithm trading anaconda pycaffe automatic trading bot beautifulsoup beginner r binance-api caffe compiling errors caffe python installation cell array cryptocurrency data science deep learning deeptraderbot fc layer tensorflow fully connected networks hello world html render learning randomness life matlab mobilenet nlp opencv object detection phd prng pycaffe python python-binance r random randomness random pattern random walk random walking r vs python simple tensorflow square function statistics struct tensorflow tensorflow object detection web2text webcam object detection webcam stream web to text
© 2023 Yusuf Tas - Powered by SimplyNews
 
