1. Introduction: Business Problem

1.1. Exploring the best venues in Dhaka, the Capital of Bangladesh.

Dhaka is the capital of Bangladesh. More than 19.5 million people live in Dhaka, the most densely populated city in the country, which spans 306.38 square kilometers (about 118.29 square miles) [1]. Its rich traditions and interesting history have attracted many tourists over the years, and the number of visitors has only grown with time. The city lives up to the expectations and pressure that come with being the capital, and it offers much more than traditions and dry business figures.

When tourists visit a city or country, they naturally look for the best places to stay and eat, and they tend to choose places based on previous user ratings or prices. Dhaka has many venues for eating and staying, but not all of them are good, and not all of them are affordable for everyone.

The main goal of this project is to explore various venues in Dhaka and analyze them based on user ratings and pricing tiers, focusing specifically on the ‘food’ and ‘hotel/resort’ categories. The project uses the Foursquare API [2] as its main data source, since Foursquare has a database of millions of places and its Places API supports location search, location sharing, and business details. A map of the venues, with colors encoding their attributes, is then plotted to show where the venues are and what they offer. This helps tourists choose the best venues to eat and stay at.

1.2. Target Audience:

The target audience for this project falls into two groups. First, tourists visiting Dhaka, Bangladesh, can use the clustered map to decide which places to visit and which hotel or resort to stay in, based on their budget and rating preferences. Second, developers can use this information to build a website or mobile application that updates the location data regularly and helps tourists decide which places to visit and where to stay.

2. Data

2.1. Data Source

Foursquare API Data: This project uses the Foursquare API to get location data and other information about venues in Dhaka, Bangladesh.

2.2. Data Extraction:

The Foursquare API returns many properties (features) for each venue, but not all of them are needed for this analysis. I retrieved only the following features:

  1. Name: The name of the venue
  2. Rating: The rating provided by the users
  3. Price category: The price category the venue belongs to, as defined by the API
  4. Latitude: The latitude of the venue
  5. Longitude: The longitude of the venue
  6. Category: The category defined by the API
  7. Address: The address of the venue
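For illustration, the seven features above could be picked out of one venue record as in the sketch below. It assumes the JSON layout of the legacy Foursquare v2 venue-details response (name, rating, price.tier, location.lat, location.lng, location.address, categories); the helper extract_features and the sample record are hypothetical, and a missing field simply comes back as None so it can be cleaned up later.

```python
from typing import Any, Dict, Optional


def extract_features(venue: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Pick the seven features used in this project out of one venue record.

    Assumes a Foursquare v2-style venue-details layout; absent fields become None.
    """
    location = venue.get("location", {})
    # Assumption: the first listed category is treated as the venue's category.
    categories = venue.get("categories") or [{}]
    return {
        "Name": venue.get("name"),
        "Rating": venue.get("rating"),
        "PricingTier": venue.get("price", {}).get("tier"),
        "Latitude": location.get("lat"),
        "Longitude": location.get("lng"),
        "Category": categories[0].get("name"),
        "Address": location.get("address"),
    }


# Hypothetical record, made up for illustration only.
sample = {
    "name": "Example Cafe",
    "rating": 8.1,
    "price": {"tier": 1},
    "location": {"lat": 23.79, "lng": 90.41, "address": "Road 1, Gulshan"},
    "categories": [{"name": "Cafe"}],
}
print(extract_features(sample))
```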

2.2.1. Dhaka:

First, I extracted the latitude and longitude of Gulshan, Dhaka, using the geocoder library. I chose Gulshan because it lies near the middle of Dhaka.

Coordinates of Gulshan, Dhaka: latitude 23.78266, longitude 90.41164.

I then created a map centered on Dhaka using the folium library and placed a marker on Gulshan at the extracted coordinates. This shows the starting point for the next phase, i.e., data extraction from the Foursquare API.

2.2.2. Data Extraction from Foursquare:

To use the Foursquare API, we need to create a developer (sandbox) account and then create a project. Once the project is created, we can access the client_id and client_secret needed to fetch data from the Foursquare API. We also need to set up an access_token, which some endpoints require.

Next, I set the search radius to 9 kilometers. The API also requires a version parameter in ‘YYYYMMDD’ format; setting it to the current date retrieves the latest data.
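The parameters above can be assembled into a request URL roughly as follows. This is a sketch that assumes the legacy v2 venues/explore endpoint and its client_id, client_secret, v, ll and radius (in meters) query parameters; the source does not say which endpoint was used, and the credential strings are placeholders. The function only builds the URL and makes no network request.

```python
import datetime
from urllib.parse import urlencode

# Placeholder credentials (assumption): use the values from your own Foursquare project.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

GULSHAN_LAT, GULSHAN_LNG = 23.78266, 90.41164
RADIUS_M = 9 * 1000  # 9 km expressed in meters


def explore_url(lat: float, lng: float, radius_m: int, today: datetime.date) -> str:
    """Build a v2 venues/explore URL (the endpoint choice is an assumption)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": today.strftime("%Y%m%d"),  # version parameter in YYYYMMDD format
        "ll": f"{lat},{lng}",
        "radius": radius_m,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


print(explore_url(GULSHAN_LAT, GULSHAN_LNG, RADIUS_M, datetime.date.today()))
```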

In the first step, I retrieved the venues within 9 km of Gulshan, Dhaka, from Foursquare. Then I removed the rows that are not related to the food and hotel categories. Here is a sample of the dataset:

[Table: sample of the venue dataset after filtering to the food and hotel categories]
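One simple way to do this filtering is keyword matching on the category name, sketched below. The keyword list and the sample rows are hypothetical (the source does not list the exact categories kept), and plain substring matching can over- or under-match, so the resulting categories should be checked by hand.

```python
# Hypothetical keyword list: the exact category filter used in the project is not stated.
KEEP_KEYWORDS = ("restaurant", "cafe", "café", "coffee", "food", "bakery",
                 "ice cream", "hotel", "resort")


def is_food_or_hotel(category: str) -> bool:
    """True if the category name contains any keep-keyword (case-insensitive)."""
    name = category.lower()
    return any(keyword in name for keyword in KEEP_KEYWORDS)


# Hypothetical rows from the first API response.
venues = [
    {"name": "Venue A", "category": "Café"},
    {"name": "Venue B", "category": "Shopping Mall"},
    {"name": "Venue C", "category": "Hotel"},
    {"name": "Venue D", "category": "Chinese Restaurant"},
]
filtered = [v for v in venues if is_food_or_hotel(v["category"])]
print([v["name"] for v in filtered])
```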

The initial data doesn’t contain venue details such as price and rating, so I retrieved more details about each venue from Foursquare using the venue_id obtained earlier.
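Fetching the details means one request per venue_id. The sketch below only builds those request URLs; it assumes the legacy v2 venue-details endpoint (/v2/venues/ followed by the venue id) with the same credential and version parameters as before, and the venue ids and credentials shown are placeholders.

```python
from urllib.parse import quote, urlencode


def venue_details_url(venue_id: str, client_id: str, client_secret: str, version: str) -> str:
    """Build a v2 venue-details URL for one venue_id (endpoint layout is an assumption)."""
    query = urlencode({"client_id": client_id, "client_secret": client_secret, "v": version})
    return f"https://api.foursquare.com/v2/venues/{quote(venue_id)}?{query}"


# Hypothetical venue ids, standing in for those returned by the first request.
venue_ids = ["4b0588f1f964a520", "5a1c2d3e4f5a6b7c"]
urls = {
    vid: venue_details_url(vid, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20210501")
    for vid in venue_ids
}
for vid, url in urls.items():
    print(vid, url)
```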

This left two datasets retrieved separately from the API, so I combined them. Here’s a sample of the combined dataset:

[Table: sample of the combined dataset]
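Combining the two datasets amounts to a join on venue_id. The standard-library sketch below assumes a left join, so a venue without details keeps None for the missing fields and is dropped in the cleaning step; in pandas the equivalent would be basic.merge(details, on="venue_id", how="left"). Column names other than PricingTier and Rating, and all sample values, are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
DETAIL_FIELDS = ("PricingTier", "Rating", "Address")


def combine(basic_rows: List[Row], detail_rows: List[Row]) -> List[Row]:
    """Left-join detail rows onto basic rows by venue_id (join type is an assumption)."""
    details_by_id = {row["venue_id"]: row for row in detail_rows}
    combined = []
    for row in basic_rows:
        detail = details_by_id.get(row["venue_id"], {})
        merged = dict(row)  # copy so the input rows are not modified
        for field in DETAIL_FIELDS:
            merged[field] = detail.get(field)  # None when no detail is available
        combined.append(merged)
    return combined


# Hypothetical sample rows.
basic = [
    {"venue_id": "a1", "Name": "Venue A", "Category": "Cafe"},
    {"venue_id": "b2", "Name": "Venue B", "Category": "Hotel"},
]
details = [{"venue_id": "a1", "PricingTier": 1, "Rating": 8.2, "Address": "Road 1"}]

for row in combine(basic, details):
    print(row)
```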

Some rows had missing PricingTier or Rating values, so I removed them, leaving a total of 85 venues in the final dataset. I then plotted all of these venues on the map.

[Map: the 85 venues in the final dataset]
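The cleaning step can be sketched as a filter that keeps only rows with both values present. It treats both None and a float NaN as missing, since either could appear depending on how the data was loaded; the sample rows are hypothetical. With pandas, df.dropna(subset=["PricingTier", "Rating"]) does the same.

```python
import math
from typing import Any, Dict, Iterable, List


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(rows: Iterable[Dict[str, Any]],
                    required: tuple = ("PricingTier", "Rating")) -> List[Dict[str, Any]]:
    """Keep only rows where every required field is present."""
    return [row for row in rows if not any(is_missing(row.get(f)) for f in required)]


# Hypothetical combined rows.
rows = [
    {"Name": "Venue A", "PricingTier": 1.0, "Rating": 8.2},
    {"Name": "Venue B", "PricingTier": None, "Rating": 7.1},
    {"Name": "Venue C", "PricingTier": 2.0, "Rating": float("nan")},
    {"Name": "Venue D", "PricingTier": 1.0, "Rating": 6.5},
]
clean = drop_incomplete(rows)
print(len(rows), "->", len(clean))
```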

3. Methodology

The main goal of this project is to explore various venues in Dhaka based on user ratings and average prices, helping visitors identify the best places to eat and stay around the city. We extract data within 9 km of the city center, since that radius covers most of the city.
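To check locally whether a venue falls inside the 9 km search area, the great-circle distance from Gulshan can be computed with the haversine formula, sketched below. The mean Earth radius of 6371 km is a standard spherical approximation, and this check is an illustrative addition; the project itself relies on the API's radius parameter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
GULSHAN = (23.78266, 90.41164)  # Gulshan coordinates from section 2.2.1


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat: float, lon: float, radius_km: float = 9.0) -> bool:
    """True if (lat, lon) lies within radius_km of Gulshan."""
    return haversine_km(GULSHAN[0], GULSHAN[1], lat, lon) <= radius_km


# Points 0.04 and 0.1 degrees north of Gulshan (about 4.4 km and 11.1 km away).
print(within_radius(23.82266, 90.41164), within_radius(23.88266, 90.41164))
```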

In the first step, we retrieved the venues within 9 km of Gulshan, Dhaka, from Foursquare. Because this initial data doesn’t contain venue details such as price and rating, we then retrieved more details about each venue from Foursquare using the venue_id obtained earlier.

In the second step, we combined the detailed dataset with the previously retrieved data and cleaned it by dropping the rows with missing prices or ratings. The final dataset therefore contains all the required features for each venue, including price and rating.

In the final step, we analyze the data based on the rating and price of each venue. We use heatmaps to identify where the venues are concentrated, so that visitors can choose areas with many venues and therefore have more options in one place. We explore venues by pricing tier and rating, count the venues in each category with a barplot, and identify which venue categories are popular in the city. The results are presented to stakeholders as plots and maps so that they are easy to understand. Finally, the venues are clustered based on the features extracted for each venue, and the areas belonging to each cluster are analyzed. This identifies the areas most suitable for a particular visitor based on their pricing and rating preferences.

4. Analysis

First, I performed some basic exploratory data analysis to get more insight from the raw data. I used a heatmap to see in which areas of the city the venues are located, drawing the map with GeoJSON data of Dhaka collected from [3].

[Map: heatmap of venue locations in Dhaka]

Most of the venues are located around the Dhanmondi and Gulshan areas, so these areas are a good choice for visitors.
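The source does not describe exactly how the heatmap was built. As a simple numeric stand-in, the sketch below counts venues per fixed-size grid cell; the densest cells correspond to the hot spots a heatmap would show. The 0.01-degree cell size and the coordinates are hypothetical.

```python
import math
from collections import Counter
from typing import Tuple

CELL_DEG = 0.01  # grid cell size in degrees (about 1.1 km north-south); an assumption


def cell_index(lat: float, lng: float, size: float = CELL_DEG) -> Tuple[int, int]:
    """Integer (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / size), math.floor(lng / size))


# Hypothetical venue coordinates.
venue_coords = [
    (23.7925, 90.4145), (23.7931, 90.4152), (23.7948, 90.4171),
    (23.7462, 90.3765), (23.7455, 90.3772), (23.8105, 90.3650),
]
density = Counter(cell_index(lat, lng) for lat, lng in venue_coords)
for (row, col), n in density.most_common(3):
    print(f"cell from ({row * CELL_DEG:.2f}, {col * CELL_DEG:.2f}): {n} venues")
```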

4.1. Categories:

Next, I analyzed the venues by category, using a barplot to visualize the number of venues in each category.

[Figure: barplot of venue counts per category]

Cafes are the most common category in the dataset, followed by coffee shops and restaurants, so visitors interested in cafes will have the most options to choose from.
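The counts behind such a barplot are a simple frequency table. The sketch uses collections.Counter on a hypothetical category column; in pandas, df["Category"].value_counts() gives the same numbers.

```python
from collections import Counter

# Hypothetical Category column from the final dataset.
categories = ["Cafe", "Coffee Shop", "Cafe", "Restaurant", "Cafe", "Coffee Shop", "Hotel"]

counts = Counter(categories)
for category, n in counts.most_common():  # most frequent first
    print(f"{category:<12} {n}")
```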

4.2. Rating:

Ratings play an important role in determining whether a venue is worth visiting, and tourists often rely on previous user ratings to decide whether to visit a venue.

Ratings in the dataset range from 1 to 10. I divided them into 4 bins labeled Low, Okay, Good, and Excellent, and used a barplot to show the number of venues in each rating category.

[Figure: barplot of venue counts per rating category]
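The source does not give the bin edges. Assuming four equal-width bins over the full 1-10 scale (width 2.25, right-closed like the pandas default), the labelling can be sketched as below. Note that pd.cut(ratings, bins=4, labels=[...]) would instead take equal-width edges from the observed minimum and maximum ratings, so its bins may differ from these.

```python
import bisect

LABELS = ["Low", "Okay", "Good", "Excellent"]
LOW, HIGH = 1.0, 10.0
WIDTH = (HIGH - LOW) / len(LABELS)  # 2.25 rating points per bin
# Inner edges 3.25, 5.5 and 7.75; bins are right-closed, e.g. (3.25, 5.5] is "Okay".
EDGES = [LOW + WIDTH * i for i in range(1, len(LABELS))]


def rating_label(rating: float) -> str:
    """Map a 1-10 rating to one of four equal-width bins (edges are an assumption)."""
    if not LOW <= rating <= HIGH:
        raise ValueError(f"rating {rating} is outside {LOW}-{HIGH}")
    # bisect_left puts a rating equal to an edge into the lower bin.
    return LABELS[bisect.bisect_left(EDGES, rating)]


print([rating_label(r) for r in (6.2, 7.7, 8.4)])  # ['Good', 'Good', 'Excellent']
```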

I used a boxplot to see which venue categories are rated most highly.
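A boxplot's central line is each category's median rating, so ranking categories by median gives a numeric version of the same comparison. The sketch below uses hypothetical (category, rating) pairs; with pandas, the same figures come from df.groupby("Category")["Rating"].median().

```python
import statistics
from collections import defaultdict

# Hypothetical (category, rating) pairs.
rows = [("Cafe", 8.1), ("Cafe", 7.6), ("Cafe", 8.4),
        ("Ice Cream Shop", 8.3), ("Ice Cream Shop", 8.0),
        ("Restaurant", 6.4), ("Restaurant", 7.0), ("Restaurant", 5.9)]

ratings_by_category = defaultdict(list)
for category, rating in rows:
    ratings_by_category[category].append(rating)

# Rank categories by median rating, highest first.
medians = {c: statistics.median(r) for c, r in ratings_by_category.items()}
for category, median in sorted(medians.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<15} median={median:.2f} n={len(ratings_by_category[category])}")
```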

[Figure: boxplot of ratings per venue category]

The plot shows that the Ice Cream Shop and Cafe categories are highly rated. I then plotted the rating categories on the map to see which areas have the highest-rated venues. The colors represent: ‘Low’: ‘red’, ‘Okay’: ‘lightred’, ‘Good’: ‘orange’, ‘Excellent’: ‘green’.

[Map: venues colored by rating category]

4.3. Price:

Dhaka has many venues for eating and staying, but every visitor has their own budget, and not all venues are affordable for everyone. I used a barplot to see how many venues fall under each pricing tier.

The Foursquare API defines four pricing tiers, from 1 (cheapest) to 4 (most expensive).
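The per-tier counts behind this barplot can be computed as below. The tiers appear as 1.0-4.0 in this project, so the sketch (with a hypothetical PricingTier column) converts them to integers and reports all four tiers, including any with zero venues.

```python
from collections import Counter

# Hypothetical PricingTier column; the tiers appear as floats (1.0-4.0) in this project.
tiers = [1.0, 1.0, 2.0, 1.0, 3.0, 1.0, 2.0]

counts = Counter(int(t) for t in tiers)
for tier in range(1, 5):  # list every tier, even one with no venues
    print(f"tier {tier}: {counts[tier]}")
```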

[Figure: barplot of venue counts per pricing tier]

It’s good to see that most of the venues are in the cheapest pricing tier. I then plotted the pricing tiers on the map to visualize pricing in different areas. The colors represent: ‘1.0’: ‘green’, ‘2.0’: ‘orange’, ‘3.0’: ‘lightred’, ‘4.0’: ‘red’.

[Map: venues colored by pricing tier]

4.4. Clustering

I clustered the venues based on their price, rating, and location to get more insight into the venues and the relationships between them.

First, I dropped the columns that are not used for clustering. Then I used the elbow method to determine the optimal number of clusters.
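The source does not name the clustering algorithm. The elbow method is most often paired with k-means, so the sketch below assumes k-means (Lloyd's algorithm) on standardized features; whether the original features were scaled is also an assumption. It prints the within-cluster sum of squared distances for each k, and the elbow is the k after which adding clusters stops reducing this value sharply. With scikit-learn, the same quantity is the inertia_ attribute of a fitted KMeans model. The rows are hypothetical.

```python
import random
from typing import List, Sequence

Point = List[float]


def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each column so tiers, ratings and coordinates share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points: List[Point], k: int, iterations: int = 100, seed: int = 0) -> float:
    """Run Lloyd's k-means; return the within-cluster sum of squared distances."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k random data points as starting centers
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = [[sum(col) / len(c) for col in zip(*c)] if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical (PricingTier, Rating, Latitude, Longitude) rows.
rows = [
    (1, 8.0, 23.79, 90.41), (1, 7.8, 23.80, 90.42), (2, 7.6, 23.79, 90.42),
    (1, 6.1, 23.74, 90.37), (1, 6.4, 23.75, 90.38), (2, 6.0, 23.74, 90.38),
]
points = standardize(rows)
for k in range(1, 6):
    print(k, round(kmeans_inertia(points, k), 3))
```

A single random start can land in a poor local optimum; running several seeds and keeping the lowest value, as scikit-learn's n_init parameter does, makes the elbow curve more reliable.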

[Figure: elbow plot for choosing the number of clusters]

The elbow curve is fairly smooth, without a sharp bend, so the choice is a judgment call; I decided to use 2 clusters. Here is a sample of the dataset with the cluster label of each venue:

[Table: sample of venues with their cluster labels]

I then plotted the clusters on the map to get a better understanding of them.

[Map: venues colored by cluster]

4.5. Examine Clusters

Cluster 1: The average pricing tier of Cluster 1 is 1.40 and the average rating is 7.71.

Cluster 2: The average pricing tier of Cluster 2 is 1.39 and the average rating is 6.23.
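Per-cluster averages like these come from grouping the labelled rows by cluster and averaging each feature. The sketch below uses hypothetical rows; in pandas, df.groupby("Cluster")[["PricingTier", "Rating"]].mean() would produce the same table, assuming the cluster label is stored in a Cluster column.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (cluster label, pricing tier, rating).
labelled = [(1, 1.0, 7.9), (1, 2.0, 7.5), (2, 1.0, 6.1), (2, 2.0, 6.4), (2, 1.0, 6.2)]

by_cluster = defaultdict(list)
for cluster, tier, rating in labelled:
    by_cluster[cluster].append((tier, rating))

# cluster -> (average pricing tier, average rating)
summary = {
    cluster: (statistics.mean(t for t, _ in rows), statistics.mean(r for _, r in rows))
    for cluster, rows in sorted(by_cluster.items())
}
for cluster, (avg_tier, avg_rating) in summary.items():
    print(f"Cluster {cluster}: average pricing tier {avg_tier:.2f}, average rating {avg_rating:.2f}")
```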

5. Results and Discussion

From our analysis, we can draw several conclusions that will help tourists choose the best venues to eat and stay at in Dhaka, Bangladesh. We extracted data within 9 km of the center of Dhaka. Initially, 154 venues were fetched along with their categories. Since the analysis focused mainly on the food and hotel/resort categories, we removed the other categories, leaving 98 venues. For these 98 venues, we fetched additional data: price_tier, rating, and address. We then combined the two datasets and dropped the rows with a missing price or rating, leaving a total of 85 venues.

Visualizing the venues of interest shows that most of them are located around the Dhanmondi and Gulshan areas, so these areas are a good choice for visitors. Cafes are the most common category in the dataset, followed by coffee shops and restaurants, so visitors interested in cafes will have the most options to choose from.

Ratings in our dataset range from 1 to 10, and we divided them into 4 bins labeled Low, Okay, Good, and Excellent. Most of the venues are rated Good, and the map shows that these venues are around the Dhanmondi and Gulshan areas. The Ice Cream Shop and Cafe categories are the most highly rated, which suggests that these categories are especially well liked around the city. The analysis also shows that most of the venues are in the cheapest pricing tier.

Using clustering, we separated the venues into two clusters. Cluster 1 has an average pricing tier of 1.40, close to the cheapest tier, and a good average rating of 7.71; it contains most of the venues in Gulshan and a few in Dhanmondi. Cluster 2 has an almost identical average pricing tier of 1.39 but a lower average rating of 6.23; it contains most of the venues in Dhanmondi. So a tourist looking for highly rated, inexpensive venues can choose Gulshan, while Dhanmondi is also a decent choice. Beyond helping tourists directly, developers can use this information to create a website or mobile application that helps tourists decide which places to visit and where to stay.

6. Conclusion

The main goal of this project was to explore and identify popular venues in Dhaka for tourists based on user ratings and venue prices. The project fetched data from the Foursquare API, analyzed it, and visualized the results with different plots for a better understanding. Finally, the venues were clustered and the resulting clusters were plotted on the map. The map shows two main areas tourists can visit: Dhanmondi and Gulshan, although the Dhanmondi area also contains some venues from Cluster 1. Prices in both areas are almost the same, but ratings are somewhat higher in the Gulshan area. Otherwise, both areas are great choices for tourists.

7. References