Six Free Public Data Sets for Your Data Science Project: What You Need to Practice

As a new data scientist, completing your first project is very satisfying. It is a major career milestone that reinforces your skills. If you have no prior experience working with data sets, you are probably wondering where to find an appropriate one, how large it should be, and how simple or complicated it needs to be to test your skills.

Don’t worry: we will help you find some interesting and suitable datasets for your first project. But first, let’s review some essential details.

What is a dataset?

A data set is a collection of data. Data sets commonly come as spreadsheets or CSV files. Usually, a data set is a single file, organized in rows and columns, that stores the required details. However, some data sets come in other formats: a data set can also be a zip file or folder containing multiple tables of related data.
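To make the rows-and-columns idea concrete, here is a minimal sketch in Python that reads a small CSV table using only the standard library. The column names and values are made up for illustration, and the text is held in a string so the example runs without any file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for a small CSV file.
# The names and numbers are invented for illustration only.
SAMPLE_CSV = """city,state,population
Town A,XX,1000
Town B,YY,2500
Town C,ZZ,400
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = load_rows(SAMPLE_CSV)
columns = list(rows[0].keys())

print("Columns:", columns)
print("Number of rows:", len(rows))

# Values are read as strings, so convert before doing arithmetic.
total_population = sum(int(row["population"]) for row in rows)
print("Total population:", total_population)
```

With a real file, you would pass an opened file object to `csv.DictReader` instead of wrapping a string in `io.StringIO`.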

How are datasets created?

Data sets are created in different ways, and you can find many types. Some are machine-generated and some are collected via surveys. Some are recorded from observations and some have been scraped from websites.

When you start working with a data set, it is essential to know how it was created and where it comes from. Jumping straight into analysis without that background makes the work harder. Finding data sets can be fun and frustrating at the same time, because sifting through thousands of datasets to find a suitable one is a difficult task for a newcomer.

Fortunately, you can find a suitable data set in online repositories that curate datasets and filter out uninteresting ones.

Free public datasets to use

  • United States Census Data

The U.S. Census Bureau publishes a large stock of demographic data at various levels. You can find these data sets at the state, city, or zip code level and use them for free. To find a suitable dataset, visit the Census Bureau website; the data works well for data visualization projects.

The data can also be accessed via an API, for example through the choroplethr R package. These datasets are clean, organized, and nuanced, and do not require manual cleaning.

  • Medicare Hospital Quality

The Centers for Medicare & Medicaid Services curates a database on quality of care at 4,000 Medicare-certified hospitals across the U.S. These records allow for interesting comparisons. It is a huge store of data, and you will need to do some research to understand it. The data may be spread over multiple files, so you need to be careful.
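When related data is spread over multiple files, a common approach is to combine the tables on a shared key column. The sketch below assumes two hypothetical tables that share a `hospital_id` column. The column names, identifiers, and values are invented for illustration and do not reflect the actual layout of the Medicare files.

```python
import csv
import io

# Hypothetical stand-ins for two files from a multi-file download.
# All identifiers and values are invented for illustration.
HOSPITALS_CSV = """hospital_id,name
H001,Example General
H002,Sample Memorial
"""

SCORES_CSV = """hospital_id,measure,score
H001,readmission_rate,15.2
H002,readmission_rate,14.1
H003,readmission_rate,16.0
"""

def read_csv(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def join_on_key(left, right, key):
    """Inner join: merge each right row with the left row sharing its key.

    Right rows whose key has no match in the left table are dropped.
    """
    lookup = {row[key]: row for row in left}
    joined = []
    for row in right:
        match = lookup.get(row[key])
        if match is not None:
            joined.append({**match, **row})
    return joined

merged = join_on_key(read_csv(HOSPITALS_CSV), read_csv(SCORES_CSV), "hospital_id")
for row in merged:
    print(row["name"], row["measure"], row["score"])
```

Note that the score row for `H003` is dropped because it has no matching hospital record. Checking for unmatched keys like this is one of the ways to be careful with multi-file data sets.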

  • Bureau of Labor Statistics

The Bureau of Labor Statistics maintains records of economic indicators such as unemployment, inflation, and GDP. This large data set lets you practice data visualization and data processing.


  • UNICEF

UNICEF’s data records, which span various categories, can be an interesting and credible source for your project. You can find data about the lives of children, children’s health, and education rates around the world.

  • Wikipedia

Wikipedia provides information on almost any topic. People search it for information, and anyone can edit it. The Wikipedia database could be an interesting resource for research and projects. It is available for personal use and mirroring, so you can download a large amount of data to your computer and clean it for use.

  • IMF Economic Data

The International Monetary Fund’s website provides a large collection of global financial statistics. You can use them to practice data visualization and data cleaning.

