Find out how to create mock and dummy data for your data science projects

Photo by Volkan Olmez


We all know that data is essential. The problem is that many times we do not have (enough of) it. As we develop data applications or pipelines, we need to test them with data that resembles what might be seen in production.

It is difficult to manually create realistic datasets of sufficient volume and variety (e.g., different data types, characteristics). Furthermore, hand-created data is prone to our subconscious and systematic biases.

Fortunately, there are free online resources that can generate realistic fake data for us to use for testing. Let’s take a look at some of them:

  • Faker

  • Mockaroo

  • GenerateData

  • JSON Schema Faker

  • FakeStoreAPI

  • Mock Turtle

(1) Faker

The term ‘Faker’ is synonymous with mock data generation, given that there are numerous Faker data mocking libraries for different programming languages. The Faker library featured here is the Python one.

Faker is a Python library that helps you generate fake data. According to the documentation, it can be installed with the following command: pip install Faker

Once installed, we instantiate a Faker class object and call the various methods to generate the type of mock data we want:

Screengrab from Faker GitHub Repo | Image used under MIT License

Furthermore, Faker has its own pytest plugin, which provides a faker fixture you can use in your tests.


  • Faker GitHub

(2) Mockaroo

Mockaroo allows you to quickly and easily download large amounts of randomly generated test data based on the specifications you define.

The wonderful thing about this platform is that no programming is needed, and you can download the data in many different formats (e.g., SQL, XML, JSON, CSV) to be loaded directly into your test environment. Upon signing up, you can also create and save your schemas for future reuse.
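To make "loaded directly into your test environment" concrete, here is a small sketch that loads a CSV export into an in-memory SQLite database using only the Python standard library. The column names, the table name mock_users, and the inline sample rows are all placeholders standing in for a real downloaded file; they are assumptions for illustration, not Mockaroo output.

```python
import csv
import io
import sqlite3

# Placeholder for a downloaded CSV export; in practice, open the real file instead.
SAMPLE_CSV = """id,first_name,email
1,Ada,ada@example.com
2,Grace,grace@example.com
"""


def load_csv_into_sqlite(csv_file, conn, table="mock_users"):
    """Create a table from the CSV header and insert every row as text."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_csv_into_sqlite(io.StringIO(SAMPLE_CSV), conn)
print(inserted, "rows loaded")
print(conn.execute("SELECT email FROM mock_users ORDER BY id").fetchall())
```

Every column is stored as TEXT to keep the sketch short; a real test fixture would probably declare proper column types.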

With Mockaroo, you can design your own mock APIs and deploy them in your private cloud by leveraging the Mockaroo Docker image.


Screenshot of Mockaroo homepage | Image used with permission from Mockaroo



  • Mockaroo website

  • Mockaroo GitHub

(3) GenerateData

GenerateData is a free, open-source tool that allows you to generate large volumes of custom data quickly. Like Mockaroo, the website offers an easy-to-use interface (with a quick-start feature) for creating numerous different types of data in various formats.

Screenshot of GenerateData (V4) | Image used under GPL3 License

After testing out the demo on the main website, you can also download the free, fully functional, GNU-licensed version. If you require mock data beyond the maximum of 100 rows per run, a small $20 donation lets you generate and save up to 5,000 records at a time.

At the time of writing, the new version of GenerateData (V4) is close to a beta release, so do check out the GitHub repo for updates.


  • GenerateData website (latest version, v4)

  • GenerateData website (old version, v3)

  • GenerateData GitHub

(4) JSON Schema Faker

The JSON file format is one of the most popular ways of storing and transmitting data objects. Hence, generating both the fake data and the JSON schema that defines the data structure would be beneficial.

JSON Schema Faker combines the JSON Schema standard with fake data generators, allowing you to generate fake data that conforms to the schema.
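The idea of schema-driven generation can be illustrated with a toy Python function that walks a schema and emits random values of the declared types. This is not JSON Schema Faker's implementation or API (that tool is a separate project); it is a deliberately tiny stand-in that handles only a few keywords, and the example schema is invented for illustration.

```python
import json
import random
import string


def fake_from_schema(schema, rng):
    """Return a random value matching a small subset of JSON Schema keywords."""
    if "enum" in schema:
        return rng.choice(schema["enum"])
    kind = schema.get("type")
    if kind == "object":
        props = schema.get("properties", {})
        return {key: fake_from_schema(sub, rng) for key, sub in props.items()}
    if kind == "array":
        low = schema.get("minItems", 1)
        high = schema.get("maxItems", low + 2)
        return [fake_from_schema(schema["items"], rng) for _ in range(rng.randint(low, high))]
    if kind == "integer":
        return rng.randint(schema.get("minimum", 0), schema.get("maximum", 100))
    if kind == "number":
        return rng.uniform(schema.get("minimum", 0.0), schema.get("maximum", 100.0))
    if kind == "boolean":
        return rng.choice([True, False])
    if kind == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported schema: {schema!r}")


# Invented example schema describing a user record.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1, "maximum": 9999},
        "active": {"type": "boolean"},
        "plan": {"enum": ["free", "pro"]},
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 3},
    },
}

record = fake_from_schema(schema, random.Random(0))
print(json.dumps(record, indent=2))
```

The real tool goes much further, but the core loop is the same: the schema decides the shape, and a generator fills in the values.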

The website has a user interface for you to define the schema. Instead of manually writing the schema, you can select and build upon the list of Examples already prepared for you.

Screenshot of JSON Schema Faker tool | Image used under MIT License



  • JSON Schema Faker website

  • JSON Schema Faker GitHub

(5) FakeStoreAPI

You have probably come across your fair share of generic (e.g., Lorem Ipsum) mock data by now. This is where FakeStoreAPI switches things up.

It is a free online REST API for creating pseudo-real data for e-commerce or shopping use cases without running server-side code. The mock data will be highly relevant for projects that require retail-related data (e.g., products, carts, users, login tokens) in JSON format.

With just a few lines of code for the API call, the mock data can be readily created or modified:

Screenshot from FakeStoreAPI website | Image used with permission from FakeStoreAPI
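In Python, such a call might look like the sketch below, which uses only the standard library. The base URL, the /products path, and the category and price fields are assumptions based on the public FakeStoreAPI site, so check them against the current documentation. The helper average_price_by_category is a hypothetical example of what you might do with the response.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://fakestoreapi.com"  # assumed base URL; verify before relying on it


def fetch_json(path, base_url=BASE_URL, timeout=10):
    """GET a path from the API and decode the JSON body."""
    with urlopen(f"{base_url}{path}", timeout=timeout) as response:
        return json.load(response)


def average_price_by_category(products):
    """Group product dicts by 'category' and average their 'price' values."""
    prices = {}
    for product in products:
        prices.setdefault(product["category"], []).append(float(product["price"]))
    return {category: sum(values) / len(values) for category, values in prices.items()}


def main():
    # Requires network access; "/products" is an assumed endpoint.
    products = fetch_json("/products")
    for category, average in average_price_by_category(products).items():
        print(f"{category}: {average:.2f}")
```

Calling main() performs the request and prints one line per category. The network call is kept separate from the summarising logic so that the latter can be unit-tested without hitting the API.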


  • FakeStoreAPI website

  • FakeStoreAPI GitHub

(6) Mock Turtle

Mock Turtle is a user-friendly GUI-based platform for users to generate fake data in a JSON schema.

The tool mimics a JSON tree structure, and you can directly see the changes in the schema upon each click.

Besides JSON schema parsing, it also allows for the generation of nested structures and large datasets at no cost.

Screenshot from Mock Turtle website | Image used with permission from Mock Turtle



  • Mock Turtle website

Know of other excellent mock data generators? Let me know in the Comments section.

Before you go

I welcome you to join me on a data science learning journey! Follow this page to stay in the loop for more exciting data science content. Meanwhile, have fun generating fake data!
