You’ve got a huge spreadsheet that has a lot of data in it, and you’ve built an Android app which works like a search engine on the data. Nice! But there’s a problem: when you build your app with all of that data in it, the APK is huge! You want to reduce the size of the app. And you also want to offload the search functionality onto a relational database, which is probably going to provide a more efficient search. How do you start?

This blog post explores one way to do it. It’s “quick and dirty”, and it’s not recommended to do things exactly this way. I’ll talk about why in Part II. But this method will give you a start.

Here’s a quick sketch of the idea: You put your data in the cloud using the Amazon Relational Database Service (RDS). Then you add JDBC calls to your app to access the cloud. It’s pretty quick. Here are the steps, using a simple example that I tried for myself.

Technical Details: My development environment runs Ubuntu 16.04, and I have a MySQL client and the MySQL database already installed on my local machine. I use Android Studio 3.5 IDE for building Android apps. Also, I have an Amazon AWS account set up already. You can follow this tutorial if you don’t have any of that, but then specific steps will differ for you.

Get Your Data Source Ready

For my data source, I downloaded some food inspection data from healthdata.gov in a csv (“comma-separated values”) format. I opened the csv file in a spreadsheet, selected some of the columns that I wanted, and exported them to another file, also in csv format. You can use the csv file that I generated by starting with this small, truncated version of the data. Later, you can use or create your own, very large data source for experiments.

Create an Amazon RDS MySQL Database

Visit the Amazon MySQL RDS page and click “Get Started”. If you don’t have an AWS account, you will need to sign up for one, first. Check out the pricing, if you are worried. There’s a free tier, great!

If you’re already signed in, another way to get started is to visit the AWS Management Console, search for “RDS”, and click the result for “Managed Relational Database Service”.

At this point, you’ll see a “Create Database” button. Choose “MySQL”, and click the “free tier”. Type in healthdata-1 for the name. Choose a username when requested. I’m using fullstackdev. Pick a secure password. The other parts of the form are straightforward. You can think about using IAM based authentication later. For this proof-of-concept piece of work, let’s keep it simple, and use password based authentication. For the rest, accept all defaults.

At this point, a page opens which says the database is being created.

AWS RDS creating database

Click the “modify” button. You’ll see that you can modify various things about the database later, if you want. Just be aware of this. For right now, you’ll need to “modify” the RDS so that it can be accessed from external sources – so choose “Public accessibility” and set it to Yes, and make sure to click the “Continue” button at the bottom of the page to save your changes. You need to do this so that you can create a database, load data into it, and access it via JDBC.

Now we’ve got an RDS in the cloud, and it’s accessible from our home environment. Next, we need to create a database.

Create Your Database and Manage Access

If you click the DB identifier in your RDS console, you will see an area called “Connectivity & security”. That area tells you what your endpoint is, and what your port is. The port defaults to 3306. Your endpoint will be something like healthdata-1.c84gpzpanfrn.us-east-1.rds.amazonaws.com. This is a URL you can use to access the database from another machine.

In the ‘Security’ pane, at the right, you will see your VPC (Virtual Private Cloud) security groups with a link to the default. Click that. It will take you to your Security Groups area. The default VPC security group should be preselected. Look at the bottom panel, where you should see the “Description”, “Inbound”, “Outbound”, and “Tags” tabs. Click “Inbound” and hit the “Edit” button. Click the “Add Rule” button, select MySQL/Aurora, make sure that the protocol is set to TCP/IP and the port to 3306, thne choose “MyIP” as the source. Your IP address will be set when doing this. Then hit the “Save” button.

Remember that you’ve added this rule just for your own IP address! You’re doing this for test purposes. Later, if you want, you can make different inbound rules, but this setup is good for a proof-of-concept.

Now the RDS is accessible. I am comfortable using the command line for MySQL client, so I used this to step into the cloud, and create my database. You can use whatever tool you want to do this.

First, I connected via this command:

mysql -u fullstackdev -P 3306 -p -h healthdata-1.c84gpzpanfrn.us-east-1.rds.amazonaws.com healthdata-1

The -p option tells the client to ask for a password interactively. I gave the password that I had set up earlier, and immediately, I was connected. This is what I saw:

Type: MySQL/Aurora,
Protocol: TCP
Port Range: 3306
Source: MyIP
Description: MySQL client

show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| innodb             |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.03 sec)

It’s the usual default MySQL database setup.

I had already designed a database around the food inspection data that I had decided to import. I created my own database like this:

CREATE DATABASE food_inspections;
USE food_inspections;
DROP TABLE health_reports;
CREATE TABLE health_reports (
    id INT AUTO_INCREMENT PRIMARY KEY,
	inspection_id INT,
	dba_name TEXT,
	aka_name TEXT,
	license_num INT,
	facility_type TEXT,
	risk TEXT, address TEXT,
	city TEXT, state TEXT,
	zip TEXT, inspection_date DATE,
	inspection_type TEXT, results TEXT,
	violations TEXT, location TEXT
);

I didn’t add any indexes for the columns other than the primary key. That can all be added later, when performance tuning.

Push Your Data to Amazon RDS MySQL Database

AWS provides instructions for pushing data to a MySQL RDS in the cloud. Since we have a new RDS which is already set up, we can skip straight to step 5, “Load the Data”.

They tell you to use the mysqlimport command, and you can do that if you want. There are other tools that can be used to import data, too. However, since I was already in the MySQL client, I used the LOAD DATA command, like so:

LOAD DATA LOCAL INFILE 'Food_Inspections_small.csv' INTO TABLE health_reports
	FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n' (@inspection_id, @dba_name, @aka_name,
		@license_num, @facility_type, @risk, @address, @city, @state, @zip,
		@inspection_date, @inspection_type, @results, @violations, @location)
	SET inspection_id = @inspection_id, dba_name = @dba_name, aka_name = @aka_name,
		license_num = @license_num, facility_type = @facility_type, risk = @risk,
		address = @address, city = @city, state = @state, zip = @zip,
		inspection_date = @inspection_date, inspection_type = @inspection_type,
		results = @results, violations = @violations, location = @location;

Keep in mind that you may need to modify this command for your own purposes. I had launched the MySQL client from within the same directory where my Food_Inspections_small.csv was located, , so this command worked for me straightaway.

Now, my RDS is all set up, complete with data! That is half the battle. In my next blog post, I’ll cover how to access the RDS using an Android app.