English-Romanian dictionary for Machine Learning

This post is an initiative to build a list of Romanian translations for Machine Learning terms.

A

Artificial Intelligence = Inteligență Artificială.

B

Biased = tendențios.

E

Eigenvalues (vectors) = valori (vectori) proprii.

F

Features = atribute, particularități, caracteristici.

M

Machine Learning = Învățare Automată.

O

Outliers = valori extreme.

S

Skewed = asimetric.

T

Threshold = prag.

Toy problem = problemă didactică.

Automating custom workflow in Emacs

Due to the lack of human resources in a research project I'm involved in, the team decided to take upon itself the semantic comparison of 3000 pairs of papers from the medical domain.

Each paper is a json file with structured contents of the publication, like in the picture below: [screenshot: the structured json contents of a publication]

Since we also want at least some cross-validation, we decided that each member of the team should compare 2000 pairs of publications, so that each pair is compared by two out of the three persons on the team (3000 pairs × 2 reviews ÷ 3 people = 2000 pairs per person). So we split the 3000 publication pairs into 3 sets, saved as csv files with the following structure: file1, file2, similarity_score; where file1 and file2 are the names of the files and similarity_score is to be filled with the semantic similarity score.
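
For illustration, a few rows of such a csv file would look like this (the file names here are made up; the score column starts out empty):

pub_0132.json, pub_2417.json,
pub_0980.json, pub_1671.json,
pub_2203.json, pub_0457.json,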

My first idea was to have Emacs split into three windows to allow for both registering the score and a side-by-side comparison of files: [screenshot: Emacs split into the csv buffer on the left and two file buffers on the right]

For each pair of files from the left window I would:

  1. Copy the name of the first file using C-a C-SPC C-u 2 M-f M-w
  2. Go to the top window on the right using C-x o 2 (I'm using ace-window, thus after pressing C-x o I can select the window to switch to by pressing its number key)
  3. Open the file using C-x C-f then navigating to the directory and yanking the file name at the end
  4. Move back to the first window using C-x o 1
  5. Copy the name of the second file by first moving over the first comma with C-f then C-u 2 M-f M-w
  6. Go to the bottom window on the right using C-x o 3
  7. Repeat step 3
  8. Repeat step 4
  9. Compare publications and register score in similarity_score column

Although this workflow seems quite laborious due to the many steps, I got used to it quite rapidly (from the second pair onward) and managed to form some sort of temporary muscle memory for opening files, in the sense that I was doing it on full auto-pilot, without thinking that I was opening files.

However, there was a problem with this workflow: the directory containing the files to be compared holds around 100 K such json files, and on my 5400 RPM hard-disk it took what seemed like forever to scan the directory after pressing C-x C-f, all for the benefit of autocompletion, which I did not need because I already knew the file names. So basically, for each pair of publications I was stuck waiting twice for the directory to be scanned.

And then I got an idea: I already knew the file names and the directory containing them is always the same, so wouldn't it be faster to send the full path (obtained by concatenating the directory path and the file name) to the find-file function instead of waiting for a directory scan?

I switched to the *scratch* buffer and evaluated a find-file call with the full path of a file and, of course, the file was loaded instantly.
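
The call was something along these lines, with <directory-path> standing for the actual directory and a made-up file name:

(find-file "<directory-path>/pub_0132.json")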

So I decided to take it a bit further: can't I, while on a line in the left window, open the files on that line in the other two windows?

Of course I can, because Emacs is awesome. All I need to do is write some Lisp code that will do what I want.

First things first, I declared a (global) variable to hold the full path to the directory containing the files:

(defvar *compare-publications-dir*
  "<directory-path>"
  "The location of files to compare.")

Then I created a function to get the whole line from the csv file:

(defun get-current-line ()
  "Return the text of the current line, without text properties."
  (beginning-of-visual-line)
  (let ((start (point)))
    (end-of-visual-line)
    (buffer-substring-no-properties start (point))))

This function moves the point to the beginning of the line by calling (beginning-of-visual-line), saves the position in a local variable start, then moves to the end of the line via a call to (end-of-visual-line). In the end it returns the substring between the start position and the position returned by (point).

Having the line of text, I need to:

  a. Split the line by ","
  b. Store the first and second parts of the line in two variables called file-1 and file-2
  c. Move to the top-right window
  d. Concatenate the values of *compare-publications-dir* and file-1 and pass the result to (find-file-read-only) (I don't want to accidentally change the files being compared)
  e. Move to the bottom-right window
  f. Repeat d. with file-2 instead of file-1
  g. Return to the left window

This was also implemented with a function which can be called interactively via M-x:

(defun compare-publications ()
  "Open the two files referenced on the current csv line, read-only."
  (interactive)
  (let* ((files (csv-split-string (get-current-line) ","))
         (file-1 (car files))
         (file-2 (cadr files)))
    ;; Top-right window: first file of the pair.
    (other-window 1)
    (find-file-read-only (concat *compare-publications-dir* file-1))
    ;; Bottom-right window: second file of the pair.
    (other-window 1)
    (find-file-read-only (concat *compare-publications-dir* file-2))
    ;; Back to the csv window.
    (other-window 1)))

And that's it. After evaluating the functions above I have an automated workflow. Now, to compare two files, I just navigate to a line in the csv file and from there type M-x compare-publications. The only thing left to do manually (besides the comparison itself) is to press the key for the similarity_score of the files from that line.
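
To skip even the M-x invocation, the command could be bound to a key; a one-line sketch (the key choice is arbitrary):

(global-set-key (kbd "C-c p") #'compare-publications)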

ServiceActivationException when auto-starting WCF services with AutofacServiceHostFactory

I switched teams at work and as a welcome gift into the new team I got to investigate the following error:

Exception: System.ServiceModel.ServiceActivationException: The service '/AuthorisationService.svc' cannot be activated due to an exception during compilation. The exception message is: The AutofacServiceHost.Container static property must be set before services can be instantiated.. ---> System.InvalidOperationException: The AutofacServiceHost.Container static property must be set before services can be instantiated.
   at Autofac.Integration.Wcf.AutofacHostFactory.CreateServiceHost(String constructorString, Uri[] baseAddresses)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.CreateService(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.ActivateService(ServiceActivationInfo serviceActivationInfo, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.EnsureServiceAvailable(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   --- End of inner exception stack trace ---
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.EnsureServiceAvailable(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.EnsureServiceAvailableFast(String relativeVirtualPath, EventTraceActivity eventTraceActivity)
Process Name: w3wp Process ID: 9776

The troublesome service is hosted in an ASP.NET web application and it's preloaded with a custom implementation of IProcessHostPreloadClient which more or less does what's described in this blog post. Since the project hosting the service uses Autofac as its DI framework/library, the service is set up to use AutofacServiceHostFactory as the service factory:

<%@ ServiceHost
  Service="AuthorizationServiceImpl, AuthorizationService"
  Factory="Autofac.Integration.Wcf.AutofacServiceHostFactory, Autofac.Integration.Wcf" %>

After some googling for the error I ended up on the Autofac documentation page, where I got the first idea of what was happening:

When hosting WCF Services in WAS (Windows Activation Service), you are not given an opportunity to build your container in the ApplicationStart event defined in your Global.asax because WAS doesn’t use the standard ASP.NET pipeline.

Ok, great! Now I know that the ServiceHostingEnvironment.EnsureServiceAvailable() method (which is called to activate the service) doesn't use the HTTP pipeline from ASP.NET. A solution to this issue is in the next paragraph of the documentation:

The alternative approach is to place a code file in your App_Code folder that contains a type with a public static void AppInitialize() method.

And that's what I did. I went to the project in Visual Studio, added a special ASP.NET folder named App_Code and added a class named AppStart to it, with a single method public static void AppInitialize() which contained all the required bootstrapping logic for Autofac. I redeployed the application but the error kept popping up, and only after carefully reading the comments on this StackOverflow answer and this blog post on how to initialize WCF services did I find out why the AppInitialize method wasn't invoked: the AppStart.cs file needs its build action to be Content, not Compile.
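
For reference, a minimal sketch of such a class; the actual Autofac registrations are project-specific, so the body below is only a hint, not the real bootstrapping code:

using Autofac;
using Autofac.Integration.Wcf;

public static class AppStart
{
    // Invoked by the hosting environment before any service is activated.
    public static void AppInitialize()
    {
        var builder = new ContainerBuilder();
        // Register the WCF service implementation (plus its dependencies).
        builder.RegisterType<AuthorizationServiceImpl>();
        AutofacHostFactory.Container = builder.Build();
    }
}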

So, when getting the ServiceActivationException with the error message The AutofacServiceHost.Container static property must be set before services can be instantiated, make sure you have the following:

  1. The special ASP.NET folder App_Code
  2. A class in App_Code having a method with this signature public static void AppInitialize() which contains all the required initialization code
  3. The build action of the file containing the above method is set to Content as shown in the picture below

[screenshot: the Properties pane for AppStart.cs with Build Action set to Content]

MediatR - Handler not found error when the DataContext couldn't be initialized

TL;DR

If you use the MediatR package and it suddenly fails with Handler was not found for request of type <type>, inspect the dependencies of the handler it fails to create/invoke. One or more of those dependencies (a DbContext in my case) may throw an error when instantiated, and that error is making MediatR fail.

Jimmy Bogard's MediatR is a little gem of a package I like using because it enables a good separation of business logic from the boilerplate code and provides a clean and structured enforcement of the Single Responsibility Principle.
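
To give an idea of the moving parts, here is a sketch of the request/handler pair from this story (the response type and DataContext are the project's own; the synchronous IRequestHandler matches the older MediatR version I was using):

using System;
using MediatR;

public class GetAuthorizedUserRequest : IRequest<GetAuthorizedUserResponse>
{
    public string AuthorizationToken { get; set; }
}

public class GetAuthorizedUserRequestHandler
    : IRequestHandler<GetAuthorizedUserRequest, GetAuthorizedUserResponse>
{
    private readonly DataContext _dataContext;

    // The handler's dependencies are supplied by the IoC container;
    // as it turned out, this constructor is where my troubles originated.
    public GetAuthorizedUserRequestHandler(DataContext dataContext)
    {
        _dataContext = dataContext;
    }

    public GetAuthorizedUserResponse Handle(GetAuthorizedUserRequest message)
    {
        // The actual user lookup through _dataContext is omitted here.
        throw new NotImplementedException();
    }
}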

I use this package extensively in one of my outside-work projects (I'm proud to say that it's not a pet project anymore) to delegate requests/commands to their respective request/command handlers. The project itself consists of two parts: an ASP.NET MVC application for public access and back office management, and a WebAPI application used for registering payments. In order to keep both Web Application and Web API URLs consistent (and pretty) I have hosted the Web API application as a virtual directory inside the main Web Application.

Recently, after an update of the application, the payment module went down (giving me a tiny heart attack). As expected, I dove into the application logs and after some thorough searching I found the culprit, with the following error message:

An unhandled exception of type 'System.InvalidOperationException' occurred in MediatR.dll Additional information: Handler was not found for request of type GetAuthorizedUserRequest. Container or service locator not configured properly or handlers not registered with your container.

The exception was thrown inside the IsAuthorized method of a custom AuthorizeAttribute:

protected override bool IsAuthorized(HttpActionContext actionContext)
{
    try
    {
        var authorizationToken = new AuthorizationToken(actionContext.Request);
        if (String.IsNullOrEmpty(authorizationToken.Value))
        {
            return false;
        }
        var request = new GetAuthorizedUserRequest
        {
            AuthorizationToken = authorizationToken.Value
        };
        var user = _securityService.GetAuthorizedUser(request);
        return user != null;
    }
    catch (Exception)
    {
        return false;
    }
}

The first thing to do was to thoroughly inspect what the IoC container (StructureMap in my case) has registered. After a glimpse through the output of the WhatDoIHave() method I saw that the handler GetAuthorizedUserRequestHandler was indeed registered as an IRequestHandler<GetAuthorizedUserRequest, GetAuthorizedUserResponse>.

So, what is the problem then? The InnerException property of the exception that was caught was null and I was stuck.

In a moment of divine inspiration I decided to comment out the existing constructor of the request handler and create a default one (and also return a dummy user). It worked: the exception wasn't thrown and the user got authenticated.

However, the next request (dispatched through MediatR) that had to query something in the database failed, which gave me the idea that there must be some issue with the DbContext initialization (I use Entity Framework). Sure enough, when I put a breakpoint in the constructor of my DataContext class (derived from DbContext) I got an exception saying that the key "mssqllocaldb" is missing from the <connectionStrings> section.

Then I remembered that the latest code update also came with an update of the Entity Framework NuGet package, and it dawned upon me why MediatR was failing. As I said in the beginning, the Web API application is hosted under the main Web Application. This means that the <entityFramework> configuration element in the child application is inherited from the parent one, so the Web.config file of the child application did not contain any section related to Entity Framework. When I upgraded the NuGet package, the installer added the configuration section with default values. Those default (and wrong) values were read by the DbContext class constructor, and since the values were wrong the constructor failed. After deleting the <entityFramework> configuration element the application went back online.
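
For illustration, the section added by the installer looks roughly like this (the exact content varies with the Entity Framework version, and the provider registrations are omitted; note the "mssqllocaldb" value from the exception above):

<entityFramework>
  <defaultConnectionFactory type="System.Data.Entity.Infrastructure.LocalDbConnectionFactory, EntityFramework">
    <parameters>
      <parameter value="mssqllocaldb" />
    </parameters>
  </defaultConnectionFactory>
</entityFramework>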

The common pitfalls of ORM frameworks - RBAR

ORM frameworks are a great tool, especially for junior developers, because they allow blurring the line between the application logic and the data it crunches. Except that the aforementioned line-blurring advantage may become a real production issue if not taken into consideration when writing the code.

Let us consider an example. Let's suppose we're working on a (what else?) e-commerce platform. Somewhere in the depths of that platform there is an IOrderService which exposes the following method:

public interface IOrderService
{
    void PlaceOrder(Guid customerId, IEnumerable<OrderItem> orderItems);
}

where OrderItem holds the data about an ordered item.

public class OrderItem
{
    public Guid ItemId { get; set; }

    public int Quantity { get; set; }
}

The PlaceOrder method needs to:

  • Lookup the Customer in the database
  • Create a new CustomerOrder instance
  • Add each Item to the order and decrease stock count
  • Save the CustomerOrder in the database

Of course, since we're using an ORM framework, the classes used by the repositories - Customer, CustomerOrder and Item - are mapped to database tables.

Given the above, someone would be tempted to implement the PlaceOrder method like this:

public void PlaceOrder(Guid customerId, IEnumerable<OrderItem> orderItems)
{
    var customer = _customerRepository.Get(customerId);
    var order = new CustomerOrder(customer);

    foreach (var orderedItem in orderItems)
    {
        var item = _itemRepository.Get(orderedItem.ItemId);
        item.DecreaseStockCount(orderedItem.Quantity);
        order.Add(item);
    }

    _customerOrderRepository.Register(order);
    _unitOfWork.Commit();
}

And why wouldn't they? It seems the most straightforward transposition of the requirements defined above. The code behaves as expected in both Dev and QA environments and afterwards gets promoted to production, where a database with hundreds of thousands of rows in the Items table awaits. There, too, the behavior is as expected, until one day an eager customer wants to buy 980 distinct items (because why not?).

What happens with the code above? Functionally it is still correct, but the database command times out and the customer cannot place their significant order.

So what is the problem? Why does it time out? Well, because the aforementioned line between application logic and database is blurred enough for the iterative paradigm to creep into the set-based one. In the SQL community this paradigm creep has a name: Row By Agonizing Row.

To put it in the context of the example above - it takes more time to do 980 pairs of SELECT and UPDATE operations than to do one SELECT which returns 980 rows followed by one UPDATE which alters 980 rows.

So, let's switch the paradigm and start working with collections in our code to minimize the number of operations in the database. The first thing to do is to load all items in bulk instead of loading them one by one. This will reduce the number of SELECT operations from 980 to 1, cutting the total number of database operations roughly in half (a whopping 50%). We still need to update the stock counts for each item individually, because the ORM framework doesn't know how to translate the changes for each item into a single UPDATE statement, but considering that we've halved the total number of operations, let's give this approach a try, shall we?

public void PlaceOrder(Guid customerId, IEnumerable<OrderItem> orderItems)
{
    var customer = _customerRepository.Get(customerId);
    var customerOrder = new CustomerOrder(customer);

    // Load all the ordered items with a single SELECT. A Join with an
    // in-memory collection is not translatable to SQL, so we filter by
    // the item ids instead (translated into an IN clause).
    var itemIds = orderItems.Select(oi => oi.ItemId).ToList();
    var items = _itemRepository.Items
        .Where(i => itemIds.Contains(i.Id))
        .ToDictionary(i => i.Id);

    foreach (var orderedItem in orderItems)
    {
        var item = items[orderedItem.ItemId];
        item.DecreaseStockCount(orderedItem.Quantity);
        customerOrder.Add(item);
    }

    _customerOrderRepository.Register(customerOrder);
    _unitOfWork.Commit();
}

This will solve the problem with the timeout but will create another one: useless load on the system. The code loads 980 rows from the database but only uses two attributes of each row, Id and Barcode. We might say that this can be solved by projecting an Item into a tuple of <Barcode, Id>, but this would only be a partial solution because we can still place a great burden on system memory by sending a request of 10k items. Also, there are still 980 UPDATE statements that need to be executed, which is still a lot.
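
For completeness, such a projection would look something like the snippet below (reusing the itemIds list from the previous snippet). It remains a partial fix: the stock updates stay row-by-row, and the projected values are no longer tracked entities:

var slimItems = _itemRepository.Items
    .Where(i => itemIds.Contains(i.Id))
    .Select(i => new { i.Id, i.Barcode })
    .ToDictionary(x => x.Id);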

The best approach is not to load any data from the database at all and to do the processing as close to the actual data as possible. And how can we do that? Exactly: with stored procedures.

create procedure CreateCustomerOrder(
    @customerId uniqueidentifier,
    @orderItems udttorderitems readonly)
as
begin
    set nocount on

    -- Decrease the stock counts for all ordered items in one UPDATE
    update sc
    set sc.Count = sc.Count - o.Quantity
    from StockCounts sc
    join Items i on sc.ItemId = i.Id
    join @orderItems o on i.Id = o.ItemId

    -- Register the order
    insert into CustomerOrder(CustomerId, OrderDateTime)
    values (@customerId, getdate())

    -- Register all order lines in a single INSERT
    insert into OrderLines(OrderId, ItemId, Quantity)
    select scope_identity(), i.Id, o.Quantity
    from Items i
    join @orderItems o on o.ItemId = i.Id
end
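
From the application code, the order lines travel to the procedure as a table-valued parameter. Below is a minimal ADO.NET sketch, assuming udttorderitems is a user-defined table type with ItemId and Quantity columns:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class OrderGateway
{
    private readonly string _connectionString;

    public OrderGateway(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void PlaceOrder(Guid customerId, IEnumerable<OrderItem> orderItems)
    {
        // Shape the order lines as rows of the user-defined table type.
        var lines = new DataTable();
        lines.Columns.Add("ItemId", typeof(Guid));
        lines.Columns.Add("Quantity", typeof(int));
        foreach (var orderItem in orderItems)
        {
            lines.Rows.Add(orderItem.ItemId, orderItem.Quantity);
        }

        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("CreateCustomerOrder", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@customerId", customerId);

            var tvp = command.Parameters.AddWithValue("@orderItems", lines);
            tvp.SqlDbType = SqlDbType.Structured;
            tvp.TypeName = "dbo.udttorderitems";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}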

Now, of course, in real-life situations there won't be a customer that orders almost 1000 items in a single order, and the second approach (bulk-load the items and iterate the collection) will do just fine. The important thing to keep in mind in such cases is the need to switch from a procedural mindset to a set-based one, thus pruning the phenomenon of paradigm creep before it becomes full-blown RBAR processing.

Python development using Emacs from terminal

A few weeks ago, while working on a hackathon project I found myself very disappointed with my progress.

I had the impression that I could do better but something was holding me back, and then I realized that I was too distracted by Alt-Tab-ing through all the open applications, iterating through dozens of open tabs in the browser and spending too much time on websites that were of no use at that time.

At that moment, on a whim, I decided to try and eliminate all of these distractions the hard way: by not using the X server at all (I was working on Kubuntu).

Since I was mainly working with Python code, occasionally needing to open some file for inspection, and all of this was done from Emacs, I said to myself:

Emacs can be opened from the console, so why not work on the hackathon from the console?

Said and done. Ctrl-Alt-F1 and I was prompted with the TTY cursor. I logged in, started Emacs, opened all the required files and started working. All good until I found myself in the situation where I needed to look something up on the Internet. I knew I could use eww as a web browser so naturally I did so (yeah, I'm one of those people that use Bing instead of Google):

M-x eww
Enter URL or keywords: www.bing.com

And nothing… Oh, wait, I remember needing to enter a username and password when connecting to the Wi-Fi, but I wasn't prompted for those after logging into the terminal. How do I connect to the network?

As there was no way for me to find that out without using some sort of GUI (I'm not that good with terminals), I started a new X session, connected to the Wi-Fi from there and found this StackOverflow answer. So I logged back into the terminal and from Emacs started eshell with M-x eshell. From there I issued the following command

nmcli c up <wi-fi-name>

which connected me to the Wi-Fi network.

Note: I got connected because on previous sessions I opted to store the credentials for the connection; to get a prompt for username and password for the Wi-Fi use the --ask parameter like this:

nmcli --ask c up <wi-fi-name>

After connecting I resumed my coding, and only at the end of the hackathon did I stop to ponder my experience; it wasn't as smooth as I expected. Although I consider it a big plus that I was able to issue shell commands within Emacs through eshell, there were some hiccups along the way.

The first thing I noticed is that in the terminal not all the shortcuts I am very familiar with are available. Namely, in org-mode the combination M-right, which is used for indentation, moving columns within a table and demoting list items, is not available; instead I had to use either the C-c C-x r shortcut or explicitly invoke the command using M-x org-meta-right. Although I did not invoke this command frequently, without the shortcut I felt like I was pulled out of the flow each time I had to use an alternative method of invoking the command.

The second, and by far the biggest, nuisance was the lack of a proper web-browsing experience. Although I most frequently landed on StackOverflow pages, and although eww rendered them pretty well (see image below), the lack of the visual experience I was used to gave me a sense of discomfort. [screenshot: a StackOverflow page rendered in eww]

However, when I analyzed how much I had accomplished while working from the terminal, I was simply amazed. Having no distractions and no meaningless motions like cycling through windows and tabs had a huge impact on my productivity. I was able to fully concentrate on and immerse myself in the code, and by doing so I got a lot of work done.

Rename multiple files with Emacs dired

While adding text files from a folder to a project file I noticed that the files in the folder lacked naming consistency. Namely, there were files which had the .txt extension and files without an extension, as shown in the image below: [screenshot: a listing with a mix of .txt files and extensionless files]

Since there were about 100 files without an extension I started asking myself: Is there a way to add the .txt extension to those files without manually renaming each one?

Of course there is. Here's what I did using Emacs and dired:

  • M-x dired to the desired directory (obviously)
  • In the dired buffer enter the edit mode with C-x C-q
  • Go to the last file that has an extension before the block of files without an extension.
  • Starting from that file, place a mark and select the whole block of files without an extension (the selection should include the last file with an extension).
  • Narrow to the selected region using M-x narrow-to-region or C-x n n. The buffer should look like the image below: [screenshot: the dired buffer narrowed to the block of files being renamed]
  • Move to the beginning of buffer using M-<
  • Start defining a new keyboard macro using C-x (
    • Move to next line using C-n
    • Navigate to the end of line using C-e
    • Add the .txt extension
  • Save the macro with C-x )
  • Now that I have a macro to add the .txt extension to a file name, I just need to run it as many times as there are files without an extension (100 in my case). To do so, just press C-u 100 F4. This will repeat the macro 100 times.
  • Once all the files are renamed, exit the narrowed region using M-x widen or C-x n w
  • Save changes with C-c C-c
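
As an aside, dired can also do this kind of bulk rename without a keyboard macro; a hedged sketch of that route (the regexps below are illustrative):

  • Mark the files without an extension using % m (dired-mark-files-regexp) with a regexp such as ^[^.]+$
  • Run % R (dired-do-rename-regexp) with ^\(.*\)$ as the pattern and \1.txt as the replacement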

That's it!

Managing bibliography using Emacs Org-Mode and Org-Ref

Since I've started to use Emacs more and more, I began wondering whether I could use org-mode to keep a reading list/bibliography.

A quick search led me to this blog post where the author presents his setup for the same thing. However, after reading the post I saw that the author uses a combination of tasks and a reading list, which requires custom code to be executed and is too complex for my needs.

All I want is a simple list that:

  • should be available on multiple workstations
  • can be built/managed with off-the-shelf components and without much effort
  • should allow me to change the status of an entry.

I did, however, like the idea of using references to the papers being read, and since I recently saw a YouTube video presenting org-ref I thought I should give it a try.

To handle the availability part I decided to use Dropbox which is also suggested by org-ref.

Setup org-ref

org-ref is available on MELPA, so to install it just type M-x package-install RET org-ref. Afterwards, copy the code below to your init file and adjust the paths:

(setq reftex-default-bibliography '("~/Dropbox/bibliography/references.bib"))
;; see org-ref for use of these variables
(setq org-ref-bibliography-notes "~/Dropbox/bibliography/notes.org"
      org-ref-default-bibliography '("~/Dropbox/bibliography/references.bib")
      org-ref-pdf-directory "~/Dropbox/bibliography/bibtex-pdfs/")

(setq bibtex-completion-bibliography "~/Dropbox/bibliography/references.bib"
      bibtex-completion-library-path "~/Dropbox/bibliography/bibtex-pdfs"
      bibtex-completion-notes-path "~/Dropbox/bibliography/helm-bibtex-notes")

Creating the reading list

With org-ref in place, it was time to setup the reading list so I created a new file named reading-list.org under ~/Dropbox/bibliography/ with the following header:

#+TITLE: Reading list
#+STATUS: "Maybe" "Pending" "Reading" "Finished" ""
#+COLUMNS: %120ITEM %STATUS

The first line obviously defines the title of the document. The second line defines the values for status where:

  • Maybe means that reading the entry is optional
  • Pending - the entry will be read sometime after finishing the item that I'm currently reading
  • Reading - the item currently being read
  • Finished - the entries that are already read.

Adding an entry to the list

  • Add bibtex entry in references.bib file. E.g.:
    @inproceedings{le2014distributed,
      title={Distributed representations of sentences and documents},
      author={Le, Quoc and Mikolov, Tomas},
      booktitle={Proceedings of the 31st International Conference on Machine Learning (ICML-14)},
      pages={1188--1196},
      year={2014}
    }
    
  • In the reading-list.org file add the title to the list using M-RET
  • Add Status and Source properties
    • With the cursor on the header:
      • Press C-c C-x p
      • Select or write Status
      • Press return
      • Select the value for status (e.g. Pending)
      • Press return
    • With the cursor on the header:
      • Press C-c C-x p
      • Write or select Source
      • Press return
      • If you know the citation key (le2014distributed in the example above) then you can write directly cite:le2014distributed; otherwise, leave the value for Source empty and put the cursor after the property declaration. Then, press C-c ] and select the entry from the reference list.

Repeat the steps above and you should end up with a list like the one in the image: [screenshot: the reading list with one entry per paper]
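
In plain text, a single (hypothetical) entry built from the bibtex example above would look roughly like this:

* Distributed representations of sentences and documents
  :PROPERTIES:
  :Status:   Pending
  :Source:   cite:le2014distributed
  :END: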

Change the status of an entry

To change the status of an entry:

  • Navigate to the desired entry
  • Repeat the steps above for setting the Status property and select the proper value for Status

Status overview

After creating the list you may want to have an overview of the status for each entry. This can be achieved using Org Column View. The setup for column view is in the third line of the header

#+COLUMNS: %120ITEM %STATUS

which tells org-mode how to display the entries. Namely, we're defining two columns:

  1. Item which will display the heading on 120 characters and
  2. Status which will take as much space as needed to display the status

Switching to column view

To switch to column view, place the cursor outside the headings and press C-c C-x C-c (or M-x org-columns). The list should look like the image below: [screenshot: the reading list in column view with ITEM and STATUS columns] Note that if your cursor is on a heading when pressing C-c C-x C-c (invoking org-columns), column view will be activated only for the selected heading.

Exiting column view

To exit column view position the cursor on a heading that is currently in column view and press q.

That's it. Happy reading!

Adding Disqus comments to Ghost blog on Azure

If you have a Ghost blog hosted on Azure (like I did) then you may want to add Disqus comments to it (like I did).

To do so, follow the steps below:

  1. Create a Disqus account if you haven't done so already.
  2. Login to Disqus and navigate to the admin page.
  3. Click on Install Disqus and choose Ghost from the list.
  4. Open a new tab and navigate to Azure portal and from the dashboard open/navigate to the application hosting your Ghost blog.
  5. On the application blade select App Service Editor and press Go-> to open the editor for your blog.
  6. In the editor navigate to wwwroot\content\themes\<your-theme>\post.hbs. This will load the file in the panel on the right.
  7. Go back to Ghost Install Instructions on Disqus and copy the Universal Embed Code.
  8. Paste the code into post.hbs file in the place where you want your comment section to be.
  9. Find the section to define configuration variables and make it look like this:
    var disqus_config = function () {
        this.page.url = '{{@blog.url}}{{url}}';
        this.page.identifier = '{{id}}';
    };
    

Happy blogging!

Use Emacs sql-mode to connect to database on a Docker image

While working on a project I had to load and process some resources from a MySQL database. I had a database dump and all I needed was a sudo apt-get install mysql-server, but I decided against it because that would just bloat my OS with software used only once and drain my laptop battery with the service running in the background.

Instead, I decided to restore the database on a Docker image for MySQL and query the data using mysql-client.

Install mysql-client locally

This one is simple; just run:

sudo apt-get install mysql-client

Install Docker on Ubuntu

The first thing to do is to head to the Docker documentation for installation instructions, which I've copied here:

  • Install packages to allow apt to use a repository over HTTPS:
    sudo apt-get install \
        apt-transport-https \
        ca-certificates \
        curl \
        software-properties-common
    
  • Add the official GPG key for Docker
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    
  • Set up the stable repository. Make sure to select the proper architecture; in my case it's amd64
    sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
        $(lsb_release -cs) \
        stable"
    
  • Update the package index
    sudo apt-get update
    
  • Install Docker
    sudo apt-get install docker-ce
    

Restore MySQL database to a Docker container

  • Download the Docker image for MySQL
    sudo docker pull mysql
    
  • Create an empty database
    sudo docker run --name <container-name> -e MYSQL_ROOT_PASSWORD=<password> -e MYSQL_DATABASE=<db-name> -d mysql
    

    This will create a new container with an empty database and the login root.

  • Restore database from dump
    sudo docker exec -i <container-name> mysql -u<user-name> -p<password> --database=<db-name> < <path-to-sql-file>
    

    Notes:

    1. Make sure that there is no space between -u and <user-name>, e.g. for user root the option should be -uroot
    2. The same goes for password - e.g. if password is my-secret then the option should be -pmy-secret
    3. path-to-sql-file should point to a file on host OS
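
To put the restore steps together with some purely hypothetical values (container papers-db, password secret, database papers, dump at ~/dumps/papers.sql), the two commands would look like this:

sudo docker run --name papers-db -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=papers -d mysql
sudo docker exec -i papers-db mysql -uroot -psecret --database=papers < ~/dumps/papers.sql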

Connect to MySQL database running on Docker container from Emacs

  • First, start the container: sudo docker start <container-name>
  • Get the IP address of the container:
    • Get the container configuration using sudo docker inspect <container-name>
    • Copy the IP address from the output, under NetworkSettings/IPAddress
  • In Emacs execute M-x sql-mysql
    • For User: enter the value for <user-name> (root is the default)
    • For Password: enter the value for <password>
    • For Database: enter the value for <db-name>
    • For Server: enter the IP address copied above
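
If all you need is the IP address, docker inspect can also extract it directly; a one-liner sketch (this path assumes the container runs on the default bridge network):

sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' <container-name>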

That's it. Happy querying!