Getting Started With C# DataFrame and XPlot.Ploty

For the Python programming language, Pandas is a powerful and popular tool for analyzing data, especially a Dataframe, used to manipulate and display data. For .NET programming languages, we can use the Deedle or Microsoft.Data.Analysis package provided in NuGet which also provides a DataFrame class that is used to manipulate, transform, and display data.

This example focuses on the Microsoft.Data.Analysis package by showing some of the key features of the DataFrame class in Jupyter Notebook.

It also uses XPlot.Plotly which is an F# data visualization package to plot data in a Dataframe. The source code is available on github.

Prerequisites

To run examples in this article, please refer to the Using .NET Core in Jupyter Notebook article to setup Jupyter Notebook to support .NET programming languages.

Install the package

The Microsoft.Data.Analysis package is available in Nuget so dotnet-Interactive #r The magic command can be used to install the package from NuGet.

Run the command below to install Microsoft . data analysis Package version 0.4.0.

#r "nuget:Microsoft.Data.Analysis,0.4.0"

Refer namespaces

This article uses categories from the following four packages. Therefore, it uses Utilization statement to indicate those packages.

  • XPlot.Plotly: A cross-platform data presentation package for the F# and .NET programming languages
  • Microsoft . data analysisAn easy-to-use, high-performance library for data analysis and transformation
  • the system: classes and interfaces that support queries that use the integrated query language
  • Microsoft.AspNetCore.Html . file: types for processing HTML content
using XPlot.Plotly;
using Microsoft.Data.Analysis;
using System.Linq;
using Microsoft.AspNetCore.Html;

Rendering a DataFrame as an HTML Table

By default, the DataFrame is presented as an HTML table with one row and two columns (columns and rows).

Columns vs. Rows

This can be bypassed by registering the custom formatter for the DataFrame. The code below records the custom formatters for data frame And DataFrameRow Displays data in an HTML table.

Custom Format

Displays only the first 100 rows. This can be changed by adjusting the value of Takes Factor.

Formatter<DataFrame>.Register((df, writer) => {     var headers = new List<IHtmlContent>();     headers.Add(th(i("index")));     headers.AddRange(df.Columns.Select(c => (IHtmlContent) th(c.Name)));     var rows = new List<List<IHtmlContent>>();     var take = 100;     for (var i = 0; i < Math.Min(take, df.Rows.Count); i++)     {         var cells = new List<IHtmlContent>();         cells.Add(td(i));         foreach (var obj in df.Rows[i])         {             cells.Add(td(obj));         }         rows.Add(cells);     }      var t = table(         thead(             headers),         tbody(             rows.Select(                 r => tr(r))));     writer.Write     writer.Write(df.Rows.Count + " x "+df.Columns.Count); }, "text/html");   Formatter<DataFrameRow>.Register((dataFrameRow, writer) => {     var cells = new List<IHtmlContent>();     cells.Add(td(i));     foreach (var obj in dataFrameRow)     {         cells.Add(td(obj));     }     var t = table(         tbody(             cells));     writer.Write }, "text/html");

Create a DataFrame

DataFrameColumn

DataFrame can be created by passing a list DataFrameColumn Objects to the DataFrame constructor.

Public DataFrame (params DataFrameColumn[] pillars);
Public DataFrame (IEnumerable columns );

The following code creates a DataFrame that contains 200 rows and 2 columns. The first column contains the dates and the second column contains random integers. summons the PrimitiveDataFrameColumn Constructor to create a file DataFrameColumn cases.

var start = new DateTime(2009,1,1);
Random rand = new Random();
var numDataPoint = 200; 

PrimitiveDataFrameColumn<DateTime> date = new PrimitiveDataFrameColumn<DateTime>("Date", 
    Enumerable.Range(0, numDataPoint)
          .Select(offset => start.AddDays(offset))
          .ToList()); 
PrimitiveDataFrameColumn<int> data = new PrimitiveDataFrameColumn<int>("Data",
    Enumerable.Range(0, numDataPoint)
                        .Select(r => rand.Next(100))
                        .ToList()); 
var df = new DataFrame(date, data);
df

CSV file

A DataFrame can also be created from a CSV file by calling File DataFrame.LoadCsv fixed method.

The following code creates a DataFrame from the ohcldata.csv file. This file is downloaded from 5.30. Example – The primary OHLC website (open, high, low, closed) Financial Plot. This file contains the daily open, high, low and close financial statements.

var df1 = DataFrame.LoadCsv("ohlcdata.csv");
df1

then , informations A method can be used to create a summary for each column in a DataFrame.

Accessing the DataFrame

Access to data by pointers

Row index and column index can be used to access data specified in a DataFrame. The index is numbered based on zero.

Access data to the code below in the first row and second column.

Outside[7]: 11;  First row, second column

Then, a new value can be assigned to the DataFrame.

The code below increments the data in the first row and second column by 10.

df[0,1] = int.Parse(df[0,1].ToString()) + 10;
df.Head(10)

Outside[8]: Table;  Increase data by 10

Access to class data

The entire class can be accessed using the class index. The index is numbered based on zero.

The code below accesses the 10th row in the DataFrame.

The column index can also be used to access the specified column in the row.

Below it reaches the fourth column in the tenth row.

After that, the new value can also be set for the column.

The code below sets 50000000 for the sixth column.

df1.Rows[9][5] = 50000000f;
df1.Head(10)

Outside[11]: Table;  Access to class data

Column data access

The entire column can be accessed using the column name or index. The index is numbered based on zero.

The code below accesses the column named data (second column) in the DataFrame.

//df.Columns["Data"] or df.Columns[1]
df.Columns["Data"]  

Outside[12]: Table;  Column data access

The data in the column can be changed using DataFrame overload operators.

The code below increments all the data in the column by ten.

 df.Columns["Data"]= df.Columns["Data"]+10;
 df

Outside[13]: Table;  Increase all data by 10

Enter data

Add a new column

DataFrame maintains a list of data columns In the DataFrameColumnCollection. A new column can be added to the DataFrameColumnCollection.

The code below adds a new integer column to the DataFrame.

df.Columns.Add(new PrimitiveDataFrameColumn<int>("Data1", df.Rows.Count()));
df

Outside [14]: Table;  Add a new column

The data in the new column is set to null.

The following code fills the null values ​​in the new column (Data1) with 10.

df.Columns["Data1"].FillNulls(10, true);
df

Outside[15]: Table;  Fill in the null values

append new row

the append A method can be used to append new rows to a DataFrame.

The following code creates a list KeyValuePair instances and then adds them to the DataFrame.

df.Append(new List<KeyValuePair<string, object>>() { 
    new KeyValuePair<string, object>("Date", DateTime.Now),
    new KeyValuePair<string, object>("Data", 12),
    new KeyValuePair<string, object>("Data1", 50)
}, true);
df.Tail(10)

Outside [16]: Table;  append new row

DataFrame processing

Sort DataFrame

the sort by or Sort by Descending The method can be used to request a DataFrame with a specified column.

The following code sorts a DataFrame with the column named data.

Outside [17]:Table;  Sort DataFrame

Compile a DataFrame

the a group of The method can be used to group DataFrame rows by unique values ​​in the column.

The following code groups the DataFrame by column named Data and then counts the number of values ​​in each group.

var groupByData = df.GroupBy("Data");
groupByData.Count().OrderBy("Data")

Outside[18]: Table;  Compile a DataFrame

Filter the DataFrame

the purifier A method can be used to filter a DataFrame by row indexes or logical values.

The following code filters the DataFrame by returning the rows containing the values ​​in the named column data greater than fifty.

df.Filter(df.Columns["Data"].ElementwiseGreaterThan(50))

Outside[19]: Table;  Filter the DataFrame

Integrate DataFrame

the Merge The method can be used to combine two DataFrames with a database-style compass.

The following code joins two DataFrames using Date The column included in each of the DataFrames. First, it converts the data type in a file Date column Defender 1 From series write to Datame Type. then summons Merge A way to join DataFrames.

df1.Columns["Date"] = new PrimitiveDataFrameColumn<DateTime>("Date", 
    df1.Columns["Date"]
                .Cast<object>()
                .ToList()
                .Select(x => DateTime.ParseExact(x.ToString(), "yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture))
                .Cast<DateTime>());               

df1.Merge<DateTime>(df, "Date", "Date")

Outside[20]: Table;  Integrate DataFrame

Draw diagrams with XPlot.Ploty

XPlot.Ploty is a cross-platform data presentation package for the F# and .NET programming languages. It is based on Plotly which is the popular JavaScript graphical library.

The following examples demonstrate how to use XPlot.Ploty to plot diagrams using the data in a DataFrame.

line graph

The following code draws a line diagram of to open Column in DataFrame.

var chart1 = Chart.Plot(
    new Graph.Scatter
    {
        x = df1.Columns["Date"],
        y = df1.Columns["Open"],        
        mode = "lines+markers"
    }
);
var chart1_layout = new Layout.Layout{
    title="Open Price",
    xaxis =new Graph.Xaxis{
        title = "Date"
        },
    yaxis =new Graph.Yaxis{
    title = "Price (USD)"
        }           
    };
chart1.WithLayout(chart1_layout);
chart1

An example of a line graph

Line chart with multiple lines

The following code draws to open, And Nearby Columns in a line chart.

var chart2_list = new List<Graph.Scatter> 
{
    new Graph.Scatter
    {
         x = df1.Columns["Date"],
        y = df1.Columns["Open"],
        name="Open",
        mode = "lines"
    },
    new Graph.Scatter    
    {       
        x = df1.Columns["Date"],
        y = df1.Columns["Close"],
        name="Close",
        mode = "lines"
    }
    
};
 
var chart2 = Chart.Plot(
    chart2_list
);
 
var chart2_layout = new Layout.Layout{
    title="Open and Close Price",
    xaxis =new Graph.Xaxis{
        title = "Date"
        },
    yaxis =new Graph.Yaxis{
    title = "Price (USD)"
        }           
    };
chart2.WithLayout(chart2_layout);
chart2

Example of a polyline chart

bar graph

The following code draws a bar chart from the sound Column in DataFrame.

var chart3 = Chart.Plot(

    new Graph.Bar
    {
        x = df1.Columns["Date"],
        y = df1.Columns["Volume"],        
        marker = new Graph.Marker{color = "rgb(0, 0, 109)"}
    }
);
var chart3_layout = new Layout.Layout{
    title="Volume",
    xaxis =new Graph.Xaxis{
        title = "Date"
        },
    yaxis =new Graph.Yaxis{
    title = "Unit"
        }           
    };
chart3.WithLayout(chart3_layout);
chart3

Example of a bar graph

menorah table

The following code draws a candlestick chart from to openAnd AverageAnd FewAnd Nearby Columns in a DataFrame.

var chart4 = Chart.Candlestick(df1.OrderBy("Date").Rows.Select(row => new Tuple<string, double, double, double, double>(
                 ((DateTime)row[0]).ToString("yyyy-MM-dd"),
                 double.Parse(row[1].ToString()),
                 double.Parse(row[2].ToString()),
                 double.Parse(row[3].ToString()),
                 double.Parse(row[4].ToString())
                )));
chart4.WithLayout(new Layout.Layout{
    title="OHLC",
    xaxis =new Graph.Xaxis{
        title = "Date"
        },
    yaxis =new Graph.Yaxis{
    title = "Price (USD)"
        }           
    });
chart4

menorah table

references

  1. Phplot.sourceforge.net. The second 5.30. Example – OHLC Basic Financial Plot (open, high, low, close). [online] Available at: http://phplot.sourceforge.net/phplotdocs/ex-ohlcbasic.html [Accessed 6 May 2021].
  2. Bluemountaincapital.github.io. nd Deedle: .NET heuristics data library. [online] Available at: https://bluemountaincapital.github.io/Deedle/ [Accessed 6 May 2021].
  3. Govindarajan, P., 2019. Introduction to DataFrame | NET Blog. [online] NET Blog. Available at: https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/ [Accessed 6 May 2021].
  4. Sequeira, J., 2020. dotnet / Interactive: Magic Commands. [online] GitHub. Available at: https://github.com/dotnet/interactive/blob/main/docs/magic-commands.md [Accessed 6 May 2021].
  5. Winnington, E., 2019. Eric Winnington – Tips and tricks for a C# Jupyter notebook. [online] Ewinnington.github.io. Available at: https://ewinnington.github.io/posts/jupyter-tips-csharp [Accessed 6 May 2021].
  6. Fslab.org. nd XPlot – F# data visualization package. [online] Available at: https://fslab.org/XPlot/index.html [Accessed 6 May 2021].
  7. Puriphanvichai, J., 2021. Using .NET Core in Jupyter Notebook | Refinitiv Developers. [online] Developers.refinitiv.com. Available at: https://developers.refinitiv.com/en/article-catalog/article/using–net-core-in-jupyter-notebook.html [Accessed 10 May 2021].

.

Leave a Comment