File Name : figs1.pdf
Caption : Proportion of polymer-solvent combinations that are soluble pairings, per polymer. The color of each polymer data point represents the number of solvents the polymer is paired with. The polymers enclosed by the red square are the ones removed from the training data set.

File Name : figs2.pdf
Caption : Proportions of polymers predicted as soluble with 58 solvents. The color of each data point represents the average model confidence in the prediction across all solvents. The model was trained on the full data set of 6,282 polymers and 58 solvents. The 2,909 polymers that contained only one class are plotted on the x-axis.

File Name : figs3.pdf
Caption : 10,000 data points were generated with varying proportions of class 0 and class 1. For each class, 75% of the data was classified correctly and 25% was misclassified. The effect of class imbalance on F1 score (top), recall (middle), and precision (bottom) is plotted. The y-axis is the metric for which the imbalance is being simulated; the x-axis is the proportion of simulated data that is class 0.

File Name : figs4.pdf
Caption : Average recall of solnet infrastructure models for soluble and insoluble classification using either a one-hot encoding for solvents or a structural fingerprint. Five-fold cross-validation splits were chosen using either a random split stratified by solubility (left), a group split by polymer (middle), or a group split by solvent (right). Error bars represent the standard deviation for the F1 score of those splits.

File Name : figs5.pdf
Caption : Random forest classifier learning curves for test solvents with all solvents and polymers held out of the training data. In (a), the average ± standard deviation of recall for soluble and insoluble predictions is shown for all 51 test solvents as a function of the number of solvents in the training data. In (b) and (c), the soluble and insoluble recall for predictions on methanol (b) and chlorobenzene (c) is plotted as a function of training data set size for the prediction models. The proportion of training and test data that is soluble is also plotted as a function of data set size, with each point corresponding to the addition of another solvent.

File Name : figs6.pdf
Caption : Number of test polymer pairings with methanol (a) or chlorobenzene (b) that were predicted as soluble correctly (top left), soluble incorrectly (top right), insoluble correctly (bottom left), or insoluble incorrectly (bottom right), as a function of the proportion of similar training polymers paired with water (a) or benzene (b) that had the same true class as the similar test polymer-solvent pairing. Training polymers were considered similar to a test polymer if their Tanimoto similarity score was greater than 0.75. A test polymer with no similar training polymers paired with water (a) or benzene (b) was plotted in blue.
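
The class-imbalance simulation described in the figs3.pdf caption can be illustrated with a minimal sketch. It assumes deterministic expected counts rather than randomly sampled labels, and the names `simulate_metrics`, `safe_div`, and `f1_score` are hypothetical helpers introduced for illustration only; the original simulation may have been implemented differently.

```python
# Sketch of the imbalance simulation in the figs3.pdf caption.
# Assumption: exact expected counts are used instead of random sampling.

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def simulate_metrics(prop_class0, n_total=10_000, per_class_accuracy=0.75):
    """Per-class precision, recall, and F1 for a given proportion of class 0.

    Each class is classified correctly at rate `per_class_accuracy`;
    the remainder of that class is assigned to the other class.
    """
    n0 = round(n_total * prop_class0)   # true class 0 points
    n1 = n_total - n0                   # true class 1 points

    correct0 = per_class_accuracy * n0  # class 0 predicted as 0
    wrong0 = n0 - correct0              # class 0 predicted as 1
    correct1 = per_class_accuracy * n1  # class 1 predicted as 1
    wrong1 = n1 - correct1              # class 1 predicted as 0

    precision0 = safe_div(correct0, correct0 + wrong1)
    recall0 = safe_div(correct0, n0)
    precision1 = safe_div(correct1, correct1 + wrong0)
    recall1 = safe_div(correct1, n1)

    return {
        "precision0": precision0,
        "recall0": recall0,
        "f1_0": f1_score(precision0, recall0),
        "precision1": precision1,
        "recall1": recall1,
        "f1_1": f1_score(precision1, recall1),
    }


if __name__ == "__main__":
    for prop in (0.1, 0.25, 0.5, 0.75, 0.9):
        m = simulate_metrics(prop)
        print(prop, {k: round(v, 3) for k, v in m.items()})
```

Under these assumptions, recall for each class stays at the fixed per-class accuracy wherever that class is present, while precision (and therefore F1) of the minority class falls as the imbalance grows.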